Top soft skills that should be highly valued by data scientists to distinguish themselves from the crowd
In this story, I’ll share with you my top 5 soft skills that should be highly valued by Data Scientists to distinguish themselves from the crowd. These observations are based on my experiences over the past decade, working with and managing Data Scientists.
1. Curiosity
I won’t catch you off guard with this one (or the next), yet curiosity remains a key characteristic for Data Scientists (and numerous other professions).
Keep learning
Data Scientists should value curiosity highly because it fosters continuous learning and adaptation in the rapidly evolving field of data science.
Given the constant influx of innovations from research papers or conferences, a curious mindset encourages Data Scientists to stay abreast of emerging trends, embrace new tools, and refine their skills.
I’ve experienced moments in my career where I focused on a single topic for months, keeping my head down and forgetting about everything else. Each time, my curiosity eventually brought me back to earth and urged me to open my eyes and look at what had happened in the field in the meantime. Without it, I would have gotten stuck and become outdated.
Think outside the box
Also, curiosity is a source of creativity in problem-solving.
Data Science often presents intricate challenges that demand innovative approaches. It is also often about applying old, proven approaches to new domains in ways that have not been tried before.
Thus, curious minds are more likely to think outside the box, widen their options, and try new things out. That’s all you need to solve problems!
Curiosity is not the only source of creativity. For example, I think that a team composed of people with diverse backgrounds is more likely to foster creativity. So if you are a manager, think about hiring people very different from you or your colleagues, be curious about unorthodox career paths, and you might find gems.
How to bolster your curiosity?
1. Read
You should read more blogs and more books (or whatever other formats you like to read/listen to). You should pick topics you’re not experienced with, topics going beyond what you usually read. If you have a scientific background, read about business, communication, decision-making, etc.
I like reading about topics I’m passionate about like Basketball, Data Science, or Data Engineering. However, I also try reading more about software engineering, marketing, entrepreneurship, biology, personal finance, etc.
2. Step outside your comfort zone
Seize any opportunity to leave your comfort zone and learn new things. If you adopt the attitude that everything is of interest and everyone is worth listening to, you’ll widen your world and start asking yourself questions about things outside your domain of expertise.
2. Communication
In any profession, you’ve likely encountered the challenges of collaborating with someone who struggles with effective communication. This issue becomes particularly pronounced in the realm of Data Science, where adeptly conveying technical concepts presents a formidable challenge.
Adapt to your audience
Data Scientists are often at the interface of several teams, working with various stakeholders and people with diverse backgrounds. So it is important to bridge the knowledge gap and ensure that everybody has a common understanding of what is said.
Without clear communication, collaboration quickly becomes impossible, which can jeopardize a project, a team, or your job.
In my experience, I’ve always been impressed by people who can describe their current work without using any technical concept. For me, it says a lot about how well a person masters the subject and steps back to get an overview of it before talking.
Communicating results efficiently
Even if just 1% of your project time is dedicated to communicating results, this seemingly small fraction often holds 99% of the value to be conveyed.
Regrettably, I’ve noticed that many junior Data Scientists devote weeks to perfecting their technical methodologies or data analyses, only to stumble when it comes to delivering impactful insights due to inadequate communication.
How to improve your communication?
1. Practice
You should consider every opportunity to interact with people as a chance to practice and refine your communication skills.
For example, if you have daily meetings, spend a few minutes preparing for them and make sure you deliver a clear message to your audience.
Whenever I have an “important” meeting, I take 5 minutes to prepare. My recipe is the following:
- take 1 minute for a zygomatic warm-up (it helps to improve pronunciation) -> https://preply.com/en/blog/english-pronunciation-practice/
- define my objectives
- identify my audience’s interest in listening to me
- write down key topics or a structured summary of my talk
Always assume that they don’t know anything about what you’re doing, and help them close the knowledge gap so that they feel comfortable with what you are saying.
Also, ask them for feedback: do you remember what my primary objective was today? Was it clear enough?
Resources:
- https://hbr.org/2023/06/how-to-give-and-receive-critical-feedback
- https://kimmalonescott.medium.com/asking-for-feedback-how-to-solicit-radical-candor-823dab2860c0
2. Read
Read books about communication! I highly recommend beginning with “Made To Stick” by Chip and Dan Heath, a book that offers a methodology for making your ideas more memorable and impactful.
You can also find interesting resources on the internet for free like https://online.hbs.edu/blog/post/communication-techniques
3. Scientific Mindset
It is easy to forget about the “scientist” part of the “data scientist” role name, but it is very important to remember that a Data Scientist must apply a scientific approach to solving problems using data.
Scientific rigor forms the bedrock of Data Science, ensuring that analyses are robust, reliable, and reproducible. Applying scientific rigor mainly means adhering to a rigorous methodology and critically evaluating findings.
You can be a great software engineer without applying any scientific methodology. But you won’t be a good Data Scientist without it.
In my opinion, this is not optional. Even if you’re not a research scientist, exercising scientific rigor is vital to mitigate the risk of erroneous conclusions. That’s why, whenever I discuss experiments and results with a colleague, I tend to value the scientific approach over the results in the first phase.
Tips to embrace a higher level of scientific rigor
1. Problem definition
It’s easier said than done. But if you are solving the wrong problem, you won’t get anywhere and you will waste your time. Unfortunately, most people jump on the first available version of a problem because they prefer writing code. So the solution is simple: invest more time in defining the problem, discussing desired outcomes with stakeholders, and setting proper starting hypotheses and constraints.
2. Statistics
Statistics is one of the 3 pillars of a Data Scientist’s hard skills, and any scientific approach needs a sound use of statistical tools. For example, statistical tests will help you check for correlation between features or examine how your data is distributed. So, if you’re not comfortable with it, think about leveling up your statistical arsenal.
You can start right now on Medium: https://towardsdatascience.com/ultimate-guide-to-statistics-for-data-science-a3d8f1fd69a7
I also recommend this comprehensive book: https://www.oreilly.com/library/view/practical-statistics-for/9781492072935/
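To make the feature-correlation check more concrete, here is a minimal sketch using only the Python standard library. It computes a Pearson correlation coefficient and a simple permutation test to judge whether the correlation could be due to chance. The feature names and values are made up for illustration, and the permutation test is just one possible choice of statistical test, not the only way to do it.

```python
import math
import random


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled data is at least as correlated as the real data."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance to absorb floating-point rounding.
        if abs(pearson_correlation(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy features: the second is the first one in another unit.
surface_m2 = [30.0, 45.0, 60.0, 80.0, 100.0]
surface_sqft = [x * 10.764 for x in surface_m2]
unrelated = [3.0, -1.0, 4.0, -1.0, 5.0]

r_redundant = pearson_correlation(surface_m2, surface_sqft)
p_redundant = permutation_p_value(surface_m2, surface_sqft)
r_unrelated = pearson_correlation(surface_m2, unrelated)
p_unrelated = permutation_p_value(surface_m2, unrelated)

print(f"m2 vs sqft:      r = {r_redundant:.3f}, p = {p_redundant:.3f}")
print(f"m2 vs unrelated: r = {r_unrelated:.3f}, p = {p_unrelated:.3f}")
```

With so few data points this is only a toy, but the habit is the point: before dropping or keeping a feature, quantify the relationship and ask how likely it is to appear by chance.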
3. Tools
People tend to try multiple things at once to save time. But going step by step and assessing one thing at a time is mandatory if you want to understand what’s going on and draw the right conclusions.
Using the appropriate tool can significantly facilitate the process. This is precisely the function of “Experiment tracking” tools, which are increasingly used within the Data Science domain.
For personal projects like Kaggle competitions, I like using DVC, which introduced experiment tracking features a few years ago. However, much more advanced tools are available on the market, like MLflow or Neptune.ai.
You can find a comprehensive comparison of experiment-tracking tools here: https://towardsdatascience.com/a-comprehensive-comparison-of-ml-experiment-tracking-tools-9f0192543feb
But, you don’t need a sophisticated tool to take notes of your thoughts, questions, and experiments. So I’d recommend at least taking the first step by just writing down things on a notepad.
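If a paper notepad feels too informal, a plain text file can play the same role. The sketch below is a hypothetical, minimal experiment log built on the Python standard library only; the function names, the file name, and the parameter and metric values are all made up for illustration, and it is in no way a substitute for the dedicated tools mentioned above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(log_path, params, metrics, note=""):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_experiments(log_path):
    """Read back every logged experiment, one dict per line."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(records, metric):
    """Return the record with the highest value for the given metric."""
    return max(records, key=lambda r: r["metrics"][metric])


# Demo in a temporary directory; in practice the file would live in the project.
log_file = Path(tempfile.mkdtemp()) / "experiments.jsonl"

# Change one thing at a time and write down what changed (values are invented).
log_experiment(log_file, {"max_depth": 3}, {"f1": 0.71}, note="baseline")
log_experiment(log_file, {"max_depth": 5}, {"f1": 0.74}, note="only max_depth changed")

runs = load_experiments(log_file)
print(best_run(runs, "f1")["params"])
```

The "note" field matters as much as the numbers: it forces you to state what changed between two runs, which is the step-by-step discipline described above.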
4. Integrity
As any Data Scientist knows too well, a data analysis can vary significantly based on the narrative one wishes to convey to an audience.
Lying with data?
There is no denying that conclusions drawn from a dataset are inherently influenced by the perspective and intentions of the presenter. This simple fact underscores the importance of integrity.
Resources:
- How To Lie With Statistics by Darrell Huff
- https://towardsdatascience.com/lessons-from-how-to-lie-with-statistics-57060c0d2f19
With great power comes great responsibility. But, this is not the only reason why integrity is important to me.
Challenge model bias
Model bias is now a well-publicized issue that every Data Science practitioner should care about.
Tackling this problem is difficult, but everybody should feel concerned because of the potential business impact and, more importantly, because of the potential societal impact a single biased model can have.
Interesting resources on the topic:
- https://developers.google.com/machine-learning/crash-course/fairness/video-lecture
- https://pair-code.github.io/what-if-tool/
If you’re an AWS user like me, SageMaker Clarify provides numerous analyses for bias detection.
If you are interested in fairness for Deep Learning, take a close look at the DEEL lab publications on the subject and their open-source solutions like Influenciae.
Environmental Impact
Another ethical aspect of Data Science is its environmental impact, which is often downplayed because it is hard to measure and understand.
Research in this area seems to be expanding, and I would appreciate hearing from you if you have materials to share with me on the subject.
It is imperative for Data Scientists to actively assess and mitigate their environmental impact. For instance, they must question whether processing an extensive quantity of data is essential to achieve the desired business objectives.
Additionally, they should explore methods to minimize the environmental repercussions of their models. Then, sharing results and perspectives with stakeholders will broaden awareness of the environmental implications inherent in data-driven decision-making.
There are many more dimensions to this problem and I’ll think about it for a future story.
Interesting resources:
- https://www.oecd.org/publications/measuring-the-environmental-impacts-of-artificial-intelligence-compute-and-applications-7babf571-en.htm
- https://www.nature.com/articles/s42256-020-0219-9
- Cloud providers like AWS may provide tools like https://aws.amazon.com/aws-cost-management/aws-customer-carbon-footprint-tool/
If you want to start thinking about your code’s carbon footprint:
- https://mlco2.github.io/codecarbon/
- understanding where your code is inefficient is also a good start: https://pyinstrument.readthedocs.io/en/latest/
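As a first step before reaching for the tools linked above, you can already spot wasteful code with the profiler that ships with Python. The sketch below uses the standard library's cProfile rather than pyinstrument, and the two functions are hypothetical examples of the same task written inefficiently and efficiently. It says nothing about actual energy use; it only shows where time is spent.

```python
import cProfile
import io
import pstats


def slow_unique(values):
    """Deduplicate while keeping order, with a linear scan for every element."""
    seen = []
    for v in values:
        if v not in seen:  # scanning a list costs O(len(seen)) each time
            seen.append(v)
    return seen


def fast_unique(values):
    """Same result, relying on dicts preserving insertion order."""
    return list(dict.fromkeys(values))


# Made-up workload: 20,000 values with only 500 distinct ones.
data = [i % 500 for i in range(20_000)]

profiler = cProfile.Profile()
profiler.enable()
slow_result = slow_unique(data)
fast_result = fast_unique(data)
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Both functions return the same list, but the report makes it obvious which one dominates the run time. Less wasted compute is a reasonable proxy to start with, even if it is not a measurement of carbon emissions.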
How do you keep your integrity?
Integrity is about being honest with yourself, consistent in your values, and acting according to your principles. So, the first thing is to ask yourself what your core values are and how best to embody them at work every day.
Then, resist external pressure and stay true to yourself. Also, don’t ignore the ethical challenges of the domain. They are an increasing concern for society, and we are responsible for providing solutions.
5. Being value-centric
Most Data Scientists just love exploring data and building models. That’s why Kaggle is so popular. As a professional, it is easy to get trapped in endless data exploration, unbounded experiments, or model optimizations.
Value-centricity refers to an approach or mindset that places a strong emphasis on delivering value as the primary objective in decision-making, problem-solving, and overall strategy.
So, in the context of Data Science, being value-centric means keeping your focus and using your skills to create value, rather than wasting your time on the technical issues you’d like to solve to reach an optimal solution.
Once again, the best Data Scientists I have worked with explore data with a purpose, asking and answering the questions that will help them solve the right problem. Then, they conduct the minimum required experiments to reach a solution and build an MVP. They quickly go to production to see what happens, and iterate.
This path is made of numerous tradeoffs between optimizing things and adding incremental value to the end user.
How to be focused on value?
One hard thing about being value-centric is recognizing that even if you’re not building a comprehensive data product by yourself, you’re building a piece of it, so you must conform to a product mindset and focus on the value you’ll ultimately generate for an end user.
Your decisions should always weigh the time it will take to do something against the value it provides to the product. Some things are important but can be postponed to future iterations; other things are not valuable enough to be done at all.
When building a Data Science model, it is often possible to quickly assess if a model would yield good enough value depending on the expected performance and how it will impact the business.
For example, if you are not familiar with building custom scoring functions for evaluating your model based on nontechnical metrics, take a look at this: https://towardsdatascience.com/calculating-the-business-value-of-a-data-science-project-3b282de9be3c
At some point, you may also face the opportunity to improve model performance. This is the right time to understand what kind of improvements will be valuable. Is increasing your F1 score by 0.01 worth the effort it requires? Does it require collecting 100k new labeled data points?
As product owners and other software engineers might not understand all the technical aspects of developing a model, it is your responsibility to make these decisions.
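To illustrate this kind of trade-off, here is a minimal sketch of a business-oriented scoring function in Python. Everything in it is hypothetical: the confusion-matrix counts, the value of a true positive, the costs of false positives and false negatives, and the labeling cost are invented numbers for an imaginary churn model. The point is only to show how a small F1 gain can be translated into money and compared to the effort it requires.

```python
def f1_score(tp, fp, fn):
    """F1 score computed from confusion-matrix counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def business_value(tp, fp, fn, value_tp, cost_fp, cost_fn):
    """Net value of the predictions: gains from hits minus the cost of errors."""
    return tp * value_tp - fp * cost_fp - fn * cost_fn


# Hypothetical monthly figures for an imaginary churn model.
VALUE_TP = 50.0   # value of a churner correctly flagged and retained
COST_FP = 10.0    # cost of a retention offer sent to a loyal customer
COST_FN = 100.0   # cost of a churner that was missed

current = {"tp": 80, "fp": 40, "fn": 20}
candidate = {"tp": 82, "fp": 38, "fn": 18}  # slightly better model

f1_gain = f1_score(**candidate) - f1_score(**current)
value_current = business_value(**current, value_tp=VALUE_TP, cost_fp=COST_FP, cost_fn=COST_FN)
value_candidate = business_value(**candidate, value_tp=VALUE_TP, cost_fp=COST_FP, cost_fn=COST_FN)
monthly_gain = value_candidate - value_current

# Assumed one-off cost of labeling 100k new data points at 0.05 each.
labeling_cost = 100_000 * 0.05
months_to_break_even = labeling_cost / monthly_gain if monthly_gain > 0 else float("inf")

print(f"F1 gain: {f1_gain:.3f}")
print(f"Monthly value gain: {monthly_gain:.0f}")
print(f"Months to break even: {months_to_break_even:.1f}")
```

Framed this way, the discussion with a product owner is no longer about an abstract metric but about how long it takes for an improvement to pay for itself.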
Other interesting resources:
- The work ethic concept captures the ideas of value and ownership: https://hbr.org/2022/09/how-to-develop-a-strong-work-ethic
Conclusions
To become a better Data Scientist, you should focus on developing your curiosity, communication, integrity, scientific mindset, and value-centricity.
Most of the time, I would recommend reading books, but there are also numerous other valuable materials such as online courses and blog articles. Some of these skills can only be developed by being confronted with reality. Thus, be aware and prepared for the opportunities you’ll face.