
When Pandas Stopped Conjuring Bears for Me, I Knew I Had Become a Data Scientist

Three things to stop worrying about as a young data scientist

Getting Started

Bears have plagued my professional life.

When I was a graduate student in psychology, we learned about a researcher named Dan Wegner, who was famous for studying unwanted thoughts. In his notorious experiments, he demonstrated that asking people not to think of something, like a white bear, would backfire and produce high rates of intrusive thoughts about the very thing they were asked to suppress. "The white bear phenomenon," or as it is more aptly named, "Ironic Process Theory," helps to explain all kinds of mental problems.

Not long after leaving grad school, I found myself confronted with bears all over again. This time they were Pandas. I remember thinking the Python library name was funny, and as I first learned how to analyze data with it, images of the black and white lumps of fur intruded on my thoughts.

But these intrusive bear thoughts didn’t last. As I learned more about Data Science and became more adept at using the Pandas library for manipulating data, my images of actual pandas were slowly replaced by dataframes and lines of code.

And as these mental images changed, I knew that I was no longer a hopeless psychologist suppressing thoughts of white bears, but rather a data scientist with a whole new set of anxieties surfacing in my thoughts.

Lest you think this post is only about bears, it isn’t. Instead, this article is about the things those bears have conjured in me over the years, mainly anxiety.

Thus, I want to focus on 3 common anxieties faced by young, developing data scientists. I was there too, and I faced them as well but came out on top. Herein, I share how I view them now, after working in the field for nearly 15 years. So, stop with the white bears and pandas, and let’s address the other elephants in the room for students of data science!

Data science education and anxiety seem to go together. Data science is a complex mix of many different skills. It encompasses lots of different technologies, requires some understanding of mathematical models, and is best served with critical thinking. As a result of its complexity, there is a lot of uncertainty when learning data science. And that uncertainty creates a lot of anxiety.

Anxiety #1: Focus

One of the biggest concerns for young data scientists is knowing which skills to focus on learning. In fact, just the other day I was at a university-sponsored conference for a diverse group of young data scientists. During a breakout room session, one student asked:

"What skills do you think are the most important skills for me to focus on?"

At the heart of the concern is the realization that the field is evolving quickly and, as with any growing field, specialization becomes more and more of an issue. Why hire a data scientist who knows a little about a lot of data science tools when you can hire a data scientist who knows a lot about a limited set of tools that apply to a specific business problem?

The problem with the specialist is, once they have solved the problem at hand, their utility for the company decreases. The problem with the generalist is, it takes a bit more time to experiment while working on new use cases.

At the end of the day, specialization is important because it provides the context under which to learn the data science process. That said, specialization should not be an end but rather a means through which students can learn how data science works more generally. Therefore, it is important for students to focus on learning a specialized skill set while remaining cognizant of how those skills generalize to other uses.

The key here is to recognize that it is not the specific skill set that matters most but rather that you were able to take a complex ensemble of skills, put them together, and solve a problem.

Focus less on what specific skills you should be learning and more on doing something useful with what you learn. You control the narrative with prospective employers or clients so make sure you can demonstrate that you understand how to apply a specialized set of tools to solve a specialized problem. Go further by explaining how the same process can apply to use cases that may be more relevant to them.

In short, the skill you should be focusing on is understanding how the specific skills you learn are also part of a more fundamental process that applies to most data science problems. Demonstrate how you can move quickly from learning to applying.

Anxiety #2: Confidence

This second anxiety is confidence. Most young data scientists lack confidence in their ability to solve business problems with their newly acquired skills. The lack of confidence is inspired by two main forces: complexity and ego.

What do I mean by complexity?

One of the most popular data science libraries in Python is Scikit-Learn (sklearn). Sklearn contains nearly 200 different models or "estimators." And that’s just one data science library. Add to that the plethora of deep learning model architectures, hybrids, and ensembles, and the choice of models becomes overwhelming. Not to mention all the techniques and technologies available for preprocessing data and deploying solutions, both of which have an impact on overall success.

All this complexity adds up to a high degree of uncertainty thereby reducing confidence.

The second force affecting confidence is ego. Because data science has the term "science" in its name, it comes with the baggage of academic science. Namely, there are some big egos in the space that are happy to judge the ability of newly minted data scientists entering the field.

These egos can make it feel as though the application of data science in business is more rigid than it really is. The bottom line: nobody knows everything, especially when it comes to data science in the wild.

Be confident in what you know because you have successfully applied it. Build confidence in what you are capable of learning because you continue to apply new things.

Anxiety #3: Appropriateness

The final anxiety that I’ll share, and one I see a lot of students going through as well, is appropriateness. This anxiety is related to confidence but extends into concern over what is an "appropriate" use of data science. For example, I used to always question whether my solution was "right" or "correct." It never dawned on me that for most business problems there is no single "right" or "correct" solution.

Moreover, there are simply too many possibilities to ever know the absolute "best" solution. So instead of worrying about whether your solution is appropriate, focus more on whether your solution brings value by solving a problem. Here is a simple framework for demonstrating value to a consumer:

Consider a good benchmark, such as an existing process, human error (humans make roughly 3–6 errors per hour), or a baseline model.

Build your solution around that benchmark to demonstrate its value.

Iterate from there based on business need, feedback, and ongoing evaluations of performance with new data.
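To make the benchmark idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy labels and feature values are invented, `threshold_rule` is a hypothetical stand-in for whatever solution you build, and accuracy is just one possible metric. The baseline simply predicts the most common label, which is one simple kind of baseline model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def majority_baseline(y_train):
    """Return a predictor that always outputs the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return lambda x: most_common_label


def threshold_rule(x, cutoff=50):
    """Hypothetical candidate solution: predict 1 when the value exceeds a cutoff."""
    return 1 if x > cutoff else 0


# Toy data invented purely for illustration (not from any real process).
train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_features = [12, 75, 40, 90, 55, 20, 33, 48]
test_labels = [0, 1, 0, 1, 1, 0, 0, 1]

# Step 1: score the benchmark.
baseline = majority_baseline(train_labels)
baseline_acc = accuracy(test_labels, [baseline(x) for x in test_features])

# Step 2: score the candidate solution against the same data.
candidate_acc = accuracy(test_labels, [threshold_rule(x) for x in test_features])

# The value you report is the improvement over the benchmark.
lift = candidate_acc - baseline_acc
print(f"baseline={baseline_acc:.3f} candidate={candidate_acc:.3f} lift={lift:.3f}")
```

The point of the sketch is the comparison, not the model: the candidate is judged by how much it improves on the benchmark, and step three is rerunning the same comparison as new data and feedback arrive.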

Those are just a few of the more common anxieties that I experienced early in my career and that students of the field continue to express even today. I hope you have found this helpful in setting expectations and in giving you some insight to help overcome those anxieties.


