
For New Data Scientists, Domain Knowledge Is Sometimes More Important than Technical Skills

Sophia Yang talks about the importance of ongoing learning and finding great colleagues and mentors.

Author Spotlight

In the Author Spotlight series, TDS Editors chat with members of our community about their career path in Data Science, their writing, and their sources of inspiration. Today, we’re thrilled to share Sophia Yang’s conversation with Ben Huberman.

Photo courtesy of Sophia Yang

Sophia is a Senior Data Scientist at Anaconda, Inc., where she manages key metrics, data pipelines, and models, and uses data science to facilitate decision making for various departments across the company. She is also the author of multiple open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and a Ph.D. in Educational Psychology from The University of Texas at Austin.


First things first: How did you decide to go into data science?

My background is in Statistics and Psychology. I received a lot of training in causal inference, experimental design, and statistical modeling in graduate school. I used experiments, survey data, and longitudinal historical data to build statistical models and generate insights about human behavior. I love working with data, and I wanted to keep working with data after I graduated. So, for me, transitioning to data science was quite natural.

Looking back, what were the most difficult aspects of launching your data science career?

When I first started my data science job, I faced some challenges in both the technical and business areas. But rather than seeing them as challenges, I see them as opportunities to grow.

Technically, I was trained in statistics, but not so much in computer science. So, when it came to GPUs, parallel programming, big data processing, and so on, I was not prepared and did not know how to work with them. Luckily, I have great mentors in the company, and I was very eager to learn all the technical skills that I did not have before. I also took some related online courses to help me learn and grow.

Business-wise, the change of mentality from academia to business can be challenging. Industry projects can be very different from research projects. Publishing papers is no longer the goal. You are often on a tighter timeline, and your work is often tied to revenue and financial consequences. I have to say that I like the change and embrace it. Being able to see the direct impact of my projects is more rewarding than publishing academic papers.

Based on your own experience, what advice would you give to aspiring data scientists who are taking their first steps in the field?

I think domain knowledge is sometimes more important than technical skills. So, I would encourage people to try to do data science in their own field first. Come up with a meaningful data science question that you are interested in solving in your domain or even in your everyday work, establish hypotheses, gather the data you need, and start from there.

Another thing: never stop learning. There is so much to learn in the data science field. Fortunately, there are also many learning resources, meetups, and conferences that help people learn. I also think it’s important to find role models, be around people you want to be like, and learn from them. Keep learning and growing every day. Over time, you will become more and more like the person you want to be.

What do you enjoy the most about your current role?

I have great mentors and coworkers. It’s the people who make the working environment enjoyable.

In terms of projects, as one of only two data scientists in our company, I get to work on a variety of projects with a wide range of stakeholders. I work closely with product, sales, and marketing to understand user trends and product features and to identify business opportunities. I really enjoy working on my projects end-to-end. I define the scope of my projects, write ETL pipelines, create visualizations, build models, deploy dashboards and models, and then translate the results into business insights.

How does your public writing fit within the context of your other professional activities? What inspired you to start?

At first, I just wanted to document what I had learned or done as notes to myself. Then, when I needed to do similar things in the future, I could come back to my notes and know exactly what to do.

After a few articles, I realized that writing is a great way to learn. People always say teaching is the best way to learn. I think that writing is the best way to learn as well. Even on the topics I thought I knew very well, I was able to learn a lot during the process of researching and writing.

And to end on a future-facing note, what change do you hope to see within the data science community over the next couple of years?

Our CEO, Peter Wang, always says "data science is literacy, not a job." I think more and more people are going to become data science-literate. Python and R are the new Excel. Everyone will be able to speak data science and use data science tools in their own domains.


Curious to learn more about Sophia’s work and projects? Follow her on Medium, Twitter, and LinkedIn. Here are some of our favorites from Sophia’s collection of Towards Data Science posts, which range from industry-oriented tutorials to clear, beginner-friendly explainers.

  • "Multiclass Logistic Regression from Scratch" (April 2021, TDS) "A lot of people use multiclass logistic regression all the time, but don’t really know how it works," so Sophia wrote a comprehensive walkthrough, complete with a Python implementation.

  • "Testing for data scientists" (January 2021, TDS) Software testing is commonplace for developers, but a less common practice among data scientists. In this post, Sophia covers two tools—Pytest and Hypothesis—that make unit testing accessible.

  • "Customer Lifetime Value in a Discrete-Time Contractual Setting" (August 2020, TDS) Here, Sophia draws on her industry experience and explains one of the most important metrics in the business world: customer lifetime value (LTV). She includes both the math and a Python implementation for good measure.

  • "Jupyter Workflow for Data Scientists" (December 2020, TDS) Jupyter notebooks are everywhere in the data science world, but quite a few practitioners don’t like using them; Sophia shares a practical workflow that helps her leverage the tool’s power from setup to deployment.


Stay tuned for our next featured author, coming soon. If you have suggestions for people you’d like to see in this space, drop us a note in the comments!

