Office Hours

The What, Where, and How about continuously learning on the job as a data scientist

Continuous learning is a prerequisite for growing as a data scientist. But how can we keep learning while we are busy with our day-to-day tasks?

Quoc Tien Au
Towards Data Science
7 min read · Sep 28, 2021


Data science is a rapidly evolving field that requires anyone working in it to keep up to date. Some tools that increase our productivity tenfold didn't exist two years ago. Continuous learning is a prerequisite for providing the most efficient solutions to the problems that matter in our work. But how can we keep learning while we are busy with our day-to-day tasks?

What to learn?

There are tons of learning resources out there, most of them free. This is both a blessing and a curse. It's easy to fall into analysis paralysis, trying to decide what to learn next instead of actually learning. Here are four questions I ask myself before learning something new:

  1. Is this new tool or method going to help me get my work done? This can range from deploying machine learning models on AWS to versioning data with DVC or processing time-series data. What knowledge do I need to deliver as much value as possible?
  2. Am I genuinely interested in this topic? Even if the topic isn't directly applicable to my work today, it might be useful knowledge in the future. Learning should be fun anyway. For example, I studied transformers and BERT-like architectures in 2019, knowing that I could use them one day, and eventually applied them at work a year later.
  3. Does it help me become a full-stack data scientist? I like to own data science projects from A to Z, or at least know what is involved at each stage of a project. For example, I like to learn about data engineering and how to build a minimum viable data pipeline, or about product management and how to build a data product that solves business problems. It also helps me communicate better with these teams.
  4. Has someone else applied it to their work projects and gotten value from it? My sweet spot is when a data team has gotten value from a project and their data maturity is one stage ahead of mine.

A good rule of thumb is to learn things that interest you, that you could apply in your job (now or in the near future), and that other people have applied in their jobs and gotten value from.

If a random person on the internet tells you to learn X tool / method / algorithm, it might be a good idea to cross-check it against the questions above. If the answers are positive, it's time to focus and learn.

Don't forget that learning is not linear. You can focus on one topic for a couple of weeks, then set it aside, move on to the next one, and eventually come back to it. You don't have to learn everything about optimization algorithms in two weeks.

Where to learn?

Now that you've narrowed it down to one topic, it's time to find your learning resources. Personally, a Google search has never given me a tangible resource. Instead, I usually do my research on the following platforms (ranked by preference). This list is by no means exhaustive.

  • Ask a friend / colleague you trust. This is the fastest way to get started.
  • Newsletters. A weekly email of curated resources. Even though it's curated, there is still noise to cut through. I usually read about 1/4 of the resources each week, and maybe one resource out of every 3–4 newsletters is eye-opening. My favourite ones are Data Science Weekly, Data Elixir, and blef. I am subscribed to a couple of others, but I don't read them as much.
  • Company tech blogs. Many teams write publicly about their projects, usually to attract talent and improve their brand. Those blogs are gold mines. You can read about projects that are used in production, the methodology from A to Z, the challenges they faced, how they overcame them, and their next steps. Eugene Yan has gathered a fantastic list of resources in his Applied ML repo. My favourite blogs come from companies I see myself working at, or from companies that are one stage ahead of my own in data science maturity.
  • Practical tutorials or long-form blog posts. After getting an introduction to the concept I want to learn, I like to follow a tutorial that gives me ways to practice and directions for going deeper next. I am a fan of Made With ML, which provides very practical advice on the whole DS project life cycle. Another example of a practical tutorial is The Annotated Transformer, which guides you through the famous paper "Attention Is All You Need" with detailed code, comments, and explanations.
  • Follow the companies that build the tools you use. They usually post use cases from their customers, which can in turn give you ideas on how to use their tools better. Follow their updates, their documentation, and their roadmap, and you will get a glimpse of what they consider cutting-edge. For example, I follow Explosion (spaCy, Prodigy), DVC, neptune.ai, and Monte Carlo Data. If they have an open-source product, it's also a good idea to explore the codebase and get inspired by it.
  • Textbooks or course recordings. Once I am familiar with a topic and want to dig deeper, I like to study it with a textbook or a recording of a university course. It gives me enough detail to fully grasp the pros and cons of a method. I also like the questions and exercises at the end of each chapter, which generally make you think about the concept from different angles. I find myself regularly coming back to Speech and Language Processing, The Elements of Statistical Learning, and Pascal Poupart's YouTube channel.
  • Conference papers. I am not a research scientist, so I don't feel the need to keep up with the state of the art. That said, I like papers from specialized workshops, or papers that apply SOTA methods to a specific domain. For example, I followed the BEA workshop (Workshop on Innovative Use of NLP for Building Educational Applications) when I worked at an EdTech startup, and I now follow the work at Climate Change AI because of my work at Manifest Climate.
  • Social media. There is a lot of noise in what people share on social media. However, I find a lot of value in some LinkedIn posts. In particular, I like it when people share practical, actionable advice or personal experiences. I ignore posts that just share a link to a tool / another blog post / a repo, or generic advice with zero added value. For example, I follow Vin Vashishta, Eric Weber, and Daliana Liu, among others.
  • Podcasts, YouTube videos, and online courses. I am not a fan of the audio format because I prefer to read, but there is a lot of valuable content out there. Coursera is a good place to start.
  • Collaborative spaces such as Kaggle, DAGsHub, and Colab, where people share notebooks and code snippets. I haven't explored this avenue yet, but it seems promising.

There are many platforms, and each of us will be more or less receptive to different ones. In a future article, I will share my tips for cutting through the noise and selecting the resources you need, when you need them.

How to keep learning while working?

At this stage, you know what you want to learn and where to learn it. The final question is how you will integrate this learning process into your workflow.

First, ask yourself whether your employer expects you to keep learning, growing, and sharpening your skills. If not, try to steer the work culture towards continuous self-improvement. If that doesn't work, you might want to start interviewing elsewhere so that you can keep growing as a data scientist. The current market is a candidate's market, and employers should invest in your development with time and/or money. Take advantage of that, and don't be afraid to treat your learning as a work task.

Here are some tips to get you started with continuous learning at work:

  1. Create a recurring event in your calendar and treat it as non-negotiable (e.g. no Slack distractions, no conflicting meetings, no last-minute appointments). Be realistic about the frequency. Start with one 20-minute session a week, and increase the frequency and duration once you are comfortable setting aside time to learn at work. I usually have 3–4 sessions a week.
  2. Focus on one concept or theme for at least a couple of weeks. You will learn faster and retain knowledge better. The goal is to become comfortable enough with the new concept to apply it confidently in your work.
  3. Pick a couple of resources and start learning. Don't get caught up in analysis paralysis. Once you start learning, you will have a better idea of what the next steps are and which resources make sense for you at your stage.
  4. Take notes. Then, try to recall your notes from memory. It will fast-track your learning.
  5. Teach it to your colleagues. It will foster a healthy work environment where everybody is empowered to share their knowledge. In my next article, I will elaborate on learning as a data science team. Stay tuned!
  6. Once the above has become a habit, you can start being more intentional about your learning. For example, establish a quarterly learning roadmap and divide it into 2–3 week chunks, with resources to learn at each stage.

I hope this article inspires you to integrate learning into your workflow every day. Keep growing as a data scientist by structuring your learning, focusing on one concept at a time, and making it an enjoyable habit.

Sneak peek at my project.

Connect with me on LinkedIn, I am always happy to chat about data science / machine learning / NLP.
