
How to Apply Your Hard-Earned Data Science Skillset

Taking the leap from learning to application can be tough. Here are some things to look out for to smooth the transition.

Don't spend all of your time in the classroom (photo by Shubham Sharan on Unsplash)

Beware the Bookworm

It shouldn’t surprise you to know that I – like many data scientists – am a bit of a bookworm. I love to read and try to pick up new skills. There’s a special feeling when you hear something that just clicks and makes you think about something in a completely different way.

One thing that’s bitten me again and again over the years, however, is the gap between reading how to do something and actually doing it. Whether it’s cooking, sport, drawing, or anything else, you just don’t get there as quickly by reading as you do by putting the practice in.

What’s worse, in some cases book learning can actually harm your progress and slow you down if you don’t apply the knowledge you’re gaining.

I think this is especially true of Data Science – for both books and many of the courses available today.

Now, don’t get me wrong – I’m an avid believer that reading will have a positive impact on your capabilities and overall skillset. There are just some things that feel very different when you’re applying them.

I’m also a big fan of working through courses. My first steps into machine learning and data science came from taking Andrew Ng’s original Machine Learning course on Coursera.

Both books and courses serve a purpose and can provide a solid foundation in data science. If you rely on them too heavily without applying that knowledge, though, you’ll soon realise that many real-world problems aren’t as clear-cut.

Sometimes even the decision to use machine learning or not can be difficult.

From Learning to Leverage

So, how do you leverage these skills and get the best mix of academic and practical experience?

Data science is still a young field and it’s changing rapidly. Many of the foundational components that make up the data science skillset are vast areas of research in their own right. It takes expertise to teach these fields and extensive experience to advance them.

Many pure academics advancing the fields of machine learning or statistics may never touch a real business problem.

So don’t get caught up in forever learning the latest algorithms and approaches, or spend too much time trying to master academic problems. Take everything you can from these incredible people and the learning resources they deliver – then roll your sleeves up and get stuck into some real problems.

Here are some guidelines for easy, practical first steps to get out of the classroom and into the office:

  1. Use the simplest solution available, where possible. It may be hard to do quantitatively, but try to at least mentally add a term to your merit function that penalises more complicated solutions. If you can use standard, open-source, well-understood libraries to solve your particular problem, then do so. You’ll soon learn that the added headache of maintenance, debugging, support, and integration that comes with custom solutions will cost you more in the long run.
  2. Discover how others have applied the techniques you’re learning to real-world problems. Learning about XGBoost while working for a financial services organisation? Search online and check communities like Medium or Kaggle to see how it’s being applied, and to uncover the hidden difficulties that aren’t apparent from the simple notebook example you’re starting from.
  3. Worry more about the data than the model. I understand: it’s fun to test models and tweak hyperparameters. Worse, eking out half a per cent here or there can give the illusion of progress. The largest jumps almost always come from improving the data quality, engineering new features, or taking a different view of the data.
  4. Master your data manipulation tool of choice. This is closely related to the point above. Many projects require vast amounts of data manipulation and a tiny bit of model tuning. If you use something like Pandas, are you comfortable with the apply method? What about multi-level indices? Knowing these features will make your life much easier. You can never know too much SQL.
  5. Learn how to interpret, compare, and communicate model output. When should you consider accuracy? Precision? Recall? What does ROC AUC tell you? Learn how to explain your models too. Model interpretability can sink or save a project.
  6. Seek out weird and wonderful datasets in domains you love. This is super important – you need to be interested and driven to finish things. It’s great to build things with example and demo datasets, but they won’t make you stand out, and potential hiring managers have likely seen a similar solution with the same dataset before. Something rare and niche that you’re passionate about though? It’s more interesting for you to build and more likely to stand out to the people who see it. This will also teach you how to frame questions in a context you understand and how to communicate your findings – essential skills for every data scientist.
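
To make point 5 a little more concrete, here is a minimal sketch in plain Python of how accuracy, precision, recall, and ROC AUC can tell different stories about the same problem. Everything in it is an assumption made for illustration: the ten labels and scores are invented toy data, the 0.5 threshold is arbitrary, and the helper names (`confusion_counts`, `roc_auc`, and so on) are hypothetical rather than taken from any library. In practice you would probably reach for a well-tested library instead, in the spirit of point 1.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted positives that are truly positive (0.0 if none predicted)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives that the model found (0.0 if there are none)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 10 cases, only 2 of them positive (an imbalanced problem).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7, 0.35]

# A "model" that always predicts the majority class.
baseline_pred = [0] * len(y_true)
# The scored model, thresholded at an arbitrary 0.5.
model_pred = [1 if s >= 0.5 else 0 for s in model_scores]

for name, pred in [("baseline", baseline_pred), ("model", model_pred)]:
    print(name, accuracy(y_true, pred), precision(y_true, pred), recall(y_true, pred))
print("model ROC AUC", roc_auc(y_true, model_scores))
```

On this toy data both the always-negative baseline and the thresholded model score 0.8 accuracy, yet the baseline has zero precision and recall while the model manages 0.5 on each. The ROC AUC of 0.875 describes how well the scores rank positives above negatives regardless of the threshold, which is why it is worth reporting alongside the thresholded metrics rather than instead of them.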

Conclusion

The theory is great. It’s essential, in fact.

Don’t let it completely dominate your learning though. Take some time to find opportunities to apply this knowledge, test its boundaries, and see where it falls down outside of the example problems and tidy demo datasets. If you get the balance right, you’ll find it becomes easier to learn. You’ll have a growing wealth of context to draw from when learning something new – you’ll ask deeper questions and develop a much stronger understanding along the way.

