
A Practical Start to Machine Learning

Towards Data Science
5 min read · Mar 9, 2018


Working in a corporate innovation lab has taught me many things, one of the most notable being how to teach new students the basics of applied machine learning every four months. On top of that, I teach students with no machine learning (ML) experience how they can apply this technology to various aspects of their work. While I won’t say I’ve perfected this skill, I’ve been part of this cycle for three terms now, and I have a better understanding of what does and doesn’t work for teaching students the principles of applied machine learning in 1–2 weeks.

Interestingly, I’ve found that many students who complete online courses or learn the mathematics of a neural net can still struggle with feature engineering, data mining, and even loading generic data. After teaching a few cohorts of students in our space, I’ve settled on the following resources as my top recommendations, and I feel they are worth sharing.

An End-to-End Example

While machine learning is currently a hot topic, many developers have no idea how it can actually be used. I find it’s best to walk students through a simple end-to-end machine learning example so they gain a basic understanding of pre-processing, feature engineering, training, and evaluation.
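As a rough sketch of the order those stages run in (not the notebook I use with students), the snippet below assumes scikit-learn is installed and uses synthetic data from make_classification as a stand-in for a real dataset. A single scaling step stands in for both pre-processing and feature engineering, and all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data (200 rows, 5 numeric features, binary label).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows so evaluation uses data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pre-processing / feature engineering: fit the scaler on the training rows only,
# then apply the same transformation to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training: a simple linear classifier is enough for a first pass.
model = LogisticRegression().fit(X_train_scaled, y_train)

# Evaluating: compare predictions on the held-out rows with the true labels.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"held-out accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split alone keeps information from the test rows out of the model, which is the habit this kind of walkthrough is meant to build.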

Painting by Willy Stöwer

My favourite “Hello World” tutorial for machine learning is the Titanic dataset (although it’s slightly morbid). From this dataset, you learn to train a model that predicts whether or not a passenger survived the sinking of the Titanic. The file I use to walk students through can be found here. I find it’s best to get some practice using Jupyter Notebooks, Pandas, Scikit-Learn and NumPy.

I like this dataset as an intro because you learn important ML basics, such as reading in a CSV file, dropping rows with null entries and irrelevant columns, transforming a gender field from text to binary, and performing one-hot encoding of a categorical field. By going step-by-step through this example, you can cover many different feature engineering practices using a problem that is easy to grasp.
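To make those steps concrete, here is a minimal Pandas sketch. It is not the walkthrough file itself: the rows are made-up sample data laid out like a subset of the Kaggle Titanic columns, the preprocess helper is a hypothetical name, and treating Embarked as the field to one-hot encode is my assumption for illustration.

```python
import io

import pandas as pd

# Hypothetical sample rows mimicking a subset of the Titanic CSV columns.
# With the real file you would pass its path to pd.read_csv instead.
RAW_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,Fare,Embarked
1,0,3,Passenger A,male,22,7.25,S
2,1,1,Passenger B,female,38,71.28,C
3,1,3,Passenger C,female,26,7.93,S
4,0,3,Passenger D,male,,8.05,Q
5,0,1,Passenger E,male,54,51.86,S
6,1,2,Passenger F,female,14,30.07,C
"""


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the clean-up steps described above and return an all-numeric frame."""
    # Drop columns treated as irrelevant in this sketch.
    df = df.drop(columns=["PassengerId", "Name"])
    # Drop rows with null entries (passenger 4 has no Age).
    df = df.dropna()
    # Transform the gender field from text to binary.
    df = df.assign(Sex=df["Sex"].map({"male": 0, "female": 1}))
    # One-hot encode the categorical Embarked field.
    df = pd.get_dummies(df, columns=["Embarked"], dtype=int)
    return df.reset_index(drop=True)


# Reading in a CSV file (from an in-memory string so the sketch is self-contained).
raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = preprocess(raw)
print(clean)
```

Because the only passenger who embarked at Q is also the one with a missing age, the cleaned frame ends up with just two one-hot columns. On real data, the order of the drop and encode steps can change which columns you get, so it is worth checking.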

Hitting the Books

Once you’ve experienced what a simple machine learning pipeline looks like, I recommend reading a little and learning some of the fundamentals. Aurélien Géron’s book, Hands‑On Machine Learning with Scikit‑Learn and TensorFlow, is, in my opinion, the best way to start understanding applied machine learning. I typically have students read the first few chapters, or until they reach the TensorFlow section. What is great about the book is that it has accompanying Jupyter Notebooks for all the examples, so you can tweak and play with the code as you go along. I also love that it dives right into an end-to-end solution in Chapter 2, making it super easy for engineers to quickly pick things up.

Playing with Some Code

Now that you’ve learned some of the theory behind machine learning, it’s time to explore different types of implementations and use cases. However, some of us who focus solely on the readings might have a hard time jumping into code and implementing solutions with different data.

To find thousands of examples of different datasets, approaches, implementations and machine learning solutions, I turn to Kaggle. It has a huge amount of priceless knowledge, including hundreds of high-quality datasets in various domains. Possibly the greatest part of Kaggle is the kernels. Kernels are other users’ end-to-end code for problems, covering everything from reading in and cleaning the datasets to feature engineering, training and optimization. You can even fork other users’ kernels and run them on Kaggle’s cloud, giving you a chance to explore real solutions.

Kaggle is great because it gives you an opportunity to break away from toy problems and simple data. It allows you to work with more challenging and realistic datasets, including images, raw and unorganized text, and numerical features. Taking a break and exploring Kaggle (and maybe even entering a few competitions) gives you an opportunity to learn from other Kagglers’ applied solutions, expanding your understanding beyond the theory and mathematics behind machine learning solutions.

Going Deeper

After exploring some kernels on Kaggle, it’s vital to learn some deep learning fundamentals and applications. I advise skipping the TensorFlow section in Géron’s book because I find Keras much easier to pick up for quickly building neural nets. Keras is a high-level deep learning library that relies on TensorFlow or Theano as a backend for performing its computations.
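To give a feel for how little code a small network needs in Keras, here is a minimal sketch. It assumes a TensorFlow 2.x install where Keras is importable as tensorflow.keras (with the standalone keras package the import line would differ), and it trains on random toy data invented for illustration, so the fitted model itself means nothing.

```python
import numpy as np
from tensorflow import keras

# Assumption: toy data, 64 rows of 4 random features with a made-up binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# Stack layers in order: one hidden layer, then a sigmoid output for a probability.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Choose the optimizer, loss and metric, then train for a few epochs.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=16, verbose=0)

# One predicted probability per input row.
probabilities = model.predict(X, verbose=0)
print(probabilities.shape)
```

Defining, compiling, fitting and predicting each take a single call, which is a large part of why I find Keras quick to pick up.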

Luckily, the best book I’ve found for learning applied deep learning is by the creator of Keras himself, François Chollet! In his amazing book, Deep Learning with Python, he covers deep learning concepts, computer vision, natural language processing, and more advanced neural network architectures for areas such as question-answering. The other highlight of this book is that, since Chollet himself wrote it, it teaches the best practices, built-in APIs, and functions that you can use in Keras. This book contains many things I’ve never seen taught in online courses or in other books, and I would definitely consider it to be one of my top resources for looking up implementation tricks with Keras.

Wrapping Things Up

Taking these steps is just the beginning of gaining a deeper understanding of machine learning, and it should give you the confidence and enough background to start working on different sources of data. By focusing on the implementation of different algorithms, resources, and libraries, it’s easier to get excited and to want to keep learning.

The alternative of focusing solely on mathematics can be hard to translate into a work environment. Some of the most useful things I’ve learned are the subtle functions built into Keras, Scikit-Learn and Pandas that allow me to work faster. Some of the coolest pipelines I’ve learned for feature engineering have been inspired by solutions in competitions I’ve dabbled in through Kaggle.

The sheer number of machine learning resources out there can be overwhelming. I hope this list serves as a great starting point for you to dive into machine learning.
