fast.ai: the BEST things in life are always FREE

Chamin Hewage
Towards Data Science
11 min read · Feb 5, 2018


[Image[1] (Image courtesy: https://pixabay.com)]

“Having a cosy tête-à-tête with a friend and recalling old memories, getting a good sleep in your comfy bed after busy days of work, spending real quality time with your family, volunteering for a cause that you truly believe in and always wanted to help, looking into that special person’s eyes and making him/her feel how special he/she is to you…” are a few of the best things in our lives. There are a tremendous number of such moments that really help in bringing out the best in ourselves. If you look closely at these things, you’ll be pretty amazed to realize that all these best moments in our lives have a common characteristic. Yes! As the title suggests, “the BEST things in life are always FREE”.

Keep calm! This post is not about the philosophical aspects of our lives :D. In this post, I’m going to give my two cents about a learning platform that is 100% free (yes! it’s a GOLD mine) where you can “do and learn” Deep Learning. Yes! “Do and learn”. Please do stick with me throughout; I’ll explain every detail that you need to know about this learning platform. Let’s set the wheels in motion :D

I always wanted to learn about data science and data engineering. Coming from a Computer Science background and having exposure to “Pattern Recognition”, “Data Mining and Data Warehousing”, “Distributed Systems”, “High Performance Computing”++, I realized that it was a matter of connecting all the dots. So I got serious about “start doing” data science and data engineering. Eventually I landed on “Kaggle”. I started to learn from peers, and it gave me a big boost. Whenever I started a new problem, I had to go back and forth through the previous projects that I did (which is perfectly okay :D). I started with “Exploratory Data Analysis”[1][2] and gradually learnt to build models and to do “Serious Stuff”. At this time, I was doing a cost-benefit analysis between “Python” and “R”, and I decided to stick with “Python”.

Python is one of the many ways to get started. I initially thought of mastering both Python and R; then, after reading some Reddit and Quora threads, I came to the conclusion that, for the moment, I would stick with Python. The primary reason is that Python is more the engineer’s choice, whereas R has been the choice of statisticians.

Slow & Steady [Image[2] (Image courtesy:http://cathjayasuriya.blogspot.sg/2011/02/learning-from-ants.html)]

At the beginning, it was a slow but steady journey. To do “Serious Stuff” I knew that I had to pull my finger out and “work hard”. “Work hard” means broadening my understanding of the topics in “data science” (I always try to strike a balance between “data science” and “data engineering”, as I personally prefer to work across the full spectrum).

That is the point where I realized the importance of deepening my understanding of Neural Networks. The quest began again. I came across a few leading online courses: Neural Networks for Machine Learning by Professor Geoffrey Hinton, the Deep Learning Specialization by Professor Andrew Ng, and the Artificial Intelligence: Deep Learning Nanodegree by Udacity. These are undoubtedly among the finest resources to master and excel in Neural Networks. You can learn all these materials for free (you only need to pay if you want to earn a certificate). Despite these resources being the best and the most #trending approaches that many follow, I still kept on searching for the right answer to my quest. Then I came across the “Udemy” platform. After a thorough review of Udemy courses I picked “Deep Learning A-Z™: Hands-On Artificial Neural Networks”, as this course jumps straight to the implementations without going into deeper mathematics. This course cost $10 and I didn’t mind spending that amount to learn material that gave me a good boost. If you want to explore the deeper concepts and mathematics behind Neural Networks, you need to read additional materials (the instructors of this course provide links to them). My quest for the right answer, the right platform to learn Neural Networks, was still on. In those days, I spent a great deal of time comparing users’ perspectives on Reddit and Quora.

It happened :D

I can’t recall exactly how it happened, but I firmly believe that the learning platform I found after this mighty quest is the right answer to my needs.

fast.ai

Coming from a Computer Science background and having been in industry and computer science research for over two years, I am convinced that, for me to learn ‘Neural Networks’, it has to be a different approach: not learning the theory first, then implementing, and then competing on Kaggle, which is the conventional approach. “fast.ai” offers a practical way of mastering Deep Learning by going straight into coding and implementing solutions to real Kaggle competitions. Later, it slowly builds a solid understanding of the underlying concepts of ‘Neural Networks’, how these concepts can be applied to real-world scenarios, and their limitations. What really fascinates me about this course is that the instructors share their own experiences, and these experiences help me in connecting the dots that I mentioned earlier.

Let me briefly introduce the instructors of this course.

The founders of “fast.ai” are “Jeremy Howard”, a Research Scientist at the University of San Francisco (and a voracious Kaggler (#1)), and “Rachel Thomas”, who was selected by Forbes as one of “20 Incredible Women Advancing AI Research”. I encourage you to take a look at “fast.ai/about” to get more information about these two incredible personalities. These two beautiful souls made this course totally free for everyone ❤

Jeremy Howard and Rachel Thomas [Image[3] (Image courtesy: http://www.fast.ai/)]

Let me walk you through the courses offered via “fast.ai”.

Practical Deep Learning for Coders, Part 1

This is the right starting point for anyone with a Computer Science (or related) background to commence the journey into Neural Networks. You’ll do real Kaggle competitions while following this beginner course. The development environment is either Amazon Web Services (AWS) or Paperspace. By the end of the 7 videos you’ll have a thorough understanding of “how to approach a research question using a deep learning approach”. Investing 6–10 hours a week to understand the content is an early investment in your future career as a Deep Learning practitioner. Each video is at most 2 hours long, and it’s good to re-watch the same content again and again until you get the fundamentals right. It pays HIGHER RETURNS. I’m still following this course, and I will give a brief retrospective on what I have learnt so far.

Cutting Edge Deep Learning For Coders, Part 2

This is the continuation of Part 1, “Practical Deep Learning for Coders, Part 1”. I haven’t started this course yet, so let me share what Part 2 promises to deliver.

“Cutting Edge Deep Learning For Coders, Part 2, where you’ll learn the latest developments in deep learning, how to read and implement new academic papers, and how to solve challenging end-to-end problems such as natural language translation” (extracted from http://course.fast.ai/lessons/lessons.html)

Pretty exciting, right? :D I’m eagerly looking forward to “start doing” this part as soon as I complete Part 1.

Perhaps you might think that “fast.ai” is only about Deep Learning. Keep calm!!! Both Jeremy and Rachel started “fast.ai” with the vision of making Deep Learning accessible to everyone. While engaging in the “fast.ai forums” some time back, I noticed that these two beautiful souls have also released a course on machine learning.

Another treat! Early access to Intro To Machine Learning videos

This is the third course offered by “fast.ai”. I wish I had found it at the very early stages of my machine learning career. In this course Jeremy and Rachel discuss how you can master applying the concepts of machine learning to real-world problems through Kaggle competitions. I started following this material to further deepen my understanding of machine learning and apply it to real-world problems.

Okay, now let me share some of my experiences from following “fast.ai”. I’m not going in depth. Now that you know the value of the “fast.ai” platform, I encourage you to go and experience it yourself. In a nutshell, I’m going to reflect on what I have learnt so far through this platform (how far I have come from where I started :P).

Dogs vs Cats

Dogs vs Cats[Image[4] (Image courtesy: https://pixabay.com/)]

“Dogs vs Cats” is a competition hosted on Kaggle, and it is the very first task that “Practical Deep Learning for Coders, Part 1” solves using a state-of-the-art approach. This competition is about classifying images into two classes, “cats” and “dogs”. We are given a training data set whose images are labelled accordingly. Our task is to develop a classification model using this training data set. Finally, the developed model needs to correctly classify the test data set, which is unlabelled. Okay, this sounds simple. But when you “start doing” the task, you’ll come across a vast set of new things that you have never come across before.

Following are the things that I learnt after completing this competition:

  • Setting up AWS for Deep Learning: this may sound strange to you at first, but I reckon it would not sound strange to a total beginner, as I was at that time. To train and develop deep learning models you need high computational power. I had tried out several small-scale deep learning tasks before, but I never needed to move to cloud infrastructure. Setting up AWS for deep learning gave me great hands-on experience with AWS infrastructure. I learnt about the “t2.xlarge” and “p2.xlarge” instances, setting up roles in AWS IAM, creating AWS users, applying policies to users, EC2 instances, S3 buckets, securing the root account with an Authenticator app at login, VPC, DynamoDB and many more ++ (the latter four I learnt by simply playing around with the AWS management console).
  • Configuring AWS on my local machine and using command-line aliases to start an instance, stop it, and get its instance ID.
  • Using tmux to multitask (I hadn’t used tmux before :( :P)
  • VGG16 model and ImageNet competition: the VGG16 model is a model pre-trained on a wide variety of labelled images. The main characteristic of the images used to train VGG16 is that the object of interest is clearly visible in the image. What I mean by that is, in a training image, no other object is visible except for the one single object being labelled. Let me be clearer. On the left-hand side of the image below, we can clearly see the dog, whereas on the right-hand side we can see two dogs, but those dogs are not the center of attention. The VGG16 model was trained using images similar to the left-hand one. You can refer to ImageNet[3][4] for further details.
Dogs vs Dog image with other objects[Image[5] (Image courtesy: Google Images)]
  • Applying the same set of techniques to other image classification tasks: you simply need to construct the same folder structure.
  • Understanding and applying the VGG16 model
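
To make the folder-structure point concrete, here is a minimal Python sketch. It assumes the images are named like `cat.0.jpg` and `dog.0.jpg` (as in the Kaggle download) and that the model-training code expects a `train/` and a `valid/` folder with one subfolder per class. The function name `build_class_folders`, the folder names, and the 20% validation split are illustrative assumptions, not something the course prescribes.

```python
import random
import shutil
import tempfile
from pathlib import Path


def build_class_folders(src_dir, dest_dir, valid_fraction=0.2, seed=0):
    """Copy files named '<label>.<id>.jpg' into dest/{train,valid}/<label>/."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    files = sorted(src_dir.glob("*.jpg"))
    random.Random(seed).shuffle(files)  # reproducible shuffle before splitting
    n_valid = int(len(files) * valid_fraction)
    counts = {"train": 0, "valid": 0}
    for i, path in enumerate(files):
        split = "valid" if i < n_valid else "train"
        label = path.name.split(".")[0]  # 'cat.12.jpg' -> 'cat'
        target = dest_dir / split / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, target / path.name)
        counts[split] += 1
    return counts


def demo():
    """Run the helper on empty placeholder files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw"
        raw.mkdir()
        for label in ("cat", "dog"):
            for i in range(5):
                (raw / f"{label}.{i}.jpg").touch()
        data = Path(tmp) / "data"
        counts = build_class_folders(raw, data)
        folders = sorted(p.relative_to(data).as_posix() for p in data.glob("*/*"))
        return counts, folders
```

Calling `demo()` builds the layout from ten empty placeholder files. For a different image classification task, only the label part of the file names changes; the folder layout stays the same.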

I think this is enough for you to understand why this course is “the BEST” out there. This course and its Part 2 have now evolved to a much more sophisticated level. In the latest series, Jeremy Howard uses the “fastai” library as the base library for all the projects. This time (as of 2018), the cloud infrastructure is from Paperspace.

Now let me walk you through some key learning outcomes I gained after completing the first task from “Introduction to Machine Learning” video series.

Blue Book for Bulldozers

Blue Book for Bulldozers [Image[6] (Image courtesy: https://pixabay.com/)]

This is the first Kaggle project that Jeremy discusses in the “Introduction to Machine Learning” series. The goal of this project is to predict the “sale price”, which is a continuous dependent variable. The data for this project was collected over many years. The algorithm used in this task is “Random Forest”. Jeremy jumps straight to the point of how “Random Forest” can be used to develop a successful predictive model. While building the model and after it is constructed, he goes deeper and covers both breadth and depth.

I encourage you to follow these awesome FREE materials. You’ll find them immensely helpful. Be a part of the highly engaging community forum. I guarantee that you’ll learn a lot of extraordinary content from your peer learners. My ultimate goal with this post is to let you know about this awesome platform, “fast.ai”. That’s why I took only two scenarios to illustrate the worth of this platform. I firmly believe I have succeeded in delivering that message to you effectively.

Following are the key points that I would like to highlight after completing this first task:

  • First, I learnt many things about Jupyter Notebook: mainly, how to look through function implementations and documentation from within the notebook itself, shortcut keys, debugging ++
  • The value of the “date” feature: a “date” may be in ‘year-month-date’ format or ‘year-month-date-hour-minute-second’ format. In any time-series problem it is vital to scrutinize the ‘date’ parameter. From a ‘date’ we can derive ‘week of year’, ‘day of week’ and many other features. These features help us get a broader understanding of our data. We all know that feature engineering is an essential part of machine learning. If you follow this video on the “fast.ai” platform you’ll be amazed and convinced by the feature engineering.
  • The process and importance of transforming string variables into categorical variables, and how Python libraries internally assign numeric values to the categories once this conversion is done.
  • Dealing with missing values (numeric and categorical)
  • Saving data in the same format in which it is held in RAM: this makes reloading the project data fast
  • Methods to improve the Random Forest Regressor
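
Three of these points (date features, categorical codes, and missing values) can be sketched in a few lines of plain Python. This is only an illustration using the standard library, not the code from the course: the function names are hypothetical, the sorted-order codes with -1 for a missing value mirror the convention pandas uses for categorical codes, and filling with the median plus a "was missing" flag is one common approach rather than the only one.

```python
from datetime import datetime
from statistics import median


def date_parts(text, fmt="%Y-%m-%d"):
    """Expand one date string into several numeric features."""
    d = datetime.strptime(text, fmt)
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),          # Monday = 0
        "week_of_year": d.isocalendar()[1],  # ISO week number
        "day_of_year": d.timetuple().tm_yday,
    }


def category_codes(values):
    """Map strings to integer codes by sorted category order; None -> -1."""
    categories = sorted({v for v in values if v is not None})
    lookup = {c: i for i, c in enumerate(categories)}
    return categories, [lookup.get(v, -1) for v in values]


def fill_missing(values):
    """Replace None with the median and return a was-missing flag per value."""
    fill = median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags
```

For example, `date_parts("2018-02-05")` reports a Monday (`day_of_week` 0) in ISO week 6, and `category_codes(["High", "Low", None, "Medium", "Low"])` yields the codes `[0, 1, -1, 2, 1]`. The flag returned by `fill_missing` lets a model see that a value was originally absent even after the gap has been filled.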

In a nutshell, that’s what I learnt after completing the first task.

With that I would like to conclude this post by offering my heartfelt thanks to you. Let me know your thoughts and comments about this post. I appreciate your feedback. If you find this post fruitful, do share it among your friends on Facebook, Twitter, Google+ and other social media platforms. Feel free to give me one clap, two claps, three claps or maybe a big round of applause to motivate me. It would definitely help me keep up the momentum.

Like I mentioned in the title of this post:

The BEST things in life are always FREE!!!

[1] https://www.kaggle.com/omarelgabry/a-journey-through-titanic
[2] https://www.kaggle.com/mrisdal/exploring-survival-on-the-titanic
[3] http://www.image-net.org/
[4] http://image-net.org/explore


I am a Data Systems' scientist (PhD), and I work as a senior Database Engineer at HPE. I aspire to bring state-of-the-art to mainstream through innovation.