A “weird” introduction to Deep Learning

There are amazing introductions, courses and blog posts on Deep Learning. But this is a different kind of introduction. Spanish version here.

Favio Vázquez
Towards Data Science

--

There are amazing introductions, courses and blog posts on Deep Learning. I will name some of them in the resources section, but this is a different kind of introduction.

But why weird? Maybe because it won’t follow the “normal” structure of a Deep Learning post, where you start with the math, then go into the papers and the implementation, and then move on to applications.

It will be closer to the post I wrote before, “My journey into Deep Learning”. I think telling a story can be much more helpful than just throwing information and formulas around. So let’s begin.

NOTE: There’s a companion webinar to this article. Find it here:

Why am I making this introduction?

Sometimes it’s important to have a written record of your thoughts. I tend to talk a lot and to take part in several presentations and conferences, and this is my way of contributing a little knowledge to everyone.

Deep Learning (DL) is such an important field for Data Science, AI, technology and our lives right now, and it deserves all of the attention it is getting. Please don’t say that deep learning is just adding a layer to a neural net, and that’s it, magic! Nope. I’m hoping that after reading this you’ll have a different perspective on what DL is.

Deep Learning Timeline

I created this timeline based on several papers and other timelines with the purpose of showing everyone that Deep Learning is much more than just Neural Networks. There have been real theoretical advances, and software and hardware improvements, that were necessary for us to get to this day. If you want it, just ping me and I’ll send it to you. (Find my contact at the end of the article.)

What is weird about Deep Learning?

Deep Learning has been around for quite a while now. So why did it become so relevant, so fast, in the last 5–7 years?

As I said before, until the late 2000s we were still missing a reliable way to train very deep neural networks. Nowadays, with the development of several simple but important theoretical and algorithmic improvements, the advances in hardware (mostly GPUs, now also TPUs), and the exponential generation and accumulation of data, DL came naturally to fill this missing spot and transform the way we do machine learning.

Deep Learning is an active field of research too; nothing is settled or closed. We are still searching for the best models, network topologies, the best ways to optimize their hyperparameters and more. It is very hard, as in any other active field of science, to keep up to date with the research, but it’s not impossible.

A side note on topology and machine learning (Deep Learning with Topological Signatures by Hofer et al.):

Methods from algebraic topology have only recently emerged in the machine learning community, most prominently under the term topological data analysis (TDA). Since TDA enables us to infer relevant topological and geometrical information from data, it can offer a novel and potentially beneficial perspective on various machine learning problems.

Luckily for us, there are lots of people helping us understand and digest all of this information through courses like Andrew Ng’s, blog posts and much more.

This, for me, is weird or uncommon, because normally you have to wait some time (sometimes years) to be able to digest the difficult and advanced information in papers or research journals. Of course, most areas of science are now also really fast at going from a paper to a blog post that tells you what you need to know, but in my opinion DL has a different feel.

Breakthroughs of Deep Learning and Representation Learning

We are working with something very exciting. Most people in the field are saying that the latest ideas in deep learning papers (specifically new topologies and configurations for neural networks, or algorithms to improve their usage) are the best ideas in Machine Learning in decades (remember that DL sits inside ML).

I’ve used the word learning a lot in this article so far. But what is learning?

In the context of Machine Learning, the word “learning” describes an automatic search process for better representations of the data you are analyzing and studying (please keep this in mind: it is not about making a computer “learn”).

This is a very important word for this field, REP-RE-SEN-TA-TION. Don’t forget about it. What is a representation? It’s a way to look at data.

Let me give you an example. Let’s say I ask you to draw a line that separates the blue circles from the green triangles in this plot:

Ian Goodfellow et al. (Deep Learning, 2016)

This example comes from the book Deep Learning by Ian Goodfellow et al. (2016).

So, if you want to use a line, this is what the authors say:

“… we represent some data using Cartesian coordinates, and the task is impossible.”

This is impossible if we remember the concept of a line:

A line is a straight one-dimensional figure having no thickness and extending infinitely in both directions. From Wolfram MathWorld.

So is the case lost? Actually, no. If we find a way of representing this data differently, we can draw a straight line to separate the two types of data. This is something that math taught us hundreds of years ago: in this case, what we need is a coordinate transformation, so we can plot or represent this data in a way that lets us draw the line. If we apply the polar coordinate transformation, we have the solution:

Ian Goodfellow et al. (Deep Learning, 2016)

And that’s it, now we can draw a line:

So, in this simple example we found and chose the transformation that gives a better representation by hand. But if we create a system, a program, that can search for different representations (in this case a coordinate change), and then find a way of calculating the percentage of categories classified correctly with this new approach, at that moment we are doing Machine Learning.
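To make this concrete, here is a minimal toy sketch in Python (my own example, not taken from Goodfellow’s book): two classes arranged in concentric rings cannot be separated by a straight line in Cartesian coordinates, but after switching to a polar representation a single threshold on the radius separates them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two classes arranged in concentric rings ("blue circles" inside,
# "green triangles" outside): no straight line in (x, y) separates them.
theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
radius = np.concatenate([rng.normal(1.0, 0.1, n),    # inner ring, class 0
                         rng.normal(3.0, 0.1, n)])   # outer ring, class 1
labels = np.concatenate([np.zeros(n), np.ones(n)])

x = radius * np.cos(theta)
y = radius * np.sin(theta)

# Change of representation: Cartesian (x, y) -> polar (r, theta).
r = np.sqrt(x ** 2 + y ** 2)

# In the new representation, a single vertical line (a threshold on r)
# separates the two classes perfectly.
predictions = (r > 2.0).astype(int)
print("Accuracy with a straight line in polar coordinates:",
      (predictions == labels).mean())
```

A learning system would search for this change of representation automatically instead of having it handed over by us.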

This is something very important to keep in mind: deep learning is representation learning, using different kinds of neural networks and optimizing the hyperparameters of the net to get (learn) the best representation of our data.

This wouldn’t be possible without the amazing breakthroughs that led us to the current state of Deep Learning. Here I name some of them:

1. Idea: Back Propagation.

Learning representations by back-propagating errors by David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams.

A theoretical framework for Back-Propagation by Yann LeCun.

2. Idea: Better initialization of the parameters of the nets. Something to remember: The initialization strategy should be selected according to the activation function used (next).

3. Idea: Better activation functions. This means better ways of approximating the functions, which leads to faster training.

4. Idea: Dropout. Better ways of preventing overfitting and more.

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, a great paper by Srivastava, Hinton and others.

5. Idea: Convolutional Neural Nets (CNNs).

Gradient-based learning applied to document recognition by LeCun and others.

ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky and others.

6. Idea: Residual Nets (ResNets).

7. Idea: Region Based CNNs. Used for object detection and more.

8. Idea: Recurrent Neural Networks (RNNs) and LSTMs.

BTW: It was shown by Liao and Poggio (2016) that ResNets == RNNs, arXiv:1604.03640v1.

9. Idea: Generative Adversarial Networks (GANs).

10. Idea: Capsule Networks.

And there are many others, but I think these are really important theoretical and algorithmic breakthroughs that are changing the world and that gave momentum to the DL revolution. A small sketch below shows how several of them fit together in code.
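Here is a minimal, hedged Keras sketch combining ideas 2–5: He initialization matched to ReLU activations, dropout against overfitting, and convolutional layers, all trained with back-propagated gradients (idea 1). The dataset (MNIST), layer sizes and hyperparameters are illustrative choices of mine, not recommendations from the papers above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# MNIST as a stand-in dataset; scale pixels to [0, 1] and add a channel axis.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    # Idea 5: convolutional layers reuse the same filters across the image.
    layers.Conv2D(32, 3, activation="relu",        # Idea 3: ReLU activation
                  kernel_initializer="he_normal",  # Idea 2: init matched to ReLU
                  input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu",
                  kernel_initializer="he_normal"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                           # Idea 4: dropout against overfitting
    layers.Dense(10, activation="softmax"),
])

# Idea 1: fit() adjusts the weights with back-propagated gradients.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128,
          validation_data=(x_test, y_test))
```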

How to get started with Deep Learning?

It’s not easy to get started, but I’ll try my best to guide you through the process. Check out these resources, but remember, this is not only about watching videos and reading papers; it’s about understanding, programming, coding, failing and then making it happen.

-1. Learn Python and R ;)

0. Andrew Ng and Coursera (you know, he doesn’t need an intro):

1. Siraj Raval: He’s amazing. He has the power to explain hard concepts in a fun and easy way. Follow him on his YouTube channel, specifically these playlists:

— The Math of Intelligence:

— Intro to Deep Learning:

2. François Chollet’s book: Deep Learning with Python (and R):

3. IBM Cognitive Class:

4. DataCamp:

Distributed Deep Learning

Deep Learning is one of the most important tools and theories a Data Scientist should learn. We are lucky to see amazing people creating research, software, tools and hardware specific to DL tasks.

DL is computationally expensive, and even though there have been advances in theory, software and hardware, we need the developments in Big Data and Distributed Machine Learning to improve performance and efficiency. Great people and companies are making amazing efforts to bring together the distributed frameworks (Spark) and DL libraries (TensorFlow and Keras).

Here’s an overview:

1. Databricks: Deep Learning Pipelines (will soon be merged into Spark)

2. Elephas: Distributed DL with Keras & PySpark (a short sketch follows this list):

3. Yahoo! Inc.: TensorFlowOnSpark:

4. CERN Distributed Keras (Keras + Spark):

5. Qubole (tutorial Keras + Spark):

6. Intel Corporation: BigDL (Distributed Deep Learning Library for Apache Spark)

7. TensorFlow and Spark on Google Cloud:
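To give a flavor of what these integrations look like, here is a rough sketch of distributed training with Elephas (item 2 above). The names SparkModel and to_simple_rdd follow the Elephas documentation, but signatures can change between versions, so treat this as an illustration rather than copy-paste code.

```python
import numpy as np
from pyspark import SparkConf, SparkContext
from tensorflow.keras import layers, models, utils
from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

# Local Spark context just for illustration; in practice this runs on a cluster.
sc = SparkContext(conf=SparkConf().setAppName("elephas_sketch").setMaster("local[4]"))

# A placeholder Keras model and synthetic data standing in for a real dataset.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,)),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

x_train = np.random.rand(1000, 784).astype("float32")
y_train = utils.to_categorical(np.random.randint(10, size=1000), 10)

# Elephas distributes the data as an RDD and trains on its partitions,
# merging the workers' weight updates back on the driver.
rdd = to_simple_rdd(sc, x_train, y_train)
spark_model = SparkModel(model, frequency="epoch", mode="asynchronous")
spark_model.fit(rdd, epochs=5, batch_size=32, validation_split=0.1)
```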

Getting stuff done with Deep Learning

As I’ve said before, one of the most important moments for this field was the creation and open-sourcing of TensorFlow.

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

The things you are seeing in the image above are tensor manipulations working with the Riemann Tensor in General Relativity.

Tensors, defined mathematically, are simply arrays of numbers, or functions, that transform according to certain rules under a change of coordinates.
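For example, the “certain rules” for a rank-2 contravariant tensor under a change of coordinates $x \to x'$ read (summing over the repeated indices $k$ and $l$):

$$T'^{\,ij} \;=\; \frac{\partial x'^{i}}{\partial x^{k}} \, \frac{\partial x'^{j}}{\partial x^{l}} \, T^{kl}$$

You will never need this to train a network; it just makes the definition above concrete.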

But in the scope of Machine Learning and Deep Learning a tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.
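As a small illustration of that definition, the snippet below builds a few tensors and combines them with an operation. In the data flow graph picture, the matmul operation is a node and the tensors are the data flowing along its edges. It uses TensorFlow’s eager-style API; older versions required building an explicit graph and running it in a session.

```python
import tensorflow as tf

# Tensors are n-dimensional arrays: a scalar, a vector and a matrix.
scalar = tf.constant(3.0)                       # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])              # rank 2, shape (2, 2)

print(scalar.shape, vector.shape, matrix.shape)

# An operation (a node in the graph) consumes tensors and produces a tensor.
column = tf.reshape(vector[:2], (2, 1))         # shape (2, 1)
product = tf.matmul(matrix, column)             # shape (2, 1)
print(product)
```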

We use tensors heavily all the time in DL, but you don’t need to be an expert in them to use them. You may want to understand a little bit about them, so here I list some good resources:

After you check those out, along with the breakthroughs I mentioned before and the programming frameworks like TensorFlow or Keras (for more on Keras go here), I think you’ll have an idea of what you need to understand and work with Deep Learning.

But what have we achieved so far with DL? To name a few (from François Chollet’s book on DL):

  • Near-human level image classification.
  • Near-human level speech recognition.
  • Near-human level handwriting transcription.
  • Improved machine translation.
  • Improved text-to-speech conversion.
  • Digital assistants such as Google Now or Amazon Alexa.
  • Near-human level autonomous driving.
  • Improved ad targeting, as used by Google, Baidu, and Bing.
  • Improved search results on the web.
  • Answering natural language questions.
  • Superhuman Go playing.

And much more. Here’s a list of 30 great and funny applications of DL:

Thinking about the future of Deep Learning (for programming or building applications), I’ll repeat what I said in other posts.

I really think GUIs and AutoML are the near future of getting things done with Deep Learning. Don’t get me wrong, I love coding, but I think the amount of code we will be writing in the coming years will decrease.

We cannot spend so many hours worldwide programming the same stuff over and over again, so I think these two features (GUIs and AutoML) will help Data Scientists become more productive and solve more problems.

One of the best free platforms for doing these tasks with a simple GUI is Deep Cognition. Their simple drag & drop interface helps you design deep learning models with ease. Deep Learning Studio can automatically design a deep learning model for your custom dataset, thanks to their advanced AutoML feature, with nearly one click.

Here you can learn more about them:

Take a look at the prices :O, it’s freeeee :)

I mean, it’s amazing how fast development in this area is moving right now, so fast that we can have simple GUIs to interact with all the hard and interesting concepts I talked about in this post.

One of the things I like about that platform is that you can still code, interacting with TensorFlow, Keras, Caffe, MXNet and much more from the command line or their Notebook, without installing anything. You have both the notebook and the CLI!

I take my hat off to them and their contribution to society.

Other interesting applications of deep learning that you can try for free or at little cost (some of them are in private beta):

Thanks for reading this weird introduction to Deep Learning. I hope it helped you get started in this amazing area, or maybe just discover something new.

If you have questions just add me on LinkedIn and we’ll chat there:

--

Data scientist, physicist and computer engineer. I love sharing ideas and thoughts, and contributing to Open Source in Machine Learning and Deep Learning ;).