The Architect of Artificial Intelligence — Deep Learning

Saransh Mehta
Towards Data Science
6 min read · Oct 2, 2018


Artificial Intelligence has been one of the most remarkable advancements of the decade. People are rushing from explicit software development to building AI-based models, and businesses now rely on data-driven decisions rather than on someone manually defining rules. Everything is turning into AI, from chat-bots to self-driving cars, speech recognition to language translation, robotics to medicine. AI is not a new thing to researchers, though; it has been around since even before the '90s. So what is making it so trending and open to the world?

I’ve been working with Artificial Intelligence and Data Science for almost 2 years now and have worked on a lot of so-called state-of-the-art AI systems: generative chat-bots, speech recognition, language translation, text classification, object recognition, age and expression recognition, etc. So, after spending 2 years in AI, I believe there’s just one major technology (or whatever you call it) behind this AI boom: Deep Learning.

This being my introductory blog, I won’t dive into the technical details of Deep Learning and Neural Nets (I will talk about my work in upcoming blogs), but rather share why I think Deep Learning is taking over other traditional methods. If you are not into Deep Learning and AI stuff, let me explain it in simple, non-techie words. Imagine you have to build a method to classify emails into categories like social, promotional or spam, one of the prime AI tasks that Google does for your Gmail inbox! What would you do to achieve this? Maybe you could make a list of words to look for in emails, like ‘advertisement’, ‘subscribe’, ‘newsletter’ etc., then write a simple string-matching regex to look for these words and classify an email as promotional or spam if they are found. But the problem is: how many keywords can you catch this way, and how many rules can you manually write? The content on the internet keeps multiplying, and each day new keywords hop in. Thus, this keyword-based approach won’t land you good results.
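To make this concrete, here is a minimal sketch of the hand-written-rules approach described above. The keyword list and labels are made up for illustration; a real spam filter would need far more than this, which is exactly the point.

```python
import re

# Hypothetical, hand-picked keyword list -- you can never make this complete.
SPAM_KEYWORDS = ["advertisement", "subscribe", "newsletter", "unsubscribe"]
PATTERN = re.compile("|".join(SPAM_KEYWORDS), re.IGNORECASE)

def classify(email_text):
    """Label an email 'promotional/spam' if any hand-picked keyword appears."""
    return "promotional/spam" if PATTERN.search(email_text) else "primary"

print(classify("Subscribe to our newsletter for weekly deals!"))  # promotional/spam
print(classify("Hey, are we still on for lunch tomorrow?"))       # primary
```

Every new spam trick means another keyword added by hand, which is why this approach does not scale.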

Now give this a closer thought: you have a computer that can do keyword matching a million times faster than you. So rather than using your computationally powerful device just for simple string matching, why not let the computer decide the rules for classification too! What I mean is, a computer can go through thousands of data points and come up with more precise rules for the task in the time you could think of just 5 such rules.

This is what Deep Learning is all about! Instead of you explicitly designing rules and conditions that you think will solve the problem (simple if-else checks, dictionaries of keywords, etc.), Deep Learning gives the computer the capability to produce the rules it can use to solve the problem. This means it’s an end-to-end architecture: you feed the data into the network and tell it the desired output for each data point. The network then goes through the data and updates its rules accordingly, landing on a set of optimized rules.
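A toy sketch of this idea, using a plain perceptron rather than a deep network (the emails, labels and vocabulary are all invented): instead of us writing the keyword rules, the model learns a weight per word from labelled examples.

```python
# 1 = promotional/spam, 0 = primary; tiny made-up training set.
emails = [
    ("subscribe to our newsletter today", 1),
    ("huge advertisement sale unsubscribe", 1),
    ("lunch tomorrow with the team", 0),
    ("notes from the weekly meeting", 0),
]

vocab = sorted({w for text, _ in emails for w in text.split()})
weights = {w: 0.0 for w in vocab}   # one learned "rule" per word
bias = 0.0

def predict(text):
    score = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if score > 0 else 0

# Perceptron updates: nudge the weights of words in misclassified emails.
for _ in range(10):
    for text, label in emails:
        error = label - predict(text)
        if error != 0:
            bias += error
            for w in text.split():
                weights[w] += error

print(predict("subscribe for a huge sale"))  # 1 -- flagged as spam-like
```

Nobody told the model that ‘subscribe’ or ‘sale’ matter; it worked that out from the data, which is the shift in mindset this post is describing.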

This decision-making ability is generally limited to us humans, right?
This is where Artificial Neural Networks (or simply neural nets) kick in. These are sets of nodes arranged in layers and connected through weights (which are nothing but number matrices), in a similar way to how neurons are connected in our brain. Again, I won’t go into the technical details of the architecture, the learning algorithms and the mathematics behind them, but this is how Deep Learning mimics the brain’s learning process.

Let’s take another example: suppose you have to recognize a human face in an image, and the face could be located anywhere in the image. How would you proceed?
One obvious way is to define a set of key-points all over the human face which together characterize it. Generally these come in sets of 128 or 68. When interconnected, these points form an image mask. But what if the orientation of the face changes from frontal view to side view? The geometry that helped these points identify a face changes, and thus the key-point method won’t detect the face.

68 key points of human face, Image taken from www.pyimagesearch.com

Deep Learning makes this possible too! The key-points we used were based on a human’s perception of facial features (like the nose, ears and eyes). Hence, to detect a face, we try to make the computer find these features together in an image. But guess what: these manually selected features are not so pronounced to computers. Deep Learning instead makes the computer go through a lot of faces (containing all sorts of distortions and orientations) and lets the computer decide which feature maps seem relevant to it for face detection. After all, the computer has to recognize the face, not you! And this gives surprisingly good results. You can go through one of my projects here, where I used ConvNets (a deep learning architecture) to recognize facial expressions.

Having to collect a large dataset of faces just to recognize a face may strike you as a problem. But one-shot learning methods such as the Siamese Network have solved this problem too. One such approach is based on a special loss function called the triplet loss, introduced in the FaceNet paper. I won’t discuss it here; if you wish to know about it, you can go through the paper here.
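For the curious, the triplet loss from the FaceNet paper is small enough to sketch in a few lines. The embeddings below are made-up 2-D vectors standing in for the face embeddings a real network would produce.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull the anchor toward the positive
    embedding and push it away from the negative one by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to same identity
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to different identity
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 1.0])   # anchor face
p = np.array([0.1, 0.9])   # same person, nearby embedding
n = np.array([1.0, 0.0])   # different person, far away
print(triplet_loss(a, p, n))  # 0.0 -- the margin constraint is already met
```

Because the loss compares pairs of distances rather than class labels, the trained network can match faces it never saw during training, which is what makes one-shot recognition possible.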

Siamese Network for Gender Detection, Image taken from www.semanticscholar.org

Another myth about Deep Learning is that it is a black box: that there is no feature engineering or mathematics behind the architecture, and so it simply replicates the data without actually providing a reliable, long-term solution to the problem.
No, it’s not like that! It involves mathematics and probability in much the same way traditional Machine Learning methods do, be it simple Linear Regression or Support Vector Machines. Deep Learning uses the same gradient descent update to look for optimized parameter values as Linear Regression does. The cost function, the hypothesis and the error calculation from the target value (the loss) are all done in a similar fashion to traditional algorithms, based on equations. Activation functions in deep nets are nothing but mathematical functions. Once you understand every mathematical aspect of Deep Learning, you can figure out how to build a model for a specific task and what changes need to be made. It’s just that the mathematics involved in Deep Learning turns out to be a little complex. But if you get the concepts right, it’s no longer a black box to you! In fact, this is true of every algorithm in the world.
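To see that it really is the same gradient descent, here is the update applied to plain Linear Regression on made-up data generated from y = 2x + 1. A deep net does exactly this, just with many more parameters.

```python
import numpy as np

# Toy data drawn from the line y = 2x + 1 (no noise, for clarity).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b, lr = 0.0, 0.0, 0.05   # parameters and learning rate

for _ in range(2000):
    pred = w * x + b            # the hypothesis
    err = pred - y              # error from the target (loss = mean squared error)
    w -= lr * 2 * np.mean(err * x)   # gradient of the loss w.r.t. w
    b -= lr * 2 * np.mean(err)       # gradient of the loss w.r.t. b

print(round(w, 2), round(b, 2))  # 2.0 1.0
```

Swap the straight-line hypothesis for a stack of Wx+b layers with activations, and the loop above is, conceptually, how a deep network trains.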

As far as I’ve learnt, I’ve made my way through all the mathematics behind it, beginning right from the simple perceptron, the standard Wx+b equation of a neuron and back-propagation, up to modern architectures such as CNNs, LSTMs, Encoder-Decoders, Sequence2Sequence, etc.

The purpose of this blog was to create more acceptance for Deep Learning in the field of Machine Learning and Artificial Intelligence. That’s why I didn’t talk about Deep Learning architectures, code or TensorFlow. Companies basing their business on AI need to support Deep Learning along with traditional Machine Learning methods. In my upcoming blogs, I will talk about some cool projects I did, maybe the Generative Chat-Bot or Neural Machine Translation. If you are into Artificial Intelligence too, do let me know your opinions on the blog!


Working with Artificial Intelligence/Deep Learning/Machine Learning/Data Science/Computer Vision/NLP