In the past two years, I have been exposed to the basics of statistics and the mathematical concepts that lay the foundation of the complex AI machinery we see in the form of Alexa, Jarvis, Siri, etc. Many of these concepts are much easier to visualize in real life: they become intuitively clear once we think about them in terms of what we experience in the world.
![Photo by Possessed Photography on Unsplash](https://towardsdatascience.com/wp-content/uploads/2021/08/1AcYdr0u4A4ACHmStKFd2ow.png)
The concept of Markov chains – "The future is conditionally independent of the past given the present." At any point in our life, we can only control our present, and our present reflects the entire past: the knowledge and experiences we have accumulated define our current condition. Our future progress depends solely on what we are today and on how much effort we put into the transition to a more desirable state.
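To make this concrete, here is a minimal sketch of a two-state Markov chain (the states and transition probabilities are made up for illustration). Notice that `next_state` looks only at the current state, never at the earlier history:

```python
import random

# Transition probabilities P(next state | current state) for a toy two-state chain.
# States: 0 = "less desirable", 1 = "more desirable" (illustrative labels only).
TRANSITIONS = {
    0: [0.7, 0.3],  # from state 0: stay with p=0.7, improve with p=0.3
    1: [0.2, 0.8],  # from state 1: regress with p=0.2, stay with p=0.8
}

def next_state(current, rng):
    """The next state depends only on `current`, not on any earlier history."""
    return 0 if rng.random() < TRANSITIONS[current][0] else 1

def simulate(start, steps, seed=0):
    """Run the chain for `steps` transitions and return the visited states."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain

path = simulate(start=0, steps=10)
```

The function signature itself encodes the Markov property: no matter how we reached the current state, the distribution over the next state is the same.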
The concept of MAP and MLE estimates – Maximum likelihood estimation (MLE) is a technique to parametrically estimate the distribution from which the observed data actually comes. Our perspective on something is entirely dependent on what we have witnessed in the past; based on these experiences, we map them to an internal representation or notion. e.g. We see rich and powerful people around us and form a notion of what they usually look like and what their characteristics are, based on what we see in day-to-day life.
The MAP estimate extends MLE by also taking into account prior knowledge about the distribution. Here we start with a prior notion or perspective about something, and depending on how strongly our brain is convinced by the latest findings we encounter in day-to-day life, our internal representation is updated. e.g. We see rich and powerful people in movies with luxurious houses, cars, and lifestyles, and we form a notion about them. We witness rich and powerful people in real life too, but our mind updates the notion not solely based on what we see in real life; it blends in the prior notion formed from what we have witnessed in movies as well.
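The difference is easy to see with a coin-flip example (the flip counts and the Beta prior here are made-up numbers): MLE is just the observed frequency, while MAP pulls the estimate toward a prior belief.

```python
def mle_estimate(heads, total):
    """MLE for a Bernoulli parameter: simply the observed frequency."""
    return heads / total

def map_estimate(heads, total, alpha, beta):
    """MAP under a Beta(alpha, beta) prior (the 'prior notion'):
    the mode of the posterior Beta(heads + alpha, total - heads + beta)."""
    return (heads + alpha - 1) / (total + alpha + beta - 2)

# 7 heads in 10 flips, with a prior belief that the coin is roughly fair.
mle = mle_estimate(7, 10)                 # 0.7 — trusts only the data
map_ = map_estimate(7, 10, 10, 10)        # 16/28 ≈ 0.571 — pulled toward 0.5
```

With a uniform prior (`alpha = beta = 1`) the MAP estimate collapses back to the MLE, which matches the intuition: no prior notion, so only the observed evidence matters.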
The concept of Federated Learning (FL) – With on-device learning gaining popularity thanks to higher compute availability on edge devices and the need to preserve data privacy, FL turns out to be a strong alternative for learning efficient and robust models on the edge. The very idea of learning collaboratively is promising and holds true in real life too, where the experiences of a single individual might not be sufficient and might be biased. The ability to incorporate the experiences of our friends and acquaintances gives us a better perspective on anything.
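The core aggregation step of FL can be sketched in a few lines. This is a toy, FedAvg-style weighted average of client model weights (the client weight vectors and data sizes are hypothetical):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average each parameter across clients,
    weighting every client by how much local data it contributed."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two toy clients with 2-parameter "models"; client 2 has 3x more data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_model = federated_average(clients, sizes)  # [2.5, 3.5]
```

Just as a friend with more relevant experience influences our opinion more, a client with more data pulls the global model further toward its local update.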
The idea of bias in sampling – The data that we see in real life is always biased: our conclusions depend entirely on the samples we have actually seen. Truly representative sampling is hard to guarantee, and what we see working for someone else may not work for us. In real life, what we witness and experience shapes a bias in our thoughts and perspectives.
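A quick simulation (with made-up income numbers) shows how a biased sample distorts an estimate while a uniform random sample does not:

```python
import random

rng = random.Random(42)

# Toy population: mostly modest incomes plus a few very large ones.
population = [rng.gauss(50, 10) for _ in range(10_000)] + [500.0] * 100
pop_mean = sum(population) / len(population)

# Biased sample: we only "see" the most visible (richest) people.
biased_sample = sorted(population)[-500:]
biased_mean = sum(biased_sample) / len(biased_sample)

# Uniform random sample of the same size.
random_sample = rng.sample(population, 500)
random_mean = sum(random_sample) / len(random_sample)
```

The biased mean lands far above the population mean, while the random sample stays close: same population, very different impressions, purely because of which samples we happened to observe.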
The concept of exploration and exploitation – As human beings we tend to take less risk, and we prefer something that has already been tried by someone else and turned out great. Thus, in real life too we tend to exploit earlier knowledge, and the chance that a person will explore something that seemed less significant in the past is very low.
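The standard way to formalize this trade-off is an epsilon-greedy policy. Here is a minimal sketch (the payoff estimates are made up): most of the time we exploit the best-known option, but with a small probability we explore a random one.

```python
import random

def epsilon_greedy_choice(estimated_values, epsilon, rng):
    """With probability `epsilon`, explore a random option;
    otherwise exploit the option with the best estimate so far."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_values))
    return max(range(len(estimated_values)), key=lambda i: estimated_values[i])

rng = random.Random(0)
values = [0.2, 0.8, 0.5]  # hypothetical running estimates of each option's payoff
choices = [epsilon_greedy_choice(values, epsilon=0.1, rng=rng) for _ in range(1000)]
```

With `epsilon = 0.1`, roughly 90% of the choices pick the best-known option (index 1), mirroring our human preference for the tried and tested, while the occasional exploration keeps the other options from being forgotten entirely.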
The concept of GANs and generative modeling – When we try to draw something like scenery or animals, we hold a parallel image of the true scenes or artworks we have seen in real life. GANs do the same thing, trying to learn a distribution close to the ideal one. And just as GANs suffer from the mode collapse problem, i.e. they end up learning only a particular peak of the data distribution, so do we in real life, ending up creating similar scenes most of the time.
The concept of transfer learning – This is very much applicable to real life: if we have experience with a particular task, we try to adapt it to a different task, incorporating or learning the changes needed to succeed. e.g. Learning to ride a scooter becomes easier if we know how to ride a bicycle, and learning to drive a car becomes easier if we know how to ride a scooter.
Some other concepts –
K-nearest neighbors and attention are some of the more common parallels one can draw from real life to understand intuitively why the complex AI machinery should work.
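K-nearest neighbors in particular is short enough to sketch in full: classify a new point by the majority vote of the most similar points we have already seen, much like judging a new situation by our closest past experiences (the points and labels below are made up).

```python
def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (squared Euclidean distance; a toy sketch, not a tuned implementation)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["near", "near", "near", "far", "far", "far"]
prediction = knn_predict(points, labels, query=(0.5, 0.5))  # "near"
```

There is no training phase at all: the "model" is just the remembered experience, which is exactly why the analogy to human intuition is so direct.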
Although these things look pretty intuitive in real life, proving them is no cakewalk in deep learning, and one needs a smart set of experiments. There are different knobs we control in deep learning (hyperparameters, the dataset, the model architecture, etc.) before we can actually make a claim about causality or correlation.