Introductory Guide to Artificial Intelligence

Egor Dezhic
Towards Data Science
11 min read · May 22, 2018


The main goal of this guide is to provide intuition about the theory, techniques and applications of AI for people who want to learn it. It consists of brief descriptions and links to explanatory articles and lectures. Each section contains basic materials to give you an idea of how things work. In the last section you can find additional books, courses, podcasts and other materials. You may also check out Playground to play with interactive examples. And now, let's begin!

Contents:

  1. Origins
  2. Prerequisite Mathematics
  3. No Free Lunch Theorem
  4. Methods and Algorithms
  5. Development
  6. Real-world Applications
  7. Additional Resources

Origins

Artificial Intelligence was born in attempts to imitate human intelligence. Take a look at its brief history. It's not yet clear whether it is possible to mimic the human mind, but we can definitely re-create some of its functions. Along the way, the whole field of Cognitive Science has received broad attention.

AI has also raised a lot of concerns in society. Fears of machines taking over the world are somewhat overblown, but job displacement is a real concern right now. Take a look at Artificial vs Natural Intelligence to get a better idea of the dimensions in which computers are superior to us.

In a sense, all computers and even calculators represent a kind of AI. As one of the founders of the field said:

“As soon as it works, no one calls it AI anymore.” — John McCarthy

As computers become faster and methods advance, AI is going to get smarter. Recent studies suggest that up to 50% of jobs are threatened by automation in the next 5–10 years. Regardless of whether we can fully simulate our minds or not, AI will have a significant impact on our lives.

Prerequisite Mathematics

In fact, math is not strictly required, but you'll need it to gain a deeper understanding of the methods. I recommend gaining at least basic intuition in each subsection before going further. I assume that you are already familiar with school algebra. If not, you may take a look at this guide or find free high-quality textbooks on OpenStax.

First of all, it is worth noting that classical logic, to which we are all accustomed, cannot represent most modern techniques. Therefore, you should understand the ideas of Fuzzy logic.

The second important topic is graphs. This gentle introduction will help you understand the main ideas.

Linear Algebra

Linear algebra extends the concepts and operations of common algebra to collections of numbers, which usually represent inputs, outputs and operation parameters. I recommend starting with this guide.

The next concept is the tensor: a generalization of scalars, vectors and matrices to higher-dimensional objects. This video will give you an intuitive explanation of the concept.
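
To make these objects concrete, here is a minimal sketch in NumPy (a library introduced later in the Development section); all shapes and values are arbitrary examples:

```python
import numpy as np

scalar = np.float64(3.5)            # 0-D: a single number
vector = np.array([1.0, 2.0, 3.0])  # 1-D: shape (3,)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # 2-D: shape (2, 2)
tensor = np.zeros((2, 3, 4))        # 3-D: shape (2, 3, 4)

# A matrix-vector product: the typical "parameters times inputs" operation
weights = np.array([[0.5, -1.0, 2.0],
                    [1.5, 0.0, -0.5]])  # shape (2, 3)
output = weights @ vector               # shape (2,)
print(output)  # [4.5  0. ]
```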

Probabilities

Since we usually do not have accurate information about anything, we have to deal with probabilities. This post explains the very basics, while this series will give you a more complete understanding of probability, statistics and Bayesian logic. I recommend going through at least the first three parts of the series.
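
To make the Bayesian idea concrete, here is a small sketch applying Bayes' rule; the disease-test numbers are entirely made up for illustration:

```python
# Bayes' rule with hypothetical numbers: a test for a rare condition
p_disease = 0.01            # prior: 1% of the population has the condition
p_pos_given_disease = 0.95  # test sensitivity
p_pos_given_healthy = 0.05  # false positive rate

# Total probability of a positive test result
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of the condition given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.161, far lower than intuition suggests
```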

Target Functions

Target functions are also called objective, cost or loss functions. They represent the main purpose of our methods: usually the target function measures how well our algorithm is doing its job. By optimizing this function we can then improve our algorithm.

The most important ones are Mean Squared Error and Cross-Entropy. You can find a description of MSE and Mean Absolute Error in this post, and a description of Cross-Entropy and Kullback-Leibler divergence in this one.

Sometimes we cannot compute the objective function directly and need to evaluate the performance of the algorithm in action, but those evaluations serve the same goal.
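
As a rough sketch of both losses in NumPy (the predictions and targets below are arbitrary toy values):

```python
import numpy as np

# Mean Squared Error for a regression target
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.3])
mse = np.mean((y_true - y_pred) ** 2)

# Cross-entropy between a one-hot target and predicted class probabilities
p_true = np.array([0.0, 1.0, 0.0])  # the true class is the second one
p_pred = np.array([0.2, 0.7, 0.1])  # the model's predicted distribution
cross_entropy = -np.sum(p_true * np.log(p_pred))

print(round(mse, 3), round(cross_entropy, 3))  # 0.047 0.357
```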

Optimization

After we have built the target function, we need to optimize its parameters to improve the performance of our algorithm. The most common approach to optimization is Gradient Descent. You can pick up the intuition here and a more detailed description here.

There are many variants of GD. One of them is Stochastic Gradient Descent, which uses only a subset of the training data to compute the loss and gradients at each iteration. Another important class of GD algorithms adds Momentum, which pushes the parameters to keep moving in a common direction.
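
A minimal sketch of plain gradient descent for a one-parameter model with an MSE loss; the data, learning rate and step count are arbitrary toy choices:

```python
import numpy as np

# Toy data generated from y = 3x; we want GD to recover the slope
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0             # initial parameter guess
learning_rate = 0.01

for step in range(200):
    y_pred = w * x
    # Gradient of MSE = mean((w*x - y)^2) with respect to w
    grad = np.mean(2 * (y_pred - y) * x)
    w -= learning_rate * grad  # move against the gradient

print(round(w, 3))  # ~3.0
```

Stochastic GD would compute `grad` on a random subset of the data at each step; Momentum would mix each new gradient with a running average of the previous ones.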

No Free Lunch Theorem

I've devoted a separate section to this theorem because it conveys a very important idea: there is no universally effective AI method. In short, the theorem states that every problem-solving procedure has some computational cost for each task, and none of these procedures is better on average than the others. While the theorem has not been proved in the most general case, practice has shown its significance.

Some methods may look particularly wonderful, but you still can’t eat them :)

So, we must choose the appropriate methods for each problem.

Common Methods and Algorithms

The main purpose of each method is to construct a good input-to-output model for a specific problem. Furthermore, combinations of methods lead to even better solutions: techniques like parallel search, stacking and boosting help construct better models from mixtures of simpler ones.

While search methods usually require only a problem specification, most Deep Learning algorithms need huge amounts of data, so the available data plays an important role in the choice of methods.

I will describe the principal classes of techniques in a roughly historical order of development.

Classic Programming

Although programming is no longer considered AI technology, it was many years ago. A single program may perform simple addition of its inputs, which may not look like an intellectual activity; yet the same kind of program may control a robot's movements and perform complex operations.

You heard me right. Even HTML pages are a kind of AI.

The interpretable and strict specifications of this approach allow combining hundreds or even thousands of different programs in one structure. However, the approach fails in most complex real-world scenarios: it is extremely hard to anticipate all possible input-output combinations in complex systems.

Rule-based and Expert Systems

A typical expert system consists of a knowledge base, organized as a set of if-then rules, and an inference mechanism. This tutorial will give you the general idea. Modern Knowledge Graphs are usually used for question answering and natural language processing in general; you may get an intuitive explanation of them here. While those methods are still used today, the popularity of expert systems is falling steadily.
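
A toy sketch of the rule-based idea: a knowledge base of if-then rules plus a naive forward-chaining inference loop (the facts and rules are made up for illustration):

```python
# Hypothetical knowledge base: each rule is (conditions, conclusion)
rules = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_swim"}, "is_water_bird"),
]

facts = {"has_feathers", "can_swim"}

# Forward chaining: keep applying rules until no new facts appear
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # now also contains 'is_bird' and 'is_water_bird'
```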

Search

In cases when you can define the space of possible solutions, search will help you find a good one. This demonstrative introduction will give you the intuition behind the common search algorithms and how they are applied in game development. In addition, this tutorial provides a more formal description. Despite their apparent simplicity, these methods can achieve excellent results in many domains when used properly.
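
As a bare-bones example, here is breadth-first search over a small invented graph; heuristic methods like A* build on the same skeleton:

```python
from collections import deque

# Hypothetical state space: node -> reachable neighbours
graph = {
    "start": ["a", "b"],
    "a": ["goal"],
    "b": ["a"],
    "goal": [],
}

def bfs(start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

print(bfs("start", "goal"))  # ['start', 'a', 'goal']
```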

Genetic Algorithms

Genetic or Evolutionary algorithms are a kind of search inspired by biological evolution. This post will help you understand the idea.
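
Here is a minimal evolutionary loop that maximizes a toy fitness function; the fitness, population size and mutation scale are all arbitrary choices:

```python
import random

def fitness(x):
    return -(x - 5.0) ** 2  # toy objective with its peak at x = 5

population = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(50):
    # Selection: keep the fittest half of the population
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    # Reproduction with mutation: each survivor spawns a noisy child
    children = [x + random.gauss(0, 0.5) for x in survivors]
    population = survivors + children

print(round(max(population, key=fitness), 2))  # close to 5.0
```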

Machine Learning

In general, ML methods also use a kind of search, usually Gradient Descent, to find a solution. In other words, they use training examples to learn/fit parameters. There are actually dozens of ML algorithms, but most of them rely on the same principles.

Regression, Support Vector Machines, Naive Bayes and Decision Trees are among the most popular and widely used.
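
For a taste of the usual fit/predict workflow, here is a decision tree in scikit-learn (one popular library choice) on invented data:

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training examples: [height_cm, weight_kg] -> size label
X = [[150, 50], [160, 60], [180, 80], [190, 95]]
y = ["small", "small", "large", "large"]

model = DecisionTreeClassifier()
model.fit(X, y)  # learn decision rules from the examples

print(model.predict([[175, 75]]))  # ['large']
```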

The remaining subsections describe the most prominent fields of Machine Learning.

Probabilistic Graphical Models

These models learn statistical dependencies between variables in the form of a graph. This post will give you the general idea of PGMs. Nowadays they are being actively replaced by Neural Networks in real-world applications, but they remain useful for the analysis of complex systems.

Deep Learning

In short, Deep Learning is the subset of ML methods that use many layers of representations. This post provides a general overview. One of the most beautiful properties of NNs is that you can stack different layers in any combination. A high-level description of the constituent layers is usually called the Architecture of the network.

The essential types of Neural Networks are feedforward (fully-connected), convolutional and recurrent networks.

Among more specialized blocks, the Neural Attention mechanism is showing great results in many applications. Other plug-and-play blocks like Long Short-Term Memory or this one for Relational Reasoning give a lot of flexibility in architecture design. In this way you can easily create a Recurrent Convolutional Network with Attention and other combinations.

Restricted Boltzmann Machines are a popular example of unsupervised learning networks.

Some techniques are designed to improve the Generalization of neural nets and other ML models, which in turn positively affects their accuracy. The most popular of these are Dropout and Batch Normalization.
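
In code, both are simply extra layers inserted into the architecture. A minimal sketch, assuming PyTorch as the framework (any modern DL library has equivalents):

```python
import torch
import torch.nn as nn

# A small fully-connected network with BatchNorm and Dropout layers
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),  # normalize activations across the batch
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero half the activations during training
    nn.Linear(64, 2),
)

x = torch.randn(8, 20)  # a batch of 8 examples with 20 features each
print(model(x).shape)   # torch.Size([8, 2])
```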

Another successful class of networks is Autoencoders. Their most famous application is Word2Vec. In addition, they are used to create representations for documents, knowledge graph entities, images, genes and many other things.

Generative Adversarial Networks are another interesting example; they can learn to produce convincing images, videos and any other type of data.

Many other types of NN are popular in literature but have relatively few real applications: Self-Organizing Maps, Boltzmann Machines, Spiking Neural Networks, Adaptive Resonance Networks and others.

Reinforcement Learning

The intuition behind RL was inspired by behavioral psychologists, who observed that animals learn how to behave from rewards. This led to the development of methods that search for a policy that maximizes rewards. This post contains a general overview of Reinforcement Learning.

A lot of RL methods have been developed over the years. Among the state-of-the-art techniques are Evolution Strategies, Deep Reinforcement Learning, Asynchronous Advantage Actor-Critic (A3C) and others.
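
To make the reward-driven idea concrete, here is the core of tabular Q-learning, a classic RL method (not one of the state-of-the-art techniques above), on a made-up two-state world:

```python
import random

# Hypothetical toy world: 2 states, 2 actions; action 1 in state 0 pays off
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Made-up environment dynamics: returns (reward, next_state)."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    return reward, (state + 1) % n_states

state = 0
for _ in range(5000):
    # Epsilon-greedy action selection: mostly exploit, sometimes explore
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    reward, next_state = step(state, action)
    # The Q-learning update rule
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # Q[0][1] ends up with the largest value in its row
```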

Development

Since most modern systems use more or less the same hardware (GPUs and TPUs), in this section I will focus on software development.

Basics

Python is probably the best programming language for beginners. It is quite universal for current AI methods and has many similarities with the first successful AI language, Lisp. Python has an intuitive syntax and a huge community with tons of packages, tutorials and training materials.

I recommend starting with these courses by the University of Toronto: Part 1 and Part 2. They cover topics from the very basics of programming to best practices in Python.

Data Science

Because AI methods are highly dependent on data, you need to be able to analyze and manipulate it.

This Data Analysis with Python and Pandas series will help you get a deeper understanding of datasets, while this Numerical Linear Algebra for Coders course will help you master important operations.

These Cheat Sheets contain descriptions of the frequently used functions from popular Python libraries. It's very convenient to have them at hand while coding.
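
A few of the everyday pandas operations, sketched on an invented DataFrame (in practice you would load real data, e.g. with pd.read_csv):

```python
import pandas as pd

# Invented dataset for illustration
df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "income": [28000, 52000, 61000, 49000],
})

print(df.describe())                  # summary statistics per column
print(df[df["age"] > 30])             # filtering rows by a condition
df["income_k"] = df["income"] / 1000  # deriving a new column
```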

Machine Learning

I strongly recommend starting with the Machine Learning course by Andrew Ng. It covers all the necessary math and basic methods: Linear and Logistic Regression, Support Vector Machines, Principal Components Analysis, simple Neural Networks and others. The only important topic missing from this course is decision trees. This Decision Trees tutorial and the more advanced Gradient Boosted Trees tutorial will fill the gap.

Now you can move deeper. This Practical Deep Learning for Coders course teaches how to use state-of-the-art DL techniques. In addition, this Deep Reinforcement Learning course from Berkeley will introduce you to modern RL methods.

There are also ways to automate the design of Machine Learning models. But to get good results, AutoML needs far more resources than manually constructed models, so it is not widespread yet. In addition, while working on an AI project you should consider possible safety issues.

Going through courses and tutorials is great, but to really understand the whole process you should take some real-world data and work with it. These resources will help you get started:

Open-Source Projects

Some interesting simple examples to learn from:

  1. Data Science IPython Notebooks — large collection of DS, ML, DL and other examples
  2. TensorFlow Examples — TF tutorial with examples for beginners
  3. Kaggle Kernels — thousands of open notebooks for Kaggle competitions

Datasets

You can use these open datasets to train your skills with different kinds of data:

  1. Kaggle Datasets — 700+ open datasets
  2. Awesome Public Datasets — 400+ open datasets
  3. DeepMind Open Datasets — unique datasets used in DeepMind’s research

Real-world Applications

This section is mainly intended to provide inspirational demonstrations for developers. You can check out how AI systems are changing the world right now and which directions will be particularly relevant in the near future: Medicine, Military, Education, Science, Physics, Economics and many others.

Additional Resources

Everything listed below is free of charge, unless otherwise indicated. In addition, most courses offer paid certificates, which usually cost around $50–100.

Google is always your best assistant. Also, Quora is an excellent place to find answers. For example, here people have suggested a lot of materials for the study of AI.

Book List

  1. Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Not free!) — the leading AI textbook
  2. Artificial Intelligence: Foundations of Computational Agents by David L. Poole and Alan K. Mackworth — a great AI textbook
  3. Deep Learning Book (in pdf, mobi and epub) by Ian Goodfellow, Yoshua Bengio and Aaron Courville — the best DL textbook
  4. AI Playbook by Andreessen Horowitz — a practice-oriented AI book
  5. Neural Networks and Deep Learning by Michael Nielsen — an NN-focused book
  6. Machine Learning Yearning by Andrew Ng — a book about how to build production-ready ML projects
  7. The Emotion Machine by Marvin Minsky (Not free!) — a great AI & CogSci book with an emphasis on theory by one of the fathers of the AI field

You can also find additional ebooks on Machine Learning in this repository.

Online Courses

As the first courses in AI I would recommend Machine Learning by Andrew Ng and Artificial Intelligence by Ansaf Salleb-Aouissi.

  1. Intro to Artificial Intelligence by Peter Norvig and Sebastian Thrun — fundamentals of AI
  2. Intro to Machine Learning by Katie Malone and Sebastian Thrun — introduction to ML algorithms in Python
  3. Deep Learning by Andrew Ng — specialization of 5 courses on neural networks and their applications in real projects
  4. Deep Learning by Vincent Vanhoucke and Arpan Chakraborty — introduction to DL algorithms in Python
  5. Neural Networks for Machine Learning by Geoffrey Hinton — comprehensive theory-oriented course on Neural Networks
  6. Practical Deep Learning for Coders by Jeremy Howard — practice-oriented DL course
  7. Cutting Edge Deep Learning For Coders by Jeremy Howard — 2nd part of this course on state-of-the-art in DL
  8. Convolutional Neural Networks for Visual Recognition by Andrej Karpathy — Stanford lectures (spring 2017) on Convolutional Networks
  9. Deep Natural Language Processing, Oxford 2017 — comprehensive course on Deep Learning in Natural Language Processing
  10. Deep Learning Summer School, Montreal (2016, 2017) — lectures on many Deep Learning topics
  11. Creative Applications of Deep Learning with TensorFlow by Parag Mital — course on applications of DL in art
  12. Deep Learning for Self-Driving Cars, MIT 2017 — introductory course to the practice of deep learning for self-driving cars

Blogs

  1. OpenAI Blog — focused on safety issues of AI
  2. DeepMind Blog — “Solve intelligence. Use it to make the world a better place.” is the motto of DeepMind
  3. Google Research Blog — latest news on research at Google
  4. Facebook Research Blog — latest news on research at Facebook
  5. Microsoft Next Blog — latest news on tech and research at Microsoft
  6. The BAIR Blog — Berkeley AI Research platform
  7. Intuition Machine — Deep Learning patterns, methodology and strategy
  8. LAB41 Blog — findings, experimental results, and thoughts on data analytics research
  9. Distill — the clearest Machine Learning research journal

Channels and Podcasts

  1. sentdex — all kinds of tutorials with Python
  2. DeepLearning.TV — simplified Deep Learning
  3. Siraj Raval — learning DL with fun
  4. Two Minute Papers — short overviews of the latest research
  5. The AI Podcast — podcast on different topics by NVIDIA
  6. This Week in Machine Learning & AI — interesting interviews every week
