
What is the probability that it will rain today? Before you answer, you might consider the season, the general weather conditions of the city, and the density of clouds in the sky. Taking these signs into account, you can make an educated guess. Bayes’ theorem follows a similar approach: the probability of an event is described in terms of prior knowledge or related conditions that are already given.
Bayes’ theorem builds upon probability and conditional probability, so it is best to review these topics first.

Probability simply means the likelihood that an event will occur and always takes a value between 0 and 1 (0 and 1 inclusive). The probability of event A is denoted as p(A) and is calculated as the number of desired outcomes divided by the number of all possible outcomes. For example, when you roll a die, the probability of getting a number less than three is 2 / 6. The number of desired outcomes is 2 (1 and 2); the number of total outcomes is 6.
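The die example can be checked directly in code. This is a minimal sketch; the `probability` helper is my own name, not a standard function:

```python
from fractions import Fraction

def probability(desired_outcomes, total_outcomes):
    """Probability = number of desired outcomes / number of all outcomes."""
    return Fraction(desired_outcomes, total_outcomes)

# Rolling a die: the outcomes less than three are 1 and 2.
p_less_than_three = probability(2, 6)
print(p_less_than_three)  # 1/3
```

Using `Fraction` keeps the result exact instead of a rounded decimal.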
Conditional probability is the likelihood of an event A occurring given that another event related to A has already occurred. Suppose that we have 6 blue balls and 4 yellow balls distributed between two boxes, A and B. I ask you to randomly pick a ball. The probability of getting a blue ball is 6 / 10 = 0.6.
What is the probability of picking a yellow ball given that the ball is taken from box A? This is a conditional probability and is denoted as p(Yellow | Box A).
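The original figure showing how the balls are split between the boxes is not reproduced here, so the counts below are my own assumption for illustration: say box A holds 3 blue and 2 yellow balls. Conditioning on box A just means restricting the sample space to that box:

```python
from fractions import Fraction

# Assumed contents of box A (the original figure is not shown):
box_a = {"blue": 3, "yellow": 2}

# p(Yellow | Box A): yellow balls in box A over all balls in box A.
p_yellow_given_a = Fraction(box_a["yellow"], sum(box_a.values()))
print(p_yellow_given_a)  # 2/5
```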

There is one more concept to learn before introducing Bayes’ theorem. A joint probability is the probability of two events occurring together and is denoted as p(A and B). For independent events, the joint probability can be written as:
p(A and B) = p(A).p(B) ……… (1)
Let’s say I roll a die and flip a coin. The probability of getting 1 and heads is:
(1 / 6).(1/2) = 1/12 ≈ 0.083
For this calculation to be correct, the events must be independent. The outcome of flipping a coin has no effect on the outcome of rolling a die, so these events are independent. Let’s also give an example of dependent events. Suppose I pick one card from a deck and then pick a second card from the same deck without putting the first one back. The probability of a particular observation in the second pick is certainly affected by the first pick. In the case of dependent events, equation 1 is not valid. It should be slightly changed to hold for any two events:
p(A and B) = p(A).p(B|A) ……… (2)
Equation (1) is a special case of equation (2) for independent events, because if events A and B are independent, p(B|A) = p(B).
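Both equations can be checked with exact fractions. The card numbers below extend the deck example to a concrete question (two aces in a row), which the original text leaves unspecified:

```python
from fractions import Fraction

# Equation (1), independent events: rolling a 1 and flipping heads.
p_one = Fraction(1, 6)     # p(rolling a 1)
p_heads = Fraction(1, 2)   # p(heads)
p_one_and_heads = p_one * p_heads
print(p_one_and_heads)  # 1/12

# Equation (2), dependent events: drawing two aces without replacement.
# p(A and B) = p(A) . p(B|A)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace already removed
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```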
Bayes’ Theorem
We will start with the fact that joint probability is commutative for any two events. That is:
p(A and B) = p(B and A) ……… (3)
From equation 2, we know that:
p(A and B) = p(A).p(B|A)
p(B and A) = p(B).p(A|B)
We can rewrite equation 3 as:
p(A).p(B|A) = p(B).p(A|B)
Dividing both sides by p(B) gives us Bayes’ theorem:
p(A|B) = p(A).p(B|A) / p(B) ……… (4)
So according to Bayes’ theorem, the probability of event A given that event B has already occurred can be calculated from the probabilities of events A and B, together with the probability of event B given that A has already occurred.
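Returning to the opening rain example, Bayes’ theorem can be applied numerically. The probabilities below are invented purely for illustration:

```python
from fractions import Fraction

# Hypothetical numbers, chosen only to illustrate the formula.
p_rain = Fraction(1, 10)               # p(A): prior probability of rain
p_clouds_given_rain = Fraction(9, 10)  # p(B|A): clouds, given that it rains
p_clouds = Fraction(3, 10)             # p(B): overall probability of clouds

# Bayes' theorem: p(A|B) = p(A) . p(B|A) / p(B)
p_rain_given_clouds = p_rain * p_clouds_given_rain / p_clouds
print(p_rain_given_clouds)  # 3/10
```

Seeing clouds raises the probability of rain from the prior 1/10 to the posterior 3/10, which is exactly the prior-to-posterior update described above.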
Bayes’ theorem is so fundamental and ubiquitous that an entire field, called "Bayesian statistics", is built on it. In Bayesian statistics, the probability of an event or hypothesis is updated as evidence comes into play. Therefore, prior probabilities and posterior probabilities differ depending on the evidence.
The naive Bayes algorithm combines Bayes’ theorem with some naive assumptions. It assumes that features are independent of each other and that there is no correlation between features. However, this is rarely the case in real life. This naive assumption of uncorrelated features is the reason why the algorithm is called "naive".
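To make the independence assumption concrete, here is a toy categorical naive Bayes classifier written from scratch. The dataset and feature names are invented; a real project would typically use a library such as scikit-learn:

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Invented toy data: (outlook, windy) -> whether we played outside.
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
]

# Count class frequencies and per-feature value counts within each class.
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def predict(features):
    """Pick the class maximizing p(class) . product of p(feature | class)."""
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, len(data))  # prior p(class)
        for i, value in enumerate(features):
            # Naive assumption: features are independent given the class,
            # so the conditional probabilities simply multiply.
            score *= Fraction(feature_counts[(i, label)][value], count)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict(("sunny", "no")))  # yes
```

Multiplying the per-feature conditionals is exactly where the "naive" independence assumption enters; with correlated features, that product is only an approximation of the true joint probability.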