
In this article, I’ll talk about independence, covariance, and correlation between two random variables. These are fundamental concepts in statistics and are pretty important in Data Science.
Introduction
Let us start with a brief definition of a random variable, along with an example.
Random Variable
A random variable, usually written X, is defined as a variable whose possible values are numerical outcomes of a random phenomenon [1]. Given a random experiment with sample space S, a random variable X is a function that assigns one and only one real number to each element s in the sample space S [2].
A simple example of a random variable comes from a coin toss, which has heads (H) or tails (T) as its outcomes. The sample space is therefore:
S = {H, T}
We can define the random variable X as follows:
- Let X = 0 for Heads
- Let X = 1 for Tails
Note that the random variable assigns one and only one real number (0 and 1) to each element of the sample space (H and T). The support, or space, of X is {0, 1} in this case.
Probability Mass Function [2]
The probability that a discrete random variable X takes on a particular value x, i.e. P(X=x), is denoted by p(x) and is called the probability mass function (p.m.f.). The analogous function for continuous random variables is the probability density function (p.d.f.). The pmf is the probability distribution of a discrete random variable and provides the possible values and their associated probabilities [3]. It is defined as:
p(xᵢ) = P(X=xᵢ)
p(x) has the property that the probability associated with each possible value must be positive and that all the probabilities must sum to 1.
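As a quick illustration, here is the coin-toss pmf written out in Python (a hypothetical snippet added for illustration, not part of the definition); both properties are easy to verify:

```python
# Hypothetical pmf for the coin-toss example: X = 0 for heads, X = 1 for tails.
coin_pmf = {0: 0.5, 1: 0.5}

# Each probability is positive and they sum to 1.
assert all(p > 0 for p in coin_pmf.values())
assert abs(sum(coin_pmf.values()) - 1.0) < 1e-9
```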
Now that we have a background about random variables and pmf, we will look at independence, covariance and correlation.
Independence of Random Variables
If X and Y are two random variables and the distribution of X is not influenced by the values taken by Y, and vice versa, the two random variables are said to be independent.
Mathematically, two discrete random variables are said to be independent if:
P(X=x, Y=y) = P(X=x) P(Y=y), for all x,y.
Intuitively, for independent random variables, knowing the value of one of them does not change the probabilities of the other. The joint pmf of X and Y is simply the product of the individual marginal pmfs of X and Y.
Let us solve one example problem to get a better understanding of how the formula is used. Suppose we have two random variables X and Y whose joint probabilities are known; they can be represented as a table. The marginal pmf values for X can be obtained by summing over all the Y values [5], and a similar marginalization can also be done for Y. In a joint pmf table, this just corresponds to summing the entries along the corresponding rows or columns. The joint pmf table, along with the marginal pmf values, is shown below:

In order for two random variables to be independent, every cell entry of the joint pmf should be equal to the product of the marginal pmf values in the corresponding summation row and column, i.e. P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y.
If this relationship fails for even one of the (x, y) pairs, then the two random variables are not independent. So, for our example, X and Y are not independent.
Following is the code for creating marginal pmfs from the distribution table. (Note it hasn’t been optimized in any way.)
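The original snippet is not reproduced here, so below is a minimal sketch of what such a function could look like, assuming the distribution table is stored as a Python dict mapping (x, y) value pairs to probabilities (a representation chosen for this sketch, not necessarily the one used in the original code):

```python
def marginal_pmfs(joint_pmf):
    """Compute the marginal pmfs of X and Y from a joint pmf table.

    joint_pmf: dict mapping (x, y) value pairs to probabilities.
    """
    pmf_x, pmf_y = {}, {}
    for (x, y), p in joint_pmf.items():
        pmf_x[x] = pmf_x.get(x, 0.0) + p  # sum over all y for a fixed x
        pmf_y[y] = pmf_y.get(y, 0.0) + p  # sum over all x for a fixed y
    return pmf_x, pmf_y
```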
The marginal pmfs are then used to check independence:
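Again as a sketch (not the author's original code), the check simply compares every joint entry with the product of the corresponding marginals, with a small tolerance for floating-point error:

```python
def is_independent(joint_pmf, tol=1e-9):
    """Return True if P(X=x, Y=y) == P(X=x) * P(Y=y) for every (x, y) pair."""
    pmf_x, pmf_y = marginal_pmfs(joint_pmf)
    return all(abs(p - pmf_x[x] * pmf_y[y]) < tol
               for (x, y), p in joint_pmf.items())
```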
We look at two sample cases: one independent and the other not.
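The original tables are not shown here, so for illustration we can use two small hypothetical 2x2 tables: one constructed so that every joint entry equals the product of its marginals, and one in which X and Y tend to take the same value:

```python
# Hypothetical independent case: every joint probability equals the product
# of its marginals (P(X=0) = P(X=1) = 0.5, P(Y=0) = 0.6, P(Y=1) = 0.4).
independent_table = {(0, 0): 0.30, (0, 1): 0.20,
                     (1, 0): 0.30, (1, 1): 0.20}

# Hypothetical dependent case: X and Y tend to take the same value.
dependent_table = {(0, 0): 0.40, (0, 1): 0.10,
                   (1, 0): 0.10, (1, 1): 0.40}
```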
We get the expected relationships from both tables:
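Running the check on the two hypothetical tables gives the expected answers:

```python
print(is_independent(independent_table))  # True
print(is_independent(dependent_table))    # False
```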

Covariance
Covariance is a measure of the joint variability of two random variables [5]. It shows the degree of linear dependence between them. A positive covariance implies a direct linear relationship: larger values of one variable tend to go with larger values of the other. A negative covariance implies the opposite: larger values of one variable tend to go with smaller values of the other. Thus, the sign of the covariance shows the nature of the linear relationship between two random variables. Finally, the covariance of two independent random variables is zero. However, a zero covariance does not imply that two random variables are independent.
The magnitude of the covariance depends on the scale of the variables, since it is not a normalized measure. As a result, the values themselves are not a clear indication of how strong the linear relationship is.
The formula for covariance is:
Cov(X, Y) = E[(X − E(X)) (Y − E(Y))] = E(XY) − E(X) E(Y)
Note: E(X) is the expected value of the random variable. You can read more about it here: https://en.wikipedia.org/wiki/Expected_value
In addition to knowing the values of the joint pmf, we also require the mean values of X and Y to calculate the covariance. The following function calculates the covariance from the distribution table.
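As before, here is a minimal sketch of such a function for the dict-based table used above (not necessarily the author's original representation), using the identity Cov(X, Y) = E(XY) − E(X) E(Y):

```python
def covariance(joint_pmf):
    """Cov(X, Y) = E(XY) - E(X) E(Y), computed from a joint pmf table."""
    pmf_x, pmf_y = marginal_pmfs(joint_pmf)
    mean_x = sum(x * p for x, p in pmf_x.items())
    mean_y = sum(y * p for y, p in pmf_y.items())
    e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    return e_xy - mean_x * mean_y
```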
The covariance for our two test cases is:
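For the hypothetical tables used above, this gives (up to floating-point rounding):

```python
print(covariance(independent_table))  # 0.0
print(covariance(dependent_table))    # ~0.15 (positive)
```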

We confirm that the covariance for our independent case is zero, while we see a positive covariance for the non-independent test case. This indicates that when X increases, Y tends to increase too.
Finally, let us look at Correlation.
Correlation
Correlation is just a scaled/normalised version of covariance, so that the values lie between -1 and 1. The normalisation is done using the standard deviations of X and Y:
Corr(X, Y) = Cov(X, Y) / (σ(X) σ(Y))
Independent variables have both zero covariance and zero correlation. A correlation of 1 implies the variables are perfectly correlated with a positive slope, while a correlation of -1 implies they are perfectly anticorrelated with a negative slope.
To calculate the correlation, in addition to what we did for calculating the covariance, we also need to calculate the standard deviations of X and Y.
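Here is a sketch of the correlation calculation, built on the same hypothetical helpers as before; the standard deviations are computed directly from the marginal pmfs:

```python
def correlation(joint_pmf):
    """Corr(X, Y) = Cov(X, Y) / (std(X) * std(Y))."""
    pmf_x, pmf_y = marginal_pmfs(joint_pmf)
    mean_x = sum(x * p for x, p in pmf_x.items())
    mean_y = sum(y * p for y, p in pmf_y.items())
    std_x = sum((x - mean_x) ** 2 * p for x, p in pmf_x.items()) ** 0.5
    std_y = sum((y - mean_y) ** 2 * p for y, p in pmf_y.items()) ** 0.5
    return covariance(joint_pmf) / (std_x * std_y)
```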
The correlation for our test cases is:
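For the hypothetical tables, this gives (again up to floating-point rounding):

```python
print(correlation(independent_table))  # 0.0
print(correlation(dependent_table))    # ~0.6 (positive, below 1)
```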

As expected, we see the same pattern of dependence as we did with covariance: the independent test case has zero correlation, while the test case with a positive covariance has a positive correlation lower than 1.
One more thing: it is important to note that covariance/correlation does not imply causation, i.e. X being correlated with Y does not mean that X causes Y. Seema Singh has written a great article about this: https://towardsdatascience.com/why-correlation-does-not-imply-causation-5b99790df07e
Conclusion
To conclude, we looked at what random variables are and what a probability mass function is. After that, we discussed the independence of two random variables. Finally, we looked at covariance and correlation as metrics for measuring the linear dependence between two random variables, with the latter being just a normalised version of the former.
References
[1] http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm
[2] https://online.stat.psu.edu/stat414/lesson/7/7.1
[3] https://en.wikipedia.org/wiki/Probability_mass_function
[4] https://www.math.umd.edu/~millson/teaching/STAT400fall18/slides/article16.pdf