PyTrix Series
Python is a popular language for Data Science. With its easy-to-learn (and readable) syntax, getting up and running with the language is much more accessible for newbies. However, without getting into the details, Python is an interpreted language, which means it runs much slower than a compiled language like C.
When we perform Deep Learning it’s likely we are using large amounts of data because that is when Deep Learning thrives.

Why am I saying all of this? Great question!
If we have large amounts of data and slow Python code, we are more than likely going to end up with a model that runs at a snail's pace because our code is not computationally optimal… What was our solution to this great disaster? Vectorization! B-)
What is Vectorization?
To put it in layman's terms, vectorization speeds up Python code by removing the need for explicit looping, indexing, etc., and in Data Science we use Numpy to do this – Numpy is the de facto framework for scientific programming in Python. Technically, these operations still happen when we implement the vectorized form in Numpy, just not in Python: under the hood, they are performed in optimised, pre-compiled C code – see the Numpy Documentation for more information on this.
"This practice of replacing explicit loops with array expressions is commonly referred to as vectorisation. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure python equivalents, with the biggest impact seen in any kind of numerical computations" – McKinney, 2012, p. 97
Pure Python vs Numpy Examples
In this section, I will implement some examples in pure Python, then implement the same operations with Numpy and compare the computation times of both, so we can get a concrete feel for vectorization. A link to my Github repository with the code is below.
Outer Product
The outer product of two vectors results in a matrix. For instance, if we have two vectors of dimensions n and m, then their outer product is an n × m matrix – see Figure 2.
![Figure 2: Outer Product formula [Source]](https://towardsdatascience.com/wp-content/uploads/2020/05/1BwHV54X2d6BnGi6FW1K7jw.png)
```python
import numpy as np
import time

a = np.arange(10000)
b = np.arange(10000)

# pure Python outer product implementation
tic = time.process_time()
outer_product = np.zeros((10000, 10000))
for i in range(len(a)):
    for j in range(len(b)):
        outer_product[i][j] = a[i] * b[j]
toc = time.process_time()
print("python_outer_product = " + str(outer_product))
print("Time = " + str(1000 * (toc - tic)) + "ms\n")

# Numpy outer product implementation
n_tic = time.process_time()
outer_product = np.outer(a, b)
n_toc = time.process_time()
print("numpy_outer_product = " + str(outer_product))
print("Time = " + str(1000 * (n_toc - n_tic)) + "ms")
```
This cell outputs…

Dot Product
Also referred to as the inner product, the dot product takes two sequences of numbers of equal length and returns a single scalar – see Figure 4.
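In case the figure does not render, the dot product of two n-dimensional vectors a and b can be written as:

```latex
a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n
```

This sum of elementwise products is exactly what the loop below computes one term at a time, and what np.dot computes in one call.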

```python
import numpy as np
import time

a = np.arange(10000000)
b = np.arange(10000000)

# pure Python dot product implementation
tic = time.process_time()
dot_product = 0
for i in range(len(a)):
    dot_product += a[i] * b[i]
toc = time.process_time()
print("python_dot_product = " + str(dot_product))
print("Time = " + str(1000 * (toc - tic)) + "ms\n")

# Numpy dot product implementation
n_tic = time.process_time()
dot_product = np.dot(a, b)
n_toc = time.process_time()
print("numpy_dot_product = " + str(dot_product))
print("Time = " + str(1000 * (n_toc - n_tic)) + "ms")
```
The output from this code block…

Ultimately, vectorization not only makes our code faster and easier to read; it also reduces the amount of code we have to write, which usually means fewer bugs. On top of that, the code we write looks much more Pythonic, since we get rid of all the inefficient, hard-to-read for loops in our code base.
Note: If this is completely new to you, I’d suggest watching the videos linked below from Andrew Ng’s Deep Learning specialization on Coursera and getting to grips with the Numpy Documentation.