Python for data science : Part 1

Rohan Joseph
Towards Data Science
Jul 31, 2018


Python has numerous applications: web development, desktop GUIs, software development, business applications and scientific/numeric computing. In this series we will focus on using Python for numeric computing in data science.

In this module, we will be looking at the following basic features of Python :

1. Python function

2. Data types and sequences

3. Date and time

4. Map

5. Lambda

6. Filter

7. Reduce

8. Zip

9. For loop

10. List comprehension

1. Python function

A function is a block of code that only runs when it is called. You can pass data, known as parameters, into a function. Let’s write a function to multiply two numbers.

#multiply two numbers using a python function
def multiply(x, y):
    z = x * y
    return z
#call the function to multiply the numbers 2 and 3
multiply(2,3)

Output : 6
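Parameters can also have default values and can be passed by name. A minimal sketch, using a hypothetical power function that is not part of the original example:

```python
# Hypothetical example: 'exponent' falls back to 2 when it is not given
def power(base, exponent=2):
    """Return base raised to the given exponent."""
    return base ** exponent

square = power(3)              # uses the default exponent -> 9
cube = power(2, exponent=3)    # passes exponent by keyword -> 8
print(square, cube)            # 9 8
```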

2. Python data types and sequences

Python has built-in data types to store numeric and character data. Let us take a look at a few common types.

type(' My name is Rohan')

Output : str

type(1)

Output : int

type(1.0)

Output : float

type(None) #None signifies 'no value' or 'empty value'

Output : NoneType

type(multiply) #multiply is a function we created previously

Output : function

Now, let’s take a look at how to store sequences of numbers and characters, and how to perform a few basic manipulations on them.

i. Tuples : Tuples are immutable; unlike lists, they cannot be altered after they are created.

a = (1,2,3,4)
type(a)

Output : tuple

ii. Lists : Lists are mutable; they can be modified after they are created.

b = [1,2,3,4]
type(b)

Output : list
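To make the difference between the two concrete, the sketch below tries to change the first element of a tuple and of a list. The variable names point and values are illustrative only, chosen so that the list b used in the rest of this section is left untouched.

```python
# A tuple cannot be changed after it is created
point = (1, 2, 3, 4)
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)   # tuples do not support item assignment

# A list with the same elements can be changed in place
values = [1, 2, 3, 4]
values[0] = 10
print(values)                  # [10, 2, 3, 4]
```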

Let’s append a number to the list b created above.

b.append(2.2) #append to list using this function
print(b)

Output : [1, 2, 3, 4, 2.2]

Loop through the list and print each number.

for number in b: #looping through list
    print(number)

Output :

1
2
3
4
2.2

Now, let’s concatenate two lists.

[1,2,3] + [1,'abc','de'] #concatenate lists

Output : [1, 2, 3, 1, 'abc', 'de']

Create a list with repeating numbers.

[1,2]*3 #repeat lists

Output : [1, 2, 1, 2, 1, 2]

Check if an object you are searching for is in the list.

3 in b #in operator to check if required object is in list

Output : True

Unpack a sequence into separate variables (here a tuple; a list works the same way).

a,b = ('abc','def')
print(a)
print(b)

Output : abc
def
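Unpacking has a couple of related forms worth knowing. This short sketch shows extended unpacking with a starred name and the common one-line swap; the variable names are illustrative.

```python
# Extended unpacking: the starred name collects the remaining items into a list
head, *tail = [1, 2, 3, 4]
print(head)    # 1
print(tail)    # [2, 3, 4]

# Unpacking also makes swapping two variables a one-liner
p, q = 'abc', 'def'
p, q = q, p
print(p, q)    # def abc
```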

iii. Strings : A string is an immutable sequence of characters.

x = 'My name is Rohan'

Access characters from string :

x[0] #Access first letter

Output : 'M'

x[0:2] #Access the first two characters

Output : 'My'

x[:-1] #Access everything except the last character

Output : 'My name is Roha'

x[10:] #Return all the characters from index 10 (the 11th character) to the end

Output : ' Rohan'
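Slices also accept negative indices, which count from the end of the string, and an optional step as a third value. A small sketch on the same string:

```python
x = 'My name is Rohan'

print(x[-1])     # n        (negative indices count from the end)
print(x[-5:])    # Rohan    (the last five characters)
print(x[::2])    # M aei oa (every second character)
print(x[::-1])   # nahoR si eman yM (a step of -1 reverses the string)
```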

Now, let’s concatenate two strings.

first = 'Rohan'
last = 'Joseph'

Name = first + ' ' + last #string concatenation
print(Name)

Output : Rohan Joseph

Split the words in the previous string using the split() method.

Name.split(' ') #split the words in a string using split

Output : ['Rohan', 'Joseph']

Show only the first word.

Name.split(' ')[0] #Show the first word

Output : 'Rohan'

Now, show only the second word in the string.

Name.split(' ')[1] #Show the second word

Output : 'Joseph'
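The join() method does the reverse of split(): it glues a list of strings together with a separator. A brief sketch, also showing that split() with no argument splits on any run of whitespace:

```python
words = 'Rohan Joseph'.split(' ')
print(words)                        # ['Rohan', 'Joseph']
print('-'.join(words))              # Rohan-Joseph

# With no argument, split() ignores leading/trailing and repeated whitespace
print('  Rohan   Joseph '.split())  # ['Rohan', 'Joseph']
```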

To concatenate a number with a string, convert the number to a string first.

#for concatenation convert objects to strings
'Rohan' + str(2)

Output : 'Rohan2'
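Since Python 3.6, f-strings offer an alternative that performs the conversion for you. The sketch below builds the same text both ways; the name and height values simply mirror the dictionary example that follows.

```python
name = 'Rohan'
height = 176

# Explicit conversion with str(), as above
s1 = name + ', height: ' + str(height)

# An f-string converts the embedded values automatically
s2 = f'{name}, height: {height}'

print(s1)   # Rohan, height: 176
print(s2)   # Rohan, height: 176
```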

iv. Dictionary : A dictionary is a collection of key-value pairs that is indexed by its keys. Since Python 3.7, dictionaries preserve the order in which keys were inserted.

c = {"Name" : "Rohan", "Height" : 176}
type(c)

Output : dict

Print data contained within a dictionary

print(c)

Output : {'Name': 'Rohan', 'Height': 176}

Access dictionary values based on keys

c['Name'] #Access Name

Output : 'Rohan'

c['Height']

Output : 176
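Square-bracket access raises a KeyError when the key is missing. A minimal sketch of two safer alternatives, the in operator and the get() method; "Weight" is just an example of a key that is not in the dictionary.

```python
c = {"Name": "Rohan", "Height": 176}

# Check for a key before using it
print('Name' in c)         # True
print('Weight' in c)       # False

# get() returns None, or a default you supply, instead of raising KeyError
print(c.get('Weight'))     # None
print(c.get('Weight', 0))  # 0
```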

Print all the keys in the dictionary

#print all the keys
for i in c:
    print(i)

Output : Name
Height

Print all the values in the dictionary

for i in c.values():
    print(i)

Output : Rohan
176

Iterate over all the items in the dictionary

for key, value in c.items():
    print(key)
    print(value)

Output : Name
Rohan
Height
176

3. Python Date and Time

The following modules help us manipulate date and time values in simple ways.

import datetime as dt
import time as tm

Print the current time in seconds since the Unix epoch (January 1, 1970, UTC).

tm.time() #print current time in seconds from January 1, 1970

Output : 1532483980.5827992

#convert timestamp to datetime
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow.year

Output : 2018

Get today’s date

today = dt.date.today()
today

Output : datetime.date(2018, 7, 30)

Subtract 100 days from today’s date

delta = dt.timedelta(days=100)
today - delta

Output : datetime.date(2018, 4, 21)
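Subtracting two dates works the other way round and yields a timedelta. The sketch below uses fixed dates matching the outputs above, so the result is reproducible, and also shows strftime() and strptime() for converting between dates and strings.

```python
import datetime as dt

start = dt.date(2018, 4, 21)
end = dt.date(2018, 7, 30)

gap = end - start                 # subtracting two dates gives a timedelta
print(gap.days)                   # 100

# strftime() formats a date as a string; strptime() parses it back
label = end.strftime('%Y-%m-%d')
print(label)                      # 2018-07-30
parsed = dt.datetime.strptime(label, '%Y-%m-%d').date()
print(parsed == end)              # True
```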

4. Map function

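The map function applies a given function to each item of one or more iterables. In Python 3 it returns a lazy iterator (a map object) rather than a list, so loop over it or wrap it in list() to see the results. For example, let’s find the minimum of each pair of corresponding elements from two lists.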

a = [1,2,3,10]
b = [5,6,2,9]

c = map(min, a, b) #Find the minimum of each element-wise pair
for item in c:
    print(item) #print the minimum of each pair

Output : 1
2
2
9
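Because map returns an iterator, it can only be consumed once. This sketch makes that visible and also applies map to a single iterable with a one-argument built-in function.

```python
a = [1, 2, 3, 10]
b = [5, 6, 2, 9]

c = map(min, a, b)
print(list(c))   # [1, 2, 2, 9]
print(list(c))   # [] because the iterator is already exhausted

# map with a single iterable and a one-argument function
lengths = list(map(len, ['abc', 'de', 'f']))
print(lengths)   # [3, 2, 1]
```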

5. Lambda function

A lambda expression creates a small, anonymous function object in a single line. It is handy for short, one-off functions.

function = lambda a,b,c : a+b+c #function to add three numbers
function(2,2,3) #call the function

Output : 7
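A common place to use a lambda is as the key argument of sorted(). The (name, height) pairs below are made-up illustrative data.

```python
# Hypothetical data: (name, height) pairs
people = [('Rohan', 176), ('Ann', 162), ('Raj', 181)]

# Sort by the second element of each pair, using a lambda as the key
by_height = sorted(people, key=lambda person: person[1])
print(by_height)   # [('Ann', 162), ('Rohan', 176), ('Raj', 181)]
```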

6. Filter function

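The filter function offers an easy way to keep only the elements of an iterable for which a function returns True. Its syntax is filter(function, iterable), and the first argument is a function, for which a lambda can be used. Like map, it returns an iterator in Python 3, so we wrap it in list() to print it. As an example, let’s keep only the numbers greater than 5 from a list.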

x = [1,2,3,4,5,6,7,8,9] #create a list
x2 = filter(lambda a : a>5, x) #filter using filter function
print(list(x2))

Output : [6, 7, 8, 9]
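The first argument to filter does not have to be a lambda. This sketch passes None, which keeps only truthy items, and then a named function; is_even is an illustrative helper, not part of the original example.

```python
# With None as the function, filter keeps only truthy items
values = [0, 1, '', 'abc', None, [], [2]]
print(list(filter(None, values)))          # [1, 'abc', [2]]

# A named function works just as well as a lambda
def is_even(n):
    return n % 2 == 0

print(list(filter(is_even, range(10))))    # [0, 2, 4, 6, 8]
```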

7. Reduce function

Reduce (found in the functools module in Python 3) performs a computation on a list and returns a single result. It applies a rolling computation to sequential pairs of values: the function is applied to the first two items, then to that result and the third item, and so on. As an example, let’s calculate the product of all the numbers in a list.

from functools import reduce #import reduce function
y = [1,2,3,4,5] #create list
reduce(lambda a,b : a*b,y) #use reduce

Output : 120
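To see what the rolling computation does, here is the same product written out step by step and as an explicit loop. The sketch also passes the optional third argument to reduce, an initial value, which makes it safe to call on an empty list.

```python
from functools import reduce

y = [1, 2, 3, 4, 5]

# reduce(f, [1, 2, 3, 4, 5]) computes f(f(f(f(1, 2), 3), 4), 5),
# i.e. ((((1*2)*3)*4)*5) = 120
product = reduce(lambda a, b: a * b, y)

# The equivalent explicit loop
total = 1
for value in y:
    total = total * value

# With an initial value, an empty list returns that value instead of raising
empty_product = reduce(lambda a, b: a * b, [], 1)
print(product, total, empty_product)   # 120 120 1
```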

8. Zip function

The zip function pairs up elements from several sequences: the i-th tuple contains the i-th element from each sequence. In Python 3 it returns an iterator, so we convert it to a list to print it. Let’s look at an example.

a = [1,2,3,4] #create two lists
b = [5,6,7,8]
c = zip(a,b) #Use the zip function
print(list(c))

Output : [(1, 5), (2, 6), (3, 7), (4, 8)]

If the sequences passed to zip have unequal lengths, the result is truncated to the length of the shortest sequence.

a = [1,2] #create two lists
b = [5,6,7,8]
c = zip(a,b) #Use the zip function
print(list(c))

Output : [(1, 5), (2, 6)]
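Two handy idioms build on zip: combining it with the * operator to "unzip" a list of pairs, and passing it to dict() to build a dictionary from separate lists of keys and values. A short sketch:

```python
pairs = [(1, 5), (2, 6), (3, 7), (4, 8)]

# zip(*pairs) regroups the i-th item of every tuple, undoing the original zip
firsts, seconds = zip(*pairs)
print(firsts)    # (1, 2, 3, 4)
print(seconds)   # (5, 6, 7, 8)

# Build a dictionary from a list of keys and a list of values
keys = ['Name', 'Height']
values = ['Rohan', 176]
print(dict(zip(keys, values)))   # {'Name': 'Rohan', 'Height': 176}
```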

9. For loop

For loops are usually used when you want to repeat a block of code a fixed number of times, or once for each item in a sequence.

Let us use a for loop to build the list of even numbers from 0 to 98 (range(100) stops before 100).

#collect the even numbers below 100

even = []
for i in range(100):
    if i % 2 == 0:
        even.append(i)
print(even) #print the list

Output : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
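As an aside, range also accepts a step, which produces the even numbers without an if-test, and enumerate() gives you the index alongside each item while looping. A brief sketch:

```python
# range(start, stop, step) yields every second number directly
even = list(range(0, 100, 2))
print(even[:5], even[-1])   # [0, 2, 4, 6, 8] 98

# enumerate() yields (index, item) pairs
for index, letter in enumerate(['a', 'b', 'c']):
    print(index, letter)    # prints 0 a, then 1 b, then 2 c
```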

10. List comprehension

List comprehension provides a more concise way to create lists. Continuing the same example, let’s create the list of even numbers below 100 using a list comprehension.

even = [i for i in range(100) if i%2==0]
print(even)

Output : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
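List comprehensions can often replace map and filter. The sketch below redoes the earlier filter example, combines a transformation with a condition, and shows the analogous dictionary comprehension.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Same result as list(filter(lambda a: a > 5, x))
greater = [a for a in x if a > 5]                 # [6, 7, 8, 9]

# Transform and filter in one step: square only the even numbers
even_squares = [a * a for a in x if a % 2 == 0]   # [4, 16, 36, 64]

# A dictionary comprehension uses key: value pairs
squares = {a: a * a for a in range(4)}            # {0: 0, 1: 1, 2: 4, 3: 9}
print(greater, even_squares, squares)
```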

The features we looked at are the basic building blocks of Python used in numerical computing. Beyond these built-in features, libraries such as NumPy and pandas (which we will look at in upcoming articles) are used extensively in data science.

