
Recap on the Python Basics to Get Started for Data Science

Review what has been learned and learn something new.

66 Days Of Data

After studying a subject for a long time, we gain a broader understanding of it. Revisiting what we learned in the past can give us new insight into a particular topic.

I was motivated by the "What is the #66DaysOfData" video by Ken Jee. The video explains a challenge he started, which has two parts:

  1. Learn about Data Science every day for 66 days straight, even for 5 minutes.
  2. Share our progress on our social media platform of choice using #66DaysOfData.

I decided to start my journey on 24 July 2021 and chose Medium as my sharing platform.


On the first day of my 66 Days Of Data journey 🎉, I decided to write about an idea I got while reading a book recommended by a friend, "Machine Learning Mastery with Python". The book explains how to work on a Machine Learning project end to end. It starts with basic Python syntax and some common libraries used for Machine Learning, such as Pandas, NumPy, Matplotlib, and scikit-learn.

This reminds me of my first Machine Learning class. I had taught myself Python basics before the class and thought I was prepared, but I was wrong. The programming lab for Machine Learning started with some basic Python syntax and jumped directly to the common libraries mentioned above. I was overwhelmed at the beginning of the semester 🤢.

Luckily, things were not that difficult and my classmates were supportive 🥰.

Okay, enough memories, let’s get started!


This article shares things I didn’t pick up while teaching myself Python basics, but that turned out to be quite useful (and fun) in my Data Science learning journey and my job.


Variable

Most of the time, when we start to learn a programming language, we start with variable assignment. Variable assignment in Python works much as it does in other programming languages. We can assign strings, numbers (both integers and floats), and booleans to variables in Python. Furthermore, we can assign an empty value, called None, to a variable.

string_variable = "Hello"
numerical_variable = 5 
boolean_variable = True
no_value_variable = None

a. String Variable Fun Fact

Instead of writing a for loop to print a phrase 100 times, we can use the following script.

print("phrase\n" * 100)

"\n" is the newline character, so the script prints the word "phrase" on a new line each time.

b. f-string

There are a few ways to print variables within a string, but an f-string is usually the most concise and readable one.

name = "Janelle"
years = 10
print(f"My name is {name}, I have been working in this company for {years} years")
# output: My name is Janelle, I have been working in this company for 10 years

Yes, by adding an f before the opening quotation mark and wrapping each variable name in curly brackets, we can easily print multiple variables within a long string. The code is easier to read than using the + sign to concatenate strings and variables, or using string formatting operations.
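To make the comparison concrete, here is a minimal sketch of the three approaches side by side. The sentence is shortened and the variable names with_plus, with_format, and with_fstring are only illustrative.

```python
name = "Janelle"
years = 10

# 1. Concatenation with +: numbers must be converted with str() first
with_plus = "My name is " + name + ", I have been here for " + str(years) + " years"

# 2. str.format(): placeholders are filled in order
with_format = "My name is {}, I have been here for {} years".format(name, years)

# 3. f-string: variables (and expressions) go straight into the braces
with_fstring = f"My name is {name}, I have been here for {years} years"

print(with_plus == with_format == with_fstring)
# output: True
print(f"In months: {years * 12}")
# output: In months: 120
```

All three build the same string; the f-string simply keeps each value next to the text it belongs to.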

c. Multiple Assignment

Normally we assign one variable at a time, but sometimes it’s quite handy to assign several at once.

a,b = 3,5
print(f"a: {a}; b: {b}")
# output: a: 3; b: 5

Swapping variables is also a form of multiple assignment.

d. Swap Two Variables

a,b = b,a
print(f"a: {a}; b: {b}")
# output: a: 5; b: 3

Instead of declaring a new variable to act as temporary storage for the value of one of the variables, in Python we can directly swap the values of two variables.
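For comparison, the sketch below shows the temporary-variable approach next to the one-line version. The names temp, first, second, and third are only illustrative.

```python
a, b = 3, 5

# Exchange the values with a temporary variable, as many other languages require
temp = a
a = b
b = temp
print(f"a: {a}; b: {b}")
# output: a: 5; b: 3

# Exchange them back with tuple packing and unpacking
a, b = b, a
print(f"a: {a}; b: {b}")
# output: a: 3; b: 5

# The same unpacking works for more than two names
first, second, third = "x", "y", "z"
first, second, third = third, first, second
print(first, second, third)
# output: z x y
```

The right-hand side is evaluated in full before any name is reassigned, which is why no temporary variable is needed.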


Data Structure

The common data structures of Python are lists, tuples, and dictionaries.

a_list = ["Tom", 26, "Anywhere"]
a_tuple = (25, 7)
a_dictionary = {"name": "Tom", "age": 26, "home": "Anywhere"}

In Data Science, frequently used data structures also include the DataFrame and the NumPy array. A data science project usually uses several libraries for different tasks, such as data preprocessing and data visualization, and different libraries do not always take the same data structure as input. To use different libraries efficiently, we should also know how to convert one data structure to another.
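Before moving on to arrays and DataFrames, conversion between the built-in structures can be sketched with nothing but the standard constructors. The keys list is a made-up helper for this illustration.

```python
a_list = ["Tom", 26, "Anywhere"]
keys = ["name", "age", "home"]  # hypothetical labels for the values

# list -> tuple and back
a_tuple = tuple(a_list)
back_to_list = list(a_tuple)

# two lists -> dictionary (zip pairs each key with its value)
a_dictionary = dict(zip(keys, a_list))
print(a_dictionary)
# output: {'name': 'Tom', 'age': 26, 'home': 'Anywhere'}

# dictionary -> lists of keys and values
print(list(a_dictionary.keys()))
# output: ['name', 'age', 'home']
print(list(a_dictionary.values()))
# output: ['Tom', 26, 'Anywhere']
```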

NumPy Array

NumPy is a Python library used for working with arrays. NumPy arrays are similar to lists, except that they are stored in one contiguous block of memory, so NumPy arrays can be processed faster than lists [2].

import numpy

lists = [1, 2, 3, 4, 5]
lists = [x + 5 for x in lists]
print(lists)
# output: [6, 7, 8, 9, 10]
arrays = numpy.array([1, 2, 3, 4, 5])
arrays += 5
print(arrays)
# output: [ 6  7  8  9 10]

Both NumPy arrays and lists can be used as input for mathematical operations.
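One caveat is worth a quick sketch: built-in functions treat both structures alike, but operators do not. The variable names below are illustrative.

```python
import numpy

numbers_list = [1, 2, 3, 4, 5]
numbers_array = numpy.array(numbers_list)

# Built-in functions accept both structures
print(sum(numbers_list), sum(numbers_array))
# output: 15 15
print(max(numbers_list), max(numbers_array))
# output: 5 5

# Operators differ: * repeats a list but multiplies each array element
print(numbers_list * 2)
# output: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(numbers_array * 2)
# output: [ 2  4  6  8 10]
```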

Pandas DataFrame

A DataFrame is a two-dimensional, array-like structure whose rows and columns can be labelled [1]. In other words, a DataFrame is similar to an Excel table.

Convert One Data Structure to Another

To better understand the DataFrame, let’s see how to create a DataFrame from lists and arrays.

The image below explains the output from the script above.
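The original script and image are not reproduced in this text. As a rough sketch of the same kind of conversion, and not the author's exact code, the example below builds DataFrames from a list of lists and from a NumPy array, then converts back. All sample values and labels are made up.

```python
import numpy
import pandas

# A list of lists becomes the rows of a DataFrame
rows = [["Tom", 26], ["Ann", 31]]  # hypothetical sample data
df_from_list = pandas.DataFrame(rows, columns=["name", "age"])
print(df_from_list)

# A 2-D NumPy array works the same way; row labels are optional
array = numpy.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pandas.DataFrame(array, index=["row_1", "row_2"], columns=["a", "b", "c"])
print(df_from_array)

# Converting back: DataFrame -> NumPy array -> nested list
back_to_array = df_from_array.to_numpy()
print(back_to_array.tolist())
# output: [[1, 2, 3], [4, 5, 6]]
```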


Flow Control

Flow control refers to the while loop, the for loop, and if-then-else statements.

# while loop
time_now = 0
while time_now <= 8:
    print("Continue to work")
    time_now += 1

# for loop
n = 5  # example value
for x in range(n):
    print(x)

# if-elif
age = 23
if age < 21:
    print("You cannot drive as you are under the legal age to own a license.")
elif age >= 21:
    print("You can drive.")

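While loop: Keep performing the action as long as the condition is true.

For loop: Repeat the task n times.

If-Then-Else: Perform the action only if the condition is met; otherwise, perform the alternative.

The condition can also be wrapped in a small, descriptively named function: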

def adult(age):
    return age >= 21

age = 23
if adult(age):
    print("You can drive.")
else:
    print("You cannot drive as you are under the legal age to own a license.")

Wrapping the condition in a named function like adult() means writing more lines of code, but it makes the code much more readable. I find this good practice when coding for a large project (where I tend to forget what the numbers stand for).


Function

In Python basics, we learned how to declare a new function and how to use built-in functions such as print(). As you carry on learning, new terms like recursion and method may appear.

a. Recursion

A recursive function is a function that calls itself from within its own body, under certain conditions. If no proper stopping condition is set, the function keeps calling itself, which Python eventually stops with a RecursionError.

# Recursive function
def recursive_factorial(n):
    if n == 1:
        return n
    else:
        return n * recursive_factorial(n - 1)

# user input
num = 6

# check if the input is valid or not
if num < 0:
    print("Invalid input! Please enter a positive number.")
elif num == 0:
    print("Factorial of number 0 is 1")
else:
    print("Factorial of number", num, "=", recursive_factorial(num))
# output: Factorial of number 6 = 720

Complete explanation on Recursion in Python.
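As a small sketch of why the stopping condition matters, the variant below moves the input check into the function and lets the base case cover 0 as well. The function no_base_case is a deliberately broken, hypothetical example, and math.factorial is used only as a cross-check.

```python
import math

def factorial(n):
    """Recursive factorial with a base case that also covers 0."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n <= 1:  # base case: stops the recursion
        return 1
    return n * factorial(n - 1)

def no_base_case(n):
    """Hypothetical broken version: it never stops calling itself."""
    return n * no_base_case(n - 1)

print(factorial(6), math.factorial(6))
# output: 720 720

try:
    no_base_case(6)
except RecursionError as error:
    print("Recursion stopped by Python:", type(error).__name__)
# output: Recursion stopped by Python: RecursionError
```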

b. Method

A method is like a function: it performs a task. When a function is attached to a Class, it is called a method.

A Class can be defined as a blueprint for creating new objects that have the same variables (known as attributes, or fields, when they are attached to a Class) and can carry out certain methods (a function is called a method when it is attached to a Class or defined within a Class).

Have you noticed that I always capitalize the first letter of Class? That is because this is how we name a Class: by convention, the name of a Class starts with a capital letter [3].

A typical example of a Class in Python is a collection of student marks. In the following example, each student’s name and mark are printed on the screen after the input is collected.

The output is shown in the figure below.

By creating a Student Class, we save the effort of calling print() for each student’s name and mark one by one. Instead, we can use the method declared in the Student Class.
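The original script is not reproduced in this text, so the following is only a minimal sketch of what such a Class might look like. The names Student, name, mark, and print_result() come from the article; the sample data is made up, and it is hard-coded here rather than collected from user input.

```python
class Student:
    """Blueprint for a student record."""

    def __init__(self, name, mark):
        self.name = name  # attribute
        self.mark = mark  # attribute

    def print_result(self):
        """Method: print one student's name and mark, and return the text."""
        result = f"{self.name}: {self.mark}"
        print(result)
        return result

# Hypothetical sample data
students = [Student("Tom", 85), Student("Ann", 92)]
for student in students:
    student.print_result()
# output:
# Tom: 85
# Ann: 92
```

Each object carries its own name and mark, so the printing logic is written once and reused for every student.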

In the drop-down list in the image below, there are

  1. ‘f’ (stands for field) before name and mark: the attributes created in the Student Class
  2. ‘m’ (stands for method) before print_result(): the function defined in the Student Class

A Class can make our work easier if we use it efficiently.


That’s the basics of Python syntax.

To get started on the Data Science path, we need to go a few steps further. Just as a web developer should know at least Flask or Django, a data scientist should know:

  1. Pandas: loading and processing structured data
  2. NumPy: faster computation on numerical data
  3. Matplotlib: creating static, animated, and interactive visualizations
  4. scikit-learn: Machine Learning, data preprocessing, and evaluation

The above are the basics we should know to kick-start the Data Science journey. There is much more to learn to become a Data Scientist. Keep in mind that a scientist never stops learning.

There is usually more than one library that can perform a given task, and newer and better libraries keep appearing. As scientists, we should keep track of how the technology evolves and learn to solve problems as efficiently and accurately as we can.


Always Have Some Fun with Python!

Learning is fun! Here’s a fun project I did over a weekend with Python: recreating a dot painting by Damien Hirst with the Python libraries colorgram.py and turtle. A simple yet fun project.

Credit: This project is inspired by Dr Angela Yu in her 100 Days of Code Course.

The script is as follows.
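The original script is not reproduced in this text. As a rough sketch of the idea, and not the author's code, the example below places dots on a grid with turtle. The palette values, grid size, and function names are all assumptions; the article extracts its colors from an image of the painting with colorgram.py, which is skipped here.

```python
import random

# Hypothetical palette of RGB colors (made-up values, not extracted from the painting)
PALETTE = [(236, 224, 208), (198, 12, 32), (31, 91, 160), (239, 228, 4), (20, 134, 70)]

def dot_positions(rows, cols, gap, start_x=-225, start_y=-225):
    """Return (x, y) centers for a rows x cols grid, bottom-left dot first."""
    return [(start_x + col * gap, start_y + row * gap)
            for row in range(rows) for col in range(cols)]

def draw_dot_painting(rows=10, cols=10, gap=50, dot_size=20):
    """Draw the grid of dots; needs a graphical display, so it is not called here."""
    import turtle
    turtle.colormode(255)  # accept 0-255 RGB tuples
    pen = turtle.Turtle()
    pen.hideturtle()
    pen.speed("fastest")
    pen.penup()  # move without drawing lines
    for x, y in dot_positions(rows, cols, gap):
        pen.goto(x, y)
        pen.dot(dot_size, random.choice(PALETTE))
    turtle.done()

print(len(dot_positions(10, 10, 50)))
# output: 100
```

Calling draw_dot_painting() on a machine with a display opens a window and paints the 10 x 10 grid.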

Try it yourself too if you haven’t 😄


Side Note

Text Processing in Python: at the end of that article, I share how to change the data structure of the output to the desired data structure for different purposes.


Reference

[1] Machine Learning Mastery with Python by Jason Brownlee

[2] Introduction to NumPy by W3Schools

[3] 100 Days of Code Course by Dr Angela Yu

Congrats and thanks for reading to the end. Hope you enjoy this article. 😊

