The world’s leading publication for data science, AI, and ML professionals.

17 Must Know Code Blocks For Every Data Scientist

Discussing the 17 code blocks that will help you to effectively tackle most tasks and projects as a data scientist

Photo by Pakata Goh on Unsplash

Python makes it possible to solve complex problems with short, versatile code blocks. While other programming languages often require more verbose syntax to solve a particular task, Python usually offers a simpler solution. Every data scientist should know certain code blocks to get started on their data science and machine learning journey.

It is essential to remember that some lines of code, or particular code blocks, are reusable across multiple programs. Hence, coders at every level, from beginners to experts, should develop the habit of remembering useful code so they can reach solutions more quickly.

In this article, our primary objective is to cover code blocks that data scientists use regularly, no matter what type of project, problem, or task they are currently working on. The seventeen code blocks shown here can be reused in most tasks with little or no modification.

While this article is primarily aimed at beginner data scientists, intermediate and advanced data science enthusiasts who are switching to Python from another programming language will also find it useful. So, without further ado, let us get started with our exploration of these code blocks.


1. Conditional And Iterative Statements

def even(a):
    new_list = []
    for i in a:            # iterate over every element
        if i % 2 == 0:     # keep only the even numbers
            new_list.append(i)
    return new_list

a = [1, 2, 3, 4, 5]
print(even(a))  # [2, 4]

Conditional and iterative statements are usually the first constructs newcomers learn in any programming language. Even though they are the most basic aspects of coding, they are used extensively throughout data science tasks in machine learning and deep learning. Without them, it is almost impossible to perform complex tasks.

The code block above is a simple example of a function that uses both a conditional if statement and a for loop. The for loop iterates over all the elements, and the if statement checks for even numbers. While this is a trivial example, there are several related constructs worth keeping in mind, such as elif/else branches, while loops, and list comprehensions.
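As a minimal sketch of a few related constructs, the snippet below shows an if/elif/else chain, a while loop, and the even-number filter rewritten as a list comprehension. The function names classify and count_halvings are hypothetical, chosen only for illustration.

```python
def classify(numbers):
    """Hypothetical helper: label each number using an if/elif/else chain."""
    labels = []
    for n in numbers:
        if n < 0:
            labels.append("negative")
        elif n == 0:
            labels.append("zero")
        elif n % 2 == 0:
            labels.append("even")
        else:
            labels.append("odd")
    return labels


def count_halvings(n):
    """Hypothetical helper: count integer halvings until n reaches 0 (while loop)."""
    steps = 0
    while n > 0:
        n //= 2
        steps += 1
    return steps


# The even-number filter from above, written as a list comprehension
evens = [i for i in [1, 2, 3, 4, 5] if i % 2 == 0]

print(classify([-1, 0, 3, 4]))  # ['negative', 'zero', 'odd', 'even']
print(count_halvings(8))        # 4
print(evens)                    # [2, 4]
```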


2. Lists

lst = ['one', 'two', 'three', 'four']
lst.append('five')  # add an element to the end of the list
print(lst)          # ['one', 'two', 'three', 'four', 'five']

Lists are one of the most crucial data structures in Python. A data structure is a collection of data elements organized in some way, and lists have properties that make them useful in almost every project or complex task. Because lists are mutable, they can be changed or modified to fit a specific use case.

In almost any program, you will want a list to store information or data related to the task you are performing. To add elements to a list, you will often call the append method inside a for loop, storing each new element as it is produced. To learn all the concepts and master lists, check out the following article.

Mastering Python Lists For Programming!
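Here is a short sketch of everyday list operations beyond append: indexing, slicing, in-place modification, insert, pop, and sort. The values are made up for illustration.

```python
squares = []
for n in range(1, 6):
    squares.append(n ** 2)   # grow the list one element at a time
# squares == [1, 4, 9, 16, 25]

first = squares[0]           # indexing: 1
last = squares[-1]           # negative indices count from the end: 25
middle = squares[1:4]        # slicing returns a new list: [4, 9, 16]

squares[0] = 100             # lists are mutable: replace an element in place
squares.insert(1, 2)         # insert 2 at index 1 -> [100, 2, 4, 9, 16, 25]
removed = squares.pop()      # remove and return the last element (25)
squares.sort()               # sort in place -> [2, 4, 9, 16, 100]
print(squares)
```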


3. Dictionaries

# Dictionary with integer keys
my_dict = {1: 'A', 2: 'B'}
print(my_dict)

# Dictionary with string keys
my_dict = {'name': 'X', 'age': 10}
print(my_dict)

# Dictionary with mixed keys
my_dict = {'name': 'X', 1: ['A', 'B']}
print(my_dict)

Another important data structure is the dictionary, which is also used in most programs. A dictionary stores a collection of key-value pairs; since Python 3.7, dictionaries preserve the order in which keys were inserted. Each key maps to a single value, but that value can itself be a collection such as a list, so one key can effectively hold many values. Looking up a key gives you its associated value.

Dictionaries are easy to create and use in any program, and developers prefer them for the many tasks that require storing pairs of elements: a key and a value. To learn more about dictionaries, check out the following article, which covers most aspects in detail.

Mastering Dictionaries And Sets In Python!
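The sketch below, using hypothetical names and ages, shows the dictionary operations you will reach for most often: adding and updating keys, safe lookups with get(), iterating over key-value pairs, and counting occurrences.

```python
ages = {'alice': 31, 'bob': 27}
ages['carol'] = 45                    # add a new key-value pair
ages['bob'] = 28                      # update the value of an existing key
print(ages['alice'])                  # lookup by key -> 31
print(ages.get('dave', 'unknown'))    # get() returns a default instead of raising KeyError

for name, age in ages.items():        # iterate over pairs in insertion order
    print(name, age)

# Counting occurrences is a classic dictionary use case
words = "the cat and the hat".split()
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
print(counts)                         # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```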


4. Break And Continue

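a = [1, 2, 3, 4, 5]
for i in a:
    if i % 2 == 0:
        break       # stop the loop entirely at the first even number
    print(i)        # prints only 1
for j in a:
    if j % 2 == 0:
        continue    # skip even numbers and move on to the next element
    print(j)        # prints 1, 3 and 5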

The break and continue statements are two of the most useful tools to keep in mind while working on any complex data science task. The break statement terminates the enclosing loop immediately, while continue skips the rest of the current iteration and moves on to the next element.

The code block above is a simple illustration of what these two statements can do. If you encounter a specific value or condition and want to end the loop, break is the right choice. If you only want to skip that particular element and carry on with the rest of the loop, use continue.
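A slightly more practical sketch, assuming a hypothetical list of sensor readings in which None marks a missing value and a negative number marks an invalid one: continue skips the missing values while summing, and break stops the scan at the first invalid reading.

```python
readings = [3.2, None, 4.8, -1.0, 5.5]   # hypothetical data

# continue: skip missing values but keep processing the rest
total = 0.0
for r in readings:
    if r is None:
        continue
    total += r

# break: stop scanning as soon as the first invalid (negative) reading appears
first_bad_index = None
for idx, r in enumerate(readings):
    if r is not None and r < 0:
        first_bad_index = idx
        break

print(total, first_bad_index)   # 12.5 3
```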


5. Lambda

f = lambda x: x ** 2
f(5)  # returns 25

Regular functions are defined with the def keyword and are better suited to larger blocks of code. For short, simple operations, you can instead use a lambda function, a small anonymous function written in a single line.

A lambda takes any number of arguments and evaluates a single expression, whose value is returned automatically. It does not run faster than an equivalent def function; its advantage is brevity, which makes it convenient for short operations passed to other functions, such as the filter(), map(), and reduce() calls covered next.
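To make the comparison concrete, here is a sketch showing that a lambda and a def function compute the same result, followed by a typical use of a lambda as an inline key function for sorted(). The people list is hypothetical sample data.

```python
def square(x):
    return x ** 2

square_lambda = lambda x: x ** 2     # same behaviour, written as one expression
print(square(5), square_lambda(5))   # 25 25

# Typical use: a short key function passed inline to sorted()
people = [("alice", 31), ("bob", 27), ("carol", 45)]
by_age = sorted(people, key=lambda person: person[1])
print(by_age)   # [('bob', 27), ('alice', 31), ('carol', 45)]
```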


6. Filter

a = [1, 2, 3, 4, 5]
even = list(filter(lambda x: (x%2 == 0), a))
print(even)

The filter() function keeps only the elements of an iterable for which a given function returns True and discards the rest. Its appeal is that this kind of selection task can be expressed in a single line of code.

In the first code block of this article, we printed all the even numbers in a list using both a conditional statement and a for loop. The code block above performs the same task in a single line by combining filter() with a lambda function.
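A few further details about filter() are sketched below: in Python 3 it returns a lazy iterator rather than a list, passing None as the function keeps only truthy elements, and a list comprehension is an equivalent (and often more readable) alternative.

```python
a = [1, 2, 3, 4, 5]

# filter() returns a lazy iterator; list() materializes it
evens_iter = filter(lambda x: x % 2 == 0, a)
evens = list(evens_iter)               # [2, 4]

# Passing None as the function keeps only truthy elements
values = [0, 1, '', 'text', None, [], [3]]
truthy = list(filter(None, values))    # [1, 'text', [3]]

# The equivalent list comprehension
evens_lc = [x for x in a if x % 2 == 0]
print(evens, truthy, evens_lc)
```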


7. Map

a = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x ** 2, a))
print(squares)

The map() function is another built-in that applies a given function to every element of an iterable and returns an iterator over the results.

The map function can be summed up as a built-in function in Python that allows you to process and transform all the items in an iterable without using an explicit for loop. The above code block performs an operation of looping through the provided list and generating the squares of each of the provided elements accordingly.
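As a sketch of two other common patterns, map() also works with built-in functions such as int, and it can take several iterables at once, passing one element from each to the function. The prices and quantities are hypothetical.

```python
# map() accepts built-in functions, e.g. converting strings to integers
raw = ["10", "20", "30"]
numbers = list(map(int, raw))             # [10, 20, 30]

# With several iterables, the function receives one element from each;
# map() stops at the shortest iterable
prices = [2.0, 3.5, 1.25]
quantities = [3, 2, 4]
totals = list(map(lambda p, q: p * q, prices, quantities))   # [6.0, 7.0, 5.0]

# Equivalent list comprehension for the squares example above
squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(numbers, totals, squares)
```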


8. Reduce

from functools import reduce

a = [1, 2, 3, 4, 5]
product = reduce(lambda x, y: x*y, a)
print(product)

Unlike filter() and map(), reduce() works slightly differently: it applies a two-argument function cumulatively to the items of an iterable, from left to right, and returns a single value. In Python 3, reduce() is not a built-in, so you first need to import it from the functools module.

reduce() is the last of the higher-order functions (functions that take other functions as arguments) that we will discuss in this article. To explore advanced functions further and build a more intuitive understanding through code and examples, check out one of my previous articles from the link provided below.

Understanding Advanced Functions In Python With Codes And Examples!
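The sketch below spells out how reduce() folds a list from left to right and shows two related details: the operator module supplies ready-made two-argument functions, and an initializer value makes reduce() safe on empty inputs. The word list is made up for illustration.

```python
from functools import reduce
import operator

a = [1, 2, 3, 4, 5]

# reduce folds left to right: ((((1*2)*3)*4)*5) = 120
product = reduce(lambda x, y: x * y, a)

# operator.add is a ready-made two-argument function
total = reduce(operator.add, a)              # 15

# The initializer is the starting value and makes empty inputs safe
empty_total = reduce(operator.add, [], 0)    # 0

# Reducing to something other than a number: the (first) longest word
longest = reduce(lambda w1, w2: w1 if len(w1) >= len(w2) else w2,
                 ["map", "filter", "reduce"])  # 'filter'
print(product, total, empty_total, longest)
```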


9. Numpy

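import numpy as np
from tensorflow.keras.utils import to_categorical  # one-hot helper from Keras, not NumPy

# X (input sequences), y (integer labels) and vocab_size are assumed
# to have been defined earlier in the NLP pipeline
X = np.array(X)
y = np.array(y)

y = to_categorical(y, num_classes=vocab_size)  # integer labels -> one-hot vectors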

NumPy (Numerical Python) is one of the best libraries for numerical computation. Developers can solve a wide array of problems with it: you can convert lists of numbers into NumPy arrays and then perform fast, vectorized operations on them.

NumPy has applications in almost every field. In computer vision, images are commonly represented as NumPy arrays (height × width × channels for RGB images, height × width for grayscale), so individual pixel values can be manipulated directly. In most natural language processing projects, text data is converted into vectors of numbers for more efficient computation. The code block above converts the data into NumPy arrays and then one-hot encodes the integer labels with to_categorical, a helper that comes from Keras (tensorflow.keras.utils) rather than from NumPy itself.
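The following sketch, using small made-up arrays, illustrates the NumPy operations described above: creating arrays from lists, element-wise (vectorized) arithmetic, axis-wise aggregation, a pure-NumPy one-hot encoding with np.eye as an alternative to the Keras helper, and treating a tiny grayscale image as an array of pixel intensities.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
X = np.array(data)                  # 2x3 integer array
print(X.shape)                      # (2, 3)

# Vectorized arithmetic operates element-wise, with no explicit loop
doubled = X * 2                     # [[2, 4, 6], [8, 10, 12]]
col_means = X.mean(axis=0)          # [2.5, 3.5, 4.5]

# One-hot encoding in pure NumPy: index rows of an identity matrix
labels = np.array([0, 2, 1])
num_classes = 3
one_hot = np.eye(num_classes)[labels]
# [[1., 0., 0.],
#  [0., 0., 1.],
#  [0., 1., 0.]]

# A tiny 2x2 grayscale "image" of pixel intensities, inverted in one step
image = np.array([[0, 255], [128, 64]], dtype=np.uint8)
inverted = 255 - image              # [[255, 0], [127, 191]]
```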


10. Pandas

Photo by Pascal Müller on Unsplash
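import pandas as pd
data = pd.read_csv("fer2013.csv")  # load the CSV file (assumed to be in the working directory) into a DataFrame
data.head()                        # display the first five rows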

Another library you will constantly use to explore and interpret data is pandas. It is one of the best libraries for working with tabular data in many formats, especially CSV and Excel files, and it is exceptionally useful for data manipulation and data analysis in machine learning projects.

It handles most tasks related to data alignment, indexing, slicing, and subsetting of large datasets, and it offers a structured way to solve complex data-wrangling problems. You can load a dataset in a single line of code and then explore it conveniently.
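Since fer2013.csv may not be at hand, here is a sketch on a small, hypothetical in-memory DataFrame that shows the indexing, slicing, subsetting, and aggregation tasks mentioned above. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data, built in memory so no CSV file is needed
df = pd.DataFrame({
    "label": ["happy", "sad", "happy", "neutral", "sad"],
    "split": ["train", "train", "test", "train", "test"],
    "score": [0.9, 0.4, 0.8, 0.6, 0.3],
})
print(df.shape)                                   # (5, 3)

# Subsetting with a boolean mask
train = df[df["split"] == "train"]                # rows 0, 1 and 3

# Label-based slicing of rows and columns (loc includes the end label)
first_two = df.loc[0:1, ["label", "score"]]

# Aggregation: rows per class and mean score per class
counts = df["label"].value_counts()
mean_score = df.groupby("label")["score"].mean()
print(counts, mean_score, sep="\n")
```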


11. Matplotlib

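import matplotlib.pyplot as plt

# classes (category names) and train_counts (samples per class)
# are assumed to have been computed earlier
plt.bar(classes, train_counts, width=0.5)
plt.title("Bar Graph of Train Data")
plt.xlabel("Classes")
plt.ylabel("Counts")
plt.show()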
Bar Graph

A final library that is almost always paired with NumPy and pandas is Matplotlib. It is extremely useful for visualization: while the other two libraries let us inspect data in a structured or numerical form, Matplotlib lets us see the same data visually.

Visualizing the data is a key part of exploratory data analysis in machine learning, and it often suggests suitable directions for approaching a particular problem. The code block above displays the data as a bar graph, one of the most commonly used techniques for looking at data. To learn more visualization techniques for your data science projects, check out the following article for a concise guide.

8 Best Visualizations To Consider For Your Data Science Projects!
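Because classes and train_counts are not defined in the bar-graph snippet, the sketch below supplies hypothetical values for them, recreates the bar graph, and places a histogram of randomly generated scores next to it. It selects the non-interactive Agg backend and saves the figure to an in-memory buffer, so it also runs on machines without a display.

```python
import io
import random

import matplotlib
matplotlib.use("Agg")               # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

classes = ["angry", "happy", "sad"]   # hypothetical class names
train_counts = [120, 300, 180]        # hypothetical counts per class

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(500)]   # synthetic data for the histogram

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(classes, train_counts, width=0.5)
ax1.set_title("Bar Graph of Train Data")
ax1.set_xlabel("Classes")
ax1.set_ylabel("Counts")

ax2.hist(scores, bins=20)
ax2.set_title("Histogram of Scores")
ax2.set_xlabel("Value")
ax2.set_ylabel("Frequency")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # write the figure as PNG bytes
plt.close(fig)
```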


12. Regular Expressions

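import re

sentence = "Python is fun. Is NLP fun? Regular expressions make it easier."  # hypothetical example text
capital = re.findall(r"[A-Z]\w+", sentence)  # words starting with a capital letter
parts = re.split(r"\.", sentence)            # split on literal full stops
replaced = re.sub(r"[.?]", "!", sentence)    # replace '.' and '?' with '!'
x = re.search(r"fun.", sentence)             # first 'fun' followed by any character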

The re module is part of Python's standard library and provides pattern-matching tools that are useful in many natural language processing tasks. It offers functions for searching, splitting, and substituting text, so you can perform operations on letters, words, and sentences.

The four operations in the code block above (findall, split, sub, and search) are some of the most important regular expression operations to know. To learn more about them and how they can simplify natural language processing tasks, check out the following article.

Natural Language Processing Made Simpler with 4 Basic Regular Expression Operators!
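Beyond the four basic operations, two features come up constantly in practice: compiled patterns with named groups, and flags such as re.IGNORECASE. The sketch below applies them to hypothetical text; note that the e-mail pattern is intentionally simple and is not a complete validator.

```python
import re

text = "Contact: alice@example.com on 2021-03-15, bob@example.org on 2021-04-02."

# Compile a pattern that is reused; named groups label the parts of each match
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
dates = [m.group("year", "month", "day") for m in date_pattern.finditer(text)]
# [('2021', '03', '15'), ('2021', '04', '02')]

# A deliberately simple e-mail pattern
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)
# ['alice@example.com', 'bob@example.org']

# Case-insensitive search using a flag
match = re.search(r"CONTACT", text, flags=re.IGNORECASE)
print(dates, emails, match.group())
```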


13. Natural language processing toolkit

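import nltk

nltk.download("punkt")  # tokenizer models, needed once (newer NLTK versions may require "punkt_tab" instead)

sentence = "Hello! Good morning."
tokens = nltk.word_tokenize(sentence)
print(tokens)  # ['Hello', '!', 'Good', 'morning', '.']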

While regular expressions are great for the early stages of a natural language processing project, you will soon need a library that performs tasks such as tokenization, stemming, and lemmatization effectively. The Natural Language Toolkit (NLTK) makes developing such NLP projects much easier.

NLTK is one of the most useful tools for developers because it reduces complex tasks to a few lines of code. Most of its functions let you apply complicated transformations to text in a single call. The tokenization example in the code block above is one such case.
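As a sketch of stemming, one of the tasks mentioned above, the snippet below stems a hand-written list of tokens with NLTK's PorterStemmer and counts the stems with FreqDist. Writing the tokens out by hand is an assumption made so that no tokenizer models need to be downloaded.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer

# Tokens written out by hand, so no downloaded models are required
tokens = ["cats", "running", "runs", "cat", "ran", "running"]

stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
# Suffixes are stripped ("cats" -> "cat", "running" -> "run"),
# but irregular forms such as "ran" are left unchanged
print(stems)

freq = FreqDist(stems)       # frequency count of each stem
print(freq.most_common(2))
```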


14. Images with pillow

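# Importing the required library
from PIL import Image

# Opening and analyzing an image ('Red.png' is assumed to exist in the working directory)
image1 = Image.open('Red.png')
print(image1.format)  # file format, e.g. PNG
print(image1.size)    # (width, height) in pixels
print(image1.mode)    # colour mode, e.g. RGB or L (grayscale)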

Working with images is essential for data scientists who want to delve further into computer vision and image processing. Pillow is a Python library that offers great versatility for opening, manipulating, and saving images.

There are many tasks you can perform with Pillow. The code block above opens an image from the specified path and prints its format, its size (width and height in pixels), and its mode, which determines the number of channels (for example, RGB has three and L, grayscale, has one). From there, you can manipulate the image and finally save it.
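Here is a sketch of the manipulate-and-save workflow described above. To stay self-contained, it creates a small solid-red image in memory rather than opening Red.png, and it saves to an in-memory buffer instead of a file.

```python
import io
from PIL import Image

# Create a 64x32 solid-red RGB image in memory instead of opening a file
img = Image.new("RGB", (64, 32), color=(255, 0, 0))
print(img.size, img.mode)                 # (64, 32) RGB

# Typical manipulations: grayscale conversion, resizing, rotation
gray = img.convert("L")                   # single-channel grayscale
small = gray.resize((32, 16))
rotated = small.rotate(90, expand=True)   # expand=True grows the canvas to fit: (16, 32)

# Save as PNG to a buffer, then reopen it
buffer = io.BytesIO()
rotated.save(buffer, format="PNG")
buffer.seek(0)
reloaded = Image.open(buffer)
print(reloaded.format, reloaded.size, reloaded.mode)   # PNG (16, 32) L
```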


15. Images with Open-CV

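import cv2  # Importing the OpenCV module

image = cv2.imread("lena.png")  # Read the image (returns None if the path is invalid)
cv2.imshow("Picture", image)    # Window title and the image to be displayed
cv2.waitKey(0)                  # Wait until any key is pressed
cv2.destroyAllWindows()         # Close the window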

OpenCV is one of the most widely used libraries for tasks involving images, pictures, and videos, including real-time webcam processing. Its accessibility and popularity make it a must-know library for most data scientists.

The code block above reads an image from the specified path and displays it in a window. To get started with computer vision and master the basics of this library, I highly recommend checking out one of my previous articles, which covers the fundamentals of computer vision with plenty of code and examples.

OpenCV: Complete Beginners Guide To Master the Basics Of Computer Vision With Code!


16. Classes

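import sympy

class Derivative_Calculator:
    @staticmethod
    def power_rule(*args):
        # differentiate an expression such as x**3 with respect to x
        deriv = sympy.diff(*args)
        return deriv
    @staticmethod
    def sum_rule(*args):
        # differentiate a sum such as x**2 + x term by term
        deriv = sympy.diff(*args)
        return deriv

x = sympy.Symbol('x')
Derivative = x**3  # hypothetical example expression to differentiate
differentiate = Derivative_Calculator()
print(differentiate.power_rule(Derivative, x))  # 3*x**2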

Classes are an integral part of object-oriented programming. Python uses classes to bundle data and functionality together. Compared to other programming languages, Python's class mechanism is a bit different: it is a mixture of the class mechanisms found in C++ and Modula-3.

Classes are used extensively, even in the development of deep learning models. When writing TensorFlow code, you might create a custom class to define your model, an approach known as model subclassing that is typically used by more advanced developers. If you are curious to learn more about the code block above, check out the following article on the best library to simplify math for machine learning.

Best Library To Simplify Math For Machine Learning!
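For a class that bundles data and behaviour without any third-party dependencies, consider the hypothetical RunningMean class sketched below. It keeps a running count and total as instance data and exposes the mean as a read-only property.

```python
class RunningMean:
    """Hypothetical example: track the mean of a stream of numbers.

    The data (count, total) and the behaviour (update, mean) live
    together in one object.
    """

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self          # returning self allows chained calls

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

    def __repr__(self):
        return f"RunningMean(count={self.count}, mean={self.mean:.2f})"


tracker = RunningMean()
for v in [2, 4, 9]:
    tracker.update(v)
print(tracker)   # RunningMean(count=3, mean=5.00)
```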


17. Random

import random
r = random.uniform(0.0, 1.0)  # random float between 0.0 and 1.0

The random module, which is part of Python's standard library, is essential for any task that requires a level of randomness. In machine learning it is used extensively, for example to shuffle data, split datasets, and draw samples.

Randomness is also central to how many machine learning and deep learning models are trained, for instance when model weights are initialized with random values drawn from a specified range. Because the module generates pseudo-random numbers from a seed, fixing the seed makes these random steps reproducible.
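The sketch below shows the random functions used most often in data science work, including seeding for reproducibility and a simple shuffle-and-slice train/test split. The 80/20 split and the sample data are arbitrary choices for illustration.

```python
import random

random.seed(42)                        # fix the seed so results are reproducible

r = random.uniform(0.0, 1.0)           # float between 0.0 and 1.0
die = random.randint(1, 6)             # integer from 1 to 6, both ends inclusive
pick = random.choice(["cat", "dog", "bird"])

# A simple train/test split: shuffle a copy, then slice
data = list(range(10))
shuffled = data[:]
random.shuffle(shuffled)               # shuffles the list in place
train, test = shuffled[:8], shuffled[8:]

# Sampling without replacement
batch = random.sample(data, k=3)
print(r, die, pick, train, test, batch)
```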


Conclusion:

Photo by Dean Pugh on Unsplash

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." – Martin Fowler

In any programming language, there are certain aspects you will revisit and reuse more often than others. Python is no different: some code blocks come up far more frequently than the rest, and that is exactly what we tried to cover in this article. While we covered only some of them, there are many other concepts in the world of Python to explore.

In this article, we discussed seventeen code blocks that every data science developer should keep in mind when starting to code. Simply remembering these code blocks, or even just knowing that they exist, will help you look them up and find the most suitable solution for whatever task you are currently working on.

If you have any queries related to the various points stated in this article, then feel free to let me know in the comments below. I will try to get back to you with a response as soon as possible.

Check out some of my other articles that you might enjoy reading!

6 Best Projects For Image Processing With Useful Resources

Best PC Builds For Deep Learning In Every Budget Ranges

7 Best Free Tools For Data Science And Machine Learning

Best Library To Simplify Math For Machine Learning!

6 Best Programming Practices!

Thank you all for sticking around until the end. I hope you enjoyed reading the article. Wish you all a wonderful day!

