
Introduction
A couple of days ago, I shared some Python and Pandas tricks to help Data Analysts and Data Scientists quickly learn new valuable concepts that they might not be aware of. This is also part of the collection of tricks I share daily on LinkedIn.
Pandas
Combine SQL statements and Pandas
My gut feeling tells me that more than 80% of Data Scientists use Pandas in their daily Data Science activities.
I believe this is because Pandas is part of the wider Python ecosystem, which makes it accessible to many people.
What about SQL? Even though not everyone uses it daily (not every company necessarily has a SQL database), SQL’s performance is undeniable. It is also human-readable, which makes it easy to understand even for non-technical people.
What if we could find a way to combine the benefits of both Pandas and SQL statements?
✅ This is where pandasql comes in handy.
Below is an illustration. You can also watch the full video here.
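Here is a minimal sketch of the idea, assuming a small made-up students table. pandasql exposes a sqldf function that runs a SQL query against DataFrames. Because it is a third-party package that may not be installed, the runnable part below swaps in the standard-library sqlite3 module together with to_sql and read_sql_query from Pandas, and shows the pandasql call only as a comment.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for this sketch.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "area": ["Math", "Physics", "Math", "Physics"],
    "grade": [15.5, 12.0, 17.0, 14.5],
})

query = """
    SELECT area, COUNT(*) AS n_students, AVG(grade) AS avg_grade
    FROM students
    GROUP BY area
    ORDER BY area
"""

# With pandasql installed, the same query could be run in one call:
#     from pandasql import sqldf
#     result = sqldf(query, {"students": students})
# The dependency-free stand-in below loads the DataFrame into an
# in-memory SQLite database and reads the query result back.
connection = sqlite3.connect(":memory:")
try:
    students.to_sql("students", connection, index=False)
    result = pd.read_sql_query(query, connection)
finally:
    connection.close()

print(result)
```

Either way, the result is a regular DataFrame, so you can keep working on it with Pandas.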

Update data of a given dataframe with another dataframe
There are multiple ways of replacing missing values in Pandas, from simple imputation to more advanced methods.
But sometimes, you just want to replace them using non-NA values from another DataFrame.
✅ This can be achieved using the built-in update function from Pandas.
It aligns both DataFrames on their index and columns before performing the update.
The general syntax is below:
first_dataframe.update(second_dataframe)
✨ Values in first_dataframe are replaced in place with the non-missing values from second_dataframe.
✨ overwrite=True, the default, overwrites first_dataframe’s values wherever second_dataframe has non-missing data. With overwrite=False, only the missing values of first_dataframe are replaced.
Here is an illustration.
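As a small sketch of both settings, the example below uses two made-up DataFrames (the column names and values are assumptions for illustration). Note that update modifies the DataFrame in place and returns None, so the sketch works on copies.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first DataFrame has one missing age.
first_dataframe = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [23.0, np.nan, 30.0],
})
# The second DataFrame shares the same index and the "age" column.
second_dataframe = pd.DataFrame({"age": [99.0, 27.0, np.nan]})

# overwrite=False: only the missing values of the first DataFrame are filled.
filled_only = first_dataframe.copy()
filled_only.update(second_dataframe, overwrite=False)

# overwrite=True (the default): every value with a non-missing
# counterpart in the second DataFrame is replaced.
overwritten = first_dataframe.copy()
overwritten.update(second_dataframe)

print(filled_only)
print(overwritten)
```

In both cases the missing value in second_dataframe is ignored, so Chloe's age stays at 30.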

From unstructured to structured data
Data preprocessing is full of challenges.
Imagine you have this data with candidates’ information in the following format:
‘Adja Kone: has Master in Statistics and is 23 years old’
…
‘Fanta Traore: has PhD in Statistics and is 30 years old’
Then, your task is to generate a table with the following information per candidate for further analysis:
✨ The first and last name
✨ The degree and field of study
✨ The age
Performing such a task can be daunting.
✅ This is where the str.extract() function in Pandas can help!
It is a powerful text-processing function for extracting structured information from unstructured textual data.
Below is an illustration.
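One possible way to do this, sketched with two sample strings in the format described above: a regular expression with named groups is passed to str.extract(), and each group becomes a column. The pattern and the column names are assumptions chosen for this example, not the only way to write it.

```python
import pandas as pd

# Illustrative rows following the "<first> <last>: has <degree> in <field>
# and is <age> years old" format.
candidates = pd.DataFrame({
    "info": [
        "Adja Kone: has Master in Statistics and is 23 years old",
        "Fanta Traore: has PhD in Statistics and is 30 years old",
    ]
})

# Each named group (?P<name>...) becomes a column in the result.
pattern = (
    r"(?P<first_name>\w+) (?P<last_name>\w+): has "
    r"(?P<degree>\w+) in (?P<field>[\w ]+?) and is (?P<age>\d+) years old"
)

structured = candidates["info"].str.extract(pattern)
# Extracted values are strings, so the age is converted explicitly.
structured["age"] = structured["age"].astype(int)

print(structured)
```

Rows that do not match the pattern would come back as NaN, which is worth checking before converting types on real data.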

Perform multiple aggregations with the agg() function
Suppose you want to apply multiple aggregation functions like sum, average, count … to one or multiple columns.
✅ You can combine the groupby() and agg() functions from Pandas in one line of code.
Here is a scenario:
Let’s imagine students’ data containing information about:
✨ Students’ areas of study
✨ Their grades
✨ The graduation years and the age of each student.
You have been asked to compute the following information per area of study and year:
✅ The number of students
✅ The average grade
✅ The average age
Below is an illustration of a solution to the scenario.
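The scenario can be sketched as follows, assuming a small invented students table (the column names area, year, grade, and age are hypothetical). It uses named aggregation, where each keyword argument to agg() defines an output column as a (source column, function) pair.

```python
import pandas as pd

# Hypothetical students data, invented for this sketch.
students = pd.DataFrame({
    "area": ["Math", "Math", "Physics", "Physics", "Math"],
    "year": [2021, 2021, 2021, 2022, 2022],
    "grade": [15.0, 17.0, 12.0, 14.0, 16.0],
    "age": [22, 24, 23, 25, 21],
})

# One chained expression: group by area and year, then compute
# the number of students, the average grade, and the average age.
summary = (
    students.groupby(["area", "year"])
    .agg(
        n_students=("grade", "count"),
        avg_grade=("grade", "mean"),
        avg_age=("age", "mean"),
    )
    .reset_index()
)

print(summary)
```

The reset_index() call is optional. It turns the group keys back into regular columns, which is often easier to work with afterwards.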

Select observations between two specified times
When working with time series data, you might want to select observations between two specified times for further analysis.
✅ This can be quickly achieved using the between_time() function, which works on data indexed by a DatetimeIndex.
Below is an illustration.
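A short sketch, assuming hourly readings over two days (the data is made up for this example). between_time() filters on the time of day, so matching rows are returned for every date in the index.

```python
import pandas as pd

# Hypothetical hourly readings over two days, invented for this sketch.
index = pd.date_range("2023-01-01", periods=48, freq="60min")
readings = pd.DataFrame({"value": range(48)}, index=index)

# Keep only the rows whose time of day falls between 09:00 and 11:00.
# Both endpoints are included by default.
morning = readings.between_time("09:00", "11:00")

print(morning)
```

Here the result contains six rows: 09:00, 10:00, and 11:00 for each of the two days.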

Python
Check if all elements meet a certain condition
Combining for loops and if statements is not always the most elegant way to write Python code.
For instance, let’s say that you want to check if all the elements of an iterable meet a certain condition.
There are two options:
1️⃣ Use a for loop and an if statement.
OR
2️⃣ Use the all() built-in function.
Below is an illustration.
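Both options can be sketched side by side. The scores list and the passing threshold of 50 are assumptions made up for this example.

```python
# Hypothetical exam scores; the passing mark of 50 is an assumption.
scores = [72, 85, 90, 64]

# Option 1: a for loop with an if statement.
all_passed_loop = True
for score in scores:
    if score < 50:
        all_passed_loop = False
        break

# Option 2: the all() built-in with a generator expression.
all_passed = all(score >= 50 for score in scores)

print(all_passed_loop, all_passed)
```

Like the loop with break, all() stops at the first element that fails the condition.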

Check if any element meets a certain condition
Similarly to the previous case, you may want to check if at least one element of an iterable meets a certain condition.
✅ In that case, use the any() built-in function, which is more elegant than using a for loop and an if statement.
The illustration is similar to the above image.
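For completeness, here is the same kind of sketch with any(), again using made-up scores and an assumed passing mark of 50.

```python
# Hypothetical exam scores; the passing mark of 50 is an assumption.
scores = [72, 45, 90, 64]

# Loop version: stop as soon as one failing score is found.
has_failure_loop = False
for score in scores:
    if score < 50:
        has_failure_loop = True
        break

# any() version: True if at least one element meets the condition.
has_failure = any(score < 50 for score in scores)

print(has_failure_loop, has_failure)
```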
Avoid nested for loops
Writing nested for loops is almost inevitable when your program becomes bigger and more complicated.
This can also make your code difficult to read and maintain.
✅ A better alternative is to use the product() function from the built-in itertools module instead.
Below is an illustration.
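A minimal comparison, assuming two small made-up lists. Note that product() has to be imported from the standard-library itertools module.

```python
from itertools import product

# Hypothetical lists, invented for this sketch.
sizes = ["S", "M"]
colors = ["red", "blue"]

# Nested loops: one level of indentation per list.
pairs_nested = []
for size in sizes:
    for color in colors:
        pairs_nested.append((size, color))

# product(): a single loop over every combination, in the same order.
pairs_product = []
for size, color in product(sizes, colors):
    pairs_product.append((size, color))

print(pairs_product)
```

The benefit grows with the number of lists, since product() accepts any number of iterables while the loop stays flat.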

Automatically handle index in a list
Imagine you have to access elements in a list and their indexes at the same time.
One way of doing it is to handle the indexes manually in a for loop.
✅ Instead, you can use the enumerate() built-in function.
This has two main benefits (that I can think of):
✨ First, it automatically handles the index variable.
✨ Second, it makes the code more readable.
Below is an illustration.
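The two approaches can be compared in a short sketch, using a made-up list of languages.

```python
# Hypothetical list, invented for this sketch.
languages = ["Python", "SQL", "R"]

# Manual handling: the index variable is created and incremented by hand.
manual = []
index = 0
for language in languages:
    manual.append((index, language))
    index += 1

# enumerate(): the index is produced alongside each element.
automatic = []
for position, language in enumerate(languages):
    automatic.append((position, language))

# The optional start argument changes the first index, e.g. to count from 1.
numbered = list(enumerate(languages, start=1))

print(automatic)
print(numbered)
```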

Conclusion
Thank you for reading!
I hope you found this list of Python and Pandas tricks helpful! Keep an eye on this page, because the content will be updated with more tricks on a daily basis.
Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.
Would you like to buy me a coffee? Here you go!
Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!
Before you leave, find the parts of this series below:
Pandas & Python Tricks for Data Science & Data Analysis – Part 1
Pandas & Python Tricks for Data Science & Data Analysis – Part 2
Pandas & Python Tricks for Data Science & Data Analysis – Part 3
Pandas & Python Tricks for Data Science & Data Analysis – Part 4