
If you’ve started learning Python, whether you want to be a Software Engineer or a Data Scientist, you absolutely need to master data structures.
Python has a lot of data structures that allow us to store data. In this article, we’ll dive into the most used ones. So, if you’re starting your career and need to learn data structures, then this article is definitely for you.
Here’s what you’ll find here:
Table of Contents:
- Lists
  - Definition and creation examples
  - Lists manipulation
  - List comprehension
  - List of lists
- Tuples
- Dictionaries
  - Dictionaries manipulation
  - Nested dictionaries
  - Dictionary comprehension
- Data frames
  - Basic data frames manipulation with Pandas
Lists
Definition and creation examples
In Python, a list is a collection of ordered elements that can be of any type: strings, integers, floats, etc…
To create a list, the items must be inserted between square brackets and separated by a comma. For example, here’s how we can create a list of integers:
# Create list of integers
my_integers = [1, 2, 3, 4, 5, 6]
But lists can also have "mixed" types stored inside them. For example, let’s create a list with both integers and strings:
# Create a mixed list
mixed_list = [1, 3, "dad", 101, "apple"]
To create a list, we can also use the Python built-in function list(). This is how we can use it:
# Create list and print it
my_list = list((1, 2, 3, 4, 5))
print(my_list)
>>>
[1, 2, 3, 4, 5]
This built-in function is very useful in some particular cases. For example, let’s say we want to create a list of the integers from 1 to 9. Here’s how we can do so:
# Create a list in a range
my_list = list(range(1, 10))
print(my_list)
>>>
[1, 2, 3, 4, 5, 6, 7, 8, 9]
NOTE:
Remember that the built-in function "range" includes the first value,
and excludes the last one.
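As a small sketch of the note above: if we actually want the numbers from 1 to 10, the stop value has to be 11. The variable names below are only illustrative.

```python
# range(start, stop) stops one short of `stop`,
# so we pass 11 to get the numbers from 1 to 10
one_to_ten = list(range(1, 11))
print(one_to_ten)

# An optional third argument sets the step between values
odd_numbers = list(range(1, 10, 2))
print(odd_numbers)
```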
Now, let’s see how we can manipulate lists.
Lists manipulation
Thanks to the fact that lists are mutable, we have lots of possibilities to manipulate them. For example, let’s say we have a list of names, but we made a mistake and we want to change one. Here’s how we can do so:
# List of names
names = ["James", "Richard", "Simon", "Elizabeth", "Tricia"]
# Change the wrong name
names[0] = "Alexander"
# Print list
print(names)
>>>
['Alexander', 'Richard', 'Simon', 'Elizabeth', 'Tricia']
So, in the above example, we’ve changed the first name of the list from James to Alexander.
NOTE:
In case you didn't know, note that in Python the first element
of a sequence is always at index 0, regardless of the type we're manipulating.
So, in the above example, "names[0]" represents the first element
of the list "names".
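To make the note concrete, here is a minimal sketch showing that index 0 works the same way on a list, a string, and a tuple. The values are made up for illustration.

```python
names = ["James", "Richard", "Simon"]
word = "apple"
point = (10, 20)

print(names[0])   # first element of a list
print(word[0])    # first character of a string
print(point[0])   # first element of a tuple
print(names[-1])  # negative indexes count from the end
```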
Now, suppose we’ve forgotten a name. We can add it to our list like so:
# List of names
names = ["James", "Richard", "Simon", "Elizabeth", "Tricia"]
# Append another name
names.append("Alexander")
# Print list
print(names)
>>>
['James', 'Richard', 'Simon', 'Elizabeth', 'Tricia', 'Alexander']
If we need to concatenate two lists, we have two possibilities: the + operator or the extend() method. Let’s see them:
# Create list1
list1 = [1, 2, 3]
# Create list2
list2 = [4, 5, 6]
# Concatenate lists
concatenated_list = list1 + list2
# Print concatenated list
print(concatenated_list)
>>>
[1, 2, 3, 4, 5, 6]
So, the + operator creates a new list that contains the elements of both lists. Let’s see the extend() method:
# Create list1
list1 = [1, 2, 3]
# Create list2
list2 = [4, 5, 6]
# Extend list1 with list2
list1.extend(list2)
# Print new list1
print(list1)
>>>
[1, 2, 3, 4, 5, 6]
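As we can see, the results are the same, but the syntax is different. The extend() method modifies list1 in place by adding the elements of list2 to it, while the + operator leaves both original lists untouched and returns a new one.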
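The practical difference between the two approaches is whether the original list changes. A minimal sketch, with illustrative variable names:

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# The + operator builds a new list; list1 is left unchanged
combined = list1 + list2
print(list1)
print(combined is list1)

# extend() changes list1 itself and returns None
result = list1.extend(list2)
print(list1)
print(result)
```

This is why we write list1.extend(list2) on its own line rather than assigning its return value to a variable.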
If we want to remove elements, we have two possibilities: we can use the remove() method or the del statement. Let’s see them:
# Create list
my_list = [1, 2, 3, 'four', 5.0]
# Remove one element and print
my_list.remove('four')
print(my_list)
>>>
[1, 2, 3, 5.0]
Let’s see the other method:
# Create list
my_list = [1, 2, 3, 'four', 5.0]
# Delete one element and print
del my_list[3]
print(my_list)
>>>
[1, 2, 3, 5.0]
So, we get the same results with both approaches, but remove() lets us explicitly name the element to remove, while del needs the position (the index) of the element in the list.
NOTE:
If you've gained familiarity with accessing positions, you'll have noticed
that in the above example my_list[3] is 'four'. Remember: in Python we start
counting positions from 0.
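Two details are worth keeping in mind when removing elements: remove() deletes only the first match and raises an error if the element is missing. The sketch below also uses pop(), a list method not covered above, which removes by position like del but returns the removed element.

```python
letters = ["a", "b", "a", "c"]

# remove() deletes only the first matching element
letters.remove("a")
print(letters)

# remove() raises ValueError if the element is not in the list
try:
    letters.remove("z")
except ValueError:
    print("'z' is not in the list")

# pop() removes by position, like del, but also returns the element
last = letters.pop(-1)
print(last, letters)
```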
List comprehension
There are a lot of cases where we need to create lists starting from existing lists, generally applying some filters to the existing data. To do so, we have two possibilities:
- We use loops and statements.
- We use list comprehension.
In practice, both approaches produce the same result, but list comprehension is more concise and elegant.
But before we discuss these methods, you may need a deep overview of loops and statements. Here is an article I wrote in the past that may help you:
Loops and statements in Python: A deep understanding (with examples)
Now, let’s see a couple of examples using loops and statements directly.
Suppose we have a shopping list. We want our program to print that we love one fruit and that we don’t like the others on the list. Here’s how we can do so:
# Create shopping list
shopping_list = ["banana", "apple", "orange", "lemon"]
# Print the one I like
for fruit in shopping_list:
    if fruit == "lemon":
        print(f"I love {fruit}")
    else:
        print(f"I don't like {fruit}")
>>>
I don't like banana
I don't like apple
I don't like orange
I love lemon
Another example could be the following. Suppose we have a list of numbers and we want to print just the even ones. Here’s how we can do so:
# Create list
numbers = [1,2,3,4,5,6,7,8]
# Create empty list
even_list = []
# Collect the even numbers and print them
for even in numbers:
    if even % 2 == 0:
        even_list.append(even)
print(even_list)
>>>
[2, 4, 6, 8]
NOTE:
If you are not familiar with the syntax %2 == 0, it means that we are
dividing a number by 2 and expecting a remainder of 0. In other words,
we are asking our program to pick out the even numbers.
So, in the above example, we’ve created a list of numbers. Then, we’ve created an empty list and, inside the loop, appended every even number to it. This way, we’ve created a list of even numbers from a list of "general" numbers.
Now… this way of creating new lists with loops and statements is a little "heavy": it requires a lot of code. We can get the same results in a more concise way using list comprehension.
For example, to create a list with even numbers we can use list comprehension like so:
# Create list
numbers = [1,2,3,4,5,6,7,8]
# Create list of even numbers
even_numbers = [even for even in numbers if even %2 == 0]
# Print even list
print(even_numbers)
>>>
[2, 4, 6, 8]
So, list comprehension directly creates a new list, and we define the condition inside it. As we can see, we get the same result as before, but in just one line of code: not bad!
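A list comprehension can also transform the elements while filtering them. As a sketch of the general shape, the example below squares the even numbers and compares the result with the equivalent loop. The variable names are illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# General shape: [expression for item in iterable if condition]
# Here the expression squares each number that passes the filter
even_squares = [number ** 2 for number in numbers if number % 2 == 0]
print(even_squares)

# The equivalent loop, for comparison
even_squares_loop = []
for number in numbers:
    if number % 2 == 0:
        even_squares_loop.append(number ** 2)
print(even_squares == even_squares_loop)
```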
Now, let’s create a list with comments on the fruit I love (and the fruit I don’t) with list comprehension:
# Create shopping list
shopping_list = ["banana", "apple", "orange", "lemon"]
# Create commented list and print it
commented_list = [f"I love {fruit}" if fruit == "lemon"
                  else f"I don't like {fruit}"
                  for fruit in shopping_list]
print(commented_list)
>>>
["I don't like banana", "I don't like apple", "I don't like orange",
'I love lemon']
So, we got the same result as before, but with a single expression. Note that here the if…else comes before the for: it chooses which string to store for every fruit, rather than filtering fruits out. The only other difference is that here we’ve printed a list (because list comprehension creates one!), while before we just printed the results one by one.
List of lists
There is also the possibility to create lists of lists, that is, lists nested inside another list. This is useful when we want to group related records into a single structure.
For example, consider we want to create a list of students and their grades. We could create something like this:
# Create list with students and their grades
students = [
    ["John", [85, 92, 78, 90]],
    ["Emily", [77, 80, 85, 88]],
    ["Michael", [90, 92, 88, 94]],
    ["Sophia", [85, 90, 92, 87]]
]
This is a useful notation if, for example, we want to calculate the mean grade for each student. We can do it like so:
# Iterate over the list
for student in students:
    name = student[0]  # Access names
    grades = student[1]  # Access grades
    average_grade = sum(grades) / len(grades)  # Calculate mean grades
    print(f"{name}'s average grade is {average_grade:.2f}")
>>>
John's average grade is 86.25
Emily's average grade is 82.50
Michael's average grade is 91.00
Sophia's average grade is 88.50
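Since every inner list has exactly two items, we can also unpack them directly in the loop header, and we can chain indexes to reach a single grade. A minimal sketch on a shortened version of the same data:

```python
students = [
    ["John", [85, 92, 78, 90]],
    ["Emily", [77, 80, 85, 88]],
]

# Unpack each inner list into a name and a list of grades
best_grades = []
for name, grades in students:
    best_grades.append(max(grades))
    print(name, max(grades))

# Chained indexes reach into the nested lists:
# students[1] is Emily's record, [1] her grades, [0] the first grade
first_grade_emily = students[1][1][0]
print(first_grade_emily)
```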
Tuples
Tuples are another data structure type in Python. They are defined with round brackets (parentheses) and, like lists, can contain any data type, with the elements separated by commas. So, for example, we can define a tuple like so:
# Define a tuple and print it
my_tuple = (1, 3.0, "John")
print(my_tuple)
>>>
(1, 3.0, 'John')
The difference between a tuple and a list is that a tuple is immutable. This means that the elements of a tuple cannot be changed. So, for example, if we try to append a value to a tuple we get an error:
# Create a tuple with names
names = ("James", "John", "Elizabeth")
# Try to append a name
names.append("Liza")
>>>
AttributeError: 'tuple' object has no attribute 'append'
So, since we can’t modify tuples, they are useful when we want our data to be immutable; for example, in situations where we don’t want to make mistakes.
A practical example may be the cart of an e-commerce site. We may want this kind of data to be immutable so that we don’t make any mistakes when manipulating it. Imagine someone bought a shirt, a pair of shoes, and a watch from our store. We may report this data with quantity and price into one tuple:
# Create a cart as a tuple
cart = (
    ("Shirt", 2, 19.99),
    ("Shoes", 1, 59.99),
    ("Watch", 1, 99.99)
)
Of course, to be precise, this is a tuple of tuples.
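As a sketch of how such a cart could be read, the loop below unpacks each inner tuple and sums up the total. It assumes the three fields are item name, quantity, and unit price, in that order.

```python
cart = (
    ("Shirt", 2, 19.99),
    ("Shoes", 1, 59.99),
    ("Watch", 1, 99.99),
)

# Unpack each inner tuple into item, quantity and unit price
total = 0
for item, quantity, price in cart:
    total += quantity * price

# Format with two decimals to hide floating-point rounding noise
print(f"Total: {total:.2f}")
```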
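Since tuples are immutable, they are generally a bit lighter than lists in terms of memory and speed, meaning they save our computer’s resources. When it comes to reading data (accessing elements by index, slicing, and looping), we can use the exact same code as we’ve seen for lists, so we won’t write it again. The operations that modify a list (such as append(), extend(), remove(), or del on an element) don’t work on tuples.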
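A short sketch of what immutability means in practice: reading from a tuple works as it does for lists, while assigning to a position raises an error. To "change" a tuple, we build a new one.

```python
names = ("James", "John", "Elizabeth")

# Reading works exactly as with lists
print(names[0])
print(names[1:])

# Writing does not: item assignment raises TypeError
try:
    names[0] = "Alexander"
except TypeError as error:
    print(f"TypeError: {error}")

# To "change" a tuple we build a new one instead
updated_names = ("Alexander",) + names[1:]
print(updated_names)
```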
Finally, similarly to lists, we can create a tuple with the built-in function tuple() like so:
# Create a tuple in a range
my_tuple = tuple(range(1, 10))
print(my_tuple)
>>>
(1, 2, 3, 4, 5, 6, 7, 8, 9)
Dictionaries
A dictionary is a way to store data as pairs of keys and values. This is how we can create one:
# Create a dictionary
my_dictionary = {'key_1':'value_1', 'key_2':'value_2'}
So, we create a dictionary with curly brackets and we store key-value pairs in it, with each key separated from its value by a colon. The key-value pairs are then separated by commas.
Now, let’s see how we can manipulate dictionaries.
Dictionaries manipulation
The values of a dictionary can be of any type, while the keys can be of any hashable (in practice, immutable) type: strings, integers, floats, or tuples, but not lists. So, for example, we can create a dictionary like so:
# Create a dictionary of numbers and print it
numbers = {1:'one', 2:'two', 3:'three'}
print(numbers)
>>>
{1: 'one', 2: 'two', 3: 'three'}
But we can also create one like this, mixing different types:
# Create a dictionary of numbers and print it
numbers = {'one':1, 'two':2.0, 3:'three'}
print(numbers)
>>>
{'one': 1, 'two': 2.0, 3: 'three'}
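One restriction applies to keys: they must be hashable, which in practice means immutable. As a minimal sketch with made-up data, a tuple works as a key, while a list does not.

```python
# A tuple is immutable, so it can be used as a key
coordinates = {(0, 0): "origin", (1, 0): "east"}
print(coordinates[(0, 0)])

# A list is mutable, so using it as a key raises TypeError
list_key_rejected = False
try:
    invalid = {[0, 0]: "origin"}
except TypeError as error:
    list_key_rejected = True
    print(f"TypeError: {error}")
```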
Choosing the type for values and keys depends on the problem we need to solve. Anyway, considering the dictionary we’ve seen before, we can access both values and keys like so:
# Access values and keys
keys = list(numbers.keys())
values = tuple(numbers.values())
# Print values and keys
print(f"The keys are: {keys}")
print(f"The values are: {values}")
>>>
The keys are: ['one', 'two', 3]
The values are: (1, 2.0, 'three')
So, if our dictionary is called numbers, we access its keys with numbers.keys(), and with numbers.values() we access its values. Also, note that we have created a list with the keys and a tuple with the values using the list() and tuple() built-in functions we’ve seen before.
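When we need keys and values together, dictionaries also provide the items() method, which yields one (key, value) tuple per entry. A minimal sketch on the same dictionary:

```python
numbers = {'one': 1, 'two': 2.0, 3: 'three'}

# items() yields (key, value) tuples, one per entry
pairs = list(numbers.items())
print(pairs)

# Unpacking the pairs in a loop gives us key and value together
for key, value in numbers.items():
    print(f"{key} -> {value}")
```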
Of course, we can also iterate over dictionaries. For example, suppose we want to print the values that are greater than a certain threshold:
# Create a shopping list with fruits and prices
shopping_list = {'banana':2, 'apple':1, 'orange':1.5}
# Iterate over the values
for value in shopping_list.values():
    # Values greater than threshold
    if value > 1:
        print(value)
>>>
2
1.5
Like lists, dictionaries are mutable. So, if we want to add a new entry to a dictionary, we have to define both the key and the value to add. We can do it like so:
# Create the dictionary
person = {'name': 'John', 'age': 30}
# Add value and key and print
person['city'] = 'New York'
print(person)
>>>
{'name': 'John', 'age': 30, 'city': 'New York'}
To modify a value of a dictionary, we need to access its key:
# Create a dictionary
person = {'name': 'John', 'age': 30}
# Change age value and print
person['age'] = 35
print(person)
>>>
{'name': 'John', 'age': 35}
To delete a key-value pair from a dictionary, we need to access its key:
# Create dictionary
person = {'name': 'John', 'age': 30}
# Delete age and print
del person['age']
print(person)
>>>
{'name': 'John'}
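Accessing or deleting a key that doesn't exist raises a KeyError. The sketch below shows a few built-in ways to handle that case safely: the in operator, get(), and pop(). The 'city' key and the 'unknown' default are only illustrative.

```python
person = {'name': 'John', 'age': 30}

# `in` checks whether a key exists before we access it
has_city = 'city' in person
print(has_city)

# get() returns a default (None unless we pass one) instead of raising
city = person.get('city', 'unknown')
print(city)

# pop() deletes a key and returns its value
age = person.pop('age')
print(age, person)
```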
Nested dictionaries
We have seen before that we can create lists of lists and tuples of tuples. Similarly, we can create nested dictionaries. Suppose, for example, we want to create a dictionary to store the data related to a class of students. We can do it like so:
# Create a classroom dictionary
classroom = {
    'student_1': {
        'name': 'Alice',
        'age': 15,
        'grades': [90, 85, 92]
    },
    'student_2': {
        'name': 'Bob',
        'age': 16,
        'grades': [80, 75, 88]
    },
    'student_3': {
        'name': 'Charlie',
        'age': 14,
        'grades': [95, 92, 98]
    }
}
So, the data of each student are represented as a dictionary and all the dictionaries are stored in a single dictionary, representing the classroom. As we can see, the values of a dictionary can even be lists (or tuples, if we’d like). In this case, we’ve used lists to store the grades of each student.
To print the values of one student, we just need to access the corresponding key of the classroom dictionary; in this case, the keys are the student identifiers. This means we can do it like so:
# Access student_3 and print
student_3 = classroom['student_3']
print(student_3)
>>>
{'name': 'Charlie', 'age': 14, 'grades': [95, 92, 98]}
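To reach a value inside an inner dictionary, we chain the keys. As a sketch on a shortened version of the classroom, the loop below also computes each student's average grade. The averages dictionary is an illustrative addition.

```python
classroom = {
    'student_1': {'name': 'Alice', 'age': 15, 'grades': [90, 85, 92]},
    'student_2': {'name': 'Bob', 'age': 16, 'grades': [80, 75, 88]},
}

# Chain the keys to reach a value inside the inner dictionary
print(classroom['student_2']['name'])

# Loop over the inner dictionaries and compute each average grade
averages = {}
for student in classroom.values():
    grades = student['grades']
    averages[student['name']] = sum(grades) / len(grades)
print(averages)
```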
Dictionary comprehension
Dictionary comprehension allows us to create dictionaries concisely and efficiently. It’s similar to list comprehension but, instead of creating a list, it creates a dictionary.
Suppose we have a dictionary where we have stored some objects and their prices. We want to know the objects that cost less than a certain threshold. We can do it like so:
# Define initial dictionary
products = {'shoes': 100, 'watch': 50, 'smartphone': 250, 'tablet': 120}
# Define threshold
max_price = 150
# Filter for threshold
products_to_buy = {product: price for product, price in products.items() if price <= max_price}
# Print filtered dictionary
print(products_to_buy)
>>>
{'shoes': 100, 'watch': 50, 'tablet': 120}
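So, the syntax to use dictionary comprehension is:
new_dict = {key: value for key, value in iterable if condition}
Where iterable is any iterable Python object that yields key-value pairs (for example, the items() of another dictionary, a list of tuples, or a zip() of two lists), and the if condition part is optional.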
Creating dictionaries with the "standard" method would require a lot of code, with conditions, loops, and statements. Instead, as we can see, dictionary comprehension allows us to create a dictionary, based on conditions, with just one line of code.
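To make the comparison concrete, here is a sketch of the same price filter written both ways. The names products_loop and products_comprehension are illustrative.

```python
products = {'shoes': 100, 'watch': 50, 'smartphone': 250, 'tablet': 120}
max_price = 150

# "Standard" approach: an empty dictionary filled inside a loop
products_loop = {}
for product, price in products.items():
    if price <= max_price:
        products_loop[product] = price

# Dictionary comprehension: same logic in a single expression
products_comprehension = {product: price
                          for product, price in products.items()
                          if price <= max_price}

print(products_loop == products_comprehension)
```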
Dictionary comprehension is especially useful when we need to create a dictionary retrieving data from other sources or data structures. For example, say we need to create a dictionary retrieving values from two lists. We can do it like so:
# Define names and cities in lists
names = ['John', 'Jane', 'Bob', 'Alice']
cities = ['New York', 'Boston', 'London', 'Rome']
# Create dictionary from lists and print results
name_city_dict = {name: city for name, city in zip(names, cities)}
print(name_city_dict)
>>>
{'John': 'New York', 'Jane': 'Boston', 'Bob': 'London', 'Alice': 'Rome'}
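One thing to watch with zip() is that it pairs elements by position and stops at the shorter list. The sketch below deliberately uses a cities list that is one element short to show the effect.

```python
names = ['John', 'Jane', 'Bob', 'Alice']
cities = ['New York', 'Boston', 'London']  # one city fewer than names

# zip() stops at the shorter list, so 'Alice' gets no entry
name_city = {name: city for name, city in zip(names, cities)}
print(name_city)

# When no filtering or transformation is needed, dict(zip(...)) is equivalent
same_result = dict(zip(names, cities)) == name_city
print(same_result)
```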
Data frames

A data frame is a two-dimensional data structure consisting of columns and rows. So, it is somewhat similar to a spreadsheet or a table in an SQL database. Data frames have the following characteristics:
- Each row represents an individual observation or record.
- Each column represents a variable or a specific attribute of the data.
- They have labeled rows (called indexes) and columns, making it easy to manipulate the data.
- The columns can contain different types of data, like integers, strings, or floats. Even a single column can contain different data types.
While data frames are the typical data structure used in Data Analysis and Data Science, it is not uncommon for a Python Software Engineer to need to manipulate one, and this is why we’re having an overview of data frames.
Here’s how a data frame appears:

So, on the left (in the blue rectangle) we can see the index, meaning the row labels (by default, integers counting up from 0). We can then see that a data frame can contain different types of data. In particular, the column "Age" contains different data types (one string and two integers).
Basic data frames manipulation with Pandas
While a newer library for manipulating data frames called "Polars" has recently started circulating, here we’ll see some data manipulation with Pandas, which is still the most used as of today.
First of all, generally, we can create data frames by importing data from .xlsx or .csv files. In Pandas we can do it like so:
import pandas as pd
# Import csv file
my_dataframe = pd.read_csv('a_file.csv')
# Import xlsx
my_dataframe_2 = pd.read_excel('a_file_2.xlsx')
We can also create a data frame directly in our code, for example from a dictionary:
import pandas as pd
# Create a dictionary with different types of data
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': ['twenty-five', 30, 27],
    'City': ['New York', 'London', 'Sydney'],
    'Salary': [50000, 60000.50, 45000.75],
    'Is_Employed': [True, True, False]
}
# Create the dataframe
df = pd.DataFrame(data)
This is the data frame we’ve shown above. So, as we can see, we first create a dictionary, and then we convert it to a data frame with pd.DataFrame().
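Once a data frame exists, its shape and dtypes attributes give a quick summary of it. The sketch below uses a shortened version of the data above and assumes Pandas is installed. Note how the mixed "Age" column is stored with the generic object dtype.

```python
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': ['twenty-five', 30, 27],
    'Salary': [50000, 60000.50, 45000.75],
}
df = pd.DataFrame(data)

# shape is (number of rows, number of columns)
print(df.shape)

# Each column has its own dtype; a column mixing strings and
# integers, like 'Age', is stored with the generic 'object' dtype
print(df.dtypes)
```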
We have three possibilities to visualize a data frame. Suppose we have a data frame called df:
- The first one is print(df).
- The second one is df.head(), which shows the first 5 rows of our data frame. In case we have a data frame with a lot of rows, we can show more than the first five. For example, df.head(20) shows the first 20.
- The third one is df.tail(), which works exactly like head(), but shows the last rows.
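The three options above are easier to appreciate on a longer data frame. The sketch below builds a hypothetical one with 100 rows (the column name "number" is made up) and assumes Pandas is installed.

```python
import pandas as pd

# A hypothetical data frame with 100 rows, just to have something to slice
df = pd.DataFrame({'number': list(range(100))})

first_rows = df.head()      # first 5 rows by default
first_twenty = df.head(20)  # first 20 rows
last_rows = df.tail(3)      # last 3 rows

print(len(first_rows), len(first_twenty), len(last_rows))
print(last_rows)
```

Both head() and tail() return a new, smaller data frame, so the original df is left untouched.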
On the side of visualization, using the above df, this is what df.head() shows:

And this is what print(df) shows:

In the case of small data sets like this one, the difference is only a matter of taste (I prefer head() because it "shows the tabularity" of data). But in the case of large data sets, head() is much more practical, because it shows just a handful of rows. Try it, and let me know!
Consider that Pandas is a very wide library, meaning it allows us to manipulate tabular data in a variety of ways, so it would deserve an article of its own. Here we want to show just the very basics, so we’ll see how we can add and delete a column (each column of a data frame is a "Pandas Series").
Suppose we want to add a column to the data frame df we’ve seen above that tells us whether people are married or not. We can do it like so:
# Add marital status
df["married"] = ["yes", "yes", "no"]
NOTE:
This is the same notation we used to add values to a dictionary.
Go back in the article and compare the two methods.
And showing the head we have:

To delete one column:
# Delete the "Is_Employed" column
df = df.drop('Is_Employed', axis=1)
And we get:

Note that we need to use axis=1 because here we are telling Pandas to remove a column: a data frame is a two-dimensional data structure, and axis=1 refers to its columns.
Instead, if we want to drop a row, we need to use axis=0, which refers to the rows. For example, suppose we want to delete the row associated with the index 1 (that is, the second row because, again, we start counting from 0):
# Delete the second row
df = df.drop(1, axis=0)
And we get:

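Two details about drop() are easy to miss: it returns a new data frame instead of changing the original, and it also accepts the columns= and index= keywords as an alternative to axis. A minimal sketch on a shortened version of the data frame, assuming Pandas is installed:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'City': ['New York', 'London', 'Sydney'],
    'Is_Employed': [True, True, False],
})

# drop() returns a new data frame; df itself is unchanged
# until we reassign the result
without_column = df.drop('Is_Employed', axis=1)
print(list(df.columns))
print(list(without_column.columns))

# The columns= and index= keywords are an alternative to axis=1 and axis=0
without_row = df.drop(index=1)
print(list(without_row.index))
```

This is why the examples above reassign the result with df = df.drop(...).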
Conclusions
In this article, we’ve seen the most commonly used data structures in Python. They are not the only ones, but they are the ones you’ll meet most often.
Also, there is no right or wrong in using one rather than another: we just need to understand what data we need to store and choose the data structure that best fits the task.
I hope this article helped you understand the usage of these data structures and when to use them.