Lists, Tuples, Dictionaries, And Data Frames in Python: The Complete Guide

All you need to know to master the most used data structures in Python

Image by Pexels on Pixabay

If you’ve started learning Python, whether you want to be a Software Engineer or a Data Scientist, you absolutely need to master data structures.

Python has a lot of data structures that allow us to store data. In this article, we’ll dive into the most used ones. So, if you’re starting your career and need to learn data structures, then this article is definitely for you.

Here’s what we’ll cover:

Table of Contents:

Lists
  Definition and creation examples
  Lists manipulation
  List comprehension
  List of lists
Tuples
Dictionaries
  Dictionaries manipulation
  Nested dictionaries
  Dictionary comprehension
Data frames
   Basic data frames manipulation with Pandas

Lists

Definition and creation examples

In Python, a list is a collection of ordered elements that can be of any type: strings, integers, floats, etc…

To create a list, the items must be inserted between square brackets and separated by a comma. For example, here’s how we can create a list of integers:

# Create list of integers
my_integers = [1, 2, 3, 4, 5, 6]

But lists can also have "mixed" types stored inside them. For example, let’s create a list with both integers and strings:

# Create a mixed list
mixed_list = [1, 3, "dad", 101, "apple"]

To create a list, we can also use the Python built-in function list(). This is how we can use it:

# Create list and print it
my_list = list((1, 2, 3, 4, 5))
print(my_list)

>>>  

    [1, 2, 3, 4, 5]

This built-in function is very useful in some particular cases. For example, let’s say we want to create a list of the numbers from 1 to 9. Here’s how we can do so:

# Create a list in a range
my_list = list(range(1, 10))
print(my_list)

>>>

  [1, 2, 3, 4, 5, 6, 7, 8, 9]
NOTE:

Remember that the built-in function "range" includes the first value,
and excludes the last one.
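
To make the note above concrete, here is a minimal sketch of how the boundaries of range() behave. The variable names are illustrative only:

```python
# range(start, stop) includes start and excludes stop
one_to_nine = list(range(1, 10))

# To include 10, the stop value has to be 11
one_to_ten = list(range(1, 11))

# An optional third argument sets the step between values
even_numbers = list(range(0, 10, 2))

print(one_to_nine)
print(one_to_ten)
print(even_numbers)
```

So, if the last value must be part of the list, the stop argument has to be one step past it.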

Now, let’s see how we can manipulate lists.

Lists manipulation

Thanks to the fact that lists are mutable, we have lots of possibilities to manipulate them. For example, let’s say we have a list of names, but we made a mistake and we want to change one. Here’s how we can do so:

# List of names
names = ["James", "Richard", "Simon", "Elizabeth", "Tricia"]
# Change the wrong name
names[0] = "Alexander"
# Print list
print(names)

>>>

    ['Alexander', 'Richard', 'Simon', 'Elizabeth', 'Tricia']

So, in the above example, we’ve changed the first name of the list from James to Alexander.

NOTE:

In case you didn't know, note that in Python the first element of a
sequence is always accessed with the index 0, regardless of the type
we're manipulating (a list, a tuple, or a string). So, in the above
example, "names[0]" represents the first element of the list "names".
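
Beyond the first element, Python also supports negative positions and slices. The following is a small sketch on the same list of names; the variable names first, last, first_two, and middle are just illustrative:

```python
names = ["James", "Richard", "Simon", "Elizabeth", "Tricia"]

# Position 0 is the first element, position -1 is the last one
first = names[0]
last = names[-1]

# A slice [start:stop] includes start and excludes stop
first_two = names[:2]
middle = names[1:4]

print(first, last)
print(first_two)
print(middle)
```

Slices follow the same rule as range(): the start position is included and the stop position is excluded.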

Now, suppose we’ve forgotten a name. We can add it to our list like so:

# List of names
names = ["James", "Richard", "Simon", "Elizabeth", "Tricia"]
# Append another name
names.append("Alexander")
# Print list
print(names) 

>>>

    ['James', 'Richard', 'Simon', 'Elizabeth', 'Tricia', 'Alexander']

If we need to concatenate two lists, we have two possibilities: the + operator or the extend() method. Let’s see them:

# Create list1
list1 = [1, 2, 3]
# Create list2
list2 = [4, 5, 6]
# Concatenate lists
concatenated_list = list1 + list2
# Print concatenated list
print(concatenated_list)

>>>

  [1, 2, 3, 4, 5, 6]

So, the + operator creates a new list that contains the elements of both lists. Let’s see the extend() method:

# Create list1
list1 = [1, 2, 3]
# Create list2
list2 = [4, 5, 6]
# Extend list1 with list2
list1.extend(list2)
# Print new list1
print(list1)

>>>

  [1, 2, 3, 4, 5, 6]

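As we can see, the results are the same, but the syntax is different. There is also a practical difference: the + operator creates a new list and leaves the original ones unchanged, while extend() modifies list1 in place by adding the elements of list2 to it.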
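
The two approaches also differ in what happens to the original list. Here is a minimal sketch of that difference (the names combined and result are illustrative):

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# The + operator builds a new list and leaves list1 untouched
combined = list1 + list2
print(list1)
print(combined)

# extend() changes list1 in place and returns None
result = list1.extend(list2)
print(result)
print(list1)
```

So, if the original list must stay unchanged, the + operator is the safer choice, while extend() avoids creating a second list.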

If we want to remove elements, we have two possibilities: we can use the remove() method or del. Let’s see them:

# Create list
my_list = [1, 2, 3, 'four', 5.0]
# Remove one element and print
my_list.remove('four')
print(my_list)

>>>

  [1, 2, 3, 5.0]

Let’s see the other method:

# Create list
my_list = [1, 2, 3, 'four', 5.0]
# Delete one element and print
del my_list[3]
print(my_list)

>>>

    [1, 2, 3, 5.0]

So, we get the same results with both methods, but remove() lets us explicitly write the value to remove (it deletes the first element that matches), while del needs the position of the element in the list.

NOTE:

If you've gained familiarity with accessing positions, in the above
example my_list[3] is 'four'. Because, remember: in Python we start counting
positions from 0.
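
A few edge cases are worth knowing when removing elements. The sketch below uses an illustrative list called letters and also shows pop(), a list method that was not covered above:

```python
letters = ["a", "b", "a", "c"]

# remove() deletes only the first matching value
letters.remove("a")
print(letters)

# pop() removes the last element and returns it
last = letters.pop()
print(last)
print(letters)

# remove() raises a ValueError if the value is not in the list
missing = False
try:
    letters.remove("z")
except ValueError:
    missing = True
print(missing)
```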

List comprehension

There are a lot of cases where we need to create lists starting from existing lists, generally applying some filters to the existing data. To do so, we have two possibilities:

  1. We use loops and statements.
  2. We use list comprehension.

In practice, these are two ways to write the same thing, but list comprehension is more concise and elegant.

But before we discuss these methods, you may need a deep overview of loops and statements. Here are a couple of articles I wrote in the past that may help you:

Loops and statements in Python: A deep understanding (with examples)

Python Loops: A Complete Guide On How To Iterate in Python

Now, let’s see a couple of examples using loops and statements directly.

Suppose we have a shopping list. We want our program to print that we love one fruit and that we don’t like the others on the list. Here’s how we can do so:

# Create shopping list
shopping_list = ["banana", "apple", "orange", "lemon"]
# Print the one I like
for fruit in shopping_list:
    if fruit == "lemon":
        print(f"I love {fruit}")
    else:
        print(f"I don't like {fruit}")

>>>

    I don't like banana
    I don't like apple
    I don't like orange
    I love lemon

Another example could be the following. Suppose we have a list of numbers and we want to print just the even ones. Here’s how we can do so:

# Create list
numbers = [1,2,3,4,5,6,7,8]
# Create empty list
even_list = []
# Collect the even numbers
for number in numbers:
    if number % 2 == 0:
        even_list.append(number)

print(even_list)

>>>

    [2, 4, 6, 8]
NOTE:

If you are not familiar with the syntax %2 == 0, it means that we are
dividing a number by 2 and expecting a remainder of 0. In other words,
we are asking our program to pick out the even numbers.

So, in the above example, we’ve created a list of numbers. Then, we’ve created an empty list and, inside the loop, we’ve appended all the even numbers to it. This way, we’ve created a list of even numbers from a list with "general" numbers.

Now… this way of creating new lists with loops and statements is a little "heavy". I mean: it requires a lot of code. We can get the same results in a more concise way using list comprehension.

For example, to create a list with even numbers we can use list comprehension like so:

# Create list
numbers = [1,2,3,4,5,6,7,8]
# Create list of even numbers
even_numbers = [even for even in numbers if even %2 == 0]
# Print even list
print(even_numbers)

>>>

    [2, 4, 6, 8]

So, list comprehension directly creates a new list, and we define the condition inside it. As we can see, we get the same result as before, but in just one line of code: not bad!
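
List comprehension can also transform the elements, not only filter them. As a small sketch (the names squares and even_squares are illustrative):

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# Transform every element: the square of each number
squares = [n ** 2 for n in numbers]

# Transform and filter together: squares of the even numbers only
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(squares)
print(even_squares)
```

The expression before "for" decides what goes into the new list, and the optional "if" at the end decides which elements are considered.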

Now, let’s create a list with comments on the fruit I love (and the fruit I don’t) with list comprehension:

# Create shopping list
shopping_list = ["banana", "apple", "orange", "lemon"]
# Create commented list and print it
commented_list = [f"I love {fruit}" if fruit == "lemon"
                  else f"I don't like {fruit}"
                  for fruit in shopping_list]
print(commented_list)

>>>

  ["I don't like banana", "I don't like apple", "I don't like orange",
   'I love lemon']

So, we got the same result as before, but with a single expression. The only difference is that here we’ve printed a list (because list comprehension creates one!), while before we just printed the results.

List of lists

We can also create lists of lists, meaning lists nested inside another list. This is useful when we want to represent grouped data as a single list.

For example, consider we want to create a list of students and their grades. We could create something like this:

# Create list with students and their grades
students = [
    ["John", [85, 92, 78, 90]],
    ["Emily", [77, 80, 85, 88]],
    ["Michael", [90, 92, 88, 94]],
    ["Sophia", [85, 90, 92, 87]]
]

This is a useful notation if, for example, we want to calculate the mean grade for each student. We can do it like so:

# Iterate over the list
for student in students:
    name = student[0] # Access names
    grades = student[1] # Access grades
    average_grade = sum(grades) / len(grades) # Calculate mean grades
    print(f"{name}'s average grade is {average_grade:.2f}")

>>>

    John's average grade is 86.25
    Emily's average grade is 82.50
    Michael's average grade is 91.00
    Sophia's average grade is 88.50
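
Since each inner list has exactly two elements, the loop can also unpack them directly instead of using positions. The following sketch combines this with list comprehension to find the student with the highest average; the names averages, best_name, and best_average are illustrative:

```python
students = [
    ["John", [85, 92, 78, 90]],
    ["Emily", [77, 80, 85, 88]],
    ["Michael", [90, 92, 88, 94]],
    ["Sophia", [85, 90, 92, 87]]
]

# Unpack each inner list into name and grades while iterating
averages = [[name, sum(grades) / len(grades)] for name, grades in students]

# Pick the pair with the highest average (the second element of each pair)
best_name, best_average = max(averages, key=lambda pair: pair[1])

print(averages)
print(f"{best_name} has the highest average: {best_average:.2f}")
```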

Tuples

Tuples are another data structure type in Python. They are defined with round brackets and, like lists, can contain any data type, with elements separated by commas. So, for example, we can define a tuple like so:

# Define a tuple and print it
my_tuple = (1, 3.0, "John")
print(my_tuple)

>>>

    (1, 3.0, 'John')

The difference between a tuple and a list is that a tuple is immutable. This means that the elements of a tuple cannot be changed. So, for example, if we try to append a value to a tuple we get an error:

# Create a tuple with names
names = ("James", "John", "Elizabeth")
# Try to append a name
names.append("Liza")

>>>

    AttributeError: 'tuple' object has no attribute 'append'

So, since we can’t modify tuples, they are useful when we want our data to be immutable; for example, in situations where we don’t want to make mistakes.

A practical example may be the cart of an e-commerce site. We may want this kind of data to be immutable so that we don’t make any mistakes when manipulating it. Imagine someone bought a shirt, a pair of shoes, and a watch from our e-commerce site. We may store this data, with quantity and price, in one tuple:

# Create a cart as a tuple
cart = (
    ("Shirt", 2, 19.99),
    ("Shoes", 1, 59.99),
    ("Watch", 1, 99.99)
)

Of course, to be precise, this is a tuple of tuples.

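Since tuples are immutable, they are typically a bit lighter than lists, meaning they can save some of our computer’s resources. When it comes to reading data (accessing elements by position, slicing, or iterating), we can use the exact same code we’ve seen for lists, so we won’t write it again. What doesn’t work is anything that modifies the tuple, like append(), remove(), or assigning a new value to a position.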
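
As a minimal sketch of what does and does not work with tuples, here is the cart again. The total calculation and the "Hat" item are illustrative assumptions, not part of the original example:

```python
cart = (
    ("Shirt", 2, 19.99),
    ("Shoes", 1, 59.99),
    ("Watch", 1, 99.99)
)

# Reading works like with lists: positions and unpacking
first_item = cart[0]
name, quantity, price = first_item
print(name, quantity, price)

# Iterating works too: total cost as the sum of quantity * price
total = sum(item_quantity * item_price for _, item_quantity, item_price in cart)
print(f"Total: {total:.2f}")

# Modifying does not work: assigning to a position raises a TypeError
try:
    cart[0] = ("Hat", 1, 9.99)
except TypeError as error:
    print(f"Cannot modify a tuple: {error}")
```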

Finally, similarly to lists, we can create a tuple with the built-in function tuple() like so:

# Create a tuple in a range
my_tuple = tuple(range(1, 10))
print(my_tuple)

>>>

  (1, 2, 3, 4, 5, 6, 7, 8, 9)

Dictionaries

A dictionary is a way to store data that are coupled as keys and values. This is how we can create one:

# Create a dictionary
my_dictionary = {'key_1':'value_1', 'key_2':'value_2'}

So, we create a dictionary with curly brackets and we store in it key-value pairs, where each key is separated from its value by a colon. The key-value pairs are then separated by commas.

Now, let’s see how we can manipulate dictionaries.

Dictionaries manipulation

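The values of a dictionary can be of any type: strings, integers, floats, lists, and so on. Keys are more restricted: they must be hashable, which in practice means immutable types such as strings, numbers, or tuples. So, for example, we can create a dictionary like so: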

# Create a dictionary of numbers and print it
numbers = {1:'one', 2:'two', 3:'three'}
print(numbers)

>>>

    {1: 'one', 2: 'two', 3: 'three'}

But we can also create one like this:

# Create a dictionary of numbers and print it
numbers = {'one':1, 'two':2.0, 3:'three'}
print(numbers)

>>>

  {'one': 1, 'two': 2.0, 3: 'three'}

Choosing the type for values and keys depends on the problem we need to solve. Anyway, considering the dictionary we’ve seen before, we can access both values and keys like so:

# Access values and keys
keys = list(numbers.keys())
values = tuple(numbers.values())
# Print values and keys
print(f"The keys are: {keys}")
print(f"The values are: {values}")

>>>

    The keys are: ['one', 'two', 3]
    The values are: (1, 2.0, 'three')

So, if our dictionary is called numbers, we access its keys with numbers.keys() and its values with numbers.values(). Also, note that we have created a list with the keys and a tuple with the values using the built-in functions list() and tuple() we’ve seen before.

Of course, we can also iterate over dictionaries. For example, suppose we want to print the values that are greater than a certain threshold:

# Create a shopping list with fruits and prices
shopping_list = {'banana':2, 'apple':1, 'orange':1.5}
# Iterate over the values
for values in shopping_list.values():
    # Values greater than threshold
    if values > 1:
        print(values)

>>>

    2
    1.5
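
The loop above reads only the values. When both the key and the value are needed, the items() method yields them in pairs. Here is a small sketch; the list expensive is an illustrative name:

```python
shopping_list = {'banana': 2, 'apple': 1, 'orange': 1.5}

# Collect the names of the fruits that cost more than the threshold
expensive = []
for fruit, price in shopping_list.items():
    if price > 1:
        expensive.append(fruit)

print(expensive)
```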

Like lists, dictionaries are mutable. So, if we want to add a value to a dictionary we have to define the key and the value to add to it. We can do it like so:

# Create the dictionary
person = {'name': 'John', 'age': 30}
# Add value and key and print
person['city'] = 'New York'
print(person)

>>>

    {'name': 'John', 'age': 30, 'city': 'New York'}

To modify a value of a dictionary, we need to access its key:

# Create a dictionary
person = {'name': 'John', 'age': 30}
# Change age value and print
person['age'] = 35
print(person)

>>>

    {'name': 'John', 'age': 35}

To delete a key-value pair from a dictionary, we need to access its key:

# Create dictionary
person = {'name': 'John', 'age': 30}
# Delete age and print
del person['age']
print(person)

>>>

    {'name': 'John'}
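
Accessing a key that does not exist raises a KeyError, so it can help to check first or to provide a default. The following sketch uses the in operator, get(), and pop(), which were not covered above; the default value 'unknown' is just an illustrative choice:

```python
person = {'name': 'John', 'age': 30}

# Check whether a key exists before using it
has_city = 'city' in person
print(has_city)

# get() returns a default instead of raising a KeyError
city = person.get('city', 'unknown')
print(city)

# pop() deletes a key and returns its value
age = person.pop('age')
print(age)
print(person)
```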

Nested dictionaries

We have seen before that we can create lists of lists and tuples of tuples. Similarly, we can create nested dictionaries. Suppose, for example, we want to create a dictionary to store the data related to a class of students. We can do it like so:

# Create a classroom dictionary
classroom = {
    'student_1': {
        'name': 'Alice',
        'age': 15,
        'grades': [90, 85, 92]
    },
    'student_2': {
        'name': 'Bob',
        'age': 16,
        'grades': [80, 75, 88]
    },
    'student_3': {
        'name': 'Charlie',
        'age': 14,
        'grades': [95, 92, 98]
    }
}

So, the data of each student are represented as a dictionary and all the dictionaries are stored in a unique dictionary, representing the classroom. As we can see, the values of a dictionary can even be lists (or tuples, if we’d like). In this case, we’ve used lists to store the grades of each student.

To print the values of one student, we just need to remember that, from the perspective of the classroom dictionary, we need to access the key and, in this case, the keys are the students themselves. This means we can do it like so:

# Access student_3 and print
student_3 = classroom['student_3']
print(student_3)

>>>

    {'name': 'Charlie', 'age': 14, 'grades': [95, 92, 98]}
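
To reach a value deeper in the structure, we can chain the keys (and positions) one after the other. The sketch below also computes the average grade of each student; the names charlie_first_grade and averages are illustrative:

```python
classroom = {
    'student_1': {'name': 'Alice', 'age': 15, 'grades': [90, 85, 92]},
    'student_2': {'name': 'Bob', 'age': 16, 'grades': [80, 75, 88]},
    'student_3': {'name': 'Charlie', 'age': 14, 'grades': [95, 92, 98]}
}

# Chain keys and positions: classroom -> student -> grades -> first grade
charlie_first_grade = classroom['student_3']['grades'][0]
print(charlie_first_grade)

# Iterate over the inner dictionaries and compute each average
averages = {}
for student in classroom.values():
    grades = student['grades']
    averages[student['name']] = sum(grades) / len(grades)

print(averages)
```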

Dictionary comprehension

Dictionary comprehension allows us to create dictionaries concisely and efficiently. It’s similar to list comprehension but, instead of creating a list, it creates a dictionary.

Suppose we have a dictionary where we have stored some objects and their prices. We want to know the objects that cost less than a certain threshold. We can do it like so:

# Define initial dictionary
products = {'shoes': 100, 'watch': 50, 'smartphone': 250, 'tablet': 120}
# Define threshold
max_price = 150
# Filter for threshold
products_to_buy = {product: price for product, price in products.items() if price <= max_price}
# Print filtered dictionary
print(products_to_buy)

>>>

    {'shoes': 100, 'watch': 50, 'tablet': 120}

So, the syntax to use dictionary comprehension is:

new_dict = {key:value for key, value in iterable}

Where iterable is any iterable Python object. It can be a list, a tuple, another dictionary, etc…

Creating dictionaries with the "standard" method would require a lot of code, with conditions, loops, and statements. Instead, as we can see, dictionary comprehension allows us to create a dictionary, based on conditions, with just one line of code.
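
For comparison, here is a sketch of the same price filter written both ways. The names products_loop and products_comprehension are illustrative:

```python
products = {'shoes': 100, 'watch': 50, 'smartphone': 250, 'tablet': 120}
max_price = 150

# "Standard" method: empty dictionary, loop, and condition
products_loop = {}
for product, price in products.items():
    if price <= max_price:
        products_loop[product] = price

# Dictionary comprehension: the same logic in one expression
products_comprehension = {product: price
                          for product, price in products.items()
                          if price <= max_price}

print(products_loop == products_comprehension)
```

Both versions build the same dictionary; the comprehension simply folds the loop and the condition into one expression.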

Dictionary comprehension is especially useful when we need to create a dictionary retrieving data from other sources or data structures. For example, say we need to create a dictionary retrieving values from two lists. We can do it like so:

# Define names and cities in lists
names = ['John', 'Jane', 'Bob', 'Alice']
cities = ['New York', 'Boston', 'London', 'Rome']
# Create dictionary from lists and print results
name_city_dict = {name: city for name, city in zip(names, cities)}
print(name_city_dict)

>>>

   {'John': 'New York', 'Jane': 'Boston', 'Bob': 'London', 'Alice': 'Rome'}
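
When no filtering or transformation is needed, passing zip() directly to the built-in dict() gives the same result. The sketch below also shows that zip() stops at the shortest list; the names name_city and short are illustrative:

```python
names = ['John', 'Jane', 'Bob', 'Alice']
cities = ['New York', 'Boston', 'London', 'Rome']

# dict() accepts an iterable of (key, value) pairs, such as zip()
name_city = dict(zip(names, cities))
print(name_city)

# zip() stops at the shortest input, so extra names are silently dropped
short = dict(zip(names, ['New York', 'Boston']))
print(short)
```

So, when the two lists may have different lengths, it is worth checking them before building the dictionary.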

Data frames

A data frame is the representation of tabular data. Image from the Pandas website here: https://pandas.pydata.org/docs/getting_started/index.html

A data frame is a two-dimensional data structure consisting of columns and rows. So, it is somewhat similar to a spreadsheet or a table in an SQL database. Data frames have the following characteristics:

  1. Each row represents an individual observation or record.
  2. Each column represents a variable or a specific attribute of the data.
  3. They have labeled rows (called indexes) and columns, making it easy to manipulate the data.
  4. The columns can contain different types of data, like integers, strings, or floats. Even a single column can contain different data types.

While data frames are the typical data structure used in the context of Data Analysis and Data Science, it is not uncommon that a Python Software Engineer may need to manipulate a data frame, and this is why we’re having an overview of data frames.

Here’s how a data frame appears:

A data frame. Image by Federico Trotta.

So, on the left (in the blue rectangle) we can see the index, meaning the row labels (by default, the row numbers starting from 0). We can then see that a data frame can contain different types of data. In particular, the column "Age" contains different data types (one string and two integers).

Basic data frames manipulation with Pandas

While a newer library for manipulating data frames called "Polars" has recently started circulating, here we’ll see some data manipulation with Pandas, which is still the most used as of today.

First of all, generally, we can create data frames by importing data from .xlsx or .csv files. In Pandas we can do it like so:

import pandas as pd

# Import csv file
my_dataframe = pd.read_csv('a_file.csv')

# Import xlsx
my_dataframe_2 = pd.read_excel('a_file_2.xlsx')

If we want to create a data frame from scratch, we can do it like so:

import pandas as pd

# Create a dictionary with different types of data
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': ['twenty-five', 30, 27],
    'City': ['New York', 'London', 'Sydney'],
    'Salary': [50000, 60000.50, 45000.75],
    'Is_Employed': [True, True, False]
}

# Create the dataframe
df = pd.DataFrame(data)

This is the data frame we’ve shown above. So, as we can see, we first create a dictionary, and then we convert it to a data frame by passing it to pd.DataFrame().
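
To check what Pandas made of each column, we can inspect the shape and the data types of the data frame. This is a minimal sketch, assuming Pandas is installed; the name age_dtype is illustrative:

```python
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': ['twenty-five', 30, 27],
    'City': ['New York', 'London', 'Sydney'],
    'Salary': [50000, 60000.50, 45000.75],
    'Is_Employed': [True, True, False]
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
print(df.shape)

# dtypes lists the data type Pandas assigned to each column
print(df.dtypes)

# The mixed "Age" column falls back to the generic object type
age_dtype = df['Age'].dtype
print(age_dtype)
```

A column that mixes strings and integers, like "Age", is stored with the generic object type, so numeric operations on it are not available until the data is cleaned.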

We have three possibilities to visualize a data frame. Suppose we have a data frame called df:

  1. The first one is print(df).
  2. The second one is df.head(), which shows the first 5 rows of our data frame. If we have a data frame with a lot of rows, we can show more than the first five. For example, df.head(20) shows the first 20.
  3. The third one is df.tail(), which works exactly like head() but shows the last rows.

Using the above df, this is what df.head() shows:

What df.head() shows. Image by Federico Trotta.

And this is what print(df) shows:

What print(df) shows. Image by Federico Trotta.

In the case of small data sets like this one, the difference is only a matter of taste (I prefer head() because it "shows the tabularity" of data). But in the case of large data sets, head() is much better. Try it, and let me know!

Consider that Pandas is a very broad library, meaning it allows us to manipulate tabular data in a variety of ways, so it deserves a dedicated article. Here we want to show just the very basics, so we’ll see how we can add and delete a column (each column of a data frame is a "Pandas Series").

Suppose we want to add a column to the data frame df we’ve seen above that is telling us if people are married or not. We can do it like so:

# Add marital status
df["married"] = ["yes", "yes", "no"]
NOTE:

This is the same notation we used to add values to a dictionary.
Go back in the article and compare the two methods.

And showing the head we have:

The data frame df with the marital status. Image by Federico Trotta.

To delete one column:

# Delete the "Is_Employed" column
df = df.drop('Is_Employed', axis=1)

And we get:

The data frame without the column related to employment data. Image by Federico Trotta.

Note that we need to use axis=1 because we are telling Pandas to remove a column: since a data frame is a two-dimensional data structure, axis=1 refers to the columns, while axis=0 refers to the rows.

Instead, if we want to drop a row, we need to use axis=0. For example, suppose we want to delete the row associated with the index 1 (that is, the second row because, again, we start counting from 0):

# Delete the second row 
df = df.drop(1, axis=0)

And we get:

The data frame without the second row. Image by Federico Trotta.
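
If the axis numbers are hard to remember, drop() also accepts the keyword arguments columns and index, which say explicitly what is being removed. Here is a small sketch on a reduced, illustrative version of the data frame, assuming Pandas is installed:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'City': ['New York', 'London', 'Sydney'],
    'Is_Employed': [True, True, False]
})

# Same effect as df.drop('Is_Employed', axis=1)
without_column = df.drop(columns=['Is_Employed'])
print(list(without_column.columns))

# Same effect as df.drop(1, axis=0)
without_row = df.drop(index=1)
print(list(without_row.index))

# drop() returns a new data frame: the original is unchanged
print(df.shape)
```

Note that drop() does not modify df itself, which is why the examples above assign the result back to df.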

Conclusions

So far, we’ve seen the most used data structures in Python. These are not the only ones, but surely the most used.

Also, there is no right or wrong in using one rather than another: we just need to understand what data we need to store and use the best data structure for this type of task.

I hope this article helped you understand the usage of these data structures and when to use them.

