
10 Ways to Add a Column to Pandas DataFrames

We often need to derive or create new columns

Photo by Austin Chan on Unsplash

DataFrame is a two-dimensional data structure with labeled rows and columns. We often need to add new columns as part of data analysis or feature engineering processes.

There are many different ways of adding new columns. The best one depends on the task at hand.

In this article, we’ll learn 10 ways to add a column to Pandas DataFrames.

Let’s start by creating a simple DataFrame using the DataFrame constructor of Pandas. We’ll pass the data as a Python dictionary whose keys are the column names and whose values are lists holding the values of each column.

import pandas as pd

# create DataFrame
df = pd.DataFrame(
    {
        "first_name": ["Jane", "John", "Max", "Emily", "Ashley"],
        "last_name": ["Doe", "Doe", "Dune", "Smith", "Fox"],
        "id": [101, 103, 143, 118, 128]
    }
)

# display DataFrame
df
df (image by author)

1. Use a constant value

We can add a new column of a constant value as follows:

df.loc[:, "department"] = "engineering"

# display DataFrame
df
df (image by author)
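As a minimal sketch of the same idea, plain bracket assignment also broadcasts a single value to every row. The small DataFrame below is made up for illustration and is not the one used in the article.

```python
import pandas as pd

# Illustrative DataFrame (not the article's df)
people = pd.DataFrame({"first_name": ["Jane", "John", "Max"], "id": [101, 103, 143]})

# A scalar on the right-hand side is repeated for every row
people["department"] = "engineering"
```

Both forms create the column if it does not exist yet and overwrite it if it does.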

2. Use array-like structure

We can use an array-like structure to add a new column. In this case, make sure the number of values in the array is the same as the number of rows in the DataFrame.

df.loc[:, "salary"] = [45000, 43000, 42000, 45900, 54000]

In the example above, we used a Python list. Let’s determine the values randomly with NumPy’s random module.

import numpy as np

df.loc[:, "salary"] = np.random.randint(40000, 55000, size=5)

# display DataFrame
df
df (image by author)
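The sketch below illustrates the length requirement and shows one way to make the random values reproducible. The seed value and the bonus column are assumptions added for the example.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with five rows
staff = pd.DataFrame({"id": [101, 103, 143, 118, 128]})

# A seeded generator returns the same salaries on every run
rng = np.random.default_rng(seed=42)
staff["salary"] = rng.integers(40000, 55000, size=len(staff))

# A list with fewer values than rows is rejected with a ValueError
try:
    staff["bonus"] = [1000, 2000]
    length_error = None
except ValueError as exc:
    length_error = exc
```

Using len(staff) for the size keeps the array in step with the DataFrame if the number of rows changes.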

3. Derive from other columns

It’s highly common to derive new columns from the existing ones.

Let’s do a simple one to demonstrate this case and create a name column by combining the first_name and last_name .

df.loc[:, "name"] = df["first_name"] + ' ' + df["last_name"]

# display DataFrame
df
df (image by author)

We can also use the cat function under the str accessor to combine (i.e. concatenate) strings.

df.loc[:, "name"] = df["first_name"].str.cat(df["last_name"], sep = ' ')
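One practical difference between the two approaches shows up when a value is missing. The sketch below uses a made-up DataFrame with a missing last name: the + operator yields a missing result for that row, while cat can substitute a replacement through its na_rep parameter.

```python
import pandas as pd

# Illustrative data with one missing last name
people = pd.DataFrame({"first_name": ["Jane", "Max"], "last_name": ["Doe", None]})

# With +, a missing part makes the whole result missing
plus_name = people["first_name"] + " " + people["last_name"]

# With cat, na_rep replaces missing parts; strip removes the leftover separator
cat_name = (
    people["first_name"]
    .str.cat(people["last_name"], sep=" ", na_rep="")
    .str.strip()
)
```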

4. The insert function

By default, a new column is added at the end, so it becomes the last column. If we need to add the new column at a specific location (e.g. as the first one), we can use the insert function.

For instance, in the previous example, having the name column as last while the first_name and last_name are at the beginning doesn’t seem nice.

Let’s drop the name column and add it back again but as the first column.

# drop the name column
df = df.drop(["name"], axis = 1)

# add the name column as the first column
df.insert(0, "name", df["first_name"].str.cat(df["last_name"], sep = ' '))

# display DataFrame
df
df (image by author)

The insert function takes three main arguments:

  • The first one is the index of the new column (0 for the first column, 1 for the second, and so on).
  • The second one is the column name.
  • The third one is the column values.
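Two details of insert are easy to miss: it modifies the DataFrame in place and returns None, and by default it refuses to add a column whose name already exists. The sketch below shows both on a made-up DataFrame; the initial column is hypothetical.

```python
import pandas as pd

# Illustrative DataFrame
people = pd.DataFrame({"first_name": ["Jane", "John"], "last_name": ["Doe", "Doe"]})

# insert changes the DataFrame in place, so the return value is None
result = people.insert(1, "initial", people["first_name"].str[0])

# Inserting a column name that already exists raises a ValueError by default
try:
    people.insert(0, "initial", "x")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
```

Because of the in-place behavior, writing people = people.insert(...) would replace the DataFrame with None.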

5. Pandas where function

The where function of Pandas allows for adding a new column with values determined using a condition based on other columns.

# initialize the column with all 0s
df.loc[:, "high_salary"] = 0

# update the values as 1 for the rows that do not fit the given condition
df.loc[:, "high_salary"] = df["high_salary"].where(df["salary"] <= 48000, 1)

# display DataFrame
df
df (image by author)

We added a new column called high_salary which takes a value of 1 if the salary is more than 48000 and 0 otherwise.

The where function updates the values that do not fit the condition. This is the reason why we specify the condition as being equal to or less than 48000.

The values for the rows that fit the condition remain the same.
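If the inverted condition feels awkward, the mask function is the mirror image of where: it replaces the values in rows that do fit the condition. Here is a minimal sketch on made-up salaries, with where called on a single column.

```python
import pandas as pd

# Illustrative salaries
pay = pd.DataFrame({"salary": [45000, 52000, 48000, 49500]})

# where keeps values where the condition holds and replaces the rest with 1
pay["high_salary"] = 0
pay["high_salary"] = pay["high_salary"].where(pay["salary"] <= 48000, 1)

# mask replaces values where the condition holds, so it can be stated directly
zeros = pd.Series(0, index=pay.index)
pay["high_salary_mask"] = zeros.mask(pay["salary"] > 48000, 1)
```

Both columns end up with the same values.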


6. NumPy where function

We can also use the where function of NumPy for adding new columns. It’s more flexible than Pandas’s where because it allows for updating values that fit and do not fit the condition.

Let’s see how we’d create the high_salary column with NumPy’s where :

# drop the existing high_salary column
df = df.drop(["high_salary"], axis = 1)

# create the column
df.loc[:, "high_salary"] = np.where(df["salary"] <= 48000, 0 , 1)

# display DataFrame
df
df (image by author)

We did not have to initialize the column with all 0s because we can directly create it with 1s and 0s depending on the given condition.

The first parameter of NumPy’s where function specifies the condition. The second one is the value to be used for rows that fit the condition and the third one is for rows that do not fit the condition.
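The two replacement values do not have to be numbers. As a small sketch on made-up salaries, the same call can produce text labels. The label names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Illustrative salaries
salary = pd.Series([45000, 52000, 48000])

# First value for rows that fit the condition, second for the rest
labels = np.where(salary <= 48000, "regular", "high")
```

The result is a NumPy array with one element per row, so it can be assigned to a column directly.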


7. NumPy select function

The select function of NumPy can evaluate multiple conditions and assign a separate value for each one. Thus, we can use it for creating conditional columns as well.

The conditions and associated values are written in a Python list. Then, we just pass them as arguments to the select function. We can also define a default value to be used for rows that do not fit any of the given conditions.

Let’s create a salary_cond column with values high, mid, and low. The values are determined according to the values in the salary column.

# create conditions list
conditions = [
    (df["salary"] > 50000),
    (df["salary"] <= 50000) & (df["salary"] > 45000),
    (df["salary"] <= 45000)
]

# create values list
values = ["high", "mid", "low"]

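# create the column (a string default avoids mixing the string values with NumPy's integer default of 0)
df.loc[:, "salary_cond"] = np.select(conditions, values, default="unknown")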

# display DataFrame
df
df (image by author)
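The default value mentioned earlier can stand in for a final catch-all condition. The sketch below, on made-up salaries, relies on two points: conditions are checked in order, so the first match wins, and rows matching none of them receive the default.

```python
import numpy as np
import pandas as pd

# Illustrative salaries
pay = pd.DataFrame({"salary": [52000, 47000, 41000]})

# Conditions are checked in order, so the second one only applies
# to rows that did not already match the first
tiers = [pay["salary"] > 50000, pay["salary"] > 45000]
labels = ["high", "mid"]

# Rows that match no condition get the default label
pay["salary_cond"] = np.select(tiers, labels, default="low")
```

A string default also keeps the default the same type as the string labels, since NumPy's own default is the integer 0.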

8. Pandas assign function

We can use the assign function for creating multiple columns in a single operation. They can be derived from the existing ones or created from scratch.

Here is an example that shows how to create the department , high_salary , and salary_cond columns using the assign function in a single operation:

# drop the columns first
df = df.drop(["department", "high_salary", "salary_cond"], axis = 1)

# create the columns
df = df.assign(
    department = "engineering",
    high_salary = np.where(df["salary"] <= 48000, 0 , 1),
    # a string default avoids mixing the string values with NumPy's integer default of 0
    salary_cond = np.select(conditions, values, default="unknown")
)

# display DataFrame
df
df (image by author)
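The assign function also accepts callables, which is useful when one new column depends on another one created in the same call. Here is a minimal sketch on made-up data. The bonus and total_pay columns are hypothetical.

```python
import pandas as pd

# Illustrative salaries
pay = pd.DataFrame({"salary": [45000, 52000]})

# Each lambda receives the DataFrame as built so far,
# so total_pay can use the bonus column defined just above it
pay = pay.assign(
    bonus=lambda d: d["salary"] // 10,
    total_pay=lambda d: d["salary"] + d["bonus"],
)
```

Since assign returns a new DataFrame rather than modifying the original, the result has to be assigned back.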

9. Pandas apply function

The apply function, as its name suggests, applies a function along an axis (i.e. columns or rows), which can be used for adding a new column to a DataFrame.

Let’s first create a new DataFrame:

# create a DataFrame with random integers
df = pd.DataFrame(np.random.randint(10, size=(4,5)), columns=list("ABCDE"))

# display DataFrame
df
df (image by author)

We’ll now use the apply function to create a new column called total , which contains the sum of values in other columns:

# create the total column as the row-wise sum
df["total"] = df.apply(np.sum, axis=1)

# display DataFrame
df
df (image by author)

Please note that we can do the same operation using df["total"] = df.sum(axis=1) but there might be cases where we need to perform complex functions along an axis to create a new column.
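As a sketch of such a case, the custom function below computes the spread between the largest and smallest value in each row. The function name and the data are made up for illustration.

```python
import pandas as pd

# Illustrative DataFrame with fixed values
scores = pd.DataFrame({"A": [1, 4], "B": [5, 2], "C": [3, 9]})

def value_range(row):
    """Return the difference between the largest and smallest value in a row."""
    return row.max() - row.min()

# axis=1 passes each row to the function as a Series
scores["range"] = scores.apply(value_range, axis=1)
```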


10. Lambda expressions

One of the cool things about the apply function is that we can use it with a lambda expression or custom function.

Consider the following DataFrame:

# create DataFrame
rates = pd.DataFrame({
    "item": ["A", "B", "C", "D"],
    "rates": [[11, 15, 12], [5, 7, 4], [24, 18, 22], [42, 39, 27]]
})

# display DataFrame
rates
rates (image by author)

Let’s say we need to create a new column that has the minimum value of the lists in the rates column. We can do this task with the apply function and a lambda expression as follows:

# create the min_rate column
rates["min_rate"] = rates["rates"].apply(lambda x: pd.Series(x).min())

# display DataFrame
rates
rates (image by author)
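A named function works the same way and can be easier to read once the logic grows beyond a single expression. The sketch below uses a smaller made-up version of the data and a hypothetical spread function. It also uses Python's built-in min, which works directly on a list.

```python
import pandas as pd

# Illustrative data: each cell in the rates column holds a list
quotes = pd.DataFrame({
    "item": ["A", "B"],
    "rates": [[11, 15, 12], [5, 7, 4]]
})

def spread(values):
    """Return the gap between the highest and lowest rate in a list."""
    return max(values) - min(values)

# On a single column, apply calls the function once per cell
quotes["min_rate"] = quotes["rates"].apply(lambda values: min(values))
quotes["spread"] = quotes["rates"].apply(spread)
```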

Adding new columns to a DataFrame is a frequent operation. Pandas offers a lot of flexibility here, as the 10 different ways we’ve covered show.

Knowing these ways will definitely improve your Pandas skills and make you more productive at using this library.


Thank you for reading. Please let me know if you have any feedback.

