
A DataFrame is a two-dimensional data structure with labeled rows and columns. We often need to add new columns as part of data analysis or feature engineering.
There are many ways to add new columns, and the best one depends on the task at hand.
In this article, we’ll learn 10 ways to add a column to Pandas DataFrames.
Let’s start by creating a simple DataFrame using the Pandas DataFrame constructor. We’ll pass the data as a Python dictionary whose keys are the column names and whose values are lists holding the values of each column.
import pandas as pd
# create DataFrame
df = pd.DataFrame(
{
"first_name": ["Jane", "John", "Max", "Emily", "Ashley"],
"last_name": ["Doe", "Doe", "Dune", "Smith", "Fox"],
"id": [101, 103, 143, 118, 128]
}
)
# display DataFrame
df

1. Use a constant value
We can add a new column of a constant value as follows:
df.loc[:, "department"] = "engineering"
# display DataFrame
df
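
As a minimal sketch of the same idea, plain bracket assignment also broadcasts a single value to every row. The staff DataFrame below is a hypothetical stand-in, not the article's df.

```python
import pandas as pd

# Hypothetical three-row DataFrame used only for this illustration
staff = pd.DataFrame({"first_name": ["Jane", "John", "Max"]})

# A scalar on the right-hand side is repeated for every row
staff["department"] = "engineering"

print(staff)
```

Both forms give the same result here; the loc form simply makes the "all rows" selection explicit.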

2. Use array-like structure
We can use an array-like structure to add a new column. In this case, make sure the number of values in the array is the same as the number of rows in the DataFrame.
df.loc[:, "salary"] = [45000, 43000, 42000, 45900, 54000]
In the example above, we used a Python list. Let’s now generate the values randomly with NumPy’s random module.
import numpy as np
df.loc[:, "salary"] = np.random.randint(40000, 55000, size=5)
# display DataFrame
df
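
The length requirement mentioned above can be checked with a small sketch. It assumes a hypothetical staff DataFrame and uses a seeded NumPy generator so the random salaries are the same on every run; the seed value and the bonus column are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row DataFrame used only for this illustration
staff = pd.DataFrame({"id": [101, 103, 143, 118, 128]})

# A seeded generator makes the random values reproducible between runs
rng = np.random.default_rng(seed=42)
staff["salary"] = rng.integers(40000, 55000, size=len(staff))

# A list with the wrong number of values is rejected with a ValueError
try:
    staff["bonus"] = [1000, 2000, 3000]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
```

Using len(staff) for the size avoids hard-coding the number of rows.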

3. Derive from other columns
It’s very common to derive new columns from existing ones. As a simple demonstration, let’s create a name column by combining first_name and last_name.
df.loc[:, "name"] = df["first_name"] + ' ' + df["last_name"]
# display DataFrame
df

We can also use the cat function under the str accessor to combine (i.e. concatenate) strings.
df.loc[:, "name"] = df["first_name"].str.cat(df["last_name"], sep = ' ')
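
One place where the two approaches can differ is missing data. The sketch below assumes a hypothetical people DataFrame with one missing last name and shows the na_rep parameter of str.cat, which substitutes a placeholder instead of producing a missing result.

```python
import pandas as pd

# Hypothetical data with one missing last name
people = pd.DataFrame({
    "first_name": ["Jane", "Max"],
    "last_name": ["Doe", None],
})

# The + operator propagates the missing value
plus = people["first_name"] + " " + people["last_name"]

# str.cat behaves the same way by default ...
cat = people["first_name"].str.cat(people["last_name"], sep=" ")

# ... but na_rep replaces missing values with a placeholder
cat_filled = people["first_name"].str.cat(people["last_name"], sep=" ", na_rep="?")

print(cat_filled.tolist())
```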
4. The insert function
By default, a new column is added at the end, so it becomes the last column. If we need to add the new column at a specific location (e.g. as the first one), we can use the insert function.
For instance, in the previous example, having the name column at the end while first_name and last_name are at the beginning doesn’t look nice.
Let’s drop the name column and add it back, this time as the first column.
# drop the name column
df = df.drop(["name"], axis = 1)
# add the name column as the first column
df.insert(0, "name", df["first_name"].str.cat(df["last_name"], sep = ' '))
# display DataFrame
df

The insert function has 3 required parameters:
- The first one is the index of the new column (0 for the first column, 1 for the second, and so on).
- The second one is the column name.
- The third one is the column values.
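
The three parameters listed above can be sketched as follows. The staff DataFrame is a hypothetical stand-in; the example also shows two behaviors worth knowing: insert modifies the DataFrame in place and returns None, and it raises a ValueError if the column name already exists.

```python
import pandas as pd

# Hypothetical DataFrame used only for this illustration
staff = pd.DataFrame({
    "first_name": ["Jane", "John"],
    "last_name": ["Doe", "Doe"],
    "id": [101, 103],
})

# Compute the position right after the last_name column
position = staff.columns.get_loc("last_name") + 1

# insert works in place, so its return value is None
result = staff.insert(position, "name", staff["first_name"] + " " + staff["last_name"])

# Inserting an existing column name again raises a ValueError
try:
    staff.insert(0, "name", "duplicate")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(list(staff.columns))
```

Looking up the position with get_loc avoids hard-coding an index that would break if columns were reordered.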
5. Pandas where function
The where function of Pandas allows for adding a new column whose values are determined by a condition on other columns.
# initialize the column with all 0s
df.loc[:, "high_salary"] = 0
# set the value to 1 for the rows that do not fit the given condition
df.loc[:, "high_salary"] = df["high_salary"].where(df["salary"] <= 48000, 1)
# display DataFrame
df

We added a new column called high_salary, which takes a value of 1 if the salary is more than 48000 and 0 otherwise.
The where function replaces the values in the rows that do not fit the condition. This is why we specify the condition as being less than or equal to 48000.
The values in the rows that fit the condition remain the same.
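
Because where replaces the rows that do not fit the condition, the condition has to be written "backwards". A minimal sketch, assuming a hypothetical staff DataFrame, compares it with the mask function of Pandas, which replaces the rows that do fit the condition:

```python
import pandas as pd

# Hypothetical salaries used only for this illustration
staff = pd.DataFrame({"salary": [45000, 43000, 52000]})

# Start from a Series of 0s that shares the DataFrame's index
zeros = pd.Series(0, index=staff.index)

# where keeps values where the condition is True and replaces the rest
high_with_where = zeros.where(staff["salary"] <= 48000, 1)

# mask replaces values where the condition is True
high_with_mask = zeros.mask(staff["salary"] > 48000, 1)

staff["high_salary"] = high_with_mask
print(staff)
```

Both calls produce the same column; mask lets the condition read the same way as the sentence describing it.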
6. NumPy where function
We can also use the where function of NumPy for adding new columns. It’s more flexible than the where function of Pandas because it lets us specify the values both for the rows that fit the condition and for the rows that do not.
Let’s see how we’d create the high_salary column with NumPy’s where:
# drop the existing high_salary column
df = df.drop(["high_salary"], axis = 1)
# create the column
df.loc[:, "high_salary"] = np.where(df["salary"] <= 48000, 0 , 1)
# display DataFrame
df

We did not have to initialize the column with all 0s because we can directly create it with 1s and 0s depending on the given condition.
The first parameter of NumPy’s where function specifies the condition. The second one is the value to be used for rows that fit the condition and the third one is for rows that do not fit the condition.
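
The two value parameters are not limited to numbers. As a small sketch with a hypothetical staff DataFrame and made-up labels, the same call can produce string values; note that NumPy's where returns a NumPy array, which Pandas then aligns to the rows by position.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries used only for this illustration
staff = pd.DataFrame({"salary": [45000, 43000, 52000]})

# condition, value if True, value if False
flags = np.where(staff["salary"] > 48000, "high", "regular")

# flags is a plain NumPy array with one entry per row
staff["salary_band"] = flags

print(staff)
```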
7. NumPy select function
The select function of NumPy can evaluate multiple conditions and assign a separate value for each one. Thus, we can use it for creating conditional columns as well.
The conditions and the associated values are written in two Python lists of the same length. Then, we just pass them as arguments to the select function. We can also define a default value to be used for rows that do not fit any of the given conditions.
Let’s create a salary_cond column with the values high, mid, and low. The values are determined according to the values in the salary column.
# create conditions list
conditions = [
(df["salary"] > 50000),
(df["salary"] <= 50000) & (df["salary"] > 45000),
(df["salary"] <= 45000)
]
# create values list
values = ["high", "mid", "low"]
# create the column; the default is a string so that it shares a type with the labels
df.loc[:, "salary_cond"] = np.select(conditions, values, default="unknown")
# display DataFrame
df
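
Two details of select are easy to miss: the conditions are checked in order and the first match wins, and rows matching none of them receive the default. The sketch below assumes a hypothetical staff DataFrame with one missing salary and a made-up "unknown" label for the default.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries, including one missing value
staff = pd.DataFrame({"salary": [52000, 47000, 41000, np.nan]})

# The first matching condition wins, so the second one does not
# need an explicit upper bound
conditions = [
    staff["salary"] > 50000,
    staff["salary"] > 45000,
    staff["salary"] <= 45000,
]
values = ["high", "mid", "low"]

# Comparisons with a missing value are False, so that row gets the default
staff["salary_cond"] = np.select(conditions, values, default="unknown")

print(staff)
```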

8. Pandas assign function
We can use the assign function for creating multiple columns in a single operation. They can be derived from the existing ones or created from scratch.
Here is an example that shows how to create the department, high_salary, and salary_cond columns using the assign function in a single operation:
# drop the columns first
df = df.drop(["department", "high_salary", "salary_cond"], axis = 1)
# create the columns
df = df.assign(
department = "engineering",
high_salary = np.where(df["salary"] <= 48000, 0 , 1),
salary_cond = np.select(conditions, values, default="unknown")
)
# display DataFrame
df
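
The assign function also accepts callables, which is handy when one new column depends on another created in the same call. This is a minimal sketch with a hypothetical staff DataFrame and made-up bonus and total_pay columns; it also shows that assign returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical salaries used only for this illustration
staff = pd.DataFrame({"salary": [45000, 52000]})

# Each lambda receives the DataFrame as built so far, so total_pay
# can refer to the bonus column defined just before it
enriched = staff.assign(
    bonus=lambda d: d["salary"] // 10,
    total_pay=lambda d: d["salary"] + d["bonus"],
)

print(enriched)
```

Since the original is not modified, the result has to be assigned to a name, as the article does with df = df.assign(...).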

9. Pandas apply function
The apply function, as its name suggests, applies a function along an axis (i.e. columns or rows), which can be used for adding a new column to a DataFrame.
Let’s first create a new DataFrame:
# create a DataFrame with random integers
df = pd.DataFrame(np.random.randint(10, size=(4,5)), columns=list("ABCDE"))
# display DataFrame
df

We’ll now use the apply function to create a new column called total, which contains the sum of the values in the other columns:
# create the total column as the row-wise sum
df["total"] = df.apply(np.sum, axis=1)
# display DataFrame
df

Please note that we can do the same operation using df["total"] = df.sum(axis=1), but there might be cases where we need to apply a more complex function along an axis to create a new column.
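
As one possible example of such a case, the sketch below applies a custom function to each row. The scores DataFrame and the spread function are hypothetical and only serve to illustrate the pattern.

```python
import pandas as pd

# Hypothetical data used only for this illustration
scores = pd.DataFrame({"A": [1, 7], "B": [5, 2], "C": [3, 9]})

def spread(row):
    """Return the difference between the largest and smallest value in a row."""
    return row.max() - row.min()

# axis=1 passes each row to the function as a Series
scores["spread"] = scores.apply(spread, axis=1)

print(scores)
```

Row-wise apply calls the function once per row, so a vectorized alternative is usually faster on large DataFrames when one exists.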
10. Lambda expressions
One of the cool things about the apply function is that we can use it with a lambda expression or a custom function.
Consider the following DataFrame:
# create DataFrame
rates = pd.DataFrame({
"item": ["A", "B", "C", "D"],
"rates": [[11, 15, 12], [5, 7, 4], [24, 18, 22], [42, 39, 27]]
})
# display DataFrame
rates

Let’s say we need to create a new column that has the minimum value of each list in the rates column. We can do this with the apply function and a lambda expression as follows:
# create the min_rate column
rates["min_rate"] = rates["rates"].apply(lambda x: pd.Series(x).min())
# display DataFrame
rates
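
When the function is as simple as taking a minimum, a lambda is not strictly needed. As a minimal sketch on a shortened copy of the rates data, Python's built-in min can be passed to apply directly, and a list comprehension is another option; the max_rate column is made up for the example.

```python
import pandas as pd

# Shortened copy of the rates data
rates = pd.DataFrame({
    "item": ["A", "B"],
    "rates": [[11, 15, 12], [5, 7, 4]],
})

# Pass the built-in min directly; it is called once per list
rates["min_rate"] = rates["rates"].apply(min)

# A list comprehension achieves the same kind of result
rates["max_rate"] = [max(values) for values in rates["rates"]]

print(rates)
```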

Adding new columns to a DataFrame is a frequent operation, and Pandas is quite flexible here: we’ve seen 10 different ways of doing it.
Knowing these options helps you pick the one that fits the task and makes you more productive with the library.
Thank you for reading. Please let me know if you have any feedback.