7 Functions You Can Use to Create New Columns in a Pandas DataFrame

A typical task in data analysis, data cleaning, and feature engineering

A Pandas DataFrame is a two-dimensional data structure with labeled rows and columns. Each row represents an observation (i.e., a data point), and the columns are the features that describe the observations.

We sometimes need to create a new column to add a piece of information about the data points. A new column can be derived from existing columns or brought in from an external data source.

In this article, we will learn about 7 functions that can be used for creating a new column.

Let’s start by creating a sample DataFrame.

import numpy as np
import pandas as pd
df = pd.DataFrame({
    "division": ["A", "B", "A", "C", "B"],
    "category": ["101-A", "14-C", "1020-D", "112-A", "11-A"],
    "mes1": np.random.randint(10, 40, size=5),
    "mes2": np.random.randint(10, 60, size=5),
    "mes3": np.random.randint(10, 100, size=5),
})
df
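Because the mes columns are random, your values will differ from the ones in this article and from run to run. Below is a minimal sketch of a reproducible variant. It assumes you are fine with using NumPy's Generator API (np.random.default_rng, available since NumPy 1.17) in place of np.random.randint. The seed 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random "mes" columns reproducible across runs.
# The seed value (42) is arbitrary.
rng = np.random.default_rng(42)

df = pd.DataFrame({
    "division": ["A", "B", "A", "C", "B"],
    "category": ["101-A", "14-C", "1020-D", "112-A", "11-A"],
    # Generator.integers excludes the upper bound by default (endpoint=False),
    # matching the behavior of np.random.randint.
    "mes1": rng.integers(10, 40, size=5),
    "mes2": rng.integers(10, 60, size=5),
    "mes3": rng.integers(10, 100, size=5),
})
print(df)
```

Re-creating the generator with the same seed produces the same DataFrame every time.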


1. Pandas where

The where function of Pandas can be used for creating a column based on the values in other columns.

We take a column and define a condition (or a set of conditions). The values in this column remain the same for the rows that fit the condition. The values in the other rows are replaced with the specified value.

It is easier to understand with an example. Let’s say we want to update the values in the mes1 column based on a condition on the mes2 column.

If the value in mes2 is higher than 50, we want to add 10 to the value in mes1. Otherwise, we want to keep the value as is. Here is how we can perform this operation using the where function.

# Keep mes1 where mes2 <= 50; otherwise use mes1 + 10
df["mes_updated"] = df["mes1"].where(
    df["mes2"] <= 50,
    df["mes1"] + 10
)
df[["mes1", "mes2", "mes_updated"]]

As we see in the output, the values in the rows that fit the condition (mes2 ≤ 50) remain the same. The other values are increased by 10.
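Since the random values make the output hard to check, here is a small sketch on a hypothetical three-row DataFrame named demo with fixed values. It also shows Series.mask, which is the inverse of where: it replaces values where the condition is True instead of where it is False.

```python
import pandas as pd

# Hypothetical fixed values so the result can be checked by eye.
demo = pd.DataFrame({"mes1": [12, 25, 30], "mes2": [40, 55, 50]})

# Keep mes1 where mes2 <= 50; elsewhere use mes1 + 10.
demo["mes_updated"] = demo["mes1"].where(demo["mes2"] <= 50, demo["mes1"] + 10)
print(demo["mes_updated"].tolist())  # [12, 35, 30]

# Series.mask replaces values where the condition is True,
# so the condition is flipped to get the same result.
via_mask = demo["mes1"].mask(demo["mes2"] > 50, demo["mes1"] + 10)
print(via_mask.tolist())  # [12, 35, 30]
```

Only the second row has mes2 above 50, so it is the only value that changes.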


2. NumPy where

The where function of NumPy is more flexible than that of Pandas. We can assign a value to the rows that fit the given condition as well as to the rows that do not. This is not possible with the where function of Pandas, because the values that fit the condition always remain the same.

Let’s modify the same example. If the value in mes2 is higher than 50, we want to add 10 to the value in mes1. Otherwise, we want to subtract 10.

# np.where(condition, value_if_true, value_if_false)
df["mes_updated_2"] = np.where(
    df["mes2"] <= 50,
    df["mes1"] - 10,
    df["mes1"] + 10
)
df[["mes1", "mes2", "mes_updated_2"]]
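Here is the same operation sketched on a hypothetical fixed DataFrame named demo, so the effect of both branches is visible. Note that np.where returns a NumPy array rather than a Series. Assigning it to a column works because its length matches the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values.
demo = pd.DataFrame({"mes1": [12, 25, 30], "mes2": [40, 55, 50]})

# Element-wise choice: second argument where the condition is True,
# third argument where it is False. Both branches are transformed.
demo["mes_updated_2"] = np.where(demo["mes2"] <= 50, demo["mes1"] - 10, demo["mes1"] + 10)
print(demo["mes_updated_2"].tolist())  # [2, 35, 20]
```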


3. NumPy select

The where function assigns a value based on one set of conditions. The select function takes it one step further. It accepts multiple sets of conditions and is able to assign a different value for each set of conditions.

Let’s create a new column based on the following conditions:

  • If division is A and mes1 is higher than 10, then the value is 1
  • If division is B and mes1 is higher than 10, then the value is 2
  • Otherwise, the value is 0

conditions = [
    (df["division"] == "A") & (df["mes1"] > 10),
    (df["division"] == "B") & (df["mes1"] > 10),
]
values = [1, 2]
df["select_col"] = np.select(conditions, values, default=0)
df[["division", "mes1", "select_col"]]

The conditions and the associated values are written in separate Python lists. The default parameter specifies the value for the rows that do not fit any of the listed conditions.
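A small sketch with fixed, hypothetical values (the DataFrame demo and the lists conds and vals are illustrative names) makes the rule easy to verify. If more than one condition is True for a row, np.select uses the first matching one in the list.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values covering every branch.
demo = pd.DataFrame({
    "division": ["A", "B", "A", "C", "B"],
    "mes1": [15, 12, 8, 30, 10],
})

conds = [
    (demo["division"] == "A") & (demo["mes1"] > 10),
    (demo["division"] == "B") & (demo["mes1"] > 10),
]
vals = [1, 2]

# Conditions are checked in order; rows matching none of them get the default.
demo["select_col"] = np.select(conds, vals, default=0)
print(demo["select_col"].tolist())  # [1, 2, 0, 0, 0]
```

The third row is in division A but has mes1 = 8. The last row is in division B but has mes1 = 10, which is not higher than 10. Both fall through to the default.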


4. Pandas assign

The assign function of Pandas can be used for creating multiple columns in a single operation. We can derive columns from existing ones or create them from scratch. If we do the latter, we need to make sure the length of the list or array is the same as the number of rows in the DataFrame.

Let’s do an example.

df = df.assign(
    cat1=df["category"].str.split("-", expand=True)[0],
    mes_all=lambda x: x.mes1 ** 2 + x.mes2 * 10 + x.mes3,
    id=[1, 2, 3, 4, 5]
)

We have created 3 new columns:

  • The first one is the first part of the string in the category column, which is obtained by string splitting.
  • The second one is created using a calculation that involves the mes1, mes2, and mes3 columns.
  • The third one is just a list of integers.

Here are the new columns:
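Two more behaviors of assign are worth knowing. Below is a sketch on a hypothetical three-row DataFrame named demo, where cat1_len is an illustrative column name. First, keyword arguments are evaluated in order, so a callable can refer to a column created earlier in the same call. Second, a plain list whose length differs from the number of rows raises a ValueError.

```python
import pandas as pd

demo = pd.DataFrame({"category": ["101-A", "14-C", "1020-D"]})

# The lambda receives the DataFrame that already contains cat1,
# because keyword arguments are processed in order.
demo = demo.assign(
    cat1=demo["category"].str.split("-", expand=True)[0],
    cat1_len=lambda x: x["cat1"].str.len(),
)
print(demo["cat1_len"].tolist())  # [3, 2, 4]

# A list must have one value per row; a shorter one is rejected.
mismatch_raised = False
try:
    demo.assign(id=[1, 2])
except ValueError as err:
    mismatch_raised = True
    print("length mismatch:", err)
```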


5. Pandas insert

When we add a new column to a DataFrame, it is appended at the end, so it becomes the last column. The insert function allows for specifying the location of the new column by its integer position.

Let’s create an id column and make it the first column in the DataFrame.

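# The assign example above already created an id column, and insert raises
# a ValueError for an existing name, so drop it first (no-op if absent)
df = df.drop(columns="id", errors="ignore")
df.insert(0, "id", [1, 2, 3, 4, 5])
df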

The insert function takes 3 arguments:

  • The first one is the index of the new column (0 means the first one).
  • The second one is the name of the new column.
  • The third one holds the values of the new column.
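Unlike the other functions in this article, insert modifies the DataFrame in place and returns None. It also refuses to add a column whose name already exists unless its allow_duplicates parameter is set to True. Below is a minimal sketch on a hypothetical DataFrame named demo.

```python
import pandas as pd

demo = pd.DataFrame({"mes1": [12, 25, 30], "mes2": [40, 55, 50]})

# Position 0 puts the new column first; the remaining columns shift right.
demo.insert(0, "id", [1, 2, 3])
print(demo.columns.tolist())  # ['id', 'mes1', 'mes2']

# Inserting an existing name fails unless allow_duplicates=True is passed.
duplicate_raised = False
try:
    demo.insert(1, "id", [4, 5, 6])
except ValueError as err:
    duplicate_raised = True
    print("insert failed:", err)
```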

6. Pandas split

The split function is quite useful when working with textual data. Suppose we have a text column that contains multiple pieces of information. We can split it and create a separate column for each part.

Note: The split function is available under the str accessor.

Let’s create cat1 and cat2 columns by splitting the category column.

df[["cat1", "cat2"]] = df["category"].str.split("-", expand=True)
df[["category", "cat1", "cat2"]]
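The example above works cleanly because every category value contains exactly one "-". Real text is often messier. The sketch below uses a hypothetical Series s to show two things. First, the n parameter limits the number of splits. Second, rows without the separator get a missing value in the extra column.

```python
import pandas as pd

# Hypothetical values: one extra "-" and one value without a separator.
s = pd.Series(["101-A", "14-C-X", "112"])

# n=1 splits only at the first "-", so at most two columns are produced.
parts = s.str.split("-", n=1, expand=True)
print(parts)

# "C-X" stays together in the second column; "112" has no second part,
# so that cell is missing.
print(parts[1].isna().tolist())  # [False, False, True]
```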


7. Pandas cat

The cat function is the opposite of the split function. It can be used for creating a new column by combining string columns. The cat function is also available under the str accessor.

Here is how we would create the category column by combining the cat1 and cat2 columns.

df["category"] = df["cat1"].str.cat(df["cat2"], sep="-")
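One detail to watch with cat is missing values. By default, a row with a missing value in either column produces a missing result. The na_rep parameter substitutes a placeholder instead. Below is a sketch on a hypothetical DataFrame named demo; the "?" placeholder is an arbitrary choice.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"cat1": ["101", "14", "112"], "cat2": ["A", "C", np.nan]})

# Default behavior: the third row becomes missing because cat2 is missing.
joined = demo["cat1"].str.cat(demo["cat2"], sep="-")
print(joined.tolist())

# With na_rep, missing values are replaced by the given string.
joined_rep = demo["cat1"].str.cat(demo["cat2"], sep="-", na_rep="?")
print(joined_rep.tolist())  # ['101-A', '14-C', '112-?']
```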

Creating new columns is a typical task in data analysis, data cleaning, and feature engineering for machine learning. Thankfully, Pandas makes it quite easy by providing several functions and methods. In this article, we have covered 7 functions that expedite and simplify these operations.




Thank you for reading. Please let me know if you have any feedback.

