
In the last tutorial, we looked at the various approaches to read the data files containing the date-time variables. In this tutorial, we will discuss how to handle columns of an existing dataframe that contains date-time data but are still read by Python as an object datatype.
Assumption and Recommendation
Being hands-on is the key to master programming. We recommend that you continue to implement the codes as you follow through with the tutorial. The sample data and the associated Jupiter notebook are available in the Scenario_6 folder of this GitHub link.
If you are new to GitHub and want to learn it, please go through [this](https://towardsdatascience.com/getting-started-guide-anaconda-80a4d30d3486) tutorial. To set up a new Python environment on your system, please go through this tutorial.
Following is the list of Python concepts and pandas functions/ methods used in the tutorial:
Pandas functions
- read_csv
- to_datetime
Challenges & Solutions
Knowing the dataset
As explained in the last tutorial, we have created a dummy dataset that contains multiple versions of a data field, _release_date_. The data dictionary of this dummy dataset is as follows:
- release_date – Actual date column with first date value deleted.
- release_date_int – Another column containing date information but in an integer format, for example, date 2020–02–12 is present as 20200212 in YYYYMMDD format.
- release_date_text – Column containing dates in text format, and # as the separator. The format code is YYYY#DD#MM.
- Year – Column containing only year values of the date data.
- Month – Column containing only month values of the date data.
- Day – Column containing information on days
A snapshot of the dataset is shared below:

In all our scenarios explained below, we have assumed that this dummy data file is read in Python using the vanilla _read_csv_ implementation. The code to read the dummy data and information on each column after reading it is as below:
#### Sample Code
#### Importing Pandas
import pandas as pd
#### Importing Data File - Change the Windows Folder Location
imp_data = pd.read_csv("C:UjjwalAnalyticsGitScenario_6Date_Dataset.csv")
#### Checking the dataset information
imp_data.info()
#### Sample output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 release_date 2999 non-null object
1 release_date_int 3000 non-null int64
2 release_date_text 3000 non-null object
3 year 3000 non-null int64
4 month 3000 non-null int64
5 day 3000 non-null int64
dtypes: int64(4), object(2)
memory usage: 140.8+ KB
Challenge 1 – Converting the column containing the date-time data in text format into a date-time object
In this scenario, we are trying to convert the _release_datetext column from and object datatype into the date-time datatype. The sample code to demonstrate this is as below:
#### Sample Code
#### Datatype Conversion (Assuming the data is already read into a dataframe)
imp_data["release_date_text_converted"] = pd.to_datetime(imp_data["release_date_text"], format="%Y#%d#%m")
#### Checking the dataset information
imp_data.info()
#### Sample output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 release_date 2999 non-null object
1 release_date_int 3000 non-null int64
2 release_date_text 3000 non-null object
3 year 3000 non-null int64
4 month 3000 non-null int64
5 day 3000 non-null int64
6 release_date_text_converted 3000 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(4), object(2)
memory usage: 164.2+ KB
Explanation
- The new column _release_date_text_converted_ is now available as a date-time object.
- to_datetime – The function expects the first argument to be the column (series in Pandas) that we want to convert into a date-time object. As the second parameter, we have to provide the format in which the data currently exists.
In our case, the data is in YYYY#DD#MM format. In this format, we have represented the year component as %Y, the month component as %m, and the date component as %d.
Challenge 2 – Converting the column containing the date in integral form into a date-time object
This scenario is a minor diversion from the previous one. Here, rather than having the source column in text format, we have it in integral format. The code to implement this solution is as follows:
#### Sample Code
#### Datatype Conversion (Assuming the data is already read into a dataframe)
imp_data["release_date_int_converted"] = pd.to_datetime(imp_data["release_date_int"], format="%Y%m%d")
#### Checking the dataset information
imp_data.info()
#### Sample output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 release_date 2999 non-null object
1 release_date_int 3000 non-null int64
2 release_date_text 3000 non-null object
3 year 3000 non-null int64
4 month 3000 non-null int64
5 day 3000 non-null int64
6 release_date_text_converted 3000 non-null datetime64[ns]
7 release_date_int_converted 3000 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(4), object(2)
memory usage: 187.6+ KB
Explanation
- The new column _release_date_int_converted_ is now available as a date-time object.
- to_datetime— The code in this solution remains the same except the input to the format parameter. Since there were no separators between the date components, the revised format we have used is %Y%m%d.
Challenge 3 – Manually adding a new column with a constant date value to the data frame
Sometimes the situation warrants us to create a new date column with some fixed values. A minor modification to the above code can help us achieve this.
#### Sample Code
#### Datatype Conversion (Assuming the data is already read into a dataframe)
imp_data["new_constant_date_column"] = pd.to_datetime("1985-17-03", format="%Y-%d-%m")
#### Checking the dataset information
imp_data.info()
#### Sample output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 release_date 2999 non-null object
1 release_date_int 3000 non-null int64
2 release_date_text 3000 non-null object
3 year 3000 non-null int64
4 month 3000 non-null int64
5 day 3000 non-null int64
6 release_date_text_converted 3000 non-null datetime64[ns]
7 release_date_int_converted 3000 non-null datetime64[ns]
8 new_constant_date_column 3000 non-null datetime64[ns]
dtypes: datetime64[ns](3), int64(4), object(2)
memory usage: 211.1+ KB

Explanation
- Observe the new column _new_constant_date_column_ containing the constant date values.
- to_datetime – This time, rather than passing the column containing the date-time objects, we have passed a constant date value to the function. The format parameter is also adjusted accordingly.
Closing note
Given the approach we have taken in this tutorial, can you think of a solution to convert the three date columns: year, month, and day into a single date column? Refer to the Jupyter notebook shared through GitHub Repository for the solution.
In this tutorial, we learned about converting dataframe columns from any data type into a date-time object. In the next tutorial, we will go through the various data-wrangling activities associated with the date-time columns.
HAPPY LEARNING ! ! ! !