Data Processing in Python

As written in the title, this article is part 2 of my Dealing with Dates in Python’s DataFrame series. Below show the content of each part of the Dealing with Dates in Python’s DataFrame series.

In my previous article, I have shown the DateTime series creation methods. Then, in this article, I will show the basic properties and methods to deal with the Datetime series in a data frame.
With that, this article will be structured as below:
- Convert Datatype to DateTime
- General DateTime Info Extraction
- Check if the Date is the Start or the End of the Period
- Check if the Date Belongs to a Leap Year
- Check the Number of Days in the Month
Let’s get started!
Convert Data Type to DateTime
For the date creation methods I show in the previous article, the series is created as a DateTime object. When you are reading the data from Excel or any other data source, the DateTime column will be read as a string object if you did not parse them as dates. To extract the DateTime info from the DateTime series, the column needs to be converted into the DateTime data type beforehand.
There are two methods to convert the data type into DateTime.
Pandas.Series.astype("DateTime")
pandas.datetime(pandas.SeriesSeries)
I created a demo.csv file with the script below to demonstrate the methods and properties for this section.
Python">import pandas as pd
import datetime as dt
df = pd.DataFrame()
initial_date = pd.Timestamp("2023-01-04 21:55:00")
df["timedelta_range"] = pd.timedelta_range(start="1 day", end="5days", freq="4H")
df.insert(loc=0, column="initial_date",value=initial_date)
df.insert(loc=2, column="next_date",value=df["initial_date"] + df["timedelta_range"])
print(df.head(10))
print("Data Type: ", df.dtypes)
df.to_csv("demo.csv")
The output data frame is as in the screenshot below.

Now, we will read the file generated.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"])
print(df.info())
df.head()
For demonstration purposes, only the next_date
column will be read.

As you can see, when the column is imported directly without parsing it as DateTime, the column will be a string column where the Dtype is the object. Below are two popular methods to convert the column into a DateTime data type.
- pandas.to_datetime(pandas.Series)
df["next_date"] = pd.to_datetime(df["next_date"])
df.info()
- pandas.Series.astype("datetime64")
df["next_date"] = df["next_date"].astype("datetime64")
df.info()
Output from both scripts above:

Alternatively, you can parse the columns into a DateTime object at the point you import the data with parse_dates
parameter.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"], parse_dates=["next_date"])
print(df.info())
df.head()

General DateTime Info Extraction
There is a lot of info that can be obtained from the datetime series.
- Timestamp
- Week
- Day of Year
- Quarter
- ISO Calendar
Extract the Timestamp Info
Below is the list of properties and methods to return timestamp info.
- pandas.Series.dt.date – Returns the date part of Timestamps without time and timezone information.
- pandas.Series.dt.time – Return the time part of the Timestamps.
- pandas.Series.dt.year – Return the year of the datetime.
- pandas.Series.dt.month – Return the month as January=1, December=12
- pandas.Series.dt.month_name() – Return the month names of the Series or DatetimeIndex with specified locale.
- pandas.Series.dt.day – Return the day of the datetime.
- pandas.Series.dt.hour – Return the hours of the datetime.
- pandas.Series.dt.minute – Return the minutes of the datetime.
- pandas.Series.dt.second – Return the seconds of the datetime.
- pandas.Series.dt.microsecond – Return the microseconds of the datetime.
- pandas.Series.dt.nanosecond – Return the nanoseconds of the datetime.
import datetime as dt
The datetime module needs to be imported before using the methods or properties under series.dt
. Below is an example of extracting the timestamp info.
df.insert(loc=1, column="Date_",value=df["next_date"].dt.date)
df.insert(loc=2, column="Time",value=df["next_date"].dt.time)
df.insert(loc=3, column="year",value=df["next_date"].dt.year)
df.insert(loc=4, column="month",value=df["next_date"].dt.month)
# note that the month_name is a method instead of properties
df.insert(loc=5, column="month_name",value=df["next_date"].dt.month_name())
df.insert(loc=6, column="day",value=df["next_date"].dt.day)
df.insert(loc=7, column="hour",value=df["next_date"].dt.hour)
df.insert(loc=8, column="minute",value=df["next_date"].dt.minute)
df.head()
Output:

One thing to take note of is the columns created are not a DateTime object, even the "Date_" column.

You may notice that the examples for second, microsecond and nanosecond are not demonstrated. That’s because they do not apply to the dataset. Furthermore, the way to apply is the same. The column type needs to be converted to datetime before using the properties or methods to return the respective value.
Extract Week Info/Day of Year/Quarter/ISO Calendar
Below is the list of properties and methods to return the week number, the day of the year, the quarter and the info based on the ISO Calendar for the DateTime series.
Week Info
- pandas.Series.dt.dayofweek – The day of the week with Monday=0, Sunday=6.
- pandas.Series.dt.day_of_week – The day of the week with Monday=0, Sunday=6.
- pandas.Series.dt.weekday – The day of the week with Monday=0, Sunday=6.
- pandas.Series.dt.day_name() – Return the day names of the Series or DatetimeIndex with specified locale.
Day of Year
- pandas.Series.dt.dayofyear – The ordinal day of the year.
- pandas.Series.dt.day_of_year – The ordinal day of the year.
Quarter
- pandas.Series.dt.quarter – The quarter of the date.
ISO Calendar
- pandas.Series.dt.isocalendar() – Calculate year, week, and day according to the ISO 8601 standard. (Return year, week and day columns)
To show the fun stuff we can do with the methods/properties above, I create a list of dates with randomly picked dates distributed throughout the year as shown below.
import pandas as pd
import datetime as dt
date_list = ["2022-10-03", "2022-11-17", "2022-12-14", "2023-01-23", "2023-02-14", "2023-03-23", "2023-04-11", "2023-05-28", "2023-06-24", "2023-07-04", "2023-08-06", "2023-09-08"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.head(12)
As we are not reading from a file, there is no parse_dates
function available. Hence, the column has to be converted into datetime manually.

Below is an example of extracting the week, day of the year, quarter and ISO Calendar. The columns are named based on the properties or methods names for better understanding.
df.insert(loc=1, column="Day of Week",value=df["Date"].dt.day_of_week)
df.insert(loc=2, column="Weekday",value=df["Date"].dt.weekday)
# note that the month_name is a method instead of properties
df.insert(loc=3, column="Day Name",value=df["Date"].dt.day_name())
# day of the year
df.insert(loc=4, column="Day of Year",value=df["Date"].dt.day_of_year)
# quarter
df.insert(loc=5, column="Quarter",value=df["Date"].dt.quarter)
# iso calendar
df.insert(loc=6, column="ISO Year",value=df["Date"].dt.isocalendar().year)
df.insert(loc=7, column="ISO Week",value=df["Date"].dt.isocalendar().week)
df.insert(loc=8, column="ISO Day",value=df["Date"].dt.isocalendar().day)
df[["Date", "Day of Week", "Weekday", "Day Name", "Day of Year", "Quarter", "ISO Year", "ISO Week", "ISO Day"]].head(12)
Output:

Below is a summary of the table above:
- For both
day_of_week
andweekday
properties, they return the day of the week with index counting starting from 0. - For
day_of_year
,quarter
properties and theisocalendar()
method, they return output with index counting starting from 1.
The isocalendar()
method index for counting the weekday is start from 1, while the weekday
starts from 0. They both count weekdays starting from Monday. In another word, the first index is referring to Monday.
Check if the Date is the Start or the End of the Period
For this section, a different date list will be created to better demonstrate the properties below.
- pandas.Series.dt.is_month_start – Indicates whether the date is the first day of the month.
- pandas.Series.dt.is_month_end – Indicates whether the date is the last day of the month.
- pandas.Series.dt.is_quarter_start – Indicator for whether the date is the first day of a quarter.
- pandas.Series.dt.is_quarter_end – Indicator for whether the date is the last day of a quarter.
- pandas.Series.dt.is_year_start – Indicate whether the date is the first day of a year.
- pandas.Series.dt.is_year_end – Indicate whether the date is the last day of the year.
Example:
date_list = ["2023-01-01", "2023-01-23", "2023-01-31", "2023-02-01", "2023-02-28", "2023-04-01", "2023-06-30", "2023-09-30", "2023-11-30", "2023-12-31"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.insert(loc=1, column="Month Start",value=df["Date"].dt.is_month_start)
df.insert(loc=2, column="Month End",value=df["Date"].dt.is_month_end)
df.insert(loc=3, column="Quarter Start",value=df["Date"].dt.is_quarter_start)
df.insert(loc=4, column="Quarter End",value=df["Date"].dt.is_quarter_end)
df.insert(loc=5, column="Year Start",value=df["Date"].dt.is_year_start)
df.insert(loc=6, column="Year End",value=df["Date"].dt.is_year_end)
df.head(12)
Output:

Thoughts: I believe the properties are best for anyone that needs to prepare a new report on a monthly, quarterly or yearly basis.
These properties will help them to refresh their report based on the automation logic created. Other than that, the properties above may also be useful in the calculation that needs to be restarted periodically.
Check if the Date Belongs to a Leap Year
- pandas.Series.dt.is_leap_year – Boolean indicator if the date belongs to a leap year.
A leap year is a year, which has 366 days (instead of 365) including the 29th of February as an intercalary day. Leap years are years which are multiples of four except for the years divisible by 100 but not by 400.
We can demonstrate this function with the date created with the period range.
df = pd.DataFrame()
df["Year"] = pd.period_range(start="2022/1/1", periods=10, freq="Y")
df.insert(loc=1, column="Leap Year",value=df["Year"].dt.is_leap_year)
print(df.head(10))
Output:

Check the Number of Days in the Month
Both of the properties below can return the number of days in a month.
- pandas.Series.dt.daysinmonth – The number of days in the month.
- pandas.Series.dt.days_in_month – The number of days in the month.
df = pd.DataFrame()
df["Month"] = pd.period_range(start="2022/1/1", periods=12, freq="M")
df.insert(loc=1, column="Days in Month",value=df["Month"].dt.days_in_month)
df.head(12)
Output:

Conclusion
In conclusion, some basic properties and methods to deal with the DateTime series have been explained. The method to convert the datatype of columns that contained datetime objects to datetime has been shown. Then, the basic properties and methods to extract or return the datetime info have been demonstrated. The datetime info like weekday has different indexing for different methods.
Other than that, some methods to check the properties of the date, like the date is the start or end of a period, or whether the date belongs to a leap year have been shown. Lastly, the method to check the number of a date in a month have been presented. These methods and properties might be useful for reporting purposes.
That’s all for the basics of dealing with dates in Python. I hope you enjoy reading this article and hope it helps you to understand more about how to deal with the DateTime series in a DataFrame. Thanks! 😊
Stay Connected
Subscribe on YouTube
Side Note
Part 1 of this article, Dealing with Dates in Python’s DataFrame Part 1 – Date Series Creation.
I have explained the possible manipulation you can perform on the DateTime variables in Dealing with Dates in Python.
In Report Automation Tips with Python, I have explained some tips on report automation. Check it out!
Thank you and congrats for reading to the end 😊 !
Happy 2023!
