
The time and its digitization

Eight things you should know before working with dates and times

I never worried too much about dates, times, and time-related matters in general, neither in my data science work nor in daily life. I admit it: I was so confused that I had only a hazy idea of when to move the hands of my clock forward or backward, and even of the direction of the change. That lasted until I needed to deal with timestamps for the first time. That work changed my view of time and made me aware of common pitfalls in managing this type of data.

Obviously, I knew about the Greenwich meridian and that it is the reference for time zones, but I didn’t know much more. The encodings for time zones were a great mystery to me.

Here I talk about eight things that helped me overcome my confusion; they apply regardless of the software used. Let’s start!

1. How the flowing time is measured

In computer science, a common way to represent a point on the timeline is to count the number of seconds elapsed since a fixed starting point, called the epoch. Unix time, also called POSIX time, is a widespread standard that counts seconds since midnight of January 1st, 1970 UTC (wait a moment to understand what UTC is).

POSIX time is the standard adopted by default by common data science environments such as R and Python. Microsoft Excel, for one, uses different standards.

Computer programs render the time of day as the number of seconds from midnight. Hence, remember: midnight begins a day, so it is hour 0, not 24.

Example > Encoding integer: 1609455600, Decoding render: 2020-12-31 23:00:00 GMT.

If you are working with plain date objects, the encoding is the number of days since the epoch.
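As a minimal sketch of this encoding in Python (using only the standard library; the variable names are illustrative), the integer from the example above can be decoded to a UTC date-time and encoded back, and a plain date can be turned into a day count:

```python
from datetime import date, datetime, timezone

# Decode: integer seconds since the Unix epoch -> timezone-aware datetime in UTC.
ts = 1609455600
decoded = datetime.fromtimestamp(ts, tz=timezone.utc)
print(decoded.isoformat(sep=" "))  # 2020-12-31 23:00:00+00:00

# Encode: aware datetime -> seconds since the epoch.
encoded = int(decoded.timestamp())
print(encoded == ts)  # True

# A plain date can be encoded as the number of days since 1970-01-01.
days_since_epoch = (date(2020, 12, 31) - date(1970, 1, 1)).days
print(days_since_epoch)  # 18627
```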

2. What UTC is

Measuring time requires a universal reference, which historically has been the Greenwich meridian. The "Greenwich hour", more formally Greenwich Mean Time (GMT), is calculated from the Earth’s rotation. Universal Time (UT) in its variations (the principal one being UT1) is a newer "astrophysical" reference that replaced GMT. The current way to keep time, however, relies on atomic clocks: TAI (International Atomic Time) is the time provided by this technology.

Time zones are defined using neither GMT/UT1 nor TAI, but UTC, Coordinated Universal Time. UTC derives from TAI, but it’s slightly different. Because the Earth’s rotation speed varies, UT1 gradually drifts away from TAI (usually falling behind). This drift is compensated by adding a leap second to atomic time whenever needed, so that the difference from UT1 stays below one second. The insertion of leap seconds cannot be predicted far in advance, and by convention it occurs on June 30 or December 31. The result of adding leap seconds to atomic time is UTC.

If you got lost reading this intricate issue, I suggest you consult this brief note that sums up the matter very well.

For common data science problems, leap seconds are mostly background knowledge, because POSIX time ignores them: UTC in your machine is not the true UTC! Each day in your machine encompasses exactly 86,400 seconds, not one more.
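A small Python sketch can make this tangible. It assumes the well-documented leap second inserted at the end of December 31, 2016 (UTC) and checks how many seconds the standard library counts for that day:

```python
from datetime import datetime, timezone

# A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31,
# so in true UTC that day lasted 86,401 seconds.
day_start = datetime(2016, 12, 31, tzinfo=timezone.utc)
next_day = datetime(2017, 1, 1, tzinfo=timezone.utc)

# POSIX-style timestamps still count exactly 86,400 seconds.
seconds_in_day = int(next_day.timestamp() - day_start.timestamp())
print(seconds_in_day)  # 86400

# The leap second itself cannot even be represented as a datetime.
try:
    datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    leap_second_representable = True
except ValueError:
    leap_second_representable = False
print(leap_second_representable)  # False
```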

3. How time zones are defined

A time zone is an area that observes a uniform standard time, where local time is expressed as an offset in hours from UTC. London, since it sits on the Greenwich meridian, has exactly UTC (i.e. UTC+0). Moving east we need to add one (UTC+1), two (UTC+2), or n hours in order to regulate our human lives based on the cycle of the Sun. Similarly, moving west we need to subtract hours (UTC-1, UTC-2, etc.).

Ok, that’s trivial! So if you say "London hour" you mean UTC, and if you say "Rome hour" you mean UTC+1, right? Ehm… not exactly, because this is standard time, but Daylight Saving Time exists too.

4. There is an infernal thing called Daylight Saving Time

In spring, many countries advance their clocks by one hour to gain one more hour of evening daylight. Clocks go forward in spring and back in autumn. Hence, the same country can adopt different conventions depending on the period of the year.

As a result, the "London hour" is GMT during winter and BST (British Summer Time) during summer. Similarly, Rome follows CET (Central European Time) during winter and CEST (Central European Summer Time) during summer. As you can see in the diagram below, in the warm season London adopts the same time that Rome adopts in the cold season!

For this reason, a time zone is defined by its offset from UTC plus the rules governing daylight saving time.

The transition between UTC offsets within the same time zone has two paradoxical effects (the local times below are those of Italy):

  • on the last Sunday of March, at 2:00 AM clock hands go forward to 3:00, so times between 2:00 and 3:00 do not occur;
  • on the last Sunday of October, at 3:00 AM clock hands go back to 2:00, so times between 2:00 and 3:00 occur twice.

The diagram below shows three lines representing three time references: UTC and the two standards in use in Italy, CET and CEST. The transition from CET to CEST occurs on the last Sunday of March, while on the last Sunday of October Italy returns to CET.

5. There is a blessed thing called Time Zone Database

To manage this complexity, a time zone database is available. This source, known as the IANA time zone database (or "Olson" database), provides a uniform naming convention for time zones in the form Area/Location (such as Europe/London or Europe/Rome) and includes transitions for daylight saving time.

Thanks to the time zone database, when working with timestamp objects you don’t have to worry too much about daylight saving time. Time zone handling relies on this database, so when, for example, you declare a time zone as Europe/Rome, your software automatically manages the transitions.
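As an illustration, here is a sketch of how Python exposes the database through the standard zoneinfo module (Python 3.9+; it assumes the time zone data is available on the system or through the tzdata package). The helper utc_offset_hours is a hypothetical name introduced only for this example:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def utc_offset_hours(zone_name, year, month, day):
    """Return the UTC offset, in hours, observed at local noon on a given day."""
    local_noon = datetime(year, month, day, 12, tzinfo=ZoneInfo(zone_name))
    return local_noon.utcoffset() / timedelta(hours=1)

# The same zone name yields different offsets depending on the season.
print(utc_offset_hours("Europe/London", 2021, 1, 15))  # 0.0 (GMT)
print(utc_offset_hours("Europe/London", 2021, 7, 15))  # 1.0 (BST)
print(utc_offset_hours("Europe/Rome", 2021, 1, 15))    # 1.0 (CET)
print(utc_offset_hours("Europe/Rome", 2021, 7, 15))    # 2.0 (CEST)

# The abbreviation in force is also taken from the database.
summer_in_rome = datetime(2021, 7, 15, 12, tzinfo=ZoneInfo("Europe/Rome"))
print(summer_in_rome.tzname())  # CEST
```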

6. POSIX encoding doesn’t care

Since time is stored as a number of seconds from the epoch, it’s time zone-free. The time zone is just an attribute. You can convert a timestamp from one time zone to another, but its value doesn’t change. The time zone affects its decoding, not its encoding.
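A quick way to convince yourself is the following Python sketch (Python 3.9+ with zoneinfo; the choice of zones is arbitrary), which converts one instant to several time zones and compares the underlying values:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

instant_utc = datetime.fromtimestamp(1609455600, tz=timezone.utc)
instant_rome = instant_utc.astimezone(ZoneInfo("Europe/Rome"))
instant_new_york = instant_utc.astimezone(ZoneInfo("America/New_York"))

# The rendering changes with the time zone...
print(instant_utc.isoformat())       # 2020-12-31T23:00:00+00:00
print(instant_rome.isoformat())      # 2021-01-01T00:00:00+01:00
print(instant_new_york.isoformat())  # 2020-12-31T18:00:00-05:00

# ...but the encoded value, and therefore the instant, stays the same.
print(instant_rome.timestamp() == instant_new_york.timestamp())  # True
print(instant_utc == instant_rome == instant_new_york)           # True
```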

7. The question "what day is today?" is relative

Because of time zones, the current date, as well as the time, is relative. It’s obvious, but we often forget it.

Example > The timestamp 1609455600 is rendered as 2020-12-31 23:00:00 GMT if you are in the UK, but 2021-01-01 00:00:00 CET if you are in Italy. Two different days, and more: two different years!

You have to pay close attention to this, especially when you want to extract a date from a timestamp, because depending on the time zone you could get different results from the same object.

In R, the function as.Date of the base environment behaves differently from the function as_date of the popular package lubridate. Both functions extract the date from a timestamp, giving the option to specify the output time zone used to render the date. However, by default, the first one uses the current locale, while the second one uses the original time zone of the timestamp object.
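The same pitfall can be sketched in Python (3.9+ with zoneinfo; variable names are illustrative) by extracting the date from the timestamp of the example above in two different time zones:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ts = 1609455600

# Same timestamp, decoded in two time zones before taking the date part.
date_london = datetime.fromtimestamp(ts, tz=ZoneInfo("Europe/London")).date()
date_rome = datetime.fromtimestamp(ts, tz=ZoneInfo("Europe/Rome")).date()

print(date_london)  # 2020-12-31
print(date_rome)    # 2021-01-01
```

Passing the time zone explicitly, rather than relying on a default, keeps the result reproducible across machines.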

8. There are many ways to render dates and times

There are many different standards for time representation. A common one is ISO 8601, in which the Unix epoch is:

1970-01-01T00:00:00+00:00

According to this standard, the date (in the format "yyyy-mm-dd") is separated from the time (in the format "hh:mm:ss") by the character "T", an abbreviation for "Time". The trailing string "+00:00" stands for the offset from UTC ("hh:mm").

Example > Christmas 2020 in Rome began at "2020-12-25T00:00:00+01:00".

For the special case UTC+0, the character "Z" (an abbreviation for "Zero meridian") can be used to specify the time zone using a compact form:

1970-01-01T00:00:00Z

An equivalent way to express the same value is:

1970-01-01T00:00:00.000Z

where the last digits ".000" before "Z" quantify the milliseconds elapsed since the second recorded.

A simple but instructive document about time standards, and about time rules in general, is Markus Kuhn’s guide. I recommend you take a look at it.
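For reference, here is a sketch of how these renderings can be produced and parsed with Python's standard library. The replacement of "+00:00" with "Z" is a simple string substitution chosen for this example, not a dedicated library feature:

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Rendering with an explicit offset, with and without milliseconds.
iso_offset = epoch.isoformat()
iso_millis = epoch.isoformat(timespec="milliseconds")
print(iso_offset)  # 1970-01-01T00:00:00+00:00
print(iso_millis)  # 1970-01-01T00:00:00.000+00:00

# Compact "Z" form, obtained here by plain string substitution.
iso_z = iso_millis.replace("+00:00", "Z")
print(iso_z)  # 1970-01-01T00:00:00.000Z

# Parsing a string with an offset gives back an aware datetime.
parsed = datetime.fromisoformat("2021-01-01T00:00:00+01:00")
print(parsed.utcoffset() == timedelta(hours=1))  # True
print(int(parsed.timestamp()))                   # 1609455600
```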

