8 min read · Dec 1, 2023
Recently I realized that, despite having used pandas for several years, I knew very little about working with time series in pandas. That’s why I decided to dive into this topic. As a result, I learned some very interesting things, some of which appeared confusing at first but became clearer the more I learned to use them. For example, did you know there’s not just one or two, but four different time-related concepts in pandas? They are datetimes, periods, time deltas, and time offsets.
In today’s article, I will focus on the first three concepts, because they’re pretty similar to one another in how they’re constructed and used. Note that this article is not meant to be a complete guide on these concepts. Rather, it will be a quick introduction for those of you who wish to either try a taste of time series in pandas or review your knowledge of this topic.
The concept of datetime (also called timestamp) refers to a specific date and time, which may be associated with a specific timezone. In pandas, each datetime is represented by an instance of the Timestamp class. If you’re familiar with the standard datetime module in Python, Timestamp is very similar to the datetime.datetime class. Sequences of Timestamps are collected in DatetimeIndex instances. As the name suggests, these objects are often used as the indices of objects such as Series and DataFrame.
A time span is a regular interval of time defined by two things: a point in time and a frequency. Each time span is represented by a Period instance. For example, Period("2023Q4", "Q-DEC") represents the fourth quarter of 2023 (with the year ending in December). As with datetimes, there’s also the PeriodIndex class for storing sequences of time spans.
Time deltas are absolute differences in time and are represented by the Timedelta class, which is similar to the datetime.timedelta class from the standard Python library. Sequences of Timedeltas are stored in TimedeltaIndex instances.
The remainder of this article will involve code, so first let’s import the pandas library:
import pandas as pd
pd.__version__
# ------
# Output
# '2.1.3'
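Before going through the constructors one by one, here is a rough sketch of how the three concepts described above relate to each other. The variable names and the specific dates are only illustrative; the sketch assumes nothing beyond pandas and the standard datetime module.

```python
import datetime as dt

import pandas as pd

# A datetime: one specific point in time.
ts = pd.Timestamp("2023-11-26 18:25")
# Timestamp is a subclass of the standard datetime.datetime class.
assert isinstance(ts, dt.datetime)

# A time span: the fourth quarter of 2023, with the year ending in December.
period = pd.Period("2023Q4", freq="Q-DEC")
# The point in time above falls inside that span.
assert period.start_time <= ts <= period.end_time

# A time delta: the absolute difference between two points in time.
delta = ts - pd.Timestamp("2023-11-25 12:00")
assert isinstance(delta, pd.Timedelta)
assert isinstance(delta, dt.timedelta)
print(delta)  # 1 days 06:25:00
```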
Constructing a Single Datetime
To create a single datetime, use the Timestamp constructor. You can pass to it one of the following:
- A string representing a timestamp
pd.Timestamp("2023-11-26 12:43:52 AM")
# ------
# Output
# Timestamp('2023-11-26 00:43:52')
- An integer or a float representing an epoch time, i.e., the number of units (nanoseconds, by default) since midnight on 1970-01-01 (UTC/GMT). You can specify your desired unit, but note that for floats, the unit will always be reset to nanosecond. The valid unit strings are "D" (day), "h" (hour), "m" (minute), "s" (second), "ms" (millisecond), "us" (microsecond), and "ns" (nanosecond).
pd.Timestamp(1700978132, unit="s")
# ------
# Output
# Timestamp('2023-11-26 05:55:32')
- A datetime.datetime instance. You must import the datetime module beforehand.
- Several integer arguments representing different components of a datetime
pd.Timestamp(2023, 11, 26, 18, 25)
# ------
# Output
# Timestamp('2023-11-26 18:25:00')
- Keyword arguments for different components of a datetime
pd.Timestamp(year=2023, month=11, day=26, hour=18, minute=25, nanosecond=123)
# ------
# Output
# Timestamp('2023-11-26 18:25:00.000000123')
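The datetime.datetime case in the list above comes without an example, so here is a minimal sketch of it. It also checks that the same epoch time expressed in two different units gives the same Timestamp. The variable names are illustrative only.

```python
import datetime as dt

import pandas as pd

# Build a Timestamp from a standard-library datetime instance.
py_dt = dt.datetime(2023, 11, 26, 18, 25)
ts = pd.Timestamp(py_dt)

# It matches the Timestamp built from integer components...
assert ts == pd.Timestamp(2023, 11, 26, 18, 25)
# ...and converts back to the original datetime.
assert ts.to_pydatetime() == py_dt

# The same epoch time, given once in seconds and once in milliseconds.
from_seconds = pd.Timestamp(1700978132, unit="s")
from_millis = pd.Timestamp(1700978132000, unit="ms")
assert from_seconds == from_millis
```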
Constructing a Sequence of Datetimes
As noted earlier, sequences of datetimes are often used as the indices of Series and DataFrame objects, so one way to obtain such a sequence is from a Series/DataFrame with this type of index:
ts = pd.Series(
[1, 2, 3],
index=[
pd.Timestamp("2023-11-26"),
pd.Timestamp("2023-11-27"),
pd.Timestamp("2023-11-28")
]
)
ts.index
# ------
# Output
# DatetimeIndex(['2023-11-26', '2023-11-27', '2023-11-28'], dtype='datetime64[ns]', freq=None)
Or you can use either the DatetimeIndex constructor or the top-level function pd.date_range(). The first argument to the DatetimeIndex constructor must be an array or list of datetime-like data:
pd.DatetimeIndex(["20231126", "20231127", "20231128"])
# ------
# Output
# DatetimeIndex(['2023-11-26', '2023-11-27', '2023-11-28'], dtype='datetime64[ns]', freq=None)
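As a small sanity check of the two approaches so far, the sketch below confirms that the elements of a DatetimeIndex are Timestamp instances and that a Series built on such an index hands the same index back. The names idx and series are illustrative.

```python
import pandas as pd

# Build the index directly from datetime-like strings.
idx = pd.DatetimeIndex(["20231126", "20231127", "20231128"])

# Each element of a DatetimeIndex is a Timestamp.
assert isinstance(idx[0], pd.Timestamp)
assert idx[0] == pd.Timestamp("2023-11-26")

# Using the index on a Series and reading it back gives equal values.
series = pd.Series([1, 2, 3], index=idx)
assert series.index.equals(idx)
```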
The pd.date_range() function is used to create a range of datetimes. There are essentially four ways to use this function:
- Specify the start, the end, and the frequency (via the freq keyword argument). See the pandas documentation on offset aliases for a list of valid frequency strings; these strings can have multiples. By default, the frequency is a calendar day.
pd.date_range("2023-11-26 12", "2023-11-27", freq="8H")
# ------
# Output
# DatetimeIndex(['2023-11-26 12:00:00', '2023-11-26 20:00:00'], dtype='datetime64[ns]', freq='8H')
- Specify the start, the number of datetimes to generate (via the periods keyword argument), and the frequency
pd.date_range("2023-11-26 12", periods=3, freq="8H")
# ------
# Output
# DatetimeIndex(['2023-11-26 12:00:00', '2023-11-26 20:00:00',
# '2023-11-27 04:00:00'],
# dtype='datetime64[ns]', freq='8H')
- Specify the end (via the end keyword argument), the number of datetimes to generate, and the frequency
pd.date_range(end="2023-11-26 12", periods=3, freq="1.5H")
# ------
# Output
# DatetimeIndex(['2023-11-26 09:00:00', '2023-11-26 10:30:00',
# '2023-11-26 12:00:00'],
# dtype='datetime64[ns]', freq='90T')
- Specify the start, the end, and the number of datetimes to generate. The frequency will be generated automatically (linearly spaced).
pd.date_range("2023-11-26", "2023-11-27", periods=3)
# ------
# Output
# DatetimeIndex(['2023-11-26 00:00:00', '2023-11-26 12:00:00',
# '2023-11-27 00:00:00'],
# dtype='datetime64[ns]', freq=None)
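The four ways of calling pd.date_range() listed above can describe the same range. The sketch below is one way to check that, under the assumption that the frequency is passed as a Timedelta object rather than a frequency string (the function accepts both), which sidesteps any differences in alias spelling between pandas versions. The variable names are illustrative.

```python
import pandas as pd

step = pd.Timedelta(hours=8)
start, end = "2023-11-26 12:00", "2023-11-27 04:00"

# 1. Start, end, and frequency.
by_bounds = pd.date_range(start, end, freq=step)
# 2. Start, number of datetimes, and frequency.
from_start = pd.date_range(start, periods=3, freq=step)
# 3. End, number of datetimes, and frequency.
from_end = pd.date_range(end=end, periods=3, freq=step)
# 4. Start, end, and number of datetimes (linearly spaced).
linear = pd.date_range(start, end, periods=3)

# All four calls yield the same three datetimes.
assert list(by_bounds) == list(from_start) == list(from_end) == list(linear)
print(list(by_bounds))
```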
Constructing a Single Time Span
The Period constructor has a similar API to that of the Timestamp constructor. You can create a Period instance from one of the following:
- A string representing a time span
# The fourth quarter of the year 2023
pd.Period("4Q2023")
# ------
# Output
# Period('2023Q4', 'Q-DEC')
pd.Period("2023-11-26 13:48")
# ------
# Output
# Period('2023-11-26 13:48', 'T')
- A datetime-like object. You must also specify the span via the freq keyword argument. See the pandas documentation on period aliases for a list of valid frequency strings; these strings can have multiples.
pd.Period(pd.Timestamp("2023-11"), freq="2M")
# ------
# Output
# Period('2023-11', '2M')
- Keyword arguments: year, month, quarter, day, hour, minute, and second. The frequency must also be specified.
pd.Period(year=2023, month=11, day=26, hour=23, freq="h")
# ------
# Output
# Period('2023-11-26 23:00', 'H')
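To tie the three construction styles together, the following sketch builds the same monthly span in each of the ways listed above and then looks at the boundaries of that span. The monthly frequency "M" and the variable names are choices made for this illustration only.

```python
import pandas as pd

# The same month, built from a string, a Timestamp, and keyword arguments.
from_string = pd.Period("2023-11", freq="M")
from_timestamp = pd.Period(pd.Timestamp("2023-11-26 18:25"), freq="M")
from_keywords = pd.Period(year=2023, month=11, freq="M")
assert from_string == from_timestamp == from_keywords

# A Period covers an interval: it starts at midnight on the first day
# of the month and ends just before the next month begins.
assert from_string.start_time == pd.Timestamp("2023-11-01")
assert from_string.end_time < pd.Timestamp("2023-12-01")
```

Note that the Timestamp passed in carries a day and a time, but the resulting Period only keeps what the monthly frequency can express.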
Constructing a Sequence of Time Spans
As with sequences of datetimes, you can obtain a sequence of time spans from the index of a Series/DataFrame, from the PeriodIndex constructor, or from the top-level function pd.period_range(). You can pass to the PeriodIndex constructor an array or list of period-like data, along with a period alias specifying the span:
pd.PeriodIndex(["2023Q4", "2024Q1", "2024Q2"], freq="Q")
# ------
# Output
# PeriodIndex(['2023Q4', '2024Q1', '2024Q2'], dtype='period[Q-DEC]')
Or you can use keyword arguments, each taking the value of an integer, array, or Series:
pd.PeriodIndex(
year=[2023, 2024],
month=pd.Series([1, 3]),
day=12,
freq="H"
)
# ------
# Output
# PeriodIndex(['2023-01-12 00:00', '2024-03-12 00:00'], dtype='period[H]')
The pd.period_range() function is used to create a range of periods. You must specify two of the following: the start, the end, and the number of periods to generate. By default, the frequency is taken from the start or the end if they are Period instances, or a calendar day otherwise:
# Specify the start and the end
pd.period_range("2023-11-26", "2023-11-28")
# ------
# Output
# PeriodIndex(['2023-11-26', '2023-11-27', '2023-11-28'], dtype='period[D]')
# Specify the start and the end using `Period` instances
pd.period_range(
pd.Period("2023-11-26", freq="1.5D"),
pd.Period("2023-11-30", freq="1.5D")
)
# ------
# Output
# PeriodIndex(['2023-11-26 00:00', '2023-11-27 12:00', '2023-11-29 00:00'], dtype='period[36H]')
# Specify the start and the number of periods,
# along with an optional frequency
pd.period_range("2023-11-26", periods=3)
# ------
# Output
# PeriodIndex(['2023-11-26', '2023-11-27', '2023-11-28'], dtype='period[D]')
# Specify the end and the number of periods,
# along with an optional frequency
pd.period_range(end="2023-11-26", periods=3, freq="1.5H")
# ------
# Output
# PeriodIndex(['2023-11-25 21:00', '2023-11-25 22:30', '2023-11-26 00:00'], dtype='period[90T]')
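As a cross-check of the calling styles above, this sketch builds the same three monthly periods in three different ways and confirms that the results agree. It assumes a monthly frequency ("M") purely for illustration, and the variable names are made up for this example.

```python
import pandas as pd

# Start plus number of periods.
from_start = pd.period_range("2023-11", periods=3, freq="M")
# End plus number of periods.
from_end = pd.period_range(end="2024-01", periods=3, freq="M")
# Start and end given as Period instances; the frequency is taken from them.
from_periods = pd.period_range(
    pd.Period("2023-11", freq="M"),
    pd.Period("2024-01", freq="M"),
)

assert from_start.equals(from_end)
assert from_start.equals(from_periods)

# Each element of a PeriodIndex is a Period.
assert isinstance(from_start[0], pd.Period)
print([str(p) for p in from_start])  # ['2023-11', '2023-12', '2024-01']
```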
Constructing a Single Time Delta
Similarly to how you construct single datetimes or single time spans, you can create a single time delta by passing one of the following to the Timedelta constructor:
- A string representing a time difference/duration
pd.Timedelta("1 day 2 hours")
# ------
# Output
# Timedelta('1 days 02:00:00')
pd.Timedelta("1D2H3m")
# ------
# Output
# Timedelta('1 days 02:03:00')
- A time-delta-like object, e.g., an instance of the datetime.timedelta class from the standard library, or of the timedelta64 class from NumPy
- A number and a unit string. Possible unit strings include "day", "hour", "minute", "seconds", "milliseconds", "microseconds", "nanoseconds" (which is the default), and their abbreviations. In addition, the following strings are also accepted: "W" (week), "D" (day), "T" (minute), "S" (second), "L" (millisecond), "U" (microsecond), and "N" (nanosecond).
pd.Timedelta(1.5, "d")
# ------
# Output
# Timedelta('1 days 12:00:00')
- Keyword arguments: weeks, days, minutes, hours, seconds, microseconds, milliseconds, and nanoseconds
pd.Timedelta(weeks=1, days=2, hours=3, nanoseconds=1e6)
# ------
# Output
# Timedelta('9 days 03:00:00.001000')
Time deltas can be negative. Note how such a time difference is represented:
pd.Timedelta("-6h")
# ------
# Output
# Timedelta('-1 days +18:00:00')
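The time-delta-like case above has no example of its own, so the sketch below covers it and checks that all the construction styles agree on the same 36-hour duration. It also unpacks the negative representation: a negative Timedelta is stored as a negative number of whole days plus a non-negative remainder. The variable names are illustrative.

```python
import datetime as dt

import numpy as np
import pandas as pd

target = pd.Timedelta("1 day 12 hours")

# A number and a unit string.
assert pd.Timedelta(1.5, "D") == target
# Keyword arguments.
assert pd.Timedelta(days=1, hours=12) == target
# Time-delta-like objects from the standard library and from NumPy.
assert pd.Timedelta(dt.timedelta(days=1, hours=12)) == target
assert pd.Timedelta(np.timedelta64(36, "h")) == target

# "-6h" is displayed as "-1 days +18:00:00":
# minus one whole day, plus eighteen hours.
negative = pd.Timedelta("-6h")
assert negative.days == -1
assert negative.seconds == 18 * 3600
assert negative.total_seconds() == -6 * 3600
```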
Constructing a Sequence of Time Deltas
You can use the TimedeltaIndex constructor:
pd.TimedeltaIndex(["0 days", "1 days", "2 days", "3 days", "4 days"])
# ------
# Output
# TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
pd.TimedeltaIndex([1, 2, 4, 8], unit="d")
# ------
# Output
# TimedeltaIndex(['1 days', '2 days', '4 days', '8 days'], dtype='timedelta64[ns]', freq=None)
Or you can use the top-level function pd.timedelta_range(), which has a very similar API to that of pd.date_range():
# Specify the start and the end, with the default daily frequency
pd.timedelta_range("1H", "3D")
# ------
# Output
# TimedeltaIndex(['0 days 01:00:00', '1 days 01:00:00', '2 days 01:00:00'], dtype='timedelta64[ns]', freq='D')
# Specify the start, the end, and
# a frequency different from the default value
# a frequency different from the default value
pd.timedelta_range("1H", "3D", freq="18H")
# ------
# Output
# TimedeltaIndex(['0 days 01:00:00', '0 days 19:00:00', '1 days 13:00:00',
# '2 days 07:00:00'],
# dtype='timedelta64[ns]', freq='18H')
# Specify the start and the number of time deltas,
# with the default daily frequency
pd.timedelta_range("1H", periods=3)
# ------
# Output
# TimedeltaIndex(['0 days 01:00:00', '1 days 01:00:00', '2 days 01:00:00'], dtype='timedelta64[ns]', freq='D')
# Specify the end and the number of time deltas,
# with the default daily frequency
pd.timedelta_range(end="1H", periods=3)
# ------
# Output
# TimedeltaIndex(['-2 days +01:00:00', '-1 days +01:00:00', '0 days 01:00:00'], dtype='timedelta64[ns]', freq='D')
# Specify the start, the end, and the number of time deltas
# The frequency is generated automatically (linearly spaced)
pd.timedelta_range("1H", "3D", periods=3)
# ------
# Output
# TimedeltaIndex(['0 days 01:00:00', '1 days 12:30:00', '3 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
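As with pd.date_range(), different argument combinations can describe the same range of time deltas. Here is a minimal sketch, assuming the frequency is passed as a Timedelta object instead of a frequency string so that it does not depend on how aliases are spelled in a given pandas version. The variable names are illustrative.

```python
import pandas as pd

step = pd.Timedelta(hours=18)

# Start, end, and frequency: stops at the last value not exceeding the end.
by_bounds = pd.timedelta_range("1h", "3D", freq=step)
# Start, number of time deltas, and frequency.
from_start = pd.timedelta_range("1h", periods=4, freq=step)

assert list(by_bounds) == list(from_start)

# Each element of a TimedeltaIndex is a Timedelta.
assert isinstance(by_bounds[0], pd.Timedelta)
assert by_bounds[-1] == pd.Timedelta("2 days 07:00:00")
```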
I hope by now you’ve gained a decent understanding of how time is represented in pandas. Of course, because this article was just a quick introduction, many things were left out, such as timezones, indexing datetime-like objects, properties of such objects, and time-related arithmetic operations. For more information, you can visit the official pandas user guide on time series.