Dates and Times#

We already met the datetime module in Web Access for handling points of time and time durations. Pandas extends those capabilities by introducing time periods (durations associated with a point) and more advanced calendar arithmetics.

Pandas also provides date and time related index objects to easily index time series data: DatetimeIndex, TimedeltaIndex, PeriodIndex.

import pandas as pd

Time Stamps#

The basic data structure for representing points in time are Timestamp objects. They provide lots of useful methods for conversion from and to other date and time formats.

some_day = pd.Timestamp(year=2020, month=2, day=15, hour=12, minute=34)
some_day
Timestamp('2020-02-15 12:34:00')

Time Deltas#

The basic data structure for representing durations are Timedelta objects. They can be used in their own or to shift time stamps.

a_long_time = pd.Timedelta(days=10000, minutes=100)
a_long_time
Timedelta('10000 days 01:40:00')

Periods#

A period in Pandas is a time interval paired with a time stamp. Interpretation is as follows:

  • The interval is one of several preset intervals, like a calendar month or a week from Monday till Sunday. See Offset aliases and Anchored aliases for available intervals.

  • The time stamp selects a concrete interval, the month or the week containing the time stamp, for instance.

The basic data structure for representing periods are Period objects.

year2010 = pd.Period('1/1/2010', freq='A')
print(year2010.start_time)
print(year2010.end_time)
2010-01-01 00:00:00
2010-12-31 23:59:59.999999999
year2010oct = pd.Period('1/1/2010', freq='A-OCT')
print(year2010oct.start_time)
print(year2010oct.end_time)
2009-11-01 00:00:00
2010-10-31 23:59:59.999999999

Time Stamp Indices#

Using time stamps for indexing offers lots of nice features in Pandas, because Pandas originally has been developed for handling time series.

Creating Time Stamp Indices#

The constructor for DatetimeIndex objects takes a list of time stamps and an optional frequency. Frequency has to match the passed time stamps. If there is no common frequency in the data, the frequency is None (default).

A more convenient method is pd.date_range:

index = pd.date_range(start='2018-03-14', freq='2D12H', periods=10)
index
DatetimeIndex(['2018-03-14 00:00:00', '2018-03-16 12:00:00',
               '2018-03-19 00:00:00', '2018-03-21 12:00:00',
               '2018-03-24 00:00:00', '2018-03-26 12:00:00',
               '2018-03-29 00:00:00', '2018-03-31 12:00:00',
               '2018-04-03 00:00:00', '2018-04-05 12:00:00'],
              dtype='datetime64[ns]', freq='60H')
index = pd.date_range(start='2018-03-14', end='2018-03-20', freq='D')
index
DatetimeIndex(['2018-03-14', '2018-03-15', '2018-03-16', '2018-03-17',
               '2018-03-18', '2018-03-19', '2018-03-20'],
              dtype='datetime64[ns]', freq='D')

We may also use columns of an existing data frame to create a DatetimeIndex:

df = pd.DataFrame({'day': [1, 2, 3, 4], 'month': [4, 6, 9, 9], 'year': [2018, 2018, 2020, 2020]})
display(df)

pd.to_datetime(df)
day month year
0 1 4 2018
1 2 6 2018
2 3 9 2020
3 4 9 2020
0   2018-04-01
1   2018-06-02
2   2020-09-03
3   2020-09-04
dtype: datetime64[ns]

The to_datetime function expects that columns are named 'day', 'month', 'year'. It returns a series of type datetime64 which may be converted to a DatetimeIndex.

Indexing#

Exact Indexing#

Using Timestamp objects for label based indexing yields items with the corresponding time stamp, if there are any. Slicing works as usual.

index = pd.date_range(start='2018-03-14', freq='D', periods=10)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=index)
print(s, '\n')

print(s.loc[pd.Timestamp('2018-3-16')], '\n')
#print(s.loc[pd.Timestamp('2018-3-16 10:00')], '\n')    # KeyError
print(s.loc[pd.Timestamp('2018-3-16 00:00')], '\n')
print(s.loc[pd.Timestamp('2018-3-16'):pd.Timestamp('2018-3-20')])
2018-03-14     1
2018-03-15     2
2018-03-16     3
2018-03-17     4
2018-03-18     5
2018-03-19     6
2018-03-20     7
2018-03-21     8
2018-03-22     9
2018-03-23    10
Freq: D, dtype: int64 

3 

3 

2018-03-16    3
2018-03-17    4
2018-03-18    5
2018-03-19    6
2018-03-20    7
Freq: D, dtype: int64

Inexact Indexing#

Passing strings containing partial dates/times selects time ranges. This technique is referred to as partial string indexing. Slicing is allowed.

index = pd.date_range(start='2018-03-14', freq='D', periods=100)
s = pd.Series(range(1, len(index) + 1), index=index)
print(s, '\n')

print(s.loc['2018-3'], '\n')
print(s.loc['2018-3':'2018-4'])
2018-03-14      1
2018-03-15      2
2018-03-16      3
2018-03-17      4
2018-03-18      5
             ... 
2018-06-17     96
2018-06-18     97
2018-06-19     98
2018-06-20     99
2018-06-21    100
Freq: D, Length: 100, dtype: int64 

2018-03-14     1
2018-03-15     2
2018-03-16     3
2018-03-17     4
2018-03-18     5
2018-03-19     6
2018-03-20     7
2018-03-21     8
2018-03-22     9
2018-03-23    10
2018-03-24    11
2018-03-25    12
2018-03-26    13
2018-03-27    14
2018-03-28    15
2018-03-29    16
2018-03-30    17
2018-03-31    18
Freq: D, dtype: int64 

2018-03-14     1
2018-03-15     2
2018-03-16     3
2018-03-17     4
2018-03-18     5
2018-03-19     6
2018-03-20     7
2018-03-21     8
2018-03-22     9
2018-03-23    10
2018-03-24    11
2018-03-25    12
2018-03-26    13
2018-03-27    14
2018-03-28    15
2018-03-29    16
2018-03-30    17
2018-03-31    18
2018-04-01    19
2018-04-02    20
2018-04-03    21
2018-04-04    22
2018-04-05    23
2018-04-06    24
2018-04-07    25
2018-04-08    26
2018-04-09    27
2018-04-10    28
2018-04-11    29
2018-04-12    30
2018-04-13    31
2018-04-14    32
2018-04-15    33
2018-04-16    34
2018-04-17    35
2018-04-18    36
2018-04-19    37
2018-04-20    38
2018-04-21    39
2018-04-22    40
2018-04-23    41
2018-04-24    42
2018-04-25    43
2018-04-26    44
2018-04-27    45
2018-04-28    46
2018-04-29    47
2018-04-30    48
Freq: D, dtype: int64

Inexact indexing has some pitfalls, which are described in Partial string indexing and Slice vs. exact match of the Pandas user guide.

Useful Functions for Time Stamp Indexed Data#

Pandas provides lots of functions for working with time stamp indices. Some are:

  • asfreq (upsampling with fill values or filling logic)

  • shift (shift index or data by some time period)

  • resample (downsampling with aggregation, see below)

The resample method returns a Resampler object, which provides several methods for calculating data values at the new time stamps. Examples are sum, mean, min, max. All these methods return a series or a data frame.

index = pd.date_range(start='2018-03-14', freq='D', periods=100)
s = pd.Series(range(1, len(index) + 1), index=index)

s2 = s.resample('5D').sum()
s2
2018-03-14     15
2018-03-19     40
2018-03-24     65
2018-03-29     90
2018-04-03    115
2018-04-08    140
2018-04-13    165
2018-04-18    190
2018-04-23    215
2018-04-28    240
2018-05-03    265
2018-05-08    290
2018-05-13    315
2018-05-18    340
2018-05-23    365
2018-05-28    390
2018-06-02    415
2018-06-07    440
2018-06-12    465
2018-06-17    490
Freq: 5D, dtype: int64

Period indices#

Period indices work analogously to time stamp indices. Corresponding class is PeriodIndex.

index = pd.period_range(start='2018-03-14', freq='D', periods=10)
print(index)

s = pd.Series(range(1, len(index) + 1), index=index)
s
PeriodIndex(['2018-03-14', '2018-03-15', '2018-03-16', '2018-03-17',
             '2018-03-18', '2018-03-19', '2018-03-20', '2018-03-21',
             '2018-03-22', '2018-03-23'],
            dtype='period[D]')
2018-03-14     1
2018-03-15     2
2018-03-16     3
2018-03-17     4
2018-03-18     5
2018-03-19     6
2018-03-20     7
2018-03-21     8
2018-03-22     9
2018-03-23    10
Freq: D, dtype: int64

Indexing with time stamps selects the appropriate period, like with IntervalIndex objects:

s.loc[pd.Timestamp('2018-03-15 12:34')]
2

With time stamp index the above line would lead to a KeyError. But for periods it’s interpreted as: select the period containing the time stamp.

Similar is possible with slicing:

s.loc[pd.Timestamp('2018-03-15 12:34'):pd.Timestamp('2018-03-18 23:45')]
2018-03-15    2
2018-03-16    3
2018-03-17    4
2018-03-18    5
Freq: D, dtype: int64

Methods asfreq, shift, resample also work for periods indices.