Dates and Times
Contents
Dates and Times#
We already met the datetime
module in Web Access for handling points of time and time durations. Pandas extends those capabilities by introducing time periods (durations associated with a point) and more advanced calendar arithmetics.
Pandas also provides date and time related index objects to easily index time series data: DatetimeIndex
, TimedeltaIndex
, PeriodIndex
.
import pandas as pd
Time Stamps#
The basic data structure for representing points in time are Timestamp
objects. They provide lots of useful methods for conversion from and to other date and time formats.
some_day = pd.Timestamp(year=2020, month=2, day=15, hour=12, minute=34)
some_day
Timestamp('2020-02-15 12:34:00')
Time Deltas#
The basic data structure for representing durations are Timedelta
objects. They can be used in their own or to shift time stamps.
a_long_time = pd.Timedelta(days=10000, minutes=100)
a_long_time
Timedelta('10000 days 01:40:00')
Periods#
A period in Pandas is a time interval paired with a time stamp. Interpretation is as follows:
The interval is one of several preset intervals, like a calendar month or a week from Monday till Sunday. See Offset aliases and Anchored aliases for available intervals.
The time stamp selects a concrete interval, the month or the week containing the time stamp, for instance.
The basic data structure for representing periods are Period
objects.
year2010 = pd.Period('1/1/2010', freq='A')
print(year2010.start_time)
print(year2010.end_time)
2010-01-01 00:00:00
2010-12-31 23:59:59.999999999
year2010oct = pd.Period('1/1/2010', freq='A-OCT')
print(year2010oct.start_time)
print(year2010oct.end_time)
2009-11-01 00:00:00
2010-10-31 23:59:59.999999999
Time Stamp Indices#
Using time stamps for indexing offers lots of nice features in Pandas, because Pandas originally has been developed for handling time series.
Creating Time Stamp Indices#
The constructor for DatetimeIndex
objects takes a list of time stamps and an optional frequency. Frequency has to match the passed time stamps. If there is no common frequency in the data, the frequency is None
(default).
A more convenient method is pd.date_range
:
index = pd.date_range(start='2018-03-14', freq='2D12H', periods=10)
index
DatetimeIndex(['2018-03-14 00:00:00', '2018-03-16 12:00:00',
'2018-03-19 00:00:00', '2018-03-21 12:00:00',
'2018-03-24 00:00:00', '2018-03-26 12:00:00',
'2018-03-29 00:00:00', '2018-03-31 12:00:00',
'2018-04-03 00:00:00', '2018-04-05 12:00:00'],
dtype='datetime64[ns]', freq='60H')
index = pd.date_range(start='2018-03-14', end='2018-03-20', freq='D')
index
DatetimeIndex(['2018-03-14', '2018-03-15', '2018-03-16', '2018-03-17',
'2018-03-18', '2018-03-19', '2018-03-20'],
dtype='datetime64[ns]', freq='D')
We may also use columns of an existing data frame to create a DatetimeIndex
:
df = pd.DataFrame({'day': [1, 2, 3, 4], 'month': [4, 6, 9, 9], 'year': [2018, 2018, 2020, 2020]})
display(df)
pd.to_datetime(df)
day | month | year | |
---|---|---|---|
0 | 1 | 4 | 2018 |
1 | 2 | 6 | 2018 |
2 | 3 | 9 | 2020 |
3 | 4 | 9 | 2020 |
0 2018-04-01
1 2018-06-02
2 2020-09-03
3 2020-09-04
dtype: datetime64[ns]
The to_datetime
function expects that columns are named 'day'
, 'month'
, 'year'
. It returns a series of type datetime64
which may be converted to a DatetimeIndex
.
Indexing#
Exact Indexing#
Using Timestamp
objects for label based indexing yields items with the corresponding time stamp, if there are any. Slicing works as usual.
index = pd.date_range(start='2018-03-14', freq='D', periods=10)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=index)
print(s, '\n')
print(s.loc[pd.Timestamp('2018-3-16')], '\n')
#print(s.loc[pd.Timestamp('2018-3-16 10:00')], '\n') # KeyError
print(s.loc[pd.Timestamp('2018-3-16 00:00')], '\n')
print(s.loc[pd.Timestamp('2018-3-16'):pd.Timestamp('2018-3-20')])
2018-03-14 1
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
2018-03-19 6
2018-03-20 7
2018-03-21 8
2018-03-22 9
2018-03-23 10
Freq: D, dtype: int64
3
3
2018-03-16 3
2018-03-17 4
2018-03-18 5
2018-03-19 6
2018-03-20 7
Freq: D, dtype: int64
Inexact Indexing#
Passing strings containing partial dates/times selects time ranges. This technique is referred to as partial string indexing. Slicing is allowed.
index = pd.date_range(start='2018-03-14', freq='D', periods=100)
s = pd.Series(range(1, len(index) + 1), index=index)
print(s, '\n')
print(s.loc['2018-3'], '\n')
print(s.loc['2018-3':'2018-4'])
2018-03-14 1
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
...
2018-06-17 96
2018-06-18 97
2018-06-19 98
2018-06-20 99
2018-06-21 100
Freq: D, Length: 100, dtype: int64
2018-03-14 1
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
2018-03-19 6
2018-03-20 7
2018-03-21 8
2018-03-22 9
2018-03-23 10
2018-03-24 11
2018-03-25 12
2018-03-26 13
2018-03-27 14
2018-03-28 15
2018-03-29 16
2018-03-30 17
2018-03-31 18
Freq: D, dtype: int64
2018-03-14 1
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
2018-03-19 6
2018-03-20 7
2018-03-21 8
2018-03-22 9
2018-03-23 10
2018-03-24 11
2018-03-25 12
2018-03-26 13
2018-03-27 14
2018-03-28 15
2018-03-29 16
2018-03-30 17
2018-03-31 18
2018-04-01 19
2018-04-02 20
2018-04-03 21
2018-04-04 22
2018-04-05 23
2018-04-06 24
2018-04-07 25
2018-04-08 26
2018-04-09 27
2018-04-10 28
2018-04-11 29
2018-04-12 30
2018-04-13 31
2018-04-14 32
2018-04-15 33
2018-04-16 34
2018-04-17 35
2018-04-18 36
2018-04-19 37
2018-04-20 38
2018-04-21 39
2018-04-22 40
2018-04-23 41
2018-04-24 42
2018-04-25 43
2018-04-26 44
2018-04-27 45
2018-04-28 46
2018-04-29 47
2018-04-30 48
Freq: D, dtype: int64
Inexact indexing has some pitfalls, which are described in Partial string indexing and Slice vs. exact match of the Pandas user guide.
Useful Functions for Time Stamp Indexed Data#
Pandas provides lots of functions for working with time stamp indices. Some are:
asfreq
(upsampling with fill values or filling logic)shift
(shift index or data by some time period)resample
(downsampling with aggregation, see below)
The resample
method returns a Resampler
object, which provides several methods for calculating data values at the new time stamps. Examples are sum
, mean
, min
, max
. All these methods return a series or a data frame.
index = pd.date_range(start='2018-03-14', freq='D', periods=100)
s = pd.Series(range(1, len(index) + 1), index=index)
s2 = s.resample('5D').sum()
s2
2018-03-14 15
2018-03-19 40
2018-03-24 65
2018-03-29 90
2018-04-03 115
2018-04-08 140
2018-04-13 165
2018-04-18 190
2018-04-23 215
2018-04-28 240
2018-05-03 265
2018-05-08 290
2018-05-13 315
2018-05-18 340
2018-05-23 365
2018-05-28 390
2018-06-02 415
2018-06-07 440
2018-06-12 465
2018-06-17 490
Freq: 5D, dtype: int64
Period indices#
Period indices work analogously to time stamp indices. Corresponding class is PeriodIndex
.
index = pd.period_range(start='2018-03-14', freq='D', periods=10)
print(index)
s = pd.Series(range(1, len(index) + 1), index=index)
s
PeriodIndex(['2018-03-14', '2018-03-15', '2018-03-16', '2018-03-17',
'2018-03-18', '2018-03-19', '2018-03-20', '2018-03-21',
'2018-03-22', '2018-03-23'],
dtype='period[D]')
2018-03-14 1
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
2018-03-19 6
2018-03-20 7
2018-03-21 8
2018-03-22 9
2018-03-23 10
Freq: D, dtype: int64
Indexing with time stamps selects the appropriate period, like with IntervalIndex
objects:
s.loc[pd.Timestamp('2018-03-15 12:34')]
2
With time stamp index the above line would lead to a KeyError
. But for periods it’s interpreted as: select the period containing the time stamp.
Similar is possible with slicing:
s.loc[pd.Timestamp('2018-03-15 12:34'):pd.Timestamp('2018-03-18 23:45')]
2018-03-15 2
2018-03-16 3
2018-03-17 4
2018-03-18 5
Freq: D, dtype: int64
Methods asfreq
, shift
, resample
also work for periods indices.