Timeseries Part 1: Manipulating Time Series with Pandas

Prakhar S
Nov 29, 2021

Python’s pandas has many built-in features for working with datetime columns and series. Here is a brief overview:

import pandas as pd
import matplotlib.pyplot as plt

date_range

Generates a range of dates from a start date, a number of periods and a frequency such as ‘Y’ (year), ‘M’ (month) or ‘D’ (day).

pd.date_range('2021-01-01', periods = 10, freq = 'D')

Result:

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10'],
              dtype='datetime64[ns]', freq='D')

to_datetime

Converts a list of date-like strings into a DatetimeIndex. We can also pass a ‘format’ argument to specify the layout of the input strings, such as ‘%d/%m/%y’ (day/month/two-digit year).

pd.to_datetime(['2/1/18', '3/1/18'],format = '%d/%m/%y')

Result :

DatetimeIndex(['2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)

When reading a CSV file into a pandas DataFrame, a date column can be set as the index using index_col=‘Date’:

df = pd.read_csv('sample.csv', index_col='Date', parse_dates=True)

The ‘parse_dates=True’ argument makes pandas parse the index as dates instead of leaving it as strings.
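As a minimal sketch of what this does, the snippet below reads a small in-memory CSV instead of 'sample.csv'. The column name 'Value' and the numbers are made up for illustration; only the 'Date' column name comes from the call above.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv; 'Value' and the numbers are assumptions.
csv_text = """Date,Value
2021-01-01,10
2021-01-02,12
2021-01-03,11
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# With parse_dates=True the index is a DatetimeIndex, not a plain string index.
print(type(df.index).__name__)
print(df.head())
```

Without parse_dates=True the same call would leave the index as strings, and time-based operations such as resample() would not be available on it.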

Output sample:

Getting month names as strings from the index:

The ‘strftime’ function converts dates into formatted strings; for example, the ‘%B’ format code gives the full month name.
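A short sketch of how month names might be extracted with strftime. The index here is generated with date_range purely for illustration, and the English names assume the default locale.

```python
import pandas as pd

# Illustrative index: the first day of three consecutive months.
idx = pd.date_range('2021-01-01', periods=3, freq='MS')

# '%B' is the strftime code for the full month name.
month_names = idx.strftime('%B')
print(list(month_names))

# DatetimeIndex also offers month_name(), which gives the same result here.
print(list(idx.month_name()))
```

On a DataFrame with a DatetimeIndex, the same call would be df.index.strftime('%B').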

Resampling

A data frame indexed by time, like the one above, can use the resample() function for frequency conversion. We can resample our time series data to various frequencies, such as yearly, monthly or daily.

The frequencies that can be applied and their aliases are given below:

Example: Resampling by year

Resampling can be used both for upsampling (to a narrower timeframe, such as from months to days) and downsampling (to a wider timeframe, such as from months to years). In the case of upsampling, we can use methods such as ‘ffill’ (forward fill) or ‘bfill’ (backward fill) to fill the missing values.
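The sketch below illustrates both directions on a made-up monthly series. It uses the start-of-period aliases 'YS' and 'MS' as an assumption of convenience, because newer pandas releases spell the year-end and month-end aliases 'YE' and 'ME' rather than 'Y' and 'M'.

```python
import pandas as pd

# Hypothetical monthly series covering two years; the values are made up.
idx = pd.date_range('2020-01-01', periods=24, freq='MS')
monthly = pd.Series(range(24), index=idx, dtype='float64')

# Downsampling: one mean value per year, labelled with the year start.
yearly_mean = monthly.resample('YS').mean()
print(yearly_mean)

# Upsampling: one row per day, each day carrying the last known monthly value.
daily = monthly.resample('D').ffill()
print(daily.head())
```

Using bfill() instead of ffill() would fill each day with the next known monthly value.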

Time shifting

df.shift(periods=1,freq='D')

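shifts the index of every row forward by one day and leaves the data unchanged. We can change ‘freq’ to ‘Y’ or ‘M’, and ‘periods’ to a higher or even a negative value (to shift backwards). If ‘freq’ is omitted, the data is shifted instead of the index, which leaves ‘NaN’ values in the vacated rows; we can specify the ‘fill_value’ parameter if we do not want them.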
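To make the difference between shifting the data and shifting the index concrete, here is a small sketch on a made-up four-day frame. The column name 'value' is hypothetical.

```python
import pandas as pd

idx = pd.date_range('2021-01-01', periods=4, freq='D')
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=idx)

# No freq: the data moves down one row and the first row becomes NaN.
shifted_values = df.shift(periods=1)

# fill_value replaces the NaN that the shift would otherwise leave behind.
shifted_filled = df.shift(periods=1, fill_value=0)

# With freq: the index moves forward one day and the data stays as it was.
shifted_index = df.shift(periods=1, freq='D')

print(shifted_values)
print(shifted_filled)
print(shifted_index)
```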

Rolling mean

This gives an aggregate over a moving window of time periods, whose size we can specify. It is most commonly used with the mean.

df.rolling(window=30).mean() 

takes a window of 30 rows (30 days for daily data, or whatever time granularity our data has) and computes the mean of the last 30 values at each point. There will not be any value for the first 29 rows, since their windows are incomplete. When plotted, the rolling mean shows the general trend.
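A small worked sketch of the same idea, using a window of 3 rather than 30 so the numbers are easy to check by hand. The series is made up for illustration.

```python
import pandas as pd

idx = pd.date_range('2021-01-01', periods=10, freq='D')
s = pd.Series(range(1, 11), index=idx, dtype='float64')

# Each value is the mean of the current row and the two rows before it.
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)
```

The first two rows are NaN because their windows hold fewer than three values; the third row is (1 + 2 + 3) / 3 = 2.0.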

Example :

Expanding

Expanding is similar to rolling, except that the aggregate at each time step covers all the data up to that point.

df['column'].expanding().mean()

At every point, the average of all points up to and including that time is calculated.
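As a quick check of this behaviour, the following sketch computes an expanding mean on four made-up values.

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0])

# Row i holds the mean of rows 0..i, so the window grows by one each step.
expanding_mean = s.expanding().mean()
print(expanding_mean.tolist())  # [2.0, 3.0, 4.0, 5.0]
```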

Example:

Exponentially Weighted Moving Average (EWMA)

EWMA improves on the Simple Moving Average (SMA) calculated above with ‘rolling’ by giving more weight to more recent data, which reduces the lag of the SMA. Pandas has an ‘ewm’ method through which we can compute the EWMA trend instead of the SMA.
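A minimal sketch comparing the two on a made-up rising series. The choices span=3 and adjust=False are assumptions made for illustration; with adjust=False each value is alpha * x + (1 - alpha) * previous, where alpha = 2 / (span + 1).

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Simple moving average over the last three points.
sma = s.rolling(window=3).mean()

# Exponentially weighted mean; span=3 gives alpha = 0.5.
ewma = s.ewm(span=3, adjust=False).mean()

print(sma.tolist())
print(ewma.tolist())
```

On the last row the SMA is 4.0 while the EWMA is 4.0625, slightly closer to the latest value of 5.0, and the EWMA has no missing values at the start.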

We can see that in EWMA, the trend is quite flat in the beginning but it tends to represent the data more and more in the later stages.

Thanks for reading. I am going to follow up with an article about the Python statsmodels library soon.
