In this article about time-series data, we will discuss Python's statsmodels library and how it can be used to explore and analyze time series. The Jupyter notebook for this blog can be found here.
First, let's explore some concepts related to time-series data:
Trend
The long-term direction of the data. A time series can have an upward, a downward, or a horizontal (stationary) trend.
Seasonality
A pattern that repeats at a fixed, known interval, such as every week, quarter, or year.
Cyclicality
Rises and falls that recur without a fixed period, so their timing and length cannot be read off the calendar.
Stationarity
A time series is said to be stationary if it does not display any trend or seasonality. In the figure, the first series has neither an upward nor a downward trend, nor does it display any seasonality. Another way of defining stationarity is that the mean, variance, and autocovariance of the data do not depend on time.
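To make the "time-dependent mean" idea concrete, here is a minimal sketch that compares the mean of the first and second half of two synthetic series. Both series and the helper half_means are made up for illustration and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# White noise: constant mean and variance, so it is stationary.
noise = rng.normal(loc=0.0, scale=1.0, size=n)

# The same noise on top of a linear trend: the mean now grows with time.
trending = 0.1 * np.arange(n) + noise


def half_means(x):
    """Return the mean of the first half and of the second half of a series."""
    half = len(x) // 2
    return x[:half].mean(), x[half:].mean()


noise_first, noise_second = half_means(noise)
trend_first, trend_second = half_means(trending)

print("white noise:", noise_first, noise_second)
print("trending   :", trend_first, trend_second)
```

For the white noise the two means stay close to each other, while for the trending series the second half sits well above the first. This is only an informal check, not a formal test.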
Hodrick-Prescott Filter
Separates a time series into a trend component and a cyclical component. Find more here.
A smoothing parameter, lambda, needs to be specified. As a rule of thumb, it is set to 1600 for quarterly data, 6.25 for annual data, and 129600 for monthly data.
Example usage:
Load the data into a dataframe:
Since this is monthly data, we use a lambda of 129600, import ‘hpfilter’ from statsmodels, and plot the cyclical and trend components.
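The dataset from the notebook is not reproduced here, so the following is only a sketch on a synthetic monthly series. The dataframe df, its "value" column, and the numbers used to generate it are assumptions for illustration; hpfilter and its lamb argument come from statsmodels.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

# Synthetic monthly data (an assumption; the article's dataset is not shown).
rng = np.random.default_rng(0)
steps = np.arange(120)
index = pd.date_range("2000-01-01", periods=120, freq="MS")
values = 100 + 0.5 * steps + 5 * np.sin(steps / 8) + rng.normal(0, 1, 120)
df = pd.DataFrame({"value": values}, index=index)

# hpfilter returns the cyclical component first, then the trend.
# lamb=129600 is the rule-of-thumb value for monthly data.
cycle, trend = hpfilter(df["value"], lamb=129600)

df["cycle"] = cycle
df["trend"] = trend
print(df.head())
```

With matplotlib installed, calling df[["value", "trend"]].plot() and df["cycle"].plot() afterwards draws the two components.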
ETS decomposition
ETS (Error, Trend, Seasonality) decomposition breaks down a time series into a trend component, a seasonal component, and an error (residual) component. While performing ETS decomposition, we need to specify whether the model is ‘additive’ or ‘multiplicative’. An additive model is appropriate when the trend is roughly linear and the seasonal swings stay about the same size over time. If the trend is non-linear, or the seasonal swings grow or shrink with the level of the series, we choose ‘multiplicative’. In the above trend, we can see that the peaks are becoming higher each year, which suggests a multiplicative model.
In Python, using the statsmodels library, we can perform ETS decomposition as below:
Holt-Winters Method
The Holt-Winters method provides triple exponential smoothing for the level, trend, and seasonal components. It has three smoothing parameters: alpha, beta, and gamma. Alpha is the smoothing coefficient for the level, beta for the trend, and gamma for the seasonal component. There is also a parameter for the type of seasonality: additive seasonality, where each season changes by a constant amount, or multiplicative seasonality, where each season changes by a constant factor. For more details on the concept, refer to this article. Here I will show a simple implementation of the method using the statsmodels library.
Simple Exponential Smoothing
Double Exponential Smoothing (Holt's Method)
In the above chart, the green line for double exponential smoothing follows the original time series very closely.
Triple Exponential Smoothing (Holt-Winters Method)
In this case, double exponential smoothing fits the data better than triple exponential smoothing.
ACF and PACF Plots
Statsmodels gives us ready-to-use functions to plot both the ACF (autocorrelation function) and the PACF (partial autocorrelation function), which can then be used to choose the orders when building ARIMA models.
Augmented Dickey-Fuller (ADF) Test
The Augmented Dickey-Fuller test helps us check whether a time series is stationary. It returns a p-value, which can be used to decide whether the null hypothesis (the data is non-stationary) can be rejected.
Month Plots and Quarter Plots
These plots group the observations by month or by quarter, which gives us a clearer view of the seasonality in the time series.
In the next article in this series, we will explore some forecasting methods for time-series data.
Thanks for reading. Your comments and suggestions are welcome.