Augmented Dickey-Fuller (ADF) test Manual, using Python, R, STATA

Learn how to perform the Augmented Dickey-Fuller (ADF) test manually to check stationarity in time series data. This guide explains the ADF test step by step with practical examples in Python, R, and STATA. It is aimed at students, researchers, and data analysts who want to understand unit root testing and improve their econometrics and time series analysis skills. The ADF test is a basic test for checking stationarity in time series analysis.

Econometrics is a core subject for research in many fields, especially economics and finance, at the Masters, M.Phil and PhD levels. It is taught at these levels in major universities around the globe, such as Bonn, Freie, Konstanz, DIW, PU, QAU, AIOU, MU, DU and many others.

Introduction

The ADF test helps us figure out if a time series is stable over time or if it changes in some way. A stable (stationary) series means its average, spread, and how values relate to each other don’t shift as time passes. In contrast, a non-stationary series changes in these aspects, like a trend that keeps going up or down.

The Augmented Dickey-Fuller (ADF) test checks if a time series has a “unit root,” which basically means it’s non-stationary.

  • The test’s null hypothesis assumes the series is non-stationary (has a unit root).
  • The alternative hypothesis suggests the series is stationary (no unit root).
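
To see where these hypotheses come from, consider the simplest case: a first-order autoregression with no constant and no extra lags (a simplifying assumption made here only for illustration, with ρ denoting the autoregressive coefficient). Subtracting the lagged value from both sides gives the form that the test actually estimates:

    \[ Y_{t} = \rho Y_{t - 1} + \varepsilon_{t} \quad \Longrightarrow \quad \mathrm{\Delta}Y_{t} = (\rho - 1)Y_{t - 1} + \varepsilon_{t} = \gamma Y_{t - 1} + \varepsilon_{t} \]

So the null hypothesis γ = 0 corresponds to ρ = 1 (a unit root), while the alternative γ < 0 corresponds to ρ < 1 (a stationary series). This is why the test is one-sided: only sufficiently negative values of the test statistic count as evidence against the unit root.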

Stationarity is important because many forecasting models, like ARIMA, rely on data that doesn’t change its basic properties over time. If you use non-stationary data without fixing it, the model’s predictions can be unreliable and misleading.

The “Augmented” in ADF means the test adds lagged differences of the series to the regression, so it can handle more complex patterns in the data, such as past values influencing future values beyond just one time step.

Types of ADF Model

The Augmented Dickey-Fuller (ADF) test has three common model specifications that differ based on whether they include a constant and/or a trend, depending on the nature of the time series data.

1. Without Constant and Trend Model

    \[ \mathrm{\Delta}Y_{t} = \gamma Y_{t - 1} + \sum_{i = 1}^{p}\delta_{i}\mathrm{\Delta}Y_{t - i} + \varepsilon_{t}\ \]

2. With Constant Only (Drift) Model

    \[ \mathrm{\Delta}Y_{t} = \alpha + \gamma Y_{t - 1} + \sum_{i = 1}^{p}\delta_{i}\mathrm{\Delta}Y_{t - i} + \varepsilon_{t}\ \]

3. With Constant and Trend Model

    \[ \mathrm{\Delta}Y_{t} = \alpha + \beta t + \gamma Y_{t - 1} + \sum_{i = 1}^{p}\delta_{i}\mathrm{\Delta}Y_{t - i} + \varepsilon_{t}\ \]

Here p is the number of lagged difference terms included in the regression.

Model 1: No Constant, No Trend (None) Random Walk

  • Use when the series fluctuates around zero with no trend and is mean-reverting around zero. This applies to already demeaned or differenced data.
  • It assumes no drift or trend, is the most restrictive model, and is rarely used with raw economic or financial data.
  • Tests for a zero-mean random walk; null hypothesis: unit root with no drift or trend vs. alternative: stationary around zero.

Model 2: With Constant Only (Drift)

  • Use when the series fluctuates around a non-zero mean with no clear trend, typical for financial returns or difference-stationary series.
  • Includes a constant term capturing drift (average change).
  • Most commonly applied model; appropriate for series with wandering means but no deterministic trend.
  • Null hypothesis: unit root with drift; alternative: stationary around a non-zero mean.

Model 3: With Constant and Trend

  • Use when the series shows a clear upward or downward deterministic trend, such as macroeconomic variables in levels (GDP, population, prices).
  • Includes both constant and a time trend term.
  • The most general and conservative model, suitable when a trend may be present.
  • Null hypothesis: unit root with drift and trend; alternative: stationary around a deterministic trend.

Why Model Choice Matters

  • Models 1 and 2 test stationarity around a constant mean (zero or non-zero).
  • Model 3 tests stationarity around a deterministic trend, which is essential when the series has a trending behavior.
  • Using a model without appropriate deterministic components may lead to false conclusions (wrong rejection or low power).
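
The three models differ only in which deterministic columns enter the regression. The sketch below, which assumes NumPy is available, builds the regressors for each specification and returns the t-ratio on γ. The function name `adf_tstat` and its arguments are hypothetical, and the short series is used purely for illustration. The resulting statistic has to be compared with Dickey-Fuller critical values for the chosen model, not with the usual t table.

```python
import numpy as np

def adf_tstat(y, lags=0, model="c"):
    """t-ratio on gamma in the ADF regression.

    model: "n"  = no constant, no trend (Model 1)
           "c"  = constant only (Model 2)
           "ct" = constant and linear trend (Model 3)
    """
    if model not in ("n", "c", "ct"):
        raise ValueError("model must be 'n', 'c' or 'ct'")
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[j] = y[j+1] - y[j]
    target = dy[lags:]                   # Delta Y_t
    n = len(target)
    cols = [y[lags:-1]]                  # Y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:len(dy) - i])   # Delta Y_{t-i}
    if model in ("c", "ct"):
        cols.append(np.ones(n))          # constant (alpha)
    if model == "ct":
        cols.append(np.arange(1, n + 1, dtype=float))  # trend (beta * t)
    X = np.column_stack(cols)
    k = X.shape[1]
    if n <= k:
        raise ValueError("not enough observations for this specification")
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - k)     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])  # gamma_hat / SE(gamma_hat)

# Illustrative series; compare how the statistic changes across models
series = [10, 12, 15, 20, 22, 25, 28, 30, 35]
for name in ("n", "c", "ct"):
    print(name, round(float(adf_tstat(series, lags=0, model=name)), 3))
```

The same data can give quite different statistics under the three specifications, which is why the deterministic terms should be chosen before reading off a critical value.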

Practical Decision Framework

Visual Inspection:

  • Fluctuates around zero, no trend → Model 1
  • Fluctuates around a non-zero mean, no trend → Model 2
  • Shows a clear trend → Model 3

Statistical Strategy:

  • Start with the most general model (constant + trend).
  • If trend coefficient is insignificant, simplify to Model 2 or Model 1.
  • Use information criteria (AIC/BIC) to compare fits.

Economic Context:

  • Stock prices: usually tested with constant and trend.
  • Interest rates: typically tested with constant only.
  • GDP growth: usually with constant only.

Example for Manual Calculation

    Year   2000   2001   2002   2003   2004   2005   2006   2007   2008
    Yt     10     12     15     20     22     25     28     30     35

    Yt     ∆Yt = Yt - Yt-1     Yt-1     Yt-1 x ∆Yt     Y²t-1
    10
    12     2                   10       20             100
    15     3                   12       36             144
    20     5                   15       75             225
    22     2                   20       40             400
    25     3                   22       66             484
    28     3                   25       75             625
    30     2                   28       56             784
    35     5                   30       150            900

    ∑(Yt-1 x ∆Yt) = 518
    ∑(Y²t-1) = 3662

Estimation of Slope (OLS Regression)

    \[ Slope = \widehat{\gamma} = \frac{\sum(Y_{t - 1} \times \mathrm{\Delta}Y_{t})}{\sum(Y_{t - 1}^{2})}\ \]

    \[ Slope = \widehat{\gamma} = \frac{518}{3662} = 0.1414\ \]

Calculation of Residuals

    γ̂        Yt-1   ∆Yt   ∆Ŷt = γ̂ Yt-1   ɛ̂ = ∆Yt - ∆Ŷt   ɛ̂²
    0.1414   10     2     1.414           0.586           0.343396
    0.1414   12     3     1.6968          1.3032          1.69833
    0.1414   15     5     2.121           2.879           8.288641
    0.1414   20     2     2.828           -0.828          0.685584
    0.1414   22     3     3.1108          -0.1108         0.012277
    0.1414   25     3     3.535           -0.535          0.286225
    0.1414   28     2     3.9592          -1.9592         3.838465
    0.1414   30     5     4.242           0.758           0.574564

    ∑ ɛ̂² = 15.72748

Standard Error of γ̂

    \[ S.E\ \left( \widehat{\gamma} \right) = \sqrt{\frac{\sum{\widehat{\varepsilon}}^{2}}{(n - 1)\sum Y_{t - 1}^{2}}}\ \]

    \[ S.E\ \left( \widehat{\gamma} \right) = \sqrt{\frac{15.72748}{(8 - 1)3662}}\ \]

    \[ S.E\ \left( \widehat{\gamma} \right) = \sqrt{\frac{15.72748}{25634}}\ \]

    \[ S.E\ \left( \widehat{\gamma} \right) = \sqrt{0.0006135}\ \]

    \[ S.E\ \left( \widehat{\gamma} \right) = 0.0247\ \]

Test Statistic

    \[ ADF = \ \frac{\widehat{\gamma}}{S.E\ \left( \widehat{\gamma} \right)}\ \]

    \[ ADF = \ \frac{0.1414}{0.0247}\ \]

    \[ ADF = 5.72\ \]

Decision

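Critical value at 5% (no constant, no trend) ≈ -1.95

Our statistic = +5.72 (positive)

The test is left-tailed: H₀ is rejected only when the statistic is below (more negative than) the critical value. Since 5.72 > -1.95, we fail to reject H₀.

Conclusion

We cannot reject the unit root null hypothesis, so the series is treated as non-stationary.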
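
The hand calculation can be cross-checked with a few lines of plain Python. This is a minimal sketch of the same steps (slope, residuals, standard error, test statistic) for the no-constant, no-lag case; the variable names are illustrative, not part of any library.

```python
import math

y = [10, 12, 15, 20, 22, 25, 28, 30, 35]

lagged = y[:-1]                                   # Y_{t-1}
diffs = [b - a for a, b in zip(y[:-1], y[1:])]    # Delta Y_t

sum_xy = sum(l * d for l, d in zip(lagged, diffs))   # sum of Y_{t-1} * Delta Y_t
sum_xx = sum(l * l for l in lagged)                  # sum of Y_{t-1}^2
gamma_hat = sum_xy / sum_xx                          # OLS slope without intercept

residuals = [d - gamma_hat * l for l, d in zip(lagged, diffs)]
ssr = sum(e * e for e in residuals)                  # sum of squared residuals

n = len(diffs)                                       # 8 usable observations
se_gamma = math.sqrt(ssr / ((n - 1) * sum_xx))
df_stat = gamma_hat / se_gamma

print(f"gamma_hat    = {gamma_hat:.4f}")
print(f"SSR          = {ssr:.4f}")
print(f"SE(gamma)    = {se_gamma:.4f}")
print(f"DF statistic = {df_stat:.2f}")
```

Because the code keeps full precision, it gives a statistic of about 5.71, slightly different from the 5.72 obtained by hand with rounded intermediate values. The conclusion is unchanged.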

How to Run the Full ADF Test in Python

    # Import libraries
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    # Your dataset
    data = [10, 12, 15, 20, 22, 25, 28, 30, 35]

    # Convert to pandas Series
    y = pd.Series(data)

    # Run Augmented Dickey-Fuller test
    # (defaults: constant only, lag length chosen by AIC)
    result = adfuller(y)

    # Print results
    print("ADF Statistic:", result[0])
    print("p-value:", result[1])
    print("Used Lags:", result[2])
    print("Number of Observations:", result[3])
    print("Critical Values:", result[4])

    # Interpretation
    if result[1] < 0.05:
        print("Reject H0: Series is stationary")
    else:
        print("Fail to reject H0: Series is non-stationary (unit root present)")

How to Run the Full ADF Test in R

    # Install package if not already installed
    install.packages("tseries")

    # Load the library
    library(tseries)

    # Your dataset
    y <- c(10, 12, 15, 20, 22, 25, 28, 30, 35)

    # Run Augmented Dickey-Fuller test
    # (adf.test in tseries includes a constant and a linear trend)
    adf_result <- adf.test(y)

    # Print the result
    print(adf_result)

How to Run the Full ADF Test in STATA

    clear
    input year Yt
    2000 10
    2001 12
    2002 15
    2003 20
    2004 22
    2005 25
    2006 28
    2007 30
    2008 35
    end
    tsset year
    dfuller Yt, lags(1)

What Should We Do for Stationarity?

Many econometric models, such as ARIMA, VAR, and OLS regressions on time series, assume stationarity. If a time series is non-stationary, its mean, variance, and autocorrelation change over time, which makes it unsuitable for the models mentioned above.

The following measures can be used to make a series stationary:

1. Differencing

Take the first or second difference of the series:

First Difference

    \[ \mathrm{\Delta}Y_{t} = Y_{t} - Y_{t - 1}\ \ \]

Second Difference

    \[ \mathrm{\Delta}^{2}Y_{t} = \mathrm{\Delta}Y_{t} - \mathrm{\Delta}Y_{t - 1}\ \]
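
As a quick illustration of these two formulas, here is a small sketch in plain Python. The helper name `difference` is hypothetical; libraries such as pandas provide their own differencing methods.

```python
def difference(series, order=1):
    """Apply first differencing `order` times (order=2 gives the second difference)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out

y = [10, 12, 15, 20, 22, 25, 28, 30, 35]

first = difference(y, order=1)    # Delta Y_t
second = difference(y, order=2)   # Delta^2 Y_t = Delta Y_t - Delta Y_{t-1}

print(first)
print(second)
```

Each round of differencing shortens the series by one observation, which matters for very short samples like this one.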

2. Transformation (For Non-Constant Variance)

If the size of the series' fluctuations changes over time, so that its ups and downs are not consistent (called heteroscedasticity), you can apply a transformation to make the variance more stable. Common methods include taking the logarithm or the square root of the values, or using a more flexible approach called the Box-Cox transformation, which covers both log and square root transformations as special cases. These transformations shrink the bigger values more than the smaller ones, which evens out the variation across the data. This is especially helpful for series that grow exponentially.

How to use it: First, apply one of these transformations to your data. But since the series might still show a trend after that, you may also need to take differences (for example, difference the log-transformed series) to fully stabilize it.
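
A minimal sketch of the log-then-difference recipe, using a made-up series that grows by 10% per period (the data are an assumption chosen for illustration):

```python
import math

# Hypothetical series growing 10% each period
y = [100 * 1.10 ** t for t in range(8)]

# Raw first differences keep growing with the level of the series
raw_diff = [b - a for a, b in zip(y[:-1], y[1:])]

# Log transform, then difference: approximately the constant growth rate
log_y = [math.log(v) for v in y]
log_diff = [b - a for a, b in zip(log_y[:-1], log_y[1:])]

print([round(d, 2) for d in raw_diff])
print([round(d, 4) for d in log_diff])
```

The raw differences increase over time, while the differences of the logs are all equal to log(1.10), so the transformed series is stable in both level and spread.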

3. Detrending

If a time series is non-stationary just because it has a clear, predictable trend, like a straight line, you can deal with this by modelling that trend and then removing it.

    \[ Y'_{t} = Y_{t} - (\alpha + \beta t)\ \]

By taking away the trend, you’re left with the more stable, stationary parts of the series that fluctuate around a constant level.

How to do it: You fit a simple linear or polynomial regression to the time points and then subtract the trend values you’ve estimated from the original data.

Keep in mind, though, this works well if the trend is deterministic (fixed and predictable). For more random or unpredictable trends, differencing the data is usually a better choice because it handles such stochastic trends more effectively.
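
The sketch below shows linear detrending in plain Python, fitting α and β by ordinary least squares and subtracting the fitted line. The function name `detrend_linear` and the sample data are hypothetical; the data are constructed as a straight line plus small deviations.

```python
def detrend_linear(series):
    """Fit Y_t = alpha + beta * t by OLS and return (alpha, beta, detrended series)."""
    n = len(series)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(series) / n
    beta = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series))
            / sum((ti - t_mean) ** 2 for ti in t))
    alpha = y_mean - beta * t_mean
    detrended = [yi - (alpha + beta * ti) for ti, yi in zip(t, series)]
    return alpha, beta, detrended

# Hypothetical data: line 2 + 0.5 t plus small deviations
deviations = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2]
y = [2 + 0.5 * (i + 1) + d for i, d in enumerate(deviations)]

alpha, beta, detrended = detrend_linear(y)
print(round(alpha, 3), round(beta, 3))
print([round(v, 3) for v in detrended])
```

The detrended values fluctuate around zero. For a polynomial trend, the same idea applies with additional powers of t as regressors.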

4. Seasonal Differencing

If your time series shows clear seasonal patterns like sales always peaking every December, it means the series isn’t stable across those seasons.

Seasonal differencing helps fix this by subtracting from each value the value at the same point in the previous season. For example, with monthly data that has yearly seasonality, you’d subtract the value from 12 months ago:

    \[ Y'_{t} = Y_{t} - Y_{t - m}\ \]

(where m is the seasonal period, like 12 for monthly data).

It removes repeating seasonal effects by comparing each point to its counterpart in the previous cycle.

Seasonal differencing is often combined with regular differencing and is a key step in models like SARIMA.
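
A short sketch of seasonal differencing in plain Python. The helper name `seasonal_difference` and the quarterly data are hypothetical: the series repeats the same quarterly pattern each year and rises by 4 units per year.

```python
def seasonal_difference(series, m):
    """Return Y_t - Y_{t-m} for t = m, ..., len(series) - 1."""
    if m <= 0 or m >= len(series):
        raise ValueError("m must be positive and smaller than the series length")
    return [series[t] - series[t - m] for t in range(m, len(series))]

# Hypothetical quarterly data: fixed seasonal pattern plus 4 units of growth per year
pattern = [10, 20, 15, 30]
y = [pattern[q] + 4 * year for year in range(3) for q in range(4)]

print(y)
print(seasonal_difference(y, m=4))
```

The seasonal pattern cancels out and only the year-on-year change remains. The result is m observations shorter than the original series.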

5. Decomposition

Decomposition takes a detailed approach by splitting the series into its main parts: trend, seasonality, and residuals (the leftover noise).

In the additive model:

    \[ Y_{t} = Trend_{t} + Seasonal_{t} + Residual_{t}\ \]

In the multiplicative model:

    \[ Y_{t} = Trend_{t} \times Seasonal_{t} \times Residual_{t}\ \]

After breaking the series down, the residual part should be stable and random, which makes it easier to model.

How to use it: Apply decomposition methods like STL to separate these components, then focus on modelling the stationary residuals.
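
To make the additive model concrete, here is a simplified sketch in plain Python. It is a classical moving-average decomposition, a simpler stand-in and not STL itself. The function names and the quarterly data are hypothetical, and the series needs at least two full seasonal cycles.

```python
def centered_moving_average(series, m):
    """Centered moving average of length m (a 2 x m average when m is even)."""
    n = len(series)
    half = m // 2
    trend = [None] * n                       # undefined at both ends
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if m % 2 == 1:
            trend[t] = sum(window) / m
        else:
            # m + 1 points, with half weight on the two end points
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
    return trend

def additive_decompose(series, m):
    """Split series into trend, seasonal and residual parts (additive model)."""
    trend = centered_moving_average(series, m)
    detrended = [y - tr if tr is not None else None
                 for y, tr in zip(series, trend)]
    # Average the detrended values for each position in the seasonal cycle
    means = []
    for pos in range(m):
        vals = [d for i, d in enumerate(detrended) if i % m == pos and d is not None]
        means.append(sum(vals) / len(vals))
    centre = sum(means) / m                  # force seasonal effects to sum to zero
    means = [s - centre for s in means]
    seasonal = [means[i % m] for i in range(len(series))]
    residual = [y - tr - s if tr is not None else None
                for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Hypothetical quarterly data: linear trend 2t plus a fixed seasonal pattern
season = [-5, 5, -2, 2]
y = [2 * t + season[t % 4] for t in range(12)]

trend, seasonal, residual = additive_decompose(y, m=4)
print(trend)
print(seasonal[:4])
print(residual)
```

Because this artificial series contains no noise, the residuals are zero wherever the trend is defined. With real data the residual component is what would be checked for stationarity and then modelled.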
