2. What is the KPSS Test? How to Perform KPSS in Python, R, and STATA

In this blog post, we explore the KPSS (Kwiatkowski–Phillips–Schmidt–Shin) test for stationarity, a crucial tool in time series analysis within econometrics. Knowing whether a series is stationary is fundamental before applying forecasting models or regression techniques, because non-stationary data can lead to misleading results.

This post provides a clear explanation of the KPSS test along with practical applications in Python, R, and STATA. We also highlight the importance of econometrics in research, with a special focus on how the KPSS test and the Augmented Dickey-Fuller (ADF) test complement each other to strengthen empirical analysis and ensure robust findings in economics, finance, and social science studies.

Econometrics is a core subject for research analysis in many fields, particularly economics and finance, at the Master's, M.Phil., and PhD levels. It is taught at these levels in major universities around the globe, such as Bonn, Freie, Konstanz, DIW, PU, QAU, AIOU, MU, DU, and many others.

The KPSS test, named after Kwiatkowski, Phillips, Schmidt, and Shin, is a statistical method used in econometrics to determine whether a time series is stationary. Stationarity means that the series’ statistical properties, like its mean and variance, do not change over time.
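In the weak (covariance) sense usually meant in this context, the definition can be written out as below. The symbols \mu, \sigma^2, and \gamma_k are notation introduced here for illustration and do not appear elsewhere in this post.

```latex
\[ E[y_{t}] = \mu, \qquad \mathrm{Var}(y_{t}) = \sigma^{2}, \qquad \mathrm{Cov}(y_{t}, y_{t-k}) = \gamma_{k} \quad \text{for all } t \text{ and each lag } k \]
```

The key point is that none of the three quantities on the right-hand side depends on t.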

How Does KPSS Differ from the ADF Test?

The KPSS test is unique because it reverses the usual setting of the Augmented Dickey-Fuller (ADF) test, which is more commonly used. In the ADF test, the null hypothesis assumes the series is non-stationary (has a unit root), while in the KPSS test, the null hypothesis assumes the series is stationary.

KPSS Null Hypothesis (H₀): The series is stationary (either around a constant mean or a deterministic trend).

KPSS Alternative Hypothesis (H₁): The series is non-stationary (has a unit root).

Why Use the KPSS Test?

Because the KPSS test assumes the opposite null hypothesis compared to the ADF test, using both tests together allows for stronger, more reliable conclusions about stationarity:

  • If the ADF test rejects its null (indicating stationarity) and the KPSS test fails to reject its null (also indicating stationarity), you gain confidence that the series is indeed stationary.
  • If the tests disagree, it signals more complex types of non-stationarity, such as sudden shifts in the mean or rapid adjustments, warranting further analysis.

Versions of the KPSS Test

The KPSS test can be applied in two main ways to assess stationarity:

Level Stationarity Test: Assumes the time series fluctuates around a constant mean.

Trend Stationarity Test: Assumes the time series is stationary around a deterministic trend over time.

This makes the KPSS test a valuable complement to traditional tests like the ADF, helping researchers get a fuller picture of a series’ behavior.
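The two versions differ only in what is removed from the series before the test statistic is built. The sketch below illustrates that difference using plain Python; the function name `ols_residuals` and the short series are hypothetical and chosen only for illustration.

```python
def ols_residuals(y, include_trend=False):
    """Residuals from regressing y on a constant, or on a constant plus a linear trend."""
    n = len(y)
    y_bar = sum(y) / n
    if not include_trend:
        # Level version: the OLS fit on a constant is just the sample mean.
        return [v - y_bar for v in y]
    # Trend version: simple linear regression of y on t = 1, ..., n.
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]


# Hypothetical series with a clear upward drift (illustration only).
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2]
level_resid = ols_residuals(y)
trend_resid = ols_residuals(y, include_trend=True)

print("level residuals:", [round(e, 2) for e in level_resid])
print("trend residuals:", [round(e, 2) for e in trend_resid])
```

For a drifting series like this one, the level residuals still carry the drift, while the trend residuals do not. This is why the choice of version matters.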

The Test Statistic

The KPSS statistic is calculated as:

    \[ KPSS = \frac{\sum_{t=1}^{T} S_{t}^{2}}{T^{2}\,\widehat{\sigma}^{2}} \]

Where:

    \[ S_{t} = \sum_{i=1}^{t} e_{i} \]

is the partial sum of the residuals up to time t. You first regress your series on a constant (for level stationarity) or on a constant and a trend (for trend stationarity) to get the residuals.

T is the sample size (number of observations).

    \[ \widehat{\sigma}^{2} \]

is a consistent estimator of the long-run variance of the residuals. This is the tricky part in manual calculation, as it involves selecting a lag truncation parameter to account for serial correlation. For simplicity, we will assume no serial correlation, so it is just the variance of the residuals.
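For reference, the lag-dependent estimator that the simplification sets aside is commonly written with Bartlett weights, as in the form below. The lag truncation symbol \ell and the weight function w are notation introduced here, not taken from this post.

```latex
\[ \widehat{\sigma}^{2}(\ell) = \frac{1}{T}\sum_{t=1}^{T} e_{t}^{2} + \frac{2}{T}\sum_{s=1}^{\ell} w(s,\ell) \sum_{t=s+1}^{T} e_{t}\,e_{t-s}, \qquad w(s,\ell) = 1 - \frac{s}{\ell + 1} \]
```

Setting \ell = 0 removes the second term and leaves the plain residual variance, which is the case used in the manual example.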

bcfeducation.com
“BCF Education: Your Gateway to Mastering Finance, Economics, and Data Insights”

Manual Calculation Example

Let’s use a simple, hypothetical dataset of 10 observations to test for level stationarity.

The Data
We have the following time series:

y: 6, 7, 5, 4, 8, 7, 8, 9, 10, 10

Regress on a Constant and Find Residuals
Under the null of level stationarity, the model is: 

    \[ y_{t} = \alpha + \varepsilon_{t} \]

The best estimate for α is the mean of y.

y: 6, 7, 5, 4, 8, 7, 8, 9, 10, 10 (∑y = 74)

    \[ \overline{y} = \frac{\sum y}{n} = \frac{74}{10} = 7.4\ \]

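| t | y | e_t = y − ȳ | S_t | S_t² | e_t² |
|---|---|---|---|---|---|
| 1 | 6 | -1.4 | -1.4 | 1.96 | 1.96 |
| 2 | 7 | -0.4 | -1.8 | 3.24 | 0.16 |
| 3 | 5 | -2.4 | -4.2 | 17.64 | 5.76 |
| 4 | 4 | -3.4 | -7.6 | 57.76 | 11.56 |
| 5 | 8 | 0.6 | -7 | 49 | 0.36 |
| 6 | 7 | -0.4 | -7.4 | 54.76 | 0.16 |
| 7 | 8 | 0.6 | -6.8 | 46.24 | 0.36 |
| 8 | 9 | 1.6 | -5.2 | 27.04 | 2.56 |
| 9 | 10 | 2.6 | -2.6 | 6.76 | 6.76 |
| 10 | 10 | 2.6 | 0 | 0 | 6.76 |
| ∑ | | | | 264.4 | 36.4 |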

Note: The final St should always be 0 (or very close) when using OLS residuals from a regression that includes a constant.
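The table can be reproduced in a few lines of plain Python, which is a convenient way to check the arithmetic. This is only a sketch of the hand calculation (residuals from the mean, then running sums); the variable names are chosen here for illustration.

```python
y = [6, 7, 5, 4, 8, 7, 8, 9, 10, 10]

# Residuals from regressing on a constant: deviations from the sample mean.
y_bar = sum(y) / len(y)
residuals = [v - y_bar for v in y]

# Partial sums S_t = e_1 + ... + e_t.
partial_sums = []
running = 0.0
for e in residuals:
    running += e
    partial_sums.append(running)

sum_s2 = sum(s * s for s in partial_sums)
sum_e2 = sum(e * e for e in residuals)

# Print one row per observation: t, y, e_t, S_t, S_t^2, e_t^2.
for t, (v, e, s) in enumerate(zip(y, residuals, partial_sums), start=1):
    print(t, v, round(e, 2), round(s, 2), round(s * s, 2), round(e * e, 2))
print("sum of S_t^2:", round(sum_s2, 2), " sum of e_t^2:", round(sum_e2, 2))
```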

Calculation of the Variance of residuals

    \[ \widehat{\sigma}^{2} = \frac{\sum e_{t}^{2}}{T} = \frac{36.4}{10} = 3.64 \]

    \[ KPSS = \frac{\sum S_{t}^{2}}{T^{2}\,\widehat{\sigma}^{2}} \]

    \[ KPSS = \frac{264.4}{10^{2} \times 3.64} \]

    \[ KPSS = \frac{264.4}{364} \]

    \[ KPSS \approx 0.7264 \]

Compare with Critical Value
We need to compare our calculated statistic (0.7264) with the critical value from the KPSS distribution table.

For level stationarity, the KPSS critical values are asymptotic, so they do not depend on T: 0.347 at the 10% significance level, 0.463 at 5%, and 0.739 at 1%. With only 10 observations they are a rough guide.

Conclusion:
Since our test statistic (0.7264) is greater than the 5% critical value (0.463), we reject the null hypothesis at the 5% level.

Interpretation: There is statistical evidence to suggest that this time series is not level-stationary.

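Note: In practice, the KPSS test depends heavily on the number of lags used to estimate the long-run variance, which is tedious to handle by hand, so the test is normally run in Python, R, or STATA. The commands below apply the KPSS test to the same data in each package. Because the software includes lags in the variance estimate, the reported statistic will generally differ from the zero-lag value computed above.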
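To see how much the lag choice matters, here is a minimal sketch of the level-stationarity statistic with a Bartlett-weighted long-run variance, where lag s out of `lags` receives weight 1 − s/(lags + 1). The function name `kpss_statistic` and the lag values tried are assumptions made for illustration. Each package picks its lag length by its own rule, so its output need not match any of these values.

```python
def kpss_statistic(y, lags=0):
    """Level-stationarity KPSS statistic with a Bartlett-weighted long-run variance."""
    T = len(y)
    y_bar = sum(y) / T
    e = [v - y_bar for v in y]          # residuals from a regression on a constant

    # Numerator: sum of squared partial sums, scaled by T^2.
    partial_sums, running = [], 0.0
    for r in e:
        running += r
        partial_sums.append(running)
    eta = sum(s * s for s in partial_sums) / T ** 2

    # Denominator: residual variance plus weighted autocovariances up to `lags`.
    long_run_var = sum(r * r for r in e) / T
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        autocov = sum(e[t] * e[t - k] for t in range(k, T)) / T
        long_run_var += 2.0 * weight * autocov

    return eta / long_run_var


y = [6, 7, 5, 4, 8, 7, 8, 9, 10, 10]
for lags in (0, 1, 2):
    print("lags =", lags, " KPSS =", round(kpss_statistic(y, lags), 4))
```

With no lags the function reproduces the manual value. For this series the statistic drops to roughly 0.477 with one lag and roughly 0.386 with two, so the conclusion can depend on the lag length chosen.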


KPSS Calculation for the Above Data in Python

from statsmodels.tsa.stattools import kpss

# Given data
y = [6, 7, 5, 4, 8, 7, 8, 9, 10, 10]

# Run KPSS test (regression="c" tests level stationarity, i.e., around a mean)
statistic, p_value, n_lags, critical_values = kpss(y, regression="c", nlags="auto")

# Print results
print("KPSS Test Statistic:", statistic)
print("p-value:", p_value)
print("Number of Lags:", n_lags)
print("Critical Values:", critical_values)

# Interpretation
if p_value < 0.05:
    print("Reject Null Hypothesis: The series is likely NON-STATIONARY.")
else:
    print("Fail to Reject Null Hypothesis: The series is likely STATIONARY.")
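One detail worth knowing when reading the output: the KPSS p-value is obtained by interpolating between tabulated critical values, so it cannot go below 0.01 or above 0.10. The sketch below mimics that kind of lookup for the level-stationarity case. It is an illustration written for this post rather than statsmodels' own code, and the names `CRIT_LEVEL` and `approx_p_value` are hypothetical.

```python
# Asymptotic critical values for the level-stationarity test,
# paired with their significance levels (10%, 5%, 2.5%, 1%).
CRIT_LEVEL = [(0.347, 0.10), (0.463, 0.05), (0.574, 0.025), (0.739, 0.01)]


def approx_p_value(stat, table=CRIT_LEVEL):
    """Linearly interpolate a p-value from the table; clamp outside its range."""
    if stat <= table[0][0]:
        return table[0][1]      # true p-value is at least this large
    if stat >= table[-1][0]:
        return table[-1][1]     # true p-value is at most this small
    for (c_lo, p_lo), (c_hi, p_hi) in zip(table, table[1:]):
        if c_lo <= stat <= c_hi:
            frac = (stat - c_lo) / (c_hi - c_lo)
            return p_lo + frac * (p_hi - p_lo)


for stat in (0.20, 0.40, 0.7264, 1.50):
    print("statistic =", stat, " approx p-value =", round(approx_p_value(stat), 4))
```

A reported p-value of exactly 0.01 or 0.10 should therefore be read as "at most 0.01" or "at least 0.10".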

KPSS Calculation for the Above Data in R

# Install the package if not already installed
# install.packages("tseries")

# Load the library
library(tseries)

# Given data
y <- c(6, 7, 5, 4, 8, 7, 8, 9, 10, 10)

# Run KPSS test (stationarity around mean)
result <- kpss.test(y, null = "Level")

# Print results
print(result)

# If you want to test stationarity around a trend:
result_trend <- kpss.test(y, null = "Trend")
print(result_trend)

KPSS Calculation for the Above Data in STATA

* KPSS is not installed by default in STATA, so install it first
ssc install kpss

* Step 1: Enter your data
input y
6
7
5
4
8
7
8
9
10
10
end

* Step 2: Create a simple time index
gen t = _n

* Step 3: Declare the dataset as time series
tsset t

* Step 4: Run KPSS test

* For level stationarity (around a mean)
kpss y, notrend

* For trend stationarity (around a deterministic trend); this is the command's default, so no option is needed
kpss y

Why Checking Stationarity with KPSS is Important

Stationarity: The Foundation of Time Series Analysis

Many econometric models and procedures, such as ARIMA, VAR, cointegration tests, and Granger causality tests, depend on knowing whether the data series is stationary. A stationary time series has a constant mean, variance, and autocovariance over time. Without stationarity, models risk producing spurious results, meaning relationships that appear significant but are actually meaningless.

KPSS as a Complement to Other Tests

The KPSS test stands out because its null hypothesis assumes the series is stationary, unlike the ADF or Phillips-Perron (PP) tests, which take non-stationarity as their null. Using KPSS alongside these tests is a best practice called confirmatory testing:

  • If ADF fails to reject non-stationarity and KPSS rejects stationarity, it indicates a strong case for a unit root (non-stationarity).
  • If both tests agree, confidence in the result increases significantly.

This complementary approach reduces the chance of drawing the wrong conclusions about your data.

Guiding Model Choice

Knowing if your series is stationary determines how you model it:

  • Stationary series: Model directly (e.g., ARMA).
  • Non-stationary series: May require differencing (ARIMA) or checking for cointegration with other series.

Correct identification prevents model misspecification.
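As a small illustration of what differencing does, the sketch below builds a random walk from simulated shocks and then takes first differences. The seed, sample size, and variable names are arbitrary choices made for this example.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable
shocks = [random.gauss(0, 1) for _ in range(200)]

# Random walk: each value is the previous value plus a new shock (unit root).
walk = []
level = 0.0
for eps in shocks:
    level += eps
    walk.append(level)

# First difference: y_t - y_{t-1}. This is the "I" step in ARIMA with d = 1.
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

print("first five levels:     ", [round(v, 2) for v in walk[:5]])
print("first five differences:", [round(v, 2) for v in diffed[:5]])
```

The differenced series is just the original shocks (from the second one onward), which are stationary by construction, whereas the levels wander without returning to a fixed mean.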

Avoiding Spurious Regression

Regressing one non-stationary series on another can produce deceptively high R² values and t-statistics, falsely suggesting a meaningful relationship. By applying stationarity tests like KPSS (with ADF/PP), you confirm whether differencing or other transformations are needed, thus avoiding misleading results.
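A small simulation makes the problem concrete. The sketch below regresses pairs of independent series on each other and counts how often the slope looks "significant" at the usual 1.96 threshold. The sample size, number of replications, seed, and function names are all assumptions made for this illustration.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic on the slope from an OLS regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return slope / std_err


def white_noise(n, rng):
    return [rng.gauss(0, 1) for _ in range(n)]


def random_walk(n, rng):
    out, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out


def rejection_rate(make_series, n_obs=100, n_sims=500, seed=1):
    """Share of simulations where two INDEPENDENT series look related (|t| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = make_series(n_obs, rng)
        y = make_series(n_obs, rng)
        if abs(slope_t_stat(x, y)) > 1.96:
            hits += 1
    return hits / n_sims


rw_rate = rejection_rate(random_walk)
wn_rate = rejection_rate(white_noise)
print("independent random walks, share 'significant':", rw_rate)
print("independent white noise,  share 'significant':", wn_rate)
```

For stationary white noise the false-positive share stays near the nominal 5%, while for independent random walks it is far higher, even though the two series are unrelated by construction.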

Implications for Forecasting and Policy

Knowing the stationarity status helps determine whether to forecast in levels or growth rates:

  • Non-stationary data (like GDP or stock prices) often grow over time; forecasts use growth rates.
  • Stationary data (such as inflation or interest rate spreads) are forecast in levels.
  • KPSS aids in making these crucial decisions.
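For a series that grows over time, "forecasting in growth rates" usually means working with log differences. The sketch below shows the transformation on a hypothetical series that grows by exactly 2% per period; the numbers are invented purely for illustration.

```python
import math

# Hypothetical level series growing 2% per period (illustration only).
gdp = [100.0 * 1.02 ** t for t in range(6)]

# Log difference: log(y_t) - log(y_{t-1}), an approximation to the growth rate.
log_growth = [math.log(gdp[t]) - math.log(gdp[t - 1]) for t in range(1, len(gdp))]

# Exact percentage growth, for comparison.
pct_growth = [gdp[t] / gdp[t - 1] - 1 for t in range(1, len(gdp))]

print("levels:     ", [round(v, 2) for v in gdp])
print("log growth: ", [round(g, 4) for g in log_growth])
print("pct growth: ", [round(g, 4) for g in pct_growth])
```

The levels keep rising, but both growth measures are constant, which is the kind of series that can be modeled and forecast directly.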

The Importance of Stationarity

Stationarity ensures that the characteristics of your time series are consistent over time, specifically regarding mean, variance, and covariance. Using non-stationary data can lead to:

  • Spurious regression: False correlations.
  • Unreliable estimates: Biased coefficients and statistics.
  • Poor predictive performance: Inaccurate and invalid models.

Why KPSS is Specifically Valuable

A Robust Partner to the ADF Test

The KPSS test’s unique value lies in its opposite null hypothesis compared to ADF:

| Test | Null Hypothesis (H₀) | Alternative (H₁) |
|---|---|---|
| ADF | Non-Stationary (unit root) | Stationary |
| KPSS | Stationary | Non-Stationary (unit root) |

Using both together gives a clearer picture through four possible scenarios:

| Scenario | ADF Result | KPSS Result | Interpretation |
|---|---|---|---|
| 1. Clearly Stationary | Reject H₀ | Do not reject H₀ | Strong evidence of stationarity |
| 2. Clearly Non-Stationary | Do not reject H₀ | Reject H₀ | Strong evidence of unit root |
| 3. Difference-Stationary | Do not reject H₀ | Do not reject H₀ | Series needs differencing (stochastic trend) |
| 4. Trend-Stationary | Reject H₀ | Reject H₀ | Series stationary around a deterministic trend; needs de-trending |

This partnership is essential for choosing the right data transformation.
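The four scenarios above can be written as a small lookup. The sketch below takes the two p-values as inputs and returns the scenario label used in the table; the function name `classify`, the 5% threshold, and the example p-values are assumptions made for illustration.

```python
def classify(adf_p_value, kpss_p_value, alpha=0.05):
    """Map a pair of ADF / KPSS p-values to the scenario labels in the table above."""
    adf_rejects = adf_p_value < alpha      # ADF H0: unit root (non-stationary)
    kpss_rejects = kpss_p_value < alpha    # KPSS H0: stationary

    if adf_rejects and not kpss_rejects:
        return "1. Clearly Stationary"
    if not adf_rejects and kpss_rejects:
        return "2. Clearly Non-Stationary"
    if not adf_rejects and not kpss_rejects:
        return "3. Difference-Stationary"
    return "4. Trend-Stationary"


# Hypothetical p-values, one pair per scenario.
for adf_p, kpss_p in [(0.01, 0.10), (0.60, 0.01), (0.30, 0.10), (0.01, 0.01)]:
    print("ADF p =", adf_p, " KPSS p =", kpss_p, " ->", classify(adf_p, kpss_p))
```

Keeping the rule in one function makes it easy to apply the same decision logic to many series and to change the significance level in one place.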

Clarifying Types of Non-Stationarity

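  • If ADF fails to reject a unit root and KPSS fails to reject stationarity (scenario 3), the reading adopted here is a stochastic trend, treated by differencing (subtracting the previous observation from each value). Because neither test rejects in this case, it can also simply reflect low power in a short sample, so treat it with caution.
  • If ADF rejects a unit root but KPSS rejects stationarity (scenario 4), this suggests a deterministic trend, for which de-trending (removing a fitted linear trend) is the appropriate treatment.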

Applying the wrong treatment can reduce forecast accuracy, so KPSS helps avoid costly errors.
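A short derivation shows why the two treatments are not interchangeable. The notation (\alpha, \beta, \varepsilon_t, y_0) is introduced here for illustration, with \varepsilon_t a stationary error term.

```latex
% Deterministic trend: de-trending works, differencing over-corrects.
\[ y_{t} = \alpha + \beta t + \varepsilon_{t} \quad\Rightarrow\quad y_{t} - \alpha - \beta t = \varepsilon_{t}, \qquad \Delta y_{t} = \beta + \varepsilon_{t} - \varepsilon_{t-1} \]

% Stochastic trend (random walk with drift): differencing works, de-trending does not.
\[ y_{t} = y_{t-1} + \beta + \varepsilon_{t} \quad\Rightarrow\quad \Delta y_{t} = \beta + \varepsilon_{t}, \qquad y_{t} - y_{0} - \beta t = \sum_{i=1}^{t} \varepsilon_{i} \]
```

In the first case differencing introduces an extra moving-average term in the errors (over-differencing). In the second case de-trending leaves behind the accumulated shocks, which are still non-stationary.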

Higher Power in Certain Cases

The ADF test has low power when a series is stationary but highly persistent (close to a unit root), so it often fails to reject non-stationarity in such cases. A KPSS result that fails to reject stationarity then provides useful supporting evidence when ADF's evidence is weak.

Real-World Example: Analyzing GDP

An economist testing GDP would:

  • Run ADF: if it fails to reject non-stationarity, GDP likely has a unit root.
  • Run KPSS: if it rejects stationarity, this confirms non-stationarity.
  • Conclude that the series has a stochastic trend and needs differencing, that is, modeling GDP growth rather than the level.

Summary

Using the KPSS test is crucial because it:

  • Provides a vital robustness check alongside ADF.
  • Helps diagnose the exact type of non-stationarity to apply the correct transformation.
  • Improves model accuracy and forecast reliability by proper treatment of trends.
  • Offers stronger evidence for stationarity when it fails to reject its null.

In short, ignoring KPSS in time series analysis means missing key diagnostic information that ensures your conclusions and models are trustworthy.
