Granger Causality and Hypothesis Testing

Ziyi Zhu / December 11, 2022

This article discusses causal illusions as a form of cognitive bias and explores the use of Granger causality to detect causal structures in time series. It is common practice to analyse (linear) structure, estimate linear models and perform forecasts based on a single stationary time series. However, the world does not consist of independent stochastic processes. In accordance with general equilibrium theory, economists usually assume that everything depends on everything else. It is therefore important to understand and quantify the (causal) relationships between different time series.

Epiphenomena

Epiphenomena are a class of causal illusions in which the direction of a causal relationship is ambiguous. For example, when you spend time on the bridge of a ship with a large compass in front, you can easily develop the impression that the compass is directing the ship rather than merely reflecting its direction. Here is an image that perfectly illustrates the point that correlation is not causation:

[Image: an illustration that correlation does not imply causation]

Nassim Nicholas Taleb explored this concept in his book Antifragile to highlight the causal illusion that universities generate wealth in society. He presented a miscellany of evidence suggesting that classroom education does not lead to wealth so much as it results from wealth (an epiphenomenon). Taleb proposes that antifragile risk-taking, rather than education and formal, organized research, is largely responsible for innovation and growth. This does not mean that theories and research play no role; rather, it shows that we are fooled by randomness into overestimating the role of good-sounding ideas. Because of cognitive biases, historians are prone to epiphenomena and other illusions of cause and effect.

We can debunk epiphenomena in the cultural discourse and consciousness by looking at the sequence of events and checking their order of occurrence. This method was refined by Clive Granger, who developed a rigorously scientific approach for establishing causation by looking at time series sequences and measuring the "Granger cause".

Granger causality

In the following, we present the definition of Granger causality and the different possibilities of causal events resulting from it. Consider two weakly stationary time series $x$ and $y$:

  1. Granger Causality: $x$ is (simply) Granger causal to $y$ if and only if the application of an optimal linear prediction function leads to a smaller forecast error variance for the future value of $y$ when current and past values of $x$ are used.

  2. Instantaneous Granger Causality: $x$ is instantaneously Granger causal to $y$ if and only if the application of an optimal linear prediction function leads to a smaller forecast error variance for the future value of $y$ when the future value of $x$ is used in addition to its current and past values.

  3. Feedback: There is feedback between $x$ and $y$ if $x$ is causal to $y$ and $y$ is causal to $x$. Feedback is only defined for the case of simple causal relations.
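The first definition can be illustrated with a small simulation. In the sketch below (all coefficients, the seed and the rss helper are invented for illustration), past values of $x$ enter the equation for $y$, so adding past $x$ to the predictor set should shrink the residual sum of squares relative to using past $y$ alone:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
x = np.zeros(N)
y = np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t-1] + rng.standard_normal()
    y[t] = 0.3 * y[t-1] + 0.8 * x[t-1] + rng.standard_normal()  # past x drives y

Y = y[1:]
X_r = np.column_stack([np.ones(N-1), y[:-1]])          # restricted: past y only
X_f = np.column_stack([np.ones(N-1), y[:-1], x[:-1]])  # full: past y and past x

def rss(X):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    r = Y - X @ b
    return r @ r

ratio = rss(X_f) / rss(X_r)   # < 1: using past x reduces the forecast error variance
```

Here `ratio` comes out well below one, which is exactly the variance reduction that the definition of simple Granger causality asks for.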

Hypothesis testing in general linear models

Consider the general linear model $\mathbf{Y} = X \boldsymbol{\beta} + \varepsilon$. In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $\mathbf{Y}$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\boldsymbol{\beta}$ can be split into two vectors $\boldsymbol{\beta}_0$ and $\boldsymbol{\beta}_1$, and we test whether $\boldsymbol{\beta}_1$ is zero.

Suppose $\underset{n \times p}{X} = \left(\underset{n \times p_0}{X_0}\;\underset{n \times (p - p_0)}{X_1}\right)$ and $\boldsymbol{\beta} = \left(\begin{array}{c}\boldsymbol{\beta}_0 \\ \boldsymbol{\beta}_1\end{array}\right)$, where $\operatorname{rank}(X) = p$ and $\operatorname{rank}(X_0) = p_0$. We want to test $H_0: \boldsymbol{\beta}_1 = 0$ against $H_1: \boldsymbol{\beta}_1 \neq 0$. Under $H_0$, $X_1 \boldsymbol{\beta}_1$ vanishes and

$$\mathbf{Y} = X_0 \boldsymbol{\beta}_0 + \varepsilon$$

Under $H_0$, the maximum likelihood estimates (MLE) of $\boldsymbol{\beta}_0$ and $\sigma^2$ are

$$\begin{aligned} \hat{\hat{\boldsymbol{\beta}}}_0 &= \left(X_0^T X_0\right)^{-1} X_0^T \mathbf{Y} \\ \hat{\hat{\sigma}}^2 &= \frac{1}{n}\left(\mathbf{Y} - X_0 \hat{\hat{\boldsymbol{\beta}}}_0\right)^T\left(\mathbf{Y} - X_0 \hat{\hat{\boldsymbol{\beta}}}_0\right) = \frac{\mathrm{RSS}_0}{n} \end{aligned}$$

and we have previously shown that these are independent. So the fitted values under $H_0$ are

$$\hat{\hat{\mathbf{Y}}} = X_0\left(X_0^T X_0\right)^{-1} X_0^T \mathbf{Y} = P_0 \mathbf{Y}$$

where $P_0 = X_0\left(X_0^T X_0\right)^{-1} X_0^T$.

Note that our poor estimators wear two hats instead of one. We adopt the convention that the estimators of the null hypothesis have two hats, while those of the alternative hypothesis have one.

The generalized likelihood ratio test of $H_0$ against $H_1$ is

$$\begin{aligned} \Lambda_{\mathbf{Y}}\left(H_0, H_1\right) &= \left(\frac{\hat{\hat{\sigma}}^2}{\hat{\sigma}^2}\right)^{n/2} \\ &= \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2} \end{aligned}$$

We reject $H_0$ when $2 \log \Lambda$ is large, equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large. Under $H_0$, we have

$$2 \log \Lambda = n \log \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)$$

which is approximately a $\chi^2_{p-p_0}$ random variable. We can also derive an exact null distribution, and hence an exact test. The $F$ statistic under $H_0$ is given by

$$F = \frac{\left(\mathrm{RSS}_0 - \mathrm{RSS}\right)/\left(p - p_0\right)}{\mathrm{RSS}/(n - p)} \sim F_{p-p_0,\,n-p}$$

Hence we reject $H_0$ if $F > F_{p-p_0, n-p}(\alpha)$. $\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\boldsymbol{\beta}_1$ in addition to $\boldsymbol{\beta}_0$.
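As a sanity check, we can simulate a general linear model with a nonzero $\boldsymbol{\beta}_1$ and compute both the exact $F$ test and the asymptotic $\chi^2$ test above. This is a sketch under invented data (the seed, coefficients and rss helper are illustrative, and scipy is assumed available for the null distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p0, p = 100, 2, 4   # p0 nuisance columns, p - p0 tested columns

# design matrix: intercept plus three standard-normal covariates
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, 1.0, 0.0])   # beta_1 = (1, 0), so H_0 is false
Y = X @ beta + rng.standard_normal(n)

def rss(Xm):
    b = np.linalg.lstsq(Xm, Y, rcond=None)[0]
    r = Y - Xm @ b
    return r @ r

RSS_0, RSS = rss(X[:, :p0]), rss(X)                    # restricted vs full fit
F = ((RSS_0 - RSS) / (p - p0)) / (RSS / (n - p))       # exact F statistic
p_value = stats.f.sf(F, p - p0, n - p)
chi2 = n * np.log(RSS_0 / RSS)                         # 2 log Lambda
chi2_p = stats.chi2.sf(chi2, p - p0)                   # asymptotic chi^2 test
```

With a true coefficient of 1 on one of the tested columns, both tests reject $H_0$ decisively, and the two p-values agree closely, as the asymptotic approximation predicts.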

| Source of var. | d.f. | Sum of squares | Mean squares |
| --- | --- | --- | --- |
| Fitted model | $p - p_0$ | $\mathrm{RSS}_0 - \mathrm{RSS}$ | $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{p - p_0}$ |
| Residual | $n - p$ | $\mathrm{RSS}$ | $\frac{\mathrm{RSS}}{n - p}$ |
| Total | $n - p_0$ | $\mathrm{RSS}_0$ | |

The ratio $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\boldsymbol{\beta}_1$, and denoted $R^2$.

import numpy as np

def fit(data, p=1):
    """OLS/MLE fit of an AR(p) model with intercept to a 1-D series."""
    n = data.shape[0] - p
    Y = data[p:]
    # design matrix: intercept column followed by lags 1..p of the series
    X = np.stack([np.ones(n)] + [data[p-i-1:-i-1] for i in range(p)], axis=-1)

    beta_mle = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y))
    R = Y - X.dot(beta_mle)          # residuals
    RSS = R.T.dot(R)                 # residual sum of squares
    var_mle = RSS / n                # MLE of the error variance

    return beta_mle, var_mle, RSS
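As a quick check of this function (repeated in the snippet below so it is self-contained), we can fit a simulated AR(1) series with a known coefficient; the true coefficient of 0.7 and the seed are chosen purely for illustration:

```python
import numpy as np

def fit(data, p=1):
    # OLS/MLE fit of an AR(p) model with intercept, as defined above
    n = data.shape[0] - p
    Y = data[p:]
    X = np.stack([np.ones(n)] + [data[p-i-1:-i-1] for i in range(p)], axis=-1)
    beta_mle = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y))
    R = Y - X.dot(beta_mle)
    RSS = R.T.dot(R)
    return beta_mle, RSS / n, RSS

rng = np.random.default_rng(2)
N = 2000
data = np.zeros(N)
for t in range(1, N):
    data[t] = 0.7 * data[t-1] + rng.standard_normal()  # AR(1), coefficient 0.7

beta, var_mle, RSS = fit(data, p=1)
# beta[1] is the estimated lag-1 coefficient; it should be close to 0.7
```

The recovered lag-1 coefficient lands close to the true value, and `var_mle` close to the unit innovation variance.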

Causality tests

To test for simple causality from $x$ to $y$, we examine whether the lagged values of $x$ in the regression of $y$ on lagged values of both $x$ and $y$ significantly reduce the error variance. Using the ordinary least squares (OLS) method, the following equation is estimated:

$$y_{t} = \alpha_{0} + \sum_{k=1}^{k_{1}} \alpha_{11}^{k} y_{t-k} + \sum_{k=k_{0}}^{k_{2}} \alpha_{12}^{k} x_{t-k} + u_{1, t}$$

with $k_0 = 1$. An $F$ test is applied to test the null hypothesis $H_0: \alpha^{1}_{12} = \alpha^{2}_{12} = \cdots = \alpha^{k_2}_{12} = 0$. By exchanging $x$ and $y$, we can test whether a simple causal relation from $y$ to $x$ exists. There is a feedback relation if the null hypothesis is rejected in both directions. To test whether there is instantaneous causality, we finally set $k_0 = 0$ and perform a $t$ or $F$ test of the null hypothesis $H_0: \alpha^{0}_{12} = 0$.
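The causality_tests function below relies on a helper fit_xy that is not shown in the post. A minimal sketch consistent with the regression equation above (with $k_0 = 1$; the name, argument order and return signature are assumptions chosen to mirror fit) might look like:

```python
import numpy as np

def fit_xy(y, x, k_1=1, k_2=1):
    # hypothetical helper (not shown in the post): OLS fit of y_t on an
    # intercept, k_1 lags of y and k_2 lags of x (k_0 = 1 in the equation
    # above); returns the same (beta, var, RSS) triple as fit()
    m = max(k_1, k_2)
    n = y.shape[0] - m
    Y = y[m:]
    cols = [np.ones(n)]
    cols += [y[m-k:-k] for k in range(1, k_1 + 1)]   # lags of y
    cols += [x[m-k:-k] for k in range(1, k_2 + 1)]   # lags of x
    X = np.stack(cols, axis=-1)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    R = Y - X @ beta
    RSS = R @ R
    return beta, RSS / n, RSS

# quick check on simulated data where lagged x drives y with coefficient 0.8
rng = np.random.default_rng(3)
N = 2000
x = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.3 * y[t-1] + 0.8 * x[t-1] + rng.standard_normal()

beta, var_mle, RSS = fit_xy(y, x, k_1=1, k_2=1)
# beta = (alpha_0, alpha_11^1, alpha_12^1); beta[2] should be near 0.8
```

The coefficient on the lagged $x$ term is recovered close to its true value, so the helper plugs directly into the $F$ test of the null hypothesis above.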

import numpy as np
from scipy import stats

def causality_tests(y, x, alpha=0.05, k_1=1, maxlag=1):
    for k_2 in range(1, maxlag + 1):
        p = 1 + k_1 + k_2                    # parameters in the unrestricted model
        n = y.shape[0] - max(k_1, k_2)

        _, _, RSS = fit_xy(y, x, k_1, k_2)   # regression on lags of y and x
        _, _, RSS_0 = fit(y, k_1)            # restricted regression, lags of y only

        chi2 = n * np.log(RSS_0 / RSS)       # 2 log Lambda
        f = ((RSS_0 - RSS) / k_2) / (RSS / (n - p))

        chi2_pval = stats.chi2.sf(chi2, k_2)
        f_pval = stats.f.sf(f, k_2, n - p)
        decision = "Reject" if f_pval < alpha else "Accept"
        print(f"F={f:.4f}, p={f_pval:.4f}; chi2={chi2:.4f}, p={chi2_pval:.4f}")
        print(f"{decision} null hypothesis")

One problem with this test is that the results depend strongly on the number of lags of the explanatory variable, $k_2$. There is a trade-off: the more lagged values we include, the better the influence of this variable can be captured, which argues for a high maximal lag. On the other hand, the more lagged values are included, the lower the power of the test.
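This lag dependence can be seen directly by sweeping $k_2$ on simulated data. The sketch below (coefficients and seed chosen for illustration, scipy assumed for the F p-values) repeats the restricted/unrestricted comparison for several maximal lags:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
N = 300
x = np.zeros(N)
y = np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t-1] + rng.standard_normal()
    y[t] = 0.3 * y[t-1] + 0.5 * x[t-1] + rng.standard_normal()  # only lag 1 matters

k_1 = 1
pvals = {}
for k_2 in (1, 2, 4, 8):
    m = max(k_1, k_2)
    n = N - m
    Y = y[m:]
    lags_y = [y[m-k:-k] for k in range(1, k_1 + 1)]
    lags_x = [x[m-k:-k] for k in range(1, k_2 + 1)]
    X0 = np.column_stack([np.ones(n)] + lags_y)            # restricted model
    X1 = np.column_stack([np.ones(n)] + lags_y + lags_x)   # unrestricted model

    def rss(X):
        b = np.linalg.lstsq(X, Y, rcond=None)[0]
        r = Y - X @ b
        return r @ r

    RSS_0, RSS = rss(X0), rss(X1)
    p = X1.shape[1]
    F = ((RSS_0 - RSS) / k_2) / (RSS / (n - p))
    pvals[k_2] = stats.f.sf(F, k_2, n - p)   # p-value grows as useless lags dilute F
```

With only the first lag of $x$ truly in the data-generating process, the test at $k_2 = 1$ is sharply significant, while piling on extra lags spreads the same signal over more degrees of freedom.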

We can fit a linear model to real financial data and use the Granger causality test to find causal relationships.

[Image: plot of the financial time series used in the tests below]

Hypothesis Test (p=1)
F test:       F=24.4610 , p=0.0000, df_denom=199, df_num=1
chi2 test: chi2=23.3023 , p=0.0000, df=1
Reject null hypothesis

Hypothesis Test (p=2)
F test:       F=8.2501  , p=0.0045, df_denom=197, df_num=1
chi2 test: chi2=8.2051  , p=0.0042, df=1
Reject null hypothesis

Hypothesis Test (p=3)
F test:       F=0.4181  , p=0.5187, df_denom=195, df_num=1
chi2 test: chi2=0.4262  , p=0.5139, df=1
Accept null hypothesis

Hypothesis Test (p=4)
F test:       F=4.9506  , p=0.0272, df_denom=193, df_num=1
chi2 test: chi2=5.0148  , p=0.0251, df=1
Reject null hypothesis