Day 4: Cointegration

time series
code
analysis
Author

Robert W. Walker

Published

August 11, 2022

Slides

A replication

This is an example with an ECM from a model presented by DeBoef and Keele in their excellent paper “Taking Time Seriously: Dynamic Regression.” The data come from a paper Gilmour and Wolbrecht.

use "https://github.com/robertwwalker/Essex-Data/raw/main/dgw.dta", clear
* These data have already been time set:
tsset
* The dependent variable in this example is Congressional Approval
*Table 2
reg capp l.capp econexp nytavg kg hb vetoes override intrasum mbill 
bgodfrey, lag(1 2 3)
library(haven); library(tidyverse); library(fpp3)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──

✔ lubridate   1.8.0     ✔ feasts      0.2.2
✔ tsibble     1.1.1     ✔ fable       0.3.1
✔ tsibbledata 0.4.0     

── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
dgw.data <- read_stata("https://github.com/robertwwalker/Essex-Data/raw/main/dgw.dta")
dgw.data <- dgw.data %>% filter(!is.na(year))
dgw.data <- dgw.data %>% mutate(date = paste0(year," Q",quarter, sep="")) %>% mutate(dateQ = yearquarter(date))
dgw.ts <- dgw.data %>% as_tsibble(index=dateQ)
# A dynamic linear model
dgw.ts %>% model(TSLM(capp~lag(capp,1)+econexp+nytavg+kg+hb+vetoes+override+intrasum+mbills)) %>% report()
Series: capp 
Model: TSLM 

Residuals:
    Min      1Q  Median      3Q     Max 
-9.1137 -1.3908 -0.1014  1.5446  7.0873 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.37793    3.33043   2.816  0.00634 ** 
lag(capp, 1)  0.79605    0.05166  15.409  < 2e-16 ***
econexp       0.07173    0.03090   2.321  0.02323 *  
nytavg        0.20716    0.06982   2.967  0.00413 ** 
kg           -1.29097    1.08646  -1.188  0.23881    
hb           -4.68564    1.69975  -2.757  0.00746 ** 
vetoes        0.24382    0.08898   2.740  0.00781 ** 
override     -0.99198    0.55252  -1.795  0.07697 .  
intrasum     -0.16907    0.12199  -1.386  0.17021    
mbills       -0.43939    0.28270  -1.554  0.12469    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.976 on 69 degrees of freedom
Multiple R-squared: 0.885,  Adjusted R-squared: 0.8699
F-statistic: 58.97 on 9 and 69 DF, p-value: < 2.22e-16
*Table 2 with Pres. Approval
reg capp l.capp p_prap econexp nytavg kg hb vetoes override intrasum mbill 
bgodfrey, lag(1 2 3)
dgw.ts %>% model(TSLM(capp~lag(capp,1)+p_prap+econexp+nytavg+kg+hb+vetoes+override+intrasum+mbills)) %>% report()
Series: capp 
Model: TSLM 

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4892 -1.3161 -0.1978  1.7633  6.3624 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.13865    3.38461   2.996  0.00382 ** 
lag(capp, 1)  0.77084    0.05585  13.803  < 2e-16 ***
p_prap        0.04708    0.04024   1.170  0.24610    
econexp       0.08161    0.03195   2.554  0.01290 *  
nytavg        0.20342    0.06971   2.918  0.00477 ** 
kg           -1.61049    1.11745  -1.441  0.15411    
hb           -4.87343    1.70280  -2.862  0.00559 ** 
vetoes        0.24012    0.08880   2.704  0.00865 ** 
override     -0.96229    0.55163  -1.744  0.08560 .  
intrasum     -0.14809    0.12298  -1.204  0.23267    
mbills       -0.46652    0.28290  -1.649  0.10375    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.968 on 68 degrees of freedom
Multiple R-squared: 0.8872, Adjusted R-squared: 0.8706
F-statistic:  53.5 on 10 and 68 DF, p-value: < 2.22e-16

The Breusch-Godfrey test

sapply(c(1:3), function(x) {lmtest::bgtest(lm(capp~lag(capp,1)+p_prap+econexp+nytavg+kg+hb+vetoes+override+intrasum+mbills, data=dgw.ts), order = x)})
             [,1]                                                          
statistic    0.5038368                                                     
parameter    1                                                             
method       "Breusch-Godfrey test for serial correlation of order up to 1"
p.value      0.4778191                                                     
data.name    character,2                                                   
coefficients numeric,12                                                    
vcov         numeric,144                                                   
             [,2]                                                          
statistic    1.043041                                                      
parameter    2                                                             
method       "Breusch-Godfrey test for serial correlation of order up to 2"
p.value      0.5936172                                                     
data.name    character,2                                                   
coefficients numeric,13                                                    
vcov         numeric,169                                                   
             [,3]                                                          
statistic    1.531487                                                      
parameter    3                                                             
method       "Breusch-Godfrey test for serial correlation of order up to 3"
p.value      0.6750225                                                     
data.name    character,2                                                   
coefficients numeric,14                                                    
vcov         numeric,196                                                   
* An Alternate Measure
* ADL
reg capp l.capp p_prap l.p_prap econexp l.econexp nytavg l.nytavg kg hb vetoes override intrasum mbill 
fitstat
bgodfrey, lag(1 2 3)
dgw.ts %>% model(TSLM(capp~lag(capp, 1)+p_prap+lag(p_prap, 1)+econexp+lag(econexp,1)+nytavg+lag(nytavg, 1)+kg+hb+vetoes+override+intrasum+mbills)) %>% gg_tsresiduals()
Warning: Removed 1 row(s) containing missing values (geom_path).
Warning: Removed 1 rows containing missing values (geom_point).
Warning: Removed 1 rows containing non-finite values (stat_bin).

reg capp l.capp p_prap l.p_prap econexp l.econexp nytavg l.nytavg l2.nytavg kg hb vetoes override intrasum mbill 
fitstat
test l.p_prap 
summary(lm(capp~ dplyr::lag(capp, 1)+p_prap+dplyr::lag(p_prap, 1)+econexp+ dplyr::lag(econexp, 1)+ nytavg+ dplyr::lag(nytavg, 1) + dplyr::lag(nytavg, 2)+ kg+hb+vetoes+override+intrasum+mbills, data=dgw.data))

Call:
lm(formula = capp ~ dplyr::lag(capp, 1) + p_prap + dplyr::lag(p_prap, 
    1) + econexp + dplyr::lag(econexp, 1) + nytavg + dplyr::lag(nytavg, 
    1) + dplyr::lag(nytavg, 2) + kg + hb + vetoes + override + 
    intrasum + mbills, data = dgw.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.2813 -1.1587 -0.2843  1.5055  5.5623 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            13.255444   4.027775   3.291  0.00164 ** 
dplyr::lag(capp, 1)     0.726775   0.065844  11.038 2.31e-16 ***
p_prap                  0.138223   0.057373   2.409  0.01892 *  
dplyr::lag(p_prap, 1)  -0.093813   0.055274  -1.697  0.09459 .  
econexp                 0.005848   0.070969   0.082  0.93459    
dplyr::lag(econexp, 1)  0.082163   0.072223   1.138  0.25959    
nytavg                  0.192404   0.071465   2.692  0.00908 ** 
dplyr::lag(nytavg, 1)   0.023006   0.069422   0.331  0.74144    
dplyr::lag(nytavg, 2)   0.138535   0.078110   1.774  0.08097 .  
kg                     -1.566093   1.122181  -1.396  0.16774    
hb                     -4.340598   1.715171  -2.531  0.01389 *  
vetoes                  0.283759   0.101779   2.788  0.00700 ** 
override               -1.061008   0.580564  -1.828  0.07235 .  
intrasum               -0.107917   0.122254  -0.883  0.38074    
mbills                 -0.715715   0.304646  -2.349  0.02196 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.914 on 63 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.8992,    Adjusted R-squared:  0.8768 
F-statistic: 40.14 on 14 and 63 DF,  p-value: < 2.2e-16
anova(lm(capp~ dplyr::lag(capp, 1)+p_prap+econexp+ dplyr::lag(econexp, 1)+ nytavg+ dplyr::lag(nytavg, 1) + dplyr::lag(nytavg, 2)+ kg+hb+vetoes+override+intrasum+mbills, data=dgw.data), lm(capp~ dplyr::lag(capp, 1)+p_prap+dplyr::lag(p_prap, 1)+econexp+ dplyr::lag(econexp, 1)+ nytavg+ dplyr::lag(nytavg, 1) + dplyr::lag(nytavg, 2)+ kg+hb+vetoes+override+intrasum+mbills, data=dgw.data)) 
Analysis of Variance Table

Model 1: capp ~ dplyr::lag(capp, 1) + p_prap + econexp + dplyr::lag(econexp, 
    1) + nytavg + dplyr::lag(nytavg, 1) + dplyr::lag(nytavg, 
    2) + kg + hb + vetoes + override + intrasum + mbills
Model 2: capp ~ dplyr::lag(capp, 1) + p_prap + dplyr::lag(p_prap, 1) + 
    econexp + dplyr::lag(econexp, 1) + nytavg + dplyr::lag(nytavg, 
    1) + dplyr::lag(nytavg, 2) + kg + hb + vetoes + override + 
    intrasum + mbills
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     64 559.29                              
2     63 534.83  1    24.454 2.8806 0.09459 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
*ECM
reg d.capp l.capp d.p_prap l.p_prap d.econexp l.econexp d.nytavg l.nytavg kg hb vetoes override intrasum mbill 
bgodfrey, lag(1 2 3)
summary(lm(difference(capp)~dplyr::lag(capp, 1) + difference(p_prap) +  dplyr::lag(p_prap, 1)+ difference(econexp)+lag(econexp, 1)+difference(nytavg) + dplyr::lag(nytavg) +  kg +  hb +  vetoes + override +  intrasum +  mbills, data=dgw.data))

Call:
lm(formula = difference(capp) ~ dplyr::lag(capp, 1) + difference(p_prap) + 
    dplyr::lag(p_prap, 1) + difference(econexp) + lag(econexp, 
    1) + difference(nytavg) + dplyr::lag(nytavg) + kg + hb + 
    vetoes + override + intrasum + mbills, data = dgw.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.0575 -1.2991 -0.2338  1.6723  6.0297 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            9.92398    3.59485   2.761 0.007490 ** 
dplyr::lag(capp, 1)   -0.22348    0.05979  -3.738 0.000394 ***
difference(p_prap)     0.11480    0.05548   2.069 0.042508 *  
dplyr::lag(p_prap, 1)  0.02166    0.04356   0.497 0.620667    
difference(econexp)    0.02214    0.07027   0.315 0.753732    
lag(econexp, 1)        0.08156    0.03311   2.463 0.016420 *  
difference(nytavg)     0.18074    0.07179   2.518 0.014290 *  
dplyr::lag(nytavg)     0.22229    0.09152   2.429 0.017921 *  
kg                    -1.36022    1.12380  -1.210 0.230519    
hb                    -4.24191    1.72938  -2.453 0.016865 *  
vetoes                 0.20605    0.09261   2.225 0.029567 *  
override              -0.75686    0.55956  -1.353 0.180870    
intrasum              -0.12549    0.12277  -1.022 0.310504    
mbills                -0.53209    0.28906  -1.841 0.070217 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.939 on 65 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.4115,    Adjusted R-squared:  0.2938 
F-statistic: 3.496 on 13 and 65 DF,  p-value: 0.0003862
*Bewley
ivreg capp p_prap d.p_prap econexp d.econexp nytavg d.nytavg kg hb vetoes override intrasum mbill (d.capp = l.capp p_prap l.p_prap econexp l.econexp nytavg l.nytavg)