Day 6: Panel Data

panel data
code
analysis
Author

Robert W. Walker

Published

August 15, 2022

Slides

panelr

The panelr package vignette on between-within

Starting the panel data, or the generalization to multiple time series, perhaps the most famous question in the generic literature is a question about fixed and random effects, more precisely, do we estimate specific unobserved constants or do we seek only the distribution of these constants. The implications of this basic issue are substantial.

Some Simulated Data

Random effects and pooled regressions can be terribly wrong when the pooled and random effects moment condition fails. Let’s show some data here to illustrate the point. The true model here is \[ y_{it} = \alpha_{i} + X_{it}\beta + \epsilon_{it} \] where the \(\beta=1\) and \(\alpha_{i}=\{6,0,-6\}\) and \(\epsilon \sim \mathcal{N}(0,1)\). Here is the plot.

X.FE <- c(seq(-2.5,-0.5,by=0.05),seq(-2,0,by=0.05),seq(-1.5,0.5,by=0.05))
y.FE <- -3*c(rep(-2,41),rep(0,41),rep(2,41))+X.FE + rnorm(123,0,1)
FE.data <- data.frame(y.FE,X.FE,unit=c(rep(1,41),rep(2,41),rep(3,41)), time=rep(seq(1,41,1),3))
library(foreign)
write.dta(FE.data, "FEData-2.dta")
par(mfrow=c(1,2))
with(FE.data, plot(X.FE,y.FE, bty="n", main="Pooled"))
with(FE.data, abline(lm(y.FE~X.FE), lty=2, col="brown"))
with(FE.data, plot(X.FE,y.FE, bty="n", col=unit, main="Fixed Effects"))
abline(a=-6,b=1, col="blue")
abline(a=0,b=1, col="blue")
abline(a=6,b=1, col="blue")

Three Models

library(plm)
FE.pdata <- pdata.frame(FE.data, c("unit","time"))
mod.RE <- plm(y.FE~X.FE, data=FE.pdata, model="random")
mod.RE2 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "amemiya")
mod.RE3 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "walhus")
mod.RE4 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "nerlove")
mod.FE <- plm(y.FE~X.FE, data=FE.pdata, model="within")
mod.pool <- plm(y.FE~X.FE, data=FE.pdata, model="pooling")

Omitted Fixed Effects can be Very Bad

As we can see, the default random effects model in R [and Stata] is actually pretty horrible.

library(stargazer)
stargazer(mod.RE,mod.RE2,mod.RE3,mod.RE4,mod.pool,mod.FE, type="html", column.labels=c("RE","RE-WalHus","RE-Amemiya","RE-Nerlove","Pooled","FE"))
Dependent variable:
y.FE
RE RE-WalHus RE-Amemiya RE-Nerlove Pooled FE
(1) (2) (3) (4) (5) (6)
X.FE -3.043*** 0.837*** 0.764*** 0.839*** -3.043*** 0.842***
(0.524) (0.140) (0.164) (0.139) (0.524) (0.140)
Constant -4.084*** -0.203 -0.277 -0.202 -4.084***
(0.646) (2.866) (0.845) (3.538) (0.646)
Observations 123 123 123 123 123 123
R2 0.218 0.228 0.153 0.230 0.218 0.234
Adjusted R2 0.211 0.222 0.146 0.224 0.211 0.215
F Statistic 33.666*** 35.758*** 21.821*** 36.205*** 33.666*** (df = 1; 121) 36.446*** (df = 1; 119)
Note: p<0.1; p<0.05; p<0.01

Discussion

The random method matters quite a bit though; many of them are very close to the truth. Models containing much or all of the between information are wrong.

If the X and unit effects are dependent, then there are serious threats to proper inference.

plm things

Beck and Katz (1995) standard errors are provided with vcovBK(). The key argument is cluster which averages over groups or time. The Beck and Katz paper would involve cluster="time".

Almost all panel unit root testing goes on with purtest. The test= argument is key for IPS, Levin, et al., Maddala-Wu, Hadri, and various tests proposed by Choi (2001). A few others are specified individually below.

  • The test of serial correlation for panel models is given by pbgtest(model).

  • The Baltagi and Li test of serial correlation in panel models with random effects is given by pbltest(model). The various alternatives are specified in alternative.

  • The Baltagi-Wu statistic for AR(1) disturbances is given by pbnftest(model, test="lbi") while a BNF (1982) statistic is the default for this test for fixed effects models.

# replicate Baltagi (2013), p. 101, table 5.1:
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
pbnftest(re, test = "lbi")
  • pbsytest(model) gives the joint test of Baltagi and Li and a variant owing to Bera, et. al (2001) and Sosa-Escudero and Bera (2008) – the latter is a paper in Stata journal with companion software to be installed.

  • pcdtest(formula, data) gives the Pesaran test for cross-sectional dependence.

  • pdwtest(model) gives a panel Durbin-Watson statistic.

  • pFtest gives the F-test of fixed effects.

  • pggls gives GLS estimators for panel data specifying the effect and a model of within, pooling, fd.

  • phansitest(purtest object) combines unit root tests in the method proposed by Hanck (2013).

  • phtest(model1, model2) is the Hausman test for panel data models. This one has robust options detailed in the last section of ?phtest.

  • piest(formula, data) performs Chamberlain’s tests on the within regression.

  • Another test of unit/time effects is given in plmtest().

  • Chow tests of poolability are given by pooltest() applied to a pooled or within regression.

  • pvar ensures variation along dimensions.

  • pvcm will estimate variable coefficients models ala Swamy (1970).

  • Joint tests of coefficients are constructed using pwaldtest.

  • Wooldridge’s test for serial correlation in within models is pwartest(model)

  • Wooldridge’s test for AR(1) errors in level or differenced panel models is given by pwfdtest(model). The underlying idea is clever; if the levels are independent then the errors in first-differences will be correlated as -0.5. The test can be implemented against either within/fe or first-difference alternatives.

  • pwtest(pooling model) gives a semi-parametric test for the presence of (individual or time) unobserved effects in panel models that owes to Wooldridge.

  • ranef and fixef extract the random and fixed effects, respectively.