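# Regressor: three units whose X ranges shift upward by 0.5 each
X.FE <- c(seq(-2.5,-0.5,by=0.05),seq(-2,0,by=0.05),seq(-1.5,0.5,by=0.05))
# Outcome: unit intercepts of 6, 0, -6, a slope of 1, and standard normal noise
y.FE <- -3*c(rep(-2,41),rep(0,41),rep(2,41))+X.FE + rnorm(123,0,1)
FE.data <- data.frame(y.FE,X.FE,unit=c(rep(1,41),rep(2,41),rep(3,41)), time=rep(seq(1,41,1),3))
library(foreign)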
write.dta(FE.data, "FEData-2.dta")
par(mfrow=c(1,2))
with(FE.data, plot(X.FE,y.FE, bty="n", main="Pooled"))
with(FE.data, abline(lm(y.FE~X.FE), lty=2, col="brown"))
with(FE.data, plot(X.FE,y.FE, bty="n", col=unit, main="Fixed Effects"))
abline(a=-6,b=1, col="blue")
abline(a=0,b=1, col="blue")
abline(a=6,b=1, col="blue")
Slides
panelr
The panelr package vignette on between-within
With panel data, the generalization from one time series to many, perhaps the most famous question in the literature concerns fixed and random effects. More precisely: do we estimate the specific unobserved constants, or do we seek only the distribution of these constants? The implications of this basic choice are substantial.
Some Simulated Data
Random effects and pooled regressions can be terribly wrong when the moment condition they share fails, that is, when the unit effects are correlated with the regressors. Let’s show some data here to illustrate the point. The true model here is \[ y_{it} = \alpha_{i} + X_{it}\beta + \epsilon_{it} \] where \(\beta=1\), \(\alpha_{i} \in \{6,0,-6\}\), and \(\epsilon_{it} \sim \mathcal{N}(0,1)\). Here is the plot.
Three Models
library(plm)
FE.pdata <- pdata.frame(FE.data, c("unit","time"))
# plm's default random.method is "swar" (Swamy-Arora)
mod.RE <- plm(y.FE~X.FE, data=FE.pdata, model="random")
mod.RE2 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "amemiya")
mod.RE3 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "walhus")
mod.RE4 <- plm(y.FE~X.FE, data=FE.pdata, model="random", random.method = "nerlove")
mod.FE <- plm(y.FE~X.FE, data=FE.pdata, model="within")
mod.pool <- plm(y.FE~X.FE, data=FE.pdata, model="pooling")
Omitted Fixed Effects can be Very Bad
As we can see, the default random effects estimator in R [and Stata] does pretty horribly here: it reproduces the pooled slope of -3.043 when the true \(\beta\) is 1.
library(stargazer)
# Labels follow the model order: mod.RE2 uses "amemiya" and mod.RE3 uses "walhus"
stargazer(mod.RE,mod.RE2,mod.RE3,mod.RE4,mod.pool,mod.FE, type="html", column.labels=c("RE","RE-Amemiya","RE-WalHus","RE-Nerlove","Pooled","FE"))

Dependent variable: y.FE

| | RE | RE-Amemiya | RE-WalHus | RE-Nerlove | Pooled | FE |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| X.FE | -3.043*** | 0.837*** | 0.764*** | 0.839*** | -3.043*** | 0.842*** |
| | (0.524) | (0.140) | (0.164) | (0.139) | (0.524) | (0.140) |
| Constant | -4.084*** | -0.203 | -0.277 | -0.202 | -4.084*** | |
| | (0.646) | (2.866) | (0.845) | (3.538) | (0.646) | |
| Observations | 123 | 123 | 123 | 123 | 123 | 123 |
| R2 | 0.218 | 0.228 | 0.153 | 0.230 | 0.218 | 0.234 |
| Adjusted R2 | 0.211 | 0.222 | 0.146 | 0.224 | 0.211 | 0.215 |
| F Statistic | 33.666*** | 35.758*** | 21.821*** | 36.205*** | 33.666*** (df = 1; 121) | 36.446*** (df = 1; 119) |

Note: *p<0.1; **p<0.05; ***p<0.01
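One way to see why the first random effects column reproduces the pooled estimates is to look at the estimated variance components. The sketch below regenerates data with the same design and compares the quasi-demeaning weight that each random.method implies. The seed, the object names (sim, re_methods, fits), and the lower-case variable names are assumptions made for illustration, so the numbers will not match the table exactly.

```r
library(plm)

set.seed(42)  # arbitrary seed (an assumption); the original draw used none
unit <- rep(1:3, each = 41)
time <- rep(1:41, times = 3)
x <- c(seq(-2.5, -0.5, by = 0.05), seq(-2, 0, by = 0.05), seq(-1.5, 0.5, by = 0.05))
y <- c(6, 0, -6)[unit] + x + rnorm(123, 0, 1)  # same design as the simulated data
sim <- pdata.frame(data.frame(y, x, unit, time), index = c("unit", "time"))

# Fit the random effects model once per variance-components estimator
re_methods <- c("swar", "amemiya", "walhus", "nerlove")
fits <- lapply(re_methods, function(m) {
  plm(y ~ x, data = sim, model = "random", random.method = m)
})
names(fits) <- re_methods

# theta is the share of each unit mean removed before least squares
theta <- sapply(fits, function(f) as.numeric(unlist(ercomp(f)$theta))[1])
slope <- sapply(fits, function(f) unname(coef(f)["x"]))
print(round(cbind(theta, slope), 3))
```

A theta near 0 leaves the data essentially untransformed, which is the pooled regression; a theta near 1 is close to the within estimator.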
Discussion
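The random method matters quite a bit, though: the three non-default methods give slopes between 0.764 and 0.839, close to the true \(\beta = 1\), while the default random effects and pooled estimates of -3.043 get even the sign wrong. Models containing much or all of the between information are wrong.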
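The role of between information can be made explicit with the standard quasi-demeaned form of the random effects estimator for a balanced panel. The notation is an assumption added here rather than taken from the text above: \(T\) is the number of periods per unit, \(\sigma^2_{\alpha}\) and \(\sigma^2_{\epsilon}\) are the variances of the unit effects and the idiosyncratic errors, \(\mu\) is the mean of the \(\alpha_{i}\), \(v_{it} = (\alpha_{i} - \mu) + \epsilon_{it}\) is the composite error, and bars denote unit means.

\[ y_{it} - \theta\bar{y}_{i} = (1-\theta)\mu + (X_{it} - \theta\bar{X}_{i})\beta + (v_{it} - \theta\bar{v}_{i}), \qquad \theta = 1 - \sqrt{\frac{\sigma^2_{\epsilon}}{\sigma^2_{\epsilon} + T\sigma^2_{\alpha}}} \]

With \(\theta = 0\) nothing is subtracted and the estimator is the pooled regression; with \(\theta = 1\) the unit means are removed entirely, \(\alpha_{i}\) drops out, and the estimator is the within (fixed effects) regression. Each random method is a different estimator of the two variances and therefore of \(\theta\), which is why the methods can disagree so sharply.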
If X and the unit effects are dependent, then there are serious threats to proper inference.
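The dependence can be checked by hand. This is a minimal base R sketch that assumes the same design as the simulated data, with an arbitrary seed and illustrative object names (x_w, y_w, cor_means): it compares the pooled slope with the slope after subtracting unit means, and reports the correlation between the unit means of X and the unit effects.

```r
set.seed(42)  # arbitrary seed (an assumption)
unit <- rep(1:3, each = 41)
x <- c(seq(-2.5, -0.5, by = 0.05), seq(-2, 0, by = 0.05), seq(-1.5, 0.5, by = 0.05))
alpha <- c(6, 0, -6)
y <- alpha[unit] + 1 * x + rnorm(length(x), 0, 1)

# Pooled slope: uses both between-unit and within-unit variation
pooled_slope <- unname(coef(lm(y ~ x))["x"])

# Within slope: subtract unit means, which removes alpha_i
x_w <- x - ave(x, unit)
y_w <- y - ave(y, unit)
within_slope <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

# Correlation between unit means of x and the unit effects
cor_means <- cor(as.numeric(tapply(x, unit, mean)), alpha)

print(c(pooled = pooled_slope, within = within_slope, cor_means = cor_means))
```

In this design the unit means of X rise as the unit effects fall, so the between variation points in the opposite direction from the within-unit slope.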
plm things
Beck and Katz (1995) standard errors are provided with vcovBK(). The key argument is cluster, which averages over groups or time. The Beck and Katz paper would involve cluster="time".
Almost all panel unit root testing goes on with purtest. The test= argument is key for IPS, Levin et al., Maddala-Wu, Hadri, and various tests proposed by Choi (2001). A few others are specified individually below.
The test of serial correlation for panel models is given by pbgtest(model).
The Baltagi and Li test of serial correlation in panel models with random effects is given by pbltest(model). The various alternatives are specified in alternative.
The Baltagi-Wu statistic for AR(1) disturbances is given by pbnftest(model, test="lbi"), while a BNF (1982) statistic is the default for this test for fixed effects models.
# replicate Baltagi (2013), p. 101, table 5.1:
library(plm)
data("Grunfeld", package = "plm")
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
pbnftest(re, test = "lbi")
pbsytest(model) gives the joint test of Baltagi and Li and a variant owing to Bera et al. (2001) and Sosa-Escudero and Bera (2008); the latter is a paper in the Stata Journal with companion software to be installed.
pcdtest(formula, data) gives the Pesaran test for cross-sectional dependence.
pdwtest(model) gives a panel Durbin-Watson statistic.
pFtest gives the F-test of fixed effects.
pggls gives GLS estimators for panel data specifying the effect and a model of within, pooling, or fd.
phansitest(purtest object) combines unit root tests in the method proposed by Hanck (2013).
phtest(model1, model2) is the Hausman test for panel data models. This one has robust options detailed in the last section of ?phtest.
piest(formula, data) performs Chamberlain’s tests on the within regression.
Another test of unit/time effects is given in plmtest().
Chow tests of poolability are given by pooltest() applied to a pooled or within regression.
pvar checks for variation along the individual and time dimensions.
pvcm will estimate variable coefficients models à la Swamy (1970).
Joint tests of coefficients are constructed using pwaldtest.
Wooldridge’s test for serial correlation in within models is pwartest(model).
Wooldridge’s test for AR(1) errors in level or differenced panel models is given by pwfdtest(model). The underlying idea is clever: if the errors in levels are serially uncorrelated, then the errors in first differences will be serially correlated at -0.5. The test can be implemented against either within/fe or first-difference alternatives.
pwtest(pooling model) gives a semi-parametric test for the presence of (individual or time) unobserved effects in panel models that owes to Wooldridge.
ranef and fixef extract the random and fixed effects, respectively.
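A few of the functions above can be strung together as follows. This is only a sketch: it assumes the plm package is installed, uses the Grunfeld data that ships with plm, and borrows the inv ~ value + capital specification from the Baltagi replication above. The object names are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

# Pooled, within, and random effects fits of the same specification
pool <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
fe   <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
re   <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

f_test  <- pFtest(fe, pool)                                   # F-test of fixed effects
lm_test <- plmtest(pool, effect = "individual", type = "bp")  # unit effects test
hausman <- phtest(fe, re)                                     # Hausman test
bg      <- pbgtest(fe)                                        # serial correlation
war     <- pwartest(fe)                                       # Wooldridge, within models

bk_vcov      <- vcovBK(fe, cluster = "time")  # Beck and Katz covariance matrix
unit_effects <- fixef(fe)                     # estimated fixed effects

print(hausman)
print(sqrt(diag(bk_vcov)))
```

Each test returns a standard htest object, so the statistic and p-value can be pulled out with $statistic and $p.value for tabulation.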