Day 1: Time, Decomposition, and Summaries

panel data
time series
code
analysis
Author

Robert W. Walker

Published

August 8, 2022

Slides

The Data

The data are in Stata format. The github for this course. There is also a link here for a local copy.

Basic R commands

Loading the data in R and a summary using skimr::skim.

library(haven); library(kableExtra)
HR.Data <- read_dta(url("https://github.com/robertwwalker/DADMStuff/raw/master/ISQ99-Essex.dta"))
library(skimr)
skim(HR.Data) %>% kable() %>% scroll_box(width="80%", height="50%")
skim_type skim_variable n_missing complete_rate numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
numeric IDORIGIN 0 1.0000000 446.7178771 243.1931782 2.00 290.000 435.000 640.00 990.00 ▆▇▇▆▂
numeric YEAR 0 1.0000000 1984.5000000 5.1889328 1976.00 1980.000 1984.500 1989.00 1993.00 ▇▆▇▆▇
numeric AI 1061 0.6707014 2.7533549 1.0752989 1.00 2.000 3.000 3.00 5.00 ▃▇▇▃▂
numeric SD 587 0.8178150 2.2406072 1.1303528 1.00 1.000 2.000 3.00 5.00 ▇▇▆▂▁
numeric POLRT 382 0.8814401 3.8095070 2.2230297 1.00 2.000 3.000 6.00 7.00 ▇▂▂▁▇
numeric MIL2 382 0.8814401 0.2725352 0.4453421 0.00 0.000 0.000 1.00 1.00 ▇▁▁▁▃
numeric LEFT 393 0.8780261 0.1763874 0.3812168 0.00 0.000 0.000 0.00 1.00 ▇▁▁▁▂
numeric BRIT 290 0.9099938 0.3553888 0.4787126 0.00 0.000 0.000 1.00 1.00 ▇▁▁▁▅
numeric PCGNP 443 0.8625078 3591.6509536 5698.3554010 52.00 390.000 1112.000 3510.00 36670.00 ▇▁▁▁▁
numeric AINEW 468 0.8547486 2.4433551 1.1558005 1.00 1.000 2.000 3.00 5.00 ▇▇▇▃▂
numeric SDNEW 468 0.8547486 2.2618010 1.1365604 1.00 1.000 2.000 3.00 5.00 ▇▇▆▂▁
numeric IDGURR 0 1.0000000 455.7709497 246.5201369 2.00 290.000 450.000 663.00 990.00 ▆▇▇▇▃
numeric AILAG 644 0.8001241 2.4499612 1.1479673 1.00 1.000 2.000 3.00 5.00 ▇▇▇▃▂
numeric SDLAG 644 0.8001241 2.2470908 1.1156632 1.00 1.000 2.000 3.00 5.00 ▇▇▆▂▁
numeric PERCHPCG 618 0.8081937 4.6138441 13.2208934 -95.50 -2.545 4.615 11.76 128.57 ▁▂▇▁▁
numeric PERCHPOP 293 0.9090627 2.1928815 4.0424128 -48.45 0.910 2.220 2.94 126.01 ▁▇▁▁▁
numeric LPOP 115 0.9643079 15.4819279 1.8633316 11.00 14.510 15.590 16.64 20.89 ▂▃▇▃▁
numeric PCGTHOU 443 0.8625078 3.5916985 5.6983334 0.05 0.390 1.110 3.51 36.67 ▇▁▁▁▁
numeric DEMOC3 793 0.7538796 3.6817620 4.3577178 0.00 0.000 0.000 9.00 10.00 ▇▁▁▂▃
numeric CWARCOW 407 0.8736809 0.0920071 0.2890873 0.00 0.000 0.000 0.00 1.00 ▇▁▁▁▁
numeric IWARCOW2 380 0.8820608 0.0862069 0.2807187 0.00 0.000 0.000 0.00 1.00 ▇▁▁▁▁

An xtsum function for R

A little function that I wrote up on github.

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::group_rows() masks kableExtra::group_rows()
✖ dplyr::lag()        masks stats::lag()
library(plm)

Attaching package: 'plm'
The following objects are masked from 'package:dplyr':

    between, lag, lead
source(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/xtsum.R"))
# Be careful with the ID variable, the safest is to make it factor; this can go wildly wrong
xtsum(IDORIGIN~., data=HR.Data) %>% kable() %>% scroll_box(width="80%", height="50%")
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(unit)` instead of `unit` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
O.mean O.sd O.min O.max O.SumSQ O.N B.mean B.sd B.min B.max B.Units B.t.bar W.sd W.min W.max W.SumSQ Within.Ovr.Ratio
YEAR 1984.5 5.189 1976 1993 86725.5 3222 1984.5 0 1984.5 1984.5 179 18 5.189 -8.5 8.5 86725.5 1
AI 2.753 1.075 1 5 2497.538 2161 2.498 0.989 1 5 173 12.491 0.631 -2.375 2.5625 860.822 0.345
SD 2.241 1.13 1 5 3365.455 2635 2.241 1.004 1 5 178 14.803 0.624 -2.666667 3.0625 1025.695 0.305
POLRT 3.81 2.223 1 7 14029.94 2840 3.78 1.99 1 7 179 15.866 0.925 -4 4.777778 2428.552 0.173
MIL2 0.273 0.445 0 1 563.058 2840 0.24 0.377 0 1 179 15.866 0.216 -0.9444444 0.8888889 132.778 0.236
LEFT 0.176 0.381 0 1 410.983 2829 0.157 0.334 0 1 179 15.804 0.157 -0.8888889 0.8888889 69.611 0.169
BRIT 0.355 0.479 0 1 671.685 2932 0.335 0.473 0 1 179 16.38 0 0 0 0 0
PCGNP 3591.651 5698.355 52 36670 90205144379 2779 3449.178 5049.297 112.2222 22653.89 173 16.064 2278.412 -12303.33 16961.67 14421042273 0.16
AINEW 2.443 1.156 1 5 3677.663 2754 2.379 1.012 1 5 178 15.472 0.622 -2.388889 2.944444 1064.102 0.289
SDNEW 2.262 1.137 1 5 3556.241 2754 2.253 1.006 1 5 178 15.472 0.631 -2.588235 3 1096.442 0.308
IDGURR 455.771 246.52 2 990 195747185 3222 455.771 247.173 2 990 179 18 0 0 0 0 0
AILAG 2.45 1.148 1 5 3396.045 2578 2.402 1.039 1 5 177 14.565 0.609 -2.411765 3 955.37 0.281
SDLAG 2.247 1.116 1 5 3207.603 2578 2.236 0.991 1 5 177 14.565 0.608 -2.5 3.058824 952.174 0.297
PERCHPCG 4.614 13.221 -95.5 128.57 454983.6 2604 3.325 6.893 -36.21333 15.03765 168 15.5 12.393 -92.50235 114.8882 399763 0.879
PERCHPOP 2.193 4.042 -48.45 126.01 47846.75 2929 2.842 9.443 -2.126471 126.01 176 16.642 3.018 -48.12235 80.69765 26663.59 0.557
LPOP 15.482 1.863 11 20.89 10784.05 3107 15.488 1.844 11.09056 20.76889 177 17.554 0.129 -0.7288889 0.7311111 51.883 0.005
PCGTHOU 3.592 5.698 0.05 36.67 90204.45 2779 3.449 5.049 0.1122222 22.65389 173 16.064 2.278 -12.30333 16.96167 14420.95 0.16
DEMOC3 3.682 4.358 0 10 46107 2429 3.774 3.96 0 10 155 15.671 1.726 -7.277778 7.941176 7229.815 0.157
CWARCOW 0.092 0.289 0 1 235.17 2815 0.095 0.245 0 1 179 15.726 0.175 -0.8888889 0.9444444 85.693 0.364
IWARCOW2 0.086 0.281 0 1 223.879 2842 0.092 0.227 0 1 179 15.877 0.19 -0.8888889 0.9444444 102.992 0.46

The Core Idea

In R, this is an essential group_by calculation in the tidyverse. The between data are a summary table with units constituting the rows. The within data is the overall data with group means subtracted.

HR.Data %>% 
  group_by(IDORIGIN) %>% 
  mutate(DEMOC.Centered = 
           DEMOC3 - mean(DEMOC3, na.rm=TRUE)) %>%
  filter(IDORIGIN==42) %>% 
  select(IDORIGIN, YEAR, DEMOC3, DEMOC.Centered) 
# A tibble: 18 × 4
# Groups:   IDORIGIN [1]
   IDORIGIN  YEAR DEMOC3 DEMOC.Centered
      <dbl> <dbl>  <dbl>          <dbl>
 1       42  1976      1         -5.11 
 2       42  1977      1         -5.11 
 3       42  1978      6         -0.111
 4       42  1979      6         -0.111
 5       42  1980      6         -0.111
 6       42  1981      6         -0.111
 7       42  1982      7          0.889
 8       42  1983      7          0.889
 9       42  1984      7          0.889
10       42  1985      7          0.889
11       42  1986      7          0.889
12       42  1987      7          0.889
13       42  1988      7          0.889
14       42  1989      7          0.889
15       42  1990      7          0.889
16       42  1991      7          0.889
17       42  1992      7          0.889
18       42  1993      7          0.889

In Stata, the key is to first load and declare the data.

use "https://github.com/robertwwalker/Essex-Data/raw/main/ISQ99-Essex.dta"
des

Stata Load

xtset denotes two key features of the data, the \(i\) and \(t\).

xtset IDORIGIN YEAR

xtset

Stata has internal capabilities for summarising and describing xt data. A between and within summary is given by xtsum

xtsum

xtsum

The description can be deceptive because the indices are a complete grid.

xtdes

xtdes

Leads and Lags

The core operators in Stata lead with a letter before . With panel data, we will need to be cautious of the grouping.

  • F is forward,
  • L is Lag,
  • D is difference.

Operators

Qualitative Variables

xttab

  • Between: How many units received each category?
  • Within: Of all observations of units that received that category at least once, what percent of observations take this value?
xttab AINEW

xttab

The Between

between.tally <- function(x) {
HR.Data %>% select(IDORIGIN, AINEW) %>% filter(AINEW==x) %$% table(IDORIGIN) %>% length()
}
sapply(c(1:5), function(x) {between.tally(x)})
[1]  96 121 113  86  43

xttrans

A first-order transition matrix.

xttrans AINEW

xttrans I want to rely on the dplyr version of lag so I am explicit here. Take the data, group it by id, calculate the lag, ungroup them, and create a table. I prefer to keep this explicit with order_by. The janitor library provides tabyl and it is explicit among missing values.

library(janitor)
HR.Data %>% 
  group_by(IDORIGIN) %>% 
  mutate(Lag.AI = dplyr::lag(AINEW, n=1L, order_by = YEAR)) %>%
  ungroup() %>%
  tabyl(Lag.AI, AINEW)
 Lag.AI   1   2   3   4  5 NA_
      1 545  83  13   1  0   5
      2  86 480 130  20  3   2
      3   5 150 496  76 14   7
      4   2  14  97 167 47   0
      5   0   1   5  53 76   0
     NA  67  41  41  33  8 454