Linear Regression With Linked Data

August 22nd, 2019

Part 2 of a multi-part workshop series on record linkage

The PDHP workshop series resumes August 22nd with Part 2 of our ongoing Record Linkage series: Linear Regression With Linked Data.  This half-day workshop, conducted by Emanuel Ben-David (of the US Census Bureau’s Center for Statistical Research and Methodology) and Martin Slawski (of George Mason University), is geared toward population researchers, computational social scientists, statisticians, and data scientists of all experience levels.

Topics include:

  • Overview of record linkage and entity resolution
  • Impact of linkage error on regression analyses of linked data files
  • Linkage error adjustment and correction methods (including regression techniques and optimal matching)
  • Hands-on training and practice with these techniques in R
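
To give a feel for the second topic, here is a minimal simulation sketch of how mismatched records can bias a regression fit. It is not taken from the workshop materials: the sample size, the 30% mismatch rate, the true slope of 2, and all variable names are assumptions chosen for illustration, and it uses base R only.

```r
# Illustrative sketch only; all names and values here are assumptions.
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)            # data generated with a true slope of 2

# Mimic linkage error: permute the response among a random 30% of records,
# so those records carry a y value that belongs to a different x.
mismatch_rate <- 0.3
idx <- sample(n, size = round(mismatch_rate * n))
y_linked <- y
y_linked[idx] <- y[sample(idx)]

# Compare the slope from correctly linked data with the slope after mismatches.
slope_true   <- unname(coef(lm(y ~ x))["x"])
slope_linked <- unname(coef(lm(y_linked ~ x))["x"])
print(c(correct = slope_true, mismatched = slope_linked))
```

Under these assumptions the slope from the mismatched file is pulled toward zero, roughly in proportion to the share of correctly linked records. This kind of bias is what linkage error adjustment methods aim to correct.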

Software:

Demos for this workshop are conducted in R and require two R packages, fastLink and MASS, which should be installed beforehand.

Install R (required)

fastLink package for R (required)

MASS package for R (required)

R packages and example data can be installed using the following code:

## install packages
install.packages(c("fastLink","MASS"))
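
After installing, it may help to confirm that both packages can be found before the workshop. The check below is a small sketch using base R's `requireNamespace`. The variable names `pkgs` and `not_installed` are illustrative and not part of the workshop code.

```r
# Check that the required packages are available (variable names are illustrative).
pkgs <- c("fastLink", "MASS")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!available]

if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If any package is reported as missing, rerunning the `install.packages` call above for that package is the usual first step.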