An Introduction to Entity Resolution

(PDHP Record Linkage Workshop Series Part 1 – July 10th, 2019)

The PDHP workshop series resumes July 10th with the first in a multi-part series of workshops on record linkage topics and techniques in social research. Please join Rebecca C. Steorts, PhD, Assistant Professor in Duke University’s Department of Statistical Science, as she presents An Introduction to Entity Resolution, a half-day workshop geared toward population researchers, computational social scientists, statisticians, and data scientists of all experience levels. This hands-on workshop covers both the theory and practice of probabilistic entity resolution and demonstrates state-of-the-art techniques using R and Apache Spark.

Topics include:

  • Overview and introduction to entity resolution
  • Entity resolution fundamentals (record linkage, de-duplication, blocking, and computational gains)
  • Entity resolution evaluation metrics (including precision, reduction ratio, and robustness to tuning parameters)
  • Bayesian entity resolution models (including both parametric and nonparametric Bayesian mixture models)
  • Hands-on demonstration of state-of-the-art R packages (using blink) and computational gains (using Apache Spark)
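For readers new to the vocabulary in the topics above, here is a minimal base-R sketch of blocking, reduction ratio, and precision. It is not taken from the workshop materials: the toy records, the first-letter blocking rule, and the equal-birth-year matching rule are all assumptions made up for illustration.

```r
# Hypothetical toy records. `entity` is the true identity, which is normally
# unknown and is used here only to evaluate the result.
records <- data.frame(
  id     = 1:6,
  last   = c("Smith", "Smyth", "Jones", "Jones", "Brown", "Braun"),
  byear  = c(1980, 1980, 1975, 1975, 1990, 1990),
  entity = c(1, 1, 2, 2, 3, 4),
  stringsAsFactors = FALSE
)

# All unordered record pairs: choose(6, 2) = 15 comparisons without blocking.
pairs <- t(combn(records$id, 2))
i <- pairs[, 1]
j <- pairs[, 2]

# Blocking (assumed rule): only compare records whose last names share a first letter.
block <- substr(records$last, 1, 1)
candidate <- block[i] == block[j]

# Reduction ratio: the share of pairwise comparisons that blocking avoids.
reduction_ratio <- 1 - sum(candidate) / nrow(pairs)

# Deliberately naive decision rule: candidate pairs with equal birth year are matches.
predicted <- candidate & records$byear[i] == records$byear[j]
truth <- records$entity[i] == records$entity[j]

# Precision: the share of declared matches that are true matches.
precision <- sum(predicted & truth) / sum(predicted)

print(c(reduction_ratio = reduction_ratio, precision = precision))
```

On this toy data, blocking keeps 3 of the 15 pairs (reduction ratio 0.8), and the naive rule declares 3 matches of which 2 are correct (precision 2/3), because Brown and Braun share a block and a birth year but are different entities.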

Software:

Demos for this workshop are conducted in R and require a handful of specific R packages plus a data package hosted on GitHub.

Install R (required)

R packages and example data can be installed using the following code:

## install packages
install.packages(c("devtools", "RecordLinkage", "blink", "knitr", "ggplot2",
                   "igraph", "textreuse", "tokenizers", "numbers"))

## install data package
devtools::install_git("https://github.com/resteorts/RLdata")
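After running the install commands, it can help to confirm that everything is available before the workshop starts. The following is a small sketch, not part of the official setup instructions. The helper name `missing_packages` is hypothetical, and the assumption that the data package installs under the name "RLdata" is inferred from the repository name rather than stated on this page.

```r
# Packages named in the install command above, plus the data package.
# "RLdata" as the installed package name is an assumption.
required <- c("devtools", "RecordLinkage", "blink", "knitr", "ggplot2",
              "igraph", "textreuse", "tokenizers", "numbers", "RLdata")

# Hypothetical helper: return the subset of `pkgs` not found in the current
# library paths. system.file() returns "" for a package that is not installed,
# and unlike library() it does not attach anything.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

not_installed <- missing_packages(required)
if (length(not_installed) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any package reported as missing can be reinstalled with the commands above before the session.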

Workshop Slides & Materials:

The 4-hour PDHP workshop is a shortened version of a fuller short course that is available online. Sections presented live at PDHP by Dr. Steorts are marked “(Michigan)”.
