An Introduction to Entity Resolution

Part 1 of a multi-part workshop series on record linkage

July 10th, 2019, 9am–1pm, 1430 ISR Thompson

Workshop video & materials

The PDHP workshop series resumes July 10th with the first in a multi-part series of workshops on record linkage topics & techniques within social research. Please join Assistant Professor Rebecca C. Steorts, PhD, of Duke University’s Department of Statistical Science, as she presents An Introduction to Entity Resolution, a half-day workshop geared toward population researchers, computational social scientists, statisticians, and data scientists of all experience levels. This hands-on workshop will cover both the theory and practice of probabilistic entity resolution, while demonstrating state-of-the-art techniques using R software and Apache Spark.

Topics include:

  • Overview and introduction to entity resolution
  • Entity resolution fundamentals (record linkage, de-duplication, blocking, and computational gains)
  • Entity resolution evaluation metrics (including precision, reduction ratio, and robustness to tuning parameters)
  • Bayesian entity resolution models (including both parametric and nonparametric Bayesian mixture models)
  • Hands-on demonstration of state-of-the-art R packages (using blink) and computational gains (using Apache Spark)
