Multiple Imputation in Practice
July 13, 2022
Please join us as the PDHP workshop series resumes on July 13 with “Multiple Imputation in Practice”, presented by Trivellore Raghunathan (Michigan Survey Research Center; Michigan School of Public Health; Michigan/Maryland Joint Program in Survey Methodology). This half-day workshop is geared toward data analysts from all fields and of all skill levels, and will cover practical applications of multiple imputation (including various data structures and patterns of missing data, as well as analysis of imputed data arising from complex survey designs). Attendees will also receive hands-on practice with multiple imputation via sequential regression multivariate imputation (commonly known as chained equations) using the IVEware software (in stand-alone form, and also as a plug-in for SAS, R, and Stata).
- Definition of missing data, including patterns and mechanisms
- Multiple imputation using sequential regression multivariate imputation (AKA chained equations)
- Multiple imputation for complex survey designs and data structures
- Hands-on practice using IVEware software (stand-alone and within SAS, R, and Stata)
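For readers new to the topic, the analysis step that follows multiple imputation can be sketched in a few lines. The example below is a minimal illustration of the standard combining rules (Rubin's rules) for a single estimate. It is not taken from the workshop materials and does not use IVEware; the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, with one variance each")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 5 imputed datasets
estimates = [2.0, 2.2, 1.8, 2.1, 1.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.3f}")
```

The total variance adds the between-imputation spread to the average within-imputation variance, which is how the uncertainty due to missing data enters the final standard error.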
Tools For Reproducible Research
March 28, 2022
Please join us as we conduct a new PDHP workshop titled “Tools For Reproducible Research”, presented by Alexandru Cernat (associate professor of social statistics, University of Manchester). This half-day workshop will cover the main concepts of reproducible research as well as best practices in the field (including meta-analyses, pre-registration, and sensitivity analysis), while mixing both lecture and practical application. Attendees will also get hands-on practice with state-of-the-art tools of reproducible research, such as research project management using R/RStudio and version control using GitHub.
- Challenges to social research such as publication bias and specification bias
- Solutions to the reproducibility crisis: meta-analyses, pre-registration, and sensitivity analysis
- Tools for better research workflows: project management (via Rprojects and the renv package), version control via GitHub, and dynamic documents (via git, usethis, and R Markdown)
Sequence Analysis for Social Science
February 9, 2022
Please join us as we kick off a new season of PDHP workshops, with a workshop entitled “Sequence Analysis for Social Science”, presented by Anette E. Fasang (Humboldt Universität zu Berlin) and Emanuela Struffolino (University of Milan). Sequence analysis, originally developed in biology to analyze strings of DNA, has attracted increasing attention in the social sciences as a key tool for using longitudinal data to analyze life course processes, including labor market careers, transitions to adulthood, and family formation. This workshop covers the theoretical foundation of sequence analysis, basic descriptive tools, and the general workflow of sequence analysis. Hands-on examples using R will demonstrate the basic analytical steps using illustrative data on family and labor market trajectories.
- The theoretical foundation of sequence analysis in the social sciences
- Making informed choices when compiling sequences from a longitudinal dataset
- Description and visualization of sequences using R
- Using output from sequence analysis (e.g. distance matrices, measures of complexity) in further basic analysis (such as cluster analysis or regression)
Introduction to Multilevel Models
August 19, 2021
PDHP resumes our 2021 workshop series on Thursday, August 19th, with a workshop entitled Introduction to Multilevel Models, presented by Dr. Kris Preacher of Vanderbilt University’s Quantitative Methods program (within the Department of Psychology and Human Development). This half-day workshop is geared toward data analysts and researchers of all levels, particularly those performing analysis on hierarchically clustered (nested) data using Mplus, R, or SPSS. Attendees will receive an introduction to the key concepts of multilevel models (appropriate settings for their use over standard statistical models, equation conventions, and interpretation), as well as hands-on practice implementing state-of-the-art features of MLM using popular statistical software packages.
- Key concepts and motivation for MLM vs. standard statistical models
- Estimating and plotting interaction effects
- Implications of nested vs. cross-classified multilevel data
- Power analysis for MLM using a general Monte Carlo technique
Sociogenomics & Polygenic Scores
March 16, 2021
PDHP begins our 2021 workshop series on March 16th, with a workshop entitled Sociogenomics & Polygenic Scores, co-presented by Ben Domingue of Stanford University’s Graduate School of Education and Erin Ware of the University of Michigan Population Neurodevelopment & Genetics Group. This half-day workshop is geared toward data analysts interested in combining social science and genetic analysis, and will provide information on the recent history of sociogenomics and a novel approach for examining gene-by-environment interactions, as well as hands-on practice with state-of-the-art techniques in the field (including creating polygenic scores from simulated plink data using a high-performance computing environment).
- Recent history of sociogenomics
- A novel approach for examining gene-by-environment interactions
- Hands-on introduction to high-performance computing and genetic data types
- Computation of polygenic scores using PRSice2 software
November 18, 2020
PDHP resumes our 2020 workshop series on Nov. 18th, with a workshop entitled Principles of Text Analysis, presented by Patrick van Kessel, senior data scientist at Pew Research Center. This half-day workshop is geared toward data analysts with unstructured text data (e.g. open-ended survey responses or web-curated text), and will provide a tutorial on cleaning, processing, and analyzing data from text-based sources using state-of-the-art text analytics techniques primarily using Python, with some examples also provided in R (experience with either of these languages is recommended but not required).
- Preprocessing and cleaning messy text data
- Feature extraction using TF-IDF vectorization
- Text analytics techniques including topic modelling and unsupervised clustering methods
- Software demonstration featuring the scikit-learn library for Python
February 21, 2020
PDHP kicks off our 2020 workshop series on Feb. 21st, with a workshop entitled Evidence-Based Data Visualization, presented by Dr. Audrey Michal of the Michigan Department of Psychology. This half-day workshop will provide a general introduction to data visualization techniques, while introducing a unique evidence-based approach to data viz design (based on Dr. Michal’s research on visual routines in graph comprehension and interpretation), and different data visualization strategies for data exploration versus data explanation. Attendees will also get hands-on practice creating different types of data visualizations with R software, using ggplot2 and other state-of-the-art R packages. As always, this workshop is free and open to the public.
- Introduction to data visualization and principles of data viz design
- Evidence-based practices for data viz (from Dr. Michal’s research on graph interpretation)
- Data viz strategies for data exploration vs. explanation
- Hands-on practice creating different types of data visualizations using R’s ggplot2 package
November 12, 2019
Please join us for the conclusion of the 2019 PDHP workshop series, as Richard Valliant (University of Michigan & University of Maryland Joint Program in Survey Methodology) presents “A Practical Guide To Survey Weighting”. This workshop will present a comprehensive guide to the design and creation of survey weights, including sampling weights, nonresponse adjustment, and calibration, as well as approaches for weighting non-probability samples.
Additional topics include:
- Stochastic missingness & nonresponse adjustment
- Calibration techniques including poststratification, raking, and GREG
- Demonstration and hands-on practice using R and Stata
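As a rough illustration of one calibration technique named above, raking can be sketched as iterative proportional fitting on a two-way table: cell weights are scaled so the row totals match one set of population margins, then so the column totals match the other, and the two steps repeat until both agree. This is a minimal sketch under simple assumptions (all cells positive, margins with equal grand totals), not the workshop's own code; the function name `rake` and all figures are hypothetical.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Rake a two-way table of weighted counts to known row and column totals."""
    cells = [row[:] for row in table]  # work on a copy of the input table
    for _ in range(max_iter):
        # Step 1: scale each row so its total matches the row margin
        for i, target in enumerate(row_targets):
            factor = target / sum(cells[i])
            cells[i] = [value * factor for value in cells[i]]
        # Step 2: scale each column so its total matches the column margin
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in cells)
            for row in cells:
                row[j] *= factor
        # Columns are exact after step 2; stop once the rows also agree
        if all(abs(sum(row) - t) < tol for row, t in zip(cells, row_targets)):
            break
    return cells


# Hypothetical sample counts (rows: two age groups, columns: two sexes)
sample = [[30.0, 20.0], [25.0, 25.0]]
raked = rake(sample, row_targets=[60.0, 40.0], col_targets=[45.0, 55.0])
for row in raked:
    print([round(value, 2) for value in row])
```

Dividing each raked cell by its original sample count gives the adjustment factor that would be applied to the weights of respondents in that cell.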
October 25, 2019
Please join instructor Adam Eck (assistant professor of computer science, Oberlin College), as he conducts a half-day workshop titled “Machine Learning in Survey Research”. This workshop is designed for population/survey researchers and analysts of all skill levels, and will present an introduction to machine learning concepts and their applications to survey research (such as sample frame creation, respondent modelling, and open-ended response coding).
- Introduction to machine learning and its applications to survey research
- Decision trees and random forests
- Deep learning and other neural network-based techniques
- ML techniques to model respondent behaviors, assist with coding of open-ended responses, and more
- Demonstration using R and Python
September 24, 2019
Please join instructor Brady T. West of the University of Michigan’s Program in Survey Methodology, as he conducts a half-day workshop titled “Design-Based Analysis of Survey Data”. This workshop is designed for survey data analysts of all skill levels, and will present theoretically appropriate methods of analyzing survey data collected from complex sample designs. Dr. West will also present the implications of incorrect analyses based on his research findings from a meta-analysis of analytic error, while also providing examples of proper design-based data analysis techniques using SAS and Stata. As always, this workshop is free and open to the public.
- Overview of theoretically appropriate design-based analysis of survey data collected from complex samples
- Case studies in analytic error (including findings from a meta-analysis of recent scientific publications), and the implications of using incorrect analysis methods
- Appropriate use of survey weights and design-based methods of variance estimation for population inference related to descriptive parameters and regression models
- Examples of proper design-based data analysis techniques using SAS and Stata (attendees are also welcome to ask about similar methods in other software packages)
Part 2 of a multi-part workshop series on record linkage
August 22, 2019
The PDHP workshop series resumes August 22nd with Part 2 of our ongoing Record Linkage series: Linear Regression With Linked Data. This half-day workshop, conducted by Emanuel Ben-David (of the US Census Bureau’s Center for Statistical Research and Methodology) and Martin Slawski (of George Mason University), is geared toward population researchers, computational social scientists, statisticians, and data scientists of all experience levels.
- Overview of record linkage and entity resolution
- Impact of linkage error on regression analyses of linked data files
- Linkage error adjustment and correction methods (including regression techniques and optimal matching)
- Hands-on training and practice of these techniques using R software
Part 1 of a multi-part workshop series on record linkage
July 10, 2019
The PDHP workshop series resumes July 10th with the first in a multi-part series of workshops on record linkage topics & techniques within social research. Please join Assistant Professor Rebecca C. Steorts, PhD, of Duke University’s Department of Statistical Science, as she presents An Introduction to Entity Resolution, a half-day workshop geared toward population researchers, computational social scientists, statisticians, and data scientists of all experience levels. This hands-on workshop will cover both the theory and practice of probabilistic entity resolution, while demonstrating state-of-the-art techniques using R software and Apache Spark.
- Overview and introduction to entity resolution
- Entity resolution fundamentals (record linkage, de-duplication, blocking, and computational gains)
- Entity resolution evaluation metrics (including precision, reduction ratio, and robustness to tuning parameters)
- Bayesian entity resolution models (including both parametric and nonparametric Bayesian mixture models)
- Hands-on demonstration of state-of-the-art R packages (using blink) and computational gains (using Apache Spark)
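To make the evaluation metrics listed above concrete, the sketch below computes pairwise precision and the reduction ratio for a toy example. It is an illustration only, not drawn from the workshop or from the blink package; pairwise recall is included as a natural companion to precision, and the function names and record IDs are hypothetical.

```python
from itertools import combinations


def linked_pairs(clusters):
    """Return every unordered pair of record IDs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def precision_recall(predicted, truth):
    """Pairwise precision and recall of predicted clusters against the truth."""
    pred, true = linked_pairs(predicted), linked_pairs(truth)
    correct = len(pred & true)
    precision = correct / len(pred) if pred else 1.0
    recall = correct / len(true) if true else 1.0
    return precision, recall


def reduction_ratio(n_records, n_candidate_pairs):
    """Share of all possible record pairs that blocking removes from comparison."""
    all_pairs = n_records * (n_records - 1) // 2
    return 1 - n_candidate_pairs / all_pairs


# Six hypothetical records: true entities vs. the clusters a method produced
truth = [{"a", "b", "c"}, {"d", "e"}, {"f"}]
predicted = [{"a", "b"}, {"c", "d", "e"}, {"f"}]
precision, recall = precision_recall(predicted, truth)
rr = reduction_ratio(n_records=6, n_candidate_pairs=5)
print(precision, recall, round(rr, 3))
```

A higher reduction ratio means blocking leaves fewer pairs to compare, which is where the computational gains come from, although aggressive blocking can also discard true matches.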
June 4, 2019
The PDHP workshop series resumes with our first workshop of the summer: Network Analysis: Overview and Applications To Population Science. Please join instructors Ceren Budak and Daniel Romero (both from U of M School of Information and formerly Microsoft Research) for a half-day workshop geared toward population researchers and data scientists of all experience levels. The workshop features 2 hours of lecture (covering fundamental principles and theory of network analysis) followed by 2 hours of lab (simulation-based information diffusion within networks and optimal seed node selection), while exploring the connections between network analysis and social research.
- Basic concepts of networks and network data
- Measuring network properties such as centrality and node/edge importance
- Various models of information diffusion and cascade effects
- Network-based classification methods (including Random Walk and K-nearest neighbors)
- Network simulation using Python
- Impact of seed node selection on network properties
QMP/SMP Methodological Seminar
This seminar is intended to facilitate collaboration among behavioral, social, health, and data scientists interested in the development and application of adaptive approaches to intervention (prevention, treatment or policy), measurement, and data collection. The term ‘adaptation’, which is broadly defined as ‘the ability to change to suit different conditions’, has different meanings across different domains of behavioral, social and health practice and research. The goal of this seminar is to bridge this gap by exploring and discussing how concepts, tools and procedures used to inform or operationalize adaptation in one domain (e.g., intervention science) can be used to inform or operationalize adaptation in other domains (e.g., survey methodology), and how the various approaches can be used synergistically to advance precision medicine initiatives. For example, how can we improve health by combining ideas from the design of adaptive interventions, which use ongoing information about an individual or context to decide how to modify treatments over time, with ideas from responsive survey design, which uses ongoing information about an individual or context to decide how best to engage an individual in a research survey, and adaptive measurement, which focuses on efficient, low-burden approaches to measuring change in health constructs?
October 23, 2018
Instructors Brady T. West and Paul Schulz are kicking off the new PDHP workshop series with an overview of the Total Survey Error framework and its implications for survey research. This half-day workshop is geared toward survey researchers of all types and experience levels, and will cover the design, implementation, and monitoring of survey data collections using the TSE paradigm as a guiding set of principles. The workshop will use a mix of conceptual discussions and team exercises to explore both the underlying theory and real-world applications of the TSE paradigm in survey research.
- Sources of survey error
- Quantifying and evaluating TSE in a data collection
- Implications of TSE for study design
- TSE reduction strategies
- Linking TSE and Responsive / Adaptive Survey Design