Machine Learning in Survey Research

Adam Eck

October 25th, 2019

Please join instructor Adam Eck (assistant professor of computer science, Oberlin College) for a half-day workshop titled “Machine Learning in Survey Research”. The workshop is designed for population and survey researchers and analysts of all skill levels. It will introduce machine learning concepts and their applications to survey research, such as sample frame creation, respondent modelling, and open-ended response coding.

Topics Include:
  • Introduction to machine learning and its applications to survey research
  • Decision trees and random forests
  • Deep learning and other neural network-based techniques
  • ML techniques to model respondent behaviors, assist with coding of open-ended responses, and more
  • Demonstration using R and Python

Slides & Lab Materials:

Software:

The lab portion of this workshop will be mirrored in both R (using R Markdown) and Python (using Jupyter Notebook).

R Users:

R software (required)

The lab uses five R packages (caret, rpart, rpart.plot, randomForest, and mxnet), which can be installed using the code below.

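## install the five packages used in the lab
## (mxnet may not be on CRAN; if it is reported as unavailable, see the Apache MXNet R install guide)
install.packages(c("caret", "rpart", "rpart.plot", "randomForest", "mxnet"))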

Python Users:

Python3 (required)

The lab uses six Python libraries (pandas, scipy, scikit-learn, IPython, graphviz, and pydotplus), which can be installed with pip using the commands below:

python -m pip install --upgrade pip
python -m pip install -U pandas
python -m pip install -U scipy
python -m pip install -U scikit-learn
python -m pip install -U IPython
python -m pip install -U graphviz
python -m pip install -U pydotplus
 
NOTE: All of the Python libraries used in this workshop, plus many other useful libraries for data science and machine learning, can be installed at once by installing the Anaconda Python distribution.
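Whichever installation route you use, one quick way to confirm your setup before the workshop is to try importing each library. The script below is a minimal sketch and is not part of the official lab materials. The helper name `check_modules` is hypothetical, and the script assumes the import names `pandas`, `scipy`, `sklearn` (the import name for scikit-learn), `IPython`, `graphviz`, and `pydotplus`.

```python
import importlib

# Assumed import names for the six workshop libraries; the pip package
# "scikit-learn" is imported as "sklearn".
WORKSHOP_MODULES = ["pandas", "scipy", "sklearn", "IPython", "graphviz", "pydotplus"]


def check_modules(names):
    """Hypothetical helper: try to import each module by name.

    Returns a dict mapping each name to its __version__ string,
    "unknown" if the module has no __version__ attribute,
    or None if the import fails.
    """
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            status[name] = None
        else:
            status[name] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    # Print one line per library, flagging anything that failed to import.
    for name, version in check_modules(WORKSHOP_MODULES).items():
        print(f"{name:10s} {version if version is not None else 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that your Jupyter Notebook uses; any library reported as NOT INSTALLED can be installed with the matching pip command. A successful import of graphviz or pydotplus does not by itself confirm that the Graphviz system software they rely on for rendering is installed.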