Machine Learning in Survey Research
Adam Eck
October 25th, 2019
Please join instructor Adam Eck (assistant professor of computer science, Oberlin College) as he conducts a half-day workshop titled “Machine Learning in Survey Research.” The workshop is designed for population and survey researchers and analysts of all skill levels. It introduces machine learning concepts and their applications to survey research, such as sample frame creation, respondent modelling, and open-ended response coding.
Topics:
- Introduction to machine learning and its applications to survey research
- Decision trees and random forests
- Deep learning and other neural network-based techniques
- ML techniques to model respondent behaviors, assist with coding of open-ended responses, and more
- Demonstration using R and Python
Slides & Lab Materials:
- A zip file containing slides, sample data, and all R and Python code is available here
- A sample dataset for use in the lab is available here
- Pre-compiled versions of the lab (R version and Python version)
Software:
The lab portion of this workshop will be mirrored in both R (using R Markdown) and Python (using Jupyter Notebook).
R Users:
- R software (required)
- The lab uses five R packages (caret, rpart, rpart.plot, randomForest, and mxnet), which can be installed using the code below:
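## install packages
## note: mxnet may not be available from the default CRAN repository; if it fails to install, see the Apache MXNet installation instructions for R
install.packages(c("caret","rpart","rpart.plot","randomForest","mxnet"))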
Python Users:
- Python 3 (required)
- The lab uses six Python libraries (pandas, scipy, scikit-learn, IPython, graphviz, and pydotplus), which can be installed using pip with the code below:
python -m pip install --upgrade pip
python -m pip install -U pandas
python -m pip install -U scipy
python -m pip install -U scikit-learn
python -m pip install -U IPython
python -m pip install -U graphviz
python -m pip install -U pydotplus
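After running the pip commands above, it can help to confirm that Python can find all six libraries before the lab starts. The following is a minimal sketch rather than part of the workshop materials; the names `LAB_LIBRARIES` and `find_missing` are hypothetical and were chosen only for this illustration. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Hypothetical mapping from the pip package names listed above to the
# module names used in `import` statements (scikit-learn -> sklearn).
LAB_LIBRARIES = {
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "IPython": "IPython",
    "graphviz": "graphviz",
    "pydotplus": "pydotplus",
}


def find_missing(libraries):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in libraries.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(LAB_LIBRARIES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All lab libraries were found.")
```

This check only confirms that each module can be located. It does not import the modules or verify their versions, and the graphviz Python package additionally relies on the Graphviz system software being installed in order to render figures.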