Principles of Text Analysis
November 18, 2020, 9am-1pm, Virtual via Zoom
PDHP resumes our 2020 workshop series on Nov. 18 with a workshop entitled Principles of Text Analysis, presented by Patrick van Kessel, senior data scientist at Pew Research Center. This half-day workshop is geared toward data analysts who work with unstructured text data (e.g., open-ended survey responses or web-curated text). It will provide a tutorial on cleaning, processing, and analyzing data from text-based sources using state-of-the-art text analytics techniques. Examples are primarily in Python, with some also provided in R; experience with either language is recommended but not required.
- Preprocessing and cleaning messy text data
- Feature extraction using TF-IDF vectorization
- Text analytics techniques including topic modelling and unsupervised clustering methods
- Software demonstration featuring the scikit-learn library for Python
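
For readers who want a preview of the first two topics, the following is a minimal plain-Python sketch of text cleaning and TF-IDF weighting. It is not taken from the workshop materials: the sample responses, the stopword list, and the helper names (tokenize, tfidf, cosine) are all hypothetical. The weighting assumes the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which is the documented default behavior of scikit-learn's TfidfVectorizer; in practice that class would replace the hand-written code here.

```python
import math
import re
from collections import Counter

# Hypothetical sample data standing in for open-ended survey responses.
docs = [
    "The survey was too long!",
    "Great survey, very clear questions.",
    "Questions were clear; the website was slow.",
]

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "was", "were", "too", "very", "a", "an", "and"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(documents):
    """Return (sorted vocabulary, one L2-normalized TF-IDF dict per document)."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return sorted(idf), vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


vocab, vectors = tfidf(docs)
print(vocab)
print(round(cosine(vectors[1], vectors[2]), 3))
```

Terms that appear in fewer documents receive higher weights, so in the first response "long" outweighs "survey". The cosine similarities between the resulting vectors are the kind of input that clustering methods operate on.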