PDHP Workshop Series

Principles of Text Analysis

November 18, 2020, 9am-1pm, Virtual via Zoom

Workshop Materials & Video

Principles of Text Analysis, presented by Patrick van Kessel

PDHP resumes its 2020 workshop series on November 18 with Principles of Text Analysis, presented by Patrick van Kessel, senior data scientist at Pew Research Center. This half-day workshop is geared toward data analysts who work with unstructured text data (e.g., open-ended survey responses or web-curated text). It will provide a tutorial on cleaning, processing, and analyzing data from text-based sources using state-of-the-art text analytics techniques. Examples are primarily in Python, with some also provided in R; experience with either language is recommended but not required.

Topics include:

  • Preprocessing and cleaning messy text data
  • Feature extraction using TF-IDF vectorization
  • Text analytics techniques including topic modelling and unsupervised clustering methods
  • Software demonstration featuring the scikit-learn library for Python
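
For a rough sense of how these topics fit together, here is a minimal sketch using scikit-learn. It is not taken from the workshop materials: the sample responses, the clean() helper, and the choice of k-means with two clusters are all assumptions made for illustration.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical open-ended survey responses (invented for this sketch).
responses = [
    "I LOVE the new park!!!  Great for kids.",
    "The park is great, my kids love it :)",
    "Taxes are too high... way too high.",
    "Property taxes keep going up; too high for me.",
]


def clean(text):
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Step 1: preprocessing and cleaning.
cleaned = [clean(r) for r in responses]

# Step 2: feature extraction with TF-IDF, dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(cleaned)

# Step 3: unsupervised clustering of the TF-IDF vectors.
# Two clusters is an assumption; real data usually needs experimentation.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

for label, text in zip(labels, cleaned):
    print(label, text)
```

In this toy data the park responses and the tax responses share no vocabulary after stop-word removal, so they should land in separate clusters. Real survey text is rarely this clean, and both the cleaning rules and the number of clusters would need tuning.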


