Winter School in Empirical Research Methods

Analyzing Unstructured Data

Instructor: Mark Daku

Course description

The world is awash in massive amounts of data, and more and more of that data is unstructured – it comes in the form of text, audio, video, and images rather than in neatly organized datasets. Working with this kind of data requires new tools, new approaches, and new ways of thinking.

With a substantive focus on questions of health and social outcomes, this short course introduces students to techniques for finding structure in unstructured data, with the goal of producing actionable insights. We will work primarily with text data, and the course will emphasize the practical aspects of working with unstructured data. It is intended as an introduction for social science scholars who already have a good grasp of quantitative social science methods.

Prerequisite 

Basic R Programming

Hardware

A laptop computer is required.

Software

R and RStudio

Structure

The course is organised as follows:

  • Day 1: Introduction, R Refresher, and Processing Text Data
  • Day 2: Exploring Unstructured Data
  • Day 3: Finding Latent Structures in Data
  • Day 4: Visualization of Unstructured Data
  • Day 5: Machine Learning & Predictive Data Mining


More details, specific readings, and resources will be provided closer to the course start date.