This course is aimed at the intermediate R developer who wants to learn how to carry out useful text and sentiment analysis tasks in R. It focuses on “modern R”, specifically the tidyverse collection of packages, which is designed for data science.

Text and sentiment analysis is a huge topic and we couldn’t possibly cover it all in one short course. The purpose of this workshop is to give you an introduction to some of the most useful tools and to demonstrate some of the most common problems that surface.

This workshop assumes you have knowledge of R equivalent to that covered in Beginning R, Intermediate R and Introduction to Data Analysis in R.

You can jump ahead to any chapter:

  1. Tokenising - walkthrough video
  2. Sentiment analysis - walkthrough video
  3. Regular expressions - walkthrough video
  4. Word clouds - walkthrough video
  5. n-grams - walkthrough video
  6. Summary - walkthrough video

For the purposes of this workshop we will be using RStudio. If you haven’t yet installed RStudio and got it working, please follow the instructions in the Intermediate R workshop.

Just as in Introduction to Data Analysis in R, we will be writing R Markdown notebooks using RStudio. Instructions on how to do this are given here.

We will use a range of packages in this workshop. To install them, please run:

install.packages(c("tidyverse",
                   "tidytext",
                   "textdata",
                   "gutenbergr",
                   "wordcloud",
                   "igraph",
                   "ggraph"))
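
Once the installation has finished, it can be worth checking that every package is actually available before starting the first chapter. The following is a minimal sketch, not part of the original workshop material; the variable names `workshop_pkgs`, `is_installed` and `missing_pkgs` are illustrative assumptions.

```r
# Packages used in this workshop (same list as in the install command)
workshop_pkgs <- c("tidyverse", "tidytext", "textdata", "gutenbergr",
                   "wordcloud", "igraph", "ggraph")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(workshop_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

# Names of any packages that could not be loaded
missing_pkgs <- workshop_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, re-running `install.packages()` for just those names is usually enough.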