Word clouds can be a useful visualisation to quickly view the most used words in a block of text. R has the wordcloud function in the wordcloud package that can be used to quickly generate these plots.
To use it, we will first load the packages that will be used in this chapter;
library(tidyverse)
library(tidytext)
library(gutenbergr)
library(wordcloud)
We are going to create word clouds from the tidied text of The Adventure of Sherlock Holmes. We start by downloading the text and tidying it up, as we did in the previous chapters.
sherlock <- gutenberg_download(1661) %>%
mutate(linenumber=row_number()) %>%
unnest_tokens(word, text) %>% anti_join(stop_words)
We want the size each word in the wordcloud to depend on the number of times it appears. We thus create a tibble of these counts.
sherlock_counts <- sherlock %>% count(word, sort=TRUE)
The wordcloud function is very easy to use. You just need to pass in a column of words (sherlock_counts$word) and a column of their counts (sherlock_counts$n). We need to restrict the plot to only displaying the common words, or else we will end up overloading R. We do this by only displaying words that appear 25 or more times, using min.freq=25. The random.order argument is set to FALSE so that the plot is reproducible (as much as possible - the option tells R to plot the words in order of descreasing frequency, rather than a random order).
wordcloud(sherlock_counts$word, sherlock_counts$n,
min.freq=25, random.order=FALSE)
You can also use the with function to make this part of a pipeline, e.g.
sherlock_counts %>% with(wordcloud(word, n,
min.freq=25, random.order=FALSE))
You can also colour the words from highest frequency to least by adding a colour palette, e.g.
sherlock_counts %>% with(wordcloud(word, n,
min.freq=25,
random.order=FALSE,
colors=brewer.pal(8, "Dark2")))
All of the available colour palettes are listed here - http://applied-r.com/rcolorbrewer-palettes/
You can also display the ones available in your R installation using;
display.brewer.all()
Draw a coloured word cloud of words that appear in your chosen book.
What about colouring words according to their sentiment? First we would need to recreate the table that has the sentiment counts, using the same procedure as the previous chapter.
sentiments <- get_sentiments("nrc")
sherlock_sentiments <- sherlock %>%
inner_join(sentiments) %>%
count(word, sentiment, sort=TRUE) %>%
pivot_wider(names_from=sentiment,
values_from=n,
values_fill=0)
We could now draw word clouds for specific sentiments by picking out specific columns, e.g. here are all of the common words associated with “joy”.
sherlock_sentiments %>% with(wordcloud(word, joy,
min.freq=10,
random.color = FALSE,
colors = brewer.pal(8, "PuOr")))
And here are the common words associated with “fear”.
sherlock_sentiments %>% with(wordcloud(word, fear,
min.freq=10,
random.color = FALSE,
colors = brewer.pal(12, "Paired")))
Create some sentiment word clouds for your chosen book, exploring different sentiments. What are the advantages and disadvantages of word clouds?