21.34 Word Length Counts
A simple plot is an effective way to show the distribution of word lengths. Here we create a single-column data frame and pass it to ggplot2::ggplot() to generate a histogram, with a vertical line marking the mean word length.
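# Build a single-column data frame of word lengths and plot a histogram.
data.frame(nletters=nchar(words)) %>%
  ggplot(aes(x=nletters)) +
  geom_histogram(binwidth=1) +
  # Mark the mean word length with a vertical line.
  geom_vline(xintercept=mean(nchar(words)),
             colour="green", size=1, alpha=.5) +
  labs(x="Number of Letters", y="Number of Words")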
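The same distribution can also be inspected numerically, which is a useful check on what the histogram shows. The following is a minimal sketch, assuming words is a character vector of individual words as in the plot above. The sample vector here is hypothetical and is included only so that the example runs on its own.

```r
# Hypothetical sample data; in the text, `words` comes from earlier processing.
words <- c("a", "simple", "plot", "is", "then", "effective")

# Number of characters in each word.
nletters <- nchar(words)

# Tabulate how many words have each length (the heights of the histogram bars).
counts <- table(nletters)
print(counts)

# Mean word length, the value marked by the vertical line in the plot.
print(mean(nletters))
```

With a real corpus, skip the first assignment so that the existing words vector is not overwritten.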
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0
