2 Introducing R

Data scientists write programs to ingest, fuse, clean, wrangle, visualise, analyse, and model data. Programming over data is a core task for the data scientist. We will primarily use R (R Core Team 2021), and in particular the tidyverse, as our programming language. We assume a basic familiarity with R, as may be gained from the many resources available on the Internet, particularly from https://cran.r-project.org/manuals.html.

The development of the tidyverse has been instrumental in bringing R into the modern data science era, and the resources provided by RStudio and the tidyverse community are extensive. In particular, as you develop your data analyses, be sure to have the RStudio cheatsheets for the tidyverse in front of you. You will find them invaluable. Visit https://rstudio.com/resources/cheatsheets/.

Programmers of data write sentences, or code, that instruct a computer to perform specific tasks. A collection of such sentences written in a language is what we might call a program. Through programming by example and learning by immersion we will share programs to deliver insights and outcomes from our data.

R is a large and complex ecosystem for the practice of data science. There is much freely available information on the Internet from which we can continually learn and borrow useful code segments that illustrate almost any task we might think of. We introduce here the basics for getting started with R, libraries and packages which extend the language, and the concepts of functions, commands, and operators.
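As a first taste of these concepts, the following is a minimal sketch in base R rather than a definitive recipe. The data values and the names `heights`, `avg`, `above`, and `centre` are made up purely for illustration, and the `stats` package is attached only because it ships with every R installation; a tidyverse package would be attached in the same way once installed.

```r
# Attach a package to extend the language. 'stats' is distributed with R
# itself, so this call is safe on any installation (assumption: a tidyverse
# package such as dplyr would be attached the same way after installing it).
library(stats)

# A command: use the assignment operator <- to store a vector under a name.
heights <- c(1.62, 1.75, 1.80, 1.68)  # illustrative, made-up values

# A function call: mean() takes the vector and returns a single number.
avg <- mean(heights)

# An operator: the comparison > is applied element by element.
above <- heights > avg

# A user-defined function (hypothetical helper) that centres a vector
# by subtracting its mean from each element.
centre <- function(x) x - mean(x)
centred <- centre(heights)

print(avg)
print(above)
print(centred)
```

Each line is a sentence in the sense used above: it names an object, calls a function, or applies an operator, and together the sentences form a small program.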


R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

