Data Science Desktop Survival Guide by Graham Williams

## Pipe and Plot

A common scenario for pipeline processing is to prepare data for plotting. Indeed, plotting itself has a pipeline-like concept, where we build a plot by adding layers to it. Below, the rattle::weatherAUS dataset is filtered with dplyr::filter() to keep observations from four Australian cities. A second dplyr::filter() removes observations that have missing values for the variable Temp3pm, using an embedded pipeline. The embedded pipeline pipes the Temp3pm data through the base::is.na() function, which tests whether each value is missing. These results are then piped to magrittr::not(), which inverts the true/false values so that we keep only those that are not missing. A plot is generated using ggplot2::ggplot(), into which we pipe the processed dataset. We add a geometric layer using ggplot2::geom_density(), which draws a density plot with transparency specified through the alpha= argument. We also add a title and label the axes using ggplot2::labs().
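The embedded pipeline can be tried on its own before it is used inside a filter. The following is a minimal sketch on a small made-up vector (the name temp3pm and its values are illustrative assumptions, not taken from weatherAUS). It suggests that piping through base::is.na() and then magrittr::not() gives the same logical vector as the more familiar `!is.na()`.

```r
library(magrittr)  # Provides %>% and not().

# Made-up 3pm temperatures, two of them missing (illustrative only).
temp3pm <- c(21.5, NA, 30.2, NA, 18.0)

# Embedded pipeline: TRUE where a value is present, FALSE where it is missing.
keep <- temp3pm %>% is.na() %>% not()

# The same test written with base R's negation operator.
keep_base <- !is.na(temp3pm)

# Compare the two approaches and keep only the non-missing values.
identical(keep, keep_base)
temp3pm[keep]
```

Which form to use is largely a matter of style: the piped form reads left to right like the rest of a pipeline, while `!is.na()` is more compact.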
```r
library(rattle)    # Provides the weatherAUS dataset.
library(dplyr)     # filter() and the %>% pipe.
library(magrittr)  # not().
library(ggplot2)   # ggplot(), geom_density(), labs().

cities <- c("Canberra", "Darwin", "Melbourne", "Sydney")

weatherAUS %>%
  filter(Location %in% cities) %>%
  filter(Temp3pm %>% is.na() %>% not()) %>%
  ggplot(aes(x=Temp3pm, colour=Location, fill=Location)) +
  geom_density(alpha=0.55) +
  labs(title = "Density Distributions of the 3pm Temperature",
       x     = "Temperature Recorded at 3pm",
       y     = "Density")
```

We now observe the plot and tell a story from it. Our narrative begins with the observation that Darwin has quite a different, and warmer, pattern of temperatures at 3pm than Canberra, Melbourne and Sydney.
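The idea of building a plot by adding layers can also be seen by constructing it one step at a time. The sketch below uses a made-up data frame (toy, with hypothetical cities "A" and "B" and randomly generated temperatures) in place of weatherAUS, and stores each intermediate plot so that the effect of each `+` can be inspected.

```r
library(ggplot2)

# Made-up temperatures for two hypothetical cities (illustrative only).
set.seed(42)
toy <- data.frame(
  city = rep(c("A", "B"), each = 50),
  temp = c(rnorm(50, mean = 20, sd = 4), rnorm(50, mean = 31, sd = 2))
)

# Step 1: data and aesthetic mappings only; no layers yet.
p0 <- ggplot(toy, aes(x = temp, colour = city, fill = city))

# Step 2: add a density layer with partial transparency.
p1 <- p0 + geom_density(alpha = 0.55)

# Step 3: add a title and axis labels (labels are not a layer).
p2 <- p1 + labs(title = "Toy Density Plot", x = "Temperature", y = "Density")

# Count the layers before and after adding geom_density().
length(p0$layers)
length(p2$layers)
```

Printing p2 at the console draws the plot. Storing intermediate objects like this is optional; chaining everything with `+` in one expression builds the same plot.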