Data Science Desktop Survival Guide by Graham Williams

## Scatter Plot Colour Alternative

20200608

```r
library(dplyr)    # sample_n() and the %>% pipe
library(ggplot2)  # ggplot(), geom_point(), scale_colour_brewer(), labs()

# ds (the dataset) and vnames (the original variable names) are defined elsewhere.
ds %>%
  sample_n(1000) %>%                        # random subset to avoid overplotting
  ggplot(aes(x=min_temp, y=max_temp, colour=rain_tomorrow)) +
  geom_point() +                            # one point per observation
  scale_colour_brewer(palette="Set2") +     # alternative colour palette
  labs(x      = vnames["min_temp"],
       y      = vnames["max_temp"],
       colour = vnames["rain_tomorrow"])
```

The simplest plot is a scatter plot, which displays each observation as a point positioned according to two variables. If the dataset is large the resulting plot will be rather dense, so for illustrative purposes a random subset of just 1,000 observations is used. A linear relationship between the two variables can be seen.

The random sample of 1,000 rows is generated using dplyr::sample_n() and is then piped through to ggplot2::ggplot(). The ggplot2::aes() argument identifies the aesthetics of the plot: x= associates the variable min_temp with the x-axis and y= associates the variable max_temp with the y-axis. In addition, the colour= option distinguishes between days where the observation rain_tomorrow is Yes and where it is No.

A graphical layer consisting of appropriately coloured points is added to the plot using ggplot2::geom_point(). The colour palette is chosen using ggplot2::scale_colour_brewer(), here the "Set2" palette.

The original variable names stored as vnames are used to label the plot using ggplot2::labs(). The original names will make more sense to the reader than our chosen normalised names.
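The plot above relies on ds and vnames, which are built elsewhere in the guide. To try the same pattern in isolation, the following is a minimal sketch under stated assumptions: the data frame raw and its values are synthetic and purely hypothetical, and the way vnames is constructed here (a character vector of original names, indexed by the normalised names) is only one plausible setup, not necessarily the one used to prepare the real dataset.

```r
library(dplyr)
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for the weather data: synthetic values only.
n   <- 5000
raw <- data.frame(MinTemp = rnorm(n, mean = 12, sd = 6))
raw$MaxTemp      <- raw$MinTemp + rnorm(n, mean = 10, sd = 4)
raw$RainTomorrow <- factor(sample(c("No", "Yes"), n, replace = TRUE,
                                  prob = c(0.8, 0.2)))

# Assumed construction of vnames: keep the original names, then normalise
# the dataset's names to snake_case and use those to index the originals.
vnames        <- names(raw)
ds            <- raw
names(ds)     <- tolower(gsub("([a-z])([A-Z])", "\\1_\\2", names(raw)))
names(vnames) <- names(ds)

# Take the random subset first so it can be inspected on its own.
smp <- ds %>% sample_n(1000)

p <- smp %>%
  ggplot(aes(x = min_temp, y = max_temp, colour = rain_tomorrow)) +
  geom_point() +
  scale_colour_brewer(palette = "Set2") +
  labs(x      = vnames["min_temp"],
       y      = vnames["max_temp"],
       colour = vnames["rain_tomorrow"])

print(p)
```

Swapping palette = "Set2" for another qualitative ColorBrewer palette such as "Dark2" or "Set1" changes the point colours without touching the rest of the pipeline.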