Data Science Desktop Survival Guide
by Graham Williams
Scatter Plot Colour Alternative
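# Assumes ds holds the prepared dataset and that dplyr and ggplot2 are attached.
ds %>%
  sample_n(1000) %>%
  ggplot(aes(x=min_temp, y=max_temp, colour=rain_tomorrow)) +
  geom_point() +
  labs(x = vnames["min_temp"],
       y = vnames["max_temp"],
       colour = vnames["rain_tomorrow"])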
The simplest plot is a scatter plot, which displays one point per observation, positioned by the values of two variables. If the dataset is large, the resulting plot will be rather dense, so for illustrative purposes a random subset of just 1,000 observations is used. A linear relationship between the two variables can be seen.
The random sample of 1,000 rows is generated using dplyr::sample_n() and is then piped through to ggplot2::ggplot(). The argument to ggplot2::ggplot() identifies the aesthetics of the plot, so that x= associates the variable min_temp with the x-axis and y= associates the variable max_temp with the y-axis.
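The sampling and aesthetic mapping described above can be tried in isolation with the following minimal sketch. The data frame ds built here is a hypothetical stand-in made of simulated values, not the dataset used in the guide, and the seed is arbitrary.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data (simulated); not the guide's dataset.
set.seed(42)
n  <- 5000
ds <- data.frame(min_temp = rnorm(n, mean = 12, sd = 6))
ds$max_temp      <- ds$min_temp + 10 + rnorm(n, sd = 4)
ds$rain_tomorrow <- factor(sample(c("No", "Yes"), n, replace = TRUE,
                                  prob = c(0.8, 0.2)))

# Draw 1,000 rows at random without replacement.
smp <- sample_n(ds, 1000)

# Map variables to aesthetics and add a layer of points.
p <- smp %>%
  ggplot(aes(x = min_temp, y = max_temp, colour = rain_tomorrow)) +
  geom_point()
```

Printing p draws the plot. Calling set.seed() before sampling makes the same 1,000 rows come out on every run, which is convenient when the plot needs to be reproduced.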
In addition, the colour= option provides a mechanism to distinguish between days where the variable rain_tomorrow is Yes and days where it is No. A colour palette can be chosen using ggplot2::scale_colour_brewer().
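As a rough illustration of choosing a palette, the sketch below adds ggplot2::scale_colour_brewer() to a small scatter plot. The palette "Set1" is only an example choice and may not be the one used in the guide, and the data frame toy is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data with a two-level factor to colour by.
set.seed(1)
toy <- data.frame(min_temp      = rnorm(200, mean = 12, sd = 6),
                  rain_tomorrow = factor(rep(c("No", "Yes"), times = c(160, 40))))
toy$max_temp <- toy$min_temp + 10 + rnorm(200, sd = 4)

# Colour points by rain_tomorrow using a qualitative ColorBrewer palette.
p <- ggplot(toy, aes(x = min_temp, y = max_temp, colour = rain_tomorrow)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")
```

Qualitative palettes such as "Set1", "Set2" or "Dark2" suit a categoric variable like rain_tomorrow because no ordering between the levels is implied.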
A graphical layer is added to the plot consisting of points coloured appropriately. The function ggplot2::geom_point() achieves this.
The original variable names, stored as vnames, are used to label the plot using ggplot2::labs(). The original names will make more sense to the reader than our chosen normalised names.
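One way such a lookup might be built is as a named character vector that maps each normalised name to its original. The sketch below is an assumption about how vnames could be constructed: the original names listed and the regular-expression normalisation are both hypothetical and are not taken from the guide.

```r
# Hypothetical original column names, as they might appear in a raw file.
original <- c("MinTemp", "MaxTemp", "RainTomorrow")

# Normalise: put "_" between a lower-case and an upper-case letter,
# then convert everything to lower case.
normalised <- tolower(gsub("([a-z])([A-Z])", "\\1_\\2", original))

# Named vector: each normalised name indexes its original name.
vnames <- setNames(original, normalised)

# Look up the original name for a normalised variable.
x_label <- vnames["min_temp"]
```

Indexing by the normalised name keeps the plotting code written in terms of the working variable names, while the labels shown to the reader come from the original data.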