Data Science Desktop Survival Guide
by Graham Williams
Scatter Plot Colour Alternative
20200608
ds %>%
  sample_n(1000) %>%
  ggplot(aes(x=min_temp, y=max_temp, colour=rain_tomorrow)) +
  geom_point() +
  scale_colour_brewer(palette="Set2") +
  labs(x      = vnames["min_temp"],
       y      = vnames["max_temp"],
       colour = vnames["rain_tomorrow"])
The simplest plot is a scatter plot, which displays each observation as a point positioned by the values of two variables. If the dataset is large the resulting plot will be rather dense, so for illustrative purposes a random subset of just 1,000 observations is used. A linear relationship between the two variables can be seen.

The random sample of 1,000 rows is generated using dplyr::sample_n() and is then piped through to ggplot2::ggplot(). The argument to this function, built with ggplot2::aes(), identifies the aesthetics of the plot: x= associates the variable min_temp with the x-axis and y= associates the variable max_temp with the y-axis. In addition, the colour= option provides a mechanism to distinguish between days where the observation of rain_tomorrow is Yes and where it is No.

A graphical layer consisting of points coloured accordingly is added to the plot using ggplot2::geom_point(). A colour palette can be chosen using ggplot2::scale_colour_brewer(), here the Set2 palette. The original variable names stored in vnames are used to label the plot using ggplot2::labs(). The original names will make more sense to the reader than our chosen normalised names.
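The recipe relies on ds and vnames, neither of which is defined on this page. To try the plot in isolation, the following is a minimal sketch that builds hypothetical stand-ins: a synthetic ds with the three columns used here, and a vnames assumed to be a named character vector mapping each normalised name to an original name. The synthetic values and the original names shown (MinTemp, MaxTemp, RainTomorrow) are assumptions for illustration only, not the book's dataset.

```r
library(dplyr)
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for ds: synthetic values, not the book's data.
n  <- 5000
ds <- tibble(min_temp = rnorm(n, mean = 12, sd = 6)) %>%
  mutate(max_temp      = min_temp + 10 + rnorm(n, sd = 4),
         rain_tomorrow = factor(sample(c("No", "Yes"), n,
                                       replace = TRUE,
                                       prob    = c(0.8, 0.2))))

# Assumed shape of vnames: normalised name -> original name.
vnames <- c(min_temp      = "MinTemp",
            max_temp      = "MaxTemp",
            rain_tomorrow = "RainTomorrow")

# Same pipeline as the recipe: sample, map aesthetics, add points,
# choose a palette, and label with the original names.
p <- ds %>%
  sample_n(1000) %>%
  ggplot(aes(x = min_temp, y = max_temp, colour = rain_tomorrow)) +
  geom_point() +
  scale_colour_brewer(palette = "Set2") +
  labs(x      = vnames["min_temp"],
       y      = vnames["max_temp"],
       colour = vnames["rain_tomorrow"])

print(p)
```

Because the sample is drawn at random, calling set.seed() beforehand is what makes the same 1,000 rows, and hence the same plot, reproducible from run to run.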