Data Science Desktop Survival Guide
by Graham Williams
Data Review
20180721 Having ingested the dataset and normalised the variable names we can now explore it further. Using dplyr::glimpse() gives us some insight:
# Review the dataset.
glimpse(ds)
Observe the variety of data types here, ranging from Date (date) through character (chr) to numeric (dbl). The data mostly looks as expected, though it is odd that evaporation and sunshine are identified as character. This is probably because they appear to be all missing, at least in the first 10 or so observations. We begin to question other aspects of the data too. For example, is date an ongoing sequence of days as it appears to be here? Does location have values other than Albury? What is the distribution of the different variables? These are all questions we will start asking ourselves in the context of “living and breathing” our data. Our aim should be to glean all we can about the data that we are dealing with. Data science is very much about understanding, not blindly processing. The excitement is in the discovery of patterns in the data and the narrative the data is seeking to tell.
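The questions raised above can each be turned into a quick check. The following is a minimal sketch, not the book's own code: the small ds built here is a hypothetical stand-in so that the example runs by itself, and min_temp is an assumed variable name. With the real dataset we would skip the stand-in and, if several locations are present, check the date sequence within each location separately.

```r
library(dplyr)

# Hypothetical stand-in for ds so the sketch is self-contained.
# The values are invented for illustration only.
ds <- tibble(
  date        = as.Date("2008-12-01") + c(0, 1, 2, 4),
  location    = c("Albury", "Albury", "Albury", "Albury"),
  min_temp    = c(13.4, 7.4, 12.9, 9.2),  # assumed variable name
  evaporation = NA_character_             # all missing, as glimpse() suggested
)

# Is date an ongoing sequence of days? Differences other than
# one day indicate gaps or repeated dates.
day_gaps <- as.numeric(diff(ds$date))
is_daily_sequence <- all(day_gaps == 1)

# Does location have values other than Albury?
other_locations <- setdiff(unique(ds$location), "Albury")

# What proportion of each variable is missing?
prop_missing <- vapply(ds, function(x) mean(is.na(x)), numeric(1))

# What is the distribution of a numeric variable?
min_temp_summary <- summary(ds$min_temp)

print(is_daily_sequence)
print(other_locations)
print(prop_missing)
print(min_temp_summary)
```

A variable whose proportion of missing values is 1 has no observed values at all, which would be consistent with the character type reported for evaporation and sunshine.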