Data Science Desktop Survival Guide
by Graham Williams
Evaporation and Sunshine
20180723 The next two character variables are evaporation and sunshine. It seems odd that these are character, since we would expect both to hold numeric values. If we look at the top of the dataset we see they have missing values.
# Note the remaining character variables to be dealt with.
head(ds$evaporation)
head(ds$sunshine)
# Review other random values.
sample(ds$evaporation, 8)
sample(ds$sunshine, 8)
The heuristic used to determine the data type when ingesting data looks at only a subset of the observations before it settles on a type. In this case the early observations are all missing, so both variables default to character, which is general enough to capture all potential values. We need to convert the variables to numeric.
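The effect of guessing from a subset can be sketched with a toy function. This is only an illustration under stated assumptions: guess_from_head() is a hypothetical helper written for this example, not the routine the ingestion function actually uses, and the values in raw are made up.

```r
# Hypothetical helper (not part of any package): guess a column's type
# from only its first n raw string values.
guess_from_head <- function(x, n = 5) {
  head_vals <- head(x, n)
  observed  <- head_vals[!is.na(head_vals)]
  # Nothing observed in the subset, so fall back to the most general type.
  if (length(observed) == 0) return("character")
  # Every observed value parses as a number.
  if (!anyNA(suppressWarnings(as.numeric(observed)))) return("numeric")
  "character"
}

# Made-up column: missing in the early observations, numeric later.
raw <- c(rep(NA_character_, 5), "4.2", "6.8", "0.4")

guess_from_head(raw, n = 5)  # sees only missing values
guess_from_head(raw, n = 8)  # now sees the numeric values
```

Widening the subset changes the guess, which is why an ingestion function that exposes such a setting (for example the guess_max argument of readr's read_csv(), if that is how the data were read) can sometimes avoid the later conversion.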
# Identify the variables to process.
cvars <- c("evaporation", "sunshine")
# Check the current class of the variables.
ds[cvars] %>% sapply(class)
# Convert to numeric.
ds[cvars] %<>% sapply(as.numeric)
# Review some random values.
sample(ds$evaporation, 10)
sample(ds$sunshine, 10)
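Because as.numeric() turns any string it cannot parse into NA, it can be worth confirming that the conversion introduced no new missing values. The following is a minimal sketch on made-up data: ds_demo is a hypothetical stand-in for ds, the "n/a" entry is invented to show a parse failure, and base R lapply() is used here in place of the pipe operators.

```r
# Made-up data standing in for ds; the real dataset is not reproduced here.
ds_demo <- data.frame(
  evaporation = c(NA, NA, "4.2", "6.8"),
  sunshine    = c(NA, "9.1", "n/a", "3.0"),
  stringsAsFactors = FALSE
)
cvars <- c("evaporation", "sunshine")

# Count missing values before conversion.
na_before <- sapply(ds_demo[cvars], function(x) sum(is.na(x)))

# Convert column by column; lapply() returns a list of columns,
# so the data frame structure is preserved.
ds_demo[cvars] <- lapply(ds_demo[cvars],
                         function(x) suppressWarnings(as.numeric(x)))

# Count again: any increase marks values that failed to parse.
na_after <- sapply(ds_demo[cvars], function(x) sum(is.na(x)))
na_after - na_before

# Confirm the new classes.
sapply(ds_demo[cvars], class)
```

A non-zero difference for a variable suggests it contains non-numeric strings that deserve a closer look before the converted values are trusted.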