Data Science Desktop Survival Guide
by Graham Williams

Rain
20180723 The two remaining character variables are rain_today and rain_tomorrow. Their distributions are generated by dplyr::select()ing from the dataset those variables that start with rain_ and then building a base::table() over those variables. We use base::sapply() to apply base::table() to each of the selected columns, counting how often each value of a variable occurs within the dataset.
# Review the distribution of observations across levels.
ds %>% select(starts_with("rain_")) %>% sapply(table)
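To see what this pipeline returns without loading the full dataset, here is a minimal sketch on a small made-up tibble. The name toy and all of its values are hypothetical, invented purely for illustration; only the select(), starts_with(), sapply() and table() calls mirror the text.

```r
library(dplyr)

# Hypothetical toy data standing in for ds (values invented for illustration).
toy <- tibble(
  rain_today    = c("No", "Yes", "No", "No"),
  rain_tomorrow = c("Yes", "No", "No", "No"),
  location      = c("Canberra", "Sydney", "Canberra", "Perth")
)

# Keep only the rain_ columns and count the occurrences of each value.
counts <- toy %>% select(starts_with("rain_")) %>% sapply(table)

print(counts)
```

Because both columns here take the same two values, sapply() simplifies the two tables into a single matrix with one row per value and one column per variable. If the columns had different sets of values, sapply() would return a list of tables instead.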
Noting that both variables are currently of class character, we convert them to factors.
# Note the names of the rain variables.
ds %>% select(starts_with("rain_")) %>% names() -> vnames

# Confirm these are currently character variables.
ds[vnames] %>% sapply(class)

# Convert these variables from character to factor.
ds[vnames] %<>% lapply(factor) %>% data.frame() %>% as_tibble()

# Confirm they are now factors.
ds[vnames] %>% sapply(class)
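The conversion can be tried end to end on a small example. The sketch below assumes a hypothetical tibble named toy with invented values, and loads magrittr explicitly because the %<>% assignment pipe comes from that package rather than from dplyr.

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Hypothetical toy data standing in for ds (values invented for illustration).
toy <- tibble(
  rain_today    = c("No", "Yes", "No", "No"),
  rain_tomorrow = c("Yes", "No", "No", "No"),
  location      = c("Canberra", "Sydney", "Canberra", "Perth")
)

# Note the names of the rain variables.
vnames <- toy %>% select(starts_with("rain_")) %>% names()

# Convert only those columns from character to factor, in place.
toy[vnames] %<>% lapply(factor) %>% data.frame() %>% as_tibble()

# Confirm the classes and inspect the resulting levels.
classes <- toy[vnames] %>% sapply(class)
lvls <- levels(toy$rain_today)

print(classes)
print(lvls)
```

With no levels argument, factor() orders the levels by sorting the unique values, so here No precedes Yes. Columns not named in vnames, such as location, are left untouched. A possible alternative with the same effect, assuming dplyr 1.0.0 or later, is mutate(across(starts_with("rain_"), factor)).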