Data Science Desktop Survival Guide by Graham Williams

## Rain

20180723 The two remaining character variables are `rain_today` and `rain_tomorrow`. We review their distributions by using dplyr::select() to choose from the dataset those variables whose names start with `rain_`, and then using base::sapply() to apply base::table() to each selected column. This counts how often each value of a variable occurs within the dataset.

```r
# Review the distribution of observations across levels.

ds %>%
  select(starts_with("rain_")) %>%
  sapply(table)
```

```
##     rain_today rain_tomorrow
## No      135371        135353
## Yes      37058         37077
```
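To see how the counts come out as a single matrix, here is a minimal sketch on a hypothetical toy data frame (`toy` and its values are invented for illustration and are not part of the weather dataset). It uses base::startsWith() in place of dplyr::select() so that it runs without any packages loaded.

```r
# Hypothetical toy data standing in for the weather dataset.
toy <- data.frame(
  rain_today    = c("No", "No", "Yes", "No"),
  rain_tomorrow = c("No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Base R stand-in for select(starts_with("rain_")).
rain <- toy[startsWith(names(toy), "rain_")]

# One frequency table per column. sapply() simplifies the result to a
# matrix here because both columns share the same set of values.
counts <- sapply(rain, table)
print(counts)
```

If the selected columns did not all take the same number of distinct values, base::sapply() could not simplify the result and would return a list of tables instead.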

Noting that `No` and `Yes` are the only values these two variables take, it makes sense to convert them both to factors. We will keep the default alphabetic ordering of the levels, so a simple call to base::factor() will convert each variable from character to factor.
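As a small illustration of the default level ordering, the sketch below applies base::factor() to a hypothetical character vector (the values are invented for illustration). It also shows how an explicit `levels` argument would override the alphabetic order, which is not needed here.

```r
# Hypothetical character values, as they might appear in a rain_ column.
x <- c("Yes", "No", "No", "Yes")

# Default: levels are the sorted unique values, so "No" comes before "Yes".
f <- factor(x)
print(levels(f))

# The underlying integer codes index into the levels.
print(as.integer(f))

# An explicit ordering is possible but not required for this dataset.
g <- factor(x, levels = c("Yes", "No"))
print(levels(g))
```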

```r
# Note the names of the rain variables.

ds %>%
  select(starts_with("rain_")) %>%
  names() ->
vnames

# Confirm these are currently character variables.

ds[vnames] %>% sapply(class)
```

```
##  rain_today rain_tomorrow
## "character"   "character"
```

```r
# Convert these variables from character to factor.

ds[vnames] %<>%
  lapply(factor) %>%
  data.frame() %>%
  as_tibble()

# Confirm they are now factors.

ds[vnames] %>% sapply(class)
```

```
##    rain_today rain_tomorrow
##      "factor"      "factor"
```
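The conversion above relies on the magrittr assignment pipe `%<>%` and a round trip through data.frame() and as_tibble(). As an alternative, a minimal base R sketch of the same idea is shown below on a hypothetical toy data frame (`toy` and the `temp` column are invented for illustration). It assigns the converted columns back in place and leaves the other columns untouched.

```r
# Hypothetical toy data; the real dataset ds is not reproduced here.
toy <- data.frame(
  rain_today    = c("No", "Yes", "No"),
  rain_tomorrow = c("Yes", "No", "No"),
  temp          = c(12.1, 15.3, 9.8),
  stringsAsFactors = FALSE
)

vnames <- c("rain_today", "rain_tomorrow")

# Replace each listed column with its factor version.
toy[vnames] <- lapply(toy[vnames], factor)

# Confirm the classes: the rain_ columns are factors, temp stays numeric.
print(sapply(toy, class))
```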