10.52 Rain
20180723 The two remaining character variables are rain_today and rain_tomorrow. Their distributions are generated by using dplyr::select() to choose from the dataset those variables that start with rain_ and then building a base::table() over each of them. We use base::sapply() to apply base::table() to the selected columns, counting how often each value of a variable occurs within the dataset.
# Review the distribution of observations across levels.
ds %>%
  select(starts_with("rain_")) %>%
  sapply(table)
##     rain_today rain_tomorrow
## No      171174        171165
## Yes      48919         48929
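As a small illustration of the counting step, the following sketch applies the same idea to a made-up data frame using only base R. The data frame toy and its values are invented for the example and are not the dataset used here; startsWith() stands in for dplyr's starts_with().

```r
# Hypothetical toy data, not the dataset used in the text.
toy <- data.frame(
  rain_today    = c("No", "Yes", "No", "No"),
  rain_tomorrow = c("Yes", "No", "No", "Yes"),
  temp          = c(21.5, 18.0, 19.2, 23.1),
  stringsAsFactors = FALSE
)

# Keep only the columns whose names start with "rain_".
rain_cols <- toy[startsWith(names(toy), "rain_")]

# Count each value per column. sapply() simplifies the result to a
# matrix here because both columns contain the same set of values.
counts <- sapply(rain_cols, table)
counts
```

If one column were missing a value that the other contains, the per-column tables would differ in length and sapply() would return a list rather than a matrix.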
Noting that No and Yes are the only values these two variables take, it makes sense to convert them both to factors. We will keep the ordering as alphabetic, and so a simple call to base::factor() will convert them from character to factor.
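To make the alphabetic ordering concrete, here is a minimal sketch on an invented character vector x (not taken from the dataset). By default base::factor() sorts the distinct values to form the levels, so No comes before Yes regardless of which appears first in the data; the levels argument can be supplied if a different order is wanted.

```r
# Hypothetical values for illustration only.
x <- c("Yes", "No", "No", "Yes", "No")

# Default: levels are the sorted unique values.
f <- factor(x)
levels(f)

# The underlying integer codes follow the level order (No = 1, Yes = 2).
as.integer(f)

# An explicit ordering can be requested through the levels argument.
g <- factor(x, levels = c("Yes", "No"))
levels(g)
```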
# Note the names of the rain variables.
ds %>%
  select(starts_with("rain_")) %>%
  names() ->
vnames
# Confirm these are currently character variables.
ds[vnames] %>% sapply(class)
##    rain_today rain_tomorrow
##   "character"   "character"
# Convert these variables from character to factor.
# The %<>% assignment pipe comes from the magrittr package.
ds[vnames] %<>%
  lapply(factor) %>%
  data.frame() %>%
  as_tibble()
# Confirm they are now factors.
ds[vnames] %>% sapply(class)
##    rain_today rain_tomorrow
##      "factor"      "factor"
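An equivalent conversion can be sketched in base R by assigning the result of lapply() back into the selected columns, which avoids rebuilding the data frame. The toy data and the name vnames_toy below are invented for the example; this is an alternative under those assumptions, not the approach used above.

```r
# Hypothetical toy data, not the dataset used in the text.
toy <- data.frame(
  rain_today    = c("No", "Yes", "No"),
  rain_tomorrow = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Names of the columns to convert.
vnames_toy <- names(toy)[startsWith(names(toy), "rain_")]

# Before conversion both columns are character.
sapply(toy[vnames_toy], class)

# Replace each selected column with its factor version in place.
toy[vnames_toy] <- lapply(toy[vnames_toy], factor)

# After conversion both columns are factors.
sapply(toy[vnames_toy], class)
```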
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0