Data Science Desktop Survival Guide
by Graham Williams



20180723 The two remaining character variables are rain_today and rain_tomorrow. To review their distributions we use dplyr::select() to choose those variables from the dataset whose names start with rain_, and then base::sapply() to apply base::table() to each selected column, counting how often each value of the variable occurs within the dataset.

# Review the distribution of observations across levels.

ds %>%
  select(starts_with("rain_")) %>%
  sapply(table)
##     rain_today rain_tomorrow
## No      135371        135353
## Yes      37058         37077
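
The same counting step can be tried on a small made-up data frame using only base R. This is a minimal sketch: the data frame toy, its values, and the name rain_cols are hypothetical and are not part of the weather dataset used above.

```r
# Hypothetical toy data standing in for the real dataset.
toy <- data.frame(
  rain_today    = c("No", "Yes", "No", "No"),
  rain_tomorrow = c("Yes", "No", "No", "No"),
  min_temp      = c(8.0, 14.0, 13.7, 9.2),
  stringsAsFactors = FALSE
)

# Base R equivalent of select(starts_with("rain_")).
rain_cols <- names(toy)[startsWith(names(toy), "rain_")]

# Apply table() to each selected column. Because both columns
# take the same two values, sapply() simplifies to a matrix.
counts <- sapply(toy[rain_cols], table)
counts
```

Note that base::sapply() only simplifies to a matrix when every column yields a table of the same length. If one column contained a value the others lack, the result would instead be a list of tables.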

Noting that No and Yes are the only values these two variables take, it makes sense to convert them both to factors. We will keep the ordering as alphabetic, so a simple call to base::factor() will convert each from character to factor.

# Note the names of the rain variables.

ds %>%
  select(starts_with("rain_")) %>%
  names() ->
vnames

# Confirm these are currently character variables.

ds[vnames] %>% sapply(class)
##    rain_today rain_tomorrow 
##   "character"   "character"

# Convert these variables from character to factor.

ds[vnames] %<>%
  lapply(factor) %>%
  data.frame()

# Confirm they are now factors.

ds[vnames] %>% sapply(class)
##    rain_today rain_tomorrow 
##      "factor"      "factor"
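
To see the alphabetic ordering of the levels, the conversion can be repeated on a small made-up data frame in base R, without the magrittr assignment pipe. This is only a sketch: toy and its values are hypothetical, and the explicit levels= variant is shown as an alternative for comparison rather than as the approach taken above.

```r
# Hypothetical toy data with character columns.
toy <- data.frame(
  rain_today    = c("Yes", "No", "No"),
  rain_tomorrow = c("No", "No", "Yes"),
  stringsAsFactors = FALSE
)
vnames <- c("rain_today", "rain_tomorrow")

# Convert each named column from character to factor.
toy[vnames] <- lapply(toy[vnames], factor)

# Confirm the classes and inspect the level ordering.
classes <- sapply(toy[vnames], class)
lvls    <- levels(toy$rain_today)

# factor() sorts the unique values, so No precedes Yes even
# though Yes appears first in the data. An explicit ordering
# can be requested through the levels= argument instead.
yes_first <- factor(c("Yes", "No", "No"), levels = c("Yes", "No"))
```

Here lvls holds the levels in sorted order, while levels(yes_first) keeps the order given to levels=.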

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.