15.1 Machine Learning Setup

20200514 Packages used in this chapter include magrittr (Bache and Wickham 2020) and rattle (G. Williams 2021).

Packages are loaded into the currently running R session from your local library directories on disk. Missing packages can be installed using utils::install.packages() within R. On Ubuntu, for example, R packages can also be installed using $ wajig install r-cran-<pkgname>.
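As a minimal sketch of the install step, the following checks which of the chapter's packages are missing from the local library before installing anything. The helper name missing_packages() is hypothetical and introduced here only for illustration; requireNamespace() and utils::install.packages() are standard R functions.

```r
# Hypothetical helper (not from the book): return the names of those
# packages in pkgs that cannot be found in the local library directories.

missing_packages <- function(pkgs)
{
  found <- vapply(pkgs, requireNamespace, logical(1), quietly=TRUE)
  pkgs[!found]
}

# Check the packages named in this chapter.

todo <- missing_packages(c("magrittr", "rattle"))

# Install only what is missing, and only from an interactive session.

if (interactive() && length(todo) > 0) utils::install.packages(todo)
```

The interactive() guard is a design choice for this sketch so that a non-interactive script never triggers a download unexpectedly; drop it if unattended installation is what you want.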

# Load required packages from local library into the R session.

library(dplyr)        # Wrangling: sample_frac().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Data pipelines: %>% %<>% %T>% equals().
library(rattle)       # Dataset: weatherAUS.

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

dsname <- "weatherAUS"
ds     <- get(dsname)
    
nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
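To see what the last few template steps achieve, here is a small self-contained sketch in base R. The three column names are a toy stand-in for the full dataset, and the variable cleaned is introduced only for this illustration. Nested calls are used in place of the %>% pipeline so that no packages are needed; the result is the same.

```r
# Toy stand-ins for the original and the cleaned column names.

vnames  <- c("Date", "MinTemp", "RainTomorrow")
cleaned <- c("date", "min_temp", "rain_tomorrow")

# Name each original by its cleaned form so it can be looked up later.

names(vnames) <- cleaned
vnames["min_temp"]   # The original name, MinTemp.

# Place the target first, drop its duplicate, then reverse the order.

vars   <- cleaned
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, vars)))

# vars is now min_temp, date, rain_tomorrow: the target is last.

vars
```

Note that reversing also flips the order of the input variables, which is harmless here because only the position of the target matters.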

The variable form holds the formula describing the model to be built in this chapter: rain_tomorrow is modelled as a function of all other variables in the dataset.
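The definition of form is not shown in this section. One way it could be constructed from target, offered as an assumption and not necessarily the construction used in the book's template, is to paste the target name onto the right-hand side "~ ." and convert the string with stats::formula().

```r
# Assumed construction of form; the template may define it differently.

target <- "rain_tomorrow"
form   <- stats::formula(paste(target, "~ ."))

# The left-hand side is the target; the dot stands for all other columns
# of whatever dataset is later supplied to the model builder.

all.vars(form)[1]
```

Building the formula from target means that changing the target variable in one place carries through to every model that uses form.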

form
## rain_tomorrow ~ .
ds  %>% sample_frac()
## # A tibble: 191,431 x 24
##    date       location         min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>               <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2018-09-28 MountGinini           4       13        0          NA       NA  
##  2 2014-10-31 Richmond             11.1     37        0          NA       NA  
##  3 2010-06-11 Adelaide              9.6     16.1      1.4         0.8      3.9
##  4 2012-03-25 Cairns               23.8     32.6      0           6.6      2.2
##  5 2018-06-21 Witchcliffe          NA       17.8     NA          NA       NA  
##  6 2011-07-22 MelbourneAirport      5       14.4      1           0.2      8.8
##  7 2019-06-04 Williamtown           7       12.4      0.4        NA       NA  
##  8 2014-04-03 NorahHead            19.4     27        0          NA       NA  
##  9 2017-08-12 Nhil                  8.5     16.1      0.4        NA       NA  
## 10 2017-05-25 Nuriootpa             5.9     18.8      0.1         2       NA  
## # … with 191,421 more rows, and 17 more variables: wind_gust_dir <ord>,
## #   wind_gust_speed <dbl>, wind_dir_9am <ord>, wind_dir_3pm <ord>,
## #   wind_speed_9am <dbl>, wind_speed_3pm <dbl>, humidity_9am <int>,
## #   humidity_3pm <int>, pressure_9am <dbl>, pressure_3pm <dbl>,
## #   cloud_9am <int>, cloud_3pm <int>, temp_9am <dbl>, temp_3pm <dbl>,
## #   rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.