15.1 Machine Learning Setup

20200514

Packages used in this chapter include magrittr (Bache and Wickham 2020) and rattle (G. Williams 2020). The code also relies on janitor for clean_names() and dplyr for sample_frac().

Packages are loaded into the currently running R session from your local library directories on disk. Missing packages can be installed using utils::install.packages() within R. On Ubuntu, for example, R packages can also be installed using $ wajig install r-cran-<pkgname>.
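As a minimal sketch, assuming the package list below, only the packages that are not yet installed can be identified and then installed. The helpers missing_packages() and install_missing() are hypothetical and not part of rattle or any other package. They rely on system.file(package = ) returning an empty string when a package cannot be found on .libPaths().

```r
# Hypothetical helpers (not provided by rattle or any other package).

# Return the subset of pkgs that are not installed in any library on .libPaths().
# system.file(package = p) returns "" when p cannot be found.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(found)]
}

# Install only the missing packages from the repositories in getOption("repos").
install_missing <- function(pkgs) {
  miss <- missing_packages(pkgs)
  if (length(miss) > 0) utils::install.packages(miss)
  invisible(miss)
}

pkgs <- c("magrittr", "rattle", "janitor", "dplyr")
missing_packages(pkgs)   # Lists whichever of these are not yet installed.
```

Calling install_missing(pkgs) then installs just those packages, leaving already installed ones untouched.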

# Load required packages from local library into the R session.

library(magrittr)     # Data pipelines: %>% %<>% %T>% equals().
library(rattle)       # Dataset: weatherAUS.
library(janitor)      # Cleanup: clean_names().
library(dplyr)        # Wrangling: sample_frac().

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

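dsname <- "weatherAUS"
ds     <- get(dsname)

nobs   <- nrow(ds)

vnames <- names(ds)                        # Original variable names.
ds    %<>% clean_names(numerals="right")   # Normalise names to snake_case.
names(vnames) <- names(ds)                 # Index original names by clean names.

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
form   <- formula(paste(target, "~ ."))    # Model the target on all other variables.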
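To see what the last few template lines do, here is a toy illustration in base R using a handful of weatherAUS-style names. The short vectors are made up for illustration, and rev(unique(c(target, vars))) is simply the nested-call form of the magrittr pipeline. The named vector vnames maps each cleaned name back to its original, while the vars idiom moves the target to the end and reverses the order of the remaining variables.

```r
# Toy stand-ins for names(ds) before and after clean_names(numerals = "right").
original <- c("MinTemp", "Rainfall", "Humidity9am", "RainTomorrow")
cleaned  <- c("min_temp", "rainfall", "humidity_9am", "rain_tomorrow")

# Lookup table: original names indexed by cleaned names, as vnames in the template.
vnames <- original
names(vnames) <- cleaned
vnames[["humidity_9am"]]    # "Humidity9am"

# Put the target first, drop its later duplicate, then reverse:
# the target ends up last and the other variables appear in reverse order.
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, cleaned)))
vars                        # "humidity_9am" "rainfall" "min_temp" "rain_tomorrow"
```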

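The variable form holds the formula describing the model to be built in this chapter: the target rain_tomorrow is modelled on all other variables of ds, which is what the dot on the right-hand side denotes.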
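A small sketch, assuming a toy data frame in place of ds, of how such a formula can be built from the target name and what the dot stands for. Given a data frame, terms() expands the dot to every column other than the target. The column values are arbitrary.

```r
target <- "rain_tomorrow"
form   <- formula(paste(target, "~ ."))   # rain_tomorrow ~ .

# Toy data frame standing in for ds (values are arbitrary).
toy <- data.frame(
  min_temp      = c(8.7, 19.6, 7.7),
  rain_today    = factor(c("No", "No", "Yes")),
  rain_tomorrow = factor(c("No", "Yes", "No"))
)

# Expand the dot: the predictors are all columns except the target.
attr(terms(form, data = toy), "term.labels")   # "min_temp" "rain_today"
```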

form
## rain_tomorrow ~ .
ds  %>% sample_frac()
## # A tibble: 176,747 x 24
##    date       location     min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>           <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2015-09-14 Nuriootpa         8.7     18.6      0           4.9      8.8
##  2 2014-02-22 Katherine        19.6     34.5      0           7.4     NA  
##  3 2010-07-10 PearceRAAF        7.7     17.3     NA          NA        4.4
##  4 2019-12-10 Albury           15.3     36.3      0          NA       NA  
##  5 2012-05-14 Albany           10.1     19.2      7.8         3.8      6.7
##  6 2012-03-11 MountGambier      4.5     22.3      0           2.2      6.1
##  7 2008-09-06 Melbourne         7.3     18.9      0           2.6      7.3
##  8 2017-10-20 NorahHead        17.4     19.8      4          NA       NA  
##  9 2009-08-03 Melbourne         8.8     15.4      3.2         3.6      7.1
## 10 2014-04-17 MountGinini       3.2     12.4      0          NA       NA  
## # … with 176,737 more rows, and 17 more variables: wind_gust_dir <ord>,
## #   wind_gust_speed <dbl>, wind_dir_9am <ord>, wind_dir_3pm <ord>,
## #   wind_speed_9am <dbl>, wind_speed_3pm <dbl>, humidity_9am <int>,
## #   humidity_3pm <int>, pressure_9am <dbl>, pressure_3pm <dbl>,
## #   cloud_9am <int>, cloud_3pm <int>, temp_9am <dbl>, temp_3pm <dbl>,
## #   rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>
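With its default size = 1, dplyr's sample_frac() returns every row of ds in a random order, which is why the rows above mix dates and locations; newer dplyr releases supersede it with slice_sample(prop = 1). A base R sketch of the same shuffle on a toy data frame follows. The set.seed() call is added here only to make the shuffle reproducible.

```r
set.seed(42)   # Arbitrary seed, for a reproducible shuffle.

toy <- data.frame(id = 1:10, location = rep(c("Albury", "Albany"), 5))

# Sample all row indices without replacement: same rows, random order.
shuffled <- toy[sample(nrow(toy)), ]
head(shuffled, 3)
```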

References

Bache, Stefan Milton, and Hadley Wickham. 2020. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.
Williams, Graham. 2020. Rattle: Graphical User Interface for Data Science in R. https://rattle.togaware.com/.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.