19.1 Clustering Setup

20200902 Packages used in this chapter include biclust (Kaiser et al. 2020).

# Load required packages from local library into the R session.

library(biclust)      # Bicluster analysis.
library(dplyr)        # Wrangling: glimpse() group_by() print() select() mutate().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Pipelines: %>% %<>%.
library(rattle)       # Weather dataset.

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

dsname <- "weatherAUS"
ds     <- get(dsname)
    
nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
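The effect of the last few template steps is easier to see on a small example. The sketch below uses base R only and a hypothetical four-column stand-in for weatherAUS; the cleaned names are assigned by hand rather than by clean_names(), so it illustrates the idea and does not reproduce the template exactly.

```r
# Toy stand-in for weatherAUS with its original mixed-case column names.
# The columns and values here are illustrative assumptions.
ds <- data.frame(Date         = as.Date("2020-09-02"),
                 Location     = "Canberra",
                 RainToday    = "No",
                 RainTomorrow = "Yes")

# Remember the original names before they are normalised.
vnames <- names(ds)

# Stand-in for clean_names(): snake_case names assigned by hand.
names(ds) <- c("date", "location", "rain_today", "rain_tomorrow")

# Index the original names by the cleaned names.
names(vnames) <- names(ds)
vnames["rain_tomorrow"]

# Put the target first, drop its duplicate, then reverse the order.
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, names(ds))))
vars
```

After these steps vnames maps each cleaned name back to its original (rain_tomorrow to RainTomorrow), and the target is the last element of vars with the remaining variables in reverse column order.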

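It is always useful to remind ourselves of the dataset with a random sample. Here sample_frac() shuffles all of the rows and select() keeps date, location, and five randomly chosen columns: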

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 191,431 x 7
##    date       location     sunshine wind_speed_9am rain_tomorrow humidity_9am
##    <date>     <chr>           <dbl>          <dbl> <fct>                <int>
##  1 2017-12-24 MountGambier     NA               28 No                      52
##  2 2014-10-06 Woomera           5.6             31 No                      15
##  3 2021-02-08 Wollongong       NA               31 Yes                     75
##  4 2014-09-22 Albany            9               24 No                      54
##  5 2020-05-24 MountGambier     NA               19 No                      77
##  6 2009-09-05 GoldCoast        NA               17 No                      97
##  7 2014-11-16 Newcastle        NA                4 No                      77
##  8 2011-12-24 Hobart            7.2              7 Yes                     72
##  9 2014-10-06 Portland          1.7             13 Yes                     90
## 10 2009-04-19 CoffsHarbour      9.5             33 Yes                     60
## # … with 191,421 more rows, and 1 more variable: temp_9am <dbl>
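Because both the rows and the columns are chosen at random, the output above will differ on every run. A minimal sketch of a repeatable variant follows, assuming the built-in iris dataset as a stand-in and an arbitrary seed. It uses slice_sample(), which supersedes sample_frac() from dplyr 1.0.0 onward.

```r
library(dplyr)

# Fix the random number generator so the same rows and columns are
# chosen on every run. The seed value is an arbitrary choice.
set.seed(42)

# Built-in iris data as a small stand-in for weatherAUS.
toy <- as_tibble(iris)

# Draw five random rows, then keep Species plus two of the four
# numeric columns chosen at random by position.
smp <- toy %>%
  slice_sample(n = 5) %>%
  select(Species, sample(1:4, 2))

smp
```

Drawing a fixed number of rows with n= avoids shuffling the whole dataset when only a handful of rows will be printed.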


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.