19.1 Clustering Setup

THIS SECTION IS UNDER DEVELOPMENT. PLEASE CHECK BACK LATER

20200902 The R packages used in this chapter include biclust (Kaiser et al. 2023).

# Load required packages from local library into the R session.

library(biclust)      # Bicluster analysis.
library(dplyr)        # Wrangling: glimpse() group_by() print() select() mutate().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Pipelines: %<>%.
library(rattle)       # Weather dataset.

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

dsname <- "weatherAUS"
ds     <- get(dsname)
    
nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
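
To see what these template variables end up holding, the following minimal sketch repeats the same steps on a small made-up data frame. The data frame toy and its column names are hypothetical, and the names are cleaned by hand rather than with clean_names(), so that only base R is needed.

```r
# A hypothetical three-column data frame standing in for weatherAUS.
toy <- data.frame(MinTemp      = c(16.9, 5.0),
                  RainToday    = c("No", "Yes"),
                  RainTomorrow = c("No", "No"))

# Remember the original names, then normalise them by hand
# (the chapter itself uses clean_names() for this step).
vnames        <- names(toy)
names(toy)    <- c("min_temp", "rain_today", "rain_tomorrow")
names(vnames) <- names(toy)

# vnames now maps each cleaned name back to its original name.
vnames["min_temp"]

# Put the target first, drop its duplicate, then reverse the vector.
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, names(toy))))
vars
```

In this sketch the target ends up as the last element of vars, and the remaining variables appear in the reverse of their column order.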

It is always useful to remind ourselves of the dataset with a random sample:

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 226,868 × 7
##    date       location     wind_speed_3pm min_temp humidity_9am evaporation
##    <date>     <chr>                 <dbl>    <dbl>        <int>       <dbl>
##  1 2015-03-22 Cobar                    11     16.9           56         8.6
##  2 2019-05-15 Bendigo                  19      5             97        NA  
##  3 2020-02-16 NorahHead                28     20.4           89        NA  
##  4 2022-02-08 CoffsHarbour             17     17.3           62        NA  
##  5 2013-03-24 Tuggeranong              19      5.4           85        NA  
##  6 2019-10-27 Adelaide                 15     11.6           58        NA  
....
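
Because both the rows and the columns are chosen at random, each run of the command above shows a different sample. If a repeatable sample is wanted, one option is to fix the random seed first. The following is a minimal base R sketch of that idea, assuming a hypothetical data frame toy; it shuffles the rows with sample() rather than sample_frac(), and the seed value 42 is arbitrary.

```r
# A hypothetical data frame: two identifying columns and four measurements.
toy <- data.frame(date     = as.Date("2020-01-01") + 0:4,
                  location = c("Cobar", "Bendigo", "Adelaide", "Cobar", "Bendigo"),
                  min_temp = c(16.9, 5.0, 11.6, 20.4, 5.4),
                  max_temp = c(30.1, 14.2, 22.0, 33.5, 15.8),
                  rainfall = c(0.0, 2.4, 0.0, 0.2, 6.8),
                  humidity = c(56, 97, 58, 62, 85))

set.seed(42)  # Fix the random number generator so the result repeats.

# Shuffle all rows, as sample_frac() does with its default size of 1.
shuffled <- toy[sample(nrow(toy)), ]

# Keep the two identifying columns plus two randomly chosen others.
picked <- sample(3:ncol(toy), 2)
shuffled[, c(1, 2, picked)]
```

Calling set.seed() with the same value before the sampling step should reproduce the same rows and columns on each run.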

References

Kaiser, Sebastian, Rodrigo Santamaria, Tatsiana Khamiakova, Martin Sill, Roberto Theron, Luis Quintales, Friedrich Leisch, Ewoud De Troyer, and Sami Leon. 2023. biclust: BiCluster Algorithms.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0