3.8 Pipeline

20210103 Suppose we want to produce a base::summary() of a selection of numeric variables. Using the pipe operator %>%, we can pass the output of dplyr::select() into base::summary() as its first argument.

# Select variables from the dataset and summarise the result.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  summary()
##     min_temp        max_temp        rainfall          sunshine    
##  Min.   :-8.70   Min.   :-4.10   Min.   :  0.000   Min.   : 0.00  
##  1st Qu.: 7.50   1st Qu.:18.10   1st Qu.:  0.000   1st Qu.: 4.90  
##  Median :12.00   Median :22.80   Median :  0.000   Median : 8.50  
##  Mean   :12.15   Mean   :23.36   Mean   :  2.241   Mean   : 7.66  
##  3rd Qu.:16.90   3rd Qu.:28.40   3rd Qu.:  0.600   3rd Qu.:10.60  
##  Max.   :33.90   Max.   :48.90   Max.   :474.000   Max.   :14.50  
##  NA's   :2349    NA's   :2105    NA's   :4318      NA's   :93859
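
The pipe passes the value on its left as the first argument of the function on its right, so a pipeline like the one above should be equivalent to a nested function call. The following is a minimal sketch of that equivalence, assuming a small made-up tibble named toy (hypothetical, standing in for ds) that shares the same column names; the values are invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for ds; the values are made up.
toy <- tibble(
  min_temp = c(7.5, 12.0, NA),
  max_temp = c(18.1, 22.8, 28.4),
  rainfall = c(0.0, 0.6, 3.2),
  sunshine = c(4.9, NA, 10.6),
  location = c("a", "b", "c")
)

# Piped form: read left to right, one step per line.
piped <- toy %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  summary()

# Nested form: the same computation, read from the inside out.
nested <- summary(select(toy, min_temp, max_temp, rainfall, sunshine))

# Compare the two results.
identical(piped, nested)
```

The piped form tends to be easier to read and extend as more steps are added, which is why it is used throughout this section.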

Perhaps we would like to review only those observations with at least 1mm of rain on the day of the observation. To do so we dplyr::filter() the observations.

# Select specific variables and observations from the dataset.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall >= 1)
## # A tibble: 39,122 x 4
##    min_temp max_temp rainfall sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1     17.5     32.3      1         NA
##  2     13.1     30.1      1.4       NA
##  3     15.9     21.7      2.2       NA
##  4     15.9     18.6     15.6       NA
##  5     12.6     21        3.6       NA
##  6     13.5     22.9     16.8       NA
##  7     11.2     22.5     10.6       NA
##  8     12.5     24.2      1.2       NA
##  9     18.8     35.2      6.4       NA
## 10     14.6     29        3         NA
## # … with 39,112 more rows

This sequence of functions operating on the original rattle::weatherAUS dataset returns the four selected variables for just those observations that have at least 1mm of rain.
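
One detail worth keeping in mind is that dplyr::filter() keeps only the rows for which the condition is TRUE, so observations with a missing rainfall value are dropped along with the dry days. The sketch below illustrates this on a small made-up tibble (toy is hypothetical and is not the ds used above). It also spells out the dplyr:: prefix so that the call cannot be confused with stats::filter(), which is an unrelated function for time series.

```r
library(dplyr)

# Hypothetical toy data; the third observation has a missing rainfall.
toy <- tibble(
  min_temp = c(17.5, 13.1, 9.8, 11.2),
  max_temp = c(32.3, 30.1, 21.7, 22.5),
  rainfall = c(1.0, 0.2, NA, 10.6),
  sunshine = c(NA, 8.5, 4.9, NA)
)

# Keep the four variables, then only the observations with at least 1mm of rain.
wet <- toy %>%
  dplyr::select(min_temp, max_temp, rainfall, sunshine) %>%
  dplyr::filter(rainfall >= 1)

# The dry day (0.2) and the missing value (NA) are both removed.
nrow(wet)
```

If observations with missing rainfall should be retained, the condition would need to say so explicitly, for example with is.na(rainfall) | rainfall >= 1.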



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.