3.11 Pipeline Syntactic Sugar

For the technically minded, what is actually happening here is a change of syntax (i.e., how we write the sentences). R translates the pipeline we write into the traditional functional expression, so we are free to write the form that is easier to read. This is important because we write our code for others (and ourselves later on) to read.
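As a minimal sketch of this translation, assuming the `%>%` operator comes from the magrittr package (which dplyr re-exports), the value on the left of the pipe is supplied as the first argument of the function call on the right. The values used here are made up for illustration only.

```r
library(magrittr)

# Hypothetical input, for illustration only.
x <- c(1, 4, 9)

# Pipeline form: read left to right.
piped <- x %>% sqrt() %>% sum()

# Functional form: read inside out.
nested <- sum(sqrt(x))

# Extra arguments stay in place; the piped value becomes the first argument.
rounded_piped  <- 3.14159 %>% round(2)
rounded_nested <- round(3.14159, 2)

# Both comparisons check that the two forms give the same result.
identical(piped, nested)
identical(rounded_piped, rounded_nested)
```

Each stage of the pipeline simply becomes one more layer of nesting in the functional form.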

The pipeline below combines a series of commands to operate on the dataset.

# Summarise observations with little or no rainfall.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()
##     min_temp        max_temp       rainfall          sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
##  NA's   :569     NA's   :612                      NA's   :70700

Contrast this with the functional construct below, which is what R maps the pipeline into and how we might traditionally have written it. The nested form has to be read from the inside out, starting with select() and ending with summary(), so for many of us it takes quite a bit of effort to parse and to understand what it is doing. The pipeline alternative above provides a clearer narrative.

# Functional form equivalent to the pipeline above.

summary(filter(select(ds, 
                      min_temp, max_temp, rainfall, sunshine),
               rainfall < 1))
##     min_temp        max_temp       rainfall          sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
##  NA's   :569     NA's   :612                      NA's   :70700
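The equivalence of the two forms can be checked directly. The sketch below assumes the dplyr package is installed and uses a small made-up data frame, here called `toy`, as a hypothetical stand-in for `ds`. Its values are invented for illustration and are not taken from the dataset above.

```r
library(dplyr)

# Hypothetical stand-in for ds; the values are invented for illustration.
toy <- data.frame(
  min_temp = c(5.1, 12.3, 9.8, 15.0),
  max_temp = c(18.2, 25.6, 21.4, 30.1),
  rainfall = c(0.0, 3.2, 0.4, 0.0),
  sunshine = c(9.5, 2.1, NA, 11.3)
)

# Pipeline form.
piped <- toy %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()

# Functional form.
nested <- summary(filter(select(toy,
                                min_temp, max_temp, rainfall, sunshine),
                         rainfall < 1))

# Compare the two summaries.
identical(piped, nested)
```

Because both forms describe the same computation, the choice between them is purely about readability.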

Anything that improves the readability of our code is useful. Computers are quite capable of doing the hard work of transforming a simpler sentence into this much more complex-looking sentence for their own purposes. For our purposes, let’s keep it simple for others to follow.



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.