Data Science Desktop Survival Guide
by Graham Williams

Pipeline Syntactic Sugar

20210103 For the technically minded, what is actually happening here is that R maps the pipeline syntax (i.e., how we write the sentences) into the traditional functional expression. The pipeline form exists to make the code easier to read. This is important because we write our code for others (and ourselves later on) to read. The pipeline below combines a series of commands to operate on the dataset.

# Summarise observations with little or no rainfall.

ds %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()
##     min_temp        max_temp       rainfall          sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
....

Contrast this with how it is mapped by R into the functional construct below, which is how we might have traditionally written it. For many of us it will take quite a bit of effort to parse this traditional functional form of the expression, and so to understand what it is doing. The pipeline alternative above provides a clearer narrative.

# Functional form equivalent to the pipeline above.

summary(filter(select(ds,
                      min_temp, max_temp, rainfall, sunshine),
               rainfall < 1))
##     min_temp        max_temp       rainfall          sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
....
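
To check the equivalence ourselves, here is a minimal sketch that runs both forms on a tiny made-up data frame and compares the results. The name toy and its values are hypothetical stand-ins, not the book's ds, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for ds (illustrative values only).
toy <- data.frame(min_temp = c(5.1, 12.3, 8.4, 15.0),
                  max_temp = c(14.2, 25.6, 19.9, 30.1),
                  rainfall = c(0.0, 3.2, 0.4, 0.0),
                  sunshine = c(7.5, 2.1, 9.0, 11.3))

# Pipeline form.
piped <- toy %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()

# Traditional functional (nested) form.
nested <- summary(filter(select(toy,
                                min_temp, max_temp, rainfall, sunshine),
                         rainfall < 1))

# Compare the two summaries.
same <- identical(piped, nested)
print(same)
```

If the two forms really are the same computation, the comparison should report that the summaries match.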

Anything that improves the readability of our code is useful. Computers are quite capable of doing the hard work of transforming a simpler sentence into this much more complex-looking sentence for their own purposes. For our purposes, let's keep it simple for others to follow.
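
For the curious, base R's native pipe |> (available from R 4.1, and a different operator from the %>% used above) makes this transformation visible, because the parser rewrites it before anything is evaluated. The sketch below assumes R 4.1 or later. It uses subset() in place of filter() so that only base R is needed, and ds is quoted rather than evaluated, so it does not need to exist.

```r
# Capture the pipeline as an unevaluated expression (requires R >= 4.1).
piped <- quote(ds |> subset(rainfall < 1) |> summary())

# The same computation written in the traditional nested form.
nested <- quote(summary(subset(ds, rainfall < 1)))

# Printing the captured pipeline shows the nested call the parser produced.
print(piped)

# Compare the two captured expressions.
same_expr <- identical(piped, nested)
print(same_expr)
```

The %>% operator arrives at the same result by a different route: it is an ordinary function that rearranges the call when the code runs, rather than when it is parsed.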


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.