3.7 Pipe Operator

A function (Section 3.5) performs an action on input data and returns the result of that action as its output. Functions are the verbs of the language, the action words of our sentences.

As we learn new functions we will construct longer sentences that string together a sequence of verbs to deliver an outcome. This is a powerful programming concept: combining dedicated, well-designed and well-implemented functions, each focused on achieving a specific outcome.

To combine single-purpose functions into more complex operations we use pipes. Pipes will be familiar to command line users of Unix and Linux. The idea is to pass the output of one function on as the input to another function. Each function does one task well, and does it simply from a user's point of view. We can pipe together many such specialist functions to deliver complex and sophisticated data transformations in an easily accessible manner.
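As a minimal sketch of the idea, the following compares a nested call with its piped equivalent. It assumes R 4.1 or later so that the native pipe |> is available, and the vector x is invented purely for illustration.

```r
# Illustrative data only; any numeric vector would do.
x <- c(1, 4, 9)

# Nested form: read from the inside out.
nested <- sum(sqrt(x))

# Piped form: read left to right, one verb per step.
piped <- x |>
  sqrt() |>
  sum()

# Both forms compute the same value.
nested == piped
```

The piped form lists the steps in the order they are performed, which is what makes longer sequences easier to read.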

Pipes were introduced in R through the magrittr (Bache and Wickham 2020) package (%>%) and became part of base R in 2021 with version 4.1 (|>).
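The two operators can be compared side by side. This sketch assumes that the magrittr package is installed and that R is version 4.1 or later. The temps vector is made up for the example. In both cases the left-hand side is passed as the first argument of the function on the right, so round(1) receives the mean as its first argument and 1 as the number of digits.

```r
library(magrittr)

# Illustrative values only.
temps <- c(13.4, 7.4, 12.9)

# The magrittr pipe.
with_magrittr <- temps %>%
  mean() %>%
  round(1)

# The native pipe from base R 4.1 or later.
with_native <- temps |>
  mean() |>
  round(1)

# Both are equivalent to round(mean(temps), 1).
with_magrittr == with_native
```

For simple chains like this one the two pipes behave the same way. They differ in some more advanced usage that is not covered here.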

To illustrate the concept of pipes, recall the contents of the dataset (rattle::weatherAUS):

# Review the dataset of weather observations.

ds

## # A tibble: 176,747 x 24
##    date       location min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>       <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2008-12-01 Albury       13.4     22.9      0.6          NA       NA
##  2 2008-12-02 Albury        7.4     25.1      0            NA       NA
##  3 2008-12-03 Albury       12.9     25.7      0            NA       NA
##  4 2008-12-04 Albury        9.2     28        0            NA       NA
##  5 2008-12-05 Albury       17.5     32.3      1            NA       NA
##  6 2008-12-06 Albury       14.6     29.7      0.2          NA       NA
##  7 2008-12-07 Albury       14.3     25        0            NA       NA
##  8 2008-12-08 Albury        7.7     26.7      0            NA       NA
##  9 2008-12-09 Albury        9.7     31.9      0            NA       NA
## 10 2008-12-10 Albury       13.1     30.1      1.4          NA       NA
## # … with 176,737 more rows, and 17 more variables: wind_gust_dir <ord>,
## #   wind_gust_speed <dbl>, wind_dir_9am <ord>, wind_dir_3pm <ord>,
## #   wind_speed_9am <dbl>, wind_speed_3pm <dbl>, humidity_9am <int>,
## #   humidity_3pm <int>, pressure_9am <dbl>, pressure_3pm <dbl>,
## #   cloud_9am <int>, cloud_3pm <int>, temp_9am <dbl>, temp_3pm <dbl>,
## #   rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>

We might be interested in the distribution of specific numeric variables. For that we use dplyr::select() to choose a few numeric variables, passing the dataset to it through a pipe.

# Select variables from the dataset.

ds %>%
  select(min_temp, max_temp, rainfall, sunshine)

## # A tibble: 176,747 x 4
##    min_temp max_temp rainfall sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1     13.4     22.9      0.6       NA
##  2      7.4     25.1      0         NA
##  3     12.9     25.7      0         NA
##  4      9.2     28        0         NA
##  5     17.5     32.3      1         NA
##  6     14.6     29.7      0.2       NA
##  7     14.3     25        0         NA
##  8      7.7     26.7      0         NA
##  9      9.7     31.9      0         NA
## 10     13.1     30.1      1.4       NA
## # … with 176,737 more rows

Typing ds by itself lists the whole dataset. Piping the whole dataset to dplyr::select() using the pipe %>% selects the named variables. The end result returned as the output of the pipeline is a subset of the original dataset containing just the named columns.
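To see what the pipe is doing here, the following sketch compares the piped call with an ordinary function call. It assumes dplyr is installed and uses a small hypothetical data frame, ds_small, built from the first three rows shown above, as a stand-in for the full dataset.

```r
library(dplyr)

# Hypothetical stand-in for ds: the first three rows and a few columns.
ds_small <- data.frame(
  location = c("Albury", "Albury", "Albury"),
  min_temp = c(13.4, 7.4, 12.9),
  max_temp = c(22.9, 25.1, 25.7),
  rainfall = c(0.6, 0, 0)
)

# Piped form: the dataset flows into select() as its first argument.
piped <- ds_small %>%
  select(min_temp, max_temp, rainfall)

# Direct form: the dataset is written explicitly as the first argument.
direct <- select(ds_small, min_temp, max_temp, rainfall)

# Both return the same three-column subset.
all.equal(piped, direct)
```

The benefit of the piped form grows as further steps are appended to the pipeline, since each new function simply receives the result of the previous one.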

References

Bache, Stefan Milton, and Hadley Wickham. 2020. magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.