Data Science Desktop Survival Guide
by Graham Williams

Pipe Operator

20210103 A function (Section 3.5) performs an action on input data and returns the result of that action as its output. Functions are the verbs of the language: the action words of our sentences. As we learn new functions we will construct longer sentences that string together a sequence of verbs to deliver the desired outcome. This is a powerful programming concept, combining dedicated, well designed and well implemented functions, each focused on achieving a specific outcome.

To combine single-focus functions into more complex operations we use pipes. Pipes will be familiar to command line users of Unix and Linux. The idea is to pass the output of one function on as the input to another function. Each function does one task well, accurately, and simply from a user's point of view. We can pipe together many such specialist functions to deliver complex and sophisticated data transformations in an easily accessible manner.

Pipes were introduced to R through the magrittr package (%>%) and became part of base R (|>) with R 4.1.0 in 2021. To illustrate the concept of pipes recall the contents of the dataset (rattle::weatherAUS):
# Review the dataset of weather observations.

ds
## # A tibble: 176,747 x 24
##    date       location min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>       <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2008-12-01 Albury       13.4     22.9      0.6          NA       NA
##  2 2008-12-02 Albury        7.4     25.1      0            NA       NA
....

We might be interested in the distribution of specific numeric variables. For that we will dplyr::select() a few numeric variables using a pipe.

# Select variables from the dataset.

ds %>%
  select(min_temp, max_temp, rainfall, sunshine)
## # A tibble: 176,747 x 4
##    min_temp max_temp rainfall sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1     13.4     22.9      0.6       NA
##  2      7.4     25.1      0         NA
....

Typing ds by itself lists the whole dataset. Piping the whole dataset to dplyr::select() using the pipe operator (%>%) passes the dataset in as the first argument of that function. The end result returned as the output of the pipeline is a subset of the original dataset containing just the named columns.
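
To make the "output of one function becomes the input of the next" idea concrete, the following minimal sketch compares a nested call with the equivalent pipeline. It is an illustration only: it assumes R 4.1.0 or later for the base pipe (|>), and it uses the built-in airquality data frame and base functions as stand-ins for ds and dplyr::select(), so that it runs without any extra packages. The names aq, nested and piped are introduced here purely for the example.

```r
# Assumption: the built-in airquality data frame stands in for ds.
aq <- airquality

# Nested form: the calls are read from the inside out.
nested <- head(subset(aq, Temp > 80, select = c(Ozone, Temp, Wind)), 3)

# Piped form (base pipe, R >= 4.1.0): the same steps read left to right.
# Each |> passes the left-hand result as the first argument of the next call.
piped <- aq |>
  subset(Temp > 80, select = c(Ozone, Temp, Wind)) |>
  head(3)

# Both forms should produce exactly the same data frame.
identical(nested, piped)
```

For a simple chain like this, replacing |> with the magrittr pipe %>% (after loading magrittr or dplyr) should behave the same way. The piped form tends to be easier to extend, since another step is added by appending one more line.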


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.