Data Science Desktop Survival Guide
by Graham Williams


Pipeline Identity Operator

A handy trick when building a pipeline is to use what is effectively an identity operator. An identity operator simply takes the data being communicated through the pipeline and passes it on, unchanged, to the next operator in the pipeline. An effective identity operator is constructed with the syntax {.}. In R terms this is a compound statement containing just the period, where the period represents the data.

Why is this useful? Whilst we are building our pipeline, one line at a time, we will want to put a pipe at the end of each line, but of course we cannot do so if there is no following operator. Also, whilst debugging a pipeline, we may want to execute only a part of it, and so the identity operator is handy there too.

As a typical scenario we might be in the process of building a pipeline as here and find it useful to include the identity operator {.} after the pipe that ends the line of the dplyr::select() operation:

ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  {.}
## # A tibble: 176,747 x 4
##    rainfall min_temp max_temp sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1      0.6     13.4     22.9       NA
##  2      0        7.4     25.1       NA
##  3      0       12.9     25.7       NA

We then add the next operation into the pipeline without having to modify any of the code already present:

ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  summary() %>%
  {.}
##     rainfall          min_temp        max_temp        sunshine    
##  Min.   :  0.000   Min.   :-8.70   Min.   :-4.10   Min.   : 0.00  
##  1st Qu.:  0.000   1st Qu.: 7.50   1st Qu.:18.10   1st Qu.: 4.90  
##  Median :  0.000   Median :12.00   Median :22.80   Median : 8.50  
##  Mean   :  2.241   Mean   :12.15   Mean   :23.36   Mean   : 7.66  
##  3rd Qu.:  0.600   3rd Qu.:16.90   3rd Qu.:28.40   3rd Qu.:10.60  
##  Max.   :474.000   Max.   :33.90   Max.   :48.90   Max.   :14.50  
##  NA's   :4318      NA's   :2349    NA's   :2105    NA's   :93859
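
The debugging use mentioned earlier can be sketched in the same way. This is only an illustration under stated assumptions: the built-in mtcars dataset stands in for ds, which is not defined in this section, and the variable name partial and the disabled summarise() step are hypothetical. Because the pipeline ends in {.}, a step can be commented out without touching the pipes on the other lines.

```r
library(dplyr)  # provides %>%, select(), filter(), summarise()

# mtcars is a stand-in dataset (assumption); the original ds is not available here.
partial <- mtcars %>%
  select(mpg, cyl, wt) %>%
  filter(cyl == 4) %>%
  # summarise(mean_mpg = mean(mpg)) %>%   # temporarily disabled while debugging
  {.}

# The pipeline still parses and returns the filtered rows.
dim(partial)
```

Re-enabling the step is just a matter of removing the comment marker; no other line of the pipeline changes.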

And so on. Whilst this appears a minor convenience, it becomes a handy trick over time as we build more pipelines.
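
To check that {.} really does pass the data through untouched, we can compare a pipeline with and without it. This is a minimal sketch, again assuming the built-in mtcars dataset in place of ds; the names with_identity and without_identity are hypothetical.

```r
library(dplyr)  # provides %>% and select()

# Same pipeline twice: once ending in the identity operator, once without it.
with_identity <- mtcars %>%
  select(mpg, cyl, wt) %>%
  {.}

without_identity <- mtcars %>%
  select(mpg, cyl, wt)

# TRUE when the two results are exactly the same object contents.
identical(with_identity, without_identity)
```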

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.