3.11 Pipeline Identity Operator
A handy trick when building a pipeline is to use what is effectively
the identity operator. An identity operator simply takes the data being
communicated through the pipeline and passes it on, unchanged, to the
next operator in the pipeline. An effective identity operator is
constructed with the syntax {.}. In R terms this is a compound
statement containing just the period, where the period represents the
data. Effectively this is an operator that passes the data through
without processing it: an identity operator.
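As a quick check of the idea, the following minimal sketch pipes a small vector through {.} and compares the result with the input. It assumes the %>% pipe from magrittr; the vector x is a hypothetical example and not part of the original text.

```r
library(magrittr)

# Hypothetical example data.
x <- c(3, 1, 2)

# A right-hand side wrapped in braces is evaluated as an expression
# in which the dot stands for the incoming data, so {.} simply
# returns whatever was piped in.
y <- x %>% {.}

# Compare the piped result with the original input.
identical(x, y)
```

The comparison should report that the two objects are identical, which is what we expect of an identity operator.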
Why is this useful? Whilst we are building our pipeline, one line at a time, we will want to put a pipe at the end of each line, but of course we cannot do so if there is no following operator. Also, whilst debugging a pipeline, we may want to execute only a part of it, and so the identity operator is handy there too.
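The debugging case might look something like the following sketch. The tibble ds, its values, and the commented-out steps are all hypothetical, chosen only to illustrate the pattern. Because every line ends with a pipe and the pipeline terminates with {.}, later steps can be commented out without touching the lines that remain.

```r
library(dplyr)

# Hypothetical stand-in data; the values are made up for illustration.
ds <- tibble(rainfall = c(0.6, 0, 1.4),
             min_temp = c(13.4, 7.4, 13.1),
             max_temp = c(22.9, 25.1, 30.1))

# Only the filter() step runs. The later steps are commented out,
# and the trailing pipe on the filter() line is still valid because
# {.} follows it.
res <- ds %>%
  filter(rainfall > 0) %>%
  # mutate(range = max_temp - min_temp) %>%
  # arrange(desc(range)) %>%
  {.}

res
```

Without the terminating {.} we would also need to remove the pipe at the end of the filter() line, and then restore it again when re-enabling the later steps.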
As a typical scenario we might be in the process of building a
pipeline and find it convenient to include the %>% at the end of the
line of the dplyr::select() operation, with {.} on the following line
to terminate the pipeline:
## # A tibble: 226,868 × 4
## rainfall min_temp max_temp sunshine
## <dbl> <dbl> <dbl> <dbl>
## 1 0.6 13.4 22.9 NA
## 2 0 7.4 25.1 NA
## 3 0 12.9 25.7 NA
## 4 0 9.2 28 NA
## 5 1 17.5 32.3 NA
## 6 0.2 14.6 29.7 NA
## 7 0 14.3 25 NA
## 8 0 7.7 26.7 NA
## 9 0 9.7 31.9 NA
## 10 1.4 13.1 30.1 NA
## # ℹ 226,858 more rows
We then add the next operation into the pipeline without having to modify any of the code already present:
## rainfall min_temp max_temp sunshine
## Min. : 0.000 Min. :-8.70 Min. :-4.10 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 7.50 1st Qu.:17.90 1st Qu.: 4.90
## Median : 0.000 Median :11.90 Median :22.60 Median : 8.50
## Mean : 2.348 Mean :12.09 Mean :23.21 Mean : 7.63
## 3rd Qu.: 0.600 3rd Qu.:16.80 3rd Qu.:28.20 3rd Qu.:10.60
## Max. :474.000 Max. :33.90 Max. :48.90 Max. :14.50
## NA's :6775 NA's :3800 NA's :3630 NA's :132637
And so on. Whilst it appears quite a minor convenience, over time as we build more pipelines, this becomes quite a handy trick.
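The two steps of this scenario can be sketched as follows. The code that produced the output above is not shown here, so this is an assumed reconstruction of its general shape: the tibble ds and its values are hypothetical, and only the column names are taken from the output.

```r
library(dplyr)

# Hypothetical stand-in data; only the column names come from the
# output shown above, and the date column is an invented extra.
ds <- tibble(date     = as.Date("2020-01-01") + 0:2,
             rainfall = c(0.6, 0, 1.4),
             min_temp = c(13.4, 7.4, 13.1),
             max_temp = c(22.9, 25.1, 30.1),
             sunshine = c(NA, 8.5, 10.2))

# Step 1: the select() line already ends with a pipe, and {.}
# terminates the pipeline.
step1 <- ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  {.}

# Step 2: the new operation is inserted as its own line above {.},
# leaving the existing lines unchanged.
step2 <- ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  summary() %>%
  {.}

step1
step2
```

Each further operation is added in the same way, as a new line ending with a pipe placed just above the {.}.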
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0