Data Science Desktop Survival Guide
by Graham Williams

Tee Pipe Use Case: Load all CSV Files

An example of the tee operator (%T>%) in use: load all .csv.gz files in the data folder into a single data frame, printing the path of each file as it is loaded to monitor progress.

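library(magrittr)  # Provides %>% and %T>%.

fpath <- "data"
files <- dir(fpath, pattern="\\.csv\\.gz$")  # pattern is a regular expression, not a glob.
ds <- data.frame()
for (i in seq_along(files))
{
  ds <- fpath %>%
    file.path(files[i]) %T>%
    cat("\n") %>%              # Print the path, then pass it on unchanged.
    readr::read_csv() %>%
    rbind(ds, .)
}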
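
Growing a data frame with rbind() inside a loop copies the accumulated data on every iteration. As a minimal sketch of the same tee-pipe idea without that cost, the following reads each file through lapply() and binds the results once at the end. It assumes base read.csv() in place of readr::read_csv() and message() in place of cat(). Two small files written to a temporary folder stand in for the data folder, and the column names id and x are made up for illustration.

```r
library(magrittr)

# Hypothetical sample data: two small gzipped CSV files in a temporary folder.
fpath <- tempfile("tee-demo-")
dir.create(fpath)
for (nm in c("a", "b"))
{
  con <- gzfile(file.path(fpath, paste0(nm, ".csv.gz")), "w")
  write.csv(data.frame(id=1:2, x=c(0.5, 1.5)), file=con, row.names=FALSE)
  close(con)
}

# Read every file, reporting each path via the tee pipe, then bind once.
ds <- fpath %>%
  list.files(pattern="\\.csv\\.gz$", full.names=TRUE) %>%
  lapply(function(f) f %T>% message() %>% read.csv()) %>%
  do.call(rbind, .)

dim(ds)
```

Inside the anonymous function, message() is called only for its side effect. The tee pipe passes the path f, not the value returned by message(), on to read.csv().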

The tee pipe is also useful when combined with an on-the-fly function, which is introduced with curly braces. Within such a function we can use the global assignment operator (<<-) to define variables that are then available later in the pipeline. The following example is a little contrived but illustrates the idea. The on-the-fly function finds the maximum of max_temp within each group defined by wind_gust_dir, arranges the wind directions by that maximum, and saves the resulting order to the global variable lvls. Because the function is attached with the tee pipe, the filtered data frame (not the function's result) flows on to mutate(), which uses lvls to reorder the levels of wind_gust_dir. This is typically done within a pipeline that feeds into a plot, where we want the order of the bars in a bar plot, for example, to carry some meaning.

# List the levels of a factor.

levels(ds$wind_gust_dir)
##  [1] "N"   "NNE" "NE"  "ENE" "E"   "ESE" "SE"  "SSE" "S"   "SSW" "SW"  "...
## [13] "W"   "WNW" "NW"  "NNW"

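library(dplyr)     # filter(), select(), group_by(), summarise(), arrange(), pull(), mutate().
library(magrittr)  # %T>% and %$%.

ds %>%
  filter(rainfall>0) %T>%
  {
    lvls <<- select(., wind_gust_dir, max_temp) %>%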
      group_by(wind_gust_dir) %>%
      summarise(Maxmax_temp=max(max_temp)) %>%
      arrange(Maxmax_temp) %>%
      pull(wind_gust_dir)
  } %>%
  mutate(wind_gust_dir=factor(wind_gust_dir, levels=lvls)) %$%
  levels(wind_gust_dir)
## `summarise()` ungrouping output (override with `.groups` argument)
##  [1] "SW"  "NNE" "N"   "NE"  "ENE" "E"   "ESE" "SE"  "SSE" "S"   "SSW" "...
## [13] "W"   "WNW" "NW"  "NNW"
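
The same pattern can be tried without the weather data. The sketch below is a stand-in under stated assumptions: it uses the built-in mtcars dataset and base R functions (subset(), tapply(), transform()) rather than dplyr, with cyl playing the role of wind_gust_dir and mpg the role of max_temp. The threshold hp > 60 is arbitrary.

```r
library(magrittr)

result <- mtcars %>%
  subset(hp > 60) %T>%
  {
    # Side effect only: the tee pipe discards the value of this block.
    mx <- tapply(.$mpg, .$cyl, max)   # Maximum mpg per cylinder count.
    lvls <<- names(mx)[order(mx)]     # Groups ordered by that maximum.
  } %>%
  transform(cyl=factor(cyl, levels=lvls))

levels(result$cyl)
```

With an ordinary pipe in place of %T>%, the character vector produced by the braces would flow on to transform() instead of the filtered data frame, and the pipeline would fail.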


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.