Pipes: Tee Pipe Load CSV Files

		Data Science Desktop Survival Guide by Graham Williams

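20210103 This use case loads all .csv.gz files in the data folder into a single data frame, printing the path of each file as it is loaded to monitor progress. The tee pipe (%T>%) passes the file path to cat() for its side effect and then forwards the same path, rather than the result of cat(), to the next step.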

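library(magrittr)  # Provides %>% and %T>%.

fpath <- "data"
files <- dir(fpath, pattern="\\.csv\\.gz$")  # pattern is a regular expression, not a glob.
ds    <- data.frame()
for (i in seq_along(files))
{
  fpath %>%
    file.path(files[i]) %T>%  # Tee: print the path, then pass the path itself along.
    cat("\n") %>%
    readr::read_csv() %>%
    rbind(ds, .) ->
  ds
}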
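
Growing ds with rbind() inside the loop creates a new copy of the accumulated data on every iteration, which can become slow with many files. The following is a minimal sketch of an alternative that collects the data frames in a list and binds them once. It assumes base R's read.csv() in place of readr::read_csv() so that only magrittr is needed. The folder tee_demo, the helper load_one(), and the two tiny files are hypothetical, created only so that the sketch runs on its own.

```r
library(magrittr)

# Create two small gzipped CSV files in a temporary folder (illustrative data only).
fpath <- file.path(tempdir(), "tee_demo")
dir.create(fpath, showWarnings = FALSE)
for (n in c("a", "b")) {
  con <- gzfile(file.path(fpath, paste0(n, ".csv.gz")), "w")
  write.csv(data.frame(id = 1:2, src = n), con, row.names = FALSE)
  close(con)
}

# Hypothetical helper: report the file being loaded, then read it.
# The tee pipe passes the path to cat() and forwards the path, not cat()'s result.
load_one <- function(f) {
  f %T>%
    cat("\n") %>%
    read.csv()
}

# Read every file into a list, then bind all the rows in a single step.
files <- dir(fpath, pattern = "\\.csv\\.gz$", full.names = TRUE)
ds <- files %>%
  lapply(load_one) %>%
  do.call(rbind, .)

nrow(ds)
```

Each file path is still printed as it is loaded, so progress can be monitored just as with the loop.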

The tee pipe is also useful when combined with a sub-pipeline, which is introduced with curly braces. Because the tee pipe discards the value returned by the sub-pipeline, a global assignment operator (->>) is used to save the result for later. The example is a little contrived, though illustrative. The sub-pipeline finds, for each wind gust direction, the maximum temperature recorded on any day in that group, and arranges the directions by that maximum. The ordered directions are saved in the global variable lvls. That variable is then used to mutate the data frame passed along unchanged by the tee pipe, reordering the levels of the wind_gust_dir variable. This is typically done within a pipeline that feeds into a plot, where we want the order of the bars in a bar plot, for example, to carry some meaning.

levels(ds$wind_gust_dir)

##  [1] "N"   "NNE" "NE"  "ENE" "E"   "ESE" "SE"  "SSE" "S"   "SSW" "SW"  "...
## [13] "W"   "WNW" "NW"  "NNW"

library(dplyr)     # Provides filter(), select(), group_by(), summarise(), ...
library(magrittr)  # Provides %T>% and %$%.

ds %>%
  filter(rainfall>0) %T>%
  {
    select(., wind_gust_dir, max_temp) %>%
      group_by(wind_gust_dir) %>%
      summarise(max_max_temp=max(max_temp), .groups="drop") %>%
      arrange(max_max_temp) %>%
      pull(wind_gust_dir) ->>
    lvls
  } %>%
  mutate(wind_gust_dir=factor(wind_gust_dir, levels=lvls)) %$%
  levels(wind_gust_dir)

##  [1] "SW"  "NNE" "N"   "NE"  "ENE" "E"   "ESE" "SE"  "SSE" "S"   "SSW" "...
## [13] "W"   "WNW" "NW"  "NNW"
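
The same pattern can be tried without the weather dataset. The sketch below is an assumption-laden stand-in and not the code used above: the data frame obs is made-up illustrative data, and base R functions (split(), sapply(), transform()) take the place of the dplyr verbs so that only magrittr is required. It shows why ->> is needed: the braces become a function body, so a local assignment with -> would be lost once the sub-pipeline returns.

```r
library(magrittr)

# Illustrative data only: six observations over three wind directions.
obs <- data.frame(
  wind_gust_dir = factor(c("N", "E", "S", "N", "E", "S"),
                         levels = c("N", "E", "S")),
  max_temp      = c(30, 18, 25, 22, 20, 27)
)

new_levels <- obs %T>%
  {
    # Sub-pipeline: maximum temperature per direction, sorted ascending.
    # The result is saved globally because the tee pipe discards it.
    split(.$max_temp, .$wind_gust_dir) %>%
      sapply(max) %>%
      sort() %>%
      names() ->>
    lvls
  } %>%
  # The tee pipe forwarded obs unchanged, so it can now be releveled.
  transform(wind_gust_dir = factor(wind_gust_dir, levels = lvls)) %$%
  levels(wind_gust_dir)

new_levels
## [1] "E" "S" "N"
```

The group maxima here are 20 for E, 27 for S, and 30 for N, so the levels move from the original N, E, S to E, S, N.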

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.