Data Science Desktop Survival Guide
by Graham Williams
Tee Pipe Use Case: Load all CSV Files
This use case illustrates the tee operator (%T>%). We load all .csv.gz files in the data folder into a single data frame and print a message for each file as it is loaded to monitor progress.
library(magrittr)  # Provides the %>% and %T>% operators.

fpath <- "data"
# dir() expects a regular expression, so convert the glob first.
files <- dir(fpath, pattern=glob2rx("*.csv.gz"))
ds <- data.frame()
for (i in seq_along(files)) {
  ds <- fpath %>%
    file.path(files[i]) %T>%   # Tee: print the path, then pass it on unchanged.
    cat("\n") %>%
    readr::read_csv() %>%
    rbind(ds, .)
}

The tee pipe is also useful when combined with an on-the-fly function, which is introduced with curly braces. Inside the braces we can use the global assignment operator (<<-) to define variables that can be used later in the pipeline. The following example is a little contrived but illustrates its use. The on-the-fly function finds, for each wind gust direction, the highest maximum temperature recorded on any day in that group, orders the directions by that value, and saves the order to a global variable lvls. Because the function is introduced with the tee pipe, it is the data frame itself rather than the function's result that passes on to the mutate step, which uses lvls to reorder the levels of the wind_gust_dir variable. This is typically done within a pipeline that feeds into a plot, where we want the order of the bars in a bar plot, for example, to carry some meaning.
# List the levels of a factor.
levels(ds$wind_gust_dir)
# Reorder the levels by the maximum temperature for each direction.
ds %>%
  filter(rainfall>0) %T>%
  {
    lvls <<- select(., wind_gust_dir, max_temp) %>%
      group_by(wind_gust_dir) %>%
      summarise(Maxmax_temp=max(max_temp)) %>%
      arrange(Maxmax_temp) %>%
      pull(wind_gust_dir)
  } %>%
  mutate(wind_gust_dir=factor(wind_gust_dir, levels=lvls)) %$%
  levels(wind_gust_dir)
## `summarise()` ungrouping output (override with `.groups` argument)
##  [1] "SW"  "NNE" "N"   "NE"  "ENE" "E"   "ESE" "SE"  "SSE" "S"   "SSW" "...
## [13] "W"   "WNW" "NW"  "NNW"
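
The pipeline above depends on a weather dataset ds that is not defined in this section. As a minimal sketch of the same pattern that can be run on its own, the following uses a small made-up data frame, toy, whose values are purely illustrative and are not taken from the weather data. It assumes the magrittr and dplyr packages are installed. The column name top and the result variable reordered are hypothetical names chosen for this sketch.

```r
library(magrittr)  # Provides %>%, %T>% and %$%.
library(dplyr)

# Hypothetical toy data standing in for the weather dataset.
toy <- data.frame(
  wind_gust_dir = factor(c("N", "S", "E", "N", "S", "E")),
  max_temp      = c(30, 18, 25, 22, 21, 19),
  rainfall      = c(1.2, 0.4, 3.0, 0.0, 2.2, 0.6)
)

reordered <- toy %>%
  filter(rainfall > 0) %T>%
  {
    # Side effect only: the tee pipe discards this block's value and
    # passes the filtered data frame on to mutate().
    lvls <<- group_by(., wind_gust_dir) %>%
      summarise(top = max(max_temp), .groups = "drop") %>%
      arrange(top) %>%
      pull(wind_gust_dir) %>%
      as.character()
  } %>%
  mutate(wind_gust_dir = factor(wind_gust_dir, levels = lvls)) %$%
  levels(wind_gust_dir)

reordered
## [1] "S" "E" "N"
```

The default alphabetical levels E, N, S are replaced by an ordering from the coolest to the warmest peak temperature on rainy days. Note that <<- creates or overwrites lvls outside the pipeline, so a name that will not clash with existing variables is a sensible choice.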