Data Science Desktop Survival Guide
by Graham Williams

Pipes: Tee Pipe Load CSV Files
20210103 This use case loads all .csv.gz files in the data folder into a single data frame, printing the path of each file as it is loaded so that we can monitor progress.
fpath <- "data"
files <- dir(fpath, pattern="\\.csv\\.gz$")  # pattern is a regular expression, not a glob
ds <- data.frame()
for (i in seq_along(files))
{
  fpath %>%
    file.path(files[i]) %T>%
    cat("\n") %>%
    readr::read_csv() %>%
    rbind(ds, .) ->
  ds
}

The tee pipe can also be useful when combined with a sub-pipeline, which is introduced with curly braces. Within the sub-pipeline the global assignment operator (->>) saves a result for later use. The example is a little contrived though illustrative. The sub-pipeline calculates the order of wind gust directions based on the maximum temperature for any day within each group defined by the wind direction. This is then used to order the wind directions, saving the ordering in a global variable lvls. That variable is then used to mutate the original data frame (note the use of the tee pipe, which passes the filtered data frame on rather than the result of the sub-pipeline) to reorder the levels of the wind_gust_dir variable. This is typically done within a pipeline that feeds into a plot where we want to reorder the levels so that there is some meaning to the order of the bars in a bar plot, for example.
levels(ds$wind_gust_dir)

ds %>%
  filter(rainfall>0) %T>%
  {
    select(., wind_gust_dir, max_temp) %>%
      group_by(wind_gust_dir) %>%
      summarise(max_max_temp=max(max_temp), .groups="drop") %>%
      arrange(max_max_temp) %>%
      pull(wind_gust_dir) ->>
    lvls
  } %>%
  mutate(wind_gust_dir=factor(wind_gust_dir, levels=lvls)) %$%
  levels(wind_gust_dir)
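
The weather example above depends on a data frame ds that is not constructed here. As a minimal sketch of the same pattern that can be run as is, the following uses the built-in mtcars dataset as a stand-in (an assumption for illustration, as are the names cars, max_mpg and new_levels). The sub-pipeline orders the cylinder groups by their maximum mpg and saves that ordering in the global variable lvls, while the tee pipe passes the filtered data frame on to mutate().

```r
library(dplyr)     # filter(), group_by(), summarise(), arrange(), pull(), mutate()
library(magrittr)  # %T>% (tee pipe) and %$% (exposition pipe)

# Stand-in data (an assumption): mtcars with cyl stored as character.
cars <- mtcars %>% mutate(cyl=as.character(cyl))

cars %>%
  filter(hp>60) %T>%
  {
    # Side branch: order the cyl groups by their maximum mpg and save the
    # ordering globally. The tee pipe discards the value of this branch.
    group_by(., cyl) %>%
      summarise(max_mpg=max(mpg), .groups="drop") %>%
      arrange(max_mpg) %>%
      pull(cyl) ->>
    lvls
  } %>%
  # Main branch: receives the filtered data frame, not the side branch result.
  mutate(cyl=factor(cyl, levels=lvls)) %$%
  levels(cyl) ->
new_levels

print(new_levels)
```

Converting cyl to a factor without the levels= argument would order the levels alphabetically. Here they instead follow the ordering computed in the side branch.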
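
Returning to the loading loop at the start of this section, it can be tried without an existing data folder. The sketch below first writes two small compressed files to a temporary folder and then loads them with the same tee pipe pattern. The folder, the file names and their contents are hypothetical and exist only for illustration. The sketch also iterates over the file names directly and passes col_types= to read_csv() to keep the output quiet, both of which are small departures from the loop above.

```r
library(magrittr)  # %>% and %T>%

# Hypothetical setup: two small gzip-compressed CSV files in a temporary
# folder. readr compresses the output based on the .gz extension.
fpath <- tempfile("tee_pipe_demo_")
dir.create(fpath)
readr::write_csv(data.frame(id=1:2, x=c(1.5, 2.5)), file.path(fpath, "a.csv.gz"))
readr::write_csv(data.frame(id=3:4, x=c(3.5, 4.5)), file.path(fpath, "b.csv.gz"))

files <- dir(fpath, pattern="\\.csv\\.gz$")
ds <- data.frame()
for (f in files)
{
  fpath %>%
    file.path(f) %T>%
    cat("\n") %>%   # prints the path; the tee pipe passes the path itself on
    readr::read_csv(col_types=readr::cols()) %>%
    rbind(ds, .) ->
  ds
}

print(ds)
```

Without the tee pipe, read_csv() would receive the NULL returned by cat() rather than the file path.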