10.3 Add Columns
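20200814
New columns are added to a dataset with dplyr::mutate(). Within a pipeline, dplyr::mutate() adds (or modifies) columns of the data as it passes through.
In the example below we add two new columns to the dataset. A tee pipe (magrittr's %T>%) is used for a side effect only: within the curly braces, dplyr::select() keeps the date, location and temperature columns, dplyr::sample_frac() (with its default size = 1) returns all rows in random order, and print() shows the first ten of them. Because the tee pipe returns its left-hand side unchanged, the full mutated dataset, not the printed selection, continues down the pipeline and is assigned to the new variable newds by the trailing ->.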
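As a minimal sketch, assuming only that the dplyr package is installed, the toy tibble below (its values are made up for illustration and are not from the weather dataset) stands in for ds. It shows mutate() computing a range column and case_when() labelling each row with the value of the first condition that is TRUE. A leading is.na() branch keeps missing readings as NA, since a comparison with NA is never TRUE and would otherwise fall through to the final TRUE branch.

```r
library(dplyr)

# A small, made-up stand-in for the weather dataset (illustrative values only).
temps <- tibble(
  location = c("A", "B", "C", "D"),
  min_temp = c(18.0, 12.5, -4.0, 2.0),
  max_temp = c(33.5, 24.0, -1.0, NA)
)

temps <- temps %>%
  mutate(
    # Daily temperature range; NA whenever either reading is missing.
    range_temp = max_temp - min_temp,
    # case_when() checks conditions top to bottom and the first TRUE wins.
    describe_temp = case_when(
      is.na(max_temp) ~ NA_character_,
      max_temp > 30   ~ "hot",
      max_temp > 20   ~ "mild",
      max_temp > 0    ~ "cold",
      TRUE            ~ "freezing"
    )
  )

print(temps)
```

Here 24.0 matches "mild" even though it also exceeds 0, because the "mild" condition is checked first.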
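library(dplyr)     # mutate(), case_when(), select(), sample_frac()
library(magrittr)  # the tee pipe %T>%

ds %>%
  mutate(range_temp    = max_temp - min_temp,
         # Missing readings stay NA rather than falling through to "freezing".
         describe_temp = case_when(is.na(max_temp) ~ NA_character_,
                                   max_temp > 30   ~ "hot",
                                   max_temp > 20   ~ "mild",
                                   max_temp > 0    ~ "cold",
                                   TRUE            ~ "freezing")) %T>%
  {
    # Side effect only: print the temperature columns with rows shuffled.
    select(., date, location, ends_with("_temp")) %>%
      sample_frac() %>%
      print()
  } ->
  newds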
## # A tibble: 226,868 × 6
## date location min_temp max_temp range_temp describe_temp
## <date> <chr> <dbl> <dbl> <dbl> <chr>
## 1 2020-06-09 NorfolkIsland 14.2 19.4 5.2 cold
## 2 2018-06-09 Brisbane 15.6 23.7 8.1 mild
## 3 2020-09-24 Watsonia 7.2 14.4 7.2 cold
## 4 2023-03-21 Richmond 16.9 23.2 6.3 mild
## 5 2018-05-14 SalmonGums 12.1 21.8 9.7 mild
## 6 2020-08-02 MountGinini 0.5 8.6 8.1 cold
## 7 2007-12-12 Canberra 11.7 21.5 9.8 mild
## 8 2017-08-24 Sydney 11.2 18 6.8 cold
## 9 2009-06-23 Ballarat 7.3 14.1 6.8 cold
## 10 2011-09-10 WaggaWagga 3 14.7 11.7 cold
## # ℹ 226,858 more rows
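The tee pipe's behaviour can be checked on a small scale. This is a sketch on made-up data, assuming the dplyr and magrittr packages are installed; it swaps sample_frac() for dplyr::slice_sample(), the current dplyr function that supersedes sample_frac() and sample_n(), and prints only two random rows. The value assigned by the trailing -> is still the complete mutated tibble.

```r
library(dplyr)
library(magrittr)

# Made-up data standing in for the weather dataset.
toy <- tibble(min_temp = c(5, 10, 15, 20, 25),
              max_temp = c(12, 19, 26, 31, 34))

toy %>%
  mutate(range_temp = max_temp - min_temp) %T>%
  {
    # Side effect only: print two randomly chosen rows.
    print(slice_sample(., n = 2))
  } ->
  toy_new

# The tee returned its left-hand side, so all 5 rows were assigned.
nrow(toy_new)
```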
To overwrite the original dataset instead of saving it as a new dataset, replace the first pipe (%>%) with the assignment pipe magrittr::%<>% and drop the trailing -> newds.
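A minimal sketch of the assignment pipe on made-up data, assuming the dplyr and magrittr packages are installed. The %<>% must be the first pipe in the chain; the tee still only prints, so the mutated data is what gets written back into toy.

```r
library(dplyr)
library(magrittr)

# Made-up data standing in for the weather dataset.
toy <- tibble(min_temp = c(5, 10), max_temp = c(12, 19))

# %<>% pipes toy through the chain and assigns the final result back to toy.
toy %<>%
  mutate(range_temp = max_temp - min_temp) %T>%
  {
    # Side effect only: print the updated data.
    print(.)
  }
```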
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0