10.3 Add Columns

20200814 Adding columns to a dataset is accomplished with dplyr::mutate(). Within a pipeline dplyr::mutate() will modify the data as it passes through.
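As a minimal sketch of the idea, the following uses a small made-up tibble (toy is a hypothetical stand-in, not the weather dataset used in this chapter) to show that dplyr::mutate() returns a new dataset with the extra column and leaves its input untouched unless the result is assigned back.

```r
library(dplyr)

# Hypothetical toy data standing in for the weather dataset.
toy <- tibble(location = c("Canberra", "Sydney"),
              min_temp = c(11.7, 11.2),
              max_temp = c(21.5, 18.0))

# mutate() returns a new tibble with the added column; toy is unchanged.
toy_range <- toy %>%
  mutate(range_temp = max_temp - min_temp)

print(toy_range)
```

New columns are appended to the right of the existing ones by default.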

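In the example below we add two new columns to the dataset: range_temp, the difference between the daily maximum and minimum temperatures, and describe_temp, a label derived from max_temp with dplyr::case_when(). The conditions in dplyr::case_when() are tested in order and the first match wins, so an observation with a missing max_temp falls through to the final TRUE case and is labelled freezing.

A tee pipe (magrittr::%T>%) is used to print a view of the resulting dataset using dplyr::select() and dplyr::sample_frac() within the curly braces. Called without arguments, dplyr::sample_frac() returns every row in random order, so the printed rows are a random selection from the full dataset. The tee pipe passes its left-hand side along unchanged, so the ongoing pipe assigns the complete dataset, with all of its columns, into a new variable.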

library(dplyr)     # mutate(), case_when(), select(), sample_frac()
library(magrittr)  # tee pipe %T>%

ds %>%
  mutate(range_temp=max_temp-min_temp,
         describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp >  0 ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    select(., date, location, ends_with("_temp")) %>%
    sample_frac() %>%
    print()
  } ->
newds

## # A tibble: 226,868 × 6
##    date       location      min_temp max_temp range_temp describe_temp
##    <date>     <chr>            <dbl>    <dbl>      <dbl> <chr>        
##  1 2020-06-09 NorfolkIsland     14.2     19.4        5.2 cold         
##  2 2018-06-09 Brisbane          15.6     23.7        8.1 mild         
##  3 2020-09-24 Watsonia           7.2     14.4        7.2 cold         
##  4 2023-03-21 Richmond          16.9     23.2        6.3 mild         
##  5 2018-05-14 SalmonGums        12.1     21.8        9.7 mild         
##  6 2020-08-02 MountGinini        0.5      8.6        8.1 cold         
##  7 2007-12-12 Canberra          11.7     21.5        9.8 mild         
##  8 2017-08-24 Sydney            11.2     18          6.8 cold         
##  9 2009-06-23 Ballarat           7.3     14.1        6.8 cold         
## 10 2011-09-10 WaggaWagga         3       14.7       11.7 cold         
## # ℹ 226,858 more rows
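To make the role of the tee pipe concrete, here is a small sketch on made-up data (toy, toy_new, and the rainfall column are hypothetical names used only for illustration). It assumes magrittr is installed. The braces print a narrow view as a side effect, while the value carried forward to the assignment is the full mutated dataset.

```r
library(dplyr)
library(magrittr)  # provides the tee pipe %T>%

# Hypothetical toy data; rainfall is not selected inside the braces.
toy <- tibble(location = c("Canberra", "Sydney", "Ballarat"),
              min_temp = c(11.7, 11.2, 7.3),
              max_temp = c(21.5, 18.0, 14.1),
              rainfall = c(0, 1.2, 0.4))

toy %>%
  mutate(range_temp = max_temp - min_temp) %T>%
  {
    # Side effect only: print location and the *_temp columns.
    select(., location, ends_with("_temp")) %>%
      print()
  } ->
toy_new

# The result of the braces was discarded, so rainfall is still present.
names(toy_new)
```

Had an ordinary pipe been used in place of the tee pipe, toy_new would instead hold whatever the braces returned, that is, only the selected columns.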

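To overwrite the original dataset instead of saving the result as a new dataset, replace the first pipe with the assignment pipe magrittr::%<>% and drop the trailing assignment to newds. The assignment pipe runs the whole pipeline and stores its result back into the variable on its left-hand side.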
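A brief sketch of the assignment pipe on a made-up tibble (toy is a hypothetical name), assuming magrittr is installed:

```r
library(dplyr)
library(magrittr)  # provides the assignment pipe %<>%

# Hypothetical toy data standing in for the weather dataset.
toy <- tibble(min_temp = c(11.7, 11.2),
              max_temp = c(21.5, 18.0))

# Equivalent to: toy <- toy %>% mutate(range_temp = max_temp - min_temp)
toy %<>%
  mutate(range_temp = max_temp - min_temp)

names(toy)
```

Because the original data is replaced, it is usually worth checking the pipeline with an ordinary pipe first and switching to the assignment pipe once the result looks right.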

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0