10.3 Add Columns

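20200814 Adding columns to a dataset is accomplished with dplyr::mutate(). Within a pipeline, dplyr::mutate() adds the new columns to the data as it passes through to the next step. The original dataset is left unchanged unless the result is assigned back to it.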
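The following is a minimal sketch of that behaviour. It assumes dplyr is installed and uses a small made-up tibble rather than the chapter's weather data. The names temps and temps2 are hypothetical. mutate() returns a copy with the extra column appended, and the input keeps its original two columns.

```r
library(dplyr)

# Hypothetical toy data; not the weather dataset used in this chapter.
temps <- tibble(
  min_temp = c(5.8, -2.3, 17.1),
  max_temp = c(19.6, 4.9, 23.5)
)

# mutate() returns a new tibble with range_temp appended;
# temps itself is not modified.
temps2 <- temps %>% mutate(range_temp = max_temp - min_temp)

names(temps2)  # "min_temp" "max_temp" "range_temp"
names(temps)   # "min_temp" "max_temp"
```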

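In the example below we add two new columns to the dataset. The first is range_temp, the difference between the maximum and minimum temperatures. The second is describe_temp, a label chosen by dplyr::case_when() from the first condition on max_temp that holds. A tee pipe (magrittr::%T>%) sends the data into the curly braces. There, dplyr::select() keeps the date, location, and temperature columns, and dplyr::sample_frac() randomly reorders the rows; with its default size = 1 it keeps all of them. As a result, print() shows a random ten rows. The tee pipe then passes the mutated dataset itself, not the printed selection, on through the pipeline, and the final -> assigns it to the new variable newds.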

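library(dplyr)     # mutate(), case_when(), select(), sample_frac()
library(magrittr)  # the tee pipe %T>%

ds %>%
  mutate(range_temp=max_temp-min_temp,
         # Conditions are tested in order and the first match wins.
         # Missing max_temp stays NA instead of falling through to "freezing".
         describe_temp=case_when(is.na(max_temp) ~ NA_character_,
                                 max_temp > 30   ~ "hot",
                                 max_temp > 20   ~ "mild",
                                 max_temp >  0   ~ "cold",
                                 TRUE            ~ "freezing")) %T>%
  {
    # Side effect only: print the rows in random order.
    select(., date, location, ends_with("_temp")) %>%
    sample_frac() %>%
    print()
  } ->
newds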
## # A tibble: 191,431 x 6
##    date       location         min_temp max_temp range_temp describe_temp
##    <date>     <chr>               <dbl>    <dbl>      <dbl> <chr>        
##  1 2014-06-18 Richmond              5.8     19.6      13.8  cold         
##  2 2014-09-05 MountGinini          -2.3      4.9       7.2  cold         
##  3 2018-06-16 Uluru                 1.9     20.6      18.7  mild         
##  4 2013-12-22 NorfolkIsland        17.1     23.5       6.40 mild         
##  5 2019-07-18 MelbourneAirport      5.6     14.4       8.8  cold         
##  6 2012-05-03 Cobar                 8.2     16.1       7.9  cold         
##  7 2009-05-11 Melbourne             8.2     17.8       9.6  cold         
##  8 2012-11-20 Launceston            6.9     20.9      14.0  mild         
##  9 2010-08-11 Woomera               5.7     16.6      10.9  cold         
## 10 2018-08-29 MountGambier          1.5     16.8      15.3  cold         
## # … with 191,421 more rows
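The same pattern can be checked on a small scale. The following sketch assumes dplyr 1.0.0 or later and magrittr. It uses a hypothetical five-row tibble named toy in place of ds, and slice_sample(n = 3), the newer dplyr function that supersedes sample_frac(), to print just three random rows. Only those three rows are printed, but all five rows continue through the tee pipe into newtoy, each labelled by the first case_when() condition that matches.

```r
library(dplyr)
library(magrittr)

# Hypothetical stand-in for ds; the values are made up.
toy <- tibble(
  location = c("A", "B", "C", "D", "E"),
  min_temp = c(5.8, -2.3, 17.1, 22.0, -8.0),
  max_temp = c(19.6, 4.9, 23.5, 35.2, -1.5)
)

toy %>%
  mutate(describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp >  0 ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    # Side effect: print 3 randomly chosen rows.
    slice_sample(., n = 3) %>% print()
  } ->
newtoy

nrow(newtoy)          # 5: the tee pipe passed on the full data
newtoy$describe_temp  # "cold" "cold" "mild" "hot" "freezing"
```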

To overwrite the original dataset instead of saving it as a new one, replace the first pipe with the assignment pipe magrittr::%<>% and drop the final -> newds assignment.
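The following is a minimal sketch of the assignment pipe on a hypothetical tibble named toy, assuming magrittr and dplyr are installed. toy %<>% f() behaves like toy <- toy %>% f(), so the new column ends up in toy itself.

```r
library(dplyr)
library(magrittr)  # %<>% is exported by magrittr, not dplyr

# Hypothetical toy data standing in for ds.
toy <- tibble(min_temp = c(5.8, -2.3), max_temp = c(19.6, 4.9))

# Pipe toy forward, then assign the final result back to toy.
toy %<>%
  mutate(range_temp = max_temp - min_temp)

names(toy)  # "min_temp" "max_temp" "range_temp"
```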



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.