Add Columns

Data Science Desktop Survival Guide by Graham Williams


20200814 Columns are added to a dataset with dplyr::mutate(). Within a pipeline, dplyr::mutate() adds the new columns, or overwrites existing columns of the same name, as the data passes through.

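In the example below we add two new columns to the dataset: range_temp, the difference between max_temp and min_temp, and describe_temp, a label derived from max_temp using dplyr::case_when(). A tee pipe (%T>%) passes the data into the curly braces, where dplyr::select() and dplyr::sample_frac() print a randomly ordered view of the date, location, and temperature columns, and then passes the unmodified data on. The closing assignment (->) stores the result in a new variable, newds.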

library(dplyr)     # mutate(), case_when(), select(), sample_frac()
library(magrittr)  # %T>% tee pipe

ds %>%
  mutate(range_temp=max_temp-min_temp,
         describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp > 0 ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    select(., date, location, ends_with("_temp")) %>%
    sample_frac() %>%
    print()
  } ->
newds

## # A tibble: 176,747 x 6
##    date       location      min_temp max_temp range_temp describe_temp
##    <date>     <chr>            <dbl>    <dbl>      <dbl> <chr>        
##  1 2015-06-28 SydneyAirport      6.9     17.6       10.7 cold         
##  2 2013-06-15 MountGambier       7.5     15.3        7.8 cold         
##  3 2010-01-23 Bendigo           14.2     27         12.8 mild         
##  4 2011-07-06 Cairns            11.4     26.6       15.2 mild         
##  5 2013-04-01 Tuggeranong        7.4     22.2       14.8 mild         
##  6 2011-03-08 Adelaide          21.4     23.2        1.8 mild         
##  7 2010-06-13 PearceRAAF         7.8     24.4       16.6 mild         
##  8 2015-08-20 Ballarat           2.5     14         11.5 cold         
##  9 2018-01-15 Canberra           8.3     26         17.7 mild         
## 10 2018-08-17 Launceston         3.5     12.7        9.2 cold         
## # ... with 176,737 more rows
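
The pipeline above relies on ds, the weather dataset prepared earlier in the guide. As a minimal sketch that can be run on its own, the same steps are applied here to a small made-up table. The four rows and their values are hypothetical and chosen only so that each dplyr::case_when() branch is exercised once.

```r
library(dplyr)     # mutate(), case_when(), select(), sample_frac()
library(magrittr)  # %T>% tee pipe

# Hypothetical stand-in for the weather dataset; the values are made up.
ds <- tibble::tibble(
  date     = as.Date(c("2020-01-01", "2020-01-02", "2020-07-01", "2020-07-02")),
  location = c("Canberra", "Sydney", "Canberra", "Hobart"),
  min_temp = c(15.0, 21.5, -4.0, -6.0),
  max_temp = c(35.0, 28.0, 11.5, -1.0)
)

ds %>%
  mutate(range_temp=max_temp-min_temp,
         # Conditions are tested top to bottom; the first match wins.
         describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp > 0  ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    # Side effect only: the tee pipe discards this result and
    # passes the mutated data on unchanged.
    select(., date, location, ends_with("_temp")) %>%
      sample_frac() %>%
      print()
  } ->
newds
```

Because the tee pipe forwards its left-hand side, newds holds all four rows in their original order with the two new columns added, while the printed view shows the rows shuffled.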

To overwrite the original dataset instead of saving the result as a new dataset, replace the first pipe with the assignment pipe magrittr::%<>% (https://www.rdocumentation.org/packages/magrittr/topics/) and drop the trailing assignment to newds.
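
A minimal sketch of the assignment pipe follows, again using a small made-up table rather than the guide's weather dataset. The operator pipes ds into dplyr::mutate() and then writes the result back to ds.

```r
library(dplyr)     # mutate()
library(magrittr)  # %<>% assignment pipe

# Hypothetical two-row dataset; the values are made up.
ds <- tibble::tibble(min_temp = c(15.0, -6.0),
                     max_temp = c(35.0, -1.0))

# Equivalent to: ds <- ds %>% mutate(range_temp=max_temp-min_temp)
ds %<>% mutate(range_temp=max_temp-min_temp)

names(ds)
```

The assignment pipe is compact, but it overwrites ds in place, so the original data cannot be recovered without reloading it.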

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.