Data Science Desktop Survival Guide
by Graham Williams

Add Counts

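20200814 Using dplyr::add_count() a new column is added to the dataset recording, for each observation, the number of observations that share its value of the grouping variable (here location). By default the new column is named n. In the code here the assignment pipe %<>% saves the result back into ds, while the tee pipe %T>% sends a randomly reordered selection of columns to print() without changing what is saved.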

ds %<>%
  add_count(location) %T>%
  {
    select(., date, location, n) %>%
    sample_frac() %>%
    print()
  }
## # A tibble: 176,747 x 3
##    date       location             n
##    <date>     <chr>            <int>
##  1 2009-02-11 MountGinini       3680
##  2 2009-06-01 PerthAirport      3648
##  3 2010-05-12 MelbourneAirport  3649
##  4 2018-08-17 Sale              3649
##  5 2015-03-15 Hobart            3833
##  6 2017-07-19 Canberra          4076
##  7 2010-12-12 Brisbane          3833
##  8 2013-04-06 Richmond          3649
##  9 2008-10-11 Perth             3832
## 10 2014-12-16 NorfolkIsland     3649
## # ... with 176,737 more rows
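
As a minimal sketch of what add_count() does, the following toy example compares it with the longer group_by() and mutate() formulation. The tibble toy and its values are made up for illustration and are not part of the weather dataset. For a single grouping variable the two approaches are expected to produce the same counts.

```r
library(dplyr)

# Hypothetical toy data, not the weather dataset used above.
toy <- tibble(location = c("Canberra", "Perth", "Canberra", "Hobart", "Canberra"),
              rainfall = c(0.0, 1.2, 3.4, 0.2, 0.0))

# add_count() keeps every row and appends the group size as column n.
counted <- toy %>% add_count(location)

# A longer formulation: group, count within each group, then ungroup.
manual <- toy %>%
  group_by(location) %>%
  mutate(n = n()) %>%
  ungroup()

print(counted)
```

Unlike dplyr::count(), which collapses the data to one row per group, add_count() retains all of the original rows and columns.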

names(ds)
##  [1] "date"            "location"        "min_temp"        "max_temp"   ...
##  [5] "rainfall"        "evaporation"     "sunshine"        "wind_gust_di...
##  [9] "wind_gust_speed" "wind_dir_9am"    "wind_dir_3pm"    "wind_speed_9...
## [13] "wind_speed_3pm"  "humidity_9am"    "humidity_3pm"    "pressure_9am...
## [17] "pressure_3pm"    "cloud_9am"       "cloud_3pm"       "temp_9am"   ...
## [21] "temp_3pm"        "rain_today"      "risk_mm"         "rain_tomorro...
## [25] "n"
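
The default column name n is rather generic. As a small sketch, add_count() also accepts a name= argument to choose something more descriptive. The tibble toy and the column name n_obs below are illustrative choices only.

```r
library(dplyr)

# Hypothetical toy data for illustration.
toy <- tibble(location = c("Canberra", "Perth", "Canberra"))

# Record the group size under a more descriptive column name.
toy_named <- toy %>% add_count(location, name = "n_obs")

names(toy_named)
```

A descriptive name can make later code that refers to the count easier to read.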


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.