Data Science Desktop Survival Guide
by Graham Williams

Wrangling Data Review

It is always useful to remind ourselves of the dataset by viewing a random sample of its rows and columns, followed by a glimpse() of its structure:

# Shuffle the rows, then keep date, location and five other randomly chosen columns.
ds %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 176,747 x 7
##    date       location   humidity_9am cloud_9am max_temp temp_9am rain_t...
##    <date>     <chr>             <int>     <int>    <dbl>    <dbl> <fct> ...
##  1 2013-07-10 Walpole              97        NA     17       11   No    ...
##  2 2010-01-25 SydneyAir~           80         8     26.9     22.3 No    ...
##  3 2009-12-18 Sale                 64         2     22.4     14.9 No    ...
##  4 2019-08-11 PerthAirp~           62         0     22.4     13.9 No    ...
##  5 2018-05-11 Albury               91         8      9.9      5.7 Yes   ...
##  6 2013-04-22 Penrith              90        NA     24.4     16   No    ...
##  7 2013-06-16 Dartmoor            100        NA     13.2      5.1 Yes   ...
##  8 2017-08-11 CoffsHarb~           29        NA     27.7     21.9 No    ...
##  9 2019-10-26 Perth                56         0     28.9     21.3 No    ...
## 10 2013-10-25 WaggaWagga           60         0     19.3      9.6 No    ...
## # ... with 176,737 more rows
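
Because both the rows and the columns are chosen at random, each run of the command above shows something different. When a repeatable review is wanted, one option is to fix the random seed first. The following is a minimal sketch under stated assumptions: `toy` is a small made-up stand-in for `ds`, `vars` is assumed to hold the column names, and the seed value 42 is arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for ds: a tiny weather-like tibble with made-up values.
toy <- tibble(
  date         = as.Date("2020-01-01") + 0:9,
  location     = rep(c("Albury", "Perth"), 5),
  min_temp     = seq(5, 14),
  max_temp     = seq(15, 24),
  rainfall     = rep(c(0, 1.2), 5),
  humidity_9am = 60:69
)

# Assumption: vars holds the column names of the dataset.
vars <- names(toy)

# Setting the same seed before each run repeats the same random choices.
set.seed(42)
first <- toy %>% sample_frac() %>% select(date, location, sample(3:length(vars), 2))

set.seed(42)
second <- toy %>% sample_frac() %>% select(date, location, sample(3:length(vars), 2))

# Both runs pick the same row order and the same two extra columns.
identical(first, second)
```

Without the calls to `set.seed()` the two results would usually differ, which is the behaviour wanted for a casual look at the data.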

glimpse(ds)
## Rows: 176,747
## Columns: 24
## $ date            <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04,...
## $ location        <chr> "Albury", "Albury", "Albury", "Albury", "Albury"...
## $ min_temp        <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7...
## $ max_temp        <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, ...
## $ rainfall        <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4...
## $ evaporation     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ sunshine        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ wind_gust_dir   <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W...
## $ wind_gust_speed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, ...
## $ wind_dir_9am    <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, ...
## $ wind_dir_3pm    <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE...
## $ wind_speed_9am  <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, ...
## $ wind_speed_3pm  <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28...
## $ humidity_9am    <int> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, ...
## $ humidity_3pm    <int> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 9...
## $ pressure_9am    <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, ...
## $ pressure_3pm    <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, ...
## $ cloud_9am       <int> 8, NA, NA, NA, 7, NA, 1, NA, NA, NA, NA, 8, 8, N...
....


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.