Data Science Desktop Survival Guide
by Graham Williams

Wrangling Data Review

It is useful to review a random sample of the dataset again:

ds %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 176,747 x 7
##    date       location   humidity_9am cloud_9am max_temp temp_9am rain_t...
##    <date>     <chr>             <int>     <int>    <dbl>    <dbl> <fct> ...
##  1 2013-07-10 Walpole              97        NA     17       11   No    ...
##  2 2010-01-25 SydneyAir~           80         8     26.9     22.3 No    ...
##  3 2009-12-18 Sale                 64         2     22.4     14.9 No    ...
##  4 2019-08-11 PerthAirp~           62         0     22.4     13.9 No    ...
##  5 2018-05-11 Albury               91         8      9.9      5.7 Yes   ...
##  6 2013-04-22 Penrith              90        NA     24.4     16   No    ...
##  7 2013-06-16 Dartmoor            100        NA     13.2      5.1 Yes   ...
##  8 2017-08-11 CoffsHarb~           29        NA     27.7     21.9 No    ...
##  9 2019-10-26 Perth                56         0     28.9     21.3 No    ...
## 10 2013-10-25 WaggaWagga           60         0     19.3      9.6 No    ...
## # ... with 176,737 more rows
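
Because sample_frac() defaults to size = 1, the call above returns all 176,747 rows in shuffled order rather than a subset. The following is a minimal sketch of the same idea on a small table, assuming dplyr 1.0.0 or later is installed. The tibble toy and its values are hypothetical stand-ins for ds, made up for illustration, and slice_sample() is used here as one way to draw a fixed number of rows; it is not taken from the book.

```r
library(dplyr)

# Hypothetical stand-in for the book's ds; the values are invented.
toy <- tibble(
  date     = as.Date("2020-01-01") + 0:9,
  location = rep(c("Albury", "Perth"), 5),
  min_temp = c(13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7, 13.1),
  max_temp = min_temp + 10,
  rainfall = c(0.6, 0, 0, 0, 1.0, 0.2, 0, 0, 0, 1.4)
)

set.seed(42)  # make the random draws repeatable

# With the default size = 1, sample_frac() returns every row in random order.
shuffled <- toy %>% sample_frac()

# slice_sample() draws a fixed number of rows instead.
few <- toy %>% slice_sample(n = 3)

# Keep date and location plus two randomly chosen other columns,
# selected by position as in the call above.
extra <- sample(3:ncol(toy), 2)
peek  <- few %>% select(date, location, all_of(extra))
print(peek)
```

Setting the seed is optional; without it each run shows a different sample, which is often what is wanted when reviewing a dataset by eye.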

glimpse(ds)
## Rows: 176,747
## Columns: 24
## $ date            <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04,...
## $ location        <chr> "Albury", "Albury", "Albury", "Albury", "Albury"...
## $ min_temp        <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7...
## $ max_temp        <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, ...
## $ rainfall        <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4...
## $ evaporation     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ sunshine        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ wind_gust_dir   <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W...
## $ wind_gust_speed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, ...
## $ wind_dir_9am    <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, ...
## $ wind_dir_3pm    <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE...
## $ wind_speed_9am  <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, ...
## $ wind_speed_3pm  <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28...
## $ humidity_9am    <int> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, ...
## $ humidity_3pm    <int> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 9...
## $ pressure_9am    <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, ...
## $ pressure_3pm    <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, ...
## $ cloud_9am       <int> 8, NA, NA, NA, 7, NA, 1, NA, NA, NA, NA, 8, 8, N...
....


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.