10.38 Numeric

20180723 Summaries of numeric data are provided by base::summary(). Below we identify the numeric variables and summarise each one. As data scientists, we again look for any oddities in the data and seek to explain them.

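# Record the names of the numeric variables, printing them along the way.
# Assumes magrittr is attached for the %T>% tee pipe.
ds %>%
  sapply(is.numeric) %>%
  which() %>%
  names() %T>%
  print() ->
numi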
##  [1] "min_temp"        "max_temp"        "rainfall"        "evaporation"    
##  [5] "sunshine"        "wind_gust_speed" "wind_speed_9am"  "wind_speed_3pm" 
##  [9] "humidity_9am"    "humidity_3pm"    "pressure_9am"    "pressure_3pm"   
## [13] "cloud_9am"       "cloud_3pm"       "temp_9am"        "temp_3pm"       
## [17] "risk_mm"
ds[numi] %>% 
  summary()
##     min_temp       max_temp        rainfall        evaporation    
##  Min.   :-8.7   Min.   :-4.10   Min.   :  0.000   Min.   :  0.00  
##  1st Qu.: 7.5   1st Qu.:18.00   1st Qu.:  0.000   1st Qu.:  2.60  
##  Median :11.9   Median :22.70   Median :  0.000   Median :  4.80  
##  Mean   :12.1   Mean   :23.29   Mean   :  2.254   Mean   :  5.54  
##  3rd Qu.:16.8   3rd Qu.:28.30   3rd Qu.:  0.600   3rd Qu.:  7.40  
##  Max.   :33.9   Max.   :48.90   Max.   :474.000   Max.   :138.70  
##  NA's   :2760   NA's   :2555    NA's   :5037      NA's   :96586   
##     sunshine      wind_gust_speed  wind_speed_9am  wind_speed_3pm
##  Min.   : 0.00    Min.   :  2.00   Min.   : 0.00   Min.   : 0.0  
##  1st Qu.: 4.90    1st Qu.: 31.00   1st Qu.: 7.00   1st Qu.:13.0  
##  Median : 8.50    Median : 39.00   Median :13.00   Median :19.0  
##  Mean   : 7.65    Mean   : 40.18   Mean   :14.07   Mean   :18.7  
##  3rd Qu.:10.70    3rd Qu.: 48.00   3rd Qu.:19.00   3rd Qu.:24.0  
##  Max.   :14.50    Max.   :135.00   Max.   :87.00   Max.   :87.0  
##  NA's   :104989   NA's   :14376    NA's   :3351    NA's   :6406  
##   humidity_9am    humidity_3pm     pressure_9am     pressure_3pm   
##  Min.   :  0.0   Min.   :  0.00   Min.   : 979.1   Min.   : 978.9  
##  1st Qu.: 56.0   1st Qu.: 35.00   1st Qu.:1013.0   1st Qu.:1010.5  
##  Median : 69.0   Median : 51.00   Median :1017.7   Median :1015.3  
##  Mean   : 68.4   Mean   : 50.89   Mean   :1017.8   Mean   :1015.3  
##  3rd Qu.: 83.0   3rd Qu.: 65.00   3rd Qu.:1022.6   3rd Qu.:1020.2  
##  Max.   :100.0   Max.   :100.00   Max.   :1041.1   Max.   :1040.1  
##  NA's   :3828    NA's   :7472     NA's   :21098    NA's   :21089   
##    cloud_9am       cloud_3pm        temp_9am        temp_3pm    
##  Min.   :0.00    Min.   :0.00    Min.   :-6.20   Min.   :-5.10  
##  1st Qu.:1.00    1st Qu.:2.00    1st Qu.:12.20   1st Qu.:16.60  
##  Median :5.00    Median :5.00    Median :16.70   Median :21.20  
##  Mean   :4.57    Mean   :4.58    Mean   :16.97   Mean   :21.75  
##  3rd Qu.:7.00    3rd Qu.:7.00    3rd Qu.:21.60   3rd Qu.:26.50  
##  Max.   :9.00    Max.   :9.00    Max.   :40.20   Max.   :48.20  
##  NA's   :80359   NA's   :86239   NA's   :2832    NA's   :6472   
##     risk_mm       
##  Min.   :  0.000  
##  1st Qu.:  0.000  
##  Median :  0.000  
##  Mean   :  2.254  
##  3rd Qu.:  0.600  
##  Max.   :474.000  
##  NA's   :5038
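
The summary reports a count of missing values for each variable, and it can help to see these as proportions when looking for oddities. The following is a minimal sketch only: the data frame toy and its values are made up for illustration and stand in for ds, and num_vars and na_share are hypothetical names.

```r
# Toy data frame standing in for ds (values are made up for illustration).
toy <- data.frame(
  min_temp = c(7.5, NA, 11.9, 16.8),
  rainfall = c(0, 0, NA, NA),
  location = c("a", "b", "c", "d")
)

# Names of the numeric columns, found the same way as in the pipeline above.
num_vars <- names(which(sapply(toy, is.numeric)))

# Proportion of missing values in each numeric column.
na_share <- colMeans(is.na(toy[num_vars]))
print(round(na_share, 2))
```

Applied to ds[numi], the same colMeans(is.na(...)) call would be expected to rank the variables by how incomplete they are.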

Reviewing this information we can make some obvious observations. Temperatures, for example, appear to be in degrees Celsius rather than Fahrenheit. Rainfall looks to be in millimetres. Some distributions are quite skewed, with a small minimum and median but a very large maximum (rainfall, for example, has a median of 0 and a maximum of 474). As data scientists we will explore the distributions further, as in Chapter 9.
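
One rough way to flag skewed variables is to compare the mean with the median, scaled by the interquartile range. The sketch below is an illustration under stated assumptions and not the method used in this book: the data frame toy holds made-up values, and skew_hint is a hypothetical name. A value well above zero suggests a long right tail.

```r
# Made-up values for illustration: rainfall is mostly zero with one
# extreme day, while max_temp is roughly symmetric.
toy <- data.frame(
  rainfall = c(0, 0, 0, 0, 0.2, 0.6, 1.4, 3.0, 12.0, 474.0),
  max_temp = c(18.0, 19.5, 21.0, 22.7, 23.0, 24.1, 25.5, 28.3, 30.0, 33.0)
)

# (mean - median) / IQR for each column, ignoring missing values.
# The IQR must be non-zero for the ratio to be meaningful.
skew_hint <- sapply(toy, function(x) {
  (mean(x, na.rm = TRUE) - median(x, na.rm = TRUE)) / IQR(x, na.rm = TRUE)
})
print(round(skew_hint, 2))
```

With these toy values the rainfall column scores far higher than max_temp, mirroring the contrast between the median and maximum seen in the summary. For a variable whose first and third quartiles coincide the ratio is undefined, so a check on the IQR would be needed before using it on real data.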



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.