Data Science Desktop Survival Guide
by Graham Williams

Numeric

20180723 Summaries of numeric data are provided by base::summary(). Below we identify the numeric variables and summarise each. As data scientists, we again look for any oddities in the summaries and try to explain them.

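# magrittr provides the %>% and %T>% (tee) pipes used below.
library(magrittr)

# Identify the numeric variables, print their names, and save them as numi.
ds %>%
  sapply(is.numeric) %>%
  which() %>%
  names() %T>%
  print() ->
numi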
##  [1] "min_temp"        "max_temp"        "rainfall"        "evaporation"...
##  [5] "sunshine"        "wind_gust_speed" "wind_speed_9am"  "wind_speed_3...
##  [9] "humidity_9am"    "humidity_3pm"    "pressure_9am"    "pressure_3pm...
## [13] "cloud_9am"       "cloud_3pm"       "temp_9am"        "temp_3pm"   ...
## [17] "risk_mm"

ds[numi] %>%
  summary()
##     min_temp        max_temp        rainfall        evaporation    
##  Min.   :-8.70   Min.   :-4.10   Min.   :  0.000   Min.   :  0.00  
##  1st Qu.: 7.50   1st Qu.:18.10   1st Qu.:  0.000   1st Qu.:  2.80  
##  Median :12.00   Median :22.80   Median :  0.000   Median :  4.80  
##  Mean   :12.15   Mean   :23.36   Mean   :  2.241   Mean   :  5.53  
##  3rd Qu.:16.90   3rd Qu.:28.40   3rd Qu.:  0.600   3rd Qu.:  7.40  
##  Max.   :33.90   Max.   :48.90   Max.   :474.000   Max.   :133.90  
##  NA's   :2349    NA's   :2105    NA's   :4318      NA's   :86289   
##     sunshine     wind_gust_speed  wind_speed_9am  wind_speed_3pm 
##  Min.   : 0.00   Min.   :  2.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 4.90   1st Qu.: 31.00   1st Qu.: 7.00   1st Qu.:13.00  
##  Median : 8.50   Median : 39.00   Median :13.00   Median :19.00  
##  Mean   : 7.66   Mean   : 40.19   Mean   :14.05   Mean   :18.72  
##  3rd Qu.:10.60   3rd Qu.: 48.00   3rd Qu.:19.00   3rd Qu.:24.00  
##  Max.   :14.50   Max.   :135.00   Max.   :87.00   Max.   :87.00  
##  NA's   :93859   NA's   :13036    NA's   :2924    NA's   :5434   
....
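
The NA's row of the summary shows that missing values are spread unevenly across the variables: evaporation and sunshine are missing far more often than the temperatures. As a minimal sketch, assuming only base R, the proportion of missing values per numeric variable can be tabulated directly. The data frame toy and its values are made up for illustration and stand in for ds.

```r
# Toy stand-in for the weather dataset; the values are invented for illustration.
toy <- data.frame(
  min_temp = c(7.5, 12.0, NA, 16.9),
  rainfall = c(0, 0, 0.6, NA),
  sunshine = c(NA, NA, 8.5, NA),
  location = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Names of the numeric columns, found the same way as in the pipeline above.
num_vars <- names(which(sapply(toy, is.numeric)))

# Proportion of missing values in each numeric column, largest first.
na_prop <- sort(colMeans(is.na(toy[num_vars])), decreasing = TRUE)
print(round(na_prop, 2))
```

Replacing toy with ds should give the same kind of table for the full dataset, which makes it easier to compare missingness across variables than reading the raw counts.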

Reviewing this information, we can make some obvious observations. Temperatures, for example, appear to be in degrees Celsius rather than Fahrenheit. Rainfall looks to be in millimetres. Some distributions are quite skewed, with small minimum and median values but a large maximum: rainfall, for example, has a median of 0 and a maximum of 474. As data scientists we will further explore the distributions as in Chapter [*].
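
One rough way to flag such skewed variables is to measure how far the maximum sits above the third quartile, in units of the interquartile range. The sketch below is only a heuristic, not a formal skewness statistic. The helper tail_ratio() and the data frame toy are hypothetical names introduced here, and the toy values are made up for illustration.

```r
# Hypothetical helper: distance from the third quartile to the maximum,
# measured in interquartile ranges. Large values suggest a long right tail.
# Note: returns Inf (or NaN) when the interquartile range is zero.
tail_ratio <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  (max(x, na.rm = TRUE) - q[2]) / (q[2] - q[1])
}

# Toy stand-in for two of the weather variables; the values are invented.
toy <- data.frame(
  max_temp = c(18, 20, 22, 24, 26, 28, 30),
  rainfall = c(0, 0, 0, 0, 0.2, 0.6, 45)
)

# Apply the heuristic to every column and show the most skewed first.
ratios <- sort(sapply(toy, tail_ratio), decreasing = TRUE)
print(ratios)
```

On the toy data the rainfall ratio is far larger than the max_temp ratio, mirroring the pattern in the summary above, where rainfall has a third quartile of 0.6 but a maximum of 474.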


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.