Data Science Desktop Survival Guide
by Graham Williams

Comparing Distributions

Continuing with our pipeline example, we might want a base::summary() of selected variables for those observations where it rained.
# Summarise subset of variables for observations with rainfall.

weatherAUS %>%
  select(MinTemp, MaxTemp, Rainfall, Sunshine) %>%
  filter(Rainfall >= 1) %>%
  summary()
##     MinTemp         MaxTemp         Rainfall          Sunshine     
##  Min.   :-8.50   Min.   :-4.10   Min.   :  1.000   Min.   : 0.000  
##  1st Qu.: 8.40   1st Qu.:15.50   1st Qu.:  2.200   1st Qu.: 2.400  
##  Median :12.20   Median :19.30   Median :  4.600   Median : 5.500  
##  Mean   :12.72   Mean   :20.19   Mean   :  9.681   Mean   : 5.379  
##  3rd Qu.:17.20   3rd Qu.:24.40   3rd Qu.: 10.800   3rd Qu.: 8.100  
##  Max.   :28.90   Max.   :46.30   Max.   :474.000   Max.   :14.200  
....

It could be useful to contrast this with a base::summary() of those observations where there was virtually no rain.

# Summarise observations with little or no rainfall.

weatherAUS %>%
  select(MinTemp, MaxTemp, Rainfall, Sunshine) %>%
  filter(Rainfall < 1) %>%
  summary()
##     MinTemp         MaxTemp        Rainfall          Sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
....
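
The two summaries above can also be set side by side in a single pipeline by grouping on a rain indicator. The following is a minimal sketch, not the book's own code: it assumes dplyr 1.0 or later for across(), and it uses a tiny made-up data frame (the hypothetical `weather`) in place of weatherAUS so that it runs on its own. The column name `rained` is likewise introduced only for illustration.

```r
library(dplyr)

# Made-up stand-in for weatherAUS, for illustration only.
weather <- tibble(
  MaxTemp  = c(15.2, 19.8, 24.1, 29.5, 18.0, 26.3),
  Rainfall = c(4.6, 12.0, 0.0, 0.2, 1.0, 0.0),
  Sunshine = c(2.4, 5.5, 9.3, 11.0, 3.1, 8.7)
)

# One row per group: days with at least 1mm of rain versus the rest.
comparison <- weather %>%
  filter(!is.na(Rainfall)) %>%           # Drop days with no rainfall record.
  mutate(rained = Rainfall >= 1) %>%     # Same threshold as the filters above.
  group_by(rained) %>%
  summarise(days = n(),
            across(c(MaxTemp, Sunshine),
                   list(median = ~ median(.x, na.rm = TRUE),
                        mean   = ~ mean(.x, na.rm = TRUE))),
            .groups = "drop")

print(comparison)
```

Replacing `weather` with weatherAUS should give one row for each of the two subsets summarised above, which makes differences in the medians and means easier to read off than from two separate printouts.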

Any number of functions can be included in a pipeline to achieve the results we desire. In the following chapters we will see many examples, some of which string together ten or more functions. Each step on its own is generally easy to understand. The power lies in what we can achieve by stringing together many simple steps to produce something more complex.
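
As a sketch of how several simple steps combine, the pipeline below chains seven dplyr functions. It is an illustrative example under stated assumptions rather than one taken from later chapters: the data frame `weather` holds made-up values, and the names `TempRange`, `days`, `mean_range` and `wet_days` are hypothetical.

```r
library(dplyr)

# Made-up observations, for illustration only.
weather <- tibble(
  Location = c("Albury", "Albury", "Sydney", "Sydney", "Darwin", "Darwin"),
  MinTemp  = c(8.4, 12.2, 17.2, 16.7, 24.0, 25.1),
  MaxTemp  = c(15.5, 24.4, 22.0, 29.3, 31.0, 33.9),
  Rainfall = c(2.2, 0.0, 10.8, 0.0, 4.6, NA)
)

wet_days <- weather %>%
  select(Location, MinTemp, MaxTemp, Rainfall) %>%  # Keep variables of interest.
  filter(!is.na(Rainfall)) %>%                      # Drop missing rainfall.
  filter(Rainfall >= 1) %>%                         # Keep days with rain.
  mutate(TempRange = MaxTemp - MinTemp) %>%         # Derive a new variable.
  group_by(Location) %>%                            # One group per location.
  summarise(days = n(),
            mean_range = mean(TempRange),
            .groups = "drop") %>%                   # Aggregate each group.
  arrange(desc(mean_range))                         # Largest range first.

print(wet_days)
```

Each line can be read and checked in isolation, and running the pipeline up to any intermediate step shows the data at that point.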


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.