2.19 A Glimpse of the Dataset
Another useful command that we will find ourselves using often is glimpse() from the dplyr (Wickham et al. 2023) package. Once dplyr has been installed, the command can be accessed in the R console as dplyr::glimpse(). It accepts an argument x= which names the dataset we wish to glimpse. In the following R example we glimpse the weatherAUS dataset from the rattle (G. Williams 2024) package.
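The following is a minimal sketch of a command that produces such a glimpse. It assumes the dplyr and rattle packages are already installed. Naming both the command and the dataset with the package:: prefix means neither package needs to be loaded with library() first. This form is our illustration and may not be exactly the command used to generate the output shown.

```r
# Assumes dplyr and rattle are installed, for example with
# install.packages(c("dplyr", "rattle")).

# Glimpse the weatherAUS dataset from rattle, passing it as the x= argument.
dplyr::glimpse(x=rattle::weatherAUS)
```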
## Rows: 226,868
## Columns: 24
## $ Date <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04, 2008-12…
## $ Location <chr> "Albury", "Albury", "Albury", "Albury", "Albury", "Albur…
## $ MinTemp <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7, 13.1, …
## $ MaxTemp <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, 31.9, 30…
## $ Rainfall <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0, 2…
## $ Evaporation <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Sunshine <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ WindGustDir <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W, SW, NA…
## $ WindGustSpeed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, 61, 44, …
## $ WindDir9am <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, NNW, W, …
## $ WindDir3pm <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE, NNW, S…
## $ WindSpeed9am <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, 24, 4, N…
## $ WindSpeed3pm <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28, 20, 30…
## $ Humidity9am <int> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, 76, 65, …
## $ Humidity3pm <int> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 93, 43, 3…
## $ Pressure9am <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, 1009.6, …
## $ Pressure3pm <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, 1008.2, …
## $ Cloud9am <int> 8, NA, NA, NA, 7, NA, 1, NA, NA, NA, NA, 8, 8, NA, NA, 0…
## $ Cloud3pm <int> NA, NA, 2, NA, 8, NA, NA, NA, NA, NA, NA, 8, 8, 7, NA, N…
## $ Temp9am <dbl> 16.9, 17.2, 21.0, 18.1, 17.8, 20.6, 18.1, 16.3, 18.3, 20…
## $ Temp3pm <dbl> 21.8, 24.3, 23.2, 26.5, 29.7, 28.9, 24.6, 25.5, 30.2, 28…
## $ RainToday <fct> No, No, No, No, No, No, No, No, No, Yes, No, Yes, Yes, Y…
## $ RISK_MM <dbl> 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0, 2.2, 1…
## $ RainTomorrow <fct> No, No, No, No, No, No, No, No, Yes, No, Yes, Yes, Yes, …
We can see here the command that was run and the output from running that command. As a convention used in this book, the output from running R commands is prefixed with ##. A single # introduces a comment in an R script file and tells R to ignore everything that follows on that line. We use the ## convention throughout the book to clearly identify output produced by R. When we run these commands ourselves in R, this prefix is not displayed.
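The small illustration below is our own and is not drawn from the dataset. It mixes code, comments, and output recorded in the book's ## style. Because ## begins with #, R treats the recorded output as a comment, so the whole block can still be pasted into the console and run.

```r
# A single # starts a comment: R ignores the rest of this line.
x <- 1 + 2  # Comments can also follow code on the same line.

# In the console, printing x displays [1] 3; in the book it appears as below.
x
## [1] 3
```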
Long lines of output are also truncated for our presentation here. The … at the end of a line indicates that the rest of that line has been cut off, and a .... at the end of a block of output indicates that further lines have been omitted. Both keep our example output to a minimum.
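For glimpse() in particular, the number of values shown on each line depends on the width available. The command takes a width= argument that sets the width of its output, so a narrower setting cuts each line off sooner. The value 60 below is an arbitrary choice for illustration, and the sketch again assumes dplyr and rattle are installed.

```r
# Ask for output about 60 characters wide. Each column still gets one line,
# but fewer values fit before the line is cut off with an ellipsis.
dplyr::glimpse(rattle::weatherAUS, width=60)
```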
The dataset is a data frame, the basic data structure R uses to store a dataset. The tidyverse enhances the data frame with additional functionality that improves our interactions with the data.
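To check how a dataset is stored, we can ask R for its class. The sketch below assumes the tibble package (part of the tidyverse) is installed. It uses tibble::as_tibble() to convert the data frame explicitly into the tidyverse's enhanced data frame, known as a tibble. The variable name ds is our own choice.

```r
# Report the class (or classes) of the dataset as supplied by rattle.
class(rattle::weatherAUS)

# Convert to a tibble; the result keeps "data.frame" among its classes,
# with "tbl_df" and "tbl" added by the tidyverse.
ds <- tibble::as_tibble(rattle::weatherAUS)
class(ds)

# A tibble is still a data frame, so base R functions continue to work.
is.data.frame(ds)
nrow(ds) == nrow(rattle::weatherAUS)
```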
References
Hosted by Togaware, a pioneer of free and open source software since 1984. Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0