Data Science Desktop Survival Guide
by Graham Williams

The Shape of the Dataset

20180721 Once the dataset is loaded, we want to get a basic idea of what it looks like: its shape. Because the dataset is stored as an extended data frame (what we call a tibble), we can display it simply by printing it, that is, by typing the name of the variable that refers to it.

# Print the dataset in a human-readable way.

weather
## # A tibble: 366 x 24
##    Date       Location MinTemp MaxTemp Rainfall Evaporation Sunshine
##    <date>     <chr>      <dbl>   <dbl>    <dbl>       <dbl>    <dbl>
##  1 2007-11-01 Canberra     8      24.3      0           3.4      6.3
##  2 2007-11-02 Canberra    14      26.9      3.6         4.4      9.7
##  3 2007-11-03 Canberra    13.7    23.4      3.6         5.8      3.3
##  4 2007-11-04 Canberra    13.3    15.5     39.8         7.2      9.1
##  5 2007-11-05 Canberra     7.6    16.1      2.8         5.6     10.6
##  6 2007-11-06 Canberra     6.2    16.9      0           5.8      8.2
##  7 2007-11-07 Canberra     6.1    18.2      0.2         4.2      8.4
##  8 2007-11-08 Canberra     8.3    17        0           5.6      4.6
##  9 2007-11-09 Canberra     8.8    19.5      0           4        4.1
## 10 2007-11-10 Canberra     8.4    22.8     16.2         5.4      7.7
## # ... with 356 more rows, and 17 more variables: WindGustDir <ord>,
## #   WindGustSpeed <dbl>, WindDir9am <ord>, WindDir3pm <ord>,
## #   WindSpeed9am <dbl>, WindSpeed3pm <dbl>, Humidity9am <int>,
## #   Humidity3pm <int>, Pressure9am <dbl>, Pressure3pm <dbl>, Cloud9am <i...
## #   Cloud3pm <int>, Temp9am <dbl>, Temp3pm <dbl>, RainToday <fct>,
## #   RISK_MM <dbl>, RainTomorrow <fct>

We observe that the dataset consists of 366 observations of 24 variables, as reported on the first line of the printout (366 x 24). Representing the data frame as a tibble makes the printout more informative: the first 10 observations are shown for the subset of variables that fits the width of the display, followed by a list of the remaining variables together with their types.
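Printing is not the only way to check the shape of a dataset. The following is a minimal sketch using base R's dim(), nrow() and ncol(), together with glimpse() from the tibble package. It assumes the rattle package is installed (it provides the weather dataset) and converts the data with as_tibble() in case it is not already a tibble.

```r
# Assumes the rattle package is installed; it supplies the weather dataset.
library(tibble)   # provides as_tibble() and glimpse()

weather <- as_tibble(rattle::weather)

dim(weather)      # number of rows and columns, as a length-2 vector
nrow(weather)     # number of observations
ncol(weather)     # number of variables
glimpse(weather)  # one line per variable: name, type, and first values
```

glimpse() suits wide datasets like this one, since it lists every variable on its own line rather than truncating the list of columns.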


Support further development by purchasing the PDF version of the book.
Other online resources include the GNU/Linux Desktop Survival Guide.
Books available on Amazon include Data Mining with Rattle and Essentials of Data Science.
Popular open source software includes rattle and wajig.
Hosted by Togaware, a pioneer of free and open source software since 1984.
Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.