Data Science Desktop Survival Guide
by Graham Williams

Dataset Head and Tail

20180721 Datasets can be very large, with many observations (millions) and many variables (thousands). We cannot be expected to browse through all of the observations and variables. Instead we can review the contents of the dataset using utils::head() and utils::tail(), which show the first six and the last six observations by default.

# Review the first few observations.

head(ds) %>% print.data.frame()
##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2008-12-01   Albury     13.4     22.9      0.6          NA       NA
## 2 2008-12-02   Albury      7.4     25.1      0.0          NA       NA
## 3 2008-12-03   Albury     12.9     25.7      0.0          NA       NA
## 4 2008-12-04   Albury      9.2     28.0      0.0          NA       NA
## 5 2008-12-05   Albury     17.5     32.3      1.0          NA       NA
## 6 2008-12-06   Albury     14.6     29.7      0.2          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1             W              44            W          WNW             20
## 2           WNW              44          NNW          WSW              4
## 3           WSW              46            W          WSW             19
....

# Review the last few observations.

tail(ds) %>% print.data.frame()
##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2020-04-24    Uluru     23.0     36.7        0          NA       NA
## 2 2020-04-25    Uluru     18.4     37.4        0          NA       NA
## 3 2020-04-26    Uluru     21.4     32.7        0          NA       NA
## 4 2020-04-27    Uluru     19.4     32.2        0          NA       NA
## 5 2020-04-28    Uluru     16.6     32.6        0          NA       NA
## 6 2020-04-29    Uluru     16.7     25.7        0          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1           ESE              31            E            E             22
## 2             W              35            E          NNW              9
## 3             S              54          ESE          ESE             20
....
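
Six is only the default. As a minimal sketch, assuming a small made-up data frame standing in for ds (the id and value columns are hypothetical and not part of the weather dataset), the n argument of head() and tail() changes how many observations are returned:

```r
# A small stand-in for ds (hypothetical values, not the guide's weather data).
ds <- data.frame(id = 1:20, value = (1:20)^2)

# Ask for a specific number of observations with n.
first3 <- head(ds, n = 3)
last3  <- tail(ds, n = 3)

print(first3)
print(last3)

# A negative n drops observations instead of selecting them.
all_but_last15  <- head(ds, n = -15)  # everything except the last 15 rows
all_but_first15 <- tail(ds, n = -15)  # everything except the first 15 rows

print(all_but_last15)
print(all_but_first15)
```

A larger n is useful when six observations all come from the same location, as they do above.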

Throughout, we are building a picture of the data we are looking at. The output begins to confirm that location has multiple values, whilst date appears to be a sequence within each location.
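
These impressions can be checked rather than eyeballed. The following is a sketch only, using a tiny made-up data frame with the same date and location column names (the values are hypothetical, and the real dataset may well contain gaps in its dates):

```r
# Hypothetical stand-in for ds: two locations, five consecutive days each.
ds <- data.frame(
  date     = rep(seq(as.Date("2008-12-01"), by = "day", length.out = 5),
                 times = 2),
  location = rep(c("Albury", "Uluru"), each = 5),
  stringsAsFactors = FALSE
)

# How many distinct locations are there, and how many observations for each?
n_locations <- length(unique(ds$location))
loc_counts  <- table(ds$location)

# Within each location, do the dates advance by exactly one day at a time?
daily <- tapply(ds$date, ds$location,
                function(d) all(as.numeric(diff(d)) == 1))

print(n_locations)
print(loc_counts)
print(daily)
```

A FALSE for a location in daily would suggest missing or unordered dates for that location, which is worth investigating before any modelling.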


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.