Data Science Desktop Survival Guide by Graham Williams

## A Data Frame as a Dataset

20210103 A data frame is essentially a rectangular table of data consisting of rows (observations) and columns (variables). It resembles a matrix, except that each column can hold a different type of data, such as dates, text, or numbers. We can use base::print.data.frame() to view a table, here choosing the first 10 observations of the first 6 variables of the ds dataset.
```r
# Display the first 10 observations of the first 6 variables.

library(dplyr)  # Provides the %>% pipe, sample_n(), and select().

ds[1:10, 1:6] %>% print.data.frame()
```
```
##          date location min_temp max_temp rainfall evaporation
## 1  2008-12-01   Albury     13.4     22.9      0.6          NA
## 2  2008-12-02   Albury      7.4     25.1      0.0          NA
## 3  2008-12-03   Albury     12.9     25.7      0.0          NA
## 4  2008-12-04   Albury      9.2     28.0      0.0          NA
## 5  2008-12-05   Albury     17.5     32.3      1.0          NA
## 6  2008-12-06   Albury     14.6     29.7      0.2          NA
## 7  2008-12-07   Albury     14.3     25.0      0.0          NA
## 8  2008-12-08   Albury      7.7     26.7      0.0          NA
## 9  2008-12-09   Albury      9.7     31.9      0.0          NA
## 10 2008-12-10   Albury     13.1     30.1      1.4          NA
```
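The expression ds[1:10, 1:6] uses R's [rows, columns] indexing. As a minimal sketch that runs without the full ds dataset, the following builds a small hypothetical data frame, toy, from the first three observations printed above, and shows that selecting variables by position, by name, or via utils::head() picks out the same observations.

```r
# A hypothetical toy data frame holding the first three observations of
# five of the variables printed above: rows are observations and columns
# are variables.
toy <- data.frame(
  date     = as.Date(c("2008-12-01", "2008-12-02", "2008-12-03")),
  location = c("Albury", "Albury", "Albury"),
  min_temp = c(13.4, 7.4, 12.9),
  max_temp = c(22.9, 25.1, 25.7),
  rainfall = c(0.6, 0.0, 0.0)
)

# Index as [rows, columns]: the first 2 observations of the first 3 variables.
by_position <- toy[1:2, 1:3]

# The same selection, naming the variables rather than numbering them.
by_name <- toy[1:2, c("date", "location", "min_temp")]

# utils::head() keeps the first n observations of every variable.
first_two <- head(toy, 2)

print(by_position)
```

Naming the variables is generally more robust than numbering them, since column positions shift whenever variables are added to or removed from the dataset.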

Alternatively, we might display 10 randomly sampled observations (dplyr::sample_n()) of 5 randomly chosen variables (base::sample() combined with dplyr::select()):

```r
# Display a random selection of observations and variables.

ds %>%
  sample_n(10) %>%
  select(sample(1:ncol(ds), 5)) %>%
  print.data.frame()
```
```
##     wind_gust_speed max_temp rainfall wind_speed_3pm rain_today
## 1               30     25.1      0.2             11         No
## 2               72     30.7      1.0             33         No
## 3               56     14.9      1.6             20        Yes
## 4               33     28.8      0.0             20         No
## 5               37     31.3      0.0             20         No
## 6               35     35.7      0.0             15         No
## 7               24     15.5      0.0             15         No
## 8               22     22.5      0.0              6         No
## 9               31     20.0      0.4              4         No
## 10              35     21.9      0.0             17         No
```
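Because the selection is random, each run displays different observations and variables. A minimal sketch, assuming a repeatable selection is wanted, fixes the random seed with set.seed() and performs the equivalent sampling in base R with sample() on the row and column indices. The data frame toy is a hypothetical stand-in for ds built from the ten observations printed above, and the seed 42 is arbitrary.

```r
# A hypothetical stand-in for ds, using the ten observations printed above.
toy <- data.frame(
  wind_gust_speed = c(30, 72, 56, 33, 37, 35, 24, 22, 31, 35),
  max_temp        = c(25.1, 30.7, 14.9, 28.8, 31.3, 35.7, 15.5, 22.5, 20.0, 21.9),
  rainfall        = c(0.2, 1.0, 1.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0),
  wind_speed_3pm  = c(11, 33, 20, 20, 20, 15, 15, 6, 4, 17),
  rain_today      = c("No", "No", "Yes", "No", "No", "No", "No", "No", "No", "No")
)

# Fix the seed (42 is arbitrary) so the random selection can be repeated.
set.seed(42)
rows <- sample(seq_len(nrow(toy)), 4)  # 4 random observations
cols <- sample(seq_len(ncol(toy)), 3)  # 3 random variables
sampled <- toy[rows, cols]

# Resetting the same seed reproduces exactly the same selection.
set.seed(42)
rows_again <- sample(seq_len(nrow(toy)), 4)
cols_again <- sample(seq_len(ncol(toy)), 3)
again <- toy[rows_again, cols_again]

print(sampled)
```

In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), so ds %>% slice_sample(n = 10) expresses the same row sampling; calling set.seed() beforehand makes either form repeatable.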

This tabular form, with observations as rows and variables as columns, is common in data science, and we refer to it as our dataset.
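A data frame is a list of equal-length columns, so each variable keeps its own type, whereas every cell of an R matrix must share one type. As a minimal sketch, using a hypothetical toy data frame built from observations printed earlier, dim() reports the number of observations and variables, and as.matrix() shows the coercion to a single type that a matrix requires.

```r
# Hypothetical toy dataset: 3 observations (rows) of 4 variables (columns).
toy <- data.frame(
  date     = as.Date(c("2008-12-01", "2008-12-02", "2008-12-03")),
  location = c("Albury", "Albury", "Albury"),
  min_temp = c(13.4, 7.4, 12.9),
  rainfall = c(0.6, 0.0, 0.0)
)

# Number of observations then number of variables: returns c(3, 4).
dim(toy)

# The variable names.
names(toy)

# Each column keeps its own type: date is Date, location is character,
# and min_temp and rainfall are numeric.
sapply(toy, function(column) class(column)[1])

# A matrix holds a single type, so the mixed columns become character.
m <- as.matrix(toy)
typeof(m)  # "character"
```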