10.33 Data Source
20180908 We begin by identifying a data source and choose the simplest of sources: a text-based CSV (comma-separated values) file, which is a typical data source format.
The rattle::weatherAUS dataset from rattle (G. Williams 2024) will again be used. A binary formatted R dataset is provided by the package but the CSV file for the same dataset is available at https://rattle.togaware.com/weatherAUS.csv.
Identify and record the location of the CSV file to analyse. R can ingest data directly from the Internet and so we will illustrate that here. The location of the file (the so-called URL or uniform resource locator) will be saved as a string in a variable called dspath, the path to the dataset. The following assignment command does this for us. Simply type this into your R script file within RStudio. The command is then executed in RStudio by clicking the Run button whilst the cursor is situated on the line within the script file.
# Note the source location of a dataset to ingest into R.
dspath <- "http://rattle.togaware.com/weatherAUS.csv"
The assignment operator <- will store the value on the right hand side (which is a string enclosed within quotation marks) into the computer’s memory and we can later refer to it as the R variable dspath. We retrieve the string simply by reference to the variable dspath.
By typing the name of the variable (dspath) in the R Console at the > prompt R will respond with the value stored in the variable:
## [1] "http://rattle.togaware.com/weatherAUS.csv"
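Because dspath holds an ordinary character string, base R functions can be used to inspect it before any data is ingested. The following is a minimal sketch using only base R and the tools package that ships with R. The checks themselves are illustrative and are not part of the original workflow.

```r
# Note the source location of a dataset to ingest into R.
dspath <- "http://rattle.togaware.com/weatherAUS.csv"

# The variable is a character vector holding a single string.
class(dspath)
length(dspath)

# Extract the file name and its extension from the location.
basename(dspath)
tools::file_ext(dspath)

# Check whether the location looks like a web address or a local path.
grepl("^https?://", dspath)
```

A check like the last one can be handy in a script that needs to behave differently for remote and local sources.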
If not connected to the Internet we can read the data directly from a local copy of the CSV file. The rattle (G. Williams 2024) package (once the package has been installed) provides a smaller sample dataset. The location of the CSV file within rattle (G. Williams 2024) is determined using base::system.file(). Knowing that CSV files are located within the csv sub-directory of the rattle (G. Williams 2024) package we generate the string that identifies the file system path to the sample file.
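A call along the following lines would produce such a path. This is a sketch only: the file name weather.csv is an assumption taken from the example path shown below, and the check for an empty string is added here to handle the case where rattle is not installed.

```r
# Locate the sample CSV file shipped with the rattle package.
# Assumption: the file is named weather.csv in the csv sub-directory.
dspath <- system.file("csv", "weather.csv", package="rattle")

# system.file() returns an empty string when the package or file is missing.
if (nzchar(dspath)) {
  print(dspath)
} else {
  message("rattle or its weather.csv file was not found on this system.")
}
```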
## [1] "/usr/lib/R/site-library/rattle/csv/weather.csv"
This is the path to the CSV file on my file system. Your path may well be different depending on where your system installed the rattle (G. Williams 2024) package.
Note that this is a considerably smaller subset of the full weatherAUS dataset and ingesting this rather than the full dataset will lead to different results to those presented here.
If you have separately downloaded the CSV file then you can identify its location. Here we identify that the downloaded file is located in the current working directory.
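For a downloaded copy, the assignment might look like the following sketch. It assumes the file was saved under the name weatherAUS.csv, matching the name used in the URL. The existence check is an added convenience rather than a required step.

```r
# Assumption: the downloaded file was saved as weatherAUS.csv in the
# current working directory (check the directory with getwd()).
dspath <- "weatherAUS.csv"

# Report a missing file before any attempt is made to read it.
if (!file.exists(dspath)) {
  message("No ", dspath, " found in ", getwd())
}
```

A relative path like this is resolved against the current working directory, so the same script works wherever the file sits alongside it.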
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0