Data Science Desktop Survival Guide
by Graham Williams

Data Source

20180908 We begin by identifying a data source, choosing the simplest of sources: a text-based CSV (comma-separated value) file, which is a typical data source format.

The rattle::weatherAUS dataset from rattle will again be used. A binary formatted R dataset is provided by the package but the CSV file for the same dataset is available at https://rattle.togaware.com/weatherAUS.csv.

Identify and record the location of the CSV file to analyse. R can ingest data directly from the Internet and so we will illustrate that here. The location of the file (the so-called URL or uniform resource locator) will be saved as a string in a variable called dspath, the path to the dataset. The following assignment command does this for us. Simply type this into your R script file. The command is then executed by clicking the Run button whilst the cursor is situated on the line within the script file.

# Note the source location of a dataset to ingest into R.

dspath <- "http://rattle.togaware.com/weatherAUS.csv"

The assignment operator <- will store the value on the right-hand side (which is a string enclosed within quotation marks) into the computer's memory, and we can later refer to it as the R variable dspath. We retrieve the string simply by reference to the variable dspath.

When we type the name of the variable (dspath) in the R Console at the > prompt, R will respond with the value stored in the variable:

dspath
## [1] "http://rattle.togaware.com/weatherAUS.csv"
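As a quick sanity check on what has been stored, the following minimal sketch uses only base R and the tools package that ships with R to confirm that dspath is a single character string that looks like a URL pointing to a CSV file. The particular checks chosen here are illustrative and are not part of the original workflow.

```r
# Store the source location as before.
dspath <- "http://rattle.togaware.com/weatherAUS.csv"

# The variable holds a character vector of length one.
class(dspath)                 # "character"
length(dspath)                # 1

# The string begins with a web protocol, so it refers to a remote source.
grepl("^https?://", dspath)   # TRUE

# The final path component and its extension suggest a CSV file.
basename(dspath)              # "weatherAUS.csv"
tools::file_ext(dspath)       # "csv"
```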

If not connected to the Internet, we can read the data directly from a local copy of the CSV file. The rattle package (once the package has been installed) provides a smaller sample, weather.csv. The location of the CSV file within rattle is determined using base::system.file(). Knowing that CSV files are located within the csv sub-directory of the rattle package, we generate the string that identifies the file system path to weather.csv.

library(magrittr)  # Provides the tee pipe %T>% used below.

dspath <- system.file("csv", "weather.csv", package="rattle") %T>% print()
## [1] "/usr/lib/R/site-library/rattle/csv/weather.csv"

This is the path to the CSV file on my file system. Your path may well be different depending on where your system installed the rattle package.
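Because base::system.file() returns an empty string when the package or the file cannot be found, it can be worth checking the result before going further. The following is a minimal sketch of such a check. The wording of the messages is illustrative only.

```r
# Look up the sample CSV file shipped with rattle. An empty string is
# returned if rattle is not installed or the file is missing.
dspath <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(dspath)) {
  message("Found local CSV file: ", dspath)
} else {
  message("weather.csv was not found. Is the rattle package installed?")
}
```

Alternatively, calling system.file() with mustWork = TRUE raises an error instead of returning an empty string.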

Note that this is a considerably smaller subset of the full weatherAUS dataset, and ingesting this rather than the full dataset will lead to results that differ from those presented here.

If you have separately downloaded weatherAUS.csv then you can identify its location. Here we identify that the downloaded file is located in the current working directory.

dspath <- "./weatherAUS.csv"
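The choice between the local and remote sources can also be made programmatically. The sketch below prefers a downloaded copy in the current working directory, if one exists, and otherwise falls back to the URL. The variable names local_copy and remote_url are hypothetical and are introduced here only for illustration.

```r
# Candidate locations for the dataset (names are illustrative only).
local_copy <- "./weatherAUS.csv"
remote_url <- "http://rattle.togaware.com/weatherAUS.csv"

# Prefer the local copy when it exists, otherwise use the URL.
dspath <- if (file.exists(local_copy)) local_copy else remote_url

# Report which source was selected.
message("Data source: ", dspath)
```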


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.