Data Science Desktop Survival Guide
by Graham Williams

Data Ingestion

20180721 Having identified the source of the dataset, we can ingest it into the computer's memory using readr::read_csv(), which returns an enhanced data frame (a tibble).

We set up a reference to the data frame's location in the computer's memory by assigning the result of the call to the function readr::read_csv() to the R variable weather.

# Ingest the dataset.

weather <- read_csv(file=dspath)

As a side effect of calling readr::read_csv(), helpful messages are displayed that identify the data type of each variable found in the ingested dataset. We should review these to ensure they match our expectations. If they don't, optional arguments to readr::read_csv(), notably col_types=, let us specify the types explicitly.
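As a minimal sketch of overriding the guessed types, the following writes a tiny CSV to a temporary file and ingests it twice: once with the defaults and once with col_types=. The temporary file, its column names, and its few rows are invented for illustration only and stand in for the file at dspath. The chosen types are assumptions about what we might want, not a statement of how the actual dataset should be typed.

```r
library(readr)

# Hypothetical stand-in for the file at dspath: a tiny CSV with
# made-up illustrative rows, written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,Location,MinTemp,RainToday",
             "2008-12-01,Albury,13.4,No",
             "2008-12-02,Albury,7.4,No",
             "2008-12-03,Albury,12.9,Yes"),
           tmp)

# Default ingestion: column types are guessed and reported in a message.
guessed <- read_csv(file = tmp)

# Explicit ingestion: col_types= replaces the guesses with our own choices.
typed <- read_csv(file = tmp,
                  col_types = cols(Date      = col_date(format = "%Y-%m-%d"),
                                   Location  = col_character(),
                                   MinTemp   = col_double(),
                                   RainToday = col_factor(levels = c("No", "Yes"))))

# Review the column specification actually used and the class of the result.
spec(typed)
class(typed)
```

Supplying col_types= also silences the type-guessing message, which can be convenient in scripts once the expected types are settled.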

Note that the rattle package also provides a smaller dataset, rattle::weather, as an R dataset, also named weather. Simply attaching the rattle package from the library makes a variable called weather available. Running the above command creates our own variable weather, which masks the dataset provided by rattle rather than modifying it. We can therefore still access the dataset provided by rattle using the package prefix, as in rattle::weather.
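The masking behaviour described above is not specific to rattle. As a sketch that runs without rattle installed, the following uses mtcars from the datasets package (attached by default in R) in place of rattle::weather. The substitution is an assumption made purely so the example is self-contained; the same pattern applies to weather and rattle::weather.

```r
# A dataset shipped with an attached package is visible by its bare name.
nrow(mtcars)

# Assigning to the same name creates our own variable, which masks the
# packaged dataset. The package's copy is left untouched.
mtcars <- head(datasets::mtcars, 5)
nrow(mtcars)

# The package prefix still reaches the original dataset.
nrow(datasets::mtcars)

# Removing our variable lets the bare name resolve to the package again.
rm(mtcars)
nrow(mtcars)
```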


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.