Data Science Desktop Survival Guide
by Graham Williams

CSV Data

20201022 One of the simplest and most common ways of sharing data today remains the Comma Separated Values (CSV) format. Because the format is so simple, csv files have become a standard for exchanging data between applications. They can be exported and imported by spreadsheets, databases, and many other applications, including Rattle, LibreOffice Calc, Gnumeric, MS/Excel, SAS/Enterprise Miner, Teradata, and Netezza.

The downside of the csv format is that the file does not contain explicit metadata (i.e., data about the data), such as the data types of the different columns. Typically we would like to know whether a column contains numeric or character data. If numeric, are the values dates or dollars? If character, are they categoric (factors)? Without this metadata, R commands for reading the data have to guess and will sometimes determine the wrong data type for a particular column. There are options available to reduce this possibility or to provide the metadata.

Reading csv files is straightforward using readr::read_csv().

library(readr)        # Modern and efficient data reader/writer.

ds <- read_csv("mydata.csv")
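Because the guess depends on the values found in the file, it can help to check what readr decided. The following is a minimal sketch, assuming a small made-up file written to a temporary location (the column names and values are illustrative only):

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical sample data written to a temporary file for illustration.
fname <- tempfile(fileext=".csv")
writeLines(c("id,date,location",
             "1001,2020-10-22,Canberra",
             "1002,2020-10-23,Sydney"),
           fname)

ds <- read_csv(fname)

# Report the column types that read_csv() guessed.
spec(ds)
```

Here id is guessed to be a double, even though an identifier like this is often better treated as character data.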

Column types can be specified using a compact string with one character per column, as in col_types="inffcD". The string can contain any of c (character), i (integer), n (number), d (double), l (logical), f (factor), D (date), T (date and time), t (time), ? (guess), and _ or - (skip the column).
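As a sketch of how the compact string lines up with the columns, the following reads a small made-up file (the file contents and column names are assumptions for illustration) with four columns and a four-character col_types string:

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical sample data written to a temporary file for illustration.
fname <- tempfile(fileext=".csv")
writeLines(c("id,rain,location,date",
             "1001,0.6,Canberra,2020-10-22",
             "1002,1.2,Sydney,2020-10-23"),
           fname)

# One letter per column: character, double, factor, date.
ds <- read_csv(fname, col_types="cdfD")

# Confirm the class of each column.
sapply(ds, class)
```

Supplying the types also means read_csv() no longer has to guess, so the result does not depend on which values happen to appear in the file.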

Writing a dataset to a csv file is straightforward using readr::write_csv():

library(rattle)       # Dataset: weatherAUS.
library(dplyr)        # Wrangling: select().
library(readr)        # Modern and efficient data reader/writer.

ds <- weatherAUS

fname  <- "temperatureAUS.csv"

ds %>%
  select(Date, Location, MinTemp, MaxTemp, Temp9am, Temp3pm) %>%
  write_csv(fname)
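Since the csv file carries no metadata, column types such as factors do not survive a round trip on their own. A minimal sketch, assuming a small made-up data frame rather than weatherAUS so that it runs without rattle:

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical data frame with a factor column, for illustration only.
ds <- data.frame(location=factor(c("Canberra", "Sydney")),
                 max_temp=c(25.1, 22.4))

fname <- tempfile(fileext=".csv")
write_csv(ds, fname)

# The factor is written as plain text, so it is read back as character.
guessed <- read_csv(fname)
class(guessed$location)

# Supplying col_types restores the factor.
typed <- read_csv(fname, col_types="fd")
class(typed$location)
```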

By default read_csv() prints a message listing the column types it identified. To turn off this message, set the option for the number of columns to report to 0:

options(readr.num_columns=0)

To turn the message back on, set the option to NULL.
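The message can also be controlled for a single call. The following sketch assumes readr 2.0.0 or later is installed, where read_csv() accepts a show_col_types argument; with earlier versions this argument is not available. The file contents are made up for illustration.

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical sample data written to a temporary file for illustration.
fname <- tempfile(fileext=".csv")
writeLines(c("x,y",
             "1,a"),
           fname)

# Assumes readr >= 2.0.0: suppress the column type message for this call.
ds <- read_csv(fname, show_col_types=FALSE)
```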


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.