20200104 Data is available in an enormous variety of
formats and stored in a wide range of locations, including our own
data stores, local disks, cloud stores, database systems, and
Internet sites like http://data.gov, http://data.gov.uk/,
and http://data.gov.au/. A major task we face as data
scientists is ingesting that data into R so that we can perform
our analyses. R provides extensive capabilities for
reading (importing or ingesting) data and for writing (exporting) data.
Here we explore the options, including a simple and widely used format
known as comma separated values (CSV). A variation of this is
tab separated values (TSV), and indeed any other special character can
be used to separate the columns of the data.
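
As a minimal sketch of the delimited formats just described, the
following uses only base R to write a small data frame to temporary
files and read it back. The data frame contents, the column names, and
the pipe separator are illustrative assumptions, not data from this
chapter.

```r
# A small, made-up data frame used purely for illustration.
ds <- data.frame(id   = 1:3,
                 city = c("Canberra", "Sydney", "Perth"),
                 temp = c(21.5, 24.0, 27.3))

# Temporary files so the example does not touch the working directory.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
psv_file <- tempfile(fileext = ".txt")

# Write the same data with three different column separators.
write.csv(ds, csv_file, row.names = FALSE)
write.table(ds, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(ds, psv_file, sep = "|",  row.names = FALSE, quote = FALSE)

# read.csv() assumes a comma and read.delim() assumes a tab.
from_csv <- read.csv(csv_file)
from_tsv <- read.delim(tsv_file)

# Any other single-character separator is named through sep=.
from_psv <- read.table(psv_file, sep = "|", header = TRUE)
```

All three reads should recover the same three rows and three columns.
Only the separator argument changes between them.
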
But data comes in an amazing variety of formats, including many
proprietary formats that need special effort to decipher. R
supports a very wide range of formats through many different
packages. This chapter introduces the numerous options available.
Also included is a guide to creating your own random dataset for
testing ideas and building systems where the actual data may not be
readily available, for example because of privacy constraints.
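
One possible way to create such a random dataset with base R is
sketched below. The variable names, value ranges, distribution
parameters, and the number of observations are all hypothetical
choices for illustration. They are not the approach this chapter
necessarily takes.

```r
set.seed(42)  # fix the seed so the synthetic data is reproducible

n <- 100      # number of synthetic observations (an arbitrary choice)

# Each column is drawn independently. Real data usually has
# correlations between columns that this simple sketch ignores.
ds <- data.frame(
  id     = seq_len(n),
  age    = sample(18:90, n, replace = TRUE),
  gender = sample(c("female", "male"), n, replace = TRUE),
  income = round(rlnorm(n, meanlog = 10.5, sdlog = 0.5)),
  joined = as.Date("2019-01-01") + sample(0:364, n, replace = TRUE)
)

str(ds)  # inspect the structure of the generated dataset
```

Because no real records are involved, a dataset like this can be
shared freely while a system is being built and tested.
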