Data Science Desktop Survival Guide by Graham Williams

## Formatting Numbers with XTable

As with knitr::kable() we can limit the number of digits displayed, either to avoid giving an impression of a high level of accuracy or to simplify the presentation. In Table 22.3 we have removed all digits after the decimal point.

# Display a table removing digits from numbers.

ds %>%
  xtable(digits=0
       , caption="Decimal points."
       , label="tbldp0") %>%
  print(include.rownames=FALSE)

Table 22.3: Decimal points.
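| Location    | MinTemp | MaxTemp | Rainfall | Evaporation |
|-------------|--------:|--------:|---------:|------------:|
| Tuggeranong |      10 |      25 |        0 |             |
| Wollongong  |      21 |      31 |        1 |             |
| Dartmoor    |       6 |      15 |        2 |           3 |
| Sale        |      17 |      29 |        0 |           6 |
| WaggaWagga  |       8 |      31 |        0 |           4 |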
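The digits= argument need not be a single value. As a minimal sketch, assuming a small made-up data frame `toy` rather than the `ds` dataset used above, a vector of digits can be supplied with one entry per column plus a leading entry for the row names column:

```r
# Minimal sketch: per-column digits with xtable.
# `toy` is a hypothetical data frame, not the book's `ds`.
library(xtable)

toy <- data.frame(Location = c("Dartmoor", "Sale"),
                  MinTemp  = c(6.2, 17.4),
                  Rainfall = c(2.36, 0.04))

# One entry for the row names, then one per column of `toy`.
tbl <- xtable(toy, digits = c(0, 0, 0, 1))

# Capture the LaTeX that print() would emit so that we can inspect it.
out <- capture.output(print(tbl, include.rownames = FALSE))
cat(out, sep = "\n")
```

Here MinTemp is shown without decimals while Rainfall keeps one decimal place. The entry for the character column Location is required for the length to match but has no visible effect.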

When large numbers are displayed it is imperative that we include commas to separate the thousands. Many mistakes are made by misreading numbers with many digits when the commas are left out.
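To see the effect outside of any table, here is a small sketch using only base R. The values are made up for illustration. Both formatC() and format() accept a big.mark= argument:

```r
# Minimal sketch: thousands separators in base R (illustrative values).
x <- c(1234, 1234567)

# Without a separator the scale has to be read by counting digits.
plain  <- formatC(x, format = "d")

# With big.mark the thousands and millions are immediately visible.
commas <- formatC(x, format = "d", big.mark = ",")

print(plain)
print(commas)

# format() takes the same argument for a single value.
single <- format(1234567, big.mark = ",")
print(single)
```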

# Take a copy of the dataset so as to change the data.

dst     <- ds %>% sample_frac(0.01)

# Randomly create very large numbers across all but the first variable.

dst[-1] <- sample(10000:99999, nrow(dst)) * dst[-1]
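The multiplication above relies on how R combines a vector with a data frame. As a sketch on a hypothetical two-row data frame `toy` (not the sampled `dst`), a vector with one element per row scales each row by its own multiplier, and `[-1]` leaves the first column untouched:

```r
# Minimal sketch: multiplying all but the first column by a per-row factor.
# `toy` and `mult` are hypothetical stand-ins for `dst` and the sampled values.
toy <- data.frame(Location = c("A", "B"),
                  MinTemp  = c(1, 2),
                  MaxTemp  = c(3, 4))

# One multiplier per row, as with sample(10000:99999, nrow(dst)).
mult <- c(10, 100)

# The vector is recycled down each numeric column, so row i is scaled by mult[i].
toy[-1] <- mult * toy[-1]

print(toy)
```

Note also that sample() draws without replacement by default, so the multipliers within one call are all distinct.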

# Illustrate the default table display of large numbers.

dst %>%
  xtable(digits=0
       , caption="Large numbers."
       , label="tbllrg") %>%
  print(include.rownames=FALSE)

Table 22.4: Large numbers.
 Location MinTemp MaxTemp Rainfall Evaporation

Consider the result in Table 22.4. It is difficult to distinguish between the thousands and the millions, and we often find ourselves having to carefully count the digits to check the scale of a number. To help the reader we should always use a comma to separate the thousands and millions. This simple principle makes it much easier for the reader to appreciate the scale and to avoid misreading data, yet it is so often overlooked. We can see the result in Table 22.5.

# Format large numbers using commas as appropriate.

dst %>%
  xtable(digits=0
       , caption="Large numbers formatted."
       , label="tbllrgf") %>%
  print(include.rownames=FALSE,
        format.args=list(big.mark=","))

Table 22.5: Large numbers formatted.
 Location MinTemp MaxTemp Rainfall Evaporation
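The same comparison can be reproduced on a self-contained example. This is a sketch only, assuming a hypothetical one-row data frame `big` with made-up values rather than the randomly generated `dst`. It prints the table with and without format.args= and captures the LaTeX output for inspection:

```r
# Minimal sketch: effect of format.args=list(big.mark=",") in print.xtable.
# `big` is a hypothetical data frame with made-up values.
library(xtable)

big <- data.frame(Location = "Sale",
                  MinTemp  = 1234567,
                  Rainfall = 98765432)

tbl <- xtable(big, digits = 0)

# Default display: long runs of digits.
plain <- capture.output(print(tbl, include.rownames = FALSE))

# Formatted display: commas separate the thousands and millions.
commas <- capture.output(print(tbl, include.rownames = FALSE,
                               format.args = list(big.mark = ",")))

cat(plain, sep = "\n")
cat(commas, sep = "\n")
```

The list given to format.args= is passed on to the number formatting that xtable performs, which is why the same big.mark= name used by formatC() applies here.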
