Data Science Desktop Survival Guide
by Graham Williams

Random Dataset

20200421 Using wakefield::r_data_frame() it is straightforward to create simple random datasets. More sophisticated sampling can be used if required, though extra work is needed to obtain correlated distributions across variables when aiming to closely match some real world data. Nonetheless, creating random data quickly is the forte of wakefield.

library(wakefield)    # Generate random datasets.
library(dplyr)        # glimpse() to review the dataset.

nobs <- 100 # Number of observations.

r_data_frame(n=nobs,
             id=id_factor,
             givennames=name,
             surname=name,
             dob=dob(start=Sys.Date()-365*100, k=365*99),
             sex=sex,
             phone=r_sample(x=400000000:499999999),
             diabetes=answer,
             result=r_sample_factor(x=c("pending", "positive", "negative")),
             previous=answer(prob=c(0.8,0.2)),
             bps=r_sample(90:135),
             temp=normal(36.3, 1.2),
             gcs=r_sample(3:15, prob=c(rep(0.01, 12), 0.88)),
             crp=normal(2.5, 2, min=0, max=200)) ->
random_ds

glimpse(random_ds)
## Rows: 100
## Columns: 13
## $ id         <fct> 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011...
## $ givennames <chr> "Jackalyn", "Elmedina", "Abryl", "Maylis", "Kenzingty...
## $ surname    <chr> "Cogan", "Janyriah", "Gidgett", "Chaisty", "Yanis", "...
## $ dob        <date> 1996-01-13, 1978-03-04, 1936-05-07, 1995-04-06, 1988...
## $ sex        <fct> Male, Female, Female, Female, Male, Male, Female, Fem...
## $ phone      <int> 471696500, 409332300, 480420807, 492645846, 440390288...
## $ diabetes   <fct> No, No, Yes, Yes, No, Yes, Yes, Yes, Yes, No, No, Yes...
## $ result     <fct> negative, positive, positive, pending, positive, posi...
## $ previous   <fct> No, No, No, Yes, No, No, No, No, No, No, No, No, No, ...
## $ bps        <int> 93, 129, 114, 118, 108, 127, 104, 123, 124, 124, 94, ...
## $ temp       <dbl> 37.43873, 38.01132, 34.90968, 38.08261, 35.84380, 36....
## $ gcs        <int> 15, 15, 15, 15, 14, 15, 15, 15, 15, 15, 15, 15, 15, 1...
## $ crp        <dbl> 0.6167378, 0.3240050, 4.7457651, 3.4928462, 2.5923718...
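
The columns generated above are drawn independently of one another. As a minimal sketch of the extra work needed for correlated variables, the following uses base R rather than wakefield: it draws independent standard normal values and mixes them with the Cholesky factor of a target correlation matrix. The correlation of 0.6 between temp and crp, and the name correlated_ds, are assumptions chosen purely for illustration and are not from any real data.

```r
set.seed(42)    # Reproducible draws.

nobs <- 100     # Number of observations.
rho  <- 0.6     # Assumed target correlation (illustrative only).

# Target correlation matrix for the two variables.
sigma <- matrix(c(1, rho, rho, 1), nrow=2)

# Independent standard normal draws, one column per variable.
z <- matrix(rnorm(nobs * 2), ncol=2)

# chol() returns an upper triangular R with t(R) %*% R equal to sigma,
# so the columns of z %*% R have the target correlation in expectation.
zc <- z %*% chol(sigma)

# Rescale to the means and standard deviations used for temp and crp.
# Unlike normal(..., min=0), no truncation is applied to crp here.
correlated_ds <- data.frame(temp=36.3 + 1.2 * zc[, 1],
                            crp=2.5 + 2.0 * zc[, 2])

cor(correlated_ds$temp, correlated_ds$crp)
```

The sample correlation will vary around the target for a dataset of this size. The resulting columns could be bound to a wakefield dataset with cbind() or dplyr::bind_cols() if correlated measurements are required.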


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.