Data Science Desktop Survival Guide
by Graham Williams
Random Dataset
20200421 Using wakefield::r_data_frame() it is straightforward to create simple random datasets. More sophisticated sampling can be used if required, though extra work is needed to obtain correlated distributions across variables when trying to closely match some real-world data. Nonetheless, creating random data quickly is the forte of wakefield.
library(wakefield) # Generate random datasets.
library(dplyr)     # Provides glimpse().

nobs <- 100 # Number of observations.

r_data_frame(n=nobs,
             id=id_factor,
             givennames=name,
             surname=name,
             dob=dob(start=Sys.Date()-365*100, k=365*99),
             sex=sex,
             phone=r_sample(x=400000000:499999999),
             diabetes=answer,
             result=r_sample_factor(x=c("pending", "positive", "negative")),
             previous=answer(prob=c(0.8,0.2)),
             bps=r_sample(90:135),
             temp=normal(36.3, 1.2),
             gcs=r_sample(3:15, prob=c(rep(0.01, 12), 0.88)),
             crp=normal(2.5, 2, min=0, max=200)) ->
random_ds

glimpse(random_ds)
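The extra work needed for correlated variables can be sketched in base R, without wakefield. The following is only an illustration under stated assumptions: the target correlation rho, the means and standard deviations, and the name correlated_ds are hypothetical choices rather than anything from the example above. It draws independent standard normal values and multiplies them by the Cholesky factor of the desired correlation matrix, so that the two resulting columns are correlated.

```r
set.seed(42)   # Make the sketch repeatable.

nobs <- 100    # Number of observations.
rho  <- 0.7    # Hypothetical target correlation between the two variables.

# Desired correlation matrix and its upper triangular Cholesky factor.
sigma <- matrix(c(1, rho, rho, 1), nrow=2)
upper <- chol(sigma)

# Independent standard normal draws, then mixed to induce the correlation.
z <- matrix(rnorm(nobs * 2), ncol=2) %*% upper

# Rescale to illustrative (assumed) means and standard deviations.
data.frame(bps=112 + 10 * z[, 1],
           temp=36.3 + 1.2 * z[, 2]) ->
correlated_ds

cor(correlated_ds$bps, correlated_ds$temp)
```

With only 100 observations the sample correlation will vary around rho rather than match it exactly. Columns generated this way could then be combined with a wakefield dataset, for example using cbind(), if both have the same number of rows.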