Random Seed

		Data Science Desktop Survival Guide by Graham Williams



Much of our modelling involves randomly sampling datasets, and sampling relies on a sequence of random numbers. Such a sequence is repeatable if the random number generator is initialised with a chosen seed. We do this so that the examples presented throughout this book can be replicated. We will shortly identify a random training dataset as a subset of the whole dataset. To ensure the same random subset is selected each time, we initialise the random number generator with a specific seed using base::set.seed(). The seed value itself is arbitrary; for no particular reason we choose 42.

seed <- 42
set.seed(seed)
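
To see what the seed buys us, the following minimal sketch draws a training subset twice, resetting the seed before each draw. The dataset (datasets::iris), the 70% proportion, and the names ds, nobs, train_a and train_b are assumptions chosen for illustration only; they are not taken from the text above.

```r
# Hypothetical stand-in dataset; any data frame would do.
ds   <- datasets::iris
nobs <- nrow(ds)

seed <- 42

# First draw: sample 70% of the row indices as a training subset.
base::set.seed(seed)
train_a <- base::sample(nobs, round(0.7 * nobs))

# Second draw: reset the generator to the same seed and sample again.
base::set.seed(seed)
train_b <- base::sample(nobs, round(0.7 * nobs))

# The two index vectors should match element for element.
same_subset <- base::identical(train_a, train_b)
print(same_subset)
```

Calling base::sample() again without resetting the seed continues the random sequence from where it left off, so it will generally return a different subset.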

It is worth noting that many model builders use heuristics to search for a good model rather than the best model. This is often necessary because finding the best model means searching an enormous space of all possible models, and the computational requirements are generally prohibitive, potentially years of computer time. Heuristics reduce the computational requirements to something feasible. Such heuristics often involve some level of random decision making in choosing which paths to follow in the search. Setting the seed for the random number generator to a known initial value ensures we can replicate the model building later.
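
As an illustration of a model builder whose search involves random choices, the sketch below uses stats::kmeans(), which picks its starting cluster centres at random. The choice of k-means, the iris measurements, and the names x, fit_a and fit_b are assumptions made for this example rather than anything prescribed by the text.

```r
# Numeric columns of a hypothetical stand-in dataset.
x <- base::as.matrix(datasets::iris[, 1:4])

# Build the model after setting the seed to a known value.
base::set.seed(42)
fit_a <- stats::kmeans(x, centers = 3)

# Rebuild it later with the same seed.
base::set.seed(42)
fit_b <- stats::kmeans(x, centers = 3)

# With the same seed the random starting centres are the same,
# so the resulting cluster assignments should be identical.
same_model <- base::identical(fit_a$cluster, fit_b$cluster)
print(same_model)
```

Without the calls to base::set.seed(), two runs may start from different centres and can settle on different clusterings, which is the behaviour the seed is meant to guard against.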

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.