20210314 This book has been a work in progress since 1995, just as Data Science continues to develop and expand into our lives. Every section will (eventually) begin with a date that indicates the currency of the section, that is, when it was last reviewed or updated.
Since I began the survival guide books in 1995 they have grown in all kinds of directions. My original aim was to capture useful notes for the many and varied common tasks we find ourselves doing as data scientists (or data miners back then). I structured the book as one-page nuggets of information. That is, each section within a chapter targeted a single printed page and focused on a single task. This was the origin of my OnePageR Desktop Survival Guides. It seems to have worked well over the years, judging from my personal use and your feedback. This material has also led to the publication of two books.
Readers are invited to send corrections, comments, suggestions, and updates to me at Graham.Williams@togaware.com. Your feedback is most welcome and will be acknowledged within the book.
A PDF version of this book is available for a small financial donation, which goes towards supporting the development and availability of the book. Please visit Togaware for the details. The HTML version contains the same material and remains freely available from Togaware.
This book is produced using bookdown. Emacs is used to edit the text. Many will be using RStudio to edit their bookdown documents, which is generally a friendlier environment and is the environment of choice for bookdown support. I’ve used Emacs since 1985 and, as a fully extensible “kitchen-sink” type of editor, it has served me well for over 35 years, despite numerous flirtations with “better” editors over my career. RStudio and Visual Studio Code come close.
Bookdown is an rmarkdown-based platform for intermixing text with executable code (such as Python, R, and shell code blocks). Rmarkdown itself utilises the simple markdown syntax to mark up the sections of a document. Running knitr over the rmarkdown material executes the code blocks and produces a markdown document.
Pandoc is then utilised to produce HTML, which is published on the Web. For the PDF output, pandoc utilises LaTeX, converting the markdown into LaTeX markup, with XeTeX then used to convert that to PDF.
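The tool chain described above can be sketched as follows. This is a minimal illustration in Python, not the actual build script for this book: the file name `preface.Rmd` and the helper `build_commands` are hypothetical, and in practice bookdown drives these steps itself (for example through `bookdown::render_book()`). The sketch only assembles the command lines for each stage; it does not run them.

```python
from pathlib import Path


def build_commands(source):
    """Return the commands for each output format, without running them.

    `source` is a hypothetical rmarkdown file name such as "preface.Rmd".
    """
    src = Path(source)
    md = src.with_suffix(".md").name
    html = src.with_suffix(".html").name
    pdf = src.with_suffix(".pdf").name

    # Step 1: knitr executes the code blocks and writes plain markdown.
    knit = ["Rscript", "-e", f"knitr::knit('{src.name}', output = '{md}')"]

    # Step 2a: pandoc converts the markdown to a standalone HTML page.
    to_html = ["pandoc", md, "--standalone", "--output", html]

    # Step 2b: pandoc converts the markdown to LaTeX and typesets it
    # with the XeTeX engine (the `xelatex` program) to produce the PDF.
    to_pdf = ["pandoc", md, "--pdf-engine=xelatex", "--output", pdf]

    return {"html": [knit, to_html], "pdf": [knit, to_pdf]}


if __name__ == "__main__":
    for fmt, steps in build_commands("preface.Rmd").items():
        print(fmt)
        for step in steps:
            print("  " + " ".join(step))
```

Each command is kept as a list of arguments so that it could be passed to `subprocess.run` unchanged, assuming R with knitr, pandoc, and a XeTeX installation are available on the system.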
All these tools are open source software and available on multiple platforms.
What’s In A Name
GNU/Linux refers to the GNU environment and the GNU and other applications running in that environment on top of the Linux operating system kernel.
Ubuntu and its underlying base distribution, Debian, are complete repository-based distributions which include many applications pre-built for the particular choice of operating system kernel. The repositories house pre-built packages ready to be installed.
The X Window System is the common windowing system used in Ubuntu and is a separate, complementary component to the operating system itself.
Microsoft Windows (or MS/Windows and, less informatively, just Windows) usually refers to the whole of the popular operating system, from kernel to applications, irrespective of which version of Microsoft Windows is being run, unless the version is important. Microsoft Windows is one of many windowing systems and came on to the screen rather later than the pioneering Apple Macintosh windowing system and the Unix windowing systems. We will refer to MS/Windows version 10 as the last release of this Microsoft operating system, which, going forward, has snapshot releases rather than new versions.
Your donation will support ongoing availability and give you access to the PDF version of this book.
Desktop Survival Guides include Data Science, GNU/Linux, and MLHub. Books available on Amazon include Data Mining with Rattle and Essentials of Data Science. Popular open source software includes rattle, wajig, and mlhub.
Hosted by Togaware, a pioneer of free and open source software since 1984.
Copyright © 1995-2022 Graham.Williams@togaware.com. Creative Commons Attribution-ShareAlike 4.0.