Data Science Desktop Survival Guide
by Graham Williams


Naïve Bayes

Representation: Probabilities

Consider classifying input data by determining the probability of an event occurring given that another event has already occurred. Class probabilities and conditional probabilities are calculated to determine the likelihood of each outcome.

Naïve Bayes is a simple and effective machine learning classifier based on Bayes' Theorem:

$\displaystyle P(A\vert B) = {{P(B\vert A) * P(A)}\over{P(B)}}$

The class probabilities are $P(A)$ (the probability of event $A$) and $P(B)$. The conditional probabilities are $P(A\vert B)$ (the probability of $A$ happening given that $B$ occurs) and $P(B\vert A)$.
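As a worked example of the theorem, consider a diagnostic-style calculation. The numbers below are invented purely for illustration:

```python
# Hypothetical example: a test for a condition (all numbers are illustrative).
p_a = 0.01          # P(A): prior probability of the condition
p_b_given_a = 0.9   # P(B|A): probability of a positive test given the condition
p_b = 0.05          # P(B): overall probability of a positive test

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.18
```

Even with a 90% detection rate, the low prior $P(A)$ keeps the posterior $P(A\vert B)$ modest, which is exactly the kind of reasoning the theorem makes explicit.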

A Naïve Bayes classifier combines these probabilities across the features of the input dataset, treating each feature as independent of the others and of equal weight or importance, to make its predictions.
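A minimal sketch of this combination of probabilities, using an invented toy weather dataset (the data and feature names are illustrative only, not from the book):

```python
from collections import Counter

# Toy dataset: each row is (outlook, temperature, class). Invented for illustration.
data = [
    ("sunny", "hot",  "no"),
    ("sunny", "mild", "no"),
    ("rain",  "mild", "yes"),
    ("rain",  "cool", "yes"),
    ("sunny", "cool", "yes"),
    ("rain",  "hot",  "no"),
]

classes = Counter(row[-1] for row in data)

def predict(features):
    # Score each class by P(class) times the product of P(feature|class),
    # treating every feature as independent and equally weighted.
    best, best_score = None, -1.0
    for cls, cls_count in classes.items():
        score = cls_count / len(data)              # class probability P(class)
        for i, value in enumerate(features):
            match = sum(1 for row in data if row[-1] == cls and row[i] == value)
            score *= match / cls_count             # conditional P(feature|class)
        if score > best_score:
            best, best_score = cls, score
    return best

print(predict(("rain", "mild")))  # yes
```

The class with the largest product of its class probability and the per-feature conditional probabilities is returned as the prediction.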

The idea is simple and works well even on small datasets. It suffers, however, from the zero frequency problem: if a feature value never occurs with a particular class in the training data, its conditional probability is 0, so the whole product for that class is also 0 and the class is excluded from further consideration. Laplace smoothing avoids this by assigning a small non-zero probability instead.

The method also assumes the features are independent of one another, and when that assumption is not met it can perform poorly.

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.