Classification - Naive Bayes

The Bayes Classifier is a binary classification algorithm. It is often introduced as one of the first algorithms to master in the field of machine learning.

Bayes' Theorem

Bayes' theorem allows us to determine posterior probabilities from priors when presented with evidence. It is a fundamental result in probability theory. It states that, for two given events $A$ and $B$ with $P(B) > 0$:

$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$

$P(B \mid A)$ is the conditional probability of the event $B$ knowing $A$. This is the fundamental result that motivates the development of the Naive Bayes Classifier.
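As a quick numerical illustration, here is Bayes' theorem applied with made-up probabilities (all numbers below are hypothetical, chosen only to show the mechanics):

```python
# Hypothetical numbers for illustration only.
p_a = 0.01              # P(A): prior probability of the event A
p_b_given_a = 0.95      # P(B|A): probability of the evidence B when A holds
p_b_given_not_a = 0.05  # P(B|not A): probability of B when A does not hold

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))
```

Even with strong evidence ($P(B \mid A) = 0.95$), a small prior keeps the posterior modest, which is exactly the prior-to-posterior update the theorem formalizes.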

Notation

In binary classification, we try to predict labels $Y$: $Y$ is either $0$ or $1$. The data we use is a matrix $X$ of $n$ observations with $p$ features.

Our aim is to define a function $h$ such that $h(X) = Y$. As usual, we can define a loss minimization problem where:

$$ L(h) = P(h(X) \neq Y) $$
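This 0-1 loss can be estimated empirically as the misclassification rate. A minimal sketch, with made-up labels and predictions:

```python
# Hypothetical true labels and predictions, just to illustrate the 0-1 loss.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# Empirical estimate of L(h) = P(h(X) != Y): fraction of mistakes
loss = sum(yt != yp for yt, yp in zip(y_true, y_pred)) / len(y_true)
print(loss)  # 2 mistakes out of 8 observations
```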

Solution

We should bear in mind that the law of $(X, Y)$ is unknown. The aim of the Bayes Classifier is to define a decision function such that the conditional probability of an observation falling on either side of it is exactly $\frac{1}{2}$. Therefore, if the output of the Bayes Classifier is above $\frac{1}{2}$, the observation is classified as belonging to the class $1$, whereas in the other case, it is classified as belonging to the class $0$. The decision function is actually defined as the frontier for which:

$$ P(Y = 1 \mid X = x) = \frac{1}{2} $$
How do we minimize the loss function, then?

This means we have two ways to make a mistake:

  • by classifying an observation belonging to the class $0$ in the class $1$,
  • by classifying an observation belonging to the class $1$ in the class $0$.
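Using the same kind of toy labels and predictions as before (again made up), the two error types can be counted separately:

```python
# Hypothetical true labels and predictions.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# Class-0 observation classified as class 1
fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1)
# Class-1 observation classified as class 0
fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0)
print(fp, fn)  # every mistake falls into exactly one of the two categories
```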

By Fubini, we can develop the above-stated equation:

$$ L(h) = P(h(X) \neq Y) = E\left[ \mathbb{1}_{\{h(X) = 1\}} \, P(Y = 0 \mid X) + \mathbb{1}_{\{h(X) = 0\}} \, P(Y = 1 \mid X) \right] $$
The posterior probability of the class $1$ given an observation $x$ is defined as:

$$ \eta(x) = P(Y = 1 \mid X = x) $$
We can replace it in the loss function:

$$ L(h) = E\left[ \mathbb{1}_{\{h(X) = 1\}} \, (1 - \eta(X)) + \mathbb{1}_{\{h(X) = 0\}} \, \eta(X) \right] $$
Then, for a given $x$, the minimum of $L$ is reached at $h(x) = 1$ for:

$$ 1 - \eta(x) \leq \eta(x) $$

which means:

$$ \eta(x) \geq \frac{1}{2} $$

Otherwise, $h(x) = 0$.
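Pointwise, the choice reduces to comparing the two conditional risks: predicting $1$ contributes $1 - \eta(x)$ to the loss, predicting $0$ contributes $\eta(x)$. A small sketch with hypothetical values of $\eta(x)$:

```python
def best_label(eta_x):
    # Predicting 1 costs 1 - eta(x); predicting 0 costs eta(x).
    # Pick the label with the smaller conditional risk.
    return 1 if 1 - eta_x <= eta_x else 0

for eta_x in (0.2, 0.5, 0.8):  # hypothetical values of eta(x)
    print(eta_x, best_label(eta_x))
```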

We can define a general expression for the global minimum:

$$ h^*(x) = \mathbb{1}_{\{\eta(x) \geq \frac{1}{2}\}} $$
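When the law of $(X, Y)$ is known, $\eta$ can be computed explicitly and the rule above applied directly. A minimal sketch with two Gaussian class-conditional densities and equal priors (all parameters are made up for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of a normal distribution N(mu, sigma^2) at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def eta(x, p1=0.5, mu0=-1.0, mu1=1.0, sigma=1.0):
    # Posterior eta(x) = P(Y=1 | X=x), obtained via Bayes' theorem
    # from the class-conditional densities and the class priors.
    f0 = gaussian_pdf(x, mu0, sigma)
    f1 = gaussian_pdf(x, mu1, sigma)
    return p1 * f1 / (p1 * f1 + (1 - p1) * f0)

def bayes_classifier(x):
    # h*(x) = 1 if eta(x) >= 1/2, else 0
    return 1 if eta(x) >= 0.5 else 0

print(bayes_classifier(-2.0), bayes_classifier(0.5))
```

With equal priors and equal variances, the decision frontier $\eta(x) = \frac{1}{2}$ sits exactly halfway between the two means, so observations are assigned to the nearest class mean.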

Conclusion: The Bayes Classifier has a major limitation, since we need to know the joint distribution of $(X, Y)$, which is rarely the case in practice. It is much easier to estimate the decision frontier statistically. This is what we are going to cover in the next articles.