Bayes’ Theorem
- by Thomas Bayes
- where E is the evidence (e.g. smoke, whose probability is hard to know directly) and B is the belief (e.g. fire); a small worked example follows this list
- $P(B|E)= \frac{P(E|B)P(B)}{P(E)}= \frac{P(E|B)P(B)}{P(E|B)P(B)+P(E|\overline B)P(\overline B)}$
- we call $P(B)$ the prior probability and $P(E|B)$ the likelihood
- $P(B_i|E)= \frac{P(B_i)P(E|B_i)}{ \sum_{k} P(B_k)P(E|B_k) }$
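A minimal numeric sketch of the smoke/fire example; every probability below is a made-up value for illustration:

p_fire = 0.01                 # prior P(B): probability of a fire
p_smoke_given_fire = 0.90     # likelihood P(E|B): smoke when there is a fire
p_smoke_given_no_fire = 0.05  # P(E|not B): smoke without a fire

# total probability of the evidence: P(E) = P(E|B)P(B) + P(E|not B)P(not B)
p_smoke = p_smoke_given_fire * p_fire + p_smoke_given_no_fire * (1 - p_fire)

# Bayes' theorem: P(B|E) = P(E|B)P(B) / P(E)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)  # ~0.154: smoke alone does not make a fire likely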
Naïve Bayes
- assumes each piece of evidence makes an independent contribution to the belief (so the likelihoods can be multiplied) and an equal one (every feature is relevant to the belief; otherwise it is useless)
- $P(B_i|E)= \frac{P(B_i) \prod_j P(E_j|B_i)}{ \sum_{k} P(B_k)\prod_j P(E_j|B_k) }$
- taking logarithms helps avoid floating-point underflow; since we only compare numerators, the denominator can be dropped
- $B_{NB} = \arg\max_{B_i} \left( \log P(B_i) + \sum_{n=1}^d \log P(e_n|B_i) \right)$
- Zero-Frequency Problem: evidence values we want to consider but have never observed in the training set get probability 0, which becomes $\log(0)$ in the log-sum; the usual fix is Laplace (add-one) smoothing, as in the sketch after this list
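A from-scratch sketch in plain numpy (with made-up data) of the log-space argmax above, using Laplace (add-one) smoothing so unseen feature values never produce log(0):

import numpy as np

# toy training data: 4 categorical features, binary class; values are for illustration only
X = np.array([[0, 0, 0, 1], [2, 0, 0, 1], [1, 2, 1, 1], [0, 1, 0, 1]])
y = np.array([1, 0, 0, 1])
n_values = 3  # each feature takes values in {0, 1, 2}

classes = np.unique(y)
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))

# Laplace-smoothed log-likelihoods: (count + 1) / (total + n_values),
# so a value never seen in training still gets a small nonzero probability
log_like = np.zeros((len(classes), X.shape[1], n_values))
for ci, c in enumerate(classes):
    Xc = X[y == c]
    for j in range(X.shape[1]):
        counts = np.bincount(Xc[:, j], minlength=n_values)
        log_like[ci, j] = np.log((counts + 1) / (counts.sum() + n_values))

# classify a new sample: argmax over log P(B_i) + sum_j log P(e_j | B_i)
x_new = np.array([0, 2, 0, 0])
scores = log_prior + np.array(
    [log_like[ci, np.arange(X.shape[1]), x_new].sum() for ci in range(len(classes))]
)
print(classes[np.argmax(scores)], scores)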
Continuous Probability Distribution: treat the probability as density × the size of a small neighborhood ($\Delta$); the $\Delta$ cancels in the quotient, so plugging in the density value directly is fine (sketched below)
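sklearn's GaussianNB does exactly this for continuous features: it models each feature with a Gaussian and plugs the density value into the product. A minimal sketch with made-up measurements:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# hypothetical continuous features (e.g. temperature, smoke-sensor reading)
X = np.array([[30.0, 0.2], [32.0, 0.1], [80.0, 0.9], [75.0, 0.8]])
y = np.array([0, 0, 1, 1])  # 0 = no fire, 1 = fire

clf = GaussianNB().fit(X, y)
print(clf.predict([[70.0, 0.7]]), clf.predict_proba([[70.0, 0.7]]))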
Applications
- Real-time prediction (training and prediction are both very fast)
- Text classification (spam filtering; see the sketch after this list)
- Recommendation systems (a Bayes classifier combined with collaborative filtering can filter unseen information and predict whether a user would like a given resource)
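A minimal spam-filtering sketch with sklearn's CountVectorizer and MultinomialNB; the messages and labels are invented for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# tiny made-up corpus: 1 = spam, 0 = ham
messages = ["win a free prize now", "free cash offer", "meeting at noon tomorrow", "lunch tomorrow?"]
labels = [1, 1, 0, 0]

# bag-of-words counts -> multinomial Naive Bayes (alpha=1.0 is Laplace smoothing)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
clf = MultinomialNB(alpha=1.0).fit(X, labels)

test = vectorizer.transform(["free prize tomorrow"])
print(clf.predict(test), clf.predict_proba(test))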
Code
sklearn
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# toy training set: 4 categorical features per sample, binary outcome
training = np.array([[0, 0, 0, 1], [0, 0, 0, 0], [2, 0, 0, 1], [1, 1, 0, 1], [1, 2, 1, 1],
                     [1, 2, 1, 0], [2, 2, 1, 0], [0, 1, 0, 1], [0, 2, 1, 1], [1, 1, 1, 1],
                     [0, 1, 1, 0], [2, 1, 0, 0], [2, 0, 1, 1], [1, 1, 0, 0]])
outcome = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1])
new_sample = np.array([[0, 2, 0, 0]])

# alpha is CategoricalNB's Laplace/Lidstone smoothing parameter; a tiny value effectively disables smoothing
clf = CategoricalNB(alpha=1.0e-10).fit(training, outcome)
pred_class = clf.predict(new_sample)
prob = clf.predict_proba(new_sample)
print(pred_class, prob)