
Saturday, October 8, 2016

Bayesian Classification: Why?


- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayesian Theorem
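- For a class C and a data sample X, Bayes' theorem states:
  P(C|X) = P(X|C) · P(C) / P(X)
- P(C) is the prior probability of class C, P(X|C) is the likelihood of X given C, P(X) is the evidence, and P(C|X) is the a-posteriori (posterior) probability of C given X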
Naïve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class, i.e. P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
- This greatly reduces the computation cost: only the class distribution needs to be counted
- Given a training set, we can estimate these probabilities directly from the data
Bayesian classification
- The classification problem may be formalized using a-posteriori probabilities:
- P(C|X) = probability that the sample tuple X = <x1,…,xk> is of class C
- E.g. P(class = N | outlook = sunny, windy = true, …)
- Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
- Bayes theorem:
  P(C|X) = P(X|C) · P(C) / P(X)
- P(X) is constant for all classes
- P(C) = relative frequency of class C samples
- The C that maximizes P(C|X) is therefore the C that maximizes P(X|C) · P(C)
- Problem: computing P(X|C) directly is infeasible, since it would require estimating a probability for every possible combination of attribute values!
Naïve Bayesian Classification
- Naïve assumption: attribute independence
  P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
- If the i-th attribute is categorical:
  P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute within class C
- If the i-th attribute is continuous:
  P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
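To make both estimation strategies concrete, here is a minimal Python sketch (not from the original slides; names such as NaiveBayes, fit, and predict are illustrative): categorical attributes use relative frequencies, continuous attributes use a Gaussian density, and prediction picks the class maximizing P(C) · P(x1|C) · … · P(xk|C).

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal naive Bayes sketch: relative frequencies for categorical
    attributes, a Gaussian density for continuous ones (no smoothing)."""

    def fit(self, X, y):
        self.n = len(y)
        self.classes = Counter(y)          # class counts, used for P(C)
        self.cat = defaultdict(Counter)    # (class, attr index) -> value counts
        self.num = defaultdict(list)       # (class, attr index) -> numeric values
        for row, c in zip(X, y):
            for i, v in enumerate(row):
                if isinstance(v, (int, float)) and not isinstance(v, bool):
                    self.num[(c, i)].append(v)
                else:
                    self.cat[(c, i)][v] += 1
        return self

    def _p_xi_given_c(self, i, v, c):
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            vals = self.num[(c, i)]
            mu = sum(vals) / len(vals)
            var = sum((x - mu) ** 2 for x in vals) / len(vals) or 1e-9
            return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        return self.cat[(c, i)][v] / self.classes[c]   # relative frequency

    def predict(self, x):
        # choose the class C maximizing P(C) * prod_i P(x_i | C)
        best, best_score = None, -1.0
        for c, count in self.classes.items():
            score = count / self.n
            for i, v in enumerate(x):
                score *= self._p_xi_given_c(i, v, c)
            if score > best_score:
                best, best_score = c, score
        return best

# Tiny usage sketch with made-up weather data (outlook is categorical,
# temperature is continuous).
X = [["sunny", 85], ["rain", 65], ["overcast", 72], ["rain", 71]]
y = ["n", "p", "p", "n"]
print(NaiveBayes().fit(X, y).predict(["rain", 70]))   # -> "p"
```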
Play-tennis example: estimating P(xi|C)
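The original slide's probability table is not reproduced here; the estimates it contained can be read off from the calculation in the next section, computed from the standard 14-example play-tennis training set (9 play, 5 don't play):

- P(p) = 9/14, P(n) = 5/14
- P(outlook = rain | p) = 3/9, P(outlook = rain | n) = 2/5
- P(temperature = hot | p) = 2/9, P(temperature = hot | n) = 2/5
- P(humidity = high | p) = 3/9, P(humidity = high | n) = 4/5
- P(windy = false | p) = 6/9, P(windy = false | n) = 2/5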
Play-tennis example: classifying X
- An unseen sample X = <rain, hot, high, false>
- P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
- P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
- Sample X is classified in class n (don't play)
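A quick Python check of the arithmetic above, also normalizing the two scores into posterior probabilities (the snippet is illustrative, not from the slides):

```python
# Un-normalized scores P(X|C)·P(C) for the two classes, then normalization.
p_score = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)   # play
n_score = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)   # don't play
total = p_score + n_score
print(f"play:       {p_score:.6f} -> P(p|X) = {p_score / total:.3f}")
print(f"don't play: {n_score:.6f} -> P(n|X) = {n_score / total:.3f}")
```

The larger score (and posterior) belongs to class n, matching the classification above.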
The independence hypothesis…
- … makes computation possible
- … yields optimal classifiers when it is satisfied
- … but is seldom satisfied in practice, as attributes (variables) are often correlated
- Attempts to overcome this limitation:
  - Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  - Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayesian Belief Networks
- A Bayesian belief network allows conditional independence to be asserted between subsets of the variables, rather than requiring it between all attributes
- It is a graphical model of causal relationships
- Several scenarios for learning Bayesian belief networks:
  - Given both the network structure and all the variables (fully observable): the easy case
  - Given the network structure but only some of the variables (partially observable)
  - When the network structure is not known in advance
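As a small illustration of the "graphical model" idea (not from the slides), the sketch below encodes a two-node network Outlook → Play for the play-tennis data used above; each node stores a (conditional) probability table, and a joint probability factorizes as P(Outlook) · P(Play | Outlook). The counts assumed here are the standard ones for this dataset (5 sunny, 4 overcast, 5 rain days).

```python
# Two-node belief network: Outlook -> Play, with probability tables
# estimated from the 14-day play-tennis data.
p_outlook = {"sunny": 5/14, "overcast": 4/14, "rain": 5/14}   # P(Outlook)
p_play_given_outlook = {                                      # P(Play | Outlook)
    "sunny":    {"p": 2/5, "n": 3/5},
    "overcast": {"p": 4/4, "n": 0/4},
    "rain":     {"p": 3/5, "n": 2/5},
}

def joint(outlook, play):
    """P(Outlook = outlook, Play = play) = P(Outlook) * P(Play | Outlook)."""
    return p_outlook[outlook] * p_play_given_outlook[outlook][play]

print(joint("rain", "n"))   # 5/14 * 2/5 ≈ 0.143
```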
