Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
- Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured
Bayes' Theorem
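- Bayes' theorem: P(H|X) = P(X|H)·P(H) / P(X), where H is a hypothesis (e.g. a class label) and X is an observed data sample; the slides below apply the same formula with classes as hypotheses.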
Naïve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class
- Greatly reduces the computation cost: only class distributions (simple counts) are needed
- Given a training set, we can compute these probabilities by counting (see the sketch below)
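A minimal counting sketch in Python, assuming a tiny hand-made categorical training set; the attribute values, class labels, and helper names here are illustrative, not taken from the slides:

from collections import Counter, defaultdict

# Toy training set: (outlook, windy, class); purely illustrative data.
data = [
    ("sunny", "false", "n"), ("sunny", "true", "n"),
    ("overcast", "false", "p"), ("rain", "false", "p"),
    ("rain", "true", "n"), ("overcast", "true", "p"),
]

# P(C): relative frequency of each class in the training set.
class_counts = Counter(label for *_, label in data)
total = sum(class_counts.values())
priors = {c: cnt / total for c, cnt in class_counts.items()}

# P(xi|C): relative frequency of value xi for the i-th attribute within class C.
cond_counts = defaultdict(Counter)            # (attribute index, class) -> value counts
for *attrs, label in data:
    for i, value in enumerate(attrs):
        cond_counts[(i, label)][value] += 1

def p_xi_given_c(i, value, c):
    counts = cond_counts[(i, c)]
    return counts[value] / sum(counts.values())

print(priors)                        # {'n': 0.5, 'p': 0.5}
print(p_xi_given_c(0, "rain", "p"))  # 1/3: frequency of outlook=rain among class p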
Bayesian classification
- The classification problem may be formalized using a-posteriori probabilities:
- P(C|X) = probability that the sample tuple X = <x1,…,xk> is of class C
- E.g. P(class=N | outlook=sunny, windy=true, …)
- Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
- Bayes' theorem: P(C|X) = P(X|C)·P(C) / P(X)
- P(X) is constant for all classes
- P(C) = relative frequency of class C samples
- The C that maximizes P(C|X) is therefore the C that maximizes P(X|C)·P(C)
- Problem: computing P(X|C) directly is unfeasible, since it would require estimating the joint distribution over all combinations of attribute values
Naïve Bayesian Classification
- Naïve assumption: attribute independence, i.e. P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)
- If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
- If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases (a classification sketch follows below)
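A minimal classification sketch under these assumptions. The class priors and outlook frequencies below mirror the play-tennis example on the next slides, while the continuous temperature attribute and its per-class mean/standard deviation are invented purely for illustration:

import math

def gaussian(x, mean, std):
    # Gaussian density used to estimate P(xi|C) for a continuous attribute.
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Per-class statistics that would normally be estimated from a training set.
priors = {"p": 9 / 14, "n": 5 / 14}                       # P(C)
outlook_freq = {                                          # P(outlook=value | C)
    "p": {"sunny": 2 / 9, "overcast": 4 / 9, "rain": 3 / 9},
    "n": {"sunny": 3 / 5, "overcast": 0.0, "rain": 2 / 5},
}
temperature_stats = {"p": (73.0, 6.2), "n": (74.6, 8.0)}  # (mean, std) per class, made up

def classify(outlook, temp):
    # Pick the class C maximizing P(X|C)*P(C) under the independence assumption.
    scores = {}
    for c in priors:
        mean, std = temperature_stats[c]
        scores[c] = priors[c] * outlook_freq[c].get(outlook, 0.0) * gaussian(temp, mean, std)
    return max(scores, key=scores.get), scores

print(classify("rain", 70.0))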
Play-tennis example: estimating P(xi|C)
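The slide's estimates, assuming the classic 14-day play-tennis data (9 days of class p = play, 5 days of class n = don't play); the entries for rain, hot, high, false and the two priors agree with the computation on the next slide:

P(p) = 9/14                          P(n) = 5/14
outlook:     P(sunny|p) = 2/9        P(sunny|n) = 3/5
             P(overcast|p) = 4/9     P(overcast|n) = 0
             P(rain|p) = 3/9         P(rain|n) = 2/5
temperature: P(hot|p) = 2/9          P(hot|n) = 2/5
             P(mild|p) = 4/9         P(mild|n) = 2/5
             P(cool|p) = 3/9         P(cool|n) = 1/5
humidity:    P(high|p) = 3/9         P(high|n) = 4/5
             P(normal|p) = 6/9       P(normal|n) = 1/5
windy:       P(true|p) = 3/9         P(true|n) = 3/5
             P(false|p) = 6/9        P(false|n) = 2/5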
Play-tennis example: classifying X
- An unseen sample X = <rain, hot, high, false>
- P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582
- P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286
- Sample X is therefore classified into class n (don't play)
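A quick check of the arithmetic in plain Python, with the values copied from the slide:

# P(X|p)·P(p) for X = <rain, hot, high, false>
score_p = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)
# P(X|n)·P(n)
score_n = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)

print(round(score_p, 6))                  # 0.010582
print(round(score_n, 6))                  # 0.018286
print("n" if score_n > score_p else "p")  # n: don't play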
The independence hypothesis…
- … makes computation possible
- … yields optimal classifiers when it is satisfied
- … but is seldom satisfied in practice, as attributes (variables) are often correlated
- Attempts to overcome this limitation:
  - Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  - Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayesian Belief Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships
- Several cases of learning Bayesian belief networks (a tiny illustrative network follows below):
  - Given both the network structure and all the variables: easy
  - Given the network structure but only some of the variables
  - When the network structure is not known in advance
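A minimal sketch of the idea, assuming a tiny hand-made two-node network (Rain -> WetGrass) with invented conditional probability tables; this is illustrative, not the slides' example:

# Tiny belief network: Rain -> WetGrass.
# Each node stores P(node | parents) as a conditional probability table (CPT).
p_rain = {True: 0.3, False: 0.7}             # P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.2}   # P(WetGrass=True | Rain)

def p_wet():
    # P(WetGrass=True) by summing out the parent variable Rain.
    return sum(p_rain[r] * p_wet_given_rain[r] for r in (True, False))

def p_rain_given_wet():
    # P(Rain=True | WetGrass=True) via Bayes' theorem on the network.
    return p_rain[True] * p_wet_given_rain[True] / p_wet()

print(p_wet())             # 0.41
print(p_rain_given_wet())  # about 0.66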