
Saturday, October 8, 2016

Bayesian Classification: Why?


- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayesian Theorem
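- For a class C and a data sample X, Bayes' theorem states:
  P(C|X) = P(X|C) · P(C) / P(X)
- P(C) is the prior probability of class C, P(X|C) is the likelihood of X given C, P(X) is the evidence, and P(C|X) is the a-posteriori (posterior) probability of C given X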
Naïve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class, i.e. P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
- This greatly reduces the computation cost: only the class distribution needs to be counted
- Given a training set, we can estimate these probabilities directly from the data
Bayesian classification
- The classification problem may be formalized using a-posteriori probabilities:
- P(C|X) = probability that the sample tuple X = <x1,…,xk> is of class C
- E.g. P(class = N | outlook = sunny, windy = true, …)
- Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
- Bayes theorem:
  P(C|X) = P(X|C) · P(C) / P(X)
- P(X) is constant for all classes
- P(C) = relative frequency of class C samples
- The C that maximizes P(C|X) is therefore the C that maximizes P(X|C) · P(C)
- Problem: computing P(X|C) directly is infeasible, since it would require estimating a probability for every possible combination of attribute values!
Naïve Bayesian Classification
- Naïve assumption: attribute independence
  P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
- If the i-th attribute is categorical:
  P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute within class C
- If the i-th attribute is continuous:
  P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
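To make both estimation strategies concrete, here is a minimal Python sketch (not from the original slides; names such as NaiveBayes, fit, and predict are illustrative): categorical attributes use relative frequencies, continuous attributes use a Gaussian density, and prediction picks the class maximizing P(C) · P(x1|C) · … · P(xk|C).

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal naive Bayes sketch: relative frequencies for categorical
    attributes, a Gaussian density for continuous ones (no smoothing)."""

    def fit(self, X, y):
        self.n = len(y)
        self.classes = Counter(y)          # class counts, used for P(C)
        self.cat = defaultdict(Counter)    # (class, attr index) -> value counts
        self.num = defaultdict(list)       # (class, attr index) -> numeric values
        for row, c in zip(X, y):
            for i, v in enumerate(row):
                if isinstance(v, (int, float)) and not isinstance(v, bool):
                    self.num[(c, i)].append(v)
                else:
                    self.cat[(c, i)][v] += 1
        return self

    def _p_xi_given_c(self, i, v, c):
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            vals = self.num[(c, i)]
            mu = sum(vals) / len(vals)
            var = sum((x - mu) ** 2 for x in vals) / len(vals) or 1e-9
            return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        return self.cat[(c, i)][v] / self.classes[c]   # relative frequency

    def predict(self, x):
        # choose the class C maximizing P(C) * prod_i P(x_i | C)
        best, best_score = None, -1.0
        for c, count in self.classes.items():
            score = count / self.n
            for i, v in enumerate(x):
                score *= self._p_xi_given_c(i, v, c)
            if score > best_score:
                best, best_score = c, score
        return best

# Tiny usage sketch with made-up weather data (outlook is categorical,
# temperature is continuous).
X = [["sunny", 85], ["rain", 65], ["overcast", 72], ["rain", 71]]
y = ["n", "p", "p", "n"]
print(NaiveBayes().fit(X, y).predict(["rain", 70]))   # -> "p"
```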
Play-tennis example: estimating P(xi|C)
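The original slide's probability table is not reproduced here; the estimates it contained can be read off from the calculation in the next section, computed from the standard 14-example play-tennis training set (9 play, 5 don't play):

- P(p) = 9/14, P(n) = 5/14
- P(outlook = rain | p) = 3/9, P(outlook = rain | n) = 2/5
- P(temperature = hot | p) = 2/9, P(temperature = hot | n) = 2/5
- P(humidity = high | p) = 3/9, P(humidity = high | n) = 4/5
- P(windy = false | p) = 6/9, P(windy = false | n) = 2/5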
Play-tennis example: classifying X
- An unseen sample X = <rain, hot, high, false>
- P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
- P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
- Sample X is classified in class n (don't play)
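A quick Python check of the arithmetic above, also normalizing the two scores into posterior probabilities (the snippet is illustrative, not from the slides):

```python
# Un-normalized scores P(X|C)·P(C) for the two classes, then normalization.
p_score = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)   # play
n_score = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)   # don't play
total = p_score + n_score
print(f"play:       {p_score:.6f} -> P(p|X) = {p_score / total:.3f}")
print(f"don't play: {n_score:.6f} -> P(n|X) = {n_score / total:.3f}")
```

The larger score (and posterior) belongs to class n, matching the classification above.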
The independence hypothesis…
- … makes computation possible
- … yields optimal classifiers when it is satisfied
- … but is seldom satisfied in practice, as attributes (variables) are often correlated
- Attempts to overcome this limitation:
  - Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  - Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayesian Belief Networks
- A Bayesian belief network allows conditional independence to be asserted between subsets of the variables, rather than requiring it between all attributes
- It is a graphical model of causal relationships
- Several scenarios for learning Bayesian belief networks:
  - Given both the network structure and all the variables (fully observable): the easy case
  - Given the network structure but only some of the variables (partially observable)
  - When the network structure is not known in advance
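As a small illustration of the "graphical model" idea (not from the slides), the sketch below encodes a two-node network Outlook → Play for the play-tennis data used above; each node stores a (conditional) probability table, and a joint probability factorizes as P(Outlook) · P(Play | Outlook). The counts assumed here are the standard ones for this dataset (5 sunny, 4 overcast, 5 rain days).

```python
# Two-node belief network: Outlook -> Play, with probability tables
# estimated from the 14-day play-tennis data.
p_outlook = {"sunny": 5/14, "overcast": 4/14, "rain": 5/14}   # P(Outlook)
p_play_given_outlook = {                                      # P(Play | Outlook)
    "sunny":    {"p": 2/5, "n": 3/5},
    "overcast": {"p": 4/4, "n": 0/4},
    "rain":     {"p": 3/5, "n": 2/5},
}

def joint(outlook, play):
    """P(Outlook = outlook, Play = play) = P(Outlook) * P(Play | Outlook)."""
    return p_outlook[outlook] * p_play_given_outlook[outlook][play]

print(joint("rain", "n"))   # 5/14 * 2/5 ≈ 0.143
```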
