Classification by Decision Tree Induction

- Decision tree
  - A flow-chart-like tree structure
  - Internal node denotes a test on an attribute
  - Branch represents an outcome of the test
  - Leaf nodes represent class labels or class distribution
- Decision tree generation consists of two phases
  - Tree construction
    - At start, all the training examples are at the root
    - Partition examples recursively based on selected attributes
  - Tree pruning
    - Identify and remove branches that reflect noise or outliers
- Use of decision tree: classifying an unknown sample
  - Test the attribute values of the sample against the decision tree (see the sketch after this list)
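As a minimal sketch of that last step, here is a decision tree hand-encoded as nested Python dicts to match the buys_computer example used throughout these notes, plus a classifier that walks it. The dict encoding and the classify() helper are illustrative assumptions, not part of the original material.

    # Decision tree encoded as nested dicts; leaves are class labels.
    # Structure matches the IF-THEN rules listed later in this post.
    tree = {
        "attribute": "age",
        "branches": {
            "<=30": {"attribute": "student",
                     "branches": {"no": "no", "yes": "yes"}},
            "31…40": "yes",
            ">40": {"attribute": "credit_rating",
                    "branches": {"excellent": "yes", "fair": "no"}},
        },
    }

    def classify(node, sample):
        """Test the sample's attribute values against the tree until a leaf."""
        while isinstance(node, dict):          # internal node: test an attribute
            value = sample[node["attribute"]]  # outcome of the test
            node = node["branches"][value]     # follow the matching branch
        return node                            # leaf: the class label

    print(classify(tree, {"age": "<=30", "student": "yes"}))  # -> yes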
Training Dataset

(The table of training examples, described by attributes such as age, student, and credit_rating together with the class label buys_computer, is not reproduced in this post.)

Output: A Decision Tree for “buys_computer”

(The tree figure is not reproduced; its structure is given by the IF-THEN rules listed later in this post.)
Algorithm for Decision Tree Induction

- Basic algorithm (a greedy algorithm; sketched in code after this list)
  - Tree is constructed in a top-down recursive divide-and-conquer manner
  - At start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
  - There are no samples left
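The loop below sketches this greedy divide-and-conquer procedure under the same assumptions as before (examples as dicts with a "class" key). choose_attribute is left as a parameter so that any selection measure can be plugged in.

    # Greedy top-down induction with the three stopping conditions above.
    from collections import Counter

    def majority_class(examples):
        """Majority voting over the class labels of the examples."""
        return Counter(e["class"] for e in examples).most_common(1)[0][0]

    def build_tree(examples, attributes, choose_attribute):
        classes = {e["class"] for e in examples}
        if len(classes) == 1:                # all samples in the same class
            return classes.pop()
        if not attributes:                   # no remaining attributes:
            return majority_class(examples)  # classify the leaf by majority
        best = choose_attribute(examples, attributes)
        remaining = [a for a in attributes if a != best]
        node = {"attribute": best, "branches": {}}
        for value in {e[best] for e in examples}:   # partition recursively
            subset = [e for e in examples if e[best] == value]
            node["branches"][value] = build_tree(subset, remaining,
                                                 choose_attribute)
        # Only observed values get branches, so the "no samples left"
        # case never produces an empty recursive call here.
        return node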
Attribute Selection Measure

- Information gain (ID3/C4.5)
  - All attributes are assumed to be categorical
  - Can be modified for continuous-valued attributes
- Gini index (IBM IntelligentMiner)
  - All attributes are assumed continuous-valued
  - Assume there exist several possible split values for each attribute
  - May need other tools, such as clustering, to get the possible split values
  - Can be modified for categorical attributes
Information Gain

- Select the attribute with the highest information gain
- Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

    I(p, n) = -(p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))
Information Gain in Decision Tree Induction
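If attribute A partitions S into subsets {S1, S2, ..., Sv}, where Si contains pi examples of P and ni examples of N, the expected information (entropy) still needed after splitting on A is

    E(A) = sum over i of ((pi + ni) / (p + n)) * I(pi, ni)

and the information gained by branching on A is

    Gain(A) = I(p, n) - E(A)

The attribute with the highest Gain(A) becomes the test attribute at the current node.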
Attribute Selection by Information Gain Computation
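The original slide worked this computation on the buys_computer training table, which is not reproduced here. The sketch below implements the same computation for the example layout assumed earlier; choose_by_gain() plugs into build_tree() as its choose_attribute argument.

    # Attribute selection by information gain (generalized to >2 classes).
    from collections import Counter
    from math import log2

    def info(examples):
        """I(p, n): expected information of the class distribution."""
        total = len(examples)
        counts = Counter(e["class"] for e in examples)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def gain(examples, attribute):
        """Gain(A) = I(p, n) - E(A)."""
        total = len(examples)
        e_a = 0.0
        for value in {e[attribute] for e in examples}:
            subset = [e for e in examples if e[attribute] == value]
            e_a += (len(subset) / total) * info(subset)   # weighted entropy
        return info(examples) - e_a

    def choose_by_gain(examples, attributes):
        """Select the attribute with the highest information gain."""
        return max(attributes, key=lambda a: gain(examples, a))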
Gini Index (IBM IntelligentMiner)
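For a data set T with relative class frequencies p1, ..., pc, the Gini index is

    gini(T) = 1 - sum over j of (pj)^2

and if T is split into T1 (N1 examples) and T2 (N2 examples), the quality of the split is

    gini_split(T) = (N1 / N) * gini(T1) + (N2 / N) * gini(T2)

The split giving the smallest gini_split(T) is chosen. A small sketch, assuming the same example layout as above and a two-way numeric split:

    from collections import Counter

    def gini(examples):
        """gini(T) = 1 - sum of squared class frequencies."""
        total = len(examples)
        counts = Counter(e["class"] for e in examples)
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    def gini_split(examples, attribute, threshold):
        """Weighted Gini index of the binary split 'attribute <= threshold'."""
        left = [e for e in examples if e[attribute] <= threshold]
        right = [e for e in examples if e[attribute] > threshold]
        n = len(examples)
        return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)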
Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
- Example (see the sketch after these rules)

    IF age = “<=30” AND student = “no” THEN buys_computer = “no”
    IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
    IF age = “31…40” THEN buys_computer = “yes”
    IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
    IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
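A sketch of the extraction itself: enumerate every root-to-leaf path of the dict tree built earlier, conjoining one attribute-value pair per internal node. extract_rules() is an illustrative helper, not from the original notes.

    def extract_rules(node, conditions=()):
        """Print one IF-THEN rule per root-to-leaf path."""
        if not isinstance(node, dict):         # leaf: the class prediction
            antecedent = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
            print(f'IF {antecedent} THEN buys_computer = "{node}"')
            return
        for value, child in node["branches"].items():
            extract_rules(child, conditions + ((node["attribute"], value),))

    extract_rules(tree)   # prints the five rules listed above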
Avoid Overfitting in Classification

- The generated tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - The result is poor accuracy on unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - Difficult to choose an appropriate threshold
  - Postpruning: remove branches from a “fully grown” tree to get a sequence of progressively pruned trees (see the sketch after this list)
    - Use a set of data different from the training data to decide which is the “best pruned tree”
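scikit-learn's cost-complexity pruning is one concrete postpruning implementation of this idea (not the specific procedure these notes describe): grow a full tree, derive the sequence of progressively pruned trees, and let held-out data pick the winner. The synthetic dataset is purely illustrative.

    # Postpruning via cost-complexity pruning; the "best pruned tree"
    # is selected on data held out from training.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # One ccp_alpha per tree in the progressively pruned sequence.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train)

    candidates = [DecisionTreeClassifier(random_state=0, ccp_alpha=a)
                  .fit(X_train, y_train) for a in path.ccp_alphas]
    best = max(candidates, key=lambda t: t.score(X_val, y_val))
    print(best.get_n_leaves(), best.score(X_val, y_val))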
Approaches to Determine the Final Tree Size

- Separate training (2/3) and testing (1/3) sets
- Use cross validation, e.g., 10-fold cross validation
- Use all the data for training
  - but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution (see the sketch after this list)
- Use the minimum description length (MDL) principle:
  - halt growth of the tree when the encoding is minimized
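A sketch of the chi-square variant, assuming the example layout used earlier: before expanding a node on an attribute, test whether the attribute-value/class contingency table shows a significant association. The alpha = 0.05 threshold and the helper name are assumptions.

    # Chi-square test to decide whether expanding a node is justified.
    from scipy.stats import chi2_contingency

    def split_is_significant(examples, attribute, alpha=0.05):
        values = sorted({e[attribute] for e in examples})
        classes = sorted({e["class"] for e in examples})
        # Contingency table: rows = attribute values, columns = classes.
        table = [[sum(1 for e in examples
                      if e[attribute] == v and e["class"] == c)
                  for c in classes]
                 for v in values]
        _, p_value, _, _ = chi2_contingency(table)
        return p_value < alpha    # significant association: expand the node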
Enhancements to Basic Decision Tree Induction

- Allow for continuous-valued attributes
  - Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals (see the sketch after this list)
- Handle missing attribute values
  - Assign the most common value of the attribute
  - Assign a probability to each of the possible values
- Attribute construction
  - Create new attributes based on existing ones that are sparsely represented
  - This reduces fragmentation, repetition, and replication
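A sketch of the continuous-attribute enhancement: scan candidate thresholds (midpoints of adjacent sorted values) and keep the binary split with the highest information gain, reusing info() from the attribute-selection sketch above. best_threshold() is an illustrative helper, not from the original notes.

    def best_threshold(examples, attribute):
        """Best binary split 'attribute <= t' by information gain."""
        values = sorted({e[attribute] for e in examples})
        best_t, best_gain = None, -1.0
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                  # candidate cut point
            left = [e for e in examples if e[attribute] <= t]
            right = [e for e in examples if e[attribute] > t]
            g = info(examples) - (len(left) / len(examples)) * info(left) \
                               - (len(right) / len(examples)) * info(right)
            if g > best_gain:
                best_t, best_gain = t, g
        return best_t, best_gain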