
Saturday, October 8, 2016

Classification by Decision Tree Induction

- Decision tree
  - A flow-chart-like tree structure
  - Internal node denotes a test on an attribute
  - Branch represents an outcome of the test
  - Leaf nodes represent class labels or class distribution
- Decision tree generation consists of two phases
  - Tree construction
    - At start, all the training examples are at the root
    - Partition examples recursively based on selected attributes
  - Tree pruning
    - Identify and remove branches that reflect noise or outliers
- Use of a decision tree: classifying an unknown sample
  - Test the attribute values of the sample against the decision tree (see the sketch after this list)
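
To make the last point concrete, here is a minimal Python sketch (my own illustration, not from the original material): a small tree stored as nested dictionaries, and a classify() function that tests the sample's attribute values from the root down to a leaf. The attribute names and the tree itself are hypothetical, chosen to mirror the buys_computer example used later.

```python
# Minimal sketch: a decision tree as nested dicts, plus a classify() walk.
tree = {
    "attribute": "age",
    "branches": {
        "<=30": {
            "attribute": "student",
            "branches": {"no": {"label": "no"}, "yes": {"label": "yes"}},
        },
        "31...40": {"label": "yes"},
        ">40": {
            "attribute": "credit_rating",
            "branches": {"fair": {"label": "yes"}, "excellent": {"label": "no"}},
        },
    },
}

def classify(node, sample):
    """Walk the tree, testing the sample's attribute values until a leaf is reached."""
    while "label" not in node:                # internal node: test one attribute
        value = sample[node["attribute"]]     # outcome of the test for this sample
        node = node["branches"][value]        # follow the matching branch
    return node["label"]                      # leaf: class label

print(classify(tree, {"age": "<=30", "student": "yes", "credit_rating": "fair"}))  # -> yes
```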
Training Dataset
Output: A Decision Tree for “buys_computer”
Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm); see the sketch after this list
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  - There are no samples left
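
The bullets above map almost line-for-line onto code. Below is a compact ID3-style sketch (an illustration under the stated assumptions: categorical attributes, examples stored as dicts) showing the top-down recursion, the stopping conditions, and attribute selection by information gain. The sample rows at the end are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information needed to identify the class of an example in this set."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """Top-down, recursive, divide-and-conquer construction (ID3-style)."""
    if len(set(labels)) == 1:                        # all samples in one class
        return {"label": labels[0]}
    if not attributes:                               # no attributes left: majority vote
        return {"label": Counter(labels).most_common(1)[0][0]}

    def gain(attr):
        """Information gain of splitting the current examples on attr."""
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(label)
        remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                 # heuristic attribute selection
    node = {"attribute": best, "branches": {}}
    for value in set(row[best] for row in rows):     # partition examples recursively
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = [r for r, _ in part], [l for _, l in part]
        node["branches"][value] = build_tree(
            sub_rows, sub_labels, [a for a in attributes if a != best]
        )
    return node

# Tiny, made-up training set just to show the recursion in action.
rows = [
    {"age": "<=30", "student": "no"}, {"age": "<=30", "student": "yes"},
    {"age": "31...40", "student": "no"}, {"age": ">40", "student": "yes"},
]
labels = ["no", "yes", "yes", "yes"]
print(build_tree(rows, labels, ["age", "student"]))
```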
Attribute Selection Measure
- Information gain (ID3/C4.5)
  - All attributes are assumed to be categorical
  - Can be modified for continuous-valued attributes
- Gini index (IBM IntelligentMiner)
  - All attributes are assumed to be continuous-valued
  - Assume there exist several possible split values for each attribute
  - May need other tools, such as clustering, to get the possible split values
  - Can be modified for categorical attributes
Information Gain
- Select the attribute with the highest information gain
- Assume there are two classes, P and N
  - Let the set of examples S contain p elements of class P and n elements of class N
  - The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as
    I(p, n) = -(p / (p + n)) * log2(p / (p + n)) - (n / (p + n)) * log2(n / (p + n))
Information Gain in Decision Tree Induction
Attribute Selection by Information Gain Computation
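
The worked example that originally accompanied these headings is not reproduced here, but the computation can be sketched in a few lines. The functions below implement the standard definitions I(p, n), E(A) and Gain(A) = I(p, n) - E(A); the counts in the usage line are the ones usually quoted for the buys_computer data (9 "yes" and 5 "no" examples, with the age attribute splitting them 2/3, 4/0 and 3/2) and should be checked against the actual training table.

```python
from math import log2

def info(p, n):
    """I(p, n): information needed to decide the class of an example in S."""
    def term(x):
        return 0.0 if x == 0 else -(x / (p + n)) * log2(x / (p + n))
    return term(p) + term(n)

def expected_info(partitions):
    """E(A): size-weighted information over the subsets produced by splitting on A."""
    total = sum(p + n for p, n in partitions)
    return sum((p + n) / total * info(p, n) for p, n in partitions)

def gain(p, n, partitions):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions)

# Counts usually quoted for the age attribute in the buys_computer example.
print(round(gain(9, 5, [(2, 3), (4, 0), (3, 2)]), 3))  # -> 0.247 (about the 0.246 quoted after rounding)
```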
Gini Index (IBM IntelligentMiner)
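
For completeness, a minimal sketch of the Gini measure using its standard definition, gini(T) = 1 - sum of squared class frequencies, with a binary split scored by the size-weighted Gini of the two subsets (smaller is better). The example values are illustrative only.

```python
def gini(labels):
    """gini(T) = 1 - sum over classes of (relative frequency)^2."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def gini_split(left, right):
    """Size-weighted Gini of a binary split; the split with the smallest value is preferred."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

print(round(gini(["yes"] * 9 + ["no"] * 5), 3))                          # impurity of the whole set
print(round(gini_split(["yes", "yes", "no"], ["yes"] * 7 + ["no"] * 4), 3))  # one candidate split
```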
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
- Example (a sketch for extracting such rules follows)
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31...40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
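
Rule extraction is a straightforward tree walk: collect one attribute = value test per edge on each root-to-leaf path and emit the leaf's class as the conclusion. A sketch, reusing the hypothetical nested-dict representation from the earlier examples:

```python
def extract_rules(node, path=()):
    """One IF-THEN rule per root-to-leaf path; the tests on a path form a conjunction."""
    if "label" in node:                                   # leaf: emit the rule
        conds = " AND ".join(f'{attr} = "{value}"' for attr, value in path) or "TRUE"
        return [f'IF {conds} THEN buys_computer = "{node["label"]}"']
    rules = []
    for value, child in node["branches"].items():         # one branch per test outcome
        rules += extract_rules(child, path + ((node["attribute"], value),))
    return rules

# Hypothetical tree mirroring the rules listed above.
tree = {"attribute": "age", "branches": {
    "<=30": {"attribute": "student",
             "branches": {"no": {"label": "no"}, "yes": {"label": "yes"}}},
    "31...40": {"label": "yes"},
    ">40": {"attribute": "credit_rating",
            "branches": {"excellent": {"label": "no"}, "fair": {"label": "yes"}}},
}}
print("\n".join(extract_rules(tree)))
```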
Avoid Overfitting in Classification
- The generated tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - The result is poor accuracy on unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would cause the goodness measure to fall below a threshold
    - It is difficult to choose an appropriate threshold
  - Postpruning: remove branches from a "fully grown" tree to obtain a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the "best pruned tree" (see the sketch after this list)
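
One practical way to realize the postpruning idea is cost-complexity pruning as provided by scikit-learn (an assumption about tooling, not something the original text prescribes): grow a tree on the training data, enumerate the sequence of progressively pruned trees, and keep the one that scores best on data held out from training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the original slides use the buys_computer table instead.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1 / 3, random_state=0)

# Each ccp_alpha corresponds to one tree in the sequence of progressively
# pruned trees grown from the training data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = [max(a, 0.0) for a in path.ccp_alphas]  # guard against tiny negative values

# Pick the "best pruned tree" using data that was not used for training.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train) for a in alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_depth(), round(best.score(X_val, y_val), 3))
```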
Approaches to Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation (see the sketch after this list)
- Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node is likely to improve performance over the entire distribution
- Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
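
For the cross-validation option, a small sketch (again assuming scikit-learn and a stand-in dataset) that scores a few candidate tree sizes, here expressed as maximum depth, with 10-fold cross-validation and keeps the best:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# Mean 10-fold cross-validation accuracy for each candidate tree size (max depth).
scores = {
    depth: cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=10
    ).mean()
    for depth in (1, 2, 3, 5, 8, None)  # None lets the tree grow fully
}
best_depth = max(scores, key=scores.get)
print(best_depth, round(scores[best_depth], 3))
```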
Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes (see the sketch after this list)
  - Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
- Handle missing attribute values
  - Assign the most common value of the attribute
  - Assign a probability to each of the possible values
- Attribute construction
  - Create new attributes based on existing ones that are sparsely represented
  - This reduces fragmentation, repetition, and replication
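
A small sketch of the first two enhancements in plain Python; the interval boundaries and the most-common-value imputation below are illustrative choices, not prescriptions from the original text.

```python
from collections import Counter

def discretize_age(age):
    """A dynamically defined discrete attribute partitioning a continuous value into intervals."""
    if age <= 30:
        return "<=30"
    if age <= 40:
        return "31...40"
    return ">40"

def fill_missing(values):
    """Replace missing attribute values (None) with the most common observed value."""
    most_common = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]

print([discretize_age(a) for a in (24, 35, 51)])           # -> ['<=30', '31...40', '>40']
print(fill_missing(["fair", None, "excellent", "fair"]))   # -> ['fair', 'fair', 'excellent', 'fair']
```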
