Classification in Large Databases
- Classification: a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively faster learning speed (than other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods
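The claim that a decision tree is convertible to simple classification rules can be illustrated with a minimal sketch: each root-to-leaf path becomes one IF-THEN rule. The tree below, its attribute names, and the `tree_to_rules` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical toy tree: an internal node is (attribute, {value: subtree}),
# a leaf is a plain class-label string.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("windy", {"true": "no", "false": "yes"}),
})


def tree_to_rules(node, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path."""
    if isinstance(node, str):  # leaf: emit the accumulated path as a rule
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


for rule in tree_to_rules(TREE):
    print(rule)
```

The number of rules equals the number of leaves, so a bushy tree yields a long rule list.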
Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT’96, Mehta et al.)
  - builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB’96, J. Shafer et al.)
  - constructs an attribute list data structure
- PUBLIC (VLDB’98, Rastogi & Shim)
  - integrates tree splitting and tree pruning: stops growing the tree earlier
- RainForest (VLDB’98, Gehrke, Ramakrishnan & Ganti)
  - separates the scalability aspects from the criteria that determine the quality of the tree
  - builds an AVC-list (attribute, value, class label)
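The AVC idea can be sketched as follows: for each attribute, keep only the class-label counts per attribute value, gathered in one scan of the data, and evaluate the split criterion from those counts alone. This is a minimal in-memory sketch under stated assumptions, not the RainForest implementation; the toy rows, attribute names, and helper functions are hypothetical, and information gain is used here only as one possible quality criterion.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (outlook, windy, class label).
ROWS = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rain", "false", "yes"),
    ("rain", "true", "no"), ("overcast", "true", "yes"),
]
ATTRIBUTES = ["outlook", "windy"]


def build_avc_sets(rows, attributes):
    """One scan over the rows; returns {attribute: {value: Counter(class)}}."""
    avc = {a: defaultdict(Counter) for a in attributes}
    for row in rows:
        label = row[-1]
        for i, a in enumerate(attributes):
            avc[a][row[i]][label] += 1
    return avc


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def info_gain(avc_set):
    """Information gain of one attribute, computed from its AVC counts alone."""
    class_totals = Counter()
    for counts in avc_set.values():
        class_totals.update(counts)
    n = sum(class_totals.values())
    remainder = sum(sum(c.values()) / n * entropy(c) for c in avc_set.values())
    return entropy(class_totals) - remainder


avc = build_avc_sets(ROWS, ATTRIBUTES)
gains = {a: info_gain(avc[a]) for a in ATTRIBUTES}
best = max(gains, key=gains.get)
print(best, gains)
```

Because the counts are much smaller than the raw data when attributes have few distinct values, the split-selection step never needs to revisit the individual records, which is what lets the scalability machinery be kept separate from the quality criterion.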
Data Cube-Based Decision-Tree Induction
- Integration of generalization with decision-tree induction (Kamber et al. ’97)
- Classification at primitive concept levels
  - E.g., precise temperature, humidity, outlook, etc.
  - Low-level concepts, scattered classes, bushy classification trees
  - Semantic interpretation problems
- Cube-based multi-level classification
  - Relevance analysis at multiple levels
  - Information-gain analysis with dimension + level
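Information-gain analysis per (dimension, level) pair can be sketched as below: the same attribute is evaluated once at its primitive values and once after generalization through a concept hierarchy. The toy rows, the threshold-based hierarchies, and the helper names are all assumptions made for illustration; they are not taken from Kamber et al.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (temperature in degrees F, humidity in %, class label).
ROWS = [
    (85, 85, "no"), (80, 90, "no"), (83, 78, "yes"), (70, 96, "yes"),
    (68, 80, "yes"), (65, 70, "no"), (64, 65, "yes"), (72, 95, "no"),
]
COLUMN = {"temperature": 0, "humidity": 1}


def temperature_concept(t):
    """Hypothetical level-1 generalization of a precise temperature."""
    return "hot" if t >= 80 else "mild" if t >= 70 else "cool"


def humidity_concept(h):
    """Hypothetical level-1 generalization of a precise humidity."""
    return "high" if h > 80 else "normal"


# Level 0 keeps the primitive value; level 1 maps it to a higher-level concept.
HIERARCHIES = {
    "temperature": [lambda t: t, temperature_concept],
    "humidity": [lambda h: h, humidity_concept],
}


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def gain_at_level(rows, dimension, level):
    """Return (information gain, number of branches) for one dimension/level."""
    generalize = HIERARCHIES[dimension][level]
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[generalize(row[COLUMN[dimension]])][row[-1]] += 1
    overall = Counter(row[-1] for row in rows)
    remainder = sum(
        sum(c.values()) / len(rows) * entropy(c) for c in by_value.values()
    )
    return entropy(overall) - remainder, len(by_value)


results = {
    (d, lvl): gain_at_level(ROWS, d, lvl) for d in HIERARCHIES for lvl in (0, 1)
}
for (d, lvl), (gain, branches) in results.items():
    print(f"{d} level {lvl}: gain={gain:.3f}, branches={branches}")
```

In this toy data every primitive value is distinct, so level 0 reaches the maximum gain only by creating one branch per record, which is the bushy-tree problem noted above. The generalized level gives lower gain but far fewer, more interpretable branches.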
Presentation of Classification Results