Saturday, October 8, 2016

Classification in Large Databases

• Classification: a classical problem extensively studied by statisticians and machine learning researchers
• Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed
• Why decision tree induction in data mining?
    • Relatively fast learning speed compared with other classification methods
    • Convertible to simple, easy-to-understand classification rules
    • Can use SQL queries to access databases
    • Classification accuracy comparable to that of other methods
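
The point about converting a tree into classification rules can be illustrated with a minimal sketch: each root-to-leaf path becomes one IF-THEN rule. The nested-dict tree layout, the function names, and the toy weather tree are all assumptions made for this example and do not come from any particular system.

```python
def tree_to_rules(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path.

    Assumed layout: an internal node is {"attribute": ..., "branches": {...}};
    anything that is not a dict is treated as a leaf holding a class label.
    """
    if not isinstance(node, dict):
        yield list(conditions), node
        return
    attribute = node["attribute"]
    for value, child in node["branches"].items():
        yield from tree_to_rules(child, conditions + ((attribute, value),))


def format_rule(conditions, label):
    """Render one path as a readable IF-THEN rule."""
    if not conditions:
        return f"THEN class = {label}"
    body = " AND ".join(f"{attr} = {val}" for attr, val in conditions)
    return f"IF {body} THEN class = {label}"


# Hypothetical toy tree, for illustration only.
toy_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}

rules = [format_rule(conds, label) for conds, label in tree_to_rules(toy_tree)]
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the rule set is as large as the number of leaves; a bushy tree therefore produces a long rule list.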
Scalable Decision Tree Induction Methods in Data Mining Studies
• SLIQ (EDBT’96, Mehta et al.)
    • Builds an index for each attribute; only the class list and the current attribute list reside in memory
• SPRINT (VLDB’96, J. Shafer et al.)
    • Constructs an attribute list data structure
• PUBLIC (VLDB’98, Rastogi & Shim)
    • Integrates tree splitting and tree pruning: stops growing the tree earlier
• RainForest (VLDB’98, Gehrke, Ramakrishnan & Ganti)
    • Separates the scalability aspects from the criteria that determine the quality of the tree
    • Builds an AVC-list (attribute, value, class label)
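
To make the AVC-list idea concrete, here is a minimal sketch: one pass over the records aggregates class counts per (attribute, value) pair, and a split criterion is then evaluated from those counts alone, without touching the raw records again. This is an illustration under assumptions, not the RainForest implementation. The function names, the dict-based record layout, and the choice of the Gini index as the split criterion are all made up for this example.

```python
from collections import defaultdict


def build_avc_lists(records, attributes, class_key):
    """Single scan: return {attribute: {value: {class_label: count}}}."""
    avc = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}
    for row in records:
        label = row[class_key]
        for a in attributes:
            avc[a][row[a]][label] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {a: {v: dict(c) for v, c in vals.items()} for a, vals in avc.items()}


def gini(counts):
    """Gini impurity of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(avc_list):
    """Impurity after a multiway split on one attribute, from its counts only."""
    n = sum(sum(classes.values()) for classes in avc_list.values())
    return sum(
        sum(classes.values()) / n * gini(list(classes.values()))
        for classes in avc_list.values()
    )


def best_attribute(avc):
    """Pick the attribute whose split has the lowest weighted Gini impurity."""
    return min(avc, key=lambda a: weighted_gini(avc[a]))
```

The counts are usually far smaller than the data itself, because their size depends on the number of distinct attribute values and classes rather than on the number of records. That is what lets the split selection run in memory while the data stays on disk.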
Data Cube-Based Decision-Tree Induction
• Integration of generalization with decision-tree induction (Kamber et al. ’97)
• Classification at primitive concept levels
    • E.g., precise temperature, humidity, outlook, etc.
    • Low-level concepts, scattered classes, bushy classification trees
    • Semantic interpretation problems
• Cube-based multi-level classification
    • Relevance analysis at multiple levels
    • Information-gain analysis with dimension + level
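
The contrast between primitive-level and multi-level classification can be sketched by computing information gain for the same dimension at two concept levels. This is only an illustration of the idea, not the method of the cited work. The temperature thresholds, the level names, and the toy data are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of splitting the labels by the given attribute values."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def temperature_level1(t):
    """Hypothetical concept hierarchy: precise temperature -> cool/mild/hot."""
    if t < 70:
        return "cool"
    if t < 80:
        return "mild"
    return "hot"


# Toy data, for illustration only.
temps = [64, 65, 68, 80, 81, 83, 70, 72]
play = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]

# Each (dimension, level) pair gets its own gain and its own branch count.
levels = {
    ("temperature", "primitive"): temps,
    ("temperature", "level 1"): [temperature_level1(t) for t in temps],
}
analysis = {
    key: {"gain": info_gain(vals, play), "branches": len(set(vals))}
    for key, vals in levels.items()
}
for key, stats in analysis.items():
    print(key, stats)
```

On this toy data the primitive level reaches the maximum possible gain only because every temperature value is unique, which gives eight single-record branches. The generalized level keeps most of the gain with three branches that are easier to interpret, which is the trade-off the bushy-tree remark points at.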
Presentation of Classification Results
