
Saturday, October 8, 2016

Classification and Prediction


Classification vs. Prediction
- Classification:
  - predicts categorical class labels
  - constructs a model from the training set and the class labels in a classifying attribute, then uses that model to classify new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing numeric values
- Typical applications
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis
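
To make the distinction concrete, here is a minimal sketch. It uses k-nearest neighbors only as a convenient stand-in, since the notes do not prescribe a method. The credit data, the function names, and the choice of k are all hypothetical.

```python
from collections import Counter

def nearest_neighbors(train, x, k):
    """Return the k (feature, target) pairs whose feature is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]

def classify(train, x, k=3):
    """Classification: most common class label among the k neighbors."""
    labels = [label for _, label in nearest_neighbors(train, x, k)]
    return Counter(labels).most_common(1)[0][0]

def predict(train, x, k=3):
    """Prediction: mean of the k neighbors' continuous target values."""
    values = [value for _, value in nearest_neighbors(train, x, k)]
    return sum(values) / len(values)

# Hypothetical data: annual income (thousands) -> credit decision / credit limit
credit_labels = [(20, "reject"), (25, "reject"), (30, "reject"),
                 (60, "approve"), (70, "approve"), (80, "approve")]
credit_limits = [(20, 1.0), (25, 1.5), (30, 2.0),
                 (60, 5.0), (70, 6.0), (80, 7.0)]

print(classify(credit_labels, 65))  # a categorical class label
print(predict(credit_limits, 65))   # a continuous value
```

The same neighbors are used in both cases; only the kind of target (a label versus a number) changes what the model returns.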
Classification—A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the label the model assigns
    - The accuracy rate is the percentage of test set samples that the model classifies correctly
    - The test set must be independent of the training set; otherwise the estimate is too optimistic, because an over-fitted model scores well on data it has already seen
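
The two steps can be sketched as follows. This is an illustration under stated assumptions, not a specific algorithm from the notes: the model is a single hypothetical rule of the form IF income >= threshold THEN approve, and the training and test tuples are made up.

```python
def apply_rule(threshold, income):
    """The model: one classification rule on a single attribute."""
    return "approve" if income >= threshold else "reject"

def build_model(training_set):
    """Step 1, model construction: choose the threshold that
    classifies the most training tuples correctly."""
    def correct(threshold):
        return sum(apply_rule(threshold, income) == label
                   for income, label in training_set)
    candidates = sorted({income for income, _ in training_set})
    return max(candidates, key=correct)

def accuracy(threshold, samples):
    """Step 2, model usage: percentage of samples whose known label
    matches the label assigned by the model."""
    hits = sum(apply_rule(threshold, income) == label
               for income, label in samples)
    return 100.0 * hits / len(samples)

# Hypothetical (income, class label) tuples
training_set = [(20, "reject"), (28, "reject"), (35, "reject"),
                (50, "approve"), (62, "approve"), (75, "approve")]
test_set = [(22, "reject"), (45, "approve"), (55, "approve"), (80, "approve")]

threshold = build_model(training_set)
print("learned threshold:", threshold)
print("accuracy on training set:", accuracy(threshold, training_set))
print("accuracy on test set:", accuracy(threshold, test_set))
```

Measuring accuracy on the training tuples flatters the model; the independent test set gives the more honest figure.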
Classification Process (1): Model Construction
Classification Process (2): Use the Model in Prediction
Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
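
A small sketch of the contrast, assuming one numeric attribute. The nearest-centroid classifier and the two-cluster k-means loop are illustrative choices, not methods named in the notes, and the data values are hypothetical.

```python
def fit_centroids(labeled):
    """Supervised: labels are given, so compute one mean per known class."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def classify(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(values, iterations=10):
    """Unsupervised: no labels, so search for two clusters in the raw values."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    for _ in range(iterations):
        clusters = [[], []]
        for x in values:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical measurements, with and without class labels
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
           (8.0, "high"), (9.0, "high"), (10.0, "high")]
unlabeled = [x for x, _ in labeled]

centroids = fit_centroids(labeled)
print(classify(centroids, 3.0))      # uses the given labels
centers, clusters = two_means(unlabeled)
print(centers, clusters)             # groups found without any labels
```

The clustering output has no class names attached; it only shows that the values fall into two groups.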
Issues regarding classification and prediction (1): Data Preparation
- Data cleaning
  - Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  - Remove the irrelevant or redundant attributes
- Data transformation
  - Generalize and/or normalize data
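
The three preparation steps might look like this in a minimal form. Mean imputation, dropping constant attributes, and min-max scaling are assumed here as simple representatives of each step; the table and its attribute names are hypothetical.

```python
def fill_missing(column):
    """Data cleaning: replace None with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def is_irrelevant(column):
    """Relevance analysis (very crude): a constant attribute
    cannot help to separate classes."""
    return len(set(column)) <= 1

def min_max_normalize(column):
    """Data transformation: rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical attribute table; None marks a missing value
table = {
    "age": [20, None, 40, 60],
    "income": [30.0, 50.0, 70.0, 90.0],
    "country": ["ID", "ID", "ID", "ID"],
}

prepared = {}
for name, column in table.items():
    if is_irrelevant(column):
        continue  # drop the attribute
    prepared[name] = min_max_normalize(fill_missing(column))

print(prepared)
```

Real relevance analysis would also look for attributes that are redundant with each other, which this sketch does not attempt.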
Issues regarding classification and prediction (2): Evaluating Classification Methods
- Predictive accuracy
- Speed
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules
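
Two of these criteria, predictive accuracy and speed, are easy to measure directly. The sketch below assumes a hypothetical `evaluate` helper and a trivial majority-class model used only as a placeholder; the other criteria (robustness, scalability, interpretability) need more than a few lines to assess and are not covered.

```python
import time
from collections import Counter

def evaluate(build, classify, training_set, test_set):
    """Measure accuracy, time to construct the model, and time to use it."""
    start = time.perf_counter()
    model = build(training_set)
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [classify(model, x) for x, _ in test_set]
    use_seconds = time.perf_counter() - start

    hits = sum(p == label for p, (_, label) in zip(predictions, test_set))
    return {
        "accuracy": hits / len(test_set),
        "build_seconds": build_seconds,
        "use_seconds": use_seconds,
    }

# Placeholder model: always answer with the most frequent training label
def build_majority(training_set):
    return Counter(label for _, label in training_set).most_common(1)[0][0]

def classify_majority(model, x):
    return model

# Hypothetical (attribute, class label) tuples
training_set = [(1, "a"), (2, "a"), (3, "b")]
test_set = [(4, "a"), (5, "b"), (6, "a"), (7, "a")]

report = evaluate(build_majority, classify_majority, training_set, test_set)
print(report)
```

Any pair of build and classify functions with the same shape can be passed in, which makes it possible to compare methods on the same data.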
