teknik informatika: Classification and Prediction

Classification vs. Prediction

- Classification:
  - predicts categorical class labels
  - constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications:
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis
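
To make the distinction concrete, here is a minimal sketch using a k-nearest-neighbour approach. The technique, the toy credit data, and the names `classify` and `predict` are all assumptions chosen for illustration and are not taken from the slides. The same neighbours are used twice: a majority vote gives a categorical label (classification), and an average gives a continuous value (prediction).

```python
from collections import Counter

# Hypothetical toy data: (income in thousands, years employed)
applicants = [(20, 1), (25, 2), (60, 8), (75, 10)]
approved = ["no", "no", "yes", "yes"]     # categorical class labels
credit_limit = [1.0, 1.5, 6.0, 8.0]       # continuous target values


def nearest_neighbours(query, points, k):
    """Return the indices of the k training points closest to query."""
    def squared_distance(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    order = sorted(range(len(points)), key=lambda i: squared_distance(points[i]))
    return order[:k]


def classify(query, points, labels, k=3):
    """Classification: majority vote over the neighbours' class labels."""
    idx = nearest_neighbours(query, points, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]


def predict(query, points, values, k=3):
    """Prediction: mean of the neighbours' continuous target values."""
    idx = nearest_neighbours(query, points, k)
    return sum(values[i] for i in idx) / len(idx)
```

Calling `classify((70, 9), applicants, approved)` returns a label such as `"yes"`, while `predict((70, 9), applicants, credit_limit)` returns a number.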

Classification—A Two-Step Process

- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the model's classification of that sample
    - The accuracy rate is the percentage of test set samples that the model classifies correctly
    - The test set must be independent of the training set; otherwise the estimate is over-optimistic and over-fitting goes undetected
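
The two steps above can be sketched as follows. This is an illustrative toy, not the method used in the slides: the model is a single threshold rule, and the data set (years of experience versus a yes/no label) and all function names are hypothetical.

```python
def build_model(training_set):
    """Step 1, model construction: learn one rule 'value >= t means yes'.

    training_set is a list of (value, label) pairs with labels "yes"/"no".
    Every observed value is tried as the threshold t, and the threshold
    that classifies the most training samples correctly is kept.
    """
    best_t, best_correct = None, -1
    for t, _ in training_set:
        correct = sum((v >= t) == (label == "yes") for v, label in training_set)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def classify(model, value):
    """Step 2, model usage: apply the learned rule to one object."""
    return "yes" if value >= model else "no"


def accuracy(model, samples):
    """Percentage of samples whose known label matches the model's output."""
    correct = sum(classify(model, v) == label for v, label in samples)
    return 100.0 * correct / len(samples)


# Hypothetical data, kept in two disjoint sets
training_set = [(3, "no"), (7, "yes"), (2, "no"), (7, "yes"), (6, "no"), (3, "no")]
test_set = [(2, "no"), (7, "no"), (5, "yes"), (8, "yes")]

model = build_model(training_set)
print("accuracy on training set:", accuracy(model, training_set))
print("accuracy on test set:    ", accuracy(model, test_set))
```

The accuracy measured on the training set is higher than the accuracy on the held-out test set, which is why the estimate has to come from independent samples.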

Classification Process: Model Construction

Classification Process: Use the Model in Prediction

Supervised vs. Unsupervised Learning

- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
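
As a rough illustration of the difference, the sketch below handles the same one-dimensional measurements in two ways. Both techniques are assumptions picked for brevity (a nearest-centroid classifier and a two-cluster k-means), and the data and function names are hypothetical. The supervised function needs the labels; the unsupervised one sees only the values.

```python
def centroids_from_labels(values, labels):
    """Supervised: the labels say which class each observation belongs to."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def classify(value, centroids):
    """Assign a new value to the class with the closest centroid."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def two_means(values, iterations=10):
    """Unsupervised: no labels; search for two clusters in the values alone."""
    centres = [min(values), max(values)]   # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Recompute each centre; keep the old one if a cluster is empty
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]   # used only when supervised
```

`classify(4.0, centroids_from_labels(values, labels))` returns a named class, whereas `two_means(values)` returns two unnamed groups whose meaning still has to be interpreted.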

Issues regarding classification and prediction (1): Data Preparation

- Data cleaning
  - Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  - Remove the irrelevant or redundant attributes
- Data transformation
  - Generalize and/or normalize data
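
Each preparation step can be sketched with a small helper. These are deliberately crude stand-ins chosen for illustration (mean imputation, dropping constant or duplicated attributes, min-max normalization); the slides do not prescribe specific techniques, and the function names are hypothetical.

```python
def fill_missing(column):
    """Data cleaning: replace missing values (None) with the mean of the rest."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def drop_redundant(table):
    """Relevance analysis (very crude): drop attributes that are constant
    or that exactly duplicate an attribute already kept.

    table maps attribute names to lists of values.
    """
    kept = {}
    for name, column in table.items():
        is_constant = len(set(column)) <= 1
        is_duplicate = column in kept.values()
        if not is_constant and not is_duplicate:
            kept[name] = column
    return kept


def min_max(column):
    """Data transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

For example, `min_max(fill_missing([10, None, 30]))` first fills the gap with the mean of the known values and then rescales the column.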

Issues regarding classification and prediction (2): Evaluating Classification Methods

- Predictive accuracy
- Speed and scalability
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules
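
Two of these criteria, predictive accuracy and speed, are easy to measure directly. The harness below is a sketch under assumptions: `evaluate`, `build_majority`, and `classify_majority` are hypothetical names, and the majority-class baseline is only there to give the harness something to run. Robustness, interpretability, and rule quality need separate, method-specific checks.

```python
import time
from collections import Counter


def evaluate(build, classify, training_set, test_set):
    """Measure accuracy, time to construct the model, and time to use it.

    build(training_set) returns a model; classify(model, x) returns a label.
    Both sets are lists of (x, label) pairs.
    """
    start = time.perf_counter()
    model = build(training_set)
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predicted = [classify(model, x) for x, _ in test_set]
    use_seconds = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predicted, test_set))
    return {
        "accuracy": correct / len(test_set),
        "build_seconds": build_seconds,
        "use_seconds": use_seconds,
    }


def build_majority(training_set):
    """Baseline 'model': just the most frequent class label."""
    return Counter(label for _, label in training_set).most_common(1)[0][0]


def classify_majority(model, x):
    """The baseline ignores x and always answers with the majority label."""
    return model
```

Running `evaluate(build_majority, classify_majority, training_set, test_set)` on any labelled data gives a floor that a real classifier should beat on accuracy, at a measurable cost in construction and usage time.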


Saturday, October 8, 2016
