Classification vs. Prediction
- Classification:
  - predicts categorical class labels
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications:
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis
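
To make the distinction concrete, here is a minimal sketch that runs both tasks on the same toy data. The data values, the function names `classify_1nn` and `fit_line`, and the choice of nearest neighbour and least squares are all illustrative assumptions, not something the slides prescribe.

```python
# Toy data (made up for illustration): annual income in thousands.
# Classification target: a categorical label (credit approved or not).
# Prediction target: a continuous value (credit limit in thousands).
incomes = [20.0, 35.0, 50.0, 80.0]
approved = ["no", "no", "yes", "yes"]
limits = [2.0, 3.5, 5.0, 8.0]


def classify_1nn(x, xs, labels):
    """Classification: return the label of the closest training value."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[nearest]


def fit_line(xs, ys):
    """Prediction: fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


label = classify_1nn(60.0, incomes, approved)  # a class label
a, b = fit_line(incomes, limits)
estimate = a * 60.0 + b                        # a real number
print(label, round(estimate, 2))
```

The classifier can only ever answer with one of the labels it saw in training, while the fitted line can return any real value, including ones that never appeared in the data.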
Classification—A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the model's classification result
    - The accuracy rate is the percentage of test set samples that the model classifies correctly
    - The test set must be independent of the training set; otherwise the accuracy estimate is optimistically biased and over-fitting goes undetected
Classification Process (1): Model Construction
Classification Process (2): Use the Model in Prediction
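
The two steps can be sketched in a few lines of code. This is only an illustration under stated assumptions: the tuples are made up, the model is a single threshold rule on one numeric attribute, and the names `build_model`, `classify` and `accuracy` are hypothetical helpers.

```python
def build_model(training_set):
    """Step 1 (model construction): pick the threshold that misclassifies
    the fewest training tuples.
    The learned rule is: IF value > threshold THEN 'yes' ELSE 'no'."""
    values = sorted({value for value, _ in training_set})
    # Candidate thresholds: one below every value, plus each observed value.
    candidates = [values[0] - 1] + values

    def errors(threshold):
        return sum(
            ("yes" if value > threshold else "no") != label
            for value, label in training_set
        )

    return min(candidates, key=errors)


def classify(threshold, value):
    """Step 2 (model usage): apply the learned rule to one value."""
    return "yes" if value > threshold else "no"


def accuracy(threshold, samples):
    """Fraction of samples whose known label matches the model's output."""
    correct = sum(classify(threshold, v) == label for v, label in samples)
    return correct / len(samples)


# Made-up tuples: (years of service, tenured?).
training_set = [(2, "no"), (3, "no"), (5, "no"), (7, "yes"), (8, "yes"), (10, "yes")]
# The test set is kept separate from the training set.
test_set = [(1, "no"), (4, "no"), (6, "no"), (9, "yes")]

threshold = build_model(training_set)
print("rule: IF years >", threshold, "THEN tenured = yes")
print("training accuracy:", accuracy(threshold, training_set))
print("test accuracy:", accuracy(threshold, test_set))
```

On this data the rule fits the training set perfectly but gets one test tuple wrong, which is why accuracy is estimated on tuples the model has not seen.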
Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
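
The contrast can be shown on one small data set, first with labels and then without. This is a rough sketch: the points are made up, `centroids_from_labels` and `kmeans_1d` are hypothetical helper names, and plain k-means stands in for clustering in general.

```python
def centroids_from_labels(points, labels):
    """Supervised: labels are given, so each class centroid is simply
    the mean of the points carrying that label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: sum(ps) / len(ps) for lab, ps in groups.items()}


def kmeans_1d(points, k=2, iterations=10):
    """Unsupervised: no labels; discover k clusters with plain k-means.
    Assumes k >= 2 and iterations >= 1."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centres over the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return centres, clusters


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]

print(centroids_from_labels(points, labels))  # uses the labels
centres, clusters = kmeans_1d(points, k=2)    # never sees the labels
print(centres, clusters)
```

Both routes find the same two groups here, but only the supervised one knows what the groups are called; the clustering output is just "cluster 0" and "cluster 1".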
Issues regarding classification and prediction (1): Data Preparation
- Data cleaning
  - Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  - Remove the irrelevant or redundant attributes
- Data transformation
  - Generalize and/or normalize data
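
A minimal sketch of the three preparation steps on a tiny made-up table follows. The helper names are hypothetical, and the specific techniques (mean imputation, dropping constant or duplicate columns, min-max scaling) are simple stand-ins chosen for illustration; real relevance analysis usually relies on statistical measures.

```python
def fill_missing(column):
    """Data cleaning: replace missing values (None) with the column mean."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def drop_uninformative(table):
    """Relevance analysis (crude): drop constant attributes (irrelevant)
    and exact copies of an earlier attribute (redundant)."""
    kept = {}
    for name, column in table.items():
        is_constant = len(set(column)) <= 1
        is_duplicate = any(column == other for other in kept.values())
        if not (is_constant or is_duplicate):
            kept[name] = column
    return kept


def min_max_normalize(column):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Made-up table, stored as attribute name -> column of values.
table = {
    "age": [20, None, 40, 60],
    "age_copy": [20, None, 40, 60],      # redundant
    "country": ["ID", "ID", "ID", "ID"],  # constant, so irrelevant here
    "income": [10.0, 30.0, 20.0, 50.0],
}

selected = drop_uninformative(table)
prepared = {
    name: min_max_normalize(fill_missing(column))
    for name, column in selected.items()
}
print(prepared)
```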
Issues regarding classification and prediction (2): Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules
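
Some of these criteria are easy to measure directly. The sketch below, under stated assumptions, reports predictive accuracy together with the time to construct and the time to use a model. The `evaluate` harness and the majority-class baseline are hypothetical illustrations; robustness and interpretability need other kinds of tests and are not covered.

```python
import time
from collections import Counter


def evaluate(build, classify, training_set, test_set):
    """Measure accuracy, model construction time and model usage time."""
    start = time.perf_counter()
    model = build(training_set)
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [classify(model, x) for x, _ in test_set]
    classify_seconds = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predictions, test_set))
    return {
        "accuracy": correct / len(test_set),
        "build_seconds": build_seconds,
        "classify_seconds": classify_seconds,
    }


def build_majority(training_set):
    """Baseline model: remember the most common training label."""
    return Counter(label for _, label in training_set).most_common(1)[0][0]


def classify_majority(model, x):
    """Baseline usage: ignore the input and return the stored label."""
    return model


# Made-up data: (attribute value, class label).
training_set = [(1, "no"), (2, "no"), (3, "yes")]
test_set = [(4, "no"), (5, "yes"), (6, "no"), (7, "no")]

report = evaluate(build_majority, classify_majority, training_set, test_set)
print(report)
```

Any pair of build and classify functions with the same shape can be passed to `evaluate`, so different methods can be compared on the same training and test sets.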