public class SimpleCart extends RandomizableClassifier implements AdditionalMeasureProducer, TechnicalInformationHandler
@book{Breiman1984, address = {Belmont, California}, author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone}, publisher = {Wadsworth International Group}, title = {Classification and Regression Trees}, year = {1984} }Valid options are:
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the minimal cost-complexity pruning. (default 5)
-U Don't use the minimal cost-complexity pruning. (default yes).
-H Don't use the heuristic method for binary split. (default true).
-A Use 1 SE rule to make pruning decision. (default no).
-C Percentage of training data size (0-1]. (default 1).
Modifier and Type | Field and Description |
---|---|
protected double |
m_Alpha
Alpha-value (for pruning) at the node.
|
protected Attribute |
m_Attribute
Attribute used to split data.
|
protected Attribute |
m_ClassAttribute
Class attriubte of data.
|
protected double[] |
m_ClassProbs
Class probabilities.
|
protected double |
m_ClassValue
Class value if the node is leaf.
|
protected double[] |
m_Distribution
Distributions of leaf node (or temporary leaf node in minimal cost-complexity pruning)
|
protected boolean |
m_Heuristic
If use huristic search for nominal attributes in multi-class problems (default true).
|
protected boolean |
m_isLeaf
Indicate if the node is a leaf node.
|
protected double |
m_minNumObj
Minimum number of instances in at the terminal nodes.
|
protected int |
m_numFoldsPruning
Number of folds for minimal cost-complexity pruning.
|
protected double |
m_numIncorrectModel
Number of training examples misclassified by the model (subtree rooted).
|
protected double |
m_numIncorrectTree
Number of training examples misclassified by the model (subtree not rooted).
|
protected double[] |
m_Props
Proportion for each branch.
|
protected boolean |
m_Prune
If use minimal cost-compexity pruning.
|
protected double |
m_SizePer
Training data size.
|
protected String |
m_SplitString
Split subset used to split data for nominal attributes.
|
protected double |
m_SplitValue
Split point for a numeric attribute.
|
protected SimpleCart[] |
m_Successors
Successor nodes.
|
protected int |
m_totalTrainInstances
Total number of instances used to build the classifier.
|
protected Instances |
m_train
Training data.
|
protected boolean |
m_UseOneSE
If use the 1SE rule to make final decision tree.
|
m_Seed
m_Debug
Constructor and Description |
---|
SimpleCart() |
Modifier and Type | Method and Description |
---|---|
void |
buildClassifier(Instances data)
Build the classifier.
|
void |
calculateAlphas()
Updates the alpha field for all nodes.
|
protected double |
computeGini(double[] dist,
double total)
Compute and return gini index for a given distribution of a node.
|
protected double |
computeGiniGain(double[] parentDist,
double[][] childDist)
Compute and return gini gain for given distributions of a node and its
successor nodes.
|
protected double |
computeSortedInfo(Instances data,
int[][] sortedIndices,
double[][] weights,
double[] classProbs)
Compute sorted indices, weights and class probabilities for a given
dataset.
|
double[] |
distributionForInstance(Instance instance)
Computes class probabilities for instance using the decision tree.
|
Enumeration |
enumerateMeasures()
Return an enumeration of the measure names.
|
protected void |
fillInnerNodes(Vector nodeList)
Fills a list with all inner nodes in the tree.
|
Capabilities |
getCapabilities()
Returns default capabilities of the classifier.
|
boolean |
getHeuristic()
Get if use heuristic search for nominal attributes in multi-class problems.
|
protected Vector |
getInnerNodes()
Return a list of all inner nodes in the tree.
|
double |
getMeasure(String additionalMeasureName)
Returns the value of the named measure.
|
double |
getMinNumObj()
Get minimal number of instances at the terminal nodes.
|
int |
getNumFoldsPruning()
Set number of folds in internal cross-validation.
|
String[] |
getOptions()
Gets the current settings of the classifier.
|
String |
getRevision()
Returns the revision string.
|
double |
getSizePer()
Get training set size.
|
TechnicalInformation |
getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing
detailed information about the technical background of this class,
e.g., paper reference or book this class is based on.
|
boolean |
getUseOneSE()
Get if use the 1SE rule to choose final model.
|
boolean |
getUsePrune()
Get if use minimal cost-complexity pruning.
|
String |
globalInfo()
Return a description suitable for displaying in the explorer/experimenter.
|
String |
heuristicTipText()
Returns the tip text for this property
|
Enumeration |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(String[] args)
Main method.
|
protected void |
makeLeaf(Instances data)
Make the node leaf node.
|
protected void |
makeTree(Instances data,
int totalInstances,
int[][] sortedIndices,
double[][] weights,
double[] classProbs,
double totalWeight,
double minNumObj,
boolean useHeuristic)
Make binary decision tree recursively.
|
double |
measureTreeSize()
Return number of tree size.
|
String |
minNumObjTipText()
Returns the tip text for this property
|
void |
modelErrors()
Updates the numIncorrectModel field for all nodes when subtree (to be
pruned) is rooted.
|
protected SimpleCart |
nodeToPrune(Vector nodeList)
Find the node with minimal alpha value.
|
protected String |
nominalDistribution(double[][] props,
double[][][] dists,
Attribute att,
int[] sortedIndices,
double[] weights,
double[][] subsetWeights,
double[] giniGains,
Instances data,
boolean useHeuristic)
Compute distributions, proportions and total weights of two successor
nodes for a given nominal attribute.
|
protected double |
numericDistribution(double[][] props,
double[][][] dists,
Attribute att,
int[] sortedIndices,
double[] weights,
double[][] subsetWeights,
double[] giniGains,
Instances data)
Compute distributions, proportions and total weights of two successor
nodes for a given numeric attribute.
|
String |
numFoldsPruningTipText()
Returns the tip text for this property
|
int |
numInnerNodes()
Method to count the number of inner nodes in the tree.
|
int |
numLeaves()
Compute number of leaf nodes.
|
int |
numNodes()
Compute size of the tree.
|
void |
prune(double alpha)
Prunes the original tree using the CART pruning scheme, given a
cost-complexity parameter alpha.
|
int |
prune(double[] alphas,
double[] errors,
Instances test)
Method for performing one fold in the cross-validation of minimal
cost-complexity pruning.
|
void |
setHeuristic(boolean value)
Set if use heuristic search for nominal attributes in multi-class problems.
|
void |
setMinNumObj(double value)
Set minimal number of instances at the terminal nodes.
|
void |
setNumFoldsPruning(int value)
Set number of folds in internal cross-validation.
|
void |
setOptions(String[] options)
Parses a given list of options.
|
void |
setSizePer(double value)
Set training set size.
|
void |
setUseOneSE(boolean value)
Set if use the 1SE rule to choose final model.
|
void |
setUsePrune(boolean value)
Set if use minimal cost-complexity pruning.
|
String |
sizePerTipText()
Returns the tip text for this property
|
protected void |
splitData(int[][][] subsetIndices,
double[][][] subsetWeights,
Attribute att,
double splitPoint,
String splitStr,
int[][] sortedIndices,
double[][] weights,
Instances data)
Split data into two subsets and store sorted indices and weights for two
successor nodes.
|
String |
toString()
Prints the decision tree using the protected toString method from below.
|
protected String |
toString(int level)
Outputs a tree at a certain level.
|
void |
treeErrors()
Updates the numIncorrectTree field for all nodes.
|
protected void |
unprune()
Method to "unprune" the CART tree.
|
String |
useOneSETipText()
Returns the tip text for this property
|
String |
usePruneTipText()
Return the tip text for this property
|
getSeed, seedTipText, setSeed
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug
protected Instances m_train
protected SimpleCart[] m_Successors
protected Attribute m_Attribute
protected double m_SplitValue
protected String m_SplitString
protected double m_ClassValue
protected Attribute m_ClassAttribute
protected double m_minNumObj
protected int m_numFoldsPruning
protected double m_Alpha
protected double m_numIncorrectModel
protected double m_numIncorrectTree
protected boolean m_isLeaf
protected boolean m_Prune
protected int m_totalTrainInstances
protected double[] m_Props
protected double[] m_ClassProbs
protected double[] m_Distribution
protected boolean m_Heuristic
protected boolean m_UseOneSE
protected double m_SizePer
public String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface TechnicalInformationHandler
public Capabilities getCapabilities()
getCapabilities
in interface CapabilitiesHandler
getCapabilities
in class Classifier
Capabilities
public void buildClassifier(Instances data) throws Exception
buildClassifier
in class Classifier
data
- the training instancesException
- if something goes wrongprotected void makeTree(Instances data, int totalInstances, int[][] sortedIndices, double[][] weights, double[] classProbs, double totalWeight, double minNumObj, boolean useHeuristic) throws Exception
data
- the training instancestotalInstances
- total number of instancessortedIndices
- sorted indices of the instancesweights
- weights of the instancesclassProbs
- class probabilitiestotalWeight
- total weight of instancesminNumObj
- minimal number of instances at leaf nodesuseHeuristic
- if use heuristic search for nominal attributes in multi-class problemException
- if something goes wrongpublic void prune(double alpha) throws Exception
alpha
- the cost-complexity parameterException
- if something goes wrongpublic int prune(double[] alphas, double[] errors, Instances test) throws Exception
alphas
- array to hold the generated alpha-valueserrors
- array to hold the corresponding error estimatestest
- test set of that fold (to obtain error estimates)Exception
- if something goes wrongprotected void unprune()
protected double numericDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data) throws Exception
props
- proportions of each two branches for each attributedists
- class distributions of two branches for each attributeatt
- numeric att split onsortedIndices
- sorted indices of instances for the attirubteweights
- weights of instances for the attirbutesubsetWeights
- total weight of two branches split based on the attributeginiGains
- Gini gains for each attributedata
- training instancesException
- if something goes wrongprotected String nominalDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data, boolean useHeuristic) throws Exception
props
- proportions of each two branches for each attributedists
- class distributions of two branches for each attributeatt
- numeric att split onsortedIndices
- sorted indices of instances for the attirubteweights
- weights of instances for the attirbutesubsetWeights
- total weight of two branches split based on the attributeginiGains
- Gini gains for each attributedata
- training instancesuseHeuristic
- if use heuristic searchException
- if something goes wrongprotected void splitData(int[][][] subsetIndices, double[][][] subsetWeights, Attribute att, double splitPoint, String splitStr, int[][] sortedIndices, double[][] weights, Instances data) throws Exception
subsetIndices
- sorted indecis of instances for each attribute
for two successor nodesubsetWeights
- weights of instances for each attribute for
two successor nodeatt
- attribute the split based onsplitPoint
- split point the split based on if att is numericsplitStr
- split subset the split based on if att is nominalsortedIndices
- sorted indices of the instances to be splitweights
- weights of the instances to bes splitdata
- training dataException
- if something goes wrongpublic void modelErrors() throws Exception
Exception
- if something goes wrongpublic void treeErrors() throws Exception
Exception
- if something goes wrongpublic void calculateAlphas() throws Exception
Exception
- if something goes wrongprotected SimpleCart nodeToPrune(Vector nodeList)
nodeList
- list of inner nodesprotected double computeSortedInfo(Instances data, int[][] sortedIndices, double[][] weights, double[] classProbs) throws Exception
data
- training datasortedIndices
- sorted indices of instances at the nodeweights
- weights of instances at the nodeclassProbs
- class probabilities at the nodeException
- if something goes wrongprotected double computeGiniGain(double[] parentDist, double[][] childDist)
parentDist
- class distributions of parent nodechildDist
- class distributions of successor nodesprotected double computeGini(double[] dist, double total)
dist
- class distributionstotal
- class distributionspublic double[] distributionForInstance(Instance instance) throws Exception
distributionForInstance
in class Classifier
instance
- the instance for which class probabilities is to be computedException
- if something goes wrongprotected void makeLeaf(Instances data)
data
- trainging datapublic String toString()
protected String toString(int level)
level
- the level at which the tree is to be printedpublic int numNodes()
public int numInnerNodes()
protected Vector getInnerNodes()
protected void fillInnerNodes(Vector nodeList)
nodeList
- the list to be filledpublic int numLeaves()
public Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class RandomizableClassifier
public void setOptions(String[] options) throws Exception
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the minimal cost-complexity pruning. (default 5)
-U Don't use the minimal cost-complexity pruning. (default yes).
-H Don't use the heuristic method for binary split. (default true).
-A Use 1 SE rule to make pruning decision. (default no).
-C Percentage of training data size (0-1]. (default 1).
setOptions
in interface OptionHandler
setOptions
in class RandomizableClassifier
options
- the list of options as an array of stringsException
- if an options is not supportedpublic String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class RandomizableClassifier
public Enumeration enumerateMeasures()
enumerateMeasures
in interface AdditionalMeasureProducer
public double measureTreeSize()
public double getMeasure(String additionalMeasureName)
getMeasure
in interface AdditionalMeasureProducer
additionalMeasureName
- the name of the measure to query for its valueIllegalArgumentException
- if the named measure is not supportedpublic String minNumObjTipText()
public void setMinNumObj(double value)
value
- minimal number of instances at the terminal nodespublic double getMinNumObj()
public String numFoldsPruningTipText()
public void setNumFoldsPruning(int value)
value
- number of folds in internal cross-validation.public int getNumFoldsPruning()
public String usePruneTipText()
public void setUsePrune(boolean value)
value
- if use minimal cost-complexity pruningpublic boolean getUsePrune()
public String heuristicTipText()
public void setHeuristic(boolean value)
value
- if use heuristic search for nominal attributes in
multi-class problemspublic boolean getHeuristic()
public String useOneSETipText()
public void setUseOneSE(boolean value)
value
- if use the 1SE rule to choose final modelpublic boolean getUseOneSE()
public String sizePerTipText()
public void setSizePer(double value)
value
- training set sizepublic double getSizePer()
public String getRevision()
getRevision
in interface RevisionHandler
getRevision
in class Classifier
public static void main(String[] args)
args
- the options for the classifierCopyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.