public class Discretize extends PotentialClassIgnorer implements UnsupervisedFilter, WeightedInstancesHandler
-unset-class-temporarily Unsets the class index temporarily before the filter is applied to the data. (default: no)
-B <num> Specifies the (maximum) number of bins to divide numeric attributes into. (default = 10)
-M <num> Specifies the desired weight of instances per bin for equal-frequency binning. If this is set to a positive number then the -B option will be ignored. (default = -1)
-F Use equal-frequency instead of equal-width discretization.
-O Optimize number of bins using leave-one-out estimate of estimated entropy (for equal-width discretization). If this is set then the -B option will be ignored.
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default: first-last)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
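The difference between the default equal-width mode (-B) and equal-frequency mode (-F) can be illustrated with a minimal, library-free sketch. The class and method names below (`BinningSketch`, `equalWidthCutPoints`, `equalFrequencyCutPoints`) are hypothetical and not part of Weka. The sketch ignores instance weights, missing values, and the -M and -O options, so it should not be read as the filter's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BinningSketch {

    // Equal-width: split [min, max] into numBins intervals of the same width
    // and return the numBins - 1 interior boundaries. Assumes a non-empty array.
    static double[] equalWidthCutPoints(double[] values, int numBins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        if (numBins < 2 || max == min) {
            return new double[0]; // a single bin needs no cut points
        }
        double width = (max - min) / numBins;
        double[] cuts = new double[numBins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    // Equal-frequency (unweighted): walk the sorted values and close a bin once
    // it holds at least n / numBins values, cutting halfway between two distinct
    // neighbours. Ties are never split, so fewer bins may result.
    static double[] equalFrequencyCutPoints(double[] values, int numBins) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double perBin = (double) sorted.length / numBins;
        List<Double> cuts = new ArrayList<>();
        int inBin = 0;
        for (int i = 0; i < sorted.length - 1; i++) {
            inBin++;
            boolean binFull = inBin >= perBin;
            boolean canSplit = sorted[i] < sorted[i + 1];
            if (binFull && canSplit && cuts.size() < numBins - 1) {
                cuts.add((sorted[i] + sorted[i + 1]) / 2.0);
                inBin = 0;
            }
        }
        return cuts.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 100, 200, 300};
        System.out.println("equal-width:     " + Arrays.toString(equalWidthCutPoints(data, 3)));
        System.out.println("equal-frequency: " + Arrays.toString(equalFrequencyCutPoints(data, 3)));
    }
}
```

On this skewed sample, equal-width binning places seven of the nine values in the first bin, while the equal-frequency sketch yields three values per bin.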
Modifier and Type | Field and Description |
---|---|
protected double[][] | m_CutPoints: Store the current cutpoints |
protected String | m_DefaultCols: The default columns to discretize |
protected double | m_DesiredWeightOfInstancesPerInterval: The desired weight of instances per bin |
protected Range | m_DiscretizeCols: Stores which columns to Discretize |
protected boolean | m_FindNumBins: Find the number of bins using cross-validated entropy. |
protected boolean | m_MakeBinary: Output binary attributes for discretized attributes. |
protected int | m_NumBins: The number of bins to divide the attribute into |
protected boolean | m_UseEqualFrequency: Use equal-frequency binning if unsupervised discretization turned on |
Fields inherited from class PotentialClassIgnorer: m_ClassIndex, m_IgnoreClass
Fields inherited from class Filter: m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
Constructor and Description |
---|
Discretize(): Constructor - initialises the filter |
Discretize(String cols): Another constructor, sets the attribute indices immediately |
Modifier and Type | Method and Description |
---|---|
String | attributeIndicesTipText(): Returns the tip text for this property |
boolean | batchFinished(): Signifies that this batch of input to the filter is finished. |
String | binsTipText(): Returns the tip text for this property |
protected void | calculateCutPoints(): Generate the cutpoints for each attribute |
protected void | calculateCutPointsByEqualFrequencyBinning(int index): Set cutpoints for a single attribute using equal-frequency binning. |
protected void | calculateCutPointsByEqualWidthBinning(int index): Set cutpoints for a single attribute using equal-width binning. |
protected void | convertInstance(Instance instance): Convert a single instance over. |
String | desiredWeightOfInstancesPerIntervalTipText(): Returns the tip text for this property |
protected void | findNumBins(int index): Optimizes the number of bins using leave-one-out cross-validation. |
String | findNumBinsTipText(): Returns the tip text for this property |
String | getAttributeIndices(): Gets the current range selection |
int | getBins(): Gets the number of bins numeric attributes will be divided into |
Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
double[] | getCutPoints(int attributeIndex): Gets the cut points for an attribute |
double | getDesiredWeightOfInstancesPerInterval(): Get the DesiredWeightOfInstancesPerInterval value. |
boolean | getFindNumBins(): Get the value of FindNumBins. |
boolean | getInvertSelection(): Gets whether the matching sense of the supplied column indexes is inverted |
boolean | getMakeBinary(): Gets whether binary attributes should be made for discretized ones. |
String[] | getOptions(): Gets the current settings of the filter. |
String | getRevision(): Returns the revision string. |
boolean | getUseEqualFrequency(): Get the value of UseEqualFrequency. |
String | globalInfo(): Returns a string describing this filter |
boolean | input(Instance instance): Input an instance for filtering. |
String | invertSelectionTipText(): Returns the tip text for this property |
Enumeration | listOptions(): Gets an enumeration describing the available options. |
static void | main(String[] argv): Main method for testing this class. |
String | makeBinaryTipText(): Returns the tip text for this property |
void | setAttributeIndices(String rangeList): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
void | setAttributeIndicesArray(int[] attributes): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
void | setBins(int numBins): Sets the number of bins to divide each selected numeric attribute into |
void | setDesiredWeightOfInstancesPerInterval(double newDesiredNumber): Set the DesiredWeightOfInstancesPerInterval value. |
void | setFindNumBins(boolean newFindNumBins): Set the value of FindNumBins. |
boolean | setInputFormat(Instances instanceInfo): Sets the format of the input instances. |
void | setInvertSelection(boolean invert): Sets whether the matching sense of the selected column indexes is inverted. |
void | setMakeBinary(boolean makeBinary): Sets whether binary attributes should be made for discretized ones. |
void | setOptions(String[] options): Parses a given list of options. |
protected void | setOutputFormat(): Set the output format. |
void | setUseEqualFrequency(boolean newUseEqualFrequency): Set the value of UseEqualFrequency. |
String | useEqualFrequencyTipText(): Returns the tip text for this property |
Methods inherited from class PotentialClassIgnorer: getIgnoreClass, getOutputFormat, ignoreClassTipText, setIgnoreClass
Methods inherited from class Filter: batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
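How an array of cut points, such as the one returned by getCutPoints(int), partitions a numeric attribute can be sketched as follows. This is an illustration under stated assumptions, not the filter's code: the class `CutPointLookup` and its methods are hypothetical, intervals are assumed to be closed on the right (a value equal to a cut point falls into the lower bin), and the binary encoding assumed for -D is one 0/1 indicator per cut point.

```java
import java.util.Arrays;

public class CutPointLookup {

    // Returns the index of the bin holding value. With k cut points there are
    // k + 1 bins: (-inf, c0], (c0, c1], ..., (c(k-1), +inf).
    // Assumption: intervals are closed on the right.
    static int binIndex(double value, double[] cutPoints) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }

    // Assumed binary encoding: one indicator per cut point,
    // 1 if the value lies above that cut point, 0 otherwise.
    static int[] binaryIndicators(double value, double[] cutPoints) {
        int[] indicators = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            indicators[j] = value > cutPoints[j] ? 1 : 0;
        }
        return indicators;
    }

    public static void main(String[] args) {
        double[] cutPoints = {2.5, 5.0}; // must be sorted in ascending order
        for (double v : new double[] {1.0, 2.5, 3.0, 9.0}) {
            System.out.println(v + " -> bin " + binIndex(v, cutPoints)
                + ", indicators " + Arrays.toString(binaryIndicators(v, cutPoints)));
        }
    }
}
```

Under these assumptions, two cut points give three nominal values in the default mode and two binary attributes when binary output is requested.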
protected Range m_DiscretizeCols
protected int m_NumBins
protected double m_DesiredWeightOfInstancesPerInterval
protected double[][] m_CutPoints
protected boolean m_MakeBinary
protected boolean m_FindNumBins
protected boolean m_UseEqualFrequency
protected String m_DefaultCols
public Discretize()
public Discretize(String cols)
Parameters: cols - the attribute indices
public Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class PotentialClassIgnorer
public void setOptions(String[] options) throws Exception
-unset-class-temporarily Unsets the class index temporarily before the filter is applied to the data. (default: no)
-B <num> Specifies the (maximum) number of bins to divide numeric attributes into. (default = 10)
-M <num> Specifies the desired weight of instances per bin for equal-frequency binning. If this is set to a positive number then the -B option will be ignored. (default = -1)
-F Use equal-frequency instead of equal-width discretization.
-O Optimize number of bins using leave-one-out estimate of estimated entropy (for equal-width discretization). If this is set then the -B option will be ignored.
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default: first-last)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
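The column syntax accepted by -R and the inversion applied by -V can be illustrated with a small stand-alone parser. This is a hypothetical sketch, not weka.core.Range: the class `RangeListSketch` and its methods are invented for illustration, and details such as the handling of reversed ranges are assumptions. It takes 1-based indexes (as typed by a user) and returns 0-based indexes (as used in code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class RangeListSketch {

    // Converts one 1-based token ("first", "last", or a number) to a 0-based index.
    static int toIndex(String token, int numAttributes) {
        String t = token.trim().toLowerCase(Locale.ROOT);
        if (t.equals("first")) {
            return 0;
        }
        if (t.equals("last")) {
            return numAttributes - 1;
        }
        int index = Integer.parseInt(t) - 1; // NumberFormatException is an IllegalArgumentException
        if (index < 0 || index >= numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return index;
    }

    // Expands a list such as "first-3,5" into sorted 0-based indexes.
    // With invert == true, the columns NOT named in the list are returned.
    static int[] selectedIndices(String rangeList, int numAttributes, boolean invert) {
        TreeSet<Integer> named = new TreeSet<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.split("-");
            if (ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int lo = toIndex(ends[0], numAttributes);
            int hi = ends.length == 2 ? toIndex(ends[1], numAttributes) : lo;
            for (int i = Math.min(lo, hi); i <= Math.max(lo, hi); i++) {
                named.add(i);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (named.contains(i) != invert) {
                result.add(i);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] kept = selectedIndices("first-3,5", 6, false);
        int[] inverted = selectedIndices("first-3,5", 6, true);
        System.out.println(java.util.Arrays.toString(kept));
        System.out.println(java.util.Arrays.toString(inverted));
    }
}
```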
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class PotentialClassIgnorer
Parameters: options - the list of options as an array of strings
Throws: Exception - if an option is not supported
public String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class PotentialClassIgnorer
public Capabilities getCapabilities()
Specified by: getCapabilities in interface CapabilitiesHandler
Overrides: getCapabilities in class Filter
See Also: Capabilities
public boolean setInputFormat(Instances instanceInfo) throws Exception
Overrides: setInputFormat in class PotentialClassIgnorer
Parameters: instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
Throws: Exception - if the input format can't be set successfully
public boolean input(Instance instance)
Overrides: input in class Filter
Parameters: instance - the input instance
Throws: IllegalStateException - if no input format has been defined.
public boolean batchFinished()
Overrides: batchFinished in class Filter
Throws: IllegalStateException - if no input structure has been defined
public String globalInfo()
public String findNumBinsTipText()
public boolean getFindNumBins()
public void setFindNumBins(boolean newFindNumBins)
Parameters: newFindNumBins - Value to assign to FindNumBins.
public String makeBinaryTipText()
public boolean getMakeBinary()
public void setMakeBinary(boolean makeBinary)
Parameters: makeBinary - if binary attributes are to be made
public String desiredWeightOfInstancesPerIntervalTipText()
public double getDesiredWeightOfInstancesPerInterval()
public void setDesiredWeightOfInstancesPerInterval(double newDesiredNumber)
Parameters: newDesiredNumber - the new desired weight of instances per interval
public String useEqualFrequencyTipText()
public boolean getUseEqualFrequency()
public void setUseEqualFrequency(boolean newUseEqualFrequency)
Parameters: newUseEqualFrequency - Value to assign to UseEqualFrequency.
public String binsTipText()
public int getBins()
public void setBins(int numBins)
Parameters: numBins - the number of bins
public String invertSelectionTipText()
public boolean getInvertSelection()
public void setInvertSelection(boolean invert)
Parameters: invert - the new invert setting
public String attributeIndicesTipText()
public String getAttributeIndices()
public void setAttributeIndices(String rangeList)
Parameters: rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
Throws: IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] attributes)
Parameters: attributes - an array containing indexes of attributes to Discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws: IllegalArgumentException - if an invalid set of ranges is supplied
public double[] getCutPoints(int attributeIndex)
Parameters: attributeIndex - the index (from 0) of the attribute to get the cut points of
protected void calculateCutPoints()
protected void calculateCutPointsByEqualWidthBinning(int index)
Parameters: index - the index of the attribute to set cutpoints for
protected void calculateCutPointsByEqualFrequencyBinning(int index)
Parameters: index - the index of the attribute to set cutpoints for
protected void findNumBins(int index)
Parameters: index - the attribute index
protected void setOutputFormat()
protected void convertInstance(Instance instance)
Parameters: instance - the instance to convert
public String getRevision()
Specified by: getRevision in interface RevisionHandler
Overrides: getRevision in class Filter
public static void main(String[] argv)
Parameters: argv - should contain arguments to the filter: use -h for help
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.