weka.associations
Class FPGrowth

java.lang.Object
  extended by weka.associations.AbstractAssociator
      extended by weka.associations.FPGrowth
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Associator, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class FPGrowth
extends AbstractAssociator
implements OptionHandler, TechnicalInformationHandler

Class implementing the FP-growth algorithm for finding large item sets without candidate generation. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum metric. For more information see:

J. Han, J.Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data, 1-12, 2000.

BibTeX:

 @inproceedings{Han2000,
    author = {J. Han and J.Pei and Y. Yin},
    booktitle = {Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data},
    pages = {1-12},
    title = {Mining frequent patterns without candidate generation},
    year = {2000}
 }
 

Valid options are:

 -P <attribute index of positive value>
  Set the index of the attribute value to consider as 'positive'
  for binary attributes in normal dense instances. Index 2 is always
  used for sparse instances. (default = 2)
 -I <max items>
  The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
 -N <require number of rules>
  The required number of rules. (default = 10)
 -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
  The metric by which to rank rules. (default = confidence)
 -C <minimum metric score of a rule>
  The minimum metric score of a rule. (default = 0.9)
 -U <upper bound for minimum support>
  Upper bound for minimum support. (default = 1.0)
 -M <lower bound for minimum support>
  The lower bound for the minimum support. (default = 0.1)
 -D <delta for minimum support>
  The delta by which the minimum support is decreased in
  each iteration. (default = 0.05)
 -S
  Find all rules that meet the lower bound on
  minimum support and the minimum metric constraint.
  Turning this mode on will disable the iterative support reduction
  procedure to find the specified number of rules.
 -transactions <comma separated list of attribute names>
  Only consider transactions that contain these items (default = no restriction)
 -rules <comma separated list of attribute names>
  Only print rules that contain these items. (default = no restriction)
 -use-or
  Use OR instead of AND for must contain list(s). Use in conjunction
  with -transactions and/or -rules

Version:
$Revision: 7092 $
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
Serialized Form

Nested Class Summary
static class FPGrowth.AssociationRule
           
static class FPGrowth.BinaryItem
          Inner class that handles a single binary item
 
Constructor Summary
FPGrowth()
          Construct a new FPGrowth object.
 
Method Summary
 void buildAssociations(Instances data)
          Method that generates all large item sets with a minimum support, and from these all association rules with a minimum metric (i.e.
 java.lang.String deltaTipText()
          Returns the tip text for this property
 java.lang.String findAllRulesForSupportLevelTipText()
          Tip text for this property suitable for displaying in the GUI.
 java.util.List<FPGrowth.AssociationRule> getAssociationRules()
          Gets the list of mined association rules.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 double getDelta()
          Get the value of delta.
 boolean getFindAllRulesForSupportLevel()
          Get whether all rules meeting the lower bound on min support and the minimum metric threshold are to be found.
 double getLowerBoundMinSupport()
          Get the value of lowerBoundMinSupport.
 int getMaxNumberOfItems()
          Gets the maximum number of items to be included in large item sets.
 SelectedTag getMetricType()
          Get the metric type to use.
 double getMinMetric()
          Get the value of minConfidence.
 int getNumRulesToFind()
          Get the number of rules to find.
 java.lang.String[] getOptions()
          Gets the current settings of the classifier.
 int getPositiveIndex()
          Get the index of the attribute value to consider as positive for binary attributes in normal dense instances.
 java.lang.String getRevision()
          Returns the revision string.
 java.lang.String getRulesMustContain()
          Get the comma separated list of items that rules must contain in order to be output.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String getTransactionsMustContain()
          Gets the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.
 double getUpperBoundMinSupport()
          Get the value of upperBoundMinSupport.
 boolean getUseORForMustContainList()
          Gets whether OR is to be used rather than AND when considering must contain lists.
 java.lang.String globalInfo()
          Returns a string describing this associator
 java.lang.String graph(weka.associations.FPGrowth.FPTreeRoot tree)
          Assemble a dot graph representation of the FP-tree.
 java.util.Enumeration<Option> listOptions()
          Returns an enumeration describing the available options.
 java.lang.String lowerBoundMinSupportTipText()
          Returns the tip text for this property
static void main(java.lang.String[] args)
          Main method.
 java.lang.String maxNumberOfItemsTipText()
          Tip text for this property suitable for displaying in the GUI.
 java.lang.String metricTypeTipText()
          Tip text for this property suitable for displaying in the GUI.
 java.lang.String minMetricTipText()
          Returns the tip text for this property
 java.lang.String numRulesToFindTipText()
          Tip text for this property suitable for displaying in the GUI.
 java.lang.String positiveIndexTipText()
          Tip text for this property suitable for displaying in the GUI.
 void resetOptions()
          Reset all options to their default values.
 java.lang.String rulesMustContainTipText()
          Returns the tip text for this property
 void setDelta(double v)
          Set the value of delta.
 void setFindAllRulesForSupportLevel(boolean s)
          If true then turn off the iterative support reduction method of finding x rules that meet the minimum support and metric thresholds and just return all the rules that meet the lower bound on minimum support and the minimum metric.
 void setLowerBoundMinSupport(double v)
          Set the value of lowerBoundMinSupport.
 void setMaxNumberOfItems(int max)
          Set the maximum number of items to include in large items sets.
 void setMetricType(SelectedTag d)
          Set the metric type to use.
 void setMinMetric(double v)
          Set the value of minConfidence.
 void setNumRulesToFind(int numR)
          Set the desired number of rules to find.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPositiveIndex(int index)
          Set the index of the attribute value to consider as positive for binary attributes in normal dense instances.
 void setRulesMustContain(java.lang.String list)
          Set the comma separated list of items that rules must contain in order to be output.
 void setTransactionsMustContain(java.lang.String list)
          Set the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.
 void setUpperBoundMinSupport(double v)
          Set the value of upperBoundMinSupport.
 void setUseORForMustContainList(boolean b)
          Set whether to use OR rather than AND when considering must contain lists.
 java.lang.String toString()
          Output the association rules.
 java.lang.String transactionsMustContainTipText()
          Returns the tip text for this property
 java.lang.String upperBoundMinSupportTipText()
          Returns the tip text for this property
 java.lang.String useORForMustContainListTipText()
          Returns the tip text for this property
 java.lang.String xmlRules()
           
 
Methods inherited from class weka.associations.AbstractAssociator
forName, makeCopies, makeCopy
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FPGrowth

public FPGrowth()
Construct a new FPGrowth object.

Method Detail

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface Associator
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class AbstractAssociator
Returns:
the capabilities of this classifier
See Also:
Capabilities

globalInfo

public java.lang.String globalInfo()
Returns a string describing this associator

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

resetOptions

public void resetOptions()
Reset all options to their default values.


positiveIndexTipText

public java.lang.String positiveIndexTipText()
Tip text for this property suitable for displaying in the GUI.

Returns:
the tip text for this property.

setPositiveIndex

public void setPositiveIndex(int index)
Set the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.

Parameters:
index - the index to use for positive values in binary attributes.

getPositiveIndex

public int getPositiveIndex()
Get the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.

Returns:
the index to use for positive values in binary attributes.

setNumRulesToFind

public void setNumRulesToFind(int numR)
Set the desired number of rules to find.

Parameters:
numR - the number of rules to find.

getNumRulesToFind

public int getNumRulesToFind()
Get the number of rules to find.

Returns:
the number of rules to find.

numRulesToFindTipText

public java.lang.String numRulesToFindTipText()
Tip text for this property suitable for displaying in the GUI.

Returns:
the tip text for this property.

setMetricType

public void setMetricType(SelectedTag d)
Set the metric type to use.

Parameters:
d - the metric type

setMaxNumberOfItems

public void setMaxNumberOfItems(int max)
Set the maximum number of items to include in large items sets.

Parameters:
max - the maxim number of items to include in large item sets.

getMaxNumberOfItems

public int getMaxNumberOfItems()
Gets the maximum number of items to be included in large item sets.

Returns:
the maximum number of items to be included in large items sets.

maxNumberOfItemsTipText

public java.lang.String maxNumberOfItemsTipText()
Tip text for this property suitable for displaying in the GUI.

Returns:
the tip text for this property.

getMetricType

public SelectedTag getMetricType()
Get the metric type to use.

Returns:
the metric type to use.

metricTypeTipText

public java.lang.String metricTypeTipText()
Tip text for this property suitable for displaying in the GUI.

Returns:
the tip text for this property.

minMetricTipText

public java.lang.String minMetricTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMinMetric

public double getMinMetric()
Get the value of minConfidence.

Returns:
Value of minConfidence.

setMinMetric

public void setMinMetric(double v)
Set the value of minConfidence.

Parameters:
v - Value to assign to minConfidence.

transactionsMustContainTipText

public java.lang.String transactionsMustContainTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setTransactionsMustContain

public void setTransactionsMustContain(java.lang.String list)
Set the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.

Parameters:
list - a comma separated list of items (empty string indicates no restriction on the transactions).

getTransactionsMustContain

public java.lang.String getTransactionsMustContain()
Gets the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.

Returns:
return the comma separated list of items that transactions must contain.

rulesMustContainTipText

public java.lang.String rulesMustContainTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRulesMustContain

public void setRulesMustContain(java.lang.String list)
Set the comma separated list of items that rules must contain in order to be output.

Parameters:
list - a comma separated list of items (empty string indicates no restriction on the rules).

getRulesMustContain

public java.lang.String getRulesMustContain()
Get the comma separated list of items that rules must contain in order to be output.

Returns:
the comma separated list of items that rules must contain in order to be output.

useORForMustContainListTipText

public java.lang.String useORForMustContainListTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setUseORForMustContainList

public void setUseORForMustContainList(boolean b)
Set whether to use OR rather than AND when considering must contain lists.

Parameters:
b - true if OR should be used instead of AND when considering transaction and rules must contain lists.

getUseORForMustContainList

public boolean getUseORForMustContainList()
Gets whether OR is to be used rather than AND when considering must contain lists.

Returns:
true if OR is used instead of AND.

deltaTipText

public java.lang.String deltaTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying, in the explorer/experimenter gui

getDelta

public double getDelta()
Get the value of delta.

Returns:
Value of delta.

setDelta

public void setDelta(double v)
Set the value of delta.

Parameters:
v - Value to assign to delta.

lowerBoundMinSupportTipText

public java.lang.String lowerBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getLowerBoundMinSupport

public double getLowerBoundMinSupport()
Get the value of lowerBoundMinSupport.

Returns:
Value of lowerBoundMinSupport.

setLowerBoundMinSupport

public void setLowerBoundMinSupport(double v)
Set the value of lowerBoundMinSupport.

Parameters:
v - Value to assign to lowerBoundMinSupport.

upperBoundMinSupportTipText

public java.lang.String upperBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUpperBoundMinSupport

public double getUpperBoundMinSupport()
Get the value of upperBoundMinSupport.

Returns:
Value of upperBoundMinSupport.

setUpperBoundMinSupport

public void setUpperBoundMinSupport(double v)
Set the value of upperBoundMinSupport.

Parameters:
v - Value to assign to upperBoundMinSupport.

findAllRulesForSupportLevelTipText

public java.lang.String findAllRulesForSupportLevelTipText()
Tip text for this property suitable for displaying in the GUI.

Returns:
the tip text for this property.

setFindAllRulesForSupportLevel

public void setFindAllRulesForSupportLevel(boolean s)
If true then turn off the iterative support reduction method of finding x rules that meet the minimum support and metric thresholds and just return all the rules that meet the lower bound on minimum support and the minimum metric.

Parameters:
s - true if all rules meeting the lower bound on the support and minimum metric thresholds are to be found.

getFindAllRulesForSupportLevel

public boolean getFindAllRulesForSupportLevel()
Get whether all rules meeting the lower bound on min support and the minimum metric threshold are to be found.

Returns:
true if all rules meeting the lower bound on min support and the min metric threshold are to be found.

getAssociationRules

public java.util.List<FPGrowth.AssociationRule> getAssociationRules()
Gets the list of mined association rules.

Returns:
the list of association rules discovered during mining. Returns null if mining hasn't been performed yet.

listOptions

public java.util.Enumeration<Option> listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -P <attribute index of positive value>
  Set the index of the attribute value to consider as 'positive'
  for binary attributes in normal dense instances. Index 2 is always
  used for sparse instances. (default = 2)
 -I <max items>
  The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
 -N <require number of rules>
  The required number of rules. (default = 10)
 -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
  The metric by which to rank rules. (default = confidence)
 -C <minimum metric score of a rule>
  The minimum metric score of a rule. (default = 0.9)
 -U <upper bound for minimum support>
  Upper bound for minimum support. (default = 1.0)
 -M <lower bound for minimum support>
  The lower bound for the minimum support. (default = 0.1)
 -D <delta for minimum support>
  The delta by which the minimum support is decreased in
  each iteration. (default = 0.05)
 -S
  Find all rules that meet the lower bound on
  minimum support and the minimum metric constraint.
  Turning this mode on will disable the iterative support reduction
  procedure to find the specified number of rules.
 -transactions <comma separated list of attribute names>
  Only consider transactions that contain these items (default = no restriction)
 -rules <comma separated list of attribute names>
  Only print rules that contain these items. (default = no restriction)
 -use-or
  Use OR instead of AND for must contain list(s). Use in conjunction
  with -transactions and/or -rules

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the classifier.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

buildAssociations

public void buildAssociations(Instances data)
                       throws java.lang.Exception
Method that generates all large item sets with a minimum support, and from these all association rules with a minimum metric (i.e. confidence, lift etc.).

Specified by:
buildAssociations in interface Associator
Parameters:
data - the instances to be used for generating the associations
Throws:
java.lang.Exception - if rules can't be built successfully

toString

public java.lang.String toString()
Output the association rules.

Overrides:
toString in class java.lang.Object
Returns:
a string representation of the model.

graph

public java.lang.String graph(weka.associations.FPGrowth.FPTreeRoot tree)
Assemble a dot graph representation of the FP-tree.

Parameters:
tree - the root of the FP-tree
Returns:
a graph representation as a String in dot format.

xmlRules

public java.lang.String xmlRules()

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractAssociator
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method.

Parameters:
args - the commandline options