public class CoverTree extends NearestNeighbourSearch implements TechnicalInformationHandler
@inproceedings{Beygelzimer2006, address = {New York, NY, USA}, author = {Alina Beygelzimer and Sham Kakade and John Langford}, booktitle = {ICML'06: Proceedings of the 23rd international conference on Machine learning}, pages = {97-104}, publisher = {ACM Press}, title = {Cover trees for nearest neighbor}, year = {2006}, location = {Pittsburgh, Pennsylvania}, HTTP = {http://hunch.net/\~jl/projects/cover_tree/cover_tree.html} }Valid options are:
-B <value> Set base of the expansion constant (default = 1.3).
Modifier and Type | Class and Description |
---|---|
class |
CoverTree.CoverTreeNode
class representing a node of the cover tree.
|
protected class |
CoverTree.MyHeap
A class for a heap to store the nearest k neighbours to an instance.
|
protected class |
CoverTree.MyHeapElement
A class for storing data about a neighboring instance.
|
NearestNeighbourSearch.NeighborList, NearestNeighbourSearch.NeighborNode
Modifier and Type | Field and Description |
---|---|
protected double |
il2
if we have base 2 then this can be viewed as 1/ln(2), which can be
used later on to do il2*ln(d) instead of ln(d)/ln(2), to get log2(d),
in get_scale method.
|
protected double |
m_Base
The base of our expansion constant.
|
protected double[] |
m_DistanceList
Array holding the distances of the nearest neighbours.
|
protected EuclideanDistance |
m_EuclideanDistance
The euclidean distance function to use.
|
protected int |
m_MaxDepth
Number of nodes in the tree.
|
protected int |
m_NumLeaves
Number of nodes in the tree.
|
protected int |
m_NumNodes
Number of nodes in the tree.
|
protected CoverTree.CoverTreeNode |
m_Root
The root node.
|
protected TreePerformanceStats |
m_TreeStats
Tree Stats variables.
|
m_DistanceFunction, m_Instances, m_kNN, m_MeasurePerformance, m_Stats
Constructor and Description |
---|
CoverTree()
default constructor.
|
Modifier and Type | Method and Description |
---|---|
void |
addInstanceInfo(Instance ins)
Adds the given instance info.
|
String |
baseTipText()
Returns the tip text for this property.
|
protected CoverTree.CoverTreeNode |
batch_insert(Integer p,
int max_scale,
int top_scale,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> consumed_set)
Creates a cover tree recursively using batch insert method.
|
protected void |
batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode tree_root,
CoverTree.CoverTreeNode query_root,
Stack<NearestNeighbourSearch.NeighborList> results)
Performs k-NN search for a batch of queries provided in the form
of a cover tree.
|
protected void |
brute_nearest(int k,
CoverTree.CoverTreeNode query,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
Does a brute force NN search on the nodes in the given zero set.
|
protected void |
buildCoverTree(Instances insts)
Builds the tree on the given set of instances.
|
protected void |
checkMissing(Instances instances)
Checks if there is any instance with missing values.
|
protected double |
compare(int p1,
int p2,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Returns the difference of two given nodes distance to
the query.
|
protected void |
copy_cover_sets(CoverTree.CoverTreeNode query_chi,
CoverTree.MyHeap new_upper_k,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> new_cover_sets,
int current_scale,
int max_scale)
Copies the contents of one set of cover sets to the other.
|
protected void |
copy_zero_set(CoverTree.CoverTreeNode query_chi,
CoverTree.MyHeap new_upper_k,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
Stack<weka.core.neighboursearch.CoverTree.d_node> new_zero_set)
Copies the contents of one zero set to the other.
|
protected int |
descend(CoverTree.CoverTreeNode query,
CoverTree.MyHeap upper_k,
int current_scale,
int max_scale,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set)
This functions adds nodes for inspection at the next level during NN
search.
|
protected double |
dist_of_scale(int s)
Returns the distance/value of a given scale/level.
|
protected void |
dist_split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> new_point_set,
weka.core.neighboursearch.CoverTree.DistanceNode new_point,
int max_scale)
Moves all the points in point_set covered by (the ball of) new_point
into new_point_set, based on the given scale/level.
|
Enumeration |
enumerateMeasures()
Returns an enumeration of the additional measure names.
|
protected NearestNeighbourSearch.NeighborList |
findKNearest(Instance target,
int k)
Performs k-NN serach for a single given query/test Instance.
|
protected int |
get_scale(double d)
Finds the scale/level of a given value.
|
double |
getBase()
Returns the base in use for expansion constant.
|
protected Stack<weka.core.neighboursearch.CoverTree.d_node> |
getCoverSet(int idx,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets)
Returns a cover set for a given level/scale.
|
double[] |
getDistances()
Returns the distances of the (k)-NN(s) found earlier
by kNearestNeighbours()/nearestNeighbour().
|
double |
getMeasure(String additionalMeasureName)
Returns the value of the named measure.
|
String[] |
getOptions()
Gets the current settings of KDtree.
|
String |
getRevision()
Returns the revision string.
|
TechnicalInformation |
getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed
information about the technical background of this class, e.g., paper
reference or book this class is based on.
|
String |
globalInfo()
Returns a string describing this nearest neighbour search algorithm.
|
protected void |
halfsort(Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Half-sorts a cover set, so that nodes nearer to the query
are at the front.
|
protected void |
internal_batch_nearest_neighbor(int k,
CoverTree.CoverTreeNode query_node,
Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets,
Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set,
int current_scale,
int max_scale,
CoverTree.MyHeap upper_k,
Stack<NearestNeighbourSearch.NeighborList> results)
Performs a recursive k-NN search for a given batch of queries provided in the
form of a cover tree.
|
Instances |
kNearestNeighbours(Instance target,
int k)
Returns k-NNs of a given target instance, from among the previously
supplied training instances (supplied through setInstances method)
P.S.: May return more than k-NNs if more one instances have
the same distance to the target as the kth NN.
|
Enumeration |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(String[] args)
Method for testing the class from command line.
|
protected double |
max_set(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> v)
Returns the max distance of the reference point p in current node to
it's children nodes.
|
double |
measureMaxDepth()
Returns the depth of the tree.
|
double |
measureNumLeaves()
Returns the number of leaves.
|
double |
measureTreeSize()
Returns the size of the tree.
|
Instance |
nearestNeighbour(Instance target)
Returns the NN instance of a given target instance, from among
the previously supplied training instances.
|
protected CoverTree.CoverTreeNode |
new_leaf(Integer idx)
Creates a new leaf node for a given Instance/point p.
|
protected CoverTree.CoverTreeNode |
new_node(Integer idx)
Creates a new internal node for a given Instance/point p.
|
protected static void |
print_space(int s)
Prints the specified number of spaces.
|
protected static void |
print(int depth,
CoverTree.CoverTreeNode top_node)
Prints a cover tree starting from the given node.
|
protected static void |
print(Object o)
Prints an object to stdout.
|
protected static void |
print(String s)
Prints a string to stdout.
|
protected static void |
println(Object o)
Prints an object to stdout followed by
newline.
|
protected static void |
println(String s)
Prints a string to stdout followed by
newline.
|
void |
setBase(double b)
Sets the base to use for expansion constant.
|
void |
setDistanceFunction(DistanceFunction df)
Sets the distance function to use for nearest neighbour search.
|
void |
setInstances(Instances instances)
Builds the Cover Tree on the given set of instances.
|
void |
setOptions(String[] options)
Parses a given list of options.
|
protected void |
setter(CoverTree.MyHeap heap,
double upper_bound,
int k)
Initializes a heap with k values of the the given upper_bound.
|
protected boolean |
shell(double parent_query_dist,
double child_parent_dist,
double upper_bound)
Function to check if a child node can be inside a query ball,
without calculating the child node's distance to the query.
|
protected void |
split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set,
Stack<weka.core.neighboursearch.CoverTree.DistanceNode> far_set,
int max_scale)
Splits a given point_set into near and far based on the given
scale/level.
|
protected void |
SWAP(int a,
int b,
Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
Swap two nodes in a cover set.
|
protected void |
update(CoverTree.MyHeap upper_bound,
double new_bound)
Replaces the current top/max value in the heap with the new one.
|
void |
update(Instance ins)
Adds an instance to the cover tree.
|
combSort11, distanceFunctionTipText, getDistanceFunction, getInstances, getMeasurePerformance, getPerformanceStats, measurePerformanceTipText, partition, quickSort, setMeasurePerformance
protected EuclideanDistance m_EuclideanDistance
protected CoverTree.CoverTreeNode m_Root
protected double[] m_DistanceList
protected int m_NumNodes
protected int m_NumLeaves
protected int m_MaxDepth
protected TreePerformanceStats m_TreeStats
protected double m_Base
protected double il2
public String globalInfo()
globalInfo
in class NearestNeighbourSearch
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface TechnicalInformationHandler
public Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class NearestNeighbourSearch
public void setOptions(String[] options) throws Exception
-B <value> Set base of the expansion constant (default = 1.3).
setOptions
in interface OptionHandler
setOptions
in class NearestNeighbourSearch
options
- the list of options as an array of stringsException
- if an option is not supportedpublic String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class NearestNeighbourSearch
protected double dist_of_scale(int s)
s
- the level/scaleprotected int get_scale(double d)
d
- the value whose scale/level is to be determined.protected CoverTree.CoverTreeNode new_node(Integer idx)
idx
- The index of the instance the node represents.protected CoverTree.CoverTreeNode new_leaf(Integer idx)
idx
- The index of the instance this leaf node
represents.protected double max_set(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> v)
v
- The stack of DistanceNode objects.protected void split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> far_set, int max_scale)
point_set
- The supplied set from which all far points
would be removed.far_set
- The set in which all far points having distance
> base^max_scale would be put into.max_scale
- The given scale based on which the distances
of points are judged to be far or near.protected void dist_split(Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> new_point_set, weka.core.neighboursearch.CoverTree.DistanceNode new_point, int max_scale)
point_set
- The supplied set of instances from which
all points covered by new_point will be removed.new_point_set
- The set in which all points covered by
new_point will be put into.new_point
- The given new point.max_scale
- The scale based on which distances are
judged (radius of cover ball is calculated).protected CoverTree.CoverTreeNode batch_insert(Integer p, int max_scale, int top_scale, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> point_set, Stack<weka.core.neighboursearch.CoverTree.DistanceNode> consumed_set)
p
- The index of the instance from which to create the
first node. All other points will be inserted beneath this node
for p.max_scale
- The current scale/level where the node is to be
created (Also determines the radius of the cover balls created at
this level).top_scale
- The max scale in the whole tree.point_set
- The set of unprocessed points from which child nodes
need to be created.consumed_set
- The set of processed points from which child
nodes have already been created. This would be used to find the
radius of the cover ball of p.protected void buildCoverTree(Instances insts) throws Exception
insts
- The instances on which to build
the cover tree.Exception
- If the supplied set of
Instances is empty, or if there are missing
values.protected void setter(CoverTree.MyHeap heap, double upper_bound, int k) throws Exception
heap
- The heap to put values into.upper_bound
- The value to put into heap (the value with
which it should be initialized).k
- The number of times upper_bound should be put into
heap for initialization.Exception
- If there is some problem in initializing
the heap (if k > size of the heap).protected void update(CoverTree.MyHeap upper_bound, double new_bound) throws Exception
upper_bound
- The heap.new_bound
- The new value that should replace the old top one.Exception
- if the new value is greater than the old value.protected Stack<weka.core.neighboursearch.CoverTree.d_node> getCoverSet(int idx, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets)
idx
- The level/scale for which the cover set
is required.cover_sets
- The covers sets. Consists of stack
of a stack of d_node objects.protected void copy_zero_set(CoverTree.CoverTreeNode query_chi, CoverTree.MyHeap new_upper_k, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set, Stack<weka.core.neighboursearch.CoverTree.d_node> new_zero_set) throws Exception
query_chi
- The child node of our query node that we are
going to inspect.new_upper_k
- New heap that will store the distances of the
k NNs for query_chi.zero_set
- The zero set of query_chi's parent that needs
to be copied.new_zero_set
- The new zero set of query_chi where old zero
sets need to be copied into.Exception
- If there is some problem.protected void copy_cover_sets(CoverTree.CoverTreeNode query_chi, CoverTree.MyHeap new_upper_k, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> new_cover_sets, int current_scale, int max_scale) throws Exception
query_chi
- The child node of our query node that we are
going to inspect.new_upper_k
- New heap that will store the distances of the
k NNs for query_chi.cover_sets
- The cover_sets of query_chi's parent, which
need to be copied to new_cover_sets.new_cover_sets
- The new set of cover_sets that need to
contain contents of cover_sets.current_scale
- The scale/level we are inspecting in our
cover tree.max_scale
- The maximum level so far possible in our
search (this is only updated as we descend and a deeper
child is found inside the query ball).Exception
- If there is problem.protected void SWAP(int a, int b, Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
a
- The index first node.b
- The index of second node.cover_set
- The cover set in which the two nodes are.protected double compare(int p1, int p2, Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
p1
- The index of first node.p2
- The index of second node.cover_set
- The cover set containing the two given
nodes.protected void halfsort(Stack<weka.core.neighboursearch.CoverTree.d_node> cover_set)
cover_set
- The cover set to sort.protected boolean shell(double parent_query_dist, double child_parent_dist, double upper_bound)
parent_query_dist
- The distance of parent to the querychild_parent_dist
- The distance of child to the parent.upper_bound
- The distance to the query of the best kth
NN found so far.protected int descend(CoverTree.CoverTreeNode query, CoverTree.MyHeap upper_k, int current_scale, int max_scale, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set) throws Exception
query
- The query (in shape of a cover tree node, as we
are doing batch searching).upper_k
- Heap containing distances of best k-NNs found so
far.current_scale
- The current scale/level being looked at in
the tree.max_scale
- The max scale/level that has so far been looked
at.cover_sets
- The cover sets of tree nodes for each level of
our trees for.zero_set
- The set containing leaf nodes.Exception
- If there is some problem (in updating the
heap upper_k).protected void brute_nearest(int k, CoverTree.CoverTreeNode query, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set, CoverTree.MyHeap upper_k, Stack<NearestNeighbourSearch.NeighborList> results) throws Exception
k
- The k in kNN.query
- The query.zero_set
- The zero set on which the brute force NN search
is performed.upper_k
- The heap storing distances of k-NNs found during
the search.results
- The returned k-NNs.Exception
- If there is somem problem.protected void internal_batch_nearest_neighbor(int k, CoverTree.CoverTreeNode query_node, Stack<Stack<weka.core.neighboursearch.CoverTree.d_node>> cover_sets, Stack<weka.core.neighboursearch.CoverTree.d_node> zero_set, int current_scale, int max_scale, CoverTree.MyHeap upper_k, Stack<NearestNeighbourSearch.NeighborList> results) throws Exception
k
- The number of NNs to find.query_node
- The node of the query tree to start the search from.cover_sets
- The set of sets that contains internal
nodes that were found to be inside the query ball at previous scales/levels
(intially there would be just the root node at root level).zero_set
- The set that'll contain the leaf nodes that are found to
be inside the query ball.current_scale
- The level/scale to do the search from (this value
would be used to inspect the cover set in the provided set of cover sets).max_scale
- The max scale/level that has so far been inspected.upper_k
- The heap containing distances of the best k-NNs found so
far (initialized to Double.POSITIVE_INFINITY).results
- The list of returned k-NNs.Exception
- If there is some problem during the search.protected void batch_nearest_neighbor(int k, CoverTree.CoverTreeNode tree_root, CoverTree.CoverTreeNode query_root, Stack<NearestNeighbourSearch.NeighborList> results) throws Exception
k
- The number of k-NNs to find.tree_root
- The root of the cover tree on which k-NN search
is to be performed.query_root
- The root of the cover tree consisting of queries.results
- The list of returned k-NNs.Exception
- If there is some problem during the search.protected NearestNeighbourSearch.NeighborList findKNearest(Instance target, int k) throws Exception
target
- The query/test instance.k
- Number of k-NNs to find.Exception
- If there is some problem during the search
for k-NNs.public Instances kNearestNeighbours(Instance target, int k) throws Exception
kNearestNeighbours
in class NearestNeighbourSearch
target
- The instance for which k-NNs are required.k
- The number of k-NNs to find.Exception
- If there is some problem find the k-NNs.public Instance nearestNeighbour(Instance target) throws Exception
nearestNeighbour
in class NearestNeighbourSearch
target
- The instance for which NN is required.Exception
- If there is some problem finding the nearest
neighbour.public double[] getDistances() throws Exception
getDistances
in class NearestNeighbourSearch
Exception
- If the tree hasn't been built (by calling
setInstances()), or none of kNearestNeighbours() or
nearestNeighbour() has been called before.protected void checkMissing(Instances instances) throws Exception
instances
- the instances to checkException
- if missing values are encounteredpublic void setInstances(Instances instances) throws Exception
setInstances
in class NearestNeighbourSearch
instances
- The insts on which the Cover Tree is to be
built.Exception
- If some error occurs while
building the Cover Treepublic void update(Instance ins) throws Exception
update
in class NearestNeighbourSearch
ins
- The instance to add.Exception
- Alway throws this, as current
implementation doesn't allow addition of instances
after building.public void addInstanceInfo(Instance ins)
addInstanceInfo
in class NearestNeighbourSearch
ins
- The instance to add the information of. Usually this is
the test instance supplied to update the range of
attributes in the distance function.public void setDistanceFunction(DistanceFunction df) throws Exception
setDistanceFunction
in class NearestNeighbourSearch
df
- the distance function to useException
- if not EuclideanDistancepublic String baseTipText()
public double getBase()
public void setBase(double b)
b
- the new base;public double measureTreeSize()
public double measureNumLeaves()
public double measureMaxDepth()
public Enumeration enumerateMeasures()
enumerateMeasures
in interface AdditionalMeasureProducer
enumerateMeasures
in class NearestNeighbourSearch
public double getMeasure(String additionalMeasureName)
getMeasure
in interface AdditionalMeasureProducer
getMeasure
in class NearestNeighbourSearch
additionalMeasureName
- the name of the measure to query for
its valueIllegalArgumentException
- if the named measure is not supportedprotected static void print(String s)
s
- The string to print.protected static void println(String s)
s
- The string to print.protected static void print(Object o)
o
- The object to print.protected static void println(Object o)
o
- The object to print.protected static void print_space(int s)
s
- The number of space characters to print.protected static void print(int depth, CoverTree.CoverTreeNode top_node)
depth
- The depth of top_node.top_node
- The node to start printing from.public String getRevision()
getRevision
in interface RevisionHandler
public static void main(String[] args)
args
- The supplied command line arguments.Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.