Q-Digest datastructure.
Answers approximate quantile queries: actual rank of the result of query(q)
is in q-eps .. q+eps, where eps = log(sigma)/compressionFactor
and log(sigma) is ceiling of binary log of the largest value inserted,
i.e. height of the tree.
Two Q-Digests can be joined (see
unionOf(QDigest, QDigest)
).
Source:
N.Shrivastava, C.Buragohain, D.Agrawal
Medians and Beyond: New Aggregation Techniques for Sensor Networks
http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
This is a slightly modified version.
There is a small problem with the compression algorithm in the paper,
see https://plus.google.com/u/0/109909935680879695595/posts/768ZZ9Euqz6
So we use a different algorithm here:
- When an item is inserted, we compress along the path to root from the item's leaf
- When the structure becomes too large (above the theoretical bound), or
at "too destructive" operations (e.g. union or rebuild) we compress fully
Note that the accuracy of the structure does NOT suffer if "property 2"
from the paper is violated (in fact, restoring property 2 at any node
decreases accuracy).
So we can say that we preserve the paper's accuracy and memory consumption claims.