Exponential decay, resulting in log-scale lists, is an excellent way to work with an unbounded amount of history in a bounded amount of space. I’ve described this in two previous articles. In my first article, I modeled logarithmic decay via a sequence of ring buffers: each ring would lose (or merge) one of its items rather than passing it forward. In my second article, I expressed a concern that ring buffers might lose information that occurs at a regular frequency, and modeled the decay probabilistically instead. The latter ‘improved’ form of exponential decay is certainly easier to implement, and could be generalized in interesting ways (such as time-based collapse). But the original design had nice properties too. For example, it was extremely predictable.
In this article, I describe an even simpler model:
- Choose a maximum number of entries for your history buffer.
- Whenever you reach the maximum, decimate your history.
- To decimate: Take groups of ten entries. From each group, kill one.
The word ‘decimate’ originally meant to kill one man in every ten to punish or subdue a group. Each decimation therefore leaves a 90% survival rate. Older entries have endured more decimations than recent ones, so repeatedly decimating our history will give us exponential decay. Operating on groups of ten keeps the operation simple, and ensures the decay process is relatively uniform and predictable. Of course, the particular choice of tenths is arbitrary and could be adjusted without loss of generality.
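The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names `decimate`, `DecayingHistory`, `choose_victim`, and `random_victim` are hypothetical, the history is assumed to be a plain list with the oldest entry first, and a trailing partial group is assumed to be left alone.

```python
import random

GROUP_SIZE = 10  # the "groups of ten" above; any group size would do


def decimate(history, choose_victim, group_size=GROUP_SIZE):
    """Return a copy of `history` with one entry dropped from each full group.

    `choose_victim(group)` returns the index within the group to drop.
    A trailing partial group is kept as-is (an assumption of this sketch).
    """
    survivors = []
    for start in range(0, len(history), group_size):
        group = history[start:start + group_size]
        if len(group) == group_size:
            victim = choose_victim(group)
            group = group[:victim] + group[victim + 1:]
        survivors.extend(group)
    return survivors


class DecayingHistory:
    """Bounded history, oldest entry first, decimated whenever it fills up."""

    def __init__(self, max_entries, choose_victim, group_size=GROUP_SIZE):
        if max_entries < group_size:
            # Without at least one full group, decimation would remove nothing.
            raise ValueError("max_entries must be at least group_size")
        self.max_entries = max_entries
        self.choose_victim = choose_victim
        self.group_size = group_size
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        if len(self.entries) >= self.max_entries:
            self.entries = decimate(
                self.entries, self.choose_victim, self.group_size)


def random_victim(group):
    """One possible policy: pick the entry to drop uniformly at random."""
    return random.randrange(len(group))
```

With `DecayingHistory(100, random_victim)`, the buffer drops to 90 entries each time it reaches 100, and entries stay in chronological order. The policy for picking a victim is passed in as a function, so the alternatives discussed below can be swapped in without touching the buffer.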
The decision, then, is how to choose one element from each group to destroy.
The probabilistic approach is to roll a ‘die!’.
The probabilistic approach can be made deterministic in a simple way: use a hash function on each group of entries to roll a die for that group. This sort of technique would be convenient when working with replicated histories, distributed databases, and so on.
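As a rough sketch of the hashed die roll, the following Python hashes the contents of each group and uses the result to pick the entry to drop. The function names are hypothetical, and it assumes every entry has a `repr` that is identical on every replica (true for integers and strings, not for objects with the default `repr`). SHA-256 is simply one convenient choice of hash.

```python
import hashlib


def hashed_victim(group):
    """Pick an index in `group` from a hash of the group's contents.

    Assumes repr(group) is the same byte-for-byte on every replica.
    """
    digest = hashlib.sha256(repr(group).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce to a valid index.
    return int.from_bytes(digest[:8], "big") % len(group)


def decimate_deterministically(history, group_size=10):
    """Drop one hash-selected entry from each full group of `history`."""
    survivors = []
    for start in range(0, len(history), group_size):
        group = history[start:start + group_size]  # a copy, safe to mutate
        if len(group) == group_size:
            del group[hashed_victim(group)]
        survivors.extend(group)
    return survivors
```

Two replicas that hold the same history and run `decimate_deterministically` will keep exactly the same entries, with no coordination and no shared random seed.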
A non-probabilistic model might heuristically select one entry to erase based on estimated loss of information. For example, if we’re erasing camera frames, we might favor deleting a frame that looks most similar to another frame in the group. Or if our history consists of a path of a robot, we might favor deleting entries where the robot was basically sitting still or moving in a straight line.
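For the robot path example, one possible heuristic (my assumption here, not the only reasonable one) is to measure how far each point bends the path: the area of the triangle formed by a point and its two neighbors is zero when the robot sat still or moved in a straight line. The sketch below assumes entries are `(x, y)` tuples, that a group has at least three points, and that the first and last point of a group are always kept. The function name is hypothetical.

```python
def least_informative_index(group):
    """Index of the interior point whose removal distorts the path least.

    Scores each interior point by twice the area of the triangle it forms
    with its neighbors; a score of zero means collinear or stationary.
    """
    def doubled_triangle_area(a, b, c):
        return abs((b[0] - a[0]) * (c[1] - a[1])
                   - (c[0] - a[0]) * (b[1] - a[1]))

    interior = range(1, len(group) - 1)
    return min(
        interior,
        key=lambda i: doubled_triangle_area(group[i - 1], group[i], group[i + 1]),
    )


# A zig-zag path with one straight stretch through (4,0), (5,0), (6,0).
path = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0),
        (5, 0), (6, 0), (7, 1), (8, 0), (9, 1)]
print(least_informative_index(path))  # prints 5, the middle of the straight stretch
```

A camera-frame version would follow the same shape, with the triangle area replaced by some frame similarity measure.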
Of course, merging is also a fine idea. We could choose two items to merge, ideally in an associative manner such that
merge(a,merge(b,c)) = merge(merge(a,b),c); this can be done with weighted averages, keeping counts and sums and sums-of-squares, and generally developing a monoidal type for history entries.
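A small sketch of such a monoidal entry type in Python, keeping a count, a sum, and a sum of squares. The names `Summary` and `merge_pair` are hypothetical, and entries are assumed to be single numeric samples. Associativity is exact for integer samples; with floats it holds only up to rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Mergeable summary of numeric samples; Summary() is the identity."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @staticmethod
    def of(value):
        """Summary of a single sample."""
        return Summary(1, value, value * value)

    def merge(self, other):
        """Componentwise addition, so merging is associative."""
        return Summary(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    @property
    def mean(self):
        return self.total / self.count

    @property
    def variance(self):
        # Population variance, E[x^2] - E[x]^2. Simple, though it can lose
        # precision when the variance is tiny relative to the mean.
        return self.total_sq / self.count - self.mean ** 2


def merge_pair(group, i):
    """Replace entries i and i+1 of `group` with their merge."""
    return group[:i] + [group[i].merge(group[i + 1])] + group[i + 2:]
```

Merging a pair shrinks a group of ten to nine entries, just as killing one would, but the merged entry still reports the mean and variance of everything it absorbed.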
I developed this particular approach to logarithmic history to support Wikilon. Logarithmic history offers a happy medium between keeping a complete history and keeping only the most recent version of each page/word. It does have a few disadvantages. For example, you can’t get a “permanent link” to a historical version of a word. But you can at least browse historical versions of the wiki/dictionary, get a story for how a system evolved, and access old versions of ideas. And the long-term space requirements are very predictable.