## Windowed Memoization?

I assume you already know what memoization is and why we might want to use it.

In the context of reactive computations, memoization can improve performance a great deal, avoiding recalculation of data that has not changed. Most reactive paradigms, however, achieve only some trivial memoization – i.e. only for the last element, and only for functions fully lifted into the reactive model (shallow, not applicable to recursive or inner functions).

I am interested in richer memoization for reactive programming. The trivial example problem for memoization seems to be fibonacci sequence. Consider mapping it across a list of integers:

``` fib n = if (n < 2) then n else fib (n - 1) + fib (n - 2) map fib [7,6,5,4,5] ```

As written, this would be very expensive because we recompute fibonacci values very often – both vertically within each calculation of fib, and horizontally across the list of integers. For example, if we apply `map fib [7,6,5,4,5]` we’ll perform the same computation `fib 2` a total of 21 times, and `fib 3` a total 13 times.

One challenge, then, is to support memoization in both the horizontal and vertical directions.

Haskell developers have found elegant, pure ways to achieve that level of memoization. The common idea is to represent a lazy data structure that maps every element of the domain (potentially infinite) to an element in the range. Computations will slowly complete this data structure.

``` memoFib = map fib [0,1..] !! where fib 0 = 0 fib 1 = 1 fib n = memoFib (n - 1) + memoFib (n - 2) map memoFib [7,6,5,4,5] ```

This example is borrowed from the Haskell wiki page on memoization. We can do better – generalize, use a faster lookup structures, and separate the structure from the function being memoized. Packages for this include Data.MemoCombinators and Data.MemoTrie. A trie would be preferable to a simple infinite list because it better supports sparse computations and fast lookups.

Unfortunately, growing an infinite data structure in a long-running program causes very predictable problems – space leaks, when a particular domain element is never encountered again. For reactive programming, space leaks are unacceptable, and memory consumption should be predictable over time.

I must be able to GC results that haven’t been used recently, while preserving results that have been used recently.

Somehow I need windowed memoization.

For example, instead of memoizing the entire history, I memoize only results for the previous two elements in a list. When I finish mapping fib to the last element of [7,6,5,4,5], I’ve forgotten fib 7 and fib 6. But I’ve still computed fib 5 only once, since it is available within the window at every step.

I’m short on ideas for how to achieve this elegantly, purely, unobtrusively. I expect I could hack a decent unobtrusive solution with unsafePerformIO and state, if I’m careful about interactions with data-parallelism and anticipation. I might be able to create an obtrusive solution with a stateful memoization monad. If I don’t mind shallow memoization, I could probably lift explicit use of state right into the reactive model.

Ideas welcome.

This entry was posted in Language Design, Live Programming, Modularity, State. Bookmark the permalink.

### 8 Responses to Windowed Memoization?

1. Ross Angle says:

I’d hold memoization entries in a trie, keep a constant-sized list of such tries, and occasionally shift a new trie onto one end and an old trie off the other (to be reclaimed). I haven’t used Haskell much, but this seems like it should be a usable technique even for tail-recursive computations where the “state” is maintained as a parameter, and it should easily survive across monad operations just thanks to being in the bound functions’ lexical scope.

(Note that whenever an entry is looked up, the result should be added to the newest trie, even if it was found on an older trie. No use letting it fall off the end just when we need it!)

If I were to implement this kind of memoization on top of an existing RDP platform, I’d look for a external-state tuple space whose entries had limited lifetime. For fine-tuning, it might be nice to specify the lifetime duration as a probability distribution rather than a single length of time, and it might be nice to specify it as a function of current memory congestion, the entry’s own size, and even the entry value itself (since certain applications might be interested in the memoization of functions that explicitly bundle each result with an ad hoc appraisal of how much effort it took to compute).

Limited-lifetime tuple spaces might not take the place of pervasive memoization throughout the language, since we might want to memoize certain requests we’re making to obtain the tuple spaces in the first place. However, they’re probably a good tool for RDP-level metaprogramming and a good alternative to stuffing complex memoization preferences into an optimization hint system.

• dmbarbour says:

A list of `pure` tries used in existing memoization schema won’t work because I can’t tell whether a particular entry has been evaluated or not. I could use stateful tries, but then I must somehow export the results after weaving through a function. (I could go monadic, as I said, but it’s obtrusive.)

I’ve been considering use of a tuple-space state model, with simple expirations, as a basis for memoization and caching. This would be shallow – e.g. for fib [8,6,7,5,3,0,9] it isn’t clear that there’d ever be any entries for `fib 2` in the shared space. But shallow is okay for many roles.

A probability distribution for lifetime of caching and memoization does seem like a good idea. With a Poisson decay model the space cost would have a factor based on the logarithm of program lifetime – not constant, but at least under control – while improving support for long-running cyclic behaviors.

• Ross Angle says:

You’re right, it looks like existing memoization tools (MemoTrie and MemoCombinators?) are too joined to the idea of a `memo :: (a -> b) -> (a -> b)` function that takes advantage of lazy data structures in between, and this would need a more explicit `(a -> b) -> (a -> State [Map a b] b)` or something. Guess I’m just catching up to you there.

I think the naive approach to “export the results after weaving through a function” is to go monadic, since the type `a -> s -> (b, s)` accomplishes the same thing as `a -> State s b`. If we viewed this as `MyStateArrow s a b` instead, would that be less obtrusive or more? :-p

• dmbarbour says:

The difficulty with `a -> State s b` monad is that it becomes difficult to compose multiple memoized functions (`s` grows unwieldy), or pass them as arguments to other functions. The `memoFib` described earlier has problems, but not this one. Something closer to the ST monad would allow me to encapsulate state without exposing it in the type.

If we go with arrows, we could also do without any global state. I.e. we could have:

``` arrMemo :: (Memoable a) => (a -> b) -> MemA a b stepMemA :: MemA a b -> a -> (b, MemA a b) ```

With the output MemA capturing the memoized results in a pure way.

This is a feasible option. It might be a direction I end up pursuing. (Not that I’m desparate for memoization right this instant. It’s just been bugging me a bit.)

2. gasche says:

The reason why lazyness is a good fit here is because call-by-need (as opposed to call-by-name) evaluation implicitly performs a state mutation, memoizing a computation that is guaranteed to always return the correct result. This means that you have effectively hardcoded memoization into your operational model. It’s no wonder than it is then easy to use memoization in the language in an apparently non-side-effectful way, by just relying on the host capability. But this also mean that this feature is discontinuous: as long as what you want fits what’s hardcoded, life will be easy, and as soon as you go just a little more demanding, reality strikes back: memoization is effectful.
My angle of attack would therefore be to look for a more powerful language that can actually *explain* why this style of effects is safe (referentially transparent), that would be able to handle the mode of use of lazyness, but also richer things you want. In fact you’re looking for a system which can reason about local state — shared across function calls, which I’m not sure the ST monad, the first go-to local state housekeeper, can handle.

That said, for you particular problem of memory use, I think there may be simpler partial solutions. The reason why you have a “leak” is that the cache lifetime is the function lifetime: if your function is globally accessible, the cache is never released. But you could have different versions of the function with different lifetimes and distinct caches. Instead of `fib : Int -> Int` you could have say `fibFactory : () -> Int -> Int` which, when invoked, creates a memoized fibonacci. The user is then free to control memory usage by making sure she’s only building caches locally, as a resource with lexical lifetime. In your list example, one could decide to split the list in groups of K elements and compute each group with a local cache. But that would be absolutely useless here, if you’re only interested in peak memory usage, because of the caching pattern of the fibonacci function. And that’s really only a mitigated use of state, because if you want to delegate work to auxiliary function, you need to pass the local fib instance around.

• dmbarbour says:

“look for a more powerful language that can actually *explain* why this style of effects is safe” – I think you are right.

And I also don’t see ST helping much across iterations and delegations. I’ve been thinking about what a phased memo-monad might look like at the surface. One possibility was similar to ST:

``` type Memo w a -- abstract type Memoizer w a b -- abstract class (Eq a) => Memoable a where ... newMemoizer :: (Memoable a) => (a -> b) -> Memo (Memoizer w a b) getMemoized :: Memoizer w a b -> a -> Memo w b setPhaseMemory :: Int -> Memo w () markPhase :: Memo w () -- end of phase memoReset :: Memo w () -- clear all memory runMemo :: (forall w . Memo w a) -> a ```

But the difficulty is similar to ST – if I try to protect the memoizers (associate them with one Memo world) it becomes a bit difficult to reuse them. If I do reuse them, there are difficulties with concurrency, etc.. It’s also ugly and obtrusive.

Perhaps a stateful arrow would work better than a monad, here, since the memoized arrows could then be baked into the model, eliminating need to pass handles around (and loop would be accessible for recursion). I could control access to state so it can only be used for memoization.

I have always been assuming a `local` memoization, e.g. distinct lookup table for each point in the reactive graph that uses a memoized fib. I agree that creating a memoized fib for every K elements could limit memory growth and would hurt memoization. Yet, it might be good enough if K is large. Memoization is a probabilistic optimization anyway.

3. Thanks for raising these issues, David. I’m glad to see the discussion. My hunch agrees with gasche’s: (a) memoization in a pure+lazy setting comes out so well operationally thanks to the close match with the call-by-need operational model; and (b) a way forward is to open up this narrow lazy/call-by-need technique and explore the broader question of semantically benign side-effects. For instance, in addition to replacing thunks with evaluations, we can do the reverse, without semantic change. In other words, while lazy evaluation is a dandy technique for implementing purely denotative/functional languages, it also points down the road to further investigations leading to more lovely & powerful insights & techniques.

4. Hmm. I wonder if an exponentially decaying history might be a reasonable foundation for windowed memoization of pure operations.