The “vat” is a delightfully simple concept, which I borrow from E language. A vat is essentially a thread with an event loop, but with a few relevant properties:
- Vats do not share state.
- Vats are internally deterministic, up to input.
- Vat events are first-class, i.e. objects or closures.
- Vats communicate by causing events in other vats, via object references.
- Vat events are non-blocking.
- Events from one vat to another maintain order.
Vats offer coarse-grained concurrency, i.e. a single vat can model thousands of objects or actors. The ordering property between vats is weak (since it says nothing about how events are interleaved) but is sufficient for a lot of useful reasoning.
I use vats as a substrate model for implementing RDP.
Basic Vat Model
In Haskell, I modeled vats with a few typeclasses:
-- A Vat can schedule future events, or make babies. class (Monad v) => Vat v where eventually :: v () -> v () -- schedule an event spawn :: v () -> v () -- fork a new vat with initial task -- VatCB creates a callable that, when called, creates -- an event in the source vat. class (Vat v, Callable v cb) => VatCB v cb where method :: (a -> v ()) -> v (a -> cb) -- A callable type should be guaranteed to never block. class (Monad m, Monoid cb) => Callable m cb where call :: cb -> m () -- a 'preferred' callable type, to avoid inference issues. type family Method (v :: * -> *)
The VatCB and Callable types serve to model the creation and execution of events. A vat can also schedule its own events. There are several design motivations for using a separate callable type cb:
- The type cb can be callable from multiple contexts, not just other vats. For example, RDPIO methods can be called from IO threads, which is useful when adapting to a blocking IO model.
- Methods often need callbacks with results, and cb makes a clean callback type (compared to `v ()`) – i.e. the thread performing the callback is not at risk of running arbitrary code.
- Type constructors for cb can enforce that it is non-blocking. A little developer discipline can ensure it is non-divergent.
- The ‘call’ operation can be strict in the argument to cb, which provides a little more performance isolation between vats.
My RDP vat type is named RDPIO. It represents a particular implementation of the vat model, has several more queues, and provides MonadIO to integrate with the wide variety of existing Haskell libraries and FFI.
Timed and Clocked Vats
Useful features for real-time programming – even the fuzzy ‘soft’ real-time that I’m aiming for – are ability to schedule events for a particular time, and ability to reason about fair scheduling. I acheive these features by extending the basic Vat concept with Timed and Clocked concepts.
-- TimedVat can schedule events on a logical clock. -- Special note! 'eventually' for a TimedVat will schedule an -- event for the current logical instant. class (HasClock v, AffineSpace (Time v)) => TimedVat v where atTime :: Time v -> v () -> v () atTPlus :: Diff (Time v) -> v () -> v () atTPlus dt op = getTime >>= \ t -> atTime (t .+^ dt) op class (Monad m) => HasClock m where type Time m getTime :: m (Time m) -- ClockedVat will guarantee 'drift' relative to a clock. class (TimedVat v) => ClockedVat v where setMaxDrift :: Diff (Time v) -> v ()
Every Timed Vat has a logical time. A lot of events might occur at a given logical instant, without time advancing at all. Everything that goes into the ‘eventually’ queue is now part of the same logical instant.
The Clocked Vat extends the Timed Vat with the idea that there is a relationship between logical time and some external ‘clock’:
- The logical time must not advance past the shared clock time.
- The logical time must not fall more than the maximum ‘drift’ behind the shared clock time.
If the shared clock happens to be a wall clock, and drift is guaranteed, then one could say each individual vat is real-time (but not the system as a whole, since drift doesn’t cover latency between vats).
My RDPIO vats are, unfortunately, not true real-time. So, rather than guaranteeing the drift value, I simply stop the shared clock whenever a vat falls too far behind. This forces the fast vats to wait for the slow ones. More importantly, it forces the Haskell scheduler to provide a greater slice of time to the vats that fall behind, thus enabling them to catch up.
It is by this basis that I am willing to call RDPIO ‘soft’ real-time: Clocked Vat provides adequate control over vat scheduling to ensure that no vat is left behind. So long as each vat is able to keep up with its workload independently, and assuming the CPU is not overloaded, then each vat will keep up individually.
For distributed system modeling, waiting between vats is unacceptable. Instead, I plan to model different groups of RDPIO vats as having different clocks. I call the grouping ‘host’, and I tag RDPIO types with a phantom type to constrain certain interactions between vats from different hosts. Most applications only need one global host, of course, and I’m always tempted to call it ‘Ghost’, which seems an appropriate name for a phantom type.
Snapshot Consistency and Glitch Freedom
A ‘glitch’ refers to a temporary inconsistency between descriptions of a resource used in a computation at a given instant. Taming glitches is an important requirement of any semi-decent reactive system, and it’s sad to say that a lot of reactive systems – especially the hand-written observer patterns – fail to even be ‘semi-decent’ by this criterion.
For RDP, I have a modest requirement of glitch-freedom at the local scale. That is, if developers can prove two identifiers in an RDP expression to describe the same resource, without depending on any whole-program knowledge, then they must be consistent. RDP allows glitches at a larger scale because trying to eradicate them entirely would hinder achieving performance and security goals.
RDP provides three layers of protection against inconsistency:
- Developers are encouraged to model logical delays as part of their RDP behavior expressions. If the modeled delays accurately reflect the real communication and computation costs, the resulting RDP behavior will be consistent (modulo anticipation and the occasional scheduling or network hiccup). Delay absorbs glitches, e.g. providing extra time for straggling updates by turning them into logically on-time updates.
- Using the ‘logical instants’ within a Timed Vat, RDP state between each pair of vats can exhibit snapshot consistency. This is described in greater detail below.
- RDP is eventually consistent, and thus will recover from glitches after they occur – i.e. as opposed to exacerbating and multiplying them like you might expect from an imperative model. Even RDP state models can retroactively corrected, just keeping a little extra history. Eventual consistency is a safety net; it is much preferable to avoid glitches than recover from them.
Snapshot consistency between pairs of vats is weaker than global glitch-freedom. For example, a situation such as: “Alice’s view of Charlie is inconsistent with Alice’s view of Bob’s view of Charlie” is allowed because Alice’s snapshot of Charlie is from a different time than Alice’s snapshot of Bob’s snapshot of Charlie. However, snapshot consistency – by itself – is sufficient to guarantee my modest glitch-freedom requirements for RDP. If you reason locally, from Alice – i.e. without whole system knowledge about how Bob is suppose to act – then you could not say that these descriptions are for the same resource.
The other two layers of protection go beyond local reasoning and more towards issues of whole-system wall-clock consistency.
Snapshot consistency is achieved by buffering updates into atomic ‘snapshots’:
- At the beginning of each logical instant, the vat atomically grabs the takes of available RDP updates.
- At the end of each logical instant, the vat atomically delivers a batch of RDP updates to the other vat.
Since these ‘batches’ are delivered all at once, each vat sees only the end-of-instant states from the other vats. With RDPIO, I also force vats to ‘settle’ any local RDP relationships, such that they are internally consistent at the end of each instant. This ensures that each vat is an ‘island of consistency’ from the perspective of every other vat.
In a distributed system, this same sort of update-batching can be achieved easily enough, just across a serialized connection. All it takes is a clear indicator for ‘end of batch’.
As an additional protection for consistency, the Clocked Vat allows developers to guarantee the logical delay between vats: simply model delay equal to allowed maximum drift every time a behavior hops between vats. Using such a technique, it is feasible to achieve full determinism for RDP behavior within a given RDPIO host… though, in practice, this might be a little on the conservative side, trading too much latency for too few consistency benefits.
Vat Performance Characteristics
Vats potentially have very nice performance characteristics:
- There is no direct waiting between vats – no starvation or deadlock. There may be some contention on the central shared clock, but that is a relatively rare interaction.
- There is a lot of implicit batching, which reduces relative concurrency overhead. RDP updates are always batch-processed, and VatCB events might be batch-processed if several events are received before a new instant starts.
- The Clocked Vat ‘drift’ is much weaker than lockstep, so allows a high level of parallelism where different vats process different numbers of instants in any given period.
- Vats have a high level of performance isolation. If each vat, individually, can keep up with its schedule, then so can the full set of vats unless the CPU is overloaded.
- It is easy to isolate performance problems. Clocked Vat allows for one vat to hold up the whole host, but the consequence is that said vat also sticks out like a sore thumb.
Unfortunately, RDP won’t take much advantage. Most RDP computations are lazy transforms of signals, which means most of the real computation is being performed at the occasional sampling point – network, state, video buffers, etc.. I’m sure you can get a few vats on these things, but this sort of non-homogeneous tasking won’t reliably leverage CPU resources. And RDP is designed to ‘settle’ quickly and scale, so if computation is proportional to model size at a large scale, then something is wrong with the model (maybe not enough state for stability).
An idiom I’ve found to achieve a useful level of parallelism, but in a controlled way, is to use ‘par’ sparks to spin off computations that you know or anticipate to be necessary at a near logical future instant. E.g. something like:
(eval args) `par` atTPlus 0.01 (action args).
There are reasons other than performance, of course, to use multiple vats. The performance isolation, lack of state sharing, and local consistency properties can help developers modularize and reason about an application. My own main reason is proof-of-concept for RDP’s scalability and concurrency.