I want more expressive task parallelism in context of purely functional computation than is provided by Haskell’s Control.Parallel (`par` and `pseq`). So I’ve been exploring approaches analogous to Haskell’s ST monad, where we guarantee determinism but support explicit mutation internally.
Kahn process networks are a deterministic model for distributed computation. Modulo type-safety, it isn’t difficult to support Kahn Process Networks via monad:
-- THE MONADIC API read :: ReadPort -> KPN m m write :: WritePort -> m -> KPN m () wire :: ReadPort -> WritePort -> (m -> m) -> KPN m () fork :: ProcName -> KPN m () -> KPN m () -- THE ONTOLOGY type KPN m a -- all messages have type m instance Monad (KPN m) type Port = Outer PortName | Inner (ProcName * PortName) type ReadPort = ReadPort Port -- a port we read from type WritePort = WritePort Port -- a port we write to type ProcName = String -- second-class process identifier type PortName = String -- second-class port identifier -- AN EVALUATOR runKPN :: (forall m . KPN m a) -> Maybe a
Read will wait for a message if none is available, and removes the message from the port. Write will buffer the message asynchronously and return immediately. KPNs are deterministic because there is no option to wait on multiple ports for the “first” read. The special features for KPNs are wire and fork. With wire we permanently attach a read port to a write port, allowing data to move implicitly and efficiently. With fork, we can attach a confined child behavior that we can compute independently.
Note: We could extend this API with bounded write buffers to support strong guarantees about memory consumption. We also could introduce logical timestamps for reactivity.
The `runKPN` evaluator in this case assumes a closed KPN for our final computation. It will either return a deterministic value – Nothing if we halt on a read. A naive reference implementation of runKPN could evaluate child processes and wires one small step at a time, while an accelerated or compiler-intrinsic implementation could leverage threads and mutable buffers under the hood.
As an alternative to `runKPN` we could have a runner that supports interactive reads and writes with a long-lived KPN computation in the background.
-- ANOTHER EVALUATOR type BGKPN m = Request m a -> IO (BGKPN m, a) data Request m a = Write :: PortName -> m -> Request m () Read :: PortName -> Request m (Maybe m) bgKPN :: KPN m () -> IO (BGKPN m)
This would allow us to monotonically read and write to the running KPN. If we had linear or uniqueness types, we could use those instead of IO. (It’s also feasible to use ST instead of IO.)
Unfortunately, integration of this KPN monad with the type system leaves a lot to be desired. For example:
- External ports should be visible in the KPN type.
- Ports should have heterogeneous message types.
- After wiring two ports, typefully forbid further use.
I haven’t found a good way to represent these properties. Some variant of effect types might be adapted, e.g. with an effect for each port, and perhaps some indexed types so we can hide ports upon wire.
An alternative to KPNs is to develop a monad around single-assignment variables and promise pipelining. Channels can be modeled by cleverly leaving unassigned variables in ‘hole’ positions, to be written by a producer thread. This is simpler to support in the type system, similar to Haskell’s ST monad:
-- THE MONADIC API new :: P v (v a) read :: v a -> P v a write :: a -> v a -> P v () fork :: P v () -> P v () type P v a -- promises monad, variable type 'v' instance Monad (P v) runP :: (forall v . P v a) -> a
In this monad, `read` waits for a variable to be written but does not modify the variable. Instead `write` diverges if we attempt to write a variable more than once. Consequently, our `runP` must wait on all forked threads to complete, to ensure that we don’t have any duplicate writes. If `runP` returns at all, the result is deterministic.
We could easily implement monad P in Haskell above MVars. So it’s easy to implement and easy to type in Haskell. OTOH, P would be less friendly in context of distribution or memoization.