Partial Evaluation Process Networks

In conventional functional programming, we provide all inputs to a function before we can access any outputs. This makes for convenient reasoning about program behavior, but it also restricts expressiveness and potential for concurrency. An intriguing alternative is to make partial-evaluation first-class and accessible in the language, enabling rich forms of function composition and concurrent interactions while preserving functional semantics.

Trivial Example

As a trivial example, assume we have two functions, f(in X, out Y, out Z) and g(out X, in Y, out Z').  Here I’m using out-args instead of return values, but we could conventionally type these as f : X -> (Y,Z) and g : Y -> (X,Z').

Assuming a suitable syntax and sufficient knowledge about data dependencies, it is feasible to construct a concurrent computation that ties the output of g as input to f and vice versa.  Consider the following:

f(in x, out y, out z) {
   y <- x + 3
   z <- y * 6
}
g(out x, in y, out z) {
  z <- (y * 2)
  x <- 4
}
p(out z, out z') {
    vars x, y
    f(x, y, z)
    g(x, y, z')
}

Program p declares second-class single-assignment variables x and y (these are not values), then evaluates f and g concurrently (based on data dependencies). Evaluation of g will immediately produce x=4, then f will consume x and produce y=7 and z=42, then g will consume y and produce z'=14. The final results, z and z', are visible to the caller of p.

Although trivial, this example exhibits concurrency: in p, neither f nor g can evaluate independently; each relies on output from the other. Without access to concurrent evaluation, we would be forced to inline f and g into p, then reorder assignments.

In general, a compiler can implement correct concurrent computations expressed in terms of concurrent partial evaluations by unraveling the components and reordering assignments. However, this does require access to the implementation details.  Alternatively, if separate compilation is required, a compiler could use semaphores and threads, so f and g would evaluate in separate threads and wait on input variables.
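As a rough sketch of the threads-and-semaphores strategy, the trivial example can be simulated in Python. The `SAVar` helper is a hypothetical stand-in for single-assignment variables: writes happen once, and reads block until a value arrives.

```python
# Sketch only: SAVar is an illustrative single-assignment variable,
# not from any real library. f and g run in separate threads and
# block on their input variables, as described above.
import threading

class SAVar:
    """Single-assignment variable: one write; reads block until written."""
    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def write(self, value):
        assert not self._event.is_set(), "single-assignment violated"
        self._value = value
        self._event.set()

    def read(self):
        self._event.wait()
        return self._value

def f(x, y, z):          # f(in x, out y, out z)
    v = x.read()
    y.write(v + 3)
    z.write((v + 3) * 6)

def g(x, y, z):          # g(out x, in y, out z)
    x.write(4)           # available immediately, unblocking f
    z.write(y.read() * 2)

def p():                 # p(out z, out z')
    x, y, z, z2 = SAVar(), SAVar(), SAVar(), SAVar()
    threads = [threading.Thread(target=f, args=(x, y, z)),
               threading.Thread(target=g, args=(x, y, z2))]
    for t in threads: t.start()
    for t in threads: t.join()
    return z.read(), z2.read()

print(p())  # (42, 14)
```

Despite the thread scheduler, the result is deterministic: every read blocks until its unique writer has fired.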

The Oz language uses this idea as a foundation for deterministic concurrency, albeit using much richer first-class unification variables and explicit threads. What I propose is a constrained form of what Oz does, where we structurally and typefully guarantee that output variables are assigned at most once and that all required variables are eventually assigned.

I’m still experimenting with alternative syntax, e.g. based on keyword parameters. I doubt that declared variables will be an efficient way to express these computations, especially after I introduce conditional interactions (below).

Session Types for Concurrent Interaction

Relying on deep knowledge about data dependencies is terrible for modularity. In the trivial example, a programmer could break program p by making a change to g that does not affect its interface. Thus, we really need a more precise interface.

Session types can be adapted to this problem. Instead of pi-calculus ‘channels’, we can assume a set of named single-assignment parameters (which may be input or output), each of which is recorded in our type. Thus, for program g from our trivial example, we might use a session type of form x!int . y?int . z!int to record that x is available before y is required as an input. A session may safely deliver outputs earlier, and demand inputs later, than its type declares, so this is a subtype of y?int . x!int . z!int.

Conventional functions with named parameters and results can be modeled using the general form { a1 : ta1, ..., aN : taN } -> { r1 : tr1, ..., rM : trM } with names a1..aN, r1..rM and types ta1..taN, tr1..trM. With session types, this can always be modeled as a1?ta1 . ... aN?taN . r1!tr1 . ... rM!trM. This will always be a supertype of interaction with concurrent partial evaluation.

Session types with just sequencing (the . separator) are not very precise in representing a data-dependency graph. Input dependencies for an output from a function may generally form a directed acyclic graph. For example, consider a dependency graph with three inputs a1,a2,a3 and three outputs r1,r2,r3 where r1 depends on a1,a2, r2 depends on a1,a3, and r3 depends a2,a3.
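For concreteness, here is that three-input, three-output dependency graph as plain data, together with a small illustrative helper that computes which outputs become available given the inputs supplied so far — precisely the information that a single sequential ordering cannot capture:

```python
# The dependency graph from the text: each output maps to the set of
# inputs it requires. Names (deps, ready_outputs) are illustrative.
deps = {
    'r1': {'a1', 'a2'},
    'r2': {'a1', 'a3'},
    'r3': {'a2', 'a3'},
}

def ready_outputs(available_inputs):
    """Outputs whose required inputs are all available."""
    return sorted(r for r, needed in deps.items()
                  if needed <= set(available_inputs))

print(ready_outputs(['a1', 'a2']))        # ['r1']
print(ready_outputs(['a1', 'a2', 'a3']))  # ['r1', 'r2', 'r3']
```

No linear session type a1?… . a2?… . r1!… can express that r1 is ready before a3 arrives while r2 and r3 are not; the graph form does.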

To represent a dependency graph, the best option is a graph data structure. Graphs are rather awkward to represent using sequential text, but there certainly are ways to do so (e.g. using labels). Session types are already very ugly and noisy, IMO. Developing an acceptable syntax is left as an exercise to the language designer.

Conditional Interactions

A language can easily enforce that all ‘out’ parameters from a function call are always written. But an intriguing possibility exists: we can conditionally decide which output parameters are written or expected based on prior inputs or outputs. This would allow us to model flexible, dynamic ‘sessions’ while preserving a purely functional semantics, and it can be achieved in a type-safe manner.

In session type systems, we distinguish ‘external choice’ (A & B) vs ‘internal choice’ (A + B). An external choice is one imposed on us by the environment, while an internal choice is one we impose on the environment. There exists a duality based on perspective (am I the environment?). But roughly, we might model & as an input boolean, and + as an output boolean, while A and B are entire session types. Essentially, we’re deciding on entire interactions using a very limited form of ‘dependent type’.

For scalability and extensibility, labeled choices are usually preferred, e.g. &{ add: ?x:int . ?y:int . !r:int , negate: ?x:int.!r:int }.

In adaptation to functional programming, we also need a clear identifier for the label. Thus, we may use a representation such as method&{ add: ?x:int . ?y:int . !r:int , negate: ?x:int.!r:int }. Then perhaps case(method) will allow us to access the labeled choice of interactions like add vs negate, while method.add.x gives us access to the x parameter under add. Access to fine-grained parameters might then be restricted by the type system to the branch selected via case(method).

As a reminder, semantically all of these interaction variables are distinct input or output parameters. Logically, method.add.x is a distinct parameter from method.negate.x. Because of type safety, the compiler could allocate a space for only one integer then share it between these two choices (like a C union). But they’re still logically distinct.
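A rough sketch of the add/negate interaction in plain Python, treating the session as a dictionary of parameters (all names here are illustrative, not a proposed syntax):

```python
# The 'method' label plays the role of the choice; only the parameters
# under the chosen label logically exist. As the text notes, storage
# for method.add.x and method.negate.x could be shared like a C union.
def calculator(session):
    label = session['method']
    if label == 'add':                 # uses method.add.x, method.add.y
        return {'r': session['x'] + session['y']}
    elif label == 'negate':            # uses method.negate.x only
        return {'r': -session['x']}
    raise ValueError(label)

print(calculator({'method': 'add', 'x': 2, 'y': 3}))   # {'r': 5}
print(calculator({'method': 'negate', 'x': 7}))        # {'r': -7}
```

The type system's job, in the proposal above, is to statically guarantee each branch touches only its own parameters.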

With conditional interactions, functions will usually touch only a small subset of their parameters based on the case(method). The syntax based on single-assignment variables, used in the trivial example, is infeasible. We’ll instead require good support for keyword parameters.

Intriguingly, types for conditional interactions will fully subsume conventional types for object-oriented programming interfaces. They further can represent and enforce richly typed multi-step protocols. With support for conditional interactions during partial evaluation, objects are again the poor man’s closure. Conditional interactions hugely increase expressiveness.

Session Types as Data Types

It is feasible to understand structured data types – records and variants – as simply ‘session types’ that are purely output (or purely input). Intuitively, this makes a lot of sense: access to data is essentially a ‘session’ without any interaction, no input requirements. A record is a simple sequence of labeled output data types, and a variant is an output conditional interaction where all cases are pure output data sessions.

Recursive Session Types

Recursive sessions are quite feasible, via conditional interactions, and would model recursive arbitrary-length interactions. In context of functional programming, recursive sessions will form an arbitrary ‘tree’ of single-assignment input-output parameters.

Recursive data structures can be modeled as recursive pure-output data types. Like type Tree a = +{ leaf , node: val!a . left!Tree a . right!Tree a }.

Sequences and Channels

In a prior article, I described some ideas for a statically allocated functional language. General recursive data or session types are mostly incompatible with this goal. However, use of ‘sequences’ can still be a pretty good fit due to stream fusion, the ability to process a sequence in-place.

Sequences can be understood as a subset of recursive session types, where recursion is permitted but the recursive type is permitted to appear at most once in a case. Thus, a tree is not a sequence, but a list is a valid sequence: type List a = + { end, cons: hd!a . tl!List a }.

In context of partial evaluation with functional programming, we can begin to output this list before we’ve finished computing it. Another function might then begin to process this list before it has completed. This gives a simple basis for long-lived communications between functional processes, reminiscent of Kahn Process Network channels.
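Python generators give a rough analogy for this incremental output: the producer emits list elements before the whole list exists, and the consumer processes each element as it arrives, much like a Kahn process network channel.

```python
# Illustrative sketch: a lazily produced 'list' consumed incrementally.
def producer(n):
    for i in range(n):
        yield i * i          # each element is output before the list is done

def consumer(stream):
    total = 0
    for v in stream:
        total += v           # processed as soon as it is produced
    return total

print(consumer(producer(5)))  # 0 + 1 + 4 + 9 + 16 = 30
```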

Intriguingly, with session types we aren’t restricted to sequences of pure data. We can very easily model sequences of interactive sessions.

Interactions as Effects

Effects can generally be understood as interactions between a computation and its environment. That is, early outputs are used to determine or influence future inputs. Between conditional interactions and potential use of sequences as long-lived channels, we can model a lot of effects with this approach.

In particular, potential support for lists whose elements are conditional interactions (rather than pure data) could feasibly cover a richly expressive ‘effects’ model. However, I’m still determining relative expressiveness compared to algebraic effects and handlers.

Why Functions?

I’ve gone pretty deep into session types. Why not favor a process calculus?

Purely functional programming has some very nice properties: determinism and locality for easy comprehension and testing, we can generally abstract the program without a separate ‘mode’ of computation like templates, and termination is very natural (all outputs are computed). These nice properties are preserved even if we evaluate just functions a little at a time, or evaluate only a subset of the function’s outputs.

Meanwhile, conventional purely functional programming has proven awkward for modeling effects or supporting stream fusion and pipelining. Leveraging the concurrency implicit to precise dataflow analysis and partial evaluation can help cover these weaknesses. These relative benefits would be diminished in context of a process calculus that already does a pretty good job at concurrency.

Posted in Language Design | 2 Comments

Static Resource Programming Language

I am interested in static allocation. Beyond benefits for resource control, hardware description, or embedded systems, static allocation is convenient from a UX perspective. In contrast to a stack or heap, variables from a static allocation have a static existence, amenable to stable layout on a screen or virtual environment. Direct manipulation is also feasible insofar as it fits within the language’s effects-model.

I am also interested in static processing guarantees. Beyond benefits for real-time systems, static processing is convenient from a UX perspective. Compared to loops that wait indefinitely for input, or process indefinite input streams, a guaranteed termination provides a clean semantic opportunity for user interaction: administrative interrupts, user events, direct manipulation of state, or live coding of process behavior.


Posted in Language Design | 5 Comments

Purely Functional Ephemeron Tables

Modulo challenges regarding heterogeneous types and garbage collection, it is not difficult to model variable systems from purely functional languages: variable references can be represented by integers, variable states using a table mapping from integers to values. If we restrict or avoid copying (perhaps using substructural or uniqueness types) we can further support in-place mutation.
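A minimal sketch of that variable system in Python, kept purely functional by returning a fresh table from every operation (helper names are illustrative):

```python
# References are integers; state is an immutable table from ints to values.
def new_var(table, value):
    ref = len(table)                 # fresh integer reference
    return {**table, ref: value}, ref

def read_var(table, ref):
    return table[ref]

def write_var(table, ref, value):
    return {**table, ref: value}     # functional update, no mutation

t0 = {}
t1, r = new_var(t0, 10)
t2 = write_var(t1, r, 11)
print(read_var(t1, r), read_var(t2, r))  # 10 11  (old table unchanged)
```

With uniqueness or substructural types ruling out access to t1 after the update, a compiler could safely perform the write in place.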

I’m willing to accept specialized type extensions to safely interact with such tables. There’s a lot of relevant research on heterogeneous or dependent types that I won’t reiterate here. But this leaves the challenge of garbage collection: ideally, a runtime should be able to remove entries from a table that are no longer referenced.

A compiler could easily support special opaque tables with a limited API, such that we cannot observe removal of elements. However, this still leaves the possibility that we construct identical tables with a common origin. Under referential transparency, constructions of the same table should support the same variable reference values. To support garbage collection, we must therefore sacrifice either referential transparency OR our freedom to use values in arbitrary ways.

Reducing our freedom to use values in arbitrary ways is reminiscent of substructural type systems. However, the separation of variable references and variable state tables requires some form of divided or associative substructural type that I haven’t encountered before, and that I’m uncertain how to formalize.

I could easily develop some ad-hoc static safety analysis that covers common use cases for ephemeron tables. But I’d be very interested if anyone has a good idea for formalizing these divided substructural types in general.

Posted in Language Design | Leave a comment

Streaming Language Rewrite Processing

Streaming Language Rewrite Processing (SLRP) is a low-level computing model that is essentially characterized by limited-memory processors rewriting a much larger input stream. Computation under this condition requires multiple passes, but is highly amenable to pipelining and partitioning, and thus parallelism. SLRP processors may have side channels to external devices that we drive through ‘effectful’ rewrite rules.

This article describes SLRP, its strengths and weaknesses, and methods to mitigate those weaknesses.

TLDR: SLRP is a very promising alternative to the von Neumann architecture. SLRP offers advantages for security, concurrency, distribution, and scalability. However, SLRP performs poorly for data plumbing and loops because there is no pass-by-reference within the stream. This weakness can be mitigated by effects, shunting a region of the stream to a remote device and keeping a small placeholder – a reference.

SLRP Background

The essential requirement for SLRP is a rewrite language that can be mechanically implemented under the constraint of a fixed-memory processor.

Fixed memory is obviously a non-issue for working with fixed-width values, such as natural numbers in range 0..999. A fixed-memory processor can rewrite `*(6)(7)` to `(42)` if it wants, no problem at all if number of digits is limited. In practice we might favor 32-bit or 64-bit binary encodings of numbers, but it’s the same basic idea.

The challenge is dealing with variable-width values, including first-class functions.

For variable-width values, I assume we can detect the ‘end’ of a value either by counting delimiters or by having a prefix that indicates size then counting bytes. Depth or size is still bounded in these cases, but the bound may be absurdly large, like 2^64.

Mechanically, here’s what we can do: we can erase a variable-width value, we can write a variable-width value to output, we can write a prefix before writing a variable-width value to output (to be handled by a later pass), we can ferry a fixed-width buffer of code or data across a variable-width value, and we have a finite state machine that reminds us what to do when we’re done skipping a variable-width value. That’s it.

Below, I implement a Turing complete concatenative combinatory logic via stream rewriting under SLRP constraints. The article assumes at least a distant “I foggily remember reading about it years ago” familiarity with those subjects and either Church encodings or Mogensen-Scott encodings. If you don’t know about these things, or feel you need a refresher, try Wikipedia.

SLRP Machine Code

A usable SLRP machine code is valuable as a proof of concept and a concrete foundation for discussion. The machine language I develop below is based on a concatenative combinatory logic. I’ll start with a semantic foundation of my Awelon language:

a[A][B] == [B]A     (apply)
b[A][B] == [A[B]]   (bind)
c[A]    == [A][A]   (copy)
d[A]    ==          (drop)

The SLRP processor will observe `a[A][B]` from left to right. The essential challenge for SLRP is that our processor has limited memory, yet our operands are unbounded. Hence, given a huge operand, our processor will only see the front end like `a[A...`, and must achieve useful progress within that constraint. Nonetheless, we can implement this in bounded memory if we leverage several auxiliary rewrites.

-- language extensions
w[A][B] == [B][A]   (swap)
q[A]    == [[A]]    (quote)
o[A][B] == [AB]     (compose)*
i[A]    == A        (inline)
\[      ==          (join)

-- mechanically trivial rewrites
 \[  == 
o[A] == [A\
o[A\ == [A\o
q[A] == [[A]]
q[A\ == [[A\\q
d[A] ==
d[A\ == d
i[A] == A
i[A\ == Ai
b    == oa[q]

-- `xyz` fits processor memory
a[xyz][B] == [B]xyz
a[xyz][B\ == [B\a[xyz]
w[xyz][B] == [B][xyz]
w[xyz][B\ == [B\w[xyz]
c[xyz]    == [xyz][xyz]

-- divide-and-conquer tactics
a[XY    == a[X]a[Y
a[[X]Y  == iw[X][a[q]w]a[Y
a[[X\Y  == iw[X][a[o]w]a[Y

w[XY    == a[o]w[X]w[Y
w[[X]Y  == iw[X][a[oq]w]w[Y
w[[X\Y  == iw[X][a[ok]w]w[Y
    where k = o[o]q

c[XY    == oa[a[o]w]c[X]c[Y
c[[X]Y  == iw[X][oqa[a[oq]w]c]c[Y
c[[X\Y  == iw[X][oka[a[ok]w]c]c[Y
    where k = o[o]q

To overcome the limitation that our SLRP processor generally cannot see the end of an operator's first operand, I introduce open-ended blocks, terminated by `\`. Formally, we have `[A\ == o[A]`, so `\` is a valid block terminator where it counts.

Note that several rewrites have two cases depending on whether a block ends in `\` vs `]`. Note also that, in each such case, both variants of the rewrite emit the same output prefix. Thus, we're only deciding on some conditional behavior after we reach the block terminator.

Swap, copy, and apply are complicated in the general case, requiring divide-and-conquer tactics. However, in practice we will mostly use the base-case where `xyz` fits our processor's input buffer. A larger memory will improve data plumbing performance.

Performance of data-plumbing aside, the above set of combinators gives us Turing complete, confluent computation. The machine code is concatenative, which is very convenient for modularizing and constructing programs - e.g. concatenation of machine-code files corresponds to a composition of functions. For those who favor a lambda calculus, we can very easily compile lambda expressions to use tacit `a b c d` operators.
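To check the core semantics, here is a small reference evaluator for the `a b c d` rewrites. It is deliberately *not* SLRP-constrained: it parses whole blocks into memory and only rewrites at the top level of the stream, so it serves purely as an executable specification of the four base rules.

```python
# Reference evaluator (illustrative, unbounded memory) for:
#   a[A][B] == [B]A    b[A][B] == [A[B]]    c[A] == [A][A]    d[A] ==
def parse(s):
    stack = [[]]
    for ch in s:
        if ch == '[':
            stack.append([])
        elif ch == ']':
            blk = stack.pop()
            stack[-1].append(blk)
        else:
            stack[-1].append(ch)
    return stack[0]

def show(prog):
    return ''.join('[' + show(x) + ']' if isinstance(x, list) else x
                   for x in prog)

def step(prog):
    """Apply the leftmost top-level redex once; return (prog, changed)."""
    for i, op in enumerate(prog):
        blocks = prog[i+1:i+3]
        two = len(blocks) == 2 and all(isinstance(b, list) for b in blocks)
        one = len(blocks) >= 1 and isinstance(blocks[0], list)
        if op == 'a' and two:
            A, B = blocks
            return prog[:i] + [B] + A + prog[i+3:], True   # a[A][B] == [B]A
        if op == 'b' and two:
            A, B = blocks
            return prog[:i] + [A + [B]] + prog[i+3:], True # b[A][B] == [A[B]]
        if op == 'c' and one:
            A = blocks[0]
            return prog[:i] + [A, A] + prog[i+2:], True    # c[A] == [A][A]
        if op == 'd' and one:
            return prog[:i] + prog[i+2:], True             # d[A] ==
    return prog, False

def evaluate(s):
    prog, changed = parse(s), True
    while changed:
        prog, changed = step(prog)
    return show(prog)

print(evaluate('a[x][y]'))   # [y]x
print(evaluate('b[x][y]'))   # [x[y]]
print(evaluate('c[d]'))      # [d][d]
```

An actual SLRP processor must achieve the same results via the bounded-memory auxiliary rewrites above, open-ended `\` blocks included.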

I immediately recommend two more operators for convenience:

A delay operator: `~[A][B == [A][B`. Delay represents an intention to wait for two operands, allowing the second operand to be open-ended. A delay operator allows programmers to control partial evaluations and represent call-by-need computations and codata (such as infinite, procedurally generated trees) via `[ctor~[Arg]]`. It can also be used to embed comments via `d~[comment]`. If our first operand is too large, we can rewrite `~[A == dwa[~[]][A`.

A strict fixpoint combinator: `z[F][X] == F[z[F]][X]`. We could implement this by hand: `z = ic~[iwba[c]b[ic~a[~]]]~`. However, a dedicated loop combinator can reduce overheads and simplify recognition of loops by a processor that might want to perform internal optimizations. If the function operand is too large, we can rewrite `z[F == ia[o[z]q]c~[F`.

I'm sure we can find more operators that improve expressiveness or performance. However, two items deserve special attention: data and effects!

Binary Data for SLRP

For high-level data structures (trees, stream values, etc.), a Mogensen-Scott encoding isn't bad. However, it isn't suitable for lower-level data IO. So we'll extend our SLRP language with binary data types.

To embed binaries in SLRP, it's sufficient to use a sized header such as `F` before a four-byte floating point. This way, our processor knows that `F]]]]` represents the number 996937981862346752 instead of four block terminators. Besides floats, we could support double-precision floats and integers of common sizes (1,2,4,8 bytes, signed and unsigned). Further, it's feasible to support vectors and simple pairs of fixed-width elements such that `V32PBF` would represent the header for a sequence of 32 byte-float pairs.
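We can check the claim above directly: decoding the four bytes `]]]]` as a 32-bit float (rather than as block terminators) gives exactly the stated value.

```python
# ']]]]' is 0x5D5D5D5D; as a big-endian IEEE-754 single it decodes to
# 14507357 * 2^36 = 996937981862346752 exactly.
import struct

(val,) = struct.unpack('>f', b']]]]')
print(int(val))  # 996937981862346752
```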

Encoding type information with our binary is convenient for safety, for rendering and debugging our streams, and for polymorphic manipulations - e.g. such that `+F....` is implicitly a floating-point add. It is feasible to achieve some collections-oriented programming features with dedicated manipulations of vectors, or adding a float to a vector implicitly adding to every element.

Our drop, quote, swap, and copy operations must be extended to handle binary data. (We cannot apply or inline a binary.) Swap and copy may need divide-and-conquer tactics for large binaries, which might be achieved using vector and pair manipulations (split, append, etc.).

Effectful SLRP

SLRP processors may have side channels for interaction with the external environment - disk, display, sensor, network, etc.. The SLRP stream may act as a 'driver' for these channels, with a subset of rewrites involving interaction with the environment. I propose a request operator `!` for this role:

![ChannelID][Request] => [Response]

Operator `!` tells our processor to push a request down the specified channel, then replaces the request by the response, if any. The channel identifier should be a binary type. The request and response will often be binaries, but that may depend on the channel.

Requests in SLRP streams are opportunistic and unordered. Further, copy and drop operations might replicate or remove a request before it executes. Insofar as this is a problem, SLRP just leaves it to application programmers, who could use patterns such as monadic programming to control use of effects.

Virtual Memory for SLRP

The conventional notion of 'virtual memory' mapping an address space does not apply to SLRP streams (which are not addressed). However, we can adapt the idea: we offload a volume of our stream through a side channel, leaving behind a placeholder. When the code or data is needed again, we can load the data back into our stream.

There are several benefits for virtual memory in SLRP: The placeholder may be much smaller than the data it replaces, allowing for more efficient data plumbing. We can offload to a cheap, high-latency memory to simulate larger memory. We could evaluate the offloaded region on remote processors, thus using virtual memory to partition a parallel computation.
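As a toy illustration of the offload/load idea, here is a sketch where a region of the stream is shunted to a side store and replaced by a small placeholder. The `@n` placeholder syntax and helper names are illustrative only.

```python
# Offload a span of the stream to a side-channel store, leaving a
# compact placeholder; load restores it on demand.
def offload(stream, start, end, store):
    ref = '@%d' % len(store)
    store[ref] = stream[start:end]
    return stream[:start] + ref + stream[end:]

def load(stream, ref, store):
    return stream.replace(ref, store.pop(ref), 1)

store = {}
s = offload('a[big block here][y]', 1, 17, store)
print(s)                     # a@0[y]
print(load(s, '@0', store))  # a[big block here][y]
```

The placeholder is far cheaper to copy, skip, or ferry across than the block it replaces, which is the whole point for data plumbing.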

Virtual memory will be implemented effectfully. But, properly implemented, the only observable impact to the SLRP computation should be performance. We might wish to specialize handling of virtual memory to further improve performance. I'm inclined to develop dedicated operators for this purpose.

SLRP Hardware and Virtualization

SLRP processors have a fixed amount of memory. I imagine this quantity to be at least tens of kilobytes. A larger processor memory will allow more data plumbing to be handled by base case rewrite `a[xyz][B] == [B]xyz`, which is much more efficient than divide-and-conquer techniques. Further, processors may buffer output and backtrack, computing multiple passes within a 'sliding window' over the larger input stream. Doing so can greatly improve efficiency for tight loops, and allows the processor to focus effort where progress is visible.

However, where SLRP shines is multi-processor systems. We can pipeline our SLRP stream through dozens of processors, thus achieving a great deal of parallel work per full-memory pass. Further, we can use virtual memory to partition and schedule fragments of SLRP streams across distinct processor groups, further improving parallelism and concurrency.

In a heterogeneous multi-processor system, where different processors support different request channels, we could heuristically arrange for code to move to the processor that can handle a given request. This would simulate remote procedure calls or mobile code, while preserving SLRP's simple stream rewriting semantics.

SLRP systems may integrate 'virtual' processors, implemented in software. This is an economic necessity to get SLRP started. However, even assuming SLRP grows into a popular hardware architecture, virtualized processors might prove convenient for integrating 'effects' channels that we're unready or unwilling to implement in hardware - such as higher level GUI layers, or sandboxes.

Aside: SLRP processors are free to optimize rewrites, such as fusing `oa[ops]` or `ow[ops]` into a single rewrite, simply erasing `a[]` and `ww`. Such optimizations are welcome, especially as the technology matures. But I feel we shouldn't rely too much on these little tricks.

Alternative SLRP Languages

As an architecture, SLRP certainly isn't restricted to a basis in concatenative combinators. We could build SLRP over register machines, for example. We could use sized regions to carry code instead of delimited blocks. We could feasibly extend SLRP to work with multiple input and output streams, and perhaps separate the 'code' stream from several parallel 'data' streams. I might explore more of this design space in a later article. But I encourage you to not limit consideration to the specific machine code I developed above.

Posted in Language Design | Leave a comment

Deterministic Parallel Monads

I want more expressive task parallelism in context of purely functional computation than is provided by Haskell’s Control.Parallel (`par` and `pseq`). So I’ve been exploring approaches analogous to Haskell’s ST monad, where we guarantee determinism but support explicit mutation internally.

Kahn process networks are a deterministic model for distributed computation. Modulo type-safety, it isn’t difficult to support Kahn Process Networks via monad:

read  :: ReadPort -> KPN m m
write :: WritePort -> m -> KPN m ()
wire :: ReadPort -> WritePort -> (m -> m) -> KPN m ()
fork :: ProcName -> KPN m () -> KPN m ()

type KPN m a -- all messages have type m
instance Monad (KPN m)
data Port = Outer PortName | Inner (ProcName, PortName)
newtype ReadPort = ReadPort Port -- a port we read from
newtype WritePort = WritePort Port -- a port we write to
type ProcName = String -- second-class process identifier
type PortName = String -- second-class port identifier

runKPN :: (forall m . KPN m a) -> Maybe a

Read will wait for a message if none is available, and removes the message from the port. Write will buffer the message asynchronously and return immediately. KPNs are deterministic because there is no option to wait on multiple ports for the “first” read. The special features for KPNs are wire and fork. With wire we permanently attach a read port to a write port, allowing data to move implicitly and efficiently. With fork, we can attach a confined child behavior that we can compute independently.

Note: We could extend this API with bounded write buffers to support strong guarantees about memory consumption. We also could introduce logical timestamps for reactivity.

The `runKPN` evaluator in this case assumes a closed KPN for our final computation. It will return a deterministic result: Nothing if the computation halts on a read, otherwise Just the final value. A naive reference implementation of runKPN could evaluate child processes and wires one small step at a time, while an accelerated or compiler-intrinsic implementation could leverage threads and mutable buffers under the hood.
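The naive small-step strategy can be sketched in Python: processes are generators yielding `('read', port)` or `('write', port, msg)` requests, and a round-robin scheduler single-steps whichever processes are not blocked. All names here are illustrative, not a proposed API.

```python
# Reference KPN evaluator: deterministic because a process may only
# block on ONE port; there is no 'first of several reads'.
from collections import deque

def run_kpn(procs, result_port):
    queues = {}                                  # port -> message buffer
    waiting = {name: None for name in procs}     # port each proc blocks on
    live = dict(procs)
    while live:
        progressed = False
        for name, proc in list(live.items()):
            port = waiting[name]
            if port is not None and not queues.get(port):
                continue                         # blocked on an empty port
            try:
                if port is not None:
                    req = proc.send(queues[port].popleft())
                    waiting[name] = None
                else:
                    req = next(proc)
                if req[0] == 'read':
                    waiting[name] = req[1]
                else:                            # ('write', port, msg)
                    queues.setdefault(req[1], deque()).append(req[2])
            except StopIteration:
                del live[name]
            progressed = True
        if not progressed:
            return None                          # system halted on reads
    q = queues.get(result_port)
    return q.popleft() if q else None

def producer():
    for i in range(3):
        yield ('write', 'chan', i)

def doubler():
    total = 0
    for _ in range(3):
        v = yield ('read', 'chan')
        total += v * 2
    yield ('write', 'out', total)

print(run_kpn({'p': producer(), 'd': doubler()}, 'out'))  # 6
```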

As an alternative to `runKPN` we could have a runner that supports interactive reads and writes with a long-lived KPN computation in the background.

type BGKPN m = forall a . Request m a -> IO (BGKPN m, a)
data Request m a where
  Write :: PortName -> m -> Request m ()
  Read :: PortName -> Request m (Maybe m)
bgKPN :: KPN m () -> IO (BGKPN m)

This would allow us to monotonically read and write to the running KPN. If we had linear or uniqueness types, we could use those instead of IO. (It’s also feasible to use ST instead of IO.)

Unfortunately, integration of this KPN monad with the type system leaves a lot to be desired. For example:

  • External ports should be visible in the KPN type.
  • Ports should have heterogeneous message types.
  • After wiring two ports, typefully forbid further use.

I haven’t found a good way to represent these properties. Some variant of effect types might be adapted, e.g. with an effect for each port, and perhaps some indexed types so we can hide ports upon wire.

An alternative to KPNs is to develop a monad around single-assignment variables and promise pipelining. Channels can be modeled by cleverly leaving unassigned variables in ‘hole’ positions, to be written by a producer thread. This is simpler to support in the type system, similar to Haskell’s ST monad:

new :: P v (v a)
read :: v a -> P v a
write :: a -> v a -> P v ()
fork :: P v () -> P v ()

type P v a -- promises monad, variable type 'v'
instance Monad (P v)
runP :: (forall v . P v a) -> a

In this monad, `read` waits for a variable to be written but does not modify the variable. Meanwhile, `write` diverges if we attempt to write a variable more than once. Consequently, our `runP` must wait on all forked threads to complete, to ensure that we don’t have any duplicate writes. If `runP` returns at all, the result is deterministic.

We could easily implement monad P in Haskell above MVars. So it’s easy to implement and easy to type in Haskell. OTOH, P would be less friendly in context of distribution or memoization.

Posted in Language Design | Leave a comment

Hierarchical Transaction Machines

Hierarchical transaction machines (HTXMs) are an interesting idea I’ve been developing recently. HTXMs accomplish my goals of live, resilient, declarative, extensible software systems. Further, HTXMs accomplish these goals in a way that’s easy to explain and easy to integrate with existing services and devices.

First, let’s ignore that ‘hierarchical’ adjective. I’ll get back to it!

A transaction machine (TXM) is essentially: a deterministic transaction, repeated indefinitely.

We assume our system has a set of TXMs. Non-determinism arises due to an arbitrary schedule of TXM executions within the set. Process coordination is implicit – if a transaction is unproductive, then deterministic repetition would still be unproductive, so for efficiency we simply wait for a change to an observed variable. Programmers can leverage this, for example by aborting a stream-processing transaction when the input channel is empty. External agents and devices are implicitly modeled as TXMs interacting asynchronously and atomically through shared variables.
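A minimal sketch of these semantics: each TXM is modeled as a pure function from the shared variable store to an updated store, or None to abort, and the scheduler repeats transactions until the system quiesces. Round-robin scheduling is used here for simplicity; names and the scheduling policy are illustrative, not part of the model.

```python
# Toy TXM scheduler: repeat each transaction until no transaction
# can make productive progress (quiescence).
def run_txms(txms, store):
    changed = True
    while changed:
        changed = False
        for txm in txms:
            new = txm(dict(store))       # transaction runs over a snapshot
            if new is not None and new != store:
                store = new              # commit
                changed = True
    return store

def move_one(store):
    """Stream-processing TXM: aborts (None) when the input is empty."""
    if not store['inbox']:
        return None
    new = dict(store)
    new['inbox'] = store['inbox'][1:]
    new['outbox'] = store['outbox'] + [store['inbox'][0] * 10]
    return new

print(run_txms([move_one], {'inbox': [1, 2, 3], 'outbox': []}))
# {'inbox': [], 'outbox': [10, 20, 30]}
```

Note how abort-on-empty-input directly models the "wait for a change to an observed variable" optimization: an aborted transaction would be unproductive if repeated against the same store.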

TXMs have many nice properties:

TXMs support liveness: We can implicitly read our TXM behavior definition at the start of each transaction. This enables us to modify system behavior atomically. This can be leveraged for process control, safe runtime upgrades, and developing application models around live-coding.

TXMs have robust process coordination: There is no need for polling, locks, signals, or interrupts. There is no risk of deadlock or dropped signals. We can wait on arbitrary conditions without specific condition variables. A process scheduler can ensure fairness or prioritize transactions that touch HCI variables.

TXMs are robustly securable: we can define TXMs as operating on an abstract environment provided as a parameter, without static variables. This makes it easy to sandbox the environment, or to simulate it for testing purposes. We can safely work with distrusted TXM definitions by shallowly wrapping them.

TXMs are openly extensible: TXMs are anonymous and operate on a shared environment, like a multi-agent blackboard system. It’s easy to add new features to a system by introducing TXMs that observe or tweak some variables also used by other TXMs. It’s equally easy to remove features.

TXMs are intrinsically idempotent: introducing more than one copy of a TXM won’t affect a system’s logical behavior. This allows us to reason declaratively about whether or not a system has a particular behavior or extension. A set of TXMs is also commutative, but that’s not unusual for concurrency models.

TXMs are fault tolerant and resilient: via hierarchical transactions, we can robustly support strong exception safety with fallbacks for graceful degradation. In a set of TXMs, even if some operate in a failed or degraded mode, the others can operate normally. After the cause for failure is fixed, a TXM will swiftly recover to optimal operating conditions.

TXMs are suitable for real-time programming: it is not difficult to develop a scheduler that guarantees time slots for hard real-time mission-critical TXMs. We can prioritize transactions responsible for HCI to ensure low-latency reactivity. Compared to threads, TXMs give us very nicely bounded operations for scheduling.

TXMs are easily debuggable: It’s easy to capture and simulate the variable environment observed by a transaction for error reports and regression testing. If transactions return a value on error or commit, we can easily record the history of such values and render it in a task manager or debugger. Debuggers can be supported as normal extensions to the system, allowing ad-hoc views and direct manipulations, rather than hacked in via special compiler modes.

TXMs are simple and familiar: Asynchronous effects through variables are an excellent fit for network access, task queues, direct memory access, publish-subscribe, and other common software design patterns. We can leverage conventional object-oriented interfaces or opaque data types to help protect system invariants. It’s easy to explain and understand TXMs, and to integrate them with real-world systems.

Hierarchical TXMs (HTXMs) are an extension to TXMs to support task-parallel concurrency and multi-stage programming while preserving liveness.

The idea for HTXMs is to form a read-fork tree with read-write loops at the leaves. After a change to an observed variable, we might rebuild a branch of the tree, or even the whole thing, thereby preserving liveness. But insofar as we stabilize the tree (via design patterns, caching and buffering, etc.), computation will mostly occur at the leaf nodes.

To support HTXMs is not difficult: we simply add a `fork` operation to our transactional API, which indicates we should attempt a transaction after we commit. Forked transactions can inherit the read assumptions of the parent. Then read-fork operations can be optimized. (This does awkwardly permit expression of mixed read-write-fork, but we could handle that arbitrarily or treat it as a runtime error.)
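
The fork mechanism can be sketched in Haskell as follows (the types `Store` and `Tx` and the `parent`/`writer` examples are hypothetical). Each transaction commits a store update and returns a list of forked child transactions, which the scheduler queues after commit. This sketch deliberately ignores read tracking, conflict detection, and re-execution.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

type Store = Map String Int

-- A transaction commits a store update and forks zero or more children.
newtype Tx = Tx { runTx :: Store -> (Store, [Tx]) }

-- Run transactions breadth-first until the work queue is empty.
runAll :: [Tx] -> Store -> Store
runAll []       s = s
runAll (t : ts) s = let (s', kids) = runTx t s in runAll (ts ++ kids) s'

-- Hypothetical read-fork parent: reads a config variable, then forks
-- one writer child per requested output. Leaves do the read-write work.
parent :: Tx
parent = Tx $ \s ->
  let n = Map.findWithDefault 0 "children" s
  in (s, [ writer ("out" ++ show i) i | i <- [1 .. n] ])

writer :: String -> Int -> Tx
writer k v = Tx $ \s -> (Map.insert k v s, [])
```

In a real HTXM runtime, `parent` would be a read-fork node that is only re-run when its read set (here, the "children" variable) changes, while the forked leaves carry the ongoing read-write loops.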

Conveniently, a single HTXM can model an entire set of TXMs. We can also hierarchically embed HTXMs.

The many advantages of HTXMs come at a cost: rework. When a transaction aborts after some effort, we have wasted at least part of that effort. The premise for TXMs as a practical model is that the amount of rework can be controlled to an acceptable level. Design patterns can help a lot to focus a system on productive work. We can also develop heuristic schedulers that are sensitive to read-write conflict histories to reduce rework due to concurrency conflicts.

Caveat: TXMs are unsuitable for distributed processing. I assume network access is modeled as an effect. If users wish to model Paxos or CRDTs, they should do so explicitly. (As an exception to prove the rule, TXMs over RDMA might be acceptable.)

Posted in Language Design | 4 Comments

RDP Simplified

Reactive Demand Programming (RDP) is a reactive programming model I developed circa 2010-2012. The essential idea behind RDP is that “effects” should be commutative and continuous.

What I mean by “continuous” is that behaviors must be expressed statefully, attached to stateful representations. Continuity makes it easy to observe and directly manipulate system behavior, which is valuable from a human-in-the-loop perspective. Meanwhile, commutativity means that the order of expression for effects must be irrelevant. This property simplifies reasoning about concurrent behaviors, both within a program and for multi-agent systems.

To achieve commutative and continuous effects, RDP uses a concept of “demands”. A demand is a continuous request from a program (or subprogram), which may be observed and fulfilled by other programs (or subprograms). Demands are only observable collectively, as a set, not individually.

Unfortunately, my original expression of RDP was complicated by three factors:

  • Expression of RDP programs was arrowized, which is relatively painful to work with when compared to monadic programming.
  • I wasted effort supporting “instantaneous” reactive loops, i.e. such that we can observe effects without logical delay.
  • I required idempotence, such that duplicate expression of effects do not contribute to system behaviors.

Monadic expression of RDP programs is not difficult, especially if we remove instantaneous loops. A simplified monadic RDP API:

typeclass (Monoid a) ⇒ Bag a -- requires commutative mappend
typeclass (Monad m) ⇒ RDP m where
    type Var m : * → *
    type Dir m : *
    dir : Dir m → String → m (Dir m)
    var : (Bag a, Typeable a) ⇒ Dir m → String → m (Var m a)
    get : Var m a → m a
    put : Var m a → a → m ()

Unlike conventional monadic effect loops, an RDP monadic behavior is applied continuously, repeating at each time-step. We have a notion of persistent, labeled variables that are available from one time-step to the next. When we get a variable, we’ll receive the value from the previous time-step. When we put, we contribute our value to the next time-step. All contributions within a time-step are unioned together, via commutative mappend.
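
To make the time-step semantics concrete, here is a minimal Haskell sketch (the names `Bag`, `Env`, `Behavior`, `stepAll`, `source`, and `echo` are all hypothetical, not part of any RDP implementation). A behavior reads the previous step’s environment and contributes bags of values to the next step; bags are represented as sorted lists so the merge is commutative.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List (sort)

-- A bag of integer demands, kept sorted so that union is commutative.
type Bag = [Int]
type Env = Map String Bag

-- A behavior reads the previous time-step and emits puts for the next.
type Behavior = Env -> Env

bagUnion :: Bag -> Bag -> Bag
bagUnion xs ys = sort (xs ++ ys)

-- One time-step: all puts within the step are unioned together.
stepAll :: [Behavior] -> Env -> Env
stepAll bs prev = foldr (Map.unionWith bagUnion) Map.empty (map ($ prev) bs)

-- Hypothetical behaviors: a constant source, and an echo that sums
-- whatever it observed on "x" in the previous step.
source, echo :: Behavior
source _    = Map.fromList [("x", [1])]
echo   prev = Map.fromList [("y", [sum (Map.findWithDefault [] "x" prev)])]
```

Because the merge is commutative, `stepAll [source, echo]` and `stepAll [echo, source]` produce the same environment at every step, which is exactly the property RDP demands of effects.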

By relaxing idempotence – that is, by using bags of demands instead of sets – we can add entries together. This is convenient for modeling votes, priorities, and weighted preferences. We can similarly model negation or suppression, which is convenient for expressing rules or constraints (see Behavioral Programming for examples of how this might be useful).

Although we do not allocate ‘new’ variables, a notion of local state is still valuable for RDP systems. For the API above, I borrow the concept of filesystem-like directories. Local state is achieved by assigning a distinct subdirectory to each distinct module or subprogram, such that subprograms cannot accidentally (or maliciously) interfere with each other. We can feasibly support object-capability patterns involving sharing of vars. Of course, a major goal of RDP is to preserve access to stable views of state for monitoring, extension, debugging, etc. Thus, local state is never hidden from the parent process.

Effects in an RDP system are simply modeled in terms of shared variables between a subprogram and its environment – which at the top-level should be a runtime or operating system. A subprogram can output demands – e.g. to display a window, to move a robot – into a variable that the environment observes. The environment can provide inputs into a variable that our subprogram observes. (The specific variables involved can easily be hidden behind a record of monadic operations/capabilities.) Of course, providing all possible inputs at all times would be inefficient. But we can leverage flexible patterns that involve registering available services or resources, or use of outputs to demand specific inputs.

In this sense, RDP has some similarities with tuple spaces or blackboard systems, albeit with RDP variables being more fine-grained than blackboards. There is also a similarity with functional relational programming, as described in Out of the Tarpit by Ben Moseley.

Implementing RDP efficiently remains a challenge. In general, we could try to optimize evaluation of an RDP subprogram by memoizing its partial evaluation and only re-evaluating sub-computations whose observable results have changed. This might require an explicit memoization combinator, with a comparable result type. Extending the monad with explicit fork-join concurrency (e.g. fork : m a → m b → m (a * b) and fork_ : m () → m b → m b) may also help in isolating evaluation of subprograms. However, an efficient implementation of a monadic RDP API should at least be simpler than efficient implementation of arrowized RDP, by virtue of monads being simpler than arrows.

Posted in Language Design | 6 Comments

Monadic RDP

Reactive Demand Programming (RDP) is a programming model I developed circa 2010-2012. I’ve tabled it for a few years to work on other project goals, and since then I’ve also become interested in Kahn Process Networks (KPNs) as a simpler alternative in some ways. But RDP is still an interesting paradigm, with features convenient for live programming and open, distributed systems.

The essential idea for RDP is simply this: concurrent effects should be commutative and idempotent.

That is, expressing a specific request twice is the same as expressing it once or three times. If we have two different requests at the same logical time, the order in which we evaluate or observe them should not matter. With these properties, we preserve a lot of the nice equational reasoning and refactoring features associated with purely functional programming, and a lot of declarative reasoning is feasible. Essentially, effects are sets of declarations for input or output.

There are also some secondary ideas developed for RDP, such as precise logical timing. You can search this blog for more on the subject.

Anyhow, the original RDP models are arrowized, but monadic expression of behaviors would likely be more convenient.

Instead of normal imperative variables, which have one value at a time, our monadic RDP system will only have set variables, each with a label. Writing a variable simply adds a given value to the set. Reading the variable will return the prior value at the label, which will be a set.

The sets observed upon reading a variable must have deterministic ordering and duplicates removed, so we only support comparable value types. Actually, rather than sets we could use any commutative, idempotent monoid type. But sets are the most obvious such type.
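A sketch of such set variables in Haskell (the `Env`, `write`, and `readVar` names are hypothetical): each write unions a value into the set at a label, so duplicate writes and write order are both irrelevant.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Labeled set variables: each write adds a value to the set at a label.
type Env = Map String (Set Int)

write :: String -> Int -> Env -> Env
write k v = Map.insertWith Set.union k (Set.singleton v)

-- Reading returns the accumulated set. (Here we read the same step for
-- brevity; the text reads the prior time-step, but the algebra is the same.)
readVar :: String -> Env -> Set Int
readVar k = Map.findWithDefault Set.empty k
```

With this representation, `write "x" 2 . write "x" 2` equals `write "x" 2` (idempotence), and `write "x" 2 . write "x" 3` equals `write "x" 3 . write "x" 2` (commutativity).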

Like arrowized RDP behaviors, our monad value would be applied continuously, corresponding to a stateless behavior applied in just one time-step. Like arrowized RDP, we could also support a “delay” operation that enables RDP behaviors to be distributed over multiple time-steps, but we might need to constrain the number of delay steps (otherwise we end up with untethered, fire-and-forget RDP behaviors – which is interesting, but undermines any live programming features). Effects would be modeled in terms of exposing a subset of variables for external input or output. We might also add a few auxiliary features, such as single-assignment (per time step) labels, value suppression, forking of behavior threads, hierarchical structure for labels, and so on.

Posted in Concurrency, Distributed Programming, Language Design, Live Programming, Open Systems Programming, State | Leave a comment

Awelon Dictionary

A goal of Awelon project is to make software artifacts easy to share, modify, interact with, and compose. Sharing is easiest for documents, images, and similar artifacts that have no external dependencies. For Awelon, I apply this concept to the codebase.

An Awelon codebase is called a “dictionary”. It’s a big set of definitions that are versioned, curated, and maintained together. A dictionary does not depend on external definitions beyond the few Awelon primitives. This opposes the conventional “library” model where we use ad-hoc dependency and configuration managers to glue together interdependent subsets of definitions.

An Awelon dictionary is not limited to defining functions. It can easily contain definitions for data – maps, music, almanacs, articles, paintings, presentations, fonts, forums, books, or blogs. Binary objects may be embedded indirectly via %secureHash references.  Effectively, a dictionary serves as a “smart” filesystem with potential support for typechecked data, partial evaluations, transparent procedural generation, and spreadsheet-like views. In some cases, soft real-time updates are possible, e.g. adding telemetry data for moving objects.

(Aside: We could follow Haskell’s example of modeling a process with a `main` function and monadic effects detached from dictionary state. However, Awelon project is exploring alternative application models to make computation more accessible and visible. Having software agents record application state in the codebase seems a promising basis.)

To share entire codebases, we may logically embed one dictionary within another, such that dict/foo represents access to foo under the named dictionary. This structure avoids naming conflicts and securely controls dataflows – i.e. data can only flow outwards from `dict/` to its host. However, because dictionaries have no external dependencies, they’ll tend to replicate common function definitions (such as arithmetic and list processing).

Ultimately, Awelon dictionaries are massive, widely shared, largely similar, and frequently updated. In context of use, I desire incremental compilation that is optimally shared with equivalent definitions in similar dictionaries.

It took me a long while to develop a suitable representation.

I eventually found inspiration from log-structured merge-trees, radix trees, and use of secureHash for acyclic reference between binaries. Proposed representation:

/prefix1 secureHash1
/prefix2 secureHash2
:symbol1 definition1
:symbol2 definition2

Each dictionary node is represented as a series of updates. We can either update a symbol (defining or deleting it, `:` or `~` respectively) or an entire prefix `/`. In the latter case, we use a secure hash to identify another dictionary node, and we strip the matched prefix from symbols within that node.

To find the definition for a symbol, only the last update with matching symbol or prefix is examined. Hence, this representation can be used as an append-only log for streaming updates. For example, an update to `/p` will mask prior updates to `:poke` and `/prod`. We assume the new node at `/p` will contain those definitions if we intend to preserve them. The empty prefix (as in `/ secureHash`) is valid and useful. It masks all prior updates and essentially represents a checkpoint or reset in context of a stream, or a prototype inheritance as the first update in a node.
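A lookup over this representation can be sketched in Haskell (the `Upd` type and `lookupSym` are hypothetical illustrations; for simplicity, prefix updates inline the child node directly rather than referencing it by secure hash). The last matching update wins, and a matched prefix is stripped before descending into the child node.

```haskell
import Data.List (stripPrefix)

-- One dictionary node is a sequence of updates, oldest first.
data Upd
  = Def String String   -- :symbol definition
  | Del String          -- ~symbol
  | Pre String [Upd]    -- /prefix node (inlined here; really a secureHash ref)

lookupSym :: String -> [Upd] -> Maybe String
lookupSym sym = go . reverse  -- scan from the most recent update backwards
  where
    go [] = Nothing
    go (Def s d : rest)
      | s == sym  = Just d
      | otherwise = go rest
    go (Del s : rest)
      | s == sym  = Nothing
      | otherwise = go rest
    go (Pre p node : rest) =
      case stripPrefix p sym of
        Just suffix -> lookupSym suffix node  -- prefix masks older updates
        Nothing     -> go rest
```

For example, an update `/p` masks an earlier `:poke`: `lookupSym "poke" [Def "poke" "old", Pre "p" [Def "oke" "new"]]` finds the new definition. Note that the empty prefix falls out naturally: `stripPrefix ""` always matches, so `/ secureHash` masks everything, as described above.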

Hierarchical dictionaries may be naively included. Using `/dict/ secureHash` we can logically copy an entire dictionary. The radix tree structure that strips matched prefixes is essential for structure sharing. With `:dict/foo def` we can deeply update an individual definition.

Because recent updates are buffered near the root node, we effectively have lightweight working sets with near O(1) update. Because inner nodes are identified by secure hash, we obtain a lot of structure sharing and reasonably efficient diff computations to help maintain a cache of evaluated or compiled forms.

It seems this very simple dictionary representation can support most of my goals. I can even feasibly support distributed storage and lazy download of secure hash references, so clients can work with dictionaries much larger than a single disk.

Posted in Language Design | Leave a comment

Awelon Labeled Data

Awelon is a minimalist language, with four confluent concatenative combinators:

[B][A]a == A[B]     (apply)  
[B][A]b == [[B]A]   (bind)
   [A]c == [A][A]   (copy)
   [A]d ==          (drop)
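These rewrite rules are small enough to animate with a toy evaluator (a Haskell sketch; Awelon proper also has words, definitions, and different evaluation strategies). A program is a sequence of operations, and evaluation maintains a stack of block bodies:

```haskell
-- Toy syntax: a program is a list of operations.
data Op = Blk [Op] | Ap | Bd | Cp | Dr deriving (Eq, Show)

-- Evaluate with a stack of block bodies. Stuck programs are returned as-is.
eval :: [[Op]] -> [Op] -> [Op]
eval stack (Blk p : rest)    = eval (p : stack) rest          -- blocks are values
eval (a : b : s) (Ap : rest) = eval s (a ++ Blk b : rest)     -- [B][A]a == A[B]
eval (a : b : s) (Bd : rest) = eval ((Blk b : a) : s) rest    -- [B][A]b == [[B]A]
eval (a : s) (Cp : rest)     = eval (a : a : s) rest          -- [A]c == [A][A]
eval (_ : s) (Dr : rest)     = eval s rest                    -- [A]d ==
eval stack []                = map Blk (reverse stack)
eval stack ops               = map Blk (reverse stack) ++ ops -- too few operands

run :: [Op] -> [Op]
run = eval []
```

For example, `run [Blk [Dr], Cp]` yields two copies of the block, matching the copy rule above.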

The motivation for minimalism is simplicity. Awelon project’s goals include that every feature be simple, comprehensible, trustable – from language primitives up to multi-agent software systems (via shared monotonic dictionaries, etc.). So I construct programs from these few primitives. Of course, there are also some disadvantages for performance and syntax. For Awelon, these disadvantages are ameliorated by use of acceleration and projectional editing.

Originally, my intention has been to leverage acceleration and projectional editing even for labeled data – records and variants – in Awelon.

The main benefits of labeled data are extensibility, commutative structure for refactoring, and human-meaningful documentation. Labels can be modeled as paths through a trie, where the path encodes the label. Unfortunately, I’ve discovered that the benefits of labeled data seem to be weakened when we choose a concrete encoding of labels and records. It becomes difficult to reason about which observations can be made on a label or record, or how accessors and records might be constructed dynamically. It’s even difficult to preserve meaningful labels when reporting type errors.

I’ve chosen to provisionally add label-primitives to Awelon that work as follows:

[[A] :a [B] :b [C] :c] .b == [B] [[A] :a [C] :c]
[A] :a [B] :b == [B] :b [A] :a  (labels commute)

Logically, these operate linearly on row-polymorphic abstract record constructors:

:a   :   {R without a} [A] :a → {R, a:[A]}
.a   :   [{} → {R, a:[A]}] → [A] [{} → {R}]

The actual record value is abstract and ephemeral, never represented within Awelon code or evaluations. Clients can treat `[[A] :a [B] :b]` as a record. But it’s simultaneously a normal function subject to ad-hoc composition, abstraction, and factoring. Awelon’s abstract record model forbids shadowing of labels and has linear access by default. Records don’t provide any features we couldn’t equally achieve using tuples `[[A][B]]` with extrinsically labeled positions.
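One way to read these semantics is as a finite map from labels to payloads (a Haskell analogy with hypothetical names, not Awelon’s actual representation): extension rejects shadowed labels, extraction is linear, and commutativity of labels falls out of the map structure.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- A record: labels to payloads. No shadowing; labels commute.
type Record v = Map String v

-- Analogue of `:a`: extend a record, rejecting shadowed labels.
label :: String -> v -> Record v -> Maybe (Record v)
label k v r
  | Map.member k r = Nothing        -- shadowing is forbidden
  | otherwise      = Just (Map.insert k v r)

-- Analogue of `.b`: linear extraction, returning payload and remainder.
extract :: String -> Record v -> Maybe (v, Record v)
extract k r = do
  v <- Map.lookup k r
  pure (v, Map.delete k r)
```

Here `extract "b"` on a record built from `a`, `b`, and `c` mirrors `[[A] :a [B] :b [C] :c] .b == [B] [[A] :a [C] :c]`, and building the record in a different label order yields the same map.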

For labeled variants, consider that basic sum type `(A+B)` has a Church encoding `∀r.(A→r)→(B→r)→r`. That is, an observer will supply a “handler” for each case, and the value itself selects and applies one handler, dropping the others. For labeled sums, we simply need a labeled product of handlers – `[[OnA]:a [OnB]:b [OnC]:c …]`. Variant values could be given concrete representation such as `[.label [Value] case]`. We need projectional editing to make this pretty, but the primitive syntax is reasonably compact, self-documenting, and close to the intended meaning.
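Continuing the Haskell analogy (again with hypothetical names), a labeled variant pairs a label with a payload, and observing it selects the matching handler from a labeled product of handlers, dropping the rest:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- A labeled variant: one label and its payload.
data Variant v = Variant String v

-- A labeled product of handlers, as in [[OnA]:a [OnB]:b [OnC]:c ...].
type Handlers v r = Map String (v -> r)

-- Observe a variant: select one handler by label, drop the others.
caseOf :: Variant v -> Handlers v r -> Maybe r
caseOf (Variant lbl v) hs = ($ v) <$> Map.lookup lbl hs
```

This mirrors the Church-encoded sum: the value picks exactly one handler, and extensibility comes from adding labeled handlers rather than reordering positional ones.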

I’m somewhat reluctant to add this primitive concept to Awelon language. I feel labeled data is sufficiently valuable for extensibility, commutativity, and human-meaningful documentation to warrant some special treatment. But there are also some risks – labels tend to bloat data representations and introduce pressure for naming things, and many data structures such as lists or trees can get by with just a few labels, so positions work well enough. This feature currently has a “provisional” status in case I find a better approach in the future.

Posted in Language Design | Leave a comment