Purely Functional Ephemeron Tables

Modulo challenges regarding heterogeneous types and garbage collection, it is not difficult to model variable systems from purely functional languages: variable references can be represented by integers, variable states using a table mapping from integers to values. If we restrict or avoid copying (perhaps using substructural or uniqueness types) we can further support in-place mutation.

I’m willing to accept specialized type extensions to safely interact with such tables. There’s a lot of relevant research on heterogeneous or dependent types that I won’t reiterate here. But this leaves the challenge of garbage collection: ideally, a runtime should be able to remove entries from a table that are no longer referenced.

A compiler could easily support special opaque tables with a limited API, such that we cannot observe removal of elements. However, this still leaves the possibility that we construct identical tables with a common origin. Under referential transparency, constructions of the same table should support the same variable reference values. To support garbage collection, we must therefore sacrifice either referential transparency OR our freedom to use values in arbitrary ways.

Reducing our freedom to use values in arbitrary ways is reminiscent of substructural type systems. However, the separation of variable references and variable state tables requires some form of divided or associative substructural type that I haven’t encountered before, and that I’m uncertain how to formalize.

I could easily develop some ad-hoc static safety analysis that covers common use cases for ephemeron tables. But I’d be very interested if anyone has a good idea for formalizing these divided substructural types in general.

Posted in Language Design | Leave a comment

Streaming Language Rewrite Processing

Streaming Language Rewrite Processing (SLRP) is a low-level computing model that is essentially characterized by limited-memory processors rewriting a much larger input stream. Computation under this condition requires multiple passes, but is highly amenable to pipelining and partitioning, and thus parallelism. SLRP processors may have side channels to external devices that we drive through ‘effectful’ rewrite rules.

This article describes SLRP, its strengths and weaknesses, and methods to mitigate those weaknesses.

TLDR: SLRP is a very promising alternative to the von Neumann architecture. SLRP offers advantages for security, concurrency, distribution, and scalability. However, SLRP performs poorly for data plumbing and loops because there is no pass-by-reference within the stream. This weakness can be mitigated by effects, shunting a region of the stream to a remote device and keeping a small placeholder – a reference.

SLRP Background

The essential requirement for SLRP is a rewrite language that can be mechanically implemented under the constraint of a fixed-memory processor.

Fixed memory is obviously a non-issue for working with fixed-width values, such as natural numbers in range 0..999. A fixed-memory processor can rewrite `*(6)(7)` to `(42)` if it wants, no problem at all if number of digits is limited. In practice we might favor 32-bit or 64-bit binary encodings of numbers, but it’s the same basic idea.

The challenge is dealing variable width values, including first-class functions.

For variable-width values, I assume we can detect the ‘end’ of a value either by counting delimiters or by having a prefix that indicates size then counting bytes. Depth or size is still bounded in these cases, but the bound may be absurdly large, like 2^64.

Mechanically, here’s what we can do: we can erase a variable-width value, we can write a variable-width value to output, we can write a prefix before writing a variable-width value to output (to be handled by a later pass), we can ferry a fixed-width buffer of code or data across a variable-width value, and we keep have a finite state machine that reminds us what to do when we’re done skipping a variable-width value. That’s it.

Below, I implement a Turing complete concatenative combinatory logic via stream rewriting under SLRP constraints. The article assumes at least a distant “I foggily remember reading about it years ago” familiarity with those subjects and either Church encodings or Mogensen-Scott encodings. If you don’t know about these things, or feel you need a refresher, try Wikipedia.

SLRP Machine Code

A usable SLRP machine code is valuable as a proof of concept and a concrete foundation for discussion. The machine language I develop below is based on a concatenative combinatory logic. I’ll start with a semantic foundation of my Awelon language:

a[A][B] == [B]A     (apply)
b[A][B] == [A[B]]   (bind)
c[A]    == [A][A]   (copy)
d[A]    ==          (drop)

The SLRP processor will observe `a[A][B]` from left to right. The essential challenge for SLRP is that our processor has limited memory, yet our operands are unbounded. Hence, given a huge operand, our processor will only see the front end like `a[A...`, and must achieve useful progress within that constraint. Nonetheless, we can implement this in bounded memory if we leverage several auxiliary rewrites.

-- language extensions
w[A][B] == [B][A]   (swap)
q[A]    == [[A]]    (quote)
o[A][B] == [AB]     (compose)*
i[A]    == A        (inline)
\[      ==          (join)

-- mechanically trivial rewrites
 \[  == 
o[A] == [A\
o[A\ == [A\o
q[A] == [[A]]
q[A\ == [[A\\q
d[A] ==
d[A\ == d
i[A] == A
i[A\ == Ai
b    == oa[q]

-- `xyz` fits processor memory
a[xyz][B] == [B]xyz
a[xyz][B\ == [B\a[xyz]
w[xyz][B] == [B][xyz]
w[xyz][B\ == [B\w[xyz]
c[xyz]    == [xyz][xyz]

-- divide-and-conquer tactics
a[XY    == a[X]a[Y
a[[X]Y  == iw[X][a[q]w]a[Y
a[[X\Y  == iw[X][a[o]w]a[Y

w[XY    == a[o]w[X]w[Y
w[[X]Y  == iw[X][a[oq]w]w[Y
w[[X\Y  == iw[X][a[ok]w]w[Y
    where k = o[o]q

c[XY    == oa[a[o]w]c[X]c[Y
c[[X]Y  == iw[X][oqa[a[oq]w]c]c[Y
c[[X\Y  == iw[X][oka[a[ok]w]c]c[Y
    where k = o[o]q

To overcome the limitation that our SLRP processor generally cannot see the end of an operator's first operand, I introduce open-ended blocks, terminated by `\`. Formally, we have `[A\ == o[A]`, so `\` is a valid block terminator where it counts.

Note that several rewrites have two cases depending in whether a block ends in `\` vs `]`. Note also that, in each such case, that particular block has the same output prefix. Thus, we're only deciding on some conditional behavior after we reach the block terminator.

Swap, copy, and apply are complicated in the general case, requiring divide-and-conquer tactics. However, in practice we will mostly use the base-case where `xyz` fits our processor's input buffer. A larger memory will improve data plumbing performance.

Performance of data-plumbing aside, the above set of combinators gives us Turing complete, confluent computation. The machine code is concatenative, which is very convenient for modularizing and constructing programs - e.g. concatenation of machine-code files corresponds to a composition of functions. For those who favor a lambda calculus, we can very easily compile lambda expressions to use tacit `a b c d` operators.

I immediately recommend two more operators for convenience:

A delay operator: `~[A][B == [A][B`. Delay represents an intention to wait for two operands, allowing the second operand to be open-ended. A delay operator allows programmers to control partial evaluations and represent call-by-need computations and codata (such as infinite, procedurally generated trees) via `[ctor~[Arg]]`. It can also be used to embed comments via `d~[comment]`. If our first operand is too large, we can rewrite `~[A == dwa[~[]][A`.

A strict fixpoint combinator: `z[F][X] == F[z[F]][X]`. We could implement this by hand: `z = ic~[iwba[c]b[ic~a[~]]]~`. However, a dedicated loop combinator can reduce overheads and simplify recognition of loops by a processor that might want to perform internal optimizations. If the function operand is too large, we can rewrite `z[F == ia[o[z]q]c~[F`.

I'm sure we can find more operators that improve expressiveness or performance. However, two items deserve special attention: data and effects!


For high-level data structures (trees, stream values, etc.), a Mogensen-Scott encoding isn't bad. However, it isn't suitable for lower-level data IO. So we'll extend our SLRP language with binary data types.

To embed binaries in SLRP, it's sufficient to use a sized header such as `F` before a four-byte floating point. This way, our processor knows that `F]]]]` represents the number 996937981862346752 instead of four block terminators. Besides floats, we could support double-precision floats and integers of common sizes (1,2,4,8 bytes, signed and unsigned). Further, it's feasible to support vectors and simple pairs of fixed-width elements such that `V32PBF` would represent the header for an sequence of 32 byte-float pairs.

Encoding type information with our binary is convenient for safety, for rendering and debugging our streams, and for polymorphic manipulations - e.g. such that `+F....` is implicitly a floating-point add. It is feasible to achieve some collections-oriented programming features with dedicated manipulations of vectors, or adding a float to a vector implicitly adding to every element.

Our drop, quote, swap, and copy operations must be extended to handle binary data. (We cannot apply or inline a binary.) Swap and copy may need divide-and-conquer tactics for large binaries, which might be achieved using vector and pair manipulations (split, append, etc.).

Effectful SLRP

SLRP processors may have side channels for interaction with the external environment - disk, display, sensor, network, etc.. The SLRP stream may act as a 'driver' for these channels, with a subset of rewrites involving interaction with the environment. I propose a request operator `!` for this role:

![ChannelID][Request] => [Response]

Operator `!` tells our processor to push a request down the specified channel, then replaces the request by the response, if any. The channel identifier should be a binary type. The request and response will often be binaries, but that may depend on the channel.

Requests in SLRP streams are opportunistic and unordered. Further, copy and drop operations might replicate or remove a request before it executes. Insofar as this is a problem, SLRP just leaves it to application programmers, who could use patterns such as monadic programming to control use of effects.

Virtual Memory for SLRP

The conventional notion of 'virtual memory' mapping an address space does not apply to SLRP streams (which are not addressed). However, we can adapt the idea: we offload a volume of our stream through a side channel, leaving behind a placeholder. When the code or data is needed again, we can load the data back into our stream.

There are several benefits for virtual memory in SLRP: The placeholder may be much smaller than the data it replaces, allowing for more efficient data plumbing. We can offload to a cheap, high-latency memory to simulate larger memory. We could evaluate the offloaded region on remote processors, thus using virtual memory to partition a parallel computation.

Virtual memory will be implemented effectfully. But, properly implemented, the only observable impact to the SLRP computation should be performance. We might wish to specialize handling of virtual memory to further improve performance. I'm inclined to develop dedicated operators for this purpose.

SLRP Hardware and Virtualization

SLRP processors have a fixed amount of memory. I imagine this quantity to be at least tens of kilobytes. A larger processor memory will allow more data plumbing to be handled by base case rewrite `a[xyz][B] == [B]xyz`, which is much more efficient than divide-and-conquer techniques. Further, processors may buffer output and backtrack, computing multiple passes within a 'sliding window' over the larger input stream. Doing so can greatly improve efficiency for tight loops, allows the processor to focus effort where progress is visible.

However, where SLRP shines is multi-processor systems. We can pipeline our SLRP stream through dozens of processors, thus achieving a great deal of parallel work per full-memory pass. Further, we can use virtual memory to partition and schedule fragments of SLRP streams across distinct processor groups, further improving parallelism and concurrency.

In a heterogeneous multi-processor system, where different processors support different request channels, we could heuristically arrange for code to move to the processor that can handle a given request. This would simulate remote procedure calls or mobile code, while preserving SLRP's simple stream rewriting semantics.

SLRP systems may integrate 'virtual' processors, implemented in software. This is an economic necessity to get SLRP started. However, even assuming SLRP grows into a popular hardware architecture, virtualized processors might prove convenient for integrating 'effects' channels that we're unready or unwilling to implement in hardware - such as higher level GUI layers, or sandboxes.

Aside: SLRP processors are free to optimize rewrites, such as fusing `oa[ops]` or `ow[ops]` into a single rewrite, simply erasing `a[]` and `ww`. Such optimizations are welcome, especially as the technology matures. But I feel we shouldn't rely too much on these little tricks.

Alternative SLRP Languages

As an architecture, SLRP certainly isn't restricted to a basis in concatenative combinators. We could build SLRP over register machines, for example. We could use sized regions to carry code instead of delimited blocks. We could feasibly extend SLRP to work with multiple input and output streams, and perhaps separate the 'code' stream from several parallel 'data' streams. I might explore more of this design space in a later article. But I encourage you to not limit consideration to the specific machine code I developed above.

Posted in Language Design | Leave a comment

Deterministic Parallel Monads

I want more expressive task parallelism in context of purely functional computation than is provided by Haskell’s Control.Parallel (`par` and `pseq`). So I’ve been exploring approaches analogous to Haskell’s ST monad, where we guarantee determinism but support explicit mutation internally.

Kahn process networks are a deterministic model for distributed computation. Modulo type-safety, it isn’t difficult to support Kahn Process Networks via monad:

read  :: ReadPort -> KPN m m
write :: WritePort -> m -> KPN m ()
wire :: ReadPort -> WritePort -> (m -> m) -> KPN m ()
fork :: ProcName -> KPN m () -> KPN m ()

type KPN m a -- all messages have type m
instance Monad (KPN m)
type Port = Outer PortName | Inner (ProcName * PortName)
type ReadPort = ReadPort Port -- a port we read from
type WritePort = WritePort Port -- a port we write to
type ProcName = String -- second-class process identifier
type PortName = String -- second-class port identifier

runKPN :: (forall m . KPN m a) -> Maybe a

Read will wait for a message if none is available, and removes the message from the port. Write will buffer the message asynchronously and return immediately. KPNs are deterministic because there is no option to wait on multiple ports for the “first” read. The special features for KPNs are wire and fork. With wire we permanently attach a read port to a write port, allowing data to move implicitly and efficiently. With fork, we can attach a confined child behavior that we can compute independently.

Note: We could extend this API with bounded write buffers to support strong guarantees about memory consumption. We also could introduce logical timestamps for reactivity.

The `runKPN` evaluator in this case assumes a closed KPN for our final computation. It will either return a deterministic value – Nothing if we halt on a read. A naive reference implementation of runKPN could evaluate child processes and wires one small step at a time, while an accelerated or compiler-intrinsic implementation could leverage threads and mutable buffers under the hood.

As an alternative to `runKPN` we could have a runner that supports interactive reads and writes with a long-lived KPN computation in the background.

type BGKPN m = Request m a -> IO (BGKPN m, a)
data Request m a =
  Write :: PortName -> m -> Request m ()
  Read :: PortName -> Request m (Maybe m)
bgKPN :: KPN m () -> IO (BGKPN m)

This would allow us to monotonically read and write to the running KPN. If we had linear or uniqueness types, we could use those instead of IO. (It’s also feasible to use ST instead of IO.)

Unfortunately, integration of this KPN monad with the type system leaves a lot to be desired. For example:

  • External ports should be visible in the KPN type.
  • Ports should have heterogeneous message types.
  • After wiring two ports, typefully forbid further use.

I haven’t found a good way to represent these properties. Some variant of effect types might be adapted, e.g. with an effect for each port, and perhaps some indexed types so we can hide ports upon wire.

An alternative to KPNs is to develop a monad around single-assignment variables and promise pipelining. Channels can be modeled by cleverly leaving unassigned variables in ‘hole’ positions, to be written by a producer thread. This is simpler to support in the type system, similar to Haskell’s ST monad:

new :: P v (v a)
read :: v a -> P v a
write :: a -> v a -> P v ()
fork :: P v () -> P v ()

type P v a -- promises monad, variable type 'v'
instance Monad (P v)
runP :: (forall v . P v a) -> a

In this monad, `read` waits for a variable to be written but does not modify the variable. Instead `write` diverges if we attempt to write a variable more than once. Consequently, our `runP` must wait on all forked threads to complete, to ensure that we don’t have any duplicate writes. If `runP` returns at all, the result is deterministic.

We could easily implement monad P in Haskell above MVars. So it’s easy to implement and easy to type in Haskell. OTOH, P would be less friendly in context of distribution or memoization.

Posted in Language Design | Leave a comment

Hierarchical Transaction Machines

Hierarchical transaction machines (HTXMs) are an interesting idea I’ve been developing recently. HTXMs accomplish my goals of live, resilient, declarative, extensible software systems. Further, HTXMs accomplish these goals in a way that’s easy to explain and easy to integrate with existing service and device.

First, let’s ignore that ‘hierarchical’ adjective. I’ll get back to it!

A transaction machine (TXM) is essentially: a deterministic transaction, repeated indefinitely.

We assume our system has a set of TXMs. Non-determinism arises due to an arbitrary schedule for a TXM executions within the set. Process coordination is implicit – if a transaction is unproductive, then deterministic repetition would still be unproductive, so for efficiency we simply wait for a change to an observed variable. Programmers can leverage this, for example by aborting a stream-processing transaction when the input channel is empty. External agents and devices are implicitly modeled as TXMs interacting asynchronously and atomically through shared variables.

TXMs have many nice properties:

TXMs support liveness: We can implicitly read our TXM behavior definition at the start of each transaction. This enables us to modify system behavior atomically. This can be leveraged for process control, safe runtime upgrades, and developing application models around live-coding.

TXMs have robust process coordination: There is no need for polling, locks, signals, or interrupts. There is no risk of deadlock or dropped signals. We can wait on arbitrary conditions without specific condition variables. A process scheduler can ensure fairness or prioritize transactions that touch HCI variables.

TXMs are robustly securable: we can define TXMs as operating on an abstract environment provided as a parameter, without static variables. This makes it easy to sandbox the environment, or to simulate it for testing purposes. We can safely work with distrusted TXM definitions by shallowly wrapping them.

TXMs are openly extensible: TXMs are anonymous and operate on a shared environment, like a multi-agent blackboard system. It’s easy to add new features to a system by introducing TXMs that observe or tweak some variables also used by other TXMs. It’s equally easy to remove features.

TXMs intrinsically are idempotent: introducing more than one copy of a TXM won’t affect a system’s logical behavior. This allows us to reason declaratively about whether or not a system has a particular behavior or extension. A set of TXMs is also commutative, but that’s not unusual for concurrency models.

TXMs are fault tolerant and resilient: via hierarchical transactions, we can robustly support strong exception safety with fallbacks for graceful degradation. In a set of TXMs, even if some operate in a failed or degraded mode, the others can operate normally. After the cause for failure is fixed, a TXM will swiftly recover to optimal operating conditions.

TXMs are suitable for real-time programming: it is not difficult to develop a scheduler that will guarantees time slots for hard real-time mission-critical TXMs. We can prioritize transactions responsible for HCI to ensure low-latency reactivity. Compared to threads, TXMs give us very nicely bounded operations for scheduling.

TXMs are easily debuggable: It’s easy to capture and simulate the variable environment observed by a transaction for error reports and regression testing. If transactions return a value on error or commit, we can easily record the history of such values and render it in a task manager or debugger. Debuggers can be supported as normal extensions to the system, allowing ad-hoc views and direct manipulations, rather than hacked in via special compiler modes.

TXMs are simple and familiar: Asynchronous effects through variables are an excellent fit for network access, task queues, direct memory access, publish-subscribe, and other common software design patterns. We can leverage conventional object-oriented interfaces or opaque data types to help protect system invariants. It’s easy to explain and understand TXMs, and to integrate them with real-world systems.

Hierachical TXMs (HTXMs) are an extension to TXMs to support task-parallel concurrency and multi-stage programming while preserving liveness.

The idea for HTXMs is to form a read-fork tree with read-write loops at the leaves. After a change to an observed variables, we might rebuild a branch of the tree, or even the whole thing, thereby preserving liveness. But insofar as we stabilize the tree (via design patterns, caching and buffering, etc.), computation will mostly occur at the leaf nodes.

To support HTXMs is not difficult: we simply add a `fork` operation to our transactional API, which indicates we should attempt a transaction after we commit. Forked transactions can inherit the read assumptions of the parent. Then read-fork operations can be optimized. (This does awkwardly permit expression of mixed read-write-fork, but we could handle that arbitrarily or treat it as a runtime error.)

Conveniently, a single HTXM can model an entire set of TXMs. We can also hierarchically embed HTXMs.

The many advantages of HTXMs come at a cost: rework. When a transaction aborts after some effort, we have wasted at least part of that effort. The premise for TXMs as a practical model is that the amount of rework can be controlled to an acceptable level. Design patterns can help a lot to focus a system on productive work. We can also develop heuristic schedulers that are sensitive to read-write conflict histories to reduce rework due to concurrency conflicts.

Caveat: TXMs are unsuitable for distributed processing. I assume network access is modeled as an effect. If users wish to model PAXOS or CRDTs, they should do so explicitly. (As an exception to prove the rule, TXMs over RDMA might be acceptable.)

Posted in Language Design | Leave a comment

RDP Simplified

Reactive Demand Programming (RDP) is a reactive programming model I developed circa 2010-2012. The essential idea behind RDP is that “effects” should be commutative and continuous.

What I mean by “continuous” is that behaviors must be expressed statefully, attached to stateful representations. Continuity makes it easy to observe and directly manipulate system behavior, which is valuable from a human-in-the-loop perspective. Meanwhile, commutativity means that the order of expression for effects must be irrelevant. This property simplifies reasoning about concurrent behaviors, both within a program and for multi-agent systems.

To achieve commutative and continuous effects, RDP uses a concept of “demands“. A demand is a continuous request from a program (or subprogram), which may be observed and fulfilled by other programs (or subprograms). Demands are only observable collectively, as a set, not individually.

Unfortunately, my original expression of RDP was complicated by three factors:

  • Expression of RDP programs was arrowized, which is relatively painful to work with when compared to monadic programming.
  • I wasted efforts to support “instantaneous” reactive loops, i.e. such that we can observe effects without logical delay.
  • I required idempotence, such that duplicate expression of effects do not contribute to system behaviors.

Monadic expression of RDP programs is not difficult, especially if we remove instantaneous loops. A simplified monadic RDP API:

typeclass (Monoid a) ⇒ Bag a // requires commutative mappend
typeclass (Monad m) ⇒ RDP m where
    type Var m : * → *
    type Dir m : *
    dir : Dir m → String → m (Dir m)
    var : (Bag a, Typeable a) ⇒ Dir m → String → m (Var m a)
    get : Var m a → m a
    put : Var m a → a → m ()

Unlike conventional monadic effect loops, an RDP monadic behavior is applied contininuously, repeating at each time-step. We have a notion of persistent, labeled variables that are available from one time-step to the next. When we get a variable, we’ll receive the value from the previous time-step. When we put, we contribute our value to the next time-step. All contributions within a time-step are unioned together, via commutative mappend. 

By relaxing idempotence – that is, by using bags of demands instead of sets – we can add entries together. This is convenient for modeling votes, priorities, weighted preferences.  We can similarly model negation or suppression, which is convenient for modeling rules or constraints (see Behavioral Programming for examples of how this might be useful).

Although we do not allocate ‘new’ variables, a notion of local state is still valuable for RDP systems. For the API above, I borrow the concept of filesystem-like directories. Local state is achieved by assigning a distinct subdirectory to each distinct module or subprogram, such that subprograms cannot accidentally (or maliciously) interfere with each other. We can feasibly support object-capability patterns involving sharing of vars. Of course, a major goal of RDP is to preserve access to stable views of state for monitoring, extension, debugging, etc.. Thus, local state is never hidden from the parent process.

Effects in RDP system are simply modeled in terms of shared variables between a subprogram and its environment – which at the top-level should be a runtime or operating system. A subprogram can output demands – e.g. to display a window, to move a robot – into a variable that the environment observes. The environment can provide inputs into a variable that our subprogram observes. (The specific variables involved can easily be hidden behind a record of monadic operations/capabilities.) Of course, providing all possible inputs at all times would be inefficient. But we can leverage flexible patterns that involve registering available services or resources, or use of outputs to demand specific inputs.

In this sense, RDP has some similarities with tuple spaces or blackboard systems, albeit with RDP variables being more fine-grained than blackboards. There is also a similarity with functional relational programming, as described in Out of the Tarpit by Ben Mosely.

Implementing RDP efficiently remains a challenge. In general, we could try to optimize evaluation of an RDP subprogram by memoizing its partial evaluation and only re-evaluating sub-computations whose observable results have changed. This might require an explicit memoization combinator, with a comparable result type. Extending the monad with for explicit fork-join concurrency (e.g. fork : m a → m b → m (a * b) and fork_ : m () → m b → m b) may also help in isolating evaluation of subprograms. However, an efficient implementation of a monadic RDP API should at least be simpler than efficient implementation of arrowized RDP, by virtue of monads being simpler than arrows.

Posted in Language Design | 6 Comments

Monadic RDP

Reactive Demand Programming (RDP) is a programming model I developed circa 2010-2012. I’ve tabled it for a few years to work on other project goals, and since then I’ve also become interested in Kahn Process Networks (KPNs) as a simpler alternative in some ways. But RDP still is an interesting paradigm, with features convenient for live programming and open, distributed systems.

The essential idea for RDP is simply this: concurrent effects should be commutative and idempotent.

That is, expressing a specific request twice is the same as expressing it one or three times. If we have two different requests at the same logical time, the order in which we evaluate or observe them should not matter. With these properties, we preserve a lot of the nice equational reasoning and refactoring features associated with purely functional programming, and a lot of declarative reasoning is feasible. Essentially, effects are sets of declarations for input or output.

There are also some secondary ideas developed for RDP, such as precise logical timing. You can search this blog for more on the subject.

Anyhow, the original RDP models are arrowized, but monadic expression of behaviors would likely be more convenient.

Instead of normal imperative variables, which have one value at a time, our monadic RDP system will only have set variables, each with a label. Writing a variable simply adds a given value to the set. Reading the variable will return the prior value at the label, which will be a set.

The sets observed upon reading a variable must have deterministic ordering and duplicates removed, so we only support comparable value types. Actually, rather than sets we could use any commutative, idempotent monoid type. But sets are the most obvious such type.

Like RDP arrowized behaviors, our monad value would be applied continuously, corresponding to a stateless behavior applied in just one time-step. Like arrowized RDP, we could also support a “delay” operation that enables RDP behaviors to be distributed over multiple time-steps, but we might need to constrain the number of delay steps (otherwise we end up with untethered, fire-and-forget RDP behaviors – which is interesting, but undermines any live programming features). Effects would be modeled in terms exposing a subset of variables for external input or output. We might also add a few auxiliary features, such as single-assignment (per time step) labels, value suppression, forking of behavior threads, hierarchical structure for labels, and so on.

Posted in Concurrency, Distributed Programming, Language Design, Live Programming, Open Systems Programming, State | Leave a comment

Awelon Dictionary

A goal of Awelon project is to make software artifacts easy to share, modify, interact with, and compose. Sharing is easiest for documents, images, and similar artifacts that have no external dependencies. For Awelon, I apply this concept to the codebase.

An Awelon codebase is called a “dictionary”. It’s a big set of definitions that are versioned, curated, and maintained together. A dictionary does not depend on external definitions beyond the few Awelon primitives. This opposes the conventional “library” model where we use ad-hoc dependency and configuration managers to glue together interdependent subsets of definitions.

An Awelon dictionary is not limited to defining functions. It can easily contain definitions for data – maps, music, almanacs, articles, paintings, presentations, fonts, forums, books, or blogs. Binary objects may be embedded indirectly via %secureHash references.  Effectively, a dictionary serves as a “smart” filesystem with potential support for typechecked data, partial evaluations, transparent procedural generation, and spreadsheet-like views. In some cases, soft real-time updates are possible, e.g. adding telemetry data for moving objects.

(Aside: We could follow Haskell’s example of modeling a process with a `main` function and monadic effects detached from dictionary state. However, Awelon project is exploring alternative application models to make computation more accessible and visible. Software agents recording application state in a codebase seems a promising basis.)

To share entire codebases, we may logically embed one dictionary within another, such that dict/foo represents access to foo under the named dictionary. This structure avoids naming conflicts and securely controls dataflows – i.e. data can only flow outwards from `dict/` to its host. However, because dictionaries have no external dependencies, they’ll tend to replicate common function definitions (such as arithmetic and list processing).

Ultimately, Awelon dictionaries are massive, widely shared, largely similar, and frequently updated. In context of use, I desire incremental compilation that is optimally shared with equivalent definitions in similar dictionaries.

It took me a long while to develop a suitable representation.

I eventually found inspiration from log-structured merge-trees, radix trees, and use of secureHash for acyclic reference between binaries. Proposed representation:

/prefix1 secureHash1
/prefix2 secureHash2
:symbol1 definition1
:symbol2 definition2

Each dictionary node is represented as a series of updates. We can either update a symbol (defining or deleting it, `:` or `~` respectively) or an entire prefix `/`. In the latter case, we use a secure hash to identify another dictionary node, and we strip the matched prefix from symbols within that node.

To find the definition for a symbol, only the last update with matching symbol or prefix is examined. Hence, this representation can be used as an append-only log for streaming updates. For example, an update to `/p` will mask prior updates to `:poke` and `/prod`. We assume the new node at `/p` will contain those definitions if we intend to preserve them. The empty prefix (as in `/ secureHash`) is valid and useful. It masks all prior updates and essentially represents a checkpoint or reset in context of a stream, or a prototype inheritance as the first update in a node.

Hierarchical dictionaries may be naively included. Using `/dict/ secureHash` we can logically copy an entire dictionary. The radix tree structure that strips matched prefixes is essential for structure sharing. With `:dict/foo def` we can deeply update an individual definition.

Because recent updates are buffered near the root node, we effectively have lightweight working sets with near O(1) update. Because inner nodes are identified by secure hash, we obtain a lot of structure sharing and reasonably efficient diff computations to help maintain a cache of evaluated or compiled forms.

It seems this very simple dictionary representation can support most of my goals. I can even feasibly support distributed storage and lazy download of secure hash references, so clients can work with dictionaries much larger than a single disk.

Posted in Language Design | Leave a comment

Awelon Labeled Data

Awelon is a minimalist language, with four confluent concatenative combinators:

[B][A]a == A[B]     (apply)  
[B][A]b == [[B]A]   (bind)
   [A]c == [A][A]   (copy)
   [A]d ==          (drop)

The motivation for minimalism is simplicity. Awelon project’s goals include that every feature be simple, comprehensible, trustable – from language primitives up to multi-agent software systems (via shared monotonic dictionaries, etc.). So I construct programs from these few primitives. Of course, there are also some disadvantages for performance and syntax. For Awelon, these disadvantages are ameliorated by use of acceleration and projectional editing.

Originally, my intention has been to leverage acceleration and projectional editing even for labeled data – records and variants – in Awelon.

The main benefits of labeled data are extensibility, commutative structure for refactoring, and human-meaningful documentation. Labels can be modeled as paths through a trie, where the path encodes the label. Unfortunately, I’ve discovered that benefits of labeled data seem to be weakened when we choose a concrete encoding of labels and records. It becomes difficult to reason about which observations can be made on a label or record, or how accessors and records might be constructed dynamically. It’s even difficult to preserve meaningful labels when reporting type errors.

I’ve chosen to provisionally add label-primitives to Awelon that work as follows:

[[A] :a [B] :b [C] :c] .b == [B] [[A] :a [C] :c]
[A] :a [B] :b == [B] :b [A] :a  (labels commute)

Logically, these operate linearly on row-polymorphic abstract record constructors:

:a   :   {R without a} [A] :a → {R, a:[A]}
.a   :   [{} → {R, a:[A]}] → [A] [{} → {R}]

The actual record value is abstract and ephemeral, never represented within Awelon code or evaluations. Clients can treat `[[A] :a [B] :b]` as a record. But it’s simultaneously a normal function subject to ad-hoc composition, abstraction, and factoring. Awelon’s abstract record model forbids shadowing of labels and has linear access by default. Records don’t provide any features we couldn’t equally achieve using tuples `[[A][B]]` with extrinsically labeled positions.

For labeled variants, consider that basic sum type `(A+B)` has a Church encoding `∀r.(A→r)→(B→r)→r`. That is, an observer will supply a “handler” for each case, and the value itself selects and applies one handler, dropping the others. For labeled sums, we simply need a labeled product of handlers – `[[OnA]:a [OnB]:b [OnC]:c …]`. Variant values could be given concrete representation such as `[.label [Value] case]`. We need projectional editing to make this pretty, but the primitive syntax is reasonably compact, self-documenting, and close to the intended meaning.

I’m somewhat reluctant to add this primitive concept to Awelon language. I feel labeled data is sufficiently valuable for extensibility, commutativity, and human-meaningful documentation to warrant some special treatment. But there are also some risks – use of labels adds a lot of bloat to data representations, for example. Labels themselves will tend to bloat data, and introduce pressure for naming of things. Many data structures such as lists or trees can get by with just a few labels, so positions work well enough. This features currently has a “provisional” status in case I find a better approach in the future.

Posted in Language Design | Leave a comment

Lightweight Staged Programming for Awelon

Multi-stage programming is essentially disciplined partial evaluation, a widely useful optimization technique especially in context of DSLs. By ‘disciplined’ I mean programmers can comprehend and control when computations occur, and statically debug staging errors. This is valuable for avoiding hidden performance regressions.

Yesterday, I was inspired and found a simple way to support multi-stage programming in Awelon by introducing two annotations:

[F](now) == [F]     iff [F] doesn't contain (now)
[F](later)          masks F from analysis by (now)

For Awelon, partial evaluation is simplified by lack of entanglement with variable environments and local rewrite semantics.  Hence annotations do not need to drive partial evaluation. They only offer means to declare requirements and debug errors – provide some discipline. I can separately develop models (DSLs, monads, etc.) for convenient multi-stage programming.

The (now) annotation is essentially an assertion that peeks deep into otherwise opaque representations. The (later) annotation enables abstraction of staging requirements and working with more than two stages. Global function definitions should be implicitly staged, such that unmasked (now) indicates static evaluation in that context, and we can safely assume named functions are statically stage safe after compilation.

These annotations can be checked dynamically, but it’s feasible to statically compute whether runtime staging will be correct, assuming we know the arities of all the things. Our inferred types would need to record for each input whether any (now) annotation will be exposed due to partial evaluation if just that many inputs are provided.

Aside: Use of (now) and (later) is inspired from temporal logic. Temporal logic is a a natural fit for multi-stage programming, but I think most languages wouldn’t support such a lightweight adaptation.

Posted in Language Design | Leave a comment

Labeled Data: Records and Variants

Records and variants are basic labeled data types. A record might be expressed as `{x:12.1, y:13.7}`, having a value for each label. A variant might be `Damage:11.3` or `Say:”Hello”`, and involves selection of one label and an associated value. We can generally combine these to have a variant of records or a record of variants. Labeled data is convenient insofar as they are self-documenting (human meaningful) and extensible. Most modern programming languages support labeled data types as a built-in feature. A few have explored fluent extensibility under the names “row polymorphism” or “polymorphic variants”.

But assume we have a much simpler language. This language readily supports simple binary products (a*b) and sums (a+b). A binary product is a simple pair of values like `{fst:12.1, snd:13.7}` while a binary sum is a simple choice of two like `Left:11.3` or `Right:”Hello”`. Effectively, we have spatial labels that are neither human meaningful nor extensible. For my Awelon project, this isn’t a hypothetical scenario. Awelon can readily Church encode binary products and sums, and can even provide a conventional syntactic projection, but has no built-in type-level support for labeled data.

Given simple binary products and sums, can we recover the benefits of built-in labeled data types?

I think we can do that and better. But first a short detour. Type `0` (aka void) has no values (perhaps excepting `error`). Type 0 is the identity type for sums, in the sense that type `(0+a)` or `(a+0)` is equivalent in many ways to just `a`. Similarly, type `1` (aka unit) is an identity type for products, and `(a*1)` or `(1*a)` is equivalent in many ways to just `a`. But there are some important *structural* differences. Any non-error value from `(0+a)` will be encoded as `Right:_`. With deep sum type structures such as `(0+(0+((0+(a+0))+0)))` we end up encoding values as `Right:Right:Left:Right:Left_`.

One idea, then, is to encode our label within this Left-Right-Left sequence. One viable encoding is `(RL|RR)*LL` where `RL` means `Right:Left:` and encodes a zero-bit, `RR` encodes a one-bit, and `LL` encodes sequence stop. (This particular encoding has a convenient property of being simple and self synchronizing.) Given our string of ones and zeroes, we may subsequently encode labels using ASCII or UTF-8. This gives us a basis for variants. Similarly, records can be represented by encoding a trie on the label we use for variants. For the earlier example, directly encoding the path would give us a corresponding structure `(1*(1*((1*(a*1))*1)))`. (We might explore alternative encodings to make it easier to extend or enumerate a record.)

Those 0 and 1 types essentially represent the ‘extensibility’ options for our variants and records. We get more choices for variants, more slots for records, while the label encoding itself ensures unbounded extensibility. Conveniently, we can also use a variant label to access a record, which provides a simple basis for ‘case’ or ‘switch’ expressions. We do additionally need syntactic sugar for working with these otherwise onerous structures, and likely a compiler that accelerates them. But there is little need for special type level support. And by modeling these records and variants, we can more readily get various polymorphic features, and the ability to work with labels themselves as first-class, typeable structures.

Anyhow, while I found this idea interesting, and I haven’t seen other people approaching labeled data this way, I wouldn’t be surprised if it’s mentioned in papers older than I am. I’ll be pursuing this further in Awelon project.

Posted in Language Design, Types | Tagged , , , , , , | Leave a comment