Awelon Dictionary

A goal of Awelon project is to make software artifacts easy to share, modify, interact with, and compose. Sharing is easiest for documents, images, and similar artifacts that have no external dependencies. For Awelon, I apply this concept to the codebase.

An Awelon codebase is called a “dictionary”. It’s a big set of definitions that are versioned, curated, and maintained together. A dictionary does not depend on external definitions beyond the few Awelon primitives. This opposes the conventional “library” model where we use ad-hoc dependency and configuration managers to glue together interdependent subsets of definitions.

An Awelon dictionary is not limited to defining functions. It can easily contain definitions for data – maps, music, almanacs, articles, paintings, presentations, fonts, forums, books, or blogs. Binary objects may be embedded indirectly via %secureHash references.  Effectively, a dictionary serves as a “smart” filesystem with potential support for typechecked data, partial evaluations, transparent procedural generation, and spreadsheet-like views. In some cases, soft real-time updates are possible, e.g. adding telemetry data for moving objects.

(Aside: We could follow Haskell’s example of modeling a process with a `main` function and monadic effects detached from dictionary state. However, Awelon project is exploring alternative application models to make computation more accessible and visible. Software agents recording application state in a codebase seems a promising basis.)

To share entire codebases, we may logically embed one dictionary within another, such that dict/foo represents access to foo under the named dictionary. This structure avoids naming conflicts and securely controls dataflows – i.e. data can only flow outwards from `dict/` to its host. However, because dictionaries have no external dependencies, they’ll tend to replicate common function definitions (such as arithmetic and list processing).

Ultimately, Awelon dictionaries are massive, widely shared, largely similar, and frequently updated. In context of use, I desire incremental compilation that is optimally shared with equivalent definitions in similar dictionaries.

It took me a long while to develop a suitable representation.

I eventually found inspiration from log-structured merge-trees, radix trees, and use of secureHash for acyclic reference between binaries. Proposed representation:

/prefix1 secureHash1
/prefix2 secureHash2
:symbol1 definition1
:symbol2 definition2

Each dictionary node is represented as a series of updates. We can either update a symbol (defining or deleting it, `:` or `~` respectively) or an entire prefix `/`. In the latter case, we use a secure hash to identify another dictionary node, and we strip the matched prefix from symbols within that node.

To find the definition for a symbol, only the last update with matching symbol or prefix is examined. Hence, this representation can be used as an append-only log for streaming updates. For example, an update to `/p` will mask prior updates to `:poke` and `/prod`. We assume the new node at `/p` will contain those definitions if we intend to preserve them. The empty prefix (as in `/ secureHash`) is valid and useful. It masks all prior updates and essentially represents a checkpoint or reset in context of a stream, or a prototype inheritance as the first update in a node.

Hierarchical dictionaries may be naively included. Using `/dict/ secureHash` we can logically copy an entire dictionary. The radix tree structure that strips matched prefixes is essential for structure sharing. With `:dict/foo def` we can deeply update an individual definition.

Because recent updates are buffered near the root node, we effectively have lightweight working sets with near O(1) update. Because inner nodes are identified by secure hash, we obtain a lot of structure sharing and reasonably efficient diff computations to help maintain a cache of evaluated or compiled forms.

It seems this very simple dictionary representation can support most of my goals. I can even feasibly support distributed storage and lazy download of secure hash references, so clients can work with dictionaries much larger than a single disk.

This entry was posted in Language Design. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s