Embedded Literal Objects

When a PL is designed for a text editor, it is natural that text is the richest embedded literal. The text editor provides the visualization and update HCI. And text can certainly be useful; beyond modeling a subset of human input or output, it can model embedded DSLs such as regular expressions. But text is ineffective for many kinds of data we might wish to express in our programs – e.g. 3D meshes, logos, diagrams, graphs, grammars, matrices, decision trees, scene graphs, sound models, stage data in a 2D video game, dance notation, skeletal animation, spreadsheets, and so on.

Conventionally, working with these richer models requires external tools – e.g. Maya, Blender, Photoshop. However, use of separate tools has many disadvantages, such as poor support for abstraction, partial evaluation, and integration. It can also be inconvenient to load a separate tool for a quick peek or edit.

A few not-so-conventional languages, such as Pure Data, are designed for work on a canvas and provide support for diagrams as literals. And then there are fully interactive programmable systems, such as the Croquet project, where we manipulate first-class objects. The latter has advantages for flexibility and programmability, but also disadvantages with respect to entangling the code with the runtime.

I’ve been interested for a while in supporting flexible literal types for my Awelon project. Last week, I was inspired toward a simple model that I call embedded literal objects (ELO). An ELO is effectively just a block with some conventions surrounding it for display and update, leveraging edit-time computation. Relevant bullets:

  • represented by block of pure bytecode embedded in source code
  • fixpoint closure type, e.g. `µELO.(Command→(ELO+OtherResult))`
  • commands are either updates or queries
  • IDE processes updates with ELO and writes resulting ELO into source code
  • queries can obtain menus of commands, renderings to canvas, and access to data
  • development of commands and queries left to convention
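
For concreteness, here is a minimal sketch of the ELO closure type in Haskell. The `Command` and `Result` types are hypothetical placeholders; real command sets are left to convention, per the last bullet.

data Command = Update String | Query String    -- hypothetical command encoding
data Result  = Rendered String | Menu [String] -- hypothetical query results

-- µELO.(Command→(ELO+OtherResult)) as a fixpoint closure
newtype ELO = ELO (Command -> Either ELO Result)

-- The IDE applies an update; a Left result is the new ELO to be
-- written back into the source code.
ideUpdate :: ELO -> Command -> Maybe ELO
ideUpdate (ELO f) cmd = either Just (const Nothing) (f cmd)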

Like text literals, an ELO can be copied and pasted, version controlled, and readily reused or re-edited across projects. There is no entanglement with the program context. Unlike text literals, an ELO potentially has much richer structure for easy composition, decomposition, and abstraction. We're likely to develop ELOs that measure many megabytes in size, though a lot of redundant logic (e.g. for command menus, rendering) could be compressed via a bytecode-layer linking model.

For current Awelon project languages, the proposed serialization of ELOs is quite trivial:

〚raw awelon bytecode here〛          in AO
[raw awelon bytecode here]{&elo}l   in ABC

The dedicated syntax for ELOs in the Awelon Object (AO) language helps the IDE recognize them for special treatment. I'm using the Unicode white square brackets: U+301A and U+301B. The resulting bytecode is just a plain old block with an extra annotation `{&elo}` indicating its nature for bytecode-level debuggers or extraction. The final `l` operation (type `(a*(s*e))→((a*s)*e)`) shifts the embedded literal object onto the AO stack like all other AO literals.

I love the idea that, modulo partial evaluation, these ad-hoc literal objects remain recognizable at the Awelon Bytecode (ABC) layer. This is valuable for both debugging and extraction at the bytecode layer, and is also consistent with ABC's existing approach to pseudo-literal numbers (e.g. `#42` is a sequence of three bytecodes that generate the number forty-two on the stack) and embedded text (shorthand for a list of small integers). When programs fall to pieces, comprehension starts at the bedrock, which for Awelon project is ABC. Every little bit of consistency, simplicity, transparency, and legibility at the bytecode level will help developers.

A new ELO could easily be generated from the AO dictionary as it stands. An AO IDE might also provide a menu of ELOs by searching for words starting with prefix `elo.`.

Processing ELOs

For text-based embedded DSLs, we typically use a command to interpret or compile the text. For example:

"x y z → y^2 + x^2 - |y|" doTheMath

Of course, text works well for only a small subset of DSLs, such as regular expressions. Math really isn't among them if we want rich symbology, matrices, big division bars, and so on. An Awelon system can potentially optimize this behavior by systematic application of partial evaluation, i.e. such that we do not process the string on every execution.

With ELOs, we have greater flexibility. An ELO has its own logic, data, and can represent both rich and abstract data structures. We might ask an ELO to compile itself, or extract a value, or query a specific field, or to render itself at runtime. An ELO exposes the exact same set of commands to the runtime as it does to the IDE. ELOs can be treated as simple values, copied as many times as needed.

Updating ELO Types

Individual ELOs can be updated via the available commands. But how do we update the set of commands? For example, if we have an 'image' type with a canvas and a bunch of editing commands, how do we add new brushes and commands? I can think of several approaches:

  1. extensible command set: ELO command sets may be extensible and replaceable by convention, via specific commands
  2. explicit import/export: ELOs may provide an 'export' command that is readily imported by the next version of the ELO
  3. adapter pattern: wrap the ELO in an adapter that can translate commands and responses, at some cost to performance (see the sketch below)
  4. extract from ELO: bring debuggers to bear upon the bytecode and surgically extract the information we care about.
  5. extract from history: record commands that updated an ELO, and replay them on the new version of the ELO
  6. IDE layer extension: recognize common ELO types (e.g. based on query for type info) and install command sets at IDE layer

I suspect the answer to the ELO type update problem will be some ad-hoc mix of all of the above. The first two are preferable, and are ultimately achieved through convention and best practices in ELO design. The adapter pattern isn't ideal, but can work well enough if not layered too deeply. The last few lean heavily on the integrated development environment, but might serve as effective fallbacks. Having a history of commands on ELOs would be useful anyway for extracting user macros into reusable subprograms.
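
To illustrate the adapter pattern (approach 3), here is a minimal sketch against the hypothetical `ELO` and `Command` types from earlier; `translate` is an assumed command-translation function.

-- Wrap an old ELO so it understands a new command set. The adapter
-- re-wraps itself around every updated ELO it returns, which is why
-- deep layering of adapters costs performance.
adapt :: (Command -> Command) -> ELO -> ELO
adapt translate (ELO f) = ELO $ \cmd ->
  case f (translate cmd) of
    Left elo' -> Left (adapt translate elo')
    Right res -> Right res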

In any case, the update problem is analogous to modifying text literals for an embedded DSL after a language version update. Developers will systematically update all instances until they get bored of doing so and start designing with forward compatibility in mind. At that point, they'll update only the literals that need updating. Likely, there will be a recognizable ELO maturation life cycle.

The Way Forward

At the moment, my Awelon Object language is still trapped in a text editor, and ELOs won't do me much good.

My intention is to eventually upgrade to editing code via a web browser, perhaps in a wiki-like environment. ELOs aren't my only goal; I'm also interested in hyperlinks, live programming, automatic visualization of the tacit environment (perhaps from live tests), interactive documentation (tutorials, etc.), and so on. I think ELOs will become much more viable in such an environment.

There are a lot of details remaining. For example, I'll want to develop an API for rendering to canvas, possibly inspired by HTML or SVG. Animations and sound are interesting areas for literals; I'd love the ability to 'hear' whatever sound model I'm looking at, or to see animations running in a loop. For 3D or larger 2D structures, the ability to rotate, pan, zoom, etc. could be useful.

Posted in Language Design, Modularity, Types, User Interface | Leave a comment

LZW-GC Streaming Compression

LZW is a simple compression algorithm – certainly the simplest, most deterministic I could find in my investigations. And here I use ‘deterministic’ in the sense that there are no implementation-dependent decisions, ambiguities, smarts or heuristics. LZW has only one real decision to make: dictionary size. And for my application, which requires highly deterministic compression, I’m willing to fix the dictionary size.

But LZW has a weakness: after the dictionary is filled, no new patterns are recognized. And the dictionary, at this point, is very poorly utilized: it holds many patterns that are never used.

There are existing approaches to mitigate this weakness, e.g. including a ‘clear code’ to reset the dictionary, or compressing chunks at a time. However, these tend to require some heuristic decisions that undermine LZW’s wonderful determinism. They also tend to lose old but valid patterns.

In this article, I propose a simple alternative: garbage collect the dictionary with an exponential decay factor.

A preliminary (but flawed) approach:

  1. To every element in the extended dictionary, add a match count.
  2. Increment match count on match (even when we don’t output the token).
  3. When the dictionary is full, output the current match token & garbage collect.
  4. To garbage collect (GC), remove zero-match elements & cut remaining match counts in half.

LZW is quite optimistic when growing its dictionary. In practice, we can expect that many entries go unmatched, and thus will be available for GC. LZW attempts to grow its dictionary after every output token. Thus, we should expect to perform a GC roughly every dictionary-size outputs. The exponential decay factor results from cutting down match counts in half, thus favoring newer patterns over older ones.

The flaw of this approach is that not all patterns have equal opportunity to find matches. Information about patterns discovered just before GC is more likely to be lost. This flaw can be mitigated by incremental GC. We can GC exactly one token just before every allocation. In a round-robin fashion, we can halve the match count of tokens until we find one with zero count. That’s the one token we collect.
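
A minimal sketch of this incremental, round-robin GC in Haskell. The dictionary representation (an IntMap from token to pattern and match count) and the fixed size are illustrative, not a tuned implementation:

import qualified Data.IntMap.Strict as IM

type Token = Int
type Dict  = IM.IntMap (String, Int) -- token -> (pattern, match count)

dictSize :: Int
dictSize = 4096 -- fixed, for determinism

-- Starting at token i, halve match counts round-robin until we find a
-- token with zero count; that token is recycled for the next allocation.
gcStep :: Token -> Dict -> (Token, Dict)
gcStep i d = case IM.lookup i d of
  Nothing     -> (i, d) -- unallocated slot: free to use directly
  Just (_, 0) -> (i, d) -- dead entry: collect and reuse this token
  Just (p, n) -> gcStep ((i + 1) `mod` dictSize)
                        (IM.insert i (p, n `div` 2) d)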

This can potentially result in unusual cases where we ‘match’ patterns that never existed in the original input, i.e. because the dictionary has some intermediate nodes that were collected. This is unlikely but okay; it doesn’t hurt anyone. The dictionary doesn’t change between outputs, and we’ll always be able to decode any token the way the compressor meant it.

To decode, for each input token we recursively increase the ‘match count’ for that token and all its dependencies. We also perform a round-robin GC after every input, and keep track of the free token. Nothing sophisticated is needed: we’re simply reproducing the same dictionary the compressor used to generate the output.

This extended form of LZW, which I’ll just call LZW-GC, is very similar in some ways to sliding window compression algorithms. But I think the dictionary approach is simpler, especially on the encoding side. Theoretically, LZW-GC should be superior to a sliding window of equivalent size because LZW-GC can make effective use of older patterns that are long gone from the window. To complete LZW-GC, we should follow it with a Huffman encoding to compress frequent tokens; this is independent of the GC at the LZW layer.

Where’s the Code? The code is available on GitHub, along with compression results.

It turns out that LZW-GC is about the same as LZW in most use cases, but significantly better for a few very, very large inputs. I’ll need to explore some ways to speed up dictionary construction, such as adapting LZAP (a variant of LZW).

Posted in Language Design | Leave a comment

ABC Linking with Provider Independent Security

As I’ve mentioned in other posts, Awelon Bytecode (ABC) has a non-conventional approach to code reuse. A valid sequence of bytecode can be given a deterministic, cryptographically unique name – e.g. a secure hash. In ABC, the code to invoke this external bytecode resource might be represented with `{#secureHashOfBytecode}`. Logically, the identified bytecode is inlined in place. In practice, this indirection provides a fine opportunity for separate compilation and caching.

In case of distributed systems, a secure hash offers a potential advantage of location independence. You can download the identified code from anywhere, then validate it against the secure hash. Unfortunately, the code itself is exposed to whoever stores it, and in general code can contain IP-sensitive information. So we cannot upload the code just anywhere. With just a secure hash, we’re restricted to trusted hosts for storing code. I would prefer a property the Tahoe-LAFS community describes as provider independent security – i.e. that the host cannot inspect the data. This way, we can cheaply rent a few untrusted cloud servers to handle the bulk of data distribution.

It turns out that only a small tweak to ABC’s naming system is needed to support provider independence:

  • use the secure hash of the bytecode as a symmetric encryption key
  • use the secure hash of the cipher text as the lookup key
  • store the cipher text, indexed by its lookup key
  • access via `{#secureHashOfCipherText:secureHashOfBytecode}`
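
A minimal sketch of this pipeline, parameterized over assumed hash and cipher primitives rather than tied to any particular crypto library:

import Data.ByteString (ByteString)

-- Convergent encryption: the bytecode's own hash is the key, and the
-- ciphertext's hash is the lookup key for untrusted storage.
makeResource :: (ByteString -> ByteString)               -- secure hash
             -> (ByteString -> ByteString -> ByteString) -- encrypt: key -> plaintext -> ciphertext
             -> ByteString                               -- bytecode
             -> (ByteString, ByteString)                 -- (lookup key, ciphertext)
makeResource hash encrypt bytecode =
  let key        = hash bytecode
      ciphertext = encrypt key bytecode
      lookupKey  = hash ciphertext
  in (lookupKey, ciphertext)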

I’ve studied computer security a lot from the PL perspective, but I don’t consider myself an expert in cryptography. To help validate this solution, I compare it to Tahoe’s approach. Essentially, I need a tiny subset of Tahoe’s features: I don’t need mutable files, I don’t need erasure coding or K-of-N recovery, and I don’t need large multi-segment files (I can decompose code at the ABC layer).

Tahoe’s documentation describes: “The capability contains the encryption key, the hash of the Capability Extension Block, and any encoding parameters necessary to perform the eventual decoding process. For convenience, it also contains the size of the file being stored.” And here the Capability Extension Block is derived from hashing the ciphertext – albeit, with a layer of indirection to support incremental loading of multi-segment files.

ABC’s model for a bytecode resource capability is very similar. The main difference is that ABC’s encryption key is determined from the bytecode content. And there is no capability extension block indirection, no segmentation at that layer. If there is a weakness in this design, it’s because the encryption key is easily computed from content. But, assuming the hash really is secure and independent of the encryption algorithm, I don’t believe that’s a feasible attack.

Having the final capability text be deterministic from content is valuable for interning, caching, reuse of popular software components (templates, frameworks, libraries, etc.). I would be reluctant to give it up.

Recently, I’ve been deciding on a few concrete encodings.

I’m thinking to use SHA3-384 as the basic secure hash algorithm, using the first 192 bits for secureHashOfCipherText and the second 192 bits for secureHashOfBytecode. This should maintain the full original uniqueness, or very nearly so: if our encryption function were the identity, we’d recover the original 384-bit secure hash. Each 192-bit half will be encoded as 32 characters of base64url. AES is still the standard for encryption, and supports a 192-bit key.
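
Concretely, the capability text might be assembled as sketched below. This assumes a `sha3` function with 48-byte (384-bit) output passed in as a parameter, and uses the base64url encoder from the base64-bytestring package:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ByteString.Base64.URL as B64

capabilityText :: (BS.ByteString -> BS.ByteString) -- assumed SHA3-384
               -> BS.ByteString                    -- ciphertext
               -> BS.ByteString                    -- bytecode
               -> BS.ByteString
capabilityText sha3 ciphertext bytecode =
  let lookupKey = BS.take 24 (sha3 ciphertext) -- first 192 bits
      cryptKey  = BS.drop 24 (sha3 bytecode)   -- second 192 bits
  in BS.concat [ BS8.pack "{#", B64.encode lookupKey
               , BS8.pack ":",  B64.encode cryptKey, BS8.pack "}" ]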

Compressing the bytecode prior to encryption is potentially a worthy investment. Doing so could save in storage and network overheads. It’s infeasible to compress after encryption. And a primary consideration is that the compression algorithm should be highly deterministic (almost no room for choices at compression time). (addendum): After investigation, LZ78 and LZW seem promising, especially together with Huffman or Polar encoding.

Anyhow, my original plan was to just use `{#secureHash}` resources and use a distinct encoding and pipeline for the encrypted variation. Now, I’m leaning towards just using the encrypted, provider-independent security naming convention at all times, even when the object never leaves the local machine. Decryption, decompression, and compilation are essentially one-time costs (if we use a good cache) and compilation will tend to dominate. The overhead for provider independent security is essentially negligible.

Convergence Secrets (Addendum)

Zooko – a primary developer on Tahoe-LAFS – notes in comments below that convergent encryption has a known security weakness to ‘confirmation’ attacks if the objects contain sensitive low-entropy information. Actually, confirmation attacks are a potential problem even without the encryption, but they become a few orders of magnitude worse with encryption because we’ll be exposed to ‘offline’ attacks. Still, convergence is valuable in the common case for Awelon project (e.g. caching and reusing popular library code), so I’d be reluctant to abandon it.

Fortunately, this can be addressed by adding some artificial entropy. This can be done on a per-resource basis, so long as we have some policy to identify which resources are sensitive and thus should be protected. (In context of Awelon Object (AO) language, perhaps a dictionary word `foo` might be considered sensitive whenever a directive word `secret!foo` is defined.) Unlike Tahoe-LAFS, the secret doesn’t need to be part of the protocol; it isn’t a problem to compile it directly into the bytecode, e.g. by simply adding a random text literal then dropping it:

"sGA_1eXeWzIrmHmgmPmaBAbrW1PyC3oc
~%

Of course, there are many advantages to determinism. So, rather than random texts, I might instead favor a secret per user and use an HMAC style mechanism to deterministically generate a convergence secret per unique sensitive resource.

Posted in Distributed Programming, Language Design, Modularity, Open Systems Programming, Security | 4 Comments

Compiling ABC/AO

I’ve spent most of this month developing a more sophisticated compiler for Awelon project, based on first translating the code into a dataflow graph, then translating to Haskell. This allows easy use of Haskell’s variables, and more effective use of pure subprograms. Which, presumably, GHC knows how to optimize. At the moment, runtime performance only slightly exceeds interpreted performance for a simplistic microbenchmark. But I have yet to implement a few simple optimizations on the intermediate graph. Progress on the compiler has been painfully slow, involving a lot of backtracking and exploration of designs (though it’s only a couple thousand lines of code). Anyhow, this post isn’t primarily about performance or progress.

I’ve been thinking about how I should expose compilation to programmers. This post is just me thinking out loud. You’re welcome to join in.

Currently, I support dynamic compilation by use of a `{&compile}` annotation. Annotations must have identity type and no effect observable within the program, but are able to support debugging or tweak performance - e.g. parallel evaluation, or compilation of a block, or runtime assertions. In this case, obviously, we're compiling a block. But use of an annotation is very explicit, and can be somewhat awkward - e.g. requiring constructs like `[foo] compile inline` instead of just `foo`.

There is a good case for explicit dynamic compilation, especially in context of DSLs and metaprogramming. Leaving these to implicit just-in-time (JIT) compilation has always seemed awkward to me; often the programmer does know best. But there is also a good case for implicit compilation, based on traces or hotspots or similar.

At the moment, in context of Awelon Bytecode (ABC), I'm not sure how to go about implicit JIT. In conventional languages, source tends to have a stable name or location in memory, which is easily used to keep a counter to identify hotspots or traces. ABC is designed for streaming code, and ABC blocks don't really have any name or identity beyond their structure, and loops often involve compositions of code. Awelon Object (AO) code does support names (it's essentially a macro language for ABC) but still doesn't have recursive loops. (I'm also hesitant to compile AO directly, because development in that direction might hinder development for streaming bytecode.)

One possibility is to leverage a streaming compression and dictionary building algorithm to gradually identify long sequences of ABC code that are seen often, then compile those common sequences to support implicit JIT. I feel this is worth exploring, but it also seems like a lot of work for potentially marginal gains.

Anyhow, I'm also interested in static compilation, and easily mixing static and dynamic components. Dynamic compilation is useful in this endeavor, but I feel there is more I could do. This morning, I had another idea (which should have been obvious in retrospect):

I can use naming conventions in the AO dictionary to drive static compilation.

AO already uses naming conventions to support automatic testing. (Words prefixed with `test.` become automatic tests.) Gradual development of de-facto naming conventions - e.g. to support typechecking, documentation, property testing, optimizations or rewrite rules, exporting applications - has been part of AO's design from very near the beginning. The idea is that naming conventions do not change the formal meaning of a word (the macro expansion into ABC) but they do suggest contextual application from an IDE or larger system.

The idea is that I could use a prefix such as `#` or a suffix such as `.exe` to suggest that words of the form `#foo` or `foo.exe` should be statically precompiled (in a runtime where this is enabled) for use within the runtime. Whether this is a good idea, or ultimately just as awkward as explicit dynamic compilation, remains to be seen. But it does appeal to me. It allows appropriately coarse-grained compilation for reusable components, and doesn't hinder performance when interpreting code. It can also enable effective separate compilation, reuse, and code sharing (which might result in better cache behavior).

A potential advantage of this `#foo` approach is that it mitigates a weakness of AO: exponential growth of generated code.

The formal semantics of AO is simply the inline expansion of each word's definition until there are no more words. All that remains, at that point, is an ABC stream. Keeping these extremely simple semantics is why AO forbids recursive function definitions - there would be no limit to simple inline expansion. Loop behavior is instead expressed non-recursively by use of a fixpoint combinator (which admittedly isn't very convenient, but developers can build while/until/foreach/etc. loops above fixpoint).

Anyhow, this inline expansion means that generated bytecode tends to grow exponentially in size with depth and layers of abstraction. This can be partially mitigated for streaming code because we can expand lazily. But not all Awelon project subprograms are streamed, and in general exponential expansion can stress system memory and CPU cache. By utilizing precompiled words at expansion time (as opposed to dynamic compilation, which applies after expansion) it seems reasonable that stress on memory and cache could be reduced. Further, I could integrate this with ABC's own approach to separate compilation and linking.

Awelon project has a non-conventional and interesting approach to separate compilation, linking, and resource sharing:

Instead of importing human-named modules and functions, we can identify a software component by the secure hash of its bytecode. For example `{#ngv6gnrqki5vgnyxtodjorlrwoiinmfk74gg3ftwj7byrqratmlomruoce57}` might identify bytecode that implements a fibonacci function. Theoretically, there is an opportunity here for collision in names. But, in practice, that probability can be reduced well below the threshold of unlikely problems like cosmic rays and meteor strikes. The convention is that this invocation shall search for a resource, download and validate it if necessary, then cache and compile locally. But this convention can be tweaked for circumstance, e.g. shifting the compilation task to a trusted proxy. If the resource cannot be obtained, the program simply fails. If space is needed, old or infrequently used resources (or perhaps just their compiled objects) can be discarded from cache. Search locations for download might be extended by use of annotations.

Use of a secure hash instead of human names avoids a lot of problems: update and versioning concerns are cleanly separated, there is no cache invalidation, we can download the code from anywhere, there is no trouble using proxy repositories. The design is highly compatible with open and distributed systems and streaming code. The hash can even serve as a secure capability of sorts, depending on the protocol by which we download the resource. I believe Awelon's design is theoretically superior to most conventional approaches, but it has yet to be implemented.

In context of precompiled words, it seems feasible to start using this technique directly: code of the form `#foo` might compile to a `{#secureHash}` resource instead of expanding fully into the underlying ABC. (For debugging, this might become `{#secureHash (#foo)}`.) This would make concrete what's happening with respect to space savings, separate compilation, and reuse, i.e. because each compiled element has a fixed size in the ABC stream. Providing developers the control here could prove immediately useful. Long term, however, division of a program into optimal reusable subcomponents might be mostly automated.

So... coming back to JIT.

The earlier difficulty with JIT for ABC was the lack of a stable location or names. However, these precompiled words should serve as moderately effective JIT targets. Naively, we might simply compile each word the first time it is required. But more sophisticated approaches involving counters or whatnot are certainly feasible.
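
As a minimal sketch of the counter idea, treating precompiled words as stable JIT identities; the threshold value and trigger policy here are placeholder assumptions:

import qualified Data.Map.Strict as M

type HotCounts = M.Map String Int

-- Bump the counter for a word; report True exactly when the count
-- reaches the threshold, i.e. when the word should be handed to JIT.
observeWord :: Int -> String -> HotCounts -> (Bool, HotCounts)
observeWord threshold w m =
  let m' = M.insertWith (+) w 1 m
  in (M.findWithDefault 0 w m' == threshold, m')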

I think I'll give these precompiled words a try. Reviewing this article, it seems I favor the `#` prefix for precompiled words, perhaps due to the allusion to `{#secureHash}` resources. Hopefully, this really will help with performance by reducing the memory and cache burdens.

Addendum 30 June

I've played around with using word prefixes to guide compilation, but I find this fairly inconvenient and rigid, requiring too much editing of the dictionary to test slightly different configurations. An alternative I'm now considering is to use a separate configuration value, which simply contains a list of words to precompile (one per line).

Posted in Language Design, Modularity, Open Systems Programming | Leave a comment

Awelon Project Report VI

Progress in May was disappointing. I didn’t complete either of my primary goals towards improving JIT or typechecking for ABC. My efforts were based around compiling ABC code into an intermediate box-and-wire ‘graph’ that hides the dataflow operators and supports a variety of useful optimizations. However, it seems I aimed to accomplish too much in a single pass: an ad-hoc mix of partial evaluation, inlining, and graph writing. Cumulative complexity dragged me down. :(

I’ll be making a similar attempt this month, albeit in smaller steps. Partial evaluation, inlining, and similar will occur in separate passes. I’ll probably need a few more intermediate graph types, though implementing a lot of different types can be painful in its own way. Fortunately, this compilation to graph should contribute to my other primary goals for typechecking and reactive implementations…

On a more positive note, my secondary efforts in May were not unfruitful:

  • asynchronous computations: I implemented, tuned, and tested an `{&asynch}` annotation that marks a block for asynchronous computation (`(a→b)→(a→b)`). Under the hood, this uses unsafeInterleaveIO and an MVar, such that clients wait on the `b` value only when they attempt to observe it. There is also a finite pool of worker threads, to avoid starting more computations than we can easily handle. Each action is required to terminate, and we'll wait for all asynchronous operations to complete at implicit points in the input stream (e.g. between 'paragraphs' in a command stream, or between commands in the REPL). Errors from asynchronous operations are accumulated and reported together. (A sketch of this mechanism follows the list.)
  • asynchronous filesystem ops: all commands to observe or influence the filesystem are now asynchronous, albeit serialized for each filename
  • test.jit: added a utility to the `ao` command line to run the test suite via JIT. This currently takes an absurdly long time: over 4 minutes when first compiling the code, then 1.2s using the cached compilations. Compare to 0.18s to run the test suite via the interpreter (of which 0.1s is just loading the dictionary).
  • abc.ann: added utilities to the `ao` command line for annotated ABC, i.e. including location information from source. This could potentially help with a debugger.
  • equality assertion as annotation: in April, I modified the `>` operator to be monomorphic, operating only on numbers. However, this broke many tests. I developed a new annotation `{&≡}` (that's U+2261) that is useful for the common case of asserting equivalence. This not only simplifies testing; it should also be useful when (for example) composing maps with user-defined key comparison functions; one can assert the key comparison functions are equivalent without actually observing the equivalence. Only structural equivalence is guaranteed to pass; I don't require decidable observational equivalence.
  • design thoughts on security for open systems: I've been brainstorming a bit about homomorphic encryption and encrypted computations in open distributed systems, and how I might represent this in ABC (mostly via tokens and simple conventions). This doesn't have any immediate application, but could become a very useful idiom.
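
Here is a minimal sketch of the `{&asynch}` mechanism from the first bullet, omitting the worker pool, error accumulation, and the implicit wait points:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import System.IO.Unsafe (unsafeInterleaveIO)

-- (a→b)→(a→b): run the computation in the background; the caller
-- blocks on the result only when it attempts to observe the value.
asynch :: (a -> IO b) -> (a -> IO b)
asynch f a = do
  v <- newEmptyMVar
  _ <- forkIO (f a >>= putMVar v)
  unsafeInterleaveIO (readMVar v)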

I also learned that the roughly 0.6s overhead for loading JIT programs is one-time only (on the first load); the load cost is only a few milliseconds thereafter. This is much less a problem than I was imagining in April, when most of my tests involved a single {&compile} operation. The more significant cost now regards compilation itself, and I'll eventually need to generate code that is easy for GHC to parse and compile (and ideally reuse).

Posted in Language Design | 2 Comments

Half-Embedded DSLs

Domain Specific Languages (DSLs) come in two primary flavors: external or embedded. External DSLs have their own external tools – parsers, interpreters or compilers, optimizers, etc. – apart from any general purpose language. Embedded DSLs exist in context of a general purpose language such as JavaScript or Haskell. In practice, external DSLs tend to grow in power to cover various corner cases until they accidentally become poorly designed general purpose languages. Embedded DSLs can avoid this fate by providing escapes to the hosting language’s semantics. However, these escapes can also make it more difficult to port, reason about, optimize, serialize, or update the DSL code.

There is a sweet spot that lies between embedded and external.

For an external DSL, we are implicitly assuming a particular approach to code distribution, e.g. Alice sends Bob some DSL code, which is then parsed and interpreted by a program that Bob already possesses, which hopefully has a compatible version. This sweet spot is found by tweaking the code distribution model: Alice sends Bob some DSL code, and Alice also sends Bob the program to parse and interpret (and optimize) this code. Alice can then address corner cases by updating her interpreter, and further retains all the advantages of external DSLs.

In this case, the interpreter is sent using a common general purpose language or bytecode, and the client provides a simple API. If you’ve worked with many JavaScript frameworks, you’ve probably encountered this pattern, with the DOM serving as the API. A script section doesn’t need to be JavaScript, and might be interpreted by other JavaScript code.
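
To make the pattern concrete, here is a minimal sketch in Haskell; `fetchInterpreter` stands in for whatever cached, hash-indexed loading mechanism the client provides:

-- A half-embedded DSL payload: the DSL code travels together with a
-- reference to its interpreter, named by secure hash.
data HalfEmbedded = HalfEmbedded
  { interpreterRef :: String -- secure hash of the interpreter's bytecode
  , dslSource      :: String -- the DSL code itself
  }

runHalfEmbedded :: (String -> IO (String -> IO ())) -- assumed fetch/compile
                -> HalfEmbedded -> IO ()
runHalfEmbedded fetchInterpreter (HalfEmbedded ref src) = do
  interpret <- fetchInterpreter ref -- cached & compiled once per hash
  interpret src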

Of course, design at this sweet spot presents a few performance challenges.

First, those parsers/interpreters/compilers/optimizers can be relatively heavy – themselves having hundreds of kilobytes up to a few megabytes of code. Alice doesn’t want to send the full interpreter every time a web-app is loaded… and Bob, similarly, doesn’t want to parse and compile and link a big interpreter every time. If a DSL is popular across dozens of websites, Bob also doesn’t want to load dozens of copies to accommodate his tab addiction. We need a good way to cache and reuse the interpreter code, at which point Bob can afford to optimize it heavily and use it hundreds of times.

Second, we ideally want the DSL code itself to operate with near-native performance. This suggests we need some sort of just-in-time or dynamic compilation, i.e. such that we can compile a DSL by translating it into the host language or bytecode. People care about performance. It would undermine the use of DSLs if developers had to jump through hoops to squeeze excellent performance from them.

In conventional PLs, interpreters tend to become named packages, and dynamic compilation is relatively rare or awkward. So this approach to DSLs seems to be rare outside of JavaScript. And, unfortunately, JavaScript is not a great target language (e.g. with respect to concurrency, security, serialization and distribution of first-class functions), and conventional URLs are not very nice as cross-domain reusable modules for bulky interpreter code. A simple secure hash of the resource would be much better: acyclic, origin independent, no cache invalidation.

I believe we’d have much to gain by developing and promoting a bytecode where these half-embedded DSLs can thrive, ideally together with an Internet with services, databases, and browsers programmable in this manner. My Awelon Bytecode (ABC) is designed for this role. Of course, JavaScript has all the advantages of incumbency and will be exceedingly difficult to dethrone, despite its weaknesses.

Posted in Language Design | 2 Comments

Awelon Progress Report V

Performance remains a priority.

At the beginning of this month, I reported some success with compiling the AO dictionary to Haskell ahead-of-time. However, after just a few days of using aoExec, I was feeling very sick of large up-front compilation times after tweaking the AO dictionary. Further, I realized I need some equivalence between `aoExec` and `aoi`, especially with respect to their powerblock and available effects, if testing in one is to carry over to the other.

On April 9th, I decided to discard `aoExec` and try a dynamic compiler based on Haskell’s plugins. Besides offering a more dynamic development environment, this approach should be much better than `aoExec` for metaprogramming.

I only required a few changes, but I (foolishly) decided to reorganize and rewrite most of the ao libraries. On the bright side, the rewrite netted significant (and rather unexpected) performance gains for my interpreter. The `bench.repeat100k` microbenchmark that before took 2.4s to interpret (and 0.4s when compiled), now takes 0.56s to interpret. The `ao test` that before took 1.38s, now takes 0.155s to interpret (though, it before took 0.012s when compiled into `aoTest`).

Dynamic compilation is now supported.

In AO, developers can use the annotation `{&compile}` to compile a block. The Haskell plugin is a module generated from the raw ABC code, and uniquely named based on a cryptographic hash of said ABC code. The type passed across the module boundary is `newtype Resource = Resource (forall m . (Runtime m, Monad m) ⇒ V m → m (V m))`, leveraging GHC's Rank2Types extension.

This works. As in: it compiles, it links, it executes and returns correct results. Simple `$c]` tail calls are working fine. But:

  • It takes 0.8s to compile the identity function, and 1.86s to compile `bench.repeat100k`.
  • Upon execution, the compiled form (for this microbenchmark) is approximately 65% the speed of the interpreted bytecode.
  • Object code is never garbage collected after load.

Still, I'm optimistic! I can't do much for the latency, but I could compile in parallel and hot-swap a block used in a loop, and perhaps avoid rebuilding code if it already exists. Due to a bug in GHC, I can't fix the garbage collection before GHC 7.8.1 (I currently use 7.6.3, which ships with Ubuntu). But, when the time comes, I might be able to leverage System.Mem.Weak ephemeron tables or foreign pointers to address garbage collection. Most importantly, I can improve the generated code.

The Haskell code I generate is currently extremely simplistic and naive. For example, the bytecode `vrwlc` expands into:

{-# LANGUAGE NoImplicitPrelude #-}
module Rynms52btpb4zpgpwot32aqx2mzgkkxe4serh3c4tht3rcieaj4pjjbywahbe
    (resource) where
import ABC.Imperative.Operations
import ABC.Imperative.Resource
import Control.Monad

resource :: Resource
resource = Resource (v>=>r>=>w>=>l>=>c)

I have a big list of ways to improve the generated code. For example, by building a Haskell expression rather than directly composing operations, I could reduce this to `resource = Resource $ \ (P a b) -> return (P b a)`. I suspect I can achieve an order of magnitude improvement for sophisticated programs. Additionally, with a little partial evaluation, I should be able to directly recognize fixpoint cycles and model loops in forms GHC can work with effectively. If an ABC program is highly repetitive (large common subprograms), it might also be useful to recognize these subprograms and reuse them to reduce memory footprint.

I'll be giving these improvements a try over the next couple weeks.

Leaping to Awelon Project's process model

I've spent a lot of time this month thinking about process models. In the last report, I mentioned, "I’ll need to develop the `aoExec` powerblock a fair bit to perform useful side-effects". Initially, I was laboring under an assumption that I'd pursue a fully imperative approach to get started, then go about implementing Awelon project's model (a toplevel ABC stream that manipulates a 'live' RDP program). However, imperative processes are an awkward fit for AO's constraints (spatial idempotence, causal commutativity, progress, etc.). I can model imperative processes if I'm willing to sacrifice causal commutativity, or accept lockstep concurrency, or implement a time-warp system. But I'm not willing to do the first two, and the last is even more sophisticated than RDP.

My decision is to move forward with the Awelon project model, albeit tweaked slightly: we'll still manipulate a representation of an RDP program, but the act of 'installing' the RDP behavior will be an explicit imperative action (albeit, an idempotent and commutative action). This risks one of my design principles: that "ALL long-running behaviors and policies should have corresponding state in the environment." However, it also allows much greater precision for how we interpret the RDP program, how much we recompute, and eliminates risk of trying to interpret invalid intermediate states. Presumably, some environments might couple these update actions more tightly to avoid behaviors falling out of synch with their representations. But, for now, that will be left to discipline.

Ultimately, both `ao exec` and `aoi` can use this same model. They already receive the same toplevel stream of AO or ABC. I haven't even started implementing the RDP interpretation of ABC, since I'm still focused on performance in the imperative layer. Fortunately, I don't expect any special challenge: Sirea was already a prototype for this sort of development.

Overall, I think this change is moving my timeline in a nice direction: I won't be detouring through imperative process models or development of associated words and idioms. I'll get more practice implementing RDP for when I fully bootstrap.

Miscellaneous

  1. The `>` ABC operator is now monomorphic, operating only on numbers. This change was suggested to me because `>` was the only remaining polymorphic operator that requires type information in simplistic implementations of ABC.
  2. The list type changed from `µL.(1+(a*L))` to `µL.((a*L)+1)`, mostly because operator `V` makes it a lot easier to add elements on the left than on the right (fewer operators for 'cons'). This change had a pretty small impact, requiring only a few tweaks to list processing words.
  3. New utility `ao exec` offers a similar role to the now defunct `aoExec`. Though `ao exec` is more convenient because it receives an ad-hoc AO command.
  4. New utility `ao exec.s` can receive a stream of AO via standard input. It's basically the non-interactive `aoi`.
  5. New utility `ao exec.abc` can receive raw ABC code as a command. And `ao exec.abc.s` can receive ABC via standard input.
  6. New utility `ao jit` outputs the Haskell code that will be generated for dynamic compilation of a given program.
  7. New utilities `ao list`, `ao defs`, `ao uses`, and `ao def` are useful for studying the AO dictionary without the text editor.
  8. The old `ao type` utility is currently disabled and needs to be rewritten. I have new ideas for it anyway.

(Aside: I'm thinking to implement some command-line refactoring utilities, e.g. to rename words or update definitions.)

It has been a productive month. I've spent most of it 'in the flow'. However, there wasn't much development of new AO code.

What's Next?

Performance is the top priority for me. I don't want AO to remain a 'toy' language for another six months. I'll be spending the next couple weeks aiming to improve the generated Haskell code. I imagine I'll be content, if not happy, to achieve an order of magnitude improvement for loopy programs.

Beyond that, I'm interested in developing a more sophisticated static type inference - i.e. to properly detect structural fixpoint (µ) types, and track correctness under multiple sets of assumptions, and perhaps determine a conservative set of 'gatekeeper' safety predicates upon entry to a subprogram. I also have some ideas on how to develop `type.` and `typeOf.` words in AO for automatic testing and documentation purposes. (It'd be nice to assert that certain arguments are polymorphic, or to develop something like QuickCheck based on type descriptors.)

And, of course, I need to develop the RDP interpretation of ABC.

I don't have a solid timeline for all this, but my best guess would be: two more weeks on performance, two weeks on AO type inference, then get back to RDP integration in June (after which time I'll be able to try some long-running applications). I'd like to also develop the dictionary some, but I don't have much hope of getting back to it this month.

Time is flying! Hopefully, by October, I'll be ready to implement real programs (e.g. video games, web servers) and use this technology in real practice.

Posted in Language Design | 2 Comments

Awelon Progress Report IV

I spent most of March attempting different approaches to improve AO’s performance. This isn’t what I set out to do. At the end of February, I wrote: “My plans for the next month are to start building functions for data structures: maps/records, sequences, streams.” And I did make a little progress in that direction – but not nearly enough to pretend I met my goals. Performance remains a pain point for me; I really wish to start developing applications within AO – such as an IDE, so I can ‘dogfood’ the development of AO. But that’s simply impractical if the performance of the resulting application is inadequate. I fell to the siren song of performance pursuits.

Early in the month, I finally cleared a few tabs from my browser, reading Laurence Tratt’s excellent article, Fast Enough VMs in Fast Enough Time (highly recommended to anyone developing a PL), which describes how PyPy can construct a JIT compiler from an interpreter. I tried my hand at this for ABC, but I suspect I’ll need a more explicit memory model to meet PyPy’s restricted python (RPython) constraints. I might get back to this later. At the moment, the PyPy interpreter for ABC has approximately the same performance as the aoi interpreter in Haskell.

I had more success with a second attempt:

I’ve implemented a simple AO to Haskell translator. From the AO package, `ao dict2hs` will convert the entire dictionary into one Haskell module, AODict, using a simple name mangling scheme. This relies on an externally provided AOPrelude module for definitions of the ABC primitives. This is a very naive, direct translation, but the AOPrelude can (and does) include GHC rewrite rules. The resulting (profiled) Haskell object file is about 12 megabytes for 1141 words, or about a third of that for non-profiled code. (The raw Haskell code is 242kB, whereas the AO dictionary is 144kB; much of this is documentation (`doc.`) words.) The only tricky bit was ensuring AO can use Haskell’s TCO, which I eventually achieved by specializing the `$c` sequence. This development comes with a few new utilities for running AO tests and executing AO programs (`aoTest` and `aoExec`).

The performance of this translation seems pretty good for “straight-line” code. For example, `aoTest` and `ao test` run the same tests, but the former runs two orders of magnitude faster (0.012s vs. 1.38s). I was very excited by these numbers. Sadly, they didn’t generalize. Upon implementing a simple loop (`@bench.repeat100k 0 [4 .add] 100000 repeat 400000 assertEQ`) I discovered a performance factor closer to 6 (from 2.42s to 0.40s). Further, I suspect that the improvement will prove very marginal for deep metaprogramming (where partial evaluation and JIT become more essential).

Despite the limits of this technique, it may be enough to begin dogfooding further development of AO.

So, that’s my plan this April. I’ll need to develop the `aoExec` powerblock a fair bit to perform useful side-effects, but that should not take long. I will need to test the `{&async}` annotation, which I implemented in the AOPrelude. I’m hoping, within the next month or two, to develop a few simple web applications mostly within AO, and eventually shift development of AO from filesystem + gedit into a persistent, wiki-like service. As AO’s performance further improves, more domains should open up to it.

Posted in Language Design | 5 Comments

Awelon Progress Report III

Another month gone, so quickly. What the heck did I do? It’s difficult to recall, but my git history is helping.

It seems my major accomplishments this month were:

  • implemented automatic unit testing for the AO dictionary (runs all `test.` words)
  • implemented an automatic type analysis based on partial evaluation
  • implemented a few list processing functions (e.g. nub, partition, sort, map, each)
  • simplified and optimized fixpoint combinator (15 ABC prims, from 80)
  • cleaned up the conditional behavior words (e.g. `if` and `if_`).
  • partially developed an incremental process model (based on `µP.[a→(P*b)]`; sketched below)
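
The incremental process model in the last bullet is essentially a Mealy machine. A minimal sketch:

-- µP.[a→(P*b)]: a process consumes one input, yielding its
-- continuation along with one output.
newtype Process a b = Process (a -> (Process a b, b))

-- Run a process over a list of inputs, threading the continuation.
runProcess :: Process a b -> [a] -> [b]
runProcess _ [] = []
runProcess (Process step) (a:as) =
  let (p', b) = step a in b : runProcess p' as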

Not too bad, but I can’t say I’m entirely satisfied. I had an aha-moment idea for direct compilation of ABC into a Haskell monad (to improve bootstrap performance) but that ran into a wall. After picking up the pieces and removing the debris, all I had left of that effort was the typechecker. I also ended up removing adverbs, after experimenting with the idea and finding I needed more specialization for context and category (e.g. lists vs. streams vs. automata).

Ah, six steps forward, two steps back. That’s progress. :-/

My plans for the next month are to start building functions for data structures: maps/records, sequences, streams. And to further fill out the set for lists. I hope to get far enough along that I can start modeling grammars in April. At the moment, I have 844 words in AO, of which slightly more than half (440) are documentation (`doc.`), tests (`test.`), or assertions (`id.` or `eqv.`). I don’t know how many words I’ll need to bootstrap an ABC or AO compiler – quite a few, I imagine, but hopefully fewer than 10k. Regardless, I need to pick up the pace.

More subjectively, I quite enjoy programming in my new language.

The REPL plus automatic tests mix is very effective. I’d still like some automatic visualization and continuous feedback tools, and I very much look forward to rendering colors/styles instead of suffixes/prefixes for large words. But I’m not desperate for these features. AO has been delightful in its simplicity, security, safety, and flexibility. The first-class nature of the environment gives me a much greater feeling of ‘control’ than I’ve felt in other languages. Going back to Haskell is starting to feel a little painful.

My main pain point for AO right now is the awful interpreted performance. I’ll be a lot happier with AO when I can compete with other languages for performance.

Posted in Language Design | 4 Comments

Substructural Types in ABC

Awelon Bytecode (ABC) has two operators, `k` and `f`, to support modeling of substructural constraints. In ABC, `k` will mark a block no-drop (relevant), and `f` will mark it no-copy (affine). If both marks are applied, the block is effectively linear. These properties can be enforced dynamically if necessary (i.e. halt the program upon attempted violation), but the intention is that they should (as much as possible) be checked statically.

A ‘block’ is simply a first-class function in ABC. Though, unlike first-class functions in many languages, a block in ABC always has a serializable form as a finite sequence of ABC – e.g. `[rwrzw+l]` will add two numbers on the current stack of AO's standard environment. The primary operations on blocks are to compose two blocks, and to apply a block to a value.

o :: (y→z) * ((x→y) * e) → (x→z) * e -- compose (concatenate blocks)
$ :: (x→y) * (x * e) → (y * e)  -- apply block to just `x`; hides `e`
k :: (x→y) * e → (x→y)' * e -- mark block no-drop (relevant)
f :: (x→y) * e → (x→y)' * e -- mark block no-copy (affine)
^ :: (Copyable x) ⇒ x * e → x * (x * e) -- copy
% :: (Droppable x) ⇒ x * e → e -- drop

Sadly, I don't have a good pseudo-type-descriptor notation for affine and relevant attributes at the moment. Suggestions are welcome.

Anyhow, the `no-copy` property simply means that the copy operator `^` may not be applied to the value. And the `no-drop` property means that `%` cannot be applied. If we mark a first-class addition function relevant, it would serialize simply as `[rwrzw+l]k`. When composing blocks, the composite inherits the no-drop and no-copy properties from the operands, i.e. if we compose affine with relevant the result is linear. The only way to remove a relevant or linear block from the environment is to apply it using `$`.
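
A minimal sketch of how a runtime might track these marks on serialized blocks; the record representation here is illustrative, not ABC's actual implementation:

data Block = Block
  { blockCode :: String -- serialized ABC, e.g. "rwrzw+l"
  , noCopy    :: Bool   -- affine: f was applied
  , noDrop    :: Bool   -- relevant: k was applied
  }

-- Composition (operator o): the composite inherits both marks, so
-- affine composed with relevant yields a linear block.
composeB :: Block -> Block -> Block
composeB yz xy = Block
  { blockCode = blockCode xy ++ blockCode yz
  , noCopy    = noCopy xy || noCopy yz
  , noDrop    = noDrop xy || noDrop yz
  }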

Why Substructural Types?

Substructural properties are very expressive for enforcing various kinds of structured behavior independently of a structured syntax. This is a useful separation of concerns, which leads to simpler code and reasoning.

By 'structured behavior' I mean for example enforcing that some operation occurs 'at least once' or 'at most once' or 'exactly once' or 'before/after/between some other operations'. A typical example might be enforcing that a resource (e.g. an open file handle) is released exactly once and not used after release. But we can look at much richer examples, e.g. protocols, games, trade models, or workflows.

Conventional languages have a relatively rigid structured syntax. Fortunately, clever developers have consistently been able to leverage this structure to enforce, or at least discipline, higher level structured behaviors. For example, if we want to enforce that a file is released, we might develop a `withFile` function of (pseudo-Haskell) form:

withFile :: (MonadIO m) ⇒ (FileHandle → m a) → (FilePath → m a)

And this `withFile` function will acquire the file resource, apply the given function, then release the file resource. We could also develop a variation with MonadException to properly bracket the action for a monad that models exceptions.

Unfortunately, the `withFile` behavior is still not entirely safe! In `withFile action "foo.txt"`, our `action` might fork another thread to process the handle, or install a callback or return a closure or exception that contains the handle. In these cases, we might accidentally use the file handle even after the file has been released. The `withFile` pattern requires some discipline to use correctly, and implementation knowledge of the `action`, both of which tax our limited mental budgets.

So what can we do?

Well, one option is to develop a syntax - or EDSL (e.g. a monad or category) - that can control aliasing and enforce safe use of the file handle and similar `(Acquire Use* Release)` pattern resources. My intuition is that this wouldn't be especially difficult for a single resource at a time - e.g. executing `action` within some variation of the StateT or ReaderT monad transformer should work well enough. Leveraging algebraic effects (or heterogeneous lists, or dependent types) we might effectively mix operations on multiple resources.

Developing an EDSL or structured syntax takes its own form of discipline, but it only needs to happen once per resource-pattern. It's a decent approach if you wish to pursue it in a closed system. However, it can be difficult to integrate such an EDSL with independently developed frameworks.

Substructural types offer a simple - almost simplistic - alternative that greatly reduces the need for structure in the syntax. If we want an `(Acquire Use* Release)` pattern, we can almost model this directly. Example API:

type File (affine)
type OpenFile (linear)
type Command (regular)
acquire :: File → OpenFile
use :: OpenFile → Command → (OpenFile * Result)
release :: OpenFile → File

With this API, we can easily reason that an open file cannot be forgotten, that it is either closed or accessible. More importantly, our reasoning is context-free, beyond assuming that our context understands and enforces substructural types. We aren't constrained structurally - e.g. we can easily use an OpenFile together with callbacks and coroutines or mashup with other ad-hoc resources, so long as the protocols are compatible. Additionally, we can reason that a file is not used by more than one thread, assuming we constrain how the `File` type is acquired in the first place (e.g. via a FileSystem object).

In ABC, objects such as files would be modeled using blocks, to which we can pass messages/commands. When serialized, an open file might look like: `[{openFile:42}]kf`. Text within {} braces is called 'capability text' and cannot be forged from within ABC. In open systems, capability text would generally be encrypted or include an HMAC to ensure security.

Files have a relatively simple, albeit common, life cycle pattern.

But much richer workflows, protocols, even games are possible. Substructural types combined with basic structured types and value sealing (or ADTs) are capable of enforcing any behavior protocol a context-free grammar can describe. If we also have dependent types, we can address context-sensitive behavior patterns. Of course, substructural types don't hinder modeling of monads, categories, EDSLs, etc., which still serve a useful role for staged computing.

Substructure for Concurrency

ABC is a non-strict, implicitly concurrent language/bytecode. Relevantly, if you apply some ABC of the form:

-- apply two functions in parallel! (cf. '***' in Control.Arrow)
rzwz$wzwr$wl :: ((x1→y1)*(x2→y2)) * ((x1*x2) * e)  → ((y1*y2)*e)

Then we can compute y1 and y2 independently and in parallel, even if the functions are effectful. This property is also called causal commutativity - i.e. two operations commute if they don't have a direct data dependency. Substructural types in ABC serve a significant role in ensuring deterministic behavior, by partitioning resources so that parallel subprograms never operate on the same resource. A rich resource partitioning model can be expressed using substructural types.

This isn't a new idea. Other languages with similar properties include ParaSail, Clean, and Rust, and the languages that influenced them.

ABC also requires a related property for any effects model, which I call spatial idempotence: if the same function is applied twice to the same argument, it must return the same result and have no additional effect. Mathematical idempotence: ∀f . f f = f. Spatial idempotence: ∀f . dup [f] apply swap [f] apply = [f] apply dup. Together, spatial idempotence and causal commutativity offer ABC many of the equational reasoning benefits of pure code. (Basically, I get everything but dead-code elimination: I cannot eliminate a potentially effectful application simply because I ignore the result.) However, in practice, it is not easy to enforce spatial idempotence for arbitrary side-effects. Substructural types also play a role here: the spatial idempotence constraint cannot be violated (and is thus satisfied) if we forbid 'dup' for effects on otherwise problematic resources. Substructural types are, hence, essential if ABC is to work with conventional imperative effects.

(ABC is designed primarily for a new paradigm, RDP, for which spatial idempotence and causal commutativity are far more natural. Requiring these properties universally - even for programs in the imperative paradigm - is a boon for independent optimizers and analysis tools.)

Linear Structure for Simplified Memory Management

Memory is also a resource. Some people have pursued linear types to help control use of memory resources, e.g. to eliminate need for garbage collection without creating an undue burden of detail and discipline on the developer. Henry Baker's paper on Lively Linear Lisp comes to mind, as does the Cyclone language with reification of memory regions.

Simplified memory management is NOT a significant motivation for ABC's linear structure. But, as an additional benefit, it isn't bad. :)

If ABC's copy operator `^` is implemented using a deep-copy under the hood, then we really can eliminate garbage, and ABC's primitive functions can be implemented using mutators. If copy is instead implemented using an aliased reference, then we'll still need garbage collection and more conventional allocation of persistent data structures. I'm interested in exploring mixed/hybrid approaches based on global program analysis, and also the possibility of using separate heaps whenever I parallelize large computations.

ABC (and the thin human language layered above it, AO) tend to encourage linear programming idioms, minimizing use of copy and drop and favoring moves and data shufflers. That said, it isn't especially difficult to statically optimize copy and drop, e.g. by eliminating dead copies.

As an aside: structural typing lends itself more readily than nominative typing to substructural constraints. If we have generic nominative types like `struct Point[a] { x : a, y : a, z : a }`, it becomes difficult to pick it apart to operate on just `x` or just the `x y` pair. Structural types allow us to pick apart objects, operate on the pieces, and reassemble them, maintaining substructural properties at every intermediate step - i.e. correctness by construction instead of by analysis. Similar can be said for parameters to functions, and why tacit programming tends to serve well for substructural types.

Posted in Language Design, Types | Leave a comment