Awelon Progress Report VII

The two months since my last report have been productive, albeit not in the directions I had intended. Much of my work was exploratory. I’ve gained little of lasting value, but at least I’ve gained a little.

In June, I developed a graph-based intermediate representation for compiling Awelon Bytecode (ABC) into Haskell, with the idea of eliminating intermediate data-plumbing (e.g. the `lrwzvc` operations) and perhaps supporting a fair bit of partial evaluation (e.g. so `#42` evaluates to the number 42). My efforts here were moderately successful. I was executing microbenchmarks in about 40% of the time taken by the previous naive JIT. Though, this isn’t so impressive when I reframe it as a mere 20% improvement over the interpreter. Unfortunately, I never did get tail calls working properly in the resulting Haskell code.
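
To illustrate the kind of partial evaluation I mean, here is a toy Haskell sketch; the `Node` IR is purely illustrative, not my actual representation. ABC builds numbers by `#` (introduce zero) followed by digits (multiply by ten and add), so a simple pass can fold those sequences into constants.

import Data.Char (digitToInt, isDigit)

-- simplified IR for illustration: opaque operators plus folded constants
data Node = Op Char | Num Integer deriving Show

-- fold ABC pseudo-literal numbers (e.g. `#42`) into single constants
foldNums :: String -> [Node]
foldNums ('#':cs) = go 0 cs
  where go n (c:cs') | isDigit c = go (10 * n + toInteger (digitToInt c)) cs'
        go n rest                = Num n : foldNums rest
foldNums (c:cs) = Op c : foldNums cs
foldNums []     = []

Here `foldNums "#42"` yields `[Num 42]`.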

I gradually grew frustrated. I attribute my frustration to an ad-hoc mix of the long edit-compile-test cycle in Haskell, the pathetic performance of the existing JIT code, the fact that I was not making progress developing the AO dictionary, and my inability to precisely compile subprograms.

By late June, I had a set of ideas for moving forward:

  • shift responsibility for the ABC to Haskell compiler into AO (i.e. to further develop the AO dictionary)
  • enable AO to specify separate compilation on a per-word level, i.e. so words can act as ‘hard’ components
  • enable JIT modules to reference one another, i.e. so we can better reuse memory resources
  • properly unload JIT modules, e.g. between ‘paragraphs’, so they don’t accumulate indefinitely in memory

Enabling modules to reference one another was easiest, so I did that in a couple hours – simply shifting all modules into a single root directory (albeit with a toplevel prefix like `Awx.Byz.` to avoid a super-wide flat directory). Modules are already named via secure hash of the compiled bytecode. Presumably, I could now compile code of form `{#secureHashOfBytecode}` to import the appropriate Haskell module.

My initial idea for separate compilation in AO was to compile words starting with `#`, e.g. the word `#foo` would be compiled separately. I implemented this idea without much difficulty, introducing an additional Precompile step. In retrospect, the `#` prefix criterion proved aesthetically ugly and painful to use, requiring whole-dictionary refactoring just to tweak performance. The ‘use prefixes’ idea also composes and extends poorly, since new prefixes would bury older ones. After some brainstorming (annotations? separate file? dictionary driven?) I eventually decided to try a ‘directive words’ concept: if you define `compile!foo`, the AO system will separately compile the word `foo` (if it exists). I still have mixed feelings about this, but it works well enough.
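
For a sense of what the directive scan involves, here is a minimal Haskell sketch, assuming the dictionary is simply a map from word to definition; the names are mine:

import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)
import qualified Data.Map as M

-- find words marked for separate compilation via `compile!foo` directives,
-- keeping only those whose target word actually exists in the dictionary
precompileTargets :: M.Map String def -> [String]
precompileTargets dict =
    filter (`M.member` dict) $ mapMaybe (stripPrefix "compile!") (M.keys dict)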

At some point during these developments, I got very side-tracked by network implications.

Awelon’s approach to separate compilation and linking is non-conventional in order to support open distributed systems and streaming code. Naming reusable subprograms by {#secureHash} can save a great deal of bandwidth and more effectively reuse memory. And programmer-directed separate compilation of select words in the AO dictionary offers an effective jump start to use of these resources in the short term. (I hadn’t expected to make use of the separate compilation model this early in AO’s development.) I got to wondering about cloud services, especially in their role as content distribution networks. If we want to store ABC resources with an untrusted proxy or CDN, we must encrypt the bytecode and keep the decryption key in the name. I wrote an article about this idea, achieving Tahoe-LAFS-style provider-independent security for ABC resources. Further, if we’re going to encrypt the code, we can’t really compress it afterwards, so we should probably compress beforehand to save storage and bandwidth. ABC code should compress remarkably well. But compression algorithms seem to introduce a lot of ambiguity in their decision making (e.g. where to break up blocks)… which could be problematic for deterministic resource naming. I experimented with a few ideas. I’m still not sure which compression algorithm I’d favor.

I eventually tabled the whole compression and encryption effort for now, using the identity function for both compression and encryption. But I’m sold on the idea, and I plan to proceed with it eventually. ABC resources encrypted by default should prove an enormous boon for distributed programming.

I didn’t make any progress at all on re-implementing the ABC-to-Haskell compiler in AO, nor have I gotten around to properly unloading JIT plugins (plugins are disabled at the moment). I’m starting to wonder whether compiling to Haskell is a dead end. The interpreter is reasonably fast, and perhaps I should be shifting efforts towards a true bootstrap, e.g. targeting C or OpenCL or LLVM or JavaScript.

Secondary Efforts

While compilation and JIT were the bulk of my efforts, I’ve been doing a little design work on the side.

Back in April, I mentioned that I wish to also work on automatic type checking, i.e. to detect those silly errors that often pop up when we’re editing code. I’ve been keen on trying pluggable type systems. Awelon project implies a certain degree of structural and substructural typing (e.g. numbers, pairs, sums, unit, blocks, affine, relevant, sealed values). But I think we can go a lot further by making sets of ‘type hypotheses’, e.g. recognizing that some structure is essentially used as a list or a stack.

My current idea is to represent multiple type systems in a single AO dictionary, using a simple naming convention (a prefix) to identify type systems. Each type system will have some opportunity to output results about each program – e.g. types, failures, indecision, errors and warnings. These reports are returned to the programmers, who can decide what to do with them (ignoring is a possibility, e.g. if an experimental type system is known to give bad results). In a good IDE, type feedback for multiple type systems may be continuous as we make edits on a word.
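
One possible Haskell shape for these per-word reports, purely illustrative rather than a committed API:

-- each pluggable type system produces a verdict plus diagnostics per word
data Verdict ty = Typed ty | Failed String | Indecisive
data Report ty = Report
    { verdict  :: Verdict ty
    , errors   :: [String]
    , warnings :: [String]
    }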

This might be combined with a generic approach to type declarations, whereby we specify that some block of code should have a given type via a standard ‘type system’ API. The main purpose here is the ability to declare parametric polymorphism and similar properties that are otherwise difficult to infer.

Another idea I’m working on is embedded literal objects, which extend Awelon project’s literal types beyond simple numbers and text, and may prove very useful as a basis for embedded DSLs and rich data description.

What’s Next?

Right now, more than anything else, I need to grow my AO dictionary.

My work on that front has fallen way behind in the last several months. A 392-octet microbenchmark isn’t enough to observe the memory reuse and caching benefits of separate compilation and linking. And, realistically, I need a much richer dictionary before I can effectively approach bootstrap or JIT. Further, ABC has one remaining avenue toward parsimony and performance that I haven’t even started to develop: Awelon Bytecode Deflated (ABCD), which uses higher UTF-8 bytecodes to predefine a standard dictionary of popular or performance-critical functions. Developing ABCD requires a massive and maturing ABC code base.

I’d like eventually (this year) to work on the RDP interpretation of ABC. Fortunately, RDP isn’t too difficult to implement, except for integrating external systems (filesystems, databases, sensors and actuators, etc.), which can be somewhat painful. RDP will take some time to make useful, and I’d rather not spend the next couple months in Haskell runtime code. (Especially since I’d also like to gradually deprecate the Haskell runtime and bootstrap a new one.)

My focus for at least the next six weeks will be the dictionary, which I’ll grow by developing simple projects. I’ll certainly need data structure support if I’m ever to advance my bootstrap efforts.


Embedded Literal Objects

When a PL is designed for a text editor, it is natural that text is the richest embedded literal. The text editor provides the visualization and update HCI. And text can certainly be useful; beyond modeling a subset of human input or output, it can model embedded DSLs such as regular expressions. But text is ineffective for many kinds of data we might wish to express in our programs – e.g. 3D meshes, logos, diagrams, graphs, grammars, matrices, decision trees, scene graphs, sound models, stage data in a 2D video game, dance notation, skeletal animation, spreadsheets, and so on.

Conventionally, working with these richer models requires external tools – e.g. Maya, Blender, Photoshop. However, use of separate tools has many disadvantages, such as poor support for abstraction, partial evaluation, and integration. It can also be inconvenient to load separate tools for a quick peek or edit.

A few not-so-conventional languages, such as Pure Data, are designed for work on a canvas and provide support for diagrams as literals. And then there are fully interactive programmable systems, such as the Croquet project, where we manipulate first-class objects. The latter has advantages for flexibility and programmability, but also disadvantages with respect to entangling the code with the runtime.

I’ve been interested for a while in supporting flexible literal types for my Awelon project. Last week, I was inspired toward a simple model that I call embedded literal objects (ELO). An ELO is effectively just a block with some conventions surrounding it for display and update, leveraging edit-time computation. Relevant bullets:

  • represented by block of pure bytecode embedded in source code
  • fixpoint closure type, e.g. `µELO.(Command→(ELO+OtherResult))` (see the sketch after this list)
  • commands are either updates or queries
  • IDE processes updates with ELO and writes resulting ELO into source code
  • queries can obtain menus of commands, render to a canvas, or access data
  • development of commands and queries left to convention
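
A minimal Haskell reading of that fixpoint closure type (a sketch, not a committed design; `cmd` and `r` stand in for Command and OtherResult):

-- an ELO consumes a command and either steps to a new ELO (update)
-- or produces some other result (query)
newtype ELO cmd r = ELO { runELO :: cmd -> Either (ELO cmd r) r }

-- toy example: a counter object with one update and one query
data CounterCmd = Inc | Get
counter :: Int -> ELO CounterCmd Int
counter n = ELO $ \cmd -> case cmd of
    Inc -> Left (counter (n + 1))  -- update: yields the next ELO
    Get -> Right n                 -- query: yields a result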

Like text literals, an ELO can be copied and pasted, version controlled, and readily reused or re-edited across projects. There is no entanglement with the program context. Unlike text literals, an ELO potentially has much richer structure for easy composition, decomposition, and abstraction. We’re likely to develop ELOs that measure many megabytes in size. Though, a lot of redundant logic – e.g. for command menus or rendering – could be compressed via the bytecode-layer linking model.

For current Awelon project languages, the proposed serialization of ELOs is quite trivial:

〚raw awelon bytecode here〛          in AO
[raw awelon bytecode here]{&elo}l   in ABC

The dedicated syntax for ELOs in Awelon Object (AO) language helps the IDE recognize them for special treatment. I’m using the unicode white variant square brackets: U+301A and U+301B. The resulting bytecode is just a plain old block with an extra annotation `{&elo}` indicating its nature for bytecode level debuggers or extraction. The final `l` operation (type `(a*(s*e))→((a*s)*e)`) shifts the embedded literal object onto the AO stack like all other AO literals.
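
In Haskell terms, `l` is nothing more than a pure pair shuffle:

-- the `l` operator, modeling (a*(s*e))→((a*s)*e)
l :: (a, (s, e)) -> ((a, s), e)
l (a, (s, e)) = ((a, s), e)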

I love the idea that, modulo partial evaluation, these ad-hoc literal objects remain recognizable at the Awelon Bytecode (ABC) layer. This is valuable for both debugging and extraction at the bytecode layer, and also is consistent with ABC’s existing approach to pseudo-literal numbers (e.g. `#42` is a sequence of three bytecodes that generate the number forty-two on the stack) and embedded text (shorthand for a list of small integers). When programs fall to pieces, comprehension starts at the bedrock, which for Awelon project is ABC. Every little bit of consistency, simplicity, transparency, and legibility at the bytecode level will help developers.

A new ELO could easily be generated from the AO dictionary as it stands. An AO IDE might also provide a menu of ELOs by searching for words starting with prefix `elo.`.

Processing ELOs

For text-based embedded DSLs, we typically use a command to interpret or compile the text. For example:

"x y z → y^2 + x^2 - |y|" doTheMath

Of course, text works well for only a small subset of DSLs, such as regular expressions. Math really isn’t among them if we want rich symbology, matrices, big division bars, and so on. An Awelon system can potentially optimize this behavior by systematic application of partial evaluation, i.e. such that we do not process the string on every execution.

With ELOs, we have greater flexibility. An ELO has its own logic, data, and can represent both rich and abstract data structures. We might ask an ELO to compile itself, or extract a value, or query a specific field, or to render itself at runtime. An ELO exposes the exact same set of commands to the runtime as it does to the IDE. ELOs can be treated as simple values, copied as many times as needed.

Updating ELO Types

Individual ELOs can be updated via the available commands. But how do we update the set of commands? For example, if we have an ‘image’ type with a canvas and a bunch of editing commands, how do we add new brushes and commands? I can think of several approaches:

  1. extensible command set: ELO command sets may be extensible and replaceable by convention, via specific commands
  2. explicit import/export: ELOs may provide an ‘export’ command that is readily imported by the next version of the ELO
  3. adapter pattern: wrap the ELO in an adapter that can translate commands and responses, at cost of performance
  4. extract from ELO: bring debuggers to bear upon the bytecode and surgically extract the information we care about
  5. extract from history: record commands that updated an ELO, and replay them on the new version of the ELO
  6. IDE layer extension: recognize common ELO types (e.g. based on query for type info) and install command sets at IDE layer

I suspect the answer to the ELO type update problem will be some ad-hoc mix of all of the above. The first two are preferable, and are ultimately achieved through convention and best practices in ELO design. The adapter pattern isn’t ideal, but can work well enough if not layered too deeply. The last few lean heavily on the integrated development environment, but might serve as effective fallbacks. Having a history of commands on ELOs would be useful anyway for extracting user macros into reusable subprograms.
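
Reusing the ELO sketch from earlier, the adapter pattern amounts to a straightforward wrapper (again, illustrative only):

-- translate new commands to old ones, and old results to new ones
adapt :: (cmd' -> cmd) -> (r -> r') -> ELO cmd r -> ELO cmd' r'
adapt fc fr elo = ELO $ \cmd' -> case runELO elo (fc cmd') of
    Left elo' -> Left (adapt fc fr elo')  -- keep wrapping across updates
    Right r   -> Right (fr r)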

In any case, the update problem is analogous to modifying text literals for an embedded DSL after a language version update. Developers will systematically update all instances until they get bored of doing so and start designing with forward compatibility in mind. At that point, they’ll update only the literals that need updating. Likely, there will be a recognizable ELO maturation life cycle.

The Way Forward

At the moment, my Awelon Object language is still trapped in a text editor, so ELOs won’t do me much good yet.

My intention is to eventually upgrade to editing code via a web browser, perhaps in a wiki-like environment. ELOs aren’t my only goal; I’m also interested in hyperlinks, live programming, automatic visualization of the tacit environment (perhaps from live tests), interactive documentation (tutorials, etc.), and so on. I think ELOs will become much more viable in such an environment.

There are a lot of details remaining. For example, I’ll want to develop an API for rendering to canvas, possibly inspired by HTML or SVG. Animations and sound are interesting areas for literals; I’d love the ability to ‘hear’ whatever sound model I’m looking at, or see animations running in a loop. For 3D or larger 2D structures, the ability to rotate, pan, zoom, etc. could be useful.


LZW-GC Streaming Compression

LZW is a simple compression algorithm – certainly the simplest and most deterministic I could find in my investigations. And here I use ‘deterministic’ in the sense that there are no implementation-dependent decisions, ambiguities, smarts or heuristics. LZW has only one real decision to make: dictionary size. And for my application, which requires highly deterministic compression, I’m willing to fix the dictionary size.

But LZW has a weakness: after the dictionary is filled, no new patterns are recognized. And the dictionary, at this point, is very poorly utilized: it holds many patterns that are never used.

There are existing approaches to mitigate this weakness, e.g. including a ‘clear code’ to reset the dictionary, or compressing chunks at a time. However, these tend to require some heuristic decisions that undermine LZW’s wonderful determinism. They also tend to lose old but valid patterns.

In this article, I propose a simple alternative: garbage collect the dictionary with an exponential decay factor.

A preliminary (but flawed) approach:

  1. To every element in the extended dictionary, add a match count.
  2. Increment match count on match (even when we don’t output the token).
  3. When the dictionary is full, output the current match token & garbage collect.
  4. To garbage collect (GC), remove zero-match elements & cut match counts in half.

LZW is quite optimistic when growing its dictionary. In practice, we can expect that many entries go unmatched, and thus will be available for GC. LZW attempts to grow its dictionary after every output token. Thus, we should expect to perform a GC roughly every dictionary-size outputs. The exponential decay factor results from cutting down match counts in half, thus favoring newer patterns over older ones.

The flaw of this approach is that not all patterns have equal opportunity to find matches. Information about patterns discovered just before GC is more likely to be lost. This flaw can be mitigated by incremental GC. We can GC exactly one token just before every allocation. In a round-robin fashion, we can halve the match count of tokens until we find one with zero count. That’s the one token we collect.
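
Here is a sketch of that incremental GC step in Haskell, assuming match counts live in an IntMap keyed by token; the details are illustrative:

import qualified Data.IntMap.Strict as IM

-- starting from scan position i, halve match counts round-robin until a
-- zero-count token is found; return the freed token, the decayed counts,
-- and the next scan position
gcOne :: Int -> Int -> IM.IntMap Int -> (Int, IM.IntMap Int, Int)
gcOne dictSize i counts =
    case IM.findWithDefault 0 i counts of
        0 -> (i, counts, (i + 1) `mod` dictSize)
        n -> gcOne dictSize ((i + 1) `mod` dictSize)
                   (IM.insert i (n `div` 2) counts)

Because each visit halves a count, some token always reaches zero within a bounded number of passes.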

This can potentially result in unusual cases where we ‘match’ patterns that never existed in the original input, i.e. because the dictionary has some intermediate nodes that were collected. This is unlikely but okay; it doesn’t hurt anyone. The dictionary doesn’t change between outputs, and we’ll always be able to decode any token the way the compressor meant it.

To decode, for each input token we recursively increase the ‘match count’ for that token and all its dependencies. We also perform a round-robin GC after every input, and keep track of the free token. None of this needs to be very sophisticated: we’re simply reproducing the same dictionary used to generate the output.

This extended form of LZW, which I’ll just call LZW-GC, is very similar in some ways to sliding-window compression algorithms. But I think the dictionary approach is simpler, especially on the encoder side. Theoretically, LZW-GC should be superior to a sliding window of equivalent size because LZW-GC can make effective use of older patterns that are long gone from the window. To complete LZW-GC, we should follow it with a Huffman encoding to compress frequent tokens. This is independent of GC at the LZW layer.

Where’s the Code? The code is available on github, along with compression results.

It turns out that LZW-GC is about the same as LZW in most use cases, but significantly better for a few very, very large inputs. I’ll need to explore some ways to speed up dictionary construction, such as adapting LZAP (a variant of LZW).


ABC Linking with Provider Independent Security

As I’ve mentioned in other posts, Awelon Bytecode (ABC) has a non-conventional approach to code reuse. A valid sequence of bytecode can be given a deterministic, cryptographically unique name – e.g. a secure hash. In ABC, the code to invoke this external bytecode resource might be represented with `{#secureHashOfBytecode}`. Logically, the identified bytecode is inlined in place. In practice, this indirection provides a fine opportunity for separate compilation and caching.

In the case of distributed systems, a secure hash offers the potential advantage of location independence. You can download the identified code from anywhere, then validate it against the secure hash. Unfortunately, the code itself is exposed to whomever stores it, and in general code can contain IP-sensitive information. So we cannot upload the code just anywhere. With just a secure hash, we’re restricted to trusted hosts for storing code. I would prefer a property the Tahoe-LAFS community describes as provider independent security – i.e. that the host cannot inspect the data. This way, we can cheaply rent a few untrusted cloud servers to handle the bulk of data distribution.

It turns out that only a small tweak to ABC’s naming system is needed to support provider independence:

  • use the secure hash of the bytecode as a symmetric encryption key
  • use the secure hash of the cipher text as the lookup key
  • store the cipher text, indexed by its lookup key
  • access via `{#secureHashOfCipherText:secureHashOfBytecode}`

I’ve studied computer security a lot from the PL perspective, but I don’t consider myself an expert in cryptography. To help validate this solution, I compare it to Tahoe’s approach. Essentially, I need a tiny subset of Tahoe’s features: I don’t need mutable files, I don’t need erasure coding or K-of-N recovery, and I don’t need large multi-segment files (I can decompose code at the ABC layer).

Tahoe’s documentation describes: “The capability contains the encryption key, the hash of the Capability Extension Block, and any encoding parameters necessary to perform the eventual decoding process. For convenience, it also contains the size of the file being stored.” And here the Capability Extension Block is derived from hashing the ciphertext – albeit, with a layer of indirection to support incremental loading of multi-segment files.

ABC’s model for a bytecode resource capability is very similar. The main difference is that ABC’s encryption key is determined from the bytecode content. And there is no capability extension block indirection, no segmentation at that layer. If there is a weakness in this design, it’s because the encryption key is easily computed from content. But, assuming the hash really is secure and independent of the encryption algorithm, I don’t believe that’s a feasible attack.

Having the final capability text be deterministic from content is valuable for interning, caching, reuse of popular software components (templates, frameworks, libraries, etc.). I would be reluctant to give it up.

Recently, I’ve been deciding on a few concrete encodings.

I’m thinking to use SHA3-384 as the basic secure hash algorithm, now using the first 192 bits for secureHashOfCipherText and the second 192 bits for secureHashOfBytecode. This should maintain the full original uniqueness, or very nearly so: if our encryption function were the identity, we’d recover exactly the original 384-bit secure hash. Each set of 192 bits will be encoded as 32 characters of base64url. AES is still the standard for encryption, and does support a 192-bit key.
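
As a concreteness check, here is a minimal sketch of the naming pipeline, assuming the cryptonite, memory, and base64-bytestring Haskell packages. The key split follows the description above, but the helper names and the choice of deterministic CTR mode are my own assumptions:

import qualified Crypto.Hash as H
import Crypto.Cipher.AES (AES192)
import Crypto.Cipher.Types (cipherInit, ctrCombine, nullIV)
import Crypto.Error (throwCryptoError)
import qualified Data.ByteArray as BA
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Base64.URL as B64

sha3_384 :: BS.ByteString -> BS.ByteString
sha3_384 = BA.convert . H.hashWith H.SHA3_384

-- produce ({#secureHashOfCipherText:secureHashOfBytecode}, cipherText)
nameResource :: BS.ByteString -> (BS.ByteString, BS.ByteString)
nameResource bytecode =
    let (_, keyBits)   = BS.splitAt 24 (sha3_384 bytecode)    -- second 192 bits: AES-192 key
        cipher         = throwCryptoError (cipherInit keyBits) :: AES192
        cipherText     = ctrCombine cipher nullIV bytecode    -- deterministic encryption
        (lookupKey, _) = BS.splitAt 24 (sha3_384 cipherText)  -- first 192 bits: lookup key
        token = BS.concat [ B8.pack "{#", B64.encode lookupKey
                          , B8.pack ":", B64.encode keyBits, B8.pack "}" ]
    in (token, cipherText)

Note the fixed IV: because the key is derived from the (unique) plaintext, no keystream is reused across distinct messages, and the determinism is exactly what convergence requires.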

Compressing the bytecode prior to encryption is potentially a worthy investment. Doing so could save on storage and network overheads. It’s infeasible to compress after encryption. And a primary consideration is that the compression algorithm should be highly deterministic (almost no room for choices at compression time). (Addendum: after investigation, LZ78 and LZW seem promising, especially together with Huffman or Polar encoding.)

Anyhow, my original plan was to just use `{#secureHash}` resources and use a distinct encoding and pipeline for the encrypted variation. Now, I’m leaning towards just using the encrypted, provider-independent security naming convention at all times, even when the object never leaves the local machine. Decryption, decompression, and compilation are essentially one-time costs (if we use a good cache) and compilation will tend to dominate. The overhead for provider independent security is essentially negligible.

Convergence Secrets (Addendum)

Zooko – a primary developer on Tahoe-LAFS – notes in comments below that convergent encryption has a known security weakness to ‘confirmation’ attacks if the objects contain sensitive low-entropy information. Actually, confirmation attacks are a potential problem even without the encryption, but they become a few orders of magnitude more serious with encryption because we’ll be exposed to ‘offline’ attacks. Still, convergence is valuable in the common case for Awelon project (e.g. caching and reusing popular library code), so I’d be reluctant to abandon it.

Fortunately, this can be addressed by adding some artificial entropy. This can be done on a per-resource basis, so long as we have some policy to identify which resources are sensitive and thus should be protected. (In context of Awelon Object (AO) language, perhaps a dictionary word `foo` might be considered sensitive whenever a directive word `secret!foo` is defined.) Unlike Tahoe-LAFS, the secret doesn’t need to be part of the protocol; it isn’t a problem to compile it directly into the bytecode, e.g. by simply adding a random text literal then dropping it:

"sGA_1eXeWzIrmHmgmPmaBAbrW1PyC3oc
~%

Of course, there are many advantages to determinism. So, rather than random texts, I might instead favor a secret per user and use an HMAC-style mechanism to deterministically generate a convergence secret per unique sensitive resource.
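
A sketch of that derivation, again assuming the cryptonite package; the function name is mine:

import Crypto.Hash.Algorithms (SHA3_384)
import Crypto.MAC.HMAC (HMAC, hmac)
import qualified Data.ByteString as BS

-- derive a deterministic convergence secret for a sensitive resource from a
-- single per-user secret; the result would be embedded as the dropped text
-- literal shown above
convergenceSecret :: BS.ByteString -> BS.ByteString -> HMAC SHA3_384
convergenceSecret userSecret resourceName = hmac userSecret resourceName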


Compiling ABC/AO

I’ve spent most of this month developing a more sophisticated compiler for Awelon project, based on first translating the code into a dataflow graph, then translating to Haskell. This allows easy use of Haskell’s variables, and more effective use of pure subprograms, which, presumably, GHC knows how to optimize. At the moment, runtime performance only slightly exceeds interpreted performance for a simplistic microbenchmark. But I have yet to implement a few simple optimizations on the intermediate graph. Progress on the compiler has been painfully slow, involving a lot of backtracking and exploration of designs (though it’s only a couple thousand lines of code). Anyhow, this post isn’t primarily about performance or progress.

I’ve been thinking about how I should expose compilation to programmers. This post is just me thinking out loud. You’re welcome to join in.

Currently, I support dynamic compilation by use of a `{&compile}` annotation. Annotations must have identity type and no effect observable within the program, but are able to support debugging or tweak performance – e.g. parallel evaluation, or compilation of a block, or runtime assertions. In this case, obviously, we’re compiling a block. But use of an annotation is very explicit, and can be somewhat awkward – e.g. requiring constructs like `[foo] compile inline` instead of just `foo`.

There is a good case for explicit dynamic compilation, especially in context of DSLs and metaprogramming. Leaving these to implicit just-in-time (JIT) compilation has always seemed awkward to me; often the programmer does know best. But there is also a good case for implicit compilation, based on traces or hotspots or similar.

At the moment, in context of Awelon Bytecode (ABC), I’m not sure how to go about implicit JIT. In conventional languages, source tends to have a stable name or location in memory, which is easily used to keep a counter to identify hotspots or traces. ABC is designed for streaming code, and ABC blocks don’t really have any name or identity beyond their structure, and loops often involve compositions of code. Awelon Object (AO) code does support names (it’s essentially a macro language for ABC) but still doesn’t have recursive loops. (I’m also hesitant to compile AO directly, because development in that direction might hinder development for streaming bytecode.)

One possibility is to leverage a streaming compression and dictionary building algorithm to gradually identify long sequences of ABC code that are seen often, then compile those common sequences to support implicit JIT. I feel this is worth exploring, but it also seems like a lot of work for potentially marginal gains.

Anyhow, I’m also interested in static compilation, and easily mixing static and dynamic components. Dynamic compilation is useful in this endeavor, but I feel there is more I could do. This morning, I had another idea (which should have been obvious in retrospect):

I can use naming conventions in the AO dictionary to drive static compilation.

AO already uses naming conventions to support automatic testing. (Words prefixed with `test.` become automatic tests.) Gradual development of de-facto naming conventions – e.g. to support typechecking, documentation, property testing, optimizations or rewrite rules, exporting applications – has been part of AO’s design from very near the beginning. The idea is that naming conventions do not change the formal meaning of a word (the macro expansion into ABC), but they do suggest contextual application from an IDE or larger system.

The idea is that I could use a prefix such as `#` or a suffix such as `.exe` to suggest that words of the form `#foo` or `foo.exe` should be statically precompiled (in a runtime where this is enabled) for use within the runtime. Whether this is a good idea, or ultimately just as awkward as explicit dynamic compilation, remains to be seen. But it does appeal to me. It allows appropriately coarse-grained compilation for reusable components, and doesn’t hinder performance when interpreting code. It can also enable effective separate compilation, reuse, and code sharing (which might result in better cache behavior).

A potential advantage of this `#foo` approach is that it mitigates a weakness of AO: exponential growth of generated code.

The formal semantics of AO is simply the inline expansion of each word’s definition until there are no more words. All that remains, at that point, is an ABC stream. Keeping these extremely simple semantics is why AO forbids recursive function definitions – there would be no limit to simple inline expansion. Loop behavior is instead expressed non-recursively by use of a fixpoint combinator (which admittedly isn’t very convenient, but developers can build while/until/foreach/etc. loops above fixpoint).
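
In Haskell terms, the style looks like this sketch: the repetition lives in an explicit fixpoint combinator rather than in a named recursive definition.

import Data.Function (fix)

-- express a bounded loop without recursive definitions; `fix` supplies
-- the loop its own continuation
repeatN :: Int -> (a -> a) -> a -> a
repeatN n g = fix (\loop k -> if k <= 0 then id else g . loop (k - 1)) n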

Anyhow, this inline expansion means that generated bytecode tends to grow exponentially in size with depth and layers of abstraction. This can be partially mitigated for streaming code because we can expand lazily. But not all Awelon project subprograms are streamed, and in general exponential expansion can stress system memory and CPU cache. By utilizing precompiled words at expansion time (as opposed to dynamic compilation, which applies after expansion) it seems reasonable that stress on memory and cache could be reduced. Further, I could integrate this with ABC’s own approach to separate compilation and linking.

Awelon project has a non-conventional and interesting approach to separate compilation, linking, and resource sharing:

Instead of importing human-named modules and functions, we can identify a software component by the secure hash of its bytecode. For example `{#ngv6gnrqki5vgnyxtodjorlrwoiinmfk74gg3ftwj7byrqratmlomruoce57}` might identify bytecode that implements a fibonacci function. Theoretically, there is an opportunity here for collision in names. But, in practice, that probability can be reduced well below the threshold of unlikely problems like cosmic rays and meteor strikes. The convention is that this invocation shall search for a resource, download and validate it if necessary, then cache and compile locally. But this convention can be tweaked for circumstance, e.g. shifting the compilation task to a trusted proxy. If the resource cannot be obtained, the program simply fails. If space is needed, old or infrequently used resources (or perhaps just their compiled objects) can be discarded from cache. Search locations for download might be extended by use of annotations.

Use of a secure hash instead of human names avoids a lot of problems: update and versioning concerns are cleanly separated, there is no cache invalidation, we can download the code from anywhere, there is no trouble using proxy repositories. The design is highly compatible with open and distributed systems and streaming code. The hash can even serve as a secure capability of sorts, depending on the protocol by which we download the resource. I believe Awelon’s design is theoretically superior to most conventional approaches, but it has yet to be implemented.

In context of precompiled words, it seems feasible to start using this technique directly: code of the form `#foo` might compile to a `{#secureHash}` resource instead of expanding fully into the underlying ABC. (For debugging, this might become `{#secureHash (#foo)}`.) This would make concrete what’s happening with respect to space savings, separate compilation, and reuse, i.e. because each compiled element has a fixed size in the ABC stream. Providing developers the control here could prove immediately useful. Long term, however, division of a program into optimal reusable subcomponents might be mostly automated.

So… coming back to JIT.

The earlier difficulty with JIT for ABC was the lack of a stable location or names. However, these precompiled words should serve as moderately effective JIT targets. Naively, we might simply compile each word the first time it is required. But more sophisticated approaches involving counters or whatnot are certainly feasible.
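
For illustration, a naive counter-based trigger over precompiled words might look like the following sketch; the threshold policy and names are hypothetical:

import Data.IORef
import qualified Data.Map.Strict as M

type HotCounter = IORef (M.Map String Int)

-- bump the counter for a word; returns True exactly once, when the word
-- crosses the JIT threshold
bumpAndCheck :: Int -> HotCounter -> String -> IO Bool
bumpAndCheck threshold ref w = atomicModifyIORef' ref $ \m ->
    let n = 1 + M.findWithDefault 0 w m
    in (M.insert w n m, n == threshold)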

I think I’ll give these precompiled words a try. Reviewing this article, it seems I favor the `#` prefix for precompiled words, perhaps due to the allusion to `{#secureHash}` resources. Hopefully, this really will help with performance by reducing the memory and cache burdens.

Addendum 30 June

I’ve played around with using word prefixes to guide compilation, but I find this fairly inconvenient and rigid, requiring too much editing of the dictionary to test slightly different configurations. An alternative I’m now considering is to use a separate configuration value, which simply contains a list of words to precompile (one per line).

Posted in Language Design, Modularity, Open Systems Programming | Tagged , | Leave a comment

Awelon Progress Report VI

Progress in May was disappointing. I didn’t complete either of my primary goals towards improving JIT or typechecking for ABC. My efforts were based around compiling ABC code into an intermediate box-and-wire ‘graph’ that hides the dataflow operators and supports a variety of useful optimizations. However, it seems I aimed to accomplish too much in a single pass: an ad-hoc mix of partial evaluation, inlining, and graph writing. Cumulative complexity dragged me down. :(

I’ll be making a similar attempt this month, albeit in smaller steps. Partial evaluation, inlining, and similar will occur in separate passes. I’ll probably need a few more intermediate graph types, though implementing a lot of different types can be painful in its own way. Fortunately, this compilation to graph should contribute to my other primary goals for typechecking and reactive implementations…

On a more positive note, my secondary efforts in May were not unfruitful:

  • asynchronous computations: I implemented, tuned, and tested an `{&asynch}` annotation that marks a block for asynchronous computation (`(a→b)→(a→b)`); see the sketch after this list. Under the hood, this uses unsafeInterleaveIO and an MVar, such that clients wait on the `b` value only when they attempt to observe it. There is also a finite pool of worker threads, to avoid starting more computations than we can easily handle. Each action is required to terminate, and we’ll wait for all asynchronous operations to complete at implicit points in the input stream (e.g. between ‘paragraphs’ in a command stream, or between commands in the REPL). Errors from asynchronous operations are accumulated and reported together.
  • asynchronous filesystem ops: all commands to observe or influence the filesystem are now asynchronous, albeit serialized for each filename
  • test.jit: added a utility to the `ao` command line to run the test suite via JIT. This currently takes an absurdly long time: over 4 minutes when first compiling the code, then 1.2s using the cached compilations. Compare to 0.18s to run the test suite via interpreter (of which 0.1s is just loading the dictionary).
  • abc.ann: added utilities to the `ao` command line for annotated ABC, i.e. including location information from source. This could potentially help with a debugger.
  • equality assertion as annotation: in April, I modified the `>` operator to be monomorphic, operating only on numbers. However, this broke many tests. I developed a new annotation `{&≡}` (that’s U+2261) that is useful for the common case of asserting equivalence. This not only simplifies testing; it should also be useful when (for example) composing maps with user-defined key comparison functions; one can assert the key comparison functions are equivalent without actually observing the equivalence. Only structural equivalence is guaranteed to pass; I don’t require decidable observational equivalence.
  • design thoughts on security for open systems: I’ve been brainstorming a bit about homomorphic encryption and encrypted computations in open distributed systems, and how I might represent this in ABC (mostly via tokens and simple conventions). This doesn’t have any immediate application, but could become a very useful idiom.
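
The essence of the `{&asynch}` mechanism, as a small sketch; I use a plain forkIO here where the real implementation draws from the finite worker pool:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import System.IO.Unsafe (unsafeInterleaveIO)

-- run an action asynchronously; the caller blocks only when it first
-- demands the resulting value
asynch :: (a -> IO b) -> a -> IO b
asynch f a = do
    result <- newEmptyMVar
    _ <- forkIO (f a >>= putMVar result)
    unsafeInterleaveIO (readMVar result)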

I also learned that the roughly 0.6s overhead for loading JIT programs is a one-time cost (on the first load); the load cost is only a few milliseconds thereafter. This is much less of a problem than I was imagining in April, when most of my tests involved a single {&compile} operation. The more significant cost now regards compilation itself, and I’ll eventually need to generate code that is easy for GHC to parse and compile (and ideally reuse).


Half-Embedded DSLs

Domain Specific Languages (DSLs) come in two primary flavors: external or embedded. External DSLs have their own external tools – parsers, interpreters or compilers, optimizers, etc. – apart from any general purpose language. Embedded DSLs exist in context of a general purpose language such as JavaScript or Haskell. In practice, external DSLs tend to grow in power to cover various corner cases until they accidentally become poorly designed general purpose languages. Embedded DSLs can avoid this fate by providing escapes to the hosting language’s semantics. However, these escapes can also make it more difficult to port, reason about, optimize, serialize, or update the DSL code.

There is a sweet spot that lies between embedded and external.

For an external DSL, we are implicitly assuming a particular approach to code distribution, e.g. Alice sends Bob some DSL code, which is then parsed and interpreted by a program that Bob already possesses, which hopefully has a compatible version. This sweet spot is found by tweaking the code distribution model: Alice sends Bob some DSL code, and Alice also sends Bob the program to parse and interpret (and optimize) this code. Alice can then address corner cases by updating her interpreter, and further retains all the advantages of external DSLs.

In this case, the interpreter is sent using a common general purpose language or bytecode, and the client provides a simple API. If you’ve worked with many JavaScript frameworks, you’ve probably encountered this pattern, with the DOM serving as the API. A script section doesn’t need to be JavaScript, and might be interpreted by other JavaScript code.

Of course, design at this sweet spot presents a few performance challenges.

First, those parsers/interpreters/compilers/optimizers can be relatively heavy – themselves having hundreds of kilobytes up to a few megabytes of code. Alice doesn’t want to send the full interpreter every time a web-app is loaded… and Bob, similarly, doesn’t want to parse and compile and link a big interpreter every time. If a DSL is popular across dozens of websites, Bob also doesn’t want to load dozens of copies to accommodate his tab addiction. We need a good way to cache and reuse the interpreter code, at which point Bob can afford to optimize it heavily and use it hundreds of times.

Second, we ideally want the DSL code itself to operate with near-native performance. This suggests we need some sort of just-in-time or dynamic compilation, i.e. such that we can compile a DSL by translating it into the host language or bytecode. People care about performance. It would undermine the use of DSLs if developers had to jump through hoops to squeeze excellent performance from them.

In conventional PLs, interpreters tend to become named packages, and dynamic compilation is relatively rare or awkward. So this approach to DSLs seems to be rare outside of JavaScript. And, unfortunately, JavaScript is not a great target language (e.g. with respect to concurrency, security, serialization and distribution of first-class functions), and conventional URLs are not very nice as cross-domain reusable modules for bulky interpreter code. A simple secure hash of the resource would be much better: acyclic, origin independent, no cache invalidation.

I believe we’d have much to gain by developing and promoting a bytecode where these half-embedded DSLs can thrive, ideally together with an Internet with services, databases, and browsers programmable in this manner. My Awelon Bytecode (ABC) is designed for this role. Of course, JavaScript has all the advantages of incumbency and will be exceedingly difficult to dethrone, despite its weaknesses.


Awelon Progress Report V

Performance remains a priority.

At the beginning of this month, I reported some success with compiling the AO dictionary to Haskell ahead-of-time. However, after just a few days of using aoExec, I was feeling very sick of large up-front compilation times after tweaking the AO dictionary. Further, I realized I need some equivalence between `aoExec` and `aoi`, especially with respect to their powerblock and available effects, if testing in one is to carry over to the other.

On April 9th, I decided to discard `aoExec` and try a dynamic compiler based on Haskell’s plugins. Besides offering a more dynamic development environment, this approach should be much better than `aoExec` for metaprogramming.

I only required a few changes, but I (foolishly) decided to reorganize and rewrite most of the ao libraries. On the bright side, the rewrite netted significant (and rather unexpected) performance gains for my interpreter. The `bench.repeat100k` microbenchmark that previously took 2.4s to interpret (and 0.4s when compiled) now takes 0.56s to interpret. The `ao test` suite that previously took 1.38s now takes 0.155s to interpret (though it took 0.012s when compiled into `aoTest`).

Dynamic compilation is now supported.

In AO, developers can use the annotation `{&compile}` to compile a block. The Haskell plugin is a module generated from the raw ABC code, and uniquely named based on a cryptographic hash of said ABC code. The type passed across the module boundary is `newtype Resource = Resource (forall m . (Runtime m, Monad m) ⇒ V m → m (V m))`, leveraging GHC’s Rank2Types extension.

This works. As in: it compiles, it links, it executes and returns correct results. Simple `$c]` tail calls are working fine. But:

  • It takes 0.8s to compile the identity function, and 1.86s to compile `bench.repeat100k`
  • Upon execution, the compiled form (for this microbenchmark) is approximately 65% the speed of the interpreted bytecode.
  • Object code is never garbage collected after load.

Still, I’m optimistic! I can’t do much for the latency, but I could compile in parallel and hot-swap a block used in a loop, and perhaps avoid rebuilding code if it already exists. Due to a bug in GHC, I can’t fix the garbage collection before GHC 7.8.1 (I currently use 7.6.3, which ships with Ubuntu). But, when the time comes, I might be able to leverage System.Mem.Weak ephemeron tables or foreign pointers to address garbage collection. Most importantly, I can improve the generated code.

The Haskell code I generate is currently extremely simplistic and naive. For example, the bytecode `vrwlc` expands into:

{-# LANGUAGE NoImplicitPrelude #-}
module Rynms52btpb4zpgpwot32aqx2mzgkkxe4serh3c4tht3rcieaj4pjjbywahbe
    (resource) where
import ABC.Imperative.Operations
import ABC.Imperative.Resource
import Control.Monad

resource :: Resource
resource = Resource (v>=>r>=>w>=>l>=>c)

I have a big list of ways to improve the generated code. For example, by building a Haskell expression rather than directly composing operations, I could reduce this to `resource = Resource $ \ (P a b) -> return (P b a)`. I suspect I can achieve an order of magnitude improvement for sophisticated programs. Additionally, with a little partial evaluation, I should be able to directly recognize fixpoint cycles and model loops in forms GHC can work with effectively. If an ABC program is highly repetitive (large common subprograms), it might also be useful to recognize these and reuse them to reduce memory footprint.

I’ll be giving these improvements a try over the next couple weeks.

Leaping to Awelon Project’s process model

I’ve spent a lot of time this month thinking about process models. In the last report, I mentioned, “I’ll need to develop the `aoExec` powerblock a fair bit to perform useful side-effects”. Initially, I was laboring under an assumption that I’d pursue a fully imperative approach to get started, then go about implementing Awelon project’s model (a toplevel ABC stream that manipulates a ‘live’ RDP program). However, imperative processes are an awkward fit for AO’s constraints (spatial idempotence, causal commutativity, progress, etc.). I can model imperative processes if I’m willing to sacrifice causal commutativity, or accept lockstep concurrency, or implement a time-warp system. But I’m not willing to do the first two, and the last is even more sophisticated than RDP.

My decision is to move forward with the Awelon project model, albeit tweaked slightly: we’ll still manipulate a representation of an RDP program, but the act of ‘installing’ the RDP behavior will be an explicit imperative action (albeit, an idempotent and commutative action). This risks one of my design principles: that “ALL long-running behaviors and policies should have corresponding state in the environment.” However, it also allows much greater precision for how we interpret the RDP program, how much we recompute, and eliminates risk of trying to interpret invalid intermediate states. Presumably, some environments might couple these update actions more tightly to avoid behaviors falling out of synch with their representations. But, for now, that will be left to discipline.

Ultimately, both `ao exec` and `aoi` can use this same model. They already receive the same toplevel stream of AO or ABC. I haven’t even started implementing the RDP interpretation of ABC, since I’m still focused on performance in the imperative layer. Fortunately, I don’t expect any special challenge: Sirea was already a prototype for this sort of development.

Overall, I think this change is moving my timeline in a nice direction: I won’t be detouring through imperative process models or development of associated words and idioms. I’ll get more practice implementing RDP for when I fully bootstrap.

Miscellaneous

  1. The `>` ABC operator is now monomorphic, operating only on numbers. This change was suggested to me because `>` was the only remaining polymorphic operator that requires type information in simplistic implementations of ABC.
  2. The list type changed from `µL.(1+(a*L))` to `µL.((a*L)+1)`, mostly because operator `V` makes it a lot easier to add elements on the left than on the right (fewer operators for ‘cons’). This change had a pretty small impact, requiring only tweaks to a few list processing words.
  3. New utility `ao exec` offers a similar role to the now defunct `aoExec`. Though `ao exec` is more convenient because it receives an ad-hoc AO command.
  4. New utility `ao exec.s` can receive a stream of AO via standard input. It’s basically the non-interactive `aoi`.
  5. New utility `ao exec.abc` can receive raw ABC code as a command. And `ao exec.abc.s` can receive ABC via standard input.
  6. New utility `ao jit` outputs the Haskell code that will be generated for dynamic compilation of a given program.
  7. New utilities `ao list`, `ao defs`, `ao uses`, and `ao def` are useful for studying the AO dictionary without the text editor.
  8. The old `ao type` utility is currently disabled and needs to be rewritten. I have new ideas for it anyway.

(Aside: I’m thinking to implement some command-line refactoring utilities, e.g. to rename words or update definitions.)

It has been a productive month. I’ve spent most of it ‘in the flow’. However, there wasn’t much development of new AO code.

What’s Next?

Performance is the top priority for me. I don’t want AO to remain a ‘toy’ language for another six months. I’ll be spending the next couple weeks aiming to improve the generated Haskell code. I imagine I’ll be content, if not happy, to achieve an order of magnitude improvement for loopy programs.

Beyond that, I’m interested in developing a more sophisticated static type inference – i.e. to properly detect structural fixpoint (µ) types, and track correctness under multiple sets of assumptions, and perhaps determine a conservative set of ‘gatekeeper’ safety predicates upon entry to a subprogram. I also have some ideas on how to develop `type.` and `typeOf.` words in AO for automatic testing and documentation purposes. (It’d be nice to assert that certain arguments are polymorphic, or to develop something like QuickCheck based on type descriptors.)

And, of course, I need to develop the RDP interpretation of ABC.

I don’t have a solid timeline for all this, but my best guess would be: two more weeks on performance, two weeks on AO type inference, then get back to RDP integration in June (after which time I’ll be able to try some long-running applications). I’d like to also develop the dictionary some, but I don’t have much hope of getting back to it this month.

Time is flying! Hopefully, by October, I’ll be ready to implement real programs (e.g. video games, web servers) and use this technology in real practice.


Awelon Progress Report IV

I spent most of March attempting different approaches to improve AO’s performance. This isn’t what I set out to do. At the end of February, I wrote: “My plans for the next month are to start building functions for data structures: maps/records, sequences, streams.” And I did make a little progress in that direction – but not nearly enough to pretend I met my goals. Performance remains a pain point for me; I really wish to start developing applications within AO – such as an IDE, so I can ‘dogfood’ the development of AO. But that’s simply impractical if the performance of the resulting application is inadequate. I fell to the siren song of performance pursuits.

Early in the month, I finally cleared a few tabs from my browser, reading Laurence Tratt’s excellent article, Fast Enough VMs in Fast Enough Time (highly recommended to anyone developing a PL), which describes how PyPy can construct a JIT compiler from an interpreter. I tried my hand at this for ABC, but I suspect I’ll need a more explicit memory model to meet PyPy’s restricted Python (RPython) constraints. I might get back to this later. At the moment, the PyPy interpreter for ABC has approximately the same performance as the aoi interpreter in Haskell.

I had more success with a second attempt:

I’ve implemented a simple AO to Haskell translator. From the AO package, `ao dict2hs` will convert the entire dictionary into one Haskell module, AODict, using a simple name mangling scheme. This relies on an externally provided AOPrelude module for definitions of the ABC primitives. This is a very naive, direct translation, but the AOPrelude can (and does) include GHC rewrite rules. The resulting (profiled) Haskell object file is about 12 megabytes for 1141 words, or about a third of that for non-profiled code. (The raw Haskell code is 242kB, while the AO dictionary is 144kB. Much of this is documentation (`doc.`) words.) The only tricky bit was ensuring AO can use Haskell’s TCO, which I eventually achieved by specializing the `$c` sequence. This development comes with a few new utilities for running AO tests and executing AO programs (`aoTest` and `aoExec`).

The performance of this translation seems pretty good for “straight-line” code. For example, `aoTest` and `ao test` run the same tests, but the former runs two orders of magnitude faster (0.012s vs. 1.38s). I was very excited by these numbers. Sadly, they didn’t generalize. Upon implementing a simple loop (`@bench.repeat100k 0 [4 .add] 100000 repeat 400000 assertEQ`), I discovered a performance factor closer to 6 (from 2.42s to 0.40s). Further, I suspect that the improvement will prove very marginal for deep metaprogramming (where partial evaluation and JIT become more essential).

Despite the limits of this technique, it may be enough to begin dogfooding further development of AO.

So, that’s my plan this April. I’ll need to develop the `aoExec` powerblock a fair bit to perform useful side-effects, but that should not take long. I will need to test the `{&async}` annotation, which I implemented in the AOPrelude. I’m hoping, within the next month or two, to develop a few simple web applications mostly within AO, and eventually shift development of AO from filesystem + gedit into a persistent, wiki-like service. As AO’s performance further improves, more domains should open up to it.


Awelon Progress Report III

Another month gone, so quickly. What the heck did I do? It’s difficult to recall, but my git history is helping.

It seems my major accomplishments this month were:

  • implemented automatic unit testing for the AO dictionary (runs all `test.` words)
  • implemented an automatic type analysis based on partial evaluation
  • implemented a few list processing functions (e.g. nub, partition, sort, map, each)
  • simplified and optimized the fixpoint combinator (15 ABC primitives, down from 80)
  • cleaned up the conditional behavior words (e.g. `if` and `if_`).
  • partially developed an incremental process model (based on `µP.[a→(P*b)]`; see the sketch after this list)
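
A direct Haskell reading of that process type, as a sketch:

-- µP.[a→(P*b)]: each step consumes an input, yields an output and the
-- next incarnation of the process
newtype Process a b = Process { step :: a -> (Process a b, b) }

-- toy example: a running sum
sums :: Int -> Process Int Int
sums total = Process $ \a ->
    let total' = total + a in (sums total', total')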

Not too bad, but I can’t say I’m entirely satisfied. I had an aha-moment idea for direct compilation of ABC into a Haskell monad (to improve bootstrap performance) but that ran into a wall. After picking up the pieces and removing the debris, all I had left of that effort was the typechecker. I also ended up removing adverbs, after experimenting with the idea and finding I needed more specialization for context and category (e.g. lists vs. streams vs. automata).

Ah, six steps forward, two steps back. That’s progress. :-/

My plans for the next month are to start building functions for data structures: maps/records, sequences, streams. And to further fill out the set for lists. I hope to get far enough along that I can start modeling grammars in April. At the moment, I have 844 words in AO, of which slightly more than half (440) are documentation (`doc.`), tests (`test.`), or assertions (`id.` or `eqv.`). I don’t know how many words I’ll need to bootstrap an ABC or AO compiler – quite a few, I imagine, but hopefully fewer than 10k. Regardless, I need to pick up the pace.

More subjectively, I quite enjoy programming in my new language.

The REPL plus automatic tests mix is very effective. I’d still like some automatic visualization and continuous feedback tools, and I very much look forward to rendering colors/styles instead of suffixes/prefixes for large words. But I’m not desperate for these features. AO has been delightful in its simplicity, security, safety, and flexibility. The first-class nature of the environment gives me a much greater feeling of ‘control’ than I’ve felt in other languages. Going back to Haskell is starting to feel a little painful.

My main pain point for AO right now is the awful interpreted performance. I’ll be a lot happier with AO when I can compete with other languages for performance.
