Awelon Progress Report VII

The two months since my last report have been productive, albeit not in the directions I had intended. Much of my work was exploratory. I’ve gained little of lasting value, but at least I’ve gained a little.

In June, I developed a graph-based intermediate representation for compiling Awelon Bytecode (ABC) into Haskell, with the idea of eliminating intermediate data-plumbing (e.g. the `lrwzvc` operations) and perhaps supporting a fair bit of partial evaluation (e.g. so `#42` results in number 42). My efforts here were moderately successful: I was executing microbenchmarks in about 40% of the time taken by the previous naive JIT. This isn’t so impressive, though, when I reframe it as a mere 20% improvement over the interpreter. Unfortunately, I never did get tail calls working properly in the resulting Haskell code.
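A minimal sketch of the literal-folding part of that partial evaluation, written in Python for brevity rather than in the Haskell the compiler uses. It assumes ABC’s convention that `#` introduces the number zero and each following digit multiplies the number by ten and adds itself. The function name and the mixed list output are illustrative only, not the actual intermediate representation.

```python
DIGITS = "0123456789"

def fold_number_literals(code):
    """Replace each run of '#' followed by digits with one integer constant.

    Every other operator is passed through unchanged as a one-character
    string. Only contiguous runs are folded; a digit that does not directly
    follow a literal is left alone.
    """
    out = []
    i = 0
    while i < len(code):
        if code[i] == "#":
            value = 0          # '#' pushes zero
            i += 1
            while i < len(code) and code[i] in DIGITS:
                value = value * 10 + int(code[i])   # digit: x -> 10x + d
                i += 1
            out.append(value)
        else:
            out.append(code[i])
            i += 1
    return out
```

With this, a compiler back end would see a single constant where the bytecode spelled out `#`, `4`, `2` as three separate operations.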

I gradually grew frustrated. I attribute my frustration to an ad-hoc mix of the long edit-compile-test cycle in Haskell, the pathetic performance of the existing JIT code, the fact that I was not making progress developing the AO dictionary, and my inability to precisely compile subprograms.

By late June, I had a set of ideas for moving forward:

  • shift responsibility for the ABC-to-Haskell compiler into AO (i.e. to further develop the AO dictionary)
  • enable AO to specify separate compilation on a per-word level, i.e. so words can act as ‘hard’ components
  • enable JIT modules to reference one another, i.e. so we can better reuse memory resources
  • properly unload JIT modules, e.g. between ‘paragraphs’, so they don’t accumulate indefinitely in memory

Enabling modules to reference one another was easiest, so I did that in a couple of hours – simply shifting all modules into a single root directory (albeit with a toplevel prefix like `Awx.Byz.` to avoid a super-wide flat directory). Modules are already named via secure hash of the compiled bytecode. Presumably, I could now compile code of the form `{#secureHashOfBytecode}` to import the appropriate Haskell module.
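The naming scheme can be sketched roughly as follows. This is an illustration under assumptions, not the scheme actually in use: SHA-256, hexadecimal encoding, and the `A`/`B`/`R` letter prefixes (added because Haskell module name components must begin with an uppercase letter) are all hypothetical choices, as are the function names and the `jit` root directory.

```python
import hashlib

def module_name(bytecode: bytes) -> str:
    """Derive a dotted module name from a secure hash of the bytecode.

    The first two components act as a directory prefix so that the root
    directory does not become one very wide, flat listing.
    """
    h = hashlib.sha256(bytecode).hexdigest()
    return f"A{h[:2]}.B{h[2:4]}.R{h[4:]}"

def module_path(bytecode: bytes, root: str = "jit") -> str:
    """Map the module name onto a file path under a single root directory."""
    return root + "/" + module_name(bytecode).replace(".", "/") + ".hs"
```

Because the name depends only on the bytecode, two modules that contain the same subprogram resolve to the same file, which is what lets them share it.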

My initial idea for separate compilation in AO was to compile words starting with `#`, e.g. the word `#foo` would be compiled separately. I implemented this idea without much difficulty, introducing an additional Precompile step. In retrospect, the `#` prefix criterion proved aesthetically ugly and painful to use, needing whole-dictionary refactoring just to tweak performance. The ‘use prefixes’ idea also composes and extends poorly, since new prefixes would bury older ones. After some brainstorming (annotations? separate file? dictionary driven?) I eventually decided to try a ‘directive words’ concept: if you define `compile!foo`, the AO system will separately compile word `foo` (if it exists). I still have mixed feelings about this, but it works well enough.
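The directive-word rule is simple enough to sketch. The snippet below assumes the dictionary is available as a plain mapping from word names to definitions. The function name and that representation are illustrative and not how the AO system stores its dictionary.

```python
DIRECTIVE_PREFIX = "compile!"

def words_to_compile(dictionary):
    """Return the words that have a matching `compile!` directive.

    A directive whose target word does not exist is ignored, matching the
    "if it exists" condition described above.
    """
    targets = set()
    for name in dictionary:
        if name.startswith(DIRECTIVE_PREFIX):
            target = name[len(DIRECTIVE_PREFIX):]
            if target in dictionary:
                targets.add(target)
    return sorted(targets)
```

The appeal over the `#` prefix is visible here: marking `foo` for separate compilation adds one entry to the dictionary and leaves every use of `foo` untouched.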

At some point during these developments, I got very side-tracked by network implications.

Awelon’s approach to separate compilation and linking is non-conventional in order to support open distributed systems and streaming code. Naming reusable subprograms by {#secureHash} can save a great deal of bandwidth and more effectively reuse memory. And programmer-directed separate compilation of select words in the AO dictionary offers an effective jump start to use of these resources in the short term. (I hadn’t expected to make use of the separate compilation model this early in AO’s development.) I got to wondering about cloud services, especially in their role as content distribution networks. If we want to store ABC resources with an untrusted proxy or CDN, we must encrypt the bytecode, and keep the decryption key in the name. I wrote an article about this idea, achieving Tahoe-LAFS-like, provider-independent security for ABC resources. Further, if we’re going to encrypt the code, we can’t really compress it afterwards, so we should probably compress beforehand to save storage and bandwidth. ABC code should compress remarkably well. But compression algorithms seem to introduce a lot of ambiguity in their decision making (e.g. where to break up blocks)… which could be problematic for deterministic resource naming. I experimented with a few ideas. I’m still not sure which compression algorithm I’d favor.
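The naming problem with compression can be seen in a small experiment. The sketch below uses zlib from the Python standard library purely as a stand-in (it is not one of the algorithms under consideration here) and SHA-256 as a hypothetical naming hash. The same bytecode compressed at two settings decompresses identically, yet the stored bytes differ, so any name derived from the stored bytes differs too.

```python
import hashlib
import zlib

def resource_name(data: bytes) -> str:
    """Hypothetical resource name: hex SHA-256 of the bytes as stored."""
    return hashlib.sha256(data).hexdigest()

def compressed_variants(bytecode: bytes):
    """Compress the same bytecode at zlib's fastest and strongest levels."""
    return zlib.compress(bytecode, 1), zlib.compress(bytecode, 9)

bytecode = b"vrwlc" * 200   # stand-in for a repetitive ABC resource
fast, small = compressed_variants(bytecode)

# Both variants carry exactly the same content...
same_content = zlib.decompress(fast) == zlib.decompress(small) == bytecode
# ...but their names disagree, because the compressor's choices leak into the bytes.
same_name = resource_name(fast) == resource_name(small)
```

A deterministic naming scheme therefore has to pin down the compressor and every one of its decisions, or else derive the name from something other than the compressed bytes.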

I eventually tabled the whole compression and encryption effort, using the ‘id’ function for both compression and encryption in the meantime. But I’m sold on the idea, and I plan to proceed with it eventually. ABC resources encrypted by default should prove an enormous boon for distributed programming.
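One way to picture the placeholder arrangement is as a storage pipeline whose compression and encryption stages default to the identity function, so that real implementations can be dropped in later without changing callers. This is a hedged sketch: the function names, the in-memory `store` mapping, and SHA-256 naming are all assumptions, and key handling (keeping the decryption key in the name) is deliberately left out.

```python
import hashlib

def identity(data: bytes) -> bytes:
    """Placeholder stage: performs no compression or encryption."""
    return data

def store_resource(bytecode, store, compress=identity, encrypt=identity):
    """Compress, then encrypt, then file the result under its hash."""
    stored = encrypt(compress(bytecode))
    name = hashlib.sha256(stored).hexdigest()
    store[name] = stored
    return name

def load_resource(name, store, decrypt=identity, decompress=identity):
    """Fetch by name, verify the bytes against the name, then undo both stages."""
    stored = store[name]
    if hashlib.sha256(stored).hexdigest() != name:
        raise ValueError("stored bytes do not match the resource name")
    return decompress(decrypt(stored))
```

The order matters for the reason given earlier: encrypted bytes look random and will not compress, so compression has to run first.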

I didn’t make any progress at all on re-implementing the ABC-to-Haskell compiler in AO, nor have I gotten around to properly unloading JIT plugins (plugins are disabled at the moment). I’m starting to wonder whether compiling to Haskell is a dead end. The interpreter is reasonably fast, and perhaps I should be shifting efforts towards a true bootstrap, e.g. targeting C or OpenCL or LLVM or JavaScript.

Secondary Efforts

While compilation and JIT were the bulk of my efforts, I’ve been doing a little design work on the side.

Back in April, I mentioned that I wish to also work on automatic type checking, i.e. to detect those silly errors that often pop up when we’re editing code. I’ve been keen on trying pluggable type systems. Awelon project implies a certain degree of structural and substructural typing (e.g. numbers, pairs, sums, unit, blocks, affine, relevant, sealed values). But I think we can go a lot further by making sets of ‘type hypotheses’, e.g. recognizing that some structure is essentially used as a list or a stack.

My current idea is to represent multiple type systems in a single AO dictionary, using a simple naming convention (a prefix) to identify type systems. Each type system will have some opportunity to output results about each program – e.g. types, failures, indecision, errors and warnings. These reports are returned to the programmers, who can decide what to do with them (ignoring is a possibility, e.g. if an experimental type system is known to give bad results). In a good IDE, type feedback for multiple type systems may be continuous as we make edits on a word.
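A rough sketch of how reports from several type systems might be gathered follows. Everything specific in it is hypothetical: the `typesys.` prefix, the report shape, and the two toy checkers stand in for conventions that have not been settled, and the checkers are Python functions here rather than AO words.

```python
TYPE_PREFIX = "typesys."   # hypothetical naming convention

def balanced_blocks(code):
    """Toy checker: verify that block brackets are balanced."""
    depth = 0
    for ch in code:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return {"status": "fail", "notes": ["unmatched ']'"]}
    if depth != 0:
        return {"status": "fail", "notes": ["unclosed '['"]}
    return {"status": "ok", "notes": []}

def experimental(code):
    """Toy checker that never reaches a verdict."""
    return {"status": "undecided", "notes": ["no opinion yet"]}

def check_program(dictionary, code):
    """Run every entry whose name carries the type-system prefix.

    Returns one report per type system, keyed by its name without the
    prefix. Deciding what to do with each report is left to the programmer.
    """
    return {
        name[len(TYPE_PREFIX):]: checker(code)
        for name, checker in dictionary.items()
        if name.startswith(TYPE_PREFIX)
    }

dictionary = {
    "typesys.blocks": balanced_blocks,
    "typesys.experimental": experimental,
    "swap": "vrwlc",   # ordinary word, ignored by check_program
}
```

Keeping the reports separate, rather than merging them into a single verdict, is what allows an unreliable experimental system to be ignored without silencing the others.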

This might be combined with a generic approach to type declarations, whereby we specify that some block of code should have a given type via a standard ‘type system’ API. The main purpose here is the ability to declare parametric polymorphism and similar properties that are otherwise difficult to infer.

Another idea I’m working on is embedded literal objects, which allow extending Awelon project’s literal types beyond simple numbers and text, and may prove very useful as a basis for embedded DSLs and rich data description.

What’s Next?

Right now, more than anything else, I need to grow my AO dictionary.

My work on that front has fallen way behind in the last several months. A 392-octet microbenchmark isn’t enough to observe the memory reuse and caching benefits for separate compilation and linking. And, realistically, I need a much richer dictionary before I can effectively approach bootstrap or JIT. Further, ABC has a remaining approach to parsimony and performance that I haven’t even started to develop: Awelon Bytecode Deflated (ABCD), using higher UTF-8 bytecodes to predefine a standard dictionary of popular or performance-critical functions. To develop ABCD requires a massive and maturing ABC code base.
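The ABCD idea amounts to a fixed substitution table from single characters to ABC definitions. A minimal sketch of the expansion direction is below. The two table entries and the function name are invented for illustration, since the real table would have to be chosen from a mature code base, and the sketch naively expands every character, ignoring that text literals and `{tokens}` would need to be skipped.

```python
# Hypothetical table: higher code points stand for predefined ABC sequences.
ABCD_TABLE = {
    "\u03b1": "vrwlc",     # invented entry
    "\u03b2": "rwrzwll",   # invented entry
}

def expand_abcd(code: str) -> str:
    """Expand ABCD into plain ABC by substituting each extended character."""
    return "".join(ABCD_TABLE.get(ch, ch) for ch in code)

def encoded_sizes(code: str):
    """Compare UTF-8 sizes of the deflated and expanded forms, in bytes."""
    return len(code.encode("utf-8")), len(expand_abcd(code).encode("utf-8"))
```

Each extended character costs two or more bytes in UTF-8, so an entry only pays for itself when it replaces a longer sequence that occurs often, which is why the table needs a large body of real code behind it.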

I’d like eventually (this year) to work on the RDP interpretation of ABC. Fortunately, RDP itself isn’t too difficult to implement. Unfortunately, integrating external systems (filesystems, databases, sensors and actuators, etc.) with a runtime can be somewhat painful. RDP will take some time to make useful, and I’d rather not spend the next couple months in Haskell runtime code. (Especially since I’d also like to gradually deprecate the Haskell runtime and bootstrap a new one.)

My focus for at least the next six weeks will be the dictionary, which I’ll grow by developing simple projects. I’ll certainly need data structure support if I’m ever to advance my bootstrap efforts.


2 Responses to Awelon Progress Report VII

  1. Kyle Blake says:

    So, the typechecker for my ABC compiler (github.com/klkblake/abcc) is nearly finished, and if you are going to be writing a typechecker soon, I would suggest looking at the implicit-merged branch w.r.t handling merged types (if it’s not there, then it’s been merged to master. Also, it’s totally uncommented, you have been warned.). Figuring out the correct way to handle them ate more than a month of my time, though it was obvious in retrospect.
