Program Rewriting

Rewriting is not an unusual basis for programming language formalization. Lambda calculus is frequently expressed in terms of rewriting, as are combinator logics. However, rewriting seems to be unusual at the implementation layer. Instead, we model environments and stacks, none of which are represented in the original program. We abort computation upon encountering free variables or stack underflow, neither of which are errors when the same program is placed in a different context. We worry over control flow, eager vs. lazy evaluation, and parallel strategies, all the while struggling with constant propagation and partial evaluation.

Suppose we implement a language primarily by rewriting. We can feed our program to our evaluator, together with a space and effort quota. It chugs and churns and rewrites for a while, then outputs a different representation of the same program – something we can view and edit just like the input program. (To clarify, this is not the same as generalized term rewriting. The evaluator needs only one specific set of rewrite rules, those for our language semantics.) What have we gained?
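
To make this concrete, here is a minimal sketch of what such an evaluator might look like, written in Haskell. Everything in it – the `Expr` type, `step`, `evalQuota` – is a hypothetical illustration rather than the actual language; the point is only that input and output share one representation, that an exhausted quota still yields a useful program, and that an ill-typed expression like (“cat”+3) can be reified as an error term instead of aborting.

    -- A minimal, hypothetical sketch of a rewriting evaluator with an effort
    -- quota; Expr, step, and evalQuota are illustrative names only.
    data Expr
      = Num Int
      | Str String
      | Add Expr Expr
      | Err Expr                -- reified error, e.g. error("cat"+3)
      deriving (Show, Eq)

    isValue :: Expr -> Bool
    isValue (Num _) = True
    isValue (Str _) = True
    isValue _       = False

    -- One rewrite step, if any rule applies.
    step :: Expr -> Maybe Expr
    step (Add (Num a) (Num b)) = Just (Num (a + b))
    step (Add a b)
      | isValue a && isValue b = Just (Err (Add a b))   -- "cat"+3 becomes error("cat"+3)
      | otherwise = case step a of
          Just a' -> Just (Add a' b)
          Nothing -> Add a <$> step b
    step _ = Nothing

    -- Rewrite until the quota is spent or no rule applies. The result is
    -- always a valid program; stopping early just leaves it partially
    -- evaluated, and it can be fed back in later to continue.
    evalQuota :: Int -> Expr -> Expr
    evalQuota 0 e = e
    evalQuota n e = maybe e (evalQuota (n - 1)) (step e)

For example, `evalQuota 1 (Add (Num 11) (Add (Num 4) (Num 8)))` stops at `Add (Num 11) (Num 12)` – still a program we can render, edit, and resume with a fresh quota.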

  • Simplified resource management. Even upon hitting quota limits, we receive a useful, partially evaluated result. This is similar to a memory dump, but is wholly independent from the evaluator. We can continue evaluation, perhaps using a larger space quota or slicing a large computation into small, independent pieces.
  • Simplified debugging on error. If our language doesn’t know what to do with (“cat”+3), then we’ll simply carry that expression into the output, giving developers a concrete view of the error. Because there is no context in which that is okay, our evaluator may even reify it as `error(“cat”+3)` so errors are easily counted, highlighted in red, discovered by text search, etc… Our program output on error isn’t a horrifying, illegible memory dump, but rather is in the same language as our input and may be rendered and studied using the same tools and text editors.
  • Improved debugging HCI. All program ‘state’ is explicitly modeled as part of “the program”. Thus, there is never any need to hover over or query a variable named `x` to read its value. Instead, you’ll plainly have an `x = 1234` somewhere in the program. Similarly, if a program is evaluating multiple instances of a subprogram (e.g. due to recursive loops or parallelism), you’ll be able to see those instances, which effectively gives you a rendering of the full stack and every parallel computation. Of course, we’ll need to use progressive disclosure techniques for rendering of very large structures. But that can be achieved by simple heuristics or data type annotations, is entirely data oriented, and is a much easier problem to solve.
  • Simplified breakpoint based debugging. We can reify breakpoints within our language as `@foo(P)`, configured by name as an additional evaluator input. If inactive, @foo(P) returns P. If active, our evaluator rewrites to `@foo:123(P)` (with `123` unique in the output) and treats the breakpoint as a barrier to use of P, but may continue evaluation of independent subprograms (see the breakpoint sketch after this list). Our output – the partially evaluated program – can be rendered as plain text (or using our normal program editor), and active breakpoints are easily highlighted, found by search, and deleted by hand or by pattern.
  • Simplified log based debugging and profiling. We can trivially configure some breakpoints instead as ‘log points’. For example, @console(P) might both return P and print P to the console. A simple convention would be for an extensible set of names to default to logging (e.g. @log:foo). Of course, logging doesn’t provide much context. We can do better with history based debugging.
  • Simplified history based debugging and program ‘animation’. We can animate on arbitrary breakpoints like `@foo` by simply evaluating as far as we can go, taking a snapshot, dropping all the `@foo:*` breakpoints, then repeating. We can animate on multiple breakpoints by choosing a strategy: parallel (@foo and @bar release together), hierarchical (as many @foo frames as we can manage, then one @bar frame, repeat), interleaved (@foo @bar @foo @bar), etc. Because we’re halting on developer-selected breakpoints (rather than time quotas), the intermediate programs will be predictably structured, easy to analyze and debug. Ultimately, we have many megabytes or gigabytes of highly compressible program-snapshots that could easily be rendered into a GIF image.
  • Simplified partial evaluation and staging. With rewriting, partial evaluation is the normal case, i.e. we can take any subprogram and evaluate it as far as it goes, independently of input. With a few annotations like `asap(P)`, we can tell an evaluator to prioritize evaluation deep within a program, giving us effective control of staged computing, no need to juggle anything.
  • Supports expressive spreadsheet HCI. In a typical spreadsheet, cells are inexpressive programs that reduce to an anemic data model. With program-to-program rewriting, cells can express anything our language can express – e.g. functions, objects, databases. Even when a cell contains a function, it’s the partially evaluated function, which allows for convenient inspection. Given expressive types, we can support flexible rendering options. Using the ‘animation’ idea, spreadsheet cells could additionally render their own evaluations in a loop for anyone who is interested.
  • Simplified integration of DSLs. In the context of rewriting, it is easy to leverage partial applications into partial evaluations. Thus, DSLs need be no worse than the host language for performance, not even for external DSLs. This does leave some interesting challenges, such as tracing errors back to lines in the original DSL code. (I do have some fledgling ideas about annotations for tracing, e.g. configuring a @gate to add hidden, contagious markers to data, perhaps with a limited generation count. Such a technique would be useful for active debugging in general.)
  • Simplified linking. Programs are their own representations, so we only need to deal with the problem of linking values. Linking values is a much easier problem than linking mutable things, e.g. by leveraging secure hash URLs. With program rewriting, a runtime link failure isn’t a huge problem. The only consequence is that evaluation stalls, much like failing on an effort quota. The output remains a valid program and may easily be continued later, e.g. when the network is up.
  • Simplified big data handling. Because runtime linking is lightweight, secure, and fail-safe, we can feel free to tuck things behind links at runtime. For example, if we don’t expect to need P any time soon, we could say `stow(P)` and we get back a link reference to P that may later be loaded (see the stowage sketch after this list). This technique can be directly applied in the construction of data structures – log structured merge trees, hitchhiker trees, finger-tree deques – that may be vastly larger than working memory.
  • Simplified compilation model. When linking subprograms by secure hash, we are not restricted to loading the plain text representation. We may instead load a cached representation that is heavily optimized for the runtime, where the binaries are compact and the functions run fast. We may use annotations like `jit(P)` or `optimize(P)` to encourage the runtime to perform compilation. Interestingly, linking subprograms by secure hash gives us shared objects by default.
  • Simplified IO. Assume we model a `type Process = Request → (Response, Process)` and hook it up to a network socket (see the process sketch after this list). Request, Response, and Process are programs, because all values in program rewriting are represented by programs. They come with a plain-text representation, maybe even a type checker. This eliminates a lot of ad-hoc make-work surrounding serialization and input validation. Depending on their types, Request and Response may be very expressive, e.g. containing processes themselves, linked references to other data, and procedurally generated structure. Or they might be plain old data types. Injection and extraction of data by other languages needn’t be any more difficult than working with JSON or SQL.
  • Simplified transparent persistence. In the aforementioned process model, the Process is easily checkpointed to plain text, and completed Requests are easily recorded to a log. So we can easily use a simple checkpoint persistence model, heuristically updating the checkpoint and clearing the log. This, again, follows from everything being an independently serializable program.
  • Simplified parallelism. Rewriting multiple parts of a program is the default case rather than a special case that requires a lot of effort to make happen. Because even the most lightweight parallelism tends to have some overhead, we might still use explicit annotations like `par(P)` to indicate parallel evaluation (see the parallelism sketch after this list). However, we don’t need to make special efforts to ‘touch’ that subprogram with a control-flow cursor. Even when running sequentially, a good evaluator will occasionally yield work on one part of the program to go work on another, both to avoid burning a limited effort quota on an infinite loop and to improve the debugging experience.
  • Simplified distributed computing. Evaluation of a very large program can be divided among multiple processors in a cloud or mesh network. If a node fails, the worst that happens is we need to recompute some subprograms, and we can limit rework by performing occasional checkpoints. Links to big data or parallel/lazy computations can be communicated cheaply, with the data being serialized only when necessary, which makes it easier to pipeline data through subprograms that are mostly doing data-plumbing work. With annotations like `remote(Node,P)` we can potentially support declarative distribution.
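
To make a few of the items above concrete, here are some small sketches in the same hypothetical Haskell style as before. Each is independent and all names are illustrative, not part of any real implementation. First, reified breakpoints and snapshot-based ‘animation’: active `@foo(P)` nodes get numbered and act as barriers, while everything independent of them keeps evaluating; the animation loop evaluates, snapshots, releases the halted breakpoints, and repeats.

    -- Hypothetical sketch of reified breakpoints and snapshot 'animation'.
    import qualified Data.Set as Set

    data Expr
      = Num Int
      | Add Expr Expr
      | Bp String (Maybe Int) Expr   -- @foo(P); Just n marks an active, numbered breakpoint
      deriving (Show)

    -- One pass: rewrite what we can, numbering active breakpoints (threading a
    -- counter for unique ids) and treating each as a barrier to use of its P.
    evalPass :: Set.Set String -> Int -> Expr -> (Int, Expr)
    evalPass active n e = case e of
      Num i -> (n, Num i)
      Bp f Nothing p
        | f `Set.member` active ->
            let (n', p') = evalPass active (n + 1) p
            in  (n', Bp f (Just n) p')           -- becomes @foo:n(P), halted
        | otherwise -> evalPass active n p       -- inactive: @foo(P) is just P
      Bp f tag p ->                              -- previously halted: keep the barrier
        let (n', p') = evalPass active n p
        in  (n', Bp f tag p')
      Add a b ->
        let (n1, a') = evalPass active n  a
            (n2, b') = evalPass active n1 b
        in  case (a', b') of
              (Num x, Num y) -> (n2, Num (x + y))
              _              -> (n2, Add a' b')  -- stalled behind a barrier

    -- Drop the numbered breakpoints so the next frame can proceed past them.
    release :: Expr -> Expr
    release (Bp _ (Just _) p) = release p
    release (Bp f Nothing  p) = Bp f Nothing (release p)
    release (Add a b)         = Add (release a) (release b)
    release e                 = e

    -- 'Animation': evaluate, snapshot, release, repeat. Every snapshot is a
    -- complete program, so frames render with the same tools as the input.
    animate :: Set.Set String -> Expr -> [Expr]
    animate active = go 0
      where
        go n e =
          let (n', e') = evalPass active n e
          in  if n' == n then [e'] else e' : go n' (release e')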
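
Next, stowage and value linking: `stow(P)` replaces a subprogram with a content-addressed link, and resolving a link is just a lookup that may stall but never corrupts the program. The in-memory `Store` and the `show`-based hash are stand-ins for persistent storage and a real secure hash.

    -- Hypothetical sketch of stow(P) and content-addressed linking.
    import qualified Data.Map as Map

    type Hash  = String                -- stand-in for a secure hash
    type Store = Map.Map Hash Expr     -- toy content-addressed store

    data Expr = Num Int | Add Expr Expr | Link Hash
      deriving (Show)

    hashOf :: Expr -> Hash
    hashOf = show                      -- stand-in; a real runtime would use a secure hash

    -- stow(P): move P into the store and leave a link in its place.
    stow :: Expr -> Store -> (Expr, Store)
    stow p store = let h = hashOf p in (Link h, Map.insert h p store)

    -- Resolve links during rewriting. A missing link is not fatal: the Link
    -- node simply remains in the output and that branch stalls until the
    -- data becomes available (e.g. when the network is back up).
    resolve :: Store -> Expr -> Expr
    resolve store e = case e of
      Link h  -> maybe (Link h) (resolve store) (Map.lookup h store)
      Add a b -> Add (resolve store a) (resolve store b)
      Num i   -> Num i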
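
Next, the `type Process = Request → (Response, Process)` model with naive checkpoint-plus-log persistence. Here the process state is ordinary, `show`-able data standing in for the partially evaluated process program; `Counter`, `serve`, and the file paths are all hypothetical.

    -- Hypothetical sketch of the request/response process model with
    -- checkpoint + log persistence.
    type Request  = String     -- stand-ins; in the rewriting model these are programs
    type Response = String

    -- Process state as plain data, standing in for a serializable program.
    newtype Counter = Counter Int deriving (Show, Read)

    stepProcess :: Counter -> Request -> (Response, Counter)
    stepProcess (Counter n) req =
      ("reply #" ++ show n ++ " to " ++ req, Counter (n + 1))

    -- Serve one request: append it to the log, answer it, and checkpoint the
    -- whole process state (which, being a program, is just text).
    serve :: FilePath -> FilePath -> Counter -> Request -> IO (Response, Counter)
    serve logFile ckptFile st req = do
      appendFile logFile (req ++ "\n")
      let (resp, st') = stepProcess st req
      writeFile ckptFile (show st')
      pure (resp, st')

    -- Recovery (assuming a checkpoint exists): read it back and resume.
    recover :: FilePath -> IO Counter
    recover ckptFile = read <$> readFile ckptFile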
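
Finally, a sketch of a `par(P)` annotation. GHC’s `par` and `pseq` (from Control.Parallel in the ‘parallel’ package) are used here only as a stand-in for a rewriting evaluator’s own scheduler; the annotation merely hints that the wrapped subprogram may be rewritten alongside its sibling.

    -- Hypothetical sketch of a par(P) annotation, using GHC sparks as a stand-in.
    import Control.Parallel (par, pseq)

    data Expr = Num Int | Add Expr Expr | Par Expr
      deriving (Show)

    eval :: Expr -> Expr
    eval (Num n) = Num n
    eval (Par p) = eval p                 -- par(P) evaluates to P
    eval (Add a b) =
      let a' = eval a
          b' = eval b
          combine (Num x) (Num y) = Num (x + y)
          combine x y             = Add x y
      in case (a, b) of
           (Par _, _) -> a' `par` (b' `pseq` combine a' b')  -- spark the annotated branch
           (_, Par _) -> b' `par` (a' `pseq` combine a' b')
           _          -> combine a' b'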

A lot of difficult or annoying problems are simplified upon fully embracing `Program → Program` rewriting as our primary evaluation model. Also, I do not believe performance will suffer, not if we dedicate even a moderate fraction of the time to taking program rewriting seriously that we have historically dedicated to imperative programming.

Has the mainstream computing industry hobbled itself by sneering at rewriting? I am starting to think so.

And I am now giving serious consideration to replacing my project’s existing bytecode with a variant more friendly to rewriting.


7 Responses to Program Rewriting

  1. kevembuangga says:

    Hmmm… Are you sure you really “gain” anything by massaging the source text in whatever way?
    Did you read Tradeoffs in Metaprogramming from Todd Veldhuizen:
    http://www.oalib.com/paper/4020234#.V8We-1Yw1hE
    Toil-wise EVERYTHING is equivalent, it’s only a matter of flavor and personal preference.

    • dmbarbour says:

      It really isn’t the case that all things are equivalent toil-wise. Accidental complexity is a thing, and it exists at every layer or membrane in a system, including expression of models, APIs, languages, and the physical machine. To say all things are equivalent toil-wise would imply that all things have the same accidental complexity, which is profoundly untrue.

      I’ll take a gander at the paper. But just to clarify:

      Program rewriting is not metaprogramming. The output of program rewriting is a different representation for the same program as the input. As a trivial example, with `11 + 12` as an input program, the output might be `23`. Program rewriting is thus a form of evaluation.

      However, program rewriting can ameliorate much of the accidental complexity surrounding metaprogramming, e.g. because a rewriting evaluator can perform ‘constant propagation’ of DSL data through a staged interpreter, and we can work within normal language abstractions as opposed to explicitly generating source code.

      • kevembuangga says:

        “Program rewriting is not metaprogramming.”

        Still very close, metaprogramming “rewrites” a high level specification into a lower level language for which compilers/tools/etc are readily available.
        You seem to want to rewrite a somewhat messy specification into an “improved” one, maybe not exactly in the same language; that’s optimization by compilation.
        Veldhuizen’s point is that you cannot truly “erase” the inherent complexity of a given task; devising well-typed code is tantamount to proving that the typing is correct (of course) and requires no less ingenuity.
        A very recent post by John Shutt may also be relevant:
        http://fexpr.blogspot.nl/2016/08/interpreted-programming-languages.html

      • dmbarbour says:

        We cannot erase the inherent (aka essential) complexity of a problem. But if you review the article, you’ll see that all my points regard issues of non-inherent (aka accidental) complexity in software development and engineering. Our environments have friction and fragility that require us to toil less efficiently or more cautiously than inherently necessary.

  2. ivan.moony says:

    Aren’t all compilers just term rewriting systems, deep in their soul? They are rewriting terms of one programming language to terms of another language. Thinking further, if we had a term rewriting system on top of a regular programming language, we could gradually extend that language with definitions that rewrite new kinds of expressions into expressions of that language.

    I liked the article, but I might be subjective, as I’m already working on one term rewriting system for a while now.

  3. Why do I get the feeling that you are advancing the ideas I’ve put into Enchilada 🙂 ?

    Enchilada is a purely functional ‘postfix’ language based on (parallel) term rewriting.

    Everything in Enchilada is immutable: state, programs, and (yet to be rewritten) terms. It’s also a total functional language – next to being an ‘array’ language similar to apl/j/k.

    Everything also carries a cryptographic hash, and everything can be stored, referenced and retrieved under such a hash – recursively – at parse time.

    Enchilada also has a kind of ‘quota’ operator. You can evaluate a (quoted!) expression n times:

    [1 2 + 3 4 + *] @1

    is rewritten as:

    [3 7 *]

    … because @1 denotes the evaluation of a quoted term by 1 (parallel) evaluation step. Oh yeah, and exactly because evaluation is bounded via the @ meta operator, programs always terminate (the most important property of a total functional language).

    That said, I’m trying to advance Enchilada to the next level:

    1) strongly typed
    2) everything cryptographically authenticated (code, computations and data)
    3) full traceability (aka provenance)
    4) reactive (think spreadsheets, but via pervasive memoization)

    • dmbarbour says:

      I think you have some neat ideas there.

      Though, I’d not call it a total language based on using a quota. Not unless you can reduce to a normal form, a mathematical fixpoint, in a finite number of evaluation passes.

      Naming immutable subprograms by secure hash can be useful. But mutable codebases can also be interesting, e.g. codebases can be used as databases, tuple spaces, blackboard systems. I’ve been thinking a lot about modeling application state in a codebase.
