Semi-Static Structure

Lisp, Scheme, and a few other languages represent most structure using pairs of values. In general, these languages are very dynamic: the structure of any value can be introspected at runtime using queries of the form “are you a number? are you a pair?” For many use-cases, this dynamism is welcome. However, it also can hinder static analysis for safety or performance. To recover performance, dynamic languages often implement a second structural concept, such as arrays. Sadly, having both the pair and vector concepts seems redundant and raises complexity of generic code.

Recently, during development of Awelon Bytecode (ABC), I found a simple, elegant solution: unit type as a barrier to introspection. By doping a data structure with a few unit values, developers can crystallize structures that must have a fixed or statically known number of elements.

This concept came about when contemplating the nature of unit type `1` and void type `0`. My thoughts at the time were:

Unit is supposedly the identity type for a product. Ideally, `(a * 1)` has no more information than `a`. Similarly, void is the identity type for a sum, i.e. `a + 0` has no more information than `a`. However, if I allow introspection, then it is clear that `1` and `(1*1)` and `((1*1)*1)`, etc. could be distinguished, in which case the structure would carry information. Is there any way to have structure without information? Hmm…

Ultimately, I only need to avoid runtime information. I can typefully forbid introspection of unit-type, such that asking whether `(1 * 1)` is a pair will return the affirmative, but asking whether `1` is a pair is a type error. Similarly, unit value may only be compared with unit value, and is always equal. Essentially, there is no way to compute the number or placement of unit values at runtime… without statically knowing the answer.

Lisp and Scheme lists, then, would be modeled by terminating them with a number (perhaps 0). But if we want to model a fixed-width vector or fixed-size matrix in a manner the optimizer can easily recognize, we could terminate those with unit values. Our ability introspect structure ultimately depends on knowing where the unit values might or might not be placed.