Modularity without a Name

A couple of years ago, Gilad Bracha posted about the evils of import, referring to the common practice of importing modules by name. He argues that this use of import undermines the configurability and reusability of modules. I encourage you to read his article if you have not already done so.

There are at least two fundamental arguments justifying his position:

  1. Adaptation: Import early-binds a module to the features and properties of the modules it imports. For flexible configuration and reuse of a module in a wider variety of contexts, it is desirable to be able to influence these properties externally.
  2. Entanglement: Import by name entangles a module with a namespace. This makes it difficult to extricate a module for reuse in a different project – porting one module requires grabbing its entire ‘dependency forest’. Merging dependency forests can introduce problems of its own, due to overlapping names with different versions or meanings.

Bracha goes on to argue for parameterizing modules, which answers the adaptation problem. However, there are problems with parameterization.

Sub-Modules

A useful feature for a module system is the ability to break down large, cumbersome modules into a library of smaller sub-modules, which might be independently usable. In a traditional module system, one would simply break these modules into smaller pieces and then import them by name.

However, without import, these dependencies can only be provided by the client. And this could be a bad thing indeed. Consider: module M becomes large and unwieldy, so one breaks it into sub-modules Q and R. With naive parameterization, one would first need to retroactively repair every client of M to call M(Q,R). If one later decides to break Q down into S and T, one would need to go back once again and repair every client to call M(Q(S,T),R).

It seems that module sub-structure is more like an implementation detail than a configuration option. To support modular modules without exposing implementation details to the client, access to specific sub-modules (such as Q, S, T, and R) must be implicitly passed by the client to module M.
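To make the repair cost concrete, here is a minimal Python sketch, with modules modeled as plain functions that return a record of exports. The module names M, Q, R, S, and T mirror the example above; everything else is invented for illustration.

def S(): return {"tokenize": str.split}
def T(): return {"stem": lambda w: w.rstrip("s")}
def R(): return {"render": lambda words: " ".join(words).upper()}

def Q(s, t):   # Q was later split into sub-modules S and T
    return {"analyze": lambda text: [t["stem"](w) for w in s["tokenize"](text)]}

def M(q, r):   # M is explicitly parameterized by its (instantiated) sub-modules
    return {"run": lambda text: r["render"](q["analyze"](text))}

# Every client must spell out M's internal structure at each use site:
app = M(Q(S(), T()), R())        # was M(Q(...), R()) before Q was split
print(app["run"]("many words"))  # prints "MANY WORD"

Splitting Q again would force yet another round of edits at every call to M – exactly the retroactive repair described above.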

Parameterized Namespace

One possible option is to provide a namespace as a parameter. Bracha essentially takes this position, recommending use of a Platform object as a common parameter between modules.

Following the earlier M(Q(S,T),R) example, a client would invoke Platform.Foo.Bar.M(Platform), and module M could access Platform.Foo.Bar.Q(Platform) and Platform.Foo.Bar.R(Platform). Module Q, in turn, could access modules S and T. Module R could just ignore the Platform parameter for now, but it provides a sort of future-proofing: if one later decides to break module R into smaller sub-modules, or to replace R with some other module, the clients will not need to be fixed.
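As a rough illustration (not Bracha’s actual proposal), here is how a Platform parameter might be modeled in Python, with SimpleNamespace standing in for the Platform object. The Foo.Bar paths and module contents are hypothetical, following the example above.

from types import SimpleNamespace

def make_S(platform): return {"tokenize": str.split}
def make_T(platform): return {"stem": lambda w: w.rstrip("s")}

def make_Q(platform):
    s = platform.Foo.Bar.S(platform)    # Q locates its own sub-modules
    t = platform.Foo.Bar.T(platform)
    return {"analyze": lambda text: [t["stem"](w) for w in s["tokenize"](text)]}

def make_R(platform):                   # R simply ignores the Platform for now
    return {"render": lambda words: ", ".join(words)}

def make_M(platform):
    q = platform.Foo.Bar.Q(platform)
    r = platform.Foo.Bar.R(platform)
    return {"run": lambda text: r["render"](q["analyze"](text))}

Platform = SimpleNamespace(Foo=SimpleNamespace(Bar=SimpleNamespace(
    M=make_M, Q=make_Q, R=make_R, S=make_S, T=make_T)))

# The client only ever writes Platform.Foo.Bar.M(Platform); sub-structure stays hidden.
m = Platform.Foo.Bar.M(Platform)
print(m["run"]("many words"))   # prints "many, word"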

Parameterized namespaces offer a significant advantage over traditional, global namespaces because they represent the namespace as a first-class object or value under programmatic control. For example, the following developer stories become available:

  1. A developer is working on an experimental version of module S, called S2, but does not want to commit it immediately. For testing, the developer modifies the Platform with: Platform.Foo.Bar.S := Platform.Foo.Bar.Experimental.S2.
  2. A developer loads an XML configuration file and wants to configure options and preferences for the rest of the application. The results can simply be added to the Platform object as a global configuration space – e.g. Platform.AppCfg.logger := myLogger; Platform.AppCfg.display_pref := "GTK"; Platform.AppCfg.use_truetype := True. Platform is already shared everywhere.
  3. A developer has an untrusted module from a third party. Assuming a sane module organization and object model, the developer can create a new Platform that is exactly the same as the first – minus access to the network or filesystem (a small sketch of this follows the list).
  4. A developer can easily inject a module that ‘wraps’ another module, e.g. a membrane pattern. This might be useful to extend a module with logging and call counters.
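For the third story in particular, a first-class namespace makes attenuation almost trivial. A hedged Python sketch, where the attribute names (net, fs, Foo) are entirely hypothetical:

from copy import copy
from types import SimpleNamespace

Platform = SimpleNamespace(net="<network access>",
                           fs="<filesystem access>",
                           Foo=SimpleNamespace(Bar="<pure library modules>"))

RestrictedPlatform = copy(Platform)   # same namespace...
del RestrictedPlatform.net            # ...minus the network
del RestrictedPlatform.fs             # ...minus the filesystem

print(sorted(vars(RestrictedPlatform)))   # prints ['Foo']

The untrusted module is then linked against RestrictedPlatform instead of Platform, and simply never sees the sensitive resources.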

However, despite their many advantages, parameterized namespaces do not solve the entanglement problem. A module M will depend on the structure of the namespace – i.e. that sub-modules Q and R are in the right locations, and have the right versions. Reusing a module from another project still requires merging the dependency forest and potential versioning issues or name conflicts.

Content Centric Import by Need

A well-studied alternative to locating objects or modules by unique name is to locate them by content. This relates to content-centric networking, associative memory systems, and the blackboard metaphor.

These content-centric mechanisms could be applied effectively to the modularity problem, trading import-by-name for import-by-need. I envision pattern-matching on a module’s export list, possibly augmented with some tests and type analysis. For example, if I want a module that exports a function that converts integers into roman numerals, the import statement might look like this:

import { toRomanNumeral:R }
assert (R 1 == "I")
assert (R 17 == "XVII")

I’m assuming each ‘import’ statement imports exactly one module; importing a list-processing module as well would require a second import statement. The statement presented here tells the linker to reach out and find all modules that export a symbol ‘toRomanNumeral’ with an associated function (locally called R). Modules are filtered to just those that pass the assertions. The assertions give a developer some extra confidence in the selection, and they serve as a sort of unit test and sanity check.
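To make the idea concrete, here is a toy model of import-by-need in Python, with modules represented as plain dictionaries of exports and the ‘linker’ as an ordinary function. All module contents are invented, and a real implementation would resolve this at link time rather than run time.

MODULES = [
    {"toRomanNumeral": lambda n: "nope"},             # exports the symbol, fails the tests
    {"toRomanNumeral": lambda n: {1: "I", 17: "XVII"}.get(n, "?"),
     "ns": "NLP.Latin", "ver": 3},
    {"listMap": map},                                 # unrelated module
]

def link(wanted, tests, modules=MODULES):
    # keep modules exporting every wanted symbol, then filter by the assertions
    candidates = [m for m in modules if all(sym in m for sym in wanted)]
    return [m for m in candidates if all(t(m) for t in tests)]

matches = link(wanted=["toRomanNumeral"],
               tests=[lambda m: m["toRomanNumeral"](1) == "I",
                      lambda m: m["toRomanNumeral"](17) == "XVII"])

R = matches[0]["toRomanNumeral"]   # bind the local name R, as in the import above
print(R(17))                       # prints "XVII"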

As with any untested technology, there are many reasonable concerns with taking a content-centric approach:

  • scope: From where do imports come? Some implicit database or filesystem? An Internet-size CCN or cloud? It would be useful to place this under programmatic control, like with parameterized namespaces.
  • ambiguity: There might be three different modules (likely, three different versions of the same module) that export ‘toRomanNumeral’ and pass the given test. Which version is imported? How does one control the answer when an undesirable module is selected?
  • stability: Will a choice be stable over time and under slightly varying conditions (e.g. application restarts, runtime upgrade, live programming)?
  • predictability: Can the choice be predicted by simple rules that are easy to understand and apply? Is link performance predictable?
  • verbosity: Isn’t writing up an import and a bunch of assertions a tad verbose?

I have some preliminary answers to each of these concerns.

The problem of scope can be addressed by analogy to the parameterized namespaces described earlier. Instead of namespaces, there are search spaces, which might represent different sets of modules. When searching for a module, only the given space is searched. The scope can start small; one could use ‘import’ to obtain a larger search space, e.g. to access modules from a cloud.

With parameters, the import statement might look closer to:

import { toRomanNumeral:R } <= SearchSpace { ss:SearchSpace }
assert (R 1 == "I")
assert (R 17 == "XVII")

Stability can be achieved by making a more intelligent search space – i.e. one that remembers old answers and tries to stand by them unless explicitly reset. Similarly, adding a simple preference model to the space could help with configuration (e.g. favoring truetype and GTK), reduce ambiguity, and increase stability. I call these more intelligent search spaces ‘matchmakers’.
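A matchmaker along these lines might be sketched in Python as a search space that memoizes its previous answer per query and stands by it until explicitly reset. The query shape and module records here are hypothetical.

class MatchMaker:
    def __init__(self, modules):
        self.modules = list(modules)
        self.memory = {}                       # query -> previously chosen module

    def find(self, wanted, tests):
        key = tuple(wanted)
        prior = self.memory.get(key)
        if prior is not None and all(t(prior) for t in tests):
            return prior                       # stability: stand by the old answer
        for m in self.modules:
            if all(sym in m for sym in wanted) and all(t(m) for t in tests):
                self.memory[key] = m
                return m
        return None

    def reset(self):                           # explicit reset forgets old choices
        self.memory.clear()

mm = MatchMaker([{"toRomanNumeral": lambda n: {1: "I", 17: "XVII"}.get(n, "?")}])
mod = mm.find(["toRomanNumeral"], [lambda m: m["toRomanNumeral"](1) == "I"])
print(mod["toRomanNumeral"](17))   # prints "XVII"; repeated queries return the same module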

Ambiguity and predictability can be controlled by several means. Controlling the search space allows developers to limit answers to databases they maintain and understand. Specifying a matchmaker’s preference heuristic can further control selection – e.g. prefer a higher version number if all else is equal, or prefer the module that does not export ‘experimental:true’.
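As a small, invented illustration of such a heuristic, the matchmaker could rank the modules that survive the hard constraints, preferring non-experimental modules and then higher version numbers:

candidates = [
    {"toRomanNumeral": "...", "ver": 4, "experimental": True},
    {"toRomanNumeral": "...", "ver": 3},
    {"toRomanNumeral": "...", "ver": 2},
]

def preference(m):
    # higher tuple is better: non-experimental first, then higher version
    return (not m.get("experimental", False), m.get("ver", 0))

best = max(candidates, key=preference)
print(best)   # prints {'toRomanNumeral': '...', 'ver': 3}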

And, of course, there is a brute-force mechanism: simply use a bigger pattern match – more context means less ambiguity. It is trivial to model a namespace atop a content-centric substrate:

import { ns:"NLP.Latin", ver:V, toRomanNumeral:R } <= MatchMkr { mm:MatchMkr }
assert (V > 2)
assert (R 1 == "I")
assert (R 17 == "XVII")

While ambiguity is still possible, it can be effectively controlled in practice, and I imagine developers will develop de-facto social idioms for further control (such as always exporting namespace, version, author, and license).

As a final bit of control over ambiguity, we could simply annotate that we want a result only if it is unambiguous. Consider the possibility of a special parameter seen by the matchmaker, called ‘meta’:

import { toRomanNumeral:R } <= MatchMkr { meta:{unique:true}, mm:MatchMkr }
assert (R 1 == "I")
assert (R 17 == "XVII")

This might tell the linker and matchmaker to simply ‘fail’ if the solution is not unique.
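In the toy Python model above, honoring meta:{unique:true} would amount to a check like the following (purely illustrative):

def link_unique(candidates):
    # refuse to choose when the solution is not unique
    if len(candidates) != 1:
        raise RuntimeError(f"link failed: expected exactly 1 match, found {len(candidates)}")
    return candidates[0]

# e.g. with the earlier link() sketch:
# module = link_unique(link(wanted=["toRomanNumeral"], tests=[...]))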

The verbosity issue is partially addressed by the very ‘modular modules’ mechanisms we are developing – move the test function into another module so that one can assert with one big test (disambiguating the test function itself using a large pattern). Eliminating verbosity entirely, however, would also involve user-defined syntax to ‘abstract’ common sets of imports into the ‘language’ definition.

Performance predictability, under the import model described here, remains a concern. The above examples are somewhat trivial, but in practice modules will interact. For example:

import { foo:F, bar:B } <= MM { mm:MM }
import { baz:Z, qux:Q } <= MM { mm:MM, bar:B }
assert (F 2 == Z 3)

In this case, the linker must find a combination of two modules that meets the assertion. Exports from, and assertions within, the baz/qux module might involve the bar value. The key word here is combination: the costs are potentially combinatorial!
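A toy Python illustration of why this turns into a search: the linker must pick one module from each candidate set so that the joint assertion holds, and itertools.product makes the combinatorial cost explicit. The module contents are invented, and for simplicity the sketch ignores passing bar:B as a parameter to the second module.

from itertools import product

FOO_BAR_MODULES = [{"foo": lambda x: x + 10, "bar": "v1"},
                   {"foo": lambda x: x * 2,  "bar": "v2"}]
BAZ_QUX_MODULES = [{"baz": lambda x: x - 1,  "qux": "a"},
                   {"baz": lambda x: x + 1,  "qux": "b"}]

def link_combination():
    for m1, m2 in product(FOO_BAR_MODULES, BAZ_QUX_MODULES):   # combinatorial!
        F, Z = m1["foo"], m2["baz"]
        if F(2) == Z(3):                 # the joint assertion from the example
            return m1, m2
    raise RuntimeError("no consistent combination of modules found")

m1, m2 = link_combination()
print(m1["bar"], m2["qux"])              # prints "v2 b"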

But I don’t believe this will be a problem in practice – no more so than Turing-complete template metaprogramming, for example. Developers will be able to control the costs of linking, and should have a good understanding of where they depend on the linker to ‘search’ for a solution. A little feedback from the IDE or linker would let them know where searches are taking too long, and they can tweak their specifications to compensate.

Linkers as Constraint Solvers

A little ambiguity can be a blessing. There is another, more positive phrase for ambiguous choice: fallback option. Fallback options offer a basis for resilience and fluid adaptability.

The ‘import’ model I describe for content-centric linking allows developers to leverage this ambiguity. The linker effectively becomes a constraint solver. Even without parameters, applications could easily adapt based on available modules.

Parameters are supported to specify the matchmaker, and may be leveraged for other tasks. But parametric choice has a different flavor than ambiguous choice. Parameters are local, explicit, rigid, closed. Ambiguity is global, implicit, fluid, open.

I believe the two bases for adaptation have excellent potential to augment each other. To leverage ambiguity and constraint solver features to the greatest extent, weak constraints – aka preferences – will become critical.

import { ns:NS, ver:V, toRomanNumeral:R } <= MatchMkr { mm:MatchMkr }
prefer (V > 2)
prefer (NS.contains("Latin"))
assert (R 1 == "I")
assert (R 17 == "XVII")

This does leave an open question: how should the linker decide the ‘strength’ of preferences when it can only satisfy some of them? I don’t like this question, because any answer to it is arbitrary. One possibility I’ve entertained is to simply specify a heuristic strength for each preference (e.g. prefer [3] (V > 2)). In a sense, discretionary arbitrariness is the purpose of preferences, so it might be okay to go without any standard definition.
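Under the explicit-strength option, a matchmaker could simply pick the candidate (among those passing the hard assertions) that maximizes the total weight of satisfied preferences. A hedged Python sketch with invented weights and module records:

preferences = [
    (3, lambda m: m.get("ver", 0) > 2),            # prefer [3] (V > 2)
    (1, lambda m: "Latin" in m.get("ns", "")),     # prefer [1] (NS contains "Latin")
]

candidates = [
    {"ns": "NLP.Latin",  "ver": 2},
    {"ns": "NLP.Arabic", "ver": 5},
]

def score(m):
    return sum(weight for weight, pref in preferences if pref(m))

best = max(candidates, key=score)
print(best, score(best))   # prints {'ns': 'NLP.Arabic', 'ver': 5} 3

Whether 3 and 1 are the ‘right’ weights is exactly the arbitrariness noted above; the sketch only shows that the mechanism itself is simple.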

I envision a sort of ‘stone soup’ approach to programming, where developers simply toss a weakly specified application (a ‘main’ module) into a database of service and library modules, add a few adapter modules to account for differences in vocabulary, and let the linker try to find some way for the application to make sense. They tweak as necessary to get closer to what they want. The IDE can help out, because ambiguity means it has suggestions to make.

But ‘stone soup programming’ is certainly getting hand-wavy. At the very least, linkers as constraint solvers seems a decent basis for integrating a limited form of declarative metaprogramming.

Grammar?

What might this look like? A potential grammar is:

Start = 'module' Record Mexpr* 'export' Record
Pattern = Symbol | Variable | Record
Record = '{' (Symbol:Pattern)* '}'
Symbol = [a-z]([a-zA-Z0-9_])*
Variable = [A-Z]([a-zA-Z0-9_])*
Mexpr = 'import' Record '<=' Variable Record
      | 'define' Pattern '=' Vexpr
      | 'assert' Vexpr
      | 'prefer' Strength Vexpr
Strength = '[' [1-9] ']'
-- Vexpr (value expression) is language defined

I’m assuming the first line ‘module’ simply receives the parameters. So one of the earlier examples might look like:

module { mm:MM }
import { toRomanNumeral:R } <= MM { mm:MM }
assert (R 17 == "XVII")
define Blah = ... R ...
export { blah:Blah }

However, I could also see developers wanting to put their exports at the top, right next to the parameters for a quick one-line summary, symmetry with import, and easy abstraction with user-defined syntax.

module {blah:Blah} <= { mm:MM }
import { toRomanNumeral:R } <= MM { mm:MM }
assert (R 17 == "XVII")
define Blah = ... R ...

Note that modules are anonymous.

An interesting feature of anonymous modules is that you can easily keep multiple small modules in a single file, i.e. to allow overrides. Most traditional module systems use the module names to help direct the search through a filesystem or database – one more painful, unnecessary entanglement to deal with!


7 Responses to Modularity without a Name

  1. FWIW, content-centric networking as defined by Jacobson et al. is *not* a content-addressed architecture (although one surely can build one on top of it.) It’s about “networking named data”.

    • dmbarbour says:

      Granted. It annoys me that Jacobson misuses the phrase (name is not centric or intrinsic to content), but he was there first. His use of the phrase leaves me with the impression that he started with a grander vision but scaled it back.

      Perhaps I should say ‘content-based’ or ‘content-addressed’. I’m thinking of several content-based publish/subscribe models, distributed searches, and the like.

  2. Luke says:

    This problem is difficult because its solution is prevented by many accepted facets of programming. You will not solve the entanglement problem unless you can make your modules context-independent; i.e. you should be able to rip out the rest of the project, save this one module, and the one module should still have meaning. Your search space ideas are neat, but they are heavily context dependent. In fact, the more sophisticated the module search techniques used, the more coupled the module becomes to its context (the sophisticated techniques can be used to create very modular modules, but we know that it will be abused. Modules must have meaning even when their features are abused.)

    It follows, at least if you are a functional thinker, that a module is a function from its dependencies. The dependency constraints form the type of the module: they tell you what the valid parameters are. Try to think of “function” generally; a classic ordered parameter list is probably not the best human interface to said functions. But exposing the functional structure of modules is the key to making them composable. This goes against our intuition, at least superficially, that modules are stored in files. Forget about files, they are an implementation detail.

    I have implemented this technique in Javascript. It is very composable, but a problem shows itself fairly soon. It seems there are two kinds of dependencies: a true abstract dependency, and a vocabulary extension. An abstract dependency must be instantiated with a concrete instance before being used. E.g. module UI is a function of a drawing module; you must specify which drawing module to use when importing UI. A vocabulary extension, on the other hand, is something that really shouldn’t affect the importers of the module. These are just a set of words that are convenient for defining the module, but to be able to instantiate them with another implementation would be to know too much about the module (for these might completely change and shouldn’t change the module’s public interface).

    I haven’t resolved this problem yet. Perhaps I am thinking too rigidly about what a module interface is, or what an importer can say.

    I am so happy to see people thinking about these issues. Every language I have used gets modules very, very wrong. Time to wake up.

    • dmbarbour says:

      Entanglement is caused by inflexible or rigid coupling to specific or shared objects in an environment. Depending on context is acceptable if done in a generic way, such that we can easily transplant a module from one context to another and it will simply discover and hook a new set of dependencies.

      Content-based search reduces coupling compared to name-based lookups, and thus reduces entanglement.

      A module may depend on its parameters, and parameters are provided by context. It just so happens that the parameters may include bulky variations on context objects (matchmakers, search spaces, platforms, and so on). This is a rather weak, indirect form of context dependence in any case, though depending too much on specific parameters would still allow entanglement issues.

      I grant the functional view might be violated if parameters are mutated. For my language, Awelon, I do not mutate parameters; e.g. if developers need a different matchmaker they would ‘import’ a new one that is just the same as the previous except with a few tweaks or filters.

      You can indeed take an anonymous module out of one project and dump it into another, and (assuming the module is written in a known language) it will ‘have meaning’, even if not applicable to the project. Developers can also model multiple contexts (multiple matchmakers) within a single project, which can be useful for modeling distributed applications (each context is a different host), configuration management (each context is a different configuration), and security (e.g. filter sensitive resources for untrusted code). So it is actually feasible for a single module to be used with multiple (parametric) contexts in a single project.

      I agree with your observation that there seem to be multiple kinds of dependencies, and that ‘abstract resource’ vs. ‘vocab extension’ is one way to classify them. I would encourage developers to use the matchmaker for vocab extensions and parameters for abstract resources. I see this as a simple control issue: a client has much more awareness and control of parameters than of a context object. I think a good way to encourage this is to make it more difficult to share state or stateful services except by use of parameters, i.e. by encouraging directory paths at each import.

  3. Kevin says:

    I’m just thinking through the implications, so please let me know if I’m wrong somewhere.

    Continuing your example, the constraints upon toRomanNumeral would ideally be the complete minimum expected semantics/constraints we demand of that function locally (rather than a spot check as in your helpfully simplified example).

    In this case, the full description might look like an implementation of the function, but it would actually be part of your import test function which would itself probably be imported by name. (btw, would that be equivalent to a vocabulary extension? I interpret “vocabulary extension” as an import of names for their associated semantics, which would usually be static.)

    Then we might use the local symbol “toRomanNumeral” to refer to the function that meets our constraints (by way of a MatchMaker). In this case, the actual function name of “toRomanNumeral” is probably not one of the constraints, because we don’t care what the matched function implementation is named as long as it provides the input/output we demand through our description.

    Is importation reactive? i.e. if some constraints change or a better match becomes available, will there be a re-match and conversion of the import? e.g. a MatchMaker with an optimization constraint can behave like a JIT compiler and optimize a code path by replacing it with a semantically equivalent but faster implementation (possibly limited to a specialized context) based upon instrumentation.

    Hm, I’m having deja vu, so maybe I’ve said all this before. But it is fun to daydream about. :) Thanks for sharing, David.

    • dmbarbour says:

      Supposing the language provided sufficient support for abstract analysis and universal assertions (e.g. assert (forall A : Nat . Foo A (R A))) or existential definitions (e.g. exists R : Nat->String . forall A : Nat . Foo A (R A)) then we could achieve something much richer than spot checks. But, while such things might be ‘ideal’ (if we had infinite computational power, why not?), I must admit to doubts about practical feasibility.

      Spot checks are, without a doubt, practical, useful, and feasible. No hand-waving is necessary.

      With respect to reactivity: I plan to eventually model my reactive language in itself (supporting reflective towers), in which case everything must be reactive ‘all the way down’ – continuous reactive IDEs, compilers, linkers, matchmakers, et cetera.

      However, for stability reasons, I imagine matchmakers would be regulated through a state resource. Stability tames reactivity. For most applications and services, we do not want to teeter-totter between dependencies during live programming or runtime upgrades. So, even if a module becomes better according to preference heuristics, it might not be immediately used. (Though, developers might be alerted of the possibility and have a button they can press to refresh the configuration.)

      • Kevin says:

        It wouldn’t be feasible to explicitly verify the full semantics on every import, but by factoring out the import test function, it can become a common resource used somewhat similarly to RDF vocabularies. When an implementation becomes available, then it would be matched with common terms, so that future matching can occur at a higher level.

        Even if you just stick with spot checks you probably want to factor common ones out and match on them for performance (and verbosity, as you mentioned). Of course, this could be built on top of RDP with the right standards, so it’s probably not necessary to resolve the details yet.

        Your focus on reactive stability makes sense. It’s all very exciting!
