Ad-Hoc External State Models

Encapsulation, Accessibility, Security

One of the main reasons we encapsulate state is to gain more control over how it updates. For example, take your traditional, imperative, mutable state variables that respond to get and put actions. We might use one or two of these variables to model a queue, or a stack. Queues and stacks have invariants that help developers reason about correctness, performance, fairness, and other properties of the larger program in which they are used. But in an open system, assuming these variables are accessible, someone might develop a new agent, plugin, or process that starts mucking around with the variables and breaking those invariants. By encapsulating state – placing it under control of a uniquely identified subprogram (object or module) – we can easily protect our invariants.

The cost is uniform semantic accessibility. If we extend the program with independent entities they cannot (modulo semantic hacks like pervasive reflection, monkey patching, debug builds) access the state in ways that were not designed. We could depend on programmer discipline and foresight to ensure a good design, but those are precious commodities and shouldn’t be spent frivolously on common problems. Uniform semantic accessibility can help with many common problems: orthogonal persistence, runtime upgrade, live coding, plugins and extensions.

I believe semantic accessibility is too great a price to ask. Consequently, I argue strongly for eschewing local state wherever feasible – including implicit local state such as message queues, synchronization, promises, stateful event loops. Of course there are limits on how far you can take that advice in an architecture not designed for it. What I’m really arguing for is a change in language and architecture designs.

Fortunately, encapsulation is not the only way to protect state invariants.

Consider: we can understand filesystems, databases, HTTP, etc. as providing extremely global, imperative mutable state variables with get and put actions. Such variables are abundant and more than adequate for dynamic systems. For example, you can generate filenames or URLs based on existing client IDs, pairs of IDs, existing state, etc.. And those state variables can be secured, e.g. using cryptographic IDs (based on PKI or HMAC) or sparse capabilities. State can be addressed for any stateful programming task you can imagine. If you want to model a stack or a state machine, designate a few variables for it. If you want to model a queue for every client of a service, designate variables based on the client ID and a server ID. Secure external state effectively solves the same problems as encapsulated state, without hindering extension.

Secure external state seems to have a very different experience and intuition than local state. You lose access to the concepts of creation and destruction, i.e. the ‘new’ operator of OOP. You must always provide a name for a resource. Creation is replaced with naming elements in abundant resource volumes. Destruction is replaced with either reset operations (which clear resources to a state that can be recovered implicitly) or predictable decay behaviors (e.g. exponential decay). I don’t see these experiential issues as better or worse, but they do take a little getting used to.

Abstraction and Behavior Encapsulation

But there remains at least one valid concern: abstraction. When we encapsulate a couple variables into a queue, we now have this entity we can uniquely understand as a queue. We do not need knowledge of its underlying implementation. If we model an external queue using state variables, how might we share and understand our queue as just a queue?

A deeply unsatisfying answer might be: make a queue a primitive. It’s a valid answer. We shouldn’t assume that ‘get or put’ mutable variables are the only primitives available. It’s also an effective, practical answer: need queues or streams? add them. Need relational tables? add them. We can get wherever we need to be, one primitive at a time. The ‘level’ at which we program will rise. It isn’t so uncomfortable to implement a stack or queue above a relational table. But this answer is unsatisfying because it kicks the can, eschews abstraction, and overly empowers the architect.

A better answer is behavior encapsulation, which can be separated from state encapsulation. A stateless program can gather data from external sources, distribute updates, and generally orchestrate ad-hoc dataflows. It is feasible to support distribution and utilization of these programs – e.g. as first-class functions, dynamic behaviors, or stateless objects. We could share a behavior that provides a queue-like interface above a few mutable state variables.

But behavior encapsulation introduces its own challenge. There is overhead to make the behavior available to its users. There are concerns regarding performance, partial failure, and consistency. Extensions might bypass the view. Ability locally reason about correctness is diminished.

Programmable State

It would be most convenient to simply have a queue wherever we need a queue. Similar can be said of stacks, or histories, or various state machines. If we’re to avoid a panoply of primitives, we’ll need to specify ad-hoc behaviors for our state resources. That means programming state.

The mechanics of programmable, external state are no challenge. We need a few programmable primitives to bootstrap the system. We might update state resource Foo with a script that affects the update behavior of state Bar. Or we could have a resource self-modify its behavior from some programmable initial state. If we desire static behavior, we can even specify update behavior directly in a resource identifier. Static behavior might work well enough for simple machines or parametric variations on a theme, and could feasibly allow state transfer as an alternative to behavior upgrade.

Programmable state comes with risks, of course. There is potential for divergence (which severely undermines persistence). Encapsulation still causes problems for extension. It may be difficult to change state update behaviors at runtime without losing work or entering an inconsistent state. The developer will depend on discipline and foresight to address such concerns.

Those risks can be addressed. The issues with local state are most severe when it’s pervasively mixed with the dataflow or orchestration logic. We could easily enforce an isolated update language, i.e. such that future state is a pure function of present state and inputs. We could further constrain the state update language to avoid divergence. If we ensure the resulting models are readily reusable and widely useful – such that developing new ones becomes a rare case – then relying on discipline and foresight is acceptable.

Programmable External State for RDP

Not every programming model is suitable for avoiding local state. For many, local state is pervasive, implicit, or both. Eventful programming styles generally require state for every non-trivial computation, to an extent that state is effectively local and pervasive no matter how you model it. (I advise against events for future programming languages and architectures.) Data streaming (e.g. streaming a large file over time) tends to have similar issues unless the data is naturally time-aligned (like video-frames or sound).

I am developing a paradigm called Reactive Demand Programming (RDP). An RDP program is an internally stateless behavior that orchestrates signals between external resources, which may be stateful. RDP developers are strongly discouraged from use of ‘change’ signals; instead, they should report continuous, time-varying states, queries, or control signals. To model events, I suggest an idiom of modeling a stateful log of events, then using a signal to report the state of that log. RDP is highly suitable for time-aligned data.

RDP’s interaction with stateful resources is quite simple: a signal may contain information about the current state of a resource, and the set of active signals reaching a resource can influence its present and future state. The origin, history, and future of specific signals, and replication of a signal, must not affect observable behavior (a constraint to achieve highly valuable commutativity, idempotence, and merge equivalence properties).

A consequence is that RDP needs new state models. Imperative, mutable state variables with get and put actions are a very poor fit for RDP. Further, all of them need to be external.

Since initially developing RDP, I’ve been contemplating how to resolve the state issue. I’ve sketched various state models such as history with exponential decay, reactive state transition, and animated term rewriting. I plan to implement a fair number of these. This would be a daunting task without generically programmable state.

Fortunately, it isn’t difficult to represent programmable state models. My current implementation of RDP is a Haskell project called Sirea. The user-programmed aspect of the state model can be modeled with typeclasses, while unique type identifiers become part of the resource identity (via Data.Typeable). A lightweight sketch for time-independent discrete declarative state – complete with orthogonal support for persistence and history – looks something like:

-- Every state has a history. State has a unique identity, rsc. State is
-- updated by signals of type i, which is run in a fixpoint loop in any
-- model of continuous time. 
class (Hist rsc s, Eq s, Ord i) => State rsc i s | rsc -> i s where
    update    :: rsc -> Set i -> s -> s

-- Support for both exponential decay of history, and windowed
-- history. A halflife of zero or less will result in no history 
-- being kept past expiration. 
class (Typeable rsc) => Hist rsc s | rsc -> s where
    intial    :: rsc -> s -- initial value (infinite history)
    mergehist :: rsc -> DT -> s -> s -- decay merge
    halflife  :: rsc -> DT -- exponential decay rate after expiration.
    expire    :: rsc -> DT -- windowed history; decay starts after

-- persistent state
class (State rsc i s, Serializable rsc, Serializable s) => PState rsc i s

-- declare state available in a particular partition
class (Ord ty, Partition p) => DeclState ty p

observeState :: (PState ty i s, DeclState ty p) => ty -> B (S p ()) (S p s)
observeHist :: (PState ty i s, DeclState ty p) => ty -> DT -> DT -> B (S p ()) (S p [(T,s)])
influenceState :: (PState ty i s, DeclState ty p) => ty -> B (S p i) S1

I describe the need for fixpoints in declarative state way back with reactive state transition.

Anyhow, something like the above, perhaps with a few variations for different kinds of state models (e.g. discrete animated state models, which may change incrementally despite constant influence), will probably become part of the Sirea state package, enabling developers to easily introduce new, ad-hoc state models while getting persistence, history, and other nice features implicitly. I really like the idea of uniform access to history. It may similarly be worth building in anticipation as a uniform feature, which would barely be any extra effort.

About these ads
This entry was posted in Concurrency, Language Design, Modularity, Open Systems Programming, Reactive Demand Programming, State, Types. Bookmark the permalink.

One Response to Ad-Hoc External State Models

  1. Pingback: Declarative State Machines | Awelon Blue

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s