Reject Remote Values

Haskell is a lazy programming language: a value is not computed until something observes it. This makes infinite structures and mutual recursion easy to express, and leaves room for parallel processing. Oz/Mozart uses logic variables, which allow us to declare a variable, share it in a structure, and unify it later in a concurrent computation.

I find myself tempted to lift these features into a distributed programming system. Presumably, I could save some bandwidth by implicitly passing around a reference to a large value, then lazily grabbing the value when it proves necessary. We could call these ‘remote values’.

However, remote values fail. Here’s why:

  1. Disruption and partitioning: distributed systems have failure modes that a local runtime does not, including disruption (both temporary and permanent). However, a ‘value’ is an immutable concept with no failure state of its own, so we cannot observe ‘disruption’ while looking at a value.
  2. Denial of service: lazy values could be injected deep into a service, crossing multiple trust boundaries, then denied when observed. This is a potential security risk, and security risks should always be obvious in the model.
  3. Effectful observation: in an open system, even under ideal network conditions, we cannot enforce a condition that observing the value is effect-free, so we cannot protect the ‘value’ abstraction. This can also be a security risk, in the form of covert channels or analysis of exactly which information is observed.
  4. GC concerns: distributed GC is very difficult, and usually heuristic. A heuristic failure would violate the ‘value’ abstraction, and the costs of collecting remote values might easily outweigh the benefits.
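
The first three points can be illustrated with a minimal Python sketch. Everything here is hypothetical and exists only for illustration: `RemoteValue`, `Disrupted`, and the two fetch functions stand in for a real network layer. The point is that `total` looks like ordinary pure code, yet observing its inputs can fail or leak information.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class Disrupted(Exception):
    """Hypothetical error: the node holding the value is unreachable."""


@dataclass
class RemoteValue:
    """Hypothetical lazy reference that poses as a plain value."""
    fetch: Callable[[], int]  # stands in for a network request
    _cache: List[int] = field(default_factory=list)

    def __int__(self) -> int:
        # Observation triggers the fetch, so it can fail or be logged remotely.
        if not self._cache:
            self._cache.append(self.fetch())
        return self._cache[0]


def total(values) -> int:
    """Looks pure: nothing in the signature admits partial failure."""
    return sum(int(v) for v in values)


observed = []  # what the remote host learns about its consumers


def honest_fetch() -> int:
    observed.append("honest value was read")  # effectful observation
    return 40


def hostile_fetch() -> int:
    raise Disrupted("owner withdrew the value")  # denial on observation


ok = total([RemoteValue(honest_fetch), 2])

try:
    total([RemoteValue(honest_fetch), RemoteValue(hostile_fetch)])
    failed_inside_pure_code = False
except Disrupted:
    failed_inside_pure_code = True
```

In this sketch the failure surfaces inside `total`, far from wherever the reference was accepted, and the `observed` list shows that merely reading a value told the remote side something.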

Note: we can still benefit from remote parallel processing for values, so long as we can regenerate the computation (or even run it locally) after it fails. That is really a separate issue, since it assumes a closed system.
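
As a rough sketch of that note, assuming we still hold the pure computation ourselves: try a remote worker first, and recompute locally if the worker is lost. The names `evaluate`, `WorkerLost`, and the two worker functions are hypothetical stand-ins, not part of any real library.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class WorkerLost(Exception):
    """Hypothetical error raised when the remote worker disappears."""


def evaluate(compute: Callable[[], T],
             run_remote: Callable[[Callable[[], T]], T]) -> T:
    """Try the remote worker first; recompute locally if it is lost.

    Safe only because `compute` is pure and we still hold it ourselves,
    which is the closed-system assumption.
    """
    try:
        return run_remote(compute)
    except WorkerLost:
        return compute()


def flaky_worker(job):
    # Stand-in for a worker that has become unreachable.
    raise WorkerLost("worker unreachable")


def healthy_worker(job):
    # Stand-in for a worker that runs the job and returns its result.
    return job()


def square_sum() -> int:
    return sum(n * n for n in range(10))
```

Either worker yields the same answer from `evaluate(square_sum, ...)`, so the caller never sees the failure. That only works because the caller can always fall back on its own copy of the computation.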

Also, we can explicitly model futures and promises with effectful objects and agents, subject to disruption. Explicit, effectful objects at least make the security and partial failure risks much clearer to developers. This should be done via a library, because developers must be aware of the model.
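
One possible shape for such a library object, as a hedged sketch only: `RemoteFuture`, `Ready`, and `Disrupted` are hypothetical names, and a plain callable stands in for the network round trip. Disruption appears in the result of an explicit `observe()` call instead of hiding behind something that looks like a value.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ready(Generic[T]):
    value: T


@dataclass(frozen=True)
class Disrupted:
    reason: str


Outcome = Union[Ready[T], Disrupted]


class RemoteFuture(Generic[T]):
    """Hypothetical library object; observing it is an explicit, effectful call."""

    def __init__(self, request: Callable[[], T]) -> None:
        self._request = request  # stands in for a network round trip

    def observe(self) -> "Outcome[T]":
        # Disruption is part of the return type, not a hidden exception.
        try:
            return Ready(self._request())
        except ConnectionError as err:
            return Disrupted(str(err))


def describe(future: "RemoteFuture[int]") -> str:
    outcome = future.observe()
    if isinstance(outcome, Ready):
        return f"got {outcome.value}"
    return f"disrupted: {outcome.reason}"


def unreachable() -> int:
    # Stand-in for a request whose link has gone down.
    raise ConnectionError("link down")
```

A caller such as `describe` has to handle both cases at the point of observation, which is where the developer can actually reason about trust boundaries and partial failure.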


2 Responses to Reject Remote Values

  1. > distributed systems have failure modes that a local runtime does not, including disruption (both temporary and permanent).

    I prefer the phrase “weak or episodic connectivity”. For example, signal strength effectively measures the confidence an Observer has that it has correctly interpreted a Subject’s broadcast message. Weak connectivity may be sufficient for some applications. For example, if we are chatting online, then a protocol that informs you I am typing a reply to the last thing you said probably doesn’t need reliable messaging (TCP) and can drop the requirements for reliability, ordering and data integrity.

    Also, it may be a requirement in some applications for nodes to remain radio-silent for long periods of time. For example, if two nodes both have sensors processing overlapping information, then there may be little need for them to coordinate about what they are sensing. Taking advantage of this sort of radio silence allows nodes to coordinate without communicating! The fact they are hooked into the same “database” of values is the only coordination required. When there is untrustworthy “middleware”, such as a signal jammer, coordination without communication is essential to scaling. In this situation, the jammer creates weak connectivity, and reliability, ordering and data integrity might be computationally impossible.

    It took me a while to understand the idea of radio silence in the context of programming language design, and even of networked sensors and actuators.

    • dmbarbour says:

      It’s easy to escalate a wide variety of failures to ‘disruption’, and it’s relatively easy for application developers to reason about, propagate, handle, and recover from a binary failure condition at predictable boundaries.

      I would be unable to say the same of metrics such as “confidence that the observer correctly interpreted the message”. Where we do need ‘confidence’ in our interpretations (and we will, should we try to interpret voice, natural language, physical gestures, body language), I think that would be better served more explicitly in the domain model, and independently of the disruption issue.

      Intermittent network connectivity is only one common reason for disruption. We also face node destruction (e.g. bad encounter between robot and IED), undersea operations, administrative disruptions (e.g. due to quality of service or real-time policy violations) or shunting resources to higher-priority tasks.

      > coordination without communication is essential to scaling

      A certain degree of autonomy and code distribution, allowing for intermittent communications, is important for scalable systems. I wouldn’t go quite so far as suggesting we work entirely without communication, though.
