Today’s applications and services are very committed to what they do for us. And I mean that in a bad way. When we send a message to a service, or call a procedure, or press a button, it is generally difficult to halt or undo. This has consequences for HCI design – buttons, toggles, “are you sure” dialogs. We aim to ensure the user is as committed as the process will be.
Ensuring user commitment prior to action has many implicit costs:
- consumes user resources – attention, time, a free hand
- hinders use of individually low-confidence or imprecise user inputs – face, emotion, eye focus, posture, gestures, grunts, mumbles, whispers
- hinders use of individually low-confidence or imprecise contextual clues – time of day, day of week, location, information in e-mail and on calendars
- inefficiency of process – humans tend to commit at last possible second, so computer wastes a lot of time idling
- burdens user with understanding consequence of action; hinders exploration
Indirectly, the problem of commitment has hindered application of computers in whole fields – battlefields, search and rescue, anything fast paced that demands attention and both hands.
It hasn’t much hindered me. Yet. Like many programmers, I spend time at a desk with big monitors, a clacky keyboard, and time to think about what I’m doing. There are times I’d like to just look at a document in order to type into it, or browse the internet hands free while enjoying a sloppy meat sandwich. (Or maybe it has hindered me. Opportunity cost is difficult to know.) But with affordable, fashionable COTS wearable computing on a near horizon the issues have been entering my thoughts more frequently.
For a desktop we have clicks and clacks and cursors – unambiguous, focused, effectively instantaneous. Clear. Committed.
For wearable computing we will have a few unambiguous inputs, e.g. buttons on arms of our glasses, maybe a small stick device in our pocket (e.g. three buttons, a joystick for the thumb, nine-axis acceleration). But for the most part we’ll utilize gesture, voice, and context. We want those inputs to be subtle, i.e. so we don’t jab the guy in the next seat with an elbow or draw attention to the fact we aren’t paying attention to a business meeting. Inputs will be incomplete and not entirely coherent – due to haste, junk in our hands, food in our mouths, concurrent tasks like driving or jogging. Gestures, words, sentences, thoughts and intentions can be aborted halfway through. We’ll also reference our physical environment often. Boorish pointing and staring is possible, of course, but subtlety would reduce that to a light touch, a slight nod, a twitch of the wrist and finger – a glance to draw attention and a gesture to distinguish objects.
(Aside: Watch the Touché video for a possible gloves-free gesturing technology.)
We will need user interfaces that can deal with our sloppiness and ambiguity, preferably without slowing the users down or demanding too much attention. This requires effective ambiguity resolution, recovery from misinterpretation, and adapting to better estimate the user’s intentions.
We won’t be able to achieve these properties in our UIs if the processes they control remain as committed as they are today. Services and processes must be more forgiving, and in a deep, consistent manner that can readily be leveraged in UI design. In practice, this will require a mix of:
- Ability to undo consequences of an action after taking it.
- Ability to predict consequences of an action and abort it.
In both cases, we abandon commitment to action. The latter shifts commitment to consequences. The former abandons commitment, period.
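The two abilities can be contrasted in a minimal sketch. Everything here is hypothetical: `FileStore`, `predict_delete`, and the undo log are illustration only, assuming a toy service whose entire state is one dictionary.

```python
from copy import deepcopy


class FileStore:
    """Toy service: a dict of file names to contents, plus an undo log."""

    def __init__(self, files):
        self.files = dict(files)
        self._undo_log = []  # stack of closures that reverse past actions

    def delete(self, name):
        """Act first, but remember how to reverse the action."""
        old = self.files.pop(name)
        self._undo_log.append(lambda: self.files.__setitem__(name, old))

    def undo(self):
        """Reverse the most recent action (undo after acting)."""
        self._undo_log.pop()()

    def predict_delete(self, name):
        """Report the consequence on a copy; the real state is untouched."""
        preview = deepcopy(self.files)
        preview.pop(name)
        return preview


store = FileStore({"a.txt": "hello", "b.txt": "world"})

# Predict, then abort: nothing happened to the real store.
preview = store.predict_delete("a.txt")

# Act, then undo: the store returns to its earlier state.
store.delete("b.txt")
after_delete = dict(store.files)
store.undo()
```

Prediction never touches the real state, so the user commits only to a consequence they have already seen. Undo lets the action happen and keeps enough history to take it back.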
After Abandoning Commitment to Action
If we can reduce commitment in our programming models, that has a direct consequence on our user interfaces. User agents will take action based on low-confidence inputs and context clues. When they don’t know what to do, but have a few good guesses, they’ll present us with suggestions. If a user agent starts taking action and they’re doing the wrong thing, we’ll stop them or have the action reversed or corrected. The user might provide more clarification, and the user agent might provide more suggestions (in terms of anticipated consequences). This becomes a collaborative experience.
Today, GUIs are dumb controllers waiting for human input (then rushing to complete everything ASAP in the last 100 milliseconds). There are some exceptions to that, e.g. auto-complete and search engines both learn from the user. But those exceptions are limited to a few domains where reversing a poor choice is cheap.
Without commitment, user agents become trusted partners, servants that work in the background, anticipate our needs, provide suggestions, obtain clarification in a dialog. It won’t be like talking to a human; a user agent needs no self-awareness, no agendas, no intelligence beyond predicting its user’s actions. (Individual applications could add more intelligence.) The better predictions user agents make, the more work they can do in advance – rather than 100ms latency, we can actually have negative latency (i.e. the web page is loaded just before we ask for it).
Initially we’ll need some clear, unambiguous way to say `no` – causing the user agent to abort and undo an action. But eventually the agent will learn even how we behave when we disapprove, and (based on context clues) learn when we would disapprove.
The user experience becomes more streamlined – hands are kept free for more interesting tasks; attention can be directed to more critical and interesting problems; computing becomes more accessible in more domains – wearable computing, ambient devices, control over a house, etc. With prediction or undo, the burden on users is reduced – they can be presented options in terms of realistic consequences, rather than in terms of efforts. Users have more freedom to explore options without fear of breaking the system.
The machine also benefits. Prediction, planning, and world-modeling software becomes far more compositional and more powerful (i.e. since we can see potential consequences of a plan not only in the open system, but also in the predicted world models and how they’ll affect the plans of other agents). Further, we’re simply making effective use of what would otherwise have been idle cycles. Granted, we might be looking at an order of magnitude more computation than we’re regaining in utilization. But an improved user experience may be worth that.
How to Abandon Commitment to Action
At the moment, I don’t have a satisfactory answer for how to abandon commitment.
It is trivial to abandon commitment in a small, closed system. Simply copy the system, run the program with different inputs. The challenge is making the solution open, scalable, efficient, controllable (e.g. time and space), multi-user, securable, distributed, and able to handle multiple ambiguous actions in parallel (so we can present multiple possible consequences to the user at once). I would not be satisfied by a solution that is limited to trivial problems.
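For the small, closed case, the copy-and-run approach might look like the sketch below. The thermostat state and the `step` and `explore` functions are made up for illustration; the only assumption is that the whole system fits in one value that can be copied.

```python
from copy import deepcopy


def step(state, command):
    """Transition function for a toy thermostat; mutates the state it is given."""
    if command == "warmer":
        state["target"] += 1
    elif command == "cooler":
        state["target"] -= 1
    elif command == "off":
        state["on"] = False
    return state


def explore(state, candidates):
    """Run each candidate command on its own copy of the system."""
    return {command: step(deepcopy(state), command) for command in candidates}


real = {"target": 20, "on": True}
futures = explore(real, ["warmer", "cooler", "off"])
```

Each entry in `futures` is a consequence that could be shown to the user, and `real` is untouched. None of this survives contact with an open system, where most of the state lives in services we cannot copy.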
I explored transactions for this purpose early on. The idea was simple: to explore a possible future, and abort if you don’t like the results. With MVCC, we can explore multiple futures in parallel. However, transactions do not support deep prediction in open systems. For example, if we use a blackboard metaphor, the `effects` we are most interested in predicting will be caused by concurrent agents.
In 2008-9, I pursued approaches to distributed transactions while preserving object capability model properties, and I eventually developed a transactional actors model. For each outgoing message, an actor could send it either within the current transaction, or in the parent transaction (which might be the real world); any incoming message can start a transaction. Transactions could thus be extended across a blackboard metaphor by use of an explicit publish-subscribe pattern. Unfortunately, transactions don’t scale effectively. Even under ideal conditions, they’d be subject to a lot of interference and rework. Under less ideal conditions, a denial of service attack would be trivial.
Another promising-seeming option is the time warp protocol. With time warp, we have a message-passing system, and the goal is to ensure all messages are processed in a consistent order. For parallelism, processing is optimistic. Occasionally, a message is processed out of order. When that happens, we move the system back to a snapshot and replay messages. We send anti-messages and the new messages, which might cause other machines to replay parts of the system. The original, developed in 1985, was unsuitable for open systems. The modern `lightweight` time warp protocol is much more modular and more applicable.
But time warp would barf if we were using it just to choose between possible futures. It really depends on optimism, on being `right` most of the time.
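The rollback-and-replay idea can be shown with a single logical process. This is a simplified sketch, not the 1985 protocol or the lightweight variant: it leaves out anti-messages and other machines entirely, and `TimeWarpProcess` and its update rule are invented for the example.

```python
import bisect


class TimeWarpProcess:
    """One logical process: optimistic processing, rollback on stragglers."""

    def __init__(self):
        self.processed = []   # (timestamp, value) pairs, kept sorted
        self.snapshots = [0]  # snapshots[i] = state after the first i messages
        self.state = 0
        self.rollbacks = 0

    @staticmethod
    def _apply(state, value):
        # Order-sensitive update, so out-of-order processing would be wrong.
        return state * 2 + value

    def receive(self, timestamp, value):
        idx = bisect.bisect(self.processed, (timestamp, value))
        if idx < len(self.processed):
            self.rollbacks += 1  # straggler: later messages were already run
        self.processed.insert(idx, (timestamp, value))
        # Restore the snapshot taken just before the new message's position,
        # then replay the new message and everything after it.
        del self.snapshots[idx + 1:]
        self.state = self.snapshots[idx]
        for _, replayed in self.processed[idx:]:
            self.state = self._apply(self.state, replayed)
            self.snapshots.append(self.state)


optimistic = TimeWarpProcess()
for timestamp, value in [(1, 1), (3, 3), (2, 2)]:  # (2, 2) arrives late
    optimistic.receive(timestamp, value)

in_order = TimeWarpProcess()
for timestamp, value in [(1, 1), (2, 2), (3, 3)]:
    in_order.receive(timestamp, value)
```

Both processes end in the same state, but the optimistic one paid for a rollback. That cost is tolerable when stragglers are rare, which is exactly the optimism the protocol depends on.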
In 2010, I began developing Reactive Demand Programming (RDP), and I eventually decided to make time explicit in the model, i.e. sending whole future-of-the-signal updates instead of sending each value in the signal. This was sparked by a reminder of time warp (which, by that time, I hadn’t thought about for seven years).
By modeling time explicitly in RDP, I eliminate coupling to update arrival orders, and hence get some nice consistency properties. If I send the anticipated future of every signal, propagating those futures across computations, I can benefit from a lot of optimistic processing of that future. Systems can prepare resources (e.g. load the right textures) in advance and better interpolate values. Even better, it makes RDP very resistant to network hiccups: when updates straggle, I still act on probably-mostly-good anticipated values; when a connection is lost, I have a small window of good signals while preparing for a graceful transition; by logically modeling delay, I can mask variability in latency.
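A rough sketch of a future-of-the-signal update, assuming a piecewise-constant signal stored as change points. The `Signal` class and its `update` and `sample` methods are hypothetical names, not an RDP API.

```python
import bisect


class Signal:
    """Piecewise-constant signal: sorted change points with their values."""

    def __init__(self):
        self.times = []
        self.values = []

    def update(self, from_time, future):
        """Replace everything at or after from_time with an anticipated future.

        `future` is a list of (time, value) pairs, sorted, with times >= from_time.
        """
        cut = bisect.bisect_left(self.times, from_time)
        self.times = self.times[:cut] + [t for t, _ in future]
        self.values = self.values[:cut] + [v for _, v in future]

    def sample(self, time):
        """Value holding at `time`, or None before the first change point."""
        i = bisect.bisect_right(self.times, time) - 1
        return self.values[i] if i >= 0 else None


texture = Signal()
texture.update(0, [(0, "idle"), (5, "loading"), (8, "ready")])
anticipated_at_9 = texture.sample(9)  # acted on before any correction arrives

texture.update(7, [(7, "error")])     # a later update rewrites the future from t=7
corrected_at_9 = texture.sample(9)
unchanged_at_6 = texture.sample(6)
```

A consumer can sample any future instant immediately and prepare for it. A later update only rewrites the part of the future it covers, so values before `from_time` stay as they were.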
Anticipation in RDP also makes a powerful foundation for avoiding commitment to action. But, as currently designed, you could only consider one possible future at a time. (If you send two signals, you’ll see one future in which both of the signals were present, not two possible futures.) In July 2010, I considered models of branching time for RDP. I rejected branching time because it results in a combinatorial explosion of possibilities. Combinatorial explosions tend to make a terrible mess of compositional reasoning.
Without multiple futures it will be difficult to achieve my vision of presenting a menu of real consequences to a user, who may ultimately select between them.
But what if there was another option to eliminate combinatorial explosions?
Assume a user agent renders a menu of three possible consequences, using context and input to form its best guess of the user’s intention. The user picks one. Now what? We don’t magically know which user-input led to that particular consequence. Well, there are ways to work around the model – add annotations or an omniscient observer, hack, hack, hack. But a better approach is to transform or augment the model then work within it.
I propose to systematically select inputs. When the user chooses between possible signals, this is communicated back to the previous process, which chooses its inputs accordingly – and so on, until we’ve finally selected (i.e. recognized) the user’s initial intentions. This design can work with black box behaviors in open systems. It is also very compositional: intermediate components will select input signals based on context, will introduce more possibilities based on ambiguous input, will even learn from the choices made.
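One way to picture the backward selection is a chain of stages, each remembering which input produced each of its outputs. This is only a sketch under strong assumptions: `Stage` and `trace_back` are invented names, outputs are assumed unique per stage, and learning from the choice is left out.

```python
class Stage:
    """A behavior that expands each input into possible outputs and
    remembers which input every output came from."""

    def __init__(self, interpret):
        self.interpret = interpret  # function: input -> list of outputs
        self.origin = {}            # output -> input (assumes unique outputs)

    def forward(self, inputs):
        outputs = []
        for item in inputs:
            for out in self.interpret(item):
                self.origin[out] = item
                outputs.append(out)
        return outputs

    def select(self, chosen_output):
        """Given the consumer's choice, pick the input that led to it."""
        return self.origin[chosen_output]


def trace_back(stages, choice):
    """Propagate a choice from the last stage back to the first."""
    path = [choice]
    for stage in reversed(stages):
        choice = stage.select(choice)
        path.append(choice)
    return path


hear = Stage(lambda sound: {"mumble": ["play", "pay"]}[sound])
plan = Stage(lambda word: {"play": ["start music"],
                           "pay": ["send $20 to Bob"]}[word])

menu = plan.forward(hear.forward(["mumble"]))    # consequences shown to the user
path = trace_back([hear, plan], "start music")   # the user picks one
```

Each stage only needs local knowledge of its own inputs and outputs, which is what lets the scheme treat the other stages as black boxes.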
To make this work, I’d need a new kind of signal to represent this sort of choice. Neither product nor sum types are appropriate. But there are concepts I’ve seen before that seem applicable.
- A notion of weighted references is sometimes used in mixed linear type systems (like Plaid). When you fully own a reference it has a weight of 1. When you share a reference, the weight splits in half. It is possible to recombine weights and eventually get a full linear reference again.
- A notion of multiplicative disjunction (also `par` or an upside down `&` symbol – ⅋) from linear logic that represents a sort of choice by the consumer (whereas (+) sum is choice by the provider).
I take the concept of a signal and add weight to it. Normal RDP signals (including products and sums) have a weight of 1. If a service receiving a signal thinks its meaning is ambiguous in context, it splits the signal among multiple interpretations and assigns a positive rational weight to each. All behaviors obey a simple and auditable conservation principle: the aggregate sum of weight of output possibilities equals the aggregate sum of the weights of input possibilities.
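The conservation principle is easy to check with exact rationals. In this sketch a signal is just a `(meaning, weight)` pair, and `split` and `total_weight` are hypothetical helpers, not part of any RDP implementation.

```python
from fractions import Fraction


def split(signal, interpretations):
    """Split one weighted signal among candidate interpretations.

    `interpretations` maps each candidate meaning to a positive integer share.
    The output weights sum to the input weight.
    """
    _, weight = signal
    total = sum(interpretations.values())
    return [(meaning, weight * Fraction(share, total))
            for meaning, share in interpretations.items()]


def total_weight(signals):
    return sum((weight for _, weight in signals), Fraction(0))


heard = ("mumble", Fraction(1))                  # a normal signal has weight 1
words = split(heard, {"play": 3, "pay": 1})      # ambiguous: split 3/4 and 1/4
payments = split(words[1], {"pay Bob": 1, "pay rent": 1})
leaves = [words[0]] + payments                   # the possibilities still open
```

However many times a possibility is split further, the weights of the open possibilities still add up to the weight of the original signal, so the books can be audited at any stage.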
Selecting and filtering input signals is achieved (without breaking conservation) by shifting weight from one input signal to another. The shift in weight is propagated, i.e. as an extra output signal from every behavior. Selection is not based on ambiguity, but rather based on preference – i.e. we may prefer demands that are easier to implement, don’t conflict with other demands. We can also combine signals when they have the same effective meaning (which might depend on state or context). The user-agent ultimately has a preference based on the user selection.
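Shifting weight can be sketched as a transfer between entries of a table, so the total cannot change. The `shift` and `select` helpers are hypothetical, and the sketch does not model propagating the shift as an extra output signal.

```python
from fractions import Fraction


def shift(weights, source, target, amount):
    """Move `amount` of weight from one possibility to another."""
    if not 0 <= amount <= weights[source]:
        raise ValueError("cannot shift more weight than the source holds")
    shifted = dict(weights)
    shifted[source] -= amount
    shifted[target] += amount
    return shifted


def select(weights, chosen):
    """Shift all other weight onto the chosen possibility, dropping empties."""
    result = dict(weights)
    for name in weights:
        if name != chosen:
            result = shift(result, name, chosen, result[name])
    return {name: weight for name, weight in result.items() if weight > 0}


weights = {"play": Fraction(3, 4),
           "pay Bob": Fraction(1, 8),
           "pay rent": Fraction(1, 8)}

# A mild preference: nudge some weight toward an easier demand.
nudged = shift(weights, "play", "pay Bob", Fraction(1, 4))

# A user selection: the chosen possibility ends up with all the weight.
picked = select(weights, "pay rent")
```

Filtering is just the extreme case of preference: a possibility whose weight has been shifted away entirely drops out.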
There is an issue of what to do with effectful behaviors. It might be acceptable to load a texture with low confidence (or small scale preview), but not so acceptable to e-mail a letter or delete a file. Though we could still report the file as deleted on that path, i.e. anticipating what would happen. Each service and resource would have its own policy, would observe weights and choose a reasonable course of action based on what can be undone and what can be predicted. In any case, the ability to observe the weight of a signal will have an impact on both the output and actual effects.
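A per-resource policy might be as simple as the sketch below. The function `choose_course`, its two outcomes, and the `undoable` flag are assumptions made for illustration; a real service would have a richer notion of what can be undone or predicted.

```python
from fractions import Fraction


def choose_course(weight, undoable):
    """Decide what a resource does with a demand of the given weight.

    "perform" means carry out the effect; "predict" means only report
    the anticipated outcome on that path.
    """
    if weight == 1:
        return "perform"  # unambiguous demand: act as usual
    if undoable:
        return "perform"  # cheap to reverse, so act on low confidence
    return "predict"      # irreversible: anticipate, do not act


load_texture = choose_course(Fraction(1, 4), undoable=True)
send_email = choose_course(Fraction(3, 4), undoable=False)
send_email_confirmed = choose_course(Fraction(1), undoable=False)
```

Loading a texture goes ahead on a quarter of the weight, while the e-mail is only reported as sent until the demand carries the full weight.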
Consequently, when we shift weights, we need the new weights to propagate back, influencing effects. Potentially, the new effects affect the shifts in weight. Ultimately, we get a fixpoint weight reduction loop – one whose convergence would be infeasible to guarantee in an open system. But this isn’t a new problem for RDP; divergent fixpoint loops in RDP can at least be made incremental (i.e. convergent up to any instant) by modeling delay. It would be up to developer discipline and good models to achieve stability. I do wonder if there are any math tricks I could use to achieve more rapid convergence (e.g. shift weights by multiplying them by scaling factors rather than adding and subtracting? or maybe switch to ambiguous, relative metrics of strength?).
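The scaling-factor idea could be read as multiplying each weight by a preference factor and then renormalizing so the total is conserved. This is one possible interpretation, not a worked-out design; `rescale` is a hypothetical helper and the factors are assumed positive.

```python
from fractions import Fraction


def rescale(weights, factors):
    """Multiply weights by preference factors, then restore the old total."""
    total = sum(weights.values())
    raw = {name: weight * factors.get(name, 1)
           for name, weight in weights.items()}
    raw_total = sum(raw.values())
    return {name: weight * total / raw_total for name, weight in raw.items()}


weights = {"play": Fraction(1, 2), "pay": Fraction(1, 2)}
prefer_play = {"play": 2}  # "play" is twice as preferable; others default to 1

once = rescale(weights, prefer_play)
twice = rescale(once, prefer_play)
```

Repeating the same preference moves the disfavored weight from 1/2 to 1/3 to 1/5, heading toward zero while the total stays at 1. Whether that actually helps convergence once effects feed back into the factors is an open question.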
This seems a promising model. It needs a lot more work (it’s only a few days old) and a better name (it isn’t really about probabilities, just inspired by them). The whole management of weights will need a lot of work to make it expressive, easy to use, and semantically clean. Achieving performance may be an issue. Concurrent interaction with shared services and state, i.e. for multiple users, could greatly increase the number of possibilities – but that could be countered by a suitable state model, or by the state model itself reporting only the best possibilities on the assumption of each path. It might be useful to indicate a “minimum” weight below which a given behavior does not even consider a signal, so we can do some static reductions and type analysis based on it (and also control branching).
We need to handle ambiguity and abandon commitment to improve user experiences, to support broader ranges and domains for HCI, and even to improve the intelligence of our computational systems, making them partners in a dialog. But to make this usable in an open system, it needs to be composable end-to-end across arbitrary services, so we can make decisions from likely consequences that depend on non-local knowledge. Probabilistic RDP is a potential model to make this happen. Transactions are not. Time warp is probably worth exploring further, to see if variations could work.