Augmented reality (AR) is a powerful foundation for the next generation of ‘integrated development environments’ for many reasons. Developers can have sprawling workspaces across a room, a building, even a mountainside (depending on how much exercise the developer desires). There is little need for panning, zooming, and tabbing. Instead of menu navigation, a developer may possess shelves full of virtual components waiting to be grabbed and arranged. A developer might open a catalog – a physical magazine, notebook, even a deck of cards – where each page opens to a menu of virtual components ready to be used or shared.
AR is especially viable for component-based programming models, where developers primarily build larger components by composing existing ones, and only occasionally edit existing components. Components have a physical metaphor. Form determines function and behavior. Use of names is minimized.
This article describes a vision of programming with augmented reality by means of generic binding of virtual components to physical representations. Binding to physical representations provides physical stability and persistence for the workspace, and lets developers spread out, walk away, and return. This supersedes and improves on my earlier vision of programming with pen and paper.
A physical representation may be:
- a recognizable object – e.g. a particular mug, bush, chair
- a texture – e.g. a block of text, surface of a desk, picture in a magazine
- a hand-written symbol or word
The binding means that, whenever you look at the physical representation, the AR glasses will project a visual representation of the virtual component. The projection itself may be displaced or extruded from the physical representation, may be larger or smaller, and has relative coordinates, size, and orientation. If the physical representation moves, the virtual object will move with it. If the physical representation is broken or damaged, the virtual object is no longer accessible by that means, but it can be recovered from a history. Any virtual component can be ‘destroyed’ by simple gestures (e.g. toss it over the shoulder) but can be easily recovered.
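To make the 'relative coordinates, size, and orientation' idea concrete, here is a minimal sketch of how a projection might follow its physical anchor. It is only an illustration under simplifying assumptions: poses are 2D (position, heading, scale) rather than full 3D, and the names `Pose` and `project` are hypothetical, not part of any AR toolkit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading: float      # radians, counter-clockwise
    scale: float = 1.0


def project(anchor: Pose, relative: Pose) -> Pose:
    """Compose the anchor's tracked pose with the binding's relative pose."""
    cos_h, sin_h = math.cos(anchor.heading), math.sin(anchor.heading)
    # Rotate and scale the relative offset into the anchor's frame, then translate.
    dx = anchor.scale * (relative.x * cos_h - relative.y * sin_h)
    dy = anchor.scale * (relative.x * sin_h + relative.y * cos_h)
    return Pose(anchor.x + dx, anchor.y + dy,
                anchor.heading + relative.heading,
                anchor.scale * relative.scale)


# The virtual component is displaced 0.2 units from a mug and drawn at twice the size.
offset = Pose(0.2, 0.0, 0.0, scale=2.0)
mug = Pose(1.0, 1.0, 0.0)
print(project(mug, offset))

# Move and turn the mug: the projection is recomputed from the same binding.
mug_moved = Pose(3.0, 1.0, math.pi / 2)
print(project(mug_moved, offset))
```

Because the binding stores only the relative pose, nothing about the virtual component needs to change when the physical object moves; only the tracked anchor pose does.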
Bindings are often temporary. A developer might place an object on a surface, creating a temporary binding, then pick it back up, thus breaking the binding. The creation of bindings to physical representations is automated, with support of the AR glasses. If a binding cannot be formed – e.g. because the surface has no useful features – this is indicated by having the object bounce back to the hand.
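The place, pick up, and bounce-back behavior can be sketched as a tiny state machine. This is a rough sketch under assumptions of my own: a surface is summarized by a count of recognizable features, `MIN_FEATURES` is an arbitrary threshold, and each surface holds at most one component. None of these names come from a real recognizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIN_FEATURES = 8  # hypothetical threshold for a "recognizable" surface


@dataclass
class Surface:
    name: str
    feature_count: int  # e.g. distinct keypoints the recognizer found


@dataclass
class Hand:
    holding: str | None = None


@dataclass
class Workspace:
    bindings: dict[str, str] = field(default_factory=dict)  # surface name -> component

    def place(self, hand: Hand, surface: Surface) -> bool:
        """Try to bind the held component to the surface."""
        if hand.holding is None:
            return False
        if surface.feature_count < MIN_FEATURES or surface.name in self.bindings:
            return False  # no binding formed: the component stays in the hand
        self.bindings[surface.name] = hand.holding
        hand.holding = None
        return True

    def pick_up(self, hand: Hand, surface: Surface) -> bool:
        """Picking the component back up breaks the temporary binding."""
        if hand.holding is not None or surface.name not in self.bindings:
            return False
        hand.holding = self.bindings.pop(surface.name)
        return True


ws, hand = Workspace(), Hand(holding="jpeg_decoder")
print(ws.place(hand, Surface("blank wall", feature_count=1)), hand.holding)
desk = Surface("desk", feature_count=40)
print(ws.place(hand, desk), hand.holding)
print(ws.pick_up(hand, desk), hand.holding)
```

A failed `place` leaves the component in the hand, which is the programmatic analogue of the object bouncing back.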
Binding to hand-written symbols or words enables pen and paper programming. Such a binding might be formed by grabbing a virtual object and a pen in one hand, then writing the desired word or symbol. From then on (unless the binding is explicitly broken or overridden) the developer will have said component materialize whenever he or she writes out that word. Leveraging gesture recognition enables the visual recognizer to initially be somewhat more relaxed. After the virtual object materializes, it becomes associated with that specific instance of the hand-written word (based on imperfections and context), which enables further edits and manipulations. In case of ambiguity, a small menu of possible components might be raised.
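As a rough illustration of the lookup step, the sketch below maps a recognized (and possibly misread) word to candidate components, returning several candidates when the match is ambiguous. It assumes the handwriting recognizer has already produced a string, and it substitutes plain string similarity from Python's standard `difflib` for real visual recognition. `WordBindings` and the cutoff value are hypothetical.

```python
from __future__ import annotations

import difflib


class WordBindings:
    def __init__(self, cutoff: float = 0.7):
        self.cutoff = cutoff                       # minimum similarity to count as a match
        self.by_word: dict[str, list[str]] = {}    # written word -> bound components

    def bind(self, word: str, component: str) -> None:
        self.by_word.setdefault(word.lower(), []).append(component)

    def resolve(self, recognized: str) -> list[str]:
        """Return candidate components, best match first.

        One candidate can be materialized directly; several would be
        offered to the developer as a small menu.
        """
        words = difflib.get_close_matches(
            recognized.lower(), list(self.by_word), n=3, cutoff=self.cutoff)
        return [component for word in words for component in self.by_word[word]]


bindings = WordBindings()
bindings.bind("blur", "blur_filter")
bindings.bind("blue", "blue_tint")
bindings.bind("resize", "resize_filter")

print(bindings.resolve("resise"))  # a slightly misread word
print(bindings.resolve("blue"))    # close to two bound words, so a menu is needed
```

The relaxed cutoff stands in for the 'somewhat more relaxed' recognizer: the gesture already tells the system that a bound word is being written, so near misses are acceptable.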
Pen and paper bindings are potentially very efficient: a whole application might be whipped up by writing a few words and perhaps performing a few gestures to compose them. They are also flexible, e.g. one could write upon notecards for later rearrangement, or grab an empty notebook and create a catalog of components for quick access.
Aside: this approach to pen and paper programming doesn’t address free-form text very well because it doesn’t make sense to associate components at the level of individual characters. However, more specialized approaches can be taken, e.g. a more generic way to grab text from the environment, or projecting a virtual keyboard.
In addition to the augmented physical workspace, developers may have ‘virtual’ workspaces – e.g. a rectangle that can be laid out wherever needed, with components positioned upon it. Virtual workspaces have several advantages, e.g. one could be visible to and shared by multiple developers, or used by a single developer at multiple locations (e.g. home and office). Virtual workspaces are initially accessed by gestures and spoken words, but may also be bound to any physical representation (even a hand-written one). Finally, developers can access a history – components seen, touched, tagged, destroyed. A complete lifetime history is not infeasible, especially when leveraging shared structure for compression.
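One way shared structure could keep a lifetime history small is content addressing: each distinct sub-component is stored once under a hash of its contents, and history events refer to it by that hash. The sketch below is one possible approach, not a claim about how such a system must work. The `(kind, parts)` tuple encoding and the `History` class are assumptions made for illustration.

```python
import hashlib
import json


class History:
    """Append-only history that stores each distinct sub-component once."""

    def __init__(self):
        self.store = {}   # digest -> {"kind": str, "parts": [digest, ...]}
        self.events = []  # (event name, root digest), in order of occurrence

    def intern(self, component) -> str:
        """Store a (kind, parts) component tree and return its content digest."""
        kind, parts = component
        node = {"kind": kind, "parts": [self.intern(part) for part in parts]}
        encoded = json.dumps(node, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        self.store.setdefault(digest, node)  # identical subtrees share one entry
        return digest

    def record(self, event: str, component) -> None:
        self.events.append((event, self.intern(component)))


decoder = ("jpeg_decoder", [])
converter = ("convert", [decoder, ("tga_encoder", [])])

history = History()
history.record("seen", decoder)
history.record("touched", converter)
history.record("destroyed", converter)

print(len(history.events), "events,", len(history.store), "stored nodes")
```

Recording the same component again adds only a small event entry, so a destroyed component can be recovered from its digest without the history growing with every interaction.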
Development consists primarily of materializing components into a workspace, composing them, then potentially wrapping the result into an opaque shell. The shell hides implementation details (more as anti-distraction; developers can look inside if they want) and controls which interfaces are exposed. An ‘incomplete’ component may also have gaps or holes that need to be filled before it can be activated, e.g. like plugging in a graphics card or battery.
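The idea of holes that must be filled before activation can be sketched in a few lines. This is a toy model with hypothetical names (`Component`, `plug`, `activate`); a real system would presumably check the type of whatever is plugged in, which is left out here.

```python
class Component:
    """A component with named slots ('holes') and a shell exposing some interfaces."""

    def __init__(self, name, slots=(), exposed=()):
        self.name = name
        self.slots = {slot: None for slot in slots}  # None marks an unfilled hole
        self.exposed = frozenset(exposed)            # interfaces visible through the shell

    def plug(self, slot, part):
        if slot not in self.slots:
            raise KeyError(f"{self.name} has no slot {slot!r}")
        self.slots[slot] = part

    def missing(self):
        return sorted(slot for slot, part in self.slots.items() if part is None)

    def activate(self):
        if self.missing():
            raise RuntimeError(f"{self.name} still needs: {', '.join(self.missing())}")
        return f"{self.name} is active"


viewer = Component("image_viewer", slots=["decoder", "display"], exposed=["open_file"])
viewer.plug("decoder", Component("jpeg_decoder"))
print(viewer.missing())  # the 'display' hole is still open
viewer.plug("display", Component("hud_display"))
print(viewer.activate())
```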
The act of composition involves placing compatible interfaces in close proximity. If they are compatible, they attach (somewhat magnetically). Otherwise, extra work is required. One can also ‘wire’ components together – a wire is effectively an identity function or component. To simplify safe composition:
- Interfaces are automatically shaped and colored based on type compatibility. Even if the visual aspect is not unique, this works well; e.g. if there are only 16 basic representations, assigned roughly uniformly, then there is only a 1/16 chance that two incompatible types share a visual representation, and much less for pairs (1/256) or triples (1/4096) of types.
- Easy search of virtual workspaces and histories for components with compatible types. E.g. if I want a JPEG to TGA conversion, it should be easy to find all interfaces accepting JPEG, outputting TGA, or both.
- While holding a component, perhaps highlight components in the workspace with compatible interfaces.
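The search and highlighting aids in the list above amount to simple queries over interface types. Here is a minimal sketch, assuming each component advertises a set of accepted and produced type names. Real interface types would be richer than strings, and `Component`, `search`, and `compatible` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    accepts: frozenset
    produces: frozenset


def search(components, accepts=None, produces=None):
    """Find components that accept and/or produce the given types."""
    return [c for c in components
            if (accepts is None or accepts in c.accepts)
            and (produces is None or produces in c.produces)]


def compatible(held, others):
    """Components that could attach to the held one, in either direction."""
    return [c for c in others
            if held.produces & c.accepts or held.accepts & c.produces]


catalog = [
    Component("jpeg_to_tga", frozenset({"JPEG"}), frozenset({"TGA"})),
    Component("jpeg_to_png", frozenset({"JPEG"}), frozenset({"PNG"})),
    Component("png_to_tga", frozenset({"PNG"}), frozenset({"TGA"})),
    Component("tga_viewer", frozenset({"TGA"}), frozenset()),
]

print([c.name for c in search(catalog, accepts="JPEG")])
print([c.name for c in search(catalog, accepts="JPEG", produces="TGA")])

# While holding jpeg_to_png, highlight whatever it could connect to.
held = catalog[1]
print([c.name for c in compatible(held, [c for c in catalog if c is not held])])
```

Searching on one side only (just `accepts` or just `produces`) also surfaces multi-step routes, such as JPEG to PNG to TGA, that a developer could wire together by hand.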
In addition to semantic interfaces, component enclosures may have 3D shape and color, documentation, and perhaps ‘debugging’ interfaces (gauges, lights, bells, whistles) that are only active under special conditions (such as debugging or live programming).
Developers can easily share code. A set of documented components might be shared directly or as part of a set of bindings (e.g. for a catalog). Either way, the components could be accessed by search. Multi-user virtual workspaces can feasibly be shared via a cloud. If developers really need external dependencies (perhaps for independent development), one could have a primitive that accepts a URL for an external component.
Each binding consists of a relatively unique recognizer, a few binding properties (e.g. relative offset, size, orientation, default copy vs. take), and a component descriptor (or virtual workspace id). A component descriptor is recursive for all sub-components, and is formally pass-by-value but may take advantage of shared sub-structures when stored or serialized. A workspace identifier is a reference to an external workspace definition, which allows virtual workspaces to be shared.
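The binding record described here might look roughly like the following. This is a sketch with assumed field names and types: the recognizer is reduced to a string key, orientation to a single angle, and 'copy vs. take' to a boolean. Immutable dataclasses stand in for pass-by-value descriptors.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ComponentDescriptor:
    kind: str
    parts: Tuple[ComponentDescriptor, ...] = ()  # recursive: sub-components by value


@dataclass(frozen=True)
class WorkspaceId:
    ref: str  # reference to an external, shareable workspace definition


@dataclass(frozen=True)
class Binding:
    recognizer: str                                  # stand-in for a visual recognizer
    target: Union[ComponentDescriptor, WorkspaceId]
    offset: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    size: float = 1.0
    orientation_deg: float = 0.0
    take_on_grab: bool = False                       # False: grabbing yields a copy


def grab(bindings: Dict[str, Binding], recognizer: str):
    """Grab from a bound object: copy by default, or take and break the binding."""
    binding = bindings[recognizer]
    if binding.take_on_grab:
        del bindings[recognizer]
    return binding.target


decoder = ComponentDescriptor("jpeg_decoder")
converter = ComponentDescriptor("convert", (decoder, ComponentDescriptor("tga_encoder")))
bindings = {
    "mug": Binding("mug", converter),
    "card:7": Binding("card:7", decoder, take_on_grab=True),
    "notebook": Binding("notebook", WorkspaceId("team/shared-workspace")),
}
print(grab(bindings, "mug") == converter, sorted(bindings))
```

Since descriptors are immutable and compared by value, two bindings can hold equal descriptors without aliasing concerns, while a `WorkspaceId` deliberately behaves as a reference so that several developers can point at the same workspace.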
An ‘application’ is simply a component with a standardized interface. Developers will be able to activate application-typed components, then debug them.
A challenge associated with debugging is that we may want some specialized views, i.e. so we can see what different subcomponents are doing concurrently even if they aren’t near one another physically. This would need to be another specialized feature of the augmented reality model, supporting ‘views’ – perhaps like portals that can see remote subcomponents. It may be preferable to build such views directly into the components to start with, as part of the component’s debug interface.
Anyhow, I think this is getting concrete enough to start pursuing when I have time.