Automated Automation

Tuesday, February 7, 2012

Automated automation is the ability of a tool to aid the user by deriving the automation of a task from the observation of a performance of that task.

We have many tools that permit automation. Scripting languages and shell scripts are an excellent example. If you can perform a task in bash, you can write a script to perform it, use cron to schedule it, or build a Web service around it.

However, performing a task and automating the task are still separate activities. In a more efficient system these activities are the same. Performing a task automatically automates that task.

This is one of my goals for the ActiveShell project. After you perform a task in ActiveShell, you have already automated it. Writing a shell script is not a separate activity, but merely giving a name to a series of operations already performed.

Another case of automating automation is the GIMP non-destructive editing UI work described recently by Peter Sikking. In this system, user actions update a graph which describes the relationship between image inputs and image operations which combine to create the final output image or images. Rather than destructively updating a bitmap, the user is interactively creating a program that generates that output. We can already script image processing tasks using libraries and programming languages, but that is not the same activity as directly using an image editor. By bringing these activities together, these systems open up a world of power and convenience to the user without requiring a separate and seemingly unrelated skill.

In text editing, the analogous innovation is to save not merely the buffer, but rather the full history of editing actions that brought the document from the original empty document to the current state.
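As a rough illustration of what saving the history rather than the buffer could look like, here is a minimal Python sketch. The names `Edit` and `Document` are hypothetical and not taken from any existing editor; the sketch assumes every action can be expressed as replacing one span of text with another.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """One editing action: replace `removed` with `inserted` at `pos`."""
    pos: int
    removed: str
    inserted: str

    def apply(self, text: str) -> str:
        end = self.pos + len(self.removed)
        if text[self.pos:end] != self.removed:
            raise ValueError("edit does not match the document state")
        return text[:self.pos] + self.inserted + text[end:]


@dataclass
class Document:
    """A document stored as its full edit history, not as a snapshot."""
    history: List[Edit] = field(default_factory=list)

    def record(self, edit: Edit) -> None:
        edit.apply(self.text())  # raises if the edit does not fit the state
        self.history.append(edit)

    def text(self, upto: Optional[int] = None) -> str:
        """Replay the first `upto` edits (all of them by default)."""
        state = ""  # every document starts from the empty document
        for edit in self.history[:upto]:
            state = edit.apply(state)
        return state


doc = Document()
doc.record(Edit(0, "", "Hello wrld"))
doc.record(Edit(6, "wrld", "world"))
print(doc.text())   # the current state
print(doc.text(1))  # "time travel" to the state after the first edit
```

The buffer is never stored here; any earlier state is recovered by replaying a prefix of the history, which is also what makes undo and time-travel fall out of the representation.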

Rather than saving the instantaneous results of the user's efforts and forgetting the efforts themselves, in a deep sense these systems capture more of the actual work the user is doing.

I believe stored actions will displace the saved-snapshot model in all creation and productivity interfaces. Action-oriented editing provides immediate benefits such as unlimited and non-linear undo, "time-travel" editing, and the elimination of File→Save, but also opens other more dramatic possibilities. Action histories contain information, lost by snapshots, that correlates with likely future edits.

The next step in automating automation is to automate the detection and recommendation of what can be automated.

When editing text, I might copy a plain text file, and begin editing to create HTML. At the first line of the first paragraph of plain text, I add a <p> tag, and a matching closing </p> tag after the last line. After I have done this on three paragraphs, a sufficiently intelligent text editor can recognize the pattern and offer to automatically add these tags to the remaining paragraphs of text.

For this the editor must recognize the structural significance of plain text paragraphs separated by blank lines and identify the start and end of these paragraphs as the location of the related edits (possibly interspersed with other edits). It must infer the emerging pattern ("first three paragraphs edited similarly"), and extrapolate and suggest a reasonable continuation ("edit the remaining paragraphs analogously").
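The paragraph example can be sketched in a few lines of Python. This is a deliberately simplified version: it compares a before and after snapshot instead of a true action history, it only understands the "same text added around a paragraph" pattern, and the function names and the threshold of three are assumptions chosen to mirror the example above.

```python
import re


def paragraphs(text):
    """Split plain text into paragraphs separated by blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]


def wrapper(before, after):
    """Return (prefix, suffix) if `after` is `before` with text added around it."""
    start = after.find(before)
    if start < 0 or before == after:
        return None
    return after[:start], after[start + len(before):]


def suggest(original, edited, threshold=3):
    """Offer to continue a wrapping pattern seen on `threshold` paragraphs."""
    old, new = paragraphs(original), paragraphs(edited)
    if len(old) != len(new):
        return None  # this sketch only handles in-place paragraph edits
    seen = [w for w in map(wrapper, old, new) if w is not None]
    if len(seen) < threshold or len(set(seen)) != 1:
        return None  # too few examples, or the edits are not all alike
    prefix, suffix = seen[0]
    # Leave already-edited paragraphs alone and wrap the untouched ones.
    return "\n\n".join(
        n if wrapper(o, n) is not None else prefix + n + suffix
        for o, n in zip(old, new)
    )


original = "One.\n\nTwo.\n\nThree.\n\nFour.\n\nFive."
edited = "<p>One.</p>\n\n<p>Two.</p>\n\n<p>Three.</p>\n\nFour.\n\nFive."
result = suggest(original, edited)
print(result)
```

With only two paragraphs tagged, `suggest` returns `None` and the editor stays quiet; the third matching edit is what triggers the offer.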

After approving the editor's suggestion on paragraphs, I might begin manually converting URLs, text emphasized by asterisks or slants, and inline code marked by backticks to the corresponding HTML elements. When the editor can identify and extract the rules that govern these manual edits, it can automatically automate them. With a single intelligent text editor feature, we replace Markdown and every variant of it, as simply special cases of an automatically automated operation. Of course, this is more general as it may not have been valid Markdown (or anything else) that we began with. We begin with what we have, create what we want, and let the editor derive the transformation.

A spellchecker is also a system for identifying and suggesting likely future editing actions. When operating over aggregate data from many users and editing histories, a text editor will find that "thsi" appearing as a word in English text is, with high probability, eventually replaced by "this" in further editing. The intelligence in a spellchecker, however, is not in looking for misspellings that have occurred before, but rather in identifying misspelled words (low-probability character sequences) from the vastness of unseen possible errors, and guessing the most likely correction.

Generalizing these examples, we find two capabilities: to recognize states in which edits are likely, and from such a state to suggest an editing action that the user is likely to accept.

When editing programs, renaming a method and updating call sites in one step is another instance of this pattern. This common IDE feature would be obviated by a more general feature that detects such patterns of edits and synthesizes such recommendations. In such an editor, when renaming a method, the corresponding rename of the call sites would be suggested by similarity to previous editing histories. That this behavior can emerge from a general feature, rather than needing to be specifically programmed, is significant, because it means that similar gains can be expected from the recognition and automation of other editing patterns of which we may not even be aware.

How feasible are these features? The first step is to store edit action histories rather than merely storing the end result of actions. These are the raw material on which these algorithms operate. Having that data available, we need recognizers which identify patterns in editing operations and in the data being edited. This requires interpreting data at the semantic levels relative to which these operations make sense: in a program, the nodes of a syntax tree; in English prose, the words, phrases, sentences, and paragraphs; in image editing, the shapes and objects (recognizable by human or machine vision) occurring in the image. Such a recognizer takes as input a sequence of editing actions and the state (possibly already annotated by other recognizers) of the data as it exists after those actions, and produces a probability that the pattern it recognizes exists in the input. For example, a renamed method definition, or an image of a human face with the red-eye artifact, are machine-recognizable patterns highly correlated with future edits. In the second step, the pattern generates a suggested editing action, such as renaming call sites or applying red-eye removal.
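To make the two steps concrete, here is one possible shape for a recognizer, sketched in Python for the renamed-method case. Everything here is hypothetical: the `Rename` action record, the class and method names, and the fixed probability values are placeholders. A real recognizer would work on a syntax tree instead of matching identifiers in raw text, and would estimate its probabilities from edit histories.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rename:
    """Hypothetical action record: a definition was renamed from old to new."""
    old: str
    new: str


class RenamedMethodRecognizer:
    """Recognizes 'definition renamed, but the old name is still in use'."""

    def _stale(self, name: str) -> "re.Pattern[str]":
        # Match the old name only as a whole identifier.
        return re.compile(rf"\b{re.escape(name)}\b")

    def probability(self, actions: List[Rename], state: str) -> float:
        """Step 1: how likely is it that this pattern is present?"""
        if not actions:
            return 0.0
        found = self._stale(actions[-1].old).search(state)
        return 0.9 if found else 0.0  # placeholder values, not measured

    def suggest(self, actions: List[Rename], state: str) -> Optional[str]:
        """Step 2: propose an edit, here renaming the remaining uses."""
        if self.probability(actions, state) < 0.5:
            return None
        last = actions[-1]
        return self._stale(last.old).sub(lambda m: last.new, state)


state = "def total(xs):\n    return sum(xs)\n\nprint(sum_all([1, 2]))\n"
actions = [Rename(old="sum_all", new="total")]
recognizer = RenamedMethodRecognizer()
print(recognizer.suggest(actions, state))
```

Keeping `probability` and `suggest` separate follows the split described above: many recognizers can score the same state cheaply, and only the ones that score well need to produce a concrete edit.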

Both the recognition of states in which edits are likely and the generation of edits can be partly determined by analysis of editing histories. Edits such as swapping adjacent letters are highly correlated with starting document states that can be characterized as prose containing low-probability character sequences. If lower-level recognizers can identify such document states and such edits, the correlation can be derived. These kinds of correlations can let us directly calculate some of the probability values that Peter Norvig describes in his spelling corrector article.
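A small Python sketch of deriving such values from a history follows. It assumes the history has already been reduced to (before, after) word replacements, which is itself the job of a lower-level recognizer, and the sample data is invented for illustration. It is a crude stand-in for the idea, not a reproduction of Norvig's code.

```python
from collections import Counter, defaultdict


def is_transposition(before, after):
    """True if `after` is `before` with exactly one adjacent pair swapped."""
    if len(before) != len(after) or before == after:
        return False
    diff = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and before[diff[0]] == after[diff[1]]
        and before[diff[1]] == after[diff[0]]
    )


def correction_model(replacements):
    """Estimate P(correction | word) from observed word replacements."""
    counts = defaultdict(Counter)
    for before, after in replacements:
        counts[before][after] += 1
    return {
        word: {fix: n / sum(fixes.values()) for fix, n in fixes.items()}
        for word, fixes in counts.items()
    }


# Invented sample history of word replacements.
history = [("thsi", "this"), ("thsi", "this"), ("thsi", "thesis"), ("teh", "the")]
model = correction_model(history)

# Share of observed corrections that were adjacent-letter swaps.
swap_share = sum(is_transposition(b, a) for b, a in history) / len(history)
print(model["thsi"], swap_share)
```

The per-word table only covers errors that have been seen before; the share of swaps is the more general quantity, since it says something about how to weight candidate corrections for words never seen in any history.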

We already have accurate classifiers which can be used to spot patterns in email handling. An email filter rule is simply the automation of an observed behavior on emails that match a particular class, and the technology exists to automatically classify messages and derive the rules from observations, rather than requiring users to manually configure email automation.

A system that suggests actions may see those suggestions either applied or rejected by the user. The success rate of suggested actions is useful feedback for tuning the system, by promoting those recognizers that yield popular suggestions, and devaluing those that generate spurious suggestions. There is a cost in user attention to suggesting an editing action, and this must be balanced against the gain. Automatic edits should be suggested when the benefit to the user of applying the automatic edit times the probability the edit will be accepted exceeds the cost of suggesting an action. None of these values is known in advance, but all can be estimated from edit and suggestion histories.
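The rule in the previous paragraph is simple enough to write down directly. In this Python sketch the function names are hypothetical, benefit and cost are assumed to be in the same unit (say, seconds of user time), and the acceptance probability is estimated from past accept and reject counts with add-one smoothing, which is one reasonable choice among many.

```python
def acceptance_rate(accepted, rejected):
    """Estimate P(accept) from past outcomes, with add-one smoothing."""
    return (accepted + 1) / (accepted + rejected + 2)


def should_suggest(benefit, accepted, rejected, cost):
    """Suggest only when benefit * P(accept) exceeds the attention cost."""
    return benefit * acceptance_rate(accepted, rejected) > cost


# A recognizer whose suggestions are usually accepted: 30 * 0.75 > 2
print(should_suggest(benefit=30, accepted=8, rejected=2, cost=2))   # True

# A recognizer whose suggestions are almost never accepted: 30 * 0.02 < 2
print(should_suggest(benefit=30, accepted=0, rejected=48, cost=2))  # False
```

The smoothing means a brand-new recognizer starts at an estimated acceptance rate of one half, so it gets a chance to make suggestions before its track record promotes or silences it.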

We can describe a continuum from no automation to fully automated automation where the system becomes an assistant, recommending actions that allow the machine to do more of the work. Currently we have automated systems, but the automation is seldom built or extended by the same people who use the system. The ability to automate ad-hoc processes is available to relatively few, and most of the work that could be automated is not automated because automating it is a skilled manual process. We should not expect this to change until the tools themselves aid in the automation. By building tools that automate automation, those who can perform a task become empowered to automate it away.