Remarks on JavaScript Whitespace

Thursday, April 14, 2010

Many decry comma-first as being "ugly" or "unclean". However, this is a statement about one's brain and habituation, not about the code itself. In my opinion, "clean" is defined as "easier to interpret quickly". If a given coding convention makes differences look different (especially errors) and another coding convention makes errors harder to spot, then it's clear that the first is more "clean". If you disagree, then apparently, "clean" just means "looks like what I saw yesterday and the day before". This rubric is worse than useless, as it specifically prevents innovation or improvement.

- Isaac Schlueter

People seeing my code formatting conventions for the first time sometimes ask "Wow, how do you read that!?". The answer is "better than any other style I've ever seen". The code I write is optimized for readability, and that is exactly why it looks the way it does.

However, habituation can be hard to overcome, and people frequently confuse unfamiliarity (subjective; an aspect of the observer) with ugliness or uncleanliness (objective; an aspect of the code).

The Ragged Left Edge

The first thing you need to know when reading code is where the control flow statements are, and where blocks begin and end. If the next 4 lines are an if statement that tests the debug flag, you can often skip that entire block, just as soon as you ascertain where it ends; if the function you're looking at is not the one you're looking for, you want to skip to the next function as quickly as possible, and so on. Whether getting acquainted with a program for the first time, or looking for a specific location in code we know very well, a large part of what we do is simply scanning for the start and end of control flow structures, so we should structure our code to make this easy.

To accomplish this, we use indentation to encode structure in the left edge of the text, where it can be scanned quickly.

Here's an example, with line numbers added, from Douglas Crockford's code conventions:

 1    if (condition) {
 2        statements
 3    }
 4    
 5    if (condition) {
 6        statements
 7    } else {
 8        statements
 9    }
10    
11    if (condition) {
12        statements
13    } else if (condition) {
14        statements
15    } else {
16        statements
17    }

As we can see, the control flow structures can be easily followed by looking at the left edge alone.

The rule here is that any line which is subsumed within or "under the control of" an earlier line is indented further to the right. To find the end of a structure, one scans the left edge looking for a line which is at the same level of indentation.

The second thing you need to know about control flow structures, after where they begin and end, is what kind they are (while loop, if statement, else clause, etc) and what relation they have to the rest of the code. To make this as immediately obvious as possible, we should put the most important structural token on the left edge. When scanning the left edge, one wishes to know whether the block in question is interesting or not, and the token most likely to determine this should be the first one on which the scanning eye falls.

In the example above, this criterion is met for the if statements, but the else clauses on lines 7, 13, and 15 have the closing brace of the previous clause preceding the else, so when scanning down the left edge, the first thing the eye sees is this closing brace. The closing brace doesn't even participate in the block starting on that line; it participates in the previous block, so it is useless for determining whether the block that actually starts on that line is interesting. We can do better by moving the uninteresting closing brace to the end of the previous line, to which it logically belongs.

 1    if (condition) {
 2        statements
 3    }
 4    
 5    if (condition) {
 6        statements }
 7    else {
 8        statements
 9    }
10    
11    if (condition) {
12        statements }
13    else if (condition) {
14        statements }
15    else {
16        statements
17    }

Now the most important token is the leftmost, in both the if and else lines, and we can read straight down the left edge to get the control flow.

However, we still have those closing braces on lines 3, 9, and 17, taking up valuable positions on the left edge. What the braces tell us is where the blocks end, but this is already encoded by the indentation of the other lines! For example, the brace on line 3 tells us that the if statement from line 1 is ended, but we would know this anyway as soon as we see line 5 which is indented by the same amount. These braces in the left edge carry no new information, and the redundancy is simply visual noise. We should not tolerate noise in the all-important left edge, so let's put all the closing braces on the right, directly after the statements which they terminate.

 1    if (condition) {
 2        statements }
 3    
 4    if (condition) {
 5        statements }
 6    else {
 7        statements }
 8    
 9    if (condition) {
10        statements }
11    else if (condition) {
12        statements }
13    else {
14        statements }

Now we have a noise-free left edge. Everything in that edge bears important information about control flow, and simply scanning the left-most tokens gives "if, if, else, if, else, else", which is the highest-level overview we could ask for of this code block. Additionally, we saved three lines.

If you are a programmer you may already be so used to the dominant brace style that you actually doubt that this last example is clearer than the first. If so, I challenge you to convert a project you work on to this style and try it for a week. Alternatively, find the nearest non-programmer and ask them which of the three examples above is clearest.

With this style, everything visible in the left edge is the beginning of a meaningful line of code; no left edge positions are wasted on a mere closing brace. The indentation rule is very simple: if and only if a line is subsumed by a line above, it is indented further to the right. Lines that are equally indented, such as lines 1, 4, and 6 above, are siblings, at the same level of significance in the overall structure. There is never a line containing anything as useless as a mere closing brace; rather, closing punctuation that would be redundant with indentation is put at the end of the last line of the block.
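To make the schematic examples concrete, here is a minimal runnable sketch of the same layout. The function name describe_temperature and its thresholds are hypothetical, invented only for illustration.

```javascript
// Hypothetical example: each closing brace sits at the end of the last line
// of its block, so the left edge reads "function, if, else if, else".
function describe_temperature(celsius) {
    if (celsius < 0) {
        return 'freezing'; }
    else if (celsius < 25) {
        return 'mild'; }
    else {
        return 'hot'; }}
```

Scanning only the first token of each indented line is enough to recover the three branches.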

Vertical Space

Vertical space is precious.

If you're a professional programmer, how frequently do you work with programs that fit entirely in a single vertical screenful in your editor? It's a worthy goal to keep every program or component this small, but rarely achieved.

Scrolling to edit somewhere else in the same file is relatively cheap, and active searching (e.g. Ctrl-F or '/' or whatever your editor uses) makes most manual scrolling while editing unnecessary (though reading new code is a different matter). However, the screen isn't just the place where I edit text, it's also a kind of cache for my working memory. The more of the program I can see at once, the more of the program I can retrieve and reason about at a glance. I may not be able to significantly expand my working memory, but I'll take any advantage I can to keep more of the code in sight, where I can retrieve relevant facts from the screen, rather than some unknown place in the scroll buffer.

Given all this, I'm surprised at how profligate most developers are with vertical space, throwing it away as if it had almost no value at all. The example of closing braces on separate lines is one we saw above, but there are many more.

Here's an example, taken from the JSLint source code, and spanning 19 lines of precious vertical space:

    function assume() {
        if (!option.safe) {
            if (option.rhino) {
                combine(predefined, rhino);
            }
            if (option.devel) {
                combine(predefined, devel);
            }
            if (option.browser) {
                combine(predefined, browser);
            }
            if (option.windows) {
                combine(predefined, windows);
            }
            if (option.widget) {
                combine(predefined, widget);
            }
        }
    }

Seven lines suffice:

    function assume() {
        if (!option.safe) {
            if (option.rhino)   combine(predefined, rhino);
            if (option.devel)   combine(predefined, devel);
            if (option.browser) combine(predefined, browser);
            if (option.windows) combine(predefined, windows);
            if (option.widget)  combine(predefined, widget); }}

In addition to being parsimonious with vertical space, is this not also more readable? The control flow could hardly be made more clear. I've also added extra horizontal whitespace as necessary to vertically align the combine function calls. The human visual system excels at pattern recognition and identifies parallel structures nearly instantly when they are lined up like this.

When the shorter code is also more readable, who would waste more than half of the vertical space in the editor window for nothing? And what possible good can we say about those final three closing braces, each alone on an entire wasted line? The above is a cherry-picked example from the JSLint source code, where the vertically compact style happens to also be more readable, and even shorter in bytes, and thus better in just about every way we can measure. Sometimes, however, there is a conflict between maximizing readability and minimizing vertical space. The correct tradeoff may depend on how "hot" the code is with respect to the idea mentioned above of using the screen to enhance effective working memory. If the code is not challenging or interesting, and there is no reason to make it easier to see more of it at once, then readability should be favored over vertical compression; otherwise vertical compression should be favored, to maximize the comprehensibility of the code.

A rule I abide by is that a single function should always fit in a single editor screenful. (What a screenful is depends both on the editor and the screen, so of course this is not a precise rule.) Not only any function, but any indented block, should be short enough that both the beginning and end of it can be seen on the same screen. If this is not the case, it needs to be split apart or rewritten.

In the case of the common idiom of using an anonymous function to create a top-level scope for an entire program, the rest of the program should not be indented one step simply because this idiom is in use; the enclosing ";(function(){" and "})()" lines are not quite first-class lines of code, just talismans to ward off the evils of global scope pollution. When you read code, every step of indentation represents a stack frame in your mental model of the code. You need to know "I am inside a function, inside an if statement, inside a for loop...", and when the next code block doesn't begin flush left, you must ask why. It's not worth indenting an entire file just because it's wrapped in a function for scoping reasons, any more than you should indent an entire C header file when wrapping it in ifdef preprocessor directives.
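A minimal sketch of what this looks like in practice follows. The names counter and next_id are hypothetical, and the export through globalThis is an assumption that requires a reasonably modern engine; it is there only so the wrapped code has one deliberately visible name.

```javascript
// The wrapper lines are flush left, and so is everything inside them.
;(function(){
var counter = 0                 // private to the wrapper, not a global

function next_id() {
    counter += 1
    return 'id' + counter }

globalThis.next_id = next_id    // hypothetical: expose exactly one name on purpose
})()
```

Everything between the two wrapper lines starts at column zero, so the first step of indentation still means "inside a function that matters".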

Comma-first Style and the Left Edge

Isaac Schlueter commented on the comma-first style, which was the initial spark for this post, and gave the following example:

// standard style
var a = "ape",
  b = "bat",
  c = "cat",
  d = "dog",
  e = "elf",
  f = "fly",
  g = "gnu",
  h = "hat",
  i = "ibu";
 
// comma-first style
var a = "ape"
  , b = "bat"
  , c = "cat"
  , d = "dog"
  , e = "elf"
  , f = "fly"
  , g = "gnu"
  , h = "hat"
  , i = "ibu"
  ;

The comma-first style has commas in the left edge, directly under the var token. This immediately makes the structure of the code clear, and lines up the variable names as well. If one is scanning the left edge for code structure, the commas clearly indicate that the var statement continues across the subsequent lines. If scanning these lines, the information each left-most token gives is "starting a var statement, continuing the var statement, continuing..., var statement is ended", whereas the standard style example gives "starting a var statement, there is a b here, there is a c here...", and the fact that each line starts with a different character makes the parallelism harder to see. The statement in comma-first style can be more easily read by scanning the left edge, unless the particular variable names are of interest. Visually, then, this style emphasizes the structure of the statement over the contents, making it easier to navigate visually within the structure. It is because the structural information is emphasized that this style makes structural mistakes, such as dropping a comma when editing, easier to see.
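To illustrate the dropped comma, here is a hedged sketch that puts the same mistake into both layouts. The helper leaked_globals is hypothetical, and the snippets are held in strings and run through new Function so that they execute as non-strict code, where the mistake is silent instead of raising a ReferenceError.

```javascript
// Both snippets contain the same editing mistake: the comma after "bat" is gone.
// Automatic semicolon insertion ends the var statement there, and the rest
// becomes plain assignments to undeclared names.
var standard_style = `
var a = "ape",
  b = "bat"
  c = "cat",
  d = "dog";
`
var comma_first_style = `
var a = "ape"
  , b = "bat"
    c = "cat"
  , d = "dog"
  ;
`
// Hypothetical helper: run a snippet and report which names leaked out as globals.
function leaked_globals(source) {
    var names = ['a', 'b', 'c', 'd']
    names.forEach(function (n) { delete globalThis[n] })
    new Function(source)()
    return names.filter(function (n) { return n in globalThis }) }
```

Both snippets leak c and d, so the behavior is identical. The difference is only visual: in the comma-first version the hole in the comma column marks the line where the statement was broken.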

I typically put var statements all on one line after the opening brace of a function, and then initialize variables separately, but for intermixed variable declaration and initialization, I use the comma-first style to make reading and editing slightly easier. I get the most use from comma-first style with object and array literals.

In the case of literals, I generally position the opening brace naturally within the code, and then indent the commas to the same depth on subsequent lines. Sometimes I put the closing brace on the line below the last comma, to emphasize the structure even more and make editing easier, but usually I favor vertical space and put the closing brace directly after the last item. If the literal is large or deeply nested, I might put the opening brace on a new line to reduce the total amount of indentation.
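As a rough sketch of these rules applied to standalone literals, consider the following. All names and values (colors, server, routes) are hypothetical. The nested array starts on a new line to keep the total indentation down.

```javascript
// Commas line up under the opening bracket; the closing bracket follows the last item.
var colors = ['red'
             ,'green'
             ,'blue']

// The nested array is large enough that its opening bracket moves to a new line.
var server = {host:'localhost'
             ,port:8080
             ,routes:
  [{path:'/',      handler:'index'}
  ,{path:'/about', handler:'about'}]}
```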

Here is an example with a return statement containing an object literal:

function parse_input(input){var m
 m=/^(?:(new|old|current) )?(?:([a-zA-Z_]+) )?([a-zA-Z0-9/_]+)$/.exec(input)
 if(!m)throw new Error('could not parse '+input)
 return {path:resolve(path,m[3])
        ,type:m[2]
        ,mode:m[1]||'current'}}

Following the left edge makes this code easy to read: first there is a function, which contains an assignment statement, an if statement, and then a return statement. The indentation makes the last two lines quite obviously part of the return statement, and the structure of the returned object is immediately clear.

Functions

I write in a function-heavy style, creating many functions which are used in only one or two places and serve to increase readability by naming a chunk of code the meaning of which would have to be puzzled out if inlined. I prefer to read if(flag)reset_position(obj) rather than if(flag)obj.pos=obj.stack[obj.stack.length-1], even if reset_position is used nowhere else. Of course one must put the reset_position function somewhere nearby. One of the benefits of treating vertical space as valuable is that the reset_position function may already be visible on the screen.

One popular style would write that function like so, spending at least three lines (four if separated by a blank line from other functions):

function reset_position(obj){
    obj.pos = obj.stack[obj.stack.length-1];
}

As you already know, I would not waste a line on the closing brace, but in fact I would only allow one line for the entire function:

function reset_position(obj){obj.pos=obj.stack[obj.stack.length-1]}

There is nothing wrong with putting a simple function on a single line. If a function is only used internally by another function, I often declare it in that function, usually at the end, as in this example:

function v6_dependencies(opts,rules){var ret={},deps
 go(opts.start)
 return ret
 function go(rule_name){var rule
  if(ret[rule_name])return // it has already been processed
  rule=rules[rule_name]
  if(!rule) throw new Error('Rule required but not defined: '+rule_name)
  ret[rule_name]=rule
  v6_direct_dependencies(rule.expr).map(go)}}

This also has the advantage that the function can be given a very short, descriptive name (in this case "go"), without concern for name collisions, since the inner function is local to the scope in which it appears.

When a function uses local variables, I use a single var statement. I almost always put this statement on the same line with the function parameters and opening brace, as in the example above, again to conserve vertical space, and because conceptually the var statement which allocates local variables is a part of the function signature as much as anything else. When the function is entered, space is allocated for each of its local variables, and JavaScript variables have function scope, so it makes sense to put them all at the top. Putting them on the same line saves vertical space, and increases readability: if you want to know which variables are local to the function, there is only one place to look, and any variable not found there must come from an enclosing scope. Sometimes this rule does make editing a little harder and requires a bit more work when writing a function, but for me the readability is worth it. With debugging statements that are added temporarily I will usually cheat and use a var statement at the point of use.
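The function-scope point can be shown with a small sketch. Both functions, sum_squares and scope_demo, are hypothetical examples written for this illustration.

```javascript
// `total` and `i` are declared once, on the signature line.
function sum_squares(list){var total=0,i
 for(i=0;i<list.length;i++)total+=list[i]*list[i]
 return total}

// A var declared inside a block is still scoped to the whole function,
// so moving it to the signature line changes nothing about its visibility.
function scope_demo(){
 if(true){var inner='visible'}
 return inner}
```

Because the inner declaration in scope_demo is visible throughout the function anyway, collecting declarations on the first line only makes explicit what the language already does.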

In for loops, it is common to assign a variable to the current element from the set that is being iterated over. As with functions and var statements, I often also put these on the same line with the opening brace for the for loop, for similar reasons. Conceptually, the assignment is more a part of the for loop control than a part of the loop body.
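A minimal sketch of this for-loop layout, using a hypothetical function longest_word:

```javascript
// `word=words[i]` shares the line with the for header and its opening brace,
// so the loop body below it deals only with `word`.
function longest_word(words){var i,word,best=''
 for(i=0;i<words.length;i++){word=words[i]
  if(word.length>best.length)best=word}
 return best}
```

The body of the loop is then a single line that never mentions the index.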

Semicolons

In JavaScript, semicolons at the end of statements are (usually) optional. I'm not going to discuss semicolons here, as they are not whitespace, except to say that I leave them out in my own projects. To me, code without semicolons is more readable, and the semicolons are a form of visual noise and a distraction, albeit a minor one. Whether you write JavaScript with or without semicolons, you need to understand the rules of automatic semicolon insertion, which is the topic of my next post.

Priorities

The ideal coding conventions for a particular situation depend on what you are optimizing for. If you are optimizing for writing code quickly, for readability, or for easily spotting syntax errors, you may end up with very different conventions. I prefer to use a text editor which alerts me to syntax errors as I type, so I am less likely to rely on formatting conventions to do the same. I generally optimize for readability, and not just the ability to easily run one's eyes over the code, but the ability to comprehend the code even when the logic is complex.

Final Notes

I've recently seen a few people overheat and say things like "I'm taking a stand against this idiotic practice" in regard to other people's preferred formatting conventions. If you find yourself about to use words like "idiotic" in a discussion about whitespace, you should probably take a break from the discussion until you can take a more rational, objective, and polite perspective.

If you criticize a style you've never used yourself, your criticism has almost no value, so before you knock it, try it first.

Trying to convince someone to prefer your style is usually a waste of time. These choices are largely arbitrary, like your taste in drapery or fish, and it would be absurd to argue that tuna is tastier with someone who prefers salmon. If there's a style you like, make your case and then let others choose according to their own taste.

In addition to taste, formatting conventions are influenced by the purpose of the code, the constraints under which it is written, and the tools used to write it. Don't hesitate to adapt your style to the circumstances, and don't be surprised when others do the same.

It is ridiculous to use formatting conventions as a proxy for code quality. It's a sign of just how difficult judging code quality is that people resort to useless criteria, but I've actually heard people say things like "Project X does what I need, but doesn't [use semicolons, pass JSLint, use my preferred brace style, etc], so I kept looking". If you are reduced to picking software on the basis of whether the author prefers your particular coding style or not, you may just as well admit that you do not know how to discern code quality and flip a coin instead.

In your own code, experiment all you like with formatting. Common practice leaves plenty of room for improvement, and experimenting will help you read and write code in unfamiliar styles you may encounter. If you are contributing to an existing project, you should adopt the conventions already in use there. Learning to adapt to existing conventions is part of becoming skilled at your craft.