Against CAPTCHA

Let’s Not Waste Human Effort

Alice, Bob, and Spamuel

When Paul Graham published A Plan for Spam in 2002, spam was mostly an email problem. Email and the entire landscape of online communication have changed deeply since then, but spam is still with us, both as a problem for email and as a constraint on the design of new systems.

The spam problem starts with someone we’ll call Alice, who wants anyone to be able to contact her, and Bob, who wants to contact Alice. Alice sets up a public inbox, which could be an email account, a web form, or any number of other things. By “public inbox,” we mean that anyone can send messages to it, even a potentially anonymous member of the public, and that, like an old inbox tray on a desk, Alice will eventually handle and see every item that is sent to her.

This sounds a bit like putting a bucket in front of your house with a sign that says “all messages will be read!” For most of us, that would be fine, and we would have no trouble keeping up with the messages we received, because, among other reasons, not that many people pass in front of our house.

So far Alice and Bob’s system seems to work, but now a third person comes along, whom we’ll call Spamuel. Spamuel realizes that he can profit by using the system we have set up in a way we did not intend. By the design of our public inbox, we have essentially sold Alice’s attention for nothing. Since there are people willing to pay a non-zero price for that attention, Spamuel can capture the difference.

The Public Inbox Problem

This problem of attention is the “public inbox problem”.

The public inbox problem applies to email, online comment sections, web forms intended for commercial inquiries, personal messages sent to public figures, organizations, companies, or brands, and anonymous messages intended for any audience. Any system where messages are easy to send and are expected to be read must address this problem.

Initial responses to the spam problem in email came from email service providers, who bore the economic costs of the spam epidemic, and looked for economic solutions such as bulk filtering. False positives were a necessary evil from the service provider’s perspective.

Statistical filtering of the kind Paul Graham described worked well on the spam of the day, but it did not and could not fully solve the problem. Spam still exists as a problem which is only partly solved by large providers, and the partial solutions have all coincidentally further consolidated the email system. Meanwhile, email itself is losing ground to newer systems.

Systems designed since email have dealt with the spam problem in different ways, generally by making it slightly expensive, in one way or another, to send a message. When the cost is monetary, the payment usually goes to the operator of the system, creating various kinds of perverse incentives. “Walled garden” messaging systems from Facebook to WeChat have largely given up on the openness of email, which means that anonymous messages are impossible, and doing business through these systems requires the cooperation of the owners of the walled gardens.

Another way to disincentivize spam is to add an economic cost that is payable to the recipient. This could be a small token attached to the message, a bit like a stamp in the postal system, except that unlike a stamp it would retain its value when used. The public inbox would reject any message arriving without the required payment, at a level set by the inbox owner. When the message was desired, the token would be canceled or returned to the sender, so sending it would effectively cost nothing. When the message was spam, the recipient would simply keep the payment, which compensates them directly for their time and inconvenience.

Such a spam tax would solve the spam problem outright. It would cost legitimate senders nothing, would still allow all kinds of marketing messages to be sent and delivered, would compensate the recipients of unwanted messages directly, with none of the distorting effects of payments to a third party, and would be compatible with distributed, open systems such as email rather than the walled gardens that have prevailed since.

Why wasn’t this obvious solution added to email long ago? Unfortunately, like so many perfect solutions, it fails for entirely practical reasons, including the lack of an appropriate payment system and the network effect of the existing email system. Since email, there has been no widely-adopted system that was as open and decentralized, so there has never been a chance to try it.
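To make the mechanics concrete, here is a minimal sketch, in Python, of how such a refundable spam tax might behave at the inbox. The payment layer it assumes (deposits attached to messages, refunds on reading) is entirely hypothetical; nothing like it exists in email today.

    # Hypothetical sketch of a refundable "spam tax" on a public inbox.
    # The deposit/refund mechanics are invented for illustration only.

    from dataclasses import dataclass

    @dataclass
    class Message:
        sender: str
        body: str
        deposit: float  # value attached to the message, like a refundable stamp

    class PublicInbox:
        def __init__(self, owner: str, required_deposit: float):
            self.owner = owner
            self.required_deposit = required_deposit  # set by the inbox owner
            self.held = []  # messages whose deposits are still in escrow

        def receive(self, msg: Message) -> bool:
            # Reject any message arriving without the required payment.
            if msg.deposit < self.required_deposit:
                return False
            self.held.append(msg)
            return True

        def read(self, msg: Message, wanted: bool):
            # Desired message: the deposit goes back to the sender, so sending
            # it effectively cost nothing. Spam: the recipient keeps the
            # deposit as compensation for their time and inconvenience.
            self.held.remove(msg)
            refund_to_sender = msg.deposit if wanted else 0.0
            kept_by_recipient = 0.0 if wanted else msg.deposit
            return refund_to_sender, kept_by_recipient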

Another way to add a cost to senders is to require not a monetary payment but an investment of time in exchange for access to the public inbox. This is the solution that has increasingly been adopted on that other massively open and distributed system, the web.

The Guard: CAPTCHA

I recently met the public inbox problem again in the form of spammers dropping useless submissions into an online contact form for a business in Shanghai. The form exists to let people contact the business with as little friction as possible, and submissions are never read by anyone outside the business, so there is no point in sending spam through it. But Spamuel isn’t sending these messages manually at all: he has written a bot that finds open forms on the web and submits his messages to them automatically. This is called form spam, and it is the same public inbox problem all over again.

The most common solution, however, is different from the ones above. The current answer to form spam is a CAPTCHA, a system which asks the sender of a message to do work to prove that they are human. It seems like a good idea, many online services make it very easy to add, and it appears to solve the problem. So let’s return to Alice, Bob, and Spamuel, and see how the CAPTCHA solution affects each of them.

Bob’s problem - wasted human effort

When Bob fills out the contact form on Alice’s website, a CAPTCHA asks him to do something useless to make sure he is a human. In the case of Google’s CAPTCHA service, it may only ask him to check a box that says “I am not a robot”, surely a small inconvenience. On the other hand, if Google’s CAPTCHA service doesn’t know who Bob is, or if Bob is connecting from a country, network, VPN service, device, user-agent, or anything else that the CAPTCHA system doesn’t like, it may ask him to complete a long or never-ending series of challenging tasks. From Bob’s perspective these challenges are a pure waste of his effort. Worse yet, if Bob happens to be in a country where Google services are sometimes or always blocked, or if Bob cannot read, understand, and answer in the language the CAPTCHA is using, then his message may be blocked entirely.

Alice’s problem - missed connections

Alice probably notices that her spam volume has gone way down, but what she can’t see are the messages that she doesn’t get from people who give up, fail to pass the CAPTCHA, or simply connect from the wrong country or device. Alice’s problem is that she wanted a public inbox, but with a guard, who would create no extra cost for Bob while blocking only Spamuel. Unfortunately, an automated solution cannot outwit a motivated antagonist like Spamuel.

Just as in email spam filtering, no automated technical solution can increase costs for spammers without imposing costs on legitimate users as well.

Spamuel’s messages are now blocked by CAPTCHAs that his bots don’t know how to get past. If the CAPTCHA imposed only a small cost on Alice and Bob while locking out Spamuel for good, we might accept that cost, but unfortunately, as we will see, Spamuel’s problem is only temporary.

Spamuel’s solutions

Like someone in a Monty Python sketch, Spamuel is faced with a Guard asking a simple question, trivial for any typical English-speaking human to answer, like “What is your favorite color?” or “Are you a robot?” or “What is seven plus three?”. At the scale of the business he wants to operate, Spamuel cannot simply answer these questions himself, so he needs an automated solution, which forces him to get creative.

Of course, Spamuel can program his bots to answer the simple questions directly. That leads to an arms race, which means rising costs for everyone, and Spamuel doesn’t want to get into a spending war with the likes of Google. There’s an even simpler solution that turns out to scale surprisingly well.

If the questions are simple for humans to answer, then why not engage humans in answering them? The simple way this works is that Spamuel sets up a site offering something that people want in exchange for solving simple questions. As long as there is a steady stream of visitors willing to trade a little time and energy for whatever Spamuel’s parallel site is offering them, any CAPTCHA system is irrevocably broken.

The setup might be complicated, but since spammers operate at scale, they can afford to spend time on it. So Spamuel builds a bot that simply proxies the Guard’s questions to the users of his parallel site, and then sends the real human users’ answers back to the Guard.
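A rough sketch of the relay shows how little machinery it requires. Every name here is a hypothetical placeholder for whatever plumbing Spamuel actually uses; only the shape of the data flow matters.

    # Schematic sketch of Spamuel's CAPTCHA relay. All objects and methods
    # are hypothetical placeholders; the point is the data flow, not the code.

    def relay_captcha(guarded_form, parallel_site):
        # 1. The bot requests Alice's form and receives a CAPTCHA challenge.
        challenge = guarded_form.get_challenge()

        # 2. The challenge is shown to a real human visiting Spamuel's
        #    parallel site, who solves it to get whatever that site offers.
        answer = parallel_site.show_to_visitor(challenge)

        # 3. The human's answer is sent back to the Guard as if the bot had
        #    solved it, and the spam message goes through.
        return guarded_form.submit_answer(answer)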

When we protect a public inbox by asking Bob to do a simple task, we create the demand side of a market for the performance of simple tasks. The supply side of that market is one Spamuel can easily set up, and he has a benefit of scale that Bob does not.

Seen from above, the whole thing at this point begins to look like a farce. Alice asked Bob to do a little work for the Guard, to verify that he is a good human and not an evil bot, and Spamuel tricked the Guard by handing that work to people who are trying to get access to something completely different, helping Spamuel without knowing they are doing so.

The Guard’s benefit from Spamuel’s solution

If we take a little detour into the perspective of the Guard, which could be Google or any other company, the Guard benefits from the whole arrangement in the form of a steady stream of small amounts of human effort, received for free. Trivially small human efforts add up, and the Guard also operates at scale. Any company looking to train algorithms to do things that are still easy for humans but hard for algorithms stands to benefit from such a system. This market for redirecting streams of free labor starts to look like an opportunity.

Solutions

The problem with CAPTCHA is that it wastes or misdirects human effort, just as Bitcoin wastes electricity to fight cheating.

Instead of directing Bob’s work towards a third party, let us find a way to use the free labor to solve Alice’s problem.

Our solution for a contact form has two parts. First, after Bob’s message is submitted, we analyze it with an algorithm to produce a spam score: an estimate of how likely the message is to be spam. No algorithm answers this question perfectly, so there will always be some error in its verdicts. Second, because the algorithm is imperfect, we need humans to review messages and correct its mistakes. Naturally this work would fall to Alice, but since Bob and Spamuel are the ones creating the demands on Alice’s attention, distributing the work to Bob and Spamuel gives us a more equitable system.
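As one possible shape for the scoring step, here is a minimal sketch in the spirit of the statistical filtering described earlier. The per-token probabilities are assumed to come from messages that humans have already labeled, and a real system could use any scoring algorithm; the names here are illustrative.

    # Minimal sketch of one possible spam-scoring step. The token
    # probabilities are assumed to come from previously labeled messages
    # and to be kept strictly between 0 and 1.

    import math

    def spam_score(message: str, p_spam_given_token: dict[str, float]) -> float:
        """Combine per-token spam probabilities into a single score in [0, 1]."""
        tokens = [t for t in message.lower().split() if t in p_spam_given_token]
        if not tokens:
            return 0.5  # no evidence either way

        # Combine in log space to avoid numerical underflow on long messages.
        log_spam = sum(math.log(p_spam_given_token[t]) for t in tokens)
        log_ham = sum(math.log(1.0 - p_spam_given_token[t]) for t in tokens)

        # Clamp to avoid overflow in exp() for very long, one-sided messages.
        diff = max(min(log_ham - log_spam, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))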

False positives, where the algorithm incorrectly identifies a message as spam, can be handled as follows. When a message is identified as likely spam, we notify the sender and offer options.

First, they can do nothing. Their message will linger in a holding area until it is either re-classified by a human as non-spam and delivered to Alice, or continues to be classified as spam and is eventually deleted.

Second, we may offer the ability to edit the message. Someone who had included a single sentence and a link, for example, might understand why the algorithm classified the message as spam and alter it.

Third, they can help the system classify messages, until they have reduced the demands on Alice’s time by at least as much as submitting their own message increased them. In this case we take the classified messages in the holding area and present them for verification as either spam or non-spam. We can use Bayesian statistics both to evaluate the trustworthiness of the person doing the scoring and to update the spam scores of the messages they manually classify. If their classification work is high-quality, it lowers the system’s total uncertainty about the classifications of messages, until it has been lowered enough to forward their original message to Alice. If their classifications are closer to random answers, then our Bayesian analysis will assign their work little or no value, and it will be impossible for them to offset the spam score of their message this way.
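Here is a minimal sketch of the Bayesian update for a single vote on a held message. Estimating a reviewer’s accuracy is a problem of its own (it would come from their agreement with other reviewers and with Alice); here it is simply a number, and the function name and figures are illustrative.

    # Sketch of the Bayesian update applied when a sender classifies a held
    # message. "accuracy" is the estimated probability that this reviewer
    # labels a message correctly; at 0.5 their vote carries no information,
    # so random clicking cannot move a message toward Alice's inbox.

    def update_spam_probability(p_spam: float, says_spam: bool,
                                accuracy: float) -> float:
        """Update P(spam) for a held message given one reviewer's vote."""
        # Likelihood of this vote if the message is really spam / really ham.
        p_vote_given_spam = accuracy if says_spam else 1.0 - accuracy
        p_vote_given_ham = 1.0 - accuracy if says_spam else accuracy

        numerator = p_vote_given_spam * p_spam
        denominator = numerator + p_vote_given_ham * (1.0 - p_spam)
        return numerator / denominator

    # A reviewer who is right 90% of the time pulls a borderline message
    # (p_spam = 0.5) down to about 0.1 by voting "not spam":
    # update_spam_probability(0.5, says_spam=False, accuracy=0.9) -> 0.1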

From Bob’s perspective, if his message is classified as spam, he is still being asked to do work, so it’s perhaps no better than a CAPTCHA solution. However, the work that he is doing benefits Alice by reducing the demands on her time, and the amount of work he is asked to do is directly tied to the characteristics of his message, rather than to factors outside of his control. Although our algorithm will not be perfect, we are asking Bob for an input of effort that is proportional to our doubts about the quality of his message.

This solution solves Alice’s problem more neatly, as she actually has human effort directed to sorting out the demands on her attention, rather than that effort being wasted or directed to a third party.

What should we do with the messages that fall below the threshold for being dropped directly into Alice’s inbox? If we immediately delete them, then we have no chance to correct our mistakes. The answer is that we keep them in the holding area and use them as fodder for subsequent challenges, and if humans consistently score them as legitimate messages, then Alice will eventually see them. Alice can also look at the holding area herself and manually classify any messages that should have been delivered but were not, which further helps train our classification algorithm. Thus the worst that can happen to a legitimate message that for any reason looks like spam to our algorithm is that it is delayed.
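A small sketch of the holding-area decision, with purely illustrative thresholds and expiry window, might look like this:

    # Sketch of the lifecycle of a held message. The threshold and the
    # grace period are arbitrary illustrative values.

    from datetime import datetime, timedelta

    DELIVER_BELOW = 0.2                # confident enough that it is not spam
    EXPIRE_AFTER = timedelta(days=30)  # grace period before deletion

    def triage_held_message(p_spam: float, received_at: datetime,
                            now: datetime) -> str:
        if p_spam < DELIVER_BELOW:
            return "deliver"  # humans have re-classified it as legitimate
        if now - received_at > EXPIRE_AFTER:
            return "delete"   # still looks like spam after the grace period
        return "hold"         # keep it as fodder for subsequent challenges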

Issue: Privacy

If a message is misclassified as spam, held as a result, and then shown by our system to third parties, there is a risk of exposing information that should have been private. For public comment systems and for forms used to contact most businesses this is not a great risk, but it must be considered more carefully where messages are intended to be more private. Even in systems where messages are meant to be public, some messages are intended only for the moderation team or administrators, and some are inappropriate for anyone to see.

Note: Principle of Proportionality

When we set up an email-like system that makes Alice’s to-do list or inbox a public bucket, free for Bob to add to, we are asking Alice to do work created for her by Bob. We assume, by a principle of proportionality, that Bob’s own time cost limits the amount of work Alice receives (proportional also to the number of people interested in sending her messages). Then the negative force in the universe, Moloch in the manifestation of Spamuel, teaches us how wrong we were to design such a system so naively. Hence the ongoing death by a thousand cuts of email-as-designed.

Note: applications and terms

Note: the lazy method of citation

I expect that what is written here has been said better, and earlier, by others, and someone may point out prior work that I should have cited but didn’t. Paul Graham said something in a transcribed talk about “lazy evaluation of research papers”. I’m not going to link to the transcript at the moment, but you can look up the reference and send it to me if that bothers you. In any case, the arrangement of ideas here is what I’m interested in, so I’m happy to follow the lazy method of citation.

Note: the Guard’s benefit, Alice’s loss, and Bob’s misdirected effort as a principal-agent problem

Another angle of intellectual attack is in terms of the principal-agent problem: once the Guard is inserted between Alice and Bob as Alice’s agent, it is able to benefit itself at Alice’s expense.

Further in the direction of the principal-agent problem, we can consider specific ways in which Spamuel’s solution benefits the Guard at the cost of both Alice and Bob.
