This is the fourth post in a series of blog posts of excerpts of my paper Ethical and Legal Considerations of reCAPTCHA to be presented at PST 2012. The paper’s primary purpose is to provoke thought and discussion. I’ve signed a document prohibiting me from publishing the final copy of the paper, but I am allowed to post the paper as originally submitted for consideration, so here it is…
Consequentialist ethical frameworks
Ethical egoism & altruism
Two competing schools of thought, ethical egoism and ethical altruism, rule out reCAPTCHA, but for different reasons. Ethical egoism requires agents to behave in the manner most beneficial to themselves; any harm inflicted on others should be merely incidental to serving one’s self-interest rather than intended. From the issuer’s side, reCAPTCHA is ethically defensible under ethical egoism: the entity issuing the reCAPTCHA challenges is merely attempting to better its own position. The wasting of other people’s time is a byproduct, not a goal, of being self-serving. However, there is another stakeholder: the reCAPTCHA solver. Unless the solver directly benefits from solving a reCAPTCHA instead of a CAPTCHA, the solver ought to protect his/her interests by avoiding or gaming the reCAPTCHA system.
If the would-be solver refuses to be taken advantage of, reCAPTCHA systems become useless. If, on the other hand, the solver directly benefits and all agents are acting rationally and ethically (in an ethical egoistic sense), the solver should still refuse to participate in the reCAPTCHA scheme. Consider the case where a solver receives a benefit less than or equal to that of the average solver. By withdrawing his or her brain from the pool of available reCAPTCHA solvers, this solver could expect those whose share of the benefit is greater to continue to solve reCAPTCHAs. However, as each successive cohort of would-be solvers drops out, the number of solvers eventually becomes zero. This is, of course, assuming independent agents using a game-theoretically optimal strategy under ethical egoism. In this framework, if obeyed universally, there appears to be only one condition that would permit the existence and usefulness of reCAPTCHAs: the entire would-be reCAPTCHA solver population would have to consist of superrational agents (as described by Douglas Hofstadter), who in effect are not independent decision makers, AND that population would have to be faced with a reCAPTCHA that benefitted them directly.
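The dropout argument above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `unravel` and the benefit values are hypothetical, and "dropping out" is modelled as every solver at or below the current pool average withdrawing in each round.

```python
from statistics import mean


def unravel(benefits):
    """Return the pool size after each round of dropouts.

    In every round, each solver whose benefit is less than or equal to
    the current pool average withdraws, mirroring the cohort-by-cohort
    exit described in the text.
    """
    pool = list(benefits)
    sizes = [len(pool)]
    while pool:
        average = mean(pool)
        # Only solvers with an above-average benefit stay for another round.
        pool = [b for b in pool if b > average]
        sizes.append(len(pool))
    return sizes


# Hypothetical per-solver benefits, chosen purely for illustration.
print(unravel([1, 2, 3, 4, 5, 6, 7, 8]))  # prints [8, 4, 2, 1, 0]
```

Because the smallest benefit in any non-empty pool is never above the pool average, at least one solver leaves in every round, so the pool always empties regardless of the starting values.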
On the other hand, under the framework of ethical altruism, ethical agents would do things to benefit others at their own expense. Unlike an ethical egoist, an ethical altruist would potentially be ethically bound to solve a reCAPTCHA. However, an ethical altruist ought not to ask another individual to do this unpaid work; instead of employing reCAPTCHA, they would employ willing individuals or use games with a purpose (GWAP) to complete the task that would have been embedded in the reCAPTCHA.
Kantian ethics
Kant postulates in his categorical imperative that an action is unethical if a logical contradiction results when that action is turned into a rule and universalized. Consider the formulation “If I have work to be completed, I will give it to others to do.” Universalized, this becomes “∀ person p, if p has work to be completed, p will give it to others to do.” Under such a formulation, the result is an endless chain of passing off responsibility in which no work would be done, negating the concept of “work”. This logical contradiction would lead to the conclusion, under Kant’s first formulation, that reCAPTCHA is unethical.
In Kant’s second formulation, he argues, as in a deontological framework, that people themselves have intrinsic value and should not be treated as a means to an end. As such, the arguments laid out in §Deontological ethical frameworks apply again. reCAPTCHA, using others as a means of shirking duty, is therefore impermissible from a Kantian-ethics viewpoint.
Utilitarianism
Among the better-known ethical frameworks are the utilitarian frameworks (act, rule, etc.). reCAPTCHA, when analysed from a rule utilitarian standpoint with the universal rule, “If I have work to be completed, I will give it to others to do,” is similar to the categorical imperative analysis (see §Kantian ethics); that is, society would be stuck in a perpetual cycle of redelegating work. Such a society would be highly dysfunctional (production would virtually cease) and would likely not last long; the total happiness of such a society, while it lasts, might at first increase (high happiness degree) but would eventually plummet (short happiness duration) as starvation kicks in and means of entertainment evaporate. Drops in population would also decrease the number of people experiencing happiness; thus, the total utility of the idea behind reCAPTCHA is low. This analysis continues to hold if suffering is considered in addition to a loss of happiness.
A different, more specific maxim could also be constructed for the rule utilitarian: “If I have work to be completed, I will give it to others to do UNLESS I was assigned the task.” Such a society could possibly survive, but it would be plagued by robber barons who are shrewd at delegating work. Historically, inequality has resulted in the formation of unions, including a major strike wave in the United States during the Great Depression; such strikes were organized in an attempt to better the position of the unions’ members. Striking being an extreme action, it seems fair to (under)state that those union members were very unhappy for a prolonged period of time. Given the level, number, and duration of unhappiness compared to the happiness of the few, the modified maxim would result in a situation with low utility.
Continuing to refine a rule under rule utilitarianism makes it increasingly specific, so it becomes more useful to look at reCAPTCHA as it currently exists using act utilitarianism. As with rule utilitarianism, the deception and furtiveness of reCAPTCHA are permissible if the end result is an increase in utility. Here, it is sufficient to show an alternative to reCAPTCHA that has higher utility. To begin, the magnitude of reCAPTCHA must be quantified. Google alone displays more than 100 million reCAPTCHAs per day (Google Inc. 2011). Assuming that at least 15% of the reCAPTCHAs are displayed to humans ((This is a conservative estimate of the human-to-bot activity ratio based on the 4:1 spam-to-non-spam e-mail ratio (Radicati Group 2009). Though signing up for webmail may involve a CAPTCHA system, sending does not.)) and an additional five seconds per reCAPTCHA compared to a CAPTCHA (i.e., Wr vs. Wc), 7.6 million person-hours of free labour are committed to reCAPTCHAs per year. This is equivalent to about 3800 people working 40 hours per week at this task or, depending on the region within North America, about US$50-80 million of labour at minimum wage annually.
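The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text, with one labelled assumption: a full-time year is taken as 40 hours per week for 50 weeks, which is the value that yields roughly 3800 workers. The variable and function names are illustrative.

```python
# Figures quoted in the text.
RECAPTCHAS_PER_DAY = 100_000_000  # displayed by Google per day
HUMAN_FRACTION = 0.15             # share assumed to be shown to humans
EXTRA_SECONDS = 5                 # extra time per reCAPTCHA vs. a plain CAPTCHA
DAYS_PER_YEAR = 365

# Assumption (not stated in the text): a full-time year is 40 h/week x 50 weeks.
FULL_TIME_HOURS_PER_YEAR = 40 * 50


def annual_person_hours(per_day, human_fraction, extra_seconds):
    """Extra human time spent on reCAPTCHAs per year, in hours."""
    seconds_per_year = per_day * human_fraction * extra_seconds * DAYS_PER_YEAR
    return seconds_per_year / 3600


hours = annual_person_hours(RECAPTCHAS_PER_DAY, HUMAN_FRACTION, EXTRA_SECONDS)
workers = hours / FULL_TIME_HOURS_PER_YEAR

# Hourly wages implied by the US$50-80 million range quoted in the text.
implied_low_wage = 50e6 / hours
implied_high_wage = 80e6 / hours

print(f"{hours / 1e6:.1f} million person-hours per year")
print(f"about {workers:.0f} full-time workers")
print(f"implied hourly wage: ${implied_low_wage:.2f} to ${implied_high_wage:.2f}")
```

Changing `HUMAN_FRACTION` or `EXTRA_SECONDS` shows how sensitive the total is to the two assumptions the estimate rests on; both scale the result linearly.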
Not everyone who solves reCAPTCHAs is normally employed at a minimum-wage job; however, reCAPTCHAs, to be solvable by people without impairment, will probably tend to be menial tasks that would only command minimum wage. Although solvers may not have been working at that moment, solving takes time away from whatever else they were doing. If salaries are positively correlated with the utility of one’s work (not necessarily true, given that crime can generate an income and wages are not on a ratio scale in the presence of minimum wage), the ability of the solvers is being used sub-optimally.
While the individual annoyance level at solving a reCAPTCHA might be low, this is multiplied by all the people solving them. Google, a major user of reCAPTCHA, could easily afford to pay people in other countries to perform the OCR task; this would employ individuals, keep money flowing in the economy, and remove the involuntary nature of reCAPTCHA, reducing annoyance. However, this might be too philanthropic an endeavour to keep shareholders happy. A different solution with even more utility exists. Google could divert funds towards developing an OCR system available to all. Either by encouraging interest and funding scholarships for research in the area or by recruiting more experts, an intellectual powerhouse devoted to solving image recognition problems could be created. With access to computational and storage resources such as Google’s, a lab that is large by university standards could be created and could greatly accelerate development in the field. reCAPTCHA, at the solve rate of 2007, would continue to annoy people for another 400 years just to OCR existing books (Rubens 2007). By contrast, the alternative project could be completed more cheaply and faster (thus yielding more happiness, assuming the project is worthwhile; otherwise, plain CAPTCHA would provide more utility), with fewer annoyances to users and with research benefits. This alternative therefore produces more utility than reCAPTCHA. Even given that OCR would come to a temporary standstill before overtaking the brute-force approach, the alternative is unlikely to cause any short-term harm (especially since a lack of indexing is not an obstacle to reading) or to have a detrimental effect on long-term utility.
In all the analyses above, there was one tacit assumption that was largely irrelevant except in the act utilitarian case. This assumption is that the reCAPTCHA task remains mostly unsolvable by bots; however, it does not hold, since malicious agents can trick real humans into solving tasks or use reCAPTCHA-specific OCR techniques intended to beat reCAPTCHA (Baecher, Büscher, Fischlin & Milde 2011). Further, once the OCR problem is solved, bots can make use of the same technology. On the other hand, unless CAPTCHA designers can outsmart (or win through attrition against) those wishing to break the systems, CAPTCHAs will eventually have to move away from OCR-based problems, become increasingly challenging for humans, or become obsolete.
Veil of Ignorance
Under a veil of ignorance, individuals are unaware of their station in life and need to decide what is permissible and what is not. Would individuals in this situation opt to allow reCAPTCHA? That is, if individuals would have agreed that reCAPTCHA was an appropriate thing to permit in a society, the lack of explicit consent would be permissible under the notion of hypothetical consent. The situation can be considered at different levels of specificity for determining whether reCAPTCHA is ethical. In the broad case, the question could revolve around permitting deception and/or coercion to elicit an action. Not knowing whether one will be the deceiver/coercer or the deceived/coerced, most individuals would likely hope that such behaviour were not permitted.
In the specific case of having to choose whether to permit reCAPTCHA for the purposes of OCRing old documents, the results are less clear. Hardly anyone would object to being someone benefitting from the OCRing of a document, ceteris paribus. However, one could also be a reCAPTCHA solver who gains no benefit. Altruistic individuals may reason that their (individual) small pains will help a greater number of people. More selfishly motivated or game-theoretically rational individuals would opt not to allow this behaviour. Why is this so? Consider how many people read older documents from existing sites such as Project Gutenberg, which offers free access to many popular books in electronic format (digitized by volunteers). Now consider how often those people would have been hampered by not being able to copy a section of text to the clipboard or do a text search had Project Gutenberg simply scanned the books. Compare this to the number of people who solve reCAPTCHAs. Lastly, determine the individual benefit of having an OCRed document and the individual cost of reCAPTCHA. Most likely, individuals would opt to disallow reCAPTCHA, especially given the alternatives proposed in §Utilitarianism.
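The comparison described above amounts to an expected-value calculation made from behind the veil. The sketch below is one possible way to frame it, not a measurement: the function name, its parameters, and every number in the example are hypothetical placeholders, and it assumes that each individual is equally likely to occupy any position in the population.

```python
def expected_net_benefit(population, hampered_readers, solvers,
                         benefit_per_reader, cost_per_solver):
    """Expected gain for one person who does not yet know their role.

    The chance of ending up as a reader who needed OCRed text, or as a
    reCAPTCHA solver, is taken to be that group's share of the population.
    """
    p_reader = hampered_readers / population
    p_solver = solvers / population
    return p_reader * benefit_per_reader - p_solver * cost_per_solver


# Placeholder inputs only: few readers are hampered, many people solve.
value = expected_net_benefit(
    population=1000,
    hampered_readers=10,
    solvers=500,
    benefit_per_reader=20.0,  # arbitrary utility units
    cost_per_solver=1.0,      # same arbitrary units
)
print("permit reCAPTCHA" if value > 0 else "disallow reCAPTCHA")
```

With these placeholder inputs the expected value is negative, which matches the intuition in the text: when solvers vastly outnumber the readers who actually need searchable text, a self-interested individual would vote against the scheme unless the per-reader benefit is very large.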