Written by Betsy Cooper, Tuesday, 23 August 2011     
Introduction

February 16, 2011 was a day of reckoning for humankind. A new computer, appropriately dubbed
“Watson,” beat the world’s best Jeopardy! players at their own game. At first blush this may not seem so
surprising: after all, computers are notoriously better than humans at “recalling” factual knowledge. But
Jeopardy! is a game show known for the nuance of its clues, which often contain puns, ambiguities, and
other curiosities. Watson’s ability to understand and quickly respond to Jeopardy! questions thus
reveals that computers have made great strides in emulating how humans think.

Watson is a computer built for a very specific purpose: to beat humans at Jeopardy!. Since his victory,
pundits and IBM staffers have suggested that the technology powering Watson might have many uses—
in the gaming world, for example, or improving customer service from much-maligned automated call
centers. Only a week after winning the Jeopardy! title, Watson’s creators proclaimed to the annual
Healthcare Information and Management Systems Society meeting that “Watson could dramatically
improve health care delivery by offering, in minimal time, solutions that have a high level of certainty.”
Here I propose how Watson could apply his skills in a legal environment: by helping textualists interpret
statutes. New textualists believe in reducing the discretion of judges in analyzing statutes. Thus, they
advocate for relatively formulaic and systematic interpretative rules. How better to limit the risk of
normative judgments creeping into statutory interpretation than by allowing a computer to do the work?

This Essay considers whether judges might share the job of statutory interpretation with computers like
Watson. First, it briefly lays out how new textualists approach statutory interpretation. Second, it
describes how Watson’s aptitudes lend themselves to textualist-style statutory interpretation. Finally, the
Essay pulls the threads together, discussing how Watson might both aid textualist interpretation and
perhaps perform such interpretation on his own.

I. The New Textualist Aspiration

New textualism is a popular method of interpretation by which judges decipher statutes; perhaps its
foremost proponent is U.S. Supreme Court Justice Antonin Scalia. New textualist premises and lines of
reasoning about statutory interpretation have become widespread. Indeed, there is arguably a growing
consensus that we (or at least judges) “are all textualists now.” Many others have outlined the key tenets
of this method, so I will not spend much time doing so here. For the purposes of this Essay, there are
three important elements of new textualism: its reliance on ordinary meaning (the premise), its emphasis
on context (the process), and its rejection of normative biases (the reasoning). I consider each in turn.

First, new textualism begins from the premise that “the apparent plain meaning of a statutory text must
be the alpha and the omega of a judge’s interpretation of a statute.” The goal of textualist statutory
interpretation “is to identify the objective meaning of statutory text without regard to what any legislator
intended that text to mean.” This stands in stark contrast to intentionalists, who believe that the goal of
statutory interpretation should be for “courts to implement the intent of the legislature.”

Second, the new textualist process of analyzing statutes takes into account the context in which a word
presents itself, including the structure and coherence of the statute. New textualists thus distinguish
themselves from strict constructionists, who refuse to look at any sources outside the text of the statute.
As Justice Scalia has suggested, “when you ask someone, ‘Do you use a cane?’ you are not inquiring
whether he has hung his grandfather’s antique cane as a decoration in the hallway.” One can only
understand that the question is asking whether the individual uses a cane for walking by considering it in
context.

Finally, new textualists’ reasoning for undertaking this scheme of interpretation is to reduce the
discretion that judges use when interpreting statutes. Justice Scalia warns that “the main danger in
judicial interpretation . . . is that the judges will mistake their own predilections for the law.” To avoid
such errors, new textualists believe that “the goal of statutory interpretation is to determine the objective
meaning of statutory text.” New textualism aspires to such objectivity by advocating a relatively
mechanical process of textual interpretation, divorced from the intent of Congress. Whereas traditional
textualists allowed “strongly contradictory legislative history” to trump the plain meaning of a statute, new
textualists believe that “statutes [should] be read with a strict literalism and with reference to well-
established canons of statutory construction” because doing so will encourage Congress to draft laws
more clearly in the first place. New textualists accordingly reject the use of legislative history as a means
of understanding statutes. Such interpretative parsimony is a perceived strength of the new textualists’
method.

A range of critiques has been leveled at new textualism. Not least, it has been accused of seeking to
achieve impossible goals. After all, human judges will always begin from their own inherently subjective
frame of reference. Further, one might quibble about whether the actual practice of new textualism
achieves its stated goal of eliminating bias. Some purists have argued that to take seriously the goal of
eliminating bias would require rejecting particular canons of statutory interpretation, such as absurd
results or scriveners’ errors. This is in part because, by providing an escape hatch from strict textual
meaning, such canons improperly allow judicial discretion to creep in.

My point here is not to dispute the importance or viability of new textualism as a mechanism for statutory
interpretation. Rather, taking new textualism as a starting point—a goal, if you will—for understanding
statutes, I query here whether humans, with all our cognitive biases and normative bents, are the actors
best equipped to interpret statutes in this manner. It has long been recognized that humans draw poor
causal inferences, especially when making judgments under uncertain conditions. And all humans have
normative biases that can confound any effort to apply interpretative rules strictly or narrowly. This leads
us to the new textualist dilemma: can humans ever really be successful textualists? To answer this
question, we look to Watson for a little help.

II. “It’s Elementary, Watson”: Jeopardy! and Statutory Interpretation

Watson was designed with a single goal in mind: to beat humans at their own game, Jeopardy!. To
determine whether Watson can successfully interpret statutes, one first must understand how he
functions as a Jeopardy! contestant. This Part considers the obstacles Watson overcame to become
Jeopardy! champion. It also investigates how Watson answers questions, in order to see whether his
methods might help resolve the new textualist dilemma.

A. “Who Is . . . Watson?”

Watson is a computer system designed—as a first objective—to answer trivia questions on the game
show Jeopardy!. In creating Watson, IBM began from the premise that computers are not, as a general
principle, particularly good at answering even direct queries:

    Search engines don’t answer a question—they deliver thousands of search results that match
keywords. University researchers and company engineers have long worked on question answering
software, but the very best could only comprehend and answer simple, straightforward questions (How
many Oscars did Elizabeth Taylor win?) and would typically still get them wrong nearly one third of the
time.   
    
Worse still, to win at Jeopardy!, Watson needed to be able to answer questions where important search
terms were not provided. For example, Jeopardy!’s “decomposition”-type questions require the
contestant to “decompose the question into . . . two parts and identify answers to each one.” Often, “the
answer common to both questions is the answer to the original clue.” Existing search engines like
Google—which require the user to input search terms and then make use of algorithms to find instances
where those terms relate most closely to one another—were ineffective at answering such questions.

This task is made even more difficult when we consider the construction of an average “question” on
Jeopardy! First, unlike most trivia games, Jeopardy! provides an answer and requires players to
respond with the corresponding question—a complicated twist for a computer. And second, Jeopardy!
questions are replete with puns, word games, and even jokes, requiring far more layers of
understanding than simple recall.

To understand how this works in practice, take a sample Jeopardy! question: “It means detestable or
loathsome, though I have no beef with the snowman, myself.” Unlike most typical trivia questions, this
Jeopardy! question contains no dates, facts, or even substantive knowledge. It relies instead on
wordplay. You must know that a synonym for detestable is ‘abominable’ and then connect that word to
the folklore of the snowman in the Himalayan Mountains. The average human can synthesize these
streams of knowledge simultaneously, putting them together to reach the answer: “What is abominable?”
But the average computer can only respond correctly if nearly the exact same question and answer
appeared together in text it has learned. When I first plugged that sample Jeopardy! question into
Google, the top hit (excluding the article from which I borrowed the question) was a blog post about
wearing red high heels to a pig roast party.

Watson’s creators sought to design a computer system that could better approximate the human
approach to asking and answering questions—both to determine the most likely answer and to express
a level of confidence that the answer is correct. As IBM puts it, the goal of designing Watson was “to
understand the actual meaning behind words, distinguish between relevant and irrelevant content, and
ultimately demonstrate confidence to deliver precise final answers.” In February 2011, IBM declared
success. During a Jeopardy! television special, Watson took on the two most decorated human
Jeopardy! players of all time: Ken Jennings, who once won seventy-four straight games on the show,
and Brad Rutter, the player with the all-time highest prize earnings. Watson amassed thousands of
dollars in a resounding victory, which he secured even before the final round.

After the fact, some criticized Watson’s hair-trigger buzzer system, which could respond in as little as
one-tenth of a second if he was confident in his answer. It is true that Watson beat his competitors to the
buzzer on the vast majority of questions. But by any measure, Watson’s victory was decisive. Watson
finished the contest with $77,147 in prize winnings. His next closest competitor, Jennings, earned only
$24,000. Jennings responded graciously in the face of certain defeat, scrawling under his final answer in
the contest that “I, for one, welcome our new computer overlords.”

B. How Watson Answers Questions

How did Watson achieve this resounding victory? IBM staff described the core components that enable
Watson to answer Jeopardy! questions:

    Watson runs on a cluster of Power 750™ computers—ten racks holding 90 servers, for a total of
2880 processor cores running DeepQA software and storage. It can hold the equivalent of about one
million books worth of information. . . .         
     When a question is put to Watson, more than 100 algorithms analyze the question in different ways,
and find many different plausible answers—all at the same time. Yet another set of algorithms ranks the
answers and gives them a score. For each possible answer, Watson finds evidence that may support or
refute that answer. So for each of hundreds of possible answers it finds hundreds of bits of evidence
and then with hundreds of algorithms scores the degree to which the evidence supports the answer.
The answer with the best evidence assessment will earn the most confidence. The highest-ranking
answer becomes the answer. However, during a Jeopardy! game, if the highest-ranking possible answer
isn’t rated high enough to give Watson enough confidence, Watson decides not to buzz in and risk
losing money if it’s wrong. The Watson computer does all of this in about three seconds.

For Watson to successfully play Jeopardy!, his creators relied on the premise that computers are better
than humans at storing data. For the Jeopardy! challenge, “the sources for Watson included a wide
range of encyclopedias, dictionaries, thesauri, newswire articles, literary works, and so on.” Watson can
access a good proportion of knowledge in the public sphere, both colloquial and expert, and has perfect
retention as long as he can access the information. He is thus well equipped to understand what
information is available, and importantly, how frequently that information appears.

Watson processes this vast array of information by looking for relationships between the clue and other
words; he figures out what words mean in context. Unlike previous computers, Watson can sort out the
most relevant words in a clue and better target his answer. He identifies the “focus of the question,”
detects relationships between words in the question, and decomposes questions into sub-questions,
among other techniques. In the question about the abominable snowman, for example, Watson might
downplay the closeness of words such as “loathsome,” “detestable,” and “myself”—the apparent reason
why the blog post on red heels and pigs was elevated by the Google algorithm. Instead, Watson might
look for the links that connect “detestable” to “snowman,” which would be far more likely to produce the
right result. Even better, Watson can learn from his mistakes through trial and error; he stores incorrect
answers and incorporates them into future games. This helps explain how Watson went from losing in
trial Jeopardy! competitions to beating the top-ranked competitors of all time.

Finally, Watson is not subject to some key reasoning errors of humans. Of course, this is in part
because Watson is subject to no normative inclinations of his own; he is only biased insofar as the
human-controlled inputs fed into Watson’s memory are biased. Likewise, in contrast to humans who are
notoriously poor at estimating probabilities, especially in the face of irrelevant information, Watson is
able to express the probability that he is right in a systematic and quantifiable way. More specifically,
Watson can estimate how likely it is that a particular answer he provides is correct and refrain from
responding to a question if the likelihood is small (by refusing to buzz in). These skills, as I argue in the
following Part, get to the essence of new textualist approaches to statutory interpretation.
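
The rank-then-abstain behavior described in this Part can be illustrated with a minimal sketch. It is not IBM's DeepQA code: the `Candidate` class, the simple averaging of evidence scores, and the 0.5 buzz threshold are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A possible answer plus hypothetical evidence scores, each in [0, 1]."""
    answer: str
    evidence_scores: List[float]


def confidence(candidate: Candidate) -> float:
    # Assumption: confidence is the plain mean of the evidence scores.
    if not candidate.evidence_scores:
        return 0.0
    return sum(candidate.evidence_scores) / len(candidate.evidence_scores)


def decide(candidates: List[Candidate],
           buzz_threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the top-ranked answer, or None when confidence is too low to buzz in."""
    if not candidates:
        return None, 0.0
    best = max(candidates, key=confidence)
    score = confidence(best)
    if score >= buzz_threshold:
        return best.answer, score
    return None, score


# Illustrative, made-up scores for the snowman clue.
candidates = [
    Candidate("abominable", [0.9, 0.8, 0.7]),
    Candidate("frosty", [0.3, 0.2, 0.4]),
]
answer, score = decide(candidates)
```

The point of the sketch is the last branch: a low-confidence answer is withheld rather than asserted, which is the feature the following Part asks whether a judge could ever rely on.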

III. Watson: The New Textualist?

Watson has many potential applications—and perhaps not just for search engines and scientists.
Watson-style computers already help lawyers sift through documents in discovery and decipher patterns
in clients’ activity. At least one judge already accepts that judges are “not like the supercomputer
Watson. . . They have no hope of knowing everything.” But could Watson and judges work together and
revolutionize statutory interpretation? This Part considers, first, whether Watson could perform certain
tasks of new textualism better than judges; and second, whether he might somehow assist (but not
replace) them in performing such tasks.

A. Watson the Judge

Could Watson perform better than judges at the tasks of statutory interpretation? Each of the three
elements of new textual interpretation—premise, process, and reasoning—points toward the possibility of
Watson outperforming new textualist judges at their own game.

First, computers support new textualists’ premise by offering a mechanical way of determining the
“ordinary meaning” of a statute. According to Merriam-Webster’s Collegiate Dictionary, “ordinary” means
“of a kind to be expected in the normal order of events; routine; usual.” The common factor in each part
of the definition is frequency; given a set of circumstances, the ordinary outcome is the outcome that
occurs more often than other possible outcomes. Humans are flawed textualists because they have only
one frame of reference: their own “ordinary” experience. Any computer is better equipped to identify the
frequency with which a particular phrase occurs in common parlance.

Take a famous example: in Muscarello v. United States, the Supreme Court debated the meaning of the
phrase “carries a firearm.” The majority argued that the ordinary meaning of carrying a gun included
transporting it in a vehicle. The dissent disagreed, arguing that “carry” required holding a gun on one’s
person. The two sides marshaled a vast array of evidence from the public domain to demonstrate that
their interpretation was the most ordinary, including dictionaries, news articles, and even the Bible.
Watson could have saved the Court’s law clerks a great deal of trouble. The computer would have been
able to calculate how frequently the terms “carry” and “vehicle” (or their synonyms) appear together
versus “carry” and “person” (or their synonyms). Thus, in at least one sense Watson is better at
textualist interpretation than humans—he can not only identify ordinary meanings but can tell us just
how ordinary a particular meaning is!
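
The frequency comparison imagined for Muscarello can be sketched in a few lines. This is only a toy: the three-sentence corpus, the cue-word lists standing in for "synonyms," and the function name `sense_shares` are all hypothetical, and a real system would need far richer text and lexical resources.

```python
import re
from collections import Counter

# Inflected forms of the target verb (assumed list, not exhaustive).
CARRY_FORMS = {"carry", "carries", "carried", "carrying"}

# Hypothetical cue words standing in for each competing sense.
SENSE_CUES = {
    "vehicle": {"car", "truck", "vehicle", "wagon", "trunk"},
    "person": {"hand", "pocket", "holster", "person", "belt"},
}


def sense_shares(sentences):
    """Share of 'carry' sentences that co-occur with each sense's cue words."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if not words & CARRY_FORMS:
            continue  # sentence does not use the target verb at all
        for sense, cues in SENSE_CUES.items():
            if words & cues:
                counts[sense] += 1  # a sentence may count toward both senses
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sense: counts[sense] / total for sense in SENSE_CUES}


corpus = [
    "He carried the rifle in the trunk of his car.",
    "She carries a pistol in a holster on her belt.",
    "They carry their guns in the truck.",
    "The wagon was painted red.",
]
shares = sense_shares(corpus)
```

On this made-up corpus the vehicle sense accounts for two of the three "carry" sentences, which is the kind of number ("just how ordinary") the paragraph above has in mind.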

Watson’s superior recall is particularly important given the historical nature of statutes, meanings of
which can change over time. Justice Scalia, for example, has suggested that absolute immunity for
prosecutors did not exist at common law. A well-informed Watson could report back in a matter of
minutes as to the likelihood that this was true. Watson even may be able to help decipher antiquated
meanings on which there is no modern expertise—such as common law phrases no longer used today—
by looking at the context in which such phrases were used.

This raises a second Watsonian virtue: his process of interpretation. Most computers merely isolate
instances where identical words appear most closely to one another. Watson’s algorithms go a step
further by distinguishing which connotation of a particular word is intended based on the particular
context. Watson might not only look for words elsewhere in the statute, but could also draw from other
words not in the statute to provide additional interpretative context. In the Muscarello example, there was
at least one contextually-appropriate usage of “carry” that was not uncovered by either party in the
litigation: whether state “carry” gun laws (for example, “open carry” and “concealed carry” gun laws)
apply to vehicles. Watson could have estimated the frequency with which each connotation arises—
including the state law use of “carry” not considered by the actual parties—to determine whether “carry”
ordinarily encompasses transportation in vehicles.

Finally, and perhaps most importantly, Watson’s reasoning is more systematic than humans’ reasoning.
Inasmuch as he makes errors, these errors are randomly distributed. His mistakes are not skewed due
to political preferences, personal relationships, or other sources of human prejudice. Watson by design
avoids the ideological bias of judges—which textualists so deeply fear—because, of course, he does not
have any ideology of his own. These arguments are summarized in Figure 1.

Figure 1
Watson Versus the New Textualism
[Figure not reproduced]

B. Watson’s Limitations as a New Textualist

Despite these advantages, computers are unlikely to replace judges anytime soon. For one thing,
Watson still makes mistakes at critical times. Perhaps the most amusing occurred in the Final Jeopardy
round of the competition’s first game. The Final Jeopardy category was “U.S. Cities,” and the answer was
the following: “Its largest airport is named for a World War II hero; its second largest, for a World War II
battle.” While both Jennings and Rutter correctly provided the question “What is Chicago?” Watson
responded, “What is Toronto?” Given that the only city named Toronto with any commercial airport is
not in the United States, but in Canada, this was a baffling response.

The Toronto incident highlights that Watson cannot filter away such absurd responses on his own.
Without a human to assist him, serious errors may remain. To be fair to Watson, the question marks
indicate he was highly unsure about his response to the “Toronto” question; he was forced to answer
the question in Final Jeopardy and wagered a low amount as a result. But this quantified uncertainty
may not be useful when Watson attempts textualist interpretation. If Watson is uncertain about the
“ordinary meaning” of a statute, he will not be able to refuse to buzz in. When Watson can find no clear
ordinary meaning, what should he (or a judge) do then?

This suggests that the most serious critique for a Watson-led textualism is not practical but principled: at
least in the tough cases, judging should contain normative as well as objective inputs. Employing
Watson for statutory interpretation requires an important choice between allowing judicial decisions with
random error but occasionally absurd results or allowing decisions with nonrandom, biased error.
Watson could achieve the new textualists’ stated goal of determining ordinary meaning—with a dash of
random error. But he could never decide, for example, that an outcome is normatively absurd. According
to his computational frame of reference, any answer his algorithm spits out is the most likely accurate
meaning. Do new textualists really want judicial decisions to be made based only on the frequency with
which a meaning appears in Watson’s memory, especially when his certainty is low? I expect not.

C. Watson Assisting Textualists

As IBM brings Watson’s DeepQA technology to the medical community, Watson’s creators are not
proposing that his algorithms could replace doctors in their entirety. More appropriately, they suggest
that Watson could aid doctors in doing their job better. This also may be the most appropriate role for
Watson in the judicial sphere. A Watson-type tool could bring the advantages of computer-based
analysis to statutory interpretation without sacrificing the normative discretion which allows humans to
“get it right” in ways that computers cannot.

One can imagine a tool into which users could input short phrases from statutes. The tool, powered by
DeepQA technology, would then output the ordinary meaning of the phrase based on frequency
calculations. Such a technology would create a presumption of ordinary meaning that judges would
become (informally) bound to rebut if they wished to stray from such meaning. Circumstances in which
they might stray could include (1) a close call where two ‘ordinary meanings’ score highly; (2) an
analytically dubious result accompanied by a low level of confidence (e.g., the Toronto example above);
or (3) a normatively absurd result produced by the effects of the ordinary meaning, such as an
excessively punitive result. This is similar to the way that judges already treat government agencies: in
most cases giving them deference.
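
One way to picture such a tool is the sketch below. It assumes the tool has already produced a score between 0 and 1 for each candidate meaning; the function name `presumption` and the thresholds `min_confidence` and `min_margin` are hypothetical. It flags only the first two circumstances (a close call and low confidence). The third, a normatively absurd result, is deliberately left to the judge.

```python
from typing import Dict, Optional, Tuple


def presumption(scores: Dict[str, float],
                min_confidence: float = 0.6,
                min_margin: float = 0.15) -> Tuple[Optional[str], str]:
    """Return (presumed meaning or None, reason) from hypothetical meaning scores."""
    if not scores:
        return None, "no candidate meanings"
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_meaning, top_score = ranked[0]
    if top_score < min_confidence:
        return None, "low confidence"  # circumstance (2)
    if len(ranked) > 1 and top_score - ranked[1][1] < min_margin:
        return None, "close call"  # circumstance (1)
    return top_meaning, "presumptive ordinary meaning"


# Illustrative, made-up scores for the phrase "carries a firearm".
clear = presumption({"transport in vehicle": 0.8, "bear on person": 0.3})
close = presumption({"transport in vehicle": 0.7, "bear on person": 0.65})
unsure = presumption({"transport in vehicle": 0.4})
```

When the function returns no meaning, the presumption simply does not attach and the judge proceeds as she would today.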

How might judges employ such a technology? One possibility is that it could serve the function of a law
clerk and conduct basic research upon which judges can construct their opinions. A second possibility is
that it could become a resource of the Federal Judicial Center; officials at each courthouse could get
trained in the technology. Third, Watson might function usefully as a tool for the private sector. Lexis
and Westlaw might purchase the rights to the technology, providing firms, universities, and judges alike
with the ability to determine their own “ordinary meanings.” By providing more definitive meanings,
Watson could eventually reduce litigation—if all parties agree to turn their fate over to the hands of a
computer.

Conclusion

Watson achieved a great victory for computational “thinking” over human “thinking.” But he cannot yet
make the normative decisions that ethical judging requires. What Watson already can do for judges is to
provide a baseline against which to evaluate their own interpretations of “ordinary meaning.” Watson will
not stop bias from creeping into judicial decisionmaking—but his contributions to statutory interpretation
are nevertheless far from trivial.

Betsy Cooper is the Executive Bluebook Editor of The Yale Law Journal and a member of the Yale Law
School Class of 2012. She received her DPhil in Politics from the University of Oxford in 2009. The
author would like to thank Aaron Barkhouse for inspiring this Essay and Professor William Eskridge,
Arpit Garg, Daniel Hemel, Nick Hoy, and The Yale Law Journal Volume 121 team for their helpful
feedback.

Preferred Citation: Betsy Cooper, Judges in Jeopardy!: Could IBM’s Watson Beat Courts at Their Own
Game?, 121 Yale L.J. Online 87 (2011),
www.yalelawjournal.org/2011/08/23/cooper.html.
