How would Watson do against that other font of knowledge — Google? Watson is optimized to understand human language, which can be full of ambiguity, wordplay, sarcasm and puns. Google makes some accommodations for language, but keywords do the heavy lifting.
Google did both much better than I expected and about as well as I expected. It did much better in the sense that I expected Watson to run away from Google as much as it ran away from Jennings and Rutter. It did as well as I expected in that before I entered each clue into Google, I predicted how well Google would do. For example, I knew the Beatles lyrics category would be a piece of cake for Google. Those predictions were generally correct.
Watson had the correct response for 79% of the clues. (This includes clues for which Watson didn’t buzz in.) Google had the correct response in the first position 56% of the time and in the first 10 positions 79% of the time.
Google and Watson tended to struggle on the same clues. Both had trouble with the Final Frontiers, Alternate Meanings and The Art of the Steal categories.
The only clue that Watson got wrong and Google nailed was “The first modern crossword puzzle is published & Oreo cookies are introduced” in the category Name the Decade. Google’s answer was found on a page that provides explanations of the New York Times crossword. If Watson could have heard Jennings’ incorrect answer of “1920s,” his backup answer of “1910s” would have been correct.
Finding a document with the right answer is one thing. Synthesizing that information into an answer is another significant challenge. Google only partially does that. I scored Google based on the snippet returned. While it wasn’t generally able to provide the precision of Watson, it did provide the right parts of the document with the answer.
As for Final Jeopardy!, neither Watson nor Google got it right.
The clue in U.S. Cities was “Its largest airport is named for a World War II hero; its second largest, for a World War II battle.” Watson hesitantly answered Toronto. Google’s best guess would’ve been Arizona. Even geographically challenged Americans would know that neither answer fit the parameters of the category. (I hope.)
Oddly, that’s a relatively easy clue for humans. Think of U.S. cities, narrow it to the big ones, narrow that to ones that have multiple airports and then go through the list. Kennedy… Dulles… LAX… SFO… Hobby… Love…
A few thoughts on search from this exercise:
- Google still has serious problems with content duplication. One query generated seven pages of essentially the same AP story about Jeopardy!
- Content farms got in the way of the best answer a few times.
- Wikipedia frequently provided the correct answer.
- The day that Google can synthesize answers to queries is closer than I thought. Bad news for publishers.
Also see Danny Sullivan’s great analysis of the underlying technologies and the differences between natural language processing and keyword search.
Notes on methodology:
- I used a private browsing session to avoid any influence from my search history.
- I ran the exact text of the clue (as provided on the Jeopardy! Archive) through Google.
- I excluded all results that were specific to the Jeopardy! match.
- I looked for the text of the answer on the search results page, including the result title and snippet.
- If the correct response appeared in the title or snippet, Google got credit. For the Name the Decade category, I gave Google credit if a year appeared that was in the correct decade. That would not be a correct response for the game, but I felt it was a better comparison for these purposes.
- Search results change based on time, new content published, geography and other factors. You may not be able to duplicate these results. The optimal tests would be done on Google’s index prior to the airing of these episodes.
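The scoring rules above could be sketched in Python roughly as follows. This is only an illustration: the scoring described here was done by reading the results page, not by running code. The names `Result`, `contains_answer`, `contains_year_in_decade`, `first_hit_rank` and `hit_rates` are hypothetical, and plain substring matching is a simplifying assumption that a human judge would apply more carefully.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """One search result as seen on the results page (hypothetical structure)."""
    title: str
    snippet: str


def contains_answer(result, answer):
    # Credit the result if the answer text appears in the title or snippet.
    # Assumption: case-insensitive substring match stands in for human judgment.
    text = f"{result.title} {result.snippet}".lower()
    return answer.lower() in text


def contains_year_in_decade(result, decade_start):
    # Name the Decade rule: credit any four-digit year inside the correct decade.
    text = f"{result.title} {result.snippet}"
    years = [int(y) for y in re.findall(r"\b(\d{4})\b", text)]
    return any(decade_start <= y < decade_start + 10 for y in years)


def first_hit_rank(results, matches, limit=10):
    # Return the 1-based position of the first matching result within the
    # first `limit` results, or None if none of them match.
    for rank, result in enumerate(results[:limit], start=1):
        if matches(result):
            return rank
    return None


def hit_rates(ranks):
    # ranks holds one entry per clue: a position (1..10) or None for a miss.
    # Returns (share correct in first position, share correct in first 10).
    total = len(ranks)
    if total == 0:
        raise ValueError("no clues scored")
    top1 = sum(1 for r in ranks if r == 1)
    top10 = sum(1 for r in ranks if r is not None)
    return top1 / total, top10 / total
```

With one rank per clue collected this way, `hit_rates` produces the two figures reported for Google: the share of clues answered in the first position and the share answered somewhere in the first 10.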
Disclosure: I have several good friends who work at Google, and I went to high school with co-founder and CEO Larry Page.