Why Nerds Rule: Luis Von Ahn and reCAPTCHA
I stumbled across this video on the weekend and was floored by the idea presented. It’s brilliant, simple, and effective. A true testament to the notion that small contributions can make a BIG difference.
*UPDATE: September 16, 2009 @ 9:20am: Google has announced that it acquired reCAPTCHA. Official Google Blog post here.
For those who can’t spare 12 minutes for the brilliant and entertaining talk above, here’s the gist:
THE PROBLEM: Spammers
Free email services from Google, Yahoo!, and Microsoft were under attack from hackers/spammers who had written programs to obtain millions of email addresses every day. Why did they need so many email addresses? Because these free services only allowed users to send a limited number of emails per day (e.g., Yahoo only allowed 100), so in order to effectively ‘spam’ they required numerous addresses.
THE SOLUTION: CAPTCHA
Develop a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. For example, humans can read distorted text such as the one shown below, but current computer programs can’t.
This is an example of a typical CAPTCHA
In 2000, Luis von Ahn and Manuel Blum coined the term ‘CAPTCHA’. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used, which were those adopted by Yahoo!.
– Approximately 200 million CAPTCHAs are typed every day around the world
– Each CAPTCHA takes nearly 10 seconds to type, and thus:
– 500,000 hours of human time are wasted every day typing CAPTCHAs
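The arithmetic behind these figures can be checked with a quick sketch. The function name and constants below are illustrative only, and the 9-second figure is an assumption about what "nearly 10 seconds" means, chosen because it reproduces the 500,000-hour total exactly.

```python
# Back-of-the-envelope check of the figures in the list above.
CAPTCHAS_PER_DAY = 200_000_000   # from the post
SECONDS_PER_HOUR = 3600


def wasted_hours_per_day(captchas_per_day: int, seconds_each: float) -> float:
    """Total human hours spent per day typing CAPTCHAs."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


# At a full 10 seconds each, the total is a little over 555,000 hours.
upper_bound = wasted_hours_per_day(CAPTCHAS_PER_DAY, 10)

# Assumption: "nearly 10 seconds" is about 9 seconds, which gives 500,000 hours.
post_figure = wasted_hours_per_day(CAPTCHAS_PER_DAY, 9)

print(f"{upper_bound:,.0f} hours/day at 10 s each")
print(f"{post_figure:,.0f} hours/day at 9 s each")
```

Either way, the order of magnitude holds: roughly half a million hours of human effort every day.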
Is there any way this human effort can be used for the greater good of humanity?
THE SOLUTION REVISITED: reCAPTCHA
– Digitizing books one word at a time. reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers, and old-time radio shows.
How it works
In an effort to make information more accessible, book pages are being photographically scanned, and then transformed into text using “Optical Character Recognition” (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle?
Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct. BRILLIANT!
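The pairing scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not reCAPTCHA's actual implementation: the names `UnknownWord`, `submit`, and `resolved`, the case-insensitive comparison, and the threshold of three matching answers are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A word image that OCR could not read, plus the human guesses so far."""
    image_id: str
    votes: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    # Assumption: answers are compared ignoring case and surrounding spaces.
    return text.strip().lower()


def submit(control_answer: str, control_guess: str,
           unknown: UnknownWord, unknown_guess: str) -> bool:
    """Grade one challenge. Returns True if the user passes.

    The user passes only by solving the control word (answer already known).
    Only then is their reading of the unknown word trusted and recorded.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    unknown.votes[normalize(unknown_guess)] += 1
    return True


def resolved(unknown: UnknownWord, min_votes: int = 3) -> Optional[str]:
    """Return the accepted transcription once enough users agree, else None.

    min_votes is a hypothetical threshold chosen for this sketch.
    """
    if not unknown.votes:
        return None
    answer, count = unknown.votes.most_common(1)[0]
    return answer if count >= min_votes else None


word = UnknownWord("scan-0001")
submit("morning", "morning", word, "overlooks")   # passes, vote recorded
submit("morning", "mourning", word, "overlocks")  # fails control, vote ignored
submit("morning", "Morning", word, "overlooks")   # passes, vote recorded
submit("morning", "morning", word, "Overlooks")   # passes, vote recorded
print(resolved(word))
```

The key design point is that a guess at the unknown word never affects whether the user passes; it only counts as a vote once the control word has vouched for the user being human.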
FYI: With the assistance of reCAPTCHA, the entire New York Times archive from the 1850s to the 1980s will have been completely transcribed in less than 12 months.
Luis von Ahn
After graduating from Carnegie Mellon with a Ph.D. in Computer Science in 2005, von Ahn is now a professor at his alma mater. When he’s not lecturing about the Science of the Web, he’s working on Human Computation, which harnesses the combined computational power of humans and computers to solve large-scale problems. Some call this “crowdsourcing.”
His 8-page C.V. is quite impressive, and his list of accomplishments will only grow as he continues his research. Some of his ‘selected honours’ include:
– MacArthur Fellow, 2006-2011.
– Discover Magazine: 50 Best Brains in Science, 2008.
– Silicon.com: 50 Most Influential People in Technology, 2007.
– Microsoft New Faculty Fellow, 2007.
– Sloan Fellow, 2009.
– Smithsonian Magazine: America’s Top Young Innovators in the Arts and Sciences, 2007.
– Technology Review’s TR35: Young Innovators Under 35, 2007.
– IEEE Intelligent Systems “Ten to Watch for the Future of AI,” 2008.
– Popular Science Magazine Brilliant 10 Scientists of 2006.
You can find his personal blog here, and his university page here.
THE POWER OF PEOPLE