WordPress plugin reCAPTCHA - Digitize books while stopping spam
A Lifehacker article today led me to the reCAPTCHA project. This fascinating project creates CAPTCHAs from words that OCR software fails to read while digitizing text, then serves those CAPTCHAs to your site. The result is a seemingly symbiotic process - you prevent comment spam on your site with their CAPTCHA, and they receive assistance from thousands of humans correcting OCR errors. According to reCAPTCHA’s project description,
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
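To make the two-word trick concrete, here is a minimal sketch of how I imagine the check could work. This is not reCAPTCHA's actual code: the names (`UnknownWord`, `submit`, `consensus`) and the thresholds (`min_votes`, `min_share`) are all my own assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UnknownWord:
    """A word OCR could not read; collects human readings as votes."""
    image_id: str  # hypothetical identifier for the scanned word image
    votes: Counter = field(default_factory=Counter)

    def consensus(self, min_votes=3, min_share=0.75):
        """Return the agreed reading, or None while confidence is too low.

        min_votes and min_share are made-up thresholds, not reCAPTCHA's.
        """
        total = sum(self.votes.values())
        if total < min_votes:
            return None
        guess, count = self.votes.most_common(1)[0]
        return guess if count / total >= min_share else None


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def submit(control_answer, unknown, typed_control, typed_unknown):
    """Pass the CAPTCHA only if the known (control) word is typed correctly.

    On success, the reading of the unknown word is recorded as one vote.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    unknown.votes[normalize(typed_unknown)] += 1
    return True
```

A user who gets the control word right passes the CAPTCHA, and their reading of the other word counts as one vote. Once enough users agree, `consensus()` returns the word; until then it returns `None`, which matches the "higher confidence" step in the description above.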
I decided to check out this slick-sounding project. It’s pretty easy to integrate their CAPTCHA plugin into WordPress:
- Sign up for an account at reCAPTCHA - as far as I can tell, the service is free
- Register for an API key, one per domain
- Download the WordPress plugin (they have plugins for other applications as well)
- Upload, activate and set options for the plugin
- Follow the instructions here to insert the CAPTCHA into your comment loop
You can see the results below. The interface is a little confusing - it would be nice if it used a smaller field and a smaller widget, something more like bot-check. Perhaps reCAPTCHA 2.0 will integrate better into existing forms. As it is, it’s still worth the small added confusion to help projects like the Internet Archive digitize books.
Unless, of course, people stop leaving comments.
After playing with the plugin a little, I noticed the letters are sometimes hard to discern. For instance, I can’t even make a guess as to what the word on the left must be in this one:

Dave says it’s cit.), - good call.
My guess is this widget’s refresh button will get a lot of use.
What do you think - is the interface too confusing? Test it out and tell me in the comments below.
UPDATE:
I decided to disable the plugin - it’s too cumbersome as it exists now, and I don’t want to make visitors work hard to leave comments. I like the idea of the project though, and hope a version 2.0 is in the works. I’ll leave up a screenshot of the plugin interface.