re-Captcha, nanojobs and GWAP

Probably the cleverest idea I have ever heard of.

This is not new, but it keeps astonishing people when I tell them about it: did you know that captchas help scan books? And they do it very, very well.

A captcha seen on the Facebook registration page

You all know about captchas: images containing words that you are forced to type to prove you are a human and not a robot before performing various actions on the Internet (creating an account, writing a comment…), all in order to fight spam. Thousands of people decode them every day. Every one of them makes a small mental effort to read the words.

A captcha from re-Captcha

And this is where the guys from the re-Captcha project had a brilliant idea: what if these thousands of mental efforts could be used to actually do something useful? Like helping to scan books?

Today, many organizations are scanning old books and converting them to a digital format. The OCR software that turns the scanned image into digital text is sometimes unable to do its job correctly (however sophisticated the software may be). re-Captcha uses the human brain to decode the words the computer is not sure about.
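To make the idea concrete, here is a minimal sketch of how such answers could be aggregated. It assumes (this is my reading of how re-Captcha worked, not something taken from its code) that each challenge pairs a control word whose text is already known with a word the OCR could not read, and that an answer only counts as a vote when the control word was typed correctly. The function names and the `min_votes` threshold are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lowercase and trim an answer so trivial differences do not split votes."""
    return text.strip().lower()


def resolve_unknown_word(responses, control_answer, min_votes=3):
    """Return the agreed transcription of the unknown word, or None.

    responses: list of (typed_control, typed_unknown) pairs, one per person.
    control_answer: the known text of the control word.
    min_votes: hypothetical number of matching answers required.
    """
    votes = Counter()
    for typed_control, typed_unknown in responses:
        # Only trust people who got the control word right (likely humans).
        if normalize(typed_control) == normalize(control_answer):
            votes[normalize(typed_unknown)] += 1

    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count >= min_votes else None


responses = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mornin", "open"),   # failed the control word, so this vote is ignored
    ("Morning", "Upon"),
    ("morning", "upom"),
]
result = resolve_unknown_word(responses, "morning")
print(result)
```

Nobody in this sketch is asked to "work": each person types two words, and the transcription only emerges once enough independent answers agree.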

The company behind re-Captcha, along with its data and its market share, was acquired by Google in 2009 (what else would you have imagined?).

To me this system is brilliant: it solves a problem by dividing it into tasks so simple that they can be executed by people who don’t even notice that they are working. (And what’s nicer with this one is that it helps fight spam and digitize books, two great causes.)

nano jobs

I don’t know if there is another term, but I call these nano jobs.

Let us take another example of a nano job: in 2006 a professor released a fun tool in which you play with a random partner. An image is displayed, and your goal is to find words describing it that the other, remote player also comes up with. Of course, you quickly realize that this was done only to help label an image database: today, unlike a human, a machine has difficulty understanding what an image represents (image recognition). The “find common words to improve your score” mechanic is just an incentive to gather a lot of data. Google did the same to help label its own image base.

Playing the ESP game with a random player, finding common labels for a given image.
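The matching mechanic is simple enough to sketch in a few lines. This is only an illustration of the principle, not the real game's code: the function names and the one-point-per-common-word scoring are my own assumptions, and the actual game had more rules than this.

```python
from collections import Counter


def agreed_labels(player_a_words, player_b_words):
    """Return the words both players typed, ignoring case and extra spaces."""
    a = {w.strip().lower() for w in player_a_words}
    b = {w.strip().lower() for w in player_b_words}
    return a & b


def play_round(image_labels, image_id, player_a_words, player_b_words):
    """Record the agreed words for one image and return the round score."""
    matches = agreed_labels(player_a_words, player_b_words)
    image_labels.setdefault(image_id, Counter()).update(matches)
    return len(matches)  # hypothetical scoring: one point per common word


image_labels = {}
score = play_round(image_labels, "img-1",
                   ["Dog", "grass", "ball"], ["dog", "park", "ball "])
play_round(image_labels, "img-1", ["puppy", "dog"], ["dog", "animal"])
print(score, image_labels["img-1"].most_common(1))
```

Because the two players cannot talk to each other, a word they both type is very likely a fair description of the image, and words that come back over many rounds become its labels.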

This leads us to another important point in nano jobs: game mechanics.

You cannot force people into doing small tasks; they have to do them of their own accord. In the case of re-Captcha, they understand the need to fight spam, so they accept the task. In the case of the ESP game, they want to get the best score, or maybe to have fun with a random web user (this reminds me of Chatroulette).

These games are called games with a purpose (GWAP). Imagine the workforce represented by the millions of people farming like zombies on Farmville. (Unfortunately, Farmville’s business model is more about selling your data and selling stupid virtual stuff than about making you do nano jobs.) So when we hear about Google investing in social game companies, I think nano jobs are part of the motivation (not the only one, of course).

My conclusion

To conclude, I think this decentralized and effortless way of solving problems is extremely powerful. Once again, divide and conquer seems to be the strategy to adopt, even for problems that don’t seem scalable.

Some more examples

In a different vein, Amazon provides an online service called Amazon Mechanical Turk. It links nano job providers with a widespread user base that does small tasks for money. I have heard that many companies use this platform to get Human Intelligence Tasks done.