I'm unsure whether we should submit it as a patch. If we do, it would probably work... until the crackers find a way to break it. But who knows... maybe this is the solution...
WaterCap Strong PHP CAPTCHA With Negative Spaces And Shadows
Introduction
Most Internet users these days have seen a CAPTCHA. A CAPTCHA is a challenge-response test used on many web sites to determine whether or not the user is human. It is the most widely used mechanism for protecting content from software bots while still letting human users in. You have probably faced a CAPTCHA already, especially if you use hosted email, run a web site, are involved in e-commerce or provide services to others over the Internet.
Here I present WaterCap, a new, simple and strong CAPTCHA image generator (on the right-hand side of the page). Written in under 50 lines of PHP code, WaterCap was specifically designed to withstand commonly used CAPTCHA-breaking algorithms.
The problem
I am involved in the development of several large web sites, many of which rely heavily on CAPTCHAs. The CAPTCHAs seem to work well everywhere except in the phpBB forum. phpBB forum software version 2.0.2x uses a very weak CAPTCHA that software bots regularly defeat. As a result, I now get all kinds of porn, Viagra and other fun stuff, in addition to serving thousands of web pages to the dozens of non-human members that register daily!
If you follow the news on the topic, this might not surprise you, but it was a huge surprise to me. Before I discovered this problem in my phpBB forum, I didn't even think CAPTCHAs could be defeated. Apparently, numerous articles {1-5} describe software (some of it open source) that breaks CAPTCHAs instantly, some reporting success rates of over 90%! So, like many other things in life, CAPTCHA is a chase! Us against them, good against evil, with a lot of time, money and humanity burned in the process...
The solution
After some quick research I found several CAPTCHA image generators for PHP, but I didn't like any of them. They all seemed to be variations on the same theme, and they all looked easy to defeat. So I decided to read more about the software that breaks CAPTCHAs, hoping to build a CAPTCHA image generator that these tools would find difficult to defeat.
CAPTCHA-breaking software {1-5} processes the challenge image in several stages, typically including some of the following steps:
1. background noise elimination - fetch the same challenge several times, hoping that it always has different random noise but the same challenge text; if so, all the images can be "added up" and the noise subtracted out
2. pixel convolution (grouping) - roughly, if a 3x3 matrix has only one white pixel and all the other pixels are black, turn this white pixel black
3. border detection - detect a bounding box for each character
4. foreground enhancement - enhance the foreground within each bounding box
5. character search - brute-force matching of each extracted character image against a database of character images for well-known fonts
6. word validation - validate the result as a word when the challenge is known to be a real word rather than a random combination of symbols
7. character outlining
8. line thinning
9. endpoint finding
10. feature vector search
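To make steps 1 and 2 concrete, here is a minimal PHP sketch, not taken from any of the tools cited above. It assumes each fetched challenge has already been reduced to a binary image, stored as a 2-D array of 0 (white) and 1 (black) pixels; the function names majority_vote() and remove_specks() are hypothetical. Step 1 is shown as a per-pixel majority vote across fetches (one simple way to "add up" images), and step 2 as flipping any pixel whose eight neighbours all have the opposite value, which covers the "lone white pixel among black ones" case in both directions.

```php
<?php
// Hypothetical sketch of steps 1 and 2 on binary images
// (2-D arrays of 0 = white, 1 = black).

// Step 1: per-pixel majority vote across several fetches of the same
// challenge. Text pixels are identical in every fetch and survive;
// random noise that differs between fetches is voted out.
function majority_vote(array $frames): array
{
    $n = count($frames);
    $h = count($frames[0]);
    $w = count($frames[0][0]);
    $out = [];
    for ($y = 0; $y < $h; $y++) {
        $out[$y] = [];
        for ($x = 0; $x < $w; $x++) {
            $votes = 0;
            foreach ($frames as $frame) {
                $votes += $frame[$y][$x];
            }
            // Strict majority: more than half of the fetches show black here.
            $out[$y][$x] = ($votes * 2 > $n) ? 1 : 0;
        }
    }
    return $out;
}

// Step 2: look at the 3x3 neighbourhood of every pixel; if none of its
// (up to 8) neighbours shares its value, treat it as a speck and flip it.
function remove_specks(array $img): array
{
    $h = count($img);
    $w = count($img[0]);
    $out = $img;
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $same = 0;
            $total = 0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    if ($dx === 0 && $dy === 0) {
                        continue; // skip the centre pixel itself
                    }
                    $ny = $y + $dy;
                    $nx = $x + $dx;
                    if ($ny < 0 || $ny >= $h || $nx < 0 || $nx >= $w) {
                        continue; // neighbour falls outside the image
                    }
                    $total++;
                    if ($img[$ny][$nx] === $img[$y][$x]) {
                        $same++;
                    }
                }
            }
            if ($total > 0 && $same === 0) {
                $out[$y][$x] = 1 - $img[$y][$x];
            }
        }
    }
    return $out;
}
```

With three fetches, a noise pixel that shows up in only one of them loses the vote, while a pixel that belongs to the text wins in every fetch. Both steps rely on the text being stable and the noise being either random or isolated.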
I have collected and inspected many examples of CAPTCHA images, most of which have already been defeated with over 90% accuracy. What makes them all easy to defeat? How can I generate challenge images in a way that makes the techniques above useless? How can I complicate "border detection" and "character outlining"? Why do none of these work:
http://www.softwaresecretweapons.com/jspwiki/attach?page=WaterCap_Strong_PHP_CAPTCHA_With_Negative_Spaces_And_Shadows%2Fother.png
Take a closer look at these images. They all share one trait: a distinct text color. The letters are distorted in a variety of ways (turned, fogged, shadowed, squished and stretched) and noise is added, but one thing stays the same: all the characters are drawn in the same color. This is the main weakness!
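A small PHP sketch shows why one shared text color is such a gift to the attacker. It assumes the image is a 2-D array of 0xRRGGBB integers and that the attacker has already guessed the text color (for example, as the most common non-background color); color_key() and column_segments() are hypothetical names. Keeping only the pixels of that exact color strips out differently colored noise, and scanning for empty columns then yields a crude bounding box per character, which is step 3 (border detection) almost for free.

```php
<?php
// Hypothetical sketch: exploiting a single, distinct text color.
// The image is a 2-D array of 0xRRGGBB integers.

// Keep only the pixels that exactly match the text color (1), drop the rest (0).
function color_key(array $img, int $textColor): array
{
    $mask = [];
    foreach ($img as $y => $row) {
        foreach ($row as $x => $rgb) {
            $mask[$y][$x] = ($rgb === $textColor) ? 1 : 0;
        }
    }
    return $mask;
}

// Crude border detection: split the mask at columns that contain no text
// pixels and return [firstColumn, lastColumn] pairs, one per character.
function column_segments(array $mask): array
{
    $w = count($mask[0]);
    $segments = [];
    $start = null;
    for ($x = 0; $x < $w; $x++) {
        $inked = false;
        foreach ($mask as $row) {
            if ($row[$x] === 1) {
                $inked = true;
                break;
            }
        }
        if ($inked && $start === null) {
            $start = $x;                       // a character begins
        } elseif (!$inked && $start !== null) {
            $segments[] = [$start, $x - 1];    // a character ends
            $start = null;
        }
    }
    if ($start !== null) {
        $segments[] = [$start, $w - 1];        // character touching the right edge
    }
    return $segments;
}
```

Real letters overlap and touch, so actual breakers use more robust segmentation, but the key filter is the point here: noise drawn in any other color simply disappears.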
The WaterCap CAPTCHA image generator described here is designed to eliminate this weakness and to make several steps of the automatic image recognition process especially difficult. With WaterCap, pixel convolution becomes useless, and border detection and foreground enhancement become much harder. All of this is achieved with one simple technique: imprinting the text with negative spaces and shadows, that is, using the background color as the text color.
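The negative-space idea can be sketched on a binary glyph mask (1 = letter pixel) in PHP. This is an illustration under stated assumptions, not WaterCap's code: stamp_passes() is a hypothetical name, and the offsets and colors in the usage below are made up. Each pass stamps the mask, shifted by (dx, dy), in one color; later passes overwrite earlier ones. When the last pass uses the background color, the letters end up in the background color and are visible only through the shadow passes underneath them.

```php
<?php
// Hypothetical sketch of negative-space text on a binary glyph mask.
// $passes is a list of [dx, dy, color]; later passes overwrite earlier ones.
function stamp_passes(array $mask, int $bg, array $passes): array
{
    $h = count($mask);
    $w = count($mask[0]);
    // Start from a plain background.
    $img = array_fill(0, $h, array_fill(0, $w, $bg));
    foreach ($passes as [$dx, $dy, $color]) {
        for ($y = 0; $y < $h; $y++) {
            for ($x = 0; $x < $w; $x++) {
                // Pixel (x, y) is covered if the glyph, shifted by (dx, dy), lands here.
                $sy = $y - $dy;
                $sx = $x - $dx;
                if ($sy >= 0 && $sy < $h && $sx >= 0 && $sx < $w && $mask[$sy][$sx] === 1) {
                    $img[$y][$x] = $color;
                }
            }
        }
    }
    return $img;
}

// Usage (made-up values): two shadow passes, then the glyph itself in the
// background color.
// $img = stamp_passes($mask, 0xFFFFFF,
//     [[1, 1, 0x555555], [1, 0, 0xAAAAAA], [0, 0, 0xFFFFFF]]);
```

Keying on the "text color" now selects the letters and the entire background together, so the color filter that made the previous examples easy no longer separates anything.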
The more I think about this, the more I suspect I know why other CAPTCHA engines draw the text in one specific color: drawing text in anything other than a single solid color is complex. As far as I know, the typical drawText() function found in the Java, .NET, Delphi, PHP or Perl drawing APIs just can't do it. Can it really be this simple...
I have no proof yet that WaterCap is a better CAPTCHA image generator than the others. But it seems so to me, because WaterCap doesn't use any additional color for the text; it uses the background color itself. The noise is placed on top of and around the text, so it resembles the shadow of each letter but leaves no continuous boundary around any character. I think this is what will make WaterCap difficult for a software program to defeat. And the beauty is in the simplicity: only 50 lines of PHP code are needed to create the image! Here are several examples:
WaterCap Example 1: Characters 0..9

WaterCap Example 2: Characters a..z

WaterCap Example 3: Characters A..Z

The implementation
The complete PHP implementation of WaterCap is presented below. Since I am very new to PHP, I started from Simon Jarvis's original code to avoid having to learn the PHP drawing API. The WaterCap image is obtained by drawing the same challenge text three times in three different colors while shifting the text slightly. A small-angle rotation quickly adds light fuzziness. Among other things, I made sure that the noise is always the same for the same challenge code.
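The last point, identical noise for the same challenge code, directly targets step 1 (background noise elimination). The following is a minimal PHP sketch of one way to get that property, not the WaterCap code itself: challenge_noise() and its parameters are hypothetical, while crc32(), mt_srand() and mt_rand() are standard PHP functions. Seeding the Mersenne Twister from a hash of the challenge code makes every fetch of that code produce the same noise points.

```php
<?php
// Hypothetical sketch: noise positions derived from the challenge code,
// so repeated fetches of the same challenge carry identical noise.
function challenge_noise(string $code, int $width, int $height, int $count): array
{
    // crc32() turns the code into an integer seed; mt_srand() makes the
    // following mt_rand() sequence repeatable for that seed.
    mt_srand(crc32($code));
    $points = [];
    for ($i = 0; $i < $count; $i++) {
        $points[] = [mt_rand(0, $width - 1), mt_rand(0, $height - 1)];
    }
    // Re-seed randomly so later mt_rand() calls in the script
    // are not predictable from the challenge code.
    mt_srand();
    return $points;
}
```

If every fetch gets a brand-new challenge code, step 1 does not apply in the first place; seeding matters when the same code can be served more than once, for example on a page reload. Averaging such fetches then reproduces the noise instead of cancelling it.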
Here is an example of using WaterCap in phpBB. Open the usercp_confirm.php file for editing and add the WaterCap class definition at the top. Then insert three new lines just before $_png = define_filtered_pngs(); as shown below. That's it! Nothing else needs to change.
Final word
Don't be afraid of software bots! A software bot is just a program written by a human, a software engineer dude just like you. It can be defeated quickly as soon as you put some thought into the defense. Don't blindly trust the tools (CAPTCHA or otherwise) and forget about the forces behind the games you play. Software engineering is all about continuous change, so keep your eyes on the ball.
WaterCap CAPTCHA and the ideas in this article are yours to use as you see fit in your own projects. I have no proof yet that WaterCap works well, but I am investigating its strength and will report back if it is confirmed. No doubt, even if it works well today, it probably won't work as well tomorrow. But we will talk about what to do when that time comes...