I was on Aljazeera Arabic’s website the other day and, while voting in a poll, was presented with the following screen:
The CAPTCHA in the screen above immediately caught my attention. The distortions in it seemed very simple: the text was not warped in any way, and there was no overlap between characters.
The following is a URL for one of the CAPTCHAs:
Opening the URL above and refreshing the page a few times gives the following CAPTCHAs:
The dashed grey lines are randomized, while the letters in the CAPTCHAs above are static. The letters are encoded in the Code parameter in the URL. Notice that there are two forms for each character: a straight form and another that is slightly rotated.
Aljazeera’s CAPTCHA can easily be broken by doing the following:
- Removing the dashed grey lines
- Finding the characters in the image
- Separating the characters in the image
- Classifying each character
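
To make the four steps above concrete, here is a minimal Octave sketch that runs them on a tiny synthetic image rather than on a real Aljazeera CAPTCHA. Everything in it is an assumption made for illustration: the 3-pixel-wide glyphs, the intensity values (0 for letter ink, 128 for the grey lines, 255 for background), the threshold of 64, and the exact-size template comparison. It only shows how the steps fit together and is not the algorithm developed in this post.

```octave
% Hypothetical 5x3 glyph templates (1 = ink); shapes are made up for this sketch.
barGlyph = [1 0 1; 1 0 1; 1 1 1; 1 0 1; 1 0 1];   % "H"-like shape
boxGlyph = [1 1 1; 1 0 1; 1 0 1; 1 0 1; 1 1 1];   % "O"-like shape
templates = {barGlyph, boxGlyph};
labels = 'HO';

% Build a toy 5x9 grayscale image: 255 = background, 0 = letter ink.
img = 255 * ones(5, 9);
img(:, 2:4) = 255 * (1 - barGlyph);
img(:, 6:8) = 255 * (1 - boxGlyph);
% Draw a grey "noise" line across row 3, only over background pixels.
img(3, img(3, :) == 255) = 128;

% Step 1: remove the grey line by keeping only dark pixels.
% Assumes the letters are clearly darker than the lines.
ink = img < 64;

% Step 2: find the characters by marking columns that contain any ink.
cols = any(ink, 1);

% Step 3: separate the characters at the blank columns between them.
d = diff([0 cols 0]);
starts = find(d == 1);        % first column of each run of ink
stops  = find(d == -1) - 1;   % last column of each run of ink

% Step 4: classify each segment by counting pixels that agree with a template.
decoded = repmat('?', 1, numel(starts));
for k = 1:numel(starts)
  seg = ink(:, starts(k):stops(k));
  bestScore = -1;
  for t = 1:numel(templates)
    if isequal(size(seg), size(templates{t}))
      score = sum(sum(seg == templates{t}));
      if score > bestScore
        bestScore = score;
        decoded(k) = labels(t);
      end
    end
  end
end

disp(decoded)
```

The column-projection split in step 3 works here only because the characters never overlap, which is the same property noted for the real CAPTCHA. A segment whose size matches no template is left as '?', so a real classifier would need to resize or pad segments before comparing them.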
I’ll be using Octave/Matlab for the four tasks listed above and will explain my algorithm using the following CAPTCHA as an example.