Blog comment spamming usually involves a malicious party writing a computer program that seeks out blogs, finds their comment scripts, and submits tons of advertisements into them. Read on if you’re interested in the techniques I’m using to fight comment spam.
Our website currently uses a number of techniques in an attempt to foil comment spammers.
- IP-based throttling. This is a built-in feature of Movable Type which stops any user from posting more than one comment in a given period of time. Automated comment spammers tend to submit many comments in rapid succession, so this logic is supposed to stop them. Unfortunately, there are many situations where this feature doesn’t work correctly – for example, if a number of users are posting from behind a firewall, and they post at around the same time, they would all appear to be the same user. So I have this set to a fairly low level, blocking only the most aggressive repeated comment spammers.
- Jay Allen’s MT-Blacklist plugin for Movable Type, our blogging system. MT-Blacklist is a fairly simplistic system which scans comments before they are posted, looks for certain key words, and blocks the comments if they match the key words. MT-Blacklist also allows the blog authors to report comments that aren’t blocked, and add them to the blacklist so that similar comments will be blocked in the future. This feature works well for comments which match the blacklist, but doesn’t do anything to stop spam whose key words aren’t on the list yet.
- Automatic blacklist updating script. This is a script I wrote myself using Perl and the LWP and WWW::Mechanize libraries. It actually goes to Jay Allen’s website, fetches the latest copy of his master blacklist file, logs in to Movable Type on its own, and adds any new entries to the blacklist. This works pretty well, and keeps the list of known spammers up-to-date. However, Jay has been busy lately, having been hired by Six Apart, the makers of Movable Type. He’s in the process of moving from Hungary to San Francisco, and hasn’t had the time to keep his own blacklist file up to date. So we can’t count 100% on his blacklist to keep spammers away.
- Manual blacklist maintenance. Any spam comments that come through without being blocked have a link to ‘De-spam’ them. This brings the blog author into the MT-Blacklist system, allows them to delete the spam comment, and also figures out what parts of the comment should be added to the spam blacklist. This way, we can stay up to date with new spammers even if Jay’s master blacklist file is out of date.
- Renaming the comments script. Most spammers look for known names of comment scripts in order to do their spamming. If the name of the script is changed from what it’s known to be, then they can’t find any comments to spam. So I periodically change the name of the script to something random. This only works for a short time, as the spammers eventually discover the new script names and add them to their repertoire.
- Removing the initial ‘Post’ option from the comment window. This causes some frustration for legitimate comment posters, as they can’t simply click on “Post” to submit their comment. It forces them to “Preview” first, and from there they can “Post”. On the bright side, many spammers are programmed to hit the “Post” button right away, so this can stop them. However, it’s only a matter of time before their programs are capable of getting around this.
- Using a “CAPTCHA”. This technique is used by many websites. Basically, a graphic is displayed on the comment posting screen. This graphic contains some text that is easy for humans to read, but hard for computer programs to read. The human commenter must type in the text they see on the screen to validate that they are, in fact, a real person, and not some comment-spamming robot. Like the forced preview, this is annoying to comment posters. It could also prevent people who can’t see the graphic from commenting. But it’s one of the most effective ways to stop automated comment spammers from posting. I’ve just started testing out this feature, and I may not end up using it simply because of the annoyance factor. But it’s nice to know that it’s there in case the spammers get really vicious.
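To make the first two items in that list a bit more concrete, here is a minimal Python sketch of IP-based throttling combined with a key word blacklist check. It is not Movable Type's or MT-Blacklist's actual code. The class and function names, the limit of two comments per minute, and the example patterns are all assumptions made up for illustration.

```python
import re
import time
from collections import defaultdict, deque


class CommentThrottle:
    """Hypothetical throttle: at most max_comments per IP per window."""

    def __init__(self, max_comments=2, window_seconds=60.0):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # ip -> times of accepted comments

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        stamps = self._recent[ip]
        # Forget comments that have fallen out of the time window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_comments:
            return False
        stamps.append(now)
        return True


def matches_blacklist(text, patterns):
    """Return True if any blacklist pattern appears in the comment text."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def accept_comment(ip, text, throttle, patterns, now=None):
    # Check the blacklist first so rejected spam does not use up the
    # throttle allowance of legitimate users behind the same IP address.
    if matches_blacklist(text, patterns):
        return False
    return throttle.allow(ip, now)
```

Note that the throttle only sees IP addresses, so it has exactly the firewall problem described above: several legitimate readers sharing one address will count against the same limit.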
One of the major techniques we’re not using is comment registration. This basically means that in order to post a comment on our site, you’d have to go through a registration process, set up a username and password, and then log in in order to post. Besides the fact that our current version of Movable Type doesn’t support it, I feel that this is much more of a hassle than our readers are already subject to.
Another possible technique that we’re not using is Bayesian filtering. This is used in many modern email clients, such as Thunderbird and Apple’s Mail. The basic premise is that the user ‘trains’ the client on what spam is and what it is not. With commenting, the Bayesian filter would capture comments, and ask the user whether or not they were spam. Based on these continued ‘lessons’, the filter would learn to recognize spam messages, and treat them appropriately. If you’ve used an email client with this capability, you probably know that it does work well. But in the case of comment spam, the currently available Bayesian filters require too much overhead on the server to make them practical. So for now, I’ll stay away from a solution like this.
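For the curious, the ‘training’ idea can be sketched in a few lines of Python. This is a toy naive Bayes classifier with add-one smoothing, not the code of any real email client or Movable Type plugin. The class name, the tokenizer, and the training comments are assumptions invented for the example.

```python
import math
import re
from collections import Counter


class NaiveBayesCommentFilter:
    """Toy spam filter: learns word frequencies from labelled comments."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label is "spam" or "ham" (ham = a legitimate comment).
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        # Log prior, smoothed so an untrained label never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            count = self.word_counts[label][tok]
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        tokens = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        spam_score = self._log_score(tokens, "spam", vocab_size)
        ham_score = self._log_score(tokens, "ham", vocab_size)
        return spam_score > ham_score
```

A real filter would need far more training data than a handful of comments, and it would have to store and update these word counts on the server for every comment, which is where the overhead comes from.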
Another option related to the CAPTCHA is to use a simplistic question that could be answered (and checked) to allow humans easy access, but make it more difficult for robot spammers. Something like “What is Peter R. Wood’s first name?”, for instance.
Then when you want to really mess with everyone, you could make the answer be not Peter, but Rebecca. 🙂
That’s a pretty good idea, and would definitely involve less overhead than using GD to draw those images. However, it doesn’t do much for the user-annoyance factor.
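The question-based check suggested above needs very little code on the server. Here is a rough Python sketch, assuming the expected answers are kept in a simple list next to the question. The function names and the normalization rules are hypothetical, not part of Movable Type.

```python
def normalize(answer):
    """Lowercase the answer and collapse surrounding/repeated whitespace."""
    return " ".join(answer.strip().lower().split())


def check_challenge(submitted, expected_answers):
    """Return True if the submitted answer matches any accepted answer."""
    accepted = {normalize(a) for a in expected_answers}
    return normalize(submitted) in accepted


# Example question from the comment above; the accepted answer is an assumption.
QUESTION = "What is Peter R. Wood's first name?"
ACCEPTED_ANSWERS = ["Peter"]
```

Normalizing case and whitespace keeps a correct answer like "  peter " from being rejected, which helps a little with the annoyance factor.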