blacklist_to_modsec.pl
By Peter R. Wood - peter at prwdot dot org
News and updates on blacklist_to_modsec can be found here.
Contents
- Introduction
- Downloading
- Installing
- Configuring
- Running
- mod_security Tips
- Resources
- Et cetera
Introduction
Terse
blacklist_to_modsec is a tool that can import existing spam blocking rules and convert them into rules to be used with mod_security.
Verbose
Web spam is a rapidly growing nuisance these days. Any web-based system that allows users to post content without registration and login is vulnerable. Common examples include weblog systems like Movable Type and WordPress, and nowadays even photo galleries such as Gallery.
Many tools have been written specifically to help combat web spam. Some work by blocking IP addresses known to be 'open relays'. Others work by blocking requests that contain key words or URLs known to be used by spammers. Still other tools require user-submitted content to be moderated before posting. And some tools require users to complete some sort of puzzle, or pass a challenge-response test, in order to prove that they are not automated spamming tools.
blacklist_to_modsec falls into the category of tools that block requests containing key words or phrases. To be precise, blacklist_to_modsec itself does not block anything. Rather, blacklist_to_modsec connects to various data sources, obtains pre-existing blacklists, and converts them into rules to be used by mod_security. mod_security, an Apache module, is a powerful and flexible tool that can take a list of rules and use them to intercept, block, and otherwise handle incoming web server traffic before it ever reaches the application level. blacklist_to_modsec is able to supply mod_security with an exhaustive list of keywords and patterns, and mod_security actually does the blocking.
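The conversion step described above can be pictured with a minimal sketch. It is not the script's actual code: the `%s` placeholder, the `patterns_to_rules` helper, and the sample patterns are all assumptions made for illustration, and the real $rule_format syntax may differ. `SecFilter` is the mod_security 1.x keyword-filter directive.

```perl
use strict;
use warnings;

# Hypothetical template: %s stands for one blacklist pattern.
# The script's real $rule_format syntax may differ.
my $rule_format = 'SecFilter "%s"';

# Turn a list of blacklist patterns into mod_security rule lines.
sub patterns_to_rules {
    my ($format, @patterns) = @_;
    my @rules;
    for my $pattern (@patterns) {
        next unless length $pattern;            # ignore empty entries
        (my $escaped = $pattern) =~ s/"/\\"/g;  # escape quotes inside the quoted argument
        push @rules, sprintf($format, $escaped);
    }
    return @rules;
}

# Made-up sample patterns, for illustration only.
print "$_\n" for patterns_to_rules($rule_format, 'spam-example\.com', 'cheap-pills');
```

Each printed line is one rule; a file of such lines is what mod_security would then load and enforce.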
blacklist_to_modsec is also capable of keeping itself up to date. It can connect directly to a local MT-Blacklist installation, if it is available, and pull rule updates from the database. It can also force MT-Blacklist itself to run an update from the master blacklist. And it can issue system commands so that the web server process can be restarted, if necessary.
Caveat
Since blacklist_to_modsec can only help block spam that is already on a blacklist, it can do nothing to prevent spam whose characteristics have not yet been identified. With regular updates from master blacklist sources it does a good job, but it will never be 100% effective against all types of spam. In my personal situation, blacklist_to_modsec+mod_security reduces comment spam to a level where it is not annoying - perhaps only a few spams per week. I would probably get even fewer if I updated my blacklist more frequently - at the moment, it only updates at noon and midnight each day. To further increase the effectiveness of your spam blocking defenses, consider the additional tools noted in the Resources section of this document.
Downloading
Download the code here.
Installing
Note: This document assumes that you already have mod_security and Apache configured and running. General setup and configuration of Apache and mod_security are outside the scope of this document. Refer to the mod_security website and the Apache HTTPD website if you need basic assistance.
- Put the code in any location on your system.
- Rename the file to end in '.pl' instead of '.txt'. This isn't strictly necessary, but it makes the script easier to recognize as Perl, and may be required on some systems.
- Edit the file and configure it as noted in the Configuring section of the documentation. Also, change the perl shebang at the top if perl resides elsewhere on your system.
- Alter your mod_security configuration to include the rules file. e.g.:
Include conf/myrules.txt
Configuring
Configuration is done by editing the script itself. All of the user-configurable parameters are located near the top of the file. The available options are the following:
- @master_sources
- @master_sources may contain a comma-separated, quote-delimited list of sources from which to obtain a master blacklist. By default, this contains only the URL for Jay Allen's master blacklist file. You can also add URLs for other blacklists, or specify file paths to blacklists on your own local system. Anything with an 'http://' in front of it will be considered a URL. Anything else will be considered a file path. The URLs and files listed in @master_sources will only be queried when the local database is initialized. Whether you use a URL or a local file, the format of the file must be the same as Jay's master blacklist file. NOTE: If you want to pull your rules from MT-Blacklist's own database, leave @master_sources empty.
- @update_sources
- @update_sources should contain a comma-separated, quote-delimited list of sources from which to obtain a blacklist update file. By default, this contains only the URL for Jay Allen's latest blacklist changes file. You can also add URLs for other update files, or specify file paths to update files on your own local system. Anything with an 'http://' in front of it will be considered a URL. Anything else will be considered a file path. The URLs and files listed in @update_sources are queried on runs after the local database has been initialized, when only the latest changes are applied. Whether you use a URL or a local file, the format of the file must be the same as Jay's blacklist update file. NOTE: If you want to pull your rules from MT-Blacklist's own database, leave @update_sources empty.
- $rules_file
- This is the local file on your system where the rules will be stored.
- $data_file
- This is the local file on your system where the raw blacklist data will be stored in between runs of the program.
- $rule_format
- This is the format to use for each generated rule. See http://www.modsecurity.org/ for information on writing rules. The default one works pretty well for me. You can also check out Noel Jackson's mod_security rule generator.
- $logfile
- Path to the logfile in the local filesystem.
- $mt_dir
- The location of your Movable Type home directory - this is the directory that probably contains scripts such as 'mt.cgi'. Fill in this value only if you have MT-Blacklist installed and wish to use its database. With this value enabled, blacklist_to_modsec will connect directly to your MT-Blacklist rules list and extract rules from there. The rules found there will supersede rules pulled from any other sources, so effectively this option makes MT-Blacklist's database authoritative.
- @restart_commands
- This should be a comma-separated, quote-delimited list of commands, in the order you would like them to be executed. They will be run at the end of the script, presumably to restart your server, and any output the commands produce will be written to the log file, if you have one. For example:
@restart_commands = (
'/usr/local/apache/bin/apachectl restart',
'uptime',
);
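As an illustration of the 'http://' rule described for @master_sources and @update_sources, here is a small sketch of a source list and how its entries would be classified. The URL, the file path, and the `source_type` helper are hypothetical placeholders, not values or code taken from the script.

```perl
use strict;
use warnings;

# Hypothetical sources; substitute the real URLs and paths for your system.
my @master_sources = (
    'http://example.com/blacklist.txt',   # starts with 'http://', so a URL
    '/usr/local/etc/my_blacklist.txt',    # anything else is a file path
);

# Classify a source as described above: a leading 'http://' means URL.
sub source_type {
    my ($source) = @_;
    return $source =~ m{^http://} ? 'url' : 'file';
}

printf "%-40s %s\n", $_, source_type($_) for @master_sources;
```

Note that under this literal reading, a source beginning with 'https://' would be treated as a file path. Whether the script itself handles such sources is not something this document states.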
Running
Command-line options
- -i
- Initialize the blacklist data file. Use this if you want to force the data file to be rebuilt from scratch, using the master databases. You will want to do this if you are upgrading from a previous version of the script, or if you suspect the data file to be corrupted.
- -u
- Perform an MT-Blacklist master update. Use this only if you are using MT-Blacklist and would like to have MT-Blacklist update its own database before this script accesses it. This can add a few minutes to the script's execution time, since MT-Blacklist has to contact the server, parse the rules, etc.
- -b
- Create a backup of the rules file before writing it. Useful when upgrading between versions, in case the new rule set doesn't work for you.
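For readers curious how three such switches are typically handled in Perl, the sketch below uses the standard Getopt::Std module. Whether the script itself uses Getopt::Std is an assumption; the `parse_options` helper and its key names are hypothetical.

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse the three documented switches from an argument list.
sub parse_options {
    my (@args) = @_;
    local @ARGV = @args;    # getopts reads from @ARGV
    my %opts;
    getopts('iub', \%opts) or die "Usage: $0 [-i] [-u] [-b]\n";
    return {
        initialize => $opts{i} ? 1 : 0,   # -i: rebuild the data file
        mt_update  => $opts{u} ? 1 : 0,   # -u: MT-Blacklist master update first
        backup     => $opts{b} ? 1 : 0,   # -b: back up the rules file
    };
}

my $options = parse_options('-i', '-b');
print "$_: $options->{$_}\n" for sort keys %$options;
```

The switches are independent of each other, so they can be combined on one command line.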
General Steps
- Run the script. e.g.:
$ perl blacklist_to_modsec.pl
- (On the first run, the full master blacklist will be downloaded, and data on the current blacklist will be saved to a file. On subsequent runs, only the latest changes will be downloaded, and they will be applied to the last blacklist file. If you would like to completely reinitialize the data file, run the script with the "-i" command line option as noted above.)
- If you are running mod_security in your main Apache configuration (e.g. httpd.conf), restart your Apache process. e.g.:
$ apachectl restart
- If you'd like to have the blacklist rules updated regularly, without manual intervention, add the script to your system's crontab file at whatever frequency you desire.
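The difference between a first run, a forced reinitialization, and a normal update run can be summarized in a short sketch. This is a reading of the behaviour described above, not the script's actual logic; the `choose_mode` helper and its return values are hypothetical.

```perl
use strict;
use warnings;

# Decide which sources to read on this run.
# $data_file is the saved blacklist data; $force_init corresponds to -i.
sub choose_mode {
    my ($data_file, $force_init) = @_;
    return 'initialize' if $force_init;         # -i: rebuild from the master sources
    return 'initialize' unless -e $data_file;   # first run: no saved data yet
    return 'update';                            # later runs: apply latest changes only
}

# Hypothetical data file path, for illustration only.
print choose_mode('/usr/local/etc/blacklist.dat', 0), "\n";
```

In 'initialize' mode the master sources would be read; in 'update' mode only the update sources would be.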
mod_security Tips
Important note on httpd.conf vs .htaccess
These instructions are written under the assumption that you have the ability to modify your server-wide httpd.conf file in order to add the necessary mod_security options. If, instead, you must use an .htaccess file, there are certain limitations which you will need to be aware of.
First, .htaccess files cannot use the 'Include' directive. Therefore, you will need to manually paste the entire contents of the output rules file into the mod_security section in your .htaccess file. Unfortunately, there is no graceful workaround for this, and writing a complete .htaccess file is beyond the scope of the blacklist_to_modsec script.
Second, .htaccess files cannot use the 'Location' directive. Therefore, if you want to exclude mod_security from checking certain directories, you will need to use a workaround. For example, you can try using the FilesMatch directive to have mod_security selectively ignore particular scripts.
Tips
- Make sure the 'SecFilterEngine' directive is set to 'DynamicOnly'. This tells mod_security to scan only requests for dynamic content, e.g. CGI scripts, PHP, etc. There's not much point in scanning requests for graphics or other static files.
- Consider setting 'SecFilterInheritance' to 'Off' for certain locations, by including it within a 'Location' block. For example, I have mine set up not to scan mt.cgi, because occasionally users post blog entries that contain words that are on the blacklist (e.g. 'poker'), and I want them to be able to post whatever they want. Remember, mod_security will scan everything that gets posted to your server, whether it's posted by a legitimate user logged in to Movable Type, or a comment spammer.
- There may be further ways to optimize mod_security, but I haven't yet had the time to fully delve into the reference manual. Please feel free to share any other tips that you have.
Resources
Et cetera
To-Do
- Add ability to write to .htaccess file
- Improve documentation
- Make script smarter
- Improve regex compatibility/cleanness
- Implement user suggestions
Thanks
David Phillips for his hard work on debugging and testing with Apache 1.x, and for his contribution of the parse_mtbl subroutine; Jay Allen for writing MT-Blacklist; and Ivan Ristic for writing mod_security.