As some readers may have noticed, we’ve been having a problem with blog comment spam lately — just like most other Movable Type blogs. We’ve had several hundred comments, of which only seven have been non-spam. So I’ve been thinking about countermeasures.
- Soft touch: the more detectable and forceful the anti-spam measures are, the more they will interfere with legitimate users and be circumvented by spammers. Some thoughts:
- Moderate rather than ban: when spam is suspected, mark the comment as “needing approval” rather than rejecting it.
- Different views of the world: when a comment needs approval, keep showing it to the IP address that posted it, as if nothing were wrong. The spammer then has to do substantially more work to detect the countermeasure, let alone circumvent it.
- Temporary failures: when a spam-appearing comment is posted, hold the connection open for a few seconds, then drop it without sending an HTTP reply. Ordinary people will retry; spamware may or may not.
- IP whitelisting: posts from an IP address that has never had an approved post should be held for moderation.
- Bayesian filtering: a simple naive Bayesian classifier would have done a great job of distinguishing spam from non-spam among the comments we have received so far.
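
To make a few of these ideas concrete, here is a rough Python sketch of how moderation, per-IP views, IP whitelisting and a naive Bayesian filter might fit together. It is not Movable Type code, and nothing like it is running on this blog: the names `NaiveBayes`, `CommentStore`, `post`, `approve` and `visible_to` are all made up for illustration, the classifier is a plain multinomial naive Bayes with add-one smoothing, and the "temporary failures" idea is left out because it belongs in the web server rather than in the comment logic.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def is_spam(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior, so an untrained filter does not divide by zero.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # One extra slot in the denominator stands in for unseen words.
            denom = total_words + len(vocab) + 1
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores["spam"] > scores["ham"]


@dataclass
class Comment:
    ip: str
    text: str
    approved: bool = False


class CommentStore:
    """Hypothetical comment store: suspicious comments are held, never rejected."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.comments = []
        self.trusted_ips = set()  # IPs with at least one approved comment

    def post(self, ip, text):
        # Publish immediately only if the IP is already trusted and the text
        # does not look like spam; everything else waits for approval.
        approved = ip in self.trusted_ips and not self.classifier.is_spam(text)
        comment = Comment(ip, text, approved)
        self.comments.append(comment)
        return comment

    def approve(self, comment):
        # A moderator's approval whitelists the IP and feeds the filter.
        comment.approved = True
        self.trusted_ips.add(comment.ip)
        self.classifier.train(comment.text, "ham")

    def visible_to(self, viewer_ip):
        # Held comments stay visible to the IP that posted them, so the
        # poster sees nothing unusual.
        return [c for c in self.comments if c.approved or c.ip == viewer_ip]
```

In this sketch a first-time commenter's post is always held, but `visible_to` still returns it for their own IP address, so from their side the comment appears to have gone through. Once a moderator calls `approve`, later posts from that address are published right away unless the filter flags them.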