We won’t be posting for several days so we can complete our paper about Nutch for submission to WWW 2005.

As some readers may have noticed, we’ve been having a problem with blog comment spam lately — just like most other Movable Type blogs. We’ve had several hundred comments, of which only seven have been non-spam. So I’ve been thinking about countermeasures.

  • Soft touch: the more detectable and forceful the anti-spam measures are, the more they will interfere with legitimate users and be circumvented by spammers. Some thoughts:
    • Moderate rather than ban: when spam is suspected, mark the comment as “needing approval” rather than rejecting it.
    • Different views of the world: when a comment needs approval, show it to its posting IP address as if nothing were wrong. This makes it substantially more work for the spammer to detect and circumvent the countermeasures.
    • Temporary failures: when a spam-appearing comment is posted, hold the connection open for a few seconds, then drop it without sending an HTTP reply. Ordinary people will retry; spamware may or may not.
  • IP whitelisting: posts from an IP address that has never had an approved post should be held for moderation.
  • Bayesian filtering: a simple naive Bayesian classifier would have done a great job of distinguishing spam from non-spam on the comments we’ve received so far.
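
To make a few of these ideas concrete, here is a minimal Python sketch of how moderation, IP whitelisting, per-poster views, and a naive Bayesian filter might fit together. It is not what runs on this blog, and it has nothing to do with Movable Type's own code: the names `NaiveBayes`, `moderate`, and `visible_to`, the "spam"/"ham" labels, and the 0.5 threshold are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word-like tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label must be "spam" or "ham" (hypothetical label names).
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed log prior for the class.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # One extra slot in the denominator covers unseen words.
            denom = total_words + len(vocab) + 1
            for word in tokens:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def moderate(comment_text, ip, approved_ips, classifier, threshold=0.5):
    """Return 'published' or 'needs_approval'; never reject outright."""
    # IP whitelisting: hold posts from addresses with no approved post yet.
    if ip not in approved_ips:
        return "needs_approval"
    # Bayesian filtering: hold comments that look like spam.
    if classifier.spam_probability(comment_text) >= threshold:
        return "needs_approval"
    return "published"


def visible_to(status, comment_ip, viewer_ip):
    """Different views: a held comment is still shown to the IP that posted it."""
    return status == "published" or comment_ip == viewer_ip
```

A real deployment would train the classifier on the comments already approved or junked by hand, and would add the poster's address to `approved_ips` whenever a held comment is approved.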