November 5, 2004
We won’t be posting for several days so we can complete our paper about Nutch for submission to WWW 2005…
October 6, 2004
blog comment spam
As some readers may have noticed, we’ve been having a problem with blog comment spam lately — just like most other Movable Type blogs. We’ve had several hundred comments, of which only seven have been non-spam. So I’ve been thinking about countermeasures.
- Soft touch: the more detectable and forceful the anti-spam measures are, the more they will interfere with legitimate users and be circumvented by spammers. Some thoughts:
- Moderate rather than ban: when spam is suspected, mark the comment as “needing approval” rather than rejecting it.
- Different views of the world: when a comment needs approval, show it to its posting IP address as if nothing were wrong. This makes it substantially more work for the spammer to detect and circumvent the countermeasures.
- Temporary failures: when a spam-appearing comment is posted, hold the connection open for a few seconds, then drop it without sending an HTTP reply. Ordinary people will retry; spamware may or may not.
- IP whitelisting: posts from an IP address that has never had an approved post should be held for moderation.
- Bayesian filtering: a simple naive Bayesian classifier would do a great job of distinguishing the spam we’ve received so far from nonspam.
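
To make a few of these ideas concrete, here is a rough Python sketch of how moderation, per-IP views, IP whitelisting and a naive Bayesian filter might fit together. It is not Movable Type code: the names `NaiveBayes`, `CommentQueue`, `post`, `approve` and `visible_to` are all made up for illustration, the tokenizer and add-one smoothing are assumptions, and the "temporary failures" idea is left out because it belongs in the web server rather than in this logic.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    @staticmethod
    def tokens(text):
        # Assumption: lowercase alphanumeric words are good enough features.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def is_spam(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            total = sum(self.words[label].values())
            # Smoothed log prior, then smoothed log likelihood of each token.
            score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
            for tok in self.tokens(text):
                score += math.log((self.words[label][tok] + 1) / (total + vocab + 1))
            scores[label] = score
        return scores["spam"] > scores["ham"]


class CommentQueue:
    """Hypothetical moderation queue: hold suspect comments instead of rejecting them."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.approved_ips = set()  # IPs with at least one approved comment
        self.comments = []         # each comment: {"ip", "text", "status"}

    def post(self, ip, text):
        # Moderate rather than ban: a suspect comment is stored as "pending".
        suspect = self.classifier.is_spam(text) or ip not in self.approved_ips
        comment = {"ip": ip, "text": text,
                   "status": "pending" if suspect else "approved"}
        self.comments.append(comment)
        return comment

    def approve(self, comment):
        comment["status"] = "approved"
        self.approved_ips.add(comment["ip"])

    def visible_to(self, viewer_ip):
        # Different views of the world: posters always see their own comments.
        return [c["text"] for c in self.comments
                if c["status"] == "approved" or c["ip"] == viewer_ip]
```

With this arrangement a spammer who reloads the page from the posting address sees the comment as if it had gone through, while every other visitor sees nothing until a moderator calls `approve`. Keying the view on IP address is the simplest possible choice here; it would misbehave for readers behind shared proxies.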