May 17, 2004

Reversing the Spam Cannon

by Nick Montfort · 2:07 pm

Conventional methods for combating spam on blogs – for instance, obfuscating links, which decreases the PageRank and usefulness of blogs, and using the censorship methods known as blacklists – are a disservice to public communication, albeit often in ways that seem minor at first. If these methods are used exclusively, they will eventually ruin the Internet as a public space and a public conversation.

Instead, we should encourage technical and legal measures that actively counterattack spammers and others who assail blogs. Spambots (here I mean the sort of programs that communicate over IRC to coordinate the defacement and destruction of blogs) attempt to turn channels of public communication and conversation against themselves. Spambots should themselves be sabotaged so that they perform useful tasks: at the very least, notifying end users and network administrators that their computers have been compromised, and perhaps also carrying out DDOS (distributed denial of service) attacks on rogue, spamming machines. Spammers should also be publicly identified and then ostracized, bankrupted, and in some cases physically incarcerated. But powerful technical methods could be available to us, too, and it’s worthwhile to spur on their development.

The problem with comment spam is not that blogs link to things or that they allow commenters to communicate without constraint; the problem is that spammers abuse blogs as a channel of communication and try to destroy the blog as a popular forum, rendering the Internet a wasteland for speech. The appropriate response is not to cripple blogs, but to target abusers, along with the abuse and attacks they visit on our new communication systems and conversational spaces.

A classic, useful definition of email spam is “UBE,” unsolicited bulk email. By this definition, spam does not need to be commercial; something that is noncommercial or even meaningless as communication (e.g., a flood of empty or nonsense messages) counts. However, a single unsolicited message sent by a person, even if it is commercial in nature, is not spam, so people can initiate conversations with each other by email without being cast as spammers. Any message sent in bulk to recipients who have not consented to receive it, on the other hand, does count as UBE. Although the definition does not say so explicitly, varying a message by swapping one nonsense word for another should not keep us from treating a batch of such emails as UBE. The definition is not perfect, since a flood of nonsense email sent to a single person is not “bulk” in the sense of reaching many recipients, but it is a start, and perhaps that sort of attack is best characterized differently anyway.
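One common way to make “swapping nonsense words shouldn’t matter” precise is to compare messages by their overlapping word sequences (shingles) and measure the Jaccard similarity of those sets. The Python sketch below is an illustration of that general technique, not a method proposed in this post; the shingle size `k=2`, the `0.5` threshold, and the sample messages are all assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping runs of k words) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole message as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def same_batch(msg1: str, msg2: str, threshold: float = 0.5, k: int = 2) -> bool:
    """Hypothetical rule: treat two messages as one batch if their
    shingle sets have Jaccard similarity of at least `threshold`."""
    return jaccard(shingles(msg1, k), shingles(msg2, k)) >= threshold


# Two copies of one message that differ only in a swapped nonsense word.
m1 = "cheap pills shipped overnight to your door order today and save big zorblax"
m2 = "cheap pills shipped overnight to your door order today and save big quimbly"
print(round(jaccard(shingles(m1), shingles(m2)), 3))  # 0.846 (11 shared of 13 shingles)
print(same_batch(m1, m2))  # True
print(same_batch(m1, "thanks for the thoughtful post about interactive fiction"))  # False
```

Swapping one word only disturbs the few shingles that contain it, so the two copies still look like members of the same batch, while an unrelated message shares nothing with them.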

We might similarly adopt the term “UBC” (unsolicited bulk comments) for comments, whether entered automatically or manually, that are posted to multiple blogs without any relevance to the posts on those blogs or in violation of those blogs’ comment policies. Actually, spam on blogs is probably better defined in terms of USENET spam. The term “spam” in this sense seems to have originated on USENET, along with the standard that messages count as the same if they are “substantively identical.” Blogs do not generally encourage cross-posting, which is a legitimate activity on USENET, so identifying spam is even easier on blogs. We should also distinguish a single bulk comment posted across blogs, however inappropriate, from a flood attack, whose purpose is to destroy systems and, more broadly, to discourage the very existence of blogs as channels of communication, so that the Internet eventually becomes an enormous direct-mail campaign that hosts no communication between individuals.
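As a rough illustration of how the UBC definition might be checked mechanically, the Python sketch below approximates “substantively identical” with a crude normalization (lowercase, drop punctuation, collapse whitespace) and flags any comment text that shows up on several distinct blogs. The function names, the `min_blogs=3` cutoff, and the `.example` blog names are hypothetical; a real system would need a much better notion of sameness.

```python
import re
from collections import defaultdict


def normalize(comment: str) -> str:
    """Crude stand-in for 'substantively identical': lowercase,
    keep only alphanumeric words, join them with single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", comment.lower()))


def find_ubc(comments: list[tuple[str, str]], min_blogs: int = 3) -> set[str]:
    """Given (blog, comment_text) pairs, return the normalized comment
    texts that appear on at least `min_blogs` distinct blogs."""
    blogs_by_text: dict[str, set[str]] = defaultdict(set)
    for blog, text in comments:
        blogs_by_text[normalize(text)].add(blog)
    return {text for text, blogs in blogs_by_text.items() if len(blogs) >= min_blogs}


sample = [
    ("blog-a.example", "Great post! Visit my site."),
    ("blog-b.example", "great post   visit my site"),
    ("blog-c.example", "GREAT POST, visit my site!!"),
    ("blog-a.example", "I disagree about the ending of the story."),
]
print(find_ubc(sample))  # {'great post visit my site'}
```

Counting distinct blogs rather than total copies keeps the bulk-comment case separate from a flood: many copies pasted into one blog count only once here, so a flood would need its own per-blog check.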

The conventional response is typified by two changes that have been made to many blogs, including Grand Text Auto: [*]

  1. Obfuscate links so that comments no longer link directly to Web sites and spammers are “denied PageRank.” Of course, the numerous creative sites, authored by individuals, that are linked to in legitimate comments are also denied PageRank, which is exactly what spammers want: they would prefer that their link farms and paid advertisements be the exclusive way that PageRank is assigned. Better still, from the spammers’ point of view, this mechanism cripples blogs, since it prevents users from mousing over a link and reading the URL. When blogs work worse, pay sites look better in comparison.
  2. Implement censorship mechanisms known as blacklists. As we know from AOL’s prohibition of the word “breast” in chat rooms and bulletin board posts (quite a problem for those who wanted to discuss breast cancer and breastfeeding), these blanket methods stifle legitimate conversation, making it particularly difficult to discuss uncomfortable subjects. Will a legitimate URL that has the words “product” and “rape” in it, such as this one, make it through filters meant to block flood attacks of commercial, sexually deviant URLs?
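To make the false-positive problem concrete, here is a small Python sketch of two hypothetical keyword blacklist matchers. It is not the filter used on Grand Text Auto or any particular blog package; the blacklist entries and the `example.org` URLs are invented for illustration.

```python
import re

# Hypothetical blacklist entries, for illustration only.
BLACKLIST = ("rape", "viagra", "casino")


def naive_blocked(url: str, blacklist: tuple[str, ...] = BLACKLIST) -> list[str]:
    """Blacklist terms appearing anywhere in the URL (case-insensitive substring match)."""
    lowered = url.lower()
    return [term for term in blacklist if term in lowered]


def word_blocked(url: str, blacklist: tuple[str, ...] = BLACKLIST) -> list[str]:
    """Blacklist terms appearing as whole tokens after splitting the URL on non-alphanumerics."""
    tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
    return [term for term in blacklist if term in tokens]


for url in ("http://example.org/grapefruit-recipes",
            "http://example.org/rape-crisis-support"):
    print(url, naive_blocked(url), word_blocked(url))
```

Substring matching blocks a recipe page because “grapefruit” contains “rape.” Whole-word matching avoids that mistake but still blocks the support-site URL, which illustrates the point above: a keyword rule alone cannot tell a serious discussion of a subject from spam that mentions it.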

These sorts of measures also simply pass the comment spam problem along to bloggers on the margins: people who may have little or no access to technical support, and who are exactly the sort of people who may benefit most from having blogs as a channel for discussing serious issues that are hard to talk about in other “real world” forums.

On the other hand, legal and technical counterattacks on spammers and blog assailants benefit the whole blogging community and do nothing to restrict legitimate conversations on blogs.

Already, some webmasters have designed systems that feed spammers’ email addresses back to their own email harvesters, or that otherwise fight back against harvesters so that the spammers’ automatic bulk-emailing software sends email to the spammers’ own servers or to the “abuse” addresses of their ISPs. Email harvesters are often called “spambots,” but in this document I use “spambot” to refer to a program used to coordinate the spamming of, or attacks on, blogs, often by communicating with numerous compromised computers via IRC to evade IP-banning schemes. These networks of compromised computers are enabled by Trojan horse programs such as IRC/Fyle.

Why not extend these anti-email-harvester tactics to comment-posting spambots that operate on IRC channels? Instead of just kicking such spambots off IRC channels (the typical response), the legitimate operators of the public channel could sabotage the spammer’s system so that spam and attacks are redirected to the spammers.

Please notice the essential difference between this proposed tactic – in which the administrator of an IRC channel takes measures to prevent the channel itself from being turned against legitimate Internet users and tries to divert an already-organized attack by compromised computers – and the tactics of the bounty-hunter, anti-art organization RIAA, which has sought legal sanction for clearly criminal incursions on the communications systems of private users, attacks that resemble spammer flood assaults much more than they do the countermeasures I am proposing here. Again, I’m not proposing to initiate DDOS attacks against spammers, but simply to divert their own attacks on blogs so that spammers and attackers, rather than bloggers, suffer.

Perhaps the most serious criticism of this invective of mine is this: Why just mention the idea, rather than implement it? I have been known to do some computer programming at times, but I certainly couldn’t put together such a system in half a day, so I felt that I could make a better immediate contribution at the level of concept and rhetoric than at the level of implementation. I’d be glad to work with others to make it happen, though.

[*] Both of these changes were made with my consent or urging, I should add, as they seemed to be the only tenable, immediate options for us to recover Grand Text Auto as a space for conversation. These can’t be the only defenses that bloggers use, however, and measures that are restrictive to legitimate users should be reversed when more suitable spam-dismantling techniques become available.