Preventing the Problem: Spambots & Email Harvesting

It seems every good thing can be used for evil. Email and Web sites are no exception. Illegitimate, automated software programs called spam robots or spambots are the tools used by those responsible for the email harvesting phenomenon that poses a significant threat to today's Web site owners, Webmasters and email users.

Spambots employ several modern, legitimate Web technologies and programming language features to perform an illegitimate service. Spambots can use technology similar to the spidering techniques that search engines use to traverse and index pages throughout the World Wide Web. This technology allows spambots to move from one Web site to another, continually harvesting every email address they encounter.

Spambots also use the pattern matching features of modern programming languages. Pattern matching allows spambots to identify email addresses within other Web content. Spambots process the text of a Web site in search of matches to the desired pattern, namely the common patterns of email addresses. Matches are captured and stored, that is, harvested. With a little imagination, you can see just how easily an innocent Web site can be converted into a spamming database.
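To make the pattern matching idea concrete, here is a minimal Python sketch of how text can be scanned for address-like strings. The pattern is a deliberately simplified assumption, and the function name find_addresses is hypothetical; it is not taken from any real spambot.

```python
import re

# Simplified, assumed pattern for address-like strings; real tools use many variants.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_addresses(page_source: str) -> list[str]:
    """Return the unique address-like strings found in text or markup, in order."""
    found = []
    for match in EMAIL_PATTERN.findall(page_source):
        if match not in found:
            found.append(match)
    return found

html = '<a href="mailto:please_spam_me@azaleatech.com">please_spam_me@azaleatech.com</a>'
print(find_addresses(html))  # ['please_spam_me@azaleatech.com']
```

The same few lines work on visible text and on source code alike, which is why hiding an address in only one of the two places offers little protection.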

The Problem — Why is it a Phenomenon?

Email harvesting has become a technological epidemic. Modern advances in the availability and ease of Web publishing allow essentially anyone to create and publish content on the World Wide Web. Free services such as MySpace, Facebook, and Blogger have made Web publishing convenient, but convenience combined with user inexperience has come at a price to all our email boxes.

At the heart of the problem is the use of basic email links (MAILTO - RFC 822, 1738, 2368) on Web pages. Basic email links expose the email addresses used on the visible page and in the underlying source code. Whether on the page or in the source, spambots can easily process and capture email addresses in basic email links.

Here is an example of a basic (MAILTO) email link:

Click this basic email link to email me at please_spam_me@azaleatech.com.

Here is the underlying HTML/XHTML code:

Click this <a href="mailto:please_spam_me@azaleatech.com">basic email link</a> to email me at <a href="mailto:please_spam_me@azaleatech.com">please_spam_me@azaleatech.com</a>.

Notice the email address in the basic link and in the underlying code above. Spambots notice too. Herein lies the dilemma. It's necessary to allow simple, quick email communication via Web sites, but how can it be done without revealing the email addresses involved in the process?
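A site owner can check their own pages for this kind of exposure. The following Python sketch uses the standard library's HTML parser to list every address that appears in a MAILTO link. The class name MailtoAudit and the function name exposed_addresses are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

class MailtoAudit(HTMLParser):
    """Collect the addresses exposed through mailto: links in a page."""

    def __init__(self):
        super().__init__()
        self.exposed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any "?subject=..." query part.
                address = value[len("mailto:"):].split("?", 1)[0]
                self.exposed.append(address)

def exposed_addresses(page_source):
    """Return every mailto: address in the page, one entry per link."""
    audit = MailtoAudit()
    audit.feed(page_source)
    return audit.exposed

page = ('Click this <a href="mailto:please_spam_me@azaleatech.com">basic email link</a> '
        'to email me at <a href="mailto:please_spam_me@azaleatech.com">'
        'please_spam_me@azaleatech.com</a>.')
print(exposed_addresses(page))
```

Run against the example above, the audit reports the address once for each of the two links. It only inspects link targets, so an address written as plain text on the page would need a separate check.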

Prevention — What Can I Do About It?

There are many techniques for preventing email harvesting. Perform a Google search on “prevent email harvesting” and note the number of results returned. Most of the techniques center on modifying email addresses in such a way as to make them undetectable to spambots, whether on the page or in the source code. Other techniques, such as honeypots, try to discover the activities of spammers and corrupt spamming databases with invalid email addresses.

Azalea Technology recommends obscuring or encrypting email addresses in connection with the use of online forms and server-side scripts to process the forms. This defeats nearly all pattern matching techniques. You can create your own algorithm for obscuring email addresses or use one of the standard encryption algorithms such as AES, Triple DES, or Blowfish along with a salt value. A salt value helps to randomize the ciphertext and prevent dictionary-style attacks. Many languages, such as PHP, have built-in support for these encryption algorithms, so you don't have to implement them yourself.
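As an illustration of the "create your own algorithm" option, here is a minimal Python sketch that obscures an address with a random salt and a server-side secret, using only the standard library. It is an assumption-laden teaching example, not Azalea Technology's implementation and not a vetted cipher; the names SECRET_KEY, obscure, and reveal are hypothetical. For production use, a standard algorithm such as AES from a maintained library is the safer choice.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it must never appear in any page source.
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"
ENC_KEY = hmac.new(SECRET_KEY, b"enc", hashlib.sha256).digest()
MAC_KEY = hmac.new(SECRET_KEY, b"mac", hashlib.sha256).digest()

def _keystream(salt: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the secret key and the salt."""
    stream = b""
    counter = 0
    while len(stream) < length:
        block = salt + counter.to_bytes(4, "big")
        stream += hmac.new(ENC_KEY, block, hashlib.sha256).digest()
        counter += 1
    return stream[:length]

def obscure(address: str) -> str:
    """Return a hex token that hides the address; a fresh salt is used each call."""
    salt = secrets.token_bytes(16)
    plain = address.encode("utf-8")
    cipher = bytes(p ^ k for p, k in zip(plain, _keystream(salt, len(plain))))
    tag = hmac.new(MAC_KEY, salt + cipher, hashlib.sha256).digest()[:16]
    return (salt + tag + cipher).hex()

def reveal(token: str) -> str:
    """Recover the address on the server; reject tokens that were altered."""
    raw = bytes.fromhex(token)  # raises ValueError on non-hex input
    salt, tag, cipher = raw[:16], raw[16:32], raw[32:]
    expected = hmac.new(MAC_KEY, salt + cipher, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("token was altered or was not issued by this server")
    plain = bytes(c ^ k for c, k in zip(cipher, _keystream(salt, len(cipher))))
    return plain.decode("utf-8")

token = obscure("please_spam_me@azaleatech.com")
print(token)          # hex only, different on every run
print(reveal(token))  # please_spam_me@azaleatech.com
```

Because the salt is random, the same address produces a different token each time, so a spambot cannot build a lookup table of known tokens. The integrity tag lets the server refuse tokens it did not create, which keeps the form from being used to send mail to arbitrary recipients.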

Here is a snippet of XHTML code for a link that uses an encrypted email address:

Click this <a href="my_form.php?email=5d485d6966ef25c1663087cd44d8436debc2b66e0f539a7c7781562b2b4bc7df">secure link</a> to email me.

The secure link above points to a PHP server-side script that reads the email value from the GET request and includes it among the form fields that will be submitted in a subsequent POST request.
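The hand-off from the GET request to the form can be sketched as follows. This is a Python illustration of the idea rather than the PHP script itself; the function name render_form and the token format check are assumptions, while the field and file names are copied from the snippets in this article.

```python
import html
import re
from urllib.parse import urlparse, parse_qs

# Assumed token format: lowercase hexadecimal only, as in the example link.
TOKEN_PATTERN = re.compile(r"[0-9a-f]{32,512}")

def render_form(request_url: str) -> str:
    """Copy the encrypted email value from the GET request into a hidden field."""
    query = parse_qs(urlparse(request_url).query)
    token = query.get("email", [""])[0]
    if not TOKEN_PATTERN.fullmatch(token):
        # Refuse anything that is not a plain hex token, so that markup
        # cannot be injected into the page through the link.
        raise ValueError("missing or malformed email token")
    return (
        '<form action="my_form_processor.php" method="post" '
        'name="formdata" id="formdata">\n'
        '<input type="hidden" name="send_to_address" '
        f'value="{html.escape(token, quote=True)}" />\n'
        '</form>'
    )

url = ("my_form.php?email="
       "5d485d6966ef25c1663087cd44d8436debc2b66e0f539a7c7781562b2b4bc7df")
print(render_form(url))
```

Validating the value before echoing it matters because anyone can edit the link; the script should treat the GET parameter as untrusted input even though it normally carries a token the server produced.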

Here is a snippet of XHTML code for a form that uses an encrypted email address:

<form action="my_form_processor.php" method="post" name="formdata" id="formdata">
<input type="hidden" name="send_to_address" value="5d485d6966ef25c1663087cd44d8436debc2b66e0f539a7c7781562b2b4bc7df" />
.
.
.
</form>

The value of the hidden form field send_to_address is an encrypted email address that the server-side script uses as the recipient of the form information. A server-side script encrypts the email address before it ever leaves the server, so the real email address is never visible in the source. When the form is submitted, a server-side script decrypts the email address so that the message can be delivered.

If you are not a Web developer, and programming and encryption aren't your cup of tea, find a person or company that can help you secure the email addresses found on your Web site. You will never rid yourself of your spam problem until you mitigate the email harvesting problem taking place on your Web site.

Basic email links also present an important usability concern. They rely on the visitor's Web browser and operating system to act on the link. In most cases, the default email program is launched, and a new email message is created with the email address listed as the message recipient. The usability issues arise when the default email program is not a part of a Web site visitor's normal email experience. For example, a Gmail or Hotmail user typically reads and sends email in a Web browser instead of the default email program on a Microsoft Windows PC (e.g., Microsoft Outlook Express). Having the default email program launch unexpectedly can be frustrating and annoying for many Web site visitors, especially Web mail users.