It seems every good thing can be used for evil. Email and Web sites are no exception. Illegitimate,
automated software programs called spam robots or spambots are the tools used by those responsible for
the email harvesting phenomenon that poses a significant threat to today's Web site owners, Webmasters and
email users.
Spambots employ several modern, legitimate Web technologies and programming language features to perform an
illegitimate service. Spambots can use technology similar to the spidering techniques used by search engines
to traverse and index Web pages throughout the World Wide Web. This technology allows spambots to move from
one Web site to another continually harvesting all the email addresses encountered.
Spambots also utilize the pattern matching features of modern programming languages. Pattern
matching allows spambots to identify email addresses within other Web content. Spambots process the text of
a Web site in search of matches to the desired pattern: the common patterns of email addresses. Matches
are captured and stored, that is, harvested. With a little imagination, you can see just how easily an innocent
Web site can be converted into a spamming database.
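To make the pattern matching step concrete, here is a minimal Python sketch of how an address might be pulled out of page source. The pattern, the function name harvest_addresses, and the sample page are illustrative assumptions; real harvesters presumably use broader patterns and crawl many pages.

```python
import re

# A deliberately simple email pattern (an assumption for illustration only).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_addresses(page_source: str) -> list:
    """Return the unique email-like strings in page_source, in first-seen order."""
    seen = {}
    for match in EMAIL_PATTERN.finditer(page_source):
        # Lowercase so the same address in different casing is stored once.
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


# Hypothetical page: one address in a MAILTO link, one in visible text, one repeat.
sample_page = (
    '<p>Contact <a href="mailto:owner@example.com">the owner</a> '
    "or write to Sales@Example.com. Repeat: owner@example.com</p>"
)
print(harvest_addresses(sample_page))
# prints ['owner@example.com', 'sales@example.com']
```

Note that the address is found whether it appears in the visible text or only inside the href attribute, which is why hiding it from the rendered page alone does not help.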
The Problem — Why is it a Phenomenon?
Email harvesting has become a technological epidemic. Modern advances in the availability and ease of
Web publishing allow essentially anyone to create and publish content on the World Wide Web. Free
services such as MySpace, Facebook, and Blogger have made Web publishing convenient, but convenience
combined with user inexperience has come at a price to all our email boxes.
At the heart of the problem is the use of basic email links (MAILTO - RFC 822, 1738, 2368) on Web pages.
Basic email links expose the email addresses used on the visible page and in the underlying source code.
Whether on the page or in the source, spambots can easily process and capture email addresses in basic
email links.
Here is an example of a basic (MAILTO) email link:
Click this basic email link to email me at please_spam_me@azaleatech.com.
Here is the underlying HTML/XHTML code:
Click this <a href="mailto:please_spam_me@azaleatech.com">basic email link</a> to email me at <a href="mailto:please_spam_me@azaleatech.com">please_spam_me@azaleatech.com</a>.
Notice the email address in the basic link and in the underlying code above. Spambots notice
too. Herein lies the dilemma. It's necessary to allow simple, quick email communication via Web sites, but
how can it be done without revealing the email addresses involved in the process?
Prevention — What Can I Do About It?
There are many techniques for preventing email harvesting. Perform a Google search on “prevent
email harvesting” and note the number of results returned. Most of the techniques center around
modifying email addresses in such a way as to make them undetectable to spambots whether on the page
or in the source code. Other techniques such as honeypots try to discover the activities of
spammers and corrupt spamming databases with invalid email addresses.
Azalea Technology recommends obscuring or encrypting email addresses in connection with the use of
online forms and server-side scripts to process the forms. This effectively defeats nearly all pattern
matching techniques. You can create your own algorithm for obscuring email addresses or use one of the
standard encryption algorithms such as AES, Triple DES, or Blowfish along with a salt value (AES is the
usual choice today; Triple DES and Blowfish are older ciphers). A random salt, often called an
initialization vector in this setting, makes the cipher text differ each time the same address is
encrypted and helps prevent dictionary-style attacks. Many languages such as
PHP have built-in support for these encryption algorithms, so you don't have to implement them yourself.
Here is a snippet of XHTML code for a link that uses an encrypted
email address:
Click this <a href="my_form.php?email=5d485d6966ef25c1663087cd44d8436debc2b66e0f539a7c7781562b2b4bc7df">secure link</a> to email me.
The secure link above points to a PHP server-side script that reads the email value from the GET
request and includes it among the other form fields that will be submitted on a subsequent POST request.
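The hand-off from link to form can be sketched as follows. This is written in Python rather than PHP purely for illustration; the function name hidden_field_from_link and the shortened token are assumptions, and only the field name send_to_address and the email parameter come from the snippets in this article.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def hidden_field_from_link(url: str) -> str:
    """Build the hidden send_to_address input from the link's email parameter."""
    params = parse_qs(urlsplit(url).query)
    token = params.get("email", [""])[0]
    # Accept only lowercase hex so nothing unexpected is echoed into the page.
    if not token or any(ch not in "0123456789abcdef" for ch in token):
        raise ValueError("missing or malformed email token")
    return '<input type="hidden" name="send_to_address" value="%s" />' % escape(
        token, quote=True
    )


# Shortened placeholder token, not a real encrypted address.
print(hidden_field_from_link("my_form.php?email=5d485d69"))
# prints <input type="hidden" name="send_to_address" value="5d485d69" />
```

Checking the token's character set before echoing it is a design choice in this sketch: the value comes from the URL, so it should be treated as untrusted input.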
Here is a snippet of XHTML code for a form that uses an encrypted
email address:
<form action="my_form_processor.php" method="post" name="formdata" id="formdata">
<input type="hidden" name="send_to_address" value="5d485d6966ef25c1663087cd44d8436debc2b66e0f539a7c7781562b2b4bc7df" />
.
.
.
</form>
The value of the hidden form field send_to_address is an encrypted email address that
the server-side script uses for the recipient of the form information. A server-side script encrypts the email
address before it ever leaves the server, so the real email address is never visible in the source. When the
form is submitted, a server-side script decrypts the email address so that it can be utilized.
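The encrypt-on-the-way-out, decrypt-on-submit round trip might look like the sketch below. It is not AES: Python's standard library has no AES, so this stands in for the "create your own algorithm" option using a salted XOR keystream derived with HMAC-SHA256, plus an integrity tag so altered tokens are rejected. SECRET_KEY, obscure_email, and reveal_email are hypothetical names, and a vetted AES library would be the safer choice on a real site.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-long-random-server-secret"
SALT_LEN = 16
TAG_LEN = 32  # size of a SHA-256 digest


def _keystream(salt: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the secret key and the salt."""
    out = b""
    counter = 0
    while len(out) < length:
        block = b"enc" + salt + counter.to_bytes(4, "big")
        out += hmac.new(SECRET_KEY, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def obscure_email(address: str) -> str:
    """Return a hex token hiding `address`; a fresh salt makes every token differ."""
    plain = address.encode("utf-8")
    salt = secrets.token_bytes(SALT_LEN)
    body = bytes(p ^ k for p, k in zip(plain, _keystream(salt, len(plain))))
    tag = hmac.new(SECRET_KEY, b"mac" + salt + body, hashlib.sha256).digest()
    return (salt + body + tag).hex()


def reveal_email(token: str) -> str:
    """Recover the address from a token, rejecting malformed or altered values."""
    try:
        raw = bytes.fromhex(token)
    except ValueError:
        raise ValueError("token is not valid hex") from None
    if len(raw) < SALT_LEN + TAG_LEN:
        raise ValueError("token is too short")
    salt, body, tag = raw[:SALT_LEN], raw[SALT_LEN:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(SECRET_KEY, b"mac" + salt + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("token failed its integrity check")
    plain = bytes(c ^ k for c, k in zip(body, _keystream(salt, len(body))))
    return plain.decode("utf-8")


token = obscure_email("owner@example.com")  # value for the hidden form field
print(token)
print(reveal_email(token))  # what the form processor would recover
```

The integrity check matters because the hidden field travels through the visitor's browser: without it, anyone could submit a modified value and have the form processor act on it.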
If you are not a Web developer, and programming and encryption aren't your cup of tea, find a person or
company that can help you secure the email addresses found on your Web site. You will never rid yourself
of your spam problem until you mitigate the email harvesting problem taking place on your Web site.
Basic email links also present an important usability concern. A basic email link relies on the
visitor's Web browser and operating system to act on the link. In most cases,
the default email program is launched, and a new email message is created with the email address listed
as the message recipient. The usability issues arise when the default email program is not a part of
a Web site visitor's normal email experience. For example, a Gmail or Hotmail user typically reads and sends
email in a Web browser instead of the default email program on a Microsoft Windows PC (e.g. Microsoft
Outlook Express). Being dropped into an unfamiliar email program can be frustrating and annoying for many
Web site visitors, especially Web mail users.