First, a word to all you TYPO3 veterans: if you are familiar with how to obscure e-mail addresses with TypoScript, you may want to skip down to the section "There is, however, a better way," where I introduce the twist and the extra JavaScript needed to pull this off.
Spam bots are continually crawling your website trying to harvest your e-mail address so they can send you boatloads of junk mail. It is always a challenge to hide addresses from these bots while keeping the site easy for visitors to use and for editors to edit.
I will first present the features TYPO3 already has to address the problem. After that I will introduce a twist that I came up with through experimentation, which I believe hides addresses from bots better while maintaining usability. The e-mail address that I will be using in the examples is myname@mydomain.com. Without any attempt at obscuring, this linked address looks to the front-end user like an ordinary link reading myname@mydomain.com,
and looks like this to spam bots in the source code:
<a href="mailto:myname@mydomain.com" class="mail">myname@mydomain.com</a>
Obviously, the address is entirely exposed to the dreaded spam bots.
There is a very simple way to obscure linked e-mail addresses in your TYPO3 site. Add code like this to your TYPO3 template.
config {
spamProtectEmailAddresses = -3
spamProtectEmailAddresses_atSubst = [at]
spamProtectEmailAddresses_lastDotSubst = [dot]
}
After putting this code in the template and clearing the cache, our address now reads myname[at]mydomain[dot]com to our site visitors,
and looks like this in the source code:
<a href="javascript:linkTo_UnCryptMailto('jxfiql7jvkxjbXjvaljxfk+zlj');"
class="mail">myname[at]mydomain[dot]com</a>
What does the TypoScript do?
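Setting spamProtectEmailAddresses to a number (allowed values run from -10 to 10) makes TYPO3 encrypt the "mailto" part of every e-mail link by shifting each character by that offset, and TYPO3 adds a small JavaScript decoder (linkTo_UnCryptMailto) to the page that reverses the shift when the link is clicked. The _atSubst and _lastDotSubst settings replace the "@" and the last "." in the visible link text. The address still functions as normal: users can click it, their e-mail client opens, and the address is injected into a new message. Of course, this approach depends on JavaScript being available in the user's browser, but that is pretty much universal these days. You can read more about these settings on page 57 of TSref.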
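To make the offset idea concrete, here is a minimal JavaScript sketch of a character-offset cipher modeled on what spamProtectEmailAddresses does. The helper names (shiftCharCode, shiftString, encrypt, decrypt) are hypothetical, and the three wrap-around character ranges reflect my understanding of TYPO3's implementation rather than its exact code. With an offset of -3 it reproduces the encoded string shown above.

```javascript
// Sketch of a character-offset cipher modeled on TYPO3's
// spamProtectEmailAddresses. All function names are hypothetical,
// not TYPO3's own API.

// Shift one character code by `offset`, wrapping around within [start, end].
function shiftCharCode(code, start, end, offset) {
  let n = code + offset;
  if (offset > 0 && n > end) {
    n = start + (n - end - 1);
  } else if (offset < 0 && n < start) {
    n = end - (start - n - 1);
  }
  return String.fromCharCode(n);
}

// Shift every character that falls into one of three ranges; leave the rest alone.
function shiftString(text, offset) {
  let result = "";
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    if (code >= 0x2b && code <= 0x3a) {
      result += shiftCharCode(code, 0x2b, 0x3a, offset); // + , - . / 0-9 :
    } else if (code >= 0x40 && code <= 0x5a) {
      result += shiftCharCode(code, 0x40, 0x5a, offset); // @ A-Z
    } else if (code >= 0x61 && code <= 0x7a) {
      result += shiftCharCode(code, 0x61, 0x7a, offset); // a-z
    } else {
      result += ch; // everything else passes through unchanged
    }
  }
  return result;
}

// Encrypting uses the configured value; decrypting applies the opposite offset.
function encrypt(mailto, offset) {
  return shiftString(mailto, offset);
}
function decrypt(encoded, offset) {
  return shiftString(encoded, -offset);
}

console.log(encrypt("mailto:myname@mydomain.com", -3));
// jxfiql7jvkxjbXjvaljxfk+zlj
```

Calling decrypt("jxfiql7jvkxjbXjvaljxfk+zlj", -3) goes the other way and yields mailto:myname@mydomain.com, which is essentially the job the decoder does in the browser when the link is clicked.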
So, what have we accomplished so far? We have done a good job of hiding the address contained in the "mailto" part of the source code. We have also made an attempt to hide the link text from bots by substituting "[at]" for "@" and "[dot]" for the last ".". However, this kind of text substitution has become so common that bots can easily be programmed to see through it.
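How easily? Below is a hypothetical harvesting step in JavaScript that rewrites common "[at]"/"(at)" and "[dot]"/"(dot)" variants back to "@" and "." before running an ordinary address pattern over the page source. The names normalizeObfuscation and harvestAddresses and the exact patterns are my own illustrative assumptions, not taken from any real bot.

```javascript
// Hypothetical harvester step: undo common bracketed "at"/"dot" substitutions.
// The patterns are illustrative assumptions, not taken from a real bot.
function normalizeObfuscation(text) {
  return text
    .replace(/\s*[\[\(\{]\s*at\s*[\]\)\}]\s*/gi, "@")
    .replace(/\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*/gi, ".");
}

// Run a plain e-mail pattern over the normalized text.
function harvestAddresses(html) {
  const normalized = normalizeObfuscation(html);
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  return normalized.match(pattern) ?? [];
}

// The visible part of the protected link from the example above.
const protectedLink = '<a class="mail">myname[at]mydomain[dot]com</a>';
console.log(harvestAddresses(protectedLink)); // [ 'myname@mydomain.com' ]
```

Two short regular expressions are enough to undo the substitution, which is why "[at]"/"[dot]" text on its own offers little protection.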
What have we sacrificed? Nothing for the editor, who still enters the address in the backend as myname@mydomain.com and links it. But for website visitors we have definitely sacrificed some usability, especially for those who are less web-savvy and may be confused by the odd-looking text.
There is, however, a better way
First, let me qualify a couple of things. I came up with this technique on my own, but it is quite possible that others have figured it out as well. Second, to all you JavaScript programmers: please withhold your laughter. I am sure the JavaScript can be much cleaner. Basically, I quickly looked things up in a JavaScript book, put them together and tested the result. The function names and arguments are written for clarity in this post, not for compactness of code.
This approach started with asking myself, "I wonder if TypoScript will let me do this?" and ended with "Isn't it cool that TypoScript allows this!"
The basic concept is to have TypoScript substitute JavaScript code, rather than text, for the "@" and the last "." in the e-mail address.
You need a couple of JavaScript functions available to your page. One way to do this is to add the following code to the page object in your site template (of course, the example assumes your page object is called "page" and that there is no existing "headerData.50"; adjust the names if needed):
page {
headerData.50 = TEXT
headerData.50.value (
<script type="text/javascript">
<!--
function obscureAddMid() {
document.write('@');
}
function obscureAddEnd() {
document.write('.');
}
// -->
</script>
)
}
Then use this code for your spam protection (each value must stay on the same line as its object path):
config {
spamProtectEmailAddresses = -2
spamProtectEmailAddresses_atSubst = <script type="text/javascript"> obscureAddMid() </script>
spamProtectEmailAddresses_lastDotSubst = <script type="text/javascript"> obscureAddEnd() </script>
}
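Here is a minimal JavaScript sketch of what these two settings do to the link text, modeled on TYPO3's behavior: each "@" becomes the atSubst value and only the last "." becomes the lastDotSubst value. buildLinkText is a hypothetical helper written for this illustration, not TYPO3 code.

```javascript
// Hypothetical helper modeled on TYPO3's link-text substitution:
// every "@" becomes atSubst, and only the LAST "." becomes lastDotSubst.
function buildLinkText(address, atSubst, lastDotSubst) {
  const replaceAt = (s) => s.split("@").join(atSubst);
  const dot = address.lastIndexOf(".");
  if (dot === -1) {
    return replaceAt(address); // no dot to substitute
  }
  return replaceAt(address.slice(0, dot)) + lastDotSubst + replaceAt(address.slice(dot + 1));
}

// The values from the TypoScript config above.
const AT_SUBST = '<script type="text/javascript"> obscureAddMid() </script>';
const DOT_SUBST = '<script type="text/javascript"> obscureAddEnd() </script>';

console.log(buildLinkText("myname@mydomain.com", "[at]", "[dot]"));
// myname[at]mydomain[dot]com
console.log(buildLinkText("myname@mydomain.com", AT_SUBST, DOT_SUBST));
```

With the script values, the "@" and the final "." never appear literally in the link text. Note that any earlier dot, as in my.name@mydomain.com, would still show up as a plain "." in the source.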
After clearing the cache, our example address looks to the front-end user like an ordinary link reading myname@mydomain.com,
and to the spam bots the source code looks like this:
<a href="javascript:linkTo_UnCryptMailto('kygjrm8kwlykcYkwbmkygl,amk');"
class="mail">myname<script type="text/javascript"> obscureAddMid() </script>mydomain<script type="text/javascript"> obscureAddEnd() </script>com</a>
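To check what a simple harvester would make of this markup, here is a JavaScript sketch that runs an ordinary address pattern over the link text in three ways: on the raw source, after stripping tags, and after removing script elements together with their contents. It then emulates a browser that runs the two functions. The helper names and the stripping strategies are illustrative assumptions, not a model of any particular bot.

```javascript
// Link text as produced with the JavaScript substitutions.
const linkText =
  'myname<script type="text/javascript"> obscureAddMid() </script>' +
  'mydomain<script type="text/javascript"> obscureAddEnd() </script>com';

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const findAddresses = (text) => text.match(EMAIL) ?? [];

// Strategy 1: scan the raw source.
const raw = findAddresses(linkText);
// Strategy 2: strip tags but keep what was inside them.
const tagsStripped = findAddresses(linkText.replace(/<[^>]*>/g, ""));
// Strategy 3: drop script elements including their contents, then strip tags.
const scriptsDropped = findAddresses(
  linkText.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "").replace(/<[^>]*>/g, "")
);

// A browser runs each script: obscureAddMid() writes "@", obscureAddEnd() writes ".".
const scriptOutput = { "obscureAddMid()": "@", "obscureAddEnd()": "." };
const rendered = linkText.replace(
  /<script\b[^>]*>([\s\S]*?)<\/script>/gi,
  (_, body) => scriptOutput[body.trim()] ?? ""
);

console.log(raw, tagsStripped, scriptsDropped); // [] [] []
console.log(rendered); // myname@mydomain.com
```

The last line is the caveat: anything that actually executes the page's JavaScript recovers the address, so the technique raises the bar for bots rather than shutting them out completely.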
What have we accomplished now?
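The "mailto" part of the link is still encrypted, and the link text no longer contains a literal "@" or final "." in the source, so a bot would have to execute the JavaScript to reassemble the address. Visitors see an ordinary address, and editors still enter it as myname@mydomain.com. You can see this approach in operation on the e-mail address on this page.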
Anyway, this very long blog post all came from asking myself, "I wonder if...."
This solution looks nice, but there are much easier ways to do it. If you agree that a spam bot will most likely not parse CSS definitions, you can simply do this:
config {
spamProtectEmailAddresses = 1
spamProtectEmailAddresses_atSubst = ping@
spamProtectEmailAddresses_lastDotSubst = pong.
}
The bot will either find the address my.nameping@myhostpong.com (if he just strips all HTML tags) or mynamemyhostcom (if he strips HTML tags including the contents).
Pros and cons:
+ Human accessibility (in this case you should use better placeholders, e.g. "REMOVE_THIS" instead of "ping" and "pong")
+ no additional JavaScript tricks required
- not 100% future-proof: bots might learn to parse CSS, but they might just as well learn to parse JavaScript or image placeholders
--
- michael