blog.Resource

March 30, 2008

Watch out when you use 404 errorhandling

Category: Søren Andersen, TYPO3

By: Søren Andersen

Some people like to show a nice 404 error page to their users, preferably with a dynamic sitemap, so the user can find his way back to the actual webpage. So did I, until I discovered something that added to my expenses each month...

Maybe you have seen the screen with "Page not found" or the RealURL "Segment was not bla bla bla", and thought that this wasn't a very nice way to tell the user that a page doesn't exist. Fortunately, TYPO3 allows you to direct the user to an error page of your choice. It could be a static file on your webserver, one of the pages in your page tree, or an external page on another domain. The possibilities are endless, but some of these solutions come with a price.

I host a lot of small sites on my webserver, all of them running TYPO3 from a shared source. Suddenly I discovered that one of my clients' sites had started to show an unusual amount of traffic. I know this client very well, so it puzzled me: there wasn't really anything on that site that could generate that amount of traffic, unless my client had suddenly had a dramatic rise in visitors, which seemed very doubtful.

I examined the statistics and, to my surprise, almost all of it stemmed from 404 error pages. On two consecutive days my client had received more than 20,000 requests for pages that did not exist. The pages had paths like /typo3temp/typo3temp/fileadmin/fileadmin/template/typo3temp and so on. Just these two days, and then it stopped. Is this a problem? Not if your error page has a small file size, but when the error page includes graphics, it's not that fun anymore.

So keep this in mind when you use a custom 404 page: if the page includes graphics, you could get a substantial amount of traffic from bad requests. When I checked other sites, many of them had similar entries, but those were better hidden because these sites already had a lot of traffic. I solved this problem by adding an alias to Apache:
Alias /errorpage.html "/usr/local/blabla/errorpage.html"
Now I can point all my clients' error handling at this lightweight error page.

But these requests caused another problem that the solution above didn't solve. Every request is logged by Apache, and because of the error handling by TYPO3, there are two log entries for every request. This meant that my client's Apache access log quickly grew past 100 MB, and that's not space you want to allocate to useless log entries.
I solved that problem using Apache's conditional logging. I added the following to my configuration:
SetEnvIf Remote_Addr "X\.Y\.Z\.K" dontlog
This one checks whether the request comes from my own server (that accounts for one of the two entries).
SetEnvIf Request_URI "^.*typo3temp.*(?:typo3temp|fileadmin).*$" dontlog
This is a regex that checks for invalid URIs like /typo3temp/something/typo3temp/ or /typo3temp/fileadmin/something/: basically any URI that contains typo3temp and, later in the URI, either fileadmin or typo3temp again. I did this because I found that many of the invalid URIs were of this type.

This sets an environment variable that I can test in every log definition on all my clients' sites. By adding env=!dontlog to every CustomLog entry in the virtual hosts, I avoid ever having logs filled with these 404 entries.
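
To make the pieces above easier to see together, here is a rough sketch of how they might sit in one configuration. It is only an illustration under assumptions: the server name, document root and log path are made-up placeholders, "combined" assumes a LogFormat with that nickname is already defined (as it is in many default Apache configurations), and X.Y.Z.K still stands for the web server's own IP address.

```apache
# Sketch only; host name, paths and the "combined" nickname are assumptions.

# Server-wide: flag requests that should not be logged.
# Replace X.Y.Z.K with the server's own IP address (dots escaped).
SetEnvIf Remote_Addr "X\.Y\.Z\.K" dontlog
# Flag URIs that contain typo3temp followed later by typo3temp or fileadmin.
SetEnvIf Request_URI "^.*typo3temp.*(?:typo3temp|fileadmin).*$" dontlog

<VirtualHost *:80>
    # Hypothetical client site
    ServerName client.example.com
    DocumentRoot "/var/www/client"

    # Lightweight static error page, served from outside the site's own tree
    Alias /errorpage.html "/usr/local/blabla/errorpage.html"

    # Write a log line only when the dontlog variable is NOT set
    CustomLog "/var/log/apache2/client-access.log" combined env=!dontlog
</VirtualHost>
```

Since SetEnvIf matches the pattern anywhere in the value, the leading "^.*" and trailing ".*$" are not strictly needed. I have kept the expression as I originally wrote it.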

This is what I did to solve this annoying problem. Do any of you have experience with this?


comments

comment #1
Jeroen Serpieters, April 1, 2008 18:58
Don't you think that you in fact cripple your websites by doing this?
If you provide a sitemap or a search function on your 404 pages, you do this for a reason: helping your visitor to find the information he needs.

Maybe it would be better to block only those requests that you actually *know* are erroneous and that only eat your expensive bandwidth.
This way your customer doesn't lose good functionality for his visitors, and he doesn't lose too much bandwidth on 20k requests that are definitely wrong.

comment #2
Nathan L, April 1, 2008 23:12
I have been experiencing the exact same thing. My log files and views to the custom 404 page have been growing like crazy. One day last week had over 85,000 views of the 404 page.

It seems to grow, starting with a request to "fileadmin/templates/some.js"; then, two seconds later, the same user requests
"fileadmin/templates/fileadmin/templates/some.js", and this repeats for 3 minutes!

So far I think it has something to do with their browser not using the base href tag correctly. When this mistake gives them a 404, the 404 page has the same problem, so it gives them another 404, etc...

This seems related to bug 1537 (bugs.typo3.org).

comment #3
Søren Andersen, April 2, 2008 16:50
Nathan, I also suspect the base href tag to be the culprit, as I have only seen this behaviour on pages using RealURL.

Jeroen, it's a tradeoff. But if there is one thing I don't want to spend my time on, it's blocking people whose browsers go ballistic on my traffic. The clients I have don't really need this fine 404 handling, but it would be a great idea to make an extension that provides a database table and records, per IP, how many times a 404 page was requested. When that count reaches some limit, it would stop serving the nice 404 and start using another document that's a lot lighter.

That would make it possible to have nice 404s without worrying about suddenly having to pay extra for useless traffic.
