Maybe you have seen the "Page not found" screen, or realURL's "Segment was not bla bla bla" message, and thought that this wasn't a very nice way to tell the user that a page doesn't exist. Fortunately, TYPO3 lets you direct the user to an error page of your choice: a static file on your web server, one of the pages in your page tree, or an external page on another domain. There are many possibilities, but some of them come at a price.
I host a lot of small sites on my web server, all of them running TYPO3 from a shared source. One day I noticed that one of my clients' sites was showing an unusual amount of traffic. I know this client very well, so this puzzled me: there wasn't really anything on that site that could generate that much traffic, and a sudden, dramatic rise in visitors seemed very unlikely.
I examined the statistics and, to my surprise, almost all of the traffic came from 404 error pages. On two consecutive days the site had received more than 20,000 requests for pages that did not exist, with paths like /typo3temp/typo3temp/fileadmin/fileadmin/template/typo3temp and so on. It lasted just those two days, and then it stopped. Is this a problem? Not if your error page is small, but when the error page includes graphics, every bogus request costs bandwidth, and it stops being fun.
So be aware of this when you use a custom 404 page: if the page includes graphics, bad requests alone can generate a substantial amount of traffic. When I checked my other sites, many of them had similar entries, but they were better hidden because those sites already had a lot of traffic. I solved the problem by adding an alias to Apache:
Alias /errorpage.html "/usr/local/blabla/errorpage.html"
Now I can point the error handling of all my clients' sites to this lightweight error page.
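A slightly fuller sketch of that alias, assuming it sits in the main server config (outside any virtual host) so that every site inherits it, and assuming Apache 2.4 access-control syntax. The directory is the placeholder path from above, not a real location.

```apache
# Main server config, outside any <VirtualHost>, so every site inherits it.
# "/usr/local/blabla" is the placeholder path from the text; use your own.
Alias /errorpage.html "/usr/local/blabla/errorpage.html"

# Apache 2.4 syntax. On Apache 2.2 the equivalent would be:
#   Order allow,deny
#   Allow from all
<Directory "/usr/local/blabla">
    Require all granted
</Directory>
```

On the TYPO3 side, the setting to point at /errorpage.html is the FE option pageNotFound_handling in $TYPO3_CONF_VARS; check the documentation of your TYPO3 version for the exact values it accepts.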
But these requests caused another problem that the alias did not solve. Apache logs every request, and because TYPO3's error handling requests the error page from the server itself, there were two log entries for every bogus request. As a result, my client's Apache access log quickly grew past 100 MB, and that's not space you want to spend on useless log entries.
I solved that problem with Apache's conditional logging, adding the following to my configuration:
SetEnvIf Remote_Addr "X\.Y\.Z\.K" dontlog
This line flags requests that come from my own server's IP address (that's one of the two entries).
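A sketch of that first rule with a concrete address filled in. 192.0.2.10 is a documentation-only IP standing in for the server's own address, and the ^ and $ anchors are an addition, not part of the original rule.

```apache
# 192.0.2.10 is a placeholder for the server's own IP address.
# Without ^ and $ the pattern would also match 192.0.2.100, 192.0.2.101, etc.
SetEnvIf Remote_Addr "^192\.0\.2\.10$" dontlog

# Only needed if TYPO3 reaches the error page via localhost instead.
SetEnvIf Remote_Addr "^127\.0\.0\.1$" dontlog
```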
SetEnvIf Request_URI "^.*typo3temp.*(?:typo3temp|fileadmin).*$" dontlog
This regex matches invalid URIs such as /typo3temp/something/typo3temp/ or /typo3temp/fileadmin/something/: basically any URI that contains typo3temp and, somewhere later, either fileadmin or typo3temp again. I chose this pattern because many of the invalid URIs I found were of this type.
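To make the behaviour of that regex concrete, here it is annotated with example paths. The "still logged" paths are hypothetical legitimate files made up for illustration. As far as I know, SetEnvIf searches for the pattern anywhere in the value, so the leading ^.* and trailing .*$ can be dropped without changing what matches.

```apache
# Equivalent shorter form of the rule above (use one or the other).
SetEnvIf Request_URI "typo3temp.*(?:typo3temp|fileadmin)" dontlog

# Flagged as dontlog (typo3temp, then later typo3temp or fileadmin):
#   /typo3temp/typo3temp/fileadmin/fileadmin/template/typo3temp
#   /typo3temp/something/typo3temp/
#   /typo3temp/fileadmin/something/
# Still logged (hypothetical legitimate paths):
#   /typo3temp/pics/image.gif        only one typo3temp, no fileadmin after it
#   /fileadmin/typo3temp/style.css   nothing matching after typo3temp
# Caveat: matching works on substrings, not path segments, so a file like
#   /typo3temp/pics/fileadmin_logo.gif   would be flagged as well.
```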
Both rules set the environment variable dontlog, which I can test in the log configuration of every client site. By adding env=!dontlog to every CustomLog directive in the virtual hosts, I keep these 404 entries out of the access logs.
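Put together, the setup might look roughly like this. The sketch assumes, as the text implies, that the SetEnvIf rules sit in the main server config and are inherited by each virtual host. The server name and file paths are hypothetical.

```apache
# Main server config: flag the unwanted requests once for all sites.
SetEnvIf Remote_Addr "X\.Y\.Z\.K" dontlog
SetEnvIf Request_URI "^.*typo3temp.*(?:typo3temp|fileadmin).*$" dontlog

# The standard "combined" format, in case it is not already defined.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# One client site (hypothetical name and paths).
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"
    # Requests carrying the dontlog variable are not written to this log.
    CustomLog "/var/log/apache2/example-access.log" combined env=!dontlog
</VirtualHost>
```

Note that env=!dontlog only affects the CustomLog it is attached to; the error log is not filtered this way.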
That is how I solved this annoying problem. Do any of you have experience with this?
If you provide a sitemap or a search function on your 404 page, you do so for a reason: to help your visitors find the information they need.
It might be better to block only those requests that you actually *know* are erroneous and do nothing but eat your expensive bandwidth.
That way your customer keeps a useful feature for his visitors without losing bandwidth on 20k requests that are definitely wrong.
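One possible way to do that, sketched with mod_rewrite under a few assumptions: the rule goes into the site's virtual host (so it runs before TYPO3's own rewrite rules in .htaccess), the site name is hypothetical, and no ErrorDocument routes Apache's 404 back into TYPO3. Only the known-bogus pattern gets a bare 404; every other missing page still reaches the full TYPO3 error page with its sitemap or search.

```apache
<VirtualHost *:80>
    # Hypothetical site; its other directives are omitted here.
    ServerName www.example.com
    RewriteEngine On
    # A non-3xx status in R= drops the substitution and stops rewriting,
    # so Apache answers with its own small 404 and TYPO3 is never invoked.
    RewriteRule "typo3temp.*(?:typo3temp|fileadmin)" - [R=404,L]
</VirtualHost>
```

If the rule has to live in .htaccess instead, it would need to come before TYPO3's rewrite rules there.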