September 18, 2009

The cHashes just got longer

Category: Core

By: Francois Suter

In TYPO3 4.3 cHashes will use full-length md5 hashes instead of TYPO3's famous "short md5" hashes. Why and what impact does it have?

TYPO3 is famous for using so-called "short md5" hashes, which are nothing but a cropped version of an md5 hash. The number of characters to preserve can be chosen when calling t3lib_div::shortMD5(), the default being 10. This is used in particular for calculating cHashes (if you're not familiar with cHashes, please refer to Kasper's excellent article "The mysteries of &cHash", old but not dated).
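To illustrate what "cropped" means, here is a minimal sketch in plain PHP. The helper sketchShortMd5() is a hypothetical stand-in written for this example, not the actual t3lib_div::shortMD5() code; it only assumes that the short hash is the leading part of the full one.

```php
<?php
// Hypothetical stand-in for illustration; not TYPO3's own implementation.
function sketchShortMd5(string $input, int $length = 10): string
{
    // md5() returns 32 hexadecimal characters; keep only the first $length.
    return substr(md5($input), 0, $length);
}

$full  = md5('&foo=bar&baz=tog');
$short = sketchShortMd5('&foo=bar&baz=tog');

echo $full, "\n";   // 32 hexadecimal characters
echo $short, "\n";  // the first 10 of those characters
```

Every character dropped divides the number of possible hash values by 16, which is where the higher collision risk comes from.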

One issue with any kind of hash is that it's not unique. The longer the hash, the lower the probability that two given strings will produce the same hash. Conversely, cutting down a hash raises that probability. This is what was happening with short md5 hashes. Imagine a large TYPO3 web site with many elements, like a lot of news items (from tt_news) and a lot of FE user groups. This will trigger the generation of a large number of cHashes. Every now and then a hash is generated that is a duplicate of another hash. What happens then? TYPO3 is confused and serves the wrong cache entry (for example, the wrong news item).
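For a rough sense of scale, the following sketch applies the standard birthday-bound approximation to 10-character and 32-character hexadecimal hashes. It assumes hash values are uniformly random and ignores how TYPO3 scopes its cache entries, so it shows the order of magnitude rather than predicting real collision rates. The function name collisionProbability() is made up for this example.

```php
<?php
// Approximate probability of at least one collision among $n random hashes
// of $hexChars hexadecimal characters each (birthday bound).
function collisionProbability(float $n, int $hexChars): float
{
    $space = pow(16.0, $hexChars);                  // number of distinct hash values
    $exponent = -$n * ($n - 1.0) / (2.0 * $space);
    // -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -expm1($exponent);
}

foreach ([10000, 100000, 1000000] as $n) {
    printf(
        "%8d hashes: short %.2e, full %.2e\n",
        $n,
        collisionProbability($n, 10),
        collisionProbability($n, 32)
    );
}
```

Under these assumptions, a million distinct 10-character hashes contain at least one duplicate with a probability of roughly one in three, while for 32 characters the figure is on the order of 10^-27.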

Switching to full-length md5 hashes reduces this risk. It is not entirely gone, but the sheer number of possible combinations makes it very small. To reduce the risk even further, longer hashes could be used. However, any increase in the length of cHashes also has an impact on performance. Longer hashes mean more demand on the database when executing queries where cHashes are used. There was thus a balance to find between the length of the hashes and the risk of duplicates. Another minor drawback is that longer cHashes also look worse in the URL (compactness was an advantage of shorter cHashes).

Anyway the calculation of cHashes is now fully encapsulated, so it will be easier to change in the future, should such a need arise. Calculating a cHash is now as easy as:

$cHash = t3lib_div::generateCHash('&foo=bar&baz=tog');

where the argument is a query string.
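To make the idea of an encapsulated calculation more concrete, here is a standalone sketch of what such a function might do with a query string. It is not TYPO3's actual code: the name sketchCHash(), the 'secret' entry and the normalization steps are all assumptions chosen for illustration.

```php
<?php
// Hypothetical illustration only; not the t3lib_div::generateCHash() source.
function sketchCHash(string $queryString, string $secret): string
{
    // Parse '&foo=bar&baz=tog' into an associative array.
    parse_str(ltrim($queryString, '&?'), $params);
    // A cHash should not depend on a cHash already present in the URL.
    unset($params['cHash']);
    // Sort by key so that parameter order does not change the result.
    ksort($params);
    // Assumed here: mix in a site-specific secret.
    $params['secret'] = $secret;
    // Full-length (32 hexadecimal character) md5 of the normalized parameters.
    return md5(serialize($params));
}

$a = sketchCHash('&foo=bar&baz=tog', 'example-secret');
$b = sketchCHash('&baz=tog&foo=bar', 'example-secret');

echo strlen($a), "\n";   // 32
var_dump($a === $b);     // bool(true): same parameters, different order
```

Because callers only see the function, the hash length or algorithm inside it can change without touching the calling code, which is the benefit the encapsulation is meant to provide.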

So what to expect of this change? It has an impact on some extensions that use or rely on cHashes. Up to now we have identified two important extensions: RealURL (realurl) and the Site Crawler (crawler):

  • for RealURL you must use the latest development version, which can be grabbed from the SVN repository.
  • for the Site Crawler, the change is not yet integrated, but will be soon. It affects crawler modules that rely on the crawler providing cHashes, like indexed_search for example.

Rest assured that fully working, public versions of these extensions will be available by the time the final version of TYPO3 4.3 is released.


comment #1
Robert, September 18, 2009 14:38
Why would longer cHashes increase the strain on the database?

From my understanding, unique cHashes cause an index cardinality of n (n being the number of cHashes present throughout one particular table), which in an ideal hash model would equal the number of hashed items. Hash collisions are the only thing that would decrease cardinality, I think. And as collisions are a bad thing, index size is just a collateral effect of database design. Agree?

comment #2
Tolleiv Nietsch, September 18, 2009 15:07
Thx for the information.

For everyone who's interested in the crawler, that's the related issue:
and that's the relevant SVN location (as soon as this is committed):

I think we'll have a TYPO3 4.3 service release soon :)

comment #3
Michiel Roos, September 29, 2009 15:53
I thought md5 had been cracked looong ago?

comment #4
Asakurayoh, October 5, 2009 20:56
Michiel Roos:
So what if it has been cracked? It's just a way to create a unique id for the cache, not to secure TYPO3 ;)
(there is Salted-Password to do that)
