Effective CPU denial of service attack using hash functions

ℹ️ This article was written many years ago. Its technical content may be outdated.

As discussed in a previous post on thwarting attackers who gain control of an unencrypted database, hashing is a useful technique which can be used to obfuscate passwords and other data that needs to be verified without storing the data in plaintext. However, as with any situation in which the source of the data is unknown, hashing algorithms on the server can be exploited. A vulnerability was recently given a lot of publicity where data sent using a normal HTTP request could tie up the server for several minutes to several hours.

A hash table is a common data structure used to store POST data sent to a web server. For example, whenever you comment on a post on a website, your name, email and comment are likely stored in a hash table as they are processed. The server uses a hashing algorithm to map keys to values, using the hash as essentially an array index, the algorithm being specified by the implementation language. Alexander Klink and Julian Wälde, security researchers, were able to show that, if they could generate many strings that hashed to the same value, the hash table would degenerate to the point where 99% of CPU time was being used for the duration of the execution of the script.

The researchers describe the vulnerability in their paper.

If the language does not provide a randomized hash function or the application server does not recognize attacks using multi-collisions, an attacker can degenerate the hash table by sending lots of colliding keys. The algorithmic complexity of inserting n elements into the table then goes to O(n**2), making it possible to exhaust hours of CPU time using a single HTTP request.
The maximal POST request size is typically limited to 8 MB, which when filled with a set of multi-collisions would consume about four hours of CPU time on an i7 core. Luckily, this time can not be exhausted because it is limited by the max_input_time (default configuration: -1, unlimited), Ubuntu and several BSDs: 60 seconds) configuration parameter. If the max_input_time parameter is set to -1 (theoretically: unlimited), it is bound by the max_execution_time configuration parameter (default value: 30). On an i7 core, the 60 seconds take a string of multi-collisions of about 500k. 30 seconds of CPU time can be generated using a string of about 300k. This means that an attacker needs about 70-100kbit/s to keep one i7 core constantly busy. An attacker with a Gigabit connection can keep about 10.000 i7 cores busy.

By implementing a randomized hash function, as described above, the attacker cannot possibly force hash collisions on the scale required to tie up the web server. PHP uses the DJBX33 hash function, which has the property that if two strings collide under this function, then any other strings containing these as substrings collide too (this weakness is known as the “equivalent substring” property). This gives an attacker an effective means to compute hash collisions where otherwise a brute force attack would be the most effective route. The fact that most hash table algorithms are not cryptographically strong means that it can be trivial to reverse-engineer the algorithm to generate collisions.

This vulnerability isn’t new: it was first published back in 2003, but only recently gained substantial attention by technology media, meaning that a fix had to be fast-tracked. And PHP was not the only scripting language affected: ASP.NET, Java, Python and Ruby, among others, are also subject to this vulnerability (although not all because of the “equivalent substring” weakness in their hashing algorithms). Microsoft have also released an emergency out-of-cycle patch addressing this vulnerability in their .NET framework.