davidlyness.com

Things I find interesting.


As discussed in a previous post on thwarting attackers who gain control of an unencrypted database, hashing is a useful technique which can be used to obfuscate passwords and other data that needs to be verified without storing the data in plaintext. However, as with any situation in which the source of the data is unknown, hashing algorithms on the server can be exploited. A vulnerability was recently given a lot of publicity where data sent using a normal HTTP request could tie up the server for several minutes to several hours.

A hash table is a common data structure used to store POST data sent to a web server. For example, whenever you comment on a post on this website, your name, email and comment are stored in a hash table as they are processed (in this case by PHP). It uses a hashing algorithm to map keys to values, using the hash as essentially an array index, the algorithm being specified by the implementation language. Alexander Klink and Julian Wälde, security researchers, were able to show that, if they could generate many strings that hashed to the same value, the hash table would degenerate to the point where 99% of CPU time was being used for the duration of the execution of the script.

The researchers describe the vulnerability in their paper.

If the language does not provide a randomized hash function or the application server does not recognize attacks using multi-collisions, an attacker can degenerate the hash table by sending lots of colliding keys. The algorithmic complexity of inserting n elements into the table then goes to O(n**2), making it possible to exhaust hours of CPU time using a single HTTP request.

The maximal POST request size is typically limited to 8 MB, which when filled with a set of multi-collisions would consume about four hours of CPU time on an i7 core. Luckily, this time can not be exhausted because it is limited by the max_input_time (default configuration: -1, unlimited), Ubuntu and several BSDs: 60 seconds) configuration parameter. If the max_input_time parameter is set to -1 (theoretically: unlimited), it is bound by the max_execution_time configuration parameter (default value: 30). On an i7 core, the 60 seconds take a string of multi-collisions of about 500k. 30 seconds of CPU time can be generated using a string of about 300k. This means that an attacker needs about 70-100kbit/s to keep one i7 core constantly busy. An attacker with a Gigabit connection can keep about 10.000 i7 cores busy.


By implementing a randomized hash function, as described above, the attacker cannot possibly force hash collisions on the scale required to tie up the web server. PHP uses the DJBX33 hash function, which has the property that if two strings collide under this function, then any other strings containing these as substrings collide too (this weakness is known as the "equivalent substring" property). This gives an attacker an effective means to compute hash collisions where otherwise a brute force attack would be the most effective route. The fact that most hash table algorithms are not cryptographically strong means that it can be trivial to reverse-engineer the algorithm to generate collisions.

This vulnerability isn't new: it was first published back in 2003, but only recently gained substantial attention by technology media, meaning that a fix had to be fast-tracked. And PHP was not the only scripting language affected: ASP.NET, Java, Python and Ruby, among others, are also subject to this vulnerability (although not all because of the "equivalent substring" weakness in their hashing algorithms). Microsoft have also released an emergency out-of-cycle patch addressing this vulnerability in their .NET framework.


In computing, a quine is a computer program which outputs a copy of its own source code. (See the Wikipedia page on quines for more information and examples.) I recently discovered a different sort of quine: mathematical ones.

Consider the function \[{1\over 2} < \left\lfloor \mathrm{mod}\left(\left\lfloor {y \over 17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y\rfloor, 17)},2\right)\right\rfloor\] where
  • \(\lfloor x \rfloor\) is the floor function, i.e. the largest integer less than \(x\);
  • \(\bmod(b,m)\) is the modulo operation, which, when graphed over \(0\leq x\leq 106\) and \(n\leq y\leq n+17\) with n = 4 858 450 636 189 713 423 582 095 962 494 202 044 581 400 587 983 244 549 483 093 085 061 934 704 708 809 928 450 644 769 865 524 364 849 997 247 024 915 119 110 411 605 739 177 407 856 919 754 326 571 855 442 057 210 445 735 883 681 829 823 754 139 634 338 225 199 452 191 651 284 348 332 905 131 193 199 953 502 413 758 765 239 264 874 613 394 906 870 130 562 295 813 219 481 113 685 339 535 565 290 850 023 875 092 856 892 694 555 974 281 546 386 510 730 049 106 723 058 933 586 052 544 096 664 351 265 349 363 643 957 125 565 695 936 815 184 334 857 605 266 940 161 251 266 951 421 550 539 554 519 153 785 457 525 756 590 740 540 157 929 001 765 967 965 480 064 427 829 131 488 548 259 914 721 248 506 352 686 630 476 300.


This is known as Tupper's Formula (created by Jeff Tupper in 2001). It produces the following "graph".

quine picture

If you look closely, this graph is a bitmap which has been chosen out of all possible bitmaps of this size by the value of \(n\) specified above, making it not truly self-referential. A real (and complex) self-referential mathematical formula was created by Jakub Trávník earlier this year.


Like all other data, computers store colour data as a string of digits. Colour values are stored as strings of digits, with the hash symbol (#) at the front indicating we are working in hexadecimal. For example, #000000 is black, #ffffff is white, and #c0c0c0 is the grey of the codeblock below - check out the W3Schools page on CSS Colors for more details. On a recent project I came across the problem of generating a random string of hex digits to be interpreted as a colour.

Our first thought is that we're probably going to want to use the in-built JavaScript pseudo random number generator. While not cryptographically strong, it's good enough for this case. The general syntax is:

Math.random()


This will return a random decimal value between 0 and 1 exclusive, for example 0.40566325443796813. For reasons we'll see later, we would like to generate an integer within a range rather than a decimal. This is accomplished by multiplying and using the Math.Floor function. For example, if we wanted a random number between 0 and 10:

Math.floor(Math.random()*11)


Note that since the raw generated numbers are contained in (0,1) rather than [0,1], Math.Floor will never return 10 when the result is multiplied by 10. Therefore 11 must be used instead.

All hex colour codes are between #000000 and #ffffff. Remember that these are hexadecimal numbers, so they correspond to the range from 0 to 16777215. Therefore, we need to add one to reach the full range of colours, making the multiplier 16777216. We can also use JavaScript's ToString method to convert our decimal number back to hexadecimal (using a radix of 16), and put a '#' character on the front to comply with formatting rules. Our code so far looks as follows:

'#' + Math.floor(Math.random()*16777216).toString(16)


Generating 10 numbers, we get the following set of promising results:

#218ac5
#9e0a5c
#d733b
#5f4ea
#a9d39e
#5f9d7d
#26156f
#37f078
#e4948d
#9e754b


There is still one remaining problem, which can be seen in the 3rd and 4th entries of the above list. They have only 5 hex digits whereas the format calls for 6. The problem here is that these numbers should have leading zeros if they are less than 6 digits long. After scratching my head for a bit I came up with a neat trick using the substr() method with a negative index (with the caveat that it doesn't work in older versions of IE):

'#' + ('00000' + Math.floor(Math.random()*16777216).toString(16)).substr(-6)


Now we get the following set of results:

#00a24b
#686cc1
#7eb56f
#2aba6b
#13ab28
#b72d45
#eba0b2
#9dca3b
#f0a4ed
#63b0c1


Notice that the 1st entry has been padded with two zeros, so all generated numbers are 6 digits long. And we're done!