davidlyness.com

Things I find interesting.

Posts in the Web Technology category (page 1 of 2)

For the past year or so the internet community has been attempting to migrate (or at least attempting to prepare to migrate) from IPv4 to IPv6. The main reason for the transition is to enable - out of necessity - a larger IP address space. While IPv4 allocates 32 bits for an individual address, giving \(2^{32}=4294967296\) unique IP addresses, IPv6 allocates 128 bits, allowing \(2^{128}=340282366920938463463374607431768211456\) different addresses to be uniquely referenced.

However, IPv6 affords us many other benefits aside from the increased address space. While none of these may necessitate a transition by themselves, they are still welcome additions to the protocol. The following information is paraphrased from the IPv6 Wikipedia article.

Multicasting

IPv6 allows a packet to be transmitted to multiple recipients with a single operation, rather than having to specify each recipient individually (and use up bandwidth in the process). By sending a packet to the ff02::1 multicast group a packet will be transmitted to all hosts on the link.

Stateless address autoconfiguration (SLAAC)

ICMP - the Internet Control Message Protocol - and DHCP - the Dynamic Host Configuration Protocol - are also getting v6 version bumps with the introduction of IPv6. ICMPv6 and DHCPv6 allow automatic configuration of hosts connected to an IPv6 network. ICMPv6 allows stateless configuration, meaning that the local router will automatically respond to a multicast broadcast advertising its configuration parameters. If this is unavailable (depending on environment and application) DHCPv6 can be used as a fallback in much the same way as DHCPv4.

Mandatory network-layer security

Perhaps the most important additional enhancement, IPv6 now handles IPsec traffic as standard. IPsec, like HTTPS, enforces encryption and authentication of packets, but is protocol-agnostic. Although engineered into IPv4, IPsec will be mandatory for IPv6 connections and will hugely increase the security of all internet use.

Simplified processing by routers

Multiple enhancements have been made to simplify the process of packet forwarding and make it more efficient.
  • Due to the additional features of IPv6, the packet headers are on the order of twice as large as IPv4 headers. However, the layout has been simplified with fields that are rarely used being moved to optional header extensions (see "Options extensibility" below). For example, the Selective Directed Broadcast Mode option, providing unreliable UDP delivery to a set of IP addresses - which can range in size from 6 to 38 bytes - is rarely used and can be safely moved to the optional header extensions section of the packet.
  • There is no fragmentation of packets between routers over an IPv6 network. To ensure the correct MTU is set, hosts can perform end-to-end MTU discovery, end-to-end packet fragmentation, or use the default minimum of 1280 bytes. This eases the load on the hosts to reassemble fragmented packets, but should also mean that numerous VPN solutions work much more reliably.
  • The somewhat-redundant Internet-layer checksum has been removed. Since both the link layer (e.g. Ethernet) and transport layer (e.g. TCP) typically perform checksum calculations, it is very unlikely that such a checksum would detect an error not already caught at a higher or lower layer. This also eases the burden on intermediate IPv6 routers, as when the packet is altered due to the "time to live" (TTL) being decremented, they do not have to recalculate the checksum.
  • The TTL field has been renamed to "Hop Limit", an all round more intuitive term for what it represents.


Mobility

Mobile IPv6, i.e. IPv6 on mobile devices, is the IPv6 implementation that allows users to move from one network to another while maintaining a permanent (or at least sticky) IP address. (Otherwise, when dynamically switching from a 2G to a 3G network on a mobile device, any partially-downloaded files would become unusable, and voice / video calls would likely glitch as the transition is made.) Mobile IPv6 avoids the problem of triangular routing that plagues mobile IPv4.

Options extensibility

As noted above, optional header options are implemented as additional extensions separate from the main IPv6 header, which has a fixed size of 40 bytes. This allows the protocol to be completely extensible in the future, allowing quality of service, security and mobility features to be added without redesigning the base protocol.

Jumbograms

The most fun-sounding addition has been saved for last. IPv4 packets have a fixed maximum size of 65535 bytes. By specifying the "Jumbo Payload Option" header, IPv6 nodes can handle packets larger than this limit, known as jumbograms. This can increase performance over high-MTU links, but is also fun to say.


As discussed in a previous post on thwarting attackers who gain control of an unencrypted database, hashing is a useful technique which can be used to obfuscate passwords and other data that needs to be verified without storing the data in plaintext. However, as with any situation in which the source of the data is unknown, hashing algorithms on the server can be exploited. A vulnerability was recently given a lot of publicity where data sent using a normal HTTP request could tie up the server for several minutes to several hours.

A hash table is a common data structure used to store POST data sent to a web server. For example, whenever you comment on a post on this website, your name, email and comment are stored in a hash table as they are processed (in this case by PHP). It uses a hashing algorithm to map keys to values, using the hash as essentially an array index, the algorithm being specified by the implementation language. Alexander Klink and Julian Wälde, security researchers, were able to show that, if they could generate many strings that hashed to the same value, the hash table would degenerate to the point where 99% of CPU time was being used for the duration of the execution of the script.

The researchers describe the vulnerability in their paper.

If the language does not provide a randomized hash function or the application server does not recognize attacks using multi-collisions, an attacker can degenerate the hash table by sending lots of colliding keys. The algorithmic complexity of inserting n elements into the table then goes to O(n**2), making it possible to exhaust hours of CPU time using a single HTTP request.

The maximal POST request size is typically limited to 8 MB, which when filled with a set of multi-collisions would consume about four hours of CPU time on an i7 core. Luckily, this time can not be exhausted because it is limited by the max_input_time (default configuration: -1, unlimited), Ubuntu and several BSDs: 60 seconds) configuration parameter. If the max_input_time parameter is set to -1 (theoretically: unlimited), it is bound by the max_execution_time configuration parameter (default value: 30). On an i7 core, the 60 seconds take a string of multi-collisions of about 500k. 30 seconds of CPU time can be generated using a string of about 300k. This means that an attacker needs about 70-100kbit/s to keep one i7 core constantly busy. An attacker with a Gigabit connection can keep about 10.000 i7 cores busy.


By implementing a randomized hash function, as described above, the attacker cannot possibly force hash collisions on the scale required to tie up the web server. PHP uses the DJBX33 hash function, which has the property that if two strings collide under this function, then any other strings containing these as substrings collide too (this weakness is known as the "equivalent substring" property). This gives an attacker an effective means to compute hash collisions where otherwise a brute force attack would be the most effective route. The fact that most hash table algorithms are not cryptographically strong means that it can be trivial to reverse-engineer the algorithm to generate collisions.

This vulnerability isn't new: it was first published back in 2003, but only recently gained substantial attention by technology media, meaning that a fix had to be fast-tracked. And PHP was not the only scripting language affected: ASP.NET, Java, Python and Ruby, among others, are also subject to this vulnerability (although not all because of the "equivalent substring" weakness in their hashing algorithms). Microsoft have also released an emergency out-of-cycle patch addressing this vulnerability in their .NET framework.


On any site that uses passwords for authentication, be it Google, Facebook or this one, the site needs to have some way of telling whether the password given by the user matches the correct password. This alone is a simple concept - but at the same time we need to make sure that we protect users' passwords if the database containing usernames, passwords, etc is compromised. Many people use the same password for different web services, so if an obscure website's database was hacked and the emails and passwords of their users were compromised, it's likely that more than a few used the same email and password combination for their PayPal account, Google account, online banking accounts... you get the idea.

Let's first consider the example where we simply store users' passwords in plaintext (that is, if a user's password is "abc123" we store "abc123" in the database). Consider the following set of users.

UserPasswordPassword data stored in database
User1abcabc
User2abcabc
User3abdabd
User4defdef
Obviously anyone who gains access to this database will have all the login details for all users immediately. The site owners always need to be able to tell whether a user knows the correct password, so all we can do is obscure the process of calculating a user's password from their database data.

(Aside: encrypting the database with a secret key could be seen as a solution; however, the majority of web attacks are done through the likes of SQL injection vulnerabilities, which execute at a level at which the database can be read in a decrypted form.)

A good first step is to apply a public, well-tested hash function to the passwords in the database, and store the password hashes rather than the passwords themselves. A hash function is a one-way function; that is, is it easy to perform one way (i.e. turning the password into the hash), but hard or impossible to do the reverse (i.e. turning the hash into the password). Since hashing algorithms map to a fixed-length string, there are guaranteed to be hash collisions and so it is often impossible to know the original text given the hash. For this example I am going to use the SHA-1 algorithm, which is conveniently built into many web programming languages including PHP. Our database would now look as follows.

UserPasswordPassword data stored in database
User1abca9993e364706816aba3e25717850c26c9cd0d89d
User2abca9993e364706816aba3e25717850c26c9cd0d89d
User3abdcb4cc28df0fdbe0ecf9d9662e294b118092a5735
User4def589c22335a381f122d129225f5c0ba3056ed5811
To the human eye this would probably be good enough. Also, note that although User2 and User3 have very similar (but not identical) passwords, their hashes are completely different. (As a rule of thumb, changing one bit of data in the plaintext should flip approximately 50% of the bits of the resulting hash.) However, there are still a couple of caveats:
  • Anyone looking at the database would know that User1 and User2 have the same password
  • Databases using popular hashing algorithms are still vulnerable to rainbow tables. These are huge tables holding the result of hashing every possible string, often up to a certain number of characters. For example, the rainbow table available here for SHA1 hashes of lowercase letters, numbers and spaces up to 9 characters is already 52GB! Anyone with a hash could run it through such a rainbow table and come up with possible sources for that hash.
Rainbow tables are only useful when someone has gone to the bother of generating them, which means that obscure or self-coded hashing algorithms will likely not have a rainbow table against which they can be looked up. However, such algorithms are not likely to be mathematically rigorous enough to be considered a strong hash function, which is bad news if you're a big target like Google.

A way to protect against rainbow table attacks is to ensure that the number of characters in the plaintexts are not included in the rainbow table's domain. For example, in the example above, the 52GB rainbow table only worked when the passwords were 9 characters or less - so anything above 9 characters would be invulnerable to this current rainbow table. And since it's extremely low overhead to hash something as short as a string of text, we might as well add, say, 10 characters to each password. (In practice you would want to do more than this, but I want to fit everything on the page!) A common string of characters that we concatenate with every password is called a salt. Let's take our salt as "abcdefghij". Then, for example, the first entry in the database will be the result of applying the SHA-1 function to "abcabcdefghij".

UserPasswordSaltPassword data stored in database
User1abcabcdefghij0143aa54b192a3db577861c54d60486920634b28
User2abcabcdefghij0143aa54b192a3db577861c54d60486920634b28
User3abdabcdefghij03c313d4f74ab4d53a7312f42ca1287938169aed
User4defabcdefghij5d4b46aa07beb42a8057f1913e70e8d9e53bdbc7
Note that User1 and User2 still have the same stored password. While we assume that the salt will also be compromised if the database is, all existing rainbow tables will be worthless. However, if the target was high-value, a new rainbow table could be constructed to work explicitly for this database. Since the overhead of each SHA-1 operation is very low, we might as well slow down any would-be crackers by performing it recursively many times, say 4096 (this is the number of rounds of hashes performed by the WPA algorithm - as an interesting aside, it takes the SSID of the access point as its salt). This will represent no slowdown to us, but to an attacker it will slow down the creation of the rainbow table 4096-fold. We now have the following data.

UserPasswordSaltPassword data stored in database
User1abcabcdefghij1cf0c7f82fa8dbbb3e68b96130d33b8defaf1791
User2abcabcdefghij1cf0c7f82fa8dbbb3e68b96130d33b8defaf1791
User3abdabcdefghij63480fa579f5a067db0254771ba13bdc6534503b
User4defabcdefghij9a19325b82d0a8d6a3748c71d6e4d767e4a3fb14
We still have the problem that User1 and User2 can be identified as having the same password. So, as well as having the database-wide salt to obscure our data from rainbow tables, we add a pepper to obscure each entry from each other. (In practice, an attacker would have to create a separate rainbow table for every entry they wanted to crack!) With the peppers listed below, we now have the following final data.

UserPasswordSaltPepperPassword data stored in database
User1abcabcdefghijqwertyuiope27cb9a5f3973a5d173f6d5da95565ca668f3fea
User2abcabcdefghijasdfghjklpb0c8d626936e30a7b7561be5ad4384e43454ab04
User3abdabcdefghijzxcvbnmkop7ed2c4d59f006c5d321f653cb1a98ec8cf8e6a8d
User4defabcdefghij1q2w3e4r5t265f5ac148f1242c8af4ee2e70dcbe077519a3f7
And now we finally have our solution! The same plaintexts now hash to different values, as their individual peppers are different.

There are some additional measures we could take to increase the security of the system as a whole.
  • Hash the user's password client-side using JavaScript or similar so that the server never "sees" the original password. This will prevent a passive man in the middle attack. (While we're at it, we may as well salt the hash.)
  • Use a session-encryption technology like https. Such a technique is required to prevent active MITM attacks, as additional layers of JavaScript will only provide obscurity rather than security if an attacker can change a page's or script's content in transit to the user.
If the above wall of text proved difficult to follow, the process is pictured below.

password hashing algorithm

The site owner would take the data provided by the user, run it through this process, and compare the result with the data stored in the database. If they match, the password the user originally entered must have been correct. All this can be done by the owner with absolutely minimal computational overhead. Are you paying attention, Sony?