davidlyness [dot] com

Things I find interesting.


Like all other data, computers store colour data as a string of digits. Colour values are stored as strings of digits, with the hash symbol (#) at the front indicating we are working in hexadecimal. For example, #000000 is black, #ffffff is white, and #c0c0c0 is the grey of the codeblock below - check out the W3Schools page on CSS Colors for more details. On a recent project I came across the problem of generating a random string of hex digits to be interpreted as a colour.

Our first thought is that we're probably going to want to use the in-built JavaScript pseudo random number generator. While not cryptographically strong, it's good enough for this case. The general syntax is:

Math.random()


This will return a random decimal value between 0 and 1 exclusive, for example 0.40566325443796813. For reasons we'll see later, we would like to generate an integer within a range rather than a decimal. This is accomplished by multiplying and using the Math.Floor function. For example, if we wanted a random number between 0 and 10:

Math.floor(Math.random()*11)


Note that since the raw generated numbers are contained in (0,1) rather than [0,1], Math.Floor will never return 10 when the result is multiplied by 10. Therefore 11 must be used instead.

All hex colour codes are between #000000 and #ffffff. Remember that these are hexadecimal numbers, so they correspond to the range from 0 to 16777215. Therefore, we need to add one to reach the full range of colours, making the multiplier 16777216. We can also use JavaScript's ToString method to convert our decimal number back to hexadecimal (using a radix of 16), and put a '#' character on the front to comply with formatting rules. Our code so far looks as follows:

'#' + Math.floor(Math.random()*16777216).toString(16)


Generating 10 numbers, we get the following set of promising results:

#218ac5
#9e0a5c
#d733b
#5f4ea
#a9d39e
#5f9d7d
#26156f
#37f078
#e4948d
#9e754b


There is still one remaining problem, which can be seen in the 3rd and 4th entries of the above list. They have only 5 hex digits whereas the format calls for 6. The problem here is that these numbers should have leading zeros if they are less than 6 digits long. After scratching my head for a bit I came up with a neat trick using the substr() method with a negative index (with the caveat that it doesn't work in older versions of IE):

'#' + ('00000' + Math.floor(Math.random()*16777216).toString(16)).substr(-6)


Now we get the following set of results:

#00a24b
#686cc1
#7eb56f
#2aba6b
#13ab28
#b72d45
#eba0b2
#9dca3b
#f0a4ed
#63b0c1


Notice that the 1st entry has been padded with two zeros, so all generated numbers are 6 digits long. And we're done!


Steve Jobs 1955 - 2011

"Your time is limited, so don't waste it living someone else's life. Don't be trapped by dogma - which is living with the results of other people's thinking. Don't let the noise of other's opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary." -- Steve Jobs


On any site that uses passwords for authentication, be it Google, Facebook or this one, the site needs to have some way of telling whether the password given by the user matches the correct password. This alone is a simple concept - but at the same time we need to make sure that we protect users' passwords if the database containing usernames, passwords, etc is compromised. Many people use the same password for different web services, so if an obscure website's database was hacked and the emails and passwords of their users were compromised, it's likely that more than a few used the same email and password combination for their PayPal account, Google account, online banking accounts... you get the idea.

Let's first consider the example where we simply store users' passwords in plaintext (that is, if a user's password is "abc123" we store "abc123" in the database). Consider the following set of users.

UserPasswordPassword data stored in database
User1abcabc
User2abcabc
User3abdabd
User4defdef
Obviously anyone who gains access to this database will have all the login details for all users immediately. The site owners always need to be able to tell whether a user knows the correct password, so all we can do is obscure the process of calculating a user's password from their database data.

(Aside: encrypting the database with a secret key could be seen as a solution; however, the majority of web attacks are done through the likes of SQL injection vulnerabilities, which execute at a level at which the database can be read in a decrypted form.)

A good first step is to apply a public, well-tested hash function to the passwords in the database, and store the password hashes rather than the passwords themselves. A hash function is a one-way function; that is, is it easy to perform one way (i.e. turning the password into the hash), but hard or impossible to do the reverse (i.e. turning the hash into the password). Since hashing algorithms map to a fixed-length string, there are guaranteed to be hash collisions and so it is often impossible to know the original text given the hash. For this example I am going to use the SHA-1 algorithm, which is conveniently built into many web programming languages including PHP. Our database would now look as follows.

UserPasswordPassword data stored in database
User1abca9993e364706816aba3e25717850c26c9cd0d89d
User2abca9993e364706816aba3e25717850c26c9cd0d89d
User3abdcb4cc28df0fdbe0ecf9d9662e294b118092a5735
User4def589c22335a381f122d129225f5c0ba3056ed5811
To the human eye this would probably be good enough. Also, note that although User2 and User3 have very similar (but not identical) passwords, their hashes are completely different. (As a rule of thumb, changing one bit of data in the plaintext should flip approximately 50% of the bits of the resulting hash.) However, there are still a couple of caveats:
  • Anyone looking at the database would know that User1 and User2 have the same password
  • Databases using popular hashing algorithms are still vulnerable to rainbow tables. These are huge tables holding the result of hashing every possible string, often up to a certain number of characters. For example, the rainbow table available here for SHA1 hashes of lowercase letters, numbers and spaces up to 9 characters is already 52GB! Anyone with a hash could run it through such a rainbow table and come up with possible sources for that hash.
Rainbow tables are only useful when someone has gone to the bother of generating them, which means that obscure or self-coded hashing algorithms will likely not have a rainbow table against which they can be looked up. However, such algorithms are not likely to be mathematically rigorous enough to be considered a strong hash function, which is bad news if you're a big target like Google.

A way to protect against rainbow table attacks is to ensure that the number of characters in the plaintexts are not included in the rainbow table's domain. For example, in the example above, the 52GB rainbow table only worked when the passwords were 9 characters or less - so anything above 9 characters would be invulnerable to this current rainbow table. And since it's extremely low overhead to hash something as short as a string of text, we might as well add, say, 10 characters to each password. (In practice you would want to do more than this, but I want to fit everything on the page!) A common string of characters that we concatenate with every password is called a salt. Let's take our salt as "abcdefghij". Then, for example, the first entry in the database will be the result of applying the SHA-1 function to "abcabcdefghij".

UserPasswordSaltPassword data stored in database
User1abcabcdefghij0143aa54b192a3db577861c54d60486920634b28
User2abcabcdefghij0143aa54b192a3db577861c54d60486920634b28
User3abdabcdefghij03c313d4f74ab4d53a7312f42ca1287938169aed
User4defabcdefghij5d4b46aa07beb42a8057f1913e70e8d9e53bdbc7
Note that User1 and User2 still have the same stored password. While we assume that the salt will also be compromised if the database is, all existing rainbow tables will be worthless. However, if the target was high-value, a new rainbow table could be constructed to work explicitly for this database. Since the overhead of each SHA-1 operation is very low, we might as well slow down any would-be crackers by performing it recursively many times, say 4096 (this is the number of rounds of hashes performed by the WPA algorithm - as an interesting aside, it takes the SSID of the access point as its salt). This will represent no slowdown to us, but to an attacker it will slow down the creation of the rainbow table 4096-fold. We now have the following data.

UserPasswordSaltPassword data stored in database
User1abcabcdefghij1cf0c7f82fa8dbbb3e68b96130d33b8defaf1791
User2abcabcdefghij1cf0c7f82fa8dbbb3e68b96130d33b8defaf1791
User3abdabcdefghij63480fa579f5a067db0254771ba13bdc6534503b
User4defabcdefghij9a19325b82d0a8d6a3748c71d6e4d767e4a3fb14
We still have the problem that User1 and User2 can be identified as having the same password. So, as well as having the database-wide salt to obscure our data from rainbow tables, we add a pepper to obscure each entry from each other. (In practice, an attacker would have to create a separate rainbow table for every entry they wanted to crack!) With the peppers listed below, we now have the following final data.

UserPasswordSaltPepperPassword data stored in database
User1abcabcdefghijqwertyuiope27cb9a5f3973a5d173f6d5da95565ca668f3fea
User2abcabcdefghijasdfghjklpb0c8d626936e30a7b7561be5ad4384e43454ab04
User3abdabcdefghijzxcvbnmkop7ed2c4d59f006c5d321f653cb1a98ec8cf8e6a8d
User4defabcdefghij1q2w3e4r5t265f5ac148f1242c8af4ee2e70dcbe077519a3f7
And now we finally have our solution! The same plaintexts now hash to different values, as their individual peppers are different.

There are some additional measures we could take to increase the security of the system as a whole.
  • Hash the user's password client-side using JavaScript or similar so that the server never "sees" the original password. This will prevent a passive man in the middle attack. (While we're at it, we may as well salt the hash.)
  • Use a session-encryption technology like https. Such a technique is required to prevent active MITM attacks, as additional layers of JavaScript will only provide obscurity rather than security if an attacker can change a page's or script's content in transit to the user.
If the above wall of text proved difficult to follow, the process is pictured below.

password hashing algorithm

The site owner would take the data provided by the user, run it through this process, and compare the result with the data stored in the database. If they match, the password the user originally entered must have been correct. All this can be done by the owner with absolutely minimal computational overhead. Are you paying attention, Sony?