davidlyness.com

Things I find interesting.

Posts in the Programming category

Google's PageRank algorithm assigns a number between 0 and 10 to web domains (such as facebook.com or davidlyness.com), and is one of the main factors in determining how high to place a website in the list of Google search results. The method by which a domain's PageRank is calculated is patented (and hence public), but a domain's PageRank is never actually shown in the search results themselves. It is also notoriously difficult to find a simple way to query Google's servers for this information.

When Google released the Google Toolbar back in 2000, it included a feature that displayed the PageRank of the website currently being viewed. With a bit of reverse-engineering, I developed the function below, which returns the PageRank of a given domain. Interestingly, the returned rank is always between 1 and 9, so sites like google.com that have a de facto PageRank of 10 appear as having a PageRank of 9.

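function getPageRank($domain) {
	$domainlen = strlen($domain);
	// Seed string that is mixed into the checksum.
	$seed = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer.";
	$seedlen = strlen($seed);
	$result = 0x01020345;
	for ($i = 0; $i < $domainlen; $i++) {
		$pos = $i % $seedlen;
		// XOR in one seed byte and one domain byte, then rotate the 32-bit value left by 9 bits.
		$result ^= ord($seed[$pos]) ^ ord($domain[$i]);
		$result = (($result >> 23) & 0x1ff) | $result << 9;
	}
	// On 64-bit PHP the shifts spill past bit 31, so keep only the low 32 bits.
	if (PHP_INT_SIZE > 4) {
		$result &= 0xffffffff;
	}
	$checksum = '8' . dechex($result);
	$url = sprintf("http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=%s&features=Rank&q=info:%s", $checksum, $domain);
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	$response = curl_exec($ch);
	curl_close($ch);
	if ($response === false) {
		return false; // the request itself failed
	}
	// The rank is the single character after the last colon in the response.
	$rank = substr(strrchr($response, ":"), 1, 1);
	return $rank;
}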
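
For anyone who wants to poke at the checksum step on its own, the loop can be sketched in JavaScript, where bitwise operators always work on 32-bit integers. This is only a sketch: the name toolbarChecksum is hypothetical, the domain is assumed to be plain ASCII, and no request is made.

```javascript
// Sketch of the checksum loop only; toolbarChecksum is a hypothetical name.
function toolbarChecksum(domain) {
  const seed = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer.";
  let result = 0x01020345;
  for (let i = 0; i < domain.length; i++) {
    // Mix in one seed character and one domain character (ASCII assumed).
    result ^= seed.charCodeAt(i % seed.length) ^ domain.charCodeAt(i);
    // Rotate the 32-bit value left by 9 bits.
    result = ((result >> 23) & 0x1ff) | (result << 9);
  }
  // >>> 0 reinterprets the signed result as unsigned before converting to hex.
  return '8' + (result >>> 0).toString(16);
}

console.log(toolbarChecksum('davidlyness.com'));
```

With an empty string the loop never runs, so the output is just '8' followed by the initial value in hex, which makes a handy sanity check.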


Disclaimer:
  • As can be seen from $seed, abuse of this script is in violation of Google's Terms of Service. Use at your own risk.
  • Google can change the initialised value of $result at any time - doing so will render this script unusable until this value is updated.


Like all other data, computers store colour data as numbers. On the web, a colour value is written as a string of six hexadecimal digits, with the hash symbol (#) at the front. For example, #000000 is black, #ffffff is white, and #c0c0c0 is the grey of the codeblock below - check out the W3Schools page on CSS Colors for more details. On a recent project I came across the problem of generating a random string of hex digits to be interpreted as a colour.

Our first thought is that we're probably going to want to use JavaScript's built-in pseudorandom number generator. While not cryptographically strong, it's good enough for this case. The general syntax is:

Math.random()


This will return a random decimal value from 0 (inclusive) up to but not including 1, for example 0.40566325443796813. For reasons we'll see later, we would like to generate an integer within a range rather than a decimal. This is accomplished by multiplying and then applying the Math.floor function. For example, if we wanted a random integer between 0 and 10 inclusive:

Math.floor(Math.random()*11)


Note that since the raw generated numbers lie in [0,1) rather than [0,1], the value 1 is never produced, so Math.floor would never return 10 if the result were multiplied by 10. Therefore 11 must be used instead.
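
The same reasoning generalises to any inclusive range. Here is a minimal sketch, assuming a helper called randomInt (a hypothetical name, not a built-in), which tallies some draws so you can see how often each value from 0 to 10 turns up:

```javascript
// Hypothetical helper: random integer in the inclusive range [min, max].
function randomInt(min, max) {
  // Math.random() never returns 1, so scale by the number of possible values.
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Tally 10,000 draws from 0..10 to see how often each value appears.
const counts = new Array(11).fill(0);
for (let i = 0; i < 10000; i++) {
  counts[randomInt(0, 10)]++;
}
console.log(counts);
```

Each of the eleven counts should come out roughly equal, including the one for 10.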

All hex colour codes are between #000000 and #ffffff. Remember that these are hexadecimal numbers, so they correspond to the decimal range from 0 to 16777215. As before, we need to add one to reach the full range of colours, making the multiplier 16777216. We can then use JavaScript's toString method to convert our decimal number back to hexadecimal (using a radix of 16), and put a '#' character on the front to comply with formatting rules. Our code so far looks as follows:

'#' + Math.floor(Math.random()*16777216).toString(16)
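
The numbers behind that multiplier can be checked directly in a console. This is just a quick sanity check using parseInt and toString with a radix of 16; the variable name maxColour is made up for the example.

```javascript
// The largest colour value, read as a base-16 number.
const maxColour = parseInt('ffffff', 16);
console.log(maxColour);                  // 16777215
console.log(maxColour + 1);              // 16777216, the multiplier
console.log(Math.pow(16, 6));            // 16777216, the count of six-digit hex values
// toString(16) converts a number back to its hex digits.
console.log(maxColour.toString(16));     // "ffffff"
console.log((0xc0c0c0).toString(16));    // "c0c0c0"
```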


Generating 10 numbers, we get the following set of promising results:

#218ac5
#9e0a5c
#d733b
#5f4ea
#a9d39e
#5f9d7d
#26156f
#37f078
#e4948d
#9e754b


There is still one remaining problem, which can be seen in the 3rd and 4th entries of the above list: they have only 5 hex digits whereas the format calls for 6. toString does not emit leading zeros, so any value below 0x100000 comes out short and needs padding. After scratching my head for a bit I came up with a neat trick using the substr() method with a negative index (with the caveat that it doesn't work in older versions of IE):

'#' + ('00000' + Math.floor(Math.random()*16777216).toString(16)).substr(-6)
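
To see the padding on its own, here it is applied to one of the short values from the first list. The sketch also shows two alternatives that were not part of the original trick: slice(-6), which takes a negative index in the same way, and padStart, which is available in ES2017 and later.

```javascript
// Pad a short hex string to six digits in three equivalent ways.
const shortValue = 0xd733b;                  // third value from the first list, without its '#'
const hex = shortValue.toString(16);         // "d733b"
console.log(('00000' + hex).substr(-6));     // "0d733b"
console.log(('00000' + hex).slice(-6));      // "0d733b"
console.log(hex.padStart(6, '0'));           // "0d733b"
```

Five zeros are enough because toString always produces at least one digit, even for 0.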


Now we get the following set of results:

#00a24b
#686cc1
#7eb56f
#2aba6b
#13ab28
#b72d45
#eba0b2
#9dca3b
#f0a4ed
#63b0c1


Notice that the 1st entry has been padded with two zeros, so all generated numbers are 6 digits long. And we're done!
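
If you want to reuse the one-liner, it can be wrapped in a function. This is a sketch rather than the only way to do it: the name randomHexColour and the optional rng parameter are assumptions added here so that the output can be checked with a fixed "random" value, and slice(-6) stands in for substr(-6).

```javascript
// Hypothetical wrapper; rng defaults to Math.random but can be swapped for testing.
function randomHexColour(rng = Math.random) {
  const value = Math.floor(rng() * 16777216); // integer in 0..16777215
  return '#' + ('00000' + value.toString(16)).slice(-6);
}

// Generate many colours and check each one against the #rrggbb format.
const pattern = /^#[0-9a-f]{6}$/;
let allValid = true;
for (let i = 0; i < 100000; i++) {
  if (!pattern.test(randomHexColour())) {
    allValid = false;
  }
}
console.log(allValid);                   // true
console.log(randomHexColour(() => 0));   // "#000000"
```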