davidlyness.com

Things I find interesting.

Posts in the Web Technology category (page 2 of 2)

In the 1.0.12 update, the single entry in the changelog was "nicer input sanitization". What does this mean, and why is it now "nicer"?

A website such as this accepts information from site visitors. For example: Adhering to the basic security model, all such data should be regarded as untrusted - that is, there is no guarantee that the data is not designed to maliciously affect the system. For example, a comment could be submitted which includes a SQL command that drops a table whenever the comment is inserted into the database. This sort of thing needs to be stamped out before it gets near the database interpreter.

Before 1.0.12, I was using a strategy of escaping characters. This means that, for example, whenever a " (quote mark) is seen, the database replaces it with \" - this ensures that an attacker cannot interject their own SQL commands into their comment. PHP provides a useful function called addSlashes to do this automatically for escapable characters; when the comment is to be outputted, a similar function called stripSlashes can be used to remove all the backslashes, changing the \" back to a " so that the comment is displayed correctly.

This is a somwhat messy way to do things - it relies on the developer remembering to use the addSlashes and stripSlashes functions on all database interactions. In addition, PHP includes a "feature" called magic quotes (thankfully deprecated, unfortunately still enabled by default) which automatically escapes data inserted into the database. This may sound like a good thing, but not when you don't know it's going on, and it does it in all circumstances regardless of context!

The new method being used in 1.0.12 is prepared SQL statements. This involves sending the structure of the SQL command to the database first, and afterwards sending in the parameters (user provided data). Because the command is submitted in two separate steps like this, the database knows that anything received in the second step will be just data, nothing else. This prevents SQL injection attacks in a much cleaner way.

The PHP PDO architecture provides a clean method for preparing such statements - by calling prepare first we can outline the structure of the command, and with bindParam we can insert the untrusted data.

For example, consider the following (insecure) PHP code snippet:

$db = new PDO(/* database connection information */);
$sql = "insert into tblcomments (postid, comment) values ($postid, $comment);";
$stmt = $db->prepare($sql);
$stmt->execute();


If we assume the comment stored in the variable $comment has not been escaped or otherwise sanitized, it is possible a SQL injection could occur by way of someone submitting a malicious comment.

Now consider the following improved version:

$db = new PDO(/* database connection information */);
$stmt = $db->prepare("insert into tblcomments (postid, comment) values (:postid, :comment));
$stmt->bindParam(':postid', $postid);
$stmt->bindParam(':comment', $comment);
$stmt->execute();


Because we have two user-provided parameters (the post that the comment is attached to, as well as the comment itself) we require two bindParam statements. Notice that on line 2, when the SQL command structure is being sent to the database, there is no user-provided data involved - the same command will be sent regardless of the post or comment. This also gives performance enhancements when using select statements, as we're able to cache commonly requested queries.

There is still an outstanding problem in the current codebase. There is a function which has the field selection, table selection, field selection and field data all defined in terms of variables - so the barebones SQL command has the following structure:

select $request from $table where $givenfield=$givenvalue limit 1


This poses a problem as the bindParam method can only be used on field data - i.e. the $givenvalue variable above. Therefore the above SQL command cannot (as far as I know) be effectively paramaterised using prepared statements. This isn't a huge problem as by design this function is internal - that is, it is always called with hard-coded data rather than user-supplied input. (The point of the above command is to give me a function I can call to execute repetitive-but-slightly-different commands like selecting a field from a table based on another field's value.)

SQL injection is one of the two main web attacks. The other, cross site scripting (XSS), is more complicated - expect to see a post on it in the future.


A practice that you've no doubt come across in the past when signing up for a new website is the "Let us send emails to all of your friends so that we can invite them to join too!" activity. Most often, this is accomplished by you giving the website your username and password, which then starts an automated process of them logging into your account, scanning your contact list for email addresses and then sending an email to each and every one. Aside from encouraging spam, it also raises the question of security. You may trust Facebook or Google with your email credentials, but what about other websites that may not have as much to lose from a privacy breach? And aren't we all told never to give away our passwords anyway?

The same issue is encountered by Twitter application developers. There are many, many Twitter applications out there - desktop applications, browser plugins, mobile apps and so on. In order to post from your account they obviously need to be "logged in" as you. One way of doing this is giving them your username and password (Basic Authentication) - and if you're testing out some of the many existing applications, more than a handful of developers now potentially have access to your Twitter account. This may not be the most high-security target in the world, but it illustrates the point.

The current answer to this is OAuth. Wikipedia's definition of OAuth is "an open protocol that allows users to share their private resources (e.g. photos, videos, contact lists) stored on one site with another site without having to hand out their username and password". The following diagrams serve to illustrate the difference between Basic and OAuth Authentication.

basic authentication OAuth authentication

Obviously Step 1 in Basic Authentication and Steps 1-5 in OAuth Authentication only need to happen once. Furthermore, we can derive a number of advantages from the use of OAuth:
  • No application ever has access to your password, so if you use the same password across different sites your identity is not compromised.
  • The server can revoke access to any application at any time by removing its authorisation token from its database - useful in case the application is ever found to be doing anything non-professional.
  • The user can see an entire list of applications that have access to their account, and what privileges they have. They can also revoke access at the click of a button.
  • Changing the password does not prevent these applications from functioning.
According to Twitter's official blog, the move from Basic Authentication to OAuth Authentication was supposed to happen on 30th June. However, in order to avoid any kind of major outage during the 2010 World Cup, it was postponed until 31st August - which is why around that time you probably saw messages along the lines of "The Twitter API is down" in the sidebar. I didn't think that they'd go so far as to stop pulling in public tweets that don't require a username/password in the first place, but they did. Such is life. If you're interested as to how the sidebar widget is now working, it is scanning the RSS feed and parsing the result in order to display just the most recent tweets.

Currently in development is the OAuth 2.0 specification, which focuses more on specific authorisation flows for various hardware devices - you can read the draft specification here.

Update: a recent episode of the Security Now! podcast went into OAuth in greater depth than I have here. If you're interested in the very technical aspects of it, along with the potential security vulnerability, you can read the transcript of the show or listen to the audio.