Web Authentication


Authentication for Web Sites

Author: Baruch Even

Introduction

Many web sites keep accounts for users, holding personal information or granting the ability to control parts of the site. To protect these accounts the site uses a username and password combination: the username identifies the user and his permissions, and the password is used to make sure that the client trying to log in really is that user, that is, the client is authenticated.

Password authentication is the most common form. Other forms include biometric methods, such as a retinal scan, and smart tokens. However, password authentication is the only feasible method for web site authentication, since the other methods require hardware that is not readily available to all users.

The process by which the user sends the username and password to the site and the site acknowledges the user as authenticated is called the authentication protocol.

Authentication Tokens

After the user has logged in we need to keep track of him so that he does not have to go through the login process for each page. For this we attach authentication tokens to the user. The tokens are verified on each page access, and if they do not match what the user logged in with we require him to log in again.

Token placement

Before we discuss the tokens themselves, let's see what options we have for attaching them to the user. Basically there are two: we can give the user a cookie, or we can add the tokens to each URL in the site.

The first option is the better one, since it is less intrusive and is hidden from the user. Cookies also avoid the risk of bookmarking stale tokens and of removing the tokens inadvertently.

The problem with cookies is that they might be disabled. For this case we can either require the user to enable cookies or fall back to attaching the tokens to the URL of each page. This is effectively what PHP4 does: whenever cookies are not supported by the browser, PHP4 attaches the tokens to the URL.
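As a rough illustration of the URL fallback, the sketch below appends a token to a page URL using Python's standard urllib.parse module. The function name add_token_to_url and the query parameter name sid are hypothetical choices made for this example; they are not taken from PHP4.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_token_to_url(url, token, param="sid"):
    """Return url with the token carried in its query string.

    'param' is a hypothetical parameter name chosen for this sketch.
    """
    parts = urlsplit(url)
    # Keep the existing query parameters, but drop any stale copy of the token.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Every link the site emits would have to pass through such a function, which is part of why the cookie option is the less intrusive one.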

What is in the Tokens?

The tokens' purpose is to say that the user holding them is logged in. The simplest such token would be one named loggedin that is set to one after the user logs in.

This example only serves to show that the tokens themselves need to be secured: we don't want a user to be able to claim that he is logged in without going through our login page.

After all, in this example the user can just add the cookie by himself and he will be regarded as logged in. Bad idea!

The simplest scheme for the tokens is to have a token called username and a token called password, holding the user's username and password respectively. However, these can be intercepted on the way, so the scheme is insecure in the sense that someone can listen and learn the password.

A simple method, and the most effective one, is to have a session id: a random number attached to the user at login time, with all the information kept on the server under this session id. When the user logs off, the session id and its associated information are dropped, so a leaked session id poses no large problem afterwards.

It is possible for an attacker to use the session ID of the user, but that is only useful as long as the user is logged in. We will see later on that this is unavoidable under normal circumstances.
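The session id scheme can be sketched in a few lines. This is only an illustration under assumptions: the in-memory dictionary SESSIONS and the function names are hypothetical, and a real site would keep the data in whatever storage it already uses. The id comes from Python's secrets module so that it cannot be guessed.

```python
import secrets

# Hypothetical in-memory store: session id -> information kept on the server.
SESSIONS = {}


def create_session(username):
    """Attach a fresh random session id to the user at login time."""
    session_id = secrets.token_hex(16)  # 128 random bits as 32 hex characters
    SESSIONS[session_id] = {"username": username}
    return session_id


def lookup_session(session_id):
    """Return the data stored for this id, or None if the id is unknown."""
    return SESSIONS.get(session_id)


def destroy_session(session_id):
    """Drop the id and its associated information when the user logs off."""
    SESSIONS.pop(session_id, None)
```

Only the session id travels to the browser; the username and anything else about the user stay on the server.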

Basic Protocol

The simplest such protocol is the one web servers use for Basic Authentication, which is also the web server's default authentication protocol: when you get a popup window asking for a username and a password, this is the web server's basic authentication. In Apache it is usually configured with the htaccess file. In this protocol the server asks the user for authentication and the user provides the username and password. The web server then compares the password with the copy it has stored on file and allows access if they match.

All the data is transferred essentially as-is (Basic Authentication only Base64-encodes it, which anyone can reverse), so anyone who can listen to the communication will learn the password. Listening on the communication line is called sniffing and is not hard to do, so we want something with better security.
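To see how little protection this gives, the sketch below builds the Authorization header value that a browser sends for Basic Authentication and then recovers the credentials from it, as a sniffer could. The header format (the word Basic followed by the Base64 encoding of username:password) is the standard one; the function names are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value a browser sends for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def sniff_credentials(header_value):
    """Recover the username and password from a captured header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authentication header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # The first colon separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password
```

Base64 is an encoding, not encryption: no key is involved, so decoding is as easy as encoding.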

Improved Protocol

The immediate solution is to encrypt the password in transit with some key. However, this is not usually possible because both sides need to know the key beforehand. There are methods to create such a key on the fly, but these algorithms are not suitable for implementation in a regular web application.

The solution is to use a hash function. A hash function is a one-way function: given some input it returns a number, and the one-way property means that a sniffer who gets the resulting number will not be able to find an input that generates it.

The two hash functions in wide use are MD5 and SHA1. MD5 returns a 128-bit number and SHA1 returns a 160-bit number. The number is usually converted to hexadecimal notation, in which each 4 bits are translated into one character that is easy to store and display.
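The sizes quoted above can be checked with Python's standard hashlib module: 128 bits make 32 hexadecimal characters and 160 bits make 40. The helper name hex_digest_lengths is hypothetical. MD5 and SHA1 appear here only because the text names them; both are now considered weak against collision attacks, and hashlib also offers newer functions such as SHA-256.

```python
import hashlib


def hex_digest_lengths(data):
    """Return the lengths of the MD5 and SHA1 hexadecimal digests of data."""
    md5_hex = hashlib.md5(data).hexdigest()    # 128 bits / 4 bits per character
    sha1_hex = hashlib.sha1(data).hexdigest()  # 160 bits / 4 bits per character
    return len(md5_hex), len(sha1_hex)


print(hex_digest_lengths(b"my password"))  # prints (32, 40)
```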

The authentication method now switches to having the user send the username and the hash of the password instead of the password itself. The server cannot recover from the hash the password the user typed, but since the server already knows the password (from registration) it can compute the hash itself and compare it with the one it received. This means that anyone who is listening will not learn the password, and so supposedly cannot enter the web site as this user.
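A minimal sketch of this exchange follows, with both sides shown in one file for brevity. The USERS table, the sample password and the function names are hypothetical. SHA1 is used because the text names it, and hmac.compare_digest is simply a comparison that does not leak timing information.

```python
import hashlib
import hmac

# Hypothetical user table: the server learned each password at registration.
USERS = {"alice": "correct horse"}


def client_login_message(username, password):
    """What the client sends: the username and the hash of the password."""
    return username, hashlib.sha1(password.encode("utf-8")).hexdigest()


def server_check(username, received_hash):
    """Recompute the hash from the stored password and compare."""
    password = USERS.get(username)
    if password is None:
        return False
    expected = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, received_hash)
```

The password itself never appears in the login message, only its 40-character digest.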

Final Protocol

Stopping at this point would have made this article rather short; luckily for us there is a hole in the above scheme. The sniffer now has the hash, and since all we need to send is the username and the hash, he can simply send them without knowing the password!

So the problem now is that even though the sniffer can't learn the password he can still log in. What are we to do?

We solve this by adding a random value, called salt here (such a one-time value is also commonly known as a nonce or challenge). Each time the user logs in the server sends a random number, and the user sends back the hash of the random number together with the password. This way the hash is unique for each login, and a sniffer who captures one hash cannot log in with it: he would need the hash for every random number the server might send. If we choose from a large enough pool the sniffer will have a lot of trouble getting the access he so covets. A large enough salt is at least 10 characters long, which is enough to defeat most attackers. The longer the salt, the more work the sniffer needs to do to have a chance, while the cost to the site is negligible.
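The salted exchange might look like the sketch below. It rests on assumptions: the text does not say how the random number and the password are combined, so plain concatenation is used here, and the USERS table and function names are hypothetical. The salt is 16 hexadecimal characters, above the 10-character minimum suggested above.

```python
import hashlib
import hmac
import secrets

# Hypothetical user table: the server learned each password at registration.
USERS = {"alice": "correct horse"}


def new_salt():
    """Server side: pick a fresh random salt for one login attempt."""
    return secrets.token_hex(8)  # 16 hex characters


def response_for(salt, password):
    """Client side: hash the salt together with the password.

    Concatenation is an assumption of this sketch, not something the text fixes.
    """
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


def server_verify(username, salt, response):
    """Server side: recompute the expected response for the salt it sent."""
    password = USERS.get(username)
    if password is None:
        return False
    return hmac.compare_digest(response_for(salt, password), response)
```

A response captured for one salt does not verify against the next salt the server hands out, which is what defeats the replay.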

One thing that needs securing is that login requests with a given salt time out after a short while; otherwise the sniffer could just replay an old login to get into the system. This means that the server needs to keep the last salt it sent for a user and a timestamp saying when it was sent. If a login is attempted with a different salt, or after more than some predefined time, the login is automatically denied and a new salt is generated and provided.
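The bookkeeping for the timeout can be sketched as follows. The 60-second lifetime, the PENDING table and the function names are assumptions made for the example. The sketch also discards the salt after any attempt, so each salt is usable once; that is a slightly stricter rule than the text requires.

```python
import hashlib
import hmac
import secrets
import time

SALT_LIFETIME = 60  # seconds; an assumed value for the predefined time

USERS = {"alice": "correct horse"}  # hypothetical user table
PENDING = {}  # username -> (last salt sent, time it was sent)


def issue_salt(username, now=None):
    """Generate a salt for this user and remember when it was sent."""
    now = time.time() if now is None else now
    salt = secrets.token_hex(8)
    PENDING[username] = (salt, now)
    return salt


def verify_login(username, salt, response, now=None):
    """Accept only the last salt sent, and only within its lifetime."""
    now = time.time() if now is None else now
    pending = PENDING.pop(username, None)  # assumption: a salt is single use
    password = USERS.get(username)
    if pending is None or password is None:
        return False
    issued_salt, issued_at = pending
    if salt != issued_salt or now - issued_at > SALT_LIFETIME:
        return False
    expected = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, response)
```

After a denied attempt the caller would call issue_salt again and send the new salt to the browser.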

After all this work and these revisions to the basic protocol we are still not fully secure. There are several problems that we cannot solve within the usual framework and can only solve by external means. However, for most purposes and for most sites such an authentication method is sufficient. Obviously, if you work for a bank you should be looking for something better, but for a web log or simple personalization you can go without the extra bother.

Unsolved problems

The first and foremost problem is that even with the last protocol a sniffer can hear the login sequence, and as long as the real user is logged in the sniffer can use the logged-in user's authentication tokens to access the site. This can be made harder by adding the user's IP address to the hashed password and verifying that each request comes from that IP address, which forces the sniffer to fake the real user's IP address as well. Since the sniffer sees all messages between the user and the server he can fake the IP address with relative ease, but it still imposes extra work on him.
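One way to tie the tokens to an IP address is sketched below. It is a variation rather than the exact scheme described: instead of mixing the address into the hash, the server records the address next to the session id at login and compares it on every request, which has the same effect of rejecting the token from any other address. The SESSIONS table and function names are hypothetical.

```python
import secrets

# Hypothetical store: session id -> username and the IP address seen at login.
SESSIONS = {}


def create_session(username, client_ip):
    """Record the client's IP address together with the new session id."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"username": username, "ip": client_ip}
    return session_id


def authenticate_request(session_id, client_ip):
    """Return the username, or None if the id is unknown or the IP differs."""
    session = SESSIONS.get(session_id)
    if session is None or session["ip"] != client_ip:
        return None
    return session["username"]
```

As the text notes, this only raises the cost for the attacker, and users whose address changes between requests would be forced to log in again.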

Another problem that we haven't addressed is the possibility of a Man-In-The-Middle attack: the sniffer could go active and simply intercept all the traffic between the site and the user, read it, and send the site anything he wants. This means he can also change user commands and fake responses. This is a serious problem, but for most sites it is very unlikely that someone will go to the bother of doing such a thing.

The only solution for these problems is to use a secure channel. A secure channel is usually an encrypted transmission channel in which the two sides have a shared key that only they know, so no one else can read their communication or fake it. Creating such a secure channel will not be covered here, as it requires an article (or a book) of its own.

A secure channel can give protection against sniffing and against Man-In-The-Middle attacks, because the sniffer does not know the shared key. With a good protocol for setting up the secure channel the sniffer will also be unable to impersonate the server.

A standard way to set up a secure channel is SSL (Secure Sockets Layer), the usual encryption used for web sites. With SSL the server has a key and a certificate from a known Certificate Authority (CA). The CA's own key is distributed with the browsers, so when a browser connects to an SSL web site it asks for the certificate and can verify it. If the certificate verifies correctly, the browser trusts the server key and sets up a secure channel to the server.

The main drawback of SSL, and the reason it is not so widely used for non-e-commerce sites, is that it requires the site to pay for a certificate, which has an expiration date and thus must be renewed every one or two years. In addition, the processing time associated with encrypting and decrypting the secure channel is several times greater than for a simple non-encrypted channel.
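The certificate check a browser performs can be approximated from the client side with Python's standard ssl module, which implements SSL's successor, TLS. This is only a sketch of the verification step: the function names are hypothetical, and the trusted CA keys come from whatever the operating system provides rather than from a browser.

```python
import socket
import ssl


def make_client_context():
    """Create a context that trusts the system's CA certificates.

    The default context requires a valid certificate from the server and
    checks that the certificate matches the host name being contacted.
    """
    return ssl.create_default_context()


def open_secure_connection(host, port=443, timeout=10.0):
    """Connect to host and set up the secure channel, verifying the server."""
    context = make_client_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # Raises ssl.SSLCertVerificationError if the certificate does not verify.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

If verification fails no channel is set up at all, which is what keeps an active attacker from impersonating the server.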

Conclusion

The final conclusion is that we have a relatively simple authentication method which is moderately secure against passive attacks (sniffing) but is not secure against active attacks, such as IP address faking. We cannot get anything much better than that without encryption, but for most web sites such a method will suffice.

Additional Resources