There was once a time when running a big (or popular) web application meant running a big web server. As your application attracted more users you would add more memory and processors to your server.
Today, the ‘one huge server’ paradigm has been replaced with the idea of having a large number of smaller servers which employ one or more methods of balancing the load (known as ‘load balancing’) across the entire group (known as a ‘farm’ or ‘cluster’). This is partly down to the fall in hardware prices which made this approach more viable.
The advantages of the ‘many small servers’ approach over the old ‘big server’ method are twofold:
- If a server falls over then the load balancing system is normally clever enough to stop sending requests to the crashed machine and distribute the load across the remaining healthy servers.
- It becomes much easier to scale your infrastructure. All you need to do is plug a few new servers into the load balancer and away you go. No downtime necessary.
So there must be a catch… right? Well yes, it can make developing your applications a little more complex and this is what the remainder of the blog post will be covering.
At this point you may be saying to yourself, ‘OK, but how do I know if I am using this load balancing thingy?’ The honest answer is that if you are asking this question then you are probably not using a load balanced system and you don’t need to worry too much. Most people explicitly set up load balancing when their applications grow to a size that demands it. However, I have occasionally seen web hosting companies that load balance their shared hosting accounts. These companies may tell you that they do this load balancing, or you may have to work it out for yourself based on the remainder of the blog post.
Before we continue I would like to point out that this post focuses on load balancing from the perspective of PHP (or your chosen server side language). I may write an additional blog post on database load balancing in the future, but for now that will have to wait.
A note on ‘web applications’: You may notice that I keep referring to ‘web applications’ rather than websites. I do this to distinguish between sites which simply display static content, and more advanced sites which are typically powered by databases and server-side programming.
Your PHP Files
The first question you may have is, ‘if there are lots of servers, how do I get my files onto all of them?’
There are a few ways we can distribute our files to all of our servers:
- Upload all of our files to each server separately, just as we used to do with one server. This way clearly sucks because a) imagine doing this for 20 servers and b) it is far too easy to get something wrong and have different versions of your files on different servers.
- Use ‘rsync’ (or similar). Such tools can sync local directories to multiple remote locations. An example of this would be syncing your single staging server with your multiple live servers.
- Use a version control system (such as Subversion). This is my preferred method as I can maintain my code in Subversion and then run an ‘svn update’ command on each live server when I am ready to make my changes live. This also makes rolling back changes particularly easy.
- Use a file server (you may find NFS useful). In this case you use a file server to store your web root directory/directories and then mount the share onto each of your web servers. Of course, if your file server crashes then all your sites go down, and there can be a lot of overhead in pulling files from remote machines, especially if each request involves a large number of files.
The option you choose will depend on your requirements and the skills at your disposal. If you use a version control system then you may want to devise a way of running the update command on all servers simultaneously, whereas if you use a file server you may want to implement some form of failover system in case the file server crashes or becomes unreachable.
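As a rough sketch of how the rsync and version control options could be run across a whole farm, the script below builds one shell command per server. The host names and directory paths are made up for illustration, and it assumes you have SSH access from the machine running the script to each web server.

```php
<?php
// Hypothetical host names, used for illustration only.
$servers = ['web1.example.com', 'web2.example.com'];

/**
 * Build one rsync command per server that mirrors a local directory
 * to the same remote path on each host.
 */
function buildRsyncCommands(string $localDir, string $remoteDir, array $servers): array
{
    $commands = [];
    foreach ($servers as $host) {
        // -a preserves permissions and timestamps, -z compresses in transit,
        // --delete removes remote files that no longer exist locally.
        $commands[$host] = sprintf(
            'rsync -az --delete %s %s',
            escapeshellarg(rtrim($localDir, '/') . '/'),
            escapeshellarg($host . ':' . rtrim($remoteDir, '/') . '/')
        );
    }
    return $commands;
}

/**
 * Build one command per server that runs "svn update" over SSH.
 */
function buildSvnUpdateCommands(string $remoteDir, array $servers): array
{
    $commands = [];
    foreach ($servers as $host) {
        $commands[$host] = sprintf(
            'ssh %s %s',
            escapeshellarg($host),
            escapeshellarg('svn update ' . escapeshellarg($remoteDir))
        );
    }
    return $commands;
}

// Hypothetical paths: a staging copy pushed out to each live server.
foreach (buildRsyncCommands('/var/www/staging', '/var/www/live', $servers) as $host => $command) {
    echo $host, ': ', $command, PHP_EOL;
    // To actually deploy, pass $command to exec() and check the exit code.
}
```

The script only prints the commands, which makes it easy to check what would be run before letting it loose on your live servers.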
File Uploads
A file upload is still a simple request, so the file will only be sent to one server. This is, of course, not a problem when you only have one server, but what do we do when there are several machines that the file needs to be placed on?
The problem of handling file uploads is very similar to that of distributing your files across the server farm, and has the following potential solutions:
- Store your uploaded files in a database. Most databases allow you to store files in the form of binary data, so you can use a database to store your file uploads rather than storing them as files on disk. When you come to send the file to a user you can pull the file data from the database along with some extra data such as the file name and MIME type. Before continuing down this route you should consider how much database storage you have at your disposal. This method also has quite a high overhead as the request needs to be passed through PHP, which then has to pull the data from the database.
- Store your uploaded files on a file server. Here, as with the previous section, you mount a fileserver share on each of your web servers and place your uploads there. This way the upload is available on all your web servers instantly. However, if the file server is unavailable then you may end up with broken images or downloads. Also, there is still an overhead in pulling files off a file server.
- Design your upload handling code to transfer the file to each server. This option does not have the disadvantages of using a file server or database but will probably add complexity to your code. For example, what happens if one server is down at the time of the upload?
It is possible to reduce the overhead with the database storage option by keeping a local file cache. When a server receives a request for an uploaded file it first checks its local cache of uploaded files. If the file is found then it can be sent from the cache, otherwise it can be pulled from the database and the cache updated with the file.
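The cache lookup described above could be sketched roughly as follows. The function name is my own invention, and $fetchFromStore is a hypothetical callable standing in for whatever code pulls the file data out of your database. The sketch also assumes that an uploaded file never changes once it has been stored, so cached copies never need to be refreshed.

```php
<?php
/**
 * Return the contents of an uploaded file, using a local cache directory
 * in front of a slower shared store such as a database.
 *
 * $fetchFromStore is a hypothetical callable that takes the file ID and
 * returns the file data as a string, or null if the file does not exist.
 */
function getUploadedFile(string $fileId, string $cacheDir, callable $fetchFromStore): ?string
{
    // Hash the ID so it is always safe to use as a file name.
    $cachePath = rtrim($cacheDir, '/') . '/' . sha1($fileId);

    // Cache hit: serve the local copy without touching the database.
    if (is_file($cachePath)) {
        $data = file_get_contents($cachePath);
        if ($data !== false) {
            return $data;
        }
    }

    // Cache miss: pull the file from the shared store.
    $data = $fetchFromStore($fileId);
    if ($data === null) {
        return null;
    }

    // Write to a temporary file first, then rename, so that another
    // request never reads a half-written cache entry.
    $tmpPath = $cachePath . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmpPath, $data) !== false) {
        rename($tmpPath, $cachePath);
    }

    return $data;
}
```

Each web server builds up its own cache over time, so the database is only hit the first time a given server is asked for a given file.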
Sessions
If you are familiar with PHP’s built-in session handling you will probably know that by default it stores session data in temporary session files on the server. Again, this file is only present on the server which handled the request, but subsequent requests which use the session could be passed to any server in your farm. The result is that sessions are frequently not recognised, and the user is (for example) continually logged out.
The solution I recommend here is to either override PHP’s built-in session handling and store the session data in your database, or implement some guaranteed way of always sending users to the same server. It may be easier to use the former given that larger applications will probably implement their own session handling anyway.
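To give an idea of what overriding the session handling can look like, here is a minimal sketch of a database-backed handler. It assumes PHP 8 and a PDO connection. The class name, table name and column names are my own choices for illustration, and the REPLACE INTO statement assumes MySQL or SQLite.

```php
<?php
/**
 * Stores PHP session data in a database table so that every web server
 * sees the same sessions. Assumes a table created along these lines:
 *
 *   CREATE TABLE sessions (
 *       id         VARCHAR(128) PRIMARY KEY,
 *       data       TEXT NOT NULL,
 *       updated_at INTEGER NOT NULL
 *   );
 */
class DatabaseSessionHandler implements SessionHandlerInterface
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function open(string $path, string $name): bool
    {
        return true; // The PDO connection is already open.
    }

    public function close(): bool
    {
        return true;
    }

    public function read(string $id): string|false
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        // An unknown session must be returned as an empty string.
        return $data === false ? '' : (string) $data;
    }

    public function write(string $id, string $data): bool
    {
        // REPLACE INTO is supported by MySQL and SQLite; other databases
        // need their own "insert or update" statement here.
        $stmt = $this->db->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        // Remove sessions that have not been written to recently.
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}
```

To use it, pass an instance to session_set_save_handler() before calling session_start(). Note that this sketch does no session locking, so two simultaneous requests from the same user could overwrite each other’s session data.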
Configuration
I feel that configuration is worth covering here even though the topic is not specific to PHP. When running a server farm it is an excellent idea to have some way of keeping your configuration files in sync between servers, whether they are related to PHP or not. If your configuration files get out of sync it can lead to some very strange and intermittent behaviour that can be difficult to track down.
I recommend keeping most, if not all, of your configuration files in a separate area of your version control system. This way you can store different PHP configuration files for different installations of your projects, as well as keep all your server configuration files in sync.
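If you want to check whether your servers have drifted apart, one possible approach is to compare checksums of the configuration files. This is only a sketch and the function names are hypothetical: you would run the first function on each server, collect the results however suits you, and feed them to the second.

```php
<?php
/**
 * Map each configuration file path to a SHA-256 hash of its contents.
 * Files that cannot be read are recorded as null.
 */
function fingerprintConfigFiles(array $paths): array
{
    $fingerprints = [];
    foreach ($paths as $path) {
        $hash = is_readable($path) ? hash_file('sha256', $path) : false;
        $fingerprints[$path] = $hash === false ? null : $hash;
    }
    return $fingerprints;
}

/**
 * Given fingerprints collected from several servers (host => [path => hash]),
 * return the paths whose hashes are not identical on every server.
 */
function findConfigDrift(array $fingerprintsByHost): array
{
    $seen = [];
    foreach ($fingerprintsByHost as $fingerprints) {
        foreach ($fingerprints as $path => $hash) {
            $seen[$path][] = $hash;
        }
    }

    $drifted = [];
    $hostCount = count($fingerprintsByHost);
    foreach ($seen as $path => $hashes) {
        // A file has drifted if a host is missing it or any hash differs.
        if (count($hashes) !== $hostCount || count(array_unique($hashes)) > 1) {
            $drifted[] = $path;
        }
    }
    return $drifted;
}
```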
Logging
Like configuration, logging is not solely related to PHP but it is still very important for keeping an eye on your servers’ health. Without a proper logging system, how will you know if your PHP code starts producing errors? (You do have display_errors set to ‘off’, right?)
There are a few ways you can implement logging:
- Log on each server – This is the simplest method available to you (aside from not logging at all!). Each machine simply logs to a file as it would do if it were not part of a larger farm. This has the benefit of being simple and requiring very little (if any) work to set up. However, as the number of servers you have begins to grow you may find that monitoring these individual log files becomes untenable.
- Log to a share – In this method each machine still has its own log files, but they are stored on a central file server and are written to via a share. This can make log monitoring simpler as all the logs are stored centrally, but there is an overhead in writing to log files on a share. Also, if the file server becomes unreachable your servers or applications may do anything from simply not logging messages to completely crashing.
- Log to a logging server – You can use a logging application such as syslog to perform all your logging on a central server. Although this method requires the most configuration it also provides the most robust solution.
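For the syslog option, the PHP side can be quite small. The sketch below uses PHP’s built-in openlog(), syslog() and closelog() functions; the helper function names and the ‘my-web-app’ identifier are hypothetical. It assumes the syslog daemon on each web server has been configured separately to forward messages to your central logging server.

```php
<?php
/**
 * Format a log line that includes the host name, so that entries from
 * different web servers can be told apart on the central log server.
 */
function formatLogMessage(string $message, array $context = []): string
{
    $line = '[' . gethostname() . '] ' . $message;
    if ($context !== []) {
        $line .= ' ' . json_encode($context);
    }
    return $line;
}

/**
 * Send a message to the local syslog daemon. Forwarding from there to a
 * central server is configured in the daemon, not in PHP.
 */
function logToSyslog(int $priority, string $message, array $context = []): void
{
    // 'my-web-app' is a hypothetical identifier added to every entry.
    openlog('my-web-app', LOG_PID, LOG_USER);
    syslog($priority, formatLogMessage($message, $context));
    closelog();
}

logToSyslog(LOG_ERR, 'Could not save upload', ['file' => 'report.pdf']);
```

Including the host name in each message is a small thing, but it makes it much easier to work out which machine is misbehaving once all your logs end up in one place.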
I have only covered a small area of logging here; the (very exciting! Ahem) topic of logging best practices could justify an entire blog post to itself. As always, I recommend choosing the technique best suited to your situation.
Conclusion
This blog post has given you an introduction to running your PHP applications on a larger scale. Most of the problems listed here stem from working with files, and the fact that these files often need to be shared between all the servers in your farm. It is therefore a good idea to think about the implications of load balancing when working with local files.
You can apply the techniques here if you are developing a large scale web application or if you work on projects which are distributed to a number of users or clients. For example, if you contribute to an open source project it is likely that some installations of your application will be run across several servers. It is therefore important to be aware of this when designing and creating your application.