Recently, pear.php.net silently switched over to a revolutionary new way of maintaining the live website, and it has been extremely freeing. The same technique is being explored by the government of the state of Iowa and other large-scale sites with mission-critical servers. Before revealing the secret, let's learn about the problem.
One of the most common tasks that we experience as web developers is synchronizing a development web server with a live site. There are many solutions that have been tried before. Just a few:
1. develop the code on the live server
2. develop on a remote machine, and ftp the files to the site as they change
3. develop using version control like CVS, and use a cron job to synchronize the website directly from CVS
4. develop on a remote server with the same configuration, and use rsync to synchronize changes
Almost everyone has tried #1. The first time you accidentally erase a file, it will rid you of that habit in a hurry. Most developers use #2, which has the benefit of keeping a backup on the development machine. The same holds for #3 and #4, with the added benefit that changes are synced in a controlled way.
However, every single method described above has the potential for immediate and catastrophic failure, even with a backup.
Why?
Websites are not just the code that runs them. A website is an interdependent combination of code, a data storage backend (LDAP, database, or other), and the actual hostname itself. Of the 4 methods listed above, not one addresses this issue. When upgrading code on the live server, any database changes must be done by hand. Keeping track of which links to use also requires some footwork (for instance, when redirecting a user to a secure login form, one cannot use a relative href, but must use the full hostname). The most common solutions involve two strategies:
- use /etc/hosts to mimic the live server's DNS (alias hostnames to localhost)
- use a static configuration file that is not synchronized to define hostname/path differences
The first idea is good, but also means you will need another alias in order to rsync with the remote site (or a dedicated IP on the remote site). Chances are, if you have a dedicated machine for development, you also have a dedicated IP for the live server, and then this is a good solution for managing hostname differences between the development and live machine. The second solution is far more common, and much less robust. Any file that must be different between the development machine and the remote server is asking for trouble when using ftp or rsync. If the file is excluded from rsync/ftp, when it is erased, the file is gone, and must be stored somewhere else. This quickly introduces unwanted complexity in the maintenance of the website, and another failure path for the site.
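To make the two strategies concrete, here is a rough sketch. It prints the /etc/hosts entry that would alias a live hostname to the development machine, and the rsync command that would skip a per-machine configuration file. The hostname www.example.com, the file name config.local.php, and the paths are hypothetical placeholders, not anything used on pear.php.net.

```shell
#!/bin/sh
# Sketch only: hostnames, paths, and file names below are hypothetical.

# Strategy 1: print an /etc/hosts entry that points the live hostname at
# the local machine, so absolute URLs resolve to the development server.
hosts_alias() {
    printf '127.0.0.1\t%s\n' "$1"
}

# Strategy 2: print an rsync command that skips the per-machine config file.
# The excluded file is never copied, so the sync never backs it up either.
sync_command() {
    src=$1
    dest=$2
    excluded=$3
    printf 'rsync -az --exclude=%s %s/ %s/\n' "$excluded" "$src" "$dest"
}

hosts_alias www.example.com
sync_command ./htdocs deploy@www.example.com:/var/www/htdocs config.local.php
```

Note that the two strategies collide in this sketch: once www.example.com is aliased to 127.0.0.1, the rsync destination also resolves to the local machine, which is why a second alias or a dedicated IP is needed.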
The problems only multiply when using external dependencies. As many developers have discovered, it's all fine and good to try to avoid re-inventing the wheel. The problem happens when trying to upgrade an external library. Each library has different requirements, and upgrading often means a mysterious quagmire of erasing and adding files, as well as database changes or other tasks that make it onerous to upgrade. Actually, upgrading isn't so hard, but downgrading, should the upgrade break everything, is really hard (Note: this is far more common if the package is not from pear.php.net - PEAR has a strong backwards compatibility policy designed to mitigate this issue).
The ultimate solution, as I outline in chapter 3 of The PEAR Installer Manifesto, is to use the PEAR Installer in conjunction with version control to manage the complexity of synchronizing a live website with the development machine.
A few years back, while designing the next incarnation of the PEAR Installer, I realized that its ability to do file transactions and its intrinsic support for dependencies and versioning make it ideal for managing not just PHP libraries and applications, but also complete websites, and not just the PHP files, but all file-related events. There were just a few elements missing, such as how to synchronize a database. It turns out that the answer is also PEAR-related, and was one of the features integrated in PEAR 1.4.0 and newer.
The code that runs http://pear.php.net is now completely encapsulated within the pearweb package (http://pear.php.net/pearweb). As part of the installation for the pearweb package, a post-installation script is available that initializes the database using MDB2_Schema, a cross-database package that can create or modify a database installation. As time passes, the pearweb package will be divided into several related packages, one for each sub-site within pear.php.net (there is already a subpackage for the installation .phar files, http://pear.php.net/pearweb_phars). This will allow modular development and other tried-and-true techniques from traditional programming environments to be applied to the website.
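On a fresh machine, that setup might look roughly like the following. This is a sketch under assumptions, not the procedure actually used on pear.php.net: it assumes the pear command is on the PATH and that the post-installation script is invoked with pear run-scripts. The run helper and the DRY_RUN variable are illustrative names, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1).
DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install the package, then run its post-installation script
# (for pearweb, the step that initializes the database).
install_site() {
    package=$1
    run pear install "$package" &&
    run pear run-scripts "$package"
}

install_site pear/pearweb
```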
The biggest difference between the old and the new method was exemplified a few weeks back. A regression was introduced in pearweb version 1.1.2 that broke the statistics graph for package downloads. Initially, I wasn't sure what caused the problem, so I first reverted to the previous version:
pear upgrade --force pearweb-1.1.1
Once the problem was traced to an extra line in one of the files, pearweb 1.1.3 was released, and the new version was synchronized with:
pear upgrade pearweb-1.1.3
Two lines of shell scripting and the entire website was fixed! In another early instance, an upgrade of the website resulted in the entire site being placed in the wrong directory due to a configuration value change in the PEAR Installer at pear.php.net. In 30 seconds, I was able to fix it with:
pear config-set web_dir /path/to/actual/pearweb
pear upgrade --force pearweb
This is the missing piece of the 4 most common synchronization methods: with them, it's really hard to fix mistakes once they are made. You will still make mistakes when using the PEAR Installer. The difference is that reverting them is a one-liner and requires no sweat or fear of breaking something else in the process: each release of a package is a known quantity, and it will work the same way as it did last time. The major exception is database upgrades. These require special attention, as once the database changes, reverting to a previous version is no longer possible. However, designing modularly, so that database changes only affect a single module within the website, can mitigate even this problem.
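The revert-on-failure idea can be wrapped in a small script. The following is a minimal sketch, assuming a health check of your own choosing. The function name upgrade_or_revert and the PEAR_BIN variable are hypothetical. Only the pear upgrade and pear upgrade --force invocations come from the commands shown above.

```shell
#!/bin/sh
# Sketch only: PEAR_BIN and the health-check command are assumptions.
PEAR_BIN=${PEAR_BIN:-pear}

# upgrade_or_revert PACKAGE NEW_VERSION OLD_VERSION CHECK_COMMAND...
# Upgrades to NEW_VERSION and runs the check; if either step fails,
# forces the previous release back into place and returns non-zero.
upgrade_or_revert() {
    package=$1
    new=$2
    old=$3
    shift 3
    if "$PEAR_BIN" upgrade "$package-$new" && "$@"; then
        echo "now running $package-$new"
    else
        echo "reverting to $package-$old" >&2
        "$PEAR_BIN" upgrade --force "$package-$old"
        return 1
    fi
}
```

A call might look like `upgrade_or_revert pearweb 1.1.2 1.1.1 curl -fsS -o /dev/null http://localhost/`, where the curl check is only a placeholder for whatever test fits your site. As noted above, this reverts files only; it does not undo database changes.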
For a more comprehensive analysis of the steps needed to maintain a live webserver successfully, like ways of providing a safety buffer when upgrading (redirecting users temporarily) and the nitty-gritty of packaging up a website using the PEAR Installer, buy my book. I promise it will be worth the cover price.
Don't be fooled by the title: this book is not about how to type "pear install Blah." As one of the people who wrote to me said:
Chapter 3 is of most interest to me - going beyond simple libraries to
distributing full apps. As I run a development team of 8 managing
mission critical sites, I'm trying hard to bring as much good software
engineering methodology as I can in. I've taught everyone to use
Subversion so my next issue is deployment. I'm obviously familiar now
with basic package.xml stuff for libraries but not for full blown apps.
I've been working on a small (php-based) app called **** on and off for a while, which automates some
aspects of web app deployment (setting up the environment - databases
etc., checking stuff out of SVN including deploying bits from PEAR and
so on). However now that I have your book it seems that quite a bit of
it can be done within PEAR - I had no idea it was that flexible!
The book uses the Chiara Quartet's website as an example, which was a from-scratch rewrite to fit this new model. Unfortunately, since writing the chapter, we decided to outsource the website to another person who uses their own model of development. Fortunately, the design is much better than when I was maintaining the site. pear.php.net is more interesting, as it demonstrates how easy it is to move an existing site to the new model. All you really need is a package.xml, and possibly a PEAR channel server, although there are other ways to move the files around as well. Incidentally, Chapter 5, which is available for free on the Packt website, describes in detail how to set up a channel server.
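In outline, moving a site to this model with a channel server might look like the sketch below. The channel pear.example.com and the package name mysite are hypothetical, and the step of publishing the release on the channel server is left out because it depends on the channel software. As written, the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only: channel and package name are hypothetical.
DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the development machine: build a release tarball from package.xml.
build_release() {
    run pear package package.xml
}

# On the live server: register the channel once, then install from it.
deploy_from_channel() {
    channel=$1
    package=$2
    run pear channel-discover "$channel" &&
    run pear install "$channel/$package"
}

build_release
deploy_from_channel pear.example.com mysite
```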
I really hope you will check out this method - it has made everything easier for me, and in a way represents the culmination of several years of blood, sweat and tears on my part pooling ideas for the PEAR Installer, implementing and regression testing them, and finally writing about them.
Happy PHPing!