Well, it's been the craziest summer of my entire life (see the Chiara Quartet website for the skinny). Of course, just as my schedule went into overdrive, two major issues "hit the fan" in the PHP world: Marcus proposed the phar extension for inclusion in the core of PHP, and Arnaud and I proposed major changes to the PEAR coding standards for the new PEAR2 repository. More on PEAR2 later.
phar is an interesting beast. Right before the summer began, I received a private email from a user who noticed that when he tried to package more than about 270 files into the same phar, his OS X machine would run out of available file handles. Marcus and I had suspected this could be a problem based on reports from alpha testers, but had not yet seen it in the wild.
Marcus did a bunch of work on improving the stability of phar earlier in the spring and summer, bringing us to phar version 1.2.0 in late May, but did not have time to look at this complicated issue.
So, early this week I sat down and buried myself inside the computer for many hours, finally achieving victory a few minutes ago. The process was arduous for many reasons. First, I needed better introspection, so I created a gdb helper file, similar to the .gdbinit bundled with PHP, that lets me browse open phars, open file handles, and individual phar file entries very quickly.
Next, I had to track down the source of the problem. It turned out to be in the implementation of flushing to disk, which aggregates all unmodified and modified phar entries and then pushes them into a new phar on the disk, generates the manifest, calculates a hash, yadda yadda. After this act, every single file entry in the phar has an open file pointer for a temporary file containing the file's contents, even the ones that were unused! This was bad, so I created a simple routine to check for unused files and close them during flush.
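To make the idea concrete, here is a minimal sketch of "close unused handles during flush". It is written in Python rather than phar's C, and every name in it (`Entry`, `in_use`, `flush`) is hypothetical; it only illustrates the technique and is not how phar itself is implemented.

```python
import io
import tempfile


class Entry:
    """Hypothetical archive entry (illustrative only, not phar's real struct)."""

    def __init__(self, name, data):
        self.name = name
        self.in_use = False   # True while some caller still holds the entry open
        self.offset = None    # where the contents end up in the flushed archive
        self.size = len(data)
        self.fp = tempfile.TemporaryFile()  # temp file holding the contents
        self.fp.write(data)


def flush(entries, out):
    """Copy every entry into `out`, then close temp handles nobody is using.

    Returns the number of handles that were closed.
    """
    for entry in entries:
        entry.offset = out.tell()
        entry.fp.seek(0)
        out.write(entry.fp.read())

    closed = 0
    for entry in entries:
        # Only entries that no caller holds open are safe to close here.
        if not entry.in_use and entry.fp is not None:
            entry.fp.close()
            entry.fp = None
            closed += 1
    return closed


entries = [
    Entry("a.php", b"<?php a();"),
    Entry("b.php", b"<?php b();"),
    Entry("c.php", b"<?php c();"),
]
entries[1].in_use = True          # pretend someone is still reading b.php
archive = io.BytesIO()            # stands in for the new archive on disk
closed = flush(entries, archive)  # only a.php and c.php lose their handles
```

The point of the second loop is that the number of open handles after a flush is bounded by the number of entries actually in use, not by the total number of files in the archive.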
Thanks to our voluminous unit tests (with upwards of 90% code coverage according to gcov), I found a large issue in rename, and also discovered that I would need to implement just-in-time re-opening of the file handle. At the same time, I discovered a subtle coordination issue with compression, where a file newly marked for compression was being treated as already compressed before the compression could actually occur. After fixing this, I finally had my patch. Further real-life testing is needed before I trust that this is the version of phar that will be released as 1.2.1, but it's a grand step closer to being fully robust.
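For the curious, just-in-time re-opening can be sketched roughly like this. Again, this is Python rather than phar's C code, and `LazyEntry`, `read`, and `release` are hypothetical names chosen for the illustration: the entry remembers where its contents live and only holds a file handle while it is actually needed.

```python
import os
import tempfile


class LazyEntry:
    """Hypothetical entry that opens its archive only when first read."""

    def __init__(self, archive_path, offset, size):
        self.archive_path = archive_path
        self.offset = offset
        self.size = size
        self._fp = None  # no handle is held until someone actually reads

    def is_open(self):
        return self._fp is not None

    def read(self):
        if self._fp is None:
            # Just-in-time (re)open: pay for a handle only on demand.
            self._fp = open(self.archive_path, "rb")
        self._fp.seek(self.offset)
        return self._fp.read(self.size)

    def release(self):
        """Give the handle back to the OS; a later read() reopens it."""
        if self._fp is not None:
            self._fp.close()
            self._fp = None
```

An entry whose handle has been released still works: the next `read()` simply opens the file again, which is what makes it safe to close handles aggressively elsewhere.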
Perhaps I'm the only person who thinks this is exciting, but it's been an issue since the earliest versions of phar with write support, and now is an issue no more. This makes the next step of profiling and comparing to other solutions a lot more palatable to me, as I much prefer stability before optimization for all of the obvious reasons.
By the way, for those of you wondering how to make the transition from programming in PHP to programming PHP itself, I can't claim to be an expert, but I can tell you who is: Sara Golemon. Her book, Extending and Embedding PHP, is an indispensable resource for anyone who wants to either write extensions or dig into the source of PHP itself. Combine this with the (free) search facility of lxr.php.net and a clever use of grep, and you're well on your way to being a PHP internals coder with very little extra effort. Sara's book is worth at least twice the cover price, and I recommend it even to those who have no interest in programming extensions, as understanding a bit (or more) of the internals of PHP is worth its weight in gold when you are making design choices for your applications or libraries.