Update: I've found a solution that will work for Pyrus. See my next blog post.
While working on Pyrus, the next-generation installer for PEAR, I've been trying to figure out how to leverage PHP's excellent new libxml-based XML validation to speed up package.xml validation in PEAR. To my great surprise and annoyance, I am starting to believe that this is the wrong thing to do.
Originally, I designed a W3C XML Schema for package.xml 2.0 using XMLSpy, but found that libxml could not handle the schema at all. Instead, with the release of PEAR 1.4.0 I designed a schema validator based on PHP's array functions and the unserialized array generated by XML_Serializer's XML_Unserializer class. Because it is not an elegant validator of W3C XML Schema or Relax NG, I called it "stupidSchemaValidate()". It validates the basic structure of the XML level by level, and so requires many function calls.
I never considered using a DTD because DTDs can't handle namespaces.
Now that I am working on the PHP 5+ implementation of Pyrus, the first thing I thought I might do is create a Relax NG schema that PHP's libxml can handle. After an entire day of fighting with the thing, I've managed to discover more than 10 simple and valid Relax NG schemas that simply don't work with the version of libxml distributed with PHP 5.2.3. In addition, with error messages like "Expecting name, got nothing here," even with the use of libxml_use_internal_errors(), I find the error reporting to be excruciatingly useless.
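For anyone who wants to reproduce this kind of experiment, here is a minimal sketch of collecting libxml's Relax NG errors through libxml_use_internal_errors(). The tiny schema, the sample document and the helper name collectRelaxNgErrors() are made up for illustration; they are not the real package.xml schema or anything that ships with PEAR.

```php
<?php
// Hypothetical toy schema: <package> must contain <name> then <version>.
$schema = <<<'RNG'
<element name="package" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="name"><text/></element>
  <element name="version"><text/></element>
</element>
RNG;

// Hypothetical broken document: <channel> appears where <version> belongs.
$xml = "<package>\n  <name>Pyrus</name>\n  <channel>pear.php.net</channel>\n</package>";

/**
 * Validate $xml against a Relax NG schema string and return
 * array($isValid, $messages), where $messages holds whatever libxml reported.
 */
function collectRelaxNgErrors($xml, $schema)
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $valid = $doc->relaxNGValidateSource($schema);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries level, code, message, file, line and column.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return array($valid, $messages);
}

list($valid, $messages) = collectRelaxNgErrors($xml, $schema);
var_dump($valid);
print_r($messages);
```

The exact wording of the messages depends on the libxml version PHP was built against, so I have not listed any expected output here.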
With my hacked-together stupidSchemaValidate(), I get wonderfully clear error messages like "unexpected <channel>, expecting one of <version>, <date> on line 23", which make it simple to debug broken package.xml formatting.
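To show the idea of validating one level at a time, here is a rough sketch. It is not the actual stupidSchemaValidate() code: that works on the array produced by XML_Unserializer, while this illustration walks a DOM tree, treats every expected tag as required, and relies on DOMNode::getLineNo(), which needs PHP 5.3 or later. The function name validateLevel() and the sample document are hypothetical.

```php
<?php
/**
 * Check that the element children of $parent appear in the order listed in
 * $expected (tag names, all required in this simplified sketch).
 * Returns a list of error strings; an empty list means this level is valid.
 */
function validateLevel(DOMElement $parent, array $expected)
{
    $errors = array();
    $i = 0;
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip whitespace text nodes and comments
        }
        if ($i >= count($expected)) {
            $errors[] = sprintf('unexpected <%s>, expecting no more tags on line %d',
                $child->nodeName, $child->getLineNo());
            break;
        }
        if ($child->nodeName !== $expected[$i]) {
            $errors[] = sprintf('unexpected <%s>, expecting <%s> on line %d',
                $child->nodeName, $expected[$i], $child->getLineNo());
            break;
        }
        $i++;
    }
    // Every child matched, but some required tags never showed up.
    if (!$errors && $i < count($expected)) {
        $errors[] = sprintf('missing <%s> in <%s>', $expected[$i], $parent->nodeName);
    }
    return $errors;
}

// Hypothetical broken document: <channel> appears where <version> belongs.
$doc = new DOMDocument();
$doc->loadXML("<package>\n  <name>Pyrus</name>\n  <channel>pear.php.net</channel>\n</package>");

$errors = validateLevel($doc->documentElement, array('name', 'version'));
print_r($errors);
```

Run against the sample document, this reports the unexpected <channel> where <version> was expected, along with the line number libxml recorded for that tag. A real validator would repeat the check for each nested level and handle optional tags and choices, which is where the many function calls come from.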
Is there anyone out there who has had any luck getting a supposedly smart external validation tool to print remotely helpful error messages when validating a complex XML file with PHP?