
XML Logging

I don't think it's an exaggeration to say that XML is one of the most misused and overengineered technologies of the past decade. XML is good for markup of text documents, but it's clearly a bad format for most other purposes. Nevertheless, many people (me included!) misuse XML for things like configuration files and data export, simply because the XML tools are so ubiquitous today. If XML is used by some part of the application, there is a temptation to use it for all sorts of other things as well, whenever the need for a representation syntax comes along. So rather than bundling, e.g., the more appropriate JSON library for data transport, a clumsier XML representation is often used simply because the tools are already in place.

Structured log files are, however, one of those cases where I think the XML format makes sense. It's actually quite reasonable to think of log files as heavily structured text documents, for which XML markup is a good fit. The classical Unix syslog format, with long lines composed of space-separated strings, becomes difficult to handle for both machines and humans as soon as the structure of the content is non-trivial.

But there is one big problem with using XML for logging. Log files are typically written incrementally, where each new entry is appended separately to the end of the file. This doesn't fit in at all with the XML document structure, which mandates that all content must be enclosed within a single top-level "root element". This is one of the many deficiencies of XML that disqualify it as a "general-purpose" syntax. But it turns out that there are workarounds for this problem. It wasn't obvious to me at first how to do this, so I have made a note of it here.

First: Write the actual log file by appending XML fragments, without any root element, where each fragment is an XML element that represents a log entry. Such log files can be rotated and concatenated without any need to worry about XML document boundaries. The JAXB library supports writing XML fragments instead of documents: just set the Marshaller property JAXB_FRAGMENT to Boolean.TRUE.
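For readers without JAXB on the classpath, the same append-a-fragment idea can be sketched with the StAX writer that ships with the JDK. This is not the JAXB approach described above, only a minimal illustration of it; the class name `FragmentLogger`, the element name `entry` and the attribute `level` are made up for the example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class FragmentLogger {

    /**
     * Appends one <entry> element to the given file. No XML declaration
     * and no root element are written, so the file stays a plain sequence
     * of fragments that can be rotated or concatenated freely.
     */
    public static void append(String file, String level, String message)
            throws IOException, XMLStreamException {
        // Open in append mode; the log is assumed to be UTF-8.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            // writeStartDocument() is deliberately not called.
            w.writeStartElement("entry");
            w.writeAttribute("level", level);
            w.writeCharacters(message);   // escapes markup characters
            w.writeEndElement();
            w.flush();
            out.write('\n');              // one entry per line, for readability
            w.close();                    // does not close the underlying writer
        }
    }
}
```

Each call to `FragmentLogger.append("logfile.txt", "INFO", "started")` adds one self-contained element to the end of the file.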

Second: Enclose the log file's content within a root element before reading, so that an XML document is presented to the parser. This was the tricky part, because it wasn't obvious to me how to do this without copying the whole log file each time some program wants to read the log document. But a small wrapper document with a special DOCTYPE declaration makes it possible:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <!DOCTYPE log [<!ENTITY data SYSTEM "logfile.txt">]>
    <log>
       &data;
    </log>
The ENTITY definition makes the XML parser fill in the contents of "logfile.txt" where it says "&data;". Such a wrapper document can be used as it is, for example as input to an XSLT processor. Or the wrapper document can be created on the fly by a Java program. Here is an example:
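    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBElement;
    import javax.xml.bind.Unmarshaller;

    // A relative name has no base URI when parsing from a stream;
    // an absolute file: URI may be the safer choice.
    String filename = "logfile.txt";

    // Wrapper document: the external entity pulls in the log fragments.
    String x = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>\n" +
       "<!DOCTYPE log [<!ENTITY data SYSTEM '" + filename + "'>]>\n" +
       "<log>&data;</log>\n";

    // Unmarshal the wrapper; the parser expands &data; into the log entries.
    InputStream s = new ByteArrayInputStream(x.getBytes("UTF-8"));
    JAXBContext jc = JAXBContext.newInstance("mypkg.jaxb.log");
    Unmarshaller u = jc.createUnmarshaller();
    JAXBElement<Log> root = (JAXBElement<Log>) u.unmarshal(s);
    Log log = root.getValue();

    for (LogEntry e : log.getLogEntry()) {
       ...
    }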
The technique can be extended to wrapping multiple log files at once. Just add more ENTITY definitions and enumerate the corresponding entities in the wrapper document.
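A minimal sketch of the multi-file variant, using only the DOM parser bundled with the JDK rather than JAXB. The class name `MultiLogWrapper` and the entity names `data0`, `data1`, ... are invented for illustration, and the sketch assumes that the parser is allowed to resolve external entities, which a hardened configuration may forbid.

```java
import java.io.File;
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MultiLogWrapper {

    /** Builds a wrapper document with one external entity per log file. */
    public static String wrapper(List<File> files) {
        StringBuilder decls = new StringBuilder();
        StringBuilder refs = new StringBuilder();
        for (int i = 0; i < files.size(); i++) {
            // Absolute file: URIs avoid depending on the parser's base URI.
            decls.append("<!ENTITY data").append(i).append(" SYSTEM \"")
                 .append(files.get(i).toURI()).append("\">\n");
            // Entities are referenced in the order the files were given.
            refs.append("&data").append(i).append(";\n");
        }
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
             + "<!DOCTYPE log [\n" + decls + "]>\n"
             + "<log>\n" + refs + "</log>\n";
    }

    /** Parses the wrapper; each file is pulled in where its entity is referenced. */
    public static Document parse(List<File> files) throws Exception {
        DocumentBuilder db =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(wrapper(files))));
    }
}
```

The resulting document has a single `<log>` root whose children are the entries of all the files, in the order the files were listed.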

Using EclipseLink in GlassFish v2.1.1

Oracle's TopLink Essentials is the bundled persistence-layer provider in GlassFish v2. Unfortunately, TopLink Essentials is totally inadequate for anything except small toy examples. I found after a few months of development that read and write operations cost O(N^2) in the number of objects involved. This of course makes TopLink Essentials useless for real work.

I noticed that GlassFish v3 has switched to EclipseLink, which is based on TopLink but presumably more mature. Upgrading to GlassFish v3 is not an option at the moment, since that would entail either upgrading to ICEfaces 2.0 or bundling JSF 1.2 with the application. ICEfaces 2.0 is on my roadmap, but migration is not scheduled until later. And bundling JSF 1.2 with the application would be too messy. However, thanks to this helpful link, I found that adding EclipseLink to GlassFish v2.1.1 is very easy. Here is what I did:
  • Download EclipseLink from http://www.eclipse.org/eclipselink/downloads/
  • Copy jlib/eclipselink.jar to $GLASSFISH_HOME/domains/domain1/lib/
  • Add the following to the <persistence-unit> element in META-INF/persistence.xml in all application jars that contain JPA-persisted classes, and rename all properties from "toplink" to "eclipselink":

        <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>

  • Finally, I had to change my JPA-persisted classes from using public fields to using annotated getter and setter methods instead. Otherwise nothing was committed to the database. This made my code a lot more verbose, but perhaps getters and setters allow EclipseLink to use some clever instrumentation techniques? Anyway, this was an unexpected snag, but manageable.
That's all! No bundling of extra jars with the application is needed; the eclipselink.jar file can be distributed separately.