XML Logging
I don't think it's an exaggeration to say that XML is one of the most misused and overengineered technologies of the past decade. XML is good for markup of text documents, but it's clearly a bad format for most other purposes. Nevertheless, many people (me included!) misuse XML for things like configuration files and data export, simply because XML tools are so ubiquitous today. If XML is used by some part of the application, there is a temptation to use it for all sorts of other things as well, whenever the need for a representation syntax comes along. So rather than bundling, e.g., the more appropriate JSON library for data transport, a clumsier XML representation is often used simply because the tools are already in place.
Structured log files, however, are one of those cases where I think the XML format makes sense. It's actually quite reasonable to think of log files as heavily structured text documents, for which XML markup is a good fit. The classical Unix syslog format, with long lines composed of space-separated strings, becomes difficult for both machines and humans to handle whenever the content's structural complexity is non-trivial.
But there is one big problem with using XML for logging. Log files are typically written incrementally, with each new entry appended separately to the end of the file. This doesn't fit at all with the XML document structure, which mandates that all content be enclosed within a single top-level "root element". This is one of the many deficiencies of XML that disqualify it as a "general-purpose" syntax. But it turns out that there are workarounds for this problem. It wasn't obvious to me at first how to do this, so I have made a note of it here.
First: Write the actual log file by appending XML fragments, without any root element, where each fragment is an XML element that represents a log entry. Such log files can be rotated and concatenated without any need to worry about XML document boundaries. The JAXB library supports writing XML fragments instead of documents: just set the Marshaller property JAXB_FRAGMENT to "true".

Second: Enclose the log file's content within a root element before reading, so that an XML document is presented to the parser. This was the tricky part, because it wasn't obvious to me how to do this without copying the whole log file each time some program wants to read the log document. But a small wrapper document with a special DOCTYPE declaration makes it possible:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE log [<!ENTITY data SYSTEM "logfile.txt">]>
<log>
&data;
</log>

The ENTITY definition makes the XML parser used by JAXB fill in the contents of "logfile.txt" where it says "&data;". Such a wrapper document can be used as is, for example as input to an XSLT processor. Or the wrapper document can be created on the fly by a Java program. Here is an example:
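import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Unmarshaller;

// Build the wrapper document in memory; the external entity "data"
// refers to the fragment-only log file, which is never copied.
String filename = "logfile.txt";
String x = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>\n" +
    "<!DOCTYPE log [<!ENTITY data SYSTEM '" + filename + "'>]>\n" +
    "<log>&data;</log>\n";
InputStream s = new ByteArrayInputStream(x.getBytes("UTF-8"));
// Context path: the package holding the JAXB classes for Log and LogEntry.
JAXBContext jc = JAXBContext.newInstance("mypkg.jaxb.log");
Unmarshaller u = jc.createUnmarshaller();
JAXBElement<Log> root = (JAXBElement<Log>) u.unmarshal(s);
Log log = root.getValue();
for (LogEntry e : log.getLogEntry()) {
    // ...
}

The technique can be extended to wrapping multiple log files at once. Just add more ENTITY definitions and enumerate the corresponding entities in the wrapper document.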
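To try both halves of the technique on a current JDK, where JAXB is no longer bundled (it was removed in Java 11), here is a minimal sketch that swaps in JDK-only APIs instead of JAXB: StAX's XMLStreamWriter appends fragment entries without any XML declaration or root element, and a DOM DocumentBuilder reads several log files through a wrapper document with one ENTITY per file. The class and method names, the `<entry level="...">` format, and the file names log1.txt and log2.txt are all hypothetical. It assumes the parser's default configuration resolves external entities; parsers hardened against XXE attacks will refuse to do so.

```java
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FragmentLogDemo {

    // Step 1 (hypothetical format): append one <entry> element to the log file.
    // No writeStartDocument() call, so the file stays a plain sequence of fragments.
    static void append(Path logFile, String level, String message) throws Exception {
        try (Writer w = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(w);
            xml.writeStartElement("entry");
            xml.writeAttribute("level", level);
            xml.writeCharacters(message);   // escapes <, > and &
            xml.writeEndElement();
            xml.flush();
            xml.close();                    // does not close the underlying Writer
            w.write("\n");
        }
    }

    // Step 2: wrap the fragment files in a <log> root element via one
    // external entity per file, so the parser sees a single document.
    static Document readLogs(Path dir, String... fileNames) throws Exception {
        StringBuilder dtd = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < fileNames.length; i++) {
            dtd.append("<!ENTITY data").append(i)
               .append(" SYSTEM '").append(fileNames[i]).append("'>");
            body.append("&data").append(i).append(';');
        }
        String wrapper = "<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<!DOCTYPE log [" + dtd + "]>\n"
                + "<log>" + body + "</log>\n";
        InputSource src = new InputSource(new StringReader(wrapper));
        // Relative SYSTEM identifiers are resolved against this system ID.
        src.setSystemId(dir.resolve("wrapper.xml").toUri().toString());
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(src);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xmllog");
        append(dir.resolve("log1.txt"), "INFO", "server started");
        append(dir.resolve("log1.txt"), "WARN", "disk < 10% free");
        append(dir.resolve("log2.txt"), "INFO", "server stopped");

        Document doc = readLogs(dir, "log1.txt", "log2.txt");
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            System.out.println(e.getAttribute("level") + ": " + e.getTextContent());
        }
    }
}
```

Because each log file is only ever appended to, rotation and concatenation stay trivial; all the document structure lives in the throwaway wrapper string.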
Using EclipseLink in GlassFish v2.1.1
Oracle's TopLink Essentials is the bundled persistence-layer provider in GlassFish v2. Unfortunately TopLink Essentials is totally inadequate for anything except small toy examples. I found after a few months of development that read and write operations cost O(N²) in the number of objects involved. This of course makes TopLink Essentials useless for real work.
I noticed that GlassFish v3 has switched to EclipseLink, which is based on TopLink but presumably more mature. Upgrading to GlassFish v3 is not an option at the moment, since that would entail either upgrading to ICEfaces 2.0 or bundling JSF 1.2 with the application. ICEfaces 2.0 is on my roadmap, but migration is not scheduled until later. And bundling JSF 1.2 with the application would be too messy. However, thanks to this helpful link, I found that adding EclipseLink to GlassFish v2.1.1 is very easy. Here is what I did:
- Download EclipseLink from http://www.eclipse.org/eclipselink/downloads/
- Copy jlib/eclipselink.jar to $GLASSFISH_HOME/domains/domain1/lib/
- Add the following to the <persistence-unit> element in META-INF/persistence.xml in all application jars that contain JPA-persisted classes, and rename all properties from "toplink" to "eclipselink":
<provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
- Finally, I had to change my JPA-persisted classes from using public fields to using annotated getter and setter methods. Otherwise nothing was committed to the database. This made my code a lot more verbose, but perhaps getters and setters allow EclipseLink to use some clever instrumentation techniques? Anyway, this was an unexpected snag, but manageable.
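The persistence.xml edits in the steps above are mechanical enough to script when there are many application jars. Here is a minimal, hypothetical sketch (the class PersistenceXmlMigrator, the unit name myPU and the class mypkg.Customer are made up for illustration) that uses the JDK's DOM API to set the EclipseLink provider in every `<persistence-unit>` and rename properties with a "toplink." prefix to "eclipselink.". It assumes each TopLink property has an EclipseLink counterpart with the same suffix, which holds for common ones such as ddl-generation and logging.level but is worth checking against the EclipseLink documentation. It does not touch the entity classes, so the field-to-property change still has to be made by hand.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PersistenceXmlMigrator {

    static final String PROVIDER = "org.eclipse.persistence.jpa.PersistenceProvider";

    static String migrate(String persistenceXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(persistenceXml)));

        // Step 1: make EclipseLink the provider of every persistence unit.
        NodeList units = doc.getElementsByTagNameNS("*", "persistence-unit");
        for (int i = 0; i < units.getLength(); i++) {
            Element unit = (Element) units.item(i);
            NodeList existing = unit.getElementsByTagNameNS("*", "provider");
            if (existing.getLength() > 0) {
                existing.item(0).setTextContent(PROVIDER);
                continue;
            }
            Element provider = doc.createElementNS(unit.getNamespaceURI(), "provider");
            provider.setTextContent(PROVIDER);
            // In the persistence schema, <provider> may only be preceded by <description>.
            Node ref = nextElement(unit.getFirstChild());
            if (ref != null && "description".equals(ref.getLocalName())) {
                ref = nextElement(ref.getNextSibling());
            }
            unit.insertBefore(provider, ref); // a null ref appends at the end
        }

        // Step 2: rename "toplink.*" properties to "eclipselink.*".
        NodeList props = doc.getElementsByTagNameNS("*", "property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getAttribute("name");
            if (name.startsWith("toplink.")) {
                p.setAttribute("name", "eclipselink." + name.substring("toplink.".length()));
            }
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Returns n if it is an element, otherwise its next element sibling (or null).
    static Node nextElement(Node n) {
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getNextSibling();
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        String before = "<persistence xmlns='http://java.sun.com/xml/ns/persistence' version='1.0'>"
                + "<persistence-unit name='myPU'>"
                + "<class>mypkg.Customer</class>"
                + "<properties>"
                + "<property name='toplink.ddl-generation' value='create-tables'/>"
                + "</properties>"
                + "</persistence-unit>"
                + "</persistence>";
        System.out.println(migrate(before));
    }
}
```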
The eclipselink.jar file can be distributed separately.
Being too platform-independent
Yes, the title says platform-independent. Let me explain: A couple of weeks ago I fixed a bug that was surprisingly difficult to track down, because it was in a part of the code that I had considered completely finished, safe, and tucked away for the past 5 years.
The buggy code was in a part of the Rubble parser that was responsible for detecting line endings. First a little background: Rubble syntax is stream-oriented (aka "free format"), in contrast to line-oriented syntaxes like Machine-Code Assembler, FORTRAN, BASIC, and Python. There is one single exception: Rubble has something called line end comments, which means that anything between an unquoted "#" character and the next line ending is syntactically equivalent to whitespace. This has proven to be a convenient feature for source code comments, and such line end comments have a long tradition in otherwise stream-oriented languages, from LISP to Prolog to C++ to Perl, Java, and C#.
My earliest Rubble parsers were just proof-of-concept code that assumed a Unix environment, so when a "#" character was encountered, everything up to the next newline character (Java "\n") was skipped, since this is the end-of-line convention in Unix. Actually, the character introducing a line end comment was "%" instead of "#" in the early days of Rubble, for attempted compatibility with Edinburgh Prolog syntax, but that is unimportant now.

What happened next was that sometime in 2004 I decided to make this particular part of the code platform-independent. So instead of looking for the next "\n" character, the code looked for the next occurrence of the string returned by System.getProperty("line.separator"). This was the proper way to be platform-independent about line endings, or at least that's what I thought at the time.

Now fast-forward to November 2009. A customer installed Rubble on a server running Windows. You might ask: why run Windows on a server, or on any machine at all? Ever? Well, since they are a customer I try to avoid posing too many questions like that. They have their reasons. Anyway, the failure mode was that some Rubble code components never executed at all; they just failed silently. And mysteriously, at the same time several Rubble demos worked perfectly.
If you have read this far, you have probably figured it out by now. Yes, the Rubble code that failed had a line end comment at the very beginning of the file, and that file used the Unix end-of-line convention. On the Windows server the "line.separator" property is "\r\n" instead of just "\n", so no line ending was found and the entire file was treated as a comment.

So the supposedly platform-independent approach of looking for the "line.separator" string is clearly the wrong thing to do in a server environment, where data comes from multiple sources that don't necessarily follow the server's local platform conventions.

The solution, it turns out, has been present in Java SE for several years now: just use the traditional ^ and $ patterns in regular expressions (with the Pattern.MULTILINE flag set, since by default they only match at the start and end of the whole input). According to the docs for java.util.regex.Pattern, the following strings are recognized as line terminators:
- A newline (line feed) character ('\n'),
- A carriage-return character followed immediately by a newline character ("\r\n"),
- A standalone carriage-return character ('\r'),
- A next-line character ('\u0085'),
- A line-separator character ('\u2028'), or
- A paragraph-separator character ('\u2029').
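A minimal sketch of the difference, assuming a simplified comment syntax where every "#" starts a line end comment (the real Rubble parser must also ignore "#" inside quotes, which is not handled here). The hypothetical skipCommentWithSeparator mimics the 2004 approach of searching for a fixed separator string such as the value of "line.separator"; the hypothetical stripComments instead uses a regular expression compiled with Pattern.MULTILINE, so any of the terminators listed above ends the comment.

```java
import java.util.regex.Pattern;

public class LineEndComments {

    // Re-creation of the fragile approach: the comment ends at the next
    // occurrence of one fixed separator string. If that string never occurs,
    // the rest of the input is swallowed as part of the comment.
    static int skipCommentWithSeparator(String src, int hashPos, String separator) {
        int end = src.indexOf(separator, hashPos);
        return end < 0 ? src.length() : end;
    }

    // Regex-based version: in MULTILINE mode $ matches just before any line
    // terminator, and . never matches a terminator, so \n, \r\n, \r, \u0085,
    // \u2028 and \u2029 all end the comment.
    private static final Pattern LINE_END_COMMENT =
            Pattern.compile("#.*$", Pattern.MULTILINE);

    static String stripComments(String src) {
        // A line end comment is equivalent to whitespace, so replace it with a space.
        return LINE_END_COMMENT.matcher(src).replaceAll(" ");
    }

    public static void main(String[] args) {
        String unixFile = "# header comment\nrun(x).\n";
        // Simulate the Windows server, where line.separator is "\r\n".
        int resume = skipCommentWithSeparator(unixFile, 0, "\r\n");
        System.out.println("whole file swallowed: " + (resume == unixFile.length()));
        System.out.println(stripComments(unixFile).trim());
        System.out.println(stripComments("# header\r\nrun(x).\r\n").trim());
    }
}
```

The regex version never consults the local platform at all, which is exactly what a server reading files from many sources needs.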
Things could have taken an entirely different turn back in 2004 if I had been brave enough to use regular expressions in the parser. Then everything would have kept working magically, without intervention from me. But in those early days I was still using Java 1.3, which didn't have a regexp library. At one point I even had a Rubble version that ran on J2ME MIDP 1.0, which didn't have List and Map abstractions, or even floating-point math operations. So "getting it right from the start" was theoretically possible in this case, but only in retrospect.
