Being too platform-independent

Yes, the title says platform-independent. Let me explain: A couple of weeks ago I fixed a bug that was surprisingly difficult to track down, because it was in a part of the code that I considered to be completely finished and safe and tucked away for the past 5 years.

The buggy code was in the part of the Rubble parser responsible for detecting line endings. First a little background: Rubble syntax is stream-oriented (aka "free format"), in contrast to line-oriented syntaxes like assembly language, FORTRAN, BASIC, and Python. There is one exception: Rubble has something called line end comments, which means that anything between an unquoted "#" character and the next line ending is syntactically equivalent to whitespace. This is a proven, convenient feature for source code comments, and such line end comments have a long tradition in otherwise stream-oriented languages, from LISP to Prolog to C++ to Perl, Java, and C#.

My earliest Rubble parsers were just proof-of-concept code that assumed a Unix environment, so when a "#" character was encountered, everything up to the next newline character (Java "\n") was skipped, since this is the end-of-line convention in Unix. Actually the character introducing a line end comment was "%" instead of "#" in the early days of Rubble, for attempted compatibility with Edinburgh Prolog syntax, but that is unimportant now.

What happened next was that I decided sometime in 2004 to make this particular part of the code platform-independent. So instead of looking for the next "\n" character, the code looked for the next occurrence of the string returned by System.getProperty("line.separator"). This was the proper way to be platform-independent about line endings, or at least that's what I thought at the time.

Now fast-forward to November 2009. A customer installed Rubble on a server running Windows. You might ask: why run Windows on a server, or on any machine at all? Ever? Well, since they are a customer I try to avoid posing too many questions like that. They have their reasons. Anyway, the failure mode was that some Rubble code components never executed at all; they just failed silently. And mysteriously, at the same time there were several Rubble demos that worked perfectly.

If you have read this far you have probably figured it out by now. Yes, the Rubble code that failed had a line end comment at the very beginning of the file, and that file used the Unix end-of-line convention. On the Windows server the "line.separator" property is "\r\n" instead of just "\n", so no line ending was found and the entire file was treated as a comment.
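The failure can be reproduced in a few lines. This is only a minimal sketch, not the actual Rubble parser code: the class name, the helper `endOfComment`, and the sample input are made up for illustration, and the separator is passed in as a parameter so that the Windows value of "line.separator" can be simulated on any machine.

```java
class LineSeparatorPitfall {
    // Hypothetical helper: returns the index just past the separator that
    // ends a line end comment starting at 'start', or -1 if the separator
    // never occurs in the rest of the source.
    static int endOfComment(String source, int start, String separator) {
        int idx = source.indexOf(separator, start);
        return idx < 0 ? -1 : idx + separator.length();
    }

    public static void main(String[] args) {
        // Placeholder input (not real Rubble syntax), using Unix line endings.
        String unixFile = "# header comment\nsome code\n";

        // With the Unix separator the comment ends at the first newline.
        System.out.println(endOfComment(unixFile, 0, "\n"));

        // With the Windows separator nothing is found, so a parser built on
        // this helper would treat the rest of the file as comment text.
        System.out.println(endOfComment(unixFile, 0, "\r\n"));
    }
}
```

In the real parser the separator came from System.getProperty("line.separator"), which is why the same file behaved differently depending on where the server was running.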

So using the platform-independent way of looking for the "line.separator" property is obviously the wrong thing to do in a server environment, where data comes from multiple sources that don't necessarily comply with the server's local platform conventions.

The solution, it turns out, has been present in Java SE for several years now: just use the traditional ^ and $ patterns in regular expressions (with the MULTILINE flag set, so that they match at every line boundary rather than only at the start and end of the input). According to the docs for java.util.regex.Pattern, the following strings are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').

This handles the entire spectrum of Unicode line terminators except for '\u000B' (vertical tab) and '\u000C' (aka ASCII form feed). I don't know why Java doesn't see form feed as a line terminator, but I am willing to compromise here and stick with the Java pattern because it's clearly good enough for all practical purposes.
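As a rough illustration of the regex approach, here is a minimal sketch that replaces every line end comment with a single space. It is not the real Rubble code: the class and method names are invented, and it ignores the rule that a quoted "#" does not start a comment. It relies on two documented properties of java.util.regex.Pattern: "." does not match a line terminator by default, and "$" matches just before a line terminator when MULTILINE is set.

```java
import java.util.regex.Pattern;

class LineEndComments {
    // "#" followed by everything up to, but not including, the next line
    // terminator (or the end of the input if there is none).
    private static final Pattern COMMENT =
            Pattern.compile("#.*$", Pattern.MULTILINE);

    // Hypothetical helper: replaces each comment with one space, since a
    // line end comment is equivalent to whitespace. Quoting is NOT handled.
    static String stripComments(String source) {
        return COMMENT.matcher(source).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] samples = {
            "# unix\nnext",
            "# windows\r\nnext",
            "# old mac\rnext",
            "# form feed\fnext"   // '\f' is not a terminator for Pattern
        };
        for (String s : samples) {
            // Show whether the text after the comment survived.
            System.out.println(stripComments(s).contains("next"));
        }
    }
}
```

The first three samples keep the text after the comment regardless of which platform the code runs on. The form feed sample loses it, which is the compromise mentioned above.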

Things could have taken an entirely different turn back in 2004 if I had been brave enough to use regular expressions in the parser. Then everything would have kept working magically, without intervention from me. But in those early days I was still using Java 1.3, which didn't have a regexp library. At one point I even had a Rubble version that ran on J2ME MIDP 1.0, which didn't have List and Map abstractions, or even floating-point math operations. So "getting it right from the start" was theoretically possible in this case, but only in retrospect.

Posted by Björn Danielsson on December 15, 2009 8:48:00 PM CET
