Wednesday, October 17, 2007

iJAM, Formalized Class Loading

(This post was revised after a comment from Viktor; I had not seen that iJAM makes an exception for java.*. My apologies.)

There is a paper floating around on the net that proposes an alternative to the class loading strategy of JSR 277. Richard S. Hall pointed me to this paper and told me to write a blog post about it. So I did.

Though the paper provides a formalization of the class loading strategy, the modification it proposes is actually quite simple. The standard Java class loading rule is parent first, then self. In a modern Java system there can be many ancestors, so this rule is applied recursively until one reaches the system class loader. It is a simple rule that gives Sun the guarantee that its classes cannot be overridden at a lower level. However, this model also implies that a module can never override anything available in its parents. For example, if a module wants to use a newer version of the XML parser than the one in the platform, it would be nice if these classes could be overridden at the application level to ensure the proper version. For this reason, iJAM proposes to change the parent-first rule to local first, except for java.* and javax.* classes. This allows a module to override any other class in an ancestor class loader.
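The two delegation rules can be compared in a few lines of Java. This is only a rough sketch of the rules as described above, not code from the iJAM paper or from JSR 277; the class name DelegationOrder and its methods are made up for illustration.

```java
import java.util.List;

public class DelegationOrder {
    /** Where a class loader looks for a class. */
    enum Source { PARENT, LOCAL }

    /** Classic Java rule: always ask the parent first, then look locally. */
    static List<Source> parentFirst(String className) {
        return List.of(Source.PARENT, Source.LOCAL);
    }

    /**
     * Rule as this post describes the iJAM proposal (an assumption, not the
     * paper's formal definition): local first, except that java.* and javax.*
     * keep the parent-first order.
     */
    static List<Source> localFirstExceptPlatform(String className) {
        if (className.startsWith("java.") || className.startsWith("javax.")) {
            return List.of(Source.PARENT, Source.LOCAL);
        }
        return List.of(Source.LOCAL, Source.PARENT);
    }

    public static void main(String[] args) {
        String[] names = {
            "java.lang.Object",
            "javax.xml.parsers.SAXParser",
            "org.xml.sax.ContentHandler"
        };
        for (String name : names) {
            System.out.println(name
                + " parent-first=" + parentFirst(name)
                + " local-first=" + localFirstExceptPlatform(name));
        }
    }
}
```

Note how org.xml.sax.ContentHandler falls on the local-first side of the rule even though the platform ships that package.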

I agree with the problem, but I disagree rather strongly with the solution: it is too simple. Let me explain how I came to this conclusion.

First, loading only java.* and javax.* from the parent ignores classes that come from the bootclasspath but do not start with java.* or javax.*. An example is org.xml.sax. If this package is loaded from a module, the system classes will load their own copy of the package and modules will see another. This will cause class cast exceptions if you try to give your SAX handler to the parser, because the two sides use different class loaders for the same package.
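The effect is easy to reproduce in miniature. In the sketch below, Handler stands in for a type like a SAX handler, and LocalFirstLoader is a hypothetical loader that defines that one class itself instead of asking its parent. The sketch assumes the compiled class files can be read as resources from the class path; all names are invented for the demonstration.

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderIdentityDemo {
    /** Stand-in for a shared type such as a SAX handler; hypothetical. */
    public static class Handler {}

    /** Hypothetical loader: defines one named class itself, delegates the rest. */
    static class LocalFirstLoader extends ClassLoader {
        private final String localName;

        LocalFirstLoader(String localName, ClassLoader parent) {
            super(parent);
            this.localName = localName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(localName)) {
                return super.loadClass(name, resolve); // everything else: parent first
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineFromParentResource(name);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class bytes via the parent's resources and defines them in this loader. */
        private Class<?> defineFromParentResource(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true if an instance of the locally defined copy cannot be cast to Handler. */
    static boolean castFailsAcrossLoaders() {
        try {
            ClassLoader parent = LoaderIdentityDemo.class.getClassLoader();
            String name = Handler.class.getName();
            Class<?> copy = new LocalFirstLoader(name, parent).loadClass(name);
            Object instance = copy.getDeclaredConstructor().newInstance();
            try {
                Handler.class.cast(instance); // same name, different defining loader
                return false;
            } catch (ClassCastException expected) {
                return true;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cast fails across loaders: " + castFailsAcrossLoaders());
    }
}
```

The two Handler classes have identical names and identical bytes, yet the JVM treats them as unrelated types because they have different defining class loaders.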

Another problem is that many javax.* packages are prime candidates to be downloaded as bundles. Though there are logical reasons to treat java.* as special because overriding java.lang.Object is quite disastrous, there are no reasons to treat javax.* in the same way.

A module must be able to define its priorities in loading; there are clear use cases where overriding a platform-provided class is crucial. Can this be done as an all-or-nothing decision at the module level? I don't think so. Only in simple cases are these decisions module wide, and when was the last time you did something simple and got paid for it? Sometimes you want to provide a default implementation in case the platform, or any of the other modules, does not provide one. Other times you want to be sure to get your own version. A simple rule such as local first cannot distinguish between these cases, nor can a rule like parent first satisfy you all the time.

Another problem with the JSR 277 and iJAM rules is that they treat classes as standalone entities, not as parts of a cohesive package. If your module overrides one class of a larger package, you have something we call a split package. Split packages are nasty. First, classes in the same package have package-private visibility (the default). However, this rule only works when those classes come from the same class loader. You can get some very hard to understand errors when two classes in the same package get access errors for a field that is package private. Really, split packages are evil, and it is quite surprising that JSR 277 allows them and that iJAM proposes the same behavior. There are many other errors that can occur when half of your classes are loaded from your module and the other half from some other module. Each half could have quite interesting assumptions about the other half. Unless you enjoy debugging very strange errors, split packages are just not a recommended strategy. A package is not just part of the name of a class; it is a grouping concept that should be respected.

So how does OSGi address this issue? Well, our class loading rules are extensive because this is a highly complex area that we were unfortunate enough to have to learn over the last 9 years.

Our first rule is to load any java.* class from the parent class loader. As I explained, very few java.* classes can be overridden without wreaking havoc in some way (java.sql.* is the only example that comes to mind).

The framework then looks at the imported packages of a bundle. In OSGi, the manifest explicitly tells the framework which packages are expected to come from another bundle. These packages are listed in an Import-Package header with explicit constraints on the exporter.

If the package is not imported, the framework will look in bundles linked with Require-Bundle. If that fails, the framework then looks in the JAR and attached fragments. The developer can control searching the JAR in detail with the Bundle-ClassPath header.
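The search order described in the last three paragraphs can be sketched as a single decision function. This is a simplified illustration, not framework code: it leaves out parts of the real OSGi algorithm (for example boot delegation and dynamic imports), and the class BundleLookupSketch, its parameters, and the returned labels are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

public class BundleLookupSketch {
    /** Returns the package part of a fully qualified class name. */
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    /**
     * Decides where a class would be looked up, in the order described above.
     *
     * @param importedPackages       package name -> bundle wired via Import-Package
     * @param requiredBundlePackages package name -> bundle reached via Require-Bundle
     * @param localPackages          packages found in the bundle's own JAR and fragments
     */
    static String resolveSource(String className,
                                Map<String, String> importedPackages,
                                Map<String, String> requiredBundlePackages,
                                Set<String> localPackages) {
        // 1. java.* always comes from the parent class loader.
        if (className.startsWith("java.")) {
            return "parent";
        }
        String pkg = packageOf(className);
        // 2. An imported package comes only from its exporter; the search stops here.
        String exporter = importedPackages.get(pkg);
        if (exporter != null) {
            return "import:" + exporter;
        }
        // 3. Otherwise try the bundles linked with Require-Bundle.
        String required = requiredBundlePackages.get(pkg);
        if (required != null) {
            return "require:" + required;
        }
        // 4. Finally the bundle's own JAR and attached fragments.
        if (localPackages.contains(pkg)) {
            return "local";
        }
        return "not found";
    }

    public static void main(String[] args) {
        Map<String, String> imports = Map.of("org.xml.sax", "parser.bundle");
        Map<String, String> required = Map.of("com.example.util", "util.bundle");
        Set<String> local = Set.of("com.example.app", "org.xml.sax");

        for (String name : new String[] {
                "java.lang.String",
                "org.xml.sax.ContentHandler",
                "com.example.util.Strings",
                "com.example.app.Main" }) {
            System.out.println(name + " -> " + resolveSource(name, imports, required, local));
        }
    }
}
```

Because the lookup is keyed on the package rather than the individual class, an imported package is never mixed with local classes of the same package, which is what keeps split packages from appearing by accident.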

The class loading strategy is obviously an example that Einstein must have had in mind when he said: "Things should be as simple as possible, but not simpler". We learned the hard way that in the end, the developer must be able to control what he gets from where, but not at the expense of a potentially unworkable system. Though I like French laissez-faire, when it comes to class loading I prefer the more deterministic Anglo-Saxon approach.

To conclude, I really like the formal work that the iJAM paper shows; it is one of my secret desires to one day have a similar Z specification of the OSGi module layer. If the authors of the iJAM paper want to work on this, please let me help. However, I think that the class loading strategy in this paper is, just like the class loading strategy of JSR 277, unfortunately too simplistic for the complexity of the real world.

Peter Kriens

6 comments:

  1. When I saw iJAM pop up in the InfoQ feed in my RSS reader I of course went over and had a quick look. As soon as I realised that they were proposing self-first I dismissed it out of hand. That might have been harsh, but it seems like such a simple and obvious thing to get wrong that I wouldn't be surprised if other Java developers dismiss it in a similar way.

    I backed away pretty quickly and even felt a slight pang of betrayal to OSGi for even having a look. ;)

  2. The JBI specification, back in 2005, already included support for selecting between parent-first or self-first class loading strategies. Parameterization of boot-classloading was also included in the spec. A JBI runtime can also be viewed as a kind of service component container, not as flexible as OSGi class loading mechanisms but it was a start for the JEE world...

  3. I am not sure if Java developers dismiss this local first strategy out of hand immediately. The JSR 277 strategy is slightly, but not much better. I think it is the siren song of simplicity.

    Kind regards,

    Peter Kriens

  4. I'm a bit puzzled by your comment

    "Therefore the proposal to first
    search for all classes in the
    module first is clearly wrong,
    any class in java.* must come
    from the system class loader."

    I do not find such a statement in
    the paper. - Especially in section
    "3.1 Class definition lookup
    functions" they clearly state,
    that they fulfill the requirement
    wrt. java.*

    Can you please comment?

    Kind regards,

    Viktor Ransmayr

  5. You are right that they search first in the "Java's core library", I missed that.

    1. The Java core library is not defined in the paper. Is this everything in rt.jar? Everything that starts with java.* and javax.*? Extension libraries? For a formal paper, this is left very undefined. And one of the key things with a module system is to be able to override javax.* packages.

    2. I do not think that it invalidates what I am saying, the examples are now just using the wrong classes which will make the problems more subtle.

    I'll fix the text.

    Kind regards,

    Peter Kriens

  6. Looking further I saw that the example code shows that java.* and javax.* are coming from the parent.

    This will have very interesting effects because javax.xml requires the use of org.xml.sax, where does that one come from?

    In OSGi, the framework must set up export statements for the runtime. This allows packages to come from the boot classpath but it also allows these packages to be overridden by bundles.

    Kind regards,

    Peter Kriens
