This presentation is the next step in the saga of Java modularity. Sun's history of Java modularity is littered with blogs and presentations; it shows no proper requirements or design documentation whatsoever. None of these blogs and presentations has been discussed in, or presented to, the relevant JSR 294 or JSR 277 expert groups. It therefore feels awkward to react to a design that has neither visible requirements nor a proper design document. How can one judge it, except highly subjectively? By its very nature, a presentation gives an overview of an underlying construct, leaving out all the details that are so important to understand and evaluate a proper design. One can only hope that these design documents exist in Sun's offices somewhere.
However, getting modularity right in the Java platform can make a tremendous difference for hundreds of thousands of companies and millions of individuals. And clearly there is self-interest as well: it is of paramount importance to the OSGi Alliance to get it right. Because there are no discussions on the proper mailing lists, I feel forced to react with the same ill-matched tools of blogs and presentations. My apologies.
The Devoxx presentation begins with sketching the problems that JSR 294 addresses. These can be summarized as:
- JAR Hell
- Platform Fragmentation
- Startup Performance
JAR Hell is composed of the following problems (derived from Dependency Hell, DLL Hell, and JAR Hell):
- (Too) Many Transitive Dependencies. This is related to the dreadful feeling you get when you want to use A, which needs B and C, which need D, E, F, G, ad nauseam. Though a certain amount of coupling is required to enable reuse and extension mechanisms, it often turns out that many dependencies are not always necessary. However, in Java there is no way to express this optionality.
Many popular (open source) libraries have a staggering number of dependencies. For example, use Maven to compile "Hello World" for the first time. It downloads an impressive number of libraries because it has no mechanism to manage the unnecessary transitive dependencies. OSGi handles this problem by using the service model, where only the packages with service interfaces are shared, and by picking packages, which have the smallest possible granularity in the Java VM model, as the unit of sharing.
- Dependency on multiple versions. Java applications cannot have multiple versions of the same component in a single application. Well, why would you want this? Assume you rely on JAR A and JAR B. Both A and B require JAR C. Unfortunately, A requires version 1 of C and B requires version 2 of C. In Java you are out of luck. The reason is that there is only a linear class path, so either C;version=1 comes first or C;version=2 comes first. At run time, either A or B is bound to the wrong version, wreaking havoc at unexpected times and places. In OSGi, this problem is addressed by precisely wiring up bundles based on meta-data in the JAR manifest.
- Unmanaged Dependencies. Java has no way to specify dependencies on other JARs. Which JARs are actually used depends on the class loader hierarchy, the class path, JARs in folders, and magic. This is all done "blind"; there is no verification that the result matches the assumptions in the JAR files. The effect is that errors happen (too) late and not early. Trying to make a set of JARs work together can therefore be cumbersome and very hard to get right. In OSGi, this is addressed with the manifest in each bundle that specifies the assumptions of the code. This allows the framework to verify the bundle dependencies before the code gets started.
- Use of private code. All public code in a JAR file is visible to all other JAR files, and public is pervasive because it is required to implement an interface. It is therefore easy to use code that is supposed to be an implementation detail, which can easily break a client in a later version or unnecessarily restrict the provider. In OSGi, this is addressed by explicitly exporting packages. All other packages are private.
- Stomping. Stomping is the problem that you overwrite one JAR with another because the name is the same though the version is different. This can have very subtle, and much less subtle, unpleasant effects while everything looks OK. OSGi addresses this problem by having a clear install phase that separates the JAR from the internal structure. That is, two JAR files can have the same name and still both be installed in the same VM if they have different versions. This install phase connects very easily with all kinds of repositories and management systems.
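The multiple-version problem from the list above can be illustrated with a toy model. The sketch below is not OSGi code: the names WiringSketch, Jar, resolveOnClassPath, and resolveWired are hypothetical and exist only for this illustration. It contrasts a linear class path, where the first JAR with a matching name wins, with a lookup in which each importer states the version it needs.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only. All names here are hypothetical; this is not the OSGi resolver.
class WiringSketch {

    // A JAR reduced to a name and a version number.
    static final class Jar {
        final String name;
        final int version;

        Jar(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    // Linear class path: the first JAR with a matching name wins,
    // no matter which version the caller actually needs.
    static Jar resolveOnClassPath(List<Jar> classPath, String name) {
        for (Jar jar : classPath) {
            if (jar.name.equals(name)) {
                return jar;
            }
        }
        return null;
    }

    // Version-aware wiring: the importer names the version it requires
    // and is bound to that version only.
    static Jar resolveWired(List<Jar> installed, String name, int requiredVersion) {
        for (Jar jar : installed) {
            if (jar.name.equals(name) && jar.version == requiredVersion) {
                return jar;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Jar> jars = Arrays.asList(new Jar("C", 1), new Jar("C", 2));

        // A needs C version 1 and B needs C version 2.
        System.out.println("class path gives B: C version "
                + resolveOnClassPath(jars, "C").version);
        System.out.println("wiring gives B: C version "
                + resolveWired(jars, "C", 2).version);
    }
}
```

In this model the class path lookup hands the same JAR to every caller, so B is bound to whichever version happens to be listed first, while the wired lookup can serve A and B from the same set of installed JARs.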
Platform Fragmentation. The key mistake made with the profiles and configurations was caused by the lack of modularity. Java could have been refactored into a core VM (including java.lang, because everybody must share the Class and Object classes) plus sets of packages that extend this core VM. However, for some bizarre reason it was often not even legal to run a package developed in Java ME on another Java platform. The most bizarre JSR that I was ever involved with was JSR 197, which made it legal to run the javax.microedition.io package on Java SE. It is hard to believe that it took us a year to accomplish this feat, minor but important for one of the OSGi specifications.
In OSGi we came from the embedded world, so this was a serious problem for us from day one. Obviously, we could not change Java itself, but at least we could address the sub-setting problem and the unmanaged aspect of it. We came up with the execution environments. An execution environment looks a lot like a profile/configuration but it is not intended to be the final word. It is a description of a set of classes that is a proper subset of all feasible Java environments, that is, the common denominator. Compiling against this subset (we have the JARs that contain only public API for you to do this) ensures that you are not coupled to anything outside the execution environment and, as a consequence, outside any of the sub-setted profiles. For example, we have ee.minimum, which runs on all known profiles, from CDC/FP to Java SE 7. We use this execution environment to target all our APIs. We also have ee.foundation, which is aligned with the Java ME Foundation Profile. These execution environments are used by Equinox, Felix, and Knopflerfish to allow their implementations to run on the widest possible set of VMs.
However, we did not stop there. We also designed meta-data in the bundle's manifest that indicates the assumption of the bundle about its environment. A bundle cannot resolve unless the framework can establish that the VM implements one of the required execution environments.
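The resolution rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not framework code: the class ExecutionEnvironmentCheck and the method canResolve are hypothetical names, and the sketch assumes the requirement is given as a comma-separated list of execution environment names (the form used by the Bundle-RequiredExecutionEnvironment manifest header) and that a bundle without such a list has no requirement.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified sketch. The class and method names are hypothetical.
class ExecutionEnvironmentCheck {

    // Returns true when the bundle states no requirement, or when at least
    // one of the execution environments it lists is provided by the VM.
    static boolean canResolve(String requiredHeader, Set<String> providedByVm) {
        if (requiredHeader == null || requiredHeader.trim().isEmpty()) {
            return true; // assumption: no header means no constraint
        }
        for (String ee : requiredHeader.split(",")) {
            if (providedByVm.contains(ee.trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example names of execution environments a small VM might provide.
        Set<String> vm = new HashSet<>(
                Arrays.asList("OSGi/Minimum-1.1", "CDC-1.0/Foundation-1.0"));

        System.out.println(canResolve("J2SE-1.4, CDC-1.0/Foundation-1.0", vm));
        System.out.println(canResolve("J2SE-1.5", vm));
    }
}
```

The point of the design is that the check happens before any code of the bundle runs, so a mismatch shows up at resolve time rather than as a missing class much later.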
Performance. This problem is related to the work done in Java 6 where the JRE can be incrementally loaded to reduce start-up time. This speedup was necessary because the slow start-up basically made applets impossible to use: the page froze for up to a minute the first time an applet got started. There are two strategies involved in improving performance: lazy and eager. In a lazy model, the code is not activated until there is a direct need. This model works very well where you have a large application where many parts are rarely used and loading code is relatively fast (local disk). Eclipse is a good example where lazy loading is used heavily to minimize startup time and footprint. Eager loading is better when it is clear you will need the code soon and loading code is slow (over the net). A good example is the average applet. There are many variations, and anybody who has ever written a serious cache manager knows the trickiness. Performance from a module system therefore depends on the available meta-data and the initialization/activation model.
This problem is further clarified later in the presentation when it requires that a module should be able to contain partial packages to speed up downloads: java.lang is split into three different modules. This requirement is begging for more complexity and the payoff seems very slim. Splitting packages along performance boundaries will require great foresight to have any performance effects and will in almost all cases be in conflict with minimizing coupling. It is a classic example of coupling two unrelated concepts (performance, low coupling) into a single design concept (module). In practice, it always results in systems that are good neither at performance nor at decreasing the coupling.
Though performance is a crucial aspect of the VM (and some fantastic work has been done in Hotspot, even without modularity), it is important not to mix concepts that have very different optimization axes. Every time I see that happening, both axes have to be compromised.
Integration with native packaging systems. One of the poorly defined concepts in the presentation is the integration with native packaging systems. A typical native packaging system is rpm. Packaging systems use dependency graphs and scripts to modify an operating system to a new state where it provides new functionality. There is tremendous experience with packaging systems, and nowadays they are quite impressive in how reliably they work.
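The difference between the lazy and the eager strategy can be shown with a minimal sketch. It is generic Java, not taken from any module system; ActivationSketch, Lazy, and eager are hypothetical names, and the "module" being loaded is just a string standing in for costly initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Minimal sketch of the two activation strategies. All names are hypothetical.
class ActivationSketch {

    // Lazy: the initializer runs at most once, on the first call to get().
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> initializer;
        private T value;
        private boolean initialized;

        Lazy(Supplier<T> initializer) {
            this.initializer = initializer;
        }

        @Override
        public synchronized T get() {
            if (!initialized) {
                value = initializer.get();
                initialized = true;
            }
            return value;
        }
    }

    // Eager: the initializer runs immediately, before anybody asks for the value.
    static <T> Supplier<T> eager(Supplier<T> initializer) {
        T value = initializer.get();
        return () -> value;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        Supplier<String> load = () -> {
            loads.incrementAndGet(); // stands in for slow loading work
            return "module";
        };

        Supplier<String> lazy = new Lazy<>(load);
        System.out.println("loads after creating lazy: " + loads.get());
        lazy.get();
        System.out.println("loads after first use: " + loads.get());

        eager(load);
        System.out.println("loads after creating eager: " + loads.get());
    }
}
```

Which of the two pays off depends on exactly the factors named above: how likely the code is to be needed and how expensive it is to load when the need arises.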
However, take one step back. The absolute number one value of Java is its platform independence. Native packaging systems should be able to reliably provide modules to the Java platform, but I am fairly confident that we do not want to see any of these native systems inside Java. It is crucial that the Java module system is a well-defined system in Java without having to defer to a platform-dependent module system. This would be anathema to Java. However, the presentation seems to assume that a native packaging system would integrate with the VM. For example, module version syntax is not defined so it can leverage formats from packaging systems. Alas, if we only had a design document and not a presentation ...
Updated 1: Alex Buckley has told me that neither Project Jigsaw nor JSR 294 will prepare for native packaging systems ... So my fear is unjustified.
Package granularity. Package granularity has always been a hot potato in Java. Though packages look like a first class citizen, there is a lot of fudging going on to ignore them. On one side we have a Package class representing a package, we import packages in our source code, and there are clear access visibility rules for classes in the same package. However, in the class file format a package is not visible and this has led many (including the JRE) to treat packages as second class citizens.
In the presentation, it is argued that libraries often consist of multiple packages. Though the OSGi service model shows that constraining the interface to a single package works well for highly decoupled designs, it does make sense to be able to think about a group of packages as a set that belongs together. This is for me the conceptual advantage of a module: a group of packages that tightly belong together. Superpackages anybody?
Multiple module systems. There seems to be a strong implicit requirement that there will be multiple module systems in the VM. These module systems are somehow supposed to handle the run-time class loading in different ways. Sun will provide a "simple" module system for the JDK, but it will allow others for the application level. They are even polite enough to not specify a version syntax so that OSGi can use its own version syntax while Sun can continue with versions that always start with 1.
Multiple implementations of a specification sounds good, doesn't it? Hmm. Let's look at what it means. It means that programmers will have to choose a deployment format because none is specified. As a programmer I will have to choose one or more formats I support because I cannot waste the resources to support all. Do I have any gain as a programmer by having a "choice"? Nope, every choice I make makes my code incompatible with modules from other systems.
Specifications are supposed to simplify the life of programmers, not make it harder. By not having an open discussion about deployment formats and creating consensus around one format, the problem is dumped in the lap of millions of programmers, creating confusion and chaos where none should be necessary. Even if interoperability is supported, there will be lots of small problems for no obvious reason.
And there is of course a political aspect. Sun moved the deployment aspects to an OpenJDK project called Jigsaw. The scope of Jigsaw is to create a module system for the JDK and applications. Though I am fairly sure it will not match OSGi's capabilities, it will be part of the JDK. It is hard to ignore the similarity with Microsoft including Internet Explorer in their operating system because they could not compete on functionality with Netscape.
Missing Aspects
The following section details problems that I thought were an intrinsic part of a module system but that are not discussed in the presentation. I think these areas are very important and closely related to Java modularity.
Class Space Consistency. One of the hardest parts in the OSGi R4 specification was class space consistency. Once you allow multiple versions of the same package in a VM you must ensure that the different modules use the right class loaders or you get hard to diagnose class cast exceptions. That is, a class X from class loader A is not compatible with a class X from class loader B. Confusing but true. In OSGi we have the concept of a class space and we maintain consistency in this class space using the "uses" directive, providing information of what implementation dependencies a package has. With this directive, a framework can assign bundles to different class spaces and thereby ensure no collisions happen. The presentation explicitly acknowledges that this can happen in the proposed model, but does not propose to fix this.
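The claim that a class X from class loader A is not compatible with a class X from class loader B can be checked with plain Java. The sketch below is an illustration only and has nothing to do with how an OSGi framework is implemented: ClassSpaceDemo, X, and IsolatingLoader are hypothetical names, and the sketch assumes the compiled class file of X can be read as a resource through the parent class loader. It loads the same class file through two separate loaders and compares the results.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration only. All names are hypothetical.
class ClassSpaceDemo {

    // The class that is defined twice, once per loader.
    public static class X {}

    // A loader that defines one named class itself instead of delegating
    // to its parent. Every other class is delegated as usual.
    static class IsolatingLoader extends ClassLoader {
        private final String isolated;

        IsolatingLoader(ClassLoader parent, String isolated) {
            super(parent);
            this.isolated = isolated;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(isolated)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        // Assumption: the class file is visible as a resource of the parent loader.
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // True when both loaders yield a class with the same name, the two Class
    // objects differ, and an instance of one is not an instance of the other.
    static boolean sameNameDifferentClass() {
        try {
            ClassLoader parent = ClassSpaceDemo.class.getClassLoader();
            String name = X.class.getName();
            Class<?> fromA = new IsolatingLoader(parent, name).loadClass(name);
            Class<?> fromB = new IsolatingLoader(parent, name).loadClass(name);
            Object instanceFromA = fromA.getDeclaredConstructor().newInstance();
            return fromA.getName().equals(fromB.getName())
                    && fromA != fromB
                    && !fromB.isInstance(instanceFromA);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same name, incompatible classes: " + sameNameDifferentClass());
    }
}
```

A cast of instanceFromA to the class from the second loader would fail with a ClassCastException even though both classes carry the same name, which is the hard-to-diagnose failure that class space consistency is meant to rule out.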
This might not be a major problem for a JRE, where it is likely that all modules are only a bug fix away from each other. By definition, a JRE depends only on itself. It is not very likely that a JRE will have the problem of multiple versions of the same packages. However, application programmers who use a large number of open source libraries are rarely that lucky.
Supporting multiple versions in one application is one of the core aspects of JAR hell. Enabling this is therefore good. Not guaranteeing class space consistency will only create module hell.
Plugin/Extensions. The largest missing area in the presentation is a plugin extension model. One of the primary reasons to choose OSGi is to provide an extension model but the presentation assumes all modules are statically wired. Implicit in the model is that we'll be stuck with class path scanning (OK, module scanning) and class loading hacks to make today's applications.
Compatibility. Java is clearly backward compatible and I applaud Sun for the remarkable feat that Java 1 code can still run on a Java 6 VM. However, there is also forward compatibility and that is largely lacking from the JDK perspective. Most javax packages can easily run on earlier VMs. For example, the javax.script package defines how script engines can make themselves available to application code. There is nothing in this package that would make it impossible to run it on a Java 1.2 VM. However, it is only available in Java 6. If you can always run the latest VM (I guess like when you are a Sun employee) this is hardly a problem. However, for the rest of us it does pose a problem to move our code bases to the next version, which usually causes a lag of 2-4 years.
Project Jigsaw in OpenJDK and JSR 277 target Java 7, which is supposed to be out in 2010. How could people on older VMs take advantage of some of the features? Is modularity really only needed on future VMs? Looking at OSGi, it seems unnecessary to focus only on future VMs: it runs as well on Java 7 as on Java 1.2.
Preliminary Conclusion
The presentation unfortunately shows all the signs of too few eyes from too few perspectives. The interests from the VM perspective have a huge role in the presentation. However, the application programmer's perspective has been more or less ignored.
And then there is the most important aspect of all: multiple module systems. Java is in the extremely fortunate situation to have only one modularity standard today that is well adopted and highly mature. Analyzing the problems as stated in the presentation, I have not seen anything that OSGi could not do better today. Though in any other area competition is good, a module system is the technology that allows parties to compete on better implementations without technological friction. Having more than one module system will reduce the ease with which one can adopt open source libraries or commercial components. The world really does not need more than one module system in Java.
I am hoping that Mark Reinhold and Alex Buckley will bring their requirements to the OSGi Core Platform Expert Group, where we could discuss the problems and have more (some very experienced) eyes on what is really needed. I am pretty sure we can find a consensus. I actually hope we can do this soon: there is an OSGi Expert Group meeting in Boston in January and Sun is a member.
I am fairly sure there is only one requirement we can unfortunately not address: "There shall be no OSGi technology in the solution".
Peter Kriens