Thursday, May 26, 2011

The Unbearable Lightness of Jigsaw

Yesterday Mark Reinhold posted a link to the new requirements for Jigsaw. He states a more ambitious goal: Jigsaw is no longer meant only for VM developers, as was said before. The good news is that OSGi is recognized. The even better news is that, now that there are requirements, it is obviously clear that OSGi meets virtually all of them, and then some. The bad news is that despite OSGi meeting the requirements, it does not look likely that Jigsaw will go away anytime soon.

As this round will likely get me another 15 minutes of fame, let me use this fame to try to explain why I consider the Jigsaw dependency model harmful to the Java eco-system.

Jigsaw's dependency model must be based on Maven concepts according to requirement #1. This model is the same as Require-Bundle in OSGi. It is based on a module requiring other modules, where modules are deployment artifacts like JARs. To deploy, you transitively traverse all their dependencies and put them in the module system to run. That is, a module for Jigsaw is a deployment module, not a language module.
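
To make the model concrete, here is a minimal Java sketch of that traversal, assuming a module is nothing more than a name with a list of required module names. The class `ModuleClosure` and any module names you feed it are hypothetical; this is neither Jigsaw's nor Maven's actual resolver, only the shape of the "requires module" walk.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of JAR-level ("requires module") resolution; module names are plain strings. */
final class ModuleClosure {

    /**
     * Returns the root plus every module reachable through "requires" edges, in breadth-first order.
     * Everything returned must be deployed, whether or not the root's code can ever reach it.
     */
    static Set<String> deploymentSet(String root, Map<String, List<String>> requires) {
        Set<String> deployed = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(root));
        while (!todo.isEmpty()) {
            String module = todo.removeFirst();
            if (!deployed.add(module)) continue;                    // already scheduled for deployment
            todo.addAll(requires.getOrDefault(module, List.of()));  // whole-module edges, no package filter
        }
        return deployed;
    }
}
```

With made-up edges such as my.app → spring-aspects → spring-orm → hibernate → log4j, `deploymentSet("my.app", graph)` returns every one of those modules, because the walk never asks which packages my.app actually uses.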

It is a popular solution because it is so easy to understand. However, Einstein already warned us: "A solution should be as simple as possible, but not simpler." The problem of managing today's code base for large applications is complex, and all of us in the Java eco-system will be harmed by a simplistic solution.

The key problem is that if you depend on one part of a JAR you inherit all its transitive dependencies. For example, take the Spring Aspects JAR. Even if my code only uses the org.springframework.beans.factory.aspectj package in this JAR, the Spring Aspects JAR will unnecessarily provide me with the dao, orm, jpa, and transaction packages as well. If I deploy my application I am forced to drag in the dependencies of these packages, which drag in other JARs, which each have additional dependencies, etc. etc. During deployment tens (if not hundreds) of JARs are dragged in even though those packages can never be reached from my code. Not only is this wasteful in memory, time, and complexity, it also inevitably causes compatibility problems. For example, if the jpa dependency and the orm dependency transitively depend on different versions of the log4j JAR, I cannot even run my code, for no real reason whatsoever. In practice this problem is often ignored; for example, Maven will take the first log4j version it encounters, regardless of whether this is a suitable version. The value of type safety is questionable if that kind of fudging is acceptable.
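
A small sketch of the log4j scenario, assuming each JAR is identified by an artifact name plus a version string. `VersionConflicts` and the coordinates in the test are hypothetical, and `firstWins` only models the "first version encountered wins" policy described above; it is not Maven's actual version mediation algorithm.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class VersionConflicts {

    /** A deployment artifact identified by name and version (illustrative only). */
    record Coord(String artifact, String version) {}

    /** Breadth-first walk of the JAR graph; records every version reached per artifact, in encounter order. */
    static Map<String, Set<String>> versionsReached(Coord root, Map<Coord, List<Coord>> requires) {
        Map<String, Set<String>> versions = new TreeMap<>();
        Set<Coord> seen = new HashSet<>();
        Deque<Coord> todo = new ArrayDeque<>(List.of(root));
        while (!todo.isEmpty()) {
            Coord c = todo.removeFirst();
            if (!seen.add(c)) continue;
            versions.computeIfAbsent(c.artifact(), a -> new LinkedHashSet<>()).add(c.version());
            todo.addAll(requires.getOrDefault(c, List.of()));
        }
        return versions;
    }

    /** Artifacts reached in more than one version; a flat class path can hold only one of each. */
    static Map<String, Set<String>> conflicts(Coord root, Map<Coord, List<Coord>> requires) {
        Map<String, Set<String>> result = new TreeMap<>();
        versionsReached(root, requires).forEach((artifact, vs) -> {
            if (vs.size() > 1) result.put(artifact, vs);
        });
        return result;
    }

    /** The "first version encountered wins" policy: silently drops every other version. */
    static Map<String, String> firstWins(Coord root, Map<Coord, List<Coord>> requires) {
        Map<String, String> chosen = new TreeMap<>();
        versionsReached(root, requires).forEach((artifact, vs) -> chosen.put(artifact, vs.iterator().next()));
        return chosen;
    }
}
```

If a hypothetical jpa-module requires log4j 1.2.16 and an orm-module requires log4j 1.2.8, `conflicts` reports both versions, while `firstWins` quietly keeps whichever one the walk met first.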

Maven users are already all too familiar with this problem: many complain that Maven downloads the Internet. This is not Maven bashing; Maven only does what it is told to do in the POMs. In his blog, Tim O'Brien explains that Maven does not download the Internet, people download the Internet. Though he rightly blames the POMs, he ignores the fact that the dependency model makes it hard to work differently. It is like wanting a candy bar but only being able to get the whole supermarket. He advises making each JAR self-contained and completely modularized so that it does not drag in extraneous dependencies. Agreed. Unfortunately, this means you get lots and lots of JARs. And with lots of JARs on your module path and no side-by-side versioning, compatibility issues like the log4j example above explode. A critical reading of the requirements suggests that Jigsaw is not likely to support multiple simultaneous versions at run-time.

In Unix dependency managers the transitive aspect is not such a big deal because dependencies are relatively coarse grained, and there is a reason why you usually have to build stuff to make it compatible with your platform. Java is (thank God) different. Java dependencies are much more fine grained, they are binary portable to any platform, and, most important of all, they often depend on a contract instead of on an implementation. Unix was brilliant 40 years ago and is still impressive today, but let's not undervalue what Java brought us. We must understand the differences before we revert to 40-year-old operating system technology to solve the problems of a modern and unique programming environment like Java.

The key advantage that Java brings to the dependency landscape is that each class file encodes all its dependencies. Give me your class file and I will tell you what packages you depend on. For example, when I compile my JAX-RS class against the Jersey JAR(s), you will find that the class file only depends on the javax.ws.rs package, even though the Jersey JAR contains lots of implementation packages. However, in the Jigsaw world I must select a module JAR that provides that package. The best case is to pick the JSR 311 API JAR for compilation to ensure I do not accidentally pick up implementation classes. However, I do not want to make the deployer's life hard by unnecessarily forcing a specific implementation at run-time. So which module should I depend on?
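
The claim that a class file tells you which packages it depends on can be checked with a short constant-pool reader. This is a minimal sketch (the class name `ClassDeps` is hypothetical): it only collects `CONSTANT_Class` entries, including the class's own package, so types that appear solely in method descriptors, generic signatures, annotations, or reflective string constants are missed. Tools such as bnd do a far more thorough analysis.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ClassDeps {

    /** Returns the packages (dotted form) named by the CONSTANT_Class entries of one class file. */
    static Set<String> packagesOf(InputStream classFile) {
        try (DataInputStream in = new DataInputStream(classFile)) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();                       // minor_version
            in.readUnsignedShort();                       // major_version
            int count = in.readUnsignedShort();           // constant_pool_count (entries are 1..count-1)
            String[] utf8 = new String[count];            // CONSTANT_Utf8 values by index
            int[] classNameIndex = new int[count];        // CONSTANT_Class name_index by index, 0 = not a class
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1 -> utf8[i] = in.readUTF();                     // Utf8: u2 length + modified UTF-8
                    case 7 -> classNameIndex[i] = in.readUnsignedShort(); // Class
                    case 8, 16, 19, 20 -> in.readUnsignedShort();         // String, MethodType, Module, Package
                    case 15 -> { in.readUnsignedByte(); in.readUnsignedShort(); } // MethodHandle
                    case 3, 4, 9, 10, 11, 12, 17, 18 -> in.readInt();     // four-byte entries
                    case 5, 6 -> { in.readLong(); i++; }                  // Long/Double occupy two slots
                    default -> throw new IllegalArgumentException("unknown constant tag " + tag);
                }
            }
            Set<String> packages = new TreeSet<>();
            for (int i = 1; i < count; i++) {
                if (classNameIndex[i] == 0) continue;
                String name = utf8[classNameIndex[i]];
                if (name.startsWith("[")) {                   // array type, e.g. "[[Ljava/lang/Object;"
                    int dims = name.lastIndexOf('[') + 1;
                    if (name.charAt(dims) != 'L') continue;   // primitive array such as "[I"
                    name = name.substring(dims + 1, name.length() - 1);
                }
                int slash = name.lastIndexOf('/');
                if (slash > 0) packages.add(name.substring(0, slash).replace('/', '.'));
            }
            return packages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ClassDeps.packagesOf(String.class.getResourceAsStream("String.class"))` includes `java.lang` and `java.util` among the packages it reports.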

Despite common belief, we do not need fidelity between build time and run-time across all phases; we need to compile against an API and run against implementations that are compatible with that API. A good Java module system must allow me to depend on an API module at compile time and select an appropriate implementation at deployment time. Forcing the compile-time dependency to be the same as the run-time dependency creates brittle systems, believe me. The problem can be solved by separating the concept of a deployment module (the JAR) from the language module.
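
Plain Java SE already contains a small-scale form of "compile against the API, pick the implementation at deployment time": `java.util.ServiceLoader`. The sketch below is only an analogue of that idea, not OSGi's µservice registry; the `Greeter` interface and the fallback lambda are hypothetical. The client is compiled against `Greeter` alone, and whichever provider the deployer puts on the path (registered under `META-INF/services/` with the interface's binary name) is used at run-time.

```java
import java.util.ServiceLoader;

/** Sketch: client code compiled against an API type only; the implementation is chosen at deployment. */
final class ApiOnlyClient {

    /** Hypothetical API contract; in a real system it would live in its own API JAR. */
    public interface Greeter {
        String greet(String name);
    }

    /**
     * Returns the first Greeter provider found by ServiceLoader, or a built-in default if the
     * deployer supplied none (the fallback exists only to keep this sketch runnable).
     */
    static Greeter greeter() {
        return ServiceLoader.load(Greeter.class)
                .findFirst()
                .orElse(name -> "Hello, " + name);
    }
}
```

Nothing in `ApiOnlyClient` names an implementation class, so swapping implementations is a deployment decision rather than a recompilation.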

We could define the concept of a language module in Java and allow a JAR to contain a number of these modules. Surprisingly (for some), Java already provides such a language module: packages. Packages provide a unique name space, privacy, and grouping, all hallmarks of modularity. The JLS introduction even names them as similar to Modula's modules. Actually, module dependencies on packages are already encoded in the class files. The only thing we need to add to the existing system is versioning information on the package, and to record this version in the class file. During deploy-time a resolver can take a set of JARs (which can be complete) and calculate which modules should be added depending on the actual environment. For example, on a Mac it might deploy a different set of additional JARs than on a PC or Linux.
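
A toy version of such a resolver, under loudly simplified assumptions: versions are just major.minor, each import is a half-open range [floor, ceiling), each package gets exactly one provider, and the highest matching export wins. `PackageResolver`, `Jar`, and all JAR and package names in the test are hypothetical. A real OSGi resolver also handles uses constraints, optional imports, and side-by-side versions, none of which this sketch attempts.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy package-level resolver: JARs export versioned packages and import version ranges. */
final class PackageResolver {

    /** Simplified version: major.minor only. */
    record Version(int major, int minor) implements Comparable<Version> {
        public int compareTo(Version o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }
    }

    /** Half-open version range [floor, ceiling). */
    record Range(Version floor, Version ceiling) {
        boolean includes(Version v) {
            return v.compareTo(floor) >= 0 && v.compareTo(ceiling) < 0;
        }
    }

    /** A deployment artifact: exported package -> version, imported package -> acceptable range. */
    record Jar(String name, Map<String, Version> exports, Map<String, Range> imports) {}

    static Version v(String s) {
        String[] parts = s.split("\\.");
        return new Version(Integer.parseInt(parts[0]), parts.length > 1 ? Integer.parseInt(parts[1]) : 0);
    }

    static Range range(String floor, String ceiling) {
        return new Range(v(floor), v(ceiling));
    }

    /** Returns the JARs needed to satisfy appImports, following package imports transitively. */
    static Set<String> resolve(Map<String, Range> appImports, List<Jar> repository) {
        Set<String> deployed = new LinkedHashSet<>();
        Set<String> wired = new HashSet<>();   // packages already given a provider (one per package here)
        Deque<Map.Entry<String, Range>> todo = new ArrayDeque<>(appImports.entrySet());
        while (!todo.isEmpty()) {
            Map.Entry<String, Range> req = todo.removeFirst();
            String pkg = req.getKey();
            if (!wired.add(pkg)) continue;
            // Pick the JAR exporting the highest version of pkg that lies inside the requested range.
            Jar provider = repository.stream()
                    .filter(j -> j.exports().containsKey(pkg) && req.getValue().includes(j.exports().get(pkg)))
                    .max(Comparator.comparing(j -> j.exports().get(pkg)))
                    .orElseThrow(() -> new IllegalStateException("no provider for " + pkg));
            // Only a newly deployed JAR contributes new imports to resolve.
            if (deployed.add(provider.name())) todo.addAll(provider.imports().entrySet());
        }
        return deployed;
    }
}
```

With a repository holding a JSR 311 API JAR, a Jersey-like JAR, two log4j-like JARs, and an unrelated ORM JAR, an application importing only `javax.ws.rs` resolves to the API JAR alone, and the ORM JAR's unsatisfiable import never matters because nothing reaches it.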

Managing real-world dependencies requires this indirection. From 2010 to 2017 the amount of software will double. If we want a fighting chance to keep up, we must decouple compile-time from deployment-time dependencies and have a module system that can handle it.

It seems so obvious to build on an existing concept in the Java language that it calls for an explanation of why the transitive dependency model on JARs is so popular. The only reason I can think of is that it is easier for the module system providers to program. In the Jigsaw model traversing the dependencies is as easy as taking the module name + version, combining it with a repository URL, doing some string juggling, and using the result as the URL of your module. Voila! Works with any web server or file system!
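
The "string juggling" is easy to show. This sketch uses the Maven 2 repository layout (groupId with dots turned into slashes, then artifactId, version, and `artifactId-version.jar`) because the requirements tie Jigsaw to Maven concepts; Jigsaw's own repository format may differ, and the repository URL in the example is made up.

```java
import java.net.URI;

/** Sketch: resolving a name + version to a download location needs only string concatenation. */
final class RepoPath {

    /** Maven 2 layout: {repo}/{groupId with '.' as '/'}/{artifactId}/{version}/{artifactId}-{version}.jar */
    static URI artifactUri(String repository, String groupId, String artifactId, String version) {
        String base = repository.endsWith("/") ? repository : repository + "/";
        return URI.create(base + groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar");
    }
}
```

For instance, `RepoPath.artifactUri("https://repo.example.org/maven2", "org.springframework", "spring-aspects", "3.0.5.RELEASE")` yields `https://repo.example.org/maven2/org/springframework/spring-aspects/3.0.5.RELEASE/spring-aspects-3.0.5.RELEASE.jar`. No index or lookup is involved, which is exactly why this model is convenient for the provider.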

Package dependencies require an indirection that uses a mapping from package name to deployment artifact. The Jigsaw/Maven model is like having no phone number portability because the operators found a lookup too difficult. Or worse, it is like having a world without Google ...
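
The indirection itself is not much code. A minimal sketch, assuming the repository can list the packages each artifact exports (the class `PackageIndex` and the artifact and package names in the test are hypothetical): invert that listing into an index from package name to the candidate artifacts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of the package-name-to-artifact lookup that package dependencies require. */
final class PackageIndex {

    /** Inverts "artifact -> exported packages" into "package -> artifacts that can provide it". */
    static Map<String, List<String>> invert(Map<String, Set<String>> packagesByArtifact) {
        Map<String, List<String>> providers = new TreeMap<>();
        packagesByArtifact.forEach((artifact, packages) ->
                packages.forEach(pkg -> providers.computeIfAbsent(pkg, p -> new ArrayList<>()).add(artifact)));
        providers.values().forEach(list -> list.sort(Comparator.naturalOrder())); // deterministic order
        return providers;
    }
}
```

A resolver then chooses among the candidates for each imported package, for example picking the API JAR at compile time and a full implementation at deployment time.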
 
So for the convenience of a handful of module system providers (in the case of Jigsaw just one, because it is not a specification) we force the rest of us to suffer? Though people have learned to work around the problems inherent in this dependency model with dynamic class loading and by not telling the truth in their POMs, we should not encode this in my favorite language. Should we not try to address this problem when a new module system is started? Or better, use an already mature solution that does not suffer from the impedance mismatch between compile-time and run-time?

At JavaOne 2007 Gilad Bracha, then Sun's man in charge of JSR 277, told us he had started JSR 294 because he did not trust the deployers to handle the language issues. At that particular time this remark puzzled me.

Peter Kriens

P.S. Opinions stated in this blog are mine alone and do not have to agree with the OSGi Alliance's position

9 comments:

  1. Funny thing, I gave a presentation at the Toulouse JUG just tonight. And I was pointing out exactly the main difference between the Maven/Ivy and the OSGi dependency models you just wrote about, a difference I certainly learned on this blog :). I just wish you had blogged earlier so I could have answered a question from the audience about the revival of Jigsaw, which I didn't know about until your post.

  2. Sorry :-)

    You can look up earlier rants against Require-Bundle, which is OSGi's version of the Maven/Ivy/Jigsaw model.

  3. Hi Peter. Actually, I think OSGi is not bad, but some of your statements seem to me too simplistic and just slightly objective:

    "The key problem is that if you depend on one part of a JAR you inherit all its transitive dependencies. [..] If I deploy my application I am forced to drag in the dependencies of these packages, which drag in other JARs, which each have additional dependencies, etc. etc."

    You don't have to provide the transitive dependencies of your bundle in the OSGi runtime? How did your bundle start when its transitive dependencies cannot be resolved?

    "The key advantage that Java brings to the dependency landscape is that each class file encodes all its dependencies."

    That is only half of the truth. Dependencies to classes/resources that will be loaded via reflection are not encoded in the class file. You could analyze the string constants but this is ambiguous and not deterministic.

    "Surprisingly (for some), Java already provides such a language module, they are called packages. Packages provide a unique name space, privacy, and grouping, all hallmarks of modularity."

    Let me add two hallmarks of modules: self-contained (closed functional unit) and potentially replaceable.

    I've rarely seen a functionality contributed by only one package. In most cases, we had a JAR/bundle which contributes a closed functionality to the system, e.g. persistence, the model, logging... For me, these JARs are subdivided via packages into identifiable logical areas, which serve to fulfill the overall functionality of the module.

    Replaceable... I have really, really never replaced a single package! I always replaced JARs, or started and stopped bundles (which are also JARs). Surprisingly (for some), the deployment unit!

    "The JLS introduction even names them as similar to Modula's modules."

    See last few comments here for more information.

    "Actually, module dependencies on packages are already encoded in the class files."

    No, see above...

    "The only thing we need to add to the existing system is versioning information on the package and record this version in the class file."

    Yeah, it's already a challenge to establish and maintain proper versions and an accurate release process (that adjusts these versions) for the bunch of JARs in our systems... let's do that for all of the packages and classes contained in these JARs too!

    Sorry, I really like (and believe in) modularity. And I really, really like the OSGi Service Registry. These are the reasons why I was initially excited about OSGi. Unfortunately, some projects (with OSGi) and discussions with other developers led me to the conclusion that OSGi is too unintuitive for daily work and brings a complexity into your projects that is only appropriate in a few cases (e.g. if you really need the dynamics of bundles). I also experienced that developers are more concerned about the characteristics of OSGi and the development of bundles than about the actual problem. That is why I think OSGi has never become mainstream.

    I think OSGi was an important and necessary step towards modularity but not the end of all solutions.

  4. The remarks about Maven and Spring are pretty true. SpringSource itself tries to become more modular in some areas by embracing OSGi (Virgo). Still, in a project that simply uses JPA 2, being forced to drag along all the AOP stuff as well as numerous other Spring packages, while never intending to use Aspect Oriented Programming, makes Spring more bloated than its marketing geniuses often want us to believe Java is.

    If Jigsaw follows that path, Java may remain bloated to some extent, and so may even more tools and frameworks using Jigsaw, not sparing Spring Framework or any of its competitors.

  5. @Bernd: Obviously OSGi is not the end of all solutions. OSGi is just a step on the infinite road to software heaven. Functions, Modula modules, objects, packages, IoC, DI, they were all simplifying software in a way that allowed us to handle more complexity. I only argue that OSGi is the next step and that Maven/Jigsaw is a big step backwards.

    I am afraid you seem to miss some of what OSGi is really about: your arguments are only valid when you program in OSGi as you would with a flat class path. The trick in OSGi is that applications rarely ever load classes dynamically (Class.forName), as you need to do in classic Java. In OSGi you're either statically bound or you're what we call an extender, manipulating bundles and loading classes directly from those bundles. Neither case has the unknowns you refer to. The pain (and OSGi's biggest roadblock to wider adoption) is the popularity of applications/libraries that load classes assuming there is no module boundary. Jigsaw will either be modular and have the same problems, or will only pretend to provide modularity.

    The µServices model removes the need for dynamic class loading. If you find you need Class.forName, you're just not programming the OSGi way ...

    The problem is that current Java programmers are so used to the dynamic class loading model (e.g. Spring) that they fail to see how anti-modular Class.forName really is.

    You make a similar mistake with versioning. Package versioning is a lot easier than JAR versioning because a package can be much more cohesive. Most JAR versions are baloney because there is too much fuzzy change in a JAR anyway. A well-maintained API/contract in a package (the µservice model) does not suffer from that problem. bndtools largely automates it; and automation is our only salvation.

    Though OSGi supports dynamics you should understand that the dynamics were a consequence of strong modularity, dynamics are not an orthogonal dimension to modularity. Dynamics is about not assuming that the provider will be around all the time. And not knowing is another expression for modularity.

    I think you unfortunately illustrate my problem, which is the same I had explaining objects in the late '80s. It is hard to see the benefits of polymorphism or inheritance when you do not "feel" the object oriented paradigm. Lots of people felt objects were counter-intuitive and hard to understand, it took 15-20 years before enough people got their head around that shift. Alas, it is trivial to complain about new technology because a screwdriver is a very lousy hammer. I see the same pattern today.

    Anyway I am afraid that mainstream is a bit too fuzzily defined to say we're not swimming in it. OSGi is under all major (and some minor) application servers and is on the laptop of virtually any Java developer in the form of Eclipse. And I am not even talking about the embedded world.

    As OSGi is not a developer programming model (only middleware should use the OSGi APIs, and even then it should try to stay away from them), it will never be as much in your face as, e.g., Java EE.

    So Neil Bartlett and I have a Masterclass on OSGi in Stockholm at the end of August. If you at the end of those 4 days can honestly claim we've not changed your mind about OSGi I'll refund you the cost for the Masterclass!

  6. I like the idea of building against an interface and running against any one of a number of implementations. But in practice this can be fraught with risk, because implementations can vary so much in their behavior. This is why I feel "fidelity across all phases" is something that should be supported, at least as an option.

  7. @Nick: Obviously you want to be sure that what gets into production is identical to what was tested. Therefore, during the development process you need to narrow down the choices more and more. However, if you build fidelity into the architecture (like Eclipse PDE, and to a lesser extent Maven) you create brittle systems due to version rigidity and not detecting assumptions in code that developers should not have made, causing problems later on.

    Notice that in the majority of cases the multiple implementations are just different versions of the same product. In today's world the components evolve at independent rates.

    Last point. Many of the problems caused by running against different implementations are caused by bad specifications as well as a surprisingly large number of fuzzy things in Java: classpath, class loaders, lack of versions, package atomicity, etc. I am sometimes amazed that people do not see how much quicksand there is beneath our feet. I think OSGi has done great work there to make the foundation more rigid, but this is often not recognized because many developers are actually addicted to the flexibility the fuzziness provides. Hey! It works on my machine ...

  8. "The key problem is that if you depend on one part of a JAR you inherit all its transitive dependencies. For example, take the JAR Spring Aspects JAR. Even if my code only uses the org.springframework.beans.factory.aspectj package in this JAR, the Spring Aspects JAR will unnecessarily provide me with on dao, orm, jpa, as well as the transaction package. If I deploy my application I am forced to drag in the dependencies of these packages, which drag in other JARs, which each have additional dependencies, etc. etc. During deployment tens (if not hundreds) of JARs are dragged in even though those packages can never be reached from my code. "

    In my opinion, modules are there to address exactly this issue.

    consider this module :

    module com.greetings @ 0.1 {
    requires org.astro @ 1.2; // requires a specific version
    class com.greetings.Hello;
    }

    This defines that the greetings module requires astro, not the containing JAR, OK?

    So if we have multiple modules in one JAR file, for example aspect, dao, orm, etc., your module says that you only need the aspect module, not dao, orm, and so on.

    You are not working with JARs anymore.

  9. @saeed In Jigsaw there will be a 1:1 relation between module and JAR so there is no difference.

    If you can have multiple modules per JAR, as OSGi has in packages as modules, the models are very similar. You can then wonder why we need a new construct as we already have packages.

    Anyway, in my understanding Jigsaw modules are also the deployment unit.
