Yesterday Mark Reinhold posted a link to the new
requirements on Jigsaw. He states a more ambitious goal to make Jigsaw not just for VM developers only as they said before. The good news is that OSGi is recognized. The even better news is that now there are requirements it is obviously clear that OSGi meets virtually all of them, and then some. The bad news is that despite OSGi meeting the requirements it does not look likely that Jigsaw will go away anytime soon.
As this round will likely get me another 15 minutes of fame, let me use this fame to try to explain why I consider the Jigsaw dependency model harmful to the Java eco-system.
Jigsaw's dependency model must be based on Maven concepts according to requirement #1. This model is the same as Require-Bundle in OSGi. It is based on a module requiring other modules, where modules are deployment artifacts like JARs. To deploy, you transitively traverse all their dependencies and put them in the module system to run. That is, a module for Jigsaw is a
deployment module, not a
language module.
It is a popular solution because it is so easy to understand. However, Einstein already warned us:
"A solution should be as simple as possible, but not simpler." The problem of managing today's code base for large applications is complex and all of us in the Java eco-system will be harmed with a simplistic solution.
The key problem is that if you depend on one part of a JAR you inherit
all its transitive dependencies. For example, take the
JAR Spring Aspects JAR. Even if my code only uses the
org.springframework.beans.factory.aspectj package in this JAR, the Spring Aspects JAR will
unnecessarily provide me with on
dao,
orm,
jpa, as well as the
transaction package. If I deploy my application I am forced to drag in the dependencies of these packages, which drag in other JARs, which each have
additional dependencies, etc. etc. During deployment tens (if not hundreds) of JARs are dragged in even though those packages can
never be reached from my code. Not only is this wasteful in memory, time, and complexity, it also inevitably causes problems in
compatibility. For example, if the jpa dependency and the orm dependency transitively depend on different versions of the log4j JAR I cannot even run my code for no real reason whatsoever. In practice this problem is often ignored, for example maven will take the first log4j version it encounters, regardless if this is a suitable version. The value of type safety is questionable if that kind of fudging is acceptable.
Maven users are already all too familiar with this problem: many are complaining that maven downloads the Internet. This is not bashing maven, maven only does what it is told to do in the poms. In
Tim O'Brien's blog he explains that
Maven does not download the Internet, People download the Internet. Though he rightly blames the poms, he ignores the fact that the dependency model makes it hard to work differently. It is like you want a candy bar but you can only get the whole supermarket. He advises each JAR to be self contained and completely modularized to not drag in extraneous dependencies. Agree. Unfortunately this means you get lots and lots of JARs. And with lots of JARs on your module path without side by side versioning the compatibility issues like the above log4j example explode. And Jigsaw is not likely to have
simultaneous multiple versions in run-time when critically reading the requirement.
In Unix dependency managers the transitive aspect is not such a big deal because dependencies are relatively coarse grained and there is a reason why you usually have to build stuff to make it compatible with your platform. Java is (thank God) different. Java dependencies are not only much more fine grained, they are binary portable to any platform, most important of all, often they have no dependency on an implementation but use a dependency on a contract instead. Unix was brilliant 40 years ago and is still impressive today but let's not undervalue what Java brought us. We must understand the differences before we revert to a 40 year old technology from an operating system to solve the problems of a modern and unique programming environment like Java.
The key advantage that Java brings to the dependency landscape is that each class file encodes
all its dependencies. Give me your class file and I will tell you what packages you depend on. For example, when I compile my JAX-RS class against the jersey JAR(s) you will find that the class file only depends on the
javax.ws.rs package even though the Jersey file contains lots of implementation packages. However, in the Jigsaw world I must select a module JAR that provides that package. Best case is to pick the JSR 311 API JAR for compilation to ensure I do not accidentally pick up implementation classes. However, I do not want to make the life of the deployer hard by unnecessarily forcing a specific implementation in run-time. So what module should I pick to depend on?
Despite common believe, we do
not need
fidelity across all phases between the build time and run-time; we need to
compile against API and
run against implementations that are
compatible with that API. A good Java module system must allow me to depend on an API module at compile time and select an appropriate implementation at deployment-time. Forcing the compile time dependency to be the same as the run-time dependency creates brittle systems, believe me. The problem can be solved by separating the concept of a deployment module (the JAR) and the language module.
We could define the concept of a language module in Java and allow the JAR to contain a number of these modules. Surprisingly (for some), Java already provides such a language module, they are called
packages. Packages provide a unique name space, privacy, and grouping, all hallmarks of modularity. The JLS introduction even names them as similar to Modula's modules. Actually, module dependencies on packages are already encoded in the class files. The only thing we need to add to the existing system is versioning information on the package and record this version in the class file. During deploy-time a resolver can take a set of JAR's (which can be complete) and calculate what modules should be added depending on the actual environment. E.g. if it runs on a Mac it might deploy another set of additional JAR's then if it runs on the PC or Linux.
To manage real world dependencies we need this indirection to keep this manageable. From 2010 to 2017 the amount of software will double. If we want to have a fighting chance to keep up we must decouple compile time from deployment time dependencies and have a module system that can handle it.
It seems so obvious to build on existing concept in the Java language that it begs an explanation why the transitive dependency model on JARs is so popular. The only reason I can think of is that is
easier for the module system providers to program. In the Jigsaw model traversing the dependencies is as easy as taking the module name + version, combining it with a repository URL, doing some string juggling and using the result as a URL to your module. Voila! Works with any web server or file system!
Package dependencies require an
indirection that uses a mapping from package name to deployment artifact. The Jigsaw/Maven model is like having no phone number portability because the operators found a lookup too difficult. Or worse, it is like having a world without Google ...
So for the convenience of a handful of module system providers (in the case of Jigsaw just one because it is not a specification) we force the rest of us to suffer? Though people have learned to use dynamic class loading and not telling the truth in their poms to work around dependency problems inherent in the dependency model we should not encode this in my favorite language. Should we not try to address this problem when a new module system is started? Or better, use an already mature solution that does not suffer the impedance mismatch between compile-time and run-time?
At JavaOne 2007 Gilad Bracha, then Sun's man in charge of JSR 277, told us he had started JSR 294 because he did not trust the deployers to handle the language issues. At that particular time this remark puzzled me.
Peter Kriens
P.S. Opinions stated in this blog are mine alone and do not have to agree with the OSGi Alliance's position