Tuesday, November 27, 2007

JSR 294 SuperPackages

This post was updated after a pleasant email conversation with Alex Buckley, in which he explained some of the finer points. The updated sections are marked with []. JSR 294 has produced its public draft of superpackages! (Somehow the exclamation mark seems necessary with this name ... ) Anyway, superpackages are a new construct for Java 7 to improve the modularity of the Java language. Originally JSR 277 was going to take care of modularity, but Gilad Bracha (formerly Sun) thought that deployers could not be trusted with language elements and spun off JSR 294 after he published a spin about superpackages in his blog. Gilad left, but the JSR 294 expert group churned on and produced a public draft. This post is a review of that draft.

Superpackages address the ever present need to modularize. Types encapsulate types, fields and methods; packages encapsulate types; and superpackages encapsulate packages and nested superpackages. When Java was designed, packages were intended to be a set of closely related types, a.k.a. a module. However, systems have become larger and larger, and packages turned out not to have the right granularity for many. It could have worked if packages had been nested, but this would of course have limited the flexibility of the programmer, and Sun used to favor configuration over convention.

For the OSGi specifications we came up with the JAR file as a module of packages. Packages can be exported and imported, allowing the developer to keep classes private or expose them to be used by others. Modularity in OSGi is therefore a deployment concept; the same packages can be members of different modules/bundles. OSGi service platforms enforce these rules with the aid of class loaders.

However, purists want to have the modularity in the language itself, and so superpackages were born in Sunville. Superpackages group a set of named (super-)packages and export types from these packages.

First, let me make clear that I have not seen anything in the public review draft that would make it hard for OSGi to support superpackages. The current incarnation of the OSGi framework will be more or less oblivious of superpackages. The accessibility rules are enforced by the VM and the system works with normal class/resource loading rules. A superpackage and its member types must come from the same class loader/bundle, but that does not look like an issue. So from an OSGi point of view there is no reason for concern with JSR 294 as long as a superpackage and its member packages come from the same bundle.*

So the remainder is just a review of the technical merits of 294; the OSGi specifications are not affected by it.

The superpackages specification is surprisingly complex and there is no overview. The spec is a set of changes to the existing Java Language Specification. I'll try to show you my understanding.

The current model is depicted in the following picture.


When you begin a class you give it a package name with the package keyword:

package com.foo;
class A { ... }

That is, the class declares its own membership in a package. This is different for a superpackage, which lives up to its name given the complexity it introduces. The following picture shows a similar diagram when superpackages are introduced.


This clearly is supercomplex in comparison with the current model. The major drawback is the redundancy that is introduced by this model. When I went to school ages ago I was taught that redundancy is evil. Over time I learned that redundancy can improve a system, but the cost always remains. In this case the redundancy actually seems to create a very brittle structure that makes deployment unnecessarily rigid. Let's explore.

[] Before the first point, an aside. As you can see, the Package is dashed now. The reason is that Sun is a bit ambiguous about packages in this specification. In this specification a package is treated as a name with structure (com.foo.bar is a sub-superpackage of com.foo); this is not uncommon. However, in Java a package is (well, should be) a first class citizen, but it often is not. Packages do provide access rules and there is the java.lang.Package class (not in java.lang.reflect). However, this class is not reflective; it is impossible to find the classes belonging to a package. Would you accept that a class could not enumerate its fields and methods? So why should a package not be able to enumerate its members? In contrast, a superpackage is in java.lang.reflect and can enumerate its contained superpackages and, strangely enough, its member types, but not its packages. Interestingly, to be able to enumerate the types the VM must be able to enumerate the packages, because the superpackage class file only enumerates the package names, not the individual members. I would advise the JSR 294 expert group to take packages seriously and allow full reflection.
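To make the point about java.lang.Package concrete, here is a minimal sketch using only the standard reflection API. The class name PackageReflectionDemo and its helper methods are made up for illustration. The check only looks for public methods on Package that return Class[], so it is a rough probe rather than a proof that no enumeration mechanism exists.

```java
import java.lang.reflect.Method;

public class PackageReflectionDemo {

    /** Returns the name of the package a class belongs to, via java.lang.Package. */
    public static String packageNameOf(Class<?> type) {
        Package p = type.getPackage();
        // getPackage() can return null in some cases, so guard against it.
        return p == null ? "" : p.getName();
    }

    /** Looks for any public method on java.lang.Package that returns Class[]. */
    public static boolean packageCanListClasses() {
        for (Method m : Package.class.getMethods()) {
            if (m.getReturnType() == Class[].class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Package of String: " + packageNameOf(String.class));
        System.out.println("Package has a Class[] accessor: " + packageCanListClasses());
        // A class, by contrast, can enumerate its own members.
        System.out.println("Public methods on String: " + String.class.getMethods().length);
    }
}
```

A Package object can tell you its name and its version metadata, but asking it for its member classes is not something the API offers.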

First, superpackage links are always bidirectional and can therefore easily be wrong. A class file points to the superpackage file and the superpackage file points to the package of the class files. That is, if you move a package to another superpackage the superpackage definition file and the sources of the types must be modified. Fortunately wildcards can be used in the superpackage to identify the exported classes, though this means that most (if not all) changes in a type require recompilation of the superpackage definition.

The same is true for superpackage parenthood (enclosement) and membership (nesting). This link is also bidirectional: the parent must list its children and the children must list their parent. A superpackage is restricted to one parent; it is a true hierarchy. By the way, the top of the hierarchy is the unnamed superpackage. Oh yes, all superpackages with a simple name (no dot in it) are also automatically visible (in scope) to any superpackage, if I read 7.4.5 correctly (it took me some time to figure that out).
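The fragility of such double bookkeeping can be sketched in a few lines. The following is a toy model, not the JSR 294 file format: it assumes the declarations have already been parsed into two hypothetical maps, one from a parent to the member superpackages it lists and one from a child to the single parent it names, and it reports every link that only one side declares.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SuperpackageLinkCheck {

    /**
     * Reports every parent/child link that is declared on one side only.
     * membersOf: parent name -> member superpackages it lists (hypothetical model).
     * parentOf:  child name  -> the single enclosing superpackage it names.
     */
    public static List<String> findBrokenLinks(Map<String, List<String>> membersOf,
                                               Map<String, String> parentOf) {
        List<String> problems = new ArrayList<>();
        // Parent lists a child, but the child does not name that parent.
        for (Map.Entry<String, List<String>> e : membersOf.entrySet()) {
            for (String child : e.getValue()) {
                if (!e.getKey().equals(parentOf.get(child))) {
                    problems.add(e.getKey() + " lists " + child
                            + ", but " + child + " does not name it as parent");
                }
            }
        }
        // Child names a parent, but the parent does not list the child.
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            List<String> listed = membersOf.get(e.getValue());
            if (listed == null || !listed.contains(e.getKey())) {
                problems.add(e.getKey() + " names " + e.getValue()
                        + " as parent, but is not listed by it");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> membersOf = new HashMap<>();
        membersOf.put("Outer", Arrays.asList("Outer.Inner"));
        Map<String, String> parentOf = new HashMap<>();
        // Outer.Inner "forgets" to name Outer, so one broken link is reported.
        System.out.println(findBrokenLinks(membersOf, parentOf));
    }
}
```

Every move of a nested superpackage has to touch both maps at once; forgetting either side produces exactly the kind of inconsistency this check reports.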

[] The unnamed superpackage is really special. Any top level superpackage is automatically a member of this supersuperpackage. Any class can see any exported type from a top level superpackage. I missed this the first time; the specification could make this clearer, because the rules for the unnamed superpackage are so different. For example, membership is automatic, all exported members are visible to anyone, and the unnamed superpackage is not open for reflection: it is represented as null. I wonder if this unnamed superpackage should not just be named the global space. That is, there is no supersuperpackage, so why imply one by calling it the superpackage that shall not be named?

The data structure for superpackages is not elegant. Though we have good refactoring tools today in Eclipse, IDEA, and NetBeans, I do not think they are a good excuse to design data structures that are so error prone.

However, the restriction is what worries me most, because I have a hard time seeing how the resulting system should work. The restriction is intentional, from section 7.4.5:
If a superpackage did not have to declare which superpackage it is nested in, then the following problem could occur. Consider these superpackages, where Outer.Inner does not declare that it is nested in Outer.

superpackage Outer {
member superpackage Outer.Inner;
}
superpackage Outer.Inner {
member package foo;
export foo.C;
}

If a type outside the Outer.Inner superpackage tries to access foo.C, then the access would succeed because foo.C is exported from Outer.Inner and neither C.class nor the superpackage file for Outer.Inner mentions the fact that Outer.Inner is a non-exported nested superpackage of Outer. The intent of the Outer superpackage - to restrict access to members of Outer.Inner - is subverted.
Clearly, they have chosen restriction over convenience. However, the consequences of this are quite far reaching. Let us take a look at the access rules. I always need pictures for these things so we need a legend:

The first access rule in 5.4.4. in the specification reads that type C is accessible to type D if any of the following conditions is true:
  1. Type C is public and is not a member of a named superpackage


  2. Type C is public and both type C and D are a member of the same superpackage S.


  3. Type C is public and C is an exported member of named superpackage S and D is a member of the enclosing superpackage O of superpackage S


  4. Type C and D reside in the same package p.
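The four conditions above can be written down as a small predicate. This is a toy model under stated assumptions, not the VM's algorithm: the Superpackage and Type classes are hypothetical, membership is taken to be direct rather than transitive, and the special behaviour of the unnamed superpackage is deliberately left out, so only the four conditions as quoted are covered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AccessRulesSketch {

    /** Hypothetical model of a named superpackage; enclosing is null for a top level one. */
    public static final class Superpackage {
        final String name;
        final Superpackage enclosing;
        final Set<String> exportedTypes;

        public Superpackage(String name, Superpackage enclosing, String... exportedTypes) {
            this.name = name;
            this.enclosing = enclosing;
            this.exportedTypes = new HashSet<>(Arrays.asList(exportedTypes));
        }
    }

    /** Hypothetical model of a type; superpackage is null if it is in no named superpackage. */
    public static final class Type {
        final String name;
        final String pkg;
        final boolean isPublic;
        final Superpackage superpackage;

        public Type(String name, String pkg, boolean isPublic, Superpackage superpackage) {
            this.name = name;
            this.pkg = pkg;
            this.isPublic = isPublic;
            this.superpackage = superpackage;
        }
    }

    /** True if type c is accessible to type d under the four quoted conditions. */
    public static boolean isAccessible(Type c, Type d) {
        if (c.pkg.equals(d.pkg)) return true;   // condition 4: same package
        if (!c.isPublic) return false;          // conditions 1 to 3 all require a public type
        Superpackage s = c.superpackage;
        if (s == null) return true;             // condition 1: not in a named superpackage
        if (s == d.superpackage) return true;   // condition 2: same superpackage
        // condition 3: c is exported from s, and d is a member of the superpackage enclosing s
        return s.exportedTypes.contains(c.name)
                && s.enclosing != null
                && s.enclosing == d.superpackage;
    }

    public static void main(String[] args) {
        Superpackage outer = new Superpackage("Outer", null);
        Superpackage inner = new Superpackage("Outer.Inner", outer, "foo.C");
        Type c = new Type("foo.C", "foo", true, inner);
        Type d = new Type("bar.D", "bar", true, outer);
        Type e = new Type("baz.E", "baz", true, null);
        System.out.println("C visible to D (in Outer): " + isAccessible(c, d));
        System.out.println("C visible to E (no superpackage): " + isAccessible(c, e));
    }
}
```

In this model a type in Outer can reach the exported foo.C, while a type outside any superpackage cannot, which is the restriction the Outer/Outer.Inner example in 7.4.5 is meant to protect.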


At first I could not understand how one of the most common cases, a library provider, could work with superpackages. The rules state that a type can only see what is available to its superpackage. This seems to exclude visibility between peer superpackages. For example, if OSGi put all its specification packages in the org.osgi superpackage, a member type of the com.acme package could not see the OSGi exported types. However, after a lot of puzzling I found that 7.4.2 states: "A superpackage name can be simple or qualified (§6.2). A superpackage with a simple name is trivially in scope in all superpackage declarations."

I guess this means that a superpackage has all superpackages with simple names as superpackage members? If this interpretation is true, then any "top" level superpackage would be visible to anybody else. Therefore, the following example should work:

[] The previous is not correct; the magic is the unnamed superpackage. I missed the rule (even when looking again after being told) that any top level superpackage makes its exports available to any type in the system, regardless of whether it has a simple or a qualified name. That is, exported types of top level superpackages are global. The use of the unnamed superpackage confused me because the rule is so different from normal superpackages. Silly me.


It seems Sun is slowly moving to convention over configuration! Influence of the Ruby guys they hired? That a name with no dots means general membership is clearly convention. However, it raises a number of issues.
  • If the package must have a simple name, how do we handle uniqueness? Package names normally are scoped with the reverse domain name, like org.osgi... However, org.osgi is not a simple name? [] This is thus not an issue: a superpackage can have a dotted name; the trick is that it must be a top level superpackage, i.e. not enclosed.
  • It seems that top level packages are special. Then why is a superpackage not just defined in a single file that allows nested superpackages without naming them? This model would significantly simplify the model where the VM must find resources from all over the system that have obligatory relations. A lot of potential errors could be removed this way.
  • [] Despite my misunderstanding, the previous point is still relevant. It is not clear why superpackage members are spread out over the file system while they are closely dependent on each other with bidirectional links.
I guess I must be missing something ...

Versioning

One would expect that in AD 2007 any modularity system in Java should have an evolution model. Alas, I have not been able to find any hint, not even a manifest header, of versions and how a superpackage should evolve. Obviously, this needs to be addressed. Superpackages are related to large systems, and large systems do not pop into existence, nor do they suddenly disappear. They live a long time and need to change over time.

Defining the Content of a Superpackage
The spec says that a superpackage has only member types and nested superpackages. However, the superpackage file contains a list of packages and lists the nested superpackages. The exports, however, list the exported type names and the exported superpackages.

These data structures are specified in the superpackage declaration, and this is a file that the average developer will love to hate. This file must list all the packages of a superpackage; wildcarding or using the hierarchy is not allowed. Each package must be entered in full detail. The same goes for member superpackages as well as exported superpackages. Exported types can, however, use a shortcut by specifying the exported types with the on-demand wildcard. That is, you can export com.foo.* indicating that you accept all types in the com.foo package (or all nested types in the com.foo type!). This sounds cool until you look at normal practice. A very common case is that implementation classes in a package have a name that ends in Impl. However, the wildcard in on-demand specifications is all or nothing. This likely means that all exported types must be enumerated by hand. Painful!
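The all-or-nothing nature of the on-demand export is easy to see in a toy model. The sketch below is not how a compiler resolves exports; the class OnDemandExportSketch, the method resolveExport and the type names are hypothetical, and the wildcard is assumed to match only types directly in the named package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OnDemandExportSketch {

    /**
     * Resolves one export entry against a list of fully qualified type names.
     * "com.foo.*" matches every type directly in com.foo; any other entry
     * matches by exact name only (hypothetical model).
     */
    public static List<String> resolveExport(String export, List<String> typeNames) {
        List<String> matched = new ArrayList<>();
        for (String type : typeNames) {
            if (export.endsWith(".*")) {
                // Drop the '*' but keep the trailing dot: "com.foo."
                String prefix = export.substring(0, export.length() - 1);
                if (type.startsWith(prefix)) {
                    String simpleName = type.substring(prefix.length());
                    // Only direct members: no further dots in the remainder.
                    if (!simpleName.isEmpty() && simpleName.indexOf('.') < 0) {
                        matched.add(type);
                    }
                }
            } else if (export.equals(type)) {
                matched.add(type);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList(
                "com.foo.Service", "com.foo.ServiceImpl", "com.foo.util.Helper");
        // The wildcard drags ServiceImpl along with Service.
        System.out.println(resolveExport("com.foo.*", types));
        // Keeping ServiceImpl private means naming each exported type by hand.
        System.out.println(resolveExport("com.foo.Service", types));
    }
}
```

There is no way in this model to say "everything except *Impl", which is the case the paragraph above complains about.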

Deployment Versus Language
The key driver to remove JSR 294 from JSR 277 was that deployment artifacts should have no influence on the language. I am not so sure of that. One of the key insights I have obtained in the last few years is that there are many ways to slice and dice a JAR. With the myriad of profiles and configurations in Java, it is likely that you must deploy the same code in different ways to different platforms. For example, one of the really useful tricks of the bnd tool is the possibility to copy code from other bundles. Though many people can get upset about this, it provides me with a way to minimize dependencies without introducing redundancy because there is still one source code base.

The current solution of superpackages is highly rigid and static with its doubly linked structure. A class can only be a member of one superpackage. I really prefer the more flexible solution of a description that defines how a set of classes is modularized. It is a pity that the JCP does not work from a requirements phase, so that these differences could be discussed without a worked out proposal on the table.

Restrictions and Security
In section 7.4.5 an example with an Outer and Outer.Inner superpackage is given that elucidates why a nested superpackage must name its enclosing superpackage. However, without a security manager anybody can easily access any packages to their liking. Access restrictions are conveniences, not security.

It would have been a better solution to add a SuperpackagePermission that specifies which packages can be named members or not. This would be similar to the OSGi PackagePermission. This would be a safe way to control access; the current model pays a very high price (only a single parent, double pointers) but does not provide security, just a slight barrier.
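What such a permission might look like can be sketched on top of the standard java.security.BasicPermission class. Everything about SuperpackagePermission here is hypothetical: no such class exists in JSR 294 or in OSGi, and the sketch assumes the permission name is simply the name of the superpackage that code may declare membership in, with the usual BasicPermission trailing ".*" wildcard. Unlike the OSGi PackagePermission it carries no actions.

```java
import java.security.BasicPermission;

public class SuperpackagePermissionSketch {

    /**
     * Hypothetical permission: the name is the superpackage that code may join.
     * Wildcard handling ("com.acme.*") is inherited from BasicPermission.
     */
    public static final class SuperpackagePermission extends BasicPermission {
        private static final long serialVersionUID = 1L;

        public SuperpackagePermission(String superpackageName) {
            super(superpackageName);
        }
    }

    public static void main(String[] args) {
        SuperpackagePermission granted = new SuperpackagePermission("com.acme.*");
        // Would this grant allow joining the named superpackage?
        System.out.println(granted.implies(new SuperpackagePermission("com.acme.billing")));
        System.out.println(granted.implies(new SuperpackagePermission("org.osgi.framework")));
    }
}
```

With a check like this enforced by a security manager, the child would not need to name its parent just to prevent outsiders from claiming membership.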

OSGi and Superpackages
Assuming that superpackages can come from different class loaders (see footnote), it is likely that the OSGi specifications need an additional header that reflects the superpackage dependencies. This dependency is a nice intermediate between Import-Package and Require-Bundle, albeit at the cost of a lot of added complexity and maintenance.

Interestingly, in the work for the next release we are being pushed from multiple sides to provide an entity that stands for an application. In J2EE and MIDP the context of an application is clear because applications are strictly contained in a silo. In OSGi platforms the situation is more fluid because applications collaborate. Superpackages could be a handle in this direction but in the current incarnation it will likely not work.

[] It looks like a top level superpackage and its children must come from a single class loader. This is bad, because it means that in JSR 277 and OSGi it is impossible to deploy member superpackages in separate modules. A common use case is that an enterprise has an application consisting of multiple bundles. Superpackages could have been used to minimize the exposure of internal API to other residents. However, bundles in OSGi and modules in JSR 277 require their own class loader, implying that a top level superpackage must be deployed with all its enclosed superpackages and code in one module/bundle. I guess OSGi Fragments can be abused to still allow partial delivery and update of an application, but this is not very elegant.

Conclusion
I wish I could be positive about JSR 294, but I can't. The lack of a requirements document makes it hard to judge the JSR on its own merits, so this conclusion is my personal opinion.

I think that the current solution is unnecessarily complex: there is too much redundancy and there is too much to specify, information that is usually quite volatile during development. The current model unnecessarily allows too many potential errors. Also, versioning must be addressed. And if I understand the model with simple names being available to all superpackages, then a solution must be envisioned to allow superpackage names to be unique.

However, the key point on which I differ is whether we need a language construct for modularity. Maybe I am blinded by almost ten years of OSGi modularity, but JAR based modularity seems to provide more than superpackages provide at great additional expense. So if superpackages must be added to the language, please simplify them and provide a more convenient method to specify their contents. Better, consider how much JAR based modularity could add to the language.

Peter Kriens

* = There is a slight concern with section 5.3.5 of the Classfile and VM changes in the 294classfilevm.html file. In the last paragraph it states that a superpackage and its member types must be loaded from the same class loader. I interpret this as its direct members; however, one could interpret this as any member type of the enclosed superpackages. If this unlikely interpretation is true, all superpackages would have to come from the same class loader, which seems silly. However, 7.1.1 defines type membership transitively, giving credence to the silly interpretation. Needs work.


1 comment:

  1. This is not the first time I've seen Gilad Bracha reinvent the wheel. After reading the post about the JCP being flawed, it seems even more right.

    First of all, bloating the language with constructs which are to a large extent redundant is a road to hell, which Microsoft took a few years ago with Microsoft.Net (and NetFx 3.0 and 3.5, as well as the new Visual Basic and C#, are the results). I much prefer the approach of OSGi or application servers, where the JAR/WAR is the module.

    The problem with the lack of a requirements process has long been known in Sun's Java development and dates back to the days when Sun looked for an imaging API for Java and took one proprietary implementation as a standard. The problem is that sometimes it just works. So it is difficult to learn from the mistakes of the past if they do not hurt enough.
