OSGi Blog: 201101

Monday, January 24, 2011

bndtools hackathon

I've talked with a few people who are using bndtools, Neil Bartlett's Eclipse plugin for OSGi development, and there is some interest in spending some time hacking on it. The tool is getting traction and people want to extend it in different directions. So I will be organizing a hackathon for bndtools at OSGi DevCon/EclipseCon on Wednesday night (March 23, Santa Clara). I'll make sure some of the right people are there and that we will have a place to hack. If you want to participate, let me know at Peter.Kriens@aQute.biz.

Peter Kriens

Friday, January 21, 2011

Anecdotes of Frustration

I am doing a little survey of the moments of frustration we have as developers of software that relies heavily on external software (open source, proprietary, or in-house but from other departments). I am looking for anecdotes about how you, or others, wasted time and money in your normal developer workflow because something went wrong with an external software product. I am not looking for theories; I'd like to hear real stories, a.k.a. anecdotes.

To ease the processing, I'd like to see the following format:

The situation/environment
What were you trying to accomplish
How did you do it
What went wrong
How to prevent it next time
What was the economic cost of the mishap

For example:

Developing a web site for a small workshop
We were trying to provide textual search to the past orders
We integrated open source product X into our website and then suddenly we got Null Pointer Exceptions in our web layer, which was not connected to X at all.
It turned out that some of the packages were repeated in product X and because X was first on the class path the web layer started to use X's packages and they were an older version.
Reverse the order on the class path
We lost about a man day figuring it out

Another:

Creating an order taking application for the iPad
We were field testing the application with potential customers to prepare for the launch at a large conference that was to be held within two weeks.
One of the field testers contacted us and said that he had noticed we were using a GPL package to communicate with our back-end. He was a hobby developer and asked us for our sources so he could review the source code. However, our code contained trade secrets and we were not willing to provide the source base to him. He then threatened to contact the EFF.
We worked three days straight to replace the GPL code in our product, code we had not even been aware was in there.
We lost about 20 man days and got a few extra gray hairs

From the examples it should be clear that I am not looking for OSGi-centered problems; I am mostly interested in problems related to the handling of external products. Keywords:

Legal, problems with licenses, patents.
Versioning, surprising mismatches
Deploying
Escrow
Java
Security
...

You can send these anecdotes to anecdotes@aQute.biz. I would appreciate it if you could ask others as well. You can of course send multiple anecdotes, but please use a separate email for each to simplify the processing. I will not use the anecdotes in any way that can be connected to the companies or individuals; sending them anonymously or without actual company or individual names is ok.

If I receive a substantial number of these anecdotes then I will process them and write a number of blog posts about the common problems that everybody seems to have. This will of course be fine input to the OSGi specification process.

Looking forward to your anecdotes, keep them coming!

Peter Kriens

Thursday, January 13, 2011

Error Messages

The increased adoption of OSGi is fantastic but it also means more and more people start using it without properly understanding the technology. These people tend to get upset when things do not immediately work the way they expect and blame it on the technology. The clarity of the error messages is therefore an important aspect of usability. If they are clear and simple to understand, people will get along with the technology more easily. However, dependency graphs with constraints are notoriously difficult to explain in textual messages.

In this vein Richard S. Hall (Apache Felix) contacted me to discuss a very interesting presentation he is working on. Going through the presentation I joked that I thought the error messages listed in the presentation were completely unreadable. Richard is the conscience of the OSGi Alliance and very much driven to keep things simple and to provide as much help as possible to end users. So that joke hurt. That started a long thread about error messages. We mostly focused on this message:

Constraint violation for package 'bar' when resolving
module 11.0 between existing import 8.0.bar BLAMED ON
[[11.0] package;
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))]
and uses constraint 10.0.bar BLAMED ON [[11.0] package;
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0))),
[9.0] package;
(&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))]

I've noticed it before, but I have a huge deficiency: a blurb of unformatted text like this message is completely unreadable to me. When I run into such an amorphous blurb, I need to paste it into an editor and put some visual structure in it to grasp the contents. So let's format it:

Constraint violation for package 'bar' when resolving module 11.0
between existing import 8.0.bar
  BLAMED ON
    [[11.0] package;
    (&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))]
and uses constraint 10.0.bar
  BLAMED ON
    [[11.0] package;
    (&(package=foo)(version>=1.0.0)(!(version>=2.0.0))),
    [9.0] package;
    (&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))]
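As an aside for readers new to the filter syntax: the requirements in this message are LDAP-style filters, and (&(package=bar)(version>=1.0.0)(!(version>=2.0.0))) reads as "package bar, version at least 1.0.0 and below 2.0.0". A minimal sketch of that range check in plain Java follows. It assumes simple numeric versions of up to three parts, and the class RangeCheck and its methods are hypothetical names for illustration, not framework API.

```java
class RangeCheck {
    // Parse "1.2" or "1.2.3" into {major, minor, micro}; missing parts count as 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    // Compare two versions part by part: negative, zero, or positive.
    static int compare(String a, String b) {
        int[] left = parse(a);
        int[] right = parse(b);
        for (int i = 0; i < 3; i++) {
            if (left[i] != right[i]) {
                return Integer.compare(left[i], right[i]);
            }
        }
        return 0;
    }

    // Same meaning as (&(version>=low)(!(version>=high))): low inclusive, high exclusive.
    static boolean inRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("1.2", "1.0.0", "2.0.0"));   // true
        System.out.println(inRange("2.0.0", "1.0.0", "2.0.0")); // false
        System.out.println(inRange("0.9.9", "1.0.0", "2.0.0")); // false
    }
}
```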

Hmm. This seems to be about a uses constraint violation that is caused by a module that can see the package bar through two different paths in the network of dependencies. That is, the class space for package bar in module 11 contains two definitions of package bar. One comes from module 11 importing package bar directly from module 8 and the other comes through a chain of modules. Module 11 seems to import package foo from module 9, where module 9 also exports boz that has a uses constraint on package bar. Module 9 seems to import package bar from module 10. Phew! After some talking with Richard, I came to the conclusion that we have this problem:

This diagram uses the syntax used in the OSGi specifications. A white box is an import, a black box is an export, a yellow box is a bundle, and the white cylinder in the package boxes is the uses constraint.

In this case, module 11 can receive bar objects that are loaded by module 8 as well as module 10, not good. Clearly, any sane person will concur that uses constraints are good because they detect run-time errors early and allow a resolver to prevent these run-time errors from happening. However, how should we report such a complex problem in a log message?
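To make the check concrete, here is a minimal sketch of how such a conflict could be detected on a toy model of the wiring. It is not how Felix or any other framework implements its resolver; the class UsesCheck, its methods, the com.acme bundle names, and the exact exports and wires are assumptions made for illustration. The model records which module provides each imported package and which packages every export names in its uses: directive, then follows those chains to collect every provider of a package that a module can be exposed to.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UsesCheck {
    // module -> exported package -> packages listed in that export's uses: directive
    private final Map<String, Map<String, List<String>>> exports = new HashMap<>();
    // importing module -> imported package -> module that provides it
    private final Map<String, Map<String, String>> wires = new HashMap<>();

    UsesCheck export(String module, String pkg, String... uses) {
        exports.computeIfAbsent(module, k -> new HashMap<>()).put(pkg, Arrays.asList(uses));
        return this;
    }

    UsesCheck wire(String importer, String pkg, String provider) {
        wires.computeIfAbsent(importer, k -> new HashMap<>()).put(pkg, provider);
        return this;
    }

    // Where a module gets a package from: its import wire, else itself if it exports it.
    private String source(String module, String pkg) {
        String wired = wires.getOrDefault(module, Collections.emptyMap()).get(pkg);
        if (wired != null) {
            return wired;
        }
        boolean exported = exports.getOrDefault(module, Collections.emptyMap()).containsKey(pkg);
        return exported ? module : null;
    }

    // All modules from which 'module' can be exposed to classes of package 'target'.
    Set<String> providers(String module, String target) {
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Map<String, String> mine = wires.getOrDefault(module, Collections.emptyMap());
        for (Map.Entry<String, String> wire : mine.entrySet()) {
            if (wire.getKey().equals(target)) {
                result.add(wire.getValue());
            }
            follow(wire.getValue(), wire.getKey(), target, visited, result);
        }
        return result;
    }

    // Follow the uses: directive of one export transitively.
    private void follow(String exporter, String pkg, String target,
                        Set<String> visited, Set<String> result) {
        if (!visited.add(exporter + "/" + pkg)) {
            return; // already handled this export
        }
        List<String> used = exports.getOrDefault(exporter, Collections.emptyMap())
                                   .getOrDefault(pkg, Collections.emptyList());
        for (String usedPkg : used) {
            String src = source(exporter, usedPkg);
            if (src == null) {
                continue;
            }
            if (usedPkg.equals(target)) {
                result.add(src);
            }
            follow(src, usedPkg, target, visited, result);
        }
    }

    // Assumed wiring: eleven imports bar from eight and foo from nine,
    // nine imports boz from ten, and ten exports both boz and bar.
    static UsesCheck example() {
        return new UsesCheck()
            .export("com.acme.eight", "bar")
            .export("com.acme.nine", "foo", "boz")
            .export("com.acme.ten", "boz", "bar")
            .export("com.acme.ten", "bar")
            .wire("com.acme.eleven", "bar", "com.acme.eight")
            .wire("com.acme.eleven", "foo", "com.acme.nine")
            .wire("com.acme.nine", "boz", "com.acme.ten");
    }

    public static void main(String[] args) {
        // More than one provider means a uses constraint violation.
        System.out.println(example().providers("com.acme.eleven", "bar"));
        // prints [com.acme.eight, com.acme.ten]
    }
}
```

A set with a single provider would mean the class space is consistent; two or more providers is exactly the situation the error message tries to describe.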

After some sparring Richard came back a few days later with an alternative proposal:

Module com.acme.eleven[11.0] can see package 'bar' from
com.acme.eight[8.0] and com.acme.ten[10.0]. The first
is due to com.acme.eleven[11.0]'s requirement
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
that is satisfied by com.acme.eight[8.0]. The second
is due to the dependency chain from
com.acme.eleven[11.0]'s requirement
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.nine[9.0], that has a
requirement
(&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.ten[10.0] that uses
package 'bar'.

Ouch, another blurb. The problem is that Richard is someone who is extremely good at parsing these text blurbs (he still prefers vi over Eclipse), but I suck at it. I need a visual representation! However, this message does improve on the original if properly formatted:

Module com.acme.eleven[11.0] can see package 'bar'
 from com.acme.eight[8.0]
 and  com.acme.ten[10.0].

The first is due to com.acme.eleven[11.0]'s requirement
    (&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
that is satisfied by com.acme.eight[8.0].

The second is due to the dependency chain from
  com.acme.eleven[11.0]'s requirement
    (&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.nine[9.0],
that has a requirement
    (&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.ten[10.0]
that uses package 'bar'.

This is clearly a step in the right direction, but the lack of formatting and the, in my opinion, overly verbose text are still not to my liking. Personally, I like conciseness because it makes it easier to see the problem; verbose template text has a tendency to shut my brain down. However, Richard argues that a lot of people reading the text have no idea what OSGi is and need the verbosity. I am not sure; maybe there needs to be an explanation of the problem separate from the actual arguments, or a web page that details what is going on. If you have no clue, I think it is too ambitious to expect understanding from a mere error message. In the old, long gone days of computing, all error messages had numbers that one could look up; maybe we should start doing that again.
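The numbered-message idea could look something like the following minimal sketch. The class ErrorCatalog, the code E128, and the explanation text are all made up for illustration; no framework defines such a catalog. The point is only that the short, specific message and the generic background explanation can be kept apart and joined by a code.

```java
import java.util.HashMap;
import java.util.Map;

class ErrorCatalog {
    // Generic, reusable explanations keyed by error code (hypothetical codes).
    private static final Map<String, String> EXPLANATIONS = new HashMap<>();

    static {
        EXPLANATIONS.put("E128",
            "A bundle can see classes from the same package via multiple class loaders.");
    }

    // Combine the code, the specific details, and the generic explanation.
    static String format(String code, String details) {
        String explanation = EXPLANATIONS.getOrDefault(
            code, "No explanation registered for this code.");
        return "[" + code + "] " + details + "\n\n" + explanation;
    }

    public static void main(String[] args) {
        System.out.println(format("E128", "Uses constraint violation on package 'bar'"));
    }
}
```

A reader who already knows the problem can stop after the first line, while a newcomer can read on or look the code up elsewhere.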

So Richard has taken two shots at it. Now let me provide a variation on the error message that I think would be best. As a strong believer in leveraging the visual brain I think formatting and white space are not a luxury but a necessity. So my message is highly visualized:

[E128] Uses constraint violation in com.acme.eleven[11.0] on package 'bar'

A bundle can see classes from the same package via multiple class loaders.
To prevent this case (which results in Class Cast Exceptions), an
exported package can declare what packages it uses with the uses:
directive. The framework transitively detects cases where a bundle
can be confronted with objects from the same package but coming
from different class loaders. This error indicates that no resolution
could be found and this constraint violation is one of the problems.

In this case com.acme.eleven (module 11.0) sees package 'bar' through
the following 2 dependency chains:

1) com.acme.eleven[11.0]
     import: (&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
      |
     export: {package=bar, version=1.2}
   com.acme.eight[8.0]

2) com.acme.eleven[11.0]
     import: (&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
      |
     export: {package=foo, version=1.5, uses:=boz}
   com.acme.nine[9.0]
     import: (&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))
      |
     export: {package=boz, version=1.5, uses:=bar}
     export: {package=bar, version=1.2}
   com.acme.ten[10.0]

Richard has not seen this example but he will undoubtedly react. And we would like you to react as well, because we just do not know what is best. Richard is willing to change the error messages of the Apache Felix framework if people come up with a better solution than we have now.

Peter Kriens

P.S. Richard became a father yesterday of his second child: Brady William Hall. Congratulations!

Wednesday, January 5, 2011

Accidental Complexity and Modularity

This week I watched a discussion unfold that seems quite typical of the pain many people feel when they port code to OSGi. The problem started with the infamous class not found exceptions that happen when you try to dynamically load classes from the big void. After a couple of iterations in which context class loaders, buddy loading, and other hacks were discussed, things seemed to get worse and worse until someone asked: "Why do you actually load all those classes dynamically?" After a silence, the somewhat reluctant answer was that the code looked a bit over the top in this area ...
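For readers who have not run into this yet, here is a minimal sketch in plain Java (no OSGi involved) of why loading by name is fragile: whether a class can be found depends entirely on which class loader is asked. The class VisibilityDemo and its helper are hypothetical names for illustration; the isolated loader is a stand-in for a module that simply has no visibility of the class.

```java
import java.net.URL;
import java.net.URLClassLoader;

class VisibilityDemo {
    // Try to load a class by name through the given loader.
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = VisibilityDemo.class.getName();
        ClassLoader own = VisibilityDemo.class.getClassLoader();
        // No URLs and no application parent: this loader only sees core JDK classes.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println(canSee(own, name));                    // true
            System.out.println(canSee(isolated, name));               // false
            System.out.println(canSee(isolated, "java.lang.String")); // true
        }
    }
}
```

The same class name succeeds through one loader and fails through another, which is what turns name-based loading into a guessing game once every module has its own loader.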

If I ordered new construction work on my house I would be pretty surprised (upset would probably be a better word) if the contractor spent a substantial portion of his paid time experimenting with different building techniques to make his life easier. Fortunately, this is not very likely because I can understand what construction workers are doing (most of the time) and the skills they possess are not the skills required to optimize their tools and techniques. For example, painters are unlikely to insert thousands of logging statements in their paint (20110101 10:22.03223 INFO: Latex: Paint drying from brush@3128AC at position 3121912@121212C ?) or develop their own brushes. We are different: our customers usually have no clue how we do it and our tools are built with the same skills as the applications we make. We are the only large industry with an ailing tool market.

I do know from experience how easy it is to go overboard when attempting to simplify, believe me. The most gratifying code is the annotation library that reduces your domain code from a thousand repetitive lines to 1 line of code (with 99 annotations). Unfortunately, simplifications are in the eye of the beholder. Newcomers must still learn the annotations to understand what the code is doing. If (too) many developers make such "simplifications" we have moved the complexity from repetitive simple code to a very hard to understand (and maintain) system. Still, it is fun to do, much more fun than learning what is already out there.

In the last few years I've started to see more and more how local "simplifications" can make things more complex globally. Tricks and hacks that seem to work wonderfully well for one problem tend to create brittle and hard to maintain systems. Not because they are inherently bad but because the large number of one-off rules they introduce makes the overall system hard to understand. This is increasingly becoming a problem because today we build our systems out of many open source components. Unfortunately, each open source component has a tendency to bring its own bag of tricks and hacks. The big problem we face today is how to combine these components into a coherent application without having to learn and fight their combined accidental complexity. That is, complexity is no longer a qualitative problem; it has become a quantitative problem.

Building systems from collaborating components is an area that OSGi addresses eminently. But one should be warned that there is no free lunch. Achieving OSGi's extensive benefits requires that modules hide their tricks and hacks inside the module boundaries and use the OSGi primitives to collaborate with their peers. Any hack that assumes global visibility (as dynamic loading does) is bound to destroy these hard-won benefits, if not today then surely tomorrow.
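What "collaborating through primitives" means can be sketched without any framework. In the example below a provider publishes an object under an interface and a consumer looks it up by that interface, so nobody needs to know or load an implementation class by name. The Registry class is a deliberately tiny stand-in written for this illustration; it is not the OSGi service registry, and RegistryDemo, Greeter, and PoliteGreeter are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class RegistryDemo {
    // The only thing provider and consumer share is this interface.
    interface Greeter {
        String greet(String name);
    }

    // Stand-in for a service registry: instances are published under an interface.
    static final class Registry {
        private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

        <T> void register(Class<T> type, T instance) {
            services.put(type, instance);
        }

        <T> Optional<T> lookup(Class<T> type) {
            // Class.cast returns null for a missing entry, which becomes an empty Optional.
            return Optional.ofNullable(type.cast(services.get(type)));
        }
    }

    // Implementation detail hidden inside the providing module.
    private static final class PoliteGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // The provider side: publish the implementation under its interface.
    static Registry exampleRegistry() {
        Registry registry = new Registry();
        registry.register(Greeter.class, new PoliteGreeter());
        return registry;
    }

    public static void main(String[] args) {
        // The consumer side: no class names, no Class.forName, and absence is handled.
        String message = exampleRegistry().lookup(Greeter.class)
            .map(greeter -> greeter.greet("OSGi"))
            .orElse("no greeter available");
        System.out.println(message); // Hello, OSGi
    }
}
```

The consumer also has to cope with the provider being absent, which is the price of the decoupling and the reason the lookup returns an Optional here.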

So the bad news is that OSGi is disruptive: you cannot take your existing code base, sprinkle some OSGi pixie dust on it, and then find that all your problems are magically gone. (If you have such pixie dust, let's talk, I have some Facebook shares to sell.) This is not related to how OSGi was designed; it is related to the fact that OSGi is based on the principles of strong modularity. There is no silver bullet to modularize your code base overnight because modularity is a design principle, not a technique you can haphazardly apply. If you read David Parnas' seminal paper about modularity you will see exactly what I mean: he decomposes a system in different ways, and by using the principles of modularity he gets an overall simpler and much more resilient solution.

When the first combustion engine cars were developed they looked like horse carriages, just as today most enterprise applications look like Java EE applications. In the last few years the OSGi Alliance has spent a lot of effort to make it easier for Java EE applications to use the OSGi highway; moving your WAR to OSGi has become trivial with the WAB specification. However, we should not forget that going at full speed while leaving your design unchanged is at your own peril (and cost).

Peter Kriens