Thursday, January 13, 2011

Error Messages

The increased adoption of OSGi is fantastic but it also means more and more people start using it without properly understanding the technology. These people tend to get upset when things do not immediately work the way they expect and blame it on the technology. The clarity of the error messages are therefore an important aspect of the usability. If they are clear and simple to understand people will be easier to get along with the technology. However, dependency graphs with constraints are notoriously difficult to explain in textual messages.

In this vein Richard S. Hall (Apache Felix) contacted me to discuss a very interesting presentation he is working on. Going through the presentation I joked that I thought the error messages listed in the presentation were completely unreadable. Richard is the conscience of the OSGi Alliance and very much driven to keep things simple and to provide as much help as possible to end users. So that joke hurt. That started a long thread about error messages. We mostly focused on this message:

Constraint violation for package 'bar' when resolving
module 11.0 between existing import 8.0.bar BLAMED ON
[[11.0] package;
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))]
and uses constraint 10.0.bar BLAMED ON [[11.0] package;
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0))),
[9.0] package;
(&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))]

I've noticed it before but I have a huge deficiency; a blurp of unformatted text like this message is for me completely unreadable. When I run into such a amorphous blurb, I need to paste it in an editor and put some visual structure in it to grasp the contents. So lets format it:
Constraint violation for package 'bar' when resolving module 11.0
between existing import 8.0.bar
BLAMED ON [[11.0] package;
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))]
and uses constraint 10.0.bar
BLAMED ON
[[11.0] package;
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0))),
[9.0] package;
(&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))]

Hmm. This seems about a uses constraint violation that is caused by a module that can see the package bar through two different paths in the network of dependencies. That is, the class space for package bar in module 11 contains two definitions of package bar. One comes from a module 11 importing package bar directly from module 8 and the other is a chain of modules. Module 11 seems to import package foo from module 9, where module 9 also exports boz that has a uses constraint on package bar. Module 9 seems to import package bar from module 10. Phew! After some talking with Richard, I came to the conclusion that we have this problem:



This diagram uses the syntax used in the OSGi specifications. A white box is an import, a black box is an export, a yellow box is a bundle, and the white cylinder in the package boxes is the uses constraint.

In this case, module 11 can receive bar objects that are loaded by module 8 as well as module 10, not good. Clearly, any sane person will concur that uses constraints are good because they detect run-time errors early and allow a resolver to prevent these run-time errors from happening. However, how should we report such a complex problem in a log message?

After some sparring Richard came back a few days later with an alternative proposal:

Module com.acme.eleven[11.0] can see package 'bar' from
com.acme.eight[8.0] and com.acme.ten[10.0]. The first
is due to com.acme.eleven[11.0]'s requirement
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
that is satisfied by com.acme.eight[8.0]. The second
is due to the dependency chain from
com.acme.eleven[11.0]'s requirement
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.nine[9.0], that has a
requirement
(&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.ten[10.0] that uses
package 'bar'.


Ouch, another blurb. The problem is that Richard is someone that is extremely good in parsing these text blurbs (he still prefers vi over Eclipse), but I suck at it. I need a visual representation ! However, this message did improve the original if properly formatted:

Module com.acme.eleven[11.0] can see package 'bar'
from com.acme.eight[8.0]
and com.acme.ten[10.0].

The first is due to com.acme.eleven[11.0]'s requirement
(&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
that is satisfied by com.acme.eight[8.0].

The second is due to the dependency chain from
com.acme.eleven[11.0]'s requirement
(&(package=exporter2.woz)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.nine[9.0],
that has a requirement
(&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
which is satisfied by com.acme.ten[10.0]
that uses package 'bar'.


This clearly is in the right direction but the lack of formatting and the, in my opinion, overly verbose text is still not to my liking. Personally, I like conciseness, it makes it easier to see the problem; verbose template text has a tendency to shut my brain down. However, Richard argues that a lot of people reading the text have no idea what OSGi is and need the verbosity. Not sure, maybe there needs to be an explanation of the problem without the actual arguments or a web page that details what is going on. If you have no clue, I think it is too ambitious to expect an understanding from a mere error message. In the old, long gone days of computing, all error messages had numbers that one could look up, maybe we should start doing that again.

So Richard has taken two shots at it. Now let me provide a variation on the error message that I think would be best. As a strong believer in leveraging the visual brain I think formatting and white space are not a luxury but a necessity. So my message is highly visualized:

[E128] Uses constraint violation in com.acme.importer[11.0] on package 'bar'

A bundle can see classes from the same package via multiple class loaders.
To prevent this case (which results in Class Cast Exceptions), an
exporter package can declare what packages it uses with the uses:
directive. The framework transitively detects cases where a bundle
can be confronted with objects from the same package but coming
from different class loaders. This error indicates that no resolution
could be found and this constraint violation is one of the problems.

In this case com.acme.eleven (module 11.0) sees package 'bar' through
the following 2 dependency chains:

1) com.acme.eleven[11.0]
import: (&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))
|
export: {package=bar, version=1.2}
com.acme.eight[8.0]

2) com.acme.eleven[11.0]
import: (&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))
|
export: {package=foo, version=1.5, uses:=boz}
com.acme.nine[9.0]
import: (&(package=boz)(version>=1.0.0)(!(version>=2.0.0)))
|
export: {package=boz, version=1.5, uses:="bar"}
export: {package=bar, version=1.2}
com.acme.ten[10.0]


Richard has not seen this example but he will undoubtedly react. And we would you to react as well because we just do not know what is best. Richard is willing to change the error messages of the Apache Felix framework if people come up with a better solution than we have now.

Peter Kriens

P.S. Richard became a father yesterday of his second child: Brady William Hall. Congratulations!

9 comments:

  1. Peter,

    I think you are spot on. I really like the layout of the last error message, where you provide some background information and also visually match up imports and exports.

    ReplyDelete
  2. Peter,

    I like the code for the error. I don't like the background info. And I like the import/export graph.

    The "code for error" makes it easier to search for it over the Internet. This is (still) used by databases to report errors, and it's very useful so you can find possible causes and fixes.

    I don't like the background info, because that is just adding (mostly) static information that can be looked up. Only the first line seems mutable, as the "user constraint error" sounds too generic. I'd create something like E128.A for this actual case of user constraint violation. But I still believe that the error should be lookup up in a knowledge base (and/or Internet) for further information.

    The explanatory part is MUCH useful, because it makes it MUCH easier to see the conflict (I guess I'm as visual as you). That includes the "In this case..." and the import/export graph, but I'm just afraid how it would behave in a long chain of dependencies... I guess we would need a parser to get your graphical graph from the text file!

    Thanks for your good post,
    Leo.

    ReplyDelete
  3. I think you are working in the right direction, but you fail to lever the power of the Google search on a short but unique error message. I think this is what all developers do when seeing an error message.

    Maybe setup and include a unique URL for each problem, where the uninitiated can find background info and possible resolutions. Maybe something for ?lanetOSGI.

    ReplyDelete
  4. I like it, too. However I also believe the specific error code should be something that makes it easy to Google for. On my blog some posts get huge amounts of hits, because people are searching for a rare but very specific exception text, making my solution the top hit in Google. So I second Maarten's idea.

    But additionally I would rephrase the first line like this:

    [E123.A] 'uses' constraint violated in ....

    When I first read your phrasing - and I know OSGi - it read like something *was using a constraint violation* (whatever that would mean), but it took my brain a second to re-arrange the sentence and identify the meaning. Just a minor change, but IMHO could make a big difference in intelligibility.

    ReplyDelete
  5. We can improve it further by getting rid of the ldap qualifiers, at least for disjunctive normal form.

    Consider these two fragments:

    (&(package=bar)(version>=1.0.0)(!(version>=2.0.0)))

    package=bar && version >= 1.0.0 && version < 2.0.0

    Which would an average Java developer understand more quickly?

    ReplyDelete
  6. IMHO, a textual representation is a bad choice when explaining (dependency) graphs. Have you ever seen a teacher or professor explaining graph problems (exclusively) by writing text on a blackboard?

    Another idea:
    * reduce the textual output intended for direct human consumption to a minimum (error code plus one short sentence).
    * generate a machine-readable description of the problem, including the part of the dependency graph which fails to comply to all the constraints
    * print a http link to a web page where the user can paste this description to get a full explanation of the problem.

    This service should be a simple web page, where you copy/paste the textual output generated by the resolver, then press "submit". The service parses the description and generates a web page with
    - a graphical dependency graph
    - a prominent visual hint which dependency is causing problems
    - in *addition*, a textual problem statement

    The service could be made a standard bundle which is deployed in each OSGi runtime by default, or at least *also* at some prominent public location, so the resolver could include a link to either the public "explain this error" web page, or to the one run locally.

    ReplyDelete
  7. Displaying the information in LDAP filter syntax is hard for humans.

    Why not:

    import: foo; version=[1.0.0,2.0.0)

    instead of:

    import: (&(package=foo)(version>=1.0.0)(!(version>=2.0.0)))

    package= is redundant information and breaking the version range into the LDAP filter syntax is confusing.

    This seems to be leaking internal constraint modeling details into the error message.

    ReplyDelete
  8. You may like to compare a similar uses constraint error message produced by Eclipse Virgo as we put quite a bit of effort into readability:

    org.eclipse.virgo.kernel.osgi.framework.UnableToSatisfyBundleDependenciesException: Unable to satisfy dependencies of bundle 'a' at version '0.0.0': Cannot resolve: a
    Resolver report:
    Uses violation: <Import-Package: p; version="0.0.0"> in bundle <a_0.0.0[1295255517816]>
    Found conflicts:
    package 'r_0.0.0' in bundle 'c_0.0.0[59]' used by 'p_0.0.0' in bundle 'b_0.0.0[60]'
    conflicts with 'r_1.0.0' in bundle 'd_0.0.0[58]' imported by bundle 'a_0.0.0[1295255517816]'

    Also, the explanation "A bundle can see classes from the same package via multiple class loaders." is, of course, misleading as the uses constraint violation prevents resolution and so the inconsistent class space is never created.

    ReplyDelete
  9. The important thing to remember here is that we are discussing the exact content of an exception message, so I question how useful the formatting will be.

    Regarding LDAP filters, they are definitely not ideal, but they aren't terrible either. And given that they are the Felix resolver's internal model then that's all there is (there is no manifest format in memory). In short, the LDAP filters will have to stay...which is only particularly bad for version ranges.

    An alternative is to not show the actual offending requirements, so you won't know "the" requirement you'll just know "a" requirement was the issue. However, I find that for debugging purposes it is better to know the requirement, since often it is because attributes or version ranges are incorrect.

    ReplyDelete