Tuesday, March 20, 2012

Coordinator

Last year we introduced the Coordinator in the Compendium 4.3. Unfortunately, the 4.3 release was held up over some legal issues. However, it will soon be available, in the 4.3 Compendium as well as in Enterprise 5.0.

The Coordinator service is a bit of my baby. When we started with OSGi almost 14 years ago, one of the first things I proposed was to start with a transaction manager. I had just read Transaction Processing by Gray & Reuter in three days and was thrilled; it was the best non-fiction book I had ever read. The ACID properties were obviously interesting, and it was very informative to see how they could be implemented, but the most exciting part was the coordination aspect of transactions. Transactions, as described in this seminal book, allow different parties (the resource managers) to collaborate on a task without prior knowledge of each other. A resource manager, when called, can discover an ongoing transaction and join it. The transaction then guarantees it a callback before the task is finished. This is of course a dream in a component model like OSGi, where you call many different parties, most of which you know nothing about. Each called service can participate in the ongoing task and be informed when the task is about to finish. When I proposed a transaction manager, the guys around the table looked at me warily and then ignored me: transactions in an embedded environment?

We got very busy, but I kept nagging and the rest kept looking at me as if I was silly in the head. In 2004 I finally wrote RFC 98, a modest proposal for a light transaction manager. Of course I immediately ran into the objection that, even though few if any had used it, there was an already existing Java Transaction API (JTA). Prosyst did some work on this since they saw some value, but in the end it was turned into a full-blown JTA service. This was not what I wanted, because JTA is weird (from my perspective then): it distinguishes too much between the container people and the application people. OSGi is about peer-to-peer; containers are about control from above. You can try to act like a resource manager with XA (which would give you the coordination aspects), but it looks like it was made difficult on purpose.

Then it hit me: I always ran into opposition because I used a name that too many people associated with heaviness and complexity. Though a full-blown, distributed, high-performance, robust transaction manager is, to say the least, a non-trivial beast, I was mostly interested in the coordination aspects inside an OSGi framework, a significantly simpler animal. So I chose to change the name. The Coordinator service was born!

The first RFC was a very simple thread-based Coordinator service. When your service is called, you can get the current Coordination (or you can start one). Each thread has its own current Coordination. Anybody can then join the Coordination by giving it a participant. The Coordination can either fail or succeed, after which all the participants are called back and informed of the outcome. Anybody that has access to the Coordination can fail it. The Coordination will also fail with a timeout if it is not terminated before a deadline.

So how would you use it? The following code shows how a Coordination is started:


  Coordinator coordinator = ...

  // Name plus timeout in milliseconds; 0 means no timeout.
  Coordination c = coordinator.create("work", 0);
  try {
    doWork(c);
  } catch (Throwable t) {
    c.fail(t);   // mark the Coordination as failed, keeping the cause
  } finally {
    c.end();     // throws if the Coordination failed, so the failure is not lost
  }

This template is very robust. The Coordination is created with a name and a timeout. The work is then done in a try/catch/finally block. The catch block fails the Coordination. Calling end on a failed Coordination throws an exception, so the original exception does not get lost.

A worker would do the following to participate in the Coordination:

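  Coordinator coordinator = ...

  void foo() {
    doPrepare();
    // addParticipant returns false when there is no current Coordination,
    // in which case the work is finished right away.
    if (!coordinator.addParticipant(this))
      doFinish();
  }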
 
A worker can use the Coordinator service to add itself as a participant. It is then guaranteed to get a callback when the current Coordination is terminated.

An example use case is batching a number of updates. You can often optimize significantly by delaying communication and sending a number of updates as one batch. However, how do you know when you have seen the last update, so that you can send the batch? Without a Coordination, the updates are done immediately; with a Coordination, they can be batched until the Coordination is terminated.
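
The batching idea can be sketched in a few lines. To keep the sketch runnable on its own it does not use the OSGi API: MiniCoordination is a hypothetical, single-threaded stand-in that only models participants, fail, and end, and BatchingSender with its sent list is likewise made up for illustration. Treat it as a rough model of the mechanics, not as how a real Coordinator is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Coordination: collects participants and calls them back once. */
final class MiniCoordination {
    /** Callback shape modeled on the text: each participant is told the outcome. */
    interface Participant {
        void ended();   // the coordinated work finished normally
        void failed();  // the coordinated work was failed
    }

    private final List<Participant> participants = new ArrayList<>();
    private Throwable failure;
    private boolean terminated;

    void addParticipant(Participant p) {
        if (terminated) throw new IllegalStateException("already terminated");
        if (!participants.contains(p)) participants.add(p); // join at most once
    }

    /** Fails the work; returns false if it was already terminated. */
    boolean fail(Throwable cause) {
        if (terminated) return false;
        terminated = true;
        failure = cause;
        for (Participant p : participants) p.failed();
        return true;
    }

    /** Ends the work; throws with the original cause if the work was failed. */
    void end() {
        if (failure != null) throw new IllegalStateException("coordination failed", failure);
        if (terminated) throw new IllegalStateException("already ended");
        terminated = true;
        for (Participant p : participants) p.ended();
    }
}

/** Worker that batches updates while a coordination is active and sends immediately otherwise. */
final class BatchingSender implements MiniCoordination.Participant {
    final List<List<String>> sent = new ArrayList<>(); // each element models one send
    private final List<String> pending = new ArrayList<>();

    void update(String value, MiniCoordination current) {
        pending.add(value);
        if (current == null) {
            flush();                      // no coordination: send right away
        } else {
            current.addParticipant(this); // send later, from ended()
        }
    }

    @Override public void ended() { flush(); }
    @Override public void failed() { pending.clear(); } // drop the unsent batch

    private void flush() {
        if (pending.isEmpty()) return;
        sent.add(new ArrayList<>(pending));
        pending.clear();
    }
}
```

Two updates made with a MiniCoordination followed by end() result in a single send containing both values, while the same two updates made with null result in two separate sends. What to do on failure is the worker's own decision; dropping the pending batch is just one choice.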


During the specification process a number of features were added: direct Coordinations (not thread based), a variable store per Coordination, and a reflective API.
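
The variable store is easiest to picture as a map that lives as long as the Coordination. In the OSGi API it is exposed through Coordination.getVariables() as a Map<Class<?>, Object>; in the sketch below a plain HashMap stands in for it so the code runs without a framework. The names VariableStoreSketch, CallCount, and recordCall are hypothetical and only serve the illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the "variable store per Coordination" idea using a plain map. */
final class VariableStoreSketch {
    /** Private state one participant keeps for the duration of a Coordination. */
    static final class CallCount {
        int value;
    }

    /**
     * Counts calls within one Coordination. The map stands in for the
     * Coordination's variables; keying by a class we own avoids clashing
     * with entries stored by other, unknown participants.
     */
    static int recordCall(Map<Class<?>, Object> variables) {
        CallCount count = (CallCount) variables.computeIfAbsent(
                CallCount.class, key -> new CallCount());
        return ++count.value;
    }

    public static void main(String[] args) {
        Map<Class<?>, Object> variables = new HashMap<>(); // one map per Coordination
        recordCall(variables);
        System.out.println(recordCall(variables));
    }
}
```

Because the state hangs off the Coordination and not off the service, a worker that is called from several Coordinations at once does not have to maintain its own bookkeeping per Coordination.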

I guess it will take some time before the Coordinator is really taken advantage of since the model is quite foreign to most developers. However, I am convinced that the service is really what OSGi is all about: coordinated collaborative components.

Peter Kriens

3 comments:

  1. See also http://njbartlett.name/2012/03/21/are-we-there-yet.html

  2. Nice post Peter!
    Are the workers responsible to implement the rollback process?

    Cheers,

    Bruno

  3. You won't get any free lunch. You can find out how the task ended, whether it failed or succeeded. The rest is up to you. Not the gold standard of ACID, but actually quite nice for virtually no cost.
