Monday, April 12, 2010

Calling your cake and sending it too

During the last EEG meeting in Mountain View at LinkedIn in March we discussed the next phase in Distributed OSGi: asynchronous messaging. With the Remote Service Admin specification we have an elegant model for handling the distributed topology of a cluster of systems but this model is based on synchronous calls to a service, like:

Baz n = service.foo( bar )

Synchronous function calls are very simple to use because the answer is returned inline on the same thread. This model of computing allows you to store the state on the stack, which is efficient and handy. However, in a distributed environment your thread will block for billions of instructions until the return comes in from the remote system. Threads are relatively expensive resources and it is a pity to let them go to waste idling. Moreover, if you have to program in a concurrent environment, a lot of the advantages of synchronous calling seem to disappear. For example, you must be very careful not to hold locks when you call a remote service, for it is easy to create deadlocks.

The alternative is messaging. With messaging you create a message (some object) and call a send method on some distribution provider. For example, in JMS there is a send() method that can take a Message, where the message object can contain arbitrary data. The receiver of the message then can send zero or more responses back. The sender can receive this through a proprietary callback mechanism or message queue.
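
As a rough illustration of this style, here is a minimal in-process sketch. The Channel class and its send method are hypothetical stand-ins invented for this example, not the JMS API: the sender hands over a message and returns immediately, and zero or more responses arrive later through a callback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical in-process stand-in for a distribution provider; NOT the JMS API.
class Channel<M, R> {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<M, List<R>> receiver;

    Channel(Function<M, List<R>> receiver) {
        this.receiver = receiver;
    }

    // Returns immediately; responses are delivered to the callback on the worker thread.
    void send(M message, Consumer<R> onResponse) {
        worker.execute(() -> receiver.apply(message).forEach(onResponse));
    }

    // Stops accepting messages and waits for the queued ones to be processed.
    boolean close() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class MessagingSketch {
    static List<String> demo() {
        List<String> responses = Collections.synchronizedList(new ArrayList<>());
        // The receiver answers every message with two responses.
        Channel<String, String> channel =
            new Channel<>(msg -> Arrays.asList(msg + ":ack", msg + ":done"));
        channel.send("declaration-1", responses::add); // does not block on the receiver
        channel.close();
        return new ArrayList<>(responses);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the sender never holds a thread while the receiver works; a persistent queue could be slotted in behind send without either side noticing.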

Programs that are based on asynchronous messaging are highly scalable, are easier to make deadlock free, and are more extensible. For example, persistent queues are transparent for the sender and receiver but can provide some very interesting reliability characteristics to the system. In the early OSGi days I wrote an OSGi test framework that used synchronous calls from the GUI to the framework. After struggling with this model for some time I gave up and went to asynchronous messaging, and I remember it felt like a dry warm towel after a heavy water-boarding session.

A big advantage for the OSGi Alliance is that services are a very convenient way to write their specifications using Javadoc. Message-based APIs are not nearly as easy to document. Also, in many cases the synchronous way of calling methods is by far the most efficient when the method is in the same process. With distributed OSGi, we are often not aware in our code that a service is remote. For the best of both worlds we'd like to be able to call a service both synchronously and asynchronously. There are actually different solutions that mix the idea of a synchronous call with asynchronous processing of the result value.

The simplest solution is to adapt the API to handle the asynchronous return value. Google Web Toolkit uses this model extensively for its remote calls from the web browser to the backend. The basic API is defined with an interface but in reality the caller passes a callback object in every call. The following example shows two interfaces, first the normal and second the adapted version. The caller of the second declare method passes an object that is called back when the result comes in.


interface Tax {
    Return declare( Declaration decl );
}
interface ServiceAsync {
    void declare( Declaration decl, AsyncCallback<Return> result );
}

An alternative is the use of the Java 5 Future interface. A Future is an object that is immediately returned as the result of a synchronous call and can be used to get the result later asynchronously. Futures also require the adaptation of the API to reflect the asynchronous nature:

interface ServiceFuture {
    Future<Return> declare( Declaration decl );
}
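
To make the caller's side concrete, here is a minimal sketch of how such a Future-returning interface might be used. The Declaration and Return classes and the "tax is a tenth of the income" rule are placeholders invented for this example; the work is simply handed to a standard ExecutorService.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureSketch {
    // Placeholder domain types, invented for this sketch.
    static class Declaration {
        final int income;
        Declaration(int income) { this.income = income; }
    }

    static class Return {
        final int due;
        Return(int due) { this.due = due; }
    }

    interface ServiceFuture {
        Future<Return> declare(Declaration decl);
    }

    static int demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The work runs on the pool; the caller gets a Future straight away.
            ServiceFuture service = decl -> {
                Callable<Return> work = () -> new Return(decl.income / 10);
                return pool.submit(work);
            };
            Future<Return> pending = service.declare(new Declaration(50000));
            // ... the caller is free to do other processing here ...
            return pending.get().due; // blocks only now, and only if the result is not ready
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```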


Though these solutions are simple and provide type safety, they are kind of ugly because they violate one of the important rules of modularity: cohesion. The interface that was very much about the domain now mixes in concerns of distribution. What does declaring taxes have to do with the callback? These two aspects are unrelated and it is usually not a good idea to mix them in API design.

An alternative solution is provided by ECF, the Eclipse Communication Framework. They defined an IRemoteService that takes an IRemoteCall and an IRemoteListener parameter. The IRemoteCall object specifies the remote procedure call with parameters: a Method object, the actual parameters, and a timeout. The Remote Listener is called when the result comes in. This is an intriguing solution because it allows a call to any service, even a local one. The callee is oblivious of the whole asynchronicity; it is always called in a synchronous way. This solution is quite transparent for the callee but very intrusive for the caller, because it is too awkward to use from normal code as long as Java does not provide pointers to methods. It is only useful for middle-ware that is already manipulating Method objects and parameter lists.
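
To show what this style of call looks like from application code, here is a hedged sketch. The Listener interface and the callAsync helper are hypothetical stand-ins invented for this example; they are not ECF's IRemoteCall, IRemoteListener, or IRemoteService, and only mimic the general shape of "method plus arguments plus listener".

```java
import java.lang.reflect.Method;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class ReflectiveCallSketch {
    // Hypothetical stand-in; NOT the ECF listener interface.
    interface Listener {
        void done(Object result, Throwable error);
    }

    interface Player {
        String play(String url);
    }

    // Invokes target.method(args) on another thread and reports the outcome to the listener.
    static void callAsync(Object target, Method method, Object[] args, Listener listener) {
        new Thread(() -> {
            try {
                listener.done(method.invoke(target, args), null);
            } catch (Exception e) {
                listener.done(null, e);
            }
        }).start();
    }

    static String demo() {
        Player player = url -> "played " + url; // the callee is plain synchronous code
        CompletableFuture<Object> result = new CompletableFuture<>();
        try {
            // The caller has to look the method up by name: no compile-time checking
            // of the method name, the argument types, or the result type.
            Method play = Player.class.getMethod("play", String.class);
            callAsync(player, play, new Object[] { "song-1" }, (value, error) -> {
                if (error != null) {
                    result.completeExceptionally(error);
                } else {
                    result.complete(value);
                }
            });
            return (String) result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The callee is untouched, but the caller pays with a method lookup by name, an Object array, and a cast on the result.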

Could we use a synchronous type safe calling model to specify a service but use it in an asynchronous way if the callee (or an intermediate party like a distribution provider) could play along? After some toying with this idea I do think we can actually eat our cake and have it too.

It is always good practice to start with the caller because asynchronous handling of a method call is most intrusive for the caller. Assume Async is the service that can turn the synchronous world asynchronous and the player is a music player that is async aware. That is, the player will finish the song when called synchronously but it will return immediately when called asynchronously. With these assumptions, the code could then look like:

Player asyncPlayer = async.create(player);
URL url = new URL("http://www.sounds.com?id=123212");
Future r = async.call( asyncPlayer.play( url ) );
// do other processing
System.out.println( r.get() );

This looks quite simple and it was actually not that hard to implement (I did a prototype during the meeting). It provides type safety (notice the generic type of the Future). So how could this work?

The heart of the system is formed by the Async service. The Async service can create a proxy to an actual service; this happens in the create method. For each call going through the proxy, the proxy creates a RendezVous object and places it in a thread-local variable before it calls the actual service.

If the callee is aware of the Async service, for example a middleware distribution provider, then it gets the current RendezVous object. This RendezVous object can then be used to fill in the result when it arrives somewhere in the future.

After the asynchronous proxy has called the actual service, the RendezVous object has either been accepted or not been touched. If it was not touched, the proxy has the result and fills it in. If the RendezVous object was accepted by the callee, the RendezVous is left alone; the callee is then responsible for filling in the result.

After the call through the proxy, the client calls the call method. This method takes the (null) result of the invoked method. The reason for this method signature is that it allows the returned Future to be typed correctly. Though the call method verifies that it gets a null (to make sure it is used correctly), it will use the RendezVous object that must be present in the thread-local variable. The previous sequence is depicted in the following diagram:

The proposed solution provides a type-safe and convenient way to asynchronously call service methods. It works both for services (or intermediate layers) that are aware of the asynchronicity and for those that are oblivious of it. For me this is a very nice example of how OSGi allows you to create adaptive code. The scenario works correctly in all cases but it provides a highly optimized solution when the peers can collaborate. And best of all, it is actually quite easy to use.
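
The mechanism described above (proxy, thread-local RendezVous, and the call method) can be sketched roughly as follows. This is not the prototype itself but a minimal reconstruction under assumptions: create takes the interface class as an extra argument, RendezVous is built on CompletableFuture, accept is a hypothetical method name for the callee side, and the Player interface returns a String for the sake of the demo. Because the proxy always returns null, this sketch only works for methods with a reference return type.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

class AsyncSketch {
    interface Player {
        String play(String url);
    }

    // Result holder shared between proxy, callee and caller (assumed shape).
    static class RendezVous extends CompletableFuture<Object> {
        volatile boolean accepted;
    }

    static final ThreadLocal<RendezVous> CURRENT = new ThreadLocal<>();

    // An async-aware callee calls this to take over responsibility for the result.
    static RendezVous accept() {
        RendezVous r = CURRENT.get();
        if (r != null) r.accepted = true;
        return r;
    }

    // Wraps a service in a proxy that sets up a RendezVous for every call.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type, T service) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] { type },
            (proxy, method, args) -> {
                RendezVous r = new RendezVous();
                CURRENT.set(r);
                try {
                    Object result = method.invoke(service, args);
                    if (!r.accepted) r.complete(result); // oblivious callee: proxy fills it in
                } catch (InvocationTargetException e) {
                    r.completeExceptionally(e.getCause());
                }
                return null; // the real result travels through the RendezVous
            });
    }

    // Takes the (null) proxy return value only so that T can be inferred.
    @SuppressWarnings("unchecked")
    static <T> Future<T> call(T mustBeNull) {
        if (mustBeNull != null) throw new IllegalArgumentException("not an async proxy call");
        RendezVous r = CURRENT.get();
        CURRENT.remove();
        if (r == null) throw new IllegalStateException("no pending call");
        return (Future<T>) (Future<?>) r;
    }

    static String demo() {
        // Oblivious callee: plain synchronous implementation.
        Player plain = url -> "played " + url;
        // Async-aware callee: accepts the RendezVous and completes it from another thread.
        Player aware = url -> {
            RendezVous r = accept();
            if (r == null) return "played " + url; // called synchronously, no proxy involved
            new Thread(() -> r.complete("streamed " + url)).start();
            return null;
        };
        try {
            Future<String> a = call(create(Player.class, plain).play("a"));
            Future<String> b = call(create(Player.class, aware).play("b"));
            return a.get() + ", " + b.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the caller code is identical for both players; only the aware one returns before the result exists.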

Peter Kriens

P.S. These are just my ramblings, the post is just one of the many ideas discussed in the OSGi EGs, and it has no official status.

17 comments:

  1. Hi Peter,

    Your post looks very interesting to me because in my open source solution I'm working a lot on building a "messaging solution" on top of OSGi services.

    My solution at the moment is built on top of the OSGi Event Admin service, and I'm using the Distributed Event Admin Service provided by ECF, which has an implementation on top of JMS.

    I've also found some difficulties with that approach; for example, the Event Admin is required to maintain delivery order, and at the moment only simple values should be used in the event objects.

    So my question is whether it makes sense to review and improve the current Event Admin service instead of providing new specifications...

    Any thoughts???

    Andrea Zoppello

  2. Andrea:

    I don't speak for the OSGi alliance or Peter but work for a member company, and have been preparing a very small and lightweight spec proposal to formalize the various efforts of making EventAdmin distribution-aware. This should put an end to the somewhat fragmented efforts that reinvent wheels (IMHO) badly.

    In any case it is important to understand that *eventing is not messaging*, and abusing EventAdmin or any other eventing solution in place of a messaging solution - which provides various acknowledgement modes, one-to-one queues etc. - is only going to end up in disappointment. It is much better to treat possibly unreliable, possibly unordered eventing and "reliable" messaging as different because they really are. To that end the Enterprise Expert Group has also started work on proper JMS integration, which should see some work soon.

    Regarding your last point: extending EventAdmin with new APIs and allowing for complex objects is a tar pit of problems for a multitude of reasons, all of which are too long for a blog response. :)

    best regards

    Holger

  3. Hi Holger,

    Thanks for the response.

    I perfectly understand the difference between "Eventing and Messaging", but leaving the (JMS) queue concept aside for a moment, I think you could agree that the Event Admin spec is quite similar to a publish-subscribe model.

    I think I could agree with you that an "OSGi messaging service" would be the better solution, but as there is none at the moment, I do not think that using Event Admin is as bad as you said.


    The reason I chose Event Admin and implemented some sort of messaging on top of it is that I like my components to rely only on the Event Admin API, leaving the choice of implementation as an implementation detail.

    In my use case both implementations live inside the OSGi container and a service could decide at runtime which implementation to use (the local one or the ecf-jms one).

    Regarding extending the API: I know the class loading policy restrictions in OSGi, and I'm not thinking of using arbitrary objects but simply of having the Event Admin expose some sort of EventBody interface...

    Regarding the ordering, I've raised a bug with the OSGi expert group (after a discussion on the Equinox Bugzilla) asking to add the postUnorderedEvent method...

    Cheers
    Andrea

  4. Andrea,

    excellent discussion! :-)

    What you call "OSGi messaging service" will (at least that's my intention) *not* be a new API; the idea is to layer over EventAdmin exactly the way you describe (very similar to what ECF DEA does). This will need to be a bit extended - similar to the "Remote Service Admin" spec with multiple layers of responsibility, because we probably want to allow for different middleware transports, not just JMS (which is conceptually a bit limited, precisely because it was made for a different purpose).

    The class loading was exactly the issue I referred to - glad you know about the problem. There will be a solution coming soon because it is needed in other places as well (like JMS in order to deal with its ObjectMessage).

    Good to know that you were the one to raise the postUnorderedEvent idea! I agree that it is needed, but having more methods for all combinations of delivery qualities is probably not a scalable approach. After all "sending" itself is a polymorphic action. How the event should be sent (all related QoS) should be per-topic declarative (maybe overridable per event) and transport-dependent. That would allow selectively declaring (un)reliable, (un)ordered, (un)prioritized etc. delivery, or anything else that a topic's distribution provider can do. That way you can have multiple transports, each distributing multiple topics with varying levels of quality.

    We will soon introduce the ideas and concepts to a wider audience. I think it will satisfy all your current and future requirements. :)

    regards
    Holger

  5. Hi Peter.

    Nice post. I appreciate the reference to ECF's work in this area of asynchronous invocation of a (remote or local) service.

    A couple of thoughts/comments: You say that the ECF approach is '...too awkward to use from normal code as long as java does not provide pointers to methods'. I don't deny that the lack of pointers to methods in java makes this approach to async remote method invocation more complex than direct method call, but I would assert that without language-level changes/additions *all* async mechanisms have a similar trade off...and that 'awkward' may be in eye of beholder, to some degree. I believe it's a question of getting the most general mechanisms possible and building APIs that are as simple as possible. Hopefully providing several ways and leaving the door open to further innovation (and possibly language changes).

    For example, in the approach you describe toward the end of your article, you introduce the notion of an 'async service'. I can see the utility in this, but it also introduces complexity...i.e. extra steps of 'create' and 'call'. Further, the handling of method arguments (e.g. id) with a URL introduces quite a lot of complexity (IMHO)...even though currently popular with REST-based rpc...particularly when it comes to complex types (non strings) for parameters.

    It's also possible, btw, to build additional structures on top of IRemoteService methods (i.e. both the async callback and the future). Note that it would be almost trivial to create your async service api using IRemoteService.callAsync. Another idea would be to dynamically construct proxies with methods like

    IFuture asyncFoo(declared foo params);

    and/or

    void asyncFoo(declared foo params, AsyncCallback);

    Again this has additional complexity, but it has some positive characteristics...and since creating proxies is happening for the sync invocation, it can/could happen for async invocation. Note again that something like the above could be fairly easily implemented on IRemoteService.callAsync/1 or callAsync/2.

    In any event (pun intended)...hopefully this helps.

  6. Hi Holger,

    You say: "This will need to be a bit extended - similar to the "Remote Service Admin" spec with multiple layers of responsibility, because we probably want to allow for different middleware transports, not just JMS (which is conceptually a bit limited, precisely it was made for a different purpose)".

    Just want to be clear about ECF...and its implementation of a Distributed EventAdmin...ECF is fully transport independent right now...and the EventAdmin implementation specifically is *not* bound to JMS or any other transport. It *is* bound to a multi-transport API that we lovingly refer to as the 'shared object api' (i.e. org.eclipse.ecf.sharedobject). This API gives us the replication and messaging semantics needed to implement Distributed EventAdmin.

    I do agree, though, that it should ultimately be possible to express message ordering requirements for a pub/sub messaging service (e.g. no ordering, sender ordered, causal ordering, global ordering, etc)...whether extending EventAdmin or not...and be able to get those guarantees.

  7. The Future object looks nice, but unless clients treat them fundamentally different from normal return values, you are still pretty much in the same situation. Using (almost) your words: "Using Future objects, your thread will block for billions of instructions when you call Future.get(), until the return value comes in from the remote systems. Threads are relatively expensive resources and it is a pity they go to waste idling. Anyway, if you have to program in a concurrent environment a lot of advantages of synchronous calling seem to disappear. For example, you must be very careful not to hold locks when you call Future.get() for it is easy to create deadlocks."

  8. Holger,

    It seems we definitely agree on goals... and on the fact that Event Admin is a good base to start...

    Your approach is right, so I appreciate that you're considering all QoS factors to design a complete messaging solution.

    I'll definitely stay in touch about these new "OSGi messaging" features and I'll definitely adapt my code to work with them when they are ready.

    It seems that right now I'm basically implementing a subset of "all messaging stuff" myself.

    Regarding postUnorderedEvent, my request started because without this feature it's not possible to make the event delivery phase very scalable (using a ThreadPoolExecutor). Some people from the Equinox team suggested that I introduce thread pools when sending the event, but this will not work with a distributed implementation...


    Andrea

  9. @Boris,

    I don't think I'm understanding your entire point. It seems to me the introduction of a future (or in the case of equinox...IFuture in org.eclipse.equinox.concurrent and used by IRemoteService.callAsync) *is* to introduce a fundamentally different approach to dealing with blocking call/return semantics...as it is up to the client to determine when to call the (blocking) get(). Are you saying that the 'appropriate' use of IFuture will have to be more clear (which I certainly agree with), or are you also saying that additional mechanism (that is somehow even more different from blocking call/return) will/should be introduced? Or both...or something else? Thanks for any clarification.

  10. @Scott,

    What I was trying to say is that at least one problem with synchronicity remains - calls to Future.get() are deadlock-prone. Also, naive use of Futures (where you call get() at a time convenient for the caller/client) would again lead to a "waste" of OS resources in that a thread would be blocking for an unknown amount of time.

  11. @Andrea
    I think Event Admin is another story than messaging. Its property-based nature makes it unsuitable for general messaging.

    @Scott:
    I think the Async service is significantly easier to use than your IRemoteService from app code unless we have method pointers. Just try to encode the example I give.

    However, IRemoteService is perfectly useful for an Async service implementation and many other middle-ware situations where you already have the method object.

    @Boris
    You point out the problems with a Future. Two things: First, a future is useful because it allows you to start multiple actions that then execute in parallel. I.e. start a, start b, start c. Wait for a, wait for b, wait for c. So there is an advantage. Second, in my prototype I created a RendezVous class that implements Future but adds callbacks for cancel and finish, making it truly asynchronous.

  12. @Peter,

    RE: Async service...I'm not sure I agree about easier...as it seems like a ymmv situation to me.

    FYI...I decided to play around with using java annotation (on java 6) to create an 'async proxy' from an annotated service interface...e.g.

    @AsyncService
    interface IFoo {

    String foo();

    }

    at compile time can create another service interface:

    interface IFooAsync extends RemoteService Proxy {

    IFuture fooAsync();
    and/or
    void fooAsync(ICallback);

    }

    Then at proxy creation/run time IFooAsync can be added to the remote service proxy (and implemented by IRemoteService.callAsync).

    This might simplify the use of asynchronous/non-blocking remote methods for those used to annotation.

  13. @Peter

    As I already said to Holger, I perfectly know the difference between an event system and a messaging system.

    I agree with you that the Event Admin as it is now is probably a better fit for events than for messages, but in my opinion it would be very easy to "extend/complete" the Event Admin to fit both worlds.

    From my experience, Event Admin used with DS and DS Component Factories is very useful, because it allows your service to rely on a very simple API, and if you consider that JMS implementations are available....

    I've already proposed adding the postUnorderedEvent method to the specification..

    The other part that needs improvement is adding "the Message" concept.

  14. @Andrea
    I absolutely 100% do agree that the Event Admin has a very nice API that maps well to messaging. When we work on messaging next month in London, we are sure to take a very deep look at the API for JMS. For example, last week I saw a very elegant solution at a client that used a whiteboard listener for JMS. To receive messages of type X, he registered a service like:

    class FooReceiver implements Receiver<Foo> {
        public void receive(Foo foo) { /* handle the message */ }
    }

    c.registerService( Receiver.class.getName(), new FooReceiver(), null );

    The JMS front end listened to all Receiver services, figured out from the type what type it needed and then called the service. All nicely type safe without an explosion in interfaces. I think such a message receiving service model would be very simple to use; I am sure we will discuss such a model in the OSGi.

    @Scott
    Using apt and annotations to create new interfaces is an interesting variation on RMI but the big problem is that it requires collaboration of the domain interface, which is often not feasible. E.g. OSGi will likely not put such an annotation on the Log Service. And touching people's build is usually not a popular idea anyway :-)

  15. @Peter

    It's of course not necessary to use apt, although given this approach we've just added to ECF remote services, then @AsyncService may be appealing to some (as @WebMethod or @WebService have become for web services)

  16. Interesting post. I just want to add a remark regarding what you say about GWT. I've already done something similar to what you explain here using the Proxy from the java.lang.reflect package, and I think you perhaps did it that way. The thing is, this is impossible to do with GWT, as the GWT compiler 'translates' Java code into JavaScript and consequently introspection is not available. Therefore a simple solution like the one above is much trickier there.

  17. This discussion started a long time ago and until today I couldn't find an official position of the OSGi Alliance.
    So I would like to know: will async method calling be included in an OSGi specification?
