Monday, September 30, 2013

Baselining, Semantic Versioning Made Easy

Versioning is one of those things where everybody has a general idea but few really understand it well, resulting in many different and sometimes bizarre practices. The semantic versioning movement put the version syntax on a more solid footing, creating a version Domain Specific Language (DSL) to signal backward compatibility. It uses a 3-part version, where the first part (MAJOR) signals breaking changes, the second part (MINOR) signals backward compatible changes, and the third part (MICRO/PATCH) signals bug fixes that are not visible in the public API. For example, an artifact with version 1.2.3 has the same API as 1.2.4, its clients will keep working with 1.3.0, and they will break with 2.0.0. By using semantic versioning you pledge that in the future you will use this DSL to signal backward compatibility so that tools can point out breakage or select compatible components. Semantic versions are a big step in software engineering.
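To make the three parts concrete, here is a minimal sketch in plain Java. The class `SemVer` and its method `signalTo` are names made up for this illustration, they are not part of bnd or OSGi, and the sketch ignores qualifiers such as SNAPSHOT.

```java
/** Minimal MAJOR.MINOR.MICRO version; illustrative only, not a bnd or OSGi class. */
final class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;
    final int micro;

    SemVer(int major, int minor, int micro) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
    }

    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.MICRO: " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(micro, other.micro);
    }

    /** What a move from this version to a newer (or equal) version signals to existing clients. */
    String signalTo(SemVer next) {
        if (next.compareTo(this) < 0) {
            throw new IllegalArgumentException("next must not be older than this version");
        }
        if (next.major != major) return "BREAKING";
        if (next.minor != minor) return "BACKWARD_COMPATIBLE";
        if (next.micro != micro) return "BUG_FIX";
        return "IDENTICAL";
    }
}
```

With this sketch, going from 1.2.3 to 1.2.4 is reported as a bug fix, to 1.3.0 as backward compatible, and to 2.0.0 as breaking, which mirrors the example in the text.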

So any decent software engineer will agree that semantic versioning is good; being able to watch Maven Central close up, I can also see that it has become widely used over the past 2 years. That said, how much work is it for a developer to maintain these versions? Developers are rightly lazy people, and versions are error-prone and complicated to maintain without tool support. To minimize the work, I've used the OSGi semantic version rules extensively in bnd. If you compile against an API then you are bound to a range of versions. For example, if you compile against an API with version 1.2.3 then bnd will calculate the corresponding import range: [1.2.3,2). (Actually it is a bit more subtle, see the OSGi semantic version paper.)
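The simplified rule from the example above can be sketched as follows. This is not bnd's actual code: the class `ImportRange` and its methods are hypothetical, and the sketch deliberately leaves out the subtleties the semantic version paper covers.

```java
/** Simplified import range: from the compiled-against version up to, but excluding, the next major. */
final class ImportRange {
    // Parses up to three numeric parts; missing parts default to 0.
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    /** Textual range in the usual interval notation, e.g. [1.2.3,2). */
    static String rangeFor(String compiledAgainst) {
        return "[" + compiledAgainst + "," + (parse(compiledAgainst)[0] + 1) + ")";
    }

    /** True when the candidate version falls inside the range derived from the compiled-against version. */
    static boolean accepts(String compiledAgainst, String candidate) {
        int[] floor = parse(compiledAgainst);
        int[] ceiling = { floor[0] + 1, 0, 0 };
        int[] version = parse(candidate);
        return compare(version, floor) >= 0 && compare(version, ceiling) < 0;
    }
}
```

The square bracket means the lower bound is included and the parenthesis means the upper bound is excluded, so a client compiled against 1.2.3 accepts 1.3.0 but not 2.0.0 or 1.2.2.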

Though bnd had the tools to maintain your semantic versions, and with them you pledged how things would be updated, it never checked whether those pledges were actually kept. If you forgot to change a version after a code change then all bets were off. Since humans are really bad at versions and developers rarely know all the compatibility rules, there were many errors.

Meet baselining

When you enable baselining, bnd will baseline the new bundle against the last released non-snapshot bundle, a.k.a. the baseline. That is, it compares the public exported API of the new bundle with the baseline. If there are any changes it will use the OSGi semantic version rules to calculate the minimum new version. If the new bundle has a lower version, a number of errors are generated.
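The idea can be sketched in a few lines of plain Java. This is a strongly simplified illustration, not how bnd implements it: the class `Baseline` is hypothetical, the API is modeled as a set of signature strings, and the sketch ignores the OSGi distinction between consumers and providers of an interface as well as bug-fix-only changes.

```java
import java.util.Set;

/** Toy baseliner: derives the minimum allowed version from a diff of two API signature sets. */
final class Baseline {
    static String minimumVersion(String baselineVersion, Set<String> baselineApi, Set<String> newApi) {
        String[] parts = baselineVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);

        if (!newApi.containsAll(baselineApi)) {
            // Something the baseline offered is gone or changed: existing clients break.
            return (major + 1) + ".0.0";
        }
        if (!baselineApi.containsAll(newApi)) {
            // Only additions: existing clients keep working.
            return major + "." + (minor + 1) + ".0";
        }
        // Same public API: the baseline version is the lower bound.
        return baselineVersion;
    }
}
```

A build step would then compare the version declared on the new bundle with this minimum and report an error when the declared version is lower.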

The first error is at the place where the version is defined. The other errors are on the API changes that caused the version change. Since bndtools runs bnd continuously you have the uncanny effect that adding a method to an interface suddenly generates errors in different places, pointing out that you are trying to make an incompatible change. Quick fixes are then available to bump the version or to remove the offending API change. Detecting errors early is the hallmark of Eclipse and is a great boon to productivity. We all know how much time it saves when you find these bugs while they are being made.

Baselining teaches the actual developers a lot about backward compatibility. After enabling baselining on bnd this weekend I was actually shocked to find that some of the (expected to be) tiny changes I had made in the last three weeks since we froze 2.2 were not as compatible as I thought. (This is another way of saying I had not bumped the appropriate versions.) They were not just bug fixes but actually had API repercussions I had not foreseen. Humbling.

Peter Kriens @pkriens

Tuesday, September 24, 2013

The Magic of Modularity

Anybody who has taken some computer classes over the last 30 years has learned about modularity and should know it is good. However, describing why it is good is always a bit harder. Maybe it is not that hard to understand the benefits of encapsulation because we have all been in situations where we could not change something because it was exposed. However, for me the magic actually appears during design, when you pick the modules and decide about their responsibilities. This is reflected in the seminal paper of David Parnas [1971] called "On the Criteria To Be Used in Decomposing Systems into Modules".

Last week I was designing a function for bnd and there I ran into an example that illustrated very nicely why the decomposition is so important. The problem was the communication with the remote repositories. Obviously, one uses a URL for this purpose since it is extremely well supported in Java, it supports many protocols, and in OSGi it can be easily extended by registering a Stream Handler service. However, security and other details often require some pre-processing of the connection request. For example, HTTP Basic authentication requires a request header with the encoded user id and password. Anybody that has ever touched security standards knows this is a vast area that cannot be covered out of the box; it requires some plugin model. This was the reason we already had a very convenient URLConnector interface in bnd that could translate a URL into an InputStream:

   public interface URLConnector {
      // Opens the given URL and returns its content as a stream.
      InputStream connect(URL url) throws Exception;
   }

Even more convenient, there were already several implementations: one that disabled HTTPS certificate verification and one for basic authentication. It is always so nice when you find you can reuse something.

However, after starting to use this abstraction I found that I was repeating a lot of code in different URLConnector implementations. I first solved this problem with a base class, but then it required extra parameters to select which of the options should be used. And the basic design did not support output (you know you can even send a mail with just a URL?). So after some struggling I decided to change the design and leverage the URLConnection class instead. Though the common use for a URL is to call openStream(), you can actually first get a URLConnection, parameterize it, and then actually open the connection. So instead of a URLConnector interface I devised a URLConnectionHandler interface. This interface had a single method:

   public interface URLConnectionHandler {
      // Adjusts the connection (headers, security settings) before it is opened.
      void handle(URLConnection connection) throws Exception;
   }

Since this interface now specifies a transformation, several handlers can be applied to the same connection, unlike with the URLConnector interface. This enabled me to write a number of tiny adapters that only did one thing and were therefore much simpler and actually more powerful. The user can now specify a number of URLConnectionHandlers for a matching URL. For example, Basic Authentication should in general not be used without HTTPS since it shows the user id and password in clear text. Instead of building this verification into the Basic Authentication plugin, it can now just be selected by the user, so that for another URL the plugin can be used in a different combination.
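As a rough illustration of how such tiny handlers compose, here is a sketch in plain Java. Only the `URLConnectionHandler` interface comes from the text above; the classes `BasicAuthentication`, `HttpsOnly`, and `HandlerChain` are hypothetical names for this example and are not the actual bnd plugins.

```java
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

interface URLConnectionHandler {
    void handle(URLConnection connection) throws Exception;
}

/** Adds an HTTP Basic Authorization header and does nothing else. */
final class BasicAuthentication implements URLConnectionHandler {
    private final String header;

    BasicAuthentication(String user, String password) {
        String credentials = user + ":" + password;
        this.header = "Basic "
                + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void handle(URLConnection connection) {
        connection.setRequestProperty("Authorization", header);
    }
}

/** Rejects any connection whose URL does not use HTTPS. */
final class HttpsOnly implements URLConnectionHandler {
    @Override
    public void handle(URLConnection connection) {
        if (!"https".equalsIgnoreCase(connection.getURL().getProtocol())) {
            throw new IllegalStateException("HTTPS required: " + connection.getURL());
        }
    }
}

/** Applies the handlers selected for a URL, in order, before the connection is opened. */
final class HandlerChain {
    static void apply(List<URLConnectionHandler> handlers, URLConnection connection) throws Exception {
        for (URLConnectionHandler handler : handlers) {
            handler.handle(connection);
        }
    }
}
```

Putting `HttpsOnly` in front of `BasicAuthentication` gives the combination described above, while another URL could use `BasicAuthentication` on its own. Neither handler needs to know about the other.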

After porting the existing functionality of the URLConnector implementations I ended up with significantly less code and much more power, only because the structure was different. That is what I call the magic of modularity.

Peter Kriens @pkriens

P.S. Registered for the OSGi Community Event in Ludwigsburg? I will be giving a talk about Developing Web Apps with OSGi. For Germans, there is also an OSGi bootcamp from the OSGi User's Forum Germany. Advance registration ends Oct 1.

Wednesday, September 18, 2013

OSGi's Popularity in Numbers

Just some interesting statistics that I found by scanning Maven Central. I've got all the metadata in a Mongo database so it is easy to analyze. The current database consists of over 426 thousand JARs organized in more than 46 thousand projects. I have been scanning Maven Central since last year and these numbers seem to have almost doubled, which is a scary thought if this continues at that exponential rate (especially for Sonatype, who seem to pay the bandwidth and storage costs of Maven Central).

Almost 10% of the 46 thousand projects in Maven Central today are OSGi bundles. The most surprising part for me was that the official OSGi Core JAR actually comes in at #36. More than 24 thousand projects depend on it, directly or transitively. It is more popular than dom4j (#43) or Apache Commons Collections (#45).

So what does this ranking number mean? I use an algorithm similar to Google PageRank: a project is more important when it has more inbound Maven dependencies (compile and runtime scope), based on the latest revision of each project. More than half of the 46 thousand projects have a transitive dependency on the OSGi JAR, which is staggering. Staggering because it probably says more about the infectious nature of the Maven dependency model than about OSGi's popularity.
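The exact algorithm is not spelled out here, so the following is only a sketch of the general PageRank idea applied to a dependency graph. The class `DependencyRank`, the damping factor, and the iteration count are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** PageRank-style scoring: a project gains importance from the projects that depend on it. */
final class DependencyRank {
    /**
     * @param dependencies maps each project to the projects it depends on;
     *                     every project, including leaves, must appear as a key
     */
    static Map<String, Double> rank(Map<String, List<String>> dependencies, int iterations, double damping) {
        int n = dependencies.size();
        Map<String, Double> score = new HashMap<>();
        for (String project : dependencies.keySet()) {
            score.put(project, 1.0 / n);
        }
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String project : dependencies.keySet()) {
                next.put(project, (1 - damping) / n);
            }
            for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
                double share = damping * score.get(entry.getKey());
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    // A project without dependencies spreads its score evenly over all projects.
                    for (String project : dependencies.keySet()) {
                        next.merge(project, share / n, Double::sum);
                    }
                } else {
                    // Otherwise the score is divided over the projects it depends on.
                    for (String target : targets) {
                        next.merge(target, share / targets.size(), Double::sum);
                    }
                }
            }
            score = next;
        }
        return score;
    }
}
```

In a toy graph where two applications depend on a test library and that library in turn depends on a matcher library, the matcher library ends up with the highest score even though fewer projects reference it directly, because importance flows along transitive dependencies.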

There are over 50 projects that contain the package org.osgi.framework. In the overall ranking, Eclipse Equinox comes in at #78 and Apache Felix at #216. That said, Apache Felix also provides a compile-only JAR with the OSGi packages that comes in at #100. When looking at this list it turns out that the OSGi JARs appear in several incarnations. I even found the release 3 JARs: 300k for core and compendium combined. The compendium, a JAR that is only used by projects that are really OSGi, comes in at #50.

The numbers look very good for OSGi and I think it indicates that we will see more and more projects providing OSGi metadata. As David Bosschaert wrote earlier this year, if you need help adding this metadata then let us know.

So which project is #1? I know you've been dying to ask. Well, the top 5 are:

  1. org.hamcrest : hamcrest-core
  2. junit : junit
  3. javax.activation : activation
  4. javax.mail : mail
  5. org.apache.geronimo.genesis.config : logging-config

Peter Kriens @pkriens

Friday, September 13, 2013

Babysteps, the RFP for the Application Framework

The first official step is set for the OSGi Application Framework! In the past weeks I've followed the OSGi process and written a Request for Proposal (RFP). Last week we discussed this in lovely southern England at IBM's Hursley premises. Since the OSGi Alliance recently made the specification process fully open, this RFP is publicly available.

At the combined CPEG/EEG/REG meeting yesterday I spent almost 4 hours mostly talking about this. I first demonstrated the system I developed last year and then segued into lessons learned. Since this was my 'sabbatical' I could do something that is a lot harder when you work for a company. I developed this system from the ground up to be a no-compromise µservice based system. This was fun and proved that µservices work as advertised for real life applications. However, this work also made me aware how hard it is to find the right components for your system. Though popular open source projects have adopted the OSGi headers (thank you! You know who you are.), few projects actually support the µservice model as it was intended. I was therefore forced to develop a lot of base components that just should have been widely available. And even if those components had been there, there is actually too little information about how to architect an OSGi system.

After I had bored everybody for 2.5 hours we went to the RFP. The RFP is very ambitious (and quite large for an RFP). It outlines the scope, which is much, much bigger than what we can do in a short time. We will actually try to provide developers with a complete solution, integrating many best practices. Ambitious, and it will take time, but it is supposed to guide us in the coming years when we work on this project.

I'd love to get feedback, and since this RFP is public you can now actually follow it while it progresses through the organization. So please read it and let me know. You can either react on the blog, mail me, or create issues on the OSGi Bugzilla.

Peter Kriens

Monday, September 2, 2013

Why has OSGi a dynamic model?

OSGi was derived from Sun's Java Embedded Server, a product that had dynamics at its heart. It consisted of a dynamic µservice model with updatable modules. So why did we adopt this seemingly complex model? Well, we could, and at the time we were heavily frustrated with Windows 98, which seemed to require a reboot for every third mouse move. So it seemed utterly stupid to build a static system that required a reboot to update a module or a configuration.

What I had not realized at the time is what a powerful software primitive the µservice model actually was. Once you accept that a service can come and go you need to make it easy to handle this. So we did, with Declarative Services (DS). Once you have a primitive that models dynamic behavior you start to see how dynamic the real world actually is. You also notice that highly complex middleware is built to shield application developers from the facts of life because they are not deemed clever enough to handle dynamics.

Bill Joy once told us (at Ericsson Research) a very inspiring story about the development of the Internet that opened my eyes: how you can get much better quality, for a much lower price, by just accepting failure. Initially, he told us, the Internet was developed with routers that were not supposed to lose a packet, ever. Despite these expensive and highly complex routers the desired quality of the network was not achieved because there were still too many failure modes. The key insight was to accept that it is ok that routers fail. This brought us TCP, the protocol that provides a reliable connection over an unreliable, much simpler, underlying network.

Once you accept that a µservice is frail, you must handle its frailty in your code. If you have DS, this is no work to very little work for a component; DS acts in a similar vein as TCP does. Systems built out of such resilient components are (much) simpler and thus more reliable. Read Antifragile by Taleb if you want to see how nature uses this model pervasively.
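To give a feel for what handling frailty means for a component, here is a sketch in plain Java that uses no OSGi APIs. The `Sink` interface and the `bind`/`unbind` methods are hypothetical stand-ins for a service contract and for the callbacks a runtime like DS would invoke when a service appears or disappears. Buffering messages while the service is away is just one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service contract, e.g. something that stores or forwards messages. */
interface Sink {
    void write(String message);
}

/** A component that keeps working whether or not its Sink service is currently present. */
final class ResilientComponent {
    private Sink sink; // null while no service is available
    private final List<String> pending = new ArrayList<>();

    /** Called by the runtime when a Sink service appears; flushes what was buffered. */
    synchronized void bind(Sink service) {
        sink = service;
        for (String message : pending) {
            service.write(message);
        }
        pending.clear();
    }

    /** Called by the runtime when that Sink service goes away. */
    synchronized void unbind(Sink service) {
        if (sink == service) {
            sink = null;
        }
    }

    /** Application code calls this without caring whether the service is there right now. */
    synchronized void report(String message) {
        if (sink != null) {
            sink.write(message);
        } else {
            pending.add(message);
        }
    }
}
```

The application code that calls `report` never sees the service come or go. All the dynamics are confined to two small callbacks, which is the kind of shielding the comparison with TCP is about.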

Once you accept µservices as a primitive they can be used in an amazing number of use cases. In its most basic form it can just be a service abstracting a platform function, like for example logging, that is not likely to go away. It can represent the availability of something, e.g. a Bluetooth device in the neighborhood (of which you can have many). It can represent a configured database connection, a servlet, etc. And the cherry on top is of course that you can now remote a service since the middleware can reliably signal failures, voiding several of the arguments in the Fallacies of Distributed Computing.

When it is easy to work with these dynamics, you start to see more and more use cases. After wading through a very popular open source project last week, I noticed myriad places where µservices could have saved tons of code and would have added functionality. Virtually all software I write today consists of sometimes a small and sometimes sizable module but invariably a module that provides a single service and depends on a handful of services. 

So it is cool to update a module on the fly. However, I find it much cooler how the outside world can change while your system adapts. While I am developing, days can pass without reboots, even though I update components and configurations all the time. Not only is this a wonderfully fluid way to develop, it also ensures your software becomes highly resilient.

Therefore, for me the real innovation of OSGi is the µservices model and paradoxically accepting their low quality of service.