Monday, July 8, 2013

Persistence

My assignment for the OSGi Alliance is to increase adoption by making it easier to get started with OSGi. So I am currently writing a Request For Proposal (RFP), the standard requirements document in the OSGi Alliance. One of its primary parts is the Application Domain. In this section you neutrally describe the current practices; it is used to scope the problem domain and to provide a vocabulary for the subsequent sections and documents.
So last week I started the section on persistence. It is an area that I have rather little experience with, so I welcomed the chance to work with it during my sabbatical. I picked the document-oriented database MongoDB because it felt much easier to work with in an object-oriented environment than relational databases, and I must admit that choice has made me quite happy (except for the lack of transactions). However, it is clear that relational databases are the bread and butter of web applications. I therefore had to look deeper into what's happening in this area.
I then stumbled upon the debate around Java Data Objects (JDO) and the Java Persistence API (JPA). I had the privilege to work with Mike Keith (Oracle, JPA spec lead) on the OSGi JPA specifications, so I knew something about JPA. However, so far I'd never seriously looked at JDO, and actually thought JPA was replacing JDO. Unfortunately, life turned out to be not so simple.
Both JDO and JPA define metadata to store normal Java objects in a database. JDO focuses on any type of persistent store (even S3 seems to be supported) while JPA is only used for relational databases. JDO seems to be a real workhorse that got a new lift when Google selected it for its Google App Engine. Though JPA is the new kid on the block, its decision to limit itself to relational databases seems a severe limitation given the increasing popularity of NoSQL databases like MongoDB, Cassandra, Neo4j, etc. So far, it also looks like JDO is more portable and flexible while JPA seems faster. However, lots of the reported experiences seem to come from toy projects or evaluation projects. Now let's not start a flamewar, but I would love to hear (neutral) experiences of the use of these technologies in real-life-sized projects. Obviously I am extremely interested in how they work in OSGi.

Peter Kriens


16 comments:

  1. For MongoDB in OSGi, take a look at the MongoDB component in Amdatu: http://amdatu.org/components/mongodb.html. We use this in production combined with an Object Mapper (Jackson based in our case, but it works with other mappers as well). The component basically configures the standard Java driver using Configuration Admin.

    On JDO vs JPA, I would definitely vote for JPA. It always struck me as strange that AppEngine went the JDO way (at first), because JPA has been the leading spec for the past several years. JPA is only focussed on relational databases, although EclipseLink, for example, offers some functionality for other data stores as well. I think the different data stores are too different to have one object mapping spec to rule them all. This is also why this was removed from the Java EE 7 scope. Work is still being done in that area in both EclipseLink and Hibernate OGM, however.

    So for relational databases, definitely go for JPA. For other data stores there are no specifications yet, so pick one of the leading object mappers (e.g. Morphia or Mongojack for MongoDB).

  2. Hi Peter,

    OSGi + JPA work fine together in the projects I'm participating in. EclipseLink was chosen as the JPA implementation (JPA RI, JPA 2.1 API). The projects do not use any OSGi Enterprise API implementations (like Apache Aries or Eclipse Gemini) and integrate with the JPA API at a lower level (we have more control and flexibility to manage EntityManagers).

    Best regards,
    Dmytro

  3. @Paul: I think I listed your arguments but reading up on it I can't escape the feeling that JDO is undervalued ... especially now with NoSQL becoming more popular. It is kind of scary that JDO seems to be a superset of JPA, even for relational.

    Replies
    1. You could be right about JDO being undervalued, I'm not sure because I didn't use it outside the AppEngine context.

      In the pursuit of a more streamlined development experience, I think the focus should be on technology that most people know about (and that's definitely JPA in this discussion). Moving from JPA to JDO would be a show stopper for a lot of people (even if JDO might be technically superior).

  4. According to the OSGi Enterprise spec, a persistence bundle is required to contain all the entities that the persistence unit it defines refers to.
    This is a drawback of OSGi + JPA since it prevents independent deployment and container independence.

    Replies
    1. a) It seems good practice anyway to keep your domain objects close to your persistence XML since they are highly coupled?
      b) I fail to see why you would not have container independence? Can you elaborate?

    2. A persistence bundle for OSGi is container independent, of course. Maybe container independence is the wrong word here. My problem is as follows: We have an application that contains several business modules. Each module has its own jar containing the entities. The product delivered to the customer contains all the modules for that customer and a persistence.xml. All entities from the different module jars are then contained in the persistence unit.
      This is standard JPA stuff and I cannot reuse it in OSGi. That's why I said that OSGi prevents container independence.
      Btw: I just saw that Cristiano Gavião is talking about the same problem in the next posting.

    3. It sounds like very bad practice since you now have 'hidden' dependencies; the persistence.xml now contains class names that are not present.

      Possible solution: Use a bundle for the persistence.xml and then put the domain objects in fragments.

    4. Not necessarily: the JPA implementation may try to discover entities through annotation scanning. In that case, there are no class names in the persistence.xml. IIRC this was something I never got working with JPA and OSGi; I always had to list the classes explicitly.

    5. @Peter The solution with fragments would work. But I'm using my domain objects in different products and each product has its own persistence bundle. I would need fragments for each product :(
      What I would like to have is:
      1. Each business module has its bundle containing the entities. No dependency to any persistence bundle.
      2. There is a product specific persistence bundle that contains the persistence.xml. The Import-Package declaration lists all (entity) packages that are needed by the persistence bundle (No hidden dependencies thanks to OSGi)
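
As an illustration of the layout wished for in the comment above, here is a minimal sketch (not taken from the thread) of the manifest such a product-specific persistence bundle might carry, built with the JDK's `java.util.jar.Manifest`. The bundle symbolic name and the entity package names are hypothetical. `Meta-Persistence` is the header the OSGi JPA Service specification uses to mark a persistence bundle; whether the specification or a given JPA provider accepts entities that live outside that bundle is exactly the open question in this discussion, so the sketch only shows how the dependencies would become visible through `Import-Package`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class PersistenceBundleManifest {

    /** Builds the manifest of a hypothetical product-specific persistence bundle. */
    static Manifest build(String symbolicName, String... entityPackages) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // The Manifest.write() Javadoc requires Manifest-Version to be set first.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", symbolicName);
        main.putValue("Bundle-Version", "1.0.0");
        // Marks this bundle as a persistence bundle and points at its descriptor.
        main.putValue("Meta-Persistence", "META-INF/persistence.xml");
        // Entity packages come from the business module bundles, so the
        // dependency is declared explicitly instead of being hidden.
        main.putValue("Import-Package", String.join(",", entityPackages));
        return manifest;
    }

    /** Renders the manifest as text (long header lines are wrapped by the JDK). */
    static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Manifest manifest = build(
                "com.example.product.persistence",   // hypothetical name
                "com.example.billing.entities",      // hypothetical package
                "com.example.orders.entities");      // hypothetical package
        System.out.print(render(manifest));
    }
}
```

With this layout each product would ship its own small persistence bundle, while the business module bundles stay unaware of any persistence unit.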

  5. I have been using Gemini JPA + EclipseLink for a while. I like the way the EMF/EMFB services are published on behalf of the PU bundle using the extender pattern.
    But the problem I see in the current JPA spec is that all entities must come from one single bundle. We could use fragments to extend an existing PU, but fragments are not a good fit for PaaS.
    In our case, the customer can choose which business modules he wants to pay for, so I must deliver only the domain entities for those modules. Currently I have to do that at build time instead of at runtime.
    Another point that deserves attention in new specs is defining the proper setup and use of JPA services in the context of a subsystem.

    Replies
    1. What is the problem with fragments in PaaS?

    2. Well, mainly because fragments don't have their own lifecycle, so I don't have module independence.
      In this case I have to create one PU bundle for the entire system, set it using config admin (that is cool) and put all entities of each business module inside its own fragment. But if I need to do any maintenance in only one module, I have to stop/refresh all others together...

  6. Hi Peter,

    My main issue with OSGi JPA is related to database connection configuration.
    In J2EE you can easily define a JDBC datasource on the server and let your application only know the JNDI name.
    This of course has the advantage that you can change the database driver, connection parameters, pooling configuration, statement cache etc without having to change your application.
    In OSGi I could not find a configuration approach that was equally convenient. When done the OSGi-JPA way, my company system administrators complained that configuring the application required the skill set of a developer.

    If I remember correctly, when using Eclipse Virgo and EclipseLink, you need:
    * Gemini DBAccess that publishes the database driver in the registry as a DataSourceFactory
    * Gemini JPA that reads your persistence.xml to figure out which JDBC driver you require, locates the appropriate DataSourceFactory by driver name and finally configures EclipseLink

    I really dislike the need to declare the driver name in your persistence.xml. Plus, if I am not mistaken, the specification even states that you could include the JDBC driver in your bundle.
    Both are bad practices: my application consists of about 20 persistence units and supports a handful of different database systems.
    I don't like the idea of having to edit all the persistence.xml files and repackage the bundles to change the database server, while in JEE you just change the DataSource definitions on the server.

    Plus, I could not find an easy way to tweak the datasource configuration (e.g. connection pool size, time to live etc).

    Because of the above configuration issues (and also due to a couple of bugs in Gemini JPA and EclipseLink when used together), I ended up using Tomcat JNDI support in Virgo and
    I passed the data source obtained via JNDI to EclipseLink programmatically, without using Gemini JPA and Gemini DBAccess.
    Therefore, I am not using a pure OSGi approach and as such I have lost portability across OSGi runtimes.

    GianMaria.
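
The workaround described in the comment above (obtain the data source from JNDI and hand it to the JPA provider programmatically) might look roughly like the following sketch. It is not GianMaria's actual code. The JNDI name and the persistence unit name are hypothetical, and the sketch assumes the provider accepts a `DataSource` instance under the `javax.persistence.nonJtaDataSource` property, which should be checked against the provider's documentation. Only JDK classes are used; the call into JPA is shown as a comment.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.sql.DataSource;

final class ExternalDataSourceConfig {

    // JPA 2.x property name for a non-JTA data source.
    static final String NON_JTA_DATASOURCE = "javax.persistence.nonJtaDataSource";

    /** Looks up a DataSource that the server administrator defined in JNDI. */
    static DataSource lookup(Context context, String jndiName) throws NamingException {
        return (DataSource) context.lookup(jndiName);
    }

    /**
     * Builds the property map that overrides whatever connection settings
     * the persistence.xml contains. Assumption: the provider accepts a
     * DataSource instance (not only a JNDI name) under this key.
     */
    static Map<String, Object> jpaProperties(DataSource dataSource) {
        Map<String, Object> properties = new HashMap<>();
        properties.put(NON_JTA_DATASOURCE, dataSource);
        return properties;
    }

    // Usage inside a runtime that provides JNDI (names are hypothetical):
    //
    //   DataSource ds = lookup(new InitialContext(), "java:comp/env/jdbc/appDS");
    //   EntityManagerFactory emf =
    //       Persistence.createEntityManagerFactory("appPU", jpaProperties(ds));
}
```

The point of the design is that driver, URL, pool size and similar settings stay in the server's data source definition, so the persistence.xml files do not need to be edited or repackaged when the database changes.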


    Replies
    1. Though the blog post was about JPA versus JDO, it seems people are having problems with JPA. I 250% agree with your objections, and I am pretty sure you could set the JDBC driver outside the persistence XML. I will investigate and write a blog about OSGi and JPA, hoping to show how it can be done well.

  7. I don't have any experience with JDO, only with JPA. Both standards are compromises in a way that they abstract too many things. In my point of view:

    JPA only supports RDBMS, but hides SQL

    You've blogged about this yourself. JPA creates an impedance mismatch between relational data models and OO models. Considering that JPA focuses on RDBMS only, this is really a problem, because:

    - You cannot 100% map relational data to objects and vice versa
    - You cannot really use SQL (and I don't mean the occasional plain SQL SELECT. I'm talking about sophisticated UPDATE, MERGE statements, etc)

    JDO supports any data store, pretending they're "similar"

    With JDO, of course, you cannot really use SQL either in case you're using JDO with an RDBMS. But things get worse:

    JDO will eventually run into issues similar to those of Microsoft LINQ or other querying APIs that pretend that all data stores are alike. They're not. Abstracting a SQL or non-SQL DML will inevitably reduce the target store's functionality to a mediocre subset of its original power. If you're choosing MongoDB only to hide it behind some generic API, then you're not taking full advantage of MongoDB.
