OSGi Blog: Persistence

Monday, July 8, 2013

Persistence

My assignment for the OSGi Alliance is to increase adoption by making it easier to get started with OSGi. So I am currently writing a Request for Proposal (RFP), the requirements document of the OSGi Alliance standards process. One of its primary parts is the Application Domain section. In this section you neutrally describe current practices; it is used to scope the problem domain and to provide a vocabulary for the subsequent sections and documents.
So last week I started on the section about persistence. It is an area in which I have rather little experience, so I welcomed the chance to work with databases during my sabbatical. I picked the document-oriented database MongoDB because it felt much easier to work with in an object-oriented environment than relational databases, and I must admit that this choice has made me quite happy (except for the lack of transactions). However, relational databases are clearly the bread and butter of web applications, so I had to look deeper into what is happening in this area.
I then stumbled upon the debate around Java Data Objects (JDO) and the Java Persistence API (JPA). I had the privilege of working with Mike Keith (Oracle, JPA spec lead) on the OSGi JPA specifications, so I knew something about JPA. However, I had never seriously looked at JDO, and I actually thought JPA was replacing JDO. Unfortunately, life turned out not to be so simple.
Both JDO and JPA define metadata for storing plain Java objects in a database. JDO targets any type of persistent store (even S3 seems to be supported), while JPA is limited to relational databases. JDO seems to be a real workhorse that got a new lease on life when Google selected it for Google App Engine. Though JPA is the new kid on the block, its decision to limit itself to relational databases seems a severe limitation given the increasing popularity of NoSQL databases like MongoDB, Cassandra, Neo4j, etc. So far, JDO also looks more portable and flexible, while JPA seems faster. However, many of the reported experiences seem to come from toy or evaluation projects. Let's not start a flame war, but I would love to hear (neutral) experiences of using these technologies in real-life-sized projects. Obviously I am extremely interested in how they work in OSGi.

Peter Kriens

16 comments:

Unknown, Jul 8, 2013, 8:46:00 AM
For MongoDB in OSGi, take a look at the MongoDB component in Amdatu: http://amdatu.org/components/mongodb.html. We use this in production combined with an object mapper (Jackson-based in our case, but it works with other mappers as well). The component basically configures the standard Java driver using Configuration Admin.

As for JDO vs. JPA, I would definitely vote for JPA. It always struck me as strange that App Engine (initially) went the JDO way, because JPA has been the leading spec for the past several years. JPA is focused only on relational databases, although EclipseLink, for example, offers some functionality for other data stores as well. I think the different data stores are too different to have one object mapping spec to rule them all. This is also why that work was removed from the Java EE 7 scope. Work is still being done in that area in both EclipseLink and Hibernate OGM, however.

So for relational databases, definitely go for JPA. For other data stores there are no specifications yet, so pick one of the leading object mappers (e.g. Morphia or Mongojack for MongoDB).

Unknown, Jul 8, 2013, 8:55:00 AM
Hi Peter,

OSGi + JPA work fine together in the projects I'm participating in. EclipseLink was chosen as the JPA implementation (JPA RI, JPA 2.1 API). The projects do not use any OSGi Enterprise API implementations (like Apache Aries or Eclipse Gemini) and integrate with the JPA API at a lower level (which gives us more control and flexibility in managing EntityManagers).

Best regards,
Dmytro
Peter Kriens, Jul 8, 2013, 9:55:00 AM
@Paul: I think I listed your arguments, but reading up on it, I can't escape the feeling that JDO is undervalued, especially now that NoSQL is becoming more popular. It is kind of scary that JDO seems to be a superset of JPA, even for relational databases.
Unknown, Jul 8, 2013, 11:17:00 AM
According to the OSGi Enterprise spec, a persistence bundle is required to contain all the entity classes referenced by the persistence units it defines.
This is a drawback of OSGi + JPA, since it prevents independent deployment and container independence.
Cristiano Gavião, Jul 8, 2013, 11:56:00 AM
I have been using Gemini JPA + EclipseLink for a while. I like the way the EMF/EMFB (EntityManagerFactory/EntityManagerFactoryBuilder) services are registered on behalf of the PU bundle using the extender pattern.
But the problem I see in the current JPA spec is that entities must come from a single bundle. We could use fragments to extend an existing PU, but fragments are not a good fit for PaaS.
In our case, customers can choose which business modules they want to pay for, so I must deliver only the domain entities for those modules. Currently I need to do that at build time instead of at runtime.
Another point that deserves attention in new specs is defining the proper setup and use of JPA services in the context of a subsystem.
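The extender pattern Cristiano mentions can be sketched in plain Java. Under the OSGi JPA specification, a bundle marks itself as a persistence bundle with the Meta-Persistence manifest header, and the JPA extender reads that header to locate the persistence descriptors it processes before registering EntityManagerFactory services on the bundle's behalf. The sketch below covers only that first, header-reading step using java.util.jar.Manifest. The class and method names are hypothetical, and quoting, header attributes and the spec's rules for the default descriptor location are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

// Hypothetical helper illustrating the header-reading step of a JPA extender.
final class PersistenceBundleScanner {

    // Manifest header defined by the OSGi JPA Service specification.
    static final String META_PERSISTENCE = "Meta-Persistence";

    /** A bundle is treated as a persistence bundle only if it carries the header at all. */
    static boolean isPersistenceBundle(Manifest manifest) {
        return manifest.getMainAttributes().getValue(META_PERSISTENCE) != null;
    }

    /** Returns the descriptor paths listed explicitly in the header (empty if none are listed). */
    static List<String> descriptorPaths(Manifest manifest) {
        List<String> paths = new ArrayList<>();
        String value = manifest.getMainAttributes().getValue(META_PERSISTENCE);
        if (value == null) {
            return paths; // not a persistence bundle; an extender would skip it
        }
        // Simplification: plain comma split, no quoted paths or header attributes.
        for (String entry : value.split(",")) {
            String path = entry.trim();
            if (!path.isEmpty()) {
                paths.add(path);
            }
        }
        return paths;
    }

    /** Parses manifest text; a real extender reads the headers from the installed Bundle instead. */
    static Manifest parse(String manifestText) {
        try {
            return new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private PersistenceBundleScanner() {}
}
```

Because this work is done per bundle, it may also help explain why the spec ties a persistence unit's entities to the bundle that defines it, which is the restriction raised in the two comments above.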
Giamma, Jul 10, 2013, 9:09:00 AM
Hi Peter,

My main issue with OSGi JPA is related to database connection configuration.
In J2EE you can easily define a JDBC datasource on the server and let your application only know the JNDI name.
This of course has the advantage that you can change the database driver, connection parameters, pooling configuration, statement cache, etc. without having to change your application.
In OSGi I could not find a configuration approach that was equally convenient. When I did it the OSGi JPA way, my company's system administrators complained that configuring the application required the skill set of a developer.

If I remember correctly, when using Eclipse Virgo and EclipseLink, you need:
* Gemini DBAccess, which publishes the database driver in the service registry as a DataSourceFactory
* Gemini JPA, which reads your persistence.xml to figure out which JDBC driver you require, locates the appropriate DataSourceFactory by driver name, and finally configures EclipseLink

I really dislike the need to declare the driver name in your persistence.xml. Plus, if I am not mistaken, the specification even states that you could include the JDBC driver in your bundle.
Both are bad practices: my application consists of about 20 persistence units and supports a handful of different database systems.
I don't like the idea of having to edit all the persistence.xml files and repackage the bundles to change the database server, while in JEE you just change the DataSource definitions on the server.

Plus, I could not find an easy way to tweak the datasource configuration (e.g. connection pool size, time to live, etc.).
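As an illustration of the kind of externalized configuration Giamma is asking for, here is a minimal sketch, assuming the OSGi JDBC service (DataSourceFactory) and Configuration Admin. The idea is that an administrator edits a configuration record rather than persistence.xml: the driver class selects which DataSourceFactory service to use, and the remaining keys become the Properties passed to createDataSource. The key names mirror constants defined in org.osgi.service.jdbc.DataSourceFactory; the JdbcConfig class and its methods are hypothetical, and whether a given driver's factory honors the pool settings (or rejects them) depends on the implementation.

```java
import java.util.Dictionary;
import java.util.Properties;

// Hypothetical helper: maps a Configuration Admin style Dictionary onto
// the inputs of the OSGi JDBC service, so no driver name lives in persistence.xml.
final class JdbcConfig {

    // Service property used to select a DataSourceFactory by driver class.
    static final String DRIVER_CLASS = "osgi.jdbc.driver.class";

    // Subset of the standard DataSourceFactory property names.
    static final String[] JDBC_KEYS = {
        "url", "user", "password", "databaseName", "serverName", "portNumber",
        "initialPoolSize", "minPoolSize", "maxPoolSize", "maxIdleTime", "maxStatements"
    };

    /** LDAP filter for looking up the DataSourceFactory registered for the configured driver. */
    static String factoryFilter(Dictionary<String, ?> config) {
        Object driver = config.get(DRIVER_CLASS);
        if (driver == null) {
            throw new IllegalArgumentException("missing " + DRIVER_CLASS);
        }
        // No LDAP escaping: driver class names normally contain only letters, digits and dots.
        return "(" + DRIVER_CLASS + "=" + driver + ")";
    }

    /** Copies only the recognized JDBC keys, as strings, into the Properties for createDataSource. */
    static Properties dataSourceProperties(Dictionary<String, ?> config) {
        Properties props = new Properties();
        for (String key : JDBC_KEYS) {
            Object value = config.get(key);
            if (value != null) {
                props.setProperty(key, value.toString());
            }
        }
        return props; // keys like service.pid and the driver class are left out
    }

    private JdbcConfig() {}
}
```

In a setup like this, a ManagedService in a small infrastructure bundle could call these helpers from its updated(...) method and register the resulting DataSource as a service, so application bundles would not need to name a driver. This is one possible design, not something the OSGi specifications prescribe.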

Because of the above configuration issues (and also due to a couple of bugs in Gemini JPA and EclipseLink when used together), I ended up using Tomcat's JNDI support in Virgo and passed the data source obtained via JNDI to EclipseLink programmatically, without using Gemini JPA and Gemini DBAccess.
Therefore, I am not using a pure OSGi approach and as such I have lost portability across OSGi runtimes.
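The JNDI route Giamma describes could look roughly like the following sketch; it is a guess at the shape of such code, not his implementation. The lookupDataSource and emfProperties names are hypothetical, and the JNDI name is whatever the administrator configured on the server. The key javax.persistence.nonJtaDataSource is a standard JPA property; EclipseLink documents accepting either a JNDI name or a DataSource instance as its value, but that is worth checking against the EclipseLink version in use. The final Persistence.createEntityManagerFactory call needs the JPA API and EclipseLink on the class path, so it appears only as a comment.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical glue: server-managed DataSource in, JPA properties out.
final class JndiPersistenceSetup {

    // Standard JPA property for a non-JTA (resource-local) data source.
    static final String NON_JTA_DATASOURCE = "javax.persistence.nonJtaDataSource";

    /** Looks up a DataSource defined on the server, failing clearly if the name is bound to something else. */
    static DataSource lookupDataSource(Context ctx, String jndiName) throws NamingException {
        Object bound = ctx.lookup(jndiName);
        if (!(bound instanceof DataSource)) {
            throw new NamingException(jndiName + " is not bound to a javax.sql.DataSource");
        }
        return (DataSource) bound;
    }

    /** Builds the properties map that would be handed to the persistence provider. */
    static Map<String, Object> emfProperties(DataSource dataSource) {
        Map<String, Object> props = new HashMap<>();
        props.put(NON_JTA_DATASOURCE, dataSource);
        // With the JPA API and EclipseLink available, roughly:
        // EntityManagerFactory emf = Persistence.createEntityManagerFactory("orders", props);
        // ("orders" is a hypothetical persistence unit name)
        return props;
    }

    private JndiPersistenceSetup() {}
}
```

In a container with a JNDI provider, the caller would pass new InitialContext() and a name such as "java:comp/env/jdbc/orders" (hypothetical). Since only the JNDI name is shared with the application, the administrator can change the driver or pool settings on the server, which is the property Giamma was after, at the cost of the portability he mentions.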

GianMaria.

Anonymous, Jul 21, 2013, 8:03:00 AM
I don't have any experience with JDO, only with JPA. Both standards are compromises in that they abstract away too much. From my point of view:

JPA only supports RDBMS, but hides SQL

You've blogged about this yourself. JPA creates an impedance mismatch between relational data models and OO models. Considering that JPA focuses on RDBMS only, this is really a problem, because:

- You cannot 100% map relational data to objects and vice versa
- You cannot really use SQL (and I don't mean the occasional plain SQL SELECT. I'm talking about sophisticated UPDATE, MERGE statements, etc)

JDO supports any data store, pretending they're "similar"

With JDO, of course, you cannot really use SQL either when you use it with an RDBMS. But things get worse:

JDO will eventually run into the same issues as Microsoft LINQ or other querying APIs that pretend all data stores are alike. They're not. Abstracting over a SQL or non-SQL DML will inevitably reduce the target store's functionality to a mediocre subset of its original power. If you're choosing MongoDB only to hide it behind some generic API, then you're not taking full advantage of MongoDB.