Wednesday, November 6, 2013

The Transaction Composability Problem

Entering the enterprise world from an embedded background feels a bit like Alice must have felt when she entered Wonderland. Sometimes you feel very big, other times you feel awfully small. Worse, you often do not know your relative size in that new and wondrous world. One of these areas for me is persistence and its associated transaction model.
The reason I have to understand this area better is that for the enRoute project we will have to provide a persistence model that makes sense in a world built out of reusable components. If these components are closely tied to a specific database, JPA, and transaction manager implementation, then reuse will be minimal, defeating the purpose. There are many issues to solve, but analyzing this landscape one thing keeps popping up: the transaction composability problem, which is quite severe in a reusable component model like OSGi.
Transactions provide the Atomic, Consistent, Isolated, and Durable (ACID) properties to a group of operations. This grouping is in general tied to a thread. A method starts a transaction and subsequent calls on that thread are part of the grouping until the transaction is either committed or rolled back. The easy solution is to start a transaction at the edge of the system (RMI, servlet, queue manager, etc.) and roll back or commit when the call into the application code returns. However, transactions are tied to locks in the databases (and other resource managers), so it is crucial to minimize the number of grouped operations to increase throughput and reduce deadlocks. Generating HTML inside a transaction, for example, can seriously reduce throughput. Therefore, application developers need to handle transactions in their code.
One of the issues these developers must handle is how to treat an existing transaction. Should they join it or suspend it? Is being called outside a transaction allowed? Since methods can be called in many different orders and from many different places, it is very hard to make assumptions about the current transaction state in a method. For example, methods foo() and bar() can each begin a transaction, but then foo() cannot call bar(), or vice versa, because the callee would try to begin a transaction while the caller's is still active.
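
The foo()/bar() situation can be sketched with a toy, thread-bound transaction manager. Everything here (`ToyTx`, `Accounts`) is a hypothetical stand-in rather than a real transaction API, and it assumes a manager that rejects a nested begin().

```java
// Toy stand-in for a thread-bound transaction manager; NOT a real JTA API.
final class ToyTx {
    private static final ThreadLocal<Boolean> ACTIVE = ThreadLocal.withInitial(() -> false);

    static void begin() {
        if (ACTIVE.get()) {
            throw new IllegalStateException("transaction already active on this thread");
        }
        ACTIVE.set(true);
    }

    static void commit() {
        if (!ACTIVE.get()) {
            throw new IllegalStateException("no active transaction");
        }
        ACTIVE.set(false);
    }

    static void rollback() {
        ACTIVE.set(false);
    }

    static boolean isActive() {
        return ACTIVE.get();
    }
}

final class Accounts {
    // bar() works fine when it is the first method to touch the transaction ...
    static void bar() {
        ToyTx.begin();
        try {
            // ... database work would go here ...
            ToyTx.commit();
        } catch (RuntimeException e) {
            ToyTx.rollback();
            throw e;
        }
    }

    // ... but foo() cannot reuse it, because bar() insists on its own begin().
    static void foo() {
        ToyTx.begin();
        try {
            bar(); // fails: begin() is called while foo()'s transaction is active
            ToyTx.commit();
        } catch (RuntimeException e) {
            ToyTx.rollback();
            throw e;
        }
    }
}
```

Calling `Accounts.bar()` on its own succeeds, while `Accounts.foo()` fails inside `bar()` and rolls back. Each method is correct in isolation, yet the two do not compose.
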
The corresponding complexity and associated boilerplate code resulted in declarative transactions. Annotations provide the suspend and joining strategy and something outside the application takes care of the details. EJB and Spring containers provide this functionality by proxying the instances or weaving the classes.
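
As a rough sketch of what "something outside the application" does with such a strategy, the toy interceptor below decides whether to join, suspend, or reject based on a declared policy. The names `TxPolicy` and `TxInterceptor` are hypothetical; the three policies loosely correspond to what EJB and Spring call REQUIRED, REQUIRES_NEW, and MANDATORY, and transactions are reduced to integer ids on a per-thread stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical policy names for the join/suspend decision.
enum TxPolicy { JOIN_OR_BEGIN, SUSPEND_AND_BEGIN, MUST_EXIST }

// Toy interceptor: roughly what a container-generated proxy does around an annotated method.
final class TxInterceptor {
    // Transactions bound to the current thread: the head is current, the rest are suspended.
    private static final ThreadLocal<Deque<Integer>> STACK = ThreadLocal.withInitial(ArrayDeque::new);
    private static final AtomicInteger NEXT_ID = new AtomicInteger(1);

    // Id of the transaction the current thread runs in, or null if there is none.
    static Integer current() {
        return STACK.get().peek();
    }

    static <T> T invoke(TxPolicy policy, Supplier<T> body) {
        Deque<Integer> stack = STACK.get();
        boolean active = !stack.isEmpty();
        if (policy == TxPolicy.MUST_EXIST && !active) {
            throw new IllegalStateException("caller must provide a transaction");
        }
        if (active && policy != TxPolicy.SUSPEND_AND_BEGIN) {
            return body.get(); // join: whoever began the transaction ends it
        }
        // Begin a new transaction; an outer one (if any) stays suspended below it.
        stack.push(NEXT_ID.getAndIncrement());
        try {
            return body.get(); // a real interceptor would commit or roll back here
        } finally {
            stack.pop(); // the suspended outer transaction becomes current again
        }
    }
}
```

With this in place, a method no longer needs to know its caller's transaction state: the policy attached to it (by annotation or otherwise) tells the interceptor what to do.
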
Back to OSGi. Since transactions will cross cut through all components we need to define a single transaction model that can be shared by all, just like the service model.
Obviously, we will need to register a Transaction Manager service so the different components can at least manage transactions together. However, do we need to prescribe a declarative model, since this seems to be the best practice? Do we then pick EJB or Spring annotations? Support both? Or make new ones? Or do we stay silent on this and allow others to provide this support with extenders? This would be similar to the service model: Blueprint, iPOJO, Dependency Manager, and DS are all in business to make life easier for code using services. Could a similar model be followed for transactions?
I am very interested in hearing feedback on this.


  1. Rather don't. Seriously, trying to compose transactions is more pain than it's worth. The places where I have seen it tried have almost always ended up dumping the transaction composing because the performance is horrible. Also, almost nothing out there handles all of the edge cases, which means that sooner or later you end up having to hand-code recovery, which means you wasted all of that initial effort.

  2. So what you're saying is that you want to manually handle transactions?

  3. Based on the e-mails we have exchanged in the last couple of days/weeks, I am adding my thoughts here as well:

    If there is going to be a "standard" for how transactions should be handled in the OSGi world, please leave out annotation-based transaction handling. It is used everywhere, so why leave it out?

    First, there is the solution where the technology wraps the service class with a Java proxy. It intercepts every call on the interface, and if there is an annotation it handles transactions. Oops, there is a problem: what if a function call is made within the object? That call will not be intercepted.
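
    A minimal sketch of this limitation, using `java.lang.reflect.Proxy`. The `@Tx` annotation, `OrderService`, and `TxProxy` are hypothetical names made up for the illustration; the "transaction handling" is reduced to log entries.

    ```java
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Proxy;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical marker annotation, standing in for a real transactional annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Tx {}

    interface OrderService {
        @Tx void place();
        @Tx void audit();
    }

    final class OrderServiceImpl implements OrderService {
        @Override public void place() {
            audit(); // plain call on `this`: it never passes through the proxy
        }
        @Override public void audit() { }
    }

    final class TxProxy {
        // Records where a transaction would have been started and ended.
        static final List<String> LOG = new ArrayList<>();

        static OrderService wrap(OrderService target) {
            return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                (proxy, method, args) -> {
                    boolean tx = method.isAnnotationPresent(Tx.class);
                    if (tx) LOG.add("begin " + method.getName());
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow what the target actually threw
                    } finally {
                        if (tx) LOG.add("end " + method.getName());
                    }
                });
        }
    }
    ```

    After `TxProxy.wrap(new OrderServiceImpl()).place()`, the log contains entries for `place` only. The annotated `audit()` ran, but without any interception.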

    Second, technologies start using ASM or Javassist to be able to intercept every function call, even the ones within the object. They generate a subclass and instantiate that instead. Oops, there is a problem: what if I wanted to annotate a private function within the class with @Transaction? It will not be handled by the subclass.

    Third, technologies start doing weaving and modify the code of the original class during class loading. Wow! Are we sure we want this? Just check the startup time of a JEE server. Glassfish deployment is much slower than deployment on a simple Tomcat. Why? Because during the deployment lots of annotation scanners run, and there is weaving as well. In my opinion, the biggest mistake of the OSGi specification (there are not many :)) was that weaving became possible in version 4.3. People start implementing annotation scanners and weaving in their technologies, and the OSGi container soon becomes slow and bulky. The programmer will not know what code runs on the server because the code is modified at runtime...

    What can we use then? I think the logic that there are transaction brackets, with a commit on a normal return and a rollback on a RuntimeException, is cool. It makes programming much easier. Before Java 8 we can do this with anonymous classes passed to a TransactionHelper function or with an API similar to that of ReentrantLock. Is it really a pain to write three more lines when we want to do transaction handling? With Java 8 we will have lambda expressions and the code will look nicer.
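
    A minimal sketch of such a helper, shown with both an anonymous class and a lambda. Only the name TransactionHelper comes from the comment; `TxWork`, `TxControl`, `RecordingTx`, and the method `inTransaction` are assumptions made for the illustration, and no real transaction manager is involved.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical callback type; java.util.function.Supplier would also do on Java 8+.
    interface TxWork<T> {
        T run();
    }

    // Hypothetical minimal view of whatever actually owns the transaction.
    interface TxControl {
        void begin();
        void commit();
        void rollback();
    }

    final class TransactionHelper {
        private final TxControl tx;

        TransactionHelper(TxControl tx) {
            this.tx = tx;
        }

        // Transaction bracket: commit on normal return, roll back on RuntimeException.
        <T> T inTransaction(TxWork<T> work) {
            tx.begin();
            T result;
            try {
                result = work.run();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
            tx.commit();
            return result;
        }
    }

    // Records calls so the bracket can be observed without a database.
    final class RecordingTx implements TxControl {
        final List<String> events = new ArrayList<>();
        @Override public void begin() { events.add("begin"); }
        @Override public void commit() { events.add("commit"); }
        @Override public void rollback() { events.add("rollback"); }
    }

    final class TransactionHelperDemo {
        public static void main(String[] args) {
            RecordingTx tx = new RecordingTx();
            TransactionHelper helper = new TransactionHelper(tx);

            // Pre-Java 8 style: anonymous class.
            String a = helper.inTransaction(new TxWork<String>() {
                @Override public String run() { return "saved"; }
            });

            // Java 8 style: lambda.
            String b = helper.inTransaction(() -> "saved again");

            System.out.println(a + ", " + b + ", " + tx.events);
        }
    }
    ```

    The bracket is explicit at the call site, so there is no proxy or weaving involved, and it works the same for private methods and calls within the object.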

    Another use-case that should be considered during the design of transaction handling in an application: database clusters often work with a master installation and synchronized replicas. We can insert and update the data on the master, while reading can be done on one of the replicas. This is especially useful if we do FTS (full-text search) with the database engine. How do we decide whether we should go to the master or to a replica? If there is an ongoing transaction, we go to the master; if not, we go to a replica. In a web application, for most of the screen rendering we do not even need a transaction.
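
    The routing rule can be sketched as follows. `ReadWriteRouter` and its methods are hypothetical, the targets are plain strings instead of data sources, and round-robin over the replicas is an assumption.

    ```java
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical router: master inside a transaction, a replica otherwise.
    final class ReadWriteRouter {
        private final ThreadLocal<Boolean> inTransaction = ThreadLocal.withInitial(() -> false);
        private final AtomicInteger nextReplica = new AtomicInteger();
        private final String master;
        private final List<String> replicas;

        ReadWriteRouter(String master, List<String> replicas) {
            if (replicas.isEmpty()) {
                throw new IllegalArgumentException("need at least one replica");
            }
            this.master = master;
            this.replicas = replicas;
        }

        // Called by whatever starts and ends transactions on this thread.
        void begin() { inTransaction.set(true); }
        void end() { inTransaction.set(false); }

        // Ongoing transaction: use the master. Otherwise rotate through the replicas.
        String target() {
            if (inTransaction.get()) {
                return master;
            }
            int i = Math.floorMod(nextReplica.getAndIncrement(), replicas.size());
            return replicas.get(i);
        }
    }
    ```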

    And as you wrote: transactions should not be long. In one project the transaction was handled in a filter, so everything was in a transaction, even the rendering of the page, which was 95% of the query processing time. The system could have been killed easily, as many of the queries could not run in parallel. Except for some special cases (uploading a large blob or something), I would say that the transaction timeout should not be longer than 1 second. That really forces the programmers to come up with an efficient design.

  4. Noel, can you please expand on your comments 1) "dumping the transaction composing because the performance is horrible" and 2) "sooner or later you end up having to hand-code recovery"?

    I don't really care what solution (but I think there should be one) is implemented as long as it is simple to use. One line would be preferable (as is currently the case with Spring annotated transactions), but I suppose three may be okay.

    Balázs, you highlighted a couple of good points regarding some problems with the 'out of the box' Spring proxy solution, where a function call is not intercepted within another function. It would be great if the new solution did not suffer from the same restrictions.

    I am not so concerned with speed of deployment vs simplicity of use. I think there will be more acceptance of a solution that is simple to use vs trading off speed of deployment, especially considering many will compare the new solution with an existing solution e.g. a one line annotation, so an increase in complexity will probably not be accepted well.

    Balázs, the example you cited where queries could not be run in parallel probably has more to do with the architecture of the database: most MVCC relational databases (e.g. Oracle, Postgres) will not block on a read when using read committed isolation, whereas SQL Server and Sybase can (or at least did in the past).

    Where I really do disagree strongly is putting any restriction on a maximum transaction timeout. How long a transaction is open should be dependent on the use case. For example it is easy to envisage a process running asynchronously with respect to the user's action performing many long running statements updating/inserting/deleting millions of rows with a desire to do this within one transaction.

  5. If a database accepts 10 parallel connections from an application, it is really important to close a transaction as soon as possible. With a managed connection pool the connection is held until the transaction is finished. If there is a read query on the database and after that we have some HTML rendering that takes 200ms, that connection will be locked from parallel usage for that whole time. I do not think the solution is to allow 200 times more connections on the database side.
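
    A back-of-the-envelope sketch of this effect: with a fixed pool, the request rate is bounded by the pool size divided by how long each request holds a connection. The 10 connections and 200 ms of rendering come from the text above; the 10 ms query time and the names below are assumptions for the illustration.

    ```java
    // Upper bound on throughput when every request holds one pooled connection.
    final class PoolThroughput {
        static double maxRequestsPerSecond(int poolSize, double holdMillis) {
            return poolSize * 1000.0 / holdMillis;
        }

        public static void main(String[] args) {
            double queryMillis = 10.0;   // assumed time spent in the database
            double renderMillis = 200.0; // HTML rendering time from the example
            // Connection held for query + rendering versus for the query only.
            System.out.println(maxRequestsPerSecond(10, queryMillis + renderMillis));
            System.out.println(maxRequestsPerSecond(10, queryMillis));
        }
    }
    ```

    Under these assumptions the bound is about 48 requests per second when rendering happens inside the transaction, against 1000 when the connection is released right after the query.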

    In web applications, if there is CRUD or some similar functionality, I think it is helpful for the lead developer to reduce the transaction timeout to 10ms. When a developer runs into the problem that his/her code exceeds the timeout, they can do a code review together to find the causes. In many cases a bit of refactoring will help. There are of course situations when a longer transaction timeout is necessary, e.g. when we upload a large blob into the database. Whether we should store the blob in the database at all is another question.

    If there are millions of rows to change, it should be done (if possible) in packets, e.g. 100 records per packet, so that each packet fits into a low transaction timeout. In many cases a bit of refactoring can help, like adding a new field with a state saying that the rows are selected for processing, to be sure that the million rows are not modified until each packet is processed. On the million rows a bulk update runs fast enough.
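
    A minimal sketch of the packet approach. `PacketProcessor` is a hypothetical helper, and the `inOneTransaction` callback stands for "run this packet in its own short transaction"; how that transaction is started is left open.

    ```java
    import java.util.List;
    import java.util.function.Consumer;

    final class PacketProcessor {
        // Splits the items into packets and hands each packet to the callback.
        // Returns the number of packets processed.
        static <T> int process(List<T> items, int packetSize, Consumer<List<T>> inOneTransaction) {
            if (packetSize <= 0) {
                throw new IllegalArgumentException("packetSize must be positive");
            }
            int packets = 0;
            for (int from = 0; from < items.size(); from += packetSize) {
                int to = Math.min(from + packetSize, items.size());
                inOneTransaction.accept(items.subList(from, to)); // one short transaction per packet
                packets++;
            }
            return packets;
        }
    }
    ```

    With 250 record ids and a packet size of 100, the callback is invoked three times, with 100, 100, and 50 ids. Note that the overall operation is then no longer atomic, which is why the state field mentioned above is needed.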

  6. If we do anything with a connection within a transaction, the connection cannot be re-used until the transaction is finished. We can help with a managed connection pool so the same connection will be provided within the same transaction if available. If the transaction is kept open unnecessarily long, it will reduce the parallel processing of the application. Let's say we have a connection pool with 8 connections. The server will not be able to process more than 8 queries in parallel even if 100 parallel HTTP queries are allowed. What should we do then? Allow hundreds of connections to the database for each application? I do not think so.

    In my opinion it is really good to reduce the transaction timeout to 1 sec or less (if possible). If a developer runs into the problem that his code exceeds the timeout, it is a good reason to have a code and design review. Most of the time, after a code review, the application speeds up a lot and there are no such problems anymore.

    There are of course use-cases, when longer transactions are necessary. E.g.: If we store a large object (blob) in the database, upload and download takes time. In this case the transaction timeout can be longer.

    If there are millions of records that should be updated, it is often possible to re-design the schema a bit to handle this requirement. For example, a flag can be used in the table that tells the application that the records are selected for later processing, so nobody should update them. With this solution, it is possible to process the data in packets (100 records in a transaction) that fit in a reasonable transaction time.

    No-SQL databases flourished in the last few years. I think that there are use-cases where they are useful. However, in many cases developers want to change to no-SQL because they wrote bad quality code and kept long-running transactions. With no-SQL the problem is "solved" as they will not be able to use long-running transactions. There will always be available connections in the pool.

  7. Tim, Peter, when you do transactions like this you tend to have two things happening

    (a) your transaction boundaries become unpredictable, which leads to hard-to-debug performance issues because some random component can sometimes take a very long time.

    This however, is merely painful, and is not as big a problem as

    (b) you need to use 2-phase commit. Now 2PC is one of those deceptive things that doesn't look terribly hard to handle. So various people try to implement it, and it will mostly seem to work in local testing. But they invariably get it wrong when handling the edge cases, which will eventually lead to occasional partially committed data.
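
    To illustrate one such edge case, here is a deliberately naive coordinator of the kind that "seems to work in local testing". All names (`Participant`, `NaiveTwoPhaseCommit`, `FakeResource`) are hypothetical, and real resource managers have more states and failure modes than shown.

    ```java
    import java.util.List;

    // Hypothetical resource-manager view for the sketch.
    interface Participant {
        boolean prepare(); // vote: true means ready to commit
        void commit();
        void rollback();
    }

    final class NaiveTwoPhaseCommit {
        static boolean run(List<Participant> participants) {
            // Phase 1: every participant must vote yes.
            for (Participant p : participants) {
                if (!p.prepare()) {
                    for (Participant q : participants) {
                        q.rollback();
                    }
                    return false;
                }
            }
            // Phase 2: no durable log is written, so a failure in this loop leaves
            // earlier participants committed and later ones prepared but undecided.
            for (Participant p : participants) {
                p.commit();
            }
            return true;
        }
    }

    // In-memory participant that can be told to fail during commit.
    final class FakeResource implements Participant {
        String state = "idle";
        private final boolean failOnCommit;

        FakeResource(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        @Override public boolean prepare() { state = "prepared"; return true; }
        @Override public void commit() {
            if (failOnCommit) {
                throw new IllegalStateException("connection lost");
            }
            state = "committed";
        }
        @Override public void rollback() { state = "rolled back"; }
    }
    ```

    Running the coordinator over a healthy resource followed by one that fails on commit ends with the first in state "committed" and the second still "prepared". Without a recovery log, nothing records that the second one must still be committed.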

    So my opinion is that transactions should never cross the API boundary between OSGi-level components, unless the component is itself exposing a transaction API.

  8. I'm a little bit late to the discussion but I think the problem has not been solved yet. We had a project based on Gemini Blueprint last year. It was really nice to simply put @Transactional on all the methods that needed a transaction. The downside was that all implementation bundles had a dependency on org.springframework.transaction. We tried to do the transaction demarcation in Spring/Gemini configuration files and to put these files in extra configuration bundles. But that didn't work.
    I'd like to see a solution that reuses existing technologies (like Spring) but keeps the vendor/container specific stuff out of the implementation bundles.