Alan Kay is the inventor of Smalltalk, the first truly object-oriented language. I learned Smalltalk in the early eighties, and almost every day that I use Java I grit my teeth that James Gosling did not steal more ideas from Smalltalk. About 20 years ago, at an OOPSLA, Alan Kay presented the idea that data should always carry its own methods to access that data. His example was a tape (!) that would contain the data as well as the code to interpret that data.
I think this idea was very much at the core of the Java Management standard first proposed around 1997. Each device would have a Java VM on board, and the management system could send little management programs that would be executed on the device. However good it sounded at the time (and I tried to push this idea in Ericsson), the idea never became successful. It was just too complicated, error prone, and risky to make it work reliably on a larger scale, where machines have different versions and are implemented in more languages than I could ever learn in this life. Exchanging, or relying on, arbitrary code between loosely coupled machines turned out to be a surprisingly bad idea. Objects, however useful they are in many places, seem to get more and more in the way when you build larger distributed systems.
The reason that objects are so ill suited to go outside their
process is that leaving the process forces them to expose their innards, the very
thing objects try so hard to hide. Even if we could encapsulate the
data during the transition, as Alan Kay suggested, we would create a huge burden on the receiver
to understand (and trust) the code that encapsulates the data. We would also create a huge dependency problem: the code provided with the data must actually run correctly on the receiver.
There has always been an impedance mismatch between persistence and
object orientation. JPA does a decent job, but there is something fishy
when you need such huge, complicated, and performance-intensive
middleware only to simplify the life of the developer. Recently I've
been doing some more thinking about this subject, and I think that although
objects work beautifully in a single process, they are ill suited for anything
that involves crossing the process boundary, which obviously includes
persistence.
Last week, during an OSGi EG conference call, the problem came up again in the discussion of a specification: do we support serialization for some of the domain objects or not? What is often not realized is that a serialization format is a public interface. Because it is shared with the world, it is not an internal implementation detail. This is the essence of modularity: there is an inside and there is an outside. Only what is on the inside can be changed freely. What has escaped from the inside must be evolved carefully (and thus more expensively), since its dependencies are unknown.
The problem is acute with interface-based programming. Two systems running a service defined in interface S (maybe separated in time) that need to communicate their domain objects can only do so if the specification for S defines a serialization format. Putting a serialVersionUID in an interface is a total waste of bits (although it does occur!). The only solution that I see is to make
the marshalling a first-class citizen in the contract, since the data
representation is part of the public API.
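The remark about a serialVersionUID in an interface can be checked with a small sketch. The names Order, OrderImpl, and SuidDemo are hypothetical and only serve the illustration. The assumption, based on the java.io.Serializable documentation, is that a serialVersionUID only counts for the class that declares it, so a constant inherited from an interface is ignored when the implementation class is serialized.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical domain interface that tries to pin the serialization version.
interface Order extends Serializable {
    long serialVersionUID = 42L; // implicitly public static final, but only a constant of Order

    String id();
}

// Hypothetical implementation class; it declares no serialVersionUID of its own.
class OrderImpl implements Order {
    private final String id;

    OrderImpl(String id) {
        this.id = id;
    }

    @Override
    public String id() {
        return id;
    }
}

final class SuidDemo {
    private SuidDemo() {}

    // Returns the serialVersionUID that Java serialization actually uses for a class.
    static long effectiveSuid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared in interface: " + Order.serialVersionUID);
        System.out.println("used for OrderImpl:    " + effectiveSuid(OrderImpl.class));
    }
}
```

The second number is the default value computed from the shape of OrderImpl, so any change to the implementation class can still break compatibility, whatever the interface says.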
However, what format should be used? The standard Java serialization format is quite awkward to parse for anything but the implementation classes themselves. There is good old XML, but JSON is increasing in popularity, and there are enough other serialization standards out there to fill books. SQL is also a kind of serialization format. Picking one without making others unhappy will be hard.
I've come to the conclusion that the best format is actually ... Java. I started to use what I call data classes. These are classes with only public fields of primitives (or their wrappers), strings, data classes, and collections or arrays of data classes. This subset is very easy to (un)marshal with almost any available marshalling technique, using simple rules and reflection. These data classes can act as a very convenient schema for my public interface to other processes, including me in the future (a.k.a. persistence). Since they are part of the Java type system they are easy to use, and the compiler can do a lot of sanity checking on the types. And they can easily be versioned in OSGi.
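A minimal sketch of how such a data class could be marshalled with simple rules and reflection. Everything named here (Person, Address, DataMarshaller, toJson) is hypothetical and not part of any OSGi or Java API. JSON is picked only as one possible target, fields are sorted by name to get a stable output, and the sketch assumes there are no cycles between data objects. Unmarshalling would follow the same rules in reverse and is left out.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;

// Hypothetical data classes: public fields only, no behavior.
class Address {
    public String city;
    public int zip;
}

class Person {
    public String name;
    public int age;
    public Address address;
    public String[] nicknames;
}

// Illustrative marshaller (an assumption, not a real library).
final class DataMarshaller {
    private DataMarshaller() {}

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        try {
            write(sb, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("not a data class: " + e.getMessage(), e);
        }
        return sb.toString();
    }

    // One rule per kind of value allowed in a data class.
    private static void write(StringBuilder sb, Object v) throws IllegalAccessException {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v);
        } else if (v instanceof CharSequence || v instanceof Character) {
            quote(sb, v.toString());
        } else if (v instanceof Collection) {
            writeList(sb, ((Collection<?>) v).toArray());
        } else if (v.getClass().isArray()) {
            Object[] items = new Object[Array.getLength(v)];
            for (int i = 0; i < items.length; i++) {
                items[i] = Array.get(v, i); // boxes primitive elements
            }
            writeList(sb, items);
        } else {
            writeFields(sb, v); // anything else is treated as a nested data class
        }
    }

    private static void writeList(StringBuilder sb, Object[] items) throws IllegalAccessException {
        sb.append('[');
        for (int i = 0; i < items.length; i++) {
            if (i > 0) sb.append(',');
            write(sb, items[i]);
        }
        sb.append(']');
    }

    private static void writeFields(StringBuilder sb, Object v) throws IllegalAccessException {
        Field[] fields = v.getClass().getFields(); // public fields only
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        sb.append('{');
        boolean first = true;
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue; // constants are not data
            if (!first) sb.append(',');
            first = false;
            quote(sb, f.getName());
            sb.append(':');
            write(sb, f.get(v));
        }
        sb.append('}');
    }

    private static void quote(StringBuilder sb, String s) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        sb.append('"');
    }
}
```

The point of the sketch is that the marshaller needs no knowledge of Person or Address. The public fields are the schema, so the same data classes could be fed to an XML or SQL mapping with a different set of rules.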
The data classes are a solution to a problem I see becoming prevalent. They go against pure object orientation, but I honestly do not see another solution; the shared code model just does not work very well. Sad, but I think it is time to declare defeat. Maybe Java 8 should not steal from Smalltalk, but steal the struct from C?