The second OSGi Cloud Workshop was held during EclipseCon/OSGi DevCon 2012 last March. It was a very interesting morning with some good presentations and some great discussion. You can still find the presentations linked from here:
http://www.osgi.org/Design/Cloud.
We learned that people are already widely using OSGi in Cloud environments, and part of the morning was spent discussing what OSGi could do to make it even more suitable for use in the Cloud. As a result, a number of topics were proposed for people active in the OSGi Alliance to look at. You can find a summary of these topics here:
https://mail.osgi.org/pipermail/cloud-workshop/2012-March/000112.html.
Last week the OSGi Enterprise Expert Group and the Residential Expert Group met to discuss these topics and to find potential routes to address them. Below you can find the results of these discussions. In this list I'll start each topic with the requirement as posted earlier to the cloud-workshop mailing list. Each follow-up describes the thinking we arrived at during the recent EEG/REG meeting.
1. Topic: Make it possible to describe various 'service unavailable' states.
A service may be unavailable in the cloud for a variety of reasons:
- Maybe the number of invocations available to you is exhausted for today.
- Maybe your credit card expired.
- Maybe the node running the service crashed.
- etc...
It should be possible to model these various failure states, and it should also be possible to register 'callback' mechanisms that can deal with these states in whatever way is appropriate (blacklist the service, wait a while, send an email to the credit card holder, etc.).
1. Follow-up: A potential new RFP around monitoring and management is under discussion. It is currently being driven in the Residential Expert Group, but it should ultimately be useful in all contexts in which OSGi is run. The requirements in this RFP could address some of the service quality issues referred to in this topic.
Additionally, there was a discussion about whether it would make sense to extend the OSGi ServiceException so that various types of service failures could be reported (e.g. payment needed, quota exceeded, etc.).
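To make this concrete, here is a minimal sketch of what such an extended exception could look like. The subclass and its type codes are invented for illustration; nothing like this has been specified.

    import org.osgi.framework.ServiceException;

    // Hypothetical sketch: a ServiceException subclass with cloud-specific
    // failure types. None of these type codes are standardized.
    public class CloudServiceException extends ServiceException {
        public static final int QUOTA_EXCEEDED = 100;   // today's invocation quota is used up
        public static final int PAYMENT_REQUIRED = 101; // e.g. the credit card expired
        public static final int NODE_UNAVAILABLE = 102; // the node running the service crashed

        public CloudServiceException(String message, int type) {
            super(message, type);
        }
    }

A callback registered for QUOTA_EXCEEDED could then, for example, blacklist the service until the quota resets.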
2. Topic: WYTIWYR (what you test is what you run). It should be possible to deploy and redeploy quickly.
2. Follow-up: One of the requirements this expresses is the need to remotely run a test suite in an existing (remote) framework. There are OSGi test frameworks that support this kind of behavior today (Pax Exam, Arquillian and others), but they may need to be enhanced with a cloud-friendly remote deployment/management solution, for example the REST-based OSGi Framework management being discussed in RFC 182.
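As an illustration, a minimal Pax Exam test could look like the following. The Maven coordinates are made up, and the exact runner and annotation class names vary between Pax Exam versions.

    import static org.ops4j.pax.exam.CoreOptions.junitBundles;
    import static org.ops4j.pax.exam.CoreOptions.mavenBundle;
    import static org.ops4j.pax.exam.CoreOptions.options;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.ops4j.pax.exam.Configuration;
    import org.ops4j.pax.exam.Option;
    import org.ops4j.pax.exam.junit.PaxExam;

    // Runs this JUnit test inside a freshly provisioned OSGi framework.
    @RunWith(PaxExam.class)
    public class GreeterServiceTest {

        // Provision the bundle under test (hypothetical coordinates) plus JUnit.
        @Configuration
        public Option[] config() {
            return options(
                mavenBundle("org.example", "greeter", "1.0.0"),
                junitBundles());
        }

        @Test
        public void greeterServiceResponds() {
            // ... look up the service via the BundleContext and assert on it
        }
    }

Pointing such a test run at a remote, cloud-hosted framework instead of a locally forked one is exactly the gap a cloud-friendly management solution would fill.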
2b. Topic: There was an additional use case around reverting the data (and configuration) changes made during an upgrade. If we need to downgrade after an upgrade, we may need to convert the data/configuration back into its old state.
2b. Follow-up: It might be possible to achieve this by providing an OSGi API to snapshot the framework state. This API could allow the user to save the current state and to retrieve a past saved state. When reverting to a past deployment, this operation could be combined with a pluggable compensation process that converts the data back, if applicable.
The idea of snapshotting the framework state will be explored in a new RFP that is to be created soon. The data compensation process itself is most likely out of scope for OSGi.
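As a rough sketch of the direction, such an API might look like the interface below. All names here are invented; the actual RFP may end up somewhere quite different.

    import java.util.List;

    // Invented for illustration; the real API is yet to be defined in the RFP.
    public interface FrameworkSnapshotService {
        /** Save the current framework state and return an identifier for it. */
        String takeSnapshot();

        /** List the identifiers of previously saved snapshots. */
        List<String> listSnapshots();

        /** Revert the framework to a previously saved snapshot. */
        void revert(String snapshotId);
    }

A revert() call could then trigger any registered compensation handlers to convert the data back to the format the older deployment expects.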
3. Topic: Come up with a common and agreed architecture for Discovery. This
should include consideration of Remote Services, Remote Events and
Distributed Configuration Admin.
3. Follow-up: This is the topic of the new RFC 183 Cloud Discovery.
4. Topic: Resource utilization. It should be possible to measure/report this for each cloud node: number of threads available, amount of memory, power consumption, etc. Possibly create OSGi Capability namespaces for this.
4. Follow-up: This relates to the monitoring RFP mentioned above.
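As far as the Capability namespace idea goes, a node could hypothetically advertise its resources along these lines (the namespace and attribute names are invented; no such namespace is defined today):

    Provide-Capability: com.example.cloud.resource;
        threads:Long=200;
        memory:Long=4096;
        power.watts:Long=150

Requirements expressed against such a namespace could then be used to match deployments to suitably provisioned nodes.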
5. Topic: OBR scaling. Need to be able to use OBR in a highly available
manner. Should support failover and should hook in with discovery.
5. Follow-up: The Repository service as defined in chapter 132 of the OSGi Enterprise R5 spec (see http://www.osgi.org/News/20120326 for instructions on downloading the latest draft) provides a stateless API which can work with well-known HA solutions (replication, failover, etc.). Additionally, the Repository supports the concept of referrals, allowing multiple, federated repositories to be combined into a single logical view.
The discovery piece is part of RFC 183.
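Because the API is stateless, a query like the one below can be answered by any replica of a repository. This sketch uses the Repository API from the current draft; the identity filter is just an example.

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;

    import org.osgi.resource.Capability;
    import org.osgi.resource.Requirement;
    import org.osgi.resource.Resource;
    import org.osgi.service.repository.Repository;

    public class RepositoryQuery {
        // Any replica can serve this call; no session state is involved.
        public Collection<Capability> find(Repository repository) {
            Requirement req = new Requirement() {
                public String getNamespace() { return "osgi.identity"; }
                public Map<String, String> getDirectives() {
                    // Example filter: match a resource by its identity.
                    return Collections.singletonMap("filter",
                        "(osgi.identity=org.example.mybundle)");
                }
                public Map<String, Object> getAttributes() {
                    return Collections.emptyMap();
                }
                public Resource getResource() { return null; }
            };
            Map<Requirement, Collection<Capability>> result =
                repository.findProviders(Collections.singleton(req));
            return result.get(req);
        }
    }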
6. Topic: We need subsystems across frameworks. Possibly refer to them as
'Ecosystems'. These describe a number of subsystems deployed across a
number of frameworks.
6. Follow-up: While the general usefulness of this isn't disputed, nobody is driving it at this point in time. If people strongly feel it should be addressed, they should come forward and help define a solution.
7. Topic: Asynchronous services and asynchronous remote services.
7. Follow-up: This is the topic of RFP 132, which was recently restarted. RFP 132 is purely about asynchronous OSGi services. Once this is established, asynchronous remote services can be modeled as a layer on top.
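Purely as an illustration of the concept (RFP 132 hasn't settled on a design), an asynchronous variant of a service could hand back a Future instead of blocking:

    import java.util.concurrent.Future;

    // Illustration only; RFP 132 may well end up defining something different.
    // A classic synchronous service method blocks the caller:
    interface Translator {
        String translate(String text);
    }

    // An asynchronous variant returns immediately with a Future (or a
    // similar promise-like type) that completes later:
    interface AsyncTranslator {
        Future<String> translate(String text);
    }

For remote services, the asynchronous style has the added benefit that network latency no longer stalls the calling thread.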
8. Topic: Isolation and security for applications
- For multi-tenancy
- Protect access to file system
- Lifecycle handling of applications
- OBR - isolated OBR (multiple tenants should not see each other's OBR)
This all needs to be configurable.
8. Follow-up: Clearly, separate VMs provide the best isolation, while separate Java VMs within a single OS-level VM also provide fairly strong isolation (however, be aware of possible side effects of native code and possible resource exhaustion). Nested OSGi Frameworks and Subsystem Regions also provide isolation to a certain degree (see Graham's post on Subsystems), but the level of protection needed depends on the security requirements of the given application. The deployer can choose from these options as a target for deploying bundles and/or subsystems.
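As an example of the nested-framework option, a bundle can launch an inner framework through the standard launcher API and install tenant bundles into it. The storage path and bundle URL below are placeholders.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.ServiceLoader;

    import org.osgi.framework.launch.Framework;
    import org.osgi.framework.launch.FrameworkFactory;

    public class NestedFrameworkLauncher {
        public Framework launchTenantFramework() throws Exception {
            // Locate a framework implementation on the class path.
            FrameworkFactory factory =
                ServiceLoader.load(FrameworkFactory.class).iterator().next();

            // Give each tenant its own bundle storage area (placeholder path).
            Map<String, String> config = new HashMap<String, String>();
            config.put("org.osgi.framework.storage", "/var/tenants/tenant1");

            Framework inner = factory.newFramework(config);
            inner.start();

            // Bundles installed here are isolated from the outer framework.
            inner.getBundleContext()
                 .installBundle("file:/var/tenants/tenant1/app.jar")
                 .start();
            return inner;
        }
    }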
9. Topic: It should be possible to look at the cloud system state:
- where am I (type of cloud, geographical location)?
- what nodes are there and what is their state?
- what frameworks are available in this cloud system?
- where's my OBR?
- what state am I in?
- what do I need here in order to operate?
- etc…
9. Follow-up: This is part of what is being discussed in RFC 183 Cloud Discovery.
10. Topic: There should be a management mechanism for use in the cloud.
- JMX? Possibly not.
- REST? Most likely.
Management of application state should also be possible, in addition to bundle/framework state.
10. Follow-up: A cloud-friendly REST-based management API for the framework is currently being worked on in RFC 182. Once that is established, it can also form the baseline for Subsystems management technology, which can be used for application-level management.
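To give a feel for the direction, installing a bundle on a remote framework could then be a plain HTTP call. The endpoint path and payload below are illustrative guesses; RFC 182 defines the actual resource model.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RestInstallExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical management endpoint on a cloud node.
            URL url = new URL("http://cloud-node.example.org:8080/framework/bundles");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain");

            // Post the location of the bundle to be installed.
            OutputStream out = conn.getOutputStream();
            out.write("http://repo.example.org/bundles/app-1.0.0.jar".getBytes("UTF-8"));
            out.close();

            System.out.println("Install request returned HTTP " + conn.getResponseCode());
        }
    }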
11. Topic: Deployment - when deploying replicated nodes it should be possible to specify that a replica should not be deployed on certain nodes, to avoid all the replicas ending up on the same node.
11. Follow-up: This also relates to discovery as discussed in RFC 183. A management
agent can use this information to enforce such a constraint.
12. Topic: Single Sign-on for OSGi.
12. Follow-up: One member company has done a project related to this on top of the User Admin Service. A new RFP will be created to discuss this requirement further.
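The User Admin Service already provides the basic building blocks for such a scheme. A minimal authorization check looks like this; the "username" property key is chosen for illustration.

    import org.osgi.service.useradmin.Authorization;
    import org.osgi.service.useradmin.User;
    import org.osgi.service.useradmin.UserAdmin;

    public class SsoAuthorizationCheck {
        // Returns true if the named user carries the given role.
        public boolean isAuthorized(UserAdmin userAdmin, String userName, String role) {
            // Look the user up by a property value.
            User user = userAdmin.getUser("username", userName);
            if (user == null) {
                return false;
            }
            Authorization auth = userAdmin.getAuthorization(user);
            return auth.hasRole(role);
        }
    }

The single sign-on aspect itself, such as sharing an authenticated session across frameworks, is presumably what the new RFP would need to address.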
So there you are - the ideas from the cloud workshop were greatly appreciated and provide very useful input into future work. If you're interested in following the progress, as usual we're planning to release regular early access drafts of the documents that are relatively mature. Or, if you're interested in taking part in the creation of these specs, join in! For more information see:
http://www.osgi.org/About/Join or contact me (david at redhat.com) or anyone else active in OSGi for more information.