Monday, 18 October 2010

OSGi & The Cloud (Part 2)

This is the second blog entry in a series documenting the underlying points I made in my recent talk at the OSGi Community Event in London. The talk was entitled "OSGi And Private Cloud"; the slides are available here and the agenda is as follows:
  • Where is Cloud computing today? (Part 1)
  • Where does OSGi fit in the Cloud architecture?
  • What are the challenges of using OSGi in the Cloud?
  • What does an OSGi Cloud platform look like?
In this section of the talk I look at where OSGi fits into the Cloud architecture. However, as the community event was co-hosted with JAX London it wasn't a given that everyone at my talk would know OSGi. This is also possibly true for others reading this blog, so to make sure we're all starting from a similar page, I'll briefly explain the basics of what OSGi is about for those who have not come across it before.

    OSGi A Quick Review

    I've been working with Richard Hall, Karl Pauls and Stuart McCulloch on writing OSGi In Action, which explains OSGi from first principles to advanced use cases, so if you want to know more that's a good place to look. However, here I'd like to give my elevator pitch for OSGi, which goes something like this...

    OSGi first became a standard in 1999 and provides a set of specifications for building dynamic modular Java applications. It has success stories in every area of Java development, from embedded devices, through desktop applications, to enterprise applications. The core features that OSGi provides a Java application are:
    • Modules - the building blocks from which to create applications
    • Life cycle - control when modules are installed or uninstalled and customise their behaviour when they are activated
    • Services - minimal coupling between modules
    You might say that none of these are new ideas, so why is OSGi important? The key is in the standardisation of these fundamental axioms of Java applications. Instead of every software stack having a new and inventive way of wiring classloaders together, booting components, or connecting component A to component B, OSGi provides a minimal flexible specification that allows us to get interoperability between modules and let developers get on with the interesting part of building applications.

    An Uncomfortable Truth

    To see where OSGi fits into the Cloud story it's worth taking a brief segue to consider a point made by Kirk Knoernschild at the OSGi community event in February this year. Namely that we are generating more and more code with every passing day:
    • Lines of code double every 7 years
    • 50% of development time spent understanding code
    • 90% of software cost is maintenance and evolution
    By 2017, we'll have written not only double the amount of code written in the past 7 years but more than the total amount of code ever written combined! Object Orientation has helped in encapsulating our code so that changes in private implementation details do not affect consumers. But in fact OO turns out to be just a stopgap and it is reaching the limits of its capabilities. If you refactor public objects or methods you still need to worry about who is consuming them, and without modules this can be a hard question to answer.

    Eric Newcomer of Credit Suisse gave another good talk at the recent community event on the scale of software development at the bank. The message I took away from this presentation is that within any large organisation, one can probably locate an example of virtually any computer algorithm ever conceived (in fact if you look hard enough you will more than likely find two). If we look out to small and medium sized organisations, sure they won't have pre-canned examples, but any developer worth his salt can probably: knock up some approximation within a couple of days, find an open source library to do the same job, or part with some money to a vendor to get the job done.

    The message from these two presentations is that, as we move into the era of Cloud computing, the real problem is not how to author code but how to manage and reuse code, and to do so at scale. As businesses grow and Cloud makes hardware cheaper and cheaper to use, market competition is driving computer software to larger and larger scales to cope with increased processing, network and storage volumes. This scaling tends to lead to more complexity, often in an exponential relationship. But what do I mean by scale when talking about software in the Cloud? And how do we tame the complexity versus scale curve?

    Types of Scale

    There are three measures of scale that I think are of relevance to this discussion of OSGi and the Cloud:
    • Operational scale - the number of processors, network interfaces, storage options required to perform a function
    • Architectural scale - the number and diversity of software components required to make up a system to perform a function
    • Administrative scale - the number of configuration options that our architectures and our algorithms generate
    In fact, I think we've got pretty good patterns by now for dealing with operational scale. As we increase the number of physical resources at our disposal, this drives the class of software algorithms required to perform a function. To pick a random selection: Actors, CEP, DHTs and Grid are just some of the useful software patterns for use in the Cloud. However, I think architectural and administrative scale are often less well managed.

    In terms of architectural scale, think about all the libraries we have to perform similar functions: logging, data access layers, RPC frameworks, web frameworks. How many of the millions of lines of code that we are generating are boilerplate copies of each other? Redundant architecture is a major problem in the growth of software. Which parts are really providing value to the business? Which parts are harmless clones of each other? Which parts are a maintenance cost that should be replaced? We employ abstractions to protect ourselves from underlying implementations but these abstractions can themselves become maintenance costs. I would argue that as software engineers we are suffering from the paradox of choice.

    When managing code we need to worry about updating code, as code is very rarely a static entity; bugs are fixed, new APIs are created, old ones are deprecated and removed. With the volume of code in existence, we need mechanisms to manage the complexity created by the constant churn of logic that makes up our business systems. This leads us onto the problems of administrative scale.

    Administrative scale hampers our ability to reason about and evolve deployed systems. The human brain has evolved to deal with relatively small connected graphs. But software today consists of multiple configuration options - libraries that implement APIs, network configurations, storage configurations, queue depths, the list is endless. When we look at the interconnected nature of many software architectures, do we really know what impact changing parts of the configuration will have?

    All this brings me to...

    OSGi Cloud Benefits

    In Part 1 of this series of blogs I mentioned that the NIST definition of a cloud includes the statement that: "Cloud software takes full advantage of the cloud paradigm by being service oriented with a focus on statelessness, low coupling, modularity and semantic interoperability". To my mind OSGi has these bases covered.

    OSGi is a specification for modular Java that encourages low coupling via the use of services and it certainly allows you to build stateless applications. OSGi also promotes semantic interoperability via the fact that the code runs in a JVM and is abstracted from the underlying platform. Higher order levels of interoperability can easily be enabled using API and implementation modules that provide service abstractions around common platform functions, such as accessing data or scheduling tasks.

    But why should cloud software have these features?

    Returning to my theme of scale and complexity from the previous section, modularity and service orientated architectures enable encapsulation of coherent components to help reduce architectural complexity. Semantic interoperability aids in the war against administrative complexity as the same code can run no matter what hardware or network environment it is deployed in. Finally, stateless architectures are just a good design goal for dealing with production scale.

    OK interesting, but you might say that "TechnologyX (pick your favourite) can also provide these features, so really sell me on the OSGi cloud benefits". In which case I propose that there are four additional benefits of OSGi with respect to Cloud software which I'll deal with in turn:

    Dynamic: Clouds can be tempestuous environments with latency and contention being major factors. In these sorts of environments software that is designed to cope with runtime dependencies coming and going is more robust than static architectures. Consider the analogy with civil engineering and bridge or aeroplane wing design - rigid architectures are more fragile than those that incorporate degrees of flexibility. The Remote Services chapter of the OSGi 4.2 specification promotes a discovery based services API. Thus, if a service dependency is lost due to movements in the Cloud then client code does not go into a spin waiting on socket connect timeouts, it just gracefully moves into whatever state makes most sense and the rest of the processing can continue.
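
To make the idea of a dependency "coming and going" concrete, here is a minimal plain-Java sketch. It does not use the OSGi API: `ToyServiceRegistry`, `PriceService` and `QuoteClient` are hypothetical names invented for illustration, and the registry only imitates the look-up-on-demand style that a dynamic service registry encourages.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service contract; not part of any OSGi API.
interface PriceService {
    double priceOf(String symbol);
}

// A toy registry in which services can appear and disappear at runtime.
final class ToyServiceRegistry {
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    <T> void register(Class<T> type, T service) {
        services.put(type, service);
    }

    void unregister(Class<?> type) {
        services.remove(type);
    }

    <T> Optional<T> find(Class<T> type) {
        return Optional.ofNullable(type.cast(services.get(type)));
    }
}

// A client that looks the service up on every call instead of holding a
// fixed reference, and falls back to the last known price when it is gone.
final class QuoteClient {
    private final ToyServiceRegistry registry;
    private final Map<String, Double> lastKnown = new ConcurrentHashMap<>();

    QuoteClient(ToyServiceRegistry registry) {
        this.registry = registry;
    }

    String quote(String symbol) {
        Optional<PriceService> service = registry.find(PriceService.class);
        if (service.isPresent()) {
            double price = service.get().priceOf(symbol);
            lastKnown.put(symbol, price);
            return symbol + "=" + price + " (live)";
        }
        Double cached = lastKnown.get(symbol);
        return cached != null
                ? symbol + "=" + cached + " (stale)"
                : symbol + " unavailable";
    }
}
```

The point of the design is that a missing service is an ordinary state the client handles, rather than an error it blocks on. In a real OSGi framework the same role is played by the service registry and its tracking facilities.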

    Extensible: Clouds are all about expansion, so, following good XP principles, when you start a new project you should only develop the parts of the application you actually know are needed. Versioned module dependencies and service interfaces then mean that you can easily abstract or update simple implementations as the application demands grow.
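
Versioned dependencies in OSGi are typically expressed as ranges in interval notation, for example `[1.0,2.0)` meaning "1.0 or later, but below 2.0". The following is a simplified plain-Java sketch of how such a range check works. `SimpleVersion` and `SimpleVersionRange` are hypothetical classes written for this example; they handle numeric major.minor.micro parts only and ignore qualifiers, so treat them as an illustration rather than a substitute for the framework's own version classes.

```java
// Simplified version: numeric major.minor.micro only; missing parts default to 0.
final class SimpleVersion implements Comparable<SimpleVersion> {
    private final int[] parts = new int[3];

    SimpleVersion(String text) {
        String[] tokens = text.trim().split("\\.");
        for (int i = 0; i < tokens.length && i < 3; i++) {
            parts[i] = Integer.parseInt(tokens[i]);
        }
    }

    @Override
    public int compareTo(SimpleVersion other) {
        // Compare major first, then minor, then micro.
        for (int i = 0; i < 3; i++) {
            int result = Integer.compare(parts[i], other.parts[i]);
            if (result != 0) {
                return result;
            }
        }
        return 0;
    }
}

// Accepts interval notation such as "[1.0,2.0)" or "(1.2,1.5]".
final class SimpleVersionRange {
    private final SimpleVersion floor;
    private final SimpleVersion ceiling;
    private final boolean floorInclusive;
    private final boolean ceilingInclusive;

    SimpleVersionRange(String text) {
        String t = text.trim();
        floorInclusive = t.charAt(0) == '[';
        ceilingInclusive = t.charAt(t.length() - 1) == ']';
        String[] bounds = t.substring(1, t.length() - 1).split(",");
        floor = new SimpleVersion(bounds[0]);
        ceiling = new SimpleVersion(bounds[1]);
    }

    boolean includes(String version) {
        SimpleVersion v = new SimpleVersion(version);
        int low = v.compareTo(floor);
        int high = v.compareTo(ceiling);
        return (floorInclusive ? low >= 0 : low > 0)
                && (ceilingInclusive ? high <= 0 : high < 0);
    }
}
```

A consumer that declares `[1.0,2.0)` can pick up a 1.4.2 implementation without change, while a 2.0 release, which signals a breaking change, is excluded until the consumer opts in.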

    Lightweight: Clouds are meant to be light, right? OSGi promotes modular design. Modular designs in turn allow you to tune a software deployment to the actual task at hand. OSGi lifecycle and dynamic services patterns even allow us to extend an application at runtime. This enables all sorts of interesting new use cases, for example:
    • if you need to get diagnostics information out of the software, only deploy the diagnostics components for the time that they are needed - for the rest of the time run lean
    • if you need to scale up a certain component's processing power, swap an in-memory job queue for a distributed processing queue and when you're done swap it back again.
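
The queue-swapping use case can be sketched in a few lines of plain Java. Everything here is an assumption made for illustration: `JobQueue`, `InMemoryJobQueue`, `FakeDistributedJobQueue` and `Worker` are hypothetical names, and the "distributed" queue only records which node a job would be sent to rather than talking to a real cluster.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical job queue contract shared by both implementations.
interface JobQueue {
    String submit(String job);
}

// Runs everything in the local process.
final class InMemoryJobQueue implements JobQueue {
    @Override
    public String submit(String job) {
        return "local:" + job;
    }
}

// Stand-in for a distributed queue: it only reports the node a job maps to.
final class FakeDistributedJobQueue implements JobQueue {
    private final int nodes;

    FakeDistributedJobQueue(int nodes) {
        this.nodes = nodes;
    }

    @Override
    public String submit(String job) {
        int node = Math.floorMod(job.hashCode(), nodes);
        return "node-" + node + ":" + job;
    }
}

// The worker depends only on the JobQueue interface, so the implementation
// behind it can be replaced while the worker keeps running.
final class Worker {
    private final AtomicReference<JobQueue> queue;

    Worker(JobQueue initial) {
        queue = new AtomicReference<>(initial);
    }

    void swapQueue(JobQueue replacement) {
        queue.set(replacement);
    }

    String process(String job) {
        return queue.get().submit(job);
    }
}
```

In an OSGi deployment the swap would normally be driven by installing or stopping the bundle that provides the service, not by a call such as `swapQueue`; the sketch only shows why coding against the interface makes that swap safe.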

    Self describing: OSGi bundles contain a description of the module in their jar manifest files. This helps in the war against administrative complexity, notably via automation and audit. Just as an OSGi framework can validate that a bundle has been deployed with all its necessary dependencies, it is also possible to reverse this process and download required dependencies automatically. There are several implementations of this pattern already in use in OSGi today: Nimble, OBR and P2. This simplifies deployments by allowing software engineers to focus on what they want to deploy instead of what they need to deploy.
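
The "reverse" process described above amounts to walking a dependency graph. Below is a minimal plain-Java sketch under stated assumptions: `ModuleInfo` and `ToyResolver` are hypothetical, each module simply lists the packages it exports and imports, and versions, optional imports and conflicts between exporters are ignored. It is not how Nimble, OBR or P2 are implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical module description: the packages it exports and imports.
final class ModuleInfo {
    final String name;
    final List<String> exports;
    final List<String> imports;

    ModuleInfo(String name, List<String> exports, List<String> imports) {
        this.name = name;
        this.exports = exports;
        this.imports = imports;
    }
}

final class ToyResolver {
    // Returns the requested module plus every module needed, transitively,
    // to satisfy its imports.
    static Set<String> resolve(String root, Map<String, ModuleInfo> repository) {
        // Index the repository: package name -> module that exports it.
        Map<String, String> exporterOf = new HashMap<>();
        for (ModuleInfo module : repository.values()) {
            for (String pkg : module.exports) {
                exporterOf.putIfAbsent(pkg, module.name);
            }
        }

        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (!resolved.add(name)) {
                continue; // already handled, also guards against cycles
            }
            ModuleInfo module = repository.get(name);
            if (module == null) {
                throw new IllegalStateException("Unknown module: " + name);
            }
            for (String pkg : module.imports) {
                String exporter = exporterOf.get(pkg);
                if (exporter == null) {
                    throw new IllegalStateException("No module exports " + pkg);
                }
                pending.push(exporter);
            }
        }
        return resolved;
    }
}
```

Asking for a single top-level module yields the full set to deploy, and nothing that is not reachable from it, which is what lets an engineer state what they want rather than everything they need.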

    In terms of audit, OSGi bundles have a number of standardised headers to describe meta features such as name, description, license and documentation, so if you want to find out about a piece of software in a bundle just look at its manifest information. Once you get into this frame of thinking, other meta data such as author, build date, business unit also make sense to be embedded into the bundle. This sort of meta information can greatly benefit system admins and system builders in the future.
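
As a small illustration of the audit idea, the standard `java.util.jar.Manifest` class can read these headers directly. In this sketch `BundleAudit` is a hypothetical helper, `Bundle-SymbolicName`, `Bundle-Version` and `Bundle-License` are standard OSGi headers, and `X-Business-Unit` is an invented custom header standing in for the kind of extra meta data suggested above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class BundleAudit {
    // Parses manifest text and returns a one-line audit summary.
    static String describe(String manifestText) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            Attributes main = manifest.getMainAttributes();
            return main.getValue("Bundle-SymbolicName")
                    + " " + main.getValue("Bundle-Version")
                    + " license=" + main.getValue("Bundle-License")
                    // X-Business-Unit is a made-up header for this example.
                    + " owner=" + main.getValue("X-Business-Unit");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a bundle on disk the manifest would normally come from `java.util.jar.JarFile.getManifest()` instead of a string; the string form just keeps the example self-contained.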

    OSGi Cloud Services

    To conclude this post, assuming I've managed to convince you of the benefits of OSGi in Cloud architectures, here are some ideas for potential cloud OSGi services (definitely non exhaustive):
    • MapReduce services - Hadoop or Bigtable implementations?
    • Batch services - Plug and play Grids?
    • NoSQL services - Scalable data for the Cloud!
    • Communications services - Email, Calendars, IM, Twitter?
    • Social networking services - Cross platform widgets?
    • Billing services - Making money in the Cloud!
    • AJAX/HTML 5.0 services - Pluggable UI architectures?
    These would enable developers to start building modular, dynamic, scalable applications for the Cloud and are in fact pretty simple to achieve if there's the will power to make it happen.

    I think OSGi provides an excellent foundation for building Cloud software. There are things it doesn't do but its extensible nature means that it is very easy to build additional tools on top of it and really start addressing the problems of scale. I'll look at some of the tools I've been working on in this area in the final post.

    So all good, right? Well there are still of course challenges, so in the next post I'll look at some of these and discuss how to overcome them. In the meantime, I'm very interested in any feedback on the ideas found in this post.


    Friday, 8 October 2010

    OSGi & The Cloud (Part 1)

    I recently attended the OSGi Community Event where I gave a talk entitled "OSGi And Private Cloud", the slides for which are available here. However, as has been pointed out, if you look through the slide deck they're a little on the zen side, so if you weren't at the event then it's a bit difficult to guess the underlying points I was trying to make.

    To address this I've decided to create a couple of blog entries that discuss the ideas I was trying to get across. Hopefully this will be of interest to others.

    In the talk the agenda was as follows:
    • Where is Cloud computing today?
    • Where does OSGi fit in the Cloud architecture? (Part 2)
    • What are the challenges of using OSGi in the Cloud?
    • What does an OSGi cloud platform look like?
    I'll stick to this flow but break these sections up into separate blog entries. So here goes with the first section...

    Where is Cloud computing today?

    Ironically Cloud computing is viewed by many as a pretty nebulous technology so before even describing where Cloud computing is, it's possibly useful to define what Cloud computing is.
    • Wikipedia defines a Cloud as: "Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid".
    • InfoWorld defines Cloud as: "[sic] a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities".
    • NIST defines (see the "NIST Definition of Cloud Computing" link) a Cloud as: "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction"
    For me all of these definitions seem pretty similar to Utility computing. Again Wikipedia defines Utility computing as "[sic] the packaging of computing resources, such as computation, storage and services, as a metered service similar to a traditional public utility (such as electricity, water, natural gas or telephone network)". So what is the boundary between Utility computing and Cloud computing? Others have attempted to define Cloud by what it is not. I tend to agree with some of these points but not others.

    So where does this leave us…?

    Actually I think the NIST definition I referred to above does a good job of describing what Cloud is as long as you read past the first sentence. For me the summary of this document is that:

    Cloud is computation that is: on demand; easily accessed from a network; with pooled resources; and rapid elasticity.

    A final footnote in the NIST document mentions that:

    "Cloud software takes full advantage of the cloud paradigm by being service oriented with a focus on statelessness, low coupling, modularity and semantic interoperability".

    This last sentence for me is of fundamental importance when considering the relevance of OSGi to the Cloud.

    I'm tempted to break off from the flow of the slides here and leap to the conclusion but I've set myself a goal of explaining my presentation. So before I explore this point further I'll continue with the slide deck as presented, but rest assured this point will be returned to.

    Why Cloud?

    So as the previous discussion suggests, the reason for using a Cloud model is it gives users just in time:
    • Processing power
    • Storage capacity
    • Network capacity.
    Cloud models are great for small, medium and large organisations.
    • Small organisations benefit from the reduced startup costs of Clouds compared with setting up and provisioning home grown infrastructure for web sites, email, accounting software, etc.
    • Medium sized organisations benefit from the on demand nature of Clouds - as their business grows so can their infrastructure
    • Large organisations benefit from Cloud due to their shared resources - instead of having to maintain silos of computing infrastructure for different departments they can get cost savings via economy of scale.
    There are a large number of vendors touting Cloud products, including Amazon, Google, Salesforce, Rackspace, Microsoft, IBM, VMware and Paremus. These products fit into various categories of Cloud, IAAS (Infrastructure as a Service), PAAS (Platform as a Service), SAAS (Software as a Service) and Public or Private Cloud.

    Cloud Realities

    So Cloud seems pretty utopian, right? In fact, despite the promise of Cloud, the realities it delivers are somewhat different.
    • As there are so many vendors, there are also multiple APIs that developers need to code to for simple things they used to do like loading resources
    • Depending on the vendor the sort of things you can do in a Cloud are often limited (in Google App Engine you can't create Threads for example)
    • Some vendors package Clouds as VM images and for simple deployments this is perfectly ok. However, as the number and variation of applications grow, the coarse-grained nature of VM images becomes a prohibitive factor. The system administrator needs to build and manage lots of different VM image configurations.
    Finally, and this is a factor that affects all Clouds, they are not - despite marketing - infinite resources (pdf). Contention and latency are real problems in Cloud environments. The shared nature of Cloud architectures means that SLAs can be severely impacted by seemingly random processing spikes by other tenants. Cloud providers employ many different tactics to minimise these problems, but moving an application between the Cloud and dedicated hardware is not a seamless transition.

    Why Private Cloud?

    If Clouds are about shared resources and mimic utility services like electricity, gas or water, why would organisations choose to host their own Clouds? Well, it basically comes down to a healthy dose of paranoia. For large organisations the risks associated with Public Cloud are just too great for many of their business processes. Concerns often touted in the industry are:
    • Data ownership risks – A bank for example is often extremely reluctant to host private customer details on infrastructure they don't own. This can be for legal/regulatory reasons or business intelligence reasons
    • Data inertia – I've heard one horror story at a previous Cloud Camp in London about a Web2.0 portal that had started their business in the Cloud (I forget which one). However once it got successful its users were creating data in the Cloud faster than they could offload it, meaning they had a hell of a job moving providers once they started. Not a good position to be in from a price/negotiation perspective
    • API lock-in – The requirement to code to vendor specific APIs is a major problem in terms of vendor lock-in. As a small company the cost of starting out is pretty minimal but once the business starts to scale, finding you are locked into an uncompetitive pricing plan is obviously not good for business
    • SLA – The contention and latency issues of Clouds mean that, for businesses in a competitive compute-intensive market, any downtime or latency outside of their control can have a major effect on the bottom line.
    Private Cloud implies all of the on-demand, dynamic, network accessible goodness, but in a controlled environment where the business has direct control of the cloud tenants, so can better control their SLA. A bit like owning a well, or growing your own food, there are costs but also benefits.

    This pretty much takes me to the end of the beginning. I left this section with a question posed in the form of a set of images (which I've used in this blog entry). The Cloud vision is certainly an enticing one and there are already a lot of commercial benefits to using Cloud in day-to-day business. However, there are definitely still problems. The final question is...

    How Do We Get Here?

    This is M51a “The whirlpool galaxy” discovered by Charles Messier in 1774 (and its companion galaxy NGC 5195).

    I came at computing from the physics angle and when I think of computer software/architecture I tend to think in terms of patterns. A galaxy is just a cloud of gas after all - but there is structure, dynamicity and mechanics that describe their overall behaviour!
    • Homogeneous Cloud deployments can be inefficient – sometimes local processing is just faster than distributed algorithms. We need fidelity in our Cloud deployments, but with fidelity comes administrative cost. Dealing with fine grained structure on the scale of millions (or even tens of thousands) of nodes requires new models
    • Clouds are dynamic: resources come and go, there can be gaps in communication caused by latency, and there can even be large scale events like data centre collapse. Software that is deployed on them must be able to cope with these dynamics
    • As anyone who has worked on any large or even medium sized software project knows, software is often a tangled web of interdependencies. Changing the processing laws in local parts of an architecture can have drastic effects on distant components. We need ways of encapsulating and segmenting Cloud software to allow for updates.

    There we go, I'll try to write up the next section "Where does OSGi fit in the cloud" soon. But to look briefly into the future (or the past - depending on your perspective), I think we need to use modularisation and abstraction to help tame the complexity/scale curve and this is where OSGi fits into the Cloud model.

    In the meantime, I'm very interested in any comments or feedback on any of the ideas discussed here.