Highly available apps on OpenShift

One question we’re working through in OpenShift is how to make sure applications are highly available in the case of node host failure. The current implementation isn’t satisfactory because a single gear relies on its node host to function. Host goes down, gear goes down, app goes down.

We have scaled applications, which expand the application out to multiple gears, but they have a single point of failure in the proxy layer (all requests go through one proxy gear). If the app includes a database cartridge, that is also a single point of failure (we don’t offer database scaling yet). Finally, there’s no way to ensure that the gears don’t all end up on the same node host (except by administratively moving them); they are placed more or less randomly.

This is a hot topic of design debate internally, so look for a long-term solution to crystallize at some point. What I want to talk about here is: what can we do now?

If you have your own installation of OpenShift Origin or OpenShift Enterprise, here is one approach that may work for you.

  1. Define a gear profile (or several) for the purpose of ensuring node host separation. It need not have different resource parameters, just a different name. Put the node(s) with this profile somewhere separate from the other nodes – a different rack, a different room, a different data center, a different Amazon EC2 region; whatever degree of separation gives you confidence in the size of failure you expect your app to survive.
  2. When you create your app, do so twice: once for each gear profile. Here I’m supposing you’ve defined a gear profile “hagear” in addition to the default gear profile.
    $ rhc app create criticalApp python
    $ rhc app create criticalAppHA python -g hagear

    You can make them scaled apps if you want, but that’s a capacity concern, not HA.

  3. Now, develop and deploy your application. When you created “criticalApp” rhc cloned its git repository into the criticalApp directory. Code up your application there, commit it, and deploy with your normal git workflow. This puts your application live on the default gear size.
  4. Copy your git repository over to your HA gear application. This is a git operation and you can choose from a few methods, but I would just add the git remote to your first repository and push it straight from there:
    $ rhc app show criticalAppHA

    Output will include a line like:

    Git URL = ssh://3415c...@criticalAppHA-demo.example.com/~/git/criticalAppHA.git/

    … which you can just add as a remote and push to:

    $ cd criticalApp
    $ git remote add ha ssh://...
    $ git push ha master

    Now you have deployed the same application to a separate node with profile “hagear” and a different name.

  5. Load balance the two applications. We don’t have anything to enable this in OpenShift itself, but if you’re interested in HA you surely already have an industrial-strength load balancer; you can add an application URL to it and balance between the two backend apps (in this example they would be http://criticalAppHA-demo.example.com/ and http://criticalApp-demo.example.com/). If not, Red Hat has some suitable products to do the job.
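Before putting the load balancer in front of the two apps, it’s worth a quick sanity check that each backend actually answers. A minimal sketch, using the example hostnames from this walkthrough (substitute your own):

```shell
#!/bin/sh
# Probe each backend app and print its HTTP status code.
# The hostnames below are the example apps from this post; replace with yours.
for host in criticalApp-demo.example.com criticalAppHA-demo.example.com; do
    # -s silences progress output, -o /dev/null discards the body,
    # -w prints only the status code ("000" if the host is unreachable).
    status=$(curl -s -o /dev/null -w '%{http_code}' "http://$host/")
    echo "$host -> HTTP $status"
done
```

Anything other than a 200 from either app means it isn’t ready to take traffic yet.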

This should work just fine for some cases. Let me also discuss what it doesn’t address:

  • Shared storage/state. If you have a database or other storage as part of your application, there’s nothing here to keep them in sync between the multiple apps. We don’t have any way that I know of to have active/active or hot standby for database gears. If you have this requirement, you would have to host the DB separately from OpenShift and make it HA yourself.
  • Partial failures, where the load balancer can’t detect that one of the applications isn’t really working – e.g. one application returning 404 for everything. You would have to define your own monitoring criteria and infrastructure for determining that each app is “really” available (though the LB likely has relevant capabilities).
  • Keeping the applications synchronized – if you push out a new version to one and forget the other, they could be out of sync. You could actually define a git hook for your origin gear git repo that automatically forwards changes to the ha gear(s), but I will leave that as an exercise for the reader.
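For the adventurous reader, here is one minimal sketch of that exercise: a post-receive hook that forwards every received ref to a remote named “ha” (an assumed name you would add to the repo yourself, pointing at the HA app’s Git URL). Note that OpenShift’s own gear repos already use hooks for deployment, so in practice you would append this logic rather than replace an existing hook:

```shell
#!/bin/sh
# Hypothetical post-receive hook: forward every ref this repo receives
# to a remote named "ha" (assumed to be configured to point at the HA
# app's Git URL). post-receive is fed "oldrev newrev refname" lines,
# one per updated ref, on stdin.
while read oldrev newrev refname; do
    git push ha "$newrev:$refname"
done
```

With this in place, a single `git push` to the origin app also updates the HA app, so the two can’t silently drift apart.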

It might be worth mentioning that you don’t strictly need a separate gear profile to separate the nodes your gears land on. You could manually move them (oo-admin-move) or just recreate them until they land on sufficiently separate nodes (this would even work with the OpenShift Online service). But that would be somewhat unreliable: administrators could later move your gears onto the same node, and you wouldn’t notice the lost redundancy until there was a failure. So, separating by profile is the workaround I would recommend until we have a proper solution.
