OpenShift scaling on OpenStack

Recently I worked on a video for our CTO’s keynote at the OpenStack Summit. You can watch it here:

https://www.openstack.org/summit/portland-2013/session-videos/presentation/keynote-openstack-at-red-hat

Actually, the still for the video appears to be my goofy mug (at least for now) instead of Brian’s. My video starts at about 11:30 in. I only did the screenshots, script, and voice-over; the diagrams, music, editing, and whatnot were other people’s work :)

It is definitely still a proof of concept, but I thought I should point out where the code for scaling OpenShift like this (after a little cleanup) actually lives. It’s over here:

https://github.com/openshift/openshift-extras/tree/enterprise-1.1/node-manager

It’s intended to be pluggable, so it should be fairly easy to swap in another IaaS or client. (Having worked with it a bit, I don’t find the nova client particularly scriptable; I probably would have gone another way if someone hadn’t already done a lot of that work for me.)

If anything, it highlights some of the complexities of trying to automate this. But people have been asking about automating this case, so here’s a first stab at it; improvements are most welcome.

OpenShift v2 cartridges: node host tools

There is a series starting on the OpenShift blog about the v2 cartridge format. Check it out. Way more official than whatever I write here.

Updated 2013-04-17 – updates marked below.

I introduced v2 cartridges in a previous post. If you have an OpenShift Origin node host running and you’ve toggled it to v2 mode, follow along.

When a user creates an application, that flows through the broker and mcollective to the node via the MCollective openshift.rb agent. You can shortcut that path if you want to create gears and configure cartridges into them more directly on the node host. None of the following involves the broker (so, of course, the broker will deny all knowledge of it if you ask).

Creating a gear

You can use the oo-app-create command on a node host to create a gear arbitrarily. Several parameters are required. You can run it with -h to see all the options. The main things you need are:

  1. An application UUID. This is a unique string that identifies the whole application; if the app has multiple gears, they all share the application ID. Once the gear is created, this is stored in an environment variable (OPENSHIFT_APP_UUID).
  2. A gear UUID, which is referred to as a “container” ID. This is a unique string that identifies the gear. For single-gear apps, we typically just re-use the application ID, but that’s just convention.
  3. The application name – this would ordinarily be what a developer called their application.
  4. A namespace, which would ordinarily be the developer’s “domain”.

So the fun news is, since you don’t have to deal with the broker, you can totally make these up. There are a few requirements:

  1. The UUIDs do actually have to be unique, at least on the node host (they’re supposed to be unique across OpenShift). The broker just makes these up and instructs the node to use them.
  2. The gear UUID, gear name, and namespace must all be “word” characters (letters, digits, and underscores). The app name and app UUID can be basically anything.
  3. Gear UUID will be used to create a system user, so it can’t violate the restrictions on that – e.g. it can’t be too long.
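If you want to catch a bad value before oo-app-create does, the requirements above can be checked with a small shell function. This is a hypothetical helper, not part of OpenShift. It assumes “word” characters means letters, digits, and underscores, and it assumes the 32-character user-name limit that useradd usually enforces on Linux (your distribution may differ). It also shows one way to generate a UUID-like value instead of making one up.

```shell
# Hypothetical pre-flight check for values passed to oo-app-create (not an OpenShift tool).
# Assumption: the gear UUID becomes a Linux user name limited to 32 characters.
MAX_USER_LEN=32

# Succeed if $1 is non-empty and consists only of "word" characters.
is_word() {
  [[ -n "$1" && "$1" =~ ^[A-Za-z0-9_]+$ ]]
}

# validate_gear_ids GEAR_UUID GEAR_NAME NAMESPACE
validate_gear_ids() {
  local gear_uuid=$1 gear_name=$2 namespace=$3
  is_word "$gear_uuid" || { echo "bad gear UUID: '$gear_uuid'" >&2; return 1; }
  is_word "$gear_name" || { echo "bad gear name: '$gear_name'" >&2; return 1; }
  is_word "$namespace" || { echo "bad namespace: '$namespace'" >&2; return 1; }
  if (( ${#gear_uuid} > MAX_USER_LEN )); then
    echo "gear UUID is longer than $MAX_USER_LEN characters" >&2
    return 1
  fi
}

# Print a random 32-character hex string (a kernel-generated UUID with the dashes removed).
new_uuid() {
  local u
  u=$(< /proc/sys/kernel/random/uuid)
  echo "${u//-/}"
}

# Example:
#   UUID=$(new_uuid)
#   validate_gear_ids "$UUID" testname lm20130412 && echo "values look OK"
```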

So, once you’ve made up your values, you can just create an empty gear like so:

# UUID=9vr98yvfr98ju23vfjuf
# DOMAIN=lm20130412
# NAME=testname
# oo-app-create --with-app-uuid $UUID \
--with-container-uuid $UUID \
--with-namespace $DOMAIN \
--with-app-name $NAME

Once you’ve done this, you can check that the gear has been created by running “id $UUID” and by looking in /var/lib/openshift for a directory of the same name.

A quick description of some of the optional parameters is in order:

  • --with-container-name is the name for the gear as opposed to the app – it just defaults to the app name if not specified. This is what is combined with the domain to create the DNS record for your gear – even if it’s a child gear in a scaled app, it will get its own DNS entry (although if you’re manually creating gears this way, the broker never knows to create the DNS entry so it’s rather moot).
  • --with-uid is used to specify a numeric user ID for the gear user – this is specified by the broker for nodes that are in a district; the UID is chosen from a pool that is available in the district and reserved for the gear regardless of which node in the district it lands on. So, it’s specified at the time the gear is created. If not specified, the node just picks an available one.

Distinguishing v1 and v2 gears

Even before we’ve done anything else with the new gear, it is marked as a v2 gear. Look at the files in the gear:

/var/lib/openshift/9vr98yvfr98ju23vfjuf/
├── app-root
│   ├── data
│   │   └── .bash_profile
│   ├── repo -> runtime/repo
│   └── runtime
│       ├── data -> ../data
│       ├── repo
│       └── .state
├── .env
│   ├── CARTRIDGE_VERSION_2
│   ├── HISTFILE
│   ├── HOME
│   ├── OPENSHIFT_APP_DNS
│   ├── OPENSHIFT_APP_NAME
│   ├── OPENSHIFT_APP_UUID
│   ├── OPENSHIFT_DATA_DIR
│   ├── OPENSHIFT_GEAR_DNS
│   ├── OPENSHIFT_GEAR_NAME
│   ├── OPENSHIFT_GEAR_UUID
│   ├── OPENSHIFT_HOMEDIR
│   ├── OPENSHIFT_REPO_DIR
│   ├── OPENSHIFT_TMP_DIR
│   ├── PATH
│   ├── TMP
│   ├── TMPDIR
│   └── TMP_DIR
├── .sandbox
│   └── 9vr98yvfr98ju23vfjuf
├── .ssh
└── .tmp

There is exactly one difference from a v1 gear: the CARTRIDGE_VERSION_2 env var. But that’s enough – the presence of this addition is used to decide whether to use v1 or v2 logic with cartridges in this gear.
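Since that one file is the whole distinction, a quick check for it tells you which logic a gear will get. A minimal sketch, assuming gears live under /var/lib/openshift; gear_format is a hypothetical name, and the GEAR_BASE override is a convenience added for testing, not an OpenShift setting.

```shell
# Hypothetical helper: print v1 or v2 for a gear, based only on whether its
# .env directory contains the CARTRIDGE_VERSION_2 file.
# GEAR_BASE is an override added for testing; it defaults to /var/lib/openshift.
gear_format() {
  local uuid=$1
  local home="${GEAR_BASE:-/var/lib/openshift}/$uuid"
  if [[ ! -d "$home" ]]; then
    echo "no such gear: $uuid" >&2
    return 1
  fi
  if [[ -e "$home/.env/CARTRIDGE_VERSION_2" ]]; then
    echo v2
  else
    echo v1
  fi
}

# Example:
#   gear_format 9vr98yvfr98ju23vfjuf    # prints v2 for the gear created above
```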

Configuring a cartridge into the gear

So, let’s actually add a cartridge. You can do this with the oo-cartridge command. This is basically a convenience wrapper for manual testing – nothing in the product uses this script, but it is an entry point to the same code that actually is executed via MCollective to instantiate a cartridge in the gear.

# oo-cartridge -a add -c $UUID -n mock-0.1 -v -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

Although I’ve added the -v -d flags (verbose, debug) you can see there isn’t much output here. Without either flag you just get the first line (success or failure). The “verbose” flag adds the output from the start hook after the cartridge is added. The “debug” flag will give detailed output if there is an exception (otherwise it adds nothing). To see what is really going on, you’ll want to look at the platform logs.

The platform logs are configured in /etc/openshift/node.conf and located in /var/log/openshift/node/. For the purposes of understanding, I suggest setting the platform.log level to INFO so you can follow the flow of what’s happening, and leaving platform-trace.log at DEBUG level to consult for the actual bash commands and their results. If you were developing a cartridge, though, you’d probably want platform.log at DEBUG level (the default) to see the bash commands mixed in with the code-level flow.
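Changing those levels is just an edit to node.conf. Here is a small sketch of a helper that sets a KEY=value line in a shell-style config file, appending it if it isn’t there. The key names PLATFORM_LOG_LEVEL and PLATFORM_TRACE_LOG_LEVEL in the example are my assumption of what node.conf uses; check your own file for the exact names before running it.

```shell
# set_conf KEY VALUE FILE: replace an existing KEY=... line, or append KEY=VALUE.
# Assumes GNU sed (-i) and values that contain no "|" or "&" characters.
set_conf() {
  local key=$1 value=$2 file=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (key names are assumptions; back up node.conf first):
#   cp -p /etc/openshift/node.conf /etc/openshift/node.conf.orig
#   set_conf PLATFORM_LOG_LEVEL INFO /etc/openshift/node.conf
#   set_conf PLATFORM_TRACE_LOG_LEVEL DEBUG /etc/openshift/node.conf
```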

Example INFO-level platform.log for the above (leaving off timestamps and such):

Creating cartridge directory 9vr98yvfr98ju23vfjuf/mock
Created cartridge directory 9vr98yvfr98ju23vfjuf/mock
Creating private endpoints for 9vr98yvfr98ju23vfjuf/mock
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT1=8080]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT2=8081]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT3=8082]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP2=127.0.251.130, OPENSHIFT_MOCK_EXAMPLE_PORT4=9090]
Created private endpoints for 9vr98yvfr98ju23vfjuf/mock
mock attempted lock/unlock on out-of-bounds entry [~/invalid_mock_locked_file]
Running setup for 9vr98yvfr98ju23vfjuf/mock
Ran setup for 9vr98yvfr98ju23vfjuf/mock
Creating gear repo for 9vr98yvfr98ju23vfjuf/mock from ``
Created gear repo for 9vr98yvfr98ju23vfjuf/mock
Processing ERB templates for /var/lib/openshift/9vr98yvfr98ju23vfjuf/mock/**
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: => 127.0.251.129:8080 with options: {"websocket"=>true}
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: /front1a => 127.0.251.129:8080/back1a with options: {}
configure output: Creating version marker for 0.1

platform-trace.log DEBUG-level output for the tail end of that is:

oo_spawn running service openshift-node-web-proxy reload: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 12>, :err=>#<IO:fd 9>}
oo_spawn buffer(11/) Reloading node-web-proxy:
oo_spawn buffer(11/) [
oo_spawn buffer(11/) OK
oo_spawn buffer(11/) ]
oo_spawn buffer(11/)
oo_spawn buffer(11/)
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-z6g4cz/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-82d3xc/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}

In OpenShift you can compose an application by adding multiple cartridges, e.g. database or cron cartridges. The mock-plugin cartridge tests this functionality. You can use oo-cartridge to add this as well:

# oo-cartridge -a add -c $UUID -n mock-plugin-0.1 -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

You can check that your gear is up and running with curl. Your gear has been configured into the front-end proxy’s routing, even though its DNS record doesn’t exist. You can tell the proxy which gear to access by setting the host header:

curl http://localhost/ -IH "Host: $NAME-$DOMAIN.$CLOUD_DOMAIN"

(You can get CLOUD_DOMAIN from /etc/openshift/node.conf.) Of course, with the mock cartridge there may not be much to see; another cartridge like php-5.3 or ruby-1.9 will have content by default.
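Rather than looking CLOUD_DOMAIN up by hand each time, you could read it out of node.conf and build the Host header from it. A small sketch, assuming the file has a line of the form CLOUD_DOMAIN="example.com" (quoted or not); cloud_domain and gear_host are hypothetical names, not OpenShift commands.

```shell
# Hypothetical helpers for the curl check above.
# cloud_domain [CONF]: print CLOUD_DOMAIN from node.conf, with any quotes removed.
cloud_domain() {
  local file=${1:-/etc/openshift/node.conf} line
  line=$(grep -E '^CLOUD_DOMAIN=' "$file" | tail -n 1)
  [[ -n "$line" ]] || { echo "CLOUD_DOMAIN not found in $file" >&2; return 1; }
  printf '%s\n' "${line#CLOUD_DOMAIN=}" | tr -d "\"'"
}

# gear_host NAME DOMAIN [CONF]: the Host header value the front-end proxy routes on.
gear_host() {
  local domain
  domain=$(cloud_domain "${3:-/etc/openshift/node.conf}") || return 1
  printf '%s-%s.%s\n' "$1" "$2" "$domain"
}

# Example:
#   curl -I -H "Host: $(gear_host "$NAME" "$DOMAIN")" http://localhost/
```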

Cucumber tests

The mock cartridge is a testing tool for putting the cartridge logic through its paces. Take a look at the cucumber tests in the origin-server repo to see that in action (the mock cartridge feature is controller/test/cucumber/cartridge-mock.feature).

Updated 04/17: even as I was writing this, the cartridge-mock.feature was split into platform-* features, e.g. platform-basic.feature. Look at those instead.

You can run these by checking out the origin-server git repo on your node host and running cucumber against the feature file you are interested in (of course you must have the cucumber gem installed):

# cucumber cartridge-mock.feature
Using RSpec 2
simple: 17816 ruby 1.9.3 2012-11-10 [x86_64-linux]
@runtime_other
Feature: V2 SDK Mock Cartridge
Scenario: Exercise basic platform functionality in isolation # cartridge-mock.feature:4
[...]

If you were developing a new v2 cartridge, BDD with a cucumber feature would probably be a better approach than the manual testing this post is demonstrating, especially for testing composing and scaling.

Logging into the gear

Update 04/17: Adding this section

Now that you have a gear with a cartridge or two in it, you might want to log in and look around, like you are used to with normal gears. Of course this really just means getting a login as the gear user. Normally you would do that with ssh, but you haven’t set up an ssh key for the gear yet. It’s easy to do that, but why bother? You can just use su, right?

# su - $UUID
Invalid context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023, expected unconfined_u:system_r:openshift_t:s0:c0,c502

Not so fast. The gear runs in a specialized SELinux context, and normal su doesn’t handle that. For this purpose you need oo-su:

# oo-su $UUID -c oo-trap-user

This will get you an ordinary gear-user login (preceded by a few error messages that don’t seem to harm anything). oo-trap-user is the login shell, but you don’t have to use it; you can use oo-su the same way to run any command directly in the context of the gear user.

Cleanup

You can remove a cartridge from a gear in much the same way it was added:

# oo-cartridge -a delete -c $UUID -n mock-plugin-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------
# oo-cartridge -a delete -c $UUID -n mock-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------

You’ll notice, though, that the gear is likely not left pristine. Cartridges leave log files and more behind even after they are removed. The base framework cartridges are particularly bad about this. You’ll even find that removing one framework cartridge and adding another may cause a failure. That’s because in real usage, framework cartridges are never removed. The whole gear is simply discarded:

# oo-app-destroy -a $UUID -c $UUID --with-app-name $NAME --with-namespace $DOMAIN

So, that is the simplest cleanup.

Working around the mysterious yum multilib error on an OpenShift Enterprise install

This one took a little while to track down and understand so I offer it here in the spirit of helping someone else get around this without that discovery process.

TL;DR

If you get a yum error with “Error: Multilib version problems found.” when installing cartridges on an OpenShift Enterprise node host, make sure you have followed the steps in https://access.redhat.com/site/articles/316613 even if you don’t want the JBoss cartridges.

The problem

The example configure script we’ve provided for OpenShift Enterprise currently installs node cartridges with a single yum command that attempts to install all of them, but hedges its bets with the --skip-broken flag in case something doesn’t work out with one or more cartridges. This keeps you from having an install where no cartridges get installed just because some fiddling dependency problem blocked one of them and the others could have proceeded fine. This is very helpful during our development testing. For a production deployment it may or may not be, depending on how you define “helpful.”

So, I found that if I ran a production configuration and did not follow the directions at https://access.redhat.com/site/articles/316613 to work around the nature of our JBoss subscription dependencies, not only did JBoss cartridge installation fail, but I got this bizarre yum error message:

Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem.

[...]

Protected multilib versions: zlib-1.2.3-27.el6.i686 != zlib-1.2.3-29.el6.x86_64

(The full message is similar to this post.) Reading between the lines there, what this error tells you is: There is some kind of dependency problem, we found it when trying to update zlib (and found that we were trying to install mismatched versions on the two architectures i686 and x86_64), but the real problem is probably somewhere way upstream. Which is about as useless an error message as you can get, but I understand where these things come from – it’s pretty hard to unravel the chain of consequences sometimes to give a helpful error message.

So, what the heck, right?

Also, if you try to install individual cartridges, they’ll work fine. If you remove the JBoss ones from the yum command, it works fine. It only fails on exactly the command that is run by default by the script.

Debugging

Running yum with the -v flag (verbose) gives a lot more info. I did not chase this down to the exact dependency chain but I gather the following describes what’s happening.

The first annoying question is, why is it even trying to install the i686 zlib in the first place? Everything I’m using is 64-bit. The next question is, why is it trying to install different versions? The -27 release is available for both architectures, why isn’t it just using that?

The clue comes in these lines of debugging output:

SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from transaction 
SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from pkgSack & updates

zlib-devel requires a zlib with the same version, and is itself required by freetype-devel, a requirement of the python cartridge. It is also a dependency of libxml2-devel (required by the ruby cartridge) and openssl-devel (a distant requirement for several cartridges). So let’s just say it is in the thick of things.
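When you’re chasing one of these, it helps to pull out just the packages that --skip-broken pruned from the verbose output. A rough sketch, assuming the SKIPBROKEN line format shown above (it’s debug output, so it may differ between yum versions); skipbroken_removed is a hypothetical name.

```shell
# skipbroken_removed: read verbose yum output on stdin and list, once each, the
# packages that --skip-broken removed from the transaction.
skipbroken_removed() {
  awk '/^SKIPBROKEN: removing .* from transaction/ { print $3 }' | sort -u
}

# Example (substitute the package list your configure script actually installs):
#   yum -v -y --skip-broken install <cartridge packages> 2>&1 | tee /tmp/yum.log
#   skipbroken_removed < /tmp/yum.log
```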

Unraveling the chain

I gather that what happened is this: With the channels broken, yum got deep into resolution and realized it could not install the JBoss cartridges. With the --skip-broken flag, yum removed those cartridges from the transaction, along with all of the package versions it had included for their dependencies; then it tried to resolve dependencies for the rest of the transaction with ALL of those packages excluded (as any could be the source of the problem). Apparently zlib-devel-1.2.3-29.el6.x86_64 was part of what was excised. But zlib-devel was still needed by other cartridges, and this is where it gets really bizarre.

openssl-devel-1.0.0-27.el6_4.2.x86_64 requires: zlib-devel
--> Processing Dependency: zlib-devel for package: openssl-devel-1.0.0-27.el6_4.2.x86_64
Searching pkgSack for dep: zlib-devel
TSINFO: Marking zlib-devel-1.2.3-27.el6.x86_64 as install for openssl-devel-1.0.0-27.el6_4.2.x86_64
[...]
---> Package zlib-devel.x86_64 0:1.2.3-27.el6 will be installed

[...]

zlib-devel-1.2.3-27.el6.x86_64 requires: zlib = 1.2.3-27.el6
--> Processing Dependency: zlib = 1.2.3-27.el6 for package: zlib-devel-1.2.3-27.el6.x86_64

Searching pkgSack for dep: zlib
Potential resolving package zlib-1.2.3-27.el6.x86_64 has newer instance installed.
TSINFO: Marking zlib-1.2.3-27.el6.i686 as install for zlib-devel-1.2.3-27.el6.x86_64
--> Running transaction check
---> Package zlib.i686 0:1.2.3-27.el6 will be installed

Because zlib-devel -29 was verboten, it fell back to zlib-devel -27. But that requires zlib -27, and -29 was already installed. You can’t have both versions installed for the same architecture at the same time, but yum saw a way to resolve the request by pulling in the zlib package from the i686 arch (zlib-devel did not specify arch in its requirement). So now yum wanted to install zlib-1.2.3-27.i686 on a system where zlib-1.2.3-29.x86_64 was already installed. It wasn’t allowed to downgrade the existing installation and couldn’t find any other way to satisfy the requirements under the restrictions imposed by having removed part of the transaction. So finally yum gave up and I got this confusing error about the multilib versions not matching, far from the source of the problem (which is, arguably, yum’s process for dependency resolution following pruning by --skip-broken).

Fixing the repo configuration (https://access.redhat.com/site/articles/316613) so that JBoss cartridges install cleanly fixes the problem. Not attempting to install them also fixes the problem.

Hopefully this provides some useful insight into this kind of problem, even for those who arrive here by completely different means.

The OpenShift cartridge refactor: a brief introduction

If you’re watching the commit logs over at OpenShift Origin you’ll see a lot of activity around “v2” cartridges (especially a lot of “WIP” commits). For a variety of reasons we’re refactoring cartridges to make it easier to write and maintain them. We’re particularly interested in enabling those who wish to write cartridges, and part of that includes removing as much as possible from the current cartridge code that is really generic platform code and shouldn’t be boilerplate repeated in cartridges. And in general, we’re just trying to bring more sanity and remove opacity.

If you’ve fired up Origin lately you wouldn’t necessarily notice that anything has changed. The refactored cartridges are available in parallel with existing cartridges, and you have to opt in to use them. To do that, use the following command as root on a node host:

# oo-cart-version -c toggle
Node is currently in v1 mode
Switching node cartridge version
Node is currently in v2 mode

The node now works with the cartridges installed in /usr/libexec/openshift/cartridges/v2 (rather than the “v1” cartridges in /usr/libexec/openshift/cartridges; BTW these locations are likely to change, so watch the RPM packaging for clues). Aside from the separate cartridge location, there are logic branches for the two formats in the node model objects, most prominently in OpenShift::ApplicationContainer (application_container.rb under the openshift-origin-node gem) making a lot of calls against @cartridge_model, which is either a V1CartridgeModel or a V2CartridgeModel object depending on the gear’s format.

The logic branches are based on two things – for an existing gear, the cartridge format already present is used; otherwise, for new gears, the presence of a marker file /var/lib/openshift/.settings/v2_cartridge_format is checked (which is the main thing the command above changes) – if present, use v2 cartridges, otherwise use the old ones. In this way, the development and testing of v2 cartridges can continue without needing a fork / branch and without disrupting the use of v1 cartridges.
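To make the node-level half of that rule concrete, here is a sketch of which format a new gear gets and what flipping the marker file amounts to. It is a hypothetical illustration, not a replacement for oo-cart-version (which may do more than touch the marker); SETTINGS_DIR is an override added for testing.

```shell
# Path of the marker file that selects v2 cartridges for new gears.
v2_marker() {
  echo "${SETTINGS_DIR:-/var/lib/openshift/.settings}/v2_cartridge_format"
}

# node_mode: print the format a newly created gear would use.
node_mode() {
  if [[ -e "$(v2_marker)" ]]; then echo v2; else echo v1; fi
}

# toggle_node_mode: remove the marker if present, create it otherwise, then report.
toggle_node_mode() {
  local m
  m=$(v2_marker)
  if [[ -e "$m" ]]; then
    rm -f "$m"
  else
    mkdir -p "$(dirname "$m")" && touch "$m"
  fi
  node_mode
}
```

Existing gears are unaffected by the marker; they keep whatever format they were created with.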

A word of warning, though: you can use gears with the v1 and v2 cartridges in parallel on the same node (toggle back and forth), but don’t try to configure an embedded cart from one format into a gear with the other. Also, do not set a different mode on different nodes in the same installation. Results of trying to mix and match that way are undefined, which is to say, probably super broken.

Let’s look around a bit.

# ls /usr/libexec/openshift/cartridges/
10gen-mms-agent-0.1 diy-0.1 jbossews-1.0 mongodb-2.2 phpmyadmin-3.4 rockmongo-1.1 zend-5.6
abstract embedded jbossews-2.0 mysql-5.1 postgresql-8.4 ruby-1.8
abstract-httpd haproxy-1.4 jenkins-1.4 nodejs-0.6 python-2.6 ruby-1.9
abstract-jboss jbossas-7 jenkins-client-1.4 perl-5.10 python-2.7 switchyard-0.6
cron-1.4 jbosseap-6.0 metrics-0.1 php-5.3 python-3.3 v2

# ls /usr/libexec/openshift/cartridges/v2
diy haproxy jbosseap jbossews jenkins jenkins-client mock mock-plugin mysql perl php python ruby

There look to be a lot fewer cartridges under v2, and that’s not just because they’re not all complete yet. Notice what’s missing in v2? Version numbers. You’ll see the same thing looking in the source at the cartridge source trees and package specs; you don’t have a single cartridge per version anymore. It’s possible to support multiple different runtimes from the same cartridge. This is evident if you look in the ruby cartridge. First, there’s the cartridge manifest:

# grep Version /usr/libexec/openshift/cartridges/v2/ruby/metadata/manifest.yml
Version: '1.9'
Versions: ['1.9', '1.8']
Cartridge-Version: 0.0.1

There’s a default version if none is specified when configuring the cartridge, but there are two versions available in the same cartridge. Also notice the separate directories for version-specific implementations:

# ls /usr/libexec/openshift/cartridges/v2/ruby/versions/
1.8 1.9 shared

So rather than have completely separate cartridges for the different versions, different versions can live in the same cartridge and directly share the things they have in common, while overriding the usually-minor differences. This doesn’t mean we’re going to see ruby versions 1.9.1, 1.9.2, 1.9.3, etc. – in general you’ll only want one current version of a supported branch, such that security and bug fixes can be applied without having to migrate apps to a new version. But it means we cut down on a lot of duplication of effort for multi-versioned platforms. We can put ruby 1.8, 1.9, and 2.0 all in one cartridge and share most of the cartridge code.
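If you want to see what a v2 cartridge offers without reading the whole manifest, the Version and Versions lines shown earlier are easy to pull out. A rough sketch, assuming those keys stay on single lines in the inline form seen in the ruby manifest; a real YAML parser would be the robust choice, and the function names are hypothetical.

```shell
# manifest_default_version MANIFEST: print the value of the "Version:" key.
manifest_default_version() {
  sed -n 's/^Version:[[:space:]]*//p' "$1" | tr -d " '\""
}

# manifest_versions MANIFEST: print each entry of the inline "Versions: [...]" list.
manifest_versions() {
  sed -n 's/^Versions:[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p' "$1" \
    | tr ',' '\n' | tr -d " '\""
}

# Example:
#   manifest_versions /usr/libexec/openshift/cartridges/v2/ruby/metadata/manifest.yml
```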

You might be wondering how to specify which version you get. I’m not sure what is planned for the future, but at this time I don’t believe the logic branches for v2 cartridges have been extended to the broker. Right now, if you look in /var/log/mcollective.log for the cartridge-list action, you’ll see the node is reporting two separate Ruby cartridges just like before, which are reported back to the client, and you still request app creation with the version in the cartridge:

$ rhc setup
...
Run 'rhc app create' to create your first application.
Do-It-Yourself rhc app create <app name> diy-0.1
 JBoss Enterprise Application Platform rhc app create <app name> jbosseap-6.0.1
 Jenkins Server rhc app create <app name> jenkins-1.4
 Mock Cartridge rhc app create <app name> mock-0.1
 PHP 5.3 rhc app create <app name> php-5.3
 Perl 5.10 rhc app create <app name> perl-5.10
 Python 2.6 rhc app create <app name> python-2.6
 Ruby rhc app create <app name> ruby-1.9
 Ruby rhc app create <app name> ruby-1.8
 Tomcat 7 (JBoss EWS 2.0) rhc app create <app name> jbossews-2.0
$ rhc app create rb ruby-1.8
...
Application rb was created.

If you look in v2_cart_model.rb, you’ll see there’s a FIXME that parses out the version from the base cart name to handle this – the FIXME is to note that this should really be specified explicitly in an updated node command protocol. So at this time, there’s no broker-side API change to pick which version from a cartridge you want. But look for that to change when v2 carts are close to prime time.
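The idea behind that FIXME is simple enough to show in a few lines of shell: split the requested name at its last hyphen into a cartridge name and a version. This is only an illustration of the idea, not the Ruby code in v2_cart_model.rb, and it assumes version strings never contain hyphens.

```shell
# split_cart_name NAME-VERSION: print the base cartridge name and the version,
# split at the last hyphen (illustration only; not the v2_cart_model.rb code).
split_cart_name() {
  local full=$1
  if [[ $full != *-* ]]; then
    echo "no version in '$full'" >&2
    return 1
  fi
  printf '%s %s\n' "${full%-*}" "${full##*-}"
}

# Example:
#   read -r cart version < <(split_cart_name ruby-1.8)   # cart=ruby, version=1.8
```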

By the way, if you’re used to looking in /var/log/mcollective.log to see broker/node interaction, that’s still there (you probably want to set loglevel = info in /etc/mcollective/server.cfg) but a lot more details about the node actions that result from these requests are now recorded in /var/log/openshift/node/platform.log (location configured in /etc/openshift/node.conf). You can watch this to see exactly how mcollective actions translate into system commands, and use this to manually test actions against developing cartridges (see also the mock cartridge and the test cases against it).

You’ll notice, if you follow some cartridge actions (e.g. “restart”) through the code, that the v2 format has centralized a lot of functions into a few scripts. Previously, each action and hook resulted in a call to a separate script (often symlinked in from the “abstract” cartridge, which, anyone would admit, is kind of a hack):

# ls /usr/libexec/openshift/cartridges/ruby-1.8/info/{bin,hooks}

/usr/libexec/openshift/cartridges/ruby-1.8/info/bin:
app_ctl.sh build.sh post_deploy.sh ps threaddump.sh
app_ctl_stop.sh deploy_httpd_config.sh pre_build.sh sync_gears.sh
/usr/libexec/openshift/cartridges/ruby-1.8/info/hooks:
add-module deploy-httpd-proxy reload restart stop tidy
configure info remove-httpd-proxy start system-messages update-namespace
deconfigure move remove-module status threaddump

In the new format, these are just options on a few scripts:

# ls /usr/libexec/openshift/cartridges/v2/ruby/bin/
build control setup teardown

If you look at the mcollective requests and the code, you’ll see the requests haven’t changed; the v2 code just routes them to the new scripts. For instance, “restart” is now just an option to the “control” script above.
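To picture what “just an option” means, here is a hypothetical skeleton of a v2-style control script that dispatches on its first argument. The action names follow the hooks listed above, but the bodies are placeholders, and this is not the actual ruby cartridge’s script.

```shell
#!/bin/bash
# Hypothetical v2-style bin/control skeleton: one script, action as the first argument.
# The start/stop/status bodies are placeholders for real cartridge logic.
start()  { echo "starting"; }
stop()   { echo "stopping"; }
status() { echo "status: unknown"; }

control() {
  case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop && start ;;
    status)  status ;;
    *)       echo "Usage: $0 {start|stop|restart|status}" >&2; return 1 ;;
  esac
}

# A real bin/control would end by dispatching on its arguments:
#   control "$@"
```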

Those are just some of the changes that are in the works. The details are still evolving daily, too fast for me to keep track of frankly, but if you’re interested in what’s happening, especially interested in writing cartridges for OpenShift, you might like to dive into the existing documentation describing the new format:

https://github.com/openshift/origin-server/blob/master/node/README.writing_cartridges.md

Other documents in the same directory may or may not distinguish between v1 and v2 usage, but regardless should be useful, if sometimes out of date, reading.

Highly available apps on OpenShift

One question we’re working through in OpenShift is how to make sure applications are highly available in the case of node host failure. The current implementation isn’t satisfactory because a single gear relies on its node host to function. Host goes down, gear goes down, app goes down.

We have scaled applications, which expand the application out to multiple gears, but they have a single point of failure in the proxy layer (all requests go through one proxy gear). If the app includes a database cartridge, that is also a single point of failure (we don’t offer database scaling yet). Finally, there’s no way to ensure that the gears don’t all end up on the same node host (except by administratively moving them). They are placed more or less randomly.

This is a hot topic of design debate internally, so look for a long-term solution to show up at some point. (Look for something to crystallize here.) What I want to talk about is: what can we do now?

If you have your own installation of OpenShift Origin or OpenShift Enterprise, here is one approach that may work for you.

  1. Define a gear profile (or multiple) for the purpose of ensuring node host separation. It need not have different resource parameters, just a different name. Put the node(s) with this profile somewhere separate from the other nodes – a different rack, a different room, a different data center, a different Amazon EC2 region; whatever matches the size of failure you expect your app to survive.
  2. When you create your app, do so twice: once for each gear profile. Here I’m supposing you’ve defined a gear profile “hagear” in addition to the default gear profile.
    $ rhc app create criticalApp python
    $ rhc app create criticalAppHA python -g hagear

    You can make them scaled apps if you want, but that’s a capacity concern, not HA.

  3. Now, develop and deploy your application. When you created “criticalApp”, rhc cloned its git repository into the criticalApp directory. Code up your application there, commit it, and deploy with your normal git workflow. This puts your application live on the default gear size.
  4. Copy your git repository over to your HA gear application. This is a git operation and you can choose from a few methods, but I would just add the git remote to your first repository and push it straight from there:
    $ rhc app show criticalAppHA

    Output will include a line like:

    Git URL = ssh://3415c...@criticalAppHA-demo.example.com/~/git/criticalAppHA.git/
    

    … which you can just add as a remote and push to:

    $ cd criticalApp
    $ git remote add ha ssh://...
    $ git push ha master

    Now you have deployed the same application to a separate node with profile “hagear” and a different name.

  5. Load balance the two applications. We don’t have anything to enable this in OpenShift itself, but surely if you’re interested in HA you already have an industrial-strength load balancer, and you can add an application URL into it and balance between the two backend apps (in this example they would be http://criticalAppHA-demo.example.com/ and http://criticalApp-demo.example.com/). If not, Red Hat has some suitable products to do the job.
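Steps 4 and 5 only hold together if every deploy reaches both apps. A small helper that pushes the same branch to each remote in turn keeps you from forgetting one. A minimal sketch, assuming the remotes are named origin (the one rhc set up when it cloned criticalApp) and ha (added in step 4); push_all is a hypothetical name.

```shell
# push_all BRANCH REMOTE...: push BRANCH to each remote in order, stopping at the first failure.
push_all() {
  local branch=$1 remote
  shift
  for remote in "$@"; do
    echo "pushing $branch to $remote"
    git push "$remote" "$branch" || return 1
  done
}

# Example, from the criticalApp clone (the URL is the Git URL from "rhc app show criticalAppHA"):
#   git remote add ha ssh://...
#   push_all master origin ha
```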

This should work just fine for some cases. Let me also discuss what it doesn’t address:

  • Shared storage/state. If you have a database or other storage as part of your application, there’s nothing here to keep them in sync between the multiple apps. We don’t have any way that I know of to have active/active or hot standby for database gears. If you have this requirement, you would have to host the DB separately from OpenShift and make it HA yourself.
  • Partial failures where the load balancer can’t detect that one of the applications isn’t really working, e.g. if one application is returning 404 for everything – you would have to define your own monitoring criteria and infrastructure for determining that each app is “really” available (though the LB likely has relevant capabilities).
  • Keeping the applications synchronized – if you push out a new version to one and forget the other, they could be out of sync. You could actually define a git hook for your origin gear git repo that automatically forwards changes to the ha gear(s), but I will leave that as an exercise for the reader.

It might be worth mentioning that you don’t strictly need a separate gear profile in order to separate the nodes your gears land on. You could manually move them (oo-admin-move) or just recreate them until they land on sufficiently separate nodes (this would even work with the OpenShift Online service). But that would be somewhat unreliable as administrators could easily move your gears to the same node later and you wouldn’t notice the lack of redundancy until there was a failure. So, separating by profile is the workaround I would recommend until we have a proper solution.
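If you do rely on placement rather than profiles, you can at least check now and then whether the two apps have ended up together. App hostnames are CNAMEs to their node hosts, so two apps on the same node should resolve to the same address. A rough sketch with hypothetical function names, assuming `getent` is available and that each node host has a single public address:

```shell
#!/bin/sh
# Hypothetical helper: print the first address a hostname resolves to.
resolve() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Hypothetical helper: succeed (exit 0) if both app hostnames resolve to the
# same non-empty address, which here is taken to mean "same node host".
same_node() {
  a=$(resolve "$1")
  b=$(resolve "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, from cron or a monitoring script:

    $ same_node criticalApp-demo.example.com criticalAppHA-demo.example.com && echo "WARNING: both apps are on one node"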

OpenShift with dynamic host IPs?

From the time we began packaging OpenShift Enterprise, we made a decision not to support dynamic changes to host IP addresses. This might seem a little odd since we do demonstrate installation with the assumption that DHCP is in use; we just require it to be used with addresses pinned statically to host names. It’s not that it’s impossible to work with dynamic re-leasing; it’s just that it’s an unnecessary complication and potentially a source of tricky problems.

However, I’ve crawled all over OpenShift configuration for the last few months, and I can say with a fair amount of confidence that it is possible to handle dynamic changes to host IP addresses, as long as those changes are tracked by DNS with static hostnames.

But there are, of course, a number of caveats.

First off, it should be obvious that DNS must be integrated with DHCP such that hostnames never change and always resolve correctly to the same actual host. Then, if configuration everywhere uses hostnames, it should in theory be able to survive IP changes.
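One way to test the "configuration everywhere uses hostnames" condition is to search your configuration for IPv4 literals. A rough sketch; the function name is hypothetical, the pattern only catches dotted-quad IPv4 (not IPv6) and will also flag things like four-part version numbers, and which directories to scan is your call (the example only uses /etc/openshift).

```shell
#!/bin/sh
# Hypothetical helper: list file:line:content for every non-comment line in
# the given files or directories that contains something shaped like an
# IPv4 address. Loopback addresses such as 127.0.0.1 will show up too;
# those are harmless since they never change.
find_ip_literals() {
  grep -rHnE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$@" \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
}
```

For example:

    $ find_ip_literals /etc/openshift

Each hit is a place to either switch to a hostname or confirm that the address really never changes.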

The most obvious exception is the IP(s) of the nameserver(s) themselves. /etc/resolv.conf clearly has to list each nameserver by IP: it is the source of name resolution, so it can’t be looked up by name. However, in the unlikely event that the nameservers need new IPs, DHCP could handle the transition with a bit of work. You could not use our basic dhclient configuration, which statically prepends the installation nameserver IP; instead, the DHCP server would need to supply all nameserver definitions, and there would be some complications around the transition, since not all hosts would renew their leases at the same time. Really, this would probably be the province of a configuration management system. I trust that those who need to do such a thing have thought about it much more than I have.

Then there’s the concern of the dynamic DNS server that OpenShift publishes app hostnames to. Well, there’s no reason that can’t be specified by hostname as well, as long as the nameserver supplied by DHCP/dhclient knows how to resolve it. Have I mentioned that you should probably implement your host DNS separately from the dynamic app DNS? There’s no reason they need to use the same server, and probably lots of reasons not to.

OK, maybe you’ve looked through /etc/openshift/node.conf and noticed the PUBLIC_IP setting in there. What about that? Well, I’ve tracked that through the code base and as far as I can tell, the only thing it is ever used for is to create a log entry when gears are created. In other words, it has no functional significance. It may have in the past – as I understand it, apps used to be created with A records rather than as CNAMEs to the node hosts. But for now, it’s a red herring.

Something else to worry about is iptables filtering. Our instructions never demonstrate filters for specific addresses, but in many cases conscientious sysadmins would limit connections to only the hosts expected to need them. And they would be unlikely to define those rules using hostnames (iptables resolves a hostname only once, when the rule is added, so a hostname-based rule would not follow an IP change anyway). So either don’t do that… or have a plan for handling IP changes.
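To find out whether a host has any such address-specific rules, you can look for source (-s) or destination (-d) matches in the saved ruleset. A rough sketch; the function name is hypothetical, it reads `iptables-save` output on standard input, and it only looks for numeric IPv4-style addresses.

```shell
#!/bin/sh
# Hypothetical helper: read iptables-save output on stdin and print the rules
# that match on a specific source (-s) or destination (-d) address, including
# negated ones (! -s ...). These are the rules that would silently stop
# matching the right host after an IP change.
address_rules() {
  grep -E '^-A .* (-s|-d) [0-9]'
}
```

Run it as root:

    # iptables-save | address_rules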

One more caveat: what do we mean by dynamic IP changes? How dynamic?

If we’re talking about the kind of IP change where you shut down the host (perhaps to migrate its storage) and it comes back up with a new IP, that use case should be handled pretty well (again, as long as all configured host references use the hostname). This is the sort of thing you would run into in Amazon EC2, where hosts keep their IP as long as they stay up but generally get a new IP when shut down. In that case, all the services on the host start up with the new IP already in place.

It’s trickier to support IP changes while the host is running. Any services that bind specifically to the external IP address would need restarting. I had a look while writing this, though, and that is a lot less common than I expected. As far as I can see, only one node host service does it: haproxy (which openshift-port-proxy uses to proxy specific ports from the external interface back to gear ports). The httpd proxy listens on all interfaces, so it’s exempt, and individual gears listen on internal interfaces only. On the broker and supporting hosts, ActiveMQ and MongoDB either listen locally or on all interfaces. The nameserver, despite being configured to listen on “all”, appears to bind to specific IPs, so it looks like it would need a restart too. You could probably configure dhclient to do this when a lease changes (with appropriate SELinux policy changes to permit it). But you can see how easy this would be to get wrong.
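A minimal sketch of such a dhclient hook, under several assumptions. dhclient-script sources an exit-hooks file after handling a lease event and sets `$reason`, `$old_ip_address` and `$new_ip_address`, but the file's location varies by distribution (check the dhclient-script man page on your hosts). The service names are also assumptions: `openshift-port-proxy` is my guess at what you would restart on a node, and a nameserver host would list `named` instead. The SELinux changes mentioned above are not covered here.

```shell
#!/bin/sh
# Sketch of a dhclient exit hook. ASSUMPTION: the service list below is an
# example; list whatever binds to the external IP on this particular host.
RESTART_ON_IP_CHANGE="openshift-port-proxy"

restart_ip_bound_services() {
  case "$reason" in
    BOUND|RENEW|REBIND|REBOOT)
      # Only act when the address actually changed; routine renewals
      # that keep the same IP should not bounce any services. On a fresh
      # boot old_ip_address is empty and services start with the new IP.
      if [ -n "$old_ip_address" ] && [ "$old_ip_address" != "$new_ip_address" ]; then
        for svc in $RESTART_ON_IP_CHANGE; do
          service "$svc" restart
        done
      fi
      ;;
  esac
}

restart_ip_bound_services
```

Keeping the service list in one variable at least makes it obvious, per host, what you decided needs a restart.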

Hopefully this brief exploration of the issues involved demonstrates why we’re going to stick with the current (non-)support policy for the time being. But I also expect some of you out there will try out OpenShift in a dynamic IP environment, and if so I hope you’ll let me know what you run into.

Break explained, more posts coming

The blog has been pretty quiet lately. It’s not just because I rejoined Red Hat in May and began working on OpenShift Enterprise. I mean, that does keep me busy, but actually I have to work that in around the edges sometimes. My wife and I had our second child in June. The child is fine, but the wife has not been fine at all, so in addition to juggling a toddler and an infant (and childcare for same), I’ve been going to doctor visits, ER visits, hospital visits, caretaking, and so forth. And when I had a spare minute where nobody needed me, like at 2am, I’d actually try to get some work done. Needless to say, blogging didn’t come high on the list of priorities.

I think this is slowly changing, and I hope I’ll have time soon to talk about some of the things I’ve discovered while working on OpenShift. It’s been fun and educational.
