OpenShift scaling on OpenStack

Recently I worked on a video for our CTO’s keynote at the OpenStack Summit. You can watch it here:

https://www.openstack.org/summit/portland-2013/session-videos/presentation/keynote-openstack-at-red-hat

The still for the video appears to be my goofy mug (at least for now) instead of Brian. My segment starts at about 11:30. I only did the screenshots, script, and voice-over; the diagrams, music, editing, and whatnot were other people's work :)

It is definitely still proof-of-concept but I thought I should point out where the code for scaling OpenShift like this (after a little cleanup) actually lives. It’s over here:

https://github.com/openshift/openshift-extras/tree/enterprise-1.1/node-manager

It’s intended to be pluggable, so it should be fairly easy to swap in another IaaS or client. (Having worked with it a bit, I don’t find the nova client particularly scriptable; I probably would have gone another way if someone hadn’t already done a lot of that work for me.)

If anything, it highlights some of the complexities of trying to automate this. But people have been asking about automating this case, so here’s a first stab at it; improvements are most welcome.

OpenShift v2 cartridges: node host tools

There is a series starting on the OpenShift blog about the v2 cartridge format. Check it out. Way more official than whatever I write here.

Updated 2013-04-17 – updates marked below.

I introduced v2 cartridges in a previous post. If you have an OpenShift Origin node host running and you’ve toggled it to v2 mode, follow along.

When a user creates an application, that flows through the broker and MCollective to the node via the MCollective openshift.rb agent. You can shortcut that path if you want to create gears and configure cartridges into them more directly on the node host. None of the following involves the broker (so, of course, the broker will deny all knowledge of it if you ask).

Creating a gear

You can use the oo-app-create command on a node host to create a gear arbitrarily. Several parameters are required. You can run it with -h to see all the options. The main things you need are:

  1. An application UUID. This is a unique string that identifies the whole application; if the app has multiple gears, all will still share the application ID. Once the gear is created this goes in an environment variable.
  2. A gear UUID, which is referred to as a “container” ID. This is a unique string that identifies the gear. For single-gear apps, we typically just re-use the application ID, but that’s just convention.
  3. The application name – this would ordinarily be what a developer called their application.
  4. A namespace, which would ordinarily be the developer’s “domain”.

So the fun news is, since you don’t have to deal with the broker, you can totally make these up. There are a few requirements:

  1. The UUIDs do actually have to be unique, at least on the node host (they’re supposed to be unique across OpenShift). The broker just makes these up and instructs the node to use them.
  2. The gear UUID, gear name, and namespace must consist entirely of “word” characters. The app name and app UUID can be basically anything.
  3. Gear UUID will be used to create a system user, so it can’t violate the restrictions on that – e.g. it can’t be too long.
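
The requirements above can be checked before you hand a made-up value to oo-app-create. Here is a minimal sketch; the function names are invented for this post, and the 32-character cap is my assumption about the system username limit rather than anything oo-app-create documents:

```shell
# Sketch only: helper names are made up for this post, and the 32-character
# cap is an assumed username limit, not something oo-app-create documents.
MAX_LEN=32

# Succeed if $1 is non-empty, made only of "word" characters, and short enough.
valid_gear_uuid() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    [ "${#1}" -le "$MAX_LEN" ]
}

# Succeed if no system user named $1 exists on this host yet.
uuid_is_free() {
    ! id "$1" >/dev/null 2>&1
}

# Print 32 random hex characters (the shape of a UUID with the dashes removed).
new_gear_uuid() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Example use:
#   UUID=$(new_gear_uuid)
#   valid_gear_uuid "$UUID" && uuid_is_free "$UUID" && echo "$UUID"
```

This only covers uniqueness on the local node host; nothing here knows about gears on other nodes.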

So, once you’ve made up your values, you can just create an empty gear like so:

# UUID=9vr98yvfr98ju23vfjuf
# DOMAIN=lm20130412
# NAME=testname
# oo-app-create --with-app-uuid $UUID \
--with-container-uuid $UUID \
--with-namespace $DOMAIN \
--with-app-name $NAME

Once you’ve done this, you can check that the gear has been created with “id $UUID” and by looking in /var/lib/openshift for a directory of the same name.

A quick description of some of the optional parameters is in order:

  • --with-container-name is the name for the gear as opposed to the app – it just defaults to the app name if not specified. This is what is combined with the domain to create the DNS record for your gear – even if it’s a child gear in a scaled app, it will get its own DNS entry (although if you’re manually creating gears this way, the broker never knows to create the DNS entry so it’s rather moot).
  • --with-uid is used to specify a numeric user ID for the gear user – this is specified by the broker for nodes that are in a district; the UID is chosen from a pool that is available in the district and reserved for the gear regardless of which node in the district it lands on. So, it’s specified at the time the gear is created. If not specified, the node just picks an available one.

Distinguishing v1 and v2 gears

Even before we’ve done anything else with the new gear, it is marked as a v2 gear. Look at the files in the gear:

/var/lib/openshift/9vr98yvfr98ju23vfjuf/
├── app-root
│   ├── data
│   │   └── .bash_profile
│   ├── repo -> runtime/repo
│   └── runtime
│       ├── data -> ../data
│       ├── repo
│       └── .state
├── .env
│   ├── CARTRIDGE_VERSION_2
│   ├── HISTFILE
│   ├── HOME
│   ├── OPENSHIFT_APP_DNS
│   ├── OPENSHIFT_APP_NAME
│   ├── OPENSHIFT_APP_UUID
│   ├── OPENSHIFT_DATA_DIR
│   ├── OPENSHIFT_GEAR_DNS
│   ├── OPENSHIFT_GEAR_NAME
│   ├── OPENSHIFT_GEAR_UUID
│   ├── OPENSHIFT_HOMEDIR
│   ├── OPENSHIFT_REPO_DIR
│   ├── OPENSHIFT_TMP_DIR
│   ├── PATH
│   ├── TMP
│   ├── TMPDIR
│   └── TMP_DIR
├── .sandbox
│   └── 9vr98yvfr98ju23vfjuf
├── .ssh
└── .tmp

There is exactly one difference from a v1 gear: the CARTRIDGE_VERSION_2 env var. But that’s enough – the presence of this addition is used to decide whether to use v1 or v2 logic with cartridges in this gear.
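
If you have a mix of gears on a node, that marker is easy to test for yourself. A rough sketch, assuming the layout shown above; gear_version and GEAR_BASE are names I made up, and this is not how the platform code itself is written:

```shell
# Sketch only: gear_version and GEAR_BASE are hypothetical names.
# GEAR_BASE defaults to the gear directory used in this post.
GEAR_BASE=${GEAR_BASE:-/var/lib/openshift}

# Print "v2" if the gear's .env directory holds the CARTRIDGE_VERSION_2
# marker, "v1" if it does not, and "unknown" if there is no such gear.
gear_version() {
    gear_dir=$GEAR_BASE/$1
    if [ ! -d "$gear_dir/.env" ]; then
        echo unknown
        return 1
    fi
    if [ -e "$gear_dir/.env/CARTRIDGE_VERSION_2" ]; then
        echo v2
    else
        echo v1
    fi
}

# Example use:
#   gear_version "$UUID"
```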

Configuring a cartridge into the gear

So, let’s actually add a cartridge. You can do this with the oo-cartridge command. This is basically a convenience wrapper for manual testing – nothing in the product uses this script, but it is an entry point to the same code that actually is executed via MCollective to instantiate a cartridge in the gear.

# oo-cartridge -a add -c $UUID -n mock-0.1 -v -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

Although I’ve added the -v -d flags (verbose, debug) you can see there isn’t much output here. Without either flag you just get the first line (success or failure). The “verbose” flag adds the output from the start hook after the cartridge is added. The “debug” flag will give detailed output if there is an exception (otherwise it adds nothing). To see what is really going on, you’ll want to look at the platform logs.

The platform logs are configured in /etc/openshift/node.conf and located in /var/log/openshift/node/. To follow along, I suggest setting the platform.log level to INFO to see the flow of what’s happening, and leaving platform-trace.log at DEBUG level to consult for the actual bash commands and their results. If you were developing a cartridge, though, you’d probably want platform.log at DEBUG level (the default) to see the bash commands mixed in with the code-level flow.

Example INFO-level platform.log for the above (leaving off timestamps and such):

Creating cartridge directory 9vr98yvfr98ju23vfjuf/mock
Created cartridge directory 9vr98yvfr98ju23vfjuf/mock
Creating private endpoints for 9vr98yvfr98ju23vfjuf/mock
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT1=8080]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT2=8081]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT3=8082]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP2=127.0.251.130, OPENSHIFT_MOCK_EXAMPLE_PORT4=9090]
Created private endpoints for 9vr98yvfr98ju23vfjuf/mock
mock attempted lock/unlock on out-of-bounds entry [~/invalid_mock_locked_file]
Running setup for 9vr98yvfr98ju23vfjuf/mock
Ran setup for 9vr98yvfr98ju23vfjuf/mock
Creating gear repo for 9vr98yvfr98ju23vfjuf/mock from ``
Created gear repo for 9vr98yvfr98ju23vfjuf/mock
Processing ERB templates for /var/lib/openshift/9vr98yvfr98ju23vfjuf/mock/**
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: => 127.0.251.129:8080 with options: {"websocket"=>true}
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: /front1a => 127.0.251.129:8080/back1a with options: {}
configure output: Creating version marker for 0.1

platform-trace.log DEBUG-level output for the tail end of that is:

oo_spawn running service openshift-node-web-proxy reload: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 12>, :err=>#<IO:fd 9>}
oo_spawn buffer(11/) Reloading node-web-proxy:
oo_spawn buffer(11/) [
oo_spawn buffer(11/) OK
oo_spawn buffer(11/) ]
oo_spawn buffer(11/)
oo_spawn buffer(11/)
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-z6g4cz/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-82d3xc/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}
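
If all you want out of platform.log is which private endpoints a cartridge was given, the “Created private endpoint” lines in the INFO-level excerpt above are regular enough to pick apart with sed. A sketch, assuming the message format stays exactly as shown there (list_endpoints is a made-up name):

```shell
# Sketch only: depends on the exact INFO message wording shown in this post.
# Reads platform.log lines on stdin and prints "gear cartridge ip:port"
# for every private endpoint that was created.
list_endpoints() {
    sed -n 's/.*Created private endpoint for cart \([^ ]*\) in gear \([^:]*\): \[[^=]*=\([^,]*\), [^=]*=\([^]]*\)].*/\2 \1 \3:\4/p'
}

# Example use:
#   list_endpoints < /var/log/openshift/node/platform.log
```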

In OpenShift you can compose an application by adding multiple cartridges, e.g. database or cron cartridges. The mock-plugin cartridge tests this functionality. You can use oo-cartridge to add this as well:

# oo-cartridge -a add -c $UUID -n mock-plugin-0.1 -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

You can check that your gear is up and running with curl. Your gear has been configured into the front-end proxy’s routing, even though its DNS record doesn’t exist. You can tell the proxy which gear to access by setting the host header:

curl http://localhost/ -IH "Host: $NAME-$DOMAIN.$CLOUD_DOMAIN"

(You can get CLOUD_DOMAIN from /etc/openshift/node.conf.) Of course, with the mock cartridge, there may not be much to see; another cartridge like php-5.3 or ruby-1.9 will have content by default.
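
To save looking up CLOUD_DOMAIN by hand, something like the following can pull it out of node.conf and build the host name for the curl command. This is a sketch that assumes node.conf uses plain KEY="value" lines with optional trailing # comments; conf_value, gear_host, and NODE_CONF are names I made up:

```shell
# Sketch only: assumes KEY="value" lines, optionally followed by a # comment.
NODE_CONF=${NODE_CONF:-/etc/openshift/node.conf}

# Print the value of key $1 from config file $2 (last assignment wins).
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" |
        sed -e 's/[[:space:]]*#.*$//' \
            -e 's/[[:space:]]*$//' \
            -e 's/^"\(.*\)"$/\1/' |
        tail -n 1
}

# Print the front-end host name for app name $1 in namespace $2.
gear_host() {
    echo "$1-$2.$(conf_value CLOUD_DOMAIN "$NODE_CONF")"
}

# Example use:
#   curl http://localhost/ -IH "Host: $(gear_host "$NAME" "$DOMAIN")"
```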

Cucumber tests

The mock cartridge is a testing tool for putting the cartridge logic through its paces. Take a look at the cucumber tests in the origin-server repo to see that in action (the mock cartridge feature is controller/test/cucumber/cartridge-mock.feature).

Updated 04/17: even as I was writing this, the cartridge-mock.feature was split into platform-* features, e.g. platform-basic.feature. Look at those instead.

You can run these by checking out the origin-server git repo on your node host and running cucumber against the feature file you are interested in (of course you must have the cucumber gem installed):

# cucumber cartridge-mock.feature
Using RSpec 2
simple: 17816 ruby 1.9.3 2012-11-10 [x86_64-linux]
@runtime_other
Feature: V2 SDK Mock Cartridge
Scenario: Exercise basic platform functionality in isolation # cartridge-mock.feature:4
[...]

If you were developing a new v2 cartridge, BDD with a cucumber feature would probably be a better approach than the manual testing this post is demonstrating, especially for testing composing and scaling.

Logging into the gear

Update 04/17: Adding this section

Now that you have a gear with a cartridge or two in it, you might want to log in and look around, like you are used to with normal gears. Of course this really just means getting a login as the gear user. Normally you would do that with ssh, but you haven’t set up an ssh key for the gear yet. It’s easy to do that, but why bother? You can just use su, right?

# su - $UUID
Invalid context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023, expected unconfined_u:system_r:openshift_t:s0:c0,c502

Not so fast. The gear runs in a specialized SELinux context, and normal su doesn’t handle that. For this purpose you need oo-su:

# oo-su $UUID -c oo-trap-user

This will get you an ordinary gear-user login (preceded by a few error messages that don’t seem to harm anything). oo-trap-user is the login shell; you don’t have to use it, though, since oo-su can just as well run any command directly in the context of the gear user.

Cleanup

You can remove a cartridge from a gear in much the same way it was added:

# oo-cartridge -a delete -c $UUID -n mock-plugin-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------
# oo-cartridge -a delete -c $UUID -n mock-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------

You’ll notice, though, that the gear is likely not left pristine. Cartridges leave log files and more even after they are removed. The base framework cartridges are particularly bad about this. You’ll even find that removing one framework cartridge and adding another may cause a failure. That’s because in real usage, framework cartridges are never removed. The whole gear is simply discarded:

# oo-app-destroy -a $UUID -c $UUID --with-app-name $NAME --with-namespace $DOMAIN

So, that is the simplest cleanup.

Working around the mysterious yum multilib error on an OpenShift Enterprise install

This one took a little while to track down and understand so I offer it here in the spirit of helping someone else get around this without that discovery process.

TL;DR

If, when installing cartridges on an OpenShift Enterprise node host, you get a yum error with “Error: Multilib version problems found.”, make sure you have followed the steps in https://access.redhat.com/site/articles/316613 even if you don’t want the JBoss cartridges.

The problem

The example configure script we’ve provided for OpenShift Enterprise currently installs node cartridges as a single yum command which attempts to install all of them, but hedges its bets with the --skip-broken flag in case something doesn’t work out with one or more cartridges. This keeps you from having an install where no cartridges get installed just because some fiddling dependency problem blocked one of them and the others could have proceeded fine. This is very helpful during our development testing. For a production deployment it may or may not be, depending on how you define “helpful.”

So, I found that if I ran a production configuration and did not follow the directions at https://access.redhat.com/site/articles/316613 to work around the nature of our JBoss subscription dependencies, not only did JBoss cartridge installation fail, but I got this bizarre yum error message:

Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem.

[…]

Protected multilib versions: zlib-1.2.3-27.el6.i686 != zlib-1.2.3-29.el6.x86_64

(The full message is similar to this post.) Reading between the lines there, what this error tells you is: There is some kind of dependency problem, we found it when trying to update zlib (and found that we were trying to install mismatched versions on the two architectures i686 and x86_64), but the real problem is probably somewhere way upstream. Which is about as useless an error message as you can get, but I understand where these things come from – it’s pretty hard to unravel the chain of consequences sometimes to give a helpful error message.

So, what the heck, right?

Also, if you try to install individual cartridges, they’ll work fine. If you remove the JBoss ones from the yum command, it works fine. It only fails on exactly the command that is run by default by the script.

Debugging

Running yum with the -v flag (verbose) gives a lot more info. I did not chase this down to the exact dependency chain but I gather the following describes what’s happening.

The first annoying question is, why is it even trying to install the i686 zlib in the first place? Everything I’m using is 64-bit. The next question is, why is it trying to install different versions? The -27 release is available for both architectures, why isn’t it just using that?

The clue comes in these lines of debugging output:

SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from transaction 
SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from pkgSack & updates

zlib-devel requires a zlib with the same version, and itself is required by freetype-devel, a requirement of the python cartridge.  It is also a dependency for libxml2-devel (required from the ruby cartridge) and openssl-devel (distant requirement for several cartridges). So let’s just say it is in the thick of things.
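
The verbose output is long, so it helps to pull out just the packages that yum pruned. A small sketch, assuming the SKIPBROKEN lines always look like the two above (skipbroken_removed and the log file name are made up):

```shell
# Sketch only: relies on the "SKIPBROKEN: removing <pkg> from ..." wording
# seen in the yum -v output quoted in this post.
# Reads verbose yum output on stdin and prints each pruned package once.
skipbroken_removed() {
    sed -n 's/^SKIPBROKEN: removing \([^ ]*\) from .*$/\1/p' | sort -u
}

# Example use, with the verbose output saved to a hypothetical file:
#   skipbroken_removed < yum-verbose.log
```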

Unraveling the chain

I gather that what happened is this: With the channels broken, yum got deep into resolution and realized it could not install the JBoss cartridges. With the --skip-broken flag, yum removed from the transaction those cartridges and all of the package versions it had included for their dependencies; then it tried to resolve dependencies for the rest of the transaction with ALL of those packages not allowed in the transaction (as any could be the source of the problem). Apparently zlib-devel-1.2.3-29.el6.x86_64 was part of what was excised. But zlib-devel was still needed by other cartridges, and this is where it gets really bizarre.

openssl-devel-1.0.0-27.el6_4.2.x86_64 requires: zlib-devel
--> Processing Dependency: zlib-devel for package: openssl-devel-1.0.0-27.el6_4.2.x86_64
Searching pkgSack for dep: zlib-devel
TSINFO: Marking zlib-devel-1.2.3-27.el6.x86_64 as install for openssl-devel-1.0.0-27.el6_4.2.x86_64
[…]
---> Package zlib-devel.x86_64 0:1.2.3-27.el6 will be installed

[…]

zlib-devel-1.2.3-27.el6.x86_64 requires: zlib = 1.2.3-27.el6
--> Processing Dependency: zlib = 1.2.3-27.el6 for package: zlib-devel-1.2.3-27.el6.x86_64

Searching pkgSack for dep: zlib
Potential resolving package zlib-1.2.3-27.el6.x86_64 has newer instance installed.
TSINFO: Marking zlib-1.2.3-27.el6.i686 as install for zlib-devel-1.2.3-27.el6.x86_64
--> Running transaction check
---> Package zlib.i686 0:1.2.3-27.el6 will be installed

Because zlib-devel -29 was verboten, yum fell back to zlib-devel -27. But that requires zlib -27, and -29 was already installed. You can’t have both installed at the same time, but yum saw a way to resolve the request by pulling in the zlib package from the i686 arch (zlib-devel did not specify arch in its requirement). So now yum wanted to install zlib-1.2.3-27.i686 on a system where zlib-1.2.3-29.x86_64 was already installed. It wasn’t allowed to downgrade the existing installation and couldn’t find any other way to satisfy the requirements under the restrictions imposed by having removed part of the transaction. So finally yum gave up and I got this confusing error about the multilib versions not matching, far from the source of the problem (which is, arguably, yum’s process for dependency resolution following pruning by --skip-broken).
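
The rule yum tripped over boils down to “the same package name must not end up at different versions on different architectures.” Here is a rough imitation of that check, emphatically not yum’s own implementation. It expects “name version-release arch” lines, which you can get for installed packages from rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n'; multilib_mismatches is a made-up name:

```shell
# Sketch only: a simplified stand-in for the protected-multilib check.
# Input (stdin): one "name version-release arch" line per package, covering
# both what is installed and what a transaction would add.
# Output: one line per package whose versions differ across architectures.
multilib_mismatches() {
    awk '
        {
            # Remember the first version and arch seen for each package name.
            if (!($1 in first)) {
                first[$1] = $2
                firstarch[$1] = $3
                next
            }
            # Same name, different arch, different version: report it.
            if ($2 != first[$1] && $3 != firstarch[$1])
                printf "%s: %s.%s != %s.%s\n", $1, first[$1], firstarch[$1], $2, $3
        }'
}

# Example use:
#   printf "%s\n" "zlib 1.2.3-29.el6 x86_64" "zlib 1.2.3-27.el6 i686" | multilib_mismatches
```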

Fixing the repo configuration (https://access.redhat.com/site/articles/316613) so that JBoss cartridges install cleanly fixes the problem. Not attempting to install them also fixes the problem.

Hopefully this provides some useful insight into this kind of problem, even for those who arrive here by completely different means.