Customizing OpenShift JBoss confs without customizing the cartridge

I added a feature recently to enable OpenShift administrators to specify (at the broker) a custom location to get the default app git template from. This allows you to customize the initial experience developers get when they create an app; you can, for example, put your organization’s name and logo on it. This should be out in Origin nightly builds now, and in the Enterprise 2.0.3 point release coming soon.

For JBoss applications, there is an added use for this feature. JBoss configuration files are located in the application git repository, so if, as an administrator, you want to change the default confs for these cartridges (say, to add a custom valve), you can. Users are free to ignore this, of course, either by specifying a different source for their code or by blowing your changes away after creating the app. Still, it can be useful to set the defaults the way you like, and with this feature, you don’t have to customize the cartridge to do it. You just need to maintain a custom git repository.

There’s a slight complication, though, as I discovered when trying to demonstrate this. The JBoss cartridges construct configuration files with three processing steps in between the source and the outcome. These are:

  1. The “install” step of cartridge instantiation modifies the Maven pom.xml that ships with the default template, replacing strategically-placed {APP_NAME} entries with the application name. If you construct your template from the cartridge source, don’t leave these placeholders as-is; Maven will not like them (see the sketch after this list).
  2. The “setup” step of cartridge instantiation combines shared configuration files with version-specific configuration files from the cartridge source.
  3. Most of the conf files in the application git repo are directly symlinked from the actual gear configuration. However, there are a few that aren’t, which happen to be the ones you tend to want to change. These are actually templates that are processed during every build of the application (i.e. every git push).
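
The actual replacement in step 1 happens inside the cartridge’s install hook, but the effect is nothing more than a string substitution. A rough sketch of the idea (illustrative only, not the cartridge’s actual code; “myapp” is a made-up name):

APP_NAME=myapp
sed -i "s/{APP_NAME}/${APP_NAME}/g" pom.xml    # what "install" effectively does to the template pom.xml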

These aren’t hard to work around, but they’re a little surprising if you don’t know about them. Let me demonstrate how I would do this with an example. Let’s say we wanted to change the log format on a JBoss EWS 2.0 cartridge.

  1. First, create an EWS 2.0 app with the installed default:
    • rhc app create template jbossews-2.0
  2. Now edit the resulting “template” directory that git creates as needed:
    • Change .openshift/config/server.xml log valve as follows:
      <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
             prefix="localhost_access_log." suffix=".txt"
             pattern="CHANGED %h %l %u %t &quot;%r&quot; %s %b" />
    • Note, this is one of the files that is interpreted with every git push. The Connector element has expressions in it which are evaluated at build time on the gear.
    • Edit the pom.xml file. This is optional, but you may want to use a different groupId, artifactId, etc. than just the “template” app name. It’s possible to use env vars here, e.g.
      <groupId>${env.OPENSHIFT_APP_DNS}</groupId>

      … however, Maven will emit WARNINGs on every build and threatens to break this in a future release, so I don’t recommend it.

    • Commit the changes.
       git commit -am "Template customizations"
  3. Now put this git repo somewhere all the nodes can see it. You can put it on github if you like, or your internal gitolite instance, or just on a plain web server. For simplicity, I just put it directly on the node filesystem, but remember that all nodes have to have the template available in the same place (although it could be useful to vary the template contents according to gear profile):
    # mkdir -p /etc/openshift/templates
    # git clone template /etc/openshift/templates/jbossews2.git
  4. Now modify the broker to specify this as the default. In /etc/openshift/broker.conf:
    DEFAULT_APP_TEMPLATES=jbossews-2.0|file:///etc/openshift/templates/jbossews2.git

    … and restart the broker:

    $ service openshift-broker restart

    Of course, with multiple brokers, you need to do this for all of them.

At this point, whenever you create a new jbossews-2.0 application, it will default to using your template and the changed access log format.

OpenShift: profiles, districts, and nodes, oh my!

Someone in #openshift-dev recently pointed out that the relationship between OpenShift profiles, districts, and nodes isn’t laid out clearly anywhere. I had a look through the docs, and I have to admit, he has a point. You can kind of infer it from various parts of the documentation, but I couldn’t find anywhere that simply states what I’m about to here. I’d be happy to be shown wrong.

TL;DR: Profiles contain districts, which contain nodes, which contain gears. You can’t have districts or nodes with multiple profiles. You can’t control what district or node a gear is created in.

If you’re familiar with OpenShift at all, you probably have at least some grasp of what a gear is: the basic unit of compute in OpenShift. Practically speaking, a gear is actually a regular old user account on a Linux host (an OpenShift node host), with a specified allocation of resources (RAM, disk, network, etc.), locked down by various containment mechanisms to just those resources. Much of OpenShift revolves around managing gears.

Profiles

Gears have a profile, also known as a size. I don’t like calling it a size, because it need not have anything to do with size, but we’re stuck with the term in a few places (notably, the DB and API) so I can’t pretend it’s not there. And for many deployments, it probably will be about size. But I’ll call it a profile here.

Profiles are the most fundamental way in which gears are grouped. The original point of profiles was to provide some uniformity for capacity planning, but you can really use them to partition your gears in any way you want – departmental ownership, location, high availability separation, security clearance, etc. We will have better ways to implement separation for some of these concepts in the not-too-distant future, but at this moment, gear profiles are the only point at which OpenShift enables giving different users access to separate parts of the deployment.

For such a fundamental piece of the architecture, it may be surprising that there is no model object or real definition of a profile in the OpenShift broker database schema. It is literally just a string, which can be whatever you want.

Nodes

An OpenShift node is a Linux host with OpenShift services for containing gears. There might conceivably be additional implementations in the future – the point is that nodes are containers for gears. Nodes have a gear profile, which is defined in /etc/openshift/resource_limits.conf – this file contains the gear profile string as well as all of the resources that a gear gets when it is created on that node.
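
As an illustration, the profile declaration sits right in that file; a quick way to see what a node claims (output made up for the example, and the rest of the file holds the per-gear RAM, disk, and other limits):

# grep node_profile /etc/openshift/resource_limits.conf
node_profile=small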

Technically, a bunch of nodes all claiming to have a particular gear profile could specify wildly different resource limits. There’s nothing at the broker that would even know they were different, for the most part. But if you are a sane system administrator, you would not do this, except by accident. See, you would probably like to have some idea of how many gears you can fit on the node, so that you know when to make new nodes. And you would probably not like your gears in a profile to have randomly different resource limits.

Most OpenShift admins (and salespeople) wonder at some point how you can specify multiple gear profiles for a node. You can’t. If nodes could host multiple sizes, how would you know how many gears you can fit on that node? It comes as a great surprise to people who want to create a monster node host and run their whole PaaS off of it when we tell them “just partition it into VMs and give them different profiles.” So perhaps someday, we will enable multiple profiles per host; but don’t bet on it. Sometimes very unintuitive results make sense when you look at the bigger picture.

So at this point, gear profiles are synonymous with node profiles. A node contains gears of a particular profile, and a gear profile is constrained in number of gears by the number of node hosts configured with that profile.

Districts

Of all the things that are misunderstood in OpenShift, I’m pretty sure districts are number one. I think it’s because they kind of sound like what profiles actually are – a partitioning scheme. They’re really not at all.

Conceptually, profiles contain districts, and districts contain node hosts. A district has a gear profile and can only contain nodes with that profile. But districts are completely invisible to users, and there is no way to specify which one a gear lands in when you’re creating a gear. They have nothing to do with permissions or partitions. To understand what they are for, you have to understand a little about the technical details that motivated them.

Sometimes, for one reason or another, you want to move a gear from one node host to another. You would like it to function exactly the same way on the new host as the old one. The problem is, there are a few resources that must be exclusive to any single gear on a host: internal IPs and external ports being the main ones. If you move a gear from one host to another, there’s no guarantee that the same resources it was using will be available on the new host, so you would need to detect this situation and reconfigure the gear to use unique resources that don’t clash. This is indeed the approach that was taken before districts, but it proved to be rather brittle.

  1. It led to all kinds of edge case bugs where things would wind up broken only when certain cartridges (or combinations of them, perhaps when scaled…) were moved to nodes where they needed to be reconfigured. In short, it was a regression testing nightmare.
  2. It also made cartridges hard to write correctly to handle moving, and we wanted writing cartridges to be easy so that lots of developers can contribute them.
  3. Finally, since gears are configured via setting environment variables, reconfiguring for a move meant changing environment variables. The parts of the gear that relied on these would work fine after a move, but the places where the app developer had hard-coded the values instead of using environment variables… broke. Naturally, developers assume it is the administrator’s fault, or a bug in the PaaS. So, it’s an administrative nightmare too.

To get around this, OpenShift introduced a simple allocation scheme: to ensure that you can move a gear off of a node, reserve the unique resources it will need on multiple nodes.

In practice, a district is nothing more than a pool of numeric user IDs that are reserved against a set of nodes. Every time a gear is created, it first reserves an ID from the district pool; then on a node host in that district, a user is created with that ID, and algorithms based on that ID specify the range of resources that are available to the gear. Since the UID is reserved across all the nodes in the district, it is guaranteed to be available if you move a gear to any node in that district, and thus all the resources based on it will also be available, and the gear needs no reconfiguration. Problem solved.
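
For the record, districts are created and populated on the broker with oo-admin-ctl-district. Roughly like the following (flag names from memory, so check oo-admin-ctl-district -h on your release; the district and host names here are made up):

# oo-admin-ctl-district -c create -n small_district -p small
# oo-admin-ctl-district -c add-node -n small_district -i node1.example.com
# oo-admin-ctl-district -c add-node -n small_district -i node2.example.com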

Unfortunately, having to explain all this just to get across what districts are for… is pretty awkward. But it’s a necessary concept to understand if you’re an OpenShift administrator.

vim as IDE

I happened to learn vi as my first major text editor back in the 90s. There are many great editors, and I have no interest in proving that vim is best. It’s just what I use, and what many others use, and it’s available everywhere.

A friend recently observed “it looks like your vim does things that mine doesn’t.” Vim is almost infinitely extensible. It takes some time to incorporate everything, to be sure, and vim lacks somewhat in discoverability. But when you are working on code all day every day, it pays to invest some time in improving and learning your tools. And no matter how much you know about vim, you can always find some feature to surprise and delight you.

vim tips and tricks sites abound, so I don’t really have much to add to these:

  1. http://www.vim.org/docs.php
  2. http://vim.wikia.com/wiki/Vim_Tips_Wiki + http://vim.wikia.com/wiki/Best_Vim_Tips
  3. http://vimcasts.org/
  4. http://pragprog.com/book/dnvim/practical-vim
  5. http://learnvimscriptthehardway.stevelosh.com/ (learn how to seriously customize vim)

I spend most of my time working with git between various related repositories, mostly coding in ruby and bash. If you are doing the same thing, you might be interested in some of the plugins I’ve added to make life a little easier and have vim help as much as possible with the workflow. You really can get to the point where vim pretty much does everything you need. I’m still getting these into my fingers, but thought I’d pass them on:

  1. NERDTree – this is a handy directory plugin. vim already has a directory display; if you start up vim with a directory name, you get a directory listing. It’s not a tree, though, and it goes away once you pick a file to edit. Invoke NERDTree (I mapped “:NT” to toggle it on and off) and it keeps a directory tree structure in a vertical split on the left; choose a file and it opens in a buffer on the right. If you dismiss NERDTree and bring it back later, it comes back with the same state – same directories opened.
  2. Fugitive – Sweet git integration plugin from Tim Pope. I will never work another merge conflict without it. It does so much stuff there are five vimcasts introducing it. May also introduce you to standard vim features you never heard of, like the quickfix list.
  3. Rails.vim – another Tim Pope invention for working with Rails. The idea is to make all those TextMate users jealous (you may want some addons like SnipMate though – and see this classic post for pointers to really decking out your vim Rails IDE).

That’s just three, and that’ll keep you busy for a long time. There are plenty more (see that last link and various recommendations on StackOverflow).

vim for OpenShift and oo-ruby

One more addition – if you happen to be in my very particular line of work, you get to work with a lot of ruby files that don’t *look* like ruby files to vim, because they’re scripts that invoke oo-ruby as their executable.

What’s oo-ruby? It’s a shim to wrap Ruby such that you get a Ruby 1.9 environment whether you are on Fedora (where 1.9 is native currently) or on RHEL (where it is provided by an SCL).

But the problem is, if the file doesn’t end in .rb, vim doesn’t know what filetype to give it, so syntax highlighting and all the other goodies that come with a known filetype don’t work. You have to help vim recognize the filetype as follows. Create or edit ~/.vim/scripts.vim and add the following vimscript:

if did_filetype() " filetype already set..
    finish " ..don't do these checks
endif
if getline(1) =~ '^#!.*\<oo-ruby\>'
    setfiletype ruby
endif

This checks the first line of the file for “oo-ruby” somewhere after the shebang and, if present and filetype is not otherwise determined, sets filetype to ruby. Problem solved!

OpenShift scaling on OpenStack

Recently I worked on a video for our CTO’s keynote at the OpenStack Summit. You can watch it here:

https://www.openstack.org/summit/portland-2013/session-videos/presentation/keynote-openstack-at-red-hat

Actually the still for the video appears to be my goofy mug (at least for now) instead of Brian. My part starts at about 11:30 in. I actually just did the screenshots, script, and voice-over. The diagrams and music and editing and whatnot were done by other people :)

It is definitely still proof-of-concept but I thought I should point out where the code for scaling OpenShift like this (after a little cleanup) actually lives. It’s over here:

https://github.com/openshift/openshift-extras/tree/enterprise-1.1/node-manager

It’s intended to be pluggable, so it should be fairly easy to swap in another IaaS or client (having worked with it a bit, I don’t find the nova client particularly scriptable; I probably would have gone another way if someone hadn’t already done a lot of that work for me).

If anything, it highlights some of the complexities of trying to automate this. But people have been asking about automating this case, so here’s a first stab at it; improvements are most welcome.

OpenShift v2 cartridges: node host tools

There is a series starting on the OpenShift blog about the v2 cartridge format. Check it out. Way more official than whatever I write here.

Updated 2013-04-17 – updates marked below.

I introduced v2 cartridges in a previous post. If you have an OpenShift Origin node host running and you’ve toggled it to v2 mode, follow along.

When a user creates an application, that flows through the broker and mcollective to the node via the MCollective openshift.rb agent. You can shortcut that path if you want to create gears and configure cartridges into them more directly on the node host. None of the following involves the broker (so, of course, the broker will deny all knowledge of it if you ask).

Creating a gear

You can use the oo-app-create command on a node host to create a gear arbitrarily. Several parameters are required. You can run it with -h to see all the options. The main things you need are:

  1. An application UUID. This is a unique string that identifies the whole application; if the app has multiple gears, all will still share the application ID. Once the gear is created this goes in an environment variable.
  2. A gear UUID, which is referred to as a “container” ID. This is a unique string that identifies the gear. For single-gear apps, we typically just re-use the application ID, but that’s just convention.
  3. The application name – this would ordinarily be what a developer called their application.
  4. A namespace, which would ordinarily be the developer’s “domain”.

So the fun news is, since you don’t have to deal with the broker, you can totally make these up. There are a few requirements:

  1. The UUIDs do actually have to be unique, at least on the node host (they’re supposed to be unique across OpenShift). The broker just makes these up and instructs the node to use them.
  2. The gear UUID, name, and namespace need to be all “word” characters. The app name and app UUID can be basically anything.
  3. Gear UUID will be used to create a system user, so it can’t violate the restrictions on that – e.g. it can’t be too long.

So, once you’ve made up your values, you can just create an empty gear like so:

# UUID=9vr98yvfr98ju23vfjuf
# DOMAIN=lm20130412
# NAME=testname
# oo-app-create --with-app-uuid $UUID \
--with-container-uuid $UUID \
--with-namespace $DOMAIN \
--with-app-name $NAME

Once you’ve done this, you can check that the gear has been created by running “id $UUID” and by looking in /var/lib/openshift for a directory of the same name.
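
Both checks, using the variables from above:

# id $UUID                        # the gear's system user should now exist
# ls -d /var/lib/openshift/$UUID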

A quick description of some of the optional parameters is in order:

  • --with-container-name is the name for the gear as opposed to the app – it just defaults to the app name if not specified. This is what is combined with the domain to create the DNS record for your gear – even if it’s a child gear in a scaled app, it will get its own DNS entry (although if you’re manually creating gears this way, the broker never knows to create the DNS entry so it’s rather moot).
  • --with-uid is used to specify a numeric user ID for the gear user – this is specified by the broker for nodes that are in a district; the UID is chosen from a pool that is available in the district and reserved for the gear regardless of which node in the district it lands on. So, it’s specified at the time the gear is created. If not specified, the node just picks an available one.

Distinguishing v1 and v2 gears

Even before we’ve done anything else with the new gear, it is marked as a v2 gear. Look at the files in the gear:

/var/lib/openshift/9vr98yvfr98ju23vfjuf/
├── app-root
│   ├── data
│   │   └── .bash_profile
│   ├── repo -> runtime/repo
│   └── runtime
│       ├── data -> ../data
│       ├── repo
│       └── .state
├── .env
│   ├── CARTRIDGE_VERSION_2
│   ├── HISTFILE
│   ├── HOME
│   ├── OPENSHIFT_APP_DNS
│   ├── OPENSHIFT_APP_NAME
│   ├── OPENSHIFT_APP_UUID
│   ├── OPENSHIFT_DATA_DIR
│   ├── OPENSHIFT_GEAR_DNS
│   ├── OPENSHIFT_GEAR_NAME
│   ├── OPENSHIFT_GEAR_UUID
│   ├── OPENSHIFT_HOMEDIR
│   ├── OPENSHIFT_REPO_DIR
│   ├── OPENSHIFT_TMP_DIR
│   ├── PATH
│   ├── TMP
│   ├── TMPDIR
│   └── TMP_DIR
├── .sandbox
│   └── 9vr98yvfr98ju23vfjuf
├── .ssh
└── .tmp

There is exactly one difference from a v1 gear: the CARTRIDGE_VERSION_2 env var. But that’s enough – the presence of this addition is used to decide whether to use v1 or v2 logic with cartridges in this gear.

Configuring a cartridge into the gear

So, let’s actually add a cartridge. You can do this with the oo-cartridge command. This is basically a convenience wrapper for manual testing – nothing in the product uses this script, but it is an entry point to the same code that actually is executed via MCollective to instantiate a cartridge in the gear.

# oo-cartridge -a add -c $UUID -n mock-0.1 -v -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

Although I’ve added the -v -d flags (verbose, debug) you can see there isn’t much output here. Without either flag you just get the first line (success or failure). The “verbose” flag adds the output from the start hook after the cartridge is added. The “debug” flag will give detailed output if there is an exception (otherwise it adds nothing). To see what is really going on, you’ll want to look at the platform logs.

The platform logs are configured in /etc/openshift/node.conf and located in /var/log/openshift/node/. For the purposes of understanding, I suggest setting the platform.log level to INFO to follow the flow of what’s happening, and leaving platform-trace.log at DEBUG level to consult for the actual bash commands and their results. If you were developing a cartridge, though, you’d probably want platform.log at DEBUG level (the default) to see the bash commands mixed in with the code-level flow.
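
If you just want to see where those knobs and files live without reading the whole config, something like this does it (I’m deliberately not quoting exact key names, which can vary by release):

# grep -i log /etc/openshift/node.conf    # shows the platform log locations and levels
# ls /var/log/openshift/node/
platform.log  platform-trace.log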

Example INFO-level platform.log for the above (leaving off timestamps and such):

Creating cartridge directory 9vr98yvfr98ju23vfjuf/mock
Created cartridge directory 9vr98yvfr98ju23vfjuf/mock
Creating private endpoints for 9vr98yvfr98ju23vfjuf/mock
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT1=8080]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT2=8081]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT3=8082]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP2=127.0.251.130, OPENSHIFT_MOCK_EXAMPLE_PORT4=9090]
Created private endpoints for 9vr98yvfr98ju23vfjuf/mock
mock attempted lock/unlock on out-of-bounds entry [~/invalid_mock_locked_file]
Running setup for 9vr98yvfr98ju23vfjuf/mock
Ran setup for 9vr98yvfr98ju23vfjuf/mock
Creating gear repo for 9vr98yvfr98ju23vfjuf/mock from ``
Created gear repo for 9vr98yvfr98ju23vfjuf/mock
Processing ERB templates for /var/lib/openshift/9vr98yvfr98ju23vfjuf/mock/**
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: => 127.0.251.129:8080 with options: {"websocket"=>true}
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: /front1a => 127.0.251.129:8080/back1a with options: {}
configure output: Creating version marker for 0.1

platform-trace.log DEBUG-level output for the tail end of that is:

oo_spawn running service openshift-node-web-proxy reload: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 12>, :err=>#<IO:fd 9>}
oo_spawn buffer(11/) Reloading node-web-proxy:
oo_spawn buffer(11/) [
oo_spawn buffer(11/) OK
oo_spawn buffer(11/) ]
oo_spawn buffer(11/)
oo_spawn buffer(11/)
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-z6g4cz/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-82d3xc/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}

In OpenShift you can compose an application by adding multiple cartridges, e.g. database or cron cartridges. The mock-plugin cartridge tests this functionality. You can use oo-cartridge to add this as well:

# oo-cartridge -a add -c $UUID -n mock-plugin-0.1 -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

You can check that your gear is up and running with curl. Your gear has been configured into the front-end proxy’s routing, even though its DNS record doesn’t exist. You can tell the proxy which gear to access by setting the host header:

curl http://localhost/ -IH "Host: $NAME-$DOMAIN.$CLOUD_DOMAIN"

(You can get CLOUD_DOMAIN from /etc/openshift/node.conf.) Of course with the mock cartridge, there may not be much to see; another cartridge like php-5.3 or ruby-1.9 will have content by default.
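
Putting that together, assuming a plain CLOUD_DOMAIN="example.com" style line in node.conf (adjust the extraction if yours differs):

# CLOUD_DOMAIN=$(sed -n 's/^CLOUD_DOMAIN=//p' /etc/openshift/node.conf | tr -d '"')
# curl -IH "Host: $NAME-$DOMAIN.$CLOUD_DOMAIN" http://localhost/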

Cucumber tests

The mock cartridge is a testing tool for putting the cartridge logic through its paces. Take a look at the cucumber tests in the origin-server repo to see that in action (the mock cartridge feature is controller/test/cucumber/cartridge-mock.feature).

Updated 04/17: even as I was writing this, the cartridge-mock.feature was split into platform-* features, e.g. platform-basic.feature. Look at those instead.

You can run these by checking out the origin-server git repo on your node host and running cucumber against the feature file you are interested in (of course you must have the cucumber gem installed):

# cucumber cartridge-mock.feature
Using RSpec 2
simple: 17816 ruby 1.9.3 2012-11-10 [x86_64-linux]
@runtime_other
Feature: V2 SDK Mock Cartridge
Scenario: Exercise basic platform functionality in isolation # cartridge-mock.feature:4
[...]

If you were developing a new v2 cartridge, BDD with a cucumber feature would probably be a better approach than the manual testing this post is demonstrating, especially for testing composing and scaling.

Logging into the gear

Update 04/17: Adding this section

Now that you have a gear with a cartridge or two in it, you might want to log in and look around, like you are used to with normal gears. Of course this really just means getting a login as the gear user. Normally you would do that with ssh, but you haven’t set up an ssh key for the gear yet. It’s easy to do that, but why bother? You can just use su, right?

# su - $UUID
Invalid context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023, expected unconfined_u:system_r:openshift_t:s0:c0,c502

Not so fast. The gear runs in a specialized SELinux context, and normal su doesn’t handle that. For this purpose you need oo-su:

# oo-su $UUID -c oo-trap-user

This will get you an ordinary gear-user login (preceded by a few error messages that don’t seem to harm anything). oo-trap-user is the login shell; of course, you don’t have to use that, since oo-su can similarly run any command directly in the context of the gear user.
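
For example, a one-off command in the gear user’s context looks like this (same pattern as above, just substituting the command):

# oo-su $UUID -c "id"    # any command can go here in place of the login shell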

Cleanup

You can remove a cartridge from a gear in much the same way it was added:

# oo-cartridge -a delete -c $UUID -n mock-plugin-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------
# oo-cartridge -a delete -c $UUID -n mock-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------

You’ll notice, though, that the gear is likely not left pristine. Cartridges leave log files and more behind even after they’re removed. The base framework cartridges are particularly bad about this. You’ll even find that removing one framework cartridge and adding another may cause a failure. That’s because in real usage, framework cartridges are never removed. The whole gear is simply discarded:

# oo-app-destroy -a $UUID -c $UUID --with-app-name $NAME --with-namespace $DOMAIN

So, that is the simplest cleanup.

Working around the mysterious yum multilib error on an OpenShift Enterprise install

This one took a little while to track down and understand so I offer it here in the spirit of helping someone else get around this without that discovery process.

TL;DR

If, when installing cartridges on an OpenShift Enterprise node host, you get a yum error with “Error: Multilib version problems found.”, make sure you have followed the steps in https://access.redhat.com/site/articles/316613 even if you don’t want the JBoss cartridges.

The problem

The example configure script we’ve provided for OpenShift Enterprise currently installs node cartridges as a single yum command which attempts to install all of them, but hedges its bets with the --skip-broken flag in case something doesn’t work out with one or more cartridges. This keeps you from ending up with no cartridges installed just because some fiddly dependency problem blocked one of them while the others could have proceeded fine. This is very helpful during our development testing. For a production deployment it may or may not be, depending on how you define “helpful.”
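
The shape of that command is roughly the following (illustrative only; the real script spells out each cartridge package by name rather than using a wildcard):

# yum install --skip-broken 'openshift-origin-cartridge-*'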

So, I found that if I ran a production configuration and did not follow the directions at https://access.redhat.com/site/articles/316613 to work around the nature of our JBoss subscription dependencies, not only did JBoss cartridge installation fail, but I got this bizarre yum error message:

Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem.

[...]

Protected multilib versions: zlib-1.2.3-27.el6.i686 != zlib-1.2.3-29.el6.x86_64

(The full message is similar to this post.) Reading between the lines there, what this error tells you is: There is some kind of dependency problem, we found it when trying to update zlib (and found that we were trying to install mismatched versions on the two architectures i686 and x86_64), but the real problem is probably somewhere way upstream. Which is about as useless an error message as you can get, but I understand where these things come from – it’s pretty hard to unravel the chain of consequences sometimes to give a helpful error message.

So, what the heck, right?

Also, if you try to install individual cartridges, they’ll work fine. If you remove the JBoss ones from the yum command, it works fine. It only fails on exactly the command that is run by default by the script.

Debugging

Running yum with the -v flag (verbose) gives a lot more info. I did not chase this down to the exact dependency chain but I gather the following describes what’s happening.

The first annoying question is, why is it even trying to install the i686 zlib in the first place? Everything I’m using is 64-bit. The next question is, why is it trying to install different versions? The -27 release is available for both architectures, why isn’t it just using that?

The clue comes in these lines of debugging output:

SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from transaction 
SKIPBROKEN: removing zlib-devel-1.2.3-29.el6.x86_64 from pkgSack & updates

zlib-devel requires a zlib with the same version, and itself is required by freetype-devel, a requirement of the python cartridge.  It is also a dependency for libxml2-devel (required from the ruby cartridge) and openssl-devel (distant requirement for several cartridges). So let’s just say it is in the thick of things.

Unraveling the chain

I gather that what happened is this: With the channels broken, yum got deep into resolution and realized it could not install the JBoss cartridges. With the --skip-broken flag, yum removed from the transaction those cartridges and all of the package versions it had included for their dependencies; then it tried to resolve dependencies for the rest of the transaction with ALL of those packages not allowed in the transaction (as any could be the source of the problem). Apparently zlib-devel-1.2.3-29.el6.x86_64 was part of what was excised. But zlib-devel was still needed by other cartridges, and this is where it gets really bizarre.

openssl-devel-1.0.0-27.el6_4.2.x86_64 requires: zlib-devel
--> Processing Dependency: zlib-devel for package: openssl-devel-1.0.0-27.el6_4.2.x86_64
Searching pkgSack for dep: zlib-devel
TSINFO: Marking zlib-devel-1.2.3-27.el6.x86_64 as install for openssl-devel-1.0.0-27.el6_4.2.x86_64
[...]
---> Package zlib-devel.x86_64 0:1.2.3-27.el6 will be installed

[...]

zlib-devel-1.2.3-27.el6.x86_64 requires: zlib = 1.2.3-27.el6
--> Processing Dependency: zlib = 1.2.3-27.el6 for package: zlib-devel-1.2.3-27.el6.x86_64

Searching pkgSack for dep: zlib
Potential resolving package zlib-1.2.3-27.el6.x86_64 has newer instance installed.
TSINFO: Marking zlib-1.2.3-27.el6.i686 as install for zlib-devel-1.2.3-27.el6.x86_64
--> Running transaction check
---> Package zlib.i686 0:1.2.3-27.el6 will be installed

Because zlib-devel -29 was verboten, it fell back to zlib-devel -27. But that requires zlib -27, and -29 was already installed. You can’t have both installed at the same time, but yum saw a way to resolve the request by pulling in the zlib package from the i686 arch (zlib-devel did not specify arch in its requirement). So now yum wanted to install zlib-1.2.3-27.i686 on a system where zlib-1.2.3-29.x86_64 was already installed. It wasn’t allowed to downgrade the existing installation and couldn’t find any other way to satisfy the requirements under the restrictions imposed by having removed part of the transaction. So finally yum gave up and I got this confusing error about the multilib versions not matching, far from the source of the problem (which is, arguably, yum’s process for dependency resolution after pruning by --skip-broken).
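
If you want to poke at a chain like this yourself, repoquery (from the yum-utils package) is handy, e.g.:

# repoquery --whatrequires zlib-devel    # what pulls zlib-devel into the transaction
# repoquery --requires zlib-devel        # what zlib-devel itself needs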

Fixing the repo configuration (https://access.redhat.com/site/articles/316613) so that JBoss cartridges install cleanly fixes the problem. Not attempting to install them also fixes the problem.

Hopefully this provides some useful insight into this kind of problem, even for those who arrive here by completely different means.

The OpenShift cartridge refactor: a brief introduction

If you’re watching the commit logs over at OpenShift Origin you’ll see a lot of activity around “v2” cartridges (especially a lot of “WIP” commits). For a variety of reasons we’re refactoring cartridges to make it easier to write and maintain them. We’re particularly interested in enabling those who wish to write cartridges, and part of that includes removing as much as possible from the current cartridge code that is really generic platform code and shouldn’t be boilerplate repeated in cartridges. And in general, we’re just trying to bring more sanity and remove opacity.

If you’ve fired up Origin lately you wouldn’t necessarily notice that anything has changed. The refactored cartridges are available in parallel with existing cartridges, and you have to opt in to use them. To do that, use the following command as root on a node host:

# oo-cart-version -c toggle
Node is currently in v1 mode
Switching node cartridge version
Node is currently in v2 mode

The node now works with the cartridges installed in /usr/libexec/openshift/cartridges/v2 (rather than the “v1” cartridges in /usr/libexec/openshift/cartridges – BTW these locations are likely to change, watch the RPM packaging for clues). Aside from the separate cartridge location, there are logic branches for the two formats in the node model objects, most prominently in OpenShift::ApplicationContainer (application_container.rb under the openshift-origin-node gem), which makes a lot of calls against @cartridge_model – either a V1CartridgeModel or a V2CartridgeModel object, depending on the gear’s format.

The logic branches are based on two things – for an existing gear, the cartridge format already present is used; otherwise, for new gears, the presence of a marker file /var/lib/openshift/.settings/v2_cartridge_format is checked (which is the main thing the command above changes) – if present, use v2 cartridges, otherwise use the old ones. In this way, the development and testing of v2 cartridges can continue without needing a fork / branch and without disrupting the use of v1 cartridges.
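
So, on any given node you can tell which mode new gears will get just by checking for that marker file:

# ls /var/lib/openshift/.settings/v2_cartridge_format    # present => new gears use v2 cartridges
/var/lib/openshift/.settings/v2_cartridge_format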

A word of warning, though: you can use gears with the v1 and v2 cartridges in parallel on the same node (toggle back and forth), but don’t try to configure an embedded cart from one format into a gear with the other. Also, do not set a different mode on different nodes in the same installation. Results of trying to mix and match that way are undefined, which is to say, probably super broken.

Let’s look around a bit.

# ls /usr/libexec/openshift/cartridges/
10gen-mms-agent-0.1 diy-0.1 jbossews-1.0 mongodb-2.2 phpmyadmin-3.4 rockmongo-1.1 zend-5.6
abstract embedded jbossews-2.0 mysql-5.1 postgresql-8.4 ruby-1.8
abstract-httpd haproxy-1.4 jenkins-1.4 nodejs-0.6 python-2.6 ruby-1.9
abstract-jboss jbossas-7 jenkins-client-1.4 perl-5.10 python-2.7 switchyard-0.6
cron-1.4 jbosseap-6.0 metrics-0.1 php-5.3 python-3.3 v2

# ls /usr/libexec/openshift/cartridges/v2
diy haproxy jbosseap jbossews jenkins jenkins-client mock mock-plugin mysql perl php python ruby

There look to be a lot fewer cartridges under v2, and that’s not just because they’re not all complete yet. Notice what’s missing in v2? Version numbers. You’ll see the same thing looking in the source at the cartridge source trees and package specs; you don’t have a single cartridge per version anymore. It’s possible to support multiple different runtimes from the same cartridge. This is evident if you look in the ruby cartridge. First, there’s the cartridge manifest:

# grep Version /usr/libexec/openshift/cartridges/v2/ruby/metadata/manifest.yml
Version: '1.9'
Versions: ['1.9', '1.8']
Cartridge-Version: 0.0.1

There’s a default version if none is specified when configuring the cartridge, but there are two versions available in the same cartridge. Also notice the separate directories for version-specific implementations:

# ls /usr/libexec/openshift/cartridges/v2/ruby/versions/
1.8 1.9 shared

So rather than have completely separate cartridges for the different versions, different versions can live in the same cartridge and directly share the things they have in common, while overriding the usually-minor differences. This doesn’t mean we’re going to see ruby versions 1.9.1, 1.9.2, 1.9.3, etc. – in general you’ll only want one current version of a supported branch, such that security and bug fixes can be applied without having to migrate apps to a new version. But it means we cut down on a lot of duplication of effort for multi-versioned platforms. We can put ruby 1.8, 1.9, and 2.0 all in one cartridge and share most of the cartridge code.

You might be wondering how to specify which version you get. I’m not sure what is planned for the future, but at this time I don’t believe the logic branches for v2 cartridges have been extended to the broker. Right now, if you look in /var/log/mcollective.log for the cartridge-list action, you’ll see the node is reporting two separate Ruby cartridges just like before, which are reported back to the client, and you still request app creation with the version in the cartridge:

$ rhc setup
...
Run 'rhc app create' to create your first application.
Do-It-Yourself rhc app create <app name> diy-0.1
 JBoss Enterprise Application Platform rhc app create <app name> jbosseap-6.0.1
 Jenkins Server rhc app create <app name> jenkins-1.4
 Mock Cartridge rhc app create <app name> mock-0.1
 PHP 5.3 rhc app create <app name> php-5.3
 Perl 5.10 rhc app create <app name> perl-5.10
 Python 2.6 rhc app create <app name> python-2.6
 Ruby rhc app create <app name> ruby-1.9
 Ruby rhc app create <app name> ruby-1.8
 Tomcat 7 (JBoss EWS 2.0) rhc app create <app name> jbossews-2.0
$ rhc app create rb ruby-1.8
...
Application rb was created.

If you look in v2_cart_model.rb, you’ll see there’s a FIXME that parses out the version from the base cart name to handle this – the FIXME is to note that this should really be specified explicitly in an updated node command protocol. So at this time, there’s no broker-side API change to pick which version from a cartridge you want. But look for that to change when v2 carts are close to prime time.

By the way, if you’re used to looking in /var/log/mcollective.log to see broker/node interaction, that’s still there (you probably want to set loglevel = info in /etc/mcollective/server.cfg) but a lot more details about the node actions that result from these requests are now recorded in /var/log/openshift/node/platform.log (location configured in /etc/openshift/node.conf). You can watch this to see exactly how mcollective actions translate into system commands, and use this to manually test actions against developing cartridges (see also the mock cartridge and the test cases against it).
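
A convenient way to watch both ends of an interaction is to tail the two logs while running rhc or oo-cartridge commands from another terminal:

# tail -f /var/log/mcollective.log /var/log/openshift/node/platform.log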

You’ll notice if you follow some cartridge actions (e.g. “restart”) through the code that the v2 format has centralized a lot of functions into a few scripts. Before, each action and hook resulted in a call to a separate script (often symlinked in from the “abstract” cartridge, which, anyone would admit, is kind of a hack):

# ls /usr/libexec/openshift/cartridges/ruby-1.8/info/{bin,hooks}

/usr/libexec/openshift/cartridges/ruby-1.8/info/bin:
app_ctl.sh build.sh post_deploy.sh ps threaddump.sh
app_ctl_stop.sh deploy_httpd_config.sh pre_build.sh sync_gears.sh
/usr/libexec/openshift/cartridges/ruby-1.8/info/hooks:
add-module deploy-httpd-proxy reload restart stop tidy
configure info remove-httpd-proxy start system-messages update-namespace
deconfigure move remove-module status threaddump

In the new format, these are just options on a few scripts:

# ls /usr/libexec/openshift/cartridges/v2/ruby/bin/
build control setup teardown

If you look at the mcollective requests and the code, you’ll see the requests haven’t changed, but the v2 code is just routing it to the new scripts. For instance, “restart” is now just an option to the “control” script above.

Those are just some of the changes that are in the works. The details are still evolving daily, too fast for me to keep track of frankly, but if you’re interested in what’s happening, especially interested in writing cartridges for OpenShift, you might like to dive into the existing documentation describing the new format:

https://github.com/openshift/origin-server/blob/master/node/README.writing_cartridges.md

Other documents in the same directory may or may not distinguish between v1 and v2 usage, but regardless should be useful, if sometimes out of date, reading.
