OpenShift: profiles, districts, and nodes, oh my!

Someone in #openshift-dev recently pointed out that the relationship between OpenShift profiles, districts, and nodes isn’t laid out clearly anywhere. I had a look through the docs, and I have to admit, he has a point. You can kind of infer it from various parts of the documentation, but I couldn’t find anywhere that simply states what I’m about to here. I’d be happy to be shown wrong.

TL;DR: Profiles contain districts, which contain nodes, which contain gears. You can’t have districts or nodes with multiple profiles. You can’t control what district or node a gear is created in.

If you’re familiar with OpenShift at all, you probably have at least some grasp of what a gear is: the basic unit of compute in OpenShift. Practically speaking, a gear is actually a regular old user account on a Linux host (an OpenShift node host), with a specified allocation of resources (RAM, disk, network, etc.), locked down by various containment mechanisms to just those resources. Much of OpenShift revolves around managing gears.

Profiles

Gears have a profile, also known as a size. I don’t like calling it a size, because it need not have anything to do with size, but we’re stuck with the term in a few places (notably, the DB and API) so I can’t pretend it’s not there. And for many deployments, it probably will be about size. But I’ll call it a profile here.

Profiles are the most fundamental way in which gears are grouped. The original point of profiles was to provide some uniformity for capacity planning, but you can really use them to partition your gears in any way you want – departmental ownership, location, high availability separation, security clearance, etc. We will have better ways to implement separation for some of these concepts in the not-too-distant future, but at this moment, gear profiles are the only point at which OpenShift enables giving different users access to separate parts of the deployment.

For such a fundamental piece of the architecture, it may be surprising that there is no model object or real definition of a profile in the OpenShift broker database schema. It is literally just a string, which can be whatever you want.
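
The closest thing to a definition is the list of profile names the broker will accept when creating gears. If I recall the setting correctly, that lives in /etc/openshift/broker.conf (the value here is just an example):

# grep VALID_GEAR_SIZES /etc/openshift/broker.conf
VALID_GEAR_SIZES="small,medium"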

Nodes

An OpenShift node is a Linux host with OpenShift services for containing gears. There might conceivably be additional implementations in the future – the point is that nodes are containers for gears. Nodes have a gear profile, which is defined in /etc/openshift/resource_limits.conf – this file contains the gear profile string as well as all of the resources that a gear gets when it is created on that node.
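
For example, you can see which profile a node advertises with a quick grep (the profile name here is just an illustration):

# grep node_profile /etc/openshift/resource_limits.conf
node_profile=small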

Technically, a bunch of nodes all claiming to have a particular gear profile could specify wildly different resource limits. There’s nothing at the broker that would even know they were different, for the most part. But if you are a sane system administrator, you would not do this, except by accident. See, you would probably like to have some idea of how many gears you can fit on the node, so that you know when to make new nodes. And you would probably not like your gears in a profile to have randomly different resource limits.

Most OpenShift admins (and salespeople) wonder at some point how you can specify multiple gear profiles for a node. You can’t. If nodes could host multiple profiles, how would you know how many gears you can fit on that node? People who want to create a monster node host and run their whole PaaS off of it are often surprised when we tell them to “just partition it into VMs and give them different profiles.” Perhaps someday we will enable multiple profiles per host, but don’t bet on it. Sometimes a very unintuitive restriction makes sense when you look at the bigger picture.

So at this point, gear profiles are synonymous with node profiles. A node contains gears of a particular profile, and a gear profile is constrained in number of gears by the number of node hosts configured with that profile.

Districts

Of all the things that are misunderstood in OpenShift, I’m pretty sure districts are number one. I think it’s because they kind of sound like what profiles actually are – a partitioning scheme. They’re really not at all.

Conceptually, profiles contain districts, and districts contain node hosts. A district has a gear profile and can only contain nodes with that profile. But districts are completely invisible to users, and there is no way to specify which one a gear lands in when you’re creating a gear. They have nothing to do with permissions or partitions. To understand what they are for, you have to understand a little about the technical details that motivated them.

Sometimes, for one reason or another, you want to move a gear from one node host to another. You would like it to function exactly the same way on the new host as the old one. The problem is, there are a few resources that must be exclusive to any single gear on a host: internal IPs and external ports being the main ones. If you move a gear from one host to another, there’s no guarantee that the same resources it was using will be available on the new host, so you would need to detect this situation and reconfigure the gear to use unique resources that don’t clash. This is indeed the approach that was taken before districts, but it proved to be rather brittle.

  1. It led to all kinds of edge case bugs where things would wind up broken only when certain cartridges (or combinations of them, perhaps when scaled…) were moved to nodes where they needed to be reconfigured. In short, it was a regression testing nightmare.
  2. It also made cartridges hard to write correctly to handle moving, and we wanted writing cartridges to be easy so that lots of developers can contribute them.
  3. Finally, since gears are configured via setting environment variables, reconfiguring for a move meant changing environment variables. The parts of the gear that relied on these would work fine after a move, but the places where the app developer had hard-coded the values instead of using environment variables… broke. Naturally, developers assume it is the administrator’s fault, or a bug in the PaaS. So, it’s an administrative nightmare too.

To get around this, OpenShift introduced a simple allocation scheme: to ensure that you can move a gear off of a node, reserve the unique resources it will need on multiple nodes.

In practice, a district is nothing more than a pool of numeric user IDs that are reserved against a set of nodes. Every time a gear is created, it first reserves an ID from the district pool; then on a node host in that district, a user is created with that ID, and algorithms based on that ID specify the range of resources that are available to the gear. Since the UID is reserved across all the nodes in the district, it is guaranteed to be available if you move a gear to any node in that district, and thus all the resources based on it will also be available, and the gear needs no reconfiguration. Problem solved.
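
If you want to poke at districts yourself, they are created and populated from the broker. I’m working from memory here, so double-check the options with -h before trusting my flags; the district and host names below are made up:

# oo-admin-ctl-district -c create -n small_district -p small
# oo-admin-ctl-district -c add-node -n small_district -i node1.example.com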

Unfortunately, having to explain all this just to get across what districts are for… is pretty awkward. But it’s a necessary concept to understand if you’re an OpenShift administrator.

OpenShift v2 cartridges: node host tools

There is a series starting on the OpenShift blog about the v2 cartridge format. Check it out. Way more official than whatever I write here.

Updated 2013-04-17 – updates marked below.

I introduced v2 cartridges in a previous post. If you have an OpenShift Origin node host running and you’ve toggled it to v2 mode, follow along.

When a user creates an application, the request flows through the broker and MCollective to the node, where it is handled by the openshift.rb MCollective agent. You can shortcut that path if you want to create gears and configure cartridges into them more directly on the node host. None of the following involves the broker (so, of course, the broker will deny all knowledge of it if you ask).

Creating a gear

You can use the oo-app-create command on a node host to create a gear arbitrarily. Several parameters are required. You can run it with -h to see all the options. The main things you need are:

  1. An application UUID. This is a unique string that identifies the whole application; if the app has multiple gears, all will still share the application ID. Once the gear is created this goes in an environment variable.
  2. A gear UUID, which is referred to as a “container” ID. This is a unique string that identifies the gear. For single-gear apps, we typically just re-use the application ID, but that’s just convention.
  3. The application name – this would ordinarily be what a developer called their application.
  4. A namespace, which would ordinarily be the developer’s “domain”.

So the fun news is, since you don’t have to deal with the broker, you can totally make these up. There are a few requirements:

  1. The UUIDs do actually have to be unique, at least on the node host (they’re supposed to be unique across OpenShift). The broker just makes these up and instructs the node to use them.
  2. The gear UUID, gear name, and namespace need to be all “word” characters. The app name and app UUID can be basically anything.
  3. Gear UUID will be used to create a system user, so it can’t violate the restrictions on that – e.g. it can’t be too long.

So, once you’ve made up your values, you can just create an empty gear like so:

# UUID=9vr98yvfr98ju23vfjuf
# DOMAIN=lm20130412
# NAME=testname
# oo-app-create --with-app-uuid $UUID \
--with-container-uuid $UUID \
--with-namespace $DOMAIN \
--with-app-name $NAME

Once you’ve done this, you can check that the gear has been created with “id $UUID” and by looking in /var/lib/openshift for a directory of the same name.
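
For example (the numeric UID and GID will be whatever the node happened to pick, so don’t expect these exact values):

# id $UUID
uid=1001(9vr98yvfr98ju23vfjuf) gid=1001(9vr98yvfr98ju23vfjuf) groups=1001(9vr98yvfr98ju23vfjuf)
# ls -d /var/lib/openshift/$UUID
/var/lib/openshift/9vr98yvfr98ju23vfjuf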

A quick description of some of the optional parameters is in order:

  • --with-container-name is the name for the gear as opposed to the app – it just defaults to the app name if not specified. This is what is combined with the domain to create the DNS record for your gear – even if it’s a child gear in a scaled app, it will get its own DNS entry (although if you’re manually creating gears this way, the broker never knows to create the DNS entry so it’s rather moot).
  • --with-uid is used to specify a numeric user ID for the gear user – this is specified by the broker for nodes that are in a district; the UID is chosen from a pool that is available in the district and reserved for the gear regardless of which node in the district it lands on. So, it’s specified at the time the gear is created. If not specified, the node just picks an available one.
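
So, to mimic what the broker does on a districted node, you could have added a UID to the earlier command (1001 is just a made-up value; pick one that isn’t already in use on the node):

# oo-app-create --with-app-uuid $UUID \
--with-container-uuid $UUID \
--with-namespace $DOMAIN \
--with-app-name $NAME \
--with-uid 1001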

Distinguishing v1 and v2 gears

Even before we’ve done anything else with the new gear, it is marked as a v2 gear. Look at the files in the gear:

/var/lib/openshift/9vr98yvfr98ju23vfjuf/
├── app-root
│   ├── data
│   │   └── .bash_profile
│   ├── repo -> runtime/repo
│   └── runtime
│       ├── data -> ../data
│       ├── repo
│       └── .state
├── .env
│   ├── CARTRIDGE_VERSION_2
│   ├── HISTFILE
│   ├── HOME
│   ├── OPENSHIFT_APP_DNS
│   ├── OPENSHIFT_APP_NAME
│   ├── OPENSHIFT_APP_UUID
│   ├── OPENSHIFT_DATA_DIR
│   ├── OPENSHIFT_GEAR_DNS
│   ├── OPENSHIFT_GEAR_NAME
│   ├── OPENSHIFT_GEAR_UUID
│   ├── OPENSHIFT_HOMEDIR
│   ├── OPENSHIFT_REPO_DIR
│   ├── OPENSHIFT_TMP_DIR
│   ├── PATH
│   ├── TMP
│   ├── TMPDIR
│   └── TMP_DIR
├── .sandbox
│   └── 9vr98yvfr98ju23vfjuf
├── .ssh
└── .tmp

There is exactly one difference from a v1 gear: the CARTRIDGE_VERSION_2 env var. But that’s enough – the presence of this addition is used to decide whether to use v1 or v2 logic with cartridges in this gear.
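
You can check for the marker directly; it is just a file in the gear’s .env directory:

# ls /var/lib/openshift/$UUID/.env/CARTRIDGE_VERSION_2
/var/lib/openshift/9vr98yvfr98ju23vfjuf/.env/CARTRIDGE_VERSION_2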

Configuring a cartridge into the gear

So, let’s actually add a cartridge. You can do this with the oo-cartridge command. This is basically a convenience wrapper for manual testing – nothing in the product uses this script, but it is an entry point to the same code that actually is executed via MCollective to instantiate a cartridge in the gear.

# oo-cartridge -a add -c $UUID -n mock-0.1 -v -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

Although I’ve added the -v -d flags (verbose, debug) you can see there isn’t much output here. Without either flag you just get the first line (success or failure). The “verbose” flag adds the output from the start hook after the cartridge is added. The “debug” flag will give detailed output if there is an exception (otherwise it adds nothing). To see what is really going on, you’ll want to look at the platform logs.

The platform logs are configured in /etc/openshift/node.conf and located in /var/log/openshift/node/. To follow the flow of what’s happening, I suggest setting the platform.log level to INFO and leaving platform-trace.log at DEBUG so you can consult it for the actual bash commands and their results. If you were developing a cartridge, though, you’d probably want platform.log at DEBUG level (the default) to see the bash commands mixed in with the code-level flow.
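
The relevant node.conf settings look something like this (key names from memory, so verify against your own node.conf):

PLATFORM_LOG_FILE=/var/log/openshift/node/platform.log
PLATFORM_LOG_LEVEL=INFO
PLATFORM_TRACE_LOG_FILE=/var/log/openshift/node/platform-trace.log
PLATFORM_TRACE_LOG_LEVEL=DEBUG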

Example INFO-level platform.log for the above (leaving off timestamps and such):

Creating cartridge directory 9vr98yvfr98ju23vfjuf/mock
Created cartridge directory 9vr98yvfr98ju23vfjuf/mock
Creating private endpoints for 9vr98yvfr98ju23vfjuf/mock
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT1=8080]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT2=8081]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP1=127.0.251.129, OPENSHIFT_MOCK_EXAMPLE_PORT3=8082]
Created private endpoint for cart mock in gear 9vr98yvfr98ju23vfjuf: [OPENSHIFT_MOCK_EXAMPLE_IP2=127.0.251.130, OPENSHIFT_MOCK_EXAMPLE_PORT4=9090]
Created private endpoints for 9vr98yvfr98ju23vfjuf/mock
mock attempted lock/unlock on out-of-bounds entry [~/invalid_mock_locked_file]
Running setup for 9vr98yvfr98ju23vfjuf/mock
Ran setup for 9vr98yvfr98ju23vfjuf/mock
Creating gear repo for 9vr98yvfr98ju23vfjuf/mock from ``
Created gear repo for 9vr98yvfr98ju23vfjuf/mock
Processing ERB templates for /var/lib/openshift/9vr98yvfr98ju23vfjuf/mock/**
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: => 127.0.251.129:8080 with options: {"websocket"=>true}
Connecting frontend mapping for 9vr98yvfr98ju23vfjuf/mock: /front1a => 127.0.251.129:8080/back1a with options: {}
configure output: Creating version marker for 0.1

platform-trace.log DEBUG-level output for the tail end of that is:

oo_spawn running service openshift-node-web-proxy reload: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 12>, :err=>#<IO:fd 9>}
oo_spawn buffer(11/) Reloading node-web-proxy:
oo_spawn buffer(11/) [
oo_spawn buffer(11/) OK
oo_spawn buffer(11/) ]
oo_spawn buffer(11/)
oo_spawn buffer(11/)
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-z6g4cz/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}
oo_spawn running /usr/sbin/httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db-20130413-30162-82d3xc/new.db: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :out=>#<IO:fd 11>, :err=>#<IO:fd 8>}

In OpenShift you can compose an application by adding multiple cartridges, e.g. database or cron cartridges. The mock-plugin cartridge tests this functionality. You can use oo-cartridge to add this as well:

# oo-cartridge -a add -c $UUID -n mock-plugin-0.1 -d
Cartridge add succeeded
Output:
-----------------------------
Creating version marker for 0.1

You can check that your gear is up and running with curl. Your gear has been configured into the front-end proxy’s routing, even though its DNS record doesn’t exist. You can tell the proxy which gear to access by setting the host header:

curl http://localhost/ -IH "Host: $NAME-$DOMAIN.$CLOUD_DOMAIN"

(You can get CLOUD_DOMAIN from /etc/openshift/node.conf.) Of course, with the mock cartridge there may not be much to see; another cartridge like php-5.3 or ruby-1.9 will have content by default.
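
A quick way to pull that value out, assuming node.conf uses the usual KEY="value" format (the domain shown is just an example):

# grep ^CLOUD_DOMAIN /etc/openshift/node.conf
CLOUD_DOMAIN="example.com"
# CLOUD_DOMAIN=example.com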

Cucumber tests

The mock cartridge is a testing tool for putting the cartridge logic through its paces. Take a look at the cucumber tests in the origin-server repo to see that in action (the mock cartridge feature is controller/test/cucumber/cartridge-mock.feature).

Updated 04/17: even as I was writing this, the cartridge-mock.feature was split into platform-* features, e.g. platform-basic.feature. Look at those instead.

You can run these by checking out the origin-server git repo on your node host and running cucumber against the feature file you are interested in (of course you must have the cucumber gem installed):

# cucumber cartridge-mock.feature
Using RSpec 2
simple: 17816 ruby 1.9.3 2012-11-10 [x86_64-linux]
@runtime_other
Feature: V2 SDK Mock Cartridge
Scenario: Exercise basic platform functionality in isolation # cartridge-mock.feature:4
[...]

If you were developing a new v2 cartridge, BDD with a cucumber feature would probably be a better approach than the manual testing this post is demonstrating, especially for testing composing and scaling.

Logging into the gear

Update 04/17: Adding this section

Now that you have a gear with a cartridge or two in it, you might want to log in and look around, like you are used to with normal gears. Of course this really just means getting a login as the gear user. Normally you would do that with ssh, but you haven’t set up an ssh key for the gear yet. It’s easy to do that, but why bother? You can just use su, right?

# su - $UUID
Invalid context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023, expected unconfined_u:system_r:openshift_t:s0:c0,c502

Not so fast. The gear runs in a specialized SELinux context, and normal su doesn’t handle that. For this purpose you need oo-su:

# oo-su $UUID -c oo-trap-user

This will get you an ordinary gear-user login (preceded by a few error messages that don’t seem to harm anything). oo-trap-user is the login shell, but you don’t have to use that; you can use oo-su in the same way to run any command directly in the context of the gear user.
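
For example, here is a quick way to confirm you really are in the gear’s SELinux context; the output should match the “expected” context that su complained about earlier:

# oo-su $UUID -c "id -Z"
unconfined_u:system_r:openshift_t:s0:c0,c502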

Cleanup

You can remove a cartridge from a gear in much the same way it was added:

# oo-cartridge -a delete -c $UUID -n mock-plugin-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------
# oo-cartridge -a delete -c $UUID -n mock-0.1 -d
Cartridge delete succeeded
Output:
-----------------------------

You’ll notice, though, that the gear is likely not left pristine. Cartridges leave behind log files and more even after they are removed. The base framework cartridges are particularly bad about this. You’ll even find that removing one framework cartridge and adding another may cause a failure. That’s because in real usage, framework cartridges are never removed. The whole gear is simply discarded:

# oo-app-destroy -a $UUID -c $UUID --with-app-name $NAME --with-namespace $DOMAIN
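
Afterward, the gear user and its home directory should both be gone:

# id $UUID
id: 9vr98yvfr98ju23vfjuf: No such user
# ls -d /var/lib/openshift/$UUID
ls: cannot access /var/lib/openshift/9vr98yvfr98ju23vfjuf: No such file or directory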

So, that is the simplest cleanup.