Wednesday

Returning from a long silence, going to try once again to make a habit of journaling. Expect it to be mundane.

Also returning from a long vacation: two weeks (that’s long for me) plus two days of F2F with my team. So, a fair amount of time going through email, trying to respond to quick things, turning the rest into personal Trello cards. For a long time I tried to turn things into todos in the Gmail app, which had the advantage of enabling nice references to emails so I could return to them and follow up when done with something. However, it didn’t do a very good job of capturing the state of each task, and I was clearly not really using it. So, trying something else. Not sure personal Trello will stick either, but I gotta keep trying things until something does.

Right now I’m stuck trying to get openshift-ansible to run so I can test a little change I’m making. The openshift_facts module is failing inexplicably:

<origin-master> (0, 'Traceback (most recent call last):\r\n File "/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py", line 2470, in <module>\r\n main()\r\n File "/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py", line 2457, in main\r\n protected_facts_to_overwrite)\r\n File "/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py", line 1830, in __init__\r\n protected_facts_to_overwrite)\r\n File "/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py", line 1879, in generate_facts\r\n facts = set_selectors(facts)\r\n File "/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py", line 496, in set_selectors\r\n facts[\'logging\'][\'selector\'] = None\r\nTypeError: \'unicode\' object does not support item assignment\r\n', 'Shared connection to 192.168.122.156 closed.\r\n')
fatal: [origin-master]: FAILED! => {
 "changed": false, 
 "failed": true, 
 "module_stderr": "Shared connection to 192.168.122.156 closed.\r\n", 
 "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py\", line 2470, in <module>\r\n main()\r\n File \"/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py\", line 2457, in main\r\n protected_facts_to_overwrite)\r\n File \"/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py\", line 1830, in __init__\r\n protected_facts_to_overwrite)\r\n File \"/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py\", line 1879, in generate_facts\r\n facts = set_selectors(facts)\r\n File \"/tmp/ansible_QyeeOK/ansible_module_openshift_facts.py\", line 496, in set_selectors\r\n facts['logging']['selector'] = None\r\nTypeError: 'unicode' object does not support item assignment\r\n", 
 "msg": "MODULE FAILURE", 
 "rc": 0
}
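The last frame of that traceback is the informative bit. Here is a minimal Python sketch of what it implies, assuming (this is a guess; the traceback doesn’t say how it got that way) that facts['logging'] reached set_selectors as a plain string rather than a dict. The facts values below are invented for illustration:

```python
def set_logging_selector(facts):
    """Mimic the failing assignment from the traceback and report the outcome."""
    try:
        facts["logging"]["selector"] = None
        return "ok"
    except TypeError as err:
        # Python 2 names the type 'unicode' or 'str'; Python 3 names it 'str'.
        return "TypeError: {}".format(err)


# Hypothetical inputs: a string where a dict was expected, then a proper dict.
result_bad = set_logging_selector({"logging": "some string value"})
result_good = set_logging_selector({"logging": {}})

print(result_bad)
print(result_good)
```

So the thing to hunt for is whatever set the logging fact to a string before set_selectors ran.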

And since that error happens early in init of the first master, it cascades to the node which fails trying to look up the master’s version, giving a lovely masking error at the end of the output:

fatal: [origin-node-1]: FAILED! => {
 "failed": true, 
 "msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'dict object' has no attribute 'openshift_version'\n\nThe error appears to have been in '/home/lmeyer/go/src/github.com/openshift/openshift-ansible/playbooks/common/openshift-cluster/initialize_openshift_version.yml': line 16, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n pre_tasks:\n - set_fact:\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'dict object' has no attribute 'openshift_version'"

Yeah, so… Ansible has a great way of welcoming you back.

Concerns for preflight check design

Lately I’m working on preflight checks for OpenShift v3 Ansible installs/upgrades. There is no piece right now that checks that you have everything you might reasonably need set up for an install/upgrade and bails out before doing anything if you don’t. What happens right now is that you get partway through the install/upgrade and then find out… oh, you have the wrong repos enabled or whatever, UGLY ERROR -> Fix it and start over again… bleah. Nobody enjoys SEV1 support calls in the middle of the night. For installs and particularly for upgrades, we’d really like the sysadmin to be able to run a preflight check before their outage window and find out about any common problems at that time.
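To make the idea concrete, here is a rough Python sketch of the shape I have in mind: run every check, collect all the problems, and report them before anything is changed. This is not how openshift-ansible implements anything; the check functions and the facts dictionary are hypothetical names made up for illustration.

```python
def check_origin_repo(facts):
    """Hypothetical check: is some Origin repo enabled on the host?"""
    if "origin" not in facts.get("enabled_repos", []):
        return "no Origin repo is enabled"
    return None


def check_network_manager(facts):
    """Hypothetical check: is NetworkManager installed?"""
    if "NetworkManager" not in facts.get("packages", []):
        return "NetworkManager is not installed"
    return None


def run_preflight(facts, checks):
    """Run every check and return all problems found, not just the first."""
    problems = []
    for check in checks:
        problem = check(facts)
        if problem is not None:
            problems.append(problem)
    return problems


# Invented host facts, for illustration only.
host_facts = {"enabled_repos": ["base", "updates"], "packages": ["NetworkManager"]}
problems = run_preflight(host_facts, [check_origin_repo, check_network_manager])

if problems:
    print("Preflight failed; nothing has been changed yet:")
    for problem in problems:
        print(" -", problem)
else:
    print("Preflight passed.")
```

The point of collecting everything is that the sysadmin gets the whole list in one run, instead of fixing one thing and starting over.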

So my latest conundrum is figuring out what the user expects during a preflight check. This is not as straightforward as you might think. The installer does a pretty good job of figuring out what you meant without you having to specify everything down to the last detail (because humans are not reliably good at doing that). Thing is, it may install and configure a number of things on your systems… just in order to figure out how to run.

This isn’t a big deal in the installer, because when you run an install or upgrade, you expect to install and configure things. Preflight checks are different because you’d like to affect system state as little as possible. The whole idea is to do checks before you make changes. So if we just reuse the logic the installer uses, users may be unpleasantly surprised to find their systems being changed.

So, for example. Pretty much the first thing that we want is facts about the configuration and the systems, which the openshift_facts role provides. This role runs various custom Ansible modules on target systems, which requires several dependencies to be present on those systems. If they aren’t there, they’re installed.

An Origin RPM install requires an Origin repo to be enabled. Unless you configure one beforehand, it is usually set up by the openshift_repos role, which is a dependency of the openshift_version role. So if you want to run the preflight checks before an install, you won’t have any Origin repo to check RPMs against unless the checks configure this repo like the installer does.

The openshift_version role itself relies on some clever things to determine the version to install. If you’re doing an RPM install, it uses the repoquery tool to determine the precise version of RPMs that are available, so it can match it with the precise version of images to run; thus yum-utils is installed to provide repoquery. If you’re doing an enterprise containerized install, it looks up the precise version of images available by running a docker image on the remote host — and on an RPM-based host, installs and configures firewalld and docker to run that.

So in thinking about this, I’ve tried to determine if there’s any way to tease out just what we need for preflight checks and put that in a shared role, without having to go through as thorough a setup as we would for an install or upgrade. Or if we can make simplifying assumptions to do only what we need. Without going through too detailed an analysis, I think the answer is basically… no. We do not want to create and maintain parallel logic in the preflight checks for the very complex ways in which the installer determines what to do.

Reflecting a bit further, letting preflight config setup alter the systems is not really a problem, practically speaking. If the user is installing a new cluster or adding hosts to an existing one, the target hosts are not in production yet, so altering them should be acceptable. If the user is upgrading, all of the necessary config and dependencies should already be in place, so hosts won’t be substantially altered. So, just depend on the same logic from the installer (and perhaps improve the user-friendliness of the output when things go wrong even before preflight checks). And very clearly document expectations.

Running an OpenShift install into containers

For testing purposes, we would like the ability to set up and tear down a whole lot of OpenShift clusters (single- or multi-node). And why do this with VMs when we have all of this container technology? A container looks a lot like a VM, right? And we have the very nifty (but little-documented) docker connection plugin for Ansible to treat a container like a host. So we ought to be able to run the installer against containers.

Of course, things are not quite that simple. And even though I’m not sure how useful this will be, I set out to just see what happens. Perhaps we could at least have a base image from an actual Ansible install of OpenShift that runs an all-in-one cluster in a container, rather than going through oc cluster up or the like. Then we would have full configuration files and separate systemd units to work with in our testing.

So first, defining the “hosts”. It took me a few iterations to get to this given the examples go in a different direction, but I can just define containers in my inventory as if they were hosts, and specify the docker connection method for them as a host variable. Here’s my inventory for an Origin install:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
deployment_type=origin
openshift_release=1.4
openshift_uninstall_images=False

[masters]
master_container ansible_connection=docker

[nodes]
master_container ansible_connection=docker
node_container ansible_connection=docker

[etcd]
master_container ansible_connection=docker

To ensure the containers exist and are running before Ansible tries to connect to them, I created a play to iterate over the inventory names and create them:

---
- name: start up containers
  hosts: localhost
  tasks:
  - name: start containers
    with_inventory_hostnames:
      - all
    docker_container:
      image: centos/systemd
      name: "{{ item }}"
      state: started
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:z

This uses the Ansible docker_container module to ensure there is a docker container for each hostname that is running the centos/systemd image (a base CentOS image that runs systemd init). Since I don’t really want to run a separate docker inside of each container once the cluster is up (remember, I want to start a lot of these, and they’ll pretty much all use the same images, so I’d really like to reuse the docker image cache), I’m mounting in the host’s docker socket so everyone will use one big happy docker daemon.

Then I just have to run the regular plays for an install (this assumes we’re in the openshift-ansible source directory):

- include: playbooks/byo/openshift-cluster/config.yml

Now of course it could not be that simple. After a few minutes of installing, I ran into an error:

TASK [openshift_clock : Start and enable ntpd/chronyd] *************************
fatal: [master_container]: FAILED! => {
 "changed": true, 
 "cmd": "timedatectl set-ntp true", 
 "delta": "0:00:00.200535", 
 "end": "2017-03-14 23:43:39.038562", 
 "failed": true, 
 "rc": 1, 
 "start": "2017-03-14 23:43:38.838027", 
 "warnings": []
}

STDERR:

Failed to create bus connection: No such file or directory

I looked around and found others who had similarly experienced this issue, and it seemed related to running dbus, but dbus is installed in the image and I couldn’t get it running. Eventually a colleague told me that you have to run the container privileged for dbus to work. Why this should be, I don’t know, but it’s easily enough done.

On to the next problem. I ran into an error from within Ansible, which was trying to index 1.4 as a string when it had been parsed as a float.

TASK [openshift_version : set_fact] **********************************************************************************
fatal: [master_container]: FAILED! => {
 "failed": true
}

MSG:

The conditional check 'openshift_release is defined and 
openshift_release[0] == 'v'' failed. The error was: 
error while evaluating conditional (openshift_release is 
defined and openshift_release[0] == 'v'): float object has no element 0

Having seen this sort of thing before, I could see this was due to how I specified openshift_release in my inventory. It looks like a number, so the inventory parser treats it as one. So I can just change it to "1.4" or v1.4 and it will be parsed as a string. I think this was only a problem when I was running Ansible from source; I didn’t see it with the released package.
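A rough illustration of the type problem in Python. As far as I understand it, Ansible’s INI inventory parser tries to read each key=value value as a Python literal and falls back to a plain string. The helper parse_inventory_value below is a hypothetical stand-in for that behavior, not Ansible’s actual code:

```python
import ast


def parse_inventory_value(raw):
    """Hypothetical stand-in: try a Python literal, else keep the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def starts_with_v(value):
    """Mimic the failing conditional: openshift_release[0] == 'v'."""
    return value[0] == "v"


for raw in ["1.4", '"1.4"', "v1.4"]:
    value = parse_inventory_value(raw)
    try:
        outcome = starts_with_v(value)
    except TypeError as err:
        # A float cannot be indexed, which is the failure in the task output.
        outcome = "TypeError: {}".format(err)
    print(raw, type(value).__name__, outcome)
```

Only the bare 1.4 comes out as a float; the quoted and v-prefixed forms both stay strings, so the conditional can index them.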

Next problem. The playbook fails because I’m using the docker connection plugin, so no ssh user is specified and thus none can be retrieved. Well, even though it’s unnecessary, just specify one in the inventory.

[OSEv3:vars]
ansible_user=root

Next problem. The installer complains that you need to have NetworkManager before running the install.

TASK [openshift_node_dnsmasq : fail] *******************************************
fatal: [master_container]: FAILED! => {
 "changed": false, 
 "failed": true
}

MSG:

Currently, NetworkManager must be installed and enabled prior to installation.

And I quickly found out that things will hang if you don’t restart dbus (possibly related to this old Fedora bug) after installing NetworkManager. Alright, just add that to my plays:

- name: set up NetworkManager
  hosts: all
  tasks:
  - name: ensure NetworkManager is installed
    package:
      name: NetworkManager
      state: present
  - name: ensure NetworkManager is enabled
    systemd:
      name: NetworkManager
      enabled: True
  - name: dbus needs a restart after this or NetworkManager and firewall-cmd choke
    systemd:
      name: dbus
      state: restarted

When I was first experimenting with this it went through just fine. On later tries, starting with fresh containers, this hung at starting NetworkManager, and I haven’t figured out why yet.

Finally it looked like everything was actually installing successfully, but then of course starting the actual node failed.

fatal: [node_container]: FAILED! => {
 "attempts": 1, 
 "changed": false, 
 "failed": true
}

MSG:

Unable to start service origin-node: Job for origin-node.service 
failed because the control process exited with error code. 
See "systemctl status origin-node.service" and "journalctl -xe" for details.

# docker exec -i --tty node_container bash
[root@9f7e04f06921 /]# journalctl --no-pager -eu origin-node 
[...]
systemd[1]: Starting Origin Node...
origin-node[8835]: F0315 19:13:21.972837 8835 start_node.go:131] 
cannot fetch "default" cluster network: Get 
https://cf42f96fd2f8:8443/oapi/v1/clusternetworks/default: 
dial tcp: lookup cf42f96fd2f8: no such host
systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
systemd[1]: Failed to start Origin Node.


Actually I got a completely different error previously related to ovs that I’m not seeing now. These could be anything as far as I know, but it may be related to the fact that I didn’t expose any ports or specify any external IP addresses for my “hosts” to talk to each other nor arrange any DNS for them to resolve each other. In any case, something to remedy another day. So far the playbook and inventory look like this:

---
- name: start up containers
  hosts: localhost
  tasks:
    - name: start containers
      with_inventory_hostnames:
        - all
      docker_container:
        image: centos/systemd
        name: "{{ item }}"
        state: started
        privileged: True
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:z

- name: set up NetworkManager
  hosts: all
  tasks:
    - name: ensure NetworkManager is installed
      package:
        name: NetworkManager
        state: present
    - name: ensure NetworkManager is enabled
      systemd:
        name: NetworkManager
        enabled: yes
        state: started
    - name: dbus needs a restart after this or NetworkManager and firewall-cmd choke
      systemd:
        name: dbus
        state: restarted

- include: openshift-cluster/config.yml

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
deployment_type=origin
openshift_release="1.4"
openshift_uninstall_images=False
ansible_user=root

[masters]
master_container ansible_connection=docker

[nodes]
master_container ansible_connection=docker
node_container ansible_connection=docker

[etcd]
master_container ansible_connection=docker

See the thing I made

I wrote an article for the Red Hat Developers Blog. I haven’t felt much like blogging this year, but there’s one thing at least. If I have articles I think would be of outside interest, I’ll probably post them over there. This blog should return to its original purpose, which was for me to blather about my frustrations and solutions in a kind of stream of consciousness.

libvirt boxen for OpenShift v3

I promise I have not been struggling with vagrant the whole time since my last post. Actually I updated the vagrant-openshift docs and made some other fixes so the whole thing is a little more sane and obvious how to use, and then went on to other stuff. Today I’m just trying to put together OpenShift v3 libvirt boxen to put up for the public next to the virtualbox ones. Should be easy; actually, it probably is. My problems today all seem to be local.

It would be nice if, just once, vagrant had a little transparency. It doesn’t have a verbose mode, and never tells you where anything is or should be.

$ vagrant box list
aws-dummy-box (aws, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

Ah, yeah… so… where are those defined? What images do they point to, and where were they downloaded from?

The errors are the worst. When something goes wrong, could you please tell me what you think you got from me, what you tried to do with that, and what went wrong? No.

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
Name `origin_openshiftdev` of domain about to create is already taken.
Please try to run `vagrant up` command again.

Just try to figure out what is specifying “origin_openshiftdev” as a domain and what to do about it. Or how to release it so I can, in fact, run vagrant up again.

$ vagrant status
Current machine states:

openshiftdev not created (libvirt)

The Libvirt domain is not created. Run `vagrant up` to create it.
$ vagrant destroy
==> openshiftdev: Domain is not created. Please run `vagrant up` first.

Part of the problem is that I have at least three semi-autonomous bits of vagrant to deal with. There’s vagrant itself, which keeps track of box definitions. There’s the Vagrantfile I’m feeding it from OpenShift Origin, which might interact with the vagrant-openshift plugin (though I don’t think so on vagrant up) but in any case defines what hosts I’m supposed to be creating. Finally, there’s the provider plugin (libvirt in this case) that has to interface with the virtualization to actually manage the hosts. If something goes wrong, I can’t even tell which part is complaining, much less why.

Enough complaining, what is going on?

The primary input to vagrant is a “box”. This is really just a tarball that contains a minimal Vagrantfile, metadata file, and the real payload, the disk image of the virtual host. The vagrant “box” is provider-specific – the metadata specifies a provider.

When you run vagrant up, the local Vagrantfile should specify which box to start with – a URL to retrieve it and the name for vagrant to import it as. The first run will download and unpack it under ~/.vagrant.d/boxes/<name>/<version>/<provider>/ (note, you can have multiple providers for the same box name/version). Subsequent runs just use that box definition. Simple enough as it goes.

vagrant up also creates a local .vagrant/ directory to keep track of “machines” (which are intended to represent actual running virtual hosts instantiated from boxes). Machines are stored under .vagrant/machines/<name>/<provider>, where the name comes from the Vagrantfile VM definition. In OpenShift’s Vagrantfile we have config.vm.define "openshiftdev", so for the libvirt provider I could expect to see a directory .vagrant/machines/openshiftdev/libvirt once I’ve brought up a machine. (Under vbox you can define a master and several minions, which would all have different names. I hope we can do that soon with the other providers too.)
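Since vagrant won’t tell me where anything is, here is a small Python sketch that just spells out the two layouts described above. The function names are mine, and the home and project paths are only examples; it computes paths and does not touch the filesystem.

```python
from pathlib import PurePosixPath


def box_dir(home, name, version, provider):
    """Where an imported box is unpacked: ~/.vagrant.d/boxes/<name>/<version>/<provider>."""
    return PurePosixPath(home) / ".vagrant.d" / "boxes" / name / version / provider


def machine_dir(project_dir, machine, provider):
    """Where a project tracks a machine: .vagrant/machines/<name>/<provider>."""
    return PurePosixPath(project_dir) / ".vagrant" / "machines" / machine / provider


# Example values: a box from the listing above, and the machine name from the Vagrantfile.
fedora_box = box_dir("/home/luke", "fedora_inst", "0", "libvirt")
dev_machine = machine_dir("/home/luke/origin", "openshiftdev", "libvirt")

print(fedora_box)
print(dev_machine)
```

The first is per-user state shared by every project; the second lives inside whichever directory holds the Vagrantfile.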

I was planning to build a libvirt box from scratch, but then I realized there is a Vagrant plugin “vagrant-mutate” that will take an existing box and change it to another provider. Since we already have boxes defined for vbox I thought I’d just try this out to make a libvirt version of it.

$ vagrant mutate \
  https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box \
  libvirt
Downloading box centos7_virtualbox_inst from https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box
Extracting box file to a temporary directory.
Converting centos7_virtualbox_inst from virtualbox to libvirt.
 (100.00/100%)
Cleaning up temporary files.
The box centos7_virtualbox_inst (libvirt) is now ready to use.

So far, so good. Or not, because what does “ready to use” mean? Where is it? Turns out, it means said box is stored under my ~/.vagrant.d/boxes directory for use with the next vagrant up. It kept the same name with the provider embedded in it, but if I just change the name…

$ mv ~/.vagrant.d/boxes/centos7_{virtualbox_,}inst
$ vagrant box list
aws-dummy-box (aws, 0)
centos7_inst (libvirt, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

… everything works out fine. So to use that with my openshift/origin Vagrantfile, I just put that name into my .vagrant-openshift.json file like so:

"libvirt": {
  "box_name": "centos7_inst"
},

Note that I don’t need to specify a box_url because the box is already local. Folks will need the box_url to access it once I publish it. So let’s vagrant up already…

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
/home/luke/.vagrant.d/gems/gems/fog-1.27.0/lib/fog/libvirt/requests/compute/list_volumes.rb:32:in `info': 
Call to virStorageVolGetInfo failed: Storage volume not found: 
no storage vol with matching path '/mnt/VMs/origin_openshiftdev.img'
(Libvirt::RetrieveError)

Ah. This is definitely due to some messing around on my part, because I deleted that image as I thought vagrant was saying earlier it was in the way (remember “Name `origin_openshiftdev` of domain about to create is already taken” ?). This error at least seems safe to pin on the libvirt provider, but I’m not sure what to do about it. Shouldn’t libvirt just clone the image from the vagrant box to create a new VM? How did my request to instantiate the “centos7_inst” box as “openshiftdev” get translated into looking for that particular file to exist?

I’m guessing (since grep got me nowhere) that the libvirt provider takes the directory I’m in and the box being requested and uses that as the VM name. Or at least, a volume name from which VMs can be cloned for Vagrant usage.

virsh to the rescue

I’m not really very knowledgeable about libvirt, mainly because I’ve been able to run VMs just fine using the graphical virt-manager interface and didn’t really need a lot more. I deleted that image above using virt-manager, figuring it would take care of referential integrity. Now that I’m venturing into the world of scripted VM management, I have been fiddling a little with virsh, so let’s apply that:

# virsh vol-list default
 Name                     Path 
------------------------------------------------------------------------------
[...]
 origin_openshiftdev.img  /mnt/VMs/origin_openshiftdev.img

Hmm, yes, libvirt does actually seem to expect that volume to be there. And then it’s failing trying to use it because the actual file isn’t there. So let’s nuke the volume record, wherever that may be.

# virsh vol-delete origin_openshiftdev.img default
Vol origin_openshiftdev.img deleted

And vagrant up --provider=libvirt suddenly works again.

Updating libvirt boxes

One extra note about using libvirt as a provider: as soon as you use vagrant to start a libvirt box you have downloaded, the vagrant-libvirt plugin makes a copy of the image from the box definition and uses that. The copy is made in libvirt’s default storage pool (unless you tell it otherwise… BTW, quite a few interesting options at the vagrant-libvirt README) and is named <box_name>_vagrant_box_image.img. So my box above translates to /mnt/VMs/centos7_inst_vagrant_box_image.img (I use a separate mount point for my VM storage because it’s just too easy to fill your root fs otherwise). Then when you actually create a VM, it uses a copy-on-write snapshot of that image, which seems to be named after the project and VM definition (my problem volume above, origin_openshiftdev.img). That way it’s a pretty fast, efficient startup from a consistent starting point.
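A tiny Python sketch of the naming as I understand it. The box image name follows the pattern above; the per-VM snapshot name is only my inference from origin_openshiftdev.img, so treat machine_volume as a guess rather than the plugin’s documented behavior. Both function names are made up.

```python
def box_image_volume(box_name):
    """Volume holding the copied box image: <box_name>_vagrant_box_image.img."""
    return "{}_vagrant_box_image.img".format(box_name)


def machine_volume(project_dir_name, machine_name):
    """Guessed name of the copy-on-write volume: <project>_<machine>.img."""
    return "{}_{}.img".format(project_dir_name, machine_name)


print(box_image_volume("centos7_inst"))
print(machine_volume("origin", "openshiftdev"))
```

These are the names to look for in virsh vol-list default when something goes missing.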

Of course this could be a bit confusing if you actually want to update your vagrant box. You might download a new box definition from vagrant’s perspective, but vagrant-libvirt sees it already has a volume with the right name and keeps using that (in fact, once it has copied the volume, you may as well truncate the box.img under ~/.vagrant.d to save space). You have to nuke the libvirt volume to get it to use the updated box definition. virt-manager seems to do just as well as virsh vol-delete at this (not sure what happened before in my case). So e.g.

# virsh vol-delete centos7_inst_vagrant_box_image.img default

Then the next vagrant up with that box will use the updated box definition.

vagrant setup

I may be an idiot, but I’ve simply never used vagrant successfully before.

“Just vagrant up and you’re ready to go!” say all the instructions. Yeah, that probably works fine with the default VirtualBox, which is available for all major desktop platforms. But I don’t want to use any more proprietary Oracle crap than I absolutely have to. I don’t even want to run VMs on my local host (all my RAM is already taken up by my browser tab habit), but if I did it would be on KVM/QEMU that’s native to Fedora. But I have access to AWS and OpenStack, so why would I even do that?

Well, if you want to use something other than the default, you have to add provider plugins. Alright, sounds easy enough.

$ vagrant plugin install vagrant-openstack-provider
Installing the 'vagrant-openstack-provider' plugin. This can take a few minutes...
Installed the plugin 'vagrant-openstack-provider (0.6.0)'!
$ vagrant plugin install vagrant-aws
Installing the 'vagrant-aws' plugin. This can take a few minutes...
Installed the plugin 'vagrant-aws (0.6.0)'!

Oh yeah, easy-peasy man! OK, let’s fire up OpenShift v3:

$ git clone https://github.com/openshift/origin
$ cd origin
$ vagrant up --provider=aws
There are errors in the configuration of this machine. Please fix
the following errors and try again:
SSH:
* `private_key_path` file must exist: PATH TO AWS KEYPAIR PRIVATE KEY

Hm, OK, must be something I need to provide. Look through the Vagrantfile, looks like it’s expecting an entry for AWSPrivateKeyPath in my .awscred file. I have a private key file, I can do that. Try again…

$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
/home/luke/.vagrant.d/gems/gems/fog-aws-0.0.6/lib/fog/aws/region_methods.rb:6:in `validate_aws_region': Unknown region: "<AMI_REGION>" (ArgumentError)

Erm… right, more stuff to fill in. I don’t particularly want to edit the Vagrantfile, and not sure which AMI_REGION I should use. Surely someone on my team has specified all this somewhere? A search brings me to https://github.com/openshift/vagrant-openshift which looks like it ought to at least create me a config file that Vagrant will read. Sounds good, let’s go:

$ cd vagrant-openshift
$ bundle
Fetching git://github.com/mitchellh/vagrant.git
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
[.......]
Using vagrant-openshift 1.0.12 from source at .
Your bundle is complete!
Use `bundle show [gemname]` to see where a bundled gem is installed.

$ rake vagrant:install
vagrant-openshift 1.0.12 built to pkg/vagrant-openshift-1.0.12.gem.
The plugin 'vagrant-openshift' is not installed. Please install it first.
Installing the 'pkg/vagrant-openshift-1.0.12.gem' plugin. This can take a few minutes...
Installed the plugin 'vagrant-openshift (1.0.12)'!

$ cd ~/go/
[luke:/home/luke/go] $ vagrant openshift3-local-checkout -u sosiouxme
/home/luke/.rvm/rubies/ruby-1.9.3-p545/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb:298:in `to_specs': Could not find 'vagrant' (>= 0) among 218 total gem(s) (Gem::LoadError)
[...stack trace]

Whaaaaat? I’ve entirely broken vagrant now, and I have no idea how. Vagrant seems just a little more… fragile?… than I was expecting. Fine, let’s move to ruby 2.0 and define a gemset just for vagrant, such that if I hose things up again, it’s contained. (I tried ruby 2.1 first but had an error getting rubygems from rubygems.org… well, that’s not vagrant’s fault.) Wait, I can’t do that, recent vagrant versions are no longer published as a rubygem; I’m supposed to get it from my OS. I have it installed as the vagrant-0.6.5 RPM. If I try to add plugins under rvm, it complains that the vagrant *gem* isn’t installed. Which of course it isn’t… if you do install it, it just tells you not to do that.

OK, so let’s just go with the system ruby that apparently the RPM is expecting.

$ rvm use system
Now using system ruby.
$ vagrant plugin install vagrant-aws
Installing the 'vagrant-aws' plugin. This can take a few minutes...
Installed the plugin 'vagrant-aws (0.6.0)'!
$ bundle
[...]
Using vagrant 1.7.2 from git://github.com/mitchellh/vagrant.git (at master)
 [ should I be worried the version doesn't match?]
Installing vagrant-aws 0.6.0
[...]
$ vagrant openshift3-local-checkout -u sosiouxme
Waiting for the cloning process to finish
Cloning origin ...
Cloning git@github.com:sosiouxme/origin
Cloning source-to-image ...
Cloning git@github.com:sosiouxme/source-to-image
Cloning wildfly-8-centos ...
Cloning git@github.com:sosiouxme/wildfly-8-centos
Cloning ruby-20-centos ...
Cloning git@github.com:sosiouxme/ruby-20-centos
ERROR: Repository not found.
 [yeah, I haven't cloned all the repos... do I need to???]
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo wildfly-8-centos not found. Cloning read-only copy from upstream
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo source-to-image not found. Cloning read-only copy from upstream
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo ruby-20-centos not found. Cloning read-only copy from upstream
remote: Counting objects: 1, done.
remote: Total 1 (delta 0), reused 1 (delta 0)
Unpacking objects: 100% (1/1), done.
From https://github.com/openshift/origin
 * [new branch] master -> upstream/master
 * [new tag] v0.2 -> v0.2
OpenShift repositories cloned into /home/luke/go/src/github.com/openshift

$ cd src/github.com/openshift/origin/
$ vagrant origin-init --stage inst --os fedora lmeyer-osv3dev
Reading AWS credentials from /home/luke/.awscred
Searching for latest base AMI
Found: ami-0221586a (devenv-fedora_559)
$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
[...]
/home/luke/.vagrant.d/gems/gems/excon-0.43.0/lib/excon/middlewares/expects.rb:6:in `response_call': The key pair 'AWS KEYPAIR NAME' does not exist (Fog::Compute::AWS::NotFound)

OK now what? I really don’t want to have to edit Vagrantfile and deal with keeping that out of git. I hadn’t quite torn my hair out when a coworker pointed out https://github.com/openshift/vagrant-openshift#aws-credentials which was considerably further down than I was looking. OK, so I just needed to add AWSKeyPairName to my .awscred, and…

$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
[...]
==> openshiftdev: Machine is booted and ready for use!
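Both placeholder errors along the way came from entries missing in my credentials file, so a check up front would have saved a couple of rounds. A minimal Python sketch, assuming the file is simple key=value lines (I haven’t verified the exact format here) and checking only the two keys that bit me; parse_creds and missing_keys are hypothetical helpers, not part of vagrant-openshift:

```python
REQUIRED_KEYS = ["AWSKeyPairName", "AWSPrivateKeyPath"]


def parse_creds(text):
    """Parse key=value lines (assumed format), skipping blanks and comments."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    return creds


def missing_keys(creds, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not creds.get(key)]


# Invented sample content: the key path is set but the key pair name is not.
sample = "AWSPrivateKeyPath=/path/to/key.pem\n"
still_missing = missing_keys(parse_creds(sample))
print(still_missing)
```

To use it for real, read the file’s contents and pass them to parse_creds instead of the sample string.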

Finally! And “vagrant ssh” works too! There’s not really much running yet, but I’ll figure that out later. Now what if I want to tear down that box and do something different? Let’s see…

$ vagrant -h
/home/luke/.vagrant.d/gems/gems/vagrant-share-1.1.3/lib/vagrant-share/activate.rb:8:in `rescue in <encoded>': vagrant-share can't be installed without vagrant login (RuntimeError)

Really? Ah, seems I ran into a bug and just need to upgrade vagrant by downloading it from vagrantup.com rather than Fedora. Just like I apparently did long ago and forgot about. Fun.

What would probably have been obvious to anyone who knew vagrant is that adding vagrant-openshift actually adds commands to what vagrant can handle, such as “vagrant openshift3-local-checkout” above. There are a bunch more on the help menu.

So, back to running stuff. This looks promising:

$ vagrant install-openshift3
Running ssh/sudo command 'yum install -y augeas' with timeout 600. Attempt #0
Package augeas-1.2.0-2.fc20.x86_64 already installed and latest version
[...]
$ vagrant test-openshift3
***************************************************
Running hack/test-go.sh...
[...]

It’s not obvious (to me) how to actually run openshift via vagrant. I’m guessing you just vagrant ssh in and run it from /data where everything is compiled. I was kind of hoping for more magic (like, here’s a vagrant command that sets up three clustered etcd servers and five nodes, and you just ssh in and “osc get foo” works). Also I need to try out the libvirt and openstack providers. But that’s all I have time for today…

Yep, easy-peasy!

OpenShift 3 from zero

I’ve been working on OpenShift v2 for a long time, supporting our existing customers in various ways, but it’s only fairly recently I’ve been able to take a little time to try out OpenShift v3, which as I previously noted, is a complete departure from v2. Mostly the same people working on it, informed by all of the lessons of v2 – but with totally different technology. And that’s great, because v2 spent an awful lot of time on container and orchestration technology that we’ll get with Docker and Kubernetes “for free” (there’s a price in having to collaborate to achieve our own requirements in projects also developed by others, but participation in a healthy community project should eventually bring about a large return on investment).

With totally different technology in play, trying out v3 is totally different from trying out v2. Under v2, you needed to install and configure a ton of RPMs, some built from the OpenShift Origin source (which in itself could be challenging – extensive BuildRequires – or you could get prebuilt RPMs from yum repos, but they wouldn’t be updated that often) as well as from various other sources (various dependencies like Jenkins, MongoDB, etc.) and the OS. Under v3, the hope is that components will be minimized and come from standard sources, preferably as part of the OS, with OpenShift a fairly self-contained add-on. Certainly, what is available now is not as complex as it will be once we’re talking about HA orchestration, routing, and runtime components, but the reduction in complexity specific to OpenShift is already palpable (mostly by being separated out into the Docker, etcd, and Kubernetes components that Red Hat is leveraging as a community participant rather than project owner).

As a rather fast-moving project, unhampered as yet by any semblance of production usage and the need for stability, v3 can still present a few challenges to approach. Any guide to setting it up will inevitably be obsolete quickly as changes introduce inconsistencies from any snapshot in time. And so, expect that things will be renamed (it’s kubecfg… wait, kubectl… wait, openshift cli!), that capabilities will evolve (surely we can figure out how to interact with SELinux enforcing and firewalld), and that you may need to dig around to figure out what the new reality is even when referencing relatively recent guides. (Sidebar: in this day and age, it’s hard to believe there are still blog posts about evolving technology without timestamps. Seriously? If I can’t tell what time period you’re discussing, your post may as well be misinformation.)

Getting to the starting line

This blog post is a case in point. What does it take to get going with OpenShift v3? Well, that depends on what you mean. Do you just want a running instance to poke at, or do you actually want to start building from source so you can modify it as needed?

Let’s start with running it. v3 is based on Docker as the container technology. (Docker isn’t just about running containers, though – it’s a whole infrastructure around building and distributing container images.) You need to have Docker, and Docker is based on Linux. If you’re not running Linux, you can’t run Docker directly, but you can run a fairly minimal virtual machine running Linux for the purpose, and in general I would recommend that even if you do run Linux on your desktop – best if you can set up a test system without disturbing anything else. Any way you can get your hands on a VM will do – whether running locally or in some IaaS cloud you have access to. The v3 OpenShift Origin project comes with a Vagrantfile if the Vagrant approach to provisioning a VM appeals to you, but it’s up to you. I’m not going to walk through that – it will totally depend on what you have and what you know.

But – which operating system to use, and what version/flavor? While Docker is available on most recent Linux distributions, the OpenShift layer on top of it will only be tested and developed regularly on a few Red Hat-related operating systems, so in the interests of minimizing potential problems, I’d recommend one of those:

Fedora 21

It’s free. It’s fairly cutting edge. It has a huge feature set. This is a pretty good base for testing and development – the only problem I could see with using it is that, being fairly fast-moving, it is more likely to have bugs. That and, I suppose, if you’re developing, you have to beware of using language/OS/library features that aren’t available elsewhere yet. Fedora 21 also ships Kubernetes (separately) and golang if that is relevant. You’ll want the “Server” flavor, not the “Cloud” one (yet – see below).

RHEL 7 / CentOS 7

RHEL 7 is Red Hat’s eventual target for running v3 in a supportable fashion. It’s an Enterprise OS, meaning it doesn’t change quickly and you can expect features to be stable across its (long!) lifetime, so it will tend to trail Fedora significantly. It’s not free of price, but the CentOS clone of it is free to use and updates are available from open yum repositories; it follows updates to RHEL pretty quickly (hopefully more so now that CentOS has Red Hat backing). Since most people can’t afford to blow an Enterprise subscription just to fool around with new technology, I’d recommend CentOS 7.

It’s important to note that Docker is included in the “Extras” channel of RHEL 7, which brings a different level of support. Extras are supported in the sense that Red Hat will fix bugs, but not in the sense that updates are backward compatible as with the rest of the OS. Since Docker is still under rapid development, this is the right place for it – Red Hat does not want to get stuck supporting essentially an early beta for ten years. (Current docker RPM is version 1.3.2.) Expect that version to get updates as needed to incorporate required features for OpenShift and other projects, and maybe for it to migrate to the non-Extras channel at some point.

To get golang or gcc-go, you need to add the “Optional” channel, which isn’t supported at all. For testing, the distinction isn’t important, but just be aware that these aren’t a supported part of the OS. Kubernetes isn’t distributed in any of these channels yet (though Fedora 21 does have an RPM for it). It’s not clear to me whether RHEL Extras will ship Kubernetes before OpenShift v3 goes GA, or if we’ll just compile in a fork of Kubernetes as we currently do. Same for etcd.

RHEL 7 Atomic / CentOS 7 Core / Fedora Cloud?

Project Atomic servers exist essentially just to host Docker containers. They won’t even let you install packages, instead managing whole-system (atomic) updates via ostree. So, it’s unsuitable for any kind of development, really (you could run a container that provides development tools and libraries, but that seems rather awkward and counter to the point). It’s currently in beta and it seems unlikely to be ready to support OpenShift v3 at GA, but it’s definitely a target for deployment some time later. There’s no actual need for a Docker host to enable traditional package management, since you can just supply any software you want in containers, and OpenShift is no different – it’s a goal to be able to deliver the whole thing via containers. Interestingly enough, the beta Atomic install includes builds of Kubernetes and etcd, in somewhat older versions. It will be interesting to see where this goes, but I don’t see a lot of point to using it as a vehicle for trying OpenShift v3 just now.

Getting Docker ready

Once you have a VM running a Docker-capable Linux as above, you of course need to install Docker and run the service.

# yum install docker
# setenforce 0
# systemctl enable docker
# systemctl start docker

I don’t know if the “setenforce 0” is still necessary today – certainly the end goal is to have everything running under the protection of SELinux. It’s also worth noting that if you are using firewalld, you should disable it or add docker to the trusted zone in order for networking to work out.

Docker in general requires root access to run, but you can also enable other users to access it by adding them to the docker group:

# usermod -aG docker <user>

(The user must log out and in to gain the new privileges; and be aware this just provides Docker privileges – you’ll still need sudo/root in order to perform other system tasks.)

Docker is set up. Now we run OpenShift v3 in one of three ways. Currently, a single binary runs all necessary services as well as providing a client to access them (all assuming running on the local host – of course things are more complicated without that assumption). It’s just a question of obtaining it and executing it.

Just run it (as a Docker container)

Using Docker, starting up an openshift instance is super easy:

$ docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --net=host --privileged \
  openshift/origin start

What’s going on here?

Well, hopefully it’s evident that you’re running a Docker container. The first time you run this, Docker will pull down the “openshift/origin” container image from the Docker Hub, which you can think of as GitHub for Docker images. This is an image that OpenShift engineers build from source periodically and upload to the Docker Hub. Presumably when it’s time for v3 to go GA, Red Hat will set up a separate authenticated Docker registry to distribute the v3 container images (at least that seems like a likely distribution mechanism – we will see) and you’ll just docker run index.docker.redhat.com/openshift or something like that instead.

Of course, OpenShift isn’t just some container running a workload – it’s actually intended to do orchestration of other containers. So it needs to run as a privileged container, meaning it actually has the ability to “break out” of the container to manage the host system. In particular, it needs a view of the host’s network and docker server, which is what the other options on the command line are about. (The “-v” option mounts things from the host filesystem into the container filesystem.) I should have mentioned that this is going to bind to a number of ports on your host — 4001, 7001, 8080… which of course will fail if there’s already anything listening there, and will be exposed to the external network if it succeeds.

The final word on the command line (“start”) is an argument to the container entry point, which is /usr/bin/openshift (just another binary sitting in a container image). If it makes you a little nervous to pull an image from the internet and run it as a privileged container, well… it should. (So build it yourself! More later.)
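Since a port that is already taken makes the start fail, it can be worth checking before launching. Here is a minimal sketch in Python that tries to bind each of the ports named above; the function names and the default port tuple are my own choices for illustration, and the real list of ports is longer than these three.

```python
import socket

# Only the ports explicitly named in the text; the real list is longer.
OPENSHIFT_PORTS = (4001, 7001, 8080)


def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" (or no permission to bind).
            return False
    return True


def busy_ports(ports=OPENSHIFT_PORTS, host="0.0.0.0"):
    """List the ports from `ports` that could not be bound."""
    return [p for p in ports if not port_is_free(p, host)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("already in use:", ", ".join(str(p) for p in taken))
    else:
        print("all clear")
```

This only tells you about the moment you ran it, of course; something else could still grab a port before the container starts.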

Since the docker run wasn’t daemonized, you’ll just see the output from pulling down the container and running it. OpenShift starts up a single binary with REST APIs available for OpenShift, Kubernetes, etcd, a Kubelet, and miscellaneous other stuff. It will run until you hit Ctrl-C, at which point the container exits. Alternatively, run docker with the “-d” option and use “docker logs” to watch the logs:

$ docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --net=host --privileged \
  openshift/origin start 
9bd1133f5e0b79e48e7dfca8a23cde06274441442e673b41e85a0b2158c1de9f
$ docker logs -f 9bd1133f5e0b79e48e7dfca8a23cde06274441442e673b41e85a0b2158c1de9f
I1229 21:31:47.229648 1 start.go:174] Starting an OpenShift all-in-one, reachable at http://172.16.4.182:8080 (etcd: http://172.16.4.182:4001)
I1229 21:31:47.229886 1 start.go:184] Node: localhost
I1229 21:31:47.229943 1 etcd.go:29] Started etcd at 172.16.4.182:4001
[...]

Bam! Just by running this privileged container, you’re ready to run through Ben Parees’s three in-depth blog posts. Well, sort of. The “openshift” binary in this image implements both client and server runtimes. In order to run “openshift” client commands you can execute another container (from the same image):

$ docker run --net=host openshift/origin cli get pods
POD CONTAINER(S) IMAGE(S) HOST LABELS STATUS

(You need --net=host so it can reach the ports on the host where the other container is listening.) Kinda clunky. Probably better to just get a shell in a container:

$ docker run -it --net=host --entrypoint=/bin/bash openshift/origin 
[root@localhost openshift]# openshift cli get pods
POD CONTAINER(S) IMAGE(S) HOST LABELS STATUS

(“-it” gets you an interactive tty, and “--entrypoint” runs a shell instead of the openshift executable.)

And then if you actually do that, you find that openshift has moved on since October: “openshift kube list pods” is now “openshift cli get pods”, and the JSON deployment defined in that first blog post no longer matches the API. Ah, the fun never ends!

If the privileged Docker container running the service is stopped for any reason (Ctrl-C, docker stop, reboot…) then you can simply look it up and start it again. (Some fields omitted for brevity)

$ docker ps -a
CONTAINER ID IMAGE COMMAND STATUS
145d0692fbb1 openshift/origin:latest "/bin/bash" Up 37 hours
272f6bf4c15c openshift/origin:latest "/usr/bin/openshift Exited (2) 37 hours ago
$ docker start 272f6bf4c15c
272f6bf4c15c

If instead you start a new container with “docker run”, it will not have any of the data generated during interactions with the previous container (unless you go to pains to have them share a volume mounted at /var/lib/openshift).

Just download it

Well, if all those Docker tricks look shady to you, you can always just work with a good old-fashioned binary. Check for the latest release on github. Download it, unpack, and run it:

# wget https://github.com/openshift/origin/releases/download/v0.2/openshift-origin-v0.2-20-gfe983146fbac7f-fe98314-linux-amd64.tar.gz
# tar zfx openshift-origin-v0.2-20-gfe983146fbac7f-fe98314-linux-amd64.tar.gz
# ./openshift start &
[1] 21828
[root@localhost bin]# I1231 20:25:16.354385 21828 start.go:174] Starting an OpenShift all-in-one, reachable at http://172.16.4.182:8080 (etcd: http://172.16.4.182:4001)
[...]
# ./openshift cli get pods 
NAME IMAGE(S) HOST LABELS STATUS

This is the same thing as you got from the container, just running outside a container. It binds to the same ports and provides the same services. Currently, it stores data in subdirectories of the pwd, instead of inside the container. Pretty simple? True. But it’s not much harder to generate it yourself from source.

Just compile it

Unlike OpenShift Origin v2, v3 is pretty darn simple to compile yourself from the source. It helps that we’re not trying to build a bunch of RPMs out of it.

It’s a little confusing that “get started developing v3” instructions are currently spread (somewhat duplicated and out of sync) across the Origin project README, CONTRIBUTING, and HACKING documents. I kind of expect the latter two will merge at some point and the README will simply point to the result for those trying to work on the source code. Let’s also note that there is a docs directory for describing how things work or will work (or once worked until they changed direction). Engineers aren’t known for great documentation but we’re trying to be helpful/transparent here, and I believe the plan is for actual documentation writers to contribute to this substantially as the project matures.

The build will likely get more complicated as we get closer to a finished product, but should still remain a lot simpler than v2. There could perhaps be multiple binaries each housing a different component, or possibly we’ll continue with a single binary housing all (simply varying the invocation to provide whatever is necessary for a specific host). For now, it’s a piece of cake: compile one binary (“openshift”) from one repository.

You just need golang and git on the VM you’re working with. As mentioned previously, to do this on RHEL 7 you’ll need the “Extras” and “Optional” channels (Fedora does not have this separation):

# subscription-manager repos --enable rhel-7-server-extras-rpms \
  --enable rhel-7-server-optional-rpms

Then just install golang (and maybe some attendant stuff) and git:

# yum install -y golang git golang-vim make

Sidebar: You may wonder about using gcc-go as an alternate toolchain. Without going into great depth, some initial attempts to use gcc-go to compile Docker have had promising results. The two toolchains will tend to vary a bit but maintain “eventual parity” over time. So I’m pretty hopeful we’ll start seeing Red Hat distribute go projects compiled with the gcc-go toolchain, which brings the benefit of distributing on more architectures than just x86_64. But for now… we’ll assume golang.

So, once you’ve installed golang, you need to create a Go work directory and set up environment variables to use it (I’m assuming you’ll do development as a non-root user, although it works as well either way):

$ mkdir $HOME/go

In your ~/.bash_profile file, set a GOPATH and augment your PATH by adding to the end:

export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

These set up the location that go will use by default for various actions, and add the go/bin subdir to your path. You’ll need a new shell to get the updated variables, or you can just run the two “export” commands above at the command line. Now you’re ready to clone the github repo, compile it, and use the “openshift” binary:

$ go get github.com/openshift/origin 
$ cd $GOPATH/src/github.com/openshift/origin
$ hack/build-go.sh
++ Building go targets for linux/amd64: cmd/openshift
++ Placing binaries
$ sudo _output/local/go/bin/openshift start
[... the usual startup output ...]
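If the directory layout seems arbitrary, it isn’t: with a GOPATH set, go get puts the source under src/ at the same path as the import path, which is why the cd above lands where it does. A small Python sketch of that mapping (the function name is made up for illustration):

```python
import os


def gopath_layout(gopath, import_path):
    """Where a GOPATH-style workspace keeps things for an import path."""
    return {
        # Source is checked out under src/, mirroring the import path.
        "src": os.path.join(gopath, "src", *import_path.split("/")),
        # Installed binaries go here; this is the dir added to PATH above.
        "bin": os.path.join(gopath, "bin"),
    }


if __name__ == "__main__":
    layout = gopath_layout(os.path.expanduser("~/go"),
                           "github.com/openshift/origin")
    for name, path in sorted(layout.items()):
        print(name, "->", path)
```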

Let me just mention that if the godeps update between builds, you’ll need to clean out your deps first. You can do this with make clean in the top of the repo (assuming make is installed). All it does is rm -rf _output Godeps/_workspace/pkg so you could just do that manually. Also, plain make runs the build script above.

If you want to make execution a little easier, create the ~/go/bin directory and put a couple symlinks in it:

$ mkdir ~/go/bin
$ ln -s `pwd`/_output/local/go/bin/openshift ~/go/bin/openshift 
$ ln -s `pwd`/_output/local/go/bin/openshift ~/go/bin/osc

openshift is of course our usual command, but what’s osc? When you symlink the binary with this name, it is treated as a shortcut for openshift cli, basically the v3 analog to v2 rhc:

$ osc get pods
POD CONTAINER(S) IMAGE(S) HOST LABELS STATUS
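The trick of one binary behaving differently depending on the name it was invoked as is an old one. I haven’t checked how Origin implements it, so treat this Python sketch as an illustration of the pattern only; the function name is hypothetical.

```python
import os
import sys


def effective_args(argv):
    """Map an invocation to the equivalent `openshift ...` argument list.

    If the program was started through a link named "osc", behave as if
    the user had typed "openshift cli" followed by the same arguments.
    """
    name = os.path.basename(argv[0])
    if name == "osc":
        return ["openshift", "cli"] + list(argv[1:])
    return ["openshift"] + list(argv[1:])


if __name__ == "__main__":
    print(" ".join(effective_args(sys.argv)))
```

The symlink costs nothing on disk and there is only ever one binary to update.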

What’s next?

So now you know what operating system to run in a test VM, and the available mechanisms (Docker, download, or build) for obtaining OpenShift v3. Hopefully that gets you to the starting line. What can you actually do with it? I’ll be exploring that further myself, but for now I’ll leave you with the CONTRIBUTING and HACKING documentation (to explain building, testing, and the road ahead) as well as Ben’s great blog posts on usage:

Exploring 2 – journal

I have been reading through Lennart Poettering’s ever-expanding series, systemd for Administrators, up to the seventeenth installment, without much to say here. Good stuff.

Number 17 is about the journal, which is basically a replacement for syslog. This answers my earlier question of how systemd displayed the log lines from httpd… the journal is hooked up by systemd to capture syslog and kernel log entries as well as stdout/stderr for any processes it manages. What I saw in the httpd status output there would be the stdout from starting httpd… the journal isn’t following the actual log files created by httpd (you’d need to configure httpd to log messages to syslog or journal).

The journal is really cool, though. It natively solves a lot of annoying things about system logs, mainly by attaching a ton of metadata to log entries, including automatic and unfakeable items like cgroup, pid, and executable. It then indexes entries by that metadata and presents a nice filtering interface with the journalctl client (incidentally allowing users to access their own log entries). If we had this in OpenShift 2, we wouldn’t have needed a plugin for rsyslog7 to add these kinds of attributes to gear syslogs, and gears would not have needed to store their own logs at all since they could just access their own journal entries from the host with journalctl (although… I would need to check how access is controlled; if it’s by UID and not SELinux context then we would need to do something special because UIDs can be reused under OpenShift). I bet we’ll use this for v3.
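To make the filtering concrete: journalctl takes FIELD=value matches on the command line, and the trusted fields the journal adds itself start with an underscore (_PID, _UID, _COMM, _EXE, _SYSTEMD_CGROUP, and so on). Here is a tiny Python sketch that builds such a command line; the helper name is mine, and it only constructs the argument list rather than running anything.

```python
def journalctl_command(**matches):
    """Build a journalctl argument list from FIELD=value matches.

    Matches on different fields are combined with AND by journalctl.
    """
    argv = ["journalctl"]
    for field, value in sorted(matches.items()):
        argv.append("{}={}".format(field, value))
    return argv


if __name__ == "__main__":
    # Everything logged by processes named httpd running as UID 1000.
    print(" ".join(journalctl_command(_COMM="httpd", _UID=1000)))
```

You could hand the resulting list to subprocess on a systemd host, but typing it at the shell is just as easy.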

One thing to note under RHEL 7… the default install doesn’t enable the persistent journal – all you have is whatever is stored in /run/log/journal since the last boot. However, it’s easy to enable the persistent journal by just creating /var/log/journal. At this point you can nuke rsyslog and just let the journal capture everything. Also, bash tab completion doesn’t seem to be set up for journalctl attributes as indicated in the blog (there’s probably a simple way to enable that too).
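The rule for where the journal lands is simple enough to write down. This Python sketch assumes journald is left at its default Storage=auto setting, where the mere existence of /var/log/journal switches it to persistent storage; the function name and the root parameter (there so it can be pointed at a scratch directory) are my own.

```python
import os


def journal_location(root="/"):
    """Return (directory, persistent?) for the journal under `root`.

    Assumes the default Storage=auto: use /var/log/journal if that
    directory exists, otherwise fall back to /run/log/journal, which
    is lost at reboot.
    """
    persistent = os.path.join(root, "var", "log", "journal")
    volatile = os.path.join(root, "run", "log", "journal")
    if os.path.isdir(persistent):
        return persistent, True
    return volatile, False


if __name__ == "__main__":
    path, keeps = journal_location()
    print(path, "(persistent)" if keeps else "(gone after reboot)")
```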

Back to the future (of OpenShift)

I started this blog originally for just sort of writing down random stuff I tried or discovered. It morphed over time into very rare posts along the lines of “I just spent a week figuring this out, let me write it down to save everyone else the trouble”.

Well, OpenShift Enterprise 2.2 is out the door, and that will be in maintenance mode while we work on version 3. Just when I felt like I knew something about v2, it’s time to return to being a dunce because the world has been upended for v3. So maybe it’s time to return to stumbling around and writing down what happens.

Everything old is new ag…. no, wait:

Everything new is really, really new

Approximately nothing from OpenShift v2 will survive recognizably in v3. It will be as different as systemd is from sysv, as different as Linux is from Windows, as different as solar energy is from the Hoover dam. Here’s what’s on my hit list to get up to speed on (let me know what I missed):

RHEL 7 / Atomic

OSE 2 runs on RHEL 6. About the time Fedora 20 and Ruby on Rails 4 came out, it became evident that trying to make it span RHEL 6 and newer platforms was going to be way more trouble than we wanted. We gave up on that and left Origin users to run on CentOS 6 rather than try to keep including Fedora.

This brings some good things, though. Managing dependencies for OSE 2 on RHEL 6 has been a bit of a nightmare. All signs point to that going away completely for v3. As in, you might not even need yum at all. If the eventual platform we recommend is Atomic, platform updates will be whole-system updates run via rpm-ostree (AKA atomic). If so, then I’ll need to know about that distribution mechanism. If not, it still looks like there will be a lot less to install and configure on the actual OS.

So:

  • rpm-ostree / atomic
  • systemd – have to understand more than just “systemctl enable foo; systemctl start foo” – how to define services, how daemons are spun off and monitored, where logs go…
  • firewalld – is this just a frontend to iptables?
  • btrfs?

go

go is the new hotness. Ruby on Rails is old and broken. OK, not old and broken, but docker, kubernetes, etcd, and the OpenShift layer on top are all go-based. Fortunately I used C all through college… picking up go doesn’t look difficult, should be fun.

golang vs gcc-go – the former is what most are using, the latter gets us more supported platforms if it works with the codebase.

Docker

Docker will be replacing our homegrown containers. It’s a formalization of a lot of the same concepts – creating and containing processes with regards to network, file access, resource usage, etc. Some questions for me to get through:

  • How do I get files into/out of a container? Bind mounts, other kinds of mounts, …? What happens when it goes away?
  • What exactly happens with exposing container networking?
  • How does SELinux contain a Docker container?
  • How do cgroups contain a Docker container in RAM/CPU/etc?
  • How do I control what user runs the processes in a container?
  • How does UnionFS compose multiple containers?
  • How do I configure where images come from?
  • How do I figure out what went wrong after one exits and goes away?

… and a million other things.

Kubernetes

Kubernetes is one orchestration layer on top of Docker. It will handle things like ensuring you have the expected number of copies of an image running across the various hosts on the cluster, and providing a proxy (aka “service”) for reaching them at a stable location.
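The “expected number of copies” part is basically a loop that compares what should be running with what is. This is not Kubernetes code, just my mental model of one pass of that loop as a Python sketch with made-up names:

```python
def reconcile(desired, running):
    """One pass of a replica-count check.

    desired: how many copies should exist
    running: list of ids of the copies currently alive
    Returns (number_to_start, ids_to_stop).
    """
    missing = desired - len(running)
    if missing > 0:
        return missing, []
    # At or above the target: stop any surplus (here, the last ones listed).
    return 0, list(running[desired:])


if __name__ == "__main__":
    print(reconcile(3, ["pod-a"]))                    # too few copies
    print(reconcile(1, ["pod-a", "pod-b", "pod-c"]))  # too many copies
```

Run something like that repeatedly against the real state of the cluster and the count converges on what was asked for, whatever the reason it drifted.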

Kubernetes introduces the concept of “pods” which are essentially just related containers running together on a host and sharing resources. As far as OpenShift is concerned, pods will likely only ever have a single container and thus be synonymous, but the terminology is there nonetheless. Do not confuse “pods” with “apps” (which are also composed of containers, but potentially spread across multiple hosts).

Things to learn:

  • Kubernetes masters present a REST API, so need to know that a bit.
  • How are multiple kubernetes masters synchronized? Just via entries in etcd, or more directly?
  • How do kubernetes masters communicate with minions (kubelets)?
  • How do services/replication masters determine whether a container/pod is working or not?

etcd

Distributed key-value store. I’m not sure why we needed another one, but it seems that it’s going to be the store for lots of critical stuff. Which critical stuff? Good question, probably not *all* of it… What else might we use for a data store?

Aside from the general capabilities of etcd, I need to learn how to cluster and shard it, and how the Raft consensus algorithm works (or when it doesn’t work).
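One piece of the “when it doesn’t work” question is plain arithmetic: a majority-based consensus cluster can only make progress while more than half of its members can talk to each other. A quick Python sketch of that sum (function names are mine):

```python
def quorum(members):
    """Smallest majority of a cluster with `members` nodes."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print("{} members: quorum {}, tolerates {} down".format(
            n, quorum(n), tolerated_failures(n)))
```

Which is why going from three members to four buys nothing in fault tolerance, and odd cluster sizes are the usual choice.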

OpenShift v3

Of course, this is going to add a further layer on top of Kubernetes, a layer to define apps and user access to them. A lot of it is still in pretty early stages, e.g. there’s not really any concept of users or access controls yet. That’ll change.

  • REST API (parallel but separate from Kubernetes)
  • Building container layers from source code
  • Deployment strategies
  • How does OpenShift influence the placement algorithm with parallels for the scaled/HA apps, zones, and regions of v2?
  • What does the routing layer look like? (We aren’t simply going to expose Kubernetes services) Good gracious, the networking looks to be complicated for this.
  • How will we define and mount storage for use by containers / apps?

Angular.js web console

Having a web application server is so last year (or maybe decade?). The data is all available from REST APIs… now your web app can just be static pages with a ton of JavaScript doing all the work on the client side. This replaces the OpenShift v2 web console app. At least it’s one less service to keep running, and you won’t need to hit “reload” all the time to watch things changing.

Is anything staying the same?

Technology-wise, nothing is staying the same. Get ready for that (I’ve marveled that the rest of the team could pivot so quickly). But we’ve spent a few years now building a PaaS, and of course there are certain patterns that are going to pop up no matter what technology we use. Despite all the technology changes, those same issues are probably what we’ll be beating our heads against, and where hopefully our previous experience will help OpenShift prevail.

Infrastructure, nodes, and routing

OpenShift will probably constitute the infrastructure only – the apps will actually run on hosts that run Linux, Docker, and kubelets. But the general pattern will remain – an orchestration interface, a cluster of compute nodes, and a routing layer to reach them.

Composing apps

Apps will still be put together from several components – potentially several containers (I don’t think we’ll call them gears), each potentially composed of some kind of framework plus some of your code. Defining and wiring these together will be the core of what OpenShift continues to do.

Access control

We’re still going to have users. There will still be teams. There may be more layers (e.g. probably admins, “utility” users). It will still be necessary to define things that those users and teams can access. And it will still be necessary to interface with the various ways in which enterprises already define users, groups, authentication, and authorization (Kerberos, LDAP, …).

Proxies (AKA layers of indirection)

In OpenShift v2, there are a number of ways in which your request to an app can actually reach the thing that answers the request, often going through multiple proxies. In perhaps the most complicated case, with an HA app setup, you look up the app by name (DNS itself consists of several layers of indirection) and reach the external routing layer, which forwards to a node host routing layer, which forwards to a load-balancing gear layer, which forwards to another node’s port proxy, which finally forwards to the application server running in a gear. V3 will differ in the details, but not the pattern.

These proxies don’t exist just to peel back layers of the onion; each point provides an opportunity to hide complicated behavior behind a simple facade. For example:

  • DNS records provide all sorts of routing opportunities, including directing the user to a data center that’s available and geographically close to them.
  • A routing layer can monitor multiple endpoints of the application such that outages are detected and requests are directed to functioning endpoints. These can also hide the fact that gears are being moved, rolling deployments, etc.
  • The node host routing layer can hide the fact that a gear was actually idle when a request arrived, bringing it up only when needed and conserving resources otherwise.
  • The load-balancing gear layer balances traffic and implements sticky sessions.

As you can see, proxies are actually where a lot of the “magic” of a PaaS happens, and you can expect this pattern to continue in v3.
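To illustrate the facade idea, here is a toy routing layer in Python that rotates through endpoints and quietly skips the ones marked down, so the caller never learns there was an outage. It is a sketch of the pattern only; the class, its methods, and the endpoint names are invented, and real routers do the health checking themselves rather than being told.

```python
import itertools


class Router:
    """Round-robin over endpoints, skipping any that are marked down."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)
        self._cycle = itertools.cycle(self.endpoints)

    def mark_down(self, endpoint):
        self.healthy.discard(endpoint)

    def mark_up(self, endpoint):
        # Only endpoints the router already knows about can come back.
        if endpoint in self.endpoints:
            self.healthy.add(endpoint)

    def pick(self):
        """Return the next functioning endpoint."""
        if not self.healthy:
            raise RuntimeError("no functioning endpoints")
        for endpoint in self._cycle:
            if endpoint in self.healthy:
                return endpoint


if __name__ == "__main__":
    router = Router(["node1:8080", "node2:8080", "node3:8080"])
    router.mark_down("node2:8080")  # e.g. a gear being moved
    print([router.pick() for _ in range(4)])
```

The client just keeps asking the one stable address, which is the whole point of the layer.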

Implementing an OpenShift Enterprise routing layer for HA applications

My previous post described how to make an application HA and what exactly that means behind the scenes. This post is to augment the explanation in the HA PEP of how an administrator should expect to implement the routing layer for HA apps.

The routing layer implementation is currently left entirely up to the administrator. At some point OpenShift will likely ship a supported routing layer component, but the first priority was to provide an SPI (Service Provider Interface) so that administrators could reuse existing routing and load balancer infrastructure with OpenShift. Since most enterprises already have such infrastructure, we expected they would prefer to leverage that investment (both in equipment and experience) rather than be forced to use something OpenShift-specific.

Still, this leaves the administrator with the task of implementing the interface to the routing layer. Worldline published an nginx implementation, and we have some reference implementations in the works, but I thought I’d outline some of the details that might not be obvious in such an implementation.

The routing SPI

The first step in the journey is to understand the routing SPI events. The routing SPI itself is an interface on the OpenShift broker app that must be implemented via plugin. The example routing plugin that is packaged for Origin and Enterprise simply serializes the SPI events to YAML and puts them on an ActiveMQ message queue/topic. This is just one way to distribute the events, but it’s a pretty good way, at least in the abstract. For routing layer development and testing, you can just publish messages on a topic on the ActiveMQ instance OpenShift already uses (for Enterprise, openshift.sh does this for you) and use the trivial “echo” listener to see exactly what comes through. For production, publish events to a queue (or several if multiple instances need updating) on an HA ActiveMQ deployment that stores messages to disk when shutting down (you really don’t want to lose routing events) – note that the ActiveMQ deployment described in OpenShift docs and deployed by the installer does not do this, being intended for much more ephemeral messages.

I’m not going to go into detail about the routing events. You’ll become plenty familiar if you implement this. You can see some good example events in this description, but always check what is actually coming out of the SPI as there may have been updates (generally additions) since. The general outline of the events can be seen in the Sample Routing Plug-in Notifications table from the Deployment Guide or in the example implementation of the SPI. Remember you can always write your own plugin to give you information in the desired format.

Consuming SPI events for app creation

The routing SPI publishes events for all apps, not just HA ones, and you might want to do something with other apps (e.g. implement blue/green deployments), but the main focus of a routing layer is to implement HA apps. So let’s look at how you do that. I’m assuming YAML entries from the sample activemq plugin below — if you use a different plugin, similar concepts should apply just with different details.

First, when an app is created, you’re going to get an app creation event:

$ rhc app create phpha php-5.4 -s

:action: :create_application
:app_name: phpha
:namespace: demo
:scalable: true
:ha: false

This is pretty much just a placeholder for the application name. Note that it is not marked as HA. There is some work coming to make apps HA at creation, but currently you just get a scaled app and have to make it HA after it’s created. This plugin doesn’t publish the app UUID, which is what I would probably do if I were writing a plugin now. Instead, you’ll identify the application in any future events by the combination of app_name and namespace.

Once an actual gear is deployed, you’ll get two (or more) :add_public_endpoint actions, one for haproxy’s load_balancer type and one for the cartridge web_framework type (and possibly other endpoints depending on cartridge).

:action: :add_public_endpoint
:app_name: phpha
:namespace: demo
:gear_id: 542b72abafec2de3aa000009
:public_port_name: haproxy-1.4
:public_address: 172.16.4.200
:public_port: 50847
:protocols:
- http
- ws
:types:
- load_balancer
:mappings:
- frontend: ''
  backend: ''
- frontend: /health
  backend: /configuration/health

You might expect that when you make the app HA, there is some kind of event specific to being made HA. There isn’t at this time. You just get another load_balancer endpoint creation event for the same app, and you can infer that it’s now HA. For simplicity of implementation, it’s probably just best to treat all scaled apps as if they were already HA and define routing configuration for them.
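
To make the bookkeeping concrete, here’s a rough Python sketch of a consumer that tracks load_balancer endpoints per app and infers HA from there being more than one. It assumes the YAML has already been parsed into plain dicts with the leading colons stripped from keys and action names; the RoutingTable class and its method names are made up for illustration and are not part of OpenShift.

```python
class RoutingTable:
    """Tracks load_balancer endpoints for each application (hypothetical helper)."""

    def __init__(self):
        # (app_name, namespace) -> {gear_id: "address:port"}
        self.backends = {}

    def handle(self, event):
        # The sample plugin does not publish an app UUID, so the
        # (app_name, namespace) pair is the only stable identifier.
        key = (event["app_name"], event["namespace"])
        action = event["action"]
        if action == "create_application":
            # Placeholder entry; no endpoints are known yet.
            self.backends.setdefault(key, {})
        elif action == "add_public_endpoint":
            # Only load_balancer endpoints are routed to in this sketch.
            if "load_balancer" in event.get("types", []):
                backend = "%s:%d" % (event["public_address"], event["public_port"])
                self.backends.setdefault(key, {})[event["gear_id"]] = backend

    def is_ha(self, app_name, namespace):
        # No explicit "made HA" event exists, so infer it from
        # having more than one load_balancer endpoint.
        return len(self.backends.get((app_name, namespace), {})) > 1
```

Treating every scaled app as if it were already HA, as suggested above, would just mean writing routing configuration as soon as the first load_balancer endpoint shows up rather than waiting for is_ha to become true.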

Decision point 1: The routing layer can either direct requests only to the load_balancer endpoints and let them forward traffic to all the other gears, or it can actually just send traffic directly to all web_framework endpoints. The recommendation is to send traffic to the load_balancer endpoints, for a few reasons:

  1. This allows haproxy to monitor traffic in order to auto-scale.
  2. It will mean less frequent changes to your routing configuration (important when changes mean restarts).
  3. It will mean fewer entries in your routing configuration, which could grow quite large and become a performance concern.

However, direct routing is viable, and allows an implementation of HA without actually going to the trouble of making apps HA. You would just have to set up a DNS entry for the app that points at the routing layer and use that. You’d also have to handle scaling events manually or from the routing layer somehow (or even customize the HAproxy autoscale algorithm to use stats from the routing layer).

Decision point 2: The expectation communicated in the PEP (and how this was intended to be implemented) is that requests will be directed to the external proxy port on the node (in the example above, that would be http://172.16.4.200:50847/). There is one problem with doing this – idling. Idler stats are gathered only on requests that go through the node frontend proxy, so if we direct all traffic to the port proxy, the haproxy gear(s) will eventually idle and the app will be unavailable even though it’s handling lots of traffic. (Fun fact: secondary gears are exempt from idling – doesn’t help, unless the routing layer proxies directly to them.) So, how do we prevent idling? Here are a few options:

  1. Don’t enable the idler on nodes where you expect to have HA apps. This assumes you can set aside nodes for (presumably production) HA apps that you never want to idle. Definitely the simplest option.
  2. Implement health checks that actually go to the node frontend such that HA apps will never idle. You’ll need the gear name, which is slightly tricky – the above endpoint being on the first gear, it will be accessible by a request for http://phpha-demo.cloud_domain/health to the node at 172.16.4.200. When the next gear comes in, you’ll have to recognize that it’s not the head gear and send the health check to e.g. http://542b72abafec2de3aa000009-demo.cloud_domain/health.
  3. Flout the PEP and send actual traffic to the node frontend. This would be the best of all worlds since the idler would work as intended without any special tricks, but there are some caveats I’ll discuss later.
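
For option 2, the naming rule for the health check could be sketched like this in Python. The function name and the returned dict layout are my own invention, the event is assumed to be an endpoint event already parsed into a dict, and how you decide which gear is the head gear is left to the caller.

```python
def frontend_health_check(event, is_head_gear, cloud_domain):
    """Describe a health check that goes through the node frontend proxy.

    The head gear answers to <app_name>-<namespace>.<cloud_domain>;
    any other gear answers to <gear_id>-<namespace>.<cloud_domain>.
    """
    name = event["app_name"] if is_head_gear else event["gear_id"]
    return {
        # Connect to the node hosting the gear, not to the port proxy port.
        "connect_to": event["public_address"],
        # The Host header is what the node frontend routes on.
        "host_header": "%s-%s.%s" % (name, event["namespace"], cloud_domain),
        "path": "/health",
    }
```

Because the check passes through the node frontend, it should count toward the idler stats in a way that requests to the port proxy do not.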

Terminating SSL (TLS)

When hosting multiple applications behind a proxy, it is basically necessary to terminate SSL at the proxy. (Despite SSL having been essentially replaced by TLS at this point, we’re probably going to call it SSL for the lifetime of the internet.) This has to do with the way routing works under HTTPS; during the initialization of the TLS connection, the client has to indicate the name it wants (in our case the application’s DNS name) in the SNI extension to the TLS “hello”. The proxy can’t behave as a dumb layer 4 proxy (just forwarding packets unexamined to another TLS endpoint) because it has to examine the stream at the protocol level to determine where to send it. Since the SNI information is (from my reading of the RFC) volunteered by the client at the start of the connection, it does seem like it would be possible for a proxy to examine the protocol and then act like a layer 4 proxy based on that examination, and indeed I think F5 LBs have this capability, but it does not seem to be a standard proxy/LB capability, and certainly not for existing open source implementations (nginx, haproxy, httpd – someone correct me if I’m missing something here), so to be inclusive we are left with proxies that operate at the layer 7 protocol layer, meaning they perform the TLS negotiation from the client’s perspective.

Edit 2014-10-08: layer 4 routing based on SNI is probably more available than I thought. I should have realized HAproxy 1.5 can do it, given OpenShift’s SNI proxy is using that capability. It’s hard to find details on though. If most candidate routing layer technologies have this ability, then it could simplify a number of the issues around TLS because terminating TLS could be deferred to the node.

If that was all Greek to you, the important point to extract is that a reverse proxy has to have all the information to handle TLS connections, meaning the appropriate key and certificate for any requested application name. This is the same information used at the node frontend proxy; indeed, the routing layer will likely need to reuse the same *.cloud_domain wildcard certificate and key that is shared on all OpenShift nodes, and it needs to be made aware of aliases and their custom certificates so that it can properly terminate requests for them. (If OpenShift supported specifying x509 authentication via client certificates [which BTW could be implemented without large structural changes], the necessary parameters would also need to be published to the routing layer in addition to the node frontend proxy.)

We assume that a wildcard certificate covers the standard HA DNS name created for HA apps (e.g. in this case ha-phpha-demo.cloud_domain, depending of course on configuration; notice that no event announces this name — it is implied when an app is HA). That leaves aliases which have their own custom certs needing to be understood at the routing layer:

$ rhc alias add foo.example.com -a phpha
:action: :add_alias
:app_name: phpha
:namespace: demo
:alias: foo.example.com

$ rhc alias update-cert foo.example.com -a phpha --certificate certfile --private-key keyfile
:action: :add_ssl
:app_name: phpha
:namespace: demo
:alias: foo.example.com
:ssl: [...]
:private_key: [...]
:pass_phrase:

Aliases will of course need their own routing configuration entries regardless of HTTP/S, and something will have to create their DNS entries as CNAMEs to the ha- application DNS record.
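
As a sketch of the certificate bookkeeping this implies, here’s some Python that records alias certs from :add_ssl events and picks a certificate for the name the client asked for. The CertStore class is hypothetical, events are assumed to be parsed into dicts with the colons stripped, and a real routing layer would of course hand this material to whatever actually terminates TLS.

```python
class CertStore:
    """Chooses TLS material for a requested server name (hypothetical helper)."""

    def __init__(self, cloud_domain, wildcard_cert):
        self.cloud_domain = cloud_domain.lower()
        self.wildcard_cert = wildcard_cert  # material for *.cloud_domain
        self.alias_certs = {}               # alias -> (cert, key, pass_phrase)

    def handle(self, event):
        if event["action"] == "add_ssl":
            self.alias_certs[event["alias"].lower()] = (
                event["ssl"], event["private_key"], event.get("pass_phrase"))

    def select(self, server_name):
        name = server_name.lower()
        # A custom alias certificate wins when one has been published.
        if name in self.alias_certs:
            return self.alias_certs[name]
        # A wildcard covers exactly one label under the cloud domain,
        # which includes the implied ha-<app>-<namespace> name.
        suffix = "." + self.cloud_domain
        if name.endswith(suffix):
            prefix = name[:-len(suffix)]
            if prefix and "." not in prefix:
                return self.wildcard_cert
        return None  # nothing to terminate TLS with for this name
```

An alias that has no custom certificate and sits outside the cloud domain comes back as None here; what to do with HTTPS requests for such names is a policy decision this sketch doesn’t make.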

A security-minded administrator would likely desire to encrypt connections from the routing layer back to the gears. Two methods of doing this present themselves:

  1. Send an HTTPS request back to the gear’s port proxy. This won’t work with any of the existing cartridges OpenShift provides (including the haproxy-1.4 LB cartridge), because none of them expose an HTTPS-aware endpoint. It may be possible to change this, but it would be a great deal of work and is not likely to happen in the lifetime of the current architecture.
  2. Send an HTTPS request back to the node frontend proxy, which does handle HTTPS. This actually works fine, if the app is being accessed via an alias – more about this caveat later.

Sending the right HTTP headers

It is critically important in any reverse-proxy situation to preserve the client’s HTTP request headers indicating the URL at which it is accessing an application. This allows the application to build self-referencing URLs accurately. This can be a little complicated in a reverse-proxy situation, because the same HTTP headers may be used to route requests to the right application. Let’s think a little bit about how this needs to work. Here’s an example HTTP request:

POST /app/login.php HTTP/1.1
Host: phpha-demo.openshift.example.com
[...]

If this request comes into the node frontend proxy, it looks at the Host header, and assuming that it’s a known application, forwards the request to the correct gear on that node. It’s also possible (although OpenShift doesn’t do this, but a routing layer might) to use the path (/app/login.php here) to route to different apps, e.g. requests for /app1/ might go to a different place than /app2/.

Now, when an application responds, it will often create response headers (e.g. a redirect with a Location: header) as well as content based on the request headers that are intended to link to itself relative to what the client requested. The client could be accessing the application by a number of paths – for instance, our HA app above should be reachable either as phpha-demo.openshift.example.com or as ha-phpha-demo.openshift.example.com (default HA config). We would not want a client that requests the ha- address to receive a link to the non-ha- address, which may not even resolve for it, and in any case would not be HA. The application, in order to be flexible, should not make a priori assumptions about how it will be addressed, so every application framework of any note provides methods for creating redirects and content links based on the request headers. Thus, as stated above, it’s critical for these headers to come in with an accurate representation of what the client requested, meaning:

  1. The same path (URI) the client requested
  2. The same host the client requested
  3. The same protocol the client requested

(The last is implemented via the “X-Forwarded-Proto: https” header for secure connections. Interestingly, a recent RFC specifies a new header for communicating items 2 and 3, but not 1. This will be a useful alternative as it becomes adopted by proxies and web frameworks.)

Most reverse proxy software should be well aware of this requirement and provide options such that when the request is proxied, the headers are preserved (for example, the ProxyPreserveHost directive in httpd). This works perfectly with the HA routing layer scheme proposed in the PEP, where the proxied request goes directly to an application gear. The haproxy cartridge does not need to route based on Host: header (although it does route requests based on a cookie it sets for sticky sessions), so the request can come in for any name at all and it’s simply forwarded as-is for the application to use.
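
In Python terms, what the routing layer should forward looks roughly like the sketch below. The function is purely illustrative (your proxy software does this for you when configured properly), and sending “http” for plain connections is my assumption of common practice rather than something OpenShift specifies.

```python
def upstream_request(method, path, client_headers, client_used_tls):
    """Build the request to forward, preserving what the client asked for."""
    # Drop any X-Forwarded-Proto the client supplied (header names are
    # case-insensitive) so the proxy's own value is the only one sent.
    headers = {k: v for k, v in client_headers.items()
               if k.lower() != "x-forwarded-proto"}
    # Host is copied through untouched, and so is the path.
    headers["X-Forwarded-Proto"] = "https" if client_used_tls else "http"
    return method, path, headers
```

The point is as much what the function doesn’t do: it never touches the Host header or the path.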

The complication arises in situations where, for example, you would like the routing layer to forward requests to the node frontend proxy (in order to use HTTPS, or to prevent idling). The node frontend does care about the Host header because it’s used for routing, so the requested host name has to be one that the OpenShift node knows in relation to the desired gear. It might be tempting to think that you can just rewrite the request to use the gear’s “normal” name (e.g. phpha-demo.cloud_domain) but this would be a mistake because the application would respond with headers and links based on this name. Reverse proxies often offer options for rewriting the headers and even contents of responses in an attempt to fix this, but they cannot do so accurately for all situations (example: links embedded in JavaScript properties) so this should not be attempted. (Side note: the same prohibition applies to rewriting the URI path while proxying. Rewriting example.com/app/… to app.internal.example.com/… is only safe for sites that provide static content and all-relative links.)

What was that caveat?

I mentioned a caveat both on defeating the idler and proxying HTTPS connections to the node frontend, and it’s related to the section above. You can absolutely forward an HA request to the node frontend if the request is for a configured alias of the application, because the node frontend knows how to route aliases (so you don’t have to rewrite the Host: header which, as just discussed, is a terrible idea). The caveat is that, strangely, OpenShift does not create an alias for the ha- DNS entry automatically assigned to an HA app, so manual definition of an alias is currently required per-app for implementation of this scheme. I have created a feature request to instate the ha- DNS entry as an alias, and being hopefully easy to implement, this may soon remove the caveat behind this approach to routing layer implementation.

Things go away too

I probably shouldn’t even have to mention this, but: apps, endpoints, aliases, and certificates can all go away, too. Make sure that you process these events and don’t leave any debris lying around in your routing layer confs. Gears can also be moved from one host to another, which is an easy use case to forget about.
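
Here’s a rough Python sketch of the cleanup side. Big caveat: the removal action names and the fields I read from each event (remove_public_endpoint, remove_alias, remove_ssl, delete_application) are my assumptions about what the plugin emits, so check them against what actually comes out of your SPI. The state layout is also invented for the example.

```python
def apply_removal(state, event):
    """Remove routing state for a removal event (action names are assumed).

    state = {"apps":    {(app_name, namespace): set of "address:port"},
             "aliases": {alias: (app_name, namespace)},
             "certs":   {alias: certificate material}}
    """
    key = (event["app_name"], event["namespace"])
    action = event["action"]
    if action == "remove_public_endpoint":
        backend = "%s:%d" % (event["public_address"], event["public_port"])
        state["apps"].get(key, set()).discard(backend)
    elif action == "remove_ssl":
        state["certs"].pop(event["alias"], None)
    elif action == "remove_alias":
        # An alias takes its custom certificate with it.
        state["aliases"].pop(event["alias"], None)
        state["certs"].pop(event["alias"], None)
    elif action == "delete_application":
        state["apps"].pop(key, None)
        # Sweep up any aliases (and their certs) that belonged to the app.
        for alias in [a for a, owner in state["aliases"].items() if owner == key]:
            del state["aliases"][alias]
            state["certs"].pop(alias, None)
    return state
```

Every branch tolerates missing entries, since a consumer that restarted or missed an event shouldn’t fall over while cleaning up.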

And finally, speaking of going away, the example routing plugin initially provided :add_gear and :remove_gear events, and for backwards compatibility it still does (duplicating the endpoint events). These events are deprecated and should disappear soon.