Running an OpenShift install into containers

For testing purposes, we would like the ability to set up and tear down a whole lot of OpenShift clusters (single- or multi-node). And why do this with VMs when we have all of this container technology? A container looks a lot like a VM, right? And we have the very nifty (but little-documented) docker connection plugin for Ansible to treat a container like a host. So we ought to be able to run the installer against containers.

Of course, things are not quite that simple. And even though I’m not sure how useful this will be, I set out to just see what happens. Perhaps we could at least have a base image from an actual Ansible install of OpenShift that runs an all-in-one cluster in a container, rather than going through oc cluster up or the like. Then we would have full configuration files and separate systemd units to work with in our testing.

So first, defining the “hosts”. It took me a few iterations to get here, since the examples go in a different direction, but I can simply define containers in my inventory as if they were hosts and specify the docker connection method for them as a host variable. Here’s my inventory for an Origin install:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
deployment_type=origin
openshift_release=1.4
openshift_uninstall_images=False

[masters]
master_container ansible_connection=docker

[nodes]
master_container ansible_connection=docker
node_container ansible_connection=docker

[etcd]
master_container ansible_connection=docker

To ensure the containers exist and are running before Ansible tries to connect to them, I created a play that iterates over the inventory names and creates a container for each:

---
- name: start up containers
  hosts: localhost
  tasks:
  - name: start containers
    with_inventory_hostnames:
      - all
    docker_container:
      image: centos/systemd
      name: "{{ item }}"
      state: started
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:z

This uses the Ansible docker_container module to ensure there is a running docker container for each hostname, based on the centos/systemd image (a base CentOS image that runs systemd as init). I don’t really want to run a separate docker daemon inside each container once the cluster is up (remember, I want to start a lot of these, and they’ll pretty much all use the same images, so I’d really like to reuse the docker image cache), so I’m mounting in the host’s docker socket and everyone will share one big happy docker daemon.
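
For a single “host”, that docker_container task is roughly equivalent to running by hand (same image and socket mount; the container name comes from the inventory):

$ docker run -d --name master_container -v /var/run/docker.sock:/var/run/docker.sock:z centos/systemd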

Then I just have to run the regular plays for an install (this assumes we’re in the openshift-ansible source directory):

- include: playbooks/byo/openshift-cluster/config.yml

Now of course it could not be that simple. After a few minutes of installing, I ran into an error:

TASK [openshift_clock : Start and enable ntpd/chronyd] *************************
fatal: [master_container]: FAILED! => {
 "changed": true, 
 "cmd": "timedatectl set-ntp true", 
 "delta": "0:00:00.200535", 
 "end": "2017-03-14 23:43:39.038562", 
 "failed": true, 
 "rc": 1, 
 "start": "2017-03-14 23:43:38.838027", 
 "warnings": []
}

STDERR:

Failed to create bus connection: No such file or directory

I looked around and found others who had experienced this issue, and it seemed related to running dbus, but dbus is installed in the image and I couldn’t get it running. Eventually a colleague told me that you have to run the container privileged for dbus to work. Why this should be, I don’t know, but it’s easy enough to do.
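
With the docker_container module, that just means one more parameter on the task from before (it shows up again in the full playbook at the end):

    docker_container:
      image: centos/systemd
      name: "{{ item }}"
      state: started
      privileged: True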

On to the next problem. I ran into an error from within Ansible, which tried to use my openshift_release value of 1.4 as a string when it had actually been parsed as a float.

TASK [openshift_version : set_fact] **********************************************************************************
fatal: [master_container]: FAILED! => {
 "failed": true
}

MSG:

The conditional check 'openshift_release is defined and 
openshift_release[0] == 'v'' failed. The error was: 
error while evaluating conditional (openshift_release is 
defined and openshift_release[0] == 'v'): float object has no element 0

Having seen this sort of thing before, I could tell this was due to how I specified openshift_release in my inventory. It looks like a number, so the inventory parser treats it as one. So I can just change it to "1.4" or v1.4 and it will be parsed as a string. I think this was only a problem when I was running Ansible from source; I didn’t see it with the released package.
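
In the inventory, the fix is just:

[OSEv3:vars]
openshift_release="1.4"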

Next problem: a playbook error, because with the docker connection plugin no ssh user is specified, so it can’t be retrieved. Well, even though it’s unnecessary for this connection type, the fix is simply to specify one in the inventory.

[OSEv3:vars]
ansible_user=root

Next problem. The installer complains that NetworkManager must be installed and enabled before the install can run.

TASK [openshift_node_dnsmasq : fail] *******************************************
fatal: [master_container]: FAILED! => {
 "changed": false, 
 "failed": true
}

MSG:

Currently, NetworkManager must be installed and enabled prior to installation.

And I quickly found out that things will hang if you don’t restart dbus after installing NetworkManager (possibly related to this old Fedora bug). Alright, just add that to my plays:

- name: set up NetworkManager
  hosts: all
  tasks:
  - name: ensure NetworkManager is installed
    package:
      name: NetworkManager
      state: present
  - name: ensure NetworkManager is enabled
    systemd:
      name: NetworkManager
      enabled: True
  - name: dbus needs a restart after this or NetworkManager and firewall-cmd choke
    systemd:
      name: dbus
      state: restarted

When I was first experimenting with this, it went through just fine. On later tries, starting with fresh containers, it hung at starting NetworkManager, and I haven’t figured out why yet.

Finally it looked like everything was actually installing successfully, but then of course starting the actual node failed.

fatal: [node_container]: FAILED! => {
 "attempts": 1, 
 "changed": false, 
 "failed": true
}

MSG:

Unable to start service origin-node: Job for origin-node.service 
failed because the control process exited with error code. 
See "systemctl status origin-node.service" and "journalctl -xe" for details.

# docker exec -i --tty node_container bash
[root@9f7e04f06921 /]# journalctl --no-pager -eu origin-node 
[...]
systemd[1]: Starting Origin Node...
origin-node[8835]: F0315 19:13:21.972837 8835 start_node.go:131] 
cannot fetch "default" cluster network: Get 
https://cf42f96fd2f8:8443/oapi/v1/clusternetworks/default: 
dial tcp: lookup cf42f96fd2f8: no such host
systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
systemd[1]: Failed to start Origin Node.


Actually, I previously got a completely different error related to OVS that I’m not seeing now. These failures could be anything as far as I know, but they may be related to the fact that I didn’t expose any ports, specify any external IP addresses for my “hosts” to talk to each other, or arrange any DNS for them to resolve each other. In any case, something to remedy another day. So far the playbook and inventory look like this:

---
- name: start up containers
  hosts: localhost
  tasks:
    - name: start containers
      with_inventory_hostnames:
        - all
      docker_container:
        image: centos/systemd
        name: "{{ item }}"
        state: started
        privileged: True
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:z

- name: set up NetworkManager
  hosts: all
  tasks:
    - name: ensure NetworkManager is installed
      package:
        name: NetworkManager
        state: present
    - name: ensure NetworkManager is enabled
      systemd:
        name: NetworkManager
        enabled: yes
        state: started
    - name: dbus needs a restart after this or NetworkManager and firewall-cmd choke
      systemd:
        name: dbus
        state: restarted

- include: openshift-cluster/config.yml

And the inventory:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
deployment_type=origin
openshift_release="1.4"
openshift_uninstall_images=False
ansible_user=root

[masters]
master_container ansible_connection=docker

[nodes]
master_container ansible_connection=docker
node_container ansible_connection=docker

[etcd]
master_container ansible_connection=docker
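
To kick the whole thing off, assuming the playbook above is saved as (say) containers.yml and the inventory as hosts (with the config.yml include path adjusted to resolve relative to wherever the playbook lives in the openshift-ansible checkout), it’s just the usual:

$ ansible-playbook -i hosts containers.yml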

A summary of my life with depression

This may not seem like a technical post but if you cross-reference to this talk it should be clear this is a problem that developers should really be aware of. I mean, not my personal issues, but the topic of mental illness in the developer world. So in the spirit of openness and sharing with others, I present my own story.

I’m not sure when depression began; I think it was kind of a slow progression over years. I’m 41 years old and I’ve always been kind of a negative person, always looking for flaws in things and worrying about what could go wrong. I liked to think this made me a better engineer. Somewhere along the way it became a state of mind where I could only see the negative.

I started to realize I was in trouble when it became clear that I didn’t really enjoy anything any more. It’s called anhedonia. It turns out you can function for quite a while being motivated only by negatives (fear of failure, fear of letting down your family/coworkers, etc.) and through sheer determination, but it’s a really miserable way to live. It’s hard to understand if you haven’t experienced it, so it’s not much use explaining it. Without anything that drives me to say “yes!” life seems pretty damn pointless. I was frustrated and angry all the time.

My wife encouraged me to get clinical help and eventually I did. Apparently sometime around September 2015, though I don’t really remember. I remember describing my depression not so much as “stuck in a pit” as “life in a dense fog”. For the next year and a few months my psychiatrist had me trying out various medications and tweaking dosages and such. Sometimes something would seem to be helping a little, but nothing really seemed to stick or make a big difference. It was discouraging to say the least. I was doing counseling, too, though I have yet to find a counselor who helps much.

In the summer of 2016 my mother was diagnosed with incurable cancer. In September 2016 my wife and I separated and I started shuttling our kids between us. Then in late November her health took a nosedive and I was left taking care of the kids alone, in addition to working full time with depression. I had always been able to deal with everything myself before, but something finally gave out in me. My job at Red Hat, which before had always been a refuge in turbulent times, became unbearable. I would spend all day staring at my screen and moving the mouse occasionally when it started to go dark because I hadn’t done anything for so long. I felt crushing guilt and shame that the one thing I had always been good at and enjoyed was now a joyless burden and I was letting everyone down.

I went on disability leave in early December. I didn’t even know you could go on disability for depression, but it was definitely disabling me, so it makes sense. My psychiatrist suggested trying a new course of treatment called TMS (Transcranial Magnetic Stimulation). In short, it uses an electromagnet to stimulate your brain, in daily treatments over the course of 6-8 weeks. I was expecting to get started with it ASAP, but it turned out I couldn’t start until January 4th.

I thought disability leave would be a relief, and it certainly was in the sense that I no longer had to feel guilty about the work I wasn’t doing (well, less guilty anyway – getting paid to do nothing really rubs me the wrong way). The downside is that it gave me a lot more time to brood over how useless I was and how I was going to lose everything and end up still depressed but in a homeless shelter, with my kids in foster care. I can look at things objectively and say that actually my situation is not that bad, and quite recoverable if I can just kick this depression thing, and there’s a good chance I can. And I’m so grateful for being able to take disability, and for my health insurance that covers all this pretty well, and for having my health otherwise. But the thing about being depressed is that you still feel hopeless, regardless of the reality of the situation.

I’m two weeks into TMS now. If anything, I feel worse because I’m starting to develop anxiety and having more trouble sleeping. My psychiatrist said her patients often saw improvement within a week (which made me more anxious when that week passed that I might be among the 20% or so that don’t benefit), but the TMS folks said actually to expect more like four weeks so I’m trying to be patient. If it doesn’t work out, I can do genetic testing to see if that helps pinpoint a medication that will actually help. I’m trying meditation, working on gratitude, connecting with people (something I never put much effort into before), contradicting my negative thoughts, and other random things in case they might help. And exercising, that seems to help. And just keeping busy to distract myself from feeling hopeless. I don’t have a happy ending yet, but everyone tells me things will get better if I just keep trying.

I guess if there’s a silver lining, it’s that people have come out of the woodwork to tell me they understand what I’m going through because they have been there. This is so common, there should be no reason to feel shame or to avoid treatment like I did for so long. It’s made me realize that in my fierce self-sufficiency I’ve never been open to being helped, or for that matter to helping others. But it turns out that nearly everybody needs some help sometimes, and I hope that out of this experience I’ll learn to be a more decent human being than I have been so far.

2015-12-28

SSL is dead. Long live TLS.

Hmm, do you think there are any other protocols that could be resurrected with a different name? How about good old HTTP? It hasn’t been about “Hypertext” transport for a long time. I mean sure, HTML is still around, but half the time it’s being written on the fly by your Javascript app anyway, not to mention there are CSS, SVG, images, audio, video, and a host of other things being transported via HTTP. And it’s not just passively transferring files, it’s communicating complicated application responses.

Maybe we should just call it Transport Protocol, “TP” for short. Yeah, I like it!

Happy New Year!

2015-12-8

I can’t believe it’s already December.

Problem du jour: getting logs from a pod via the OpenShift API. You would think I could just look at the impl of the oc logs command, but as usual it’s too tangled a mess (or, more likely, I just need to understand how to really use the go tools I have).

oc logs first muddies the water trying to figure out what kind of resource the user wants logs for. I can hopefully ignore this since I already have the pod I created ready.

Interlude – trying to figure out what gets injected into a pod’s /etc/resolv.conf file. Because someone is getting a wildcard domain added to their search directive, and that causes everything to resolve to that domain’s IP, including e.g. github.com. I couldn’t get a useful read on which settings are relevant. I thought there was a setting for whether or not to inject the skydns nameserver; now I can’t find it. I created a pod on my devenv and it didn’t get anything injected. So I’m not sure of all the sources of input to this file.
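
(The quickest check, for what it’s worth, is just to look at the file from inside a running pod; the pod name here is only an example.)

$ oc exec some-pod -- cat /etc/resolv.conf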

WordPress used to have a button to remove the distractions and make the editor take up the whole window. What ever happened to that?

So back to getting logs. Looks like I need to store the command’s Factory somewhere in order to be able to get to the LogsForObject method. Kubernetes or OpenShift factory? I have the OpenShift factory from my command and it contains the Kubernetes factory so it’s all the same.

I got the pod running… after I remembered to actually have the diagnostic call the necessary code. Disconcerting when you run a diagnostic and get *nothing* back. Now I have the pod being created and a readCloser with the results. Reminding myself how to use a readCloser.

Pro tip: don’t try to Fscanln a reader. Create a bufio Scanner instead.
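
A minimal sketch of what that looks like, assuming the logs come back as an io.ReadCloser (the names here are made up for illustration, not from the actual diagnostic code):

package main

import (
    "bufio"
    "fmt"
    "io"
    "io/ioutil"
    "strings"
)

// readLogLines consumes a log stream line by line. fmt.Fscanln splits on
// spaces and bails at the first mismatch, so bufio.Scanner is the right
// tool for reading whole lines.
func readLogLines(rc io.ReadCloser) error {
    defer rc.Close()
    scanner := bufio.NewScanner(rc)
    for scanner.Scan() {
        fmt.Println(scanner.Text()) // or inspect each line as needed
    }
    return scanner.Err()
}

func main() {
    // stand-in for the real log stream from the API
    logs := ioutil.NopCloser(strings.NewReader("first line\nsecond line\n"))
    if err := readLogLines(logs); err != nil {
        fmt.Println("error reading logs:", err)
    }
}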

See the thing I made

I wrote an article for the Red Hat Developers Blog. I haven’t felt much like blogging this year, but there’s one thing at least. If I have articles I think would be of outside interest, I’ll probably post them over there. This blog should return to its original purpose, which was for me to blather about my frustrations and solutions in a kind of stream of consciousness.

libvirt boxen for OpenShift v3

I promise I have not been struggling with vagrant the whole time since my last post. Actually I updated the vagrant-openshift docs and made some other fixes so the whole thing is a little more sane and obvious how to use, and then went on to other stuff. Today I’m just trying to put together OpenShift v3 libvirt boxen to put up for the public next to the virtualbox ones. Should be easy, actually it probably is; my problems today all seem to be local.

It would be nice if, just once, vagrant had a little transparency. It doesn’t have a verbose mode, and never tells you where anything is or should be.

$ vagrant box list
aws-dummy-box (aws, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

Ah, yeah… so… where are those defined? What images do they point to, and where were they downloaded from?

The errors are the worst. When something goes wrong, could you please tell me what you think you got from me, what you tried to do with that, and what went wrong? No.

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
Name `origin_openshiftdev` of domain about to create is already taken.
Please try to run `vagrant up` command again.

Just try to figure out what is specifying “origin_openshiftdev” as a domain and what to do about it. Or how to release it so I can, in fact, run vagrant up again.

$ vagrant status
Current machine states:

openshiftdev not created (libvirt)

The Libvirt domain is not created. Run `vagrant up` to create it.
$ vagrant destroy
==> openshiftdev: Domain is not created. Please run `vagrant up` first.

Part of the problem is that I have at least three semi-autonomous bits of vagrant to deal with. There’s vagrant itself, which keeps track of box definitions. There’s the Vagrantfile I’m feeding it from OpenShift Origin, which might interact with the vagrant-openshift plugin (though I don’t think so on vagrant up) but in any case defines what hosts I’m supposed to be creating. Finally, there’s the provider plugin (libvirt in this case) that has to interface with the virtualization to actually manage the hosts. If something goes wrong, I can’t even tell which part is complaining, much less why.

Enough complaining, what is going on?

The primary input to vagrant is a “box”. This is really just a tarball that contains a minimal Vagrantfile, metadata file, and the real payload, the disk image of the virtual host. The vagrant “box” is provider-specific – the metadata specifies a provider.
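
A libvirt box, for instance, unpacks to something like this (names are illustrative; a virtualbox box carries an .ovf and a .vmdk instead of box.img):

$ tar tf some_libvirt_box.box
Vagrantfile
metadata.json
box.img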

When you run vagrant up, the local Vagrantfile should specify which box to start with – a URL to retrieve it and the name for vagrant to import it as. The first run will download and unpack it under ~/.vagrant.d/boxes/<name>/<version>/<provider>/ (note, you can have multiple providers for the same box name/version). Subsequent runs just use that box definition. Simple enough as it goes.

vagrant up also creates a local .vagrant/ directory to keep track of “machines” (which are intended to represent actual running virtual hosts instantiated from boxes). Machines are stored under .vagrant/machines/<name>/<provider>, where the name comes from the Vagrantfile VM definition. In OpenShift’s Vagrantfile we have config.vm.define “openshiftdev”, so for the libvirt provider I could expect to see a directory .vagrant/machines/openshiftdev/libvirt once I’ve brought up a machine. (Under vbox you can define a master and several minions, which would all have different names. I hope we can do that soon with the other providers too.)

I was planning to build a libvirt box from scratch, but then I realized there is a Vagrant plugin “vagrant-mutate” that will take an existing box and change it to another provider. Since we already have boxes defined for vbox I thought I’d just try this out to make a libvirt version of it.

$ vagrant mutate \
  https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box \
  libvirt
Downloading box centos7_virtualbox_inst from https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box
Extracting box file to a temporary directory.
Converting centos7_virtualbox_inst from virtualbox to libvirt.
 (100.00/100%)
Cleaning up temporary files.
The box centos7_virtualbox_inst (libvirt) is now ready to use.

So far, so good. Or not, because what does “ready to use” mean? Where is it? Turns out, it means said box is stored under my ~/.vagrant.d/boxes directory for use with the next vagrant up. It kept the same name with the provider embedded in it, but if I just change the name…

$ mv ~/.vagrant.d/boxes/centos7_{virtualbox_,}inst
$ vagrant box list
aws-dummy-box (aws, 0)
centos7_inst (libvirt, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

… everything works out fine. So to use that with my openshift/origin Vagrantfile, I just put that name into my .vagrant-openshift.json file like so:

"libvirt": {
  "box_name": "centos7_inst"
},

Note that I don’t need to specify a box_url because the box is already local. Folks will need the box_url to access it once I publish it. So let’s vagrant up already…

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
/home/luke/.vagrant.d/gems/gems/fog-1.27.0/lib/fog/libvirt/requests/compute/list_volumes.rb:32:in `info': 
Call to virStorageVolGetInfo failed: Storage volume not found: 
no storage vol with matching path '/mnt/VMs/origin_openshiftdev.img'
(Libvirt::RetrieveError)

Ah. This is definitely due to some messing around on my part: I deleted that image because I thought vagrant was saying earlier that it was in the way (remember “Name `origin_openshiftdev` of domain about to create is already taken”?). This error at least seems safe to pin on the libvirt provider, but I’m not sure what to do about it. Shouldn’t libvirt just clone the image from the vagrant box to create a new VM? How did my request to instantiate the “centos7_inst” box as “openshiftdev” get translated into looking for that particular file to exist?

I’m guessing (since grep got me nowhere) that the libvirt provider takes the directory I’m in and the box being requested and uses that as the VM name. Or at least, a volume name from which VMs can be cloned for Vagrant usage.

virsh to the rescue

I’m not really very knowledgeable about libvirt, mainly because I’ve been able to run VMs just fine using the graphical virt-manager interface and didn’t really need a lot more. I deleted that image above using virt-manager, figuring it would take care of referential integrity. Now that I’m venturing into the world of scripted VM management, I have been fiddling a little with virsh, so let’s apply that:

# virsh vol-list default
 Name                     Path 
------------------------------------------------------------------------------
[...]
 origin_openshiftdev.img  /mnt/VMs/origin_openshiftdev.img

Hmm, yes, libvirt does actually seem to expect that volume to be there. And then it’s failing trying to use it because the actual file isn’t there. So let’s nuke the volume record, wherever that may be.

# virsh vol-delete origin_openshiftdev.img default
Vol origin_openshiftdev.img deleted

And vagrant up --provider=libvirt suddenly works again.

Updating libvirt boxes

One extra note about using libvirt as a provider: as soon as you use vagrant to start a libvirt box you have downloaded, the vagrant-libvirt plugin makes a copy of the image from the box definition and uses that. The copy is made in libvirt’s default storage pool (unless you tell it otherwise… BTW, quite a few interesting options at the vagrant-libvirt README) and is named <box_name>_vagrant_box_image.img. So my box above translates to /mnt/VMs/centos7_inst_vagrant_box_image.img (I use a separate mount point for my VM storage because it’s just too easy to fill your root fs otherwise). Then when you actually create a VM, it uses a copy-on-write snapshot of that image, which seems to be named after the project and VM definition (my problem volume above, origin_openshiftdev.img). That way it’s a pretty fast, efficient startup from a consistent starting point.

Of course this could be a bit confusing if you actually want to update your vagrant box. You might download a new box definition from vagrant’s perspective, but vagrant-libvirt sees it already has a volume with the right name and keeps using that (in fact, once it has copied the volume, you may as well truncate the box.img under ~/.vagrant.d/boxes to save space). You have to nuke the libvirt volume to get it to use the updated box definition. virt-manager seems to do just as well as virsh vol-delete for this (not sure what happened before in my case). So e.g.

# virsh vol-delete centos7_inst_vagrant_box_image.img default

Then the next vagrant up with that box will use the updated box definition.

vagrant setup

I may be an idiot, but I’ve simply never used vagrant successfully before.

“Just vagrant up and you’re ready to go!” say all the instructions. Yeah, that probably works fine with the default VirtualBox, which is available for all major desktop platforms. But I don’t want to use any more proprietary Oracle crap than I absolutely have to. I don’t even want to run VMs on my local host (all my RAM is already taken up by my browser tab habit), but if I did it would be on KVM/QEMU that’s native to Fedora. But I have access to AWS and OpenStack, so why would I even do that?

Well, if you want to use something other than the default, you have to add provider plugins. Alright, sounds easy enough.

$  vagrant plugin install vagrant-openstack-provider
Installing the 'vagrant-openstack-provider' plugin. This can take a few minutes...
Installed the plugin 'vagrant-openstack-provider (0.6.0)'!
$ vagrant plugin install vagrant-aws
Installing the 'vagrant-aws' plugin. This can take a few minutes...
Installed the plugin 'vagrant-aws (0.6.0)'!

Oh yeah, easy-peasy man! OK, let’s fire up OpenShift v3:

$ git clone https://github.com/openshift/origin
$ cd origin
$ vagrant up --provider=aws
There are errors in the configuration of this machine. Please fix
the following errors and try again:
SSH:
* `private_key_path` file must exist: PATH TO AWS KEYPAIR PRIVATE KEY

Hm, OK, must be something I need to provide. Looking through the Vagrantfile, it looks like it’s expecting an entry for AWSPrivateKeyPath in my .awscreds file. I have a private key file, I can do that. Try again…

$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
/home/luke/.vagrant.d/gems/gems/fog-aws-0.0.6/lib/fog/aws/region_methods.rb:6:in `validate_aws_region': Unknown region: "<AMI_REGION>" (ArgumentError)

Erm… right, more stuff to fill in. I don’t particularly want to edit the Vagrantfile, and I’m not sure which AMI_REGION I should use. Surely someone on my team has specified all this somewhere? A search brings me to https://github.com/openshift/vagrant-openshift which looks like it ought to at least create a config file for me that Vagrant will read. Sounds good, let’s go:

$ cd vagrant-openshift
$ bundle
Fetching git://github.com/mitchellh/vagrant.git
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
[.......]
Using vagrant-openshift 1.0.12 from source at .
Your bundle is complete!
Use `bundle show [gemname]` to see where a bundled gem is installed.

$ rake vagrant:install
vagrant-openshift 1.0.12 built to pkg/vagrant-openshift-1.0.12.gem.
The plugin 'vagrant-openshift' is not installed. Please install it first.
Installing the 'pkg/vagrant-openshift-1.0.12.gem' plugin. This can take a few minutes...
Installed the plugin 'vagrant-openshift (1.0.12)'!

$ cd ~/go/
[luke:/home/luke/go] $ vagrant openshift3-local-checkout -u sosiouxme
/home/luke/.rvm/rubies/ruby-1.9.3-p545/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb:298:in `to_specs': Could not find 'vagrant' (>= 0) among 218 total gem(s) (Gem::LoadError)
[...stack trace]

Whaaaaat? I’ve entirely broken vagrant now, and I have no idea how. Vagrant seems just a little more… fragile?… than I was expecting. Fine, let’s move to ruby 2.0 and define a gemset just for vagrant, such that if I hose things up again, it’s contained. (I tried ruby 2.1 first but had an error getting rubygems from rubygems.org… well, that’s not vagrant’s fault.) Wait, I can’t do that, recent vagrant versions are no longer published as a rubygem; I’m supposed to get it from my OS. I have it installed as the vagrant-0.6.5 RPM. If I try to add plugins under rvm, it complains that the vagrant *gem* isn’t installed. Which of course it isn’t… if you do install it, it just tells you not to do that.

OK, so let’s just go with the system ruby that apparently the RPM is expecting.

$ rvm use system
Now using system ruby.
$ vagrant plugin install vagrant-aws
Installing the 'vagrant-aws' plugin. This can take a few minutes...
Installed the plugin 'vagrant-aws (0.6.0)'!
$ bundle
[...]
Using vagrant 1.7.2 from git://github.com/mitchellh/vagrant.git (at master)
 [ should I be worried the version doesn't match?]
Installing vagrant-aws 0.6.0
[...]
$ vagrant openshift3-local-checkout -u sosiouxme
Waiting for the cloning process to finish
Cloning origin ...
Cloning git@github.com:sosiouxme/origin
Cloning source-to-image ...
Cloning git@github.com:sosiouxme/source-to-image
Cloning wildfly-8-centos ...
Cloning git@github.com:sosiouxme/wildfly-8-centos
Cloning ruby-20-centos ...
Cloning git@github.com:sosiouxme/ruby-20-centos
ERROR: Repository not found.
 [yeah, I haven't cloned all the repos... do I need to???]
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo wildfly-8-centos not found. Cloning read-only copy from upstream
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo source-to-image not found. Cloning read-only copy from upstream
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fork of repo ruby-20-centos not found. Cloning read-only copy from upstream
remote: Counting objects: 1, done.
remote: Total 1 (delta 0), reused 1 (delta 0)
Unpacking objects: 100% (1/1), done.
From https://github.com/openshift/origin
 * [new branch] master -> upstream/master
 * [new tag] v0.2 -> v0.2
OpenShift repositories cloned into /home/luke/go/src/github.com/openshift

$ cd src/github.com/openshift/origin/
$ vagrant origin-init --stage inst --os fedora lmeyer-osv3dev
Reading AWS credentials from /home/luke/.awscred
Searching for latest base AMI
Found: ami-0221586a (devenv-fedora_559)
$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
[...]
/home/luke/.vagrant.d/gems/gems/excon-0.43.0/lib/excon/middlewares/expects.rb:6:in `response_call': The key pair 'AWS KEYPAIR NAME' does not exist (Fog::Compute::AWS::NotFound)

OK now what? I really don’t want to have to edit the Vagrantfile and deal with keeping that out of git. I hadn’t quite torn my hair out when a coworker pointed out https://github.com/openshift/vagrant-openshift#aws-credentials, which was considerably further down than I was looking. OK, so I just needed to add AWSKeyPairName to my .awscreds.
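
Between this and the earlier private key stumble, the credentials file picks up at least these two entries (the values here are placeholders, not my real ones):

AWSPrivateKeyPath=/home/luke/.ssh/my-keypair.pem
AWSKeyPairName=my-keypair

And then, try again…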

$ vagrant up --provider=aws
Bringing machine 'openshiftdev' up with 'aws' provider...
[...]
==> openshiftdev: Machine is booted and ready for use!

Finally! And “vagrant ssh” works too! There’s not really much running yet, but I’ll figure that out later. Now what if I want to tear down that box and do something different? Let’s see…

$ vagrant -h
/home/luke/.vagrant.d/gems/gems/vagrant-share-1.1.3/lib/vagrant-share/activate.rb:8:in `rescue in <encoded>': vagrant-share can't be installed without vagrant login (RuntimeError)

Really? Ah, seems I ran into a bug and just need to upgrade vagrant by downloading it from vagrantup.com rather than Fedora. Just like I apparently did long ago and forgot about. Fun.

What would probably have been obvious to anyone who knew vagrant is that adding vagrant-openshift actually adds commands to what vagrant can handle, such as “vagrant openshift3-local-checkout” above. There are a bunch more on the help menu.

So, back to running stuff. This looks promising:

$ vagrant install-openshift3
Running ssh/sudo command 'yum install -y augeas' with timeout 600. Attempt #0
Package augeas-1.2.0-2.fc20.x86_64 already installed and latest version
[...]
$ vagrant test-openshift3
***************************************************
Running hack/test-go.sh...
[...]

It’s not obvious (to me) how to actually run openshift via vagrant. I’m guessing you just vagrant ssh in and run it from /data where everything is compiled. I was kind of hoping for more magic (like, here’s a vagrant command that sets up three clustered etcd servers and five nodes, and you just ssh in and “osc get foo” works). Also I need to try out the libvirt and openstack providers. But that’s all I have time for today…

Yep, easy-peasy!