libvirt boxen for OpenShift v3

I promise I have not been struggling with vagrant the whole time since my last post. Actually I updated the vagrant-openshift docs and made some other fixes, so the whole thing is a little saner and more obvious to use, and then went on to other stuff. Today I’m just trying to put together OpenShift v3 libvirt boxen to put up for the public next to the virtualbox ones. It should be easy, and it probably is; my problems today all seem to be local.

It would be nice if, just once, vagrant had a little transparency. It doesn’t have a verbose mode, and never tells you where anything is or should be.

$ vagrant box list
aws-dummy-box (aws, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

Ah, yeah… so… where are those defined? What images do they point to, and where were they downloaded from?

The errors are the worst. When something goes wrong, could you please tell me what you think you got from me, what you tried to do with that, and what went wrong? No.

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
Name `origin_openshiftdev` of domain about to create is already taken.
Please try to run `vagrant up` command again.

Just try to figure out what is specifying “origin_openshiftdev” as a domain and what to do about it. Or how to release it so I can, in fact, run vagrant up again.

$ vagrant status
Current machine states:

openshiftdev not created (libvirt)

The Libvirt domain is not created. Run `vagrant up` to create it.
$ vagrant destroy
==> openshiftdev: Domain is not created. Please run `vagrant up` first.

Part of the problem is that I have at least three semi-autonomous bits of vagrant to deal with. There’s vagrant itself, which keeps track of box definitions. There’s the Vagrantfile I’m feeding it from OpenShift Origin, which might interact with the vagrant-openshift plugin (though I don’t think so on vagrant up) but in any case defines what hosts I’m supposed to be creating. Finally, there’s the provider plugin (libvirt in this case) that has to interface with the virtualization to actually manage the hosts. If something goes wrong, I can’t even tell which part is complaining, much less why.
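
A quick way to see which of those three pieces are actually plugins (and so versioned and updated separately from vagrant itself):

$ vagrant --version
$ vagrant plugin list

On my machine the plugin list shows vagrant-libvirt and vagrant-openshift (plus whatever else I’ve installed), each with its own version; the Vagrantfile, by contrast, is just whatever the project ships.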

Enough complaining, what is going on?

The primary input to vagrant is a “box”. This is really just a tarball that contains a minimal Vagrantfile, a metadata file, and the real payload: the disk image of the virtual host. The vagrant “box” is provider-specific – the metadata specifies a provider.
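
For example, listing the contents of a libvirt-flavored box (a hypothetical local copy of the fedora_inst box here; exact file names vary a little by provider):

$ tar tf fedora_inst.box
metadata.json
Vagrantfile
box.img

metadata.json is the piece that names the provider (and, for libvirt boxes, the image format and virtual size); box.img is the disk image itself, and the Vagrantfile carries minimal defaults that get merged with your own.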

When you run vagrant up, the local Vagrantfile should specify which box to start with – a URL to retrieve it and the name for vagrant to import it as. The first run will download and unpack it under ~/.vagrant.d/boxes/<name>/<version>/<provider>/ (note, you can have multiple providers for the same box name/version). Subsequent runs just use that box definition. Simple enough as it goes.
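
So for the boxes in the listing above, the on-disk layout looks something like this (the 0 is the version; unversioned boxes all land under 0):

$ ls ~/.vagrant.d/boxes/fedora_inst/0/libvirt/
box.img  metadata.json  Vagrantfile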

vagrant up also creates a local .vagrant/ directory to keep track of “machines” (which are intended to represent actual running virtual hosts instantiated from boxes). Machines are stored under .vagrant/machines/<name>/<provider>, where the name comes from the Vagrantfile VM definition. In OpenShift’s Vagrantfile we have config.vm.define “openshiftdev”, so for the libvirt provider I could expect to see a directory .vagrant/machines/openshiftdev/libvirt once I’ve brought up a machine. (Under vbox you can define a master and several minions, which would all have different names. I hope we can do that soon with the other providers too.)
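
Once a machine does exist, that directory is mostly provider bookkeeping; the interesting file (as far as I can tell) is id, which for vagrant-libvirt holds the UUID of the libvirt domain:

$ cat .vagrant/machines/openshiftdev/libvirt/id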

I was planning to build a libvirt box from scratch, but then I realized there is a Vagrant plugin “vagrant-mutate” that will take an existing box and change it to another provider. Since we already have boxes defined for vbox I thought I’d just try this out to make a libvirt version of it.
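
In case you don’t already have the plugin, it installs the standard way; the conversion itself is then the single vagrant mutate command shown next.

$ vagrant plugin install vagrant-mutate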

$ vagrant mutate \
  https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box \
  libvirt
Downloading box centos7_virtualbox_inst from https://mirror.openshift.com/pub/vagrant/boxes/openshift3/centos7_virtualbox_inst.box
Extracting box file to a temporary directory.
Converting centos7_virtualbox_inst from virtualbox to libvirt.
 (100.00/100%)
Cleaning up temporary files.
The box centos7_virtualbox_inst (libvirt) is now ready to use.

So far, so good. Or not, because what does “ready to use” mean? Where is it? Turns out, it means said box is stored under my ~/.vagrant.d/boxes directory for use with the next vagrant up. It kept the same name with the provider embedded in it, but if I just change the name…

$ mv ~/.vagrant.d/boxes/centos7_{virtualbox_,}inst
$ vagrant box list
aws-dummy-box (aws, 0)
centos7_inst (libvirt, 0)
fedora_base (libvirt, 0)
fedora_inst (libvirt, 0)
openstack-dummy-box (openstack, 0)

… everything works out fine. So to use that with my openshift/origin Vagrantfile, I just put that name into my .vagrant-openshift.json file like so:

"libvirt": {
  "box_name": "centos7_inst"
},

Note that I don’t need to specify a box_url because the box is already local. Folks will need the box_url to access it once I publish it; their stanza would presumably look something like the sketch below.
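
Purely a sketch – the URL here is a made-up placeholder, not the real published location:

"libvirt": {
  "box_name": "centos7_inst",
  "box_url": "https://example.com/openshift3/centos7_libvirt_inst.box"
},

So let’s vagrant up already…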

$ vagrant up --provider=libvirt
Bringing machine 'openshiftdev' up with 'libvirt' provider...
/home/luke/.vagrant.d/gems/gems/fog-1.27.0/lib/fog/libvirt/requests/compute/list_volumes.rb:32:in `info': 
Call to virStorageVolGetInfo failed: Storage volume not found: 
no storage vol with matching path '/mnt/VMs/origin_openshiftdev.img'
(Libvirt::RetrieveError)

Ah. This is definitely due to some messing around on my part, because I deleted that image as I thought vagrant was saying earlier it was in the way (remember “Name `origin_openshiftdev` of domain about to create is already taken” ?). This error at least seems safe to pin on the libvirt provider, but I’m not sure what to do about it. Shouldn’t libvirt just clone the image from the vagrant box to create a new VM? How did my request to instantiate the “centos7_inst” box as “openshiftdev” get translated into looking for that particular file to exist?

I’m guessing (since grep got me nowhere) that the libvirt provider builds that name from the directory I’m in (origin) plus the machine name from the Vagrantfile (openshiftdev), and uses it as the VM name. Or at least, as the name of a volume from which VMs can be cloned for vagrant usage.

virsh to the rescue

I’m not really very knowledgeable about libvirt, mainly because I’ve been able to run VMs just fine using the graphical virt-manager interface and didn’t really need a lot more. I deleted that image above using virt-manager, figuring it would take care of referential integrity. Now that I’m venturing into the world of scripted VM management, I have been fiddling a little with virsh, so let’s apply that:

# virsh vol-list default
 Name                     Path 
------------------------------------------------------------------------------
[...]
 origin_openshiftdev.img  /mnt/VMs/origin_openshiftdev.img

Hmm, yes, libvirt does actually seem to expect that volume to be there. And then it’s failing trying to use it because the actual file isn’t there. So let’s nuke the volume record, wherever that may be.

# virsh vol-delete origin_openshiftdev.img default
Vol origin_openshiftdev.img deleted

And vagrant up --provider=libvirt suddenly works again.

Updating libvirt boxes

One extra note about using libvirt as a provider: as soon as you use vagrant to start a libvirt box you have downloaded, the vagrant-libvirt plugin makes a copy of the image from the box definition and uses that. The copy is made in libvirt’s default storage pool (unless you tell it otherwise… BTW, quite a few interesting options at the vagrant-libvirt README) and is named <box_name>_vagrant_box_image.img. So my box above translates to /mnt/VMs/centos7_inst_vagrant_box_image.img (I use a separate mount point for my VM storage because it’s just too easy to fill your root fs otherwise). Then when you actually create a VM, it uses a copy-on-write snapshot of that image, which seems to be named after the project and VM definition (my problem volume above, origin_openshiftdev.img). That way it’s a pretty fast, efficient startup from a consistent starting point.
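
Concretely, after a vagrant up with that box, my default pool looks roughly like this, and qemu-img confirms the per-VM volume is just a thin snapshot backed by the box image (output trimmed):

# virsh vol-list default
 Name                                Path
------------------------------------------------------------------------------
 centos7_inst_vagrant_box_image.img  /mnt/VMs/centos7_inst_vagrant_box_image.img
 origin_openshiftdev.img             /mnt/VMs/origin_openshiftdev.img

# qemu-img info /mnt/VMs/origin_openshiftdev.img
image: /mnt/VMs/origin_openshiftdev.img
file format: qcow2
backing file: /mnt/VMs/centos7_inst_vagrant_box_image.img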

Of course this could be a bit confusing if you actually want to update your vagrant box. You might download a new box definition from vagrant’s perspective, but vagrant-libvirt sees it already has a volume with the right name and keeps using that (in fact, once it has copied the volume, you may as well truncate the box.img under ~/.vagrant.d/boxes to save space). You have to nuke the libvirt volume to get it to use the updated box definition. virt-manager seems to do just as well as virsh vol-delete at this (not sure what happened earlier in my case). So e.g.

# virsh vol-delete centos7_inst_vagrant_box_image.img default

Then the next vagrant up with that box will use the updated box definition.
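
Putting it together, the full refresh dance (as I understand it) looks something like this, assuming the box is named centos7_inst and the volume lives in the default pool:

$ vagrant destroy                    # remove the machine built from the old image
$ vagrant box remove centos7_inst    # make vagrant forget the old box so it re-fetches from box_url
# virsh vol-delete centos7_inst_vagrant_box_image.img default
$ vagrant up --provider=libvirt      # downloads the new box and copies a fresh volume into the pool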

Upgrades – as much fun as a barrel of rotten fish

Having neglected my aging personal desktop for some time, I decided it was time to do some upgrades and make it into something I’d enjoy developing on again. Not that I’ve made much time for that lately, but that’s just the point, you know?

First order of business: get dual monitors working again on Linux. My desktop has add-on PCI video boards supplementing the onboard video. I’ve had as many as three monitors hooked up that way, but decided in the end that’s too much. Two is perfect. But the PCI boards aren’t initialized until quite late in the boot process. This worked fine until Fedora 9, at which point some major X11 change was made and this configuration would simply crash no matter what I did. I didn’t have the time to debug it. Staying on Fedora 8 got less appealing as time wore on, and sadly, I mostly left the machine on its Windows boot (which of course has no problem using both monitors) just to browse the web.

Solution: get a fancy PCIe dual-head video card and just use that. Actually, getting this to work required me to do a little digging in the BIOS. There are a number of settings for what to initialize first (onboard, PCI, or PCIe), and had I poked at it enough, I might conceivably have gotten the PCI boards working. Or maybe I tried that before to no avail, don’t remember. Anyway, by initializing the PCIe board first at boot, I have dual monitors under Fedora 17.

Yes, 17! Well, it’s not technically released until Tuesday, but it’s close enough.

Next problem: adding more RAM. After reviewing what my motherboard can handle, I added two 4GB DDR2 modules to bring my RAM up to 10 GB. That part was easy. I’m gonna need that RAM, because I’m all about emulation (Android) and virtualization (VMs!) these days.

Installing Fedora 17 was a little tricky. This computer has seemingly random boot issues. For one thing, it has always refused to boot from USB. That would be the preferred way to try a live distro on it. Burning CDs… so archaic, but at least it works. I saw some BIOS settings related to that, and while they seemed correct, perhaps I should fiddle with them some more. But a live CD worked for now. The other boot issue is that my keyboard is often (but not always!) disabled at the GRUB menu. Today I found BIOS settings for that too. Honestly, who would ever WANT to disable their USB keyboard at boot? And why did it work sometimes? Well, whatever. If only that were the last of the tricks my motherboard had for me.

Having installed F17 and fiddled around a bit, it was time to try out virtualization. I wanted to give the OpenShift LiveCD a try, so I started up virt-manager. It crashed. I tried VirtualBox instead. It wouldn’t run either. The problem was, I didn’t have kernel sources and headers (kernel-devel and kernel-headers) to match my running kernel, in order for kernel modules to be built for virtualization. And when I looked at what was available in the yum repos, there simply weren’t any matches. I.e., no kernel versions matched any versions of header/source available. I would basically have to build my kernel from source to get a match.
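
For the record, this is the version mismatch I was checking for by hand; nothing fancy, but it makes the problem obvious at a glance:

$ uname -r                                    # the kernel actually running
$ rpm -q kernel kernel-devel kernel-headers   # what's installed locally
$ yum list --showduplicates kernel-devel      # what the repos can offer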

I hoped that situation would resolve itself, and when I looked today, it had. New kernel-devel available to match the kernel. Onward and upward! I tried virt-manager. It crashed. I tried VirtualBox. It gave me an error: “AMD-V is disabled in the BIOS. (VERR_SVM_DISABLED)”. Grrr! So I looked up what this is all about. There are two levels of problem.

First, my Athlon 64 does support AMD-V (the flag in /proc/cpuinfo is “svm”). However, this capability can be disabled by the motherboard, and on mine, the Gigabyte GA-M61P-S3, it is disabled with no option in the BIOS to configure it. Why? Gigabyte has offered no explanation. Five years ago, someone ran into this exact same problem using Xen, and the only solution he found was downgrading the BIOS. Downgrade? Yes: the ability to disable this feature arrived with later CPU steppings, and later BIOS versions took advantage of it. I’d be curious whether the problem was rectified in BIOS versions after that exchange (the latest releases are 2007/10 and 2010/08), but I doubt it, since the release notes don’t mention it. I was also idly looking at getting the “new hotness” of an Athlon 64 X2, since the desktop is only about as fast as my crappy laptop, but that would require a BIOS upgrade, which is a move in the wrong direction as far as virtualization goes (and I’m not sure I want to deal with the extra heat). Maybe a third-party BIOS would do the trick? Maybe it’s just time for better hardware. Sadly, AMD seems to be somewhat throwing in the towel against Intel. Can’t really justify backing the underdog anymore :(
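
For anyone in the same boat, the two quick checks are below (the exact dmesg wording is from memory, so treat it as approximate):

$ grep -c svm /proc/cpuinfo            # nonzero: the CPU itself has AMD-V
$ dmesg | grep -i 'disabled by bios'   # kvm_amd logs something like this when the BIOS locks it out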

Second, with AMD-V disabled, virt-manager was running into an SELinux violation trying to do non-hardware-assisted emulation. Took me a while to find that link, but it shouldn’t have been hard, given that’s exactly the error I was hitting. Too bad it’s not solved in F17. At least that one is easy enough to resolve! So I at least have QEMU to work with. It’s gonna be a dog, though.

Now, the two other major things I get to deal with: GRUB2 (introduced in F16, which I haven’t really used) and the migration from SysV init to systemd.

I saw GRUB2 at work when I tried Ubuntu a while back, but didn’t really recognize it at the time and ran screaming. I’m so used to just editing my GRUB menus directly, it seems so much more complicated. It’s a bigger leap than moving from LILO years ago. But I think I can begin to see some of the benefits. Also, it seems to be pretty good at detecting the existing OSes on the system and providing boot options for them. That’s nice, since I rarely work on a single-boot system anymore. I just need to spend a little more time with guides like this one to get the hang of it.

The transition to systemd was a rude awakening too. How do I turn on sshd? How do I configure the firewall so I can VNC and ssh in? (The firewall was particularly confusing, since there was an abortive attempt at replacing it with something else that made it onto the beta but was then reverted.) Fortunately it doesn’t look too difficult.
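
The sshd half, at least, is just the standard systemd incantation (the old chkconfig/service habits map onto it directly):

# systemctl enable sshd.service    # the old 'chkconfig sshd on'
# systemctl start sshd.service     # the old 'service sshd start'
# systemctl status sshd.service    # confirm it's actually running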

Wow, this was a pretty big release for Fedora!

Oh yeah, one more thing: getting desktop switching on the dual monitors to work the way it used to and obviously ought to, with both monitors switching when you switch desktops. I read about that decision when Gnome 3 came out with F16, and it made no sense to me. There was some hackish tweak for getting it to work the way it should, but I can’t find it right now. It all seemed totally unstable under F16 – gnome-shell would freeze randomly all the time. I hope this is better in F17. I’m not encouraged, though, by the fact that most of the gnome-tweak extensions I want to use seem to be broken at this time.