Exploring 3 – docker

More unreliable ruminations –

When Docker started to make a splash, I took a quick look at it, you know, the basic tutorial. All very nice, but not too much depth. And even though the rest of the OpenShift team has pivoted to this platform fairly quickly, I’ve been waiting until I would actually have some real time to devote to it before digging in deeper.

Although I know that at the pace this stuff is moving, RHEL 7 is already far behind, I brought up a RHEL 7 host and started running through https://access.redhat.com/articles/881893 which has a little more meat to it as far as introducing Docker capabilities. Under RHEL 7, Docker is in the “extras” channel (and relies on one package in the “optional” channel). It’s useful to know that the “extras” channel is actually supported (unlike “optional”), but not on the same terms as the rest of RHEL – things in this channel are allowed to move quickly and break compatibility. That’s a good place for Docker, since I know our team is still collaborating heavily with Docker to get in features needed for OpenShift. I expect there will be a sizeable update for RHEL 7.1, although chances are we’ll be using Atomic anyway.

Atomic ships tmux but not screen. I guess it’s time for me to finally make the leap. As tempting as it is to just remap the meta key to C-a, I should probably get used to using the defaults.
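For the record, the remap I’m resisting is only a few lines of tmux configuration – sketched here against a scratch file rather than ~/.tmux.conf, so nothing actually changes:

```shell
# The screen-style prefix remap I'm tempted by, written to a scratch file
# instead of ~/.tmux.conf so it stays hypothetical:
cat > /tmp/tmux-screen.conf <<'EOF'
set -g prefix C-a
unbind C-b
bind C-a send-prefix
EOF
grep -c prefix /tmp/tmux-screen.conf
```

Loading it for real would just be `tmux source-file /tmp/tmux-screen.conf`.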

The first thing that would probably help me to understand Docker is an analogy with Docker registries/repositories and git. Docker is clearly informed by git and VCS, using some of the same verbs (pull, push, commit, tag) but assigning different semantics.

This article clarified the similarities and differences in terms (although it’s not clear when it was written – looks like about a year ago… seriously, an undated blog post on new technology? How does this keep happening?). Docker Hub is approximately like GitHub… Docker repositories are approximately like GitHub repos. The location of the image layer information doesn’t seem to be the same for me, but I don’t know if that’s because Docker changed in the meantime or because it’s packaged differently for RHEL/Atomic.

docker pull

So, you “docker pull” an image. It’s a little confusing where you’re pulling it from and to. “To” turns out to be clearest: a local cache, whose location nothing ever tells you, but on RHEL 7 it looks like it’s under /var/lib/docker/ – there’s image metadata at /var/lib/docker/graph/ and perhaps some actual content at /var/lib/docker/devicemapper/, though I’m having trouble seeing exactly how the image data is stored. I’m sure this is confusing for a reason. Open question for now.

Here’s a handy alias:

# alias json="python -mjson.tool <"

Now you can pretty-print json without having to think much about it:

json /var/lib/docker/graph/216c11b99bd09033054595d08c28cf27dabcc1b18c2cd0991fce6b1ff1c0086f/json | less
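(That alias assumes a python binary with the stdlib json.tool module on the path; on newer systems the binary is python3, but the module is the same. A quick self-contained demo on a scratch file:)

```shell
# Same pretty-printing trick, demonstrated on a scratch file
# (python3 here; swap in plain "python" where that's what's installed)
echo '{"b": [2, 3], "a": 1}' > /tmp/sample.json
python3 -m json.tool < /tmp/sample.json
```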

Docker storage is configurable in /etc/sysconfig/docker-storage and under Atomic, perhaps predictably, it is customized to live under /dev/atomicos/. Though there’s still plenty under /var/lib/docker.

So this is a bit like a system-wide git repository. You can contact as many “remotes” (registries) as you like, and pull down “branches” (images) composed of successive “commits” (layers), potentially with “tags” (tags! – though Docker tags do double duty, serving both as fixed points in time and as moving pointers like branches). Once they’re present locally you can fire them up, modify them (with another commit) and push the results back to a registry.

It’s less than crystal clear to me how “docker pull” chooses a remote, i.e. how registries are determined. OK, if you “docker pull registry.access.redhat.com/rhel” it should be apparent where that’s coming from. But despite Docker Hub reportedly being disabled, if I “docker pull ubuntu” or “docker pull nginx”, those load up just fine – from where? Evidently Docker Hub isn’t disabled after all. Here’s how it seems to work:

docker pull <word, e.g. "ubuntu">    = get images from the public <word> repository on Docker Hub
docker pull <word>/<repo>            = get images from the <repo> repository owned by the <word> account on Docker Hub
docker pull <hostname or IP>/<repo>  = get images from the <repo> repository on another registry

In all cases, you can add a :tag to pull only a specific tag (and any images it is based on) rather than all of the tags in the repository.
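My reading of those rules, sketched as a toy shell function – this is my approximation, not Docker’s actual resolution code, and it ignores registry ports like host:5000:

```shell
# Toy approximation of docker pull name resolution (ignores registry ports)
resolve() {
  ref="${1%:*}"                      # strip a trailing :tag if present
  case "$ref" in
    */*)
      first="${ref%%/*}"
      # a dot or "localhost" in the first component marks it as a registry host
      case "$first" in
        *.*|localhost) echo "registry $first, repo ${ref#*/}" ;;
        *)             echo "Docker Hub, repo $ref" ;;
      esac ;;
    *)  echo "Docker Hub, official repo $ref" ;;
  esac
}
resolve ubuntu                           # Docker Hub, official repo ubuntu
resolve sosiouxme/mongodb                # Docker Hub, repo sosiouxme/mongodb
resolve registry.access.redhat.com/rhel  # registry registry.access.redhat.com, repo rhel
```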

As with git repos, you have a local concept of the remote repo which can be out of sync. So you have to push and pull to sync them up as needed.

docker commit / build / tag

If you run an image as a container, you can then commit the result as an image. If you commit it with the same name as an existing repository, it’s implicitly tagged as :latest.
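A dry-run sketch of that cycle – the image and repository names are hypothetical, and setting DOCKER="echo docker" prints the commands rather than executing them (drop the echo to run it for real):

```shell
# Dry-run: echo the docker commands instead of executing them
DOCKER="echo docker"
$DOCKER run --name scratch rhel touch /etc/modified    # change something inside a container
$DOCKER commit scratch sosiouxme/rhel-modified         # snapshot it; implicitly tagged :latest
$DOCKER rm scratch                                     # done with the container itself
```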

Similarly you can use “docker build” with a Dockerfile that specifies base image and commands to run against it, then commit the result as an image in a repository.

Finally, you can just re-tag any image in the local cache with any repository and tag you want (within the bounds of syntax, which are pretty loose). So “docker tag” doesn’t just apply tags (including moving tags, i.e. branches) but repository names too.

docker push

Having created an image in a repo, docker push is the reverse of docker pull… and the repo indicates where it will go.

You can’t docker push to one of the root repos (like just plain “mongodb”). You can of course pull that, re-tag it with your own Docker Hub id (e.g. “docker tag mongodb sosiouxme/mongodb”) and then push it (assuming you’ve logged in and want it on your Docker Hub account).

Finally if you have tagged your image with a repo name that includes hostname/IP, then docker push will try to push it to a registry at that hostname/IP (assuming it exists and you have access). RHEL 7 ships docker-registry, but Atomic does not at this point – and why should it when you can just run the registry itself in a container?
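That would look something like the following (again dry-run style, with the leading echo just printing the commands; the plain “registry” image and port 5000 were the upstream defaults at the time):

```shell
# Dry-run of pushing to a registry running in a container
DOCKER="echo docker"
$DOCKER run -d -p 5000:5000 --name registry registry   # the registry itself, as a container
$DOCKER tag sosiouxme/mongodb localhost:5000/mongodb   # the repo name carries the destination
$DOCKER push localhost:5000/mongodb                    # goes to the local registry
```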


Checking out Hibernate with STS

I have cause to revisit my post about importing a git checkout into Eclipse. I need to take a look at the Hibernate code, which is on github at git://github.com/hibernate/hibernate-orm.git, and given the somewhat convoluted nature of this code, I need an IDE to help navigate it.

Now, I hear that with Eclipse Indigo (3.7), which is included with the latest STS 2.9 (which is what I use), the EGit plugin is included out of the box (and for the purposes of this post, a completely stock install is what I’m using). That’s helpful. See, previously, if you wanted to do something with git, you would find no evidence within Eclipse that it could handle git at all. If you figured “there must be an extension for that” and searched for “git” from the extensions wizard, there would be no hits – because what you needed to search for was “JGit” or “EGit”, you big dummy. An example of the low discoverability I consider pervasive in Eclipse/STS. But I digress.

At least EGit has had a couple years to bake since my post. I went to File->Import->Git->Projects from Git and put in the URI above. This seems pretty straightforward:

[screenshot: the Source Git Repository page of the import wizard]

I’m not sure why it parses the URI into Host and Repository path here. Is there some reason you’d want to customize these?

In the next step, I pick the branches from the repo I want and proceed to the “local destination” dialog.

[screenshot: the Local Destination page of the import wizard]

These steps might be somewhat confusing to those who don’t know git and just want to take a look at some code. Since git is distributed, you don’t just get a point-in-time checkout from a repo, you get your own copy of the repo – or as much of it as you want. Basically it’s asking where I want my copy of the repository and checkout to go. The checkout (“initial branch” here) will go in the directory, and the repo will go in a .git subdirectory. “origin” is the name given to the repository I cloned this from, in case I want to sync with that later. That might be kind of obvious to someone familiar with git, but how about some tips for those who aren’t?

My question: why doesn’t this all simply default to being inside the workspace? What’s a workspace for, if not the project contents? As you can see, the default is to create a ~/git directory and checkout/clone the repo there.

Next step, three inscrutable options for how to deal with the resulting project(s) that have been checked out:

[screenshot: the three project-import options]

OK. These seriously need some explanation. What do these do?

“Import existing projects” gets me nowhere in this case, as it requires Eclipse project descriptors to be included in the checkout, and they’re not. Arguably, they probably shouldn’t be. I just get the error “no projects found” if I try this. But that means I need to figure out myself how to get Eclipse/STS to interpret this checkout properly.

“Use the New Project wizard” is an option I don’t really understand. It just dumps you into the same new project wizard you get by clicking the new project button (generally the first button in the toolbar). This is also where you end up if you click “Finish” instead of “Next” anywhere along the way. I guess I could make use of the directory just created. I also can’t go “back” and choose another option from here; if I cancel, I’m back to square one. In general, I find the New Project wizard one of the most confusing things about Eclipse/STS: there are so many options, many sounding similar yet meaning something completely different, and no explanation of what you can expect to get. Do I really have to go looking for doc that should be a click away? I digress.

“Import as general project” basically just creates a project with the given content and no organization. STS recognizes the different file types, of course, but there’s no concept of where the classpaths begin, how to build and test the project, anything like that – just plain directories with content. This won’t get me to my goal, which is to be able to look up class hierarchies, implementors of interfaces, etc. However, having done this, I can try to configure the project to get it to where STS understands these things.

I’m interested in the 3.6 branch of Hibernate, which is a Maven project (you can tell from the pom.xml – woe betide you in the Java world if you don’t recognize Maven when you see it; the “master” branch seems to be using Gradle). So I can right-click the project and Configure -> Convert to Maven Project.

By the way, let me point out something that didn’t work at all: creating a new project with the wizard “Maven -> Checkout Maven Projects from SCM”.

[screenshot: the Checkout Maven Projects from SCM dialog]

This is apparently not aware of the EGit plugin, because there’s no SCM protocol listed here (the first box is greyed out). If I click “Finish” here, nothing happens except the dialog exits. I think it would probably work if I added an m2e SCM connector as the link suggests, but how would I know to do that?

Alright, so now I have a Maven project. Right away in the top-level pom.xml I get a “Project build error: Unresolveable build extension: Plugin org.jboss.maven.plugins:maven-jdocbook-style-plugin:2.0.0 or one of its dependencies could not be resolved: Could not find artifact org.jboss.maven.plugins:maven-jdocbook-style-plugin:jar:2.0.0 in central (http://repo1.maven.org/maven2)”. I happen to know what this is about because I know there are a bunch of JBoss dependencies not in Maven Central. How would I know that if I didn’t know? Google, I guess. Fortunately searching for that exact error message gets me right to a StackOverflow question about exactly the same thing, which someone has helpfully solved. I love SO, I just hate that it has to exist. Documentation is always about how to use something the right way, not what to do when something goes wrong. SO fills that gap.
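I happen to know the missing bits live in the JBoss repository, so concretely the fix amounts to a profile fragment along these lines – the repository URL here is from my memory of that era, so verify it before relying on it (written to a scratch path rather than a real settings.xml):

```shell
# A settings.xml profile fragment adding the JBoss repository
# (URL from memory; verify). Scratch path so nothing real is touched.
cat > /tmp/jboss-profile.xml <<'EOF'
<profile>
  <id>jboss-public</id>
  <repositories>
    <repository>
      <id>jboss-public-repository-group</id>
      <url>https://repository.jboss.org/nexus/content/groups/public/</url>
    </repository>
  </repositories>
</profile>
EOF
grep -c '<repository>' /tmp/jboss-profile.xml
```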

So, add the repository information to the pom.xml – or, better, to my Maven settings.xml (which I had to create, since STS provides its own Maven in this setup) – and on to the next problem. Two, actually (always seems to be the way of it – removing one build problem just uncovers more). These are related to “Missing artifact commons-logging”. A little Google sauce on that turns up this blog post (like the name – kinda like my blog!) about the death of the commons-logging dependency. Gotta love trying to support these old builds from a public, ever-changing repo. Problem is, the Hibernate pom (actually the parent pom, which is in a subdirectory! huh?) already uses the hack from that article, but the repo supplying the dummy dependencies seems to be down. So perhaps I should try the exclusions suggested by commenters on that blog? I found something that looks handy: in the pom dependency hierarchy, right-click and choose “Exclude Maven artifact”:

[screenshot: the “Exclude Maven artifact” context menu]

Sadly, this doesn’t work:

[screenshot: the exclusion attempt failing with an error]

But here’s another StackOverflow suggestion. This seems to work, after removing the existing commons-logging dependencies, adding the suggested ones to the parent pom, and (finally) right-clicking on the project and choosing Maven -> Update project configuration. The errors are gone, and (I suspect) so is all the Maven-fu I can expect today.

Unfortunately I’m still not at my goal – I just have the Maven nature working now.

Turns out, this wasn’t quite the right path. What I’m looking at here are multiple Maven projects, with interdependencies. There’s no way I’m ever going to get anything useful from this in a single STS project. What I need to do is import these as multiple projects. In the meantime, delete the existing project (but leave the checkout) so it doesn’t get in the way.

So here’s what I do: File -> Import -> Existing Maven Projects, and enter the path to my local checkout as the “Root Directory”.

If I select all the projects, they’ll all be created as interdependent workspace projects, each with build path and so forth configured according to Maven.

With lots of errors, of course… thousands, in fact. But let me start with the Maven problems, which are probably the source of the rest. Looks like all of the Maven errors are of the form “Plugin execution not covered by lifecycle configuration: org.jboss.maven.plugins:maven-injection-plugin:1.0.2:bytecode (execution: default, phase: compile)” – with a different plugin each time. I remember the import screen warned about some problems that would need to be resolved later – this seems to be what it was talking about.

Well, much later now, I think the Maven errors were mostly irrelevant. They were due to a change in the m2eclipse plugin which broke the world for a lot of Maven users in Eclipse. Most were plugin executions that looked safe to have m2eclipse “ignore”, as recommended there. I went ahead and ran some of the goals that looked important (antrun:run and injection:bytecode in hibernate-entitymanager, the latter also in hibernate-core) from the command line; not sure they made much difference. I did Maven -> Update Project Configuration on everything I had changed and most of the red X’s went away.

I also ran into this problem and crashed a few times just by mousing over the “Window->Browser” menu before adding “-Dorg.eclipse.swt.browser.DefaultType=mozilla” to my STS.ini to avoid it.

At this point, the only problem seems to be that hibernate-entitymanager has a ton of tests with imports like this:

import org.hibernate.ejb.metamodel.Customer_;
import org.hibernate.ejb.metamodel.Order;
import org.hibernate.ejb.metamodel.Order_;

… and then goes on to use these classes with underscores, which aren’t there. Evidently they’re supposed to be generated at some point – my guess is they’re the JPA 2 static metamodel, which an annotation processor (hibernate-jpamodelgen or similar) generates at build time. I don’t really care about running these tests, though; I just wanted to look at the framework code. So although STS reports 14382 Java problems, I can consider my work done here. Boy, that was easy!

One more note: I went back and added the git SCM connector for m2eclipse to try it out. It worked… but poorly. The way that worked was to select “git” for the scheme, then put in the git:// URI for the project, then wait for a popup to select the projects to import. If I reversed the order or didn’t wait, I got either an error or nothing after hitting “Finish”. Hmm… hope that’s better in the next update. And, interestingly… this time the checkout/repo went into the workspace.