Concerns for preflight check design

Lately I’m working on preflight checks for OpenShift v3 Ansible installs/upgrades. There is no piece right now that checks that you have everything you might reasonably need set up for an install/upgrade and bails out before doing anything if you don’t. What happens right now is that you get partway through the install/upgrade and then find out… oh, you have the wrong repos enabled or whatever, UGLY ERROR -> Fix it and start over again… bleah. Nobody enjoys SEV1 support calls in the middle of the night. For installs and particularly for upgrades, we’d really like the sysadmin to be able to run a preflight check before their outage window and find out about any common problems at that time.

So my latest conundrum is figuring out what the user expects during a preflight check. This is not as straightforward as you might think. The installer does a pretty good job of figuring out what you meant without you having to specify everything down to the last detail (because humans are not reliably good at doing that). Thing is, it may install and configure a number of things on your systems… just in order to figure out how to run.

This isn’t a big deal in the installer, because when you run an install or upgrade, you expect to install and configure things. Preflight checks are different because you’d like to affect system state as little as possible. The whole idea is to do checks before you make changes. So if we just reuse the logic the installer uses, users may be unpleasantly surprised to find their systems being changed.

So, for example. Pretty much the first thing that we want is facts about the configuration and the systems, which the openshift_facts role provides. This role runs various custom Ansible modules on target systems, which requires several dependencies to be present on those systems. If they aren’t there, they’re installed.

An Origin RPM install requires enabling an Origin repo. Unless you configure one beforehand, for Origin this is usually set up by the openshift_repos role, which is a dependency of the openshift_version role. So if you want to run the preflight checks before an install, you won’t have any Origin repo to check RPMs unless the checks configure this repo like the installer does.

The openshift_version role itself relies on some clever things to determine the version to install. If you’re doing an RPM install, it uses the repoquery tool to determine the precise version of RPMs that are available, so it can match it with the precise version of images to run; thus yum-utils is installed to provide repoquery. If you’re doing an enterprise containerized install, it looks up the precise version of images available by running a docker image on the remote host — and on an RPM-based host, installs and configures firewalld and docker to run that.

So in thinking about this, I’ve tried to determine if there’s any way to tease out just what we need for preflight checks and put that in a shared role, without having to go through as thorough a setup as we would for an install or upgrade. Or if we can make simplifying assumptions to do only what we need. Without going through too detailed an analysis, I think the answer is basically… no. We do not want to create and maintain parallel logic in the preflight checks for the very complex ways in which the installer determines what to do.

Reflecting a bit further, letting preflight config setup alter the systems is not really a problem, practically speaking.  If the user is installing a new cluster or adding hosts to an existing one, the target hosts are not in production yet, so altering them should be acceptable. If the user is upgrading, all of the necessary config and dependencies should already be in place, so hosts won’t be substantially altered. So, just depend on the same logic from the installer (and perhaps improve the user-friendliness of the output when things go wrong even before preflight checks). And very clearly document expectations.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: