OpenShift logging and metrics

Server logs aren’t usually a very exciting topic. But if you’re a sysadmin of an OpenShift Enterprise deployment with hundreds of app servers coming and going unpredictably, managing logs can get… interesting. Tools for managing logs are essential for keeping audit trails, collecting metrics, and debugging.

What’s new

Prior to OpenShift Enterprise 2.1, gear logs were simply written to files on the gear. Simple and effective. But this is not ideal, for a number of reasons:

  1. Log files take up your gear storage capacity. It is not hard at all to fill up your gear with logs and DoS yourself.
  2. Log files go away when your gear does. Particularly for scaled applications, this is an unacceptable loss of auditability.
  3. Log file locations and rotation policies are at the whim of the particular cartridge, and thus inconsistent.
  4. It’s a pain for administrators to gather app server logs for analysis, especially when they’re spread across several gears on several nodes.

OSE 2.1 introduced a method to redirect component and gear logs to syslogd, which is a standard Linux service for managing logs. In the simplest configuration, you could have syslog just combine all the logs it receives into a single log file (and define rotation policy on that). But you can do much more. You can filter and send log entries to different destinations based on where they came from; you can send them to an external logging server, perhaps to be analyzed by tools like Splunk. Just by directing logs to syslog we get all this capability for free (we’re all about reusing existing tools in OpenShift).
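As a tiny illustration of the sort of thing this enables, a couple of rsyslog rules can split traffic out by source tag and also ship everything to a central host. This is just a sketch; the tag shown is the frontend tag discussed later, while the file name and log host are invented for illustration:

# route entries from one component, identified by its syslog tag, to a dedicated file
:programname, isequal, "openshift-node-frontend"    /var/log/openshift_frontend.log
# and forward everything to an external log server over TCP
*.*    @@loghost.example.com:514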

Where did that come from?

Well, nothing is free. Once you’ve centralized all your logging to syslogd, then you have the problem of separating entries back out again according to source so your automation and log analysis tools can distinguish the logs of different gears from each other and from other components. This must be taken into account when directing logs to syslogd; the log entries must include enough identifying information to determine where they came from down to the level of granularity you care about.

We now give instructions for directing logs to syslog for OpenShift components too; see the relevant sections of the Administration Guide for the details. Redirecting logs from OpenShift components is fairly straightforward. There are separate places to configure if you want to use syslog from the broker rails application, the management console rails application, and the node platform. We don’t describe how to do this for MongoDB, ActiveMQ, or httpd, but those are standard components and should be straightforward to configure as needed. The instructions currently leave out the httpd servers hosting the broker and console rails apps; but the main items of interest in those logs are error messages from the actual loading of the rails apps, which (fingers crossed) shouldn’t happen.

Notice that when configuring the node platform logging, there is an option to add “context”, that is, the request ID and app/gear UUIDs where relevant. Adding the request ID allows connecting what happened on the node back to the broker API request that spawned the action; previously this request ID was often shown in API error responses, but was only logged in the broker log. Logging the request ID alongside the resulting node actions in syslog now makes it a lot easier to get the whole picture of what happened with a problem request, even if the gear was destroyed after the request failed.
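For reference, the node-side settings live in /etc/openshift/node.conf. This is a minimal sketch of the kind of settings involved; I’m writing the key names from memory, so treat them as assumptions and check the Administration Guide for the exact spelling:

# /etc/openshift/node.conf (sketch; key names as I recall them, verify against the guide)
PLATFORM_LOG_CLASS=SyslogLogger
PLATFORM_LOG_CONTEXT_ENABLED=1
PLATFORM_LOG_CONTEXT_ATTRS=request_id,app_uuid,container_uuid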

Distinguishing gear logs

There are gear logs from two sources to be handled. First, we would like to collect the httpd access logs for the gears, which are generated by the node host httpd proxy (the “frontend”). Second, we would like to collect logs from the actual application servers running in each gear, whether they be httpd, Tomcat, MongoDB, or something else entirely.

Frontend access logs

These logs were already centralized into /var/log/httpd/openshift_log and included the app hostname as well as the backend address the request was proxied to. A single httpd option, “OpenShiftFrontendSyslogEnabled”, adds logging via “logger”, the standard command-line tool for writing to syslog. Every entry is tagged with “openshift-node-frontend” to distinguish frontend access logs from any other httpd logs you might write.

OSE 2.1 adds the ability to look up and log the app and gear UUIDs. Why is that needed? A single application may have multiple aliases, so it is hard to automatically collate all of the log entries for one application by hostname alone. An application could also be destroyed and re-created with the same address, even though OpenShift considers it a different app. And the same application may have multiple gears, which can come and go or be moved between hosts; the backend address for a gear could even be reused by a different gear after the original has been destroyed.

In order to uniquely identify an application and its gears in the httpd logs for all time, OSE 2.1 introduces the “OpenShiftAnnotateFrontendAccessLog” option, which adds the application and gear UUIDs as fields in the log messages. The application UUID is unique to an application for all time (another app created with exactly the same name will get a different UUID) and shared by all of its gears. The gear UUID is unique to each gear; note that the UUID (Universally Unique ID) is different from the gear UID (User ID), which is just a Linux user number and may be shared with many other gears. Scale an application down and back up, and even if the re-created gear has the same UID as a previous gear, it will have a different UUID. But note that if you move a gear between hosts, it retains its UUID.
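In configuration terms this is just a pair of httpd directives in the node frontend configuration. Where exactly they go and the value syntax are described in the Administration Guide, so take this as a sketch rather than a recipe:

# node frontend httpd configuration (sketch; see the Administration Guide for the exact file)
OpenShiftFrontendSyslogEnabled true
OpenShiftAnnotateFrontendAccessLog true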

If you want to automatically collect all of the frontend logs for an application from syslog, set the “OpenShiftAnnotateFrontendAccessLog” option and collate the entries by application UUID. Then your httpd log entries look like this:

Jun 10 14:43:59 vm openshift-node-frontend[6746]: 192.168.122.51 php-demo.openshift.example.com - - [10/Jun/2014:14:43:59 -0400] "HEAD / HTTP/1.1" 200 - "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" (3480us) - 127.1.244.1:8080/ 53961099e659c55b08000102 53961099e659c55b08000102

The “openshift-node-frontend” tag is added to these syslog entries by logger (followed by the process ID which isn’t very useful here). The app and gear UUIDs are at the end there, after the backend address proxied to. The UUIDs will typically be equal in the frontend logs since the head gear in a scaled app gets the same UUID as the app; they would be different for secondary proxy gears in an HA app or if you directly requested any secondary gear by its DNS entry for some reason.
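With the UUID in every entry, pulling one application’s frontend traffic out of the stream is a one-liner in rsyslog. This is just an illustrative sketch using the UUID from the example above; the output file name is invented:

# collect all frontend entries for one application by its UUID
:msg, contains, "53961099e659c55b08000102"    /var/log/openshift_php-demo_frontend.log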

Gear application logs

In order to centralize application logs, it was necessary to standardize cartridge logging so that all logs go through a common mechanism that can then be centrally configured. You might think this would just be syslog, but it was also a requirement that users be able to keep their logs in their gear if desired, and getting syslog to navigate all of the permissions necessary to lay down those log files with the right ownership proved difficult. So instead, all cartridges now must log via the new utility logshifter (our first released component written in “go”, as far as I know). logshifter writes logs to the gear app-root/logs directory by default, but it can also be configured (via /etc/openshift/logshifter.conf) to write to syslog. It can further be configured so that the end user can override this and have logs written to gear files again (which may save them from having to navigate whatever logging service ends up handling syslogs when all they want to do is debug their app).
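To give a feel for it, here is a sketch of the relevant part of /etc/openshift/logshifter.conf. I’m writing the key names from memory, so treat them as assumptions and verify them against the file actually shipped:

# /etc/openshift/logshifter.conf (sketch; key names are an assumption, check the shipped file)
# send gear application logs to syslog instead of files under app-root/logs
outputtype=syslog
# allow gear users to override the destination back to files if they prefer
outputtypefromenviron=true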

Here, distinguishing which gear created each log entry requires somewhat more drastic measures. We want to annotate every entry with its gear of origin, but we can’t trust each gear to self-report accurately (a gear could spoof log traffic as coming from another gear, or from something else entirely). So the context information is added by syslog itself via a custom rsyslog plugin, mmopenshift. Using this plugin requires rsyslog version 7, which (to avoid conflicting with the version shipped in RHEL) is actually shipped in a separate RPM, rsyslog7. So to usefully consolidate gear logs into syslog, you really have to replace your entire rsyslog with the newer one. This might seem extreme, but it’s actually not too bad.

Once this is done, any logs from an application can be directed to a central location and distinguished from other applications. This time the distinguishing characteristics are placed at the front of the log entry, e.g. for the app server entry corresponding to the frontend entry above:

2014-06-10T14:43:59.891285-04:00 vm php[2988]: app=php ns=demo appUuid=53961099e659c55b08000102 gearUuid=53961099e659c55b08000102 192.168.122.51 - - [10/Jun/2014:14:43:59 -0400] "HEAD / HTTP/1.1" 200 - "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"

The example configuration in the manual directs these to a different log file, /var/log/openshift_gears. This log traffic could be directed to /var/log/messages like the default for everything else, or sent to a different destination entirely.
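For the curious, the shape of the rsyslog7 side looks roughly like this. The property name used to test for gear context is an assumption on my part, so use the template from the Administration Guide rather than copying this:

# /etc/rsyslog.d/openshift_gears.conf (rough sketch only)
module(load="mmopenshift")
action(type="mmopenshift")
# if the message carries gear context, write it to the gear log instead of /var/log/messages
if $!OpenShift!GearUuid != "" then {
    action(type="omfile" file="/var/log/openshift_gears")
    stop
}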

Gear metrics

Aside from just improving log administration capabilities, one of the motivations for these changes is to enable collection of arbitrary metrics from gears (see the metrics PEP for background). As of OSE 2.1, metrics are basically just implemented as log messages that begin with “type=metric”. These can be generated in a number of ways:

  • The application itself can actively generate log messages at any time; if your application framework provides a scheduler, just have it periodically write lines beginning with “type=metric” to stdout, and logshifter will bundle these messages into the rest of your gear logs for analysis (see the sketch after this list).
    • Edit 2014-06-25: Note that these have a different format and tag than the watchman-generated metrics described next, which appear under the “openshift-platform” tag and aren’t processed by the mmopenshift rsyslog plugin. So you may need to do some work to have your log analyzer consider these metrics.
  • Metrics can be generated passively by the openshift-watchman service in its periodic node-wide run. Watchman can generate metrics in several ways:
    • By default it generates standard metrics from cgroups for every gear, including RAM, CPU, and storage.
    • Each cartridge can indicate in its manifest that it supports metrics, in which case the bin/metrics script is executed and its output is logged as metrics. No standard cartridges shipped with OSE support metrics at this time, but custom cartridges could.
    • Each application can create a metrics action hook script in its git repo, which is executed with each watchman run and its output logged as metrics. This enables the application owner to add custom metrics per app.
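As a made-up example of the first approach (app-generated metrics), a scheduled job in the gear only needs to print lines like this to stdout; the metric field names after “type=metric” are entirely up to you:

#!/bin/bash
# emit a custom metric for this gear; logshifter picks up stdout like any other gear output
data_kb=$(du -sk "$OPENSHIFT_DATA_DIR" | cut -f1)
echo "type=metric data_dir_kb=${data_kb}"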

Note that the cartridge and action hook metrics scripts have a limited time to run, so that they can’t indefinitely block the metrics run for the rest of the gears on the node. All of this is configurable with watchman settings in node.conf. Also note that watchman-generated logs are tagged with “openshift-platform”, e.g.:

Jun 10 16:25:39 vm openshift-platform[29398]: type=metric appName=php6 gear=53961099e659c55b08000102 app=53961099e659c55b08000102 ns=demo quota.blocks.used=988 quota.blocks.limit=1048576 quota.files.used=229 quota.files.limit=80000

The example rsyslog7 and openshift-watchman configuration will route watchman-generated entries differently from application-server entries since the app UUID parameter is specified differently (“app=” vs “appUuid=”). This is all very configurable.

I am currently working on installer options to enable these centralized logging options as sanely as possible.


OpenShift, Apache, and severe hair removal

I just solved a problem that stumped me for a week. It was what we in the business call “a doozy”. I’ll share here, mostly to vent, but also in case the process I went through might help someone else.

The problem: a non-starter

I’ve been working on packaging all of OpenShift Enterprise into a VM image with a desktop for convenient hacking on a personal/throwaway environment. It’s about 3 GB (compressed) and takes an hour or so to build. It has to be built on our official build servers using signed RPMs via an unattended kickstart. There have been a few unexpected challenges, and this was basically the cherry on top.

The problem was that the first time the VM booted… openshift-broker and openshift-console failed to start. Everything else worked, including the host-level httpd. Those two (which are httpd-based) didn’t start, and they didn’t leave any logs to indicate why. They didn’t even get to the point of making log files.

And the best part? It only happened the first time. If you started the services manually, they worked. If you simply rebooted after starting the first time… they worked. So basically, the customer’s first impression would be that it was hosed… even though it magically starts working after a reboot, the damage is done. I would look like an idiot trying to release with that little caveat in place.

Can you recreate it? Conveniently?

WTF causes that? And how the heck do I troubleshoot? For a while, the best I could think of was starting the VM up in runlevel 1 (catch GRUB at boot and add the “1” parameter), patching in diagnostics, and then proceeding to init. After one run, if I didn’t have what I needed… recreate the VM and try again. So painful that I literally just avoided it and busied myself with other things.

The first breakthrough was when I tried to test the kinds of things that happen only at the first boot after install. There are a number, potentially – services coming up for the first time and so forth. Another big one is network initialization on a new network and device. I couldn’t see how that would affect these services (they are httpd instances binding only to localhost), but I did experiment with changing the VM’s network device after booting (remove the udev rule, shut down, remove the device, add another), and found that it did indeed cause the failure reliably on the next boot.

So it had to be something to do with network initialization.

What’s the actual error?

Being able to cause it at will on reboot meant much easier iteration of diagnostics. First I tried just adding a call to ifconfig in the openshift-broker init script. I couldn’t see anything in the console output, so I assumed it was just being suppressed somehow.

Next I tried to zero in on the actual failure. When invoked via init script, the “daemon” function apparently swallows console output from the httpd command, but it provides an opportunity to add options to the command invocation, so I looked up httpd command line options and found two that looked helpful: “-e debug -E /var/log/httpd_console”:

-e level
Sets the LogLevel to level during server startup. This is useful for temporarily increasing the verbosity of the error messages to find problems during startup.

-E file
Send error messages during server startup to file.

This let me bump up the logging level at startup and capture the server startup messages. (Actually probably only the latter matters. Another one to keep in mind is -X which starts it as a single process/thread only – helpful for using strace to follow it. Not helpful here though.)
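For reference, the flags in isolation look like this; in my case they were tacked onto the command that the init script builds, so take this as illustrative rather than exactly what ran:

# start httpd with LogLevel debug during startup, and send startup errors to a file
/usr/sbin/httpd -e debug -E /var/log/httpd_console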

This let me see the actual failure:

 [crit] (EAI 9)Address family for hostname not supported: alloc_listener: failed to set up sockaddr for 127.0.0.1

Apparently the httpd startup process tries to bind to network interfaces before it even opens logs, and this is what you get when binding to localhost fails.

What’s the fix?

An error is great, but searching the mighty Google for it was not very encouraging. There were a number of reports of the problem, but precious little about what actually caused it. The closest I could find was this httpd bug report:

Bug 52709 – Apache can’t bind to 127.0.0.1 if eth0 has only IPv6

[…]

This bug also affects httpd function ap_get_local_host() as described in http://bugs.debian.org/629899
Httpd will then fail to get the fully qualified domain name.

This occurs when apache starts before dhclient finished its job.

Here was something linking the failure to bind to 127.0.0.1 to incomplete network initialization by dhclient. Suddenly the fact that my test “ifconfig” at service start had no output did not seem like a fluke. When I started the service manually, ifconfig certainly had output.

So, here’s the part I don’t really claim to understand. Apparently there’s a period after NetworkManager does its thing and other services are starting where, in some sense at least, the network isn’t really available. At least not in a way that ifconfig can detect, and not in a way that allows httpd to explicitly bind to localhost.

As a workaround, I added a shim service that would wait for network initialization to actually complete before trying to start the broker and console. I could add a wait into those service scripts directly, but I didn’t want to hack up the official files that way. So I created this very simple service (openshift-await-eth0) that literally just runs “ifconfig eth0” and waits up to 60 seconds for it to include a line beginning “inet addr:”. Notably, if I have it start right after NetworkManager, it finishes immediately, so it seems the network is up at that point, but goes away just in time for openshift-broker and openshift-console to trip over it. So I have the service run right before openshift-broker, and now my first boot successfully starts everything.
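The guts of that service are nothing fancy. Here is a minimal sketch of the wait loop (the real init script has the usual chkconfig boilerplate around it):

#!/bin/bash
# wait up to 60 seconds for eth0 to report an IPv4 address
for i in $(seq 1 60); do
    ifconfig eth0 | grep -q "inet addr:" && exit 0
    sleep 1
done
echo "eth0 did not report an IPv4 address within 60 seconds" >&2
exit 1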

Since this is probably the only place we’ll ever deploy OpenShift that has dhclient trying to process a new IP at just the time the broker and console are being started, I don’t know that it will ever be relevant to anyone else. But who knows. Maybe someone can further enlighten me on what’s going on here and a better way to avoid it.

NIC after clone

In the spirit of blogging to remember what I did, I hope I have a solution by the end of this post.

On RHEL 6, using virt-clone, I cloned a paused QEMU/KVM VM, then cloned the clone and booted that. The MAC on the NIC was changed, but I couldn’t bring the interface up:

ifup eth0
Device eth0 does not seem to be present, delaying initialization.

Looking in ifcfg-eth0, everything looked generic:

DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
NM_MANAGED=no

Just for completeness, I went ahead and added:

HWADDR=52:54:00:72:a6:c0

I don’t think that should be necessary; in any case, it made no difference.

My coworker helpfully pointed out that udev might need some changes after a clone. Looking at /etc/udev/rules.d/70-persistent-net.rules I could see some of the problem:

# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x1af4:0x1000 (virtio-pci)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:ef:64:a4", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x1af4:0x1000 (virtio-pci)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:72:a6:c0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

It looks like rather than replacing eth0’s MAC address, the clone just added a new device, and the record of the old one is still there. But the obvious thing to do here doesn’t seem to help. I tried commenting out the first definition and changing the second one to eth0. No dice – same result. So, more poking around.

StackOverflow turned up something that mentioned /sys/class/net/ which on my clone contains:

# ls /sys/class/net/
eth1 lo

Huh… so as far as the system is concerned, there really is no eth0. I could probably get by just having eth1, but I’d really like to know how to fix this.

This page brought me to some more interesting low-level stuff.

# ip link show eth0
Device "eth0" does not exist.
# ip link show eth1
2: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 52:54:00:72:a6:c0 brd ff:ff:ff:ff:ff:ff

Good to know we’re all reading from the same script. So who determines what gets assigned to eth0 or eth1?

That’s when I started writing this.

And then I found this forum entry. So I deleted the udev rules file and rebooted. And that worked – the NIC came back as eth0 and got a connection without me changing anything.
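In case it saves someone a search, that amounted to nothing more than:

# rm /etc/udev/rules.d/70-persistent-net.rules
# reboot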

I guess it makes some sense. The rules file refers to a device that doesn’t exist, so a rule is added for the new device, and just in case the old one comes back, the old rule remains.

But seriously, rebooting? There has to be something I can just run to set things right.

Reverse proxy resource usage – httpd 2.2.15

Recently I had reason to find out what happens when your httpd config includes a LOT of ProxyPass statements. The particular use case involved an application farm with one VirtualHost per domain for hundreds of domains, each very similar (though not identical) with dozens of ProxyPass statements to shared backend apps. But just a few dozen domains resulted in very high memory and CPU use.

I set up a simple test rig (CentOS 5 VM, 1 GB RAM, single x86 CPU, ERS 4.0.2 with httpd 2.2.15) and ran some unscientific tests where the config included 20,000 ProxyPass statements with these variables:

  1. Unique vs. repeated – unique statements each proxied to a unique destination, while repeated ones proxied different locations to the same destination.
  2. Balancer – the statements either proxy directly to the URL, or use a pre-defined balancer:// address (see the sketch after this list).
  3. Vhost – the ProxyPass statements were either all in the main_server definition or inside vhosts, one per vhost.
  4. MPM – either prefork or worker MPM is used.
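To illustrate the direct and balancer variants from item 2, the statements looked something like this, with names and addresses invented for illustration:

# direct form: each ProxyPass names its backend URL explicitly
ProxyPass /app1 http://backend1.example.com:8080/app1

# balancer form: define the backend pool once, then reference it
<Proxy balancer://appfarm>
    BalancerMember http://10.0.0.11:8080
    BalancerMember http://10.0.0.12:8080
</Proxy>
ProxyPass /app2 balancer://appfarm/app2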

No actual load was applied to the server – I just wanted to see what it took to read the config and start up. Settings were defaults per MPM (5 children for prefork, 3 children for worker) – obviously you’d tune these depending on usage. I tried to wait for things to settle down a bit after startup before reading “top” sorted by memory usage.

I also tried some other methods for accomplishing this task to see what the memory footprint would be.


Trying out mod_fcgid

mod_fcgid is an Apache httpd module that sets up a separate daemon for processing CGI requests. It’s like mod_cgid, but it keeps a pool of processes running rather than spinning up a new one each time a script needs execution. SpringSource ERS 4 comes with it, but you have to enable the module and configure it.
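As a minimal sketch of what enabling it involves (the module path and scripts directory here are generic examples, not the ERS-specific layout):

# load the module and hand .fcgi scripts to it
LoadModule fcgid_module modules/mod_fcgid.so
AddHandler fcgid-script .fcgi
<Directory "/var/www/fcgi-bin">
    Options +ExecCGI
</Directory>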

Official docs: http://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html