OpenShift logging and metrics

Server logs aren’t usually a very exciting topic. But if you’re a sysadmin of an OpenShift Enterprise deployment with hundreds of app servers coming and going unpredictably, managing logs can get… interesting. Tools for managing logs are essential for keeping audit trails, collecting metrics, and debugging.

What’s new

Prior to OpenShift Enterprise 2.1, gear logs simply wrote to log files. Simple and effective. But this is not ideal for a number of reasons:

  1. Log files take up your gear storage capacity. It is not hard at all to fill up your gear with logs and DoS yourself.
  2. Log files go away when your gear does. Particularly for scaled applications, this is an unacceptable loss of auditability.
  3. Log file locations and rotation policies are at the whim of the particular cartridge, thus inconsistent.
  4. It’s a pain for administrators to gather app server logs for analysis, especially when they’re spread across several gears on several nodes.

OSE 2.1  introduced a method to redirect component and gear logs to syslogd, which is a standard Linux service for managing logs. In the simplest  configuration, you could have syslog just combine all the logs it receives into a single log file (and define rotation policy on that). But you can do much more. You can filter and send log entries to different destinations based on where they came from; you can send them to an external logging server, perhaps to be analyzed by tools like Splunk. Just by directing logs to syslog we get all this capability for free (we’re all about reusing existing tools in OpenShift).

Where did that come from?

Well, nothing is free. Once you’ve centralized all your logging to syslogd, then you have the problem of separating entries back out again according to source so your automation and log analysis tools can distinguish the logs of different gears from each other and from other components. This must be taken into account when directing logs to syslogd; the log entries must include enough identifying information to determine where they came from down to the level of granularity you care about.

We now give instructions for directing logs to syslog for OpenShift components too; take a look at the relevant sections of the Administration Guide for all of this. Redirecting logs from OpenShift components is fairly straightforward. There are separate places to configure if you want to use syslog from the broker rails application, the management console rails application, and the node platform. We don’t describe how to do this with MongoDB, ActiveMQ, or httpd, but those are standard components and should also be straightforward to configure as needed. Notably left out of the instructions at this point are instructions to syslog the httpd servers hosting the broker and console rails apps; but the main items of interest in those logs are error messages from the actual loading of the rails apps, which (fingers crossed) shouldn’t happen.

Notice that when configuring the node platform logging, there is an option to add “context” which is to say, the request ID and app/gear UUIDs if relevant. Adding the request ID allows connecting what happened on the node back to the broker API request that spawned the action on the node; previously this request ID was often shown in API error responses, but was only logged in the broker log. Logging the request ID with the logs for resulting node actions to the syslog  now makes it a lot easier to get the whole picture of what happened with a problem request, even if the gear was destroyed after the request failed.

Distinguishing gear logs

There are gear logs from two sources to be handled. First, we would like to collect the httpd access logs for the gears, which are generated by the node host httpd proxy (the “frontend”). Second, we would like to collect logs from the actual application servers running in each gear, whether they be httpd, Tomcat, MongoDB, or something else entirely.

Frontend access logs

These logs were already centralized into /var/log/httpd/openshift_log and included the app hostname as well as which backend address the request was proxied to. A single httpd option “OpenShiftFrontendSyslogEnabled” adds logging via “logger” which is the standard way to write to the syslog. Every entry is tagged with “openshift-node-frontend” to distinguish frontend access logs from any other httpd logs you might write.

With 2.1 the ability to look up and log the app and gear UUIDs is added. A single application may have multiple aliases, so it is hard to automatically collate all log entries for a single application. Also, an application could be destroyed and re-created with the same address, though it is technically a different app from OpenShift’s viewpoint. Also, the same application may have multiple gears, and those gears may come and go or be moved between hosts; the backend address for a gear could also be reused by a different gear after it has been destroyed.

In order to uniquely identify an application and its gears in the httpd logs for all time, OSE 2.1 introduces the “OpenShiftAnnotateFrontendAccessLog” option which adds the application and gear UUIDs as entries in the log messages. The application UUID is unique to an application for all time (another app created with exactly the same name will get a different UUID) and shared by all of its gears. The gear UUID is unique to each gear; note that the UUID (Universally Unique ID) is different from the gear UID (User ID) which is just a Linux user number and may be shared with many other gears. Scale an application down and back up, and even if the re-created gear has the same UID as a previous gear, it will have a different UUID. But note that if you move a gear between hosts, it retains its UUID.

If you want to automatically collect all of the frontend logs for an application from syslog, the way you should do it is to set the “OpenShiftAnnotateFrontendAccessLog” option and collect logs by Application UUID. Then your httpd log entries look like this:

Jun 10 14:43:59 vm openshift-node-frontend[6746]: 192.168.122.51 php-demo.openshift.example.com – – [10/Jun/2014:14:43:59 -0400] “HEAD / HTTP/1.1” 200 – “-” “curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2” (3480us) – 127.1.244.1:8080/ 53961099e659c55b08000102 53961099e659c55b08000102

The “openshift-node-frontend” tag is added to these syslog entries by logger (followed by the process ID which isn’t very useful here). The app and gear UUIDs are at the end there, after the backend address proxied to. The UUIDs will typically be equal in the frontend logs since the head gear in a scaled app gets the same UUID as the app; they would be different for secondary proxy gears in an HA app or if you directly requested any secondary gear by its DNS entry for some reason.

Gear application logs

In order to centralize application logs, it was necessary to standardize cartridge logging such that all logs go through a standard mechanism that can then be centrally configured. You might think this would just be syslog, but it was also a requirement that users should be able to keep their logs in their gear if so desired, and getting syslog to navigate all of the permissions necessary to lay down those log files with the right ownership proved difficult. So instead, all cartridges now must log via the new utility logshifter (our first released component written in “go” as far as I know). logshifter will just write logs to the gear app-root/logs directory by default, but it can also be configured (via /etc/openshift/logshifter.conf) to write to syslog. It can also be configured such that the end user can choose to override this and have logs written to gear files again (which may save them from having to navigate whatever logging service ends up handling syslogs when all they want to do is debug their app).

Here distinguishing between which gear is creating the log requires somewhat more drastic measures. We want to indicate which gear created each log entry, but we can’t trust each gear to self-report accurately (as opposed to spoofing the log traffic actually coming from another gear or something else entirely). So the context information is added by syslog itself via a custom rsyslog plugin, mmopenshift. Properly configuring this plugin requires an update to rsyslog version 7, which (to avoid conflicting with the version shipped in RHEL) is actually shipped in a separate RPM, rsyslog7. So to usefully consolidate gear logs into syslog really requires replacing your entire rsyslog with the newer one. This might seem extreme, but it’s actually not too bad.

Once this is done, any logs from an application can be directed to a central location and distinguished from other applications. This time the distinguishing characteristics are placed at the front of the log entry, e.g. for the app server entry corresponding to the frontend entry above:

2014-06-10T14:43:59.891285-04:00 vm php[2988]: app=php ns=demo appUuid=53961099e659c55b08000102 gearUuid=53961099e659c55b08000102 192.168.122.51 – – [10/Jun/2014:14:43:59 -0400] “HEAD / HTTP/1.1” 200 – “-” “curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2”

The example configuration in the manual directs these to a different log file, /var/log/openshift_gears. This log traffic could be directed to /var/log/messages like the default for everything else, or sent to a different destination entirely.

Gear metrics

Aside from just improving log administration capabilities, one of the motivations for these changes is to enable collection of arbitrary metrics from gears (see the metrics PEP for background). As of OSE 2.1, metrics are basically just implemented as log messages that begin with “type=metric”. These can be generated in a number of ways:

  • The application itself can actively generate log messages at any time; if your application framework provides a scheduler, just have it periodically output to stdout beginning with “type=metric” and logshifter will bundle these messages into the rest of your gear logs for analysis.
    • Edit 2014-06-25: Note that these have a different format and tag than the watchman-generated metrics described next, which appear under the “openshift-platform” tag and aren’t processed by the mmopenshift rsyslog plugin. So you may need to do some work to have your log analyzer consider these metrics.
  • Metrics can be passively generated in a run by the openshift-watchman service in a periodic node-wide run. This can generate metrics in several ways:
    • By default it generates standard metrics out of cgroups for every gear. These include RAM, CPU, and storage.
    • Each cartridge can indicate in its manifest that it supports metrics, in which case the bin/metrics script is executed and its output is logged as metrics. No standard cartridges shipped with OSE support metrics at this time, but custom cartridges could.
    • Each application can create a metrics action hook script in its git repo, which is executed with each watchman run and its output logged as metrics. This enables the application owner to add custom metrics per app.

It should be noted that the cartridge and action hook metrics scripts have a limited time to run, so that they can’t indefinitely block the metrics run for the rest of the gears on the node. All of this is configurable with watchman settings in node.conf. Also it should be noted that watchman-generated logs are tagged with “openshift-platform” e.g.:

Jun 10 16:25:39 vm openshift-platform[29398]: type=metric appName=php6 gear=53961099e659c55b08000102 app=53961099e659c55b08000102 ns=demo quota.blocks.used=988 quota.blocks.limit=1048576 quota.files.used=229 quota.files.limit=80000

The example rsyslog7 and openshift-watchman configuration will route watchman-generated entries differently from application-server entries since the app UUID parameter is specified differently (“app=” vs “appUuid=”). This is all very configurable.

I am currently working on installer options to enable these centralized logging options as sanely as possible.

Advertisements

One Response

  1. Hi Luke Meyer,

    I would like to stop writing below type of logs to /var/log/messages file in openshift V2 as these loggings are causing high diskspace utilizations in my nodes. Could you please let me know how can i disable below type of loggings in messages file..?

    Jun 10 14:43:59 vm openshift-node-frontend[6746]: 192.168.122.51 php-demo.openshift.example.com – – [10/Jun/2014:14:43:59 -0400] “HEAD /

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: