Implementing an OpenShift Enterprise routing layer for HA applications

My previous post described how to make an application HA and what exactly that means behind the scenes. This post is to augment the explanation in the HA PEP of how an administrator should expect to implement the routing layer for HA apps.

The routing layer implementation is currently left entirely up to the administrator. At some point OpenShift will likely ship a supported routing layer component, but the first priority was to provide an SPI (Service Provider Interface) so that administrators could reuse existing routing and load balancer infrastructure with OpenShift. Since most enterprises already have such infrastructure, we expected they would prefer to leverage that investment (both in equipment and experience) rather than be forced to use something OpenShift-specific.

Still, this leaves the administrator with the task of implementing the interface to the routing layer. Worldline published an nginx implementation, and we have some reference implementations in the works, but I thought I’d outline some of the details that might not be obvious in such an implementation.

The routing SPI

The first step in the journey is to understand the routing SPI events. The routing SPI itself is an interface on the OpenShift broker app that must be implemented via plugin. The example routing plugin that is packaged for Origin and Enterprise simply serializes the SPI events to YAML and puts them on an ActiveMQ message queue/topic. This is just one way to distribute the events, but it’s a pretty good way, at least in the abstract. For routing layer development and testing, you can just publishes messages on a topic on the ActiveMQ instance OpenShift already uses (for Enterprise, openshift.sh does this for you) and use the trivial “echo” listener to see exactly what comes through. For production, publish events to a queue (or several if multiple instances need updating) on an HA ActiveMQ deployment that stores messages to disk when shutting down (you really don’t want to lose routing events) – note that the ActiveMQ deployment described in OpenShift docs and deployed by the installer does not do this, being intended for much more ephemeral messages.

I’m not going to go into detail about the routing events. You’ll become plenty familiar if you implement this. You can see some good example events in this description, but always check what is actually coming out of the SPI as there may have been updates (generally additions) since. The general outline of the events can be seen in the Sample Routing Plug-in Notifications table from the Deployment Guide or in the example implementation of the SPI. Remember you can always write your own plugin to give you information in the desired format.

Consuming SPI events for app creation

The routing SPI publishes events for all apps, not just HA ones, and you might want to do something with other apps (e.g. implement blue/green deployments), but the main focus of a routing layer is to implement HA apps. So let’s look at how you do that. I’m assuming YAML entries from the sample activemq plugin below — if you use a different plugin, similar concepts should apply just with different details.

First when an app is created you’re going to get an app creation event:

$ rhc app create phpha php-5.4 -s

:action: :create_application
:app_name: phpha
:namespace: demo
:scalable: true
:ha: false

This is pretty much just a placeholder for the application name. Note that it is not marked as HA. There is some work coming to make apps HA at creation, but currently you just get a scaled app and have to make it HA after it’s created. This plugin doesn’t publish the app UUID, which is what I would probably do if I were writing a plugin now. Instead, you’ll identify the application in any future events by the combination of app_name and namespace.

Once an actual gear is deployed, you’ll get two (or more) :add_public_endpoint actions, one for haproxy’s load_balancer type and one for the cartridge web_framework type (and possibly other endpoints depending on cartridge).

:action: :add_public_endpoint
:app_name: phpha
:namespace: demo
:gear_id: 542b72abafec2de3aa000009
:public_port_name: haproxy-1.4
:public_address: 172.16.4.200
:public_port: 50847
:protocols:
- http
- ws
:types:
- load_balancer
:mappings:
- frontend: ''
 backend: ''
- frontend: /health
 backend: /configuration/health

You might expect that when you make the app HA, there is some kind of event specific to being made HA. There isn’t at this time. You just get another load_balancer endpoint creation event for the same app, and you can infer that it’s now HA. For simplicity of implementation, it’s probably just best to treat all scaled apps as if they were already HA and define routing configuration for them.

Decision point 1: The routing layer can either direct requests only to the load_balancer endpoints and let them forward traffic all to the other gears, or it can actually just send traffic directly to all web_framework endpoints. The recommendation is to send traffic to the load_balancer endpoints, for a few reasons:

  1. This allows haproxy to monitor traffic in order to auto-scale.
  2. It will mean less frequent changes to your routing configuration (important when changes mean restarts).
  3. It will mean fewer entries in your routing configuration, which could grow quite large and become a performance concern.

However, direct routing is viable, and allows an implementation of HA without actually going to the trouble of making apps HA. You would just have to set up a DNS entry for the app that points at the routing layer and use that. You’d also have to handle scaling events manually or from the routing layer somehow (or even customize the HAproxy autoscale algorithm to use stats from the routing layer).

Decision point 2: The expectation communicated in the PEP (and how this was intended to be implemented) is that requests will be directed to the external proxy port on the node (in the example above, that would be http://172.16.4.200:50847/). There is one problem with doing this – idling. Idler stats are gathered only on requests that go through the node frontend proxy, so if we direct all traffic to the port proxy, the haproxy gear(s) will eventually idle and the app will be unavailable even though it’s handling lots of traffic. (Fun fact: secondary gears are exempt from idling – doesn’t help, unless the routing layer proxies directly to them.) So, how do we prevent idling? Here are a few options:

  1. Don’t enable the idler on nodes where you expect to have HA apps. This assumes you can set aside nodes for (presumably production) HA apps that you never want to idle. Definitely the simplest option.
  2. Implement health checks that actually go to the node frontend such that HA apps will never idle. You’ll need the gear name, which is slightly tricky – the above endpoint being on the first gear, it will be accessible by a request for http://phpha-demo.cloud_domain/health to the node at 172.16.4.200. When the next gear comes in, you’ll have to recognize that it’s not the head gear and send the health check to e.g. http://542b72abafec2de3aa000009-demo.cloud_domain/health.
  3. Flout the PEP and send actual traffic to the node frontend. This would be the best of all worlds since the idler would work as intended without any special tricks, but there are some caveats I’ll discuss later.

Terminating SSL (TLS)

When hosting multiple applications behind a proxy, it is basically necessary to terminate SSL at the proxy. (Despite SSL having been essentially replaced by TLS at this point, we’re probably going to call it SSL for the lifetime of the internet.) This has to do with the way routing works under HTTPS; during the intialization of the TLS connection, the client has to indicate the name it wants (in our case the application’s DNS name) in the SNI extension to the TLS “hello”. The proxy can’t behave as a dumb layer 4 proxy (just forwarding packets unexamined to another TLS endpoint) because it has to examine the stream at the protocol level to determine where to send it. Since the SNI information is (from my reading of the RFC) volunteered by the client at the start of the connection, it does seem like it would be possible for a proxy to examine the protocol and then act like a layer 4 proxy based on that examination, and indeed I think F5 LBs have this capability, but it does not seem to be a standard proxy/LB capability, and certainly not for existing open source implementations (nginx, haproxy, httpd – someone correct me if I’m missing something here), so to be inclusive we are left with proxies that operate at the layer 7 protocol layer, meaning they perform the TLS negotiation from the client’s perspective.

Edit 2014-10-08: layer 4 routing based on SNI is probably more available than I thought. I should have realized HAproxy 1.5 can do it, given OpenShift’s SNI proxy is using that capability. It’s hard to find details on though. If most candidate routing layer technologies have this ability, then it could simplify a number of the issues around TLS because terminating TLS could be deferred to the node.

If that was all greek to you, the important point to extract is that a reverse proxy has to have all the information to handle TLS connections, meaning the appropriate key and certificate for any requested application name. This is the same information used at the node frontend proxy; indeed, the routing layer will likely need to reuse the same *.cloud_domain wildcard certificate and key that is shared on all OpenShift nodes, and it needs to be made aware of aliases and their custom certificates so that it can properly terminate requests for them. (If OpenShift supported specifying x509 authentication via client certificates [which BTW could be implemented without large structural changes], the necessary parameters would also need to be published to the routing layer in addition to the node frontend proxy.)

We assume that a wildcard certificate covers the standard HA DNS name created for HA apps (e.g. in this case ha-phpha-demo.cloud_domain, depending of course on configuration; notice that no event announces this name — it is implied when an app is HA). That leaves aliases which have their own custom certs needing to be understood at the routing layer:

$ rhc alias add foo.example.com -a phpha
:action: :add_alias
:app_name: phpha
:namespace: demo
:alias: foo.example.com

$ rhc alias update-cert foo.example.com -a phpha --certificate certfile --private-key keyfile
:action: :add_ssl
:app_name: phpha
:namespace: demo
:alias: foo.example.com
:ssl: [...]
:private_key: [...]
:pass_phrase:

Aliases will of course need their own routing configuration entries regardless of HTTP/S, and something will have to create their DNS entries as CNAMEs to the ha- application DNS record.

A security-minded administrator would likely desire to encrypt connections from the routing layer back to the gears. Two methods of doing this present themselves:

  1. Send an HTTPS request back to the gear’s port proxy. This won’t work with any of the existing cartridges OpenShift provides (including the haproxy-1.4 LB cartridge), because none of them expose an HTTPS-aware endpoint. It may be possible to change this, but it would be a great deal of work and is not likely to happen in the lifetime of the current architecture.
  2. Send an HTTPS request back to the node frontend proxy, which does handle HTTPS. This actually works fine, if the app is being accessed via an alias – more about this caveat later.

Sending the right HTTP headers

It is critically important in any reverse-proxy situation to preserve the client’s HTTP request headers indicating the URL at which it is accessing an application. This allows the application to build self-referencing URLs accurately. This can be a little complicated in a reverse-proxy situation, because the same HTTP headers may be used to route requests to the right application. Let’s think a little bit about how this needs to work. Here’s an example HTTP request:

POST /app/login.php HTTP/1.1
Host: phpha-demo.openshift.example.com
[...]

If this request comes into the node frontend proxy, it looks at the Host header, and assuming that it’s a known application, forwards the request to the correct gear on that node. It’s also possible (although OpenShift doesn’t do this, but a routing layer might) to use the path (/app/login.php here) to route to different apps, e.g. requests for /app1/ might go to a different place than /app2/.

Now, when an application responds, it will often create response headers (e.g. a redirect with a Location: header) as well as content based on the request headers that are intended to link to itself relative to what the client requested. The client could be accessing the application by a number of paths – for instance, our HA app above should be reachable either as phpha-demo.openshift.example.com or as ha-phpha-demo.openshift.example.com (default HA config). We would not want a client that requests the ha- address to receive a link to the non-ha- address, which may not even resolve for it, and in any case would not be HA. The application, in order to be flexible, should not make a priori assumptions about how it will be addressed, so every application framework of any note provides methods for creating redirects and content links based on the request headers. Thus, as stated above, it’s critical for these headers to come in with an accurate representation of what the client requested, meaning:

  1. The same path (URI) the client requested
  2. The same host the client requested
  3. The same protocol the client requested

(The last is implemented via the “X-Forwarded-Proto: https” header for secure connections. Interestingly, a recent RFC specifies a new header for communicating items 2 and 3, but not 1. This will be a useful alternative as it becomes adopted by proxies and web frameworks.)

Most reverse proxy software should be well aware of this requirement and provide options such that when the request is proxied, the headers are preserved (for example, the ProxyPreserveHost directive in httpd). This works perfectly with the HA routing layer scheme proposed in the PEP, where the proxied request goes directly to an application gear. The haproxy cartridge does not need to route based on Host: header (although it does route requests based on a cookie it sets for sticky sessions), so the request can come in for any name at all and it’s simply forwarded as-is for the application to use.

The complication arises in situations where, for example, you would like the routing layer to forward requests to the node frontend proxy (in order to use HTTPS, or to prevent idling). The node frontend does care about the Host header because it’s used for routing, so the requested host name has to be one that the OpenShift node knows in relation to the desired gear. It might be tempting to think that you can just rewrite the request to use the gear’s “normal” name (e.g. phpha-demo.cloud_domain) but this would be a mistake because the application would respond with headers and links based on this name. Reverse proxies often offer options for rewriting the headers and even contents of responses in an attempt to fix this, but they cannot do so accurately for all situations (example: links embedded in JavaScript properties) so this should not be attempted. (Side note: the same prohibition applies to rewriting the URI path while proxying. Rewriting example.com/app/… to app.internal.example.com/… is only safe for sites that provide static content and all-relative links.)

What was that caveat?

I mentioned a caveat both on defeating the idler and proxying HTTPS connections to the node frontend, and it’s related to the section above. You can absolutely forward an HA request to the node frontend if the request is for a configured alias of the application, because the node frontend knows how to route aliases (so you don’t have to rewrite the Host: header which, as just discussed, is a terrible idea). The caveat is that, strangely, OpenShift does not create an alias for the ha- DNS entry automatically assigned to an HA app, so manual definition of an alias is currently required per-app for implementation of this scheme. I have created a feature request to instate the ha- DNS entry as an alias, and being hopefully easy to implement, this may soon remove the caveat behind this approach to routing layer implementation.

Things go away too

I probably shouldn’t even have to mention this, but: apps, endpoints, aliases, and certificates can all go away, too. Make sure that you process these events and don’t leave any debris lying around in your routing layer confs. Gears can also be moved from one host to another, which is an easy use case to forget about.

And finally, speaking of going away, the example routing plugin initially provided :add_gear and :remove_gear events, and for backwards compatibility it still does (duplicating the endpoint events). These events are deprecated and should disappear soon.

Advertisements

One Response

  1. Hi Luke, we have developed an Advanced Load Balancer http://bit.ly/1wc1WUI based on #keepalived #nginx #rsync for Openshift Enterprise, we have evaluated
    several related project but noone looks ready for production, we are still improving the documentation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: