Reverse proxy resource usage – httpd 2.2.15

Recently I had reason to find out what happens when your httpd config includes a LOT of ProxyPass statements. The particular use case involved an application farm with one VirtualHost per domain for hundreds of domains, each very similar (though not identical) with dozens of ProxyPass statements to shared backend apps. But just a few dozen domains resulted in very high memory and CPU use.

I set up a simple test rig (CentOS 5 VM,1GB RAM, single x86 CPU, ERS 4.0.2 httpd 2.2.15) and ran some unscientific tests where the config included 20,000 ProxyPass statements with these variables:

  1. Unique vs. repeated – unique statements each proxied to a unique destination, while repeated ones proxied different locations to the same destination.
  2. Balancer – the statements either proxy directly to the URL, or use a pre-defined balancer:// address.
  3. Vhost – the ProxyPass statements were either all in the main_server definition or inside vhosts, one per.
  4. MPM – either prefork or worker MPM is used.

No actual load was applied to the server – I just wanted to see what it took to read the config and start up. Settings were defaults per MPM (5 children for prefork, 3 children for  worker) – obviously you’d tune these depending on usage. I tried to wait for things to settle down a bit after startup before reading “top” sorted by memory usage.

I also tried some other methods for accomplishing this task to see what the memory footprint would be.

Continue reading


Clustering Tomcat (part II)

Refer to my earlier post on the subject for background. Here are some more things I ran into.

Tricking mod_proxy_html

It didn’t take too long to come up against something that mod_proxy_html wouldn’t rewrite correctly. In the petcare sample app, there’s a line in a Javascript file (resources/jquery/openid-selector/js/openid-jquery.js) that looks like this:

	img_path: '/petcare/resources/jquery/openid-selector/images/',

There’s nothing to identify it as a URL; it’s just a regular old JS property,  so mod_proxy_html does nothing with it. In fact, as far as I can see, there isn’t any way to tweak the module configuration to adjust it, even knowing exactly what the problem is. But having the path wrong meant in this case that some vital icons on the OpenID sign-in page didn’t show up.

So I went ahead and broke everything out into vhosts like I should have all along. Look folks, if you’re reverse proxying, you really just need to have the same path on the front and backend unless what you’re doing is dead simple. If your front-end URL-space dictates a particular path, move your app to that path on the backend; don’t try to remap the path, it will almost certainly cause you headaches.

<Location> bleeds into VirtualHost

Having created vhosts, I noticed something interesting: <Location> sections that I defined in the  main server config sometimes applied in the vhosts as well – sometimes not. In particular, applying a handler (like status-handler) also applied to vhosts, while a JkMount in the main section did not cause vhosts to also serve requests through mod_jk. I think there were some other oddities like this but I can’t recall them any more.

It’s worth noting that RewriteRule directives in the main server config are explicitly not inherited by vhosts unless you specify that they should be.

mod_jk cluster partitioning

mod_jk workers have this property “domain” that you can set (ref. the Tomcat Connector guide). It’s not exactly obvious what it does – it doesn’t make sense to use the domain as the route when you’ve already got a route for each instance. I also read somewhere that mod_jk can be used to partition a cluster. Reading between the lines a little bit and trying it out, here’s what I found:

If you specify a domain, the workers with the same domain will failover like a sub-cluster within the cluster; they all still have their own routes, but if one instance fails, mod_jk will try to route to another member of the same sub-cluster. This means that you only need to replicate sessions between the members of the sub-cluster (assuming you trust the sub-cluster not to fail entirely). This could significantly cut down on the amount of session replication network traffic in a large cluster.