Reverse proxy resource usage – httpd 2.2.15

Recently I had reason to find out what happens when your httpd config includes a LOT of ProxyPass statements. The particular use case involved an application farm with one VirtualHost per domain for hundreds of domains, each very similar (though not identical) with dozens of ProxyPass statements to shared backend apps. But just a few dozen domains resulted in very high memory and CPU use.

I set up a simple test rig (CentOS 5 VM, 1GB RAM, single x86 CPU, ERS 4.0.2 httpd 2.2.15) and ran some unscientific tests where the config included 20,000 ProxyPass statements with these variables:

  1. Unique vs. repeated – unique statements each proxied to a unique destination, while repeated ones proxied different locations to the same destination.
  2. Balancer – the statements either proxy directly to the URL, or use a pre-defined balancer:// address.
  3. Vhost – the ProxyPass statements were either all in the main_server definition or inside vhosts, one per.
  4. MPM – either prefork or worker MPM is used.
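To make the first three variables concrete, here is a rough sketch of what each variant of the statements might look like. The host names, ports, and paths are placeholders, not the ones from the actual test config.

```apache
# Unique: each location proxies to its own destination
ProxyPass /app1 http://backend1.example.com:8080/app1
ProxyPass /app2 http://backend2.example.com:8080/app2

# Repeated: different locations proxy to the same destination
ProxyPass /site1 http://backend.example.com:8080/app
ProxyPass /site2 http://backend.example.com:8080/app

# Balancer: the destination is a pre-defined balancer:// address
<Proxy balancer://apps>
    BalancerMember http://backend.example.com:8080
</Proxy>
ProxyPass /site3 balancer://apps/app

# Vhost: one ProxyPass per VirtualHost instead of the main server
# (httpd 2.2 also needs "NameVirtualHost *:80" for name-based vhosts)
<VirtualHost *:80>
    ServerName site4.example.com
    ProxyPass /app http://backend.example.com:8080/app
</VirtualHost>
```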

No actual load was applied to the server – I just wanted to see what it took to read the config and start up. Settings were defaults per MPM (5 children for prefork, 3 children for worker) – obviously you’d tune these depending on usage. I tried to wait for things to settle down a bit after startup before reading “top” sorted by memory usage.

I also tried some other methods for accomplishing this task to see what the memory footprint would be.


Clustering Tomcat (part II)

Refer to my earlier post on the subject for background. Here are some more things I ran into.

Tricking mod_proxy_html

It didn’t take too long to come up against something that mod_proxy_html wouldn’t rewrite correctly. In the petcare sample app, there’s a line in a Javascript file (resources/jquery/openid-selector/js/openid-jquery.js) that looks like this:

	img_path: '/petcare/resources/jquery/openid-selector/images/',

There’s nothing to identify it as a URL; it’s just a regular old JS property, so mod_proxy_html does nothing with it. In fact, as far as I can see, there isn’t any way to tweak the module configuration to adjust it, even knowing exactly what the problem is. But having the path wrong meant in this case that some vital icons on the OpenID sign-in page didn’t show up.

So I went ahead and broke everything out into vhosts like I should have all along. Look folks, if you’re reverse proxying, you really just need to have the same path on the front and backend unless what you’re doing is dead simple. If your front-end URL-space dictates a particular path, move your app to that path on the backend; don’t try to remap the path, it will almost certainly cause you headaches.
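As a sketch of the same-path approach, one vhost per scenario might look something like this. The ServerName is a made-up placeholder, and a single backend is shown instead of a balancer to keep it short.

```apache
# httpd 2.2 needs this for name-based virtual hosts
NameVirtualHost *:80

<VirtualHost *:80>
    # Hypothetical front-end host name for one scenario
    ServerName http-nocluster.example.com

    # /petcare on the front end is /petcare on the backend too,
    # so absolute paths in the content need no rewriting
    ProxyPass        /petcare http://vm-centos-cluster-ers.sosiouxme.lan:8100/petcare
    ProxyPassReverse /petcare http://vm-centos-cluster-ers.sosiouxme.lan:8100/petcare
</VirtualHost>
```

With this layout a hard-coded path like the img_path above resolves without mod_proxy_html getting involved.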

<Location> bleeds into VirtualHost

Having created vhosts, I noticed something interesting: <Location> sections that I defined in the main server config sometimes applied in the vhosts as well, and sometimes not. In particular, applying a handler (like server-status) in the main section also applied to vhosts, while a JkMount in the main section did not cause vhosts to also serve requests through mod_jk. I think there were some other oddities like this but I can’t recall them any more.
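A minimal sketch of the two cases, assuming a mod_jk worker named balancer_worker (hypothetical) is already defined. mod_jk's JkMountCopy directive is the documented way to opt a vhost in to the main server's mounts; I have not confirmed that it explains every oddity described here.

```apache
# Main server config: this handler also answers inside vhosts
<Location /server-status>
    SetHandler server-status
</Location>

# Main server config: vhosts do not see this mount by default
JkMount /petcare/* balancer_worker

<VirtualHost *:80>
    ServerName www.example.com
    # Copy the main server's JkMount entries into this vhost
    JkMountCopy On
</VirtualHost>
```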

It’s worth noting that RewriteRule directives in the main server config are explicitly not inherited by vhosts unless you specify that they should be.
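For httpd 2.2, opting a vhost in looks roughly like the following sketch. The rule and host name are only placeholders.

```apache
# Main server config
RewriteEngine On
RewriteRule ^/old/(.*)$ /new/$1 [R=301,L]

<VirtualHost *:80>
    ServerName www.example.com
    # Without these two lines the rule above is not applied in this vhost
    RewriteEngine On
    RewriteOptions inherit
</VirtualHost>
```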

mod_jk cluster partitioning

mod_jk workers have this property “domain” that you can set (ref. the Tomcat Connector guide). It’s not exactly obvious what it does – it doesn’t make sense to use the domain as the route when you’ve already got a route for each instance. I also read somewhere that mod_jk can be used to partition a cluster. Reading between the lines a little bit and trying it out, here’s what I found:

If you specify a domain, the workers with the same domain will fail over like a sub-cluster within the cluster; they all still have their own routes, but if one instance fails, mod_jk will try to route to another member of the same sub-cluster. This means that you only need to replicate sessions between the members of the sub-cluster (assuming you trust the sub-cluster not to fail entirely). This could significantly cut down on the amount of session replication network traffic in a large cluster.

Trying out mod_fcgid

mod_fcgid is an Apache httpd module that sets up a separate daemon for processing CGI requests. It’s like mod_cgid but it keeps a pool of processes running rather than spinning one up each time a script needs execution. SpringSource ERS 4 comes with it but you have to enable the module and configure it.

Official docs: http://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html
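A bare-bones sketch of enabling it might look like this. The module path and the script directory are assumptions and will differ per install; this is not the ERS-specific setup.

```apache
# Module path is an assumption; adjust to where your build puts it
LoadModule fcgid_module modules/mod_fcgid.so

<IfModule fcgid_module>
    # Hand .fcgi scripts to the mod_fcgid process pool
    AddHandler fcgid-script .fcgi
</IfModule>

# Hypothetical directory holding the FastCGI scripts
<Directory "/var/www/fcgi-bin">
    Options +ExecCGI
</Directory>
```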

Fixing Apache httpd reverse proxy redirect rewrites

My ProxyPassReverse statement was adding an extra “/” in its Location: header rewrites. I noticed this when requesting a URL like “/petcare”. Tomcat would redirect this appropriately with a 302 and this header:

Location: http://vm-centos-cluster-ers.sosiouxme.lan:8100/petcare/

But when it came back through the proxy as “/http/nocluster/petcare”, it ended up rewritten as:

Location: http://vm-centos-cluster-ers.sosiouxme.lan/http/nocluster//petcare/

It seemed like a small thing – and, after all, it still worked due to URL canonicalization – but I wanted to understand why this happened and make it right. Here’s a typical configuration section initially:

# use mod_proxy_http to connect to non-replicated tc instances
<Proxy balancer://http-nocluster/>
BalancerMember http://vm-centos-cluster-ers.sosiouxme.lan:8100 route=ers
BalancerMember http://vm-centos-cluster-tcs.sosiouxme.lan:8100 route=tcs
ProxySet stickysession=JSESSIONID nofailover=On
</Proxy>

<Location /http/nocluster/>
ProxyPass balancer://http-nocluster/
ProxyPassReverse balancer://http-nocluster/
ProxyPassReverseCookiePath / /http/nocluster/
ProxyHTMLURLMap / /http/nocluster/
</Location>

I figured it was just a matter of juggling where “/” appeared at the end of various things. I cranked up the logging to “debug” and tried a few changes one by one.

  • Remove “/” from end of ProxyPass. This gave me a lovely 500 error and log messages:

ProxyPass balancer://http-nocluster

[debug] mod_proxy_balancer.c(46): proxy: BALANCER: canonicalising URL //http-noclusterpetcare

[debug] proxy_util.c(1525): [client 172.31.1.52] proxy: *: found reverse proxy worker for balancer://http-noclusterpetcare/

[…]

[warn] proxy: No protocol handler was valid for the URL /http/nocluster/petcare. If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule.

  • Remove “/” from end of ProxyPassReverse. No apparent effect.
  • Remove “/” from end of <Proxy balancer://http-nocluster/> – no effect.
  • Remove “/” from end of <Location /http/nocluster/> – now we were getting somewhere! The Location header was rewritten correctly; only problem is that after rewriting, it was passed through to Tomcat as //petcare/ and failing.
  • Remove “/” from the end of everything! This seems to be what works best – everything passes through correctly and Location is rewritten correctly. So the configuration I ended up with is:

# use mod_proxy_http to connect to non-replicated tc instances
<Proxy balancer://http-nocluster>
BalancerMember http://vm-centos-cluster-ers.sosiouxme.lan:8100 route=ers
BalancerMember http://vm-centos-cluster-tcs.sosiouxme.lan:8100 route=tcs
ProxySet stickysession=JSESSIONID nofailover=On
</Proxy>

<Location /http/nocluster>
ProxyPass balancer://http-nocluster
ProxyPassReverse balancer://http-nocluster
ProxyPassReverseCookiePath / /http/nocluster/
ProxyHTMLURLMap / /http/nocluster/
</Location>

This was pretty much the only thing I tried that worked properly. Now, with mod_proxy_ajp it was a different story. The configuration looked pretty similar (because I’d done a cut/paste/edit):

# use mod_proxy_ajp to connect to non-replicated tc instances
<Proxy balancer://ajp-nocluster>
BalancerMember ajp://vm-centos-cluster-tcs.sosiouxme.lan:8109 route=tcs
BalancerMember ajp://vm-centos-cluster-ers.sosiouxme.lan:8109 route=ers
ProxySet stickysession=JSESSIONID nofailover=On
</Proxy>

<Location /ajp/nocluster/>
ProxyPass balancer://ajp-nocluster/
ProxyPassReverse balancer://ajp-nocluster/
ProxyPassReverseCookiePath / /ajp/nocluster/
ProxyHTMLURLMap / /ajp/nocluster/
</Location>

Thing is, my ProxyPassReverse there wasn’t doing anything at all. This is a little-known fact about how mod_proxy_ajp and ProxyPassReverse interact: over AJP the proxy doesn’t make a new HTTP request to the backend; rather, the HTTP headers from the request to the proxy are passed to the backend, and typically presented by Tomcat to the app as the request headers. So when the app (or Tomcat) forms a redirect (Location: header), it is based on the host and port of the proxy, not the backend.

Meanwhile, ProxyPassReverse is very literal-minded. It only matches exactly what you put in the statement. So it’s a common error to have config like this:

ProxyPass / ajp://backend.example.com:8009/

ProxyPassReverse / ajp://backend.example.com:8009/

The ProxyPassReverse there isn’t doing anything at all, because it’s never going to see a “Location: ajp://backend.example.com:8009/” header from the backend – instead it will see URLs based on the front end. Most people won’t notice this because most people are using the same paths on front and backend, so nothing needed to be rewritten anyway. I had to be different and remap paths so I noticed when they weren’t rewritten.

The exception to the literal-mindedness of PPR is the balancer:// faux protocol. When you have a bunch of http backends in a balancer, you would normally need to rewrite headers corresponding to any of them – so, a PPR directive for each. This is pretty tedious. Starting in (I think) httpd 2.2.9 you could do a single PPR directive with the balancer:// notation as above and get this for free. I was hoping it would be smarter about AJP, but it’s not. That’s not such a big deal, though – since the host and port are always that of the front-end, I only need a single PPR for the rewrite.

<Location /ajp/nocluster/>
ProxyPass balancer://ajp-nocluster/

# note: http! This is the proxy server URL
ProxyPassReverse http://vm-centos-cluster-ers.sosiouxme.lan/
ProxyPassReverseCookiePath / /ajp/nocluster/
ProxyHTMLURLMap / /ajp/nocluster/
</Location>

And I didn’t even have to futz with the slashes, it just worked with them in.
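For comparison, without the balancer:// shorthand the http balancer from earlier would need one ProxyPassReverse per member, each matching a backend URL literally. This is an untested sketch, and the trailing slashes may need the same juggling as before.

```apache
<Location /http/nocluster>
    ProxyPass balancer://http-nocluster
    # One directive per BalancerMember
    ProxyPassReverse http://vm-centos-cluster-ers.sosiouxme.lan:8100
    ProxyPassReverse http://vm-centos-cluster-tcs.sosiouxme.lan:8100
</Location>
```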

Clustering Tomcat

I’ve been meaning for some time to set up Tomcat clustering in several different ways just to see how it works. I finally got a chance today. There are guides and other bits of information elsewhere on how to do all this. I’m not gonna tell you how, sorry; the point of this post is to document my problems and how I overcame them.

A word about setup

My front end load balancer is Apache httpd 2.2.15 (from SpringSource ERS 4.0.2) using mod_proxy_http, mod_proxy_ajp, mod_jk, and (just to keep things interesting) mod_rewrite as connectors to the backend Tomcat 6.0.28 instances. I wanted to try out 2-node Tomcat clusters (one side of each cluster ERS Tomcat, the other side tc Server 2.0.3) without any session replication (so, sticky sessions and you get a new session if there’s a failure) and with session replication via the standard Delta manager (which replicates all sessions to all nodes) and the Backup manager (which replicates all sessions to a single reference node, the “backup” for each app).  Basically the idea is to test out every conceivable way to put together a cluster with standard SpringSource technologies.

The first trick was mapping all of these into the URL space for my httpd proxy. I wanted to put each setup behind its own URL /<connector>/<cluster>, so, e.g., /http/backup and /ajp/delta. This is typically not done, and for good reason: mod_proxy will let you do the mapping and will even clean up redirects and cookie paths from the backend, but to take care of self-referential links in the backend apps you actually have to rewrite the content that comes back. For that I installed mod_proxy_html, a non-standard module for doing such things. The reason it’s a non-standard module is that this approach is fraught with danger. But given that I mostly don’t care about how well the apps work in my demo cluster, I thought it’d be a great time to put it through its paces.

For this reason, most people make sure the URLs on the front-end and back-end are the same; and in fact, as far as I could tell, there was no way at all to make mod_jk do any mapping, so I’m setting it up as a special case – more later if it’s relevant. The best way to do this demo would probably be to set up virtual hosts on the proxy for each scenario and not require any URI mapping; if I run into enough problems I’ll probably fall back to that.

Problem 1: the AJP connector

I got things running without session replication fairly easily. My first big error with session replication was actually something else, but at first I thought it might be related to this warning in the Tomcat log:

Aug 13, 2010 9:43:00 PM org.apache.catalina.core.AprLifecycleListener init

INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/jdk1.6.0_21/jre/lib/i386/server:/usr/java/jdk1.6.0_21/jre/lib/i386:/usr/java/jdk1.6.0_21/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
Aug 13, 2010 9:43:00 PM org.apache.catalina.startup.ConnectorCreateRule _setExecutor
WARNING: Connector [org.apache.catalina.connector.Connector@1d6f122] does not support external executors. Method setExecutor(java.util.concurrent.Executor) not found.

These actually turn out to be related:

<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />

<Connector  executor="tomcatThreadPool"
port="8209"
protocol="AJP/1.3"
emptySessionPath="true"
/>

I don’t really know why the APR isn’t working properly, but a little searching turned up some obscure facts: if the APR isn’t loaded, then for the AJP connector Tomcat makes a default choice of implementation that doesn’t use the executor thread pool. So you have to explicitly set the class to use like this:

<Connector  executor="tomcatThreadPool"
port="8209"
protocol="org.apache.coyote.ajp.AjpProtocol"
emptySessionPath="true"
/>

Nice, eh? OK. But that was just an annoyance.

Problem 2: MBeans blowup

The real thing holding me back was this error:

Aug 13, 2010 9:52:16 PM org.apache.catalina.mbeans.ServerLifecycleListener createMBeans
SEVERE: createMBeans: Throwable
java.lang.NullPointerException
at org.apache.catalina.mbeans.MBeanUtils.createObjectName(MBeanUtils.java:1086)
at org.apache.catalina.mbeans.MBeanUtils.createMBean(MBeanUtils.java:504)

Now what the heck was that all about? Well, I found no love on Google. But I did eventually guess what the problem was. This probably underscores a good methodology: when working on configs, add one thing at a time and test it out before going on. If only Tomcat had a “configtest” like httpd – it takes forever to “try it out”.

In case anyone else runs into this, I’ll tell you what it was. The Cluster Howto made it pretty clear that you need to set your webapps context to be distributable for clustering to work. It wasn’t clear to me where to put that, but I didn’t want to create a context descriptor for each app on each instance. I knew you could put a <Context> element in server.xml so that’s right where I put it, right inside the <Host> element:

<Context distributable="true" />

Well, that turns out to be a bad idea. It causes the error above. So don’t do that. For the products I’m using, there’s a single context.xml that applies to all apps on the server; that’s where you want to put the distributable attribute.

Cluster membership – static

My next task was to get the cluster members in each case to recognize each other and replicate sessions. Although all the examples use multicast to do this, I wanted to try setting up static membership, because I didn’t want to look up how to enable multicast just yet. And it should be simpler, right? Well, I had a heck of a time finding this, but it looks like the StaticMembershipInterceptor is the path.

My interceptor looks like this:

<Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
<Member className="org.apache.catalina.tribes.membership.StaticMember"
port="8210" securePort="-1"
host="vm-centos-cluster-tcs.sosiouxme.lan"
domain="delta-cluster"
uniqueId="{0,1,2,3,4,5,6,7,8,9}"
/>
</Interceptor>

(with the “host” being the other host in the 2-node cluster on each side). Starting with this configuration brings an annoying warning message:

WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Channel/Interceptor/Member} Setting property ‘uniqueId’ to ‘{1,2,3,4,5,6,7,8,9,0}’ did not find a matching property.

So I guess that property has been removed and the docs not updated; didn’t seem necessary anyway given the host/port combo should be unique.

In any case, the log at first looks encouraging:

Aug 13, 2010 10:13:04 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-tcs.sosiouxme.lan:8210,vm-centos-cluster-tcs.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]
Aug 13, 2010 10:13:04 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Aug 13, 2010 10:13:04 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 31, 1, 108}:8210,{172, 31, 1, 108},8210, alive=25488670,id={66 108 -53 22 -38 -64 76 -110 -110 -54 28 -11 -126 -44 66 28 }, payload={}, command={}, domain={}, ]

Sounds like the other cluster member is found, right? I don’t know why it’s in there twice (once w/ hostname, once with IP) but I think the first is the configuration value, and the second is for when the member is actually contacted (alive=).

And indeed, for one node, later in the log as the applications are being deployed, I see the session replication appears to happen for each:

WARNING: Manager [localhost#/petclinic], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-tcs.sosiouxme.lan:8210,vm-centos-cluster-tcs.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]. This operation will timeout if no session state has been received within 60 seconds.
Aug 13, 2010 10:13:09 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#/petclinic]; session state send at 8/13/10 10:13 PM received in 186 ms.

Um, why is that a WARNING? Looks like normal operation to me; surely it should be an INFO. Whatever. The real problem is on the other node:

13-Aug-2010 22:25:16.624 INFO org.apache.catalina.ha.session.DeltaManager.start Register manager /manager to cluster element Engine with name Catalina
13-Aug-2010 22:25:16.624 INFO org.apache.catalina.ha.session.DeltaManager.start Starting clustering manager at/manager
13-Aug-2010 22:25:16.627 WARNING org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [localhost#/manager], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-ers.sosiouxme.lan:8210,vm-centos-cluster-ers.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]. This operation will timeout if no session state has been received within 60 seconds.
13-Aug-2010 22:26:16.635 SEVERE org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [localhost#/manager]: No session state send at 8/13/10 10:25 PM received, timing out after 60,009 ms.

And so on for the rest of my webapps too! And if I try to access my cluster, it seems to be hung! While I was writing this up I think I figured out the problem with that. On the first node, I had gone into the context files for the manager and host-manager apps and set distributable=”false” (doesn’t really make sense to distribute manager apps). On the second node I had not done the same. My bad, BUT:

  1. Why did it take 60 seconds to figure this out for EACH app; and
  2. Why did EVERY app, not just the non-distributable ones, fail replication (at 60 seconds apiece)?

Well, having cleared that up, my cluster with DeltaManager session replication seems to be working great.

Cluster with BackupManager, multicast

OK, here’s the surprise denouement – this seems to have just worked out of the box. I didn’t even think multicast would work without me having to tweak something in my OS (CentOS 5.5) or my router. But it did, and failover worked flawlessly when I killed a node too. Sweet.

test from a blog tool

i thought i’d try out some desktop blogging helpers so i don’t have to involve a browser for blogging. ubuntu has a few… going to see what lekhonee does for this blog (first complaint: it’d be nice if it told you your user/pass was bad before you tried to post). in addition to this blog i have an old personal one and a new one just about android that i’ll work on together with my brother – he’s hosting it.

speaking of which, he got back to me about setting the domain to what we wanted and enabling WP to write its own config so that’s fine; however the rewrite rules aren’t working. i’m used to setting up apache on fedora/RHEL so i’m not used to how things are done in ubuntu, but it didn’t take too long to figure things out (everything is called apache2 instead of httpd). curiously, there’s nothing in httpd.conf; i guess that’s fine, everything is modularized out into directories, which must make it a lot easier for tools to configure things without having to work around unrelated config items.

.htaccess isn’t being read because we have AllowOverride None in /etc/apache2/sites-enabled/android.opensourceror.org – needs to be All; and probably need to enable mod_rewrite with sudo a2enmod rewrite – so, waiting on him to do that. curses for not having my own root :-)
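The change in that site file would be something along these lines. The directory path is a guess, since the real document root isn't shown here.

```apache
# Inside the site's VirtualHost; the directory path is hypothetical
<Directory /var/www/android>
    # Let .htaccess files override settings, including mod_rewrite rules
    AllowOverride All
</Directory>
```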

i worked on my android dev intro outline and i think i have it fairly complete and organized, enough to fill the time and more for the meetup this thursday :-) even as a n00b it’s not hard to have just a little more knowledge than those i’m giving the intro to.

The post is brought to you by lekhonee v0.7