Hash collision DoS

Have been dealing with this vulnerability a little bit. Amusingly, my old favorite Perl has had the fix for this for years – salt the hash randomly, so an attacker can’t predict how your entries will hash. That’s really the only general fix, because while you might be able to mitigate the specific case of hashing CGI parameters, anything that takes user input in any form from potentially malicious clients could be vulnerable. That’s a pretty wide use case.

Of course, if the bad guys don’t know how the processing of input is implemented, it will be tricky for them to find the hole to exploit. So I suppose blocking the specific method (as Tomcat did by limiting the number of parameters it will hash) serves to block opportunistic attacks. But it may still leave possibilities for those who are really determined to cause havoc with a specific site.
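For reference, the cap Tomcat added is the maxParameterCount attribute on the Connector, which limits how many request parameters get parsed (and therefore hashed) at all. A sketch – the limit value here is arbitrary:

<!-- Cap parameter parsing so colliding keys can't be submitted in bulk.
     maxParameterCount is a standard Connector attribute; 1000 is just
     an example limit, not a recommendation. -->
<Connector port="8080"
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxParameterCount="1000"
/>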


Clustering Tomcat (part III): the HTTPS connector

Refer to my earlier posts on the subject for background. Here are some further explorations – not much to do with clustering, as it turns out, but everything to do with proxying.

In a number of situations you might want to set up encrypted communication between the proxy and backend. For this, Tomcat supplies an HTTPS connector (as far as I know, the only way to encrypt requests to Tomcat).

Connector setup

Setup is actually fairly simple, with just a few surprises. Mainly, the protocol setting on the connector remains “HTTP/1.1”, not some version of “HTTPS” – the protocol is the language spoken, while SSL encryption is a layer on top of it, which you specify with SSLProtocol. Basically, HTTPS connectors look like HTTP connectors with some extra SSL properties specifying the encryption:

<Connector executor="tomcatThreadPool"
 port="${https.port}"
 protocol="HTTP/1.1"
 connectionTimeout="20000"
 acceptCount="100"
 maxKeepAliveRequests="15"
 SSLEnabled="true"
 SSLProtocol="TLS"
 scheme="https"
 secure="true"
 keystorePass="changeit"
 clientAuth="false"
/>

If I really wanted to be thorough I would set clientAuth="true" and set up the proxy with a client certificate that validates against this server’s truststore, thereby guaranteeing that only the proxy can even make a request. But not right now.
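For the record, it would look something like this – a sketch, untested here; truststoreFile and truststorePass are the standard JSSE connector attributes, and the truststore path is hypothetical:

<!-- Sketch only: require the client (i.e. the proxy) to present a
     certificate that validates against this truststore. -->
<Connector executor="tomcatThreadPool"
           port="${https.port}"
           protocol="HTTP/1.1"
           SSLEnabled="true"
           SSLProtocol="TLS"
           scheme="https"
           secure="true"
           clientAuth="true"
           keystorePass="changeit"
           truststoreFile="${catalina.base}/conf/proxy-trust.jks"
           truststorePass="changeit"
/>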

Note the “scheme” and “secure” properties here. These don’t actually affect the connection; instead, they specify what Tomcat should answer when a servlet asks questions about its request. Specifically, request.getScheme() is typically used to create self-referential URLs, while request.isSecure() is used to make several security-related decisions. The defaults are those of a non-secure HTTP connector, but they can be set to whatever makes sense in context – in fact, AFAICS the scheme can be set to anything and the connector will still serve data fine.
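Incidentally, the same two properties are the usual trick when a plain-HTTP connector sits behind an SSL-terminating proxy: traffic to Tomcat is unencrypted, but servlets should still believe the request is secure. A sketch (ports are examples; proxyPort is likewise a standard Connector attribute):

<!-- Plain HTTP on the wire, but request.getScheme() returns "https" and
     request.isSecure() returns true, so self-referential URLs and
     security checks come out right. -->
<Connector port="8080"
           protocol="HTTP/1.1"
           scheme="https"
           secure="true"
           proxyPort="443"
/>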

Clustering Tomcat (part II)

Refer to my earlier post on the subject for background. Here are some more things I ran into.

Tricking mod_proxy_html

It didn’t take too long to come up against something that mod_proxy_html wouldn’t rewrite correctly. In the petcare sample app, there’s a line in a Javascript file (resources/jquery/openid-selector/js/openid-jquery.js) that looks like this:

	img_path: '/petcare/resources/jquery/openid-selector/images/',

There’s nothing to identify it as a URL; it’s just a regular old JS property, so mod_proxy_html does nothing with it. In fact, as far as I can see, there isn’t any way to tweak the module configuration to handle it, even knowing exactly what the problem is. In this case, the wrong path meant that some vital icons on the OpenID sign-in page didn’t show up.

So I went ahead and broke everything out into vhosts, like I should have all along. Look folks, if you’re reverse proxying, you really just need to have the same path on the front end and backend unless what you’re doing is dead simple. If your front-end URL space dictates a particular path, move your app to that path on the backend; don’t try to remap the path – it will almost certainly cause you headaches.
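In other words, one vhost per scenario with the same path on both sides. A minimal sketch – hostnames and ports here are examples, not my actual config:

<VirtualHost *:80>
    ServerName http-delta.example.lan
    # Same /petcare path on the front end and the backend, so nothing
    # in the returned content needs rewriting.
    ProxyPass        /petcare http://backend1.example.lan:8080/petcare
    ProxyPassReverse /petcare http://backend1.example.lan:8080/petcare
</VirtualHost>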

<Location> bleeds into VirtualHost

Having created vhosts, I noticed something interesting: <Location> sections that I defined in the main server config sometimes applied in the vhosts as well – and sometimes not. In particular, applying a handler (like the server-status handler) in the main config also applied to the vhosts, while a JkMount in the main section did not cause the vhosts to also serve requests through mod_jk. I think there were some other oddities like this, but I can’t recall them anymore.

It’s worth noting that RewriteRule directives in the main server config are explicitly not inherited by vhosts unless you specify that they should be.
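If you do want them inherited, it’s opt-in per vhost via RewriteOptions:

<VirtualHost *:80>
    ServerName example.lan
    RewriteEngine On
    # Pull in the RewriteRule directives from the main server config.
    RewriteOptions Inherit
</VirtualHost>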

mod_jk cluster partitioning

mod_jk workers have this property “domain” that you can set (ref. the Tomcat Connector guide). It’s not exactly obvious what it does – it doesn’t make sense to use the domain as the route when you’ve already got a route for each instance. I also read somewhere that mod_jk can be used to partition a cluster. Reading between the lines a little bit and trying it out, here’s what I found:

If you specify a domain, the workers with the same domain will failover like a sub-cluster within the cluster; they all still have their own routes, but if one instance fails, mod_jk will try to route to another member of the same sub-cluster. This means that you only need to replicate sessions between the members of the sub-cluster (assuming you trust the sub-cluster not to fail entirely). This could significantly cut down on the amount of session replication network traffic in a large cluster.
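A hypothetical workers.properties sketch of the idea – four nodes split into two domains, so sessions only need replicating within each pair:

# Hypothetical example: node1/node2 form sub-cluster "red", node3/node4
# form "blue". If node1 dies, mod_jk fails over within "red" first, so
# node1's sessions only ever need to exist on node2.
worker.list=lb

worker.node1.type=ajp13
worker.node1.host=node1.example.lan
worker.node1.port=8009
worker.node1.domain=red

worker.node2.type=ajp13
worker.node2.host=node2.example.lan
worker.node2.port=8009
worker.node2.domain=red

worker.node3.type=ajp13
worker.node3.host=node3.example.lan
worker.node3.port=8009
worker.node3.domain=blue

worker.node4.type=ajp13
worker.node4.host=node4.example.lan
worker.node4.port=8009
worker.node4.domain=blue

worker.lb.type=lb
worker.lb.balance_workers=node1,node2,node3,node4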

Clustering Tomcat

I’ve been meaning for some time to set up Tomcat clustering in several different ways just to see how it works. I finally got a chance today. There are guides and other bits of information elsewhere on how to do all this. I’m not gonna tell you how, sorry; the point of this post is to document my problems and how I overcame them.

A word about setup

My front-end load balancer is Apache httpd 2.2.15 (from SpringSource ERS 4.0.2) using mod_proxy_http, mod_proxy_ajp, mod_jk, and (just to keep things interesting) mod_rewrite as connectors to the backend Tomcat 6.0.28 instances. I wanted to try out 2-node Tomcat clusters (one side of each cluster ERS Tomcat, the other side tc Server 2.0.3) in three configurations: no session replication (so, sticky sessions, and you get a new session if there’s a failure); session replication via the standard DeltaManager (which replicates all sessions to all nodes); and session replication via the BackupManager (which replicates sessions to a single reference node, the “backup” for each app). Basically the idea is to test out every conceivable way to put together a cluster with standard SpringSource technologies.

The first trick was mapping all of these into the URL space of my httpd proxy. I wanted to put each setup behind its own URL /<connector>/<cluster>, e.g. /http/backup and /ajp/delta. This is typically not done, and for good reason; mod_proxy will let you do the mapping and will even clean up redirects and cookie paths from the backend, but to take care of self-referential links in the backend apps you actually have to rewrite the content that comes back. For that I installed mod_proxy_html, a non-standard module for doing such things. The reason it’s a non-standard module is that this approach is fraught with danger. But given that I mostly don’t care how well the apps work in my demo cluster, I thought it’d be a great time to put it through its paces.
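Roughly what one of those mappings looked like – backend host and port are examples, and ProxyHTMLEnable/ProxyHTMLURLMap are the mod_proxy_html 3.x directives:

# Map /http/delta on the proxy to the backend root, then rewrite links
# in returned HTML to point back through the proxy path.
ProxyPass        /http/delta/ http://backend1.example.lan:8080/
ProxyPassReverse /http/delta/ http://backend1.example.lan:8080/
ProxyHTMLEnable  On
ProxyHTMLURLMap  / /http/delta/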

For this reason, most people make sure the URLs on the front-end and back-end are the same; and in fact, as far as I could tell, there was no way at all to make mod_jk do any mapping, so I’m setting it up as a special case – more later if it’s relevant. The best way to do this demo would probably be to set up virtual hosts on the proxy for each scenario and not require any URI mapping; if I run into enough problems I’ll probably fall back to that.

Problem 1: the AJP connector

I got things running without session replication fairly easily. My first big error with session replication was actually something else, but at first I thought it might be related to this warning in the Tomcat log:

Aug 13, 2010 9:43:00 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/jdk1.6.0_21/jre/lib/i386/server:/usr/java/jdk1.6.0_21/jre/lib/i386:/usr/java/jdk1.6.0_21/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
Aug 13, 2010 9:43:00 PM org.apache.catalina.startup.ConnectorCreateRule _setExecutor
WARNING: Connector [org.apache.catalina.connector.Connector@1d6f122] does not support external executors. Method setExecutor(java.util.concurrent.Executor) not found.

These actually turn out to be related:

<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />

<Connector executor="tomcatThreadPool"
           port="8209"
           protocol="AJP/1.3"
           emptySessionPath="true"
/>

I don’t really know why the APR isn’t loading properly, but a little searching turned up an obscure fact: if the APR isn’t loaded, then for the AJP connector Tomcat makes a default choice of implementation that doesn’t support the executor thread pool. So you have to explicitly set the protocol class to use, like this:

<Connector executor="tomcatThreadPool"
           port="8209"
           protocol="org.apache.coyote.ajp.AjpProtocol"
           emptySessionPath="true"
/>

Nice, eh? OK. But that was just an annoyance.

Problem 2: MBeans blowup

The real thing holding me back was this error:

Aug 13, 2010 9:52:16 PM org.apache.catalina.mbeans.ServerLifecycleListener createMBeans
SEVERE: createMBeans: Throwable
java.lang.NullPointerException
        at org.apache.catalina.mbeans.MBeanUtils.createObjectName(MBeanUtils.java:1086)
        at org.apache.catalina.mbeans.MBeanUtils.createMBean(MBeanUtils.java:504)

Now what the heck was that all about? Well, I found no love on Google. But I did eventually guess what the problem was. This probably underscores a good methodology: when working on configs, add one thing at a time and test it out before going on. If only Tomcat had a “configtest” like httpd – it takes forever to “try it out”.

In case anyone else runs into this, I’ll tell you what it was. The Cluster Howto made it pretty clear that you need to set your webapp contexts to be distributable for clustering to work. It wasn’t clear to me where to put that, and I didn’t want to create a context descriptor for each app on each instance. I knew you could put a <Context> element in server.xml, so that’s right where I put it, right inside the <Host> element:

<Context distributable="true" />

Well, that turns out to be a bad idea. It causes the error above. So don’t do that. For the products I’m using, there’s a single context.xml that applies to all apps on the server; that’s where you want to put the distributable attribute.
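Concretely, that looks like this – the WatchedResource element is just what ships in the default conf/context.xml:

<!-- conf/context.xml: applies to every webapp on the instance -->
<Context distributable="true">
    <WatchedResource>WEB-INF/web.xml</WatchedResource>
</Context>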

Cluster membership – static

My next task was to get the cluster members in each case to recognize each other and replicate sessions. Although all the examples use multicast to do this, I wanted to try setting up static membership, because I didn’t want to look up how to enable multicast just yet. And it should be simpler, right? Well, I had a heck of a time finding this, but it looks like the StaticMembershipInterceptor is the path.

My interceptor looks like this:

<Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
  <Member className="org.apache.catalina.tribes.membership.StaticMember"
          port="8210" securePort="-1"
          host="vm-centos-cluster-tcs.sosiouxme.lan"
          domain="delta-cluster"
          uniqueId="{0,1,2,3,4,5,6,7,8,9}"
  />
</Interceptor>

(with the “host” being the other host in the 2-node cluster on each side). Starting with this configuration brings an annoying warning message:

WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Channel/Interceptor/Member} Setting property 'uniqueId' to '{1,2,3,4,5,6,7,8,9,0}' did not find a matching property.

So I guess that property has been removed and the docs not updated; it didn’t seem necessary anyway, given that the host/port combination should already be unique.

In any case, the log at first looks encouraging:

Aug 13, 2010 10:13:04 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-tcs.sosiouxme.lan:8210,vm-centos-cluster-tcs.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]
Aug 13, 2010 10:13:04 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Aug 13, 2010 10:13:04 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 31, 1, 108}:8210,{172, 31, 1, 108},8210, alive=25488670,id={66 108 -53 22 -38 -64 76 -110 -110 -54 28 -11 -126 -44 66 28 }, payload={}, command={}, domain={}, ]

Sounds like the other cluster member is found, right? I don’t know why it’s in there twice (once with the hostname, once with the IP), but I think the first is the configured value, and the second is from when the member is actually contacted (note the alive= value).

And indeed, for one node, later in the log as the applications are being deployed, I see the session replication appears to happen for each:

WARNING: Manager [localhost#/petclinic], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-tcs.sosiouxme.lan:8210,vm-centos-cluster-tcs.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]. This operation will timeout if no session state has been received within 60 seconds.
Aug 13, 2010 10:13:09 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#/petclinic]; session state send at 8/13/10 10:13 PM received in 186 ms.

Um, why is that a WARNING? It looks like normal operation to me; surely it should be an INFO. Whatever. The real problem is on the other node:

13-Aug-2010 22:25:16.624 INFO org.apache.catalina.ha.session.DeltaManager.start Register manager /manager to cluster element Engine with name Catalina
13-Aug-2010 22:25:16.624 INFO org.apache.catalina.ha.session.DeltaManager.start Starting clustering manager at/manager
13-Aug-2010 22:25:16.627 WARNING org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [localhost#/manager], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://vm-centos-cluster-ers.sosiouxme.lan:8210,vm-centos-cluster-ers.sosiouxme.lan,8210, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={100 101 108 116 97 45 99 108 117 …(13)}, ]. This operation will timeout if no session state has been received within 60 seconds.
13-Aug-2010 22:26:16.635 SEVERE org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [localhost#/manager]: No session state send at 8/13/10 10:25 PM received, timing out after 60,009 ms.

And so on for the rest of my webapps too! And if I tried to access my cluster, it seemed to be hung! While I was writing this up, I think I figured out the problem. On the first node, I had gone into the context files for the manager and host-manager apps and set distributable="false" (it doesn’t really make sense to distribute the manager apps). On the second node I had not done the same. My bad, BUT:

  1. Why did it take 60 seconds to figure this out for EACH app; and
  2. Why did EVERY app, not just the non-distributable ones, fail replication (at 60 seconds apiece)?

Well, having cleared that up, my cluster with DeltaManager session replication seems to be working great.

Cluster with BackupManager, multicast

OK, here’s the surprise denouement – this seems to have just worked out of the box. I didn’t even think multicast would work without me having to tweak something in my OS (CentOS 5.5) or my router. But it did, and failover worked flawlessly when I killed a node too. Sweet.
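For reference, the BackupManager setup differs from the DeltaManager one only in the nested Manager element. A minimal sketch with the default multicast membership, everything else left at SimpleTcpCluster defaults (class names per the Tomcat 6 cluster docs, not my exact config):

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <!-- Replicates each session to a single backup node instead of
       to every node in the cluster. -->
  <Manager className="org.apache.catalina.ha.session.BackupManager"/>
</Cluster>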