I have been a staunch user and supporter of Log4J for a long time. In fact a long time ago I wrote a contribution called Chainsaw. Till today I held the view that Log4J was bulletproof and above suspicion. However, I have discovered that I was naive.

My current assignment is a large J2EE project, made up of a number of web-applications deployed to WebSphere 6. Occasionally we have been experiencing the main web-application locking up. So I decided to investigate, rather than just blaming WebSphere and applying the latest fix-pack.

Today when the application locked up, I collected a number of JVM thread dumps. WebSphere makes it very easy to collect using the wsadmin tool (documented here). Having never looked at JVM thread dumps before I found the WebSphere Journal article WebSphere Application Server Java Dumps very useful.

I hoped to use the WebSphere ThreadAnalyzer tool to analyse the thread dumps. Unfortunately I was to quick and overlooked that it does not support WebSphere 6. Going back to square one, well Google, I found the JavaOne presentation View from the Trenches: Thread Dumps thread, which gave me pointers to a couple of useful tools.

I had good success using Samurai which is easy to run thanks to a webstart link. ThreadDumpAnalyzer is useful as it can do a time series across multiple dumps, but it is awkward to setup.

The image below is a screen shot of the Samurai application in the overview mode. You can see that a number of threads are blocked as they have a red box. Only the WebContainer : 4 thread is potentially working, signified by the white box.

Samurai

As the WebContainer : 4 thread stayed in this state permanently, I used Samurai to investigate where it was stuck. I was able to get the following stack trace.

Thread dump :"WebContainer : 4"  Shrink idle threads:on. - disable

Thread dump 1/1
"WebContainer : 4" (TID:0x104496D8, sys_thread_t:0x51658DB0, state:R, native ID:0x12C4) prio=5
  at java.net.SocketOutputStream.socketWrite0(Native Method)
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java(Compiled Code))
  at java.net.SocketOutputStream.write(SocketOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java(Compiled Code))
  at org.apache.log4j.spi.LoggingEvent.writeObject(LoggingEvent.java:398)
  at sun.reflect.GeneratedMethodAccessor2726.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
  at java.lang.reflect.Method.invoke(Method.java(Compiled Code))
  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java(Compiled Code))
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java(Compiled Code))
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java(Compiled Code))
  at org.apache.log4j.net.SocketAppender.append(SocketAppender.java:224)
  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java(Compiled Code))
  at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java(Compiled Code))
  at org.apache.log4j.Category.callAppenders(Category.java(Compiled Code))
  at org.apache.log4j.Category.forcedLog(Category.java:379)
  at org.apache.log4j.Category.debug(Category.java:248)
  ...

The stack trace reveals that the thread was blocked in the Log4J code, which was configured to log to file and a SocketAppender. Now knowing that Log4J was the problem, I did some searching to find this and this. What is very worrying is to search for the word deadlock in a recent bug-list.

I concluded that Log4J is deadlocking, and most likely because of the SocketAppender. As most of the time Chainsaw is not being run on the server, Log4J generates loads of warning messages. It will do this if you do not call setQuietMode(true).

In the end I changed the applications to call the setQuietMode(true) method on startup, and removed the SocketAppender from the production configuration. I have learnt a lot today. :-)