现象描述:
Weblogic启动后,10到30分钟就会hang住,应用和管理控制台都无法访问。强制kill -9 pid后端口无法释放,使用rmsock 命令查看端口显示Wait for exiting processes to be cleaned up before removing the socket。
分析及处理过程
1. 用ps –ef | grep java找到weblogic进程,每隔三分种执行kill -3 pid,在domain目录下生成javacore文件
2. 分析weblogic日志,发现如下内容
<Aug 21, 2009 4:33:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: ‘1′ for queue: ‘weblogic.kernel.Default (self-tuning)’ has been busy for “620″ seconds working on the request
“weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de”, which is more than the configured time (StuckThreadMaxTime) of “600″ seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
<Aug 21, 2009 4:34:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: ‘1′ for queue: ‘weblogic.kernel.Default (self-tuning)’ has been busy for “680″ seconds working on the request
“weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de”, which is more than the configured time (StuckThreadMaxTime) of “600″ seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
3. 用IBM Thread and Monitor Dump Analyzer for java分析刚才生成的thread dump,找到如下两个线程信息:
3XMTHREADINFO “[ACTIVE] ExecuteThread: ‘5′ for queue: ‘weblogic.kernel.Default (self-tuning)’” TID:0×39CBED00, j9thread_t:0×3751C83C, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xCE1DB, native priority:0×5, native policy:UNKNOWN)
4XESTACKTRACE at java/net/PlainSocketImpl.socketClose0(Native Method)
4XESTACKTRACE at java/net/PlainSocketImpl.socketPreClose(PlainSocketImpl.java:706)
4XESTACKTRACE at java/net/PlainSocketImpl.close(PlainSocketImpl.java:540)
4XESTACKTRACE at java/net/SocksSocketImpl.close(SocksSocketImpl.java:1041)
4XESTACKTRACE at java/net/Socket.close(Socket.java:1343)
4XESTACKTRACE at weblogic/socket/SocketMuxer.closeSocket(SocketMuxer.java:475)
4XESTACKTRACE at weblogic/socket/SocketMuxer.cancelIo(SocketMuxer.java:813)
4XESTACKTRACE at weblogic/socket/SocketMuxer$TimerListenerImpl.timerExpired(SocketMuxer.java:1021(Compiled Code))
4XESTACKTRACE at weblogic/timers/internal/TimerImpl.run(TimerImpl.java:273(Compiled Code))
4XESTACKTRACE at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)
3XMTHREADINFO “ExecuteThread: ‘7′ for queue: ‘weblogic.socket.Muxer’” TID:0×35381D00, j9thread_t:0×35385864, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xB916F, native priority:0×5, native policy:UNKNOWN)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.poll(Native Method)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102(Compiled Code))
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
4. 执行线程只有这两个是running状态,一个做CLOSE(),一个做POLL()。别的都是blocked或者wait状态。
5. 经过metalink查询以及和800支持人员确认,这是Weblogic在AIX的JVM上由来已久的bug,从8.1.4就开始在不同版本间出现。原因是IBM的JVM底层socket实现和weblogic配合问题,需要打patch CR370915_1030GA.jar解决。
操作过程
1.在weblogic的启动脚本中,找到CLASSPATH一行
2.在CLASSPATH变量的第一位添加补丁jar包
Eg: CLASSPATH=”${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}”
—>
CLASSPATH=/路径/CR370915_1030GA.jar:”${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}”
3.以上操作仅对这个domain起作用,为了对所有domain起作用,可以添加到common/bin/的目录中的commEnv.sh文件中WEBLOGIC_CLASSPATH=最前面
总结
这个bug在weblgoic和IBM的JVM相组合的平台上出现较为普遍,如果出现相关日志信息,基本可以断定需要打CR370915补丁。
--转自