应局方要求协助局方工程师和第三方工程师调查Weblogic调用Tuxedo失败的原因。
Tuxedo方工程师在查看ULOG日志时出现大量的重新建立域连接的日志信息,如下:
170149.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1553: INFO: New connection from domain <WTC_PHS_01> accepted, drop old connection!
170149.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1572: INFO: Duplicated connection to remote domain <WTC_PHS_01>, drop new connection!
170249.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1553: INFO: New connection from domain <WTC_PHS_01> accepted, drop old connection!
170249.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1572: INFO: Duplicated connection to remote domain <WTC_PHS_01>, drop new connection!
170349.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1553: INFO: New connection from domain <WTC_PHS_01> accepted, drop old connection!
170349.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1572: INFO: Duplicated connection to remote domain <WTC_PHS_01>, drop new connection!
170449.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1553: INFO: New connection from domain <WTC_PHS_01> accepted, drop old connection!
170449.jfxz2!GWTDOMAIN.925698.1.0: LIBGWT_CAT:1572: INFO: Duplicated connection to remote domain <WTC_PHS_01>, drop new connection!
上述日志信息为当前域连接存在的前提下,远程域又一次建立了新的请求然后丢弃之前的连接。
查看WTC_PHS_01域所在的Weblogic系统,发现WTC_PHS_01所在的日志中存在大量建立连接的信息:
####<2013-2-21
上午08时54分51秒 GMT+08:00> <Info> <Health> <tyjk1>
<PHS_OUT_162> <weblogic.GCMonitor> <<anonymous>>
<> <> <1361408091466> <BEA-310002> <67% of the
total memory in the server is free>
####<2013-2-21
上午08时54分59秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '12' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408099750> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay70.>
####<2013-2-21
上午08时55分32秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '11' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408132615> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay20.>
####<2013-2-21
上午08时55分38秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '12' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408138396> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay80.>
####<2013-2-21
上午08时55分51秒 GMT+08:00> <Info> <Health> <tyjk1>
<PHS_OUT_162> <weblogic.GCMonitor> <<anonymous>>
<> <> <1361408151476> <BEA-310002> <55% of the
total memory in the server is free>
####<2013-2-21
上午08时55分59秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '17' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408159775> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay70.>
####<2013-2-21
上午08时56分32秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '2' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408192649> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay20.>
####<2013-2-21
上午08时56分38秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '13' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408198428> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay80.>
####<2013-2-21
上午08时56分51秒 GMT+08:00> <Info> <Health> <tyjk1>
<PHS_OUT_162> <weblogic.GCMonitor> <<anonymous>>
<> <> <1361408211486> <BEA-310002> <82% of the
total memory in the server is free>
####<2013-2-21
上午08时56分59秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '13' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408219801> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay70.>
####<2013-2-21
上午08时57分32秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '6' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408252675> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay20.>
####<2013-2-21
上午08时57分38秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '13' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408258455> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay80.>
####<2013-2-21
上午08时57分51秒 GMT+08:00> <Info> <Health> <tyjk1>
<PHS_OUT_162> <weblogic.GCMonitor> <<anonymous>>
<> <> <1361408271496> <BEA-310002> <70% of the
total memory in the server is free>
####<2013-2-21
上午08时57分59秒 GMT+08:00> <Info> <WTC> <tyjk1>
<PHS_OUT_162> <[ACTIVE] ExecuteThread: '16' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>>
<> <> <1361408279825> <BEA-180087> <Established a
connection to a remote domain TDOM_cbspay70.>
可以看到对以上几个远程域出现频繁的建立域连接的请求。
从上述日志分析可以看到主要是由于WEblogic侧频繁的建立远程域连接请求导致Tuxedo侧出现丢弃新连接的日志信息。
分析日志信息后发现频繁建立连接的远程域只有TDOM_cbspay20,TDOM_cbspay70,TDOM_cbspay80,而TDOM_cbspay10并未出现重连的情况。
与其他工程师分析Tuxedo配置文件发现配置信息相同,并未发现不同配置信息。
而将Tuxedo连接策略修改为ON_DEMAND,并未奏效。
根据metalink查询错误号可以查到BUG,BUG内容如下:
applies to:
Oracle Weblogic
Server - Version 10.3 and later
Information in
this document applies to any platform.
Symptoms
In WebLogic
Server (WLS) server.log there are too many notices about WTC connection
creation as below:
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689192781> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689193814> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689194427> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689195462> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689196207> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689197057> <BEA-180087> <Established a connection to a
remote domain rdom.>
####<2012-8-23
timestamp> <Info> <WTC> <machA1> <serverA1>
<[STANDBY] ExecuteThread: '103' for queue: 'weblogic.kernel.Default
(self-tuning)'> <<WLS Kernel>> <> <>
<1345689197737> <BEA-180087> <Established a connection to a
remote domain rdom.>
The remote
Tuxedo ULOG contains the following message:
164848.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
164948.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
165048.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
165148.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
165248.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
165348.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
165348.server!GWTDOMAIN.13730.4.0:
LIBGWT_CAT:1553: INFO: New connection from domain <rdom_wtc> accepted,
drop old connection!
Cause
When using
Tuxedo and WTC with ON_STARTUP connection policy configured on both sides:
If both nodes
try to establish connection at the same time, Tuxedo will reject the first
connection and then keep the second.
However, WTC
will get a disconnection event from the first connection. The first
connection status will be none, service is unadvertised,
and WTC will
kepp retrying to connect the Tuxedo
connection(up to the configured retry setting).
Solution
Apply the WTC
patch for BUG 13556740 which has been created for your release of WLS.
References
BUG:13556740 -
WTC CAN'T HANDLE DUPLICATE CONNECTION
以上BUG信息比较符合现在所在的描述。
而该Weblogic于12年10月21日重启后并未再次进行重启,而且同一个集群中的另外一个服务并未出现这种情况。
因此在不方便打补丁的情况下建议先重启,如果问题复现在进行补丁升级。
该贴被funny编辑于2014-2-17 23:40:52该贴被funny编辑于2014-2-17 23:42:03该贴被funny编辑于2014-2-17 23:44:07该贴被funny编辑于2014-2-18 0:14:43