1. Overview
Analysis of the logs recorded while Tuxedo in the XX system was having problems.
2. Current system configuration
Tuxedo 8.1 on HP-UX 11
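3. Log analysis
1) Service with a long execution time
Tuxedo was monitored on January 13 and was generally stable, but one service, blpncpolicy, took a long time to execute. As shown below, the server with ID 6402 stayed in blpncpolicy across repeated psr samples, and its RqDone count did not increase.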
> psr -i 6402
Prog Name Queue Name Grp Name ID RqDone Load Done Current Service
--------- ---------- -------- -- ------ --------- ---------------
blpncpolicy_cs blpncpolic+ blprpp 6402 9838 491900 blpncpolicy
> psr -i 6402
Prog Name Queue Name Grp Name ID RqDone Load Done Current Service
--------- ---------- -------- -- ------ --------- ---------------
blpncpolicy_cs blpncpolic+ blprpp 6402 9838 491900 blpncpolicy
> psr -i 6402
Prog Name Queue Name Grp Name ID RqDone Load Done Current Service
--------- ---------- -------- -- ------ --------- ---------------
blpncpolicy_cs blpncpolic+ blprpp 6402 9838 491900 blpncpolicy
> psr -i 6402
Prog Name Queue Name Grp Name ID RqDone Load Done Current Service
--------- ---------- -------- -- ------ --------- ---------------
blpncpolicy_cs blpncpolic+ blprpp 6402 9838 491900 blpncpolicy
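The check done by eye above (same Current Service, unchanged RqDone across samples) can be sketched in Python. This is only an illustration: the helper names parse_psr_row and looks_stuck are hypothetical, the column order is taken from the psr header shown above, and it assumes an idle server shows a parenthesized value such as "( IDLE )" in the Current Service column.

```python
from typing import NamedTuple


class PsrSample(NamedTuple):
    server_id: int
    rq_done: int
    current_service: str


def parse_psr_row(row: str) -> PsrSample:
    """Parse one data row of psr output (whitespace-separated columns)."""
    fields = row.split()
    # Columns: Prog Name, Queue Name, Grp Name, ID, RqDone, Load Done, Current Service
    return PsrSample(int(fields[3]), int(fields[4]), " ".join(fields[6:]))


def looks_stuck(samples: list[PsrSample]) -> bool:
    """True if RqDone never changed and the same service stayed current in every sample."""
    if len(samples) < 2:
        return False
    first = samples[0]
    # Assumption: an idle server is displayed with a parenthesized marker.
    if first.current_service.startswith("("):
        return False
    return all(
        s.rq_done == first.rq_done and s.current_service == first.current_service
        for s in samples
    )


# The four identical samples captured above for server 6402.
rows = ["blpncpolicy_cs blpncpolic+ blprpp 6402 9838 491900 blpncpolicy"] * 4
samples = [parse_psr_row(r) for r in rows]
print(looks_stuck(samples))
```

A result of True only says the server did not complete a request between samples; it does not say why, which is what the timestamps recommended later are for.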
2) Server terminated because service execution timed out
125549.server_1!BBL.20327: CMDTUX_CAT:1667: WARN: Server(28501) processing terminated after SVCTIMEOUT
125549.server_1!BBL.20327: LIBTUX_CAT:541: WARN: Server blprpc/103 terminated
3) Semaphore call failures
50848.server_1!WSH.21190: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36
150848.server_1!dbprpo_csvr.20651: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36
150848.server_1!blutil_asvr.20693: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36
150848.server_1!blpncpolicy_csvr.20914: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36
150848.server_1!blutih_asvr.20466: ERROR: msgrcv err(LIBTUX_CAT:666: ERROR: Message operation failed because the queue was removed):
errno=36,qid=35652042,buf=1074273000,bytes=1411,type=-1073741824,flag=0
4) Several servers report "tpreturn message send blocked, will try file transfer"
blpncpolicy_csvr.20915: LIBTUX_CAT:1285: WARN: tpreturn message send blocked, will try file transfer
blprpj_csvr.23782: LIBTUX_CAT:1285: WARN: tpreturn message send blocked, will try file transfer
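To see which message types dominate a larger log, the entries can be tallied by catalog, message number, and severity. The sketch below is a minimal illustration: the regular expression is inferred only from the excerpts in this report, and the function name tally is hypothetical.

```python
import re
from collections import Counter

# Matches "<CATALOG>_CAT:<number>: <SEVERITY>:" anywhere in a log line.
# The pattern is inferred from the excerpts above, not from a format specification.
MSG_RE = re.compile(r"(?P<cat>[A-Z]+_CAT):(?P<num>\d+): (?P<sev>[A-Z]+):")


def tally(lines):
    """Count log lines per (catalog, message number, severity)."""
    counts = Counter()
    for line in lines:
        m = MSG_RE.search(line)
        if m:
            counts[(m["cat"], int(m["num"]), m["sev"])] += 1
    return counts


sample = [
    "125549.server_1!BBL.20327: CMDTUX_CAT:1667: WARN: Server(28501) processing terminated after SVCTIMEOUT",
    "125549.server_1!BBL.20327: LIBTUX_CAT:541: WARN: Server blprpc/103 terminated",
    "150848.server_1!dbprpo_csvr.20651: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36",
    "150848.server_1!blutil_asvr.20693: LIBTUX_CAT:752: ERROR: semop system call failure for semaphore 393257, errno 36",
    "blpncpolicy_csvr.20915: LIBTUX_CAT:1285: WARN: tpreturn message send blocked, will try file transfer",
]

for key, count in tally(sample).most_common():
    print(key, count)
```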
4. Kernel parameters
Operating system kernel parameter: msgmnb is 16K
5. Analysis and recommendations
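1) Add timestamps inside the services blpncpolicy and blprpc to pin down where the long execution time is spent. As a temporary measure, the execution time limit for blprpc (the SVCTIMEOUT named in the log) can also be raised in the ubb configuration file.
2) Given the semaphore call failures and the "tpreturn message send blocked, will try file transfer" warnings in the log, together with the current msgmnb value, we recommend increasing msgmnb to 512K.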
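For a rough sense of scale, the arithmetic behind the msgmnb recommendation can be sketched as follows. The 1411-byte message size is an assumption borrowed from the bytes=1411 field in the msgrcv log line, used here only as an illustrative figure; real message sizes vary, and per-message kernel overhead is ignored.

```python
KIB = 1024

# Assumption: illustrative message size taken from bytes=1411 in the msgrcv log line.
MESSAGE_BYTES = 1411


def messages_that_fit(msgmnb_bytes: int, message_bytes: int) -> int:
    """Whole messages of the given size that fit in one queue of msgmnb bytes."""
    return msgmnb_bytes // message_bytes


for label, size in (("16K", 16 * KIB), ("512K", 512 * KIB)):
    print(label, messages_that_fit(size, MESSAGE_BYTES))
# prints:
# 16K 11
# 512K 371
```

Under these assumptions a 16K queue fills after about eleven pending messages of that size, which fits the picture of senders blocking under load; 512K leaves room for several hundred.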