1. Overview
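At the customer's request, we analyzed XX's recent Tuxedo server and client logs again. Since the last analysis, an additional Tuxedo server has been added and most client connections have been load-balanced onto it. We also monitored the operating system's network state with netstat -p tcp and reviewed how the system is running.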
2. Current system configuration
Tuxedo 8.1 on HP-UX 11
3. Server log analysis and TCP monitoring
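This morning's monitoring shows the "connect requests dropped due to full queue" counter in the netstat -p tcp output growing much more slowly than before. The new machine shows the clearest improvement: its counter rose by only 21 over the whole morning. However, the ULOG still contains many servers that were killed and restarted because their service processing exceeded SVCTIMEOUT: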
hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(13689) processing terminated with SIGKILL after SVCTIMEOUT
064445.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server dbprpo/942 terminated
064445.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server dbprpo/942 being restarted
082247.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(26754) processing terminated with SIGKILL after SVCTIMEOUT
082257.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server blprpo/260 terminated
082257.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server blprpo/260 being restarted
082738.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(29987) processing terminated with SIGKILL after SVCTIMEOUT
082758.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server dbprpt/1592 terminated
082758.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server dbprpt/1592 being restarted
082948.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(29196) processing terminated with SIGKILL after SVCTIMEOUT
082958.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server blprpc/151 terminated
082958.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server blprpc/151 being restarted
160003.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(19596) processing terminated with SIGKILL after SVCTIMEOUT
160023.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server blprpt/341 terminated
160023.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server blprpt/341 being restarted
090859.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(100) processing terminated with SIGKILL after SVCTIMEOUT
090859.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server dbprpp/1492 terminated
090859.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server dbprpp/1492 being restarted
091910.hp8420!BBL.2872: CMDTUX_CAT:1836: WARN: Server(952) processing terminated with SIGKILL after SVCTIMEOUT
092005.hp8420!BBL.2872: LIBTUX_CAT:541: WARN: Server dbprpc/680 terminated
092005.hp8420!BBL.2872: LIBTUX_CAT:557: INFO: Server dbprpc/680 being restarted
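Tallying these restarts per server shows which ones hit SVCTIMEOUT most often, which helps when acting on recommendation 1 below. The following is a minimal Python sketch. It assumes the Tuxedo 8.1 ULOG layout in the excerpt above, where each "processing terminated with SIGKILL after SVCTIMEOUT" line is followed by the matching "Server name/srvid terminated" line. The function name count_svctimeout_kills and the command-line usage are illustrative only and are not part of Tuxedo.

```python
import re
import sys
from collections import Counter

# Line layouts taken from the Tuxedo 8.1 ULOG excerpt above.
KILL_RE = re.compile(r"Server\((\d+)\) processing terminated with SIGKILL after SVCTIMEOUT")
TERM_RE = re.compile(r"Server (\S+)/(\d+) terminated")


def count_svctimeout_kills(lines):
    """Count SVCTIMEOUT kills per server executable name.

    Assumption: the first 'Server name/srvid terminated' line after a SIGKILL
    line refers to the process that was just killed, as in the excerpt above.
    Terminations that are not preceded by a SIGKILL line are not counted.
    """
    counts = Counter()
    pending_kill = False
    for line in lines:
        if KILL_RE.search(line):
            pending_kill = True
            continue
        match = TERM_RE.search(line)
        if match and pending_kill:
            counts[match.group(1)] += 1
            pending_kill = False
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Illustrative usage: python svctimeout_report.py ULOG.mmddyy
    with open(sys.argv[1], encoding="latin-1") as ulog:
        for server, kills in count_svctimeout_kills(ulog).most_common():
            print(f"{server}\t{kills}")
```

Run against the excerpt above, the sketch counts each of dbprpo, blprpo, dbprpt, blprpc, blprpt, dbprpp and dbprpc once. Over a full day's ULOG, the counts show which servers need a larger SVCTIMEOUT or optimization first.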
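We checked with XX's customer. They report that the client now responds quickly and that they are satisfied with the current speed. Our preliminary analysis is that the original machine had run out of capacity, including network capacity. Adding a second server to share request processing has had a clear effect.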
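The growth of the "connect requests dropped due to full queue" counter can be expressed as a rate, which makes before/after comparisons between machines easier. The following is a minimal Python sketch. It assumes that netstat -p tcp prints this counter as an integer at the start of the line containing that phrase; verify this against the real HP-UX output before relying on it. The helper names are hypothetical. snapshot() needs Python 3.7 or later for capture_output/text.

```python
import re
import subprocess

# Assumed layout: "<count> connect requests dropped due to full queue" on one line.
DROP_RE = re.compile(r"^\s*(\d+)\s+connect requests dropped due to full queue", re.MULTILINE)


def dropped_full_queue(netstat_output: str) -> int:
    """Extract the 'dropped due to full queue' counter from netstat -p tcp output."""
    match = DROP_RE.search(netstat_output)
    if match is None:
        raise ValueError("'connect requests dropped due to full queue' not found")
    return int(match.group(1))


def drops_per_hour(before: str, after: str, elapsed_seconds: float) -> float:
    """Average drop rate between two snapshots taken elapsed_seconds apart."""
    delta = dropped_full_queue(after) - dropped_full_queue(before)
    if delta < 0:
        # The counter is cumulative, so a decrease means a reboot or counter reset.
        raise ValueError("counter went backwards (reboot or counter reset?)")
    return delta * 3600.0 / elapsed_seconds


def snapshot() -> str:
    """Capture netstat -p tcp output on the local host (HP-UX option syntax)."""
    result = subprocess.run(["netstat", "-p", "tcp"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, take snapshot() at the start and end of a monitoring window and pass both outputs to drops_per_hour. Over a hypothetical four-hour window, an increase of 21 corresponds to 5.25 drops per hour.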
4. Parameters on the newly added server
tcp_conn_request_max: 4096
tcp_syn_rcvd_max: 500
tcp_time_wait_interval: 60000 (ms)
ulimit: 10240
msgmnb: 64k
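tcp_time_wait_interval is expressed in milliseconds. By Little's law, the number of connections this host holds in TIME_WAIT in steady state is roughly the rate at which it actively closes connections multiplied by that interval. Halving the interval therefore roughly halves that number. The following is a minimal sketch of the arithmetic. The close rate of 50 per second is a hypothetical figure, not a measurement from this system, and only connections that this host closes first enter TIME_WAIT on this host.

```python
def time_wait_estimate(close_rate_per_sec: float, time_wait_interval_ms: int) -> float:
    """Little's law, L = lambda * W, with W = TIME_WAIT duration in seconds.

    close_rate_per_sec counts only connections actively closed by this host,
    since only those enter TIME_WAIT here.
    """
    return close_rate_per_sec * (time_wait_interval_ms / 1000.0)


if __name__ == "__main__":
    rate = 50.0  # hypothetical active-close rate, connections per second
    for interval_ms in (60000, 30000):
        print(interval_ms, time_wait_estimate(rate, interval_ms))
```

At the hypothetical rate, the estimate drops from 3000 to 1500 sockets in TIME_WAIT when the interval goes from 60000 to 30000.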
5. Recommendations
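1) Based on the actual execution times, increase SVCTIMEOUT for dbprpo, blprpo, dbprpc and the other servers killed above. Alternatively, optimize their services to reduce execution time.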
2) Raise tcp_syn_rcvd_max from 500 to 1000.
3) Lower tcp_time_wait_interval from 60000 to 30000.
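4) Ask the customer to keep monitoring the system and to check that all business transactions are processed correctly. The ULOG still shows many services timing out. The user experience has improved, but correct business processing is not yet guaranteed and needs further observation.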
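Recommendations 2) and 3) concern HP-UX TCP tunables, which are normally changed with ndd. The following Python sketch only generates the commands and the matching /etc/rc.config.d/nddconf entries from the values in sections 4 and 5; it does not execute anything. It assumes the usual nddconf layout (TRANSPORT_NAME[i], NDD_NAME[i], NDD_VALUE[i]) and a starting index of 0. Adjust start_index if nddconf already has entries, and check the generated lines against the system's own nddconf template. A change made with ndd -set alone does not survive a reboot, which is why the sketch also emits nddconf lines.

```python
# Values from section 4 (current) and section 5 (recommended).
CURRENT = {
    "tcp_conn_request_max": 4096,
    "tcp_syn_rcvd_max": 500,
    "tcp_time_wait_interval": 60000,  # milliseconds
}
RECOMMENDED = {
    "tcp_syn_rcvd_max": 1000,
    "tcp_time_wait_interval": 30000,  # milliseconds
}


def changed_params(current, recommended):
    """Return (name, value) pairs whose recommended value differs from the current one."""
    return [(n, v) for n, v in sorted(recommended.items()) if current.get(n) != v]


def ndd_commands(changes):
    """Runtime changes; these take effect immediately but are lost at reboot."""
    return [f"ndd -set /dev/tcp {name} {value}" for name, value in changes]


def nddconf_lines(changes, start_index=0):
    """Entries for /etc/rc.config.d/nddconf (start_index=0 is an assumption)."""
    lines = []
    for i, (name, value) in enumerate(changes, start=start_index):
        lines += [f"TRANSPORT_NAME[{i}]=tcp", f"NDD_NAME[{i}]={name}", f"NDD_VALUE[{i}]={value}"]
    return lines


if __name__ == "__main__":
    changes = changed_params(CURRENT, RECOMMENDED)
    print("\n".join(ndd_commands(changes)))
    print("\n".join(nddconf_lines(changes)))
```

Applying the printed ndd commands on both machines first, and editing nddconf only after the effect has been observed, keeps a bad value from persisting across a reboot.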