Hi,
We are using Oracle Tuxedo version 10.3.0.0, 64-bit, Patch Level 095, on AIX 6.1 on a POWER7 machine. We have four domains: two domains have an MP configuration (master-slave) and two are individual domains. Local and remote services are published in the domains. During test runs we found that the domains keep disconnecting from each other and do not connect properly again, although we get a reconnection message in the ULOG.
Let me present one scenario. I got the following from the ULOG:
071515.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1130: INFO: Disconnected from domain (domainid=<PATDom2>)
071515.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1354: INFO: Retrying domain (domainid=<PATDom2>) every 60 seconds
071515.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071515.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071515.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071552.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071552.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071552.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071553.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1129: INFO: Connection established with domain (domainid=<PATDom2>)
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1130: INFO: Disconnected from domain (domainid=<PATDom1>)
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1354: INFO: Retrying domain (domainid=<PATDom1>) every 60 seconds
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071602.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071652.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1502: WARN: Message with TPNOREPLY to service ..TMS dropped - network down
071653.uaix3034!GWTDOMAIN.18022630.772.0: LIBGWT_CAT:1129: INFO: Connection established with domain (domainid=<PATDom1>)
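To put numbers on an excerpt like the one above, here is a minimal Python sketch that scans GWTDOMAIN ULOG lines for the LIBGWT_CAT:1130 (disconnected), LIBGWT_CAT:1129 (connection established) and LIBGWT_CAT:1502 (message dropped) entries and reports how long each domain was down. It assumes the line layout shown above (hhmmss.host!process...: CATALOG:number: LEVEL: text) and that all lines come from a single day's ULOG (no midnight wrap). Because the 1502 lines carry no domainid, it charges each dropped message to every domain that is down at that moment, which is an assumption. The `summarize` function, the `Outage` class and the script name are hypothetical and not part of Tuxedo.

```python
import re
import sys
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Layout assumed from the ULOG excerpt: hhmmss.host!process.pid.thread.ctx: CATALOG:num: LEVEL: text
ULOG_RE = re.compile(
    r"^(?P<hh>\d{2})(?P<mm>\d{2})(?P<ss>\d{2})\.(?P<host>[^!]+)!(?P<proc>[^.]+)\.[^:]*:\s*"
    r"(?P<cat>[A-Z_]+):(?P<num>\d+):\s*(?P<level>\w+):\s*(?P<text>.*)$"
)
DOMAIN_RE = re.compile(r"domainid=<([^>]+)>")

DISCONNECTED, CONNECTED, DROPPED = 1130, 1129, 1502  # LIBGWT_CAT numbers seen in the excerpt


@dataclass
class Outage:  # hypothetical helper type, not part of Tuxedo
    domain: str
    start: int                 # seconds since midnight of the disconnect
    end: Optional[int] = None  # seconds since midnight of the reconnect
    dropped: int = 0           # LIBGWT_CAT:1502 lines seen while this domain was down


def summarize(lines: Iterable[str]) -> Tuple[List[Outage], List[Outage]]:
    """Return (outages that ended, outages still open) from GWTDOMAIN ULOG lines."""
    down: Dict[str, Outage] = {}
    finished: List[Outage] = []
    for line in lines:
        m = ULOG_RE.match(line.strip())
        if not m or m["cat"] != "LIBGWT_CAT":
            continue  # skip lines from other catalogs or with an unexpected layout
        t = int(m["hh"]) * 3600 + int(m["mm"]) * 60 + int(m["ss"])
        num = int(m["num"])
        dom = DOMAIN_RE.search(m["text"])
        if num == DISCONNECTED and dom:
            down[dom.group(1)] = Outage(dom.group(1), t)
        elif num == CONNECTED and dom and dom.group(1) in down:
            outage = down.pop(dom.group(1))
            outage.end = t
            finished.append(outage)
        elif num == DROPPED:
            # 1502 lines carry no domainid, so charge the drop to every domain currently down
            for outage in down.values():
                outage.dropped += 1
    return finished, list(down.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as ulog:
        done, still_down = summarize(ulog)
    for o in done:
        print(f"{o.domain}: down {o.end - o.start}s, {o.dropped} TPNOREPLY messages dropped")
    for o in still_down:
        print(f"{o.domain}: still down, {o.dropped} messages dropped so far")
```

Run as `python ulog_outages.py ULOG.mmddyy`. Against the 24 log lines above it reports PATDom2 down 38s with 6 dropped messages and PATDom1 down 51s with 12 dropped messages. Note that it only measures the gateway's own view of the connection; it cannot show whether service calls actually succeed after the "Connection established" message.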
I get a message that the connection is re-established (last line of the log above), but one of the remote services called from remote domain PATDom1 failed with a TPESYSTEM error, and the call went through only after many retries and after running bbclean and pclean from tmadmin.
This is a true OLTP application, and because of these service failures outgoing messages are delayed instead of being sent in real time.
I have the following questions:
1/ Does LIBGWT_CAT:1502 mean that the network between the domains is down? We checked at the network level and found no issue, so does it point to some other error?
2/ How can we trace domain communication (service calls across domains) more effectively, so that any service failure is detected early and handled?
Regards,
Ajeet Tewari