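Notes:
This document covers how to investigate and locate problems in WebLogic 10.3, plus a few tips for day-to-day maintenance. It is intended for staff who understand web applications, know WebLogic to some degree, are familiar with the current host environment, and have basic AIX command skills. Anyone else must not change anything or run any command without understanding it; no responsibility is taken for any problems or impact this causes.
I. Daily Monitoring
1. Cluster load monitoring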
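1. Check that each Server in the cluster can be reached at its own address (the Server addresses are listed in Section V). Common reasons a Server cannot be reached:
a. The Server is not started (the page shows "Service not available"): start the Server.
b. The application is not Active (HTTP 403 or 404): update or start the application.
c. The application is Active and the Server is RUNNING, but the Server still cannot be reached: check how the application runs on each Server on the application's Monitoring tab, and check that Server's *.out log. Usually the cause is a faulty application update.
2. Check the thread counts on the Proxy Server. The application's context root tells you which Server or cluster a queue belongs to. A queue with a throughput of 0 means the cluster is not distributing requests. In that case, check the servlet for that context root in the Proxy's web.xml and confirm it is configured as follows: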
<servlet>
<servlet-name>Ngboss</servlet-name> <!-- must have a matching servlet-mapping -->
<servlet-class>
weblogic.servlet.proxy.HttpClusterServlet
</servlet-class>
<init-param>
<param-name>WebLogicCluster</param-name>
<param-value>
10.131.39.75:7101 10.131.39.76:7101 <!-- check that the IPs, ports and their order are correct -->
</param-value>
</init-param>
<init-param>
<param-name>CookieName</param-name>
<param-value>NGBOSS_JSESSIONID</param-value> <!-- must match the CookieName configured in the application's weblogic.xml -->
</init-param>
<init-param>
<param-name>wl-dispatch-policy</param-name>
<param-value>ngboss</param-value>
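</init-param> <!-- wl-dispatch-policy assigns this servlet's requests to the Work Manager (execute queue) named in param-value; this setup relies on it for cluster dispatching -->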
</servlet>
......
......
<servlet-mapping>
<servlet-name>Ngboss</servlet-name> <!-- must have a matching servlet -->
<url-pattern>/*</url-pattern>
</servlet-mapping>
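3. Check the Sessions on the application's Monitoring tab. The number of live sessions on each Server shows whether the cluster balances load. Common reasons load is not balanced:
a. A Server in the cluster is hung (check each Server's *.out log).
b. The Proxy's web.xml is misconfigured, e.g. a wrong IP, port, CookieName or servlet name.
c. The application's deployment target is not the cluster.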
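Part of the web.xml check above can be scripted. The following is a minimal Python sketch, assuming you can read the proxy's web.xml as text and that you supply, per servlet name, the CookieName taken from that application's weblogic.xml; the function name check_proxy_web_xml is hypothetical. It flags a servlet without a servlet-mapping (or the reverse), an empty or malformed WebLogicCluster list, and a CookieName mismatch. It accepts cluster members separated by whitespace or "|".

```python
import re
import xml.etree.ElementTree as ET


def _local(tag):
    # ElementTree prefixes namespaced tags, e.g. "{http://java.sun.com/xml/ns/javaee}servlet".
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name):
    # Text of the first direct child called `name`, or "" if there is none.
    for child in elem:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def check_proxy_web_xml(xml_text, expected_cookies):
    """Return a list of problems found in a proxy web.xml (empty list: none found).

    expected_cookies maps servlet-name -> CookieName from that application's weblogic.xml.
    """
    root = ET.fromstring(xml_text)
    servlets, mapped = {}, set()
    for elem in root:
        kind = _local(elem.tag)
        if kind == "servlet":
            params = {
                _child_text(p, "param-name"): _child_text(p, "param-value")
                for p in elem
                if _local(p.tag) == "init-param"
            }
            servlets[_child_text(elem, "servlet-name")] = params
        elif kind == "servlet-mapping":
            mapped.add(_child_text(elem, "servlet-name"))

    problems = []
    for name, params in servlets.items():
        if name not in mapped:
            problems.append(f"{name}: no servlet-mapping")
        # Accept cluster members separated by whitespace or '|'.
        members = [m for m in re.split(r"[|\s]+", params.get("WebLogicCluster", "")) if m]
        if not members:
            problems.append(f"{name}: WebLogicCluster is empty")
        for member in members:
            if not re.fullmatch(r"[\w.-]+:\d+", member):
                problems.append(f"{name}: malformed cluster member {member!r}")
        want, got = expected_cookies.get(name), params.get("CookieName")
        if want is not None and got != want:
            problems.append(f"{name}: CookieName is {got!r}, expected {want!r}")
    for name in sorted(mapped - set(servlets)):
        problems.append(f"{name}: servlet-mapping has no matching servlet")
    return problems
```

For example, `check_proxy_web_xml(open("web.xml").read(), {"Ngboss": "NGBOSS_JSESSIONID"})` returns an empty list when the Ngboss entries are consistent. It only checks the file's internal consistency; whether the listed IPs and ports are the right ones still has to be confirmed against the Server addresses.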
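2. Server memory, queue and thread monitoring
1. Open the Server's monitoring page and check Java memory under Performance, queue and thread counts under Threads, JDBC connection counts, etc.
3. WTC monitoring
1. Service->WTCServer->Ctrl: check that it is connected.
4. JDBC monitoring
1. Check the overall connection status and whether any connections are not being released. Refresh the page several times while checking: see whether Java memory is reclaimed normally, whether the queue and thread counts stay high, and whether JDBC connections stay unreleased for a long time (the last case needs a DBA's help to analyze).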
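The "refresh several times" check above amounts to watching whether a value ever comes down. A minimal Python sketch of that heuristic, assuming you note one reading per refresh (heap used, active JDBC connections or queue length) in a list; the function name never_released and the three-sample minimum are illustrative choices, not WebLogic settings:

```python
def never_released(samples, min_samples=3):
    """Return True if a sampled value never went down between refreshes.

    samples: readings taken by refreshing a console page several times,
    e.g. heap used (MB), active JDBC connections or queue length, oldest first.
    """
    if len(samples) < min_samples:
        return False  # too few refreshes to judge either way
    # Every reading is >= the previous one: nothing was reclaimed or released.
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))
```

A heap series such as 2100, 2600, 3100, 2200 MB drops once, so GC is reclaiming memory; an active-connection series such as 40, 41, 41, 43 never drops and is the case to hand to a DBA.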
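5. Log monitoring
1. Regularly check server/logs and logs/xxx_error.log for errors.
6. JMS monitoring
1. Interoperability->WTC Servers->Connected: check that it is true.
7. Background process monitoring
1. Run ps -ef | grep $ServerName to check that the process exists. For the list of ServerNames, see the context roots in the Server addresses (Section V).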
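A plain `grep` on the Server name also matches longer names (acctmanm2 matches acctmanm22) and the grep process itself. A minimal Python sketch that matches the exact -Dweblogic.Name value instead, assuming the JVM command line shown by `ps -ef` includes that option (as the `ps -mp` output in the javacore section does); find_server_pids is a hypothetical helper and acctmanm2 a hypothetical Server name used for illustration:

```python
import re

_NAME = re.compile(r"-Dweblogic\.Name=(\S+)")


def find_server_pids(ps_ef_output, server_name):
    """Return the PIDs of JVMs started with -Dweblogic.Name=<server_name>.

    Unlike `ps -ef | grep acctmanm2`, this does not also match acctmanm22
    or the grep process itself.
    """
    pids = []
    for line in ps_ef_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header or blank line; ps -ef puts the PID in column 2
        m = _NAME.search(line)
        if m and m.group(1) == server_name:
            pids.append(int(fields[1]))
    return pids
```

On the host the input would come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; an empty result means the Server's process is not running.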
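II. Key Techniques for Analyzing and Locating Problems
1. Use the topas command to find the ID of the process using the most CPU.
2. Run ps -ef | grep $ID to see the corresponding ServerName.
3. Open the Console and check the Server's state (RUNNING). If the Server is down, you can start it directly from the Console. Do not delete the files under ngbossdomain/servers/$ServerName/logs; keep them for later log analysis.
4. If the Console is usable, you can also view the Server's thread dump, the process's memory usage, the queue and idle threads, JDBC connection counts, etc.
5. If the Console is not usable (usually because a Server is hung), run ps -mp $ID -o THREAD | grep R; kill -3 $ID and record the output. This creates a javacore file under ngbossdomain (its content is similar to the thread dump in the Console). Run it several times to create several javacore files for later analysis. If the service needs to be restarted, use kill -9 $ID; the Server then restarts automatically. Always create the javacore files first, and only then run kill -9.
Note: problems are located by analyzing how each thread of a process is running. The process can be found in several ways, for example:
1. From the problematic system module, find the corresponding Server, then run ps -ef | grep ServerName to get the process ID.
2. From the WebLogic console, find the corresponding Server, then run ps -ef | grep ServerName to get the process ID.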
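The "several javacores first, kill -9 last" rule above can be sketched in Python. This assumes a Unix host (signal.SIGQUIT is what `kill -3` sends) and the IBM JVM behavior described above, where each SIGQUIT writes one javacore*.txt file; collect_javacores and its defaults are hypothetical, and the kill -9 step is deliberately left manual:

```python
import os
import signal
import time


def collect_javacores(pid, count=3, interval=10.0, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (the signal `kill -3` sends) to `pid` `count` times.

    Each SIGQUIT makes the IBM JVM write one javacore*.txt file. Waiting
    `interval` seconds between dumps shows whether the same threads stay stuck.
    `send` and `sleep` can be replaced for a dry run.
    """
    for i in range(count):
        send(pid, signal.SIGQUIT)
        if i < count - 1:
            sleep(interval)
```

For example, `collect_javacores(286906)` writes three dumps ten seconds apart; only after checking that the files exist would you run `kill -9 286906` by hand.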
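III. Quickly Locating Problems with javacore
1. Generating a javacore file
1. Run ps -mp $ID -o THREAD | grep R; kill -3 $ID ($ID is the process ID). This creates a javacore file named after the process ID in the domain directory (ngbossdomain); the *.txt file is the one to analyze. Record the command's output, in particular the thread ID in the TID column (975497 in the example below):
:/ngboss/webapp $ps -mp 286906 -o THREAD | grep R;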
USER PID PPID TID ST CP PRI SC WCHAN F TT BND COMMAND
webapp 286906 53800 - A 90 60 245 * 242001 - - /usr/java6_64/jre/bin/java -Dweblogic.Name=acctmanm22 -Djava.security.policy=/bea/weblogic/server/lib/weblogic.policy -Dweblogic.management.server=http://10.200.141.23:7001 -Djava.library.path=/usr/java6_64/jre/lib/ppc64/default:/usr/java6_64/jre/lib/ppc64:/usr/java6_64/jre/lib/ppc64:/usr/java6_64/jre/lib/ppc64/default:/usr/lib:/usr/java6_64/jre/lib/ppc64/j9vm:/usr/java6_64/jre/lib/ppc64:/usr/java6_64/jre/../lib/ppc64::/bea/weblogic/server/native/aix/ppc64:/usr/lib -Djava.class.path=/bea/weblogic/server/lib/AIX-ComboPatch-Essex.jar:/bea/weblogic/server/lib/CR370915_1030GA.jar:/bea/patch_wls1030/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/bea/patch_cie660/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/usr/java6_64/lib/tools.jar:/bea/weblogic/server/lib/weblogic_sp.jar:/bea/weblogic/server/lib/weblogic.jar:/bea/modules/features/weblogic.server.modules_10.3.0.0.jar:/bea/weblogic/server/lib/webservices.jar:/bea/modules/org.apache.ant_1.6.5/lib/ant-all.jar:/bea/modules/net.sf.antcontrib_1.0.0.0_1-0b2/lib/ant-contrib.jar::/bea -Dweblogic.system.BootIdentityFile=/ngboss/webapp/ngbossdomain/servers/acctmanm22/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms3072m -Xmx4096m -Dibm.stream.nio=true -Dfile.encoding=GBK -Duser.language=zh -Duser.region=CN -Xgcpolicy:gencon weblogic.Server
- - - 975497 R 88 141 0 - 400000 - - -
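Note:
This command lists the threads of the process that are currently running (state R); the number in the TID column is the thread ID. Convert it to hexadecimal (for example with the Windows Calculator) to find the matching thread in the javacore file: 975497 is 0xEE289.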
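Instead of the calculator, a short Python sketch can pull the running threads out of the `ps -mp ... -o THREAD` output and convert them. It assumes the AIX column order shown in the example (USER PID PPID TID ST CP ...), where thread rows have "-" in USER and PID; running_thread_ids is a hypothetical helper:

```python
def running_thread_ids(ps_mp_output):
    """Parse AIX `ps -mp PID -o THREAD` output into running-thread rows.

    Returns (tid, hex_tid, cpu) tuples for threads whose ST column is R,
    busiest first. hex_tid is the form to search for in javacore*.txt.
    """
    rows = []
    for line in ps_mp_output.splitlines():
        f = line.split()
        # Thread rows start with '-' (no USER/PID); columns: USER PID PPID TID ST CP ...
        if len(f) >= 6 and f[0] == "-" and f[3].isdigit() and f[4] == "R":
            tid = int(f[3])
            cpu = int(f[5]) if f[5].isdigit() else 0
            rows.append((tid, "0x" + format(tid, "X"), cpu))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows
```

For the example output above it returns `[(975497, "0xEE289", 88)]`. javacore files may print hex digits in either case, so search case-insensitively (in vi, `:set ic` first).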
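2. Analyzing the javacore to locate the problem
1. Open the javacore*.txt file with vi and search for the thread ID converted to hexadecimal.
2. If the thread found in step 1 is GC activity, is in Wait state, or shows an autoLogin stack, ignore it; focus only on stacks that contain classes of the relevant module, for example:
3XMTHREADINFO "ExecuteThread: '2' for queue: 'default'" TID:0x0000000117168700, j9thread_t:0x00000001170E8160, state:CW, prio=5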
3XMTHREADINFO1 (native thread ID:0x325063, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at oracle/jdbc/driver/T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1589(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1801(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4CMAREngine.unmarshalDALC(T4CMAREngine.java:2125(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4C8TTIrxh.unmarshalV10(T4C8TTIrxh.java:107(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4C8Oall.receive(T4C8Oall.java:654(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/T4CPreparedStatement.fetch(T4CPreparedStatement.java:1017(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:275(Compiled Code))
4XESTACKTRACE at oracle/jdbc/driver/OracleResultSetImpl.next(OracleResultSetImpl.java:228(Compiled Code))
4XESTACKTRACE at weblogic/jdbc/wrapper/ResultSet_oracle_jdbc_driver_OracleResultSetImpl.next(Bytecode PC:20(Compiled Code))
4XESTACKTRACE at com/linkage/appframework/data/DatasetResult.<init>(DatasetResult.java:28(Compiled Code))
4XESTACKTRACE at com/linkage/dbframework/jdbc/DaoManager.queryList(DaoManager.java:1612(Compiled Code))
4XESTACKTRACE at com/linkage/dbframework/jdbc/DaoManager.queryList(DaoManager.java:1741(Compiled Code))
4XESTACKTRACE at com/linkage/dbframework/jdbc/DaoManager.queryList(DaoManager.java:1756(Compiled Code))
4XESTACKTRACE at com/linkage/dbframework/BaseEntity.queryList(BaseEntity.java:246(Compiled Code))
4XESTACKTRACE at com/linkage/dbframework/BaseEntity.queryListBySqlstoreParser(BaseEntity.java:566(Compiled Code))
4XESTACKTRACE at com/linkage/cencustmgr/queryserverinfo/dao/QueryServerInfoDAO.queryServerInfo(QueryServerInfoDAO.java:61)
4XESTACKTRACE at com/linkage/cencustmgr/queryserverinfo/bean/QueryServerInfoBean.queryCheckRecord(QueryServerInfoBean.java:74)
4XESTACKTRACE at com/linkage/cencustmgr/queryserverinfo/page/QueryServerInfo.exportExcel(QueryServerInfo.java:170)
4XESTACKTRACE at sun/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
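3. From the stack in step 2 you can pinpoint the exact Java file and method of the specific application (here QueryServerInfo.exportExcel calling DaoManager.queryList), then analyze that code.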
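3. Tips for analyzing javacore files of WADE applications
1. Locating from the command line
1. Find the process ID (with topas, ps -ef | grep ServerName, etc.) and run kill -3 to create a javacore file.
2. Open the javacore file from step 1 with vi and search for queryList or export to quickly locate the problem code.
2. Locating from the console
1. Log in to the WebLogic console, open the problematic Server, go to its Performance monitoring page and generate the Dump Thread Stacks output.
2. Search it for queryList or export to quickly locate the problem code.
Note:
Use the information in javacore files flexibly. In WADE online systems, Server hangs are usually caused by SQL queries, so the keywords queryList and export locate the problem quickly. The same approach also applies to non-WADE applications.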
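IV. WebLogic Emergency Procedures
1. WebLogic console unresponsive
This is usually caused by a hung managed Server; stop or restart the hung Server. The AdminServer has its own stop and start scripts, stopadm.sh and startadm.sh. For other Servers, get the process ID with ps -ef | grep "ServerName", then run kill -9 $ID.
2. Server hung
This is usually caused by the application. In an emergency, restart the Server from the Console, or run kill -9 $ID directly; the Server then restarts automatically.
V. Server Addresses
Note:
Format: [subsystem module]: http://IP:port/[context root, as set by context-root in weblogic.xml]/[servlet name, as set in the application's web.xml]
1. Front-end web applications
Web entry 1: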
http://10.131.39.75:8080
Web entry 2:
http://10.131.39.76:8080
Web entry 3:
http://10.131.39.77:8080
Web entry 4:
http://10.131.39.78:8080
Customer service entry:
http://10.131.39.77:8081
Agent entry:
http://10.131.39.78:8081
NGBOSS:
http://10.131.39.75:7101/ngboss/ngboss
http://10.131.39.76:7101/ngboss/ngboss
http://10.131.39.77:7101/ngboss/ngboss
http://10.131.39.78:7101/ngboss/ngboss
Personal services:
http://10.131.39.75:8101/saleserv/saleserv
http://10.131.39.75:8102/saleserv/saleserv
http://10.131.39.75:8103/saleserv/saleserv
http://10.131.39.76:8201/saleserv/saleserv
http://10.131.39.76:8202/saleserv/saleserv
http://10.131.39.76:8203/saleserv/saleserv
http://10.131.39.77:8301/saleserv/saleserv
http://10.131.39.77:8302/saleserv/saleserv
http://10.131.39.77:8303/saleserv/saleserv
http://10.131.39.78:8401/saleserv/saleserv
http://10.131.39.78:8402/saleserv/saleserv
http://10.131.39.78:8403/saleserv/saleserv
Group services:
http://10.131.39.75:8109/groupserv/saleserv
http://10.131.39.76:8209/groupserv/saleserv
http://10.131.39.77:8309/groupserv/saleserv
http://10.131.39.78:8409/groupserv/saleserv
Account management:
http://10.131.39.75:8111/acctmanm/acctmanm
http://10.131.39.75:8112/acctmanm/acctmanm
http://10.131.39.75:8113/acctmanm/acctmanm
http://10.131.39.76:8211/acctmanm/acctmanm
http://10.131.39.76:8212/acctmanm/acctmanm
http://10.131.39.76:8213/acctmanm/acctmanm
http://10.131.39.77:8311/acctmanm/acctmanm
http://10.131.39.77:8312/acctmanm/acctmanm
http://10.131.39.77:8313/acctmanm/acctmanm
http://10.131.39.78:8411/acctmanm/acctmanm
http://10.131.39.78:8412/acctmanm/acctmanm
http://10.131.39.78:8413/acctmanm/acctmanm
Customer management:
http://10.131.39.75:8121/custmanm/custmanm
http://10.131.39.76:8221/custmanm/custmanm
http://10.131.39.77:8321/custmanm/custmanm
http://10.131.39.78:8421/custmanm/custmanm
Resource management:
http://10.131.39.75:8131/resmanm/resmanm
http://10.131.39.76:8231/resmanm/resmanm
http://10.131.39.77:8331/resmanm/resmanm
http://10.131.39.78:8431/resmanm/resmanm
Audit management:
http://10.131.39.75:8131/rasmanm/rasmanm
http://10.131.39.76:8231/rasmanm/rasmanm
http://10.131.39.77:8331/rasmanm/rasmanm
http://10.131.39.78:8431/rasmanm/rasmanm
Channel management:
http://10.131.39.75:8141/chnlmanm/chnlmanm
http://10.131.39.76:8241/chnlmanm/chnlmanm
http://10.131.39.77:8341/chnlmanm/chnlmanm
http://10.131.39.78:8441/chnlmanm/chnlmanm
Statistical analysis:
http://10.131.39.75:8151/statmanm/statmanm
http://10.131.39.77:8351/statmanm/statmanm
Product management:
http://10.131.39.75:8161/prodmcrm/prodmcrm
http://10.131.39.77:8361/prodmcrm/prodmcrm
http://10.131.39.75:8161/prodmbil/prodmbil
http://10.131.39.77:8361/prodmbil/prodmbil
http://10.131.39.75:8161/bilmanm/bilmanm
http://10.131.39.77:8361/bilmanm/bilmanm
Marketing management:
http://10.131.39.76:8271/salemanm/salemanm
http://10.131.39.78:8471/salemanm/salemanm
System management:
http://10.131.39.76:8281/sysmanm/sysmanm
http://10.131.39.78:8481/sysmanm/sysmanm
Partner management:
http://10.131.39.76:8291/copmanm/copmanm
http://10.131.39.78:8491/copmanm/copmanm
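The per-Server addresses above are what step 1 of cluster load monitoring checks by hand. A minimal Python sketch of that check using only the standard library; it assumes that a stopped Server shows up as a refused connection or an HTTP 503 ("Service not available"), and that 403/404 mean the application is not Active, as the daily-monitoring steps describe. classify_status, check_url and the result strings are hypothetical:

```python
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status (None = no HTTP response) to the likely cause."""
    if code is None or code == 503:
        return "server down or unreachable: start the Server"
    if code in (403, 404):
        return "application not Active: update or start it"
    if 200 <= code < 400:
        return "ok"
    return "check the application's Monitoring tab and the Server's *.out log"


def check_url(url, timeout=5.0):
    """Request one Server address and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:  # the Server answered with an error status
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):  # refused, timed out, no route
        return classify_status(None)
```

For example, looping `check_url` over the four NGBOSS addresses shows at a glance which Server needs attention.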
2. Back-end interface applications
Internal interfaces:
http://10.131.39.69:8080/[subsys]/httptran/CrmService
Platform service interface:
http://10.131.39.69:8101/callpf/httptran/CrmService
http://10.131.39.70:8201/callpf/httptran/CrmService
Batch service interface:
http://10.131.39.69:8111/batserv/httptran/CrmService
http://10.131.39.70:8211/batserv/httptran/CrmService
Personal services interface:
http://10.131.39.69:8121/saleserv/httptran/CrmService
http://10.131.39.70:8221/saleserv/httptran/CrmService
http://10.131.39.71:8321/saleserv/httptran/CrmService
http://10.131.39.72:8421/saleserv/httptran/CrmService
Account management interface:
http://10.131.39.69:8121/acctmanm/httptran/CrmService
http://CrmTux2:8221/acctmanm/httptran/CrmService
http://CrmTux3:8321/acctmanm/httptran/CrmService
http://CrmTux4:8421/acctmanm/httptran/CrmService
Resource management interface:
http://10.131.39.69:8121/resmanm/httptran/CrmService
http://CrmTux2:8221/resmanm/httptran/CrmService
http://CrmTux3:8321/resmanm/httptran/CrmService
http://CrmTux4:8421/resmanm/httptran/CrmService
Customer management interface:
http://10.131.39.69:8131/custmanm/httptran/CrmService
http://CrmTux2:8231/custmanm/httptran/CrmService
Marketing management interface:
http://CrmTux3:8331/salemanm/httptran/CrmService
http://CrmTux4:8431/salemanm/httptran/CrmService
First-level BOSS interfaces:
http://ItfTux1:8080/[subsys]/httptran/CrmService
Personal services interface:
http://ItfTux1:8501/saleserv/httptran/CrmService
http://ItfTux2:8601/saleserv/httptran/CrmService
http://ActTux1:8701/saleserv/httptran/CrmService
http://ActTux2:8801/saleserv/httptran/CrmService
Account management interface:
http://ItfTux1:8501/acctmanm/httptran/CrmService
http://ItfTux2:8601/acctmanm/httptran/CrmService
http://ActTux1:8701/acctmanm/httptran/CrmService
http://ActTux2:8801/acctmanm/httptran/CrmService
Resource management interface:
http://ItfTux1:8501/resmanm/httptran/CrmService
http://ItfTux2:8601/resmanm/httptran/CrmService
http://ActTux1:8701/resmanm/httptran/CrmService
http://ActTux2:8801/resmanm/httptran/CrmService
Customer management interface:
http://ItfTux1:8502/custmanm/httptran/CrmService
http://ItfTux2:8602/custmanm/httptran/CrmService
Marketing management interface:
http://ItfTux1:8502/salemanm/httptran/CrmService
http://ItfTux2:8602/salemanm/httptran/CrmService
E-channel interfaces:
http://ItfTux2:8080/[subsys]/httptran/CrmService
Personal services interface:
http://ItfTux1:8511/saleserv/httptran/CrmService
http://ItfTux2:8611/saleserv/httptran/CrmService
http://ActTux1:8711/saleserv/httptran/CrmService
http://ActTux2:8811/saleserv/httptran/CrmService
Account management interface:
http://ItfTux1:8511/acctmanm/httptran/CrmService
http://ItfTux2:8611/acctmanm/httptran/CrmService
http://ActTux1:8711/acctmanm/httptran/CrmService
http://ActTux2:8811/acctmanm/httptran/CrmService
Resource management interface:
http://ItfTux1:8511/resmanm/httptran/CrmService
http://ItfTux2:8611/resmanm/httptran/CrmService
http://ActTux1:8711/resmanm/httptran/CrmService
http://ActTux2:8811/resmanm/httptran/CrmService
Customer management interface:
http://ItfTux1:8512/custmanm/httptran/CrmService
http://ItfTux2:8612/custmanm/httptran/CrmService
Marketing management interface:
http://ItfTux1:8512/salemanm/httptran/CrmService
http://ItfTux2:8612/salemanm/httptran/CrmService
Customer service interfaces:
http://ActTux1:8080/[subsys]/httptran/CrmService
Personal services interface:
http://ItfTux1:8521/saleserv/httptran/CrmService
http://ItfTux2:8621/saleserv/httptran/CrmService
http://ActTux1:8721/saleserv/httptran/CrmService
http://ActTux2:8821/saleserv/httptran/CrmService
Account management interface:
http://ItfTux1:8521/acctmanm/httptran/CrmService
http://ItfTux2:8621/acctmanm/httptran/CrmService
http://ActTux1:8721/acctmanm/httptran/CrmService
http://ActTux2:8821/acctmanm/httptran/CrmService
Resource management interface:
http://ItfTux1:8521/resmanm/httptran/CrmService
http://ItfTux2:8621/resmanm/httptran/CrmService
http://ActTux1:8721/resmanm/httptran/CrmService
http://ActTux2:8821/resmanm/httptran/CrmService
Customer management interface:
http://ItfTux1:8522/custmanm/httptran/CrmService
http://ItfTux2:8622/custmanm/httptran/CrmService
http://ActTux1:8722/custmanm/httptran/CrmService
http://ActTux2:8822/custmanm/httptran/CrmService
Marketing management interface:
http://ItfTux1:8522/salemanm/httptran/CrmService
http://ItfTux2:8622/salemanm/httptran/CrmService
http://ActTux1:8722/salemanm/httptran/CrmService
http://ActTux2:8822/salemanm/httptran/CrmService
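VI. WebLogic Monitoring Metrics
1. Server memory
Metric: 90% of the Server's maximum memory (the maximum is shown in the console under Server->Monitoring->Performance).
Note: by default the Server forces a GC (garbage collection) to free memory when it has used 90% of its maximum memory; if memory stays above 90% of the maximum for a long time without being freed, the Server runs abnormally.
2. Server queue
Metric: 90% of the queue length (the queue length is shown in the console under Server->Configuration->Queues).
Note: the default queue length is 15 and the threshold percentage is 90%. Normally keep the threshold at or around 90% as a safety margin, so that spare threads remain available to handle exceptions in requests.
3. Server thread count
Metric: 90% of the maximum thread count (thread counts are shown in the console under Server->Monitoring->Threads).
Note: the default maximum thread count is 400.
4. JDBC connections
Metric: 90% of the maximum connection count (the maximum is shown in the console under Service->JDBC->DataSource, on the data source instance).
Note: if connections stay unreleased for a long time, a DBA needs to help analyze the cause.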
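All four metrics above share one rule: alert at 90% of the maximum. A minimal Python sketch of that rule, assuming you read current and maximum values from the console pages listed; alerts and the metric names are hypothetical. Integer arithmetic is used so a value exactly at 90% is not missed through floating-point rounding:

```python
def alerts(metrics, percent=90):
    """Return the names of metrics at or above `percent`% of their maximum.

    metrics: name -> (current, maximum), read from the console pages above.
    Integer arithmetic avoids floating-point edge cases at exactly 90%.
    """
    return sorted(
        name
        for name, (current, maximum) in metrics.items()
        if maximum > 0 and current * 100 >= maximum * percent
    )
```

With a 4096 MB heap (the -Xmx4096m seen in the ps output earlier), `alerts({"heap_mb": (3900, 4096), "threads": (120, 400), "jdbc": (45, 50)})` returns `["heap_mb", "jdbc"]`: 3900 MB exceeds the 3686.4 MB line and 45 of 50 connections is exactly 90%.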
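5. WTC connection
Metric: connected (WTCServer->Ctrl shows connected).
Note: WTC queues are monitored on the Tuxedo side; on the WebLogic side you can only see whether the connection is up.
6. JMS connection
Metric: connected (Interoperability->WTC Servers->Connected is true).
Note: JMS is the Java Message Service; it only needs to be connected.
VII. JDBC Connections in WADE
WADE uses page-level JDBC transactions and releases the connection after every listener action, whether or not an exception is thrown, so the WADE framework layer does not leak pool connections.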
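WADE's own code is not shown in this document, so the following is only a Python sketch of the general pattern the paragraph describes: each listener action borrows one connection and returns it in a finally clause, which runs whether the action returns or raises. CountingPool and run_listener are hypothetical names, not WADE APIs:

```python
class CountingPool:
    """Hypothetical stand-in for a JDBC data source; counts borrowed connections."""

    def __init__(self):
        self.active = 0

    def get(self):
        self.active += 1
        return object()  # placeholder connection

    def release(self, conn):
        self.active -= 1


def run_listener(pool, action):
    """Run one page-level listener action on its own connection.

    The finally clause returns the connection whether action() returns
    normally or raises, which is why this pattern cannot leak connections.
    """
    conn = pool.get()
    try:
        return action(conn)
    finally:
        pool.release(conn)
```

With this pattern the active count returns to zero after every action, even a failing one. A connection that stays active for a long time therefore points below the framework, e.g. a slow SQL statement still running, which is the case the JDBC monitoring section hands to a DBA.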