Subject: [Original] Handling a TT (TimesTen) database problem

jie.liang


Posted: 2014-3-19 17:47:30

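1. Problem description
As business volume grew, the front-end applications opened a large number of concurrent connections to the TT (TimesTen) database. The TimesTen main daemon process crashed, which ultimately brought the TT database down.
2. Error logs

Error log: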

TT14000: TimesTen daemon internal error

Cause:

The TimesTen daemon vanished because it was overwhelmed by all the threads it was managing, since it had to handle a thread for each server or direct connection to it. In this particular user environment, it could not handle 1219 concurrent users. Note that this article describes one particular way that the TimesTen main daemon has crashed; do not conclude that this is the explanation for all cases where the main daemon has vanished.
The customer is using the default setting MaxConnsPerServer=1, which creates a new child ttcserver process for the duration of each new client connection. This results in a lot of operating system resources being used by the daemon to handle that many connected processes. Please review document 1184993.1 for further information about TimesTen server configuration settings.
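
To make the cost of the default concrete, here is a rough sketch of the arithmetic. With MaxConnsPerServer=1 every client connection gets its own ttcserver process, so the 1219 concurrent users mentioned in the note mean roughly 1219 child processes. `servers_needed` is a hypothetical helper written for this post, not a TimesTen utility, and the value 100 is simply the MaxConnsPerServer setting used in the steps further down.

```shell
#!/bin/sh
# Hypothetical helper (not part of TimesTen): estimate how many ttcserver
# processes are needed to carry a given number of client connections.
# usage: servers_needed CONNECTIONS MAX_CONNS_PER_SERVER
servers_needed() {
    conns=$1
    per_server=$2
    # Integer ceiling division: ceil(conns / per_server)
    echo $(( (conns + per_server - 1) / per_server ))
}

servers_needed 1219 1     # default MaxConnsPerServer=1  -> prints 1219
servers_needed 1219 100   # MaxConnsPerServer=100        -> prints 13
```

This only counts server processes; it says nothing about the memory or semaphores each process needs.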

Log 2:

Error log:

TT40046-18449-0003-bcStuff01096: TT_CACHEGROUP Error: [TimesTen]TT0703: Subdaemon connect to data store failed with error TT9999

Cause:

This ttRepAdmin failure is caused by bug 13952486, which can only occur in releases 11.2.2.0 through 11.2.2.2.0. It is a bug in the daemon code: if the OS semaphore setting semmsl is set to a value between 2155 and 2205, ttRepAdmin -duplicate fails with errors and the subdaemon may core dump. The daemon does not properly handle a Connections value between 2000 and 2047, and duplicate uses Connections=2045 when it does its work, regardless of the Connections value set in the DSN.
This also means that these errors and the subdaemon core dump could happen when loading the data store with a DSN setting of Connections between 2000 and 2047.
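
A quick way to apply the note to your own DSN is to test whether its Connections value falls in the affected range. The sketch below is only an illustration: `connections_in_bug_range` is a hypothetical function name, the 2000 to 2047 range is taken from the note above, and the check does not look at the TimesTen release at all.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds (exit status 0) when the given
# Connections value lies in the 2000-2047 range named in the note above.
connections_in_bug_range() {
    [ "$1" -ge 2000 ] && [ "$1" -le 2047 ]
}

# Try a few sample values, including the 2045 that duplicate uses internally.
for c in 1999 2000 2045 2048; do
    if connections_in_bug_range "$c"; then
        echo "Connections=$c: inside the affected range"
    else
        echo "Connections=$c: outside the affected range"
    fi
done
```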

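3. Problem analysis

Judging from the error messages and the cause information given on Metalink, the front-end applications connect to the TT database concurrently from many threads, and the TT database fails while allocating threads for those concurrent connections. That crashes the main daemon and brings the TT database down. Some of the parameters configured in /etc/system on the original system can also trigger the problem: for example, per the note above, a semmsl value between 2155 and 2205 (corresponding to a Connections value between 2000 and 2047) hits a bug in the daemon code. The operating system itself supports multithreading, but its default values do not meet the current application's requirements, so the recommendation is to change the concurrency mechanism used to serve front-end connections.

The core files produced on the system come from two sources: (1) crashes of the TT database; (2) bugs in the application code. This post deals with the TT side.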

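4. Recommended solution

Change the TT concurrent connection parameters to allow more concurrent connections, and adjust the operating system kernel parameters at the same time. Starting with Solaris 5.10 the kernel parameters were reworked and most of the old /etc/system parameters were removed or made obsolete, so the recommendation is to remove the settings currently added under /etc/system and define them all through the newer tuning mechanism (per-project resource controls).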

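5. Procedure

(1) Stop the application processes and the TT processes

(2) Add the following parameters to the current TT configuration file to allow more concurrent connections

----- Add the following lines to ttendaemon.options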
-MaxConnsPerServer 100
-ServersPerDSN 40
-ServerPool 10
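
As a sanity check on these numbers, the sketch below reads the option values back out of ttendaemon.options-style text and multiplies them. It assumes the two options combine the way the TimesTen server documentation describes, with connections to a DSN spread over ServersPerDSN server processes, each handling up to MaxConnsPerServer connections. `option_value` is a hypothetical helper, and the sample text is just the three lines above, not a full options file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one option from
# ttendaemon.options-style text supplied on stdin.
# usage: option_value -MaxConnsPerServer < ttendaemon.options
option_value() {
    awk -v opt="$1" '$1 == opt { print $2; exit }'
}

# Sample content mirroring the three lines added above.
opts='-MaxConnsPerServer 100
-ServersPerDSN 40
-ServerPool 10'

per_server=$(printf '%s\n' "$opts" | option_value -MaxConnsPerServer)
servers=$(printf '%s\n' "$opts" | option_value -ServersPerDSN)

# 100 connections per server process x 40 server processes per DSN
echo "connections covered by the initial server processes: $(( per_server * servers ))"
```

With the values from this post the product is 4000, comfortably above the 1219 concurrent users that brought the daemon down.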

(3) Log in as root, comment out the settings that were added to /etc/system, delete the previously configured project, and add the project again

projdel ttadmin
projadd -U ttadmin -p 1000 ttadmin
projmod -a -K "process.max-msg-qbytes=(priv,65536,deny)" ttadmin;
projmod -a -K "process.max-msg-messages=(priv,65536,deny)" ttadmin;
projmod -a -K "process.max-sem-ops=(priv,4096,deny)" ttadmin;
projmod -a -K "process.max-sem-nsems=(priv,4096,deny)" ttadmin;
projmod -a -K "project.max-shm-memory=(priv,50GB,deny)" ttadmin;
projmod -a -K "project.max-shm-ids=(priv,2048,deny)" ttadmin;
projmod -a -K "project.max-msg-ids=(priv,2048,deny)" ttadmin;
projmod -a -K "project.max-sem-ids=(priv,2048,deny)" ttadmin;
newtask -p ttadmin -c $$
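
After running the commands it is worth confirming that the controls actually landed in the project definition. The sketch below splits one /etc/project-style entry into its resource controls. `project_attrs` is a hypothetical helper and the sample entry is made up for illustration, shortened to two of the controls set above.

```shell
#!/bin/sh
# Hypothetical helper: list the resource controls stored in one
# /etc/project-style entry, one per line.
# Entry format: name:id:comment:users:groups:attributes
# (attributes are separated by ';').
project_attrs() {
    printf '%s\n' "$1" | cut -d: -f6 | tr ';' '\n'
}

# Made-up sample entry, shortened to two of the controls set above.
entry='ttadmin:1000::ttadmin::process.max-sem-ops=(priv,4096,deny);project.max-shm-ids=(priv,2048,deny)'
project_attrs "$entry"
```

On the Solaris host itself, `projects -l ttadmin` should show the same attributes directly, and `prctl -n project.max-shm-memory -i project ttadmin` reports the value in effect once a process is running in the project.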

(4) Reboot the server

(5) Restart TT

newtask -p ttadmin -c $$
su - ttadmin
newtask -p ttadmin -c $$
ttDaemonAdmin -start