今天在生产库上碰到一个ORA-00060 Deadlock Detected报错,这种错误和ORA-00600内核错误代码很相似,稍不注意就混在一起了,这个错都会显示上alert.log里面,同时会在user_dump_dest或backgroud_dump_dest目录下产生一下.trc文件,对于新手而言拿到.trc文件可能不知道如何分析,以我的生产库下.trc文件内容为例进行分析!
--------------------------------------------------------------------------------------------------
/dbfs/oradata/admin/htdb/udump/htdb_ora_423766.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2.0.5
System name: AIX
Node name: ECMora01
Release: 3
Version: 5
Machine: 00C05BB64C00
Instance name: htdb
Redo thread mounted by this instance: 1
Oracle process number: 380
Unix process pid: 423766, image: oracle@ECMora01
对于这部分内核是数据库和主机的一些版本信息,看一言就OK了,信息价值量不大
--------------------------------------------------------------------------------------------------
*** 2013-07-31 17:12:37.244
*** SERVICE NAME:(SYS$USERS) 2013-07-31 17:12:37.215
*** SESSION ID:(1405.59555) 2013-07-31 17:12:37.215
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
这部分内容告诉我们发生的具体时间,SESSION ID:(1405.59555)这个会话抛出的错误,是同时告诉我们这个错误不是ORACLE错误,是应用程序SQL设计问题引起的,需要交由开发人员处理。
--------------------------------------------------------------------------------------------------
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00730018-0000c55b 380 1405 X 306 1538 X
TX-00ac0009-0000715b 306 1538 X 380 1405 X
session 1405: DID 0001-017C-002C6A2B session 1538: DID 0001-0132-00375797
session 1538: DID 0001-0132-00375797 session 1405: DID 0001-017C-002C6A2B
其中Resource Name 中TX说明死锁属于TX类型,常见的还有TM类型的锁,后面有时间再分析,最后一个X表示独占模式
session 1405这个会话以独占模式占有资源"DID 0001-017C-002C6A2B "现在又需要独占资源"DID 0001-0132-00375797 "但是却被session 1538会话以独占的模式占有,与此同时对于session 1538这个会话来讲,他现在也想独占session 1405占用的资源"TX-00730018-0000c55b",可能有点绕,打比方说,就好比两辆车对向行驶在单车道上,张三和李四都想让对方往后倒车,把路(资源)给自己让出来,却谁都不往后倒车,僵持很久互不相让,只能再这条路上死等,这就是产生死锁的原理。
死锁不是ORACLE错误,是应用程序设计存在问题导致死锁的发生
--------------------------------------------------------------------------------------------------
Rows waited on:
Session 1538: obj - rowid = 000287D7 - AAAofXAArAABLIlAAy
(dictionary objn - 165847, file - 43, block - 307749, slot - 50)
Session 1405: obj - rowid = 000287D7 - AAAofXAAcAAFkfnABb
(dictionary objn - 165847, file - 28, block - 1460199, slot - 91)
Information on the OTHER waiting sessions:
这部分告诉我们以下信息,Session 1538等待这个对像objec_id:165847,文件号:43 数据库块:307749,行号是:AAAofXAArAABLIlAAy
可以跟据以下SQL查询对像、以行文件或行号等信息
SQL> SELECT owner, object_name, object_type FROM dba_objects WHERE object_id = 165847;
OWNER OBJECT_NAME OBJECT_TYPE
-------------- ---------------------- -------------------
CUECM T_CONTRACT_TAXINFO TABLE
SQL> SELECT * FROM T_CONTRACT_TAXINFO WHERE ROWID='AAAofXAArAABLIlAAy';
ID PAYMODE SUBJECTID SUBJECTNAME
-------------- --------------------- ------------ --------------------
983459855 2 286409107 建筑安装工程承包合同
--------------------------------------------------------------------------------------------------
Session 1538:
sid: 1538 ser: 40397 audsid: 38442408 user: 191/CUECM
flags: (0x100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x8)
pid: 306 O/S info: user: oracle, term: UNKNOWN, ospid: 763378
image: oracle@ECMora01
O/S info: user: , term: , ospid: 1234, machine: ECM_APP2_02
program:
Current SQL Statement:
这个部分显示了除产生死锁外的其它等待信息,不用关注
--------------------------------------------------------------------------------------------------
update t_contract_taxinfo a set a.locked=0,a.updateDate=sysdate,a.taxInfoStatus=1 。。。。。。。
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update t_contract_taxinfo a set a.locked=0,a.updateDate=sysdate,payedTaxTotalAmount。。。。。。。
这部分显示产生死锁的两个session涉及到的SQL
--------------------------------------------------------------------------------------------------
产生死锁原因:
1.业务逻辑设计不合理,高并发下产生死锁
2.外键列没有索引,对外键列更新时产生死锁
3.SQL执行效率低下导致死锁
如何查询那些外键没有索引
select table_name,
constraint_name,
cname1 || nvl2(cname2, ',' || cname2, null) ||
nvl2(cname3, ',' || cname3, null) ||
nvl2(cname4, ',' || cname4, null) ||
nvl2(cname5, ',' || cname5, null) ||
nvl2(cname6, ',' || cname6, null) ||
nvl2(cname7, ',' || cname7, null) ||
nvl2(cname8, ',' || cname8, null) columns
from (select b.table_name,
b.constraint_name,
max(decode(position, 1, column_name, null)) cname1,
max(decode(position, 2, column_name, null)) cname2,
max(decode(position, 3, column_name, null)) cname3,
max(decode(position, 4, column_name, null)) cname4,
max(decode(position, 5, column_name, null)) cname5,
max(decode(position, 6, column_name, null)) cname6,
max(decode(position, 7, column_name, null)) cname7,
max(decode(position, 8, column_name, null)) cname8,
count(*) col_cnt
from (select substr(table_name, 1, 30) table_name,
substr(constraint_name, 1, 30) constraint_name,
substr(column_name, 1, 30) column_name,
position
from user_cons_columns) a,
user_constraints b
where a.constraint_name = b.constraint_name
and b.constraint_type = 'R'
group by b.table_name, b.constraint_name) cons
where col_cnt > ALL
(select count(*)
from user_ind_columns i
where i.table_name = cons.table_name
and i.column_name in (cname1, cname2, cname3, cname4, cname5,
cname6, cname7, cname8)
and i.column_position <= cons.col_cnt
group by i.index_name)
--转自
该贴由koei123转至本版2015-6-1 14:57:11