一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现,让我帮忙处理。
alert日志报ORA-00600[kafspa:columnBuffer1]
Wed Aug 8 10:55:31 2012
Completed: ALTER DATABASE OPEN
Wed Aug 8 10:55:41 2012
Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:55:47 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:55:47 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:55:58 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:55:59 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:10 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:11 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:22 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:32 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:43 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:53 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:04 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:57:14 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:25 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:57:35 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:38 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc:
ORA-00474: SMON process terminated with error
Wed Aug 8 10:57:38 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 8856
这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常
Non-fatal internal error happenned while SMON was doing temporary segment drop.
出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系
SMON encountered 8 out of maximum 100 non-fatal internal errors.
这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉
SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
2 from x$ksppi a,x$ksppcv b
3 where a.inst_id = USERENV ('Instance')
4 and b.inst_id = USERENV ('Instance')
5 and a.indx = b.indx
6 and upper(a.ksppinm) LIKE upper('%¶m%')
7 order by name
8
SQL> /
Enter value for param: smon_internal_errlimit
old 6: and upper(a.ksppinm) LIKE upper('%¶m%')
new 6: and upper(a.ksppinm) LIKE upper('%smon_internal_errlimit%')
NAME VALUE DESCRIPTION
-------------------------------- ------------------------ -----------------------------------
_smon_internal_errlimit 100 limit of SMON internal errors
分析trace文件
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/ora10/product
System name: Linux
Node name: DBN-HLD-155-2
Release: 2.6.18-92.el5PAE
Version: #1 SMP Tue Apr 29 13:31:02 EDT 2008
Machine: i686
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)
*** SERVICE NAME:() 2012-08-08 10:55:20.208
*** SESSION ID:(274.1) 2012-08-08 10:55:20.208
*** 2012-08-08 10:55:20.208
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []
Current SQL statement for this session:
select file#, block#, ts# from seg$ where type# = 3
果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪
些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts#
from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭。
解决问题
--修改两个参数
event='10061 trace name context forever, level 10'
_smon_internal_errlimit=1000000
--启动数据库
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area 1577058304 bytes
Fixed Size 1267716 bytes
Variable Size 385878012 bytes
Database Buffers 1174405120 bytes
Redo Buffers 15507456 bytes
Database mounted.
Database opened.
因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库
中.
导致该异常sql分析
SQL> select file#, block#, ts# from seg$ where type# = 3;
select file#, block#, ts# from seg$ where type# = 3
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> set autot trace exp
SQL> select file#, block#, ts# from seg$ where type# = 3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 150 | 389 (1)| 00:00:05 |
|* 1 | TABLE ACCESS FULL| SEG$ | 10 | 150 | 389 (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TYPE#"=3)
SQL> select count(rowid) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 389 (1)| 00:00:05 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
| 2 | TABLE ACCESS FULL| SEG$ | 45682 | 535K| 389 (1)| 00:00:05 |
---------------------------------------------------------------------------
SQL> select /*+ full(t) */ count(*) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| SEG$ | 3223 | 29 (0)| 00:00:01 |
-------------------------------------------------------------------
SQL> select ts# from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3223 | 9669 | 29 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| SEG$ | 3223 | 9669 | 29 (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autot off
SQL> select count(rowid) from seg$;
SQL> /
COUNT(ROWID)
------------
45727
SQL> select /*+ full(t) */ count(*) from seg$;
COUNT(*)
----------
45727
SQL> select ts# from seg$;
0
0
……
7
7
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> !dbv file='/oracle/ora10/oradata/ora10g/system01.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf
Block Checking: DBA = 4225673, Block Type = KTB-managed data block
data header at 0xb6d8225c
kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72
Page 31369 failed with check code 6135
Block Checking: DBA = 4236289, Block Type = KTB-managed data block
data header at 0xb6d7225c
kdbchk: tosp bad (-13399)
Page 41985 failed with check code 6127
DBVERIFY - Verification complete
Total Pages Examined : 192000
Total Pages Processed (Data) : 47588
Total Pages Failing (Data) : 2
Total Pages Processed (Index): 40929
Total Pages Failing (Index): 0
Total Pages Processed (Other): 1784
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 101699
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1775671440 (0.1775671440)
SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01498: block check failure - see trace file
通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破
坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析。
--转自