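Oracle strongly recommends ASM, but to most of us ASM is a black box: when something goes wrong, it is hard to know
what to do. The case below gives some insight into ASM. The main lesson is that backups matter most.
While I was creating a tablespace, I ran into an ASM error: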
create tablespace aa_data
datafile
'+DATA/dbs101/aa_data01.dbf' size 20M
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
SEGMENT SPACE MANAGEMENT AUTO
/
ORA-01119: error in creating database file '+DATA/dbs101/aa_data01.dbf'
ORA-17502: ksfdcre:4 Failed to create file +DATA/dbs101/aa_data01.dbf
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATAVOL1" may result in a data loss
Judging from the errors, disk "DATAVOL1" has suffered data loss.
Check the alert.log of the ASM instance:
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xe96892e8 disk_path:ORCL:DATAVOL1
AU:2 disk_offset(bytes):2097152 io_size:4096 operation:Read type:synchronous
result:I/O error process_id:11679
WARNING: cache failed reading from group=DATA fn=1 blk=0 count=1 from disk= 0 DATAVOL1 kfkist=0x20 status=0x02 file=kfc.c line=10225
ERROR: cache failed to read group=DATA fn=1 blk=0 from disk(s): 0 DATAVOL1
ORA-15080: synchronous I/O operation to a disk failed
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_ora_11679.trc
Judging from the errors, disk "DATAVOL1" has bad blocks, so I checked for corrupt blocks with dbv and rman.
dbv found no corrupt blocks:
dbv file='+DATA/dbs101/users01.dbf' userid=sys/"***" feedback=100 >> aa.txt 2>&1
rman found no corrupt blocks either:
backup validate check logical database;
select count(*) from v$database_block_corruption;
Neither check reported any corrupt blocks.
By chance, I noticed a directory under /u01/app/grid/diag/asm/+asm/+ASM/trace/ named
amdu_2011_04_26_17_13_28, which contains a file called report.txt:
---------------------------- SCANNING DISK N0002 -----------------------------
Disk N0002: 'ORCL:DATAVOL1'
AMDU-00407: asmlib error!! function = [asm_close], error = [0], mesg = [I/O Error]
AMDU-00200: Unable to read [262144] bytes from Disk N0002 at offset [2097152]
AMDU-00201: Disk N0002: 'ORCL:DATAVOL1'
Allocated AU's: 3
Free AU's: 0
AU's read for dump: 2
Block images saved: 512
Map lines written: 2
Heartbeats seen: 0
Corrupt metadata blocks: 0
Corrupt AT blocks: 0
The report says plainly that there is a bad block. Next, check whether the hardware is at fault:
dmesg|more
Info fld=0x1fa81d1, Current sda: sense key Medium Error
Additional sense: Data synchronization mark error
end_request: I/O error, dev sda, sector 33194449
scsi6: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 01 fa 81 d1 00 02 00 0
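The sense data gives the failing address in hexadecimal (Info fld=0x1fa81d1), while end_request prints it in decimal. A minimal shell sketch to confirm that both refer to the same sector; the function name hex_to_sector is made up for illustration:

```shell
#!/bin/sh
# hex_to_sector (hypothetical helper): print a hexadecimal LBA, as shown in
# the "Info fld=0x..." line of dmesg, as a decimal sector number.
hex_to_sector() {
    printf '%d\n' "$1"
}

# Value copied from the dmesg output above.
hex_to_sector 0x1fa81d1
```

The result can be compared directly with the sector number on the end_request line.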
Sector 33194449 has an I/O error. A method for repairing this kind of error is described online:
http://blogold.chinaunix.net/u1/46701/showart.php?id=677428
However, that method applies to file systems.
The Oracle 11g ASM documentation describes a remap command (in ASMCMD) that can mark physical bad blocks:
I calculated the corrupt block as 84926 * 1024 * 1024 / 512 = 173928448
84927 * 1024 * 1024 / 512 = 173930496
remap DATA DATAVOL1 173928447-173928448
remap DATA DATAVOL1 173930496-173930496
amdu -dump 'DATA'
remap DATA DATAVOL1 173928448-173928448
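The block numbers passed to remap come from the arithmetic above, which converts an offset in MB into 512-byte blocks. A small sketch, assuming (the post does not say so explicitly) that 84926 and 84927 are MB offsets on the disk; mb_to_block is a hypothetical helper name:

```shell
#!/bin/sh
# mb_to_block (hypothetical helper): convert an offset in MB to a block
# number, using the 512-byte block size from the calculation above.
mb_to_block() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# The two offsets used in the post.
mb_to_block 84926
mb_to_block 84927
```

Each result is the first 512-byte block of the corresponding 1 MB region, which is what the remap ranges above are built from.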
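Recreating the tablespace produced the same error.
Looking at amdu_2011_04_26_17_13_28 again, I learned that amdu is a new ASM tool that can extract and recover ASM data.
But since there was no logical corruption, it was of no use here.
I decided to free the space first, then drop the bad disk, reinitialize it, and finally add it back.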
drop tablespace data01 including contents and datafiles;
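Running this command raised an error, and after that the DATA disk group could no longer be mounted.
Fortunately there was a backup, so the only option was to recreate the DATA disk group: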
/etc/init.d/oracleasm querydisk DATAVOL1
/etc/init.d/oracleasm querydisk DATAVOL2
/etc/init.d/oracleasm querydisk DATAVOL3
/etc/init.d/oracleasm querydisk DATAVOL4
Find the device behind each disk:
[root@pft ~]# /etc/init.d/oracleasm querydisk DATAVOL1
Disk "DATAVOL1" is a valid ASM disk on device [8, 5]
[root@pft ~]# /etc/init.d/oracleasm querydisk DATAVOL2
Disk "DATAVOL2" is a valid ASM disk on device [8, 17]
[root@pft ~]# /etc/init.d/oracleasm querydisk DATAVOL3
Disk "DATAVOL3" is a valid ASM disk on device [8, 33]
[root@pft ~]# /etc/init.d/oracleasm querydisk DATAVOL4
Disk "DATAVOL4" is a valid ASM disk on device [8, 49]
[root@pft ~]#
ls -la /dev/ | grep 8,
brw-rw---- 1 root disk 8, 5 Apr 27 21:15 sda5
brw-rw---- 1 root disk 8, 17 Apr 27 21:15 sdb1
brw-rw---- 1 root disk 8, 33 Apr 27 21:15 sdc1
brw-rw---- 1 root disk 8, 49 Apr 27 21:15 sdd1
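The lookup above matches the [major, minor] pair printed by querydisk against the device numbers in ls output. A rough sketch of automating that match; parse_major_minor and find_block_dev are hypothetical helper names, and the parsing assumes the querydisk message format shown above:

```shell
#!/bin/sh
# parse_major_minor (hypothetical): pull "major minor" out of a querydisk
# message such as: Disk "DATAVOL1" is a valid ASM disk on device [8, 5]
parse_major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/.*device \[\([0-9][0-9]*\), *\([0-9][0-9]*\)].*/\1 \2/p'
}

# find_block_dev (hypothetical): print the block devices under /dev whose
# major and minor numbers match the two arguments.
find_block_dev() {
    ls -l /dev | awk -v maj="$1," -v min="$2" \
        '$1 ~ /^b/ && $5 == maj && $6 == min { print "/dev/" $NF }'
}

# Sample line copied from the output above.
parse_major_minor 'Disk "DATAVOL1" is a valid ASM disk on device [8, 5]'
```

On the host in this post, find_block_dev 8 5 would be expected to print /dev/sda5, given the ls listing above.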
Wipe the disks so they can be reinitialized:
dd if=/dev/zero of=/dev/sda5 bs=1048576 count=50
dd if=/dev/zero of=/dev/sdb1 bs=1048576 count=50
dd if=/dev/zero of=/dev/sdc1 bs=1048576 count=50
dd if=/dev/zero of=/dev/sdd1 bs=1048576 count=50
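Because dd overwrites its output target (the of= operand) without asking, it can be worth rehearsing the wipe on a scratch file before pointing it at a real partition. A minimal sketch under that assumption; zero_head is a hypothetical helper, and conv=notrunc is added only so the scratch file keeps its size, as a block device would:

```shell
#!/bin/sh
# zero_head (hypothetical helper): overwrite the first N blocks of 1 MiB
# of a target with zeros. $1 = target path, $2 = number of 1 MiB blocks.
zero_head() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" conv=notrunc 2>/dev/null
}

# Rehearsal on a scratch file rather than a real device.
scratch=$(mktemp)
head -c 2097152 /dev/zero | tr '\0' 'x' > "$scratch"   # 2 MiB of 'x' bytes
zero_head "$scratch" 1                                 # zero the first 1 MiB
head -c 4 "$scratch" | od -An -tx1                     # dump the first 4 bytes
rm -f "$scratch"
```

Only the first 1 MiB of the scratch file is zeroed; the rest keeps its 'x' bytes, which shows how far the wipe reaches.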
Delete the ASM disks:
/etc/init.d/oracleasm deletedisk DATAVOL1
/etc/init.d/oracleasm deletedisk DATAVOL2
/etc/init.d/oracleasm deletedisk DATAVOL3
/etc/init.d/oracleasm deletedisk DATAVOL4
Create the ASM disks:
/etc/init.d/oracleasm createdisk DATAVOL1 /dev/sda5
/etc/init.d/oracleasm createdisk DATAVOL2 /dev/sdb1
/etc/init.d/oracleasm createdisk DATAVOL3 /dev/sdc1
/etc/init.d/oracleasm createdisk DATAVOL4 /dev/sdd1
Create the disk group:
Use asmca to recreate the disk group.
Then re-import the data.
Summary: always keep good backups, especially with a black box like ASM.
1. Database backups
2. ASM metadata backups
11g
md_backup/md_restore
10g
dd if=device name of=unique header file bs=4096 count=1
3. Extracting disk data with the amdu tool
Dump the disk group DATA:
amdu -dump 'DATA'
Find the file sequence numbers in the image:
strings DATA_0001.img |more
...
DATAFILE
system01.dbf
sysaux01.dbf
undotbs01.dbf
users01.dbf
CONTROLFILE
control01.ctl
File 256 is usually control01.ctl; if it is not, count onward from there.
amdu -extract 'DATA.256'
Here 256 is the SYSTEM tablespace:
strings DATA_256.f |more
}|{z
M5#PFT10G
SYSTEM
_SYSSMU2$
Counting from 256, file 260 is the control file.
amdu -extract 'DATA.260'
4. KFED tool
kfed read disk
kfed repair disk
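The file-number counting described under item 3 (start at 256 and step through the names found by strings) can be sketched as below. It assumes, as in this case, that the files are numbered consecutively in the order they appear in the listing, which will not hold for every disk group; number_files is a hypothetical helper:

```shell
#!/bin/sh
# number_files (hypothetical helper): assign consecutive file numbers to a
# list of names. $1 = first file number, remaining arguments = file names.
number_files() {
    n=$1
    shift
    for f in "$@"; do
        echo "$n $f"
        n=$((n + 1))
    done
}

# File names in the order they appear in the strings output above.
number_files 256 system01.dbf sysaux01.dbf undotbs01.dbf users01.dbf control01.ctl
```

With this listing, control01.ctl lands on 260, which matches the amdu -extract 'DATA.260' call above.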
-- Reposted from