Postal Project: Fault Symptoms
Error messages: 2006-08-15 20:23:10 to 20:23:35
[quote]
Aug 15 20:00:02 racdb2 RFHA [941]: rsfmon monitoring OK on racdb2,(hostname racdb2, ID 7f24890e)
Aug 15 20:23:16 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_check_page: bad entry in directory #1922: unaligned directory entry - offset=0, inode=3191362671, rec_len=58386, name_len=231
Aug 15 20:23:32 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_check_page: bad entry in directory #2083: directory entry across blocks - offset=0, inode=4216550099, rec_len=58692, name_len=255
Aug 15 20:23:35 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_check_page: bad entry in directory #2082: inode out of bounds - offset=0, inode=4257133918, rec_len=960, name_len=251
Aug 15 20:23:35 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_check_page: bad entry in directory #2081: unaligned directory entry - offset=0, inode=298188045, rec_len=281, name_len=15 en=155
[/quote]
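The three complaint types in this excerpt can be reproduced from the logged field values alone. The following is a minimal sketch, modelled on the order of consistency checks that ext2's directory code applies to each entry; it is not the kernel code itself. The block size (4096) and the maximum inode number (262144) are assumed values that do not appear in the log, and the function names are hypothetical.

```python
def dir_rec_len(name_len):
    """Minimum record length: 8-byte entry header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def classify_entry(offset, inode, rec_len, name_len,
                   block_size=4096, max_inode=262144):
    """Return the first check the directory entry fails, or None if it passes.

    block_size and max_inode are assumptions, not values from the report.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len & 3:
        return "unaligned directory entry"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    # The entry must start and end inside the same block.
    if ((offset + rec_len - 1) ^ offset) & ~(block_size - 1):
        return "directory entry across blocks"
    if inode > max_inode:
        return "inode out of bounds"
    return None


# (directory number, inode, rec_len, name_len) copied from the log above.
entries = [
    (1922, 3191362671, 58386, 231),
    (2083, 4216550099, 58692, 255),
    (2082, 4257133918, 960, 251),
]
for directory, inode, rec_len, name_len in entries:
    print(directory, classify_entry(0, inode, rec_len, name_len))
```

Under these assumptions each entry fails the same check that the kernel reported, which suggests the directory blocks held arbitrary data rather than slightly damaged entries.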
Error messages: 2006-08-15 20:23:49 to 20:23:51
Aug 15 20:23:49 racdb2 kernel: attempt to access beyond end of device
Aug 15 20:23:49 racdb2 kernel: 3a:26: rw=0, want=1369542076, limit=2097152
Aug 15 20:23:49 racdb2 kernel: attempt to access beyond end of device
Aug 15 20:23:50 racdb2 kernel: attempt to access beyond end of device
Aug 15 20:23:50 racdb2 kernel: 3a:26: rw=0, want=1305580832, limit=2097152
Aug 15 20:23:51 racdb2 kernel: attempt to access beyond end of device
Aug 15 20:23:51 racdb2 kernel: 3a:26: rw=0, want=1026132440, limit=2097152
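The device in these lines is printed as hexadecimal major:minor, so 3a:26 is the same device as lvm(58,38) in the EXT2-fs errors. A small sketch for decoding such lines follows; the function name is hypothetical, and reading `want` and `limit` as 1 KiB units (which would make the device 2 GiB) is an assumption based on kernels of that generation, not something the log states.

```python
import re

# Matches e.g. "kernel: 3a:26: rw=0, want=1369542076, limit=2097152"
BEYOND_END = re.compile(
    r"kernel: ([0-9a-f]{2}):([0-9a-f]{2}): rw=(\d+), want=(\d+), limit=(\d+)"
)


def parse_beyond_end(line):
    """Decode one 'want/limit' detail line; return None if it does not match."""
    m = BEYOND_END.search(line)
    if m is None:
        return None
    want, limit = int(m.group(4)), int(m.group(5))
    return {
        "device": (int(m.group(1), 16), int(m.group(2), 16)),  # (major, minor)
        "rw": int(m.group(3)),
        "want": want,
        "limit": limit,
        "times_past_end": want / limit,  # how far past the device end
    }


sample = "Aug 15 20:23:49 racdb2 kernel: 3a:26: rw=0, want=1369542076, limit=2097152"
info = parse_beyond_end(sample)
print(info["device"], info["want"] > info["limit"])
```

Requests hundreds of times larger than the device suggest that the block numbers came from corrupted metadata rather than from a partition that was slightly too small.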
Error messages: 2006-08-16 11:16:13 to 11:16:16
Aug 16 11:16:13 racdb2 kernel: attempt to access beyond end of device
Aug 16 11:16:13 racdb2 kernel: 3a:26: rw=0, want=1082717648, limit=2097152
Aug 16 11:16:13 racdb2 kernel: attempt to access beyond end of device
Aug 16 11:16:14 racdb2 kernel: 3a:26: rw=0, want=633372812, limit=2097152
Aug 16 11:16:14 racdb2 kernel: attempt to access beyond end of device
Aug 16 11:16:14 racdb2 kernel: 3a:26: rw=0, want=1149826448, limit=2097152
Aug 16 11:16:15 racdb2 kernel: attempt to access beyond end of device
Aug 16 11:16:15 racdb2 kernel: 3a:26: rw=0, want=276859076, limit=2097152
Aug 16 11:16:15 racdb2 kernel: attempt to access beyond end of device
Aug 16 11:16:16 racdb2 kernel: 3a:26: rw=0, want=1972227480, limit=2097152
Aug 16 11:16:16 racdb2 kernel: attempt to access beyond end of device
Error messages: 2006-08-16 11:32:28 to 11:32:30
Aug 16 11:32:28 racdb2 kernel: EXT2-fs error (device lvm(58,39)): ext2_free_blocks: bit already cleared for block 239810
Aug 16 11:32:28 racdb2 kernel: EXT2-fs error (device lvm(58,39)): ext2_free_blocks: bit already cleared for block 239813
Aug 16 11:32:29 racdb2 kernel: EXT2-fs error (device lvm(58,39)): ext2_free_blocks: bit already cleared for block 239814
Aug 16 11:32:30 racdb2 kernel: EXT2-fs error (device lvm(58,39)): ext2_free_blocks: bit already cleared for block 119440
Aug 16 11:32:30 racdb2 kernel: EXT2-fs error (device lvm(58,39)): ext2_free_blocks: bit already cleared for block 119441
Error messages: 2006-08-16 14:26:40 to 14:26:42
Aug 16 14:26:40 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_free_blocks: Freeing blocks not in datazone - block = 1635131426, count = 1
Aug 16 14:26:40 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_free_blocks: Freeing blocks not in datazone - block = 790769698, count = 1
Aug 16 14:26:41 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_free_blocks: Freeing blocks not in datazone - block = 3084441314, count = 1
Aug 16 14:26:42 racdb2 kernel: EXT2-fs error (device lvm(58,38)): ext2_free_blocks: Freeing blocks not in datazone - block = 1948263074, count = 1
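When reviewing a longer syslog, it can help to count the EXT2-fs errors per device and per reporting function. The sketch below does this for lines in the format shown above; the regular expression and the function name are illustrative assumptions, and the sample lines are copied from the excerpts in this report.

```python
import re
from collections import Counter

# Captures the device, e.g. "lvm(58,38)", and the reporting kernel function.
EXT2_ERROR = re.compile(r"EXT2-fs error \(device (lvm\(\d+,\d+\))\):\s*(\w+):")


def tally_ext2_errors(lines):
    """Count EXT2-fs error lines by (device, kernel function)."""
    counts = Counter()
    for line in lines:
        m = EXT2_ERROR.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


sample_lines = [
    "Aug 16 11:32:28 racdb2 kernel: EXT2-fs error (device lvm(58,39)): "
    "ext2_free_blocks: bit already cleared for block 239810",
    "Aug 16 14:26:40 racdb2 kernel: EXT2-fs error (device lvm(58,38)): "
    "ext2_free_blocks: Freeing blocks not in datazone - block = 1635131426, count = 1",
    "Aug 16 14:26:40 racdb2 kernel: EXT2-fs error (device lvm(58,38)): "
    "ext2_free_blocks: Freeing blocks not in datazone - block = 790769698, count = 1",
]
for (device, function), count in sorted(tally_ext2_errors(sample_lines).items()):
    print(device, function, count)
```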
Error messages at application startup:
exec ORASERVER -A -- -a CONSOLE : CMDTUX_CAT:1685: ERROR: Application initialization failure
exec ORASERVER -A -- -a HOSTMANAGE : process ... Started.
exec ORASERVER -A -- -a SMSUSR : CMDTUX_CAT:1685: ERROR: Application initialization failure
exec ORASERVER -A -- -a SMS : CMDTUX_CAT:1685: ERROR: Application initialization failure
exec ORASERVER -A -- -a SMSQRY : process ... Started.
exec ORASERVER -A -- -a SMSQRY : process ... Started.
exec ORASERVER -A -- -a SMSDB : CMDTUX_CAT:1685: ERROR: Application initialization failure
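To see at a glance which server instances failed, the startup output can be filtered on the ERROR marker. This is a minimal sketch that assumes one `exec` entry per line in exactly the format shown above; the function name is hypothetical, and the value after `-a` is reported as-is without interpreting what it means to the server.

```python
import re

# "exec <server> -A -- -a <name> : <result>"
EXEC_LINE = re.compile(r"exec (\S+) -A -- -a (\S+) : (.+)")


def failed_servers(lines):
    """Return the -a names of the entries whose result contains ERROR."""
    failed = []
    for line in lines:
        m = EXEC_LINE.match(line.strip())
        if m and "ERROR" in m.group(3):
            failed.append(m.group(2))
    return failed


startup_output = [
    "exec ORASERVER -A -- -a CONSOLE : CMDTUX_CAT:1685: ERROR: Application initialization failure",
    "exec ORASERVER -A -- -a HOSTMANAGE : process ... Started.",
    "exec ORASERVER -A -- -a SMSUSR : CMDTUX_CAT:1685: ERROR: Application initialization failure",
]
print(failed_servers(startup_output))
```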
File I/O errors occurred when the file system read from device lvm(58,38) and device lvm(58,39):
(racdb2:tsamsg)/home/tsamsg> find . -name ULOG.081606 -print
find: ./tsare/config/TSAConsole/DelCom.xml.1/440607320: Input/output error
find: ./tsare/config/TSAConsole/DelCom.xml.1/441581003: Input/output error
find: ./tsare/config/TSAConsole/DelCom.xml.1/440113003: Input/output error
find: ./tsare/config/TSAConsole/DelCom.xml.1/445222020: Input/output error
find: ./tsare/config/TSAConsole/DelCom.xml.1/440224001: Input/output error
find: ./tsare/config/TSAConsole/SysOprLogon.xml.new.bak: Input/output error
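The same survey that `find` performed here can be scripted so that every unreadable path is collected in one list. The sketch below walks a tree and records each entry that cannot be listed or stat'ed; the function name is hypothetical, and it only detects errors that surface at directory-listing or `lstat` time, not corruption inside file contents.

```python
import os


def unreadable_entries(top):
    """Return (path, error text) for entries that cannot be listed or stat'ed."""
    errors = []

    def record(err):
        # Called by os.walk when a directory itself cannot be listed.
        errors.append((err.filename, err.strerror))

    for root, dirs, files in os.walk(top, onerror=record):
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                os.lstat(path)
            except OSError as err:
                errors.append((path, err.strerror))
    return errors


if __name__ == "__main__":
    for path, reason in unreadable_entries("."):
        print(f"{path}: {reason}")
```

On a healthy tree the function returns an empty list; on the damaged /home/tsamsg it would be expected to list paths like those in the `find` output.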
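The messages above were recorded in the system log while the project's applications were starting. They show the following system and application faults:
1. When the EXT2 file system read from device lvm(58,38) and device lvm(58,39), it failed to locate blocks within the data zone. Attempts to create files on these two devices failed with "No space left on device; cannot create directory".
2. Application initialization failed with "tpsrvinit () failed", so the applications could not be initialized. The log also shows that the kernel hit errors when it tried to read from or write to the disk-array partitions, and most of the configuration files in the application's configuration directory were found to be corrupted beyond repair.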
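Fault analysis:
All file system errors in the system log come from EXT2. The operating system itself is installed on EXT3, whereas the partitions managed by LVM use EXT2, so the fault originates on the LVM-managed partitions. In terms of hardware architecture, all of the application's data is stored on disk-array partitions that are managed by the system's LVM. The LVM management tools show that device lvm(58,38) and device lvm(58,39) are the devices mounted at the application's directories /home/tsamsg and /home/tsaplus. During initialization the application must first read its configuration files from /home/tsamsg/config/; because /home/tsamsg could not be read or written, the application failed to start and reported errors to the system. If the LVM partition table is damaged, neither the system nor the applications can read or write data on the LVM-managed disk-array partitions. A damaged LVM partition table has two possible causes:
1. Data becoming unsynchronized while the system or an application reads or writes the LVM-managed disk-array partitions.
2. An abnormal shutdown or startup of the system or an application.
After discussion with the postal engineers Wang Qingwen (王清文) and Liu Xiaoling (刘小玲), the cause was identified as unsynchronized data. Several engineers had previously been modifying the application at the same time, which left the application's data out of sync as it grew and in turn damaged the LVM partition table.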
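The mapping from a device number pair such as (58,38) to a mount point can also be cross-checked without the LVM tools. The following sketch, for Linux only, compares the device numbers of each mount point listed in /proc/mounts with the pair from the log; the function names are hypothetical, and mount points containing escaped characters (for example spaces) are not handled.

```python
import os


def device_numbers(path):
    """Return (major, minor) of the device holding the file system at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def mounts_on_device(major, minor, mounts_file="/proc/mounts"):
    """List mount points whose file system lives on device (major, minor)."""
    matches = []
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue
            mount_point = fields[1]
            try:
                if device_numbers(mount_point) == (major, minor):
                    matches.append(mount_point)
            except OSError:
                # Unreadable or vanished mount point: skip it.
                continue
    return matches


if __name__ == "__main__":
    # (58, 38) is the pair reported as lvm(58,38) in the kernel messages.
    print(mounts_on_device(58, 38))
```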
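Fault resolution:
Because the LVM partitions were damaged, and because LVM allocates space to a partition dynamically rather than at a fixed location, recreating the partitions is enough to clear the fault. The steps were as follows:
1. Back up the data in /home/tsamsg and /home/tsaplus.
2. Recreate the disk-array partitions device lvm(58,38) and device lvm(58,39), on which /home/tsamsg and /home/tsaplus are mounted. Afterwards the disk-array partitions could be read and written normally.
3. Restore the data and restart the applications. The applications started normally, the project's services ran normally, and the system status was normal.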
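The report does not say which tool was used for the backup in step 1. As one possible approach, the sketch below archives a directory tree with Python's `tarfile` module and then reopens the archive to count its members as a basic sanity check. The function name and the archive location are hypothetical, and on a damaged file system `tar.add` would be expected to raise `OSError` for unreadable files, so those would need separate handling.

```python
import os
import tarfile


def backup_tree(source_dir, archive_path):
    """Archive source_dir into a gzip-compressed tar file.

    Returns the number of members found when the archive is read back.
    archive_path should lie outside source_dir, on a healthy file system.
    """
    top_name = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_name)
    # Re-open the finished archive to confirm that it is readable.
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getnames())


if __name__ == "__main__":
    # Hypothetical target path; the source directory is the one named in step 1.
    print(backup_tree("/home/tsamsg", "/backup/tsamsg.tar.gz"))
```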
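To keep the postal project's services running normally, our company makes the following two recommendations:
1. When managing the postal project's applications, allow only one system administrator at a time to log in and make changes, so that several administrators do not modify the configuration files simultaneously and cause synchronization errors like those above.
2. The partitions mounted for the applications use the EXT2 file system format. EXT3 is fully compatible with ext2 and adds journaling, so it protects better against data loss after a crash. We therefore recommend upgrading the file system format of these partitions to EXT3.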
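As a starting point for the second recommendation, the partitions still mounted as ext2 can be listed from the mount table. The sketch below parses text in the /proc/mounts format; the function name and the device paths in the sample are hypothetical and do not come from the report. For each candidate, a journal can then be added with `tune2fs -j` on the device, after which the mount type in /etc/fstab needs to be changed to ext3.

```python
def ext2_mounts(mounts_text):
    """Return (device, mount point) pairs whose file system type is ext2."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "ext2":
            result.append((fields[0], fields[1]))
    return result


# Illustrative sample only; the device paths are made up.
sample_mounts = """\
/dev/sda1 / ext3 rw 0 0
/dev/vg_example/lv_tsamsg /home/tsamsg ext2 rw 0 0
/dev/vg_example/lv_tsaplus /home/tsaplus ext2 rw 0 0
"""
for device, mount_point in ext2_mounts(sample_mounts):
    print(device, mount_point)
```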