love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Phone:18180207355 提供专业Oracle/MySQL/PostgreSQL数据恢复、性能优化、迁移升级、紧急救援等服务

一次远程协助的恢复 遇到异灵事件

本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

本文链接地址: 一次远程协助的恢复 遇到异灵事件

最近2天远程帮忙恢复了一个数据库,虽然过程不难,但是却遇到了非常奇怪的事情。
某个客户的windows 10201的库,alert报错如下:

Tue Jul 05 15:28:52 2011
Hex dump of (file 3, block 2674) in trace file e:\oracle\product\10.2.0\admin\

orcl\bdump\orcl_p001_1916.trc
Corrupt block relative dba: 0x00c00a72 (file 3, block 2674)
Tue Jul 05 15:28:52 2011
RECOVERY OF THREAD 1 STUCK AT BLOCK 2095 OF FILE 2
Fractured block found during crash/instance recovery
Data in bad block:
type: 33 format: 2 rdba: 0x00c00a72
last change scn: 0x0000.0e5676f3 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xad4b2101
check value in block header: 0xe187
computed block checksum: 0xdbb8
Reread of rdba: 0x00c00a72 (file 3, block 2674) found same corrupted data
Tue Jul 05 15:28:52 2011

Tue Jul 05 16:32:22 2011
Hex dump of (file 2, block 321) in trace file e:\oracle\product\10.2.0\admin

\orcl\bdump\orcl_p003_2164.trc
Corrupt block relative dba: 0x00800141 (file 2, block 321)
Fractured block found during crash/instance recovery
Data in bad block:
type: 1 format: 6 rdba: 0x0141a226
last change scn: 0x8ad2.8ad20141 seq: 0x0 flg: 0x00
spare1: 0x26 spare2: 0xa2 spare3: 0x893e
consistency value in tail: 0x26020000
check value in block header: 0x0
block checksum disabled
Tue Jul 05 16:32:23 2011
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_p003_2164.trc:
ORA-00600: internal error code, arguments: [ksfdrmms1], [0x4EA4B98C], [], [], [], []

Tue Jul 05 16:53:14 2011
Hex dump of (file 2, block 321) in trace file e:\oracle\product\10.2.0\admin\orcl

\bdump\orcl_p003_2192.trc
Corrupt block relative dba: 0x00800141 (file 2, block 321)
Fractured block found during crash/instance recovery
Data in bad block:
type: 5 format: 2 rdba: 0x0141a226
last change scn: 0x8ad2.8ad20141 seq: 0x0 flg: 0x00
spare1: 0x26 spare2: 0xa2 spare3: 0x893e
consistency value in tail: 0x26020000
check value in block header: 0x0
block checksum disabled
Reread of rdba: 0x00800141 (file 2, block 321) found different data
Hex dump of (file 2, block 321) in trace file e:\oracle\product\10.2.0\admin\orcl\

bdump\orcl_p003_2192.trc
Corrupt block relative dba: 0x00800141 (file 2, block 321)
Fractured block found during reread
Data in bad block:
type: 38 format: 2 rdba: 0x00800141
last change scn: 0x0000.0e568ad2 seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xc1c42602
check value in block header: 0x893e
computed block checksum: 0xea14
Tue Jul 05 16:53:15 2011
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_p002_2188.trc:
ORA-00600: internal error code, arguments: [ksfdchkfob1], [0x4EA4CF64], [0xB424CF94], []

Hex dump of (file 1, block 166657) in trace file e:\oracle\product\10.2.0\admin\

orcl\bdump\orcl_p004_3696.trc
Corrupt block relative dba: 0x00428b01 (file 1, block 166657)
Fractured block found during media recovery
Data in bad block:
type: 6 format: 2 rdba: 0x00428b01
last change scn: 0x0000.0e53ba17 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xbb9b0601
check value in block header: 0xc268
computed block checksum: 0xb19e
Reread of rdba: 0x00428b01 (file 1, block 166657) found same corrupted data

第一次恢复是7月5号下午6点左右,当时恢复以后,就下班回家了,最后发现system仍然有坏块,如下:

Tue Jul 05 18:20:14 2011
Completed: ALTER DATABASE OPEN
Tue Jul 05 18:20:15 2011
ORA-1578 encountered when generating server alert SMG-4120
ORA-1578 encountered when generating server alert SMG-4121
ORA-1578 encountered when generating server alert SMG-4121
Tue Jul 05 18:23:11 2011
ORA-1578 encountered when generating server alert SMG-3503
Tue Jul 05 18:24:53 2011
alter database datafile ‘E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\UNDOTBS2.DBF’ resize 2048M
Tue Jul 05 18:25:10 2011
ORA-1578 encountered when generating server alert SMG-4121
ORA-1578 encountered when generating server alert SMG-4121
Tue Jul 05 18:25:41 2011
Completed: alter database datafile ‘E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\UNDOTBS2.DBF’ resize 2048M
Tue Jul 05 18:25:41 2011
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_3656.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4554)
ORA-01110: data file 1: ‘E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\SYSTEM01.DBF’

换句话说,从5号到6号下午6点左右,我们再次进行恢复的时候,alert 一直在报错上面的坏块。
关于这个file 1 block 4554,其实都是一样的,10gR2的库,查询都一样,如下:

在6号下午6点进行的恢复时,用dbv检查system datafile,发现居然有2个坏块。

windows环境下,检测的结果是十进制,首先需要转换为16进制,然后再转换为file id和 block number。

对于dbv检测到的第1个坏块,通过直接rebuild index就解决了,如下:

对于第2个坏块,尝试了多种方式,均无法删除,如下:

开始的尝试都不行,最后我将该表的结构以及相关的几个index通过
dbms_metadata.get_ddl获取出来,然后使用dbms_repair进行跳过坏块,才完成的。

做完上面的工作以后,才能删除smon_scn_time,我狂汗,最后使用同样的方式删除smon_scn_time_bak。
最后再创建表和相关的cluster以及index,如下:


从上看,file 3 block 2647又出问题了。
关于后面为什么再次出现坏块,这个我没信息了,客户那边也打算将数据导出新建库,他们以后这个库不会用了。
由于这个事情,我本身也是在帮人处理,故暂且到此。

值得庆幸的是,昨天我在recover的时候,最后将那个坏块file 1 block 166657 dump了出来,现在就让咋们来
好好分析一下这个block的情况

*** SERVICE NAME:(SYS$USERS) 2011-07-06 20:35:03.000
*** SESSION ID:(159.3) 2011-07-06 20:35:03.000
tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x1)
tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x1)
*** 2011-07-06 20:36:48.484
Start dump data blocks tsn: 0 file#: 1 minblk 166657 maxblk 166657
buffer tsn: 0 rdba: 0x00428b01 (1/166657)
scn: 0x0000.00000000 seq: 0xff flg: 0x04 tail: 0x000006ff
frmt: 0x02 chkval: 0x7e29 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x04CC2200 to 0x04CC4200

先不说block dump的内容,就前面的那两句,我感觉可能就不正常,或者说可能是bug,虽然可能不是致命的。
如下:

tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x1)
tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x1)

经常mos,发现果然是bug,再次狂汗。。。。 详见如下:

BUG 4704334 – ”TKCRRSARC: (WARN) FAILED TO FIND ARCH FOR MESSAGE (MESSAGE:0X1)”

These messages are warning messages and are a consequence of the internal archive process spawning and releasing.

There can be different reasons for this message to appear, but it is in fact related to internal work.

The BUG 4704334 has been closed as an duplicate of BUG 6074620 – fixed in 10.2.0.4.1, 10.2.0.5, 11.1.0.7 and 11.2 releases.

这个意义不大,咋们继续看上面的block dump信息。

我们就来看看怪异的事情:

scn: 0x0000.00000000 seq: 0xff flg: 0x04 tail: 0x000006ff
frmt: 0x02 chkval: 0x7e29 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

今天查看昨天的操作记录,发现了如下信息:

难道说是因为我们rename了该表,导致该block的scn被清空了 ?
下午自己做了下实验,发现rename table的过程实际上不会该表block的位置,换句话说,
原来这个表分布在a,b两个block上,那么rename以后仍然分布在a和b 两个block上。

关于block scn为0的情况,问了老白,他的说法是除非是刚格式化的block或空块,不然不可能为0.
但是事实上我这里确实是遇到了,太诡异了。
最近遇到了好几个怪异的问题,这是其中之一,还有前不久的临时段的问题,看来需要花点功夫再研究研究。

最后补充一下,关于那几个字典表,其实可以查询sql.bsq如下:

2 Responses to “一次远程协助的恢复 遇到异灵事件”

  1. uggs on sale Says:

    Great post and nice discussion some of the comments

  2. love wife & love life ——Roger » Blog Archive » 关于对象SMON_SCN_TIME Says:

    […] 其实关于该表,以前我也遇到过一次,那是一次数据库的恢复, 详见链接: 一次远程协助的恢复 遇到异灵事件 SQL> SELECT owner, TABLE_NAME, num_rows, blocks, empty_blocks, avg_row_len 2 FROM dba_tables 3 […]

Leave a Reply

You must be logged in to post a comment.