A Database Recovery at a University

Unless marked as a repost, every article on this site is original. Source: love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog

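A customer's database crashed and would not start. I brought it back through an emergency remote recovery session; the steps are outlined below for reference.

The database could not be opened; the open attempt failed with the following errors: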

SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.1d2be1e6):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_20559.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_3286610060$" too small
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_20559.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_3286610060$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 20559): terminating the instance due to error 704
Instance terminated by USER, pid = 20559
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (20559) as a result of ORA-1092
Mon Sep 28 21:03:41 2015
ORA-1092 : opitsk aborting process

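This error is fairly common. Roughly speaking, Oracle needs to perform a consistent read while opening the database, finds that the undo (rollback segment) contents it needs have already been overwritten, and raises ORA-01555, so the open fails. The failing statement is an Oracle recursive SQL statement that must run during open; because it cannot complete, the database cannot open normally.

The approach is straightforward: first, use a 10046 trace to confirm which blocks the database accesses while executing this SQL before the error is raised.

The 10046 trace shows the following: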

=====================
PARSING IN CURSOR #4 len=70 dep=1 uid=0 oct=3 lid=0 tim=1443431294544569 hv=3377894161 ad='122f55f28' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #4:c=1000,e=226,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1443431294544569
=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1443431294545488 hv=429618617 ad='121b18860' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=217,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1443431294545488
BINDS #5:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f1daaac1700  bln=22  avl=02  flg=05
value=20
EXEC #5:c=0,e=421,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1443431294545986
WAIT #5: nam='db file sequential read' ela= 9 file#=1 block#=337 blocks=1 obj#=36 tim=1443431294546059
WAIT #5: nam='db file sequential read' ela= 6 file#=1 block#=338 blocks=1 obj#=36 tim=1443431294546117
WAIT #5: nam='db file sequential read' ela= 6 file#=1 block#=241 blocks=1 obj#=18 tim=1443431294546166
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=1443431294546497 hv=361892850 ad='123eebc48' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=0,e=292,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,plh=0,tim=1443431294546496
BINDS #6:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f1daaa2ebb8  bln=22  avl=02  flg=05
value=10
EXEC #6:c=0,e=473,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,plh=906473769,tim=1443431294547061
WAIT #6: nam='db file sequential read' ela= 9 file#=1 block#=321 blocks=1 obj#=34 tim=1443431294547113
WAIT #6: nam='db file sequential read' ela= 7 file#=1 block#=225 blocks=1 obj#=15 tim=1443431294547169
FETCH #6:c=0,e=116,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,plh=906473769,tim=1443431294547196
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=0 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=0 us)'
CLOSE #6:c=0,e=5,dep=2,type=0,tim=1443431294547246
WAIT #5: nam='db file sequential read' ela= 11 file#=3 block#=7344 blocks=1 obj#=0 tim=1443431294547284
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=1443431294547411 hv=361892850 ad='123eebc48' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=0,e=22,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,plh=906473769,tim=1443431294547411
BINDS #6:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f1daaa2ebb8  bln=22  avl=02  flg=05
value=7
EXEC #6:c=0,e=47,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,plh=906473769,tim=1443431294547514
FETCH #6:c=0,e=95,p=0,cr=2,cu=0,mis=0,r=1,dep=2,og=3,plh=906473769,tim=1443431294547630
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=0 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=0 us)'
CLOSE #6:c=0,e=4,dep=2,type=0,tim=1443431294547678
WAIT #5: nam='db file sequential read' ela= 8 file#=3 block#=552 blocks=1 obj#=0 tim=1443431294547710
FETCH #5:c=1999,e=1938,p=7,cr=9,cu=0,mis=0,r=0,dep=1,og=4,plh=1218588913,tim=1443431294547945
STAT #5 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=0 us)'
STAT #5 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=0 us)'
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_3286610060$" too small
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_3286610060$" too small
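
The block reads recorded in the trace above can be tabulated with a short script. This is only a sketch: the `TRACE` sample is a handful of WAIT lines copied from the trace, and `blocks_read` is a hypothetical helper name, not part of any Oracle tool.

```python
import re

# WAIT lines copied from the 10046 trace above (cursor #5 is the failing
# obj$ query, cursor #6 is the recursive undo$ lookup).
TRACE = """\
WAIT #5: nam='db file sequential read' ela= 9 file#=1 block#=337 blocks=1 obj#=36 tim=1443431294546059
WAIT #5: nam='db file sequential read' ela= 6 file#=1 block#=338 blocks=1 obj#=36 tim=1443431294546117
WAIT #5: nam='db file sequential read' ela= 6 file#=1 block#=241 blocks=1 obj#=18 tim=1443431294546166
WAIT #6: nam='db file sequential read' ela= 9 file#=1 block#=321 blocks=1 obj#=34 tim=1443431294547113
WAIT #5: nam='db file sequential read' ela= 11 file#=3 block#=7344 blocks=1 obj#=0 tim=1443431294547284
WAIT #5: nam='db file sequential read' ela= 8 file#=3 block#=552 blocks=1 obj#=0 tim=1443431294547710
"""

WAIT_RE = re.compile(
    r"WAIT #(?P<cursor>\d+): nam='db file sequential read'"
    r".*?file#=(?P<file>\d+) block#=(?P<block>\d+) blocks=\d+ obj#=(?P<obj>-?\d+)"
)


def blocks_read(trace_text, cursor):
    """Return (file#, block#, obj#) for each single-block read by one cursor, in order."""
    reads = []
    for line in trace_text.splitlines():
        match = WAIT_RE.match(line)
        if match and int(match.group("cursor")) == cursor:
            reads.append(
                (int(match.group("file")), int(match.group("block")), int(match.group("obj")))
            )
    return reads


for file_no, block_no, obj_no in blocks_read(TRACE, cursor=5):
    print(f"file {file_no} block {block_no} obj# {obj_no}")
```

For cursor #5 this lists the two I_OBJ1 index blocks (obj#=36), the OBJ$ table block at file 1 block 241 (obj#=18), and then two reads from file 3 with obj#=0.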

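Following the usual approach, I changed the transaction status (the ITL flag) in the blocks accessed above to 0x8000, but the error persisted. Comparing the SCN in the block with the SCN in the error message more closely reveals how they are related: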

BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
.....
struct ktbbhitl[0], 24 bytes             @44
struct ktbitxid, 8 bytes              @44
ub2 kxidusn                        @44       0x000a
ub2 kxidslt                        @46       0x0019
ub4 kxidsqn                        @48       0x00071d47
struct ktbituba, 8 bytes              @52
ub4 kubadba                        @52       0x00c01cb0
ub2 kubaseq                        @56       0x2c95
ub1 kubarec                        @58       0x15
ub2 ktbitflg                          @60       0x8000 (KTBFCOM)
union _ktbitun, 2 bytes               @62
b2 _ktbitfsc                       @62       0
ub2 _ktbitwrp                      @62       0x0000
ub4 ktbitbas                          @64       0x1d2be6ea
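
The ITL fields printed by BBED can be decoded by hand; the sketch below does the arithmetic. It assumes the usual smallfile block address layout (10 bits of relative file number, 22 bits of block number) and reads the `_ktbitun` union as the SCN wrap because the slot is flagged committed. `split_dba` and `describe_itl` are hypothetical helper names.

```python
# ITL slot 0 fields, copied from the BBED output above.
itl = {
    "kxidusn": 0x000A,      # undo segment number
    "kxidslt": 0x0019,      # transaction table slot
    "kxidsqn": 0x00071D47,  # transaction sequence number
    "kubadba": 0x00C01CB0,  # undo block address
    "ktbitflg": 0x8000,     # BBED labels this value KTBFCOM
    "ktbitwrp": 0x0000,     # SCN wrap (assumed, since the slot is committed)
    "ktbitbas": 0x1D2BE6EA, # SCN base
}


def split_dba(dba):
    """Split a 32-bit block address into (relative file#, block#).

    Assumption: smallfile layout, 10 bits of file# and 22 bits of block#.
    """
    return dba >> 22, dba & 0x3FFFFF


def describe_itl(slot):
    """Turn the raw ITL fields into readable values."""
    undo_file, undo_block = split_dba(slot["kubadba"])
    return {
        "xid": "0x%04x.%03x.%08x" % (slot["kxidusn"], slot["kxidslt"], slot["kxidsqn"]),
        "undo_segment": slot["kxidusn"],
        "undo_file": undo_file,
        "undo_block": undo_block,
        "committed": bool(slot["ktbitflg"] & 0x8000),
        "commit_scn": (slot["ktbitwrp"] << 32) | slot["ktbitbas"],
    }


info = describe_itl(itl)
for key, value in info.items():
    print(f"{key}: {value}")
```

Under these assumptions the slot points at undo segment 10 and undo block file 3 block 7344, which lines up with the `us#` bind value of 10 and the read of `file#=3 block#=7344` in the 10046 trace.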

Converted to decimal, the SCN base above is 489416426. Next, query the SCN in the datafile headers:

SQL> select file#,checkpoint_change# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          489415139
         2          489415139
         3          489415139
         4          489415139

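The SCN in the failing block is clearly higher than the SCN in the datafile headers, and also somewhat higher than the SCN reported in the earlier error, 0x1d2be1e6 (489415142 in decimal). What does this tell us?

While the database is running, Oracle does not know when a transaction will finish at some later point, so it does not know what the SCN will be then either; the SCN recorded for it therefore tends to be somewhat higher than the current one. After a crash, with undo damaged on top of that, this situation arises easily.

So the fix here is simple: change the SCN above to a value lower than (or equal to) the SCN in the error, and this error goes away.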
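
The comparison above is plain hex arithmetic and can be checked with a few lines. This is an illustrative sketch: `parse_scn` is a hypothetical helper, and the three values are the ones quoted in this post.

```python
def parse_scn(text):
    """Parse an SCN printed as hex 'wrap.base', e.g. '0x0000.1d2be1e6'."""
    text = text.lower()
    if text.startswith("0x"):
        text = text[2:]
    wrap_hex, base_hex = text.split(".")
    # The full SCN is the wrap shifted left by 32 bits plus the base.
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


query_scn = parse_scn("0x0000.1d2be1e6")  # SCN from the ORA-01555 message
block_scn = parse_scn("0x0000.1d2be6ea")  # ktbitwrp.ktbitbas from the block
header_scn = 489415139                    # checkpoint_change# in v$datafile_header

print("query SCN :", query_scn)
print("block SCN :", block_scn)
print("block SCN ahead of query SCN by  :", block_scn - query_scn)
print("block SCN ahead of header SCN by :", block_scn - header_scn)
```

The block SCN comes out 1284 above the query SCN and 1287 above the datafile header SCN.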

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [489415146], [0],
[489416426], [4194545], [], [], [], [], [], []
Process ID: 22192
Session ID: 66 Serial number: 1

After the change, I started the database again and the error was different. The alert log now showed:

Thread 1 opened at log sequence 1
Current log# 1 seq# 1 mem# 0: /data/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Sep 28 21:17:17 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22192.trc  (incident=128561):
ORA-00600: internal error code, arguments: [2662], [0], [489415146], [0], [489416426], [4194545], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_128561/orcl_ora_22192_i128561.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22192.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [489415146], [0], [489416426], [4194545], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22192.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [489415146], [0], [489416426], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 22192): terminating the instance due to error 704
Instance terminated by USER, pid = 22192
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (22192) as a result of ORA-1092
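
The arguments of the ORA-00600 [2662] line above can be unpacked as follows. This sketch assumes the commonly described argument layout for this error (current SCN wrap and base, dependent SCN wrap and base, then the block address the dependent SCN came from) and the smallfile 10/22-bit block address split; `decode_2662` is a hypothetical helper.

```python
import re

LINE = (
    "ORA-00600: internal error code, arguments: [2662], [0], [489415146], "
    "[0], [489416426], [4194545], [], [], [], [], [], []"
)


def decode_2662(line):
    """Unpack the ORA-00600 [2662] arguments (layout is an assumption, see above)."""
    args = re.findall(r"\[([^\]]*)\]", line)
    if not args or args[0] != "2662":
        raise ValueError("not an ORA-00600 [2662] line")
    cur_wrap, cur_base, dep_wrap, dep_base, dba = (int(a) for a in args[1:6])
    current_scn = (cur_wrap << 32) | cur_base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "gap": dependent_scn - current_scn,
        "file": dba >> 22,         # relative file number
        "block": dba & 0x3FFFFF,   # block number within the file
    }


decoded = decode_2662(LINE)
for key, value in decoded.items():
    print(f"{key}: {value}")
```

Read this way, the dependent SCN 489416426 is the same value found in the ITL of the block, it is 1280 ahead of the current SCN, and the address 4194545 resolves to file 1 block 241, the OBJ$ block read in the 10046 trace.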

This is clearly an SCN problem, and it is easy to handle: advancing the SCN resolves it. After advancing the SCN the open still failed, but the error changed once again:

*********************************************************************
Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x41CE1144] [PC:0x2297740, kgegpa()+40] [flags: 0x0, count: 1]
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x41CE1144] [PC:0x229596B, kgebse()+279] [flags: 0x2, count: 2]
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x41CE1144] [PC:0x229596B, kgebse()+279] [flags: 0x2, count: 2]
Mon Sep 28 21:25:01 2015
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p000_23736.trc  (incident=132169):
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_132169/orcl_p000_23736_i132169.trc
Mon Sep 28 21:25:03 2015
PMON (ospid: 23582): terminating the instance due to error 397
Instance terminated by PMON, pid = 23582

This one is simple to deal with: masking the undo segments gets past it. Several other errors also showed up later in the recovery:

Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j004_27644.trc  (incident=137080):
ORA-00600: internal error code, arguments: [4097], [9], [7], [63399], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_137080/orcl_j004_27644_i137080.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j004_27644.trc  (incident=137081):
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4097], [9], [7], [63399], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_137081/orcl_j004_27644_i137081.trc
Mon Sep 28 22:00:10 2015
Trace dumping is performing id=[cdmp_20150928220010]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j004_27644.trc  (incident=137082):
ORA-00600: internal error code, arguments: [4097], [9], [7], [63399], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 26392
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_137082/orcl_j004_27644_i137082.trc
Trace dumping is performing id=[cdmp_20150928220011]

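These errors are comparatively much easier to handle. ORA-00600 [4097] is also a rollback segment problem and can be dealt with together with the undo work. I have written up a case for this error on my blog before, so I will not repeat it here. In this kind of recovery scenario, the following errors usually still appear once the database is finally open: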

Mon Sep 28 22:10:57 2015
Sweep [inc][141951]: completed
Sweep [inc2][141951]: completed
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=141952):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Trace dumping is performing id=[cdmp_20150928221057]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=141953):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=141954):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=142843):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=142844):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=142845):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29255.trc  (incident=142846):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Mon Sep 28 22:11:00 2015

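This last error is very easy to fix: rebuilding the affected indexes resolves it. When there is a large volume of trace output, I suggest simply running grep over it and then rebuilding the relevant indexes.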

[root@db2 ~]# cat /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_132169/orcl_p000_23736_i132169.trc |grep obj:
dbwrid: 0 obj: 5777 objn: 5777 tsn: 1 afn: 2 hint: f
seg/obj: 0x1691  csc: 0x00.1d2be90f  itc: 2  flg: E  typ: 2 - INDEX
dbwrid: 0 obj: 37 objn: 37 tsn: 0 afn: 1 hint: f
.....
seg/obj: 0x1692  csc: 0x00.1d2be8f1  itc: 2  flg: E  typ: 1 - DATA
dbwrid: 1 obj: 36 objn: 36 tsn: 0 afn: 1 hint: f
seg/obj: 0x24  csc: 0x00.1ca4ab69  itc: 1  flg: -  typ: 2 - INDEX
dbwrid: 0 obj: 36 objn: 36 tsn: 0 afn: 1 hint: f
seg/obj: 0x24  csc: 0x00.16f03  itc: 2  flg: -  typ: 2 - INDEX
dbwrid: 0 obj: 5777 objn: 5777 tsn: 1 afn: 2 hint: f
seg/obj: 0x1691  csc: 0x00.1d2be90f  itc: 2  flg: E  typ: 2 - INDEX
dbwrid: 1 obj: 5777 objn: 5777 tsn: 1 afn: 2 hint: f
seg/obj: 0x1691  csc: 0x00.1d2baac0  itc: 1  flg: E  typ: 2 - INDEX
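
When the grep output runs to many lines, the distinct index segments can be pulled out with a small script. This is a sketch only: the sample is a few `seg/obj` lines copied from the output above, `index_segments` is a hypothetical helper, and mapping the resulting IDs to index names is left out.

```python
import re

# A few lines copied from the grep output above.
GREP_OUTPUT = """\
seg/obj: 0x1691  csc: 0x00.1d2be90f  itc: 2  flg: E  typ: 2 - INDEX
seg/obj: 0x1692  csc: 0x00.1d2be8f1  itc: 2  flg: E  typ: 1 - DATA
seg/obj: 0x24  csc: 0x00.1ca4ab69  itc: 1  flg: -  typ: 2 - INDEX
seg/obj: 0x24  csc: 0x00.16f03  itc: 2  flg: -  typ: 2 - INDEX
seg/obj: 0x1691  csc: 0x00.1d2baac0  itc: 1  flg: E  typ: 2 - INDEX
"""

SEG_RE = re.compile(r"seg/obj:\s*0x([0-9a-fA-F]+)\s.*typ:\s*\d+\s*-\s*(\w+)")


def index_segments(text):
    """Return the sorted, de-duplicated seg/obj IDs (decimal) of INDEX blocks."""
    ids = set()
    for line in text.splitlines():
        match = SEG_RE.search(line)
        if match and match.group(2) == "INDEX":
            ids.add(int(match.group(1), 16))
    return sorted(ids)


print(index_segments(GREP_OUTPUT))
```

On the sample lines this reduces five block headers to two distinct index segments, 0x24 (36) and 0x1691 (5777), and skips the DATA block.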

Finally, use the MOS script to check the data dictionary for anomalies, which confirms that the database can at least run normally. The results:

SQL> exec hcheck.full
H.Check Version 9i+/hc3.50
---------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
---------------------------------------
Catalog       Fixed
Procedure Name                     Version    Vs Release      Run
------------------------------ ... ---------- -- ----------   ---
.- SynLastDDLTim               ... 1102000100 >  1001000200 : n/a
.- LobNotInObj                 ... 1102000100 >  1000000200 : n/a
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* : Ok
.- SourceNotInObj              ... 1102000100 >  1002000100 : n/a
.- IndIndparMismatch           ... 1102000100 <= 1102000100 : Ok
.- InvCorrAudit                ... 1102000100 <= 1102000100 : Ok
.- OversizedFiles              ... 1102000100 <=  *All Rel* : Ok
.- TinyFiles                   ... 1102000100 >   900010000 : n/a
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* : Ok
.- PoorStorage                 ... 1102000100 <=  *All Rel* : Ok
.- MissTabSubPart              ... 1102000100 >   900010000 : n/a
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 : Ok
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* : Ok
.- OrphanedTabComPart          ... 1102000100 >   900010000 : n/a
.- ZeroTabSubPart              ... 1102000100 >   902000100 : n/a
.- MissingSum$                 ... 1102000100 <=  *All Rel* : Ok
.- MissingDir$                 ... 1102000100 <=  *All Rel* : Ok
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* : Ok
.- ObjSynMissing               ... 1102000100 <=  *All Rel* : Ok
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* : Ok
.- OrphanedUndo                ... 1102000100 <=  *All Rel* : Ok
.- OrphanedIndex               ... 1102000100 <=  *All Rel* : Ok
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* : Ok
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* : Ok
.- OrphanedTable               ... 1102000100 <=  *All Rel* : Ok
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* : Ok
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* : Ok
.- MissingPartCol              ... 1102000100 <=  *All Rel* : Ok
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* : Ok
.- OrphanedIndPartObj#         ... 1102000100 >  1101000600 : n/a
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* : Ok
.- HighObjectIds               ... 1102000100 >   801060000 : n/a
.- PQsequence                  ... 1102000100 >   800060000 : n/a
.- TruncatedCluster            ... 1102000100 >   801070000 : n/a
.- FetUet                      ... 1102000100 <=  *All Rel* : Ok
.- Uet0Check                   ... 1102000100 <=  *All Rel* : Ok
.- ExtentlessSeg               ... 1102000100 <=  *All Rel* : Ok
.- SeglessUET                  ... 1102000100 <=  *All Rel* : Ok
.- BadInd$                     ... 1102000100 <=  *All Rel* : Ok
.- BadTab$                     ... 1102000100 <=  *All Rel* : Ok
.- BadIcolDepCnt               ... 1102000100 >  1101000700 : n/a
.- WarnIcolDep                 ... 1102000100 >  1101000700 : n/a
.- OnlineRebuild$              ... 1102000100 <=  *All Rel* : Ok
.- DropForceType               ... 1102000100 >  1001000200 : n/a
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* : Ok
.- FailedInitJVMRun            ... 1102000100 <=  *All Rel* : Ok
.- TypeReusedAfterDrop         ... 1102000100 >   900010000 : n/a
.- Idgen1$TTS                  ... 1102000100 >   900010000 : n/a
.- DroppedFuncIdx              ... 1102000100 >   902000100 : n/a
.- BadOwner                    ... 1102000100 >   900010000 : n/a
.- UpgCheckc0801070            ... 1102000100 <=  *All Rel* : Ok
.- BadPublicObjects            ... 1102000100 <=  *All Rel* : Ok
.- BadSegFreelist              ... 1102000100 <=  *All Rel* : Ok
.- BadCol#                     ... 1102000100 >  1001000200 : n/a
.- BadDepends                  ... 1102000100 <=  *All Rel* : Ok
.- CheckDual                   ... 1102000100 <=  *All Rel* : Ok
.- ObjectNames                 ... 1102000100 <=  *All Rel* : Ok
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* : Ok
.- ChkIotTs                    ... 1102000100 <=  *All Rel* : Ok
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* : Ok
.- BadNextObject               ... 1102000100 <=  *All Rel* : Ok
.- OrphanIndopt                ... 1102000100 >   902000800 : n/a
.- UpgFlgBitTmp                ... 1102000100 >  1001000100 : n/a
.- RenCharView                 ... 1102000100 >  1001000100 : n/a
.- Upg9iTab$                   ... 1102000100 >   902000400 : n/a
.- Upg9iTsInd                  ... 1102000100 >   902000500 : n/a
.- Upg10gInd$                  ... 1102000100 >  1002000000 : n/a
.- DroppedROTS                 ... 1102000100 <=  *All Rel* : Ok
.- ChrLenSmtcs                 ... 1102000100 >  1101000600 : n/a
.- FilBlkZero                  ... 1102000100 <=  *All Rel* : Ok
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* : Ok
Found 0 potential problem(s) and 0 warning(s)
PL/SQL procedure successfully completed.

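As the output shows, at least according to the Oracle MOS script, the data dictionary has no problems.

For complex data recovery work like this, I recommend contacting 云和恩墨 for professional technical support.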

Tags: ora-00600 2662, ORA-00604, ORA-00704, ora-01555, [4097], [6006], [kdsgrp1]
Posted in backup & recovery, Oracle rdbms on October 7, 2015
