15 TB 3节点RAC 的恢复记录
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 15 TB 3节点RAC 的恢复记录
某客户的业务系统(15TB,3节点RAC) 由于abort关闭之后,导致数据库无法正常open. 在Open时报错如下:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | Tue Oct 15 14:06:36 2013 Trace dumping is performing id=[cdmp_20131015140636] Errors in file /oracle/app/product/diag/rdbms/xxxx/xxxx/trace/xxxx03_ora_49348828.trc: ORA-00308: cannot open archived log '/arch/xxxx/2_30992_766859529.dbf' ORA-17503: ksfdopn:4 Failed to open file /arch/xxxx/2_30992_766859529.dbf ORA-17500: ODM err:File does not exist ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212], [], [], [], [], [], [], [], [] Errors in file /oracle/app/product/diag/rdbms/xxxx/xxxx03/trace/xxxx03_ora_49348828.trc: ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212], [], [], [], [], [], [], [], [] Errors in file /oracle/app/product/diag/rdbms/xxxx/xxxx03/trace/xxxx03_ora_49348828.trc: ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212], [], [], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 49348828): terminating the instance due to error 600 Tue Oct 15 14:06:37 2013 | 
但看上面的错误,或许有人以为这是回档有问题,或者以为是AIX操作系统有问题。 其实不然,why ?
我想问一下,实例恢复需要archivelog吗?
因此我们可以确认,数据库无法正常open的原因是ora-00600 [4097]错误在作祟。 关于这个错误,不想描述的太多。
网上也有很多文章,教你通过屏蔽回滚段来强制拉库。
但是这里,我想知道是,为什么会抛出这个错误。
我们来看Metalink 文档关于该错误的一个标准解释:
| 1 2 3 | We are accessing a rollback segment header to see if a transaction has been committed. However, the xid given is in the future of the transaction table. This could be due to a rollback segment corruption issue OR you might be hitting the following known problem | 
根据文档解释,产生该错误的原因是oracle在open时会去读取回滚段头中的事务表信息,以此来判断是否已经提交.
当发现某个事务(XID)的warp#比当前数据库的最大值都还要大时,将出现该错误。 下面我们来看下trace文件:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | Dump continued from file: /oracle/app/product/diag/rdbms/xxxx/xxxx03/trace/xxxx03_ora_17629274.trc ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212], [], [], [], [], [], [], [], [] ========= Dump for incident 526533 (ORA 600 [4097]) ======== ----- Beginning of Customized Incident Dump(s) ----- xid: 0x0f45.017.0001d97c GLOBAL CACHE ELEMENT DUMP (address: 0x7000000ddef7278):   id1: 0x26d7d0 id2: 0x5 pkey: TS#5 block: (5/2545616)   lock: S rls: 0x0 acq: 0x0 latch: 5   flags: 0x20 fair: 0 recovery: 0 fpin: 'ktuwh02: ktugus'   bscn: 0xc54.80f5a25a bctx: 0x0 write: 0 scan: 0x0   lcp: 0x0 lnk: [NULL] lch: [0x700000223e151e8,0x700000223e151e8]   seq: 7 hist: 239 143:0 16 143:5 208 352 32   LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:     flg: 0x00000000 state: SCURRENT tsn: 5 tsh: 0 mode: SHR foq: 0       pin: 'ktuwh02: ktugus'       addr: 0x700000223e150b8 obj: INVALID cls: UNDO HEAD bscn: 0xc54.80f5a25a | 
oracle internal错误如下:
ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212]
从trace 内容下面的信息,可以判断出oracle在open的时候,是在对事务XID 0x0f45.017.0001d97c
进行操作时无法正常进行,进而抛出ora-00600 内部错误。
关于XID的结构如下:
XID= 0x0f45.017.0001d97c
0f45: 表示回滚段编号,转换为10进制后为3909
017:表示slot编号,转换为10进制后为23
0001d97c:表示wrap#号,转换为10进制后为121212
而从trace下面的内容可以看出,该事务操作的数据块是(5/2545616),即datafile 5,block 254516.
同时,在对数据库open之前,进行10046 trace跟踪时,从跟踪内容也可以确认,数据库在open时在
访问数据块file 5 block 2545616时出现异常,进而导致数据库无法打开,如下:
| 1 2 3 4 5 6 7 8 | EXEC #2:c=0,e=68,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=4258302260,tim=32037576906172 FETCH #2:c=0,e=11,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=4258302260,tim=32037576906202 CLOSE #2:c=0,e=4,dep=1,type=3,tim=32037576906229 WAIT #1: nam='db file sequential read' ela= 398 file#=4 block#=27168 blocks=1 obj#=0 tim=32037576906669 WAIT #1: nam='db file sequential read' ela= 255 file#=4 block#=94014 blocks=1 obj#=0 tim=32037576906991 WAIT #1: nam='db file sequential read' ela= 267 file#=5 block#=2545616 blocks=1 obj#=0 tim=32037576907293 Incident 598501 created, dump file: /oracle/app/product/diag/rdbms/xxxx/xxxx03/incident/incdir_598501/xxxx03_ora_15925410_i598501.trc ORA-00600: internal error code, arguments: [4097], [3909], [23], [121212], [], [], [], [], [], [], [], [] | 
到这里,我们可以确认问题出在回滚段上。按理说这里应该存在活动事务,然而我并没有发现:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num    cmt   ------------------------------------------------------------------------------------------------    0x00    9    0x00  0x1d963  0x0015  0x0c52.0abb4e53  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381038712    0x01    9    0x00  0x1d952  0x001f  0x0c52.0ab7658c  0x01411a9f  0x0000.000.00000000  0x00000001   0x00000000  1381032874    0x02    9    0x17  0x1d981  0x0016  0x0c54.80ef57d6  0x014197ca  0x0000.000.00000000  0x00000001   0x014197ca  1381802983    0x03    9    0x00  0x1d970  0x0014  0x0c52.0ac0927b  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381045593    0x04    9    0x00  0x1d95f  0x001b  0x0c52.0abd8064  0x014197c9  0x0000.000.00000000  0x00000001   0x00000000  1381040912    0x05    9    0x00  0x1d95e  0x0006  0x0c52.0ab83478  0x01411a9f  0x0000.000.00000000  0x00000001   0x00000000  1381034672    0x06    9    0x00  0x1d96d  0x0008  0x0c52.0ab87c76  0x01411a9f  0x0000.000.00000000  0x00000001   0x00000000  1381034803    0x07    9    0x03  0x1d90c  0x001c  0x0c54.80f454d4  0x014197cd  0x0000.000.00000000  0x00000001   0x014197cd  1381803275    0x08    9    0x00  0x1d96b  0x000a  0x0c52.0ab8d83e  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381035069    0x09    9    0x00  0x1d93a  0x001a  0x0c52.0abe36ce  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381042341    0x0a    9    0x03  0x1d969  0x000e  0x0c52.0ab97338  0x014197c8  0x0000.000.00000000  0x00000001   0x014197c8  1381035606    0x0b    9    0x00  0x1d978  0x0007  0x0c54.80f3d624  0x014197cd  0x0000.000.00000000  0x00000001   0x00000000  1381803242    0x0c    9    0x17  0x1d967  0x001e  0x0c54.80f2d421  0x014197cc  0x0000.000.00000000  0x00000001   0x014197cc  1381803174    0x0d    9    0x17  0x1d966  0x0019  0x0c54.80f1250d  0x014197cb  0x0000.000.00000000  0x00000001   0x014197cb  1381803094    0x0e    9    0x00  0x1d965  0x0020  0x0c52.0ab99ed0  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381035752    0x0f    9    0x00  0x1d964  0xffff  0x0c54.80f5a25a  0x014197ce  0x0000.000.00000000  0x00000002   0x00000000  1381803359    0x10    9    0x00  0x1d963  0x0000  0x0c52.0abb2bf4  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381038687    0x11    9    0x00  0x1d952  0x0010  0x0c52.0abaefc0  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381038651    0x12    9    0x00  0x1d8f1  0x0003  0x0c52.0abff18d  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381044622    0x13    9    0x00  0x1d960  0x000d  0x0c54.80f0697d  0x014197cb  0x0000.000.00000000  0x00000001   0x00000000  1381803039    0x14    9    0x00  0x1d96f  0x001d  0x0c52.0ac0bd84  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381045638    0x15    9    0x00  0x1d96e  0x0021  0x0c52.0abbe7e3  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381039204    0x16    9    0x00  0x1d97d  0x0013  0x0c54.80f013a2  0x014197cb  0x0000.000.00000000  0x00000001   0x00000000  1381803019    0x17    9    0x00  0x1d96c  0x0001  0x0c52.0ab6d367  0x01411a9f  0x0000.000.00000000  0x00000001   0x00000000  1381032050    0x18    9    0x00  0x1d94b  0x0012  0x0c52.0abf8955  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381044066    0x19    9    0x03  0x1d94a  0x000c  0x0c54.80f22b8e  0x014197cc  0x0000.000.00000000  0x00000001   0x014197cc  1381803133    0x1a    9    0x00  0x1d969  0x0018  0x0c52.0abeabbd  0x014197ca  0x0000.000.00000000  0x00000001   0x00000000  1381042812    0x1b    9    0x00  0x1d968  0x0009  0x0c52.0abde1cb  0x014197ca  0x0000.000.00000000  0x00000002   0x00000000  1381041868    0x1c    9    0x03  0x1d967  0x000f  0x0c54.80f4e901  0x014197cd  0x0000.000.00000000  0x00000001   0x014197cd  1381803313    0x1d    9    0x00  0x1d966  0x0002  0x0c52.0ac116f6  0x00000000  0x0000.000.00000000  0x00000000   0x00000000  1381045850    0x1e    9    0x00  0x1d965  0x000b  0x0c54.80f34151  0x014197cd  0x0000.000.00000000  0x00000001   0x00000000  1381803201    0x1f    9    0x00  0x1d954  0x0005  0x0c52.0ab7b73c  0x01411a9f  0x0000.000.00000000  0x00000001   0x00000000  1381033709    0x20    9    0x00  0x1d943  0x0011  0x0c52.0aba5120  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381037404    0x21    9    0x00  0x1d962  0x0004  0x0c52.0abc21a6  0x014197c8  0x0000.000.00000000  0x00000001   0x00000000  1381039267 | 
所以,只能说该数据库在被强制关闭时oracle并没有来得及去更新warp#,最终出现了这个情况。 当然解决方案比较简单,屏蔽掉回滚段即可。话说客户这里,我strings一下发现有超过10000个回滚段,哈哈~~~。 还好,顺利打开数据库,且从我们的分析来看,没有数据丢失.



3 Responses to “15 TB 3节点RAC 的恢复记录”
是否需要重建UNDO表空间?
这里其实不需要,open之后,把问题回滚段drop即可。
安全起见可以重建undo!
[…] 详见原本博客链接地址:15 TB 3节点RAC 的恢复记录 本条目发布于 2013 年 10 月 25 日。属于 数据库教程 分类。作者是 admin。 0 次浏览 document.getElementById("bdshell_js").src = "http://bdimg.share.baidu.com/static/js/shell_v2.js?cdnversion=" + Math.ceil(new Date()/3600000) […]
Leave a Reply
You must be logged in to post a comment.