RAC 11g ASM Disk Corruption Recovery

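An archive-storage RAC database would not start. The production environment is Linux RAC 11.2.0.4. The cause: a tool used to test disk I/O overwrote the ASM disk headers of both the disk group holding the OCR and the disk group holding the data. The recovery went as follows:
1. Check the CRS status: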

[grid@darac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager


[root@darac1 crsd]# ps -ef|grep crs
root      3126     1  1 10:34 ?        00:00:31 /u01/app/product/11.2.0/crs/bin/ohasd.bin reboot
grid      3514     1  0 10:34 ?        00:00:07 /u01/app/product/11.2.0/crs/bin/oraagent.bin
grid      3525     1  0 10:34 ?        00:00:00 /u01/app/product/11.2.0/crs/bin/mdnsd.bin
grid      3537     1  0 10:34 ?        00:00:16 /u01/app/product/11.2.0/crs/bin/gpnpd.bin
grid      3549     1  1 10:34 ?        00:00:33 /u01/app/product/11.2.0/crs/bin/gipcd.bin
root      4128     1  0 10:54 ?        00:00:02 /u01/app/product/11.2.0/crs/bin/cssdmonitor
root      4144     1  0 10:54 ?        00:00:01 /u01/app/product/11.2.0/crs/bin/cssdagent
grid      4167     1  2 10:55 ?        00:00:14 /u01/app/product/11.2.0/crs/bin/ocssd.bin 
root      4354  3680  0 11:04 pts/1    00:00:00 grep crs

2. Force-stop CRS:

[root@darac1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'darac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'darac1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'darac1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'darac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'darac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'darac1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'darac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'darac1'
CRS-2677: Stop of 'ora.gpnpd' on 'darac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'darac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

3. Start CRS in exclusive mode, without CRSD:

[root@darac1 bin]# ./crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'darac1'
CRS-2676: Start of 'ora.mdnsd' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'darac1'
CRS-2676: Start of 'ora.gpnpd' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'darac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'darac1'
CRS-2676: Start of 'ora.gipcd' on 'darac1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'darac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'darac1'
CRS-2676: Start of 'ora.diskmon' on 'darac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'darac1'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'darac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'darac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'darac1'
CRS-2676: Start of 'ora.ctssd' on 'darac1' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'darac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'darac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'darac1'
CRS-2676: Start of 'ora.asm' on 'darac1' succeeded

4. Check the GI (Grid Infrastructure) alert.log:

[ohasd(5040)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2016-10-13 11:20:47.302:
[gpnpd(5215)]CRS-2328:GPNPD started on node darac1.
2016-10-13 11:20:58.388:
[ohasd(5040)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2016-10-13 11:21:00.608:
[cssd(5318)]CRS-1713:CSSD daemon is started in clustered mode
2016-10-13 11:21:01.521:
[/u01/app/product/11.2.0/crs/bin/orarootagent.bin(5304)]CRS-5013:Agent "/u01/app/product/11.2.0/crs/bin/orarootagent.bin" failed to start process "/u01/app/product/11.2.0/crs/bin/osysmond" for action "start": details at "(:CLSN00008:)" in "/u01/app/product/11.2.0/crs/log/darac1/agent/ohasd/orarootagent_root//orarootagent_root.log"
2016-10-13 11:21:03.585:
[ohasd(5040)]CRS-2878:Failed to restart resource 'ora.crf'
2016-10-13 11:21:05.399:
[/u01/app/product/11.2.0/crs/bin/orarootagent.bin(5340)]CRS-5013:Agent "/u01/app/product/11.2.0/crs/bin/orarootagent.bin" failed to start process "/u01/app/product/11.2.0/crs/bin/osysmond" for action "start": details at "(:CLSN00008:)" in "/u01/app/product/11.2.0/crs/log/darac1/agent/ohasd/orarootagent_root//orarootagent_root.log"
2016-10-13 11:21:10.703:
[ohasd(5040)]CRS-2878:Failed to restart resource 'ora.crf'
2016-10-13 11:21:23.464:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:21:38.698:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:21:53.925:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:22:09.463:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:22:24.804:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:22:40.252:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:22:56.722:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:23:12.009:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:23:27.290:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:23:42.872:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:23:58.198:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:24:13.500:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:24:28.786:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:24:43.488:
[client(5394)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/product/11.2.0/crs/log/darac1/client/ocrcheck_5394.log.
2016-10-13 11:24:43.959:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:24:51.823:
[client(5424)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/product/11.2.0/crs/log/darac1/client/crsctl_grid.log.
2016-10-13 11:24:59.345:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:25:14.526:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:25:29.696:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:25:44.860:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:26:00.042:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:26:15.218:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:26:30.409:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:26:45.577:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:26:49.031:
[client(5460)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/product/11.2.0/crs/log/darac1/client/ocrconfig_5460.log.
2016-10-13 11:27:00.766:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:27:15.951:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:27:31.142:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:27:46.339:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:28:01.530:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:28:16.733:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:28:32.008:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:28:47.191:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:29:02.389:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:29:17.610:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:29:32.832:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:29:48.035:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:30:03.229:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:30:18.434:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:30:33.679:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:30:48.876:
[cssd(5318)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:31:01.534:
[/u01/app/product/11.2.0/crs/bin/cssdagent(5284)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/product/11.2.0/crs/log/darac1/agent/ohasd/oracssdagent_root//oracssdagent_root.log.
2016-10-13 11:31:01.540:
[cssd(5318)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/product/11.2.0/crs/log/darac1/cssd/ocssd.log
2016-10-13 11:31:01.541:
[cssd(5318)]CRS-1603:CSSD on node darac1 shutdown by user.

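The messages above show that CSSD cannot discover any voting files and that the OCR location in the ASM disk group is inaccessible.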
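With dozens of near-identical entries, a quick count makes the pattern easier to see. The sketch below is a hypothetical helper (summarize_gi_alert is not an Oracle tool) that reads a GI alert log on stdin and counts the three messages that matter here: CRS-1714 (no voting file found), CRS-1013 (OCR in ASM inaccessible) and CRS-1656 (CSSD fatal error).

```shell
#!/bin/sh
# Hypothetical helper: count the key Clusterware messages in a GI alert log
# read from stdin.
#   CRS-1714: CSSD cannot discover any voting file (retries every 15 s)
#   CRS-1013: the OCR location in an ASM disk group is inaccessible
#   CRS-1656: the CSS daemon is terminating due to a fatal error
summarize_gi_alert() {
    awk '
        /CRS-1714:/ { vf++ }
        /CRS-1013:/ { ocr++ }
        /CRS-1656:/ { fatal++ }
        END {
            printf "voting_file_discovery_failures=%d\n", vf
            printf "ocr_inaccessible=%d\n", ocr
            printf "cssd_fatal=%d\n", fatal
        }'
}
```

Usage, assuming the usual 11.2 location $GRID_HOME/log/<node>/alert<node>.log (adjust if your layout differs): `summarize_gi_alert < /u01/app/product/11.2.0/crs/log/darac1/alertdarac1.log`.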

5. The ASM alert.log still contains the statements that originally created the CRSDG and DATADG disk groups. The one for CRSDG:

Wed Dec 02 16:09:01 2015

SQL> CREATE DISKGROUP CRSDG EXTERNAL REDUNDANCY  DISK '/dev/raw/raw1' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 

6. Check the disk header:

[grid@darac1 ~]$ kfed read /dev/raw/raw1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                   300392945 ; 0x00c: 0x11e7a1f1
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
B7F46200 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
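In the dump above the first fields still claim KFBTYP_DISKHEAD, but the rest of the block is printed as a raw hex dump of zeros and kfed stops with KFED-00322, so the header is not usable. A rough way to classify such dumps is sketched below. diskhead_status is a hypothetical helper, and treating "type KFBTYP_DISKHEAD and no KFED- error" as valid is a simplifying assumption, not an Oracle-defined rule.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of "kfed read <disk>" (stdin).
# Prints "valid <group>" when the block is typed KFBTYP_DISKHEAD and kfed
# reported no KFED-nnnnn error; otherwise "invalid [KFED-nnnnn]".
diskhead_status() {
    awk '
        $1 == "kfbh.type:" && $0 ~ /KFBTYP_DISKHEAD/ { is_head = 1 }
        $1 == "kfdhdb.grpname:"                     { grp = $2 }
        /^KFED-[0-9]+/                              { err = $1; sub(/:$/, "", err) }
        END {
            if (is_head && err == "") {
                if (grp == "") grp = "unknown-group"
                print "valid " grp
            } else if (err != "") {
                print "invalid " err
            } else {
                print "invalid"
            }
        }'
}
```

Usage: `kfed read /dev/raw/raw1 | diskhead_status`. For the CRSDG disk above this would report `invalid KFED-00322`.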

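7. Try to repair the CRSDG disk header with kfed. The repair fails because the automatic backup copy of the header was damaged as well, and no manual header backup had been taken: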

[grid@darac1 ~]$ kfed repair /dev/raw/raw1

KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
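kfed repair works by copying the backup header that ASM keeps inside the disk itself (in 11.2 and some earlier releases) back over block 0. That copy sits in allocation unit 1, in the second-to-last metadata block of the AU; with the 1 MB AU used here and the default 4 KB metadata block, that is block 254. The endian_kfbh error above suggests the copy was overwritten too. The sketch below, with hypothetical function names, only computes that location and prints the kfed read command for inspecting the copy before attempting a repair; it assumes a 4096-byte metadata block.

```shell
#!/bin/sh
# Assumption: default ASM metadata block size of 4096 bytes.
ASM_META_BLOCK=4096

# Block number (inside AU 1) of the disk header backup for a given AU size
# in bytes: the second-to-last metadata block of that AU.
header_backup_blkn() {
    echo $(( $1 / ASM_META_BLOCK - 2 ))
}

# Print, without running it, the kfed command that dumps the backup header.
# Only for the default 1 MB AU used by CRSDG and DATADG here; for other AU
# sizes kfed also has to be told the AU size (see kfed's help text).
show_backup_header_cmd() {
    echo "kfed read $1 aun=1 blkn=$(header_backup_blkn 1048576)"
}
```

Usage: `show_backup_header_cmd /dev/raw/raw1` prints `kfed read /dev/raw/raw1 aun=1 blkn=254`; if that dump is also garbage, kfed repair has nothing valid to copy.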

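Because the disk header could not be restored from its automatic backup, the only remaining option was to restore from the automatic OCR backups, as follows.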
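Step 7 shows that the in-disk header backup is not enough when a tool overwrites the start of a disk. A minimal sketch of an external header backup is below, assuming the raw device paths of this system and a hypothetical destination directory such as /backup/asmhdr. It keeps both a kfed text dump and a raw copy of the 4 KB header block. backup_asm_headers is a hypothetical helper, not an Oracle utility, and by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: keep an external copy of each ASM disk header.
# For every disk it saves (1) a kfed text dump and (2) a raw copy of
# block 0, the 4 KB block that holds the disk header.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
backup_asm_headers() {
    dest=$1
    shift
    if [ "${DRY_RUN:-1}" != 1 ]; then
        mkdir -p "$dest" || return 1
    fi
    for disk in "$@"; do
        name=$(basename "$disk")
        for cmd in \
            "kfed read $disk > $dest/$name.kfed" \
            "dd if=$disk of=$dest/$name.blk0 bs=4096 count=1"
        do
            if [ "${DRY_RUN:-1}" = 1 ]; then
                echo "$cmd"
            else
                sh -c "$cmd" || return 1
            fi
        done
    done
}
```

For example, as the grid owner: `DRY_RUN=0 backup_asm_headers /backup/asmhdr /dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3`. Writing a saved header back onto a disk (with dd, or with kfed's merge operation from a text dump) overwrites live metadata, so it is a last resort, ideally done with Oracle Support involved.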

8. Recreate the CRSDG disk group on the same disk:

[grid@darac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Thu Oct 13 13:00:42 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$asm_diskgroup;

no rows selected

SQL> create diskgroup CRSDG external redundancy disk  '/dev/raw/raw1' attribute 'COMPATIBLE.ASM' = '11.2.0.0.0';

Diskgroup created.

9. List the automatic OCR backups:

[root@darac1 bin]# ./ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy

darac2     2016/10/13 06:29:53     /u01/app/product/11.2.0/crs/cdata/darac-cluster/backup00.ocr

darac2     2016/10/13 02:29:45     /u01/app/product/11.2.0/crs/cdata/darac-cluster/backup01.ocr

darac2     2016/10/12 22:29:37     /u01/app/product/11.2.0/crs/cdata/darac-cluster/backup02.ocr

darac2     2016/10/12 02:27:20     /u01/app/product/11.2.0/crs/cdata/darac-cluster/day.ocr

darac2     2016/10/11 22:27:10     /u01/app/product/11.2.0/crs/cdata/darac-cluster/week.ocr
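When several automatic backups exist, the newest one is normally the one to restore. The sketch below (newest_ocr_backup is a hypothetical helper) picks it from the -showbackup listing by sorting on the timestamp. Note that the listing names darac2 as the node holding these files: automatic OCR backups are written to the local file system of the node that took them, so the chosen file must be readable on the node where ocrconfig -restore runs. The post does not show how this was handled on darac1.

```shell
#!/bin/sh
# Hypothetical helper: pick the most recent entry from
# "ocrconfig -showbackup" output (stdin). Backup lines look like
#   <node>  YYYY/MM/DD HH:MM:SS  <path>
# and that timestamp format sorts correctly as plain text.
# Prints "<node> <path>".
newest_ocr_backup() {
    awk 'NF == 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ { print $2, $3, $1, $4 }' |
        LC_ALL=C sort | tail -n 1 | awk '{ print $3, $4 }'
}
```

Usage: `./ocrconfig -showbackup | newest_ocr_backup`.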

10. Restore the OCR:

[root@darac1 bin]# ./ocrconfig -restore /u01/app/product/11.2.0/crs/cdata/darac-cluster/backup00.ocr

11. Replace the voting disk:

[root@darac1 bin]# ./crsctl replace votedisk +CRSDG
Successful addition of voting disk 44eaf86504ea4f76bfb43cb7931a3fc7.
Successfully replaced voting disk group with +CRSDG.
CRS-4266: Voting file(s) successfully replaced
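Steps 10 and 11 can be collected into one sequence with a check after each change. The sketch below is a hypothetical wrapper, meant to be run as root from the Grid home's bin directory while CRS is up in exclusive mode; ocrcheck and crsctl query css votedisk are added here as verification steps and are not part of the original session. With the default DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 10-11 plus verification.
# Run as root from the Grid home's bin directory, with CRS started via
# "crsctl start crs -excl -nocrs". DRY_RUN=1 (default) only prints.
# Defaults are the backup file and disk group used in this post.
restore_ocr_and_vote() {
    backup=${1:-/u01/app/product/11.2.0/crs/cdata/darac-cluster/backup00.ocr}
    dg=${2:-+CRSDG}
    for cmd in \
        "./ocrconfig -restore $backup" \
        "./ocrcheck" \
        "./crsctl replace votedisk $dg" \
        "./crsctl query css votedisk"
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            # Stop at the first failing step.
            sh -c "$cmd" || return 1
        fi
    done
}
```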

12. Create the ASM spfile:

[grid@darac1 ~]$ vi /tmp/asm.txt
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile= 'EXCLUSIVE'
asm_diskstring           = '/dev/raw/raw*'
asm_power_limit          =1


[grid@darac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Thu Oct 13 13:40:02 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create spfile='+CRSDG' FROM pfile='/tmp/asm.txt';

File created.
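The pfile can also be written non-interactively. The sketch below recreates /tmp/asm.txt with the same parameters, builds the spfile in +CRSDG, and then runs asmcmd spget to show which spfile ASM is configured to use. The function names are hypothetical, and create_asm_spfile assumes the grid owner's environment (ORACLE_HOME, and ORACLE_SID for the local ASM instance, e.g. +ASM1) is already set.

```shell
#!/bin/sh
# Hypothetical helper: write the minimal ASM pfile used in step 12.
write_asm_pfile() {
    pfile=${1:-/tmp/asm.txt}
    cat > "$pfile" <<'EOF'
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile='EXCLUSIVE'
asm_diskstring='/dev/raw/raw*'
asm_power_limit=1
EOF
}

# Hypothetical helper: build the spfile in +CRSDG from that pfile, then
# show the spfile location ASM will use. Needs the grid environment set.
create_asm_spfile() {
    pfile=${1:-/tmp/asm.txt}
    sqlplus -s / as sysasm <<EOF
create spfile='+CRSDG' from pfile='$pfile';
exit
EOF
    asmcmd spget
}
```

Usage: `write_asm_pfile && create_asm_spfile`.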

13. Restart CRS:

[root@darac1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'darac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'darac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'darac1'
CRS-2673: Attempting to stop 'ora.asm' on 'darac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'darac1'
CRS-2677: Stop of 'ora.ctssd' on 'darac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'darac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'darac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'darac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'darac1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'darac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'darac1'
CRS-2677: Stop of 'ora.cssd' on 'darac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'darac1'
CRS-2677: Stop of 'ora.gipcd' on 'darac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'darac1'
CRS-2677: Stop of 'ora.gpnpd' on 'darac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'darac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@darac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


[grid@darac1 ~]$ crsctl stat res -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.DATADG.dg
               ONLINE  OFFLINE      darac1                                       
               ONLINE  OFFLINE      darac2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.asm
               ONLINE  ONLINE       darac1                   Started             
               ONLINE  ONLINE       darac2                   Started             
ora.gsd
               OFFLINE OFFLINE      darac1                                       
               OFFLINE OFFLINE      darac2                                       
ora.net1.network
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.ons
               ONLINE  ONLINE       darac1                                       
               ONLINE  OFFLINE      darac2                                       
ora.registry.acfs
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       darac1                                       
ora.cvu
      1        ONLINE  ONLINE       darac1                                       
ora.darac.db
      1        ONLINE  OFFLINE                               Corrupted Controlfi 
                                                             le                  
      2        ONLINE  OFFLINE                               Corrupted Controlfi 
                                                             le                  
ora.darac1.vip
      1        ONLINE  ONLINE       darac1                                       
ora.darac2.vip
      1        ONLINE  ONLINE       darac2                                       
ora.darac3.vip
      1        ONLINE  OFFLINE                                                   
ora.oc4j
      1        ONLINE  OFFLINE                               STARTING            
ora.scan1.vip
      1        ONLINE  ONLINE       darac1    

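The output above shows that the DATADG disk group is not mounted and the darac database has not started; its state details report a corrupted control file. alert_asm1.log contains the statement that created the disk group: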

Wed Dec 02 18:27:46 2015

SQL> CREATE DISKGROUP DATADG EXTERNAL REDUNDANCY  DISK '/dev/raw/raw3' SIZE 10240M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 
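On a larger cluster, spotting the resources that should be running but are not is easier with a filter. The sketch below (not_online_resources is a hypothetical helper) reads crsctl stat res -t output and prints each resource whose TARGET is ONLINE while its STATE is something else on at least one node. It relies on the column layout shown in step 13, where cluster resources carry an instance number in front of TARGET.

```shell
#!/bin/sh
# Hypothetical helper: list resources from "crsctl stat res -t" (stdin)
# whose TARGET is ONLINE but whose STATE is not ONLINE somewhere.
not_online_resources() {
    awk '
        /^ora\./  { res = $1; next }   # resource name line
        /^[^ \t]/ { res = ""; next }   # headers and separator lines
        res != "" {
            # Cluster resources have an instance number before TARGET.
            if ($1 ~ /^[0-9]+$/) { tgt = $2; st = $3 } else { tgt = $1; st = $2 }
            if (tgt == "ONLINE" && st != "ONLINE" && !(res in seen)) {
                seen[res] = 1
                print res
            }
        }'
}
```

Usage: `crsctl stat res -t | not_online_resources`. On the step 13 listing it would flag ora.DATADG.dg, ora.ons, ora.darac.db, ora.darac3.vip and ora.oc4j.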

14. Check the disk group status:

SQL> select name,state from v$asm_diskgroup;

NAME                                               STATE
-------------------------------------------------- ----------------------
CRSDG                                              MOUNTED
ARCH                                               MOUNTED

15. Mounting DATADG manually fails:

SQL> alter diskgroup DATADG mount;
alter diskgroup DATADG mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATADG" cannot be mounted
ORA-15040: diskgroup is incomplete

16. Check the disk header status; /dev/raw/raw3 shows as CANDIDATE:

SQL> select name,path,header_status from v$asm_disk;

NAME                                               PATH                                               HEADER_STATUS
-------------------------------------------------- -------------------------------------------------- ------------------------------
                                                   /dev/raw/raw3                                      CANDIDATE
ARCH_0000                                          /dev/raw/raw2                                      MEMBER
CRSDG_0000                                         /dev/raw/raw1                                      MEMBER

17. Try to repair the disk header from its automatic backup copy. This time the DATADG disk is repaired successfully:

[grid@darac1 ~]$ kfed repair /dev/raw/raw3


SQL> select name,state from v$asm_diskgroup;

NAME                                               STATE
-------------------------------------------------- ----------------------
CRSDG                                              MOUNTED
DATADG                                             DISMOUNTED
ARCH                                               MOUNTED

SQL> select name,path,header_status from v$asm_disk;

NAME                                               PATH                                               HEADER_STATUS
-------------------------------------------------- -------------------------------------------------- ------------------------------
                                                   /dev/raw/raw3                                      MEMBER
ARCH_0000                                          /dev/raw/raw2                                      MEMBER
CRSDG_0000                                         /dev/raw/raw1                                      MEMBER

18. Mount DATADG manually again; this time the mount succeeds:

SQL> alter diskgroup DATADG mount;

Diskgroup altered.

SQL> select name,state from v$asm_diskgroup;

NAME                                               STATE
-------------------------------------------------- ----------------------
CRSDG                                              MOUNTED
DATADG                                             MOUNTED
ARCH                                               MOUNTED

19. Check the disk header status again; /dev/raw/raw3 is now a MEMBER of DATADG:

SQL> select name,path,header_status from v$asm_disk;

NAME                                               PATH                                               HEADER_STATUS
-------------------------------------------------- -------------------------------------------------- ------------------------------
ARCH_0000                                          /dev/raw/raw2                                      MEMBER
DATADG_0000                                        /dev/raw/raw3                                      MEMBER
CRSDG_0000                                         /dev/raw/raw1                                      MEMBER

20. Start the darac database:

[grid@darac1 ~]$ srvctl start database -d darac
[grid@darac1 ~]$ crsctl stat res -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.CRSDG.dg
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.DATADG.dg
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.asm
               ONLINE  ONLINE       darac1                   Started             
               ONLINE  ONLINE       darac2                   Started             
ora.gsd
               OFFLINE OFFLINE      darac1                                       
               OFFLINE OFFLINE      darac2                                       
ora.net1.network
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.ons
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
ora.registry.acfs
               ONLINE  ONLINE       darac1                                       
               ONLINE  ONLINE       darac2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       darac1                                       
ora.cvu
      1        ONLINE  ONLINE       darac1                                       
ora.darac.db
      1        ONLINE  ONLINE       darac1                   Open                
      2        ONLINE  ONLINE       darac2                   Open                
ora.darac1.vip
      1        ONLINE  ONLINE       darac1                                       
ora.darac2.vip
      1        ONLINE  ONLINE       darac2                                       
ora.darac3.vip
      1        ONLINE  OFFLINE                                                   
ora.oc4j
      1        ONLINE  ONLINE       darac1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       darac1    

At this point the database was successfully recovered.

2 thoughts on “RAC 11g ASM Disk Corruption Recovery”

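  1. 1) Step 13 recreates DATADG. Weren't you worried that this would wipe out the business data already in the disk group?

    2) After recreating CRSDG in step 8, couldn't you simply run kfed repair /dev/raw/raw1? In other words, what is the point of step 9?
    Or does kfed only work while the CRS disk group is healthy, so that when the CRS disk group fails you have to use this command instead:

    ./crsctl replace votedisk +CRSDG

    3) In step 7, kfed repair /dev/raw/raw1 failed, yet in step 17 the header could be restored from its automatic backup. Why?

    1. In step 7, kfed repair /dev/raw/raw1 failed, yet in step 17 the header could be restored from its automatic backup. Why?
      Because on /dev/raw/raw3, the DATADG disk recovered in step 17, the I/O test tool had not destroyed the automatic header backup.

      After recreating CRSDG in step 8, couldn't you simply run kfed repair /dev/raw/raw1? In other words, what is the point of step 9?
      Or does kfed only work while the CRS disk group is healthy, so that when the CRS disk group fails you have to use this command instead:

      ./crsctl replace votedisk +CRSDG

      Step 9 lists the automatic OCR backups. They are taken every four hours, and the restore command needs the name of a specific backup file.
      Once CRSDG had been recreated, the OCR could only be restored from one of these backups.

      Step 13: the CREATE DISKGROUP statement shown there is a record from the ASM alert log, not one of my recovery operations. Because the disk group could not be mounted, I looked up how it had originally been created.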
