A Lost Private IP Keeps an Oracle 12c RAC Cluster Node from Starting

On a production system running Oracle Linux 7.1 with Oracle 12c RAC, the cluster resources for node 1 were shown as not started:

[grid@cs2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       cs2                      STABLE
ora.CRS.dg
               ONLINE  ONLINE       cs2                      STABLE
ora.DATA.dg
               ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       cs2                      STABLE
ora.chad
               ONLINE  OFFLINE      cs2                      STABLE
ora.net1.network
               ONLINE  ONLINE       cs2                      STABLE
ora.ons
               ONLINE  OFFLINE      cs2                      STABLE
ora.proxy_advm
               OFFLINE OFFLINE      cs2                      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       cs2                      169.254.255.92 88.88
                                                             .88.191,STABLE
ora.asm
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       cs2                      Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cs.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.cs1.vip
      1        ONLINE  INTERMEDIATE cs2                      FAILED OVER,STABLE
ora.cs2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.cvu
      1        ONLINE  ONLINE       cs2                      STABLE
ora.gns
      1        ONLINE  OFFLINE                               STABLE
ora.gns.vip
      1        ONLINE  OFFLINE                               STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.qosmserver
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       cs2                      STABLE
--------------------------------------------------------------------------------

Checking the alert.log file on node 1 shows the following:

[root@cs1 soft]# tail -100 /u01/app/grid/diag/crs/cs1/crs/trace/alert.log

2020-01-14 20:45:50.226 [CVUD(2366)]CRS-10051: CVU found following errors with Clusterware setup : PRVE-3191 :
(DESCRIPTION = (LOAD_BALANCE=on)  (ADDRESS = (PROTOCOL = TCP)(HOST = cs-cluster-scan.cs-cluster.jy.net)(PORT = 1521)) (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = cs)))
Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
 

2020-02-17 14:49:28.563 [ORAROOTAGENT(15500)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:63832} in /u01/app/grid/diag/crs/cs1/crs/trace/crsd_orarootagent_root.trc.
2020-02-17 14:49:28.584 [ORAAGENT(15466)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:36} in /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc.
2020-02-17 14:49:29.607 [GPNPD(3293)]CRS-2329: GPNPD on node cs1 shut down. 
2020-02-17 14:49:30.123 [OSYSMOND(6077)]CRS-8504: Oracle Clusterware OSYSMOND process with operating system process ID 6077 is exiting
2020-02-17 14:49:31.377 [MDNSD(3254)]CRS-5602: mDNS service stopping by request.
2020-02-17 14:49:31.747 [MDNSD(3254)]CRS-8504: Oracle Clusterware MDNSD process with operating system process ID 3254 is exiting
2020-02-17 14:49:46.650 [OCTSSD(5883)]CRS-2405: The Cluster Time Synchronization Service on host cs1 is shutdown by user
2020-02-17 14:49:46.651 [OCTSSD(5883)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 5883 is exiting
2020-02-17 14:49:47.651 [OCSSD(3616)]CRS-1603: CSSD on node cs1 has been shut down.
2020-02-17 14:49:47.958 [OCSSD(3616)]CRS-1660: The CSS daemon shutdown has completed
2020-02-17 14:49:47.959 [OCSSD(3616)]CRS-8504: Oracle Clusterware OCSSD process with operating system process ID 3616 is exiting
2020-02-17 14:49:49.726 [ORAAGENT(3235)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:818} in /u01/app/grid/diag/crs/cs1/crs/trace/ohasd_oraagent_grid.trc.
2020-02-17 14:49:49.727 [ORAROOTAGENT(3029)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:21} in /u01/app/grid/diag/crs/cs1/crs/trace/ohasd_orarootagent_root.trc.

Following the pointers in those messages, examine the crsd_oraagent_grid.trc trace file:

[root@cs1 ~]# more /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc
Trace file /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.

*** TRACE CONTINUED FROM FILE /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid_910.trc ***

2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::regEndpoint 110 reset regEndPointDone:0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] generateEndPointStrings:generate endpoints
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] generateEndPointStrings:No IP matching the subnet 88.88.88.0
2020-02-17 14:39:22.130 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Ocr Context init default level 1814629536
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Agent::getDeploymentPlatformId return 
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 000 entry {nonPriv
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 000 entry { lsnrResName: endpAttr:TCP:1526 type:4
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 040 Listener ResName:
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::removeDuplicates 
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 999 exit nonPriv}
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::regEndpoint 200 endpStrings empty
2020-02-17 14:39:22.134 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] AsmLsnrAgent::init
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Cannot find an IP address matching the subnet 88.88.88.0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Ocr Context init default level 1814629536
2020-02-17 14:39:22.157 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Agent::getDeploymentPlatformId return 
2020-02-17 14:39:22.158 : CLSCEVT:2462013184: (:CLSCE0028:)clsce_unsubscribe 0x7f725c004510 successfully unsubscribed : 974833
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} WorkerThread::removeWorker total workers: 2
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} Worker::schedule adding worker 'ReconnSub-LsnrAgentSub-ASMNET1LSNR_ASM' to thread 'ReconnectingSubscribers'
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} WorkerThread::addWorker total workers: 3
2020-02-17 14:39:22.158 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Skipping CSS Initialization for rebootless recovery
2020-02-17 14:39:22.159 : CLSCEVT:2478823168: clsce_subscribe 0x7f72600d06b0 filter='(^CRS_RESOURCE_PROFILE_CHANGE.*RESOURCE_CLASS='(scan_vip|vip)')|(^CRS_RESOURCE_PROFILE_CHANGE.*TYPE='ora\.network\.type')', flags=1, handler=0xc6ad0c, 
arg=0x7f726016de10

The key line is "Cannot find an IP address matching the subnet 88.88.88.0", which indicates that the private (interconnect) IP address has been lost.
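The listener agent's check is essentially a comparison of each local address against the configured interconnect subnet. A minimal sketch of that matching logic, assuming a /24 netmask as in this cluster (`subnet_has_ip` is a made-up helper name, not an Oracle function):

```shell
#!/bin/sh
# Sketch of the subnet check the agent performs: does any local IP fall
# inside the interconnect subnet? Assumes a /24 mask, so a simple
# first-three-octets comparison is enough for this illustration.
subnet_has_ip() {
  subnet="$1"
  prefix="${subnet%.*}"        # 88.88.88.0 -> 88.88.88
  shift
  for ip in "$@"; do
    if [ "${ip%.*}" = "$prefix" ]; then
      echo "match: $ip"
      return 0
    fi
  done
  echo "Cannot find an IP address matching the subnet $subnet"
  return 1
}

# Node 1 after the failure: only the public and libvirt addresses remain.
subnet_has_ip 88.88.88.0 10.13.13.190 192.168.122.1 || true
# Node 2 still holds its private address.
subnet_has_ip 88.88.88.0 10.13.13.191 88.88.88.191 169.254.255.92
```

Run against node 1's surviving addresses this reproduces the trace message; against node 2's it finds 88.88.88.191.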

So check the network addresses on both nodes. On node 2 the private IP is intact:

[grid@cs2 ~]$ ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.13.13.191  netmask 255.255.255.0  broadcast 10.13.13.255
        inet6 fe80::250:56ff:fea0:92af  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)
        RX packets 127560233  bytes 10186683662 (9.4 GiB)
        RX errors 0  dropped 353  overruns 0  frame 0
        TX packets 18954431  bytes 1554204413 (1.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163  mtu 1500
        inet 10.13.13.130  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:2: flags=4163  mtu 1500
        inet 10.13.13.147  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:3: flags=4163  mtu 1500
        inet 10.13.13.141  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:4: flags=4163  mtu 1500
        inet 10.13.13.138  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:5: flags=4163  mtu 1500
        inet 10.13.13.137  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:6: flags=4163  mtu 1500
        inet 10.13.13.136  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens192: flags=4163  mtu 1500
        inet 88.88.88.191  netmask 255.255.255.0  broadcast 88.88.88.255
        inet6 fe80::250:56ff:fea0:a05c  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:a0:5c  txqueuelen 1000  (Ethernet)
        RX packets 294716840  bytes 215309349272 (200.5 GiB)
        RX errors 0  dropped 355  overruns 0  frame 0
        TX packets 63703465  bytes 363393728693 (338.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192:1: flags=4163  mtu 1500
        inet 169.254.255.92  netmask 255.255.0.0  broadcast 169.254.255.255
        ether 00:50:56:a0:a0:5c  txqueuelen 1000  (Ethernet)

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 90029207  bytes 223833448058 (208.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 90029207  bytes 223833448058 (208.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


[grid@cs2 ~]$ cat /etc/hosts

127.0.0.1      localhost
10.13.13.190  cs1
10.13.13.191  cs2

#10.13.13.192 cs1-vip.jy.net cs1-vip
#10.13.13.193 cs2-vip.jy.net cs2-vip

88.88.88.190   cs1-priv.jy.net cs1-priv
88.88.88.191   cs2-priv.jy.net cs2-priv


10.13.13.141 cs-cluster-scan 
10.13.13.142 cs-cluster-scan 
10.13.13.143 cs-cluster-scan 

On node 1 the private IP is gone:

[root@cs1 ~]# ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.13.13.190  netmask 255.255.255.0  broadcast 10.13.13.255
        inet6 fe80::250:56ff:fea0:4e69  prefixlen 64  scopeid 0x20
        inet6 2018::2  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:4e:69  txqueuelen 1000  (Ethernet)
        RX packets 53266410  bytes 3937179306 (3.6 GiB)
        RX errors 0  dropped 143  overruns 0  frame 0
        TX packets 2988621  bytes 284679938 (271.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163  mtu 1500
        inet6 fe80::250:56ff:fea0:17dc  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:17:dc  txqueuelen 1000  (Ethernet)
        RX packets 70361859  bytes 5602168856 (5.2 GiB)
        RX errors 0  dropped 141  overruns 0  frame 0
        TX packets 24902418  bytes 195063342601 (181.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 30235303  bytes 10958756943 (10.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30235303  bytes 10958756943 (10.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0



Restore the private IP configuration:

[root@cs1 network-scripts]# cat ifcfg-Ethernet_connection_ens192
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=ens192
UUID=8ec45d90-c1fb-4924-84a0-72f58fc9b82f
DEVICE=ens192
ONBOOT=yes
IPADDR=88.88.88.190
PREFIX=24
NETMASK=255.255.255.0
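Before bringing the interface back up (for example with `ifup ens192`), it can be worth sanity-checking the recreated file. A small sketch under stated assumptions (`check_ifcfg` is a hypothetical helper; the demo writes a copy of the file to /tmp rather than touching the real network-scripts directory):

```shell
#!/bin/sh
# Hypothetical validator for an ifcfg file: static addressing, start at
# boot, and an IPADDR inside the private subnet 88.88.88.0/24.
check_ifcfg() {
  f="$1"
  ipaddr=$(sed -n 's/^IPADDR=//p' "$f")
  bootproto=$(sed -n 's/^BOOTPROTO=//p' "$f")
  onboot=$(sed -n 's/^ONBOOT=//p' "$f")
  if [ "$bootproto" = "none" ] && [ "$onboot" = "yes" ] \
      && [ "${ipaddr%.*}" = "88.88.88" ]; then
    echo "OK: $ipaddr"
  else
    echo "BAD: bootproto=$bootproto onboot=$onboot ipaddr=$ipaddr"
  fi
}

# Demo against a copy of the settings shown above.
cat > /tmp/ifcfg-ens192.demo <<'EOF'
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
IPADDR=88.88.88.190
PREFIX=24
NETMASK=255.255.255.0
EOF
check_ifcfg /tmp/ifcfg-ens192.demo
```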

[root@cs1 network-scripts]# ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.13.13.190  netmask 255.255.255.0  broadcast 10.13.13.255
        inet6 fe80::250:56ff:fea0:4e69  prefixlen 64  scopeid 0x20
        inet6 2018::2  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:4e:69  txqueuelen 1000  (Ethernet)
        RX packets 53277081  bytes 3937945677 (3.6 GiB)
        RX errors 0  dropped 143  overruns 0  frame 0
        TX packets 2989099  bytes 284806540 (271.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163  mtu 1500
        inet 88.88.88.190  netmask 255.255.255.0  broadcast 88.88.88.255
        inet6 fe80::250:56ff:fea0:17dc  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:17:dc  txqueuelen 1000  (Ethernet)
        RX packets 70371906  bytes 5602881459 (5.2 GiB)
        RX errors 0  dropped 141  overruns 0  frame 0
        TX packets 24902428  bytes 195063343317 (181.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 30236949  bytes 10959017396 (10.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30236949  bytes 10959017396 (10.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@cs1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'cs2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'cs2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'cs2'
CRS-2673: Attempting to stop 'ora.crf' on 'cs2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'cs2'
CRS-2673: Attempting to stop 'ora.evmd' on 'cs2'
CRS-2673: Attempting to stop 'ora.asm' on 'cs2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'cs2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'cs2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'cs2' succeeded
CRS-2677: Stop of 'ora.crf' on 'cs2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'cs2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'cs2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'cs2' succeeded
CRS-2677: Stop of 'ora.asm' on 'cs2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'cs2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'cs2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'cs2'
CRS-2677: Stop of 'ora.cssd' on 'cs2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'cs2'
CRS-2673: Attempting to stop 'ora.driver.afd' on 'cs2'
CRS-2677: Stop of 'ora.driver.afd' on 'cs2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'cs2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'cs2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@cs1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

After manually restarting the clusterware on node 1, the resource status is:

[root@cs1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.CRS.dg
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.DATA.dg
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.DN.dg
               OFFLINE OFFLINE      cs1                      STABLE
               OFFLINE OFFLINE      cs2                      STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.chad
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.net1.network
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.ons
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.proxy_advm
               OFFLINE OFFLINE      cs1                      STABLE
               OFFLINE OFFLINE      cs2                      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cs1                      STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       cs1                      STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       cs1                      169.254.207.24 88.88
                                                             .88.190,STABLE
ora.asm
      1        ONLINE  ONLINE       cs1                      Started,STABLE
      2        ONLINE  ONLINE       cs2                      Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cs.db
      1        ONLINE  ONLINE       cs1                      Open,HOME=/u01/app/o
                                                             racle/product/12.2.0
                                                             /db,STABLE
      2        ONLINE  ONLINE       cs2                      Open,HOME=/u01/app/o
                                                             racle/product/12.2.0
                                                             /db,STABLE
ora.cs1.vip
      1        ONLINE  ONLINE       cs1                      STABLE
ora.cs2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.cvu
      1        ONLINE  ONLINE       cs1                      STABLE
ora.gns
      1        ONLINE  ONLINE       cs2                      STABLE
ora.gns.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE      cs1                      Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       cs1                      STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       cs1                      STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       cs1                      STABLE
--------------------------------------------------------------------------------

At this point the cluster has returned to normal.

ORA-28040 When Connecting PLSQL Developer to an Oracle 12c Server Through an Oracle 11g Client

Environment:
Oracle server version: 12.2.0.1.0
Oracle client version: 11.2.0.1.0
PLSQL Developer version: 11.4

Because PLSQL Developer also goes through the Oracle client, connecting to Oracle 12c with an 11g client reproduces the same error:

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:22:07 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-28040: No matching authentication protocol

Look up the detailed description of the error:

[oracle@shard1 admin]$ oerr ora 28040
28040, 0000, "No matching authentication protocol"
// *Cause:  There was no acceptable authentication protocol for
//          either client or server.
// *Action: The administrator should set the values of the
//          SQLNET.ALLOWED_LOGON_VERSION_SERVER and
//          SQLNET.ALLOWED_LOGON_VERSION_CLIENT parameters, on both the
//          client and on the server, to values that match the minimum
//          version software supported in the system.
//          This error is also raised when the client is authenticating to
//          a user account which was created without a verifier suitable for
//          the client software version. In this situation, that account's
//          password must be reset, in order for the required verifier to
//          be generated and allow authentication to proceed successfully.

On the server, query dba_users; the PASSWORD_VERSIONS column shows 11G 12C:

SQL> select username, account_status,password_versions from dba_users where account_status='OPEN';

USERNAME                       ACCOUNT_STATUS                                                   PASSWORD_VERSIONS
------------------------------ ---------------------------------------------------------------- ----------------------------------
SYS                            OPEN                                                             11G 12C
SYSTEM                         OPEN                                                             11G 12C
GSMCATUSER                     OPEN                                                             11G 12C
JY                             OPEN                                                             11G 12C
MYGDSADMIN                     OPEN                                                             11G 12C
APP_SCHEMA                     OPEN                                                             11G 12C

6 rows selected.

On the server, create or modify the sqlnet.ora configuration (no restart of the Oracle server is needed after the change). Here sqlnet.ora is newly created because it did not exist before:

[oracle@shard1 admin]$ ls -lrt
total 12
-rw-r--r--. 1 oracle oinstall 1441 Aug 28  2015 shrept.lst
drwxr-xr-x. 2 oracle oinstall   61 Oct 12  2017 samples
-rw-r-----. 1 oracle oinstall  960 Jan 18  2018 tnsnames.ora
-rw-r--r--  1 oracle oinstall  504 Feb 17 12:58 listener.ora
[oracle@shard1 admin]$ vi sqlnet.ora
SQLNET.ALLOWED_LOGON_VERSION_SERVER=11
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=11
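A quick way to confirm the file ends up with both parameters is to extract them with sed. A small sketch (the /tmp demo path is mine; point it at $ORACLE_HOME/network/admin/sqlnet.ora in practice):

```shell
#!/bin/sh
# Write a demo copy of the two settings, then read them back the way a
# sanity-check script might before asking users to retry their logins.
cat > /tmp/sqlnet.ora.demo <<'EOF'
SQLNET.ALLOWED_LOGON_VERSION_SERVER=11
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=11
EOF

for p in SERVER CLIENT; do
  v=$(sed -n "s/^SQLNET.ALLOWED_LOGON_VERSION_${p}=//p" /tmp/sqlnet.ora.demo)
  echo "ALLOWED_LOGON_VERSION_${p}=${v}"
done
```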

On the client, retry the connection; it now fails with invalid username/password instead:

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:16:06 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

After changing sqlnet.ora on the server, log in to SQL*Plus again before resetting the user's password; otherwise the recorded PASSWORD_VERSIONS will remain 11G 12C. Re-login, reset the password (SYS in this case), and PASSWORD_VERSIONS gains a 10G entry:

[oracle@shard1 ~]$ sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Mon Feb 17 13:07:28 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> alter user sys identified by "abcd";

User altered.


SQL> select username, account_status,password_versions from dba_users where account_status='OPEN';

USERNAME                       ACCOUNT_STATUS                                                   PASSWORD_VERSIONS
------------------------------ ---------------------------------------------------------------- ----------------------------------
SYS                            OPEN                                                             10G 11G 12C
SYSTEM                         OPEN                                                             11G 12C
GSMCATUSER                     OPEN                                                             11G 12C
JY                             OPEN                                                             11G 12C
MYGDSADMIN                     OPEN                                                             11G 12C
APP_SCHEMA                     OPEN                                                             11G 12C

6 rows selected.

From the client, try logging in to Oracle 12c again; this time it succeeds:

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:22:39 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

Summary: for an Oracle 11g client to connect to an Oracle 12c server, configure sqlnet.ora on the server side and then reset the user's password.

Handling a CRS Failure in an Oracle 11.2.0.4 RAC Cluster on Oracle Linux 6.7

Over the past month, a CRS failure on one of our Oracle RAC clusters left the cluster commands unusable; running crsctl stat res -t produced:

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

However, the databases remained accessible and the business systems kept running normally:

[root@db1 ~]# ps -ef | grep pmon
root       8024 204594  0 21:11 pts/0    00:00:00 grep pmon
grid      77120      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM1
oracle    77790      1  0 Dec21 ?        00:05:18 ora_pmon_CAIWU1
oracle    77794      1  0 Dec21 ?        00:05:08 ora_pmon_dadb1
oracle    77848      1  0 Dec21 ?        00:05:39 ora_pmon_chdyl1
oracle    77910      1  0 Dec21 ?        00:07:47 ora_pmon_RLZY1

[root@db2 ~]# ps -ef | grep pmon
grid      27745      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM2
oracle    28393      1  0 Dec21 ?        00:05:21 ora_pmon_dadb2
oracle    28569      1  0 Dec21 ?        00:04:58 ora_pmon_CAIWU2
oracle    28573      1  0 Dec21 ?        00:05:36 ora_pmon_chdyl2
oracle    28583      1  0 Dec21 ?        00:07:49 ora_pmon_RLZY2

Checking the status of the ASM disk groups confirmed that the OCR disk group was indeed offline (it is absent from the lsdg output):

[grid@db1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

[grid@db2 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

Manually mounting the OCR disk group succeeded, but crsctl stat res -t still failed afterwards:

[grid@db1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
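When crsctl keeps returning CRS-4535 even after the disk group is mounted, a first sanity check is whether the crsd daemon process is alive at all on the node. A minimal sketch (any process-listing approach works equally well):

```shell
# CRS-4535 usually means crsd.bin is down or unreachable; check whether
# the daemon process exists at all before digging into the logs.
if pgrep -f 'crsd\.bin' >/dev/null 2>&1; then
  status="running"
else
  status="not running"
fi
echo "crsd.bin is $status"
```

If the process is not running, the next stop is ohasd's view of the ora.crsd resource and the crsd.log shown later in this section.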

The alert_+ASM1.log on node db1 contains the following errors, indicating that the disks of the OCR disk group could not be accessed and that the disk group was therefore forcibly dismounted. A dd read test, however, showed the disks were accessible:

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
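The dd accessibility check mentioned above can be sketched as follows. The real device paths are not shown in this case, so the comment uses a hypothetical path; the runnable part exercises the same command against a scratch file so the snippet is self-contained. Against real devices, add iflag=direct so the read bypasses the page cache and actually hits the storage:

```shell
# Read test in the spirit of the dd check mentioned above.
# In production this would target the OCR ASM disks (path below is hypothetical):
#   dd if=/dev/mapper/ocr1 of=/dev/null bs=1M count=10 iflag=direct
disk=$(mktemp)                                    # scratch file standing in for a disk
dd if=/dev/zero of="$disk" bs=1M count=4 2>/dev/null
if dd if="$disk" of=/dev/null bs=1M 2>/dev/null; then
  result="readable"
else
  result="unreadable"
fi
echo "$disk: $result"
rm -f "$disk"
```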

Examining the trace file referenced in the alert log:

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)


*** 2019-12-28 05:30:44.894
*** SESSION ID:(2929.3) 2019-12-28 05:30:44.894
*** CLIENT ID:() 2019-12-28 05:30:44.894
*** SERVICE NAME:() 2019-12-28 05:30:44.894
*** MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894
*** ACTION NAME:() 2019-12-28 05:30:44.894
 
Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '

*** 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'

*** 2019-12-28 05:30:48.235
WARNING:failed xlate 1 
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1 
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1 
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted

The alertdb1.log on node db1 shows the following errors, again indicating that the disks of the OCR disk group could not be accessed:

2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.471: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.480: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003: 
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021: 
[crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: 
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929: 
[crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945: 
[crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956: 
[crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976: 
[crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988: 
[crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010: 
[crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020: 
[crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043: 
[crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054: 
[crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093: 
[crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103: 
[crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109: 
[crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119: 
[crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141: 
[crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151: 
[crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181: 
[crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191: 
[crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: 
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848: 
[ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.

The oraagent_grid.log on node db1 contains the following entries, showing that the state of the OCR disk group resource was changed to OFFLINE:

2019-12-28 05:30:16.531: [    AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter { 
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [    AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [    AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [ CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [   AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [    AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1
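The key entries in oraagent_grid.log are the resource state transitions; a simple filter pulls them out of the noise. Two sample lines from the output above are inlined here so the command can be tried as-is; in practice, point it at the real log path:

```shell
# Extract resource state transitions from an oraagent_grid.log-style file.
# Sample lines are inlined; replace $log with the real log path in practice.
log=$(mktemp)
cat > "$log" <<'EOF'
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
EOF
grep 'state changed' "$log"
transitions=$(grep -c 'state changed' "$log")
rm -f "$log"
```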

The alert_+ASM2.log on node db2 shows similar errors, including messages such as "Waited 15 secs for write IO to PST disk 0 in group 1", which means a write to the Partner Status Table (PST) of a disk group, including the OCR disk group, took at least 15 seconds before completing:

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process _b000_+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR) 
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */ 
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
 Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
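To see which disk groups were hit by the PST write timeouts, the warnings can be tallied per group number. A few sample lines are inlined so the pipeline is self-contained; in practice, feed it the ASM alert log directly:

```shell
# Tally "Waited N secs for write IO to PST" warnings per disk group number.
log=$(mktemp)
cat > "$log" <<'EOF'
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
EOF
# Last field is the group number with a trailing dot; strip it and count.
summary=$(awk '/Waited .* for write IO to PST/ {print $NF}' "$log" | tr -d '.' | sort | uniq -c)
echo "$summary"
rm -f "$log"
```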

The alertdb2.log on node db2 shows the following errors, again indicating that the disks of the OCR disk group could not be accessed, even though a dd read test showed they were accessible:

2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410: 
[crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420: 
[crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442: 
[crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451: 
[crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471: 
[crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480: 
[crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507: 
[crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517: 
[crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551: 
[crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560: 
[crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592: 
[crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602: 
[crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627: 
[crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636: 
[crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658: 
[crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668: 
[crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701: 
[crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711: 
[crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726: 
[crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735: 
[crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403: 
[ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.

The oraagent_grid.log on node db2 contains the following entries:

2019-12-28 05:29:59.329: [    AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli. 
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [ USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [ CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [   AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [    AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1

Checking /var/log/messages on both nodes revealed multipath-related errors on each node; however, the affected disks are used for backup, not by the production database:

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 --  1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 --  1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:00:19 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:52:20 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 15:40:45 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 16:34:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 17:09:56 db1 auditd[15975]: Audit daemon rotating log files
Dec 30 17:38:16 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 18:59:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 19:54:43 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
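Confirming by hand that the failing paths are confined to the backup maps is tedious across two nodes' logs. A minimal sketch (assuming the standard multipathd message format shown above) that counts "checker failed path" events per multipath map:

```python
import re
from collections import Counter

def failed_paths_by_map(messages_text):
    """Count multipathd 'checker failed path' events per multipath map
    in /var/log/messages content, to confirm which maps (e.g. backup2,
    backup3 above) are affected."""
    counts = Counter()
    for line in messages_text.splitlines():
        m = re.search(r'checker failed path \S+ in map (\S+)', line)
        if m:
            counts[m.group(1)] += 1
    return dict(counts)
```

If only backup maps appear in the result, the production LUNs saw no path failures at the multipath layer, which matches the observation above.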

A review of the cluster logs confirmed that an I/O problem on the storage disks (a fiber-link flap or an I/O latency spike) caused CRS to crash. Oddly, although CRS went down, the ASM and database instances stayed up and remained usable. A search of Oracle Support turned up Doc ID 1581864.1, which relates ASM voting-disk access timeouts to the hidden parameter _asm_hbeatiowait, which in turn interacts with the polling_interval setting of the OS multipath configuration. The root cause here is that the OS disk-access timeout was far larger than the ASM voting-disk access timeout, so Oracle RAC judged the voting disk unreachable and forced it offline. The fix: determine the OS polling_interval (and disk timeout) and the ASM hidden parameter _asm_hbeatiowait, then raise _asm_hbeatiowait above the OS value.
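The timeout relationship described above can be encoded as a small check. This is a sketch, not an Oracle-documented formula: the 15-second margin added on top of the OS disk timeout is an assumption chosen to reproduce the 30s → 45s adjustment used in this case.

```python
def recommend_hbeatiowait(disk_timeout_secs, current_hbeatiowait_secs,
                          margin_secs=15):
    """Return a recommended _asm_hbeatiowait value, or None if the
    current setting already exceeds the OS disk I/O timeout.

    Rule from the case above: the ASM heartbeat I/O wait must be larger
    than the OS-level disk timeout, otherwise the OS may still be
    retrying a path while ASM has already declared the voting disk dead.
    margin_secs is an illustrative safety margin, not a documented rule.
    """
    if current_hbeatiowait_secs > disk_timeout_secs:
        return None  # already larger than the OS timeout
    return disk_timeout_secs + margin_secs

# Values from this case: default 15s vs OS timeout 30s -> raise to 45s.
```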
The fix, step by step:
1. Check the current _asm_hbeatiowait value on the RAC ASM instances (the default is 15 seconds):

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT   ksppinm, ksppstvl, ksppdesc
  2     FROM   x$ksppi x, x$ksppcv y
  3    WHERE   x.indx = y.indx AND  ksppinm = '_asm_hbeatiowait';

KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- --------------------------------------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return

2. Check the OS storage-disk access timeout (the default on Oracle Linux 6.7 is 30 seconds):

[root@db1 ~]# cat /sys/block/sdb/device/timeout 
30

[root@db2 ~]# cat /sys/block/sdb/device/timeout 
30

3. Raise _asm_hbeatiowait to 45 seconds (this is a static parameter, so the cluster must be restarted):

SQL>  alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';

System altered.

4. Restart the cluster.

5. Check the cluster status:

[grid@db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CWDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DADATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.OCR.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.SBKDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db2                                          
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.caiwu.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.chdyl.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.dadb.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.rlzy.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.scan1.vip
      1        ONLINE  ONLINE       db2                                          
ora.scan2.vip
      1        ONLINE  ONLINE       db1                                          
ora.scan3.vip
      1        ONLINE  ONLINE       db1                                          


This completes the fix.

Handling RMAN-08137 When Deleting Archived Logs with RMAN

The database backup log reported RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process. The database version is 11.2.0.4. Per the backup policy, archived logs are backed up daily and then automatically deleted, retaining one week of archives. Running the deletion manually:

delete noprompt archivelog until time 'sysdate-7';

still produced the same warning:

archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2328.1241.1027887401 thread=1 sequence=2328
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2329.1242.1027887719 thread=1 sequence=2329
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 thread=1 sequence=2330
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 thread=1 sequence=2331
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 thread=1 sequence=2332
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 thread=1 sequence=2333
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 thread=1 sequence=2334
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 thread=1 sequence=2335
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 thread=1 sequence=2336
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 thread=1 sequence=2337
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 thread=1 sequence=2338
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 thread=1 sequence=2339
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_27/thread_1_seq_2340.1253.1028075413 thread=1 sequence=2340
...............

A search on MOS showed this is a bug in 11g: RMAN checks MIN_REQUIRED_CAPTURE_CHANGE# in v$database rather than DBA_CAPTURE. By default that value is refreshed only every six hours, so the value in v$database can lag the current value by up to six hours.
The documented workaround is to add force to the delete command.
Running delete noprompt force archivelog until time 'sysdate-7'; manually deleted the archived logs successfully:

deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 RECID=18420 STAMP=1027887814
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 RECID=18421 STAMP=1027887875
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 RECID=18422 STAMP=1027902628
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 RECID=18423 STAMP=1027902721
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 RECID=18424 STAMP=1027908500
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 RECID=18425 STAMP=1027911722
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 RECID=18426 STAMP=1027989010
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 RECID=18427 STAMP=1027989069
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 RECID=18428 STAMP=1027990618
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 RECID=18429 STAMP=1028041715
...........
Deleted 800 objects
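When this warning appears in scheduled backup logs, it helps to extract exactly which archived logs were skipped before deciding to rerun with force. A sketch that parses the RMAN log format shown above (the sample layout is an assumption based on this output):

```python
import re

def logs_blocked_by_rman_08137(log_text):
    """Return (thread, sequence) pairs for archived logs that RMAN
    refused to delete with RMAN-08137, parsed from an RMAN log where
    each 'archived log file name=... thread=N sequence=M' line may be
    followed by an RMAN-08137 warning line."""
    blocked = []
    pending = None
    for line in log_text.splitlines():
        m = re.search(r'thread=(\d+) sequence=(\d+)', line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
        elif 'RMAN-08137' in line and pending:
            blocked.append(pending)
            pending = None
    return blocked
```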

Oracle 11g RAC: One Node's Instance Raises ORA-01105 and ORA-01606

In a production environment (Oracle Linux 6.4, RAC 11.2.0.4), the Grid cluster could not report its status via the usual commands because the OCR disk group was inaccessible. After manually mounting the OCR disk group, restarting the instance on node 2 failed with ORA-01105 and ORA-01606:

SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01606: parameter not identical to that of another mounted instance

Looking up the error details confirms that the failure is caused by some initialization parameter whose setting on instance 2 differs from instance 1:

[oracle@db2 dbs]$ oerr ora 1105
01105, 00000, "mount is incompatible with mounts by other instances"
// *Cause:  An attempt to mount the database discovered that another instance
//         mounted a database by the same name, but the mount is not
//         compatible. Additional errors are reported explaining why.
// *Action: See accompanying errors.
[oracle@db2 dbs]$ oerr ora 1606
01606, 00000, "parameter not identical to that of another mounted instance"
// *Cause:  A parameter was different on two instances.
// *Action: Modify the initialization parameter and restart.

Create a text pfile from the spfile to inspect the parameters:

SQL> create pfile='rlcs.ora' from spfile;

File created.

[oracle@db2 dbs]$ cat rlcs.ora

*._serial_direct_read='AUTO'
*._swrf_mmon_flush=TRUE
*._use_adaptive_log_file_sync='FALSE'
*.audit_file_dest='/u01/app/oracle/admin/RL/adump'
*.audit_trail='NONE'
*.cluster_database=true
*.compatible='11.2.0.4.0'
*.control_files='+DATA/rl/controlfile/current.260.926786537','+ARCH/rl/controlfile/current.256.926786537'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_name_convert='+data/rl/','+data/rldg/'
*.db_name='RL'
*.db_recovery_file_dest='+ARCH'
*.db_recovery_file_dest_size=10737418240
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=RLZYXDB)'
*.fal_server='yb_st'
RL1.instance_number=1
RL2.instance_number=2
*.listener_networks=''
*.log_archive_config='dg_config=(rl,rldg)'
*.log_archive_dest_1='location=+ARCH  valid_for=(all_logfiles,all_roles) db_unique_name=rl'
*.log_archive_dest_2='service=yb_st valid_for=(online_logfiles,primary_role) db_unique_name=rldg'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_format='%t_%s_%r.arch'
*.log_file_name_convert='+data/rl/','+data/rldg/','+arch/rl/','+arch/rldg/'
*.open_cursors=300
*.parallel_adaptive_multi_user=TRUE
*.parallel_degree_policy='AUTO'
*.parallel_force_local=FALSE
*.pga_aggregate_target=21474836480
*.processes=2000
*.remote_listener='scan-ip:1521'
*.remote_login_passwordfile='exclusive'
*.service_names='rl'
*.sessions=2205
*.sga_max_size=53687091200
*.sga_target=53687091200
*.standby_file_management='manual'
RLZY2.thread=2
RLZY1.thread=1
*.undo_retention=7200
RLZY2.undo_tablespace='UNDOTBS2'
RLZY1.undo_tablespace='UNDOTBS1'
*.utl_file_dir='/rmanbak/utl'
Nothing in the text parameter file looks inconsistent between the two instances.
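Since the visible parameters match, the next step is to compare hidden parameters, and eyeballing two long x$ksppi listings is error-prone. A minimal sketch of a dict diff over {parameter: value} maps built from each instance's query output; the sample values below are illustrative fragments, not a full dump:

```python
def diff_params(inst1, inst2):
    """Compare two {parameter: value} dicts (e.g. built from the
    x$ksppi/x$ksppcv query output of each instance) and return the
    parameters whose values differ, as {name: (inst1_val, inst2_val)}.
    A missing parameter shows up as None on that side."""
    keys = set(inst1) | set(inst2)
    return {k: (inst1.get(k), inst2.get(k))
            for k in keys
            if inst1.get(k) != inst2.get(k)}
```

Feeding it the two gc-parameter listings would leave only the mismatched entries, which is exactly what ORA-01606 asks us to find.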

Run the following query on instance 2 to list the gc-related hidden parameters:
SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                FALSE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS proces 64
                                    s

_gcs_pkey_history                   number of pkey remastering history                                 4000

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

Check the gc-related parameters on instance 1:

SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                TRUE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS process 64
_gcs_pkey_history                   number of pkey remastering history                                 4000
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

A comparison of the two listings shows that the _gc_read_mostly_locking parameter is TRUE on instance 1 but FALSE on instance 2.
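The hidden-parameter listing above can be produced with a query along these lines (a sketch; the x$ksppi/x$ksppcv fixed tables are SYS-only and instance-local, so it must be run on each node separately to compare):

```sql
-- Run as SYS on each instance; hidden ("_") parameters live in
-- the x$ fixed tables and are not visible in v$parameter.
SELECT i.ksppinm  AS name,
       i.ksppdesc AS description,
       v.ksppstvl AS value
  FROM x$ksppi  i
  JOIN x$ksppcv v ON i.indx = v.indx
 WHERE i.ksppinm LIKE '\_gc%' ESCAPE '\'
 ORDER BY i.ksppinm;
```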

Set _gc_read_mostly_locking to TRUE on all instances:

SQL> alter system set "_gc_read_mostly_locking"=true scope=spfile sid='*';

System altered.

Restarting instance 2 then succeeded:

SQL> shutdown immediate
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
Database mounted.
Database opened.

The root cause: someone had previously modified the hidden parameter _gc_read_mostly_locking, but only on instance 1, leaving the instances with inconsistent values.
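To confirm the fix landed consistently, the spfile entry can be checked directly (a sketch; v$spparameter reflects spfile contents, and explicitly set hidden parameters do appear there with the SID they were set for):

```sql
-- '*' in the SID column means the value applies to all instances.
SELECT sid, name, value
  FROM v$spparameter
 WHERE name = '_gc_read_mostly_locking';
```

After the `alter system ... sid='*'` above, a single row with sid `*` and value TRUE would indicate all instances will pick up the same setting on restart.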
