Troubleshooting ORA-12547: TNS:lost contact

On Red Hat 5.4 with Oracle 10.2.0.5 RAC, connecting to the instance on one of the nodes fails with ORA-12547: TNS:lost contact.

[oracle@wj2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 - Production on Wed Nov 26 22:12:42 2014
Copyright (c) 1982, 2010, Oracle. All Rights Reserved. 

ERROR:
ORA-12547: TNS:lost contact 

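The steps to resolve it are as follows:
1. First check whether the operating-system kernel parameters (here, the resource limits reported by ulimit -a) differ between the two nodes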

[root@wj1 bin]# ulimit -a
 core file size (blocks, -c) 0
 data seg size (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size (blocks, -f) unlimited
 pending signals (-i) 16383
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files (-n) 1024
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority (-r) 0
 stack size (kbytes, -s) 10240
 cpu time (seconds, -t) unlimited
 max user processes (-u) 16383
 virtual memory (kbytes, -v) unlimited
 file locks (-x) unlimited 


[root@wj2 bin]# ulimit -a
 core file size (blocks, -c) 0
 data seg size (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size (blocks, -f) unlimited
 pending signals (-i) 16383
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files (-n) 1024
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority (-r) 0
 stack size (kbytes, -s) 10240
 cpu time (seconds, -t) unlimited
 max user processes (-u) 16383
 virtual memory (kbytes, -v) unlimited
 file locks (-x) unlimited 

Both nodes report the same values, so kernel parameters can be ruled out as the cause.
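Comparing two long listings by eye is error-prone, so the check can be left to diff. The following is a minimal sketch, assuming each node's `ulimit -a` output has been saved to a file first; the function name `compare_limits` and the file paths are hypothetical.

```shell
# compare_limits FILE_A FILE_B
# Prints the lines that differ between two saved `ulimit -a` outputs.
# Returns 0 when the files are identical, 1 otherwise.
compare_limits() {
    if diff "$1" "$2"; then
        echo "limits identical"
        return 0
    fi
    return 1
}

# Hypothetical usage, after capturing the limits on each node:
#   ulimit -a > /tmp/limits.$(hostname)
#   compare_limits /tmp/limits.wj1 /tmp/limits.wj2
```

It is probably worth capturing the limits as the oracle user as well, since limits can differ per user.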

2. Both 10g RAC and 11g RAC are installed on these hosts, so check whether the ownership and permissions of $ORACLE_HOME/bin/oracle are wrong (with RAC, check both $ORACLE_HOME/bin/oracle and $GRID_HOME/bin/oracle)

[oracle@wj1 bin]$ ll $ORACLE_HOME/bin/oracle
 -rwsr-s--x 1 oracle oinstall 98849122 Nov 20 17:23 /u01/app/oracle/product/10.2.0/db/bin/oracle 

[oracle@wj2 bin]$ ll $ORACLE_HOME/bin/oracle
-rwxrwxr-x 1 grid oinstall 98849122 Nov 20 17:23 oracle 


[oracle@wj1 bin]$ ls -lrt $ORACLE_HOME/rdbms/lib/config.o
 -rw-r----- 1 oracle oinstall 816 Nov 20 17:24 /u01/app/oracle/product/10.2.0/db/rdbms/lib/config.o 

[oracle@wj2 bin]$ ls -lrt $ORACLE_HOME/rdbms/lib/config.o
-rwxrwxr-x 1 grid oinstall 816 Nov 20 17:24 /u01/app/oracle/product/10.2.0/db/rdbms/lib/config.o
[root@wj2 bin]# chown -R oracle:oinstall /u01/app/oracle 

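Clearly, on wj2 the owner of $ORACLE_HOME/bin/oracle and $ORACLE_HOME/rdbms/lib/config.o in the Oracle 10g RAC home has become the grid user, which is the user that installed Oracle 11g RAC. The owner of these two files needs to be changed back to oracle.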

[root@wj2 bin]# chown -R oracle:oinstall $ORACLE_HOME/bin/oracle

[root@wj2 bin]# chown -R oracle:oinstall $ORACLE_HOME/rdbms/lib/config.o
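After changing the owner it is worth confirming that both files now match the healthy node. The sketch below assumes GNU stat (as shipped with Red Hat); the function name `check_file` is hypothetical, and the expected modes 6751 and 640 are simply read off the wj1 listings above (-rwsr-s--x and -rw-r-----).

```shell
# check_file FILE OWNER GROUP MODE
# Succeeds only if FILE has the expected owner, group and octal mode.
# Uses GNU stat's -c format option.
check_file() {
    actual=$(stat -c '%U:%G %a' "$1") || return 2
    expected="$2:$3 $4"
    if [ "$actual" = "$expected" ]; then
        echo "OK   $1 ($actual)"
    else
        echo "BAD  $1 (found $actual, expected $expected)"
        return 1
    fi
}

# Example, using the values seen on wj1:
#   check_file "$ORACLE_HOME/bin/oracle" oracle oinstall 6751
#   check_file "$ORACLE_HOME/rdbms/lib/config.o" oracle oinstall 640
```

Note that the wj2 listing shows mode -rwxrwxr-x for the oracle binary, so the setuid and setgid bits may also need to be restored (for example with chmod 6751) if the check reports a mode mismatch. That step is an assumption based on the wj1 listing, not something shown in the original session.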


[oracle@wj1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 Production on Wed Nov 26 16:45:40 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> 

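Summary: it is best not to install multiple versions of the software on the same host. If you really must, do not install different versions under the same ORACLE_BASE directory, because that easily leads to confused ownership and permissions.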

A misunderstanding of the UNION set operation that can cause a major disaster

A friend told me he had found a bug in Oracle's UNION operation. His queries were as follows:

SQL> select sum(a1) a1,sum(a2) a2
  2  from (
  3  select 1 a1,0 a2 from dual
  4  union
  5  select 0 a2, 1 a1 from dual
  6  )
  7  ;

        A1         A2
---------- ----------
         1          1


SQL> select 1 a1,0 a2 from dual
  2  union
  3  select 0 a2, 1 a1 from dual
  4  ;

        A1         A2
---------- ----------
         0          1
         1          0

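He said the result was wrong and that a2 should be 0: the value aliased as a2 is 0 in both branches, so in his view UNION was failing to tell the columns apart by alias. At first I did not see the problem either. Thinking it over, UNION merges the results of two queries and removes duplicates, and Oracle's SQL documentation says that the two queries in a UNION must have the same number of columns with matching data types. For example, changing the value aliased as a2 from 0 to '0' in both queries above makes the UNION fail: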

SQL> select 1 a1,'0' a2 from dual
  2  union
  3  select '0' a2, 1 a1 from dual
  4  ;

select 1 a1,'0' a2 from dual
union
select '0' a2, 1 a1 from dual

ORA-01790: expression must have same datatype as corresponding expression

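This shows that UNION combines columns by their position in the select list, not by alias; the column names of the result come from the first query. Misunderstanding this is dangerous. In the example above he intended a2 to be 0, but because the columns in the second branch were written in the wrong order the result became 1. If these numbers are money that should have been 0, a mistake in column order causes real losses. As it happens, my friend's system does deal with money, and the software has been running for some years.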
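To make the fix concrete, here is a small sketch that prints a corrected version of the query, with both branches listing a1 first and a2 second. The wrapper function `print_fixed_union` is hypothetical, and piping it into sqlplus is only one assumed way of running it.

```shell
# print_fixed_union
# Emits the corrected query: both branches list a1 first and a2 second.
# UNION matches columns by position, so the aliases in the second branch
# are documentation only; the order is what matters.
print_fixed_union() {
    cat <<'SQL'
select sum(a1) a1, sum(a2) a2
from (
  select 1 a1, 0 a2 from dual
  union
  select 1 a1, 0 a2 from dual
);
SQL
}

# Hypothetical usage:
#   print_fixed_union | sqlplus -s / as sysdba
```

Because UNION also removes duplicates, the two identical rows here collapse into one; if every row should be counted, UNION ALL is the operator to use.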

Changing the public, private, VIP and SCAN IPs of Oracle RAC

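Changing the IPs of Oracle RAC touches the OCR, so take a backup before making the change; then even if the change fails, the OCR can be restored. For how to take the backup, see the post
"Backup and recovery of Oracle RAC OCR and Voting Disk".
Test environment: Red Hat 5.4, Oracle 10g RAC, Oracle 11g RAC
The original IP settings of the two-node RAC were: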
192.168.56.11 jy1
192.168.56.12 jy2
192.168.56.13 jy1-priv
192.168.56.14 jy2-priv
192.168.56.15 jy1-vip
192.168.56.16 jy2-vip
They are now changed to:
192.168.56.111 jy1
192.168.56.112 jy2
192.168.100.1 jy1-priv
192.168.100.2 jy2-priv
192.168.56.113 jy1-vip
192.168.56.114 jy2-vip

I. Stop all Oracle-related processes, including the database, ASM, node applications and CRS
1.1 Check the current state of CRS on the system

[root@jy1 bin]# cd /u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# pwd
/u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2     

1.2 Stop the database

[root@jy1 bin]# ./srvctl stop database -d jyrac

1.3 Stop the ASM instances

[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2

1.4 Stop the remaining node applications

[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

1.5 Stop the CRS daemons. This stops the CRS background processes at the operating-system level and must be run on every node.

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
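The stop order in steps 1.2 to 1.4 (database, then ASM, then node applications) can be captured in a small helper. This is a sketch only: the function names `_run` and `stop_rac_stack` and the RUN variable are hypothetical, and by default the commands are just printed so the order can be reviewed before anything is executed.

```shell
# _run CMD...
# Echoes the command unless RUN=1 is set, in which case it is executed.
_run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# stop_rac_stack DB NODE...
# Stops the database first, then ASM on each node, then the node
# applications on each node, mirroring steps 1.2 to 1.4 above.
stop_rac_stack() {
    db=$1
    shift
    _run srvctl stop database -d "$db"
    for n in "$@"; do _run srvctl stop asm -n "$n"; done
    for n in "$@"; do _run srvctl stop nodeapps -n "$n"; done
}

# Dry run:   stop_rac_stack jyrac jy1 jy2
# Execute:   RUN=1 stop_rac_stack jyrac jy1 jy2
```

Stopping CRS itself (step 1.5) is left out on purpose, because it has to be run locally on every node.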

II. Change the operating-system IP settings
2.1 Edit /etc/hosts and keep the cluster entries identical on both nodes

[root@jy1 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip

[root@jy2 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip
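A quick way to confirm that both nodes agree is to look up each cluster name in the hosts file and compare the answers. The sketch below assumes a plain hosts-file layout like the one shown; the function name `host_ip` is hypothetical. It skips comments, IPv6 entries and the loopback line, which in these listings also carries the node name.

```shell
# host_ip HOSTS_FILE NAME
# Prints the first non-loopback IPv4 address mapped to NAME in a hosts file.
host_ip() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                    # skip comment lines
        $1 ~ /^127\./ || $1 ~ /:/ { next }     # skip loopback and IPv6 entries
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Hypothetical usage on each node:
#   for h in jy1 jy2 jy1-priv jy2-priv jy1-vip jy2-vip; do
#       echo "$h $(host_ip /etc/hosts "$h")"
#   done
```

Running the loop on both nodes and comparing the two outputs should show any entry that was missed.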

2.2 Change the eth0 and eth1 addresses on both nodes:

[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:54:5a:6a
NETMASK=255.255.255.0
IPADDR=192.168.56.111
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes



[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:41:5d:e5
NETMASK=255.255.255.0
IPADDR=192.168.56.112
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes


[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:c3:b7:37
NETMASK=255.255.255.0
IPADDR=192.168.100.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
GATEWAY=192.168.56.1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:4a:e3:cb
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=192.168.100.2
USERCTL=no
IPV6INIT=no
PEERDNS=yes

2.3 Restart the network service on both nodes so that the new IPs take effect

[root@jy1 bin]# service network restart

[root@jy2 bin]# service network restart

III. Start CRS and update the IP-related settings in Oracle
3.1 Start CRS on both nodes, then stop all the services that start along with it

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

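Because all Oracle resources are set to start automatically, CRS tries to start every one of them when it comes up. Changing the Oracle IP settings requires CRS to be running but the database, ASM and node applications to be stopped, so run the following commands on one node to stop them all: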

[root@jy1 bin]# ./srvctl stop database -d jyrac
[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2
[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE               
ora....Y1.lsnr application    OFFLINE   OFFLINE               
ora.jy1.gsd    application    OFFLINE   OFFLINE               
ora.jy1.ons    application    OFFLINE   OFFLINE               
ora.jy1.vip    application    OFFLINE   OFFLINE               
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....Y2.lsnr application    OFFLINE   OFFLINE               
ora.jy2.gsd    application    OFFLINE   OFFLINE               
ora.jy2.ons    application    OFFLINE   OFFLINE               
ora.jy2.vip    application    OFFLINE   OFFLINE               
ora.jyrac.db   application    OFFLINE   OFFLINE               
ora....c1.inst application    OFFLINE   OFFLINE               
ora....c2.inst application    OFFLINE   OFFLINE     

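3.2 Use oifcfg to change the interface settings. oifcfg sets and displays how each network interface is used (this step can be skipped if the subnets do not change)
3.2.1 View the current configuration: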

[root@jy1 bin]# ./oifcfg getif -global
eth0  192.168.56.0  global  public
eth1  192.168.56.0  global  cluster_interconnect

3.2.2 Delete the current configuration

[root@jy1 bin]# ./oifcfg delif -global eth0
[root@jy1 bin]# ./oifcfg delif -global eth1
[root@jy1 bin]# ./oifcfg getif

3.2.3 Add it again

[root@jy1 bin]# ./oifcfg setif -global eth0/192.168.56.0:public
[root@jy1 bin]# ./oifcfg setif -global eth1/192.168.100.0:cluster_interconnect
[root@jy1 bin]# ./oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.100.0  global  cluster_interconnect
[root@jy1 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

[root@jy2 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

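The address given to oifcfg ends in 0 because it denotes a subnet, not a host address. Keep this in mind when making the change; otherwise starting the OCR fails with the following error: [ CRSOCR][4054413904] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]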
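To avoid typing a host address by mistake, the subnet can be derived from the interface's IP and netmask. This is a minimal sketch in plain shell arithmetic; the function name `network_addr` is hypothetical.

```shell
# network_addr IP NETMASK
# Prints the network (subnet) address: IP AND NETMASK, octet by octet.
network_addr() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split both dotted quads into eight octets
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Hypothetical usage with the interconnect address from this walkthrough:
#   net=$(network_addr 192.168.100.1 255.255.255.0)
#   ./oifcfg setif -global eth1/$net:cluster_interconnect
```

With a 255.255.255.0 mask this simply zeroes the last octet, but the same function also works for masks that do not fall on an octet boundary.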

3.3 Change the VIPs

[root@jy1 bin]# ./srvctl modify nodeapps -n jy1 -A 192.168.56.113/255.255.255.0/eth0

[root@jy1 bin]# ./srvctl modify nodeapps -n jy2 -A 192.168.56.114/255.255.255.0/eth0

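3.4 Check listener.ora and tnsnames.ora for any place that still specifies an old IP
and change it to the new IP address (note: if you use OCFS, also update the OCFS configuration file (/etc/ocfs/cluster.conf)
and verify that it still works after the change)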

On node jy1 the old fixed IP is not used, so nothing needs to change
[root@jy1 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy1 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy1
# Generated by Oracle configuration tools.

LISTENER_JY1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy1-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 127.0.0.1)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

Node jy2 used the old IP 192.168.56.12, so change it to 192.168.56.112
[root@jy2 bin]# cat /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.12)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

[root@jy2 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.112)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)
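Rather than reading each file by eye, the old addresses can be searched for directly. The sketch below wraps grep; the function name `find_stale_ips` is hypothetical, and the pattern in the example assumes the old addresses 192.168.56.11 to 192.168.56.16 from this walkthrough.

```shell
# find_stale_ips PATTERN FILE...
# Lists every line (prefixed with file name and line number) that still
# contains an address matching PATTERN, an extended regular expression.
# Returns non-zero when nothing matches.
find_stale_ips() {
    pattern=$1
    shift
    # /dev/null is added so grep always prints the file name
    grep -nE "$pattern" "$@" /dev/null
}

# Example: match 192.168.56.11 .. 192.168.56.16, but not 192.168.56.111
#   find_stale_ips '192\.168\.56\.1[1-6]([^0-9]|$)' \
#       "$ORACLE_HOME/network/admin/listener.ora" \
#       "$ORACLE_HOME/network/admin/tnsnames.ora"
```

The trailing `([^0-9]|$)` matters here because the new addresses (for example 192.168.56.112) begin with the same digits as the old ones.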

3.5 Start the node applications, ASM and the database
[root@jy1 bin]# ./srvctl start nodeapps -n jy1
[root@jy1 bin]# ./srvctl start nodeapps -n jy2
[root@jy1 bin]# ./srvctl start asm -n jy1
[root@jy1 bin]# ./srvctl start asm -n jy2
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac1
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac2

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1
ora....Y1.lsnr application    ONLINE    ONLINE    jy1
ora.jy1.gsd    application    ONLINE    ONLINE    jy1
ora.jy1.ons    application    ONLINE    ONLINE    jy1
ora.jy1.vip    application    ONLINE    ONLINE    jy1
ora....SM2.asm application    ONLINE    ONLINE    jy2
ora....Y2.lsnr application    ONLINE    ONLINE    jy2
ora.jy2.gsd    application    ONLINE    ONLINE    jy2
ora.jy2.ons    application    ONLINE    ONLINE    jy2
ora.jy2.vip    application    ONLINE    ONLINE    jy2
ora.jyrac.db   application    ONLINE    ONLINE    jy2
ora....c1.inst application    ONLINE    ONLINE    jy1
ora....c2.inst application    ONLINE    ONLINE    jy2

3.6 Confirm the IPs; the change is complete.
[root@jy1 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.111 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe54:5a6a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:515539 errors:0 dropped:0 overruns:0 frame:0
TX packets:546714 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:294661949 (281.0 MiB) TX bytes:357427245 (340.8 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.113 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:C3:B7:37
inet addr:192.168.100.1 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec3:b737/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21818 errors:0 dropped:0 overruns:0 frame:0
TX packets:32288 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10823304 (10.3 MiB) TX bytes:26553389 (25.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:180050 errors:0 dropped:0 overruns:0 frame:0
TX packets:180050 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:57676272 (55.0 MiB) TX bytes:57676272 (55.0 MiB)

[root@jy2 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.112 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe41:5de5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:545442 errors:0 dropped:0 overruns:0 frame:0
TX packets:514967 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:356961066 (340.4 MiB) TX bytes:295147830 (281.4 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.114 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:4A:E3:CB
inet addr:192.168.100.2 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe4a:e3cb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32190 errors:0 dropped:0 overruns:0 frame:0
TX packets:21813 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:26535297 (25.3 MiB) TX bytes:10821634 (10.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:163867 errors:0 dropped:0 overruns:0 frame:0
TX packets:163867 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30461066 (29.0 MiB) TX bytes:30461066 (29.0 MiB)

The output shows that the change succeeded.

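IV. Change the SCAN IP of Oracle 11g RAC. DNS is not used here and only one SCAN IP is configured, 10.138.130.155, which is now changed to 10.138.130.159 (changing the public, private and VIP addresses on 11g RAC works the same way as for 10g RAC above and is not repeated)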

[root@jyrac1 ~]# cat /etc/hosts
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.155 jyrac-scan

4.1 On every node, change the IP mapped to the SCAN name in /etc/hosts

[root@jyrac1 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan
[root@jyrac2 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan

4.2 View the SCAN IP information

[grid@jyrac1 ~]$ srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.155

4.3 Stop the SCAN listener and the SCAN VIP

[grid@jyrac2 ~]$ srvctl stop scan_listener
[grid@jyrac2 ~]$ srvctl stop scan

4.4 Confirm the state of the SCAN listener and the SCAN VIP

[grid@jyrac1 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
[grid@jyrac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running

[grid@jyrac1 ~]$ crs_stat -t | grep  scan  
ora.scan1.vip  ora....ip.type OFFLINE   OFFLINE               
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type OFFLINE   OFFLINE               
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2 

4.5 Modify the SCAN (running the command with the same name picks up the new IP)

[grid@jyrac1 ~]$ srvctl modify scan -h

Modifies the SCAN name.

Usage: srvctl modify scan -n <scan_name>
    -n <scan_name>           Domain name qualified SCAN name
    -h                       Print usage

The grid user is not allowed to change the SCAN IP; root privileges are required

[grid@jyrac2 ~]$ srvctl modify scan -n jyrac-scan
PRCS-1034 : Failed to modify Single Client Access Name jyrac-scan
PRCR-1071 : Failed to register or update resource type ora.scan_vip.type
CRS-0245:  User doesn't have enough privilege to perform the operation

[root@jyrac1 bin]# cd /u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# pwd
/u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# ./srvctl modify scan -n jyrac-scan

4.6 Confirm the change

[root@jyrac1 bin]# ./srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.159

4.7 Restart the SCAN VIP and the SCAN listener

[grid@jyrac1 ~]$ srvctl start scan
[grid@jyrac1 ~]$ srvctl start scan_listener

4.8 Confirm the state of the SCAN VIP and the SCAN listener

[grid@jyrac1 ~]$ crs_stat -t | grep  scan
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    jyrac1      
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2  

4.9 Test with ping

[grid@jyrac2 ~]$ ping 10.138.130.159
PING 10.138.130.159 (10.138.130.159) 56(84) bytes of data.
64 bytes from 10.138.130.159: icmp_seq=1 ttl=64 time=1.88 ms
64 bytes from 10.138.130.159: icmp_seq=2 ttl=64 time=0.166 ms

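Changing the SCAN IP differs from changing the private IP or the VIP: there is no need to stop the database instances or ASM, or to restart CRS, so it is comparatively simple.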
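To check the result without reading the output by hand, the IP recorded for the SCAN VIP can be extracted from `srvctl config scan`. This sketch assumes the two-line output format shown in steps 4.2 and 4.6; the function name `scan_ips` is hypothetical.

```shell
# scan_ips
# Reads `srvctl config scan` output on stdin and prints the IP recorded
# for each SCAN VIP, i.e. the text after the last "/" on those lines.
scan_ips() {
    sed -n 's|^SCAN VIP name:.*/\([0-9.]*\)[[:space:]]*$|\1|p'
}

# Hypothetical usage:
#   srvctl config scan | scan_ips
# The printed address should equal the jyrac-scan entry in /etc/hosts.
```

The "SCAN name" line also contains slashes, so the pattern is anchored to lines that start with "SCAN VIP name:".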

Backup and recovery of Oracle RAC OCR and Voting Disk

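I. Voting Disk
The voting disk records node membership state. When a split brain occurs, it decides which partition gets control; the other partitions must be evicted from the cluster. The voting disk uses a majority algorithm: with multiple voting disks, more than half of them must be accessible at the same time for Clusterware to work. For example, with 4 voting disks the cluster keeps working if one fails, but if 2 fail the remaining 2 are no longer more than half, so the cluster goes down at once and all nodes reboot. So when adding voting disks, try not to add just one; add 2 instead. This differs from the OCR, for which a single copy is enough.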
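The majority rule is easy to tabulate. The following sketch computes how many voting disks may fail for a given total, assuming only the "more than half must remain" rule described above; the function name `votedisk_tolerance` is hypothetical.

```shell
# votedisk_tolerance N
# With N voting disks, more than half (N/2 + 1 with integer division)
# must stay accessible, so N minus that many may fail.
votedisk_tolerance() {
    needed=$(( $1 / 2 + 1 ))
    echo $(( $1 - needed ))
}

# Hypothetical usage:
#   for n in 1 2 3 4 5; do echo "$n disks: $(votedisk_tolerance $n) may fail"; done
```

Under this rule 3 and 4 disks both tolerate a single failure, which is why going from 1 disk to 3 (adding 2) is more useful than adding just one.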

1.1 Find the location of the voting disk:

[root@jy1 ~]# cd /u01/app/oracle/product/10.2.0/crs/bin

[root@jy1 bin]# ./crsctl query css votedisk
 0.     0    /dev/raw/raw2

located 1 votedisk(s).

1.2 Back up the voting disk:

[root@jy1 bin]# dd if=/dev/raw/raw2 of=/home/oracle/votedisk.bak
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s

1.3 Use the strings command to look at the contents of the voting disk backup

[root@jy1 bin]# strings /home/oracle/votedisk.bak | sort -u
fSLC
ssLckcoT
SslcLlik
sSlcrEp0
}|{z

1.4 Restore the voting disk:

[root@jy1 bin]# dd if=/home/oracle/votedisk.bak of=/dev/raw/raw2
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s
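A backup is only useful if it matches the source, so it may be worth comparing the copy with the device right after taking it. This is a sketch under that assumption; the function name `backup_votedisk` is hypothetical and the 1024k block size is an arbitrary choice.

```shell
# backup_votedisk DEVICE BACKUP_FILE
# Copies DEVICE to BACKUP_FILE with dd, then compares the two byte by
# byte. Returns 0 only if the copy succeeded and the contents match.
backup_votedisk() {
    dd if="$1" of="$2" bs=1024k 2>/dev/null || return 1
    cmp -s "$1" "$2"
}

# Hypothetical usage with the paths from this walkthrough:
#   backup_votedisk /dev/raw/raw2 /home/oracle/votedisk.bak && echo "backup verified"
```

If the cluster writes to the voting disk between the copy and the comparison, the check can fail even though the copy itself was fine, so a mismatch is a reason to retry rather than proof of a bad disk.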

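II. OCR
Oracle Clusterware keeps the configuration of the whole cluster on shared storage. This includes the list of cluster nodes, the mapping of cluster database instances to nodes, and the CRS application resource information. It is stored on the OCR disk. In the whole cluster only one node can read and write the OCR disk; this node is called the master node. Every node keeps a copy of the OCR in memory, and an OCR process reads from that in-memory copy. When the OCR content changes, the OCR process on the master node synchronizes it to the OCR processes on the other nodes.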

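Oracle backs up the OCR every 4 hours and keeps the last 3 backups, plus the last backup of the previous day and of the previous week. The backup is taken by the CRSD process on the master node. The default location is the $CRS_HOME/cdata/<cluster_name> directory, which can be changed with the ocrconfig -backuploc command. After each backup the file names are rotated to reflect their age; the most recent backup is named backup00.ocr. Besides keeping these backup files locally, the DBA should also keep a copy on other storage to guard against unexpected storage failure.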
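Keeping a second copy of the automatic backups can be as simple as copying the .ocr files into a dated directory on other storage. The sketch below assumes nothing beyond cp and date; the function name `copy_ocr_backups` and the destination path in the example are hypothetical.

```shell
# copy_ocr_backups SRC_DIR DEST_DIR
# Copies every *.ocr file from SRC_DIR, preserving timestamps, into a
# subdirectory of DEST_DIR named after today's date.
copy_ocr_backups() {
    dest="$2/ocr-$(date +%Y%m%d)"
    mkdir -p "$dest" || return 1
    cp -p "$1"/*.ocr "$dest"/
}

# Hypothetical usage, with /backup standing in for separate storage:
#   copy_ocr_backups /u01/app/oracle/product/10.2.0/crs/cdata/crs /backup
```

Preserving the timestamps with cp -p keeps it obvious which copy corresponds to which automatic backup.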

[root@jy1 crs]# pwd
/u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 crs]# ls -lrt
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

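During the Clusterware installation, choosing External Redundancy allows only one OCR location to be entered. Oracle does, however, allow two OCR disks to be configured as mirrors of each other to avoid a single point of failure. Unlike voting disks, there can be at most two OCR disks: a primary OCR and a mirror OCR.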

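Oracle recommends backing up the OCR before adjusting the cluster, for example before adding or removing nodes or before changing the RAC IPs; the export option writes the backup to a file of your choice. After operations such as replace or restore, Oracle recommends running cluvfy comp ocr -n all for a full check. OCR backup and recovery are done with the ocrconfig command.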

[root@jy1 bin]# ./ocrconfig --help
Name:
        ocrconfig - Configuration tool for Oracle Cluster Registry.

Synopsis:
        ocrconfig [option]
        option:
                -export <filename> [-s online]
                                                    - Export cluster register contents to a file
                -import <filename>                  - Import cluster registry contents from a file
                -upgrade [<user> [<group>]]
                                                    - Upgrade cluster registry from previous version
                -downgrade [-version <version string>]
                                                    - Downgrade cluster registry to the specified version
                -backuploc <dirname>                - Configure periodic backup location
                -showbackup                         - Show backup information
                -restore <filename>                 - Restore from physical backup
                -replace ocr|ocrmirror [<filename>] - Add/replace/remove a OCR device/file
                -overwrite                          - Overwrite OCR configuration on disk
                -repair ocr|ocrmirror <filename>    - Repair local OCR configuration
                -help                               - Print out this help information

Note:
        A log file will be created in
        $ORACLE_HOME/log/<hostname>/client/ocrconfig_<pid>.log. Please ensure
        you have file creation privileges in the above directory before
        running this tool.

1. Back up and restore the OCR with export and import

1.1 First stop CRS on all nodes

[root@jy1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@jy2 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

1.2 Export the OCR contents as root

[root@jy1 bin]# ./ocrconfig -export /u01/ocrbak.exp
[root@jy1 bin]# ls -lrt /u01
total 96
drwxr-xr-x 3 root root  4096 Nov 10 23:12 app
drwxrwxrwx 6 root root  4096 Nov 11 11:54 tmp
-rw-r--r-- 1 root root 84375 Nov 17 16:52 ocrbak.exp

1.3 Restart CRS

[root@jy1 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

[root@jy2 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

1.4 Check the CRS status

Cannot communicate with EVM 
[root@jy1 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy2 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy2         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2    

1.5 Check OCR consistency

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.6 Deliberately corrupt the OCR contents (for this test)

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

1.7 Check OCR consistency again

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

Running crs_stat -t now shows that CRS has stopped

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

1.8 Check consistency with the cluvfy tool

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

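1.9 Restore the OCR contents with import (the restore option can only load the physical backups that the OCR generates automatically; the import option can only load logical backups produced by the export option)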

[root@jy1 bin]#  ./ocrconfig -import /u01/ocrbak.exp

1.10 Check the OCR again

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.11 Check with the cluvfy tool

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

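2. Restore the OCR from an automatic backup
2.1 Stop the CRS daemons on every node of the cluster database (Oracle 11gR2 no longer has the init.crs command; there CRS can only be stopped with crsctl stop crs)
/etc/init.d/init.crs stop or crsctl stop crs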

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

2.2 View the most recent backups with the showbackup option of ocrconfig

[root@jy1 bin]# ./ocrconfig -showbackup

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

          

          

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 bin]# ls -lrt /u01/app/oracle/product/10.2.0/crs/cdata/crs
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

2.3 Deliberately corrupt the OCR contents (for this test)

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

2.4 Check OCR consistency again

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

Running crs_stat -t now shows that CRS has stopped

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

2.5 Check consistency with the cluvfy tool

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

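2.6 Load the OCR data with the restore or import option of ocrconfig (the restore option can only load the physical backups that the OCR generates automatically; the import option can only load logical backups produced by the export option)
ocrconfig -restore filename_location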

                         
[root@jy1 bin]# ./ocrconfig  -restore  /u01/app/oracle/product/10.2.0/crs/cdata/crs/backup00.ocr

2.7 Check the OCR

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1387716561
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

2.8 Check with the cluvfy tool

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful. 

2.9 Restart CRS on all nodes
Use /etc/init.d/init.crs start; on Oracle 11gR2, start CRS with the crsctl start crs command instead.

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2   
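
Because init.crs only queues the startup ("within 90 seconds"), a status command issued right away may still fail. A rough sketch of a polling loop follows; wait_until is a hypothetical helper name, and the timeout and interval values are up to the caller.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
    local timeout="$1" interval="$2" elapsed=0
    shift 2
    # Run the probe quietly; stop as soon as it exits with status 0.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

For example, wait_until 180 10 ./crs_stat -t would poll every 10 seconds for up to 3 minutes. This assumes that crs_stat exits with a non-zero status while the CRS daemon is unreachable, which is worth verifying on your release before relying on it.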

Handling the "Failed to upgrade Oracle Cluster Registry configuration" error when running root.sh during a RAC installation on Red Hat Linux 5.4

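Because the Oracle 10g license had been bought out for the whole province while no 11g license had been purchased, a friend needed to install an Oracle 10g RAC database on raw devices under Red Hat Linux 5.4. Several errors came up during the installation. Running root.sh from /u01/app/oracle/product/10.2.0/crs as root on node 2 failed with: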
/u01/app/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

Before running root.sh on node 2, two files need to be edited. Both are in /u01/app/oracle/product/10.2.0/crs/bin and are named vipca and srvctl. First edit the vipca file and find the following lines:
#Remove this workaround when the bug 3937317 is fixed
arch=`uname -m`
if [ "$arch" = "i686" -o "$arch" = "ia64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
#End workaround
Add a new line after fi:
unset LD_ASSUME_KERNEL
Then edit the srvctl file and find the following lines:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
Likewise, add a new line right after them: unset LD_ASSUME_KERNEL

The $ORACLE_HOME/bin/srvctl file on node 1 needs the same change; otherwise, once the database is installed, the srvctl command will report this error as well.
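
Since the same edit has to be made in several files on several nodes, it could be scripted. The following is only a sketch of the manual edit described above: patch_ld_assume_kernel is a hypothetical function name, and the sketch assumes the files contain the export LD_ASSUME_KERNEL line exactly as quoted. It keeps a .bak copy and does nothing if the fix is already present.

```shell
# Hypothetical helper: add "unset LD_ASSUME_KERNEL" after the workaround.
# Usage: patch_ld_assume_kernel <file>
patch_ld_assume_kernel() {
    local file="$1"
    # Already patched: leave the file alone so the function can be re-run safely.
    if grep -q '^[[:space:]]*unset LD_ASSUME_KERNEL' "$file"; then
        return 0
    fi
    # Keep a backup with the original permissions and timestamps.
    cp -p "$file" "$file.bak" || return 1
    awk '
        # vipca case: the export is followed by "fi", so unset goes after the fi.
        pending && /^[ \t]*fi[ \t]*$/ { print; print "unset LD_ASSUME_KERNEL"; pending = 0; next }
        # srvctl case: any other line follows, so unset goes directly after the export.
        pending { print "unset LD_ASSUME_KERNEL"; pending = 0 }
        { print }
        /^[ \t]*export LD_ASSUME_KERNEL/ { pending = 1 }
        END { if (pending) print "unset LD_ASSUME_KERNEL" }
    ' "$file.bak" > "$file"
}
```

It would be called once per file, for example patch_ld_assume_kernel /u01/app/oracle/product/10.2.0/crs/bin/vipca, and then again for srvctl. Comparing the result with the .bak copy before running root.sh is a sensible precaution.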

Running the root.sh script again fails with: Failed to upgrade Oracle Cluster Registry configuration
[root@rac2 crs]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration
This problem usually has one of the following causes:

(1) A bug
For this case, see:
Executing root.sh errors with "Failed To Upgrade Oracle Cluster Registry Configuration" [ID 466673.1]
(2) Permissions on the raw devices
[root@rac2 /]# ls -lrt /dev/raw/
total 0
crw-rw---- 1 oracle oinstall 162, 4 Nov 10 23:34 raw4
crw-rw---- 1 oracle oinstall 162, 3 Nov 10 23:34 raw3
crw-rw---- 1 root oinstall 162, 2 Nov 10 23:34 raw2
crw-rw---- 1 root oinstall 162, 1 Nov 10 23:34 raw1

The ownership and permissions should be set as follows:
chown root.oinstall /dev/raw/raw[1-2]
chown oracle.oinstall /dev/raw/raw[3-5]

chmod 775 /dev/raw/raw[1-2]
chmod 775 /dev/raw/raw[3-5]

These settings can be made in the udev rules.

(3) Read/write problems on the raw devices
For this case, see:
Placement of Voting disk and OCR Files in Oracle RAC 10g and 11gR1 [ID 293819.1]
Zero out the raw device:
[root@rac2 raw]# dd if=/dev/zero of=/dev/mapper/crsp6 bs=1M count=200
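
For cause (2) above, the ownership and mode set with chown and chmod are easy to lose after a reboot, so it may help to verify them before running root.sh. A minimal sketch, assuming GNU stat (as shipped with Red Hat Linux); check_owner_mode is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a path has the expected owner, group and octal mode.
# Usage: check_owner_mode <path> <owner> <group> <mode>
check_owner_mode() {
    local path="$1" want="$2:$3 $4" have
    # GNU stat: %U = owner name, %G = group name, %a = permission bits in octal.
    have=$(stat -c '%U:%G %a' "$path") || return 1
    if [ "$have" != "$want" ]; then
        echo "$path: expected $want, found $have" >&2
        return 1
    fi
    return 0
}
```

With the values used in this article, the check for the first device would be check_owner_mode /dev/raw/raw1 root oinstall 775, and for the third check_owner_mode /dev/raw/raw3 oracle oinstall 775.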

My case turned out to be the bug. Checking the raw device with clsfmt:
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: Received unexpected error 4 from skgfifi
skgfifi: Additional information: -2
Additional information: -1073741824

The failure here is caused by the bug.
Replace the clsfmt.bin file with the new one from the patch:
[root@rac2 4679769]# cp clsfmt.bin /u01/app/oracle/product/10.2.0/crs/bin/

Test again:
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

Running root.sh again reports the following errors:
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
You have mail in /var/spool/mail/root
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]

The fix is as follows:
[root@rac2 crs]# cd bin
[root@rac2 bin]# ./oifcfg iflist
eth0 192.168.1.0
eth1 192.168.1.0
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.1.0:public
[root@rac2 bin]# ./oifcfg setif -global eth1/192.168.1.0:cluster_interconnect
[root@rac2 bin]# ./oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.1.0 global cluster_interconnect
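
The listNetInterfaces error goes away once both a public and a cluster_interconnect interface are registered. As a small sketch, the oifcfg getif output could be checked for both roles before re-running root.sh. The helper name has_both_networks is hypothetical; it only reads text in the format shown above.

```shell
# Hypothetical helper: read `oifcfg getif` output on stdin and succeed only if
# at least one public and one cluster_interconnect interface are registered.
has_both_networks() {
    awk '
        $NF == "public"               { pub = 1 }
        $NF == "cluster_interconnect" { priv = 1 }
        # awk exit status 0 means success, so invert the combined flag.
        END { exit !(pub && priv) }
    '
}
```

Usage would be along the lines of ./oifcfg getif | has_both_networks || echo "interface registration incomplete".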

Patch files:
p4679769_10201_LINUX

p4679769_10201_Linux-x86-64

p4679769_10202_LINUX-zSer

Diagnosing why one of two HP rx6600 Oracle servers in a mutual-standby pair kept shutting down on its own

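One of two HP rx6600 servers running Oracle databases in a mutual-standby configuration frequently shut down by itself. I happened to run into it during a routine inspection, so I looked into the cause while I was there. The system log of the server that kept failing reads as follows: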

rx6600-1:[/]#cat /var/adm/syslog/syslog.log
Nov  6 10:40:35 rx6600-1 syslogd: restart
Nov  6 10:40:35 rx6600-1 vmunix: Found adjacent data tr.  Growing size.  0x32a6000 -> 0x72a6000.
Nov  6 10:40:35 rx6600-1 vmunix: Pinned PDK malloc pool: base: 0xe000000100d5a000  size=117400K
Nov  6 10:40:35 rx6600-1 vmunix: Loaded ACPI revision 2.0 tables.
Nov  6 10:40:35 rx6600-1 vmunix: MMIO on this platform supports Write Coalescing.
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix: MFS is defined: base= 0xe000000100d5a000  size= 5084 KB
Nov  6 10:40:35 rx6600-1 vmunix: Unpinned PDK malloc pool: base: 0xe000000108000000  size=393216K
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: cachefs_link(): File system was registered at index 5.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: GPX emcpgpx_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1  above message repeats 2 times
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: DM emcpgpx_dm_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: VLUMD emcpgpx_vlumd_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: XCRYPT emcpgpx_xcrypt_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: nfs3_link(): File system was registered at index 8.
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: mod_fs_reg: Cannot retrieve configured loading phase from KRS for module: cifs. Setting to load at INIT
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix: 0 sba
Nov  6 10:40:35 rx6600-1 vmunix: 0/0 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/0 rmp3f01
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/1 rmp3f01
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/2 asio0
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0 UsbOhci
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0.0 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Keyboard/USB/Boot/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.305.3.100.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0.0.0 UsbBootKeyboard
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Mouse/USB/Standard/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.307.3.10.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1 UsbOhci
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/hp/Multibay/0_a1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1008.294.161>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/MassStorage-SCSI/USB/BulkOnly/hp/Multibay/0_a1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.310.3.150.1008.294.161>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.0 UsbBulkOnlyMS
Nov  6 10:40:35 rx6600-1 vmunix: Devices/ScsiControllerAdaptor/USB/BulkOnly/hp/Multibay
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1000.3.150.1008.294>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16 UsbScsiAdaptor
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:36 rx6600-1  above message repeats 5 times
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Keyboard/USB/Boot/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.305.3.100.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1.0 UsbBootKeyboard
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Mouse/USB/Boot/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.307.3.100.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1.1 UsbBootMouse
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:36 rx6600-1  above message repeats 2 times
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/2 UsbEhci
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/4/0 gvid_core
Nov  6 10:40:35 rx6600-1 vmunix: 0/1 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/2 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/2/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/3 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/3/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/4 lba
Nov  6 10:40:35 rx6600-1 vmunix: sasd: Claimed HP PCI/PCI-X SAS MPT adapter at hardware path 0/4/1/0 
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0 sasd
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/2/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/2/1 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/5 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/5/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/6 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/6/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/7 lba
Nov  6 10:40:35 rx6600-1 vmunix: Initializing the Ultra320 SCSI Controller at 0/7/1/0. Controller firmware version is 01.03.35.70
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0 mpt
Nov  6 10:40:35 rx6600-1 vmunix: Initializing the Ultra320 SCSI Controller at 0/7/1/1. Controller firmware version is 01.03.35.70
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1 mpt
Nov  6 10:40:35 rx6600-1 vmunix: 120 processor
Nov  6 10:40:35 rx6600-1 vmunix: 121 processor
Nov  6 10:40:35 rx6600-1 vmunix: 122 processor
Nov  6 10:40:35 rx6600-1 vmunix: 123 processor
Nov  6 10:40:35 rx6600-1 vmunix: 124 processor
Nov  6 10:40:35 rx6600-1 vmunix: 125 processor
Nov  6 10:40:35 rx6600-1 vmunix: 126 processor
Nov  6 10:40:35 rx6600-1 vmunix: 127 processor
Nov  6 10:40:35 rx6600-1 vmunix: 250 pdh
Nov  6 10:40:35 rx6600-1 vmunix: 250/0 ipmi
Nov  6 10:40:35 rx6600-1 vmunix: 250/1 asio0
Nov  6 10:40:35 rx6600-1 vmunix: 250/2 acpi_node
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1 fcd_fcp
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1 fcd_fcp
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0 sasd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: Boot device's HP-UX HW path is: 0/4/1/0.0.0.0.0
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix:     System Console is on the Built-In Serial Interface
Nov  6 10:40:35 rx6600-1 vmunix: iether0: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/2/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether1: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/3/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether2: INITIALIZING HP AB352-60003 PCI/PCI-X 1000Base-T Dual-port Core at hardware path 0/4/2/0
Nov  6 10:40:35 rx6600-1 vmunix: iether4: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/5/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether5: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/6/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether3: INITIALIZING HP AB352-60003 PCI/PCI-X 1000Base-T Dual-port Core at hardware path 0/4/2/1
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x3 configured as ROOT
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x2 configured as SWAP
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x2 configured as DUMP
Nov  6 10:40:35 rx6600-1 vmunix:     Swap device table:  (start & size given in 512-byte blocks)
Nov  6 10:40:35 rx6600-1 vmunix:         entry 0 - major is 64, minor is 0x2; start = 0, size = 16777216
Nov  6 10:40:35 rx6600-1 vmunix:     Dump device table:  (start & size given in 1-Kbyte blocks)
Nov  6 10:40:35 rx6600-1 vmunix:         entry 0000000000000000 - major is 31, minor is 0x30000; start = 2349940, size = 8388604
Nov  6 10:40:35 rx6600-1 vmunix: Starting the STREAMS daemons-phase 1
Nov  6 10:40:35 rx6600-1 vmunix: Create STCP device files
Nov  6 10:40:35 rx6600-1 vmunix: Starting the STREAMS daemons-phase 2
Nov  6 10:40:35 rx6600-1 vmunix:      $Revision: vmunix:    B11.23_LR FLAVOR=perf Fri Aug 29 22:35:38 PDT 2003 $
Nov  6 10:40:35 rx6600-1 vmunix: Memory Information:
Nov  6 10:40:35 rx6600-1 vmunix:     physical page size = 4096 bytes, logical page size = 4096 bytes
Nov  6 10:40:35 rx6600-1 vmunix:     Physical: 25133536 Kbytes, lockable: 18994328 Kbytes, available: 22051156 Kbytes
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:36 rx6600-1 nettl[832]: nettl starting up.
Nov  6 10:40:48 rx6600-1 sshd[986]: Server listening on :: port 22.
Nov  6 10:40:48 rx6600-1 sshd[986]: Server listening on 0.0.0.0 port 22.
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found CLTS loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found COTS loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found COTS ORD loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for udp
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for tcp
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticlts
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticotsord
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticots
Nov  6 10:40:50 rx6600-1 inetd[1100]: Reading configuration
Nov  6 10:40:50 rx6600-1 inetd[1100]: ftp/tcp: Added service, server /usr/lbin/ftpd
Nov  6 10:40:50 rx6600-1 inetd[1100]: telnet/tcp: Added service, server /usr/lbin/telnetd
Nov  6 10:40:50 rx6600-1 inetd[1100]: tftp/udp: Added service, server /usr/lbin/tftpd
Nov  6 10:40:50 rx6600-1 inetd[1100]: login/tcp: Added service, server /usr/lbin/rlogind
Nov  6 10:40:50 rx6600-1 inetd[1100]: shell/tcp: Added service, server /usr/lbin/remshd
Nov  6 10:40:50 rx6600-1 inetd[1100]: exec/tcp: Added service, server /usr/lbin/rexecd
Nov  6 10:40:50 rx6600-1 inetd[1100]: ntalk/udp: Added service, server /usr/lbin/ntalkd
Nov  6 10:40:50 rx6600-1 inetd[1100]: auth/tcp: Added service, server /usr/lbin/identd
Nov  6 10:40:50 rx6600-1 inetd[1100]: printer/tcp: Added service, server /usr/sbin/rlpdaemon
Nov  6 10:40:51 rx6600-1 inetd[1100]: daytime/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: daytime/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: time/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: echo/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: echo/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: discard/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: discard/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: chargen/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: chargen/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: kshell/tcp: Added service, server /usr/lbin/remshd
Nov  6 10:40:51 rx6600-1 inetd[1100]: klogin/tcp: Added service, server /usr/lbin/rlogind
Nov  6 10:40:51 rx6600-1 inetd[1100]: dtspc/tcp: Added service, server /usr/dt/bin/dtspcd
Nov  6 10:40:51 rx6600-1 inetd[1100]: recserv/tcp: Added service, server /usr/lbin/recserv
Nov  6 10:40:51 rx6600-1 inetd[1100]: swat/tcp: Added service, server /opt/samba/bin/swat
Nov  6 10:40:51 rx6600-1 inetd[1100]: registrar/tcp: Added service, server /etc/opt/resmon/lbin/registrar
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-probe/tcp: Added service, server /opt/cmom/lbin/cmomd
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-cfg/udp: Added service, server /usr/lbin/cmclconfd
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-cfg/tcp: Added service, server /usr/lbin/cmclconfd
Nov  6 10:40:51 rx6600-1 inetd[1100]: instl_boots/udp: Added service, server /opt/ignite/lbin/instl_bootd
Nov  6 10:40:51 rx6600-1 inetd[1100]: omni/tcp: Added service, server /opt/omni/lbin/inet
Nov  6 10:40:51 rx6600-1 inetd[1100]: rpc.cmsd/udp: Added service, server /usr/dt/bin/rpc.cmsd
Nov  6 10:40:51 rx6600-1 inetd[1100]: rpc.ttdbserver/tcp: Added service, server /usr/dt/bin/rpc.ttdbserver
Nov  6 10:40:51 rx6600-1 inetd[1100]: Configuration complete
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: restore  (user ID real=0 effective=0)
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: config  (user ID real=0 effective=0)
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: save  (user ID real=0 effective=0)
Nov  6 10:40:54 rx6600-1 su: + tty?? root-sfmdb
Nov  6 10:41:06 rx6600-1 cimserver[1706]: starting
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10026:  THE CIM SERVER IS LISTENING ON HTTPS PORT 5,989.
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10028: THE CIM SERVER IS LISTENING ON THE LOCAL CONNECTION SOCKET.
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10030:  STARTED HP-UX WBEM Services VERSION A.02.07.
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_japanese/100dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_japanese/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_korean/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Cannot initialize font path element: "/usr/lib/X11/fonts/hp_chinese_t/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/ttfjpn.st"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/ifojpn.st"
Nov  6 10:41:34 rx6600-1 pwgrd: Started at Thu Nov  6 10:41:34 2014, pid = 1798
Nov  6 10:41:34 rx6600-1 diagmond[1833]: started
Nov  6 10:41:34 rx6600-1 /usr/sbin/envd[1837]: VXPBFt6/, 2"6A3vEdVCND< ~
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2180]: Setting STREAMS-HEAD high water value to 131072.
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one mpctl succeeded: ncpus = 8.
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one pmap 2
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one pmap 3
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: nfsd do_one bind 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: nfsd do_one bind 1
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: nfsd do_one bind 2
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: nfsd do_one bind 3
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: nfsd do_one bind 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: nfsd do_one bind 5
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one bind 7
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: nfsd 5 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2197]: nfsd 5 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: nfsd 3 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: nfsd 2 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2200]: nfsd 2 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: nfsd 1 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2201]: nfsd 1 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2199]: nfsd 3 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: nfsd 4 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2202]: nfsd 4 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd 7 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2219]: nfsd 7 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: nfsd do_one bind 6
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: nfsd 0 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2220]: nfsd 0 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: nfsd 6 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2221]: nfsd 6 0  sock 4
Nov  6 10:41:53 rx6600-1 krsd[2300]: Delay time is 300 seconds
Nov  6 10:41:53 rx6600-1 sfd[2301]: daemon already running.
Nov  6 10:41:54 rx6600-1 sfd[2314]: starting the daemon.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: New event pair [0] (2,4,60)
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: New event pair [1] (20,40,300)
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SetLogMask:: EventLogMask set to 0x66 
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: Using hostname localhost community public debug 0
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: Daemon created successfully.  Starting it now
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SNMP trap processing disabled.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: PP Remote Management disabled.
Nov  6 10:45:17 rx6600-1 vmunix: emcp:Mpx:Info: PowerPath Auto Host Registration on VNX-FCN00125000137 is unavailable: incompatible initiator information received from the array
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: ***** 9} HH AY =g >/ 8f *****
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: NB6H3,9}U}3#9$WwAY=gV5, P^U}9}HHLu< ~!#
Nov  6 10:45:42 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "MAJORWARNING (3)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")   
 Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 
Nov  6 10:49:14 rx6600-1 EMS [2928]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ipmi_fpl/ipmi_fpl"     (Threshold:  >= " 3")    
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 191889410 -r /system/events/ipmi_fpl/ipmi_fpl -n 191889409 -a 
Nov  6 18:48:12 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641922 -a 
Nov  6 19:00:00 rx6600-1 su: + tty?? root-oracle
Nov  7 08:00:00 rx6600-1 su: + tty?? root-root

The log excerpt below shows that the server already had a problem, and it points to a command that can be run to see the details:
/opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a

Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SNMP trap processing disabled.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: PP Remote Management disabled.
Nov  6 10:45:17 rx6600-1 vmunix: emcp:Mpx:Info: PowerPath Auto Host Registration on VNX-FCN00125000137 is unavailable: incompatible initiator information received from the array
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: ***** 9} HH AY =g >/ 8f *****
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: NB6H3,9}U}3#9$WwAY=gV5, P^U}9}HHLu< ~!#
Nov  6 10:45:42 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "MAJORWARNING (3)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")   
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 
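
In a long syslog these suggested commands are easy to miss. A minimal sketch for pulling them out is shown here; ems_resdata_commands is a hypothetical helper name, and it assumes the notifications use the same "resdata -R ... -r ... -n ... -a" layout as in this log. It uses only sed and sort, so it should not depend on GNU-specific options.

```shell
# Hypothetical helper: print each distinct resdata command that the
# EMS notifications in a syslog file suggest running.
# Usage: ems_resdata_commands <syslog_file>
ems_resdata_commands() {
    # Keep only the command itself from every line that contains one,
    # then drop duplicates.
    sed -n 's|.*\(/opt/resmon/bin/resdata -R [0-9]* -r [^ ]* -n [0-9]* -a\).*|\1|p' "$1" | sort -u
}
```

For the log above, ems_resdata_commands /var/adm/syslog/syslog.log would list the commands for the ia64_corehw and ipmi_fpl events, one per line.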

Run /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a to view the details:

rx6600-1:[/]#/opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 

ARCHIVED MONITOR DATA:

Event Time..........: Thu Nov  6 10:45:42 2014
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011              
System..............: rx6600-1

Summary:
     System temperature is out of normal range. 


Description of Error:

     The system temperature is not within normal operating range. It is higher
     than required operating range.

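This error description says that the system temperature is above the normal operating range. The output below lists the probable causes: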

Probable Cause / Recommended Action:

     Something may be blocking the cooling intakes of the fans. Check for
     obstruction.
     One or more fans may be operating at lower speed than normal. Check the
     fan performance.

     Check for problems with the room air conditioning.

     If the problem is not fixed, the operating temperature may become
     non-recoverable, in which case there are chances that the hardware may be
     damaged.  At that temperature level, on Integrity servers, the firmware
     will shutdown the system automatically. However on HP 9000 servers, the
     action specified in the envd config file will be taken - which may be to
     shutdown the system automatically.

     For information on the sensor that generated this event, refer to FRU ID
     in Event Details section.

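The output above says that something may be blocking the fan intakes, that one or more fans may be running slower than normal, or that the room air conditioning may have a problem. If the problem is not fixed, the temperature may reach a non-recoverable level at which the hardware can be damaged; on Integrity servers such as the rx6600 the firmware then shuts the system down automatically, which is consistent with the shutdowns observed here. The diagnostic data for the event follows: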

Additional Event Data: 
     System IP Address...: 10.138.129.5
     Event Id............: 0x545ae0d600000000
     Monitor Version.....: B.01.00
     Event Class.........: System
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_ia64_corehw.clcfg 
     Client Configuration File Version...: A.01.00 
          Qualification criteria met.
               Number of events..: 1 
     Associated OS error log entry id(s): 
          None
     Additional System Data:
          System Model Number.............: ia64 hp server rx6600 
          EMS Version.....................: A.04.20 
          STM Version.....................: C.58.00 
          System Serial Number............: SGH48045VY 
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#101011

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

     Event Date .............: Thu Nov  6 10:44:08 2014
     Sensor Number ..........: 0xdb
     Sensor Type ............: Temperature
     Sensor Class ...........: Threshold based
     Sensor Reading/Offset...: 0x07 (Offset)
     Event  Type.............: Assertion
     Entity ID ..............: 3
     Generic Message.........: 
       Temperature :  Upper non-critical - going high 
     Entity FRU Id Info......: 
       processor (Sensor ID: Processor 2)

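The Event Details show that the sensor type is temperature, the sensor class is threshold based, and the event type is an assertion: the temperature of processor 2 has risen above the upper non-critical threshold. A check ruled out the machine-room air conditioning and blocked vents, so the server vendor needs to be contacted for further inspection to find out why the CPU temperature exceeds the threshold, given that CPU utilization is normally only 10%.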
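
The resdata output is long, and only a few fields were needed for the diagnosis above. As a rough sketch, they could be condensed into one line with awk. The function name resdata_summary is hypothetical, and the field labels are taken from the output shown in this article; other monitors may label their fields differently.

```shell
# Hypothetical helper: condense resdata output (read on stdin) into one line:
# event time | severity | summary | generic message | FRU
resdata_summary() {
    awk '
        /^Event Time/        { sub(/^[^:]*: */, ""); time = $0 }
        /^Severity/          { sub(/^[^:]*: */, ""); sev = $0 }
        # For these three labels the value is on the line after the label.
        /^Summary:/          { getline; gsub(/^ +| +$/, ""); summary = $0 }
        /Generic Message/    { getline; gsub(/^ +| +$/, ""); msg = $0 }
        /Entity FRU Id Info/ { getline; gsub(/^ +| +$/, ""); fru = $0 }
        END { printf "%s | %s | %s | %s | %s\n", time, sev, summary, msg, fru }
    '
}
```

It would be used as a filter, for example: /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a | resdata_summary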