但行好事莫问前程

Oracle RAC修改public,private,vip scan IP

Posted on 2014年11月18日2014年11月18日 by YongJing(敬勇)

oracle rac修改IP涉及到OCR,在执行修改前要进行一个备份，这样即使修改失败，也能还原回来,如何备份可以参考
oracle-rac-ocr-和-votingdisk-的备份与恢复/
实验环境:redhat 5.4 oracle 10g rac,oracle 11g rac
原来两节点rac IP设置如下:
192.168.56.11 jy1
192.168.56.12 jy2
192.168.56.13 jy1-priv
192.168.56.14 jy2-priv
192.168.56.15 jy1-vip
192.168.56.16 jy2-vip
现在修改为
192.168.56.111 jy1
192.168.56.112 jy2
192.168.100.1 jy1-priv
192.168.100.2 jy2-priv
192.168.56.113 jy1-vip
192.168.56.114 jy2-vip

一．停止oracle相关的所有进程, 包括数据库, asm, node application, crs
1.1 查看当前系统上crs运行的状态

[root@jy1 bin]# cd /u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# pwd
/u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2

1.2 关闭数据库

[root@jy1 bin]# ./srvctl stop database -d jyrac

1.3 关闭asm实例

[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2

1.4 关闭其他应用程序

[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

1.5 关闭crs后台进程, 在操作系统一级中止运行的crs后台进程, 必须在所有节点上运行.

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

二修改操作系统的ip设置
2.1 修改 /etc/hosts 文件。保持2个节点的一致

[root@jy1 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip

[root@jy2 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip

2.2 在两个节点上分别修改eth0和eth1地址：

[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:54:5a:6a
NETMASK=255.255.255.0
IPADDR=192.168.56.111
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes



[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:41:5d:e5
NETMASK=255.255.255.0
IPADDR=192.168.56.112
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes


[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
IPADDR=192.168.100.1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:c3:b7:37
NETMASK=255.255.255.0
IPADDR=192.168.100.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
GATEWAY=192.168.56.1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:4a:e3:cb
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=192.168.100.2
USERCTL=no
IPV6INIT=no
PEERDNS=yes

2.3 在两个节点上分别重启网络服务，是修改的IP生效

[root@jy1 bin]# service network restart

[root@jy2 bin]# service network restart

三启动crs, 设置oracle中ip地址相关的设置
3.1 在两个节点启动crs, 并关闭随crs启动的所有服务

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

由于oracle所有服务设置为自动启动, 所以在crs启动时会试图启动所有的服务, 但是在对oracle相关的ip地址进行设置时需要crs处于运行状态. 但是数据库, asm和node application处于停止状态, 在一个节点执行以下命令来停止所有服务：

[root@jy1 bin]# ./srvctl stop database -d jyrac
[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2
[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE               
ora....Y1.lsnr application    OFFLINE   OFFLINE               
ora.jy1.gsd    application    OFFLINE   OFFLINE               
ora.jy1.ons    application    OFFLINE   OFFLINE               
ora.jy1.vip    application    OFFLINE   OFFLINE               
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....Y2.lsnr application    OFFLINE   OFFLINE               
ora.jy2.gsd    application    OFFLINE   OFFLINE               
ora.jy2.ons    application    OFFLINE   OFFLINE               
ora.jy2.vip    application    OFFLINE   OFFLINE               
ora.jyrac.db   application    OFFLINE   OFFLINE               
ora....c1.inst application    OFFLINE   OFFLINE               
ora....c2.inst application    OFFLINE   OFFLINE

3.2 使用oifcfg修改网卡设置, oifconfig可以用来设置和查看网卡使用的方式(如果修改的网段相同可以不做这一步)
3.2.1查看当前配置：

[root@jy1 bin]# ./oifcfg getif -global
eth0  192.168.56.0  global  public
eth1  192.168.56.0  global  cluster_interconnect

3.2.2删除当前配置

[root@jy1 bin]# ./oifcfg delif -global eth0
[root@jy1 bin]# ./oifcfg delif -global eth1
[root@jy1 bin]# ./oifcfg getif

3.2.3重新添加

[root@jy1 bin]# ./oifcfg setif -global eth0/192.168.56.0:public
[root@jy1 bin]# ./oifcfg setif -global eth1/192.168.100.0:cluster_interconnect
[root@jy1 bin]# ./oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.100.0  global  cluster_interconnect
[root@jy1 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

[root@jy2 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

这里IP 地址最后一个为0,代表的是一个网段。修改的时候要切记。否在在启动OCR 时会报如下错误：[ CRSOCR][4054413904] OCR context init failure. Error: PROC-44: 网络地址和网络接口操作中出错网络地址和网络接口操作错误 [7]

3.3 修改VIP

[root@jy1 bin]# ./srvctl modify nodeapps -n jy1 -A 192.168.56.113/255.255.255.0/eth0

[root@jy1 bin]# ./srvctl modify nodeapps -n jy2 -A 192.168.56.114/255.255.255.0/eth0

3.4 设置listener.ora和tnsnames.ora, 检查这些文件中是否有指定原来ip的地方,
修改为更改后的ip地址(注意:如果你使用了ocfs，修改ocfs配置文件(/etc/ocfs/cluster.conf),
验证修改后是否可用)

在节点jy1上没有使用原来的固定IP不用修改
[root@jy1 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy1 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy1
# Generated by Oracle configuration tools.

LISTENER_JY1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy1-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 127.0.0.1)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

节点jy2使用了原来的IP192.168.56.12所以将其修改为192.168.56.112
[root@jy2 bin]# cat /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.12)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

[root@jy2 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.112)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

3.5 启动node applications, asm, 数据库
[root@jy1 bin]# ./srvctl start nodeapps -n jy1
[root@jy1 bin]# ./srvctl start nodeapps -n jy2
[root@jy1 bin]# ./srvctl start asm -n jy1
[root@jy1 bin]# ./srvctl start asm -n jy2
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac1
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac2

[root@jy1 bin]# ./crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE jy1
ora….Y1.lsnr application ONLINE ONLINE jy1
ora.jy1.gsd application ONLINE ONLINE jy1
ora.jy1.ons application ONLINE ONLINE jy1
ora.jy1.vip application ONLINE ONLINE jy1
ora….SM2.asm application ONLINE ONLINE jy2
ora….Y2.lsnr application ONLINE ONLINE jy2
ora.jy2.gsd application ONLINE ONLINE jy2
ora.jy2.ons application ONLINE ONLINE jy2
ora.jy2.vip application ONLINE ONLINE jy2
ora.jyrac.db application ONLINE ONLINE jy2
ora….c1.inst application ONLINE ONLINE jy1
ora….c2.inst application ONLINE ONLINE jy2

3.6 确认IP, 修改完成。
[root@jy1 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.111 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe54:5a6a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:515539 errors:0 dropped:0 overruns:0 frame:0
TX packets:546714 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:294661949 (281.0 MiB) TX bytes:357427245 (340.8 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.113 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:C3:B7:37
inet addr:192.168.100.1 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec3:b737/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21818 errors:0 dropped:0 overruns:0 frame:0
TX packets:32288 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10823304 (10.3 MiB) TX bytes:26553389 (25.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:180050 errors:0 dropped:0 overruns:0 frame:0
TX packets:180050 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:57676272 (55.0 MiB) TX bytes:57676272 (55.0 MiB)

[root@jy2 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.112 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe41:5de5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:545442 errors:0 dropped:0 overruns:0 frame:0
TX packets:514967 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:356961066 (340.4 MiB) TX bytes:295147830 (281.4 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.114 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:4A:E3:CB
inet addr:192.168.100.2 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe4a:e3cb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32190 errors:0 dropped:0 overruns:0 frame:0
TX packets:21813 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:26535297 (25.3 MiB) TX bytes:10821634 (10.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:163867 errors:0 dropped:0 overruns:0 frame:0
TX packets:163867 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30461066 (29.0 MiB) TX bytes:30461066 (29.0 MiB)

从输出结果可知修改成功.

四修改oracle 11g rac的scan IP,我这里没有使用DNS，只设置了一个scan IP:10.138.130.155现将其修改为10.138.130.159（因为11g rac修改public,private,vip IP与上面10g rac一样不再说明）

root@jyrac1 ~]# cat  /etc/hosts
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.155 jyrac-scan

4.1在所有节点中 /etc/hosts 文件中修改 scan 对应的ip

[root@jyrac1 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan
~
[root@jyrac2 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan

4.2 查看scan IP信息

[grid@jyrac1 ~]$ srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.155

4.3 停止scan_listener ，scan 应用

[grid@jyrac2 ~]$ srvctl stop scan_listener
[grid@jyrac2 ~]$ srvctl stop scan

4.4 确认 scan_listener,scan 的状态

[grid@jyrac1 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
[grid@jyrac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running

[grid@jyrac1 ~]$ crs_stat -t | grep  scan  
ora.scan1.vip  ora....ip.type OFFLINE   OFFLINE               
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type OFFLINE   OFFLINE               
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2

4.5 修改scan 名称

[grid@jyrac1 ~]$ srvctl modify scan -h

Modifies the SCAN name.

Usage: srvctl modify scan -n 
    -n            Domain name qualified SCAN name 域名限定的 SCAN 名
    -h                       Print usage 输出帮助信息

grid 用户无权更改scan ip，必须使用root权限

[grid@jyrac2 ~]$ srvctl modify scan -n jyrac-scan
PRCS-1034 : Failed to modify Single Client Access Name jyrac-scan
PRCR-1071 : Failed to register or update resource type ora.scan_vip.type
CRS-0245:  User doesn't have enough privilege to perform the operation

[root@jyrac1 bin]# cd /u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# pwd
/u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# ./srvctl modify scan -n jyrac-scan

4.6 变更后，进行确认

[root@jyrac1 bin]# ./srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.159

4.7 重新启动scan，scan_listener

[grid@jyrac1 ~]$ srvctl start scan
[grid@jyrac1 ~]$ srvctl start scan_listener

4.8 确认scan 和scan_listener.ora的状态

[grid@jyrac1 ~]$ crs_stat -t | grep  scan
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    jyrac1      
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2

4.9 进行ping测试

[grid@jyrac2 ~]$ ping 10.138.130.159
PING 10.138.130.159 (10.138.130.159) 56(84) bytes of data.
64 bytes from 10.138.130.159: icmp_seq=1 ttl=64 time=1.88 ms
64 bytes from 10.138.130.159: icmp_seq=2 ttl=64 time=0.166 ms

修改scan IP与修改private ip ，vip 不一样，修改scan ip 无需停止数据库实例，asm 或者重启crs比较简单

Oracle RAC OCR 和 VotingDisk 的备份与恢复

Posted on 2014年11月18日 by YongJing(敬勇)

一:Voting Disk
Voting Disk 这个文件主要用于记录节点成员状态，在出现脑裂时，决定那个Partion获得控制权，其他的Partion必须从集群中剔除。Voting disk使用的是一种“多数可用算法”，如果有多个Voting disk，则必须一半以上的Votedisk同时使用，Clusterware才能正常使用。比如配置了4个Votedisk，坏一个Votedisk，集群可以正常工作，如果坏了2个，则不能满足半数以上，集群会立即宕掉，
所有节点立即重启，所以如果添加Votedisk，尽量不要只添加一个，而应该添加2个。这点和OCR 不一样。OCR 只需配置一个。

1.1查看votedisk的位置：

[root@jy1 ~]# cd u01/app/oracle/product/10.2.0/crs/bin

[root@jy1 bin]# ./crsctl query css votedisk
 0.     0    /dev/raw/raw2

located 1 votedisk(s).

1.2备份votedisk盘：

[root@jy1 bin]# dd if=/dev/raw/raw2 of=/home/oracle/votedisk.bak
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s

1.3通过Strings 命令来查看 voting disk 的内容

[root@jy1 bin]# strings /home/oracle/votedisk.bak | sort -u
fSLC
ssLckcoT
SslcLlik
sSlcrEp0
}|{z

1.4恢复votedisk盘：

[root@jy1 bin]# dd if=/home/oracle/votedisk.bak of=/dev/raw/raw2
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s

二 :OCR
Oracle Clusterware把整个集群的配置信息放在共享存储上,这些信息包括了集群节点的列表,集群数据库实例到节点的映射以及CRS应用程序资源信息。存放的位置就在OCR Disk上. 在整个集群中，只有一个节点能对OCR Disk 进行读写操作，这个节点叫作Master Node，所有节点都会在内存中保留一份OCR的拷贝，同时有一个OCR Process 从这个内存中读取内容。 OCR 内容发生改变时，由Master Node的OCR Process负责同步到其他节点的OCR Process。

Oracle 每4个小时对其做一次备份，并且保留最后的3个备份，以及前一天，前一周的最后一个备份。这个备份由Master Node CRSD进程完成，备份的默认位置是$CRS_HOME/crs/cdata/目录下，可以通过ocrconfig -backuploc 命令修改到新的目录。每次备份后，备份文件名自动更改，以反应备份时间顺序，最近一次的备份
叫作backup00.ocr。这些备份文件除了保存在本地，DBA还应该在其他存储设备上保留一份，以防止意外的存储故障。

[root@jy1 crs]# pwd
/u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 crs]# ls -lrt
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

在安装clusterware过程中，如果选择External Redundancy冗余方式，则只能输入一个OCR磁盘位置。但是Oracle允许配置两个OCR 磁盘互为镜像，以防止OCR 磁盘的单点故障。 OCR 磁盘和Votedisk磁盘不一样，OCR磁盘最多只能有两个，一个Primary OCR 和一个Mirror OCR。

Oracle 推荐在对集群做调整时，比如增加，删除节点之前，修改RAC IP之前，对OCR做一个备份，可以使用export 备份到指定文件，如果做了replace或者restore 等操作，Oracle 建议使用 cluvfy comp ocr -n all 命令来做一次全面的检查。对OCR的备份与恢复，我们可以使用ocrconfig 命令。

[root@jy1 bin]# ./ocrconfig --help
Name:
        ocrconfig - Configuration tool for Oracle Cluster Registry.

Synopsis:
        ocrconfig [option]
        option:
                -export  [-s online]
                                                    - Export cluster register contents to a file
                -import                   - Import cluster registry contents from a file
                -upgrade [ []]
                                                    - Upgrade cluster registry from previous version
                -downgrade [-version ]
                                                    - Downgrade cluster registry to the specified version
                -backuploc                 - Configure periodic backup location
                -showbackup                         - Show backup information
                -restore                  - Restore from physical backup
                -replace ocr|ocrmirror [] - Add/replace/remove a OCR device/file
                -overwrite                          - Overwrite OCR configuration on disk
                -repair ocr|ocrmirror     - Repair local OCR configuration
                -help                               - Print out this help information

Note:
        A log file will be created in
        $ORACLE_HOME/log//client/ocrconfig_.log. Please ensure
        you have file creation privileges in the above directory before
        running this tool.

1. 用导出导入备份恢复OCR

1.1首先关闭所有节点的CRS

[root@jy1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@jy2 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

1.2用root 用户导出OCR内容

[root@jy1 bin]# ./ocrconfig -export /u01/ocrbak.exp
[root@jy1 bin]# ls -lrt /u01
total 96
drwxr-xr-x 3 root root  4096 Nov 10 23:12 app
drwxrwxrwx 6 root root  4096 Nov 11 11:54 tmp
-rw-r--r-- 1 root root 84375 Nov 17 16:52 ocrbak.exp

1.3重启CRS

[root@jy1 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

[root@jy2 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

1.4检查CRS 状态

Cannot communicate with EVM 
[root@jy1 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy2 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy2         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2

1.5 检查OCR一致性

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.6破坏OCR内容

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

1.7再次检查OCR一致性

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

再来执行crs_stat -t命令就会发现crs已经终止了

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

1.8使用cluvfy 工具检查一致性

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

1.9使用Import 恢复OCR 内容(使用restore选项只能导入OCR自动产生的物理备份， import选项只能导入通过export选项导出的的逻辑备份)

[root@jy1 bin]#  ./ocrconfig -import /u01/ocrbak.exp

1.10 再次检查OCR

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.11 使用cluvfy工具检查

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

2使用自动备份恢复OCR
2.1关闭运行在集群数据库的所有节点上的CRS服务程序(在Oracle 11gR2 中已经没有了init.crs 命令了。只能通过crsctl stop crs命令来关闭CRS.)
/etc/init.d/init.crs stop 或者crsctl stop crs

2.2 通过ocrconfig 的showbackup选项查看最近的备份

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

2.2通过ocrconfig 的showbackup选项查看最近的备份

[root@jy1 bin]# ./ocrconfig -showbackup

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

          

          

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 bin]# ls -lrt /u01/app/oracle/product/10.2.0/crs/cdata/crs
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

2.3破坏OCR内容

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

2.4再次检查OCR一致性

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

再来执行crs_stat -t命令就会发现crs已经终止了

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

2.5使用cluvfy 工具检查一致性

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

2.6通过ocrconfig的restore或import选项导入OCR数据(使用restore选项只能导入OCR自动产生的物理备份,import选项只能导入通过export选项导出的的逻辑备份)
ocrconfig -restore filename_location

                         
[root@jy1 bin]# ./ocrconfig  -restore  /u01/app/oracle/product/10.2.0/crs/cdata/crs/backup00.ocr

2.7 检查CRS

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1387716561
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

2.8 使用cluvfy工具检查

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

2.9 在所有节点上重新启动CRS
/etc/init.d/init.crs start 而在Oracle 11gR2使用:crsctl start crs 命令来启动CRS.

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2

Linux 5.4在安装rac执行root.sh Failed to upgrade Oracle Cluster Registry configuration故障处理

Posted on 2014年11月12日 by YongJing(敬勇)

因为oracle 10g版权被全省买断，11g没有买版权所以朋友要在Red hat Linux 5.4下使用裸设备来安装oracle 10g rac数据库。但在安装过程出现了一些错误到节点2上用root用户身份执行/u01/app/oracle/product/10.2.0/crs下的root.sh：
/u01/app/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

节点2上执行root.sh之前需要编辑两个文件，这两个文件都位于/u01/app/oracle/product/10.2.0/crs/bin下，文件名分别为vipca和srvctl。首先编辑vipca文件，找到如下内容：
Remove this workaround when the bug 3937317 is fixed arch=`uname -m`

if [ “$arch” = “i686” -o “$arch” = “ia64” ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
#End workaround
在fi 后新添加一行：
unset LD_ASSUME_KERNEL
然后编辑srvctl文件，找到如下内容：
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
同样在其后新增加一行： unset LD_ASSUME_KERNEL

同时需要你改的还有节点1上的$ORACLE_HOME/bin/srvctl 文件，不然等装完数据库之后，srvctl 命令也是会报这个错误地。

再次执行root.sh脚本报错：Failed to upgrade Oracle Cluster Registry configuration
[root@rac2 crs]#./u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01’ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration
一般出现这个问题的原因有几种：

（1） Bug
这种情况参考：
Executing root.sh errors with “Failed To Upgrade Oracle Cluster Registry Configuration” [ID 466673.1]
（2） raw 设备的权限问题
[root@rac2 /]# ls -lrt /dev/raw/
total 0
crw-rw—- 1 oracle oinstall 162, 4 Nov 10 23:34 raw4
crw-rw—- 1 oracle oinstall 162, 3 Nov 10 23:34 raw3
crw-rw—- 1 root oinstall 162, 2 Nov 10 23:34 raw2
crw-rw—- 1 root oinstall 162, 1 Nov 10 23:34 raw1

权限要设置成如下：
chown root.oinstall /dev/raw/raw[1-2]
chown oracle.oinstall /dev/raw/raw[3-5]

chmod 775 /dev/raw/raw[1-2]
chmod 775 /dev/raw/raw[3-5]

可以在udev的规则里设置

（3） raw 设备读写问题
这种情况参考：
Placement of Voting disk and OCR Files in Oracle RAC 10g and 11gR1 [ID 293819.1]
清空一下RAW:
[root@rac2 raw]# dd if=/dev/zero of=/dev/mapper/crsp6 bs=1M count=200

我这里是属于bug用clsfmt检查raw 设备
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: Received unexpected error 4 from skgfifi
skgfifi: Additional information: -2
Additional information: -1073741824

这里没有成功是bug
用Pathch中的新文件替换clsfmt.bin 文件
[root@rac2 4679769]# cp clsfmt.bin /u01/app/oracle/product/10.2.0/crs/bin/

再次测试：
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

再次执行root.sh报如上错误:
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
You have mail in /var/spool/mail/root
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]

解决方式如下：
[root@rac2 crs]# cd bin
[root@rac2 bin]# ./oifcfg iflist
eth0 192.168.1.0
eth1 192.168.1.0
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.1.0:public
[root@rac2 bin]# ./oifcfg setif -global eth1/192.168.1.0:cluster_interconnect
[root@rac2 bin]# ./oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.1.0 global cluster_interconnect

补丁文件:
p4679769_10201_LINUX

p4679769_10201_Linux-x86-64

p4679769_10202_LINUX-zSer

hp rx6600两台oracle双机互备服务器其中一台经常自动关机的故障诊断

Posted on 2014年11月7日2014年11月7日 by YongJing(敬勇)

hp rx6600两台oracle数据库双机互备服务器其中一台经常自动关机，刚好在做巡检时遇到了就顺便检查一下原因.检查经常出故障的一台小机日志信息如下:

rx6600-1:[/]#cat /var/adm/syslog/syslog.log
Nov  6 10:40:35 rx6600-1 syslogd: restart
Nov  6 10:40:35 rx6600-1 vmunix: Found adjacent data tr.  Growing size.  0x32a6000 -> 0x72a6000.
Nov  6 10:40:35 rx6600-1 vmunix: Pinned PDK malloc pool: base: 0xe000000100d5a000  size=117400K
Nov  6 10:40:35 rx6600-1 vmunix: Loaded ACPI revision 2.0 tables.
Nov  6 10:40:35 rx6600-1 vmunix: MMIO on this platform supports Write Coalescing.
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix: MFS is defined: base= 0xe000000100d5a000  size= 5084 KB
Nov  6 10:40:35 rx6600-1 vmunix: Unpinned PDK malloc pool: base: 0xe000000108000000  size=393216K
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: cachefs_link(): File system was registered at index 5.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: GPX emcpgpx_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1  above message repeats 2 times
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: DM emcpgpx_dm_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: VLUMD emcpgpx_vlumd_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: emcp:GPX:Info: XCRYPT emcpgpx_xcrypt_install() success.
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: nfs3_link(): File system was registered at index 8.
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: mod_fs_reg: Cannot retrieve configured loading phase from KRS for module: cifs. Setting to load at INIT
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix: 0 sba
Nov  6 10:40:35 rx6600-1 vmunix: 0/0 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/0 rmp3f01
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/1 rmp3f01
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/1/2 asio0
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0 UsbOhci
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0.0 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Keyboard/USB/Boot/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.305.3.100.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/0.0.0 UsbBootKeyboard
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Mouse/USB/Standard/hp/Unknown/0_1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.307.3.10.1008.4390.1>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1 UsbOhci
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/hp/Multibay/0_a1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1008.294.161>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/MassStorage-SCSI/USB/BulkOnly/hp/Multibay/0_a1
Nov  6 10:40:35 rx6600-1 vmunix:  <2.310.3.150.1008.294.161>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.0 UsbBulkOnlyMS
Nov  6 10:40:35 rx6600-1 vmunix: Devices/ScsiControllerAdaptor/USB/BulkOnly/hp/Multibay
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1000.3.150.1008.294>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16 UsbScsiAdaptor
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:36 rx6600-1  above message repeats 5 times
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.0.16.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Device/USB/Standard/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.1.3.10.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1 UsbMiniBus
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Keyboard/USB/Boot/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.305.3.100.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1.0 UsbBootKeyboard
Nov  6 10:40:35 rx6600-1 vmunix: Devices/Mouse/USB/Boot/Avocent/KVMAdaptor/1_0
Nov  6 10:40:35 rx6600-1 vmunix:  <2.307.3.100.1572.833.256>
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/1.1.1 UsbBootMouse
Nov  6 10:40:35 rx6600-1 vmunix: NOTICE: USB device attached.  Identification String: 
Nov  6 10:40:36 rx6600-1  above message repeats 2 times
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/2/2 UsbEhci
Nov  6 10:40:35 rx6600-1 vmunix: 0/0/4/0 gvid_core
Nov  6 10:40:35 rx6600-1 vmunix: 0/1 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/2 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/2/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/2/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/3 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/3/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/3/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/4 lba
Nov  6 10:40:35 rx6600-1 vmunix: sasd: Claimed HP PCI/PCI-X SAS MPT adapter at hardware path 0/4/1/0 
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0 sasd
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/2/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/2/1 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/5 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/5/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/6 lba
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0 PCItoPCI
Nov  6 10:40:35 rx6600-1 vmunix: fcd: Claimed HP AD193-60001 4Gb Fibre Channel port at hardware path 0/6/1/0/4/0 (FC Port 1 on HBA)
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0 fcd
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/6/0 iether
Nov  6 10:40:35 rx6600-1 vmunix: 0/7 lba
Nov  6 10:40:35 rx6600-1 vmunix: Initializing the Ultra320 SCSI Controller at 0/7/1/0. Controller firmware version is 01.03.35.70
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0 mpt
Nov  6 10:40:35 rx6600-1 vmunix: Initializing the Ultra320 SCSI Controller at 0/7/1/1. Controller firmware version is 01.03.35.70
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1 mpt
Nov  6 10:40:35 rx6600-1 vmunix: 120 processor
Nov  6 10:40:35 rx6600-1 vmunix: 121 processor
Nov  6 10:40:35 rx6600-1 vmunix: 122 processor
Nov  6 10:40:35 rx6600-1 vmunix: 123 processor
Nov  6 10:40:35 rx6600-1 vmunix: 124 processor
Nov  6 10:40:35 rx6600-1 vmunix: 125 processor
Nov  6 10:40:35 rx6600-1 vmunix: 126 processor
Nov  6 10:40:35 rx6600-1 vmunix: 127 processor
Nov  6 10:40:35 rx6600-1 vmunix: 250 pdh
Nov  6 10:40:35 rx6600-1 vmunix: 250/0 ipmi
Nov  6 10:40:35 rx6600-1 vmunix: 250/1 asio0
Nov  6 10:40:35 rx6600-1 vmunix: 250/2 acpi_node
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/0.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1 fcd_fcp
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.13.255.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/5/1/0/4/0.1.9.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1 fcd_fcp
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0 fcd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.255.0.0.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.1 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.13.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.2 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.3 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/6/1/0/4/0.1.9.0.0.0.4 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1.7 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/7/1/1.7.0 sctl
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0 sasd_vbus
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0.0 tgt
Nov  6 10:40:35 rx6600-1 vmunix: 0/4/1/0.0.0.0.0 sdisk
Nov  6 10:40:35 rx6600-1 vmunix: Boot device's HP-UX HW path is: 0/4/1/0.0.0.0.0
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:35 rx6600-1 vmunix:     System Console is on the Built-In Serial Interface
Nov  6 10:40:35 rx6600-1 vmunix: iether0: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/2/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether1: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/3/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether2: INITIALIZING HP AB352-60003 PCI/PCI-X 1000Base-T Dual-port Core at hardware path 0/4/2/0
Nov  6 10:40:35 rx6600-1 vmunix: iether4: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/5/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether5: INITIALIZING HP AD193-60001 PCI/PCI-X 1000Base-T 4Gb FC/1000B-T Combo Adapter at hardware path 0/6/1/0/6/0
Nov  6 10:40:35 rx6600-1 vmunix: iether3: INITIALIZING HP AB352-60003 PCI/PCI-X 1000Base-T Dual-port Core at hardware path 0/4/2/1
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x3 configured as ROOT
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x2 configured as SWAP
Nov  6 10:40:35 rx6600-1 vmunix: Logical volume 64, 0x2 configured as DUMP
Nov  6 10:40:35 rx6600-1 vmunix:     Swap device table:  (start & size given in 512-byte blocks)
Nov  6 10:40:35 rx6600-1 vmunix:         entry 0 - major is 64, minor is 0x2; start = 0, size = 16777216
Nov  6 10:40:35 rx6600-1 vmunix:     Dump device table:  (start & size given in 1-Kbyte blocks)
Nov  6 10:40:35 rx6600-1 vmunix:         entry 0000000000000000 - major is 31, minor is 0x30000; start = 2349940, size = 8388604
Nov  6 10:40:35 rx6600-1 vmunix: Starting the STREAMS daemons-phase 1
Nov  6 10:40:35 rx6600-1 vmunix: Create STCP device files
Nov  6 10:40:35 rx6600-1 vmunix: Starting the STREAMS daemons-phase 2
Nov  6 10:40:35 rx6600-1 vmunix:      $Revision: vmunix:    B11.23_LR FLAVOR=perf Fri Aug 29 22:35:38 PDT 2003 $
Nov  6 10:40:35 rx6600-1 vmunix: Memory Information:
Nov  6 10:40:35 rx6600-1 vmunix:     physical page size = 4096 bytes, logical page size = 4096 bytes
Nov  6 10:40:35 rx6600-1 vmunix:     Physical: 25133536 Kbytes, lockable: 18994328 Kbytes, available: 22051156 Kbytes
Nov  6 10:40:35 rx6600-1 vmunix: 
Nov  6 10:40:36 rx6600-1 nettl[832]: nettl starting up.
Nov  6 10:40:48 rx6600-1 sshd[986]: Server listening on :: port 22.
Nov  6 10:40:48 rx6600-1 sshd[986]: Server listening on 0.0.0.0 port 22.
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found CLTS loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found COTS loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: check_netconfig: Found COTS ORD loopback transport
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for udp
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for tcp
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticlts
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticotsord
Nov  6 10:40:49 rx6600-1 rpcbind: init_transport: check binding for ticots
Nov  6 10:40:50 rx6600-1 inetd[1100]: Reading configuration
Nov  6 10:40:50 rx6600-1 inetd[1100]: ftp/tcp: Added service, server /usr/lbin/ftpd
Nov  6 10:40:50 rx6600-1 inetd[1100]: telnet/tcp: Added service, server /usr/lbin/telnetd
Nov  6 10:40:50 rx6600-1 inetd[1100]: tftp/udp: Added service, server /usr/lbin/tftpd
Nov  6 10:40:50 rx6600-1 inetd[1100]: login/tcp: Added service, server /usr/lbin/rlogind
Nov  6 10:40:50 rx6600-1 inetd[1100]: shell/tcp: Added service, server /usr/lbin/remshd
Nov  6 10:40:50 rx6600-1 inetd[1100]: exec/tcp: Added service, server /usr/lbin/rexecd
Nov  6 10:40:50 rx6600-1 inetd[1100]: ntalk/udp: Added service, server /usr/lbin/ntalkd
Nov  6 10:40:50 rx6600-1 inetd[1100]: auth/tcp: Added service, server /usr/lbin/identd
Nov  6 10:40:50 rx6600-1 inetd[1100]: printer/tcp: Added service, server /usr/sbin/rlpdaemon
Nov  6 10:40:51 rx6600-1 inetd[1100]: daytime/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: daytime/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: time/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: echo/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: echo/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: discard/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: discard/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: chargen/tcp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: chargen/udp: Added service, server internal
Nov  6 10:40:51 rx6600-1 inetd[1100]: kshell/tcp: Added service, server /usr/lbin/remshd
Nov  6 10:40:51 rx6600-1 inetd[1100]: klogin/tcp: Added service, server /usr/lbin/rlogind
Nov  6 10:40:51 rx6600-1 inetd[1100]: dtspc/tcp: Added service, server /usr/dt/bin/dtspcd
Nov  6 10:40:51 rx6600-1 inetd[1100]: recserv/tcp: Added service, server /usr/lbin/recserv
Nov  6 10:40:51 rx6600-1 inetd[1100]: swat/tcp: Added service, server /opt/samba/bin/swat
Nov  6 10:40:51 rx6600-1 inetd[1100]: registrar/tcp: Added service, server /etc/opt/resmon/lbin/registrar
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-probe/tcp: Added service, server /opt/cmom/lbin/cmomd
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-cfg/udp: Added service, server /usr/lbin/cmclconfd
Nov  6 10:40:51 rx6600-1 inetd[1100]: hacl-cfg/tcp: Added service, server /usr/lbin/cmclconfd
Nov  6 10:40:51 rx6600-1 inetd[1100]: instl_boots/udp: Added service, server /opt/ignite/lbin/instl_bootd
Nov  6 10:40:51 rx6600-1 inetd[1100]: omni/tcp: Added service, server /opt/omni/lbin/inet
Nov  6 10:40:51 rx6600-1 inetd[1100]: rpc.cmsd/udp: Added service, server /usr/dt/bin/rpc.cmsd
Nov  6 10:40:51 rx6600-1 inetd[1100]: rpc.ttdbserver/tcp: Added service, server /usr/dt/bin/rpc.ttdbserver
Nov  6 10:40:51 rx6600-1 inetd[1100]: Configuration complete
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: restore  (user ID real=0 effective=0)
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: config  (user ID real=0 effective=0)
Nov  6 10:40:53 rx6600-1 EMCPP: emcpAudit: Info: cmd=powermt: save  (user ID real=0 effective=0)
Nov  6 10:40:54 rx6600-1 su: + tty?? root-sfmdb
Nov  6 10:41:06 rx6600-1 cimserver[1706]: starting
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10026:  THE CIM SERVER IS LISTENING ON HTTPS PORT 5,989.
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10028: THE CIM SERVER IS LISTENING ON THE LOCAL CONNECTION SOCKET.
Nov  6 10:41:29 rx6600-1 cimserver[1707]: PGS10030:  STARTED HP-UX WBEM Services VERSION A.02.07.
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_japanese/100dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_japanese/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/hp_korean/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Cannot initialize font path element: "/usr/lib/X11/fonts/hp_chinese_t/75dpi/"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/ttfjpn.st"
Nov  6 10:41:32 rx6600-1 FontServer[1755]: Warning: Bad font path element: "/usr/lib/X11/fonts/ifojpn.st"
Nov  6 10:41:34 rx6600-1 pwgrd: Started at Thu Nov  6 10:41:34 2014, pid = 1798
Nov  6 10:41:34 rx6600-1 diagmond[1833]: started
Nov  6 10:41:34 rx6600-1 /usr/sbin/envd[1837]: VXPBFt6/, 2"6A3vEdVCND< ~
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2180]: Setting STREAMS-HEAD high water value to 131072.
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one mpctl succeeded: ncpus = 8.
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one pmap 2
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one pmap 3
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: nfsd do_one bind 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: nfsd do_one bind 1
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: nfsd do_one bind 2
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: nfsd do_one bind 3
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: nfsd do_one bind 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: nfsd do_one bind 5
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd do_one bind 7
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2195]: nfsd 5 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2197]: nfsd 5 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2193]: nfsd 3 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2192]: nfsd 2 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2200]: nfsd 2 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2191]: nfsd 1 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2201]: nfsd 1 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2199]: nfsd 3 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2194]: nfsd 4 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2202]: nfsd 4 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2185]: nfsd 7 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2219]: nfsd 7 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: nfsd do_one bind 6
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: Return from t_optmgmt(XTI_DISTRIBUTE) 0
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2190]: nfsd 0 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2220]: nfsd 0 0  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2196]: nfsd 6 1  sock 4
Nov  6 10:41:50 rx6600-1 /usr/sbin/nfsd[2221]: nfsd 6 0  sock 4
Nov  6 10:41:53 rx6600-1 krsd[2300]: Delay time is 300 seconds
Nov  6 10:41:53 rx6600-1 sfd[2301]: daemon already running.
Nov  6 10:41:54 rx6600-1 sfd[2314]: starting the daemon.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: New event pair [0] (2,4,60)
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: New event pair [1] (20,40,300)
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SetLogMask:: EventLogMask set to 0x66 
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: Using hostname localhost community public debug 0
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: Daemon created successfully.  Starting it now
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SNMP trap processing disabled.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: PP Remote Management disabled.
Nov  6 10:45:17 rx6600-1 vmunix: emcp:Mpx:Info: PowerPath Auto Host Registration on VNX-FCN00125000137 is unavailable: incompatible initiator information received from the array
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: ***** 9} HH AY =g >/ 8f *****
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: NB6H3,9}U}3#9$WwAY=gV5, P^U}9}HHLu< ~!#
Nov  6 10:45:42 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "MAJORWARNING (3)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")   
 Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 
Nov  6 10:49:14 rx6600-1 EMS [2928]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ipmi_fpl/ipmi_fpl"     (Threshold:  >= " 3")    
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 191889410 -r /system/events/ipmi_fpl/ipmi_fpl -n 191889409 -a 
Nov  6 18:48:12 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641922 -a 
Nov  6 19:00:00 rx6600-1 su: + tty?? root-oracle
Nov  7 08:00:00 rx6600-1 su: + tty?? root-root

从如下信息看到服务器已经出问题了，且信息已经指出可以执行
/opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 命令来查看详细信息

Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: SNMP trap processing disabled.
Nov  6 10:41:54 rx6600-1 emcp_mond: PP daemon: Info: PP Remote Management disabled.
Nov  6 10:45:17 rx6600-1 vmunix: emcp:Mpx:Info: PowerPath Auto Host Registration on VNX-FCN00125000137 is unavailable: incompatible initiator information received from the array
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: ***** 9} HH AY =g >/ 8f *****
Nov  6 10:45:42 rx6600-1 /usr/sbin/envd[1837]: NB6H3,9}U}3#9$WwAY=gV5, P^U}9}HHLu< ~!#
Nov  6 10:45:42 rx6600-1 EMS [2970]: ------ EMS Event Notification ------   Value: "MAJORWARNING (3)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")   
Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a

执行/opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 命令来查看详细信息

rx6600-1:[/]#/opt/resmon/bin/resdata -R 194641922 -r /system/events/ia64_corehw/core_hw -n 194641921 -a 

ARCHIVED MONITOR DATA:

Event Time..........: Thu Nov  6 10:45:42 2014
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011              
System..............: rx6600-1

Summary:
     System temperature is out of normal range. 


Description of Error:

     The system temperature is not within normal operating range. It is higher
     than required operating range.

这个错误描述是说系统的温度超出了正常范围，下面信息说明了可能的原因

Probable Cause / Recommended Action:

     Something may be blocking the cooling intakes of the fans. Check for
     obstruction.
     One or more fans may be operating at lower speed than normal. Check the
     fan performance.

     Check for problems with the room air conditioning.

     If the problem is not fixed, the operating temperature may become
     non-recoverable, in which case there are chances that the hardware may be
     damaged.  At that temperature level, on Integrity servers, the firmware
     will shutdown the system automatically. However on HP 9000 servers, the
     action specified in the envd config file will be taken - which may be to
     shutdown the system automatically.

     For information on the sensor that generated this event, refer to FRU ID
     in Event Details section.

上面的信息是说，可能需要清理一下风机，或者风机性能出现问题，或者检查空调情况，如果不是这些原因造成那么可能是硬件出现问题了。下面的论断事件的数据:

Additional Event Data: 
     System IP Address...: 10.138.129.5
     Event Id............: 0x545ae0d600000000
     Monitor Version.....: B.01.00
     Event Class.........: System
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_ia64_corehw.clcfg 
     Client Configuration File Version...: A.01.00 
          Qualification criteria met.
               Number of events..: 1 
     Associated OS error log entry id(s): 
          None
     Additional System Data:
          System Model Number.............: ia64 hp server rx6600 
          EMS Version.....................: A.04.20 
          STM Version.....................: C.58.00 
          System Serial Number............: SGH48045VY 
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#101011

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

     Event Date .............: Thu Nov  6 10:44:08 2014
     Sensor Number ..........: 0xdb
     Sensor Type ............: Temperature
     Sensor Class ...........: Threshold based
     Sensor Reading/Offset...: 0x07 (Offset)
     Event  Type.............: Assertion
     Entity ID ..............: 3
     Generic Message.........: 
       Temperature :  Upper non-critical - going high 
     Entity FRU Id Info......: 
       processor (Sensor ID: Processor 2)

从上面的Event Details信息可以看到，传感器类型是温度方面的问题，传感器类别是基于阈值，事件类型是断言，是说2号cpu的温度已经超过了阈值.经过检查不是机房空调，通风口堵塞问题，需要联系小机厂商来进行一步检查是什么原因造成cpu温度超过阈值，平时cpu使用率只有10%。

hp-ux利用rman将数据库跨平台迁移到aix平台上

Posted on 2014年10月1日 by YongJing(敬勇)

这次因为要迁移的数据库大小并不大但也不是只有几十G所以选择使用rman将生产数据库从hp-ux迁移到aix上，数据库是10.2.0.4
1.操作环境
1.1 源生产环境
数据库名 CAIWU
实例名 CAIWU
DBID 338270341
数据库 HP-UX 10.2.0.4.0 – 64bit Production
数据文件目录 /cwjc_data/CAIWU
文本初始化参数文件:/oracle/product/10.2.0/dbs/init/initCAIWU.ora

1.2 目标生产环境

数据库名 CAIWU
实例名 CAIWU
DBID 338270341
数据库 HP-UX 10.2.0.4.0 – 64bit Production
数据文件目录 /yl_oradata/CAIWU
文本初始化参数文件:/oracle/product/10.2.0/db_1/dbs/init/initCAIWU.ora

2 备份
2.1 备份注意事项
1. 迁移库只能在同字节顺序平台之间(即ENDIAN_FORMAT相同)且得是ORACLE所支持的平台，可以在V$TRANSPORTABLE_PLATFORM中查看。
2. 传输前需要将源库置为只读。
3. 生成的2个脚本文件要仔细根据需要修改。

2.2 源库操作
2.2.1 将库只读，并检查是否支持迁移
一．将库只读，并检查是否支持迁移

[rx6600-2:oracle:/cwjc_data]$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 29 22:34:24 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate
Database closed.
Database dismounted.

SQL> startup mount
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2060008 bytes
Variable Size            2348810520 bytes
Database Buffers         1073741824 bytes
Redo Buffers               14716928 bytes
Database mounted.
SQL> alter database open read only;

Database altered.

2.2.2 检查支持迁移平台

SQL> select * from v$transportable_platform;

PLATFORM_ID PLATFORM_NAME                          ENDIAN_FORMAT
----------- -------------------------------------- --------------
          1 Solaris[tm] OE (32-bit)                Big
          2 Solaris[tm] OE (64-bit)                Big
          7 Microsoft Windows IA (32-bit)          Little
         10 Linux IA (32-bit)                      Little
          6 AIX-Based Systems (64-bit)             Big
          3 HP-UX (64-bit)                         Big
          5 HP Tru64 UNIX                          Little
          4 HP-UX IA (64-bit)                      Big
         11 Linux IA (64-bit)                      Little
         15 HP Open VMS                            Little
          8 Microsoft Windows IA (64-bit)          Little
          9 IBM zSeries Based Linux                Big
         13 Linux x86 64-bit                       Little
         16 Apple Mac OS                           Big
         12 Microsoft Windows x86 64-bit           Little
         17 Solaris Operating System (x86)         Little
         18 IBM Power Based Linux                  Big
         20 Solaris Operating System (x86-64)      Little
         19 HP IA Open VMS                         Little

19 rows selected.

2.2.3 利用dbms_tdb 包检查数据库
二．利用dbms_tdb 包检查数据库能否被传输以及列出外部表和DIRECTORY 等无法传输的对象信息，
分别在hp-ux平台数据库通过SQLPLUS执行以下两个存储过程

SQL> set serveroutput on
SQL> declare
  2   db_ready boolean;
  3   begin
  4   /* db_ready is ignored, but with SERVEROUTPUT set to ON any
  5   * conditions preventing transport will be output to console */
  6   db_ready := dbms_tdb.check_db('AIX-Based Systems (64-bit)',
  7   dbms_tdb.skip_none);
  8   end;
  9   /

PL/SQL procedure successfully completed.

SQL>  declare
  2   external boolean;
  3   begin
  4   /* value of external is ignored, but with SERVEROUTPUT set to ON
  5   * dbms_tdb.check_external displays report of external objects
  6   * on console */
  7   external := dbms_tdb.check_external;
  8  end;
  9  /
The following directories exist in the database:
SYS.DUMP_CAIWU, SYS.DATA_PUMP_DIR, SYS.WORK_DIR, SYS.ADMIN_DIR, SYS.ORACLE_OCM_CONFIG_DIR

PL/SQL procedure successfully completed.

2.2.4 RMAN ConvertDB
连接target database 查看数据库打开模式

[rx6600-2:oracle:/home/oracle]$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 29 22:46:30 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select open_mode from v$database;

OPEN_MODE
----------
READ ONLY

利用RMAN转换数据库

RMAN> convert database new database 'CAIWU'
2> transport script '/cwjc_data/CAIWURMAN/caiwu.sql'
3> to platform 'AIX-Based Systems (64-bit)'
4> db_file_name_convert '/cwjc_data/CAIWU/','/cwjc_data/CAIWURMAN/';

Starting convert at 29-SEP-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=539 devtype=DISK

Directory SYS.DUMP_CAIWU found in the database
Directory SYS.DATA_PUMP_DIR found in the database
Directory SYS.WORK_DIR found in the database
Directory SYS.ADMIN_DIR found in the database
Directory SYS.ORACLE_OCM_CONFIG_DIR found in the database

User SYS with SYSDBA and SYSOPER privilege found in password file
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00005 name=/cwjc_data/CAIWU/zwdata01.dbf
converted datafile=/cwjc_data/CAIWURMAN/zwdata01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:06:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00006 name=/cwjc_data/CAIWU/zwdata02.dbf
converted datafile=/cwjc_data/CAIWURMAN/zwdata02.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:06:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00001 name=/cwjc_data/CAIWU/system01.dbf
converted datafile=/cwjc_data/CAIWURMAN/system01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:03:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00002 name=/cwjc_data/CAIWU/undotbs01.dbf
converted datafile=/cwjc_data/CAIWURMAN/undotbs01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:01:45
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00003 name=/cwjc_data/CAIWU/sysaux01.dbf
converted datafile=/cwjc_data/CAIWURMAN/sysaux01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:01:45
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00004 name=/cwjc_data/CAIWU/users01.dbf
converted datafile=/cwjc_data/CAIWURMAN/users01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:55
Run SQL script /cwjc_data/CAIWURMAN/caiwu.sql on the target platform to create database
Edit init.ora file /oracle/product/10.2.0/dbs/init_00pjoiqb_1_0.ora. This PFILE will be used to create the database on the target platform
To recompile all PL/SQL modules, run utlirp.sql and utlrp.sql on the target platform
To change the internal database identifier, use DBNEWID Utility
Finished backup at 29-SEP-14

将转换后的文件传输到目标主机上

[rx6600-2:oracle:/cwjc_data/CAIWURMAN]$ls -lrt
total 53002438
-rw-r-----   1 oracle     dba        8589942784 Sep 29 22:57 zwdata01.dbf
-rw-r-----   1 oracle     dba        8589942784 Sep 29 23:03 zwdata02.dbf
-rw-r-----   1 oracle     dba        4588576768 Sep 29 23:07 system01.dbf
-rw-r-----   1 oracle     dba        2147491840 Sep 29 23:08 undotbs01.dbf
-rw-r-----   1 oracle     dba        2147491840 Sep 29 23:10 sysaux01.dbf
-rw-r-----   1 oracle     dba        1073750016 Sep 29 23:11 users01.dbf
-rw-r--r--   1 oracle     dba           2704 Sep 29 23:11 caiwu.sql

ftp> get zwdata01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for zwdata01.dbf (8589942784 bytes).
226 Transfer complete.
8589942784 bytes received in 197.5 seconds (4.248e+04 Kbytes/s)
local: zwdata01.dbf remote: zwdata01.dbf
ftp> get zwdata02.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for zwdata02.dbf (8589942784 bytes).
226 Transfer complete.
8589942784 bytes received in 202.7 seconds (4.138e+04 Kbytes/s)
local: zwdata02.dbf remote: zwdata02.dbf
ftp> get system01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for system01.dbf (4588576768 bytes).
226 Transfer complete.
4588576768 bytes received in 111.3 seconds (4.027e+04 Kbytes/s)
local: system01.dbf remote: system01.dbf
ftp> get undotbs01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for undotbs01.dbf (2147491840 bytes).
226 Transfer complete.
2147491840 bytes received in 52.55 seconds (3.99e+04 Kbytes/s)
local: undotbs01.dbf remote: undotbs01.dbf
ftp> get sysaux01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for sysaux01.dbf (2147491840 bytes).
226 Transfer complete.
2147491840 bytes received in 51.29 seconds (4.089e+04 Kbytes/s)
local: sysaux01.dbf remote: sysaux01.dbf
ftp> get users01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for users01.dbf (1073750016 bytes).
226 Transfer complete.
1073750016 bytes received in 26.01 seconds (4.032e+04 Kbytes/s)
local: users01.dbf remote: users01.dbf
ftp> get caiwu.sql
200 PORT command successful.
150 Opening BINARY mode data connection for caiwu.sql (2704 bytes).
226 Transfer complete.
2704 bytes received in 9e-05 seconds (2.934e+04 Kbytes/s)
local: caiwu.sql remote: caiwu.sql
ftp> get init_00pjoiqb_1_0.ora
200 PORT command successful.
150 Opening BINARY mode data connection for init_00pjoiqb_1_0.ora (1556 bytes).
226 Transfer complete.
1556 bytes received in 8.7e-05 seconds (1.747e+04 Kbytes/s)
local: init_00pjoiqb_1_0.ora remote: init_00pjoiqb_1_0.ora


[IBMP740-2:oracle:/yl_oradata/CAIWU]$ls -lrt
total 53002408
-rw-r--r--    1 oracle   dba      8589942784 Sep 29 23:02 zwdata01.dbf
-rw-r--r--    1 oracle   dba      8589942784 Sep 29 23:06 zwdata02.dbf
-rw-r--r--    1 oracle   dba      4588576768 Sep 29 23:08 system01.dbf
-rw-r--r--    1 oracle   dba      2147491840 Sep 29 23:09 undotbs01.dbf
-rw-r--r--    1 oracle   dba      2147491840 Sep 29 23:11 sysaux01.dbf
-rw-r--r--    1 oracle   dba      1073750016 Sep 29 23:11 users01.dbf
-rw-r--r--    1 oracle   dba            2704 Sep 29 23:12 caiwu.sql
-rw-r--r--    1 oracle   dba            1556 Sep 29 23:12 init_00pjoiqb_1_0.ora

将init_00pjoiqb_1_0.ora参数文件复制到$ORACLE_HOME/dbs目录下

[IBMP740-2:oracle:/yl_oradata/CAIWU]$ cp init_00pjoiqb_1_0.ora $ORACLE_HOME/dbs

2.2.5 在目标生产机器上修改pfile参数文件并将参数文件init_00pjoiqb_1_0.ora重命名为initCAIWU.ora

IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$mv init_00pjoiqb_1_0.ora initCAIWU.ora

修改前参数文件内容如下:

[IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$vi initCAIWU.ora
"initCAIWU.ora" 68 lines, 1556 characters 

  __shared_pool_size       = 2164260864

  __large_pool_size        = 16777216

  __java_pool_size         = 16777216

  __streams_pool_size      = 150994944

  __db_cache_size          = 1073741824
# Please change the values of the following parameters:

  control_files            ="/yl_oradata/CAIWU/control01.ctl","/yl_oradata/CAIWU/control02.ctl","/yl_oradata/CAIWU/control03.ctl"

  db_recovery_file_dest    = "/oracle/flash_recovery_area"

  db_recovery_file_dest_size= 2147483648

  background_dump_dest     = "/oracle/admin/CAIWU/bdump"

  user_dump_dest           = "/oracle/admin/CAIWU/udump"

  core_dump_dest           = "/oracle/admin/CAIWU/cdump"

  audit_file_dest          = "/oracle/admin/CAIWU/adump"

  db_name                  = "CAIWU"



# Please review the values of the following parameters:

  remote_login_passwordfile= "EXCLUSIVE"

  db_domain                = ""

  dispatchers              = "(PROTOCOL=TCP) (SERVICE=CAIWUXDB)"



# The values of the following parameters are from source database:

  processes                = 500

  sessions                 = 555

  sga_target               = 3439329280

  db_block_size            = 8192

  compatible               = "10.2.0.3.0"

# log_archive_dest_1       = "LOCATION=/yl_oradata/arch/CAIWU/"

  log_archive_format       = "%t_%s_%r.dbf"

  db_file_multiblock_read_count= 16

  undo_management          = "AUTO"

  undo_tablespace          = "UNDOTBS1"

  job_queue_processes      = 10

  open_cursors             = 300

  pga_aggregate_target     = 858783744

修改后参数文件内容如下:

[IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$cat initCAIWU.ora    
# Please change the values of the following parameters:

  control_files            ="/yl_oradata/CAIWU/control01.ctl","/yl_oradata/CAIWU/control02.ctl","/yl_oradata/CAIWU/control03.ctl" 

  db_recovery_file_dest    = "/oracle/flash_recovery_area"

  db_recovery_file_dest_size= 2147483648

  background_dump_dest     = "/oracle/admin/CAIWU/bdump"

  user_dump_dest           = "/oracle/admin/CAIWU/udump"

  core_dump_dest           = "/oracle/admin/CAIWU/cdump"

  audit_file_dest          = "/oracle/admin/CAIWU/adump"

  db_name                  = "CAIWU"



# Please review the values of the following parameters:

  remote_login_passwordfile= "EXCLUSIVE"

  db_domain                = ""

  dispatchers              = "(PROTOCOL=TCP) (SERVICE=CAIWUXDB)"



# The values of the following parameters are from source database:

  processes                = 500

  sessions                 = 555

  sga_target               = 3439329280

  db_block_size            = 8192

  compatible               = "10.2.0.3.0"

# log_archive_dest_1       = "LOCATION=/yl_oradata/arch/CAIWU/"

  log_archive_format       = "%t_%s_%r.dbf"

  db_file_multiblock_read_count= 16

  undo_management          = "AUTO"

  undo_tablespace          = "UNDOTBS1"

  job_queue_processes      = 10

  open_cursors             = 300

  pga_aggregate_target     = 858783744

2.2.6 修改重建控制文件的脚本caiwu.sql
修改后的caiwu.sql内容如下:

[IBMP740-2:oracle:/yl_oradata/CAIWU]$cat caiwu.sql
-- The following commands will create a new control file and use it
-- to open the database.
-- Data used by Recovery Manager will be lost.
-- The contents of online logs will be lost and all backups will
-- be invalidated. Use this only if online logs are damaged.

-- After mounting the created controlfile, the following SQL
-- statement will place the database in the appropriate
-- protection mode:
--  ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE

STARTUP NOMOUNT PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
CREATE CONTROLFILE REUSE SET DATABASE "CAIWU" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/yl_oradata/CAIWU/redo01.log'  SIZE 512M,
  GROUP 2 '/yl_oradata/CAIWU/redo02.log'  SIZE 512M,
  GROUP 3 '/yl_oradata/CAIWU/redo03.log'  SIZE 512M
DATAFILE
  '/yl_oradata/CAIWU/system01.dbf',
  '/yl_oradata/CAIWU/undotbs01.dbf',
  '/yl_oradata/CAIWU/sysaux01.dbf',
  '/yl_oradata/CAIWU/users01.dbf',
  '/yl_oradata/CAIWU/zwdata01.dbf',
  '/yl_oradata/CAIWU/zwdata02.dbf'
CHARACTER SET ZHS16GBK
;

-- Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;

-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '/yl_oradata/CAIWU/temp01.dbf'
     SIZE 2048M AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;
-- End of tempfile additions.
--

set echo off
prompt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
prompt * Your database has been created successfully!
prompt * There are many things to think about for the new database. Here
prompt * is a checklist to help you stay on track:
prompt * 1. You may want to redefine the location of the directory objects.
prompt * 2. You may want to change the internal database identifier (DBID) 
prompt *    or the global database name for this database. Use the 
prompt *    NEWDBID Utility (nid).
prompt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SHUTDOWN IMMEDIATE 
STARTUP UPGRADE PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
@$ORACLE_HOME/rdbms/admin/utlirp.sql 
SHUTDOWN IMMEDIATE 
STARTUP PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
-- The following step will recompile all PL/SQL modules.
-- It may take serveral hours to complete.
@$ORACLE_HOME/rdbms/admin/utlrp.sql 
set feedback 6;

2.3 转换数据库
2.3.1 创建实例CAIWU执行转换脚本

SQL> STARTUP NOMOUNT PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2087840 bytes
Variable Size             671089760 bytes
Database Buffers         2717908992 bytes
Redo Buffers               48242688 bytes
SQL> CREATE CONTROLFILE REUSE SET DATABASE "CAIWU" RESETLOGS  ARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 '/yl_oradata/CAIWU/redo01.log'  SIZE 512M,
  9    GROUP 2 '/yl_oradata/CAIWU/redo02.log'  SIZE 512M,
 10    GROUP 3 '/yl_oradata/CAIWU/redo03.log'  SIZE 512M
 11  DATAFILE
 12    '/yl_oradata/CAIWU/system01.dbf',
 13    '/yl_oradata/CAIWU/undotbs01.dbf',
 14    '/yl_oradata/CAIWU/sysaux01.dbf',
 15    '/yl_oradata/CAIWU/users01.dbf',
 16    '/yl_oradata/CAIWU/zwdata01.dbf',
 17    '/yl_oradata/CAIWU/zwdata02.dbf'
 18  CHARACTER SET ZHS16GBK
 19  ;

Control file created.

SQL> ALTER DATABASE OPEN RESETLOGS;

Database altered.

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE '/yl_oradata/CAIWU/temp01.dbf'
  2       SIZE 2048M AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

Tablespace altered.

SQL> SHUTDOWN IMMEDIATE 
Database closed.
Database dismounted.
ORACLE instance shut down.

STARTUP UPGRADE PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
SQL>@$ORACLE_HOME/rdbms/admin/utlirp.sql

.....

SQL> 
SQL> Rem Continue even if there are SQL errors
SQL> WHENEVER SQLERROR CONTINUE;
SQL> 
SQL> Rem ===========================================================================
SQL> Rem END utlip.sql
SQL> Rem ===========================================================================
SQL> 
SQL> DOC
DOC>#######################################################################
DOC>#######################################################################
DOC>   utlirp.sql completed successfully. All PL/SQL objects in the
DOC>   database have been invalidated.
DOC>
DOC>   Shut down and restart the database in normal mode and run utlrp.sql to
DOC>   recompile invalid objects.
DOC>#######################################################################
DOC>#######################################################################
DOC># 

SQL> SHUTDOWN IMMEDIATE
Database closed.
Database dismounted.
ORACLE instance shut down

SQL> STARTUP PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2087840 bytes
Variable Size             671089760 bytes
Database Buffers         2717908992 bytes
Redo Buffers               48242688 bytes
Database mounted.
Database opened.


SQL>@$ORACLE_HOME/rdbms/admin/utlrp.sql
.....



SQL> 
SQL> 
SQL> Rem =====================================================================
SQL> Rem Run component validation procedure
SQL> Rem =====================================================================
SQL> 
SQL> SET serveroutput on
SQL> EXECUTE dbms_registry_sys.validate_components;

PL/SQL procedure successfully completed.

SQL> SET serveroutput off
SQL> 
SQL> 
SQL> Rem ===========================================================================
SQL> Rem END utlrp.sql
SQL> Rem ===========================================================================
SQL>

2.3.2 用system用户和密码进行登录来验证

SQL> conn system/ufgov
Connected.

可以登录
2.3.3 表空间验证，查询迁移后的表空间及文件

SQL> select tablespace_name,file_id from dba_data_files;

TABLESPACE_NAME                   FILE_ID
------------------------------ ----------
ZWDATA                                  6
ZWDATA                                  5
USERS                                   4
SYSAUX                                  3
UNDOTBS1                                2
SYSTEM                                  1

6 rows selected.

2.3.4 对数据库进行DML操作验证

SQL> create table test (i int);

Table created.

SQL> insert into test values (1);

1 row created.

SQL> select * from test;

         I
----------
         1

SQL> delete from test;

1 row deleted.

SQL> drop table test;

Table dropped.

2.3.5 切换日志测试

SQL> alter system switch logfile;

System altered.

小结
1. 确保源库与目标数据库的字符集一致
2. 要确保源库与目标数据库的字节编码相同
3. 数据库迁移后要把无效的数据库对象重新编译，确保正确执行utlrp.sql,utlirp.sql
4. 数据库迁移后，确保应用系统用户可以正确连接，登陆有效
5. 监听要重新配置，检查临时表空间的属主与可用性
6. 迁移完成后要立即进行一次全备份

查询执行hash join出现ora-040300 (QERHJ hash-joi,kllcqas:kllsltba)的诊断案例

Posted on 2014年9月1日 by YongJing(敬勇)

某公安系统开发人员在plsql中执行查询时报ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足。数据库是aix上的rac 11.2.0.4。当时朋友找到我，我当时给他的建议是直接把报错的查询直接在数据库服务器上执行看是否报错，因为在服务器上用sqlplus来执行是没有经过监听程序的，但后面给我的答复是在服务器上执行也报错。所以就远程连接来处理。

查看alert.log文件发现如下错误信息：

Errors in file /u01/app/oracle/diag/rdbms/test/test/incident/incdir_156708/test_ora_15859922_i156708.trc

ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足

从alert.log文件查看到的非缺省值参数信息如下：

System parameters with non-default values:
  processes                = 1500
  sessions                 = 2272
  _realfree_heap_pagesize_hint= 256K
  _use_realfree_heap       = TRUE
  spfile                   = "+DATA_DG/xtrk/spfilextrk.ora"
  sga_target               = 14G
  control_files            = "+DATA_DG/xtrk/controlfile/current.261.851142581"
  control_files            = "+DATA_DG/xtrk/controlfile/current.260.851142581"
  db_block_size            = 8192
  _external_scn_rejection_threshold_hours= 1
  compatible               = "11.2.0.4.0"
  log_archive_format       = "%t_%s_%r.dbf"
  cluster_database         = TRUE
  db_create_file_dest      = "+DATA_DG"
  db_recovery_file_dest    = "+DATA_DG"
  db_recovery_file_dest_size= 400G
  thread                   = 1
  undo_tablespace          = "UNDOTBS1"
  instance_number          = 1
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=xtrkXDB)"
  remote_listener          = "test-scan:1521"
  audit_file_dest          = "/u01/app/oracle/admin/test/adump"
  audit_trail              = "DB"
  db_name                  = "test"
  open_cursors             = 600
  pga_aggregate_target     = 10G
  diagnostic_dest          = "/u01/app/oracle"

如是查看错误跟踪文件:

Dump continued from file: /u01/app/oracle/diag/rdbms/xtrk/xtrk1/trace/xtrk1_ora_15859922.trc
ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足

========= Dump for incident 156708 (ORA 4030) ========
----- Beginning of Customized Incident Dump(s) -----
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------

*** 2014-08-27 15:07:09.388
77%   86 MB, 727 chunks: "kllcqas:kllsltba          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 7% 8449 KB,  80 chunks: "QERHJ Bit vector          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 6% 6771 KB,   4 chunks: "kllcqc:kllcqslt           "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 5% 5880 KB, 407 chunks: "free memory               "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 2% 2056 KB,   2 chunks: "kllcqgf:kllsltba          "  SQL
         kxs-heap-w      ds=110bcb0f8  dsprt=1109f0120
 1%  651 KB,  28 chunks: "permanent memory          "  
         pga heap        ds=110005210  dsprt=0
 0%  376 KB,   1 chunk : "kfk_kfkio_freeq           "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  336 KB,   1 chunk : "kfkosd_p                  "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  259 KB,   8 chunks: "free memory               "  
         pga heap        ds=110005210  dsprt=0
 0%  253 KB,   2 chunks: "free memory               "  
         top call heap   ds=11011f360  dsprt=0

从上面报错进程内存分配信息来看报错的chunks: “kllcqas:kllsltba ” 大小是86M

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
111 MB total:
   110 MB commented, 652 KB permanent
   606 KB free (0 KB in empty extents),
     109 MB,   1 heap:    "session heap   "            92 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
110 MB total:
   110 MB commented, 96 KB permanent
    94 KB free (0 KB in empty extents),
     108 MB,   1 heap:    "kxs-heap-w     "            84 KB free held
------------------------------------------------------
Summary of subheaps at depth 2
109 MB total:
   108 MB commented, 45 KB permanent
    32 KB free (4 KB in empty extents),
     106 MB,   4 heaps:   "QERHJ hash-joi "            10 KB free held
------------------------------------------------------
Summary of subheaps at depth 3
106 MB total:
   100 MB commented, 41 KB permanent
  5880 KB free (0 KB in empty extents),
      91 MB, 727 chunks:  "kllcqas:kllsltba          " 5257 KB free held
    8786 KB,  80 chunks:  "QERHJ Bit vector          " 336 KB free held

从111 MB total可知出错进程只分配了111M的内存

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0x110959bf0]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=262144
blkdstbl=0x110959c00, iniblk=8192 maxblk=524288 numsegs=15
In-use num=402 siz=116391936, Freeable num=0 siz=0, Free num=0 siz=0
 
==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------
 
Dumping Work Area Table (level=1)
=====================================
 
  Global SGA Info
  ---------------
 
    global target:    10240 MB
    auto target:       8997 MB
    max pga:           2047 MB
    pga limit:         4095 MB
    pga limit known:  0
    pga limit errors:     0
 
    pga inuse:          275 MB
    pga alloc:          343 MB
    pga freeable:        11 MB
    pga freed:         4150 MB
    pga to free:          0 
    broker request:       0
 
    pga auto:            32 MB
    pga manual:           0 MB
 
    pga alloc  (max):  3043 MB
    pga auto   (max):    32 MB
    pga manual (max):     0 MB
 
    # workareas     :     4
    # workareas(max):     5
 
 

================================
PER-PROCESS PRIVATE MEMORY USAGE
--------------------------------
 
Private memory usage per Oracle process

当出现ora-04030时该实例分配的总PGA大小如下

-------------------------
Top 10 processes:
-------------------------
(percentage is of 420 MB total allocated memory)
27% pid 51: 111 MB used of 112 MB allocated  < = CURRENT PROC
 5% pid 13: 18 MB used of 22 MB allocated 
 5% pid 14: 18 MB used of 22 MB allocated 
 5% pid 36: 18 MB used of 22 MB allocated 
 5% pid 38: 18 MB used of 22 MB allocated 
 5% pid 39: 18 MB used of 22 MB allocated 
 5% pid 40: 18 MB used of 22 MB allocated 
 4% pid 10: 16 MB used of 17 MB allocated 
 4% pid 12: 13 MB used of 17 MB allocated 
 4% pid 20: 12 MB used of 15 MB allocated

从上面的percentage is of 420 MB total allocated memory可知这个实例的PGA只分配了420M。从27% pid 51: 111 MB used of 112 MB allocated < = CURRENT PROC可知当前进程分配了112M,使用了111M。而pid=51,那么接下来查看pid=51的详细会话信息:

------------------------------------
Begin session detail for pid 51
  sid: 430 ser: 5 audsid: 400022 user: 239/TC_RKXT
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 51 O/S info: user: grid, term: UNKNOWN, ospid: 15859922
    image: oracle@xtrkdb1
  client details:
    O/S info: user: Administrator, term: RKXXWEB, ospid: 4120:2956
    machine: RKXX\RKXXWEB program: plsqldev.exe
    application name: PL/SQL Developer, hash value=1190136663
    action name: SQL Window - New, hash value=3399691616
  current SQL:
  SELECT
 b.master_relation 户关系, b.hu_id_new 户号,a.name 姓名,a.pid 身份证号码,b.when_in_logged 入户时间, b.in_category 入户理由, b.dob,
a.photo_flag 有无相片, substr(ORGNAME,7,10) 所属派出所,
b.quxcun_id,c.name 居委会名称
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob < =to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )
End session detail for pid 51
------------------------------------
 
================
SWAP INFORMATION
----------------
swap info: free_mem = 22025.57M rsv = 128.00M 
           alloc = 44.48M avail = 32768.00M swap_free = 32723.52M
----- End of Customized Incident Dump(s) -----

*** 2014-08-27 15:07:09.417
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1gbsquwh58w4k) -----
SELECT
 b.master_relation 户关系, b.hu_id_new 户号,a.name 姓名,a.pid 身份证号码,b.when_in_logged 入户时间, b.in_category 入户理由, b.dob,
a.photo_flag 有无相片, substr(ORGNAME,7,10) 所属派出所,
b.quxcun_id,c.name 居委会名称
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob <=to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )

从上面的信息可知swap大小为32768M,SGA为14G,PGA为10G,当这个实例的总PGA为分配了420M,当这个会话进程只分配111M内存时不应该会报告ora-04030错误.这时我还是亲自在服务器上执行上面的报错查询发现不报错，而开始我要朋友做测试，他是要别人做的，但给我的答案是报错，看来还是得亲自测试。这种情况在MOS有一篇文章（ID 758131.1）就是描述这种情况的并且说在aix下是一个特定的问题。造成这个问题的原因可能是因为操作系统资源限制，说当CRS启动它的资源时是使用root用户可能会出现ora-04030错误，如果是资源通过sqlplus,lsnrctl来启动则没有操作系统资源限制。如果我查看/etc/security/limits文件内容如下：

# cat limits

*
* Sizes are in multiples of 512 byte blocks, CPU time is in seconds
*
* fsize      - soft file size in blocks
* core       - soft core file size in blocks
* cpu        - soft per process CPU time limit in seconds
* data       - soft data segment size in blocks
* stack      - soft stack segment size in blocks
* rss        - soft real memory usage in blocks
* nofiles    - soft file descriptor limit
* fsize_hard - hard file size in blocks
* core_hard  - hard core file size in blocks
* cpu_hard   - hard per process CPU time limit in seconds
* data_hard  - hard data segment size in blocks
* stack_hard - hard stack segment size in blocks
* rss_hard   - hard real memory usage in blocks
* nofiles_hard - hard file descriptor limit
*
* The following table contains the default hard values if the
* hard values are not explicitly defined:
*
*   Attribute        Value
*   ==========    ============
*   fsize_hard    set to fsize
*   cpu_hard      set to cpu
*   core_hard         -1
*   data_hard         -1
*   stack_hard      8388608 
*   rss_hard          -1
*   nofiles_hard      -1
*
* NOTE:  A value of -1 implies "unlimited"
*

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
#       rss = 65536
        rss = -1
        stack = 65536
        nofiles = 2000

root:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

oracle:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

发现我朋友已经修改了root用户和oracle用户的权限但为什么还是报ora-04030错误了。原因为在oracle 11g rac安装不像以前的版本，现在安装Grid Infrastructure 和Oracle数据库软件推荐使用不同的用户来安装。通过查看这里用grid用户来安装Grid Infrastructure ,oracle用户来安装的oracle数据库软。下面的limits文件朋友只修改了root,oracle两个用户，但并没有修改grid用户对操作系统资源的使用限制。而在rac中,监听程序是集群软件来启动的也就是由grid用户来管理的。由于修改grid用户对操作系统资源的使用限制由于在limits文件中增加以下内容：

grid:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

由于重启rac，在重启之后再执行上面报错语句一切正常了，造成这个问题的根本原因是没有修改oracle,grid用户对操作系统资源的使用限制所造成的，而这个问题应该是可以避免的，因为对这种操作系统资源的使用限制其实在安装rac之前是要做修改的在安装文档中也是有的，所以安装也是需要仔细阅读官方文档的。

Oracle SQL Tuning with Oracle SQLTXPAIN 第一章 (个人翻译）

Posted on 2014年8月14日2014年8月15日 by YongJing(敬勇)

我有一个习惯在看英文资料时喜欢将它翻译成中文，我觉得这是一本介绍SQLTXPLAIN工具的好书，使用SQLTXPLAIN来进行sql优化真的很方便。由于我的水平有限翻译的不好，请大家指正。

Oracle SQL Tuning with Oracle SQLTXPLAIN – 第一章

weblogic 10.3.3 java.lang.IllegalArgumentException: Failed to properly unregister weblogic.work.RequestClassRuntimeMBeanImpl故障解决一例

Posted on 2014年7月16日2014年7月16日 by YongJing(敬勇)

weblogic 10.3.3异常终止服务了检查日志文件发现如下错误信息:

weblogic.management.DeploymentException: 
  at weblogic.application.internal.flow.ApplicationRuntimeMBeanFlow.unprepare(ApplicationRuntimeMBeanFlow.java:64)
  at weblogic.application.internal.BaseDeployment$1.previous(BaseDeployment.java:1233)
  at weblogic.application.utils.StateMachineDriver.previousState(StateMachineDriver.java:167)
  at weblogic.application.utils.StateMachineDriver.previousState(StateMachineDriver.java:159)
  at weblogic.application.internal.BaseDeployment.unprepare(BaseDeployment.java:495)
  at weblogic.application.internal.SingleModuleDeployment.unprepare(SingleModuleDeployment.java:43)
  at weblogic.application.internal.DeploymentStateChecker.unprepare(DeploymentStateChecker.java:205)
  at weblogic.deploy.internal.targetserver.AppContainerInvoker.unprepare(AppContainerInvoker.java:117)
  at weblogic.deploy.internal.targetserver.BasicDeployment.unprepare(BasicDeployment.java:287)
  at weblogic.management.deploy.internal.DeploymentAdapter$1.doUnprepare(DeploymentAdapter.java:81)
  at weblogic.management.deploy.internal.DeploymentAdapter.unprepare(DeploymentAdapter.java:224)
  at weblogic.management.deploy.internal.AppTransition$7.transitionApp(AppTransition.java:75)
  at weblogic.management.deploy.internal.ConfiguredDeployments.transitionApps(ConfiguredDeployments.java:240)
  at weblogic.management.deploy.internal.ConfiguredDeployments.unprepare(ConfiguredDeployments.java:204)
  at weblogic.management.deploy.internal.ConfiguredDeployments.undeploy(ConfiguredDeployments.java:192)
  at weblogic.management.deploy.internal.DeploymentServerService.shutdownApps(DeploymentServerService.java:195)
  at weblogic.management.deploy.internal.DeploymentServerService.shutdownHelper(DeploymentServerService.java:127)
  at weblogic.application.ApplicationShutdownService.halt(ApplicationShutdownService.java:142)
  at weblogic.t3.srvr.ServerServicesManager.haltInternal(ServerServicesManager.java:504)
  at weblogic.t3.srvr.ServerServicesManager.halt(ServerServicesManager.java:336)
  at weblogic.t3.srvr.T3Srvr.shutdown(T3Srvr.java:1039)
  at weblogic.t3.srvr.T3Srvr.forceShutdown(T3Srvr.java:945)
  at weblogic.t3.srvr.T3Srvr$2.run(T3Srvr.java:958)

Caused By: java.lang.IllegalArgumentException: Failed to properly unregister weblogic.work.RequestClassRuntimeMBeanImpl@50b20090 for ObjectName com.bea:ServerRuntime=AdminServer,Name=default@plat_changde_test@null,WorkManagerRuntime=default,ApplicationRuntime=plat_changde_test,Type=RequestClassRuntime
  at weblogic.management.jmx.ObjectNameManagerBase.unregisterObject(ObjectNameManagerBase.java:219)
  at weblogic.management.jmx.ObjectNameManagerBase.unregisterObjectInstance(ObjectNameManagerBase.java:192)
  at weblogic.management.mbeanservers.internal.RuntimeMBeanAgent$1.unregisteredInternal(RuntimeMBeanAgent.java:124)
  at weblogic.management.mbeanservers.internal.RuntimeMBeanAgent$1.unregistered(RuntimeMBeanAgent.java:108)
  at weblogic.management.provider.core.RegistrationManagerBase.invokeRegistrationHandlers(RegistrationManagerBase.java:187)
  at weblogic.management.provider.core.RegistrationManagerBase.unregister(RegistrationManagerBase.java:126)
  at weblogic.management.runtime.RuntimeMBeanDelegate.unregister(RuntimeMBeanDelegate.java:287)
  at weblogic.management.runtime.RuntimeMBeanDelegate.unregisterChildren(RuntimeMBeanDelegate.java:350)
  at weblogic.management.runtime.RuntimeMBeanDelegate.unregister(RuntimeMBeanDelegate.java:274)
  at weblogic.management.runtime.RuntimeMBeanDelegate.unregisterChildren(RuntimeMBeanDelegate.java:350)
  at weblogic.management.runtime.RuntimeMBeanDelegate.unregister(RuntimeMBeanDelegate.java:274)
  at weblogic.j2ee.J2EEApplicationRuntimeMBeanImpl.unregister(J2EEApplicationRuntimeMBeanImpl.java:359)
  at weblogic.application.internal.flow.ApplicationRuntimeMBeanFlow.unprepare(ApplicationRuntimeMBeanFlow.java:62)
  at weblogic.application.internal.BaseDeployment$1.previous(BaseDeployment.java:1233)
  at weblogic.application.utils.StateMachineDriver.previousState(StateMachineDriver.java:167)
  at weblogic.application.utils.StateMachineDriver.previousState(StateMachineDriver.java:159)
  at weblogic.application.internal.BaseDeployment.unprepare(BaseDeployment.java:495)
  at weblogic.application.internal.SingleModuleDeployment.unprepare(SingleModuleDeployment.java:43)
  at weblogic.application.internal.DeploymentStateChecker.unprepare(DeploymentStateChecker.java:205)
  at weblogic.deploy.internal.targetserver.AppContainerInvoker.unprepare(AppContainerInvoker.java:117)
  at weblogic.deploy.internal.targetserver.BasicDeployment.unprepare(BasicDeployment.java:287)
  at weblogic.management.deploy.internal.DeploymentAdapter$1.doUnprepare(DeploymentAdapter.java:81)
  at weblogic.management.deploy.internal.DeploymentAdapter.unprepare(DeploymentAdapter.java:224)
  at weblogic.management.deploy.internal.AppTransition$7.transitionApp(AppTransition.java:75)
  at weblogic.management.deploy.internal.ConfiguredDeployments.transitionApps(ConfiguredDeployments.java:240)
  at weblogic.management.deploy.internal.ConfiguredDeployments.unprepare(ConfiguredDeployments.java:204)
  at weblogic.management.deploy.internal.ConfiguredDeployments.undeploy(ConfiguredDeployments.java:192)
  at weblogic.management.deploy.internal.DeploymentServerService.shutdownApps(DeploymentServerService.java:195)
  at weblogic.management.deploy.internal.DeploymentServerService.shutdownHelper(DeploymentServerService.java:127)
  at weblogic.application.ApplicationShutdownService.halt(ApplicationShutdownService.java:142)
  at weblogic.t3.srvr.ServerServicesManager.haltInternal(ServerServicesManager.java:504)
  at weblogic.t3.srvr.ServerServicesManager.halt(ServerServicesManager.java:336)
  at weblogic.t3.srvr.T3Srvr.shutdown(T3Srvr.java:1039)
  at weblogic.t3.srvr.T3Srvr.forceShutdown(T3Srvr.java:945)
  at weblogic.t3.srvr.T3Srvr$2.run(T3Srvr.java:958)

故障原因

Caused By: java.lang.IllegalArgumentException: Failed to properly unregister weblogic.work.RequestClassRuntimeMBeanImpl@50b20090 for ObjectName com.bea:ServerRuntime=AdminServer,Name=default@plat_changde_test@null,WorkManagerRuntime=default,ApplicationRuntime=plat_changde_test,Type=RequestClassRuntime

在MOS上找到一篇关于这个错误的文章说原因是

When you have JDBC connection pool name same as application name in config.xml we are running into the issue.

在config.xml文件中确实存在jb_zs,jb_test,plat_changde_test,plat_changde的应用名与连接池名相同将其修改为不一样后重新weblogic解决了此问题.

oracle IO性能问题故障诊断案例

Posted on 2014年7月16日2014年7月16日 by YongJing(敬勇)

一业务系统在白天业务时间出现了严重了IO性能问题,下面是下午业务高峰时间(3-5)的awr报告

从等待事件来看主要都是与IO相关

从上面可以看到除了几个语句的逻辑读很高，其实物理不是很高,每秒产生的重做日志以及物理读也不高.

检查磁盘IO

rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:18:45   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:18:46  c39t0d3  100.00    0.50      18    1130    0.00  132.11
          c41t0d3   83.50    0.50       6     450    0.00  290.78
16:18:47   c3t0d0    0.99    0.50       2      63    0.00    7.12
          c39t0d3   91.09    0.50      10     982    0.00  115.53
          c41t0d3  100.00    0.50      12     586    0.00  291.67
16:18:48   c3t0d0    3.03    0.50       2      32    0.00   15.93
          c39t0d3  100.00    0.50       9    1034    0.00  139.76
          c41t0d3   92.93    0.50       7     388    0.00  310.07
16:18:49   c3t0d0    2.00    0.50       4      64    0.00   19.59
          c39t0d3  100.00    0.50      12    1088    0.00  127.33
          c41t0d3   86.00    0.50       8     416    0.00  251.32
16:18:50   c3t0d0    1.01    0.50       1       2    0.00    8.99
          c39t0d3  100.00    0.50      16     954    0.00  117.10
          c41t0d3  100.00    0.50       9     614    0.00  295.52
16:18:51   c3t0d0    0.99    0.50       1       8    0.00   10.60
          c39t0d3   93.07    0.50      17     913    0.00  110.59
          c41t0d3  100.00    0.50       9     350    0.00  326.92
16:18:52  c39t0d3  100.00    0.50      21    1168    0.00  127.22
          c41t0d3   88.00    0.50      11     544    0.00  252.08
16:18:53   c3t0d0    2.02    0.50       3      48    0.00   18.51
          c39t0d3   88.89    0.50      19    1164    0.00   98.25
          c41t0d3  100.00    0.50      11     630    0.00  324.39
16:18:54   c3t0d0    3.00    0.50       3      20    0.00   12.39
          c39t0d3   95.00    0.50      20     954    0.00  131.90
          c41t0d3   81.00    0.50       9     610    0.00  289.05
16:18:55   c3t0d0    9.00    0.50      11     134    0.00    8.62
          c39t0d3  100.00    0.50      19    1090    0.00  137.20
          c41t0d3  100.00    0.50      11     512    0.00  327.16

Average   c39t0d3   99.50    0.50      16    1048    0.00  123.38
Average   c41t0d3  100.00    0.50       9     510    0.00  296.44
Average    c3t0d0    2.20    0.50       3      37    0.00   12.28
rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:20:04   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:20:05   c3t0d0    1.00    0.50       1      16    0.00    8.33
          c39t0d3   98.00    0.50      16     928    0.00  114.86
          c41t0d3   98.00    0.50      10     684    0.00  266.43
16:20:06   c3t0d0    1.98    0.50       4      81    0.00    8.57
          c39t0d3   93.07    0.50      19    1251    0.00  128.81
          c41t0d3   91.09    0.50       6     475    0.00  365.83
16:20:07   c3t0d0    2.00    0.50       3      48    0.00    5.87
          c39t0d3   98.00    0.50      23    1216    0.00  113.66
          c41t0d3   98.00    0.50       8     576    0.00  307.92
16:20:08   c3t0d0    1.00    0.50       2      32    0.00    5.36
          c39t0d3  100.00    0.50      21    1132    0.00  118.47
          c41t0d3  100.00    0.50       7     592    0.00  300.71
16:20:09   c3t0d0    6.00    0.58      13     194    2.22   26.05
          c39t0d3   89.00    0.50      17    1152    0.00  123.54
          c41t0d3   87.00    0.50       8     512    0.00  298.26
16:20:10   c3t0d0    3.00    0.50       6      96    0.00   22.78
          c39t0d3   85.00    0.50      17    1136    0.00  114.79
          c41t0d3   98.00    0.50       9     592    0.00  252.52
16:20:11   c3t0d0    1.00    0.50       1       2    0.00    8.04
          c39t0d3  100.00    0.50      17    1216    0.00  138.04
          c41t0d3  100.00    0.50      12     672    0.00  291.69
16:20:12   c3t0d0    2.00    0.50       3      34    0.00    9.24
          c39t0d3   99.00    0.50      16    1024    0.00  122.11
          c41t0d3   88.00    0.50       9     476    0.00  299.79
16:20:13  c39t0d3   91.00    0.50      18    1024    0.00  111.77
          c41t0d3   92.00    0.50       3     384    0.00  396.25
16:20:14  c39t0d3   99.00    0.50      17     892    0.00  132.15
          c41t0d3  100.00    0.50      10     608    0.00  233.54

Average    c3t0d0    1.80    0.53       3      50    0.87   17.64
Average   c39t0d3   96.00    0.50      18    1097    0.00  121.54
Average   c41t0d3  100.00    0.50       8     557    0.00  290.35

在业务人员下班后重启的双机软件，但在启动数据库时停在了Completed redo application这一步

SQL> startup
ORACLE instance started.

Total System Global Area 1.0318E+10 bytes
Fixed Size                  2073176 bytes
Variable Size            3238006184 bytes
Database Buffers         7063207936 bytes
Redo Buffers               14700544 bytes
Database mounted.

从alert.log文件中可以看到如下信息:

Tue Jul 15 22:23:29 2014
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =61
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes                = 500
  sessions                 = 555
  __shared_pool_size       = 3154116608
  __large_pool_size        = 16777216
  __java_pool_size         = 33554432
  __streams_pool_size      = 33554432
  sga_target               = 10317987840
  control_files            = /sx_data/ORCL/control01.ctl, /sx_data/ORCL/control02.ctl, /sx_data/ORCL/control03.ctl
  db_block_size            = 8192
  __db_cache_size          = 7063207936
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = LOCATION=/sx_data/arch_ORCL/
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = /oracle/flash_recovery_area
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  undo_retention           = 39600
  fast_start_parallel_rollback= FALSE
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=ORCLXDB)
  local_listener           = ORCL
  job_queue_processes      = 10
  background_dump_dest     = /oracle/admin/ORCL/bdump
  user_dump_dest           = /oracle/admin/ORCL/udump
  core_dump_dest           = /oracle/admin/ORCL/cdump
  audit_file_dest          = /oracle/admin/ORCL/adump
  db_name                  = ORCL
  open_cursors             = 2000
  optimizer_index_cost_adj = 20
  optimizer_index_caching  = 90
  pga_aggregate_target     = 2576351232
PMON started with pid=2, OS id=13613
PSP0 started with pid=3, OS id=13615
MMAN started with pid=4, OS id=13617
DBW0 started with pid=5, OS id=13619
LGWR started with pid=6, OS id=13621
CKPT started with pid=7, OS id=13623
SMON started with pid=8, OS id=13625
RECO started with pid=9, OS id=13627
CJQ0 started with pid=10, OS id=13629
MMON started with pid=11, OS id=13631
Tue Jul 15 22:23:30 2014
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=13635
Tue Jul 15 22:23:30 2014
starting up 1 shared server(s) ...
Tue Jul 15 22:23:31 2014
ALTER DATABASE   MOUNT
Tue Jul 15 22:23:39 2014
Setting recovery target incarnation to 2
Tue Jul 15 22:23:42 2014
Successful mount of redo thread 1, with mount id 1380841571
Tue Jul 15 22:23:42 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Tue Jul 15 22:23:42 2014
ALTER DATABASE OPEN
Tue Jul 15 22:23:47 2014
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Tue Jul 15 22:23:50 2014
Started redo scan
Tue Jul 15 22:23:52 2014
Completed redo scan
 336597 redo blocks read, 78835 data blocks need recovery
Tue Jul 15 22:23:52 2014
Started redo application at
 Thread 1: logseq 2270, block 29
Tue Jul 15 22:23:53 2014
Recovery of Online Redo Log: Thread 1 Group 4 Seq 2270 Reading mem 0
  Mem# 0: /sx_data/ORCL/redo04.log
Tue Jul 15 22:23:58 2014
Completed redo application

一直停在Completed redo application这，而这时的等待事件是checkpoint complete开始以为是并行恢复慢造成的，就查询了v$transaction，v$fast_start_transactions但视图中并没有进行恢复操作的事务存在.后来咨询了老熊，老熊说检查一下IO情况看是不是存储出问题了，如是再次检查存储IO性能:

rx6600-1:[/]#sar 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

22:36:41    %usr    %sys    %wio   %idle
22:36:42       2       2      12      84
22:36:43       1       0      12      87
22:36:44       0       0      17      83
22:36:45       0       0      13      87
22:36:46       0       1      12      87
22:36:47       2       1      13      84
22:36:48       1       1      16      82
22:36:49       0       0      12      88
22:36:50       0       0      12      88
22:36:51       0       0      22      78

Average        1       0      14      85

从上面可以看到现在实际上并没有业务在跑居然还存在IO等待这是不正常的

rx6600-2:[/]#bdf
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     983040  422504  556176   43% /
/dev/vg00/lvol1    1835008  135048 1686776    7% /stand
/dev/vg00/lvol8    8912896 8535352  374824   96% /var
/dev/vg00/lvol7    7962624 2762312 5159704   35% /usr
/dev/vg00/lvol4     524288   83192  437784   16% /tmp
/dev/vg00/tmplv    2064384   93512 1847942    5% /oratmp
/dev/vg00/orasoft  10256384 3144652 6668390   32% /orasoft
/dev/vg00/oracle   20480000 5480497 14062042   28% /oracle
/dev/vg00/lvol6    9076736 5206384 3840128   58% /opt
/dev/vg00/lvol5     131072   25472  104824   20% /home
/dev/cwjcvg/cwjc_datalv
                   414973952 134188114 263239842   34% /cwjc_data
/dev/sxvg/sx_datalv
                   624689152 298665485 305654147   49% /sx_data

rx6600-2:[/]#time dd if=/dev/zero of=/var/test bs=8k count=100000

下面对小机自身的磁盘进行IO测试写800M的数据只要12秒左右

msgcnt 2 vxfs: mesg 001: vx_nospace - /dev/vg00/lvol8 file system full (1 block extent)
I/O error 
47185+0 records in
47184+1 records out

real       11.7
user        0.0
sys         0.8

但是对EMC存储进行IO测试写800M的数据只要30多分还没有完成

rx6600-2:[/]#time dd if=/dev/zero of=/sx_data/test bs=8k count=100000
711856+0 records in
711855+0 records out

real    30:58.4
user        0.5
sys        13.0

这明显的是存储出了问题,后面得知管理人员早上10点多发现了存储有一个磁盘损坏了，存储做的raid 5,有热备盘.而且还有上百G的数据进行存储级的同步.与出现性能问题的时间一至。

到此问题原因找到了解决起来也就简单了.幸亏问题解决了，第二天有大领导来检查要不就…….哈哈

WLS 10.3.0 java.lang.IllegalArgumentException Registered more than one instance部署程序异常终止

Posted on 2014年7月11日2014年7月11日 by YongJing(敬勇)

单位上新上一应用在weblogic控制台进行应用程序更新发布出现
java.lang.IllegalArgumentException Registered more than one instance异常，然后weblogic就异常退出.

查看AdminServer.log日志可以看到如下信息:


####<2014-7-10 下午04时28分11秒 CST>     < [ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'>  <> <> <1404980891082>   
####<2014-7-10 下午04时28分21秒 CST>     < [ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'>  <> <> <1404980901145>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901148>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901148>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901149>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901173>  < [CompressingFilter/1.4.6] CompressingFilter is being destroyed...> 
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901176>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901177>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901177>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901178>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901178>   
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901190>  (RuntimeMBeanDelegate.java:255)
	at weblogic.management.runtime.RuntimeMBeanDelegate.(RuntimeMBeanDelegate.java:215)
	at weblogic.management.runtime.RuntimeMBeanDelegate.(RuntimeMBeanDelegate.java:193)
	at weblogic.work.WorkManagerRuntimeMBeanImpl.(WorkManagerRuntimeMBeanImpl.java:49)
	at weblogic.work.WorkManagerRuntimeMBeanImpl.getWorkManagerRuntime(WorkManagerRuntimeMBeanImpl.java:59)
	at weblogic.work.WorkManagerCollection.addWorkManagerRuntime(WorkManagerCollection.java:774)
	at weblogic.work.WorkManagerCollection.initialize(WorkManagerCollection.java:187)
	at weblogic.application.internal.flow.WorkManagerFlow.prepare(WorkManagerFlow.java:45)
	at weblogic.application.internal.BaseDeployment$1.next(BaseDeployment.java:1221)
	at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:41)
	at weblogic.application.internal.BaseDeployment.prepare(BaseDeployment.java:367)
	at weblogic.application.internal.SingleModuleDeployment.prepare(SingleModuleDeployment.java:43)
	at weblogic.application.internal.DeploymentStateChecker.prepare(DeploymentStateChecker.java:154)
	at weblogic.deploy.internal.targetserver.AppContainerInvoker.prepare(AppContainerInvoker.java:60)
	at weblogic.deploy.internal.targetserver.operations.RedeployOperation.createAndPrepareContainer(RedeployOperation.java:98)
	at weblogic.deploy.internal.targetserver.operations.RedeployOperation.doPrepare(RedeployOperation.java:122)
	at weblogic.deploy.internal.targetserver.operations.AbstractOperation.prepare(AbstractOperation.java:217)
	at weblogic.deploy.internal.targetserver.DeploymentManager.handleDeploymentPrepare(DeploymentManager.java:747)
	at weblogic.deploy.internal.targetserver.DeploymentManager.prepareDeploymentList(DeploymentManager.java:1216)
	at weblogic.deploy.internal.targetserver.DeploymentManager.handlePrepare(DeploymentManager.java:250)
	at weblogic.deploy.internal.targetserver.DeploymentServiceDispatcher.prepare(DeploymentServiceDispatcher.java:159)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.doPrepareCallback(DeploymentReceiverCallbackDeliverer.java:171)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.access$000(DeploymentReceiverCallbackDeliverer.java:13)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer$1.run(DeploymentReceiverCallbackDeliverer.java:46)
	at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528)
	at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
	at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
> 
####<2014-7-10 下午04时28分21秒 CST>     < [STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1404980901192>  (RuntimeMBeanDelegate.java:255)
	at weblogic.management.runtime.RuntimeMBeanDelegate.(RuntimeMBeanDelegate.java:215)
	at weblogic.work.RequestClassRuntimeMBeanImpl.(RequestClassRuntimeMBeanImpl.java:32)
	at weblogic.work.WorkManagerRuntimeMBeanImpl.getRequestClassRuntime(WorkManagerRuntimeMBeanImpl.java:86)
	at weblogic.work.WorkManagerRuntimeMBeanImpl.getWorkManagerRuntime(WorkManagerRuntimeMBeanImpl.java:61)
	at weblogic.work.WorkManagerCollection.addWorkManagerRuntime(WorkManagerCollection.java:774)
	at weblogic.work.WorkManagerCollection.initialize(WorkManagerCollection.java:187)
	at weblogic.application.internal.flow.WorkManagerFlow.prepare(WorkManagerFlow.java:45)
	at weblogic.application.internal.BaseDeployment$1.next(BaseDeployment.java:1221)
	at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:41)
	at weblogic.application.internal.BaseDeployment.prepare(BaseDeployment.java:367)
	at weblogic.application.internal.SingleModuleDeployment.prepare(SingleModuleDeployment.java:43)
	at weblogic.application.internal.DeploymentStateChecker.prepare(DeploymentStateChecker.java:154)
	at weblogic.deploy.internal.targetserver.AppContainerInvoker.prepare(AppContainerInvoker.java:60)
	at weblogic.deploy.internal.targetserver.operations.RedeployOperation.createAndPrepareContainer(RedeployOperation.java:98)
	at weblogic.deploy.internal.targetserver.operations.RedeployOperation.doPrepare(RedeployOperation.java:122)
	at weblogic.deploy.internal.targetserver.operations.AbstractOperation.prepare(AbstractOperation.java:217)
	at weblogic.deploy.internal.targetserver.DeploymentManager.handleDeploymentPrepare(DeploymentManager.java:747)
	at weblogic.deploy.internal.targetserver.DeploymentManager.prepareDeploymentList(DeploymentManager.java:1216)
	at weblogic.deploy.internal.targetserver.DeploymentManager.handlePrepare(DeploymentManager.java:250)
	at weblogic.deploy.internal.targetserver.DeploymentServiceDispatcher.prepare(DeploymentServiceDispatcher.java:159)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.doPrepareCallback(DeploymentReceiverCallbackDeliverer.java:171)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.access$000(DeploymentReceiverCallbackDeliverer.java:13)
	at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer$1.run(DeploymentReceiverCallbackDeliverer.java:46)
	at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528)
	at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
	at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)

从上面的错误信息可以看到

 redeploy operation for application, hninsiis

这说明是在更新应用程序重新加载hninsiis应用

java.lang.IllegalArgumentException: Registered more than one instance with the same objectName : com.bea:ServerRuntime=AdminServer,Name=default@hninsiis@null,WorkManagerRuntime=default,ApplicationRuntime=hninsiis,Type=RequestClassRuntime new:weblogic.work.RequestClassRuntimeMBeanImpl@1470f4af existing weblogic.work.RequestClassRuntimeMBeanImpl@3702476f

这个信息是说在加载应用程序时已经存在一个同名的应用程序，而实际上并不存在同名的应用程序

这时只能使用无所不能的MOS了,在MOS上找到了关于这个错误信息的文档

WLS 10.3.0: java.lang.IllegalArgumentException: Existing Timer Manager Has Different Work Manager (文档 ID 1097661.1)

In this Document
Symptoms
Cause
Solution
References

APPLIES TO:

Oracle Weblogic Server - Version 10.3 to 10.3
Information in this document applies to any platform.
***Checked for relevance on 2-May-2013***
SYMPTOMS

When trying to redeploy an application that uses a foreign JMS queue, the error below is shown. The server needs to be rebooted with every new deployment. Otherwise, the server fails consistently with the error: "Existing timer manager has different work manager."

### < [ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1272310540224>
CAUSE

This issue is caused by a flaw in the WLS code such that all MDB instances/pollers on the same WLS server of the same MDB deployment share the same timer manager for message polling. When one instance is disconnected due to JMS server migration, the timer manager is stopped, which in turn stops other MDB instances on the same server. This issue is addressed by unpublished defect 7669814.

SOLUTION

Patches are available for unpublished defect 7669814:

PATCH INFORMATION
WLS Version Patch Number
10.3.0 Patch 7669814
Fixed in: 10.3.1

To apply one of these patches, click on the link for your WLS version and download the appropriate patch from My Oracle Support. You can also search in My Oracle Support for the patch number for your WLS version: for detailed instructions, please see Master Note: How to Locate and Download Patches for WebLogic Server Using My Oracle Support Document 1302053.1. For instructions on how to apply these patches to your system, please see How to Apply WebLogic Server (WLS) Patches Using Smart Update Document 876004.1. For other issues relating to Smart Update, please see Master Note on Troubleshooting Smart Update Issues Document 1230725.1.

Patches are specifically tied to a particular release and maintenance pack of WebLogic Server (WLS). A patch for WLS 10.3.3, for example, very likely would not work on WLS 10.3.5. In a few cases, patches are also specific to particular operating systems. If you think you are experiencing the problem outlined in this note, and you are running a WLS version which is eligible for error correction (see Document 950131.1 for more about the Oracle Error Correction Policy), but your WLS version is not included in the list of patches provided here, please contact support and ask for a version of the patch appropriate for you. Please reference this note as well as this bug number to help speed our service for you.

NOTE: Patches are applied per WLS installation and not per domain. That is, if you apply this patch on one WLS installation, then all of the servers from all the domains in that installation will have this patch. On the other hand, if you have a managed server in another machine in a domain (that is, set up with its own WLS installation), you need to install this patch on that other machine as well. Generally, patches can only be applied while the server is not running because WLS locks the needed files while it is running. If, however, you are able to apply a patch while WLS is running, you must restart WLS before the patch will take effect.

从上面的信息说已经在10.3.1这个版本已经修复这个bug了.但我们的weblogic版本是10.3.3.0

[root@sx-weblogic31 lib]# java -cp weblogic.jar weblogic.version

WebLogic Server 10.3.3.0  Fri Apr 9 00:05:28 PDT 2010 1321401 

Use 'weblogic.version -verbose' to get subsystem information

Use 'weblogic.utils.Versions' to get version information for all modules

其实给weblogic打补丁提供了两种方法

1. Using Smart Update

You can apply the patch using Smart Update with the following steps:

Download the patch from My Oracle Support (MOS). For more details, please refer to Master Note: How to Locate and Download Patches for WebLogic Server Using My Oracle Support Note 1302053.1.
Extract the contents from the zip file: you will have a jar file and patch-catalog_xxx.xml. A readme file may also be included.
Copy the files (for example, E5W8.jar and WGQJ.jar) and the patch-catalog_xxx.xml from the zip file to the target machine. You do not need the readme file. Copy the files to the appropriate cache_dir directory for the target system: for example, on Windows, %MIDDLEWARE_HOME%\utils\bsu\cache_dir, or on UNIX, $MIDDLEWARE_HOME/utils/bsu/cache_dir. The directory MW_HOME\utils\bsu\cache_diris created as the default patch download directory when you install Smart Update 3.3.0. (see http://docs.oracle.com/cd/E14759_01/doc.32/e14143/start.htm#i1071723)

NOTE: Always copy the patch-catalog_xxx.xml file from the downloaded patch to the cache_dir along with the patch itself. Do NOTrename this file. If you see an "unrecognized patch ID" error, please refer to Note 1186923.1 for details on how to resolve it.
Run Smart Update and apply the patches and/or patch sets to the target system. This can be done using the Smart Update GUI or the command-line interface (see http://download.oracle.com/docs/cd/E14759_01/doc.32/e14143/commands.htm#i1074489).
a. Smart Update in graphical (GUI) mode

Run the /utils/bsu/bsu script (bsu.sh for UNIX, bsu.cmd for Windows). This will start the Smart Update GUI.
Look for the patches you copied in the "Downloaded Patches" section at the bottom.
Select the "Apply" button for each patch you want to apply. This will validate the patch and apply it to the whole installation.
The following viewlet provides an example of using Smart Update in GUI mode:

Video - Applying a Patch Using Smart Update in GUI Mode (1:15) Trouble seeing this video?

b. Command-line interface

This is the syntax for the command to view the downloaded patches as below:
./bsu.sh -prod_dir= -patch_download_dir= -status=downloaded -view -verbose
For example:
./bsu.sh -prod_dir=/opt/bea/weblogic92 -patch_download_dir=/opt/bea/utils/bsu/cache_dir -status=downloaded -view -verbose
This is the syntax for the command to install a patch:
./bsu.sh -prod_dir= -patchlist= -verbose -install
For example:
./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=E5W8 -verbose -install
./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=E5W8,WGQJ -verbose -install
This is the syntax for the command to check if the patch is installed:
./bsu.sh -prod_dir= -patch_download_dir= -status=applied -verbose -view
For example:
./bsu.sh -prod_dir=/opt/bea/weblogic92 -status=applied -verbose -view
2. Applying the patch to the classpath manually

You can apply the patch to the system manually by extracting the actual patch and adding it to the classpath on the system:

Extract the actual patch jar file. It will be in the form .jar (for example: E5W8.jar). Inside this jar file is the actual patch jar file, which will be of the form CR326566_92mp3.jar. Extract the latter file for the following steps.
Add the extracted jar file as the first element of the classpath of the Admin server as well as the managed servers in the domain.
If you are starting servers using the WebLogic Server startup script, update the classpath in the startup script like this:
set CLASSPATH=\jars\CR326566_92mp3.jar;%CLASSPATH% (Windows)
CLASSPATH=/jars/CR326566_92mp3.jar:$CLASSPATH (UNIX)
where PATCH_DIR is the directory on the machine where you extracted/saved the patch file.
Similarly, if you are starting servers using Node Manager, add the patch jar to the beginning of the Class Path argument in the Server Start tab for the server(s).
NOTE: Applying the patch to the classpath manually (approach 2) is recommended only for releases prior to WLS 9.1. Smart Update should be used when it is available as it provides patch conflict and dependency checking.
REFERENCES

上面的第一种方法是要连接网络通过GUI界面或者./bsu.sh的命令行接口来进行但是这也是一种使用不了的方法因为很少有人把weblogic服务器置于连网状态.
所以还是使用第二种方法先下补丁Patch 7669814,然后将补丁包中的.jar文件复制到weblogic服务器主机上然后将.jar包添加到classpath环境变量中.

补丁Patch 7669814包中的文件为KG6I.jar将其复制到weblogic的lib目录下

[root@sx-weblogic32 lib]# ls -lrt
total 28
-rw-r----- 1 root root   702 Mar 29  2011 readme.txt
-rw-r--r-- 1 root root 23302 Jun 10  2011 KG6I.jar

下面就需要修改weblogic的启动脚本将KG6I.jar文件加载到classpath环境变量中然后再重新启动weblogic
在startWebLogic.sh中增加下面的记录
CLASSPATH=/bea11/user_projects/domains/mydomain/lib/KG6I.jar:$CLASSPATH

在重新启动weblogic之后再来更新应用程序

####<2014-7-11 上午08时54分05秒 CST>     < [ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040045244>   
####<2014-7-11 上午08时54分05秒 CST>     < [ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040045438>   
####<2014-7-11 上午08时54分05秒 CST>     < [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040045474>   
####<2014-7-11 上午08时54分05秒 CST>     < [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040045476>   
####<2014-7-11 上午08时54分05秒 CST>     < [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040045730>  < [CompressingFilter/1.4.6] CompressingFilter has initialized> 
####<2014-7-11 上午08时54分14秒 CST>     < [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040054094>   
####<2014-7-11 上午08时54分14秒 CST>     < [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040054094>   
####<2014-7-11 上午08时54分14秒 CST>     < [ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'> < > <> <> <1405040054111>