IPv6 for MySQL

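MySQL's IPv6 support includes the following capabilities:
.The MySQL server can accept TCP/IP connections from clients that connect over IPv6. For example, the following command connects to the MySQL server on the local host over IPv6: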

[mysql@localhost ~]$ mysql -h ::1 -uusername -ppassword

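Two conditions must be met to use this capability:
.The operating system of the server host must be configured to support IPv6.
.The default MySQL server configuration must permit IPv6 connections in addition to IPv4 connections. To change the default configuration, start the MySQL server with an appropriate --bind-address option.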

.MySQL account names permit IPv6 addresses, which lets DBAs specify privileges for clients that connect to the server over IPv6. For example:

mysql> create user 'v6root'@'::1' identified by "abcd";
mysql> grant select on mysql.* to 'v6root'@'::1';

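.IPv6 functions convert between string and internal-format IPv6 addresses and check whether a value is a valid IPv6 address. For example, INET6_ATON() and INET6_NTOA() are similar to INET_ATON() and INET_NTOA(), but handle IPv6 addresses in addition to IPv4 addresses.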
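
To make the string-to-binary conversion concrete without a running server, here is a minimal Python sketch built on the standard ipaddress module. The function names inet6_aton and inet6_ntoa are hypothetical Python stand-ins chosen to mirror the SQL functions. They are not part of any MySQL client library, and they only approximate the SQL behavior by returning None for invalid input.

```python
import ipaddress

def inet6_aton(text):
    """Return the packed binary form of an IPv4 or IPv6 address string, or None if invalid."""
    try:
        return ipaddress.ip_address(text).packed
    except ValueError:
        return None

def inet6_ntoa(packed):
    """Return the text form of a 4-byte or 16-byte packed address, or None if invalid."""
    try:
        return str(ipaddress.ip_address(packed))
    except ValueError:
        return None

if __name__ == "__main__":
    binary = inet6_aton("::1")
    print(len(binary), inet6_ntoa(binary))
```

An IPv6 address packs into 16 bytes and an IPv4 address into 4 bytes, which is why a single binary column can hold both kinds of address.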

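Verifying System Support for IPv6
Before the MySQL server can accept IPv6 connections, the operating system on the server host must support IPv6. For example, on a Linux host you can run the following command: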

[root@localhost ~]# ping6 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.100 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from ::1: icmp_seq=3 ttl=64 time=0.035 ms
64 bytes from ::1: icmp_seq=4 ttl=64 time=0.039 ms
....

To produce a detailed description of the system's network interfaces, run ifconfig -a and look for IPv6 addresses in the output.

[root@localhost ~]# ifconfig -a
docker0: flags=4163  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:15ff:fe1d:e05a  prefixlen 64  scopeid 0x20
        ether 02:42:15:1d:e0:5a  txqueuelen 0  (Ethernet)
        RX packets 15  bytes 3638 (3.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 33  bytes 3825 (3.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160: flags=4163  mtu 1500
        inet 192.168.1.250  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::250:56ff:fea3:90e6  prefixlen 64  scopeid 0x20
        ether 00:50:56:a3:90:e6  txqueuelen 1000  (Ethernet)
        RX packets 5634313  bytes 364548809 (347.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19054  bytes 2337541 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
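
If you want to check this on many hosts, the inet6 lines can be pulled out of the ifconfig output programmatically. The following Python sketch assumes the output layout shown above (an unindented "name:" header followed by indented attribute lines). The function name inet6_addresses and the sample text are illustrative only.

```python
import re

def inet6_addresses(ifconfig_output):
    """Map each interface name to the list of inet6 addresses listed under it."""
    result = {}
    current = None
    for line in ifconfig_output.splitlines():
        header = re.match(r"^(\S+?):\s", line)  # e.g. "ens160: flags=..."
        if header:
            current = header.group(1)
            result[current] = []
            continue
        addr = re.match(r"^\s+inet6\s+(\S+)", line)
        if addr and current is not None:
            result[current].append(addr.group(1))
    return result

IFCONFIG_SAMPLE = """\
docker0: flags=4163  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:15ff:fe1d:e05a  prefixlen 64  scopeid 0x20
ens160: flags=4163  mtu 1500
        inet 192.168.1.250  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::250:56ff:fea3:90e6  prefixlen 64  scopeid 0x20
"""

if __name__ == "__main__":
    print(inet6_addresses(IFCONFIG_SAMPLE))
```

An interface that maps to an empty list has no IPv6 address configured.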

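If your operating system does not support IPv6, consult its documentation for instructions on enabling it. You might only need to reconfigure an existing network interface to add an IPv6 address, or a more extensive change might be required, such as rebuilding the kernel with IPv6 options enabled.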

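Configuring the MySQL Server to Permit IPv6 Connections
The MySQL server listens on a single network socket for TCP/IP connections. This socket is bound to a single address, but an address can map onto multiple network interfaces. To specify an address, use the --bind-address=addr option at server startup, where addr is an IPv4 or IPv6 address or a host name. (IPv6 addresses are not supported before MySQL 5.5.3.) If addr is a host name, the server resolves the name to an IP address and binds to that address.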

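The server treats different types of addresses as follows:
.If the address is *, the server accepts TCP/IP connections on all server host IPv6 and IPv4 interfaces if the server host supports IPv6, or on all IPv4 addresses otherwise. Use this address to permit both IPv4 and IPv6 connections on all server interfaces. This value is the default.

.If the address is 0.0.0.0, the server accepts TCP/IP connections only on the server host's IPv4 interfaces.

.If the address is ::, the server accepts TCP/IP connections on all server host IPv4 and IPv6 interfaces. Use this address to permit both IPv4 and IPv6 connections on all server interfaces.

.If the address is an IPv4-mapped address, the server accepts TCP/IP connections for that address, in either IPv4 or IPv6 format. For example, if the server is bound to ::ffff:127.0.0.1, clients can connect using --host=127.0.0.1 or --host=::ffff:127.0.0.1.

.If the address is a regular IPv4 or IPv6 address (such as 127.0.0.1 or ::1), the server accepts TCP/IP connections only for that IPv4 or IPv6 address.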
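
The rules above can be condensed into a small lookup. This Python sketch only restates the list in code. It does not query a server, and it does not resolve host names, so a host name raises ValueError here. The function name describe_bind_address is hypothetical.

```python
import ipaddress

def describe_bind_address(value):
    """Summarize which connections the server accepts for a bind-address value."""
    if value == "*":
        return "all IPv4 and IPv6 interfaces (IPv4 only if the host lacks IPv6)"
    if value == "0.0.0.0":
        return "all IPv4 interfaces"
    if value == "::":
        return "all IPv4 and IPv6 interfaces"
    addr = ipaddress.ip_address(value)  # raises ValueError for host names
    if addr.version == 6 and addr.ipv4_mapped is not None:
        # IPv4-mapped address: reachable in both notations
        return f"only {addr.ipv4_mapped} (IPv4 format) or {value} (IPv6 format)"
    return f"only {value}"

if __name__ == "__main__":
    for candidate in ("*", "0.0.0.0", "::", "::ffff:127.0.0.1", "::1"):
        print(candidate, "->", describe_bind_address(candidate))
```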

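If you intend to bind the server to a specific address, make sure that the mysql.user grant table contains an account with administrative privileges that you can use to connect to that address. Otherwise, you will not be able to shut down the server. For example, if you bind the server to *, you can connect using all existing accounts. But if you bind the server to ::1, it accepts connections only on that address. In that case, first make sure that the 'root'@'::1' account is present in the mysql.user table so that you can still connect to the server and shut it down.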

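Connecting Using the IPv6 Local Host Address
The following procedure shows how to configure MySQL to permit IPv6 connections by clients that connect to the local server using the ::1 local host address.
1.Start the MySQL server with an appropriate --bind-address option so that it accepts IPv6 connections. For example, set bind-address in the my.cnf option file and restart the server: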

[mysqld]
......
bind-address=*
......

[root@localhost ~]# systemctl stop mysqld
[root@localhost ~]# systemctl start mysqld
[root@localhost ~]# systemctl status mysqld
* mysqld.service - LSB: start and stop MySQL
   Loaded: loaded (/etc/rc.d/init.d/mysqld; bad; vendor preset: disabled)
   Active: active (running) since 二 2020-01-14 16:56:48 CST; 5s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 23257 ExecStart=/etc/rc.d/init.d/mysqld start (code=exited, status=0/SUCCESS)
    Tasks: 28
   Memory: 168.2M
   CGroup: /system.slice/mysqld.service
           |-23273 /bin/sh /mysqlsoft/mysql/bin/mysqld_safe --datadir=/mysqldata/mysql --pid-file=/mysqldata/mysql/mysqld.pid
           `-23591 /mysqlsoft/mysql/bin/mysqld --basedir=/mysqlsoft/mysql --datadir=/mysqldata/mysql --plugin-dir=/mysqlsoft/mysql/lib/plugin --user=mysql --log-error=/mysqldata/mysql/mysql.err --pid-file=/mysqldata/mysql/mysqld.pid ...

1月 14 16:56:46 localhost.localdomain systemd[1]: Starting LSB: start and stop MySQL...
1月 14 16:56:48 localhost.localdomain mysqld[23257]: Starting MySQL.. SUCCESS!
1月 14 16:56:48 localhost.localdomain systemd[1]: Started LSB: start and stop MySQL.

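Alternatively, you can bind the server to ::1, but that makes the server more restrictive for TCP/IP connections. It accepts only IPv6 connections for that single address and rejects IPv4 connections.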

2.As an administrator, connect to the server and create an account for a local user who will connect from the ::1 local IPv6 host address:

mysql> create user 'ipv6root'@'::1' identified by "abcd";
Query OK, 0 rows affected (0.01 sec)


mysql> grant select on *.* to 'ipv6root'@'::1';
Query OK, 0 rows affected (0.02 sec)

3.Invoke the mysql client to connect to the server using the new account:

[mysql@localhost mysql]$ mysql -h ::1 -uipv6root -pabcd mysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select host,user from user;
+-----------+---------------+
| host      | user          |
+-----------+---------------+
| %         | csroot        |
| %         | root          |
| ::1       | ipv6root      |
| localhost | mysql.session |
| localhost | mysql.sys     |
+-----------+---------------+
5 rows in set (0.00 sec)

4.Run the status command to display the connection information:

mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.26, for Linux (x86_64) using  EditLine wrapper

Connection id:          7
Current database:       mysql
Current user:           ipv6root@localhost
SSL:                    Cipher in use is DHE-RSA-AES256-SHA
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.26-log Source distribution
Protocol version:       10
Connection:             ::1 via TCP/IP
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    gb2312
Conn.  characterset:    gb2312
TCP port:               3306
Uptime:                 11 min 47 sec

Threads: 1  Questions: 263  Slow queries: 0  Opens: 336  Flush tables: 1  Open tables: 329  Queries per second avg: 0.371
--------------

mysql> SELECT CURRENT_USER(), @@bind_address;
+----------------+----------------+
| CURRENT_USER() | @@bind_address |
+----------------+----------------+
| ipv6root@::1   | *              |
+----------------+----------------+
1 row in set (0.00 sec)

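Connecting Using IPv6 Nonlocal Host Addresses
The following procedure shows how to configure the MySQL server to permit IPv6 connections from remote hosts. It is similar to the preceding procedure for local connections, but the server and the client run on different hosts, each with its own IPv6 address.
The server host has the IPv6 address 2018::1: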

[root@shard1 network-scripts]# ifconfig -a
ens160: flags=4163  mtu 1500
        inet6 2018::1  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:51:0c  txqueuelen 1000  (Ethernet)
        RX packets 27089788  bytes 2770300693 (2.5 GiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 6720001  bytes 532979878 (508.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The client host has the IPv6 address 2018::2:

ens160: flags=4163  mtu 1500
        inet6 2018::2  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:4e:69  txqueuelen 1000  (Ethernet)
        RX packets 25337245  bytes 1913014541 (1.7 GiB)
        RX errors 0  dropped 127  overruns 0  frame 0
        TX packets 1536650  bytes 166036937 (158.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

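The procedure is as follows:
1.Start the MySQL server with an appropriate --bind-address option so that it accepts IPv6 connections. For example, set bind-address in the my.cnf option file and restart the server: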

[mysqld]
......
bind-address=*
......

[root@localhost ~]# systemctl stop mysqld
[root@localhost ~]# systemctl start mysqld
[root@localhost ~]# systemctl status mysqld
* mysqld.service - LSB: start and stop MySQL
   Loaded: loaded (/etc/rc.d/init.d/mysqld; bad; vendor preset: disabled)
   Active: active (running) since 二 2020-01-14 16:56:48 CST; 5s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 23257 ExecStart=/etc/rc.d/init.d/mysqld start (code=exited, status=0/SUCCESS)
    Tasks: 28
   Memory: 168.2M
   CGroup: /system.slice/mysqld.service
           |-23273 /bin/sh /mysqlsoft/mysql/bin/mysqld_safe --datadir=/mysqldata/mysql --pid-file=/mysqldata/mysql/mysqld.pid
           `-23591 /mysqlsoft/mysql/bin/mysqld --basedir=/mysqlsoft/mysql --datadir=/mysqldata/mysql --plugin-dir=/mysqlsoft/mysql/lib/plugin --user=mysql --log-error=/mysqldata/mysql/mysql.err --pid-file=/mysqldata/mysql/mysqld.pid ...

1月 14 16:56:46 localhost.localdomain systemd[1]: Starting LSB: start and stop MySQL...
1月 14 16:56:48 localhost.localdomain mysqld[23257]: Starting MySQL.. SUCCESS!
1月 14 16:56:48 localhost.localdomain systemd[1]: Started LSB: start and stop MySQL.

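Alternatively, you can bind the server to 2018::1, but that makes the server more restrictive for TCP/IP connections. It accepts only IPv6 connections for that single address and rejects IPv4 connections.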

2.On the server host (2018::1), create an account for a user who will connect from the client host (2018::2):

mysql> create user 'csroot'@'%' identified by "123456";
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL PRIVILEGES ON *.* TO 'csroot'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

3.On the client host (2018::2), invoke the mysql client to connect to the server using the new account (here the MariaDB client is used):

-bash-4.2$ mysql -h 2018::1 -ucsroot -p123456 mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.6.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [mysql]>

4.Run the status command to display the connection information:

MySQL [mysql]> status
--------------
mysql  Ver 15.1 Distrib 10.0.38-MariaDB, for Linux (x86_64) using readline 5.1

Connection id:          6
Current database:       mysql
Current user:           csroot@2018::2
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server:                 MySQL
Server version:         5.6.23 MySQL Community Server (GPL)
Protocol version:       10
Connection:             2018::1 via TCP/IP
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    gb2312
Conn.  characterset:    gb2312
TCP port:               3306
Uptime:                 34 days 50 min 24 sec

Threads: 2  Questions: 175  Slow queries: 0  Opens: 87  Flush tables: 1  Open tables: 80  Queries per second avg: 0.000
--------------

MySQL [mysql]> select current_user(),@@bind_address;
+----------------+----------------+
| current_user() | @@bind_address |
+----------------+----------------+
| csroot@%       | *              |
+----------------+----------------+
1 row in set (0.00 sec)

MySQL Strict SQL MODE

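Strict SQL mode controls how MySQL handles invalid or missing values in data-change statements such as INSERT or UPDATE. A value can be invalid for several reasons. For example, it might have the wrong data type for the column, or it might be out of range. A value is missing when a new row to be inserted does not contain a value for a non-NULL column that has no explicit DEFAULT clause in its definition. (For a NULL column, NULL is inserted if the value is missing.) Strict SQL mode also affects DDL statements such as CREATE TABLE.

If strict SQL mode is not in effect, MySQL inserts adjusted values for invalid or missing values and produces warnings. In strict SQL mode, you can produce this behavior by using INSERT IGNORE or UPDATE IGNORE.

For statements such as SELECT that do not change data, invalid values generate a warning in strict SQL mode, not an error.

Strict SQL mode produces an error for attempts to create a key that exceeds the maximum key length. When strict SQL mode is not enabled, this results in a warning and truncation of the key to the maximum key length.

Strict SQL mode does not affect whether foreign key constraints are checked. The foreign_key_checks system variable can be used for that.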

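Strict SQL mode is in effect if either STRICT_ALL_TABLES or STRICT_TRANS_TABLES is enabled, although the effects of these modes differ somewhat:
.For transactional tables, an error occurs for invalid or missing values in a data-change statement when either STRICT_ALL_TABLES or STRICT_TRANS_TABLES is enabled. The statement is aborted and rolled back.

.For nontransactional tables, the behavior is the same for either mode if the bad value occurs in the first row to be inserted or updated: the statement is aborted and the table remains unchanged. If the statement inserts or modifies multiple rows and the bad value occurs in the second or a later row, the result depends on which strict mode is enabled:
-For STRICT_ALL_TABLES, MySQL returns an error and ignores the rest of the rows. However, because the earlier rows have already been inserted or updated, the result is a partial update. To avoid this, use single-row statements, which can be aborted without changing the table.

-For STRICT_TRANS_TABLES, MySQL converts an invalid value to the closest valid value for the column and inserts the adjusted value. If a value is missing, MySQL inserts the implicit default value for the column data type. In either case, MySQL generates a warning rather than an error and continues processing the statement.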
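
The difference between the two strict modes is easiest to see on a multi-row INSERT. The following Python sketch is a toy model of the rules above for a single integer column. It is not MySQL code. The names to_int and insert_rows are hypothetical, and 0 stands in for the "closest valid value" of an invalid integer.

```python
def to_int(value):
    """Return (converted, ok); ok is False when the value is not a valid integer."""
    try:
        return int(value), True
    except (TypeError, ValueError):
        return 0, False  # assumption: 0 plays the role of the adjusted value

def insert_rows(values, mode, transactional):
    """Simulate a multi-row INSERT into one integer column.

    mode is "", "STRICT_TRANS_TABLES" or "STRICT_ALL_TABLES".
    Returns (stored_rows, warning_count, error_message_or_None).
    """
    stored, warnings = [], 0
    for index, raw in enumerate(values):
        number, ok = to_int(raw)
        if ok:
            stored.append(number)
            continue
        strict_here = (
            mode == "STRICT_ALL_TABLES"
            or (mode == "STRICT_TRANS_TABLES" and (transactional or index == 0))
        )
        if strict_here:
            if transactional:
                stored = []  # the whole statement is rolled back
            return stored, warnings, f"invalid value at row {index + 1}"
        stored.append(number)  # the adjusted value is stored with a warning
        warnings += 1
    return stored, warnings, None

if __name__ == "__main__":
    rows = ["1", "abc", "3"]
    print(insert_rows(rows, "STRICT_ALL_TABLES", transactional=False))
    print(insert_rows(rows, "STRICT_TRANS_TABLES", transactional=False))
    print(insert_rows(rows, "STRICT_TRANS_TABLES", transactional=True))
```

In the first call the table keeps the row inserted before the error, which is the partial update described for STRICT_ALL_TABLES on nontransactional tables.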

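Strict SQL mode affects handling of division by zero, zero dates, and zeros in dates as follows:
.Strict SQL mode affects handling of division by zero, which includes MOD(N,0). For data-change operations (INSERT, UPDATE):
-If strict SQL mode is not enabled, division by zero inserts NULL and produces no warning.
-If strict SQL mode is enabled, division by zero produces an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, division by zero inserts NULL and produces a warning.

For SELECT, division by zero returns NULL. Enabling strict SQL mode causes a warning to be produced as well.

.Strict SQL mode affects whether the server permits '0000-00-00' as a valid date:
-If strict SQL mode is not enabled, '0000-00-00' is permitted and inserts produce no warning.
-If strict SQL mode is enabled, '0000-00-00' is not permitted and inserts produce an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, '0000-00-00' is permitted and inserts produce a warning.

.Strict SQL mode affects whether the server permits dates in which the year part is nonzero but the month or day part is 0 (dates such as '2010-00-01' or '2010-01-00'):
-If strict SQL mode is not enabled, dates with zero parts are permitted and inserts produce no warning.
-If strict SQL mode is enabled, dates with zero parts are not permitted and inserts produce an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, dates with zero parts are inserted as '0000-00-00' and produce a warning.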

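Comparison of the IGNORE Keyword and Strict SQL Mode
This section compares the effect on statement execution of the IGNORE keyword (which downgrades errors to warnings) and strict SQL mode (which upgrades warnings to errors). It describes which statements they affect and which errors they apply to.

The Effect of IGNORE on Statement Execution
Several statements in MySQL support an optional IGNORE keyword. This keyword causes the server to downgrade certain types of errors and generate warnings instead. For a multiple-row statement, IGNORE causes the statement to skip to the next row instead of aborting.

For example, if the table t2 has a primary key column i, attempting to insert the same value of i into multiple rows normally produces a duplicate-key error: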

mysql> insert into t2(i) values(1),(1);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

With the IGNORE keyword, the row containing the duplicate key still is not inserted, but a warning occurs instead of an error:

mysql> insert ignore into t2(i) values(1),(1);
Query OK, 1 row affected, 1 warning (0.15 sec)
Records: 2  Duplicates: 1  Warnings: 1

mysql> show warnings;
+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1062 | Duplicate entry '1' for key 'PRIMARY' |
| Warning | 1062 | Duplicate entry '1' for key 'PRIMARY' |
+---------+------+---------------------------------------+
2 rows in set (0.00 sec)

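The following statements support the IGNORE keyword:
.CREATE TABLE … SELECT: IGNORE does not apply to the CREATE TABLE or SELECT parts of the statement but to inserts into the table of the rows produced by the SELECT. Rows that duplicate an existing row on a unique key value are discarded.

.DELETE: IGNORE causes MySQL to ignore errors while deleting rows.

.INSERT: With IGNORE, rows that duplicate an existing row on a unique key value are discarded. Rows set to values that would cause data conversion errors are set to the closest valid values instead.

For partitioned tables where no partition matching a given value is found, IGNORE causes the insert operation to fail silently for rows containing the unmatched value.

.LOAD DATA, LOAD XML: With IGNORE, rows that duplicate an existing row on a unique key value are discarded.

.UPDATE: With IGNORE, rows for which a duplicate-key conflict occurs on a unique key value are not updated. Rows updated to values that would cause data conversion errors are updated to the closest valid values instead.

The IGNORE keyword applies to the following errors: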
ER_BAD_NULL_ERROR
ER_DUP_ENTRY
ER_DUP_ENTRY_WITH_KEY_NAME
ER_DUP_KEY
ER_NO_PARTITION_FOR_GIVEN_VALUE
ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT
ER_NO_REFERENCED_ROW_2
ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET
ER_ROW_IS_REFERENCED_2
ER_SUBQUERY_NO_1_ROW
ER_VIEW_CHECK_FAILED

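The Effect of Strict SQL Mode on Statement Execution
The MySQL server can operate in different SQL modes, and can apply these modes differently for different clients, depending on the value of the sql_mode system variable. In strict SQL mode, the server upgrades certain warnings to errors.

For example, in non-strict SQL mode, inserting the string 'abc' into an integer column results in conversion of the value to 0 and a warning: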

mysql> SET sql_mode = '';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t2(i) values('abc');
Query OK, 1 row affected, 1 warning (0.02 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'abc' for column 'i' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

In strict SQL mode, the invalid value is rejected with an error:

mysql> SET sql_mode = 'STRICT_ALL_TABLES';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t2(i) values('abc');
ERROR 1366 (HY000): Incorrect integer value: 'abc' for column 'i' at row 1

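Strict SQL mode applies to the following statements under conditions in which some value might be out of range or an invalid row is inserted into or deleted from a table: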
alter table
create table
create table … select
delete (single-table and multiple-table)
insert
load data
load xml
select sleep()
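update (single-table and multiple-table)
Within stored programs, individual statements of the types just listed execute in strict SQL mode if the program was defined while strict mode was in effect.

Strict SQL mode applies to the following errors, which represent a class of errors in which an input value is either invalid or missing. A value is invalid if it has the wrong data type for the column or is out of range. A value is missing if a new row to be inserted does not contain a value for a NOT NULL column that has no explicit DEFAULT clause in its definition.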
ER_BAD_NULL_ERROR
ER_CUT_VALUE_GROUP_CONCAT
ER_DATA_TOO_LONG
ER_DATETIME_FUNCTION_OVERFLOW
ER_DIVISION_BY_ZERO
ER_INVALID_ARGUMENT_FOR_LOGARITHM
ER_NO_DEFAULT_FOR_FIELD
ER_NO_DEFAULT_FOR_VIEW_FIELD
ER_TOO_LONG_KEY
ER_TRUNCATED_WRONG_VALUE
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD
ER_WARN_DATA_OUT_OF_RANGE
ER_WARN_NULL_TO_NOTNULL
ER_WARN_TOO_FEW_RECORDS
ER_WRONG_ARGUMENTS
ER_WRONG_VALUE_FOR_TYPE
WARN_DATA_TRUNCATED

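MySQL SQL Modes

The MySQL server can operate in different SQL modes, and can apply these modes differently for different clients, depending on the value of the sql_mode system variable. DBAs can set the global SQL mode to match the server's operating requirements, and each application can set its session SQL mode to its own requirements.

SQL modes affect the SQL syntax that MySQL supports and the data validation checks that it performs. This makes it easier to use MySQL in different environments and together with other database servers.

When working with InnoDB tables, also consider the innodb_strict_mode system variable. It enables additional error checks for InnoDB tables.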

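Setting the SQL Mode
The default SQL mode in MySQL 5.7 includes these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, and NO_ENGINE_SUBSTITUTION. ONLY_FULL_GROUP_BY and STRICT_TRANS_TABLES were added to the default in MySQL 5.7.5. NO_AUTO_CREATE_USER was added in MySQL 5.7.7. ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_IN_DATE, and NO_ZERO_DATE were added in MySQL 5.7.8.

To set the SQL mode at server startup, use the --sql-mode="modes" option on the command line, or sql-mode="modes" in an option file such as my.cnf (Unix) or my.ini (Windows). modes is a comma-separated list of modes. To clear the SQL mode explicitly, set it to an empty string using --sql-mode="" on the command line, or sql-mode="" in an option file.

MySQL installation programs may configure the SQL mode during the installation process. For example, mysql_install_db creates a default option file named my.cnf in the base installation directory. This file contains a line that sets the SQL mode.

If the SQL mode differs from the default or from what you expect, check for a setting in an option file that the server reads at startup.

To change the SQL mode at runtime, set the global or session sql_mode system variable using a SET statement:
set global sql_mode='modes';
set session sql_mode='modes';

Setting the GLOBAL variable requires the SUPER privilege and affects the operation of all clients that connect from that time on. Setting the SESSION variable affects only the current client. Each client can change its session sql_mode value at any time.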

To determine the current global or session sql_mode value, use the following statements:

mysql> SELECT @@GLOBAL.sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@GLOBAL.sql_mode                                                                                                                         |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT @@SESSION.sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@SESSION.sql_mode                                                                                                                        |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
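
When the value returned by these queries is long, a quick comparison against the documented default helps to spot what was changed. The following Python sketch works on the sql_mode string only and does not connect to a server. DEFAULT_57 is the MySQL 5.7 default list quoted earlier in this section, and the function names are hypothetical.

```python
DEFAULT_57 = (
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,"
    "ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
)

def parse_modes(sql_mode):
    """Split a comma-separated sql_mode value into a list of mode names."""
    return [m.strip().upper() for m in sql_mode.split(",") if m.strip()]

def compare_with_default(sql_mode, default=DEFAULT_57):
    """Return (modes missing from sql_mode, modes added to sql_mode)."""
    current = set(parse_modes(sql_mode))
    expected = set(parse_modes(default))
    return sorted(expected - current), sorted(current - expected)

if __name__ == "__main__":
    missing, extra = compare_with_default("STRICT_ALL_TABLES,NO_ZERO_DATE")
    print("missing:", missing)
    print("extra:", extra)
```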

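Note that changing the server SQL mode after creating partitioned tables and inserting data into them can cause major changes in the behavior of those tables and could lead to loss or corruption of data. It is strongly recommended that you never change the SQL mode once you have created tables that use user-defined partitioning.

When replicating partitioned tables, differing SQL modes on the master and the slave can also lead to problems. For best results, always use the same SQL mode on the master and the slave.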

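The Most Important SQL Modes
The most important sql_mode values are probably these:
.ANSI
This mode changes syntax and behavior to conform more closely to standard SQL. It is one of the special combination modes.

.STRICT_TRANS_TABLES
If a value cannot be inserted as given into a transactional table, abort the statement. For a nontransactional table, abort the statement if the value occurs in a single-row statement or in the first row of a multiple-row statement. As of MySQL 5.7.5, the default SQL mode includes STRICT_TRANS_TABLES.

.TRADITIONAL
Make MySQL behave like a "traditional" SQL database system. Put simply, this mode gives an error instead of a warning when an incorrect value is inserted into a column. It is one of the special combination modes. With this mode, an INSERT or UPDATE aborts as soon as an error is noticed. This may not be what you want if you are using a nontransactional storage engine, because data changes made before the error may not be rolled back, resulting in a "partially done" update.

The term "strict mode" means a mode with either or both of STRICT_TRANS_TABLES and STRICT_ALL_TABLES enabled.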

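Full List of SQL Modes
The following list describes all supported SQL modes:
.ALLOW_INVALID_DATES
Do not perform full checking of dates. Check only that the month is in the range from 1 to 12 and the day is in the range from 1 to 31. This can be useful for Web applications that obtain the year, month, and day in three different fields and want to store exactly what the user entered, without date validation. This mode applies to DATE and DATETIME columns. It does not apply to TIMESTAMP columns, which always require a valid date.

When ALLOW_INVALID_DATES is disabled, the server requires that month and day values be legal, and not merely in the ranges 1 to 12 and 1 to 31, respectively. With strict mode disabled, an invalid date such as '2014-04-31' is converted to '0000-00-00' and a warning is generated. With strict mode enabled, an invalid date generates an error. To permit such dates, enable ALLOW_INVALID_DATES.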
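
The two levels of date checking can be sketched in a few lines of Python. This is a model of the rule described above, not server code. The function name date_is_accepted is hypothetical, and zero months or days (which are governed by other modes) are outside the scope of the sketch.

```python
import datetime

def date_is_accepted(year, month, day, allow_invalid_dates):
    """Model the check applied to DATE and DATETIME values."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return False  # out of range in either case
    if allow_invalid_dates:
        return True  # only the ranges are checked
    try:
        datetime.date(year, month, day)  # full calendar check
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    print(date_is_accepted(2014, 4, 31, allow_invalid_dates=True))
    print(date_is_accepted(2014, 4, 31, allow_invalid_dates=False))
```

April has 30 days, so '2014-04-31' passes only the range check.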

.ANSI_QUOTES
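Treat " as an identifier quote character (like the ` quote character) and not as a string quote character. You can still use ` to quote identifiers with this mode enabled. With ANSI_QUOTES enabled, you cannot use double quotation marks to quote literal strings because they are interpreted as identifiers.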

.ERROR_FOR_DIVISION_BY_ZERO
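The ERROR_FOR_DIVISION_BY_ZERO mode affects handling of division by zero, which includes MOD(N,0). For data-change operations (INSERT, UPDATE), its effect also depends on whether strict SQL mode is enabled.
-If this mode is not enabled, division by zero inserts NULL and produces no warning.
-If this mode is enabled, division by zero inserts NULL and produces a warning.
-If this mode and strict SQL mode are both enabled, division by zero produces an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, division by zero inserts NULL and produces a warning.

For SELECT, division by zero returns NULL. Enabling ERROR_FOR_DIVISION_BY_ZERO causes a warning to be produced as well, regardless of whether strict SQL mode is enabled.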
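
The combinations above form a small decision table. The following Python sketch encodes that table as described in this section. It is an illustration rather than server logic, and the function name division_by_zero_outcome is hypothetical.

```python
def division_by_zero_outcome(statement, mode_enabled, strict, ignore=False):
    """Return (result, severity) for a division by zero.

    statement is "SELECT" or a data-change statement such as "INSERT".
    result is "NULL", or None when the statement fails.
    severity is None, "warning" or "error".
    """
    if statement.upper() == "SELECT":
        # SELECT always returns NULL; the mode only adds a warning
        return "NULL", ("warning" if mode_enabled else None)
    if not mode_enabled:
        return "NULL", None
    if strict and not ignore:
        return None, "error"
    return "NULL", "warning"

if __name__ == "__main__":
    print(division_by_zero_outcome("INSERT", mode_enabled=True, strict=True))
    print(division_by_zero_outcome("INSERT", mode_enabled=True, strict=True, ignore=True))
```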

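As of MySQL 5.7.4, ERROR_FOR_DIVISION_BY_ZERO is deprecated. In MySQL 5.7.4 through 5.7.7, ERROR_FOR_DIVISION_BY_ZERO does nothing when named explicitly. Instead, its effect is included in the effects of strict SQL mode. In MySQL 5.7.8 and later, ERROR_FOR_DIVISION_BY_ZERO does have an effect when named explicitly and is not part of strict SQL mode, as before MySQL 5.7.4. However, it should be used in conjunction with strict mode and is enabled by default. A warning occurs if ERROR_FOR_DIVISION_BY_ZERO is enabled without also enabling strict mode, or vice versa.

Because ERROR_FOR_DIVISION_BY_ZERO is deprecated, it will be removed in a future MySQL release as a separate mode name and its effect will be included in the effects of strict SQL mode.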

.HIGH_NOT_PRECEDENCE
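By default, the NOT operator has a precedence such that an expression like NOT a BETWEEN b AND c is parsed as NOT (a BETWEEN b AND c). In some older versions of MySQL, the expression was parsed as (NOT a) BETWEEN b AND c. The old higher-precedence behavior can be obtained by enabling the HIGH_NOT_PRECEDENCE mode.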

mysql> SET sql_mode = '';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT NOT 1 BETWEEN -5 AND 5;
+------------------------+
| NOT 1 BETWEEN -5 AND 5 |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.00 sec)

mysql> SET sql_mode = 'HIGH_NOT_PRECEDENCE';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT NOT 1 BETWEEN -5 AND 5;
+------------------------+
| NOT 1 BETWEEN -5 AND 5 |
+------------------------+
|                      1 |
+------------------------+
1 row in set (0.00 sec)
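
The two results follow directly from where the parentheses fall. The short Python sketch below evaluates both parses for the same operands, using 1 and 0 for true and false as MySQL does. The helper names between and sql_not are hypothetical.

```python
def between(x, low, high):
    """SQL-style BETWEEN returning 1 or 0."""
    return int(low <= x <= high)

def sql_not(x):
    """SQL-style NOT returning 1 or 0."""
    return int(not x)

a, low, high = 1, -5, 5
default_parse = sql_not(between(a, low, high))   # NOT (a BETWEEN low AND high)
high_not_parse = between(sql_not(a), low, high)  # (NOT a) BETWEEN low AND high

if __name__ == "__main__":
    print(default_parse, high_not_parse)
```

With the default precedence, 1 lies between -5 and 5, so the negation gives 0. With HIGH_NOT_PRECEDENCE, NOT 1 is evaluated first and gives 0, which also lies between -5 and 5, so the result is 1.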

.IGNORE_SPACE
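Permit spaces between a function name and the ( character. This causes built-in function names to be treated as reserved words. As a result, identifiers that are the same as function names must be quoted. For example, because there is a COUNT() function, using count as a table name fails once IGNORE_SPACE is enabled, unless the name is quoted: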

mysql> CREATE TABLE count (i INT);
Query OK, 0 rows affected (0.13 sec)

mysql> drop table count cascade;
Query OK, 0 rows affected (0.09 sec)

mysql> SET sql_mode = 'IGNORE_SPACE';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE count (i INT);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'count (i INT)' at line 1

mysql> create table `count` (i INT);
Query OK, 0 rows affected (0.16 sec)

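The IGNORE_SPACE mode applies to built-in functions, not to user-defined functions or stored functions. Spaces are always permitted after the name of a user-defined function or stored function, regardless of whether IGNORE_SPACE is enabled.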

.NO_AUTO_CREATE_USER
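Prevent the GRANT statement from automatically creating new user accounts unless authentication information is specified. The statement must specify a nonempty password using IDENTIFIED BY or an authentication plugin using IDENTIFIED WITH.

It is preferable to create MySQL accounts with CREATE USER rather than GRANT. NO_AUTO_CREATE_USER is deprecated and the default SQL mode includes NO_AUTO_CREATE_USER. Assignments to sql_mode that change the NO_AUTO_CREATE_USER mode state produce a warning, except assignments that set sql_mode to DEFAULT. NO_AUTO_CREATE_USER will be removed in a future MySQL release, at which point its effect will be enabled at all times.

Previously, before NO_AUTO_CREATE_USER was deprecated, one reason not to enable it was that it was not replication safe. Now it can be enabled, and replication-safe user management can be performed with CREATE USER IF NOT EXISTS, DROP USER IF EXISTS, and ALTER USER IF EXISTS rather than GRANT. These statements enable safe replication when slaves may have different grants than those on the master.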

.NO_AUTO_VALUE_ON_ZERO
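NO_AUTO_VALUE_ON_ZERO affects handling of AUTO_INCREMENT columns. Normally, you generate the next sequence number for the column by inserting either NULL or 0 into it. NO_AUTO_VALUE_ON_ZERO suppresses this behavior for 0 so that only NULL generates the next sequence number. This mode can be useful if 0 has been stored in a table's AUTO_INCREMENT column. (Storing 0 is not a recommended practice.) For example, if you dump the table with mysqldump and then reload it, MySQL normally generates new sequence numbers when it encounters the 0 values, resulting in a table with contents different from the one that was dumped. Enabling NO_AUTO_VALUE_ON_ZERO before reloading the dump file solves this problem. mysqldump now automatically includes in its output a statement that enables NO_AUTO_VALUE_ON_ZERO to avoid this problem.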
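
A compact way to see the rule is to model what ends up in the column for one inserted value. The Python sketch below is such a model. The function name stored_id and its parameters are hypothetical, with None standing for SQL NULL and next_value for the next sequence number.

```python
def stored_id(inserted, next_value, no_auto_value_on_zero):
    """Return the value stored in an AUTO_INCREMENT column for one inserted value."""
    if inserted is None:
        return next_value  # NULL always generates the next sequence number
    if inserted == 0 and not no_auto_value_on_zero:
        return next_value  # 0 behaves like NULL unless the mode is enabled
    return inserted

if __name__ == "__main__":
    # Reloading a dumped row whose id is 0, with the next sequence number being 5
    print(stored_id(0, 5, no_auto_value_on_zero=False))
    print(stored_id(0, 5, no_auto_value_on_zero=True))
```

Only with the mode enabled does the reloaded row keep its original id of 0.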

.NO_BACKSLASH_ESCAPES
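Disable the use of the backslash character (\) as an escape character within strings. With this mode enabled, backslash becomes an ordinary character like any other.

.NO_DIR_IN_CREATE
When creating a table, ignore all INDEX DIRECTORY and DATA DIRECTORY directives. This option is useful on slave replication servers.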

.NO_ENGINE_SUBSTITUTION
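Control automatic substitution of the default storage engine when a statement such as CREATE TABLE or ALTER TABLE specifies a storage engine that is disabled or not compiled in. The default SQL mode includes NO_ENGINE_SUBSTITUTION. Because storage engines can be plugged in at runtime, unavailable engines are treated the same way:
With NO_ENGINE_SUBSTITUTION disabled, for CREATE TABLE the default engine is used and a warning occurs if the desired engine is unavailable. For ALTER TABLE, a warning occurs and the table is not altered.
With NO_ENGINE_SUBSTITUTION enabled, an error occurs and the table is not created or altered if the desired engine is unavailable.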

.NO_FIELD_OPTIONS
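Do not print MySQL-specific column options in the output of SHOW CREATE TABLE. This mode is used by mysqldump in portability mode.

.NO_KEY_OPTIONS
Do not print MySQL-specific index options in the output of SHOW CREATE TABLE. This mode is used by mysqldump in portability mode.

.NO_TABLE_OPTIONS
Do not print MySQL-specific table options (such as ENGINE) in the output of SHOW CREATE TABLE. This mode is used by mysqldump in portability mode.

.NO_UNSIGNED_SUBTRACTION
Subtraction between integer values, where one is of type UNSIGNED, produces an unsigned result by default. If the result would otherwise have been negative, an error results: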

mysql> SET sql_mode = '';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT CAST(0 AS UNSIGNED) - 1;
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(cast(0 as unsigned) - 1)'

If the NO_UNSIGNED_SUBTRACTION mode is enabled, the result is negative:

mysql> SET sql_mode = 'NO_UNSIGNED_SUBTRACTION';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT CAST(0 AS UNSIGNED) - 1;
+-------------------------+
| CAST(0 AS UNSIGNED) - 1 |
+-------------------------+
|                      -1 |
+-------------------------+
1 row in set (0.00 sec)
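
The behavior in the two transcripts can be modeled with plain integer arithmetic and explicit range checks. The Python sketch below assumes 64-bit BIGINT ranges. The function name subtract_unsigned is hypothetical, and OverflowError stands in for the server's ERROR 1690.

```python
UINT64_MAX = 2**64 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def subtract_unsigned(a, b, no_unsigned_subtraction):
    """Subtract b from the BIGINT UNSIGNED value a."""
    result = a - b
    if no_unsigned_subtraction:
        # The result is treated as signed
        if not INT64_MIN <= result <= INT64_MAX:
            raise OverflowError("BIGINT value is out of range")
        return result
    # Default: the result must fit an unsigned 64-bit integer
    if not 0 <= result <= UINT64_MAX:
        raise OverflowError("BIGINT UNSIGNED value is out of range")
    return result

if __name__ == "__main__":
    print(subtract_unsigned(0, 1, no_unsigned_subtraction=True))
    try:
        subtract_unsigned(0, 1, no_unsigned_subtraction=False)
    except OverflowError as exc:
        print("error:", exc)
```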

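If the result of such an operation is used to update an UNSIGNED integer column, the result is clipped to the maximum value for the column type, or clipped to 0 if NO_UNSIGNED_SUBTRACTION is enabled. With strict SQL mode enabled, an error occurs and the column remains unchanged.

When NO_UNSIGNED_SUBTRACTION is enabled, the subtraction result is signed, even if any operand is unsigned. For example, compare the type of column c2 in table t1 with that of column c2 in table t2: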

mysql> SET sql_mode='';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE test (c1 BIGINT UNSIGNED NOT NULL);
Query OK, 0 rows affected (0.20 sec)


mysql> CREATE TABLE t1 SELECT c1 - 1 AS c2 FROM test;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc t1;
+-------+---------------------+------+-----+---------+-------+
| Field | Type                | Null | Key | Default | Extra |
+-------+---------------------+------+-----+---------+-------+
| c2    | bigint(21) unsigned | NO   |     | 0       |       |
+-------+---------------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> SET sql_mode='NO_UNSIGNED_SUBTRACTION';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t2 SELECT c1 - 1 AS c2 FROM test;
Query OK, 0 rows affected (0.25 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc t2;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| c2    | bigint(21) | NO   |     | 0       |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.00 sec)

This means that BIGINT UNSIGNED is not 100% usable in all contexts.

.NO_ZERO_DATE
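The NO_ZERO_DATE mode affects whether the server permits '0000-00-00' as a valid date. Its effect also depends on whether strict SQL mode is enabled.
-If this mode is not enabled, '0000-00-00' is permitted and inserts produce no warning.
-If this mode is enabled, '0000-00-00' is permitted and inserts produce a warning.
-If this mode and strict SQL mode are both enabled, '0000-00-00' is not permitted and inserts produce an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, '0000-00-00' is permitted and inserts produce a warning.

As of MySQL 5.7.4, NO_ZERO_DATE is deprecated. In MySQL 5.7.4 through 5.7.7, NO_ZERO_DATE does nothing when named explicitly. Instead, its effect is included in the effects of strict SQL mode. In MySQL 5.7.8 and later, NO_ZERO_DATE does have an effect when named explicitly and is not part of strict SQL mode, as before MySQL 5.7.4. However, it should be used in conjunction with strict SQL mode and is enabled by default. A warning occurs if NO_ZERO_DATE is enabled without also enabling strict SQL mode, or vice versa.

Because NO_ZERO_DATE is deprecated, it will be removed in a future MySQL release as a separate mode name and its effect will be included in the effects of strict SQL mode.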

.NO_ZERO_IN_DATE
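The NO_ZERO_IN_DATE mode affects whether the server permits dates in which the year part is nonzero but the month or day part is 0. (This mode affects dates such as '2010-00-01' or '2010-01-00', but not '0000-00-00'. To control whether the server permits '0000-00-00', use the NO_ZERO_DATE mode.) The effect of NO_ZERO_IN_DATE also depends on whether strict SQL mode is enabled.
-If this mode is not enabled, dates with zero parts are permitted and inserts produce no warning.
-If this mode is enabled, dates with zero parts are inserted as '0000-00-00' and produce a warning.
-If this mode and strict SQL mode are both enabled, dates with zero parts are not permitted and inserts produce an error unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, dates with zero parts are inserted as '0000-00-00' and produce a warning.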
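
As with division by zero, these rules reduce to a small decision function. The following Python sketch models them for a 'YYYY-MM-DD' string as described above. It is not server code, the function name zero_in_date_outcome is hypothetical, and '0000-00-00' is passed through untouched because that value is governed by NO_ZERO_DATE.

```python
def zero_in_date_outcome(date_text, mode_enabled, strict, ignore=False):
    """Return (stored_value, severity) for inserting a date such as '2010-00-01'.

    stored_value is None when the insert fails; severity is None, "warning" or "error".
    """
    year, month, day = (int(part) for part in date_text.split("-"))
    has_zero_part = year != 0 and (month == 0 or day == 0)
    if not has_zero_part or not mode_enabled:
        return date_text, None  # stored as given, no warning
    if strict and not ignore:
        return None, "error"
    return "0000-00-00", "warning"

if __name__ == "__main__":
    print(zero_in_date_outcome("2010-00-01", mode_enabled=True, strict=False))
    print(zero_in_date_outcome("2010-00-01", mode_enabled=True, strict=True))
```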

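As of MySQL 5.7.4, NO_ZERO_IN_DATE is deprecated. In MySQL 5.7.4 through 5.7.7, NO_ZERO_IN_DATE does nothing when named explicitly. Instead, its effect is included in the effects of strict SQL mode. In MySQL 5.7.8 and later, NO_ZERO_IN_DATE does have an effect when named explicitly and is not part of strict SQL mode, as before MySQL 5.7.4. However, it should be used in conjunction with strict SQL mode and is enabled by default. A warning occurs if NO_ZERO_IN_DATE is enabled without also enabling strict SQL mode, or vice versa.

Because NO_ZERO_IN_DATE is deprecated, it will be removed in a future MySQL release as a separate mode name and its effect will be included in the effects of strict SQL mode.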

.ONLY_FULL_GROUP_BY
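Reject queries for which the select list, HAVING condition, or ORDER BY list refers to nonaggregated columns that are neither named in the GROUP BY clause nor functionally dependent on (uniquely determined by) GROUP BY columns.

As of MySQL 5.7.5, the default SQL mode includes ONLY_FULL_GROUP_BY. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default.)

A MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list. Before MySQL 5.7.5, enabling ONLY_FULL_GROUP_BY disables this extension, so the HAVING clause must be written using unaliased expressions. As of MySQL 5.7.5, this restriction is lifted and the HAVING clause can refer to aliases regardless of whether ONLY_FULL_GROUP_BY is enabled.

.PAD_CHAR_TO_FULL_LENGTH
By default, trailing spaces are trimmed from CHAR column values on retrieval. If PAD_CHAR_TO_FULL_LENGTH is enabled, trimming does not occur and retrieved CHAR values are padded to their full length. This mode does not apply to VARCHAR columns, for which trailing spaces are retained on retrieval.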

mysql> CREATE TABLE t1 (c1 CHAR(10));
Query OK, 0 rows affected (0.12 sec)

mysql> INSERT INTO t1 (c1) VALUES('xy');
Query OK, 1 row affected (0.03 sec)

mysql> SET sql_mode = '';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT c1, CHAR_LENGTH(c1) FROM t1;
+------+-----------------+
| c1   | CHAR_LENGTH(c1) |
+------+-----------------+
| xy   |               2 |
+------+-----------------+
1 row in set (0.00 sec)

mysql> SET sql_mode = 'PAD_CHAR_TO_FULL_LENGTH';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT c1, CHAR_LENGTH(c1) FROM t1;
+------------+-----------------+
| c1         | CHAR_LENGTH(c1) |
+------------+-----------------+
| xy         |              10 |
+------------+-----------------+
1 row in set (0.00 sec)
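
The length difference in the two queries comes down to whether the padding is stripped on retrieval. A minimal Python model of that rule follows. The function name retrieve_char is hypothetical, and the sketch assumes the value is no longer than the column width.

```python
def retrieve_char(value, width, pad_char_to_full_length):
    """Model retrieval of a CHAR(width) value."""
    stored = value.ljust(width)  # CHAR values are right-padded with spaces
    if pad_char_to_full_length:
        return stored            # padding is kept
    return stored.rstrip(" ")    # default: trailing spaces are trimmed

if __name__ == "__main__":
    print(len(retrieve_char("xy", 10, pad_char_to_full_length=False)))
    print(len(retrieve_char("xy", 10, pad_char_to_full_length=True)))
```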

.PIPES_AS_CONCAT
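Treat || as a string concatenation operator (same as CONCAT()) rather than as a synonym for OR.

.REAL_AS_FLOAT
Treat REAL as a synonym for FLOAT. By default, MySQL treats REAL as a synonym for DOUBLE.

.STRICT_ALL_TABLES
Enable strict SQL mode for all storage engines. Invalid data values are rejected. From MySQL 5.7.4 through 5.7.7, STRICT_ALL_TABLES includes the effect of the ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE modes.

.STRICT_TRANS_TABLES
Enable strict SQL mode for transactional storage engines, and when possible for nontransactional storage engines. From MySQL 5.7.4 through 5.7.7, STRICT_TRANS_TABLES includes the effect of the ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE modes.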

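Handling a CRS Failure in an Oracle 11.2.0.4 RAC Cluster on Oracle Linux 6.7

Over the past month, a CRS failure in our Oracle RAC cluster has made the cluster commands unusable. Running crsctl stat res -t produces the following errors: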

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

However, the databases are still accessible and the business systems are running normally:

[root@db1 ~]# ps -ef | grep pmon
root       8024 204594  0 21:11 pts/0    00:00:00 grep pmon
grid      77120      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM1
oracle    77790      1  0 Dec21 ?        00:05:18 ora_pmon_CAIWU1
oracle    77794      1  0 Dec21 ?        00:05:08 ora_pmon_dadb1
oracle    77848      1  0 Dec21 ?        00:05:39 ora_pmon_chdyl1
oracle    77910      1  0 Dec21 ?        00:07:47 ora_pmon_RLZY1

[root@db2 ~]# ps -ef | grep pmon
grid      27745      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM2
oracle    28393      1  0 Dec21 ?        00:05:21 ora_pmon_dadb2
oracle    28569      1  0 Dec21 ?        00:04:58 ora_pmon_CAIWU2
oracle    28573      1  0 Dec21 ?        00:05:36 ora_pmon_chdyl2
oracle    28583      1  0 Dec21 ?        00:07:49 ora_pmon_RLZY2

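Checking the state of the ASM disk groups shows that the OCR disk group is indeed offline (it is missing from the list of mounted disk groups):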

[grid@db1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

[grid@db2 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/
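
A check like this is easy to script when there are several nodes to inspect. The following Python sketch parses asmcmd lsdg output in the layout shown above and reports which disk groups are mounted. The function name mounted_diskgroups is hypothetical, and the sample is an abbreviated copy of the output above.

```python
def mounted_diskgroups(lsdg_output):
    """Return the names of disk groups whose State column is MOUNTED."""
    names = []
    for line in lsdg_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "MOUNTED":
            names.append(fields[-1].rstrip("/"))  # the Name column ends with "/"
    return names

LSDG_SAMPLE = """\
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/
"""

if __name__ == "__main__":
    mounted = mounted_diskgroups(LSDG_SAMPLE)
    print("mounted:", mounted)
    print("OCR mounted:", "OCR" in mounted)
```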

Mounting the OCR disk group manually succeeds, but crsctl stat res -t still reports the same errors.

[grid@db1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

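The alert_+ASM1.log on node db1 contains the errors below. They say that ASM could not access the disks of the OCR disk group and forcibly dismounted it, yet a read test with dd shows that the disks are accessible.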

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
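
The dd test mentioned above only shows that a read from the device returns data. A rough Python equivalent of such a timed read is sketched here. The name `timed_read` is made up for illustration, and the demo reads a temporary file because the real OCR device paths do not appear in this post. A successful read does not by itself rule out slow writes, and the ASM alert log on db2 complains about write I/O.

```python
import os
import tempfile
import time


def timed_read(path, block_size=1024 * 1024, count=16):
    """Read up to `count` blocks of `block_size` bytes from `path`.

    Returns (bytes_read, elapsed_seconds). This is a plain buffered read,
    so the page cache may make a repeated run look faster than the storage.
    """
    total = 0
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            chunk = os.read(fd, block_size)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, time.monotonic() - start


# Demo against a temporary file; on a real system `path` would be the
# multipath device backing an OCR disk (a hypothetical path, not shown here).
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"\0" * 3000)
    tmp.flush()
    demo_bytes, demo_seconds = timed_read(tmp.name, block_size=1024, count=2)

print(demo_bytes, "bytes read in", round(demo_seconds, 6), "seconds")
```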

Check the trace file referenced in the alert log:

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)


*** 2019-12-28 05:30:44.894
*** SESSION ID:(2929.3) 2019-12-28 05:30:44.894
*** CLIENT ID:() 2019-12-28 05:30:44.894
*** SERVICE NAME:() 2019-12-28 05:30:44.894
*** MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894
*** ACTION NAME:() 2019-12-28 05:30:44.894
 
Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '

*** 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'

*** 2019-12-28 05:30:48.235
WARNING:failed xlate 1 
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1 
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1 
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted

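The Clusterware alert log alertdb1.log on node db1 shows the following errors, which likewise say that the OCR location in the ASM disk group is inaccessible: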

2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.471: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.480: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003: 
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021: 
[crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: 
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929: 
[crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945: 
[crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956: 
[crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976: 
[crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988: 
[crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010: 
[crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020: 
[crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043: 
[crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054: 
[crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093: 
[crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103: 
[crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109: 
[crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119: 
[crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141: 
[crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151: 
[crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181: 
[crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191: 
[crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: 
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848: 
[ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.
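
In the excerpt above, crsd is started ten times with ten different PIDs (35285 through 35557), fails each time with CRS-1013, and ohasd then gives up with CRS-2771. The sketch below shows one way to pull that summary out of a Clusterware alert log. The function name and the returned dictionary keys are illustrative choices, and the sample text is a shortened copy of the lines above.

```python
import re

# Matches the crsd process that reported an inaccessible OCR location.
CRSD_FAIL = re.compile(r"\[crsd\((\d+)\)\]CRS-1013:")


def summarize_crsd_restarts(alert_text):
    """Count distinct crsd PIDs that hit CRS-1013 and detect CRS-2771."""
    pids = []
    for match in CRSD_FAIL.finditer(alert_text):
        pid = match.group(1)
        if pid not in pids:
            pids.append(pid)
    return {
        "failed_starts": len(pids),
        # CRS-2771 means the maximum number of restart attempts was reached.
        "gave_up": "CRS-2771:" in alert_text,
    }


SAMPLE_ALERT = """\
2019-12-28 05:30:50.003: 
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: 
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.847: 
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
"""

print(summarize_crsd_restarts(SAMPLE_ALERT))
```

Because ohasd has stopped restarting ora.crsd at this point, remounting the disk group alone would not be expected to bring crsd back, which matches the crsctl stat res -t failures seen earlier.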

The oraagent_grid.log on node db1 contains the following entries, which show the ora.OCR.dg resource being stopped and its state changing to OFFLINE.

2019-12-28 05:30:16.531: [    AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter { 
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [    AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [    AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [ CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [   AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [    AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1
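
The state changes are easy to miss among the other agent messages. As a sketch, the following extracts every "state changed from ... to ..." transition from an oraagent log. The regular expression is written against the line layout shown above and is an assumption about that layout, not a documented log format; the function name is illustrative.

```python
import re

# timestamp: ... } <resource> <server> <cardinality> state changed from: A to: B
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): .*\} "
    r"(\S+) \S+ \d+ state changed from: (\w+) to: (\w+)",
    re.MULTILINE,
)


def state_changes(agent_log):
    """Return (timestamp, resource, old_state, new_state) tuples."""
    return STATE_CHANGE.findall(agent_log)


SAMPLE_AGENT_LOG = """\
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
"""

for change in state_changes(SAMPLE_AGENT_LOG):
    print(change)
```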

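The alert_+ASM2.log on node db2 contains the errors below. It shows messages such as "Waited 15 secs for write IO to PST disk 0 in group 1", and the same warning appears for disks 0 to 2 in group 5, which is the OCR disk group. In other words, writes to the PST on the OCR disks had been waiting for 15 seconds; ASM then took the OCR disks offline and force-dismounted the disk group.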

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process _b000_+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR) 
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */ 
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
 Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
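
The PST warnings above are not limited to the OCR disk group: groups 1, 2, 3, 4 and 6 also report waits, some of them up to 27 seconds. A small sketch for summarizing the longest wait per disk group from an ASM alert log is given below. The function name is illustrative and the sample is a shortened copy of the warnings above.

```python
import re

PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)\."
)


def max_wait_by_group(alert_text):
    """Map each disk group number to the longest PST write wait in seconds."""
    worst = {}
    for secs, _disk, group in PST_WAIT.findall(alert_text):
        group_number = int(group)
        worst[group_number] = max(worst.get(group_number, 0), int(secs))
    return dict(sorted(worst.items()))


SAMPLE_ASM_ALERT = """\
Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
"""

print(max_wait_by_group(SAMPLE_ASM_ALERT))  # {1: 27, 5: 15}
```

Seeing every disk group affected at the same moment points toward the shared storage path rather than toward the OCR disks alone, although the logs here do not prove that on their own.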

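The Clusterware alert log alertdb2.log on node db2 shows the following errors. They also say that the OCR location in the ASM disk group is inaccessible, although a read test with dd shows that the disks can be accessed.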

2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410: 
[crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420: 
[crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442: 
[crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451: 
[crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471: 
[crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480: 
[crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507: 
[crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517: 
[crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551: 
[crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560: 
[crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592: 
[crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602: 
[crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627: 
[crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636: 
[crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658: 
[crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668: 
[crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701: 
[crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711: 
[crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726: 
[crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735: 
[crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403: 
[ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.

The oraagent_grid.log on node db2 contains the following entries:

2019-12-28 05:29:59.329: [    AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli. 
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [ USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [ CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [   AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [    AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1

/var/log/messages on both nodes contains multipath errors, but the affected disks are used for backups and not by the production database:

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 --  1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 --  1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:00:19 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:52:20 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 15:40:45 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 16:34:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 17:09:56 db1 auditd[15975]: Audit daemon rotating log files
Dec 30 17:38:16 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 18:59:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 19:54:43 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
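Messages like the ones above can be tallied per multipath map to confirm that only the backup LUNs are affected. The following is a minimal Python sketch; the function name `count_failed_paths` and the regular expression are illustrative assumptions based only on the log format shown here.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... multipathd: checker failed path 69:240 in map backup2"
FAILED_PATH = re.compile(r"multipathd: checker failed path (\d+:\d+) in map (\S+)")


def count_failed_paths(lines):
    """Count 'checker failed path' events per multipath map name."""
    counts = Counter()
    for line in lines:
        match = FAILED_PATH.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# A few lines copied from the log excerpt above.
sample = [
    "Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2",
    "Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3",
    "Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2",
    "Dec 30 07:55:40 db1 multipathd: 69:240: reinstated",
]
print(dict(count_failed_paths(sample)))
```

In practice the lines would come from reading /var/log/messages on each node rather than from a hard-coded list.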

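The cluster logs show that an I/O problem on the storage disks (a fibre link flap or I/O latency) caused CRS to go down abnormally. Oddly, although CRS went offline, the ASM and DB instances stayed up and remained usable. A search of Oracle Support turned up note 1581864.1, which relates access timeouts on the ASM/CRS voting disk to the hidden parameter _asm_hbeatiowait, and relates that parameter in turn to the polling_interval setting of the operating system's multipath configuration. The root cause is that the operating system's disk access timeout is far longer than ASM's voting-disk access timeout, so Oracle RAC decides that the voting disk in ASM is unreachable and forces it offline while the operating system is still waiting on the I/O. The fix is to determine the operating system's polling_interval value and the ASM hidden parameter _asm_hbeatiowait, then raise _asm_hbeatiowait above the polling_interval value.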
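The rule described above can be written as a small check. This is only a sketch: the helper names are hypothetical, and it compares against the per-device timeout files under /sys/block/sdX/device/timeout, which is an assumption about which OS-side value matters; the support note remains the authoritative source for the exact rule.

```python
from pathlib import Path


def max_disk_timeout(sys_block="/sys/block"):
    """Return the largest sd* device timeout (seconds) under sys_block, or None."""
    timeouts = []
    for path in Path(sys_block).glob("sd*/device/timeout"):
        timeouts.append(int(path.read_text().strip()))
    return max(timeouts) if timeouts else None


def hbeat_wait_is_safe(asm_hbeatiowait, disk_timeout):
    """True when the ASM heartbeat I/O wait exceeds the OS disk timeout."""
    return asm_hbeatiowait > disk_timeout


# Values from this case: ASM default 15 s, OS timeout 30 s, proposed value 45 s.
print(hbeat_wait_is_safe(15, 30))  # default setting
print(hbeat_wait_is_safe(45, 30))  # after the change
```

Taking the maximum over all devices is a conservative choice: the heartbeat wait then exceeds the timeout of every disk, not only the one that happens to be inspected.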
The steps are as follows:
1. Check the current value of _asm_hbeatiowait on the RAC ASM instance (the default is 15 seconds):

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT   ksppinm, ksppstvl, ksppdesc
  2     FROM   x$ksppi x, x$ksppcv y
  3    WHERE   x.indx = y.indx AND  ksppinm = '_asm_hbeatiowait';

KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- --------------------------------------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return

2. Check the operating system's disk access timeout (the default on Oracle Linux 6.7 is 30 seconds):

[root@db1 ~]# cat /sys/block/sdb/device/timeout 
30

[root@db2 ~]# cat /sys/block/sdb/device/timeout 
30

3. Set _asm_hbeatiowait to 45 seconds (this is a static parameter, so the cluster must be restarted):

SQL>  alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';

System altered.

4. Restart the cluster.

5. Check the cluster status:

[grid@db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CWDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DADATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.OCR.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.SBKDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db2                                          
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.caiwu.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.chdyl.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.dadb.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.rlzy.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.scan1.vip
      1        ONLINE  ONLINE       db2                                          
ora.scan2.vip
      1        ONLINE  ONLINE       db1                                          
ora.scan3.vip
      1        ONLINE  ONLINE       db1                                          


This completes the fix.
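The status check in the last step can also be done mechanically. The sketch below scans `crsctl stat res -t` style output and reports resources whose STATE differs from their TARGET, so that resources such as ora.gsd, which are OFFLINE by design, are not flagged. The function name and the parsing rules are assumptions derived from the layout shown above, not an Oracle-provided interface.

```python
def mismatched_resources(crsctl_output):
    """Return (resource, server) pairs whose STATE differs from TARGET."""
    problems = []
    resource = None
    for line in crsctl_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            resource = fields[0]
            continue
        # Cluster resources carry a leading instance number; drop it.
        if fields[0].isdigit():
            fields = fields[1:]
        if resource and len(fields) >= 2 and fields[0] in ("ONLINE", "OFFLINE"):
            target, state = fields[0], fields[1]
            server = fields[2] if len(fields) > 2 else ""
            if state != target:
                problems.append((resource, server))
    return problems


# The ora.cvu line is a made-up failure case for illustration only.
sample = """\
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.cvu
      1        ONLINE  OFFLINE
"""
print(mismatched_resources(sample))
```

An empty list for the real output above would indicate that every resource has reached its target state.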

Handling the RMAN-08137 Error When RMAN Deletes Archived Logs

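The database backup log reported RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process. The database version is 11.2.0.4. Under the backup policy, archived logs are backed up every day and deleted automatically, with one week of archived logs retained. Running the following command manually: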

delete noprompt archivelog until time 'sysdate-7';

still reported the error:

archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2328.1241.1027887401 thread=1 sequence=2328
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2329.1242.1027887719 thread=1 sequence=2329
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 thread=1 sequence=2330
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 thread=1 sequence=2331
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 thread=1 sequence=2332
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 thread=1 sequence=2333
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 thread=1 sequence=2334
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 thread=1 sequence=2335
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 thread=1 sequence=2336
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 thread=1 sequence=2337
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 thread=1 sequence=2338
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 thread=1 sequence=2339
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_27/thread_1_seq_2340.1253.1028075413 thread=1 sequence=2340
...............
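To see at a glance how many archived logs were held back and which sequences they cover, the RMAN output can be summarized with a short script. This is a sketch under the assumption that the output has exactly the line format shown above; the function name is hypothetical.

```python
import re

# Matches lines such as:
#   "archived log file name=+CWDATA/... thread=1 sequence=2328"
LOG_LINE = re.compile(r"archived log file name=(\S+) thread=(\d+) sequence=(\d+)")


def summarize_rman_08137(output):
    """Return (warning_count, [(thread, sequence), ...]) from RMAN delete output."""
    warnings = output.count("RMAN-08137")
    logs = [(int(t), int(s)) for _, t, s in LOG_LINE.findall(output)]
    return warnings, logs


# Two entries copied from the output above.
sample = (
    "archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/"
    "thread_1_seq_2328.1241.1027887401 thread=1 sequence=2328\n"
    "RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process\n"
    "archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/"
    "thread_1_seq_2329.1242.1027887719 thread=1 sequence=2329\n"
    "RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process\n"
)
print(summarize_rman_08137(sample))
```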

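A search of MOS shows that this is a bug in 11g. The stated cause is that in 11g RMAN checks MIN_REQUIRED_CAPTURE_CHANGE# in v$database rather than DBA_CAPTURE. By default that value is refreshed only every 6 hours, so the value in v$database can lag the current value by up to 6 hours.
The suggested workaround is to add force after delete.
Running delete noprompt force archivelog until time 'sysdate-7' manually deleted the archived logs successfully: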

deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 RECID=18420 STAMP=1027887814
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 RECID=18421 STAMP=1027887875
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 RECID=18422 STAMP=1027902628
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 RECID=18423 STAMP=1027902721
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 RECID=18424 STAMP=1027908500
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 RECID=18425 STAMP=1027911722
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 RECID=18426 STAMP=1027989010
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 RECID=18427 STAMP=1027989069
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 RECID=18428 STAMP=1027990618
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 RECID=18429 STAMP=1028041715
...........
Deleted 800 objects

DM7 Data Replication: Database-Level Replication

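DM data replication can be configured at the table, schema, or database level. Table-level replication cannot synchronize DDL, so the table structures on the master and slave sides must be identical. This article walks through configuring database-level replication.
A simple example illustrates the configuration.
1. Preparation
The instances taking part in replication are listed in the table below.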
Database              Instance name  IP address    Database port  MAL port  File directory
Replication database  jydm           10.10.10.180  5236           5241      /dm_home/dmdba/dmdbms/data/
Master database       hy             10.10.10.184  5236           5242      /dm_home/dmdba/dmdbms/data/
Slave database        wy             10.10.10.185  5236           5243      /dm_home/dmdba/dmdbms/data/

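Assume the master database has a user hy and the slave database has a user wy. We want to create a synchronous replication relationship, named REPHY2WY, from user hy on the master database to user wy on the slave database.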

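2. Parameter settings
Edit the dm.ini file of each of the three DM databases and add the following:
Database              dm.ini settings
Replication database  INSTANCE_NAME = jydm
                      PORT_NUM = 5236
                      MAL_INI = 1
Master database       INSTANCE_NAME = hy
                      PORT_NUM = 5236
                      MAL_INI = 1
Slave database        INSTANCE_NAME = wy
                      PORT_NUM = 5236
                      MAL_INI = 1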

Configuring dmmal.ini
Edit dmmal.ini and add the following. The dmmal.ini file does not exist by default; copy the sample from the examples directory and modify it.

[root@shard1 ini_script]# pwd
/dm_home/dmdba/dmdbms/samples/ini_script
[root@shard1 ini_script]# ls -lrt
总用量 48
-rwxr-xr-x 1 dmdba dinstall  890 2月  14 2019 dmarch_example.ini
-rwxr-xr-x 1 dmdba dinstall 1966 2月  14 2019 dmdcr_cfg_example.ini
-rwxr-xr-x 1 dmdba dinstall  631 2月  14 2019 dmdcr_example.ini
-rwxr-xr-x 1 dmdba dinstall 1537 2月  14 2019 dminit_example.ini
-rwxr-xr-x 1 dmdba dinstall 2070 2月  14 2019 dmmal_example.ini
-rwxr-xr-x 1 dmdba dinstall 1277 2月  14 2019 dmmonitor_example.ini
-rwxr-xr-x 1 dmdba dinstall  288 2月  14 2019 dmmpp_example.ini
-rwxr-xr-x 1 dmdba dinstall 1679 2月  14 2019 dmtimer_example.ini
-rwxr-xr-x 1 dmdba dinstall 1241 2月  14 2019 dmwatch_example.ini
-rwxr-xr-x 1 dmdba dinstall 2146 2月  14 2019 dmwatcher_example.ini
-rwxr-xr-x 1 dmdba dinstall  522 2月  14 2019 dmwmon_example.ini
-rwxr-xr-x 1 dmdba dinstall  636 2月  14 2019 sqllog_example.ini

Create a dmmal.ini file on each database server. The dmmal.ini contents must be identical on every server:

MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185
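Because the file must be identical on all three servers, it can help to generate it once from the instance table instead of editing three copies by hand. The following Python sketch does that; `INSTANCES` and `render_dmmal` are hypothetical names, and only the keys that appear in the file above are emitted.

```python
INSTANCES = [
    # (instance name, host, MAL port, database port), from the preparation table
    ("jydm", "10.10.10.180", 5241, 5236),
    ("hy", "10.10.10.184", 5242, 5236),
    ("wy", "10.10.10.185", 5243, 5236),
]


def render_dmmal(instances, check_interval=5, conn_fail_interval=5):
    """Render dmmal.ini text with one [MAL_<name>] section per instance."""
    lines = [
        f"MAL_CHECK_INTERVAL = {check_interval}",
        f"MAL_CONN_FAIL_INTERVAL = {conn_fail_interval}",
    ]
    for name, host, mal_port, inst_port in instances:
        lines += [
            "",
            f"[MAL_{name}]",
            f"MAL_INST_NAME = {name}",
            f"MAL_HOST = {host}",
            f"MAL_PORT = {mal_port}",
            f"MAL_INST_PORT = {inst_port}",
            f"MAL_INST_HOST = {host}",
        ]
    return "\n".join(lines) + "\n"


print(render_dmmal(INSTANCES))
```

The rendered text can then be written to dmmal.ini in each instance's data directory.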

[dmdba@shard1 jydm]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm2 hy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm3 wy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

Enable the MAL_INI parameter on all three databases:

SQL> sp_set_para_value(2,'MAL_INI',1);
DMSQL executed successfully
used time: 7.465(ms). Execute id is 23.

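Initializing the replication database
If this is the first time the replication database is used, it must be initialized by calling the system procedure SP_INIT_REP_SYS(create_flag). This creates the replication user (SYSREP/SYSREP) and the system tables needed on the replication database. When create_flag is 1 the user and system tables are created; when it is 0 they are dropped.

Run the initialization: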

SQL> SP_INIT_REP_SYS(1);
DMSQL executed successfully
used time: 115.960(ms). Execute id is 30.
SQL> conn SYSREP/SYSREP

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.684(ms)
SQL> select table_name from user_tables;

LINEID     TABLE_NAME       
---------- -----------------
1          RPS_GROUPS
2          RPS_INSTANCES
3          RPS_REPLICATIONS
4          RPS_TABMAPS
5          RPS_FAULT_HISTORY
6          RPS_CONFLICTS
7          RPS_TIMERS

7 rows got

used time: 170.443(ms). Execute id is 31.

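The seven replication system tables that are created are defined as follows:
1) Replication group table
CREATE TABLE SYSREP.RPS_GROUPS (
NAME VARCHAR(128), // replication group name
ID INT, // replication group ID
DESC$ VARCHAR(1000), // description
CLUSTER PRIMARY KEY(NAME)
);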
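2) Replication node instance table
CREATE TABLE SYSREP.RPS_INSTANCES(
INST_NAME VARCHAR(128), // replication node instance name
GRP_ID INT, // replication group ID
INST_ID INT, // instance number within the replication group
FAULT_TIMEOUT INT, // fault timeout in seconds; 0 means time out immediately
VALID_FLAG CHAR(1), // node system status
FAULT_TIME DATETIME, // time the node fault started
NET_VALID_FLAG CHAR(1), // network status
NET_FAULT_TIME DATETIME, // time the network fault started
CLUSTER PRIMARY KEY(GRP_ID, INST_NAME)
);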
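3) Replication relationship table
CREATE TABLE SYSREP.RPS_REPLICATIONS(
REP_NAME VARCHAR(128), // replication name
GRP_ID INT, // replication group ID
REP_ID INT, // replication ID, globally unique
MINST_ID INT, // master node instance number
SINST_ID INT, // slave node instance number
ARCH_DIR VARCHAR(256), // archive log path on the master node
FAULT_TIMEOUT INT, // fault timeout in seconds; 0 means time out immediately
VALID_FLAG CHAR(1), // replication relationship status
FAULT_TIME DATETIME, // time the fault started
SYNC_FLAG INTEGER, // synchronous or asynchronous replication
TIMER_NAME VARCHAR(128), // timer used for asynchronous replication (not set for synchronous replication)
DESC$ VARCHAR(1000), // replication description
CLUSTER PRIMARY KEY(GRP_ID, REP_NAME)
);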
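4) Replication mapping table
CREATE TABLE SYSREP.RPS_TABMAPS(
REP_ID INT, // replication ID
MSCH_NAME VARCHAR(128), // master table schema name
MTAB_NAME VARCHAR(128), // master table name
MSCH_ID INT, // master table schema ID
MTAB_ID INT, // master table ID
SSCH_NAME VARCHAR(128), // slave table schema name
STAB_NAME VARCHAR(128), // slave table name
SSCH_ID INT, // slave table schema ID
STAB_ID INT, // slave table ID
READONLY_MODE INT, // mapping mode: 1 = read-only, 0 = not read-only
CLUSTER PRIMARY KEY(REP_ID, MTAB_ID, STAB_ID)
);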
5) Replication fault history table
CREATE TABLE SYSREP.RPS_FAULT_HISTORY(
GRP_NAME VARCHAR(128), // replication group
OBJ_NAME VARCHAR(128), // faulty node or relationship
FAULT_TYPE VARCHAR(128), // fault type description
START_TIME DATETIME, // fault start time
END_TIME DATETIME // fault end time
);
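6) Replication conflict history table
CREATE TABLE SYSREP.RPS_CONFLICTS
(
SEQ_NO BIGINT, // conflict sequence number
REP_ID INT, // corresponding replication ID
INST_ID INT, // number of the node where the conflict occurred
TABLE_ID INT, // ID of the conflicting table
TYPE TINYINT, // operation type
OCC_TIME DATETIME, // time the conflict occurred
KEY_DATA VARCHAR(8000) // PK value of the conflicting data; multiple key values are comma-separated, and over-long values are truncated
);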
7) Replication timer table
CREATE TABLE SYSREP.RPS_TIMERS (
NAME VARCHAR(128), // timer name
TYPE$ INT, // timer type; see SP_RPS_CREATE_TIMER for the values
FREQ_INTERVAL INT, // interval in days
FREQ_SUB_INTERVAL INT, // interval in months/weeks (depending on the schedule type)
FREQ_MINUTE_INTERVAL INT, // interval in minutes
START_TIME TIME, // start time
END_TIME TIME, // end time
DURING_START_DATE DATETIME, // start date and time
DURING_END_DATE DATETIME, // end date and time
NO_END_DATA_FLAG INTEGER, // whether there is an end date (0: has an end date; 1: no end date)
DESC$ VARCHAR(1000), // timer description
CLUSTER PRIMARY KEY(NAME)
);
Configuring database-level replication

Restart the three servers. The order does not matter.

[root@shard1 jydm]# systemctl stop DmServicejydm.service
[root@shard1 jydm]# systemctl start DmServicejydm.service

[root@jydm2 ~]# systemctl stop DmServicehy.service
[root@jydm2 ~]# systemctl start DmServicehy.service

[root@jydm3 ~]# systemctl stop DmServicewy.service
[root@jydm3 ~]# systemctl start DmServicewy.service

Log in to the replication database (RPS jydm), make sure the server state is OPEN, and begin configuring replication.

[dmdba@shard1 bin]$ ./disql sysdba/abcd

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.851(ms)
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
Connected to: DM 7.1.6.46

Create the replication group REP_GRP_hy2wy:

SQL> SP_RPS_ADD_GROUP('REP_GRP_hy2wy', '主从同步复制');
DMSQL executed successfully
used time: 32.545(ms). Execute id is 26.
SQL> select * from SYSREP.RPS_GROUPS;

LINEID     NAME          ID          DESC$       
---------- ------------- ----------- ------------
1          REP_GRP_hy2wy 3           主从同步复制

used time: 1.073(ms). Execute id is 27.

Begin the configuration:

SQL> SP_RPS_SET_BEGIN('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 0.530(ms). Execute id is 57.

Add the replication relationship:

SQL> SP_RPS_ADD_REPLICATION ('REP_GRP_hy2wy', 'REPhy2wy', 'hy到wy的同步复制', 'hy', 'wy', NULL, '/dm_home/dmdba/dmdbms/dmreplog');
DMSQL executed successfully
used time: 2.529(ms). Execute id is 29.

Add the database-level replication mapping:

SQL> SP_RPS_ADD_DB_MAP('REPhy2wy', 0);
DMSQL executed successfully
used time: 0.597(ms). Execute id is 30.

Commit the configuration:

SQL> SP_RPS_SET_APPLY();
DMSQL executed successfully
used time: 00:00:05.035. Execute id is 31.

SQL>  select * from SYSREP.RPS_REPLICATIONS;

LINEID     REP_NAME GRP_ID      REP_ID      MINST_ID    SINST_ID    ARCH_DIR                       FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  SYNC_FLAG   TIMER_NAME DESC$           
---------- -------- ----------- ----------- ----------- ----------- ------------------------------ ------------- ----------- --------------------------- ----------- ---------- ----------------
1          REPhy2wy 3           4           7           8           /dm_home/dmdba/dmdbms/dmreplog -1            0           NULL                        1           NULL       hy到wy的同步复制

used time: 1.277(ms). Execute id is 32.
SQL> select * from SYSREP.RPS_INSTANCES;

LINEID     INST_NAME GRP_ID      INST_ID     FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  NET_VALID_FLAG NET_FAULT_TIME             
---------- --------- ----------- ----------- ------------- ----------- --------------------------- -------------- ---------------------------
1          hy        3           7           -1            0           NULL                        0              NULL
2          wy        3           8           -1            0           NULL                        0              NULL

used time: 1.014(ms). Execute id is 33.
SQL> select * from SYSREP.RPS_TABMAPS;

LINEID     REP_ID      MSCH_NAME MTAB_NAME MSCH_ID     MTAB_ID     SSCH_NAME STAB_NAME SSCH_ID     STAB_ID     READONLY_MODE
---------- ----------- --------- --------- ----------- ----------- --------- --------- ----------- ----------- -------------
1          4           NULL      NULL      -1          -1          NULL      NULL      -1          -1          0

used time: 1.129(ms). Execute id is 34.

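Database-level replication is now configured. Note that in DM, SP_RPS_SET_BEGIN must be executed before any replication configuration starts, and SP_RPS_SET_APPLY must be executed afterwards to commit it.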
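The ordering requirement can be made hard to get wrong by generating the calls together. The sketch below only builds the statement text in the order used in this walkthrough; the function and its parameter names are my own labels inferred from argument positions, the sixth argument of SP_RPS_ADD_REPLICATION is passed as NULL exactly as in the call shown earlier, and no quoting or escaping is attempted, so it suits trusted input only.

```python
def db_level_replication_script(group, rep_name, description, master, slave, arch_dir):
    """Return the DMSQL calls in walkthrough order, bracketed by BEGIN/APPLY."""
    return [
        f"SP_RPS_SET_BEGIN('{group}');",
        f"SP_RPS_ADD_REPLICATION('{group}', '{rep_name}', '{description}', "
        f"'{master}', '{slave}', NULL, '{arch_dir}');",
        f"SP_RPS_ADD_DB_MAP('{rep_name}', 0);",
        "SP_RPS_SET_APPLY();",
    ]


script = db_level_replication_script(
    "REP_GRP_hy2wy", "REPhy2wy", "hy to wy synchronous replication",
    "hy", "wy", "/dm_home/dmdba/dmdbms/dmreplog",
)
print("\n".join(script))
```

The printed statements can be pasted into disql on the replication database after the replication group has been created.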

Verifying synchronization
Create a test table t4 under user hy on the master database:

SQL> create table t4 as select * from sysobjects;
executed successfully
used time: 12.280(ms). Execute id is 12.
SQL> select count(*) from t4;

LINEID     COUNT(*)            
---------- --------------------
1          1480

used time: 1.256(ms). Execute id is 13.

Query table t4 as user wy on the slave database:

SQL> select count(*) from t4;
select count(*) from t4;
[-2106]:Error in line: 1
Invalid table or view name [T4].
used time: 0.423(ms). Execute id is 0.

The CREATE TABLE DDL statement was not synchronized to user wy.

Create user jy on the master database:

SQL> create user jy identified by "abcd";
executed successfully
used time: 4.940(ms). Execute id is 5.
SQL> grant dba,resource to jy;
executed successfully
used time: 2.929(ms). Execute id is 6.

Log in as user jy on the slave database:

[dmdba@jydm3 bin]$ ./disql jy/abcd
[-70028]:Create SOCKET connection failure.
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
username:

The slave database's log shows the failure:

2019-12-20 00:15:08 [WARNING] database P0000029732 main_thread  rps sys init failed, code:-8748

Check the replication fault records on the replication database:

SQL> select * from SYSREP.RPS_FAULT_HISTORY;

LINEID     GRP_NAME      OBJ_NAME FAULT_TYPE         START_TIME                  END_TIME                   
---------- ------------- -------- ------------------ --------------------------- ---------------------------
1          REP_GRP_hy2wy REPhy2wy REPLICATION FAULT  2019-12-20 00:14:23.366210  2019-12-20 00:15:24.042032
2          REP_GRP_hy2wy wy       INSTANCE SYS FAULT 2019-12-20 00:14:23.366210  2019-12-20 00:15:24.042032
3          REP_GRP_hy2wy REPhy2wy REPLICATION FAULT  2019-12-20 00:16:23.386842  NULL
4          REP_GRP_hy2wy wy       INSTANCE SYS FAULT 2019-12-20 00:16:23.386842  NULL

used time: 1.179(ms). Execute id is 4.

These tests show that database-level replication does not support DDL, and that it mishandles user creation, which crashes the slave database.

DM7 Data Replication: Schema-Level Replication

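DM data replication can be configured at the table, schema, or database level. Table-level replication cannot synchronize DDL, so the table structures on the master and slave sides must be identical. This article walks through configuring schema-level replication.
A simple example illustrates the configuration.
1. Preparation
The instances taking part in replication are listed in the table below.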
Database              Instance name  IP address    Database port  MAL port  File directory
Replication database  jydm           10.10.10.180  5236           5241      /dm_home/dmdba/dmdbms/data/
Master database       hy             10.10.10.184  5236           5242      /dm_home/dmdba/dmdbms/data/
Slave database        wy             10.10.10.185  5236           5243      /dm_home/dmdba/dmdbms/data/

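Assume the master database has a user hy and the slave database has a user wy. We want to create a synchronous replication relationship, named REPHY2WY, from user hy on the master database to user wy on the slave database.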

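2. Parameter settings
Edit the dm.ini file of each of the three DM databases and add the following:
Database              dm.ini settings
Replication database  INSTANCE_NAME = jydm
                      PORT_NUM = 5236
                      MAL_INI = 1
Master database       INSTANCE_NAME = hy
                      PORT_NUM = 5236
                      MAL_INI = 1
Slave database        INSTANCE_NAME = wy
                      PORT_NUM = 5236
                      MAL_INI = 1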

Configuring dmmal.ini
Edit dmmal.ini and add the following. The dmmal.ini file does not exist by default; copy the sample from the examples directory and modify it.

[root@shard1 ini_script]# pwd
/dm_home/dmdba/dmdbms/samples/ini_script
[root@shard1 ini_script]# ls -lrt
总用量 48
-rwxr-xr-x 1 dmdba dinstall  890 2月  14 2019 dmarch_example.ini
-rwxr-xr-x 1 dmdba dinstall 1966 2月  14 2019 dmdcr_cfg_example.ini
-rwxr-xr-x 1 dmdba dinstall  631 2月  14 2019 dmdcr_example.ini
-rwxr-xr-x 1 dmdba dinstall 1537 2月  14 2019 dminit_example.ini
-rwxr-xr-x 1 dmdba dinstall 2070 2月  14 2019 dmmal_example.ini
-rwxr-xr-x 1 dmdba dinstall 1277 2月  14 2019 dmmonitor_example.ini
-rwxr-xr-x 1 dmdba dinstall  288 2月  14 2019 dmmpp_example.ini
-rwxr-xr-x 1 dmdba dinstall 1679 2月  14 2019 dmtimer_example.ini
-rwxr-xr-x 1 dmdba dinstall 1241 2月  14 2019 dmwatch_example.ini
-rwxr-xr-x 1 dmdba dinstall 2146 2月  14 2019 dmwatcher_example.ini
-rwxr-xr-x 1 dmdba dinstall  522 2月  14 2019 dmwmon_example.ini
-rwxr-xr-x 1 dmdba dinstall  636 2月  14 2019 sqllog_example.ini


Create a dmmal.ini file on each database server. The dmmal.ini contents must be identical on every server:

MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@shard1 jydm]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm2 hy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm3 wy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185
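Since the three copies must agree, they can be compared by their parsed settings rather than by eye, which ignores harmless spacing differences such as `MAL_INST_NAME =wy`. The sketch below uses Python's standard `configparser`; the function names and the placeholder section name `GLOBAL` are assumptions introduced here because dmmal.ini starts with keys that sit outside any section.

```python
import configparser


def parse_dmmal(text):
    """Parse dmmal.ini text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    # Keys before the first section get a placeholder section header.
    parser.read_string("[GLOBAL]\n" + text)
    return {name: dict(parser[name]) for name in parser.sections()}


def same_config(*texts):
    """True when every dmmal.ini text parses to the same settings."""
    parsed = [parse_dmmal(t) for t in texts]
    return all(p == parsed[0] for p in parsed)


# Two shortened copies that differ only in spacing, and one with a wrong port.
a = "MAL_CHECK_INTERVAL = 5\n\n[MAL_wy] \nMAL_INST_NAME =wy\nMAL_PORT = 5243\n"
b = "MAL_CHECK_INTERVAL=5\n[MAL_wy]\nMAL_INST_NAME = wy\nMAL_PORT = 5243 \n"
c = "MAL_CHECK_INTERVAL = 5\n[MAL_wy]\nMAL_INST_NAME = wy\nMAL_PORT = 5244\n"
print(same_config(a, b), same_config(a, c))
```

In practice the texts would be the contents of dmmal.ini fetched from each of the three hosts.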

Enable the MAL_INI parameter on all three databases:

SQL> sp_set_para_value(2,'MAL_INI',1);
DMSQL executed successfully
used time: 7.465(ms). Execute id is 23.

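Initializing the replication database
If this is the first time the replication database is used, it must be initialized by calling the system procedure SP_INIT_REP_SYS(create_flag). This creates the replication user (SYSREP/SYSREP) and the system tables needed on the replication database. When create_flag is 1 the user and system tables are created; when it is 0 they are dropped.

Run the initialization: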

SQL> SP_INIT_REP_SYS(1);
DMSQL executed successfully
used time: 115.960(ms). Execute id is 30.
SQL> conn SYSREP/SYSREP

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.684(ms)
SQL> select table_name from user_tables;

LINEID     TABLE_NAME       
---------- -----------------
1          RPS_GROUPS
2          RPS_INSTANCES
3          RPS_REPLICATIONS
4          RPS_TABMAPS
5          RPS_FAULT_HISTORY
6          RPS_CONFLICTS
7          RPS_TIMERS

7 rows got

used time: 170.443(ms). Execute id is 31.
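A quick way to confirm that initialization created everything is to compare the query output against the expected table names. This is a minimal sketch: the expected set is taken from the listing above, and the function name is hypothetical.

```python
EXPECTED_RPS_TABLES = {
    "RPS_GROUPS", "RPS_INSTANCES", "RPS_REPLICATIONS", "RPS_TABMAPS",
    "RPS_FAULT_HISTORY", "RPS_CONFLICTS", "RPS_TIMERS",
}


def missing_rps_tables(disql_output):
    """Return expected table names that do not appear in the disql output."""
    found = {tok for line in disql_output.splitlines() for tok in line.split()}
    return sorted(EXPECTED_RPS_TABLES - found)


# Shortened output with one table deliberately left out for illustration.
sample = """\
LINEID     TABLE_NAME
---------- -----------------
1          RPS_GROUPS
2          RPS_INSTANCES
3          RPS_REPLICATIONS
4          RPS_TABMAPS
5          RPS_FAULT_HISTORY
6          RPS_CONFLICTS
"""
print(missing_rps_tables(sample))
```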

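The seven replication system tables that are created are defined as follows:
1) Replication group table
CREATE TABLE SYSREP.RPS_GROUPS (
NAME VARCHAR(128), // replication group name
ID INT, // replication group ID
DESC$ VARCHAR(1000), // description
CLUSTER PRIMARY KEY(NAME)
);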
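2) Replication node instance table
CREATE TABLE SYSREP.RPS_INSTANCES(
INST_NAME VARCHAR(128), // replication node instance name
GRP_ID INT, // replication group ID
INST_ID INT, // instance number within the replication group
FAULT_TIMEOUT INT, // fault timeout in seconds; 0 means time out immediately
VALID_FLAG CHAR(1), // node system status
FAULT_TIME DATETIME, // time the node fault started
NET_VALID_FLAG CHAR(1), // network status
NET_FAULT_TIME DATETIME, // time the network fault started
CLUSTER PRIMARY KEY(GRP_ID, INST_NAME)
);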
3) 复制关系表
CREATE TABLE SYSREP.RPS_REPLICATIONS(
REP_NAME VARCHAR(128), // 复制名
GRP_ID INT, // 复制组ID
REP_ID INT, // 复制ID,全局唯一
MINST_ID INT, // 主节点实例编号
SINST_ID INT, // 从节点实例编号
ARCH_DIR VARCHAR(256), // 主节点归档日志路径
FAULT_TIMEOUT INT, // 故障超时处理值,以秒为单位,0为立即超时
VALID_FLAG CHAR(1), // 复制关系状态
FAULT_TIME DATETIME, // 故障开始时间
SYNC_FLAG INTEGER //指定同步或异步复制
TIMER_NAME VARCHAR(128) //指定异步复制的定时器(同步复制没有此项)
DESC$ VARCHAR(1000), // 复制描述
CLUSTER PRIMARY KEY(GRP_ID, REP_NAME)
);
4) 复制映射表
CREATE TABLE SYSREP.RPS_TABMAPS(
REP_ID INT, //复制ID
MSCH_NAME VARCHAR(128), //主表模式名
MTAB_NAME VARCHAR(128), //主表名
MSCH_ID INT, //主表模式ID
MTAB_ID INT, //主表ID
SSCH_NAME VARCHAR(128), //从表模式名
STAB_NAME VARCHAR(128), //从表名
SSCH_ID INT, //从表模式ID
STAB_ID INT, //从表ID
READONLY_MODE INT, //映射模式1:只读模式,0:非只读模式
CLUSTER PRIMARY KEY(REP_ID, MTAB_ID, STAB_ID)
);
5) 复制故障历史表
CREATE TABLE SYSREP.REP_FAULT_HISTORY(
GRP_NAME VARCHAR(128), // 复制组
OBJ_NAME VARCHAR(128), // 故障节点或关系
FAULT_TYPE VARCHAR(128), // 故障类型描述
START_TIME DATETIME, // 故障开始时间
END_TIME DATETIME // 故障结束时间
);
6) 复制冲突历史表
CREATE TABLE SYSREP.RPS_CONFLICTS
(
SEQ_NO BIGINT // 冲突序号
REP_ID INT, // 对应复制号
INST_ID INT, // 产生冲突的节点编号
TABLE_ID INT, // 冲突表ID
TYPE TINYINT // 操作类型
OCC_TIME DATETIME, // 冲突产生时间
KEY_DATA VARCHAR(8000), // 冲突数据的PK值,如包含多个KEY值,则以逗号分隔。如键值超长则截断
);
7) 复制定时器表
CREATE TABLE SYSREP.RPS_TIMERS (
NAME VARCHAR(128), // 定时器名称
TYPE$ INT, // 定时类型,取值可参见SP_RPS_CREATE_TIMER
FERQ_INTERVAL INT, // 间隔天数
FREQ_SUB_INTERVAL INT, // 间隔的月/周(调度类型决定)数
FREQ_MINUTE_INTERVAL INT, // 间隔的分钟数
START_TIME TIME, // 开始时间
END_TIME TIME, // 结束时间
DURING_START_DATE DATETIME // 开始时间点
DURING_END_DATE DATETIME, // 结束时间点
NO_END_DATA_FLAG INTEGER //是否有结束日期(0:有结束日期;1:没有结束日期)
DESC$ VARCHAR(1000), // 定时器描述
CLUSTER PRIMARY KEY(REP_NAME)
);
Configuring schema-level replication

Start the three servers; the start order does not matter.

[root@shard1 jydm]# systemctl stop DmServicejydm.service
[root@shard1 jydm]# systemctl start DmServicejydm.service

[root@jydm2 ~]# systemctl stop DmServicehy.service
[root@jydm2 ~]# systemctl start DmServicehy.service

[root@jydm3 ~]# systemctl stop DmServicewy.service
[root@jydm3 ~]# systemctl start DmServicewy.service

Log in to the replication database (the RPS, jydm), make sure the server state is OPEN, and begin the replication configuration.

[dmdba@shard1 bin]$ ./disql sysdba/abcd

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.851(ms)
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
Connected to: DM 7.1.6.46

Create the replication group REP_GRP_hy2wy

SQL> SP_RPS_ADD_GROUP('REP_GRP_hy2wy', '主从同步复制');
DMSQL executed successfully
used time: 6.109(ms). Execute id is 55.

SQL> select * from SYSREP.RPS_GROUPS;

LINEID     NAME          ID          DESC$       
---------- ------------- ----------- ------------
1          REP_GRP_hy2wy 2           主从同步复制

used time: 1.182(ms). Execute id is 56.

Begin the setup

SQL> SP_RPS_SET_BEGIN('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 0.530(ms). Execute id is 57.

Add the replication relation

SQL> SP_RPS_ADD_REPLICATION ('REP_GRP_hy2wy', 'REPhy2wy', 'hy到wy的同步复制', 'hy', 'wy', NULL, '/dm_home/dmdba/dmdbms/dmreplog');
DMSQL executed successfully
used time: 2.623(ms). Execute id is 58.

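Add schema-level replication
Add the replication mapping. Note that the schema (user) names must be uppercase here; otherwise the call reports that the object does not exist.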

SQL> SP_RPS_ADD_SCH_MAP('REPhy2wy','hy','wy',0);
SP_RPS_ADD_SCH_MAP('REPhy2wy','hy','wy',0);
[-8713]:source table is not exists.
used time: 1.738(ms). Execute id is 0.

SQL> SP_RPS_ADD_SCH_MAP('REPhy2wy','HY','WY',0);
DMSQL executed successfully
used time: 3.126(ms). Execute id is 6.

Apply the setup

SQL> SP_RPS_SET_APPLY();
DMSQL executed successfully
used time: 00:00:05.041. Execute id is 7.
SQL>  select * from SYSREP.RPS_REPLICATIONS;

LINEID     REP_NAME GRP_ID      REP_ID      MINST_ID    SINST_ID    ARCH_DIR                       FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  SYNC_FLAG   TIMER_NAME DESC$           
---------- -------- ----------- ----------- ----------- ----------- ------------------------------ ------------- ----------- --------------------------- ----------- ---------- ----------------
1          REPhy2wy 2           3           5           6           /dm_home/dmdba/dmdbms/dmreplog -1            0           NULL                        1           NULL       hy到wy的同步复制

used time: 1.362(ms). Execute id is 8.
SQL> select * from SYSREP.RPS_INSTANCES;

LINEID     INST_NAME GRP_ID      INST_ID     FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  NET_VALID_FLAG NET_FAULT_TIME             
---------- --------- ----------- ----------- ------------- ----------- --------------------------- -------------- ---------------------------
1          hy        2           5           -1            0           NULL                        0              NULL
2          wy        2           6           -1            0           NULL                        0              NULL

used time: 0.981(ms). Execute id is 9.
SQL> select * from SYSREP.RPS_TABMAPS;

LINEID     REP_ID      MSCH_NAME MTAB_NAME MSCH_ID     MTAB_ID     SSCH_NAME STAB_NAME SSCH_ID     STAB_ID     READONLY_MODE
---------- ----------- --------- --------- ----------- ----------- --------- --------- ----------- ----------- -------------
1          3           HY        NULL      150995951   -1          WY        NULL      150995951   -1          0

used time: 1.178(ms). Execute id is 10.
SQL>

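Schema-level replication is now configured. One point to note: in DM, SP_RPS_SET_BEGIN must be run before any replication configuration starts, and SP_RPS_SET_APPLY must be run to commit once the configuration is finished.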
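The steps above can be sketched as a small Python helper that renders the same call sequence, wrapped in SP_RPS_SET_BEGIN and SP_RPS_SET_APPLY. This is only an illustration: the helper names (sql_literal, call, schema_replication_script) are hypothetical, and the output is text to paste into disql, not a client for DM.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (None -> NULL, str -> quoted)."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def call(procedure, *args):
    """Render one procedure call as a statement string."""
    return f"{procedure}({', '.join(sql_literal(a) for a in args)});"


def schema_replication_script(group, rep_name, master, slave, arch_path,
                              master_schema, slave_schema,
                              read_only=0, description=""):
    """Statements for one schema-level mapping, in the order used above."""
    return [
        call("SP_RPS_SET_BEGIN", group),
        # NULL timer: synchronous replication, as in the walkthrough.
        call("SP_RPS_ADD_REPLICATION", group, rep_name, description,
             master, slave, None, arch_path),
        # Schema names are upper-cased because the lowercase call
        # in the transcript failed with error -8713.
        call("SP_RPS_ADD_SCH_MAP", rep_name,
             master_schema.upper(), slave_schema.upper(), read_only),
        call("SP_RPS_SET_APPLY"),
    ]


if __name__ == "__main__":
    for statement in schema_replication_script(
            "REP_GRP_hy2wy", "REPhy2wy", "hy", "wy",
            "/dm_home/dmdba/dmdbms/dmreplog", "hy", "wy"):
        print(statement)
```

Generating the statements in one place makes it harder to forget the closing SP_RPS_SET_APPLY or to pass a lowercase schema name.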

Verifying synchronization
Test DML first. Insert a row on the master database:

SQL> insert into t1 values(1,'hy1');
affect rows 1

used time: 1.565(ms). Execute id is 6.
SQL> commit;
executed successfully
used time: 1.442(ms). Execute id is 7.
SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy1

used time: 0.548(ms). Execute id is 8.

Query on the slave database; the row has been synchronized:

SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy1

used time: 2.538(ms). Execute id is 5.

Create a test table t2 under the hy user on the master database

SQL> create table t2 as select * from sysobjects;
executed successfully
used time: 12.763(ms). Execute id is 6.
SQL> select count(*) from t2;

LINEID     COUNT(*)            
---------- --------------------
1          1480

used time: 1.266(ms). Execute id is 7.

Query table t2 as the wy user on the slave database

SQL> select count(*) from t2;

LINEID     COUNT(*)            
---------- --------------------
1          1480

used time: 0.463(ms). Execute id is 8.

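The CREATE TABLE DDL statement has been replicated to the wy user.

The mapping above was added with READ_ONLY_MODE 0, that is, not read-only. Drop it, add it again with mode 1 (read-only), and repeat the test: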

SQL> SP_RPS_SET_BEGIN('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 0.408(ms). Execute id is 12.

SQL> SP_RPS_DROP_SCH_MAP('REPhy2wy','HY','WY');
DMSQL executed successfully
used time: 1.456(ms). Execute id is 13.

SQL> SP_RPS_ADD_SCH_MAP('REPhy2wy','HY','WY',1);
DMSQL executed successfully
used time: 2.820(ms). Execute id is 14.
SQL> SP_RPS_SET_APPLY();
DMSQL executed successfully
used time: 00:00:05.039. Execute id is 15.

Test again
Create table t3 under the hy user on the master database

SQL> create table t3 as select * from sysobjects;
executed successfully
used time: 9.925(ms). Execute id is 9.
SQL> select count(*) from t3;

LINEID     COUNT(*)            
---------- --------------------
1          1480

used time: 0.861(ms). Execute id is 10.

Querying table t3 as the wy user on the slave database reports that it does not exist

SQL> select count(*) from t3;
select count(*) from t3;
[-2106]:Error in line: 1
Invalid table or view name [T3].
used time: 0.774(ms). Execute id is 0.

The test shows that schema-level replication in read-only mode does not replicate DDL operations.

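DM7 Data Replication: Table-Level Replication

DM data replication (DATA REPLICATION) is a solution for spreading access load, speeding up responses to remote access, and improving data reliability. It replicates data changes made on one server instance to another server instance. It addresses problems seen in medium and large applications, where requests from different regions, departments, and workload types overload the database server, congest the network, and slow down responses for remote users.

Key concepts
1. Database state
The state of the server. Each state restricts the operations that can be performed. See the Data Watch (数据守护) section for details.
2. Master server
The server that initiates replication.
3. Slave server
The server that receives the data sent by the master server and applies it.
4. Replication node
Any server involved in replication, master or slave. A node can be both a master server and a slave server.
5. Replication server (RPS)
The server that configures the replication environment and defines replication relations. There is exactly one RPS; it only configures and monitors and does not take part in the replication itself.
6. Replication relation
Specifies how a master server and a slave server replicate. By replication method, relations are either synchronous or asynchronous.
7. Synchronous replication relation
Updates on the master server are replicated to the slave server immediately.
8. Asynchronous replication relation
Data on the master and slave servers may differ for a period: updates on the master are not replicated at once but only after some time has passed. The moment of synchronization is determined by the specified timer.
9. Logical log
A log recording the logical operations that change data. The recorded operations are INSERT, UPDATE, DELETE, TRUNCATE, ROLLBACK, and COMMIT.
10. Replication source object
The object on the master server that serves as the replication data source; it can be a database, a schema, or a table. Every operation on it is recorded in the logical log and sent to the slave server to be replicated.

11. Replication target object
The object on the slave server that receives the replicated data. After the slave server receives the logical log, it applies the changes of the source object to the target object.

12. Replication object mapping
A replication source object paired with a replication target object forms a replication object mapping. The source and target must be objects of the same type. By object type, mappings come in three levels: database, schema, and table. A table-level mapping requires the source and target tables to have identical structure; database-level and schema-level mappings have no such requirement and also replicate DDL. A mapping is either read-only or not read-only; in a read-only mapping, users cannot update the target table.
13. Replication group
A set of logically related replication relations can be organized into a replication group. Groups make it possible to build complex logical topologies such as one-to-many, many-to-one, cascading, symmetric, and circular replication.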
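The mapping rules in concept 12 can be written down as a small data model. The sketch below is purely illustrative: the class names (Level, ReplicationObject, Mapping) are hypothetical and not part of DM, and replicates_ddl only restates the definition given above, not behavior observed on a server.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    DATABASE = "database"
    SCHEMA = "schema"
    TABLE = "table"


@dataclass(frozen=True)
class ReplicationObject:
    level: Level
    schema: Optional[str] = None
    table: Optional[str] = None


@dataclass(frozen=True)
class Mapping:
    source: ReplicationObject
    target: ReplicationObject
    read_only: bool = False

    def __post_init__(self):
        # Source and target must be the same kind of object.
        if self.source.level is not self.target.level:
            raise ValueError("source and target must be the same type of object")

    @property
    def replicates_ddl(self):
        # Per the definition, database- and schema-level mappings
        # replicate DDL; table-level mappings do not.
        return self.source.level in (Level.DATABASE, Level.SCHEMA)


if __name__ == "__main__":
    mapping = Mapping(ReplicationObject(Level.TABLE, "HY", "T1"),
                      ReplicationObject(Level.TABLE, "WY", "T1"))
    print(mapping.replicates_ddl)
```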

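Architecture
A data replication system consists of the following components:
1. The replication server;
2. The replication nodes;
3. The MAL system that carries communication between the instance sites.

Apart from the system administrator defining replication and handling exceptions through the replication server, all other processing, including the replication traffic between master and slave servers, is transparent to users. The figure below shows the layout of the whole replication environment.

In the whole environment there is exactly one replication server (RPS). Users define replication and the replication environment through the RPS, but the RPS does not take part in the replication process.
In DM7, archiving the replication logical log to a configured directory on the local site is called local archiving, and sending the log is called remote archiving.
On a replication node, the configuration files related to replication are dmtimer.ini, dmllog.ini, and dmrep.ini. The table below lists the role of each file in replication.

These files are generated or modified automatically by the RPS through the MAL system according to the user's configuration; administrators do not need to maintain them by hand.

Besides the configuration files, the other important files used during replication are listed in the table below.

These are all stream files. While the master server runs, logical log records are first written to the logical log file, then distributed according to the configuration into the separate logical log archive files, whose contents are sent to the slave servers at the appropriate time.

Each time the slave server receives a batch of data, it creates a replication data file to store it, so that replication data is not lost when the slave server applies changes slowly.
The logical log file is limited to 32M; two files are used alternately.

Each logical log archive file in the archive path is limited to 32M; when a file fills up, a new file is added for new records. dmllog.ini sets a limit on the total space of all files in the archive path. When the limit is reached, earlier files are deleted; if no limit is configured, the administrator can delete files as the state of replication allows.

A replication data file is a temporary file holding one batch of replication data received by the slave server. It never exceeds 32K and is deleted automatically once its data has been replicated.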
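The rolling behavior of the archive path can be pictured with a short simulation. It is a rough sketch under assumptions: the function name archive is hypothetical, records are never split across files, and the exact moment at which DM deletes old files is not stated in the text, so the code simply drops the oldest files as soon as the total exceeds the limit.

```python
FILE_LIMIT = 32 * 1024 * 1024  # 32M per archive file, as stated above


def archive(records, total_limit=None, file_limit=FILE_LIMIT):
    """Simulate rolling archive files and return the sizes of the files kept.

    records: iterable of record sizes in bytes.
    total_limit: cap on the whole archive path in bytes; None means unlimited.
    """
    files = [0]
    for size in records:
        if files[-1] + size > file_limit:
            files.append(0)  # current file is full: start a new one
        files[-1] += size
        if total_limit is not None:
            # Assumed policy: delete the oldest files until the path
            # fits again, but never the file currently being written.
            while sum(files) > total_limit and len(files) > 1:
                files.pop(0)
    return files


if __name__ == "__main__":
    # Small numbers keep the example readable.
    print(archive([10] * 5, total_limit=35, file_limit=25))
```

With no limit configured the list only grows, which is the case where the administrator has to clean up by hand.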

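Data replication is configured on the RPS. The related procedures are as follows:
1. SP_INIT_REP_SYS
Definition:
SP_INIT_REP_SYS(
CREATE_FLAG INT
);
Description:
Creates or drops the system tables required for data replication
Parameters:
CREATE_FLAG: 1 creates the system tables required for replication; 0 drops them
Return value:
None
Example:
Create the system tables required for replication
SP_INIT_REP_SYS(1);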

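2. SP_RPS_ADD_GROUP
Definition:
SP_RPS_ADD_GROUP(
GROUP_NAME VARCHAR(128),
GROUP_DESC VARCHAR(1000)
);
Description:
Creates a replication group
Parameters:
GROUP_NAME: name of the replication group to create
GROUP_DESC: description of the replication group
Return value:
None
Notes:
Instructs the RPS to create a new replication group. An error is reported if a group with the same name already exists.
Example:
Create the replication group REP_GRP_B2C
SP_RPS_ADD_GROUP('REP_GRP_B2C','主从同步复制');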

3. SP_RPS_DROP_GROUP
Definition:
SP_RPS_DROP_GROUP(
GROUP_NAME VARCHAR(128)
);
Description:
Drops a replication group
Parameters:
GROUP_NAME: replication group name
Return value:
None
Example:
Drop the replication group REP_GRP_B2C
SP_RPS_DROP_GROUP ('REP_GRP_B2C');

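4. SP_RPS_ADD_REPLICATION
Definition:
SP_RPS_ADD_REPLICATION(
GRP_NAME VARCHAR(128),
REP_NAME VARCHAR(128),
REP_DESC VARCHAR(1000),
MINSTANCE VARCHAR(128),
SINSTANCE VARCHAR(128),
REP_TIMER VARCHAR(128),
ARCH_PATH VARCHAR(256)
);
Description:
Creates a replication relation
Parameters:
GRP_NAME: replication group name
REP_NAME: replication name; must be unique on the RPS
REP_DESC: replication description
MINSTANCE: master node instance name; must already be configured in the MAL of the RPS
SINSTANCE: slave node instance name; must already be configured in the MAL of the RPS
REP_TIMER: replication timer name. The timer determines when replicated data is synchronized. NULL for synchronous replication
ARCH_PATH: full archive path of the logical log on the master server
Return value:
None
Example:
Create a replication relation
SP_RPS_ADD_REPLICATION ('REP_GRP_B2C', 'REPB2C', 'B到C的同步复制', 'B', 'C', NULL, '{ DEFARCHPATH}\REPB2C');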

5. SP_RPS_DROP_REPLICATION
Definition:
SP_RPS_DROP_REPLICATION (
REP_NAME VARCHAR(128)
);
Description:
Drops a replication relation
Parameters:
REP_NAME: replication name
Return value:
None
Example:
Drop a replication relation
SP_RPS_DROP_REPLICATION ('REPB2C');

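6. SP_RPS_SET_ROUTE_FAULT_TIMEOUT
Definition:
SP_RPS_SET_ROUTE_FAULT_TIMEOUT (
REP_NAME VARCHAR(128),
TIMEOUTS INT
);
Description:
Sets the fault timeout of a replication path
Parameters:
REP_NAME: replication relation name
TIMEOUTS: fault timeout in seconds. 0 means time out immediately; -1 means no timeout limit
Return value:
None
Notes:
This interface sets the fault-handling policy for a replication path. Once it is set, if the RPS detects a fault on the replication path and the fault lasts longer than the timeout, the faulty replication relation has to be cancelled.
Example:
Set the fault timeout of a replication path
SP_RPS_SET_ROUTE_FAULT_TIMEOUT ('REPB2C',10);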

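7. SP_RPS_SET_INST_FAULT_TIMEOUT
Definition:
SP_RPS_SET_INST_FAULT_TIMEOUT (
INST_NAME VARCHAR(128),
TIMEOUTS INT
);
Description:
Sets the fault timeout of a replication node
Parameters:
INST_NAME: replication node instance name
TIMEOUTS: fault timeout in seconds. 0 means time out immediately; -1 means no timeout limit
Return value:
None
Example:
Set the fault timeout of a replication node
SP_RPS_SET_INST_FAULT_TIMEOUT ('B',10);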

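8. SP_RPS_ADD_TIMER
Definition:
SP_RPS_ADD_TIMER(
TIMER_NAME VARCHAR(128),
TIMER_DESC VARCHAR(1000),
TYPE$ INT,
FERQ_INTERVAL INT,
FREQ_SUB_INTERVAL INT,
FREQ_MINUTE_INTERVAL INT,
START_TIME TIME,
END_TIME TIME,
DURING_START_DATE DATETIME,
DURING_END_DATE DATETIME,
NO_END_DATA_FLAG INT
);
Description:
Defines a timer for replication relations
Parameters:
TIMER_NAME: timer name
TIMER_DESC: timer description
TYPE$: timer type, one of:
1: run once
2: run daily
3: run weekly
4: run on a given day of the month
5: run in the first week of the month
6: run in the second week of the month
7: run in the third week of the month
8: run in the fourth week of the month
9: run in the last week of the month
FREQ_INTERVAL: interval in months/weeks (depending on the schedule type)
FREQ_SUB_INTERVAL: interval in days
FREQ_MINUTE_INTERVAL: interval in minutes
START_TIME: start time
END_TIME: end time
DURING_START_DATE: start date and time of the validity period
DURING_END_DATE: end date and time of the validity period
NO_END_DATA_FLAG: whether the end date is ignored; 0 means the end date applies, 1 means it does not
The parameters TYPE$, FERQ_INTERVAL, FREQ_SUB_INTERVAL, FREQ_MINUTE_INTERVAL, START_TIME, END_TIME, DURING_START_DATE, DURING_END_DATE, and NO_END_DATA_FLAG of this procedure correspond to the parameters TYPE, FREQ_MONTH_WEEK_INTERVAL, FREQ_SUB_INTERVAL, FREQ_MINUTE_INTERVAL, START_TIME, END_TIME, DURING_START_DATE, DURING_END_DATE, and NO_END_DATE_FLAG of the procedure SP_ADD_TIMER; see the description of SP_ADD_TIMER for details.
Return value:
None
Example:
Define a timer for replication relations
SP_RPS_ADD_TIMER ('TIMER1','按天计算', 1, 1, 0, 1, CURTIME, '23:59:59', NOW, NULL, 1);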
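The TYPE$ codes listed above are easy to mix up when written as bare numbers. The following sketch maps them to names with a Python IntEnum; the enum and its member names are hypothetical labels chosen for illustration, and only the numeric values come from the parameter list.

```python
from enum import IntEnum


class TimerType(IntEnum):
    """TYPE$ values of SP_RPS_ADD_TIMER; member names are illustrative."""
    ONCE = 1
    DAILY = 2
    WEEKLY = 3
    MONTHLY_ON_DAY = 4
    MONTHLY_FIRST_WEEK = 5
    MONTHLY_SECOND_WEEK = 6
    MONTHLY_THIRD_WEEK = 7
    MONTHLY_FOURTH_WEEK = 8
    MONTHLY_LAST_WEEK = 9


def describe(type_code):
    """Return a readable label for a TYPE$ code; raises ValueError if unknown."""
    return TimerType(type_code).name.replace("_", " ").lower()


if __name__ == "__main__":
    print(describe(1))
```

Because IntEnum members are integers, a member such as TimerType.ONCE can be formatted straight into the third argument of the call.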

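9. SP_RPS_REP_RESET_TIMER
Definition:
SP_RPS_REP_RESET_TIMER(
REP_NAME VARCHAR(128),
TIMER_NAME VARCHAR(128)
);
Description:
Resets the timer of a replication relation
Parameters:
REP_NAME: replication name
TIMER_NAME: name of the new timer
Return value:
None
Example:
Reset the timer of a replication relation
SP_RPS_REP_RESET_TIMER ('REPB2C', 'TIMER1');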

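10. SP_RPS_ADD_TAB_MAP
Definition:
SP_RPS_ADD_TAB_MAP(
REP_NAME VARCHAR(128),
MTAB_SCHEMA VARCHAR(128),
MTAB_NAME VARCHAR(128),
STAB_SCHEMA VARCHAR(128),
STAB_NAME VARCHAR(128),
READ_ONLY_MODE INT
);
Description:
Adds a table-level replication mapping
Parameters:
REP_NAME: replication relation name
MTAB_SCHEMA: master table schema name
MTAB_NAME: master table name
STAB_SCHEMA: slave table schema name
STAB_NAME: slave table name
READ_ONLY_MODE: read-only replication mode. 1 means read-only, in which the slave table accepts only replicated updates; 0 means not read-only
Return value:
None
Example:
Add a replication mapping
SP_RPS_ADD_TAB_MAP('REPB2C', 'USER1', 'T1', 'USER2', 'T2', 0);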

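11. SP_RPS_DROP_TAB_MAP
Definition:
SP_RPS_DROP_TAB_MAP(
REP_NAME VARCHAR(128),
MTAB_SCHEMA VARCHAR(128),
MTAB_NAME VARCHAR(128),
STAB_SCHEMA VARCHAR(128),
STAB_NAME VARCHAR(128)
);
Description:
Drops a table-level replication mapping
Parameters:
REP_NAME: replication relation name
MTAB_SCHEMA: master table schema name
MTAB_NAME: master table name
STAB_SCHEMA: slave table schema name
STAB_NAME: slave table name
Return value:
None
Example:
Drop a table-level replication mapping
SP_RPS_DROP_TAB_MAP('REPB2C', 'USER1', 'T1', 'USER2', 'T2');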

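12. SP_RPS_ADD_SCH_MAP
Definition:
SP_RPS_ADD_SCH_MAP(
REP_NAME VARCHAR(128),
MSCH VARCHAR(128),
SSCH VARCHAR(128),
READ_ONLY_MODE INT
);
Description:
Adds a schema-level replication mapping
Parameters:
REP_NAME: replication relation name
MSCH: master schema name
SSCH: slave schema name
READ_ONLY_MODE: read-only replication mode. 1 means read-only, in which the slave table accepts only replicated updates; 0 means not read-only
Return value:
None
Example:
Add a replication mapping
SP_RPS_ADD_SCH_MAP('REPB2C', 'USER1', 'USER2', 0);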

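13. SP_RPS_DROP_SCH_MAP
Definition:
SP_RPS_DROP_SCH_MAP(
REP_NAME VARCHAR(128),
MSCH VARCHAR(128),
SSCH VARCHAR(128)
);
Description:
Drops a schema-level replication mapping
Parameters:
REP_NAME: replication relation name
MSCH: master schema name
SSCH: slave schema name
Return value:
None
Example:
Drop a schema-level replication mapping
SP_RPS_DROP_SCH_MAP('REPB2C', 'USER1', 'USER2');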

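14. SP_RPS_ADD_DB_MAP
Definition:
SP_RPS_ADD_DB_MAP(
REP_NAME VARCHAR(128),
READ_ONLY_MODE INT
);
Description:
Adds a database-level replication mapping
Parameters:
REP_NAME: replication relation name
READ_ONLY_MODE: read-only replication mode. 1 means read-only, in which the slave table accepts only replicated updates; 0 means not read-only
Return value:
None
Example:
Add a database-level replication mapping
SP_RPS_ADD_DB_MAP('REPB2C', 0);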

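15. SP_RPS_DROP_DB_MAP
Definition:
SP_RPS_DROP_DB_MAP(
REP_NAME VARCHAR(128)
);
Description:
Drops a database-level replication mapping
Parameters:
REP_NAME: replication relation name
Return value:
None
Example:
Drop a database-level replication mapping
SP_RPS_DROP_DB_MAP('REPB2C');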

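16. SP_RPS_SET_BEGIN
Definition:
SP_RPS_SET_BEGIN(
GRP_NAME VARCHAR(128)
);
Description:
Begins replication setup
Parameters:
GRP_NAME: replication group name
Return value:
None
Notes:
Begins attribute setup for the specified replication group. The interfaces that create or drop replication relations and replication mappings must be called after this one; otherwise they fail with the error "错误的复制设置序列" (wrong replication setup sequence). A session also cannot have more than one replication setup in progress at the same time.
Example:
Begin replication setup for the group REP_GRP_B2C
SP_RPS_SET_BEGIN('REP_GRP_B2C');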

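17. SP_RPS_SET_APPLY
Definition:
SP_RPS_SET_APPLY ();
Description:
Applies the replication setup: saves and commits every operation of the current setup. To continue configuring afterwards, SP_RPS_SET_BEGIN must be called again
Parameters:
None
Return value:
None
Example:
Apply the replication setup
SP_RPS_SET_APPLY ();

18. SP_RPS_SET_CANCEL
Definition:
SP_RPS_SET_CANCEL ();
Description:
Cancels the replication setup: discards every operation of the current setup. To configure again, SP_RPS_SET_BEGIN must be called again
Parameters:
None
Return value:
None
Example:
Cancel the replication setup
SP_RPS_SET_CANCEL();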
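The ordering rule shared by procedures 16 to 18 can be checked mechanically before a script is run. The sketch below is a hypothetical linter, not part of DM: validate and CONFIG_CALLS are invented names, and CONFIG_CALLS lists only the relation and mapping procedures that the note on SP_RPS_SET_BEGIN mentions explicitly. Treating an APPLY or CANCEL without a preceding BEGIN as an error is an assumption.

```python
# Procedures that, per the note on SP_RPS_SET_BEGIN, must run inside a setup.
CONFIG_CALLS = {
    "SP_RPS_ADD_REPLICATION", "SP_RPS_DROP_REPLICATION",
    "SP_RPS_ADD_TAB_MAP", "SP_RPS_DROP_TAB_MAP",
    "SP_RPS_ADD_SCH_MAP", "SP_RPS_DROP_SCH_MAP",
    "SP_RPS_ADD_DB_MAP", "SP_RPS_DROP_DB_MAP",
}


def validate(calls):
    """Return (index, message) pairs for calls that break the ordering rule."""
    errors = []
    in_setup = False
    for index, name in enumerate(calls):
        if name == "SP_RPS_SET_BEGIN":
            if in_setup:
                errors.append((index, "setup already begun in this session"))
            in_setup = True
        elif name in ("SP_RPS_SET_APPLY", "SP_RPS_SET_CANCEL"):
            if not in_setup:
                # Assumption: ending a setup that was never begun is an error.
                errors.append((index, "no setup in progress"))
            in_setup = False
        elif name in CONFIG_CALLS and not in_setup:
            errors.append((index, "must be called after SP_RPS_SET_BEGIN"))
    if in_setup:
        errors.append((len(calls), "setup never applied or cancelled"))
    return errors


if __name__ == "__main__":
    print(validate(["SP_RPS_ADD_TAB_MAP", "SP_RPS_SET_APPLY"]))
```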

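DM data replication can be configured at the table, schema, and database levels. This article walks through the configuration of table-level replication.
A simple example illustrates the configuration.
1. Preparation
The instances taking part in replication are listed below.
Database              Instance  IP address    DB port  MAL port  File directory
Replication database  jydm      10.10.10.180  5236     5241      /dm_home/dmdba/dmdbms/data/
Master database       hy        10.10.10.184  5236     5242      /dm_home/dmdba/dmdbms/data/
Slave database        wy        10.10.10.185  5236     5243      /dm_home/dmdba/dmdbms/data/

Assume the table hy.t1 exists on the master database and a table wy.t1 with exactly the same structure exists on the slave database. The goal is to create a synchronous replication relation named REPhy2wy from hy.t1 on the master database to wy.t1 on the slave database.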

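2. Parameter settings
Edit the dm.ini file of each of the three DM databases and add the following:
Database              dm.ini settings
Replication database  INSTANCE_NAME = jydm
                      PORT_NUM = 5236
                      MAL_INI = 1
Master database       INSTANCE_NAME = hy
                      PORT_NUM = 5236
                      MAL_INI = 1
Slave database        INSTANCE_NAME = wy
                      PORT_NUM = 5236
                      MAL_INI = 1

Configuring the dmmal.ini file
Edit dmmal.ini and add the content shown further below. The dmmal.ini file does not exist by default; it can be copied from the samples directory and then modified.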

[root@shard1 ini_script]# pwd
/dm_home/dmdba/dmdbms/samples/ini_script
[root@shard1 ini_script]# ls -lrt
总用量 48
-rwxr-xr-x 1 dmdba dinstall  890 2月  14 2019 dmarch_example.ini
-rwxr-xr-x 1 dmdba dinstall 1966 2月  14 2019 dmdcr_cfg_example.ini
-rwxr-xr-x 1 dmdba dinstall  631 2月  14 2019 dmdcr_example.ini
-rwxr-xr-x 1 dmdba dinstall 1537 2月  14 2019 dminit_example.ini
-rwxr-xr-x 1 dmdba dinstall 2070 2月  14 2019 dmmal_example.ini
-rwxr-xr-x 1 dmdba dinstall 1277 2月  14 2019 dmmonitor_example.ini
-rwxr-xr-x 1 dmdba dinstall  288 2月  14 2019 dmmpp_example.ini
-rwxr-xr-x 1 dmdba dinstall 1679 2月  14 2019 dmtimer_example.ini
-rwxr-xr-x 1 dmdba dinstall 1241 2月  14 2019 dmwatch_example.ini
-rwxr-xr-x 1 dmdba dinstall 2146 2月  14 2019 dmwatcher_example.ini
-rwxr-xr-x 1 dmdba dinstall  522 2月  14 2019 dmwmon_example.ini
-rwxr-xr-x 1 dmdba dinstall  636 2月  14 2019 sqllog_example.ini


Create a dmmal.ini file on every database server; the dmmal.ini contents must be identical on all of them:

MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185
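Since the three copies of dmmal.ini must match, a quick comparison catches a stray edit on one host. The following is a minimal sketch using Python's standard configparser; the function names and the [GLOBAL] pseudo-section are hypothetical, introduced only because configparser refuses keys that appear before the first section header. The endpoint check (no two sections sharing MAL_HOST and MAL_PORT) is an added sanity check, not a rule stated above.

```python
import configparser


def load_mal_ini(text):
    """Parse dmmal.ini text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (MAL_HOST, not mal_host)
    # Prepend a hypothetical [GLOBAL] header for the leading global keys.
    parser.read_string("[GLOBAL]\n" + text)
    return {name: dict(parser[name]) for name in parser.sections()}


def check_mal_ini(configs):
    """configs maps a host label to its dmmal.ini text; returns problems found."""
    problems = []
    parsed = {host: load_mal_ini(text) for host, text in configs.items()}
    reference_host, reference = next(iter(parsed.items()))
    for host, cfg in parsed.items():
        if cfg != reference:
            problems.append(f"{host} differs from {reference_host}")
    seen = {}
    for section, values in reference.items():
        if section == "GLOBAL":
            continue
        endpoint = (values.get("MAL_HOST"), values.get("MAL_PORT"))
        if endpoint in seen:
            problems.append(f"{section} reuses endpoint of {seen[endpoint]}")
        seen[endpoint] = section
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_mal.py jydm.ini hy.ini wy.ini (local copies of each file)
    texts = {path: open(path, encoding="utf-8").read() for path in sys.argv[1:]}
    if texts:
        print(check_mal_ini(texts) or "all files match")
```

The comparison is done on parsed values, so differences in spacing around the equals sign, as in the listing above, do not count as mismatches.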

[dmdba@shard1 jydm]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm2 hy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

[dmdba@jydm3 wy]$ cat dmmal.ini
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5

[MAL_jydm]
MAL_INST_NAME = jydm
MAL_HOST = 10.10.10.180
MAL_PORT = 5241
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.180

[MAL_hy]
MAL_INST_NAME = hy
MAL_HOST = 10.10.10.184
MAL_PORT = 5242
MAL_INST_PORT = 5236 
MAL_INST_HOST = 10.10.10.184

[MAL_wy] 
MAL_INST_NAME =wy
MAL_HOST = 10.10.10.185
MAL_PORT = 5243
MAL_INST_PORT = 5236 
MAL_INST_HOST =10.10.10.185

Enable the MAL_INI parameter on all three databases:

SQL> sp_set_para_value(2,'MAL_INI',1);
DMSQL executed successfully
used time: 7.465(ms). Execute id is 23.

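Initializing the replication database
If the replication database is being used for the first time, it must be initialized by running the system procedure SP_INIT_REP_SYS(create_flag). The procedure creates the replication user (SYSREP/SYSREP) and the system tables the replication database needs. With create_flag set to 1 it creates the user and the system tables; with 0 it drops them.

Run the initialization: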

SQL> SP_INIT_REP_SYS(1);
DMSQL executed successfully
used time: 115.960(ms). Execute id is 30.
SQL> conn SYSREP/SYSREP

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.684(ms)
SQL> select table_name from user_tables;

LINEID     TABLE_NAME       
---------- -----------------
1          RPS_GROUPS
2          RPS_INSTANCES
3          RPS_REPLICATIONS
4          RPS_TABMAPS
5          RPS_FAULT_HISTORY
6          RPS_CONFLICTS
7          RPS_TIMERS

7 rows got

used time: 170.443(ms). Execute id is 31.

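The seven replication system tables are defined as follows:
1) Replication group table
CREATE TABLE SYSREP.RPS_GROUPS (
NAME VARCHAR(128), // replication group name
ID INT, // replication group ID
DESC$ VARCHAR(1000), // description
CLUSTER PRIMARY KEY(NAME)
);
2) Replication node instance table
CREATE TABLE SYSREP.RPS_INSTANCES(
INST_NAME VARCHAR(128), // replication node instance name
GRP_ID INT, // replication group ID
INST_ID INT, // instance number within the replication group
FAULT_TIMEOUT INT, // fault timeout in seconds; 0 means time out immediately
VALID_FLAG CHAR(1), // node system status
FAULT_TIME DATETIME, // time the node fault started
NET_VALID_FLAG CHAR(1), // network status
NET_FAULT_TIME DATETIME, // time the network fault started
CLUSTER PRIMARY KEY(GRP_ID, INST_NAME)
);
3) Replication relation table
CREATE TABLE SYSREP.RPS_REPLICATIONS(
REP_NAME VARCHAR(128), // replication name
GRP_ID INT, // replication group ID
REP_ID INT, // replication ID, globally unique
MINST_ID INT, // master node instance number
SINST_ID INT, // slave node instance number
ARCH_DIR VARCHAR(256), // archive log path on the master node
FAULT_TIMEOUT INT, // fault timeout in seconds; 0 means time out immediately
VALID_FLAG CHAR(1), // replication relation status
FAULT_TIME DATETIME, // time the fault started
SYNC_FLAG INTEGER, // synchronous or asynchronous replication
TIMER_NAME VARCHAR(128), // timer for asynchronous replication (not set for synchronous replication)
DESC$ VARCHAR(1000), // replication description
CLUSTER PRIMARY KEY(GRP_ID, REP_NAME)
);
4) Replication mapping table
CREATE TABLE SYSREP.RPS_TABMAPS(
REP_ID INT, // replication ID
MSCH_NAME VARCHAR(128), // master table schema name
MTAB_NAME VARCHAR(128), // master table name
MSCH_ID INT, // master table schema ID
MTAB_ID INT, // master table ID
SSCH_NAME VARCHAR(128), // slave table schema name
STAB_NAME VARCHAR(128), // slave table name
SSCH_ID INT, // slave table schema ID
STAB_ID INT, // slave table ID
READONLY_MODE INT, // mapping mode; 1: read-only, 0: not read-only
CLUSTER PRIMARY KEY(REP_ID, MTAB_ID, STAB_ID)
);
5) Replication fault history table
CREATE TABLE SYSREP.RPS_FAULT_HISTORY(
GRP_NAME VARCHAR(128), // replication group
OBJ_NAME VARCHAR(128), // faulty node or relation
FAULT_TYPE VARCHAR(128), // fault type description
START_TIME DATETIME, // time the fault started
END_TIME DATETIME // time the fault ended
);
6) Replication conflict history table
CREATE TABLE SYSREP.RPS_CONFLICTS
(
SEQ_NO BIGINT, // conflict sequence number
REP_ID INT, // replication ID
INST_ID INT, // number of the node where the conflict occurred
TABLE_ID INT, // ID of the conflicting table
TYPE TINYINT, // operation type
OCC_TIME DATETIME, // time the conflict occurred
KEY_DATA VARCHAR(8000) // PK value of the conflicting data; multiple key values are comma-separated, over-long values are truncated
);
7) Replication timer table
CREATE TABLE SYSREP.RPS_TIMERS (
NAME VARCHAR(128), // timer name
TYPE$ INT, // timer type; for values see SP_RPS_ADD_TIMER
FERQ_INTERVAL INT, // interval in days
FREQ_SUB_INTERVAL INT, // interval in months/weeks (depending on the schedule type)
FREQ_MINUTE_INTERVAL INT, // interval in minutes
START_TIME TIME, // start time
END_TIME TIME, // end time
DURING_START_DATE DATETIME, // start point
DURING_END_DATE DATETIME, // end point
NO_END_DATA_FLAG INTEGER, // whether there is an end date (0: has an end date; 1: no end date)
DESC$ VARCHAR(1000), // timer description
CLUSTER PRIMARY KEY(NAME)
);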
Configuring table-level replication

Start the three servers; the start order does not matter.

[root@shard1 jydm]# systemctl stop DmServicejydm.service
[root@shard1 jydm]# systemctl start DmServicejydm.service

[root@jydm2 ~]# systemctl stop DmServicehy.service
[root@jydm2 ~]# systemctl start DmServicehy.service

[root@jydm3 ~]# systemctl stop DmServicewy.service
[root@jydm3 ~]# systemctl start DmServicewy.service

Log in to the replication database (the RPS, jydm), make sure the server state is OPEN, and begin the replication configuration.

[dmdba@shard1 bin]$ ./disql sysdba/abcd

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 9.851(ms)
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
Connected to: DM 7.1.6.46

Create the replication group REP_GRP_hy2wy

SQL> SP_RPS_ADD_GROUP('REP_GRP_hy2wy', '主从同步复制');
DMSQL executed successfully
used time: 16.792(ms). Execute id is 3.

SQL> select * from SYSREP.RPS_GROUPS;

LINEID     NAME          ID          DESC$       
---------- ------------- ----------- ------------
1          REP_GRP_hy2wy 1           主从同步复制

used time: 1.599(ms). Execute id is 4.

Begin the setup

SQL> SP_RPS_SET_BEGIN('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 1.081(ms). Execute id is 6.

Add the replication relation

SQL> SP_RPS_ADD_REPLICATION ('REP_GRP_hy2wy', 'REPhy2wy', 'hy到wy的同步复制', 'hy', 'wy', NULL, '/dm_home/dmdba/dmdbms/dmreplog');
DMSQL executed successfully
used time: 3.807(ms). Execute id is 7.

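Add the replication mapping. Note that the schema (user) names and table names must be uppercase here; otherwise the call reports that the object does not exist.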

SQL> SP_RPS_ADD_TAB_MAP('REPhy2wy', 'HY', 'T1', 'WY', 'T1', 0);
DMSQL executed successfully
used time: 4.495(ms). Execute id is 10.

Apply the setup

SQL> SP_RPS_SET_APPLY();
DMSQL executed successfully
used time: 00:00:05.038. Execute id is 12.
SQL> select * from SYSREP.RPS_REPLICATIONS;

LINEID     REP_NAME GRP_ID      REP_ID      MINST_ID    SINST_ID    ARCH_DIR                       FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  SYNC_FLAG   TIMER_NAME DESC$           
---------- -------- ----------- ----------- ----------- ----------- ------------------------------ ------------- ----------- --------------------------- ----------- ---------- ----------------
1          REPhy2wy 1           1           1           2           /dm_home/dmdba/dmdbms/dmreplog -1            0           NULL                        1           NULL       hy到wy的同步复制

used time: 0.534(ms). Execute id is 13.
SQL> select * from SYSREP.RPS_INSTANCES;

LINEID     INST_NAME GRP_ID      INST_ID     FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  NET_VALID_FLAG NET_FAULT_TIME             
---------- --------- ----------- ----------- ------------- ----------- --------------------------- -------------- ---------------------------
1          hy        1           1           -1            0           NULL                        0              NULL
2          wy        1           2           -1            0           NULL                        0              NULL

used time: 0.499(ms). Execute id is 14.
SQL> select * from SYSREP.RPS_TABMAPS;

LINEID     REP_ID      MSCH_NAME MTAB_NAME MSCH_ID     MTAB_ID     SSCH_NAME STAB_NAME SSCH_ID     STAB_ID     READONLY_MODE
---------- ----------- --------- --------- ----------- ----------- --------- --------- ----------- ----------- -------------
1          1           HY        T1        150995951   1297        WY        T1        150995951   1296        0

used time: 0.523(ms). Execute id is 15.
SQL> select * from SYSREP.RPS_FAULT_HISTORY;
no rows

used time: 1.136(ms). Execute id is 16.
SQL> select * from SYSREP.RPS_CONFLICTS;
no rows

used time: 1.336(ms). Execute id is 17.
SQL> select * from SYSREP.RPS_TIMERS;
no rows

used time: 0.987(ms). Execute id is 18.

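Table-level replication is now configured. One point to note: in DM, SP_RPS_SET_BEGIN must be run before any replication configuration starts, and SP_RPS_SET_APPLY must be run to commit once the configuration is finished.

Verifying synchronization
The logical log on the master database: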

[root@jydm2 dmreplog]# ls -lrt
总用量 4
-rw-r--r-- 1 dmdba dinstall 48 10月 21 08:24 llog_arch_file_20141021082409.log

Insert data into table T1 on the master database:

[dmdba@jydm2 bin]$ ./disql hy/hy
Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 6.047(ms)
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
Connected to: DM 7.1.6.46
SQL> select * from t1;
no rows

used time: 0.998(ms). Execute id is 4.
SQL> insert into t1 values(2,'hy2');
affect rows 1

used time: 0.777(ms). Execute id is 8.
SQL> commit;
executed successfully
used time: 1.141(ms). Execute id is 9.
SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy
2          2  hy2

used time: 0.403(ms). Execute id is 10.

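Query on the slave database: synchronization succeeded. Data that existed on the master database before replication was configured is not copied over; only operations performed afterwards are replicated.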

[dmdba@jydm3 bin]$ ./disql wy/wy

Server[LOCALHOST:5236]:mode is normal, state is open
login used time: 5.844(ms)
disql V7.1.6.46-Build(2018.02.08-89107)ENT 
Connected to: DM 7.1.6.46
SQL> select * from t1;     

LINEID     ID USERNAME
---------- -- --------
1          1  hy
2          2  hy2

used time: 0.618(ms). Execute id is 9.

Update a row on the master database

SQL> update t1 set username='hy1' where id=1;
affect rows 1

used time: 1.427(ms). Execute id is 11.
SQL> commit;
executed successfully
used time: 0.869(ms). Execute id is 12.
SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy1
2          2  hy2

used time: 0.339(ms). Execute id is 13.

Check the rows on the slave database

SQL> select * from t1; 

LINEID     ID USERNAME
---------- -- --------
1          1  hy1
2          2  hy2

used time: 1.272(ms). Execute id is 12.

Delete all data from table T1 on the master database:

SQL> truncate table t1;
executed successfully
used time: 3.283(ms). Execute id is 14.

All data on the slave database has been deleted as well

SQL> select * from t1;
no rows

used time: 0.793(ms). Execute id is 15.

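Across the whole replication environment, the logical log exists only on the master database.
The replication configured here is synchronous. To switch to asynchronous replication, add a timer, for example:
SP_RPS_ADD_TIMER('hy2wy_TIMER','从hy到wy定时复制',1,0,0,0,'23:18:33',NULL,'2019-12-18 23:18:33',NULL,1);
SP_RPS_REP_RESET_TIMER('REPhy2wy','hy2wy_TIMER');

A replication group can be dropped only when it is not in the configuration phase, that is, after its configuration has been applied or cancelled.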
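Monitoring data replication
Once configuration is complete, the replication server (RPS) takes no part in carrying out replication, but it is still advisable to keep the RPS running during replication so that it can monitor it.
Within the replication system, status is divided into the status of replication nodes and the status of replication relations. See the table below for details.

The system status of a replication node is held in the VALID_FLAG column of SYSREP.RPS_INSTANCES; its network status is held in the NET_VALID_FLAG column of the same table.
The status of a replication relation is held in the VALID_FLAG column of SYSREP.RPS_REPLICATIONS.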

SQL> select * from sysrep.rps_replications;

LINEID     REP_NAME GRP_ID      REP_ID      MINST_ID    SINST_ID    ARCH_DIR                       FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  SYNC_FLAG   TIMER_NAME DESC$           
---------- -------- ----------- ----------- ----------- ----------- ------------------------------ ------------- ----------- --------------------------- ----------- ---------- ----------------
1          REPhy2wy 1           1           1           2           /dm_home/dmdba/dmdbms/dmreplog -1            0           NULL                        1           NULL       hy到wy的同步复制

used time: 1.339(ms). Execute id is 23.
SQL> select * from SYSREP.RPS_INSTANCES;

LINEID     INST_NAME GRP_ID      INST_ID     FAULT_TIMEOUT VALID_FLAG  FAULT_TIME                  NET_VALID_FLAG NET_FAULT_TIME             
---------- --------- ----------- ----------- ------------- ----------- --------------------------- -------------- ---------------------------
1          hy        1           1           -1            0           NULL                        0              NULL
2          wy        1           2           -1            0           NULL                        0              NULL

used time: 0.574(ms). Execute id is 24.

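The configuration of the replication system can be inspected by querying the system tables on the RPS, where all replication-related information is kept in the SYSREP schema.
Replication is monitored per replication group. Every minute the RPS polls all replication groups; within each group it probes each replication node in turn and asks it to return its own status and the status of the replication relations it is involved in.
Note that the "system abnormal" status of a node refers to its status within the replication environment and is an inferred value. When the RPS cannot get a reply from a node, and every replication relation involving that node is reported as abnormal in the replies of the other nodes, the RPS considers the node's system abnormal.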
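The inference rule just described can be expressed in a few lines. This is a sketch of the rule as worded above, not of the RPS implementation: the function name and the shape of the inputs are hypothetical, and returning False when no other node reports anything about the silent node is an assumption, since the text does not cover that case.

```python
def node_system_abnormal(node, replies, relations):
    """Infer whether `node` should be considered system-abnormal.

    node: instance name being judged.
    replies: {instance: {relation_name: True if healthy}} for every
             instance that answered the poll.
    relations: {relation_name: (master_instance, slave_instance)}.
    """
    if node in replies:
        return False  # the node answered, so nothing needs to be inferred
    involved = {name for name, ends in relations.items() if node in ends}
    reported = [healthy
                for reply in replies.values()
                for name, healthy in reply.items()
                if name in involved]
    # Abnormal only if others reported on the node and every report is bad.
    # Assumption: with no reports at all, the node is not flagged.
    return bool(reported) and not any(reported)


if __name__ == "__main__":
    relations = {"REPhy2wy": ("hy", "wy")}
    print(node_system_abnormal("wy", {"hy": {"REPhy2wy": False}}, relations))
```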
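Any fault of a replication node or replication relation is recorded in the SYSREP.RPS_FAULT_HISTORY table. If the END_TIME column of a record is NULL, the fault has not ended yet, and the administrator should check the state of the replication node and its network connection as soon as possible.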

SQL> select * from SYSREP.RPS_CONFLICTS;
no rows

used time: 1.336(ms). Execute id is 17.

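DM7 provides no dedicated interface for managing the fault history; administrators manage the fault records by querying and deleting rows in the SYSREP.RPS_FAULT_HISTORY table.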
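Because the fault history is an ordinary table, picking out the faults that are still open is a matter of filtering on END_TIME. The sketch below works on rows already fetched from SYSREP.RPS_FAULT_HISTORY in column order (GRP_NAME, OBJ_NAME, FAULT_TYPE, START_TIME, END_TIME); how the rows are fetched is left out, and the names Fault and open_faults are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Fault(NamedTuple):
    grp_name: str
    obj_name: str
    fault_type: str
    start_time: datetime
    end_time: Optional[datetime]


def open_faults(rows, now):
    """Return (fault, age) pairs for faults whose END_TIME is NULL, oldest first."""
    pending = [Fault(*row) for row in rows if row[4] is None]
    pending.sort(key=lambda fault: fault.start_time)
    return [(fault, now - fault.start_time) for fault in pending]


if __name__ == "__main__":
    # With no rows, as in the empty tables queried earlier, nothing is pending.
    print(open_faults([], datetime.now()))
```

Sorting by start time puts the longest-running fault first, which is usually the one to investigate first.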
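The replication environment is now fully configured. The example above is only the simplest possible setup. Replication configuration is flexible: within one replication group a master server can have several slave servers, and a replication node can be a master server and a slave server at the same time. Administrators can configure symmetric, one-to-many, many-to-one, cascading, and circular replication environments as needed.
The replication configuration can be modified during or after setup. Modifications include dropping replication groups, replication relations, and replication objects, and changing the attributes of replication relations. All of them must be made between SP_RPS_SET_BEGIN and SP_RPS_SET_APPLY. A replication group can be dropped only when it is not in the configuration phase, that is, after its configuration has been applied or cancelled.
Add a timer to change the synchronous replication into asynchronous replication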

SQL> SP_RPS_ADD_TIMER('hy2wy_TIMER','从hy到wy定时复制',1,0,0,0,'23:18:33',NULL,'2019-12-18 23:18:33',NULL,1);
SP_RPS_ADD_TIMER('hy2wy_TIMER','从hy到wy定时复制',1,0,0,0,'23:18:33',NULL,'2019-12-18 23:18:33',NULL,1);
[-8702]:group not set begin.
used time: 33.874(ms). Execute id is 0.

The error says that setup has not been begun for the replication group, so no change can be made

SQL> SP_RPS_SET_BEGIN('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 0.481(ms). Execute id is 26.

Set the start time to 2019-12-18 23:38:33

SQL> SP_RPS_ADD_TIMER('hy2wy_TIMER','从hy到wy定时复制',1,0,0,0,'23:38:33',NULL,'2019-12-18 23:38:33',NULL,1);
DMSQL executed successfully
used time: 13.217(ms). Execute id is 29.
SQL> SP_RPS_REP_RESET_TIMER('REPhy2wy','hy2wy_TIMER');
DMSQL executed successfully
used time: 0.790(ms). Execute id is 30.
SQL> SP_RPS_SET_APPLY();
DMSQL executed successfully
used time: 24.494(ms). Execute id is 31.

Now insert data on the master database

SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy

used time: 2.296(ms). Execute id is 4.
SQL> insert into t1 values(2,'hy2');
affect rows 1

used time: 1.519(ms). Execute id is 5.
SQL> commit;
executed successfully
used time: 1.430(ms). Execute id is 6.
SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy
2          2  hy2

used time: 0.512(ms). Execute id is 8.

Query the rows on the slave database

SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy

used time: 0.337(ms). Execute id is 8.
SQL> select sysdate;

LINEID     SYSDATE                    
---------- ---------------------------
1          2019-12-18 23:38:37.570259

used time: 0.263(ms). Execute id is 9.

At 2019-12-18 23:38:37.570259 the table data had not yet been synchronized to the slave database

SQL> select sysdate;

LINEID     SYSDATE                    
---------- ---------------------------
1          2019-12-18 23:39:05.629576

used time: 0.432(ms). Execute id is 12.
SQL> select * from t1;

LINEID     ID USERNAME
---------- -- --------
1          1  hy
2          2  hy2

used time: 0.458(ms). Execute id is 13.

At 2019-12-18 23:39:05.629576 the table data had been synchronized to the slave database

Drop the replication mapping

SQL> SP_RPS_DROP_TAB_MAP('REPhy2wy', 'HY', 'T1', 'WY', 'T1');
DMSQL executed successfully
used time: 1.485(ms). Execute id is 36.

Change the fault timeout of the replication relation; the timeout is in seconds

SQL> SP_RPS_SET_ROUTE_FAULT_TIMEOUT('REPhy2wy', 60);
DMSQL executed successfully
used time: 0.791(ms). Execute id is 37.

Drop the replication relation

SQL> SP_RPS_DROP_REPLICATION('REPhy2wy');
DMSQL executed successfully
used time: 1.807(ms). Execute id is 38. 

Drop the whole replication group

SQL> SP_RPS_DROP_GROUP('REP_GRP_hy2wy');
DMSQL executed successfully
used time: 1.807(ms). Execute id is 39. 

Drop the user and system tables used by data replication

SQL> SP_INIT_REP_SYS(0);
DMSQL executed successfully
used time: 160.803(ms). Execute id is 51.
SQL> select * from SYSREP.RPS_INSTANCES;
select * from SYSREP.RPS_INSTANCES;
[-2103]:Error in line: 1
Invalid schema name [SYSREP].
used time: 0.735(ms). Execute id is 0.

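In addition, to abandon the changes while configuring or modifying the configuration, end the setup with the following system procedure.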

SQL> SP_RPS_SET_CANCEL();
DMSQL executed successfully
used time: 0.749(ms). Execute id is 40.

Oracle 11g RAC: ORA-01105 and ORA-01606 Raised When Starting an Instance on One Node

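In a production environment (Oracle Linux 6.4, RAC 11.2.0.4), the grid cluster could not report its status through the usual commands today because the OCR disk group was inaccessible. After the OCR disk group was mounted manually, restarting node 2 raised ORA-01105 and ORA-01606. The details are as follows: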

SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01606: parameter not identical to that of another mounted instance

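Look up the detailed error descriptions. They indicate that the failure is caused by some parameter being set differently on instance 2 and instance 1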

[oracle@db2 dbs]$ oerr ora 1105
01105, 00000, "mount is incompatible with mounts by other instances"
// *Cause:  An attempt to mount the database discovered that another instance
//         mounted a database by the same name, but the mount is not
//         compatible. Additional errors are reported explaining why.
// *Action: See accompanying errors.
[oracle@db2 dbs]$ oerr ora 1606
01606, 00000, "parameter not identical to that of another mounted instance"
// *Cause:  A parameter was different on two instances.
// *Action: Modify the initialization parameter and restart.

Create a text-format pfile from the spfile

SQL> create pfile='rlcs.ora' from spfile;

File created.

[oracle@db2 dbs]$ cat rlcs.ora

*._serial_direct_read='AUTO'
*._swrf_mmon_flush=TRUE
*._use_adaptive_log_file_sync='FALSE'
*.audit_file_dest='/u01/app/oracle/admin/RL/adump'
*.audit_trail='NONE'
*.cluster_database=true
*.compatible='11.2.0.4.0'
*.control_files='+DATA/rl/controlfile/current.260.926786537','+ARCH/rl/controlfile/current.256.926786537'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_name_convert='+data/rl/','+data/rldg/'
*.db_name='RL'
*.db_recovery_file_dest='+ARCH'
*.db_recovery_file_dest_size=10737418240
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=RLZYXDB)'
*.fal_server='yb_st'
RL1.instance_number=1
RL2.instance_number=2
*.listener_networks=''
*.log_archive_config='dg_config=(rl,rldg)'
*.log_archive_dest_1='location=+ARCH  valid_for=(all_logfiles,all_roles) db_unique_name=rl'
*.log_archive_dest_2='service=yb_st valid_for=(online_logfiles,primary_role) db_unique_name=rldg'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_format='%t_%s_%r.arch'
*.log_file_name_convert='+data/rl/','+data/rldg/','+arch/rl/','+arch/rldg/'
*.open_cursors=300
*.parallel_adaptive_multi_user=TRUE
*.parallel_degree_policy='AUTO'
*.parallel_force_local=FALSE
*.pga_aggregate_target=21474836480
*.processes=2000
*.remote_listener='scan-ip:1521'
*.remote_login_passwordfile='exclusive'
*.service_names='rl'
*.sessions=2205
*.sga_max_size=53687091200
*.sga_target=53687091200
*.standby_file_management='manual'
RLZY2.thread=2
RLZY1.thread=1
*.undo_retention=7200
RLZY2.undo_tablespace='UNDOTBS2'
RLZY1.undo_tablespace='UNDOTBS1'
*.utl_file_dir='/rmanbak/utl'
Nothing in the text parameter file shows a parameter that is inconsistent between the two instances.
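
In a pfile, entries prefixed with `*.` apply to every instance, while entries prefixed with an instance name (SID) apply to that instance only. The instance-specific lines are therefore the first place to look for a mismatch. The shell sketch below lists only those lines. `sid_specific_params` is a hypothetical helper name, and the file name in the usage comment is the pfile created above.

```shell
#!/bin/sh
# sid_specific_params PFILE
# Hypothetical helper: print the entries of a text pfile that are NOT
# prefixed with "*." (blank lines and # comments are skipped), i.e. the
# settings that apply to a single instance only.
# Note: grep exits with status 1 when no such entry exists.
sid_specific_params() {
    grep -Ev '^[[:space:]]*(\*\.|#|$)' "$1"
}

# Example: sid_specific_params rlcs.ora
```

For the pfile shown above this lists only the instance_number, thread and undo_tablespace entries, which are expected to differ between instances.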

Run the following statement to view the gc-related parameters of instance 2
SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                FALSE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS proces 64
                                    s

_gcs_pkey_history                   number of pkey remastering history                                 4000

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

View the gc-related parameters of instance 1

SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                TRUE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS proces 64
                                    s

_gcs_pkey_history                   number of pkey remastering history                                 4000

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

Comparing the two listings shows that _gc_read_mostly_locking is TRUE on instance 1 and FALSE on instance 2
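
Comparing two 64-row listings by eye is error-prone. Assuming each listing has been saved to a text file (for example with the SQL*Plus `spool` command), a sketch like the following can pick out the differing rows. `diff_params` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# diff_params LISTING_A LISTING_B
# Hypothetical helper: compare two saved parameter listings and print the
# hidden parameters (names starting with "_") whose lines differ.
diff_params() {
    awk '
        # Skip headers, separator rules, wrapped text and blank lines.
        $1 !~ /^_/ { next }
        # Collapse runs of whitespace so column padding does not matter.
        { gsub(/[ \t\r]+/, " "); sub(/^ /, ""); sub(/ $/, "") }
        # First file: remember each line by parameter name.
        FILENAME == ARGV[1] { first[$1] = $0; next }
        # Second file: report names whose line differs from the first file.
        ($1 in first) && first[$1] != $0 {
            print "< " first[$1]
            print "> " $0
        }
    ' "$1" "$2"
}

# Example: diff_params inst1_gc.lst inst2_gc.lst
```

Lines marked `<` come from the first file and lines marked `>` from the second. Parameters present in only one of the files are not reported by this sketch.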

Set the _gc_read_mostly_locking parameter to true on all instances

SQL> alter system set "_gc_read_mostly_locking"=true scope=spfile sid='*';

System altered.

Instance 2 now restarts successfully

SQL> shutdown immediate
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
Database mounted.
Database opened.

The root cause was that someone had previously changed the hidden parameter _gc_read_mostly_locking, but only on instance 1.

mysqld --skip-grant-tables

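The --skip-grant-tables option of mysqld
This option starts the server without using the privilege system, which gives anyone who can reach the server unrestricted access to all databases. After the server has been started without the grant tables, you can make the running server use them again by running mysqladmin flush-privileges or mysqladmin reload from a shell, or by executing the FLUSH PRIVILEGES statement after connecting to the server.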

Start the server with the --skip-grant-tables option

[root@localhost mysql]# service mysqld stop
Shutting down MySQL. SUCCESS! 


[root@localhost mysql]# service mysqld start --skip-grant-tables
Starting MySQL.. SUCCESS! 

You can now log in to the server without a user name or password

[mysql@localhost ~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Now run the mysqladmin flush-privileges command to make the running server use the grant tables again

[mysql@localhost ~]$ mysqladmin  flush-privileges

Logging in without a user name and password no longer works; credentials are now required

[mysql@localhost ~]$ mysql
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)


[mysql@localhost ~]$ mysql -uroot -pabcd mysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Start the server with the --skip-grant-tables option again

[root@localhost mysql]# service mysqld stop
Shutting down MySQL.. SUCCESS! 

[root@localhost mysql]# service mysqld start --skip-grant-tables
Starting MySQL.. SUCCESS! 

You can now log in to the server without a user name or password

[mysql@localhost ~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Now run the mysqladmin reload command to make the running server use the grant tables again

[mysql@localhost ~]$ mysqladmin reload

Logging in without a user name and password no longer works; credentials are now required

[mysql@localhost ~]$ mysql
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
[mysql@localhost ~]$ mysql -uroot -pabcd mysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Start the server with the --skip-grant-tables option again

[root@localhost mysql]# service mysqld stop
Shutting down MySQL. SUCCESS! 


[root@localhost mysql]# service mysqld start --skip-grant-tables
Starting MySQL.. SUCCESS! 

You can now log in to the server without a user name or password

[mysql@localhost ~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Now execute the FLUSH PRIVILEGES statement to make the running server use the grant tables again

mysql> flush privileges;
Query OK, 0 rows affected (0.12 sec)

Logging in without a user name and password no longer works; credentials are now required

[mysql@localhost ~]$ mysql
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
[mysql@localhost ~]$ mysql -uroot -pabcd mysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

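The --skip-grant-tables option can also be set in the option file my.cnf. This option also causes the server, during its startup sequence, to skip loading user-defined functions (UDFs), scheduled events, and plugins that were installed with the INSTALL PLUGIN statement. To load plugins anyway, use the --plugin-load option. The --skip-grant-tables option also causes the disabled_storage_engines system variable to have no effect.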
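
Because the option disables the privilege system on every start for as long as it stays in the option file, it is worth confirming that it has been removed once the maintenance is finished. The following shell sketch checks a single option file. `has_skip_grant_tables` is a hypothetical helper, and the path in the usage comment is only an assumed location for my.cnf.

```shell
#!/bin/sh
# has_skip_grant_tables FILE
# Hypothetical helper: succeed (exit status 0) if FILE contains an active
# skip-grant-tables line (dashes and underscores are both accepted).
# Commented-out lines (# or ;) do not match because the pattern is anchored
# at the start of the line. The check does not follow !include directives,
# does not look at which [group] the line is in, and does not interpret an
# explicit value such as "=0".
has_skip_grant_tables() {
    grep -Eq '^[[:space:]]*skip[-_]grant[-_]tables[[:space:]]*(=.*)?$' "$1"
}

# Example (assumed path):
# has_skip_grant_tables /etc/my.cnf && echo "skip-grant-tables is still set"
```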

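FLUSH PRIVILEGES may also be executed implicitly by other actions performed after server startup. For example, the mysql_upgrade program flushes the privileges during the upgrade procedure.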