MySQL5.7 分区如何处理NULL

MySQL分区如何处理NULL
MySQL中的分区并没有禁止NULL作为分区表达式的值,无论是列值还是用户提供的表达式的值。尽管允许使用NULL作为表达式的值,但要记住 NULL不是数字,这一点很重要。MySQL的分区实现将NULL视为小于任何非空值,就像ORDER BY那样。

这意味着不同类型的分区对NULL的处理方式不同,如果你没有做好准备,可能会产生意想不到的行为。在这种情况下,本节将讨论MySQL分区类 型如何处理NULL值,并分别提供示例。

范围分区处理NULL。
如果你将一行插入到一个按范围分区的表中,以便用于确定分区的列值为NULL,那么这一行将插入到最低的分区中。考虑数据库test中的这两张表,创建方法如下:

mysql> CREATE TABLE t1 (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY RANGE(c1) (
    -> PARTITION p0 VALUES LESS THAN (0),
    -> PARTITION p1 VALUES LESS THAN (10),
    -> PARTITION p2 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TABLE t2 (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY RANGE(c1) (
    -> PARTITION p0 VALUES LESS THAN (-5),
    -> PARTITION p1 VALUES LESS THAN (0),
    -> PARTITION p2 VALUES LESS THAN (10),
    -> PARTITION p3 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.02 sec)

在INFORMATION_SCHEMA数据库中,用户可以使用以下查询来查看这两个CREATE TABLE语句创建的分区:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| t1         | p0             |          0 |              0 |       16384 |
| t1         | p1             |          0 |              0 |       16384 |
| t1         | p2             |          0 |              0 |       16384 |
| t2         | p0             |          0 |              0 |       16384 |
| t2         | p1             |          0 |              0 |       16384 |
| t2         | p2             |          0 |              0 |       16384 |
| t2         | p3             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.01 sec)

现在让我们在每个表中插入一行包含NULL作为分区键的列,并使用一对SELECT语句验证这些行是否被插入:

mysql> INSERT INTO t1 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+------+--------+
| c1   | c2     |
+------+--------+
| NULL | mothra |
+------+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t2;
+------+--------+
| c1   | c2     |
+------+--------+
| NULL | mothra |
+------+--------+
1 row in set (0.00 sec)

通过再次对INFORMATION_SCHEMA.PARTITIONS表运行前面的查询,您可以看到哪些分区用于存储插入的行:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| t1         | p0             |          1 |          16384 |       16384 |
| t1         | p1             |          0 |              0 |       16384 |
| t1         | p2             |          0 |              0 |       16384 |
| t2         | p0             |          1 |          16384 |       16384 |
| t2         | p1             |          0 |              0 |       16384 |
| t2         | p2             |          0 |              0 |       16384 |
| t2         | p3             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.00 sec)

用户还可以删除这些分区,然后重新运行SELECT语句,证明这些行存储在每个表的最低分区中:

mysql> ALTER TABLE t1 DROP PARTITION p0;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t2 DROP PARTITION p0;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t1;
Empty set (0.00 sec)

mysql> SELECT * FROM t2;
Empty set (0.00 sec)

对于使用SQL函数的分区表达式,NULL也是这样处理的。假设我们使用CREATE TABLE语句定义一个表,如下所示:

mysql> CREATE TABLE tndate (
    -> id INT,
    -> dt DATE
    -> )
    -> PARTITION BY RANGE( YEAR(dt) ) (
    -> PARTITION p0 VALUES LESS THAN (1990),
    -> PARTITION p1 VALUES LESS THAN (2000),
    -> PARTITION p2 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.01 sec)

与其他MySQL函数一样,YEAR(NULL)返回NULL。dt列值为NULL的行被视为分区表达式的值小于任何其他值,因此被插入到分区p0中。

列表分区处理NULL。
按列表分区的表,当且仅当其中一个分区是用包含NULL的值列表定义的时,才允许NULL值。与之相反的是,按LIST分区的表如果没有在值列表中 显式使用NULL,则会拒绝在分区表达式中使用NULL值,如下面的例子所示:

mysql> CREATE TABLE ts1 (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    -> PARTITION p0 VALUES IN (0, 3, 6),
    -> PARTITION p1 VALUES IN (1, 4, 7),
    -> PARTITION p2 VALUES IN (2, 5, 8)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO ts1 VALUES (9, 'mothra');
ERROR 1526 (HY000): Table has no partition for value 9
mysql> INSERT INTO ts1 VALUES (NULL, 'mothra');
ERROR 1526 (HY000): Table has no partition for value NULL

只有c1值在0到8之间(包括0和8)的行才能插入到ts1中。NULL超出了这个范围,就像数字9一样。我们可以创建包含NULL的值列表的表ts2和ts3 ,如下所示:

mysql> CREATE TABLE ts2 (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    -> PARTITION p0 VALUES IN (0, 3, 6),
    -> PARTITION p1 VALUES IN (1, 4, 7),
    -> PARTITION p2 VALUES IN (2, 5, 8),
    -> PARTITION p3 VALUES IN (NULL)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TABLE ts3 (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    -> PARTITION p0 VALUES IN (0, 3, 6),
    -> PARTITION p1 VALUES IN (1, 4, 7, NULL),
    -> PARTITION p2 VALUES IN (2, 5, 8)
    -> );
Query OK, 0 rows affected (0.02 sec)

在为分区定义值列表时,你可以(也应该)像对待其他值一样对待NULL。例如,(NULL)和(1,4,7,NULL)中的值都是有效的,(1,NULL, 4, 7)、(NULL, 1, 4, 7)等的值也是有效的。你可以在表ts2和ts3中分别插入列c1为NULL的行:

mysql> INSERT INTO ts2 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO ts3 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

通过对INFORMATION_SCHEMA.PARTITIONS发出适当的查询。你可以确定哪些分区用于存储刚刚插入的行(和前面的例子一样,我们假设分区表是 在test数据库中创建的):

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 'ts_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| ts2        | p0             |          0 |              0 |       16384 |
| ts2        | p1             |          0 |              0 |       16384 |
| ts2        | p2             |          0 |              0 |       16384 |
| ts2        | p3             |          1 |          16384 |       16384 |
| ts3        | p0             |          0 |              0 |       16384 |
| ts3        | p1             |          1 |          16384 |       16384 |
| ts3        | p2             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.01 sec)

如前面所示,您还可以通过删除这些分区,然后执行SELECT来验证哪些分区用于存储行。

哈希和键分区处理NULL。
对于按哈希分区的表,对NULL的处理有些不同。在这些情况下,任何产生NULL值的分区表达式都将被视为其返回值为零。我们可以通过检查创建 按哈希分区的表并使用包含适当值的记录填充表对文件系统的影响来验证这种行为。假设你用下面的语句创建了一个表th(也在test数据库中) :

mysql> CREATE TABLE th (
    -> c1 INT,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY HASH(c1)
    -> PARTITIONS 2;
Query OK, 0 rows affected (0.01 sec)

属于该表的分区可以使用下面的查询来查看:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME='th';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| th         | p0             |          0 |              0 |       16384 |
| th         | p1             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
2 rows in set (0.00 sec)

每个分区的TABLE_ROWS为0。现在插入两行,c1列的值分别为NULL和0,并验证这两行是否被插入,如下所示:

mysql> INSERT INTO th VALUES (NULL, 'mothra'), (0, 'gigan');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM th;
+------+--------+
| c1   | c2     |
+------+--------+
| NULL | mothra |
|    0 | gigan  |
+------+--------+
2 rows in set (0.00 sec)

mysql>

回想一下,对于任何整数N, NULL MOD N的值总是NULL。对于按哈希或键进行分区的表,将该结果作为确定正确分区的0处理。再次查询 INFORMATION_SCHEMA.PARTITIONS表,我们可以看到,两行都插入到了分区p0中:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME='th';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| th         | p0             |          2 |           8192 |       16384 |
| th         | p1             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
2 rows in set (0.00 sec)

键分区处理NULL。
对于按键分区的表,对NULL的处理有些不同。在这些情况下,任何产生NULL值的分区表达式都将被视为其返回值为NULL。我们可以通过检查创建 按键分区的表并使用包含适当值的记录填充表对文件系统的影响来验证这种行为。假设你用下面的语句创建了一个表tk(也在test数据库中):

mysql> CREATE TABLE tk (
    -> c1 INT NOT NULL PRIMARY KEY,
    -> c2 VARCHAR(20)
    -> )
    -> PARTITION BY KEY()
    -> PARTITIONS 2;
Query OK, 0 rows affected (0.02 sec)

属于该表的分区可以使用下面的查询来查看:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  FROM INFORMATION_SCHEMA.PARTITIONS WHERE  TABLE_SCHEMA = 'test' AND TABLE_NAME='tk';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| tk         | p0             |          0 |              0 |       16384 |
| tk         | p1             |          0 |              0 |       16384 |
+------------+----------------+------------+----------------+-------------+
2 rows in set (0.00 sec)

每个分区的TABLE_ROWS为0。现在插入两行,c1列的值分别为NULL和0,并验证这两行是否被插入,如下所示:

mysql> INSERT INTO tk VALUES (NULL, 'mothra'), (0, 'gigan');
ERROR 1048 (23000): Column 'c1' cannot be null