记一次MySQL死锁排查

MySQL

一. 环境

1. 表结构

涉及到以下两张表:

  • BIND_RECORD
CREATE TABLE `BIND_RECORD` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `bind_uid` bigint(20) NOT NULL COMMENT '被绑uid',
  `bind_by_uid` bigint(20) NOT NULL COMMENT '渠道uid',
  `bind_type` int(2) NOT NULL COMMENT '绑定规则类型,1-长期,2-连续订阅,3-短期',
  `bind_rule_ref` bigint(20) NOT NULL COMMENT '绑定规则id',
  `start_time` datetime NOT NULL COMMENT '有效起始时间',
  `end_time` datetime DEFAULT NULL COMMENT '有效终止时间',
  `extra` varchar(1024) DEFAULT NULL COMMENT '额外信息,json字段',
  `created_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `last_updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_bind_uid_type` (`bind_uid`,`bind_type`),
  KEY `idx_bind_uid` (`bind_uid`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8mb4 COMMENT='绑定记录表';
  • BIND_RECORD_HISTORY
CREATE TABLE `BIND_RECORD_HISTORY` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `bind_uid` bigint(20) NOT NULL COMMENT '渠道uid',
  `bind_by_uid` bigint(20) NOT NULL COMMENT '被绑定方uid',
  `bind_type` int(2) NOT NULL COMMENT '绑定规则类型,1-长期,2-连续订阅,3-短期',
  `bind_rule_ref` bigint(20) NOT NULL COMMENT '绑定规则id',
  `start_time` datetime NOT NULL COMMENT '有效起始时间',
  `end_time` datetime DEFAULT NULL COMMENT '有效终止时间',
  `created_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `last_updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间',
  `extra` varchar(1024) DEFAULT NULL COMMENT '额外信息,json字段',
  PRIMARY KEY (`id`),
  KEY `idx_bind_uid` (`bind_uid`)
) ENGINE=InnoDB AUTO_INCREMENT=245 DEFAULT CHARSET=utf8mb4 COMMENT='绑定记录历史表,删除的绑定规则会移到历史表中';

表结构如上,使用InnoDB

插入一条测试数据:

INSERT INTO `xmds_bind`.`BIND_RECORD`(`id`, `bind_uid`, `bind_by_uid`, `bind_type`, `bind_rule_ref`, `start_time`, `end_time`, `extra`, `created_time`, `last_updated_time`) VALUES (244, 57431, 205449, 1, 75, '2018-12-07 12:43:59', '2018-12-01 12:43:59', NULL, '2018-11-29 10:55:30', '2019-02-13 16:35:40');

2. 数据库环境

查询数据的事务隔离级别如下:

mysql> SELECT @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)

REPEATABLE-READ:可重复读

二. 死锁分析

1. 死锁日志

在服务运行中在日志中出现了ERROR信息:

Deadlock found when trying to get lock; try restarting transaction

由信息可以明确的知道是死锁造成的,使用下面的命令查看MySQL的日志。

show engine innodb status

找到LATEST DETECTED DEADLOCK 死锁信息如下:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-07 12:44:00 7faa671c7700
*** (1) TRANSACTION:
TRANSACTION 478255595954, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 95407803, OS thread handle 0x7facb920e700, query id 2341515722 172.31.42.155 naliworld updating
UPDATE BIND_RECORD  SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE  id = 244
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595954 lock_mode X locks rec but not gap waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 8; hex 80000000000000f4; asc         ;;
 1: len 6; hex 006f5a414156; asc  oZAAV;;
 2: len 7; hex 1d0000048618fd; asc        ;;
 3: len 8; hex 800000000000e057; asc        W;;
 4: len 8; hex 8000000000032289; asc       " ;;
 5: len 4; hex 80000003; asc     ;;
 6: len 8; hex 800000000000004b; asc        K;;
 7: len 5; hex 99a18ecafb; asc      ;;
 8: len 5; hex 99a194cafb; asc      ;;
 9: SQL NULL;
 10: len 5; hex 99a18ecaf6; asc      ;;
 11: len 4; hex 5c09fa8f; asc \   ;;

*** (2) TRANSACTION:
TRANSACTION 478255595950, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1184, 96 row lock(s)
MySQL thread id 95335471, OS thread handle 0x7faa671c7700, query id 2341515729 172.31.42.155 naliworld updating
DELETE FROM BIND_RECORD  WHERE       (  end_time is not null
            
        
          
              and end_time < '2018-12-07 12:44:00' )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595950 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 23 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 8; hex 8000000000000027; asc        ';;
 1: len 6; hex 006f53755353; asc  oSuSS;;
 2: len 7; hex 5900000542059a; asc Y   B  ;;
 3: len 8; hex 8000000000032083; asc         ;;
 4: len 8; hex 80000000000318fe; asc         ;;
 5: len 4; hex 80000001; asc     ;;
 6: len 8; hex 8000000000000028; asc        (;;
 7: len 5; hex 99a15b1de1; asc   [  ;;
 8: SQL NULL;
 9: len 2; hex 7b7d; asc {};;
 10: len 5; hex 99a15b1de1; asc   [  ;;
 11: len 4; hex 5beaa899; asc [   ;;

......中间省略了和上面几乎一样的一堆信息,因为是批量删除

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595950 lock_mode X waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 8; hex 80000000000000f4; asc         ;;
 1: len 6; hex 006f5a414156; asc  oZAAV;;
 2: len 7; hex 1d0000048618fd; asc        ;;
 3: len 8; hex 800000000000e057; asc        W;;
 4: len 8; hex 8000000000032289; asc       " ;;
 5: len 4; hex 80000003; asc     ;;
 6: len 8; hex 800000000000004b; asc        K;;
 7: len 5; hex 99a18ecafb; asc      ;;
 8: len 5; hex 99a194cafb; asc      ;;
 9: SQL NULL;
 10: len 5; hex 99a18ecaf6; asc      ;;
 11: len 4; hex 5c09fa8f; asc \   ;;

*** WE ROLL BACK TRANSACTION (1)

分析一下上面的日志:

  • 事务1(TRANSACTION),执行的语句为UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244 ;等待X锁(WAITING FOR THIS LOCK TO BE GRANTED:lock_mode X locks rec but not gap waiting),其中 but not gap waiting表示非间隙锁。
  • 事务2,执行的语句为DELETE FROM BIND_RECORD WHERE ( end_time is not nulland end_time < '2018-12-07 12:44:00' ) ;持有S锁(HOLDS THE LOCK(S)中所描述的lock mode S),等待X锁(WAITING FOR THIS LOCK TO BE GRANTED:lock_mode X waiting)

通过上面的描述,可以大概分析出死锁产生的原因

  • 事务2当前持有了lock mode S(为什么会持有S锁还要看事务2中具体执行了哪些SQL)
  • 事务1执行UPDATE操作,UPDATE操作需要X锁(lock_mode X locks rec but not gap,和 S锁冲突),当前锁不能获取,因为它和事务2中的持有S锁冲突,所以事务1的锁请求被加入一个锁请求队列,事务1被阻塞,等待事务2释放S锁
  • 事务2执行DELETE操作,DELETE操作需要X锁(lock_mode X),所以现在情况就是,事务2等待X锁并持有S锁,但是由于事务1已经请求了一个X锁,所以不能获取到,而事务1又在等待事务2释放S锁,这样一来就出现了死锁。

2. 如何知道SQL的锁信息

上面提到了事务2首先持有了S锁,这个S锁是怎么持有的,怎么知道每个SQL对应的锁的信息呢。

模拟前先设置数据库事务隔离级别,关闭自动提交:

SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;  //隔离级别设置为REPEATABLE READ
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; //保险起见GLOBAL和SESSION都设置一下
show variables like 'autocommit'; //查看是否开启了自动提交
set autocommit = 0;

在MySQL官网上,找到了如何打开InnoDB锁监控,具体参考https://dev.mysql.com/doc/refman/8.0/en/innodb-enabling-monitors.html

SET GLOBAL innodb_status_output=ON; 
SET GLOBAL innodb_status_output_locks=ON;

使用上面提到的语句来试一试:

在MySQL的终端依次输入下面的语句:

mysql>start transaction;

mysql>UPDATE BIND_RECORD  SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE  id = 244;

mysql>SHOW ENGINE INNODB STATUS;

mysql>commit;

在输入 SHOW ENGINE INNODB STATUS; 会打印出当前事务的锁信息如下:

------------
TRANSACTIONS
------------
Trx id counter 144145
Purge done for trx's n:o < 144142 undo n:o < 0 state: running but idle
History list length 3
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479443287616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443286712, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443285808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 144144, ACTIVE 4 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 6, OS thread handle 123145535860736, query id 106 localhost root starting
SHOW ENGINE INNODB STATUS
TABLE LOCK table `xmds_bind`.`bind_record` trx id 144144 lock mode IX
RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of table `xmds_bind`.`bind_record` trx id 144144 lock_mode X locks rec but not gap
Record lock, heap no 225 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 8; hex 80000000000000f4; asc         ;;
 1: len 6; hex 00000002330c; asc     3 ;;
 2: len 7; hex 2a000001e70693; asc *      ;;
 3: len 8; hex 800000000000e057; asc        W;;
 4: len 8; hex 8000000000032289; asc       " ;;
 5: len 4; hex 80000003; asc     ;;
 6: len 8; hex 800000000000004b; asc        K;;
 7: len 5; hex 99a18ecafb; asc      ;;
 8: len 5; hex 99a194cafb; asc      ;;
 9: SQL NULL;
 10: len 5; hex 99a17aadde; asc   z  ;;
 11: len 4; hex 5c63d6dc; asc \c  ;;

从输出中 RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of tablexmds_bind.bind_recordtrx id 144144 lock_mode X locks rec but not gap 可以看出,需要一个X锁(lock_mode X locks rec but not gap)和我们上面的分析相符。

3. 本地模拟死锁

通过上面的方式,再分析代码,发现了事务2中有一条语句会持有S锁

insert into BIND_RECORD_HISTORY(id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra) select id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra from BIND_RECORD where end_time > '2018-12-09';

持有锁信息如下:

------------
TRANSACTIONS
------------
Trx id counter 144148
Purge done for trx's n:o < 144147 undo n:o < 0 state: running but idle
History list length 4
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479443287616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443286712, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443285808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 144147, ACTIVE 6 sec
91 lock struct(s), heap size 24784, 19451 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 123145535860736, query id 169 localhost root starting
SHOW ENGINE INNODB STATUS
TABLE LOCK table `xmds_bind`.`bind_record` trx id 144147 lock mode IS
RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of table `xmds_bind`.`bind_record` trx id 144147 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

下面来我们打开两个MySQL终端来模拟下,是否会出现我们预期中的死锁:

事务1 事务2
start transaction;
insert into BIND_RECORD_HISTORY(id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra) select id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra from BIND_RECORD where end_time < '2018-12-07 12:44:00';
start transaction;
UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244;
DELETE FROM BIND_RECORD WHERE ( end_time is not null and end_time < '2018-12-07 12:44:00' );

按照上面的执行顺序,结果如下,报出了死锁错误:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
LOCK

在知道了死锁原因之后要修改就比较简单了。

三. 官网上的一个例子

最后在官网上找到了类似的一个死锁例子,地址如下:

https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlock-example.html

推荐阅读更多精彩内容