MySQL数据库连接的相关探索

问题的起源是由于发现在公司的很多项目中使用MySQL的时候,都没有关闭数据库连接的操作。这主要是由于一般一个项目就共用一个global的db connection,不需要关闭,因为始终都要用这个connection。那么到底是否真的不需要关闭connection?以及共用一个db connection的实例是否有什么问题?最佳实践究竟是怎样的?这些问题驱使我开始本次探索之旅。

为什么要close connection?

笔者查询了很多资料,没有找到说得特别清晰的文章,大多数是说这是一个好的习惯或者是最佳实践之类,但并没有详细阐释为什么?那么到底不去close connection会带来什么问题呢?

不去close connection会有两种情况:

  1. 程序的进程结束了
  2. 程序的进程始终在运行

进程结束

我们先来看下第一种情况,即程序结束了,mysql会如何处理connection,这种情况又可以分为两种种情况:

  • 程序正常结束
  • 程序非正常结束
    我们通过如下代码来验证一下
# 程序正常结束的情况
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import os
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


def test_not_close_conn_normal_exit():
    print('begin connecting to mysql')
    conn = MySQLdb.Connection(autocommit=True, **db_config)

    # sleep 30s to check the connection
    print('mysql connected. sleeping ...')
    time.sleep(30)

    # exit program without close connection to see
    # if mysql can close the connection on server side
    print('after sleep, now exit program.')
    sys.exit()

if __name__ == '__main__':
    test_not_close_conn_normal_exit()

在运行程序之前,我们看下MySQL client来查询当前的连接情况,如下:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

可以看出此时只有一个连接,即当前mysql client的连接
运行程序,然后再次查询,如下:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 379 | root | localhost       | testorder | Sleep   | 4    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

看到此时多了一个连接,且状态为Sleep,这是由于我们建立连接后没有做任何操作。

sleep之后,程序结束,且没有close connection的动作,此时再次查询,发现如下结果:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

看来在程序退出后,一旦socket连接断开,MySQL会自动关闭connection,回收资源。

那么再来看下,如果程序没有正常退出会怎么样呢?
我们用如下代码来验证下:

# 程序非正常结束的情况
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import os
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


def test_not_close_conn_unnormal_exit():
    print('begin connecting to mysql')
    conn = MySQLdb.Connection(autocommit=True, **db_config)

    # sleep to check the connection
    print('mysql connected. sleeping ...')
    pid = os.getpid()
    print(f'current pid is {pid}, you can kill me now ...')
    time.sleep(2000)

if __name__ == '__main__':
    test_not_close_conn_unnormal_exit()

运行该程序后,我们先查下connection情况,可以看到新增了一个连接。

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 380 | root | localhost       | testorder | Sleep   | 5    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

直接使用kill -9指令kill掉测试进程。
然后再次查询,发现如下结果:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

也就是MySQL依然可以感知到socket连接的断开,并自动关闭connection。

结论:无论客户端进程如何结束,一旦socket连接断开,mysql都会自动关闭连接。

进程未结束

进程未结束,socket连接也没有断开的情况下,connection处于inactive状态,就是我们在上例中看到的Sleep状态的连接。而这种状态是否可以一直持续呢?

答案是,可以持续一段时间直到timeout,连接才会关闭。

那么这个timeout时间是多少,由什么参数决定呢?
简单来说其是由参数interactive_timeoutwait_timeout来决定的,而这两个参数默认都是28800s,也就是8小时。关于这两个参数的说明,可以参考MySQL中interactive_timeout和wait_timeout的区别

结论: 在进程未结束,且socket连接未断开的情况下,connection会保持很长一段时间不会被关闭(默认是8小时)

这种情况下就会出现一种情况,即如果我们有多个程序访问同一个数据库,且都是多进程或者多线程程序,并长时间运行,开了多个connection,都始终没有关闭。这种极端情况会导致数据库有大量的connection连接,而通常MySQL的连接数是有限制的(可以如下查询),如果超出这个限制,就会出现连接的问题,导致我们无法连接数据库。

mysql> show variables like '%max_connections%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| max_connections              | 1000  |
+------------------------------+-------+

在某些情况下,比如我们的程序只有一个线程,且始终共用一个connection实例,并且这个实例始终都在使用,这种情况下即使不去close connection问题也不大,且这样我们可以减少open/close connection的一些开销。当然,我们实际上还可以有更好的选择,那就是下面要介绍的DBUtil工具。

DBUtils

对于Python而言,一个常被提及的开源库DBUtils,可以帮助我们来建立数据库连接,维护连接状态,甚至可以提供连接池功能。下面我们就来了解下这个库。

安装

pip install DBUtils

使用

DBUtil主要提供两个类供我们使用,一个是PersistentDB, 另一个是PooledDB
PersistentDB主要为单线程应用提供一个持久的连接,而PooledDB通常可以为多线程应用提供线程池服务。

PersistentDB

Whenever a thread opens a database connection for the first time, a new connection to the database will be opened that will be used from now on for this specific thread. When the thread closes the database connection, it will still be kept open so that the next time when a connection is requested by the same thread, this already opened connection can be used. The connection will be closed automatically when the thread dies.
In short: PersistentDB tries to recycle database connections to increase the overall database access performance of your threaded application, but it makes sure that connections are never shared between threads.

翻译过来就是,当一个线程新建了一个数据库连接时,只有这个线程可以使用该连接,且当线程close connection时,该connection并没有真正close,而是被PersistentDB回收,下次该线程再次获取connection时,PersistentDB会将之前的connection重新给该线程(注意是同一个线程),这样就减少了开关connection的开销,而当该线程死掉的时候,该connection会被自动释放。正是基于这些特性,我们在使用DBUtils的时候,可以在不需要connection的时候立即关闭,因为即使频繁的新建和关闭connection也是没关系的,由于都是本地的计算,开销极小,基本可以忽略其connection建立和关闭的开销。

因此对于单线程应用来说,使用PersistentDB会是非常好的选择,不仅可以保证线程安全,还可以减少我们open/close的开销,提高性能。

  • 代码示例:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import threading
import os
# 注意这里PersistentDB是在PersistentDB Module当中
from DBUtils.PersistentDB import PersistentDB
from DBUtils.PooledDB import PooledDB
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}

db_persis = PersistentDB(
    # creator即你使用的db driver
    creator=MySQLdb,
    # 如果在支持threading.local的环境下可以使用如下配置方式,性能更好
    threadlocal=threading.local,
    **db_config
)


def test_with_dbutils_persistent_conn():
    print('begin connecting to mysql')
    conn = db_persis.connection()

    print('after get connection, sleep 100s')
    time.sleep(100)

    # 这里close并没有真正关闭数据库的connection
    # 而是被PersistentDB回收
    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()


if __name__ == '__main__':
    test_with_dbutils_persistent_conn()


  • 运行说明

在运行前看下当前的connection连接

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

运行程序,建立连接后(after get connection, sleep 100s),再次查询连接情况

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 419 | root | localhost       | testorder | Sleep   | 3    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

关闭连接后(close function already called, sleep 100s again),可以看到connection仍然存在

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 419 | root | localhost       | testorder | Sleep   | 107  |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

程序退出后,connection被实际关闭

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
  • PersistentDB的说明

通过help(PersistentDB) 可以看到说明

class PersistentDB(builtins.object)
 |  PersistentDB(creator, maxusage=None, setsession=None, failures=None, ping=1, closeable=False, threadlocal=None, *args, **kwargs)
 |
 |  Generator for persistent DB-API 2 connections.
 |
 |  After you have created the connection pool, you can use
 |  connection() to get thread-affine, steady DB-API 2 connections.
 |
 |  Methods defined here:
 |
 |  __init__(self, creator, maxusage=None, setsession=None, failures=None, ping=1, closeable=False, threadlocal=None, *args, **kwargs)
 |      Set up the persistent DB-API 2 connection generator.
 |
 |      creator: either an arbitrary function returning new DB-API 2
 |          connection objects or a DB-API 2 compliant database module
 |      maxusage: maximum number of reuses of a single connection
 |          (number of database operations, 0 or None means unlimited)
 |          Whenever the limit is reached, the connection will be reset.
 |      setsession: optional list of SQL commands that may serve to prepare
 |          the session, e.g. ["set datestyle to ...", "set time zone ..."]
 |      failures: an optional exception class or a tuple of exception classes
 |          for which the connection failover mechanism shall be applied,
 |          if the default (OperationalError, InternalError) is not adequate
 |      ping: determines when the connection should be checked with ping()
 |          (0 = None = never, 1 = default = whenever it is requested,
 |          2 = when a cursor is created, 4 = when a query is executed,
 |          7 = always, and all other bit combinations of these values)
 |      closeable: if this is set to true, then closing connections will
 |          be allowed, but by default this will be silently ignored
 |      threadlocal: an optional class for representing thread-local data
 |          that will be used instead of our Python implementation
 |          (threading.local is faster, but cannot be used in all cases)
 |      args, kwargs: the parameters that shall be passed to the creator
 |          function or the connection constructor of the DB-API 2 module

PooledDB

PooledDB既可以share connection,也可以使用独享的connection,关键是看程序在获取connection的时候参数shareable的设置。默认的情况下该参数为True,即获取的connection可以共享,这种情况下不同线程可以使用同一个connection。

  • 代码示例
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import threading
import os
from DBUtils.PersistentDB import PersistentDB
from DBUtils.PooledDB import PooledDB
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


db_pool = PooledDB(
    creator=MySQLdb,
    mincached=2,
    maxconnections=20,
    **db_config
)


def test_with_pooleddb_conn():
    print('begin connecting to mysql')
    conn = db_pool.connection()

    print('after get connection, sleep 100s')
    time.sleep(100)

    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()


if __name__ == '__main__':
    test_with_pooleddb_conn()

  • 运行说明
    在运行前看下当前的connection连接
mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

运行程序,建立连接后(after get connection, sleep 100s)。由于我们设置了参数mincached=2,所以会有两个connection建立出来

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 420 | root | localhost       | testorder | Sleep   | 6    |          | <null>           |
| 421 | root | localhost       | testorder | Sleep   | 6    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

同样,当我们调用close之后,连接并不会真的关闭,而是被PooledDB回收。同时PooledDB可以通过参数设置是否可以将connection分享给其它线程,这样就可以多个线程共享同一个连接。显然,由于连接池的存在,多线程在连接效率上将大幅提升。

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 420 | root | localhost       | testorder | Sleep   | 112  |          | <null>           |
| 421 | root | localhost       | testorder | Sleep   | 12   |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

程序退出后,connection被实际关闭

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
  • PooldedDB的说明

通过help(PooledDB) 可以看到说明

class PooledDB(builtins.object)
 |  PooledDB(creator, mincached=0, maxcached=0, maxshared=0, maxconnections=0, blocking=False, maxusage=None, setsession=None, reset=True, failures=None, ping=1, *args, **kwargs)
 |
 |  Pool for DB-API 2 connections.
 |
 |  After you have created the connection pool, you can use
 |  connection() to get pooled, steady DB-API 2 connections.
 |
 |  Methods defined here:
 |
 |  __del__(self)
 |      Delete the pool.
 |
 |  __init__(self, creator, mincached=0, maxcached=0, maxshared=0, maxconnections=0, blocking=False, maxusage=None, setsession=None, reset=True, failures=None, ping=1, *args, **kwargs)
 |      Set up the DB-API 2 connection pool.
 |
 |      creator: either an arbitrary function returning new DB-API 2
 |          connection objects or a DB-API 2 compliant database module
 |      mincached: initial number of idle connections in the pool
 |          (0 means no connections are made at startup)
 |      maxcached: maximum number of idle connections in the pool
 |          (0 or None means unlimited pool size)
 |      maxshared: maximum number of shared connections
 |          (0 or None means all connections are dedicated)
 |          When this maximum number is reached, connections are
 |          shared if they have been requested as shareable.
 |      maxconnections: maximum number of connections generally allowed
 |          (0 or None means an arbitrary number of connections)
 |      blocking: determines behavior when exceeding the maximum
 |          (if this is set to true, block and wait until the number of
 |          connections decreases, otherwise an error will be reported)
 |      maxusage: maximum number of reuses of a single connection
 |          (0 or None means unlimited reuse)
 |          When this maximum usage number of the connection is reached,
 |          the connection is automatically reset (closed and reopened).
 |      setsession: optional list of SQL commands that may serve to prepare
 |          the session, e.g. ["set datestyle to ...", "set time zone ..."]
 |      reset: how connections should be reset when returned to the pool
 |          (False or None to rollback transcations started with begin(),
 |          True to always issue a rollback for safety's sake)
 |      failures: an optional exception class or a tuple of exception classes
 |          for which the connection failover mechanism shall be applied,
 |          if the default (OperationalError, InternalError) is not adequate
 |      ping: determines when the connection should be checked with ping()
 |          (0 = None = never, 1 = default = whenever fetched from the pool,
 |          2 = when a cursor is created, 4 = when a query is executed,
 |          7 = always, and all other bit combinations of these values)
 |      args, kwargs: the parameters that shall be passed to the creator
 |          function or the connection constructor of the DB-API 2 module

Transaction

DBUtils对于Transaction有些特别的要求,这里单独拿出来说下。

def test_with_transaction():
    print('begin connecting to mysql')
    conn = db_pool.connection()

    # 必须先调用begin来开启一个事务
    conn.begin()
    with conn.cursor() as cursor:
        cursor.execute("UPDATE migration_info SET status='prepare' WHERE id=4")

    print('execute but not commit')
    time.sleep(100)
    
    # 这里commit之后才会真正提交给数据库
    conn.commit()
    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()

References