mysql数据同步到elasticsearch

96
张维斌
0.5 2019.05.05 08:28 字数 556

环境

  • centos7.2
  • php5.5.7
  • mysql5.7

安装elasticsearch

下载地址:https://www.elastic.co/downloads/elasticsearch

image

安装2.4.6

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.4.6/elasticsearch-2.4.6.rpm

yum -y install elasticsearch-2.4.6.rpm


环境:

  • ubuntu 18.0.4
  • php7.2.13
  • mysql5.7
  • java1.8+
  • go1.9+

安装elasticsearch

下载地址:https://www.elastic.co/downloads/elasticsearch

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.0.deb

gdebi elasticsearch-6.5.0.deb

安装完成后,配置elasticsearch

vim /etc/elasticsearch/elasticsearch.yml

//配置内容
.
.
.
cluster.name: zpdx-search    //集群名称
.
.
.
node.name: zpdx-1      //节点名称
.
.
.
path.data: /var/lib/elasticsearch    //数据路径
.
.
.
path.log: /var/log/elasticsearch      //日志路径
.
.
.
network.host:  127.0.0.1      //主机地址,默认
.
.
.
http:port:  9200      //端口,默认

开启elasticsearch

systemctl start elasticsearch.service

elasticsearch设置开机自启动

sudo /bin/systemctl daemon-reload

sudo /bin/systemctl enable elasticsearch.service

安装mvn

sudo apt update

sudo apt install maven 

安装 ik

github下载地址: elasticsearch-analysis-ik

git clone https://github.com/medcl/elasticsearch-analysis-ik.git

cd lasticsearch-analysis-ik

git checkout tags/1.10.6

mvn clean

mvn compile

mvn package

./elasticsearch-analysis-ik/target/releases目录会下载了一个对应版本的插件包elasticsearch-analysis-ik-1.10.6.zip

将插件包复制到elaticsearch的插件目录下

cp elasticsearch-analysis-ik-1.10.6.zip /usr/share/elasticsearch/plugins/

cd /usr/share/elasticsearch/plugins/

unzip elasticsearch-analysis-ik-1.10.6.zip

mkdir ik

mv ./* ik

重启elasticsearch

systemctl restart elasticsearch.service

测试,使用curl 'http://127.0.0.1:9200',或者用浏览器访问http://127.0.0.1:9200```,显示

{
  "name" : "zpdx-1",
  "cluster_name" : "zpdxshop-search",
  "cluster_uuid" : "g1wO4CR8TGOwfrDHY-vcGw",
  "version" : {
    "number" : "6.5.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "816e6f6",
    "build_date" : "2018-11-09T18:58:36.352602Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

elasticsearch安装完成并成功启动

安装go

sudo apt update

sudo apt install golang

查看go 版本

go version

查看go 环境

go env

设置go环境,vim /etc/profile

//将go环境路径设置到/usr/local/go方便管理
//在最后添加
.
.
.
#go环境
export GOPATH=/usr/local/go

保存并退出,使/etc/profile文件生效

source /etc/profile

安装go-mysql-elasticsearch

go-mysql-elasticsearch的基本原理是:如果是第一次启动该程序,首先使用mysqldump工具对源mysql数据库进行一次全量同步,通过elasticsearch client执行操作写入数据到ES;然后实现了一个mysql client,作为slave连接到源mysql,源mysql作为master会将所有数据的更新操作通过binlog event同步给slave, 通过解析binlog event就可以获取到数据的更新内容,之后写入到ES.

使用限制

1. mysql binlog必须是ROW模式

2. 要同步的mysql数据表必须包含主键,否则直接忽略,这是因为如果数据表没有主键,UPDATE和DELETE操作就会因为在ES中找不到对应的document而无法进行同步

3. 不支持程序运行过程中修改表结构

4. 要赋予用于连接mysql的账户RELOAD权限以及REPLICATION权限, SUPER权限:
       GRANT REPLICATION SLAVE ON *.* TO 'admin'@'127.0.0.1';
       GRANT RELOAD ON *.* TO 'admin'@'27.0.0.1';
       UPDATE mysql.user SET Super_Priv='Y' WHERE user='admin' AND host='127.0.0.1';

查看mysqlbinlogserver_id,需要设置

//进入mysql

mysql> show global variables like '%binlog_format%'; 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.02 sec)

mysql> show variables like '%server_id%';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| server_id      | 100   |
| server_id_bits | 32    |
+----------------+-------+
2 rows in set (0.01 sec)

mysql> 

设置mysql,编辑/etc/mysql/mysql.conf.d/mysqld.cnf

vim /etc/mysql/mysql.conf.d/mysqld.cnf

.
.
.
[mysql]
log-bin=mysql-bin
binlog_format="ROW"
server-id=100

重启mysql

systemctl restart mysql

安装go-mysql-elasticsearch

github下载地址:go-mysql-elasticsearch

go get github.com/siddontang/go-mysql-elasticsearch

cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch

make

go-elasticsearch不能自动创建索引index,需手动创建

vim create_index.json

//添加内容
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "content": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_smart",     
          "search_analyzer": "ik_smart"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}

curl执行

chase@chase-MACH-WX9:~$ curl -XPUT 'localhost:9200/test?pretty' -H 'Content-Type:application/json' -d'@create_index.json'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test"
}
chase@chase-MACH-WX9:~$ 

创建成功,查看索引

chase@chase-MACH-WX9:~$ curl localhost:9200/test?pretty
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "content" : {
        "properties" : {
          "content" : {
            "type" : "text",
            "analyzer" : "ik_smart"
          },
          "title" : {
            "type" : "text",
            "analyzer" : "ik_smart"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1548855659726",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "uuid" : "F3ze_A2gQo6mE88aEQzKNA",
        "version" : {
          "created" : "6050099"
        },
        "provided_name" : "test"
      }
    }
  }
}
chase@chase-MACH-WX9:~$ 

配置go-mysql-elasticsearch,编辑$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml

# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "admin"
my_pass = "123456"
my_charset = "utf8"

# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "127.0.0.1:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""

# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing. 
# TODO: support other storage, like etcd. 
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave 
server_id = 100

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"

# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false

# minimal items to be inserted in one bulk
bulk_size = 128

# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"

# Ignore table without primary key
skip_no_pk_table = false

# MySQL data source
[[source]]
schema = "test"

# Only below tables will be synced into Elasticsearch.
# "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["node"]

# Below is for special rule mapping

# Very simple example
# 
# desc t;
# +-------+--------------+------+-----+---------+-------+
# | Field | Type         | Null | Key | Default | Extra |
# +-------+--------------+------+-----+---------+-------+
# | id    | int(11)      | NO   | PRI | NULL    |       |
# | name  | varchar(256) | YES  |     | NULL    |       |
# +-------+--------------+------+-----+---------+-------+
# 
# The table `t` will be synced to ES index `test` and type `t`.
[[rule]]
schema = "test"
table = "node"
index = "test"
type = "content"

启动go-mysql-elasticsearch

cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch

./bin/go-mysql-elasticsearch -config=./etc/river.toml

同步数据信息

[2019/01/30 21:48:31] [info] binlogsyncer.go:111 create BinlogSyncer with config {100 mysql 127.0.0.1 3306 admin   utf8 false false <nil> false false 0 0s 0s 0}
[2019/01/30 21:48:31] [info] dump.go:164 skip dump, use last binlog replication pos (mysql-bin.000001, 790) or GTID set %!s(<nil>)
[2019/01/30 21:48:31] [info] status.go:53 run status http server 127.0.0.1:12800
[2019/01/30 21:48:31] [info] binlogsyncer.go:323 begin to sync binlog from position (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:172 register slave for master server 127.0.0.1:3306
[2019/01/30 21:48:31] [info] sync.go:31 start sync binlog at binlog file (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:692 rotate to (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:692 rotate to (mysql-bin.000002, 4)
[2019/01/30 21:48:31] [info] sync.go:73 rotate binlog to (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] master.go:54 save position (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] sync.go:73 rotate binlog to (mysql-bin.000002, 4)
[2019/01/30 21:48:31] [info] master.go:54 save position (mysql-bin.000002, 4)

自此,数据全量导入到elasticsearch

设置go-mysql-elasticsearch开机自启动

ubuntu安装sysv-rc-conf

sudo apt update

sudo apt install sysv-rc-conf

如果报错:

sudo apt-get install sysv-rc-conf
正在读取软件包列表... 完成
正在分析软件包的依赖关系树       
正在读取状态信息... 完成       
E: 无法定位软件包 sysv-rc-conf

解决方法:添加镜像源

vim /etc/apt/sources.list

//添加如下内容
deb http://archive.ubuntu.com/ubuntu/ trusty main universe restricted multiverse

sudo apt update

sudo apt install sysv-rc-con

创建go-mysql-elasticsearch脚本

vim /etc/init.d/go-mysql-elasticsearch

//添加内容如下:
#!/bin/bash
#go-mysql-elasticsearch start
$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/bin/go-mysql-elasticsearch -config=$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml

执行

sysv-rc-conf go-mysql-elasticsearch on

service go-mysql-elasticsearch start

centos安装chkconfig

yum -y install chkconfig

安装elasticsearch-php

composer require elasticsearch/elasticsearch

elasticsearch官方文档

elasticsearch-php官方文档

Mysql与Elasticsearch数据实时同步

elasticsearch之实例篇

阮一峰 -- 全文搜索引擎 Elasticsearch 入门教程

随笔
Gupao