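Basic Neo4j Operations

I. Installing Neo4j

  1. Install the JDK

OpenJDK works. Neo4j 4.x requires OpenJDK 11, and Neo4j 3.5 requires OpenJDK 8.
If the default package sources do not provide OpenJDK, add the PPA first.
On older Ubuntu releases (such as 16.04), installing OpenJDK 11 can be awkward; in that case OpenJDK 8 (with Neo4j 3.5) is the easier choice.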

sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
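The version rule in this step can be written down as a small lookup. This is only a sketch: the function names `required_openjdk` and `apt_package` are made up for illustration, and the function covers just the two cases mentioned in this step (Neo4j 3.5 and 4.x); any other version raises an error instead of guessing.

```python
def required_openjdk(neo4j_version: str) -> int:
    """Return the OpenJDK major version for a Neo4j version string.

    Hypothetical helper: only 3.5 -> 8 and 4.x -> 11 are covered.
    """
    parts = neo4j_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 4:
        return 11
    if (major, minor) == (3, 5):
        return 8
    raise ValueError(f"no JDK rule recorded for Neo4j {neo4j_version}")


def apt_package(neo4j_version: str) -> str:
    """Build the matching apt package name, e.g. openjdk-8-jdk."""
    return f"openjdk-{required_openjdk(neo4j_version)}-jdk"


print(apt_package("3.5.14"))  # openjdk-8-jdk
```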

  2. Install Neo4j

wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt-get install neo4j
sudo apt-get install cypher-shell

  3. Start or stop the service

neo4j status
neo4j start
neo4j stop

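Running cypher-shell opens Neo4j's interactive shell. The default username and password are both "neo4j".
Inside the shell, change the password with CALL dbms.changePassword('password');. This procedure belongs to the 3.x line; on 4.x the equivalent is ALTER CURRENT USER SET PASSWORD FROM 'old' TO 'new', run against the system database.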

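  4. Enable remote browser access

By default Neo4j only accepts connections from localhost. To allow remote access, edit /etc/neo4j/neo4j.conf, uncomment the line below, and restart the service. (On 4.x the setting is named dbms.default_listen_address.)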

#dbms.connectors.default_listen_address=0.0.0.0
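If the change has to be scripted, the uncommenting step might look like the sketch below. It works on the text of the file rather than the file itself, so it can be tried safely; `uncomment_setting` is a hypothetical helper, and the `sample` fragment is made up for the demonstration rather than copied from a real neo4j.conf.

```python
def uncomment_setting(conf_text: str, key: str) -> str:
    """Return conf_text with the leading '#' removed from lines that set `key`."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        # Only touch commented-out assignments of exactly this key.
        if stripped.startswith("#") and stripped[1:].lstrip().startswith(key + "="):
            line = stripped[1:].lstrip()
        out.append(line)
    return "\n".join(out)


# Made-up fragment standing in for /etc/neo4j/neo4j.conf.
sample = (
    "# Example fragment of neo4j.conf\n"
    "#dbms.connectors.default_listen_address=0.0.0.0\n"
    "dbms.connector.bolt.enabled=true\n"
)
result = uncomment_setting(sample, "dbms.connectors.default_listen_address")
print(result)
```

Ordinary comment lines and other settings pass through unchanged, so running the helper twice gives the same text as running it once.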

II. Using py2neo

Nodes and relationships

In [1]: from py2neo import Graph, Node, Relationship

In [2]: a = Node("Person", name="Alice")
In [3]: b = Node("Person", name="Bob")
In [4]: ab = Relationship(a, "KNOWS", b)

In [5]: print(type(a))
<class 'py2neo.data.Node'>
In [6]: print(a)
(:Person {name: 'Alice'})

In [7]: print(type(ab))
<class 'py2neo.data.KNOWS'>
In [8]: print(ab)
(Alice)-[:KNOWS {}]->(Bob)

This creates two Nodes and a Relationship between them. Node and Relationship both inherit from PropertyDict, so their properties can be assigned and read much like dictionary entries.

Subgraph
A Subgraph is a collection of Nodes and Relationships. The simplest way to build one is with set operators such as |, as follows:

# Create a subgraph
In [10]: s = a | b | ab

In [11]: print(type(s))
<class 'py2neo.data.Subgraph'>
In [12]: print(s)
Subgraph({Node('Person', name='Alice'), Node('Person', name='Bob')}, {KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob'))})

# The nodes and relationships attributes return all of the subgraph's Nodes and Relationships
In [20]: type(s.nodes)
Out[20]: py2neo.collections.SetView
In [18]: list(s.nodes)
Out[18]: [Node('Person', name='Alice'), Node('Person', name='Bob')]
In [19]: list(s.relationships)
Out[19]: [KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob'))]

# Intersection of two subgraphs
In [21]: s2 = a | b

In [22]: s&s2
Out[22]: Subgraph({Node('Person', name='Alice'), Node('Person', name='Bob')}, {})

Walkable
A Walkable is a Subgraph with added traversal information. One can be built with the + operator, for example:

In [34]: a = Node("Person", name="Alice")
In [35]: b = Node("Person", name="Bob")
In [36]: c = Node("Person", name="Jack")
In [37]: d = Node("Dog", name="Pupy")
In [38]: ab = Relationship(a, "KNOWS", b)
In [39]: bc = Relationship(b, "LIKES", c)
In [40]: cd = Relationship(c, "HAS", d)
# Create a walkable object
In [41]: w = ab+bc+cd

In [42]: print(type(w))
<class 'py2neo.data.Path'>
In [43]: print(w)
(Alice)-[:KNOWS {}]->(Bob)-[:LIKES {}]->(Jack)-[:HAS {}]->(Pupy)

In [44]: from py2neo import walk

# Use walk to traverse from the start node to the end node
In [45]: for item in walk(w):
    ...:     print(item)
(:Person {name: 'Alice'})
(Alice)-[:KNOWS {}]->(Bob)
(:Person {name: 'Bob'})
(Bob)-[:LIKES {}]->(Jack)
(:Person {name: 'Jack'})
(Jack)-[:HAS {}]->(Pupy)
(:Dog {name: 'Pupy'})

# The start_node, end_node, nodes and relationships attributes give the start Node, the end Node, and all Nodes and Relationships
In [47]: w.start_node
Out[47]: Node('Person', name='Alice')
In [48]: w.end_node
Out[48]: Node('Dog', name='Pupy')
In [49]: w.nodes
Out[49]:
(Node('Person', name='Alice'),
 Node('Person', name='Bob'),
 Node('Person', name='Jack'),
 Node('Dog', name='Pupy'))
In [50]: w.relationships
Out[50]:
(KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob')),
 LIKES(Node('Person', name='Bob'), Node('Person', name='Jack')),
 HAS(Node('Person', name='Jack'), Node('Dog', name='Pupy')))

Graph

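  1. Initialization
    Graph is the most important API for working with Neo4j data and provides many methods for operating on the database. A Graph is initialized with the connection URI or with individual connection settings; the v3 documentation lists the parameters bolt, secure, host, http_port, https_port, bolt_port, user and password (see http://py2neo.org/v3/database.html#py2neo.database.Graph). For example: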
g = Graph(host='localhost', auth=('neo4j', 'passwd'))
  2. Creating data
    You can create a whole subgraph at once, or create individual nodes and relationships.
In [34]: a = Node("Person", name="Alice")
In [35]: b = Node("Person", name="Bob")
In [36]: c = Node("Person", name="Jack")
In [37]: d = Node("Dog", name="Pupy")
In [38]: ab = Relationship(a, "KNOWS", b)
In [39]: bc = Relationship(b, "LIKES", c)
In [40]: cd = Relationship(c, "HAS", d)
In [41]: ss = a|b|c|d|ab|bc|cd
In [42]: g.create(ss)

This creates the four nodes and three relationships in the database.

Then add one more relationship:

r = Relationship(a, 'KONWS', c)
g.create(r)

The new relationship from Alice to Jack is now part of the graph.

  3. Finding nodes
    Use NodeMatcher to find nodes.
In [40]: from py2neo import NodeMatcher, RelationshipMatcher

In [41]: nm = NodeMatcher(g)

In [43]: res = nm.match('Person')
In [44]: list(res)
Out[44]:
[Node('Person', name='Bob'),
 Node('Person', name='Alice'),
 Node('Person', name='Jack')]

# Return the first match
In [58]: res = nm.match('Person').first()
In [59]: res
Out[59]: Node('Person', name='Bob')

In [49]: res = nm.match('Dog', name='Pupy')
In [50]: list(res)
Out[50]: [Node('Dog', name='Pupy')]

# Query with a regular-expression match
In [56]: res = nm.match('Person').where('_.name=~"A.*"')
In [57]: list(res)
Out[57]: [Node('Person', name='Alice')]

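first() returns the first matching node, or None if nothing matches
limit(amount) returns at most amount nodes
skip(amount) skips the first amount nodes
order_by(*fields) sorts the results by the given fields
where(*conditions, **properties) filters by conditions and property values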
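These methods can be chained. As a rough sketch, the hypothetical helper `people_page` below pages through Person nodes by combining order_by, skip and limit. It only assumes that `matcher` behaves like a py2neo NodeMatcher, so the helper itself needs no database until it is called.

```python
def people_page(matcher, page, size):
    """Return one page of Person nodes ordered by name.

    `matcher` is expected to behave like py2neo's NodeMatcher.
    """
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size >= 1")
    match = matcher.match("Person").order_by("_.name")
    # skip() drops the earlier pages, limit() caps the page length.
    return list(match.skip(page * size).limit(size))


# Usage against a live database (assumes `g` is a connected Graph):
#   from py2neo import NodeMatcher
#   second_page = people_page(NodeMatcher(g), page=1, size=2)
```

Sorting before skipping matters: without order_by, the database is free to return nodes in any order, so pages could overlap between calls.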

  4. Finding relationships
    Relationships can be found with g.match or with RelationshipMatcher; the latter is more powerful.
In [40]: from py2neo import NodeMatcher, RelationshipMatcher

In [42]: rm = RelationshipMatcher(g)

In [96]: list(g.match())
Out[96]:
[LIKES(Node('Person', name='Bob'), Node('Person', name='Jack')),
 KONWS(Node('Person', name='Alice'), Node('Person', name='Jack')),
 KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob')),
 HAS(Node('Person', name='Jack'), Node('Dog', name='Pupy'))]

In [63]: res = g.match(r_type='LIKES')
In [64]: list(res)
Out[64]: [LIKES(Node('Person', name='Bob'), Node('Person', name='Jack'))]

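# Find relationships of a given type that start at a given node, e.g. the complications (并发症) of leukemia (白血病); 疾病 is the disease label
In [293]: a = nm.match('疾病', name='白血病').first()
In [294]: a
Out[294]: Node('疾病', name='白血病')
In [295]: list(g.match(r_type='并发症', nodes=[a]))
Out[295]: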
[并发症(Node('疾病', name='白血病'), Node('疾病', name='白血病性中枢神经感染')),
 并发症(Node('疾病', name='白血病'), Node('疾病', name='白血病脑出血')),
 并发症(Node('疾病', name='白血病'), Node('疾病', name='肠功能衰竭')),
 并发症(Node('疾病', name='白血病'), Node('疾病', name='卡氏肺囊虫感染'))]

In [66]: res2 = rm.match(r_type='LIKES')
In [67]: list(res2)
Out[67]: [LIKES(Node('Person', name='Bob'), Node('Person', name='Jack'))]
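The same kind of lookup can also be written directly in Cypher and sent through the graph's run method. The sketch below assumes `graph` is a connected py2neo Graph; `liked_by` is a hypothetical helper name, and the query uses the Person and LIKES data created earlier.

```python
def liked_by(graph, name):
    """Return the names of the nodes that the named Person LIKES."""
    query = (
        "MATCH (p:Person {name: $name})-[:LIKES]->(q) "
        "RETURN q.name AS name ORDER BY name"
    )
    # run() sends the query with $name bound as a parameter;
    # data() turns the result into a list of dicts, one per row.
    cursor = graph.run(query, {"name": name})
    return [row["name"] for row in cursor.data()]


# Usage (assumes `g` is a connected Graph):
#   liked_by(g, "Bob")
```

Passing the name as a parameter rather than formatting it into the query string avoids quoting problems and lets the server reuse the query plan.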
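  5. Batch insertion
    When inserting in bulk, take care not to create many copies of the same node. Two Node objects built separately are different nodes even when their label and property values match, because they are distinct objects with different ids. For example:
In [258]: a1 = Node('Person', name='小明')
In [259]: a2 = Node('Person', name='小明')

In [260]: a1==a2
Out[260]: False
In [261]: id(a1)
Out[261]: 139971127871536
In [262]: id(a2)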
Out[262]: 139971551445936

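So during bulk insertion, especially from tabular data, avoid constructing a node with the same label and value more than once: before building a node with Node, use NodeMatcher to check whether one with the same label and value already exists. Here is a concrete bulk-insertion example: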

g = Graph(host='localhost', auth=('neo4j', 'password'))
nm = NodeMatcher(g)

for i in data:
    spos = i['spo_list']
    for spo in spos:
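        # Relies on each spo dict yielding predicate, subject, object, subject type, object type in this order
        p, sub, obj, sub_type, obj_type = spo.values()
        sub_existed = nm.match(sub_type, name=sub).first()  # look for an existing node with the same label and name
        obj_existed = nm.match(obj_type, name=obj).first()
        if sub_existed and obj_existed:  # at most one relationship is assumed per node pair, so skip when both nodes already exist
            continue
        elif sub_existed:
            obj_node = Node(obj_type, name=obj)  # only the sub node exists, so build a new obj node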
            rel = Relationship(sub_existed, p, obj_node)
        elif obj_existed:
            sub_node = Node(sub_type, name=sub)
            rel = Relationship(sub_node, p, obj_existed)
        else:
            sub_node = Node(sub_type, name=sub)
            obj_node = Node(obj_type, name=obj)
            rel = Relationship(sub_node, p, obj_node)
        g.create(rel)
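Within a single run, the repeated database lookups can be complemented by an in-memory cache keyed by label and name, so that each distinct entity is constructed only once. This is a sketch under assumptions: `NodeCache` and `make_node` are hypothetical names, the cache knows nothing about nodes already stored in the database, and the demo uses a plain dict as a stand-in for a py2neo Node so that it runs without a server.

```python
class NodeCache:
    """Hand out one node object per (label, name) pair."""

    def __init__(self, make_node):
        # make_node(label, name) builds a new node; with py2neo this could be
        # lambda label, name: Node(label, name=name)
        self._make_node = make_node
        self._nodes = {}

    def get(self, label, name):
        key = (label, name)
        if key not in self._nodes:
            self._nodes[key] = self._make_node(label, name)
        return self._nodes[key]


# Stand-in factory so the sketch runs without a database.
cache = NodeCache(lambda label, name: {"label": label, "name": name})
first = cache.get("Person", "小明")
second = cache.get("Person", "小明")
print(first is second)  # True
```

Because both lookups return the same object, relationships built from it all attach to a single node instead of creating duplicates.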
