字典树-Trie

简介

在计算机科学中，trie，又称前缀树或字典树，是一种有序树，用于保存关联数组，其中的键通常是字符串。与二叉查找树不同，键不是直接保存在节点中，而是由节点在树中的位置决定。一个节点的所有子孙都有相同的前缀，也就是这个节点对应的字符串，而根节点对应空字符串。一般情况下，不是所有的节点都有对应的值，只有叶子节点和部分内部节点所对应的键才有相关的值。典型应用是用于统计，排序和保存大量[字符串（但不仅限于字符串），所以经常被搜索引擎系统用于文本词频统计。它的优点是：利用字符串的公共前缀来减少查询时间，最大限度地减少无谓的字符串比较，查询效率比哈希树高。

trie包含三个单词（see、pain、paint）的数据结构如下

trie.png

CRD

节点至少要包含：
1. 是否是结尾
2. 子节点

例如插入一个paint单词，如果用户查询pain，尽管 paint 包含了 pain，但是Trie中仍然不包含 pain 这个单词。

插入

遍历字符串给相应节点，注意处理尾节点给记录

查找

遍历节点找字符，注意找最后一个字符时需要判断是不是结尾

删除

一个单词是另一个单词的前缀

无分支

单词的除了最后一个字母，其他的字母有多个分支

/// Trie里面的节点
type Node struct {
    isWord bool
    size   int
    //只考虑字母
    children [26]*Node
}

func (n *Node) Size() int {
    return n.size
}

func NewNode() *Node {
    return &Node{
        isWord:   false,
        children: [26]*Node{},
    }
}

type Trie struct {
    size int
    root *Node
}

func NewTrie() *Trie {
    return &Trie{
        size: 0,
        root: NewNode(),
    }
}

func (t *Trie) Size() int {
    return t.size
}

func (t *Trie) IsEmpty() bool {
    return 0 == t.size
}

func (t *Trie) Add(word string) {
    current := t.root
    for _, s := range word {
        next := current.children[s-'a']
        if next == nil {
            current.children[s-'a'].size++
            current.children[s-'a'] = NewNode()
        }
        current = current.children[s-'a']
    }
    if !current.isWord {
        t.size++
        current.isWord = true
    }
}

func (t *Trie) Contain(word string) bool {
    current := t.root
    for _, s := range word {
        next := current.children[s-'a']
        if next == nil {
            return false
        }
        current = next
    }
    return current.isWord
}

func (t *Trie) Delete(word string) bool {
    var multiChildNode *Node
    multiChildNodeIndex := -1
    current := t.root
    for i, s := range word {
        child := current.children[s-'a']
        if child == nil {
            return false
        }
        if child.Size() > 0 {
            multiChildNodeIndex = i
            multiChildNode = child
        }
        current = child
    }
    //图1
    if current.size > 0 {
        if current.isWord {
            current.isWord = false
            current.size--
            t.size--
            return true
        }
        return false
    }
    //图2
    if multiChildNodeIndex == -1 {
        t.root.children[word[0]-'a'] = nil
        t.size--
        return true
    }
    //图3
    if multiChildNodeIndex != len(word)-1 {
        multiChildNode.children[word[multiChildNodeIndex+1]-'a'] = nil
        t.size--
        return true
    }
    return false
}

Trie查询效率非常高，但是对空间的消耗还是挺大的，这也是典型的空间换时间。

可以使用压缩字典树(Compressed Trie) ，但是维护相对来说复杂一些。

如果我们不止存储英文单词，还有其他特殊字符，那么维护子节点的集合可能会更多。

可以对Trie字典树做些限制，比如每个节点只能有3个子节点，左边的节点是小于父节点的，中间的节点是等于父节点的，右边的子节点是大于父节点的，这就是三分搜索Trie字典树(Ternary Search Trie)。

GinのRoute

上文已经说到Trie适合用于引擎系统用于文本词频统计，核心是做字符串比较的，同样，它也适用于路由匹配，看看Gin是怎么做的吧

路由类型

param 与 catchAll 使用的区别就是 : 与 * 的区别。* 会把路由后面的所有内容赋值给参数 key；但 : 可以多次使用。比如：/user/:id/:no 是合法的，但 /user/*id/:no 是非法的，因为 * 后面所有内容会赋值给参数 id。

//Gin的路由类型
type nodeType uint8

const (
    static nodeType = iota  // 纯静态路由
    root                    // 根节点
    param                   // url中出现':id'
    catchAll                // url中出现'*'
)

type node struct {
    path      string          // 当前节点相对路径（与祖先节点的 path 拼接可得到完整路径）                
    indices   string          // 所有孩子节点的path[0]组成的字符串，通过这个可以找children的位置            
    wildChild bool            // 是否有*或者:           
    nType     nodeType        // 路由类型     
    priority  uint32          // 优先级，当前节点及子孙节点的实际路由数量
    children  []*node         // children
    handlers  HandlersChain   // 业务逻辑包含中间件
    fullPath  string          // 全url路径
}

`path` 和 `indices`

关于 path 和 indices，其实是使用了前缀树的逻辑。

举个栗子：
如果我们有两个路由，分别是 /index，/inter，则根节点为 {path: "/in", indices: "dt"...}，两个子节点为{path: "dex", indices: ""}，{path: "ter", indices: ""}

路由树例子

r.GET("/", func(context *gin.Context) {})
r.GET("/index", func(context *gin.Context) {})
r.GET("/inter", func(context *gin.Context) {})
r.GET("/go", func(context *gin.Context) {})
r.GET("/game/:id", func(context *gin.Context) {})

路由树

源码


// 创建新的路由with业务（包含中间件）
// 不是线程安全的
func (n *node) addRoute(path string, handlers HandlersChain) {
    fullPath := path
    n.priority++

    // Empty tree
    if len(n.path) == 0 && len(n.children) == 0 {
        n.insertChild(path, fullPath, handlers)
        n.nType = root
        return
    }

    parentFullPathIndex := 0

walk:
    for {
        // 找到最长的公共前缀
        // 公共前缀不能包含*和：
        i := longestCommonPrefix(path, n.path)

        // 如果根节点的path包含新输入的path，需要分割path，再进行组合
        if i < len(n.path) {
            child := node{                 
                path:      n.path[i:],       // 用户输入后面字符串
                wildChild: n.wildChild,
                indices:   n.indices,
                children:  n.children,
                handlers:  n.handlers,
                priority:  n.priority - 1,
                fullPath:  n.fullPath,
            }

            n.children = []*node{&child}
            n.indices = bytesToStr([]byte{n.path[i]})
            n.path = path[:i]  //最长公共字符串
            n.handlers = nil 
            n.wildChild = false
            n.fullPath = fullPath[:parentFullPathIndex+i]
        }

        // 新建一个children
        if i < len(path) {
            path = path[i:]

            if n.wildChild {
                parentFullPathIndex += len(n.path)
                n = n.children[0]
                n.priority++

                // Check if the wildcard matches
                if len(path) >= len(n.path) && n.path == path[:len(n.path)] &&
                    // Adding a child to a catchAll is not possible
                    n.nType != catchAll &&
                    // Check for longer wildcard, e.g. :name and :names
                    (len(n.path) >= len(path) || path[len(n.path)] == '/') {
                    continue walk
                }

                pathSeg := path
                if n.nType != catchAll {
                    pathSeg = strings.SplitN(path, "/", 2)[0]
                }
                prefix := fullPath[:strings.Index(fullPath, pathSeg)] + n.path
                panic("'" + pathSeg +
                    "' in new path '" + fullPath +
                    "' conflicts with existing wildcard '" + n.path +
                    "' in existing prefix '" + prefix +
                    "'")
            }

            c := path[0]

            // 处理参数后面的 /
            if n.nType == param && c == '/' && len(n.children) == 1 {
                parentFullPathIndex += len(n.path)
                n = n.children[0]
                n.priority++
                continue walk
            }

            
            for i, max := 0, len(n.indices); i < max; i++ {
                if c == n.indices[i] {
                    parentFullPathIndex += len(n.path)
                    i = n.incrementChildPrio(i)
                    n = n.children[i]
                    continue walk
                }
            }

            // 插入子节点
            if c != ':' && c != '*' {
                n.indices += bytesToStr([]byte{c})
                child := &node{
                    fullPath: fullPath,
                }
                n.children = append(n.children, child)
                n.incrementChildPrio(len(n.indices) - 1)
                n = child
            }
            n.insertChild(path, fullPath, handlers)
            return
        }

        // 否则就是当前节点
        if n.handlers != nil {
            panic("handlers are already registered for path '" + fullPath + "'")
        }
        n.handlers = handlers
        n.fullPath = fullPath
        return
    }
}

// 根据url拿到handle的句柄，url参数放在map里面
func (n *node) getValue(path string, params *Params, unescape bool) (value nodeValue) {
walk: // Outer loop for walking the tree
    for {
        prefix := n.path
        if len(path) > len(prefix) {
            if path[:len(prefix)] == prefix {
                path = path[len(prefix):]
                // 如果该节点没有*和:继续向子节点找
                if !n.wildChild {
                    idxc := path[0]
                    for i, c := range []byte(n.indices) {
                        if c == idxc {
                            n = n.children[i]
                            continue walk
                        }
                    }

                    // 什么都没有发现
                    value.tsr = (path == "/" && n.handlers != nil)
                    return
                }

                // 处理url参数
                n = n.children[0]
                switch n.nType {
                case param:
                    // Find param end (either '/' or path end)
                    end := 0
                    for end < len(path) && path[end] != '/' {
                        end++
                    }

                    // Save param value
                    if params != nil {
                        if value.params == nil {
                            value.params = params
                        }
                        // Expand slice within preallocated capacity
                        i := len(*value.params)
                        *value.params = (*value.params)[:i+1]
                        val := path[:end]
                        if unescape {
                            if v, err := url.QueryUnescape(val); err == nil {
                                val = v
                            }
                        }
                        (*value.params)[i] = Param{
                            Key:   n.path[1:],
                            Value: val,
                        }
                    }

                    // we need to go deeper!
                    if end < len(path) {
                        if len(n.children) > 0 {
                            path = path[end:]
                            n = n.children[0]
                            continue walk
                        }

                        // ... but we can't
                        value.tsr = (len(path) == end+1)
                        return
                    }

                    if value.handlers = n.handlers; value.handlers != nil {
                        value.fullPath = n.fullPath
                        return
                    }
                    if len(n.children) == 1 {
                        // No handle found. Check if a handle for this path + a
                        // trailing slash exists for TSR recommendation
                        n = n.children[0]
                        value.tsr = (n.path == "/" && n.handlers != nil)
                    }
                    return

                case catchAll:
                    // Save param value
                    if params != nil {
                        if value.params == nil {
                            value.params = params
                        }
                        // Expand slice within preallocated capacity
                        i := len(*value.params)
                        *value.params = (*value.params)[:i+1]
                        val := path
                        if unescape {
                            if v, err := url.QueryUnescape(path); err == nil {
                                val = v
                            }
                        }
                        (*value.params)[i] = Param{
                            Key:   n.path[2:],
                            Value: val,
                        }
                    }

                    value.handlers = n.handlers
                    value.fullPath = n.fullPath
                    return

                default:
                    panic("invalid node type")
                }
            }
        }

        if path == prefix {
            // 已经找到句柄
            if value.handlers = n.handlers; value.handlers != nil {
                value.fullPath = n.fullPath
                return
            }

            
            if path == "/" && n.wildChild && n.nType != root {
                value.tsr = true
                return
            }

        
            for i, c := range []byte(n.indices) {
                if c == '/' {
                    n = n.children[i]
                    value.tsr = (len(n.path) == 1 && n.handlers != nil) ||
                        (n.nType == catchAll && n.children[0].handlers != nil)
                    return
                }
            }

            return
        }

        
        value.tsr = (path == "/") ||
            (len(prefix) == len(path)+1 && prefix[len(path)] == '/' &&
                path == prefix[:len(prefix)-1] && n.handlers != nil)
        return
    }
}

字典树-Trie

字典树-Trie

简介

CRD

插入

查找

删除

GinのRoute

路由类型

`path` 和 `indices`

路由树例子

相关流程图

源码

参考&致谢