Richard Layte

Implementing Tries in Go

A Trie is a tree based data structure that stores keys based on a node’s position in the tree rather than explicitly assigning a key. From the Wikipedia article:

Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.

In other words, instead of labeling each node directly the key for a specific node is implied from traversing the tree to its location and keeping track of the nodes along that path.

This example shows a trie made up of the words “A”, “to”, “tea”, “ted”, “ten”, “i”, “in”, and “inn”. None of the nodes are labeled, but we can still search the trie for keys by traversing from the root node along each character in a query string, until we either find a valid node or not. For example, if we search for “ten” we start at the root node, follow the path “t” to “e” to “n”, which ends in a valid node. However, if we search for “tent” then when we get to the “n” node there won’t be a “t” to follow so we know “tent” doesn’t exist in the trie.

Implementation

First, we need to define a Trie type using a struct. This is a recursive type so that each node in a Trie is itself a Trie. The constructor function initialises a Trie with an empty map to contain its child nodes.

1
2
3
4
5
6
7
8
9
type Trie struct {
  terminated bool
  value interface{}
  children map[string]*Trie
}

func NewTrie() *Trie {
  return &Trie{children: map[string]*Trie{}}
}

Adding items to a Trie is simple: recurse through the string, adding new child nodes for each character. If a node already exists it’s skipped over. If all the characters are consumed the terminated flag is flipped to true to signify this node represents the end of a key.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
func (t *Trie) Add(k string) *Trie {
  if len(k) == 0 {
    t.terminated = true
    return t
  }

  head := string(k[0])
  tail := string(k[1:])

  child, ok := t.children[head]

  if !ok {
    child = NewTrie()
    t.children[head] = child
  }

  return child.Add(tail)
}

Note that we’re not storing any data about the key the node represents. That is determined purely from the node’s position within the Trie structure.

Therefore, to get a key we use the same approach used to add one: recurse through the key string, if all the characters are consumed and the current node is terminated then we return it. If all the characters aren’t used or the final node discovered isn’t terminated then no result is returned.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
func (t *Trie) Get(k string) (*Trie, bool) {
  if len(k) == 0 {
    return t, t.terminated
  }

  head := string(s[0])
  tail := string(s[1:])

  if child, ok := t.children[head]; ok {
    return child.Get(tail)
  }

  return t, false
}

I’ve left out some of the details to keep things simple. For example, Remove and Children are important methods for Tries to implement — if you’re interested in seeing how these work take a look at the full source code.

Applications

So now that we can represent a set of keys as a trie what advantages does this have over other data structures? One feature is that trie’s give us fast look up times to a set of keys whilst maintaining a distinct node for each key in the set. Another distinguishing feature is that nodes are shared where possible (e.g. in the case of ‘f’ and ‘o’ in a trie of ‘for’, ‘foe’, and ‘food’), which reduces the space required to store a set of keys and also provides quick access to common prefixes between keys. I’ll discuss some concrete application below.

Dictionaries

One obvious use for tries is as the backing data structure for a dictionary. There aren’t many changes required for this to work so the Dict struct is essentially a thin wrapper around a Trie that implements a simplified interface to store and retrieve data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
package dict

import github.com/rlayte/trie-go

type Dict struct {
  data *trie.Trie
}

func NewDict() Dict {
  return Dict{data: trie.NewTrie()}
}

func (d Dict) Get(k string) (interface{}, bool) {
  child, ok := d.data.Get(k)
  return child.Value, ok
}

func (d Dict) Put(k string, v interface{}) {
  child := d.data.Add(k)
  child.Value = v
}

Note that this makes use of the Value field exposed in the Trie struct to store additional data associated with a node.

This can now be used to store data in the same way you’d expect to used a standard dictionary. For example:

1
2
3
d := NewDict()
d.Put("Hello", "World")
log.Println(d.Get("Hello")) // print "World"

This implementation is easier to understand in many ways compared to standard hash tables because it avoids having to deal with collisions. However, lookup times will be O(m) where m is the length of the key compared to expected constant time for a properly tuned hash table.

Autocomplete

As I mentioned above one of the interesting features of tries is that they share nodes where possible. This allows us to quickly find all keys with a common prefix by iterating through the trie until we find the prefix and then returning all of its children.

This provides an easy way to implement autocomplete by inserting a dictionary of words into a trie and then finding all the children of the current substring.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
package complete

import github.com/rlayte/trie-go

type Completer struct {
  dictionary *trie.Trie
}

func NewCompleter(words []string) Completer {
  c := Completer{dictionary: trie.NewTrie()}

  for _, word := range words {
    c.dictionary.Add(word)
  }

  return c
}

func (c *Completer) Complete(prefix string) []string {
  child, _ := c.dictionary.Get(prefix)
  return child.Children(prefix)
}

This can now be used to complete words by providing a root prefix:

1
2
c := NewCompleter([]string{Food, Foods, Foodie, Foodstuff})
completions := c.Complete("Foo") // ["Food", "Foods", ...]

Suffix tries

Finally, a special variation on a trie is a suffix trie. A suffix trie is simply a trie of all the possible suffixes of a string. For example:

1
2
3
4
5
6
7
8
// Suffixes for the string ‘my woes’
suffixes = ["my woes", "y woes", " woes", "woes", "oes", "es", "s"]

suffixTrie := NewTrie()

for _, k := suffixes {
  suffixTrie.Add(k)
}

It might not be immediately apparent why this is useful, but consider testing for the existence of “woe” in “my woes”. This can now be achieved by searching the trie for the substring.

1
2
3
4
5
q = "woe"

if n, _ := suffixTrie.Get(q); n {
  log.Println("running through the 6 with...")
}

This would normally take O(n) in the worst case because the whole string would need to be scanned to locate, or not locate, the substring. Preprocessing this string into a suffix array turns that into a O(q) operation where q is equal to the length of the query string.

Conclusion

If you’re interested in seeing the full code it’s available here. However, this is purely for my own education and isn’t a serious project — here are some much better implementations.