A Trie is a tree based data structure that stores keys based on a node’s position in the tree rather than explicitly assigning a key. From the Wikipedia article:
Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
In other words, instead of labeling each node directly the key for a specific node is implied from traversing the tree to its location and keeping track of the nodes along that path.
This example shows a trie made up of the words “A”, “to”, “tea”, “ted”, “ten”, “i”, “in”, and “inn”. None of the nodes are labeled, but we can still search the trie for keys by traversing from the root node along each character in a query string, until we either find a valid node or not. For example, if we search for “ten” we start at the root node, follow the path “t” to “e” to “n”, which ends in a valid node. However, if we search for “tent” then when we get to the “n” node there won’t be a “t” to follow so we know “tent” doesn’t exist in the trie.
Implementation
First, we need to define a Trie type using a struct. This is a recursive type so that each node in a Trie is itself a Trie. The constructor function initialises a Trie with an empty map to contain its child nodes.
1 2 3 4 5 6 7 8 9 |
|
Adding items to a Trie is simple: recurse through the string, adding new child nodes for each character. If a node already exists it’s skipped over. If all the characters are consumed the terminated flag is flipped to true to signify this node represents the end of a key.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Note that we’re not storing any data about the key the node represents. That is determined purely from the node’s position within the Trie structure.
Therefore, to get a key we use the same approach used to add one: recurse through the key string, if all the characters are consumed and the current node is terminated then we return it. If all the characters aren’t used or the final node discovered isn’t terminated then no result is returned.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
I’ve left out some of the details to keep things simple. For example, Remove and Children are important methods for Tries to implement — if you’re interested in seeing how these work take a look at the full source code.
Applications
So now that we can represent a set of keys as a trie what advantages does this have over other data structures? One feature is that trie’s give us fast look up times to a set of keys whilst maintaining a distinct node for each key in the set. Another distinguishing feature is that nodes are shared where possible (e.g. in the case of ‘f’ and ‘o’ in a trie of ‘for’, ‘foe’, and ‘food’), which reduces the space required to store a set of keys and also provides quick access to common prefixes between keys. I’ll discuss some concrete application below.
Dictionaries
One obvious use for tries is as the backing data structure for a dictionary. There aren’t many changes required for this to work so the Dict
struct is essentially a thin wrapper around a Trie that implements a simplified interface to store and retrieve data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
Note that this makes use of the Value field exposed in the Trie struct to store additional data associated with a node.
This can now be used to store data in the same way you’d expect to used a standard dictionary. For example:
1 2 3 |
|
This implementation is easier to understand in many ways compared to standard hash tables because it avoids having to deal with collisions. However, lookup times will be O(m) where m is the length of the key compared to expected constant time for a properly tuned hash table.
Autocomplete
As I mentioned above one of the interesting features of tries is that they share nodes where possible. This allows us to quickly find all keys with a common prefix by iterating through the trie until we find the prefix and then returning all of its children.
This provides an easy way to implement autocomplete by inserting a dictionary of words into a trie and then finding all the children of the current substring.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
This can now be used to complete words by providing a root prefix:
1 2 |
|
Suffix tries
Finally, a special variation on a trie is a suffix trie. A suffix trie is simply a trie of all the possible suffixes of a string. For example:
1 2 3 4 5 6 7 8 |
|
It might not be immediately apparent why this is useful, but consider testing for the existence of “woe” in “my woes”. This can now be achieved by searching the trie for the substring.
1 2 3 4 5 |
|
This would normally take O(n) in the worst case because the whole string would need to be scanned to locate, or not locate, the substring. Preprocessing this string into a suffix array turns that into a O(q) operation where q is equal to the length of the query string.
Conclusion
If you’re interested in seeing the full code it’s available here. However, this is purely for my own education and isn’t a serious project — here are some much better implementations.