While it is hard to find a unanimous definition of "Radix Tree", most accepted definitions indicate that it is a compacted Prefix Tree. What I'm struggling to understand is the significance of the term "radix" in this case. Why are compacted prefix trees called Radix Trees, while non-compacted ones are not?
Wikipedia can answer this, https://en.wikipedia.org/wiki/Radix:
In mathematical numeral systems, the radix or base is the number of
unique digits, including zero, used to represent numbers in a
positional numeral system. For example, for the decimal system (the
most common system in use today) the radix is ten, because it uses the
ten digits from 0 through 9.
and the tree https://en.wikipedia.org/wiki/Radix_tree:
a data structure that represents a space-optimized trie in which each
node that is the only child is merged with its parent. The result is
that the number of children of every internal node is at least the
radix r of the radix tree, where r is a positive integer and a power x
of 2, having x ≥ 1
Finally, check a dictionary:
1. radix (noun)
A primitive word, from which other words spring.
The radix in the radix tree determines the balance between the number of children (or depth) of the tree and the 'sparseness', or how many suffixes are unique.
EDIT - elaboration
the number of children of every internal node is at least the radix r
Let's consider the words "aba", "abnormal", "acne", and "abysmal". In a regular prefix tree (or trie), every arc adds a single letter to the word, so we have:
-a-b-a-
    |
    n-o-r-m-a-l-
    |
    y-s-m-a-l-
  |
  c-n-e-
My drawing is a bit misleading - in tries the letters usually sit on arcs, so '-' is a node and the letters are edges. Note many internal nodes have one child! Now the compact (and obvious) form:
-a-b-a-
    |
    normal-
    |
    ysmal-
  |
  cne-
Well, now we have an inner node (behind b) with 3 children! The radix is a positive power of 2, so it is 2 in this case. Why 2 and not, say, 3? Well, first note the root has 2 children. In addition, suppose we want to add a word. Options:
It shares the b prefix - well, 4 children is still at least 2.
It shares an edge of a child of b - say 'abnormally'. The way insertion works, the shared part will split and we'll have:
Relevant branch:
-normal-ly-
       |
       -
normal is an inner node now, but has 2 children (one a leaf).
Another case would be deleting "acne", for example. But now the compactness property says the node after b must be merged back, since it's the only child, so the tree becomes:
-ab-a-
    |
    normal-ly-
           |
           -
    |
    ysmal-
So we still maintain at least 2 children per internal node.
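To make the split-and-merge mechanics concrete, here is a minimal Python sketch of radix-trie insertion (my own illustration, not from the answer; nodes are dicts mapping edge labels to children, and an empty-string key marks the end of a word):

def insert(node, word):
    for label in list(node):
        # length of the common prefix of the new word and this edge label
        common = 0
        while (common < min(len(label), len(word))
               and label[common] == word[common]):
            common += 1
        if common == 0:
            continue
        if common < len(label):
            # split the edge, exactly as in the 'abnormally' case above
            node[label[:common]] = {label[common:]: node.pop(label)}
        insert(node[label[:common]], word[common:])
        return
    node[word] = {"": {}}  # no shared edge: start a fresh branch

root = {}
for w in ("aba", "abnormal", "abysmal", "acne"):
    insert(root, w)
print(root)
# {'a': {'b': {'a': {'': {}}, 'normal': {'': {}}, 'ysmal': {'': {}}},
#        'cne': {'': {}}}}

Note how the final structure matches the drawing: the node behind "b" has 3 children, and the node behind "a" has 2.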
Hope this clarifies!
I know that a Huffman tree is a kind of tree used for optimal prefix codes, but is there any tree for optimal prefix codes other than the Huffman tree? If there are such trees, will their heights be the same?
Many thanks!
Huffman trees are constructed recursively by taking the two currently lowest probability symbols and combining them.
If there are other symbols with the same low probability, then these symbols can be combined instead.
This means that the final tree is not uniquely defined and there are multiple optimal prefix codes with potentially different heights.
For example, consider the symbols and probabilities below:
A 1/3
B 1/3
C 1/6
D 1/6
Can be encoded as:
A 0
B 10
C 110
D 111
or
A 00
B 01
C 10
D 11
Both encodings have an expected number of bits per symbol equal to 2, but different heights.
However, all optimal prefix codes can be constructed by the Huffman algorithm for a suitable choice of ordering with respect to probability ties.
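To see both outcomes programmatically, here is a small Python sketch of the construction (the huffman_lengths helper and its tie_break parameter are my own devices for choosing among equal-probability entries, not standard library functions):

import heapq

def huffman_lengths(probs, tie_break):
    # probs: {symbol: probability}. tie_break orders entries of equal
    # probability - exactly the free choice described above.
    heap = [(p, tie_break(s), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, t1, syms1 = heapq.heappop(heap)   # two lowest-probability groups
        p2, t2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:               # each merged symbol sinks one level
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, min(t1, t2), syms1 + syms2))
    return lengths

probs = {'A': 1/3, 'B': 1/3, 'C': 1/6, 'D': 1/6}
for tb in (lambda s: s, lambda s: -ord(s)):
    L = huffman_lengths(probs, tb)
    print(L, sum(probs[s] * L[s] for s in probs))
# {'A': 2, 'B': 2, 'C': 2, 'D': 2} 2.0   (height 2)
# {'A': 1, 'B': 2, 'C': 3, 'D': 3} 2.0   (height 3)

Both tie-breaking rules yield the two code-length profiles from the example above, with the same expected length of 2 bits per symbol.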
Within the constraints of the Huffman code problem, i.e. representation of each symbol by a prefix-unique sequence of bits, there is exactly one optimal total number of bits that can be achieved, and the Huffman algorithm achieves it. There are other approaches that arrive at the same answer.
As Peter de Rivaz noted, in certain special cases the Huffman algorithm has more than one choice at some steps of which minimum-probability subtree to pick, which can result in different trees. So the tree height/depth you mention is not unique, but the total number of bits (the sum of the bit lengths of each symbol weighted by its probability) is always the same.
I'm currently trying to get my head around the variations of Trie and was wondering if anyone would be able to clarify a couple of points. (I got quite confused with the answer to this question Tries versus ternary search trees for autocomplete?, especially within the first paragraph).
From what I have read, is the following correct? Supposing we have stored n elements in the data structures, and L is the length of the string we are searching for:
A Trie stores its keys at the leaf nodes, and if we have a positive hit for the search then this means that it will perform O(L) comparisons. For a miss however, the average performance is O(log2(n)).
Similarly, a Radix tree (with R = 2^r) stores the keys at the leaf nodes and will perform O(L) comparisons for a positive hit. However misses will be quicker, and occur on average in O(logR(n)).
A Ternary Search Trie is essentially a BST with operations <,>,= and with a character stored in every node. Instead of comparing the whole key at a node (as with BST), we only compare a character of the key at that node. Overall, supposing our alphabet size is A, then if there is a hit we must perform (at most) O(L*A) = O(L) comparisons. If there is not a hit, on average we have O(log3(n)).
With regards to the Radix tree, if for example our alphabet is {0,1} and we set R = 4, for a binary string 0101 we would only need two comparisons right? Hence if the size of our alphabet is A, we would actually only perform L * (A / R) comparisons? If so then I guess this just becomes O(L), but I was curious if this was correct reasoning.
Thanks for any help you folks can give!
Are the trie and radix trie data structures the same thing?
If they aren't the same, then what is the meaning of radix trie (AKA Patricia trie)?
A radix tree is a compressed version of a trie. In a trie, on each edge you write a single letter, while in a PATRICIA tree (or radix tree) you store whole words.
Now, assume you have the words hello, hat and have. To store them in a trie, it would look like:
   e - l - l - o
  /
h - a - t
      \
       v - e
And you need nine nodes. I have placed the letters in the nodes, but in fact they label the edges.
In a radix tree, you will have:
              *
             /
         (ello)
           /
 * - h - * - (a) - * - (t) - *
           \
           (ve)
             \
              *
and you need only five nodes. In the picture above nodes are the asterisks.
So, overall, a radix tree takes less memory, but it is harder to implement. Otherwise the use case of both is pretty much the same.
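As a quick sanity check of those counts, you can model both structures as nested dicts, where each key is one edge leading to one non-root node, and count (a throwaway sketch of mine):

def count_nodes(node):
    # each dict key is one edge, and each edge leads to exactly one node
    return len(node) + sum(count_nodes(child) for child in node.values())

trie  = {"h": {"e": {"l": {"l": {"o": {}}}},
               "a": {"t": {}, "v": {"e": {}}}}}
radix = {"h": {"ello": {},
               "a": {"t": {}, "ve": {}}}}
print(count_nodes(trie), count_nodes(radix))  # 9 5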
My question is whether the Trie data structure and the Radix Trie are the same thing.
In short, no. "Radix Trie" describes a particular category of Trie, but that doesn't mean that all tries are radix tries.
If they are[n't] same, then what is the meaning of Radix trie (aka Patricia Trie)?
I assume you meant to write aren't in your question, hence my correction.
Similarly, PATRICIA denotes a specific type of radix trie, but not all radix tries are PATRICIA tries.
What is a trie?
"Trie" describes a tree data structure suitable for use as an associative array, where branches or edges correspond to parts of a key. The definition of parts is rather vague, here, because different implementations of tries use different bit-lengths to correspond to edges. For example, a binary trie has two edges per node that correspond to a 0 or a 1, while a 16-way trie has sixteen edges per node that correspond to four bits (or a hexidecimal digit: 0x0 through to 0xf).
This diagram, retrieved from Wikipedia, seems to depict a trie with (at least) the keys 'A', 'to', 'tea', 'ted', 'ten', 'i', 'in' and 'inn' inserted:
If this trie were to store items for the keys 't' or 'te' there would need to be extra information (the numbers in the diagram) present at each node to distinguish between nullary nodes and nodes with actual values.
What is a radix trie?
"Radix trie" seems to describe a form of trie that condenses common prefix parts, as Ivaylo Strandjev described in his answer. Consider that a 256-way trie which indexes the keys "smile", "smiled", "smiles" and "smiling" using the following static assignments:
root['s']['m']['i']['l']['e']['\0'] = smile_item;
root['s']['m']['i']['l']['e']['d']['\0'] = smiled_item;
root['s']['m']['i']['l']['e']['s']['\0'] = smiles_item;
root['s']['m']['i']['l']['i']['n']['g']['\0'] = smiling_item;
Each subscript accesses an internal node. That means to retrieve smile_item, you must access seven nodes. Eight node accesses correspond to smiled_item and smiles_item, and nine to smiling_item. For these four items, there are fourteen nodes in total. They all have the first four bytes (corresponding to the first four nodes) in common, however. By condensing those four bytes to create a root that corresponds to ['s']['m']['i']['l'], four node accesses have been optimised away. That means less memory and fewer node accesses, which is a very good indication. The optimisation can be applied recursively to reduce the need to access unnecessary suffix bytes. Eventually, you get to a point where you're only comparing differences between the search key and indexed keys at locations indexed by the trie. This is a radix trie.
root = smil_dummy;
root['e'] = smile_item;
root['e']['d'] = smiled_item;
root['e']['s'] = smiles_item;
root['i'] = smiling_item;
To retrieve items, each node needs a position. With a search key of "smiles" and a root.position of 4, we access root["smiles"[4]], which happens to be root['e']. We store this in a variable called current. current.position is 5, which is the location of the difference between "smiled" and "smiles", so the next access will be root["smiles"[5]]. This brings us to smiles_item, and the end of our string. Our search has terminated, and the item has been retrieved, with just three node accesses instead of eight.
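Here is a rough Python sketch of that walk (the Node class and its field names are my own naming for what the answer leaves implicit):

class Node:
    def __init__(self, position, key=None):
        self.position = position  # index of the key character to branch on
        self.children = {}        # branch character -> child Node
        self.key = key            # stored key, checked once the walk stops

def lookup(root, key):
    node = root
    while node.children and node.position < len(key):
        child = node.children.get(key[node.position])
        if child is None:
            break
        node = child
    return node if node.key == key else None

# the "smil" family from above: the root branches at index 4, 'e' at index 5
root = Node(4)
e = Node(5, key="smile")
e.children = {'d': Node(6, key="smiled"), 's': Node(6, key="smiles")}
root.children = {'e': e, 'i': Node(6, key="smiling")}
print(lookup(root, "smiles").key)  # smiles, found in three node accesses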
What is a PATRICIA trie?
A PATRICIA trie is a variant of radix tries for which there should only ever be n nodes used to contain n items. In our crudely demonstrated radix trie pseudocode above, there are five nodes in total: root (which is a nullary node; it contains no actual value), root['e'], root['e']['d'], root['e']['s'] and root['i']. In a PATRICIA trie there should only be four. Let's take a look at how these prefixes might differ by looking at them in binary, since PATRICIA is a binary algorithm.
smile: 0111 0011 0110 1101 0110 1001 0110 1100 0110 0101 0000 0000 0000 0000
smiled: 0111 0011 0110 1101 0110 1001 0110 1100 0110 0101 0110 0100 0000 0000
smiles: 0111 0011 0110 1101 0110 1001 0110 1100 0110 0101 0111 0011 0000 0000
smiling: 0111 0011 0110 1101 0110 1001 0110 1100 0110 1001 0110 1110 0110 0111 ...
Let us consider that the nodes are added in the order they are presented above. smile_item is the root of this tree. The difference, which the sketch below locates, is in the last byte of "smile", at bit 41 (numbering bits from 0 at the most significant bit of the first byte). Up until this point, all of our nodes have the same prefix. smiled_node belongs at smile_node[0]. The difference between "smiled" and "smiles" occurs at bit 43, where "smiles" has a '1' bit, so smiled_node[1] is smiles_node.
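You can verify those bit offsets with a few lines of Python (a throwaway helper of mine; bits are numbered from 0 starting at the most significant bit of the first byte):

def first_diff_bit(a: bytes, b: bytes) -> int:
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # position of the highest-order differing bit within byte i
            return i * 8 + (8 - (x ^ y).bit_length())
    raise ValueError("one key is a prefix of the other")

print(first_diff_bit(b"smile\x00", b"smiled"))  # 41
print(first_diff_bit(b"smiled", b"smiles"))     # 43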
Rather than using NULL as branches and/or extra internal information to denote when a search terminates, the branches link back up the tree somewhere, so a search terminates when the offset to test decreases rather than increases. Here's a simple diagram of such a tree (though PATRICIA really is more of a cyclic graph than a tree, as you'll see), which was included in Sedgewick's book mentioned below:
A more complex PATRICIA algorithm involving keys of variant length is possible, though some of the technical properties of PATRICIA are lost in the process (namely that any node contains a common prefix with the node prior to it):
By branching like this, there are a number of benefits:
Every node contains a value, including the root. As a result, the length and complexity of the code becomes a lot shorter and probably a bit faster in reality.
At least one branch and at most k branches (where k is the number of bits in the search key) are followed to locate an item.
The nodes are tiny, because they store only two branches each, which makes them fairly suitable for cache locality optimisation.
These properties make PATRICIA my favourite algorithm so far...
I'm going to cut this description short here, in order to reduce the severity of my impending arthritis, but if you want to know more about PATRICIA you can consult books such as "The Art of Computer Programming, Volume 3" by Donald Knuth, or any of the "Algorithms in {your-favourite-language}, parts 1-4" by Sedgewick.
TRIE:
We can have a search scheme where, instead of comparing a whole search key with all existing keys (such as in a hash scheme), we compare each character of the search key. Following this idea, we can build a structure (as shown below) which has three existing keys – "dad", "dab", and "cab".
              [root]
          ...// | \\...
             |     \
             c      d
             |       \
            [*]      [*]
       ...//|\..   ../|\\...     Fig-I
            a         a
            |         |
           [*]       [*]
      ...//|\..   ../|\\...
           |         |  \
           b         b   d
           |         |    \
          []        []    []
         (cab)     (dab)  (dad)
This is essentially an M-ary tree with internal nodes, represented as [*], and leaf nodes, represented as [].
This structure is called a trie. The branching decision at each node can be kept equal to the number of unique symbols of the alphabet, say R. For the lower-case English alphabet a-z, R = 26; for extended ASCII alphabets, R = 256; and for binary digits/strings, R = 2.
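A bare-bones sketch of such an R-way trie in Python (my own illustration, assuming the lower-case a-z alphabet, so R = 26):

R = 26  # alphabet size for lower-case English letters

class TrieNode:
    def __init__(self):
        self.children = [None] * R  # one slot per symbol of the alphabet
        self.is_key = False         # marks a node where a key ends

def insert(root, key):
    node = root
    for ch in key:
        i = ord(ch) - ord('a')      # branching decision on the current symbol
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_key = True

root = TrieNode()
for key in ("dad", "dab", "cab"):
    insert(root, key)

The size-R child array at every node is exactly what the next section's compaction tries to avoid wasting.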
Compact TRIE:
Typically, a node in a trie uses an array of size R and thus wastes memory when each node has only a few edges. To circumvent the memory concern, various proposals were made. Based on those variations, tries are also named "compact trie" and "compressed trie". While consistent nomenclature is rare, the most common version of a compact trie is formed by grouping all edges where nodes have a single edge. Using this concept, the above (Fig-I) trie with keys "dad", "dab", and "cab" can take the form below.
            [root]
        ...// | \\...
           |     \
          cab     da
           |       \
          [ ]      [*]      Fig-II
               ../|\\...
                 |  \
                 b   d
                 |    \
                []    []
Note that each of 'c', 'a', and 'b' is the sole edge of its corresponding parent node, and therefore they are conglomerated into the single edge "cab". Similarly, 'd' and 'a' are merged into the single edge labelled "da".
Radix Trie:
The term radix, in mathematics, means the base of a number system, and it essentially indicates the number of unique symbols needed to represent any number in that system. For example, the decimal system is radix ten, and the binary system is radix two. Using a similar concept, when we're interested in characterizing a data structure or an algorithm by the number of unique symbols of the underlying representational system, we tag the concept with the term "radix" – for example, "radix sort" for a certain sorting algorithm. By the same logic, all variants of trie whose characteristics (such as depth, memory need, search miss/hit runtime, etc.) depend on the radix of the underlying alphabet may be called radix tries. For example, an un-compacted as well as a compacted trie that uses the alphabet a-z can be called a radix 26 trie. Any trie that uses only two symbols (traditionally '0' and '1') can be called a radix 2 trie. However, much of the literature has somehow restricted the use of the term "Radix Trie" to the compacted trie only.
Prelude to PATRICIA Tree/Trie:
It would be interesting to notice that even string keys can be represented using a binary alphabet. If we assume ASCII encoding, then the key "dad" can be written in binary form by writing the binary representation of each character in sequence, as "011001000110000101100100", i.e. the binary forms of 'd', 'a', and 'd' written sequentially.
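That bit string is easy to reproduce, assuming ASCII and 8 bits per character:

print("".join(f"{ord(c):08b}" for c in "dad"))
# 011001000110000101100100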
Using this concept, a trie (with radix two) can be formed. Below we depict this concept using the simplified assumption that the letters 'a', 'b', 'c', and 'd' are from a smaller alphabet instead of ASCII.
Note for Fig-III:
As mentioned, to make the depiction easy, let's assume an alphabet with only 4 letters {a, b, c, d} and corresponding binary representations "00", "01", "10" and "11" respectively. With this, our string keys "dad", "dab", and "cab" become "110011", "110001", and "100001" respectively. The trie for this is shown below in Fig-III (bits are read from left to right, just like strings are read from left to right).
[root]
    \1
     \
     [*]
    0/ \1
    /   \
  [*]   [*]
  0/     /
  /     /0
 [*]   [*]
 0/     /
 /     /0
[*]   [*]
0/    0/ \1        Fig-III
/     /   \
[*]  [*]   [*]
 \1    \1    \1
  \     \     \
  []    []    []
(cab)  (dab)  (dad)
PATRICIA Trie/Tree:
If we compact the above binary trie (Fig-III) using single-edge compaction, it would have far fewer nodes than shown above, and yet the nodes would still number more than 3, the number of keys it contains. Donald R. Morrison found (in 1968) an innovative way to use a binary trie to depict N keys using only N nodes, and he named this data structure PATRICIA. His trie structure essentially got rid of single edges (one-way branching); and in doing so, he also got rid of the notion of two kinds of nodes – inner nodes (that don't depict any key) and leaf nodes (that depict keys). Unlike the compaction logic explained above, his trie uses a different concept, where each node includes an indication of how many bits of a key are to be skipped to make the branching decision. Yet another characteristic of his PATRICIA trie is that it doesn't store the keys – which means such a data structure is not suitable for answering questions like "list all keys that match a given prefix", but it is good for finding whether a key exists in the trie or not. Nonetheless, the term Patricia Tree or Patricia Trie has since been used in many different but similar senses, such as to indicate a compact trie [NIST], or to indicate a radix trie with radix two [as indicated in a subtle way in WIKI], and so on.
Trie that may not be a Radix Trie:
Ternary Search Trie (aka Ternary Search Tree), often abbreviated as TST, is a data structure (proposed by J. Bentley and R. Sedgewick) which looks very similar to a trie with three-way branching. In such a tree, each node has a characteristic character 'x', so that the branching decision is driven by whether a character of the key is less than, equal to, or greater than 'x'. Due to this fixed 3-way branching, it provides a memory-efficient alternative to the trie, especially when R (the radix) is very large, such as for Unicode alphabets. Interestingly, the TST, unlike the (R-way) trie, doesn't have its characteristics influenced by R. For example, a search miss in a TST costs ln(N), as opposed to log_R(N) for an R-way trie. The memory requirement of a TST, unlike that of an R-way trie, is not a function of R either. So we should be careful about calling a TST a radix trie. I, personally, don't think we should call it a radix trie, since none (as far as I know) of its characteristics are influenced by the radix, R, of its underlying alphabet.
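For concreteness, here is a compact TST sketch in Python (the node layout follows the three-way branching just described; the code itself is my own illustration, not Bentley and Sedgewick's):

class TSTNode:
    def __init__(self, ch):
        self.ch = ch                      # the node's characteristic character
        self.lt = self.eq = self.gt = None
        self.is_key = False

def insert(node, key, i=0):
    ch = key[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lt = insert(node.lt, key, i)
    elif ch > node.ch:
        node.gt = insert(node.gt, key, i)
    elif i < len(key) - 1:
        node.eq = insert(node.eq, key, i + 1)
    else:
        node.is_key = True
    return node

def contains(node, key, i=0):
    while node is not None:
        ch = key[i]
        if ch < node.ch:
            node = node.lt
        elif ch > node.ch:
            node = node.gt
        elif i < len(key) - 1:
            node, i = node.eq, i + 1
        else:
            return node.is_key
    return False

root = None
for w in ("dad", "dab", "cab"):
    root = insert(root, w)
print(contains(root, "dab"), contains(root, "da"))  # True False

Note that each node stores exactly three child references regardless of the alphabet, which is the R-independence discussed above.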
In tries, most of the nodes don’t store keys and are just hops on a
path between a key and the ones that extend it. Most of these hops are
necessary, but when we store long words, they tend to produce long
chains of internal nodes, each with just one child. This is the main
reason tries need too much space, sometimes more than BSTs.
Radix tries (aka radix trees, aka Patricia trees) are based on the
idea that we can somehow compress the path, for example after
"intermediate t node", we could have "hem" in one node, or "idote" in
one node.
Here is a graph to compare trie vs radix trie:
The original trie has 9 nodes and 8 edges, and if we assume 9 bytes
for an edge, with a 4-byte overhead per node, this means
9 * 4 + 8 * 9 = 108 bytes.
The compressed trie on the right has 6 nodes and 5 edges but in this
case each edge carries a string, not just a character; however, we can
simplify the operation by accounting for edge references and string
labels separately. This way, we would still count 9 bytes per edge
(because we would include the string terminator byte in the edge
cost), but we could add the sum of string lengths as a third term in
the final expression; the total number of bytes needed is given by
6 * 4 + 5 * 9 + 8 * 1 = 77 bytes.
For this simple trie, the compressed version requires 30% less
memory.
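Replaying that arithmetic in a couple of lines:

# 4 bytes of per-node overhead, 9 bytes per edge reference, and (for the
# compressed trie) 1 byte per label character including the terminator
trie_bytes  = 9 * 4 + 8 * 9           # 108
radix_bytes = 6 * 4 + 5 * 9 + 8 * 1   # 77
print(trie_bytes, radix_bytes, 1 - radix_bytes / trie_bytes)  # ~29% saved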
Since every integer can be represented as a series of bits of some length, it seems like you could sort a bunch of integers in the following way:
Insert each integer into a binary trie (a trie where each node has two children, one labeled 0 and one labeled 1).
Using the standard algorithm for listing all words in a trie in sorted order, list off all the integers in sorted order.
Is this sorting algorithm ever used in practice?
I haven't actually seen this algorithm ever used before. This is probably because of the huge memory overhead - every bit of the numbers gets blown up into a node containing two pointers and a bit of its own. On a 32-bit system, this is a 64x blowup in memory, and on 64-bit system this is a 128x blowup in memory.
However, this algorithm is extremely closely related to most-significant digit radix sort (also called binary quicksort), which is used frequently in practice. In fact, you can think of binary quicksort as a space-efficient implementation of this algorithm.
The connection is based on the recursive structure of the trie. If you think about the top node of the trie, it will look like this:
           *
          / \
         /   \
    All #s   All #s
     with     with
    MSB 0    MSB 1
If you were to use the standard pre-order traversal of the trie to output all the numbers in sorted order, the algorithm would first print out all numbers starting with a 0 in sorted order, then print out all numbers starting with a 1 in sorted order. (The root node is never printed, since all numbers have the same number of bits in them, and therefore all the numbers are actually stored down at the leaves).
So suppose that you wanted to simulate the effect of building the trie and doing a preorder traversal on it. You would know that this would start off by printing out all the numbers with MSB 0, then would print all the numbers with MSB 1. Accordingly, you could start off by splitting the numbers into two groups - one with all numbers having MSB 0, and one with all numbers having MSB 1. You would then recursively sort all the numbers with MSB 0 and print them out, then recursively sort all the numbers starting with MSB 1 and print them out. This recursive process would continue until eventually you had gone through all of the bits of the numbers, at which point you would just print out each number individually.
The above process is almost identical to the binary quicksort algorithm, which works like this:
If there are no numbers left, do nothing.
If there is one number left, print it out.
Otherwise:
Split the numbers into two groups based on their first bit.
Recursively sort and print the numbers starting with 0.
Recursively sort and print the numbers starting with 1.
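Here is a runnable Python sketch of those steps (my own rendering, assuming 32-bit unsigned integers and collecting output into a list instead of printing):

def binary_quicksort(nums, bit=31, out=None):
    out = [] if out is None else out
    if len(nums) <= 1 or bit < 0:
        out.extend(nums)                   # zero or one number (or bits exhausted)
        return out
    zeros = [n for n in nums if not (n >> bit) & 1]
    ones  = [n for n in nums if (n >> bit) & 1]
    binary_quicksort(zeros, bit - 1, out)  # numbers with this bit 0 come first
    binary_quicksort(ones,  bit - 1, out)  # then numbers with this bit 1
    return out

print(binary_quicksort([13, 2, 7, 2, 40, 0]))  # [0, 2, 2, 7, 13, 40]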
There are some differences between these algorithms. Binary quicksort works by recursively splitting the list into smaller and smaller pieces until everything is sorted, while the trie-based algorithm builds the trie and then reconstructs the numbers. However, you can think of binary quicksort as an optimization of the algorithm that simultaneously builds up and walks the trie.
So in short: I doubt anyone would want to use the trie-based algorithm due to the memory overhead, but it does give a great starting point for deriving the MSD radix sort algorithm!
Hope this helps!