implementing binary trees - data structure, trie - binary-tree

Binary trees. The principle is understandable, but what do they really look like in terms of arrays or associative arrays?
If the data structures I have available to me are:
AssociativeArray={tag:value,tag:value,tag:value}
(of course, each tag is unique)
and
Array=[value,value,value]
(where the value can be any data type including array)
examples:
DictOfWords={greeting:"hello",sound:"music",sleep:"dream",words:["hello","music","dream"]}
listOfWords=["hello","music","dream",DictOfWords]
what would a binary tree built out of one or both of those look like?
further, what would the data structure for a trie for word search look like built from those data structures?
What would a node of a trie look like? Would it be an Associative Array or a linear array or some combination of the two? I understand from this post that "A trie holds one node per character"
so would the top level structure be something like:
trie={a:{},b:{},c:{}...}
or
trie={a:[],b:[],c:[]...}
or
trie=["a","b","c"...]

Tries and Binary Trees are usually not thought of as arrays or associative arrays. They are thought of as collections of nodes, usually implemented as a struct. For example, binary trees tend to look like
struct BTreeNode
{
    value_type value;
    BTreeNode* left;
    BTreeNode* right;
};
And Tries tend to look like
struct TrieNode
{
    char_type letter;
    associative_array<char_type, TrieNode*> children;
};
Now if you are only looking to model this with arrays and associative arrays, the question is going to be: what do you intend to do with them? If all you need to do is store data in a tree/trie structure, you have a lot of options. However, if you actually want to use a BTree as a BTree or a Trie as a Trie, we have to make sure that whatever transformation you use to convert structures to arrays/associative arrays works. The easiest one: treat each struct as an associative array with a constant number of entries
      4
     / \
    2   5
   / \   \
  1   3   6
would usually be built as:
BTreeNode oneNode(1, null, null);
BTreeNode threeNode(3, null, null);
BTreeNode twoNode(2, oneNode, threeNode);
BTreeNode sixNode(6, null, null);
BTreeNode fiveNode(5, null, sixNode);
BTreeNode fourNode(4, twoNode, fiveNode);
You can do a 1-to-1 conversion of those structs to associative arrays and get
fourNode = {
    value: 4,
    left: {
        value: 2,
        left: {
            value: 1
        },
        right: {
            value: 3
        }
    },
    right: {
        value: 5,
        right: {
            value: 6
        }
    }
}
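To see that the nested associative-array form still behaves like a BST, here is a minimal Python sketch of a lookup over that same structure (the function name bst_contains is mine, not part of the answer above):
def bst_contains(node, target):
    # Walk the nested dicts exactly as you would walk linked nodes.
    while node is not None:
        if target == node["value"]:
            return True
        # Absent children are simply missing keys; .get() returns None for them.
        node = node.get("left") if target < node["value"] else node.get("right")
    return False

four_node = {"value": 4,
             "left": {"value": 2, "left": {"value": 1}, "right": {"value": 3}},
             "right": {"value": 5, "right": {"value": 6}}}
print(bst_contains(four_node, 3))   # True
print(bst_contains(four_node, 7))   # False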
There is a comparable conversion to arrays, but it is less obvious to read.
A comparable trie storing "abc", "abd", "abe", "ace" creates a trie structure that looks like
       a
      / \
     b   c
   / | \  \
  c  d  e  e
Doing the same conversion from structs to associative arrays as above, you get
trie = {
    letter: 'a',
    children: {
        'b': {
            letter: 'b',
            children: {
                'c': { letter: 'c' },
                'd': { letter: 'd' },
                'e': { letter: 'e' }
            }
        },
        'c': {
            letter: 'c',
            children: {
                'e': { letter: 'e' }
            }
        }
    }
}
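As a rough illustration of how such letter/children nodes get used, here is a small Python sketch of insertion and lookup. Two assumptions that are not in the structure above: it adds an empty root node above the 'a' level, and it marks word ends with an is_word flag.
def make_node(letter):
    # One node per character: the letter plus a map from next letter to child node.
    return {"letter": letter, "children": {}}

def insert_word(root, word):
    node = root
    for ch in word:
        node = node["children"].setdefault(ch, make_node(ch))
    node["is_word"] = True          # mark the end of a complete word

def contains_word(root, word):
    node = root
    for ch in word:
        node = node["children"].get(ch)
        if node is None:
            return False
    return node.get("is_word", False)

root = make_node("")                # empty root above the 'a' level
for w in ["abc", "abd", "abe", "ace"]:
    insert_word(root, w)
print(contains_word(root, "abd"))   # True
print(contains_word(root, "abx"))   # False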
However, standing by my original comments, "what do they really look like in terms of arrays or associative arrays?" is unanswerable. They don't actually get implemented as arrays or associative arrays at all, so "really look like" cannot be used alongside "arrays or associative arrays." Think of them in terms of the node structures that they are really constructed from, and you will go much further.
For example, there is an idea of a self balancing binary tree. Those structures are very easy to understand if you think of the structures as a bunch of nodes linked together. If you try to think of a self balancing binary tree in terms of arrays/associative arrays, you will have a LOT of trouble, because they tend to have a pointer back to their parent, which creates really messed up looking associative arrays.
struct SelfBalancingBTreeNode
{
    value_type value;
    SelfBalancingBTreeNode* parent;
    SelfBalancingBTreeNode* left;
    SelfBalancingBTreeNode* right;
};
To model this you need to have really interesting associative array structures
leftNode = { value: 1, parent: null, left: null, right: null}
parentNode = { value: 2, parent: null, left: leftNode, right: null}
leftNode['parent'] = parentNode
This creates cycles, which are not commonly encountered when using associative arrays.

Binary tree:
       1
     /   \
    2     3
   / \   / \
  4   5 6   7
Can be represented as:
[1, 2, 3, 4, 5, 6, 7]
Thus, using 1-based indexing, the node at index i has its children at indices 2i and 2i+1.
Or it can be represented as:
{1:[2,3], 2:[4,5], 3:[6,7]}
With a reference to the root somewhere.
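To make the index arithmetic of the flat-array form concrete, here is a small Python sketch (assuming the 1-based convention above, so slot 0 is left unused):
tree = [None, 1, 2, 3, 4, 5, 6, 7]   # slot 0 unused so children of i sit at 2i and 2i+1

def children(i):
    # Indices of the children that actually exist within the array.
    return [j for j in (2 * i, 2 * i + 1) if j < len(tree)]

print(tree[1], [tree[j] for j in children(1)])   # 1 [2, 3]
print(tree[2], [tree[j] for j in children(2)])   # 2 [4, 5]
print(children(4))                               # [] - index 4 (value 4) is a leaf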
Trie:
       1
    a / \ b
     2   3
  a / \ b
   4   5
Can be represented as:
{1:{a:2, b:3},
2:{a:4, b:5},
3:{},
4:{},
5:{}
}

Related

Finding all subsequences from dictionary

In a program I need to efficiently answer queries of the following form:
Given a set of strings A and a query string q return all s ∈ A such that s is a subsequence of q
For example, given A = {"abc", "aaa", "abd"} and q = "abcd", "abc" and "abd" should be returned.
Is there any better way than iterating each element of A and checking if it is a subsequence of q?
NOTE: I have a STRIPS planner (automated planner) in mind. Each state in a STRIPS planner is a set of propositions like {"(room rooma)", "(at-robby rooma)", "(at ball1 rooma)"}. I want to find all ground actions applicable to a given state. Actions in a STRIPS planner basically consist of two parts, preconditions and effects (the effects are not really relevant here). Preconditions are the set of propositions that must be true to apply an action to a state. For example, to apply an action "(move rooma roomb)", its preconditions {"(room rooma)", "(room roomb)", "(at-robby rooma)"} must all be true in the state.
If your set A is large and you have many queries, you could implement a trie-like structure, where level n refers to character n in a string. In your example:
trie = {
    a: {
        a: {
            a: { value: "aaa" }
        },
        b: {
            c: { value: "abc" },
            d: { value: "abd" }
        }
    }
}
That would enable you to look up matches in a forked path through the trie:
function query(trie, q) {
    s = Set();
    if (q.isEmpty()) {
        if (trie.value) s.add(trie.value);
    } else {
        // Option 1: skip the first character of q
        s = s.union(query(trie, substr(q, 1)));
        // Option 2: consume it, if this node has a matching child
        c = substr(q, 0, 1);
        if (trie[c]) {
            s = s.union(query(trie[c], substr(q, 1)));
        }
    }
    return s;
}
Effectively, you will generate all 2^m subsets of the query string of m characters, but in practice the trie is sparse and you end up checking fewer paths.
The speed payoff comes with many lookups. Building the trie is more costly than doing a brute-force lookup. But if you build the trie only once, or have a means to update the trie when you update the set A, you will get good lookup performance.
The actual data structure for the trie nodes depends on how many possible elements the items can have. In your example, only four letters are used. If you have a limited range of "letters", you can use an array. Otherwise you might need a sort of dictionary, which might make the tree quite big in memory.
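For reference, here is a small Python sketch of the whole idea, building the trie from A and then running the skip-or-consume search; the names build_trie and query_subsequences are mine, and the sketch stores the finished word under a "value" key as in the structure above:
def build_trie(words):
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node["value"] = word              # mark the node that ends a stored word
    return trie

def query_subsequences(trie, q):
    found = set()
    def walk(node, i):
        if "value" in node:               # every word reached this way is a subsequence of q
            found.add(node["value"])
        if i == len(q):
            return
        walk(node, i + 1)                 # skip q[i]
        child = node.get(q[i])
        if child is not None:
            walk(child, i + 1)            # consume q[i]
    walk(trie, 0)
    return found

trie = build_trie(["abc", "aaa", "abd"])
print(query_subsequences(trie, "abcd"))   # {'abc', 'abd'}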

How to quick search in huge range-value pairs?

For example we have some range-value pairs:
<<1, 2>, 65>
<<3, 37>, 75>
<<45, 159>, 12>
<<160, 200>, 23>
<<210, 255>, 121>
And these ranges are disjoint.
Given the integer 78, the corresponding range is <45, 159>, so output the value 12.
As the range may be very large, I currently use a map to store these range-value pairs.
Every search scans the entire map, which is O(n).
Are there any good ways other than binary search?
Binary search is definitely your best option. O(log n) is hard to beat!
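For a flavour of the hand-rolled version, here is a short Python sketch using the standard bisect module (the variable names are mine; it assumes the disjoint ranges are kept sorted by their lower bound):
import bisect

# The disjoint ranges from the question, sorted by lower bound, plus their values.
ranges = [(1, 2), (3, 37), (45, 159), (160, 200), (210, 255)]
values = [65, 75, 12, 23, 121]
starts = [lo for lo, hi in ranges]

def lookup(x):
    # Index of the last range whose lower bound is <= x.
    i = bisect.bisect_right(starts, x) - 1
    if i >= 0 and ranges[i][0] <= x <= ranges[i][1]:
        return values[i]
    return None        # x falls in a gap between ranges

print(lookup(78))      # 12
print(lookup(40))      # None (gap between <3, 37> and <45, 159>)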
If you don't want to hand-code the binary search, and you're using C++, you can use the std::map::upper_bound function, which implements binary search:
std::map<std::pair<int, int>, int> my_map;
int val = 78; // value to search for
// Search with the largest possible second component so that a range whose
// lower bound equals val is not skipped by the decrement below.
auto it = my_map.upper_bound(std::make_pair(val, std::numeric_limits<int>::max()));
if (it != my_map.begin()) it--;
if (!my_map.empty() && it->first.first <= val && it->first.second >= val) {
    /* Found it, value is it->second */
} else {
    /* Didn't find it */
}
std::map::upper_bound is guaranteed to run in logarithmic time, because the map is inherently sorted (and usually implemented as a red-black tree). It returns the element after the element you're searching for, and hence the iterator that it returns is decremented to obtain a useful iterator.
How about a Binary search tree?
Though you'll probably need a self-balancing search tree: AVL trees and red-black trees are a good start, but you should probably stick with something that already exists in your ecosystem. For .NET that would be a SortedDictionary, probably with a custom IComparer, but that's just an example...
For C++ you should use an ordered map with a custom comparator (see the example below).
Links:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
http://msdn.microsoft.com/en-us/library/f7fta44c%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/8ehhxeaf(v=vs.110).aspx
http://www.cplusplus.com/reference/map/map/
Again for C++, I would do something like (without compiler, so careful) :
struct Comparer
{
    bool operator()(pair<int,int> const & one, pair<int,int> const & two) const
    {
        // One range is "less than" another if it ends before the other begins;
        // overlapping ranges compare as equivalent, which is what makes the lookup work.
        return one.second < two.first;
    }
};
then you define the map as:
map<pair<int,int>,int,Comparer> myMap;
myMap.insert(make_pair(make_pair(-3,9),1));
int num = myMap[make_pair(7,7)]; // the query point 7 falls inside <-3, 9>, so num == 1

O(1) find value from a key in a range

What kind of data structure would allow me to get a corresponding value from a given key in a set of ordered range-like keys, where my key is not necessarily in the set.
Consider, [key, value]:
[3, 1]
[5, 2]
[10, 3]
Looking up 3 or 4 would return 1, 5 - 9 would return 2 and 10 would return 3. The ranges are not constant sized.
O(1) or near-O(1) is important, if possible.
A balanced binary search tree will give you O(log n).
What about a key-indexed array? Say you know your keys are below 1000; you can simply fill an int[1000] with values, like this:
[0,0]
[1,0]
[2,0]
[3,1]
[4,1]
[5,2]
......
And so on. That'll give you O(1) performance, but huge memory overhead.
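A rough Python sketch of that precomputed-table idea, using the key/value pairs from the question (the table size of 1000 follows the example above and is an assumption about the largest key you will ever look up):
# Breakpoint keys and their values, as in the question: 3 -> 1, 5 -> 2, 10 -> 3.
keys = [3, 5, 10]
vals = [1, 2, 3]

TABLE_SIZE = 1000            # assumes all lookups stay below this bound
table = [0] * TABLE_SIZE     # 0 = "no value" for keys below the first breakpoint
for k, v in zip(keys, vals):
    for i in range(k, TABLE_SIZE):
        table[i] = v         # later breakpoints overwrite earlier ones

print(table[4])    # 1
print(table[7])    # 2
print(table[10])   # 3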
Otherwise, a hash table is the closest I know of. Hope it helps.
Edit: look up red-black trees; a red-black tree is a self-balancing tree with a worst case of O(log n) for searching.
I would use a Dictionary in this scenario. Retrieving a value by using its key is very fast, close to O(1).
Dictionary<int, int> myDictionary = new Dictionary<int, int>();
myDictionary.Add(3,1);
myDictionary.Add(5,2);
myDictionary.Add(10,3);
//If you know the key exists, you can use
int value = myDictionary[3];
//If you don't know if the key is in the dictionary, use the TryGetValue method
//(this reuses the "value" variable declared above)
if (myDictionary.TryGetValue(3, out value))
{
    //The key was found and the corresponding value is stored in "value"
}
For more info: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

How many BSTs can you make by inserting 11 nodes to get a tree of height 3?

A BST is generated (by successive insertion of nodes) from each permutation of keys from the set
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}.
How many permutations determine trees of height three?
The number of permutations of nodes you have to check is 11! = 39,916,800, so you could just write a program to brute-force this. Here's a skeleton of one, written in C++:
vector<int> values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
unsigned numSuccesses = 0;
do {
    if (bstHeightOf(values) == 3) numSuccesses++;
} while (next_permutation(values.begin(), values.end()));
Here, you just need to write the bstHeightOf function, which computes the height of a BST formed by inserting the given nodes in the specified order. I'll leave this as an exercise.
You can prune down the search space a bunch by using these observations:
The maximum number of nodes in a BST of height 2 is 7.
The root can't be 1, 2, 3, 9, 10, or 11, because if it were, one subtree would have more than 7 nodes in it and therefore the overall tree would have height greater than three.
Given that you know the possible roots, one option would be to generate all BSTs with the keys {1, 2, 3, ..., 11} (not by listing off all orderings, but by listing off all trees), filter it down just to the set of nodes with height 3, and then use this recursive algorithm to count the number of ways each tree can be built by inserting values. This would probably be significantly faster than the above approach, since the number of trees to check is much lower than the number of orderings and each tree can be checked in linear time.
Hope this helps!
An alternative to templatetypedef's answer that might be trickier, but can be done completely by hand.
Consider the complete binary tree of height 3: it has 15 nodes. You're looking for trees with 11 nodes; that means that four of those 15 nodes are missing. The patterns in which these missing nodes can occur can be enumerated with fairly little effort. (Hint: I did this by dividing the patterns into two groups.) This will give you all the shapes of trees of height 3 with 11 nodes.
Once you've done this, you just need to reason about the relationship between these tree shapes and the actual trees you're looking for. (Hint: this relationship is extremely simple - don't overthink it.)
This allows you to enumerate the resulting trees that satisfy the requirements. If you get to 96, you have the same result as I do. For each of these trees, we now need to find how many permutations give rise to that tree.
This part is the tricky part; you might now need to split these trees up into smaller groups for which you know, by symmetry, that the number of permutations that gives rise to that tree is the same for all trees in a group. For example,
          6
        /   \
       /     \
      3       8
     / \     / \
    2   5   7   10
   /   /       /  \
  1   4       9    11
is going to have the same number of permutations that give rise to it as
          6
        /   \
       /     \
      4       9
     / \     / \
    2   5   7   11
   / \       \   /
  1   3       8 10
You'll also need to find out how many trees occur in each group; the class of this example contains 16 trees. (Hint: I split them up into 7 groups of between 2 and 32 trees.) Now you'll need to find the number of permutations that give rise to such a tree, for each group. You can determine this "recursively", still on paper; for the class containing the two example trees above, I get 12096 permutations. Since that class contains 16 trees, the total number of permutations leading to such a tree is 16*12096 = 193536. Do the same for the six other classes and add the numbers up to get the total.
If any particular part of this solution has you stumped or anything is unclear, don't hesitate to ask!
Since this site is about programming, I'll provide code to determine this. We can use a backtracking algorithm that backtracks as soon as the height constraint is violated.
We can implement the BST as a flat array, where the children of a node at index k are stored at indices 2*k and 2*k + 1. The root is at index 1. Index 0 is not used. When an index is not occupied we can store a special value there, like -1.
The algorithm is quite brute force, and on my laptop it takes about 1.5 seconds to complete:
function insert(tree, value) {
    let k = 1;
    while (k < tree.length) {
        if (tree[k] == -1) {
            tree[k] = value;
            return k;
        }
        k = 2*k + (value > tree[k] ? 1 : 0);
    }
    return -1;
}
function populate(tree, values) {
    if (values.length == 0) return 1; // All values were inserted! Count this permutation
    let count = 0;
    for (let i = 0; i < values.length; i++) {
        let value = values[i];
        let node = insert(tree, value);
        if (node >= 0) { // Height is OK
            values.splice(i, 1); // Remove this value from remaining values
            count += populate(tree, values);
            values.splice(i, 0, value); // Backtrack
            tree[node] = -1; // Free the node
        }
    }
    return count;
}
function countTrees(n) {
    // Create an empty tree as flat array of height 3,
    // and provide n unique values to insert
    return populate(Array(16).fill(-1), [...Array(n).keys()]);
}
console.log(countTrees(11));
Output: 1056000

Effective sort-by-example algorithm

Assume we have an array of objects of length N (all objects have the same set of fields).
And we have a second array of length N whose values have the same type as a certain field of those objects (e.g. an array of numbers representing IDs).
Now we want to sort the array of objects by that field, in the same order as in the second array.
For example, here are 2 arrays (as in description) and expected result:
A = [ {id: 1, color: "red"}, {id: 2, color: "green"}, {id: 3, color: "blue"} ]
B = [ "green", "blue", "red"]
sortByColorByExample(A, B) ==
[ {id: 2, color: "green"}, {id: 3, color: "blue"}, {id: 1, color: "red"} ]
How can I efficiently implement a 'sort-by-example' function? I can't come up with anything better than O(N^2).
This is assuming you have a bijection from elements in B to elements in A:
Build a map (say M) from B's elements to their positions (O(N)).
For each element of A (O(N)), access the map to find where to put it in the sorted array (O(log N) with an efficient implementation of the map).
Total complexity: O(N log N) time and O(N) space.
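A minimal Python sketch of that approach (assuming, as stated, a bijection from B's elements to A's field values; sort_by_example and key are my names):
def sort_by_example(A, B, key):
    # Map each example value to its position in B: O(N) with a hash map.
    position = {b: i for i, b in enumerate(B)}
    result = [None] * len(A)
    for item in A:                       # O(N) lookups overall
        result[position[key(item)]] = item
    return result

A = [{"id": 1, "color": "red"}, {"id": 2, "color": "green"}, {"id": 3, "color": "blue"}]
B = ["green", "blue", "red"]
print(sort_by_example(A, B, lambda x: x["color"]))
# [{'id': 2, 'color': 'green'}, {'id': 3, 'color': 'blue'}, {'id': 1, 'color': 'red'}]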
Suppose we are sorting on an item's colour. Then create a dictionary d that maps each colour to a list of the items in A that have that colour. Then iterate across the colours in the list B, and for each colour c output (and remove) a value from the list d[c]. This runs in O(n) time with O(n) extra space for the dictionary.
Note that you have to decide what to do if A cannot be sorted according to the examples in B: do you raise an error? Choose the order that maximizes the number of matches? Or what?
Anyway, here's a quick implementation in Python:
from collections import defaultdict
def sorted_by_example(A, B, key):
    """Return a list consisting of the elements from the sequence A in the
    order given by the sequence B. The function key takes an element
    of A and returns the value that is used to match elements from B.
    If A cannot be sorted by example, raise IndexError.
    """
    d = defaultdict(list)
    for a in A:
        d[key(a)].append(a)
    return [d[b].pop() for b in B]
>>> A = [{'id': 1, 'color': 'red'}, {'id': 2, 'color': 'green'}, {'id': 3, 'color': 'blue'}]
>>> B = ['green', 'blue', 'red']
>>> from operator import itemgetter
>>> sorted_by_example(A, B, itemgetter('color'))
[{'color': 'green', 'id': 2}, {'color': 'blue', 'id': 3}, {'color': 'red', 'id': 1}]
Note that this approach handles the case where there are multiple identical values in the sequence B, for example:
>>> A = 'proper copper coffee pot'.split()
>>> B = 'ccpp'
>>> ' '.join(sorted_by_example(A, B, itemgetter(0)))
'coffee copper pot proper'
Here when there are multiple identical values in B, we get the corresponding elements in A in reverse order, but this is just an artefact of the implementation: by using a collections.deque instead of a list (and popleft instead of pop), we could arrange to get the corresponding elements of A in the original order, if that were preferred.
Make an array of arrays, call it C of size B.length.
Loop through A. If it has a color of 'green' put it in C[0], if it has a color of 'blue' put it in C[1], and if it has a color of 'red' put it in C[2].
When you're done go through C, and flatten it out to your original structure.
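A short Python sketch of that bucket-and-flatten idea (assuming every color in A also appears in B; the names index_of and C follow the description above):
A = [{"id": 1, "color": "red"}, {"id": 2, "color": "green"}, {"id": 3, "color": "blue"}]
B = ["green", "blue", "red"]

index_of = {color: i for i, color in enumerate(B)}   # which bucket each color goes to
C = [[] for _ in B]                                  # one bucket per example value
for obj in A:
    C[index_of[obj["color"]]].append(obj)

result = [obj for bucket in C for obj in bucket]     # flatten back to a single list
print(result)
# [{'id': 2, 'color': 'green'}, {'id': 3, 'color': 'blue'}, {'id': 1, 'color': 'red'}]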
Wouldn't something along the lines of a merge sort be better? Create B.length arrays, one for each element inside B, go through A and place each object in the appropriate smaller array, then when it's all done merge the arrays together. It should be around O(2n).
Iterate through the first array and make a HashMap from the key field to the list of objects with that value. O(n) [this assumes there can be duplicate values of those key fields].
For example, the key green will contain all objects with field value green.
Now iterate through the second array, get the list of objects from the HashMap for each value, and store them in another array. O(k) (where k is the number of distinct values of the field).
The total running time is O(n), but it requires some additional memory in terms of a map and an auxiliary array.
In the end you will get the array sorted as per your requirements.

Resources