How to store an arborescent tree in an array? - data-structures

I want to store a tree in an array, while being able to compute father and child index easily from the current index, like in a binary heap. The tree has a single root node, which is at level 0. The tree has N levels, each node at level i has n(i) children.
Can this be done? How?
EDIT:
Clarification: You can store a (complete) binary tree, i.e. to store a heap, in a single array without explicitly storing the indices. Root goes at 0, children of the node in position i go in 2i+1 and 2i+2. So you can compute the children from the index of the parent node, without actually needing to store the index. The data structure is implicit in the data, see http://en.wikipedia.org/wiki/Binary_heap#Heap_implementation
My question: can you generalize this to a more general tree, as detailed above.

If I understand correctly (each node at level i has n(i) children), then this is simple: the first element is the root, followed by the n(0) elements that are the root's children; then, for each of those n(0) nodes in turn, you store its n(1) children, and so on level by level.
If n(0) = 3, you store the n(1) children of the first node, after them the n(1) children of the second node, and after those the n(1) children of the third node.
1 -> 2, 5, 3 ( 1 is the root, and has 2, 5, 3 as children)
2 -> 4, 10
3 -> 45, 35
5-> 12, 31
n(0) = 3, n(1) = 2 , n(2) = 0
Then you should have: {1, 2, 5, 3, 4, 10, 12, 31, 45, 35} (the children appear in the same order as their parents 2, 5, 3)
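With this level-by-level layout the father and child positions can be computed from the n(i) values alone, without storing any index. The following is a minimal sketch under that assumption; the helper names (`level_offsets`, `locate`, `children`, `parent`) are made up for illustration, and the example array lists the children in the same order as their parents (2, 5, 3).

```python
from bisect import bisect_right
from itertools import accumulate

def level_offsets(fanout):
    """fanout[i] is n(i); returns the array index where each level starts."""
    sizes = [1]                          # level 0 holds only the root
    for f in fanout:
        sizes.append(sizes[-1] * f)      # each node at level i has n(i) children
    return [0] + list(accumulate(sizes))[:-1]

def locate(index, offsets):
    """Map a flat array index to (level, position within that level)."""
    level = bisect_right(offsets, index) - 1
    return level, index - offsets[level]

def children(index, fanout, offsets):
    level, j = locate(index, offsets)
    if level >= len(fanout):             # deepest level: no children
        return []
    first = offsets[level + 1] + j * fanout[level]
    return list(range(first, first + fanout[level]))

def parent(index, fanout, offsets):
    level, j = locate(index, offsets)
    if level == 0:                       # the root has no father
        return None
    return offsets[level - 1] + j // fanout[level - 1]

fanout = [3, 2]                          # n(0) = 3, n(1) = 2
tree = [1, 2, 5, 3, 4, 10, 12, 31, 45, 35]
offsets = level_offsets(fanout)          # [0, 1, 4]
print([tree[i] for i in children(2, fanout, offsets)])  # children of node 5
print(tree[parent(9, fanout, offsets)])                 # father of node 35
```

The lookup of the level uses a binary search over the level offsets, so each query costs O(log N) for N levels; no space is wasted on padding.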
For direct indexing you can keep a second array with each node's father position and a third with its first-child index. If you really want just one array, do this:
For each element keep 3 things: the value, the father index and the first-child index.
Because the children are stored one after another, you will always have access to all the children,
and you will always have the father. (I will put -1 for the root's father.)
Then you should have:
{1,-1, 3, 2, 0,12, 5, 0, x, 3, 0, x, 4, 3, x, ... }
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ... }
The second row shows the array positions. -1 is the father of 1, and 3 is where its children start.
0 is the father of 2, and 12 is where its children start (4 in this case).
If you want a "heap" structure, find the largest number of children MX = max(n(i)), 0 <= i < N, and build a heap with step MX: with the root at position 0, the node at position pos has its children at pos*MX + 1, pos*MX + 2, ..., pos*MX + n(k), where k is the node's level, and its father at (pos - 1)/MX (integer division).
You will have a lot of free slots, but it is a heap-like structure.
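As a rough illustration of the padded heap layout, the sketch below assumes a 0-based array with the root at position 0, so children live at pos*MX + 1 onward and the father at (pos - 1) // MX. The function names and the dictionary form of the example tree are assumptions made for this example only.

```python
def child_positions(pos, mx, count):
    """Positions of the `count` children of the node stored at `pos`."""
    first = pos * mx + 1
    return list(range(first, first + count))

def father_position(pos, mx):
    """Position of the father, or None for the root."""
    return None if pos == 0 else (pos - 1) // mx

def build_padded(tree, root, mx, depth):
    """Lay out `tree` (value -> list of child values) with unused slots as None."""
    size = sum(mx ** level for level in range(depth + 1))
    slots = [None] * size

    def place(value, pos):
        slots[pos] = value
        for k, child in enumerate(tree.get(value, [])):
            place(child, pos * mx + 1 + k)

    place(root, 0)
    return slots

example = {1: [2, 5, 3], 2: [4, 10], 5: [12, 31], 3: [45, 35]}
slots = build_padded(example, root=1, mx=3, depth=2)
print(slots)
# [1, 2, 5, 3, 4, 10, None, 12, 31, None, 45, 35, None]
```

Three of the thirteen slots stay empty here; the waste grows quickly when n(i) varies a lot between levels.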
I hope it helps you.

Related

Is my postorder traversal of this graph correct?

I am trying to implement an algorithm that requires a post-order traversal. Here is my graph (taken from here, pg. 8):
When I try to do a postorder traversal of this, the order I get is:
[3, 2, 1, 5, 4, 6]
The problem with this order is that the algorithm won't work in this order. This is the code I am using to get it (pseudocode):
function PostOrder(root, out_list) {
    root.visited = true
    for child in root.Children {
        if not child.visited {
            PostOrder(child, out_list)
        }
    }
    out_list.append(root)
}
Is the postorder correct?
Yes, the post order traversal of your algorithm is correct. The expected output is indeed as you provided it.
Your confusion may come from the fact that the graph is not a binary tree, and not even a tree. It is a directed graph.
In general postorder means that you first perform a postorder traversal on the node behind the first outgoing edge, then on the node behind its next outgoing edge, ...etc, and only after all outgoing edges have been traversed, the node itself is output.
Since at node 1 you are not at the end yet, and still can go to 2, and from there to 3, you need to follow those edges before outputting anything. And only then backtrack.
For reference, here is your algorithm implemented in python:
def postorder(root, out_list, children, visited):
    visited[root] = True
    for child in children[root]:
        if not visited[child]:
            postorder(child, out_list, children, visited)
    out_list.append(root)

children = [
    [],      # dummy for node 0
    [2],     # 1
    [1, 3],  # 2
    [2],     # 3
    [2, 3],  # 4
    [1],     # 5
    [5, 4],  # 6
]
nodes = []
postorder(6, nodes, children, [False] * len(children))
print(nodes)  # [3, 2, 1, 5, 4, 6]
I think you got confused with the postorder traversal of binary trees.
Postorder traversal in graph is different.
Post Ordering in Graphs – If we list the vertices in the order in which they are last visited by DFS traversal then the ordering is called PostOrder.
Assuming your root node is 6, the order you mention is correct.
Check out the following example of how the post-order traversal list is generated:
Pass 1:
List:[]
6 -> 5 -> 1 -> 2 -> 3 (Now Node 3 has no adjacent nodes which are unvisited)
List: [3]
Pass 2:
6 -> 5 -> 1 -> 2
Node 2 has no adjacent nodes which are unvisited.
List: [3, 2]
Pass 3:
6 -> 5 -> 1
Node 1 has no adjacent nodes which are unvisited.
List: [3, 2, 1]
Pass 4:
6 -> 5
Node 5 has no adjacent nodes which are unvisited.
List: [3, 2, 1, 5]
Pass 5:
6 -> 4
Node 4 has no adjacent nodes which are unvisited.
List: [3, 2, 1, 5, 4]
Pass 6:
Node 6 has no adjacent nodes which are unvisited.
List: [3, 2, 1, 5, 4, 6]
Important notes:
As we are using DFS, several visiting orders are possible, depending on the order of the nodes in each adjacency list.
The possible correct orders are:
[3, 2, 1, 5, 4, 6]
[1, 3, 2, 4, 5, 6]
[3, 1, 2, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
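To check this list, one can try every ordering of every adjacency list and collect the distinct results. The sketch below does that for the graph from the Python answer; `all_postorders` is a hypothetical helper written for this check, not part of either answer.

```python
from itertools import permutations, product

def postorder(root, children):
    """DFS post-order from `root`; `children` maps node -> adjacency list."""
    visited, out = set(), []

    def visit(node):
        visited.add(node)
        for child in children[node]:
            if child not in visited:
                visit(child)
        out.append(node)

    visit(root)
    return out

def all_postorders(root, graph):
    """Distinct post-orders over all orderings of every adjacency list."""
    nodes = list(graph)
    orders = set()
    for combo in product(*(permutations(graph[n]) for n in nodes)):
        orders.add(tuple(postorder(root, dict(zip(nodes, combo)))))
    return sorted(orders)

graph = {1: [2], 2: [1, 3], 3: [2], 4: [2, 3], 5: [1], 6: [5, 4]}
for order in all_postorders(6, graph):
    print(list(order))
```

This brute force is exponential in the size of the adjacency lists, so it is only meant for small graphs like this one.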

Is there a data structure for effective implementation of this encryption algorithm?

input -> alphabet -> output (index of the number in the alphabet) -> new alphabet (the number moved to the beginning of the alphabet):
3 -> [1, 2, 3, 4, 5] -> 3 -> [3, 1, 2, 4, 5]
2 -> [3, 1, 2, 4, 5] -> 3 -> [2, 3, 1, 4, 5]
1 -> [2, 3, 1, 4, 5] -> 3 -> [1, 2, 3, 4, 5]
1 -> [1, 2, 3, 4, 5] -> 1 -> [1, 2, 3, 4, 5]
4 -> [1, 2, 3, 4, 5] -> 4 -> [4, 1, 2, 3, 5]
5 -> [4, 1, 2, 3, 5] -> 5 -> [5, 4, 1, 2, 3]
input: (n - number of numbers in alphabet, m - length of text to be encrypted, the text)
5, 6
3 2 1 1 4 5
Answer: 3 2 1 1 4 5 -> 3 3 3 1 4 5
Is there any data structure or algorithm to do this efficiently, faster than O(n*m)?
I'd appreciate any ideas. Thanks.
Use an order statistics tree to store the pairs (1,1)...(n,n), ordered by their first elements.
Look up the translation for a character c by selecting the c-th smallest element of the tree and taking its second element.
Then update the tree by removing the node that you looked up and inserting it back into the tree with the first element of the pair set to -t, where t is the position in the message (or some other steadily decreasing counter).
Lookup, removal and insertion can be done in O(ln n) time worst-case if a self-balancing search tree (e.g. a red-black tree) is used as the underlying tree structure for the order statistics tree.
Given that the elements for the initial tree are inserted in order, the tree structure can be built in O(n).
So the whole algorithm will be O(n + m ln n) time, worst-case.
You can further improve this for the case that n is larger than m, by storing only one node for any continuous range of nodes in the tree, but counting it for the purpose of rank in the order statistics tree according to the number of nodes there would normally be.
Starting then from only one actually stored node, when the tree is rearranged, you split the range-representing node into three: one node representing the range before the found value, one representing the range after the found value and one representing the actual value. These three nodes are then inserted back, in case of the range nodes only if they are non-empty and with the first pair element equal to the second and in case of the non-range node, with the negative value as described before. If a node with negative first entry is found, it is not split in this.
The result of this is that the tree will contain at most O(m) nodes, so the algorithm has a worst-case time complexity of O(m ln min(n,m)).
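For comparison, here is a minimal sketch of the encoding direction from the question (value in, current position out). It swaps the balanced order statistics tree for a Fenwick (binary indexed) tree over n + m slots, which supports the same rank query in O(log(n + m)); the function name `mtf_encode` and the slot layout are assumptions of this sketch, and it does not include the range-compression refinement.

```python
def mtf_encode(n, text):
    """Move-to-front encode `text` over the alphabet 1..n."""
    m = len(text)
    size = n + m                      # m free slots in front of the alphabet
    tree = [0] * (size + 1)           # Fenwick tree, 1-based

    def add(i, delta):
        while i <= size:
            tree[i] += delta
            i += i & -i

    def prefix(i):                    # number of occupied slots in 1..i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    slot = [0] * (n + 1)              # slot[v] = current slot of value v
    for v in range(1, n + 1):
        slot[v] = m + v
        add(slot[v], 1)

    out = []
    front = m                         # next free slot at the front
    for v in text:
        out.append(prefix(slot[v]))   # rank of v = its index in the alphabet
        add(slot[v], -1)              # remove v from its old slot
        slot[v] = front               # and re-insert it in front of everything
        add(front, 1)
        front -= 1
    return out

print(mtf_encode(5, [3, 2, 1, 1, 4, 5]))  # [3, 3, 3, 1, 4, 5]
```

The whole run costs O((n + m) log(n + m)) time and O(n + m) space, which is within a constant factor of the bound given above when n and m are of similar size.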
Maybe a hashmap with letter/index pairs? I believe that element lookup in a hashmap is usually O(1), unless you have a lot of collisions (which is unlikely).

N non-overlapping optimal partition

Here is a problem I ran into a few days ago.
Given a list of integer items, we want to partition the items into at most N non-overlapping, consecutive bins, in a way that minimizes the maximum number of items in any bin.
For example, suppose we are given the items (5, 2, 3, 6, 1, 6), and we want 3 bins. We can optimally partition these as follows:
n < 3: 1, 2 (2 items)
3 <= n < 6: 3, 5 (2 items)
6 <= n: 6, 6 (2 items)
Every bin has 2 items, so we can’t do any better than that.
Can anyone share your idea about this question?
Given n bins and an array with p items, here is one greedy algorithm you could use.
To minimize the max number of items in a bin:
p <= n: try to use p bins.
Simply try to put each item in its own bin. If you have duplicate numbers, they must share a bin, so the maximum will unavoidably be worse.
p > n: greedily use all bins, but try to keep each one's member count near floor(p / n).
1. Sort the items and group duplicate numbers.
2. Pad the largest duplicate bins that fall short of floor(p / n) with unique numbers to the left and right (if they exist).
3. Count the number of bins you have and determine the number of mergers you need to make; let's call it r.
Repeat the following r times:
Check each possible neighbouring bin pairing; find and perform the minimum merger.
Example
{1,5,6,9,8,8,6,2,5,4,7,5,2,4,5,3,2,8,7,5} 20 items to 4 bins
{1}{2, 2, 2}{3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8}{9} 1. sorted and grouped
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8, 9} 2. greedy capture by largest groups
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8, 9} 3. 6 bins but we want 4, so 2 mergers need to be made.
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6, 7, 7}{8, 8, 8, 9} 3. first merger
{1, 2, 2, 2, 3, 4, 4}{5, 5, 5, 5, 5}{6, 6, 7, 7}{8, 8, 8, 9} 3. second merger
So the minimum achievable max was 7.
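One way to cross-check a greedy result like this is a different technique: binary search on the bin capacity, with a left-to-right feasibility scan over the sorted groups of equal values. This is not the greedy merge procedure described above, only a sketch for verifying its answer; `min_max_bin` is a made-up name.

```python
from collections import Counter

def min_max_bin(items, n_bins):
    """Smallest possible size of the largest bin, using at most n_bins bins."""
    if not items:
        return 0
    # Equal values cannot be split across bins, so work on group sizes.
    counts = [c for _, c in sorted(Counter(items).items())]

    def bins_needed(cap):
        bins, load = 1, 0
        for c in counts:
            if load + c > cap:        # group does not fit: open a new bin
                bins += 1
                load = 0
            load += c
        return bins

    lo, hi = max(counts), len(items)  # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if bins_needed(mid) <= n_bins:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_max_bin([5, 2, 3, 6, 1, 6], 3))
print(min_max_bin([1, 5, 6, 9, 8, 8, 6, 2, 5, 4, 7, 5, 2, 4, 5, 3, 2, 8, 7, 5], 4))
```

For the two examples in this thread it reports 2 and 7, matching the partitions shown. The cost is one sort plus O(log p) linear scans.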
Here is some pseudocode that will give you just one solution with the minimum bin size possible:
Sort the list into "Elements", where each Element is a pair {Value, Quantity}.
So for example {5,2,3,6,1,6} becomes an ordered set:
Let S = {{1,1},{2,1},{3,1},{5,1},{6,2}}
Let A = the largest quantity of any particular value in the set
Let X = number of items in the list
Let N = number of bins
Let MinNum = ceiling ( X / N )
if A > MinNum then Let MinNum = A
Create an array BIN(1 to N+1) of pointers to linked lists of elements.
LOOP
    Empty BIN(1) to BIN(N+1) and let T = a copy of S
    For I from 1 to N
        Remove elements from the front of T for as long as the item count of BIN(I) stays at or below MinNum
        and add them to BIN(I)
    Next I
    Let BIN(N+1) = any elements remaining in T
    If BIN(N+1) is empty then exit LOOP
    Let MinNum = MinNum + 1
END LOOP
Your minimum bin size possible will be MinNum, and BIN(1) to BIN(N) will contain the distribution of values.

How many level order BST sequences are possible given a preOrder and inOrder sequence?

This question came up while I was trying to print the level order of a BST.
Here are a pre-order and an in-order sequence:
Pre-Order Sequence: 4, 1, 2, 3, 5, 6, 7, 8
In_order Sequence : 1, 2, 3, 4, 5, 6, 7, 8
A level-order sequence for a BST with the above pre-order and in-order is
[4, 2, 6, 1, 3, 5, 7, 8]
However, for the same pre-order and in-order sequences, this level-order sequence also seems possible: [4, 1, 5, 2, 6, 3, 7, 8]. I don't know how. I am trying to figure this out.
I am unable to draw a BST on paper that satisfies all of the pre-order, in-order and level-order sequences.
If you have in-order traversal together with one of pre/post-order, that is enough to reconstruct a binary tree. Moreover, in case of BST (binary search tree), post-order or pre-order alone is sufficient.
In your case, reconstructing a BST from pre-order 4, 1, 2, 3, 5, 6, 7, 8 gives the following BST:
      4
     / \
    1   5
     \   \
      2   6
       \   \
        3   7
             \
              8
which gives the level-order traversal [4,1,5,2,6,3,7,8], which is again unique.
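The reconstruction can be checked mechanically. The sketch below rebuilds a BST from its pre-order sequence using value bounds and then walks it level by level; the tuple representation `(key, left, right)` and the function names are choices made for this example.

```python
from collections import deque

def bst_from_preorder(preorder):
    """Rebuild a BST as nested (key, left, right) tuples from its pre-order."""
    idx = 0

    def build(lo, hi):
        nonlocal idx
        # The next key belongs to this subtree only if it lies within (lo, hi).
        if idx == len(preorder) or not (lo < preorder[idx] < hi):
            return None
        key = preorder[idx]
        idx += 1
        left = build(lo, key)
        right = build(key, hi)
        return (key, left, right)

    return build(float("-inf"), float("inf"))

def level_order(root):
    out = []
    queue = deque([root] if root else [])
    while queue:
        key, left, right = queue.popleft()
        out.append(key)
        for child in (left, right):
            if child is not None:
                queue.append(child)
    return out

tree = bst_from_preorder([4, 1, 2, 3, 5, 6, 7, 8])
print(level_order(tree))  # [4, 1, 5, 2, 6, 3, 7, 8]
```

Because each key is consumed exactly once, the rebuild runs in linear time, and it produces the same tree for a given pre-order every time, which is why only one level order is possible.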
See also:
Reconstructing binary trees from tree traversals
Each of the following combinations determines a unique binary tree (which can be a BST):
Inorder and Preorder.
Inorder and Postorder.
Inorder and Level-order.
In your case inorder and preorder are given, so they determine a unique binary tree, which here is a BST, and the level order is therefore unique for that tree.
Pre-Order Sequence: 4, 1, 2, 3, 5, 6, 7, 8
In_order Sequence : 1, 2, 3, 4, 5, 6, 7, 8
So the tree will be
level 0- 4
level 1- 1,5
level 2- 2,6
level 3- 3,7
level 4- 8
Level order is
4,1,5,2,6,3,7,8
In short, there will always be a unique level-order traversal.

Efficient data structure for a list of index sets

I am trying to explain by example:
Imagine a list of numbered elements E = [elem0, elem1, elem2, ...].
One index set could now be {42, 66, 128}, referring to elements in E. The ordering in this set is not important, so {42, 66, 128} == {66, 128, 42}, but each element occurs at most once in any given index set (so it is an actual set).
What I want now is a space efficient data structure that gives me another ordered list M that contains index sets that refer to elements in E. Each index set in M will only occur once (so M is a set in this regard) but M must be indexable itself (so M is a List in this sense, whereby the precise index is not important). If necessary, index sets can be forced to all contain the same number of elements.
For example, M could look like:
0: {42, 66, 128}
1: {42, 66, 9999}
2: {1, 66, 9999}
I could now do the following:
for(i in M[2]) { element = E[i]; /* do something with E[1],E[66],and E[9999] */ }
You probably see where this is going: You may now have another map M2 that is an ordered list of sets pointing into M which ultimately point to elements in E.
As you can see in this example, index sets can be relatively similar (M[0] and M[1] share the first two entries, M[1] and M[2] share the last two) which makes me think that there must be something more efficient than the naive way of using an array-of-sets. However, I may not be able to come up with a good global ordering of index entries that guarantee good "sharing".
I could think of anything ranging from representing M as a tree (where M's index comes from the depth-first search ordering or something) to hash maps of union-find structures (no idea how that would work though:)
Pointers to any textbook datastructure for something like this are highly welcome (is there anything in the world of databases?) but I also appreciate if you propose a "self-made" solution or only random ideas.
Space efficiency is important for me because E may contain thousands or even a few million elements, (some) index sets are potentially large, similarities between at least some index sets should be substantial, and there may be multiple layers of mappings.
Thanks a ton!
You can combine all the numbers that occur in M, remove duplicates, and call the result UniqueM.
Then convert every M[X] collection to a bit mask over UniqueM. For example, an int value can hold 32 flags and a long 64; to support an unlimited count, store an array of ints (an array of 10 ints can cover 320 different elements).
E: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
M[0]: {6, 8, 1}
M[1]: {2, 8, 1}
M[2]: {6, 8, 5}
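Will be converted to (bits are written lowest position first, so the leftmost bit stands for the first entry of UniqueM):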
UniqueM: {6, 8, 1, 2, 5}
M[0]: 11100 {this is 7}
M[1]: 01110 {this is 14}
M[2]: 11001 {this is 19}
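A small sketch of this conversion follows. It relies on Python integers being arbitrary precision, so a single int plays the role of the array of ints; `build_masks` and `members` are illustrative names, and the index sets are passed as lists only so that UniqueM comes out in the same order as in the example.

```python
def build_masks(index_sets):
    """Return (unique, masks): the deduplicated indexes and one bit mask per set."""
    unique = []
    slot = {}                          # index -> bit position in the mask
    for s in index_sets:
        for idx in s:
            if idx not in slot:
                slot[idx] = len(unique)
                unique.append(idx)
    masks = [sum(1 << slot[idx] for idx in s) for s in index_sets]
    return unique, masks

def members(mask, unique):
    """Decode a mask back into the indexes it stands for."""
    return [unique[b] for b in range(mask.bit_length()) if mask >> b & 1]

unique, masks = build_masks([[6, 8, 1], [2, 8, 1], [6, 8, 5]])
print(unique)                     # [6, 8, 1, 2, 5]
print(masks)                      # [7, 14, 19]
print(members(masks[2], unique))  # [6, 8, 5]
```

Each set then costs one bit per entry of UniqueM, which pays off when the sets are dense relative to UniqueM and loses to a plain index list when they are very sparse.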
Note:
You can also combine my approach with ring0's: instead of rearranging E, build a new UniqueM and use intervals inside it.
It will be pretty hard to beat an index. You could save some space by using the right data type (e.g. in GNU C, short if there are fewer than 64k elements in E, int if fewer than 4G...).
Besides,
Since you say the order in E is not important, you could sort E in a way that makes the elements of each M as consecutive as possible.
For instance,
E: { 1,2,3,4,5,6,7,8 }
0: {1,3,5,7}
1: {1,3,5,8}
2: {3,5,7,8}
By rearranging E
E: { 1,3,5,7,8,2,4,6 }
and using E indexes, not values, you could define the Ms based on subsets of E, giving indexes
0: {0-3} // E[0]: 1, E[1]: 3, E[2]: 5, E[3]: 7 etc...
1: {0-2,4}
2: {1-4}
this way
you use indexes instead of the raw numbers (indexes are usually smaller, never negative...)
the Ms are made of sub-ranges, 0-3 meaning 0,1,2,3.
The difficult part is the algorithm that rearranges E so that you maximize the range sizes and thereby minimize the sizes of the Ms.
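The range notation itself is easy to produce once E has been rearranged. The sketch below only covers that encoding step, not the rearrangement; `to_ranges` is a hypothetical helper and ranges are represented as inclusive (start, end) pairs.

```python
def to_ranges(indices):
    """Collapse a set of indexes into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(indices):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i          # extend the current run
        else:
            ranges.append([i, i])      # start a new run
    return [tuple(r) for r in ranges]

E = [1, 3, 5, 7, 8, 2, 4, 6]           # rearranged E from the example
position = {value: i for i, value in enumerate(E)}
Ms = [{1, 3, 5, 7}, {1, 3, 5, 8}, {3, 5, 7, 8}]
for m in Ms:
    print(to_ranges(position[v] for v in m))
# [(0, 3)]
# [(0, 2), (4, 4)]
# [(1, 4)]
```

A set costs two integers per run, so the saving depends entirely on how well the ordering of E lines up with the sets.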
E rearrangement algorithm suggestion
sort all Ms
process all Ms:
algorithm to build a map which gives, for an element 'x', its list of neighbours 'y' along with points: the number of times 'y' comes just after 'x'
Map map (x,y) -> z
for m in Ms
    for e,f in m // e and f are consecutive elements
        if ( ! map(e,f)) map(e,f) = 1
        else map(e,f)++
    rof
rof
Get E rearranged
ER = {} // E rearranged
Map mas = sort_map(map) // mas(x) -> list(y) where 'y' are sorted desc based on 'z'
e = get_min_elem(mas) // init with lowest element (regardless of its 'z' scores)
while (mas has elements)
    ER += e // add element e to ER
    if (empty(mas(e)))
        e = get_min_elem(mas) // get next lowest remaining value
    else
        f = mas(e)[0] // get most likely neighbour of e (in f), i.e. first in the list
        delete mas(e)[0] // set next e neighbour in line
        e = f
    fi
elihw
The algo (map) should be O(n*m) space, with n elements in E, m elements in all Ms.
Bit arrays may be used. They are arrays of elements a[i] which are 1 if i is in the set and 0 if i is not in the set. So every set would occupy exactly size(E) bits, even if it contains few or no members. That is not space efficient by itself, but if you compress the array with some compression algorithm it will be much smaller (possibly approaching the entropy limit). So you can try a dynamic Markov coder, RLE or group Huffman and choose the one that is most efficient for you. The iteration process could then use on-the-fly decompression followed by a linear scan for 1 bits. For long runs of 0s you could modify the decompression algorithm to detect such cases (RLE is the simplest case for this).
If you find sets with a small difference, you can store sets A and A xor B instead of A and B, saving space for the common parts. In this case, to iterate over B you have to unpack both A and A xor B and then xor them.
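The xor trick can be shown with plain Python integers standing in for the (uncompressed) bit arrays. This is only a sketch of the delta idea; the compression step is left out, and `to_mask` and `from_mask` are names invented here.

```python
def to_mask(indices):
    """Bit array as an int: bit i is 1 if i is in the set."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask

def from_mask(mask):
    """Linear scan for 1 bits, returning the member indexes in order."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]

a = to_mask([42, 66, 128])
b = to_mask([42, 66, 9999])

delta = a ^ b                      # store A and this delta instead of A and B
print(from_mask(delta))            # [128, 9999]: only the differing bits
print(from_mask(a ^ delta))        # [42, 66, 9999]: B recovered from A and delta
```

The delta has a 1 only where the two sets differ, so for similar sets it is mostly zeros and compresses much better than B itself.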
Another useful solution:
E: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
M[0]: {1, 2, 3, 4, 5, 10, 14, 15}
M[1]: {1, 2, 3, 4, 5, 11, 14, 15}
M[2]: {1, 2, 3, 4, 5, 12, 13}
Cache frequently used items:
Cache[1] = {1, 2, 3, 4, 5}
Cache[2] = {14, 15}
Cache[3] = {-2, 7, 8, 9} // Not used here; just an example of a cached list that links to another one.
M[0]: {-1, 10, -2}
M[1]: {-1, 11, -2}
M[2]: {-1, 12, 13}
Links to cached lists are marked as negative numbers: -k refers to Cache[k].
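Reading a set back then means following the negative links. A minimal sketch, assuming the cache is a dictionary keyed by the positive cache number and that cached lists may link to other cached lists (as Cache[3] does); `expand` is a hypothetical helper.

```python
def expand(entry, cache):
    """Resolve negative cache links in `entry` into the real indexes."""
    out = []
    for x in entry:
        if x < 0:
            out.extend(expand(cache[-x], cache))  # -k links to cache[k]
        else:
            out.append(x)
    return out

cache = {
    1: [1, 2, 3, 4, 5],
    2: [14, 15],
    3: [-2, 7, 8, 9],
}
M = [[-1, 10, -2], [-1, 11, -2], [-1, 12, 13]]

print(expand(M[0], cache))      # [1, 2, 3, 4, 5, 10, 14, 15]
print(expand(cache[3], cache))  # [14, 15, 7, 8, 9]
```

One consequence of using the sign as the marker is that cache numbers must start at 1, since -0 cannot be told apart from index 0.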
