Iterative-deepening depth-first search using limited memory

This is a follow-up to Find first null in binary tree with limited memory.
Wikipedia says that iterative-deepening depth-first search will find the shortest path. I would like an implementation that is limited in memory to k nodes and accesses the tree as few times as possible.
For instance, if my binary tree is:
0
1 2
3 4 5 6
7 8 9 10 11 12 13 14
And I'm limited to 5 nodes of memory, then my search order is:
mem[0] = read node 0
mem[1] = read node 1
mem[2] = read node 2
mem[3] = read node 3
mem[4] = read node 4 //Now my memory is full. I continue...
mem[3] = read node 5 //overwrite where I stored node 3
mem[4] = read node 6 //overwrite where I stored node 4
Now if my next read is to 7, I need to re-read 3. But if I make my next read to 14, then I don't need to re-read 3 just yet. If the solution is at 14, this will make my algorithm a bit faster!
I'm looking for a general solution; something that will work for any size memory and number of branches per node.

If your nodes link to their parents, and the children of a node will always be enumerated in the same order, you can trace your steps without having to save them.
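The questioner's traversal can be sketched as a depth-limited DFS re-run at increasing depth limits. This is a minimal illustration, not the questioner's actual tree API: the list layout (node i has children 2i+1 and 2i+2) and the names `dls`/`iddfs` are mine.

```python
# Sketch of iterative-deepening DFS over a binary tree stored as a list
# (node i has children 2*i+1 and 2*i+2). Illustrative only.

def dls(tree, node, depth, target, reads):
    """Depth-limited search; counts how many times tree nodes are (re)read."""
    if node >= len(tree):
        return None
    reads[0] += 1                      # every visit is one tree access
    if tree[node] == target:
        return node
    if depth == 0:
        return None
    for child in (2 * node + 1, 2 * node + 2):
        found = dls(tree, child, depth - 1, target, reads)
        if found is not None:
            return found
    return None

def iddfs(tree, target):
    reads = [0]
    depth = 0
    while True:
        found = dls(tree, 0, depth, target, reads)
        if found is not None:
            return found, reads[0]
        depth += 1

tree = list(range(15))                 # the 15-node tree from the question
index, reads = iddfs(tree, 14)         # finding 14 re-reads the upper levels
```

This uses only O(depth) memory (the recursion stack) but re-reads the upper levels on every deepening pass, which is exactly the access-count trade-off the question is about.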

Insert/remove the same element in heap sort

Show the heap at each stage when the following numbers are inserted to an initially empty min-heap in the given order: {11, 17, 13, 4, 4, 1 }. Now, show the heap at each stage when we successively perform the deleteMin operation on the heap until it is empty.
Here is the answer/checkpoint I receive:
https://imgur.com/zu47RIF
I have 2 questions please:
I don't understand why, when we insert element 4 the second time, we shift 11 to make it the right child of the first-inserted element 4. Is it because we want to satisfy a requirement of the complete binary tree, namely that each node in levels 1 to k - 2 has exactly 2 children (k = number of levels of the tree; level k is the bottom-most level)?
I don't understand how, when we deleteMin (= 1), 13 becomes the right child of its new parent 11 (which is the left child of 4). Just a quick note that my instructor gave the class 2 ways to deleteMin. The other way is fine with me - it's just the reversed process of inserting.
Like you said, the heap shape is an "almost complete tree": all levels are complete, except the lowest level which can be incomplete to the right. Therefore, the second 4 is necessarily added to the right of 17 to preserve the heap shape:
     4
   /   \
  11    13
 /  \
17    4
After that, 4 switches places with 11 to regain the min-heap property.
Deletions are typically implemented by removing the root and putting the last (i.e., bottom-rightmost) element in its place. This preserves the heap shape. The new root is then allowed to sift down in order to regain the min-heap property. So 13 becomes the new root:
     13
   /    \
  4      4
 / \
17  11
Then 13 switches places with either child node. It looks like they chose the right-hand child in your example.
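The insert/sift-up and deleteMin/sift-down steps described above can be sketched with an array-backed min-heap. This is a sketch, not the instructor's code; ties between equal children are broken toward the left child here.

```python
# Array-backed min-heap: node i has children 2i+1 and 2i+2, parent (i-1)//2.

def insert(heap, value):
    heap.append(value)                 # add at the bottom-left-most free slot
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]  # sift up
        i = (i - 1) // 2

def delete_min(heap):
    root = heap[0]
    heap[0] = heap[-1]                 # move last (bottom-right-most) item up
    heap.pop()
    i = 0
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap) and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            return root
        heap[i], heap[smallest] = heap[smallest], heap[i]          # sift down
        i = smallest

heap = []
for x in [11, 17, 13, 4, 4, 1]:
    insert(heap, x)
# heap is now [1, 4, 4, 17, 11, 13]; after delete_min(), 13 sifts down
# and ends up as the right child of 11, as observed in the question.
```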

O(1) Algorithm for Counting Left-Children in Complete Binary Tree

I have a complete binary tree (i.e. a tree where "every level, except possibly the last, is completely filled, and all nodes are as far left as possible"). This tree is stored in depth-first, left-to-right order. My problem is, given a node by index and the tree's total size, tell me how many nodes are in that node's left subtree, in O(1).
For example, suppose the tree's total size is 10. This implies the following complete tree (note: the numbers are the node index in the depth-first, left-to-right order):
        0
      /   \
     1     7
    / \   / \
   2   5 8   9
  / \ /
 3  4 6
Now, given a node index I need to find how many left-children it has. For this example:
Node 0 has 6 left-children.
Node 1 has 3 left-children.
Node 2 has 1 left-child.
Node 3 has 0 left-children.
Node 4 has 0 left-children.
Node 5 has 1 left-child.
Node 6 has 0 left-children.
Node 7 has 1 left-child.
Node 8 has 0 left-children.
Node 9 has 0 left-children.
Each such query must take O(1) time and be a function only of the node index and tree size (e.g., I cannot store anything in the tree).
I feel like this should be a fairly simple problem, but so far I haven't been able to figure it out.
Strictly speaking, this is a bit of a simplification of my problem; I actually want 1+ this value, and I'll never call the function on leaves. But the core problem is this.
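One way to approach it (not the O(1) asked for, but it reproduces the example): the left-subtree size of a complete tree with s nodes has a closed form, and walking down from the root locates any preorder index in O(log n) steps. The function names are mine.

```python
def left_size(s):
    """Nodes in the left subtree of a complete (left-packed) tree of s nodes."""
    if s <= 1:
        return 0
    h = s.bit_length() - 1          # full levels above the last one: 2^h - 1 nodes
    last = s - (2 ** h - 1)         # nodes actually present in the last level
    half = 2 ** (h - 1)             # last-level capacity of the left subtree
    return (2 ** (h - 1) - 1) + min(last, half)

def left_children(index, size):
    """Left-subtree size of the node at `index` in depth-first (preorder) order."""
    cur, s = 0, size
    while cur != index:
        ls = left_size(s)
        if index <= cur + ls:       # target lies in the left subtree
            cur, s = cur + 1, ls
        else:                       # target lies in the right subtree
            cur, s = cur + 1 + ls, s - 1 - ls
    return left_size(s)

print([left_children(i, 10) for i in range(10)])
# → [6, 3, 1, 0, 0, 1, 0, 1, 0, 0]
```

This matches the ten values listed in the question; whether the walk can be collapsed into a genuinely O(1) bit-manipulation of (index, size) is the open part of the question.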

Cache-aware tree implementation

I have a tree where every node may have 0 to N children.
Use-case is the following query: Given pointers to two nodes: Are these nodes within the same branch of the tree?
Examples
q(2,7) => true
q(5,4) => false
By the book (slow)
The straightforward implementation would be to store a pointer to the parent and a pointer to a list of children at each node. But this would lead to bad performance because the tree would be fragmented in memory and therefore not cache-friendly.
Question
What would be a good way to represent the tree in compact form? The whole tree has about 100,000 nodes. So it should be possible to find a way to make it fit completely in the CPU-cache.
Binary trees, for example, are often represented implicitly as an array and are therefore well suited to being stored entirely in the CPU cache (if small enough).
You can pre-allocate a contiguous block of memory where you concatenate the information for all nodes.
Afterwards, each node would only need a way to retrieve the beginning of its information, and the length of that information.
In this case, the information for each node could be represented by the parent, followed by the list of children (let's assume that we use -1 when there is no parent, i.e. for the root).
For example, for the tree posted in the question, the information for node 1 would be: -1 2 3 4, the information for node 2 is: 1 5, and so on.
The contiguous array would be obtained by concatenating these arrays, resulting in something like:
-1 2 3 4 1 5 1 9 10 1 11 12 13 14 2 3 5 5 5 3 3 4 4 4 15 4
Each node would use some metadata to allow retrieving its associated information. As mentioned, this metadata would need to consist of a startIndex and length. E.g. for node 3, we would have startIndex = 6, length = 3, which allows to retrieve the 1 9 10 subarray, indicating that the parent is node 1, and its children are nodes 9 and 10.
In addition, the metadata information can also be stored in the contiguous memory block, at the beginning. The metadata has fixed length for each node (two values), thus we can easily obtain the position of the metadata for a certain node, given its index.
In this way, all the information about the graph will be stored in a contiguous, cache-friendly, memory block.
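A minimal sketch of that layout, assuming a small hypothetical 5-node tree given as parent/children dicts (the node numbering here is mine, not the figure from the question):

```python
# Flat, cache-friendly layout: a metadata array of (start, length) per node,
# plus a data array holding each node's parent followed by its children.

def build_flat(parent, children):
    """parent/children are dicts keyed by node id; ids are assumed 1..n here."""
    n = len(parent)
    meta, data = [], []
    for node in range(1, n + 1):
        info = [parent[node]] + children[node]
        meta.append((len(data), len(info)))   # where this node's info starts
        data.extend(info)
    return meta, data

def node_info(meta, data, node):
    start, length = meta[node - 1]
    info = data[start:start + length]
    return info[0], info[1:]                  # (parent, children)

# Hypothetical tree: 1 is the root with children 2 and 3; 2 has children 4 and 5.
parent   = {1: -1, 2: 1, 3: 1, 4: 2, 5: 2}
children = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
meta, data = build_flat(parent, children)
```

In a systems language the metadata and data would live in one contiguous allocation; the Python lists here only illustrate the indexing scheme.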

Sequentially Constructing Full B-Trees

If I have a sorted set of data, which I want to store on disk in a way that is optimal for both reading sequentially and doing random lookups on, it seems that a B-Tree (or one of its variants) is a good choice, presuming this data set does not all fit in RAM.
The question is can a full B-Tree be constructed from a sorted set of data without doing any page splits? So that the sorted data can be written to disk sequentially.
Constructing a "B+ tree" to those specifications is simple.
Choose your branching factor k.
Write the sorted data to a file. This is the leaf level.
To construct the next highest level, scan the current level and write out every kth item.
Stop when the current level has k items or fewer.
Example with k = 2:
0 1|2 3|4 5|6 7|8 9
0 2 |4 6 |8
0 4 |8
0 8
Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level, which is 0. Look at the interval in the next lowest level corresponding to 0:
0 4
Now 4:
4 6
Now 4 again:
4 5
Found it. In general, the jth item corresponds to items jk through (j+1)k-1 at the next level. You can also scan the leaf level linearly.
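The level-by-level construction and the top-down lookup above can be sketched directly. This is a sketch of that scheme with branching factor k, not production B+ tree code; the names are mine.

```python
import bisect

def build_levels(leaves, k):
    """Build index levels bottom-up: each level keeps every k-th item below."""
    levels = [list(leaves)]
    while len(levels[-1]) > k:
        levels.append(levels[-1][::k])        # every k-th item of current level
    return levels[::-1]                        # top level first

def lookup(levels, k, key):
    """Return the leaf position of the last item <= key, walking top-down."""
    j = bisect.bisect_right(levels[0], key) - 1       # last item <= key, top level
    for level in levels[1:]:
        seg = level[j * k : (j + 1) * k]              # interval covered by item j
        j = j * k + bisect.bisect_right(seg, key) - 1
    return j

levels = build_levels(range(10), 2)   # k = 2, leaves 0 1|2 3|4 5|6 7|8 9
# levels == [[0, 8], [0, 4, 8], [0, 2, 4, 6, 8], [0, ..., 9]]
```

On the example from the answer, `lookup(levels, 2, 5)` follows the same path as the walkthrough (0, then 0 4, then 4 6, then 4 5) and returns leaf position 5. In a real on-disk layout each level would be a sequentially written run of pages rather than a Python list.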
We can make a B-tree in one pass, but it may not be the optimal storage method. Depending on how often you make sequential queries vs random access ones, it may be better to store it in sequence and use binary search to service a random access query.
That said: assume that each record in your b-tree holds (m - 1) keys (m > 2, the binary case is a bit different). We want all the leaves on the same level and all the internal nodes to have at least (m - 1) / 2 keys. We know that a full b-tree of height k has (m^k - 1) keys. Assume that we have n keys total to store. Let k be the smallest integer such that m^k - 1 > n. Now if 2 m^(k - 1) - 1 < n we can completely fill up the inner nodes, and distribute the rest of the keys evenly to the leaf nodes, each leaf node getting either the floor or ceiling of (n + 1 - m^(k - 1))/m^(k - 1) keys. If we cannot do that then we know that we have enough to fill all of the nodes at depth k - 1 at least halfway and store one key in each of the leaves.
Once we have decided the shape of our tree, we need only do an inorder traversal of the tree sequentially dropping keys into position as we go.
Optimal meaning that an inorder traversal of the data will always be seeking forward through the file (or mmaped region), and a random lookup is done in a minimal number of seeks.

What is the name of a data structure looking like bowling pins?

First of all, sorry for the title. Someone please propose a better one, I really didn't know how to express my question properly.
Basically, I'm just looking for the name of a data structure where the elements look like this (ignore the dots):
......5
....3...2
..4...1...6
9...2...3...1
I first thought it could be a certain kind of "tree", but, as wikipedia says:
A tree is [...] an acyclic connected graph where each node has zero or more children nodes and at most one parent node
Since there can be more than one parent by node in the data structure I'm looking for, it's probably not a tree.
So, here's my question:
What is the name of a data structure that can represent data with the following links between the elements? (/ and \ being the links, again, ignore the dots):
......5
...../..\
....3...2
.../..\./..\
..4...1...6
../.\./..\./..\
9...2...3...1
I think it isn't totally wrong to call it a Tree, although "Digraph" (directed graph) would be a more proper term.
First of all, sorry for the title. Someone please propose a better one, I really didn't know how to express my question properly.
The title is fine, I LOL'd hard when I opened the question. I am going to start calling them "Bowling Pins" now :)
   5
  3 2
 4 1 6
9 2 3 1
The most popular structure laid out like this is, I reckon, Pascal's triangle. It's the structure used to calculate binomial coefficients; each node is the sum of its parents:
http://info.ee.surrey.ac.uk/Personal/L.Wood/publications/MSc-thesis/fig36.gif.
Usually, when it comes to implementing such algorithms (a class commonly referred to as "dynamic programming"), this "structure" is represented as a simple two-dimensional array. See here, for example:
n\k  0  1  2  3  4
------------------
 0   1  0  0  0  0
 1   1  1  0  0  0
 2   1  2  1  0  0
 3   1  3  3  1  0
 4   1  4  6  4  1
 5   1  5 10 10  5
 6   1  6 15 20 15
I think there's no formal name for such a structure; in dynamic programming such stuff is just... arrays.
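The two-dimensional-array representation can be sketched in a few lines; this builds the table shown above via the Pascal recurrence (a sketch, with names of my choosing):

```python
def binomial_table(rows, cols):
    """C[n][k] via the Pascal recurrence C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    table = [[0] * cols for _ in range(rows)]
    for n in range(rows):
        table[n][0] = 1                        # C(n, 0) = 1
        for k in range(1, min(n, cols - 1) + 1):
            table[n][k] = table[n - 1][k - 1] + table[n - 1][k]
    return table

table = binomial_table(7, 5)
# table[6] == [1, 6, 15, 20, 15], the bottom row of the table above
```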
But from now on, as NullUserException suggests I'm totally calling it "bowling pins" :-)
A "DAG", or directed acyclic graph, is a graph with directed edges in which there may be multiple paths to a node, and some nodes may have both incoming and outgoing edges, but there is no way to leave any node and return to it (there are no cycles). Note that in a finite DAG at least one node must have nothing but outgoing edges and at least one must have nothing but incoming edges. Were that not the case, it would be possible to move continuously through the graph without ever reaching a dead end (since every node would have an exit), and without visiting any node twice (since the graph is acyclic). If there are only a finite number of nodes, that's obviously impossible.
Since there can be more than one parent by node in the data structure I'm looking for, it's probably not a tree.
What you're looking for is probably a graph. A tree is a special case of a graph where each node has exactly one parent (except the root, which has none).
