compaction in an array storing 2 linked lists - algorithm

An array Arr (size n) can represent a doubly linked list.
[Say the cells have struct { int val, next, prev; }]
I have two lists A and B stored in the array.
A has m nodes and B has n - m nodes.
These nodes are scattered, so I want to rearrange them such that all nodes of A occupy Arr[0] .. Arr[m-1] and the rest are filled by the nodes of B, in O(m) time.
The solution that occurs to me is:
Iterate over A until a node occurs that is placed beyond Arr[m-1];
then iterate over B until a node occurs that is placed before Arr[m];
swap the two (including fixing up the next/prev links of both nodes and their neighbours).
However, in this case the total number of iterations is O(n + m), i.e. O(n). Hence there should be a better answer.
P.S:
This question occurs in Introduction to Algorithms, 2nd edition.
Problem 10.3-5

How about iterating through list A and placing each element in Arr[0] ... Arr[m-1], obviously swapping it with whatever was there before and updating the prev/next links as well? There will be a lot of swapping, but it is nevertheless O(m): once you finish iterating through A (m iterations), all of its elements will be located (in order, incidentally) in the first m slots of Arr, and thus B must be located entirely in the rest of Arr.
To add some pseudocode:
a := index of head of A
for i in 0 ... m-1
    swap Arr[i], Arr[a]       (fixing up the next/prev links affected by the swap)
    a := Arr[i].next          (the A-node just placed now lives at index i)
end

I think jw013 is right, but the idea needs some improvements:
by swapping you are changing the position of elements in the Arr array,
so you need to be careful about that!
E.g. let's say we have Arr like:
indices: 0 1 2 3 4
| 2 | empty | 3 | empty | 1 | (assume the linked list is 1 -> 2 -> 3)
so Arr[4].next is 0 and Arr[0].next is 2.
But when you swap Arr[4] and Arr[0], then Arr[0].next is 0,
which is not what we want to happen, so we should adjust the pointers when swapping.
So the code for it is like:
public static void compactify(int List_head, int Free, node[] array) {
    int List_length = find_listlength(List_head, array);
    if (List_length != 0) { // if the list is not empty
        int a = List_head;
        for (int i = 0; i < List_length; i++) {
            swap(array, a, i);    // the current list node now lives at index i,
            a = array[i].next;    // so read its next pointer from index i
            print_mem(array);
        }
    }
}
Now when calling swap:
private static void swap(node[] array, int a, int i) {
    // adjust the next and prev of both array[a] and array[i]
    int next_a = array[a].next;
    int next_i = array[i].next;
    int prev_a = array[a].prev;
    int prev_i = array[i].prev;
    // if array[a] has a next, adjust array[next_a].prev to i
    if (next_a != -1)
        array[next_a].prev = i;
    // if array[i] has a next, adjust array[next_i].prev to a
    if (next_i != -1)
        array[next_i].prev = a;
    // like before, adjust the pointers of array[prev_a] and array[prev_i]
    if (prev_a != -1)
        array[prev_a].next = i;
    if (prev_i != -1)
        array[prev_i].next = a;
    node temp = array[a];
    array[a] = array[i];
    array[i] = temp;
}
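The snippet above assumes a node record and two helpers that it doesn't show. A minimal sketch of what they might look like (the shapes below are my guesses, not code from the original answer):

static class node {
    int val;
    int next; // index of the next node, or -1
    int prev; // index of the previous node, or -1
}

// Walk the list from its head, counting nodes.
private static int find_listlength(int List_head, node[] array) {
    int length = 0;
    for (int i = List_head; i != -1; i = array[i].next)
        length++;
    return length;
}

// Debug aid: dump every cell as index:(prev,val,next).
private static void print_mem(node[] array) {
    for (int i = 0; i < array.length; i++)
        System.out.print(i + ":(" + array[i].prev + "," + array[i].val + "," + array[i].next + ") ");
    System.out.println();
}

With these in place the whole procedure is O(m): find_listlength walks the list once, and the main loop performs exactly one swap per list node.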


Finding bounded nearest neighbour in a 1-dimensional array

Let's say we have some array of boolean values:
A = [0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1 0 0 0 ... 0]
The array is constructed by performing classification on a stream of data. Each element in the array corresponds to the output of a classification algorithm given a small "chunk" of the data. An answer may include restructuring the array to make parsing more efficient.
The array is pseudo random in the sense that groups of 1's and 0's tend to exist in bunches (but not necessarily always).
Given some index, i, what is the most efficient way to find the group of at least n zeros closest to A[i]? For the easy case, take n = 1.
EDIT: Groups should have AT LEAST n zeros. Again, for the easy case, that means at least 1 zero.
EDIT2: This search will be performed O(n) times, where n is the size of the array. (Specifically, it's n/c, where c is some fixed duration.)
In this solution I organize the data so that you can use a binary search O(log n) to find the nearest group of at least a certain size.
I first create groups of zeros from the array, then I put each group of zeros into lists containing all groups of size s or larger, so that when you want to find the nearest group of size s or more, you just run a binary search in the list that has all groups of size s or greater.
The downside is in the pre-processing of putting the groups into the lists, with O(n * m) (I think, please check me) time and space efficiency where n is the number of groups of zeros, and m is the max size of the groups, though in reality the efficiency is probably better.
Here is the code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public static class Group {
    final public int x1;
    final public int x2;
    final public int size;

    public Group(int x1, int x2) {
        assert x1 <= x2;
        this.x1 = x1;
        this.x2 = x2;
        this.size = x2 - x1 + 1;
    }

    public static final List<Group> getGroupsOfZeros(byte[] arr) {
        List<Group> listOfGroups = new ArrayList<>();
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == 0) {
                int x1 = i;
                for (++i; i < arr.length; i++)
                    if (arr[i] != 0)
                        break;
                int x2 = i - 1;
                listOfGroups.add(new Group(x1, x2));
            }
        }
        return Collections.unmodifiableList(listOfGroups);
    }

    public static final Group binarySearchNearest(int i, List<Group> list) {
        { // edge cases
            Group firstGroup = list.get(0);
            if (i <= firstGroup.x2)
                return firstGroup;
            Group lastGroup = list.get(list.size() - 1);
            if (i >= lastGroup.x1)
                return lastGroup;
        }
        int lo = 0;
        int hi = list.size() - 1;
        while (lo <= hi) {
            int mid = (hi + lo) / 2;
            Group currGroup = list.get(mid);
            if (i < currGroup.x1) {
                hi = mid - 1;
            } else if (i > currGroup.x2) {
                lo = mid + 1;
            } else {
                // x1 <= i <= x2
                return currGroup;
            }
        }
        // intentionally swapped because: lo == hi + 1
        Group lowGroup = list.get(hi);
        Group highGroup = list.get(lo);
        return (i - lowGroup.x2) < (highGroup.x1 - i) ? lowGroup : highGroup;
    }
}
NOTE: GroupsBySize can be improved, as described by @maraca, to only contain a list of Groups for each distinct group size. I'll update tomorrow.
public static class GroupsBySize {
    private List<List<Group>> listOfGroupsBySize = new ArrayList<>();

    public GroupsBySize(List<Group> groups) {
        for (Group group : groups) {
            // ensure the internal array can hold groups up to this size
            while (listOfGroupsBySize.size() < group.size) {
                listOfGroupsBySize.add(new ArrayList<Group>());
            }
            // add the group to all lists up to its size
            for (int i = 0; i < group.size; i++) {
                listOfGroupsBySize.get(i).add(group);
            }
        }
    }

    public final Group getNearestGroupOfAtLeastSize(int index, int atLeastSize) {
        if (atLeastSize < 1)
            throw new IllegalArgumentException("group size must be greater than 0");
        List<Group> groupsOfAtLeastSize = listOfGroupsBySize.get(atLeastSize - 1);
        return Group.binarySearchNearest(index, groupsOfAtLeastSize);
    }
}
public static void main(String[] args) {
    byte[] byteArray = null; // the classification array from the question goes here
    List<Group> groups = Group.getGroupsOfZeros(byteArray);
    GroupsBySize groupsBySize = new GroupsBySize(groups);
    int index = 12;
    int atLeastSize = 5;
    Group g = groupsBySize.getNearestGroupOfAtLeastSize(index, atLeastSize);
    System.out.println("nearest group is (" + g.x1 + ":" + g.x2 + ") of size " + g.size);
}
If you have n queries on an array of size n, then the naive approach would take O(n^2) time.
You can optimize this by incorporating the observation that the number of distinct group sizes is on the order of sqrt(n): we get the most distinct group sizes if we have one group of size 1, one of size 2, one of size 3 and so on; since 1 + 2 + 3 + ... + k is k * (k + 1) / 2, i.e. on the order of k^2, and the array only has size n, the number of distinct group sizes k is on the order of sqrt(n).
1. Create an integer array of size n to denote which group sizes are present how many times.
2. Create a list for the 0-groups; each element should contain the group size and starting index.
3. Scan the array, add the 0-groups to the list and update the present group sizes.
4. Create an array for the different group sizes; each entry should contain the group size and an array with the start indices of the groups.
5. Create an integer array or a map which tells you which group size is at which index, by scanning the array of the present group sizes.
6. Go through the list of 0-groups and fill the start index arrays created at 4.
We end up with an array which takes O(n) space, takes O(n) time to create and contains all present group sizes in order, additionally each entry has an array with the start indices of the groups of that size.
To answer a query we can do a binary search on the start indices of all groups greater than or equal to the given minimum group size. This takes O(log(n)*sqrt(n)) and we do it n times, so overall it would take O(n*log(n)*sqrt(n)) = O(n^1.5*log(n)), which is better than O(n^2).
I think you can get it down to O(n^1.5) by creating a structure which has all distinct group sizes but contains not only the groups of that size, but also the groups that are bigger than that size. This would be the time complexity to create the structure, and answering all the n queries would then be faster, O(n*log(sqrt(n))*log(n)) I think, so it doesn't matter.
example:
[0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0] -- 0-indexed array
hashmap = {1:[0], 2:[15, 18], 7:[5]}
search(i = 7, n = 2) {
    binary search in {2:[15, 18], 7:[5]}
    return min(15, 5)
}
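A minimal Java sketch of that layout (class and method names are mine, and distance is measured to the group's start index for simplicity, as in the example above):

import java.util.Arrays;
import java.util.TreeMap;

public class NearestZeroGroup {
    // One sorted array of group start indices per distinct group size.
    private final TreeMap<Integer, int[]> startsBySize;

    public NearestZeroGroup(TreeMap<Integer, int[]> startsBySize) {
        this.startsBySize = startsBySize;
    }

    // Nearest group of size >= s to index i: binary search each of the
    // O(sqrt(n)) candidate lists and keep the closest hit.
    public int query(int i, int s) {
        int best = -1, bestDist = Integer.MAX_VALUE;
        for (int[] starts : startsBySize.tailMap(s).values()) {
            int p = Arrays.binarySearch(starts, i);
            if (p < 0) p = -p - 1; // insertion point
            for (int q = Math.max(0, p - 1); q < Math.min(starts.length, p + 1); q++) {
                int d = Math.abs(starts[q] - i);
                if (d < bestDist) { bestDist = d; best = starts[q]; }
            }
        }
        return best; // start index of the nearest group, or -1 if none
    }
}

For the example above, startsBySize would hold {1: [0], 2: [15, 18], 7: [5]}, and query(7, 2) scans the lists for sizes 2 and 7 and returns 5.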
what is the most efficient way to find the group of at least n zeros closest to A[i]
If we are not limited in preprocessing time and resources, the most efficient way would seem to be O(1) time and O(n * sqrt n) space, storing the answers to all possible queries. (To accomplish that, run the algorithm below with a list of all possible queries, that is each distinct zero-group size in the array paired with each index.)
If we are provided with all the n / c queries at once, we can produce the complete result set in O(n log n) time.
Traverse once from the left and once from the right. For each traversal, start with a balanced binary tree of our queries, sorted by zero-group-size (the n in the query), where each node has a sorted list of the query indexes (all i's with this particular n).
At each iteration, when a zero-group is registered, update all queries with n equal to or lower than this zero-group size, removing all equal and lower indexes from the node and keeping the records for them (since the index list is sorted, we just remove the head of the list while it's equal to or lower than the current index), and storing the current index of the zero-group in the node as well (the "last seen" zero-group-index). If no i's are left in the node, remove it.
After the traversal, assign each node's "last seen" zero-group-index to any remaining i's in that node. Now we have all the answers for this traversal. (Any queries left in the tree have no answer.) In the opposite traversal, if any query comes up with a better (closer) answer, update it in the final record.

Efficient data structure for storing a long sequence of (mostly consecutive) integers

I'd like a data structure to efficiently store a long sequence of numbers. The numbers should always be whole integers, let's say Longs.
The feature of the inputs I'd like to leverage (to claim 'efficiency') is that the longs will be mostly consecutive. There can be missing values. And the values can be interacted with out of order.
I'd like the data structure to support the following operations:
addVal(n): add a single value, n
addRange(n, m): add all values between n and m, inclusive
delVal(n): remove a single value, n
delRange(n, m): remove all values between n and m, inclusive
containsVal(n): return whether a single value, n, exists in the structure
containsRange(n, m): return whether all values between n and m, inclusive, exist in the structure
In essence this is a more specific kind of Set data structure that can leverage the continuity of the data to use less than O(n) memory, where n is the number of values stored.
To be clear, while I think an efficient implementation of such a data structure will require that we store intervals internally, that isn't visible or relevant to the user. There are some interval trees that store multiple intervals separately and allow operations to find the number of intervals that overlap with a given point or interval. But from the user perspective this should behave exactly like a set (except for the range-based operations so bulk additions and deletions can be handled efficiently).
Example:
dataStructure = ...
dataStructure.addRange(1,100) // [(1, 100)]
dataStructure.addRange(200,300) // [(1, 100), (200, 300)]
dataStructure.addVal(199) // [(1, 100), (199, 300)]
dataStructure.delRange(50,250) // [(1, 49), (251, 300)]
My assumption is this would best be implemented by some tree-based structure but I don't have a great impression about how to do that yet.
I wanted to learn if there was some commonly used data structure that already satisfies this use case, as I'd rather not reinvent the wheel. If not, I'd like to hear how you think this could best be implemented.
If you don't care about duplicates, then your intervals are non-overlapping. You need to create a structure that maintains that invariant. If you need a query like numIntervalsContaining(n) then that is a different problem.
You could use a BST that stores pairs of endpoints, as in a C++ std::set<std::pair<long,long>>. The interpretation is that each entry corresponds to the interval [n,m]. You need a weak ordering - it is the usual integer ordering on the left endpoint. A single int or long n is inserted as [n,n]. We have to maintain the property that all node intervals are non-overlapping. A brief evaluation of the order of your operations is as follows. Since you've already designated n I use N for the size of the tree.
addVal(n): add a single value, n : O(log N), same as a std::set<int>. Since the intervals are non-overlapping, you need to find the predecessor of n, which can be done in O(log N) time (break it down by cases as in https://www.quora.com/How-can-you-find-successors-and-predecessors-in-a-binary-search-tree-in-order). Look at the right endpoint of this predecessor, and extend the interval or add an additional node [n,n] if necessary, which by left-endpoint ordering will always be a right child. Note that if the interval is extended (inserting [n+1,n+1] into a tree with node [a,n], forming the node [a,n+1]) it may now bump into the next left endpoint, requiring another merge. So there are a few edge cases to consider. A little more complicated than a simple BST, but still O(log N).
addRange(n, m): O(log N), process is similar. If the interval inserted intersects nontrivially with another, merge the intervals so that the non-overlapping property is maintained. The worst case performance is O(N), as pointed out below, since there is no upper limit on the number of subintervals consumed by the one we are inserting.
delVal(n): O(log N); deleting a single value splits or truncates at most one interval.
delRange(n, m): remove all values between n and m, inclusive : O(log N), with the same O(N) worst case as addRange, since we don't know how many intervals are contained in the interval we are deleting.
containsVal(n): return whether a single value, n, exists in the structure : O(log N)
containsRange(n, m): return whether all values between n and m, inclusive, exist in the structure : O(log N)
Note that we can maintain the non-overlapping property with correct add() and addRange() methods; it is already maintained by the delete methods. We need O(n) storage at the worst.
Note that all operations are O(log N), and inserting the range [n,m] is nothing like O(m-n) or O(log(m-n)) or anything like that.
I assume you don't care about duplicates, just membership. Otherwise you may need an interval tree or KD-tree or something, but those are more relevant for float data...
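A rough Java rendering of that scheme, using a TreeMap<Long, Long> from interval start to inclusive end in place of std::set<std::pair<long,long>> (only addRange and containsVal are sketched, and the merge details are my reading of the answer rather than tested code):

import java.util.Map;
import java.util.TreeMap;

class IntervalSet {
    private final TreeMap<Long, Long> map = new TreeMap<>(); // start -> inclusive end

    void addRange(long n, long m) {
        // Absorb a predecessor interval that overlaps or touches [n, m]
        // (ignoring overflow at the extremes of the long range).
        Map.Entry<Long, Long> pred = map.floorEntry(n);
        if (pred != null && pred.getValue() >= n - 1) {
            n = pred.getKey();
            m = Math.max(m, pred.getValue());
            map.remove(pred.getKey());
        }
        // Absorb successors consumed by [n, m] -- the O(N) worst case noted above.
        Map.Entry<Long, Long> succ;
        while ((succ = map.ceilingEntry(n)) != null && succ.getKey() <= m + 1) {
            m = Math.max(m, succ.getValue());
            map.remove(succ.getKey());
        }
        map.put(n, m);
    }

    boolean containsVal(long n) {
        Map.Entry<Long, Long> e = map.floorEntry(n);
        return e != null && e.getValue() >= n;
    }
}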
Another alternative might be a rope data structure ( https://en.m.wikipedia.org/wiki/Rope_(data_structure) ), which seems to support the operations you are asking for, implemented in O(log n) time. As opposed to the example in Wikipedia, yours would store [start,end] rather than string subsequences.
What's interesting about the rope is its efficient lookup of index-within-interval. It accomplishes this by ordering all value positions from left to right - a lower to higher positioning (of which your intervals would be a straightforward representation) can be either upwards or downwards as long as the movement is to the right - as well as relying on storing subtree size, which orients current position based on the weight on the left. Engulfing partial intervals by larger encompassing intervals could be accomplished in O(log n) time by updating and unlinking relevant tree segments.
The problem with storing each interval as a (start,end) couple is that if you add a new range that encompasses N previously stored intervals, you have to destroy each of these intervals, which takes N steps, whether the intervals are stored in a tree, a rope or a linked list.
(You could leave them for automatic garbage collection, but that will take time too, and only works in some languages.)
A possible solution for this could be to store the values (not the start and end points of intervals) in an N-ary tree, where each node represents a range and stores two N-bit maps, representing N sub-ranges and whether the values in those sub-ranges are all present, all absent, or mixed. In the case of mixed, there would be a pointer to a child node which represents this sub-range.
Example: (using a tree with branching factor 8 and height 2)
full range: 0-511 ; store interval 100-300
0-511:
0- 63 64-127 128-191 192-255 256-319 320-383 384-447 448-511
0 mixed 1 1 mixed 0 0 0
64-127:
64- 71 72- 79 80- 87 88- 95 96-103 104-111 112-119 120-127
0 0 0 0 mixed 1 1 1
96-103:
96 97 98 99 100 101 102 103
0 0 0 0 1 1 1 1
256-319:
256-263 264-271 272-279 280-287 288-295 296-303 304-311 312-319
1 1 1 1 1 mixed 0 0
296-303:
296 297 298 299 300 301 302 303
1 1 1 1 1 0 0 0
So the tree would contain these five nodes:
- values: 00110000, mixed: 01001000, 2 pointers to sub-nodes
- values: 00000111, mixed: 00001000, 1 pointer to sub-node
- values: 00001111, mixed: 00000000
- values: 11111000, mixed: 00000100, 1 pointer to sub-node
- values: 11111000, mixed: 00000000
The point of storing the interval this way is that you can discard an interval without having to actually delete it. Let's say you add a new range 200-400; in that case, you'd change the range 256-319 in the root node from "mixed" to "1", without deleting or updating the 256-319 and 296-303 nodes themselves; these nodes can be kept for later re-use (or disconnected and put in a queue of re-usable sub-trees, or deleted in a programmed garbage-collection when the programme is idling or running low on memory).
When looking up an interval, you only have to go as deep down the tree as necessary; when looking up e.g. 225-275, you'd find that 192-255 is all-present, 256-319 is mixed, 256-263 and 264-271 and 272-279 are all-present, and you'd know the answer is true. Since these values would be stored as bitmaps (one for present/absent, one for mixed/solid), all the values in a node could be checked with only a few bitwise comparisons.
Re-using nodes:
If a node has a child node, and the corresponding value is later set from mixed to all-absent or all-present, the child node no longer holds relevant values (but it is being ignored). When the value is changed back to mixed, the child node can be updated by setting all its values to its value in the parent node (before it was changed to mixed) and then making the necessary changes.
In the example above, if we add the range 0-200, this will change the tree to:
- values: 11110000, mixed: 00001000, 2 pointers to sub-nodes
- (values: 00000111, mixed: 00001000, 1 pointer to sub-node)
- (values: 00001111, mixed: 00000000)
- values: 11111000, mixed: 00000100, 1 pointer to sub-node
- values: 11111000, mixed: 00000000
The second and third node now contain outdated values, and are being ignored. If we then delete the range 80-95, the value for range 64-127 in the root node is changed to mixed again, and the node for range 64-127 is re-used. First we set all values in it to all-present (because that was the previous value of the parent node), and then we set the values for 80-87 and 88-95 to all-absent. The third node, for range 96-103 remains out-of-use.
- values: 00110000, mixed: 01001000, 2 pointers to sub-nodes
- values: 11001111, mixed: 00000000, 1 pointer to sub-node
- (values: 00001111, mixed: 00000000)
- values: 11111000, mixed: 00000100, 1 pointer to sub-node
- values: 11111000, mixed: 00000000
If we then added value 100, the value for range 96-103 in the second node would be changed to mixed again, and the third node would be updated to all-absent (its previous value in the second node) and then value 100 would be set to present:
- values: 00110000, mixed: 01001000, 2 pointers to sub-nodes
- values: 11000111, mixed: 00001000, 1 pointer to sub-node
- values: 00001000, mixed: 00000000
- values: 11111000, mixed: 00000100, 1 pointer to sub-node
- values: 11111000, mixed: 00000000
At first it may seem that this data structure uses a lot of storage space compared to solutions which store the intervals as (start,end) pairs. However, let's look at the (theoretical) worst-case scenario, where every even number is present and every odd number is absent, across the whole 64-bit range:
Total range: 0 ~ 18,446,744,073,709,551,615
Intervals: 9,223,372,036,854,775,808
A data structure which stores these as (start,end) pairs would use:
Nodes: 9,223,372,036,854,775,808
Size of node: 16 bytes
TOTAL: 147,573,952,589,676,412,928 bytes
If the data structure uses nodes which are linked via (64-bit) pointers, that would add:
Data: 147,573,952,589,676,412,928 bytes
Pointers: 73,786,976,294,838,206,464 bytes
TOTAL: 221,360,928,884,514,619,392 bytes
An N-ary tree with branching factor 64 (and 16 for the last level, to get a total range of 10×6 + 1×4 = 64 bits) would use:
Nodes (branch): 285,942,833,483,841
Size of branch: 528 bytes
Nodes (leaf): 18,014,398,509,481,984
Size of leaf: 144 bytes
TOTAL: 2,745,051,201,444,873,744 bytes
which is 53.76 times less than (start,end) pair structures (or 80.64 times less including pointers).
The calculation was done with the following N-ary tree:
Branch (9 levels):
value: 64-bit map
mixed: 64-bit map
sub-nodes: 64 pointers
TOTAL: 528 bytes
Leaf:
value: 64-bit map
mixed: 64-bit map
sub-nodes: 64 16-bit maps (more efficient than pointing to sub-node)
TOTAL: 144 bytes
This is of course a worst-case comparison; the average case would depend very much on the specific input.
Here's a first code example I wrote to test the idea. The nodes have branching factor 16, so that every level stores 4 bits of the integers, and common bit depths can be obtained without different leaves and branches. As an example, a tree of depth 3 is created, representing a range of 4×4 = 16 bits.
function setBits(pattern, value, mask) { // helper function (make inline)
    return (pattern & ~mask) | (value ? mask : 0);
}

function Node(value) { // CONSTRUCTOR
    this.value = value ? 65535 : 0; // set all values to 1 or 0
    this.mixed = 0;                 // set all to non-mixed
    this.sub = null;                // no pointer array yet
}

Node.prototype.prepareSub = function(pos, mask, value) {
    if ((this.mixed & mask) == 0) {                 // not mixed, child possibly outdated
        var prev = (this.value & mask) >> pos;
        if (value == prev) return false;            // child doesn't require setting
        if (!this.sub) this.sub = [];               // create array of pointers
        if (this.sub[pos]) {
            this.sub[pos].value = prev ? 65535 : 0; // update child node values
            this.sub[pos].mixed = 0;
        }
        else this.sub[pos] = new Node(prev);        // create new child node
    }
    return true;                                    // child requires setting
}

Node.prototype.set = function(value, a, b, step) {
    var posA = Math.floor(a / step), posB = Math.floor(b / step);
    var maskA = 1 << posA, maskB = 1 << posB;
    a %= step; b %= step;
    if (step == 1) { // node is leaf
        var vMask = (maskB | (maskB - 1)) ^ (maskA - 1); // bits posA to posB inclusive
        this.value = setBits(this.value, value, vMask);
    }
    else if (posA == posB) { // only 1 sub-range to be set
        if (a == 0 && b == step - 1) { // a-b is full sub-range
            this.value = setBits(this.value, value, maskA);
            this.mixed = setBits(this.mixed, 0, maskA);
        }
        else if (this.prepareSub(posA, maskA, value)) { // child node requires setting
            var solid = this.sub[posA].set(value, a, b, step >> 4);     // recurse
            this.value = setBits(this.value, solid ? value : 0, maskA); // set value
            this.mixed = setBits(this.mixed, solid ? 0 : 1, maskA);     // set mixed
        }
    }
    else { // multiple sub-ranges to set
        var vMask = (maskB - 1) ^ (maskA | (maskA - 1)); // bits between posA and posB
        this.value = setBits(this.value, value, vMask);  // set inbetween values
        this.mixed &= ~vMask;                            // set inbetween to solid
        var solidA = true, solidB = true;
        if (a != 0 && this.prepareSub(posA, maskA, value)) {        // child needs setting
            solidA = this.sub[posA].set(value, a, step - 1, step >> 4);
        }
        if (b != step - 1 && this.prepareSub(posB, maskB, value)) { // child needs setting
            solidB = this.sub[posB].set(value, 0, b, step >> 4);
        }
        this.value = setBits(this.value, solidA ? value : 0, maskA); // set value
        this.value = setBits(this.value, solidB ? value : 0, maskB);
        if (solidA) this.mixed &= ~maskA; else this.mixed |= maskA;  // set mixed
        if (solidB) this.mixed &= ~maskB; else this.mixed |= maskB;
    }
    return this.mixed == 0 && (this.value == 0 || this.value == 65535); // true if solid
}

Node.prototype.check = function(a, b, step) {
    var posA = Math.floor(a / step), posB = Math.floor(b / step);
    var maskA = 1 << posA, maskB = 1 << posB;
    a %= step; b %= step;
    var vMask = (maskB - 1) ^ (maskA | (maskA - 1)); // bits between posA and posB
    if (step == 1) {
        vMask = posA == posB ? maskA : vMask | maskA | maskB;
        return (this.value & vMask) == vMask;
    }
    if (posA == posB) {
        var present = (this.mixed & maskA) ? this.sub[posA].check(a, b, step >> 4) : this.value & maskA;
        return !!present;
    }
    var present = (this.mixed & maskA) ? this.sub[posA].check(a, step - 1, step >> 4) : this.value & maskA;
    if (!present) return false;
    present = (this.mixed & maskB) ? this.sub[posB].check(0, b, step >> 4) : this.value & maskB;
    if (!present) return false;
    return (this.value & vMask) == vMask;
}

function NaryTree(factor, depth) { // CONSTRUCTOR
    this.root = new Node();
    this.step = Math.pow(factor, depth);
}

NaryTree.prototype.addRange = function(a, b) {
    this.root.set(1, a, b, this.step);
}
NaryTree.prototype.delRange = function(a, b) {
    this.root.set(0, a, b, this.step);
}
NaryTree.prototype.hasRange = function(a, b) {
    return this.root.check(a, b, this.step);
}

var intervals = new NaryTree(16, 3); // create tree for 16-bit range

// CREATE RANDOM DATA AND RUN TEST
document.write("Created N-ary tree for 16-bit range.<br>Randomly adding/deleting 100000 intervals...");
for (var test = 0; test < 100000; test++) {
    var a = Math.floor(Math.random() * 61440);
    var b = a + Math.floor(Math.random() * 4096);
    if (Math.random() > 0.5) intervals.addRange(a, b);
    else intervals.delRange(a, b);
}
document.write("<br>Checking a few random intervals:<br>");
for (var test = 0; test < 8; test++) {
    var a = Math.floor(Math.random() * 65280);
    var b = a + Math.floor(Math.random() * 256);
    document.write("Tree has interval " + a + "-" + b + " ? " + intervals.hasRange(a, b) + ".<br>");
}
I ran a test to check how many nodes are being created, and how many of these are active or dormant. I used a total range of 24-bit (so that I could test the worst-case without running out of memory), divided into 6 levels of 4 bits (so each node has 16 sub-ranges); the number of nodes that need to be checked or updated when adding, deleting or checking an interval is 11 or less. The maximum number of nodes in this scheme is 1,118,481.
The graph below shows the number of active nodes when you keep adding/deleting random intervals with range 1 (single integers), 1~16, 1~256 ... 1~16M (the full range).
Adding and deleting single integers (dark green line) creates active nodes up to close to the maximum 1,118,481 nodes, with almost no nodes being made dormant. The maximum is reached after adding and deleting around 16M integers (= the number of integers in the range).
If you add and delete random intervals in a larger range, the number of nodes that are created is roughly the same, but more of them are being made dormant. If you add random intervals in the full 1~16M range (bright yellow line), less than 64 nodes are active at any time, no matter how many intervals you keep adding or deleting.
This already gives an idea of where this data structure could be useful as opposed to others: the more nodes are being made dormant, the more intervals/nodes would need to be deleted in other schemes.
On the other hand, it shows how this data structure may be too space-inefficient for certain ranges, and types and amounts of input. You could introduce a dormant node recycling system, but that takes away the advantage of the dormant nodes being immediately reusable.
A lot of space in the N-ary tree is taken up by pointers to child nodes. If the complete range is small enough, you could store the tree in an array. For a 32-bit range that would take 580 MB (546 MB for the "value" bitmaps and 34 MB for the "mixed" bitmaps). This is more space-efficient because you only store the bitmaps, and you don't need pointers to child nodes, because everything has a fixed place in the array. You'd have the advantage of a tree with depth 7, so any operation could be done by checking 15 "nodes" or fewer, and no nodes need to be created or deleted during add/delete/check operations.
Here's a code example I used to try out the N-ary-tree-in-an-array idea; it uses 580MB to store a N-ary tree with branching factor 16 and depth 7, for a 32-bit range (unfortunately, a range above 40 bits or so is probably beyond the memory capabilities of any normal computer). In addition to the requested functions, it can also check whether an interval is completely absent, using notValue() and notRange().
#include <cstdint>
#include <iostream>

inline uint16_t setBits(uint16_t pattern, uint16_t mask, uint16_t value) {
    return (pattern & ~mask) | (value & mask);
}

class NaryTree32 {
    uint16_t value[0x11111111], mixed[0x01111111];

    bool set(uint32_t a, uint32_t b, uint16_t value = 0xFFFF, uint8_t bits = 28, uint32_t offset = 0) {
        uint8_t posA = a >> bits, posB = b >> bits;
        uint16_t maskA = 1 << posA, maskB = 1 << posB;
        uint16_t mask = maskB ^ (maskA - 1) ^ (maskB - 1);
        // IF NODE IS LEAF: SET VALUE BITS AND RETURN WHETHER VALUES ARE MIXED
        if (bits == 0) {
            this->value[offset] = setBits(this->value[offset], mask, value);
            return this->value[offset] != 0 && this->value[offset] != 0xFFFF;
        }
        uint32_t offsetA = offset * 16 + posA + 1, offsetB = offset * 16 + posB + 1;
        uint32_t subMask = ~(0xFFFFFFFF << bits);
        a &= subMask; b &= subMask;
        // IF SUB-RANGE A IS MIXED OR HAS WRONG VALUE
        if (((this->mixed[offset] & maskA) != 0 || (this->value[offset] & maskA) != (value & maskA))
            && (a != 0 || (posA == posB && b != subMask))) {
            // IF SUB-RANGE WAS PREVIOUSLY SOLID: UPDATE TO PREVIOUS VALUE
            if ((this->mixed[offset] & maskA) == 0) {
                this->value[offsetA] = (this->value[offset] & maskA) ? 0xFFFF : 0x0000;
                if (bits != 4) this->mixed[offsetA] = 0x0000;
            }
            // RECURSE AND IF SUB-NODE IS MIXED: SET MIXED BIT AND REMOVE A FROM MASK
            if (this->set(a, posA == posB ? b : subMask, value, bits - 4, offsetA)) {
                this->mixed[offset] |= maskA;
                mask ^= maskA;
            }
        }
        // IF SUB-RANGE B IS MIXED OR HAS WRONG VALUE
        if (((this->mixed[offset] & maskB) != 0 || (this->value[offset] & maskB) != (value & maskB))
            && b != subMask && posA != posB) {
            // IF SUB-RANGE WAS PREVIOUSLY SOLID: UPDATE SUB-NODE TO PREVIOUS VALUE
            if ((this->mixed[offset] & maskB) == 0) {
                this->value[offsetB] = (this->value[offset] & maskB) ? 0xFFFF : 0x0000;
                if (bits > 4) this->mixed[offsetB] = 0x0000;
            }
            // RECURSE AND IF SUB-NODE IS MIXED: SET MIXED BIT AND REMOVE B FROM MASK
            if (this->set(0, b, value, bits - 4, offsetB)) {
                this->mixed[offset] |= maskB;
                mask ^= maskB;
            }
        }
        // SET VALUE AND MIXED BITS THAT HAVEN'T BEEN SET YET AND RETURN WHETHER NODE IS MIXED
        if (mask) {
            this->value[offset] = setBits(this->value[offset], mask, value);
            this->mixed[offset] &= ~mask;
        }
        return this->mixed[offset] != 0 || (this->value[offset] != 0 && this->value[offset] != 0xFFFF);
    }

    bool check(uint32_t a, uint32_t b, uint16_t value = 0xFFFF, uint8_t bits = 28, uint32_t offset = 0) {
        uint8_t posA = a >> bits, posB = b >> bits;
        uint16_t maskA = 1 << posA, maskB = 1 << posB;
        uint16_t mask = maskB ^ (maskA - 1) ^ (maskB - 1);
        // IF NODE IS LEAF: CHECK BITS A TO B INCLUSIVE AND RETURN
        if (bits == 0) {
            return (this->value[offset] & mask) == (value & mask);
        }
        uint32_t subMask = ~(0xFFFFFFFF << bits);
        a &= subMask; b &= subMask;
        // IF SUB-RANGE A IS MIXED AND PART OF IT NEEDS CHECKING: RECURSE AND RETURN IF FALSE
        if ((this->mixed[offset] & maskA) && (a != 0 || (posA == posB && b != subMask))) {
            if (this->check(a, posA == posB ? b : subMask, value, bits - 4, offset * 16 + posA + 1)) {
                mask ^= maskA;
            }
            else return false;
        }
        // IF SUB-RANGE B IS MIXED AND PART OF IT NEEDS CHECKING: RECURSE AND RETURN IF FALSE
        if (posA != posB && (this->mixed[offset] & maskB) && b != subMask) {
            if (this->check(0, b, value, bits - 4, offset * 16 + posB + 1)) {
                mask ^= maskB;
            }
            else return false;
        }
        // CHECK INBETWEEN BITS (AND A AND/OR B IF NOT YET CHECKED) WHETHER SOLID AND CORRECT
        return (this->mixed[offset] & mask) == 0 && (this->value[offset] & mask) == (value & mask);
    }

public:
    NaryTree32() { // CONSTRUCTOR: INITIALISES ROOT NODE
        this->value[0] = 0x0000;
        this->mixed[0] = 0x0000;
    }
    void addValue(uint32_t a) { this->set(a, a); }
    void addRange(uint32_t a, uint32_t b) { this->set(a, b); }
    void delValue(uint32_t a) { this->set(a, a, 0); }
    void delRange(uint32_t a, uint32_t b) { this->set(a, b, 0); }
    bool hasValue(uint32_t a) { return this->check(a, a); }
    bool hasRange(uint32_t a, uint32_t b) { return this->check(a, b); }
    bool notValue(uint32_t a) { return this->check(a, a, 0); }
    bool notRange(uint32_t a, uint32_t b) { return this->check(a, b, 0); }
};

int main() {
    NaryTree32 *tree = new NaryTree32();
    tree->addRange(4294967280, 4294967295);
    std::cout << tree->hasRange(4294967280, 4294967295) << "\n";
    tree->delValue(4294967290);
    std::cout << tree->hasRange(4294967280, 4294967295) << "\n";
    tree->addRange(1000000000, 4294967295);
    std::cout << tree->hasRange(4294967280, 4294967295) << "\n";
    tree->delRange(2000000000, 4294967280);
    std::cout << tree->hasRange(4294967280, 4294967295) << "\n";
    return 0;
}
Interval trees seem to be geared toward storing overlapping intervals, while in your case that doesn't make sense. An interval tree could hold millions of small overlapping intervals, which together form only a handful of longer non-overlapping intervals.
If you want to store only non-overlapping intervals, then adding or deleting an interval may involve deleting a number of consecutive intervals that fall within the new interval. So quickly finding consecutive intervals, and efficient deletion of a potentially large number of intervals are important.
That sounds like a job for the humble linked list. When inserting a new interval, you'd:
Search the position of the new interval's starting point.
If it is inside an existing interval, go on to find the position of the end point, while extending the existing interval and deleting all intervals you pass on the way.
If it is in between existing intervals, check if the end point comes before the next existing interval. If it does, create a new interval. If the end point comes after the start of the next existing interval, change the starting point of the next interval, and then go on to find the end point as explained in the previous paragraph.
Deleting an interval would be largely the same: you truncate the intervals that the starting point and end point are inside of, and delete all the intervals in between.
The average and worst-case complexity of this are N/2 and N, where N is the number of intervals in the linked list. You could improve this by adding a method to avoid having to iterate over the whole list to find the starting point; if you know the range and distribution of the values, this could be something like a hash table; e.g. if the values are from 1 to X and the distribution is uniform, you'd store a table of length Y, where each item points to the interval that starts before the value X/Y. When adding an interval (A,B), you'd look up table[A/Y] and start iterating over the linked list from there. The choice of value for Y would be determined by how much space you want to use, versus how close you want to get to the actual position of the starting point. The complexities would then drop by a factor Y.
(If you work in a language where you can short-circuit a linked list, and just leave the chain of objects you cut loose to be garbage-collected, you could find the location of the starting point and end point independently, connect them, and skip the deletion of all the intervals inbetween. I don't know whether this would actually increase speed in practice.)
Here's a start of a code example, with the three range functions, but without further optimisation:
function Interval(a, b, n) {
    this.start = a;
    this.end = b;
    this.next = n;
}

function IntervalList() {
    this.first = null;
}

IntervalList.prototype.addRange = function(a, b) {
    if (!this.first || b < this.first.start - 1) {
        this.first = new Interval(a, b, this.first); // insert as first element
        return;
    }
    var i = this.first;
    while (a > i.end + 1 && i.next && b >= i.next.start - 1) {
        i = i.next; // locate starting point
    }
    if (a > i.end + 1) { // insert as new element
        i.next = new Interval(a, b, i.next);
        return;
    }
    var j = i.next;
    while (j && b >= j.start - 1) { // locate end point
        i.end = j.end;
        i.next = j = j.next; // discard overlapping interval
    }
    if (a < i.start) i.start = a; // update interval start
    if (b > i.end) i.end = b;     // update interval end
}

IntervalList.prototype.delRange = function(a, b) {
    if (!this.first || b < this.first.start) return; // range before first interval
    var i = this.first;
    while (i.next && a > i.next.start) i = i.next; // a in or after interval i
    if (a > i.start) { // a in interval
        if (b < i.end) { // range in interval -> split
            i.next = new Interval(b + 1, i.end, i.next);
            i.end = a - 1;
            return;
        }
        if (a <= i.end) i.end = a - 1; // truncate interval
    }
    var j = a > i.start ? i.next : i;
    while (j && b >= j.end) j = j.next; // b before or in interval j
    if (a <= this.first.start) this.first = j; // short-circuit list
    else i.next = j;
    if (j && b >= j.start) j.start = b + 1; // truncate interval
}

IntervalList.prototype.hasRange = function(a, b) {
    if (!this.first) return false; // empty list
    var i = this.first;
    while (i.next && a > i.end) i = i.next; // a before or in interval i
    return a >= i.start && b <= i.end; // range in interval ?
}

IntervalList.prototype.addValue = function(a) {
    this.addRange(a, a); // could be optimised
}
IntervalList.prototype.delValue = function(a) {
    this.delRange(a, a); // could be optimised
}
IntervalList.prototype.hasValue = function(a) {
    return this.hasRange(a, a); // could be optimised
}
IntervalList.prototype.print = function() {
    var i = this.first;
    if (i) do document.write("(" + i.start + "-" + i.end + ") "); while (i = i.next);
    document.write("<br>");
}

var intervals = new IntervalList();
intervals.addRange(100,199);
document.write("+ (100-199) → "); intervals.print();
intervals.addRange(300,399);
document.write("+ (300-399) → "); intervals.print();
intervals.addRange(200,299);
document.write("+ (200-299) → "); intervals.print();
intervals.delRange(225,275);
document.write("− (225-275) → "); intervals.print();
document.write("(150-200) ? " + intervals.hasRange(150,200) + "<br>");
document.write("(200-300) ? " + intervals.hasRange(200,300) + "<br>");
I'm surprised no one has suggested segment trees over the integer domain of stored values. (When used in geometric applications like graphics in 2d and 3d, they're called quadtrees and octrees resp.) Insert, delete, and lookup will have time and space complexity proportional to the number of bits in (maxval - minval), that is log_2 (maxval - minval), the max and min values of the integer data domain.
In a nutshell, we are encoding a set of integers in [minval, maxval]. A node at topmost level 0 represents that entire range. Each successive level's nodes represent sub-ranges of approximate size (maxval - minval) / 2^k. When a node is included, some subset of its corresponding values are part of the represented set. When it's a leaf, all of its values are in the set. When it's absent, none are.
E.g. if minval=0 and maxval=7, then the k=1 children of the k=0 node represent [0..3] and [4..7]. Their children at level k=2 are [0..1][2..3][4..5], and [6..7], and the k=3 nodes represent individual elements. The set {[1..3], [6..7]} would be the tree (levels left to right):
[0..7] -+- [0..3] -+- [0..1] --- [1]
        |          `- [2..3]
        `- [4..7] --- [6..7]
It's not hard to see that space for the tree will be O(m log (maxval - minval)) where m is the number of intervals stored in the tree.
It's not common to use segment trees with dynamic insert and delete, but the algorithms turn out to be fairly simple. It takes some care to ensure the number of nodes is minimized.
Here is some very lightly tested java code.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SegmentTree {
    // Shouldn't differ by more than Long.MAX_VALUE to prevent overflow.
    static final long MIN_VAL = 0;
    static final long MAX_VAL = Long.MAX_VALUE;

    Node root;

    static class Node {
        Node left;
        Node right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    private static boolean isLeaf(Node node) {
        return node != null && node.left == null && node.right == null;
    }

    private static Node reset(Node node, Node left, Node right) {
        if (node == null) {
            return new Node(left, right);
        }
        node.left = left;
        node.right = right;
        return node;
    }

    /**
     * Accept an arbitrary subtree rooted at a node representing a subset S of the range [lo,hi] and
     * transform it into a subtree representing S + [a,b]. It's assumed a >= lo and b <= hi.
     */
    private static Node add(Node node, long lo, long hi, long a, long b) {
        // If the range is empty, the interval tree is always null.
        if (lo > hi) return null;
        // If this is a leaf or insertion is outside the range, there's no change.
        if (isLeaf(node) || a > b || b < lo || a > hi) return node;
        // If insertion fills the range, return a leaf.
        if (a == lo && b == hi) return reset(node, null, null);
        // Insertion doesn't cover the range. Get the children, if any.
        Node left = null, right = null;
        if (node != null) {
            left = node.left;
            right = node.right;
        }
        // Split the range and recur to insert in halves.
        long mid = lo + (hi - lo) / 2;
        left = add(left, lo, mid, a, Math.min(b, mid));
        right = add(right, mid + 1, hi, Math.max(a, mid + 1), b);
        // Build a new node, coalescing to leaf if both children are leaves.
        return isLeaf(left) && isLeaf(right) ? reset(node, null, null) : reset(node, left, right);
    }

    /**
     * Accept an arbitrary subtree rooted at a node representing a subset S of the range [lo,hi] and
     * transform it into a subtree representing range(s) S - [a,b]. It's assumed a >= lo and b <= hi.
     */
    private static Node del(Node node, long lo, long hi, long a, long b) {
        // If the tree is null, we can't remove anything, so it's still null
        // or if the range is empty, the tree is null.
        if (node == null || lo > hi) return null;
        // If the deletion is outside the range, there's no change.
        if (a > b || b < lo || a > hi) return node;
        // If deletion fills the range, return an empty tree.
        if (a == lo && b == hi) return null;
        // Deletion doesn't fill the range.
        // Replace a leaf with a tree that has the deleted portion removed.
        if (isLeaf(node)) {
            return add(add(null, lo, hi, b + 1, hi), lo, hi, lo, a - 1);
        }
        // Not a leaf. Get children, if any.
        Node left = node.left, right = node.right;
        long mid = lo + (hi - lo) / 2;
        // Recur to delete in child ranges.
        left = del(left, lo, mid, a, Math.min(b, mid));
        right = del(right, mid + 1, hi, Math.max(a, mid + 1), b);
        // Build a new node, coalescing to empty tree if both children are empty.
        return left == null && right == null ? null : reset(node, left, right);
    }

    private static class NotContainedException extends Exception {};

    private static void verifyContains(Node node, long lo, long hi, long a, long b)
            throws NotContainedException {
        // If this is a leaf or query is empty, it's always contained.
        if (isLeaf(node) || a > b) return;
        // If tree or search range is empty, the query is never contained.
        if (node == null || lo > hi) throw new NotContainedException();
        long mid = lo + (hi - lo) / 2;
        verifyContains(node.left, lo, mid, a, Math.min(b, mid));
        verifyContains(node.right, mid + 1, hi, Math.max(a, mid + 1), b);
    }

    SegmentTree addRange(long a, long b) {
        root = add(root, MIN_VAL, MAX_VAL, Math.max(a, MIN_VAL), Math.min(b, MAX_VAL));
        return this;
    }

    SegmentTree addVal(long a) {
        return addRange(a, a);
    }

    SegmentTree delRange(long a, long b) {
        root = del(root, MIN_VAL, MAX_VAL, Math.max(a, MIN_VAL), Math.min(b, MAX_VAL));
        return this;
    }

    SegmentTree delVal(long a) {
        return delRange(a, a);
    }

    boolean containsVal(long a) {
        return containsRange(a, a);
    }

    boolean containsRange(long a, long b) {
        try {
            verifyContains(root, MIN_VAL, MAX_VAL, Math.max(a, MIN_VAL), Math.min(b, MAX_VAL));
            return true;
        } catch (NotContainedException expected) {
            return false;
        }
    }

    private static final boolean PRINT_SEGS_COALESCED = true;

    /** Gather a list of possibly coalesced segments for printing. */
    private static void gatherSegs(List<Long> segs, Node node, long lo, long hi) {
        if (node == null) {
            return;
        }
        if (node.left == null && node.right == null) {
            if (PRINT_SEGS_COALESCED && !segs.isEmpty() && segs.get(segs.size() - 1) == lo - 1) {
                segs.remove(segs.size() - 1);
            } else {
                segs.add(lo);
            }
            segs.add(hi);
        } else {
            long mid = lo + (hi - lo) / 2;
            gatherSegs(segs, node.left, lo, mid);
            gatherSegs(segs, node.right, mid + 1, hi);
        }
    }

    SegmentTree print() {
        List<Long> segs = new ArrayList<>();
        gatherSegs(segs, root, MIN_VAL, MAX_VAL);
        Iterator<Long> it = segs.iterator();
        while (it.hasNext()) {
            long a = it.next();
            long b = it.next();
            System.out.print("[" + a + "," + b + "]");
        }
        System.out.println();
        return this;
    }

    public static void main(String[] args) {
        SegmentTree tree = new SegmentTree()
            .addRange(0, 4).print()
            .addRange(6, 7).print()
            .delVal(2).print()
            .addVal(5).print()
            .addRange(0, 1000).print()
            .addVal(5).print()
            .delRange(22, 777).print();
        System.out.println(tree.containsRange(3, 20));
    }
}

Efficiently count occurrences of each element from given ranges

So I have some ranges like these:
2 4
1 9
4 5
4 7
For this the result should be
1 -> 1
2 -> 2
3 -> 2
4 -> 4
5 -> 3
6 -> 2
7 -> 2
8 -> 1
9 -> 1
The naive approach would be to loop through all the ranges, but that would be very inefficient; the worst case would take O(n * n).
What would be the efficient approach, probably in O(n) or O(log(n))?
Here's the solution, in O(n):
The rationale is to add a range [a, b] as a +1 at a, and a -1 after b. Then, after adding all the ranges, compute the accumulated sums for that array and display them.
If you need to perform queries while adding the values, a better choice would be to use a Binary Indexed Tree, but your question doesn't seem to require this, so I left it out.
#include <algorithm>
#include <iostream>
#define MAX 1000

using namespace std;

int T[MAX];

int main() {
    int a, b;
    int min_index = 0x1f1f1f1f, max_index = 0;
    while (cin >> a >> b) { // read ranges; indices are assumed to be >= 1
        T[a] += 1;
        T[b+1] -= 1;
        min_index = min(min_index, a);
        max_index = max(max_index, b);
    }
    for (int i = min_index; i <= max_index; i++) {
        T[i] += T[i-1];
        cout << i << " -> " << T[i] << endl;
    }
}
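As a sanity check, feeding the program the ranges from the question reproduces the expected counts (assuming it was compiled to a.out):

printf "2 4\n1 9\n4 5\n4 7\n" | ./a.out
1 -> 1
2 -> 2
3 -> 2
4 -> 4
5 -> 3
6 -> 2
7 -> 2
8 -> 1
9 -> 1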
UPDATE: Based on the "provocations" (in a good sense) by גלעד ברקן, you can also do this in O(n log n):
#include <iostream>
#include <map>
#define ull unsigned long long
#define miit map<ull, int>::iterator

using namespace std;

map<ull, int> T;

int main() {
    ull a, b;
    while (cin >> a >> b) {
        T[a] += 1;
        T[b+1] -= 1;
    }
    ull last;
    int count = 0;
    for (miit it = T.begin(); it != T.end(); it++) {
        if (count > 0)
            for (ull i = last; i < it->first; i++)
                cout << i << " " << count << endl;
        count += it->second;
        last = it->first;
    }
}
The advantage of this solution is being able to support ranges with much larger values (as long as the output isn't so large).
The solution would be pretty simple:
Generate two lists with the starting and ending indices of the ranges and sort them.
Keep a counter for the number of ranges that cover the current index. Start at the first index that is in any range and iterate over all numbers up to the last index that is in any range. Now if an index is part of the list of starting indices, we add 1 to the counter; if it's an element of the ending indices, we subtract 1 from the counter.
Implementation:
#include <set>
#include <vector>
using namespace std;

vector<int> count(int** ranges, int rangecount, int rangemin, int rangemax)
{
    vector<int> res;
    multiset<int> open, close; // multisets: several ranges may share an endpoint
    for (int** r = ranges; r < ranges + rangecount; r++)
    {
        open.insert((*r)[0]);
        close.insert((*r)[1]);
    }
    int rc = 0;
    for (int i = rangemin; i <= rangemax; i++)
    {
        rc += open.count(i);   // ranges starting at i
        res.push_back(rc);
        rc -= close.count(i);  // ranges ending at i
    }
    return res;
}
Paul's answer still counts from "the first item that is at any range and iterate[s] over all numbers to the last element that is in any range." But what if we could aggregate overlapping counts? For example, if we have three (or say a very large number of) overlapping ranges [(2,6),(1,6),(2,8)], the section (2,6) could depend only on the number of ranges, if we were to label the overlaps with their counts: [(1),3(2,6),(7,8)].
Using binary search (once for the start and a second time for the end of each interval), we could split the intervals and aggregate the counts in O(n * log m * l) time, where n is our number of given ranges and m is the number of resulting groups in the total range and l varies as the number of disjoint updates required for a particular overlap (the number of groups already within that range). Notice that at any time, we simply have a sorted list grouped as intervals with labeled count.
2 4
1 9
4 5
4 7
=>
(2,4)
(1),2(2,4),(5,9)
(1),2(2,3),3(4),2(5),(6,9)
(1),2(2,3),4(4),3(5),2(6,7),(8,9)
So you want the output to be an array, where the value of each element is the number of input ranges that include it?
Yeah, the obvious solution would be to increment every element in the range by 1, for each range.
I think you can get more efficient if you sort the input ranges by start (primary), end (secondary). So for 32bit start and end, start:end can be a 64bit sort key. Actually, just sorting by start is fine, we need to sort the ends differently anyway.
Then you can see how many ranges you enter for an element, and (with a pqueue of range-ends) see how many you already left.
# pseudo-code with possible bugs.
# TODO: peek or put-back the element from ranges / ends
#       that made the condition false.
pqueue ends;    // priority queue
int depth = 0;  // how many ranges contain this element
for i in output.len {
    while (r = ranges.next && r.start <= i) {
        ends.push(r.end);
        depth++;
    }
    while (ends.pop < i) {
        depth--;
    }
    output[i] = depth;
}
assert ends.empty();
Actually, we can just sort the starts and ends separately into two separate priority queues. There's no need to build the pqueue on the fly. (Sorting an array of integers is more efficient than sorting an array of structs by one struct member, because you don't have to copy around as much data.)
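A compact Java rendering of that two-sorted-arrays idea (my own sketch of the pseudocode above, with the min/max bounds passed in explicitly):

import java.util.Arrays;

public class RangeCoverage {
    // Count, for each element in [min, max], how many ranges contain it,
    // using separately sorted start and end positions instead of a pqueue.
    static int[] countCoverage(int[][] ranges, int min, int max) {
        int n = ranges.length;
        int[] starts = new int[n], ends = new int[n];
        for (int k = 0; k < n; k++) {
            starts[k] = ranges[k][0];
            ends[k] = ranges[k][1];
        }
        Arrays.sort(starts);
        Arrays.sort(ends);
        int[] out = new int[max - min + 1];
        int s = 0, e = 0, depth = 0;
        for (int i = min; i <= max; i++) {
            while (s < n && starts[s] <= i) { s++; depth++; } // ranges entered
            while (e < n && ends[e] < i) { e++; depth--; }    // ranges left
            out[i - min] = depth;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] ranges = {{2, 4}, {1, 9}, {4, 5}, {4, 7}};
        System.out.println(Arrays.toString(countCoverage(ranges, 1, 9)));
        // prints [1, 2, 2, 4, 3, 2, 2, 1, 1], matching the question's example
    }
}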

Converting A Recursive Function into a Non-Recursive Function

I'm trying to convert a recursive function into a non-recursive solution in pseudocode. The reason why I'm running into problems is that the method has two recursive calls in it.
Any help would be great. Thanks.
void mystery(int a, int b) {
    if (b - a > 1) {
        int mid = roundDown(a + b) / 2;
        print mid;
        mystery(a, mid);
        mystery(mid + 1, b);
    }
}
This one seems more interesting: it will result in displaying all numbers from a to (b-1) in an order specific to the recursive function. Note that all of the "left" midpoints get printed before any "right" midpoints.
void mystery(int a, int b) {
    if (b > a) {
        int mid = roundDown(a + b) / 2;
        print mid;
        mystery(a, mid);
        mystery(mid + 1, b);
    }
}
For example, if a = 0, and b = 16, then the output is:
8 4 2 1 0 3 6 5 7 12 10 9 11 14 13 15
The textbook method to turn a recursive procedure into an iterative one is simply to replace the recursive call with a stack and run a "do loop" until the stack is empty.
Try the following:
push(0, 16); /* Prime the stack */
call mystery;
...
void mystery {
    do until stackempty() {       /* iterate until stack is empty */
        pop(a, b)                 /* pop and process... */
        do while (b - a >= 1) {   /* run the "current" values down... */
            int mid = roundDown(a+b)/2;
            print mid;
            push(mid+1, b);       /* push in place of recursive call */
            b = mid;
        }
    }
}
The original function had two recursive calls, so why only a single stack? Ignore the requirements of the second recursive call and you can easily see that the first recursive call (mystery(a, mid);) could be implemented as a simple loop where b assumes the value of mid on each iteration - nothing else needs to be "remembered". So turn it into a loop, and simply push the parameters needed for the second recursive call onto a stack; add an outer loop to run the stack down. Done.
With a bit of creative thinking, any recursive function can be turned into an iterative one using stacks.
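As a concrete, runnable illustration of the transformation (a Java sketch of the pseudocode above, applied to the b > a version of mystery):

import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeMystery {
    static void mystery(int a, int b) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{a, b});                // prime the stack
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            while (hi - lo >= 1) {                  // run the "current" values down
                int mid = (lo + hi) / 2;            // roundDown
                System.out.print(mid + " ");
                stack.push(new int[]{mid + 1, hi}); // in place of the second call
                hi = mid;                           // in place of the first call
            }
        }
        System.out.println();
    }

    public static void main(String[] args) {
        mystery(0, 16); // prints: 8 4 2 1 0 3 6 5 7 12 10 9 11 14 13 15
    }
}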
This is what is happening: you have a long rod, and you are dividing it into two. Then you take these two parts and divide each into two. You do this with each sub-part until the length of that part becomes 1.
How would you do that?
Assume you have to break the rod at its mid point. We will put the marks to cut in bins for further cuts. Note: each part spawns two new parts, so we need 2n boxes to store the sub-parts.
len = pow(2, b-a+1)   // +1 might not be needed
ar = int[len]         // large array will memoize my marks to cut
ar[0] = a             // first mark
ar[1] = b             // last mark
start_ptr = 0         // will start from this point
end_ptr = 1           // will end at this point
new_end = end_ptr     // our end point will move for cuts

while true:                       // loop endlessly; I do not know, maybe there is a limit
    while start_ptr < end_ptr:    // look into bins
        i = ar[start_ptr]
        j = ar[start_ptr+1]       // pair-wise ends
        if j - i > 1              // if lengthier than unit length, then add new marks
            mid = floor( (i+j) / 2 )   // this is my mid
            print mid
            ar[++new_end] = i          // first mark        --|
            ar[++new_end] = mid - 1    // mid - 1 mark      --+-- these two create one pair
            ar[++new_end] = mid + 1    // 2nd half 1st mark --|
            ar[++new_end] = j          // end mark          --+-- these two create 2nd pair
        start_ptr = start_ptr + 2      // jump to next two ends
    if end_ptr == new_end   // if we passed all the pairs and no new pair
        break               // was created, we are done.
    else
        end_ptr = new_end   // else, start on the sub-problem
PS: I haven't tried this code. This is just pseudocode; it seems to me that it should do the job. Let me know if you try it out - it will validate my algorithm. It is basically a binary tree laid out in an array.
This example recursively splits a range of numbers until the range is reduced to a single value. The output shows the structure of the numbers: the single values are output in order, but grouped according to the left-side-first splitting.
void split(int a, int b)
{
    int m;
    if ((b - a) < 2) {  /* if size == 1, return */
        printf(" | %2d", a);
        return;
    }
    m = (a + b) / 2;    /* else split array */
    printf("\n%2d %2d %2d", a, m, b);
    split(a, m);
    split(m, b);
}

Efficiently sort an array of integers when having extra information about the final sorted array

Suppose we have this array of integers called data
3
2
4
5
2
Also we have the following array of the same size called info
1
4
0
2
3
Each value of info represents an index into the first array. So, for example, the first value is 1, which means that position 0 of the final sorted array will have the value data[info[0]].
By following this logic the final sorted array will be the following:
data[info[0]] => 2
data[info[1]] => 2
data[info[2]] => 3
data[info[3]] => 4
data[info[4]] => 5
I would like to make an in place sorting of the data array, without using any extra memory of size N where N is the size of the data array. In addition I would like the amount of total operations to be as small as possible.
I've been trying to think of a solution to my problem; however, I couldn't think of anything that wouldn't use extra memory. Keep in mind that these are my own restrictions for a system that I'm implementing; if these restrictions can't be kept, then I will probably have to think of something else.
Any ideas would be appreciated.
Thank you in advance
Why not simply

for i in 0..n-1:
    info[i] := data[info[i]]

and info now holds the sorted array? If it must be in data, just copy it back next:

for i in 0..n-1:
    data[i] := info[i]

2*n copies, overall.
If the info array need not remain intact, you can use that as additional storage and sort in O(n):
for (int i = 0; i < n; ++i) {
    int where = info[i];
    if (where == i) continue;
    info[i] = data[i];
    data[i] = i < where ? data[where] : info[where];
}
If an element of data is already in its correct place, we skip that index. Otherwise, remember the element in the info array, and write the correct element into data, fetching it from data if it comes from a larger index, and from info if it comes from a smaller index.
Of course that simple method requires the types of the info and data arrays to be the same, and in general does 3*n copies.
If the data elements cannot be stored in the info array, we can follow the cycles in info:
for (int i = 0; i < n; ++i) {
    // Check if this is already in the right place; if so, mark as done
    if (info[i] == i) info[i] = -1;
    // Nothing to do if we already treated this index
    if (info[i] < 0) continue;
    // New cycle we haven't treated yet
    Stuff temp = data[i]; // remember value ("Stuff" is the element type)
    int j = info[i], k = i;
    while (j != i) {
        // copy the right value into data[k]
        data[k] = data[j];
        // mark index k as done
        info[k] = -1;
        // get next indices
        k = j;
        j = info[j];
    }
    // Now close the cycle
    data[k] = temp;
    info[k] = -1;
}
That does n - F + C copies of data elements, where F is the number of elements that already were in the right place (fixed points of the sorting permutation) and C is the number of cycles of length > 1 in the sorting permutation. That means the number of copies is at most 3*n/2.
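A small runnable check of the cycle-following version against the question's example (a Java sketch, with int data standing in for Stuff):

import java.util.Arrays;

public class ApplyPermutation {
    // In place: afterwards data[p] holds the old data[info[p]]; info is consumed.
    static void sortByInfo(int[] data, int[] info) {
        for (int i = 0; i < info.length; ++i) {
            if (info[i] == i) info[i] = -1; // already in place
            if (info[i] < 0) continue;      // index already treated
            int temp = data[i];             // value displaced at the cycle start
            int j = info[i], k = i;
            while (j != i) {
                data[k] = data[j];
                info[k] = -1;
                k = j;
                j = info[j];
            }
            data[k] = temp;                 // close the cycle
            info[k] = -1;
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 2, 4, 5, 2};
        int[] info = {1, 4, 0, 2, 3};
        sortByInfo(data, info);
        System.out.println(Arrays.toString(data)); // [2, 2, 3, 4, 5]
    }
}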
