I am reading some text which claims this regarding the ordering of the two recursive Quicksort calls:
... it is important to call the smaller subproblem first, this in conjunction with tail recursion ensures that the stack depth is log n.
I am not at all sure what that means, why should I call Quicksort on the smaller subarray first?
Look at quicksort as an implicit binary tree. The pivot is the root, and the left and right subtrees are the partitions you create.
Now consider doing a depth-first search of this tree. The recursive calls correspond exactly to a depth-first search on the implicit tree described above. Also assume that the tree always has the smaller subtree as the left child, so the suggestion is in fact to do a preorder traversal of this tree, visiting the smaller child first.
Now suppose you implement the preorder using a stack, where you push only the left child (but keep the parent on the stack), and when the time comes to push the right child (say you maintained some state telling you whether a node's left child has been explored), you replace the top of the stack instead of pushing the right child. This replacement corresponds to the tail-recursion part.
The maximum stack depth is the maximum 'left depth': i.e. if you mark each edge going to a left child as 1, and going to a right child as 0, then you are looking at the path with maximum sum of edges (basically you don't count the right edges).
Now since the left subtree has no more than half the elements, each time you go left (i.e. traverse an edge marked 1), you reduce the number of nodes left to explore by at least half.
Thus the maximum number of edges marked 1 that you see is no more than log n.
Thus the stack usage is no more than log n, if you always pick the smaller partition, and use tail recursion.
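Concretely, the shape this argument describes looks like the following minimal C++ sketch (the Hoare partition around the middle element is an illustrative assumption, not something fixed by the question): the genuine recursive call only ever descends into the smaller side, and the tail call on the larger side is written as the loop that tail-recursion elimination would produce.

#include <algorithm>
#include <vector>

// Hoare partition around the middle element (illustrative choice).
// Afterwards a[lo..p] <= pivot <= a[p+1..hi].
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo - 1, j = hi + 1;
    while (true) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

void quicksort(std::vector<int>& a, int lo, int hi) {
    while (lo < hi) {
        int p = partition(a, lo, hi);
        if (p - lo < hi - p) {        // left side is no larger than the right
            quicksort(a, lo, p);      // recurse only into the smaller side
            lo = p + 1;               // tail call on the larger side becomes a loop
        } else {
            quicksort(a, p + 1, hi);
            hi = p;
        }
    }
}

Since every recursive call receives at most half the elements, the call depth is bounded by log2 n even on adversarial inputs; the loop handles the rest without growing the stack.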
Some language implementations perform tail-call elimination (often loosely called "tail recursion"). This means that if you write f(x) { ... ... .. ... .. g(x) } then the final call, to g(x), isn't implemented with a function call at all, but with a jump, so that the final call does not use any stack space.
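As a tiny illustration of that transformation (using gcd only because it makes the jump obvious; it is not from the quoted text):

int gcdRecursive(int a, int b) {
    if (b == 0) return a;
    return gcdRecursive(b, a % b);  // tail call: nothing happens after it returns
}

// What a tail-call-eliminating compiler effectively produces: the call
// becomes a jump back to the top, so stack usage stays constant.
int gcdIterative(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}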
Quicksort splits the data to be sorted into two sections. If you always handle the shorter section first, then each call that consumes stack space has a section of data to sort that is at most half the size of the recursive call that called it. So if you start off with 10 elements to sort, the stack at its deepest will have a call sorting those 10 elements, and then a call sorting at most 5 elements, and then a call sorting at most 2 elements, and then a call sorting at most 1 element - and then, for 10 elements, the stack cannot go any deeper - the stack size is limited by the log of the data size.
If you didn't worry about this, you could end up with the stack holding a call sorting 10 elements, and then a call sorting 9 elements, and then a call sorting 8 elements, and so on, so that the stack was as deep as the number of elements to be sorted. But this can't happen with tail recursion if you sort the short sections first, because although you can split 10 elements into 1 element and 9 elements, the call sorting 9 elements is done last of all and implemented as a jump, which doesn't use any more stack space - it reuses the stack space previously used by its caller, which was just about to return anyway.
Ideally, the list partitions into two sublists of roughly similar size. It doesn't matter much which sublist you work on first.
But on a bad day the list partitions in the most lopsided way possible: a sublist of two or three items, maybe four, and a sublist nearly as long as the original. This could be due to bad choices of partition value or wickedly contrived data. Imagine what would happen if you worked on the bigger sublist first. The first invocation of Quicksort holds the pointers/indices for the short list in its stack frame while recursively calling quicksort for the long list. This too partitions badly into a very short list and a long one, and we do the longer sublist first, repeat...
Ultimately, on the baddest of bad days with the wickedest of wicked data, we'll have stack frames built up in number proportional to the original list length. This is quicksort's worst case behavior, O(n) depth of recursive calls. (Note we are talking of quicksort's depth of recursion, not performance.)
Doing the shorter sublist first gets rid of it fairly quickly. We still process a large number of tiny lists, in proportion to the original list length, but now each one is taken care of by one or two shallow recursive calls. We still make O(n) calls (performance), but each is depth O(1).
Surprisingly, this turns out to be important even when quicksort is not confronted with wildly unbalanced partitions, and even when introsort is actually being used.
The problem arises (in C++) when the values in the container being sorted are really big. By this, I don't mean that they point to really big objects, but that they are themselves really big. In that case, some (possibly many) compilers will make the recursive stack frame quite big, too, because it needs at least one temporary value in order to do a swap. Swap is called inside of partition, which is not itself recursive, so you would think that the quicksort recursive driver would not require the monster stack-frame; unfortunately, partition usually ends up being inlined because it's nice and short, and not called from anywhere else.
Normally the difference between 20 and 40 stack frames is negligible, but if the values weigh in at, say, 8kb, then the difference between 20 and 40 stack frames could mean the difference between working and stack overflow, if stacks have been reduced in size to allow for many threads.
If you use the "always recurse into the smaller partition" algorithm, the stack cannot every exceed log2 N frames, where N is the number of elements in the vector. Furthermore, N cannot exceed the amount of memory available divided by the size of an element. So on a 32-bit machine, the there could only be 219 8kb elements in a vector, and the quicksort call depth could not exceed 19.
In short, writing quicksort correctly makes its stack usage predictable (as long as you can predict the size of a stack frame). Not bothering with the optimization (to save a single comparison!) can easily cause the stack depth to double even in non-pathological cases, and in pathological cases it can get a lot worse.
Related
Why is there no information in Google / Wikipedia about the unrolled skip list, i.e. a combination of an unrolled linked list and a skip list?
Probably because it wouldn't typically give you much of a performance improvement, if any, and it would be somewhat involved to code correctly.
First, the unrolled linked list typically uses a pretty small node size. As the Wikipedia article says: " just large enough so that the node fills a single cache line or a small multiple thereof." On modern Intel processors, a cache line is 64 bytes. Skip list nodes have, on average, two pointers per node, which means an average of 16 bytes per node for the forward pointers. Plus whatever the data for the node is: 4 or 8 bytes for a scalar value, or 8 bytes for a reference (I'm assuming a 64 bit machine here).
So figure 24 bytes, total, for an "element." Except that the elements aren't fixed size. They have a varying number of forward pointers. So you either need to make each element a fixed size by allocating an array for the maximum number of forward pointers for each element (which for a skip list with 32 levels would require 256 bytes), or use a dynamically allocated array that's the correct size. So your element becomes, in essence:
struct UnrolledSkipListElement
{
    void* data;                                  // 64-bit pointer to data item
    UnrolledSkipListElement** forward_pointers;  // dynamically allocated array of forward pointers
};
That would reduce your element size to just 16 bytes. But then you lose much of the cache-friendly behavior that you got from unrolling. To find out where you go next, you have to dereference the forward_pointers array, which is going to incur a cache miss, and therefore eliminate the savings you got by doing the unrolling. In addition, that dynamically allocated array of pointers isn't free: there's some (small) overhead involved in allocating that memory.
If you can find some way around that problem, you're still not going to gain much. A big reason for unrolling a linked list is that you must visit every node (up to the node you find) when you're searching it. So any time you can save with each link traversal adds up to very big savings. But with a skip list you make large jumps. In a perfectly organized skip list, for example, you could skip half the nodes on the first jump (if the node you're looking for is in the second half of the list). If your nodes in the unrolled skip list only contain four elements, then the only savings you gain will be at levels 0, 1, and 2. At higher levels you're skipping more than three nodes ahead and as a result you will incur a cache miss.
So the skip list isn't unrolled because it would be somewhat involved to implement and it wouldn't give you much of a performance boost, if any. And it might very well cause the list to be slower.
Linked list complexity is O(N)
Skip list complexity is O(Log N)
Unrolled linked list complexity can be calculated as follows:
O(N / (M / 2) + Log M) = O(2N/M + Log M)
where M is the number of elements in a single node.
Because Log M is not significant,
the unrolled linked list complexity is O(N/M).
If we were to combine a skip list with an unrolled linked list, the new complexity would be
O(Log N + "something from the unrolled linked list, such as N1/M")
This means the "new" complexity will not be as good as one might first think. The new complexity might even be worse than the original O(Log N), and the implementation will be more complex as well. So the gain is questionable and rather dubious.
Also, since a single node will hold lots of data but only a single "forward" array, the "tree" will not be so well balanced either, and this will ruin the O(Log N) part of the equation.
I am reading a quick sort implementation using a stack at the following link.
link
My question is regarding the following paragraph.
The policy of putting the larger of the small subfiles on the stack
ensures that each entry on the stack is no more than one-half of the
size of the one below it, so that the stack needs to contain room for
only about lg N entries. This maximum stack usage occurs when the
partition always falls at the center of the file. For random files,
the actual maximum stack size is much lower; for degenerate files it
is likely to be small.
This technique does not necessarily work in a truly recursive
implementation, because it depends on end- or tail-recursion removal.
If the last action of a procedure is to call another procedure, some
programming environments will arrange things such that local variables
are cleared from the stack before, rather than after, the call.
Without end-recursion removal, we cannot guarantee that the stack size
will be small for quicksort.
What does the author mean by "that each entry on the stack is no more than one-half of the size of the one below it"? Could you please give an example of this?
How did the author come to the conclusion that the stack needs space for only about lg N entries?
What does the author mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort"?
Thanks for your time and help.
The policy of putting the larger of the small subfiles on the stack ensures that each entry on the stack is no more than one-half of the size of the one below it,
That is not quite true. Say you want to sort a 100-element array, and the first pivot goes right in the middle. Then you have a stack
49
50
then you pop the 49-element part off the stack, partition it, and push the two parts onto the stack. Let's say the choice of pivot was not quite as good this time: there were 20 elements not larger than the pivot. Then you'd get the stack
20
28
50
and each stack entry is more than half of the one below.
But that cannot continue forever, and we have
During the entire sorting, if stack level k is occupied, its size is at most total_size / (2^k).
That is obviously true when the sorting begins, since then there is only one element on the stack, at level 0, which is the entire array of size total_size.
Now, assume the stated property holds on entering the loop (while(!stack.empty())).
A subarray of length s is popped from stack level m. If s <= 1, nothing else is done before the next loop iteration, and the invariant continues to hold. Otherwise, if s >= 2, after partitioning there are two new subarrays to be pushed onto the stack, with s-1 elements between them. The smaller of the two has size smaller_size <= (s-1)/2, and the larger has size larger_size <= s-1. Stack level m will be occupied by the larger of the two, and we have
larger_size <= s-1 < s <= total_size / (2^m)
smaller_size <= (s-1)/2 < s/2 <= total_size / (2^(m+1))
for stack levels m and m+1, respectively, at the end of the loop body. The invariant holds for the next iteration.
Since at most one subarray of size 0 is ever on the stack (it is then immediately popped off in the next iteration), there are never more than lg total_size + 1 stack levels occupied.
Regarding
What does the author mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort"?
In a recursive implementation, you can have deep recursion, and when the stack frame is not reused for the end-call, you may need linear stack space. Consider a stupid pivot selection, always choosing the first element as pivot, and an already sorted array.
[0,1,2,3,4]
partition, pivot goes in position 0, the smaller subarray is empty. The recursive call for the larger subarray [1,2,3,4], allocates a new stack frame (so there are now two stack frames). Same principle, the next recursive call with the subarray [2,3,4] allocates a third stack frame, etc.
If one has end-recursion removal, i.e. the stack frame is reused, one has the same guarantees as with the manual stack above.
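To make this concrete, here is a minimal C++ sketch of the manual-stack quicksort the quoted text describes (the Lomuto partition with a last-element pivot is my illustrative choice, not necessarily what the linked implementation uses). The larger subfile is pushed and work continues on the smaller one, so the subfile being worked on at least halves between consecutive pushes, which gives the "about lg N entries" bound.

#include <stack>
#include <utility>
#include <vector>

// Lomuto partition, pivot = last element (illustrative choice).
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    return i;
}

void quicksort(std::vector<int>& a) {
    std::stack<std::pair<int, int>> stk;   // (lo, hi) subfiles awaiting work
    if (!a.empty()) stk.push({0, (int)a.size() - 1});
    while (!stk.empty()) {
        auto [lo, hi] = stk.top();
        stk.pop();
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {         // right side is larger:
                stk.push({p + 1, hi});     //   push it for later
                hi = p - 1;                //   keep working on the smaller left side
            } else {                       // left side is larger:
                stk.push({lo, p - 1});
                lo = p + 1;
            }
        }
    }
}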
I will try to answer your question (hopefully I am not wrong)...
At every step, quicksort divides your input into two parts. If you always recurse into the part that is at most half the size, you can only halve log N times before reaching size 1. This is where the "each entry on the stack is no more than one-half" and the "about lg N entries" in your first and second questions come from.
I found a variant of Heapsort using multiple heaps at http://students.ceid.upatras.gr/~lebenteas/Heapsort-using-Multiple-Heaps-final.pdf. The solution proposes that instead of the traditional Heapsort algorithm, where after each swap we do another siftdown to bring the highest value in the current heap to the root, we can do some other things. However, I cannot understand what exactly they mean by 'other things'.
For example, at one point they say We "forget", for the time being, the existence of the root. That surely means we are currently stalling the swapping of the highest element with the last element of the heap. However, just a few lines later, they say So far, two elements have been transferred in the sorted part of the heap., which runs counter to the proposition that the swapping hasn't been done yet. Also, in the figure on page 97, the node with value 1 is missing; I don't know how.
Can anybody give me an idea of what exactly the authors are trying to convey, and how worthwhile it can be?
(The line you asked about is in section 2.3, so I will explain the variation of heapsort which is proposed in section 2.3:)
When the author says we "forget" the existence of the root, this does not mean that they are stalling the swapping of the highest element. The swap is done, but they temporarily delay rebuilding the heap. After swapping the highest element into the root position, they compare the roots of the 2 subheaps, and swap one or the other with the next-highest element. Then, after doing 2 swaps (rather than 1), they rebuild the heap.
Then they take this idea a step further in sections 3 and 4, and propose another variant of heapsort, which uses more than one heap.
How do you keep more than one heap in an array? (To make it concrete, let's talk about 2 heaps.) Well, how do you keep a single heap? The root goes at index 0, its children are at 1 and 2, then the children of the left subheap are at 3 and 4, etc., right?
To put 2 heaps together in an array, keep the 2 roots at 0 and 1. The children of the first root go at 2 and 3, then the children of the 2nd root at 4 and 5... with such an arrangement, it is still possible to navigate up and down the tree by doing simple arithmetic operations on indexes.
The standard heapsort repeats 2 steps: swap the root with the last element in the "heap" area, then siftDown to rebuild the heap. This heapsort repeats the following 3 steps: compare the 2 roots to see which one is bigger, swap that one with the last element in the "heap" area, then call siftDown on the appropriate heap.
This requires an extra compare at each step, but the siftDown operations work on slightly shallower heaps, which saves more than a single compare.
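Here is a rough C++ sketch of how the two-heap layout and the modified sort loop could look. The index arithmetic (children of node i at 2*i+2 and 2*i+3, parent at (i-2)/2, roots at 0 and 1) is my reconstruction of the layout described above, not code taken from the paper.

#include <algorithm>
#include <vector>

// Sift down within one of the two interleaved heaps. In this layout the
// children of node i are at 2*i + 2 and 2*i + 3.
void siftDown(std::vector<int>& a, int i, int end) {
    while (true) {
        int l = 2 * i + 2, r = 2 * i + 3;
        int largest = i;
        if (l < end && a[l] > a[largest]) largest = l;
        if (r < end && a[r] > a[largest]) largest = r;
        if (largest == i) break;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

void twoHeapSort(std::vector<int>& a) {
    int n = (int)a.size();
    if (n < 2) return;
    // Build both heaps at once; the parent of index i is (i - 2) / 2.
    for (int i = (n - 3) / 2; i >= 0; --i) siftDown(a, i, n);
    for (int end = n; end > 1; --end) {
        int root = (a[1] > a[0]) ? 1 : 0;  // extra compare: which root is bigger?
        std::swap(a[root], a[end - 1]);    // move it to the sorted area
        if (root < end - 1) siftDown(a, root, end - 1);
    }
}

Each extraction costs one extra compare between the two roots, but siftDown runs on heaps of half the size, i.e. one level shallower, which is the trade-off mentioned above.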
As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion, and getMin. There are NO SPACE CONSTRAINTS.
I have searched SO for the same, and all I have found covers insertion and deletion in O(1) time (even a stack does that). The previous posts I saw on Stack Overflow all just say hashing.
From my analysis: for getMin in O(1) time we can use a heap data structure;
for insertion and deletion in O(1) time we have a stack.
So in order to achieve my goal, I think I need to combine the heap data structure and the stack somehow.
How would I add a hashing technique to this situation?
If I use a hash table, what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.
If you go with your initial assumption that insertion and deletion are O(1) complexity (if you only want to insert at the top and delete/pop from the top, then a stack works fine), then in order to have getMin return the minimum value in constant time you need to store the min somehow. If you just had a member variable keep track of the min, what would happen when it was deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this, you could have each element in the stack record what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
    int value;   // the element stored at this position
    int min;     // the minimum of this node and everything below it
    Node *next;
};
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push in the value 2 (to an empty stack), this is the min because it's the first number, keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack, 5>2 so the min is the same push {5,2}, now you have {5,2}->{2,2}. Then you push 1 in, 1<2 so the new min is 1, push {1, 1}, now it's {1,1}->{5,2}->{2,2} etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1, your new min would be 2, as it should be. And all of your operations are still in constant time, because you just added a comparison and another value to the node. (You could use something like C++'s top() (peek() in Java), or just use a pointer to the head of the list, to look at the top of the stack and grab the min there; it gives you the min of the stack in constant time.)
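A minimal C++ sketch of this approach (the wrapper type and method names are mine, and the Node definition is repeated so the sketch is self-contained), just to make the bookkeeping concrete:

#include <algorithm>
#include <stdexcept>

struct Node {
    int value;
    int min;     // minimum of this node and everything below it
    Node* next;
};

struct MinStack {
    Node* head = nullptr;

    void push(int v) {
        int m = head ? std::min(v, head->min) : v;  // new min: smaller of v and old min
        head = new Node{v, m, head};
    }
    int pop() {
        if (!head) throw std::runtime_error("empty stack");
        Node* n = head;
        int v = n->value;
        head = n->next;
        delete n;
        return v;
    }
    int getMin() const {
        if (!head) throw std::runtime_error("empty stack");
        return head->min;   // the top node already knows the minimum
    }
};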
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
create a HashMap, call it main, where you store all the elements
create a LinkedHashMap (Java has one), call it mins, where you track the minimum values.
the first time you insert an element into main, add it to mins as well.
for every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
when you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to @Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hash table you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would seem that the worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? Because not all the elements take the worst case! Indeed, only the insertions where doubling occurs take the worst case. Therefore, inserting n elements takes n + sum(2^i for i = 0 to lg n), which gives n + 2n = O(n), so that O(n)/n = O(1) per insertion!
Why not apply the same principle to the LinkedHashMap? You have to reload all the elements anyway! So, each time you double main, put all the elements of main into mins as well, and sort them in mins. Then for all other cases proceed as above (the bulleted steps).
A hashtable gives you insertion and deletion in O(1) (a stack does not, because you can't have holes in a stack). But you can't have getMin in O(1) too, because comparison-based ordering of your elements can't be faster than O(n*Log(n)) (it is a theorem), which means O(Log(n)) for each element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for insertion, deletion, and getMin, where L is the length of the string (or whatever) you're looking for. It is of constant complexity with respect to n (the number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
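For example, a bitwise trie over fixed-width integer keys (a sketch assuming 32-bit keys; the layout is mine, not from the answer above) walks at most 32 nodes per operation, i.e. O(1) with respect to n. Deletion, omitted here, walks one path the same way, typically with a per-node count to know when to prune.

#include <cstdint>

struct TrieNode {
    TrieNode* child[2] = {nullptr, nullptr};
};

// Insert a key, walking from the most significant bit down and creating
// nodes as needed: at most 32 steps.
void trieInsert(TrieNode* root, uint32_t key) {
    TrieNode* cur = root;
    for (int b = 31; b >= 0; --b) {
        int bit = (key >> b) & 1;
        if (!cur->child[bit]) cur->child[bit] = new TrieNode();
        cur = cur->child[bit];
    }
}

// The minimum key is found by preferring the 0-branch at every level.
// Assumes at least one key has been inserted.
uint32_t trieGetMin(const TrieNode* root) {
    uint32_t key = 0;
    const TrieNode* cur = root;
    for (int b = 31; b >= 0; --b) {
        int bit = cur->child[0] ? 0 : 1;
        key = (key << 1) | (uint32_t)bit;
        cur = cur->child[bit];
    }
    return key;
}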
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T, place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j) (i.e. number all possible values of type T in order).
Create an array of that size.
Make the elements of the array:
struct PT
{
    T t;               // the value itself
    PT* next_higher;   // next occupied slot in sorted (index) order
    PT* prev_lower;    // previous occupied slot in sorted (index) order
};
Insert and delete elements in the array, maintaining doubly linked list storage (in order of index, and hence in sorted order).
This will give you constant getMin and delete.
For insertion you need to find the next occupied element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed number, x, of lookups to update and search, so this gives constant-time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
In your problem statement you say "insertion and deletion in O(1) time we have stack...",
so I am assuming deletion = pop().
In that case, use another stack to track the min (a code sketch follows the steps below):
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
if stack 2 is empty or the new item <= stack2.peek(), push it onto stack 2 as well (using <=, not <, so that duplicate minima are tracked correctly)
objective: at any point in time stack2.peek() gives you the min in O(1)
Deletion
pop() from stack 1.
if popped element equals stack2.peek(), pop() from stack 2 as well
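A small C++ sketch of this two-stack algorithm (the class and method names are mine; note the <= on push, which keeps deletion correct when equal minima occur):

#include <stack>
#include <stdexcept>

class MinTrackingStack {
    std::stack<int> values;  // stack 1: all values
    std::stack<int> mins;    // stack 2: current minimum on top
public:
    void push(int v) {
        values.push(v);
        if (mins.empty() || v <= mins.top())  // <= so duplicate minima are tracked
            mins.push(v);
    }
    int pop() {
        if (values.empty()) throw std::runtime_error("empty stack");
        int v = values.top();
        values.pop();
        if (v == mins.top()) mins.pop();      // the min leaves with this element
        return v;
    }
    int getMin() const {
        if (mins.empty()) throw std::runtime_error("empty stack");
        return mins.top();                    // O(1)
    }
};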
So, suppose you have a collection of items. Each item has an identifier which can be represented using a bitfield. As a simple example, suppose your collection is:
0110, 0111, 1001, 1011, 1110, 1111
So, you then want to implement a function, Remove(bool bitval, int position). For example, a call to Remove(0, 2) would remove all items where index 2(i.e. 3rd bit) was 0. In this case, that would be 1001, only. Remove(1,1) would remove 1110, 1111, 0111, and 0110. It is trivial to come up with an O(n) collection where this is possible (just use a linked list), with n being the number of items in the collection. In general the number of items to be removed is going to be O(n) (assuming a given bit has a ≥ c% chance of being 1 and a ≥ c% chance of being 0, where c is some constant > 0), so "better" algorithms which somehow are O(l), with l being the number of items being removed, are unexciting.
Is it possible to define a data structure where the average (or better yet, worst case) removal time is better than O(n)? A binary tree can do pretty well (just remove all left/right branches at the height m, where m is the index being tested), but I'm wondering if there is any way to do better (and quite honestly, I'm not sure how to remove all left or right branches at a particular height in an efficient manner). Alternatively, is there a proof that doing better is not possible?
Edit: I'm not sure exactly what I'm expecting in terms of efficiency (sorry Arno), but a basic explanation of its possible application is thus: Suppose we are working with a binary decision tree. Such a tree could be used for a game tree or a puzzle solver or whatever. Further suppose the tree is small enough that we can fit all of the leaf nodes into memory. Each such node is basically just a bitfield listing all of the decisions. Now, if we want to prune arbitrary decisions from this tree, one method would be to just jump to the height where a particular decision is made and prune the left or right side of every node (left meaning one decision, right meaning the other). Normally in a decision tree you only want to prune one subtree at a time (since the parent of that subtree is different from the parent of other subtrees, and thus the decision which should be pruned in one subtree should not be pruned from others), but in some types of situations this may not be the case. Further, you normally only want to prune everything below a particular node, but in this case you'll be leaving some stuff below the node while also pruning below other nodes in the tree.
Anyhow, this is somewhat of a question based on curiosity; I'm not sure it's practical to use any results, but am interested in what people have to say.
Edit:
Thinking about it further, I think the tree method is actually O(n / log n), assuming it's reasonably dense. Proof:
Suppose you have a binary tree with n items. Its height is log(n). Removing half of the bottom row requires n/2 removals; removing half of the row above requires n/4, and so on. The sum of operations over all rows is n-1. So the average number of removals per row is (n-1)/log(n).
Provided the length of your bitfields is limited, the following may work:
First, represent the bitfields that are in the set as an array of booleans, so in your case (4 bit bitfields), new bool[16];
Transform this array of booleans into a bitfield itself, so a 16-bit bitfield in this case, where each bit represents whether the bitfield corresponding to its index is included
Then operations become:
Remove(0, 0) = and with bitmask 1010101010101010
Remove(1, 0) = and with bitmask 0101010101010101
Remove(0, 2) = and with bitmask 1111000011110000
Note that more complicated 'add/remove' operations could then also be added as O(1) bit-logic.
The only down-side is that extra work is needed to interpret the resulting 16-bit bitfield back into a set of values, but with lookup arrays that might not turn out too bad either.
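A small C++ sketch of this encoding (the function name is mine; position is counted from the least significant bit, which matches the masks above):

#include <cstdint>

// Build the mask that keeps exactly the values whose bit `position` differs
// from `bitval`, i.e. the values that survive Remove(bitval, position).
uint16_t removeMask(int bitval, int position) {
    uint16_t mask = 0;
    for (int v = 0; v < 16; ++v)
        if (((v >> position) & 1) != bitval)
            mask |= (uint16_t)(1u << v);
    return mask;
}

// Usage: the whole collection is one 16-bit word, and Remove is a single AND.
// uint16_t set = /* one bit per possible 4-bit value */;
// set &= removeMask(0, 2);  // drops every member whose bit 2 is 0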
Addendum:
Additional down-sides:
Once the size of an integer is exceeded, every added bit to the original bit-fields will double the storage space. However, this is not much worse than a typical scenario using another collection, where you have to store on average half the possible bitmask values (provided the typical scenario doesn't store far fewer remaining values).
Once the size of an integer is exceeded, every added bit also doubles the number of 'and' operations needed to implement the logic.
So basically, I'd say if your original bitfields are not much larger than a byte, you are likely better off with this encoding; beyond that, you're probably better off with the original strategy.
Further addendum:
If you only ever execute Remove operations, which over time thins out the set state-space further and further, you may be able to stretch this approach a bit further (no pun intended) by making a more clever abstraction that somehow only keeps track of the int values that are non-zero. Detecting zero values may not be as expensive as it sounds either if the JIT knows what it's doing, because a CPU 'and' operation typically sets the 'zero' flag if the result is zero.
As with all performance optimizations, this one'd need some measurement to determine if it is worthwhile.
If each decision bit and position are listed as objects, {bit value, k-th position}, you end up with an array of length 2*k. If you link to each of these array positions from your item, represented as a linked list (of length k), using a pointer to the {bit, position} object as the node value, you can "invalidate" a bunch of items by simply deleting the {bit, position} object. This would require you, upon searching the list of items, to find "complete" items (which makes search REALLY slow?).
So something like:
[{0,0}, {1,0}, {0,1}, {1, 1}, {0,2}, {1, 2}, {0,3}, {1,3}]
and linked from "0100", represented as: {0->3->4->6}
You wouldn't know which items were invalid until you tried to find them (so it doesn't really limit your search space, which is what you're after).
Oh well, I tried.
Sure, it is possible (even if this is "cheating"). Just keep a stack of Remove objects:
struct Remove {
    bool set;   // the bit value to match
    int index;  // the bit position to test
};
The remove function just pushes an object onto the stack. Voilà, O(1).
If you wanted to get fancy, your stack couldn't exceed (number of bits) without containing duplicate or impossible scenarios.
The rest of the collection has to apply the logic whenever things are withdrawn or iterated over.
Two ways to do insert into the collection:
Apply the Remove rules upon insert, to clear out the stack, making insert O(n). Gotta pay somewhere.
Each bitfield has to store its index into the remove stack, to know which rules apply to it. Then the stack size limit above wouldn't matter. (A sketch of this lazy approach follows.)
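A C++ sketch of this "cheating" scheme (the type and method names are mine): Remove only records a rule, so it is O(1), and the deferred filtering cost is paid during iteration.

#include <cstdint>
#include <vector>

struct RemoveRule {
    int bitval;    // remove items whose bit at `position`...
    int position;  // ...equals `bitval`
};

class LazyCollection {
    std::vector<uint16_t> items;
    std::vector<RemoveRule> rules;
public:
    // Naive insert; a full version would either apply the rules first or tag
    // the item with the current rule count (the two options discussed above).
    void insert(uint16_t item) { items.push_back(item); }

    // O(1): just record the rule.
    void remove(int bitval, int position) { rules.push_back({bitval, position}); }

    // An item is still present if no recorded rule matches it.
    bool alive(uint16_t item) const {
        for (const RemoveRule& r : rules)
            if (((item >> r.position) & 1) == r.bitval) return false;
        return true;
    }

    // Iteration pays the filtering cost that Remove deferred.
    template <typename Visit>
    void forEach(Visit visit) const {
        for (uint16_t it : items)
            if (alive(it)) visit(it);
    }
};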
If you use an array to store your binary tree, you can quickly index any element (the children of the node at index n are at indexes (n+1)*2 and (n+1)*2-1). All the nodes at a given level are stored sequentially. The first node at level x is at index 2^x-1, and there are 2^x elements at that level.
Unfortunately, I don't think this really gets you much of anywhere from a complexity standpoint. Removing all the left nodes at a level is O(n/2) worst case, which is of course O(n). Of course the actual work depends on which bit you are checking, so the average may be somewhat better. This also requires O(2^b) memory, where b is the bitfield length, which is much worse than the linked list and not practical at all.
I think what this problem is really asking is for a way to efficiently partition a set of sets into two sets. Using a bitset to describe the set gives you a fast check for membership, but doesn't seem to lend itself to making the problem any easier.