How to use balanced binary trees to solve this challenge? - algorithm

1(70)
/ \
/ \
2(40) 5(10)
/ \ \
/ \ \
3(60) 4(80) 6(20)
/ \
/ \
7(30) 8(50)
This is for an online challenge (not live contest). I don't need someone to solve for me, just to push in right direction. Trying to learn.
Each node has a unique ID, no two people have same salary. Person #1 has salary $70, person #7 has $30 salary, for example. Tree structure denotes who supervises who. Question is who has kth lowest salary on a person's subordinates.
For example I choose person #2. Who is 2nd lowest among subordinates? #2's subordinates are 3, 4, 7, 8. 2nd lowest salary is $50 belonging to person #8.
There are many queries so structure to must be efficient.
I thought about this problem and researched data structures. Binary tree seems like a good idea but I need help.
For example I think ideal structure look like, for person #2:
2(40)
/ \
/ \
7(30) 3(60)
/ \
/ \
8(50) 4(80)
Every child node is subordinate of #2, every left branch has lower salary than on right. If I store how many children at each node I can get kth lowest.
For example: From #2, left branch 1 node, right branch 3 nodes. So 2nd lowest - 1 means I now want 1st lowest in right branch.
Move to #3, 1st lowest points to #8 with $50 which is correct.
My question:
Is this approach as I describe it a good one? Is it a valid approach?
I am having trouble figuring out how to construct this kind of tree. I think I can make them recursively. But hard to figure out how to make all children into new tree sorted by salary. Need some light help.

Here's a solution that uses O(n log^2 n + q log n) time and O(n log^2 n) space (not the best on the latter count, but probably good enough given the limits).
Implement a purely functional sorted list (as an augmented binary search tree) with the following operations and some way to iterate.
EmptyList() -> returns the empty list
Insert(list, key) -> returns the list where |key| has been inserted into |list|
Length(list) -> returns the length of the list
Get(list, k) -> returns the element at index |k| in |list|
On top of these operations, implement an operation
Merge(list1, list2) -> returns the union of |list1| and |list2|
by inserting the elements of the shorter list into the longer.
Now do the obvious thing: traverse the employee hierarchy from leaves to root, setting the ordered list for each employee to the appropriate merge of her subordinate lists, and answer the queries.
Analysis (sketch)
Each query takes O(log n) time. The interesting part of the analysis pertains to the preprocessing.
The cost of preprocessing is dominated by the cost of calling Insert(), specifically from Merge(), since there are n other insertions. Each insertion takes O(log n) time and costs O(log n) space (measuring in words).
What keeps the preprocessing from being quadratic is an implicit heavy path decomposition. Every time we merge two lists, neither list is merged subsequently. Since the shorter list is inserted into the longer, every time a key is inserted into a list, that list is at least twice as long as the list into which that key was previously inserted. It follows that each key is the subject of at most lg n insertions, which suffices to establish a bound of O(n log n) insertions overall and thus the claimed resource bounds.

Here is one possible solution. For each node, we will construct an array of all children's values of that node and keep this in sorted order. The result we are looking for is a dictionary of the form
{ 1 : [10, 20, 30, 40, 60 80],
2 : [30, 50, 60, 80]
...
}
Once we have this, to query for any node for the ith lowest salary, just take the ith element of the array. The total time to do all the queries is O( q ) where q is the number of queries.
How do we construct this? Assuming you have a pointer to the root node, you can recursively construct the sorted salaries for each child. Store those values in the result. Make a copy of each child's array, and insert each child's salary into the child's copied array. Use binary search to find the position, since each array is sorted. Now you have k sorted arrays, you merge them to get a sorted array. If you are merging two arrays, this can be done in linear time. Simply loop, picking the first element of the array that is smaller each time.
For the case where each node has 2 children, merging the two children's arrays is O(n) tine. Inserting the salaries of each node to its corresponding array is O(log(n)) per node since we use binary search. Copying the children array is O(n), and there are n nodes, so we have O(n^2) total time for pre-processing.
Total run time is O(n^2 + q)
What if we can not assume each node has at most 2 children? Then to merge the arrays, use this algorithm. This runs in O(nlog(k)) where k is the number of arrays to merge, since we pop from the heap once per element, and resizing the heap takes O(log(k)) when there are k arrays. k<=n so we can simplify this to O(nlog(n)). So the total running time is unchanged.
The space complexity of this solution is O(n^2).

The question has two parts: first, finding the specified person, and then finding the kth subordinate.
Since the tree is not ordered by id, to find the specified perspn by id requires walking the whole tree until the specified id is found. To speed up this part, we can builda hash map that would allow us to find the person node by id in O(1) time and require O(n) space and set-up time.
Then, to find the subordinate with the kth lowest salary, we need to search the subtree. Since its not ordered by salary, we would have to scan the whole subtree and find the kth lowest salary. This could be done using an array or a heap (putting the subtree nodes into array or heap). This second part would O(m log k) time using the heap to keep the lowest k items, where m is the number of sub-ordinates, and require O(k) space. This should be acceptable if m (number of subordinates of specified person), and k are small.

Related

Best sorting algorithm - Partially sorted linked list

Problem- Given a sorted doubly link list and two numbers C and K. You need to decrease the info of node with data K by C and insert the new node formed at its correct position such that the list remains sorted.
I would think of insertion sort for such problem, because, insertion sort at any instance looks like, shown bunch of cards,
that are partially sorted. For insertion sort, number of swaps is equivalent to number of inversions. Number of compares is equivalent to number of exchanges + (N-1).
So, in the given problem(above), if node with data K is decreased by C, then the sorted linked list became partially sorted. Insertion sort is the best fit.
Another point is, amidst selection of sorting algorithm, if sorting logic applied for array representation of data holds best fit, then same sorting logic should holds best fit for linked list representation of same data.
For this problem, Is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm, because you actually don't need to sort anything. If there is only one element with value K then it doesn't make a big difference, but otherwise it does.
So I would suggest the following algorithm O(n), ignoring edge cases for simplicity:
Go forward in the list until the value of the current node is > K - C.
Save this node, all the reduced nodes will be inserted before this one.
Continue to go forward while the value of the current node is < K
While the value of the current node is K, remove node, set value to K - C and insert it before the saved node. This could be optimized further, so that you only do one remove and insert operation of the whole sublist of nodes which had value K.
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search for a deck of cards is probably acceptable, unless you're running some monstrous Monte Carlo simulation involving cards, that runs for hours or day, so that optimization counts.
Otherwise the way we would deal with the need to maintain order would be to use an ordered sequence data structure: balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust value, re-insert: O(log N).

How can this be done in O(nlogn) time complexity

I had a question on my exams for which I had to come up with an efficient algorithm. The problem was like this:
We have some objects which have two properties:
H = <1,1000000>
R = <1,1000000>
we can insert one object into another if H1>H2 and R1>R2. The input contains pairs of H and R, one pair per line. if the current object can be inserted in any previous objects, we choose such with the least H and then we destroy both of them. print the number of left objects in the output.
I wonder how can this problem be solved in O(n.log(n)) time complexity using binary search trees or segment tree, or with fenwick tree.
Thanks in advance.
A solution with fenwick tree, as follows;
Let's sort the whole array by R at first (right now, we are not caring about H), and assign each item a token (which is equal to it's position in the sorted array).
Let's get back to our original array. We are going to run a sweep on the given array. Say, we have a fenwick tree, which will, instead of cumulative sum, store maximum (from beginning to that position) only for H.
For an item, say, we couldn't fit it into another item. Then we'll insert it into the tree. We'll insert in such position that is equal to it's token.
So, right now, we've a fenwick tree, which contains only the items we've dealt with till now. Other values are 0. The items in the tree are positioned in R sorted order.
Now, how to find out if we can fit current item to another object? We can actually run a binary search (upper bound) on fenwick tree for current item's H. And, as the items are already sorted in R order, instead of whole tree, we need to search in the effective range.
Binary search in fenwick tree can be done in O(log(n)). Check out the Find index with given cumulative frequency part of this article.

IOI Qualifier INOI task 2

I can't figure out how to solve question 2 in the following link in an efficient manner:
http://www.iarcs.org.in/inoi/2012/inoi2012/inoi2012-qpaper.pdf
You can do this in On log n) time. (Or linear if you really care to.) First, pad the input array out to the next power of two using some really big negative number. Now, build an interval tree-like data structure; recursively partition your array by dividing it in half. Each node in the tree represents a subarray whose length is a power of two and which begins at a position that is a multiple of its length, and each nonleaf node has a "left half" child and a "right half" child.
Compute, for each node in your tree, what happens when you add 0,1,2,3,... to that subarray and take the maximum element. Notice that this is trivial for the leaves, which represent subarrays of length 1. For internal nodes, this is simply the maximum of the left child with length/2 + right child. So you can build this tree in linear time.
Now we want to run a sequence of n queries on this tree and print out the answers. The queries are of the form "what happens if I add k,k+1,k+2,...n,1,...,k-1 to the array and report the maximum?"
Notice that, when we add that sequence to the whole array, the break between n and 1 either occurs at the beginning/end, or smack in the middle, or somewhere in the left half, or somewhere in the right half. So, partition the array into the k,k+1,k+2,...,n part and the 1,2,...,k-1 part. If you identify all of the nodes in the tree that represent subarrays lying completely inside one of the two sequences but whose parents either don't exist or straddle the break-point, you will have O(log n) nodes. You need to look at their values, add various constants, and take the maximum. So each query takes O(log n) time.

Finding closest number in a range

I thought a problem which is as follows:
We have an array A of integers of size n, and we have test cases t and in every test cases we are given a number m and a range [s,e] i.e. we are given s and e and we have to find the closest number of m in the range of that array(A[s]-A[e]).
You may assume array indexed are from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I can thought is an O(n) solution for every test case, and I think a better solution exists.
This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data like left and right indices, you also store the numbers found in the sub-tree rooted at that node, stored in sorted order. You can achieve this when you construct the segment tree in bottom-up order. In the node just above the leaf, you store the two leaf values in sorted order. In an intermediate node, you keep the numbers in the left child, and right child, which you can merge together using standard merging. There are O(n) nodes in the tree, and keeping this data should take overall O(nlog(n)).
Once you have this tree, for every query, walk down the path till you reach the appropriate node(s) in the given range ([s, e]). As the tutorial shows, one or more different nodes would combine to form the given range. As the tree depth is O(log(n)), that is the time per query to reach these nodes. Each query should be O(log(n)). For all the nodes which lie completely inside the range, find the closest number using binary search in the sorted array stored in those nodes. Again, O(log(n)). Find the closest among all these, and that is the answer. Thus, you can answer each query in O(log(n)) time.
The tutorial I link to contains other data structures, such as sparse table, which are easier to implement, and should give O(sqrt(n)) per query. But I haven't thought much about this.
sort the array and do binary search . complexity : o(nlogn + logn *t )
I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search. (The array slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

Data Structure for fast position lookup

Looking for a datastructure that logically represents a sequence of elements keyed by unique ids (for the purpose of simplicity let's consider them to be strings, or at least hashable objects). Each element can appear only once, there are no gaps, and the first position is 0.
The following operations should be supported (demonstrated with single-letter strings):
insert(id, position) - add the element keyed by id into the sequence at offset position. Naturally, the position of each element later in the sequence is now incremented by one. Example: [S E L F].insert(H, 1) -> [S H E L F]
remove(position) - remove the element at offset position. Decrements the position of each element later in the sequence by one. Example: [S H E L F].remove(2) -> [S H L F]
lookup(id) - find the position of element keyed by id. [S H L F].lookup(H) -> 1
The naïve implementation would be either a linked list or an array. Both would give O(n) lookup, remove, and insert.
In practice, lookup is likely to be used the most, with insert and remove happening frequently enough that it would be nice not to be linear (which a simple combination of hashmap + array/list would get you).
In a perfect world it would be O(1) lookup, O(log n) insert/remove, but I actually suspect that wouldn't work from a purely information-theoretic perspective (though I haven't tried it), so O(log n) lookup would still be nice.
A combination of trie and hash map allows O(log n) lookup/insert/remove.
Each node of trie contains id as well as counter of valid elements, rooted by this node and up to two child pointers. A bit string, determined by left (0) or right (1) turns while traversing the trie from its root to given node, is part of the value, stored in the hash map for corresponding id.
Remove operation marks trie node as invalid and updates all counters of valid elements on the path from deleted node to the root. Also it deletes corresponding hash map entry.
Insert operation should use the position parameter and counters of valid elements in each trie node to search for new node's predecessor and successor nodes. If in-order traversal from predecessor to successor contains any deleted nodes, choose one with lowest rank and reuse it. Otherwise choose either predecessor or successor, and add a new child node to it (right child for predecessor or left one for successor). Then update all counters of valid elements on the path from this node to the root and add corresponding hash map entry.
Lookup operation gets a bit string from the hash map and uses it to go from trie root to corresponding node while summing all the counters of valid elements to the left of this path.
All this allow O(log n) expected time for each operation if the sequence of inserts/removes is random enough. If not, the worst case complexity of each operation is O(n). To get it back to O(log n) amortized complexity, watch for sparsity and balancing factors of the tree and if there are too many deleted nodes, re-create a new perfectly balanced and dense tree; if the tree is too imbalanced, rebuild the most imbalanced subtree.
Instead of hash map it is possible to use some binary search tree or any dictionary data structure. Instead of bit string, used to identify path in the trie, hash map may store pointer to corresponding node in trie.
Other alternative to using trie in this data structure is Indexable skiplist.
O(log N) time for each operation is acceptable, but not perfect. It is possible, as explained by Kevin, to use an algorithm with O(1) lookup complexity in exchange for larger complexity of other operations: O(sqrt(N)). But this can be improved.
If you choose some number of memory accesses (M) for each lookup operation, other operations may be done in O(M*N1/M) time. The idea of such algorithm is presented in this answer to related question. Trie structure, described there, allows easily converting the position to the array index and back. Each non-empty element of this array contains id and each element of hash map maps this id back to the array index.
To make it possible to insert element to this data structure, each block of contiguous array elements should be interleaved with some empty space. When one of the blocks exhausts all available empty space, we should rebuild the smallest group of blocks, related to some element of the trie, that has more than 50% empty space. When total number of empty space is less than 50% or more than 75%, we should rebuild the whole structure.
This rebalancing scheme gives O(MN1/M) amortized complexity only for random and evenly distributed insertions/removals. Worst case complexity (for example, if we always insert at leftmost position) is much larger for M > 2. To guarantee O(MN1/M) worst case we need to reserve more memory and to change rebalancing scheme so that it maintains invariant like this: keep empty space reserved for whole structure at least 50%, keep empty space reserved for all data related to the top trie nodes at least 75%, for next level trie nodes - 87.5%, etc.
With M=2, we have O(1) time for lookup and O(sqrt(N)) time for other operations.
With M=log(N), we have O(log(N)) time for every operation.
But in practice small values of M (like 2 .. 5) are preferable. This may be treated as O(1) lookup time and allows this structure (while performing typical insert/remove operation) to work with up to 5 relatively small contiguous blocks of memory in a cache-friendly way with good vectorization possibilities. Also this limits memory requirements if we require good worst case complexity.
You can achieve everything in O(sqrt(n)) time, but I'll warn you that it's going to take some work.
Start by having a look at a blog post I wrote on ThriftyList. ThriftyList is my implementation of the data structure described in Resizable Arrays in Optimal Time and Space along with some customizations to maintain O(sqrt(n)) circular sublists, each of size O(sqrt(n)). With circular sublists, one can achieve O(sqrt(n)) time insertion/removal by the standard insert/remove-then-shift in the containing sublist followed by a series of push/pop operations across the circular sublists themselves.
Now, to get the index at which a query value falls, you'll need to maintain a map from value to sublist/absolute-index. That is to say, a given value maps to the sublist containing the value, plus the absolute index at which the value falls (the index at which the item would fall were the list non-circular). From these data, you can compute the relative index of the value by taking the offset from the head of the circular sublist and summing with the number of elements which fall behind the containing sublist. To maintain this map requires O(sqrt(n)) operations per insert/delete.
Sounds roughly like Clojure's persistent vectors - they provide O(log32 n) cost for lookup and update. For smallish values of n O(log32 n) is as good as constant....
Basically they are array mapped tries.
Not quite sure on the time complexity for remove and insert - but I'm pretty sure that you could get a variant of this data structure with O(log n) removes and inserts as well.
See this presentation/video: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Source code (Java): https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java

Resources