Data structure: returning a BST with one tree's shape and another tree's values - algorithm

Hey, I have a question where I need to describe an algorithm that takes two Binary Search Trees, T1 and T2, whose nodes hold different values.
The algorithm should return a Binary Search Tree with the same shape as T2, but with the values of T1, in time complexity O(n), where n is the number of elements (the same for both trees).
This is what we might call "equally topological" (I'm not sure that's the standard term, but it's a nice name for it).
For example: T1 defines the values and T2 defines the shape, and the algorithm should return T2's shape filled with T1's values.
What I've tried so far: thinking about the median / average value, but that doesn't work in every case; and building an AVL tree and then rotating it until the shape matches, but I'm not sure whether that works or whether it is O(n). Any help would be appreciated, thank you!

Traverse T1 and write its values to an array; for a BST, an in-order traversal (left subtree - vertex - right subtree) emits them in sorted order. Then create a copy of T2 (if necessary) and traverse it in order as well, writing the values from the array into it one by one. This uses two traversals, so it is Θ(n).
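A minimal sketch of this in Python (the Node class and its field names are illustrative, not from the question):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        value: int
        left: "Optional[Node]" = None
        right: "Optional[Node]" = None

    def same_shape_with_values(t1, t2):
        values = []

        def collect(node):            # in-order walk of T1: sorted values
            if node:
                collect(node.left)
                values.append(node.value)
                collect(node.right)

        collect(t1)
        it = iter(values)

        def rebuild(node):            # in-order walk of T2's shape
            if node is None:
                return None
            left = rebuild(node.left)
            value = next(it)          # the next smallest value lands here
            right = rebuild(node.right)
            return Node(value, left, right)

        return rebuild(t2)

Because both walks are in-order, the ith smallest value of T1 lands in the ith in-order position of T2's shape, so the result is a valid BST.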

Related

How can this be done in O(n log n) time complexity?

I had a question on my exam for which I had to come up with an efficient algorithm. The problem was like this:
We have some objects, each with two properties:
H ∈ [1, 1000000]
R ∈ [1, 1000000]
We can insert one object into another if H1 > H2 and R1 > R2. The input contains the pairs of H and R, one pair per line. If the current object can be inserted into any previous object, we choose the previous object with the least H, and then we destroy both of them. Print the number of remaining objects in the output.
I wonder how this problem can be solved in O(n log n) time complexity using binary search trees, a segment tree, or a Fenwick tree.
Thanks in advance.
A solution with a Fenwick tree, as follows:
First, sort the whole array by R (for now we don't care about H), and assign each item a token equal to its position in the sorted array.
Now go back to the original array and run a sweep over it. Say we have a Fenwick tree which, instead of a cumulative sum, stores the maximum H from the beginning up to each position.
For each item that we could not fit into another item, we insert it into the tree, at the position given by its token.
So at any point, the Fenwick tree contains only the items we have dealt with so far (all other positions are 0), and the items in the tree are positioned in R-sorted order.
Now, how do we find out whether we can fit the current item into another object? We can run a binary search (an upper bound) on the Fenwick tree for the current item's H. And since the items are already sorted by R, instead of the whole tree we only need to search the effective range of positions.
Binary search in a Fenwick tree can be done in O(log n). Check out the "Find index with given cumulative frequency" part of this article.
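For reference, a rough sketch of the two Fenwick-tree ingredients in Python (1-based positions; the deletion of destroyed objects and the tie-break on least H are left out, as they are in the outline above):

    class MaxFenwick:
        # Fenwick tree storing prefix maxima, with point updates.
        def __init__(self, n):
            self.n = n
            self.tree = [0] * (n + 1)

        def update(self, i, value):
            # Record `value` at position i; every covering node absorbs it.
            while i <= self.n:
                self.tree[i] = max(self.tree[i], value)
                i += i & (-i)

        def first_above(self, x):
            # O(log n) descent: smallest index whose prefix maximum
            # exceeds x, or None. Valid because prefix maxima only grow.
            pos, step = 0, 1 << self.n.bit_length()
            while step:
                nxt = pos + step
                if nxt <= self.n and self.tree[nxt] <= x:
                    pos = nxt        # max over (pos, nxt] is still <= x
                step >>= 1
            return pos + 1 if pos + 1 <= self.n else None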

How to use balanced binary trees to solve this challenge?

          1(70)
         /     \
     2(40)     5(10)
     /   \         \
  3(60) 4(80)     6(20)
  /   \
7(30) 8(50)
This is for an online challenge (not a live contest). I don't need someone to solve it for me, just a push in the right direction; I'm trying to learn.
Each node has a unique ID, and no two people have the same salary. Person #1 has a salary of $70 and person #7 has a salary of $30, for example. The tree structure denotes who supervises whom. The question is: who has the kth lowest salary among a person's subordinates?
For example, say I choose person #2. Who is 2nd lowest among the subordinates? #2's subordinates are 3, 4, 7, and 8, and the 2nd lowest salary is $50, belonging to person #8.
There are many queries, so the structure must be efficient.
I thought about this problem and researched data structures. A binary tree seems like a good idea, but I need help.
For example, I think the ideal structure for person #2 looks like:
     2(40)
     /   \
 7(30)   3(60)
         /   \
     8(50)   4(80)
Every node in it is a subordinate of #2, and every left branch holds a lower salary than the corresponding right branch. If I store how many children each node has, I can get the kth lowest.
For example: from #2, the left branch has 1 node and the right branch has 3 nodes. So for the 2nd lowest, 2 - 1 = 1 means I now want the 1st lowest in the right branch.
Moving to #3, the 1st lowest points to #8 with $50, which is correct.
My questions:
Is this approach, as I describe it, a good one? Is it a valid approach?
I am having trouble figuring out how to construct this kind of tree. I think I can build them recursively, but it's hard to figure out how to combine all of a node's children into a new tree sorted by salary. I need a little help (see the sketch below for the selection step).
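For reference, the counting descent described in the question is the standard order-statistic selection on a size-augmented BST. A minimal sketch, assuming each node of the salary-ordered tree carries a size field (its subtree's node count) and that the tree holds only the queried person's subordinates, as in the example:

    def kth_lowest(node, k):
        # Select the kth smallest salary (1-based) in a BST whose nodes
        # carry .size, the number of nodes in their subtree.
        left_size = node.left.size if node.left else 0
        if k <= left_size:
            return kth_lowest(node.left, k)
        if k == left_size + 1:
            return node.salary
        return kth_lowest(node.right, k - left_size - 1)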
Here's a solution that uses O(n log^2 n + q log n) time and O(n log^2 n) space (not the best on the latter count, but probably good enough given the limits).
Implement a purely functional sorted list (as an augmented binary search tree) with the following operations and some way to iterate.
EmptyList() -> returns the empty list
Insert(list, key) -> returns the list where |key| has been inserted into |list|
Length(list) -> returns the length of the list
Get(list, k) -> returns the element at index |k| in |list|
On top of these operations, implement an operation
Merge(list1, list2) -> returns the union of |list1| and |list2|
by inserting the elements of the shorter list into the longer.
Now do the obvious thing: traverse the employee hierarchy from leaves to root, setting the ordered list for each employee to the appropriate merge of her subordinate lists, and answer the queries.
Analysis (sketch)
Each query takes O(log n) time. The interesting part of the analysis pertains to the preprocessing.
The cost of preprocessing is dominated by the cost of calling Insert(), specifically from Merge(), since there are n other insertions. Each insertion takes O(log n) time and costs O(log n) space (measuring in words).
What keeps the preprocessing from being quadratic is an implicit heavy path decomposition. Every time we merge two lists, neither list is merged subsequently. Since the shorter list is inserted into the longer, every time a key is inserted into a list, that list is at least twice as long as the list into which that key was previously inserted. It follows that each key is the subject of at most lg n insertions, which suffices to establish a bound of O(n log n) insertions overall and thus the claimed resource bounds.
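A rough sketch of the traversal and the shorter-into-longer merge rule, using plain sorted Python lists in place of the purely functional tree (the per-employee copy below is exactly what the persistent structure avoids; all names are illustrative):

    from dataclasses import dataclass, field
    import bisect

    @dataclass
    class Employee:
        id: int
        salary: int
        children: list = field(default_factory=list)

    def merge(list1, list2):
        # Insert the shorter list into the longer, so every key moves to
        # a list at least twice as long as its previous home.
        if len(list1) < len(list2):
            list1, list2 = list2, list1
        for key in list2:
            bisect.insort(list1, key)
        return list1

    def build(node, answer):
        # Leaves to root: an employee's list is the merge of each
        # child's list plus that child's own salary.
        salaries = []
        for child in node.children:
            salaries = merge(salaries, build(child, answer))
            bisect.insort(salaries, child.salary)
        answer[node.id] = list(salaries)   # copy; a persistent tree shares
        return salaries

With the example tree, answer[2] comes out as [30, 50, 60, 80], so the 2nd lowest salary among #2's subordinates is answer[2][1] = 50.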
Here is one possible solution. For each node, we construct an array of the salaries of all of that node's subordinates, kept in sorted order. The result we are looking for is a dictionary of the form
{ 1 : [10, 20, 30, 40, 50, 60, 80],
  2 : [30, 50, 60, 80],
  ...
}
Once we have this, to query any node for the ith lowest salary, we just take the ith element of its array. The total time to do all the queries is O(q), where q is the number of queries.
How do we construct this? Given a pointer to the root node, you can recursively construct the sorted salary array for each child and store those in the result. Then make a copy of each child's array and insert the child's own salary into the copy, using binary search to find the position, since each array is sorted. Now you have k sorted arrays, and you merge them to get one sorted array. Merging two arrays can be done in linear time: simply loop, picking the smaller front element each time.
For the case where each node has two children, merging the two children's arrays is O(n) time. Inserting each node's salary into the corresponding array is O(log n) per node, since we use binary search. Copying a child's array is O(n), and there are n nodes, so we have O(n^2) total time for pre-processing.
Total run time is O(n^2 + q).
What if we cannot assume each node has at most two children? Then merge the arrays with a heap-based k-way merge (see the sketch below). This runs in O(n log k), where k is the number of arrays to merge, since we pop from the heap once per element and restoring the heap takes O(log k) when there are k arrays. Since k <= n, we can simplify this to O(n log n), so the total running time is unchanged.
The space complexity of this solution is O(n^2).
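Python's standard library ships exactly this heap-based k-way merge, so a sketch of the merge step is a single call:

    import heapq

    def merge_sorted_arrays(arrays):
        # O(n log k): heapq.merge keeps one candidate per array on a heap
        # and pops once per output element.
        return list(heapq.merge(*arrays))

    # merge_sorted_arrays([[30, 50, 60, 80], [10, 20]])
    # -> [10, 20, 30, 50, 60, 80]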
The question has two parts: first finding the specified person, and then finding the kth subordinate.
Since the tree is not ordered by ID, finding the specified person by ID requires walking the whole tree until that ID is found. To speed this part up, we can build a hash map that lets us find a person's node by ID in O(1) time, at the cost of O(n) space and set-up time.
Then, to find the subordinate with the kth lowest salary, we need to search the subtree. Since it's not ordered by salary, we have to scan the whole subtree to find the kth lowest salary. This can be done using an array or a heap (putting the subtree's nodes into the array or heap). This second part takes O(m log k) time using a heap that keeps the lowest k items, where m is the number of subordinates, and requires O(k) space. That should be acceptable if m (the number of subordinates of the specified person) and k are small.
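A minimal sketch of that bounded-heap scan, reusing the illustrative Employee node from the earlier sketch (heapq.nsmallest keeps at most k items on its heap):

    import heapq

    def kth_lowest_salary(person, k):
        # O(m log k) time and O(k) extra space for m subordinates.
        def subtree_salaries(node):
            for child in node.children:
                yield child.salary
                yield from subtree_salaries(child)

        lowest = heapq.nsmallest(k, subtree_salaries(person))
        return lowest[-1] if len(lowest) == k else None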

Finding closest number in a range

I thought of a problem, which is as follows:
We have an array A of n integers and t test cases. In every test case we are given a number m and a range [s, e], i.e. we are given s and e, and we have to find the number closest to m among A[s..e].
You may assume the array is indexed from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I could come up with is an O(n) solution for every test case, and I think a better solution exists.
This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data such as the left and right indices, also store the numbers found in the subtree rooted at that node, in sorted order. You can achieve this while constructing the segment tree bottom-up: in a node just above a leaf, store the two leaf values in sorted order, and in an intermediate node, keep the numbers of the left child and the right child merged together using a standard merge. There are O(n) nodes in the tree, and building this data takes O(n log n) overall.
Once you have this tree, for every query, walk down from the root to the appropriate node(s) covering the given range [s, e]. As the tutorial shows, one or more nodes combine to form the given range; since the tree depth is O(log n), reaching them costs O(log n) per query. For each node that lies completely inside the range, find the closest number by binary search in that node's sorted array: another O(log n) each. The closest among all these candidates is the answer. Thus each query takes O(log^2 n) time in total (O(log n) nodes, with an O(log n) binary search at each); see the sketch below.
The tutorial I linked to covers other data structures, such as the sparse table, which are easier to implement and should give O(sqrt(n)) per query, but I haven't thought much about that.
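Going back to the segment tree version, here is a compact sketch of that structure (often called a merge sort tree), with 0-based inclusive indices; the class and method names are illustrative:

    import bisect
    import heapq

    class MergeSortTree:
        def __init__(self, A):
            self.n = len(A)
            self.tree = [[] for _ in range(4 * self.n)]
            self._build(1, 0, self.n - 1, A)

        def _build(self, node, lo, hi, A):
            if lo == hi:
                self.tree[node] = [A[lo]]
                return
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid, A)
            self._build(2 * node + 1, mid + 1, hi, A)
            # merge the two sorted child lists in linear time
            self.tree[node] = list(heapq.merge(self.tree[2 * node],
                                               self.tree[2 * node + 1]))

        def closest(self, s, e, m):
            # Closest value to m in A[s..e]; O(log^2 n) per query.
            return self._query(1, 0, self.n - 1, s, e, m)

        def _query(self, node, lo, hi, s, e, m):
            if e < lo or hi < s:
                return None
            if s <= lo and hi <= e:
                arr = self.tree[node]             # fully inside the range
                i = bisect.bisect_left(arr, m)
                cand = arr[max(0, i - 1):i + 1]   # neighbors of m
                return min(cand, key=lambda v: abs(v - m))
            mid = (lo + hi) // 2
            parts = [self._query(2 * node, lo, mid, s, e, m),
                     self._query(2 * node + 1, mid + 1, hi, s, e, m)]
            cand = [v for v in parts if v is not None]
            return min(cand, key=lambda v: abs(v - m))

For the example above, MergeSortTree([5, 12, 9, 18, 19]).closest(3, 4, 13) returns 18 (the question's s = 4, e = 5 in 1-based indexing).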
Sort the array and do a binary search. Complexity: O(n log n + t log n).
I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search (the slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

Data structure supporting Add and Partial-Sum

Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:
Add(i, y) -- Add the value y to the ith number.
Partial-sum(i) -- Return the sum of the first i numbers, i.e. A[1] + A[2] + ... + A[i].
There are no insertions or deletions; the only changes are to the values of the numbers. Each operation should take O(log n) steps. You may use one additional array of size n as a work space.
How should a data structure for these operations be designed?
Construct a balanced binary tree with n leaves; stick the elements along the bottom of the tree in their original order.
Augment each node in the tree with the sum of the leaves of its subtree; a tree with #leaves leaves has #leaves - 1 internal nodes, so this takes O(n) setup time (which we have).
Querying a partial sum goes like this: descend the tree towards the query (leaf) node, and whenever you descend to the right, add in the subtree sum on the left plus the element you just passed, since those elements belong to the sum.
Modifying a value goes like this: find the query (leaf) node and calculate the difference you added. Then travel up to the root of the tree; as you travel, update each node you visit by adding in the difference (you may need to visit adjacent nodes, depending on whether you store "sum of leaves of subtree" or "sum of left subtree plus myself" or some variant). The main idea is that you update all the augmented branch data that needs updating, and that data lies on the root path or adjacent to it.
Both operations take O(log n) time (the height of the tree), and you do O(1) work at each node.
You can probably use any search tree (e.g. a self-balancing binary search tree might allow for insertions, others for quicker access), but I haven't thought that one through.
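A minimal sketch of this idea as a flat-array segment tree (leaves sit in tree[n..2n), the parent of node i is i // 2); the prefix query below walks a half-open range instead of a root-to-leaf path, but the augmentation and the O(log n) bounds are the same:

    class PrefixSumTree:
        def __init__(self, A):
            self.n = len(A)
            self.tree = [0.0] * (2 * self.n)
            self.tree[self.n:] = A                 # leaves hold A itself
            for i in range(self.n - 1, 0, -1):     # internal: subtree sums
                self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

        def add(self, i, y):
            # Add y to A[i] (0-based) and fix every sum on the root path.
            i += self.n
            while i:
                self.tree[i] += y
                i //= 2

        def partial_sum(self, i):
            # Sum of the first i numbers, A[0] + ... + A[i-1]; O(log n).
            s, lo, hi = 0.0, self.n, self.n + i    # half-open [lo, hi)
            while lo < hi:
                if lo & 1:
                    s += self.tree[lo]
                    lo += 1
                if hi & 1:
                    hi -= 1
                    s += self.tree[hi]
                lo //= 2
                hi //= 2
            return s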
You may use a Fenwick tree.
See this question.
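A sketch of the Fenwick (binary indexed) tree, which meets the O(log n) bound using exactly one extra array of size n:

    class Fenwick:
        def __init__(self, n):
            self.n = n
            self.tree = [0.0] * (n + 1)    # 1-based; the one work array

        def add(self, i, y):
            # Add(i, y): hop to each covering node, O(log n) of them.
            while i <= self.n:
                self.tree[i] += y
                i += i & (-i)

        def partial_sum(self, i):
            # Partial-sum(i): sum of the first i numbers in O(log n).
            s = 0.0
            while i > 0:
                s += self.tree[i]
                i -= i & (-i)
            return s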

Interval tree algorithm that supports merging of intervals with no overlap

I'm looking for an interval tree algorithm similar to the red-black interval tree in CLR, but one that merges intervals by default, so that there are never any overlapping intervals.
In other words if you had a tree containing two intervals [2,3] and [5,6] and you added the interval [4,4], the result would be a tree containing just one interval [2,6].
Thanks
Update: the use case I'm considering is computing transitive closure. Interval sets are used to store the successor sets because they have been found to be quite compact. But if you represent an interval set as a linked list, I have found that in some situations it can become quite large, and so does the time required to find the insertion point; hence my interest in interval trees. There may also be quite a lot of merging of one tree with another (i.e. a set OR operation): if both trees are large, it may be better to create a new tree using in-order walks of both trees rather than repeated insertion of each interval.
The problem I see is that inserting a large interval can wipe out a large chunk of the tree, making it difficult to restore the red-black invariants.
I think it would be easier to use a splay tree, as follows. For simplicity, each tree contains two sentinels: an interval A to the left of all other intervals, and an interval Z to the right. When inserting an interval I, splay I's predecessor-to-be, H, to the root of the tree. The tree looks like
      H
     / \
   ...  X
       / \
     ... ...
Now detach X and splay I's successor-to-be J to the root.
  H          J
 /          / \
...       ...  ...
At this point all of the intervals that overlap I are in J's left subtree. Detach that subtree and put all of its nodes on the free list, extending I if necessary. Make I the parent of H and J
      I
     / \
    H   J
   /     \
 ...     ...
and continue on our merry way. This operation is O(log n) amortized, where n is the number of tree nodes (this can be proved by examining the splay tree potential function and doing a lot of algebra).
I should add that there is a natural recursive tree-to-tree merge: insert the root of one tree, then merge the left and right subtrees. I don't know how to analyze it offhand.
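Splay machinery aside, the merging semantics can be pinned down with a short list-based sketch (linear time, so it only shows what the tree must compute, not how fast):

    def insert_interval(intervals, new):
        # `intervals`: sorted, pairwise disjoint, non-adjacent (lo, hi)
        # pairs over integers. Insertion absorbs every interval that
        # overlaps or touches `new`.
        lo, hi = new
        kept = []
        for a, b in intervals:
            if b + 1 < lo or hi + 1 < a:   # disjoint, not even adjacent
                kept.append((a, b))
            else:                          # overlapping/touching: absorb
                lo, hi = min(lo, a), max(hi, b)
        kept.append((lo, hi))
        kept.sort()
        return kept

    # insert_interval([(2, 3), (5, 6)], (4, 4)) -> [(2, 6)]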
