Performing a select operation on a non-ranked balanced binary search tree - ranking

The select function returns an int and should run in O(log n).
My friend was asked this question in an interview and couldn't answer it,
and I also have no idea how it is possible if the tree isn't ranked.
The interviewer mentioned that the return value of select should somehow help in the solution.
(select takes a rank as a parameter and returns the number in the tree that has that rank.)

Related

Data Structure for supporting certain operations in O(log n) time

Consider storing tuples (id, date, views) in some data structure supporting the operations:
insert(id, date, views) - inserts the element into the data structure; if the element is
already there, it simply updates the views. So each id is unique, storing the date and views,
and only the views ever get updated.
delete(id) - removes the element with the corresponding id.
search(id) - returns the tuple with the corresponding id.
findElementWithMaxView(date) - among all tuples whose second component (date) is
greater than or equal to the date parameter, return any one with maximum views.
What data structure can we build that supports all these operations in O(log n) time?
My thoughts:
We could simply make an AVL tree with the postId as key. That would support insert, delete and search
in O(log n). But that alone won't help with findElementWithMaxView(date) in O(log n).
What if I make another AVL tree with the date as key, where each node stores extra information,
namely (max-views, id): max-views is the maximum of the views over the subtree rooted at that node,
and id is the id of the tuple with that max-views value.
I'm not sure whether this would support all the operations in O(log n).
Your line of thinking is correct - with a caveat
Your line of thinking is correct. Having two data structures with O(log n) operations, both modified at every step, results in a complexity of O(log n + log n), which is still O(log n).
Insert(id, date, views): insert into the first structure with id as key - O(log n). Find the date key in the second structure and replace (views, id) if needed - O(log n). So insert is fine.
Search(id): O(log n), nothing else needed - fine.
Update(id): O(log n) for the search, then update the views. This means we also need to search the secondary structure to update its (max-views, id) if necessary. Since we are likely only ever going to increase the view count, I will give you a bit of leeway and call this O(log n).
findElementWithMaxView(date): thanks to the secondary structure, this is manageable within O(log n).
Now here is an interesting question that wasn't covered: are the dates unique? If yes, then everything is nice, delete works, and we can all go and sing kumbaya.
However, I'm somewhat convinced that isn't the case - if the dates were unique, it wouldn't be necessary to have a separate date and id. (Though it might still be useful, and probably good practice.) After a bit of thinking I arrived at the conclusion that both scenarios are possible. So what happens if the dates aren't unique?
If the dates aren't unique, delete (or any decrease of views) breaks.
Since we only store (max-views, id) in our secondary structure, we are in trouble: deleting the element holding the max-views leaves us with the impossible task of finding the second maximum without any preparation. So we would have to go through all the elements with our date - and there can be up to O(n) of them [1].
So what can be done about that?
Since the ids are unique, use the id as a tiebreaker. So we have an AVL tree with date as the key and id as the tiebreaker. Furthermore, every node of the tree needs to hold maximum_views(node, left_subtree, right_subtree) so that we can quickly answer the question for an interval of dates [2].
[1] On average there will be O(sqrt(n)) of them, which is not that bad, but still worse than the O(log n) we wanted.
[2] Which is [date, last_date] in our case.
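
To make the idea concrete, here is a minimal sketch (in Python) of the secondary tree keyed by (date, id), with each node augmented by the maximum views in its subtree. Balancing is omitted for brevity - a real AVL implementation would also perform rotations and recompute the subtree maximum for every node a rotation moves - and all names here (Node, pull, max_views_since) are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    key: Tuple[int, int]                     # (date, id); id breaks ties between equal dates
    views: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None
    subtree_max: Tuple[int, int] = (0, -1)   # (max views in this subtree, id achieving it)

def pull(node: Node) -> None:
    """Recompute node.subtree_max from the node itself and its children."""
    best = (node.views, node.key[1])
    for child in (node.left, node.right):
        if child and child.subtree_max > best:
            best = child.subtree_max
    node.subtree_max = best

def insert(root: Optional[Node], key: Tuple[int, int], views: int) -> Node:
    """Plain BST insert (no rebalancing shown) that keeps subtree_max up to date."""
    if root is None:
        root = Node(key, views)
    elif key < root.key:
        root.left = insert(root.left, key, views)
    elif key > root.key:
        root.right = insert(root.right, key, views)
    else:
        root.views = views                   # same (date, id): just update the views
    pull(root)
    return root

def max_views_since(root: Optional[Node], date: int) -> Optional[Tuple[int, int]]:
    """(views, id) of a most-viewed element whose date >= `date`, in O(height)."""
    best = None
    node = root
    while node:
        if node.key[0] >= date:
            cand = (node.views, node.key[1])
            if best is None or cand > best:
                best = cand
            # the entire right subtree also has date >= `date`
            if node.right and node.right.subtree_max > best:
                best = node.right.subtree_max
            node = node.left                 # the left subtree may still contain dates >= `date`
        else:
            node = node.right                # node and its left subtree are out of range
    return best
```

Deletion would follow the same pattern: remove the (date, id) key and re-run pull on the way back up - which is exactly where the id tiebreaker saves us from rescanning every element that shares a date.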

Data Structure: Returning the same-shaped BST with another BST's values

Hey, I have a question where I need to describe an algorithm that gets two binary search trees, T1 and T2. The trees contain different values in each node.
The algorithm should return a binary search tree with the same shape as T2 but with the values of T1, with time complexity O(n), where n is the number of elements (the same for both trees) - what we might call "equally topological" (I think that's what it's called, or at least a nice name for it).
For example, T1 (defining the values), T2 (defining the shape), and the tree that should be returned were given as diagrams (not reproduced here).
What I've tried so far: thinking about the median/average value, but that does not work in every case; or building an AVL tree and rotating it until we reach the desired shape, but I'm not sure whether that would work or whether it would be O(n). Any help would be appreciated, thank you!
Traverse T1 in order (left subtree, vertex, right subtree) and write its values to an array; they will come out in sorted order. Then create a copy of T2 (if necessary) and traverse it in the same order, writing the values from the array into it one by one. This uses two traversals, so it is Θ(n).
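
A minimal sketch of that answer in Python, assuming a simple Node class with value/left/right attributes (both the class and the function name are made up for illustration):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def same_shape_with_values(t1, t2):
    """Build a new tree with T2's shape and T1's values, in Theta(n)."""
    values = []

    def collect(node):              # in-order traversal of T1 yields its values in sorted order
        if node:
            collect(node.left)
            values.append(node.value)
            collect(node.right)

    collect(t1)
    it = iter(values)

    def rebuild(node):              # in-order traversal of T2, copying its shape
        if node is None:
            return None
        left = rebuild(node.left)
        value = next(it)            # the next sorted value belongs exactly at this position
        right = rebuild(node.right)
        return Node(value, left, right)

    return rebuild(t2)
```

Because the sorted values are consumed in the same in-order positions they occupy in T2, the result is automatically a valid BST with T2's shape.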

Answer queries about the number of distinct numbers in a given range

The problem
I have an array with N numbers. The numbers are not necessarily distinct and may be unordered. I have to answer Q queries asking how many distinct numbers there are between indices A and B, where 0 <= A <= B < array.Length.
I know how to do it in O(N) per query using a hash table, but I'm asking for a more efficient solution.
I tried to improve on it with sqrt decomposition and also with a segment tree, but I couldn't. I'm not showing any code because I did not find any idea that worked.
I'm looking for an explanation of a faster solution.
UPDATE
I read that you can collect the queries and then answer them all offline using a Binary Indexed Tree (BIT). Can someone explain how to do that? Assume I know how a BIT works.
For each index i, find the previous index prev[i] that has the same value (or -1 if there is no such index). This can be done in O(n) on average by going left to right with a hash map. Then the answer for the index range [l, r) is the number of elements i in [l, r) such that prev[i] < l (this requires some thought, but should become clear: each value is counted exactly once per range, at its first occurrence inside the range).
Now we solve the problem "given a range [l, r) and a value c, count the elements less than c" on the array prev. This can be done in O(log^2 n) per query using a segment tree in which each vertex stores, in sorted order, all the numbers in its range (subtree). Each query decomposes into O(log n) vertices, and we do a binary search in each of them.
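
A sketch of that approach in Python, using heapq.merge to build the sorted per-vertex lists and bisect for the per-vertex counting. The function names (build_prev, build_tree, count_distinct) are mine, and all ranges are half-open [l, r):

```python
import bisect
import heapq

def build_prev(a):
    """prev[i] = largest j < i with a[j] == a[i], or -1 if there is none. O(n) expected."""
    last, prev = {}, []
    for i, v in enumerate(a):
        prev.append(last.get(v, -1))
        last[v] = i
    return prev

def build_tree(prev):
    """Merge-sort tree: vertex k stores, in sorted order, the prev-values of its range."""
    n, size = len(prev), 1
    while size < n:
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for i, v in enumerate(prev):
        tree[size + i] = [v]
    for k in range(size - 1, 0, -1):
        tree[k] = list(heapq.merge(tree[2 * k], tree[2 * k + 1]))
    return tree, size

def count_distinct(tree, size, l, r):
    """Distinct values in a[l:r) = number of i in [l, r) with prev[i] < l.  O(log^2 n)."""
    lo, hi, total = l + size, r + size, 0
    while lo < hi:
        if lo & 1:
            total += bisect.bisect_left(tree[lo], l)   # how many prev-values < l in this vertex
            lo += 1
        if hi & 1:
            hi -= 1
            total += bisect.bisect_left(tree[hi], l)
        lo //= 2
        hi //= 2
    return total

# Example: a[1:4] = [2, 1, 3] has 3 distinct values.
a = [1, 2, 1, 3, 2]
tree, size = build_tree(build_prev(a))
print(count_distinct(tree, size, 1, 4))   # -> 3
```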

Cormen quicksort: modifying the partition function

I am learning quicksort from Introduction to Algorithms. I got stuck on exercise 7.1-2 of Chapter 7 (Quicksort):
"What value of q does PARTITION return when all elements in the array A[p…r] have the same value? Modify PARTITION so that q = ⌊(p+r)/2⌋ when all elements in the array A[p…r] have the same value."
The first part is easy and the answer is definitely r. But I can't even figure out what the second part is asking; I mean, what is the reason for setting the pivot position to ⌊(p+r)/2⌋? Furthermore, I can't understand the solutions I found by searching on Google.
Please help me understand the advantage of this modification when all elements are equal, and if possible please provide the algorithm for it.
By returning the middle of p and r as the pivot position, we divide the array of size n into two subproblems of equal size n/2. If you draw the recursion tree for the recurrence below, you will see that its height is O(lg n), so the total work is Θ(n lg n):
T(n) = 2T(n/2) + O(n)
Now imagine that the position returned from the partition is always the last element of the array. The recurrence for the running time becomes
T(n) = T(n-1) + O(n)
which solves to Θ(n^2). Do you see now why it is inefficient when the recursion tree degenerates into a linked list? Try drawing the tree and adding up the costs at each node in both cases.
Modifying the partition method to return ⌊(p+r)/2⌋ is easy.
Hint: one easy way is to split the <= comparison in PARTITION's if condition, so that elements equal to the pivot alternate between the two sides (see the sketch below).
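
A sketch of that hint in Python, based on the CLRS Lomuto partition (inclusive indices p..r, pivot A[r]): elements strictly smaller than the pivot always go to the left side, while elements equal to the pivot alternate, so when everything is equal exactly half end up on each side and the returned index is ⌊(p+r)/2⌋. The exact form of the modification is an assumption on my part.

```python
def partition(A, p, r):
    """CLRS-style Lomuto partition, modified so that elements equal to the
    pivot alternate sides.  If all elements are equal, it returns (p + r) // 2."""
    x = A[r]
    i = p - 1
    take_equal = False                # flips on every element equal to the pivot
    for j in range(p, r):
        if A[j] < x or (A[j] == x and take_equal):
            i += 1
            A[i], A[j] = A[j], A[i]
        if A[j] == x:
            take_equal = not take_equal
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

A = [7, 7, 7, 7, 7]
print(partition(A, 0, len(A) - 1))    # -> 2, i.e. floor((0 + 4) / 2)
```

With this change the recurrence on an all-equal array becomes T(n) = 2T(n/2) + O(n) instead of T(n) = T(n-1) + O(n).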

Finding closest number in a range

I thought of a problem, which is as follows:
We have an array A of n integers and t test cases. In every test case we are given a number m and a range [s, e], i.e. we are given s and e, and we have to find the number closest to m among A[s..e].
You may assume array indices run from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I could think of is an O(n) solution per test case, and I think a better solution exists.
This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data such as the left and right indices of its range, also store the numbers found in the subtree rooted at that node, in sorted order. You can achieve this while constructing the segment tree bottom-up: a node just above the leaves stores its two leaf values in sorted order, and an intermediate node keeps the numbers of its left and right children, which you can combine using standard merging. There are O(n) nodes in the tree, and building and storing this data takes O(n log n) overall.
Once you have this tree, for every query walk down the tree until you reach the appropriate node(s) for the given range [s, e]. As the tutorial shows, one or more different nodes combine to form the given range, and since the tree depth is O(log n), O(log n) such nodes suffice and reaching them costs O(log n) per query. In each node that lies completely inside the range, find the closest number by binary search in the sorted array stored in that node - again O(log n). The closest among all these candidates is the answer, so each query takes O(log^2 n) in total.
The tutorial I link to contains other data structures, such as the sparse table, which are easier to implement and should give O(sqrt(n)) per query. But I haven't thought much about this.
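
A sketch of that structure in Python; the names (build, closest_in_node, query) are mine, the tree is stored in a flat array with the leaves at positions size..size+n-1, and indices inside the code are 0-based:

```python
import bisect
import heapq

def build(A):
    """Merge-sort tree: each vertex stores, in sorted order, the values of its range."""
    n, size = len(A), 1
    while size < n:
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for i, v in enumerate(A):
        tree[size + i] = [v]
    for k in range(size - 1, 0, -1):
        tree[k] = list(heapq.merge(tree[2 * k], tree[2 * k + 1]))
    return tree, size

def closest_in_node(values, m):
    """Closest value to m in a non-empty sorted list, by binary search."""
    pos = bisect.bisect_left(values, m)
    candidates = values[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda v: abs(v - m))

def query(tree, size, s, e, m):
    """Value closest to m among A[s..e] (0-based, inclusive), O(log^2 n)."""
    lo, hi = s + size, e + 1 + size
    best = None
    while lo < hi:
        if lo & 1:
            c = closest_in_node(tree[lo], m)
            if best is None or abs(c - m) < abs(best - m):
                best = c
            lo += 1
        if hi & 1:
            hi -= 1
            c = closest_in_node(tree[hi], m)
            if best is None or abs(c - m) < abs(best - m):
                best = c
        lo //= 2
        hi //= 2
    return best

# The example from the question: A = {5, 12, 9, 18, 19}, m = 13, s = 4, e = 5 (1-based)
A = [5, 12, 9, 18, 19]
tree, size = build(A)
print(query(tree, size, 4 - 1, 5 - 1, 13))   # -> 18
```

(With fractional cascading the binary searches can be shared and the per-query cost brought down to O(log n), but that is beyond this sketch.)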
Sort the array and do binary search. Complexity: O(n log n + t log n).
I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search. (The array slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

Resources