Can anyone provide a mathematical proof of why the size of the segment tree array built for an array of size n is at most 4*n?
I tried to express N (the size of the segment tree array) as a function of n (the size of the given array the segment tree is built for), but was unable to prove it. I know that the maximum number of nodes in a tree of l levels is 2^l - 1, which can be further written as 2^ceil(log(N+1)) - 1. But what I want is a proof that N is at most 4n.
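For what it's worth, a sketch of the usual argument (assuming the standard array layout in which a node at index i has children at 2i+1 and 2i+2, so the tree is stored as if it were complete):

```latex
h = \lceil \log_2 n \rceil
  \quad\text{(height of the segment tree over } n \text{ leaves)}
\\
N \le 2^{h+1} - 1
  \quad\text{(array slots needed for a complete tree of height } h\text{)}
\\
2^{h+1} - 1 \;<\; 2 \cdot 2^{\lceil \log_2 n \rceil}
          \;<\; 2 \cdot 2^{\log_2 n + 1} \;=\; 4n
```

The last step uses ceil(x) < x + 1. When n is itself a power of two the bound tightens to 2n - 1 slots.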
Related
I have a min heap H with n elements. The function min(H,k) prints the k smallest values in order, from smallest to largest. At the end of the method, H still contains the n values. I was asked to give an algorithm for min(H,k) in O(k log k) time and O(k) extra space. In the solutions they did as follows:
We will use an extra min heap T without any data of its own. It will contain copies of elements of the original min heap H (there will be two-way pointers between corresponding values of H and T). The algorithm:
Print the minimal element of H in O(1).
Insert into T the two children of the root of H.
As long as we didn't print k values do:
Print the minimal element of T (let's call it x).
Remove x from T.
Insert into heap T the two children of x from heap H (if they exist).
I don't understand why this algorithm is valid and, what is worse, I don't understand the algorithm at all. I understand that we create a new heap T. I also understand why printing the minimal element of H is O(1). I don't understand the "Insert into T the two children of the root of H" part. Does it insert pointers to those children into heap T, or just their values? If it is the latter, how do I know how to follow on to the next ones?
The elements of T have to let you get at the elements and positions of the elements in the original heap. If you can do that then you can find the next elements and you can find the values.
A variety of representations will work. For example with the min heap array representation all that you need to know are the offsets into the array, with global constants for the length of the array and the start of the array.
The key insight as to why this works is that all operations on T are heap operations on a heap of size O(k). Therefore they each take O(log(k)). There are O(k) operations needed so the result is O(k log(k)).
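To make the steps concrete, here is a sketch in Python (names are mine, not from the original answer). It uses the array representation of the heap, so plain indices play the role of the two-way pointers:

```python
import heapq

def k_smallest(h, k):
    """Return the k smallest values of the array-based min heap h,
    smallest first, without modifying h: O(k log k) time, O(k) space."""
    out = []
    if k <= 0 or not h:
        return out
    t = [(h[0], 0)]                      # auxiliary heap T of (value, index in h)
    while t and len(out) < k:
        val, i = heapq.heappop(t)        # minimal element of T
        out.append(val)
        for c in (2 * i + 1, 2 * i + 2): # children of i in H, if they exist
            if c < len(h):
                heapq.heappush(t, (h[c], c))
    return out
```

For example, on the valid min heap array [1, 3, 2, 7, 4, 9, 5], k_smallest with k=3 yields [1, 2, 3]. Note that T never grows beyond O(k) entries, which is exactly why every T operation costs O(log k).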
I have been given an array of size N. I have Q queries; for each, I have to calculate the GCD of the elements between indices L and R, where 1 ≤ L ≤ R ≤ N. How can I calculate it efficiently, given that the brute-force approach will fail?
GCD is associative and commutative. This means a segment tree can solve this problem in O(log N) time per range GCD query.
Wikipedia has a nice article about segment trees
A segment tree can be used to do the preprocessing and answer queries in moderate time. With a segment tree, preprocessing time is O(n) and the time for a GCD query is O(log n). The extra space required to store the segment tree is O(n).
Representation of Segment trees
Leaf Nodes are the elements of the input array.
Each internal node represents GCD of all leaves under it.
An array representation is used for segment trees, i.e., for each node at index i:
Left child is at index 2*i+1.
Right child is at index 2*i+2, and the parent is at floor((i-1)/2).
Implementation can be found here http://www.geeksforgeeks.org/gcds-of-a-given-index-ranges-in-an-array/
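For illustration, here is a minimal sketch of this structure in Python (function names are mine; it follows the 2*i+1 / 2*i+2 layout described above):

```python
from math import gcd

def build(a):
    """Build a GCD segment tree over array a; leaves hold a's elements,
    each internal node holds the GCD of all leaves under it."""
    n = len(a)
    tree = [0] * (4 * n)  # 4*n slots always suffice for an n-element array
    def go(node, lo, hi):
        if lo == hi:
            tree[node] = a[lo]
            return
        mid = (lo + hi) // 2
        go(2 * node + 1, lo, mid)
        go(2 * node + 2, mid + 1, hi)
        tree[node] = gcd(tree[2 * node + 1], tree[2 * node + 2])
    go(0, 0, n - 1)
    return tree

def query(tree, n, l, r):
    """GCD of a[l..r], inclusive, in O(log n)."""
    def go(node, lo, hi):
        if r < lo or hi < l:       # disjoint with the query range
            return 0               # gcd(0, x) == x, so 0 is the identity
        if l <= lo and hi <= r:    # fully covered by the query range
            return tree[node]
        mid = (lo + hi) // 2
        return gcd(go(2 * node + 1, lo, mid),
                   go(2 * node + 2, mid + 1, hi))
    return go(0, 0, n - 1)
```

Usage: for a = [2, 6, 9, 5, 3], query(build(a), 5, 1, 2) returns gcd(6, 9) = 3.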
Or use a sparse table, with O(n log n) preprocessing and O(1) per query (GCD is idempotent, so overlapping intervals don't hurt).
Given an array of integers and some query operations.
The query operations are of 2 types
1. Update the value at the ith index to x.
2. Given 2 integers, find the kth minimum in that range. (E.g., if the 2 integers are i and j, we have to find the kth minimum between i and j, both inclusive.)
I can answer a range minimum query using a segment tree, but could not do so for the kth minimum.
Can anyone help me?
Here is an O(polylog n) per query solution that does not assume a constant k, so k can vary between queries. The main idea is to use a segment tree, where every node represents an interval of array indices and contains a multiset (balanced binary search tree) of the values in the represented array segment. The update operation is pretty straightforward:
Walk up the segment tree from the leaf (the array index you're updating). You will encounter all nodes that represent an interval of array indices that contain the updated index. At every node, remove the old value from the multiset and insert the new value into the multiset. Complexity: O(log^2 n)
Update the array itself.
We notice that every array element will be in O(log n) multisets, so the total space usage is O(n log n). With linear-time merging of multisets we can build the initial segment tree in O(n log n) as well (there's O(n) work per level).
What about queries? We are given a range [i, j] and a rank k and want to find the k-th smallest element in a[i..j]. How do we do that?
Find a disjoint coverage of the query range using the standard segment tree query procedure. We get O(log n) disjoint nodes, the union of whose multisets is exactly the multiset of values in the query range. Let's call those multisets s_1, ..., s_m (with m <= ceil(log_2 n)). Finding the s_i takes O(log n) time.
Do a select(k) query on the union of s_1, ..., s_m. See below.
So how does the selection algorithm work? There is one really simple algorithm to do this.
We have s_1, ..., s_m and k given and want to find the smallest x, such that s_1.rank(x) + ... + s_m.rank(x) >= k - 1, where rank returns the number of elements smaller than x in the respective BBST (this can be implemented in O(log n) if we store subtree sizes).
Let's just use binary search to find x! We walk through the BBST of the root, do a couple of rank queries and check whether their sum is larger than or equal to k. It's a predicate monotone in x, so binary search works. The answer is then the minimum of the successors of x in any of the s_i.
Complexity: O(n log n) preprocessing and O(log^3 n) per query.
So in total we get a runtime of O(n log n + q log^3 n) for q queries. I'm sure we could get it down to O(q log^2 n) with a cleverer selection algorithm.
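A static sketch of the count-and-binary-search idea in Python (my own code): plain sorted lists stand in for the multisets, so there are no updates here, but the query machinery is the same.

```python
from bisect import bisect_right

class MergeSortTree:
    """Each segment tree node stores the sorted values of its segment;
    kth() binary-searches over candidate values, counting elements <= x
    with rank queries on the O(log n) covering nodes. Static: no updates."""
    def __init__(self, a):
        self.n = len(a)
        self.tree = [[] for _ in range(4 * self.n)]
        self._build(0, 0, self.n - 1, a)

    def _build(self, node, lo, hi, a):
        if lo == hi:
            self.tree[node] = [a[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node + 1, lo, mid, a)
        self._build(2 * node + 2, mid + 1, hi, a)
        # merge the children's sorted lists
        self.tree[node] = sorted(self.tree[2 * node + 1] +
                                 self.tree[2 * node + 2])

    def _count_leq(self, node, lo, hi, l, r, x):
        """Number of elements <= x among a[l..r]."""
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return bisect_right(self.tree[node], x)
        mid = (lo + hi) // 2
        return (self._count_leq(2 * node + 1, lo, mid, l, r, x) +
                self._count_leq(2 * node + 2, mid + 1, hi, l, r, x))

    def kth(self, l, r, k):
        """k-th smallest (1-based) value in a[l..r], by binary search
        over the sorted values stored at the root (a monotone predicate)."""
        vals = self.tree[0]
        lo, hi = 0, len(vals) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self._count_leq(0, 0, self.n - 1, l, r, vals[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        return vals[lo]
```

For a = [3, 1, 4, 1, 5, 9, 2], kth(2, 5, 2) returns 4 (the segment [4, 1, 5, 9] sorted is [1, 4, 5, 9]). Replacing each sorted list with a BBST that tracks subtree sizes yields the updatable version described above.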
UPDATE: If we are looking for an offline algorithm that can process all queries at once, we can get O((n + q) * log n * log (q + n)) using the following algorithm:
Preprocess all queries and create a set of all values that ever occur in the array. There will be at most q + n of them.
Build a segment tree, but this time not on the array, but on the set of possible values.
Every node in the segment tree represents an interval of values and maintains a set of positions where those values occur.
To answer a query, start at the root of the segment tree. Check how many positions in the left child of the root lie in the query interval (we can do that by doing two searches in the BBST of positions). Let that number be m. If k <= m, recurse into the left child. Otherwise recurse into the right child, with k decremented by m.
For updates, remove the position from the O(log (q + n)) nodes that cover the old value and insert it into the nodes that cover the new value.
The advantage of this approach is that we don't need subtree sizes, so we can implement this with most standard library implementations of balanced binary search trees (e.g. set<int> in C++).
We can turn this into an online algorithm by changing the segment tree out for a weight-balanced tree such as a BB[α] tree. It has logarithmic operations like other balanced binary search trees, but allows us to rebuild an entire subtree from scratch when it becomes unbalanced by charging the rebuilding cost to the operations that must have caused the imbalance.
If this is a programming contest problem, then you might be able to get away with the following O(n log(n) + q n^0.5 log(n)^1.5)-time algorithm. It is set up to use the C++ STL well and has a much better big-O constant than Niklas's (previous?) answer on account of using much less space and indirection.
Divide the array into k chunks of length n/k. Copy each chunk into the corresponding locations of a second array and sort it. To update: copy the chunk that changed into the second array and sort it again (time O((n/k) log(n/k))). To query: copy to a scratch array the at most 2 (n/k - 1) elements that belong to a chunk partially overlapping the query interval. Sort them. Use one of the answers to this question to select the element of the requested rank out of the union of the sorted scratch array and fully overlapping chunks, in time O(k log(n/k)^2). The optimum setting of k in theory is (n/log(n))^0.5. It's possible to shave another log(n)^0.5 using the complicated algorithm of Frederickson and Johnson.
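A rough Python sketch of the chunking idea (my own code, simplified): sorted shadow copies of the blocks support "count elements <= x" quickly, and a binary search over values replaces the multi-array selection step. The candidate list in kth() is built naively per query for brevity; the answer above selects directly from the sorted pieces instead.

```python
from bisect import bisect_right
from math import isqrt

class ChunkedKth:
    """Array cut into blocks of about sqrt(n) elements, each with a
    sorted shadow copy. update() re-sorts one block; kth() binary-
    searches over values, counting elements <= x block by block."""
    def __init__(self, a):
        self.a = list(a)
        self.b = max(1, isqrt(len(a)))               # block length
        self.blocks = [sorted(self.a[i:i + self.b])
                       for i in range(0, len(a), self.b)]

    def update(self, i, x):
        self.a[i] = x
        j = i // self.b                              # re-sort block j only
        self.blocks[j] = sorted(self.a[j * self.b:(j + 1) * self.b])

    def _count_leq(self, l, r, x):
        cnt, i = 0, l
        while i <= r:
            j = i // self.b
            end = (j + 1) * self.b - 1
            if i == j * self.b and end <= r:         # block fully inside [l, r]
                cnt += bisect_right(self.blocks[j], x)
                i = end + 1
            else:                                    # partially overlapping block
                cnt += self.a[i] <= x
                i += 1
        return cnt

    def kth(self, l, r, k):
        """k-th smallest (1-based) in a[l..r]."""
        vals = sorted(self.a[l:r + 1])               # naive candidates, for brevity
        lo, hi = 0, len(vals) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self._count_leq(l, r, vals[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        return vals[lo]
```

This is only meant to show the mechanics of the update and counting steps, not to attain the quoted running time.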
Perform a modification of bucket sort: create a bucket that contains the numbers in the range you want, then sort only this bucket and find the kth minimum.
Damn, this solution can't update an element, but at least it finds the k-th element; maybe it gives you some ideas for a solution that supports updates. Try pointer-based B-trees.
This is O(n log n) space and O(q log^2 n) time complexity; below I explain how to reduce it to O(log n) per query.
So, you'll need to do the next:
1) Build a segment tree over the given array.
2) In every node, instead of storing one number, store a whole array. Its size equals the number of leaves (array elements) in that node's subtree, and it contains the values of those leaves, sorted.
3) To build such an array, merge the arrays of the node's two children. In addition, for every element of the merged array, remember its position before the merge (i.e., which child array it came from and its position there), and keep a pointer to the first later element that did not come from the same child array.
4) With this structure, you can check how many numbers in some segment S are lower than a given value x. Binary-search the root node's array for the first number >= x. Then, using the pointers, you can answer the same question for the arrays of the two children in O(1), and so on down the tree. You stop descending at every node whose segment lies entirely inside or entirely outside the given segment S. The time complexity is O(log n): O(log n) to find the first element >= x, and O(log n) over all segments of the decomposition of S.
5) Do a binary search over solution.
This was solution with O(log^2 n) per query. But you can reduce to O(log n):
1) Before doing all of the above, you need to transform the problem. Sort all the numbers, remembering each one's position in the original array. These positions now form the array you work on; call it P.
If the bounds of the query segment are a and b, you need to find the k-th element of P whose value lies between a and b (by value, not by index). That element is the index of your result in the original array.
2) To find that k-th element, you would do some type of back-tracking with complexity of O(log n). You will be asking the number of elements between index 0 and (some other index) that are between a and b by value.
3) Suppose you know the answer to such a question for some segment (0, h). Get the answers to the same type of question for the tree segments that begin at h, starting from the largest one. Keep taking those answers as long as the current answer (for segment (0, h)) plus the answer you got last is greater than k. Then update h. Keep updating h until there is only one segment in the tree that begins at h. That h is the index of the number you are looking for in the problem you have stated.
Getting the answer to such a question for a segment from the tree takes exactly O(1) time, because you already know the answer for its parent's segment, and using the pointers explained in the first algorithm you can derive the answer for the current segment in O(1).
The question is :
find the medians of a large data set (n numbers) over a sliding window of fixed size k
What I thought is:
Maintain 2 heaps: a max heap for numbers less than the current median and a min heap for numbers greater than the current median.
The main concept is to FIND the oldest element of the previous window in one of the heaps (depending on whether it is < or > the current median) and replace it with the new element we encounter.
Then rebalance so that |size(heap1) - size(heap2)| is 1 or 0: the median is the average of the two top elements if size1 == size2, else the top element of the heap whose size is greater.
The problem I am facing is that the time complexity increases: finding the element takes O(n) time, giving O(n*k) in total, so I am not able to achieve the desired complexity of O(n*log k) (as required by the source of the question).
How should it be reduced, without using extra space?
edit : input : 1 4 3 5 6 2 , k=4
median :
from 1 4 3 5 = (4+3)/2
from 4 3 5 6 = (4+5)/2
from 3 5 6 2 = (3+5)/2
You can solve this problem using an order-statistic tree, which is a BST with some additional information that allows finding medians, quantiles and other order statistics in O(log n) time in a tree with n elements.
First, construct an OST with the first k elements. Then, in a loop:
Find and report the median value.
Remove the first element that was inserted into the tree (you can find out which element this was in the array).
Insert the next element from the array.
Each of these steps takes O(log k) if the tree is self-balancing, because we maintain the invariant that the tree never grows beyond size k, which also gives O(k) auxiliary space. The preprocessing takes O(k log k) time while the loop repeats n + 1 - k times for a total time of O(n log k).
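A sketch of this loop in Python (my own code), with a plain sorted list via bisect standing in for the order-statistic tree; insert/remove are O(k) here rather than O(log k), but the structure of the solution is the same:

```python
from bisect import bisect_left, insort

def sliding_medians(a, k):
    """Median of every length-k window of a, removing the oldest element
    and inserting the next one each step. The sorted list `window` is a
    stand-in for the order-statistic tree described above."""
    window = sorted(a[:k])                 # "preprocessing": first k elements
    res = []
    for i in range(k, len(a) + 1):
        if k % 2:                          # report the current median
            res.append(window[k // 2])
        else:
            res.append((window[k // 2 - 1] + window[k // 2]) / 2)
        if i == len(a):
            break
        window.pop(bisect_left(window, a[i - k]))  # remove the oldest element
        insort(window, a[i])                        # insert the next element
    return res
```

On the example from the question, sliding_medians([1, 4, 3, 5, 6, 2], 4) gives [3.5, 4.5, 4.0].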
If you can find a balanced tree implementation that gives you efficient access to the central element, you should probably use it. You could also do this with heaps, much as you suggest, as long as you keep an extra array of length k which tells you where each element in the window lives in its heap, and which heap it is in. You will have to modify the code that maintains the heap so it updates this array when it moves things around, but heap code is a lot easier to write and a lot smaller than balanced tree code. Then you don't need to search through the whole heap to remove the item that has just gone off the edge of the window, and the cost is down to O(n log k).
This problem is similar to the efficient implementation of Dijkstra's shortest path, where we need to delete (update, in Dijkstra's case) an element that is not at the top of the heap. You can use the same workaround here, at the cost of extra space. First, don't use a built-in heap library; create your own heap data structure, but maintain pointers to each element in the heap, updating the pointers whenever you add or remove an element. After calculating the median of the first k elements, use the pointers to delete the outgoing element directly from the appropriate heap (min or max, according to whether it is greater or less than the median), then heapify at that position. The heap sizes change, and you can get the new median using the same rebalancing logic you already use for adjusting the heap sizes.
Heapify takes O(log k), hence your total cost will be O(n*log k), but you will need O(n) more space for the pointers.
I'm learning data structures and algorithms. The book I refer to (Sedgewick) uses 'finding the maximum element' to illustrate the divide-and-conquer strategy. This algorithm divides an array midway into two parts, finds the maximum elements of the two parts (recursively), and returns the larger of the two as the maximum element of the whole array.
The below is an exercise question asked
Modify the divide-and-conquer program for finding the maximum element in an array (Program 5.6) to divide an array of size N into one part of size k = 2^(lg N – 1) and another of size, N – k (so that the size of at least one of the parts is a power of 2).
Draw the tree corresponding to the recursive calls that your program makes when the array size is 11, similar to the one shown for Program 5.6.
I see that the left sub-tree of such a binary tree is a perfect binary tree because the size of the first subset is a power of two. What is the implication the author is hoping that I should get from this?
I suppose that one nugget of this exercise lies in the k. It makes the point that if you use this formula for k in a binary recursion, then your underlying tree is "pretty", in the sense that the left subtree of every node (not just the root) is a perfect binary tree.
Of course it is also well-behaved in the "ideal" case when N is a power of 2; k is then simply N/2, and every subtree (not only the left) is a perfect binary tree.
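A quick sketch of the modified recursion in Python (my own code, not Program 5.6 itself), computing k as the power of two 2^(ceil(lg n) - 1):

```python
def dc_max(a, lo=0, hi=None):
    """Maximum of a[lo..hi], dividing into a left part of size
    k = 2^(ceil(lg n) - 1) (a power of two) and a right part of
    size n - k, as the exercise asks."""
    if hi is None:
        hi = len(a) - 1
    n = hi - lo + 1
    if n == 1:
        return a[lo]
    k = 1 << ((n - 1).bit_length() - 1)   # 2^(ceil(lg n) - 1)
    return max(dc_max(a, lo, lo + k - 1), dc_max(a, lo + k, hi))
```

For n = 11 the top-level split is 8 + 3; the size-8 call unwinds as a perfect binary tree, and the size-3 call splits as 2 + 1, so the left subtree of every node is perfect, matching the observation above.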