Given an array with n elements, how can we find the number of elements greater than or equal to a given value x in a given range, index i to index j, in O(log n) complexity?
The queries are of the form (i, j, x), which means: find the number of elements greater than x from the ith to the jth element in the array.
The array is not sorted. i, j and x differ between queries. The elements of the array are static.
Edit: i, j and x can all be different for different queries!
If we know all the queries beforehand, we can solve this problem using a Fenwick tree.
First, we need to sort all the array elements and the query values together, by value.
So, assuming that we have the array [5, 4, 2, 1, 3] and the queries (0, 1, 6) and (2, 5, 2), we have the following result after sorting: [1, 2, 2, 3, 4, 5, 6]
Now, we process each entry in descending order:
If we encounter an entry that is an array element, we update its index in the Fenwick tree, which takes O(log n).
If we encounter a query, we check how many elements have already been added to the tree within the query's range, which also takes O(log n).
For the above example, the process is:
The 1st entry is the query for value 6; as the Fenwick tree is still empty -> result is 0.
The 2nd entry is element 5 -> add index 0 to the Fenwick tree.
The 3rd entry is element 4 -> add index 1 to the tree.
The 4th entry is element 3 -> add index 4 to the tree.
The 5th entry is element 2 -> add index 2 to the tree.
The 6th entry is the query for range (2, 5); we query the tree and get the answer 2.
The 7th entry is element 1 -> add index 3 to the tree.
Done.
So, in total, the time complexity of our solution is O((m + n) log(m + n)), where m and n are the number of queries and the number of elements in the input array, respectively.
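A minimal C++ sketch of this offline approach (the struct and function names are my own choices; I assume each query counts elements strictly greater than x, and adjusting the comparisons handles >= instead):

#include <algorithm>
#include <vector>

// Fenwick tree (binary indexed tree) over array positions.
struct Fenwick {
    std::vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i) { for (++i; i < (int)t.size(); i += i & -i) t[i]++; }
    int prefix(int i) const { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
    int range(int l, int r) const { return prefix(r) - (l ? prefix(l - 1) : 0); }
};

struct Query { int l, r, x, id; };

// For each query (l, r, x): how many a[k] > x for l <= k <= r.
std::vector<int> solveOffline(const std::vector<int>& a, std::vector<Query> qs) {
    int n = a.size();
    std::vector<int> order(n), ans(qs.size());
    for (int i = 0; i < n; ++i) order[i] = i;
    // Array indexes in descending order of value...
    std::sort(order.begin(), order.end(), [&](int i, int j) { return a[i] > a[j]; });
    // ...and queries in descending order of x.
    std::sort(qs.begin(), qs.end(), [](const Query& p, const Query& q) { return p.x > q.x; });
    Fenwick fw(n);
    int k = 0;
    for (const Query& q : qs) {
        // Add every element strictly greater than q.x before answering.
        while (k < n && a[order[k]] > q.x) fw.add(order[k++]);
        ans[q.id] = fw.range(q.l, q.r);
    }
    return ans;
}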
That is possible only if the array is sorted. In that case, binary search the smallest value satisfying your condition; its position splits the index range into two intervals, and the count is simply the length of the interval that satisfies the condition.
If the array is not sorted and you need to preserve its order, you can use an index sort. Put together:
definitions
Let <i0,i1> be your used index range and x your value.
index sort the array part <i0,i1>
So create an array of size m=i1-i0+1 and index sort it. This task is O(m.log(m)), where m<=n.
binary search the x position in the index array
This task is O(log(m)); you want the smallest index j in <0,m) for which array[index[j]]>=x.
compute the count
Simply count how many indexes there are from j up to m:
count = m-j;
As you can see, if the array is sorted you get O(log(m)) complexity, but if it is not, you need to sort it first, which is O(m.log(m)) and worse than the naive O(m) scan; the naive scan should be used instead if the array changes often and can't be kept sorted directly.
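For the already-sorted case, the whole query is a single lower-bound search; a minimal C++ sketch (the function name is mine):

#include <algorithm>
#include <vector>

// Count elements >= x in the sorted subrange a[i0..i1] (inclusive).
int countAtLeast(const std::vector<int>& a, int i0, int i1, int x) {
    auto first = std::lower_bound(a.begin() + i0, a.begin() + i1 + 1, x);
    return (a.begin() + i1 + 1) - first; // everything from `first` onward is >= x
}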
[Edit1] What I mean by index sort
By index sort I mean this: let's have an array a
a[] = { 4,6,2,9,6,3,5,1 }
An index sort creates a new array ix of indexes in sorted order, so for example an ascending index sort means:
a[ix[i]]<=a[ix[i+1]]
In our example, an index bubble sort proceeds like this:
// init indexes
a[ix[i]]= { 4,6,2,9,6,3,5,1 }
ix[] = { 0,1,2,3,4,5,6,7 }
// bubble sort 1st iteration
a[ix[i]]= { 4,2,6,6,3,5,1,9 }
ix[] = { 0,2,1,4,5,6,7,3 }
// bubble sort 2nd iteration
a[ix[i]]= { 2,4,6,3,5,1,6,9 }
ix[] = { 2,0,1,5,6,7,4,3 }
// bubble sort 3rd iteration
a[ix[i]]= { 2,4,3,5,1,6,6,9 }
ix[] = { 2,0,5,6,7,1,4,3 }
// bubble sort 4th iteration
a[ix[i]]= { 2,3,4,1,5,6,6,9 }
ix[] = { 2,5,0,7,6,1,4,3 }
// bubble sort 5th iteration
a[ix[i]]= { 2,3,1,4,5,6,6,9 }
ix[] = { 2,5,7,0,6,1,4,3 }
// bubble sort 6th iteration
a[ix[i]]= { 2,1,3,4,5,6,6,9 }
ix[] = { 2,7,5,0,6,1,4,3 }
// bubble sort 7th iteration
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
ix[] = { 7,2,5,0,6,1,4,3 }
So the result of ascending index sort is this:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
The original array stays unchanged; only the index array is changed. The items a[ix[i]], for i=0,1,2,3,..., are sorted ascending.
So now, if x=4 on this interval, you need to find (by binary search) the smallest i that still has a[ix[i]]>=x:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
// *
i = 3; m=8; count = m-i = 8-3 = 5;
So the answer is 5 items are >=4
[Edit2] Just to be sure you know what binary search means for this
i=0; // init value marked by `*`
j=4; // max power of 2 < m , i+j is marked by `^`
// ix: 0 1 2 3 4 5 6 7 i j i+j a[ix[i+j]]
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 4 4 5>=4 j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 2 2 3< 4 -> i+=j; j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 1 3 4>=4 j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 0 -> stop
*
a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3 in O(log(m))
So you need an index i and a binary bit mask j (powers of 2). At first, set i to zero and j to the biggest power of 2 still smaller than n (or in this case m). For example something like this:
i=0; for (j=1;j<=m;j<<=1); j>>=1;
Now, in each iteration, test whether a[ix[i+j]] satisfies the search condition or not. If yes, then update i+=j; else leave i as is. After that move to the next bit, so j>>=1, and if j==0 stop, else iterate again. At the end the found value is a[ix[i]] and its index is i, after log2(m) iterations, which is also the number of bits needed to represent m-1.
In the example above I use the condition a[ix[i]]<4, so the found value was the biggest number still <4 in the array. As we needed to also include 4, I just increment the index once at the end (I could have used <=4 instead but was too lazy to rewrite the whole thing again).
The count of such items is then just the number of elements in the array (or interval) minus this i.
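Putting the whole approach together, a C++ sketch (naming is mine; std::sort stands in for the bubble sort above, and the bit-mask search starts from i=-1 so no final increment is needed):

#include <algorithm>
#include <numeric>
#include <vector>

// Count elements >= x in a[i0..i1] (inclusive) via an index sort.
int countAtLeast(const std::vector<int>& a, int i0, int i1, int x) {
    int m = i1 - i0 + 1;
    std::vector<int> ix(m);
    std::iota(ix.begin(), ix.end(), i0);            // indexes i0..i1
    std::sort(ix.begin(), ix.end(),                 // ascending index sort: O(m log m)
              [&](int p, int q) { return a[p] < a[q]; });
    // Bit-mask binary search: i ends as the last index with a[ix[i]] < x (or -1).
    int i = -1, j;
    for (j = 1; j <= m; j <<= 1) {}
    for (j >>= 1; j; j >>= 1)
        if (i + j < m && a[ix[i + j]] < x) i += j;
    return m - (i + 1);                             // everything after i is >= x
}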
The previous answer describes an offline solution using a Fenwick tree, but this problem can also be solved online (and even with updates to the array) at slightly worse complexity. I'll describe such a solution using a segment tree and an AVL tree (any self-balancing BST would do the trick).
First let's see how to solve this problem using a segment tree. We do this by keeping, in every node, the actual elements of the range that it covers. So for the array A = [9, 4, 5, 6, 1, 3, 2, 8] we'll have:
[9 4 5 6 1 3 2 8] Node 1
[9 4 5 6] [1 3 2 8] Node 2-3
[9 4] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
Since the height of our segment tree is log(n) and every level keeps n elements, the total amount of memory used is O(n log n).
The next step is to sort these per-node arrays, which looks like this:
[1 2 3 4 5 6 8 9] Node 1
[4 5 6 9] [1 2 3 8] Node 2-3
[4 9] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
NOTE: You first need to build the tree and then sort the arrays, so that the order of elements from the original array is preserved during construction.
Now we can run our range queries, and that works basically the same way as in a regular segment tree, except that when we find a completely overlapping interval, we additionally check the number of elements greater than x. That can be done with binary search in O(log n) time, by finding the index of the first element greater than x and subtracting it from the number of elements in that interval.
Let's say our query was (0, 5, 4), so we do a segment search on the interval [0, 5] and end up with the arrays [4, 5, 6, 9] and [1, 3]. We then binary search these arrays for the number of elements greater than 4 and get 3 (from the first array) and 0 (from the second), which brings the total to 3 - our query answer.
An interval search in a segment tree can follow up to O(log n) paths, which means O(log n) arrays, and since we binary search each of them, the complexity comes to O(log^2 n) per query.
Now, if we want to update the array: since we are using segment trees, it's impossible to add/remove elements efficiently, but we can replace them. Using AVL trees (or other balanced BSTs that support insertion, deletion and rank lookup in O(log n) time) as the per-node containers instead of sorted arrays, we can support this operation: a replacement touches O(log n) nodes and costs O(log n) inside each, so queries keep the same O(log^2 n) complexity.
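A compact C++ sketch of the static version of this structure, commonly called a merge sort tree (naming here is mine; the AVL-augmented variant adds replacement on top of the same layout):

#include <algorithm>
#include <iterator>
#include <vector>

struct MergeSortTree {
    int n;
    std::vector<std::vector<int>> node; // node[v] = sorted elements of v's range
    MergeSortTree(const std::vector<int>& a) : n(a.size()), node(4 * a.size()) {
        build(1, 0, n - 1, a);
    }
    void build(int v, int lo, int hi, const std::vector<int>& a) {
        if (lo == hi) { node[v] = {a[lo]}; return; }
        int mid = (lo + hi) / 2;
        build(2 * v, lo, mid, a);
        build(2 * v + 1, mid + 1, hi, a);
        std::merge(node[2 * v].begin(), node[2 * v].end(),   // children are sorted,
                   node[2 * v + 1].begin(), node[2 * v + 1].end(),
                   std::back_inserter(node[v]));             // so the parent is too
    }
    // Count elements > x in a[l..r], O(log^2 n).
    int query(int l, int r, int x) const { return query(1, 0, n - 1, l, r, x); }
    int query(int v, int lo, int hi, int l, int r, int x) const {
        if (r < lo || hi < l) return 0;
        if (l <= lo && hi <= r)   // full overlap: binary search the sorted node
            return node[v].end() - std::upper_bound(node[v].begin(), node[v].end(), x);
        int mid = (lo + hi) / 2;
        return query(2 * v, lo, mid, l, r, x) + query(2 * v + 1, mid + 1, hi, l, r, x);
    }
};

For the example above, MergeSortTree({9, 4, 5, 6, 1, 3, 2, 8}).query(0, 5, 4) returns 3.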
This is a special variant of orthogonal range counting queries in 2D.
Each element el[i] is transformed into the point (i, el[i]) on the plane, and the query (i,j,x) becomes: count all points in the rectangle [i,j] x [x, +infty].
You can use 2D range trees (for example: http://www.cs.uu.nl/docs/vakken/ga/slides5b.pdf) for this type of query.
The simple idea is to have a tree that stores the points in its leaves (each leaf contains a single point), ordered by the X-axis. Each internal node of the tree contains an additional tree that stores all points of its subtree, ordered by the Y-axis.
The space used is O(n log n).
The simple version does the counting in O(log^2 n) time, but using fractional cascading this can be reduced to O(log n).
There is an even better solution by Chazelle from 1988 (https://www.cs.princeton.edu/~chazelle/pubs/FunctionalDataStructures.pdf) with O(n) preprocessing and O(log n) query time.
You can find solutions with better query time, but they are far more complicated.
I'll try to give you a simple approach.
You must have studied merge sort.
In merge sort we keep dividing the array into subarrays and then build it back up, but we don't keep the sorted subarrays. In this approach we do keep them, as the nodes of a binary tree.
This takes O(n log n) space and O(n log n) time to build.
Now, for each query, you just have to find the subarrays covering the range; this is done in O(log n) on average and O(log^2 n) in the worst case.
These trees are commonly known as merge sort trees.
If you want simple code, I can provide you with that.
I have an array, let's say a = { 1,4,5,6,2,23,4,2};
Now I have to find the median of the array positions from 2 to 6 (an odd number of terms). What I have done is: take a[1] to a[5] into arr[0] to arr[4], sort it, and write out arr[2] as the median.
But here, every time, I put the values from one array into another so that the values of my initial array stay the same. Secondly, I sort, so this procedure is taking pretty much time.
So I want to know if there is any way I can do this differently to reduce my computation time.
Any websites, material to understand, what, and how to do?
Use std::nth_element from <algorithm>, which is O(N):
std::nth_element(a, a + size / 2, a + size);
median = a[size / 2];
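Note that std::nth_element partially reorders the range it is given, so if the original array must stay untouched (as in the question), copy the subrange first. A minimal sketch along those lines (the function name is mine; it assumes an odd-length range):

#include <algorithm>
#include <vector>

// Median of a[i..j] (inclusive, odd length) without modifying `a` itself.
int rangeMedian(const std::vector<int>& a, int i, int j) {
    std::vector<int> tmp(a.begin() + i, a.begin() + j + 1); // copy the subrange
    std::nth_element(tmp.begin(), tmp.begin() + tmp.size() / 2, tmp.end());
    return tmp[tmp.size() / 2];
}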
It is possible to find the median without sorting in O(n) time; algorithms that do this are called selection algorithms.
If you are doing multiple queries on the same array then you could use a Segment Tree. They are generally used to do range minimum/maximum and range sum queries, but you can adapt them to do a range median.
A segment tree for a set with n intervals uses O(n log n) storage and can be built in O(n log n) time. A range query can be done in O(log n).
Example of a median query on a segment tree:
You build the segment tree from the bottom up (and update from the top down):
[5]
[3] [7]
[1,2] [4] [6] [8]
1 2 3 4 5 6 7 8
Indices covered by node:
[4]
[2] [6]
[0,1] [3] [5] [7]
0 1 2 3 4 5 6 7
A query for the median of the range indices 4-6 would go down this path of values:
[4]
[5]
0 1 2 3 4 5 6 7
Doing a search for the median, you know the total number of elements in the query (3), and the median in that range would be the 2nd element (index 5). So you are essentially doing a search for the first node which contains that index; here that is the node with values [1,2] (indices 0,1).
Doing a search for the median of the range 3-6 is a bit more complicated, because you have to search for two indices (4, 5), which happen to lie in the same node.
[4]
[6]
[5]
0 1 2 3 4 5 6 7
Segment tree
Range minimum query on Segment Tree
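The index-descent search this answer sketches can be realized with a segment tree of counts over the value domain; here is a minimal C++ version (my own realization and naming, not the exact node layout drawn above; it assumes values are integers in 0..maxV):

#include <vector>

// Counting segment tree over values 0..maxV; kth(k) returns the k-th
// smallest currently inserted value (0-based) by descending on counts.
struct KthTree {
    int m;                              // number of leaves, a power of two
    std::vector<int> cnt;
    KthTree(int maxV) {
        for (m = 1; m <= maxV; m *= 2) {}
        cnt.assign(2 * m, 0);
    }
    void insert(int v) { for (int i = v + m; i >= 1; i /= 2) cnt[i]++; }
    void erase(int v)  { for (int i = v + m; i >= 1; i /= 2) cnt[i]--; }
    int kth(int k) const {              // O(log maxV)
        int i = 1;
        while (i < m)
            if (k < cnt[2 * i]) i = 2 * i;           // k-th lies in the left child
            else { k -= cnt[2 * i]; i = 2 * i + 1; } // skip left, descend right
        return i - m;
    }
};

For the median of a range, insert the range's elements and ask for the middle position, e.g. kth(len / 2); when consecutive queries share most of their range (a sliding window), insert/erase keep the updates incremental.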
To find the median of an array of fewer than 9 elements, I think the most efficient approach is to use a sorting algorithm like insertion sort. The asymptotic complexity is bad, but for such a small array the constant factor in the complexity of better algorithms like quicksort dominates, so insertion sort is very efficient. Do your own benchmark, but I can tell you that you will get better results with insertion sort than with shell sort or quicksort.
I think the best way is to use the median-of-medians algorithm for finding the k-th largest element of an array. You can find the overall idea of the algorithm here: Median of Medians in Java, on Wikipedia: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm, or just browse the internet. Some general improvements can be made during implementation (avoiding full sorts when choosing the medians of the small subarrays). However, note that for an array of fewer than 50 elements it's more efficient to use insertion sort than the median-of-medians algorithm.
All existing answers have some downsides in certain situations:
Sorting the entire subrange is not very efficient, because one does not need to sort the whole of it to get the median, and one needs an additional array if multiple subrange medians are to be found.
Using std::nth_element is more efficient, but it still mutates the subrange, so one still needs an additional array.
Using a segment tree gets you an efficient solution, but you need to either implement the structure yourself or use a third-party library.
For this reason, I am posting my approach, which uses std::map and is inspired by the selection sort algorithm:
First, collect the frequencies of the elements of the first subrange into a std::map<int, int>.
With this object, we can efficiently find the median of a subrange of length subrangeLength:
#include <map>
#include <optional>

double median(const std::map<int, int> &histogram, int subrangeLength)
{
    const int middle{subrangeLength / 2};
    int count{0};
    /* We use the fact that keys in std::map are sorted, so by simply iterating
       and adding up the frequencies, we can find the median. */
    if (subrangeLength % 2 == 1) {
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In the case where subrangeLength is odd, "middle" is the lower integer
               bound of subrangeLength / 2, so as soon as we cross it, we have found
               the median. */
            if (count > middle) {
                return freq.first;
            }
        }
    } else {
        std::optional<double> medLeft;
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In the case where subrangeLength is even, we need to pay attention to
               the case when the elements at positions middle and middle + 1 differ. */
            if (count == middle) {
                medLeft = freq.first;
            } else if (count > middle) {
                if (!medLeft) {
                    medLeft = freq.first;
                }
                return (*medLeft + freq.first) / 2.0;
            }
        }
    }
    return -1;
}
Now, when we want to get the median of the next subrange, we simply update the histogram by decreasing the frequency of the element that leaves (erasing its key when it reaches zero) and adding/increasing it for the new element (with std::map, each of these updates takes logarithmic time). Then we compute the median again and continue like this until we have handled all subranges.
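A sketch of the sliding-window driver for the function above (the helper name and window parameter are mine):

#include <map>
#include <vector>

double median(const std::map<int, int> &histogram, int subrangeLength); // defined above

// Medians of every length-w subrange of `a`, maintained incrementally.
std::vector<double> slidingMedians(const std::vector<int> &a, int w) {
    std::map<int, int> histogram;
    std::vector<double> result;
    for (int i = 0; i < (int)a.size(); ++i) {
        ++histogram[a[i]];                        // element entering the window
        if (i + 1 >= w) {
            result.push_back(median(histogram, w));
            int old = a[i + 1 - w];               // element leaving the window
            if (--histogram[old] == 0) histogram.erase(old);
        }
    }
    return result;
}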
Given an array of integers, with each integer being at most n positions away from its final position, what would be the best sorting algorithm?
I've been thinking for a while about this and I can't seem to get a good strategy to start dealing with this problem. Can someone please guide me?
I'd split the list (of size N) into 2n sublists (using zero-based indexing):
list 0: elements 0, 2n, 4n, ...
list 1: elements 1, 2n+1, 4n+1, ...
...
list 2n-1: elements 2n-1, 4n-1, ...
Each of these lists is necessarily sorted: consecutive elements of a sublist are 2n positions apart in the original array, while each element is at most n positions away from its final position, so their final positions cannot cross.
Now merge these lists (repeatedly merging two lists at a time, or using a min-heap holding one element from each of these lists).
That's all. Time complexity is O(N log(n)).
This is easy in Python:
>>> import heapq
>>> a = [1, 0, 5, 4, 3, 2, 6, 8, 9, 7, 12, 13, 10, 11]
>>> n = max(abs(i - x) for i, x in enumerate(a))  # works here since sorted(a) == range(len(a))
>>> n
3
>>> print(*heapq.merge(*(a[i::2 * n] for i in range(2 * n))))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Heap sort is very fast for an initially random array/collection of elements. In pseudocode, this sort would be implemented as follows:
# heapify
for i = n/2:1, sink(a,i,n)
→ invariant: a[1,n] in heap order
# sortdown
for i = 1:n,
swap a[1,n-i+1]
sink(a,1,n-i)
→ invariant: a[n-i+1,n] in final position
end
# sink from i in a[1..n]
function sink(a,i,n):
# {lc,rc,mc} = {left,right,max} child index
lc = 2*i
if lc > n, return # no children
rc = lc + 1
mc = (rc > n) ? lc : (a[lc] > a[rc]) ? lc : rc
if a[i] >= a[mc], return # heap ordered
swap a[i,mc]
sink(a,mc,n)
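For reference, a direct C++ rendition of this pseudocode (0-based indexing instead of the 1-based pseudocode, and an iterative sink):

#include <utility>
#include <vector>

// Sink a[i] down within the heap a[0..n-1] until heap order is restored.
void sink(std::vector<int> &a, size_t i, size_t n) {
    while (2 * i + 1 < n) {                                // while i has children
        size_t lc = 2 * i + 1, rc = lc + 1;
        size_t mc = (rc < n && a[rc] > a[lc]) ? rc : lc;   // max child
        if (a[i] >= a[mc]) return;                         // heap ordered
        std::swap(a[i], a[mc]);
        i = mc;
    }
}

void heapSort(std::vector<int> &a) {
    size_t n = a.size();
    for (size_t i = n / 2; i-- > 0;) sink(a, i, n);        // heapify
    for (size_t i = n; i-- > 1;) {                         // sortdown
        std::swap(a[0], a[i]);                             // max goes to its final spot
        sink(a, 0, i);
    }
}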
For different cases like "Nearly Sorted" or "Few Unique", the algorithms can behave differently and be more efficient. For a complete list of the algorithms, with animations for the various cases, see this brilliant site.
I hope this helps.
PS: For nearly sorted sets (as commented above), insertion sort is your winner.
I'd recommend using a comb sort; just start it with a gap size equal to the maximum distance away (or thereabouts). It's expected O(n log n) (or in your case O(n log d), where d is the maximum displacement), easy to understand, easy to implement, and it will work even when the elements are displaced more than you expect. If you need a guaranteed execution time you can use something like heap sort, but in the past I've found the overhead in space or computation time usually isn't worth it, and I end up implementing nearly anything else.
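A minimal C++ comb sort sketch seeded with the known maximum displacement d as the initial gap (the conventional shrink factor is about 1.3):

#include <utility>
#include <vector>

// Comb sort; `d` is the known maximum displacement, used as the initial gap.
void combSort(std::vector<int> &a, int d) {
    int gap = d > 0 ? d : 1;
    bool swapped = true;
    while (gap > 1 || swapped) {
        if (gap > 1) gap = (int)(gap / 1.3);   // shrink the gap each pass
        swapped = false;
        for (size_t i = 0; i + gap < a.size(); ++i)
            if (a[i] > a[i + gap]) { std::swap(a[i], a[i + gap]); swapped = true; }
    }
}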
Since each integer is at most n positions away from its final position:
1) for the smallest integer (a.k.a. the 0th integer in the final sorted array), its current position must be in A[0...n], because position n is n positions away from position 0;
2) for the second smallest integer (a.k.a. the 1st integer in the final sorted array, zero-based), its current position must be in A[0...n+1];
3) for the ith smallest integer, its current position must be in A[i-n...i+n].
So we can use an (n+1)-sized min-heap over a rolling window to get the array sorted, as sketched below. You can find more details here:
http://www.geeksforgeeks.org/nearly-sorted-algorithm/
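A minimal C++ sketch of that rolling-window heap (the function name is mine; the invariant is the one stated in points 1-3 above):

#include <functional>
#include <queue>
#include <vector>

// Sorts in place when every element is at most n positions from its final
// position, using a min-heap over a rolling window of n+1 elements: O(N log n).
void sortNearlySorted(std::vector<int> &a, int n) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> window;
    size_t write = 0;
    for (size_t read = 0; read < a.size(); ++read) {
        window.push(a[read]);
        if ((int)window.size() > n) {   // window holds n+1 elements:
            a[write++] = window.top();  // its minimum is in its final position
            window.pop();
        }
    }
    while (!window.empty()) { a[write++] = window.top(); window.pop(); }
}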