Find the number of elements greater than x in a given range - algorithm

Given an array with n elements, how to find the number of elements greater than or equal to a given value (x) in the given range index i to index j in O(log n) complexity?
The queries are of the form (i, j, x) which means find number of elements greater than x from the ith till jth element in the array
The array is not sorted. i, j & x are different for different queries. Elements of the array are static.
Edit: i, j, x all can be different for different queries!

If we know all queries before hand, we can solve this problem by making use of Fenwick tree.
First, we need to sort all elements in array and queries together, based on their values.
So, assuming that we have array [5, 4, 2, 1, 3] and queries (0, 1, 6) and (2, 5, 2), we will have following result after sorting : [1, 2, 2, 3 , 4 , 5, 6]
Now, we will need to process each element in descending order:
If we encounter an element which is from the array, we will update its index in the Fenwick tree, which take O(log n)
If we encounter a queries, we need to check, in this range of the query, how many elements have been added in the tree, which take O(log n).
For above example, the process will be:
1st element is a query for value 6, as Fenwick tree is empty -> result is 0
2nd is element 5 -> add index 0 into Fenwick tree
3rd element is 4 -> add index 1 into tree.
4th element is 3 -> add index 4 into tree.
5th element is 2 -> add index 2 into tree.
6th element is query for range (2, 5), we query the tree and get answer 2.
7th element is 1 -> add index 3 into tree.
Finish.
So, in total, the time complexity for our solution is O((m + n) log(m + n)) with m and n is the number of queries and number of element from input array respectively.

That is possible only if you got the array sorted. In that case binary search the smallest value passing your condition and compute the count simply by sub-dividing your index range by its found position to two intervals. Then just compute the length of the interval passing your condition.
If array is not sorted and you need to preserve its order you can use index sort . When put together:
definitions
Let <i0,i1> be your used index range and x be your value.
index sort array part <i0,i1>
so create array of size m=i1-i0+1 and index sort it. This task is O(m.log(m)) where m<=n.
binary search x position in index array
This task is O(log(m)) and you want the index j = <0,m) for which array[index[j]]<=x is the smallest value <=x
compute count
Simply count how many indexes are after j up to m
count = m-j;
As you can see if array is sorted you got O(log(m)) complexity but if it is not then you need to sort O(m.log(m)) which is worse than naive approach O(m) which should be used only if the array is changing often and cant be sorted directly.
[Edit1] What I mean by Index sort
By index sort I mean this: Let have array a
a[] = { 4,6,2,9,6,3,5,1 }
The index sort means that you create new array ix of indexes in sorted order so for example ascending index sort means:
a[ix[i]]<=a[ix[i+1]]
In our example index bubble sort is is like this:
// init indexes
a[ix[i]]= { 4,6,2,9,6,3,5,1 }
ix[] = { 0,1,2,3,4,5,6,7 }
// bubble sort 1st iteration
a[ix[i]]= { 4,2,6,6,3,5,1,9 }
ix[] = { 0,2,1,4,5,6,7,3 }
// bubble sort 2nd iteration
a[ix[i]]= { 2,4,6,3,5,1,6,9 }
ix[] = { 2,0,1,5,6,7,4,3 }
// bubble sort 3th iteration
a[ix[i]]= { 2,4,3,5,1,6,6,9 }
ix[] = { 2,0,5,6,7,1,4,3 }
// bubble sort 4th iteration
a[ix[i]]= { 2,3,4,1,5,6,6,9 }
ix[] = { 2,5,0,7,6,1,4,3 }
// bubble sort 5th iteration
a[ix[i]]= { 2,3,1,4,5,6,6,9 }
ix[] = { 2,5,7,0,6,1,4,3 }
// bubble sort 6th iteration
a[ix[i]]= { 2,1,3,4,5,6,6,9 }
ix[] = { 2,7,5,0,6,1,4,3 }
// bubble sort 7th iteration
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
ix[] = { 7,2,5,0,6,1,4,3 }
So the result of ascending index sort is this:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
Original array stays unchanged only the index array is changed. Items a[ix[i]] where i=0,1,2,3... are sorted ascending.
So now if x=4 on this interval you need to find (bin search) which i has the smallest but still a[ix[i]]>=x so:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
// *
i = 3; m=8; count = m-i = 8-3 = 5;
So the answer is 5 items are >=4
[Edit2] Just to be sure you know what binary search means for this
i=0; // init value marked by `*`
j=4; // max power of 2 < m , i+j is marked by `^`
// ix: 0 1 2 3 4 5 6 7 i j i+j a[ix[i+j]]
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 4 4 5>=4 j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 2 2 3< 4 -> i+=j; j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 1 3 4>=4 j>>=1;
* ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 0 -> stop
*
a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3 in O(log(m))
so you need index i and binary bit mask j (powers of 2). At first set i with zero and j with the biggest power of 2 still smaller then n (or in this case m). Fro example something like this:
i=0; for (j=1;j<=m;j<<=1;); j>>=1;
Now in each iteration test if a[ix[i+j]] suffice search condition or not. If yes then update i+=j else leave it as is. After that go to next bit so j>>=1 and if j==0 stop else do iteration again. at the end you found value is a[ix[i]] and index is i in log2(m) iterations which is also the number of bits needed to represent m-1.
In the example above I use condition a[ix[i]]<4 so the found value was biggest number still <4 in the array. as we needed to also include 4 then I just increment the index once at the end (I could use <=4instead but was too lazy to rewrite the whole thing again).
The count of such items is then just number of element in array (or interval) minus the i.

Previous answer describes an offline solution using Fenwick tree, but this problem could be solved online (and even when doing updates to the array) with slightly worse complexity. I'll describe such a solution using segment tree and AVL tree (any self-balancing BST could do the trick).
First lets see how to solve this problem using segment tree. We'll do this by keeping the actual elements of the array in every node by range that it covers. So for array A = [9, 4, 5, 6, 1, 3, 2, 8] we'll have:
[9 4 5 6 1 3 2 8] Node 1
[9 4 5 6] [1 3 2 8] Node 2-3
[9 4] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
Since height of our segment tree is log(n) and at every level we keep n elements, total amount of memory used is n log(n).
Next step is to sort these arrays which looks like this:
[1 2 3 4 5 6 8 9] Node 1
[4 5 6 9] [1 2 3 8] Node 2-3
[4 9] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
NOTE: You first need to build the tree and then sort it to keep the order of elements in original array.
Now we can start our range queries and that works basically the same way as in regular segment tree, except when we find a completely overlapping interval, we then additionally check for number of elements greater than X. This can be done with binary search in log(n) time by finding the index of first element greater than X and subtracting it from number of elements in that interval.
Let's say our query was (0, 5, 4), so we do a segment search on interval [0, 5] and end up with arrays: [4, 5, 6, 9], [1, 3]. We then do a binary search on these arrays to see number of elements greater than 4 and get 3 (from first array) and 0 (from second) which brings to total of 3 - our query answer.
Interval search in segment trees can have up to log(n) paths, which means log(n) arrays and since we're doing binary search on each of them, brings complexity to log^2(n) per query.
Now if we wanted to update the array, since we are using segment trees its impossible to add/remove elements efficiently, but we can replace them. Using AVL trees (or other binary trees that allow replacement and lookup in log(n) time) as nodes and storing the arrays, we can manage this operation in same time complexity (replacement with log(n) time).

This is special variant of orthogonal range counting queries in 2D.
Each element el[i] is transformed into point on the plane (i, el[i])
and the query (i,j,x) can be transformed to count all points in the rectangle [i,j] x [x, +infty].
You can use 2D Range Trees (for example: http://www.cs.uu.nl/docs/vakken/ga/slides5b.pdf) for such type of the queries.
The simple idea is to have a tree that stores points in the leaves
(each leaf contains single point) ordered by X-axis.
Each internal node of the tree contains additional tree that stores all points from the subtree (ordered by Y-axis).
The used space is O(n logn)
Simple version could do the counting in O(log^2 n) time, but using
fractional cascading
this could be reduced to O(log n).
There better solution by Chazelle in 1988 (https://www.cs.princeton.edu/~chazelle/pubs/FunctionalDataStructures.pdf)
to O(n) preprocessing and O(log n) query time.
You can find some solutions with better query time, but they are way more complicated.

I would try to give you a simple approach.
You must have studied merge sort.
In merge sort we keep on dividing array into sub array and then build it up back but we dont store the sorted subarrays in this approach we store them as nodes of binary tree.
this takes up nlogn space and nlogn time to build up;
now for each query you just have to find the subarray this will be done in logn on average and logn^2 in worst case.
These tree are also known as fenwick trees.
If you want a simple code I can provide you with that.

Related

What is the purpose of having heap sort?

When we build a heap, the elements in the array get arranged in a particular order (ascending or descending) depending on whether it is max heap or min heap. Then what is the use of heap sort when building a heap itself arranges the elements in sorted order with less time complexity?
void build_heap (int Arr[ ])
{
for(int i = N/2-1 ; i >= 0; i-- )
{
down_heapify (Arr, i, N);
}
}
void heap_sort(int Arr[], int N)
{
build_heap(Arr);
for(int i = N-1; i >= 1; i--)
{
swap(Arr[i], Arr[0]);
down_heapify(Arr, 0, i+1);
}
}
Heap sort summed up
Heap sort is an algorithm which can be summed up in two steps:
Convert the input array into a heap;
Convert the heap into a sorted array.
The heap itself is not a sorted array.
Let's look at an example:
[9, 7, 3, 5, 4, 2, 0, 6, 8, 1] # unsorted array
convert into heap
[9, 8, 3, 7, 4, 2, 0, 6, 5, 1] # array representing a max-heap
sort
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # sorted array
If you look closely, you'll notice the second array in my example, which represents a heap, isn't quite sorted. The order of the element looks less random than in the original unsorted array; they look almost sorted in decreasing order; but they aren't completely sorted. 3 comes before 7, 0 comes before 6 in the array.
So what is a heap?
What is a heap?
Note that in the previous section, I make a distinction between "a heap" and "an array representing a heap". Let's talk about what is a heap first, and then what is an array representing a heap.
A max-heap is a binary tree with values on the nodes, which satisfies the two following properties:
the value on a child node is always lower than the value on its parent node;
the tree is almost-complete, in the sense that all branches of the tree have almost the same length, with a difference of at most 1 between the longest and the shorted branches; in addition, the longest branches must be on the left and the shortest branches must be on the right.
In the example I gave, the heap constructed is this one:
9
/ \
8 3
/ \ / \
7 4 2 0
/ \ / \ / \ / \
6 5 1
You can check that this binary tree satisfies the two properties of a heap: each child has a lower value than its parent, and all branches have almost the same length, with always 4 or 3 values per branch, and the longest branches being on the left and the shortest branches being on the right.
What is an array representing a heap?
Storing binary trees into arrays is usually pretty inconvenient, and binary trees are most generally implemented using pointers, kinda like a linked list. However, the heap is a very special binary tree, and its "almost-complete" property is super useful to implement it as an array.
All we have to do is read the values row per row, left to right. In the heap above, we have four rows:
9
8 3
7 4 2 0
6 5 1
We simply store these values in that order in an array:
[9, 8, 3, 7, 4, 2, 0, 6, 5, 1]
Notice that this is exactly the array after the first step of heap sort at the beginning of my post.
In this array representation, we can use positions to determine which node is a child of which node: the node at position i has two children, which are at positions 2*i+1 and 2*i+2.
This array is not a sorted array. But it represents a heap and we can easily use it to produce a sorted array, in n log(n) operations, by extracting the maximum element repeatedly.
If heap-sort was implemented with an external binary tree, then we could use either a max-heap or a min-heap, and sort the array by repeatedly selecting the maximum element or the minimum element. However, if you try to implement heap-sort in-place, storing the heap as an array inside the array which is being sorted, you'll notice that it's much more convenient to use a max-heap than a min-heap, in order to sort the elements in increasing order by repeatedly selecting the max element and moving it to the end of the array.
"Then what is the use heap sort when building a heap itself arranges the elements in sorted order"
It seems that you confuse the purposes of Heap Sort algorithm and heap data structure. Let us clarify this way:
heap is a data structure that allows us to find minimum or maximum of the list repeatedly in a changing collection. We can use sink() based approach for creating a heap from scratch in O(n). Then each operation takes O(logn) complexity. However, heap doesn't provide you with a sorted array, it just gives maximum or minimum depending on your implementation.
On the other hand, Heap Sort algorithm provides you with a sorted array/collection using heap data structure. First it builds a heap in O(n) time complexity. Then it starts from the bottom and returns max/min one-by-one to the actual array. In each iteration, you have to heapify your heap to get next max/min properly, which in total gives O(n*logn) time complexity.
void heap_sort(int Arr[], int N)
{
build_heap(Arr); // O(n) time complexity
for(int i = N-1; i >= 1; i--) // n iteration
{
swap(Arr[i], Arr[0]);
down_heapify(Arr, 0, i+1); //O(logn) time complexity
}
// in total O(n) + O(n*logn) = O(n*logn)
}
In conclusion, building a heap itself doesn't provide you with a sorted array.

divide and conquer - finding the median for an array

Say we have an array of size 2n of all unique elements.
Assume we split the array into 2 arrays of size n, and we have a special constant time lookup to find the kth smallest element for that particular array if 1 <= k <= n, so for [4 5 6], k=2 returns 5.
Then what is the Θ(log(n)) algorithm for finding the median? Median is defined as the nth lowest element between the 2 arrays. If array was [1 2 3 4 5 6], median would typically be (3+4)/2, but we just choose the smaller one which is 3.
My attempt ie:
2n = 6 [1 2 3 4 5 6]
n = 3 [1 2 3] [4 5 6] (not necessarily sorted, but we have the constant time lookup, so sorting is irrelevant)
Step 1) use lookup where k = n to find the kth smallest element for each array
[1 2 3] [4 5 6]
^ ^ (if k = 3, we get 3 for the first array, 6 for the second array)
Step 2) compare the 2 values we got and choose the smaller one. 3 is the median where median is defined as the nth lowest element between the 2 arrays.
First off, is this the correct algorithm for Θ(log(n)) time? Secondly, what would the proof correctness (that it finds the median) look like? I believe it would be through induction.
Selection (of which median computation is a special case) cannot be solved in O(log n) time. You can solve it in O(n) time using an algorithm such as Quickselect.

find kth smallest number in O(logn) time

Here is the problem, an unsorted array a[n], and I need to find the kth smallest number in range [i, j], and absolutely 1<=i<=j<=n, k<=j-i+1.
Typically I will use quick-find to do the job, but it is not fast enough if there many query requests with different range [i, j], I hardly to figure out a algorithm to do the query in O(logn) time (preprocessing is allowed).
Any idea is appreciated.
PS
Let me make the problem easier to understand. Any kinds of preprocessing is allowed, but the query needs to be done in O(logn) time. And there will be many (more than 1) queries, like find the 1st in range [3,7], or 3rd in range [10,17], or 11th in range [33, 52].
By range [i, j] I mean in the original array, not sorted or something.
For example, a[5] = {3,1,7,5,9}, query 1st in range [3,4] is 5, 2nd in range [1,3] is 5, 3rd in range [0,2] is 7.
If pre-processing is allowed and not counted towards the time complexity, just use that to construct sub-lists so that you can efficiently find the element you're looking for. As with most optimisations, this trades space for time.
Your pre-processing step is to take your original list of n numbers and create a number of new sublists.
Each of these sublists is a portion of the original, starting with the nth element, extending for m elements and then sorted. So your original list of:
{3, 1, 7, 5, 9}
gives you:
list[0][0] = {3}
list[0][1] = {1, 3}
list[0][2] = {1, 3, 7}
list[0][3] = {1, 3, 5, 7}
list[0][4] = {1, 3, 5, 7, 9}
list[1][0] = {1}
list[1][1] = {1, 7}
list[1][2] = {1, 5, 7}
list[1][3] = {1, 5, 7, 9}
list[2][0] = {7}
list[2][1] = {5, 7}
list[2][2] = {5, 7, 9}
list[3][0] = {5}
list[3][1] = {5,9}
list[4][0] = {9}
This isn't a cheap operation (in time or space) so you may want to maintain a "dirty" flag on the list so you only perform it the first time after you do an modifying operation (insert, delete, change).
In fact, you can use lazy evaluation for even more efficiency. Basically set all sublists to an empty list when you start and whenever you perform a modifying operation. Then, whenever you attempt to access a sublist and it's empty, calculate that sublist (and that one only) before trying to get the kth value out of it.
That ensures sublists are evaluated only when needed and cached to prevent unnecessary recalculation. For example, if you never ask for a value from the 3-through-6 sublist, it's never calculated.
The pseudo-code for creating all the sublists is basically (for loops inclusive at both ends):
for n = 0 to a.lastindex:
create array list[n]
for m = 0 to a.lastindex - n
create array list[n][m]
for i = 0 to m:
list[n][m][i] = a[n+i]
sort list[n][m]
The code for lazy evaluation is a little more complex (but only a little), so I won't provide pseudo-code for that.
Then, in order to find the kth smallest number in the range i through j (where i and j are the original indexes), you simply look up lists[i][j-i][k-1], a very fast O(1) operation:
+--------------------------+
| |
| v
1st in range [3,4] (values 5,9), list[3][4-3=1][1-1-0] = 5
2nd in range [1,3] (values 1,7,5), list[1][3-1=2][2-1=1] = 5
3rd in range [0,2] (values 3,1,7), list[0][2-0=2][3-1=2] = 7
| | ^ ^ ^
| | | | |
| +-------------------------+----+ |
| |
+-------------------------------------------------+
Here's some Python code which shows this in action:
orig = [3,1,7,5,9]
print orig
print "====="
list = []
for n in range (len(orig)):
list.append([])
for m in range (len(orig) - n):
list[-1].append([])
for i in range (m+1):
list[-1][-1].append(orig[n+i])
list[-1][-1] = sorted(list[-1][-1])
print "(%d,%d)=%s"%(n,m,list[-1][-1])
print "====="
# Gives xth smallest in index range y through z inclusive.
x = 1; y = 3; z = 4; print "(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1])
x = 2; y = 1; z = 3; print "(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1])
x = 3; y = 0; z = 2; print "(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1])
print "====="
As expected, the output is:
[3, 1, 7, 5, 9]
=====
(0,0)=[3]
(0,1)=[1, 3]
(0,2)=[1, 3, 7]
(0,3)=[1, 3, 5, 7]
(0,4)=[1, 3, 5, 7, 9]
(1,0)=[1]
(1,1)=[1, 7]
(1,2)=[1, 5, 7]
(1,3)=[1, 5, 7, 9]
(2,0)=[7]
(2,1)=[5, 7]
(2,2)=[5, 7, 9]
(3,0)=[5]
(3,1)=[5, 9]
(4,0)=[9]
=====
(1,3,4)=5
(2,1,3)=5
(3,0,2)=7
=====
Current solution is O( (logn)^2 ). I am pretty sure it can be modified to run on O(logn). The main advantage of this algorithm over paxdiablo's algorithm is space efficiency. This algorithm needs O(nlogn) space, not O(n^2) space.
First, the complexity of finding kth smallest element from two sorted arrays of length m and n is O(logm + logn). Complexity of finding kth smallest element from arrays of lengths a,b,c,d.. is O(loga+logb+.....).
Now, sort the whole array and store it. Sort the first half and second half of the array and store it and so on. You will have 1 sorted array of length n, 2 sorted of arrays of length n/2, 4 sorted arrays of length n/4 and so on. Total memory required = 1*n+2*n/2+4*n/4+8*n/8...= nlogn.
Once you have i and j figure out the list of of subarrays which, when concatenated, give you range [i,j]. There are going to be logn number of arrays. Finding kth smallest number among them would take O( (logn)^2) time.
Example for the last paragraph:
Assume the array is of size 8 (indexed from 0 to 7). You have the following sorted lists:
A:0-7, B:0-3, C:4-7, D:0-1, E:2-3, F:4-5, G:6-7.
Now construct a tree with pointers to these arrays such that every node contains its immediate constituents. A will be root, B and C are its children and so on.
Now implement a recursive function that returns a list of arrays.
def getArrays(node, i, j):
if i==node.min and j==node.max:
return [node];
if i<=node.left.max:
if j<=node.left.max:
return [getArrays(node.left, i, j)]; # (i,j) is located within left node
else:
return [ getArrays(node.left, i, node.left.max), getArrays(node.right, node.right.min, j) ]; # (i,j) is spread over left and right node
else:
return [getArrays(node.right, i, j)]; # (i,j) is located within right node
Preprocess: Make an nxn array where the [k][r] element is the kth smallest element of the first r elements (1-indexed for convenience).
Then, given some particular range [i,j] and value for k, do the following:
Find the element at the [k][j] slot of the matrix; call this x.
go down the i-1 column of your matrix and find how many values in it are smaller than or equal to x (treat column 0 as having 0 smaller entries). By construction, this column will be sorted (all columns will be sorted), so it can be found in log time. Call this value s
Find the element in the [k+s][j] slot of the matrix. This is your answer.
E.g., given 3 1 7 5 9
3 1 1 1 1
X 3 3 3 3
X X 7 5 5
X X X 7 7
X X X X 9
Now, if we're asked for the 2nd smallest in [2,4] range (again, 1-indexing), I first find the 2nd smallest in [1,4] range which is 3. I then look at column 1 and see that there is 1 element less than or equal to 3. Finally, I find the 3rd smallest in [1,4] range at [3][5] slot which is 5, as desired.
This takes n^2 space, and log(n) lookup time.
This one does not require pre-process but is somehow slower than O(logN). It's significantly faster than a naive iterate&count, and could support dynamic modification on the sequence.
It goes like this. Suppose the length n has n=2^x for some x. Construct a segment-tree whose root node represent [0,n-1]. For each of the node, if it represent a node [a,b], b>a, let it has two child nodes each representing [a,(a+b)/2], [(a+b)/2+1,b]. (That is, do a recursive divide-by-two).
Then, on each node, maintain a separate binary search tree for the numbers within that segment. Therefore, each modification on the sequence takes O(logN)[on the segement]*O(logN)[on the BST]. Queries can be done like this, Let Q(a,b,x) be rank of x within segment [a,b]. Obviously, if Q(a,b,x) can be computed efficiently, a binary search on x can compute the answer desired effectively (with an extra O(logE) factor.
Q(a,b,x) can be computed as: find smallest number of segments that make up [a,b], which can be done in O(logN) on the segment tree. For each segment, query on the binary search tree for that segment for the number of elements less than x. Add all these numbers to get Q(a,b,x).
This should be O(logN*logE*logN). Well not exactly what you have asked for though.
In O(log n) time it's not possible to read all of the elements of the array. Since it's not sorted, and there's no other provided information, this is impossible.
There's no way you can do better than O(n) in both worst and average case. You have to look at every single element.

Sorting a permutation with minimum cost

I am given a permutation of elements {1, 2, 3, ..., N} and I have to sort it using a swap operation. An operation which swaps elements x, y has cost min(x,y).
I need to find out the minimum cost of sorting the permutation. I tought about a greedy going from N to 1 and putting each element on it's position using a swap operation, but this is not a good idea.
Would this be optimal:
Find element 2
If it is not at correct place already
Find element at position 2
If swapping that with 2 puts both to right place
Swap them
Cost = Cost + min(2, other swapped element)
repeat
Find element 1
If element 1 is at position 1
Find first element that is in wrong place
If no element found
set sorted true
else
Swap found element with element 1
Cost = Cost + 1
else
Find element that should go to the position where 1 is
Swap found element with element 1
Cost = Cost + 1
until sorted is true
If seeks are trivial, then the minimum number of swaps will be determined by the number of cycles. It would follow a principle similar to Cuckoo Hashing. You take the first value in the permutation, and look at the value at the index of the value at the original index. If those match, then swap for a single operation.
[3 2 1] : Value 3 is at index one, so look at the value at index 3.
[3 2 1] : Value 1 is at index 3, so a two index cycle exists. Swap these values.
If not, push the first index onto a stack and seek the index for the value of the second index. There will eventually be a cycle. At that point, start swapping by popping values off the stack. This will take a number of swaps equal to n-1, where n is the length of the cycle.
[3 1 2] : Value 3 is at index one, so look at the value at index 3.
[3 1 2] : Value 2 is at index 3, so add 3 to the stack and seek to index 2. Also store 3 as the beginning value of the cycle.
[3 1 2] : Value 1 is at index 2, so add 2 to the stack and seek to index 1.
[3 1 2] : Value 3 is the beginning of the cycle, so swap pop 2 off the stack and swap values 1 and 2.
[1 3 2] : Pop 3 off the stack and swap 2 and 3, resulting in a sorted list with 2 swaps.
[1 2 3]
With this algorithm, the maximum number of swaps will be N-1, where N is the total number of values. This occurs when there is an N length cycle.
EDIT : This algorithm gives the minimum number of swaps, but not necessarily the minimum value using the min(x, y) function. I haven't done the math, but I believe that the only time when swap(x, y) = {swap(1, x), swap(1, y), swap(1, x)} shouldn't be used is when x in {2,3} and n < 2; Should be easy enough to write that as a special case. It may be better to check and place 2 and 3 explicitly, then follow the algorithm mentioned in the comments to achieve sorting in two operations.
EDIT 2 : Pretty sure this will catch all cases.
while ( unsorted ) {
while ( 1 != index(1) )
swap (1 , index (1) )
if (index(2) == value#(2))
swap (2, value#(2) )
else
swap (1 , highest value out of place)
}
If you have permutation of the numbers 1, 2, ..., N, then the sorted collection will be precisely 1, 2, ..., N. So you know the answer with complexity O(0) (i.e. you don't need an algorithm at all).
If you actually want to sort the range by repeated swapping, you can repeatedly "advance and cycle": Advance over the already sorted range (where a[i] == i), and then swap a[i] with a[a[i]] until you complete the cycle. Repeat until you reach the end. That needs at most N − 1 swaps, and it basically performs a cycle decomposition of the permutation.
Hmm. An interesting question. A quick algorithm that came up to my mind is to use elements as indices. We first find the index of an element that has 1 as value, and swap it with element of that number. Eventually this will end up with 1 appearing at first position, this means you have to swap 1 with some element that isn't yet in the position, and continue. This tops at 2*N-2, and has lower limit at N-1 for permutation (2,3,...,N,1), but the exact cost will vary.
Okay, given above algorithm and examples, I think the most optimal will be to follow exchanging 1 with anything until it first hits first place, then exchange 2 with second-place if it's not in place already, then continue swapping 1 with anything not yet in place, until sorted.
set sorted=false
while (!sorted) {
if (element 1 is in place) {
if (element 2 is in place) {
find any element NOT in place
if (no element found) sorted=true
else {
swap 1 with element found
cost++
}
} else {
swap 2 with element at second place
cost+=2
}
} else {
find element with number equals to position of element 1
swap 1 with element found
cost++
}
}
Use a bucket sort with bucket size of 1.
The cost is zero, since no swaps occur.
Now make a pass through the bucket array, and swap each value back to it's corresponding position in the original array.
That is N swaps.
The sum of N is N(N+1)/2 giving you an exact fixed cost.
A different interpretation is that you just store from the bucket array, back into the original array. That is no swaps, hence the cost is zero, which is a reasonable minimum.

find median with minimum time in an array

I have an array lets say a = { 1,4,5,6,2,23,4,2};
now I have to find median of array position from 2 to 6 (odd total terms), so what I have done, I have taken a[1] to a[5] in arr[0] to arr[4] then I have sorted it and write the arr[2] as the median .
But here every time I put values from one array to another, so that the values of my initial array remains the same. Secondly, I have sorted, so this procedure is taking pretty much **time**.
So I want to know if there is any way I can do this differently to reduce my computation time.
Any websites, material to understand, what, and how to do?
Use std::nth_element from <algorithm> which is O(N):
nth_element(a, a + size / 2, a + size);
median = a[size/2];
It is possible to find the median without sorting in O(n) time; algorithms that do this are called selection algorithms.
If you are doing multiple queries on the same array then you could use a Segment Tree. They are generally used to do range minimum/maximum and range sum queries but you can change it to do range median.
A segment tree for a set with n intervals uses O(n log n) storage and can be built in O(n log n) time. A range query can be done in O(log n).
Example of median in range segment tree:
You build the segment tree from the bottom up (update from the top down):
[5]
[3] [7]
[1,2] [4] [6] [8]
1 2 3 4 5 6 7 8
Indices covered by node:
[4]
[2] [6]
[0,1] [3] [5] [7]
0 1 2 3 4 5 6 7
A query for median for range indices of 4-6 would go down this path of values:
[4]
[5]
0 1 2 3 4 5 6 7
Doing a search for the median, you know the number of total elements in the query (3) and the median in that range would be the 2nd element (index 5). So you are essentially doing a search for the first node which contains that index which is node with values [1,2] (indices 0,1).
Doing a search of the median of the range 3-6 is a bit more complicated because you have to search for two indices (4,5) which happen to lie in the same node.
[4]
[6]
[5]
0 1 2 3 4 5 6 7
Segment tree
Range minimum query on Segment Tree
To find the median of an array of less than 9 elements, I think the most efficient is to use a sort algorithm like insertion sort. The complexity is bad, but for such a small array because of the k in the complexity of better algorithms like quicksort, insertion sort is very efficient. Do your own benchmark but I can tell you will have better results with insertion sort than with shell sort or quicksort.
I think the best way is to use the median of medians algorithm of counting the k-th largest element of an array. You can find the overall idea of the algorithm here: Median of Medians in Java , on wikipedia: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm or just browse the internet. Some general improvements can be made during implementation (avoid sorting when choosing the median of particular arrays). However, note that for an array of less than 50 elements its more efficient to use insertion sort than median of medians algorithm.
All existing answers have some downsides in certain situations:
Sorting the entire subrange is not very efficient because one does not need to sort the entire array to get the median, and one needs an additional array if multiple subrange medians are to be found.
Using std::nth_element is more efficient but it still mutates the subrange, so one still needs an additional array.
Using segment tree gets you an efficent solution but you need to either implement the structure yourself or use a third party library.
For this reason, I am posting my approach which uses std::map and is inspired by selection sort algorithm:
First collect the frequencies of elements in first subrange into an object of std::map<int, int>.
With this object, we can efficently find the median of the subrange whose length is subrangeLength:
double median(const std::map<int, int> &histogram, int subrangeLength)
{
const int middle{subrangeLength / 2};
int count{0};
/* We use the fact that keys in std::map are sorted, so by simply iterating
and adding up the frequencies, we can find the median. */
if (subrangeLength % 2 == 1) {
for (const auto &freq : histogram) {
count += freq.second;
/* In case where subrangeLength is odd, "middle" is the lower integer bound of
subrangeLength / 2, so as soon as we cross it, we have found the median. */
if (count > middle) {
return freq.first;
}
}
} else {
std::optional<double> medLeft;
for (const auto &freq : histogram) {
count += freq.second;
/* In case where subrangeLength is even, we need to pay attention to the case when
elements at positions middle and middle + 1 are different. */
if (count == middle) {
medLeft = freq.first;
} else if (count > middle) {
if (!medLeft) {
medLeft = freq.first;
}
return (*medLeft + freq.first) / 2.0;
}
}
}
return -1;
}
Now when we want to get the median of next subrange, we simply update the histogram by decreasing the frequency of the element that is to be removed and add/increase it for the new element (with std::map, this is done in constant time). Now we compute the median again and continue with this until we handle all subranges.

Resources