Insertion sort comparison? - algorithm

How can I count the number of comparisons made by insertion sort in less than O(n^2) time?

When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its left, or (2) we hit the beginning of the array. In case (1) there is one comparison not paired with a swap; in case (2) every comparison is paired with a swap. The upward adjustment to the comparison count can therefore be computed by counting the successive minima from left to right (or however your insertion sort works) in O(n) time: every insertion that does not set a new minimum contributes one unpaired comparison.
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
    if array[i] < min_so_far:
        min_so_far = array[i]
    else:
        num_comparisons += 1

As commented, doing it in less than O(n^2) is hard, maybe impossible if you must pay the price of sorting. If you already knew the number of comparisons done at each outer iteration, then it would be possible in O(n), but the price of sorting was paid some time before.
Here is a way to count the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
    for (long i = 1, j; i < n; ++i)
    {
        auto tmp = p[i];
        for (j = i - 1; j >= 0 and p[j] > tmp; --j) // shift elements right to open the gap where tmp will go
            p[j + 1] = p[j];
        count += i - j;   // comparisons done in this iteration, including the one that stopped the loop
        if (j < 0)
            --count;      // if we fell off the front, the final loop test compared no elements
        p[j + 1] = tmp;
    }
}
Here n is the number of elements and count is the comparison counter, which must be passed in as a variable initialized to zero.
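A minimal usage sketch, assuming the routine above compiles as ordinary C++ (the array contents are arbitrary):

#include <cstdio>

int main()
{
    int a[] = {5, 2, 4, 6, 1, 3};
    size_t comparisons = 0;
    insertion_sort(a, 6, comparisons);
    std::printf("comparisons: %zu\n", comparisons);  // prints the total comparison count
    return 0;
}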

If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []  // sorted output array
while (A is not empty) {
    remove the first element from A and add it to B, preserving B's sorting
}
If the insertion into B is implemented by linear search from the left until you find a greater element, then the number of comparisons is essentially the number of pairs (i, j) such that i < j and A[i] <= A[j] (I'm considering the stable variant).
In other words, for each element x, count the number of elements before x that have a value less than or equal to it. That can be done by scanning A from the left and adding each element to a balanced binary search tree that also remembers the number of elements in each subtree. In such a tree you can find the number of elements less than or equal to a given value in O(log n). Total time: O(n log n).
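A minimal sketch of that counting idea. It uses a Fenwick (binary indexed) tree over coordinate-compressed values instead of a balanced BST with subtree counts; both answer "how many elements seen so far are <= x" in O(log n), for O(n log n) total. The function name is mine, not from the answer above.

#include <algorithm>
#include <vector>

long long count_pairs_leq(const std::vector<int>& a)
{
    // distinct sorted values, used to map each value to a rank in [1, m]
    std::vector<int> vals(a);
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());

    std::vector<long long> bit(vals.size() + 1, 0);  // Fenwick tree, 1-based
    auto update = [&](int i) { for (; i < (int)bit.size(); i += i & -i) ++bit[i]; };
    auto query  = [&](int i) { long long s = 0; for (; i > 0; i -= i & -i) s += bit[i]; return s; };

    long long pairs = 0;
    for (int x : a) {
        int r = int(std::lower_bound(vals.begin(), vals.end(), x) - vals.begin()) + 1;
        pairs += query(r);   // earlier elements with value <= x
        update(r);
    }
    return pairs;
}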

Related

Q: Count array pairs with bitwise AND > k ~ better than O(N^2) possible?

Given an array nums, count the number of pairs (two elements) whose bitwise AND is greater than K.
Brute force:
for i in range(0, n):
    for j in range(i+1, n):
        if a[i] & a[j] > k:
            res += 1
Better version:
preprocess to remove all elements <= k, and then brute force (a small C++ sketch of this appears below).
But I was wondering: what is the limit on complexity here?
Can we do better with a trie or hashmap approach, like two-sum?
(I did not find this problem on Leetcode, so I thought of asking here.)
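For reference, a hedged C++ sketch of the pruned brute force the question describes. The pruning is valid because a & b <= min(a, b), so an element <= k can never be part of a qualifying pair; the worst case is still O(N^2).

#include <vector>

long long count_pairs_and_greater(const std::vector<int>& nums, int k)
{
    std::vector<int> v;
    for (int x : nums)
        if (x > k)
            v.push_back(x);          // elements <= k can never form a qualifying pair

    long long res = 0;
    for (int i = 0; i < (int)v.size(); i++)
        for (int j = i + 1; j < (int)v.size(); j++)
            if ((v[i] & v[j]) > k)
                ++res;
    return res;
}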
Let size_of_input_array = N, and let the input array consist of B-bit numbers.
Here is an easy-to-understand and easy-to-implement solution.
Eliminate all values <= k.
[Image from the original answer: an example with five 10-bit numbers.]
Step 1: Adjacency Graph
Store, for each set bit, the list of indices at which it appears. In our example, the 7th bit is set for the numbers at indices 0, 1, 2, 3 of the input array.
Step 2: The challenge is to avoid counting the same pairs twice.
To solve this we take the help of a union-find data structure, as shown in the code below.
//unordered_map<int, vector<int>> adjacency_graph;
//adjacency_graph has been filled up in step 1
vector<int> parent;
for (int i = 0; i < input_array.size(); i++)
    parent.push_back(i);

int result = 0;
for (int i = 0; i < adjacency_graph.size(); i++) { // loop 1
    auto v = adjacency_graph[i];
    if (v.size() > 1) {
        int different_parents = 1;
        for (int j = 1; j < v.size(); j++) { // loop 2
            int x = find(parent, v[j]);
            int y = find(parent, v[j - 1]);
            if (x != y) {
                different_parents++;
                union(parent, x, y);
            }
        }
        result += (different_parents * (different_parents - 1)) / 2;
    }
}
return result;
In the above code, find and union are from union-find data structure.
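A minimal sketch of those two helpers, under the usual union-find conventions (parents stored in a vector, path compression; union by rank omitted for brevity). Note that union is a reserved word in C++, so a real implementation needs another name (union_sets here):

#include <vector>

int find(std::vector<int>& parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   // path compression by halving
        x = parent[x];
    }
    return x;
}

void union_sets(std::vector<int>& parent, int x, int y)
{
    // x and y are assumed to already be roots, as in the calling code above
    if (x != y)
        parent[x] = y;
}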
Time Complexity:
Step 1:
Build Adjacency Graph: O(BN)
Step 2:
Loop 1: O(B)
Loop 2: O(N * α(N)), where α is the inverse Ackermann function (an extremely slow-growing function)
Overall Time Complexity
= O(BN)
Space Complexity
Overall space complexity = O(BN)
First, prune everything <= k. Also sort the value list.
Going from the most significant bit to the least significant, we keep track of the set of numbers we are currently working with (initially all of them: start = 0, end = n).
Let p be the first position in the current set whose value has a 1 at the current bit position.
If the bit in k is 0, then everything that would yield a 1 here is definitely good, and we only need to keep investigating the ones that get a 0. We have (end - p) * (end - p - 1) / 2 pairs within the current range, plus (end - p) * <total 1s in this position at or above end> combinations with larger, previously good numbers; both can be added to the solution. To continue, we update end = p. We count the 1s among all the numbers above because so far we have only counted them in pairs with each other, not with the numbers this low in the set.
If the bit in k is 1, then we can't count any wins yet, but we need to eliminate everything below p, so we update start = p.
You can stop once you have gone through all the bits or once start == end.
Details:
Since at each step we eliminate either everything that has a 0 or everything that has a 1 at the current bit, everything between start and end shares the same bit prefix. Since the values are sorted, we can find p with a binary search.
For the "total 1s in this position at or above a given index": we already have the values sorted, so we can compute partial sums, storing for every position in the sorted list the number of 1s in every bit position among all the numbers above it (a small sketch follows below).
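A small sketch of that precomputation, assuming the pruned values are already sorted in a vector v; suffix_ones[b][i] is the number of elements v[i..n-1] with bit b set, so it answers the "1s at or above a given index" queries in O(1). All names here are illustrative.

#include <vector>

// suffix_ones[b][i] = number of elements v[i..n-1] that have bit b set
std::vector<std::vector<int>> build_suffix_bit_counts(const std::vector<int>& v, int B)
{
    int n = v.size();
    std::vector<std::vector<int>> suffix_ones(B, std::vector<int>(n + 1, 0));
    for (int b = 0; b < B; b++)
        for (int i = n - 1; i >= 0; i--)
            suffix_ones[b][i] = suffix_ones[b][i + 1] + ((v[i] >> b) & 1);
    return suffix_ones;
}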
Complexity:
We go bit by bit, so L steps (L = the bit length of the numbers); at each step we do one binary search (log N) plus O(1) lookups and updates, so this pass is O(L log N).
We have to sort: O(N log N).
We have to compute the partial bit-wise sums: O(L * N).
Total: O(L log N + N log N + L * N).
Since N >> L, L log N is subsumed by N log N. Since L >> log N (probably: you may have 32-bit numbers, but you don't have 4 billion of them), N log N is in turn subsumed by L * N. So the complexity is O(L * N). Since we also need to keep the partial sums around, the memory complexity is also O(L * N).

Comparing between two arrays

How can I compare two arrays with sorted integer contents using a binary-search-style algorithm?
As in every case: it depends.
Assuming that the arrays are ordered or hashed, the time complexity is at most O(n+m).
You did not mention any language, so here is pseudocode.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{
    while (Enumerator1 is not at the end and Enumerator2 is not at the end)
    {
        if (Enumerator1.current > Enumerator2.current)
            Enumerator2.fetchNext()
        else if (Enumerator2.current > Enumerator1.current)
            Enumerator1.fetchNext()
        else
            return true
    }
    return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
However, this is not always the fastest way.
If the arrays have significantly different sizes, it could be more efficient to use binary search for the few elements of the shorter array within the larger one.
This can be improved further: if you start with the median element of the small array, you need not do a full search for any later element. Any element before the median must lie in the part of the large array below the position where the median was (or would have been) found, and any element after the median must lie in the part above it. This can be applied recursively until all elements have been located. Once you get a hit, you can stop.
The disadvantage of this method is that it takes more time in the worst case, i.e. O(n log m), and it requires random access to the arrays, which might hurt cache efficiency.
On the other hand, multiplying by a small number (log m) can be better than adding a large one (m). In contrast to the algorithm above, typically only a few elements of the large array have to be accessed.
The break-even point is roughly where log m is less than m/n, where n is the size of the smaller array.
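A minimal sketch of the simpler, non-recursive form of this idea: binary search each element of the smaller sorted array in the larger one, O(n log m). The recursive median-first refinement described above narrows the search ranges further and is omitted here.

#include <algorithm>
#include <vector>

bool sorted_arrays_overlap(const std::vector<int>& small_arr,
                           const std::vector<int>& large_arr)
{
    for (int x : small_arr)
        if (std::binary_search(large_arr.begin(), large_arr.end(), x))
            return true;   // common element found
    return false;
}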
You think that's it? - no
In case the random access to the larger array causes a higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse, i.e. look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up much more elements.
Answer:
No, there are no more lookups. As soon as the boundaries within which you expect a range of elements of the large array collapse, you can stop searching for those elements, since you won't find any more hits.
In fact the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against different elements of the small array in the first step. This takes only one slow access for a bunch of comparisons while the other way around you need to access the same element several times with some other elements accesses in between.
So there are less slow accesses at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)
If you know that they are sorted, then you can keep a pointer to the beginning of each array and walk both arrays, moving one of the pointers up (or down) after each comparison. That makes it O(n). I'm not sure you could bisect anything, as you don't know where the common number would be.
Still better than the brute-force O(n^2).
If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.
This can be done in two ways.
a) Binary Search b) Linear Search
For binary search: for each element in array A, look for it in B with binary search; in that case the complexity is O(n log n).
For linear search: it is O(m + n), where m and n are the sizes of the arrays. In your case m = n.
Linear search :
Have two indices i, j that point to the arrays A, B
Compare A[i], B[j]
If A[i] < B[j] then increment i, because any match if exists can be found only in later indices in A.
If A[i] > B[j] then increment j, because any match if exists can be found only in later indices in B.
If A[i] == B[j] you found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
    for (int i = 0, j = 0; i < A.length && j < B.length; ) {
        if (A[i] < B[j]) {
            i++;
        } else if (A[i] > B[j]) {
            j++;
        } else {
            return A[i]; // common element found
        }
    }
    return -1; // assuming all integers are positive
}
Now if you have both arrays descending, just flip the rule: if A[i] < B[j] increment j, else increment i.
If you have one descending (B) and one ascending (A), then i for A starts at the beginning of the array and j for B starts at the end of the array, and they move as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
    if (A[i] < B[j]) {
        i++;
    } else if (A[i] > B[j]) {
        j--;
    } else {
        return A[i]; // common element found
    }
}

Find ascending triples in a list

I encountered this problem in a programming interview and still have no idea how to solve it.
Given a list of length n whose elements are all positive integers, in no particular order:
find all possible triples (a, b, c) such that a < b < c, where a appears before b and b appears before c in the list,
and analyse the time complexity of your algorithm.
No general algorithm can be faster than O(n^3): for a sorted input of distinct elements the output itself has size Θ(n^3), so just producing the output takes time proportional to that. Indeed, even a randomly generated list of integers will already have on the order of n^3 such triples, up to constant factors.
Given that, you can simply iterate over all possible triples in list order and test each one for ascending order. This naive solution is already asymptotically optimal, at O(n^3):
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        for (int k = j+1; k < n; k++)
            if (X[i] < X[j] && X[j] < X[k])
                output(X[i], X[j], X[k]);
I suspect you may have a transcription error in your problem statement - or else the question is supposed to be a very easy short coding exercise.
If it is known that there are only a small set of triples (say k), then you may prefer to find all the triples by storing pointers to the previous smallest element.
ALGORITHM
Prepare an empty data structure (possible choices described later).
Prepare an empty array B of length n.
Then for each element c in the list:
Using the data structure, look up the most recent element in the list that is smaller than c (if it exists) and store its index in the array B.
Store c (and its index in the original list) in the data structure
Then use array B to find all elements b smaller than c, and then again to find all elements a smaller than b, and emit all these combinations as output triples.
DATA STRUCTURE
The data structure needs to store (value, position) pairs and make it easy to find the largest position (i.e. the most recent one) among all elements with value less than c.
One easy way to do this, if the range of allowed values is fairly small, is to use a series of arrays where A[k][x] stores the maximum position over all elements with value in the range [x*2^k, (x+1)*2^k).
If the values have up to M bits (i.e. the values are in the range 0 to 2^M-1) then updating or accessing this data structure are both O(M) operations.
COMPLEXITY
The given method is O(nM+k).
If the values have a larger range, then you could use a form of binary search tree instead of the series of arrays, or instead sort the values and replace the values with their ordinal value. This would then have complexity O(n log n + k).
COUNTING TRIPLES
If you just wish to know the total number of triples of this form then you can do this in O(n).
The idea is similar to before:
Find the most recent smaller element for each index, and the count of smaller elements for each index
Find the next greater element for each index, and the count of greater elements
Compute the sum of the product of the count of smaller elements and the count of larger elements for each index.
To make this O(n) we need to be able to find the next greater element in O(n). This can be done with a stack (a sketch follows below):
1. Push the current index i onto the stack.
2. While A[top(stack)] < A[i+1], pop an index x off the stack and store NGE[x] = i+1.
3. Increment i and return to step 1.
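A minimal sketch of that stack step, written as a single loop over i (NGE[i] stays -1 when no greater element exists to the right). Each index is pushed and popped at most once, so it runs in O(n). The function name is mine.

#include <stack>
#include <vector>

std::vector<int> next_greater_indices(const std::vector<int>& A)
{
    int n = A.size();
    std::vector<int> NGE(n, -1);
    std::stack<int> st;                      // indices with no greater element found yet
    for (int i = 0; i < n; i++) {
        while (!st.empty() && A[st.top()] < A[i]) {
            NGE[st.top()] = i;               // A[i] is the next greater element for A[st.top()]
            st.pop();
        }
        st.push(i);
    }
    return NGE;
}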
We also need to be able to find the count of greater elements in O(n). Once the NGE array has been prepared, we can find the counts by iterating backwards over the array and computing
count_greater_elements[i] = count_greater_elements[ NGE[i] ] + 1 if NGE[i] is defined
= 0 otherwise
The most recent smaller elements and counts can be computed in an analogous way.
N^2 solution for the general case (to count all such triples, not to output them all; just printing the output takes n^3 because of its size):
For each number X in the array, count the numbers less than X with smaller indices, and the numbers greater than X with larger indices. Then for each X, the number of triples in which X is the middle element is simply less[X] * greater[X]. The answer is the sum of these products.
int calc(vector<int> numbers) {
    int n = numbers.size();
    vector<int> less(n), more(n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (numbers[i] < numbers[j])
                less[j]++, more[i]++;
    int res = 0;
    for (int i = 0; i < n; i++)
        res += less[i] * more[i];
    return res;
}
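A tiny usage check for calc (assuming the usual headers and using namespace std, as the snippet above does); each call is annotated with the expected result.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    cout << calc({1, 2, 3}) << "\n";     // 1: only the triple (1, 2, 3)
    cout << calc({3, 2, 1}) << "\n";     // 0: no ascending triple
    cout << calc({1, 3, 2, 4}) << "\n";  // 2: (1, 3, 4) and (1, 2, 4)
    return 0;
}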

Amortized Time Cost using Accounting Method

I wrote an algorithm to calculate the next lexicographic permutation of an array of integers (e.g. 123, 132, 213, 231, 312, 321). I don't think the code is necessary, but I have included it below.
I think I have correctly determined the worst-case time cost to be O(n), where n is the number of elements in the array. I understand, however, that if you use amortized analysis ("amortized cost"), the time cost can be shown to be O(1) per call on average.
Question:
I would like to learn the ACCOUNTING METHOD in order to show this O(1) bound, but I am having difficulty understanding how to assign a cost to each operation. Accounting method: Link: Accounting_Method_Explained
Thoughts:
I've thought of assigning a cost to changing a value at a position, or assigning the cost to a swap, but neither really makes sense to me yet.
public static int[] getNext(int[] array) {
    int temp;
    int j = array.length - 1;
    int k = array.length - 1;
    int count = 0; // instrumentation: counts the comparisons made while searching for k
    // Find largest index j with a[j] < a[j+1],
    // i.e. the next adjacent pair of values not in descending order
    do {
        j--;
        if (j < 0) {
            // Edge case: the array is the largest permutation, so return the reverse order
            for (int x = 0, y = array.length - 1; x < y; x++, y--) {
                temp = array[x];
                array[x] = array[y];
                array[y] = temp;
            }
            return array;
        }
    } while (array[j] > array[j+1]);
    // Find index k such that a[k] is the smallest integer
    // greater than a[j] to the right of a[j]
    for (; array[j] > array[k]; k--, count++);
    // Swap the two elements found at j and k
    temp = array[k];
    array[k] = array[j];
    array[j] = temp;
    // Reverse the elements from j+1 to the end (they are in descending order),
    // which sorts them ascending; this guarantees the next smallest permutation
    // after swapping j and k
    int r = array.length - 1;
    int s = j + 1;
    while (r > s) {
        temp = array[s];
        array[s++] = array[r];
        array[r--] = temp;
    }
    return array;
} // end getNext
Measure running time in swaps, since the other work per iteration is worst-case O(#swaps).
The swap of array[j] and array[k] has virtual cost 2. The other swaps have virtual cost 0. Since at most one swap per iteration is costly, the running time per iteration is amortized constant (assuming that we don't go into debt).
To show that we don't go into debt, it suffices to show that, if the swap of array[j] and array[k] leaves a credit at position j, then every other swap involves a position with a credit available, which is consumed. Case analysis and induction reveal that, between iterations, if an item is larger than the one immediately following it, then it was put in its current position by a swap that left an as-yet unconsumed credit.
This problem is not a great candidate for the accounting method, given the comparatively simple potential function that can be used: number of indexes j such that array[j] > array[j + 1].
From the aggregate analysis we see that T(n) < n! · e < n! · 3, so we pay $3 for each operation, and that is enough to cover the total cost of all n! operations; hence it is an upper bound on the actual cost, and the total amortized cost per operation is O(1).

Selection algorithm problem

Suppose you have an array A of n items, and you want to find the k items in A closest
to the median of A. For example, if A contains the 9 values {7, 14, 10, 12, 2, 11, 29, 3, 4}
and k = 5, then the answer would be the values {7, 14, 10, 12, 11}, since the median
is 10 and these are the five values in A closest to the value 10. Give an algorithm
to solve this problem in O(n) time.
I know that a selection algorithm (deep selection) is the appropriate algorithm for this problem, but I think that would run in O(n log n) time instead of O(n). Any help would be greatly appreciated :)
You will first need to find the median, which can be done in O(n) (for example using Hoare's Quickselect algorithm).
Then you will need to implement a sorting algorithm which sorts the elements in the array according to their absolute distance to the median (smallest distances first).
If you were to sort the entire array this way, this would typically take somewhere from O(n * log n) to O(n^2), depending on the algorithm being used. However since you only need the first k values, the complexity can be reduced to O(k * log n) to O(k * n).
Since k is a constant and does not depend on the size of the array, the overall complexity in a worst case scenario will be: O(n) (for finding the median) + O(k * n) (sorting), which is O(n) overall.
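A compact sketch of this approach, using std::nth_element for both selection steps (expected O(n); a worst-case-linear selection such as median-of-medians could be substituted for the guarantee discussed above). The function name and tie-breaking behaviour are my own choices, not part of the answer.

#include <algorithm>
#include <cstdlib>
#include <vector>

std::vector<int> k_closest_to_median(std::vector<int> a, int k)
{
    // assumes 0 <= k <= a.size()
    // 1. Selection: place the median at position n/2 in expected O(n).
    std::nth_element(a.begin(), a.begin() + a.size() / 2, a.end());
    int median = a[a.size() / 2];

    // 2. Selection again, ordering by absolute distance to the median,
    //    so the k closest values end up in the first k positions.
    std::nth_element(a.begin(), a.begin() + k, a.end(),
                     [median](int x, int y) {
                         return std::abs(x - median) < std::abs(y - median);
                     });
    return std::vector<int>(a.begin(), a.begin() + k);   // the k closest values, in no particular order
}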
I think you can do this using a variant on quicksort.
You start with a set S of n items and are looking for the "middle" k items. You can think of this as partitioning S into three parts of sizes n - k/2 (the "lower" items), k (the "middle" items), and n - k/2 (the "upper" items).
This gives us a strategy: first remove the lower n - k/2 items from S, leaving S'. Then remove the upper n - k/2 items from S', leaving S'', which is the middle k items of S.
You can easily partition a set this way using "half a quicksort": choose a pivot, partition the set into L and U (lower and upper elements w.r.t. the pivot), then you know the items to discard in the partition must be either all of L and some of U or vice versa: recurse accordingly.
[Thinking further, this may not be exactly what you want if you define "closest to the median" in some other way, but it's a start.]
Assumption: we care about the k values in A that are closest to the median. If we had an A={1,2,2,2,2,2,2,2,2,2,2,2,3}, and k=3, the answer is {2,2,2}. Similarly, if we have A={0,1,2,3,3,4,5,6}, and k=3, answers {2,3,3} and {3,3,4} are equally valid. Furthermore, we are not interested in the indices from which these values came, though I imagine some small tweaks to the algorithm would work.
As Grodrigues states, first find the median in O(n) time. While we're at it, keep track of the largest and smallest numbers.
Next, create an array K, k items long. This array will contain each kept item's distance from the median.
Copy the first k items from A into K.
For each item A[i], compare the distance of A[i] from the median to each item of K. If A[i] is closer to the median than the farthest item from the median in K, replace that item. As an optimization, we could also track K's closest and farthest items from the median, so we have a faster comparison to K, or we could keep K sorted, but neither optimization is necessary to operate in O(n) time.
Pseudocode, C++ ish:
/* n = length of array
 * array = A, given in the problem
 * result is a pre-allocated array where the result will be placed
 * k is the length of result
 *
 * returns
 * 0 for success
 * -1 for invalid input
 *
 * Implementation note: optimizations are skipped.
 */
#include <cstdlib>   // for abs

#define SUCCESS 0
#define INVALID_INPUT -1

int find_k_closest(int n, int array[], int k, int result[])
{
    // if we're looking for more results than possible,
    // it's impossible to give a valid result.
    if (k > n) return INVALID_INPUT;

    // populate result with the first k elements of array.
    for (int i = 0; i < k; i++)
        result[i] = array[i];

    // if we're looking for n items of an n-length array,
    // we don't need to do any comparisons.
    // Up to this point, the function is O(k); worst case, k==n, and we're O(n).
    if (k == n) return SUCCESS;

    // Assume an O(n) median function.
    // Note that we don't bother finding the median if there's an
    // error or if the output is the input.
    int med = median(array, n);

    // Convert the result array to hold distances from the median,
    // not the actual values.
    for (int i = 0; i < k; i++)
    {
        result[i] = result[i] - med;
        // e.g. if result[i]=1 and the median is 3, result[i] becomes -2.
    }
    // Up to this point, the function is O(2k+n) = O(n).

    // Find the closest items.
    // The outer loop is O(n * cost of inner loop); the inner loop is O(k);
    // so this step is O(k*n) = O(n) for constant k.
    // Note that we start at k, since the first k elements of array are already in result.
    #define FURTHER(a,b) ((abs(a) > abs(b)) ? 1 : 0)
    for (int i = k; i < n; i++)
    {
        int distance = array[i] - med;
        int abs_distance = abs(distance);

        // find the entry of result farthest from the median
        int idx = 0;
        for (int j = 1; j < k; j++)
        {
            if (FURTHER(result[j], result[idx]))
                idx = j;
        }

        // If array[i] is closer to the median than the farthest element of
        // result, replace that farthest element with array[i]'s distance.
        if (abs_distance < abs(result[idx]))
            result[idx] = distance;
    }
    // Up to this point, the function is O(2n).

    // Convert result back from distances to values.
    for (int i = 0; i < k; i++)
    {
        result[i] = med + result[i];
        // e.g. a stored distance of -2 with median 3 becomes 1 again.
    }
    return SUCCESS;
}
