Comparing between two arrays - sorting

How can I compare between two arrays with sorted contents of integer in binary algorithm?

As in every case: it depends.
Assuming that the arrays are ordered or hashed the time complexity is at most O(n+m).
You did not mention any language, so it's pseudo code.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{ while (Enumerator1 is not at the end and Enumerator2 is not at the end)
{ if (Enumerator1.current > Enumerator2.current)
Enumerator2.fetchNext()
else if (Enumerator2.current > Enumerator1.current)
Enumerator1.fetchNext()
else
return true
}
return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
However, this is not always the fastest way.
If one of the arrays have significantly different size it could be more efficient to use binary search for a few elements of the elements of the shorter array.
This can be even more improved because when you start with the median element of the small array you need not do a full search for any further element. Any element before the median element must be in the range before the location where the median element was not found and any element after the median element must be in the upper range of the large array. This can be applied recursively until all elements have been located. Once you get a hit, you can abort.
The disadvantage of this method is that it takes more time in worst case, i.e. O(n log m), and it requires random access to the arrays which might impact cache efficiency.
On the other side, multiplying with a small number (log m) could be better than adding a large number (m). In contrast to the above algorithm typically only a few elements of the large array have to be accessed.
The break even is approximately when log m is less than m/n, where n is the smaller number.
You think that's it? - no
In case the random access to the larger array causes a higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse, i.e. look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up much more elements.
Answer:
No, there are no more lookups. As soon as the boundaries where you expect a range of elements of the large array collapses you can stop searching for these elements since you won't find any hits anymore.
In fact the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against different elements of the small array in the first step. This takes only one slow access for a bunch of comparisons while the other way around you need to access the same element several times with some other elements accesses in between.
So there are less slow accesses at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)

If you know that they are sorted, then you can have a pointer to the beginning of each array, and move on both arrays, and moving one of the pointers up (or down) after each comparison. That would make it O(n). Not sure you could bisect anything as you don't know where the common number would be.
Still better than the brute force O(n2).

If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.

This can be done in two ways.
a) Binary Search b) Linear Search
For Binary Search - for each element in array A look for element in B with binary search, then in that case the complexity is O(n log n )
For Linear Search - it is O( m + n ) - where m, n are sizes of the arrays. In your case m = n.
Linear search :
Have two indices i, j that point to the arrays A, B
Compare A[i], B[j]
If A[i] < B[j] then increment i, because any match if exists can be found only in later indices in A.
If A[i] > B[j] then increment j, because any match if exists can be found only in later indices in B.
If A[i] == B[j] you found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
for ( int i = 0, j = 0; i < A.length && j < B.length; ) {
if ( A[i] < B[j] ) {
i++;
} else if ( A[i] > B[j] ) {
j++;
}
return A[i];
}
return -1; //Assuming all integers are positive.
}
Now if you have both descending, just reverse the comparison signs i.e. if A[i] < B[j] increment j else increment i
If you have one descending (B) one ascending (A) then i for A starts from beginning of array and j for B starts from end of array and move them accordingly as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
if ( A[i] < B[j] ) {
i++;
} else if ( A[i] > B[j] ) {
j--;
}
return A[i];
}

Related

Find majority element when the values are unknown

Suppose I have an array of elements.
I cannot read the values of the elements. I can only compare any two elements from the array to know whether they are the same or not, but even then I don't get to know their actual values.
Suppose this array has a majority of elements of the same value. I need to find and return any of the majority elements. How would I do it?
We have to be be able to do it in a big thet.of n l0g n.
Keep track of two indices, i & j. Initialize i=0, j=1. Repeatedly compare arr[i] to arr[j].
if arr[i] == arr[j], increment j.
if arr[i] != arr[j]
eliminate both from the array
increment i to the next index that hasn't been eliminated.
increment j to the next index >i that hasn't been eliminated.
The elimination operation will eliminate at least one non-majority element each time it eliminates a majority element, so majority is preserved. When you've gone through the array, all elements not eliminated will be in the majority, and you're guaranteed at least one.
This is O(n) time, but also O(n) space to keep track of eliminations.
Given:
an implicit array a of length n, which is known to have a majority element
an oracle function f, such that f(i, j) = a[i] == a[j]
Asked:
Return an index i, such that a[i] is a majority element of a.
Main observation:
If
m is a majority element of a, and
for some even k < n each element of a[0, k) occurs at most k / 2 times
then m is a majority element of a[k, n).
We can use that observation by assuming that the first element is the majority element. We move through the array until we reach a point where that element occurred exactly half the time. Then we discard the prefix and continue again from that point on. This is exactly what the Boyer-Moore algorithm does, as pointed out by Rici in the comments.
In code:
result = 0 // index where the majority element is
count = 0 // the number of times we've seen that element in the current prefix
for i = 0; i < n; i++ {
// we've seen the current majority candidate exactly half of the time:
// discard the current prefix and start over
if (count == 0) {
result = i
}
// keep track of how many times we've seen the current majority candidate in the prefix
if (f(result, i)) {
count++
} else {
count--
}
}
return result
For completeness: this algorithm uses two variables and a single loop, so it runs in O(n) time and O(1) space.
Assuming you can determine if elements are <, >, or == what you can do is go through the list and build a tree. The trees values will be like buckets, the item and count of how many you've seen. When you come by a node where you get == then just increment the count. Then at the end go through the tree and find the one with the highest count.
Assuming you build a balanced tree, this should be O(n log n). Red Black trees might help with making a balanced tree. Else you could build the tree by adding randomly selected elements and this would give you O(n log n) on average.

Algorithm: finding closest differences between two elements of array

You have a array, and a target. Find the difference of the two elements of the array. And this difference should be closest to the target.
(i.e., find i, j such that (array[i]- array[j]) should be closest to target)
Attempt:
I use a order_map (C++ hash-table) to store each element of the array. This would be O(n).
Then, I output the ordered element to a new array (which is sorted increasing number).
Next, I use two pointers.
int mini = INT_MAX, a, b;
for (int i=0, j = ordered_array.size() -1 ; i <j;) {
int tmp = ordered_array[i] - ordered_array[j];
if (abs(tmp - target) < mini) {
mini = abs(tmp - target);
a = i;
b = j;
}
if (tmp == target) return {i,j};
else if (tmp > target) j --;
else i ++;
}
return {a,b};
Question:
Is my algorithm still runs at O(n)?
If the array is already sorted, there is an O(n) algorithm: let two pointers swipe up through the array, whenever the difference between the pointed at elements is smaller than target increase the upper one, whenever it is larger than target increase the lower one. While doing so, keep track of the best result found so far.
When you reach the end of the array the overall best result has been found.
It is easy to see that this is really O(n). Both pointers will only move upwards, in each step you move exactly one pointer and the array has n elements. So after at most 2n steps this will halt.
As already mentioned in some comments, if you need to sort the array first, you need the O(n log n) effort for sorting (unless you can use some radix sort or counting sort).
Your searching phase is linear. Two-pointers approach is equivalent to this:
Make a copy of sorted array
Add `target` to every entry (shift values up or down) (left picture)
Invert shifted array indexing (right picture)
Walk through both arrays in parallel, checking for absolute difference
All stages are linear (and inverting is implicit in your code)
P.S. Is C++ hash map ordered? I doubt. Sorted array creation is in general O(nlogn) (except for special methods like counting or radix sort)

Insertion sort comparison?

How to count number of comparisons in insertion sort in less than O(n^2) ?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its right (2) we hit the beginning of the array. In case (1), there is one comparison not paired with a swap. In case (2), every comparison is paired with a swap. The upward adjustment for number of comparisons can be computed by counting the number of successive minima from left to right (or however your insertion sort works), in time O(n).
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
if array[i] < min_so_far:
min_so_far = array[i]
else:
num_comparisons += 1
As commented, to do it in less than O(n^2) is hard, maybe impossible if you must pay the price for sorting. If you already know the number of comparisons done at each external iteration then it would be possible in O(n), but the price for sorting was payed sometime before.
Here is a way for counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
for (long i = 1, j; i < n; ++i)
{
auto tmp = p[i];
for (j = i - 1; j >= 0 and p[j] > tmp; --j) // insert a gap where put tmp
p[j + 1] = p[j];
count += i - j; // i - j is the number of comparisons done in this iteration
p[j + 1] = tmp;
}
}
n is the number of elements and count the comparisons counter which must receive a variable set to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []; //sorted output array
while(A is not empty) {
remove first element from A and add it to B, preserving B's sorting
}
If the insertion to B is implemented by linear search from the left until you find a greater element, then the number of comparisons is the number of pairs (i,j) such that i < j and A[i] >= A[j] (I'm considering the stable variant).
In other words, for each element x, count the number of elements before x that have less or equal value. That can be done by scanning A from the left, adding it's element to some balanced binary search tree, that also remembers the number of elements under each node. In such tree, you can find number of elements lesser or equal to a certain value in O(log n). Total time: O(n log n).

Avoiding duplications when comparing array elements

The question is something like: Go through an array and find pairs of elements that add up to a certain sum k.
for (auto i : array) {
for (auto j : array) {
if (i+j==k) {
*Do something
}
}
}
Say we had array = [1,2,5] and k=3; when i=1 and j=2, we would execute the Do something. But when i=2 and j=1, we would execute Do something again, even though we have already found 2 elements and we would be repeating the answer.
Essentially, how can one go through an array and avoid comparing the same 2 elements multiple times?
If order doesn't matter, then decide which half of all pairs you want: the one where i >= j, or the one where i <= j.
for (i=0...)
for (j=i...)
Like rici commented, this is a very slow (O(N^2)) algorithm compared to sorting (a copy of) the vector. I think then you could iterate i forwards from the start, and j backwards from the end, comparing i+j <= k to decide whether to modify i or j next.
If I am not getting your question wrong then this explained procedure could be a way to solve it.
Sort your array in increasing order ( Preferably merge sort because it takes O(n.log(n)) time ).
Now, take two indices let call them low and high, where low is index to the left-most smallest number of the sorted array and high is index to the right-most largest number of the same sorted array.
Now, do like this :
while(low < high) {
if(arr[low] + arr[high] == k) {
print "unique pair found"
high = high - 1;
low = low + 1;
}
else if(arr[low] + arr[high] > k){
high = high - 1;
}
else {
low = low + 1;
}
}
It's time complexity is O(n), it will definitely give what you are looking for ? (Any doubt most welcome).
You could put the results of i + j in a vector, then sort it and then remove duplicates, and finally go over the vector and "do something".

implementing an algorithm that requires minimal memory

I am trying to implement an algorithm to find N-th largest element that requires minimal memory.
Example:
List of integers: 1;9;5;7;2;5. N: 2
after duplicates are removed, the list becomes 1;9;5;7;2.
So, the answer is 7, because 7 is the 2-nd largest element in the modified list.
In the below algorithm i am using bubble sort to sort my list and then removing duplicates without using a temp variable, does that make my program memory efficient ? Any ideas or suggestion
type Integer_array is Array (Natural range <>) of Integer;
procedure FindN-thLargestNumber (A : in out Integer_Array) is
b : Integer;
c:Integer;
begin
//sorting of array
for I in 0 to length(A) loop
for J in 1 to length(A) loop
if A[I] > A[J] then
A[I] = A[I] + A[J];
A[J] = A[I] - A[J];
A[I] = A[I] - A[J];
end if;
end loop;
end loop;
//remove duplicates
for K in 1 to length(A) loop
IF A[b] != A[K] then
b++;
A[b]=A[K];
end loop;
c = ENTER TO FIND N-th Largest number
PRINT A[b-(c-1)] ;
end FindN-th Largest Number
To find the n'th largest element you don't need to sort the main list at all. Note that this algorithm will perform well if N is smaller than M. If N is a large fraction of the list size then then you will be better off just sorting the list.
You just need a sorted list holding your N largest, then pick the smallest from that list (this is all untested so will probably need a couple of tweaks):
int[n] found = new int[n];
for (int i = 0;i<found.length;i++) {
found[i] = Integer.MIN_VALUE;
}
for (int i: list) {
if (i > found[0]) {
int insert = 0;
// Find the point in the array to insert the value
while (insert < found.length && found[insert] < i) {
insert++;
}
// If not at the end we have found a larger value, so move back one before inserting
if (found[insert] >= i) {
insert --;
}
// insert the value and shuffle everything BELOW it down.
for (int j=insert;j<=0;j--) {
int temp = found[j];
found[j]=i;
i=temp;
}
}
}
At the end you have the top N values from your list sorted in order. the first entry in the list is Nth value, the last entry the top value.
If you need the N-th largest element, then you don't need to sort the complete array. You should apply selection sort, but only for the required N steps.
Instead of using bubble sort, use quicksort kind of partial sorting.
Pick a key and using as a pivot move around elements (move all the elements>= pivot to the left of the array)
Count how many unique elements are there that are greater than equal to pivot.
If the number is less than N, then the answer is to the right of the array. Otherwise it is in the left part of the array (left or right as compared to pivot)
Iteratively repeat with smaller array and appropriate N
Complexity is O(n) and you will need constant extra memory.
HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.
It sorts in O(n log n) time which I think is optimal time complexity for this problem because of the needed to ignore duplicates. I may be wrong.
Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.
If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.

Resources