Count number of identical pairs - algorithm

An identical pair in an array is 2 indices p, q such that
0<=p<q<N and array[p]=array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value,
keeping track of the original indices.
Then, for every index p in the sorted array, count all q<N such that
sortedarray[p].index < sortedarray[q].index and
sortedarray[p].item = sortedarray[q].item
Is this the correct approach? I think the complexity would be
O(N log N) for sorting by value +
O(N^2) for counting the pairs in the sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought: for every p, binary-search the sorted array for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
Here is my code for the second part:
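As a rough sketch of that binary-search idea (written in Python, which is my choice here rather than the asker's language), each element is matched against the end of its run of equal values with `bisect.bisect_right`. The function name `count_identical_pairs_bisect` is hypothetical. Original indices are not tracked, because every unordered pair of equal values corresponds to exactly one pair p < q in the original array.

```python
from bisect import bisect_right


def count_identical_pairs_bisect(arr):
    """Count pairs p < q with arr[p] == arr[q]: sort, then one binary search per element."""
    values = sorted(arr)  # O(N log N); original indices are not needed just to count
    total = 0
    for p, v in enumerate(values):
        # Positions p+1 .. end-1 hold further copies of v, where end is the
        # first position after p whose value is greater than v.
        end = bisect_right(values, v, p + 1)
        total += end - (p + 1)
    return total


print(count_identical_pairs_bisect([1, 2, 1, 1, 3, 2]))  # 4
```

Each of the N binary searches costs O(log N), so this part is O(N log N), as guessed.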
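int inversion = 0; // counts identical pairs
for (int i = 0; i < N; i++) {
    int j = i + 1;
    // Equal items are adjacent after sorting; with a stable sort their original
    // indices are also increasing, so scan forward until the item changes.
    while (j < N && sortedArray[j].index > sortedArray[i].index &&
           sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;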
Edit: I think I mistook the complexity of the second part for O(N^2).
Since no iteration of the while loop rescans elements at indices 0..i, a linear scan of the sorted array is enough to count the pairs. The total complexity is therefore
O(N log N) for sorting plus O(N) for the linear counting scan of the sorted array.

You are partially correct. Sorting the array via Merge Sort or Heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
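One way to make the post-sort pass genuinely linear, sketched in Python: count the length k of each run of equal values and add k(k-1)/2, instead of stepping once per pair. (The while loop in the question's code increments once per pair it finds, so its work is O(N + P) for P pairs, which becomes quadratic when all values are equal.) The function name is illustrative only.

```python
from itertools import groupby


def count_identical_pairs_sorted(arr):
    """Sort once, then count each run of equal values in a single pass."""
    total = 0
    for _, run in groupby(sorted(arr)):  # groupby yields runs of consecutive equal values
        k = sum(1 for _ in run)          # length of this run
        total += k * (k - 1) // 2        # pairs inside the run
    return total


print(count_identical_pairs_sorted([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # 3
```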

As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
An animated example can be found here.
As you can see in the gif, this algorithm is quadratic because, for each element, it may have to scan back over all the elements before it to find where that element belongs.
On the other hand, in your case you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation, you start scanning (pairwise) from the beginning, and because the array is ordered, once an element has been scanned and your pointers move on, there is never a reason to rescan previous elements; otherwise you would not have moved on in the first place.
Hence, you scan the whole array only once: O(n)

If you can allocate more memory, you can get some gains.
You can reach O(n) by using a hash table that maps each value in the array to a counter of how often you have already seen that value.
If the values are integers in a limited range, you can use an array directly instead of a hash table, storing value i at index i. In that case the complexity is O(n+m), where m is the number of allowed values (because you must first set all m entries to 0 and then look through all of them to count pairs).
Both methods give you the number of occurrences of each value in your array. Let nv_i be the number of occurrences of the value i in the array. Then the number of pairs with value i is (nv_i)*(nv_i-1)/2.
You can pair:
1st i with nv_i-1 others
2nd i with nv_i-2 others
...
last i with 0
And (nv_i-1)+(nv_i-2)+...+0 = (nv_i)*(nv_i-1)/2
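Both counting methods can be sketched in Python as follows (function names are hypothetical; `count_pairs_bounded` assumes every value is an integer in `range(m)`). Each one builds the nv_i counts and then applies nv_i*(nv_i-1)/2 per value.

```python
from collections import Counter


def count_pairs_hash(arr):
    """Hash-table version: O(n) on average."""
    counts = Counter(arr)  # value i -> nv_i
    return sum(nv * (nv - 1) // 2 for nv in counts.values())


def count_pairs_bounded(arr, m):
    """Direct-array version; assumes every value is an int in range(m). O(n + m)."""
    counts = [0] * m    # O(m): zero every slot
    for x in arr:
        counts[x] += 1  # value x is stored at index x
    return sum(nv * (nv - 1) // 2 for nv in counts)  # O(m): scan every slot


print(count_pairs_hash([1, 2, 1, 1, 3, 2]), count_pairs_bounded([1, 2, 1, 1, 3, 2], 4))  # 4 4
```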

I've been thinking about this... I think that if you "embed" the == condition into your sorting algorithm, the complexity is still O(n lg n).

Related

Efficient algorithm to determine if two sets of numbers are disjoint

Practicing for software developer interviews and got stuck on an algorithm question.
Given two sets of unsorted integers, one in an array of length m and the other in an array of
length n, where m < n, find an efficient algorithm to determine whether
the sets are disjoint. I've found solutions in O(nm) time, but haven't
found any that are more efficient than this, such as O(n log m) time.
Using a data structure with O(1) lookup/insertion, you can easily insert all elements of the first set.
Then, for each element in the second set, check whether it exists in the structure: if it does, the sets are not disjoint; if none does, they are disjoint.
Pseudocode
function isDisjoint(list1, list2)
    HashMap = new HashMap();
    foreach (x in list1)
        HashMap.put(x, true);
    foreach (y in list2)
        if (HashMap.hasKey(y))
            return false;
    return true;
This will give you an O(n + m) solution
Fairly obvious approach - sort the array of length m - O(m log m).
For every element in the array of length n, use binary search to check if it exists in the array of length m - O(log m) per element = O(n log m). Since m<n, this adds up to O(n log m).
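A minimal Python sketch of this sort-plus-binary-search approach, using the standard `bisect` module (the name `is_disjoint` is illustrative). Because `sorted` returns a copy, this version does not modify the input, at the cost of O(m) extra space.

```python
from bisect import bisect_left


def is_disjoint(small, large):
    """Sort the smaller array (O(m log m)), then binary-search it for each element of the larger one."""
    s = sorted(small)            # a sorted copy; the input is left untouched
    for y in large:              # n lookups, O(log m) each
        k = bisect_left(s, y)    # first position where y could be inserted
        if k < len(s) and s[k] == y:
            return False         # y occurs in both arrays
    return True


print(is_disjoint([3, 1], [2, 3, 4]))  # False
```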
Here's a link to a post that I think answers your question.
3) Sort smaller O((m + n)logm)
Say, m < n, sort A
Binary search for each element of B into A
Disadvantage: Modifies the input
Looks like Cheruvian beat me to it, but you can use a hash table to get O(n+m) in average case:
* Insert all elements of the length-m array into the table, taking (probably) constant time for each, assuming not many share the same hash. This step is O(m).
* For each element of the length-n array, check whether it is in the table. If it is, return false; otherwise, move on to the next. This takes O(n).
* If none are in the table, return true.
As I said before, this works because a hash table gives constant lookup time in average case. In the rare event that many unique elements in m have the same hash, it will take slightly longer. However, most people don't need to care about hypothetical worst cases. For example, quick sort is used more than merge sort because it gives better average performance, despite the O(n^2) upper bound.

Finding all unique elements from a big sorted array in log n time?

I have a very big sorted array. How can I count or print all the unique elements of the array?
Suppose my array is [2,3,3,3,4,6,6,7]
then output should be 2,3,4,6,7
I know how to do it in O(n) time, but the interviewer asked me to do it in O(log n) time.
Is it possible?
Here is an algorithm that takes O(k log n) time, where k is the number of unique elements:
set uniQ
int ind = 0;
do {
    uniQ.add(arr[ind]);   // arr[ind] is the first occurrence of a new value
    ind = BinSearchGreater(arr, arr[ind], ind + 1);
    if (ind >= arr.length)
        break;
} while (true);
BinSearchGreater(arr, key, start_ind): returns the index of the first element greater than key in the subarray starting at start_ind (or arr.length if there is none).
Time complexity: O(k log n), one binary search per unique element.
Note that this algorithm is only good when the number of unique elements is small.
If all elements are unique it is O(n log n), which is worse than linear.
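The same jumping idea in Python, as a sketch: the standard `bisect.bisect_right` with a `lo` argument plays the role of `BinSearchGreater` (the function name `unique_sorted` is illustrative). It assumes the input is already sorted in ascending order.

```python
from bisect import bisect_right


def unique_sorted(arr):
    """arr must be sorted ascending. One binary search per distinct value: O(k log n)."""
    uniques = []
    ind = 0
    while ind < len(arr):
        uniques.append(arr[ind])                     # first occurrence of a new value
        ind = bisect_right(arr, arr[ind], ind + 1)   # first index holding a larger value
    return uniques


print(unique_sorted([2, 3, 3, 3, 4, 6, 6, 7]))  # [2, 3, 4, 6, 7]
```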
I would like to know how he (the interviewer) would count every unique element in the array [1,2,3,4,5] without looking at every element. In this case you have to look at every element to count them all, and this takes O(n). In my opinion it is impossible to achieve O(log n) unless there are additional requirements on the given array.

How to sort an array according to another array?

Suppose A={1,2,3,4} and p={36,3,97,19}; sort A using p as the sort keys. You should get {2,4,1,3}.
It is an example in the book Introduction to Algorithms, which says it can be done in O(n log n).
Can anyone give me some idea about how it can be done? My thought is you need to keep track of each element in p to find where it ends up, like p[1] ends up at p[3] then A[1] ends up at A[3]. Can anyone use merge sort or other nlogn sorting to get this done?
I'm new to algorithm and find it a little intimidating :( thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and actually, if you're only counting comparisons you don't have to do any additional comparisons, just data swaps in two arrays instead of one), so that doesn't change the Big-O complexity of the overall sort because O( 2n log n ) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
Of course, you can also skip the index array entirely and simply treat the array A the same way, sorting it alongside p.
I don't see this as difficult: since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of the elements in array A according to the elements in p. You won't need any comparisons beyond those already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays and you are done.
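In Python the index-array approach can be sketched in two lines (the name `sort_by_keys` is illustrative): sort the indices 0..n-1 by the key each one points to, then gather A in that order.

```python
def sort_by_keys(A, p):
    """Return A reordered so that the corresponding keys in p are ascending."""
    # Index array i = [0, 1, ..., n-1], sorted by the key each index points to.
    order = sorted(range(len(p)), key=p.__getitem__)  # O(n log n)
    return [A[k] for k in order]                       # O(n) gather step


print(sort_by_keys([1, 2, 3, 4], [36, 3, 97, 19]))  # [2, 4, 1, 3]
```

For this input the sorted index array is [1, 3, 0, 2], matching the example in the answer.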

check if there exists a[i] = 2*a[j] in an unsorted array a?

Given an unsorted sequence a[1,...,n] of integers, give an O(n log n) runtime algorithm to check whether there are two indices i and j such that a[i] = 2*a[j]. The algorithm should return i=2 and j=0 on input 4,12,8,10 and false on input 4,3,1,11.
I think we have to sort the array anyways which is O(nlogn). I'm not sure what to do after that.
Note: this can be done in O(n) on average (see note (1) below), using a hash table.
set <- new hash set
for each x in array:
    set.add(2*x)
for each x in array:
    if set.contains(x):
        return true
return false
Proof:
=>
If there are 2 elements a[i] and a[j] such that a[i] = 2 * a[j], then during the first iteration we inserted 2*a[j] into the set when we read a[j]. On the second iteration, we find that a[i] == 2*a[j] is in the set, and return true.
<=
If the algorithm returned true, then in the second iteration it found an a[i] that is already in the set. So, during the first iteration, we inserted a[i]. That can only happen if there is an element a[j] such that a[i] == 2 * a[j], and we inserted a[i] when reading a[j]. (If a[i] = 0, that a[j] may be a[i] itself, so a lone zero gives a false positive unless the check also requires j ≠ i.)
Note:
In order to return the indices of the elements, one can simply use a hash map instead of a set, and for each i store 2*a[i] as the key and i as the value.
Example:
Input = [4,12,8,10]
First insert, for each x, the key 2x together with its index into the hash table. You will get:
hashTable = {(8,0),(24,1),(16,2),(20,3)}
Now, on the second iteration, check for each element whether it is in the table:
arr[0]: 4 is not in the table
arr[1]: 12 is not in the table
arr[2]: 8 is in the table - return the current index [2] and the value of 8 in the map, which is 0.
so, final output is 2,0 - as expected.
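A Python sketch of this hash-map variant (the name `find_double_pair` is illustrative). It adds one check the pseudocode lacks: the stored index j must differ from i, otherwise a single 0 would be reported as a pair with itself.

```python
def find_double_pair(a):
    """Return (i, j) with a[i] == 2 * a[j] and i != j, or None. O(n) on average."""
    doubled = {}  # key 2*a[j] -> j (a later j overwrites an earlier one)
    for j, x in enumerate(a):
        doubled[2 * x] = j
    for i, x in enumerate(a):
        j = doubled.get(x)
        # j == i can only happen when x == 0 (x == 2x); a lone zero is not a pair
        if j is not None and j != i:
            return i, j
    return None


print(find_double_pair([4, 12, 8, 10]))  # (2, 0)
print(find_double_pair([4, 3, 1, 11]))   # None
```

Overwriting earlier indices is harmless: for nonzero values any stored j works, and with two or more zeros the earlier zero finds the later one.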
(1) Complexity notice:
Here, O(n) assumes an O(1) hash function. This is not always true. If we do assume an O(1) hash function, we can also assume that sorting with radix sort is O(n), and by using an O(n) post-processing step [similar to the one suggested by @SteveJessop in his answer], we can also achieve O(n) with a sorting-based algorithm.
Sort the array (O(n log n), or O(n) if you're willing to stretch a point about arrays of fixed-size integers)
Initialise two pointers ("fast" and "slow") at the start of the array (O(1))
Repeatedly:
increment "fast" until you find an even value >= twice the value at "slow"
if the value at "fast" is exactly twice the value at "slow", return true
increment "slow" until you find a value >= half the value at fast
if the value at "slow" is exactly half the value at "fast", return true
if one of the attempts to increment goes past the end, return false
Since each of fast and slow can be incremented at most n times total before reaching the end of the array, the "repeatedly" part is O(n).
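The following Python sketch is a simplified variant of the pointer idea above rather than a step-by-step transcription: a single forward pointer always sits on the first sorted value that is at least twice the current element. Because 2*x is non-decreasing in x, the pointer never moves backwards, and this also holds for negative values. It only reports True/False, since sorting discards the original indices; the function name is illustrative.

```python
def has_double_pair_sorted(a):
    """Return True if a[i] == 2*a[j] for some distinct i, j (sort + one forward pointer)."""
    s = sorted(a)          # O(n log n)
    n = len(s)
    j = 0                  # first index with s[j] >= 2*s[i]; never moves backwards
    for i in range(n):
        target = 2 * s[i]  # non-decreasing as i grows, negative values included
        while j < n and s[j] < target:
            j += 1
        if j < n and s[j] == target:
            if j != i:
                return True
            # j == i means s[i] == 0 (the first zero), so a pair needs a second zero next to it
            if j + 1 < n and s[j + 1] == 0:
                return True
    return False


print(has_double_pair_sorted([-4, -2]))  # True
```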
You're right that the first step is sorting the array.
Once the array is sorted, you can find out whether a given element is in the array in O(log n) time. So if, for each of the n elements, you check for the presence of another element in O(log n) time, you end up with a runtime of O(n log n).
Does that help you?
1. Create an array of pairs A={(a[0], 0), (a[1], 1), ..., (a[n-1], n-1)}.
2. Sort A.
3. For every (a[i], i) in A, do a binary search to see whether there is an (a[i] * 2, j) pair. We can do this because A is sorted.
Step 1 is O(n), and step 2 and 3 are O(n * log n).
Also, you can do step 3 in O(n) (there's no need for binary search), because if the corresponding element for A[i] is at A[j], then the corresponding element for A[i+1] cannot be in A[0..j-1]. So we can keep two pointers and find the answer in O(n). But either way, the whole algorithm is O(n log n) because we still sort.
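A Python sketch of the binary-search version of steps 1 to 3, using the standard `bisect` module on a separate list of the sorted values (the function name is illustrative). Keeping the original index in each pair lets it return indices (i, j) with a[i] == 2*a[j].

```python
from bisect import bisect_left


def find_double_pair_sorted(a):
    """Return (i, j) with a[i] == 2 * a[j] and i != j, or None. O(n log n)."""
    pairs = sorted((x, idx) for idx, x in enumerate(a))  # steps 1-2: (value, original index)
    values = [x for x, _ in pairs]
    for x, small_idx in pairs:                           # step 3
        k = bisect_left(values, 2 * x)
        while k < len(values) and values[k] == 2 * x:
            big_idx = pairs[k][1]
            if big_idx != small_idx:
                return big_idx, small_idx                # a[big_idx] == 2 * a[small_idx]
            k += 1  # only reached when x == 0 and the match is the element itself
    return None


print(find_double_pair_sorted([4, 12, 8, 10]))  # (2, 0)
```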
Sorting the array is a good option - O(nlogn), assuming you don't have some fancy bucket sort option.
Once it's sorted, you need only pass through the array twice - I believe this is O(n)
Create a 'doubles' list which starts empty.
Then, For each element of the array:
check the element against the first element of the 'doubles' list
if it is the same, you win
if the element is higher, ditch the first element of the 'doubles' list and check again
add its double to the end of the 'doubles' list
keep going until you find a double, or get to the end of your first list.
You can also use a balanced tree; it uses extra space, but it does not modify the array.
Starting at i=0, and incrementing i, insert elements, checking if twice or half the current element is already there in the tree.
One advantage is that it will work in O(M log M) time where M = min [max{i,j}]. You could potentially change your sorting based algorithm to try and do O(M log M) but it could get complicated.
Btw, if you are using comparisons only, there is an Omega(n log n) lower bound, by reducing the element distinctness problem to this:
Duplicate the input array. Use the algorithm for this problem twice. So unless you bring hashing type stuff into the picture, you cannot get a better than Theta(n log n) algorithm!

Very hard sorting algorithm problem - O(n) time - Time complexity

Since the problem is long, I cannot describe it in the title.
Imagine that we have 2 unsorted integer arrays. Both arrays have length n and contain integers between 0 and n^765 (n to the power 765 at most).
I want to compare both arrays and find out whether they contain any common integer value, in O(n) time.
No duplicates are possible within the same array.
Any help and ideas are appreciated.
What you want is impossible. Each element will be stored in up to log(n^765) bits, which is O(log n). So simply reading the contents of both arrays will take O(n*logn).
If you have a constant upper bound on the value of each element, you can solve this in O(n) average time by storing the elements of one array in a hash table and then checking whether the elements of the other array are contained in it.
Edit:
The solution you may be looking for is to use radix sort to sort your data, after which you can easily check for duplicate elements. You would look at your numbers in base n, and do 765 passes over your data. Each pass would use a bucket sort or counting sort to sort by a single digit (in base n). This process would take O(n) time in the worst case (assuming a constant upper bound on element size). Note that I doubt anyone would ever choose this over a hash table in practice.
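A minimal Python sketch of that radix-sort approach, assuming non-negative integers, no duplicates within each array (as stated in the question), and O(1) division and modulo (Python's arbitrary-precision integers make this only approximately true for huge values). Both arrays are merged, sorted with one stable bucket pass per base-(2n) digit, and then any equal neighbours must come from different arrays. Function names are illustrative.

```python
def radix_sort(nums, base):
    """LSD radix sort for non-negative ints: one stable bucket pass per base-`base` digit."""
    if not nums:
        return []
    largest = max(nums)
    place = 1  # base ** (current digit position)
    while True:
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // place) % base].append(x)  # stable: keeps previous order within a bucket
        nums = [x for bucket in buckets for x in bucket]
        place *= base
        if place > largest:  # every digit of the largest value has been processed
            break
    return nums


def arrays_share_value(a, b):
    """Assumes no duplicates inside a or inside b, so equal neighbours come from different arrays."""
    merged = radix_sort(a + b, max(2, len(a) + len(b)))  # base ~ 2n, so each pass is O(n)
    return any(merged[k] == merged[k + 1] for k in range(len(merged) - 1))


print(arrays_share_value([5, 1, 9], [7, 9, 2]))  # True
```

With values bounded by n^765, the number of passes is bounded by a constant (about 766), so the whole check is O(n) under the stated assumptions.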
By assuming multiplication and division is O(1):
Think about numbers, you can write them as:
Number(i) = A0 * n^765 + A1 * n^764 + .... + A764 * n + A765.
To encode a number in this format, you just compute Number / n^i and Number % n^i; if you precompute n^1, n^2, n^3, ..., this takes O(n * 765) => O(n) for all numbers. Precomputing n^i takes O(i), and since i is at most 765, that is O(1) for all items.
Now you can write Number(i) as an array, Number(i) = (A0, A1, ..., A765), and radix sort the items:
first sort by A765, then A764, ..., and finally A0. All of the Ai are in the range 0..n-1, so you can use counting sort (which is O(n)) for each digit, and your radix sort is O(n * 765), which is O(n).
After the radix sort you have two sorted arrays, and you can simply find one common item in O(n), or use a merge step (as in merge sort) to find all common items (not just one).
To generalize: if the input items are bounded by O(n^C) for a constant C, they can be sorted in O(n). But because the overhead of this kind of sorting is large, quicksort and similar algorithms are usually preferred. A simpler version of this question can be found in the book Introduction to Algorithms, which asks how to sort numbers in the range 0..n^2 in O(n).
Edit: to clarify how you can find common items in two sorted lists:
You have two sorted lists. Think of how merge sort merges two sorted lists into one: you start at the heads of list 1 and list 2, advance list 1's pointer while head(list1) < head(list2), then do the same for list 2, and so on. If there is a common item, the algorithm stops at it (before reaching the end of the lists); otherwise it stops when it reaches the end of one of the lists.
It's as easy as this:
public int FindSimilarityInSortedLists(List<int> list1, List<int> list2)
{
    int i = 0;
    int j = 0;
    while (i < list1.Count && j < list2.Count)
    {
        if (list1[i] == list2[j])
            return list1[i];
        if (list1[i] < list2[j])
            i++;
        else
            j++;
    }
    return -1; // not found
}
If memory were unlimited, you could simply create a hash table with the integers as keys and the number of times each is found as values. Then, to do your "fast" lookup, you simply query for an integer, check whether it is contained in the hash table, and if found, check whether the value is 1 or 2. That would take O(n) to load and O(1) per query.
I do not think you can do it in O(n).
You have to check whether each of the n values is in the other array. That means at least n comparisons even if the other array had just 1 element. But since the other array also has n elements, you can only do it in O(n*n).
