duplicate identifying algorithm

duplicate identifying algorithm - algorithm

Rather than having two for loops : first from (var = i) 1 to length-1 of the input array and the second from i to length (using var k) then compare each array[i] == array[k], return true if an identical pair is found false if not found.
Only an array is used as the single parameter for this algo.
Can this still be optimized?

There are a few ways to tackle this depending on the memory restrictions.
With constant memory, you can sort the 2 arrays O(nlogn), and step the arrays together, advancing the pointers if the number is smaller. This will be O(n). This sums up for a total of O(nlogn).
If you can use O(n) memory, you can create a Hash Set from the first array (O(n)), and traverse the second and break if the set contains the number from the second array. This will be N*O(1) which is O(n).

Related

Count number of identical pairs

An identical pair in array are 2 indices p,q such that
0<=p<q<N and array[p]=array[q] where N is the length of the array.
Given an unsorted array, find the number identical pairs in the array.
My solution was to sort the array by values,
keeping track of indices.
Then for every index p in sorted array, count all q<N such that and
sortedarray[p].index < sortedarray[q].index and
sortedarray[p] = sortedarray[q]
Is this the correct approach. I think the complexity would be
O(N log N) for sorting based on value +
O(N^2) for counting the newsorted array that satisfies the condition.
This means I am still looking at O(N^2). Is there a better way ?
Another thought that came was for every P binary search the sorted array for all Q that satisfies the condition. Would that not reduce the complexity of the second part to O(Nlog(N))
Here is my code for second part
for(int i=0;i<N;i++){
int j=i+1;
while( j<N && sortedArray[j].index > sortedArray[i].index &&
sortedArray[j].item == sortedArray[i].item){
inversion++;
j++;
}
}
return inversion;
#Edit: I think, I mistook the complexity of second part to be O(N^2).
As in every iteration in while loop, no rescan of elements from indices 0-i occurs, linear time is required for scanning the sorted array to count the inversions. The total complexity is therefore
O(NlogN) for sorting and O(N) for linear scan count in sorted array.

You are partially correct. Sorting the array via Merge Sort or Heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)

As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
An animated example can be found here.
As you can see in the gif, the reason why this algorithm is quadratic, is because, for each element, it has to check the whole array to ensure where such element will have to go (this includes previous elements in the array!).
On the hand, in you case, you have an ordered array: e.g. [0,1,3,3,6,7,7,9,10,10]
In this situation, you will start scanning (pairwise) from the beginning, and (because of the fact that the array is ordered) you know that once an element is scanned and you pointers proceed, there cannot be any reason to rescan previous elements in the future, because otherwise you would have not proceeded in the first place.
Hence, you scan the whole array only once: O(n)

If you can allocate more memory you can get some gains.
You can reach O(n) by using a hash table which maps any values in the array to a counter indicating how often you already saw this value.
If the number of allowed values is integral and in a limited range you can directly use an array instead of a hash table. The index of value i being i itself. In that case the complexity would be O(n+m) where m is the number of allowed values (because you must first set to 0 all entries in the array and then look through all the array entries to count pairs).
Both methods gives you the number of identical values for each values in your array. Let's call this number nv_i the number of appearance of the value i in the array. Then the number of pairs of value i is: (nv_i)*(nv_i-1)/2.
You can pair:
1st i with nv_i-1 others
2nd i with nv_i-2 others
...
last i with 0
And (nv_i-1)+(nv_i-2)+...+0 = (nv_i)*(nv_i-1)/2

I've been thinking about this.... I think that if you "embed" the == condition into your sorting algorithm, then, the complexity is still O(n lg n).

Check if array B is a permutation of A

I tried to find a solution to this but couldn't get much out of my head.
We are given two unsorted integer arrays A and B. We have to check whether array B is a permutation of A. How can this be done.? Even XORing the numbers wont work as there can be several counterexamples which have same XOR value bt are not permutation of each other.
A solution needs to be O(n) time and with space O(1)
Any help is welcome!!
Thanks.

The question is theoretical but you can do it in O(n) time and o(1) space. Allocate an array of 232 counters and set them all to zero. This is O(1) step because the array has constant size. Then iterate through the two arrays. For array A, increment the counters corresponding to the integers read. For array B, decrement them. If you run into a negative counter value during iteration of array B, stop --- the arrays are not permutations of each others. Otherwise at the end (assuming A and B have the same size, a prerequisite) the counter array is all zero and the two arrays are permutations of each other.
This is O(1) space and O(n) time solution. However it is not practical, but would easily pass as a solution to the interview question. At least it should.
More obscure solutions
Using a nondeterministic model of computation, checking that the two arrays are not permutations of each others can be done in O(1) space, O(n) time by guessing an element that has differing count on the two arrays, and then counting the instances of that element on both of the arrays.
In randomized model of computation, construct a random commutative hash function and calculate the hash values for the two arrays. If the hash values differ, the arrays are not permutations of each others. Otherwise they might be. Repeat many times to bring the probability of error below desired threshold. Also on O(1) space O(n) time approach, but randomized.
In parallel computation model, let 'n' be the size of the input array. Allocate 'n' threads. Every thread i = 1 .. n reads the ith number from the first array; let that be x. Then the same thread counts the number of occurrences of x in the first array, and then check for the same count on the second array. Every single thread uses O(1) space and O(n) time.
Interpret an integer array [ a1, ..., an ] as polynomial xa1 + xa2 + ... + xan where x is a free variable and the check numerically for the equivalence of the two polynomials obtained. Use floating point arithmetics for O(1) space and O(n) time operation. Not an exact method because of rounding errors and because numerical checking for equivalence is probabilistic. Alternatively, interpret the polynomial over integers modulo a prime number, and perform the same probabilistic check.

If we are allowed to freely access a large list of primes, you can solve this problem by leveraging properties of prime factorization.
For both arrays, calculate the product of Prime[i] for each integer i, where Prime[i] is the ith prime number. The value of the products of the arrays are equal iff they are permutations of one another.
Prime factorization helps here for two reasons.
Multiplication is transitive, and so the ordering of the operands to calculate the product is irrelevant. (Some alluded to the fact that if the arrays were sorted, this problem would be trivial. By multiplying, we are implicitly sorting.)
Prime numbers multiply losslessly. If we are given a number and told it is the product of only prime numbers, we can calculate exactly which prime numbers were fed into it and exactly how many.
Example:
a = 1,1,3,4
b = 4,1,3,1
Product of ith primes in a = 2 * 2 * 5 * 7 = 140
Product of ith primes in b = 7 * 2 * 5 * 2 = 140
That said, we probably aren't allowed access to a list of primes, but this seems a good solution otherwise, so I thought I'd post it.

I apologize for posting this as an answer as it should really be a comment on antti.huima's answer, but I don't have the reputation yet to comment.
The size of the counter array seems to be O(log(n)) as it is dependent on the number of instances of a given value in the input array.
For example, let the input array A be all 1's with a length of (2^32) + 1. This will require a counter of size 33 bits to encode (which, in practice, would double the size of the array, but let's stay with theory). Double the size of A (still all 1 values) and you need 65 bits for each counter, and so on.
This is a very nit-picky argument, but these interview questions tend to be very nit-picky.

If we need not sort this in-place, then the following approach might work:
Create a HashMap, Key as array element, Value as number of occurances. (To handle multiple occurrences of the same number)
Traverse array A.
Insert the array elements in the HashMap.
Next, traverse array B.
Search every element of B in the HashMap. If the corresponding value is 1, delete the entry. Else, decrement the value by 1.
If we are able to process entire array B and the HashMap is empty at that time, Success. else Failure.
HashMap will use constant space and you will traverse each array only once.
Not sure if this is what you are looking for. Let me know if I have missed any constraint about space/time.

You're given two constraints: Computational O(n), where n means the total length of both A and B and memory O(1).
If two series A, B are permutations of each other, then theres also a series C resulting from permutation of either A or B. So the problem is permuting both A and B into series C_A and C_B and compare them.
One such permutation would be sorting. There are several sorting algorithms which work in place, so you can sort A and B in place. Now in a best case scenario Smooth Sort sorts with O(n) computational and O(1) memory complexity, in the worst case with O(n log n) / O(1).
The per element comparision then happens at O(n), but since in O notation O(2*n) = O(n), using a Smooth Sort and comparison will give you a O(n) / O(1) check if two series are permutations of each other. However in the worst case it will be O(n log n)/O(1)

The solution needs to be O(n) time and with space O(1).
This leaves out sorting and the space O(1) requirement is a hint that you probably should make a hash of the strings and compare them.
If you have access to a prime number list do as cheeken's solution.
Note: If the interviewer says you don't have access to a prime number list. Then generate the prime numbers and store them. This is O(1) because the Alphabet length is a constant.
Else here's my alternative idea. I will define the Alphabet as = {a,b,c,d,e} for simplicity.
The values for the letters are defined as:
a, b, c, d, e
1, 2, 4, 8, 16
note: if the interviewer says this is not allowed, then make a lookup table for the Alphabet, this takes O(1) space because the size of the Alphabet is a constant
Define a function which can find the distinct letters in a string.
// set bit value of char c in variable i and return result
distinct(char c, int i) : int
E.g. distinct('a', 0) returns 1
E.g. distinct('a', 1) returns 1
E.g. distinct('b', 1) returns 3
Thus if you iterate the string "aab" the distinct function should give 3 as the result
Define a function which can calculate the sum of the letters in a string.
// return sum of c and i
sum(char c, int i) : int
E.g. sum('a', 0) returns 1
E.g. sum('a', 1) returns 2
E.g. sum('b', 2) returns 4
Thus if you iterate the string "aab" the sum function should give 4 as the result
Define a function which can calculate the length of the letters in a string.
// return length of string s
length(string s) : int
E.g. length("aab") returns 3
Running the methods on two strings and comparing the results takes O(n) running time. Storing the hash values takes O(1) in space.
e.g.
distinct of "aab" => 3
distinct of "aba" => 3
sum of "aab => 4
sum of "aba => 4
length of "aab => 3
length of "aba => 3
Since all the values are equal for both strings, they must be a permutation of each other.
EDIT: The solutions is not correct with the given alphabet values as pointed out in the comments.

You can convert one of the two arrays into an in-place hashtable. This will not be exactly O(N), but it will come close, in non-pathological cases.
Just use [number % N] as it's desired index or in the chain that starts there. If any element has to be replaced, it can be placed at the index where the offending element started. Rinse , wash, repeat.
UPDATE:
This is a similar (N=M) hash table It did use chaining, but it could be downgraded to open addressing.

I'd use a randomized algorithm that has a low chance of error.
The key is to use a universal hash function.
def hash(array, hash_fn):
cur = 0
for item in array:
cur ^= hash_item(item)
return cur
def are_perm(a1, a2):
hash_fn = pick_random_universal_hash_func()
return hash_fn(a1, hash_fn) == hash_fn(a2, hash_fn)
If the arrays are permutations, it will always be right. If they are different, the algorithm might incorrectly say that they are the same, but it will do so with very low probability. Further, you can get an exponential decrease in chance for error with a linear amount of work by asking many are_perm() questions on the same input, if it ever says no, then they are definitely not permutations of each other.

I just find a counterexample. So, the assumption below is incorrect.
I can not prove it, but I think this may be possible true.
Since all elements of the arrays are integers, suppose each array has 2 elements,
and we have
a1 + a2 = s
a1 * a2 = m
b1 + b2 = s
b1 * b2 = m
then {a1, a2} == {b1, b2}
if this is true, it's true for arrays have n-elements.
So we compare the sum and product of each array, if they equal, one is the permutation
of the other.

What is the fastest sorting alogrithm for combining pre-sorted sets

Lets say that I have sets that I know are already sorted such as {0,2,10,23,65} and {3,5,8..}. What would be the best sorting algorithm that could combine any number of pre-sorted sets into one sorted set? For how effecient would this type of sorting be?

You do not need to sort them, you need to merge. This is done in O(M+N) using a simple loop that keeps two indexes looking at the current element of the two parts, adding the smaller of the two to the final sequence, and advancing the index by one.
Here is pseudocode:
int[] parts1, parts2 // Sorted parts
int i = 0, j = 0;
while i != parts1.Length || j != parts2.Length
if i != parts1.Length || j != parts2.Length
if parts1[i] < parts2[j]
res.Add(parts1[i++])
else
res.Add(parts2[j++])
else if i != parts1.Length
res.Add(parts1[i++])
else
res.Add(parts2[j++])
At each step the loop advances either i or j, executing parts1.Lenght + part2.Length times.

The simplest way would be to compare the head of lists you have, take the smallest one, and add it to a sorted set. Repeat until all lists are empty.
Efficiency-wise, it's always linear in time. It will take as long as the number of items you have to merge in total.
This is actually the second stage of Mergesort.

Suppose that there are O(n) elements in O(k) sets. A standard merge is going to be O(n * k).
If you only have 2 sets, this is not a big deal. If you have 1000 it might be. In that case you can keep a priority queue of sets organized by their next smallest element. This variant is O(n log(k)).

check if there exists a[i] = 2*a[j] in an unsorted array a?

Given a unsorted sequence of a[1,...,n] of integers, give an O(nlogn) runtime algorithm to check there are two indices i and j such that a[i] =2*a[j]. The algorithm should return i=0 and j=2 on input 4,12,8,10 and false on input 4,3,1,11.
I think we have to sort the array anyways which is O(nlogn). I'm not sure what to do after that.

Note: that can be done on O(n)1 on average, using a hash table.
set <- new hash set
for each x in array:
set.add(2*x)
for each x in array:
if set.contains(x):
return true
return false
Proof:
=>
If there are 2 elements a[i] and a[j] such that a[i] = 2 * a[j], then when iterating first time, we inserted 2*a[j] to the set when we read a[j]. On the second iteration, we find that a[i] == 2* a[j] is in set, and return true.
<=
If the algorithm returned true, then it found a[i] such that a[i] is already in the set in second iteration. So, during first itetation - we inserted a[i]. That only can be done if there is a second element a[j] such that a[i] == 2 * a[j], and we inserted a[i] when reading a[j].
Note:
In order to return the indices of the elemets, one can simply use a hash-map instead of a set, and for each i store 2*a[i] as key and i as value.
Example:
Input = [4,12,8,10]
first insert for each x - 2x to the hash table, and the index. You will get:
hashTable = {(8,0),(24,1),(16,2),(20,3)}
Now, on secod iteration you check for each element if it is in the table:
arr[0]: 4 is not in the table
arr[1]: 12 is not in the table
arr[2]: 8 is in the table - return the current index [2] and the value of 8 in the map, which is 0.
so, final output is 2,0 - as expected.
(1) Complexity notice:
In here, O(n) assumes O(1) hash function. This is not always true. If we do assume O(1) hash function, we can also assume sorting with radix-sort is O(n), and using a post-processing of O(n) [similar to the one suggested by #SteveJessop in his answer], we can also achieve O(n) with sorting-based algorithm.

Sort the array (O(n log n), or O(n) if you're willing to stretch a point about arrays of fixed-size integers)
Initialise two pointers ("fast" and "slow") at the start of the array (O(1))
Repeatedly:
increment "fast" until you find an even value >= twice the value at "slow"
if the value at "fast" is exactly twice the value at "slow", return true
increment "slow" until you find a value >= half the value at fast
if the value at "slow" is exactly half the value at "fast", return true
if one of the attempts to increment goes past the end, return false
Since each of fast and slow can be incremented at most n times total before reaching the end of the array, the "repeatedly" part is O(n).

You're right that the first step is sorting the array.
Once the array is sorted, you can find out whether a given element is inside the array in O(log n) time. So if for every of the n elements, you check for the inclusion of another element in O(log n) time, you end up with a runtime of O(n log n).
Does that help you?

Create an array of pairs A={(a[0], 0), (a[1], 1), ..., (a[n-1], n-1)}
Sort A,
For every (a[i], i) in A, do a binary search to see if there's a (a[i] * 2, j) pair or not. We can do this, because A is sorted.
Step 1 is O(n), and step 2 and 3 are O(n * log n).
Also, you can do step 3 in O(n) (there's no need for binary search). Because if the corresponding element for A[i] is at A[j], then then corresponding element for A[i+1] cannot be in A[0..j-1]. So we can keep two pointers, and find the answer in O(n). But anyway, the whole algorithm will be O(n log n) because we still do sorting.

Sorting the array is a good option - O(nlogn), assuming you don't have some fancy bucket sort option.
Once it's sorted, you need only pass through the array twice - I believe this is O(n)
Create a 'doubles' list which starts empty.
Then, For each element of the array:
check the element against the first element of the 'doubles' list
if it is the same, you win
if the element is higher, ditch the first element of the 'doubles' list and check again
add its double to the end of the 'doubles' list
keep going until you find a double, or get to the end of your first list.

You can also use a balanced tree, but it uses extra space but also does not harm the array.
Starting at i=0, and incrementing i, insert elements, checking if twice or half the current element is already there in the tree.
One advantage is that it will work in O(M log M) time where M = min [max{i,j}]. You could potentially change your sorting based algorithm to try and do O(M log M) but it could get complicated.
Btw, if you are using comparisons only, there is an Omega(n log n) lower bound, by reducing the element distinctness problem to this:
Duplicate the input array. Use the algorithm for this problem twice. So unless you bring hashing type stuff into the picture, you cannot get a better than Theta(n log n) algorithm!

Very hard sorting algorithm problem - O(n) time - Time complextiy

Since the problem is long i can not describe it at title.
Imagine that we have 2 unsorted integer arrays. Both array lenght is n and they are containing interegers between 0 - n^765 (n power 765 maximum) .
I want to compare both arrays and find out whether they contain any same integer value or not with in O(n) time complexity.
no duplicates are possible in the same array
Any help and idea is appreciated.

What you want is impossible. Each element will be stored in up to log(n^765) bits, which is O(log n). So simply reading the contents of both arrays will take O(n*logn).
If you have a constant upper bound on the value of each element, You can solve this in O(n) average time by storing the elements of one array in a hash table, and then checking if the elements of the other array are contained in it.
Edit:
The solution you may be looking for is to use radix sort to sort your data, after which you can easily check for duplicate elements. You would look at your numbers in base n, and do 765 passes over your data. Each pass would use a bucket sort or counting sort to sort by a single digit (in base n). This process would take O(n) time in the worst case (assuming a constant upper bound on element size). Note that I doubt anyone would ever choose this over a hash table in practice.

By assuming multiplication and division is O(1):
Think about numbers, you can write them as:
Number(i) = A0 * n^765 + A1 * n^764 + .... + A764 * n + A765.
for coding number to this format, you should just do Number / n^i, Number % n^i, if you precompute, n^1, n^2, n^3, ... it can be done in O(n * 765)=> O(n) for all numbers. precomputation of n^i, can be done in O(i) since i at most is 765 it's O(1) for all items.
Now you can write Numbers(i) as array: Nembers(i) = (A0, A1, ..., A765) and know you can radix sort items :
first compare all A765, then ...., All of Ai's are in the range 0..n so for comparing Ai's you can use Counting sort (Counting sort is O(n)), so your radix sort is O(n * 765) which is O(n).
After radix sort you have two sorted array and you can simply find one similar item in O(n) or use merge algorithm (like merge sort) to find most possible similarity (not just one).
for generalization if the size of input items is O(n^C) it can be sorted in O(n) (C is fix number). but because the overhead of this way of sortings are big, prefer to using quicksort and similar algorithms. Simple sample of this question can be found in Introduction to Algorithm book, which asks if the numbers are in range (0..n^2) how to sort them in O(n).
Edit: for clarifying how you can find similar items in 2-sorted lists:
You have 2 sorted list, for example in merge sort how do you can merge two sorted list to one list? you will move from start of list 1, and list 2, and move your head pointer of list1 while head(list(1)) > head(list(2)), and after that do this for list2 and ..., so if there is a similar item your algorithm will stop (before reach the end of lists), or in the end of two lists your algorithm will stop.
it's as easy as bellow:
public int FindSimilarityInSortedLists(List<int> list1, List<int> list2)
{
int i = 0;
int j = 0;
while (i < list1.Count && j < list2.Count)
{
if (list1[i] == list2[j])
return list1[i];
if (list1[i] < list2[j])
i++;
else
j++;
}
return -1; // not found
}

If memory was unlimited you could simply create a hashtable with the integers as keys and the values the number of times they are found. Then to do your "fast" look up you simple query for an integer, discover if its contained within the hash table, and if found check that the value is 1 or 2. That would take O(n) to load and O(1) to query.

I do not think you can do it O(n).
You should check n values whether they are in the other array. This means you have n comparing operations at least if the other array has just 1 element. But as you have n element it the other array as well, you can do it just O(n*n)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

duplicate identifying algorithm - algorithm

Related

Count number of identical pairs

Check if array B is a permutation of A

What is the fastest sorting alogrithm for combining pre-sorted sets

check if there exists a[i] = 2*a[j] in an unsorted array a?

Very hard sorting algorithm problem - O(n) time - Time complextiy

Categories

Resources