Efficient algorithm for finding the same number [duplicate] - algorithm

This question already has answers here:
Algorithm to find duplicate in an array
(7 answers)
Algorithm to find two repeated numbers in an array, without sorting
(25 answers)
Closed 8 years ago.
There are 1002 numbers in an array and two numbers are the same. How would you find the same number in this array efficiently or is there an efficient algorithm?
Here is my algorithm:
def find_duplicate(a):
    for i in range(0, 1002):
        for j in range(i+1, 1002):
            if a[i] == a[j]:
                return a[i]

This should work!
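#include <stdio.h>
#include <stdlib.h>
#define RANGE 1000000001 /* values are assumed to lie in [0, RANGE) */
int main(void)
{
    int arr[1002] = {0}; /* fill with your numbers */
    /* One flag byte per possible value. This is far too large for the
       stack, so allocate it zero-initialized on the heap. */
    unsigned char *seen = calloc((size_t)RANGE, sizeof *seen);
    int i;
    if (seen == NULL)
        return 1; /* not enough memory for the table */
    for (i = 0; i < 1002; i++)
    {
        if (seen[arr[i]] != 0)
        {
            printf("Duplicate number is: %d\n", arr[i]);
            break;
        }
        else
            seen[arr[i]] = 1;
    }
    free(seen);
    return 0;
}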

I think the most efficient solution is to use a hash set:
s = set()
for x in [1, 2, 3, 4, 5, 2, 3, 1]:
    if x in s:
        print(x)
        break
    s.add(x)

If your values are numbers, you can use radix sort to fill up a buffer and check for an element that appeared twice.

Your algorithm isn't bad at all! In the worst case you perform n*(n-1)/2 comparisons, which is a complexity of O(n²).
The most favourable condition would be a sorted array. Then you could just loop through it, comparing each element with its predecessor. The worst case is n-1 comparisons, in other words a complexity of O(n).
However, I assume that the array is not sorted, so you would have to add the cost of the sort. Quicksort, which is pretty good here, averages O(n log n) but has a worst case of O(n²). So in the worst case, sorting plus traversing would have a cost comparable to your algorithm.
Using a hash... well, it's optimal if memory is not a problem (see the excellent solution from @Nullpointer). The cost of the algorithm is that of a simple traversal, which is O(n).
However, in real life you are likely to have memory constraints, meaning a smaller hash table and a hash function with a risk of collisions (for example, the value modulo the size of the table). For this reason you need to store, for each hash value, the list of matching values. In such a situation, the worst case is when all numbers have the same hash H. You would still calculate each hash in a simple O(n) traversal, but when inserting each value you would need to loop through the collision list. A quick calculation shows that you would again have n*(n-1)/2 comparisons, and again a complexity of O(n²), the same as your original proposal.
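The sort-then-scan idea described in this answer can be sketched in a few lines of Python. This is only an illustration: the function name find_duplicate_sorted is made up here, and it relies on Python's built-in sorted, which is O(n log n) even in the worst case, rather than on quicksort.

```python
def find_duplicate_sorted(values):
    """Return a value that occurs at least twice, or None if all are distinct."""
    ordered = sorted(values)  # O(n log n); the input list is left untouched
    # After sorting, equal values sit next to each other, so it is enough
    # to compare each element with its predecessor.
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:
            return current
    return None


if __name__ == "__main__":
    print(find_duplicate_sorted([5, 3, 9, 3, 7]))
```

The scan itself is the n-1 comparisons mentioned above, so the total cost is dominated by the sort.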

Related

Best O(n) algorithm to find often appearing numbers?

This is an example. Each number in the array A is a value in the range [0..k]. A number x is said to appear often in A if at least 1/3 of the numbers in the array are equal to x.
What would be an O(n) algorithm for finding the often-appearing numbers in the case where k is orders of magnitude larger than n?
Why not use a hash map, i.e. a hash-based mapping (dictionary) from integers to integers? Then just iterate over your input array and compute the counters. In imperative pseudo-code:
const int often = ceiling(n/3);
hashmap m;
for int i = 1 to n do {
    if m.contains(A[i])
        m[A[i]] += 1;
    else
        m[A[i]] = 1;
    if m[A[i]] >= often
        // A[i] is appearing often
        // print it or store it in the result set, etc.
}
This is O(n) in terms of time (expected) and space.
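For reference, here is a minimal Python sketch of the same counting idea. The function name often_appearing is an assumption of this sketch, and collections.Counter plays the role of the hash map. The threshold is checked with integer arithmetic (3 * count >= n) to avoid rounding issues with n/3.

```python
from collections import Counter


def often_appearing(A):
    """Return the values that make up at least 1/3 of the list A."""
    n = len(A)
    counts = Counter(A)  # hash map from value to number of occurrences
    # count >= n/3 is the same as 3 * count >= n, with no rounding involved
    return [value for value, count in counts.items() if 3 * count >= n]


if __name__ == "__main__":
    print(often_appearing([1, 2, 1, 3, 1, 2]))
```

At most three values can ever qualify, so the result list stays small no matter how large k is.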

Fast algorithm for computing Kendall Tau distance between two integer sequences [duplicate]

This question already has answers here:
Kendall tau distance (a.k.a bubble-sort distance) between permutations in base R
(3 answers)
Closed 5 years ago.
I am given two sequences of integers with equal length, e.g.
3 1 2 5 4
5 3 2 1 4
I want to find the Kendall Tau distance between the two, i.e. the number of inverted pairs between the sequences. For instance, we have (3, 5) (3 is before 5) in the first sequence and (5, 3) in the second one. I did a quick O(n^2) algorithm to check the number, but it gets too computationally intense for large sequences of length 40,000 and on. I've read that I can count the number of inversions in doing a bubble sort, transforming the first sequence into the second one, but that's yet again O(n^2).
unsigned short n, first[50001], second[50001], s;
int sum = 0;
cin >> n;
for (int i = 1; i < n+1; i++) {
    cin >> first[i];
}
// in the second array exchange the actual entries in the sequence with their indices
// that way we can quickly check if a pair is inverted
for (int i = 1; i < n+1; i++) {
    cin >> s;
    second[s] = i;
}
for (int i = 1; i < n+1; i++) {
    for (int j = i+1; j < n+1; j++)
        // i < j always
        // when we check the indices of the respective entries in the second array
        // the relationship should stay otherwise we have an inversion
        if (second[first[i]] >= second[first[j]]) sum++;
}
This problem seems closely related to the problem of counting inversions in an array, with the difference being that in this case an inversion means "the elements are swapped relative to the other sequence" rather than "the elements are out of order." Since there's a nice O(n log n)-time algorithm for counting inversions, it seems like it would be reasonable to try to find a way to adapt that algorithm to solve this particular problem.
The divide-and-conquer algorithm for counting inversions is based on mergesort and assumes that given any two elements in the sequence there's a fast (O(1)-time) way to compare them to see if they're in the proper order. If we can find a way to somehow annotate the elements of the second sequence so that in time O(1) we can determine whether any pair of elements from that sequence are in order or out of order, then we can just run the fast counting inversions algorithm to get the answer you're looking for.
Here's one way to do this. Create some auxiliary data structure (say, a balanced BST) that associates the elements of the first array with their indices in the first array. Then, make a copy of the second array, annotating each element with its corresponding position in the first array. This in total takes time O(n log n). Then, run the standard O(n log n)-time algorithm for counting inversions in the second array, except when comparing elements, compare by their associated index rather than their values. This in total takes time O(n log n) to complete.
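A rough Python sketch of this approach follows. It assumes both sequences are permutations of the same distinct values, and it swaps the balanced BST for a dictionary, which makes the annotation step expected O(n). The function names are made up for this sketch.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions), via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, left_count = count_inversions(seq[:mid])
    right, right_count = count_inversions(seq[mid:])
    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every element still waiting in left
            merged.append(right[j])
            j += 1
            count += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def kendall_tau_distance(first, second):
    """Number of pairs that appear in opposite order in the two sequences."""
    # Annotate each element of the second sequence with its index in the first.
    position_in_first = {value: index for index, value in enumerate(first)}
    relabelled = [position_in_first[value] for value in second]
    return count_inversions(relabelled)[1]


if __name__ == "__main__":
    print(kendall_tau_distance([3, 1, 2, 5, 4], [5, 3, 2, 1, 4]))
```

Comparing by the annotated index is what turns "swapped relative to the other sequence" into an ordinary inversion count.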

Big O - is n always the size of the input?

I made up my own interview-style problem, and have a question on the big O of my solution. I will state the problem and my solution below, but first let me say that the obvious solution involves a nested loop and is O(n^2). I believe I found an O(n) solution, but then I realized it depends not only on the size of the input, but also on the largest value in the input. It seems like my running time of O(n) is only a technicality, and that it could easily run in O(n^2) time or worse in real life.
The problem is:
For each item in a given array of positive integers, print all the other items in the array that are multiples of the current item.
Example Input:
[2 9 6 8 3]
Example Output:
2: 6 8
9:
6:
8:
3: 9 6
My solution (in C#):
private static void PrintAllDivisibleBy(int[] arr)
{
    Dictionary<int, bool> dic = new Dictionary<int, bool>();
    if (arr == null || arr.Length < 2)
        return;
    int max = arr[0];
    for (int i = 0; i < arr.Length; i++)
    {
        if (arr[i] > max)
            max = arr[i];
        dic[arr[i]] = true;
    }
    for (int i = 0; i < arr.Length; i++)
    {
        Console.Write("{0}: ", arr[i]);
        int multiplier = 2;
        while (true)
        {
            int product = multiplier * arr[i];
            if (dic.ContainsKey(product))
                Console.Write("{0} ", product);
            if (product >= max)
                break;
            multiplier++;
        }
        Console.WriteLine();
    }
}
So, if 2 of the array items are 1 and n, where n is the array length, the inner while loop will run n times, making this equivalent to O(n^2). But, since the performance is dependent on the size of the input values, not the length of the list, that makes it O(n), right?
Would you consider this a true O(n) solution? Is it only O(n) due to technicalities, but slower in real life?
Good question! The answer is that, no, n is not always the size of the input: you can't really talk about O(n) without defining what the n means, but people often use imprecise language and imply that n is "the most obvious thing that scales here". Technically we should usually say things like "This sort algorithm performs a number of comparisons that is O(n log n) in the number of elements in the list", being specific about both what n is and what quantity we are measuring (comparisons).
If you have an algorithm that depends on the product of two different things (here, the length of the list and the largest element in it), the proper way to express that is in the form O(m*n), and then define what m and n are for your context. So, we could say that your algorithm performs O(m*n) multiplications, where m is the length of the list and n is the largest item in the list.
An algorithm is O(n) when you iterate over n elements and perform some constant-time operation in each iteration. The inner while loop of your algorithm is not constant time, as it depends on the size of the biggest number in your array.
Your algorithm's best-case run time is O(n). This is the case when all n numbers are the same.
Your algorithm's worst-case run time is O(k*n), where k is the maximum value of int on your machine, if you really insist on putting an upper bound on k's value. For a 32-bit int the maximum value is 2,147,483,647. You can argue that this k is a constant, but this constant is clearly not fixed for every input array, and not negligible.
Would you consider this a true O(n) solution?
The runtime actually is O(nm), where m is the maximum element of arr. If the elements in your array are bounded by a constant, you can consider the algorithm to be O(n).
Can you improve the runtime? Here's what else you can do. First notice that you can ensure that the elements are distinct (you compress the array into a hash map that stores how many times each element occurs in the array). Then your runtime would be max/a[0] + max/a[1] + max/a[2] + ... <= max + max/2 + ... + max/max = O(max*log(max)) (assuming your array arr is sorted, so that a[i] >= i+1). If you combine this with the obvious O(n^2) algorithm, you get an O(min(n^2, max*log(max))) algorithm.
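As a rough illustration of the deduplicated variant, here is a Python sketch. It assumes positive integers, returns the multiples of each distinct value in ascending order rather than in array order, and uses a made-up function name.

```python
def multiples_in_array(arr):
    """Map each distinct value in arr to the larger values in arr it divides."""
    present = set(arr)  # removes duplicates and gives O(1) expected lookups
    largest = max(arr)
    result = {}
    for x in present:
        # Visit 2x, 3x, ... up to the largest value: about largest/x steps.
        result[x] = [m for m in range(2 * x, largest + 1, x) if m in present]
    return result


if __name__ == "__main__":
    for value, multiples in multiples_in_array([2, 9, 6, 8, 3]).items():
        print(value, multiples)
```

Because the values in the set are distinct, the inner loops perform about max/x steps per value x, and these add up to at most max * (1 + 1/2 + ... + 1/max), which is where the O(max*log(max)) bound comes from.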

Algorithm for sum-up to 0 from 4 set

I have 4 arrays A, B, C, D of size n. n is at most 4000. The elements of each array are 30 bit (positive/negative) numbers. I want to know the number of ways, A[i]+B[j]+C[k]+D[l] = 0 can be formed where 0 <= i,j,k,l < n.
The best algorithm I derived is O(n^2 lg n), is there a faster algorithm?
OK, here is my O(n^2 lg(n^2)) algorithm.
Suppose there are four arrays A[], B[], C[], D[]. We want to find the number of ways A[i]+B[j]+C[k]+D[l] = 0 can be made, where 0 <= i,j,k,l < n.
So sum up all possible pairs from A[] and B[] and place them in another array E[] that contains n*n elements.
int k=0;
for(i=0;i<n;i++)
{
    for(j=0;j<n;j++)
    {
        E[k++]=A[i]+B[j];
    }
}
The complexity of the above code is O(n^2).
Do the same thing for C[] and D[].
int l=0;
for(i=0;i<n;i++)
{
    for(j=0;j<n;j++)
    {
        AUX[l++]=C[i]+D[j];
    }
}
The complexity of the above code is O(n^2).
Now sort AUX[] so that you can easily find the number of occurrences of each unique element in AUX[].
The sorting complexity of AUX[] is O(n^2 lg(n^2)).
Now declare a structure:
struct myHash
{
    int value;
    int valueOccuredNumberOfTimes;
}F[];
Now place in F[] the unique elements of AUX[] and the number of times each appeared.
Its complexity is O(n^2).
possibleQuardtupple=0;
Now for each item of E[], do the following:
for(i=0;i<k;i++)
{
    x=E[i];
    find -x in structure F[] using binary search.
    if(found in j'th position)
    {
        possibleQuardtupple+=number of occurrences of -x in F[j];
    }
}
The loop over i performs n^2 iterations in total, and in each iteration the binary search does lg(n^2) comparisons. So the overall complexity is O(n^2 lg(n^2)).
The number of ways 0 can be reached is possibleQuardtupple.
You can use either an STL map or binary search here, but the STL map is slow, so it's better to use binary search.
I hope my explanation is clear enough to understand.
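The steps above can be sketched compactly in Python. This is a hedged illustration, not the author's code: the function name is invented, and instead of building the F[] structure it counts the occurrences of -x directly with two binary searches (bisect_left and bisect_right) on the sorted pair sums.

```python
from bisect import bisect_left, bisect_right


def count_zero_quadruples(A, B, C, D):
    """Count index tuples (i, j, k, l) with A[i] + B[j] + C[k] + D[l] == 0."""
    sums_ab = [a + b for a in A for b in B]          # plays the role of E[]
    sums_cd = sorted(c + d for c in C for d in D)    # plays the role of sorted AUX[]
    total = 0
    for x in sums_ab:
        # Occurrences of -x in sums_cd: width of the run of equal values.
        total += bisect_right(sums_cd, -x) - bisect_left(sums_cd, -x)
    return total


if __name__ == "__main__":
    print(count_zero_quadruples([1, 2], [-2, -1], [-1, 2], [0, 2]))
```

Both lists hold n^2 entries, so the sort and the n^2 binary searches each cost O(n^2 lg(n^2)), which is the same as O(n^2 lg n).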
I disagree that your solution is in fact as efficient as you say. In your solution populating E[] and AUX[] is O(N^2) each, so 2.N^2. These will each have N^2 elements.
Generating x = O(N)
Sorting AUX = O((2N)*log((2N)))
The binary search for E[i] in AUX[] is based on N^2 elements to be found in N^2 elements.
Thus you are still doing N^4 work, plus extra work for generating the intermediate arrays and for sorting the N^2 elements in AUX[].
I have a solution (work in progress) but I find it very difficult to calculate how much work it is. I deleted my previous answer. I will post something when I am more sure of myself.
I need to find a way to compare O(X)+O(Z)+O(X^3)+O(X^2)+O(Z^3)+O(Z^2)+X.log(X)+Z.log(Z) to O(N^4) where X+Z = N.
It is clearly less than O(N^4) ... but by how much???? My math is failing me here....
The judgement is wrong. The supplied solution generates arrays with size N^2. It then operates on these arrays (sorting, etc).
Therefore the order of work, which would normally be O(n^2.log(n)), should have n substituted with n^2. The result is therefore O((n^2)^2.log(n^2)).

How to find sum of elements from given index interval (i, j) in constant time?

Given an array, how can we find the sum of the elements in the index interval (i, j) in constant time? You are allowed to use extra space.
Example:
A: 3 2 4 7 1 -2 8 0 -4 2 1 5 6 -1
length = 14
int getsum(int* arr, int i, int j, int len);
// suppose int array "arr" is initialized here
int sum = getsum(arr, 2, 5, 14);
sum should be 10 in constant time.
If you can spend O(n) time to "prepare" auxiliary information that lets you calculate sums in O(1), you can easily do it.
Preparation (O(n)):
aux[0] = 0;
foreach i in (1..LENGTH) {
    aux[i] = aux[i-1] + arr[i];
}
Query (O(1)), where arr is indexed from 1 to LENGTH:
sum(i,j) = aux[j] - aux[i-1];
I think that was the intent, because otherwise it's impossible: to calculate sum(0,length-1) for any length, you have to scan the whole array, which takes at least linear time.
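Here is a small Python sketch of the same prefix-sum idea, adapted to 0-based indices with both ends inclusive so that it matches the getsum(arr, 2, 5, 14) example from the question. The function names are assumptions of this sketch.

```python
def build_prefix_sums(arr):
    """Return aux where aux[k] is the sum of the first k elements of arr."""
    aux = [0]
    for value in arr:
        aux.append(aux[-1] + value)  # O(n) preparation in total
    return aux


def range_sum(aux, i, j):
    """Sum of arr[i..j], both ends inclusive, 0-based, in O(1)."""
    return aux[j + 1] - aux[i]


if __name__ == "__main__":
    arr = [3, 2, 4, 7, 1, -2, 8, 0, -4, 2, 1, 5, 6, -1]
    aux = build_prefix_sums(arr)
    print(range_sum(aux, 2, 5))
```

Storing one extra leading zero in aux avoids a special case for intervals that start at index 0.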
It cannot be done in constant time unless you store the information.
You would have to do something like specially modify the array to store, for each index, the sum of all values between the start of the array and this index, then using subtraction on the range to get the difference in sums.
However, nothing in your code sample seems to allow this. The array is created by the user (and can change at any time) and you have no control over it.
Any algorithm that needs to scan a group of elements in a sequential unsorted list will be O(n).
The previous answers are absolutely fine for the question asked. I am just adding a point for the case where the question is changed a bit:
Find the sum of the interval if the array changes dynamically.
If array elements change, then we have to recompute the sums stored in the auxiliary array, as in @Pavel Shved's approach.
Recomputing is an O(n) operation, so we need to reduce the cost of each update and query to O(log n) by making use of a segment tree.
http://www.geeksforgeeks.org/segment-tree-set-1-sum-of-given-range/
There are three well-known approaches to range queries over [l,r]:
1. Segment tree: O(log N) per query, so O(N log N) in total for N queries
2. Fenwick tree: O(log N) per query, so O(N log N) in total for N queries
3. Mo's algorithm (square root decomposition)
The first two can deal with modifications to the list/array given to you. The third, Mo's algorithm, is an offline algorithm, which means all the queries need to be given to you in advance; modifications to the list/array are not allowed. For the implementation, runtime and further reading on this algorithm you can check out this Medium blog, which explains it with code. Very few people actually know about this method.
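To make the dynamic case concrete, here is a minimal Fenwick tree (binary indexed tree) sketch in Python. The class and method names are made up for this example, indices are 0-based on the outside, and both updates and range sums take O(log N).

```python
class FenwickTree:
    """Supports point updates and range sums in O(log N) each."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)  # 1-based internal array
        for index, value in enumerate(values):
            self.add(index, value)

    def add(self, index, delta):
        """Add delta to the element at 0-based position index."""
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node that covers this position

    def prefix_sum(self, count):
        """Sum of the first count elements."""
        total = 0
        i = count
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def range_sum(self, i, j):
        """Sum of elements i..j, both ends inclusive, 0-based."""
        return self.prefix_sum(j + 1) - self.prefix_sum(i)


if __name__ == "__main__":
    ft = FenwickTree([3, 2, 4, 7, 1, -2, 8, 0, -4, 2, 1, 5, 6, -1])
    print(ft.range_sum(2, 5))
    ft.add(3, 5)  # the element at index 3 changes from 7 to 12
    print(ft.range_sum(2, 5))
```

A segment tree offers the same bounds and extends more easily to other operations such as minimum or maximum, at the cost of somewhat more code.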
This problem can be solved in O(n^2) time and O(n) space, or in O(n) time and O(n) space.
The better solution is the one with O(n) time and O(n) space.
Suppose a[] = {1,3,5,2,6,4,9} is given.
Create an array sum[] in which each entry holds the sum of the elements from index 0 up to that index. For a[], the sum array is sum[] = {1,4,9,11,17,21,30}, i.e. {1, 1+3, 1+3+5, ...}. Building it takes O(n) time and O(n) space.
Given indices i and j, the answer is read directly from the sum array: add(i,j) = sum[j] - sum[i-1] for i > 0 (for i = 0 it is simply sum[j]). Each query takes O(1) time and O(1) extra space.
So, this program takes O(n) time and O(n) space overall.
int sum[] = new int[l];
sum[0] = a[0];
System.out.print(sum[0] + " ");
for (int i = 1; i < l; i++)
{
    sum[i] = sum[i-1] + a[i];
    System.out.print(sum[i] + " ");
}
/* this prints 1 4 9 11 17 21 30 and takes O(n) time and O(n) space */
sum(i,j) = sum[j] - sum[i-1] /* this gives the sum of the elements from index i to j and takes O(1) time and O(1) space */

Resources