Complexity of searching for a common number in 3 arrays - performance

There are three arrays a1, a2, a3 of size n. A function searches for a number common to all three arrays.
The algorithm is as follows:
foreach n in a1
    if n is found in a2
        if n is found in a3
            return true
return false
My guess is that the worst case is the following: a1 and a2 are equal, and a3 does not contain any number in common with a1.
The complexity of iterating through array a1 is O(n).
The complexity of searching array a2 or a3 is f(n) (we do not know how they are searched).
My guess is that the overall complexity for the worst case would be:
n * f(n) * f(n) = n * (f(n))^2
I was told that it is wrong.
What is the correct answer then?

n * f(n) * f(n) = n * (f(n))^2
I was told that it is wrong. What is the correct answer then?
The correct answer for the given algorithm is:
n * (f(n) + f(n)) = O(n * f(n))
You don't search the a3 array f(n) times for each n in a1, so you should use + instead of *.

Place the elements of a2 into a set s2 and the elements of a3 into a set s3. Both of these operations are linear in the number of elements of each array. Then, iterate over a1 and check if the element is in s2 and s3. The lookup is constant time. So the best achievable complexity of the whole algorithm is:
O(n1 + n2 + n3)
Where n1 is the number of elements of a1, and so on for n2 and n3. In other words, the algorithm is linear in the number of elements.
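For concreteness, here is a minimal C++ sketch of that approach (the function name and the choice of std::unordered_set are mine, not part of the answer):
#include <unordered_set>
#include <vector>

// Returns true if some value occurs in all three arrays.
// Expected O(n1 + n2 + n3): two set builds plus one scan of a1.
bool hasCommon(const std::vector<int>& a1,
               const std::vector<int>& a2,
               const std::vector<int>& a3)
{
    std::unordered_set<int> s2(a2.begin(), a2.end());
    std::unordered_set<int> s3(a3.begin(), a3.end());
    for (int n : a1)
        if (s2.count(n) && s3.count(n))   // each lookup is expected O(1)
            return true;
    return false;
}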

IMO the worst-case complexity is n log n: you sort each of the arrays and then compare them in a single simultaneous scan.
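The answer doesn't spell out the comparison step; here is a sketch of one way to do it, assuming a three-pointer scan over the sorted arrays (my own code):
#include <algorithm>
#include <vector>

// Sort-based variant: O(n log n) for the three sorts, O(n) for the scan.
bool hasCommonSorted(std::vector<int> a1, std::vector<int> a2, std::vector<int> a3)
{
    std::sort(a1.begin(), a1.end());
    std::sort(a2.begin(), a2.end());
    std::sort(a3.begin(), a3.end());
    std::size_t i = 0, j = 0, k = 0;
    while (i < a1.size() && j < a2.size() && k < a3.size()) {
        if (a1[i] == a2[j] && a2[j] == a3[k])
            return true;
        int smallest = std::min({a1[i], a2[j], a3[k]});
        if (a1[i] == smallest) ++i;        // advance past the smallest value
        else if (a2[j] == smallest) ++j;
        else ++k;
    }
    return false;
}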

The important thing to note here is: what is the running time of the "is found" function?
There are two possible answers:
case 1: If the a2 and a3 lists are sorted,
then this function is a log(n) binary search, and since the two searches for a given n run one after the other, you get n * 2 log(n) = O(n log(n)).
case 2: If the lists are unordered,
then each search takes n time (where n is the length of each list), and thus you get n * 2n = O(n^2).
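A sketch of case 1 (my own code, assuming a2 and a3 are already sorted):
#include <algorithm>
#include <vector>

// Case 1: a2 and a3 are sorted, so each "is found" is an O(log n) binary search.
bool hasCommonBinarySearch(const std::vector<int>& a1,
                           const std::vector<int>& a2,  // must be sorted
                           const std::vector<int>& a3)  // must be sorted
{
    for (int n : a1)
        if (std::binary_search(a2.begin(), a2.end(), n) &&
            std::binary_search(a3.begin(), a3.end(), n))
            return true;
    return false;
}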

For the given algorithm:
for each item n you're searching 2 other arrays (f(n) each, one after the other), so it's n * 2f(n) = O(n * f(n)).
BTW, the best way to do it is:
Keep a hash of the items of the first array.
Then go through a2 and mark which of those items it contains, and finally go through a3 checking for a marked item.
Saving items in a hash and looking them up is O(1).
And you're just looping through the 3 arrays one time each, so the complexity is O(n), linear in the total size of the arrays.

Well, the quick answer is O(n^2), assuming that they all have the same length. The worst case is that we find the element we are looking for in a2 at the last position, so we span the whole array, and the same for a3 (or it doesn't exist in a3); and this happens for every element of a1. Note that the searches of a2 and a3 run one after the other, not nested inside each other, so each element of a1 costs at most 2n steps; with n elements in a1, the total complexity is of order n^2.

So in the worst case you described, where a1 and a2 are equal and a3 doesn't contain any number in common with the others, for each n in a1 you will scan all of a2 and all of a3.
So the running time will be proportional to 2n^2. That is, it would be the same as writing:
int jcnt = 0, kcnt = 0;   // count the inner-loop iterations
for (int i = 0; i < n; ++i)
{
    // full scan of a2
    for (int j = 0; j < n; ++j)
    {
        ++jcnt;
    }
    // full scan of a3
    for (int k = 0; k < n; ++k)
    {
        ++kcnt;
    }
}
int total = jcnt + kcnt;  // 2 * n * n
You'll find that total will be equal to 2n^2.
Assuming, of course, that the arrays are unordered. If a2 and a3 are ordered and you can do binary search, then it would be 2n log n = O(n log n).

Related

Finding the product of the absolute differences of every pair of integers from an array

Given an array, find the product of the absolute difference of every pair of integers.
For example, given a[] = {2, 3, 5, 7},
the output would be (3-2) * (5-2) * (7-2) * (5-3) * (7-3) * (7-5) = 240.
Can it be done better than O(n^2) ?
Edit:
All elements are distinct.
If you had to compute the sum of the absolute differences, then this would be your solution.
Basically, if you take an arbitrary number, let's name it x, then its contribution to the sum is
m * x - n * x,
where m is the number of items which are smaller than x and n is the number of items which are greater than x. So, if for some reason you had a sorted array, then the index of each item would directly tell you how many greater or smaller items there are, provided it is unique in the array. If not, then you can still determine the number of higher and lower elements.
So, if the array is sorted, then computing the result is linear. Sorting the array if it's completely unsorted costs n * log(n) if you use mergesort. Hence, the complexity is
O(n + n * log(n)) = O(n * log(n))
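A minimal sketch of that sum computation (my own code; after sorting, the element at index i is greater than i elements and smaller than n-1-i of them):
#include <algorithm>
#include <vector>

// Sum of |a[i] - a[j]| over all pairs i < j, in O(n log n).
long long sumAbsDifferences(std::vector<long long> a)
{
    std::sort(a.begin(), a.end());
    long long n = static_cast<long long>(a.size()), sum = 0;
    for (long long i = 0; i < n; ++i)
        sum += (2 * i - (n - 1)) * a[i];  // i*a[i] - (n-1-i)*a[i], i.e. m*x - n*x
    return sum;
}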
But for the product of absolute differences
you have a product of the form of
(a1 - b1) * ... (...)
since you have a product of subtractions, in order to find a pattern that you could use to optimize, you need more information about your data. The input you have seems to contain primes. The product of
(a1 - b1) * (a2 - b2)
is
a1*a2 - a1*b2 - a2*b1 + b1*b2
I do not know about any pattern that you could use for your optimization, so I think this has an O(n^2) complexity.
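For reference, the straightforward O(n^2) computation looks like this (my own sketch; note that the product overflows a 64-bit integer very quickly, so a real implementation would need a big-integer type or a modulus):
#include <cstddef>
#include <cstdlib>
#include <vector>

// Product of |a[i] - a[j]| over all pairs i < j -- the plain O(n^2) way.
long long productAbsDifferences(const std::vector<long long>& a)
{
    long long product = 1;  // caution: overflows fast; illustrative only
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            product *= std::llabs(a[i] - a[j]);
    return product;
}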

Algorithm of O(nlogn) to search for sum of two elements in two arrays

Given two sorted arrays of integers A1, A2, with the same length n, and an integer x, I need to write an algorithm that runs in O(n log(n)) that determines whether there exist two elements a1, a2 (one element in each array) such that a1 + a2 = x.
At first I thought about having two index iterators i1=0, i2=0 (one for each array) that start from 0 and increase one at a time, depending on the next element of A1 being bigger/smaller than the next element of A2. But after testing it on two arrays I found out that it might miss some possible solutions...
Well, as they are both sorted already, the algorithm should be O(n) (sorting would be O(n * log(n))):
i1 = 0
i2 = A2.size - 1
while i1 < A1.size and i2 >= 0
    if A1[i1] + A2[i2] < x
        ++i1
    else if A1[i1] + A2[i2] > x
        --i2
    else
        success!!!!
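A runnable C++ transcription of that two-pointer scan (the function name is mine):
#include <cstddef>
#include <vector>

// Two sorted arrays, two pointers moving towards each other: O(n1 + n2).
bool hasPairWithSum(const std::vector<int>& A1,
                    const std::vector<int>& A2, int x)
{
    std::size_t i1 = 0;
    std::ptrdiff_t i2 = static_cast<std::ptrdiff_t>(A2.size()) - 1;
    while (i1 < A1.size() && i2 >= 0) {
        int sum = A1[i1] + A2[i2];
        if (sum < x)       ++i1;         // need a bigger sum
        else if (sum > x)  --i2;         // need a smaller sum
        else               return true;  // A1[i1] + A2[i2] == x
    }
    return false;
}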
This is a strange question, because there is an inelegant solution in time O(N log N) (for every element a1 of A1, look up x - a1 in A2 by dichotomic search), and a nice one requiring only O(N) operations.
Start from the left of A1 and the right of A2 and move left in A2 as long as a1 + a2 >= x (checking for equality as you go). Then move right one position in A1 and update the position in A2 if needed...
You start one array at index i = 0 and the other in reverse, at index j = size - 1.
At the first step you know you have the smallest value of list A and the biggest value of list B, so you subtract A[i] from x, then move the j index down until B[j] <= x - A[i].
Basically you have 2 indexes moving towards the middle.
If either index runs off its array without the sum ever matching x, then there is no pair that sums to x.

How to group numbers by size

I have n different numbers and I want to sort them into k groups, such that any number in group 1 is smaller than any number in group 2, any number in group 2 is smaller than any number in group 3, and so on until group k (the numbers do not have to be sorted inside each group). I'm asked to design an algorithm that runs in O(n log k), but I can only come up with O(n^2) ones.
How can I do this?
You could achieve this by modifying the Bucket sort algorithm; below I have included a JavaScript implementation, see Github for further details on the source code. This implementation uses 16 buckets; you will have to modify it to allow for k buckets, and you can omit the sorting within the buckets themselves. One approach would be to use 2^p buckets, where p is the smallest integer that satisfies 2^p >= k. This algorithm will then run in O(n log k).
// Copyright 2011, Tom Switzer
// Under terms of ISC License: http://www.isc.org/software/license
/**
 * Sorts an array of integers in linear time using bucket sort.
 * This gives a good speed up vs. built-in sort in new JS engines
 * (eg. V8). If a key function is given, then the result of
 * key(a[i]) is used as the integer value to sort on instead of a[i].
 *
 * @param a A JavaScript array.
 * @param key A function that maps values of a to integers.
 * @return The array a.
 */
function bsort(a, key) {
    key = key || function(x) { return x; };
    var len = a.length,
        buckets = [],
        i, j, b, d = 0;
    for (; d < 32; d += 4) {
        // This implementation uses 16 buckets, you will need to modify this
        for (i = 16; i--;)
            buckets[i] = [];
        for (i = len; i--;)
            buckets[(key(a[i]) >> d) & 15].push(a[i]);
        for (b = 0; b < 16; b++)
            // The next two lines drain each bucket back into the array;
            // the repeated passes over d produce the full sort, which you can cut down
            for (j = buckets[b].length; j--;)
                a[++i] = buckets[b][j];
    }
    return a;
}
var array = [2, 4, 1, 5, 3];
$('#result').text(bsort(array, function(x) {
    return x;
}));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="result"></div>
Note that the problem statement is to separate n different numbers into k groups. This would get more complicated if there were duplicates as noted in the wiki links below.
Any process that can determine the kth smallest element with less than O(n log(k)) complexity could be used k-1 times to produce an array of the elements corresponding to the boundaries between k groups. Then a single pass could be made on the array, doing a binary search of the boundary array to split up the array into k groups with O(n log(k)) complexity. However, it seems that at least one algorithm to find the kth smallest element also partitions the array, so that alone could be used to create the k groups.
An unordered partial sort using a selection algorithm with worst case time O(n) is possible. Wiki links:
http://en.wikipedia.org/wiki/Selection_algorithm
http://en.wikipedia.org/wiki/Selection_algorithm#Unordered_partial_sorting
http://en.wikipedia.org/wiki/Quickselect
http://en.wikipedia.org/wiki/Median_of_medians
http://en.wikipedia.org/wiki/Soft_heap#Applications
Use a K-selection algorithm with the partition function from QuickSort: QuickSelect.
Let's say K is a power of 2 for simplicity.
At the first stage we make a partition of N elements; it takes O(N) ~ p*N time, where p is some constant.
At the second stage we recursively make 2 partitions of N/2 elements; it takes 2*p*N/2 = p*N time.
At the third stage we make 4 partitions of N/4 elements; it takes 4*p*N/4 = p*N time.
...
At the last stage we make K partitions of N/K elements; it takes K*p*N/K = p*N time.
Note there are Log(K) stages, so the overall time is Log(K) * p * N = O(N*Log(K)).
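A sketch of that recursive median splitting in C++ (my own code, using std::nth_element as the selection/partition step; as above, it assumes k is a power of 2):
#include <algorithm>
#include <cstddef>
#include <vector>

// Rearranges a[lo..hi) into k consecutive groups such that every element
// of one group is <= every element of the next. Each recursion level does
// O(N) work in total and there are log2(k) levels: O(N log k) overall.
void groupBySize(std::vector<int>& a, std::size_t lo, std::size_t hi, int k)
{
    if (k <= 1 || hi - lo <= 1)
        return;
    std::size_t mid = lo + (hi - lo) / 2;
    // Median split: the median lands at mid, smaller values go left, larger right.
    std::nth_element(a.begin() + lo, a.begin() + mid, a.begin() + hi);
    groupBySize(a, lo, mid, k / 2);
    groupBySize(a, mid, hi, k / 2);
}
After the call, the k groups are the consecutive blocks of roughly n/k elements each.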
Thank you for all your help! Basically a quickselect (or any selection algorithm that finds the k-th statistic in linear time) is enough: after running it k-1 times, we make a binary search over the array of boundary elements for each element of the original array to split the elements into groups, getting O(n log k).
Also, if you don't want to make a binary search, within the quickselect you can also separate the elements and find the statistic in each subset! @rcgldr, @MBo, thank you for your ideas!

Analysis of sorting Algorithm with probably wrong comparator?

It is an interesting question from an Interview, I failed it.
An array has n different elements [A1, A2, ..., An] (in random order).
We have a comparator C, but it only has a probability p of returning the correct result.
Now we use C to implement a sorting algorithm (any kind: bubble, quick, etc.).
After sorting we have [Ai1, Ai2, ..., Ain] (it could be wrong).
Now, given a number m (m < n), the question is as follows:
What is the expectation of the size S of the intersection between {A1, A2, ..., Am} and {Ai1, Ai2, ..., Aim}? In other words, what is E[S]?
Is there any relationship among m, n and p?
If we use a different sorting algorithm, how will E[S] change?
My idea is as follows:
When m=n, E[S] = n, surely
When m=n-1, E[S] = n-1+P(An in Ain)
I don't know how to complete the answer, but I thought it could be solved through induction. Any simulation methods would also be fine, I think.
Hm, if A1, A2, ..., An are in random order (as stated in the question), then all that sorting and the probability of correctness of comparator C do not really matter. The question then reduces to the expected size of the intersection of two random subsets, each of size m, of {A1, ..., An}.
The probability that S is k is then C(m,k) * C(n-m, m-k) / C(n,m), where C(a,b) denotes "a choose b", the number of ways of choosing b elements from a elements. (Because for the second subset we have to choose k elements out of the first subset and m-k elements from the rest.)
E[S] is then sum(0 <= k <= m) k * C(m,k) * C(n-m, m-k) / C(n,m), which reduces to m/C(n,m) * sum(0 <= k <= m) C(m-1, k-1) * C(n-m, m-k). The inner sum is a basic (well-known) binomial identity giving C(n-1, m-1), so finally we get m * C(n-1, m-1) / C(n,m) = m^2/n.
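A quick Monte Carlo sanity check of the m^2/n result (my own sketch; the parameter values are arbitrary):
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Estimate E[S] for two random m-subsets of {0, ..., n-1} and compare
// the sample mean against the closed form m^2/n.
int main()
{
    const int n = 20, m = 7, trials = 200000;
    std::mt19937 rng(12345);
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::vector<char> inFirst(n);
    long long total = 0;
    for (int t = 0; t < trials; ++t) {
        std::shuffle(perm.begin(), perm.end(), rng);
        std::fill(inFirst.begin(), inFirst.end(), 0);
        for (int i = 0; i < m; ++i) inFirst[perm[i]] = 1;      // first subset
        std::shuffle(perm.begin(), perm.end(), rng);
        for (int i = 0; i < m; ++i) total += inFirst[perm[i]]; // overlap count
    }
    std::cout << "simulated E[S] = " << double(total) / trials
              << ", closed form m^2/n = " << double(m) * m / n << "\n";
}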
Partial answer for a modified question:
Let's assume that Ai1, ..., Aim is intersected not with the initial (random) start of the array, but with the first m values of the correctly sorted array Aj1, ..., Ajn (which seems to be more interesting).
Let us further assume that the comparator C is non-deterministic.
And for simplicity we assume all array elements are different and n = 2^N.
Now the partial answer here is at first restricted to
m = 1
sorting algorithm = merge sort
Aj1 is the smallest element. In merge sort each element is compared ld(n) = N times (ld denotes the base-2 logarithm). The smallest element is sorted to the first position if and only if it turns out smaller in each of its ld(n) = N comparisons. So the probability P(Ai1 = Aj1) = p^N, which for m = 1 equals the requested E[S]. So we get
E[S] = p^ld(n)
And here is a partial answer for
m = 1
sorting algorithm = bubble sort, moving the smallest elements into place first
If the smallest element is at position k at the beginning (Ak = Aj1), then it takes max(k-1, 1) correct comparisons to bring Ak to the front (Ak = Ai1). Since all n start positions are equally probable, we get
E[S] = P(Ai1=Aj1) =
= P(Ai1=Aj1 | Aj1=A1)*P(Aj1=A1) + ... + P(Ai1=Aj1 | Aj1=An)*P(Aj1=An) =
= 1/n * (p + p + p^2 + ... + p^(n-1)) = 1/n * ((1 - p^n)/(1 - p) + p - 1) =
= (2p - p^2 - p^n) / (n * (1 - p))
Good luck for the general case!

Find a permutation that minimizes the sum

I have an array of elements [(A1, B1), ..., (An, Bn)] (all are positive floats and Bi <= 1) and I need to find the permutation which minimizes the sum A1 + B1 * A2 + B1 * B2 * A3 + ... + B1 * ... * B(n-1) * An.
Definitely I can just try all of them and select the one which gives the smallest sum (this gives the correct result in O(n!)).
I tried to rewrite the sum as A1 + B1 * (A2 + B2 * (A3 + B3 * (... + B(n-1) * An))) and tried a greedy algorithm which grabs the biggest Ai element at each step (this does not yield a correct result).
Now when I look at the latter form, it looks to me that I can see an optimal substructure A(n-1) + B(n-1) * An here, and therefore I should use dynamic programming, but I cannot figure out the correct direction. Any thoughts?
I think this can be solved in O(N log(N)).
Any permutation can be obtained by swapping pairs of adjacent elements; this is why bubble sort works, for example. So let's take a look at the effect of swapping entries (A[i], B[i]) and (A[i+1], B[i+1]). We want to find out in which cases it's a good idea to make this swap. This has effect only on the ith and i+1th terms, all others stay the same. Also, both before and after the swap, both terms have a factor B[1]*B[2]*...*B[i-1], which we can call C for now. C is a positive number.
Before the swap, the two terms we're dealing with are C*A[i] + C*B[i]*A[i+1], and afterwards they are C*A[i+1] + C*B[i+1]*A[i]. This is an improvement if the difference between the two is positive:
C*(A[i] + B[i]*A[i+1] - A[i+1] - B[i+1]*A[i]) > 0
Since C is positive, we can ignore that factor and look just at the As and Bs. We get
A[i] - B[i+1]*A[i] > A[i+1] - B[i]*A[i+1]
or equivalently
(1 - B[i+1])*A[i] > (1 - B[i])*A[i+1]
Both of these expressions are nonnegative; if one of B[i] or B[i+1] is one, then the term containing 'one minus that variable' is zero (so we should swap if B[i] is one but not if B[i+1] is one); if both variables are one, then both terms are zero. Let's assume for now that neither is equal to one; then we can rewrite further to obtain
A[i]/(1 - B[i]) > A[i+1]/(1 - B[i+1])
So we should compute this expression D[i] := A[i]/(1 - B[i]) for both terms and swap them if the left one is greater than the right one. We can extend this to the case where one or both Bs are one by defining D[i] to be infinitely big in that case.
OK, let's recap - what have we found? If there is a pair i, i+1 where D[i] > D[i+1], we should swap those two entries. That means that the only case where we cannot improve the result by swapping, is when we have reordered the pairs so that the D[i] values are in increasing order -- that is, all the cases with B[i] = 1 come last (recall that that corresponds to D[i] being infinitely large) and otherwise in increasing order of D[i] value. We can achieve that by sorting with respect to the D[i] value. A quick examination of our steps above shows that the order of pairs with equal D[i] value does not impact the final value.
Computing all D[i] values can be done in a single, linear-time pass. Sorting can be done with an O(N log(N)) algorithm (we needed the swapping-of-neighbouring-elements stuff only as an argument/proof to show that this is the optimal solution, not as part of the implementation).
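A compact sketch of the resulting algorithm (my own code; D is the key derived above, with B = 1 handled as an infinite key):
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

// Orders the (A, B) pairs so that A1 + B1*A2 + B1*B2*A3 + ... is minimal:
// sort ascending by D = A / (1 - B), treating B == 1 as D = infinity.
double minimizeSum(std::vector<std::pair<double, double>> pairs)  // (A, B)
{
    auto d = [](const std::pair<double, double>& p) {
        return p.second >= 1.0 ? std::numeric_limits<double>::infinity()
                               : p.first / (1.0 - p.second);
    };
    std::sort(pairs.begin(), pairs.end(),
              [&](const auto& x, const auto& y) { return d(x) < d(y); });
    double sum = 0.0, prefix = 1.0;  // prefix = B1 * B2 * ... * B(i-1)
    for (const auto& p : pairs) {
        sum += prefix * p.first;
        prefix *= p.second;
    }
    return sum;
}
The O(n log n) cost is the sort; the final accumulation pass is linear.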
