I'm implementing an efficient algorithm to find K-complementary pairs of numbers in a given array A.
I intended to implement an O(n) algorithm, and it is certainly O(n) when all numbers in A are distinct. However, I wonder whether it is still O(n) when numbers in A may be equal, as in the test provided below, where all elements equal 1. In that test, with 3 elements provided, we clearly go through the external loop 3 times and the internal loop 3 times. However, for a general array of n elements it seemed to me untrue that all elements would be equal.
That's why I believed the complexity of this inner loop could be counted as O(1).
UPDATED
Inspired by, but not fully satisfied with, the answers to my question, I did some deeper research into the definitions of complexity notation and discovered that I wasn't precise with my calculations above.
f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
By definition, Big-O gives an asymptotic upper bound. This means that if the worst case involves an internal for loop that can iterate through n elements, then the complexity is O(n^2). Even if that scenario is statistically very rare, we cannot say that the complexity is less than O(n^2). Interestingly, if O(n^2) is true, then O(n^3) is true by definition as well.
Moreover, it is also correct to say that the Big-Ω complexity of this algorithm is Ω(n). We use Big-Ω notation for asymptotic lower bounds, since it bounds the growth of the running time from below for large enough input sizes.
To the best of my knowledge, we cannot state a Big-Θ bound until the Big-O and Big-Ω bounds are tightened to the same function.
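Applied to the algorithm below, here is my own rough count (assuming, as a worst case, an array where all elements are equal and k is twice that common value, so the inner loop runs over all n indices for every j; the lower bound holds because the map-building pass alone iterates over all n elements):

\begin{aligned}
f_{\text{worst}}(n) &= \sum_{j=0}^{n-1} n = n^2 &&\Rightarrow\ f(n) = O(n^2),\\
f(n) &\ge n &&\Rightarrow\ f(n) = \Omega(n).
\end{aligned}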
It might be worth noticing that the algorithm below would be linear if it only checked whether the array contains a complementary pair (see the sketch after the test below). But the complexity grows to O(n^2) when it collects all complementary pairs. I watched a Google interview presentation hoping to get more clues on the topic, but they simplified the problem to a "hasComplementaryPair" function.
The next step for the ComplementaryPairs algorithm is to find a different, faster algorithm, or to prove that it can't be done faster.
End of UPDATED
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Pair is assumed to be a simple value class with sensible equals/hashCode,
// e.g. javafx.util.Pair or a small custom implementation.
public class ComplementaryPairs {

    public Set<Pair<Integer, Integer>> process(Integer[] A, Integer k) {
        Set<Pair<Integer, Integer>> pairs = new HashSet<>();
        if (A == null) {
            return pairs;
        }
        /*
         * 1. Build the differential map:
         *    < k - A[i], i >
         */
        Map<Integer, List<Integer>> map = new HashMap<>();
        for (int i = 0; i < A.length; i++) {
            put(map, k - A[i], i);
        }
        /*
         * 2. Collect pairs.
         */
        for (int j = 0; j < A.length; j++) {
            if (map.containsKey(A[j])) {
                /*
                 * I wondered whether this loop spoils the O(n) complexity,
                 * because in a scenario where, say, A has 10 elements and every
                 * element is the same, we have to go through 10 elements in the
                 * external loop and 10 elements in the List held by the HashMap.
                 *
                 * However, I assumed that for a general array of n elements the
                 * situation described above is impossible; in other words, I
                 * believed this inner loop could be counted as O(1).
                 */
                for (Integer iIndex : map.get(A[j])) {
                    pairs.add(new Pair<>(j, iIndex));
                }
            }
        }
        return pairs;
    }

    private void put(Map<Integer, List<Integer>> map, Integer key, Integer value) {
        if (map.containsKey(key)) {
            map.get(key).add(value);
        } else {
            /* This may be improved, so a List is not created for a single element only. */
            List<Integer> list = new LinkedList<>();
            list.add(value);
            map.put(key, list);
        }
    }
}
Consider the test below:

@Test
public void testWhenArrayContainElementsOfEqualValue() {
    // prepare
    Integer[] A = {1, 1, 1};
    Integer k = 2;
    // execute
    Set<Pair<Integer, Integer>> resultSet = complementaryPairs.process(A, k);
    System.out.println(resultSet);
    // assert
    assertTrue(resultSet.contains(new Pair<>(0, 0)));
    assertTrue(resultSet.contains(new Pair<>(0, 1)));
    assertTrue(resultSet.contains(new Pair<>(0, 2)));
    assertTrue(resultSet.contains(new Pair<>(1, 0)));
    assertTrue(resultSet.contains(new Pair<>(1, 1)));
    assertTrue(resultSet.contains(new Pair<>(1, 2)));
    assertTrue(resultSet.contains(new Pair<>(2, 0)));
    assertTrue(resultSet.contains(new Pair<>(2, 1)));
    assertTrue(resultSet.contains(new Pair<>(2, 2)));
}
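For completeness, here is a rough sketch of the existence-only check mentioned in the update. hasComplementaryPair is a hypothetical method of mine, not part of the class above; assuming O(1) HashSet operations on average, it stays linear, and it counts a pair (i, i) the same way the test above does:

private boolean hasComplementaryPair(Integer[] A, Integer k) {
    if (A == null) {
        return false;
    }
    Set<Integer> seen = new HashSet<>();
    for (Integer a : A) {
        // Add a first, so a self-pair (A[i] + A[i] == k) is detected as well,
        // matching the (i, i) pairs asserted in the test above.
        seen.add(a);
        if (seen.contains(k - a)) {
            return true;
        }
    }
    return false;
}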
Have you read this? https://codereview.stackexchange.com/a/145076
Creating the map is O(n) run time, but when you write:
for (int j = 0; j < A.length; j++) {
    if (map.containsKey(A[j])) {
        for (Integer iIndex : map.get(A[j])) {
            pairs.add(new Pair<>(j, iIndex));
        }
    }
}
You have a potential complexity of O(n^2). You should try to avoid that, and then the time complexity is clear.
As for the code, the map will contain a key only when there exists an i such that k - A[i] == key.
Example:
A = [0, 0, 0, 0, 0, 0], k = 0
Since the key 0 maps to all elements of the array and A[i] = 0 for all i, every i satisfies the condition in this example.
So the run time is n^2; the Big-O complexity is O(n^2).
And now some duplication problems appear.
Related
I made the acquaintance of big-O a couple of weeks ago and am trying to get to grips with it, but although there's a lot of material out there about calculating time complexity, I can't seem to find out how to make algorithms more efficient.
I've been practicing with the demo challenge in Codility:
Write a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
The given array can have integers between -1 million and 1 million.
I started with a brute-force algorithm:
public int solution(int[] A)
{
    for (int number = 1; number < 1000000; number++)
    {
        // doesContain presumably does a linear scan of A for number
        if (!doesContain(A, number))
            return number;
    }
    return 0;
}
This passed all tests for correctness but scored low on performance because the running time was way past the limit, time complexity being O(N**2).
I then tried putting the array into an arraylist, which reduces big-O since each object is "touched" only once, and I can use .Contains which is more efficient than iteration (not sure if that's true; I just sort of remember reading it somewhere).
public int solution(int[] A)
{
    ArrayList myArr = new ArrayList();
    for (int i = 0; i < A.Length; i++)
    {
        myArr.Add(A[i]);
    }
    for (int i = 1; i < 1000000; i++)
    {
        if (!myArr.Contains(i))
            return i;
    }
    return 0;
}
Alas, the time complexity is still O(N**2), and I can't find explanations of how to cut down the running time.
I know I shouldn't be using brute force, but can't seem to think of any other ways... Anyone have an explanation of how to make this algorithm more efficient?
This is a typical interview question. Forget the sort; this is a detection problem, O(n + m) on n elements and a max value of m (which is given as a constant).
boolean found[1000002] = False   // set all elements to false
for i in A                       // check all items in the input array
    if i > 0
        found[i] = True
for i in (1, 1000002)            // the answer is positive; 1000001 covers the "all present" case
    if not found[i]
        print "Smallest missing number is", i
        break
These programs compute the sum a_0 + a_1*x + a_2*x^2 + ... + a_(n-1)*x^(n-1), i.e. the sum of a_i * x^i for i = 0 to n - 1 (see Program 1 and Program 2 below).
I am trying to figure out big-O calculations. I have done a lot of studying, but I am having a problem getting this down. I understand that big-O is the worst-case scenario, or upper bound. From what I can figure, program one has two for loops: one runs for the length of the array, and the other runs up to the value of the first loop's counter, up to the length of the array. I think that if both ran the full length of the array, it would be quadratic, O(N^2). Since the second loop only runs the full length of the array once, I am thinking O(N log N).
The second program has only one for loop, so it would be O(N).
Am I close? If not, please explain how I would calculate this. Since this is homework, I will have to be able to figure out something like this on the test.
Program 1
// assume input array a is not null
public static double q6_1(double[] a, double x)
{
    double result = 0;
    for (int i = 0; i < a.length; i++)
    {
        double b = 1;
        for (int j = 0; j < i; j++)
        {
            b *= x;
        }
        result += a[i] * b;
    }
    return result;
}
Program 2
// assume input array a is not null
public static double q6_2(double[] a, double x)
{
    double result = 0;
    for (int i = a.length - 1; i >= 0; i--)
    {
        result = result * x + a[i];
    }
    return result;
}
I'm using N to refer to the length of the array a.
The first one is O(N^2). The inner loop runs 0, 1, 2, 3, ..., N - 1 times over the successive outer iterations. This sum is N(N-1)/2, which is O(N^2).
The second one is O(N). It is simply iterating through the length of the array.
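Written out as a formula, the total number of inner-loop iterations in Program 1 is:

\sum_{i=0}^{N-1} i = \frac{N(N-1)}{2} = O(N^2)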
The complexity of a program is basically the number of instructions executed.
When we talk about the upper bound, we are considering the worst case, which every programmer should take into consideration.
Let n = a.length;
Now, coming back to your question: you are saying that the time complexity of the first program should be O(n log n), which is wrong. When i = a.length - 1, the inner loop iterates i times (from j = 0 to j = i - 1), so the inner loop's cost grows with i. Hence the complexity is O(n^2).
You are correct in judging the time complexity of the second program, which is O(n).
I have a question about the complexity of a recursive function
The code (in C#) is like this:
public void sort(int[] a, int n)
{
    bool done = true;
    int j = 0;
    while (j <= n - 2)
    {
        if (a[j] > a[j + 1])
        {
            // swap a[j] and a[j + 1]
            done = false;
        }
        j++;
    }
    j = n - 1;
    while (j >= 1)
    {
        if (a[j] < a[j - 1])
        {
            // swap a[j] and a[j - 1]
            done = false;
        }
        j--;
    }
    if (!done)
        sort(a, n);
}
Now, the difficulty I have is the recursive part of the function.
In all of the recursions I have seen so far, we can determine the number of recursive calls based on the input size, because each time we call the function with a smaller input.
But for this problem, the recursive part doesn't depend on the input size; instead it depends on whether the elements are sorted or not. I mean, if the array is already sorted, the function will run in O(n) because of the two loops and no recursive calls (I hope I'm right about this part).
How can we determine the big-O of the recursive part?
O(f(n)) means that your algorithm never runs slower than a constant times f(n), regardless of the particular input (considering only the input's size). So you should find the worst case for an input of size n.
This one looks like a bubble sort algorithm (although weirdly complicated), which is O(n^2). In the worst case, every call of the sort function takes O(n) and transports the highest number to the end of the array; you have n items, so it's O(n) * O(n) => O(n^2).
This is bubble sort. It's O(n^2). Since the algorithm swaps adjacent elements, the running time is proportional to the number of inversions in a list, which is O(n^2). The number of recursions will be O(n). The backward pass just causes it to recurse about half the time but doesn't affect the actual complexity--it's still doing the same amount of work.
I have read that quicksort is much faster than mergesort in practice, and the reason for this is the hidden constant.
Well, the solution to the randomized quicksort recurrence is 2n*ln(n) ≈ 1.39*n*log2(n), which means that the constant in quicksort is 1.39.
But what about mergesort? What is the constant in mergesort?
Let's see if we can work this out!
In merge sort, at each level of the recursion, we do the following:
1. Split the array in half.
2. Recursively sort each half.
3. Use the merge algorithm to combine the two halves together.
So how many comparisons are done at each step? Well, the divide step doesn't make any comparisons; it just splits the array in half. Step 2 doesn't (directly) make any comparisons; all comparisons are done by recursive calls. In step 3, we have two arrays of size n/2 and need to merge them. This requires at most n comparisons, since each step of the merge algorithm does a comparison and then consumes some array element, so we can't do more than n comparisons.
Combining this together, we get the following recurrence:
C(1) = 0
C(n) = 2C(n / 2) + n
(As mentioned in the comments, the linear term is more precisely (n - 1), though this doesn’t change the overall conclusion. We’ll use the above recurrence as an upper bound.)
To simplify this, let's define n = 2^k and rewrite this recurrence in terms of k:
C'(0) = 0
C'(k) = 2C'(k - 1) + 2^k
The first few terms here are 0, 2, 8, 24, ... . This looks something like k * 2^k, and we can prove this by induction. As our base case, when k = 0, the first term is 0, and the value of k * 2^k is also 0. For the inductive step, assume the claim holds for some k and consider k + 1. Then the value is 2(k * 2^k) + 2^(k + 1) = k * 2^(k + 1) + 2^(k + 1) = (k + 1) * 2^(k + 1), so the claim holds for k + 1, completing the induction. Thus the value of C'(k) is k * 2^k. Since n = 2^k, this means that, assuming that n is a perfect power of two, we have that the number of comparisons made is
C(n) = n lg n
Impressively, this is better than quicksort! So why on earth is quicksort faster than merge sort? This has to do with other factors that have nothing to do with the number of comparisons made. Primarily, since quicksort works in place while merge sort works out of place, the locality of reference is not nearly as good in merge sort as it is in quicksort. This is such a huge factor that quicksort ends up being much, much better than merge sort in practice, since the cost of a cache miss is pretty huge. Additionally, the time required to sort an array doesn't just take the number of comparisons into account. Other factors like the number of times each array element is moved can also be important. For example, in merge sort we need to allocate space for the buffered elements, move the elements so that they can be merged, then merge back into the array. These moves aren't counted in our analysis, but they definitely add up. Compare this to quicksort's partitioning step, which moves each array element exactly once and stays within the original array. These extra factors, not the number of comparisons made, dominate the algorithm's runtime.
This analysis is a bit less precise than the optimal one, but Wikipedia confirms that the analysis is roughly n lg n and that this is indeed fewer comparisons than quicksort's average case.
Hope this helps!
In the worst case and assuming a straight-forward implementation, the number of comparisons to sort n elements is
n⌈lg n⌉ − 2^⌈lg n⌉ + 1
where lg n indicates the base-2 logarithm of n.
This result can be found in the corresponding Wikipedia article or recent editions of The Art of Computer Programming by Donald Knuth, and I just wrote down a proof for this answer.
Merging two sorted arrays (or lists) of sizes k and m takes at most k + m - 1 comparisons, and min{k, m} at best. (After each comparison we can write one value to the target; once one of the two is exhausted, no more comparisons are necessary.)
Let C(n) be the worst case number of comparisons for a mergesort of an array (a list) of n elements.
Then we have C(1) = 0, C(2) = 1, pretty obviously. Further, we have the recurrence
C(n) = C(floor(n/2)) + C(ceiling(n/2)) + (n-1)
An easy induction shows
C(n) <= n*log_2 n
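For instance, assuming n is a power of two and that C(n/2) <= (n/2) * log_2(n/2) already holds, the induction step works out as:

\begin{aligned}
C(n) &= 2\,C(n/2) + (n - 1)\\
     &\le 2 \cdot \frac{n}{2}\log_2\frac{n}{2} + n - 1\\
     &= n(\log_2 n - 1) + n - 1\\
     &= n\log_2 n - 1 \;\le\; n\log_2 n.
\end{aligned}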
On the other hand, it's easy to see that we can come arbitrarily close to the bound (for every ε > 0, we can construct cases needing more than (1-ε)*n*log_2 n comparisons), so the constant for mergesort is 1.
Merge sort is O(n log n) and at each step, in the "worst" case (for number of comparisons), performs a comparison.
Quicksort, on the other hand, is O(n^2) in the worst case.
C++ program to count the number of comparisons in merge sort.
First the program will sort the given array, then it will show the number of comparisons.
#include <iostream>
using namespace std;

int count = 0;   /* counts the number of comparisons */

int merge(int arr[], int l, int m, int r)
{
    int i = l;       /* index into the left subarray   */
    int j = m + 1;   /* index into the right subarray  */
    int k = l;       /* index into the temporary array */
    int temp[r + 1];

    while (i <= m && j <= r)
    {
        if (arr[i] <= arr[j])
        {
            temp[k] = arr[i];
            i++;
        }
        else
        {
            temp[k] = arr[j];
            j++;
        }
        k++;
        count++;     /* one comparison per loop iteration */
    }
    while (i <= m)
    {
        temp[k] = arr[i];
        i++;
        k++;
    }
    while (j <= r)
    {
        temp[k] = arr[j];
        j++;
        k++;
    }
    for (int p = l; p <= r; p++)
    {
        arr[p] = temp[p];
    }
    return count;
}

int mergesort(int arr[], int l, int r)
{
    int comparisons = 0;
    if (l < r)
    {
        int m = (l + r) / 2;
        mergesort(arr, l, m);
        mergesort(arr, m + 1, r);
        comparisons = merge(arr, l, m, r);
    }
    return comparisons;
}

int main()
{
    int size;
    cout << " Enter the size of an array " << endl;
    cin >> size;
    int myarr[size];
    cout << " Enter the elements of array " << endl;
    for (int i = 0; i < size; i++)
    {
        cin >> myarr[i];
    }
    cout << " Elements of array before sorting are " << endl;
    for (int i = 0; i < size; i++)
    {
        cout << myarr[i] << " ";
    }
    cout << endl;
    int c = mergesort(myarr, 0, size - 1);
    cout << " Elements of array after sorting are " << endl;
    for (int i = 0; i < size; i++)
    {
        cout << myarr[i] << " ";
    }
    cout << endl;
    cout << " Number of comparisons while sorting the given array: " << c << endl;
    return 0;
}
I am assuming the reader knows merge sort. Comparisons happen only when two sorted arrays are merged. For simplicity, assume n is a power of 2. Merging two arrays of size n/2 needs (n - 1) comparisons in the worst case; the -1 appears because the last element left over in a merge does not require any comparison. First count the total number of comparisons as if it were n per merge level; we can then correct it by the -1 parts. The number of merge levels is log2(n) (imagine it as a tree structure). Each level contributes n comparisons (minus some number, due to the -1 parts), so the total is n*log2(n) - (yet to be found). The "yet to be found" part does not affect the n*log2(n) term; it is actually (1 + 2 + 4 + 8 + ... + (n/2)) = n - 1, one saved comparison per merge (the sum is written out below).
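Written out (assuming n is a power of two, so there are log2(n) merge levels and level l, counted from the top, performs 2^l merges, each saving one comparison):

\sum_{\ell=0}^{\log_2 n - 1} 2^{\ell} = 1 + 2 + 4 + \cdots + \frac{n}{2} = n - 1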
Total number of comparisons in merge sort = n*log2(n) - (n - 1).
So, your constant is 1.
Can anyone tell me the order of complexity of the algorithm below? The algorithm does the following:
Given an unsorted array of integers with duplicate numbers, write the most efficient code to print out unique values in the array.
I would also like to know: what are some pros and cons of this implementation in terms of hardware usage?
private static void IsArrayDuplicated(int[] a)
{
    int size = a.Length;
    BitArray b = new BitArray(a.Max() + 1);
    for (int i = 0; i < size; i++)
    {
        b.Set(a[i], true);
    }
    for (int i = 0; i < b.Count; i++)
    {
        if (b.Get(i))
        {
            System.Console.WriteLine(i.ToString());
        }
    }
    Console.ReadLine();
}
You have two for loops, one of length a.Length and one of length (if I understand the code correctly) a.Max() + 1. So your algorithmic complexity is O(a.Length + a.Max())
The complexity of the algorithm is linear.
Finding the maximum is linear.
Setting the bits is linear.
However, the algorithm is also wrong unless your integers can be assumed to be non-negative.
It also has a problem with large integers: do you really want to allocate MAX_INT/8 bytes of memory?
The name, btw, makes me cringe. IsXYZ() should always return a bool.
I'd say, try again.
Correction - pavpanchekha has the correct answer.
O(n) is probably only possible for a finite/small domain of integers; think of bucket sort. The HashMap approach is basically not O(n) but O(n^2), since worst-case insertion into a hashmap is O(n) and NOT constant.
How about sorting the list in O(n log(n)) and then going through it, printing each value only once (skipping duplicates)? This results in O(n log(n)), which is probably the true complexity of the problem.
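A quick sketch of that sort-based idea in Java (printUnique is a hypothetical helper of mine, shown only to illustrate the O(n log(n)) approach; it also handles negative values):

private static void printUnique(int[] a) {
    int[] copy = java.util.Arrays.copyOf(a, a.length);  // don't mutate the caller's array
    java.util.Arrays.sort(copy);                        // O(n log n)
    for (int i = 0; i < copy.length; i++) {
        // After sorting, duplicates are adjacent: print a value only on its first occurrence.
        if (i == 0 || copy[i] != copy[i - 1]) {
            System.out.println(copy[i]);
        }
    }
}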
HashSet<int> mySet = new HashSet<int>(new int[] { -1, 0, -2, 2, 10, 2, 10 });
foreach (var item in mySet)
{
    Console.WriteLine(item);
}
// HashSet guarantees unique values without exception
You have two loops, each based on the size of n. I agree with whaley, but that should give you a good start on it.
O(n) on a.length
The complexity of your algorithm is O(N), but the algorithm is not correct:
If numbers are negative, it will not work.
In the case of large numbers, you will have problems with memory.
I suggest using this approach:
private static void IsArrayDuplicated(int[] a) {
    int size = a.length;
    Set<Integer> b = new HashSet<Integer>();
    for (int i = 0; i < size; i++) {
        b.add(a[i]);
    }
    Integer[] T = b.toArray(new Integer[0]);
    for (int i = 0; i < T.length; i++) {
        System.out.println(T[i]);
    }
}