Reduce big-O complexity in sorting algorithm

I made the acquaintance of big-O a couple of weeks ago and am trying to get to grips with it, but although there's a lot of material out there about calculating time complexity, I can't seem to find out how to make algorithms more efficient.
I've been practicing with the demo challenge on Codility:
Write a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
The given array can have integers between -1 million and 1 million.
I started with a brute-force algorithm:
public int solution(int[] A)
{
for ( int number = 1; number < 1000000; number ++)
{
if (!doesContain(A, number)) return number;
}
return 0;
}
This passed all tests for correctness but scored low on performance because the running time was way past the limit, time complexity being O(N**2).
I then tried putting the array into an ArrayList, which I thought would reduce the big-O since each object is "touched" only once, and I can use .Contains which is more efficient than iteration (not sure if that's true; I just sort of remember reading it somewhere).
public int solution(int[] A)
{
ArrayList myArr = new ArrayList();
for (int i=0; i<A.Length; i++)
{
myArr.Add(A[i]);
}
for ( int i = 1; i < 1000000; i++)
{
if (myArr.Contains(i)){}
else return i;
}
return 0;
}
Alas, the time complexity is still O(N**2), and I can't find any explanation of how to cut the running time down.
I know I shouldn't be using brute force, but can't seem to think of any other ways... Anyone have an explanation of how to make this algorithm more efficient?

This is a typical interview question. Forget the sort; this is a detection problem, O(n + m) on n elements and a max value of m (which is given as a constant).
boolean found[1000002] = False // set all elements to false; usable indices 1 .. 1000001
for x in A // check all items in the input array
    if x > 0 // only positive values can be the answer
        found[x] = True
for i in (1, 1000002) // the smallest missing positive is at most N + 1
    if not found[i]
        print "Smallest missing number is", i
        break

Related

Is there a sorting algorithm with a worst case time complexity of n^3?

I'm familiar with other sorting algorithms and the worst I've heard of in polynomial time is insertion sort or bubble sort. Excluding the truly terrible bogosort and those like it, are there any sorting algorithms with a worse polynomial time complexity than n^2?
Here's one, implemented in C#:
public void BadSort<T>(T[] arr) where T : IComparable
{
for (int i = 0; i < arr.Length; i++)
{
var shortest = i;
for (int j = i; j < arr.Length; j++)
{
bool isShortest = true;
for (int k = j + 1; k < arr.Length; k++)
{
if (arr[j].CompareTo(arr[k]) > 0)
{
isShortest = false;
break;
}
}
if(isShortest)
{
shortest = j;
break;
}
}
var tmp = arr[i];
arr[i] = arr[shortest];
arr[shortest] = tmp;
}
}
It's basically a really naive sorting algorithm, coupled with a needlessly-complex method of calculating the index with the minimum value.
The gist is this:
For each index
Find the element from this point forward which
when compared with all other elements after it, ends up being <= all of them.
swap this shortest element with the element at this index
The innermost loop (with the comparison) will be executed O(n^3) times in the worst case (descending-sorted input), and every iteration of the outer loop will put one more element into the correct place, getting you just a bit closer to being fully sorted.
If you work hard enough, you could probably find a sorting algorithm with just about any complexity you want. But, as the commenters pointed out, there's really no reason to seek out an algorithm with a worst-case like this. You'll hopefully never run into one in the wild. You really have to try to come up with one this bad.
Here's an example of an elegant algorithm called slowsort, which runs in Ω(n^(log(n)/(2+ɛ))) for any positive ɛ:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.116.9158&rep=rep1&type=pdf (section 5).
Slow Sort
Returns the sorted vector after performing SlowSort.
It is a sorting algorithm of humorous nature and no practical use.
It's based on the principle of "multiply and surrender", a tongue-in-cheek parody of divide and conquer.
It was published in 1986 by Andrei Broder and Jorge Stolfi in their paper Pessimal Algorithms and Simplexity Analysis.
This algorithm multiplies a single problem into multiple subproblems.
It is interesting because it is provably the least efficient sorting algorithm that can be built asymptotically, under the restriction that such an algorithm, while being slow, must still make steady progress towards a result.
void SlowSort(vector<int> &a, int i, int j)
{
if(i>=j)
return;
int m=i+(j-i)/2;
int temp;
SlowSort(a, i, m);
SlowSort(a, m + 1, j);
if(a[j]<a[m])
{
temp=a[j];
a[j]=a[m];
a[m]=temp;
}
SlowSort(a, i, j - 1);
}

Count number of subsets with sum equal to k

Given an array, we need to find the number of subsets whose sum is exactly equal to a given integer k.
Please suggest an optimal algorithm for this problem. The actual subsets are not needed; just the count will do.
The array consists of integers, which can be negative as well as non-negative.
Example:
Array -> {1, 4, -1, 10, 5}, target sum -> 9
The answer should be 2, for {4, 5} and {-1, 10}. (Strictly, {1, 4, -1, 5} also sums to 9, so the count for this example is really 3.)
This is a variation of the subset sum problem, which is NP-hard, so there is no known polynomial-time solution to it. (In fact, for the subset sum problem it is already hard to decide whether even one subset sums to the given value.)
Possible approaches to solve it are brute force (check all possible subsets), or if the set contains relatively small integers, you can use the pseudo-polynomial dynamic programming technique:
f(i,0) = 1 (i >= 0) // successful base clause
f(0,j) = 0 (j != 0) // unsuccessful base clause
f(i,j) = f(i-1,j) + f(i-1,j-arr[i]) //step
Applying dynamic programming to the above recursive formula gives you O(k*n) time and space solution.
Invoke with f(n,k) [assuming 1 based index for arrays].
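As a rough illustration (not part of the original answer), here is a bottom-up Java sketch of that recurrence. Since the question allows negative values, it shifts every reachable sum by -minSum so sums can be used as array indices; it assumes the values are small enough that an (n+1) x (maxSum - minSum + 1) table fits in memory.
class SubsetCount {
    // Counts subsets of arr that sum to exactly k, allowing negative values.
    static long countSubsets(int[] arr, int k) {
        int minSum = 0, maxSum = 0;
        for (int x : arr) {
            if (x < 0) minSum += x; else maxSum += x;
        }
        if (k < minSum || k > maxSum) return 0;       // k cannot be reached at all
        int width = maxSum - minSum + 1;              // number of reachable sums
        long[][] dp = new long[arr.length + 1][width];
        dp[0][-minSum] = 1;                           // the empty subset reaches sum 0
        for (int i = 1; i <= arr.length; i++) {
            for (int s = 0; s < width; s++) {
                long ways = dp[i - 1][s];             // leave arr[i-1] out
                int prev = s - arr[i - 1];            // ...or take arr[i-1]
                if (prev >= 0 && prev < width) ways += dp[i - 1][prev];
                dp[i][s] = ways;
            }
        }
        return dp[arr.length][k - minSum];
    }
}
Time and space are O(n * (maxSum - minSum)), i.e. pseudo-polynomial in the range of reachable sums.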
The following is memoized dynamic programming code to print the count of subsets with a given sum. The repeating values of the DP are stored in the "tmp" array. To arrive at a DP solution, first write a recursive solution to the problem, then store the repeating values in a tmp array to obtain the memoized version.
#include <bits/stdc++.h>
using namespace std;

int tmp[1001][1001];

// Counts subsets of arr[0..n-1] that sum to 'sum' (assumes non-negative values).
int subset_count(int* arr, int sum, int n)
{
    if (sum == 0)
        return 1;
    if (n == 0)
        return 0;
    if (tmp[n][sum] != -1)
        return tmp[n][sum];
    if (arr[n-1] > sum)
        return tmp[n][sum] = subset_count(arr, sum, n-1);
    return tmp[n][sum] = subset_count(arr, sum, n-1) + subset_count(arr, sum - arr[n-1], n-1);
}

// Driver code
int main()
{
    memset(tmp, -1, sizeof(tmp));
    int arr[] = { 2, 3, 5, 6, 8, 10 };
    int n = sizeof(arr) / sizeof(int);
    int sum = 10;
    cout << subset_count(arr, sum, n);
    return 0;
}
The following is the plain recursive solution, which has time complexity O(2^n). Memoizing it with dynamic programming brings this down to pseudo-polynomial O(n*k):
def count_of_subset(arr,sum,n,count):
if sum==0:
count+=1
return count
if n==0 and sum!=0:
count+=0
return count
if arr[n-1]<=sum:
count=count_of_subset(arr,sum-arr[n-1],n-1,count)
count=count_of_subset(arr,sum,n-1,count)
return count
else:
count=count_of_subset(arr,sum,n-1,count)
return count
int numSubseq(vector<int>& nums, int target) {
int size = nums.size();
int T[size+1][target+1];
for(int i=0;i<=size;i++){
for(int j=0;j<=target;j++){
if(i==0 && j!=0)
T[i][j]=0;
else if(j==0)
T[i][j] = 1;
}
}
for(int i=1;i<=size;i++){
for(int j=1;j<=target;j++){
if(nums[i-1] <= j)
T[i][j] = T[i-1][j] + T[i-1][j-nums[i-1]];
else
T[i][j] = T[i-1][j];
}
}
return T[size][target];
}
The above base case works fine if the constraint is 1 <= v[i] <= 1000.
But consider the constraint 0 <= v[i] <= 1000.
The above base case then gives a wrong answer. Consider the test case v = [0, 0, 1] and k = 1: the output will be "1" according to that base case,
but the correct answer is 3: {0,1}, {0,0,1}, {1}.
To avoid this, we can keep recursing down to index 0 instead of returning early, and fix the base case as follows, in
C++:
if(ind==0)
{
if(v[0]==target and target==0)return 2;
if(v[0]==target || target==0)return 1;
return 0 ;
}
One answer to this problem is to generate the power set of the array; for an array of size N there are 2^N subsets. For every number between 0 and 2^N - 1, check its binary representation and include all the values from the array for which the corresponding bit is set, i.e. one.
Check whether the values you included sum to the required value.
This might not be the most efficient solution, but as this is an NP-hard problem, no polynomial-time solution is known for it.
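For illustration only, a minimal Java sketch of that power-set enumeration (it assumes n is small, say at most about 25, since it walks all 2^n bit masks):
// Brute-force count over all 2^n subsets, encoded as bit masks.
static int countByBruteForce(int[] arr, int k) {
    int n = arr.length, count = 0;
    for (int mask = 0; mask < (1 << n); mask++) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if ((mask & (1 << i)) != 0) {        // bit i set -> include arr[i]
                sum += arr[i];
            }
        }
        if (sum == k) count++;                   // this subset hits the target
    }
    return count;
}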

Why should Insertion Sort be used after threshold crossover in Merge Sort

I have read everywhere that for divide-and-conquer sorting algorithms like Merge-Sort and Quicksort, instead of recursing until only a single element is left, it is better to switch to Insertion-Sort when a certain threshold, say 30 elements, is reached. That is fine, but why only Insertion-Sort? Why not Bubble-Sort or Selection-Sort, both of which have similar O(N^2) performance? Insertion-Sort should only come in handy when many elements are pre-sorted (although that advantage should also come with Bubble-Sort), but otherwise, why should it be more efficient than the other two?
And secondly, at this link, in the 2nd answer and its accompanying comments, it says that O(N log N) performs poorly compared to O(N^2) up to a certain N. How come? N^2 should always perform worse than N log N, since N > log N for all N >= 2, right?
If you bail out of each branch of your divide-and-conquer Quicksort when it hits the threshold, your data looks like this:
[the least 30-ish elements, not in order] [the next 30-ish ] ... [last 30-ish]
Insertion sort has the rather pleasing property that you can call it just once on that whole array, and it performs essentially the same as it does if you call it once for each block of 30. So instead of calling it in your loop, you have the option to call it last. This might not be faster, especially since it pulls the whole data through cache an extra time, but depending how the code is structured it might be convenient.
Neither bubble sort nor selection sort has this property, so I think the answer might quite simply be "convenience". If someone suspects selection sort might be better then the burden of proof lies on them to "prove" that it's faster.
Note that this use of insertion sort also has a drawback -- if you do it this way and there's a bug in your partition code then provided it doesn't lose any elements, just partition them incorrectly, you'll never notice.
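As an illustration of that "call it once at the end" option, here is a hedged Java sketch (the names CUTOFF and partition are made up for the example, not taken from the answer): the quicksort recursion simply stops on small ranges, and a single insertion-sort pass over the whole array finishes the job.
static final int CUTOFF = 30;

static void sort(int[] a) {
    quicksortWithCutoff(a, 0, a.length - 1);
    insertionSort(a);                             // one final pass over the whole array
}

static void quicksortWithCutoff(int[] a, int lo, int hi) {
    if (hi - lo + 1 <= CUTOFF) return;            // leave small ranges unsorted for now
    int p = partition(a, lo, hi);
    quicksortWithCutoff(a, lo, p - 1);
    quicksortWithCutoff(a, p + 1, hi);
}

static int partition(int[] a, int lo, int hi) {   // Lomuto partition, last element as pivot
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int v = a[i], j = i;
        while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}
Because each partition step places its small blocks into their final range of positions, no element ends up more than about CUTOFF positions from where it belongs, so the final insertion-sort pass costs only O(n * CUTOFF).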
Edit: apparently this modification is by Sedgewick, who wrote his PhD on QuickSort in 1975. It was analyzed more recently by Musser (the inventor of Introsort). Reference https://en.wikipedia.org/wiki/Introsort
Musser also considered the effect on caches of Sedgewick's delayed small sorting, where small ranges are sorted at the end in a single pass of insertion sort. He reported that it could double the number of cache misses, but that its performance with double-ended queues was significantly better and should be retained for template libraries, in part because the gain in other cases from doing the sorts immediately was not great.
In any case, I don't think the general advice is "whatever you do, don't use selection sort". The advice is, "insertion sort beats Quicksort for inputs up to a surprisingly non-tiny size", and this is pretty easy to prove to yourself when you're implementing a Quicksort. If you come up with another sort that demonstrably beats insertion sort on the same small arrays, none of those academic sources is telling you not to use it. I suppose the surprise is that the advice is consistently towards insertion sort, rather than each source choosing its own favorite (introductory teachers have a frankly astonishing fondness for bubble sort -- I wouldn't mind if I never hear of it again). Insertion sort is generally thought of as "the right answer" for small data. The issue isn't whether it "should be" fast, it's whether it actually is or not, and I've never particularly noticed any benchmarks dispelling this idea.
One place to look for such data would be in the development and adoption of Timsort. I'm pretty sure Tim Peters chose insertion for a reason: he wasn't offering general advice, he was optimizing a library for real use.
Insertion sort is faster in practice than bubble sort, at least. Their asymptotic running time is the same, but insertion sort has better constants (fewer/cheaper operations per iteration). Most notably, it requires only a linear number of swaps of pairs of elements, and in each inner loop it performs comparisons between each of n/2 elements and a "fixed" element that can be stored in a register (while bubble sort has to read values from memory). I.e. insertion sort does less work in its inner loop than bubble sort.
The answer claims that 10000 n lg n > 10 n² for "reasonable" n. This is true up to about 14000 elements.
I am surprised no-one's mentioned the simple fact that insertion sort is simply much faster for "almost" sorted data. That's the reason it's used.
The easier one first: why insertion sort over selection sort? Because insertion sort is in O(n) for optimal input sequences, i.e. if the sequence is already sorted. Selection sort is always in O(n^2).
Why insertion sort over bubble sort? Both need only a single pass over already-sorted input sequences, but insertion sort degrades more gracefully. To be more specific, insertion sort usually performs better with a small number of inversions than bubble sort does. This can be explained because bubble sort always iterates over N-i elements in pass i, while insertion sort works more like a "find" and only needs to iterate over (N-i)/2 elements on average (in pass N-i-1) to find the insertion position. So insertion sort is expected to be about two times faster than bubble sort on average.
Here is an empirical demonstration that insertion sort is faster than bubble sort (for 30 elements, on my machine, with the attached implementation, using Java).
I ran the attached code and found that bubble sort ran in 6338.515 ns on average, while insertion sort took 3601.0 ns.
I used the Wilcoxon signed-rank test to check the probability that this is a fluke and they should actually be the same, but the result is below the range of the numerical error (effectively P_VALUE ~= 0).
private static void swap(int[] arr, int i, int j) {
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
public static void insertionSort(int[] arr) {
for (int i = 1; i < arr.length; i++) {
int j = i;
while (j > 0 && arr[j-1] > arr[j]) {
swap(arr, j, j-1);
j--;
}
}
}
public static void bubbleSort(int[] arr) {
for (int i = 0 ; i < arr.length; i++) {
boolean bool = false;
for (int j = 0; j < arr.length - i ; j++) {
if (j + 1 < arr.length && arr[j] > arr[j+1]) {
bool = true;
swap(arr,j,j+1);
}
}
if (!bool) break;
}
}
public static void main(String... args) throws Exception {
Random r = new Random(1);
int SIZE = 30;
int N = 1000;
int[] arr = new int[SIZE];
int[] millisBubble = new int[N];
int[] millisInsertion = new int[N];
System.out.println("start");
//warm up:
for (int t = 0; t < 100; t++) {
insertionSort(arr);
}
for (int t = 0; t < N; t++) {
arr = generateRandom(r, SIZE);
int[] tempArr = Arrays.copyOf(arr, arr.length);
long start = System.nanoTime();
insertionSort(tempArr);
millisInsertion[t] = (int)(System.nanoTime()-start);
tempArr = Arrays.copyOf(arr, arr.length);
start = System.nanoTime();
bubbleSort(tempArr);
millisBubble[t] = (int)(System.nanoTime()-start);
}
int sum1 = 0;
for (int x : millisBubble) {
System.out.println(x);
sum1 += x;
}
System.out.println("end of bubble. AVG = " + ((double)sum1)/millisBubble.length);
int sum2 = 0;
for (int x : millisInsertion) {
System.out.println(x);
sum2 += x;
}
System.out.println("end of insertion. AVG = " + ((double)sum2)/millisInsertion.length);
System.out.println("bubble took " + ((double)sum1)/millisBubble.length + " while insertion took " + ((double)sum2)/millisBubble.length);
}
private static int[] generateRandom(Random r, int size) {
int[] arr = new int[size];
for (int i = 0 ; i < size; i++)
arr[i] = r.nextInt(size);
return arr;
}
EDIT:
(1) Optimizing the bubble sort (updated above) reduced its average time to 6043.806 ns, not enough to make a significant change. The Wilcoxon test is still conclusive: insertion sort is faster.
(2) I also added a selection sort test (code attached) and compared it against insertion sort. The results: selection sort took 4748.35 ns while insertion sort took 3540.114 ns.
The P_VALUE for the Wilcoxon test is still below the range of numerical error (effectively ~= 0).
code for selection sort used:
public static void selectionSort(int[] arr) {
for (int i = 0; i < arr.length ; i++) {
int min = arr[i];
int minElm = i;
for (int j = i+1; j < arr.length ; j++) {
if (arr[j] < min) {
min = arr[j];
minElm = j;
}
}
swap(arr,i,minElm);
}
}
EDIT: As IVlad points out in a comment, selection sort does only n swaps (and therefore only 3n writes) for any dataset, so insertion sort is very unlikely to beat it on account of doing fewer swaps -- but it will likely do substantially fewer comparisons. The reasoning below better fits a comparison with bubble sort, which will do a similar number of comparisons but many more swaps (and thus many more writes) on average.
One reason why insertion sort tends to be faster than the other O(n^2) algorithms like bubble sort and selection sort is because in the latter algorithms, every single data movement requires a swap, which can be up to 3 times as many memory copies as are necessary if the other end of the swap needs to be swapped again later.
With insertion sort OTOH, if the next element to be inserted isn't already the largest element, it can be saved into a temporary location, and all lower elements shunted forward by starting from the right and using single data copies (i.e. without swaps). This opens up a gap to put the original element.
C code for insertion-sorting integers without using swaps:
void insertion_sort(int *v, int n) {
int i = 1;
while (i < n) {
int temp = v[i]; // Save the current element here
int j = i;
// Shunt everything forwards
while (j > 0 && v[j - 1] > temp) {
v[j] = v[j - 1]; // Look ma, no swaps! :)
--j;
}
v[j] = temp;
++i;
}
}

Finding the list of prime numbers in shortest time

I have read many algorithms for finding prime numbers, and the conclusion is that a number is prime if it is not divisible by any of the primes preceding it.
I am not able to find a more precise definition. Based on this I have written some code, and it performs satisfactorily up to a maximum of 1,000,000. But I believe there are much faster algorithms to find all primes less than a given number.
Following is my code; can I have a better version of the same?
private static List<Integer> primes = new ArrayList<>(); // list of primes found so far (declaration not shown in the original snippet)

public static void main(String[] args) {
for (int i = 2; i < 100000; i++) {
if (checkMod(i)) {
primes.add(i);
}
}
}
private static boolean checkMod( int num) {
for (int i : primes){
if( num % i == 0){
return false;
}
}
return true;
}
The good thing in your primality test is that you only divide by primes.
private static boolean checkMod( int num) {
for (int i : primes){
if( num % i == 0){
return false;
}
}
return true;
}
The bad thing is that you divide by all primes found so far, that is, all primes smaller than the candidate. That means that for the largest prime below one million, 999983, you divide by 78497 primes to find out that this number is a prime. That's a lot of work. So much, in fact, that the work spent on primes in this algorithm accounts for about 99.9% of all work when going to one million, and a larger share for higher limits. And that algorithm is nearly quadratic: to find the primes up to n in this way, you need to perform about
n² / (2*(log n)²)
divisions.
A simple improvement is to stop the division earlier. Let n be a composite number (i.e. a number greater than 1 that has divisors other than 1 and n), and let d be a divisor of n.
Now, d being a divisor of n means that n/d is an integer, and also a divisor of n: n/(n/d) = d.
So we can naturally group the divisors of n into pairs, each divisor d gives rise to the pair (d, n/d).
For such a pair, there are two possibilities:
Either d = n/d, which means n = d², i.e. d = √n,
or the two are different; then one of them is smaller than the other, say d < n/d. But that immediately translates to d² < n, i.e. d < √n.
So, either way, each pair of divisors contains (at least) one not exceeding √n, hence, if n is a composite number, its smallest divisor (other than 1) does not exceed √n.
So we can stop the trial division when we've reached √n:
private static boolean checkMod( int num) {
for (int i : primes){
if (i*i > num){
// We have not found a divisor up to √num, so it's a prime
return true;
}
if( num % i == 0){
return false;
}
}
return true;
}
Note: That depends on the list of primes being iterated in ascending order. If that is not guaranteed by the language, you have to use a different method, iterate by index through an ArrayList or something like that.
Stopping the trial division at the square root of the candidate, for the largest prime below one million, 999983, we now only need to divide it by the 168 primes below 1000. That's a lot less work than previously. Stopping the trial division at the square root, and dividing only by primes, is as good as trial division can possibly get and requires about
2*n^1.5 / (3*(log n)²)
divisions; for n = 1000000, that's an improvement by a factor of about 750. Not bad, is it?
But that's still not very efficient, the most efficient methods to find all primes below n are sieves. Simple to implement is the classical Sieve of Eratosthenes. That finds the primes below n in O(n*log log n) operations, with some enhancements (eliminating multiples of several small primes from consideration in advance), its complexity can be reduced to O(n) operations. A relatively new sieve with better asymptotic behaviour is the Sieve of Atkin, which finds the primes to n in O(n) operations, or with the enhancement of eliminating the multiples of some small primes, in O(n/log log n) operations.
The Sieve of Atkin is more complicated to implement, so it's likely that a good implementation of a Sieve of Eratosthenes performs better than a naive implementation of a Sieve of Atkin. For implementations at comparable levels of optimisation, the performance difference is small unless the limit becomes large (larger than 10^10; and it's not uncommon that in practice, a Sieve of Eratosthenes scales better than a Sieve of Atkin beyond that, due to better memory access patterns). So I would recommend beginning with a Sieve of Eratosthenes, and only when its performance isn't satisfactory despite honest efforts at optimisation, delve into the Sieve of Atkin. Or, if you don't want to implement it yourself, find a good implementation somebody else has already seriously tuned.
I have gone into a bit more detail in an answer with a slightly different setting, where the problem was finding the n-th prime. Some implementations of more-or-less efficient methods are linked from that answer, in particular one or two usable (though not much optimised) implementations of a Sieve of Eratosthenes.
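For reference, here is a minimal (untuned) Sieve of Eratosthenes in Java along the lines recommended above:
import java.util.ArrayList;
import java.util.List;

class Eratosthenes {
    // Returns all primes strictly below limit in O(limit * log log limit) time.
    static List<Integer> primesBelow(int limit) {
        boolean[] composite = new boolean[Math.max(limit, 2)];
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                primes.add(i);
                for (long j = (long) i * i; j < limit; j += i) {  // smaller multiples were
                    composite[(int) j] = true;                    // already crossed off
                }
            }
        }
        return primes;
    }
}
For the original question's range, primesBelow(1000000) returns the 78498 primes below one million almost instantly.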
I always use Eratosthenes sieve:
isPrime[100001] // - initially contains only '1' values (1,1,1 ... 1)
isPrime[0] = isPrime[1] = 0 // 0 and 1 are not prime numbers
primes.push(2); // first prime number; 2 is special because it is the only even prime
for (i = 2; i * 2 <= 100000; i++) isPrime[i * 2] = 0 // remove all multiples of 2
for (i = 3; i <= 100000; i += 2) // check all odd numbers from 3 to 100000
if (isPrime[i]) {
primes.push(i); // add the new prime number to the solution
for (j = 2; i * j <= 100000; j++) isPrime[i * j] = 0; // remove all i's multiples
}
return primes
I hope you understand my comments
I understand a prime number to be a number that is only divisible by itself and the number 1 (with no remainder). See Wikipedia Article
That being said, I don't understand the algorithm very well in the second comment but one small improvement to your algorithm would be to change your for loop to:
for (int i = 5; i < 100000; i = i + 2) {
if (checkMod(i)) {
primes.add(i);
}
}
This is based on seeding the list with 2 and 3 (note that 1 is not a prime number) and on the fact that no even number thereafter is prime. This at least cuts your algorithm's work in half.
I want to suggest a slightly improved version of the one suggested by Benjamin Oman above.
This is just one modification, to avoid checking for primality of all numbers ending in the digit '5', because those numbers are certainly not prime, being divisible by 5.
for (int i = 7; i < 100000; i = i + 2) {
    if (i % 5 == 0) continue; // numbers ending in 5 are divisible by 5, skip them
    if (checkMod(i)) {
        primes.add(i);
    }
}
This is based on seeding 2, 3, and 5 as primes. The small change above skips all remaining multiples of 5 and improves things a little further.
Nicely explained by Daniel Fischer.
An implementation in C++ based on his explanation:
#include<iostream>
using namespace std;
long* getListOfPrimeNumbers (long total)
{
long * primes;
primes = new long[total];
int count = 1;
primes[0] = 2;
primes[1] = 3;
while (count < total - 1) // otherwise the last write would go past the end of the array
{
long composite_number = primes[count] + 2;
bool is_prime = false;
while (is_prime == false)
{
is_prime = true;
for (int i = 0; i <= count; i++)
{
long prime = primes[i];
if (prime * prime > composite_number)
{
break;
}
if (composite_number % prime == 0)
{
is_prime = false;
break;
}
}
if (is_prime == true)
{
count++;
primes[count] = composite_number;
}
else
{
composite_number += 2;
}
}
}
return primes;
}
int main()
{
long * primes;
int total = 10;
primes = getListOfPrimeNumbers(total);
for (int i = 0; i < total; i++){
cout << primes[i] << "\n";
}
return 0;
}
import array , math
print("enter a range to find prime numbers")
a= 0
b= 5000
c=0
x=0
k=1
g=[2]
l=0
for I in range( a , b):
for k in g:
x=x+1
if k>2:
if k > math . sqrt( I ):
break
if( I % k==0):
c=c+1
break
if c==0:
if I!=1:
g . append( I )
c=0
print g
# this algorithm takes only 19600 iterations for the range 1-5000, which makes it one of the faster simple approaches in my opinion
I found that mathematicians say that "prime numbers after 3 always sit on one side of a multiple of 6".
That means the primes 5 and 7 are next to 6,
11 and 13 are next to 6*2,
17 and 19 next to 6*3,
and 23 and 25 next to 6*4 (although 25 is not prime, of course).
I wrote it both the normal way and like this up to 1 million, and found this version is also correct and runs more quickly. 😁
num=1000000
prime=[2,3]
def test(i):
for j in prime:
if(i%j==0):
break
if(j*j>i):
prime.append(i)
break
return 0
for i in range (6,num,6):
i=i-1
test(i)
i=i+2
test(i)
i=i-1
print(prime)

order of complexity of the algorithm in O notation

Can anyone tell me the order of complexity of the algorithm below? It does the following:
Given an unsorted array of integers with duplicate numbers, write the most efficient code to print out the unique values in the array.
I would also like to know:
What are some pros and cons, in terms of hardware usage, of this implementation?
private static void IsArrayDuplicated(int[] a)
{
int size = a.Length;
BitArray b = new BitArray(a.Max()+1);
for ( int i = 0; i < size; i++)
{
b.Set(a[i], true);
}
for (int i = 0; i < b.Count; i++)
{
if (b.Get(i))
{
System.Console.WriteLine(i.ToString());
}
}
Console.ReadLine();
}
You have two for loops, one of length a.Length and one of length (if I understand the code correctly) a.Max() + 1. So your algorithmic complexity is O(a.Length + a.Max())
The complexity of the algorithm is linear: finding the maximum is linear, and setting the bits is linear.
However, the algorithm is also wrong unless your integers can be assumed to be non-negative.
It also has a problem with large integers: do you really want to allocate MAX_INT/8 bytes of memory?
The name, by the way, makes me cringe: IsXYZ() should always return a bool.
I'd say, try again.
Correction: pavpanchekha has the correct answer.
O(n) is probably only possible for a finite/small domain of integers; think of bucket sort. The hashmap approach is not strictly O(n) but O(n^2) in the worst case, since worst-case insertion into a hashmap is O(n) and NOT constant.
How about sorting the list in O(n log(n)) and then going through it and printing the unique values? This results in O(n log(n)), which is probably the true complexity of the problem.
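A small sketch of that sort-then-scan idea in Java (the method name is just for illustration):
import java.util.Arrays;

class UniqueValues {
    // O(n log n) time, works for negative values too, and needs no memory
    // proportional to the maximum value.
    static void printUnique(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        for (int i = 0; i < copy.length; i++) {
            if (i == 0 || copy[i] != copy[i - 1]) {   // print the first occurrence of each value
                System.out.println(copy[i]);
            }
        }
    }
}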
HashSet<int> mySet = new HashSet<int>( new int[] {-1, 0, -2, 2, 10, 2, 10});
foreach(var item in mySet)
{
    Console.WriteLine(item);
}
// HashSet guarantees unique values without exception
You have two loops, each based on the size of n. I agree with whaley, but that should give you a good start on it.
O(n) on a.length
The complexity of your algorithm is O(N), but the algorithm is not correct:
If the numbers are negative it will not work.
In the case of large numbers you will have problems with memory.
I suggest you use this approach instead:
private static void IsArrayDuplicated(int[] a) {
int size = a.length;
Set<Integer> b = new HashSet<Integer>();
for (int i = 0; i < size; i++) {
b.add(a[i]);
}
Integer[] T = b.toArray(new Integer[0]);
for (int i = 0; i < T.length; i++) {
System.out.println(T[i]);
}
}
