Checking if 2 numbers of an array add up to I - algorithm

I saw an interview question as follows:
Given an unsorted array of integers A and an integer I, find out whether any two members of A add up to I.
Any clues?
The time complexity should be as low as possible.

Insert the elements into a hash table.
While inserting x, check whether I-x already exists. This takes O(n) expected time.
Alternatively, sort the array in ascending order (indices 0 to n-1). Keep two pointers, one at the maximum and one at the minimum (call them M and m respectively).
If a[M] + a[m] > I, then M--.
If a[M] + a[m] < I, then m++.
If a[M] + a[m] == I, you have found a pair.
If m >= M, no such pair exists.

If you know the range the integers fall within, you can use a counting-sort-like solution: scan over the array and count occurrences in a second array. For example, given the integers
input = [0,1,5,2,6,4,2]
you create an array like this:
count = int[7]
which (in Java, C#, etc.) is suited for counting integers between 0 and 6.
foreach x in input
    count[x] = count[x] + 1
This gives you the array [1,1,2,0,1,1,1]. Now you can scan over this array (half of it is enough) and check whether two integers add up to I, like
for j = 0 to count.length - 1
    if count[j] != 0 and count[I - j] != 0 then   // check I - j for array out-of-bounds here; if j == I - j, require count[j] >= 2
        WUHUU! the integers j and I - j add up to I
Overall this algorithm runs in O(n + k), where n comes from the scan over the input of length n and k from the scan over the count array of length k (integers between 0 and k - 1). This means that if n > k, you have a guaranteed O(n) solution.
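A minimal Python sketch of the counting approach above. It assumes every value lies in range(k) with k known up front; the function name and parameters are illustrative, not from the answer. It also handles the case the pseudocode glosses over, where both halves of the pair are the same value and must therefore occur twice.

```python
def has_pair_with_sum_counting(values, target, k):
    """True if two entries of `values` (at different positions) add up to `target`.

    Assumes every value lies in range(k), i.e. 0 <= value < k.
    """
    count = [0] * k
    for x in values:
        count[x] += 1
    for j in range(k):
        other = target - j
        if other < j:          # remaining j values would only repeat pairs already checked
            break
        if other >= k:         # out of bounds: that partner cannot be in the input
            continue
        if j == other:
            if count[j] >= 2:  # the same value must occur twice
                return True
        elif count[j] and count[other]:
            return True
    return False
```

Only j <= target - j is examined, which is the "half of it" mentioned above.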

For example, loop over the array and add each needed complement to a set or hash; if an element is already there, you've found a pair and can return it.
>>> A = [11,3,2,9,12,15]
>>> I = 14
>>> S = set()
>>> for x in A:
...     if x in S:
...         print(I-x, x)
...     S.add(I-x)
...
11 3
2 12
>>>

Sort the array.
For each element X in A, perform a binary search for I-X. If I-X is in A (at a different position than X), we have a solution.
This is O(n log n).
If A contains integers in a given (small enough) range, we can use a trick to make it O(n):
keep an array V of counters. For each element X in A, we increment V[X].
Before incrementing V[X], we also check whether V[I-X] is > 0. If it is, we have a solution.
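A sketch of the sort-plus-binary-search approach using Python's bisect module (the function name is illustrative). Searching only to the right of each element's own index keeps an element from being paired with itself, while every pair is still found once from its left member.

```python
from bisect import bisect_left


def has_pair_sorted_search(values, target):
    """Sort, then binary-search for target - x for each x: O(n log n) overall."""
    a = sorted(values)
    for idx, x in enumerate(a):
        want = target - x
        # Search only in a[idx + 1:], so x is never paired with itself.
        pos = bisect_left(a, want, idx + 1)
        if pos < len(a) and a[pos] == want:
            return True
    return False
```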

public static boolean findSum2(int[] a, int sum) {
    if (a.length == 0) {
        return false;
    }
    Arrays.sort(a);
    int i = 0;
    int j = a.length - 1;
    while (i < j) {
        int tmp = a[i] + a[j];
        if (tmp == sum) {
            System.out.println(a[i] + "+" + a[j] + "=" + sum);
            return true;
        } else if (tmp > sum) {
            j--;
        } else {
            i++;
        }
    }
    return false;
}

O(n) time and O(1) space
If the array is sorted, there is a solution with O(n) time complexity.
Suppose our array is
array = {0, 1, 3, 5, 8, 10, 14}
and we want x1 + x2 = k = 13, so valid outputs are 3, 10 and 5, 8 (the steps below find 3, 10 first).
1. Take two pointers, one at the start of the array and one at the end.
2. Add the elements at ptr1 and ptr2: array[ptr1] + array[ptr2]
3. If the sum equals k, output the pair; if the sum > k, decrement ptr2; otherwise increment ptr1.
4. Repeat steps 2 and 3 while ptr1 < ptr2.
The same thing is explained in detail here. It seems to be an Amazon interview question:
http://inder-gnu.blogspot.com/2007/10/find-two-nos-in-array-whose-sum-x.html

For O(n log n): sort the array and, for each element A[j] (0 <= j < len A), compute I - A[j] and do a binary search for it in the sorted array.
A hashmap (number -> frequency) should work in O(n).

for each ele in the array
    if (sum - ele) is hashed and its hashed value is not equal to the index of ele
        print ele, sum - ele
    end-if
    hash ele as key and its index as value
end-for

Perl implementation to detect whether an array contains two integers that sum up to a given number (the array is sorted first):
my @a = (11,3,2,9,12,15);
my @b = sort {$a <=> $b} @a;
my %hash;
my $sum = 14;
my $index = 0;
foreach my $ele (@b) {
    my $sum_minus_ele = $sum - $ele;
    print "Trace: $ele :: $index :: $sum_minus_ele\n";
    if (exists($hash{$sum_minus_ele}) && $hash{$sum_minus_ele} != $index) {
        print "\tElement: ".$ele." :: Sum-ele: ".$sum_minus_ele."\n";
    }
    $hash{$ele} = $index;
    $index++;
}

This might be improved in the following way: before putting an element into the hashmap, check whether it is greater than the required sum. If it is, you can simply skip it (this is only valid when all the integers are non-negative); otherwise, put it into the hashmap. It's a slight improvement on your algorithm, although the overall time complexity remains the same.

This can be solved using the UNION-FIND algorithm, which can check in (nearly) constant time whether an element is in a set.
So, the algorithm would be:
foundsum = false;
foreach (x : array) {
    if find(I - x): foundsum = true;
    else union(x);
}
FIND and UNION are nearly constant: O(α(n)) amortized with path compression and union by rank.

Here is an O(n) solution in Java using O(n) extra space. It uses a HashSet:
http://www.dsalgo.com/UnsortedTwoSumToK.php

Here is a solution which takes duplicate entries into account. It is written in JavaScript and assumes the array is sorted. The solution runs in O(n) time and does not use any extra memory aside from a few variables. Sort with the algorithm of your choice (radix sort is O(kn)!) and then run the array through this baby.
var count_pairs = function(_arr,x) {
    if(!x) x = 0;
    var pairs = 0;
    var i = 0;
    var k = _arr.length-1;
    if((k+1)<2) return pairs;
    var halfX = x/2;
    while(i<k) {
        var curK = _arr[k];
        var curI = _arr[i];
        var pairsThisLoop = 0;
        if(curK+curI==x) {
            // if midpoint and equal find combinations
            if(curK==curI) {
                var comb = 1;
                while(--k>=i) pairs+=(comb++);
                break;
            }
            // count pair and k duplicates
            pairsThisLoop++;
            while(_arr[--k]==curK) pairsThisLoop++;
            // add k side pairs to running total for every i side pair found
            pairs+=pairsThisLoop;
            while(_arr[++i]==curI) pairs+=pairsThisLoop;
        } else {
            // if we are at a mid point
            if(curK==curI) break;
            var distK = Math.abs(halfX-curK);
            var distI = Math.abs(halfX-curI);
            if(distI > distK) while(_arr[++i]==curI);
            else while(_arr[--k]==curK);
        }
    }
    return pairs;
}
I solved this during an interview for a large corporation. They took it but not me.
So here it is for everyone.
Start at both sides of the array and slowly work your way inwards, making sure to count duplicates if they exist.
It only counts pairs but can be reworked to
find the pairs
find pairs < x
find pairs > x
Enjoy, and don't forget to bump it if it's the best solution!
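As a cross-check for the routine above, here is a hash-based Python sketch that counts the same quantity (pairs of positions i < j with values[i] + values[j] == x) without sorting. It is a different technique, not the answer's algorithm, and the function name is illustrative; on sorted input the two should agree.

```python
from collections import Counter


def count_pairs_hashing(values, x):
    """Count position pairs (i < j) with values[i] + values[j] == x.

    O(n) expected time and O(n) extra space; no sorting needed.
    """
    seen = Counter()  # value -> how many times it occurred earlier
    pairs = 0
    for v in values:
        pairs += seen[x - v]  # every earlier partner forms a new pair with v
        seen[v] += 1
    return pairs
```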

Split the array into two groups: <= I/2 and > I/2. Then split those into <= I/4, > I/4 and <= 3I/4, > 3I/4.
Repeat this for log(I) steps and check the mirror-image groups from the outside, e.g. <= 1I/8 and > 7I/8; if they both contain at least one element, then they add to I.
This will take n.Log(I) + n/2 steps and for I

An implementation in Python:
def func(nums, k):
    temp = {}  # temporary dictionary: value -> number of times seen so far
    for x in nums:
        if x in temp:  # if temp already has the key, just increment its count
            temp[x] += 1
        else:  # otherwise initialize the key with a count of 1
            temp[x] = 1
        # the other value needed to make the sum k must be in the dictionary,
        # and if that value is x itself, x must have been seen at least twice
        if k - x in temp and (k - x != x or temp[x] >= 2):
            return True
    return False
Input:
nums is a list of numbers (A in the question above).
k is the sum (I in the question above).
The function returns True if there is a pair in the list whose sum equals k, and False otherwise.
I am using a dictionary whose key is an element of the list and whose value is the count of that element (the number of times it has been seen so far).
The average running time is O(n).
This implementation also takes care of two important edge cases:
repeated numbers in the list, and
not adding the same number to itself.

Related

Find continuous subarrays that have at least 1 pair adding up to target sum - Optimization

I took this assessment that had this prompt, and I was able to pass 18/20 tests, but not the last 2 due to hitting the execution time limit. Unfortunately, the input values were not displayed for these tests.
Prompt:
// Given an array of integers **a**, find how many of its continuous subarrays of length **m** contain at least 1 pair of integers with a sum equal to **k**
Example:
const a = [1,2,3,4,5,6,7];
const m = 5, k = 5;
solution(a, m, k) will yield 2, because there are 2 subarrays in a that have at least 1 pair that add up to k
a[0]...a[4] - [1,2,3,4,5] - 2 + 3 = k ✓
a[1]...a[5] - [2,3,4,5,6] - 2 + 3 = k ✓
a[2]...a[6] - [3,4,5,6,7] - no two elements add up to k ✕
Here was my solution:
// strategy: check each subarray if it contains a two sum pair
// time complexity: O(n * m), where n is the size of a and m is the subarray length
// space complexity: O(m), where m is the subarray length
function solution(a, m, k) {
    let count = 0;
    for(let i = 0; i <= a.length - m; i++){
        let set = new Set();
        for(let j = i; j < i + m; j++){
            if(set.has(k - a[j])){
                count++;
                break;
            }
            else
                set.add(a[j]);
        }
    }
    return count;
}
I thought of ways to optimize this algo, but failed to come up with any. Is there any way this can be optimized further for time complexity - perhaps for any edge cases?
Any feedback would be much appreciated!
Maintain a map from each of the last m values to its highest (most recent) position (add/remove/query is O(1)), plus the highest position of the first element of a complementary pair seen so far.
For each array element, check whether its complementary element is in the map, and update that highest position if necessary.
If at least m elements have been processed and the highest position is within the window, increase the counter.
O(n) overall. Python:
def solution(a, m, k):
    count = 0
    last_pos = {}  # value: last (1-based) position observed
    max_complement_pos = -1
    for head, num in enumerate(a, 1):  # advance head by one
        tail = head - m
        # deletion part is to keep space complexity O(m).
        # If this is not a concern (likely), safe to omit
        if tail > 0 and last_pos[a[tail - 1]] <= tail:  # element at position tail has left the window
            del last_pos[a[tail - 1]]
        max_complement_pos = max(max_complement_pos, last_pos.get(k-num, -1))
        count += head >= m and max_complement_pos > tail
        last_pos[num] = head  # add element at head
    return count
Create a counting hash: elt -> count.
When the window moves:
add/increment the new element
decrement the departing element
check if (k - new_elt) is in your hash with a count >= 1. If it is, you've found a good subarray.
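The counting-hash idea above, as stated, only detects pairs that involve the newly added element, so a window whose pair was formed earlier could be missed. One way to close that gap, sketched in Python under that reading (names are illustrative), is to maintain the number of summing pairs currently inside the window: adding x creates counts[k - x] new pairs, and removing y destroys counts[k - y] of them. A window is good whenever that number is positive, and each step is O(1) on average.

```python
from collections import Counter


def count_windows_with_pair(a, m, k):
    """Count length-m windows of a containing two elements (distinct positions) summing to k."""
    counts = Counter()  # value -> occurrences inside the current window
    pairs = 0           # number of position pairs inside the window that sum to k
    good = 0
    for i, x in enumerate(a):
        pairs += counts[k - x]  # pairs formed by the new element
        counts[x] += 1
        if i >= m:              # a[i - m] has just left the window
            y = a[i - m]
            counts[y] -= 1
            pairs -= counts[k - y]  # pairs y formed with elements still inside
        if i >= m - 1 and pairs > 0:
            good += 1
    return good
```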

Specific Max Sum of the elements of an Int array - C/C++

Let's say we have an array: 7 3 1 1 6 13 8 3 3
I have to find the maximum sum of this array such that:
if I add 13 to the sum, I cannot add the two neighboring elements on each side: 6 1 and 8 3 cannot be added to the sum
I can skip as many elements as necessary to make the sum maximal
My algorithm was this:
I take the max element of the array and add it to the sum
I set that element and its neighboring elements to -1
I keep doing this until it's not possible to find any more max elements
The problem is that for some specific test cases this algorithm is wrong.
Let's see this one: 15 40 45 35
According to my algorithm:
I take 45 and set its neighbors to -1
The program ends
The correct way to do it is 15 + 35 = 50
This problem can be solved with dynamic programming.
Let A be the array, and let DP[m] be the max sum over {A[1]..A[m]}.
Every element of A has only two states: added to the sum or not. Suppose we have already determined DP[1]..DP[m-1], and now look at {A[1]..A[m]}. If A[m] is added, then A[m-1] and A[m-2] can't be added, so in that case the max sum is A[m] + DP[m-3] (DP[m-3] is already the max sum over {A[1]..A[m-3]}). If A[m] is not added, the max sum is DP[m-1]. So we just need to compare A[m] + DP[m-3] with DP[m-1]; the bigger one is DP[m]. The reasoning is the same as in mathematical induction.
So the DP equation is DP[m] = max{ DP[m-3] + A[m], DP[m-1] }, and DP[size(A)] is the result.
The complexity is O(n); pseudocode follows:
DP[1] = A[1];
DP[2] = max(A[1], A[2]);
DP[3] = max(A[1], A[2], A[3]);
for (i = 4; i <= size(A); i++) {
    DP[i] = DP[i-3] + A[i];
    if (DP[i] < DP[i-1])
        DP[i] = DP[i-1];
}
It's solvable with a dynamic programming approach, taking O(N) time and O(N) space. Implementation follows:
// assumes <algorithm>, <iostream>, using namespace std, and global arrays A, ret, visited of size N
int max_sum(int pos){
    if( pos >= N){ // N = array_size
        return 0;
    }
    if( visited[pos] ){ // if this state is already checked
        return ret[pos]; // ret[i] contains the result for A[i..N-1]
    }
    ret[pos] = A[pos] + max_sum(pos+3); // taking this item (and skipping the next two)
    ret[pos] = max(ret[pos], max_sum(pos+1)); // if skipping this item is better
    visited[pos] = true;
    return ret[pos];
}
int main(){
    // clear the visited array
    // and other initializations
    cout << max_sum(0) << endl;
}
The above problem is the maximum-weight independent set problem (with a twist) on a path graph, which has a dynamic programming solution in O(N).
Recurrence relation for solving it:
Max(N) = maximum(Max(N-3) + A[N], Max(N-1))
Explanation: if we have to select the maximum set from N elements, then we can either select the Nth element together with the maximum set from the first N-3 elements, or select the maximum from the first N-1 elements, excluding the Nth element.
Pseudocode:
Max(1) = A[1];
Max(2) = maximum(A[1],A[2]);
Max(3) = maximum(A[3],Max(2));
for(i=4;i<=N;i++) {
    Max(i) = maximum(Max(i-3)+A[i],Max(i-1));
}
As suggested, this is a dynamic programming problem.
First, some notation, Let:
A be the array, of integers, of length N
A[a..b) be the subset of A containing the elements at index a up to
but not including b (the half open interval).
M be an array such that M[k] is the specific max sum of A[0..k),
so that M[N] is the answer to our original problem.
We can describe an element of M (M[n]) by its relation to one or more elements of M (M[k]) where k < n. And this lends itself to a nice linear time algorithm. So what is this relationship?
The base cases are as follows:
M[0] is the max specific sum of the empty list, which must be 0.
M[1] is the max specific sum for a single element, so must be
that element: A[0].
M[2] is the max specific sum of the first two elements. With only
two elements, we can either pick the first or the second, so we better
pick the larger of the two: max(A[0], A[1]).
Now, how do we calculate M[n] if we know M[0..n)? Well, we have a choice to make:
Either we add A[n-1] (the last element in A[0..n)) or we don't. We don't know for
certain whether adding A[n-1] in will make for a larger sum, so we try both and take
the max:
If we don't add A[n-1] what would the sum be? It would be the same as the
max specific sum immediately before it: M[n-1].
If we do add A[n-1] then we can't have the previous two elements in our
solution, but we can have any elements before those. We know that M[n-1] and
M[n-2] might have used those previous two elements, but M[n-3] definitely
didn't, because it is the max in the range A[0..n-3). So we get
M[n-3] + A[n-1].
We don't know which one is bigger though, (M[n-1] or M[n-3] + A[n-1]), so to find
the max specific sum at M[n] we must take the max of those two.
So the relation becomes:
M[0] = 0
M[1] = A[0]
M[2] = max {A[0], A[1]}
M[n] = max {M[n-1], M[n-3] + A[n-1]} where n > 2
Note a lot of answers seem to ignore the case for the empty list, but it is
definitely a valid input, so should be accounted for.
The simple translation of the solution in C++ is as follows:
(Take special note of the fact that the size of m is one bigger than the size of a)
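#include <algorithm>
#include <cstddef>
#include <vector>

int max_specific_sum(const std::vector<int>& a)
{
    std::vector<int> m( a.size() + 1 );
    m[0] = 0;                                        // empty list
    if (a.size() >= 1) m[1] = a[0];                  // one element
    if (a.size() >= 2) m[2] = std::max(a[0], a[1]);  // two elements
    for(std::size_t i = 3; i <= a.size(); ++i)
        m[i] = std::max(m[i-1], m[i-3] + a[i-1]);
    return m.back();
}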
But this implementation has a linear space requirement in the size of A. If you look at the definition of M[n], you will see that it only relies on M[n-1] and M[n-3] (not the whole preceding list of elements), which means you only need to store the previous 3 elements of M, resulting in a constant space requirement. (The details of this implementation are left to the OP.)
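One way the constant-space version mentioned above could look, sketched in Python (the function name is illustrative). It follows the relation M[n] = max{M[n-1], M[n-3] + A[n-1]} with the same base cases, including the empty list, but keeps only the last three values of M in local variables.

```python
def max_specific_sum(a):
    """Max sum of elements of a, where any two chosen elements are at least 3 positions apart."""
    n = len(a)
    if n == 0:
        return 0          # M[0]: the empty list
    if n == 1:
        return a[0]       # M[1]
    # Before the loop these hold M[0], M[1], M[2]; afterwards M[n-2], M[n-1], M[n].
    m_3, m_2, m_1 = 0, a[0], max(a[0], a[1])
    for i in range(3, n + 1):
        # M[i] = max(M[i-1], M[i-3] + A[i-1]); the right side uses the old values
        m_3, m_2, m_1 = m_2, m_1, max(m_1, m_3 + a[i - 1])
    return m_1
```

For the question's examples this gives 23 for 7 3 1 1 6 13 8 3 3 (7 + 13 + 3) and 50 for 15 40 45 35.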

Calculate majority element in an array

Last week I had an interview. I was given the following question:
Given an array of 2n elements, of which n elements are the same and the remaining n are all different, find the element that repeats n times.
There is no restriction on the range of the elements.
Can someone please give me an efficient algorithm to solve this?
"Array of 2n elements is given, and out of this n elements are same, and remaining are all different. Find the element that repeats n time."
This can be done in O(n) with the following algorithm:
1) Iterate over the array, checking to see if any elements [i] and [i+1] are the same.
2) Iterate over the array, checking to see if any elements [i] and [i+2] are the same.
3) If n = 2 (and thus length = 4), check if 0 and 3 are the same.
Explanation:
Call the matching elements m and the non-matching elements r.
For n = 2, we can construct mmrr, mrmr and mrrm - so we must check for gap size 0, 1 and the only place we can have gap size 2.
For n > 2, we cannot construct the array with no gaps of size 0 or 1. For example for n = 3, you have to start like this: mrrmr... but then you must place an m. Similarly for n = 4, mrrmrrmm - having no gaps of size 0 or 1 would require ms to be outnumbered by rs by more and more as n increases. Proving this is easy.
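A Python sketch of the gap-checking algorithm above (the function name is illustrative). It assumes the stated preconditions (2n elements with n >= 2, one value repeated n times, all other values distinct) and raises an error otherwise. It uses O(1) extra space.

```python
def find_repeated_by_gaps(arr):
    """Return the value that fills half of arr, using only the distance checks above."""
    size = len(arr)
    for gap in (1, 2):  # compare [i] with [i+1], then [i] with [i+2]
        for i in range(size - gap):
            if arr[i] == arr[i + gap]:
                return arr[i]
    if size == 4 and arr[0] == arr[3]:  # the m r r m layout for n = 2
        return arr[0]
    raise ValueError("input does not match the preconditions")
```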
You just need to find two elements that are the same.
One idea would be:
Get one element from the 2n elements.
If it is not in the Set, put it in.
Repeat until you find one that is already in the Set.
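The set idea above as a short Python sketch (the function name is illustrative). Under the problem's preconditions the first value seen twice must be the repeated one, since all other values are distinct. It takes O(n) expected time but also O(n) extra space.

```python
def find_repeated_with_set(values):
    """Return the first value seen twice, or None if nothing repeats."""
    seen = set()
    for x in values:
        if x in seen:
            return x
        seen.add(x)
    return None  # no repeats: the input violates the preconditions
```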
Well, if complexity doesn't matter, one naive way would be to use two nested loops, which is O(n^2) in the worst case.
for(int i = 0; i < array.size(); i++){
    for(int j = i + 1; j < array.size(); j++){
        if(array[i] == array[j]){
            // element found
        }
    }
}
If the first four elements are all distinct, then the array must contain a consecutive pair of the target element...
#include <stdexcept>

// size = number of elements in A (2n in the problem statement, with n >= 2)
int find(const int A[], int size)
{
    // check the first four elements (6 comparisons = O(1))
    for (int i = 0; i < 4; i++)
        for (int j = i+1; j < 4; j++)
            if (A[i] == A[j])
                return A[i];
    // find the consecutive pair (size-4 iterations = O(n))
    for (int i = 3; i < size-1; i++)
        if (A[i] == A[i+1])
            return A[i];
    // unreachable if the input matches the preconditions
    throw std::invalid_argument("input does not match the preconditions");
}
This is optimally O(n) time with a single pass and O(1) space.
If you find any element twice, that is the element, as the question says: an array of 2n elements is given, and out of these, n elements are the same and the remaining are all different. Find the element that repeats n times.
You have an array of 2n elements where half are the same and the rest are different, so consider the following cases:
ARRAY[2n] = {N,10,N,878,85778,N......};
or
Array[2n] = {10,N,10,N,44,N......};
and so on. Now a simple check in a for loop, like:
if(ARRAY[i] == ARRAY[i+1])
{
    //Your similar element :)
}
I think that the problem should be "find the element that appears at least n+1 times"; if an element appears only n times, there could be two such elements.
Assuming that there is such an element in the input, the following algorithm can be used.
input array of 2*n elements;
int candidate = input[0];
int count = 1;
for (int i = 1; i < 2*n; ++i) {
    if (input[i] == candidate) {
        count++;
    } else {
        count--;
        if (count == 0) {
            candidate = input[i]; // switch candidate; it has been seen once
            count = 1;
        }
    }
}
return candidate;
If the task is to determine whether some element is present n+1 times, another traversal is required to check whether the candidate found in the previous step really appears n + 1 times.
Edit:
It has been suggested that the n elements with the same value are contiguous.
If this is the case, just use the above algorithm and stop when count reaches n.
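For comparison, a Python sketch of the voting algorithm in its common textbook form (a new candidate is adopted whenever the count is zero), followed by the second traversal mentioned above. It returns the value occurring more than half the time, or None; the function name is illustrative.

```python
def majority_element(values):
    """Boyer-Moore voting plus a verification pass."""
    candidate, count = None, 0
    for x in values:
        if count == 0:
            candidate, count = x, 1  # start a new candidate
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Second traversal: voting only yields a candidate, so confirm it.
    if values and values.count(candidate) > len(values) // 2:
        return candidate
    return None
```

For 2n elements, "more than half" means at least n + 1 occurrences, matching the reformulated problem.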

Find the maximum element which is common in two arrays?

Given two arrays, how to find the maximum element which is common to both the arrays?
I was thinking of sorting both arrays (n log n) and then performing a binary search for every element of one sorted array (starting from the largest) in the other array until a match is found.
E.g.:
a = [1,2,5,4,3]
b = [9,8,3]
The maximum common element of these arrays is 3.
Can we do better than n log n?
With some extra space, you could hash one array, then do a contains check on each element of the other array, keeping track of the biggest value that returns true. That would be O(n).
You can, but using O(N) space.
Just go through the first array and place all elements in a HashTable. This is O(N).
Then go through the second array, keeping track of the current maximum and checking whether each element is in the HashTable. This is also O(N).
So the total runtime is O(N), with O(N) extra space for the HashTable.
Example in Java:
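// requires java.util.HashSet and java.util.Set
public static int getMaxCommon(int[] a, int[] b){
    // Arrays.asList(a) would give a List<int[]> for an int[], so add the values one by one
    Set<Integer> firstArray = new HashSet<Integer>();
    for(int x : a){
        firstArray.add(x);
    }
    int currentMax = Integer.MIN_VALUE; // stays MIN_VALUE if nothing is common
    for(int n : b){
        if(firstArray.contains(n) && currentMax < n){
            currentMax = n;
        }
    }
    return currentMax;
}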
While it depends on the time complexities of the various operations in specific languages, how about creating sets from the arrays and finding the maximum value in the intersection of the two sets? Going by the time complexities for operations in Python, it'd be, on average, O(n) for the set assignments, O(n) for the intersections, and O(n) for finding the max value. So average case would be O(n).
However! Worst-case would be O(len(a) * len(b)) -> O(n^2), because of the worst-case time complexity of set intersections.
More info here, if you're interested: http://wiki.python.org/moin/TimeComplexity
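The set-intersection idea above in Python (the function name is illustrative). The average cost is linear in the combined length, with the worst-case caveat noted above.

```python
def max_common(a, b):
    """Largest value present in both a and b, or None if they share nothing."""
    common = set(a) & set(b)
    return max(common) if common else None
```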
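If you already know the range of numbers that could be in your arrays, you could perform a counting sort and then do the binary search like you wanted. That makes the sorting step O(n + k) for a range of size k; the binary searches still cost O(n log n) unless you replace them with a linear scan from the top of both sorted arrays.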
Pseudocode:
sort list1 in descending order
sort list2 in descending order
item *p1 = list1
item *p2 = list2
while ((neither list is exhausted) && (*p1 != *p2))
    if (*p1 > *p2)
        ++p1;
    else
        ++p2;
// here, either we have *p1 == *p2, or we hit the end of one of the lists
if (neither list is exhausted)
    return *p1;
return NOT_FOUND;
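A runnable Python version of the pseudocode above, using list indices instead of pointers (names are illustrative). Checking the bounds in the loop condition avoids reading past the end of either list.

```python
def max_common_sorted(list1, list2):
    """Walk both lists from the largest value downward; the first match is the answer."""
    a = sorted(list1, reverse=True)
    b = sorted(list2, reverse=True)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return a[i]  # first match from the top is the maximum common element
        if a[i] > b[j]:
            i += 1       # a[i] is larger than everything left in b
        else:
            j += 1
    return None
```

The sorts dominate at O(n log n); the scan itself is linear.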
Not perfect, but a simple solution, O(len(array1) + len(array2)):
import sys

def find_max_in_common(array1, array2):
    array1 = set(array1)
    array2 = set(array2)
    item_lookup = {}
    for item in array1:
        item_lookup[item] = True
    max_item = -sys.maxsize
    intersection = False
    for item in array2:
        if not item_lookup.get(item, None):
            continue
        else:
            intersection = True
            if item > max_item:
                max_item = item
    return None if not intersection else max_item
Solution using binary search (and invariants to "prove" correctness):
from bisect import bisect_left

def findLargestCommon(nums0, nums1):
    # Find the largest common element of two sorted lists
    if not nums0 or not nums1:
        return None
    i = len(nums0) - 1
    j = len(nums1) - 1
    if nums0[i] == nums1[j]:
        return nums0[i]
    elif nums0[i] > nums1[j]:
        nums0, nums1 = nums1, nums0
        i, j = j, i
    while i >= 0 and j > 0:
        # nums0[i] < nums1[j]
        # look for nums0[i] in nums1[:j]
        jj = bisect_left(nums1, nums0[i], hi=j)
        if jj == j:
            # nums1[j-1] < nums0[i] < nums1[j]
            j -= 1
            # nums1[j] < nums0[i]
            nums0, nums1 = nums1, nums0
            i, j = j, i
            # nums0[i] < nums1[j]
        # (jj != j) and nums1[jj] >= nums0[i]
        elif nums1[jj] == nums0[i]:
            return nums0[i]
        else:  # nums1[jj] > nums0[i]
            j = jj
            # nums0[i] < nums1[j]
    return None

array- having some issues [duplicate]

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the data structure can be mutated in place and supports random access, then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially, and for every index write the value at that index to the index specified by the value, recursively placing any value found at that location into its own place and throwing away values >= N. Then go through the array again looking for the first spot where the value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values' worth of temporary space.
def smallest_missing(array):
    N = len(array)
    # Pass 1, move every value to the position of its value
    for cursor in range(N):
        target = array[cursor]
        while target < N and target != array[target]:
            new_target = array[target]
            array[target] = target
            target = new_target
    # Pass 2, find the first location where the index doesn't match the value
    for cursor in range(N):
        if array[cursor] != cursor:
            return cursor
    return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; let's say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
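A Python sketch of the four steps above (the function name is illustrative). It uses a bytearray, so each flag takes a byte rather than a bit; a real bitset would be about 8 times smaller but would work the same way.

```python
def smallest_missing_bitmap(values):
    """Smallest non-negative integer not in `values`, using N flags for a list of N numbers."""
    n = len(values)
    present = bytearray(n)  # one flag per candidate 0..n-1, all initially 0 (false)
    for x in values:
        if 0 <= x < n:      # values >= n can never be the answer
            present[x] = 1
    for i, flag in enumerate(present):
        if not flag:
            return i
    return n                # values is a permutation of 0..n-1
```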
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number reduces to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
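A Python sketch of the partitioning idea. This is a variant rather than the answer's exact scheme: instead of a QuickSort pivot taken from the data, it splits the remaining candidates at the midpoint b of the range they could fill, which guarantees the work halves each round (O(N) total). Duplicates are removed up front with a set, as the answer's first step suggests. If the lower part [lo, b) is full, the gap must be at b or above; otherwise it is below b. The function name is illustrative, and non-negative integer input is assumed.

```python
def smallest_missing_partition(values):
    """Smallest non-negative integer not in `values` (assumed non-negative integers)."""
    xs = list(set(values))  # distinct values only
    lo = 0                  # invariant: 0..lo-1 are all present and every x in xs is >= lo
    while xs:
        # The answer lies in [lo, lo + len(xs)], so split that range roughly in half.
        b = lo + 1 + len(xs) // 2
        lower = [x for x in xs if x < b]
        if len(lower) == b - lo:
            # [lo, b) is completely filled, so the gap is at b or above.
            xs = [x for x in xs if x >= b]
            lo = b
        else:
            # Some value in [lo, b) is missing; values >= b no longer matter.
            xs = lower
    return lo
```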
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
If the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
If all values are distinct, there is a space-efficient method that runs in O(k) space and O(k*log(N)*N) time. It's space efficient, there's no data moving, and all operations are elementary (adding and subtracting).
1. set U = N; L = 0
2. Partition the number space into k regions, like this:
   0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
3. Find how many numbers (count{i}) are in each region. (N*k steps)
4. Find the first region (h) that isn't full, i.e. count{h} < upper_limit{h}. (k steps)
5. If h - count{h-1} = 1, you've got your answer.
6. Set U = count{h}; L = count{h-1}
7. goto 2
This can be improved using hashing (thanks to Nic for this idea).
1. same as above
2. Partition the number space into k regions, like this:
   L + (i/k)*(U-L) -> L + ((i+1)/k)*(U-L)
3. inc count{j} using j = (number - L)/k (if L < number < U)
4. find the first region (h) that doesn't have k elements in it
5. if count{h} = 1, h is your answer
6. set U = maximum value in region h; L = minimum value in region h
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] > list[i-1] + 1:   // '>' so that duplicates are not mistaken for gaps
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all (bitmask)
    for i = 0 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are stored, run a counter up from 0 until we find the lowest number that isn't in the table. A reasonably good hash table stores and retrieves in (expected) constant time.
// requires: import java.util.HashSet; import java.util.Set;
static long smallestMissing(long[] X) {
    Set<Long> hashtable = new HashSet<>();
    for (long i : X)                  // one scan: Θ(n)
        hashtable.add(i);             // O(1) expected
    long low = 0;
    while (hashtable.contains(low))   // at most n+1 times
        low++;
    return low;
}
The worst case is when the n elements of the array are exactly {0, 1, ..., n-1}; the answer is then found at n, which still keeps it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0
    while target < 2**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        # record the first target that wasn't found, but keep looping anyway
        if not target_found and the_answer is None:
            the_answer = target
        target += 1
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bounded by the size of the answer, which is bounded by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, take the square root of the size of the list. For a 1GB list, that's N = 11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bitmap equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
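Here is a rough Python sketch of the two-pass bucket-plus-bitmap idea. Assumptions (mine, not the poster's): the values are distinct, and instead of taking the square root of each number as its hash, it uses integer division by a bucket width of about sqrt(n), in the spirit of the footnote's suggestion to vary the bucket scheme. `smallest_missing_two_pass` is a hypothetical name. Because the answer is at most n, values above n are ignored.

```python
import math

def smallest_missing_two_pass(values):
    """Smallest non-negative integer missing from values (assumed distinct)."""
    n = len(values)
    size = max(1, math.isqrt(n))        # bucket width, about sqrt(n)
    num_buckets = n // size + 1         # enough buckets to cover 0..n
    counts = [0] * num_buckets

    # Pass 1: the answer is at most n, so only values in 0..n matter.
    for x in values:
        if 0 <= x <= n:
            counts[x // size] += 1

    # The first bucket with fewer values than its width holds the answer.
    # With distinct values such a bucket always exists (n values, n+1 slots).
    b = next(i for i, c in enumerate(counts) if c < size)
    base = b * size

    # Pass 2: a bitmap covering just that bucket.
    present = [False] * size
    for x in values:
        if base <= x < base + size:
            present[x - base] = True
    return base + present.index(False)
```

Both passes are linear, and the extra memory is the count array plus one bucket's bitmap, roughly 2*sqrt(n) entries.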
Well, if there is only one missing number in a list of numbers, the easiest way to find it is to sum the series and subtract each value in the list; the final value is the missing number. For the more general case, the C# below places each value v in the range 1..n at index v-1, then scans for the first index that doesn't hold its own value:
using System;

static class MissingNumber
{
    static void PrintSmallestMissingPositive(int[] arr)
    {
        int i = 0;
        while (i < arr.Length)
        {
            int v = arr[i];
            // Swap v into slot v-1 if it is in 1..n and that slot doesn't already
            // hold v (the second check stops an endless loop on duplicates)
            if (v >= 1 && v <= arr.Length && arr[v - 1] != v)
            {
                arr[i] = arr[v - 1];
                arr[v - 1] = v;
            }
            else
            {
                i++;
            }
        }
        // The first slot that doesn't hold its own value gives the answer
        for (int j = 0; j < arr.Length; j++)
        {
            if (arr[j] != j + 1)
            {
                Console.WriteLine(j + 1);
                return;
            }
        }
        Console.WriteLine("Not Found !!");
    }
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array, skipping duplicates, zeros, and negative numbers while summing up the rest; also track the maximum positive number, and keep the unique positive numbers in a Map.
2- Compute the complete sum as max * (max+1)/2.
3- Find the difference between the sums calculated in steps 1 & 2.
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
5- If no number is missing, return max + 1.
// requires: import java.util.HashMap; import java.util.Map;
public static int solution(int[] A) {
    if (A == null || A.length == 0) {
        throw new IllegalArgumentException();
    }
    long sum = 0;
    Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
    int max = 0;
    for (int i = 0; i < A.length; i++) {
        if (A[i] <= 0) {                          // skip zeros and negatives
            continue;
        }
        if (uniqueNumbers.get(A[i]) != null) {    // skip duplicates
            continue;
        }
        if (A[i] > max) {
            max = A[i];
        }
        uniqueNumbers.put(A[i], true);
        sum += A[i];
    }
    // long arithmetic avoids overflow for a large max
    long completeSum = ((long) max * ((long) max + 1)) / 2;
    for (int j = 1; j <= Math.min(completeSum - sum, (long) max); j++) {
        if (uniqueNumbers.get(j) == null) { //O(1)
            return j;
        }
    }
    // Nothing missing in 1..max, or no positive numbers at all (max == 0)
    return max + 1;
}
As Stephen C smartly pointed out, the answer cannot be larger than the length of the array. I would then find the answer by binary search. This optimizes for the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out that you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
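The search predicate isn't spelled out above; one way to make it concrete (my interpretation, not necessarily what the poster meant) is to binary-search the answer m in [0, n] and, for each probe, count how many values fall in [0, m). Assuming the values are distinct, exactly m of them lie in that range if and only if 0..m-1 are all present. `smallest_missing_binary_search` is a hypothetical name.

```python
def smallest_missing_binary_search(values):
    """Smallest non-negative integer missing from values (assumed distinct)."""
    lo, hi = 0, len(values)            # the answer is somewhere in 0..n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # With distinct values, exactly mid of them lie in [0, mid)
        # if and only if 0..mid-1 are all present.
        if sum(1 for x in values if 0 <= x < mid) == mid:
            lo = mid                   # answer is at least mid
        else:
            hi = mid - 1               # something below mid is missing
    return lo
```

Each probe is one linear scan, so this takes O(n log n) time and doesn't modify the array.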
I like the "guess zero" approach. If the numbers were random, zero is very likely to be missing. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N steps, where n is the answer and can be as large as N, but in practice n is very likely to be a small number (e.g. 1).
I am not sure I understood the question, but if for the list 1,2,3,5,6 the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, the second part should actually be replaced by sum(list), which is where the O(n) comes from. The formula reveals the idea behind it: for the n sequential integers 1..n, the sum should be (n+1)*n/2. If there is a missing number, the sum of the list equals the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out that I had left some intermediate steps in my head.
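The formula from the edit, as a short Python sketch, assuming the list holds 1..n+1 with exactly one number missing (the function name is hypothetical):

```python
def missing_from_sequence(values):
    """The one number missing from 1..n+1, given the other n numbers."""
    n = len(values)
    return (n + 1) * (n + 2) // 2 - sum(values)   # sum of 1..n+1, minus the list
```

For 1,2,3,5,6: n = 5, (n+1)(n+2)/2 = 21, the list sums to 17, so the missing number is 4.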
Well done, Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein to yours:
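#include <stddef.h>

typedef long long numerictype_t;  /* assumed: any signed integer type */
#define SWAP(x,y) do { numerictype_t tmp = (x); (x) = (y); (y) = tmp; } while (0)

/* Smallest non-negative value not in a[0..n-1]; reorders a in place. */
long long minNonNegativeNotInArr (numerictype_t * a, size_t n) {
    long long m = (long long) n;
    for (long long i = 0; i < m;) {
        /* a[i] can't belong to a permutation of i..m-1: drop it past m */
        if (a[i] >= m || a[i] < i || (a[i] != i && a[i] == a[a[i]])) {
            m--;
            SWAP (a[i], a[m]);
            continue;
        }
        /* i < a[i] < m: move the value into its own slot */
        if (a[i] > i) {
            numerictype_t t = a[i];  /* copy first: SWAP(a[i], a[a[i]]) would re-read the changed a[i] */
            SWAP (a[i], a[t]);
            continue;
        }
        i++;  /* a[i] == i */
    }
    return m;
}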
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m, or if a[i] < i, or if a[i] != i and a[i] == a[a[i]] (a duplicate), we know that m is the wrong output and must be at least one lower. So by decrementing m and swapping a[i] with a[m], we can recurse.
If this is not true but a[i] > i, then, knowing that a[i] != a[a[i]], we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i, in which case we can increment i, knowing that all the values up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires precondition states that the value of each item must lie within the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
    # zero can't be the answer; it is kept but ignored since the search starts at 1
    A = [x for x in A if x >= 0]
    if len(A) == 0:
        return 1
    maxi = max(max(A), len(A))
    target = ['X' for x in range(maxi + 1)]  # 'X' marks a value not seen
    for number in A:
        target[number] = number
    count = 1
    while count < maxi + 1:
        if target[count] == 'X':
            return count
        count += 1
    return target[count - 1] + 1
Got 100% for the above solution.
1) Filter out negatives and zero
2) Sort and remove duplicates
3) Walk through the array
Complexity: O(N * log(N)), dominated by the sort.
Using Java 8:
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers; then we can iterate from 1 to the size of the unordered_set and return the first number that does not occur.
#include <unordered_set>
#include <vector>
using namespace std;

int firstMissingPositive(vector<int>& nums) {
    unordered_set<int> fre;
    // store each positive number in a hash set
    for (int x : nums) {
        if (x > 0)
            fre.insert(x);
    }
    // the answer is at most fre.size() + 1, so only 1..fre.size() need checking
    int n = static_cast<int>(fre.size());
    for (int i = 1; i <= n; ++i) {
        if (fre.find(i) == fre.end())
            return i;
    }
    return n + 1;
}
A solution in basic JavaScript:
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
  for (var i = 1; i <= a.length; i++) {
    // look for i anywhere in the array
    var found = false;
    for (var j = 0; j < a.length; j++) {
      if (a[j] === i) {
        found = true;
        break;
      }
    }
    if (!found) {
      return i;
    }
  }
  // every value 1..a.length is present
  return a.length + 1;
}
console.log(findSmallest(a)); // 5
Hope this helps someone.
In Python; not the most efficient, but correct:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
    A.sort()
    j = 1  # smallest positive integer not seen so far
    for elem in A:
        if j < elem:
            break      # gap: j is missing
        elif j == elem:
            j += 1     # j is present; look for the next one
    return j
This can help:
0- A is [5, 3, 2, 7];
1- Define B with length = A.Length; (O(1))
2- Initialize the cells of B with 1; (O(n))
3- For each item in A:
if (0 <= item < B.Length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1, or B.Length if there is none (O(n))
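Steps 1 to 4 translate fairly directly into Python. This is a sketch that reads step 3 as marking every item that falls inside B's index range; `smallest_missing_marked` is a hypothetical name:

```python
def smallest_missing_marked(A):
    B = [1] * len(A)                  # steps 1-2: len(A) cells, all set to 1
    for item in A:                    # step 3: mark values that index into B
        if 0 <= item < len(B):
            B[item] = -1
    for index, cell in enumerate(B):  # step 4: first unmarked index
        if cell != -1:
            return index
    return len(B)                     # every index 0..len(A)-1 was marked
```

For A = [5, 3, 2, 7], B has four cells and only indices 2 and 3 get marked, so the answer is 0.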