Avoiding duplications when comparing array elements - algorithm

The question is something like: Go through an array and find pairs of elements that add up to a certain sum k.
for (auto i : array) {
for (auto j : array) {
if (i+j==k) {
*Do something
}
}
}
Say we had array = [1,2,5] and k=3; when i=1 and j=2, we would execute the Do something. But when i=2 and j=1, we would execute Do something again, even though we have already found 2 elements and we would be repeating the answer.
Essentially, how can one go through an array and avoid comparing the same 2 elements multiple times?

If order doesn't matter, then decide which half of all pairs you want: the one where i >= j, or the one where i <= j.
for (i=0...)
for (j=i...)
Like rici commented, this is a very slow (O(N^2)) algorithm compared to sorting (a copy of) the vector. I think then you could iterate i forwards from the start, and j backwards from the end, comparing i+j <= k to decide whether to modify i or j next.

If I am not getting your question wrong then this explained procedure could be a way to solve it.
Sort your array in increasing order ( Preferably merge sort because it takes O(n.log(n)) time ).
Now, take two indices let call them low and high, where low is index to the left-most smallest number of the sorted array and high is index to the right-most largest number of the same sorted array.
Now, do like this :
while(low < high) {
if(arr[low] + arr[high] == k) {
print "unique pair found"
high = high - 1;
low = low + 1;
}
else if(arr[low] + arr[high] > k){
high = high - 1;
}
else {
low = low + 1;
}
}
It's time complexity is O(n), it will definitely give what you are looking for ? (Any doubt most welcome).

You could put the results of i + j in a vector, then sort it and then remove duplicates, and finally go over the vector and "do something".

Related

Comparing between two arrays

How can I compare between two arrays with sorted contents of integer in binary algorithm?
As in every case: it depends.
Assuming that the arrays are ordered or hashed the time complexity is at most O(n+m).
You did not mention any language, so it's pseudo code.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{ while (Enumerator1 is not at the end and Enumerator2 is not at the end)
{ if (Enumerator1.current > Enumerator2.current)
Enumerator2.fetchNext()
else if (Enumerator2.current > Enumerator1.current)
Enumerator1.fetchNext()
else
return true
}
return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
However, this is not always the fastest way.
If one of the arrays have significantly different size it could be more efficient to use binary search for a few elements of the elements of the shorter array.
This can be even more improved because when you start with the median element of the small array you need not do a full search for any further element. Any element before the median element must be in the range before the location where the median element was not found and any element after the median element must be in the upper range of the large array. This can be applied recursively until all elements have been located. Once you get a hit, you can abort.
The disadvantage of this method is that it takes more time in worst case, i.e. O(n log m), and it requires random access to the arrays which might impact cache efficiency.
On the other side, multiplying with a small number (log m) could be better than adding a large number (m). In contrast to the above algorithm typically only a few elements of the large array have to be accessed.
The break even is approximately when log m is less than m/n, where n is the smaller number.
You think that's it? - no
In case the random access to the larger array causes a higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse, i.e. look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up much more elements.
Answer:
No, there are no more lookups. As soon as the boundaries where you expect a range of elements of the large array collapses you can stop searching for these elements since you won't find any hits anymore.
In fact the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against different elements of the small array in the first step. This takes only one slow access for a bunch of comparisons while the other way around you need to access the same element several times with some other elements accesses in between.
So there are less slow accesses at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)
If you know that they are sorted, then you can have a pointer to the beginning of each array, and move on both arrays, and moving one of the pointers up (or down) after each comparison. That would make it O(n). Not sure you could bisect anything as you don't know where the common number would be.
Still better than the brute force O(n2).
If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.
This can be done in two ways.
a) Binary Search b) Linear Search
For Binary Search - for each element in array A look for element in B with binary search, then in that case the complexity is O(n log n )
For Linear Search - it is O( m + n ) - where m, n are sizes of the arrays. In your case m = n.
Linear search :
Have two indices i, j that point to the arrays A, B
Compare A[i], B[j]
If A[i] < B[j] then increment i, because any match if exists can be found only in later indices in A.
If A[i] > B[j] then increment j, because any match if exists can be found only in later indices in B.
If A[i] == B[j] you found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
for ( int i = 0, j = 0; i < A.length && j < B.length; ) {
if ( A[i] < B[j] ) {
i++;
} else if ( A[i] > B[j] ) {
j++;
}
return A[i];
}
return -1; //Assuming all integers are positive.
}
Now if you have both descending, just reverse the comparison signs i.e. if A[i] < B[j] increment j else increment i
If you have one descending (B) one ascending (A) then i for A starts from beginning of array and j for B starts from end of array and move them accordingly as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
if ( A[i] < B[j] ) {
i++;
} else if ( A[i] > B[j] ) {
j--;
}
return A[i];
}

Give an analyze an O(n) algorithm to check for nuts/bolts matches in sorted arrays

I have a collection of n nuts, and a collection of n bolts, both arranged in increasing order of size. I need to give an O(n) algorithm to check if
a nut and bolt have the same size. Nut sizes are in an array NUTS[1..n]
bolt sizes are in an array BOLTS[1...n]
I don't need to find all matches, just stop the algorithm even if one match is found. Here what my psuedocode looks like so far
for each n in NUTS
for each b in BOLTS
if BOLTS[i] == NUTS[i]
break;
So for each nut, I'm searching through all the bolts. This is O(n^2), How would I make this O(n)? My understanding is I would probably need to have just one for loop to do this. Sorry my understanding is a little shaky
You can step through the two arrays in the same loop, but using two separate index variables. You always increment the index variable for which the array value is smaller. If neither is smaller, they're equal and you've found a match.
i = 0;
j = 0;
while (i < n and j < n) {
if (NUTS[i] < BOLTS[j]) {
i++;
} else if (NUTS[i] > BOLTS[j]) {
j++;
} else {
return true; // found a match
}
}
return false; // found nothing
The best case is when NUTS[0] == BOLTS[0], because then you enter the final else clause and return true immediately.
The worst case is when you never enter the final else clause, because then you have to iterate the loop all the way to the end. This happens if you always take the first or second if clause, incrementing either i or j. In the worst case you alternate between incrementing i and j, making both of them grow as slowly as possible. This leads to a worst case of 2*n steps before at least one variable exceeds n.
Therefore this algorithm is in O(2*n), which is O(n).
The pseudo-code shared will take O(n^2) in worst case, which is when there is no matching pair.
However you can speed this up to O(nlogn) by using binary search, as the arrays are in sorted order.
foreach A in nuts:
if binarysearch ( Bolts, Bolts+n , A)
break;
Also you can speed this upto O(n) as well using two pointers one in each array and increment them accordingly.
Bolt_index = Nut_index = 0
while ( Bolt_Index < Bolts.size && Nut_Index < Nuts.Size)
while( Bolt_Index < Bolts.Size && Bolts[Bolt_index] < Nuts[Nut_Index] )
Bolt_index++ ;
while( Nut_Index < Nuts.Size && Nuts[Nut_index] < Bolts[Bolt_Index] )
Nut_index++ ;
if Bolt_Index< Bolts.Size && Nut_Index < Nuts.Size && Bolts[Bolt_Index] == Nuts[Nut_Index]
Match_found// break
else
Nut_Index++ // increment either Nut_Index or Bolt_Index

implementing an algorithm that requires minimal memory

I am trying to implement an algorithm to find N-th largest element that requires minimal memory.
Example:
List of integers: 1;9;5;7;2;5. N: 2
after duplicates are removed, the list becomes 1;9;5;7;2.
So, the answer is 7, because 7 is the 2-nd largest element in the modified list.
In the below algorithm i am using bubble sort to sort my list and then removing duplicates without using a temp variable, does that make my program memory efficient ? Any ideas or suggestion
type Integer_array is Array (Natural range <>) of Integer;
procedure FindN-thLargestNumber (A : in out Integer_Array) is
b : Integer;
c:Integer;
begin
//sorting of array
for I in 0 to length(A) loop
for J in 1 to length(A) loop
if A[I] > A[J] then
A[I] = A[I] + A[J];
A[J] = A[I] - A[J];
A[I] = A[I] - A[J];
end if;
end loop;
end loop;
//remove duplicates
for K in 1 to length(A) loop
IF A[b] != A[K] then
b++;
A[b]=A[K];
end loop;
c = ENTER TO FIND N-th Largest number
PRINT A[b-(c-1)] ;
end FindN-th Largest Number
To find the n'th largest element you don't need to sort the main list at all. Note that this algorithm will perform well if N is smaller than M. If N is a large fraction of the list size then then you will be better off just sorting the list.
You just need a sorted list holding your N largest, then pick the smallest from that list (this is all untested so will probably need a couple of tweaks):
int[n] found = new int[n];
for (int i = 0;i<found.length;i++) {
found[i] = Integer.MIN_VALUE;
}
for (int i: list) {
if (i > found[0]) {
int insert = 0;
// Find the point in the array to insert the value
while (insert < found.length && found[insert] < i) {
insert++;
}
// If not at the end we have found a larger value, so move back one before inserting
if (found[insert] >= i) {
insert --;
}
// insert the value and shuffle everything BELOW it down.
for (int j=insert;j<=0;j--) {
int temp = found[j];
found[j]=i;
i=temp;
}
}
}
At the end you have the top N values from your list sorted in order. the first entry in the list is Nth value, the last entry the top value.
If you need the N-th largest element, then you don't need to sort the complete array. You should apply selection sort, but only for the required N steps.
Instead of using bubble sort, use quicksort kind of partial sorting.
Pick a key and using as a pivot move around elements (move all the elements>= pivot to the left of the array)
Count how many unique elements are there that are greater than equal to pivot.
If the number is less than N, then the answer is to the right of the array. Otherwise it is in the left part of the array (left or right as compared to pivot)
Iteratively repeat with smaller array and appropriate N
Complexity is O(n) and you will need constant extra memory.
HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.
It sorts in O(n log n) time which I think is optimal time complexity for this problem because of the needed to ignore duplicates. I may be wrong.
Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.
If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.

Find triplets in better than linear time such that A[n-1] >= A[n] <= A[n+1]

A sequence of numbers was given in an interview such that A[0] >= A[1] and A[N-1] >= A[N-2]. I was asked to find at-least one triplet such that A[n-1] >= A[n] <= A[n+1].
I tried to solve in iterations. Interviewer expected better than linear time solution. How should I approach this question?
Example: 9 8 5 4 3 2 6 7
Answer: 3 2 6
We can solve this in O(logn) time using divide & conquer aka. binary search. Better than linear time. So we need to find a triplet such that A[n-1] >= A[n] <= A[n+1].
First find the mid of the given array. If mid is smaller than its left and greater than its right. then return, thats your answer. Incidentally this would be a basecase in your recursion. Also if len(arr) < 3 then too return. another basecase.
Now comes the recursion scenarios. When to recurse, we would need to inspect further right. For that, If mid is greater than the element on its left then consider start to left of the array as a subproblem and recurse with this new array. i.e. in tangible terms at this point we would have ...26... with index n being 6. So we move left to see if the element to the left of 2 completes the triplet.
Otherwise if mid is greater than element on its right subarray then consider mid+1 to right of the array as a subproblem and recurse.
More Theory: The above should be sufficient to understand the problem but read on. The problem essentially boils down to finding local minima in a given set of elements. A number in the array is called local minima if it is smaller than both its left and right numbers which precisely boils down to A[n-1] >= A[n] <= A[n+1].
A given array such that its first 2 elements are decreasing and last 2 elements are increasing HAS to have a local minima. Why is that? Lets prove this by negation. If first two numbers are decreasing, and there is no local minima, that means 3rd number is less than 2nd number. otherwise 2nd number would have been local minima. Following the same logic 4th number will have to be less than 3rd number and so on and so forth. So the numbers in the array will have to be in decreasing order. Which violates the constraint of last two numbers being in increasing order. This proves by negation that there need to be a local minima.
The above theory suggests a O(n) linear approach but we definitely can do better. But the theory definitely gives us a different perspective about the problem.
Code: Here's python code (fyi - was typed in stackoverflow text editor freehand, it might misbheave).
def local_minima(arr, start, end):
mid = (start+end)/2
if mid-2 < 0 and mid+1 >= len(arr):
return -1;
if arr[mid-2] > arr[mid-1] and arr[mid-1] < arr[mid]: #found it!
return mid-1;
if arr[mid-1] > arr[mid-2]:
return local_minima(arr, start, mid);
else:
return local_minima(arr, mid, end);
Note that I just return the index of the n. To print out the triple just do -1 and +1 to the returned index. source
It sounds like what you're asking is this:
You have a sequence of numbers. It starts decreasing and continues to decrease until element n, then it starts increasing until the end of the sequence. Find n.
This is a (non-optimal) solution in linear time:
for (i = 1; i < length(A) - 1; i++)
{
if ((A[i-1] >= A[i]) && (A[i] <= A[i+1]))
return i;
}
To do better than linear time, you need to use the information that you get from the fact that the series decreases then increases.
Consider the difference between A[i] and A[i+1]. If A[i] > A[i+1], then n > i, since the values are still decreasing. If A[i] <= A[i+1], then n <= i, since the values are now increasing. In this case you need to check the difference between A[i-1] and A[i].
This is a solution in log time:
int boundUpper = length(A) - 1;
int boundLower = 1;
int i = (boundUpper + boundLower) / 2; //initial estimate
while (true)
{
if (A[i] > A[i+1])
boundLower = i + 1;
else if (A[i-1] >= A[i])
return i;
else
boundUpper = i;
i = (boundLower + boundUpper) / 2;
}
I'll leave it to you to add in the necessary error check in the case that A does not have an element satisfying the criteria.
Linear you could just do by iterating through the set, comparing them all.
You could also check the slope of the first two, then do a kind of binary chop/in order traversal comparing pairs until you find one of the opposite slope. That would amortize to a better than n time, I think, though it's not guaranteed.
edit: just realised what your ordering meant. The binary chop method is guaranteed to do this in <n time, as there is guaranteed to be a point of change (assuming that your N-1, N-2 are the last two elements of the list).
This means you just need to find it/one of them, in which case binary chop will do it in order log(n)

array- having some issues [duplicate]

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.
# Pass 1, move every value to the position of its value
for cursor in range(N):
target = array[cursor]
while target < N and target != array[target]:
new_target = array[target]
array[target] = target
target = new_target
# Pass 2, find first location where the index doesn't match the value
for cursor in range(N):
if array[cursor] != cursor:
return cursor
return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; lets say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of find the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to represent the hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
#Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 264 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space efficient method and all values are distinct you can do it in space O( k ) and time O( k*log(N)*N ). It's space efficient and there's no data moving and all operations are elementary (adding subtracting).
set U = N; L=0
First partition the number space in k regions. Like this:
0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
Find how many numbers (count{i}) are in each region. (N*k steps)
Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
if h - count{h-1} = 1 you've got your answer
set U = count{h}; L = count{h-1}
goto 2
this can be improved using hashing (thanks for Nic this idea).
same
First partition the number space in k regions. Like this:
L + (i/k)->L + (i+1/k)*(U-L)
inc count{j} using j = (number - L)/k (if L < number < U)
find first region (h) that doesn't have k elements in it
if count{h} = 1 h is your answer
set U = maximum value in region h L = minimum value in region h
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
sort(list)
if list[0] != 0:
return 0
for i = 1 to list.last:
if list[i] != list[i-1] + 1:
return list[i-1] + 1
if list[list.last] == 2^64 - 1:
assert ("No gaps")
return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
bitmask = mask_make(2^64) // might take a while :-)
mask_clear_all (bitmask)
for i = 1 to list.last:
mask_set (bitmask, list[i])
for i = 0 to 2^64 - 1:
if mask_is_clear (bitmask, i):
return i
assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X // One scan Θ(1)
hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(i) <> null) // at most n+1 times
low++;
print low;
The worst case if there are n elements in the array, and are {0, 1, ... n-1}, in which case, the answer will be obtained at n, still keeping it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
the_answer = None
target = 0L
while target < 2L**64:
target_found = False
for item in source_list:
if item == target:
target_found = True
if not target_found and the_answer is None:
the_answer = target
target += 1L
return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to it's max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while ( i < Array.Length)
{
if (Array[i] == i + 1)
{
i++;
}
if (i < Array.Length)
{
if (Array[i] <= Array.Length)
{//SWap
int temp = Array[i];
int AnoTemp = Array[temp - 1];
Array[temp - 1] = temp;
Array[i] = AnoTemp;
}
else
i++;
}
}
for (int j = 0; j < Array.Length; j++)
{
if (Array[j] > Array.Length)
{
Console.WriteLine(j + 1);
j = Array.Length;
}
else
if (j == Array.Length - 1)
Console.WriteLine("Not Found !!");
}
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
I like the "guess zero" apprach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
index = 0
target = []
A = [x for x in A if x >=0]
if len(A) ==0:
return 1
maxi = max(A)
if maxi <= len(A):
maxi = len(A)
target = ['X' for x in range(maxi+1)]
for number in A:
target[number]= number
count = 1
while count < maxi+1:
if target[count] == 'X':
return count
count +=1
return target[count-1] + 1
Got 100% for the above solution.
1)Filter negative and Zero
2)Sort/distinct
3)Visit array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
unordered_set<int> fre;
// storing each positive number in a hash.
for(int i = 0; i < nums.size(); i +=1)
{
if(nums[i] > 0)
fre.insert(nums[i]);
}
int i = 1;
// Iterating from 1 to size of the set and checking
// for the occurrence of 'i'
for(auto it = fre.begin(); it != fre.end(); ++it)
{
if(fre.find(i) == fre.end())
return i;
i +=1;
}
return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
var m = 0;
for(i=1;i<=a.length;i++) {
j=0;m=1;
while(j < a.length) {
if(i === a[j]) {
m++;
}
j++;
}
if(m === 1) {
return i;
}
}
}
console.log(findSmallest(a))
Hope this helps for someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
MIN = 0
MAX = 1000000
possible_results = range(MIN, MAX)
for i in possible_results:
next_value = (i + 1)
if next_value not in A:
return next_value
return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
A.sort()
j = 1
for i, elem in enumerate(A):
if j < elem:
break
elif j == elem:
j += 1
continue
else:
continue
return j
this can help:
0- A is [5, 3, 2, 7];
1- Define B With Length = A.Length; (O(1))
2- initialize B Cells With 1; (O(n))
3- For Each Item In A:
if (B.Length <= item) then B[Item] = -1 (O(n))
4- The answer is smallest index in B such that B[index] != -1 (O(n))

Resources