Amortized Time Cost using Accounting Method - algorithm

I written an algorithm to calculate the next lexicographic permutation of an array of integers (ex. 123, 132, 213, 231, 312,323). I dont think the code is necessary but I included it below.
I think I have appropriately determined worst case time cost of O(n) where n is the number of elements in the array. I understand however if you utilize "Amortized Cost" you would find that the time cost could be accurately shown as O(1) on average case.
Question:
I would like to learn the "ACCOUNTING METHOD" to show this as O(1) but am having difficulty understanding how to apply a cost to each operation. Accounting method: Link: Accounting_Method_Explained
Thoughts:
Ive thought to apply a cost of changing a value at a position, or applying the cost to a swap. But it really doesnt make much sense.
public static int[] getNext(int[] array) {
int temp;
int j = array.length - 1;
int k = array.length - 1;
// Find largest index j with a[j] < a[j+1]
// Finds the next adjacent pair of values not in descending order
do {
j--;
if(j < 0)
{
//Edge case, where you have the largest value, return reverse order
for(int x = 0, y = array.length-1; x<y; x++,y--)
{
temp = array[x];
array[x] = array[y];
array[y] = temp;
}
return array;
}
}while (array[j] > array[j+1]);
// Find index k such that a[k] is smallest integer
// greater than a[j] to the right of a[j]
for (;array[j] > array[k]; k--,count++);
//Swap the two elements found from j and k
temp = array[k];
array[k] = array[j];
array[j] = temp;
//Sort the elements to right of j+1 in ascending order
//This will make sure you get the next smallest order
//after swaping j and k
int r = array.length - 1;
int s = j + 1;
while (r > s) {
temp = array[s];
array[s++] = array[r];
array[r--] = temp;
}
return array;
} // end getNext

Measure running time in swaps, since the other work per iteration is worst-case O(#swaps).
The swap of array[j] and array[k] has virtual cost 2. The other swaps have virtual cost 0. Since at most one swap per iteration is costly, the running time per iteration is amortized constant (assuming that we don't go into debt).
To show that we don't go into debt, it suffices to show that, if the swap of array[j] and array[k] leaves a credit at position j, then every other swap involves a position with a credit available, which is consumed. Case analysis and induction reveal that, between iterations, if an item is larger than the one immediately following it, then it was put in its current position by a swap that left an as-yet unconsumed credit.
This problem is not a great candidate for the accounting method, given the comparatively simple potential function that can be used: number of indexes j such that array[j] > array[j + 1].

From the aggregate analysis, we see T(n) < n! · e < n! · 3, so we pay $3 for each operation, and its enough for the total n! operations. Therefore its an upper bound of actual cost. So the total amortized

Related

Time Complexity Analysis : better explaination?

I came across the following code to find triplets that satisfy Triangle sum property.
// Function to count all possible triangles with arr[]
// elements
static int findNumberOfTriangles(int arr[])
{
int n = arr.length;
// Sort the array elements in non-decreasing order
Arrays.sort(arr);
// Initialize count of triangles
int count = 0;
// Fix the first element. We need to run till n-3 as
// the other two elements are selected from arr[i+1...n-1]
for (int i = 0; i < n-2; ++i)
{
// Initialize index of the rightmost third element
int k = i + 2;
// Fix the second element
for (int j = i+1; j < n; ++j)
{
/* Find the rightmost element which is smaller
than the sum of two fixed elements
The important thing to note here is, we use
the previous value of k. If value of arr[i] +
arr[j-1] was greater than arr[k], then arr[i] +
arr[j] must be greater than k, because the
array is sorted. */
while (k < n && arr[i] + arr[j] > arr[k])
++k;
/* Total number of possible triangles that can be
formed with the two fixed elements is k - j - 1.
The two fixed elements are arr[i] and arr[j]. All
elements between arr[j+1] to arr[k-1] can form a
triangle with arr[i] and arr[j]. One is subtracted
from k because k is incremented one extra in above
while loop. k will always be greater than j. If j
becomes equal to k, then above loop will increment
k, because arr[k] + arr[i] is always/ greater than
arr[k] */
count += k - j - 1;
}
}
return count;
}
Can someone give a better explanation as to why the time complexity of this solution is O(n^2) and not O(n^3) ? My understanding is that for every i and j, k varies too.
The time complexity of the above solution is O(n^2) as you can see the value of k is initialised before the second for loop. In the second for
loop the value of k is increasing in while condition. Once the while condition terminates, for loop will run for the next value of j and the
value of k remains the same as it was terminated in the while loop before.
Once the value of k becomes equal to n then it will not run for any value of j after that.
So the second for loop is running only from k=i+2 to n. Hence the complexity is O(n^2).
The only statement that could get executed more than O(n^2) times is the most nested ++k statement.
But k never exceeds n, and is reset (to a non-negative number) n-2 times. That proves that the ++k statement is executed at most n(n-2) = O(n^2) times.

Insertion sort comparison?

How to count number of comparisons in insertion sort in less than O(n^2) ?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its right (2) we hit the beginning of the array. In case (1), there is one comparison not paired with a swap. In case (2), every comparison is paired with a swap. The upward adjustment for number of comparisons can be computed by counting the number of successive minima from left to right (or however your insertion sort works), in time O(n).
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
if array[i] < min_so_far:
min_so_far = array[i]
else:
num_comparisons += 1
As commented, to do it in less than O(n^2) is hard, maybe impossible if you must pay the price for sorting. If you already know the number of comparisons done at each external iteration then it would be possible in O(n), but the price for sorting was payed sometime before.
Here is a way for counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
for (long i = 1, j; i < n; ++i)
{
auto tmp = p[i];
for (j = i - 1; j >= 0 and p[j] > tmp; --j) // insert a gap where put tmp
p[j + 1] = p[j];
count += i - j; // i - j is the number of comparisons done in this iteration
p[j + 1] = tmp;
}
}
n is the number of elements and count the comparisons counter which must receive a variable set to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []; //sorted output array
while(A is not empty) {
remove first element from A and add it to B, preserving B's sorting
}
If the insertion to B is implemented by linear search from the left until you find a greater element, then the number of comparisons is the number of pairs (i,j) such that i < j and A[i] >= A[j] (I'm considering the stable variant).
In other words, for each element x, count the number of elements before x that have less or equal value. That can be done by scanning A from the left, adding it's element to some balanced binary search tree, that also remembers the number of elements under each node. In such tree, you can find number of elements lesser or equal to a certain value in O(log n). Total time: O(n log n).

Sample an index of a maximal number in an array, with a probability of 1/(number of maximal numbers)

This is one of the recent interview question that I faced. Program to return the index of the maximum number in the array [ To Note : the array may or may not contain multiple copies of maximum number ] such that each index ( which contains the maximum numbers ) have the probability of 1/no of max numbers to be returned.
Examples:
[-1 3 2 3 3], each of positions [1,3,4] have the probability 1/3 to be returned (the three 3s)
[ 2 4 6 6 3 1 6 6 ], each of [2,3,6,7] have the probability of 1/4 to be returned (corresponding to the position of the 6s).
First, I gave O(n) time and O(n) space algorithm where I collect the set of max-indexes and then return a random number from the set. But he asked for a O(n) time and O(1) complexity program and then I came up with this.
int find_maxIndex(vector<int> a)
{
max = a[0];
max_index = 0;
count = 0;
for(i = 1 to a.size())
{
if(max < a[i])
{
max = a[i];
count = 0;
}
if(max == a[i])
{
count++;
if(rand < 1/count) //rand = a random number in the range of [0,1]
max_index = i;
}
}
return max_index;
}
I gave him this solution. But my doubt is if this procedure would select one of the indexes of max numbers with equal probability. Hope I am clear.Is there any other method to do this ?
What you have is Reservoir sampling! There is another easy to understand solution, but requires two passes.
int find_maxIndex(vector<int> a){
int count = 1;
int maxElement = a[0];
for(int i = 1; i < a.size(); i++){
if(a[i] == maxElement){
count ++;
} else if(a[i] > maxElement){
count = 1;
maxElement = a[i];
}
}
int occurrence = rand() % count + 1;
int occur = 0;
for(int i = 0; i < a.size(); i++){
if(a[i] == maxElement){
occur++;
if(occur == occurrence) return i;
}
}
}
The algorithm is pretty simple, first find the number of times the max element occurs in the first pass. And choose a random occurrence and return the index of that occurrence. It takes two passes though, but very easy to understand.
Your algorithm works fine, and you can prove it via induction.
That is, assuming it works for any array of size N, prove it works for any array of size N+1.
So, given an array of size N+1, think of it as a sub-array of size N followed a new element at the end. By assumption, your algorithm uniformly selects one of the max elements of the sub-array... And then it behaves as follows:
If the new element is larger than the max of the sub-array, return that element. This is obviously correct.
If the new element is less than the max of the sub-array, return the result of the algorithm on the sub-array. Also obviously correct.
The only slightly tricky part is when the new element equals the max element of the sub-array. In this case, let the number of max elements in the sub-array be k. Then, by hypothesis, your algorithm selected one of them with probability 1/k. By keeping that same element with probability k/(k+1), you make the overall probability of selecting that same element equal 1/k * k /(k+1) == 1/(k+1), as desired. You also select the last element with the same probability, so we are done.
To complete the inductive proof, just verify the algorithm works on an array of size 1. Also, for quality of implementation purposes, fix it not to crash on arrays of size zero :-)
[Update]
Incidentally, this algorithm and its proof are closely related to the Fisher-Yates shuffle (which I always thought was "Knuth's card-shuffling algorithm", but Wikipedia says I am behind the times).
The idea is sound, but the devil is in the details.
First off, what language are you using? It might make a difference. The rand() from C and C++ will return an integer, which isn't likely to be less than 1/count unless it returns 0. Even then, if 1/count is an integer division, that result is always going to be 0.
Also your count is off by 1. It starts as 1 when you get a new max, but you immediately increment it in the next if statement.

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers and I want remove only those elements from the list which when removed will create a new list where there are no more K numbers adjacent to each other. There can be multiple lists that can be created with this restriction. So I just want that list in which the sum of the remaining numbers is maximum and as an output print that sum only.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get better algorithm for this problem?
Link to the question.
Here's my attempt:
int main()
{
//Total Number of elements in the list
int count = 6;
//Maximum number of elements that can be together
int maxTogether = 1;
//The list of numbers
int billboards[] = {4, 7, 2, 0, 8, 9};
int maxSum = 0;
for(int k = 0; k<=maxTogether ; k++){
int sum=0;
int size= k;
for (int i = 0; i< count; i++) {
if(size != maxTogether){
sum += billboards[i];
size++;
}else{
size = 0;
}
}
printf("%i\n", sum);
if(sum > maxSum)
{
maxSum = sum;
}
}
return 0;
}
The O(NK) dynamic programming solution is fairly easy:
Let A[i] be the best sum of the elements to the left subject to the not-k-consecutive constraint (assuming we're removing the i-th element as well).
Then we can calculate A[i] by looking back K elements:
A[i] = 0;
for j = 1 to k
A[i] = max(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last k elements from A, adding the elements to the right to each and picking the best one.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K+1], A[i-K].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K+1].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K because of A[i]'s calculation, but then we find the best of all of those except i-K (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K] and insert A[i]. My choice - A binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
High-level pseudo-code:
for each i in input
add (i + the smallest value in the BST) to the BST
add the above node to the circular array
if it wrapper around, remove the overridden element from the BST
// now the remaining nodes in the BST are the last k elements
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
int getBestSum(int[] input, int K)
{
Node[] array = new Node[K+1];
TreeSet<Node> nodes = new TreeSet<Node>();
Node n = new Node(0);
nodes.add(n);
array[0] = n;
int arrPos = 0;
int sum = 0;
for (int i: input)
{
sum += i;
Node oldNode = nodes.first();
Node newNode = new Node(oldNode.value + i);
arrPos = (arrPos + 1) % array.length;
if (array[arrPos] != null)
nodes.remove(array[arrPos]);
array[arrPos] = newNode;
nodes.add(newNode);
}
return sum - nodes.first().value;
}
getBestSum(new int[]{1,2,3,1,6,10}, 2) prints 21, as required.
Let f[i] be the maximum total value you can get with the first i numbers, while you don't choose the last(i.e. the i-th) one. Then we have
f[i] = max{
f[i-1],
max{f[j] + sum(j + 1, i - 1) | (i - j) <= k}
}
you can use a heap-like data structure to maintain the options and get the maximum one in log(n) time, keep a global delta or whatever, and pay attention to the range i - j <= k.
The following algorithm is of O(N*K) complexity.
Examine the 1st K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would not be any reason to have the lower (earlier gap).
For each index i of these K gap options, following holds true:
1. Sum upto i-1 is the present score of each option.
2. If the next gap is after a distance of d, then the options for d are (K - i) to K
For every possible position of gap, calculate the best sum upto that position among the options.
The latter part of the array can be traversed similarly independently from the past gap history.
Traverse the array further till the end.

array- having some issues [duplicate]

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.
# Pass 1, move every value to the position of its value
for cursor in range(N):
target = array[cursor]
while target < N and target != array[target]:
new_target = array[target]
array[target] = target
target = new_target
# Pass 2, find first location where the index doesn't match the value
for cursor in range(N):
if array[cursor] != cursor:
return cursor
return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; lets say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of find the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to represent the hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
#Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 264 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space efficient method and all values are distinct you can do it in space O( k ) and time O( k*log(N)*N ). It's space efficient and there's no data moving and all operations are elementary (adding subtracting).
set U = N; L=0
First partition the number space in k regions. Like this:
0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
Find how many numbers (count{i}) are in each region. (N*k steps)
Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
if h - count{h-1} = 1 you've got your answer
set U = count{h}; L = count{h-1}
goto 2
this can be improved using hashing (thanks for Nic this idea).
same
First partition the number space in k regions. Like this:
L + (i/k)->L + (i+1/k)*(U-L)
inc count{j} using j = (number - L)/k (if L < number < U)
find first region (h) that doesn't have k elements in it
if count{h} = 1 h is your answer
set U = maximum value in region h L = minimum value in region h
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
sort(list)
if list[0] != 0:
return 0
for i = 1 to list.last:
if list[i] != list[i-1] + 1:
return list[i-1] + 1
if list[list.last] == 2^64 - 1:
assert ("No gaps")
return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
bitmask = mask_make(2^64) // might take a while :-)
mask_clear_all (bitmask)
for i = 1 to list.last:
mask_set (bitmask, list[i])
for i = 0 to 2^64 - 1:
if mask_is_clear (bitmask, i):
return i
assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X // One scan Θ(1)
hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(i) <> null) // at most n+1 times
low++;
print low;
The worst case if there are n elements in the array, and are {0, 1, ... n-1}, in which case, the answer will be obtained at n, still keeping it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
the_answer = None
target = 0L
while target < 2L**64:
target_found = False
for item in source_list:
if item == target:
target_found = True
if not target_found and the_answer is None:
the_answer = target
target += 1L
return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to it's max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while ( i < Array.Length)
{
if (Array[i] == i + 1)
{
i++;
}
if (i < Array.Length)
{
if (Array[i] <= Array.Length)
{//SWap
int temp = Array[i];
int AnoTemp = Array[temp - 1];
Array[temp - 1] = temp;
Array[i] = AnoTemp;
}
else
i++;
}
}
for (int j = 0; j < Array.Length; j++)
{
if (Array[j] > Array.Length)
{
Console.WriteLine(j + 1);
j = Array.Length;
}
else
if (j == Array.Length - 1)
Console.WriteLine("Not Found !!");
}
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
I like the "guess zero" apprach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
index = 0
target = []
A = [x for x in A if x >=0]
if len(A) ==0:
return 1
maxi = max(A)
if maxi <= len(A):
maxi = len(A)
target = ['X' for x in range(maxi+1)]
for number in A:
target[number]= number
count = 1
while count < maxi+1:
if target[count] == 'X':
return count
count +=1
return target[count-1] + 1
Got 100% for the above solution.
1)Filter negative and Zero
2)Sort/distinct
3)Visit array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
unordered_set<int> fre;
// storing each positive number in a hash.
for(int i = 0; i < nums.size(); i +=1)
{
if(nums[i] > 0)
fre.insert(nums[i]);
}
int i = 1;
// Iterating from 1 to size of the set and checking
// for the occurrence of 'i'
for(auto it = fre.begin(); it != fre.end(); ++it)
{
if(fre.find(i) == fre.end())
return i;
i +=1;
}
return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
var m = 0;
for(i=1;i<=a.length;i++) {
j=0;m=1;
while(j < a.length) {
if(i === a[j]) {
m++;
}
j++;
}
if(m === 1) {
return i;
}
}
}
console.log(findSmallest(a))
Hope this helps for someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
MIN = 0
MAX = 1000000
possible_results = range(MIN, MAX)
for i in possible_results:
next_value = (i + 1)
if next_value not in A:
return next_value
return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
A.sort()
j = 1
for i, elem in enumerate(A):
if j < elem:
break
elif j == elem:
j += 1
continue
else:
continue
return j
this can help:
0- A is [5, 3, 2, 7];
1- Define B With Length = A.Length; (O(1))
2- initialize B Cells With 1; (O(n))
3- For Each Item In A:
if (B.Length <= item) then B[Item] = -1 (O(n))
4- The answer is smallest index in B such that B[index] != -1 (O(n))

Resources