How to track max/min of a FIFO queue - algorithm

I have a FIFO queue where I push and eject doubles.
After each update I need the max and min values. I don't need the position (or index) of these values in the queue.
How to do that efficiently? O(log N), or perhaps even O(1)?
upd: found this Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations
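For reference, a minimal Python sketch of that two-stack idea, extended to track both min and max (my own illustration, not code from the linked answer):

import collections  # not required; plain lists serve as the two stacks

class MinMaxQueue:
    # Each stack entry stores (value, min_below, max_below), so the top
    # of each stack always knows the min/max of its whole stack.
    def __init__(self):
        self._in = []
        self._out = []

    @staticmethod
    def _push(stack, x):
        if stack:
            _, mn, mx = stack[-1]
            stack.append((x, min(x, mn), max(x, mx)))
        else:
            stack.append((x, x, x))

    def push(self, x):
        self._push(self._in, x)

    def pop(self):
        if not self._out:
            while self._in:                      # amortized O(1) per element
                self._push(self._out, self._in.pop()[0])
        return self._out.pop()[0]

    def get_min(self):                           # assumes the queue is non-empty
        return min(s[-1][1] for s in (self._in, self._out) if s)

    def get_max(self):                           # assumes the queue is non-empty
        return max(s[-1][2] for s in (self._in, self._out) if s)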

This is a tricky question. Consider the following:
Say the size of your fifo at any given time is N.
Say that you track the min and max with just a pair of floats.
Say that the size of the fifo remains reasonably constant.
We can therefore assume that one "operation" on the queue logically consists of one push and one pop.
Say you are comparing two methods of handling this: one using a pair of heaps and one using a naive compare-and-search.
For the heap method:
Each operation, you push to the list and both heaps, then pop from the list and both heaps. The heap operations are always O(log N) and the list operation is O(1), so for large N the time complexity of one operation is O(log N) average case. It's important to note that the heap operations always have this cost, regardless of whether or not the currently popped element is a min or max element. Thus, N operations have a time complexity of O(N log N).
For the naive method:
Each operation, you push to and pop from the list, and compare the popped item to the stored min and max. If the item matches either one, you search the list either until you find another item of equal value (in which case you break early) or otherwise through the entire rest of the list, until you find the next best element. You then update the min/max with the next best. This method has O(1) typical cost and O(N) worst-case cost (when the min or max needs an update). The key observation is that for typical input the expensive updates are rare and the cheap operations dominate, so N operations have an expected time complexity of O(N). The naive method can therefore beat the asymptotically fancier one.
That said, I don't think heaps can efficiently remove arbitrary elements, so you'd run into lots of trouble that way.
Thus, consider the following pseudocode:
queue fifo;
float min, max;

void push(float f)
{
    if (fifo.size == 0)
    {
        min = f;
        max = f;
    }
    else if (f > max) max = f;
    else if (f < min) min = f;
    fifo.push(f);
}

float pop()
{
    if (fifo.size == 0) return (*((float *)NULL)); // explode
    float f = fifo.pop();
    if (fifo.size == 0)
    {
        min = NaN;
        max = NaN;
        return f;
    }
    if (f == max) search_max(fifo);
    if (f == min) search_min(fifo);
    return f;
}

void search_max(queue q)
{
    float f = min; // min is a safe lower bound to start from
    for (element in q)
    {
        if (element == max) return; // an equal-valued element remains; max unchanged
        if (element > f) f = element;
    }
    max = f;
}

void search_min(queue q)
{
    float f = max; // max is a safe upper bound to start from
    for (element in q)
    {
        if (element == min) return; // an equal-valued element remains; min unchanged
        if (element < f) f = element;
    }
    min = f;
}

How about using a heap (http://en.wikipedia.org/wiki/Heap_%28data_structure%29)? You can have two heaps: one for extracting the min and one for the max (a single heap cannot extract min and max at the same time). Each operation is O(log n), at the cost of keeping every element in both heaps.
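As noted above, the catch is removing an element from the middle of a heap when it leaves the queue. A standard workaround is lazy deletion: leave stale entries in the heaps and discard them only when they surface at the top. A rough Python sketch of that idea (my own illustration, not from the answer above):

import heapq
from collections import deque

class MinMaxFifo:
    def __init__(self):
        self.fifo = deque()
        self.seq = 0         # unique id per pushed element
        self.min_heap = []   # entries: (value, id)
        self.max_heap = []   # entries: (-value, id)
        self.live = set()    # ids still in the queue

    def push(self, x):
        self.seq += 1
        self.fifo.append((x, self.seq))
        self.live.add(self.seq)
        heapq.heappush(self.min_heap, (x, self.seq))
        heapq.heappush(self.max_heap, (-x, self.seq))

    def pop(self):
        x, ident = self.fifo.popleft()
        self.live.discard(ident)   # lazy: heaps are cleaned in get_min/get_max
        return x

    def get_min(self):             # assumes the queue is non-empty
        while self.min_heap[0][1] not in self.live:
            heapq.heappop(self.min_heap)
        return self.min_heap[0][0]

    def get_max(self):             # assumes the queue is non-empty
        while self.max_heap[0][1] not in self.live:
            heapq.heappop(self.max_heap)
        return -self.max_heap[0][0]

Each operation is O(log n) amortized; the eventual eviction of a stale entry is paid for by the push that created it.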

Related

What's the most efficient way to convert a binomial tree into a sorted array of keys?

Say we are given a single-tree binomial heap, such that the binomial tree is of rank r, and thus holds 2^r keys. What's the most efficient way to convert it into a sorted array of length k < 2^r, holding the k smallest keys of the tree? Let's assume we can't use any data structure other than lazy binomial heaps and binomial trees. Notice that at each level the children are not necessarily linked in order, so you might have to make some comparisons at some point.
My solution was (assuming 1<=k<=2^r):
Create a new empty lazy binomial heap H.
Insert the root's key into the heap.
Create a new counter x, and set x=1.
For each level i=0,1,... (where the root is at level 0):
Let c be the number of nodes at level i.
Set x=x+c.
Iterate over the nodes in level i and:
Insert each node N into H. (In O(1))
If x < k, recursively apply the same process to each node N, passing x along so the counting continues.
Repeat k times:
Extract the minimal key out of the heap and place it in the output array.
Delete the minimal key from the heap. (amortized cost: O(1))
Return the output array.
There might be some holes in the pseudo-code, but I think the idea itself is clear. I also managed to implement it. However, I'm not sure that's the most efficient algorithm for this task.
Thanks to Gene's comment I see that the earlier algorithm I suggested will not always work, as it assumes the maximal node at level x is smaller than the minimal node at level x-1, which is not a reasonable assumption.
Yet, I believe this one makes the job efficiently:
public static int[] kMin(FibonacciHeap H, int k) {
    if (H == null || H.isEmpty() || k <= 0)
        return new int[0];
    HeapNode tree = H.findMin();
    int rank = tree.getRank();
    int size = H.size();
    size = (int) Math.min(size, Math.pow(2, rank));
    if (k > size)
        k = size;
    int[] result = new int[k];
    FibonacciHeap heap = new FibonacciHeap();
    HeapNode next = H.findMin();
    for (int i = 0; i < k; i++) { // k iterations
        if (next != null)
            for (Iterator<HeapNode> iter = next.iterator(); iter.hasNext(); ) { // at most the node's rank iterations
                next = iter.next();
                HeapNode node = heap.insert(next.getKey()); // O(1)
                node.setFreePointer(next);
            }
        next = heap.findMin().getFreePointer();
        result[i] = next.getKey();
        heap.deleteMin(); // O(log n) amortized cost
        next = next.child;
    }
    return result;
}
"freePointer" is a field in HeapNode, where I can store a pointer to another HeapNode. It is basically the "info field" most heaps have.
Let r be the rank of the tree. Every iteration we insert at most r items into the external heap. In addition, every iteration we use Delete-Min to delete one item from the heap.
Therefore, the total cost of the insertions is O(kr), and the total cost of the Delete-Mins is O(k*log(k) + k*log(r)). So the total cost of everything becomes O(k(log(k) + r)).

Amortized worst case complexity of binary search

For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search, because its worst case is O(log n). According to my notes, the amortized cost calculates the upper bound of an algorithm and then divides it by the number of items, so wouldn't that be as simple as the worst-case time complexity divided by n, meaning O(log n)/(2^n - 1)?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length - 1; // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons. (For example, with n = 3, i.e. 7 items: one search costs 1 comparison, two cost 3 each, and four cost 5 each, averaging 27/7 ≈ 3.9 comparisons per successful search.)
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length; // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing the amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where or even whether the item is found. Not that that's important.
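To see the factor of two concretely, here is a small Python harness (my own sketch, not part of the original answer) that counts comparisons against sorted[mid] for both versions, averaged over all successful searches:

def probes_two_way(a, x):
    # mirrors the original version: an == test, then a < test on a miss
    s, e, c = 0, len(a) - 1, 0
    while s <= e:
        mid = s + (e - s) // 2
        c += 1
        if a[mid] == x:
            return c
        c += 1
        if a[mid] < x:
            s = mid + 1
        else:
            e = mid - 1
    return c

def probes_one_way(a, x):
    # mirrors the deferred-detection version: one > test per probe
    s, e, c = 0, len(a), 0
    mid = (s + e) // 2
    while mid != s:
        c += 1
        if a[mid] > x:
            e = mid
        else:
            s = mid
        mid = (s + e) // 2
    return c + 1   # plus the single final == test

n = 10
a = list(range(2 ** n - 1))
print(sum(probes_two_way(a, x) for x in a) / len(a))   # roughly 2*n
print(sum(probes_one_way(a, x) for x in a) / len(a))   # roughly n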

Getting the number of "friend" towers

Given n towers numbered 1, 2, 3, ..., n, with their heights (h[i] = tower i's height) and a number k.
Two towers a, b (with a < b) are considered friends iff:
b - a = k
h[a] == h[b]
max(h[a+1], h[a+2], ..., h[b-1]) <= h[a]
How many "friendships" are there?
The straightforward solution is:

for i = 1, 2, 3, ..., n - k:
    if h[i] == h[i+k]:
        MAX = h[i]
        for j in range (i+1, i+k-1):
            MAX = max(MAX, h[j])
        if MAX <= h[i]:
            ans++
But I want the solution in the most efficient way. Please help.
For a large n the program will eat RAM; to reduce that, instead of an array I used a queue to hold the tower heights (when q.size() == k, just q.pop()). But checking the 3rd condition naively for a large k must still take time.
You can use a deque to get an O(n) algorithm.
At every step:
Remove too-old elements from the deque head (those with currentIndex - index >= k).
Remove elements from the tail that have no chance of becoming the maximum in the k-size window (those with value < currentValue).
Add the new element's index to the deque tail.
This keeps the index of the maximum element in the k-size window at the head of the deque, so you can determine whether there is a larger value between the two towers.
Description of the sliding-minimum algorithm, with pseudocode:
Can min/max of moving window achieve in O(N)?
Elaborating on my comment, you could use the answer to this question to build a queue that can keep track of the maximum element between two towers. Moving to the next element only takes O(1) amortized time. I made a simple implementation in pseudocode, assuming the language supports a standard stack (would be surprised if it didn't). For an explanation, see the linked answer.
class TupleStack
    Stack stack
    void push(int x)
        if stack.isEmpty()
            stack.push((value: x, max: x))
        else
            stack.push((value: x, max: max(x, stack.peek().max)))
    int pop()
        return stack.pop().value
    bool isEmpty()
        return stack.isEmpty()
    int getMax()
        if isEmpty()
            return -infinity
        else
            return stack.peek().max

class MaxQueue
    TupleStack stack1
    TupleStack stack2
    void enqueue(int x)
        stack1.push(x)
    int dequeue()
        if stack2.isEmpty()
            while !stack1.isEmpty()
                stack2.push(stack1.pop())
        return stack2.pop()
    int getMax()
        return max(stack1.getMax(), stack2.getMax())
Your algorithm is now trivial. Put the first k elements in the queue. After that, repeatedly check whether two towers at distance k have the same height, check that the max in between (which is the max of the queue) is at most their height, and move to the next pair of towers. Updating the queue takes O(1) amortized time, so this algorithm runs in O(n), which is clearly optimal.
MaxQueue queue
for (int i = 1; i <= k; i++) // add first k towers to queue
    queue.enqueue(h[i])
for (int i = k+1; i <= n; i++)
    if h[i] == h[i-k] and h[i] >= queue.getMax()
        ans++
    queue.enqueue(h[i])
    queue.dequeue()

How to increment all values in an array interval by a given amount

Suppose I have an array A of length L. I will be given n intervals (i, j) and I have to increment all values between A[i] and A[j]. Which data structure would be most suitable for the given operations?
The intervals are known beforehand.
You can get O(N + M). Keep an extra increment array B the same size as A, initially filled with 0. If you need to increment the range (i, j) by value k, then do B[i] += k and B[j + 1] -= k.
Now do a partial sum transformation on B, assuming you're indexing from 0:
for (int i = 1; i < N; ++i) B[i] += B[i - 1];
And now the final values of A are A[i] + B[i].
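A short, self-contained Python illustration of this difference-array trick (my own sketch):

def apply_increments(A, intervals):
    # intervals: list of (i, j, k) meaning "add k to A[i..j] inclusive"
    B = [0] * (len(A) + 1)          # one spare slot so B[j + 1] never overflows
    for i, j, k in intervals:
        B[i] += k
        B[j + 1] -= k
    for i in range(1, len(A)):      # the partial-sum transformation
        B[i] += B[i - 1]
    return [a + b for a, b in zip(A, B)]

# apply_increments([0, 0, 0, 0], [(0, 2, 5), (1, 3, 1)]) -> [5, 6, 6, 1]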
Break all intervals into start and end indexes: s_i, e_i for the i-th interval, which starts at s_i (inclusive) and ends at e_i (exclusive).
Sort all the s_i into an array S.
Sort all the e_i into an array E.
Set increment to zero.
Start a linear scan of the input, adding increment to every element;
in each loop, if the next s_i equals the current index, increment increment; if the next e_i equals the current index, decrement increment:
inc = 0
s = <PriorityQueue of interval start indexes>
e = <PriorityQueue of interval end indexes>
for (i = 0; i < n; i++) {
    if (inc == 0) {
        // skip adding zeros
        i = min(s.peek(), e.peek())
    }
    while (s.peek() == i) {
        s.pop();
        inc++;
    }
    while (e.peek() == i) {
        e.pop();
        inc--;
    }
    a[i] += inc;
}
Complexity (without skipping non-incremented elements): O(n + m*log(m)), where m is the number of intervals.
If n >> m then it's O(n).
Complexity when skipping elements: O(min(n, sum of length(I_i))), where length(I_i) = e_i - s_i.
There are three main approaches that I can think of:
Approach 1
This is the simplest one, where you just keep the array as is, and do the naive thing for increment.
Pros: Querying is constant time
Cons: Increment can be linear time (and hence pretty slow if L is big)
Approach 2
This one is a little more complicated, but is better if you plan on incrementing a lot.
Store the elements in a binary tree so that an in-order traversal accesses the elements in order. Each node (aside from the normal left and right children) also stores an extra int addOn, meaning "add me when you query any node in this subtree".
For querying elements, do the normal binary search on index to find the element, adding up all of the values of the addOn variables as you go. Add those to the A[i] at the node you want, and that's your value.
For increments, traverse down into the tree, updating all of these new addOns as necessary. Note that if you add the incremented value to an addOn for one node, you do not update it for the two children. The runtime for each increment is then O(log L), since the only times you ever have to "branch off" into the children is when the first or last element in the interval is in your range. Hence, you branch off at most 2 log L times, and access a constant factor more in elements.
Pros: Increment is now O(log L), so now things are much faster than before if you increment a ton.
Cons: Queries take longer (also O(log L)), and the implementation is much trickier.
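The O(log L) bounds of approach 2 can also be had with a more compact standard structure: a Fenwick (binary indexed) tree over the difference array, which supports range increment and point query in O(log L) each. A sketch in Python (my own illustration, not the exact tree described above):

class Fenwick:
    def __init__(self, n):
        self.n = n
        self.t = [0] * (n + 1)        # 1-based internal indexing

    def _add(self, i, k):             # add k to the difference array at i
        while i <= self.n:
            self.t[i] += k
            i += i & (-i)

    def increment(self, i, j, k):     # add k to every element of A[i..j], 0-based
        self._add(i + 1, k)
        self._add(j + 2, -k)          # no-op if j is the last index

    def query(self, i):               # total increment applied at A[i], 0-based
        i += 1
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & (-i)
        return s

# f = Fenwick(10); f.increment(2, 5, 3); f.query(4) -> 3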
Approach 3
Use an interval tree.
Pros: Just like approach 2, this one can be much faster than the naive approach
Cons: Not doable if you don't know what the intervals are going to be beforehand. Also tricky to implement.
Solve the problem for a single interval. Then iterate over all intervals and apply the single-interval solution for each. The best data structure depends on the language. Here's a Java example:
public class Interval {
int i;
int j;
}
public void increment(int[] array, Interval interval) {
for (int i = interval.i; i < interval.j; ++i) {
++array[i];
}
}
public void increment(int[] array, Interval[] intervals) {
for (Interval interval : intervals) {
increment(array, interval);
}
}
Obviously you could nest one loop inside the other if you wanted to reduce the amount of code. However, a single-interval method might be useful in its own right.
EDIT
If the intervals are known beforehand, then you can improve things a bit. You can modify the Interval structure to maintain an increment amount (which defaults to 1). Then preprocess the set of intervals S as follows:
Initialize a second set of intervals T to the empty set
For each interval I in S: if I does not overlap any interval in T, add I to T; otherwise:
For each interval J in T that overlaps I, remove J from T, form new intervals K1...Kn from I and J such that there are no overlaps (n can be from 1 to 3), and add K1...Kn to T
When this finishes, use the intervals in T with the earlier code (modified as described). Since there are no overlaps, no element of the array will be incremented more than once. For a fixed set of intervals, this is a constant time algorithm, regardless of the array length.
For N intervals, the splitting process can probably be designed to run in something close to O(N log N) by keeping T ordered by interval start index. But if the cost is amortized among many array increment operations, this isn't all that important to the overall complexity.
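One way to realize this preprocessing is a boundary sweep that turns the interval set into disjoint weighted fragments in O(N log N). A rough Python sketch (my own illustration):

def disjoint_fragments(intervals):
    # intervals: half-open (i, j) pairs, each contributing +1
    # returns disjoint (start, end, amount) fragments with the same total effect
    events = {}
    for i, j in intervals:
        events[i] = events.get(i, 0) + 1
        events[j] = events.get(j, 0) - 1
    fragments, amount, prev = [], 0, None
    for x in sorted(events):          # sweep the boundaries in order
        if prev is not None and amount != 0:
            fragments.append((prev, x, amount))
        amount += events[x]
        prev = x
    return fragments

# disjoint_fragments([(0, 4), (2, 6)]) -> [(0, 2, 1), (2, 4, 2), (4, 6, 1)]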
A possible implementation of the O(M+N) algorithm suggested by Adrian Budau:
import java.util.Scanner;

class Interval {
    int i;
    int j;
}

public class IncrementArray {
    public static void main(String[] args) {
        int k = 5; // increase array elements by this value
        Scanner sc = new Scanner(System.in);
        int intervalNo = sc.nextInt(); // number of intervals
        Interval[] interval = new Interval[intervalNo]; // array containing ranges/intervals
        System.out.println(">" + sc.nextLine() + "<"); // consume the rest of the first line
        for (int i = 0; i < intervalNo; i++) {
            interval[i] = new Interval();
            String s = sc.nextLine(); // i and j separated by a space, one line per interval
            String[] s1 = s.split(" ");
            interval[i].i = Integer.parseInt(s1[0]);
            interval[i].j = Integer.parseInt(s1[1]);
        }
        int[] arr = new int[10]; // array whose values need to be incremented
        for (int i = 0; i < arr.length; ++i)
            arr[i] = i + 1; // initialising array
        int[] temp = new int[10];
        Interval run = interval[0];
        int i;
        for (i = 0; i < intervalNo; i++, run = interval[i < intervalNo ? i : 0]) { // i<intervalNo?i:0 avoids an out-of-bounds access on the last iteration
            temp[run.i] += k;
            if (run.j + 1 < 10) // incrementing temp within array bounds
                temp[run.j + 1] -= k;
        }
        for (i = 1; i < 10; ++i)
            temp[i] += temp[i - 1];
        for (i = 0; i < 10; i++) {
            arr[i] += temp[i];
            System.out.print(" " + arr[i]); // printing results
        }
    }
}

Array - having some issues [duplicate]

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the data structure can be mutated in place and supports random access, then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and, for every index, write the value at that index to the index specified by the value, recursively placing any value found at that location to its place, and throwing away values > N. Then go through the array again looking for the first spot where the value doesn't match the index; that's the smallest value not in the array. This results in at most 3N comparisons and uses only a few values' worth of temporary space.
def smallest_missing(array):
    N = len(array)
    # Pass 1: move every value to the position of its value
    for cursor in range(N):
        target = array[cursor]
        while target < N and target != array[target]:
            new_target = array[target]
            array[target] = target
            target = new_target
    # Pass 2: find the first location where the index doesn't match the value
    for cursor in range(N):
        if array[cursor] != cursor:
            return cursor
    return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; let's say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
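A direct Python rendering of steps 1 to 4 (my own sketch, using a plain boolean list rather than a packed bitset):

def smallest_missing(numbers):
    N = len(numbers)                    # step 1
    seen = [False] * N                  # step 2
    for x in numbers:                   # step 3
        if 0 <= x < N:
            seen[x] = True
    for i, present in enumerate(seen):  # step 4
        if not present:
            return i
    return N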
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number therefore reduces to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters, where Xmax is the largest number in the list and Xmin is the smallest. Each counter has to be able to represent N states; i.e., assuming a binary representation, it needs (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
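A Python sketch of this partition-and-recurse idea (my own illustration; it narrows the value range level by level and removes duplicates up front with a set, rather than partitioning in place):

def first_missing(nums):
    # The answer always lies in [0, len(nums)].
    lo, hi = 0, len(nums)
    # "Remove duplicates on the first partition phase": dedupe once, O(N) space.
    candidates = list(set(x for x in nums if 0 <= x <= hi))
    while lo < hi:
        mid = (lo + hi) // 2
        lower = [x for x in candidates if x <= mid]
        if len(lower) == mid - lo + 1:
            lo = mid + 1                               # lo..mid all present; the gap is above
            candidates = [x for x in candidates if x > mid]
        else:
            hi = mid                                   # pigeonhole: a value in lo..mid is missing
            candidates = lower
    return lo

The candidate lists shrink with the value range (N + N/2 + N/4 + ...), so the total work is O(N).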
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
    if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
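A sketch of that plan in Python (my own illustration): LSD radix sort in 8-bit passes, then a single scan for the first gap, which also covers the "answer is zero" case:

def radix_sort_u64(nums):
    # 8 counting-sort passes of 8 bits each: O(n) for 64-bit values
    for shift in range(0, 64, 8):
        buckets = [[] for _ in range(256)]
        for x in nums:
            buckets[(x >> shift) & 0xFF].append(x)
        nums = [x for b in buckets for x in b]
    return nums

def smallest_missing(nums):
    expected = 0
    for x in radix_sort_u64(nums):
        if x > expected:
            return expected      # found a gap
        if x == expected:
            expected += 1        # duplicates fall through unchanged
    return expected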
For a space-efficient method when all values are distinct, you can do it in O(k) space and O(k*log(N)*N) time. It involves no data moving, and all operations are elementary (adding, subtracting).
set U = N; L = 0
Partition the number space [L, U) into k regions; region i covers L + (i/k)*(U-L) up to L + ((i+1)/k)*(U-L).
Count how many numbers (count{i}) fall in each region. (N*k steps)
Find the first region h that isn't full, i.e. count{h} < upper_limit{h}. (k steps)
If region h contains only one missing candidate, you've got your answer.
Otherwise set U and L to the bounds of region h and go back to the partition step.
This can be improved using hashing (thanks to Nic for this idea):
Same as above, but compute each number's region directly, incrementing count{j} with j = (number - L)/k (if L < number < U).
Find the first region h that doesn't have its full complement of elements.
If count{h} = 1, h holds your answer.
Set U = maximum value in region h; L = minimum value in region h; and repeat.
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] > list[i-1] + 1:   # skip duplicates; look for a jump
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all(bitmask)
    for i = 1 to list.last:
        mask_set(bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear(bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all the numbers are inserted, run a counter from 0 up until we find the lowest value that is absent. A reasonably good hash will insert and retrieve in constant time.

for every i in X           // one scan, Θ(n)
    hashtable.put(i, i)    // O(1) per insert
low = 0
while (hashtable.get(low) != null)  // at most n+1 iterations
    low++
print low

The worst case is when there are n elements in the array and they are {0, 1, ..., n-1}, in which case the answer is obtained at n, still keeping it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0L
    while target < 2L**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        if not target_found and the_answer is None:
            the_answer = target
        target += 1L
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, take the square root of the size of the list; for a 1GB list that's N = 11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
Well, if there is only one missing number in a list of otherwise consecutive numbers, the easiest way to find it is to sum the full series and subtract each value in the list. The final value is the missing number.
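For instance, under that assumption (my own sketch; lst is taken to hold the numbers 1..n with exactly one value absent):

def single_missing(lst, n):
    # the sum of 1..n minus the observed sum leaves exactly the absent value
    return n * (n + 1) // 2 - sum(lst)

# single_missing([1, 2, 3, 5, 6], 6) -> 4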
int i = 0;
while (i < Array.Length)
{
    if (Array[i] == i + 1)
    {
        i++;
    }
    if (i < Array.Length)
    {
        if (Array[i] <= Array.Length)
        {   // swap Array[i] into the slot its value names
            int temp = Array[i];
            int AnoTemp = Array[temp - 1];
            Array[temp - 1] = temp;
            Array[i] = AnoTemp;
        }
        else
            i++;
    }
}
for (int j = 0; j < Array.Length; j++)
{
    if (Array[j] > Array.Length)
    {
        Console.WriteLine(j + 1);
        j = Array.Length;
    }
    else if (j == Array.Length - 1)
        Console.WriteLine("Not Found !!");
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicates, zeros, and negative numbers, while summing up the rest, tracking the maximum positive number, and keeping the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
    if (A == null || A.length == 0) {
        throw new IllegalArgumentException();
    }
    int sum = 0;
    Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
    int max = A[0];
    for (int i = 0; i < A.length; i++) {
        if (A[i] < 0) {
            continue;
        }
        if (uniqueNumbers.get(A[i]) != null) {
            continue;
        }
        if (A[i] > max) {
            max = A[i];
        }
        uniqueNumbers.put(A[i], true);
        sum += A[i];
    }
    int completeSum = (max * (max + 1)) / 2;
    for (int j = 1; j <= Math.min((completeSum - sum), max); j++) {
        if (uniqueNumbers.get(j) == null) { // O(1)
            return j;
        }
    }
    // all-negative case
    if (uniqueNumbers.isEmpty()) {
        return 1;
    }
    return 0;
}
As Stephen C smartly pointed out, the answer must be a number no larger than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to pick a candidate value, subtract it from each element of the array, and count the negative results; that count tells you how many elements are smaller than the candidate, which tells you on which side of it the first gap lies.
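A Python sketch of that counting idea (my own illustration; it assumes the values are distinct, since with duplicates a count alone cannot certify that 0..m-1 are all present):

def smallest_missing_distinct(nums):
    # Binary search on the value m: with distinct non-negative values,
    # the values 0..m-1 are all present iff exactly m elements are < m.
    lo, hi = 0, len(nums)   # the answer always lies in [0, len(nums)]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(1 for x in nums if x < mid) == mid:
            lo = mid        # 0..mid-1 all present; the gap is above
        else:
            hi = mid - 1    # some value below mid is missing
    return lo

Each probe scans the array, so this runs in O(N log N) time and O(1) extra space.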
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum = 0
i = 0
do forever {
    if i == N then leave   /* processed entire array */
    if array[i] == LowNum {
        LowNum++
        i = 0
    }
    else {
        i++
    }
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But for the list 1,2,3,5,6, where the missing number is 4, the missing number can be found in O(n) by:
(n+1)(n+2)/2 - sum(list)
EDIT: sorry, I guess I was thinking too fast last night. Anyway, the second part of my original formula should actually be replaced by sum(list), which is where the O(n) comes from. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }

int minNonNegativeNotInArr(numerictype_t *a, size_t n) {
    int m = n;
    for (int i = 0; i < m;) {
        if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
            m--;
            SWAP (a[i], a[m]);
            continue;
        }
        if (a[i] > i) {
            SWAP (a[i], a[a[i]]);
            continue;
        }
        i++;
    }
    return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m, or if a[i] < i, or if a[i] == a[a[i]], we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with a[m], we can recurse.
If this is not true but a[i] > i, then knowing that a[i] != a[a[i]], we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i, in which case we can increment i, knowing that all the values up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
    requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
    modifies A;
{
    // Pass 1, move every value to the position of its value
    var N := A.Length;
    var cursor := 0;
    while (cursor < N)
    {
        var target := A[cursor];
        while (0 <= target < N && target != A[target])
        {
            var new_target := A[target];
            A[target] := target;
            target := new_target;
        }
        cursor := cursor + 1;
    }
    // Pass 2, find first location where the index doesn't match the value
    cursor := 0;
    while (cursor < N)
    {
        if (A[cursor] != cursor)
        {
            return cursor;
        }
        cursor := cursor + 1;
    }
    return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
    BitSet bitset = new BitSet(values.size() + 1);
    for (int i : values) {
        if (i >= 0 && i <= values.size()) {
            bitset.set(i);
        }
    }
    return bitset.nextClearBit(0);
}
def solution(A):
    A = [x for x in A if x >= 0]
    if len(A) == 0:
        return 1
    maxi = max(A)
    if maxi <= len(A):
        maxi = len(A)
    target = ['X' for x in range(maxi + 1)]
    for number in A:
        target[number] = number
    count = 1
    while count < maxi + 1:
        if target[count] == 'X':
            return count
        count += 1
    return target[count - 1] + 1
Got 100% for the above solution.
1) Filter out negatives and zero
2) Sort and remove duplicates
3) Visit the array
Complexity: O(N) or O(N * log(N))
Using Java 8:
public int solution(int[] A) {
    int result = 1;
    boolean found = false;
    A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
    //System.out.println(Arrays.toString(A));
    for (int i = 0; i < A.length; i++) {
        result = i + 1;
        if (result != A[i]) {
            found = true;
            break;
        }
    }
    if (!found && result == A.length) {
        // result is larger than the max element in the array
        result++;
    }
    return result;
}
An unordered_set can be used to store all the positive numbers; then we can iterate from 1 up to the size of the unordered_set and return the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
    unordered_set<int> fre;
    // store each positive number in a hash set
    for (int i = 0; i < nums.size(); i += 1) {
        if (nums[i] > 0)
            fre.insert(nums[i]);
    }
    // iterate from 1 to the size of the set, checking for the occurrence of i
    int i = 1;
    for (auto it = fre.begin(); it != fre.end(); ++it) {
        if (fre.find(i) == fre.end())
            return i;
        i += 1;
    }
    return i;
}
Solution in basic JavaScript:

var a = [1, 3, 6, 4, 1, 2];

function findSmallest(a) {
    var i, j, m;
    for (i = 1; i <= a.length + 1; i++) {  // the answer is at most a.length + 1
        j = 0;
        m = 1;
        while (j < a.length) {
            if (i === a[j]) {
                m++;
            }
            j++;
        }
        if (m === 1) {   // i was never seen
            return i;
        }
    }
}
console.log(findSmallest(a));
Hope this helps someone.
In Python it is not the most efficient, but it is correct:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime

# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1

test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]

print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
    A.sort()
    j = 1
    for i, elem in enumerate(A):
        if j < elem:
            break
        elif j == elem:
            j += 1
            continue
        else:
            continue
    return j
This can help:
0- A is [5, 3, 2, 7]
1- Define B with length = A.length (O(1))
2- Initialize B's cells with 1 (O(n))
3- For each item in A:
    if (item < B.length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1 (O(n))
