Majority element of JPEG images using Divide and conquer [duplicate] - algorithm

An array is said to have a majority element if more than half of its elements are the same. Is there a divide-and-conquer algorithm for determining if an array has a majority element?
I normally do the following, but it is not using divide-and-conquer. I do not want to use the Boyer-Moore algorithm.
int find(int[] arr, int size) {
int count = 0, i, mElement;
for (i = 0; i < size; i++) {
if (count == 0) mElement = arr[i];
if (arr[i] == mElement) count++;
else count--;
}
count = 0;
for (i = 0; i < size; i++) {
if (arr[i] == mElement) count++;
}
if (count > size / 2) return mElement;
return -1;
}

I can see at least one divide and conquer method.
Start by finding the median, such as with Hoare's Select algorithm. If one value forms a majority of the elements, the median must have that value, so we've just found the value we're looking for.
From there, find (for example) the 25th and 75th percentile items. Again, if there's a majority element, at least one of those would need to have the same value as the median.
Assuming you haven't ruled out there being a majority element yet, you can continue the search. For example, let's assume the 75th percentile was equal to the median, but the 25th percentile wasn't.
When then continue searching for the item halfway between the 25th percentile and the median, as well as the one halfway between the 75th percentile and the end.
Continue finding the median of each partition that must contain the end of the elements with the same value as the median until you've either confirmed or denied the existence of a majority element.
As an aside: I don't quite see how Boyer-Moore would be used for this task. Boyer-Moore is a way of finding a substring in a string.

There is, and it does not require the elements to have an order.
To be formal, we're dealing with multisets (also called bags.) In the following, for a multiset S, let:
v(e,S) be the multiplicity of an element e in S, i.e. the number of times it occurs (the multiplicity is zero if e is not a member of S at all.)
#S be the cardinality of S, i.e. the number of elements in S counting multiplicity.
⊕ be the multiset sum: if S = L ⊕ R then S contains all the elements of L and R counting multiplicity, i.e. v(e;S) = v(e;L) + v(e;R) for any element e. (This also shows that the multiplicity can be calculated by 'divide-and-conquer'.)
[x] be the largest integer less than or equal to x.
The majority element m of S, if it exists, is that element such that 2 v(m;S) > #S.
Let's call L and R a splitting of S if L ⊕ R = S and an even splitting if |#L - #R| ≤ 1. That is, if n=#S is even, L and R have exactly half the elements of S, and if n is odd, than one has cardinality [n/2] and the other has cardinality [n/2]+1.
For an arbitrary split of S into L and R, two observations:
If neither L nor R has a majority element, then S cannot: for any element e, 2 v(e;S) = 2 v(e;L) + 2 v(e;R) ≤ #L + #R = #S.
If one of L and R has a majority element m with multiplicity k, then it is the majority element of S only if it has multiplicity r in the other half, with 2(k+r) > #S.
The algorithm majority(S) below returns either a pair (m,k), indicating that m is the majority element with k occurrences, or none:
If S is empty, return none; if S has just one element m, then return (m,1). Otherwise:
Make an even split of S into two halves L and R.
Let (m,k) = majority(L), if not none:
a. Let k' = k + v(m;R).
b. Return (m,k') if 2 k' > n.
Otherwise let (m,k) = majority(R), if not none:
a. Let k' = k + v(m;L).
b. Return (m,k') if 2 k' > n.
Otherwise return none.
Note that the algorithm is still correct even if the split is not an even one. Splitting evenly though is likely to perform better in practice.
Addendum
Made the terminal case explicit in the algorithm description above. Some sample C++ code:
struct majority_t {
int m; // majority element
size_t k; // multiplicity of m; zero => no majority element
constexpr majority_t(): m(0), k(0) {}
constexpr majority_t(int m_,size_t k_): m(m_), k(k_) {}
explicit operator bool() const { return k>0; }
};
static constexpr majority_t no_majority;
size_t multiplicity(int x,const int *arr,size_t n) {
if (n==0) return 0;
else if (n==1) return arr[0]==x?1:0;
size_t r=n/2;
return multiplicity(x,arr,r)+multiplicity(x,arr+r,n-r);
}
majority_t majority(const int *arr,size_t n) {
if (n==0) return no_majority;
else if (n==1) return majority_t(arr[0],1);
size_t r=n/2;
majority_t left=majority(arr,r);
if (left) {
left.k+=multiplicity(left.m,arr+r,n-r);
if (left.k>r) return left;
}
majority_t right=majority(arr+r,n-r);
if (right) {
right.k+=multiplicity(right.m,arr,r);
if (right.k>r) return right;
}
return no_majority;
}

A simpler divide and conquer algorithm works for the case that there exists more than 1/2 elements which are the same and there are n = 2^k elements for some integer k.
FindMost(A, startIndex, endIndex)
{ // input array A
if (startIndex == endIndex) // base case
return A[startIndex];
x = FindMost(A, startIndex, (startIndex + endIndex - 1)/2);
y = FindMost(A, (startIndex + endIndex - 1)/2 + 1, endIndex);
if (x == null && y == null)
return null;
else if (x == null && y != null)
return y;
else if (x != null && y == null)
return x;
else if (x != y)
return null;
else return x
}
This algorithm could be modified so that it works for n which is not exponent of 2, but boundary cases must be handled carefully.

Lets say the array is 1, 2, 1, 1, 3, 1, 4, 1, 6, 1.
If an array contains more than half of elements same then there should be a position where the two consecutive elements are same.
In the above example observe 1 is repeated more than half times. And the indexes(index start from 0) index 2 and index 3 have same element.

Related

Smallest missing integer algorithm that runs in O(n)?

What algorithm might find a missing integer in O(n) time, from an array?
Say we have an array A with elements in a value-range {1,2,3...2n}. Half the elements are missing so length of A = n.
E.g:
A = [1,2,5,3,10] , n=5
Output = 4
The smallest missing integer must be in the range [1, ..., n+1]. So create an array of flags, all initially false, indicating the presence of that integer. Then an algorithm is:
Scan the input array, setting flags to true as you encounter values in the range. This operation is O(n). (That is, set flag[A[i]] to true for each position i in the input array, provided A[i] <= n.)
Scan the flag array for the first false flag. This operation is also O(n). The index of the first false flag is the smallest missing integer.
EDIT: O(n) time algorithm with O(1) extra space:
If A is writable and there are some extra bits available in the elements of A, then a constant-extra-space algorithm is possible. For instance, if the elements of A are signed values, and since all the numbers are positive, we can use the sign bit of the numbers in the original array as the flags, rather than creating a new flag array. So the algorithm would be:
For each position i of the original array, if abs(A[i]) < n+1, make the value at A[abs(A[i])] negative. (This assumes array indexes are based at 1. Adjust in the obvious way if you are using 0-based arrays.) Don't just negate the value, in case there are duplicate values in A.
Find the index of the first element of A that is positive. That index is the smallest missing number in A. If all positions are negative, then A must be a permutation of {1, ..., n} and hence the smallest missing number is n+1.
If the elements are unsigned, but can hold values as high as 4 n + 1, then in step 1, instead of making the element negative, add 2 n + 1 (provided the element is <= 2 n) and use (A[i] mod (2n+1)) instead of abs(A[i]). Then in step 2, find the first element < 2 n + 1 instead of the first positive element. Other such tricks are possible as well.
You can do this in O(1) additional space, assuming that the only valid operations on the array is to read elements, and to swap pairs of elements.
First note that the specification of the problem excludes the possibility of the array containing duplicates: it contains half of the numbers from 1 to 2N.
We perform a quick-select type algorithm. Start with m=1, M=2N+1, and pivot the array on (m + M)/2. If the size of the left part of the array (elements <= (m+M)/2) is less than (m + M)/2 - m + 1, then the first missing number must be there. Otherwise, it must be in the right part of the array. Repeat on the left or right side accordingly until you find the missing number.
The size of the slice of the array under consideration halves each time and pivoting an array of size n can be done in O(n) time and O(1) space. So overall, the time complexity is 2N + N + N/2 + ... + 1 <= 4N = O(N).
An implementation of Paul Hankin's idea in C++
#include <iostream>
using namespace std;
const int MAX = 1000;
int a[MAX];
int n;
void swap(int &a, int &b) {
int tmp = a;
a = b;
b = tmp;
}
// Rearranges elements of a[l..r] in such a way that first come elements
// lower or equal to M, next come elements greater than M. Elements in each group
// come in no particular order.
// Returns an index of the first element among a[l..r] which is greater than M.
int rearrange(int l, int r, int M) {
int i = l, j = r;
while (i <= j)
if (a[i] <= M) i++;
else swap(a[i], a[j--]);
return i;
}
int main() {
cin >> n;
for (int i = 0; i < n; i++) cin >> a[i];
int L = 1, R = 2 * n;
int l = 0, r = n - 1;
while (L < R) {
int M = (L + R) / 2; // pivot element
int m = rearrange(l, r, M);
if (m - l == M - L + 1)
l = m, L = M + 1;
else
r = m - 1, R = M;
}
cout << L;
return 0;
}

dynamic programming reduction of brute force

A emoticon consists of an arbitrary positive number of underscores between two semicolons. Hence, the shortest possible emoticon is ;_;. The strings ;__; and ;_____________; are also valid emoticons.
given a String containing only(;,_).The problem is to divide string into one or more emoticons and count how many division are possible. Each emoticon must be a subsequence of the message, and each character of the message must belong to exactly one emoticon. Note that the subsequences are not required to be contiguous. subsequence definition.
The approach I thought of is to write a recursive method as follows:
countDivision(string s){
//base cases
if(s.empty()) return 1;
if(s.length()<=3){
if(s.length()!=3) return 0;
return s[0]==';' && s[1]=='_' && s[2]==';';
}
result=0;
//subproblems
genrate all valid emocticon and remove it from s let it be w
result+=countDivision(w);
return result;
}
The solution above will easily timeout when n is large such as 100. What kind of approach should I use to convert this brute force solution to a dynamic programming solution?
Few examples
1. ";_;;_____;" ans is 2
2. ";;;___;;;" ans is 36
Example 1.
";_;;_____;" Returns: 2
There are two ways to divide this string into two emoticons.
One looks as follows: ;_;|;_____; and the other looks like
this(rembember we can pick subsequence it need not be contigous): ;_ ;|; _____;
I'll describe an O(n^4)-time and -space dynamic programming solution (that can easily be improved to use just O(n^3) space) that should work for up to n=100 or so.
Call a subsequence "fresh" if consists of a single ;.
Call a subsequence "finished" if it corresponds to an emoticon.
Call a subsequence "partial" if it has nonzero length and is a proper prefix of an emoticon. (So for example, ;, ;_, and ;___ are all partial subsequences, while the empty string, _, ;; and ;___;; are not.)
Finally, call a subsequence "admissible" if it is fresh, finished or partial.
Let f(i, j, k, m) be the number of ways of partitioning the first i characters of the string into exactly j+k+m admissible subsequences, of which exactly j are fresh, k are partial and m are finished. Notice that any prefix of a valid partition into emoticons determines i, j, k and m uniquely -- this means that no prefix of a valid partition will be counted by more than one tuple (i, j, k, m), so if we can guarantee that, for each tuple (i, j, k, m), the partition prefixes within that tuple are all counted once and only once, then we can add together the counts for tuples to get a valid total. Specifically, the answer to the question will then be the sum over all 1 <= j <= n of f(n, 0, j, 0).
If s[i] = "_":
f(i, j, k, m) =
(j+1) * f(i-1, j+1, k, m-1) // Convert any of the j+1 fresh subsequences to partial
+ m * f(i-1, j, k, m) // Add _ to any of the m partial subsequences
Else if s[i] = ";":
f(i, j, k, m) =
f(i-1, j-1, k, m) // Start a fresh subsequence
+ (m+1) * f(i-1, j, k-1, m+1) // Finish any of the m+1 partial subsequences
We also need the base cases
f(0, 0, 0, 0) = 1
f(0, _, _, _) = 0
f(i, j, k, m) = 0 if any of i, j, k or m are negative
My own C++ implementation gives the correct answer of 36 for ;;;___;;; in a few milliseconds, and e.g. for ;;;___;;;_;_; it gives an answer of 540 (also in a few milliseconds). For a string consisting of 66 ;s followed by 66 _s followed by 66 ;s, it takes just under 2s and reports an answer of 0 (probably due to overflow of the long long).
Here's a fairly straightforward memoized recursion that returns an answer immediately for a string of 66 ;s followed by 66 _s followed by 66 ;s. The function has three parameters: i = index in the string, j = number of accumulating emoticons with only a left semi-colon, and k = number of accumulating emoticons with a left semi-colon and one or more underscores.
An array is also constructed for how many underscores and semi-colons are available to the right of each index, to help decide on the next possibilities.
Complexity is O(n^3) and the problem constrains the search space, where j is at most n/2 and k at most n/4.
Commented JavaScript code:
var s = ';_;;__;_;;';
// record the number of semi-colons and
// underscores to the right of each index
var cs = new Array(s.length);
cs.push(0);
var us = new Array(s.length);
us.push(0);
for (var i=s.length-1; i>=0; i--){
if (s[i] == ';'){
cs[i] = cs[i+1] + 1;
us[i] = us[i+1];
} else {
us[i] = us[i+1] + 1;
cs[i] = cs[i+1];
}
}
// memoize
var h = {};
function f(i,j,k){
// memoization
var key = [i,j,k].join(',');
if (h[key] !== undefined){
return h[key];
}
// base case
if (i == s.length){
return 1;
}
var a = 0,
b = 0;
if (s[i] == ';'){
// if there are still enough colons to start an emoticon
if (cs[i] > j + k){
// start a new emoticon
a = f(i+1,j+1,k);
}
// close any of k partial emoticons
if (k > 0){
b = k * f(i+1,j,k-1);
}
}
if (s[i] == '_'){
// if there are still extra underscores
if (j < us[i] && k > 0){
// apply them to partial emoticons
a = k * f(i+1,j,k);
}
// convert started emoticons to partial
if (j > 0){
b = j * f(i+1,j-1,k+1);
}
}
return h[key] = a + b;
}
console.log(f(0,0,0)); // 52

How to determine at which index has a sorted array been rotated around?

Given an array, such as [7,8,9,0,1,2,3,4,5,6], is it possible to determine the index around which a rotation has occurred faster than O(n)?
With O(n), simply iterate through all the elements and mark the first decreasing element as the index.
A possibly better solution would be to iterate from both ends towards the middle, but this still has a worst case of O(n).
(EDIT: The below assumes that elements are distinct. If they aren't distinct, I don't think there's anything better than just scanning the array.)
You can binary search it. I won't post any code, but here's the general idea: (I'll assume that a >= b for the rest of this. If a < b, then we know it's still in its sorted order)
Take the first element, call it a, the last element b, and the middle element, calling it c.
If a < c, then you know that the pivot is between c and b, and you can recurse (with c and b as your new ends). If a > c, then you know that the pivot is somewhere between the two, and recurse in that half (with a and c as ends) instead.
ADDENDUM: To extend to cases with repeats, if we have a = c > b then we recurse with c and b as our ends, while if a = c = b, we scan from a to c to see if there is some element d such that it differs. If it doesn't exist, then all of the numbers between a and c are equal, and thus we recurse with c and b as our ends. If it does, there are two scenarios:
a > d < b: Here, d is then the smallest element since we scanned from the left, and we're done.
a < d > b: Here, we know the answer is somewhere between d and b, and so we recurse with those as our ends.
In the best case scenario, we never have to use the equality case, giving us O(log n). Worst case, those scans encompass almost all of the array, giving us O(n).
For an array of N size if the array has been rotated minimum 1 time and less than N times, i think it will work fine:
int low=0, high = n-1;
int mid = (low +high)/2;
while( mid != low && mid != high)
{
if(a[low] < a[mid])
low = mid;
else
high = mid;
mid = (low + high)/2;
}
return high;
You can use a binary search. If you pick 1 as the central value, you know the break is in the first half because 7 > 1 < 6.
One observation is that the shift is equal to the index of the minimal element. So all you have to do is to use the binary search to find the minimal element. The only catch is that if the array has the equal elements, the task gets a bit tricky: you cannot achieve a better big O efficiency than O(N) time because you can have in input like [0, 0, 0, 0, ..., 100, 0, 0, 0, ..., 0] where you cannot find the only non-zero element quicker than linearly of course. But still the following algorithm achieves O(Mins + Log(N)) where Mins is the number of minimal elements iff the array[0] is one of the minima (otherwise Mins = 0 giving no penalty).
l = 0;
r = len(array) - 1;
while( l < r && array[l] == array[r] ) {
l = l + 1;
}
while( l < r ) {
m = (l + r) / 2;
if( array[m] > array[r] ) {
l = m + 1;
} else {
r = m;
}
}
// Here l is the answer: shifting the array l elements left will make it sorted
This works in O(log N) for unique-element arrays and O(N) for non-unique element ones (but still faster than a naive solution for majority of inputs).
Precondition
Array is sorted in ascending order
Array is left rotated
Approach
First we need to find the index at which smallest element is there.
The number of times array has been rotated will be equal to the difference of length of array and the index at which the smallest element is there.
so the task is to find the index of the smallest element. we can find the index of lowest element in two ways
Method 1 -
Just traverse the array and when the current element is smaller than the next element then the current index is the index of smallest element. In worst case this will take O(n)
Method 2
Find the middle element using (lowIndex+highIndex)/2
and then we need to find that in which side of the middle element we can find the smallest element because it can be found either left or right of the middle elment
we compare the first element to the middle element and if first element is greater than the middle element then it means smallest element lies in the left side of middle element and
if the first element is smaller than the middle element then lowest element can be found in the right side of the middle element
so this can be applied like a binary search and in O(log(n)) we cab find the index of smallest element
Using a recursive method:
static void Main(string[] args)
{
var arr = new int[]{7,8,9,0,1,2,3,4,5,6};
Console.WriteLine(FindRotation(arr));
}
private static int FindRotation(int[] arr)
{
var mid = arr.Length / 2;
return CheckRotation(arr, 0, mid, arr.Length-1);
}
private static int CheckRotation(int[] arr, int start, int mid, int end)
{
var returnVal = 0;
if (start < end && end - start > 1)
{
if (arr[start] > arr[mid])
{
returnVal = CheckRotation(arr, start, start + ((mid - start) / 2), mid);
}
else if (arr[end] < arr[mid])
{
returnVal = CheckRotation(arr, mid, mid + ((end - mid) / 2), end);
}
}
else
{
returnVal = end;
}
return returnVal;
}

array- having some issues [duplicate]

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.
# Pass 1, move every value to the position of its value
for cursor in range(N):
target = array[cursor]
while target < N and target != array[target]:
new_target = array[target]
array[target] = target
target = new_target
# Pass 2, find first location where the index doesn't match the value
for cursor in range(N):
if array[cursor] != cursor:
return cursor
return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; lets say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of find the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to represent the hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
#Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 264 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space efficient method and all values are distinct you can do it in space O( k ) and time O( k*log(N)*N ). It's space efficient and there's no data moving and all operations are elementary (adding subtracting).
set U = N; L=0
First partition the number space in k regions. Like this:
0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
Find how many numbers (count{i}) are in each region. (N*k steps)
Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
if h - count{h-1} = 1 you've got your answer
set U = count{h}; L = count{h-1}
goto 2
this can be improved using hashing (thanks for Nic this idea).
same
First partition the number space in k regions. Like this:
L + (i/k)->L + (i+1/k)*(U-L)
inc count{j} using j = (number - L)/k (if L < number < U)
find first region (h) that doesn't have k elements in it
if count{h} = 1 h is your answer
set U = maximum value in region h L = minimum value in region h
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
sort(list)
if list[0] != 0:
return 0
for i = 1 to list.last:
if list[i] != list[i-1] + 1:
return list[i-1] + 1
if list[list.last] == 2^64 - 1:
assert ("No gaps")
return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
bitmask = mask_make(2^64) // might take a while :-)
mask_clear_all (bitmask)
for i = 1 to list.last:
mask_set (bitmask, list[i])
for i = 0 to 2^64 - 1:
if mask_is_clear (bitmask, i):
return i
assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X // One scan Θ(1)
hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(i) <> null) // at most n+1 times
low++;
print low;
The worst case if there are n elements in the array, and are {0, 1, ... n-1}, in which case, the answer will be obtained at n, still keeping it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
the_answer = None
target = 0L
while target < 2L**64:
target_found = False
for item in source_list:
if item == target:
target_found = True
if not target_found and the_answer is None:
the_answer = target
target += 1L
return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to it's max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while ( i < Array.Length)
{
if (Array[i] == i + 1)
{
i++;
}
if (i < Array.Length)
{
if (Array[i] <= Array.Length)
{//SWap
int temp = Array[i];
int AnoTemp = Array[temp - 1];
Array[temp - 1] = temp;
Array[i] = AnoTemp;
}
else
i++;
}
}
for (int j = 0; j < Array.Length; j++)
{
if (Array[j] > Array.Length)
{
Console.WriteLine(j + 1);
j = Array.Length;
}
else
if (j == Array.Length - 1)
Console.WriteLine("Not Found !!");
}
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
I like the "guess zero" apprach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
index = 0
target = []
A = [x for x in A if x >=0]
if len(A) ==0:
return 1
maxi = max(A)
if maxi <= len(A):
maxi = len(A)
target = ['X' for x in range(maxi+1)]
for number in A:
target[number]= number
count = 1
while count < maxi+1:
if target[count] == 'X':
return count
count +=1
return target[count-1] + 1
Got 100% for the above solution.
1)Filter negative and Zero
2)Sort/distinct
3)Visit array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
unordered_set<int> fre;
// storing each positive number in a hash.
for(int i = 0; i < nums.size(); i +=1)
{
if(nums[i] > 0)
fre.insert(nums[i]);
}
int i = 1;
// Iterating from 1 to size of the set and checking
// for the occurrence of 'i'
for(auto it = fre.begin(); it != fre.end(); ++it)
{
if(fre.find(i) == fre.end())
return i;
i +=1;
}
return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
var m = 0;
for(i=1;i<=a.length;i++) {
j=0;m=1;
while(j < a.length) {
if(i === a[j]) {
m++;
}
j++;
}
if(m === 1) {
return i;
}
}
}
console.log(findSmallest(a))
Hope this helps for someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
MIN = 0
MAX = 1000000
possible_results = range(MIN, MAX)
for i in possible_results:
next_value = (i + 1)
if next_value not in A:
return next_value
return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
A.sort()
j = 1
for i, elem in enumerate(A):
if j < elem:
break
elif j == elem:
j += 1
continue
else:
continue
return j
this can help:
0- A is [5, 3, 2, 7];
1- Define B With Length = A.Length; (O(1))
2- initialize B Cells With 1; (O(n))
3- For Each Item In A:
if (B.Length <= item) then B[Item] = -1 (O(n))
4- The answer is smallest index in B such that B[index] != -1 (O(n))

Algorithm to find the smallest non negative integer that is not in a list

Given a list of integers, how can I best find an integer that is not in the list?
The list can potentially be very large, and the integers might be large (i.e. BigIntegers, not just 32-bit ints).
If it makes any difference, the list is "probably" sorted, i.e. 99% of the time it will be sorted, but I cannot rely on always being sorted.
Edit -
To clarify, given the list {0, 1, 3, 4, 7}, examples of acceptable solutions would be -2, 2, 8 and 10012, but I would prefer to find the smallest, non-negative solution (i.e. 2) if there is an algorithm that can find it without needing to sort the entire list.
One easy way would be to iterate the list to get the highest value n, then you know that n+1 is not in the list.
Edit:
A method to find the smallest positive unused number would be to start from zero and scan the list for that number, starting over and increase if you find the number. To make it more efficient, and to make use of the high probability of the list being sorted, you can move numbers that are smaller than the current to an unused part of the list.
This method uses the beginning of the list as storage space for lower numbers, the startIndex variable keeps track of where the relevant numbers start:
public static int GetSmallest(int[] items) {
int startIndex = 0;
int result = 0;
int i = 0;
while (i < items.Length) {
if (items[i] == result) {
result++;
i = startIndex;
} else {
if (items[i] < result) {
if (i != startIndex) {
int temp = items[startIndex];
items[startIndex] = items[i];
items[i] = temp;
}
startIndex++;
}
i++;
}
}
return result;
}
I made a performance test where I created lists with 100000 random numbers from 0 to 19999, which makes the average lowest number around 150. On test runs (with 1000 test lists each), the method found the smallest number in unsorted lists by average in 8.2 ms., and in sorted lists by average in 0.32 ms.
(I haven't checked in what state the method leaves the list, as it may swap some items in it. It leaves the list containing the same items, at least, and as it moves smaller values down the list I think that it should actually become more sorted for each search.)
If the number doesn't have any restrictions, then you can do a linear search to find the maximum value in the list and return the number that is one larger.
If the number does have restrictions (e.g. max+1 and min-1 could overflow), then you can use a sorting algorithm that works well on partially sorted data. Then go through the list and find the first pair of numbers v_i and v_{i+1} that are not consecutive. Return v_i + 1.
To get the smallest non-negative integer (based on the edit in the question), you can either:
Sort the list using a partial sort as above. Binary search the list for 0. Iterate through the list from this value until you find a "gap" between two numbers. If you get to the end of the list, return the last value + 1.
Insert the values into a hash table. Then iterate from 0 upwards until you find an integer not in the list.
Unless it is sorted you will have to do a linear search going item by item until you find a match or you reach the end of the list. If you can guarantee it is sorted you could always use the array method of BinarySearch or just roll your own binary search.
Or like Jason mentioned there is always the option of using a Hashtable.
"probably sorted" means you have to treat it as being completely unsorted. If of course you could guarantee it was sorted this is simple. Just look at the first or last element and add or subtract 1.
I got 100% in both correctness & performance,
You should use quick sorting which is N log(N) complexity.
Here you go...
public int solution(int[] A) {
if (A != null && A.length > 0) {
quickSort(A, 0, A.length - 1);
}
int result = 1;
if (A.length == 1 && A[0] < 0) {
return result;
}
for (int i = 0; i < A.length; i++) {
if (A[i] <= 0) {
continue;
}
if (A[i] == result) {
result++;
} else if (A[i] < result) {
continue;
} else if (A[i] > result) {
return result;
}
}
return result;
}
private void quickSort(int[] numbers, int low, int high) {
int i = low, j = high;
int pivot = numbers[low + (high - low) / 2];
while (i <= j) {
while (numbers[i] < pivot) {
i++;
}
while (numbers[j] > pivot) {
j--;
}
if (i <= j) {
exchange(numbers, i, j);
i++;
j--;
}
}
// Recursion
if (low < j)
quickSort(numbers, low, j);
if (i < high)
quickSort(numbers, i, high);
}
private void exchange(int[] numbers, int i, int j) {
int temp = numbers[i];
numbers[i] = numbers[j];
numbers[j] = temp;
}
Theoretically, find the max and add 1. Assuming you're constrained by the max value of the BigInteger type, sort the list if unsorted, and look for gaps.
Are you looking for an on-line algorithm (since you say the input is arbitrarily large)? If so, take a look at Odds algorithm.
Otherwise, as already suggested, hash the input, search and turn on/off elements of boolean set (the hash indexes into the set).
There are several approaches:
find the biggest int in the list and store it in x. x+1 will not be in the list. The same applies with using min() and x-1.
When N is the size of the list, allocate an int array with the size (N+31)/32. For each element in the list, set the bit v&31 (where v is the value of the element) of the integer at array index i/32. Ignore values where i/32 >= array.length. Now search for the first array item which is '!= 0xFFFFFFFF' (for 32bit integers).
If you can't guarantee it is sorted, then you have a best possible time efficiency of O(N) as you have to look at every element to make sure your final choice is not there. So the question is then:
Can it be done in O(N)?
What is the best space efficiency?
Chris Doggett's solution of find the max and add 1 is both O(N) and space efficient (O(1) memory usage)
If you want only probably the best answer then it is a different question.
Unless you are 100% sure it is sorted, the quickest algorithm still has to look at each number in the list at least once to at least verify that a number is not in the list.
Assuming this is the problem I'm thinking of:
You have a set of all ints in the range 1 to n, but one of those ints is missing. Tell me which of int is missing.
This is a pretty easy problem to solve with some simple math knowledge. It's known that the sum of the range 1 .. n is equal to n(n+1) / 2. So, let W = n(n+1) / 2 and let Y = the sum of the numbers in your set. The integer that is missing from your set, X, would then be X = W - Y.
Note: SO needs to support MathML
If this isn't that problem, or if it's more general, then one of the other solutions is probably right. I just can't really tell from the question since it's kind of vague.
Edit: Well, since the edit, I can see that my answer is absolutely wrong. Fun math, none-the-less.
I've solved this using Linq and a binary search. I got 100% across the board. Here's my code:
using System.Collections.Generic;
using System.Linq;
class Solution {
public int solution(int[] A) {
if (A == null) {
return 1;
} else {
if (A.Length == 0) {
return 1;
}
}
List<int> list_test = new List<int>(A);
list_test = list_test.Distinct().ToList();
list_test = list_test.Where(i => i > 0).ToList();
list_test.Sort();
if (list_test.Count == 0) {
return 1;
}
int lastValue = list_test[list_test.Count - 1];
if (lastValue <= 0) {
return 1;
}
int firstValue = list_test[0];
if (firstValue > 1) {
return 1;
}
return BinarySearchList(list_test);
}
int BinarySearchList(List<int> list) {
int returnable = 0;
int tempIndex;
int[] boundaries = new int[2] { 0, list.Count - 1 };
int testCounter = 0;
while (returnable == 0 && testCounter < 2000) {
tempIndex = (boundaries[0] + boundaries[1]) / 2;
if (tempIndex != boundaries[0]) {
if (list[tempIndex] > tempIndex + 1) {
boundaries[1] = tempIndex;
} else {
boundaries[0] = tempIndex;
}
} else {
if (list[tempIndex] > tempIndex + 1) {
returnable = tempIndex + 1;
} else {
returnable = tempIndex + 2;
}
}
testCounter++;
}
if (returnable == list[list.Count - 1]) {
returnable++;
}
return returnable;
}
}
The longest execution time was 0.08s on the Large_2 test
You need the list to be sorted. That means either knowing it is sorted, or sorting it.
Sort the list. Skip this step if the list is known to be sorted. O(n lg n)
Remove any duplicate elements. Skip this step if elements are already guaranteed distinct. O(n)
Let B be the position of 1 in the list using a binary search. O(lg n)
If 1 isn't in the list, return 1. Note that if all elements from 1 to n are in the list, then the element at B+n must be n+1. O(1)
Now perform a sortof binary search starting with min = B, max = end of the list. Call the position of the pivot P. If the element at P is greater than (P-B+1), recurse on the range [min, pivot], otherwise recurse on the range (pivot, max]. Continue until min=pivot=max O(lg n)
Your answer is (the element at pivot-1)+1, unless you are at the end of the list and (P-B+1) = B in which case it is the last element + 1. O(1)
This is very efficient if the list is already sorted and has distinct elements. You can do optimistic checks to make it faster when the list has only non-negative elements or when the list doesn't include the value 1.
Just gave an interview where they asked me this question. The answer to this problem can be found using worst case analysis. The upper bound for the smallest natural number present on the list would be length(list). This is because, the worst case for the smallest number present in the list given the length of the list is the list 0,1,2,3,4,5....length(list)-1.
Therefore for all lists, smallest number not present in the list is less than equal to length of the list. Therefore, initiate a list t with n=length(list)+1 zeros. Corresponding to every number i in the list (less than equal to the length of the list) mark assign the value 1 to t[i]. The index of the first zero in the list is the smallest number not present in the list. And since, the lower bound on this list n-1, for at least one index j

Resources