Find an element in an infinite length sorted array - performance

Given an infinitely long sorted array containing both positive and negative integers, find an element in it.
EDIT
All the elements in the array are unique, and the array is infinite in the right direction.
There are two approaches:
Approach 1:
Set the index to position 100. If the element to be found is less than the item there, binary search the previous 100 items; otherwise, set the next index to position 200. In this way, keep increasing the index by 100 until the item there is greater.
Approach 2:
Set the index to successive powers of 2: first position 2, then 4, then 8, then 16, and so on. Then do the binary search from position 2^K to 2^(K + 1), where the item lies in between.
Which of the two approaches will be better both in best case and worst case?

The first approach will be linear in the index of the element (O(k) where k is the index of the element). Actually, you are going to need k/100 iterations to find the element which is greater than the searched element, which is O(k).
The second approach will be logarithmic in the same index: O(log k), where k is the index of the element. Here you need log(k) iterations until you find the higher element. The binary search between 2^(i-1) and 2^i (where i is the iteration number) is logarithmic as well, for a total of O(log k).
Thus, the second is more efficient.

You can apply binary search more or less directly with a small modification. This roughly corresponds to your Approach 2.
Basically, pick some number B and set A to 0, then check whether the element you're looking for lies between the values at positions A and B. If it does, perform the usual binary search within these boundaries; otherwise set A = B and B = 2*B and repeat. This takes O(log(M)) time, where M is the position of the element you're looking for in the array.
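A minimal Python sketch of this doubling search follows. Since a program cannot hold an infinite array, it assumes the array is exposed through a hypothetical accessor `get(i)` that returns the i-th element of a strictly increasing sequence. The window bounds are named `lo`/`hi` rather than A/B, and the next window starts at `hi + 1` because `get(hi)` is already known to be too small.

```python
from typing import Callable, Optional


def exponential_search(get: Callable[[int], int], target: int) -> Optional[int]:
    """Find target in an infinite, strictly increasing sequence exposed as get(i).

    Returns the index of target, or None if it is not present.
    """
    lo, hi = 0, 1
    # Doubling phase: grow the window until get(hi) is no longer below target.
    while get(hi) < target:
        lo, hi = hi + 1, 2 * hi
    # Ordinary binary search inside [lo, hi].
    while lo <= hi:
        mid = (lo + hi) // 2
        value = get(mid)
        if value == target:
            return mid
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# get is a hypothetical accessor; here the "array" is -50, -47, -44, ...
print(exponential_search(lambda i: 3 * i - 50, 1000))  # 350
print(exponential_search(lambda i: 3 * i - 50, 1001))  # None
```

Both phases take O(log M) probes, which is where the O(log(M)) bound above comes from.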

If the array is well-founded, i.e. it has a smallest element (so you have elements x_0, x_1, ...), and all elements are unique, then here's a simple approach: if you're looking for the number n, you can do a binary search over the indices 0, ..., n − x_0. Note that we always have the basic inequality x_i ≥ x_0 + i for all i ≥ 0, because distinct sorted integers grow by at least 1 per index; so if n is present at index i, then i ≤ n − x_0.
Thus you can find the value n in log2(n − x_0) steps.
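Here is a minimal Python sketch of that bounded search, again assuming a hypothetical accessor `get(i)` over a strictly increasing sequence of integers. No doubling phase is needed, because x_0 alone fixes the upper index n − x_0.

```python
from typing import Callable, Optional


def bounded_search(get: Callable[[int], int], n: int) -> Optional[int]:
    """Binary search for n over indices 0 .. n - x_0 only.

    Assumes get(i) (a hypothetical accessor) returns the i-th element of a
    strictly increasing integer sequence, so x_i >= x_0 + i.
    """
    x0 = get(0)
    lo, hi = 0, n - x0  # hi < 0 means n < x_0, so n cannot be present
    while lo <= hi:
        mid = (lo + hi) // 2
        value = get(mid)
        if value == n:
            return mid
        if value < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


print(bounded_search(lambda i: 2 * i - 7, 13))  # 10
```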

Since the array is infinite, the indexes are necessarily variable-length. That means that doing math on them is not O(1), which in turn means that "binary search with first a search for an endpoint" has a slightly different time complexity than O(log(k)).
The index math done in the search for the endpoint is just a left shift by one, which takes O(log(k)) because indexes up to k need up to log(k) bits and shifting left by one is linear in the number of bits.
The index math done in the binary search is all O(log(k)) as well.
So the actual complexity of both algorithms is O(log(k)^2). The complexity of a linear search would be O(k log k), so it still loses.

Just my 2 cents. We have an infinite array, so let's imagine that we are looking for a very big number, bigger than anything you have in mind. Note that the length of the interval to binary search in is 2^(i+1)-2^i = 2^i, so it should take log(2^i) = i steps to find the number. On the other hand, it takes i steps to reach the target interval. So the total time complexity is O(n) again. What am I missing?

Here is another implementation, which uses powers of 2 to find the range where the element occurs and then hands that sub-array to a binary search.
e.g.
arr = 1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1...........
num = 8;
efficiency = 2logn

--Complete Solution-- takes O(logn) time complexity
public class B {
    public static void main(String[] args) {
        // A finite array stands in for the sorted array of infinite length
        int a[] = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 };
        int elementToFind = 12;
        int finalIndex = getFinalIndex(a, elementToFind);
        if (finalIndex == -1) {
            System.out.println("Element not found");
        } else {
            System.out.println("Found element:" + a[finalIndex]);
        }
    }
    private static int getFinalIndex(int[] a, int elementToFind) {
        // Probe indices 1, 2, 4, 8, ... (powers of 2) until a[high] >= elementToFind
        int low = 0;
        int high = Math.min(1, a.length - 1);
        while (high < a.length - 1 && a[high] < elementToFind) {
            low = high + 1;
            // Math.min only matters for this finite stand-in array
            high = Math.min(2 * high, a.length - 1);
        }
        // If present, the element lies in a[low..high]
        return binarySearch(a, low, high, elementToFind);
    }
    private static int binarySearch(int[] a, int low, int high, int key) {
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                return mid;
            } else if (a[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }
}

package com.population.app;
import java.io.*;
import java.util.*;
class Demo {
    public static void main(String args[]) {
        int arr[] = { 1, 2, 4, 6, 8, 9, 12, 14, 17, 21, 45 };
        int index = findPos(arr, 45);
        if (index == -1)
            System.out.println("Element not found!");
        else
            System.out.println("Element found! index = " + index);
    }
    static int findPos(int arr[], int value) {
        int start = 0, end = Math.min(1, arr.length - 1);
        // Double end until arr[end] >= value or the last index is reached
        while (arr[end] < value && end < arr.length - 1) {
            start = end;
            end = 2 * end;
            // We know the array is meant to be infinite, but with a finite array end must
            // be clamped to the last index (n-1); otherwise the code would fail with an
            // ArrayIndexOutOfBoundsException.
            if (end >= arr.length) {
                end = arr.length - 1;
            }
        }
        return binarySearch(arr, start, end, value);
    }
    static int binarySearch(int arr[], int start, int end, int ele) {
        if (end >= start) {
            int mid = start + (end - start) / 2;
            if (arr[mid] == ele)
                return mid;
            if (arr[mid] > ele)
                return binarySearch(arr, start, mid - 1, ele);
            return binarySearch(arr, mid + 1, end, ele);
        }
        return -1;
    }
}

Related

Find the 1 non-repeating element in a given array without XOR or Map in O(n)

How can I find the one non-repeating element in an array in which all other elements appear exactly twice, when I'm not allowed to use a hash map or the XOR operator?
In O(n) time complexity
Examples:
Input
arr[] = {14, 1, 14, 4, 12, 2, 1, 2, 3, 3}
Output
4
If you want to do it in JavaScript, then inside a for loop you can check whether the first index and the last index of the item are the same, and if so return that item; if no item qualifies, return -1. (Note that indexOf() and lastIndexOf() each scan the array, so this is O(n^2) rather than O(n).)
function getFirstDistinctNumber() {
  const arr = [14, 1, 14, 4, 12, 2, 1, 2, 3, 3];
  for (let i = 0; i < arr.length; i++) {
    if (arr.indexOf(arr[i]) === arr.lastIndexOf(arr[i])) {
      return arr[i];
    }
  }
  return -1;
}
console.log(getFirstDistinctNumber());
You can do the same in Java, but plain arrays have no lastIndexOf() method, so you can use an ArrayList instead:
import java.util.*;
class FindDistinct {
public static void main(String[] args) {
// create an empty array list
ArrayList<Integer> inputList = new ArrayList<Integer>();
// use add() method to add values in the list
inputList.add(14);
inputList.add(1);
inputList.add(14);
inputList.add(4);
inputList.add(12);
inputList.add(2);
inputList.add(1);
inputList.add(2);
inputList.add(3);
inputList.add(3);
for(int i=0; i< inputList.size(); i++) {
if(inputList.indexOf(inputList.get(i)) == inputList.lastIndexOf(inputList.get(i))) {
System.out.println(inputList.get(i));
break;
}
}
}
}
By sorting the array and then using a stack, you can find the required element:
# code is in python
arr = [14, 1, 14, 4, 12, 2, 1, 2, 3, 3, 12]
# sort the array
arr = sorted(arr)
# use a stack to find out the required element
stack = []
for ele in arr:
    if len(stack) == 0:
        stack.append(ele)
    elif stack[-1] == ele:
        stack.pop()
    else:
        stack.append(ele)
print(stack[-1])  # item with one occurrence
# output : 4
There's a way to compute that in O(1) space and O(n log n) time: simply binary search the value. For a given number x, count the number of elements that are less than or equal to x. If this count is odd, the number you're looking for is less than or equal to x; if it's even, it's greater.
(technically the running time is O(n log k) where k is max_value - min_value from the elements in the array, but there's a way to modify it to work in O(n log n) if needed.)
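A minimal Python sketch of this value-space binary search; the function name `lonely_by_parity` is hypothetical. The example uses the list with 12 appearing twice, so that 4 really is the only unpaired value, and the code assumes exactly one value is unpaired.

```python
from typing import List


def lonely_by_parity(arr: List[int]) -> int:
    """Binary search over values: O(1) extra space, O(n log(max - min)) time.

    Assumes exactly one value occurs once and every other value occurs twice.
    """
    lo, hi = min(arr), max(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        at_most_mid = sum(1 for v in arr if v <= mid)
        if at_most_mid % 2 == 1:
            hi = mid  # an odd count means the lonely value is <= mid
        else:
            lo = mid + 1  # an even count means every value <= mid is paired
    return lo


print(lonely_by_parity([14, 1, 14, 4, 12, 2, 1, 2, 3, 3, 12]))  # 4
```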
I found the answer to the problem, using this video: https://www.youtube.com/watch?v=aZneq1PWFkg
You can get the median in O(n) time, and then, also in O(n) time, partition the array so that every number smaller than the median is on its left and every number greater than or equal to it is on its right.
Then scan the array once more to see whether the median has a twin. If it doesn't, you're done: the median is your lonely number. If it does, set the median and its twin aside. Every other value has both of its copies on the same side, so the lonely number is on the side that is left with an odd number of elements. Keep that side, throw the other one away, and repeat the algorithm on it until you find the number.
Then you get T(n) = T(n/2) + Θ(n),
and by the master theorem, T(n) = Θ(n).
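The following Python sketch is a simplified variant of that idea, not the exact algorithm from the video: it partitions around a random pivot instead of the true median, so the O(n) bound holds only in expectation, and it keeps whichever side has an odd number of elements once the pivot's pair is excluded. The function name is hypothetical.

```python
import random
from typing import List


def lonely_by_partition(arr: List[int]) -> int:
    """Expected O(n): partition around a random pivot, keep the odd-sized side.

    Assumes exactly one value occurs once and every other value occurs twice.
    """
    items = list(arr)
    while True:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        equal_count = sum(1 for v in items if v == pivot)
        if equal_count == 1:
            return pivot
        # The pivot is paired and every other pair sits wholly on one side,
        # so exactly one side has odd length: the one holding the lonely value.
        if len(smaller) % 2 == 1:
            items = smaller
        else:
            items = [v for v in items if v > pivot]


print(lonely_by_partition([14, 1, 14, 4, 12, 2, 1, 2, 3, 3, 12]))  # 4
```

Swapping the random pivot for a median-of-medians pivot would give the worst-case Θ(n) bound described above, at the cost of a longer implementation.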
In C++ (note that the nested loops make this O(n^2), not O(n)):
int arrayUnique(int *arr, int size)
{
int count;
for(int i=0;i<size;i++)
{
count=0;
for(int j=0;j<size;j++)
{
if(i==j){
continue;
}
if(arr[i] == arr[j]){
count=1;
}
}
if(count==0){
return arr[i];
}
}
return -1; // every element has a twin
}

Sorted squares of numbers in a list in O(n)?

Given a list of integers in sorted order, say, [-9, -2, 0, 2, 3], we have to square each element and return the result in a sorted order. So, the output would be: [0, 4, 4, 9, 81].
I could figure out two approaches:
O(NlogN) approach - We insert the square of each element in a hashset. Then copy the elements into a list, sort it and then return it.
O(n) approach - If there is a bound for the input elements (say -100 to 100), then we create a list of 10001 counters (one for each possible square, 0 to 10000). For each input element, we increment the counter of its square; e.g., for 9 in the input, I will increment the counter at 81. Then we traverse the counter list and append each square to a return list as many times as it was counted. Note that this makes an assumption: that there is a bound for the input elements.
Is there some way in which we could do it in O(n) time even without assuming any bounds for the input?
Well I can think of an O(n) approach
Split the input into 2 lists. One with negative numbers, let's call this list A. And one with positive numbers and 0, list B. This is done while preserving the input order, which is trivial : O(n)
Reverse list A. We do this because once the elements are squared, the greater-than relation between them is flipped.
Square every item of both list in place : O(n)
Run a merge operation not unlike that of a merge sort. : O(n)
Total: O(n)
Done :)
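A minimal Python sketch of these four steps (split, reverse, square, merge); the function name `sorted_squares` is just illustrative.

```python
from typing import List


def sorted_squares(nums: List[int]) -> List[int]:
    """Square a sorted list and return the squares in sorted order in O(n)."""
    negatives = [v for v in nums if v < 0]       # list A, ascending
    non_negatives = [v for v in nums if v >= 0]  # list B, ascending
    negatives.reverse()                          # now ascending by absolute value
    a = [v * v for v in negatives]
    b = [v * v for v in non_negatives]
    # Standard merge step from merge sort.
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(sorted_squares([-9, -2, 0, 2, 3]))  # [0, 4, 4, 9, 81]
```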
Is there some way in which we could do it in O(n) time even without assuming any bounds for the input?
Absolutely.
Since the original list is already sorted you are in luck!
given two numbers x and y
if |x| > |y| then x^2 > y^2
So all you have to do is to split the list into two parts, one for all the negative numbers and the other one for all the positive ones
Reverse the negative one and make them positive
Then you merge those two lists into one using insertion. This runs in O(n) since both lists are sorted.
From there you can just calculate the square and put them into the new list.
We can achieve it by 2 pointer technique. 1 pointer at the start and other at the end. Compare the squares and move the pointers accordingly and start allocating the max element at the end of the new list.
Time = O(n)
Space = O(n)
Can you do it in place, to reduce the space complexity?
This can be done with O(n) time and space. We need two pointers. The following is the Java code:
public int[] sortedSquares(int[] A) {
int i = 0;
int j = A.length - 1;
int[] result = new int[A.length];
int count = A.length - 1;
while(count >= 0) {
if(Math.abs(A[i]) > Math.abs(A[j])) {
result[count] = A[i]*A[i];
i++;
}
else {
result[count] = A[j]*A[j];
j--;
}
count--;
}
return result;
}
Start from the end and compare the absolute values, then build the answer.
class Solution {
public int[] sortedSquares(int[] nums) {
int left = 0;
int right = nums.length -1;
int index = nums.length- 1;
int result[] = new int [nums.length];
while(left<=right)
{
if(Math.abs(nums[left])>Math.abs(nums[right]))
{
result[index] = nums[left] * nums[left];
left++;
}
else
{
result[index] = nums[right] * nums[right];
right--;
}
index--;
}
return result;
}
}
By using the naive approach this question will be very easy but it will require O(nlogn) complexity
To solve this question in O(n), two pointer method is the best approach.
Create a new result array with the same length as the given array, and keep a write pointer at its last index.
Assign one pointer at the start of the array and another at the end, since the largest square must come from one of the two ends:
[-9, -2, 0, 2, 3]
Compare the absolute values of -9 and 3.
If the left absolute value is larger, store its square at the write pointer and move the left pointer forward; otherwise store the right square and move the right pointer backward. Either way, decrement the write pointer.
Python 3 solution. It uses O(1) extra space, but since list.insert(0, ...) and list.pop() at an arbitrary position each take O(N) time, the overall time complexity is O(N^2) rather than O(N).
def sorted_squares(Arr: list) -> list:
    i = 0
    j = len(Arr)-1
    while i < len(Arr):
        if Arr[i]*Arr[i] < Arr[j]*Arr[j]:
            Arr.insert(0, Arr[j]*Arr[j])
            Arr.pop(j+1)
            i += 1
            continue
        if Arr[i]*Arr[i] > Arr[j]*Arr[j]:
            Arr.insert(0, Arr[i]*Arr[i])
            Arr.pop(i+1)
            i += 1
            continue
        else:
            if i != j:
                Arr.insert(0, Arr[j]*Arr[j])
                Arr.insert(0, Arr[j+1]*Arr[j+1])
                Arr.pop(j+2)
                Arr.pop(i+2)
                i += 2
            else:
                Arr.insert(0, Arr[j]*Arr[j])
                Arr.pop(j+1)
                i += 1
    return Arr
X = [[-4,-3,-2,0,3,5,6],[1,2,3,4,5],[-5,-4,-3,-2,-1],[-9,-2,0,2,3]]
for i in X:
    # looping over different kinds of inputs
    print(sorted_squares(i))
# outputs
'''
[0, 4, 9, 9, 16, 25, 36]
[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]
[0, 4, 4, 9, 81]
'''

Any faster way to find the number of "lucky triples"?

I am working on a code challenge problem: "find lucky triples". A "lucky triple" is defined as follows: in a list lst, any triple (lst[i], lst[j], lst[k]) with i < j < k, where lst[i] divides lst[j] and lst[j] divides lst[k].
My task is to find the number of lucky triples in a given list. The brute-force way is to use three loops, but it takes too much time: I wrote that, and the system responded "time exceeded". The problem looks simple, but the array is unsorted, so general methods like binary search do not work. I have been stuck on this problem for a day and hope someone can give me a hint. I am looking for a faster solution; at least the time complexity should be lower than O(N^3).
A simple dynamic-programming-like algorithm will do this in quadratic time and linear space. You just have to maintain a counter c[i] for each item in the list, representing the number of previous integers that divide L[i].
Then, as you go through the list and test each integer L[k] with all previous item L[j], if L[j] divides L[k], you just add c[j] (which could be 0) to your global counter of triples, because that also implies that there exist exactly c[j] items L[i] such that L[i] divides L[j] and i < j.
c = array of n zeros
nbTriples = 0
for k = 0 to n-1
  for j = 0 to k-1
    if (L[k] % L[j] == 0)
      c[k]++
      nbTriples += c[j]
return nbTriples
There may be some better algorithm that uses fancy discrete maths to do it faster, but if O(n^2) is ok, this will do just fine.
In regard to your comment:
Why DP? We have something that can clearly be modeled as having a left to right order (DP orange flag), and it feels like reusing previously computed values could be interesting, because the brute force algorithm does the exact same computations a lot of times.
How to get from that to a solution? Run a simple example (hint: it should better be by treating input from left to right). At step i, compute what you can compute from this particular point (ignoring everything on the right of i), and try to pinpoint what you compute over and over again for different i's: this is what you want to cache. Here, when you see a potential triple at step k (L[k] % L[j] == 0), you have to consider what happens on L[j]: "does it have some divisors on its left too? Each of these would give us a new triple. Let's see... But wait! We already computed that on step j! Let's cache this value!" And this is when you jump on your seat.
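Full working solution in Python 3:
def count_lucky_triples(l):
    c = [0] * len(l)  # c[i] = number of earlier elements that divide l[i]
    count = 0
    for i in range(len(l)):
        for j in range(i):
            if l[i] % l[j] == 0:
                c[i] += 1
                # every earlier divisor of l[j] completes a triple ending at l[i]
                count += c[j]
    return count
print(count_lucky_triples([1, 2, 3, 4, 5, 6]))  # 3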
Read up on the Sieve of Eratosthenes, a common technique for finding prime numbers, which could be adapted to find your 'lucky triples'. Essentially, you would need to iterate your list in increasing value order, and for each value, multiply it by an increasing factor until it is larger than the largest list element, and each time one of these multiples equals another value in the list, the multiple is divisible by the base number. If the list is sorted when given to you, then the i < j < k requirement would also be satisfied.
e.g. Given the list [3, 4, 8, 15, 16, 20, 40]:
Start at 3, which has multiples [6, 9, 12, 15, 18 ... 39] within the range of the list. Of those multiples, only 15 is contained in the list, so record under 15 that it has a factor 3.
Proceed to 4, which has multiples [8, 12, 16, 20, 24, 28, 32, 36, 40]. Mark the ones that are in the list (8, 16, 20 and 40) as having a factor 4.
Continue through the list. When you reach an element that already has a known factor, any multiple of it that is in the list completes a triple. In this case, 8 has the factor 4, and its multiples 16 and 40 are in the list, so (4, 8, 16) and (4, 8, 40) are triples; likewise 20 (factor 4) and its multiple 40 give (4, 20, 40). Whereas 15, which has the factor 3, has no multiples in the list, so no value can form a triplet with 3 and 15.
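A minimal Python sketch of the sieve-style count, assuming the list is sorted ascending and contains distinct positive integers (so value order and index order coincide). `divisor_count[v]` records how many list elements divide v; each time a multiple m of v is found, every recorded divisor u of v completes a triple (u, v, m). The work is roughly the sum of max/v over the list, i.e. O(M log M) for a maximum value M. The function name is hypothetical.

```python
from typing import List


def count_lucky_triples_sieve(values: List[int]) -> int:
    """Count triples u | v | m, assuming values are sorted, distinct and positive."""
    if not values:
        return 0
    present = set(values)
    largest = values[-1]
    # divisor_count[v] = how many list elements (other than v) divide v
    divisor_count = {v: 0 for v in values}
    triples = 0
    for v in values:  # ascending, so every divisor of v was already processed
        for multiple in range(2 * v, largest + 1, v):
            if multiple in present:
                divisor_count[multiple] += 1
                # each known divisor u of v gives the triple (u, v, multiple)
                triples += divisor_count[v]
    return triples


print(count_lucky_triples_sieve([3, 4, 8, 15, 16, 20, 40]))  # 3
```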
A precomputation step to the problem can help reduce time complexity.
Precomputation Step:
For every element(i), iterate the array to find which are the elements(j) such that lst[j]%lst[i]==0
for (i = 0; i < n; i++)
{
    for (j = i + 1; j < n; j++)
    {
        if (a[j] % a[i] == 0)
        {
            // mark j as a multiple of a[i]; you decide how to store this data
        }
    }
}
This Precomputation Step will take O(n^2) time.
In the final step, use the results of the precomputation step to find the triplets.
Form a graph: for each index, an array of the later indices whose values are multiples of the current value. Then, for each edge i -> j, add the number of multiples of j, read from the graph. This has a complexity of O(n^2).
For example, for the list {1,2,3,4,5,6}, the graph of multiples will look like
{ 0:[1,2,3,4,5], 1:[3,5], 2: [5], 3:[],4:[], 5:[]}
So, the triplets will be {0->1 ->3/5} and {0->2 ->5}, i.e., 3 in total.
package com.welldyne.mx.dao.core;
import java.util.LinkedList;
import java.util.List;
public class LuckyTriplets {
public static void main(String[] args) {
int[] integers = new int[2000];
for (int i = 1; i < 2001; i++) {
integers[i - 1] = i;
}
long start = System.currentTimeMillis();
int n = findLuckyTriplets(integers);
long end = System.currentTimeMillis();
System.out.println((end - start) + " ms");
System.out.println(n);
}
private static int findLuckyTriplets(int[] integers) {
List<Integer>[] indexMultiples = new LinkedList[integers.length];
for (int i = 0; i < integers.length; i++) {
indexMultiples[i] = getMultiples(integers, i);
}
int luckyTriplets = 0;
for (int i = 0; i < integers.length - 1; i++) {
luckyTriplets += getLuckyTripletsFromMultiplesMap(indexMultiples, i);
}
return luckyTriplets;
}
private static int getLuckyTripletsFromMultiplesMap(List<Integer>[] indexMultiples, int n) {
int sum = 0;
for (int i = 0; i < indexMultiples[n].size(); i++) {
sum += indexMultiples[(indexMultiples[n].get(i))].size();
}
return sum;
}
private static List<Integer> getMultiples(int[] integers, int n) {
List<Integer> multiples = new LinkedList<>();
for (int i = n + 1; i < integers.length; i++) {
if (isMultiple(integers[n], integers[i])) {
multiples.add(i);
}
}
return multiples;
}
/*
* if b is the multiple of a
*/
private static boolean isMultiple(int a, int b) {
return b % a == 0;
}
}
I just wanted to share my solution, which passed. Basically, the problem can be reduced to a tree problem. Pay attention to the wording of the question: it distinguishes numbers by their index, not their value, so {1,1,1} has only 1 triple, but {1,1,1,1} has 4. The constraint is (l_i, l_j, l_k) such that l_i divides l_j, l_j divides l_k, and i < j < k.
def solution(l):
    count = 0
    data = l
    max_element = max(data)
    tree_list = []
    for p, element in enumerate(data):
        if element == 0:
            tree_list.append([])
        else:
            temp = []
            for el in data[p+1:]:
                if el % element == 0:
                    temp.append(el)
            tree_list.append(temp)
    for p, element_list in enumerate(tree_list):
        data[p] = 0
        temp = data[:]
        for element in element_list:
            pos_element = temp.index(element)
            count += len(tree_list[pos_element])
            temp[pos_element] = 0
    return count

Find the x smallest integers in a list of length n

You have a list of n integers and you want the x smallest. For example,
x_smallest([1, 2, 5, 4, 3], 3) should return [1, 2, 3].
I'll vote up unique runtimes within reason and will give the green check to the best runtime.
I'll start with O(n * x): Create an array of length x. Iterate through the list x times, each time pulling out the next smallest integer.
Edits
You have no idea how big or small these numbers are ahead of time.
You don't care about the final order, you just want the x smallest.
This is already being handled in some solutions, but let's say that while you aren't guaranteed a unique list, you aren't going to get a degenerate list such as [1, 1, 1, 1, 1] either.
You can find the k-th smallest element in O(n) time. This has been discussed on StackOverflow before. There are relatively simple randomized algorithms, such as QuickSelect, that run in O(n) expected time and more complicated algorithms that run in O(n) worst-case time.
Given the k-th smallest element you can make one pass over the list to find all elements less than the k-th smallest and you are done. (I assume that the result array does not need to be sorted.)
Overall run-time is O(n).
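A minimal Python sketch of this approach, using a randomized QuickSelect (expected O(n), not the worst-case linear algorithm). To cope with duplicates of the k-th value, it pads the result with copies of the threshold instead of relying only on a strict less-than filter. The function names are hypothetical, and the code assumes 0 <= x <= len(items).

```python
import random
from typing import List


def kth_smallest(items: List[int], k: int) -> int:
    """Randomized QuickSelect: the k-th smallest value (1-based), expected O(n)."""
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        lows = [v for v in pool if v < pivot]
        pivot_count = sum(1 for v in pool if v == pivot)
        if k <= len(lows):
            pool = lows
        elif k <= len(lows) + pivot_count:
            return pivot
        else:
            k -= len(lows) + pivot_count
            pool = [v for v in pool if v > pivot]


def x_smallest_quickselect(items: List[int], x: int) -> List[int]:
    """The x smallest values, in no particular order; assumes 0 <= x <= len(items)."""
    if x <= 0:
        return []
    threshold = kth_smallest(items, x)
    result = [v for v in items if v < threshold]
    # Fill the remaining slots with copies of the threshold (handles duplicates).
    result += [threshold] * (x - len(result))
    return result


print(sorted(x_smallest_quickselect([1, 2, 5, 4, 3], 3)))  # [1, 2, 3]
```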
Maintain the x smallest values so far, in sorted order, in a skip list. Iterate through the array. For each element, find where it would be inserted in the skip list (O(log x) time). If that position is in the interior of the list, the element is one of the x smallest so far, so insert it and remove the element at the end of the list. Otherwise do nothing.
Time O(n*log(x))
Alternative implementation: maintain the x smallest values so far in a max-heap, compare each new element with the top element of the heap, and pop + insert the new element only if it is less than the top element. Since comparison with the top element is O(1) and pop/insert is O(log x), this is also O(n log(x)).
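The max-heap variant can be sketched with Python's standard heapq module. heapq only implements a min-heap, so the kept values are stored negated. The function name is hypothetical; for everyday use, the standard library's heapq.nsmallest(x, items) does the same job.

```python
import heapq
from typing import List


def x_smallest_heap(items: List[int], x: int) -> List[int]:
    """Keep the x smallest values seen so far in a max-heap; O(n log x)."""
    if x <= 0:
        return []
    heap: List[int] = []  # negated values, so heap[0] is minus the largest kept value
    for v in items:
        if len(heap) < x:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            heapq.heapreplace(heap, -v)  # drop the largest kept value, keep v
    return sorted(-v for v in heap)


print(x_smallest_heap([1, 2, 5, 4, 3], 3))  # [1, 2, 3]
```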
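Add all n numbers to a heap and delete x of them. Complexity is O((n + x) log n). Since x is obviously less than n, it's O(n log n). (Building the heap with a linear-time heapify instead of n separate inserts brings this down to O(n + x log n).)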
If the range of numbers (L) is known, you can do a modified counting sort.
given L, x, input[]
counts <- array[0..L]
for each number in input
increment counts[number]
next
#populate the output
index <- 0
xIndex <- 0
while xIndex < x and index <= L
if counts[index] > 0 then
decrement counts[index]
output[xIndex] = index
increment xIndex
else
increment index
end if
loop
This has a runtime of O(n + L) (with memory overhead of O(L)) which makes it pretty attractive if the range is small (L < n log n).
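A direct Python translation of the counting pseudocode above, under the same assumption that every number lies in 0..L; the function name is hypothetical.

```python
from typing import List


def x_smallest_counting(values: List[int], x: int, L: int) -> List[int]:
    """Counting-based x smallest for integers known to lie in 0..L; O(n + L)."""
    counts = [0] * (L + 1)
    for v in values:
        counts[v] += 1
    output: List[int] = []
    index = 0
    # Emit each value as many times as it was counted, smallest first.
    while len(output) < x and index <= L:
        if counts[index] > 0:
            counts[index] -= 1
            output.append(index)
        else:
            index += 1
    return output


print(x_smallest_counting([1, 2, 5, 4, 3], 3, 5))  # [1, 2, 3]
```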
def x_smallest(items, x):
    result = sorted(items[:x])
    for i in items[x:]:
        if i < result[-1]:
            result[-1] = i
            j = x - 1
            while j > 0 and result[j] < result[j-1]:
                result[j-1], result[j] = result[j], result[j-1]
                j -= 1
    return result
Worst case is O(x*n), but will typically be closer to O(n).
Pseudocode:
def x_smallest(array<int> arr, int limit)
  array<int> ret = new array[limit], every slot set to INT_MAX
  for i in arr
    current = i
    for j in range(0..limit)
      if (current < ret[j])
        swap(current, ret[j])  // keeps ret sorted; carry the displaced value on
      endif
    endfor
  endfor
  return ret
enddef
In pseudo code:
y = length of list / 2
if (x > y)
iterate and pop off the (length - x) largest
else
iterate and pop off the x smallest
O(n/2 * x) ?
sort array
slice array 0 x
Choose the best sort algorithm and you're done: http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
You can sort then take the first x values?
Java: with QuickSort O(n log n)
import java.util.Arrays;
import java.util.Random;
public class Main {
public static void main(String[] args) {
Random random = new Random(); // Random number generator
int[] list = new int[1000];
int length = 3;
// Initialize array with non-negative random values
for (int i = 0; i < list.length; i++) {
list[i] = random.nextInt(Integer.MAX_VALUE);
}
// Solution
int[] output = findSmallest(list, length);
// Display Results
for(int x : output)
System.out.println(x);
}
private static int[] findSmallest(int[] list, int length) {
// A tuned quicksort (note: sorts the caller's array in place)
Arrays.sort(list);
// Send back the first `length` elements
return Arrays.copyOf(list, length);
}
}
It's pretty fast.
private static int[] x_smallest(int[] input, int x)
{
int[] output = new int[x];
for (int i = 0; i < x; i++) { // O(x)
output[i] = input[i];
}
for (int i = x; i < input.Length; i++) { // + O(n-x)
int current = input[i];
int temp;
for (int j = 0; j < output.Length; j++) { // * O(x)
if (current < output[j]) {
temp = output[j];
output[j] = current;
current = temp;
}
}
}
return output;
}
Looking at the complexity:
O(x + (n-x) * x) -- assuming x is some constant, O(n)
What about using a splay tree? Because of the splay tree's unique approach to adaptive balancing, it makes for a slick implementation of the algorithm, with the added benefit of being able to enumerate the x items in order afterwards. Here is some pseudocode.
public SplayTree GetSmallest(int[] array, int x)
{
var tree = new SplayTree();
for (int i = 0; i < array.Length; i++)
{
// check Count first so GetLargest is never called on an empty tree
if (tree.Count < x || array[i] < tree.GetLargest())
{
if (tree.Count >= x)
{
tree.Remove(tree.GetLargest());
}
tree.Add(array[i]);
}
}
return tree;
}
The GetLargest and Remove operations have an amortized complexity of O(log(x)), since the tree never holds more than x items, and because the last accessed item is splayed to the root, repeated accesses to the largest item are normally O(1). So the space complexity is O(x) and the runtime complexity is O(n*log(x)). If the array happens to be already ordered, then this algorithm achieves its best-case complexity of O(n) with either an ascending or a descending array. A very odd or peculiar ordering can make individual operations cost up to O(x), but because the splay tree's bounds are amortized, the total stays O(n*log(x)).
In scala, and probably other functional languages, a no brainer:
scala> List (1, 3, 6, 4, 5, 1, 2, 9, 4) sortWith ( _<_ ) take 5
res18: List[Int] = List(1, 1, 2, 3, 4)

Algorithm to find the smallest non negative integer that is not in a list

Given a list of integers, how can I best find an integer that is not in the list?
The list can potentially be very large, and the integers might be large (i.e. BigIntegers, not just 32-bit ints).
If it makes any difference, the list is "probably" sorted, i.e. 99% of the time it will be sorted, but I cannot rely on it always being sorted.
Edit -
To clarify, given the list {0, 1, 3, 4, 7}, examples of acceptable solutions would be -2, 2, 8 and 10012, but I would prefer to find the smallest, non-negative solution (i.e. 2) if there is an algorithm that can find it without needing to sort the entire list.
One easy way would be to iterate the list to get the highest value n, then you know that n+1 is not in the list.
Edit:
A method to find the smallest non-negative unused number would be to start from zero and scan the list for that number, incrementing it and starting the scan over whenever you find it. To make this more efficient, and to take advantage of the list probably being sorted, you can move numbers that are smaller than the current candidate to an unused part of the list.
This method uses the beginning of the list as storage space for lower numbers, the startIndex variable keeps track of where the relevant numbers start:
public static int GetSmallest(int[] items) {
int startIndex = 0;
int result = 0;
int i = 0;
while (i < items.Length) {
if (items[i] == result) {
result++;
i = startIndex;
} else {
if (items[i] < result) {
if (i != startIndex) {
int temp = items[startIndex];
items[startIndex] = items[i];
items[i] = temp;
}
startIndex++;
}
i++;
}
}
return result;
}
I ran a performance test where I created lists of 100000 random numbers from 0 to 19999, which makes the average lowest unused number around 150. In test runs (with 1000 test lists each), the method found the smallest number in unsorted lists in 8.2 ms on average, and in sorted lists in 0.32 ms on average.
(I haven't checked in what state the method leaves the list, as it may swap some items in it. It leaves the list containing the same items, at least, and as it moves smaller values down the list I think that it should actually become more sorted for each search.)
If the number doesn't have any restrictions, then you can do a linear search to find the maximum value in the list and return the number that is one larger.
If the number does have restrictions (e.g. max+1 and min-1 could overflow), then you can use a sorting algorithm that works well on partially sorted data. Then go through the list and find the first pair of numbers v_i and v_{i+1} that are not consecutive. Return v_i + 1.
To get the smallest non-negative integer (based on the edit in the question), you can either:
Sort the list using a partial sort as above. Binary search the list for 0. Iterate through the list from this value until you find a "gap" between two numbers. If you get to the end of the list, return the last value + 1.
Insert the values into a hash table. Then iterate from 0 upwards until you find an integer not in the list.
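A minimal Python sketch of the hash-table option. Python integers are arbitrary precision, which matches the BigInteger requirement, and the loop stops after at most n + 1 lookups because n values cannot cover all of 0..n. The function name is hypothetical.

```python
from typing import Iterable


def smallest_missing_non_negative(values: Iterable[int]) -> int:
    """Smallest non-negative integer not in values; expected O(n) time, O(n) space."""
    seen = set(values)
    candidate = 0
    while candidate in seen:
        candidate += 1
    return candidate


print(smallest_missing_non_negative([0, 1, 3, 4, 7]))  # 2
```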
Unless it is sorted you will have to do a linear search going item by item until you find a match or you reach the end of the list. If you can guarantee it is sorted you could always use the array method of BinarySearch or just roll your own binary search.
Or like Jason mentioned there is always the option of using a Hashtable.
"probably sorted" means you have to treat it as being completely unsorted. If of course you could guarantee it was sorted this is simple. Just look at the first or last element and add or subtract 1.
I got 100% for both correctness and performance.
Sort with quicksort, which has O(N log N) complexity on average.
Here you go...
public int solution(int[] A) {
if (A != null && A.length > 0) {
quickSort(A, 0, A.length - 1);
}
int result = 1;
if (A.length == 1 && A[0] < 0) {
return result;
}
for (int i = 0; i < A.length; i++) {
if (A[i] <= 0) {
continue;
}
if (A[i] == result) {
result++;
} else if (A[i] < result) {
continue;
} else if (A[i] > result) {
return result;
}
}
return result;
}
private void quickSort(int[] numbers, int low, int high) {
int i = low, j = high;
int pivot = numbers[low + (high - low) / 2];
while (i <= j) {
while (numbers[i] < pivot) {
i++;
}
while (numbers[j] > pivot) {
j--;
}
if (i <= j) {
exchange(numbers, i, j);
i++;
j--;
}
}
// Recursion
if (low < j)
quickSort(numbers, low, j);
if (i < high)
quickSort(numbers, i, high);
}
private void exchange(int[] numbers, int i, int j) {
int temp = numbers[i];
numbers[i] = numbers[j];
numbers[j] = temp;
}
Theoretically, find the max and add 1. Assuming you're constrained by the max value of the BigInteger type, sort the list if unsorted, and look for gaps.
Are you looking for an on-line algorithm (since you say the input is arbitrarily large)? If so, take a look at Odds algorithm.
Otherwise, as already suggested, hash the input, search and turn on/off elements of boolean set (the hash indexes into the set).
There are several approaches:
find the biggest int in the list and store it in x. x+1 will not be in the list. The same applies with using min() and x-1.
When N is the size of the list, allocate an int array of size (N+31)/32. For each element in the list, set bit v&31 (where v is the value of the element) of the integer at array index v/32. Ignore negative values and values where v/32 >= array.length. Now search for the first array item which is != 0xFFFFFFFF (for 32-bit integers); its lowest zero bit gives the answer.
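A minimal Python sketch of the bitmap approach, using a list of 32-bit words. With N values, some integer in 0..N must be missing, so (N + 31) // 32 words are enough; negative values and values beyond the bitmap are ignored. If every covered bit is set, the answer is the first value past the bitmap. The function name is hypothetical.

```python
from typing import List

FULL = 0xFFFFFFFF  # a 32-bit word with every bit set


def smallest_missing_bitmap(values: List[int]) -> int:
    """Smallest non-negative integer not in values, via one bit per candidate."""
    words = [0] * ((len(values) + 31) // 32)
    for v in values:
        if v >= 0 and v // 32 < len(words):
            words[v // 32] |= 1 << (v & 31)
    for index, word in enumerate(words):
        if word != FULL:
            bit = 0
            while word & (1 << bit):  # find the lowest zero bit
                bit += 1
            return 32 * index + bit
    return 32 * len(words)  # every covered value is present


print(smallest_missing_bitmap([0, 1, 3, 4, 7]))  # 2
```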
If you can't guarantee it is sorted, then you have a best possible time efficiency of O(N) as you have to look at every element to make sure your final choice is not there. So the question is then:
Can it be done in O(N)?
What is the best space efficiency?
Chris Doggett's solution of finding the max and adding 1 is both O(N) and space efficient (O(1) memory usage).
If you only want an answer that is probably correct, then it is a different question.
Unless you are 100% sure it is sorted, the quickest algorithm still has to look at each number in the list at least once to at least verify that a number is not in the list.
Assuming this is the problem I'm thinking of:
You have a set of all ints in the range 1 to n, but one of those ints is missing. Tell me which int is missing.
This is a pretty easy problem to solve with some simple math knowledge. It's known that the sum of the range 1 .. n is equal to n(n+1) / 2. So, let W = n(n+1) / 2 and let Y = the sum of the numbers in your set. The integer that is missing from your set, X, would then be X = W - Y.
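A minimal sketch of the sum formula, assuming exactly one value from 1..n is missing and there are no duplicates; MissingFromRange and missingNumber are hypothetical names. It uses long arithmetic so that n(n+1)/2 does not overflow an int:

```java
class MissingFromRange {
    // X = W - Y, where W = n(n+1)/2 and Y is the sum of the values present.
    static long missingNumber(int[] values, int n) {
        long w = (long) n * ((long) n + 1) / 2; // sum of 1..n
        long y = 0;
        for (int v : values) {
            y += v;
        }
        return w - y;
    }
}
```

For example, with n = 5 and the values {1, 2, 4, 5}, W = 15 and Y = 12, so X = 3.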
Note: SO needs to support MathML
If this isn't that problem, or if it's more general, then one of the other solutions is probably right. I just can't really tell from the question since it's kind of vague.
Edit: Well, since the edit, I can see that my answer is absolutely wrong. Fun math, nonetheless.
I've solved this using LINQ and a binary search. I got 100% across the board. Here's my code:
using System.Collections.Generic;
using System.Linq;

class Solution {
    public int solution(int[] A) {
        if (A == null || A.Length == 0) {
            return 1;
        }
        // Keep only distinct positive values, in ascending order.
        List<int> list_test = A.Distinct().Where(i => i > 0).ToList();
        list_test.Sort();
        if (list_test.Count == 0) {
            return 1;
        }
        // If the smallest positive value is above 1, then 1 is missing.
        int firstValue = list_test[0];
        if (firstValue > 1) {
            return 1;
        }
        return BinarySearchList(list_test);
    }

    // list is sorted, distinct, positive and starts with 1. Without a gap,
    // list[k] == k + 1; find where that first stops holding.
    int BinarySearchList(List<int> list) {
        int returnable = 0;
        int tempIndex;
        int[] boundaries = new int[2] { 0, list.Count - 1 };
        int testCounter = 0; // safety cap on iterations
        while (returnable == 0 && testCounter < 2000) {
            tempIndex = (boundaries[0] + boundaries[1]) / 2;
            if (tempIndex != boundaries[0]) {
                if (list[tempIndex] > tempIndex + 1) {
                    boundaries[1] = tempIndex; // gap is at or before tempIndex
                } else {
                    boundaries[0] = tempIndex; // no gap up to tempIndex
                }
            } else {
                // Boundaries are adjacent (or equal): decide between them.
                if (list[tempIndex] > tempIndex + 1) {
                    returnable = tempIndex + 1;
                } else {
                    returnable = tempIndex + 2;
                }
            }
            testCounter++;
        }
        // No gap at all: the answer is one past the last element.
        if (returnable == list[list.Count - 1]) {
            returnable++;
        }
        return returnable;
    }
}
The longest execution time was 0.08s on the Large_2 test
You need the list to be sorted. That means either knowing it is sorted, or sorting it.
Sort the list. Skip this step if the list is known to be sorted. O(n lg n)
Remove any duplicate elements. Skip this step if elements are already guaranteed distinct. O(n)
Let B be the position of 1 in the list using a binary search. O(lg n)
If 1 isn't in the list, return 1. Note that if there is no gap, the element at position B+j must be j+1. O(1)
Now perform a sort of binary search starting with min = B, max = end of the list. Call the position of the pivot P. If the element at P is greater than (P-B+1), the first gap is at or before P, so recurse on the range [min, pivot]; otherwise recurse on the range (pivot, max]. Continue until min=pivot=max. O(lg n)
Your answer is (the element at pivot-1)+1, unless you are at the end of the list and the element at P equals (P-B+1), in which case it is the last element + 1. O(1)
This is very efficient if the list is already sorted and has distinct elements. You can do optimistic checks to make it faster when the list has only non-negative elements or when the list doesn't include the value 1.
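A minimal sketch of these steps in Java, assuming int input; SortedBinarySearch is a hypothetical name. IntStream's distinct() and sorted() cover steps 1 and 2, Arrays.binarySearch finds B (it returns a negative number when 1 is absent), and the loop searches for the first position P with an element greater than P-B+1:

```java
import java.util.Arrays;

class SortedBinarySearch {
    static int smallestMissingPositive(int[] input) {
        int[] a = Arrays.stream(input).distinct().sorted().toArray(); // sort + dedupe
        int b = Arrays.binarySearch(a, 1); // position of 1
        if (b < 0) {
            return 1; // 1 is not in the list
        }
        // Find the first P in [b, a.length) with a[P] > P - b + 1.
        int lo = b, hi = a.length;
        while (lo < hi) {
            int p = (lo + hi) >>> 1;
            if (a[p] > p - b + 1) {
                hi = p; // gap is at or before p
            } else {
                lo = p + 1; // 1 .. (p - b + 1) are all present
            }
        }
        // Values 1 .. (lo - b) are present; the next one is missing.
        return lo - b + 1;
    }
}
```

If there is no gap, lo ends at the end of the list and the result is the last element + 1, matching the special case above.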
I just had an interview where they asked me this question. The answer to this problem can be found using worst-case analysis. The upper bound for the smallest natural number not present in the list is length(list). This is because the worst case for the smallest missing number, given the length of the list, is the list 0,1,2,3,4,5,...,length(list)-1.
Therefore, for every list, the smallest number not present in the list is less than or equal to the length of the list. So initialize a list t of n=length(list)+1 zeros. For every number i in the list that is non-negative and less than or equal to the length of the list, assign the value 1 to t[i]. The index of the first zero in t is the smallest number not present in the list. And since, the lower bound on this list n-1, for at least one index j
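A minimal sketch of this marking approach, assuming natural numbers start at 0 (as in the worst-case list above); SmallestNatural and smallestMissingNatural are hypothetical names. A boolean[] plays the role of t, and because n values cannot fill n + 1 slots, the scan always finds a zero:

```java
class SmallestNatural {
    static int smallestMissingNatural(int[] list) {
        int n = list.length;
        boolean[] t = new boolean[n + 1]; // all false, i.e. "zeros"
        for (int i : list) {
            if (i >= 0 && i <= n) {
                t[i] = true; // mark values that could be the answer
            }
        }
        for (int j = 0; j <= n; j++) {
            if (!t[j]) {
                return j; // first unmarked index
            }
        }
        // Unreachable: n values cannot mark all n + 1 slots.
        throw new IllegalStateException("no free slot");
    }
}
```

This runs in O(N) time with O(N) extra space and needs no sorting.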