Find the smallest missing number in an array [duplicate] - algorithm

I got asked this question in an interview
Given an unsorted array, find the smallest number that is missing. Assume that all numbers are positive.
input = {4,2,1,3,678,3432}
output = 5
Sorting the array was my first approach. My second approach was to use a boolean flag array, but that takes up a lot of space.
Is there a better approach than these?

Suppose the length of the given array is N.
You can go with the boolean-flag approach, but you don't need to take numbers that are larger than N into account, since they're obviously too big to affect the answer. You could also consider a bitmap to save some space.
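A minimal C++ sketch of that idea (the function name is mine): since with N inputs some value in 1..N+1 must be missing, values above that range can simply be skipped.

#include <vector>

// Flag approach, ignoring values larger than N + 1: they can never
// affect which of 1..N+1 is the smallest missing value.
int smallestMissing(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    std::vector<bool> seen(n + 2, false);      // flags for 0..n+1
    for (int v : a)
        if (v >= 1 && v <= n + 1) seen[v] = true;
    for (int i = 1; i <= n + 1; ++i)
        if (!seen[i]) return i;                // first gap is the answer
    return n + 1;                              // unreachable; keeps compilers happy
}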

An alternative solution to unkulunkulu's is the following:
1. k := 1
2. Make an array of 2^k booleans, all initially set to false.
3. For every value V in the input array: if V < 2^k, set the V'th boolean to true.
4. Scan through the booleans; if there are any falses, the lowest one indicates the lowest missing value.
5. If there were no falses, increment k and go to step 2.
This will run in O(n log s) time and O(s) space, where n is the size of the input array and s is the smallest value not present. It can be made more efficient by not rechecking the lowest values on successive iterations, but that doesn't change the constraints.
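Here is a rough C++ rendering of those steps, assuming positive inputs as in the question (the scan starts at index 1, since 0 can never be the answer):

#include <cstddef>
#include <vector>

// Doubling approach sketched above: grow the flag array until a gap
// appears among indices 1..2^k - 1.
int smallestMissingDoubling(const std::vector<int>& a) {
    for (std::size_t k = 1; ; ++k) {
        const std::size_t limit = std::size_t(1) << k;   // 2^k
        std::vector<bool> seen(limit, false);
        for (int v : a)
            if (v >= 0 && static_cast<std::size_t>(v) < limit)
                seen[v] = true;
        for (std::size_t i = 1; i < limit; ++i)
            if (!seen[i]) return static_cast<int>(i);    // lowest missing value
        // 1..2^k - 1 all present: increment k and rescan
    }
}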

Related

Given O(n) sets, what is complexity of figuring out distinct ones amongst them?

I have an application where I have a list of O(n) sets.
Each set Set(i) is an n-vector. Suppose n=4, for instance,
Set(1) could be [0|1|1|0]
Set(2) could be [1|1|1|0]
Set(3) could be [1|1|0|0]
Set(4) could be [1|1|1|0]
I'd like to process these sets so that as output, I only get the unique ones amongst them. So, in the example above, I would get as output:
Set(1), Set(2), Set(3). Note that Set(4) is discarded since it is same as Set(2).
A rather brute force way of figuring this gives me a worst-case bound of O(n^3):
Given: input list of size O(n)
Output list L = [ Set(1) ]
for (j = 2 to length of input list) {          // outer loop: should Set(j) be added to L?
    for (i = 1 to current length of L) {       // inner loop
        check if Set(i) is the same as Set(j)  // O(n), since Set() has O(n) elements
        if (they are the same)
            exit inner loop
        else if (i is the current length of L) // Set(j) is unique so far
            append Set(j) to L
    }
}
There is no a priori bound on n: it can be arbitrarily large. This seems to preclude the use of a simple hash function that maps each binary set to a decimal number, though I could be wrong.
Is there any other way this can be done in better worst-case running time other than O(n^3)?
O(n) sequences of length n makes an input of size O(n^2). You won't get complexity better than that, since you may at least be required to read all the input. All sequences might be the same, for example, but you'd have to read them all to know that.
A binary sequence of length n can be inserted into a trie or radix tree, while checking whether or not it already exists, in O(n) time. That's O(n^2) for all the sequences together, so simply using a trie or radix tree to find duplicates is optimal.
See: https://en.wikipedia.org/wiki/Trie
and: https://en.wikipedia.org/wiki/Radix_tree
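A minimal binary-trie sketch in C++ (type and function names are mine, assuming the sets arrive as strings of '0'/'1'):

#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::unique_ptr<TrieNode> child[2];
    bool terminal = false;       // marks the end of an inserted bitstring
};

// Insert one bitstring in O(n); returns true only on first occurrence.
bool insertUnique(TrieNode& root, const std::string& bits) {
    TrieNode* node = &root;
    for (char c : bits) {
        int b = c - '0';
        if (!node->child[b]) node->child[b] = std::make_unique<TrieNode>();
        node = node->child[b].get();
    }
    bool fresh = !node->terminal;
    node->terminal = true;
    return fresh;
}

// Keep only the first occurrence of each set: O(n^2) over all n strings.
std::vector<std::string> uniqueSets(const std::vector<std::string>& sets) {
    TrieNode root;
    std::vector<std::string> out;
    for (const auto& s : sets)
        if (insertUnique(root, s)) out.push_back(s);
    return out;
}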
You may consider implementing your set using a balanced binary tree. The cost of inserting a new node into such a tree is O(lg m), where m is the number of elements in the tree. Duplicates are implicitly weeded out, because if we detect that a node already exists, we simply do not add it again.
In your example, the total number of lookup/insertion operations would be n*n, since there are n sets and each set has n values. So the overall time scales as O(n^2 lg(n^2)) = O(n^2 lg n), which outperforms O(n^3).
First of all, these are not sets but bitstrings.
Next, convert every bitstring to a number and put that number in a hashset (or simply store the original bitstrings; most hashset implementations can do that). Afterwards, your hashset contains all the unique items: O(N) time, O(N) space. If you need to maintain the original order of the strings, then in the first loop check for each string whether it is already in the hashset, and if not, output it and insert it into the hashset.
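A compact sketch of that in C++, storing the bitstrings directly and preserving first-occurrence order (names are illustrative):

#include <string>
#include <unordered_set>
#include <vector>

std::vector<std::string> uniqueInOrder(const std::vector<std::string>& sets) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& s : sets)
        if (seen.insert(s).second)   // .second is true on first insertion
            out.push_back(s);
    return out;
}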
If you can use O(n) extra space, you can try this:
First, assume the vectors are binary numbers, so 0110 becomes 6. This works when the entries are in {0, 1}; otherwise you can multiply by 10 instead of 2. Converting all the vectors to decimals takes O(4n) for length-4 vectors.
Then map each vector into an n-sized hash map, keyed by its decimal value:

HM <- n-sized hash map
for each vector v:
    num <- decimal number converted from v
    map v into HM under key num
loop over HM and take only one vector for each key

Runtime by steps:
1. O(n)
2. O(n*(4+1)), where 1 is the time for mapping and 4 is the vector length
3. O(n)

generate all subsets of set given by binary representation of integer in c++ [duplicate]

In C++ I am searching for an efficient algorithm to generate all integers whose binary representation is a subset of the set given by the binary representation of an integer N. By efficient I mean that I don't want to loop over all integers smaller than N to check whether they are subsets, mainly because N could be very large.
An idea I had was to generate all possible subsets of an integer corresponding to the Hamming weight of N and then shift them to the correct positions using <<, but so far I am failing to find a good way to do this.
Example:
For set 110100 given by integer N=52 all possible subsets would be:
{000100,010000,010100,100000,100100,110000}
corresponding to integers {4,16,20,32,36,48}, which is what I want to generate.
Let P be popcount(N), the number of bits that are set. The number of results is then 2^P - 2.
Treat N as a boolean array (bits). For each bit which is set, generate two subsets: one with that bit set and one without it. This can be done recursively until no bits are set.
Finally, discard the original N from the results, and also 0 (as per your example).
The time complexity is linear in the size of the output, i.e. O(2^P).
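A sketch of that recursion in C++ (names are mine; 0 and N are filtered out at the end, as in the example):

#include <cstdint>
#include <iostream>
#include <vector>

// Branch on each set bit of n: either include it in the subset or not.
void subsets(std::uint64_t n, std::uint64_t current, std::vector<std::uint64_t>& out) {
    if (n == 0) {                                // no set bits left to decide on
        out.push_back(current);
        return;
    }
    std::uint64_t lowest = n & (~n + 1);         // isolate the lowest set bit
    subsets(n ^ lowest, current, out);           // exclude this bit
    subsets(n ^ lowest, current | lowest, out);  // include this bit
}

int main() {
    std::vector<std::uint64_t> out;
    subsets(52, 0, out);                 // N = 0b110100
    for (std::uint64_t v : out)
        if (v != 0 && v != 52)           // discard 0 and N itself
            std::cout << v << ' ';       // 4 16 20 32 36 48, in some order
}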

Binary search in 2 sorted integer arrays

There is a big array which consists of 2 smaller integer arrays written one after the other. Both small arrays are sorted in ascending order. We have to find an element in the big array as fast as possible. My idea was to find the end of the left array by binary search in the big array and then run 2 binary searches on the small arrays. The problem is that I don't know how to find that end. If you have an idea how to find the element without finding the borders of the smaller arrays, you're welcome!
Information about the arrays: both small arrays have integer elements, both are sorted in ascending order, each can have any length from 0 to any positive integer, and there can be only one copy of each element.
Here are some examples of big arrays:
1 2 3 4 5 6 7 (all the elements of the second array are bigger than the maximum of the first array)
100 1 (both arrays have only one element)
1 3 5 2 4 6 or 2 4 6 1 3 5 (most common situations)
This problem is impossible to solve in guaranteed time complexity faster than O(n), and not possible to solve at all for certain arrays. Binary search runs in O(log n) on a sorted array, but the big array is not guaranteed to be sorted, and in the worst case one or more comparisons per element are required, which is O(n). The best guaranteed time complexity is O(n), using the trivial algorithm: compare every item with its neighbour until you find the "turning point" with A[i] > A[i+1]. However, if you use a breadth-first search, you may get lucky and find the turning point early.
Proof that the problem is unsolvable for some arrays: let the array M = [A B] be our big array. To find the point where the arrays meet, we look for an index i where M[i] > M[i+1]. Now let A = [1 2 3] and B = [4 5]. There is no index in M for which the condition holds, thus the problem is unsolvable for some arrays.
Informal proof of the former: let M = [A B] with A = [1..x] and B = [(x+1)..y] two sorted arrays, then swap the positions of elements x and y in M. We have no way of finding the index of x without (in the worst case) checking every index, thus the problem is O(n).
Binary search relies on being able to eliminate half the solution space with each comparison, but in this case we cannot eliminate anything from the array, so we cannot do better than a linear search.
(From a practical standpoint, you should never do this in a program. The two arrays should be separate. If this isn't possible, append the length of either array to the bigger array.)
Edit: changed my answer after question was updated. It's possible to do it faster than linear time for some arrays, but not all possible arrays. Here's my idea for an algorithm using breadth-first search:
1. Start with the interval [0..n-1], where n is the length of the big array.
2. Make a list of intervals and put the starting interval in it.
3. For each interval in the list:
   - if the interval has only two elements and the first is greater than the last, we found the turning point: return it
   - else if the interval has two elements or fewer, remove it from the list
   - else if the first element of the interval is greater than the last, the turning point is in this interval: clear the list, split the interval into two equal parts, and add them to the list
   - else split the interval into two equal parts and replace it in the list with the two parts
I think a breadth-first approach will increase the odds of finding an interval where A[first] > A[last] early. Note that this approach will not work if the turning point is between two intervals, but it's something to get you started. I would test this myself, but unfortunately I don't have the time now.
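A C++ sketch of the interval search above (names are mine); as a small tweak, the two halves share their midpoint here, which sidesteps the "turning point between two intervals" caveat:

#include <deque>
#include <utility>
#include <vector>

// Breadth-first search for the turning point i where a[i] > a[i+1].
// Returns -1 if the big array happens to be fully sorted.
int findTurningPoint(const std::vector<int>& a) {
    if (a.size() < 2) return -1;
    std::deque<std::pair<int, int>> intervals;   // inclusive [lo, hi]
    intervals.push_back({0, static_cast<int>(a.size()) - 1});
    while (!intervals.empty()) {
        auto [lo, hi] = intervals.front();
        intervals.pop_front();
        if (hi - lo == 1) {
            if (a[lo] > a[hi]) return lo;        // found the turning point
            continue;                            // two sorted elements: drop
        }
        int mid = lo + (hi - lo) / 2;
        if (a[lo] > a[hi]) intervals.clear();    // turning point is in here
        intervals.push_back({lo, mid});          // halves share the midpoint
        intervals.push_back({mid, hi});
    }
    return -1;
}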

Determine if two numbers in a set equals x in nlgn [duplicate]

If my own answer is incorrect, I do not want a solution, because I really want to solve this on my own. What I do want is either a "yes, this is correct" or a "no, this is not correct", followed by tips or suggestions that may lead to an answer without spoiling it all.
The question:
Describe a Θ(n lg n)-time algorithm that, given a set S of n integers and another integer x, determines whether or not there exist two elements in S whose sum is exactly x. (Intro to Algorithms, 2.3-7)
The attempt:
First, the problem doesn't state whether or not the set is sorted. I assumed it was not, and sorted it using merge sort, as that is Θ(n lg n) in the worst case.
Then I reasoned that the only way to stay within Θ(n lg n) is to split the search in half each time. My approach was to start at index i = 0 of the sorted array, compute the value k needed so that S[i] + k = x, and then use binary search, halving the range until I either found k or ran out. If I did not find such a k, I would repeat the same process for each remaining index i = 1 to n-1. This results in Θ(n lg n).
The total time complexity from both sorting the array and searching for k adds up to Θ(n lg n) + Θ(n lg n) = Θ(2n lg n) = Θ(n lg n) once constants are dropped.
Yes, your solution is correct. Don't forget to handle the case where the found element k is at the same index i, so that you don't pair an element with itself.
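A sketch of the whole approach in C++ (function name is mine), including that same-index case:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sort, then binary-search for x - s[i] for each i: Theta(n lg n) overall.
bool hasPairWithSum(std::vector<int> s, int x) {
    std::sort(s.begin(), s.end());
    for (std::size_t i = 0; i < s.size(); ++i) {
        int k = x - s[i];
        auto it = std::lower_bound(s.begin(), s.end(), k);
        if (it == s.end() || *it != k) continue;             // k not present
        std::size_t j = static_cast<std::size_t>(it - s.begin());
        if (j != i) return true;                             // distinct elements
        if (j + 1 < s.size() && s[j + 1] == k) return true;  // k occurs twice
    }
    return false;
}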

Algorithms for dividing an array into n parts

In a recent campus Facebook interview I was asked to divide an array into 3 parts such that the sum in each part is roughly equal to sum/3.
My approach:
1. Sort the array.
2. Fill array[k] (k = 0) while its sum is <= sum/3.
3. Increment k and repeat the above step for array[k].
Is there a better algorithm for this, or is it an NP-hard problem?
This is a variant of the partition problem (see http://en.wikipedia.org/wiki/Partition_problem for details). In fact, a solution to this can solve that one (take an array, pad it with 0s, and then solve this problem), so this problem is NP-hard.
There is a dynamic programming approach that is pseudo-polynomial. For each i from 0 to the size of the array, keep track of all possible combinations of current sizes for the subarrays and their current sums. As long as there is a limited number of possible sums of subsets of the array, this runs acceptably fast.
The solution that I would have suggested is to just go for "good enough" closeness. First let's consider the simpler problem with all values positive. Then sort by value descending. Take that array in threes. Build up the three subsets by always adding the largest of the triple to the one with the smallest sum, the smallest to the one with the largest, and the middle to the middle. You will end up dividing the array evenly, and the difference will be no more than the value of the third smallest element.
For the general case you can divide into positive and negative, use the above approach on each, and then brute force all combinations of a group of positives, a group of negatives, and the few leftover values in the middle that did not divide evenly.
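A quick C++ sketch of the positive-values greedy (names are mine):

#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

// Sort descending, then hand out each triple: largest value to the
// currently lightest subset, smallest to the heaviest.
std::array<std::vector<int>, 3> greedySplit(std::vector<int> a) {
    std::sort(a.begin(), a.end(), std::greater<int>());
    std::array<std::vector<int>, 3> parts;
    std::array<long long, 3> sums = {0, 0, 0};
    for (std::size_t i = 0; i < a.size(); i += 3) {
        std::array<int, 3> order = {0, 1, 2};   // subset indices, lightest first
        std::sort(order.begin(), order.end(),
                  [&](int x, int y) { return sums[x] < sums[y]; });
        for (std::size_t j = 0; j < 3 && i + j < a.size(); ++j) {
            parts[order[j]].push_back(a[i + j]);
            sums[order[j]] += a[i + j];
        }
    }
    return parts;
}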
Here are details on a dynamic programming solution if you are interested. The running time and memory usage are O(n*(sum)^2), where n is the size of your array and sum is the sum of the absolute values of your array values.
For each array index j from 1 to n, store all the possible values you can get for your 3 subset sums when you split the array from index 1 to j into 3 subsets. Also, for each possibility, store one possible way to split the array to get those 3 sums. To extend this information from 1..j to 1..(j+1), simply take each possible combination of 3 sums for splitting 1..j and form the 3 combinations of 3 sums you get when you choose to add the (j+1)th array element to any one of the 3 subsets.
Finally, when you reach j = n, go through the set of all combinations of 3 subset sums you can get when you split array positions 1 to n into 3 sets, and choose the one whose maximum deviation from sum/3 is minimized. At first this may seem like O(n*(sum)^3) complexity, but for each j and each combination of the first 2 subset sums, the 3rd subset sum is uniquely determined (because you are not allowed to omit any elements of the array). Thus the complexity really is O(n*(sum)^2).
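And a sketch of that DP in C++ for non-negative values, tracking only the reachable (sum1, sum2) pairs (the third sum is total - sum1 - sum2, and deviations are scaled by 3 to stay in integers); recovering an actual split would additionally store one predecessor per state:

#include <algorithm>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// Returns min over all 3-way splits of max_i |3*sum_i - total|,
// i.e. the smallest maximum deviation from total/3, times 3.
int minMaxDeviation(const std::vector<int>& a) {
    int total = 0;
    for (int v : a) total += v;
    std::set<std::pair<int, int>> states = {{0, 0}};   // reachable (sum1, sum2)
    for (int v : a) {
        std::set<std::pair<int, int>> next;
        for (auto [s1, s2] : states) {
            next.insert({s1 + v, s2});   // element goes to subset 1
            next.insert({s1, s2 + v});   // ... to subset 2
            next.insert({s1, s2});       // ... to subset 3
        }
        states.swap(next);
    }
    int best = 3 * total;
    for (auto [s1, s2] : states) {
        int s3 = total - s1 - s2;
        int dev = std::max({std::abs(3 * s1 - total),
                            std::abs(3 * s2 - total),
                            std::abs(3 * s3 - total)});
        best = std::min(best, dev);
    }
    return best;
}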
