Linear algorithm on binary strings

I'm going through some old midterms to study. (None of the solutions are given)
I've come across this problem which I'm stuck on
Let n = 2^ℓ − 1 for some positive integer ℓ. Suppose someone claims to hold an array A[1..n] of
distinct ℓ-bit strings; thus, exactly one ℓ-bit string does not appear in A. Suppose further that the
only way we can access A is by calling the function FetchBit(i, j), which returns the jth bit of the string A[i] in O(1) time.
Describe an algorithm to find the missing string in A using only O(n) calls to FetchBit.
The only thing I can think of is to go through each string, convert it to base 10, sort them all, and then see which value is missing. But that's certainly not O(n).
Proof it's not homework... http://web.engr.illinois.edu/~jeffe/teaching/algorithms/hwex/f12/midterm1.pdf

You can do it in fewer than 2n + 2 calls.
First, look at the first bit of every number. You will get either 2^(ℓ-1) zeros and 2^(ℓ-1) - 1 ones, or vice versa (because only one number is missing). If there are 2^(ℓ-1) - 1 ones, then you know that the first bit of the missing number is one; otherwise it is zero.
Now you know the first bit of the missing number. Look at all the numbers which have that same first bit (there are 2^(ℓ-1) - 1 of them) and repeat the same procedure with their second bit. This way you will determine the second bit of the missing number, and so on.
The total number of FetchBit calls will be (2^ℓ - 1) + (2^(ℓ-1) - 1) + ... + (2^1 - 1) <= 2^(ℓ+1) = 2n + 2 = O(n).
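The halving procedure in this answer can be sketched in Python as follows. This is only an illustration under assumptions: `fetch_bit(i, j)` stands in for the problem's FetchBit with 1-based indices, and `make_fetch_bit` is a hypothetical helper that wraps a plain list of bit strings and counts the calls.

```python
def find_missing_string(fetch_bit, n, ell):
    """Return the one ell-bit string absent from A[1..n], where n = 2**ell - 1."""
    candidates = list(range(1, n + 1))  # indices whose strings share the known prefix
    missing = []
    for j in range(1, ell + 1):
        zeros, ones = [], []
        for i in candidates:
            (ones if fetch_bit(i, j) else zeros).append(i)
        # The candidate count is odd, so one group is smaller by exactly one;
        # the missing string belongs to that smaller group.
        if len(ones) < len(zeros):
            missing.append("1")
            candidates = ones
        else:
            missing.append("0")
            candidates = zeros
    return "".join(missing)


def make_fetch_bit(strings):
    """Hypothetical stand-in for FetchBit over a list of bit strings (1-based)."""
    calls = {"count": 0}

    def fetch_bit(i, j):
        calls["count"] += 1
        return int(strings[i - 1][j - 1])

    return fetch_bit, calls
```

Each round only touches the indices that survived the previous round, which is where the geometric sum of calls comes from.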

Related

Find number occurring even number of times? [duplicate]

Given is an array of integers. Each number in the array repeats an ODD number of times, but only 1 number is repeated for an EVEN number of times. Find that number.
I was thinking a hash map, with each element's count. It requires O(n) space. Is there a better way?

Hash-map is fine, but all you need to store is each element's count modulo 2.
All of those will end up being 1 (odd) except for the 0 (even) -count element.
(As Aleks G says you don't need to use arithmetic (count++ %2), only xor (count ^= 0x1); although any compiler will optimize that anyway.)

I don't know what the intended meaning of "repeat" is, but if all but one of the numbers occur an even number of times, and only one number occurs an odd number of times, then XOR should do the trick.

You don't need to keep the number of times each element is found, just whether it's an even or odd number of times, so you should be OK with 1 bit for each element. Start with 0 for each element, then flip the corresponding bit when you encounter the element. Next time you encounter it, flip the bit again. At the end, just check which bit is 0: that element occurred an even number of times.
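A minimal sketch of this parity idea in Python, assuming the input is a list of hashable values with exactly one value occurring an even number of times. The function name `find_even_count` is made up for illustration; a set plays the role of the per-element bit (membership means "odd so far").

```python
def find_even_count(values):
    """Return the value that occurs an even number of times, or None."""
    odd = set()  # values seen an odd number of times so far
    for v in values:
        if v in odd:
            odd.remove(v)  # flip parity back to even
        else:
            odd.add(v)     # flip parity to odd
    # Every value with an odd total count is now in `odd`;
    # the even-count value is the one that is absent.
    for v in values:
        if v not in odd:
            return v
    return None
```

This still uses O(n) extra space in the worst case, but only one bit of information per distinct value rather than a full count.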

If all numbers are repeated an even number of times and one number is repeated an odd number of times, then XORing all of the numbers yields the number with the odd count.
For the problem as you currently state it, I think a hash map is a good idea, but I'll think about it to find a better way. (I say this for positive integers.)

Apparently there is a solution in O(n) time and O(1) space, since it was asked at a software engineering company with this constraint explicitly. See here: Bing interview question -- it seems to be doable using XOR over numbers in the array. Good luck! :)

constructing binary sequences with unique n-bit

A question that was asked during a job interview (which I pretty much failed) and
sadly, something I still cannot figure out.
Let's assume that you're given some positive integer, n.
Assume that you construct a sequence consisting of only 1 and 0, and
you want to construct a sequence of length 2^n + n-1 such that
every sequence of length n consisting of adjacent numbers is unique.
for instance
00110 (00, 01, 11, 10) for n=2
How would one construct such a sequence?
I think one should start with 0000..0 (n zeroes) and
do something about it.
If there is a constructive way of doing it, maybe
I could extend that method to constructing
a sequence consisting of only 0, 1, ..., k-1, and having
length k^n + n-1 such that
every sequence of length n consisting of adjacent numbers is unique
(or maybe not..)
(Sorry, my sequence for n=3 was wrong, so I deleted it.
Also, I had never heard of de Bruijn sequences. I know about them now!
Thanks for all the answers and comments.)

This strikes me as a very ambitious interview question; if you don't know the answer, you're unlikely to get it in a few minutes.
As mentioned in comments, this is really just the derivation of a de Bruijn sequence, only unwrapped. You can read the Wikipedia article linked above for more information, but the algorithms it proposes, while efficient, are not exactly easy to derive. There is a much simpler (but rather more storage-intensive) algorithm which I think is folkloric; at least, I don't know of a name attached to it. It's at least simple to describe:
Start with n 0s.
As long as possible:
    If you can add a 1 without repeating a previously-seen n-sequence, do so.
    If not, but you can add a 0 without repeating a previously-seen n-sequence, do so.
    Otherwise, done.
This requires you to either search the entire string on each iteration, requiring exponential time, or maintain a boolean array of all seen sequences (coded as binary numbers, presumably), requiring exponential space. The "concatenate all Lyndon words in lexicographical order" solution is much more efficient, but leaves open the question of generating all Lyndon words in lexicographical order.
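The greedy procedure described above might look like this in Python. It is a sketch of the storage-hungry variant that remembers every n-window seen so far in a set; the function name `greedy_de_bruijn` is an assumption for illustration, not a name from any library.

```python
def greedy_de_bruijn(n):
    """Build a binary string of length 2**n + n - 1 whose n-windows are all distinct."""
    bits = ["0"] * n
    seen = {"0" * n}  # every n-window that already occurs in the sequence
    while True:
        suffix = "".join(bits[len(bits) - n + 1:])  # last n-1 symbols
        for bit in ("1", "0"):  # prefer 1, fall back to 0
            window = suffix + bit
            if window not in seen:
                seen.add(window)
                bits.append(bit)
                break
        else:
            # Neither symbol yields a new window: done.
            return "".join(bits)
```

For n = 2 this reproduces the example from the question, 00110. The set grows to 2^n entries, which is the exponential space cost mentioned above.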

Finding the repeated element

In an array of integers between 1 and 1,000,000 (or some much larger value), a single value occurs twice. How do you determine which one?
I think we can use a bitmap to mark the elements, and then traverse the array again to find the repeated element. But I think that process has high complexity. Is there a better way?

This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and sum them. The difference between this sum and the expected sum is the duplicate value.
EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average, which establishes an upper bound for finding the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.

You may try to use bits as a hash map:
1 at position k means that number k occurred before
0 at position k means that number k did not occur before
pseudocode:
0. assume that your array is A
1. initialize a bitarray (there is a nice class in C# for this) of length 1000000, filled with zeros
2. for each num in A:
       if bitarray[num]
           return num
       else
           bitarray[num] = 1
       end
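As a rough Python counterpart to the pseudocode above, here is a sketch that packs the flags into a `bytearray`, one bit per possible value. The function name `find_repeated` and the `max_value` parameter are assumptions for illustration; values are assumed to lie in 0..max_value.

```python
def find_repeated(values, max_value):
    """Return the first value seen twice, or None if there is no repeat."""
    bits = bytearray(max_value // 8 + 1)  # one bit for each value 0..max_value
    for v in values:
        byte, bit = divmod(v, 8)
        if bits[byte] & (1 << bit):
            return v              # bit already set: v occurred before
        bits[byte] |= 1 << bit    # mark v as seen
    return None
```

For values up to 1,000,000 this needs about 125 KB, regardless of how long the input list is.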

The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.

Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But it's easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

Finding a single number in a list [duplicate]

What would be the best algorithm for finding a number that occurs only once in a list in which all other numbers occur exactly twice?
So, in the list of integers (let's take it as an array), each integer repeats exactly twice, except one. What is the best algorithm to find that one?

The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.
In C:
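#include <stdio.h>

int main(void) {
    int arr[] = {3, 2, 5, 2, 1, 5, 3};
    int num = 0;
    /* XOR every element; values that occur twice cancel out */
    for (int i = 0; i < 7; i++)
        num ^= arr[i];
    printf("%i\n", num);
    return 0;
}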
This prints "1", which is the only one that occurs once.
This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.

By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.
Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.
Bam.
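A small Python sketch of this two-bucket trick, assuming a list of non-negative integers in which exactly two values occur once and every other value occurs twice. The function name `find_two_unique` is made up for illustration, and the lowest set bit of a^b is used as the mask x (any set bit would do).

```python
def find_two_unique(values):
    """Return the two values that occur once; all others must occur twice."""
    xor_all = 0
    for v in values:
        xor_all ^= v              # pairs cancel, leaving a ^ b
    mask = xor_all & -xor_all     # lowest set bit of a ^ b: a and b differ here
    a = b = 0
    for v in values:
        if v & mask:
            a ^= v                # bucket with the mask bit set
        else:
            b ^= v                # bucket with the mask bit clear
    return a, b
```

The two buckets never need to be materialized; XORing into two accumulators during a second pass keeps the extra space at O(1).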

O(N) time, O(N) memory
HT = Hash Table
HT.Clear()
go over the list in order
for each item you see
    if (HT.Contains(item)) -> HT.Remove(item)
    else -> HT.Add(item)
at the end, the item in the HT is the item you are looking for.
Note (credit @Jared Updike): This system will find all items that occur an odd number of times.
Comment: I don't see how people can vote up solutions that give you N log N performance. In which universe is that "better"?
I am even more shocked that you marked an N log N solution as the accepted answer...
I do agree however that if memory is required to be constant, then NLogN would be (so far) the best solution.

Kyle's solution would obviously not catch situations where the data set does not follow the rules. If all numbers were in pairs, the algorithm would give a result of zero, the exact same value as if zero were the only value with a single occurrence.
If there were multiple single-occurrence values or triples, the result would be erroneous as well.
Testing the data set might well end up with a more costly algorithm, either in memory or time.
Csmba's solution does detect some erroneous data (no single-occurrence value, or more than one), but not other kinds (quadruples). Regarding his solution, depending on the implementation of HT, memory and/or time is more than O(n).
If we cannot be sure about the correctness of the input set, sorting and counting, or using a hash table that counts occurrences with the integer itself as the hash key, would both be feasible.

I would say that using a sorting algorithm and then going through the sorted list to find the number is a good way to do it.
And now the problem is finding "the best" sorting algorithm. There are a lot of sorting algorithms, each of them with its strong and weak points, so this is quite a complicated question. The Wikipedia entry seems like a nice source of info on that.

Implementation in Ruby:
a = [1,2,3,4,123,1,2,.........]
t = a.length - 1
for i in 0..t
  # look for another copy of a[i] after its first occurrence
  s = a.index(a[i]) + 1
  b = a[s..t]
  w = b.include?(a[i])
  if w == false
    puts a[i]
  end
end

You need to specify what you mean by "best" - to some, speed is all that matters and would qualify an answer as "best" - for others, they might forgive a few hundred milliseconds if the solution was more readable.
"Best" is subjective unless you are more specific.
That said:
Iterate through the numbers, for each number search the list for that number and when you reach the number that returns only a 1 for the number of search results, you are done.

Seems like the best you could do is to iterate through the list, for every item add it to a list of "seen" items or else remove it from the "seen" if it's already there, and at the end your list of "seen" items will include the singular element. This is O(n) in regards to time and n in regards to space (in the worst case, it will be much better if the list is sorted).
The fact that they're integers doesn't really factor in, since there's nothing special you can do with adding them up... is there?
Question
I don't understand why the selected answer is "best" by any standard. O(N*lgN) > O(N), and it changes the list (or else creates a copy of it, which is still more expensive in space and time). Am I missing something?

Depends on how large/small/diverse the numbers are though. A radix sort might be applicable which would reduce the sorting time of the O(N log N) solution by a large degree.

The sorting method and the XOR method have the same time complexity. The XOR method is only O(n) if you assume that bitwise XOR of two strings is a constant time operation. This is equivalent to saying that the size of the integers in the array is bounded by a constant. In that case you can use Radix sort to sort the array in O(n).
If the numbers are not bounded, then bitwise XOR takes time O(k) where k is the length of the bit string, and the XOR method takes O(nk). Now again Radix sort will sort the array in time O(nk).

You could simply put the elements in the set into a hash until you find a collision. In ruby, this is a one-liner.
def find_dupe(array)
  h = {}
  array.detect { |e| h[e] || (h[e] = true; false) }
end
So, find_dupe([1,2,3,4,5,1]) would return 1.
This is actually a common "trick" interview question though. It is normally about a list of consecutive integers with one duplicate. In this case the interviewer is often looking for you to use the Gaussian sum of n-integers trick e.g. n*(n+1)/2 subtracted from the actual sum. The textbook answer is something like this.
def find_dupe_for_consecutive_integers(array)
  n = array.size - 1 # subtract one from array.size because of the dupe
  array.sum - n * (n + 1) / 2
end
