Understanding the concept behind finding the duplicates in an array - algorithm

I found a method to find the duplicates in an array of n elements ranging from 0 to n-1.
Traverse the array, and do the following for every index i of A[]:
{
    check the sign of A[abs(A[i])];
    if it is positive then
        make it negative: A[abs(A[i])] = -A[abs(A[i])];
    else // i.e., A[abs(A[i])] is already negative
        this element (the ith element of the list) is a repetition
}
This method works fine. But I fail to understand how. Can someone explain it?
I am basically looking for a proof or a simpler understanding of this algorithm!

You're basically cheating by using the sign bit of each array element as a one-bit flag indicating the presence or absence of an element. This might or might not be faster than simply using a separate bit-set array, but it exploits the special case that you are storing unsigned values in a signed representation (int), so you have an extra unused bit to play with on each element. It would not work if your values could genuinely be negative, for example.

The algorithm stores additional information in the sign of each number in the array.
The sign of A[i] records whether i occurred previously during the processing: if it's negative, i has occurred at least once.
Note: "elements ranging from 0 to n-1." - You cannot store a sign on 0 (there is no -0 for ints), so strictly speaking this isn't a correct algorithm for the stated task.
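As a concrete illustration, here is a minimal Python sketch of the sign-marking idea (the function name is mine; values are restricted to 1..n-1 to sidestep the zero problem just noted):

```python
def find_duplicates(a):
    """Sign-marking duplicate finder for values in 1..len(a)-1.
    Visiting value v flips the sign of a[v]; a second visit sees a[v]
    already negative and reports v as a repeat. O(n) time, O(1) extra
    space, but it mutates the input array."""
    dups = []
    for i in range(len(a)):
        v = abs(a[i])             # recover the original value at index i
        if a[v] > 0:
            a[v] = -a[v]          # first time we have seen value v
        else:
            dups.append(v)        # a[v] already negative: v seen before
    return dups
```

Running it on [1, 2, 3, 1, 2] reports 1 and 2 as repeats, in the order their second occurrences are encountered.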

See http://www.geeksforgeeks.org/find-the-two-repeating-elements-in-a-given-array/
Method 5.
The examples there might help you.

Related

Keep track of the result of accumulation for operation without inverse

I have an operation A * A -> A, which is commutative and associative. This means the order I apply it in doesn't matter, as long as I use the same elements. Nice.
I have to apply it to a list of values. To be more precise, I have to use it as the operation to accumulate the values of the list. So far, so good.
I then have a series of requests to add an element to the list, or erase it from the list. After each insertion or deletion, I have to return the new accumulated value for the new list. Simple, right?
The problem is I don't have an inverse; that is, no operation '/' that, given a * b and b, tells me the other operand must have been a. (In fact, there isn't even an identity element.)
So my only obvious option is to accumulate again at every deletion - in linear time.
Can I do better? I've thought a lot about it.
And the answer is, of course I can... if I really want: I need to implement a custom binary tree, maybe a red/black one for good worst-case guarantees, and store next to each value an additional cache holding the accumulated result of its whole subtree.
cache = value * left.cache * right.cache
Maintain this invariant after every operation; then the root cache is the result.
However, "implement a custom R/B tree while maintaining an additional invariant" isn't something I'm particularly comfortable doing. I would do it, but I wouldn't swear by its correctness. Plus, the constant in front of the log would probably be significant. It seems pretty unwieldy for something as simple as keeping track of an accumulation.
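For what it's worth, the cached-subtree idea doesn't strictly need a custom red/black tree: a fixed-capacity segment tree gives the same O(log n) updates with far less bookkeeping. A sketch (the class name is mine; None serves as an artificial identity so the operation itself never needs one):

```python
class AccumulatorTree:
    """Segment tree caching subtree accumulations of a commutative,
    associative operation with no inverse. Each internal node holds
    op() over its occupied leaves; the root is the full accumulation."""
    def __init__(self, capacity, op):
        self.op = op
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.tree = [None] * (2 * self.size)   # 1-based heap layout

    def _combine(self, a, b):
        if a is None: return b                 # skip empty slots
        if b is None: return a
        return self.op(a, b)

    def set(self, slot, value):
        """Insert (value) or delete (value=None) at a slot, then repair
        the cached accumulations up to the root: O(log capacity)."""
        i = slot + self.size
        self.tree[i] = value
        i //= 2
        while i:
            self.tree[i] = self._combine(self.tree[2*i], self.tree[2*i+1])
            i //= 2

    def total(self):
        return self.tree[1]                    # accumulation of all slots
```

Insertions and deletions both become a single `set` call, and the root always holds the answer; e.g. with op = bitwise AND (which has no inverse), deleting an element correctly "un-ANDs" it by recomputing only one root-to-leaf path.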
Does anyone see a better solution?
For completeness: the operation is a union of filters. A filter is a pair (code, mask), and a value "passes the filter" if (in C bitwise operators) (value ^ code) & mask == 0; that is, if its bits corresponding to the bits set in mask are equal to the corresponding bits in code. The union therefore sets to 0 (ignored) the bits where masks or codes differ, and keeps the ones which are the same.
Bonus appreciation to anyone finding a way to exploit the specific properties of the operation to get a solution more efficient than it is possible for the general problem I abstracted! ;-)
For your specific problem you could keep track, for each bit x, of:
The total number of times that bit x is set to 1 in a mask
The total number of times that bit x is set to 1 in a mask and bit x of code is equal to 0
The total number of times that bit x is set to 1 in a mask and bit x of code is equal to 1
With these 3 counts (for each bit) it is straightforward to compute the union of all the filters.
The complexity is O(R) (where R is the number of bits in mask) to add or remove a filter.
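A sketch of those per-bit counters in Python (the class name is mine). Note that the second count is just the first minus the third, so two counters per bit suffice:

```python
class FilterUnion:
    """Maintain the union of a multiset of (code, mask) filters under
    insertion and deletion, using per-bit counters. A bit survives in
    the union only if every filter masks it and all code bits agree."""
    def __init__(self, bits=32):
        self.bits = bits
        self.n = 0                      # number of filters currently held
        self.mask_count = [0] * bits    # times bit x is set in a mask
        self.code1_count = [0] * bits   # times mask bit x set AND code bit x == 1

    def _update(self, code, mask, delta):
        self.n += delta
        for x in range(self.bits):
            if (mask >> x) & 1:
                self.mask_count[x] += delta
                if (code >> x) & 1:
                    self.code1_count[x] += delta

    def add(self, code, mask):
        self._update(code, mask, +1)

    def remove(self, code, mask):
        self._update(code, mask, -1)

    def union(self):
        """Return (code, mask) of the current union filter."""
        code = mask = 0
        for x in range(self.bits):
            if self.n > 0 and self.mask_count[x] == self.n:
                if self.code1_count[x] == self.n:      # all code bits are 1
                    mask |= 1 << x
                    code |= 1 << x
                elif self.code1_count[x] == 0:         # all code bits are 0
                    mask |= 1 << x
        return code, mask
```

Each add or remove touches R counters, matching the O(R) bound above, and union() is recomputed in O(R) from the counters alone, with no rescan of the filter list.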

Find number occurring even number of times? [duplicate]

Given is an array of integers. Each number in the array repeats an ODD number of times, but only 1 number is repeated for an EVEN number of times. Find that number.
I was thinking a hash map, with each element's count. It requires O(n) space. Is there a better way?
A hash map is fine, but all you need to store is each element's count modulo 2.
All of those will end up being 1 (odd) except for the even-count element, which ends up 0.
(As Aleks G says, you don't need arithmetic (count = (count + 1) % 2); XOR (count ^= 0x1) is enough, although any compiler will optimize that anyway.)
I don't know what the intended meaning of "repeat" is, but if all but one of the numbers occur an even number of times, and only one number occurs an odd number of times, then XOR should do the trick.
You don't need to keep the number of times each element is found - just whether that number is even or odd - so one bit per element is enough. Start with 0 for each element, then flip the corresponding bit when you encounter the element. Next time you encounter it, flip the bit again. At the end, just check which bit is 0: that's the even-count element.
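The one-bit-per-element idea can be sketched with a set standing in for the bit array (a sketch assuming hashable values; the function name is mine):

```python
def find_even_count_number(arr):
    """Return the single value occurring an even number of times, when
    every other value occurs an odd number of times. Set membership is
    the parity bit: O(n) time, one bit per distinct value."""
    odd_parity = set()   # values seen an odd number of times so far
    seen = set()         # every distinct value encountered
    for x in arr:
        seen.add(x)
        odd_parity ^= {x}          # flip the parity bit for x
    (answer,) = seen - odd_parity  # the only value whose bit ended at 0
    return answer
```

For [1, 1, 1, 2, 2, 3], the counts are 3, 2, 1, so the answer is 2.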
If all numbers but one are repeated an even number of times and one number repeats an odd number of times, then XORing all of the numbers together leaves exactly the odd-count number.
By your current statement I think a hash map is a good idea, but I'll think about it to find a better way. (I say this for positive integers.)
Apparently there is a solution in O(n) time and O(1) space, since it was asked at a software engineering company with this constraint explicitly. See here: Bing interview question -- it seems to be doable using XOR over the numbers in the array. Good luck! :)

Interview Q: Given an input array of unknown size with all 1's in the beginning and 0's at the end, find the index in the array where the 0's start

I was asked the following question in a job interview.
Given an input array of unknown size with all 1's in the beginning and 0's at the end, find the index in the array from where the 0's start. Consider that there are millions of 1's and 0's in the array, i.e. the array is very big, e.g. the array contents are 1111111.......1100000.........0000000. On later googling the question, I found it at http://www.careercup.com/question?id=2441 .
The most puzzling thing about this question is: if I don't know the size of the array, how do I know whether *(array_name + index) belongs to the array? Even if someone finds an index where the value changes from 1 to 0, how can one assert that the index belongs to the array?
The best answer I could find was O(logn) solution where one keeps doubling index till one finds 0. Again what is the guarantee that the particular element belongs to the array.
EDIT:
It's a C-style array. The constraint is that you don't have the index of the end element (you can't use sizeof(arr)/sizeof(arr[0])). Say I am at index 1024 and arr[1024] == 1. arr[2048] is out of bounds, because the array length is 1029 (unknown to the programmer). So is it okay to use arr[2048] while finding the solution? It's out of bounds and its value can be anything. So I was wondering whether the question is flawed.
If you don't know the length of the array, and can't read past the end of the array (because it might segfault or give you random garbage), then the only thing you can do is start from the beginning and look at each element until you find a zero:
int i = 0;
while (a[i] != 0) i++;
return i;
And you'd better hope there is at least one zero in the array.
If you can find out the length of the array somehow, then a binary search is indeed more efficient.
Ps. If it's a char array, it would be easier and likely faster to just call strlen() on it. The code above is pretty much what strlen() does, except that the standard library implementation is likely to be better optimized for your CPU architecture.
I would go with a binary search in order to find the 0.
At first you take the middle element; if it is 1 you go to the right side, otherwise to the left side. Keep doing this until you find the first 0.
Now, the problem statement says: Given an input array of size unknown with all 1's in the beginning and 0's in the end. The way an array is laid out in memory is one element after another, so since you know that there are 0's at the end of the array, if your algorithm works correctly then *(array_name + index) will surely belong to the array.
Edit :
Sorry, I just realised that the solution only works if you know the size. Otherwise, yes doubling the index is the best algorithm that comes to my mind too. But the proof of the fact that the index still belongs to the array is the same.
Edit due to comment:
It states that at the end of the array there are 0's. Therefore if you do a simple

int i = 0;
while (1) {
    if (*(array_name + i) != 1)
        return i;
    i++;
}

it should give you the first index, right?
Now since you know that the array looks like 1111...000000, you also know that at least one of the 0's - namely the first one - surely belongs to the array.
In your case you do the search by doubling the index and then using a binary search between index/2 and index. Here you can't be sure the upper index belongs to the array, but the first 0 between index/2 and index surely belongs to the array (the statement said there is at least one 0).
Oops... I just realised that if you keep doubling the index and you get out of the array, you will read "garbage values", which might not be 0's. So the best thing you can do is, instead of checking for the first 0, check for the first element which is not 1. Sadly, a garbage value could still happen to be 1 (an extremely small chance, but it might happen). If that's the case you will need an O(n) algorithm.
If you don't know the size of the array you can start with index = 1. At each step, check whether 2 * index is past the end of the array or whether a[2 * index] is zero - if either holds, you now have a boundary to start the binary search from; otherwise set index = 2 * index.
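If out-of-bounds probes can somehow be made safe (here Python's len() stands in for whatever bound check is available; the interview setting as stated offers no such luxury), the doubling-then-binary-search idea looks like this (function name mine):

```python
def first_zero_index(a):
    """Galloping search for the first 0 in an array of the form 1...10...0.
    Assumes at least one 0 is present. Doubling finds a probe past the
    boundary in O(log k) steps, then a binary search pins it down,
    where k is the answer."""
    if a[0] == 0:
        return 0
    hi = 1
    while hi < len(a) and a[hi] == 1:   # double until past the 1-run
        hi *= 2
    hi = min(hi, len(a) - 1)            # clamp: last element is 0 by assumption
    lo = hi // 2                        # a[lo] is known to be 1
    while lo + 1 < hi:                  # invariant: a[lo] == 1, a[hi] == 0
        mid = (lo + hi) // 2
        if a[mid] == 1:
            lo = mid
        else:
            hi = mid
    return hi
```

The clamp after doubling is safe because the last confirmed 1 was at index hi/2, and everything at or below it is 1 in a sorted 1...0 array, so the binary-search invariant holds even when the doubling overshoots the array end.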

Find random numbers in a given range with certain possible numbers excluded

Suppose you are given a range and a few numbers in the range (exceptions). Now you need to generate a random number in the range except the given exceptions.
For example, if range = [1..5] and exceptions = {1, 3, 5} you should generate either 2 or 4 with equal probability.
What logic should I use to solve this problem?
If you have no constraints at all, I guess this is the easiest way: create an array containing the valid values, a[0]...a[m]. Return a[rand(0,...,m)].
If you don't want to create an auxiliary array, but you can count the number of exceptions e and the number of elements n in the original range, you can simply generate a random number r = rand(0 ... n-e), and then find the valid element with a counter that doesn't tick on exceptions and stops when it's equal to r.
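That count-skip idea, sketched in Python (the function name is mine):

```python
import random

def random_excluding(n, exceptions):
    """Uniform draw from 1..n minus the exception set, without building
    the list of valid values: pick a rank r, then walk the range with a
    counter that skips exceptions. O(n) worst case per draw."""
    ex = set(exceptions)
    r = random.randrange(n - len(ex))   # rank of the valid value to return
    count = 0
    for v in range(1, n + 1):
        if v in ex:
            continue                    # the counter doesn't tick on exceptions
        if count == r:
            return v
        count += 1
```

With range [1..5] and exceptions {1, 3, 5}, ranks 0 and 1 map to 2 and 4 respectively, so each comes up with probability 1/2.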
Depends on the specifics of the case. For your specific example, I'd return a 2 if a Uniform(0,1) was below 1/2, 4 otherwise. Similarly, if I saw a pattern such as "the exceptions are odd numbers", I'd generate values for half the range and double. In general, though, I'd generate numbers in the range, check if they're in the exception set, and reject and re-try if they were - a technique known as acceptance/rejection for obvious reasons. There are a variety of techniques to make the exception-list check efficient, depending on how big it is and what patterns it may have.
Let's assume, to keep things simple, that arrays are indexed starting at 1, and your range runs from 1 to k. Of course, you can always shift the result by a constant if this is not the case. We'll call the array of exceptions ex_array, and let's say we have c exceptions. These need to be sorted, which shall turn out to be pretty important in a while.
Now, you only have k-c useful numbers to work with, so it'll be meaningful to find a random number in the range 1 to k-c. Say we end up with the number r. Now, we just need to find the r-th valid number in your array. Simple? Not so much. Remember, you can never simply walk over any of your arrays in a linear fashion, because that can really slow down your implementation when you have a lot of numbers. You have to do some sort of binary search, say, to come up with a fast enough algorithm.
So let's try something better. The r-th number would nominally have been at index r in your original array had you had no exceptions. The number at index r is r, of course, since your range and your array indices start from 1. But you have a bunch of invalid numbers between 1 and r, and you want to somehow get to the r-th valid number. So, let's do a binary search on the array of exceptions, ex_array, to find how many invalid numbers are less than or equal to r, because that many invalid numbers lie between 1 and r. If this number is 0, we're all done, but if it isn't, we have a bit more work to do.
Assume the binary search found n invalid numbers between 1 and r. Let's advance n indices in your array to the index r+n, and find the number of invalid numbers lying between 1 and r+n, using a binary search to count how many elements in ex_array are less than or equal to r+n. If this number is exactly n, no more invalid numbers were encountered, and you've hit upon your r-th valid number. Otherwise, repeat again, this time for the index r+n', where n' is the number of invalid numbers that lie between 1 and r+n.
Repeat till you get to a stage where no excess exceptions are found. The important thing here is that you never once have to walk over any of the arrays in a linear fashion. You should also optimize the binary searches so they don't always start at index 0: if you know there are n invalid numbers between 1 and r, instead of starting your next binary search from 1, you can start it from one index after the index corresponding to n in ex_array.
In the worst case, you'll be doing a binary search for each element in ex_array, which means c binary searches over prefixes of sizes 1, 2, and so on up to c, for a total time complexity of O(log(c!)). Stirling's approximation tells us that O(ln(x!)) = O(x ln(x)), so using the algorithm above only makes sense if c is small enough that O(c ln(c)) < O(k), since you can achieve O(k) complexity using the trivial method of extracting the valid elements from your array first.
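A variant of the rank-to-value step can be done with a single binary search over the value range instead of the iterated searches over ex_array (a sketch; `bisect_right` counts exceptions <= a candidate, giving its valid-rank directly, and the function name is mine):

```python
import bisect
import random

def random_excluding_bsearch(k, exceptions):
    """Uniform draw from 1..k minus the exceptions. Draw a rank r, then
    binary-search for the smallest value v whose count of valid numbers
    <= v equals r. O(log k * log c) per draw, no linear walk."""
    ex = sorted(exceptions)
    r = random.randint(1, k - len(ex))   # rank of the valid value to return
    lo, hi = 1, k
    while lo < hi:
        mid = (lo + hi) // 2
        valid_upto = mid - bisect.bisect_right(ex, mid)  # valid numbers <= mid
        if valid_upto >= r:
            hi = mid
        else:
            lo = mid + 1
    return lo                            # smallest value with valid-rank r
```

This works because the valid-rank function v - |{e in ex : e <= v}| is non-decreasing in v, so the smallest v reaching rank r is always a valid (non-excepted) number.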
In Python the solution is very simple (given your example):
import random
rng = set(range(1, 6))
ex = {1, 3, 5}
random.choice(list(rng-ex))
To optimize the solution, one needs to know how large the range is and how many exceptions there are. If the number of exceptions is very low, it's possible to generate a number from the range and just check that it's not an exception. If exceptions dominate, it probably makes sense to gather the remaining numbers into an array and generate a random index to fetch a non-exception.
In this answer I assume that it is known how to get an integer random number from a range.
Here's another approach...just keep on generating random numbers until you get one that isn't excluded.
Suppose your desired range was [0,100) excluding 25,50, and 75.
Put the excluded values in a hashtable or bitarray for fast lookup.
int randNum = rand(0,100);
while( excludedValues.contains(randNum) )
{
randNum = rand(0,100);
}
The complexity analysis is more difficult, since potentially rand(0,100) could return 25, 50, or 75 every time. However that is quite unlikely (assuming a random number generator), even if half of the range is excluded.
In the above case, we re-generate a random value for only 3/100 of the original values.
So 3% of the time you regenerate once. Of those 3%, only 3% will need to be regenerated, etc.
Suppose the initial range is [1,n] and the exclusion set's size is x. First generate a map from [1, n-x] to the numbers in [1,n] excluding the numbers in the exclusion set. This mapping is 1-to-1, since there are equally many numbers on both sides. In the example given in the question the mapping would be: {1->2, 2->4}.
Another example: suppose the range is [1,10] and the exclusion list is [2,5,8,9]; then the mapping is {1->1, 2->3, 3->4, 4->6, 5->7, 6->10}. This map can be created in a worst-case time complexity of O(n log n).
Now generate a random number in [1, n-x] and map it to the corresponding number using the mapping. Map lookups can be done in O(log n).
You can do it in a versatile way if you have enumerators or set operations. For example using Linq:
void Main()
{
    var exceptions = new[] { 1, 3, 5 };
    foreach (var n in RandomSequence(1, 5).Where(n => !exceptions.Contains(n)).Take(10))
        Console.WriteLine(n);
}

static Random r = new Random();

// An infinite stream of random values in [min, max]
static IEnumerable<int> RandomSequence(int min, int max)
{
    while (true)
        yield return r.Next(min, max + 1);
}
I would like to acknowledge some comments that are now deleted:
It's possible that this program never ends (only theoretically) because there could be a sequence that never contains valid values. Fair point. I think this is something that could be explained to the interviewer, however I believe my example is good enough for the context.
The distribution is fair because each of the elements has the same chance of coming up.
The advantage of answering this way is that you show understanding of modern "functional-style" programming, which may be interesting to the interviewer.
The other answers are also correct. This is a different take on the problem.

