Array of size n, with one element n/2 times - algorithm

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.

I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
// Initial value is irrelevant if sequence is non-empty,
// but keeps compiler happy.
int best = 0;
int count = 0;
foreach (int element in sequence)
{
if (count == 0)
{
best = element;
count = 1;
}
else
{
// Vote current choice up or down
count += (best == element) ? 1 : -1;
}
}
return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)

Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.

int findLeader(int n, int* x){
int leader = x[0], c = 1, i;
for(i=1; i<n; i++){
if(c == 0){
leader = x[i];
c = 1;
} else {
if(x[i] == leader) c++;
else c--;
}
}
if(c == 0) return NULL;
else {
c = 0;
for(i=0; i<n; i++){
if(x[i] == leader) c++;
}
if(c > n/2) return leader;
else return NULL;
}
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.

This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]
What say folks?
-bhupi

Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.

How about:
randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").
e.g.:
findDuplicateCandidate:
select a fixed size subset.
return the most common element in that subset
if there is no element with more than 1 occurrence repeat.
if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.

My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.

in php---pls check if it's correct
function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values(array); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
$val = $A[$i];
if(in_array($val,$B,true)){//to avoid looping again and again
}else{
if($counts[$val]>$len/2){
return $val;
}
array_push($B, $val);//to avoid looping again and again
}
}
return -1;
}

int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
if (L[i] == candidate && L[pos++] == candidate)
return candidate;
}
if (count > pos)
return candidate;
return (-1);

Related

(with example) Why is KMP string matching O(n). Shouldn't it be O(n*m)?

Why is KMP O(n + m)?
I know this question has probably been asked a million times on here but I haven't find a solution that convinced me/I understood or a question that matched my example.
/**
* KMP algorithm of pattern matching.
*/
public boolean KMP(char []text, char []pattern){
int lps[] = computeTemporaryArray(pattern);
int i=0;
int j=0;
while(i < text.length && j < pattern.length){
if(text[i] == pattern[j]){
i++;
j++;
}else{
if(j!=0){
j = lps[j-1];
}else{
i++;
}
}
}
if(j == pattern.length){
return true;
}
return false;
}
n = size of text
m = size of pattern
I know why its + m, thats the runtime it takes to create the lsp array to do lookups. I'm not sure why the code I passed above is O(n).
I see that above "i" always progresses forwards EXCEPT when it doesn't match and j!= 0. In that case, we can do iterations of the while loop where i doesn't move forward, so its not exactly O(n)
If the lps array is incrementing like [1,2,3,4,5,6,0]. If we fail to match at index 6, j gets updated to 5, and then 4, and then 3.... and etc and we effectively go through m extra iterations (assuming all mismatch). This can occur at every step.
so it would look like
for (int i = 0; i < n; i++) {
for (int j = i; j >=0; j--) {
}
}
and to put all the possible i j combinations aka states would require a nm array so wouldn't the runtime be O(nm).
So is my reading of the code wrong, or the runtime analysis of the for loop wrong, or my example is impossible?
Actually, now that I think about it. It is O(n+m). Just visualized it as two windows shifting.

Sort a given array whose elements range from 1 to n , in which one element is missing and one is repeated

I have to sort this array in O(n) time and O(1) space.
I know how to sort an array in O(n) but that doesn't work with missing and repeated numbers. If I find the repeated and missing numbers first (It can be done in O(n)) and then sort , that seems costly.
static void sort(int[] arr)
{
for(int i=0;i<arr.length;i++)
{
if(i>=arr.length)
break;
if(arr[i]-1 == i)
continue;
else
{
while(arr[i]-1 != i)
{
int temp = arr[arr[i]-1];
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
}
}
}
First, you need to find missing and repeated numbers. You do this by solving following system of equations:
Left sums are computed simultaneously by making one pass over array. Right sums are even simpler -- you may use formulas for arithmetic progression to avoid looping. So, now you have system of two equations with two unknowns: missing number m and repeated number r. Solve it.
Next, you "sort" array by filling it with numbers 1 to n left to right, omitting m and duplicating r. Thus, overall algorithm requires only two passes over array.
void sort() {
for (int i = 1; i <= N; ++i) {
while (a[i] != a[a[i]]) {
std::swap(a[i], a[a[i]]);
}
}
for (int i = 1; i <= N; ++i) {
if (a[i] == i) continue;
for (int j = a[i] - 1; j >= i; --j) a[j] = j + 1;
for (int j = a[i] + 1; j <= i; ++j) a[j] = j - 1;
break;
}
}
Explanation:
Let's denote m the missing number and d the duplicated number
Please note in the while loop, the break condition is a[i] != a[a[i]] which covers both a[i] == i and a[i] is a duplicate.
After the first for, every non-duplicate number i is encountered 1-2 time and moved into the i-th position of the array at most 1 time.
The first-found number d is moved to d-th position, at most 1 time
The second d is moved around at most N-1 times and ends up in m-th position because every other i-th slot is occupied by number i
The second outer for locate the first i where a[i] != i. The only i satisfies that is i = m
The 2 inner fors handle 2 cases where m < d and m > d respectively
Full implementation at http://ideone.com/VDuLka
After
int temp = arr[arr[i]-1];
add a check for duplicate in the loop:
if((temp-1) == i){ // found duplicate
...
} else {
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
See if you can figure out the rest of the code.

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding maximum positive sub sequence in given sequence in O(n) time. Can anybody suggest similar algorithm for finding minimum positive contiguous sub sequence.
For example
If given sequence is 1,2,3,4,5 answer should be 1.
[5,-4,3,5,4] ->1 is the minimum positive sum of elements [5,-4].
There cannot be such algorithm. The lower bound for this problem is O(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the difference between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. The original array only has distinct elements if and only if the minimum non-negative subarray is greater than 0.
If we had an O(n) algorithm to your problem, we would have an O(n) algorithm to element distinctness problem, which we know is not possible on a Turing machine.
We can have a O(n log n) algorithm as follow:
Assuming that we have an array prefix, which index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1].
Thus, in order to find the minimum positive sub-array ending at index j, so, we need to find the maximum element prefix[x], which less than prefix[j] and x < j. We can find that element in O(log n) time if we use a binary search tree.
Pseudo code:
int[]prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
for(int i = 0; i < A.length; i++){
if(A[i] > 0)
result = min(result, A[i];
int v = tree.getMaximumElementLessThan(prefix[i]);
result = min(result, prefix[i] - v);
tree.add(prefix[i]);
}
I believe there's a O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the (input) values to be processed, see remarks in the code.
private int GetMinimumPositiveContiguousSubsequenc(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the mimumum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
if (minPositive == 0 || value < minPositive)
minPositive = value;
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSeqence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
bool[] initialSequence = new bool[maxSequence + offset + 1];
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

Interview Ques - find the KTHSMALLEST element in an unsorted array

This problem asked to find k'th smallest element in an unsorted array of non-negative integers.
Here main problem is memory limit :( Here we can use constant extra space.
First I tried a O(n^2) method [without any extra memory] which gave me TLE.
Then i tried to use priority queue [extra memory] which gave me MLE :(
Any idea how to solve the problem with constant extra space and within time limit.
You can use a O(n^2) method with some pruning, which will make the program like O(nlogn) :)
Declare two variable low = maximum value which position is less than k and high = lowest value which position is greater than k
Keep track of the low and high value you already processed.
Whenever a new value comes check if it is in the [low , high] boundary. If yes then process it otherwise skip the value.
That's it :) I think it will pass both TLE and MLE :)
Have a look at my code :
int low=0,high=1e9;
for(int i=0;i<n;i++) // n is the total number of element
{
if(!(A[i]>=low&&A[i]<=high)) // A is the array in which the element are saved
continue;
int cnt=0,cnt1=0; // cnt is for the strictly less value and cnt1 for same value. Because value can be duplicate.
for(int j=0;j<n;j++)
{
if(i!=j&&A[i]>A[j])
cnt++;
if(A[i]==A[j])
cnt1++;
if(cnt>k)
break;
}
if(cnt+cnt1<k)
low=A[i]+1;
else if(cnt>=k)
high=A[i]-1;
if(cnt<k&&(cnt+cnt1)>=k)
{
return A[i];
}
}
You can do an in-place Selection Algorithm.
The idea is similar to quicksort, but recurse only on the relevant part of the array, not all of it. Note that the algorithm can be implemented with O(1) extra space pretty easily - since it's recursive call is a tail call.
This leads to an O(n) solution on average case (just be sure to pick a pivot at random in order to make sure you don't fall into pre-designed edge cases such as a sorted list). That can be improved to worst case O(n) using median of median technique, but with significantly worse constants.
Binary search on the answer for the problem.
2 major observations here :
Given that all values in the array are of type 'int', their range can be defined as [0, 2^31]. That would be your search space.
Given a value x, I can always tell in O(n) if the kth smallest element is smaller than x or greater than x.
A rough pseudocode :
start = 0, end = 2^31 - 1
while start <= end
x = (start + end ) / 2
less = number of elements less than or equal to x
if less > k
end = x - 1
elif less < k
start = x + 1
else
ans = x
end = x - 1
return ans
Hope this helps.
I believe I found a solution that is similar to that of #AliAkber but is slightly easier to understand (I keep track of fewer variables).
It passed all tests on InterviewBit
Here's the code (Java):
public int kthsmallest(final List<Integer> a, int k) {
int lo = Integer.MIN_VALUE;
int hi = Integer.MAX_VALUE;
int champ = -1;
for (int i = 0; i < a.size(); i++) {
int iVal = a.get(i);
int count = 0;
if (!(iVal > lo && iVal < hi)) continue;
for (int j = 0; j < a.size(); j++) {
if (a.get(j) <= iVal) count++;
if (count > k) break;
}
if (count > k && iVal < hi) hi = iVal;
if (count < k && iVal > lo) lo = iVal;
if (count >= k && (champ == -1 || iVal < champ))
champ = iVal;
}
return champ;
}

Efficient algorithm for: Given an unsorted array of positive integers and an integer N, return N if N existed in array or the first number <N

I had this question:
Given an unsorted array of positive integers and an integer N, return N if N existed in the array or the first number that is smaller than N.
in an interview and wanted to know what would be the best efficient algorithm to solve it?
I had given two approaches using hash and sorting array but it was not correct and efficient approach. I would really appreciate if someone can give an optimal algorithm for this problem.
I'm assuming this is in a C-style language; if not, please update the question to reflect the language.
If the array isn't sorted, then you have no choice but a (potentially) full traversal of the array to look for N, as any sorting operation is going take longer than simply traversing the array (other than by finding the element by "blind luck"). Something akin to this would probably be the most efficient (unless I'm missing something)
int retVal = -1;
for(int i = 0; i < ARRAY_LENGTH; i++)
{
if(array[i] == N) return N;
if(retVal == -1 && array[i] < N) retVal = array[i];
}
return retVal;
As suggested elsewhere, you can modify
if(retVal == -1 && array[i] < N) retVal = array[i];
to
if(retVal < array[i] && array[i] < N) retVal = array[i];
In order to get the largest value that's smaller than N, rather than simply the first.
Scan through the list from beginning to end, if you see a value less than N, hold on to the first one until you reach the end, or find N. If you find N, return it, if you reach the end, return the value you've held on to. Presumably there'd have to be some value to return if all the values were greater than N, but the problem doesn't state that.
O(N) performance, O(1) space usage.
It's a little trickier if you're looking for the largest value smaller than N. In which case, instead of holding on to the first value smaller than N, you simply grab a new value every time you find a value smaller than N, but larger than the value you are currently holding on to.
Simply replace
if(array[i] < N && retVal == -1) retVal = array[i];
with
if(array[i] < N && retVal < array[i]) retVal = array[i];
in Adam's answer
Here is my code:
int getNindex(int a[],int n,int N)
{
int min=-99999,i=0,minindex=-1;
for(i=0;i<n;i++)
{
if(a[i]>min && a[i]<=N)
{
min=a[i];
minindex=i;
}
}
return minindex;
}
int main()
{
int a[]={5,75,20,50,100};
int Nindex=getNindex(a,5,60);
if(Nindex>=0)
printf("index= %d,N= %d\n",Nindex,a[Nindex]);
return 0;
}

Resources