Programming and Data structure - data-structures

Suppose you are given arrays p [1......N] and q [1......N] both uninitialized, that is, each location may contain an arbitrary value), and a variable count, initialized to 0. Consider the following procedures set and is_set:
set(i) {
count = count + 1;
q[count] = i;
p[i] = count;
}
is_set(i) {
if (p[i] ≤ 0 or p[i] > count)
return false;
if (q[p[i]] ≠ i)
return false;
return true;
}
Show that if set(i) has not been called for some i, then regardless of what p[i] contains, is_set(i) will return false.
My reasoning:-is_set(i)
If p[i] is <=0 then first condition of OR is true and first if is executed . If p[i]>0 ,then we know count is zero and hence second condition in OR of first if will be true.SO whatever p[i] may contain it will always enter in this if and return false.Please correct me if wrong

Related

Binary Search Explanation

Link: https://leetcode.com/problems/first-bad-version/discuss/71386/An-clear-way-to-use-binary-search
I am doing a question wherein, given a string like this "FFTTTT", I have to find either the rightmost F or the leftmost T.
The following is the code:
To find the leftmost T
public int firstBadVersionLeft(int n) {
int i = 1;
int j = n;
while (i < j) {
int mid = i + (j - i) / 2;
if (isBadVersion(mid)) {
j = mid;
} else {
i = mid + 1;
}
}
return i;
}
I have the following doubts:
I am unable to understand the intuition behind returning i. I mean, why didn't we return j. I did a trial run of the code in my mind, and it works out, but how do we know we have to return i.
Why didn't we do while (i<=j) and just while(i<j). I mean, how do we determine this?
You stop when i==j as far as I can tell. So you can return either of them since they have the same value.
The difference between the two conditions is i == j . In that case mid == i + 0/2 == i. So isBadVersion can return true or false. If it returns true then you do j= mid, but we already had i == j == mid, so you don't do anything and you have an infinite loop. If isBadVersion returns false then you will make i == j+1 and the loop will end. Since the boundaries are inclusive (they include the i'th and j'th element) this can only happen if there's no 'T' in the string.
So you do (i < j) to avoid that infinite loop case.
P.S. This code will return n if there's not 'T' in the string or the last character is a 'T'. Not sure if that's intended or not.

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding maximum positive sub sequence in given sequence in O(n) time. Can anybody suggest similar algorithm for finding minimum positive contiguous sub sequence.
For example
If given sequence is 1,2,3,4,5 answer should be 1.
[5,-4,3,5,4] ->1 is the minimum positive sum of elements [5,-4].
There cannot be such algorithm. The lower bound for this problem is O(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the difference between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. The original array only has distinct elements if and only if the minimum non-negative subarray is greater than 0.
If we had an O(n) algorithm to your problem, we would have an O(n) algorithm to element distinctness problem, which we know is not possible on a Turing machine.
We can have a O(n log n) algorithm as follow:
Assuming that we have an array prefix, which index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1].
Thus, in order to find the minimum positive sub-array ending at index j, so, we need to find the maximum element prefix[x], which less than prefix[j] and x < j. We can find that element in O(log n) time if we use a binary search tree.
Pseudo code:
int[]prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
for(int i = 0; i < A.length; i++){
if(A[i] > 0)
result = min(result, A[i];
int v = tree.getMaximumElementLessThan(prefix[i]);
result = min(result, prefix[i] - v);
tree.add(prefix[i]);
}
I believe there's a O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the (input) values to be processed, see remarks in the code.
private int GetMinimumPositiveContiguousSubsequenc(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the mimumum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
if (minPositive == 0 || value < minPositive)
minPositive = value;
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSeqence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
bool[] initialSequence = new bool[maxSequence + offset + 1];
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

Binary search bounds

I always have the hardest time with this and I have yet to see a definitive explanation for something that is supposedly so common and highly-used.
We already know the standard binary search. Given starting lower and upper bounds, find the middle point at (lower + higher)/2, and then compare it against your array, and then re-set the bounds accordingly, etc.
However what are the needed differences to adjust the search to find (for a list in ascending order):
Smallest value >= target
Smallest value > target
Largest value <= target
Largest value < target
It seems like each of these cases requires very small tweaks to the algorithm but I can never get them to work right. I try changing inequalities, return conditions, I change how the bounds are updated, but nothing seems consistent.
What are the definitive ways to handle these four cases?
I had exactly the same issue until I figured out loop invariants along with predicates are the best and most consistent way of approaching all binary problems.
Point 1: Think of predicates
In general for all these 4 cases (and also the normal binary search for equality), imagine them as a predicate. So what this means is that some of the values are meeting the predicate and some some failing. So consider for example this array with a target of 5:
[1, 2, 3, 4, 6, 7, 8]. Finding the first number greater than 5 is basically equivalent of finding the first one in this array: [0, 0, 0, 0, 1, 1, 1].
Point 2: Search boundaries inclusive
I like to have both ends always inclusive. But I can see some people like start to be inclusive and end exclusive (on len instead of len -1). I like to have all the elements inside of the array, so when referring to a[mid] I don't think whether that will give me an array out of bound. So my preference: Go inclusive!!!
Point 3: While loop condition <=
So we even want to process the subarray of size 1 in the while loop, and when the while loop finishes there should be no unprocessed element. I really like this logic. It's always solid as a rock. Initially all the elements are not inspected, basically they are unknown. Meaning that everything in the range of [st = 0, to end = len - 1] are not inspected. Then when the while loop finishes, the range of uninspected elements should be array of size 0!
Point 4: Loop invariants
Since we defined start = 0, end = len - 1, invariants will be like this:
Anything left of start is smaller than target.
Anything right of end is greater than or equal to the target.
Point 5: The answer
Once the loop finishes, basically based on the loop invariants anything to the left of start is smaller. So that means that start is the first element greater than or equal to the target.
Equivalently, anything to the right of end is greater than or equal to the target. So that means the answer is also equal to end + 1.
The code:
public int find(int a[], int target){
int start = 0;
int end = a.length - 1;
while (start <= end){
int mid = (start + end) / 2; // or for no overflow start + (end - start) / 2
if (a[mid] < target)
start = mid + 1;
else // a[mid] >= target
end = mid - 1;
}
return start; // or end + 1;
}
variations:
<
It's equivalent of finding the first 0. So basically only return changes.
return end; // or return start - 1;
>
change the if condition to <= and else will be >. No other change.
<=
Same as >, return end; // or return start - 1;
So in general with this model for all the 5 variations (<=, <, >, >=, normal binary search) only the condition in the if changes and the return statement. And figuring those small changes is super easy when you consider the invariants (point 4) and the answer (point 5).
Hope this clarifies for whoever reads this. If anything is unclear of feels like magic please ping me to explain. After understanding this method, everything for binary search should be as clear as day!
Extra point: It would be a good practice to also try including the start but excluding the end. So the array would be initially [0, len). If you can write the invariants, new condition for the while loop, the answer and then a clear code, it means you learnt the concept.
Binary search(at least the way I implement it) relies on a simple property - a predicate holds true for one end of the interval and does not hold true for the other end. I always consider my interval to be closed at one end and opened at the other. So let's take a look at this code snippet:
int beg = 0; // pred(beg) should hold true
int end = n;// length of an array or a value that is guranteed to be out of the interval that we are interested in
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (pred(a[mid])) {
beg = mid;
} else {
end = mid;
}
}
// answer is at a[beg]
This will work for any of the comparisons you define. Simply replace pred with <=target or >=target or <target or >target.
After the cycle exits, a[beg] will be the last element for which the given inequality holds.
So let's assume(like suggested in the comments) that we want to find the largest number for which a[i] <= target. Then if we use predicate a[i] <= target the code will look like:
int beg = 0; // pred(beg) should hold true
int end = n;// length of an array or a value that is guranteed to be out of the interval that we are interested in
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (a[mid] <= target) {
beg = mid;
} else {
end = mid;
}
}
And after the cycle exits, the index that you are searching for will be beg.
Also depending on the comparison you may have to start from the right end of the array. E.g. if you are searching for the largest value >= target, you will do something of the sort of:
beg = -1;
end = n - 1;
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (a[mid] >= target) {
end = mid;
} else {
beg = mid;
}
}
And the value that you are searching for will be with index end. Note that in this case I consider the interval (beg, end] and thus I've slightly modified the starting interval.
The basic binary search is to search the position/value which equals with the target key. While it can be extended to find the minimal position/value which satisfy some condition, or find the maximal position/value which satisfy some condition.
Suppose the array is ascending order, if no satisfied position/value found, return -1.
Code sample:
// find the minimal position which satisfy some condition
private static int getMinPosition(int[] arr, int target) {
int l = 0, r = arr.length - 1;
int ans = -1;
while(l <= r) {
int m = (l + r) >> 1;
// feel free to replace the condition
// here it means find the minimal position that the element not smaller than target
if(arr[m] >= target) {
ans = m;
r = m - 1;
} else {
l = m + 1;
}
}
return ans;
}
// find the maximal position which satisfy some condition
private static int getMaxPosition(int[] arr, int target) {
int l = 0, r = arr.length - 1;
int ans = -1;
while(l <= r) {
int m = (l + r) >> 1;
// feel free to replace the condition
// here it means find the maximal position that the element less than target
if(arr[m] < target) {
ans = m;
l = m + 1;
} else {
r = m - 1;
}
}
return ans;
}
int[] a = {3, 5, 5, 7, 10, 15};
System.out.println(BinarySearchTool.getMinPosition(a, 5));
System.out.println(BinarySearchTool.getMinPosition(a, 6));
System.out.println(BinarySearchTool.getMaxPosition(a, 8));
What you need is a binary search that lets you participate in the process at the last step. The typical binary search would receive (array, element) and produce a value (normally the index or not found). But if you have a modified binary that accept a function to be invoked at the end of the search you can cover all cases.
For example, in Javascript to make it easy to test, the following binary search
function binarySearch(array, el, fn) {
function aux(left, right) {
if (left > right) {
return fn(array, null, left, right);
}
var middle = Math.floor((left + right) / 2);
var value = array[middle];
if (value > el) {
return aux(left, middle - 1);
} if (value < el) {
return aux(middle + 1, right);
} else {
return fn(array, middle, left, right);
}
}
return aux(0, array.length - 1);
}
would allow you to cover each case with a particular return function.
default
function(a, m) { return m; }
Smallest value >= target
function(a, m, l, r) { return m != null ? a[m] : r + 1 >= a.length ? null : a[r + 1]; }
Smallest value > target
function(a, m, l, r) { return (m || r) + 1 >= a.length ? null : a[(m || r) + 1]; }
Largest value <= target
function(a, m, l, r) { return m != null ? a[m] : l - 1 > 0 ? a[l - 1] : null; }
Largest value < target
function(a, m, l, r) { return (m || l) - 1 < 0 ? null : a[(m || l) - 1]; }

Lexographically smallest path in a N*M grid

I came across this in a recent interview.
We are given a N*M grid consisting of numbers and a path in the grid is the nodes you traverse.We are given a constraint that we can only move either right or down in the grid.So given this grid, we need to find the lexographically smallest path,after sorting it, to reach from top left to bottom right point of the grid
Eg. if grid is 2*2
4 3
5 1
then lexographically smallest path as per the question is "1 3 4".
How to do such problem? Code is appreciated. Thanks in advance.
You can use Dynamic programming to solve this problem. Let f(i, j) be the smallest lexicographical path (after sorting the path) from (i, j) to (N, M) moving only right and down. Consider the following recurrence:
f(i, j) = sort( a(i, j) + smallest(f(i + 1, j), f(i, j + 1)))
where a(i, j) is the value in the grid at (i, j), smallest (x, y) returns the smaller lexicographical string between x and y. the + concatenate two strings, and sort(str) sorts the string str in lexical order.
The base case of the recurrence is:
f(N, M) = a(N, M)
Also the recurrence change when i = N or j = M (make sure that you see that).
Consider the following code written in C++:
//-- the 200 is just the array size. It can be modified
string a[200][200]; //-- represent the input grid
string f[200][200]; //-- represent the array used for memoization
bool calculated[200][200]; //-- false if we have not calculate the value before, and true if we have
int N = 199, M = 199; //-- Number of rows, Number of columns
//-- sort the string str and return it
string srt(string &str){
sort(str.begin(), str.end());
return str;
}
//-- return the smallest of x and y
string smallest(string & x, string &y){
for (int i = 0; i < x.size(); i++){
if (x[i] < y[i]) return x;
if (x[i] > y[i]) return y;
}
return x;
}
string solve(int i, int j){
if (i == N && j == M) return a[i][j]; //-- if we have reached the buttom right cell (I assumed the array is 1-indexed
if (calculated[i][j]) return f[i][j]; //-- if we have calculated this before
string ans;
if (i == N) ans = srt(a[i][j] + solve(i, j + 1)); //-- if we are at the buttom boundary
else if (j == M) ans = srt(a[i][j] + solve(i + 1, j)); //-- if we are at the right boundary
else ans = srt(a[i][j] + smallest(solve(i, j + 1), solve(i + 1, j)));
calculated[i][j] = true; //-- to fetch the calculated result in future calls
f[i][j] = ans;
return ans;
}
string calculateSmallestPath(){
return solve(1, 1);
}
You can apply a dynamic programming approach to solve this problem in O(N * M * (N + M)) time and space complexity.
Below I'll consider, that N is the number of rows, M is the number of columns, and top left cell has coordinates (0, 0), first for row and second for column.
Lets for each cell store the lexicographically smallest path ended at this cell in sorted order. The answer for row and column with 0 index is trivial, because there is only one way to reach each of these cells. For the rest of cells you should choose the smallest path for top and left cells and insert the value of current cell.
The algorithm is:
path[0][0] <- a[0][0]
path[i][0] <- insert(a[i][0], path[i - 1][0])
path[0][j] <- insert(a[0][j], path[0][j - 1])
path[i][j] <- insert(a[i][j], min(path[i - 1][j], path[i][j - 1])
If no number is repeated, this can be achieved in O (NM log (NM)) as well.
Intuition:
Suppose I label a grid with upper left corner (a,b) and bottom right corner (c,d) as G(a,b,c,d). Since you've to attain the lexicographically smallest string AFTER sorting the path, the aim should be to find the minimum value every time in G. If this minimum value is attained at, let's say, (i,j), then G(i,b,c,j) and G(a,j,i,d) are rendered useless for the search of our next min (for the path). That is to say, the values for the path we desire would never be in these two grids. Proof? Any location within these grids, if traversed will not let us reach the minimum value in G(a,b,c,d) (the one at (i,j)). And, if we avoid (i,j), the path we build cannot be lexicographically smallest.
So, first we find the min for G(1,1,m,n). Suppose it's at (i,j). Mark the min. We then find out the min in G(1,1,i,j) and G(i,j,m,n) and do the same for them. Keep continuing this way until, at the end, we have m+n-1 marked entries, which will constitute our path. Traverse the original grid G(1,1,m,n) linearly and the report the value if it is marked.
Approach:
To find the min every time in G is costly. What if we map each value in the grid to it's location? - Traverse the grid and maintain a dictionary Dict with the key being the value at (i,j) and the value being the tuple (i,j). At the end, you'll have a list of key value pairs covering all the values in the grid.
Now, we'll be maintaining a list of valid grids in which we will find candidates for our path. The first valid grid will be G(1,1,m,n).
Sort the keys and start iterating from the first value in the sorted key set S.
Maintain a tree of valid grids, T(G), such that for each G(a,b,c,d) in T, G.left = G(a,b,i,j) and G.right = G(i,j,c,d) where (i,j) = location of min val in G(a,b,c,d)
The algorithm now:
for each val in sorted key set S do
(i,j) <- Dict(val)
Grid G <- Root(T)
do while (i,j) in G
if G has no child do
G.left <- G(a,b,i,j)
G.right <- G(i,j,c,d)
else if (i,j) in G.left
G <- G.left
else if (i,j) in G.right
G <- G.right
else
dict(val) <- null
end do
end if-else
end do
end for
for each val in G(1,1,m,n)
if dict(val) not null
solution.append(val)
end if
end for
return solution
The Java code:
class Grid{
int a, b, c, d;
Grid left, right;
Grid(int a, int b, int c, int d){
this.a = a;
this.b = b;
this.c = c;
this.d = d;
left = right = null;
}
public boolean isInGrid(int e, int f){
return (e >= a && e <= c && f >= b && f <= d);
}
public boolean hasNoChild(){
return (left == null && right == null);
}
}
public static int[] findPath(int[][] arr){
int row = arr.length;
int col = arr[0].length;
int[][] index = new int[row*col+1][2];
HashMap<Integer,Point> map = new HashMap<Integer,Point>();
for(int i = 0; i < row; i++){
for(int j = 0; j < col; j++){
map.put(arr[i][j], new Point(i,j));
}
}
Grid root = new Grid(0,0,row-1,col-1);
SortedSet<Integer> keys = new TreeSet<Integer>(map.keySet());
for(Integer entry : keys){
Grid temp = root;
int x = map.get(entry).x, y = map.get(entry).y;
while(temp.isInGrid(x, y)){
if(temp.hasNoChild()){
temp.left = new Grid(temp.a,temp.b,x, y);
temp.right = new Grid(x, y,temp.c,temp.d);
break;
}
if(temp.left.isInGrid(x, y)){
temp = temp.left;
}
else if(temp.right.isInGrid(x, y)){
temp = temp.right;
}
else{
map.get(entry).x = -1;
break;
}
}
}
int[] solution = new int[row+col-1];
int count = 0;
for(int i = 0 ; i < row; i++){
for(int j = 0; j < col; j++){
if(map.get(arr[i][j]).x >= 0){
solution[count++] = arr[i][j];
}
}
}
return solution;
}
The space complexity is constituted by maintenance of dictionary - O(NM) and of the tree - O(N+M). Overall: O(NM)
The time complexity for filling up and then sorting the dictionary - O(NM log(NM)); for checking the tree for each of the NM values - O(NM log(N+M)). Overall - O(NM log(NM)).
Of course, this won't work if values are repeated since then we'd have more than one (i,j)'s for a single value in the grid and the decision to chose which will no longer be satisfied by a greedy approach.
Additional FYI: The problem similar to this I heard about earlier had an additional grid property - there are no values repeating and the numbers are from 1 to NM. In such a case, the complexity could further reduce to O(NM log(N+M)) since instead of a dictionary, you can simply use values in the grid as indices of an array (which won't required sorting.)

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.
I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
// Initial value is irrelevant if sequence is non-empty,
// but keeps compiler happy.
int best = 0;
int count = 0;
foreach (int element in sequence)
{
if (count == 0)
{
best = element;
count = 1;
}
else
{
// Vote current choice up or down
count += (best == element) ? 1 : -1;
}
}
return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
int findLeader(int n, int* x){
int leader = x[0], c = 1, i;
for(i=1; i<n; i++){
if(c == 0){
leader = x[i];
c = 1;
} else {
if(x[i] == leader) c++;
else c--;
}
}
if(c == 0) return NULL;
else {
c = 0;
for(i=0; i<n; i++){
if(x[i] == leader) c++;
}
if(c > n/2) return leader;
else return NULL;
}
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.
This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]
What say folks?
-bhupi
Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.
How about:
randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").
e.g.:
findDuplicateCandidate:
select a fixed size subset.
return the most common element in that subset
if there is no element with more than 1 occurrence repeat.
if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.
My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.
in php---pls check if it's correct
function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values(array); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
$val = $A[$i];
if(in_array($val,$B,true)){//to avoid looping again and again
}else{
if($counts[$val]>$len/2){
return $val;
}
array_push($B, $val);//to avoid looping again and again
}
}
return -1;
}
int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
if (L[i] == candidate && L[pos++] == candidate)
return candidate;
}
if (count > pos)
return candidate;
return (-1);

Resources