Most efficient algorithm to compute set difference? - algorithm

What is the most efficient algorithm to compute the difference between A and B sets when both sets are sorted? How can I implement it with pseudocode?

Use an index in both arrays, and compare the values at those indices. If they are equal, they are not in the difference, and both indices can increment with 1.
Otherwise the least value is only in one of the two sets, and belongs in the difference. In that case only the index of that value should increment with 1.
Here is an implementation in basic JavaScript. It collects all values in one of the following result arrays:
onlyInA: values that only occur in A, not in B
onlyInB: values that only occur in B, not in A
inBoth: values that A and B have in common.
It is assumed that the input arrays are sorted in ascending order.
function difference(a, b) {
let i = 0; j = 0;
let onlyInA = [];
let onlyInB = [];
let inBoth = [];
while (i < a.length || j < b.length) {
if (j >= b.length || a[i] < b[j]) {
onlyInA.push(a[i]);
i++;
} else if (i >= a.length || a[i] > b[j]) {
onlyInB.push(b[j]);
j++;
} else {
inBoth.push(a[i]);
i++;
j++;
}
}
return [onlyInA, onlyInB, inBoth];
}
// Demo run
let a = [2,3,5,7,11,13,17,19,23,29,31,37];
let b = [1,3,7,15,31];
let [onlyInA, onlyInB, inBoth] = difference(a, b);
console.log("only in A:", ...onlyInA);
console.log("only in B:", ...onlyInB);
console.log("in both:", ...inBoth);

Comment on trincot's algorithm.
Is it working correctly? As it seems, there is a risk for out-of-bounds access to the a or b arrays. To check it out, I ported the algorithm to C++. It works fine with the demo data. But when I switch the data contents of a and b (so a becomes the old b and b the old a), I get an out-of-bounds exception indeed.

Related

Find numbers that are superset of another number in a list

A number A is superset of a number B if all bits set on B are also set on A. Or A & B == B.
Given a list of numbers, how can I determine all the numbers that are superset of any other number in said list?
The simple approach is:
for a in list:
for b in list:
if a != b and a & b == b:
print(a + ' is a superset')
But this approach is O(n²). Is there any solution more efficient?
It doesn't matter what are the subsets, only the information about supersets is needed and the list might be sorted.
Here are some improvements if the list is sorted:
var getSupersetList = function(sortedList){
var supersetList = [];
//You don't need to check the smallest element
for (var i = sortedList.length-1; i > 0; i--) {
var biggerElement = sortedList[i];
//You only need to iterate over smaller elements
for(var j = 0; j < i; j++){
var smallerElement = sortedList[j];
//You can get rid of the '!=' check
if((biggerElement & smallerElement) == smallerElement){
supersetList.push(biggerElement);
//You can break after it is determined that it is a subset
break;
}
}
}
//If the smallest two are duplicates (which are a subset by the logic given), add the smallest
if(sortedList.length >= 2 && sortedList[0] == sortedList[1]){
supersetList.push(sortedList[0]);
}
return supersetList;
};

Sort a given array whose elements range from 1 to n , in which one element is missing and one is repeated

I have to sort this array in O(n) time and O(1) space.
I know how to sort an array in O(n) but that doesn't work with missing and repeated numbers. If I find the repeated and missing numbers first (It can be done in O(n)) and then sort , that seems costly.
static void sort(int[] arr)
{
for(int i=0;i<arr.length;i++)
{
if(i>=arr.length)
break;
if(arr[i]-1 == i)
continue;
else
{
while(arr[i]-1 != i)
{
int temp = arr[arr[i]-1];
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
}
}
}
First, you need to find missing and repeated numbers. You do this by solving following system of equations:
Left sums are computed simultaneously by making one pass over array. Right sums are even simpler -- you may use formulas for arithmetic progression to avoid looping. So, now you have system of two equations with two unknowns: missing number m and repeated number r. Solve it.
Next, you "sort" array by filling it with numbers 1 to n left to right, omitting m and duplicating r. Thus, overall algorithm requires only two passes over array.
void sort() {
for (int i = 1; i <= N; ++i) {
while (a[i] != a[a[i]]) {
std::swap(a[i], a[a[i]]);
}
}
for (int i = 1; i <= N; ++i) {
if (a[i] == i) continue;
for (int j = a[i] - 1; j >= i; --j) a[j] = j + 1;
for (int j = a[i] + 1; j <= i; ++j) a[j] = j - 1;
break;
}
}
Explanation:
Let's denote m the missing number and d the duplicated number
Please note in the while loop, the break condition is a[i] != a[a[i]] which covers both a[i] == i and a[i] is a duplicate.
After the first for, every non-duplicate number i is encountered 1-2 time and moved into the i-th position of the array at most 1 time.
The first-found number d is moved to d-th position, at most 1 time
The second d is moved around at most N-1 times and ends up in m-th position because every other i-th slot is occupied by number i
The second outer for locate the first i where a[i] != i. The only i satisfies that is i = m
The 2 inner fors handle 2 cases where m < d and m > d respectively
Full implementation at http://ideone.com/VDuLka
After
int temp = arr[arr[i]-1];
add a check for duplicate in the loop:
if((temp-1) == i){ // found duplicate
...
} else {
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
See if you can figure out the rest of the code.

Linear time algorithm for 2-SUM

Given an integer x and a sorted array a of N distinct integers, design a linear-time algorithm to determine if there exists two distinct indices i and j such that a[i] + a[j] == x
This is type of Subset sum problem
Here is my solution. I don't know if it was known earlier or not. Imagine 3D plot of function of two variables i and j:
sum(i,j) = a[i]+a[j]
For every i there is such j that a[i]+a[j] is closest to x. All these (i,j) pairs form closest-to-x line. We just need to walk along this line and look for a[i]+a[j] == x:
int i = 0;
int j = lower_bound(a.begin(), a.end(), x) - a.begin();
while (j >= 0 && j < a.size() && i < a.size()) {
int sum = a[i]+a[j];
if (sum == x) {
cout << "found: " << i << " " << j << endl;
return;
}
if (sum > x) j--;
else i++;
if (i > j) break;
}
cout << " not found\n";
Complexity: O(n)
think in terms of complements.
iterate over the list, figure out for each item what the number needed to get to X for that number is. stick number and complement into hash. while iterating check to see if number or its complement is in hash. if so, found.
edit: and as I have some time, some pseudo'ish code.
boolean find(int[] array, int x) {
HashSet<Integer> s = new HashSet<Integer>();
for(int i = 0; i < array.length; i++) {
if (s.contains(array[i]) || s.contains(x-array[i])) {
return true;
}
s.add(array[i]);
s.add(x-array[i]);
}
return false;
}
Given that the array is sorted (WLOG in descending order), we can do the following:
Algorithm A_1:
We are given (a_1,...,a_n,m), a_1<...,<a_n.
Put a pointer at the top of the list and one at the bottom.
Compute the sum where both pointers are.
If the sum is greater than m, move the above pointer down.
If the sum is less than m, move the lower pointer up.
If a pointer is on the other (here we assume each number can be employed only once), report unsat.
Otherwise, (an equivalent sum will be found), report sat.
It is clear that this is O(n) since the maximum number of sums computed is exactly n. The proof of correctness is left as an exercise.
This is merely a subroutine of the Horowitz and Sahni (1974) algorithm for SUBSET-SUM. (However, note that almost all general purpose SS algorithms contain such a routine, Schroeppel, Shamir (1981), Howgrave-Graham_Joux (2010), Becker-Joux (2011).)
If we were given an unordered list, implementing this algorithm would be O(nlogn) since we could sort the list using Mergesort, then apply A_1.
First pass search for the first value that is > ceil(x/2). Lets call this value L.
From index of L, search backwards till you find the other operand that matches the sum.
It is 2*n ~ O(n)
This we can extend to binary search.
Search for an element using binary search such that we find L, such that L is min(elements in a > ceil(x/2)).
Do the same for R, but now with L as the max size of searchable elements in the array.
This approach is 2*log(n).
Here's a python version using Dictionary data structure and number complement. This has linear running time(Order of N: O(N)):
def twoSum(N, x):
dict = {}
for i in range(len(N)):
complement = x - N[i]
if complement in dict:
return True
dict[N[i]] = i
return False
# Test
print twoSum([2, 7, 11, 15], 9) # True
print twoSum([2, 7, 11, 15], 3) # False
Iterate over the array and save the qualified numbers and their indices into the map. The time complexity of this algorithm is O(n).
vector<int> twoSum(vector<int> &numbers, int target) {
map<int, int> summap;
vector<int> result;
for (int i = 0; i < numbers.size(); i++) {
summap[numbers[i]] = i;
}
for (int i = 0; i < numbers.size(); i++) {
int searched = target - numbers[i];
if (summap.find(searched) != summap.end()) {
result.push_back(i + 1);
result.push_back(summap[searched] + 1);
break;
}
}
return result;
}
I would just add the difference to a HashSet<T> like this:
public static bool Find(int[] array, int toReach)
{
HashSet<int> hashSet = new HashSet<int>();
foreach (int current in array)
{
if (hashSet.Contains(current))
{
return true;
}
hashSet.Add(toReach - current);
}
return false;
}
Note: The code is mine but the test file was not. Also, this idea for the hash function comes from various readings on the net.
An implementation in Scala. It uses a hashMap and a custom (yet simple) mapping for the values. I agree that it does not makes use of the sorted nature of the initial array.
The hash function
I fix the bucket size by dividing each value by 10000. That number could vary, depending on the size you want for the buckets, which can be made optimal depending on the input range.
So for example, key 1 is responsible for all the integers from 1 to 9.
Impact on search scope
What that means, is that for a current value n, for which you're looking to find a complement c such as n + c = x (x being the element you're trying ton find a 2-SUM of), there is only 3 possibles buckets in which the complement can be:
-key
-key + 1
-key - 1
Let's say that your numbers are in a file of the following form:
0
1
10
10
-10
10000
-10000
10001
9999
-10001
-9999
10000
5000
5000
-5000
-1
1000
2000
-1000
-2000
Here's the implementation in Scala
import scala.collection.mutable
import scala.io.Source
object TwoSumRed {
val usage = """
Usage: scala TwoSumRed.scala [filename]
"""
def main(args: Array[String]) {
val carte = createMap(args) match {
case None => return
case Some(m) => m
}
var t: Int = 1
carte.foreach {
case (bucket, values) => {
var toCheck: Array[Long] = Array[Long]()
if (carte.contains(-bucket)) {
toCheck = toCheck ++: carte(-bucket)
}
if (carte.contains(-bucket - 1)) {
toCheck = toCheck ++: carte(-bucket - 1)
}
if (carte.contains(-bucket + 1)) {
toCheck = toCheck ++: carte(-bucket + 1)
}
values.foreach { v =>
toCheck.foreach { c =>
if ((c + v) == t) {
println(s"$c and $v forms a 2-sum for $t")
return
}
}
}
}
}
}
def createMap(args: Array[String]): Option[mutable.HashMap[Int, Array[Long]]] = {
var carte: mutable.HashMap[Int,Array[Long]] = mutable.HashMap[Int,Array[Long]]()
if (args.length == 1) {
val filename = args.toList(0)
val lines: List[Long] = Source.fromFile(filename).getLines().map(_.toLong).toList
lines.foreach { l =>
val idx: Int = math.floor(l / 10000).toInt
if (carte.contains(idx)) {
carte(idx) = carte(idx) :+ l
} else {
carte += (idx -> Array[Long](l))
}
}
Some(carte)
} else {
println(usage)
None
}
}
}
int[] b = new int[N];
for (int i = 0; i < N; i++)
{
b[i] = x - a[N -1 - i];
}
for (int i = 0, j = 0; i < N && j < N;)
if(a[i] == b[j])
{
cout << "found";
return;
} else if(a[i] < b[j])
i++;
else
j++;
cout << "not found";
Here is a linear time complexity solution O(n) time O(1) space
public void twoSum(int[] arr){
if(arr.length < 2) return;
int max = arr[0] + arr[1];
int bigger = Math.max(arr[0], arr[1]);
int smaller = Math.min(arr[0], arr[1]);
int biggerIndex = 0;
int smallerIndex = 0;
for(int i = 2 ; i < arr.length ; i++){
if(arr[i] + bigger <= max){ continue;}
else{
if(arr[i] > bigger){
smaller = bigger;
bigger = arr[i];
biggerIndex = i;
}else if(arr[i] > smaller)
{
smaller = arr[i];
smallerIndex = i;
}
max = bigger + smaller;
}
}
System.out.println("Biggest sum is: " + max + "with indices ["+biggerIndex+","+smallerIndex+"]");
}
Solution
We need array to store the indices
Check if the array is empty or contains less than 2 elements
Define the start and the end point of the array
Iterate till condition is met
Check if the sum is equal to the target. If yes get the indices.
If condition is not met then traverse left or right based on the sum value
Traverse to the right
Traverse to the left
For more info :[http://www.prathapkudupublog.com/2017/05/two-sum-ii-input-array-is-sorted.html
Credit to leonid
His solution in java, if you want to give it a shot
I removed the return, so if the array is sorted, but DOES allow duplicates, it still gives pairs
static boolean cpp(int[] a, int x) {
int i = 0;
int j = a.length - 1;
while (j >= 0 && j < a.length && i < a.length) {
int sum = a[i] + a[j];
if (sum == x) {
System.out.printf("found %s, %s \n", i, j);
// return true;
}
if (sum > x) j--;
else i++;
if (i > j) break;
}
System.out.println("not found");
return false;
}
The classic linear time two-pointer solution does not require hashing so can solve related problems such as approximate sum (find closest pair sum to target).
First, a simple n log n solution: walk through array elements a[i], and use binary search to find the best a[j].
To get rid of the log factor, use the following observation: as the list is sorted, iterating through indices i gives a[i] is increasing, so any corresponding a[j] is decreasing in value and in index j. This gives the two-pointer solution: start with indices lo = 0, hi = N-1 (pointing to a[0] and a[N-1]). For a[0], find the best a[hi] by decreasing hi. Then increment lo and for each a[lo], decrease hi until a[lo] + a[hi] is the best. The algorithm can stop when it reaches lo == hi.

3-PARTITION problem

here is another dynamic programming question (Vazirani ch6)
Consider the following 3-PARTITION
problem. Given integers a1...an, we
want to determine whether it is
possible to partition of {1...n} into
three disjoint subsets I, J, K such
that
sum(I) = sum(J) = sum(K) = 1/3*sum(ALL)
For example, for input (1; 2; 3; 4; 4;
5; 8) the answer is yes, because there
is the partition (1; 8), (4; 5), (2;
3; 4). On the other hand, for input
(2; 2; 3; 5) the answer is no. Devise
and analyze a dynamic programming
algorithm for 3-PARTITION that runs in
time poly- nomial in n and (Sum a_i)
How can I solve this problem? I know 2-partition but still can't solve it
It's easy to generalize 2-sets solution for 3-sets case.
In original version, you create array of boolean sums where sums[i] tells whether sum i can be reached with numbers from the set, or not. Then, once array is created, you just see if sums[TOTAL/2] is true or not.
Since you said you know old version already, I'll describe only difference between them.
In 3-partition case, you keep array of boolean sums, where sums[i][j] tells whether first set can have sum i and second - sum j. Then, once array is created, you just see if sums[TOTAL/3][TOTAL/3] is true or not.
If original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then original version isn't strictly polynomial too :)
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set, and A be its total sum, then let S'=union({A/2},S).
Hence, perform a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}, say it's set Z, then X and Y is a 2-partition.
The witnesses of 3-partition on S' is the witnesses of 2-partition on S, thus 2-partition reduces to 3-partition.
If this problem is to be solvable; then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3 - remove the number from that partion, and you've found I. For the other partition, run 2-partition again, to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean which represents the fact whether up to the current iteration the numbers permit division into three multisets that have sums i, j, k. I.e., for any R(i,j,k) and next number n in the next state R' it holds that R'(i+n,j,k) and R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the excersize) depends on the magnitude of the input numbers; this is a pseudo-polynomialtime algorithm. Nikita's solution is conceptually similar but more efficient than this solution since it doesn't track the third set's sum: that's unnecessary since you can trivially compute it.
As I have answered in same another question like this, the C++ implementation would look something like this:
int partition3(vector<int> &A)
{
int sum = accumulate(A.begin(), A.end(), 0);
if (sum % 3 != 0)
{
return false;
}
int size = A.size();
vector<vector<int>> dp(sum + 1, vector<int>(sum + 1, 0));
dp[0][0] = true;
// process the numbers one by one
for (int i = 0; i < size; i++)
{
for (int j = sum; j >= 0; --j)
{
for (int k = sum; k >= 0; --k)
{
if (dp[j][k])
{
dp[j + A[i]][k] = true;
dp[j][k + A[i]] = true;
}
}
}
}
return dp[sum / 3][sum / 3];
}
Let's say you want to partition the set $X = {x_1, ..., x_n}$ in $k$ partitions.
Create a $ n \times k $ table. Assume the cost $M[i,j]$ be the maximum sum of $i$ elements in $j$ partitions. Just recursively use the following optimality criterion to fill it:
M[n,k] = min_{i\leq n} max ( M[i, k-1], \sum_{j=i+1}^{n} x_i )
Using these initial values for the table:
M[i,1] = \sum_{j=1}^{i} x_i and M[1,j] = x_j
The running time is $O(kn^2)$ (polynomial )
Create a three dimensional array, where size is count of elements, and part is equal to to sum of all elements divided by 3. So each cell of array[seq][sum1][sum2] tells can you create sum1 and sum2 using max seq elements from given array A[] or not. So compute all values of array, result will be in cell array[using all elements][sum of all element / 3][sum of all elements / 3], if you can create two sets without crossing equal to sum/3, there will be third set.
Logic of checking: exlude A[seq] element to third sum(not stored), check cell without element if it has same two sums; OR include to sum1 - if it is possible to get two sets without seq element, where sum1 is smaller by value of element seq A[seq], and sum2 isn't changed; OR include to sum2 check like previous.
int partition3(vector<int> &A)
{
int part=0;
for (int a : A)
part += a;
if (part%3)
return 0;
int size = A.size()+1;
part = part/3+1;
bool array[size][part][part];
//sequence from 0 integers inside to all inside
for(int seq=0; seq<size; seq++)
for(int sum1=0; sum1<part; sum1++)
for(int sum2=0;sum2<part; sum2++) {
bool curRes;
if (seq==0)
if (sum1 == 0 && sum2 == 0)
curRes = true;
else
curRes= false;
else {
int curInSeq = seq-1;
bool excludeFrom = array[seq-1][sum1][sum2];
bool includeToSum1 = (sum1>=A[curInSeq]
&& array[seq-1][sum1-A[curInSeq]][sum2]);
bool includeToSum2 = (sum2>=A[curInSeq]
&& array[seq-1][sum1][sum2-A[curInSeq]]);
curRes = excludeFrom || includeToSum1 || includeToSum2;
}
array[seq][sum1][sum2] = curRes;
}
int result = array[size-1][part-1][part-1];
return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
int sum = 0;
for (int i = 0; i < A.size(); i++) {
sum += A[i];
}
if (sum % 3 != 0) {
return false;
}
vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
for (int i = 1; i <= A.size(); i++) {
for (int j = 0; j <= sum / 3; j++) {
for (int k = 0; k <= sum / 3; k++) {
E[i][j][k] = E[i - 1][j][k];
if (A[i - 1] <= k) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
}
if (A[i - 1] <= j) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
}
}
}
}
return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.
I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
// Initial value is irrelevant if sequence is non-empty,
// but keeps compiler happy.
int best = 0;
int count = 0;
foreach (int element in sequence)
{
if (count == 0)
{
best = element;
count = 1;
}
else
{
// Vote current choice up or down
count += (best == element) ? 1 : -1;
}
}
return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
int findLeader(int n, int* x){
int leader = x[0], c = 1, i;
for(i=1; i<n; i++){
if(c == 0){
leader = x[i];
c = 1;
} else {
if(x[i] == leader) c++;
else c--;
}
}
if(c == 0) return NULL;
else {
c = 0;
for(i=0; i<n; i++){
if(x[i] == leader) c++;
}
if(c > n/2) return leader;
else return NULL;
}
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.
This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]
What say folks?
-bhupi
Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.
How about:
randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").
e.g.:
findDuplicateCandidate:
select a fixed size subset.
return the most common element in that subset
if there is no element with more than 1 occurrence repeat.
if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.
My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.
in php---pls check if it's correct
function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values(array); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
$val = $A[$i];
if(in_array($val,$B,true)){//to avoid looping again and again
}else{
if($counts[$val]>$len/2){
return $val;
}
array_push($B, $val);//to avoid looping again and again
}
}
return -1;
}
int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
if (L[i] == candidate && L[pos++] == candidate)
return candidate;
}
if (count > pos)
return candidate;
return (-1);

Resources