How to find minimum positive contiguous sub sequence in O(n) time? - algorithm

We have an algorithm for finding the maximum-sum contiguous subsequence of a given sequence in O(n) time. Can anybody suggest a similar algorithm for finding the minimum positive sum of a contiguous subsequence?
For example:
If the given sequence is 1,2,3,4,5, the answer should be 1.
[5,-4,3,5,4] -> 1, the minimum positive sum, which comes from the elements [5,-4].

There cannot be such an algorithm. The lower bound for this problem is Ω(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray sum).
We want to find out whether an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this, construct the array of differences between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm on it. The sum of any contiguous run of A' telescopes to A[j] - A[i] for some i < j, so the original array has only distinct elements if and only if the minimum non-negative subarray sum of A' is greater than 0.
If we had an O(n) algorithm for your problem, we would have an O(n) algorithm for the element distinctness problem, which is known to require Ω(n log n) time in comparison-based (algebraic decision tree) models of computation.
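The reduction above can be tried out with a small sketch. The function names are made up for illustration, and the inner solver is a deliberately naive O(n^2) stand-in for the hypothetical fast algorithm; only the wrapping step is the point here.

```python
def min_nonnegative_subarray_sum(a):
    """Brute-force stand-in for the hypothetical fast algorithm (O(n^2))."""
    best = None
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            if total >= 0 and (best is None or total < best):
                best = total
    return best  # None when every subarray sum is negative


def all_distinct(a):
    """Element distinctness via the difference array, as in the reduction."""
    diffs = [a[k + 1] - a[k] for k in range(len(a) - 1)]
    # The run diffs[i..j] sums to a[j+1] - a[i]; a sum of 0 means a repeated value.
    return min_nonnegative_subarray_sum(diffs) != 0
```

For the array from the answer, the run -5 + 7 - 2 sums to 0, which exposes the repeated 2.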

We can have an O(n log n) algorithm as follows:
Assume we have an array prefix, where prefix[i] stores the sum of A from index 0 to i, so the sum of the sub-array (i, j) is prefix[j] - prefix[i - 1] (taking prefix[-1] = 0 for the empty prefix).
Thus, to find the minimum positive sub-array sum ending at index j, we need the maximum value prefix[x] that is less than prefix[j] with x < j. We can find that element in O(log n) time if we keep the earlier prefix sums in a balanced binary search tree.
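Pseudo code (Java-like; a java.util.TreeSet serves as the balanced search tree):
long[] prefix = new long[A.length];
prefix[0] = A[0];
for (int i = 1; i < A.length; i++)
    prefix[i] = A[i] + prefix[i - 1];
long result = Long.MAX_VALUE;
TreeSet<Long> tree = new TreeSet<>();
tree.add(0L); // the empty prefix, so that sub-arrays starting at index 0 are covered
for (int i = 0; i < A.length; i++) {
    // greatest earlier prefix sum strictly less than prefix[i], or null if none
    Long v = tree.lower(prefix[i]);
    if (v != null)
        result = Math.min(result, prefix[i] - v);
    tree.add(prefix[i]);
}
// result is still Long.MAX_VALUE if no sub-array has a positive sum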
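The prefix-sum idea above can be sketched in Python as follows. This is only an illustration under one loud assumption: the standard library has no balanced tree, so a sorted list with `bisect` stands in for it. Lookups are O(log n), but inserting into a list is O(n), so the sketch shows the logic rather than the O(n log n) bound. The function name is hypothetical.

```python
import bisect


def min_positive_subarray_sum(a):
    """Smallest positive sum of a contiguous sub-array, or None if there is none."""
    seen = [0]      # sorted prefix sums seen so far; 0 is the empty prefix
    prefix = 0
    best = None
    for x in a:
        prefix += x
        # seen[:pos] holds exactly the earlier prefix sums strictly below prefix.
        pos = bisect.bisect_left(seen, prefix)
        if pos > 0:
            candidate = prefix - seen[pos - 1]
            if best is None or candidate < best:
                best = candidate
        bisect.insort(seen, prefix)  # O(n) here; a real tree would make it O(log n)
    return best
```

Seeding the structure with 0 is what lets sub-arrays that start at index 0 be found without a separate check on single elements.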

I believe there's an O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: the work per element depends on the (input) values to be processed, so it behaves linearly only when those values are small relative to n; see the remarks in the code.
private int GetMinimumPositiveContiguousSubsequence(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the minimum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
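if (value > 0)
{
if (minPositive == 0 || value < minPositive)
minPositive = value;
}
// Braces above keep this 'else' attached to the outer 'if'; otherwise the first negative is never recorded.
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;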
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n) as long as the scanned range stays small) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSequence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
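bool[] initialSequence = new bool[maxSequence + offset + 1];
// Register the empty prefix (cumulative value 0) so that sequences starting at element #0 are considered, e.g. 3,-2 -> 1.
initialSequence[offset] = true;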
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}

Related

Give a pseudocode for an algorithm that, given a list of n integers from the set {0, 1,

Problem statement:
Give a pseudocode for an algorithm that, given a list of n integers from the set {0, 1, . . . , k−1},
preprocesses its input to extract and store information that makes it possible to answer any query asking
how many of the n integers fall in the range [a..b] (with a and b being input parameters to the query) in
O(1) time. Explain how your algorithm works.
The preprocessing time should be O(n + k) in the worst case. Provide an argument showing that your
preprocessing algorithm meets that bound.
My attempt:
Counting Sort Pseudo Code
function countingSort(array, min, max)
    count: array of (max - min + 1) elements // max is highest number, min is lowest
    initialize count with 0 // set count = 0
    for each number in array do
        count[number - min] := count[number - min] + 1 // slot for a value is (value - min); add one to it
    done
    z := 0
    for i from min to max do
        while (count[i - min] > 0) do
            array[z] := i
            z := z + 1
            count[i - min] := count[i - min] - 1
        done
    done
Find Pseudo Code
find(a, b)
??
Time Complexity Analysis:
Counting Sort takes O(k) time to initialize the count array and O(n) time to read in the numbers and increment the appropriate elements of count. It takes another O(k) to walk the count array and another O(n) to write the numbers back into the list, for a total runtime of O(n+k).
Question:
The only problem I am having is that I do not know how to report back to the user the number of integers that lie in the range they have chosen [a..b] in O(1) time. The only way I can think of retrieving that information is by looping through my array of sorted integers with a counter that is incremented each time we find an element that is >= a and <= b. Also, should I include the actual numbers a and b in my count, or should I just count the numbers strictly between them? The problem with looping through the array is that it requires a for loop and is O(n). Any help would be greatly appreciated.
The answer was trivial, I just didn't think about it. After the counting pass I turn the counts into cumulative counts, so that count[v] holds how many integers are <= v. Then all I have to do is take the difference at the two ends of the range asked for by the user. So for example
find(a,b)
numberofIntegersBetweenAandB = count[b] - count[a - 1] // treat count[-1] as 0
Working C++ example. Since the goal here is pseudocode, there are no error checks.
int * GenerateSums(int a[], size_t n, int min, int max)
{
size_t k = max + 2 - min;
int *sums = new int[k];
for(size_t i = 0; i < k; i++) // clear sums
sums[i] = 0;
for(size_t i = 0; i < n; i++) // set number of instances
sums[1+a[i]-min]++;
for(size_t i = 1; i < k; i++) // convert to cumulative sums
sums[i] += sums[i-1];
return sums;
}
int CountInRange(int sums[], int a, int b)
{
return sums[b+1] - sums[a];
}
int main()
{
int a[] = {4,0,3,4,2,4,1,4,3,4,3,2,4,2,3,1};
int *sums = GenerateSums(a, sizeof(a)/sizeof(a[0]), 0, 4);
int cnt;
cnt = CountInRange(sums, 0, 0); // returns 1
cnt = CountInRange(sums, 3, 4); // returns 10
cnt = CountInRange(sums, 0, 4); // returns 16
delete[] sums;
return 0;
}
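The same preprocessing can be sketched in Python for values drawn from {0, ..., k-1}. The function names are illustrative, not taken from the posts above. The table is shifted by one slot, as in the C++ example, so that a query needs no special case for a = 0.

```python
def build_prefix_counts(values, k):
    """prefix[v] = number of input values strictly less than v, for v in 0..k."""
    prefix = [0] * (k + 1)
    for v in values:              # O(n): histogram, shifted by one slot
        prefix[v + 1] += 1
    for v in range(1, k + 1):     # O(k): running total
        prefix[v] += prefix[v - 1]
    return prefix


def count_in_range(prefix, a, b):
    """Number of values in [a..b], inclusive, in O(1)."""
    return prefix[b + 1] - prefix[a]
```

Preprocessing is one O(n) pass plus one O(k) pass, which gives the required O(n + k) bound; each query is a single subtraction.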

Number of increasing sequences using BIT

For example, given the sequence 1 3 2 4, I have to find the number of increasing sequences.
I came to know about the BIT (binary indexed tree) algorithm, which gives me an O(n log n) solution as compared to O(n^2).
The code is as follows
void update(int idx ,int val){
while (idx <= MaxVal){
tree[idx] += val;
idx += (idx & -idx);
}
}
To read
int read(int idx){
int sum = 0;
while (idx > 0){
sum += tree[idx];
idx -= (idx & -idx);
}
return sum;
}
I can't understand how they are using the BIT here. Can you please help me?
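A binary indexed tree's read(idx) returns the sum of the values stored at positions 1..idx.
So, insert the elements one by one, in input order (index 0 to n - 1).
For each element, we need to know how many increasing sequences end at a smaller value that has already been added to the BIT. Assume that this number is x. Each of those can be extended by the current element, and the element on its own is also a sequence, so the number of increasing sequences that end at this element is x + 1. (Counting only how many smaller values were seen and using 2^x overcounts, because not every subset of the smaller values is itself increasing: in 1 3 2 4, the pair 3 2 is not.)
After calculating the number of sequences that end at this element, we add that number to the BIT at the element's position.
Pseudo code:
long result = 0;
BIT tree = //initialize BIT tree with zeros (tree[] should hold long values)
for(int i = 0; i < n; i++){
    long endingHere = tree.read(data[i] - 1) + 1; // sequences ending at a smaller value, plus data[i] alone
    result += endingHere;
    tree.update(data[i], endingHere);
}
For 1 3 2 4 the per-element counts are 1, 2, 2, 6, so the result is 11.
As the update and read functions have O(log MaxVal) time complexity, the above algorithm runs in O(n log n) once the values are compressed to the range 1..n.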
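A runnable sketch of the BIT approach in Python follows. It counts, for each element, one plus the number of strictly increasing subsequences that end at a smaller earlier value, and stores that count in the tree. The coordinate compression step and the function name are additions for illustration; they are assumptions, not part of the posted code.

```python
def count_increasing_subsequences(data):
    """Number of non-empty strictly increasing subsequences of data."""
    # Coordinate-compress values to ranks 1..m so they can index the tree.
    rank = {v: r for r, v in enumerate(sorted(set(data)), start=1)}
    tree = [0] * (len(rank) + 1)

    def update(idx, val):
        while idx < len(tree):
            tree[idx] += val
            idx += idx & -idx

    def read(idx):
        total = 0
        while idx > 0:
            total += tree[idx]
            idx -= idx & -idx
        return total

    result = 0
    for x in data:
        r = rank[x]
        ending_here = read(r - 1) + 1   # extend shorter ones, or start fresh at x
        result += ending_here
        update(r, ending_here)
    return result
```

Because `read(r - 1)` only looks at strictly smaller ranks, equal values never extend each other.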

3-PARTITION problem

Here is another dynamic programming question (Vazirani ch. 6):
Consider the following 3-PARTITION problem. Given integers a1...an, we want to determine whether it is possible to partition {1...n} into three disjoint subsets I, J, K such that
sum(I) = sum(J) = sum(K) = 1/3*sum(ALL)
For example, for input (1; 2; 3; 4; 4; 5; 8) the answer is yes, because there is the partition (1; 8), (4; 5), (2; 3; 4). On the other hand, for input (2; 2; 3; 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and (Sum a_i).
How can I solve this problem? I know 2-partition but still can't solve it.
It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of booleans sums, where sums[i] tells whether sum i can be reached with numbers from the set or not. Then, once the array is filled, you just check whether sums[TOTAL/2] is true.
Since you said you know the old version already, I'll describe only the difference between them.
In the 3-partition case, you keep a two-dimensional array of booleans sums, where sums[i][j] tells whether the first set can have sum i while the second has sum j. Then, once the array is filled, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word (it is pseudo-polynomial, i.e. polynomial in the magnitude of the numbers), but then the original version isn't strictly polynomial either :)
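A minimal Python sketch of the two-dimensional table described above, assuming the inputs are positive integers. The function name is made up for illustration, and the table is capped at TOTAL/3 in each dimension since larger sums can never lead to a solution.

```python
def can_partition3(a):
    """True if a can be split into three disjoint groups with equal sums."""
    total = sum(a)
    if total % 3 != 0:
        return False
    t = total // 3
    # sums[i][j] is True when two disjoint subsets with sums i and j exist.
    sums = [[False] * (t + 1) for _ in range(t + 1)]
    sums[0][0] = True
    for x in a:
        # Iterate downwards so cells set for x are not read again in the same pass.
        for i in range(t, -1, -1):
            for j in range(t, -1, -1):
                if sums[i][j]:
                    if i + x <= t:
                        sums[i + x][j] = True
                    if j + x <= t:
                        sums[i][j + x] = True
    return sums[t][t]
```

The third group is implicit: if two disjoint groups each reach TOTAL/3, the leftover numbers sum to TOTAL/3 as well.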
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set and A its total sum, then let S' = union({A/2}, S).
Then performing a 3-partition on the set S' yields three sets X, Y, Z, each with sum A/2.
Among X, Y, Z, one must contain the added element A/2 and (for positive numbers) nothing else; say it's Z. Then X and Y form a 2-partition of S.
A witness of 3-partition on S' is a witness of 2-partition on S, thus 2-partition reduces to 3-partition.
If this problem is to be solvable, then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3; remove the number from that partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect. The 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K); however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which represents whether the numbers seen up to the current iteration can be divided into three multisets that have sums i, j, k. I.e., for any R(i,j,k) that holds and next number n, in the next state R' it holds that R'(i+n,j,k) and R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient than this one since it doesn't track the third set's sum: that's unnecessary since you can trivially compute it.
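The "Try 2" state map R(i,j,k) can be sketched with a Python set of reachable triples. This is an illustration only: the function name is hypothetical, the inputs are assumed to be non-negative integers, and states whose sums exceed sum(ALL)/3 are pruned because they can never reach the goal.

```python
def can_partition3_states(a):
    """Track every reachable (sum_I, sum_J, sum_K) triple, one number at a time."""
    total = sum(a)
    if total % 3 != 0:
        return False
    t = total // 3
    states = {(0, 0, 0)}
    for n in a:
        next_states = set()
        for i, j, k in states:
            # The number n must go into exactly one of the three groups.
            for cand in ((i + n, j, k), (i, j + n, k), (i, j, k + n)):
                if max(cand) <= t:
                    next_states.add(cand)
        states = next_states
    return (t, t, t) in states
```

Since i + j + k is fixed after each step, the third coordinate is redundant, which is why a two-dimensional table suffices in the more economical variant.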
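As I answered in another question like this, the C++ implementation would look something like this:
#include <numeric>
#include <vector>
using namespace std;
bool partition3(const vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int target = sum / 3;
    // dp[j][k] is true if two disjoint subsets with sums j and k can be formed;
    // sums above target are never useful, so the table stops there
    vector<vector<char>> dp(target + 1, vector<char>(target + 1, 0));
    dp[0][0] = true;
    // process the numbers one by one (assumed positive)
    for (int a : A)
    {
        for (int j = target; j >= 0; --j)
        {
            for (int k = target; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    // bounds checks keep the writes inside the table
                    if (j + a <= target) dp[j + a][k] = true;
                    if (k + a <= target) dp[j][k + a] = true;
                }
            }
        }
    }
    return dp[target][target];
}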
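Let's say you want to partition the sequence $X = (x_1, ..., x_n)$ into $k$ contiguous ranges so that the largest range sum is as small as possible (this is the linear partition problem, a different question from choosing arbitrary subsets as in 3-PARTITION).
Create an $n \times k$ table. Let the cost $M[i,j]$ be the smallest achievable value of the largest range sum when the first $i$ elements are split into $j$ ranges. Recursively use the following optimality criterion to fill it:
M[n,k] = \min_{i \leq n} \max ( M[i, k-1], \sum_{j=i+1}^{n} x_j )
Using these initial values for the table:
M[i,1] = \sum_{j=1}^{i} x_j and M[1,j] = x_1
The running time is $O(kn^2)$ (polynomial).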
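The table recurrence above can be sketched in Python as follows, with prefix sums supplying the range sums. This is a sketch under assumptions: the function name is invented, the inputs are non-negative, and a range is allowed to be empty so that a single element split into several ranges still costs that element. It solves the contiguous (linear) partition problem, not subset-based 3-PARTITION.

```python
def linear_partition_cost(x, k):
    """Smallest possible largest range sum when x is cut into k contiguous ranges."""
    n = len(x)
    prefix = [0] * (n + 1)
    for i, v in enumerate(x):
        prefix[i + 1] = prefix[i] + v
    inf = float("inf")
    # M[i][j]: best cost for the first i elements using j ranges.
    M = [[inf] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][1] = prefix[i]
    for j in range(2, k + 1):
        for i in range(1, n + 1):
            for s in range(1, i + 1):   # first s elements go to the first j-1 ranges
                cost = max(M[s][j - 1], prefix[i] - prefix[s])
                if cost < M[i][j]:
                    M[i][j] = cost
    return M[n][k]
```

The three nested loops give the O(kn^2) running time quoted above.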
Create a three-dimensional array whose first dimension (size) is the count of elements plus one and whose other two dimensions (part) are the sum of all elements divided by 3, plus one. Each cell array[seq][sum1][sum2] tells whether you can create sum1 and sum2 using at most the first seq elements of the given array A[]. Compute all values of the array; the result will be in cell array[using all elements][sum of all elements / 3][sum of all elements / 3]: if you can create two non-overlapping sets that each sum to sum/3, the remaining elements form the third set.
Logic of the check: exclude the A[seq] element (it goes to the third sum, which is not stored) and check whether the cell without that element already has the same two sums; OR include it in sum1, which is possible if without the element you can get two sets where sum1 is smaller by the value of A[seq] and sum2 is unchanged; OR include it in sum2, checked like the previous case.
int partition3(vector<int> &A)
{
int part=0;
for (int a : A)
part += a;
if (part%3)
return 0;
int size = A.size()+1;
part = part/3+1;
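// std::vector instead of a variable-length array, which is not standard C++ and can overflow the stack
vector<vector<vector<bool>>> array(size, vector<vector<bool>>(part, vector<bool>(part)));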
//sequence from 0 integers inside to all inside
for(int seq=0; seq<size; seq++)
for(int sum1=0; sum1<part; sum1++)
for(int sum2=0;sum2<part; sum2++) {
bool curRes;
if (seq==0)
if (sum1 == 0 && sum2 == 0)
curRes = true;
else
curRes= false;
else {
int curInSeq = seq-1;
bool excludeFrom = array[seq-1][sum1][sum2];
bool includeToSum1 = (sum1>=A[curInSeq]
&& array[seq-1][sum1-A[curInSeq]][sum2]);
bool includeToSum2 = (sum2>=A[curInSeq]
&& array[seq-1][sum1][sum2-A[curInSeq]]);
curRes = excludeFrom || includeToSum1 || includeToSum2;
}
array[seq][sum1][sum2] = curRes;
}
int result = array[size-1][part-1][part-1];
return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
int sum = 0;
for (int i = 0; i < A.size(); i++) {
sum += A[i];
}
if (sum % 3 != 0) {
return false;
}
vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
for (int i = 1; i <= A.size(); i++) {
for (int j = 0; j <= sum / 3; j++) {
for (int k = 0; k <= sum / 3; k++) {
E[i][j][k] = E[i - 1][j][k];
if (A[i - 1] <= k) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
}
if (A[i - 1] <= j) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
}
}
}
}
return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.
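To make the differencing idea concrete, here is a sketch of the basic two-way Karmarkar-Karp heuristic only. It is not Korf's complete algorithm and not the three-way extension; it returns just the final difference between the two sides, without reconstructing them. The function name is made up for illustration.

```python
import heapq


def karmarkar_karp_difference(nums):
    """Difference between the two sides found by repeated largest-pair differencing."""
    heap = [-x for x in nums]       # max-heap via negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Committing the two largest blocks to opposite sides leaves their difference.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0
```

On 8, 7, 6, 5, 4 the heuristic ends with a difference of 2 although a perfect split (8+7 versus 6+5+4) exists, which is the kind of gap the complete search variant is meant to close.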

Find the top k sums of two sorted arrays

You are given two sorted arrays, of sizes n and m respectively. Your task (should you choose to accept it), is to output the largest k sums of the form a[i]+b[j].
A O(k log k) solution can be found here. There are rumors of a O(k) or O(n) solution. Does one exist?
I found the responses at your link mostly vague and poorly structured. Here's a start with an O(k * log(k)) algorithm.
Suppose they are sorted in decreasing order, with a of size m and b of size n. Imagine you computed the m*n matrix of the sums as follows:
for i from 0 to m - 1
    for j from 0 to n - 1
        sums[i][j] = a[i] + b[j]
In this matrix, values monotonically decrease down and to the right. With that in mind, here is an algorithm which performs a graph search through this matrix in order of decreasing sums.
q : priority queue (decreasing) := empty priority queue
add (0, 0) to q with priority a[0] + b[0]
while k > 0 and q is not empty:
    k--
    x := pop q
    output x
    (i, j) : tuple of int,int := position of x
    if i + 1 < m:
        add (i + 1, j) to q with priority a[i + 1] + b[j]
    if j + 1 < n:
        add (i, j + 1) to q with priority a[i] + b[j + 1]
Analysis:
The loop is executed k times.
There is one pop operation per iteration.
There are up to two insert operations per iteration.
The maximum size of the priority queue is O(k).
The priority queue can be implemented with a binary heap giving log(size) pop and insert.
Therefore this algorithm is O(k * log(k)).
Note that the general priority queue abstract data type needs to be modified to ignore duplicate entries. Alternately, you could maintain a separate set structure that first checks for membership in the set before adding to the queue, and removes from the set after popping from the queue. Neither of these ideas would worsen the time or space complexity.
I could write this up in Java if there's any interest.
Edit: fixed complexity. There is an algorithm which has the complexity I described, but it is slightly different from this one. You would have to take care to avoid adding certain nodes. My simple solution adds many nodes to the queue prematurely.
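The matrix search above, with a separate set to suppress duplicate positions, can be sketched in Python like this. The function name is illustrative; both arrays are assumed to be sorted in decreasing order.

```python
import heapq


def top_k_sums(a, b, k):
    """Return the k largest sums a[i] + b[j], largest first."""
    if not a or not b:
        return []
    heap = [(-(a[0] + b[0]), 0, 0)]   # negate sums: heapq is a min-heap
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        # The only new candidates are the cells below and to the right.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return out
```

Each pop adds at most two entries, so the heap never holds more than about 2k items and the whole run costs O(k log k).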
private static class FrontierElem implements Comparable<FrontierElem> {
int value;
int aIdx;
int bIdx;
public FrontierElem(int value, int aIdx, int bIdx) {
this.value = value;
this.aIdx = aIdx;
this.bIdx = bIdx;
}
@Override
public int compareTo(FrontierElem o) {
return Integer.compare(o.value, value); // descending order; avoids overflow of o.value - value
}
}
public static void findMaxSum( int [] a, int [] b, int k ) {
Integer [] frontierA = new Integer[ a.length ];
Integer [] frontierB = new Integer[ b.length ];
PriorityQueue<FrontierElem> q = new PriorityQueue<MaxSum.FrontierElem>();
frontierA[0] = frontierB[0]=0;
q.add( new FrontierElem( a[0]+b[0], 0, 0));
while( k > 0 ) {
FrontierElem f = q.poll();
System.out.println( f.value+" "+q.size() );
k--;
frontierA[ f.aIdx ] = frontierB[ f.bIdx ] = null;
int fRight = f.aIdx+1;
int fDown = f.bIdx+1;
if( fRight < a.length && frontierA[ fRight ] == null ) {
q.add( new FrontierElem( a[fRight]+b[f.bIdx], fRight, f.bIdx));
frontierA[ fRight ] = f.bIdx;
frontierB[ f.bIdx ] = fRight;
}
if( fDown < b.length && frontierB[ fDown ] == null ) {
q.add( new FrontierElem( a[f.aIdx]+b[fDown], f.aIdx, fDown));
frontierA[ f.aIdx ] = fDown;
frontierB[ fDown ] = f.aIdx;
}
}
}
The idea is similar to the other solution, but with the observation that as you add to your result set from the matrix, at every step the next element in our set can only come from where the current set is concave. I called these elements frontier elements and I keep track of their position in two arrays and their values in a priority queue. This helps keep the queue size down, but by how much I've yet to figure out. It seems to be about sqrt( k ) but I'm not entirely sure about that.
(Of course the frontierA/B arrays could be simple boolean arrays, but this way they fully define my result set. This isn't used anywhere in this example but might be useful otherwise.)
Since the precondition is that the arrays are sorted, let's consider the following
for N = 5;
A[]={ 1,2,3,4,5}
B[]={ 496,497,498,499,500}
Since the arrays are sorted in increasing order, the sum of the last elements, A[N-1]+B[N-1], is the highest, so insert it into a max-heap along with the indexes of the A and B elements (why the indexes? we'll see in a moment)
H.insert(A[N-1]+B[N-1],N-1,N-1);
now
while(!H.empty()) { // while the heap is not empty
H.pop(); // this gives you the next sum you are looking for
We use the indexes obtained from the pop to select the next candidate sums.
Consider the following:
if i & j are the indexes in A & B, the next element can only be one of A[i]+B[j-1] and A[i-1]+B[j] (A[i-1]+B[j-1] is never larger than either of them),
so insert each of these that has not been inserted into the heap yet
hence
for each (p,q) in { (i,j-1), (i-1,j) } with p >= 0 and q >= 0:
if(!Hash[p,q]){ // not inserted yet
Hash[p,q] = true;
H.insert(A[p]+B[q],p,q);
}
}
Popping K times will give you the K largest sums required.
Hope this helps
Many thanks to @rlibby and @xuhdev for such an original idea to solve this kind of problem. I had a similar coding exercise in an interview that required finding the N largest sums formed from K descending sorted arrays, meaning we must pick 1 element from each sorted array to build each sum.
Example: List findHighestSums(int[][] lists, int n) {}
[5,4,3,2,1]
[4,1]
[5,0,0]
[6,4,2]
[1]
and a value of 5 for n, your procedure should return a List of size 5:
[21,20,19,19,18]
Below is my code; please read the block comments carefully :D
private class Pair implements Comparable<Pair>{
String state;
int sum;
public Pair(String state, int sum) {
this.state = state;
this.sum = sum;
}
@Override
public int compareTo(Pair o) {
// Max heap
return Integer.compare(o.sum, this.sum); // avoids overflow of o.sum - this.sum
}
}
List<Integer> findHighestSums(int[][] lists, int n) {
int numOfLists = lists.length;
int totalCharacterInState = 0;
/*
* To represent State of combination of largest sum as String
* The number of characters for each list should be Math.ceil(log(list[i].length))
* For example:
* If list1 length contains from 11 to 100 elements
* Then the State represents for list1 will require 2 characters
*/
int[] positionStartingCharacterOfListState = new int[numOfLists + 1];
positionStartingCharacterOfListState[0] = 0;
// the reason to set less or equal here is to get the position starting character of the last list
for(int i = 1; i <= numOfLists; i++) {
int previousListNumOfCharacters = 1;
if(lists[i-1].length > 10) {
previousListNumOfCharacters = (int)Math.ceil(Math.log10(lists[i-1].length));
}
positionStartingCharacterOfListState[i] = positionStartingCharacterOfListState[i-1] + previousListNumOfCharacters;
totalCharacterInState += previousListNumOfCharacters;
}
// Check the state <---> make sure that combination of a sum is new
Set<String> states = new HashSet<>();
List<Integer> result = new ArrayList<>();
StringBuilder sb = new StringBuilder();
// This is a max heap contain <State, largestSum>
PriorityQueue<Pair> pq = new PriorityQueue<>();
char[] stateChars = new char[totalCharacterInState];
Arrays.fill(stateChars, '0');
sb.append(stateChars);
String firstState = sb.toString();
states.add(firstState);
int firstLargestSum = 0;
for(int i = 0; i < numOfLists; i++) firstLargestSum += lists[i][0];
// Imagine this is the initial state in a graph
pq.add(new Pair(firstState, firstLargestSum));
while(n > 0) {
// In case n is larger than the number of combinations of all list entries
if(pq.isEmpty()) break;
Pair top = pq.poll();
String currentState = top.state;
int currentSum = top.sum;
/*
* Loop for all lists and generate new states of which only 1 character is different from the former state
* For example: the initial state (Stage 0) 0 0 0 0 0
* So the next states (Stage 1) should be:
* 1 0 0 0 0
* 0 1 0 0 0 (choose element at index 1 from 2nd array)
* 0 0 1 0 0 (choose element at index 1 from 3rd array)
* 0 0 0 0 1
* But don't forget to check whether index in any lists have exceeded list's length
*/
for(int i = 0; i < numOfLists; i++) {
int indexInList = Integer.parseInt(
currentState.substring(positionStartingCharacterOfListState[i], positionStartingCharacterOfListState[i+1]));
if( indexInList < lists[i].length - 1) {
int numberOfCharacters = positionStartingCharacterOfListState[i+1] - positionStartingCharacterOfListState[i];
sb = new StringBuilder(currentState.substring(0, positionStartingCharacterOfListState[i]));
sb.append(String.format("%0" + numberOfCharacters + "d", indexInList + 1));
sb.append(currentState.substring(positionStartingCharacterOfListState[i+1]));
String newState = sb.toString();
if(!states.contains(newState)) {
// The newSum is always <= currentSum
int newSum = currentSum - lists[i][indexInList] + lists[i][indexInList+1];
states.add(newState);
// Using priority queue, we can immediately retrieve the largest Sum at Stage k and track all other unused states.
// From that Stage k largest Sum's state, then we can generate new states
// Those sums composed by recently generated states don't guarantee to be larger than those sums composed by old unused states.
pq.add(new Pair(newState, newSum));
}
}
}
result.add(currentSum);
n--;
}
return result;
}
Let me explain how I came up with the complexity:
The while loop in my answer executes N times; consider the max heap (priority queue).
Each iteration does 1 poll operation and up to K insertions, so the heap holds at most N * K + 1 elements and each operation costs O(log(N * K)).
Therefore, the complexity is O(N * K * log(N * K)) heap work, not counting the cost of building the state strings.
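The same best-first search over index states can be sketched more compactly in Python, using a tuple of indexes as the state instead of an encoded string. This is an illustrative variant rather than a translation of the Java code; the function name mirrors the exercise's signature, and every list is assumed to be sorted in descending order.

```python
import heapq


def find_highest_sums(lists, n):
    """Return up to n largest sums formed by picking one element from each list."""
    if any(len(lst) == 0 for lst in lists):
        return []
    start = tuple(0 for _ in lists)           # index chosen in each list
    heap = [(-sum(lst[0] for lst in lists), start)]
    seen = {start}
    result = []
    while heap and len(result) < n:
        neg_sum, state = heapq.heappop(heap)
        result.append(-neg_sum)
        # Successors advance exactly one list by one position.
        for i, idx in enumerate(state):
            if idx + 1 < len(lists[i]):
                nxt = state[:i] + (idx + 1,) + state[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    new_sum = -neg_sum - lists[i][idx] + lists[i][idx + 1]
                    heapq.heappush(heap, (-new_sum, nxt))
    return result
```

Tuples are hashable, so the visited set works without the fixed-width string encoding.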

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.
I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
// Initial value is irrelevant if sequence is non-empty,
// but keeps compiler happy.
int best = 0;
int count = 0;
foreach (int element in sequence)
{
if (count == 0)
{
best = element;
count = 1;
}
else
{
// Vote current choice up or down
count += (best == element) ? 1 : -1;
}
}
return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
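The median idea can be sketched with quickselect in Python. Two caveats are worth stating loudly: this version runs in expected (not worst-case) O(n) time, and it copies the data, so it does not meet the constant-extra-space requirement of the question. The function names are invented for illustration.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    pool = list(values)
    while True:
        pivot = random.choice(pool)
        lows = [v for v in pool if v < pivot]
        pivots = [v for v in pool if v == pivot]
        if k < len(lows):
            pool = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            pool = [v for v in pool if v > pivot]


def majority_by_median(values):
    """If some element fills more than half of values, it sits at the middle rank."""
    return quickselect(values, len(values) // 2)
```

The result is only meaningful when a majority element is guaranteed to exist; otherwise a verification pass that counts occurrences is needed.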
int findLeader(int n, int* x){
int leader = x[0], c = 1, i;
for(i=1; i<n; i++){
if(c == 0){
leader = x[i];
c = 1;
} else {
if(x[i] == leader) c++;
else c--;
}
}
if(c == 0) return -1; // no candidate survived; -1 is the "no leader" sentinel (NULL is not a meaningful int result)
else {
c = 0;
for(i=0; i<n; i++){
if(x[i] == leader) c++;
}
if(c > n/2) return leader;
else return -1;
}
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.
This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Let's start by comparing a[i], a[i+1]. If they're equal we move on and compare a[i+1], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]
What say folks?
-bhupi
Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.
How about:
randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). Treating the picks as independent, with the majority element filling just over half the array, the probability of not getting at least 2 copies of it is about 5/16 for K == 4 and about 3.5% (9/256) for K == 8. If you find no duplicates, repeat the process until you do. (This assumes that the other elements are more randomly distributed; it would perform very poorly with, say, 49% of the array = "A", 51% of the array = "B".)
e.g.:
findDuplicateCandidate:
select a fixed size subset.
return the most common element in that subset
if there is no element with more than 1 occurrence repeat.
if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.
My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.
in php---pls check if it's correct
function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values($A); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
$val = $A[$i];
if(in_array($val,$B,true)){//to avoid looping again and again
}else{
if($counts[$val]>$len/2){
return $val;
}
array_push($B, $val);//to avoid looping again and again
}
}
return -1;
}
int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
if (L[i] == candidate && L[pos++] == candidate)
return candidate;
}
if (count > pos)
return candidate;
return (-1);
