Data structure to find the nearest untaken element - algorithm

I have an array A[0..N-1] containing N elements, and a list containing M indexes. Each index in the list corresponds to an element in the array: an index of 0 corresponds to A[0], an index of 1 corresponds to A[1], and so on.
I want to process the list sequentially from the first index to the last index as follows:
For the current index i,
if A[i] is not taken, then A[i] will be taken.
if A[i] is taken, then the smallest j > i where A[j] is not taken will be chosen and A[j] will be taken. If there is no such element, output -1
I want to output an array B of length M where B[i] denotes the index of the element taken. I wonder how to do it in linear complexity (i.e. O(N) or O(M)). What data structure could be used?

Here is a suggestion for an O(M log M) algorithm.
At the start, insert all M indexes of the list into a sorted tree that allows duplicates. The steps become:
If A[i] is not taken, then take A[i]. Remove i from the tree.
If A[i] is taken, then the smallest j > i where A[j] is not taken will be chosen and A[j] will be taken. That smallest j > i is readily available as the successor of i in the tree. If there is no such j, output -1. Otherwise, remove j from the tree.
A sketch of the tree approach in C++ is below.
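A minimal sketch using std::set (a balanced search tree), with one twist: the tree stores every currently untaken array index rather than the list indexes, so the redirect j may be any index of A; the cost becomes O((N + M) log N). The names process/queries are illustrative only:

#include <set>
#include <vector>

std::vector<int> process(int n, const std::vector<int>& queries) {
    std::set<int> untaken;
    for (int i = 0; i < n; i++) untaken.insert(i);
    std::vector<int> result;
    for (int i : queries) {
        auto it = untaken.lower_bound(i);  // smallest untaken index >= i
        if (it == untaken.end()) {
            result.push_back(-1);          // nothing left at or after i
        } else {
            result.push_back(*it);         // take A[*it]
            untaken.erase(it);
        }
    }
    return result;
}

Note that lower_bound covers both cases at once: if i itself is untaken it returns i, otherwise it returns the smallest untaken j > i.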

Related

Find majority element when the values are unknown

Suppose I have an array of elements.
I cannot read the values of the elements. I can only compare any two elements from the array to know whether they are the same or not, but even then I don't get to know their actual values.
Suppose this array has a majority of elements of the same value. I need to find and return any of the majority elements. How would I do it?
We have to be able to do it in Θ(n log n).
Keep track of two indices, i & j. Initialize i=0, j=1. Repeatedly compare arr[i] to arr[j].
if arr[i] == arr[j], increment j.
if arr[i] != arr[j]
eliminate both from the array
increment i to the next index that hasn't been eliminated.
increment j to the next index >i that hasn't been eliminated.
The elimination operation will eliminate at least one non-majority element each time it eliminates a majority element, so majority is preserved. When you've gone through the array, all elements not eliminated will be in the majority, and you're guaranteed at least one.
This is O(n) time, but also O(n) space to keep track of eliminations.
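One way to realize the elimination in code is a stack of surviving indices: at any moment all stacked indices hold equal values, so eliminating a mismatched pair is a single pop. A hedged C++ sketch, with the equality oracle passed in as a function (identifiers are illustrative):

#include <vector>
#include <functional>

int majorityIndex(int n, const std::function<bool(int, int)>& same) {
    std::vector<int> survivors;
    for (int j = 0; j < n; j++) {
        if (survivors.empty() || same(survivors.back(), j))
            survivors.push_back(j);  // equal to the current survivors: keep it
        else
            survivors.pop_back();    // different: eliminate j and one survivor
    }
    return survivors.back();         // non-empty whenever a majority exists
}

Each pop removes two elements of differing values, so at most one of them is a majority element, which preserves the majority among the remaining elements.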
Given:
an implicit array a of length n, which is known to have a majority element
an oracle function f, such that f(i, j) = a[i] == a[j]
Asked:
Return an index i, such that a[i] is a majority element of a.
Main observation:
If
m is a majority element of a, and
for some even k < n each element of a[0, k) occurs at most k / 2 times
then m is a majority element of a[k, n).
We can use that observation by assuming that the first element is the majority element. We move through the array until we reach a point where that element occurred exactly half the time. Then we discard the prefix and continue again from that point on. This is exactly what the Boyer-Moore algorithm does, as pointed out by Rici in the comments.
In code:
result = 0 // index where the majority element is
count = 0  // the number of times we've seen that element in the current prefix
for i = 0; i < n; i++ {
    // we've seen the current majority candidate exactly half of the time:
    // discard the current prefix and start over
    if (count == 0) {
        result = i
    }
    // keep track of how many times we've seen the current majority candidate in the prefix
    if (f(result, i)) {
        count++
    } else {
        count--
    }
}
return result
For completeness: this algorithm uses two variables and a single loop, so it runs in O(n) time and O(1) space.
Assuming you can determine whether elements are <, >, or ==, you can go through the list and build a tree. The tree's nodes act like buckets, each holding an item and a count of how many times you've seen it. When you come to a node that compares ==, just increment the count. At the end, go through the tree and find the node with the highest count.
Assuming you build a balanced tree, this is O(n log n). Red-black trees can keep the tree balanced; alternatively, you could build the tree by inserting the elements in random order, which gives O(n log n) on average.
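For instance, the bucket counting can be sketched with std::map, which is itself a red-black tree; the identifiers here are illustrative:

#include <map>
#include <vector>

template <typename T>
T majorityByCounting(const std::vector<T>& a) {
    std::map<T, int> counts;              // value -> occurrences seen so far
    for (const T& x : a)
        counts[x]++;                      // O(log n) insert-or-increment
    T best = a.front();
    int bestCount = 0;
    for (const auto& [value, count] : counts)
        if (count > bestCount) { best = value; bestCount = count; }
    return best;
}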

Longest Length sub array with elements in a given range

If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarray would be [1, 2],
because if we included 4 and started from the beginning, the difference would be 3, which is greater than M (= 2); and if we started from 2, the difference between the largest element (4) and the smallest (2) would be 2, which is not less than 2 (M).
The best I can think of is to start from the left, then go as far right as possible without the subarray's range getting too large. Of course, at each step we have to keep track of the minimum and maximum elements seen so far. This has O(n^2) time complexity though; can't we do it faster?
I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (better known as the sliding window minimum, or a monotonic deque).
To understand this, look at a simpler problem first: finding the minimum element of every subarray of some fixed size k in an array. The idea is to keep a double-ended queue of potential candidates for the minimum element. When we encounter a new element, we pop off all elements at the back of the queue that are greater than or equal to the current element, then push the current element onto the back.
We can do this because any subarray we encounter in the future that includes a popped element will also include the current element, and since the current element is smaller, the popped elements can never again be the minimum.
After pushing the current element, we pop the front of the queue if it is more than k positions away. The minimum of the current window is then simply the front of the queue, because popping from the back this way keeps the queue's values increasing. A sketch of this simpler problem is below.
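A minimal sketch of the sliding window minimum, assuming an input vector a and window size k (names are illustrative):

#include <deque>
#include <vector>

std::vector<int> windowMinima(const std::vector<int>& a, int k) {
    std::deque<int> dq;       // indices; values increase from front to back
    std::vector<int> minima;
    for (int i = 0; i < (int)a.size(); i++) {
        while (!dq.empty() && a[dq.back()] >= a[i])
            dq.pop_back();    // a[i] is smaller and newer: back is never a minimum again
        dq.push_back(i);
        if (dq.front() <= i - k)
            dq.pop_front();   // front has slid out of the window
        if (i >= k - 1)
            minima.push_back(a[dq.front()]);
    }
    return minima;
}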
To use this technique in your problem, we keep two deques, for the minimum and the maximum elements. When we encounter a new element that is too much larger than the current minimum, we pop from the front of the min-deque until the element is no longer too large (and symmetrically for the max-deque). The beginning of the longest subarray ending at the current position is then the index of the last element popped, plus 1.
This makes the solution O(n).
C++ implementation:
int best = 0, beg = 0;
//best = length of the longest subarray found so far
//beg = the beginning of the longest subarray ending at the current index
std::deque<int> least, greatest;
//these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
    while (!least.empty() && a[least.back()] >= a[i])
    {
        least.pop_back();
        //we can pop this off since any subarray we encounter in the future
        //which includes this element will also include the current element
    }
    least.push_back(i);
    while (!greatest.empty() && a[greatest.back()] <= a[i])
    {
        greatest.pop_back();
        //we can pop this off since any subarray we encounter in the future
        //which includes this element will also include the current element
    }
    greatest.push_back(i);
    while (a[least.front()] <= a[i] - m)
    {
        //<= keeps max - min strictly less than m, as the problem requires
        beg = std::max(beg, least.front() + 1);
        least.pop_back
        //remove elements from the front if they are too small
    }
    while (a[greatest.front()] >= a[i] + m)
    {
        beg = std::max(beg, greatest.front() + 1);
        greatest.pop_front();
        //remove elements from the front if they are too large
    }
    best = std::max(best, i - beg + 1);
}
Consider the following idea:
Let's create a MaxLen array (of size n), defined as: MaxLen[i] = the length of the longest valid subarray ending at index i.
Once we have filled this array, it is easy (O(n)) to find your max subarray.
How do we fill the MaxLen array? Assume you know MaxLen[i]; what will be in MaxLen[i+1]?
We have 2 options. If the number originalArr[i+1] does not break the constraint (does not differ by m or more from any element of the longest subarray ending at index i), then MaxLen[i+1] = MaxLen[i] + 1, because we can simply make our previous subarray a little bit longer. On the other hand, if originalArr[i+1] differs by m or more from some element of that subarray, we need to find the conflicting element (call its index k) and set MaxLen[i+1] = i - k + 1, because our new max subarray has to exclude originalArr[k].
How do we find this "bad" element? We use a heap. For every element we pass, we insert its value and index into both a min-heap and a max-heap (O(log n) each). When you are at the i-th element and want to check whether something in the previous subarray breaks your sequence, you extract elements from the heap as long as they are too much smaller (or, in the max-heap, too much larger) than originalArr[i]; the maximum index among the extracted elements is your k, the index of the element that broke your sequence.
I will try to simplify it with pseudo code (I only demonstrate the min-heap, but the max-heap works the same way):

Array = input array of size n
minHeap = new heap()     // ordered by value; stores (value, index) pairs
maxLen = array(n)        // of size n
maxLen[0] = 1            // the longest subarray ending at index 0 has length 1
minHeap.push(Array[0], 0)
for (i in (1, n)) {
    if (Array[i] - minHeap.top.value < m)   // then all good: extend
        maxLen[i] = maxLen[i-1] + 1
    else {
        k = -1                              // index of the element that breaks the sequence
        while (!minHeap.empty() and Array[i] - minHeap.top.value >= m)
            k = max(k, minHeap.pop().index)
        if (minHeap.empty())
            k = i - 1    // all previous elements are "bad", so start a new subarray at i
        maxLen[i] = i - k                   // the new subarray starts right after index k
    }
    minHeap.push(Array[i], i)
}
When you are done, scan your max length array and choose the maximum value (from its index you can recover the begin and end indices in the original array).
So we loop over the array (n), and at each step insert into 2 heaps (log n).
You are probably thinking: but you also have an unknown number of heap extractions, each forcing a heapify (log n)! Notice, however, that each heap holds at most n elements, and each element can be extracted at most once per heap, so there are at most 2n extractions in total (amortized O(1) extractions per step), and the accumulated cost of all extractions is O(n log n).
So, bottom line: O(n log n).
Edited:
This solution can be simplified by using an AVL tree instead of 2 heaps: finding the min and max are both O(log n) in an AVL tree, and the same goes for insert, find and delete. Just use a tree whose elements are the values paired with their indices in the original array. A sketch of this variant follows.
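A sketch of the tree variant with std::multiset (a red-black tree playing the AVL tree's role) combined with a sliding window; the window's min and max are simply the first and last elements of the tree, giving O(n log n) overall:

#include <set>
#include <vector>
#include <algorithm>

int longestSubarray(const std::vector<int>& a, int m) {
    std::multiset<int> window;   // values currently inside the window [beg, i]
    int best = 0, beg = 0;
    for (int i = 0; i < (int)a.size(); i++) {
        window.insert(a[i]);
        while (*window.rbegin() - *window.begin() >= m)
            window.erase(window.find(a[beg++]));  // drop one copy of the leftmost element
        best = std::max(best, i - beg + 1);
    }
    return best;
}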
Edited 2:
@Fei Xiang even came up with a better solution, O(n) using deques.

Dynamic programming algorithm (Kadane)

Description of the algorithm:
Maximum Subarray Problem
Given a sequence of n real numbers A(1) … A(n), determine a contiguous subsequence A(i) … A(j) for which the sum of elements in the subsequence is maximized.
Algorithm:
#include <iostream>
#include <algorithm>
using namespace std;

int kadane(int a[], int n)
{
    int overall_sum = 0; // overall maximum subarray sum
    int new_sum = 0;     // sum obtained by including the current element
    for (int i = 0; i < n; i++)
    {
        // new_sum is the maximum of the current element alone and the sum of
        // the current element and the previous sum
        new_sum = max(a[i], new_sum + a[i]);
        cout << new_sum << " : ";
        // if the calculated value of new_sum is greater than the overall sum,
        // it replaces the overall sum value
        overall_sum = max(overall_sum, new_sum);
        cout << overall_sum << endl;
    }
    return overall_sum;
}
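For reference, a small driver (the test array is just an illustration; its maximum subarray is {4, -1, 2, 1}, so after the per-step trace the program prints 6):

int main()
{
    int a[] = {-2, 1, -3, 4, -1, 2, 1, -5, 4};
    cout << kadane(a, 9) << endl;  // prints 6 after the trace
    return 0;
}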
I understand that we are trying to break the problem down into small subproblems: the idea is to use the largest partial sum over the first n-1 elements to find the largest partial sum over all n. The code is clear to me in the sense that I can trace it on paper, but the idea seems like magic. Can someone provide a better explanation of this algorithm, or a proof of why it works?
To be 100% precise about what the algorithm calculates: as written, with both variables initialized to 0, it computes the maximum sum over subsequences including the empty one. This makes a difference for arrays in which all numbers are negative: the algorithm returns 0 (the empty sum) rather than the largest (least negative) element. If you want the maximum over non-empty subsequences only, which would be the largest negative number in that case, initialize overall_sum to a[0] instead.
Proof:
At the beginning of loop iteration i, new_sum is always the maximum sum over sequences ending just before a[i] (that is, ending at a[i-1] for i>0; 0, the empty sum, for i==0). Proof by induction over loop executions: this is obviously true for i=0 (new_sum == 0, the sum of an empty sequence), and it becomes true for i+1 after the assignment, because the maximum-sum non-empty sequence ending at a[i] (the last element before a[i+1]) must include a[i], and is therefore the maximum of a[i] by itself and a[i] plus the best preceding sequence.
overall_sum is just the maximum of all new_sum values, and therefore represents the maximum over all subsequences (the maximal subsequence has to end at some a[i], so taking the max over all i works).
You've already included the explanation of why it works in the code comments:
new_sum is the maximum value out of current element
or the sum of current element and the previous sum
Rather than thinking of new_sum as the best sum up to element i, think of it as the best sum of the run we are currently building, which started the last time we reset.
Notice that the algorithm never allows new_sum to exclude the current element of the traversal. If A[i] alone is ever greater than A[i] added to the sum ending at A[i-1], it makes no sense for the run through A[i] to include the previous section, so we start counting from scratch. This guarantees that the sum we are building through A[i] is the greatest it can be. We may see it decrease afterwards, but by then we have already updated the overall greatest sum if need be.

Algorithm for counting specific type of inversions in an array

I need an algorithm that counts the inversions of type:
Inversion between a and b exists if a has lower index and a > 2b.
Can you think of an algorithm that would do it in O(n logn)?
It can be done via a small tweak to the merge sort algorithm: Counting inversions in an array
In the standard algorithm, during the merge phase you compare elements from the left and right halves and increase the inversion count by the number of elements remaining in the left half. Here we increment not by the number of elements remaining in the left half, but by the number of elements remaining in the left half that are more than twice as large; this needs its own two-pointer pass before the merge, since the condition a > 2b is not the same as the merge comparison.
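A sketch of that modified merge sort in C++ (function names are illustrative; the counting pass before the merge is the only change from the standard inversion count):

#include <algorithm>
#include <vector>

long long sortAndCount(std::vector<int>& a, int lo, int hi, std::vector<int>& buf) {
    if (hi - lo <= 1) return 0;
    int mid = lo + (hi - lo) / 2;
    long long count = sortAndCount(a, lo, mid, buf) + sortAndCount(a, mid, hi, buf);
    // Both halves are sorted: advance i past left elements with a[i] <= 2*a[j];
    // the remaining mid - i left elements are more than twice a[j].
    for (int j = mid, i = lo; j < hi; j++) {
        while (i < mid && (long long)a[i] <= 2LL * a[j]) i++;
        count += mid - i;
    }
    std::merge(a.begin() + lo, a.begin() + mid,
               a.begin() + mid, a.begin() + hi, buf.begin() + lo);
    std::copy(buf.begin() + lo, buf.begin() + hi, a.begin() + lo);
    return count;
}

long long countInversions(std::vector<int> a) {
    std::vector<int> buf(a.size());
    return sortAndCount(a, 0, a.size(), buf);
}

Because both pointers only move forward, the counting pass adds O(n) per level, keeping the total at O(n log n).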
Here is another approach, in pseudocode:

A[1..n]
B[1..n] = copy(A)
sort(B)                 // n*log(n)
count = 0
for i = 1 to n-1
    exists = specialBinarySearch(B, A[i], 1, n)   // log(n)
    setHighest(B, A[i], 1, n)                     // log(n)
    if exists
        count++

specialBinarySearch(a, key, from, to)
    if from <= to
        mid = from + (to-from)/2
        if a[mid] < ceil(key/2)     // a[mid] qualifies: key > 2*a[mid]
            return true
        else                        // must go to the left of mid to find a smaller value
            return specialBinarySearch(a, key, from, mid-1)
    else
        return false

setHighest(a, key, from, to)
    if from <= to
        mid = from + (to-from)/2
        if a[mid] == key
            a[mid] = INT_MAX
        else if a[mid] < key
            setHighest(a, key, mid+1, to)
        else
            setHighest(a, key, from, mid-1)
OK. So, basically, here are the steps:
1. Copy A to an auxiliary array B. This is O(n).
2. Sort B with any n*log(n) algorithm.
3. For each element a in A, perform a binary search in B for an element B[i] such that a > 2*B[i], in O(log n). (The test is written with a halved key to avoid overflow.)
4. Disqualify the matched element from future comparisons by setting it to infinity (the setHighest routine). Another binary search, O(log n).
5. Repeat steps 3 and 4 until the elements are exhausted.
So, let's add it up; we have
O(n) + O(n*log(n)) + n*O(log(n))
=> O(n*log(n)) asymptotically
This may be solved using dynamic order statistics data structure. I know two alternatives for such a structure:
Order statistic tree
Indexable skiplist
For each element b of the array, in index order, find the rank of the value 2b in the order statistics data structure, then insert b into the structure.
The rank of 2b tells you how many earlier elements are at most 2b; subtracting it from the number of elements inserted so far gives the number of elements a that have a lower index and satisfy a > 2b. Summing these counts gives the number of "inversions".
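A sketch with a binary indexed tree standing in for the order-statistic tree; values are coordinate-compressed together with their doubles so that rank lookups are exact (identifiers are illustrative):

#include <algorithm>
#include <vector>

struct Fenwick {
    std::vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i) { for (i++; i < (int)t.size(); i += i & -i) t[i]++; }
    int prefix(int i) const {            // how many inserted ranks are <= i
        int s = 0;
        for (i++; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};

long long countInversions(const std::vector<int>& a) {
    std::vector<long long> vals;
    for (int x : a) { vals.push_back(x); vals.push_back(2LL * x); }
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    auto rank = [&](long long x) {
        return (int)(std::lower_bound(vals.begin(), vals.end(), x) - vals.begin());
    };
    Fenwick fw(vals.size());
    long long count = 0, inserted = 0;
    for (int x : a) {
        // earlier elements greater than 2*x = inserted so far minus those <= 2*x
        count += inserted - fw.prefix(rank(2LL * x));
        fw.add(rank(x));
        inserted++;
    }
    return count;
}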

Efficient Way to Find Pair Orderings?

Let's say I have three arrays a, b, and c of equal length N. The elements of each of these arrays come from a totally ordered set, but are not sorted. I also have two index variables, i and j. For all i != j, I want to count the number of index pairs such that a[i] < a[j], b[i] > b[j] and c[i] < c[j]. Is there any way this can be done in less than O(N ^ 2) time complexity, for example by creative use of sorting algorithms?
Notes: The inspiration for this question is that, if you only have two arrays, a and b, you can find the number of index pairs such that a[i] < a[j] and b[i] > b[j] in O(N log N) with a merge sort. I'm basically looking for a generalization to three arrays.
For simplicity, you may assume that no two elements of any array are equal (no ties).
By sorting the array a and rearranging the arrays b and c at the same time, we can suppose that a[i] < a[j] <=> i < j. So we need to find the number of pairs (i,j) such that i < j, b[i] > b[j] and c[i] < c[j]. Let's view (b[i], c[i]) as a point on a plane. We add the points one by one. Each time we add a point (b[j], c[j]), first we count the number of already added points (i < j) such that b[i] > b[j] and c[i] < c[j]. Then we add the point j and proceed to the next one. The sum of the numbers obtained at each step is our result.
Now, it seems that queries of this kind can be answered by a two-dimensional segment tree: http://en.wikipedia.org/wiki/Segment_tree The cost of one step (a query plus an insertion) is O(log^2 n), so the total complexity is O(n log^2 n).
(Note that I assume here that the elements of the arrays are numbers. That's fine, because by sorting we can always replace the elements of an array with the numbers from 1 to n so that the order is preserved.)
Edit: In fact, a simpler structure called a Fenwick tree, or binary indexed tree, is sufficient. See this link: http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees#2d
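For illustration, a sketch of the whole reduction with a 2D Fenwick tree (identifiers are illustrative; memory is O(n^2), so this is only practical for moderate n; large inputs would need the compressed or offline variants discussed in the linked tutorial):

#include <algorithm>
#include <numeric>
#include <vector>

struct Fenwick2D {
    int n;
    std::vector<std::vector<int>> t;
    Fenwick2D(int n) : n(n), t(n + 1, std::vector<int>(n + 1, 0)) {}
    void add(int x, int y) {                // insert the point (x, y), 1-based ranks
        for (int i = x; i <= n; i += i & -i)
            for (int j = y; j <= n; j += j & -j) t[i][j]++;
    }
    long long prefix(int x, int y) const {  // points with rank_b <= x and rank_c <= y
        long long s = 0;
        for (int i = x; i > 0; i -= i & -i)
            for (int j = y; j > 0; j -= j & -j) s += t[i][j];
        return s;
    }
};

long long countTriples(const std::vector<int>& a,
                       const std::vector<int>& b,
                       const std::vector<int>& c) {
    int n = a.size();
    auto ranks = [n](const std::vector<int>& v) {  // 1..n ranks, assuming no ties
        std::vector<int> id(n), r(n);
        std::iota(id.begin(), id.end(), 0);
        std::sort(id.begin(), id.end(), [&](int i, int j) { return v[i] < v[j]; });
        for (int k = 0; k < n; k++) r[id[k]] = k + 1;
        return r;
    };
    std::vector<int> ord(n);
    std::iota(ord.begin(), ord.end(), 0);
    std::sort(ord.begin(), ord.end(), [&](int i, int j) { return a[i] < a[j]; });
    std::vector<int> rb = ranks(b), rc = ranks(c);
    Fenwick2D fw(n);
    long long count = 0;
    for (int j : ord) {  // increasing a: inserted points all have smaller a
        // count earlier-in-a points with larger b and smaller c
        count += fw.prefix(n, rc[j] - 1) - fw.prefix(rb[j], rc[j] - 1);
        fw.add(rb[j], rc[j]);
    }
    return count;
}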
