Maximize number of zigzag sequence in an array - algorithm

I want to maximize the length of a zigzag sequence in an array (without reordering).
I have a main array containing a random sequence of integers. I want a sub-array of indexes of the main array that has a zigzag pattern.
A sequence of integers is called a zigzag sequence if each of its elements is either strictly less or strictly greater than its neighbors (and two adjacent of neighbors).
Example: The sequence 4 2 3 1 5 2 forms a zigzag, but 7 3 5 5 2, 3 8 6 4 5 and 4 2 3 1 5 3 don't.
For a given array of integers we need to find the longest (contiguous) sub-array of indexes that forms a zigzag sequence.
Can this be done in O(N)?

Yes, this would seem to be solvable in O(n) time. I'll describe the algorithm as a dynamic program.
Setup
Let the array containing potential zig-zags be called Z.
Let U be an array such that len(U) == len(Z), and U[i] is an integer representing the length (counted in steps) of the longest contiguous left-to-right subsequence starting at i that is a zig-zag such that Z[i] < Z[i+1] (it zigs up).
Let D be similar to U, except that D[i] is an integer representing the length of the longest contiguous left-to-right subsequence starting at i that is a zig-zag such that Z[i] > Z[i+1] (it zags down).
Subproblem
The subproblem is to find both U[i] and D[i] at each i. This can be done as follows:
U[i] = {
    1 + D[i+1]   if Z[i] < Z[i+1]
    0            otherwise
}
D[i] = {
    1 + U[i+1]   if Z[i] > Z[i+1]
    0            otherwise
}
The top version says that if we're looking for the longest sequence beginning with an up-zig, we see if the next element is larger (goes up), and then add a single zig to the size of the next down-zag sequence. The next one is the reverse.
Base Cases
If i == len(Z) - 1 (it is the last element), U[i] = D[i] = 0. The last element cannot have a left-to-right sequence after it because there is nothing after it.
Solution
To get the solution, we find the maximum of U[i] and D[i] over every i. We store the index i that attains this maximum and the length of this longest zig-zag (in a variable called length). The sequence begins at index i and ends at index i + length.
Runtime
There are n indexes, so there are 2n subproblems between U and D. Each subproblem takes O(1) time to solve, given that solutions to previously solved subproblems are memoized. Finally, iterating through U and D to get the final answer takes O(2n) time.
We thus have O(2n) + O(2n) time, or O(n).
This may be an overly complex solution, but it demonstrates that it can be done in O(n).
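A minimal Python sketch of the dynamic program described above, filling both tables from right to left. It assumes the plain definition this answer works with (consecutive steps alternate strictly up and strictly down) and does not model the extra "two adjacent of neighbors" condition from the question. The function name longest_zigzag is made up for illustration.

```python
def longest_zigzag(z):
    """Return (start, end) indices, inclusive, of a longest contiguous zig-zag."""
    n = len(z)
    if n == 0:
        return 0, 0
    up = [0] * n    # up[i]: steps in the longest zig-zag from i that starts by going up
    down = [0] * n  # down[i]: same, but starting by going down
    # The last element is the base case (both tables stay 0), so start at n - 2.
    for i in range(n - 2, -1, -1):
        if z[i] < z[i + 1]:
            up[i] = 1 + down[i + 1]
        elif z[i] > z[i + 1]:
            down[i] = 1 + up[i + 1]
    # First index that attains the overall maximum.
    start = max(range(n), key=lambda i: max(up[i], down[i]))
    length = max(up[start], down[start])
    return start, start + length


print(longest_zigzag([4, 2, 3, 1, 5, 2]))
print(longest_zigzag([7, 3, 5, 5, 2]))
```

Each index is visited once while filling the tables and once while scanning for the maximum, which matches the O(n) bound argued above.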

Related

Least cost increasing subsequence

Say we have an array A that contains N integers. The problem is that we want to minimize the cost of some increasing subsequence (not necessarily strictly increasing) starting at position 1 and ending at position N. The total cost of a subsequence is the total cost of transitioning between elements in the subsequence. When building the subsequence, the cost of transitioning from position j to position i, where i >= j, can be found in the matrix COST[i][j]. It is guaranteed that some increasing subsequence exists in which we start from position 1 and reach position N. Values in the array may be very large.
For example:
N = 5
A = [0,3,2,3,3]
Cost =
[[0,INF,INF,INF,INF],
[3,0,INF,INF,INF],
[3,INF,0,INF,INF],
[5,2,2,0,INF],
[6,0,3,1,0]]
The least-cost increasing subsequence is (A[1], A[2], A[5]) or (0,3,3).
The cost is COST[2][1] + COST[5][2] = 3 + 0 = 3.
So far I have been able to modify the traditional O(n^2) dp solution by initializing dp[i] to infinity and dp[1] to 0 and subsequently looping over all previous values to extend the subsequence. While iterating through previous values I simply maintain the minimum cost.
Now I want to improve this solution and make it O(n log n). I know the regular LIS problem can be solved using arrays and binary search, but I have been unable to modify such an approach to fit this problem.
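For reference, the O(n^2) dynamic program described above might look like the following sketch. It uses 0-based indices (so position 1 in the text is index 0), and the name least_cost_increasing is hypothetical. It is only the quadratic baseline, not the O(n log n) solution being asked for.

```python
INF = float("inf")


def least_cost_increasing(a, cost):
    """Minimum total transition cost of a non-decreasing subsequence
    that starts at index 0 and ends at index len(a) - 1."""
    n = len(a)
    dp = [INF] * n  # dp[i]: cheapest valid subsequence ending at index i
    dp[0] = 0
    for i in range(1, n):
        for j in range(i):
            # Extend a subsequence ending at j, if the order allows it.
            if a[j] <= a[i] and dp[j] + cost[i][j] < dp[i]:
                dp[i] = dp[j] + cost[i][j]
    return dp[n - 1]


A = [0, 3, 2, 3, 3]
COST = [
    [0, INF, INF, INF, INF],
    [3, 0, INF, INF, INF],
    [3, INF, 0, INF, INF],
    [5, 2, 2, 0, INF],
    [6, 0, 3, 1, 0],
]
print(least_cost_increasing(A, COST))  # 3, as in the example above
```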

Sequence increasing and decreasing by turns

Let's assume we've got a sequence of integers of given length n. We want to delete some elements (maybe none), so that the sequence is increasing and decreasing by turns in result. It means, that every element should have neighbouring elements either both bigger or both smaller than itself.
For example 1 3 2 7 6 and 5 1 4 2 10 are both sequences increasing and decreasing by turns.
We want to delete some elements to transform our sequence that way, but we also want to maximize the sum of elements left. So, for example, from sequence 2 18 6 7 8 2 10 we want to delete 6 and make it 2 18 7 8 2 10.
I am looking for an effective solution to that problem. Example above shows that the most naive greedy algorithm (delete every first element that breaks the sequence) won't work - it would delete 7 instead of 6, which would not maximize the sum of elements left.
Any ideas how to solve that effectively (O(n) or O(n log n) probably) and correctly?
For every element of the sequence with index i we will calculate F(i, high) and F(i, low), where F(i, high) is the biggest sum of a subsequence with the wanted characteristics that ends with the i-th element, with this element being a "high peak". (I'll explain mainly the "high" part; the "low" part can be done similarly.) We can calculate these functions using the following relations:
F(i, high) = a[i] + max(F(j, low)) over all j < i with a[j] < a[i]
F(i, low) = a[i] + max(F(j, high)) over all j < i with a[j] > a[i]
(the max is taken as zero when no such j exists)
The answer is the maximum among all F(i, high) and F(i, low) values.
That gives us a rather simple dynamic programming solution with O(n^2) time complexity. But we can go further.
We can optimize a calculation of max(F(j,low)) part. What we need to do is to find the biggest value among previously calculated F(j, low) with the condition that a[j] < a[i]. This can be done with segment trees.
First of all, we'll "squeeze" our initial sequence. We need the real value of the element a[i] only when calculating the sum. But we need only the relative order of the elements when checking that a[j] is less than a[i]. So we'll map every element to its index in the sorted elements array without duplicates. For example, sequence a = 2 18 6 7 8 2 10 will be translated to b = 0 5 1 2 3 0 4. This can be done in O(n*log(n)).
The biggest element of b will be less than n, as a result, we can build a segment tree on the segment [0, n] with every node containing the biggest sum within the segment (we need two segment trees for "high" and "low" part accordingly). Now let's describe the step i of the algorithm:
Find the biggest sum max_low on the segment [0, b[i]-1] using the "low" segment tree (initially all nodes of the tree contain zero).
F(i, high) is equal to max_low + a[i].
Find the biggest sum max_high on the segment [b[i]+1, n] using the "high" segment tree.
F(i, low) is equal to max_high + a[i].
Update the [b[i], b[i]] segment of the "high" segment tree with F(i, high) value recalculating maximums of the parent nodes (and [b[i], b[i]] node itself).
Do the same for "low" segment tree and F(i, low).
Complexity analysis: b sequence calculation is O(n*log(n)). Segment tree max/update operations have O(log(n)) complexity and there are O(n) of them. The overall complexity of this algorithm is O(n*log(n)).
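The steps above can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the class and function names are made up, the trees are initialised to zero as in the description (which effectively assumes the values are positive), and a point update keeps the larger of the old and new value so that repeated elements do not overwrite a better earlier sum.

```python
class MaxSegmentTree:
    """Iterative segment tree over [0, size) supporting point update and range max."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (2 * size)

    def update(self, pos, value):
        pos += self.size
        self.tree[pos] = max(self.tree[pos], value)
        pos //= 2
        while pos >= 1:  # recompute the maximums of the parent nodes
            self.tree[pos] = max(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def query(self, lo, hi):
        """Maximum over the half-open range [lo, hi); 0 if the range is empty."""
        result = 0
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                result = max(result, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = max(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result


def max_alternating_sum(a):
    if not a:
        return 0
    # "Squeeze" the values: map each one to its rank among the distinct values.
    rank = {v: r for r, v in enumerate(sorted(set(a)))}
    m = len(rank)
    high_tree = MaxSegmentTree(m)  # best F(j, high), indexed by rank
    low_tree = MaxSegmentTree(m)   # best F(j, low), indexed by rank
    best = 0
    for x in a:
        b = rank[x]
        f_high = low_tree.query(0, b) + x      # previous "low" with a smaller value
        f_low = high_tree.query(b + 1, m) + x  # previous "high" with a larger value
        high_tree.update(b, f_high)
        low_tree.update(b, f_low)
        best = max(best, f_high, f_low)
    return best


print(max_alternating_sum([2, 18, 6, 7, 8, 2, 10]))  # 47: keep 2 18 7 8 2 10
```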

Correctness of greedy algorithm

In a non-decreasing sequence of (positive) integers, two elements a and b can be removed as a pair when 2 * a <= b. How many pairs can be removed at most from this sequence?
So I have thought of the following solution:
I take the given sequence and divide it into two parts (first and second).
I assign an iterator to each of them, it_first := 0 and it_second := 0 respectively, and set count := 0.
while it_second != second.length
    if 2 * first[it_first] <= second[it_second]
        count++, it_first++, it_second++
    else
        it_second++
count is the answer
Example:
count := 0
[1,5,8,10,12,13,15,24] --> first := [1,5,8,10], second := [12,13,15,24]
2 * 1 ?<= 12 --> true, count++, it_first++ and it_second++
2 * 5 ?<= 13 --> true, count++, it_first++ and it_second++
2 * 8 ?<= 15 --> false, it_second++
2 * 8 ?<= 24 --> true, count++; it_second reaches the last element - END.
count == 3
Linear complexity (the worst case when there are no such elements to be removed. n/2 elements compare with n/2 elements).
So my missing part is 'correctness' of algorithm - I've read about greedy algorithms proof - but mostly with trees and I cannot find analogy. Any help would be appreciated. Thanks!
EDIT:
By correctness I mean:
* It works
* It cannot be done faster(in logn or constant)
I would like to put some graphics but due to reputation points < 10 - I can't.
(I've meant one latex at the beginning ;))
Correctness:
Let's assume that the maximum number of pairs that can be removed is k. Claim: there is an optimal solution where the first elements of all pairs are k smallest elements of the array.
Proof: I will show that it is possible to transform any solution into the one that contains the first k elements as the first elements of all pairs.
Let's assume that we have two pairs (a, b), (c, d) such that a <= b <= c <= d, 2 * a <= b and 2 * c <= d. In this case, pairs (a, c) and (b, d) are valid, too. After this re-pairing, both first elements (a and b) are no greater than both second elements (c and d). Thus, we can always transform our pairs in such a way that the first element of any pair is not greater than the second element of any pair.
When we have this property, we can simply substitute the smallest element among all first elements of all pairs with the smallest element in the array, the second smallest among all first elements with the second smallest element in the array, and so on, without invalidating any pair.
Now we know that there is an optimal solution that contains the k smallest elements as first elements. It is clear that we cannot make the answer worse by pairing each of them with the smallest unused element that fits it (taking a bigger one can only reduce the answer for the next elements). Thus, this solution is correct.
A note about the case when the length of the array is odd: it doesn't matter where the middle element goes, to the first or to the second half. In the first half it is useless (there are not enough elements in the second half). If we put it in the second half, it is useless too (let's assume that we took it. It means that there is "free space" somewhere in the second half. Thus, we can shift some elements by one and get rid of it).
Optimality in terms of time complexity: the time complexity of this solution is O(n). We cannot find the answer without reading the entire input in the worst case and reading is already O(n) time. Thus, this algorithm is optimal.
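A small Python sketch of the two-pointer greedy from the question, for readers who want to experiment with it. The function name is made up, and for an odd length the middle element is placed in the second half, which the note above argues does not change the answer.

```python
def max_removable_pairs(seq):
    """seq must be sorted in non-decreasing order."""
    half = len(seq) // 2
    first, second = seq[:half], seq[half:]
    count = 0
    i = 0  # next unmatched element of the first half
    for big in second:
        # Match the smallest unmatched small element with the first value that fits it.
        if i < len(first) and 2 * first[i] <= big:
            count += 1
            i += 1
    return count


print(max_removable_pairs([1, 5, 8, 10, 12, 13, 15, 24]))  # 3, as in the example
```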
Presuming your method. Indices are 0-based.
Denote in general:
end_1 = floor(N/2) boundary (inclusive) of first part.
Denote while iterating:
i is the index in the first part, j is the index in the second part;
sol(i,j) is the optimal solution up to this point (using the algorithm from the front);
rem(i,j) is the number of pairs that remain to be paired up optimally behind the point (i,j), i.e. from (i+1,j+1) onward (it can be calculated using the algorithm from the back);
the final optimal solution can be expressed as a function of any point as sol(i,j) + rem(i,j).
Observation #1: when running the algorithm from the front, all points in the [0, i] range are used, and some points from the [end_1+1, j] range are not used (we skip any a(j) that is not large enough). When running the algorithm from the back, some [i+1, end_1] points are not used, and all [j+1, N] points are used (we skip any a(i) that is not small enough).
Observation #2: rem(i,j) >= rem(i,j+1), because rem(i,j) = rem(i,j+1) + M, where M can be 0 or 1 depending on whether we can pair up a(j) with some unused element from [i+1, end_1] range.
Argument (by contradiction): let's assume 2*a(i) <= a(j) and that not pairing up a(i) and a(j) gives at least as good final solution. By the algorithm we would next try to pair up a(i) and a(j+1). Since:
rem(i,j) >= rem(i,j+1) (see above),
sol(i,j+1) = sol(i,j) (since we didn't pair up a(i) and a(j))
we get that sol(i,j) + rem(i,j) >= sol(i,j+1) + rem(i,j+1) which contradicts the assumption.

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int K[].
The task at hand is to generate a uniformly distributed random number between 0 and N-1 which doesn't exist in K.
N is strictly an integer >= 0.
And K.length is < N-1. And 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1. Assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal probability.
I could get an O(N) solution for this: first generate a random number from 0 to N - K.length - 1, then map the generated random number to a number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This gives you the number of values missing from K in the range 0 to K[i].
For example, K[2] = 5, so K[2] - 2 = 3 values are missing between 0 and K[2] (2, 3, 4).
Hence you can decide whether to continue the search in the first part of array K or in the second part, because you know r.
This search has a complexity of O(log(K.length)).
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
The random number is generated between 0 and N - K.length - 1, i.e. random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search in the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
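The weighted descent in this answer could be sketched in Python like this. The function name and the low_val/high_val bookkeeping are illustrative choices, random.randrange stands in for uniformRand, and the midpoint rounding may pick a different middle element than the walkthrough above (the counts are computed the same way).

```python
import random


def random_pick(K, N):
    """Uniform random value in [0, N) that is not in the sorted list K."""
    lo, hi = 0, len(K)         # current subarray is K[lo:hi]
    low_val, high_val = -1, N  # values just outside the current subarray
    while lo < hi:
        mid = (lo + hi) // 2
        # Pickable values strictly between low_val and K[mid].
        left = K[mid] - low_val - 1 - (mid - lo)
        # Pickable values strictly between K[mid] and high_val.
        right = high_val - K[mid] - 1 - (hi - mid - 1)
        # Go left with probability left / (left + right).
        if random.randrange(left + right) < left:
            hi, high_val = mid, K[mid]
        else:
            lo, low_val = mid + 1, K[mid]
    # The subarray is empty: every value between low_val and high_val is pickable.
    return low_val + 1 + random.randrange(high_val - low_val - 1)


print(random_pick([0, 1, 4, 5, 8], 10))  # one of 2, 3, 6, 7, 9
```

A side with zero pickable values is never chosen, so the final range always contains at least one candidate as long as K does not cover all of 0 to N-1.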
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
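We also define K[-1] = -1 (consistent with D[-1] = K[-1] - (-1) = 0), so that the formula below yields r-1 when no element of K precedes the answer.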
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find right most occurrence of r' in D (which occurs at position defined by j), where r' is the largest number in D, which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r-r' + K[j].
Proof: Basically the definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and these missing are at least r-r' in number), and the number we seek is among them, given by K[j] + r-r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log |K|) algorithm, where |K| is the length of K.
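A short Python sketch of this idea, with r counted from 1. The binary search finds how many elements of K lie below the answer, using D[i] = K[i] - i without building D. The function names are made up for illustration, and random.randrange stands in for uniformRand.

```python
import random


def kth_missing(K, r):
    """r-th smallest (1-based) non-negative integer that is not in the sorted list K."""
    lo, hi = 0, len(K)
    # Find the first index whose implicit D value K[i] - i is >= r.
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid < r:
            lo = mid + 1
        else:
            hi = mid
    # Exactly lo elements of K are smaller than the answer, which equals
    # K[j] + r - D[j] with j = lo - 1 (and gives r - 1 when lo == 0).
    return r + lo - 1


def random_not_in(K, N):
    r = random.randrange(N - len(K)) + 1  # uniform in 1 .. N - |K|
    return kth_missing(K, r)


print([kth_missing([0, 1, 5], r) for r in range(1, 5)])  # [2, 3, 4, 6]
```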
If you are running this many times, it probably pays to speed up your generation operation, in case O(log N) time per call isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K add it to G. If it is in K don't add it and progress your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random

class PRNG:
    def __init__(self, K, N):
        # Precompute G, the sorted list of values in [0, N) that are not in K.
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1
            else:
                self.G.append(i)

    def getRand(self):
        # Pick a uniformly random index into G; O(1) per call.
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())

lexicographical Smallest permutation in matrix with restricted exchange

Following question was asked in Facebook Job interview aptitude test:
A permutation is a list of K numbers, each between 1 and K (both inclusive),
that has no duplicate elements.
Permutation X is lexicographically smaller than Permutation Y iff for some
i <= K:
All of the first i-1 elements of X are equal to first i-1 elements of Y.
ith element of X is smaller than ith element of Y.
You are given a permutation P, you can exchange some of its elements as many
times as you want in any order. You have to find the lexicographically smallest
Permutation that you can obtain from P.
K is less than 101.
Input Format:
First line of input contains K being the size of permutation.
Next line contains K single spaced numbers indicating the permutation.
Each of next K lines contains K characters, character j of line i is equal to
'Y' if you can exchange ith and jth element of a permutation, and
'N' otherwise.
Output Format:
Print K numbers with a space separating each of them indicating the
permutation.
Sample Input
3
3 1 2
NNY
NNN
YNN
Sample Output
2 1 3
Sample Input
3
3 2 1
NYN
YNY
NYN
Sample Output
1 2 3
In the first example you can exchange first element with last element to
obtain 2 1 3 from 3 1 2.
What I did?
I generated all the permutations First.
Then, I discarded those permutations that are not feasible.
In Example 1: 1 3 2 is not feasible as positions 1 and 2 are not exchangeable.
From all the permissible permutations, I picked the lexicographically smallest one as the solution.
Problem with above solution:
My solution works perfectly for K<=25. When the size of K becomes greater than 25, the solution is really slow. For K=100, I didn't get the output even in 60 minutes.
My Questions here are:
How Should I optimize my solution?
Can it be done without generating all permutations?
Better Solution with Explanations and Pseudo-Code(or Code) will be highly helpful.
Thank you!
My solution works perfectly for K<=25. When the size of K becomes greater than 25, the solution is really slow.
Your solution will be very slow. As you are generating all permutations, the overall complexity of generating them is:
O(K!).
Hence it cannot finish in any reasonable time, as K can be as large as 100.
Can it be done without generating all permutations?
Yes, it can be done without generating all permutations.
How Should I optimize my solution?
You can solve this problem efficiently using the DFS and connected component concepts from graph theory.
Please note that we will use the second example to explain the steps of the algorithm I am going to describe.
Step 1:
Construct a graph G with K vertices, one per position (numbered 0 to K-1).
Thus V={0,1,2}
Step 2:
Let an edge e connect two vertices whenever swapping the elements at those two positions is permissible.
Therefore the edges are: E={(0,1) , (1,0) , (1,2) , (2,1)}
Step 3:
Find all the Connected Components(CC) of this graph G(V,E).
In example 2:
All the CC are:
CC1: {0, 1, 2}
Step 4:
For each of the connected components, sort the elements available within that connected component in such a way that smallest index within the connected component gets smallest element, second smallest index gets second smallest element, etc.
In example 2:
Smallest index in CC1 = 0
Second smallest index in CC1 = 1
Third smallest index in CC1 = 2
The smallest index 0 in CC1 gets the smallest element. Smallest element = 1.
The second smallest index in CC1 gets the second smallest element. Second smallest element = 2.
The third smallest index in CC1 gets the third smallest element. Third smallest element = 3.
Thus, the result after sorting CC1 as per the above rule is (1,2,3).
When Step 4 is done for all connected components, we have the lowest possible permutation.
Therefore, 1 2 3 is the lexicographically smallest permutation in example 2.
Pseudo-Code(or Code) will be highly helpful.
As I had already described the logic, here is the Code in C++:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

const int LIMIT = 109;                   // K is less than 101

vector<int> TMP_IP;                      // the permutation being rearranged
char Adjacency[LIMIT][LIMIT];            // exchange matrix of 'Y' / 'N'
vector<vector<int> > ADJ_vector(LIMIT);  // adjacency lists built from the matrix
int MARKED[LIMIT];                       // visited flags for the DFS
vector<int> connected_COMPONENTS;        // indices of the current component
vector<int> Positions_vector;            // values stored at those indices

// Collect every index reachable from u into connected_COMPONENTS.
void DepthFirstTraversal(int u)
{
    MARKED[u] = 1;
    connected_COMPONENTS.push_back(u);
    for (size_t j = 0; j < ADJ_vector[u].size(); ++j)
        if (!MARKED[ADJ_vector[u][j]])
            DepthFirstTraversal(ADJ_vector[u][j]);
}

// Print result
void lexo_smallest(int K)
{
    for (int i = 0; i < K; ++i)
        cout << TMP_IP[i] << " ";
    cout << endl;
}

int main()
{
    int K, ip;
    string rows[LIMIT];
    cin >> K;
    for (int i = 0; i < K; ++i)
    {
        cin >> ip;
        TMP_IP.push_back(ip);
    }
    for (int i = 0; i < K; ++i)
        cin >> rows[i];
    for (int i = 0; i < K; ++i)
        for (size_t j = 0; j < rows[i].size(); ++j)
            Adjacency[i][j] = rows[i][j];
    // Build adjacency lists from the 'Y' entries.
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            if (Adjacency[i][j] == 'Y')
                ADJ_vector[i].push_back(j);
    for (int i = 0; i < K; ++i)
    {
        if (!MARKED[i])
        {
            DepthFirstTraversal(i);
            for (size_t x = 0; x < connected_COMPONENTS.size(); ++x)
                Positions_vector.push_back(TMP_IP[connected_COMPONENTS[x]]);
            // The smallest index in the component receives the smallest value, and so on.
            sort(connected_COMPONENTS.begin(), connected_COMPONENTS.end());
            sort(Positions_vector.begin(), Positions_vector.end());
            for (size_t x = 0; x < connected_COMPONENTS.size(); ++x)
                TMP_IP[connected_COMPONENTS[x]] = Positions_vector[x];
            connected_COMPONENTS.clear();
            Positions_vector.clear();
        }
    }
    lexo_smallest(K);
    return 0;
}
Complexity of the above solution:
The total time for taking input is O(K^2).
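The complexity of the algorithm itself is the same as DFS, O(V+E), plus O(K log K) in total for sorting within the components.
Overall time: O(K^2) + O(V+E) + O(K log K), which is O(K^2) since E <= K^2.
Even for K=5000, the above solution is fast (provided LIMIT is raised accordingly).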
In this approach you first need to form a new matrix in which a(i,j) being Y means that the element at the jth position can be moved to the ith position.
This can be done easily by applying the Floyd-Warshall algorithm, after replacing every Y with 1 and every N with infinity (or a very large number). After applying Floyd-Warshall, if a(i,j) is less than infinity then the element at position j can be placed at position i.
Now the task is easy. Choose the elements greedily: go sequentially from position 1 to the end, and for each i find the position j >= i holding the minimum value for which a(i,j) is Y (i.e. less than infinity), then swap the elements at i and j.
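A compact Python sketch of this approach. Instead of numeric distances it computes a boolean transitive closure (Warshall's variant of the same triple loop), which is enough here because only reachability matters. The function name is hypothetical, and the search is restricted to positions j >= i so that already-placed elements are not disturbed.

```python
def smallest_permutation(perm, can_swap):
    """perm: list of ints; can_swap: list of strings of 'Y'/'N' characters."""
    k = len(perm)
    # reach[i][j]: the element at position j can be moved to position i.
    reach = [[i == j or can_swap[i][j] == "Y" for j in range(k)] for i in range(k)]
    for m in range(k):  # transitive closure through intermediate position m
        for i in range(k):
            if reach[i][m]:
                for j in range(k):
                    if reach[m][j]:
                        reach[i][j] = True
    result = list(perm)
    for i in range(k):
        # Smallest value among the not-yet-fixed positions reachable from i.
        best = min((j for j in range(i, k) if reach[i][j]), key=lambda j: result[j])
        result[i], result[best] = result[best], result[i]
    return result


print(smallest_permutation([3, 1, 2], ["NNY", "NNN", "YNN"]))  # [2, 1, 3]
print(smallest_permutation([3, 2, 1], ["NYN", "YNY", "NYN"]))  # [1, 2, 3]
```

The closure costs O(K^3), which is small for K below 101, but the connected-component approach is the better fit for much larger K.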
You can do it on a location-by-location basis.
For the left-most location, find the smallest number that can reach it and swap it into location 1. This can be done by checking whether a path exists from that number's location to the first location. Then do the same for the next location, and so on.

Resources