I came across a solution that uses Patience sorting to obtain the length of the Longest Increasing Subsequence (LIS): http://www-stat.stanford.edu/~cgates/PERSI/papers/Longest.pdf, and here: http://en.wikipedia.org/wiki/Patience_sorting.
The proof that following the greedy strategy actually gives us the length correctly has two parts:
1) proves that the number of piles is at least the length of the LIS.
2) proves that the number of piles produced by the greedy strategy is at most the length of the LIS.
Thus by virtue of both 1) and 2), the solution gives the length of LIS correctly.
I get the explanation for 1), but I just cannot intuitively see why part 2) is true. Could someone maybe use a different example to convince me that this is indeed the case? A different proof technique would also be fine.
I just read over the paper and I agree that the proof is a bit, um, terse. (I'd say that it's missing some pretty important steps!)
Intuitively, the idea behind the proof is to show that if you play with the greedy strategy and at the end of the game pick any card in a pile numbered p, you can find an increasing subsequence in the original array whose length is p. If you can prove this fact, then you can conclude that the number of piles produced by the greedy strategy is at most the length of the longest increasing subsequence, which is exactly part 2).
To formally prove this, we're going to argue that the following two invariants hold at each step:
The top cards in each pile, when read from left to right, are in sorted order.
At any point in time, every card in every pile is part of an increasing subsequence whose length is given by the pile index.
Part (1) is easy to see from the greedy strategy - every element is placed as far to the left as possible without violating the rule that smaller cards must always be placed on top of larger cards. This means that if a card is put into pile p, we are effectively taking a sorted sequence and reducing the value of the pth element to a value that's greater than whatever is in position p - 1 (if it exists).
To see part (2), we'll go inductively. The first placed card is put into pile 1, and it's also part of an increasing subsequence of length 1 (the card by itself). For the inductive step, assume that this property holds after placing n cards and consider the (n+1)st. Suppose that it ends up in pile p. If p = 1, then the claim still holds because this card forms an increasing subsequence of length 1 all by itself. Otherwise, p > 1. Now, look at the card on top of pile p - 1. We know that this card's value is less than the value of the card we just placed, since otherwise we would have placed the card on top of that pile. We also know that the card on top of that pile precedes the card we just placed in the original ordering, since we're playing the cards in order. By our existing invariant, we know that the card on top of pile p - 1 is part of an increasing subsequence of length p - 1, so that subsequence, with this new card added into it, forms an increasing subsequence of length p, as required.
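To make the inductive argument concrete, here is a minimal Python sketch (the helper name patience_piles_with_witness and the bookkeeping arrays are my own, not from the paper). It plays the greedy strategy, records for each card which card was on top of the previous pile at the moment it was played, and then walks those back-pointers from the top of the last pile to recover an increasing subsequence whose length equals the number of piles, which is invariant (2) in action:

from bisect import bisect_left

def patience_piles_with_witness(cards):
    tops = []       # tops[p] = value currently on top of pile p (invariant 1: kept sorted)
    prev_top = []   # prev_top[p] = index (into 'placed') of the card on top of pile p
    placed = []     # every card, in the order it was played
    back = []       # back[i] = index of the card that was on top of the previous pile
    for x in cards:
        p = bisect_left(tops, x)          # leftmost pile whose top is >= x
        if p == len(tops):                # no such pile: start a new one
            tops.append(x)
            prev_top.append(None)
        else:
            tops[p] = x
        back.append(prev_top[p - 1] if p > 0 else None)
        prev_top[p] = len(placed)
        placed.append(x)
    if not tops:
        return 0, []
    # Walk back-pointers from the top card of the last pile (invariant 2).
    seq, i = [], prev_top[-1]
    while i is not None:
        seq.append(placed[i])
        i = back[i]
    return len(tops), seq[::-1]           # (number of piles, an increasing subsequence of that length)

For example, patience_piles_with_witness([3, 1, 4, 1, 5, 9, 2, 6]) should return (4, [1, 4, 5, 6]): four piles, and an increasing subsequence of length four.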
Given an array of integers and a threshold value, determine the maximum sum of any subsequence of the array that is less than or equal to the threshold. For all but at most 15 of the elements, either array[i] >= 2*array[j] or array[j] >= 2*array[i] where j != i.
The threshold can be up to 10^17, the length of the array can be up to 60, and array[i] can be up to 10^16.
Here the threshold is too high, so we cannot solve it with the normal knapsack method. I tried dividing the array into three parts, generating the lists of possible sums of each part by brute-force backtracking, and then merging the three lists to find the result. But I think there could be a better way of doing this.
This problem has been carefully set up so that all usual approaches will run out of space. You have to use the hint.
Step 1: sort the array in descending order and divide it into up to 15 "weird ones" and a chain of elements such that b1 >= 2*b2, b2 >= 2*b3 and so on.
You do that by taking the largest element into your chain, then putting elements into the weird array until you find one that is at most half the size of the last chain element; add that one to the chain, keep putting the rest into the weird array, and so on.
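A minimal sketch of that splitting step (the name split_chain is mine, and I'm assuming the values have already been sorted in descending order):

def split_chain(values_desc):
    # values_desc: the array sorted in descending order
    chain, weird = [], []
    for v in values_desc:
        if not chain or 2 * v <= chain[-1]:
            chain.append(v)    # preserves chain[k] >= 2 * chain[k+1]
        else:
            weird.append(v)    # the claim is that at most ~15 elements land here
    return chain, weird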
Now, for each of the up to 32768 subsets of the weird ones, figure out which subset of the chain gets you closest to the threshold. Here you can use the following observation: for any chain element that you have a choice about including, either it is too big to include, or it must be included (because if you don't include it, all the smaller chain elements together sum to less than it). That gives you a maximum of 45 decision points to consider.
In other words:

for each subset of weird ones:
    for each element of the chain:
        if we can add this element:
            add it to the set we are looking at
            if sum(this set) is best so far, improve our max
return the best found.
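Here is how that pseudocode might look in Python, assuming the chain and weird arrays from the split sketched above (a sketch under those assumptions, not a tuned implementation):

def best_sum(weird, chain, threshold):
    best = 0
    n = len(weird)                               # n <= 15, so 2**n <= 32768 subsets
    for mask in range(1 << n):
        total = sum(weird[i] for i in range(n) if (mask >> i) & 1)
        if total > threshold:
            continue
        # Greedy over the chain, largest first: an element either does not fit,
        # or it must be taken, because all smaller chain elements sum to less than it.
        for c in chain:
            if total + c <= threshold:
                total += c
        best = max(best, total)
    return best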
I'm studying for my exam coming up and I am practicing a problem that wants me to implement a greedy algorithm.
I am given an unsorted array of different weights where 0 < weight_i for all i. I have to place all of them such that I use the least number of piles. I can not place two weights in a pile where the one on top is greater than the one below. I also have to respect the ordering of the weights, so they must be placed in order. There is no height limit for the pile.
An example: If I have the weights {53, 21, 40, 10, 18} I cannot place 40 above 21 because the pile must be in descending order, and I cannot place 21 above 40 because that does not respect the order. An optimal solution would have pile 1: 53, 21, 10 and pile 2: 40, 18.
My general solution is to iterate through the array and always pick the first pile the weight is allowed to go on. I believe this gives an optimal solution (although I haven't proved it yet); I could not find a counterexample to it. But this would be O(n^2), because in the worst case I have to check every pile for every element (I think).
My question is, is there a way to get this down to O(n) or O(nlogn)? If there is I'm just not seeing it and need some help.
Your algorithm will give a correct result.
Now note the following: when visiting the piles in order and stopping at the first one where the next value can be stacked, you will always have a situation where the stacks are ordered by their current top (last) value, in ascending order.
You can use this property to avoid iterating over the piles from left to right. Instead, use a binary search among the pile tops to find the first pile that can take the next value.
This gives you O(n log n) time complexity.
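For illustration, a small Python sketch of that idea (the function name assign_piles is mine); it keeps the pile tops in a sorted list and binary-searches for the leftmost pile whose top is at least the next weight:

from bisect import bisect_left

def assign_piles(weights):
    tops = []                        # current top of each pile, in ascending order (see above)
    piles = []
    for w in weights:
        idx = bisect_left(tops, w)   # leftmost pile whose top is >= w
        if idx == len(tops):         # no existing pile can take w: start a new one
            tops.append(w)
            piles.append([w])
        else:
            tops[idx] = w
            piles[idx].append(w)
    return piles

With the example above, assign_piles([53, 21, 40, 10, 18]) gives [[53, 21, 10], [40, 18]], i.e. two piles.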
Believe it or not, the problem you describe is equivalent to computing the length of the longest increasing subsequence. There's a neat little greedy idea as to why.
Consider the longest increasing subsequence (LIS) of the array. Because the elements are ascending in index and also ascending in value, they must all be in different piles. As a result the minimum number of piles needed is equal to the number of elements in the LIS.
LIS is easily solvable in O(NlogN) using dynamic programming and a binary search.
Note that the algorithm you describe does the same thing as the algorithm below - it finds the first pile you can put the item on (with binary search), or it creates a new pile, so this serves as a "proof" of correctness for your algorithm and a way to reduce your complexity.
Let dp[i] be the minimum possible value of the last element of an increasing subsequence of length (i + 1). To reframe it in terms of your question, dp[i] is also the weight of the stone on top of the ith pile.
from bisect import bisect_left
def lengthOfLIS(nums):
    arr = []                              # arr[i] = smallest tail of an increasing subsequence of length i + 1
    for i in range(len(nums)):
        idx = bisect_left(arr, nums[i])   # first position whose value is >= nums[i]
        if idx == len(arr):
            arr.append(nums[i])           # nums[i] extends the longest subsequence found so far
        else:
            arr[idx] = nums[i]            # nums[i] becomes a smaller tail for length idx + 1
    return len(arr)
I have a problem similar to Huffman encoding; I'm not sure exactly how it can be solved, or whether it is a reverse of Huffman encoding, but it definitely can be solved using a greedy approach.
Consider a set of lengths, each associated with a probability, i.e.
X={a1=(100,1/4),a2=(500,1/4),a3=(200,1/2)}
Obviously, the sum of all the probabilities = 1.
Arrange the lengths together on a line one after the other from a starting point.
For example: {a2,a1,a3} in that order from start to finish.
Define the cost of an element a_i as the total length from the starting point to the end of this element, multiplied by its probability.
So from the previous arrangement:
cost(a2) = (500)*(1/4)
cost(a1) = (500+100)*(1/4)
cost(a3) = (500+100+200)*(1/2)
Define the total cost as the sum of all costs, e.g. cost(X) = cost(a2) + cost(a1) + cost(a3). Give an algorithm that finds an arrangement that minimizes cost(X).
I've tried forming some alternative Huffman trees, but it doesn't work.
Sorting by probability will fail (consider X={(100,0.4),(300,0.6)}).
Sorting by length will also fail (consider X={(100,0.1),(300,0.9)}).
If anyone can help or hint towards an optimal solution algorithm, it would be great.
Consider what happens if you swap two adjacent elements. The calculations before and after the two elements are the same, so it just depends on the two elements.
Taking the two elements in isolation, the two possible costs are P1*L1 + P2*(L1 + L2) and P2*L2 + P1*(L1 + L2). Subtracting and simplifying (if I have the algebra right), the difference is P2*L1 - P1*L2, so element 1 should go first when L1/P1 < L2/P2. As a check, this at least gets the right answer when L1 = 0.
So I think you want to sort the elements into increasing order of Li/Pi, because if that is not the case you can improve the answer by swapping adjacent elements.
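A short Python sketch of that rule (min_cost_arrangement is a name I made up); it sorts by L/P and evaluates the cost exactly as defined in the question:

def min_cost_arrangement(items):
    # items: list of (length, probability) pairs
    order = sorted(items, key=lambda lp: lp[0] / lp[1])   # increasing L/P
    total = 0.0
    reach = 0.0                       # distance from the start to the end of the current element
    for length, prob in order:
        reach += length
        total += reach * prob         # cost(a_i) as defined above
    return order, total

For the example X = {(100, 1/4), (500, 1/4), (200, 1/2)}, the ratios are 400, 2000 and 400, so the 500-length element goes last and the total cost comes out to 375; ties in L/P (as between the other two elements) can be broken either way without changing the cost.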
Question
A simple two-player game involves a pile of N matchsticks and two players who have alternating turns. In each turn, a player removes 1, 2 or 3 matchsticks from the pile. The player who removes the last matchstick loses the game.
A) What are the branching factor and depth of the game tree (give a general solution expressed in terms of N)? How large is the search space?
B) How many unique states are there in the game? For large N what could be done to make the search more efficient?
Answer
A) I said the branching factor would be 3, since a player can only ever remove up to 3 matches, meaning each node of the tree would usually have three children. For the second part, regarding the depth, I'm not sure.
B) N x 2, where N is the number of matches remaining. I am not sure how we could make the search more efficient, though. Maybe by introducing alpha-beta pruning?
A :
For the depth, just imagine what the longest possible game would look like. It is the game in which both players remove only 1 match in each turn. Since there are n matches, such a game would take n turns: the tree has depth n.
B :
There are only 2*N states, each of them accessible from 3 states of higher matchstick count. Since the number of matches necessarily goes down as the game goes on, the graph of possible states is a DAG (Directed Acyclic Graph). A dynamic programming method is therefore possible to analyze this game. In the end, you will see that the optimal move only depends on N mod 4, with N the number of remaining matches.
EDIT: Proof idea for the N mod 4 claim:
Every position is either a losing or a winning position. A losing position is a situation where no matter what you play, if your adversary plays optimally, you will lose. Similarly, a winning position is a situation where if you play the right moves, the adversary cannot win. N=1 is a losing position (by definition of the game). Therefore, N=2,3,4 are winning positions because by removing the right amount of matches you put the adversary in a losing position. N=5 is a losing position because no matter what admissible number of matches you remove, you put the adversary in a winning position. N=6,7,8 are winning positions ... you get the idea.
Now it is just a matter of making this proof formal: take as induction hypothesis that a position N is a losing position if and only if N mod 4 = 1. If that is true up to some integer k, you can prove that it is true for k+1. It's true up to k = 4, as we showed earlier, so by induction it is true for any N.
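A quick way to check the N mod 4 claim is a small dynamic program over the DAG of states (a sketch; losing_positions is my own name):

def losing_positions(max_n):
    # lose[n] is True when the player to move with n matches left loses under best play
    lose = [False] * (max_n + 1)
    if max_n >= 1:
        lose[1] = True                               # forced to take the last match
    for n in range(2, max_n + 1):
        # n is winning iff some move of 1, 2 or 3 leaves the opponent in a losing position
        lose[n] = not any(lose[n - k] for k in (1, 2, 3) if n - k >= 1)
    return lose

Evaluating all(lose[n] == (n % 4 == 1) for n in range(1, 101)) on losing_positions(100) should confirm the pattern.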
The state of the game at any time can be described by whose turn it is and the number of matches left in the pile. After n moves there are 3^n possible histories, but for large n there are many fewer than 3^n possible states, so you can save time by, for example, recognising that you are about to expand a state that you have already encountered and evaluated before.
See also https://en.wikipedia.org/wiki/Nim - if this is Nim, or a variety of Nim, there are efficient strategies already worked out for it.
There are two anagram strings S and P. There are two basic operations:
Swap two adjacent letters, e.g. swap "A" and "C" in BCCAB; the cost is 1.
Swap the first letter and the last letter of the string; the cost is 1.
Question: Design an efficient algorithm that minimizes the cost of changing S into P.
I tried a greedy algorithm, but I found counterexamples, so I think it is incorrect. I know the famous DP problem edit distance, but I could not come up with a recurrence for this one.
Anyone can help? An idea and pseudo code would be great.
I wonder if http://en.wikipedia.org/wiki/A*_search_algorithm would count as efficient? For a heuristic, look at the smallest distance each character has to go, treating the string as a circle, and divide the sum of these distances by two. On the circle, each character needs to participate in enough swaps to move it, one step at a time, to its destination, and each swap affects only two characters, so this heuristic is a lower bound on the number of swaps required.
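A sketch of that heuristic in Python (swaps_lower_bound is my own name; it matches each character of S to the nearest same letter in P, which only weakens the bound when letters repeat, so it stays admissible):

from collections import defaultdict

def swaps_lower_bound(s, p):
    # Assumes s and p are anagrams of each other.
    n = len(s)
    positions = defaultdict(list)
    for j, ch in enumerate(p):
        positions[ch].append(j)
    total = 0
    for i, ch in enumerate(s):
        # circular distance: the first/last swap makes positions 0 and n-1 adjacent
        total += min(min(abs(i - j), n - abs(i - j)) for j in positions[ch])
    # each swap moves only two characters, one step each
    return (total + 1) // 2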
Without the ends-swap the answer is simple: you have to get the first and last letters right, and there's no way to "save" by doing it later; hence for the word a_i with 0 <= i < n you'd "bubble" the correct a_0 and a_{n-1} into place, then repeat on the word a_i with 1 <= i < n-1, and so on until you're left with 0 or 1 letters.
With the ends-swap option, you're left with a much harder problem, since there are two directions from which each letter can arrive at its correct place. You'd basically have a bipartite graph between the source and target words, and you'd want to find a matching that minimizes the sum of distances. Even that is not really an algorithm, since each swap moves two letters, not just one.
Bottom line is, you may have to do a search, but at least you can bound the search with the no-ends-swap distance.