How to choose the next center in the longest palindrome algorithm? - algorithm

This is a question about the longest palindrome algorithm discussed here some time ago. The quoted blog post, which explains the algorithm, says: "to pick the next center, take the center of the longest palindromic proper suffix of the current palindrome". Unfortunately, they don't provide a proof and I don't really understand why the next center is the center of the longest palindromic proper suffix of the current palindrome.
Can anybody prove/explain it?

So we're moving to the right...
Say your "current" palindrome is 40 letters big. (Maybe centered on say position 100.) You're trying to find a bigger one.
(OK, there might be a bigger one that is 900 letters long, and that is 50,000 letters to the right -- totally uninvolved with this one. That's fine. But we'll get to that in the future. For now, we have to move the center to the right while looking for longer-than-40 palindromes. Makes sense?)
So we have to move to the right - we could move one step. BUT we want to move as far as possible without missing any.
Now, if the next one to the right is going to involve this one at all, it has to include the right-most letter of this group of 40. (It can't be centered further to the left, as we've already checked those centers; so it must be centered after 100, and, because it's going to be longer than 40, it must include our right-hand letter, #120.)
So how far back do we have to go?
Well, you can't move the center back (from 120) past anything that isn't itself a palindrome! If the middle isn't a palindrome, the whole thing can never be a palindrome.
3333333333333331110111
You can only go "back" as far as the 0. The 1 sitting to the left of the 0 (for example) could never center a palindrome.
So it's that simple. You have to include our rightmost letter (if we're going to include any of us at all), and, you want it to be as big as possible, and it has to be a palindrome because palindromes can only start (I mean "from the middle") with palindromes.
In the example above it's not possible that the 1 to the left of the 0, or let's say the right-most 3, could ever in this universe center a palindrome, no matter what we later find on the right. They don't have palindromes around them, so they could "never be" a palindrome center!
Note that the 3 in the middle of the 3s could possibly center a bigger palindrome .... but don't forget we've already checked this is the longest palindrome so far (based on centers, from the left), so that cannot be true.
So any palindrome that is longer than this one -- rather, the next possible starting point for a palindrome longer than this one -- is that 0.
In other words, it's simply the center of the biggest palindrome we currently have at the right. (so, not the "111" which is a palindrome but short, but the "1110111" which is the longest palindrome you can see stuck on the right.)
Indeed, the two possibilities we have to check are (A) the "0" and (B) the "1" at the second-last spot. Of course, among those two possibilities, we have to go from left to right, so (A) the "0" is indeed the next one to check.
Don't forget those two (the 0 and the 1 in question) are equivalent to saying "there's a palindrome 1110111 stuck to the end, and there's a shorter palindrome 111 stuck to the end".
Of course 1110111 is longer, so the center of 1110111 is obviously to the left of the center of 111.
The longest palindrome stuck to the right will, of course, have the center closest to the left.
So hopefully that makes clear JUST the specific part of the discussion on the linked blog which you asked about! I deliberately repeated myself in a number of ways; hopefully it helps. It's Jungian algorithms day :)
Again, please note I am specifically and only trying to clarify the very specific issue Michael asked about.
Bloody confusing eh?
BTW, I simply ignored the issue of on-character vs. between-character centers (odd- versus even-length palindromes) - but it is irrelevant to understanding what you asked about.
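To make that rule concrete, here is a brute-force Python sketch (the thread has no code, so the language and helper name are my choice; this only illustrates "pick the center of the longest palindromic proper suffix", it is not the linear-time algorithm itself):

def next_center(s):
    # s is the text up to the right edge of the current palindrome.
    # Return the index of the next candidate center: the center of the
    # longest palindrome "stuck to the right".
    n = len(s)
    for start in range(1, n):              # longest proper suffix first
        suf = s[start:]
        if suf == suf[::-1]:               # found the longest palindromic suffix
            # for an even-length suffix the true center falls between two
            # letters; like the discussion above, this sketch ignores that case
            return start + (len(suf) - 1) // 2
    return n - 1                           # degenerate case: the last letter alone

s = "3333333333333331110111"
i = next_center(s)
print(i, s[i])                             # 18 0 -> the next center is the 0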

Related

Need an explanation of Grundy numbers from Competitive Programming Handbook

I am trying to understand the example from the book https://cses.fi/book/book.pdf on page 239.
The example is described as follows:
What I don't get is what exactly, say, the number 3 next to the lower right corner means. We can move 4 steps up and 3 steps left from it, so how is it 3? Same for the 4 just above it: it doesn't correspond to any set of moves I can think of. The book in general makes a lot of leaps of logic they think are obvious, but usually I can infer what they mean after some time; here I am just lost.
The rule for computing these numbers is recursive.
You consider all the values you can reach, and then pick the smallest (non-negative) integer that is not reachable.
For example, the value in the top-left corner is 0 because no moves are possible.
For example, the value next to the lower right corner is 3 because the reachable values are 0, 4, 1, 0, 2, 1, 4, so 3 is the smallest non-negative integer not in this list.
This explains how to compute the numbers, but to understand them it is probably better to start with understanding the game of Nim. In the game of Nim, the Sprague-Grundy number of a pile is simply equal to the size of the pile.
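In code, that "smallest non-reachable value" rule (the mex, or minimum excludant) is a tiny helper; a minimal Python sketch, with the reachable values copied from the example above rather than recomputed from the book's grid:

def mex(values):
    # minimum excludant: smallest non-negative integer not in values
    values = set(values)
    g = 0
    while g in values:
        g += 1
    return g

# the square next to the lower right corner reaches values 0, 4, 1, 0, 2, 1, 4
print(mex([0, 4, 1, 0, 2, 1, 4]))   # 3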

Is there a better algorithm for finding the longest sequence of a same letter in a string?

I've been challenging myself to look at algorithms and try to change them in order to make them the fastest I can. Recently I tried an algorithm which searches for the longest sequence of any single letter in a string. The naive answer looks at all letters, and when the current sequence is bigger than the biggest sequence found, the current becomes the new biggest. Example:
With C for current sequence and M for maximum sequence, order of letters checked and variables updates goes like this:
AAAACCDDD -> A(C=1,M=1) -> A(C=2,M=2) -> A(C=3,M=3) -> A(C=4,M=4) -> C(C=1,M=4) -> C(C=2,M=4) -> D(C=1,M=4) -> D(C=2,M=4) -> D(C=3,M=4). Answer: 4. It can be made faster by stopping when there is no way to get a new biggest sequence given M, the place you are in the string, and the string size.
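For reference, the naive scan with that early stop might look like this in Python (the question names no language, so this is just an illustration):

def longest_run_naive(s):
    # straightforward scan; stop early once the characters left in the
    # string cannot possibly beat the best run found so far
    best, cur, prev = 0, 0, None
    for i, ch in enumerate(s):
        cur = cur + 1 if ch == prev else 1
        prev = ch
        if cur > best:
            best = cur
        if best >= cur + (len(s) - 1 - i):   # nothing ahead can beat best
            break
    return best

print(longest_run_naive("AAAACCDDD"))   # 4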
I've tried and came up with an algorithm which usually accesses fewer elements of the string. I think it will be easier to explain like this:
Instead of jumping 1 by 1, you jump as far as would be necessary to get a new biggest sequence if all the letters across the jump were the same. So for example after you read AAAB, you would jump 3 spots, because you suppose the next 3 letters are all B (AAABBBB). Of course they might not be, and that is why you now go backwards, counting consecutive B's right behind your position. Your next "jump" will be shorter depending on how many B's you've found. So for instance, in
AAABCBBBBD, after the jump you are on the third B. You go backwards and find one B, go backwards again and, finding a C, you stop. Now you already know you have a sequence of 2, so your next jump can't be of 3 - you might miss a sequence of 4 B's. So you jump 2 and get to a B. Go backwards one and find a B. The next backwards position is where you started, so you know that you found a sequence of 4.
In that example it didn't make much of a difference, but if you use instead a string like AAABBBBCDDECEE you can see that after you jumped from the first C to the last C you would only need to backtrack once, because after seeing that the letter behind you is an E you don't care anymore about what was across that jump.
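To make the jump-and-backtrack idea concrete, here is one possible Python sketch of it (my own reconstruction from the description above, so details such as remembering the previously counted block are assumptions):

def longest_run_skip(s):
    n = len(s)
    best = 0
    i = 0                              # position we are standing on
    blk_start, blk_end = 0, -1         # last fully counted block of equal letters
    while i < n:
        j = i
        while j > 0:
            if j - 1 == blk_end and s[blk_end] == s[i]:
                j = blk_start          # reuse the block counted on the last step
                break
            if s[j - 1] != s[i]:
                break
            j -= 1
        run = i - j + 1                # length of the run of s[i] ending at i
        best = max(best, run)
        blk_start, blk_end = j, i      # remember this block for next time
        i = j + best                   # first position that could end a longer run
    return best

print(longest_run_skip("AAABCBBBBD"))      # 4
print(longest_run_skip("AAABBBBCDDECEE"))  # 4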
I've coded both methods and that second one has been 2 to 3 times faster. Now I'm really curious to know, is there a faster way to find it?

anagram string edit distance algorithm/code?

There are two anagram strings S and P. There are two basic operations:
Swap two neighboring (adjacent) letters, e.g., swap "A" and "C" in BCCAB; the cost is 1.
Swap the first letter and the last letter in the string, cost is 1.
Question: Design an efficient algorithm that minimizes the cost of changing S to P.
I tried a greedy algorithm, but I found counterexamples, so I think it is incorrect. I know the famous DP problem edit distance, but I could not come up with the recurrence for this one.
Can anyone help? An idea and pseudocode would be great.
I wonder if http://en.wikipedia.org/wiki/A*_search_algorithm would count as efficient? For a heuristic, look for the smallest distance each character has to go, treating the string as a circle, and divide the sum of these distances by two. On the circle, each character needs to participate in enough swaps to move it, one step at a time, to its destination, and each swap affects only two characters, so this heuristic should be a lower bound to the number of swaps required.
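A Python sketch of that heuristic (how to handle repeated letters is my assumption: each letter of S is matched to its nearest occurrence of the same letter in P, which keeps the value a lower bound since S and P are anagrams):

def heuristic(s, p):
    # sum of circular distances each letter still has to travel, halved;
    # both swap operations move two letters by one circular step each
    n = len(s)
    total = 0
    for i, ch in enumerate(s):
        d = min(
            min(abs(i - j), n - abs(i - j))   # distance around the circle
            for j, c in enumerate(p) if c == ch
        )
        total += d
    return (total + 1) // 2                   # round up; still a lower bound

print(heuristic("BCCAB", "BCACB"))   # 1: one adjacent swap is indeed enough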
Without the ends-swap the answer is simple: you have to get the first and last letter right, and there's no way to "save" by doing it later; hence for the word a[0..n-1] you'd "bubble" the correct a[0] and a[n-1] into place, then repeat for the subword a[1..n-2] until you're left with 0 or 1 letters.
With the ends-swap option, you're left with much harder problem, since there are two directions where each letter can arrive in the correct place. You'd basically have a bipartite graph between source and target word, and you'd want to find a matching that minimizes the sum of distances. Even that is not really an algorithm, since each swap moves two of the letters, not just one.
Bottom line is, you may have to do a search, but at least you can bound the search with the no-ends-swap distance.
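A possible Python rendering of that no-ends-swap bound (my own reconstruction of the "bubble both ends, then recurse on the middle" idea; picking the nearest occurrence when a letter repeats is an assumption):

def no_end_swap_distance(s, p):
    # adjacent swaps only; assumes s and p are anagrams
    s = list(s)
    lo, hi = 0, len(s) - 1
    cost = 0
    while lo < hi:
        # bubble the nearest occurrence of p[lo] to the front of the window
        j = s.index(p[lo], lo, hi + 1)
        while j > lo:
            s[j - 1], s[j] = s[j], s[j - 1]
            j -= 1
            cost += 1
        # bubble the nearest (rightmost) occurrence of p[hi] to the back
        k = hi - s[lo:hi + 1][::-1].index(p[hi])
        while k < hi:
            s[k], s[k + 1] = s[k + 1], s[k]
            k += 1
            cost += 1
        lo += 1
        hi -= 1
    return cost

print(no_end_swap_distance("BCCAB", "BCACB"))   # 1, the single swap from the question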

"Charge changing" algorithm

First of all, I'm not sure how to name this problem. If anyone has a better idea, feel free to change it or tell me so that I can.
Let's say I have two strings s1, s2 containing '+' and '-', which stand for positive and negative charges.
s1 is our beginning input, s2 is the pattern we want to get from s1. Our only operation is that we can change a charge into its opposite. But when we do so, not only the chosen charge is changed but also the charges next to the one we choose (left and right; the first and last characters are the exception, since one of them has no left neighbour and the other no right neighbour).
1. When is it not possible to get from s1 to s2?
2. How do we find the minimum number of charge changes needed to transform s1 into s2?
For point 1, I believe the only impossible case is when the string has length 2 and the total number of '+' (or '-') characters is odd. For instance:
in:"+-"
pattern:"++"
Otherwise I believe it's possible, but a proof would be appreciated. As for point 2, I have no idea; any hints are welcome.
Your intuition for when the problem is solvable isn't quite right. Half of all instances are insoluble whenever n ≡ 2 (mod 3). One way to see this is by doing a few steps of reducing the appropriate system of equations (mod 2). Another way to see that there's some redundancy is to note that flipping the first, fourth, seventh, ..., (n-1)st positions affects exactly the same set of characters as flipping the second, fifth, eighth, ..., nth.
As for an algorithm for solving these problems: There are two possible choices for the first flip. Once you've decided whether to flip around the first character, the value of the first character tells you whether you need to flip around the second character. Then the value of the second character tells you whether to flip around the third character. And so forth. So just try both possibilities. If neither one works, the problem's insoluble; if one works, report it; if both work, report the one that required fewer flips.
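A Python sketch of that try-both-first-flips idea (the second test string below is made up for illustration):

def min_flips(s1, s2):
    # a "flip at i" toggles position i together with its existing neighbours
    n = len(s1)
    opposite = {'+': '-', '-': '+'}

    def flip(s, i):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                s[j] = opposite[s[j]]

    best = None
    for flip_first in (False, True):       # the only free choice
        s, flips = list(s1), 0
        if flip_first:
            flip(s, 0)
            flips += 1
        for i in range(1, n):
            # s[i-1] cannot be changed by any later flip, so flipping at i
            # is forced by whether s[i-1] already matches the target
            if s[i - 1] != s2[i - 1]:
                flip(s, i)
                flips += 1
        if ''.join(s) == s2:
            best = flips if best is None else min(best, flips)
    return best                            # None means the instance is insoluble

print(min_flips("+-", "++"))      # None: the insoluble example from the question
print(min_flips("+--+", "-++-"))  # 2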

The perverse hangman problem

Perverse Hangman is a game played much like regular Hangman with one important difference: The winning word is determined dynamically by the house depending on what letters have been guessed.
For example, say you have the board _ A I L and 12 remaining guesses. Because there are 13 different words ending in AIL (bail, fail, hail, jail, kail, mail, nail, pail, rail, sail, tail, vail, wail) the house is guaranteed to win because no matter what 12 letters you guess, the house will claim the chosen word was the one you didn't guess. However, if the board was _ I L M, you have cornered the house as FILM is the only word that ends in ILM.
The challenge is: Given a dictionary, a word length & the number of allowed guesses, come up with an algorithm that either:
a) proves that the player always wins by outputting a decision tree for the player that corners the house no matter what
b) proves the house always wins by outputting a decision tree for the house that allows the house to escape no matter what.
As a toy example, consider the dictionary:
bat
bar
car
If you are allowed 3 wrong guesses, the player wins with the following tree:
Guess B
NO -> Guess C, Guess A, Guess R, WIN
YES-> Guess T
NO -> Guess A, Guess R, WIN
YES-> Guess A, WIN
This is almost identical to the "how do I find the odd coin by repeated weighings?" problem. The fundamental insight is that you are trying to maximise the amount of information you gain from your guess.
The greedy algorithm to build the decision tree is as follows:
- for each guess, choose the one for which the split between "true" answers and "false" answers is as close to 50-50 as possible, as information-theoretically this gives the most information.
Let N be the size of the set, A be the size of the alphabet, and L be the number of letters in the word.
So put all your words in a set. For each letter position, and for each letter in your alphabet count how many words have that letter in that position (this can be optimised with an additional hash table). Choose the count which is closest in size to half the set. This is O(L*A).
Divide the set in two, taking the subset which has this letter in this position as one branch and the rest as the other, and make those the two branches of the tree. Repeat for each subset until you have the whole tree. In the worst case this will require O(N) steps, but with a nice dictionary it will lead to O(log N) steps.
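A rough Python sketch of that greedy splitting step (the helper names and the nested-dict tree representation are my own choices, and this only builds the tree - it doesn't check it against the allowed number of guesses):

from collections import Counter

def best_split(words):
    # among all (position, letter) tests, pick the one whose yes/no split
    # of the remaining word set is closest to 50-50
    n = len(words)
    counts = Counter((i, w[i]) for w in words for i in range(len(w)))
    return min(counts, key=lambda key: abs(counts[key] - n / 2))

def build_tree(words):
    # recursively split the word set until singletons remain
    if len(words) <= 1:
        return words[0] if words else None
    pos, letter = best_split(words)
    yes = [w for w in words if w[pos] == letter]
    no = [w for w in words if w[pos] != letter]
    if not yes or not no:              # the test doesn't split the set; stop
        return words
    return {(pos, letter): {'yes': build_tree(yes), 'no': build_tree(no)}}

print(build_tree(["bat", "bar", "car"]))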
This isn't strictly an answer, since it doesn't give you a decision tree, but I did something very similar when writing my hangman solver. Basically, it looks at the set of words in its dictionary that match the pattern and picks the most common letter. If it guesses wrong, it eliminates the largest number of candidates. Since there's no penalty to guessing right in hangman, I think this is the optimal strategy given the constraints.
So with the dictionary you gave, it would first guess a correctly. Then it would guess r, also correctly, then b (incorrect), then c.
The problem with perverse hangman is that you always guess wrong if you can guess wrong, but that's perfect for this algorithm since it eliminates the largest set first. As a slightly more meaningful example:
Dictionary:
mar
bar
car
fir
wit
In this case it guesses r incorrectly first and is left with just wit. If wit were replaced in the dictionary with sir, then it would guess r correctly, then a incorrectly, eliminating the larger set, then s or f at random incorrectly, followed by the other for the final word with only 1 incorrect guess.
So this algorithm will win if it's possible to win, though you have to actually run through it to see if it does win.
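For what it's worth, the letter-picking step of that strategy might look like this in Python (a sketch; candidates is assumed to already be filtered down to the words matching the current pattern):

from collections import Counter

def next_guess(candidates, guessed):
    # guess the not-yet-guessed letter that appears in the most remaining
    # words, so a wrong answer eliminates the largest number of candidates
    counts = Counter(
        letter
        for word in candidates
        for letter in set(word)        # count each word at most once per letter
        if letter not in guessed
    )
    return counts.most_common(1)[0][0]

words = ["mar", "bar", "car", "fir", "wit"]
print(next_guess(words, set()))        # 'r', as in the example above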
