Fewest toggles to create an alternating chain - algorithm

I'm trying to solve this problem on SPOJ : http://www.spoj.pl/problems/EDIT/
I'm trying to find a decent recursive description of the algorithm, but my thoughts keep going in circles. Can you help me out with this one? I'll describe the approach I'm taking.
Basically I want to solve a problem of size j-i, where i is the starting index and j is the ending index. There should be two cases: if j-i is even, the starting and ending letters must have the same case; if j-i is odd, they must have opposite cases. I also want to reduce the problem to one of smaller size (j-i-1 or j-i-2), but I feel that if I know a solution to a smaller problem, then constructing a solution to a slightly bigger problem must also take into account the starting and ending letter cases of the smaller problem. This is exactly where I'm getting confused. Can you put my thoughts on the right track?

I think recursion is not the best way to go with this problem. It can be solved quite fast if we take a different approach!
Let us consider binary strings. Say an uppercase char is 1 and a lowercase one is 0. For example
AaAaB -> 10101
ABaa -> 1100
a -> 0
A "correct" alternating chain is either 10101010... or 01010101...
We call the minimum number of substitutions required to change one string into the other the Hamming distance between the strings. What we have to find is the minimum Hamming distance between the input binary string and one of the two alternating chains of the same length.
It's not difficult: we XOR the input with each alternating chain and then count the number of 1s in the result. For example, let's consider the following string: ABaa.
We convert it in binary:
ABaa -> 1100
We generate the only two alternating chains of length 4:
1010
0101
We XOR them with the input:
1100 XOR 1010 = 0110
1100 XOR 0101 = 1001
We count the 1s in the results and take the minimum. In this case, it's 2.
I coded this procedure in Java with some minor optimizations (buffered I/O, no real need to generate the alternating chains) and it got accepted, running in 0.60 seconds.
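The XOR procedure above can be sketched in a few lines of Python. This is only an illustration, not the accepted Java submission; the function name min_toggles is made up here, and the input is assumed to contain letters only.

```python
def min_toggles(s: str) -> int:
    """Minimum number of case toggles to turn s into an alternating chain."""
    n = len(s)
    if n == 0:
        return 0
    # Uppercase -> 1, lowercase -> 0, packed into one integer.
    bits = int("".join("1" if ch.isupper() else "0" for ch in s), 2)
    # Alternating chain that starts with 1 (1010...).
    chain_a = int("".join("1" if i % 2 == 0 else "0" for i in range(n)), 2)
    # The other chain (0101...) is the bitwise complement within n bits.
    chain_b = chain_a ^ ((1 << n) - 1)
    # Hamming distance = number of 1s in the XOR.
    dist_a = bin(bits ^ chain_a).count("1")
    dist_b = bin(bits ^ chain_b).count("1")
    return min(dist_a, dist_b)


print(min_toggles("ABaa"))  # 2
```

Packing the string into a single integer keeps the sketch close to the description; for very long inputs a character-by-character count avoids building the large integers.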

Given any string s of length n, there are only two possible "alternating chains".
These two variants are fully determined by the case of the first letter (if the first is upper then the second is lower, the third is upper, and so on).
A simple linear algorithm would be to make 2 simple assumptions about the first letter:
First letter is UpperCase
First letter is LowerCase
For each assumption, run a simple edit distance algorithm and you are done.

You can do it recursively, but you'll need to pass and return a lot of state information between functions, which I think is not worthwhile when this problem can be solved by a simple loop.
As the others say, there are two possible "desired result" strings: one starts with an uppercase letter (let's call it result_U) and one starts with a lowercase letter (result_L). We want the smaller of EditDistance(input, result_U) and EditDistance(input, result_L).
Also observe that, to calculate EditDistance(input, result_U), we do not need to generate result_U, we just need to scan input 1 character at a time, and each character that is not the expected case will need 1 edit to make it the correct case, i.e. adds 1 to the edit distance. Ditto for EditDistance(input, result_L).
Also, we can combine the two loops so that we scan input only once. In fact, this can be done while reading each input string.
A naive approach would look like this:
Pseudocode:
    EditDistance_U = 0
    EditDistance_L = 0
    Read a character
        To arrive at result_U, does this character need editing?
            Yes => EditDistance_U += 1
            No => Do nothing
        To arrive at result_L, does this character need editing?
            Yes => EditDistance_L += 1
            No => Do nothing
    Loop until end of string
    EditDistance = min(EditDistance_U, EditDistance_L)
There are obvious optimizations that can be done to the above also, but I'll leave it to you.
Hint 1: Do we really need 2 conditionals in the loop? How are they related to each other?
Hint 2: What is EditDistance_U + EditDistance_L?
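For reference, the naive pseudocode translates almost line by line into Python. This sketch deliberately keeps both conditionals, so the optimizations hinted at above are still left open; the function name edit_distances is an assumption made for the example.

```python
def edit_distances(s):
    """Return (EditDistance_U, EditDistance_L) for the string s."""
    dist_u = 0  # edits needed to reach result_U (starts with uppercase)
    dist_l = 0  # edits needed to reach result_L (starts with lowercase)
    for i, ch in enumerate(s):
        upper_expected_in_u = (i % 2 == 0)
        # To arrive at result_U, does this character need editing?
        if ch.isupper() != upper_expected_in_u:
            dist_u += 1
        # To arrive at result_L, does this character need editing?
        if ch.isupper() != (not upper_expected_in_u):
            dist_l += 1
    return dist_u, dist_l


print(min(edit_distances("ABaa")))  # 2
```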


Check if string includes part of Fibonacci Sequence

What approach should I take to write an algorithm that finds out whether a Fibonacci sequence exists in a given string?
The string includes only digits with no whitespaces and there may be more than one sequence, I need to find all of them.
If, as your comment says, the first number must have fewer than 6 digits, you can simply search for all positions where one of the 25 Fibonacci numbers occurs (there are only 25 with fewer than 6 digits) and then try to expand each one-number sequence in both directions.
After your update:
You can even speed things up when you are only looking for sequences of at least 3 numbers.
Prebuild all 25 three-number strings that start with one of the first 25 Fibonacci numbers; this should give far fewer matches than searching for the single Fibonacci numbers as I suggested above.
Then search for them as described above and try to expand the three-number sequences you find.
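A minimal sketch of the prebuilt-triples idea, assuming the sequence starts 1, 1, 2, ... and that the first number of a triple has fewer than 6 digits. The names triple_strings and find_triples are invented for the example, and expanding each hit to a longer run is left out.

```python
def triple_strings(limit=100_000):
    """Concatenated triples F(i) F(i+1) F(i+2) for every F(i) below limit."""
    triples = []
    a, b = 1, 1
    while a < limit:
        triples.append(f"{a}{b}{a + b}")
        a, b = b, a + b
    return triples


def find_triples(text, limit=100_000):
    """Return sorted (position, triple) pairs for every triple found in text."""
    hits = set()
    for triple in triple_strings(limit):
        start = text.find(triple)
        while start != -1:
            hits.add((start, triple))
            start = text.find(triple, start + 1)
    return sorted(hits)


print(len(triple_strings()))        # 25
print(find_triples("98715972584"))  # [(0, '98715972584')]
```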
Here's how I would approach this.
The main algorithm could search for triplets then try to extend them to as long a sequence as possible.
This leaves us with the subproblem of finding triplets. So if you are scanning through a string to look for fibonacci numbers, one thing you can take advantage of is that the next number must have the same number of digits or one more digit.
e.g. if you have the string "987159725844" and are considering "[987]159725844" then the next thing you need to look at is "987[159]725844" and "987[1597]25844". Then the next part you would find is "[2584]4" or "[25844]".
Once you have the 3 numbers A, B and C, you can check whether they satisfy the Fibonacci recurrence, C - B == A. If they do, you can then check whether they come from the Fibonacci sequence by seeing if the ratio is roughly 1.6 and then running the Fibonacci iteration backwards down to the initial conditions 1,1.
The overall algorithm would then work by scanning through looking for all triples starting with width 1, then width 2, width 3 up to 6.
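The triple scan can be sketched as follows. It relies on the observation above that the second number has the same width as the first or one digit more, and it tests C - B == A by checking that the text continues with the digits of A + B. The function names and the max_width parameter are assumptions for illustration, and the ratio test is skipped because running the iteration backwards already decides membership.

```python
def is_consecutive_fib(a, b):
    """True if (a, b) are consecutive Fibonacci numbers, by stepping backwards to 1,1."""
    while a < b:
        a, b = b - a, a
    return (a, b) == (1, 1)


def find_fib_triples(text, max_width=6):
    """Return (start, A, B, C) for every Fibonacci triple written out in text."""
    found = []
    for start in range(len(text)):
        for wa in range(1, max_width + 1):
            a_str = text[start:start + wa]
            if len(a_str) < wa or a_str[0] == "0":
                continue
            a = int(a_str)
            # B has as many digits as A, or one more.
            for wb in (wa, wa + 1):
                b_start = start + wa
                b_str = text[b_start:b_start + wb]
                if len(b_str) < wb or b_str[0] == "0":
                    continue
                b = int(b_str)
                # C must be A + B and must follow B directly in the text.
                if text.startswith(str(a + b), b_start + wb) and is_consecutive_fib(a, b):
                    found.append((start, a, b, a + b))
    return found
```

For the string "98715972584" the scan reports the triple 987, 1597, 2584 starting at position 0.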
I'd say you should first find all the interesting Fibonacci numbers (having 6 or fewer digits, there are no more than 30 of them) and store them in an array.
Then loop over every position in your input string and try to find the longest possible Fibonacci number starting there (that is, you must scan the array backwards).
If a Fibonacci number is found, branch off to a secondary algorithm that simply walks the array from the current item to the end, trying to match each item against the following substring. When the matching ends, go back to the main algorithm and keep searching in the input string from the current position.
Neither of these two algorithms is recursive or too expensive.
update
Ok. If no tables are allowed, you could still use this approach by changing how the first loop gets the next Fibonacci number: instead of indexing, apply your formula.

reverse deterministic shuffle -> derive key

I'm looking for an algorithm that can derive a key from a shuffling process that has already happened.
Assume we've got the string "hello", which was shuffled:
"hello" -> "loelh"
Now I would like to derive a key k from it which I could use to undo the shuffling. So if we use k as the input parameter for a deterministic shuffling algorithm such as Fisher-Yates and shuffle "loelh" again, we would restore the initial string "hello".
What I do not mean is to simply use one and the same deterministic shuffling algorithm to shuffle and de-shuffle. That's because in my case the first string would not really have been shuffled in the classical sense. Actually there would be two sets of data (byte or bit arrays) which are just given, and we want to get from the first to the second one with just a key which has been derived beforehand.
I hope it's clear what I want to achieve and I would appreciate all hints or proposed solutions.
Regards,
Merrit
UPDATE:
Another attempt:
Basically, one could also call it a deterministic transformation of a bunch of data, e.g. a byte array, but I will stick with the "hello" string example.
Assume we've got a transformation-algorithm transform(data, "unknown seed") where data is "hello" and unknown seed is what we are looking for. The result of transform is "loelh". We are looking for this "unknown seed" which we could use to reverse the process. At the time of the "unknown seed"-generation, both, the input data AND the result are known of course.
Later on I want to use the "unknown seed" (which should be known already ;-) to get the original string again: so this transform("loelh", seed) should lead to "hello" again.
So you could also see it as a form of equation like data*["unknown value"]=resultdata and we are trying to find the unknown value (the operator * could be any kind of operation).
First of all, let's simplify the problem greatly. Instead of permuting "hello", let's assume that you are always permuting "abcde", as that will make it easier to understand.
A shuffle is the random generation of a permutation. How the shuffle generates the permutation is irrelevant; shuffles generate permutations, that's all we need to know.
Let's state a permutation as a string containing the numbers 1 through 5. Suppose the shuffle produces permutation "21453". Read it like this: the spot where the number k appears is the position that the k-th letter moves to. That is, we take the first letter and put it in position 2: _a___. We take the second letter and put it in position 1: ba___. We take the third letter and put it in position 5: ba__c. We take the fourth letter and put it in position 3: bad_c. And we take the fifth letter and put it in position 4: badec.
Now you wish to deduce a "key" which allows you to "unpermute" the permutation. Well, that's just another permutation, called the inverse permutation. To compute the inverse permutation of "21453" you do the following:
find "1". It's in the 2nd spot.
find "2". It's in the 1st spot.
find "3". It's in the 5th spot.
find "4". It's in the 3rd spot.
Find "5". It's in the 4th spot.
And now read down the second column; the inverse permutation of "21453" is "21534". We are unpermuting "badec". We put the first letter in position 2: _b___. We put the second letter in position 1: ab___. We put the third letter in position 4: ab_d_. We put the fourth letter in position 5: ab_de. And we put the fifth letter in position 3: abcde.
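The walk-through above can be checked with a short Python sketch. The helper names invert and apply_perm are made up for the example, and the single-digit string notation limits it to at most 9 elements.

```python
def invert(perm: str) -> str:
    """Inverse of a permutation written as a digit string, e.g. '21453'."""
    inverse = [""] * len(perm)
    for spot, digit in enumerate(perm, start=1):
        # "Find digit: it's in the spot-th spot."
        inverse[int(digit) - 1] = str(spot)
    return "".join(inverse)


def apply_perm(perm: str, text: str) -> str:
    """Position j of the result takes the perm[j]-th letter of text."""
    return "".join(text[int(d) - 1] for d in perm)


key = invert("21453")
print(key)                             # 21534
print(apply_perm("21453", "abcde"))    # badec
print(apply_perm(key, "badec"))        # abcde
```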
Shuffling is just creating a random permutation of a given sequence. The typical way to do that is something like the Fisher-Yates Shuffle that you pointed out. The problem is that the shuffle program generates multiple random numbers based on a seed, and unless you implement the random number generator there's no easy way to reverse the sequence of random numbers.
There is another way to do it. What if you could generate the nth permutation of a sequence directly? That is, given the string "Fast", you define the first few permutations as:
0 Fast
1 Fats
2 Fsat
3 Fsta
... etc. for all 24 permutations
You want a random permutation of those four characters. Select a random number from 0 to 23 and then call a function to generate that permutation.
If you know the key, you can call a different function, again passing that key, to have it reverse the permutation back to the original.
In the fourth article in his series on permutations, Eric Lippert showed how to generate the nth permutation without having to generate all of the permutations that come before it. He doesn't show how to reverse the process, but doing so shouldn't be difficult if you understand how the generator works. It's well worth the time to study the entire series of articles.
If you don't know what the key (i.e. the random number used) is, then deriving the sequence of swaps required to get to the original order is expensive.
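As a rough illustration of generating the nth permutation directly, here is a sketch based on the factorial number system. It numbers permutations lexicographically by position in the original sequence, which matches the "Fast" listing above, but it is not taken from the linked articles. The second function goes the other way and computes the number of a given arrangement, so it only works with a generator that numbers permutations the same way. Both function names are invented for the example.

```python
from math import factorial


def nth_permutation(items, n):
    """Return permutation number n (0-based) of items as a list."""
    pool = list(items)
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice for the next slot covers (remaining - 1)! permutations.
        index, n = divmod(n, factorial(remaining - 1))
        result.append(pool.pop(index))
    return result


def permutation_rank(original, permuted):
    """Return the number n such that nth_permutation(original, n) == permuted."""
    pool = list(original)
    rank = 0
    for item in permuted:
        index = pool.index(item)  # with repeated items, the first match is used
        rank += index * factorial(len(pool) - 1)
        pool.pop(index)
    return rank


print("".join(nth_permutation("Fast", 3)))      # Fsta
key = permutation_rank("loelh", "hello")
print("".join(nth_permutation("loelh", key)))   # hello
```

With repeated letters, as in "hello", several numbers produce the same arrangement; the sketch simply returns one of them.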
Edit
Upon reflection, it just might be possible to derive the key if you're given the original sequence and the transformed sequence. Since you know how far each symbol has moved, you should be able to derive the key. Consider the possible permutations of two letters:
0. ab 1. ba
Now, assign the letter b the value of 0, and the letter a the value of 1. What permutation number is ba? Find a in the string, swap to the left until it gets to the proper position, and multiply the number of swaps by one.
That's too easy. Consider the next one:
0. abc 1. acb 2. bac
3. cab 4. bca 5. cba
a is now 2, b is 1, and c is 0. Given cab:
swap a left one space. 1x2 = 2. Result is `acb`
swap b left one space. 1x1 = 1. Result is `abc`
So cab is permutation #3.
This does assume that your permutation generator numbers the permutations in the same way. It's also not a terribly efficient way of doing things. The worst case requires n(n-1)/2 swaps. You can optimize the swaps by moving things in an array, but it's still an O(n^2) algorithm, where n is the length of the sequence. Not terrible for 100 or maybe even 1,000 items. Pretty bad after that, though.

Algorithm for finding basis of a set of bitstrings?

This is for a diff utility I'm writing in C++.
I have a list of n character-sets {"a", "abc", "abcde", "bcd", "de"} (taken from an alphabet of k=5 different letters). I need a way to observe that the entire list can be constructed by disjunctions of the character-sets {"a", "bc", "d", "e"}. That is, "b" and "c" are linearly dependent, and every other pair of letters is independent.
In the bit-twiddling version, the character-sets above are represented as {10000, 11100, 11111, 01110, 00011}, and I need a way to observe that they can all be constructed by ORing together bitstrings from the smaller set {10000, 01100, 00010, 00001}.
In other words, I believe I'm looking for a "discrete basis" of a set of n different bit-vectors in {0,1}^k. This paper claims the general problem is NP-complete... but luckily I'm only looking for a solution to small cases (k < 32).
I can think of really stupid algorithms for generating the basis. For example: For each of the k^2 pairs of letters, try to demonstrate (by an O(n) search) that they're dependent. But I really feel like there's an efficient bit-twiddling algorithm that I just haven't stumbled upon yet. Does anyone know it?
EDIT: I ended up not really needing a solution to this problem after all. But I'd still like to know if there is a simple bit-twiddling solution.
I'm thinking of a disjoint-set data structure, like union-find turned on its head (rather than combining nodes, we split them).
Algorithm:
Create an array main where you assign all the positions to the same group, then:
    for each bitstring curr
        for each position i
            if (curr[i] == 1)
                // max of main can be stored for constant time access
                main[i] += max of main from previous iteration
Then all the distinct numbers in main are your different sets (possibly using the actual union-find algorithm).
Example:
So, main = 22222. (I won't use 1 as groups to reduce possible confusion, as curr uses bitstrings).
curr = 10000
main = 42222 // first bit (=2) += max (=2)
curr = 11100
main = 86622 // first 3 bits (=422) += max (=4)
curr = 11111
main = 16-14-14-10-10
curr = 01110
main = 16-30-30-26-10
curr = 00011
main = 16-30-30-56-40
Then split by distinct numbers:
{10000, 01100, 00010, 00001}
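A small Python sketch of the splitting procedure, following the example step by step. The function name split_groups is made up, the bitstrings are plain Python strings of equal length, and the starting label 2 is taken from the example.

```python
def split_groups(bitstrings):
    """Return (main, groups): the final labels and the positions sharing each label."""
    k = len(bitstrings[0])
    main = [2] * k  # every position starts in the same group
    for curr in bitstrings:
        top = max(main)  # max of main from the previous iteration
        for i, bit in enumerate(curr):
            if bit == "1":
                main[i] += top
    groups = {}
    for i, label in enumerate(main):
        groups.setdefault(label, []).append(i)
    return main, sorted(groups.values())


main, groups = split_groups(["10000", "11100", "11111", "01110", "00011"])
print(main)    # [16, 30, 30, 56, 40]
print(groups)  # [[0], [1, 2], [3], [4]]
```

Adding the previous maximum pushes every position with a 1 above all positions with a 0, so two positions keep the same label only if they agree in every bitstring.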
Improvement:
To reduce the speed at which main increases, we can replace
main[i] += max of main from previous iteration
with
main[i] += 1 + (max - min) of main from previous iteration
EDIT: Edit based on j_random_hacker's comment
You could combine the passes of the stupid algorithm at the cost of space.
Make a bit vector called violations that is (k - 1) k / 2 bits long (so, 496 for k = 32.) Take a single pass over character sets. For each, and for each pair of letters, look for violations (i.e. XOR the bits for those letters, OR the result into the corresponding position in violations.) When you're done, negate and read off what's left.
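One possible reading of the violations idea, sketched in Python with an integer serving as the bit vector. The function name dependent_pairs and the string representation of the character sets are assumptions made for the example.

```python
from itertools import combinations


def dependent_pairs(bitstrings, k):
    """Return the position pairs (i, j) that agree in every bitstring."""
    pairs = list(combinations(range(k), 2))  # (k - 1) * k / 2 pairs
    violations = 0  # bit idx is set once pairs[idx] is seen to disagree
    for word in bitstrings:
        for idx, (i, j) in enumerate(pairs):
            if word[i] != word[j]:  # XOR of the two bits is 1
                violations |= 1 << idx
    # "Negate and read off what's left": keep the pairs never violated.
    return [pair for idx, pair in enumerate(pairs) if not (violations >> idx) & 1]


print(dependent_pairs(["10000", "11100", "11111", "01110", "00011"], 5))  # [(1, 2)]
```

For the example sets only positions 1 and 2 (the letters b and c) are never separated.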
You could give Principal Component Analysis a try. There are some flavors of PCA designed for binary or more generally for categorical data.
Since someone showed it to be NP-complete, for large vocabularies I doubt you will do better than a brute-force search (with various pruning possible) of the entire set of possibilities, O((2^k - 1) * n), at least in the worst case. Some heuristics will probably help in many cases, as outlined in the paper you linked. This is your "stupid" approach generalized to all possible basis strings instead of just bases of length 2.
However, for small vocabs, I think an approach like this would do a lot better:
1. Are your words disjoint? If so, you are done (simple case of independent words like "abc" and "def").
2. Perform a bitwise AND on each possible pair of words. This gives you an initial set of candidate basis strings.
3. Go to step 1, but instead of using the original words, use the current basis candidate strings.
Afterwards you also need to include any individual letter which is not a subset of one of the final accepted candidates. There may be some other minor bookkeeping for things like unused letters (using something like a bitwise OR on all possible words).
Considering your simple example:
First pass gives you a, abc, bc, bcd, de, d
Second pass gives you a, bc, d
Bookkeeping gives you a, bc, d, e
I don't have a proof that this is right but I think intuitively it is at least in the right direction. The advantage lies in using the words instead of the brute force's approach of using possible candidates. With a large enough set of words, this approach would become terrible, but for vocabularies up to say a few hundred or maybe even a few thousand I bet it would be pretty quick. The nice thing is that it will still work even for a huge value of k.
If you like the answer and bounty it I'd be happy to try to solve in 20 lines of code :) and come up with a more convincing proof. Seems very doable to me.

Find if any permutation of a number is within a range

I need to find if any permutation of a number exists within a specified range; I just need to return Yes or No.
For example: Number = 122 and Range = [200, 250]. The answer would be Yes, as 221 exists within the range.
PS:
For the problem that I have in hand, the number to be searched will only have two different digits (it will only contain 1 and 2, e.g. 1112221121).
This is not a homework question. It was asked in an interview.
The approach I suggested was to find all permutations of the given number and check. Or loop through the range and check if we find any permutation of the number.
Checking every permutation is too expensive and unnecessary.
First, you need to look at them as strings, not numbers.
Consider each digit position as a separate variable.
Consider how the set of possible digits each variable can hold is restricted by the range. Each digit/variable pair will be either (a) always valid (b) always invalid; or (c) its validity is conditionally dependent on specific other variables.
Now model these dependencies and independencies as a graph. As case (c) is rare, it will be easy to search in time proportional to O(10N) = O(N)
Numbers have a great property which I think can help you here:
For a given number a of the form KXXXX, where the leading digit K is given, we can deduce that K0000 <= a <= K9999.
Using this property, we can try to build a permutation which is within the range:
Let's take your example:
Range = [200, 250]
Number = 122
First, we can determine that the first digit must be 2. We have two 2's, so we are good so far.
The second digit must be between 0 and 5. We have two candidates, 1 and 2. Still not bad.
Let's check the first value, 1:
Any digit would be good in the last position, and we still have an unused 2. We have found our permutation (212) and therefore the answer is Yes.
If we did find a contradiction with the value 1, we need to backtrack and try the value 2 and so on.
If none of the solutions are valid, return No.
This Algorithm can be implemented using backtracking and should be very efficient since you only have 2 values to test on each position.
The worst-case complexity of this algorithm is O(2^l), where l is the number of digits.
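A backtracking sketch along these lines, assuming the number and both range bounds are given as digit strings of the same length. The function name and the two "tight" flags (which record whether the digits chosen so far still coincide with the lower or upper bound) are illustrative choices, not part of the original answer.

```python
def permutation_in_range(digits, low, high):
    """True if some arrangement of digits lies in [low, high] (equal-length strings)."""
    n = len(digits)
    counts = {}
    for d in digits:
        counts[d] = counts.get(d, 0) + 1

    def place(pos, tight_low, tight_high):
        if pos == n:
            return True
        # A bound only restricts this digit while the prefix still equals it.
        lo_d = low[pos] if tight_low else "0"
        hi_d = high[pos] if tight_high else "9"
        for d in sorted(counts):
            if counts[d] == 0 or not (lo_d <= d <= hi_d):
                continue
            counts[d] -= 1
            ok = place(pos + 1,
                       tight_low and d == low[pos],
                       tight_high and d == high[pos])
            counts[d] += 1  # backtrack
            if ok:
                return True
        return False

    return place(0, True, True)


print(permutation_in_range("122", "200", "250"))  # True
```

Once the prefix differs from both bounds, any arrangement of the remaining digits is accepted, so the search finishes without further backtracking.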
You could try to implement some kind of binary search:
If you have 6 ones and 4 twos in your number, then first you have the interval
[1111112222; 2222111111]
If your range does not overlap with this interval, you are finished. Now split this interval in the middle, you get
(1111112222 + 2222111111) / 2
Now find the largest number consisting of 1's and 2's of the respective number that is smaller than the split point. (Probably this step could be improved by calculating the split directly in some efficient way based on the 1 and 2 or by interpreting 1 and 2 as 0 and 1 of a binary number. One could also consider taking the geometric mean of the two numbers, as the candidates might then be more evenly distributed between left and right.)
[Edit: I think I've got it: Suppose the bounds have the form pq and pr (i.e. p is a common prefix). Then build from q and r a symmetric string s with the 1's at the beginning and the end of the string and the 2's in the middle, and take ps as the split point (so from 1111112222 and 1122221111 you would build 1111222211, where the prefix is p=11).]
If this number is contained in the range, you are finished.
If not, look whether the range is above or below and repeat with [old lower bound;split] or [split;old upper bound].
Suppose the range given to you is: ABC and DEF (each character is a digit).
Algorithm permutationExists(range_start, range_end, range_index, nos1, nos2)
    if (nos1>0 AND range_start[range_index] < 1 < range_end[range_index] and
            permutationExists(range_start, range_end, range_index+1, nos1-1, nos2))
        return true
    elif (nos2>0 AND range_start[range_index] < 2 < range_end[range_index] and
            permutationExists(range_start, range_end, range_index+1, nos1, nos2-1))
        return true
    else
        return false
I am treating every number as a series of digits. The given number is represented as {numberOf1s, numberOf2s}. I am trying to fit the digits (first 1s and then 2s) within the range; if that fails, the procedure returns false.
PS: I might be really wrong. I don't know if this sort of thing can work. I haven't given it much thought, really.
UPDATE
I am wrong in the way I express the algorithm. There are a few changes that need to be done in it. Here is a working code (It worked for most of my test cases): http://ideone.com/1aOa4
You really only need to check at most TWO of the possible permutations.
Suppose your input number contains only the digits X and Y, with X<Y. In your example, X=1 and Y=2. I'll ignore all the special cases where you've run out of one digit or the other.
Phase 1: Handle the common prefix.
Let A be the first digit in the lower bound of the range, and let B be the first digit in the upper bound of the range. If A<B, then we are done with Phase 1 and move on to Phase 2.
Otherwise, A=B. If X=A=B, then use X as the first digit of the permutation and repeat Phase 1 on the next digit. If Y=A=B, then use Y as the first digit of the permutation and repeat Phase 1 on the next digit.
If neither X nor Y is equal to A and B, then stop. The answer is No.
Phase 2: Done with the common prefix.
At this point, A<B. If A<X<B, then use X as the first digit of the permutation and fill in the remaining digits however you want. The answer is Yes. (And similarly if A<Y<B.)
Otherwise, check the following four cases. At most two of the cases will require real work.
If A=X, then try using X as the first digit of the permutation, followed by all the Y's, followed by the rest of the X's. In other words, make the rest of the permutation as large as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
If B=X, then try using X as the first digit of the permutation, followed by the rest of the X's, followed by all the Y's. In other words, make the rest of the permutation as small as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
Similar cases if A=Y or B=Y.
If none of these four cases succeed, then the answer is No. Notice that at most one of the X cases and at most one of the Y cases can match.
In this solution, I've assumed that the input number and the two numbers in the range all contain the same number of digits. With a little extra work, the approach can be extended to cases where the numbers of digits differ.

How to compute palindrome from a stream of characters in sub-linear space/time?

I don't even know if a solution exists or not. Here is the problem in detail. You are a program that is accepting an infinitely long stream of characters (for simplicity you can assume characters are either 1 or 0). At any point, I can stop the stream (let's say after N characters have passed through) and ask you if the string received so far is a palindrome or not. How can you do this using sub-linear space and/or time?
Yes. The answer is about two-thirds of the way down http://rjlipton.wordpress.com/2011/01/12/stringology-the-real-string-theory/
EDIT: Some people have asked me to summarize the result, in case the link dies. The link gives some details about a proof of the following theorem: There is a multi-tape Turing machine that can recognize initial non-trivial palindromes in real-time. (A summary, also provided by the article linked: Suppose the machine has read x1, x2, ..., xk of the input. Then it has only constant time to decide if x1, x2, ..., xk is a palindrome.)
A multitape Turing machine is just one with several side-by-side tapes that it can read and write to; in a very specific sense it is exactly equivalent to a standard Turing machine.
A real-time computation is one in which a Turing machine must read a character from input at least once every M steps (for some bounded constant M). It is readily seen that any real-time algorithm should be linear-time, then.
There is a paper on the proof which is around 10 pages which is available behind an institutional paywall here which I will not repost elsewhere. You can contact the author for a more detailed explanation if you'd like; I just had read this recently and realized it was more or less what you were looking for.
You could use a rolling hash, or more rolling hashes for accuracy. Incrementally compute the hash of the characters read so far, in the order they were read, and in reverse order of reading.
If your hash function is x*3^(k-1)+x*3^(k-2)+...+x*3^0 for example, where x is a character you read, this is how you'd do it:
hLeftRight = 0
hRightLeft = 0
k = 0
repeat while there are characters in the stream
    x = stream.Get()
    hLeftRight = 3*hLeftRight + x.Value
    hRightLeft = hRightLeft + 3^k*x.Value
    if (x.QueryPalindrome = true)
        yield hLeftRight == hRightLeft
    k = k + 1
Obviously you'd have to calculate the hashes modulo something, probably a prime or a power of two. And of course, this could lead to false positives.
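A minimal Python sketch of the two rolling hashes, with the modulus made explicit. The class name, the choice of 2^61 - 1 as the modulus and the push/query interface are assumptions for illustration; the stream is assumed to deliver the values 0 and 1.

```python
class PalindromeStream:
    """Keeps a forward and a reversed polynomial hash of the stream read so far."""

    def __init__(self, base=3, mod=(1 << 61) - 1):
        self.base = base
        self.mod = mod
        self.forward = 0   # hash of the characters in reading order
        self.backward = 0  # hash of the characters in reverse order
        self.power = 1     # base**k mod mod, where k = characters read so far

    def push(self, value):
        """Feed one character value (0 or 1) in constant time."""
        self.forward = (self.forward * self.base + value) % self.mod
        self.backward = (self.backward + self.power * value) % self.mod
        self.power = (self.power * self.base) % self.mod

    def looks_like_palindrome(self):
        """Equal hashes mean 'probably a palindrome'; collisions are possible."""
        return self.forward == self.backward


stream = PalindromeStream()
for ch in "0110":
    stream.push(int(ch))
print(stream.looks_like_palindrome())  # True
```

Each update and each query takes constant time and constant space, at the cost of an occasional false positive once the hashes wrap around the modulus.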
Round 2
As I see it, with each new character, there are three cases:
Character breaks potential symmetry, for example, aab -> aabc
Character extends the middle, for example aab -> aabb
Character continues symmetry, for example aab->aaba
Assume you have a pointer that tracks down the string and points to the last character that continued a potential palindrome.
(I am going to use parentheses to indicate the pointed-at character.)
Let's say you are starting with aa(b) and get:
'a' (case 3): you move the pointer to the left and check if it's an 'a' (it is). You now have a(a)b.
'c' (case 1): you are not expecting a 'c', so you start back at the beginning and you now have aab(c).
The really tricky case is 2, because somehow you have to know that the character you just got isn't affecting symmetry, it is just extending the middle. For this, you have to hold an additional pointer that tracks where the plateau's (middle's) edge lies. For example, you have (b)baabb and you just got another 'b', in this case you have to know to reset the pointer to the base of the middle plateau here: bbaa(b)bb. Since we are going for constant time, you have to hold a pointer here to begin with (you can't afford the time to search for the plateau's edge). Now if you get another 'b', you know that you are still on the edge of that plateau and you keep the pointer where it is, so bbaa(b)bb -> bbaa(b)bbb. Now, if you get an 'a', you know that the 'b's are not part of the extended middle and you reset both pointers (The tracking pointer and the edge pointer) so you now have bbaabbbb((a)).
With these three cases, I think all bases are covered. If you ever want to check if the current string is a palindrome, check if the first pointer (not the plateau's edge pointer) is at index 0.
This might help you:
http://arxiv.org/pdf/1308.3466v1.pdf
If you store the last k input symbols, you can easily find palindromes up to length k.
If you use the algorithms from the paper, you can find the midpoints of palindromes and an estimate of their length.
