Does this DFA have a solution? - formal-languages

I trying to create a DFA that can recognize strings with alphabet {a,b,c} where a and c appear a even number of times and where b appears an uneven number of times.
I am wondering that this may only be expressed with other mathods such as turing machine or context-free languages.
You might find it fun to think of the solution.

The way I would go about constructing such a machine is as follows.
Make eight states. Each state represents a possible 3 tuple combination. The start state is a state representing the combination where all three are even. If a is the first character in the input, then you would to a state that represents an odd number of a's and even number of b's and c's. The accept state is where a and c are even, and b is odd.

This is possible using a DFA simply have a state for each of the combinations of an odd number of a's, b's and c's. So if you're in the state with even # of a's, odd # of b's and an even # of c's then you can accept. You can also define simple transitions for any of the other cases. So naively this can be done with 8 states.

Related

Conditional Randomization

Imagine there is a list of elements as follow:
1a, 2a, 3a, 4a, 5b, 6b, 7b, 8b
Now we need to randomize it such that not more than 2 "a"s or 2 "b"s get next to each other. For instance the following list is not allowed because of the 2nd, third and fourth elements:
3a, 7b, 8b, 5b, 2a, 1a, 5b, 4a
How can we write write an efficient code without generating many random sequences and many triad comparisons?
Create two bins, one for the a's and one for the b's. Pick from a random bin and record the bin. Pick a second number from a random bin. If the bin is not the same as before just record the bin. If the bin is the same as before then force the next pick to be from the other bin. Carry on forward, only forcing a bin when you have two picks in succession from the same bin.
I'm going to assume that:
There are only two kinds of element, a and b, and
There aren't "too many" of either kind (say, less than 30) or that you're willing to use a bignum package.
The basic idea is to (conceptually) first construct a valid sequence of as and bs, and then randomly assign the actual elements to the as and bs in the sequence. In practice, you could do both of these steps in parallel; every time you add an a to the sequence, you select a random a element from the set of such elements not yet assigned, and similarly with b elements.
The (slightly) complicated part is constructing the valid sequence without bias, and that's what I'm going to focus on.
As is often the case, the key is to be able to count the number of possible sequences, in a way which leads to an enumeration. We don't actually enumerate the possibilities -- that would take really a long time for even moderately long sequences -- but we do need to know for every prefix how to enumerate the sequences starting with that prefix.
Rather than produce the sequence element by element, we'll produce it in chunks of one or two elements of the same kind. Since we don't allow more than two consecutive elements of the same kind, the final sequence must be a series of alternating chunks. In effect, at every point except the very beginning, the choice is whether to select one or two of the "other" kind. At the beginning, we must select one or two of either kind, so we must first choose the starting kind, after which all the kinds are fixed; we merely need a sequence of 1's and 2's -- representing one element or two elements of the same kind -- with the kind alternating at each step. The sequence of 1s and 2s is constrained by the fact that we know how many elements there are of each kind, which corresponds to the sum of the numbers in the even and odd positions of the {1,2}-sequence.
Now, let's define f(m,n) as the count of sequences whose even and odd sums are m and n. (Using CS rather than maths rules, we'll assume that the first position is 0 (even) but it actually makes absolutely no difference.) Suppose that we have 6 as and 4 bs. There are then f(6,4) sequences which start with an a, and f(4,6) sequences which start with a b, so that the total count of valid sequences is f(6,4)+f(4,6).
Now, suppose we need to compute f(m,n). Assuming m is large enough, we have exactly two options: choose one of the m elements of the even kind or choose two of the m elements of the even kind. After that, we will swap even and odd because the next choice applies to the other kind.
That rather directly leads to the recursion
f(m, n) = f(n, m-1) + f(n, m-2)
which we might think of as a kind of two-dimensional fibonacci recursion. (Recall that fib(m) = fib(m-1) + fib(m-2); the difference here is the second argument, and the fact that the argument order flip-flops at each recursion.
As with Fibonacci numbers, computing the values naively without memoization leads to exponential blow-up of recursive calls, and a more efficient strategy is to compute the entire table starting from f(0,0) (which has the value 1, obviously); in essence, a dynamic programming approach. We could also just do the recursive computation with memoization, which is slightly less efficient but possibly easier to read.
For now, let's just assume that we've arranged for the computation of f(m,n) to be suitably fast, either because we've prebuilt the entire array of possibilities up to the largest values of m and n we will need, or because we're using a memoizing recursive solution so that we only need to do the slow computation once for any given m,n. Now let's construct the random sequence.
Suppose there are na a-elements and nb b-elements. Since we don't know whether the random sequence will start with an a or a b, we need to first make that decision. We know there are f(na,nb) valid sequences which start a and f(nb,na) valid sequences starting with a b, so we start by generating a random non-negative integer less than f(na,nb) + f(nb,na). If the random is less than f(na,nb) then we'll start with a-elements; otherwise we'll start with b elements.
Having made that decision, we'll proceed as follows. We know what the next element kind is and how many elements remain of each kind, so we only need to know whether to select one or two elements of the correct kind. To make that choice, we generate a non-negative random integer less than f(m, n); if it is less than f(n, m-1) then we select one element; otherwise we select two elements. Then we swap the element sets, fix the counts, and continue until m and n are both 0.

TopCoder algorithm: BrightLamps -- n lamp

The problem is here.
I have read others solution and ideas, but I don't understand them.
The following is mnbvmar's idea, can you explain it in more detail and
do you think it is right?
If you flip two consecutive K-element subsequences (that is, (i, i + 1, ..., i + K - 1) and (i + 1, ..., i + K)), only two lamps get toggled (i and i + K).
Now it's crucial to note that every combination that is possible to do with K-element flips, can also be done with the following two:
toggle first K lamps,
toggle any two lamps distant by K.
(each toggling of K consecutive lamps is a compound of these operations). It's much simpler now. Of course, each operation can be done zero times or once (it's no use doing it more than once).
Let's say we haven't used first operation in optimal solution. Then our sequence dissolves into K sequences of elements distant by K, in which you can toggle two consecutive elements. This is easy to solve; if i-th sequence has an even number of turned off lamps, you can turn them all on; in opposite case it's obvious we can't turn them all on, but it's possible to leave off only the least valuable lamp in the sequence.
What if we use the first operation? Then we just toggle first K lamps and then do exactly the same steps as before.
Let's say we have 10 lamps A-J and k=4. Then we can flip any of the X-sequences below:
ABCDEFGHIJ
XXXX //flip starting at A, changes A,B,C,D
XXXX //flip starting at B, changes B,C,D,E
XXXX //...
XXXX
XXXX
XXXX
.
If we execute the flip sequence starting at lamp "S" and right after it the one starting at the lamp right of S, eg. D and E, we can see than some lamps are flipped two times (ie. stay unchanged) and only the first and last affected one are permanently changed:
ABCDEFGHIJ
XXXX //this
XXXX //and this together
X X //is the same as this alone
After both flip sequences, only D and H are different from before;
E,F,G are flipped two times = unchanged. So, using this pattern,
it's possible to change only two lamps with k distance (4 in this case),
instead of k lamps.
.
Now, using this pattern on A changes E too, and using it on E changes I. Using it on any other lamp than A,E,I won´t affect A,E,I. ... In other words, it´s possible to split all (10) lamps in k (4) groups:
A,E,I
B,F,J
C,G
D,H
which are completely independent, ie. calculating the optimal solution for each group independly will give us the total optimal solution (as long as we remember to use the switch pattern above).
.
Within one group, let's take AEI, we could think of the same pattern with a smaller scale, ie. A affects E and E affects I (as said above), and the possible flip patterns are
ABCDE
xx
xx
xx
xx
Maybe you see now that, with chaining these patterns, we can flip any two lamps in this group, even if they are not consecutive, as long as it's two lamps. Ie. AB or AC or AD or AE or BC etc.
Now, getting to the real data, if the number of switched-off lamps is even, you can switch on pairs like described above until all lamps are un, and then that group is finished. If the number was uneven, select the lamp with the lowest value as the only one that remains off.

Is there an algorithm for choosing a few strings from a list so that the number of strings equal the number of different letters in them?

Edit2: I think the solution of David Eisenstat works but I will check it before I call the question solved.
Example list of strings:
1.) "a"
2.) "ab"
3.) "bc"
4.) "dc"
5.) "efa"
6.) "ef"
7.) "gh"
8.) "hi"
You can choose number 1.) there's 1 string and 1 letter in it: "a"
You can also choose 1.) and 2.) these are 2 strings with only two different letters in them "a" and "b"
other valid string combinations:
1.) 2.) 3.)
1.) 5.) 6.)
there's no valid combination with "h" (it would be ideal if cases like this could be proven however you can assume the program only needs to work when there's a valid answer)
There could be an extra condition that the strings you choose must include one specified letter, however simply finding all the possible combinations would solve the problem just as well. eg. specified letter "c" the only solution in this case would be: 1.) 2.) 3.)
[optional information] The purpose of this: I want to make a program which can choose from a big list of equations (probably around 100) which ones can be used to solve for a variable. Each equation gets one string, each letter in the string representing one unknown. The list of equations are all different eg. cannot be derived from each other, so you need as many equations as many unknowns there are in them. Solving for the unknowns will be done in a CAS, so you don't need to worry about it. However I believe the CAS (Maxima) might have a limit on how many equations it can solve simultaneously and it might be too slow if you give it too many unnecessary equations at a time.
As a start I would use an algorithm to reduce the number of strings just to make it faster. First all strings containing specified letter are in the reduced list, then all strings containing the letters from the strings in the reduced list are part of the reduced list until none is added. eg reduced list of "g" would be 7.) "gh" and 8.) "hi" This would only remove some unnecessary strings, but the task would remain the same with the rest.
I think this can be solved by taking away unnecessary strings from the reduced list until all the remaining are needed, however I don't know how to explicitly define which strings would be unnecessary (except for those mentioned in the previous paragraph).
If you work with the extra condition: This is an optimization task. I don't need a perfect solution, only an optimal solution. The program doesn't need to find the absolute minimum number of strings that give a solution. Having a few extra strings in the solution would probably only slow the computer down, but it would be acceptable.
Edit: Optional clarification about the meaning of the strings: Each letter in a string represent an unknown in an equation so the equation a=2 would be represented by "a" because that's the only unknown. The equation a+b=0 would be represented by "ab" and b^2-c=0 by "bc"
I'm not sure what to call this problem. It seems NP-hard, so I'm going to suggest an integer programming formulation, which can be attacked by an off-the-shelf solver.
Let x_i be a 0-1 variable indicating whether equation i is included in the output. Let y_j be a 0-1 variable indicating whether variable j is included in the output. We have constraints
for all equations i, for all variables j in equation i, y_j - x_i >= 0.
We need as many equations as variables in the output.
(sum over all equations i of x_i) - (sum over all variables j of y_j) = 0
As you point out, the empty set needs specifically to be disallowed. Let k be a variable that must appear in the output.
sum over all equations i containing variable k of x_i >= 1
Naturally, the objective is
minimize sum over all equations i of x_i.

Is it possible to create an algorithm which generates an autogram?

An autogram is a sentence which describes the characters it contains, usually enumerating each letter of the alphabet, but possibly also the punctuation it contains. Here is the example given in the wiki page.
This sentence employs two a’s, two c’s, two d’s, twenty-eight e’s, five f’s, three g’s, eight h’s, eleven i’s, three l’s, two m’s, thirteen n’s, nine o’s, two p’s, five r’s, twenty-five s’s, twenty-three t’s, six v’s, ten w’s, two x’s, five y’s, and one z.
Coming up with one is hard, because you don't know how many letters it contains until you finish the sentence. Which is what prompts me to ask: is it possible to write an algorithm which could create an autogram? For example, a given parameter would be the start of the sentence as an input e.g. "This sentence employs", and assuming that it uses the same format as the above "x a's, ... y z's".
I'm not asking for you to actually write an algorithm, although by all means I'd love to see if you know one to exist or want to try and write one; rather I'm curious as to whether the problem is computable in the first place.
You are asking two different questions.
"is it possible to write an algorithm which could create an autogram?"
There are algorithms to find autograms. As far as I know, they use randomization, which means that such an algorithm might find a solution for a given start text, but if it doesn't find one, then this doesn't mean that there isn't one. This takes us to the second question.
"I'm curious as to whether the problem is computable in the first place."
Computable would mean that there is an algorithm which for a given start text either outputs a solution, or states that there isn't one. The above-mentioned algorithms can't do that, and an exhaustive search is not workable. Therefore I'd say that this problem is not computable. However, this is rather of academic interest. In practice, the randomized algorithms work well enough.
Let's assume for the moment that all counts are less than or equal to some maximum M, with M < 100. As mentioned in the OP's link, this means that we only need to decide counts for the 16 letters that appear in these number words, as counts for the other 10 letters are already determined by the specified prefix text and can't change.
One property that I think is worth exploiting is the fact that, if we take some (possibly incorrect) solution and rearrange the number-words in it, then the total letter counts don't change. IOW, if we ignore the letters spent "naming themselves" (e.g. the c in two c's) then the total letter counts only depend on the multiset of number-words that are actually present in the sentence. What that means is that instead of having to consider all possible ways of assigning one of M number-words to each of the 16 letters, we can enumerate just the (much smaller) set of all multisets of number-words of size 16 or less, having elements taken from the ground set of number-words of size M, and for each multiset, look to see whether we can fit the 16 letters to its elements in a way that uses each multiset element exactly once.
Note that a multiset of numbers can be uniquely represented as a nondecreasing list of numbers, and this makes them easy to enumerate.
What does it mean for a letter to "fit" a multiset? Suppose we have a multiset W of number-words; this determines total letter counts for each of the 16 letters (for each letter, just sum the counts of that letter across all the number-words in W; also add a count of 1 for the letter "S" for each number-word besides "one", to account for the pluralisation). Call these letter counts f["A"] for the frequency of "A", etc. Pretend we have a function etoi() that operates like C's atoi(), but returns the numeric value of a number-word. (This is just conceptual; of course in practice we would always generate the number-word from the integer value (which we would keep around), and never the other way around.) Then a letter x fits a particular number-word w in W if and only if f[x] + 1 = etoi(w), since writing the letter x itself into the sentence will increase its frequency by 1, thereby making the two sides of the equation equal.
This does not yet address the fact that if more than one letter fits a number-word, only one of them can be assigned it. But it turns out that it is easy to determine whether a given multiset W of number-words, represented as a nondecreasing list of integers, simultaneously fits any set of letters:
Calculate the total letter frequencies f[] that W implies.
Sort these frequencies.
Skip past any zero-frequency letters. Suppose there were k of these.
For each remaining letter, check whether its frequency is equal to one less than the numeric value of the number-word in the corresponding position. I.e. check that f[k] + 1 == etoi(W[0]), f[k+1] + 1 == etoi(W[1]), etc.
If and only if all these frequencies agree, we have a winner!
The above approach is naive in that it assumes that we choose words to put in the multiset from a size M ground set. For M > 20 there is a lot of structure in this set that can be exploited, at the cost of slightly complicating the algorithm. In particular, instead of enumerating straight multisets of this ground set of all allowed numbers, it would be much better to enumerate multisets of {"one", "two", ..., "nineteen", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}, and then allow the "fit detection" step to combine the number-words for multiples of 10 with the single-digit number-words.

The perverse hangman problem

Perverse Hangman is a game played much like regular Hangman with one important difference: The winning word is determined dynamically by the house depending on what letters have been guessed.
For example, say you have the board _ A I L and 12 remaining guesses. Because there are 13 different words ending in AIL (bail, fail, hail, jail, kail, mail, nail, pail, rail, sail, tail, vail, wail) the house is guaranteed to win because no matter what 12 letters you guess, the house will claim the chosen word was the one you didn't guess. However, if the board was _ I L M, you have cornered the house as FILM is the only word that ends in ILM.
The challenge is: Given a dictionary, a word length & the number of allowed guesses, come up with an algorithm that either:
a) proves that the player always wins by outputting a decision tree for the player that corners the house no matter what
b) proves the house always wins by outputting a decision tree for the house that allows the house to escape no matter what.
As a toy example, consider the dictionary:
bat
bar
car
If you are allowed 3 wrong guesses, the player wins with the following tree:
Guess B
NO -> Guess C, Guess A, Guess R, WIN
YES-> Guess T
NO -> Guess A, Guess R, WIN
YES-> Guess A, WIN
This is almost identical to the "how do I find the odd coin by repeated weighings?" problem. The fundamental insight is that you are trying to maximise the amount of information you gain from your guess.
The greedy algorithm to build the decision tree is as follows:
- for each guess, choose the guess which for which the answer is "true" and which the answer is "false" is as close to 50-50 as possible, as information theoretically this gives the most information.
Let N be the size of the set, A be the size of the alphabet, and L be the number of letters in the word.
So put all your words in a set. For each letter position, and for each letter in your alphabet count how many words have that letter in that position (this can be optimised with an additional hash table). Choose the count which is closest in size to half the set. This is O(L*A).
Divide the set in two taking the subset which has this letter in this position, and make that the two branches to the tree. Repeat for each subset until you have the whole tree. In worst case this will require O(N) steps, but if you have a nice dictionary this will lead to O(logN) steps.
This isn't strictly an answer, since it doesn't give you a decision tree, but I did something very similar when writing my hangman solver. Basically, it looks at the set of words in its dictionary that match the pattern and picks the most common letter. If it guesses wrong, it eliminates the largest number of candidates. Since there's no penalty to guessing right in hangman, I think this is the optimal strategy given the constraints.
So with the dictionary you gave, it would first guess a correctly. Then it would guess r, also correctly, then b (incorrect), then c.
The problem with perverse hangman is that you always guess wrong if you can guess wrong, but that's perfect for this algorithm since it eliminates the largest set first. As a slightly more meaningful example:
Dictionary:
mar
bar
car
fir
wit
In this case it guesses r incorrectly first and is left with just wit. If wit were replaced in the dictionary with sir, then it would guess r correctly then a incorrectly, eliminating the larger set, then w or f at random incorrectly, followed by the other for the final word with only 1 incorrect guess.
So this algorithm will win if it's possible to win, though you have to actually run through it to see if it does win.

Resources