Chomsky Normal Form: removing epsilon productions

I'm working on converting a CFG to Chomsky Normal Form, but I'm having some difficulty.
I have this CFG:
A -> BAB | B | epsilon
B -> 00 | epsilon
OK, I add a new start symbol:
S -> A
A -> BAB | B | epsilon
B -> 00 | epsilon
Then I have to remove the epsilon productions, so I start with B:
S -> A
A -> BAB | B | AB | BA | A | epsilon
B -> 00
How do I then remove the epsilon from A? Can the start symbol have an epsilon production? And how do I convert A -> A?

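You can't convert this grammar to one without ε: every nonterminal can eventually derive ε, so ε is a sentence of the language, and no ε-free grammar generates it. Under the strict definition of Chomsky Normal Form (only productions of the form X -> YZ and X -> a), this grammar therefore cannot be written in CNF. Many textbooks relax the definition for exactly this case and allow the start symbol a single production S -> ε, provided S never appears on a right-hand side, which is why you add a new start symbol first. Under that convention, keep S -> ε, remove A -> ε by adding a copy of every production with each occurrence of A dropped, and simply delete A -> A, since a production that rewrites a nonterminal to itself derives nothing new. That gives
S -> A | ε
A -> BAB | BB | AB | BA | B
B -> 00
after which you continue with unit-production removal and the remaining CNF steps.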
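A minimal Python sketch of the ε-removal step, assuming the grammar is stored as a dict from each nonterminal to a list of right-hand sides (tuples of symbols, with () standing for ε). The function names nullable_set and remove_epsilon are illustrative, not from any library. It finds the nullable nonterminals by fixpoint iteration, then replaces each production with every variant obtained by keeping or dropping each nullable occurrence, discarding empty bodies and self-loops such as A -> A; only the start symbol keeps an ε-production.

```python
from itertools import product

def nullable_set(grammar):
    """Nonterminals that can derive the empty string (fixpoint iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            # lhs is nullable if some body consists only of nullable symbols;
            # an empty body counts, because all() over nothing is True
            if lhs not in nullable and any(all(s in nullable for s in body) for body in bodies):
                nullable.add(lhs)
                changed = True
    return nullable

def remove_epsilon(grammar, start):
    """Remove ε-productions, keeping start -> ε only if start is nullable.
    Assumes start does not occur on any right-hand side."""
    nullable = nullable_set(grammar)
    result = {}
    for lhs, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # each nullable occurrence may be kept or dropped
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body and new_body != (lhs,):  # skip ε and X -> X
                    new_bodies.add(new_body)
        result[lhs] = new_bodies
    if start in nullable:
        result[start].add(())  # the single ε-production allowed for the start symbol
    return result

grammar = {
    "S": [("A",)],
    "A": [("B", "A", "B"), ("B",), ()],
    "B": [("0", "0"), ()],
}
for lhs, bodies in remove_epsilon(grammar, "S").items():
    print(lhs, "->", " | ".join("".join(b) or "ε" for b in sorted(bodies)))
```

For the grammar in the question this prints S -> ε | A, A -> AB | B | BA | BAB | BB and B -> 00. Unit productions (S -> A, A -> B) still have to be eliminated afterwards.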

Related

CFG for a language with an equal number of a's and b's

I've tried this:
S -> e (epsilon)
S -> SASBS
S -> SBSAS
A -> a
B -> b
Can someone verify whether this is correct?
Your grammar is correct. Here is the proof.
First, we show that your grammar generates only strings with an equal number of a and b. Note that every production with S on the LHS introduces as many A as B, and A and B each derive exactly one terminal (a and b, respectively). Therefore, any string of terminals derived from S has an equal number of a and b.
Next, we show that every string with an equal number of a and b can be derived using this grammar. We proceed by mathematical induction.
Base case: S -> e, and both S -> SASBS -> ASBS -> aSBS -> aBS -> abS -> ab and S -> SBSAS -> BSAS -> bSAS -> bAS -> baS -> ba, so the three shortest strings in the language are generated by the grammar. There are no other strings in the language of length less than 4.
Induction hypothesis: all strings of length up to 2k in the language are generated by the grammar.
Inductive step: we must show that all strings of length 2(k + 1) in the language are also generated by the grammar. If w = axb or w = bya for some strings x and y, then x and y are strings of length 2k in the language and are therefore generated by the grammar. In this case, we can use the same derivation with an extra application of either S -> SASBS -> ASBS -> aSBS -> aSbS -> aSb or S -> SBSAS -> BSAS -> bSAS -> bSaS -> bSa, and then use the derivation for x or y to complete the derivation, yielding w. If, instead, w = axa or w = byb, then x has exactly two more b than a, or y has exactly two more a than b. In this case, w has a nonempty proper prefix that is also in the language. To see this, let d(i) be the number of a minus the number of b among the first i symbols of w. Each symbol changes d by exactly 1; for w = axa we have d(1) = 1 and, since d(|w|) = 0 and the last symbol is a, d(|w| - 1) = -1 (the signs are reversed for w = byb). So d(i) = 0 for some i with 1 < i < |w| - 1. Let p be the shortest nonempty prefix of w that is in the language, and write w = pr; then r is in the language as well, since w and p both are. Because d does not return to 0 strictly inside p, p begins and ends with different letters, so p = aub or p = bua, where u is also in the language. Both u and r are shorter than w, so by the induction hypothesis they are generated by the grammar, and S -> SASBS -> ASBS -> aSBS -> aSbS followed by the derivations of u and r (or the analogous derivation starting with S -> SBSAS) yields aubr = w or buar = w.
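The proof can also be sanity-checked by brute force for short strings. This Python sketch (not part of the original answer; the function names are illustrative) computes every string of length at most max_len that the grammar derives, by fixpoint iteration over S -> e | SASBS | SBSAS with A and B already replaced by a and b, and compares the result with the set of all balanced strings up to that length. A passing check is evidence for small lengths, not a proof.

```python
from itertools import product

def generated(max_len):
    """All strings of length <= max_len derivable from S in
    S -> e | SASBS | SBSAS, A -> a, B -> b (least fixpoint)."""
    lang = {""}                                   # S -> e
    changed = True
    while changed:
        changed = False
        for s1, s2, s3 in product(list(lang), repeat=3):
            if len(s1) + len(s2) + len(s3) + 2 > max_len:
                continue
            for x, y in (("a", "b"), ("b", "a")):  # SASBS and SBSAS
                w = s1 + x + s2 + y + s3
                if w not in lang:
                    lang.add(w)
                    changed = True
    return lang

def balanced(max_len):
    """All strings over {a, b} of length <= max_len with as many a as b."""
    return {"".join(t)
            for n in range(0, max_len + 1, 2)
            for t in product("ab", repeat=n)
            if t.count("a") == n // 2}

print(generated(6) == balanced(6))
```

The cost grows quickly with max_len because every triple of known strings is combined, so small bounds such as 6 or 8 are the practical range for this check.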

How can I easily prove the following in Coq, for example using only assumptions?

Is there an easy way to prove the following in Coq, for example using only assumptions?
(P -> (Q /\ R)) -> (~Q) -> ~P
The question is a bit vague. Are you asking whether it is possible (yes), what the proof is (see Arthur's comment above), or how to think about solving these problems?
In the latter case, remember that the goal is to create a "lambda-term" with the specified type. You can either use tactics, which help you construct the term "from the outside in", or write the term by hand. It is good to do it by hand a couple of times to understand what is going on and what the tactics really do, which I think is why you were given this exercise.
If you look at your example,
(P -> (Q /\ R)) -> (~Q) -> ~P
you can see that it is a function of three (!) arguments. This is because the last type ~P really means P -> False, so the arguments of the function that you need to create have the types
P -> (Q /\ R)
Q -> False
P
and the function should construct a term of type
False
You can create a term fun A B C => _ where A, B, C have the types above (this is what the intros tactic does), and you need to come up with a term to put into the hole _ by combining the terms A, B, C with the raw Gallina constructs.
In this case, once you have managed to create a term of type Q /\ R, you will have to "destruct" it to get the term of type Q (hint: for that you will have to use the match construct).
Hope this helps without spoiling the fun!

Multi-character substitution cipher algorithm

My problem is the following. I have a list of substitutions, including one substitution for each letter of the alphabet, but also some substitutions for groups of more than one letter. For example, in my cipher p becomes b, l becomes w, e becomes i, but le becomes by, and ple becomes memi.
So, while I can think of a few simple/naïve ways of implementing this cipher, they're not very efficient, and I was wondering what the most efficient way to do it would be. The answer doesn't have to be in any particular language; a general structured-English algorithm would be fine, but if it must be in some language, I'd prefer C++ or Java or similar.
EDIT: I don't need this cipher to be decipherable; an algorithm that mapped all single letters to the letter 'w' but mapped the string 'had' to the string 'jon' instead would be OK, too (then the string "Mary had a little lamb." would become "Wwww jon w wwwwww wwww.").
I'd like the algorithm to be fully general.
One possible approach is to use a deterministic automaton. The closest commonly used example is the Aho–Corasick string-matching algorithm. The difference is that instead of reporting matches, you emit cipher text on some transitions: at each transition you either emit output or you don't.
In your example:
p -> b
l -> w
e -> i
le -> by
ple -> memi
The automaton (in Erlang-like pseudocode):
start(p) -> p(next());
start(l) -> l(next());
start(e) -> e(next());
...
p(l) -> pl(next());
p(X) -> emit(b), start(X).
l(e) -> emit(by), start(next());
l(X) -> emit(w), start(X).
e(X) -> emit(i), start(X).
pl(e) -> emit(memi), start(next());
pl(X) -> emit(b), l(X).
If you are not familiar with Erlang: start(), p(), and so on are functions, one per state. Each line with -> is one transition, and the actions follow the ->. emit() is a function that emits cipher text, and next() is a function that returns the next input character. X is a variable that matches any other character.
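A minimal Python sketch of the same idea, swapping the hand-written automaton for a trie built from the substitution table and a greedy longest-match scan. This is not full Aho–Corasick: after each match it restarts at the next unconsumed character, so the worst case is O(n·m) for text length n and longest pattern length m. It assumes the longest matching pattern should win (ple over p), as in the automaton above, and that characters with no entry in the table pass through unchanged; build_trie and encipher are illustrative names.

```python
def build_trie(table):
    """Nested-dict trie; the key None stores the replacement for a complete pattern."""
    root = {}
    for pattern, replacement in table.items():
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = replacement
    return root

def encipher(text, table):
    """Replace the longest pattern starting at each position, left to right."""
    trie = build_trie(table)
    out = []
    i = 0
    while i < len(text):
        node, j, best = trie, i, None
        # walk the trie as far as the text allows, remembering the longest match
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best = (j, node[None])
        if best is None:
            out.append(text[i])  # assumption: unmapped characters pass through
            i += 1
        else:
            i, replacement = best
            out.append(replacement)
    return "".join(out)

table = {"p": "b", "l": "w", "e": "i", "le": "by", "ple": "memi"}
print(encipher("ple", table), encipher("pl", table), encipher("lep", table))
```

With the table from the question, "ple" becomes "memi", "pl" becomes "bw" and "lep" becomes "byb". Building the trie once and reusing it across calls would avoid rebuilding it for every string.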

What is the difference between a trivial FD and two cyclic FDs?

In the Complete Book by Ullman and Widom, I read that with two attributes (A and B) there are four cases of FDs. The second and third are A -> B and B -> A, so they are easy. But I don't understand the difference between the trivial dependency «B is a subset of A» and the cyclic FDs A -> B and B -> A. Aren't they the same?
With two attributes you have four cases:
1. A -> B (you also have the trivial FDs A -> A and B -> B)
2. B -> A (with trivial FDs as above)
3. A -> B and B -> A (with trivial FDs as above)
4. No non-trivial FDs: you only have the trivial FDs A -> A and B -> B, which means that the two attributes are independent.
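A "real-world" example of case 3 could be two attributes: SSN (a person's social security number) and passport_number. Each one determines the other.
An example of case 4 could be two attributes: SSN (a person's social security number) and book_title. The two attributes are completely independent. Neither one determines the other.
To answer the question directly: an FD X -> Y is trivial when Y is a subset of X (such as A -> A or AB -> A), so it holds in every relation. The pair A -> B and B -> A is not trivial: it holds only when the values of A and B are in one-to-one correspondence, and a relation can violate it.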
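A small Python sketch that makes the distinction concrete, assuming a relation instance is represented as a list of dicts; fd_holds is an illustrative name and the attribute values below are made up. An FD X -> Y holds in an instance when no two rows agree on X but differ on Y. A trivial FD such as {ssn, book_title} -> {ssn} passes on every instance, while A -> B and B -> A can each fail.

```python
def fd_holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs holds in this relation instance:
    rows that agree on every lhs attribute must agree on every rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# hypothetical data: one passport per person, so each attribute determines the other
people = [{"ssn": "111", "passport_number": "P1"},
          {"ssn": "222", "passport_number": "P2"}]
# hypothetical data: people and book titles are unrelated
loans = [{"ssn": "111", "book_title": "Dune"},
         {"ssn": "111", "book_title": "Emma"},
         {"ssn": "222", "book_title": "Dune"}]

print(fd_holds(people, ["ssn"], ["passport_number"]),   # case 3
      fd_holds(people, ["passport_number"], ["ssn"]),
      fd_holds(loans, ["ssn"], ["book_title"]),          # case 4
      fd_holds(loans, ["book_title"], ["ssn"]),
      fd_holds(loans, ["ssn", "book_title"], ["ssn"]))   # trivial
```

An instance can only show that an FD is violated; whether a non-trivial FD holds as a constraint on the schema is a design decision about all legal instances.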

Computing the Follow Set

OK, I've understood how to compute the Follow_k(N) set (N is a nonterminal): for every production rule of the form A -> aBc, you add First_k(First_k(c)Follow_k(A)) to Follow_k(B) (a and c are any strings of terminals and nonterminals, possibly empty, i.e. lambda), and you repeat this until there's nothing left to add.
But what happens for production rules like S -> ABCD (A, B, C, D are all nonterminals)?
Should I
add First_k(First_k(BCD)Follow_k(S)) to Follow_k(A) or
add First_k(First_k(CD)Follow_k(S)) to Follow_k(B) or
add First_k(First_k(D)Follow_k(S)) to Follow_k(C) or
add First_k(First_k(lambda)Follow_k(S)) to Follow_k(D) or
do all of the above?
UPDATE:
Let's take the following grammar for example:
S -> ABC
A -> a
B -> b
C -> c
Intuitively, Follow_1(S) = {} because nothing follows after S
Follow_1(A) = {b} because b follows after A,
Follow_1(B) = {c} because c follows after B,
Follow_1(C) = {} because nothing follows after C.
In order to get this result using the algorithm you must consider all cases for S -> ABC.
But my judgement or example may not be right, so the question is still open...
If you run into trouble on other grammar problems like this, give this online FIRST, FOLLOW, and predict set finder a shot. It's automatic, and you can compare your answers with its output to get a feel for how to work through these.
But what happens for production rules like: S -> ABCD (A, B, C, D are all nonterminals)?
Here are the rules for finding follow sets. They apply to every occurrence of a nonterminal on every right-hand side, so for S -> ABCD you do all of the above.
1. First, put $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
2. If there is a production A → aBb (where a and b can be whole strings), then everything in FIRST(b) except ε is placed in FOLLOW(B).
3. If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B).
4. If there is a production A → aBb where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Let's use your example grammar:
S -> ABC
A -> a
B -> b
C -> c
Rule 1 says that follow(S) contains $.
Rule 2 gives us: follow(A) contains first(B); also, follow(B) contains first(C).
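Rule 3 says that follow(C) contains follow(S). Putting these together: follow(S) = follow(C) = {$}, follow(A) = {b} and follow(B) = {c}, which matches your intuition once the end-of-input marker $ is taken into account.
None of your nonterminals is nullable, so rule #4 doesn't apply. A nonterminal is nullable if it has a production whose right-hand side is ε or consists entirely of nullable symbols.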
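The rules can be turned into a small fixpoint computation. This Python sketch handles k = 1 only and assumes the grammar is a dict from each nonterminal to a list of right-hand sides, where any symbol that is not a key is treated as a terminal; first_follow is an illustrative name. It tracks nullability separately instead of putting ε into FIRST sets, and keeps applying the rules to every nonterminal occurrence in every right-hand side until nothing changes.

```python
END = "$"  # end-of-input marker

def first_follow(grammar, start):
    """Compute nullable, FIRST and FOLLOW (k = 1) by iterating to a fixpoint.
    FIRST sets hold terminals only; nullability is tracked separately."""
    nts = set(grammar)
    nullable = set()
    first = {n: set() for n in nts}
    follow = {n: set() for n in nts}
    follow[start].add(END)                      # rule 1

    def first_of(symbols):
        """FIRST of a symbol string, plus whether the whole string is nullable."""
        out = set()
        for sym in symbols:
            if sym not in nts:                  # a terminal ends the scan
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True                        # every symbol (or none) was nullable

    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            for body in bodies:
                f, body_nullable = first_of(body)
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                if body_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in nts:
                        continue
                    rest_first, rest_nullable = first_of(body[i + 1:])
                    new = set(rest_first)       # rule 2
                    if rest_nullable:           # rules 3 and 4
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return nullable, first, follow

grammar = {"S": [["A", "B", "C"]], "A": [["a"]], "B": [["b"]], "C": [["c"]]}
nullable, first, follow = first_follow(grammar, "S")
print({n: sorted(follow[n]) for n in sorted(follow)})
```

For the example grammar this gives follow(S) = follow(C) = {$}, follow(A) = {b} and follow(B) = {c}. Passing the S -> A, A -> B, B -> ε grammar instead (with B mapped to [[]]) reports S, A and B all as nullable, because the loop repeats until the transitive closure is reached.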
Nullability's transitivity can trip people up. Consider this grammar:
S -> A
A -> B
B -> ε
Since B derives ε, B's nullable. Since A derives B, which derives ε, A's nullable too. S derives A, which derives B, which derives ε, so S is nullable as well.
Granted, you didn't bring that up, but it's a common source of confusion in compiler courses, so I figured I'd lay it out.
Also, if you need some sample grammars to work through, http://faculty.stedwards.edu/laurab/cosc4342/g1answers.txt might be handy.

Resources