Expression for this language

A language to generate all strings that have more a's than b's (not necessarily only one more, as with the nonterminal A for the language EQUAL, but any number more a's than b's).

First, let us show that the language is not regular by using the Pumping Lemma for Regular Languages. Assume this language were regular. Then, by the pumping lemma, there would be a pumping length p such that any string in the language of length at least p could be written as uxv with |ux| <= p, |x| > 0 and, for all nonnegative integers n, u(x^n)v also in the language. Choose as our representative string b^p a^(p+1). This string is clearly in our language because it has p + 1 a's and only p b's. Rewriting this as uxv with |ux| <= p means that u and x consist only of b's. Because |x| > 0, choosing n > 1 increases the number of b's alone, so the pumped string has at least p + 1 b's and still only p + 1 a's; it does not have more a's than b's and is therefore not in the language. This is a contradiction, so our only assumption - that the language is a regular language - must be false.
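The pumping argument above can be spot-checked for small values of p. The following is only an illustrative sketch: the names in_language and surviving_splits are made up for this example, and checking finitely many p is a sanity check, not a substitute for the proof.

```python
def in_language(w: str) -> bool:
    """Membership test: strictly more a's than b's."""
    return w.count("a") > w.count("b")


def surviving_splits(p: int, n: int = 2):
    """Return every split (u, x, v) of b^p a^(p+1) with |ux| <= p and |x| > 0
    for which the pumped string u x^n v is still in the language."""
    w = "b" * p + "a" * (p + 1)
    survivors = []
    for i in range(p + 1):               # i = |u|
        for j in range(i + 1, p + 1):    # j = |ux|, so |x| = j - i > 0
            u, x, v = w[:i], w[i:j], w[j:]
            if in_language(u + x * n + v):
                survivors.append((u, x, v))
    return survivors


for p in range(1, 8):
    assert in_language("b" * p + "a" * (p + 1))   # the chosen string is in the language
    assert surviving_splits(p) == []              # no split survives pumping with n = 2
```

Pumping down (n = 0) only removes b's, so every split survives in that case; the contradiction really does need n > 1.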
Next, let's see if the language is context-free. Notice that the language with more a's than b's is similar to the one with the same number of a's and b's, which we recognize as a canonical context-free language:
S -> SS | aSb | bSa | e
We can modify this grammar as follows:
whenever we allow one a, allow any number of a instead
instead of allowing the empty string, require at least one a be in the string
This suggests the grammar
S -> ST | TS | ASb | bSA | A
T -> TT | ATb | bTA | e
A -> aA | a
Note: we introduced nonterminal A to capture the idea of allowing any number of a's in place of one, and we added the nonterminal T to assist with the "at least one" a concept. We needed a new nonterminal here because otherwise every time we used the production S -> SS we'd be requiring extra a's in the grammar that strings in the language might not have.
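Before proving anything, the grammar can be sanity-checked by brute force. The sketch below (the names GRAMMAR, derivable and more_as_than_bs are invented for this example) computes every string of bounded length that each nonterminal derives and compares the result for S with the set of strings having more a's than b's. It only covers short strings, so it supports the claim rather than proving it.

```python
from itertools import product

# Productions from the text; () is the empty production (written e there).
GRAMMAR = {
    "S": [("S", "T"), ("T", "S"), ("A", "S", "b"), ("b", "S", "A"), ("A",)],
    "T": [("T", "T"), ("A", "T", "b"), ("b", "T", "A"), ()],
    "A": [("a", "A"), ("a",)],
}


def derivable(max_len: int) -> dict:
    """For each nonterminal, the set of terminal strings of length <= max_len
    that it derives, computed as a least fixed point."""
    lang = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for rhs in productions:
                partial = {""}
                for sym in rhs:
                    # A nonterminal contributes what it is known to derive so far;
                    # a terminal contributes itself.
                    choices = lang[sym] if sym in GRAMMAR else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang


def more_as_than_bs(max_len: int) -> set:
    """All strings over {a, b} of length <= max_len with more a's than b's."""
    words = ("".join(t) for k in range(max_len + 1)
             for t in product("ab", repeat=k))
    return {w for w in words if w.count("a") > w.count("b")}


MAX_LEN = 7
assert derivable(MAX_LEN)["S"] == more_as_than_bs(MAX_LEN)
```

Truncating at max_len is safe here because no production can make a string shorter, so every substring appearing in a derivation of a short string is itself short.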
To show this grammar is correct we can show:
it only generates strings in the language, that is, all strings it generates have more a than b
it generates all strings in the language, that is, for any string with more a's than b's, this grammar derives that string
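To show the first part, simply note that any production that adds b also adds at least one a, and that every production for S other than S -> A keeps exactly one S on its right-hand side, so eliminating all instances of S requires using the production S -> A exactly once. That production contributes at least one a and no b, which ensures at least one more a is in the final string than there are b's.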
To show the second part, consider this procedure to generate any string:
1. If the string is of the form a^k with k >= 1, use S -> A and then derive a^k from A.
2. If the string starts with a and ends with b, use S -> ASb -> aSb to peel off the outermost a and b. Then, solve the subproblem of size (N - 2) using this same procedure. Notice that we have removed the same number of a and b, so if the original string had more a than b, so does the string in the subproblem.
3. If the string starts with b and ends with a, use S -> bSA -> bSa to peel off the outermost a and b. Then, solve the subproblem of size (N - 2) using this same procedure. Notice that we have removed the same number of a and b, so if the original string had more a than b, so does the string in the subproblem.
4. If the string starts and ends with the same symbol, find a prefix of the string such that it and the complementary suffix both have at least as many a's as b's. This must be possible since the string has more a's than b's and, since the word starts and ends with the same symbol, neither the prefix nor the suffix needs to be empty. Then, use either S -> TS or S -> ST, depending on whether the prefix or suffix may contain the same numbers of a's and b's, and solve the subproblems. Choose the split so that the part assigned to T is actually derivable from T: T cannot derive a nonempty string consisting only of a's, because every T production that introduces an A also introduces a b.
Rigorously proving the claim in step 4 of this procedure is left as an exercise:
If a string consisting of a's and b's begins and ends with the same symbol and contains more a's than b's, the string can be split into a prefix and a suffix each of which has at least as many a's as b's.

Related

Is the empty string a subset or element of all alphabets?

I'm having trouble understanding when the empty string (epsilon) is a subset or element of an alphabet? My understanding was that epsilon was only part of a language, but my TA in the class said it was an element of all alphabets so now I am confused.
e.g. would {a,b,c} contain epsilon as an element?
e.g. would {} contain epsilon as an element?
e.g. is {eps} a subset of all alphabets or languages?
This question is probably better suited for cs.stackexchange but I will try to help you according to my understanding, please do correct me if necessary.
In general, your own intuition seems quite correct to me. ϵ is not automatically part of every alphabet. It is the empty string of characters.
However, this means that ϵ is a string over any alphabet, even your alphabet {a, b, c}.
So to answer your three examples:
No, it does not. If {a, b, c} is an alphabet, it is a set of symbols, and ϵ is a string. However, ϵ is definitely part of some languages defined over this alphabet.
No, {} is the empty set, and it contains nothing, not even ϵ.
{ϵ} is the set containing only ϵ. ϵ is a string, not a symbol, so {ϵ} is not a subset of all alphabets (however, it seems there are cases where some alphabets are defined to contain ϵ but that is a different confusing story). It is also not a subset of all languages: consider the language L = {aa, ab, ba, bb}. The empty string ϵ is not one of its elements.
The analogy to set theory might cause some additional confusion. Notice that the empty set is a subset of every set. The empty string is not an element of every language (so {ϵ} is not a subset of every language), but rather it is a substring of any string.
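The distinctions above can be mirrored with Python sets and strings. This is only an analogy: Python has no separate type for symbols, so the sketch models a symbol as a one-character string, and the names EPSILON and is_string_over are invented for the example.

```python
EPSILON = ""                       # the empty string
alphabet = {"a", "b", "c"}         # a set of symbols
empty_set = set()
L = {"aa", "ab", "ba", "bb"}       # the example language from the answer


def is_string_over(w: str, sigma: set) -> bool:
    """True if every character of w is a symbol of the alphabet sigma."""
    return all(ch in sigma for ch in w)


assert EPSILON not in alphabet                # epsilon is not a symbol of {a, b, c}
assert EPSILON not in empty_set               # {} contains nothing at all
assert not {EPSILON} <= L                     # {epsilon} is not a subset of this language
assert is_string_over(EPSILON, alphabet)      # but epsilon is a string over {a, b, c}
assert is_string_over(EPSILON, empty_set)     # and over any other alphabet
assert all(EPSILON in w for w in L)           # epsilon is a substring of every string
assert empty_set <= L and empty_set <= alphabet   # the empty set is a subset of every set
```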

Proving non regularity of a language using Myhill-Nerode

Prove that the language over {0,1} consisting of all strings x in which the number of 0s equals the number of 1s is not regular.
Myhill-Nerode says that there are as many equivalence classes over the indistinguishability relation w.r.t. a regular language as there are states in a minimal DFA for that language.
Two strings w and x are indistinguishable w.r.t. a regular language L if for any y such that wy is in L, xy is also in L; and if for any z such that xz is in L, wz is in L also. In other words, w and x are indistinguishable if they can have exactly the same set of strings concatenated to them to get some string in L.
For your language, we can show the number of equivalence classes under this relation is infinite. A regular language has a minimal DFA with one state per equivalence class, and no DFA has infinitely many states, so showing this shows the language is not regular.
In your case, we can show that 0^n is distinguishable from 0^m for m > n: appending 1^n to 0^n gives 0^n 1^n, which is in your language, whereas appending 1^n to 0^m gives 0^m 1^n, which has more 0s than 1s and is not. Thus, each string 0^n is distinguishable from every other string 0^m, n != m, so there are infinitely many equivalence classes. As we said earlier, this means the language cannot be regular.
If we had found that a finite number of equivalence classes existed and named them, we'd have also found a minimal DFA for the language. Simply add one state per equivalence class and figure out the transitions.
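The distinguishing suffixes used in this argument are easy to check mechanically. A small sketch, where in_language and distinguishing_suffix are names made up for the example:

```python
def in_language(x: str) -> bool:
    """Membership test: equally many 0s and 1s."""
    return x.count("0") == x.count("1")


def distinguishing_suffix(n: int, m: int) -> str:
    """For n != m, a suffix z with 0^n z in the language and 0^m z not in it."""
    if n == m:
        raise ValueError("0^n and 0^m are the same string when n == m")
    return "1" * n


# Every pair of distinct prefixes 0^n, 0^m is separated by the suffix 1^n.
for n in range(6):
    for m in range(6):
        if n != m:
            z = distinguishing_suffix(n, m)
            assert in_language("0" * n + z)
            assert not in_language("0" * m + z)
```

Since this works for every pair n != m, the strings 0^0, 0^1, 0^2, ... all lie in different equivalence classes.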

Regular language for number of A's in the string

L = {w | w ∈ {a,b}*, the number of a's in w is divisible by 2} is the language. Can someone help me with a regular grammar for it?
The language is the set of all strings of a and b with an even number of a. This is a regular language and the goal is to produce a regular grammar for it.
Unless the regular grammar you're going to need is trivial, I would recommend always writing down the finite automaton first, and then converting it into a grammar. Converting a finite automaton into a grammar is very easy, and solving this problem is easy with a finite automaton. We will have two states: one will correspond to having seen an even number of a, the other an odd number. The state corresponding to having seen an even number of a will be accepting, and seeing b will not cause us to change states. The DFA is therefore:
         b         b
        /-\       /-\
        | V       | V
  ----->(q0)--a-->(q1)
          ^         |
          |    a    |
          \---------/
A regular grammar for this can be formed by writing the transitions down as productions, using the states as nonterminal symbols, and including an empty production for the accepting state:
(q0) -> b(q0) | a(q1) | e
(q1) -> b(q1) | a(q0)
For the sake of completeness, you could run some other algorithms on the grammar or automaton and get a regular expression, maybe like this: b*(ab*ab*)* (just wrote that down, not sure if it's right or not, left as an exercise).
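The DFA described above and the regular expression b*(ab*ab*)* can be compared by brute force on short strings. This is a sketch rather than a proof of equivalence; DELTA, dfa_accepts, regex_accepts and disagreements are names chosen for the example.

```python
import re
from itertools import product

# Transition table of the two-state DFA: q0 = even number of a's seen, q1 = odd.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}


def dfa_accepts(w: str) -> bool:
    """Run the DFA from q0; q0 is the only accepting state."""
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]
    return state == "q0"


REGEX = re.compile(r"b*(ab*ab*)*")


def regex_accepts(w: str) -> bool:
    """True if the whole string matches the regular expression."""
    return REGEX.fullmatch(w) is not None


def disagreements(max_len: int):
    """Strings of length <= max_len on which the DFA and the regex differ."""
    words = ("".join(t) for k in range(max_len + 1)
             for t in product("ab", repeat=k))
    return [w for w in words if dfa_accepts(w) != regex_accepts(w)]


assert disagreements(10) == []
```

On every string over {a, b} of length at most 10 the two agree, which is consistent with the regular expression being right.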

How does a pushdown automaton know how to read a palindrome?

For example, how does a PDA know how to read a palindrome in L = {a, b}*?
PDA that accepts palindromes over {a,b}* :
So, based on my drawing of the PDA:
How does it know when the first half of the string is on the final terminal (letter of the alphabet), and therefore knows to go from state 0 to state 1 (and furthermore knowing to "pop" letters from the stack backwards, hence creating the palindrome)?
This is a nondeterministic pushdown automaton (NPDA). The answer to your question is that it guesses and may be assumed to guess correctly. Nondeterministic automata accept a string w if any path along which w might be processed results in w's being accepted.
If we define acceptance as having an empty stack in an accepting state, then the only way something can be accepted by the above NPDA is if:
it puts some stuff on the stack in state q0
it eventually guesses that it needs to read the second half of the string
it reads what it pushed onto the stack, but backwards, in q1
There are three "guesses" that the NPDA makes:
it guesses that the string is an even-length palindrome when it takes the transition e e/e (here and below, e is used in place of lambda)
it guesses that the string is an odd-length palindrome with a in between the two halves when it takes the transition a e/e
it guesses that the string is an odd-length palindrome with b in between the two halves when it takes the transition b e/e
Each of the above three guesses is also guessing that the first half of the string, excluding a possible middle element, has been seen already.
This guess will eventually be true for any palindrome, and it won't be true for anything but a palindrome, so the NPDA accepts PAL.
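"Guessing correctly" can be made concrete by simulating every branch of the machine. The sketch below assumes the NPDA described in this answer: q0 pushes each symbol it reads, the three guess transitions move to q1, q1 pops a symbol matching the one it reads, and acceptance is by empty stack in q1 at the end of the input. The original drawing is not reproduced here, so these details and the name npda_accepts are assumptions.

```python
from itertools import product


def npda_accepts(w: str) -> bool:
    """Explore every nondeterministic branch; accept if some branch consumes
    all of w and ends in q1 with an empty stack."""
    # A configuration is (state, position in w, stack as a tuple with the top last).
    todo = [("q0", 0, ())]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, i, stack = config
        if state == "q1" and i == len(w) and not stack:
            return True
        if state == "q0":
            # e e/e: guess that the first half of an even-length palindrome is done.
            todo.append(("q1", i, stack))
            if i < len(w):
                # Read a symbol and push it.
                todo.append(("q0", i + 1, stack + (w[i],)))
                # a e/e or b e/e: guess that w[i] is the middle symbol.
                todo.append(("q1", i + 1, stack))
        elif i < len(w) and stack and stack[-1] == w[i]:
            # In q1: read a symbol that matches the top of the stack and pop it.
            todo.append(("q1", i + 1, stack[:-1]))
    return False


# Compare with the definition of a palindrome on all short strings over {a, b}.
for k in range(9):
    for t in product("ab", repeat=k):
        w = "".join(t)
        assert npda_accepts(w) == (w == w[::-1])
```

The simulation tries every possible guess, so a wrong guess on one branch never hurts: the string is accepted as long as at least one branch works out.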

What is the logic of finite automata and loops

I have to draw a finite automaton that accepts the following strings:
Λ, a, aabc, acba and accb
In my view a(a+b+c)* might be its regular expression, as the strings start with a and it includes the empty string as well.
Now I don't understand the logic of the FA drawn in the image below.
Question 1: If the string starts with a, why does the FA move from x to y while reading b? Why don't we read a here?
Question 2: Why do we use loops on a, b at states y and z?
The language L = {λ, a, aabc, acba, accb} is finite. Therefore, L is not equivalent to the language denoted by the Kleene closure of the regular expression a(a + b + c), which is infinite. There is a simple algorithm that generates the nondeterministic finite automaton accepting a finite language, which consists of drawing paths accepting each of the strings in the language.
It's unclear what the relationship between the two languages and the diagram in the original post is, since the automaton in the diagram accepts neither language. Assuming that the nodes are labeled with their names, and circled nodes indicate acceptance, the language accepted by the automaton in the diagram is (a + b)*. In this case, the loops are used to accept the Kleene closure of (a + b). That said, it would be useful if you could clarify the meaning of the diagram.
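The path-drawing construction for a finite language can be sketched as follows. This version is a small variation on the one described above: it merges paths that share a prefix, which makes the automaton deterministic (with missing transitions meaning rejection). The names build_trie_automaton and accepts are invented for the example.

```python
from itertools import product


def build_trie_automaton(words):
    """One path per word. States are prefixes of the words, the start state is
    the empty prefix, and the accepting states are the words themselves."""
    delta = {}
    accepting = set()
    for word in words:
        for k in range(len(word)):
            delta[(word[:k], word[k])] = word[:k + 1]
        accepting.add(word)
    return delta, accepting


def accepts(delta, accepting, w: str) -> bool:
    """Follow the path for w; a missing transition means the string is rejected."""
    state = ""
    for ch in w:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in accepting


LANGUAGE = {"", "a", "aabc", "acba", "accb"}   # "" plays the role of lambda
delta, accepting = build_trie_automaton(LANGUAGE)

# The automaton accepts exactly the five strings, and nothing else of length <= 5.
for k in range(6):
    for t in product("abc", repeat=k):
        w = "".join(t)
        assert accepts(delta, accepting, w) == (w in LANGUAGE)
```

No loops are needed: a finite language never requires a cycle in its automaton, since a cycle on an accepting path would produce infinitely many accepted strings.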

Resources