Is the empty string a subset or element of all alphabets?

I'm having trouble understanding whether the empty string (epsilon) is a subset or an element of an alphabet. My understanding was that epsilon could only be part of a language, but my TA in the class said it is an element of all alphabets, so now I am confused.
e.g. would {a,b,c} contain epsilon as an element?
e.g. would {} contain epsilon as an element?
e.g. is {eps} a subset of all alphabets or languages?

This question is probably better suited for cs.stackexchange but I will try to help you according to my understanding, please do correct me if necessary.
In general, your own intuition seems quite correct to me. ϵ is not automatically part of every alphabet. It is the empty string of characters.
However, this means that ϵ is a string over any alphabet, even your alphabet {a, b, c}.
So to answer your three examples:
No, it does not. If {a, b, c} is an alphabet, it is a set of symbols, and ϵ is a string. However, ϵ is definitely part of some languages defined over this alphabet.
No, {} is the empty set, and it contains nothing, not even ϵ.
{ϵ} is the set containing only ϵ. Since ϵ is a string and not a symbol, {ϵ} is not a subset of every alphabet (there do seem to be cases where an alphabet is defined to contain ϵ, but that is a different, confusing story). It is also not a subset of every language: consider the language L = {aa, ab, ba, bb}. The empty string ϵ is not one of its elements.
The analogy to set theory might cause some additional confusion. Notice that the empty set is a subset of every set. The empty string is not an element of every language; rather, it is a substring of every string.
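As a rough illustration of the distinctions above, here is a small Python sketch. It models symbols as one-character strings and ϵ as the empty string; the helper name strings_over is made up for this example and is not standard terminology.

```python
from itertools import product

alphabet = {"a", "b", "c"}   # an alphabet: a set of symbols
epsilon = ""                 # the empty string

def strings_over(symbols, max_len):
    """All strings over `symbols` of length 0..max_len (a finite slice of Sigma*)."""
    return {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(sorted(symbols), repeat=n)
    }

language = {"aa", "ab", "ba", "bb"}  # the language L from the answer

# epsilon is not a symbol of the alphabet ...
epsilon_is_symbol = epsilon in alphabet
# ... but it is a string over the alphabet (the string of length 0).
epsilon_is_string = epsilon in strings_over(alphabet, 2)
# {epsilon} is not a subset of this particular language.
eps_set_in_language = {epsilon} <= language
# epsilon is, however, a substring of every string in the language.
eps_substring_everywhere = all(epsilon in w for w in language)
```

The four flags come out as False, True, False and True respectively, matching the three answers and the closing remark about substrings.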

Related

Proving non regularity of a language using Myhill-Nerode

Prove that the language over {0,1} consisting of all strings x in which the number of 0s equals the number of 1s is not regular.
Myhill-Nerode says that there are as many equivalence classes over the indistinguishability relation w.r.t. a regular language as there are states in a minimal DFA for that language.
Two strings w and x are indistinguishable w.r.t. a regular language L if for any y such that wy is in L, xy is also in L; and if for any z such that xz is in L, wz is in L also. In other words, w and x are indistinguishable if they can have exactly the same set of strings concatenated to them to get some string in L.
For your language, we can show that the number of equivalence classes under this relation is infinite. If the language were regular, its minimal DFA would need infinitely many states, which no DFA has; this contradiction shows the language is not regular.
In your case, we can show that 0^n is distinguishable from 0^m for m > n: appending 1^n to 0^n gives 0^n 1^n, which is in your language, whereas appending 1^n to 0^m gives 0^m 1^n, which is not, since m > n. Thus, each string 0^n is distinguishable from every other string 0^m, n != m, so there are infinitely many equivalence classes. As we said earlier, this means the language cannot be regular.
If we had found that a finite number of equivalence classes existed and named them, we'd have also found a minimal DFA for the language. Simply add one state per equivalence class and figure out the transitions.
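The distinguishing argument can be spot-checked for small n and m. This is only a sketch: in_language and pairwise_distinguishable are hypothetical helper names, and a finite check illustrates the argument without replacing the proof.

```python
def in_language(s):
    """Membership test: equal number of 0s and 1s."""
    return s.count("0") == s.count("1")

def pairwise_distinguishable(limit):
    """Check that the suffix 1^n separates 0^n from 0^m for all n != m below limit."""
    for n in range(limit):
        for m in range(limit):
            if n == m:
                continue
            suffix = "1" * n
            # 0^n 1^n must be in the language, 0^m 1^n must not be.
            if not in_language("0" * n + suffix):
                return False
            if in_language("0" * m + suffix):
                return False
    return True
```

Calling pairwise_distinguishable(20) confirms that the first twenty strings 0^0, 0^1, ..., 0^19 all fall into different equivalence classes.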

Expression for this language

A language to generate all strings that have more a's than b's (not necessarily only one more, as with the nonterminal A for the language EQUAL, but any number more a's than b's).
First, let us show that the language is not regular by using the Pumping Lemma for Regular Languages. Assume this language were regular. Then, by the pumping lemma, we'd know that any string in the language of length at least p could be written as uxv such that |ux| <= p, |x| > 0 and for all nonnegative integers n, u(x^n)v is also in the language. Choose as our representative string b^p a^(p+1). This string is clearly in our language because it has more a's than b's. Rewriting this as uxv with |ux| <= p means that u and x consist only of b's. Because |x| > 0, choosing n > 1 causes us to increase the number of b's alone, which means there are not more a's than b's in the resulting string. This is a contradiction, so our only assumption - that the language is a regular language - must be false.
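To make the pumping step concrete, the following sketch tries every decomposition uxv of b^p a^(p+1) with |ux| <= p and |x| > 0 and checks that pumping once more (n = 2) always leaves the language. The function names are illustrative, and the values of p tried are arbitrary small numbers rather than an actual pumping length.

```python
def in_language(s):
    """Membership test: strictly more a's than b's."""
    return s.count("a") > s.count("b")

def every_split_fails(p):
    """True if no split u, x, v of b^p a^(p+1) with |ux| <= p, |x| > 0 survives pumping."""
    w = "b" * p + "a" * (p + 1)
    for ux_len in range(1, p + 1):       # |ux| ranges over 1..p
        for u_len in range(ux_len):      # guarantees |x| = ux_len - u_len > 0
            u, x, v = w[:u_len], w[u_len:ux_len], w[ux_len:]
            pumped = u + x * 2 + v       # the case n = 2
            if in_language(pumped):
                return False
    return True
```

Because x always lies inside the leading block of b's, the pumped string has at least p + 1 b's and exactly p + 1 a's, so every_split_fails(p) holds for each p >= 1.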
Next, let's see if the language is context-free. Notice that the language with more a's than b's is similar to the one with the same number of a's and b's, which we recognize as a canonical context-free language:
S -> SS | aSb | bSa | e
We can modify this grammar as follows:
whenever we allow one a, allow any number of a instead
instead of allowing the empty string, require at least one a be in the string
This suggests the grammar
S -> ST | TS | ASb | bSA | A
T -> TT | ATb | bTA | e
A -> aA | a
Note: we introduced nonterminal A to capture the idea of allowing any number of a's in place of one, and we added the nonterminal T to assist with the "at least one" a concept. We needed a new nonterminal here because otherwise every time we used the production S -> SS we'd be requiring extra a's in the grammar that strings in the language might not have.
To show this grammar is correct we can show:
it only generates strings in the language, that is, all strings it generates have more a than b
it generates all strings in the language, that is, for any string with more a's than b's, this grammar derives that string
To show the first part, note that every production that adds a b also adds at least one a, and that eliminating the single S requires using the production S -> A exactly once, which ensures the final string has at least one more a than it has b's.
To show the second part, consider this procedure to generate any string:
if the string is of the form a*, then use S -> A -> a*.
if the string starts with a and ends with b, use S -> ASb -> aSb to peel off the outermost a and b. Then, solve the subproblem of size (N - 2) using this same procedure. Notice that we have removed the same number of a and b, so if the original string had more a than b, so does the string in the subproblem.
if the string starts with b and ends with a, use S -> bSA -> bSa to peel off the outermost a and b. Then, solve the subproblem of size (N - 2) using this same procedure. Notice that we have removed the same number of a and b, so if the original string had more a than b, so does the string in the subproblem.
if the string starts and ends with the same symbol, find any prefix of the string such that it and the complementary suffix both have at least as many a's as b's. This must be possible since the string has more a's than b's and, since the word starts and ends with the same symbol, neither the prefix nor the suffix needs to be empty. Then, use either S -> TS or S -> ST, depending on whether the prefix or suffix may contain the same numbers of a's and b's, and solve the subproblems.
Rigorously proving the claim in the last step of this procedure is left as an exercise:
If a string consisting of a's and b's begins and ends with the same symbol and contains more a's than b's, the string can be split into a prefix and a suffix each of which has at least as many a's as b's.
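Before attempting the proof, the claim can be checked by brute force on short strings. The sketch below assumes that both the prefix and the suffix must be non-empty and therefore only considers strings of length at least 2 (the single-letter string a is already covered by the a* case). The names balance, find_split and claim_holds_up_to are made up for this example.

```python
from itertools import product

def balance(s):
    """Number of a's minus number of b's."""
    return s.count("a") - s.count("b")

def find_split(s):
    """Return the first (prefix, suffix) with both parts non-empty and balance >= 0, or None."""
    for i in range(1, len(s)):
        if balance(s[:i]) >= 0 and balance(s[i:]) >= 0:
            return s[:i], s[i:]
    return None

def claim_holds_up_to(max_len):
    """Test the claim on every string over {a, b} of length 2..max_len."""
    for n in range(2, max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            same_ends = s[0] == s[-1]
            if same_ends and balance(s) > 0 and find_split(s) is None:
                return False
    return True
```

For example, find_split("baaab") returns ("ba", "aab"). A finite search like this is evidence, not a proof, but it suggests where to look: track the running balance and note that it changes by exactly one per symbol.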

How to demonstrate a set is decidable, semi-decidable or not semi-decidable?

I have been asked to prove whether the following set is decidable, semi-decidable or not semi-decidable:
In other words, it is the set of inputs p such that there exists a Turing machine, encoded by the natural number y, that returns its input when run on p.
Consider the set K of naturals x such that the Turing machine encoded by x stops on input x. This set has been shown to be undecidable.
I think that what I need is a reduction of K to L, but I don't know how to prove that L is decidable, semi-decidable or not semi-decidable.
L may not look decidable at first glance, because there is this nasty unbounded quantifier included, which seems to make necessary a possibly infinite search when you look for a y satisfying the condition for a specific p.
However, the answer is much simpler: there is a Turing machine M which always returns its input, i.e. M(p) = p holds for every input p. Let y be a code of M. Then you can use this same y for all p, showing that L contains every possible input. Hence L is of course decidable.
In fact, this is an example to demonstrate the principle of extensionality (if two sets have the same elements and one is decidable, then the other is decidable too, even if it doesn't look so).
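A minimal sketch of this argument, assuming Turing machines are modeled as ordinary Python functions and L is the set described in the question. Both identity_machine and in_L are illustrative names, not part of the original problem.

```python
def identity_machine(p):
    """Stands in for the machine M with M(p) = p for every input p."""
    return p

def in_L(p):
    """Decider for L: the identity machine is a witness y for every p,
    so no search over machine codes is needed and the answer is always True."""
    return True
```

The decider never runs any machine at all. It is correct only because a single witness works for every input, which is what makes the unbounded quantifier harmless here.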

How does a pushdown automaton know how to read a palindrome?

For example, how does a PDA know how to read a palindrome in L = {a, b}*?
PDA that accepts palindromes over {a,b}* :
So, based on my drawing of the PDA:
How does it know when the first half of the string is on the final terminal (letter of the alphabet), and therefore knows to go from state 0 to state 1 (and furthermore knowing to "pop" letters from the stack backwards, hence creating the palindrome)?
This is a nondeterministic pushdown automaton (NPDA). The answer to your question is that it guesses and may be assumed to guess correctly. A nondeterministic automaton accepts a string w if any path along which w might be processed results in w being accepted.
If we define acceptance as having an empty stack in an accepting state, then the only way something can be accepted by the above NPDA is if:
it puts some stuff on the stack in state q0
it eventually guesses that it needs to read the second half of the string
it reads what it pushed onto the stack, but backwards, in q1
There are three "guesses" that the NPDA makes:
it guesses that the string is an even-length palindrome when it takes the transition e e/e (here e is used in place of lambda)
it guesses that the string is an odd-length palindrome with a in between the two halves when it takes the transition a e/e
it guesses that the string is an odd-length palindrome with b in between the two halves when it takes the transition b e/e
Each of the above three guesses is also guessing that the first half of the string, excluding a possible middle element, has been seen already.
This guess will eventually be true for any palindrome, and it won't be true for anything but a palindrome, so the NPDA accepts PAL.
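One way to see what "guessing" means is to simulate the nondeterminism by trying every guess in turn. The sketch below assumes the two-state machine described above (push in q0, pop and compare in q1, accept on empty stack at end of input); since the original drawing is not available, the details and the function names are assumptions.

```python
def run_q1(stack, rest):
    """State q1: pop one symbol per input symbol and require a match."""
    for symbol in rest:
        if not stack or stack.pop() != symbol:
            return False
    return not stack  # accept only if the stack is empty at end of input

def npda_accepts(word):
    """Accept if at least one sequence of guesses leads to acceptance."""
    for split in range(len(word) + 1):
        pushed = list(word[:split])          # q0: push everything read so far
        for skipped in (0, 1):
            # skipped == 0 models the "e e/e" guess (even length);
            # skipped == 1 models the "a e/e" or "b e/e" guess (odd length),
            # which consumes the middle symbol without touching the stack.
            if split + skipped > len(word):
                continue
            if run_q1(pushed.copy(), word[split + skipped:]):
                return True
    return False
```

A deterministic program has to try all split points, whereas the NPDA is defined to accept whenever any one of them works. That is the whole content of "it may be assumed to guess correctly".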

Exhibiting an algorithm that determines if L = L*, given any regular language L

I am studying membership algorithms and I am working on this particular problem which says the following:
Exhibit an algorithm that, given any regular language L, determines whether or not L = L*
So, my first thought was, we have L* which is Kleene star of L and to determine if L = L*, well couldn't we just say that since L is regular, we know L* is by definition which states that the family of regular languages is closed under star-closure.
Therefore L will always be equal to L*?
I feel like there is definitely a lot more to it, there is probably something I am missing. Any help would be appreciated. Thanks again.
since L is regular, we know L* is by definition which states that the family of regular languages is closed under star-closure. Therefore L will always be equal to L*?
No. Regular(L) --> Regular(L*), but that does not mean that L == L*. Just because two languages are both regular does not mean that they are the same regular language. For instance, a* and b* are both regular languages, but this does not make them the same language.
An example of L != L* is the language L = a*b*, for which L* = (a*b*)*. The string abab is in L* but not in L.
As far as an algorithm goes, let me remind you that a regular language is one that can be recognized by a DFA, and that for any given DFA there is a unique minimal equivalent DFA (up to renaming of states).
The implication that you stated is wrong. Closedness under the Kleene star means only that L* is again regular, if L is regular.
One possibility to check whether L = L* is to compute the minimal automaton for both and then check them for equivalence.
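Here is a sketch of one such decision procedure. Note that it swaps in a different but equivalent technique: rather than minimizing two automata, it explores the product of the DFA for L with an on-the-fly subset construction for L* and looks for a string on which the two disagree. The DFA encoding (a complete transition dict keyed by (state, symbol)) and the name equals_own_star are assumptions made for this example.

```python
from collections import deque

def equals_own_star(alphabet, delta, start, accepting):
    """Decide whether L(D) == L(D)* for a complete DFA D."""
    if start not in accepting:
        return False  # the empty string is in L* but not in L

    def star_step(subset, symbol):
        # One step of the automaton for L*: advance every tracked DFA state,
        # and restart from `start` whenever a word of L has just been completed.
        nxt = {delta[(q, symbol)] for q in subset}
        if nxt & accepting:
            nxt.add(start)
        return frozenset(nxt)

    first = (start, frozenset({start}))
    seen = {first}
    queue = deque([first])
    while queue:
        q, subset = queue.popleft()
        # D accepts iff q is accepting; the star automaton accepts iff the
        # subset contains an accepting state. Any disagreement means L != L*.
        if (q in accepting) != bool(subset & accepting):
            return False
        for symbol in alphabet:
            pair = (delta[(q, symbol)], star_step(subset, symbol))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Example DFAs (state 2 in the second one is a dead state).
a_star = dict(alphabet="a", delta={(0, "a"): 0}, start=0, accepting={0})
a_star_b_star = dict(
    alphabet="ab",
    delta={(0, "a"): 0, (0, "b"): 1, (1, "a"): 2,
           (1, "b"): 1, (2, "a"): 2, (2, "b"): 2},
    start=0,
    accepting={0, 1},
)
```

With these inputs, equals_own_star(**a_star) is True and equals_own_star(**a_star_b_star) is False, in line with the abab counterexample above. The search terminates because there are only finitely many (state, subset) pairs, although the number of subsets can be exponential in the size of the DFA.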
