Proving that a grammar generates a language and vice versa - complexity-theory

So here is a grammar R and a language L, and I want to prove that R generates L:
R = {S → abS | ε},  L = {(ab)^n | n ≥ 0}
So I thought I would prove that L(R) ⊆ L and L(R) ⊇ L both hold.
For L(R) ⊆ L: I show by induction on the number i of derivation steps that after every derivation step u → w, in which w results from u according to the rules of R, either w = v1v2 or w = v1v2S, with |v1| = |v2|, v1 ∈ {a}*, and v2 ∈ {b}*.
And in the induction basis: for i = 0 the derivation produces w = ε, and for i = 1, w is in {ε, abS}.
Is that right so far?

So here is a grammar R and a language L, and I want to prove that R generates L.
Probably what you want to do is show that the language L(R) of some grammar R is the same as some other language L specified another way (in your case, set-builder notation with a regular expression).
So I thought I would prove that L(R) ⊆ L and L(R) ⊇ L both hold.
Given the above assumption, you are correct in thinking this is the right way to proceed with the proof.
For L(R) ⊆ L: I show by induction on the number i of derivation steps that after every derivation step u → w, in which w results from u according to the rules of R, either w = v1v2 or w = v1v2S, with |v1| = |v2|, v1 ∈ {a}*, and v2 ∈ {b}*. And in the induction basis: for i = 0 the derivation produces w = ε, and for i = 1, w is in {ε, abS}.
This is hard for me to follow. That's not to say it's wrong. Let me write it down in my own words and perhaps you or others can judge whether we are saying the same thing.
We want to show that L(R) is a subset of L; that is, any string generated by the grammar R is contained in the language L. We can prove this by mathematical induction on the number of steps in the derivation of strings generated by the grammar.
Base case: with one derivation step, S → ε produces the empty word, which is a string in the language L (choose n = 0).
Induction hypothesis: assume that all strings derived from the grammar in up to and including k steps are also in L.
Induction step: we must prove that any string derived from the grammar in k+1 steps is also in L. Let w be any string derived in k+1 steps. From the grammar it is clear that the derivation of w must be S → abS → ababS → ... → ab...abS → ab...ab. But this is the same as the derivation of some string w' in k steps, except that there was one extra application of S → abS before the application of S → ε. By the induction hypothesis, w' is of the form (ab)^m for some m ≥ 0, and the extra application of S → abS adds ab. Because (ab)^m(ab) = (ab)^(m+1), we can choose n = m+1. So all strings derived from the grammar in k+1 steps are also in the language, as required.
To prove that all strings in the language can be derived in the grammar, consider the following construction: to derive the string (ab)^n, apply the production S → abS exactly n times and then the production S → ε once. The first n steps give the sentential form (ab)^n S, and the final step gives the terminal string (ab)^n.
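For intuition (not a proof), here is a quick sanity check in Python; the step bound and all names are my own:

MAX_STEPS = 6  # arbitrary bound on derivation length for this check

def step(forms):
    # One derivation step: rewrite the leftmost S by each rule of R.
    # Forms without an S pass through unchanged.
    return {f.replace("S", rhs, 1) for f in forms for rhs in ("abS", "")}

forms = {"S"}
terminal = set()
for _ in range(MAX_STEPS):
    forms = step(forms)
    terminal |= {f for f in forms if "S" not in f}

# Every terminal string derived so far should be (ab)^n for some n >= 0.
assert all(s == "ab" * (len(s) // 2) for s in terminal)
print(sorted(terminal, key=len))  # ['', 'ab', 'abab', 'ababab', ...]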

Related

First sets of LL(1) parser

I have some problems understanding the following rules for computing the FIRST sets of an LL(1) parser:
b) Else X1 is a nonterminal, so add First(X1) - ε to First(u).
a. If X1 is a nullable nonterminal, i.e., X1 =>* ε, add First(X2) - ε to First(u). Furthermore, if X2 can also go to ε, then add First(X3) - ε, and so on, through all Xn until the first non-nullable symbol is encountered.
b. If X1X2...Xn =>* ε, add ε to the first set.
Why, according to b), can ε not be added to First(u) when X1 is a nonterminal? So if I have
S-> A / a
A-> b / ε
F(A) = {b,ε}
F(S) = {b,ε,a}
are these not correct? Also, the sub-points a and b are confusing.
All it says is which terminals you can expect in a sentential form so that you can replace S by AB in a leftmost derivation. So, if A derives ε, then in a leftmost derivation you can replace A by ε. Now you depend upon B, and so on. Consider this sample grammar:
S -> AB
A -> ε
B -> h
So, if there is a string with just one character/terminal "h", and you verify whether this string is valid by checking whether some leftmost derivation derives it from the above grammar, then you can safely replace S by AB, because A will derive ε and B will derive h.
Note, however, that the language recognized by the above grammar cannot contain the empty string ε. For ε to be in the language, B would also have to derive ε; if both non-terminals A and B derive ε, then S derives ε.
That is, if there is some production S -> ABCD, then only if all of the non-terminals A, B, C and D derive ε can S also derive ε, and only then will ε be in FIRST(S).
The FIRST sets given by you are correct. I think you are confused because the production S -> A has only the single nonterminal A on its right-hand side, and this A derives ε. Applying only rule b) would give FIRST(S) = (FIRST(A) - ε) ∪ {a} = {b, a}, which is incorrect. Since the right-hand side consists of just this one nullable nonterminal, the derivation S -> A -> ε is possible, which is exactly why FIRST(S) must contain ε, i.e., S can derive the empty string ε.
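To make the rules concrete, here is a small fixed-point computation of FIRST sets in Python for the grammar from the question; the dictionary encoding and the EPS marker are my own, a sketch rather than production parser code:

EPS = "ε"  # marker for the empty word

# S -> A | a    and    A -> b | ε   (the grammar from the question)
grammar = {
    "S": [["A"], ["a"]],
    "A": [["b"], [EPS]],
}

first = {nt: set() for nt in grammar}

changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            before = len(first[nt])
            all_nullable = True
            for sym in prod:
                if sym == EPS:
                    continue
                if sym in grammar:               # nonterminal: add First(sym) - ε
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:    # stop at the first non-nullable
                        all_nullable = False
                        break
                else:                            # terminal: add it and stop
                    first[nt].add(sym)
                    all_nullable = False
                    break
            if all_nullable:                     # sub-point b: X1...Xn =>* ε
                first[nt].add(EPS)
            changed = changed or len(first[nt]) != before

print(first)  # e.g. {'S': {'b', 'a', 'ε'}, 'A': {'b', 'ε'}} (set order may vary)

Running it reproduces F(A) = {b, ε} and F(S) = {b, ε, a} from the question.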

Recursion Puzzle

Recently, one of my friends challenged me to solve this puzzle which goes as follows:
Suppose that you have two variables x and y. These are the only variables which can be used for storage in the program. There are three operations which can be done:
Operation 1: x = x+y
Operation 2: x = x-y
Operation 3: y = x-y
Now, you are given two numbers n1 and n2 and a target number k. Starting with x = n1 and y = n2, is there a way to arrive at x = k using the operations mentioned above? If yes, what is the sequence of operations that generates x = k?
Example: If n1 = 16, n2 = 6 and k = 28 then the answer is YES. The sequence is:
Operation 1
Operation 1
If n1 = 19, n2 = 7 and k = 22 then the answer is YES. The sequence is:
Operation 2
Operation 3
Operation 1
Operation 1
Now, I have been wrapping my head around this problem for too long, but I am not getting any initial ideas. I have a feeling that this is recursion, but I do not know what the boundary conditions should be. It would be very helpful if someone could direct me towards an approach that can be used to solve this problem. Thanks!
Maybe not a complete answer, but here is a proof that a sequence exists if and only if k is a multiple of the greatest common divisor (GCD) of n1 and n2. Let's write G = GCD(n1, n2) for brevity.
First I'll prove that x and y are always integer multiples of G. The proof is by straightforward induction. Hypothesis: x = p * G and y = q * G, for some integers p and q.
Initially, the hypothesis holds by definition of G.
Each of the operations respects the induction hypothesis. The new values they can assign are:
x + y = p * G + q * G = (p + q) * G (operation 1)
x - y = p * G - q * G = (p - q) * G (operations 2 and 3)
Due to this result, there can only be a sequence reaching k if k is an integer multiple of the GCD of n1 and n2.
For the other direction we need to show that any integer multiple of G can be achieved by the operations. This is definitely the case if we can reach x = G and y = G. For this we use Euclid's algorithm; consider the subtraction-based implementation (the second one in the Wikipedia article):
function gcd(a, b)
    while a ≠ b
        if a > b
            a := a − b
        else
            b := b − a
    return a
This is a repeated application of the given operations and results in x = G and y = G: a := a − b is exactly operation 2, and b := b − a can be simulated, since the sequence of operations 1, 3, 2 swaps x and y (swap, apply operation 2, swap back).
Knowing that a solution exists, you can apply a BFS, as shown in Amit's answer, to find the shortest sequence.
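In code, the necessary condition is a one-liner (a sketch using Python's standard math.gcd; the function name is my own, and whether the condition is also sufficient is questioned in a comment further down):

from math import gcd

def passes_gcd_test(n1: int, n2: int, k: int) -> bool:
    # Necessary condition proved above: x and y stay multiples of G.
    return k % gcd(n1, n2) == 0

print(passes_gcd_test(16, 6, 28))  # True:  gcd(16, 6) = 2 divides 28
print(passes_gcd_test(19, 7, 22))  # True:  gcd(19, 7) = 1 divides 22
print(passes_gcd_test(16, 6, 27))  # False: 2 does not divide 27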
Assuming a solution exists, finding the shortest sequence to get to it can be done using a BFS.
The pseudo code should be something like:
queue <- new empty queue
parent <- new map of type map: pair -> pair
parent[(x, y)] = 'root'  // special indicator to stop the trace-back there
queue.enqueue(pair(x, y))
while !queue.empty():
    curr <- queue.dequeue()
    x <- curr.first
    y <- curr.second
    if x == target or y == target:
        printSolAccordingToMap(parent, (x, y))
        return
    x1 <- x + y
    x2 <- x - y
    y1 <- x - y
    if (x1, y) is not a key in parent:
        parent[(x1, y)] = (x, y)
        queue.enqueue(pair(x1, y))
    // similarly for (x2, y) and (x, y1)
The function printSolAccordingToMap() simply traces back on the map until it finds the root, and prints it.
Note that this solution finds the optimal sequence only if one exists; it will loop forever if no solution exists, so this is only a partial answer so far.
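Here is a runnable Python version of the same idea (a sketch; the max_states cutoff is my own addition to sidestep the infinite loop when no solution exists, and it only checks x against the target, per the problem statement):

from collections import deque

def shortest_sequence(n1, n2, k, max_states=10**6):
    # BFS over (x, y) states; returns the shortest list of operation
    # numbers reaching x == k, or None if not found within the cutoff.
    start = (n1, n2)
    parent = {start: None}  # state -> (previous state, operation used)
    queue = deque([start])
    while queue and len(parent) <= max_states:
        state = queue.popleft()
        x, y = state
        if x == k:  # trace the path back to the root
            ops = []
            while parent[state] is not None:
                state, op = parent[state]
                ops.append(op)
            return ops[::-1]
        for op, nxt in ((1, (x + y, y)), (2, (x - y, y)), (3, (x, x - y))):
            if nxt not in parent:
                parent[nxt] = (state, op)
                queue.append(nxt)
    return None

print(shortest_sequence(16, 6, 28))  # [1, 1]
print(shortest_sequence(19, 7, 22))  # a 4-step sequence such as [2, 3, 1, 1]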
Observe that you can keep both x and y in the range (0, target] at all times; if they fall outside it, you can always bring them back into range with simple operations. Under this assumption you can build a graph with O(target * target) nodes, whose edges you find by applying one of the three operations to each node. You then need to find the shortest path from the start node to a target node, i.e., any node (target, y). The time complexity is O(target * target * log(target)) using Dijkstra's algorithm.
I think the proof in Vincent's answer is not complete.
Take two relatively prime numbers, say n1 = 19 and n2 = 13, whose GCD is 1. According to him, a sequence exists if k is a multiple of the GCD, and every number is a multiple of 1; but I don't think a sequence exists for every k.

How do we know that an NFA has the minimum number of states?

Is there some kind of proof for this? How can we know that a given NFA has the minimum number of states?
As opposed to DFA minimization, where efficient methods exist to not only determine the size of, but actually compute, the smallest DFA in terms of number of states that describes a given regular language, no such general method is known for determining the size of a smallest NFA. Moreover, unless P=PSPACE, no polynomial-time algorithm exists to compute a minimal NFA to recognize a language, as the following decision problem is PSPACE-complete:
Given a DFA M that accepts the regular language L, and an integer k, is there an NFA with ≤ k states accepting L?
(Jiang & Ravikumar 1993).
There is, however, a simple theorem from Glaister and Shallit that can be used to determine lower bounds on the number of states of a minimal NFA:
Let L ⊆ Σ* be a regular language and suppose that there exist n pairs P = { (x_i, w_i) | 1 ≤ i ≤ n } such that:
x_i w_i ∈ L for 1 ≤ i ≤ n
x_j w_i ∉ L for 1 ≤ j, i ≤ n and j ≠ i
Then any NFA accepting L has at least n states.
See: Ian Glaister and Jeffrey Shallit (1996). "A lower bound technique for the size of nondeterministic finite automata". Information Processing Letters 59 (2), pp. 75–77. DOI:10.1016/0020-0190(96)00095-6.
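As an illustration of how the theorem is applied, here is a small hypothetical example in Python for L = { a^m : m divisible by 3 }, using the fooling pairs (x_i, w_i) = (a^i, a^(3-i)); the checker just verifies the theorem's two conditions:

def in_L(s):
    # L = strings of a's whose length is divisible by 3
    return set(s) <= {"a"} and len(s) % 3 == 0

pairs = [("a" * i, "a" * (3 - i)) for i in range(3)]  # (x_i, w_i)

# Condition 1: x_i w_i ∈ L for every i
assert all(in_L(x + w) for x, w in pairs)
# Condition 2: x_j w_i ∉ L whenever j ≠ i
assert all(not in_L(pairs[j][0] + pairs[i][1])
           for i in range(3) for j in range(3) if j != i)

print("Any NFA accepting L needs at least", len(pairs), "states.")

Both conditions hold, so any NFA for this language has at least 3 states; this bound is tight, since a 3-state cycle accepts the language.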

String-to-string correction problem NP-completeness proof

I have this assignment to prove that this problem:
Finite alphabet Σ, two strings x, y ∈ Σ*, and a positive integer K. Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single-symbol deletion or adjacent-symbol interchange?
is NP-complete. I have already figured out that I have to give a transformation from the decision version of the set covering problem, but I have no clue how to do this. Any help would be appreciated.
It looks like a modified Levenshtein distance. The problem can be solved with DP in quadratic time.
The transformation from minimum set cover (MSC) to this string correction problem is described in:
Robert A. Wagner, "On the Complexity of the Extended String-to-String Correction Problem", Proceedings of the Seventh Annual ACM Symposium on Theory of Computing, 1975.
In short, the MSC problem is: given finite sets x_1, ..., x_n and an integer L, does there exist a subset J of {1, ..., n} such that |J| <= L and union_{j in J} x_j = union of all x_i?
Let w = the union of all x_i, let t = |w| and r = t^2, and choose symbols Q, R, S not occurring in w.
Take the strings:
A = Q^r R x_1 Q^r S^(r+1) ... Q^r R x_n Q^r S^(r+1)
B = R Q^r ... R Q^r w S^(r+1) ... S^(r+1) (each ... is repeated n times)
and
k = (L+1)r - 1 + 2t(r+1)(n-1) + n(n-1)(r+1)^2/2 + (r*n + |x_1 ... x_n| - t)*W_d
[W_d is the weight of the delete operation; it can be 1.]
It is shown that the string-to-string correction instance (A, B, k) is satisfiable iff the source MSC instance is.
From the construction of the strings it is clear that the proof is not trivial :-) But it isn't too complex to manage.
The NP-hardness proof mentioned above only works for arbitrarily large alphabets.
For fixed alphabets, the problem is solvable in polynomial time; see
https://dblp.org/rec/bibtex/journals/tcs/Meister15

Is union in regular expressions different from union in sets?

For mathematical sets we have
A = {1,2,3}
B = {4,5,6}
A ∪ B = B ∪ A = {1,2,3,4,5,6} = {6,5,2,3,4,1} // order does not matter
But in the theory of computation we get that
a ∪ b is either a or b, but not both;
also, from a* ∪ b* we get aaa or bbb, but not aaabbb or bbbaaa, even though order does not matter in union.
Why is that?
why?
No. In formal language theory, there is a correspondence between regular expressions and regular sets over an alphabet Σ. The function L maps a regular expression u to the corresponding regular set L(u); conversely, every regular set A corresponds to a regular expression in L⁻¹(A):
L(∅) = ∅
L(λ) = {λ}
L(a) = {a} (for all a ∈ Σ)
L(uv) = L(u)L(v) = {xy ∈ Σ* : x ∈ L(u) ∧ y ∈ L(v)}
L(u|v) = L(u) ∪ L(v) = {x ∈ Σ* : x ∈ L(u) ∨ x ∈ L(v)}
L(u*) = ∪[i ∈ ℕ] L(u)^i, where L(u)^i = {x_1⋯x_i ∈ Σ* : x_1, …, x_i ∈ L(u)}
The union of regular expressions corresponds to the union of regular sets, which is the familiar union operation from set theory. A regular expression u matches a string x iff x is a member of the corresponding set L(u). Therefore, u|v matches x iff x is a member of L(u) ∪ L(v). In your example, a*|b* matches aaa (since aaa ∈ L(a*)) and bbb (since bbb ∈ L(b*)), but not aaabbb, because aaabbb is in neither L(a*) nor L(b*): union collects the strings of both sets, it does not concatenate them.
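This is easy to check with Python's re module (which implements a superset of classical regular expressions, but for these patterns the behavior coincides):

import re

union = re.compile(r"a*|b*")  # the regular expression a* | b*

for s in ["", "aaa", "bbb", "aaabbb"]:
    print(repr(s), bool(union.fullmatch(s)))
# '' True, 'aaa' True, 'bbb' True, 'aaabbb' False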
This probably still belongs on Math Overflow, and you haven't really supplied enough context for a definitive answer, so I'm going to make some assumptions.
Union types in most languages might give you an expression like:
type C = B union A
So type C is the type/set of all values that might exist in the types A or B. So a value x of type B is also a value of type C.
And this is indeed the case for many languages. However, Stack Overflow targets the more concrete world of programming; Math Overflow will have more theorists who will be better able to answer your question.
