Left recursion elimination issue

So I have this left-recursive grammar:
E → E Op1 E2 | E2
As it stands, it is left-recursive, so I tried to eliminate the left recursion by introducing another step:
E → X E2
X → E Op1 E2 | ε
I have a sinking feeling, however, that I eliminated it wrongly, because if I trace it, the FIRST set of E still starts with E. Am I correct? Or am I missing something? This question is part of a bigger grammar set, FYI.

What you're missing is the rest of the recursion-elimination recipe: the non-recursive alternative E2 becomes the prefix of E, and the new nonterminal recurses on the right. Instead of
E → X E2
X → E Op1 E2 | ε
you need
E → E2 X
X → Op1 E2 X | ε
Now X starts with Op1, so tracing FIRST(E) no longer loops back into E.
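For reference, this is an instance of the standard rule for eliminating immediate left recursion (a general sketch; α and β stand for the fragments of your rule): a production of the form
A → A α | β
is replaced by
A → β A'
A' → α A' | ε
Both versions generate β followed by zero or more copies of α; here α = Op1 E2 and β = E2.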

Simple Cardinality Proof

So I'm trying to perform a simple proof using cardinalities. It looks like:
⟦(A::nat set) ∩ B = {}⟧ ⟹ (card (A ∪ B) = card A + card B)
This seems to make sense, but for some reason blast hangs, the rest of the provers fail to apply, and sledgehammer times out. Is there a gap in what I think I know about cardinalities? If not, how can I prove this lemma?
Thanks in advance!
I believe that the lemma you are trying to prove does not appropriately consider the case of infinite sets.
In Isabelle/HOL, the cardinality of an infinite set is represented by zero, as we can see from the following lemma:
lemma "¬(finite A) ⟹ card A = 0"
by simp
If we consider an infinite set A and a one-element set B, and assume the intersection A ∩ B is the empty set, we are left with:
card (A ∪ B) = 0, as their union will also be infinite
card A = 0
card B = 1
So we can see that in this case the lemma does not hold.
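Concretely (a hypothetical instantiation of the above): take A = {n::nat. n > 0}, which is infinite, and B = {0}, so that A ∩ B = {}. Then A ∪ B = UNIV is infinite as well, giving
card (A ∪ B) = 0 ≠ 1 = 0 + 1 = card A + card B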
The lemma can be corrected by asserting both sets are finite:
lemma
"⟦finite A; finite B; ((A::nat set) ∩ B) = {}⟧ ⟹ (card (A ∪ B) = card A + card B)"
by (simp add: card_Un_disjoint)
This is essentially the same as the lemma card_Un_disjoint used by the proof:
lemma card_Un_disjoint: "finite A ⟹ finite B ⟹ A ∩ B = {} ⟹ card (A ∪ B) = card A + card B"
using card_Un_Int [of A B] by simp

Proving that a language is part of a grammar and vice versa

So here are a grammar R and a language L, and I want to prove that R generates L:
R = {S → abS | ε}, L = {(ab)^n | n ≥ 0}
So I thought I would prove that L(R) ⊆ L and L(R) ⊇ L both hold.
For L(R) ⊆ L: I show by induction on the number i of derivation steps that after every derivation step u → w, in which w results from u according to the rules of R, either w = v1v2 or w = v1v2w, with |v2| = |v1|, v1 ∈ {a}* and v2 ∈ {b}*.
And for the base case: at i = 0 it follows that w is ε, and at i = 1 w is in {ε, abS}.
Is that right so far?
So here are a grammar R and a language L, and I want to prove that R generates L.
Probably what you want to do is show that the language L(R) of some grammar R is the same as some other language L specified another way (in your case, set-builder notation with a regular expression).
So I thought I would prove that L(R) ⊆ L and L(R) ⊇ L both hold.
Given the above assumption, you are correct in thinking this is the right way to proceed with the proof.
For L(R) ⊆ L: I show by induction on the number i of derivation steps that after every derivation step u → w, in which w results from u according to the rules of R, either w = v1v2 or w = v1v2w, with |v2| = |v1|, v1 ∈ {a}* and v2 ∈ {b}*. And for the base case: at i = 0 it follows that w is ε, and at i = 1 w is in {ε, abS}.
This is hard for me to follow. That's not to say it's wrong. Let me write it down in my own words and perhaps you or others can judge whether we are saying the same thing.
We want to show that L(R) is a subset of L; that is, any string generated by the grammar R is contained in the language L. We can prove this by mathematical induction on the number of steps in the derivation of strings generated by the grammar.
Base case: with one derivation step, S → ε produces the empty word, which is a string in the language L (choose n = 0).
Induction hypothesis: assume that all strings derived from the grammar in up to and including k steps are also in L.
Induction step: we must prove that any string derived in k+1 steps from the grammar is also in L. Let w be any string derived from the grammar in k+1 steps. From the grammar it is clear that the derivation of w must be S → abS → ababS → ... → abab...abS → abab...ab. But this derivation is the same as the derivation of some string w' in k steps, except for one extra application of S → abS before the application of S → ε. By the induction hypothesis, w' is of the form (ab)^m for some m at least zero, and the extra application of S → abS adds ab. Because (ab)^m(ab) = (ab)^(m+1), we can choose n = m+1. So all strings derived from the grammar in k+1 steps are also in the language, as required.
To prove that all strings in the language can be derived in the grammar, consider the following construction: to derive the string (ab)^n in the grammar, apply the production S → abS exactly n times, and then the production S → ε once. The first n steps give the intermediate form (ab)^n S and the final step gives the terminal string (ab)^n.
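If you want to sanity-check the statement before writing the proof, here is a small Python sketch (purely illustrative, not part of the proof) that enumerates every terminal string R derives within a bounded number of rule applications and compares the result against L = {(ab)^n | n ≥ 0} up to that bound:

import re

# Illustrative sketch: enumerate the sentential forms of R = {S → abS | ε}
# breadth-first, collecting the fully derived (S-free) strings.
def derived_strings(max_steps):
    forms = {"S"}          # sentential forms reachable so far
    terminal = set()       # terminal strings derived so far
    for _ in range(max_steps):
        next_forms = set()
        for u in forms:
            if "S" in u:
                next_forms.add(u.replace("S", "abS", 1))  # apply S → abS
                next_forms.add(u.replace("S", "", 1))     # apply S → ε
        terminal |= {u for u in next_forms if "S" not in u}
        forms = next_forms
    return terminal

words = derived_strings(12)
assert all(re.fullmatch(r"(ab)*", w) for w in words)   # L(R) ⊆ L, up to the bound
assert {"ab" * n for n in range(6)} <= words           # L ⊆ L(R), up to the bound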

First sets of LL(1) parser

I have some problems understanding the following rules applied for first sets of LL(1) parser:
b) Else X1 is a nonterminal, so add First(X1) - ε to First(u).  
a. If X1 is a nullable nonterminal, i.e., X1 =>* ε,  add First(X2) - ε to First(u). 
Furthermore, if X2 can also go to ε, then add First(X3) - ε and so on, through all Xn until the first non-nullable symbol is encountered.
b. If X1X2...Xn =>* ε, add ε to the first set.
How come at b), if X1 is a nonterminal, ε can't be added to First(u)? So if I have
S-> A / a
A-> b / ε
F(A) = {b,ε}
F(S) = {b,ε,a}
is that not correct? Also, the little sub-points a and b are confusing.
All it says is which terminals you can expect at the start of a sentential form when you replace S by AB in a leftmost derivation. So, if A derives ε, then in a leftmost derivation you can replace A by ε; now you depend upon B, and so on. Consider this sample grammar:
S -> AB
A -> ε
B -> h
So, if there is a string with just one character/terminal "h" and you start verifying whether this string is valid by checking if there is any leftmost derivation deriving the string using the above grammar, then you can safely replace S by AB because A will derive ε and B will derive h.
Therefore, the language recognized by the above grammar cannot contain the empty string ε. For ε to be in the language, B should also derive ε. If both the non-terminals A and B derive ε, then S derives ε as well.
That is, if there is some production S->ABCD, then only if all of the non-terminals A, B, C and D derive ε can S also derive ε, and only then will ε be in FIRST(S).
The FIRST sets given by you are correct. I think you are confused because the production S->A has only one symbol, the nonterminal A, on its rhs, and this A derives ε. If you applied b) naively, you would get FIRST(S) = (FIRST(A) - ε) ∪ {a} = {b, a}, which is incorrect. Since the rhs consists of this single nullable nonterminal, there is the derivation S -> A -> ε, which means FIRST(S) must contain ε, i.e., S can derive the null string ε.
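To see the rules in action, here is a small Python sketch (illustrative only; the encoding is an assumption: a grammar is a dict from a nonterminal to its alternatives, symbols are single characters, and "ε" marks the empty production) that computes FIRST sets by iterating to a fixed point, reproducing F(A) = {b, ε} and F(S) = {b, ε, a} for your grammar:

EPS = "ε"

grammar = {
    "S": ["A", "a"],   # S -> A | a
    "A": ["b", EPS],   # A -> b | ε
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                if rhs == EPS:
                    first[nt].add(EPS)                    # X -> ε puts ε in First(X)
                else:
                    all_nullable = True
                    for sym in rhs:
                        if sym in grammar:                 # nonterminal: rule b)
                            first[nt] |= first[sym] - {EPS}
                            if EPS not in first[sym]:
                                all_nullable = False
                                break
                        else:                              # terminal: add it and stop
                            first[nt].add(sym)
                            all_nullable = False
                            break
                    if all_nullable:                       # X1...Xn =>* ε, so add ε
                        first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

print(first_sets(grammar))   # {'S': {'a', 'b', 'ε'}, 'A': {'b', 'ε'}}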

How does one read the syntax for the Braun tree insertion?

In the section on insertion into Braun trees of the Verified Programming in Agda book (page 118), the author explains what the code is supposed to be doing, but leaving what it does aside, a significant omission in the book so far is that it never explains the strange syntax used in function pattern matching for theorem proving.
I understand that the with pattern can be further destructured by using |, and I understand that when using rewrite, | can also be used to separate the different rewrites, but together this makes things confusing.
As far as I can tell, rewrite is definitely not a function. And then comes the following:
bt-insert a (bt-node{n}{m} a' l r p)
rewrite +comm n m with p | if a <A a' then (a , a') else (a' , a)
bt-insert a (bt-node{n}{m} a' l r _) | inj₁ p | (a1 , a2)
rewrite p = (bt-node a1 (bt-insert a2 r) l (inj₂ refl))
bt-insert a (bt-node{n}{m} a' l r _) | inj₂ p | (a1 , a2) =
(bt-node a1 (bt-insert a2 r) l (inj₁ (sym p)))
I am really confused as to how rewrite +comm n m with p | if a <A a' then (a , a') else (a' , a) should be parsed mentally. And how does one read | inj₁ p | (a1 , a2) rewrite p? Also, while testing the previous examples I've discovered that for some reason the order of the rewrites does not matter. Why is that?
If you ignore the proofs for a sec, this function can be simplified as
bt-insert : ∀ {n : ℕ} → A → braun-tree n → braun-tree (suc n)
bt-insert a (bt-node {n} {m} a' l r _) = bt-node a1 (bt-insert a2 r) l _
where
(a1 , a2) = if a <A a' then (a , a') else (a' , a)
So (a1 , a2) is just (min a a' , max a a'), i.e. (a , a') sorted.
All the other code is there to maintain the proofs of the invariants:
We rewrite +comm n m so that we can return a braun-tree (2 + (m + n)) even though the return type requires a braun-tree (2 + (n + m)).
p is used to prove that the resulting tree is still balanced: p proves that n ≡ m ∨ n ≡ suc m, so it's either inj₁ (p : n ≡ m) or inj₂ (p : n ≡ suc m). We use the proof of either property to compute the proof of suc m ≡ n ∨ suc m ≡ suc n (remember we flipped n and m via the proof of commutativity).
After pondering it for a bit, I realized that if I line the expressions up like this:
p | if a <A a' then (a , a') else (a' , a)
inj₁ p | (a1 , a2)
then it makes sense visually: in bt-insert's second case the rewrite comes before the if expression, and in the third case it comes after the destructuring of the if's result.
Well, that leaves figuring out what the rest of the function is doing.

First-order logic formula

If I want to express in first-order logic that 'the element(s) in the set with the smallest radius have the value 0', would the following be right?
∀ e1 ∈ S. ∀ e2 ∈ S. Radius e1 ≤ Radius e2 ⇒ Value e1 = 0?
Are the variables correctly quantified?
Thanks
Just to clarify with parentheses, what you wrote is usually taken to mean:
∀ e1 ∈ S. (∀ e2 ∈ S. (Radius e1 ≤ Radius e2 ⇒ Value e1 = 0))
This statement asserts that the value of every element is 0. Here's how: pick an arbitrary e1, now pick e2 = e1, and we have Radius e1 ≤ Radius e1 ⇒ Value e1 = 0. Since the antecedent (the part before the ⇒) is true, we have Value e1 = 0. And since we made no assumptions about e1, we have ∀ e ∈ S. Value e = 0.
The problem is that your parentheses are off.
∀ e1 ∈ S. (∀ e2 ∈ S. Radius e1 ≤ Radius e2) ⇒ Value e1 = 0
In order for the antecedent to be true now, e1's radius has to be less than or equal to every (as opposed to any) other radius, which seems like what you intended.
I think you want an exists:
∃ e1. (∀ e2. radius(e1) ≤ radius(e2)) ∧ (value(e1) = 0)
I'm not sure about the precedence in the formula, but now that I think I understand the question, maybe you want (where M is the minimality condition radius(e1) ≤ radius(e2)):
∀ e1. ((∀ e2. M) ⇒ value e1 = 0)
I think your previous formula may be wrong for the following reason. Suppose you have elements with radii {0, 1, 2} and values equal to the radii. Then you will have a case where 1 ≤ 2 but the value is not zero. If I'm interpreting your original formula correctly as
∀ e1. ∀ e2. P(e1, e2)
then this counterexample provides a case where P is false, so the entire formula fails (even though the example should satisfy the intended statement).
What you wrote is also true if there is no element with the smallest radius. If this is desired, you are correct; if not, you need to add a clause to that effect:
(∀ e1 ∈ S. (∀ e2 ∈ S. Radius e1 ≤ Radius e2) ⇒ Value e1 = 0) ∧ (∃ e1 ∈ S. ∀ e2 ∈ S. Radius e1 ≤ Radius e2)
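As a quick sanity check, here is a small Python sketch (purely illustrative; the set S and the radius/value accessors are made up for the example) that evaluates both parenthesizations over the counterexample above, radii {0, 1, 2} with values equal to the radii:

# Hypothetical model: elements are (radius, value) pairs, values equal radii.
S = [(0, 0), (1, 1), (2, 2)]

def radius(e): return e[0]
def value(e): return e[1]

# Original reading: ∀ e1. ∀ e2. (Radius e1 ≤ Radius e2 ⇒ Value e1 = 0)
original = all(not (radius(e1) <= radius(e2)) or value(e1) == 0
               for e1 in S for e2 in S)

# Intended reading: ∀ e1. (∀ e2. Radius e1 ≤ Radius e2) ⇒ Value e1 = 0
intended = all(not all(radius(e1) <= radius(e2) for e2 in S) or value(e1) == 0
               for e1 in S)

print(original)  # False: e1 = (1, 1), e2 = (2, 2) makes the antecedent true but Value e1 ≠ 0
print(intended)  # True: only e1 = (0, 0) has the minimal radius, and its value is 0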
