Compilation - LL1 Grammar - syntax

I am studying the magic of compilers and I don't understand a result.
Here is the grammar :
S -> A #
A -> B G D E
B -> + | - | EPSILON
C -> c C | EPSILON
G -> c C
D -> . C | EPSILON
E -> e B G | EPSILON
When I try to find the "first" and "follow" sets, I get different results than the one I get when I do it with an online predictor.
Here are the results given:
Non-terminal Symbol / Follow Set
S $
A #
B c
C e, ., #
G ., #
D e, #
E #
Why isn't the follow set of G {e, ., #} ?
Because what I understand is that according to the A rule, D follow the G, so we add ., but it could also have been EPSILON, so we move to the E and it can be a e, but it could also have been EPSILON, so we move to the #, in respect with the S rule.
What am I missing here ?
I used the tool at http://hackingoff.com/compilers/predict-first-follow-set

Your computation of the FOLLOW set of G is correct.
The hackingoff tool is buggy. Here is a shorter grammar which exhibits the same error:
S -> a B C a
B -> b
C -> EPSILON
It's obvious that a is in the FOLLOW set for B but the tool reports that set as empty.

Related

Cleaner way to represent languages accepted by DFAs?

I am given 2 DFAs. * denotes final states and -> denotes the initial state, defined over the alphabet {a, b}.
1) ->A with a goes to A. -> A with b goes to *B. *B with a goes to *B. *B with b goes to ->A.
The regular expression for this is clearly:
E = a* b(a* + (a* ba* ba*)*)
And the language that it accepts is L1= {w over {a,b} | w is b preceeded by any number of a's followed by any number of a's or w is b preceeded by any number of a's followed by any number of bb with any number of a's in middle of(middle of bb), end or beginning.}
2) ->* A with b goes to ->* A. ->*A with a goes to *B. B with b goes to -> A. *B with a goes to C. C with a goes to C. C with b goes to C.
Note: A is both final and initial state. B is final state.
Now the regular expression that I get for this is:
E = b* ((ab) * + a(b b* a)*)
Finally the language that this DFA accepts is:
L2 = {w over {a, b} | w is n 1's followed by either k 01's or a followed by m 11^r0' s where n,km,r >= 0}
Now the question is, is there a cleaner way to represent the languages L1 and L2 because it does seem ugly. Thanks in advance.
E = a* b(a* + (a* ba* ba*)*)
= a*ba* + a*b(a* ba* ba*)*
= a*ba* + a*b(a*ba*ba*)*a*
= a*b(a*ba*ba*)*a*
= a*b(a*ba*b)*a*
This is the language of all strings of a and b containing an odd number of bs. This might be most compactly denoted symbolically as {w in {a,b}* | #b(w) = 1 (mod 2)}.
For the second one: the only way to get to state B is to see an a in A, and the only way to get to C from outside C is to see an a in B. C is a dead state and the only way to get to it is to see aa starting in A. That is: if you ever see two as in a row, the string is not in the language; the language is the set of all strings over a and b not containing the substring aa. This might be most compactly denoted symbolically as {(a+b)*aa(a+b)*}^c where ^c means "complement".

Linear ordering of directed multigraph of dependencies allowing for duplicates

Problem description
Given vertices V which can be seen as named "propositions".
Given weights:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| Invalidates -- ^ Denotes that a "proposition" invalidates another.
In a linear ordering, if A requires B, then B must come before A, conversely, if A invalidates B, then B must come after A.
Given a weighted directed multigraph (multidigraph) with at most 2 parallel edges... Where a vertex can only require the inclusion of another vertex once, and only invalidates another vertex once...
G = (V, E)
E = (V, V, W)
Or alternatively represented as a directed cyclic graph with no self-loops and where the only cycles form directly between one vertex and another. With weights changed to:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| InvalidatedBy -- ^ Denotes that a "proposition" is invalidated by another.
Given that vertices may occur more than once in the ordering...
How can a linear ordering be constructed from such a graph?
Additionally, if the tail of the linear ordering ends with a vertex V which was included due to being InvalidatedBy another vertex, then it may be omitted if the head of the ordering starts with V.
Some desired properties are:
Minimality - there should be as little duplication of vertices as possible
Stability - the ordering should be as similar as possible to the order between vertices on the same "level" in which the graph was constructed
Run-time complexity - The number of vertices are not that high, but still... the run-time complexity should be as low as possible.
If various algorithms fulfill these to varying degrees, I'd love to see all of them with their trade offs.
Algorithms written in any language, or pseudocode, are welcome.
Example graphs:
Example graph 1:
B `requires` A
C `requires` A
D `requires` A
E `invalidates` A
F `invalidates` A
G `invalidates` A
With minimal linear ordering: [A, B, C, D, E, F, G]
Example graph 2:
C `requires` A
C `invalidates` A
B `requires` A
With minimal linear ordering: [A, B, C]
Example graph 3:
B `requires` A
B `invalidates` A
C `requires` A
C `invalidates` A
With minimal linear ordering: [A, B, A, C]
Naive implementation
A naive implementation constructs a linear ordering by starting with all nodes with no incoming edges and for all of those nodes:
fetches all outgoing edges
partitions those by requires/invalidates
constructs the linear ordering of "requires" and puts that first
adds the current node
constructs the linear ordering of "invalidates" and adds that.
Here's a Haskell implementation of this description:
import Data.List (partition)
import Data.Maybe (fromJust)
import Control.Arrow ((***))
import Data.Graph.Inductive.Graph
fboth :: Functor f => (a -> b) -> (f a, f a) -> (f b, f b)
fboth f = fmap f *** fmap f
outs :: Graph gr => gr a b -> Node -> (Adj b, a)
outs gr n = let (_, _, l, o) = fromJust $ fst $ match n gr in (o, l)
starts :: Graph gr => gr a b -> [(Adj b, a)]
starts gr = filter (not . null . fst) $ outs gr <$> nodes gr
partW :: Adj W -> (Adj W, Adj W)
partW = partition ((Requires ==) . fst)
linearize :: Graph gr => gr a W -> [a]
linearize gr = concat $ linearize' gr <$> starts gr
linearize' :: Graph gr => gr a W -> (Adj W, a) -> [a]
linearize' gr (o, a) = concat req ++ [a] ++ concat inv
where (req, inv) = fboth (linearize' gr . outs gr . snd) $ partW o
The ordering can then be optimized by removing equal consecutive like so:
-- | Remove consecutive elements which are equal to a previous element.
-- Runtime complexity: O(n), space: O(1)
removeConsequtiveEq :: Eq a => [a] -> [a]
removeConsequtiveEq = \case
[] -> []
[x] -> [x]
(h:t) -> h : ug h t
where
ug e = \case
[] -> []
(x:xs) | e == x -> ug x xs
(x:xs) | otherwise -> x : ug x xs
Edit: Using DCG, SCC, and topsort
With the algorithm described by #Cirdec :
Given a directed cyclic graph (DCG) where edges of form: (f, t) denote that f must come before t in the ordering.
Compute the condensation of the DCG in 1.
Turn each SSC in the condensation in 2. into a palindrome.
Compute the topsort of the graph in 3.
Concatenate the computed ordering.
In Haskell:
{-# LANGUAGE LambdaCase #-}
import Data.List (nub)
import Data.Maybe (fromJust)
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.NodeMap
import Data.Graph.Inductive.Query.DFS
data MkEdge = MkEdge Bool Int Int
req = MkEdge True
inv = MkEdge False
toGraph :: [MkEdge] -> [(Int, Int, Bool)] -> Gr Int Bool
toGraph edges es = run_ empty nm
where ns = nub $ edges >>= \(MkEdge _ f t) -> [f, t]
nm = insMapNodesM ns >> insMapEdgesM es
-- | Make graph into a directed cyclic graph (DCG).
-- "Requires" denotes a forward edge.
-- "Invalidates" denotes a backward edge.
toDCG :: [MkEdge] -> Gr Int Bool
toDCG edges = toGraph edges $
(\(MkEdge w f t) -> if w then (t, f, w) else (f, t, w)) <$> edges
-- | Make a palindrome of the given list by computing: [1 .. n] ++ [n - 1 .. 1].
-- Runtime complexity: O(n).
palindrome :: [a] -> [a]
palindrome = \case
[] -> []
xs -> xs ++ tail (reverse xs)
linearize :: Gr Int a -> [Int]
linearize dcg = concat $ topsort' scc2
where scc = nmap (fmap (fromJust . lab dcg)) $ condensation dcg
scc2 = nmap palindrome scc
For the graph g2:
g2 = [ 2 `req` 1
, 2 `inv` 1
, 3 `req` 1
, 3 `inv` 1
, 4 `req` 1
, 5 `inv` 1
]
> prettyPrint $ toDCG g2
1:2->[(False,2)]
2:1->[(True,1),(True,3),(True,4)]
3:3->[(False,2)]
4:4->[]
5:5->[(False,2)]
> prettyPrint $ condensation $ toDCG g2
1:[5]->[((),2)]
2:[1,2,3]->[((),3)]
3:[4]->[]
> linearize $ toDCG g2
[5,2,1,3,1,2,4]
This ordering is neither minimal nor valid since the ordering violates the dependencies. 5 invalidates 1, which 2 depends on. 2 invalidates 1 which 4 depends on.
A valid and minimal ordering is: [1,4,2,1,3,5]. By shifting the list to the right, we get [5,1,4,2,1,3] which is also a valid ordering.
If the direction of the graph is flipped, the ordering becomes: [4,2,1,3,1,2,5]. This is not a valid ordering either... At the boundaries, 5 can happen, and then 4, but 5 invalidates 1 which 4 depends on.
I believe the following algorithm will find a minimal string of vertices in linear time:
Decompose the graph into its strongly connected components. Existing algorithms do this in linear time.
In each strongly connected component each node needs to be listed both before and after every other node. List the nodes [1..n] of each strongly connected component in the following order [1..n] ++ [n-1..1]
Concatenate the strongly connected components together in order by a topological sort. Existing algorithms topologically sort directed acylic graphs like this in linear time.

This LL(1) parse table is correct?

Given grammar:
S -> AB
A -> aA | b
B -> CA
C -> cC | ɛ
Is its LL(1) parsing table is this?
No, it is not entirely correct because of these calculations:
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = First(C) = {c,ε}
First(C) = {c,ε}
Considering that the Follow of each non-terminal symbol is the terminal symbol right after:
Follow(S) ={a,b} (if SAB --> AB then SaAB --> aAB or SbB --> bB)
Follow(A) = {a,c} (if AaA-->aA and Ab --> b then AaA --> aA or Ab --> b)
Follow(B) = Follow (A) = {a,c} (model production A --> aB, which a terminal, and a = ε, then Follow (A) = Follow (B))
Follow(C) = {a,b} (from B-->CA, B-->CaA or B-->Cb)
So the the difference with your parse table, and these calculations, is that in non-terminal B row in columns a and b the values are NULL.
Yes it is correct.
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = {a,b,c}
B->CA and C->cC|ɛ
First(C) = {c,ε}
so if we put ɛ as a replacement of C in B -> CA, we'll have B -> A, Thus First(B)= First(A) instead of ɛ.

Find the closures of a set of attributes given a relation and FDs

I've the following relation:
R = BCDEFGHI
and the following FDs
C -> D
E -> D
EF -> G
EG -> F
FG -> E
FH -> C
H -> B
I'm asked to find the closure of the following set of attributes:
BC
BDEFG
CEFG
EFG
EFGH
My attempts
Let BC+ = BC.
Using FD C -> D, we have DC+ = BCD, and we're done.
Let BDEFG+ = BDEFG.
We're done.
Let CEFG+ = CEFG.
Using FD C -> D, then CEFG+ = CEFGD, and we're done.
Let EFG+ = EFG.
Using FD E -> D, then EFG+ = EFGD, and we're done.
Let EFGH+ = EFGH.
Using FD E -> D, then EFGH+ = EFGHD.
Using FD FH -> C, then EFGH+ = EFGHDC
Using FD H -> B, then EFGH+ = EFGHDCB, and we're done.
Since I'm very new to these topics, I'm not sure if what I've done is correct or not. I would appreciate some feedback from you! Thanks!
Looks ok. (Assuming that you properly did the steps that you didn't mention, ie decisions re dealing with FDs that you didn't mention and re stopping.)
(Don't say that a closure is equal to something when it isn't. Use some name for the algorithm's accumulator like "1. Let F = BC. Using ... then let F = BCE; Done so BC+ = F = BCE". Or write something like "1. Finding BC+: Using ... then BC+ >= BCE; Done so BC+ = BCE".)

Resolving PREDICT/PREDICT conflicts in LL(1)

I'm working on a simple LL(1) parser generator, and I've run into an issue with PREDICT/PREDICT conflicts given certain input grammars. For example, given an input grammar like:
E → E + E
| P
P → 1
I can remove out the left recursion from E, replacing it with a roughly equivalent right recursive rule, thus arriving at the grammar:
E → P E'
E' → + E E'
| ε
P → 1
Next, I can compute the relevant FIRST and FOLLOW sets for the grammar, and end up with the following:
FIRST(E) = { 1 }
FIRST(E') = { +, ε }
FIRST(P) = { 1 }
FOLLOW(E) = { +, EOF }
FOLLOW(E') = { +, EOF }
FOLLOW(P) = { +, EOF }
And finally, using PREDICT(A → α) = { FIRST(α) - ε } ∪ (FOLLOW(A) if ε ∈ FIRST(α) else ∅) to construct the PREDICT sets for the grammar, the resulting sets are as follows.
PREDICT(1. E → P E') = { 1 }
PREDICT(2. E' → + E E') = { +, EOF }
PREDICT(3. E' → ε) = { +, EOF }
PREDICT(4. P → 1) = { 1 }
So this is where I run into the conflict that PREDICT(2) = PREDICT(3), and thus, I cannot produce a parse table as the grammar is not LL(1), since parser wouldn't be able to choose which rule should be applied.
What I'm really wondering is whether it's possible to resolve the conflict or factor the grammar such that the conflict can be avoided, and produce a legal LL(1) grammar, without having to directly modify the original input grammar.
The problem here is that your original grammar is ambiguous.
E → E + E
E → P
means that P + P + P can be parsed either as (P + P) + P or P + (P + P). Eliminating left recursion doesn't fix the ambiguity, so the modified grammar is also ambiguous. And ambiguous grammars can't be LL(k) (or, for that matter, LR(k)).
So you need to make the grammar unambiguous:
E → E + P
E → P
(That's the common left-associative version.) Once you eliminate left recursion, you end up with:
E → P E'
E' → + P E'
| ε
Now + is not in FOLLOW(E').
(The example is drawn straight from the Dragon book, but simplified; it's example 4.8 in the rather battered old copy I have.)
It's worth noting that the transformation used here preserves the set of strings derived by the grammar, but not the derivation. The parse tree which results from the modified grammar is effectively right-associative, so it will need to be reprocessed to recover the desired parse. This fact is rather briefly mentioned by the Dragon book authors:
Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. (My emphasis)
They go on to suggest that operator precedence parsing can be used for expressions, and then mention that if an LR parser generator is available, dividing the grammar into a predictive part and an operator-precedence part is no longer necessary.

Resources