Is this LL(1) parse table correct? - compilation

Given grammar:
S -> AB
A -> aA | b
B -> CA
C -> cC | ɛ
Is this its LL(1) parsing table?

No, it is not entirely correct because of these calculations:
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = First(C) = {c,ε}
First(C) = {c,ε}
Considering that the Follow of each non-terminal symbol is the terminal symbol right after it:
Follow(S) = {a,b} (if S → AB, then S ⇒ aAB or S ⇒ bB)
Follow(A) = {a,c} (from the productions A → aA and A → b)
Follow(B) = Follow(A) = {a,c} (model a production A → aB, where a is a terminal; if a = ε, then Follow(A) = Follow(B))
Follow(C) = {a,b} (from B → CA: B ⇒ CaA or B ⇒ Cb)
So the difference between your parse table and these calculations is that in the row for non-terminal B, the entries in columns a and b are NULL.

Yes, it is correct.
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = {a,b,c}
B->CA and C->cC|ɛ
First(C) = {c,ε}
So if we substitute ɛ for C in B -> CA, we get B -> A; thus First(B) includes First(A) rather than ε.
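To double-check, here is a small Python sketch (my addition, not part of either answer) that computes the FIRST sets of this grammar by fixpoint iteration; the string 'EPS' stands for ɛ:

EPS = 'EPS'
grammar = {
    'S': [['A', 'B']],
    'A': [['a', 'A'], ['b']],
    'B': [['C', 'A']],
    'C': [['c', 'C'], [EPS]],
}

first = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            acc, all_nullable = set(), True
            for sym in prod:
                # FIRST of a terminal is the terminal itself
                syms = first[sym] if sym in grammar else {sym}
                acc |= syms - {EPS}
                if EPS not in syms:
                    all_nullable = False
                    break
            if all_nullable:
                acc.add(EPS)
            if not acc <= first[nt]:
                first[nt] |= acc
                changed = True

print(sorted(first['B']))  # ['a', 'b', 'c'] -- epsilon is not in First(B)

Running it confirms the second answer: First(B) = {a, b, c}.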

Related

Determine if relation is in BCNF form?

How do I determine if the following relation is in BCNF form?
R(U,V,W,X,Y,Z)
UVW -> X
VW -> YU
VWY -> Z
I understand that for each functional dependency A -> B, A must be a superkey, and that the relation must be in 3NF. But I am unsure how to apply the concepts.
To determine if a relation is in BCNF, by the definition you should check that, for each non-trivial dependency in F+ (that is, for all the dependencies specified in F and those derived from them), the determinant is a superkey. Fortunately, there is a theorem saying that it is sufficient to perform this check only for the specified dependencies.
In your case this means that you must check if UVW, VW and VWY are superkeys.
And to see whether, in a dependency X -> Y, the set of attributes X is a superkey, you can compute the closure X+ of those attributes and check whether it contains all the attributes of the relation.
So you have to compute UVW+ and see if it contains {U,V,W,X,Y,Z} and similarly for the other two dependencies. I leave to you this simple exercise.
Using the standard algorithm to compute the closure of a given set of attributes, together with the definition of BCNF, we can implement this in Python: first compute the closure of a set of attributes, then determine whether that set forms a superkey, as shown in the following code snippet:
def closure(s, fds):
    c = s
    for f in fds:
        l, r = f[0], f[1]
        if l.issubset(c):
            c = c.union(r)
    if s != c:
        c = closure(c, fds)
    return c
def is_superkey(s, rel, fds):
    c = closure(s, fds)
    print(f'({"".join(sorted(s))})+ = {"".join(sorted(c))}')
    return c == rel
Now check, for each given FD A -> B of relation R, whether A is a superkey, to determine whether R is in BCNF:
def is_in_BCNF(rel, fds):
    for fd in fds:
        l, r = fd[0], fd[1]
        isk = is_superkey(l, rel, fds)
        print(f'For the Functional Dependency {"".join(sorted(l))} -> {"".join(sorted(r))}, ' +
              f'{"".join(sorted(l))} {"is" if isk else "is not"} a superkey')
        if not isk:
            print('=> R not in BCNF!')
            return False
    print('=> R in BCNF!')
    return True
To parse the given FDs from their standard textual form into a suitable data structure, we can use the following function:
import re

def process_fds(fds):
    pfds = []
    for fd in fds:
        fd = re.sub(r'\s+', '', fd)
        l, r = fd.split('->')
        pfds.append([set(l), set(r)])
    return pfds
Now, let's test with a couple of relations:
relation = {'U','V','W','X','Y','Z'}
fds = process_fds(['UVW->X', 'VW->YU', 'VWY->Z'])
is_in_BCNF(relation, fds)
# (UVW)+ = UVWXYZ
# For the Functional Dependency UVW -> X, UVW is a superkey
# (VW)+ = UVWXYZ
# For the Functional Dependency VW -> UY, VW is a superkey
# (VWY)+ = UVWXYZ
# For the Functional Dependency VWY -> Z, VWY is a superkey
# => R in BCNF!
relation = {'A','B','C'}
fds = process_fds(['A -> BC', 'B -> A'])
is_in_BCNF(relation, fds)
# (A)+ = ABC
# For the Functional Dependency A -> BC, A is a superkey
# (B)+ = ABC
# For the Functional Dependency B -> A, B is a superkey
# => R in BCNF!

Cleaner way to represent languages accepted by DFAs?

I am given 2 DFAs. * denotes final states and -> denotes the initial state, defined over the alphabet {a, b}.
1) ->A with a goes to A. ->A with b goes to *B. *B with a goes to *B. *B with b goes to ->A.
The regular expression for this is clearly:
E = a* b(a* + (a* ba* ba*)*)
And the language that it accepts is L1 = {w over {a,b} | w is a b preceded by any number of a's and followed by any number of a's, or w is a b preceded by any number of a's and followed by any number of pairs bb, with any number of a's in the middle (between the two b's), at the end, or at the beginning}.
2) ->*A with b goes to ->*A. ->*A with a goes to *B. *B with b goes to ->A. *B with a goes to C. C with a goes to C. C with b goes to C.
Note: A is both final and initial state. B is final state.
Now the regular expression that I get for this is:
E = b*((ab)* + a(bb*a)*)
Finally the language that this DFA accepts is:
L2 = {w over {a, b} | w is n 1's followed by either k 01's, or by a followed by m 11^r 0's, where n, k, m, r >= 0}
Now the question is: is there a cleaner way to represent the languages L1 and L2? These descriptions do seem ugly. Thanks in advance.
E = a* b(a* + (a* ba* ba*)*)
= a*ba* + a*b(a* ba* ba*)*
= a*ba* + a*b(a*ba*ba*)*a*
= a*b(a*ba*ba*)*a*
= a*b(a*ba*b)*a*
This is the language of all strings of a and b containing an odd number of bs. This might be most compactly denoted symbolically as {w in {a,b}* | #b(w) = 1 (mod 2)}.
For the second one: the only way to get to state B is to see an a in A, and the only way to get to C from outside C is to see an a in B. C is a dead state, and the only way to reach it is to see aa starting in A. That is: if you ever see two a's in a row, the string is not in the language; the language is the set of all strings over a and b not containing the substring aa. This might be most compactly denoted symbolically as {(a+b)*aa(a+b)*}^c, where ^c means "complement".
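As a sanity check (my addition, not part of the answer), both claims can be verified by brute force in Python, simulating the DFAs as described and comparing against the two characterizations on all short strings:

from itertools import product

def dfa1_accepts(w):
    # DFA 1: A --a--> A, A --b--> B, B --a--> B, B --b--> A; B accepting
    state = 'A'
    for ch in w:
        if state == 'A':
            state = 'A' if ch == 'a' else 'B'
        else:
            state = 'B' if ch == 'a' else 'A'
    return state == 'B'

def dfa2_accepts(w):
    # DFA 2: A --b--> A, A --a--> B, B --b--> A, B --a--> C; C is a trap; A and B accepting
    state = 'A'
    for ch in w:
        if state == 'A':
            state = 'B' if ch == 'a' else 'A'
        elif state == 'B':
            state = 'C' if ch == 'a' else 'A'
        # state 'C' stays 'C'
    return state in ('A', 'B')

for n in range(8):
    for w in map(''.join, product('ab', repeat=n)):
        assert dfa1_accepts(w) == (w.count('b') % 2 == 1)
        assert dfa2_accepts(w) == ('aa' not in w)
print('both characterizations agree on all strings up to length 7')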

Compilation - LL1 Grammar

I am studying the magic of compilers and I don't understand a result.
Here is the grammar :
S -> A #
A -> B G D E
B -> + | - | EPSILON
C -> c C | EPSILON
G -> c C
D -> . C | EPSILON
E -> e B G | EPSILON
When I try to find the "first" and "follow" sets, I get results different from the ones given by an online tool.
Here are the results given:
Non-terminal symbol    Follow set
S                      $
A                      #
B                      c
C                      e, ., #
G                      ., #
D                      e, #
E                      #
Why isn't the follow set of G {e, ., #} ?
Because, as I understand it: according to the A rule, D follows G, so we add "."; but D could also derive EPSILON, so we move on to E, which can begin with e; but E could also derive EPSILON, so we move on to #, per the S rule.
What am I missing here?
I used the tool at http://hackingoff.com/compilers/predict-first-follow-set
Your computation of the FOLLOW set of G is correct.
The hackingoff tool is buggy. Here is a shorter grammar which exhibits the same error:
S -> a B C a
B -> b
C -> EPSILON
It's obvious that a is in the FOLLOW set for B but the tool reports that set as empty.
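To verify by machine rather than by tool, here is a quick sketch I wrote (not part of the answer) of the usual fixpoint algorithm for FIRST and FOLLOW, applied to the short counterexample grammar; 'EPS' marks epsilon:

EPS = 'EPS'
grammar = {                        # S -> a B C a ; B -> b ; C -> EPSILON
    'S': [['a', 'B', 'C', 'a']],
    'B': [['b']],
    'C': [[EPS]],
}

def first_of_seq(seq, first):
    # FIRST set of a sequence of grammar symbols
    out = set()
    for sym in seq:
        syms = first[sym] if sym in grammar else {sym}
        out |= syms - {EPS}
        if EPS not in syms:
            return out
    out.add(EPS)                   # every symbol in seq was nullable
    return out

first = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of_seq(prod, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True

follow = {nt: set() for nt in grammar}
follow['S'].add('$')
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            for i, sym in enumerate(prod):
                if sym not in grammar:
                    continue
                tail = first_of_seq(prod[i + 1:], first)
                add = (tail - {EPS}) | (follow[nt] if EPS in tail else set())
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True

print(sorted(follow['B']))         # ['a'] -- so a is indeed in FOLLOW(B)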

Find the closures of a set of attributes given a relation and FDs

I have the following relation:
R = BCDEFGHI
and the following FDs
C -> D
E -> D
EF -> G
EG -> F
FG -> E
FH -> C
H -> B
I'm asked to find the closure of the following set of attributes:
BC
BDEFG
CEFG
EFG
EFGH
My attempts
Let BC+ = BC.
Using FD C -> D, we have BC+ = BCD, and we're done.
Let BDEFG+ = BDEFG.
We're done.
Let CEFG+ = CEFG.
Using FD C -> D, then CEFG+ = CEFGD, and we're done.
Let EFG+ = EFG.
Using FD E -> D, then EFG+ = EFGD, and we're done.
Let EFGH+ = EFGH.
Using FD E -> D, then EFGH+ = EFGHD.
Using FD FH -> C, then EFGH+ = EFGHDC
Using FD H -> B, then EFGH+ = EFGHDCB, and we're done.
Since I'm very new to these topics, I'm not sure if what I've done is correct or not. I would appreciate some feedback from you! Thanks!
Looks OK. (Assuming that you properly did the steps that you didn't mention, i.e. the decisions about the FDs you didn't apply and about when to stop.)
(Don't say that a closure is equal to something when it isn't yet. Use some name for the algorithm's accumulator, like "1. Let F = BC. Using ... then let F = BCD; done, so BC+ = F = BCD". Or write something like "1. Finding BC+: Using ... then BC+ ⊇ BCD; done, so BC+ = BCD".)
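If you want to check these by machine, the closure() and process_fds() helpers from the BCNF answer earlier on this page can be reused (my addition; it assumes those two functions are in scope):

fds = process_fds(['C->D', 'E->D', 'EF->G', 'EG->F', 'FG->E', 'FH->C', 'H->B'])
for attrs in ['BC', 'BDEFG', 'CEFG', 'EFG', 'EFGH']:
    c = closure(set(attrs), fds)
    print(f"{attrs}+ = {''.join(sorted(c))}")
# BC+ = BCD
# BDEFG+ = BDEFG
# CEFG+ = CDEFG
# EFG+ = DEFG
# EFGH+ = BCDEFGH

The output matches the closures computed by hand above.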

Binary to ternary representation conversion

Does anybody know (or can point to a source to read about) a method or algorithm to convert a number represented in the binary numeral system into the ternary one (my particular case), or a universal algorithm for such conversions?
The solution I've already implemented is to convert a number to decimal first and then convert it into the required numeral system. This works, but it takes two steps. I wonder if it could be done easily in one step, without implementing ternary arithmetic first? Is there some trick, guys?
UPD: It seems I didn't manage to describe clearly which way of conversion I'm looking for. I'm not asking for some way to convert base-2 to base-3, I do know how to do this. You may consider that I have algebraic data structures for ternary and binary numbers, in Haskell it looks like this:
data BDigit = B0 | B1
type BNumber = [BDigit]
data TDigit = T0 | T1 | T2
type TNumber = [TDigit]
And there are two obvious ways to convert one to the other: the first is to convert it into an Integer and read off the result (not an interesting way); the second is to implement multiplication and addition in base 3 and compute the result by multiplying the digit values by the respective powers of two (straightforward but heavy).
So I'm wondering if there's another method than these two.
If you are doing it with a computer things are already in binary, so just repeatedly dividing by 3 and taking remainders is about as easy as things get.
If you are doing it by hand, long division in binary works just like long division in decimal.
Just divide by three and take remainders. If we start with 16 (binary 10000):

      101
11 |10000
     11
      100
       11
        1

10000 / 11 = 101 + 1/11, so the least significant digit is 1
101 / 11 = 1 + 10/11, so the next digit is 2
1 / 11 = 0 + 1/11, and the most significant digit is 1
so 16 in ternary is 121
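The same repeated-division scheme as a short Python sketch (my addition, works for any target base):

def to_base(n, b):
    # repeatedly divide by b, collecting remainders (least significant first)
    if n == 0:
        return '0'
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(str(r))
    return ''.join(reversed(digits))

print(to_base(16, 3))  # 121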
You can use some clever abbreviations for converting. The following code goes in the "wrong" direction: it is a conversion from ternary to binary, based on the fact that 3^2 = 2^3 + 1, using only binary addition. Basically I'm converting two ternary digits into three binary digits. From binary to ternary would be slightly more complicated, as ternary addition (and probably subtraction) would be required (working on that). I'm assuming the least significant digit is at the head of the list (which is the only way that makes sense), so you have to read the numbers "backwards".
addB :: BNumber -> BNumber -> BNumber
addB a [] = a
addB [] b = b
addB (B0:as) (B0:bs) = B0 : (addB as bs)
addB (B0:as) (B1:bs) = B1 : (addB as bs)
addB (B1:as) (B0:bs) = B1 : (addB as bs)
addB (B1:as) (B1:bs) = B0 : (addB (addB as bs) [B1])

t2b :: TNumber -> BNumber
t2b [] = []
t2b [T0] = [B0]
t2b [T1] = [B1]
t2b [T2] = [B0,B1]
t2b (T2:T2:ts) = let bs = t2b ts in addB bs (B0:B0:B0:(addB bs [B1]))
t2b (t0:t1:ts) =
  let bs = t2b ts
      (b0,b1,b2) = conv t0 t1
  in addB bs (b0:b1:b2:bs)
  where conv T0 T0 = (B0,B0,B0)
        conv T1 T0 = (B1,B0,B0)
        conv T2 T0 = (B0,B1,B0)
        conv T0 T1 = (B1,B1,B0)
        conv T1 T1 = (B0,B0,B1)
        conv T2 T1 = (B1,B0,B1)
        conv T0 T2 = (B0,B1,B1)
        conv T1 T2 = (B1,B1,B1)
[Edit] Here is the binary-to-ternary direction; as expected, it is a little more lengthy:
addT :: TNumber -> TNumber -> TNumber
addT a [] = a
addT [] b = b
addT (T0:as) (T0:bs) = T0 : (addT as bs)
addT (T1:as) (T0:bs) = T1 : (addT as bs)
addT (T2:as) (T0:bs) = T2 : (addT as bs)
addT (T0:as) (T1:bs) = T1 : (addT as bs)
addT (T1:as) (T1:bs) = T2 : (addT as bs)
addT (T2:as) (T1:bs) = T0 : (addT (addT as bs) [T1])
addT (T0:as) (T2:bs) = T2 : (addT as bs)
addT (T1:as) (T2:bs) = T0 : (addT (addT as bs) [T1])
addT (T2:as) (T2:bs) = T1 : (addT (addT as bs) [T1])

subT :: TNumber -> TNumber -> TNumber
subT a [] = a
subT [] b = error "negative numbers not supported"
subT (T0:as) (T0:bs) = T0 : (subT as bs)
subT (T1:as) (T0:bs) = T1 : (subT as bs)
subT (T2:as) (T0:bs) = T2 : (subT as bs)
subT (T0:as) (T1:bs) = T2 : (subT as (addT bs [T1]))
subT (T1:as) (T1:bs) = T0 : (subT as bs)
subT (T2:as) (T1:bs) = T1 : (subT as bs)
subT (T0:as) (T2:bs) = T1 : (subT as (addT bs [T1]))
subT (T1:as) (T2:bs) = T2 : (subT as (addT bs [T1]))
subT (T2:as) (T2:bs) = T0 : (subT as bs)
b2t :: BNumber -> TNumber
b2t [] = []
b2t [B0] = [T0]
b2t [B1] = [T1]
b2t [B0,B1] = [T2]
b2t [B1,B1] = [T0,T1]
b2t (b0:b1:b2:bs) =
  let ts = b2t bs
      (t0,t1) = conv b0 b1 b2
  in subT (t0:t1:ts) ts
  where conv B0 B0 B0 = (T0,T0)
        conv B1 B0 B0 = (T1,T0)
        conv B0 B1 B0 = (T2,T0)
        conv B1 B1 B0 = (T0,T1)
        conv B0 B0 B1 = (T1,T1)
        conv B1 B0 B1 = (T2,T1)
        conv B0 B1 B1 = (T0,T2)
        conv B1 B1 B1 = (T1,T2)
[Edit2] A slightly improved version of subT which doesn't need addT
subT :: TNumber -> TNumber -> TNumber
subT a [] = a
subT [] b = error "negative numbers not supported"
subT (a:as) (b:bs)
  | b == T0 = a : (subT as bs)
  | a == b = T0 : (subT as bs)
  | a == T2 && b == T1 = T1 : (subT as bs)
  | otherwise = let td = if a == T0 && b == T2 then T1 else T2
                in td : (subT as $ addTDigit bs T1)
  where addTDigit [] d = [d]
        addTDigit ts T0 = ts
        addTDigit (T0:ts) d = d:ts
        addTDigit (T1:ts) T1 = T2:ts
        addTDigit (t:ts) d = let td = if t == T2 && d == T2 then T1 else T0
                             in td : (addTDigit ts T1)
I think that everybody is missing something important. First, compute a table in advance: for each binary bit, its representation in ternary. In MATLAB I'd build it like this, although every step after that will be done purely by hand, the computation is so easy.
dec2base(2.^(0:10),3)
ans =
0000001
0000002
0000011
0000022
0000121
0001012
0002101
0011202
0100111
0200222
1101221
Now, consider the binary number 011000101 (which happens to be the decimal number 197, as we will find out later.) Extract the ternary representation for each binary bit from the table. I'll write out the corresponding rows.
0000001
0000011
0002101
0011202
Now just sum. We get this representation, in uncarried ternary.
0013315
Yes, those are not ternary numbers, but they are almost in a valid base 3 representation. Now all you need to do is to do the carries. Start with the units digit.
5 is larger than 2, so subtract off the number of multiples of 3, and increment the second digit of the result as appropriate.
0013322
The second digit is now a 2, a legal ternary digit, so go on to the third digit. Do that carry too,
0014022
Finally yielding the now completely valid ternary number...
0021022
Were my computations correct? I'll let MATLAB make the final judgement for us:
base2dec('011000101',2)
ans =
197
base2dec('0021022',3)
ans =
197
Have I pointed out just how trivial this operation was, that I could do the conversion entirely by hand, going essentially directly from binary to ternary, at least once I had that initial table written down and stored?
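For what it's worth, the same table-plus-carries method is easy to mechanize. Here is a Python sketch of it (my addition, not the answerer's MATLAB): build the ternary expansion of each power of two, sum the expansions of the set bits, then resolve the carries.

def ternary_of_power_of_two(k):
    # ternary digits of 2**k, least significant first, by repeated doubling
    digits = [1]
    for _ in range(k):
        carry = 0
        for i, d in enumerate(digits):
            carry, digits[i] = divmod(2 * d + carry, 3)
        if carry:
            digits.append(carry)
    return digits

def bin_to_ternary(bits):
    # sum the ternary expansions of the set bits, then resolve the carries
    acc = [0] * (len(bits) + 1)
    for pos, b in enumerate(reversed(bits)):
        if b == '1':
            for i, d in enumerate(ternary_of_power_of_two(pos)):
                acc[i] += d
    out, carry = [], 0
    for d in acc:
        carry, r = divmod(d + carry, 3)
        out.append(r)
    while carry:
        carry, r = divmod(carry, 3)
        out.append(r)
    return ''.join(map(str, reversed(out))).lstrip('0') or '0'

print(bin_to_ternary('011000101'))  # 21022, i.e. 197 in base 3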
I'm afraid I don't know enough Haskell to be able to express this in code but I wonder if using Horner's rule for evaluating polynomials might yield a method.
For example ax^2 + bx + c can be evaluated as c+x*(b+x*a).
To convert, say, the ternary number a*9 + b*3 + c to binary, one starts with the binary representation of a, then multiplies it by 3 (i.e. shift and add), then adds the binary representation of b, multiplies the result by 3, and adds c.
It seems to me this should be doable with a map (to get the binary representation of the ternary digits) and a fold (of a,b -> a+3*b)
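In Python rather than Haskell (my sketch, not the answerer's), the fold looks like this; inside a binary-arithmetic implementation the multiplication by 3 would be a shift plus an add (x*3 = (x << 1) + x):

from functools import reduce

def ternary_to_int(digits):
    # Horner's rule: digits most significant first, e.g. [1, 2, 1] for ternary 121
    return reduce(lambda acc, d: acc * 3 + d, digits, 0)

print(ternary_to_int([1, 2, 1]))  # 16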
In case this is homework, pseudocode to write x in base b backwards:
while (x != 0) {
    q <-- x/b
    r <-- x - q*b
    print r
    x <-- q
}
I'm sure you can figure out how to write the result forwards instead of backwards. Note that / needs to be C-style integer division (the result is an integer, truncated toward zero).
Note that this doesn't depend at all on the base that the arithmetic is performed in. Arithmetic is defined on integers, not the representation of integers in a specific base.
Edit: Based on your updated question, I would slam the digit representation into an integer (via ors and shifts) and use the algorithm described above with integer arithmetic.
Certainly you could do it as you describe, but it seems like an awful lot of work.
I don't think there's a super-efficient way.
"The solution I've already implemented
is to convert a number to decimal
first."
I assume that you are actually converting to some built-in integer type first. I don't think that built-in integer has anything to do with base 10. (Though, when you print it, there will be a base 10 conversion).
Maybe you'd expect there to be some algorithm which looks at the input one digit at a time and produces the output.
But, say you want to convert 3486784400 (base 10) to base 3. You'll need to examine every digit before producing output, because
3486784401 (base 10) = 100000000000000000000 (base 3)
3486784400 (base 10) = 22222222222222222222 (base 3)
Also:
"compute the result multiplying digit values to respective power of two"
explicitly computing a power isn't necessary, see convert from base 60 to base 10
I think there might be some different "views" of the problem, though I'm not sure any of them are faster or better. For example, the low-order base 3 digit of n is just n mod 3. Say you already have the binary representation of n. Then consider how the powers of 2 work out mod 3: 2^0 = 1 mod 3, 2^1 = 2 mod 3, 2^2 = 1 mod 3, 2^3 = 2 mod 3, ... In other words, the powers alternate between being 1 mod 3 and being 2 mod 3. You now have an easy way to get the low-order base 3 digit: scan the binary representation of n, adding either 1 or 2 (depending on the position's parity) at each bit position where a 1 occurs, then reduce mod 3.
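A sketch of that observation in Python (my addition):

def mod3_of_binary(bits):
    # powers of two are 1 mod 3 at even positions and 2 mod 3 at odd ones
    total = 0
    for pos, b in enumerate(reversed(bits)):
        if b == '1':
            total += 1 if pos % 2 == 0 else 2
    return total % 3

print(mod3_of_binary('10000'))  # 16 mod 3 == 1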
No, you can't convert a base2 number to a base3 number without loading it into an integer. The reason is that 2 and 3 are coprime - they have no common factors.
If you were working with base2 and base4, or even base6 and base9, then the set of integers up to the lowest common multiple of the two bases would be represented by two isomorphic sets. For example 13 (base4) = 0111 (base2), so converting 1313 (base4) = 01110111 (base2) - it's a find and replace operation.
At least the solution that you have works and is relatively simple. If you need to improve performance, then convert the entire base-2 representation to an integer before starting the base-3 conversion; it means fewer modulus operations. The alternative would be to process each character of the base-2 number one by one, in which case you'll be dividing by all the powers of 3 for each digit in the base-2 representation.
If you use binary-coded-ternary (one pair of bits per trit) you can convert using parallel arithmetic. See this tutorial.
