Find the closures of a set of attributes given a relation and FDs - relation

I've the following relation:
R = BCDEFGHI
and the following FDs
C -> D
E -> D
EF -> G
EG -> F
FG -> E
FH -> C
H -> B
I'm asked to find the closure of the following set of attributes:
BC
BDEFG
CEFG
EFG
EFGH
My attempts
Let BC+ = BC.
Using FD C -> D, we have BC+ = BCD, and we're done.
Let BDEFG+ = BDEFG.
We're done.
Let CEFG+ = CEFG.
Using FD C -> D, then CEFG+ = CEFGD, and we're done.
Let EFG+ = EFG.
Using FD E -> D, then EFG+ = EFGD, and we're done.
Let EFGH+ = EFGH.
Using FD E -> D, then EFGH+ = EFGHD.
Using FD FH -> C, then EFGH+ = EFGHDC
Using FD H -> B, then EFGH+ = EFGHDCB, and we're done.
Since I'm very new to these topics, I'm not sure if what I've done is correct or not. I would appreciate some feedback from you! Thanks!

Looks OK (assuming that you properly did the steps that you didn't mention, i.e. the decisions about the FDs that you didn't mention and about when to stop).
(Don't say that a closure is equal to something when it isn't. Use some name for the algorithm's accumulator, like "1. Let F = BC. Using ... then let F = BCD; done, so BC+ = F = BCD". Or write something like "1. Finding BC+: Using ... then BC+ >= BCD; done, so BC+ = BCD".)
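The five closures in the question can also be checked mechanically. Below is a minimal Python sketch of the usual fixed-point closure computation. The function name `attribute_closure` and the encoding of attribute sets as strings are assumptions made for illustration, not part of the question.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    acc = set(attrs)  # accumulator; it equals the closure only once the loop stops
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD applies when its whole left side is already in the accumulator
            # and its right side would add something new.
            if set(lhs) <= acc and not set(rhs) <= acc:
                acc |= set(rhs)
                changed = True
    return "".join(sorted(acc))


# FDs from the question.
FDS = [("C", "D"), ("E", "D"), ("EF", "G"), ("EG", "F"),
       ("FG", "E"), ("FH", "C"), ("H", "B")]

for start in ["BC", "BDEFG", "CEFG", "EFG", "EFGH"]:
    print(f"{start}+ = {attribute_closure(start, FDS)}")
# BC+ = BCD
# BDEFG+ = BDEFG
# CEFG+ = CDEFG
# EFG+ = DEFG
# EFGH+ = BCDEFGH
```

Up to the order of the attributes, these are the same sets the question arrives at.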

Related

Determine if relation is in BCNF form?

How do I determine if the following relation is in BCNF form?
R(U,V,W,X,Y,Z)
UVW ->X
VW -> YU
VWY ->Z
I understand that for a functional dependency A -> B, A must be a superkey, and that the relation must be in 3NF. But I am unsure how to apply the concepts.
To determine if a relation is in BCNF, by definition you should check that for each non-trivial dependency in F+ (that is, the specified dependencies F and all those derived from them), the determinant is a superkey. Fortunately, there is a theorem that says it is sufficient to perform this check only for the specified dependencies.
In your case this means that you must check if UVW, VW and VWY are superkeys.
To see whether the determinant X of a dependency X -> Y is a superkey, you can compute the closure of the attributes (X+) and check whether it contains all the attributes of the relation.
So you have to compute UVW+ and see if it contains {U,V,W,X,Y,Z}, and similarly for the other two dependencies. I leave this simple exercise to you.
[Figure: algorithm to compute the closure of a given set of attributes, and the definition of BCNF]
Using that algorithm and definition, we can write Python code that computes the closure of a set of attributes and then determines whether a given set of attributes forms a superkey, as shown in the following code snippet:
def closure(s, fds):
    c = s
    for f in fds:
        l, r = f[0], f[1]
        if l.issubset(c):
            c = c.union(r)
    if s != c:
        c = closure(c, fds)
    return c

def is_superkey(s, rel, fds):
    c = closure(s, fds)
    print(f'({"".join(sorted(s))})+ = {"".join(sorted(c))}')
    return c == rel
Now check if for each given FD A -> B from relation R, A is a superkey or not, to determine whether R is in BCNF or not:
def is_in_BCNF(rel, fds):
    for fd in fds:
        l, r = fd[0], fd[1]
        isk = is_superkey(l, rel, fds)
        print(f'For the Functional Dependency {"".join(sorted(l))} -> {"".join(sorted(r))}, ' +\
              f'{"".join(sorted(l))} {"is" if isk else "is not"} a superkey')
        if not isk:
            print('=> R not in BCNF!')
            return False
    print('=> R in BCNF!')
    return True
To convert FDs written in the standard textual form into a suitable data structure, we can use the following function:
import re

def process_fds(fds):
    pfds = []
    for fd in fds:
        # raw string, so that \s is not treated as an invalid string escape
        fd = re.sub(r'\s+', '', fd)
        l, r = fd.split('->')
        pfds.append([set(l), set(r)])
    return pfds
Now, let's test with a couple of relations:
relation = {'U','V','W','X','Y','Z'}
fds = process_fds(['UVW->X', 'VW->YU', 'VWY->Z'])
is_in_BCNF(relation, fds)
# (UVW)+ = UVWXYZ
# For the Functional Dependency UVW -> X, UVW is a superkey
# (VW)+ = UVWXYZ
# For the Functional Dependency VW -> UY, VW is a superkey
# (VWY)+ = UVWXYZ
# For the Functional Dependency VWY -> Z, VWY is a superkey
# => R in BCNF!
relation = {'A','B','C'}
fds = process_fds(['A -> BC', 'B -> A'])
is_in_BCNF(relation, fds)
# (A)+ = ABC
# For the Functional Dependency A -> BC, A is a superkey
# (B)+ = ABC
# For the Functional Dependency B -> A, B is a superkey
# => R in BCNF!

Confidence of association rules

Trying to understand an assignment I did not do correctly.
Assume all the (closed) frequent itemsets and their support counts are:
Support( {A, B, C, D} ) = 0.3
Support( {A, B, C} ) = 0.4
What is Conf(B -> ACD) = ?
What is Conf(A -> BCD) = ?
What is Conf(ABD -> C) = ?
What is Conf(BD -> AC) = ?
I was under the impression that for Conf(A -> BCD) I could just do .4/.3, which is obviously incorrect, as confidence cannot be greater than 1.
Could someone enlighten me?
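Confidence(B -> ACD) is the support of the whole itemset divided by the support of the antecedent, so Support(ABCD)/Support(B). You'll notice that Support(B) is not given explicitly, but you can infer the value via closedness: B is not itself closed, so its support equals that of its smallest closed superset, {A, B, C}, which is 0.4.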
So the result is 0.3/0.4 = 75%
Note that support is usually given in absolute terms, but of course that doesn't matter here.
I assume that you are using the following definitions for support and confidence formulated in an informal way.
Support(X) = Number of times the itemset appears together / Number of examples in the dataset
Conf(X -> Y) = Support(X union Y)/Support(X)
Applying the definition we have:
Conf(B -> ACD) = Support(ABCD)/Support(B)
Conf(A -> BCD) = Support(ABCD)/Support(A)
Conf(ABD -> C) = Support(ABCD)/Support(ABD)
Conf(BD -> AC) = Support(ABCD)/Support(BD)
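The remaining supports can be inferred the same way. Here is a minimal Python sketch, assuming the two listed itemsets are the complete set of closed frequent itemsets, so that the support of any frequent itemset is the largest support among its closed supersets. The names `CLOSED`, `support` and `confidence` are illustrative only.

```python
from fractions import Fraction

# Closed frequent itemsets and their supports, as given in the question.
# Fractions are used so the divisions come out exact.
CLOSED = {
    frozenset("ABCD"): Fraction(3, 10),
    frozenset("ABC"): Fraction(4, 10),
}

def support(itemset):
    """Support of a frequent itemset: the largest support among its closed supersets."""
    items = frozenset(itemset)
    candidates = [s for closed, s in CLOSED.items() if items <= closed]
    if not candidates:
        raise ValueError(f"{itemset!r} is not covered by any closed frequent itemset")
    return max(candidates)

def confidence(antecedent, consequent):
    """Conf(X -> Y) = Support(X union Y) / Support(X)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

for x, y in [("B", "ACD"), ("A", "BCD"), ("ABD", "C"), ("BD", "AC")]:
    print(f"Conf({x} -> {y}) = {float(confidence(x, y))}")
# Conf(B -> ACD) = 0.75
# Conf(A -> BCD) = 0.75
# Conf(ABD -> C) = 1.0
# Conf(BD -> AC) = 1.0
```

ABD and BD contain D, so their only closed superset is {A, B, C, D}, which is why the last two confidences come out as 1.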

Compilation - LL1 Grammar

I am studying the magic of compilers and I don't understand a result.
Here is the grammar :
S -> A #
A -> B G D E
B -> + | - | EPSILON
C -> c C | EPSILON
G -> c C
D -> . C | EPSILON
E -> e B G | EPSILON
When I try to find the "first" and "follow" sets, I get different results than the one I get when I do it with an online predictor.
Here are the results given:
Non-terminal Symbol / Follow Set
S $
A #
B c
C e, ., #
G ., #
D e, #
E #
Why isn't the follow set of G {e, ., #} ?
As I understand it, according to the A rule, D follows G, so we add "."; but D could also be EPSILON, so we move on to E, which can start with e; but E could also be EPSILON, so we move on to the #, because of the S rule.
What am I missing here ?
I used the tool at http://hackingoff.com/compilers/predict-first-follow-set
Your computation of the FOLLOW set of G is correct.
The hackingoff tool is buggy. Here is a shorter grammar which exhibits the same error:
S -> a B C a
B -> b
C -> EPSILON
It's obvious that a is in the FOLLOW set for B but the tool reports that set as empty.
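One way to double-check the FOLLOW set by hand is to run the textbook fixed-point computation on the original grammar. The following is a minimal Python sketch; the dictionary encoding of the grammar and all function names are assumptions for illustration. No extra end-of-input marker is added, since the grammar already ends S with #.

```python
EPS = "EPSILON"

# Grammar from the question; an empty list stands for the EPSILON production.
GRAMMAR = {
    "S": [["A", "#"]],
    "A": [["B", "G", "D", "E"]],
    "B": [["+"], ["-"], []],
    "C": [["c", "C"], []],
    "G": [["c", "C"]],
    "D": [[".", "C"], []],
    "E": [["e", "B", "G"], []],
}

def first_of_sequence(symbols, first):
    """FIRST of a string of symbols; contains EPS when every symbol is nullable."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no FIRST set grows any more
        changed = False
        for nt, productions in GRAMMAR.items():
            for prod in productions:
                new = first_of_sequence(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no FOLLOW set grows any more
        changed = False
        for head, productions in GRAMMAR.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        # Everything after sym can vanish, so FOLLOW(head) applies.
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(compute_first())
print(sorted(FOLLOW["G"]))  # ['#', '.', 'e']
```

The result for G agrees with the set {e, ., #} computed in the question.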

For loops in ocaml

I want to do something like
let switchgraph cases =
  let g = Graph.makeGraph() in
  let g = (Graph.addNode g 1) in
  for i = 2 to cases do
    let g = (Graph.addNode g i) in
  done
  g
But apparently, this is not possible. How else can I achieve this?
There are two things you need to fix:
1. You need to use references (see ref, := and !) for this, since let bindings are immutable.
2. To sequence two expressions, you need to use ;
Something like this should work:
let switchgraph cases =
  let g = ref (Graph.makeGraph()) in
  g := Graph.addNode (!g) 1;
  for i = 2 to cases do
    g := Graph.addNode (!g) i
  done;
  !g
Note that g is the reference, and !g the value.

Is this LL(1) parse table correct?

Given grammar:
S -> AB
A -> aA | b
B -> CA
C -> cC | ɛ
Is this its LL(1) parsing table?
No, it is not entirely correct because of these calculations:
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = First(C) = {c,ε}
First(C) = {c,ε}
Considering that the Follow of each non-terminal symbol is the terminal symbol right after:
Follow(S) ={a,b} (if SAB --> AB then SaAB --> aAB or SbB --> bB)
Follow(A) = {a,c} (if AaA-->aA and Ab --> b then AaA --> aA or Ab --> b)
Follow(B) = Follow (A) = {a,c} (model production A --> aB, which a terminal, and a = ε, then Follow (A) = Follow (B))
Follow(C) = {a,b} (from B-->CA, B-->CaA or B-->Cb)
So the difference between your parse table and these calculations is that in the row for non-terminal B, the entries in columns a and b are NULL.
Yes it is correct.
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = {a,b,c}
B->CA and C->cC|ɛ
First(C) = {c,ε}
So if we replace C with ɛ in B -> CA, we get B -> A; thus First(B) includes First(A) instead of ɛ.
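The FIRST sets can be checked with a short fixed-point computation. This is a minimal Python sketch under the usual textbook definition of FIRST; the `RULES` encoding and the name `first_sets` are assumptions for illustration.

```python
EPS = "ε"

# Grammar from the question; an empty list stands for the ε production.
RULES = {
    "S": [["A", "B"]],
    "A": [["a", "A"], ["b"]],
    "B": [["C", "A"]],
    "C": [["c", "C"], []],
}

def first_sets(rules):
    first = {nt: set() for nt in rules}
    changed = True
    while changed:  # repeat until no FIRST set grows any more
        changed = False
        for head, productions in rules.items():
            for prod in productions:
                new = set()
                nullable = True
                for sym in prod:
                    sym_first = first[sym] if sym in rules else {sym}
                    new |= sym_first - {EPS}
                    if EPS not in sym_first:
                        # This symbol cannot vanish, so later symbols do not contribute.
                        nullable = False
                        break
                if nullable:
                    new.add(EPS)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

FIRST = first_sets(RULES)
for nt in "SABC":
    print(nt, sorted(FIRST[nt]))
```

Under these assumptions the sketch yields First(B) = {a, b, c}: because C is nullable, First(A) is added to First(B), and ε is not included since A cannot derive ε.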
