Determine if a relation is in BCNF - algorithm

How do I determine whether the following relation is in BCNF?
R(U, V, W, X, Y, Z)
UVW -> X
VW -> YU
VWY -> Z
I understand that for a functional dependency A -> B, A must be a superkey, and that the relation must be in 3NF. But I am unsure how to apply these concepts.

To determine if a relation is in BCNF, by the definition you should check that for each non-trivial dependency in F+, that is, for all the dependencies specified (F) and all those derivable from them, the determinant is a superkey. Fortunately, there is a theorem which says that it is sufficient to perform this check only on the specified dependencies.
In your case this means that you must check if UVW, VW and VWY are superkeys.
And to see if, in a dependency X -> Y, the set of attributes X is a superkey, you can compute the closure of those attributes (X+) and check that it contains all the attributes of the relation.
So you have to compute UVW+ and see if it contains {U,V,W,X,Y,Z}, and similarly for the other two dependencies. I leave this simple exercise to you.
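(For instance, VW+ starts as {V,W}; VW -> YU adds Y and U; then UVW -> X adds X and VWY -> Z adds Z, so VW+ = {U,V,W,X,Y,Z} and VW is a superkey. The other two closures work out the same way.)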

Using the standard algorithm to compute the closure of a given set of attributes, together with the definition of BCNF above, we can implement this in Python: first compute the closure of a set of attributes, then use it to decide whether a given set of attributes forms a superkey, as in the following code snippet:
def closure(s, fds):
    # compute the closure s+ of the attribute set s under the FDs fds
    c = s
    for f in fds:
        l, r = f[0], f[1]
        if l.issubset(c):
            c = c.union(r)
    if s != c:
        # new attributes were added: iterate until a fixpoint is reached
        c = closure(c, fds)
    return c

def is_superkey(s, rel, fds):
    # s is a superkey iff its closure contains every attribute of the relation
    c = closure(s, fds)
    print(f'({"".join(sorted(s))})+ = {"".join(sorted(c))}')
    return c == rel
Now, for each given FD A -> B of the relation R, check whether A is a superkey, to determine whether R is in BCNF:
def is_in_BCNF(rel, fds):
    # R is in BCNF iff the determinant of every given FD is a superkey
    for fd in fds:
        l, r = fd[0], fd[1]
        isk = is_superkey(l, rel, fds)
        print(f'For the Functional Dependency {"".join(sorted(l))} -> {"".join(sorted(r))}, ' +
              f'{"".join(sorted(l))} {"is" if isk else "is not"} a superkey')
        if not isk:
            print('=> R not in BCNF!')
            return False
    print('=> R in BCNF!')
    return True
To parse the given FDs, written in the standard textual form, into a suitable data structure, we can use the following function:
import re

def process_fds(fds):
    # parse each FD string 'X->Y' into a pair of attribute sets [X, Y]
    pfds = []
    for fd in fds:
        fd = re.sub(r'\s+', '', fd)  # drop any whitespace
        l, r = fd.split('->')
        pfds.append([set(l), set(r)])
    return pfds
Now, let's test with a couple of relations:
relation = {'U','V','W','X','Y','Z'}
fds = process_fds(['UVW->X', 'VW->YU', 'VWY->Z'])
is_in_BCNF(relation, fds)
# (UVW)+ = UVWXYZ
# For the Functional Dependency UVW -> X, UVW is a superkey
# (VW)+ = UVWXYZ
# For the Functional Dependency VW -> UY, VW is a superkey
# (VWY)+ = UVWXYZ
# For the Functional Dependency VWY -> Z, VWY is a superkey
# => R in BCNF!
relation = {'A','B','C'}
fds = process_fds(['A -> BC', 'B -> A'])
is_in_BCNF(relation, fds)
# (A)+ = ABC
# For the Functional Dependency A -> BC, A is a superkey
# (B)+ = ABC
# For the Functional Dependency B -> A, B is a superkey
# => R in BCNF!
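For contrast, here is an extra example of my own (not from the original question) of a relation that is not in BCNF; the second FD's determinant fails the superkey test:
relation = {'A','B','C','D'}
fds = process_fds(['AB -> C', 'C -> D'])
is_in_BCNF(relation, fds)
# (AB)+ = ABCD
# For the Functional Dependency AB -> C, AB is a superkey
# (C)+ = CD
# For the Functional Dependency C -> D, C is not a superkey
# => R not in BCNF!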


Confidence of association rules

Trying to understand an assignment I did not do correctly.
Assume all the (closed) frequent itemsets and their support counts are:
Support( {A, B, C, D} ) = 0.3
Support( {A, B, C} ) = 0.4
Conf(B -> ACD) = ?
Conf(A -> BCD) = ?
Conf(ABD -> C) = ?
Conf(BD -> AC) = ?
I was under the impression that for Conf(B -> ACD) I could just do 0.4/0.3 ... obviously incorrect, as confidence cannot be greater than 1.
Could someone enlighten me?
Confidence(B -> ACD) is the support of the whole combination divided by the support of the antecedent, i.e. Support(ABCD)/Support(B). You'll notice that Support(B) is not given explicitly, but you can infer the value via closedness: the support of an itemset equals the support of its smallest closed superset, here {A, B, C}, so Support(B) = 0.4.
So the result is 0.3/0.4 = 75%
Note that support is usually given in absolute terms, but of course that doesn't matter here.
I assume that you are using the following definitions of support and confidence, stated informally:
Support(X) = (number of transactions containing the itemset X) / (total number of transactions in the dataset)
Conf(X -> Y) = Support(X union Y)/Support(X)
Applying the definition we have:
Conf(B -> ACD) = Support(ABCD)/Support(B)
Conf(A -> BCD) = Support(ABCD)/Support(A)
Conf(ABD -> C) = Support(ABCD)/Support(ABD)
Conf(BD -> AC) = Support(ABCD)/Support(BD)
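Since only the closed itemsets are given, the support of any other itemset is the support of its smallest closed superset. Here is a small Python sketch of my own (using just the two supports from the question) that computes all four confidences this way:
closed = {frozenset('ABCD'): 0.3, frozenset('ABC'): 0.4}

def support(items):
    # support of X = support of the smallest closed itemset containing X
    return min((len(c), s) for c, s in closed.items() if items <= c)[1]

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

for lhs, rhs in [('B', 'ACD'), ('A', 'BCD'), ('ABD', 'C'), ('BD', 'AC')]:
    print(f'Conf({lhs} -> {rhs}) = {confidence(frozenset(lhs), frozenset(rhs)):.2f}')
# Conf(B -> ACD) = 0.75
# Conf(A -> BCD) = 0.75
# Conf(ABD -> C) = 1.00
# Conf(BD -> AC) = 1.00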

Cleaner way to represent languages accepted by DFAs?

I am given 2 DFAs over the alphabet {a, b}; * denotes a final state and -> denotes the initial state.
1) ->A with a goes to A. ->A with b goes to *B. *B with a goes to *B. *B with b goes to ->A.
The regular expression for this is clearly:
E = a*b(a* + (a*ba*ba*)*)
And the language that it accepts is L1 = {w over {a,b} | w is a b preceded by any number of a's and followed by any number of a's, or w is a b preceded by any number of a's and followed by any number of bb pairs, with any number of a's in the middle of each pair, at the beginning, or at the end}.
2) ->*A with b goes to ->*A. ->*A with a goes to *B. *B with b goes to ->*A. *B with a goes to C. C with a goes to C. C with b goes to C.
Note: A is both the initial state and a final state; B is a final state.
Now the regular expression that I get for this is:
E = b*((ab)* + a(bb*a)*)
Finally the language that this DFA accepts is:
L2 = {w over {a, b} | w is n b's followed by either k ab's, or by an a followed by m blocks of the form b b^r a, where n, k, m, r >= 0}
Now the question is: is there a cleaner way to represent the languages L1 and L2? Because this does seem ugly. Thanks in advance.
E = a*b(a* + (a*ba*ba*)*)
  = a*ba* + a*b(a*ba*ba*)*
  = a*ba* + a*b(a*ba*ba*)*a*
  = a*b(a*ba*ba*)*a*
  = a*b(a*ba*b)*a*
This is the language of all strings of a's and b's containing an odd number of b's. This might be most compactly denoted symbolically as {w in {a,b}* | #b(w) ≡ 1 (mod 2)}.
For the second one: the only way to get to state B is to see an a in A, and the only way to get to C from outside C is to see an a in B. C is a dead state, and the only way to reach it is to see aa starting in A. That is: if you ever see two a's in a row, the string is not in the language; the language is the set of all strings over a and b not containing the substring aa. This might be most compactly denoted symbolically as {(a+b)*aa(a+b)*}^c, where ^c means "complement".
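Both characterizations are easy to sanity-check by brute force. Here is a short Python sketch; the transition tables are my transcription of the two machines described in the question:
from itertools import product

def accepts(delta, start, finals, w):
    state = start
    for ch in w:
        state = delta[state][ch]
    return state in finals

# DFA 1: start A, final B
d1 = {'A': {'a': 'A', 'b': 'B'}, 'B': {'a': 'B', 'b': 'A'}}
# DFA 2: start A, finals A and B, C is a dead state
d2 = {'A': {'a': 'B', 'b': 'A'}, 'B': {'a': 'C', 'b': 'A'},
      'C': {'a': 'C', 'b': 'C'}}

for n in range(10):
    for w in map(''.join, product('ab', repeat=n)):
        assert accepts(d1, 'A', {'B'}, w) == (w.count('b') % 2 == 1)
        assert accepts(d2, 'A', {'A', 'B'}, w) == ('aa' not in w)
print('both characterizations hold for all strings up to length 9')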

Find the closures of a set of attributes given a relation and FDs

I have the following relation:
R = BCDEFGHI
and the following FDs:
C -> D
E -> D
EF -> G
EG -> F
FG -> E
FH -> C
H -> B
I'm asked to find the closure of the following set of attributes:
BC
BDEFG
CEFG
EFG
EFGH
My attempts
Let BC+ = BC.
Using FD C -> D, we have BC+ = BCD, and we're done.
Let BDEFG+ = BDEFG.
We're done.
Let CEFG+ = CEFG.
Using FD C -> D, then CEFG+ = CEFGD, and we're done.
Let EFG+ = EFG.
Using FD E -> D, then EFG+ = EFGD, and we're done.
Let EFGH+ = EFGH.
Using FD E -> D, then EFGH+ = EFGHD.
Using FD FH -> C, then EFGH+ = EFGHDC.
Using FD H -> B, then EFGH+ = EFGHDCB, and we're done.
Since I'm very new to these topics, I'm not sure if what I've done is correct or not. I would appreciate some feedback from you! Thanks!
Looks OK. (Assuming that you properly did the steps that you didn't mention, i.e. the decisions about the FDs you didn't apply and about when to stop.)
(Don't say that a closure is equal to something when it isn't. Use some name for the algorithm's accumulator like "1. Let F = BC. Using ... then let F = BCE; Done so BC+ = F = BCE". Or write something like "1. Finding BC+: Using ... then BC+ >= BCE; Done so BC+ = BCE".)
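If you want to double-check exercises like this mechanically, you can reuse the closure and process_fds helpers from the BCNF question above:
fds = process_fds(['C->D', 'E->D', 'EF->G', 'EG->F', 'FG->E', 'FH->C', 'H->B'])
for attrs in ['BC', 'BDEFG', 'CEFG', 'EFG', 'EFGH']:
    print(f"({attrs})+ = {''.join(sorted(closure(set(attrs), fds)))}")
# (BC)+ = BCD
# (BDEFG)+ = BDEFG
# (CEFG)+ = CDEFG
# (EFG)+ = DEFG
# (EFGH)+ = BCDEFGH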

Resolving PREDICT/PREDICT conflicts in LL(1)

I'm working on a simple LL(1) parser generator, and I've run into an issue with PREDICT/PREDICT conflicts given certain input grammars. For example, given an input grammar like:
E → E + E
| P
P → 1
I can remove the left recursion from E, replacing it with a roughly equivalent right-recursive rule, thus arriving at the grammar:
E → P E'
E' → + E E'
| ε
P → 1
Next, I can compute the relevant FIRST and FOLLOW sets for the grammar, and end up with the following:
FIRST(E) = { 1 }
FIRST(E') = { +, ε }
FIRST(P) = { 1 }
FOLLOW(E) = { +, EOF }
FOLLOW(E') = { +, EOF }
FOLLOW(P) = { +, EOF }
And finally, using PREDICT(A → α) = (FIRST(α) - {ε}) ∪ (FOLLOW(A) if ε ∈ FIRST(α), else ∅) to construct the PREDICT sets for the grammar, the resulting sets are as follows:
PREDICT(1. E → P E') = { 1 }
PREDICT(2. E' → + E E') = { +, EOF }
PREDICT(3. E' → ε) = { +, EOF }
PREDICT(4. P → 1) = { 1 }
So this is where I run into the conflict: PREDICT(2) = PREDICT(3), and thus I cannot produce a parse table, as the grammar is not LL(1); the parser wouldn't be able to choose which rule should be applied.
What I'm really wondering is whether it's possible to resolve the conflict or factor the grammar such that the conflict can be avoided, and produce a legal LL(1) grammar, without having to directly modify the original input grammar.
The problem here is that your original grammar is ambiguous.
E → E + E
E → P
means that P + P + P can be parsed either as (P + P) + P or P + (P + P). Eliminating left recursion doesn't fix the ambiguity, so the modified grammar is also ambiguous. And ambiguous grammars can't be LL(k) (or, for that matter, LR(k)).
So you need to make the grammar unambiguous:
E → E + P
E → P
(That's the common left-associative version.) Once you eliminate left recursion, you end up with:
E → P E'
E' → + P E'
| ε
Now + is not in FOLLOW(E').
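Recomputing the sets for the transformed grammar confirms this:
FIRST(E) = FIRST(P) = { 1 }
FIRST(E') = { +, ε }
FOLLOW(E) = { EOF }
FOLLOW(E') = { EOF }
FOLLOW(P) = { +, EOF }
PREDICT(E' → + P E') = { + }
PREDICT(E' → ε) = { EOF }
The two PREDICT sets for E' are now disjoint, so the parse table can be built.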
(The example is drawn straight from the Dragon book, but simplified; it's example 4.8 in the rather battered old copy I have.)
It's worth noting that the transformation used here preserves the set of strings derived by the grammar, but not the derivation. The parse tree which results from the modified grammar is effectively right-associative, so it will need to be reprocessed to recover the desired parse. This fact is rather briefly mentioned by the Dragon book authors:
Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. (My emphasis)
They go on to suggest that operator precedence parsing can be used for expressions, and then mention that if an LR parser generator is available, dividing the grammar into a predictive part and an operator-precedence part is no longer necessary.

Inconsistent behaviour with Haskell

I was reading about perceptrons and trying to implement one in Haskell. The algorithm seems to be working as far as I can test. I'm going to rewrite the code entirely at some point, but before doing so I thought I'd ask a few questions that arose while coding this.
The neuron can be trained when returning the complete neuron: let neuron = train set [1,1] works. But if I change the train function to return an incomplete neuron without the inputs, or try to pattern match and create only an incomplete neuron, the code falls into a never-ending loop.
tl;dr: when returning the complete neuron everything works, but when returning a curried neuron, the code falls into a loop.
module Main where

import System.Random

type Inputs = [Float]
type Weights = [Float]
type Threshold = Float
type Output = Float
type Trainingset = [(Inputs, Output)]

data Neuron = Neuron Threshold Weights Inputs deriving Show

output :: Neuron -> Output
output (Neuron threshold weights inputs) =
    if total >= threshold then 1 else 0
  where total = sum $ zipWith (*) weights inputs

rate :: Float -> Float -> Float
rate t o = 0.1 * (t - o)

newweight :: Float -> Float -> Weights -> Inputs -> Weights
newweight t o weight input = zipWith nw weight input
  where nw w x = w + (rate t o) * x

learn :: Neuron -> Float -> Neuron
learn on@(Neuron tr w i) t =
    let o = output on
    in Neuron tr (newweight t o w i) i

converged :: (Inputs -> Neuron) -> Trainingset -> Bool
converged n set = not $ any (\(i,o) -> output (n i) /= o) set

train :: Weights -> Trainingset -> Neuron
train w s = train' s (Neuron 1 w)

train' :: Trainingset -> (Inputs -> Neuron) -> Neuron
train' s n | not $ converged n set
             = let (Neuron t w i) = train'' s n
               in train' s (Neuron t w)
           | otherwise = n $ fst $ head s

train'' :: Trainingset -> (Inputs -> Neuron) -> Neuron
train'' ((a,b):[]) n = learn (n a) b
train'' ((a,b):xs) n =
    let (Neuron t w i) = learn (n a) b
    in train'' xs (Neuron t w)

set :: Trainingset
set = [
    ([1,0], 0),
    ([1,1], 1),
    ([0,1], 0),
    ([0,0], 0)
    ]

randomWeights :: Int -> IO [Float]
randomWeights n =
    do
      g <- newStdGen
      return $ take n $ randomRs (-1, 1) g

main = do
    w <- randomWeights 2
    let (Neuron t w i) = train w set
    print $ output $ (Neuron t w [1,1])
    return ()
Edit: As per comments, specifying a little more.
Running with the code above, I get:
perceptron: <<loop>>
But by editing the main method to:
main = do
    w <- randomWeights 2
    let neuron = train w set
    print $ neuron
    return ()
(Notice the let neuron and the print lines.) Everything works and the output is:
Neuron 1.0 [0.71345896,0.33792675] [1.0,0.0]
Perhaps I am missing something, but I boiled your test case down to this program:
module Main where

data Foo a = Foo a

main = do
    x ← getLine
    let (Foo x) = Foo x
    putStrLn x
This further simplifies to:
main = do
    x ← getLine
    let x = x
    putStrLn x
The problem is that binding (Foo x) to something that depends on x is a cyclic dependency. To evaluate x, we need to know the value of x. OK, so we just need to calculate x. To calculate x, we need to know the value of x. That's fine, we'll just calculate x. And so on. This isn't C, remember: it's binding, not assignment, and the binding is evaluated lazily.
Use better variable names, and it all works:
module Main where

data Foo a = Foo a

main = do
    line ← getLine
    let (Foo x) = Foo line
    putStrLn x
(The variable in question, in your case, is w: the pattern in let (Neuron t w i) = train w set rebinds w, so the w passed to train refers to the w being defined rather than to the freshly generated random weights. Renaming either binding breaks the cycle.)
This is a common mistake in Haskell. You cannot say things like:
let x = 0
let x = x + 1
And have it mean what it would in a language with assignment, or even non-recursive binding. The first line is irrelevant; it gets shadowed by the second line, which defines x in terms of itself, that is, recursively: x = ((((...)+1)+1)+1)+1, which will loop upon evaluation.
