Confidence of association rules - algorithm

Trying to understand an assignment I did not do correctly.
Assume all the (closed) frequent itemsets and their support counts are:
Support( {A, B, C, D} ) = 0.3
Support( {A, B, C} ) = 0.4
What is Conf(B -> ACD) = ?
What is Conf(A -> BCD) = ?
What is Conf(ABD -> C) = ?
What is Conf(BD -> AC) = ?
I was under the impression that for Conf(A -> BCD) I could just compute 0.4/0.3, which is obviously incorrect, as confidence cannot be greater than 1.
Could someone enlighten me?

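Confidence(B -> ACD) is the support of the whole combination divided by the support of the antecedent, i.e. Support(ABCD)/Support(B). You'll notice that Support(B) is not given explicitly, but you can infer the value via closedness: the support of an itemset equals the support of its smallest closed superset, which for B is {A, B, C} with support 0.4.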
So the result is 0.3/0.4 = 75%
Note that support is usually given in absolute terms, but of course that doesn't matter here.

I assume that you are using the following definitions for support and confidence formulated in an informal way.
Support(X) = Number of times the itemset appears together / Number of examples in the dataset
Conf(X -> Y) = Support(X union Y)/Support(X)
Applying the definition we have:
Conf(B -> ACD) = Support(ABCD)/Support(B)
Conf(A -> BCD) = Support(ABCD)/Support(A)
Conf(ABD -> C) = Support(ABCD)/Support(ABD)
Conf(BD -> AC) = Support(ABCD)/Support(BD)
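As a hedged sketch of how these four values can be worked out, the snippet below derives the missing supports from the two closed itemsets given in the question, using the property that the support of a frequent itemset equals the largest support among its closed supersets. The names `closed`, `support` and `confidence` are illustrative and not part of the original posts.

```python
# Closed frequent itemsets and their supports, taken from the question.
closed = {
    frozenset("ABCD"): 0.3,
    frozenset("ABC"): 0.4,
}

def support(itemset):
    """Support of an itemset = largest support among its closed supersets."""
    items = frozenset(itemset)
    candidates = [s for c, s in closed.items() if items <= c]
    if not candidates:
        raise ValueError(f"{''.join(sorted(items))} is not frequent")
    return max(candidates)

def confidence(antecedent, consequent):
    """Conf(X -> Y) = Support(X union Y) / Support(X)."""
    x = frozenset(antecedent)
    return support(x | frozenset(consequent)) / support(x)

for lhs, rhs in [("B", "ACD"), ("A", "BCD"), ("ABD", "C"), ("BD", "AC")]:
    print(f"Conf({lhs} -> {rhs}) = {confidence(lhs, rhs):.2f}")
```

Under these assumptions, B and A inherit their support from {A, B, C}, while ABD and BD are only contained in {A, B, C, D}, so the last two rules come out with confidence 1.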


CFG of Language which contains equal # of a's and b's

I've tried this
S -> e(Epsilon)
S -> SASBS
S -> SBSAS
A -> a
B -> b
Can someone verify if this is correct.
Your grammar is correct. Here is the proof.
First, we show that your grammar generates only strings with an equal number of a and b. Note that all productions with S on the LHS introduce an equal number of A as they do B. Therefore, any string of terminals derived from S will have an equal number of a and b.
Next, we show that all strings with an equal number of a and b can be derived using this grammar. We proceed using mathematical induction.
Base case: S -> e and both S -> SASBS -> ASBS -> aSBS -> aBS -> abS -> ab and S -> SBSAS -> BSAS -> bSAS -> bAS -> baS -> ba, so the three shortest strings in the language are generated by the grammar. There are no other strings in the language of length less than 4.
Induction hypothesis: all strings of length up to 2k in the language are generated by the grammar.
Inductive step: we must show that all strings of length 2(k + 1) in the language are also generated by the grammar.
Case 1: w = axb or w = bya for some strings x and y. Then x and y are strings of length 2k in the language and are therefore generated by the grammar. In this case, we can use the same derivation with an extra application of either S -> SASBS -> ASBS -> aSBS -> aSbS -> aSb or S -> SBSAS -> BSAS -> bSAS -> bSaS -> bSa and then use the derivation for x or y to complete the derivation, yielding w.
Case 2: w = axa or w = byb. Then x or y is a string with exactly two more b than a, or two more a than b, respectively. In this case, there must be a nonempty prefix p of w with |p| < |w| such that p is also a string in the language (see lemma below). If the prefix p is a word in the language, and w = pr, then r must also be a word in the language, so w must be the concatenation of two words in L. These words both have length less than |w|, so less than 2(k + 1), and are generated by the grammar. If they are generated by the grammar then they are of the form SaSbS or SbSaS and their concatenation can be derived using the grammar by using the productions in the proper sequence. That is, S -> SASBS -> SASBSBSAS -> aSbSbSa = aSbS bSa <- aSbS SbSa (we are of course free to choose S -> e in that last reverse step justification).
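As a sanity check that complements the proof (it does not replace it), a brute-force comparison for short strings can be sketched as follows. The function names `generated` and `balanced` and the length bound 6 are illustrative choices, not part of the original answer.

```python
from itertools import product

def generated(max_len):
    """Strings of length <= max_len derivable from S, by fixed-point iteration.

    With A -> a and B -> b inlined, the rules are
    S -> e | S a S b S | S b S a S.
    """
    lang = {""}
    while True:
        new = set(lang)
        for x, y, z in product(lang, repeat=3):
            if len(x) + len(y) + len(z) + 2 <= max_len:
                new.add(x + "a" + y + "b" + z)
                new.add(x + "b" + y + "a" + z)
        if new == lang:
            return lang
        lang = new

def balanced(max_len):
    """All strings over {a, b} of length <= max_len with equally many a's and b's."""
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product("ab", repeat=n)
        if w.count("a") == w.count("b")
    }

# The grammar and the intended language agree on every string up to length 6.
assert generated(6) == balanced(6)
```

The check only covers a finite prefix of the language, so it can catch a wrong grammar but cannot establish correctness on its own.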

Is there such a thing as maximumWith?

Specifically I'm searching for a function 'maximumWith',
maximumWith :: (Foldable f, Ord b) => (a -> b) -> f a -> a
Which behaves in the following way:
maximumWith length [[1, 2], [0, 1, 3]] == [0, 1, 3]
maximumWith null [[(+), (*)], []] == []
maximumWith (const True) x == head x
My use case is picking the longest word in a list.
For this I'd like something akin to maximumWith length.
I'd thought such a thing existed, since sortWith etc. exist.
Let me collect all the notes in the comments together...
Let's look at sort. There are 4 functions in the family:
sortBy is the actual implementation.
sort = sortBy compare uses Ord overloading.
sortWith = sortBy . comparing is the analogue of your desired maximumWith. However, this function has an issue. The ranking of an element is given by applying the given mapping function to it, but the ranking is not memoized, so if an element needs to be compared multiple times, the ranking will be recomputed. You can only use it guilt-free if the ranking function is very cheap. Such functions include selectors (e.g. fst) and newtype constructors. YMMV on simple arithmetic and data constructors. Between this inefficiency, the simplicity of the definition, and its location in GHC.Exts, it's easy to deduce that it's not used that often.
sortOn fixes the inefficiency by decorating each element with its image under the ranking function in a pair, sorting by the ranks, and then erasing them.
The first two have analogues in maximum: maximumBy and maximum. sortWith has no analogue; you may as well write out maximumBy (comparing _) every time. There is also no maximumOn, even though such a thing would be more efficient. The easiest way to define a maximumOn is probably just to copy sortOn:
maximumOn :: (Functor f, Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . maximumBy (comparing fst) . fmap annotate
  where annotate e = let r = rank e in r `seq` (r, e)
There's a bit of interesting code in maximumBy that keeps this from optimizing properly on lists. It also works to use
maximumOn :: (Foldable f, Ord r) => (a -> r) -> f a -> a
-- needs foldl' from Data.Foldable and fromJust from Data.Maybe
maximumOn rank = snd . fromJust . foldl' max' Nothing
  where max' Nothing x = let r = rank x in r `seq` Just (r, x)
        max' old@(Just (ro, _)) xn = let rn = rank xn
                                     in case ro `compare` rn of
                                          LT -> Just (rn, xn)
                                          _ -> old
These pragmas may be useful:
{-# SPECIALIZE maximumOn :: Ord r => (a -> r) -> [a] -> a #-}
{-# SPECIALIZE maximumOn :: (a -> Int) -> [a] -> a #-}
HTNW has explained how to do what you asked, but I figured I should mention that for the specific application you mentioned, there's a way that's more efficient in certain cases (assuming the words are represented by Strings). Suppose you want
longest :: [[a]] -> [a]
If you ask for maximumOn length [replicate (10^9) (), []], then you'll end up calculating the length of a very long list unnecessarily. There are several ways to work around this problem, but here's how I'd do it:
data MS a = MS
  { _longest :: [a]
  , _longest_suffix :: [a]
  , _longest_bound :: !Int }
We will ensure that longest is the first of the longest strings seen thus far, and that longest_bound + length longest_suffix = length longest.
step :: MS a -> [a] -> MS a
step (MS longest longest_suffix longest_bound) xs =
    go longest_bound longest_suffix xs'
  where
    -- the new list is not longer
    go n suffo [] = MS longest suffo n
    -- the new list is longer
    go n [] suffn = MS xs suffn n
    -- don't know yet (the bang on n needs the BangPatterns extension)
    go !n (_ : suffo) (_ : suffn) =
      go (n + 1) suffo suffn
    xs' = drop longest_bound xs
longest :: [[a]] -> [a]
longest = _longest . foldl' step (MS [] [] 0)
Now if the second to longest list has q elements, we'll walk at most q conses into each list. This is the best possible complexity. Of course, it's only significantly better than the maximumOn solution when the longest list is much longer than the second to longest.

Determine if relation is in BCNF form?

How do I determine if the following relation is in BCNF form?
R(U,V,W,X,Y,Z)
UVW ->X
VW -> YU
VWY ->Z
I understand that for a functional dependency A->B A must be a superkey. And the relation must be in 3NF form. But I am unsure how to apply the concepts.
By definition, a relation is in BCNF if, for each non-trivial dependency in F+ (that is, the specified dependencies F together with all those derived from them), the determinant is a superkey. Fortunately, there is a theorem that says that it is sufficient to perform this check only for the specified dependencies.
In your case this means that you must check if UVW, VW and VWY are superkeys.
To see whether, in a dependency X -> Y, the set of attributes X is a superkey, you can compute the closure of those attributes (X+) and check whether it contains all the attributes of the relation.
So you have to compute UVW+ and see if it contains {U,V,W,X,Y,Z} and similarly for the other two dependencies. I leave to you this simple exercise.
Using the algorithm to compute the closure of a given set of attributes and the definition of BCNF (shown in a figure in the original answer),
we can implement the check in Python: first compute the closure of a set of attributes, then determine whether that set forms a superkey, as shown in the following code snippet:
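def closure(s, fds):
    # One pass over the FDs: add the right-hand side of every FD
    # whose left-hand side is already contained in the closure.
    c = s
    for f in fds:
        l, r = f[0], f[1]
        if l.issubset(c):
            c = c.union(r)
    # Repeat until a pass adds nothing new (fixed point).
    if s != c:
        c = closure(c, fds)
    return c
def is_superkey(s, rel, fds):
    # s is a superkey iff its closure is the whole relation.
    c = closure(s, fds)
    print(f'({"".join(sorted(s))})+ = {"".join(sorted(c))}')
    return c == rel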
Now, to determine whether R is in BCNF, check for each given FD A -> B of R whether A is a superkey:
def is_in_BCNF(rel, fds):
    for fd in fds:
        l, r = fd[0], fd[1]
        isk = is_superkey(l, rel, fds)
        print(f'For the Functional Dependency {"".join(sorted(l))} -> {"".join(sorted(r))}, ' +\
              f'{"".join(sorted(l))} {"is" if isk else "is not"} a superkey')
        if not isk:
            print('=> R not in BCNF!')
            return False
    print('=> R in BCNF!')
    return True
To convert FDs written in the standard textual form into a suitable data structure, we can use the following function:
import re
def process_fds(fds):
    pfds = []
    for fd in fds:
        # Strip all whitespace, then split 'XY->Z' into its two sides.
        fd = re.sub(r'\s+', '', fd)
        l, r = fd.split('->')
        pfds.append([set(l), set(r)])
    return pfds
Now, let's test with a couple of relations:
relation = {'U','V','W','X','Y','Z'}
fds = process_fds(['UVW->X', 'VW->YU', 'VWY->Z'])
is_in_BCNF(relation, fds)
# (UVW)+ = UVWXYZ
# For the Functional Dependency UVW -> X, UVW is a superkey
# (VW)+ = UVWXYZ
# For the Functional Dependency VW -> UY, VW is a superkey
# (VWY)+ = UVWXYZ
# For the Functional Dependency VWY -> Z, VWY is a superkey
# => R in BCNF!
relation = {'A','B','C'}
fds = process_fds(['A -> BC', 'B -> A'])
is_in_BCNF(relation, fds)
# (A)+ = ABC
# For the Functional Dependency A -> BC, A is a superkey
# (B)+ = ABC
# For the Functional Dependency B -> A, B is a superkey
# => R in BCNF!

Syntax directed definition (count number of pairs of parentheses)

given the following grammar I have to find the appropriate semantic actions to calculate, for each string of the language, the number of pairs of parentheses in the string.
S -> (L)
S -> a
L -> L, S
L -> S
Usually, to perform this type of exercise, I build a derivation tree of a sample string and then I add the attributes. After that it is easier to find the semantic rules.
So I built this derivation tree for the string "((a, (a), a))", but I can't proceed with the resolution of the exercise. How do I count the pairs of parentheses? I'm not able to do that...
I don't want the solution, but I'd like someone to help me with the reasoning to be made in these cases.
(I'm sorry for the bad tree...)
The OP wrote:
These might be the correct semantic actions for this grammar?
S -> (L) {S.p = counter + 1}
S -> a {do nothing}
L -> L, S {L.p = S.p}
L -> S {L.p = S.p}
.p is a synthesized attribute.
S -> (S1) {S.count = S1.count + 1}
S -> S1 S2 {S.count = S1.count + S2.count}
S -> ϵ {S.count = 0}
This should make things clear
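To make the idea of a synthesized count concrete for the grammar in the question, here is a minimal sketch of a recursive-descent evaluator. The semantic rules in the comments are one possible choice and are an assumption, not something confirmed by the posts above; the name `count_pairs` is illustrative, and the left-recursive rule L -> L, S is unrolled into a loop.

```python
def count_pairs(text):
    """Number of parenthesis pairs in a string of S -> (L) | a, L -> L, S | S."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def parse_S():
        # S -> a      {S.p = 0}
        # S -> (L)    {S.p = L.p + 1}
        if pos < len(tokens) and tokens[pos] == "a":
            expect("a")
            return 0
        expect("(")
        p = parse_L()
        expect(")")
        return p + 1

    def parse_L():
        # L -> S      {L.p = S.p}
        # L -> L1, S  {L.p = L1.p + S.p}  (left recursion unrolled into a loop)
        nonlocal pos
        p = parse_S()
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            p += parse_S()
        return p

    result = parse_S()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(count_pairs("((a, (a), a))"))  # 3
```

Each function returns the attribute p of the nonterminal it parses, so the value flows strictly from children to parent, which is what makes p a synthesized attribute.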

Which way of these two pattern matching is more preferred?

I'm just curious, these two functions would do the same thing. But which one should I use?
let f a =
match a with
b -> a;;
let f a =
match a with
b -> b;;
Or it just depends on your preference?
I feel the second one would be better but I'm not sure.
Performance wise there is no difference. Style-wise b -> a is a bit problematic because you have an unused variable b. _ -> a would make more sense. Other than that, it's just preference.
Personally I would prefer _ -> a over b -> b because it doesn't introduce an extra variable.
PS: I assume in your real code there are more cases than just b - otherwise you could just write let f a = a.
Also, in your particular example I would rewrite using function
let f = function
| b -> b
