What is the difference between a trivial FD and two cyclic FDs? - relational-algebra

In the Complete Book by Ullman and Widom I've read that with two attributes (A and B) there are four cases for FDs. The second and third are A -> B and B -> A, so they are easier. But I don't understand the difference between the trivial dependency «B is a subset of A» and the cyclic FDs A -> B and B -> A. Aren't they the same?

With two attributes you have four cases:
1. A -> B (plus the trivial FDs A -> A and B -> B, which always hold)
2. B -> A (with the trivial FDs as above)
3. A -> B and B -> A (with the trivial FDs as above)
4. No non-trivial FDs: you only have the trivial FDs A -> A and B -> B, so the two attributes are independent.
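A "real-world" example of case 3 could be two attributes: SSN (social security number of a person) and passport_number of a person. Each one determines the other. Unlike the trivial FDs, which hold in every relation, this is a real constraint on the data: A-values and B-values must correspond one to one.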
An example of case 4 could be two attributes: SSN (social security number of a person) and book_title. The two attributes are completely independent. One does not imply the other.
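To see concretely why case 3 differs from the trivial FDs, the following Go sketch (not part of the answer; the Row type, the accessors attrA/attrB and the sample values are hypothetical) checks whether an FD X -> Y holds in one relation instance: it fails as soon as two rows agree on X but differ on Y. A trivial FD such as A -> A passes for any rows, while A -> B together with B -> A passes only when A-values and B-values pair up one to one. Note that an instance can only refute an FD; declaring an FD is a statement about every legal instance.

```go
package main

// Row is one tuple of a relation R(A, B); a hypothetical type for illustration.
type Row struct {
	A, B string
}

func attrA(r Row) string { return r.A }
func attrB(r Row) string { return r.B }

// holds reports whether the FD from -> to is satisfied by this instance:
// no two rows may agree on the "from" attribute but differ on "to".
func holds(rows []Row, from, to func(Row) string) bool {
	seen := make(map[string]string) // from-value -> to-value seen with it
	for _, r := range rows {
		x, y := from(r), to(r)
		if prev, ok := seen[x]; ok && prev != y {
			return false
		}
		seen[x] = y
	}
	return true
}
```

For example, with the rows {"111", "Dune"} and {"111", "Emma"}, holds(rows, attrA, attrB) returns false, while holds(rows, attrA, attrA) returns true.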

Related

Algorithm for allowing concurrent walk of a graph

In a directed acyclic graph describing a set of tasks to process, I need to find all tasks that can be processed concurrently. The graph has no loops and is quite small (~1000 nodes, ~2000 edges); performance is not a primary concern.
Examples with desired result:
[] is a group. All tasks in a group must be processed before continuing
[x & y] means x and y can be processed concurrently (x and y in parallel)
x -> y means x and y must be processed sequentially (x before y)
Example 1: a -> [b & c] -> c
Example 2: [a & e] -> b -> c -> [d & f]
Example 3: [ [a -> b] & [e -> f] ] -> [ [c -> d] & g ]
I do not want to actually execute the graph, but rather build a data structure that is as parallel as possible while maintaining the order. The nomenclature and names of algorithms are not that familiar to me, so I'm having a hard time finding similar problems/solutions online.
Mathematically, I would frame this problem as finding a minimally defined series-parallel partial order extending the given partial order.
I would start by transitively reducing the graph and repeatedly applying two heuristics.
1. If x has one dependent y, and y has one dependency x, merge them into a new node z = [x → y].
2. If x and y have the same dependencies and dependents, merge them into a new node z = [x & y].
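The transitive-reduction step that this answer starts with can be sketched in Go as below. It is a simple approach that is fine for the question's ~1000 nodes: an edge u → v is dropped when v is still reachable from some other direct successor of u. The adjacency-map Graph type is an assumption, and the code assumes the input is acyclic, as the question states.

```go
package main

import "sort"

// Graph maps each node to its direct successors (dependents); assumed representation.
type Graph map[string][]string

// reachable reports whether target can be reached from start by following edges.
func reachable(g Graph, start, target string) bool {
	stack := []string{start}
	visited := map[string]bool{start: true}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if n == target {
			return true
		}
		for _, m := range g[n] {
			if !visited[m] {
				visited[m] = true
				stack = append(stack, m)
			}
		}
	}
	return false
}

// transitiveReduction removes every edge u -> v for which v is still
// reachable from u through another direct successor w of u.
// For a DAG the result is unique.
func transitiveReduction(g Graph) Graph {
	out := Graph{}
	for u, succs := range g {
		var kept []string
		for _, v := range succs {
			redundant := false
			for _, w := range succs {
				if w != v && reachable(g, w, v) {
					redundant = true
					break
				}
			}
			if !redundant {
				kept = append(kept, v)
			}
		}
		sort.Strings(kept)
		out[u] = kept
	}
	return out
}
```

For a → b, a → c, b → c, only a → b and b → c survive.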
Now, if the input is already series-parallel, the result will be one node. In general, however, this will leave a graph that embeds an N-shaped structure like b → c, b → g, f → g from the last example in the question. This structure must be addressed by adding one or more of b → f, c → f, c → g, f → b, f → c, g → c. But in a different instance, this act would in turn create new N-shaped structures. There's no obvious notion of a closure, which is why this problem feels hard to me.
Some of these choices seem worse than others. For example, c → f forces the sequence b → c → f → g, whereas f → c is the only choice that doesn't increase the length of the critical path within that N-shaped structure.
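To compare candidate edges by critical-path length, as this answer does, one option is a memoized longest-path computation. The Go sketch below counts nodes on the longest path of an acyclic graph; Graph, criticalPath and withEdge are hypothetical names. Run on the N-shaped structure b → c, b → g, f → g alone, it gives 2 nodes with f → c added and 4 with c → f added (b → c → f → g).

```go
package main

// Graph maps each node to its direct successors; assumed representation.
type Graph map[string][]string

// criticalPath returns the number of nodes on the longest path of an
// acyclic graph. Nodes that appear only as successors are still visited
// through their predecessors.
func criticalPath(g Graph) int {
	memo := map[string]int{}
	var longestFrom func(n string) int
	longestFrom = func(n string) int {
		if v, ok := memo[n]; ok {
			return v
		}
		best := 1 // the path consisting of n alone
		for _, m := range g[n] {
			if l := 1 + longestFrom(m); l > best {
				best = l
			}
		}
		memo[n] = best
		return best
	}
	longest := 0
	for n := range g {
		if l := longestFrom(n); l > longest {
			longest = l
		}
	}
	return longest
}

// withEdge returns a copy of g with the extra edge from -> to.
func withEdge(g Graph, from, to string) Graph {
	out := Graph{}
	for n, succs := range g {
		out[n] = append([]string(nil), succs...)
	}
	out[from] = append(out[from], to)
	return out
}
```

Measured on the whole third example (adding a → b, e → f and c → d) instead, b → f and c → g also keep the longest path at 4 nodes (a → b → c → d), so the comparison depends on which part of the graph you look at.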
I guess what I'd try is this:
If heuristics 1 and 2 have no targets, form a graph with edges x--y if and only if x and y have either a common dependent or a common dependency, compute the connected components of this graph, and &-merge the smallest component that isn't a singleton, followed by another transitive reduction.
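The fallback step can be sketched with a union-find in Go: two nodes are joined when they share a direct dependency or a direct dependent, and the resulting components are returned sorted. Picking the smallest non-singleton component, doing the &-merge and re-running the transitive reduction are left out; the function and type names are hypothetical.

```go
package main

import "sort"

// Graph maps each node to its direct successors (dependents); assumed representation.
type Graph map[string][]string

// siblingComponents joins two nodes when they share a direct dependency or a
// direct dependent, and returns the connected components of that relation.
// Each component is sorted; components are ordered by their first node.
func siblingComponents(g Graph) [][]string {
	parent := map[string]string{}
	var find func(x string) string
	find = func(x string) string {
		if _, ok := parent[x]; !ok {
			parent[x] = x
		}
		if parent[x] != x {
			parent[x] = find(parent[x]) // path compression
		}
		return parent[x]
	}
	union := func(x, y string) { parent[find(x)] = find(y) }

	preds := map[string][]string{} // node -> its direct dependencies
	for u, succs := range g {
		find(u)
		for _, v := range succs {
			find(v)
			preds[v] = append(preds[v], u)
		}
		// All successors of u share the dependency u.
		for i := 1; i < len(succs); i++ {
			union(succs[0], succs[i])
		}
	}
	// All predecessors of v share the dependent v.
	for _, ps := range preds {
		for i := 1; i < len(ps); i++ {
			union(ps[0], ps[i])
		}
	}

	var nodes []string
	for x := range parent {
		nodes = append(nodes, x)
	}
	groups := map[string][]string{}
	for _, x := range nodes {
		r := find(x)
		groups[r] = append(groups[r], x)
	}
	var comps [][]string
	for _, c := range groups {
		sort.Strings(c)
		comps = append(comps, c)
	}
	sort.Slice(comps, func(i, j int) bool { return comps[i][0] < comps[j][0] })
	return comps
}
```

On the N-shaped structure b → c, b → g, f → g it returns the components {b, f} (common dependent g) and {c, g} (common dependency b).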
Here's a solution I came up with (pseudocode):
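sequence = []
// depth = length of the longest path from any source to node (plain DFS discovery
// depth is not enough: it can put a task in the same group as one of its prerequisites)
for each (node, depth) in depthFirstSearch(graph)
    sequence[depth].push(node)
return sequence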
The sequence defines the order to process the graph. If an item in it contains more than one node, they can be processed concurrently.
While this allows for some concurrency, it does not advance as fast as it could. For example, f in the 3rd example in the question would require a to be completed first (f is at depth 1, while a and e are at depth 0). Ideally, work on f could start as soon as e is done.
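The grouping idea can be sketched in Go with Kahn's algorithm: each task's depth is the length of the longest path reaching it from any source, so no group contains two tasks that depend on each other. This is an illustrative implementation, not the answerer's code; the Graph type and the levels function are assumptions.

```go
package main

import "sort"

// Graph maps each task to the tasks that depend on it; assumed representation.
type Graph map[string][]string

// levels groups tasks by longest-path depth from any source. Tasks in one
// group have no path between them, so a group may run concurrently; groups
// run one after another. Returns nil if the graph has a cycle.
func levels(g Graph) [][]string {
	indeg := map[string]int{}
	for u, succs := range g {
		if _, ok := indeg[u]; !ok {
			indeg[u] = 0
		}
		for _, v := range succs {
			indeg[v]++
		}
	}
	depth := map[string]int{}
	var queue []string
	for n, d := range indeg {
		if d == 0 {
			queue = append(queue, n)
		}
	}
	processed := 0
	var groups [][]string
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		processed++
		for len(groups) <= depth[u] {
			groups = append(groups, nil)
		}
		groups[depth[u]] = append(groups[depth[u]], u)
		for _, v := range g[u] {
			if depth[u]+1 > depth[v] {
				depth[v] = depth[u] + 1
			}
			indeg[v]--
			if indeg[v] == 0 {
				queue = append(queue, v)
			}
		}
	}
	if processed != len(indeg) {
		return nil // cycle: some tasks never reached in-degree 0
	}
	for _, grp := range groups {
		sort.Strings(grp)
	}
	return groups
}
```

On the third example (a → b, e → f, b → c, b → g, f → g, c → d) it returns [a e], [b f], [c g], [d], which shows the limitation described above: f shares a group with b and therefore waits for a.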

What is a data structure suited for representing railways with turnouts?

I'm trying to represent the paths in a railroad as a data structure but I am having a hard time representing turnouts.
This feels like a graph problem, but there is a difference compared to regular graphs.
A railway turnout is a vertex connected to three other vertices: A, B and C.
But, in a railway system the graph is traversed with a direction.
So, you are able to take the path B -> turnout -> A and C -> turnout -> A, but are not able to take the path B -> turnout -> C.
Is there a (graph) data structure which allows for representing paths with directions?
This data structure would provide the base for a software system to automate a small model railroad.
You can represent the turnout as two vertices, one for each state of the turnout. So if you have source A, destinations B and C, and a turnout that can switch between B and C, you will have two vertices for this turnout, TB and TC, and the following edges: A->TB, TB->B, A->TC, TC->C.
This allows you to travel from A -> TB -> B and from A -> TC -> C. And since there is no edge between TB and TC, you will not be able to travel from B to C through the turnout.
Each path can be considered as a vertex and a connection between two paths as an edge:
B -> A
C -> A
This can be represented as a graph using a Go map. In your example, directional connections exist from B -> A and C -> A, which can be represented in a map as follows:
graph := map[string][]string{
	"B": {"A"},
	"C": {"A"},
}
Each key in the map represents the starting path of a directional connection. Each value in the slice for that key is a destination path.
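Building on that map, a route (a sequence of paths) is allowed only if every consecutive pair is one of the listed connections. A short Go sketch; connected and validRoute are hypothetical helper names.

```go
package main

// graph is the answer's map: key = starting path, values = destination paths.
var graph = map[string][]string{
	"B": {"A"},
	"C": {"A"},
}

// connected reports whether a directional connection from -> to exists.
func connected(g map[string][]string, from, to string) bool {
	for _, dest := range g[from] {
		if dest == to {
			return true
		}
	}
	return false
}

// validRoute reports whether every consecutive pair in route is a connection.
func validRoute(g map[string][]string, route []string) bool {
	for i := 0; i+1 < len(route); i++ {
		if !connected(g, route[i], route[i+1]) {
			return false
		}
	}
	return true
}
```

validRoute(graph, []string{"B", "A"}) returns true, while validRoute(graph, []string{"B", "C"}) returns false.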

Chomsky Normal form removing epsilon transitions

I'm working on converting a CFG to Chomsky Normal Form but I'm having some difficulty.
I have this CFG
A-> BAB|B|epsilon
B -> 00|epsilon
OK, I add a new start symbol:
S -> A
A-> BAB|B|epsilon
B -> 00|epsilon
Then I have to remove the epsilon productions, so I start with B:
S -> A
A-> BAB|B|AB|BA|A|epsilon
B -> 00
How do I then remove the epsilon from A? Can the start have an epsilon in it? And how do I convert A-> A?
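You can't convert this grammar to an equivalent one without ε: every nonterminal here is nullable (B -> ε, and A -> ε directly), so S -> A derives ε and ε is a valid sentence in the language. Under a definition of Chomsky Normal Form that forbids ε-productions entirely, the grammar therefore cannot be written in CNF. Many textbook definitions do allow the single rule S -> ε for a start symbol that never appears on a right-hand side (one reason for adding a new start symbol first); under that definition you keep S -> ε and remove ε from every other rule.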
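The ε-removal step from the question (rewriting each right-hand side into every variant obtained by deleting some subset of its nullable occurrences) can be sketched in Go. For brevity it assumes one byte per grammar symbol and uses the empty string for ε; variants is a hypothetical name. With only B nullable it turns A -> BAB into BAB | AB | BA | A, as in the question; once A is nullable as well, S -> A also yields S -> ε, which is the point the answer makes.

```go
package main

import "sort"

// variants returns every string obtained from rhs by optionally deleting
// each occurrence of a nullable symbol (one byte per grammar symbol).
// The empty string stands for ε. Results are deduplicated and sorted.
func variants(rhs string, nullable map[byte]bool) []string {
	partial := []string{""}
	for i := 0; i < len(rhs); i++ {
		sym := rhs[i]
		var next []string
		for _, p := range partial {
			next = append(next, p+string(sym)) // keep the symbol
			if nullable[sym] {
				next = append(next, p) // drop the nullable symbol
			}
		}
		partial = next
	}
	seen := map[string]bool{}
	var out []string
	for _, p := range partial {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}
```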

Calculation of combinations/cartesian product of sets (without duplicates and without order restrictions)

I have a combinatorial problem that can be solved inefficiently using the cartesian
product of multiple sets. Concretely, I have multiple items and multiple elements that
satisfy each item. The problem consists of finding all possible combinations of elements
that satisfy all items. For example:
items -> elements
------------------------
1 -> {a,b} // a and b cover the item 1
2 -> {a,b} // a and b cover the item 2
3 -> {a,b,c} // a, b and c cover the item 3
4 -> {a,b,e,f} // a, b, e, f cover the item 4
Alternative representation:
element -> items covered
------------------------
a -> {1,2,3,4}
b -> {1,2,3,4}
c -> {3}
e -> {4}
f -> {4}
The goal is to find all combinations that cover items 1,2,3,4.
Valid solutions are:
{a},{a,b},{a,c},{a,e},{a,f},{a,b,c},{a,b,e},{a,b,f},{a,b,c,e},{a,b,c,f}
{b},{b,c},{b,e},{b,f},{b,c,e},{b,c,f}
Note that the order is not important, so {a,b} = {b,a} ({a,b} x {c,d} = {c,d} x {a,b}).
Also, note that {a,a,a,a}, {a,a,a,b}... are redundant combinations.
As you can see, this problem is similar to the set cover problem, where the universe
of elements for this example is the set of items U={1,2,3,4} and the set of subsets of U is S={ab={1,2,3,4},c={3},ef={4}}, where {1,2,3,4} is the set of items covered by the elements a and b, {3} is the set of items covered by c, and {4} is the set of items covered by the elements e and f. However, the goal here is not finding the
minimal combination of sets from S that covers all elements from U, but finding all combinations of elements {a,b,c,e,f} that cover all items {1,2,3,4}.
A naive implementation could be done by performing a cartesian product between
sets for 1,2,3 and 4, and then filtering the combinations that are redundant. However,
this approach is very inefficient. Suppose I have this situation:
1 -> {a,b,c,d,e,f,g,h}
2 -> {a,b,c,d,e,f,g,h}
3 -> {a,b,c,d,e,f,g,h}
4 -> {a,b,c,d,e,f,g,h}
5 -> {a,b,c,d,e,f,g,h}
6 -> {a,b,c,d,e,f,g,h,i}
A cartesian product of the six sets results in 8^5*9 = 294912 combinations,
when there are actually far fewer, which are: {a,b,c,d,e,f,g,h} U {a,b,c,d,e,f,g,h} x {i}.
Another way to solve this problem is to enumerate all elements, skipping
the combinations that are equivalent to other previously generated, and also
skipping repeated elements. This is kinda easy to compute and can be implemented
as an iterator that returns a combination at a time, but I don't know if there is
a better way to solve this problem, or if this problem was studied before.
How would you solve this problem?
First, realize that if a set of elements does not satisfy all items, neither does any of its subsets.
Second, realize that if a set satisfies all items, so do all its supersets.
Now, all you have to do is:
Let S be the set of all elements.
Let R be the empty set.
Define a function find(s, r) which does:
    If r includes s, return r.
    If s does not satisfy all items, return r.
    Otherwise add s to r.
    For every element e in s:
        let s' be s - {e}
        let r be find(s', r)
    return r.
Just call find(S,R) and you have your answer.
This method performs some duplicate tests, but it always cuts off a branch as soon as it reaches a set that is already in r or that does not satisfy all items. This leads to a lot of pruning on a large set of elements.
Both the lookup of whether r includes a particular set of elements and the check whether s satisfies all items can be made very fast at the expense of extra memory.
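A Go sketch of find, under two assumptions not in the answer: there are at most 64 elements, so a set of elements fits in a uint64 bitmask, and each element's covered items are likewise given as a bitmask (the "element -> items covered" view from the question). The visited set r is a Go map keyed by bitmask, one way to get the fast lookup the answer mentions.

```go
package main

import "math/bits"

// covers reports whether the elements in s (bit i = element i) together
// cover every item in all, where coverage[i] is the item bitmask of element i.
func covers(s uint64, coverage []uint64, all uint64) bool {
	var got uint64
	for rest := s; rest != 0; rest &= rest - 1 {
		got |= coverage[bits.TrailingZeros64(rest)]
	}
	return got&all == all
}

// find follows the answer's recursion: record s if it covers all items,
// then try every subset obtained by removing one element. A set already in r,
// or one that fails to cover, is cut off together with all of its subsets.
func find(s uint64, r map[uint64]bool, coverage []uint64, all uint64) {
	if r[s] || !covers(s, coverage, all) {
		return
	}
	r[s] = true
	for rest := s; rest != 0; rest &= rest - 1 {
		e := rest & -rest // lowest remaining element as a one-bit mask
		find(s&^e, r, coverage, all)
	}
}
```

With the question's first example encoded as a, b, c, e, f = bits 0..4 and items 1..4 = bits 0..3, calling find on all five elements leaves 24 covering sets in r, the same count as the Haskell output further down.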
What if you did this:
1 -> {a,b}
2 -> {b,c}
3 -> {a,b,c}
4 -> {a,e,f}
=>
a -> [1,3,4]
b -> [1,2,3]
c -> [2,3]
e -> [4]
f -> [4]
Then enumerate the combinations of the left side that provide (at least) [1,2,3,4]
For each item in the set of all-satisfying sets, enumerate combinations
with other items.
All-Satisfying-Sets: {{a,b},{a,c},{b,e},{b,f}}
Combinations with other elements: {{a,b,c},{a,b,e},{a,b,f},{a,c,e},{a,c,f},{b,c,e},{b,c,f},{b,e,f}
,{a,b,c,e},{a,b,c,f},{a,b,e,f},{a,c,e,f},{b,c,e,f},{a,b,c,e,f}}
Others: none, since every element already appears in some all-satisfying set.
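A brute-force Go sketch for checking the first step of this approach: it walks all 2^n subsets of this answer's example (a -> [1,3,4], b -> [1,2,3], c -> [2,3], e -> [4], f -> [4]) and keeps the covering sets from which no single element can be removed. The bit encoding and the names are assumptions; it is meant only for small inputs.

```go
package main

// Elements a, b, c, e, f are bits 0..4; items 1..4 are bits 0..3.
// coverage[i] is the set of items covered by element i (this answer's example).
var coverage = []uint64{
	0b1101, // a -> [1,3,4]
	0b0111, // b -> [1,2,3]
	0b0110, // c -> [2,3]
	0b1000, // e -> [4]
	0b1000, // f -> [4]
}

const allItems = 0b1111

// covers reports whether the elements in s together cover all items.
func covers(s uint64) bool {
	var got uint64
	for i := range coverage {
		if s&(1<<uint(i)) != 0 {
			got |= coverage[i]
		}
	}
	return got&allItems == allItems
}

// minimalCovers returns every covering set that stops covering when any
// single element is removed, plus the total number of covering sets.
func minimalCovers() (minimal []uint64, total int) {
	n := uint(len(coverage))
	for s := uint64(0); s < 1<<n; s++ {
		if !covers(s) {
			continue
		}
		total++
		isMinimal := true
		for i := uint(0); i < n; i++ {
			if s&(1<<i) != 0 && covers(s&^(1<<i)) {
				isMinimal = false
				break
			}
		}
		if isMinimal {
			minimal = append(minimal, s)
		}
	}
	return minimal, total
}
```

For this example it reports four minimal sets, {a,b}, {a,c}, {b,e} and {b,f}, and 18 covering sets in total.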
Or you could do this in Haskell:
import Data.List (union, subsequences, sort)
example1 = [(["a"],[1,2,3,4])
           ,(["b"],[1,2,3,4])
           ,(["c"],[3])
           ,(["e"],[4])
           ,(["f"],[4])]
example2 = [(["a"],[1,2,3,4,5,6])
           ,(["b"],[1,2,3,4,5,6])
           ,(["c"],[1,2,3,4,5,6])
           ,(["e"],[1,2,3,4,5,6])
           ,(["f"],[1,2,3,4,5,6])
           ,(["g"],[1,2,3,4,5,6])
           ,(["h"],[1,2,3,4,5,6])
           ,(["i"],[6])]
combs items list =
  let unify (a,b) (a',b') = (sort (a ++ a'), sort (union b b'))
  in map fst
     . filter ((==items) . snd)
     . map (foldr unify ([],[]))
     . subsequences
     $ list
OUTPUT:
*Main> combs [1..4] example1
[["a"],["b"],["a","b"],["a","c"],["b","c"],["a","b","c"],["a","e"],["b","e"],
["a","b","e"],["a","c","e"],["b","c","e"],["a","b","c","e"],["a","f"],["b","f"],
["a","b","f"],["a","c","f"],["b","c","f"],["a","b","c","f"],["a","e","f"],
["b","e","f"],["a","b","e","f"],["a","c","e","f"],["b","c","e","f"],
["a","b","c","e","f"]]

Computing the Follow Set

Ok, I've understood how to compute the Follow_k(N) set (N is a nonterminal): for every production rule of the form A -> aBc you add First_k(First_k(c)Follow_k(A)) to Follow_k(B) (a, c are any group of terminals and nonterminals, or even lambda). ...and you repeat this until there's nothing left to add.
But what happens for production rules like: S -> ABCD (A, B, C, D are all nonterminals)?
Should I
add First_k(First_k(BCD)Follow_k(S)) to Follow_k(A) or
add First_k(First_k(CD)Follow_k(S)) to Follow_k(B) or
add First_k(First_k(D)Follow_k(S)) to Follow_k(C) or
add First_k(First_k(lambda)Follow_k(S)) to Follow_k(D) or
do all of the above?
UPDATE:
Let's take the following grammar for example:
S -> ABC
A -> a
B -> b
C -> c
Intuitively, Follow_1(S) = {} because nothing follows after S
Follow_1(A) = {b} because b follows after A,
Follow_1(B) = {c} because c follows after B,
Follow_1(C) = {} because nothing follows after C.
In order to get this result using the algorithm you must consider all cases for S -> ABC.
But my judgement or example may not be right so the question still remains open...
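The First_k(First_k(c)Follow_k(A)) expression is a k-truncated concatenation: concatenate every string of the first set with every string of the second, then keep at most the first k symbols. A minimal Go sketch, assuming one byte per terminal and the empty string for lambda; concatK is a hypothetical name.

```go
package main

import "sort"

// concatK returns { first k symbols of x+y : x in xs, y in ys }, sorted.
// The empty string represents lambda.
func concatK(xs, ys []string, k int) []string {
	seen := map[string]bool{}
	var out []string
	for _, x := range xs {
		for _, y := range ys {
			s := x + y
			if len(s) > k {
				s = s[:k]
			}
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	sort.Strings(out)
	return out
}
```

When c is lambda, First_k(c) is {lambda} and the result is just Follow_k(A), which is why the last symbol D of S -> ABCD gets Follow_k(S).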
If you run into trouble on other grammar problems like this, give this online first, follow, & predict set finder a shot. It's automatic and you can compare answers to its output to get a feel for how to work through these.
But what happens for production rules like: S -> ABCD (A, B, C, D are all nonterminals)?
Here are the rules for finding follow sets.
1. First put $ (the end-of-input marker) in Follow(S) (S is the start symbol).
2. If there is a production A → aBb (where a and b can be whole strings), then everything in FIRST(b) except for ε is placed in FOLLOW(B).
3. If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B).
4. If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Let's use your example grammar:
S -> ABC
A -> a
B -> b
C -> c
Rule 1 says that follow(S) contains $.
Rule 2 gives us: follow(A) contains first(B); also, follow(B) contains first(C).
Rule 3 says that follow(C) contains follow (S).
None of your nonterminals are nullable, so we don't care about rule #4. A nonterminal is nullable if it derives ε directly or has a production whose right-hand side consists entirely of nullable nonterminals.
Nullability's transitivity can trip people up. Consider this grammar:
S -> A
A -> B
B -> ε
Since B derives ε, B's nullable. Since A derives B, which derives ε, A's nullable too. S derives A, which derives B, which derives ε, so S is nullable as well.
Granted, you didn't bring that up, but it's a common source of confusion in compiler courses, so I figured I'd lay it out.
Also, if you need some sample grammars to work through, http://faculty.stedwards.edu/laurab/cosc4342/g1answers.txt might be handy.
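The four rules, together with the nullable fixed point described above, can be combined into one loop that repeats until no set changes. A Go sketch under simplifying assumptions: one byte per grammar symbol, upper-case letters are nonterminals, every nonterminal has at least one production, the empty string encodes ε, and '$' is the end marker. The names Grammar, analyze and sorted are hypothetical; FIRST sets are computed along the way.

```go
package main

import "sort"

// Grammar maps a nonterminal to its right-hand sides; "" encodes ε.
// Symbols are single bytes: upper-case letters are nonterminals.
type Grammar map[byte][]string

type sets map[byte]map[byte]bool

func isNonterminal(c byte) bool { return c >= 'A' && c <= 'Z' }

// analyze computes the nullable nonterminals and FOLLOW_1 sets by applying
// the rules repeatedly until nothing changes.
func analyze(g Grammar, start byte) (map[byte]bool, sets) {
	nullable := map[byte]bool{}
	first, follow := sets{}, sets{}
	for nt := range g {
		first[nt], follow[nt] = map[byte]bool{}, map[byte]bool{}
	}
	follow[start]['$'] = true // rule 1: end marker follows the start symbol

	changed := true
	// add copies src into dst and records whether anything was new.
	add := func(dst, src map[byte]bool) {
		for c := range src {
			if !dst[c] {
				dst[c] = true
				changed = true
			}
		}
	}
	// firstOf returns FIRST(s) without ε and whether s can derive ε.
	firstOf := func(s string) (map[byte]bool, bool) {
		out := map[byte]bool{}
		for i := 0; i < len(s); i++ {
			c := s[i]
			if !isNonterminal(c) {
				out[c] = true
				return out, false
			}
			for t := range first[c] {
				out[t] = true
			}
			if !nullable[c] {
				return out, false
			}
		}
		return out, true
	}

	for changed {
		changed = false
		for a, rhss := range g {
			for _, rhs := range rhss {
				f, nul := firstOf(rhs)
				add(first[a], f)
				if nul && !nullable[a] {
					nullable[a] = true
					changed = true
				}
				for i := 0; i < len(rhs); i++ {
					b := rhs[i]
					if !isNonterminal(b) {
						continue
					}
					restFirst, restNullable := firstOf(rhs[i+1:])
					add(follow[b], restFirst) // rule 2
					if restNullable {         // rules 3 and 4
						add(follow[b], follow[a])
					}
				}
			}
		}
	}
	return nullable, follow
}

// sorted renders a set of terminals as a sorted string, e.g. "$b".
func sorted(set map[byte]bool) string {
	var bs []byte
	for c := range set {
		bs = append(bs, c)
	}
	sort.Slice(bs, func(i, j int) bool { return bs[i] < bs[j] })
	return string(bs)
}
```

For S -> ABC, A -> a, B -> b, C -> c it gives FOLLOW(S) = {$}, FOLLOW(A) = {b}, FOLLOW(B) = {c} and FOLLOW(C) = {$}; for S -> A, A -> B, B -> ε it marks S, A and B as nullable.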
