Functional dependency: find candidate key

Consider the following relation R = (ABCDEFGH), on which this set of functional dependencies holds: {BE -> GH, G -> FA, D -> C, F -> B}. What is a candidate key of R?
I found that the answer can be BED, DEG or FED. However, this is a single-answer multiple-choice question, so I can't select more than one answer...
I hope someone can give me the correct answer and the method they used.

Let me explain how to find candidate keys in a simple way:
Form three columns: left, middle and right.
In the left column, add the attributes that appear only on the left-hand side of an FD.
In the right column, add the attributes that appear only on the right-hand side of an FD.
In the middle column, add the attributes that appear on both the left- and right-hand sides of FDs.
Explanation:
Attributes in the left column must be included in every candidate key, because no FD can derive them.
Attributes in the right column are never included in a candidate key, because removing them from any superkey still leaves a superkey.
Attributes in the middle column may or may not be included in a candidate key.
In this example, we get D, E in the left column, B, F, G in the middle column and A, C, H in the right column.
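Then apply the attribute closure:
BED+ = ABCDEFGH
since
BE -> GH (given)
BE -> FA (decomposition and transitivity: BE -> G and G -> FA)
BED -> C (since D -> C is given)
BED is also minimal: BE+ = ABEFGH, BD+ = BCD and DE+ = CDE, none of which contains every attribute.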
Similarly, DEF and DEG can be shown to be candidate keys.
Thus, all three are candidate keys.
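The closure argument can be automated. Below is a minimal sketch, assuming each FD is encoded as a hypothetical (lhs, rhs) pair of attribute strings; `closure` and `candidate_keys` are illustrative names. It starts from the left-column attributes D, E, adds subsets of the middle column B, F, G smallest first, and keeps only superkeys that contain no smaller key already found.

```python
from itertools import combinations

FDS = [("BE", "GH"), ("G", "FA"), ("D", "C"), ("F", "B")]
ALL = set("ABCDEFGH")


def closure(attrs, fds):
    """Attribute closure: apply FDs whose left side is covered until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(core, middle, fds, all_attrs):
    """Try core + subsets of the middle column, smallest first; keep minimal superkeys."""
    keys = []
    for size in range(len(middle) + 1):
        for extra in combinations(sorted(middle), size):
            cand = set(core) | set(extra)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys
```

`closure("BED", FDS) == ALL` reproduces BED+ = ABCDEFGH, and `candidate_keys("DE", "BFG", FDS, ALL)` finds exactly BDE, DEF and DEG.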
Hope it helps!

The set of candidate keys is {BDE, DEF, DEG}.
A relation can have multiple candidate keys. However you can choose only one of them as a primary key.
This link gives an explanation of how the candidate keys are evaluated for your example.

Related

understanding constraint satisfaction problem: map coloring algorithm

I am trying to implement this recursive-backtracking function for a constraint satisfaction problem from the given algorithm:
function BACKTRACKING-SEARCH(csp) returns solution/failure
    return RECURSIVE-BACKTRACKING({}, csp)
function RECURSIVE-BACKTRACKING(assignment, csp) returns soln/failure
    if assignment is complete then return assignment
    var <- SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)
    for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
        if value is consistent with assignment given CONSTRAINT[csp] then
            add {var = value} to assignment
            result <- RECURSIVE-BACKTRACKING(assignment, csp)
            if result != failure then return result
            remove {var = value} from assignment
    return failure
The input for csp in BACKTRACKING-SEARCH(csp) is a csp class that contains a) a list of states, b) the list of colors, and c) an ordered dictionary that maps each state to the list of its neighbors, which cannot have the same color as that state.
The problem is that I am having a hard time understanding how the algorithm works. If anyone can give me a proper explanation of this algorithm, it would be very much appreciated. Some specific questions I have are:
if assignment is complete then return assignment
I assume that since assignment is passed in as an empty dictionary {}, this will return the solution, that is, the dictionary that contains the states and their colors. However, I don't understand how I can check whether the assignment is complete. Would it be something like checking the size of the dictionary against the number of states?
var <- SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp],assignment,csp)
The input csp class contains a list of states; I assume var could just be a value popped off that list? I guess what's confusing me is that I'm not sure what the parameters (VARIABLES[csp], assignment, csp) are doing, given my input.
for each value in ORDER-DOMAIN-VALUES(var,assignment,csp) do
Again, I'm confused about what the inputs (var, assignment, csp) are doing exactly. But I assume it will go through each value (neighbor) in the state's dictionary entry?
if value is consistent with assignment given CONSTRAINT[csp] then
    add {var = value} to assignment
    result <- RECURSIVE-BACKTRACKING(assignment, csp)
    if result != failure then return result
    remove {var = value} from assignment
How do I properly check whether value is consistent with assignment given CONSTRAINT[csp]? I assume the constraints should be part of my csp class, which I haven't implemented yet? I don't understand what this if statement is checking. It would be quite useful if someone could clearly explain this if statement and its body in depth.
So after revisiting some college literature (Russell and Norvig's Artificial Intelligence: A Modern Approach), it turns out the problem in your hands is an application of recursive backtracking to the Graph Coloring Problem, also called Map Coloring (given its history as the problem of minimizing the number of colors needed to draw a map). Replacing each country on a map with a node and each shared border with an edge gives you a graph to which we can apply recursive backtracking to find a solution.
Recursive backtracking descends the graph nodes as a depth-first tree search, checking at each node whether a color can be used. If not, it tries the next color; if so, it moves on to the next uncolored node. If no color satisfies the condition for a given node, it steps back (backtracks) and moves on to a sibling in the search tree (the next color of the previous node), or to the parent's sibling if that node has no siblings left.
So,
I assume that since assignment is inputted as an empty dictionary {}, that this will return the solution, that is, the dictionary that contains states and their colors
...
Would it be something like checking the size of the dictionary against the number of states?
Yes and yes. Once the dictionary contains all the nodes of the graph with a color, you'll have a solution.
The input csp class contains a list of states, I assume this could just be var equal to popping off a value in the list?
That pseudocode syntax is confusing, but the general idea is that you need a way to find a node of the graph that hasn't been colored yet. One simple way is to return a node from your list of states that isn't yet a key in the assignment dictionary. So if I understand the syntax correctly, var would store a node.
VARIABLES[csp] seems to me like a representation of the list of nodes inside your CSP structure.
I'm not sure what the parameters (VARIABLES[csp], assignment, csp) are doing, given my input
The assignment parameter is a dictionary containing the nodes evaluated so far (and, eventually, the solution), as mentioned above, and csp is the structure containing a, b and c.
Again, confused on what the inputs of (var, assignment, csp) are doing exactly. But I assume that it'll go through each value (neighbor) in dictionary of the state?
ORDER-DOMAIN-VALUES appears to be a function that returns the colors in your CSP structure in the order they should be tried. The FOR loop iterates over each color so that each one is tested against the constraints at that level.
if value is consistent with assignment given CONSTRAINT[csp] then
Here you are testing the constraint with that value to make sure it holds. In this case you want to check that no node adjacent to var already has that color. If an adjacent node has that color, skip the IF and let the for loop try the next color.
If no adjacent node has that color, enter the IF body and add node var with color value to the assignment dictionary (I believe {var = value} is a tuple representation, which I would write {var,value}, but oh well).
Then call RECURSIVE-BACKTRACKING again, recursively.
If the recursive call returns non-failure, return its results (it means the solution has been found).
If it returns failure (meaning that, with var set to this color, no consistent coloring could be found for the remaining nodes), remove that node ({var,value}) from the assignment (solution) dictionary and move on to the next color. If all colors have been exhausted, return failure.
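Putting the answer together, here is one possible Python translation of the pseudocode. It is a sketch that assumes the csp object described in the question (a list of states, a list of colors and a neighbors dictionary); the `CSP` dataclass and the function names are hypothetical. `None` stands in for failure, completeness is checked by comparing the dictionary size with the number of states, and SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES use the simplest possible choices, with no heuristics.

```python
from dataclasses import dataclass


@dataclass
class CSP:
    states: list     # (a) the variables to color
    colors: list     # (b) the domain shared by every variable
    neighbors: dict  # (c) state -> list of states that may not share its color


def select_unassigned_variable(variables, assignment, csp):
    # Simplest strategy: the first state (in list order) with no color yet.
    for state in variables:
        if state not in assignment:
            return state


def order_domain_values(var, assignment, csp):
    # No heuristic: try the colors in the order they were given.
    return csp.colors


def is_consistent(var, value, assignment, csp):
    # Map-coloring constraint: no already-colored neighbor uses this color.
    return all(assignment.get(nb) != value for nb in csp.neighbors.get(var, []))


def recursive_backtracking(assignment, csp):
    if len(assignment) == len(csp.states):      # "assignment is complete"
        return assignment
    var = select_unassigned_variable(csp.states, assignment, csp)
    for value in order_domain_values(var, assignment, csp):
        if is_consistent(var, value, assignment, csp):
            assignment[var] = value              # add {var = value}
            result = recursive_backtracking(assignment, csp)
            if result is not None:               # None stands for "failure"
                return result
            del assignment[var]                  # remove {var = value}
    return None


def backtracking_search(csp):
    return recursive_backtracking({}, csp)
```

For example, with the seven-region Australia map (WA, NT, SA, Q, NSW, V, T) and three colors, `backtracking_search` returns a complete coloring; with only two colors it returns `None`, because WA, NT and SA are mutually adjacent.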

Using Linked List to represent a Matrix Class

I'm having trouble initializing the linked list for the matrix based on the parameters I pass in. If I pass the parameters (3,3), it should actually make a 4x4 structure, so that I can use the first column and the first row for indexing, with the top-left corner node as the entry point.
def __init__(self, m, n, default=0):
    self._head = MatrixNode(None)
    for node in range(m - 1):
        node = MatrixNode(0)
        node._right = node
    for node in range(n - 1):
        node = MatrixNode(0)
        node._down = node
This is what I have so far, but I'm sure it's horrible.
First, it would help to know what a MatrixNode is. I guess you just want to store a value there?
Then I see two separate (non-nested) loops, while a matrix is an n*m data structure. Are you sure your loops don't need to be nested to initialize your structure correctly?
For linked lists I would expect something like row.next = nextrow and row.startnode.next = nextnode; I don't see anything like this here.
That said, I want to ask whether you really want to implement a matrix yourself, and in such an object-oriented (inefficient!) way.
You can use two-dimensional arrays (a=[[1,2], [3,4]];a[0][0]==1) or a good implementation from a numerics library like numpy/scipy.
There you have numpy.array for storing n-dimensional data (with convenient indexing like matrix[1,2] and syntax similar to MATLAB), and numpy.matrix, which is like an array with some methods overloaded for matrix operations (e.g. the * operator multiplies arrays elementwise, but for matrices it performs the usual matrix multiplication).
You are right, it's horrible :)
First things first, a linked list is a very bad way of representing a matrix.
If you want to represent a matrix, start with a list of lists, and work from there if that's not enough (see other answer mentioning numpy, for example)
If you want to learn to use linked lists, choose a better example.
Then: you are re-using the variable name "node" for different things:
Your loop index. The code for node in range(...) will assign an integer from the range to node in every iteration.
Then you assign a new MatrixNode to node, and then you set the node's neighbor (_right or _down) to be not the actual neighbor, but itself (node._right = node).
You also never store the nodes you create inside the loops anywhere, so they will be garbage-collected.
And you never use the optional argument default.
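For comparison, here is one way the constructor could look once those issues are fixed: nested loops, links to the real neighbors, nodes kept reachable, and `default` actually used. It is a sketch only. The question does not show `MatrixNode`, so the class below (a value plus `_right` and `_down` links) is an assumption, as are storing the row and column indices in the header nodes and the `Matrix` indexing methods. It builds the (m+1) x (n+1) grid from the question, with the top-left node as the entry point.

```python
class MatrixNode:
    """Hypothetical node type: one value plus links to the right and down neighbors."""
    def __init__(self, value=None):
        self.value = value
        self._right = None
        self._down = None


class Matrix:
    def __init__(self, m, n, default=0):
        self._m, self._n = m, n
        # Build an (m+1) x (n+1) grid. Row 0 and column 0 are header nodes
        # holding indices; the top-left node is the entry point.
        grid = []
        for i in range(m + 1):
            row = []
            for j in range(n + 1):
                if i == 0 and j == 0:
                    value = None          # entry-point corner
                elif i == 0:
                    value = j - 1         # column header: column index
                elif j == 0:
                    value = i - 1         # row header: row index
                else:
                    value = default       # real cell, initialized to default
                row.append(MatrixNode(value))
            grid.append(row)
        # Link every node to its right and down neighbors (nested loops).
        for i in range(m + 1):
            for j in range(n + 1):
                if j < n:
                    grid[i][j]._right = grid[i][j + 1]
                if i < m:
                    grid[i][j]._down = grid[i + 1][j]
        # Only the head is kept; every other node stays reachable through links.
        self._head = grid[0][0]

    def _node(self, i, j):
        if not (0 <= i < self._m and 0 <= j < self._n):
            raise IndexError((i, j))
        node = self._head
        for _ in range(i + 1):   # skip the header row, then i more rows
            node = node._down
        for _ in range(j + 1):   # skip the header column, then j more columns
            node = node._right
        return node

    def __getitem__(self, index):
        i, j = index
        return self._node(i, j).value

    def __setitem__(self, index, value):
        i, j = index
        self._node(i, j).value = value
```

Each lookup walks i + 1 links down and j + 1 links right, so access costs O(m + n) instead of O(1), which is the inefficiency both answers warn about.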

Re-order a ranked-list based on new partial rank

I have a question about a ranking algorithm that may not exist yet:
I have a list ordered by a score, for example the following list (call it list-a):
Now I have new information telling me that the list should be ranked as follows (call it list-b):
The question is: how do I construct a new ranking for list-a that follows the restriction in list-b?
The new list must:
Follow the ranking in list-b.
Have as few conflicts as possible with the ranking in list-a (e.g. a conflict: list-a says a > b, but the new list says b > a).
The problem is that list-b has no information about c, e and g (marked in red in list-a). Now we need to construct a new ranking for list-a that follows the restriction in list-b.
My current solution:
We can certainly solve it with a brute-force strategy: add the missing items c, e, g to list-b one by one, and find the best place for each by:
Selecting a place for it in list-b (e.g. a > c > d > b > f).
Counting the number of conflicts with list-a, then choosing the position with the fewest conflicts.
For example with c, we can do as follow:
When different positions have the same number of conflicts, we select the first one (I guess). Following this procedure, we can add every item up to the last one.
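The greedy insertion procedure can be sketched as follows. It is a hypothetical helper, `insert_missing`, and assumes every item of list-b also appears in list-a. One observation keeps it cheaper than re-checking the whole list: conflicts among the list-b items are the same wherever the new item goes, so only pairs involving the new item need counting, and those can be updated in a single pass over the positions. Ties keep the first position, as in the question.

```python
def insert_missing(list_a, list_b):
    """Insert the items of list_a that list_b lacks, one at a time, at the
    position with the fewest pairwise conflicts with list_a (first on ties)."""
    rank_a = {item: pos for pos, item in enumerate(list_a)}
    result = list(list_b)
    in_b = set(list_b)
    for item in list_a:
        if item in in_b:
            continue
        r = rank_a[item]
        # Placed at position 0, the item conflicts with every item list_a ranks above it.
        conflicts = sum(1 for x in result if rank_a[x] < r)
        best_pos, best = 0, conflicts
        for pos, x in enumerate(result):
            # Moving the item past x turns x from "after the item" into "before it".
            conflicts += -1 if rank_a[x] < r else 1
            if conflicts < best:
                best_pos, best = pos + 1, conflicts
        result.insert(best_pos, item)
    return result
```

With list-a = a, b, c, d, e, f, g and list-b = a, d, b, f (example data, since the original lists are not shown), c is placed as a > c > d > b > f, matching the question's example, and the final order is a, c, d, b, e, f, g. Each insertion is still linear in the list length, so with a million items and many missing ones the total work remains quadratic.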
This is my "bad way" of doing it; do you have a better idea for this problem? My list is really long (about 1 million items), so this approach would be far too expensive computationally.
Looking forward to hearing your suggestion.
Interesting problem. I am assuming that list-b will also have the updated scores. In that case you could turn list-a into a dictionary where the item is the key and the score is the value. Then iterate through list-b, look up each item in constant time and update its score. Finally, turn the dictionary back into a collection and use the built-in sort. This runs in O(n log n). Hope this helps.
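A minimal sketch of this dictionary approach, assuming both lists come as item-to-score mappings where a higher score ranks higher (the function name `rerank` is hypothetical):

```python
def rerank(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping item -> score (higher is better).
    Scores from list-b override those from list-a; returns items sorted by score."""
    scores = dict(scores_a)      # list-a as item -> score
    scores.update(scores_b)      # constant-time (average) update per list-b item
    return sorted(scores, key=scores.get, reverse=True)   # O(n log n)
```

For example, `rerank({"a": 0.9, "b": 0.8, "c": 0.7}, {"c": 0.95})` returns `["c", "a", "b"]`. This only works if list-b really does carry scores; with ranks alone there is nothing to write into the dictionary.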

Breadth-first Resolution Algorithm

I want to implement a resolution algorithm that tries to derive the empty clause as it resolves the candidate clauses.
I want the algorithm to resolve the candidate parent clauses in breadth-first order. However, I got confused at one point:
Let S be the conjunction of all the clauses in the knowledge base and the negation of the goal clause.
When we resolve the candidate clauses in S with the ones in S again, we get S'.
As the second step of the algorithm, should we resolve S with S', or S' with S' itself?
And how should it proceed from there?
For example:
Suppose the knowledge base plus the negated goal consists of clauses such as
p(a,b) ^ q(z),~p(z,b) ^ ~q(y) (let's call this set S)
When we run the resolution algorithm on set S, we get clauses like:
q(a) ^ ~p(z,b) (let's call this set S')
Now, if we have to use a BFS strategy, should we first find the resolvents whose first parent is in S and whose second parent is in S'? Or should we first check for resolvents whose parents are both from S'?
In some examples, when you first resolve S' with S', you get the solution. However, when you proceed by checking the pairs of sets (S, S'), (S, (S, S')), you find another way leading to the empty clause. So, which order corresponds to BFS?
Thanks in advance
Here it is stated that:
All of the first-level resolvents are computed first, then the second-level resolvents,
and so on. A first-level resolvent is one between two clauses in the base set; an i-th
level resolvent is one whose deepest parent is an (i-1)-th level resolvent.
and here:
Level 0 clauses are the original axioms and the negation of the goal
Level k clauses are the resolvents computed from two clauses, one of which must be from level k-1 and the other from any earlier level.
My reading of these statements, together with my comments, is as follows:
Level 0 consists of the original clauses and the negation of the goal. Let this set be X.
Level 1 consists of the resolvents of (X,X), which are the only possible candidates. Let this set be Y.
Level 2 consists of the resolvents of (Y,X) and (Y,Y).
and so on.
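The level scheme can be sketched as a level-saturation loop. To keep it short, this is propositional only: clauses are sets of string literals with `~` for negation, and there is no unification, so first-order clauses like p(z,b) would need an extra unification step. The function names are hypothetical. At each level, one parent comes from the previous level and the other from any level so far, so level 1 resolves (X,X) and level 2 resolves (Y,X) and (Y,Y) without redoing (X,X).

```python
def negate(lit):
    """'p' <-> '~p' for propositional literals written as strings."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {negate(lit)}))
    return out


def level_saturation(clauses, max_level=20):
    """Breadth-first resolution; returns the level of the empty clause, or None."""
    frontier = {frozenset(c) for c in clauses}   # level 0: axioms + negated goal (X)
    if frozenset() in frontier:
        return 0
    known = set(frontier)                        # every clause from levels 0..k-1
    for level in range(1, max_level + 1):
        new = set()
        for c1 in frontier:                      # one parent from level k-1 ...
            for c2 in known:                     # ... the other from any earlier level
                for r in resolvents(c1, c2):
                    if not r:
                        return level             # empty clause: refutation found
                    if r not in known:
                        new.add(r)
        if not new:
            return None                          # saturated without the empty clause
        known |= new
        frontier = new
    return None
```

For instance, the clauses p, ~p ∨ q and ~q produce q and ~p at level 1 and the empty clause at level 2.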
My explanation follows the second statement. In practice it gives the same results as the first one, except that you would resolve the same sets at every level, which is unnecessary. The breadth-first strategy is already very inefficient, and a wrong approach makes it even worse.
I hope this clarifies your question.

How to determine correspondence between two lists of names?

I have:
1 million university student names and
3 million bank customer names
I managed to convert the strings into numerical values based on hashing (similar strings have similar hash values). I would like to know how I can determine the correlation between these two sets to see whether the values pair up at least 60%.
Can I achieve this using ICC? How does ICC 2-way random work?
Please kindly answer ASAP as I need this urgently.
This kind of entity resolution is normally straightforward, but I am surprised by the hashing approach here. Hashing loses information that is critical to entity resolution, so if possible you should use the original strings rather than hashes.
Assuming using original strings is an option, then you would want to do something like this:
List A (1M), List B (3M)
// First, match the entities that match very well, and REMOVE them.
for a in List A
    for b in List B
        if compare(a,b) >= MATCH_THRESHOLD // This may be 90% etc
            add (a,b) to matchedList
            remove a from List A
            remove b from List B
            break // a is matched; move on to the next a
// Now, match the entities that match well, and run bipartite matching
// Bipartite matching is required because each entity can match "acceptably well"
// with more than one entity on the other side
for a in List A
    for b in List B
        compute compare(a,b)
        set edge(a,b) = compare(a,b)
        if compare(a,b) < THRESHOLD // This seems to be 60%
            set edge(a,b) = 0
// Now, run bipartite matcher and take results
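A runnable Python sketch of the two-phase pseudocode, under several assumptions: `compare` is a stand-in built on the standard library's `difflib.SequenceMatcher` ratio, names are assumed unique within each list, and the final step uses Kuhn's augmenting-path algorithm, which maximizes the number of matched pairs among edges above the threshold rather than their total weight. A weighted matcher (for example the Hungarian algorithm) would use the scores themselves. Edges below the threshold are simply left out, which amounts to giving them weight 0.

```python
from difflib import SequenceMatcher


def compare(a, b):
    # Hypothetical similarity in [0, 1]; substitute your own name-comparison function.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_names(list_a, list_b, match_threshold=0.9, threshold=0.6):
    """Two-phase matching; assumes names are unique within each list."""
    list_a, list_b = list(list_a), list(list_b)
    matched = []
    # Phase 1: pair off very good matches and remove them from both lists.
    for a in list(list_a):                # iterate over a copy while removing
        for b in list_b:
            if compare(a, b) >= match_threshold:
                matched.append((a, b))
                list_a.remove(a)
                list_b.remove(b)
                break                     # a is matched; move on to the next a
    # Phase 2: keep only edges at or above the threshold, best candidates first.
    edges = {}
    for a in list_a:
        scored = sorted(((compare(a, b), b) for b in list_b), reverse=True)
        edges[a] = [b for score, b in scored if score >= threshold]
    # Maximum-cardinality bipartite matching by augmenting paths (Kuhn's algorithm).
    owner = {}                            # b -> the a it is currently matched to

    def try_assign(a, seen):
        for b in edges[a]:
            if b not in seen:
                seen.add(b)
                if b not in owner or try_assign(owner[b], seen):
                    owner[b] = a
                    return True
        return False

    for a in list_a:
        try_assign(a, set())
    matched.extend((a, b) for b, a in owner.items())
    return matched
```

With `["Donald Knuth", "Alan Turing", "Grace Hopper"]` against `["Donald Knuth", "Alan M. Turing", "Ada Lovelace"]`, the first pair is matched in phase 1, "Alan Turing" is matched to "Alan M. Turing" in phase 2, and "Grace Hopper" stays unmatched. Like the pseudocode, this compares every pair, so it is O(n1 * n2).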
The time complexity of this algorithm is O(n1 * n2), which is not very good. There are ways to avoid this cost, but they depend on your specific entity resolution function. For example, if the last name has to match (to make the 60% cut), you can simply create sublists of A and B partitioned by the first couple of characters of the last name, and run this algorithm only between corresponding sublists. But it may very well be that the last name "Nuth" is supposed to match "Knuth", etc. So some knowledge of how your name comparison function works can help you divide and conquer this problem better.
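The partitioning idea can be sketched as a blocking step that only generates candidate pairs within the same block. The key (the first two letters of the last whitespace-separated token, lowercased) and the function names are hypothetical:

```python
from collections import defaultdict


def block_key(name, prefix_len=2):
    # Hypothetical blocking key: first characters of the last name token.
    parts = name.split()
    return parts[-1][:prefix_len].lower() if parts else ""


def blocks(names, prefix_len=2):
    groups = defaultdict(list)
    for name in names:
        groups[block_key(name, prefix_len)].append(name)
    return groups


def blocked_pairs(list_a, list_b, prefix_len=2):
    """Yield only the (a, b) pairs that share a block; feed these to the matcher."""
    groups_b = blocks(list_b, prefix_len)
    for key, group_a in blocks(list_a, prefix_len).items():
        for a in group_a:
            for b in groups_b.get(key, []):
                yield a, b
```

With this key, "Donald Knuth" is paired with "D. Knuth" but never compared with "Don Nuth", which illustrates the caveat about "Nuth" and "Knuth".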
