Relational Algebra: Natural Join having the same result as Cartesian product - relational-algebra

I am trying to understand what will be the result of performing a natural join
between two relations R and S, where they have no common attributes.
By following the below definition, I thought the answer might be an empty set:
Natural Join definition.
My line of thought was because the condition in the 'Select' symbol is not met, the projection of all of the attributes won't take place.
When I asked my lecturer about this, he said that the output will be the same as doing a cartezian product between R and S.
I can't seem to understand why, would appreciate any help )

Natural join combines a cross product and a selection into one
operation. It performs a selection forcing equality on those
attributes that appear in both relation schemes. Duplicates are
removed as in all relation operations.
There are two special cases:
• If the two relations have no attributes in common, then their
natural join is simply their cross product.
• If the two relations have more than one attribute in common,
then the natural join selects only the rows where all pairs of
matching attributes match.
Notation: r s
Let r and s be relation instances on schema R and S
respectively.
The result is a relation on schema R ∪ S which is
obtained by considering each pair of tuples tr from r and ts from s.
If tr and ts have the same value on each of the attributes in R ∩ S, a
tuple t is added to the result, where
– t has the same value as tr on r
– t has the same value as ts on s
Example:
R = (A, B, C, D)
S = (E, B, D)
Result schema = (A, B, C, D, E)
r s is defined as:
πr.A, r.B, r.C, r.D, s.E (σr.B = s.B r.D = s.D (r x s))

The definition of the natural join you linked is:
It can be broken as:
1.First take the cartezian product.
2.Then select only those row so that attributes of the same name have the same value
3.Now apply projection so that all attributes have distinct names.
If the two tables have no attributes with same name, we will jump to step 3 and therefore the result will indeed be cartezian product.

Related

Prolog - Print truth table for first order table

I've some difficults to print a truth table for first order logic.
The tasks is:
Write a Prolog predicate table (F, U) that receives one as input
well-formed formula in propositional logic F e
is satisfied by the rows of the truth table of formula F
which have U value. Formula F is expressed by
the function symbols not / 1, and / 2 and or / 2,
the constants t (for true) and f (for false),
any round brackets and use free variables as symbols
of proposition.
Example
?- table(and(P, or(Q, P)), U)
P=t, Q=f, U=t;
P=t, Q=t, U=t;
P=f, Q=f, U=f;
P=f, Q=t, U=f;

Translate to RA: bi-implication/equivalence

(No this isn't one of those translate SQL to RA questions ;-) I have a formula in First-Order Logic that I want to express in RA. That ought to be easy: Codd's 1972 approach in the Relational Completeness paper is to show each FOL operator can be equivalently expressed in RA.
Given relation SP:
Heading {S# CHAR, P# CHAR, QTY INT}
Key {S#, P#}
Characteristic predicate SP(s, p, q) = 'Supplier s supplies Part p in quantity q.'
Express: 'Supplier 'S1' and Supplier 'S2' supply exactly the same set of Parts (disregarding quantities).'
Formula:
∀p. (∃q1. SP('S1', p, q1) ) ⇔ (∃q2. SP('S2', p, q2) )
Note in case of S1 supplying no parts at all, this formula is true just in case S2 also supplies no parts.
This is a Yes/No question (the formula has no free variables); so I'd expect the RA expression must result in a relation with no attributes, returning an empty relation if the two Suppliers do not supply the same set of Parts (formula evaluates to False); otherwise the non-empty relation with no attributes (formula evaluates to True).
To explain a bit further: usually queries return a list of something -- such as the list of Parts supplied by S1, disregarding quantities: SP WHERE (S# = 'S1') {P#} (or in Greek π{P#}(σS# = 'S1'(SP))). For a Yes/No question, we're interested only in whether the query returns something vs nothing, e.g. does Supplier S1 supply Part P456?: SP WHERE (S# = 'S1' AND P# ='P456') {} (π{}(σS# = 'S1'(σP# = 'P456'(SP)))).
You'll notice I'm using a variant of RA: Tutorial D from Date & Darwen. This is easier to read and typeset than Codd's original RA (I've also included the Greek characters and subscripts form). I'll limit myself to Tutorial D operators that correspond to Codd's RA.
I can do the inverse of the query I want: 'Are there any Parts Supplied by S1 but not by S2, or vice versa?'
Firstly a couple of shorthands for common subexpressions
WITH S1P := SP WHERE (S# = 'S1'){P#},
S2P := SP WHERE (S# = 'S2'){P#} :
( S1P MINUS S2P )
UNION
( S2P MINUS S1P );
For those who prefer Greek:
S1P := π{P#}(σS# = 'S1'(SP))
S2P := π{P#}(σS# = 'S2'(SP))
(S1P \ S2P) ∪ (S2P \ S1P)
This'll return an empty result just in case the two Suppliers supply exactly the same set of Parts. So all that remains to do is project that result on to no attributes, and flip empty result to non-empty and vice versa. But Codd's RA doesn't have a way to express that flip, AFAICT.
Applying Codd's 1972 method to the formula, the outermost operation is a forall quantifier, so convert that to a negation of an existential:
¬∃p. ¬( (∃q1. SP('S1', p, q1) ) ⇔ (∃q2. SP('S2', p, q2) ) )
But now the outermost operation is negation. Codd's method only allows negation to appear nested inside conjunction.
I'm stuck. Any ideas?
There is no RA expression that answers the question, if we limit to RA operators and semantics per Codd's 1972 specification.
Even if we add the operators commonly included in RA these days, we can't answer the question as posed. For example, the operators covered in wikipedia such as Rename aka ρ, Extend (for calculated columns), Grouping/Aggregation, Outer Joins.
From the discussion, arguably, the desired result (a degree-zero relation) is not countenanced by Codd. I say "arguably" because Codd never rigorously defines 'relation'. There's Codd 1970 footnote 1 "R is a subset of the Cartesian product S1 x S2 x ... x Sn."; but no lower bound given for n. Clearly it's intended to include the degenerate 'product' for n is 1, then why not allow zero?
For example SQL does not support degree-zero tables. SQL does support pseudo-extending a would-be degree-zero table with a dummy column:
SELECT 'Yes' AS Dummy FROM SP WHERE...
Even allowing that, I claim the question as posed can't be answered in SQL. (Consider the case where SP is empty: then the two Suppliers do supply the same set of Products, viz. the empty set; but the FROM SP ... can only return an empty relation.)
Various non-standard operators or primitives have been suggested (see Comments on q and on other answers). AFAICT there is no authoritative reference that 'blesses' any particular approach. For example, the Alice Book seems not to consider relations of degree zero.
To briefly survey the possible operators/primitives. (Any one of these is expressively equivalent to any other, in the sense that if we have one we can define the others in terms of it -- except for the last.)
Those returning true/false:
Relational comparison: subset ⊆, which can be used to define equality of relations ==. (These require the operands to be 'Union Compatible'.)
IS_EMPTY( ) (which appears in Tutorial D).
The difficulty with returning true/false is that there are no such primitives in RA. (RA operators are usually described as "closed over relations".) Alternatively these operators could return a degree-zero relation; but then why not go to that direct?
Those returning a degree-zero relation:
A complement operation, valid only applied to a degree-zero relation. (This is the "flip" operation discussed in the q.)
Make Dee a primitive -- that is, the non-empty degree-zero relation. Then Dum =df Dee MINUS Dee; and in general complement of r (which must be degree-zero) =df Dee MINUS r
Provide primitive(s) to express a relation literal/constant value, just as most programming languages support expressing numeric or String literals, or more complex data structures. Then Dum/Dee are just two amongst the many relation constants.

How to understand `u=r÷s`, the division operator, in relational algebra?

let be a database having the following relational-schemes: R(A,B,D) and S(A,B) with the attributes of same name in the same domain and with the instances r and s respectively.
An instance of r
An instance of s
What is the scheme and what are the tuples of u=r÷s? How to define them in English with r and s?
My attempt
I know that
u=r÷s=
Which leads me to think that it would only be an array of one column A, but I'm not sure enough to know what will be ther result within the array.
Can you help me understand u=r÷s?
An intuitive property of the division operator of the relational algebra is simply that it is the inverse of the cartesian product. For example, if you have two relations R and S, then, if U is a relation defined as the cartesian product of them:
U = R x S
the division is the operator such that:
U ÷ R = S
and:
U ÷ S = R
So, you can think of the result of U ÷ R as: “the projection of U that, multiplied by R, produces U”, and of the operation ÷, as the operation that finds all the “parts” of U that are combined with all the tuples of R.
However, in order to be useful, we want that this operation can be applied to any couple of relations, that is, we want to divide a relation which is not the result of a cartesian product. For this, the formal definition is more complex.
So, supposing that we have two relations R and S with attributes respectively A and B, their division can be defined as:
R ÷ S = πA-B(R) - πA-B((πA-B(R) x S) - R)
that can be read in this way:
πA-B(R) x S: project R over the attributes of R which are not in S, and multiply (cartesian product) this relation with S. This produces a relation with the attributes A of R and with rows all the possible combinations of rows of S and the projection of R;
From the previous result subtract all the tuples originally in R, that is, perform (πA-B(R) x S) - R. In this way we obtain the “extra” tuples, that is the tuples in the cartesian product that were not present in the original relation.
Finally, subtract from the original relation those extra tuples (but, again, perform this operation only on the attributes of R which are not present in S). So, the final operation is: πA-B(R) - πA-B(the result of step 2).
So, coming to your example, the projection of r on D is equal to:
(D)
d1
d2
d3
d4
and the cartesian product with s is:
(A, B, D)
a1 b1 d1
a1 b1 d2
a1 b1 d3
a1 b1 d4
Now we can remove from this set the tuples that were also in the original relation r, i.e. the first two tuples and the last one, so that we obtain the following result:
(A, B, D)
a1 b1 d3
And finally, we can remove the previous tuples (projected on D), from the original relation (again projected on D), that is, we remove:
(D)
d3
from:
(D)
d1
d2
d3
d4
and we obtain the following result, which is the final result of the division:
(D)
d1
d2
d4
Finally, we could double check the result by multiplying it with the original relation s (which is composed only by the tuple (a1, b1)):
(A B D)
a1 b1 d1
a1 b1 d2
a1 b1 d4
And looking at the rows of the original relation r, you can see this fact, that should give you an important insight on the meaning of the division operator:
the only values of the column D in r that are present together with (a1, b1) (the only tuple of s), are d1, d2 and d4.
You can also see another example in Wikipedia, and for a detailed explanation of the division, together with its transformation is SQL, you could look at these slides.

Which function dependency is not in 1NF?

I have done research on normalization. I know if there is no Primary Key or repetition of a column or group of column it is not in 1NF. But I wondered how can we show it be functional dependencies?
For instance, let's say
P = (N, C, A, K)
A→ K then (augmentation) AN → AK Then ANC → AKC
C → K then (augmentation) CN → KC Then CAN → AKC
k(A,N,C) Not
Can we say it is not in 1NF or is there a way to change it to not become 1NF or any other example will be appreciated.?
EDIT
In other word, my question is, is it possible to determine a function is not in 1NF.
Lets say:
R = (A, B, C)
No valid functional dependencies
What is the highest normalization form of this?
"Not in 1NF" does not mean "2NF or higher". A relation in xNF is also in all the lower NFs. "Not in xNF" means it must be in some lower one. Always talk about its highest NF.
If a relation has no (non-trivial) FDs then it is in BCNF. That is because to not be in BCNF it has to have more that one key but a column set smaller than all columns needs a FD to make it a key so the only key is all columns. It might have non-FD JDs though so it might not be in a higher NF.

Problems with a simple dependency algorithm

In my webapp, we have many fields that sum up other fields, and those fields sum up more fields. I know that this is a directed acyclic graph.
When the page loads, I calculate values for all of the fields. What I'm really trying to do is to convert my DAG into a one-dimensional list which would contain an efficient order to calculate the fields in.
For example:
A = B + D, D = B + C, B = C + E
Efficient calculation order: E -> C -> B -> D -> A
Right now my algorithm just does simple inserts into a List iteratively, but I've run into some situations where that starts to break. I'm thinking what would be needed instead would be to work out all the dependencies into a tree structure, and from there convert that into the one dimensional form? Is there a simple algorithm for converting such a tree into an efficient ordering?
Are you looking for topological sort? This imposes an ordering (a sequence or list) on a DAG. It's used by, for example, spreadsheets, to figure out dependencies between cells for calculations.
What you want is a depth-first search.
function ExamineField(Field F)
{
if (F.already_in_list)
return
foreach C child of F
{
call ExamineField(C)
}
AddToList(F)
}
Then just call ExamineField() on each field in turn, and the list will be populated in an optimal ordering according to your spec.
Note that if the fields are cyclic (that is, you have something like A = B + C, B = A + D) then the algorithm must be modified so that it doesn't go into an endless loop.
For your example, the calls would go:
ExamineField(A)
ExamineField(B)
ExamineField(C)
AddToList(C)
ExamineField(E)
AddToList(E)
AddToList(B)
ExamineField(D)
ExamineField(B)
(already in list, nothing happens)
ExamineField(C)
(already in list, nothing happens)
AddToList(D)
AddToList(A)
ExamineField(B)
(already in list, nothing happens)
ExamineField(C)
(already in list, nothing happens)
ExamineField(D)
(already in list, nothing happens)
ExamineField(E)
(already in list, nothing happens)
And the list would end up C, E, B, D, A.

Resources