Understanding Division in Relational Algebra - relational-algebra

I am having trouble understanding a slide about division in relational algebra. While researching, many people referred me to On Making Relational Algebra Comprehensible by Lester I. McCann. The slide in question is Slide 13; I have essentially recreated the relevant slides below.
Query: Find the sno value of the suppliers that supply all parts of weight equal to 17.
Relation P
+-------------------------------+
| pno pname color weight city |
+-------------------------------+
| P1 Nut Red 12.0 London |
| . . . . . . . . . . . . . . . |
| P6 Cog Red 19.0 London |
+-------------------------------+
Relation SPJ
+-------------------------+
| sno pno jno qty |
+-------------------------+
| S1 P1 J1 200 |
| . . . . . . . . . . . . |
| S5 P6 J4 500 |
+-------------------------+
I understand that I need the following schemas. Relation α lists the (sno, pno) pairs, and relation β lists the pno values of the parts whose weight equals 17.
α (sno, pno)
β (pno)
α ← π sno,pno (SPJ)
β ← π pno (σ weight=17 (P))
Result:
Relation α
+---------+
| sno pno |
+---------+
| S1 P1 |
| S2 P3 |
| S2 P5 |
| S3 P3 |
| S3 P4 |
| S4 P6 |
| S5 P1 |
| S5 P2 |
| S5 P3 |
| S5 P4 |
| S5 P5 |
| S5 P6 |
+---------+
Relation β:
+-----+
| pno |
+-----+
| P2 |
| P3 |
+-----+
However the slide then goes on to say:
Find the values that do not belong in the answer, and
remove them from the list of possible answers.
In our P–SPJ example, the list of possible answers is
just the available sno values in α:
+-----+
| sno |
+-----+
| S1 |
| S2 |
| S3 |
| S4 |
| S5 |
+-----+
This is where I'm stuck. He says "P - SPJ" in the example but if I do that I don't get the relation above. I don't think it's possible to even do P - SPJ? According to A First Course in Database Systems, when we apply difference operation to relations, the two tables need to have schemas with identical sets of attributes (which P and SPJ do not have)?
If someone could just point me in the right direction that would be great thanks! I have the book A First Course in Database Systems, Chapter 4 which teaches Relational Algebra but unfortunately does not teach division (which I stumbled upon and wanted to learn).

Find the values that do not belong in the answer, and remove them from the list of possible answers.
When they say "Find the values that do not belong in the answer", that is something that they do later. That relation of "values that do not belong" will be π sno (δ).
When they say "and remove them from the list of possible answers" they mean that the answer is a relational difference that they finally do later between a "list of possible answers" relation that they find next & π sno (δ) that they find after that.
In our P–SPJ example, the list of possible answers is just the available sno values in α:
When they say "In our P-SPJ example, ..." they just mean "In our example involving relations P & SPJ, ...". They are using a dash; they are not using a minus sign for relational difference. What they next calculate & show is the "list of possible answers" relation π sno (α).
(Finally later on they get the answer, which is π sno (α) - π sno (δ).)
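The whole computation can be sketched with Python sets. This is only an illustration of the idea, not the slides' own notation: it assumes that δ is the set of (sno, pno) pairs a supplier would need in order to qualify but that are absent from α, and the function name `divide` is made up for this example. The data is α and β from the question.

```python
# alpha = pi_{sno,pno}(SPJ), copied from the question.
alpha = {
    ("S1", "P1"), ("S2", "P3"), ("S2", "P5"), ("S3", "P3"), ("S3", "P4"),
    ("S4", "P6"), ("S5", "P1"), ("S5", "P2"), ("S5", "P3"), ("S5", "P4"),
    ("S5", "P5"), ("S5", "P6"),
}
# beta = pi_{pno}(sigma_{weight=17}(P)), written in upper case to match alpha.
beta = {"P2", "P3"}


def divide(alpha, beta):
    """Return the sno values paired in alpha with every pno in beta."""
    # List of possible answers: pi_{sno}(alpha).
    candidates = {sno for sno, _ in alpha}
    # Every pair that would have to exist if all candidates qualified.
    required = {(sno, pno) for sno in candidates for pno in beta}
    # Assumed meaning of delta: required pairs that alpha does not contain.
    delta = required - alpha
    # Values that do not belong in the answer: pi_{sno}(delta).
    disqualified = {sno for sno, _ in delta}
    # Final difference: pi_{sno}(alpha) - pi_{sno}(delta).
    return candidates - disqualified


result = divide(alpha, beta)
print(sorted(result))
```

Both differences here are between relations with identical attribute sets, (sno, pno) for the first and (sno) for the second, so the schema rule for difference is respected.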

Related

Constraint programming suitable for extracting OneToMany relationships from records

Maybe someone can help me to solve a problem with Prolog or any constraint programming language. Imagine a table of projects (school projects where pupils do something with their mothers). Each project has one or more children participating. For each child we store its name and the name of its mother. But for each project there is only one cell that contains all mothers and one cell that contains all children. Both cells are not necessarily ordered in the same way.
Example:
+-----------+-----------+------------+
| | | |
| Project | Parents | Children |
| | | |
+-----------+-----------+------------+
| | | |
| 1 | Jane; | Brian; |
| | Claire | Stephen |
| | | |
+-----------+-----------+------------+
| | | |
| 2 | Claire; | Emma; |
| | Jane | William |
| | | |
+-----------+-----------+------------+
| | | |
| 3 | Jane; | William; |
| | Claire | James |
| | | |
+-----------+-----------+------------+
| | | |
| 4 | Jane; | Brian; |
| | Sophia; | James; |
| | Claire | Isabella |
| | | |
+-----------+-----------+------------+
| | | |
| 4 | Claire | Brian |
| | | |
+-----------+-----------+------------+
| | | |
| 5 | Jane | Emma |
| | | |
+-----------+-----------+------------+
I hope this example visualizes the problem. As I said both cells only contain the names separated by a delimiter, but are not necessarily ordered in a similar way. So for technical applications you would transform the data into this:
+-------------+-----------+----------+
| Project | Name | Role |
+-------------+-----------+----------+
| 1 | Jane | Mother |
+-------------+-----------+----------+
| 1 | Claire | Mother |
+-------------+-----------+----------+
| 1 | Brian | Child |
+-------------+-----------+----------+
| 1 | Stephen | Child |
+-------------+-----------+----------+
| 2 | Jane | Mother |
+-------------+-----------+----------+
| 2 | Claire | Mother |
+-------------+-----------+----------+
| 2 | Emma | Child |
+-------------+-----------+----------+
| 2 | William | Child |
+-------------+-----------+----------+
... and so on.
The number of parents and children is equal for each project. So for each project we have n mothers and n children, and within a project each mother belongs to exactly one child. With these constraints it is possible to assign each mother to all of her children by logical inference, starting with the projects that involve only one child (i.e. 4 and 5).
Results:
Jane has Emma, Stephen and James;
Claire has Brian and William;
Sophia has Isabella
I am wondering how this can be solved using constraint programming. Additionally, the data set might be underdetermined and I am wondering if it is possible to isolate records that, when solved manually (i.e. when the mother-child assignments are done manually), would break the underdetermination.
I'm not sure if I understand all the requirements of the problem, but here is a constraint programming model in MiniZinc (http://www.minizinc.org/). The full model is here: http://hakank.org/minizinc/one_to_many.mzn .
LATER NOTE: The first version of the project constraints was not correct. I have removed the incorrect code. See the edit history for the original answer.
enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};
% decision variables
% who is the mother of this child?
array[children] of var mothers: x;
solve satisfy;
constraint
% All mothers have at least one child
forall(m in mothers) (
exists(c in children) (
x[c] = m
)
)
;
constraint
% NOTE: This is a more correct version of the project constraints.
% project 1
(
( x[brian] = jane /\ x[stephen] = claire) \/
( x[stephen] = jane /\ x[brian] = claire)
)
/\
% project 2
(
( x[emma] = claire /\ x[william] = jane) \/
( x[william] = claire /\ x[emma] = jane)
)
/\
% project 3
(
( x[william] = claire /\ x[james] = jane) \/
( x[james] = claire /\ x[william] = jane)
)
/\
% project 4
(
( x[brian] = jane /\ x[james] = sophia /\ x[isabella] = claire) \/
( x[james] = jane /\ x[brian] = sophia /\ x[isabella] = claire) \/
( x[james] = jane /\ x[isabella] = sophia /\ x[brian] = claire) \/
( x[brian] = jane /\ x[isabella] = sophia /\ x[james] = claire) \/
( x[isabella] = jane /\ x[brian] = sophia /\ x[james] = claire) \/
( x[isabella] = jane /\ x[james] = sophia /\ x[brian] = claire)
)
/\
% project 4(sic!)
( x[brian] = claire) /\
% project 5
( x[emma] = jane)
;
output [
"\(c): \(x[c])\n"
| c in children
];
The unique solution is
brian: claire
stephen: jane
emma: jane
william: claire
james: jane
isabella: sophia
Edit2: Here is a more general solution. See http://hakank.org/minizinc/one_to_many.mzn for the complete model.
include "globals.mzn";
enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};
% decision variables
% who is the mother of this child?
array[children] of var mothers: x;
% combine all the combinations of mothers and children in a project
predicate check(array[int] of mothers: mm, array[int] of children: cc) =
let {
int: n = length(mm);
array[1..n] of var 1..n: y;
} in
all_different(y) /\
forall(i in 1..n) (
x[cc[i]] = mm[y[i]]
)
;
solve satisfy;
constraint
% All mothers have at least one child.
forall(m in mothers) (
exists(c in children) (
x[c] = m
)
)
;
constraint
% project 1
check([jane,claire], [brian,stephen]) /\
% project 2
check([claire,jane],[emma,william]) /\
% project 3
check([claire,jane],[william,james]) /\
% project 4
check([claire,sophia,jane],[brian,james,isabella]) /\
% project 4(sic!)
check([claire],[brian]) /\
% project 5
check([jane],[emma])
;
output [
"\(c): \(x[c])\n"
| c in children
];
This model uses the following predicate to ensure that all the combinations of mothers vs children are considered:
predicate check(array[int] of mothers: mm, array[int] of children: cc) =
let {
int: n = length(mm);
array[1..n] of var 1..n: y;
} in
all_different(y) /\
forall(i in 1..n) (
x[cc[i]] = mm[y[i]]
)
;
It uses the global constraint all_different(y) to make y a permutation of 1..n, so that each mother in mm is used exactly once, and then assigns the i'th child to the mother mm[y[i]].
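For readers without MiniZinc, the same idea can be sketched in plain Python: for each project, try every permutation of its mothers against its children and keep only the assignments that agree with what earlier projects fixed. This is a brute-force illustration, not a translation of the solver's search; the names `projects` and `solve` are made up for this sketch.

```python
from itertools import permutations

# (mothers, children) per project, taken from the example table.
projects = [
    (["jane", "claire"], ["brian", "stephen"]),
    (["claire", "jane"], ["emma", "william"]),
    (["jane", "claire"], ["william", "james"]),
    (["jane", "sophia", "claire"], ["brian", "james", "isabella"]),
    (["claire"], ["brian"]),
    (["jane"], ["emma"]),
]


def solve(projects, assigned=None, i=0):
    """Yield every child -> mother mapping consistent with all projects."""
    assigned = assigned or {}
    if i == len(projects):
        yield dict(assigned)
        return
    mothers, children = projects[i]
    # Each permutation pairs the k-th child with a distinct mother,
    # which plays the role of all_different(y) in the MiniZinc predicate.
    for perm in permutations(mothers):
        trial = dict(assigned)
        consistent = True
        for child, mother in zip(children, perm):
            # setdefault keeps an earlier assignment if one exists.
            if trial.setdefault(child, mother) != mother:
                consistent = False
                break
        if consistent:
            yield from solve(projects, trial, i + 1)


solutions = list(solve(projects))
for child, mother in sorted(solutions[0].items()):
    print(f"{child}: {mother}")
```

If `solutions` holds more than one mapping, the data set is underdetermined, and comparing the mappings shows which children are still ambiguous.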
A bit off topic, but the SWI-Prolog manual says:
Plain Prolog can be regarded as CLP(H), where H stands for Herbrand
terms. Over this domain, =/2 and dif/2 are the most important
constraints that express, respectively, equality and disequality of
terms.
I feel authorized to suggest a Prolog solution, more general than the algorithm you suggested (progressively reduce relations based on single to single relations):
solve2(Projects,ParentsChildren) :-
foldl([_-Ps-Cs,L,L1]>>try_links(Ps,Cs,L,L1),Projects,[],ChildrenParent),
transpose_pairs(ChildrenParent,ParentsChildrenFlat),
group_pairs_by_key(ParentsChildrenFlat,ParentsChildren).
try_links([],[],Linked,Linked).
try_links(Ps,Cs,Linked,Linked2) :-
select(P,Ps,Ps1),
select(C,Cs,Cs1),
link(C,P,Linked,Linked1),
try_links(Ps1,Cs1,Linked1,Linked2).
link(C,P,Assigned,Assigned1) :-
( memberchk(C-Q,Assigned)
-> P==Q,
Assigned1=Assigned
; Assigned1=[C-P|Assigned]
).
This accepts data in a natural format, like
data(1,
[1-[jane,claire]-[brian,stephen]
,2-[claire,jane]-[emma,william]
,3-[jane,claire]-[william,james]
,4-[jane,sophia,claire]-[brian,james,isabella]
,5-[claire]-[brian]
,6-[jane]-[emma]
]).
data(2,
[1-[jane,claire]-[brian,stephen]
,2-[claire,jane]-[emma,william]
,3-[jane,claire]-[william,james]
,4-[jane,sophia,claire]-[brian,james,isabella]
,5-[claire]-[brian]
,6-[jane]-[emma]
,7-[sally,sandy]-[grace,miriam]
]).
?- data(2,Ps),solve2(Ps,S).
Ps = [1-[jane, claire]-[brian, stephen], 2-[claire, jane]-[emma, william], 3-[jane, claire]-[william, james], 4-[jane, sophia, claire]-[brian, james, isabella], 5-[claire]-[brian], 6-[jane]-[emma], 7-[...|...]-[grace|...]],
S = [claire-[william, brian], jane-[james, emma, stephen], sally-[grace], sandy-[miriam], sophia-[isabella]].
This is my first CHR program, so I hope that someone will come and give me some advice on how to improve it.
My thinking is that you need to expand all the lists into facts. From there, if you know that a project has just one parent and one child, you can establish the parent relationship from that. Also, once you have a parent-child relationship, you can remove that set from the other facts in the other projects and reduce the cardinality of the problem by one. Eventually you will have figured out everything you can. The only difference between a completely determined dataset and an incompletely determined one is in how far that reduction can go. If it doesn't quite get there, it will leave around some facts so you can see which projects/parents/children are still creating ambiguity.
:- use_module(library(chr)).
:- chr_constraint project/3, project_parent/2, project_child/2,
project_parents/2, project_children/2, project_size/2, parent/2.
%% turn a project into a fact about its size plus
%% facts for each parent and child in this project
project(N, Parents, Children) <=>
length(Parents, Len),
project_size(N, Len),
project_parents(N, Parents),
project_children(N, Children).
%% expand the list of parents for this project into a fact per parent per project
project_parents(_, []) <=> true.
project_parents(N, [Parent|Parents]) <=>
project_parent(N, Parent),
project_parents(N, Parents).
%% same for the children
project_children(_, []) <=> true.
project_children(N, [Child|Children]) <=>
project_child(N, Child),
project_children(N, Children).
%% a single parent-child combo on a project is exactly what we need
one_parent @ project_size(Project, 1),
project_parent(Project, Parent),
project_child(Project, Child) <=>
parent(Parent, Child).
%% if I have a parent relationship for project of size N,
%% remove this parent and child from the project and decrease
%% the number of parents and children by one
parent_det @ parent(Parent, Child) \ project_size(Project, N),
project_parent(Project, Parent),
project_child(Project, Child) <=>
succ(N0, N),
project_size(Project, N0).
I ran this with your example by making a main/0 predicate to do it:
main :-
project(1, [jane, claire], [brian, stephen]),
project(2, [claire, jane], [emma, william]),
project(3, [jane, claire], [william, james]),
project(4, [jane, sophia, claire], [brian, james, isabella]),
project(5, [claire], [brian]),
project(6, [jane], [emma]).
This outputs:
parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).
To demonstrate incomplete determination, I added a seventh project:
project(7, [sally,sandy], [grace,miriam]).
The program then outputs this:
project_parent(7, sandy),
project_parent(7, sally),
project_child(7, miriam),
project_child(7, grace),
project_size(7, 2),
parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).
As you can see, any project_size/2 that remains tells you the cardinality of what remains to be solved (project seven has two parent/children relationships still remaining to be determined) and you get back exactly the parents/children that remain to be handled, as well as all of the parent/2 relations which could be determined.
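The same reduction can be sketched outside CHR. The following Python version is only an approximation of the rules above, not a port of them: it repeatedly removes already-known pairs from each project and records a pair whenever a project shrinks to one parent and one child. The function name `reduce_projects` and the dictionary layout are assumptions made for this sketch.

```python
def reduce_projects(projects):
    """projects: id -> (parents, children). Returns (known, leftover)."""
    remaining = {pid: (set(ps), set(cs)) for pid, (ps, cs) in projects.items()}
    known = {}  # child -> parent
    changed = True
    while changed:
        changed = False
        for parents, children in remaining.values():
            # Drop pairs that are already known from this project.
            for child in list(children):
                parent = known.get(child)
                if parent in parents:
                    parents.discard(parent)
                    children.discard(child)
                    changed = True
            # A project reduced to a single pair determines that pair.
            if len(parents) == 1 and len(children) == 1:
                (parent,), (child,) = parents, children
                known[child] = parent
                parents.clear()
                children.clear()
                changed = True
    # Whatever is left over is what the data could not determine.
    leftover = {pid: pc for pid, pc in remaining.items() if pc[0]}
    return known, leftover


projects = {
    1: (["jane", "claire"], ["brian", "stephen"]),
    2: (["claire", "jane"], ["emma", "william"]),
    3: (["jane", "claire"], ["william", "james"]),
    4: (["jane", "sophia", "claire"], ["brian", "james", "isabella"]),
    5: (["claire"], ["brian"]),
    6: (["jane"], ["emma"]),
    7: (["sally", "sandy"], ["grace", "miriam"]),
}
known, leftover = reduce_projects(projects)
print(known)
print(leftover)
```

The leftover dictionary plays the role of the remaining project_parent/2 and project_child/2 facts: it names the projects that would have to be resolved manually.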
I'm pretty happy with this outcome but hopefully others can come and improve my code!
Edit: my code has a shortcoming, identified on the mailing list: certain inputs fail to converge even though a solution can be computed. For instance:
project(1,[jane,claire],[brian, stephan]),
project(2,[jane,emma],[stephan, jones]).
For more information, see Ian's solution, which uses set intersection to determine the mapping.

Any algorithm to fill a space with smallest number of boxes

Consider a 3D grid, just like a checkerboard with an extra dimension. Now let's say that I have a certain number of cubes in that grid, each cube occupying a 1x1x1 cell. Let's say that each of these cubes is an item.
What I would like to do is replace/combine these cubes into larger boxes occupying any number of cells on the X, Y and Z axes, so that the resulting number of boxes is as small as possible while preserving the overall "appearance".
It's probably unclear, so I'll give a 2D example. Say I have a 2D grid containing several squares, each occupying a 1x1 cell. A letter represents the cells occupied by a given item, and each item has a different letter. In the first example we have 10 different items, each occupying a single 1x1 cell.
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | C | D | |
+---+---+---+---+---+---+
| | E | F | G | H | |
+---+---+---+---+---+---+
| | | K | L | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
That's my input data. I could now optimize it, i.e. reduce the number of items while still occupying the same cells, in several possible ways, one of which could be:
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here, instead of 10 items, I only have 3 (i.e. A, B and C). However, it can be optimized even more:
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here I only have two items, A and B. This is as optimized as this can be.
What I am looking for is an algorithm capable of finding the best item sizes and arrangement, or at least a reasonably good one, so that I have as few items as possible while occupying the same cells, and in 3D!
Is there such an algorithm? I'm sure there are some domains where that kind of algorithm would be useful, and I need it for a video game. Thanks!
Perhaps a simpler algorithm is possible, but a set partitioning formulation should suffice.
Min x1 + x2 + x3 + ...    // xi is 1 if the i-th candidate box is chosen, 0 otherwise
such that
x1 + x3 = 1               // if the 1st and 3rd candidate boxes are the ones containing the 1st item
x2 + x3 = 1               // if the 2nd and 3rd candidate boxes are the ones containing the 2nd item, and so on
x1, x2, x3, ... are binary
You have 1 constraint for each item. Each constraint stipulates that each item can be part of exactly one box. The objective minimizes the total number of boxes.
However, this is an NP-hard integer program.
The number of variables in this problem could be exponential. You need to have an efficient way of enumerating them -- that is figuring out when a contiguous box can be found that is capable of including all points in it. It is here that you have to take into account information such as whether the grid is 2d or 3d, how you define a contiguous "box", etc.
Such problems are usually solved by column-generation, where these columns of the integer program are dynamically generated on the fly.
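For small inputs the same question can be answered without an integer programming solver. The sketch below is a plain exhaustive search with pruning, not column generation, and it only handles the 2D case; the function name `min_boxes` and the (row, column) cell encoding are assumptions made for this example. It relies on one observation: the smallest uncovered cell in row-major order must be the top-left corner of the rectangle that covers it.

```python
def min_boxes(cells):
    """Return a smallest list of rectangles (r0, c0, r1, c1) tiling `cells`."""
    best = None

    def search(uncovered, chosen):
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return  # cannot beat the best tiling found so far
        if not uncovered:
            best = list(chosen)
            return
        # The first uncovered cell must be a rectangle's top-left corner.
        r0, c0 = min(uncovered)
        c1 = c0
        while (r0, c1) in uncovered:  # grow the rectangle to the right
            r1 = r0
            # Grow it downwards while every cell of the new row is free.
            while all((r1, c) in uncovered for c in range(c0, c1 + 1)):
                rect = {(r, c)
                        for r in range(r0, r1 + 1)
                        for c in range(c0, c1 + 1)}
                search(uncovered - rect, chosen + [(r0, c0, r1, c1)])
                r1 += 1
            c1 += 1

    search(frozenset(cells), [])
    return best


# The 2D example from the question: rows 1-2 span columns 1-4,
# row 3 spans columns 2-3.
cells = [(r, c) for r in (1, 2) for c in range(1, 5)] + [(3, 2), (3, 3)]
boxes = min_boxes(cells)
print(len(boxes), boxes)
```

The running time is exponential in the worst case, so this is only meant to make the "one box per item, minimize the number of boxes" objective concrete on toy grids. A 3D version would add a third growth loop along the Z axis.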
If I understand David Eppstein's explanation [1] (see section 3), then a solution can be found in a maximal independent set in the bipartite intersection graph of axis-aligned diagonals connecting one concave vertex to another. (This would be 2D. I'm not sure about 3D, although perhaps it involves evaluating hyperplanes instead of lines?)
In your example, there is only one such diagonal:
________
| |
|_x....x_|
|____|
The two xs represent connected concave vertices. The maximal independent set of edges here contains only one edge, splitting the polygon in two.
Here's another with only one axis-parallel edge connecting two concave vertices, x and x. This polygon, though, also has two concave vertices, a and b, that do not have an opposite, axis-parallel match. In that case, it seems to me, each concave vertex without a partner would split the polygon it's on in two (either vertically or horizontally):
____________
| |
| |x
| . |
| . |a
|___ . |
b| . |
| .___|
|________|x
results in 4 rectangles:
____________
| |
| |x
| . |
| ..|a
|___.......... |
b| . |
| .___|
|________|x
Here's one with two intersecting axis-parallel diagonals, each connecting two concave vertices, (x,x) and (y,y):
____________
| |
| |x_
| . |
| . |
|___ . . . .z. .|y
y| . |
| .____|
|________|x
In this case, as I understand, the intersection graph again contains only one independent set:
(y,z) (z,y) (x,z) (z,x)
yielding 4 rectangles as a solution.
Since I'm not completely sure how the "intersection graph" in the paper is defined, I would welcome any clarifying comments.
1. Graph-Theoretic Solutions to Computational Geometry Problems, David Eppstein (Submitted on 26 Aug 2009)

Feature Tracking by using Lucas Kanade algorithm

I am implementing the Lucas-Kanade feature tracker in C++, following the paper Lucas Kanade Feature Tracker (see page 6).
One thing is unclear to me about implementing equation 23 of the paper. I think the matrix G should be calculated inside the K loop, not outside it. When patch B lies on the border of frame j, it does not seem useful to use the full spatial gradient matrix G calculated before the K loop (as described in the paper). For frame j, G should be calculated only for the portion of patch B shown below.
Patch A Patch B
| |
| |
-----|--- -|-------
| |---| | | | |
| | | | |--| |
| |---| | | | |
| | |--| |
--------- ---------
Frame i Frame j

which algorithm should be chosen to complete this task

Hi
I am new to clustering and don't know which algorithm is appropriate for my task. Let me describe it:
1. First, a set of points and the distances between them are given.
2. Cluster them into several clusters based on distance.
3. A few new points are added; the distances among all of the points are again given.
4. Repeat step 2.
For example, first we have the following matrix:
| p1 | p2 | p3 |
---|----|----|----|
p1 | | | |
p2 | d1 | | |
p3 | d2 | d3 | |
after clustering, we add a new point and the distance is also given:
| p1 | p2 | p3 | p4 |
---|----|----|----|----|
p1 | | | | |
p2 | d1 | | | |
p3 | d2 | d3 | | |
p4 | d4 | d5 | d6 | |
The problem here is speed. I would like the clustering to be incremental, i.e. later clustering runs should reuse previous results, because points will be added frequently (whenever we find one). If we re-cluster all the points each time, then even if a single clustering run is O(n), the total clustering time will be O(n^2).
Any suggestion?
Thanks
One option is to fix the number of clusters (say, you have K clusters). Whenever you add a new point, you add it to the cluster whose center of gravity (average of the coordinates of the points in the cluster) is nearest to the point you added. If you recluster completely whenever the number of points in your space becomes a power of two (1, 2, 4, 8, 16, 32 ...), the amortized cost of reclustering is still O(n).
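A minimal sketch of the nearest-centre update follows. It assumes the points have coordinates (the question only supplies pairwise distances, so this is an extra assumption), and the class name `Clusters` and the helper `is_power_of_two` are made up for the example. Running sums are kept per cluster so that adding a point costs O(K) rather than a pass over all points.

```python
import math


class Clusters:
    """K clusters with running coordinate sums, updated one point at a time."""

    def __init__(self, initial):
        # initial: non-empty lists of points from a full clustering run.
        self.sums = [[sum(axis) for axis in zip(*pts)] for pts in initial]
        self.counts = [len(pts) for pts in initial]

    def centroid(self, k):
        # Centre of gravity: average of the coordinates in cluster k.
        return [s / self.counts[k] for s in self.sums[k]]

    def add(self, point):
        """Assign `point` to the cluster with the nearest centroid."""
        k = min(range(len(self.sums)),
                key=lambda j: math.dist(self.centroid(j), point))
        self.sums[k] = [s + x for s, x in zip(self.sums[k], point)]
        self.counts[k] += 1
        return k


def is_power_of_two(n):
    # Signals when a full re-clustering would be due under the rule above.
    return n > 0 and n & (n - 1) == 0


clusters = Clusters([[(0, 0), (0, 2)], [(10, 10)]])
first = clusters.add((1, 1))
second = clusters.add((9, 9))
print(first, second, sum(clusters.counts))
```

The full re-clustering step itself is not shown; any batch algorithm could be run whenever `is_power_of_two(total_points)` holds, and its output used to rebuild `Clusters`.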

The "Waiting lists problem"

A number of students want to get into sections for a class. Some are already signed up for one section but want to change sections, so they all get on the wait lists. A student can get into a new section only if someone drops from that section. No students are willing to drop a section they are already in unless they can be sure to get into a section they are waiting for. The wait list for each section is first come, first served.
Get as many students into their desired sections as you can.
The stated problem can quickly devolve into a gridlock scenario. My question is: are there known solutions to this problem?
One trivial solution would be to take each section in turn, force the first student from the waiting list into the section, and then check whether someone ends up dropping out when things are resolved (O(n) or more in the number of sections). This would work for some cases, but I think that there might be better options involving forcing more than one student into a section (O(n) or more in the student count) and/or operating on more than one section at a time (O(bad) :-)
Well, this just comes down to finding cycles in the directed graph of classes, right? Each link is a student who wants to go from one node to another, and any time you find a cycle, you delete it, because those students can resolve their needs with each other. You're finished when you're out of cycles.
OK, let's try. We have 8 students (1..8) and 4 sections. Each student is in a section and each section has room for 2 students. Most students want to switch, but not all.
In the table below, we see the students, their current section, their required section and their position in the queue (if any).
+------+-----+-----+-----+
| stud | now | req | que |
+------+-----+-----+-----+
| 1 | A | D | 2 |
| 2 | A | D | 1 |
| 3 | B | B | - |
| 4 | B | A | 2 |
| 5 | C | A | 1 |
| 6 | C | C | - |
| 7 | D | C | 1 |
| 8 | D | B | 1 |
+------+-----+-----+-----+
We can present this information in a graph:
+-----+ +-----+ +-----+
| C |---[5]--->1| A |2<---[4]---| B |
+-----+ +-----+ +-----+
1 | | 1
^ | | ^
| [1] [2] |
| | | |
[7] | | [8]
| V V |
| 2 1 |
| +-----+ |
\--------------| D |--------------/
+-----+
We try to find a section with a vacancy, but we find none. Because all sections are full, we need a dirty trick. So let's take a random section with a non-empty queue, in this case section A, and assume it has an extra position. This means student 5 can enter section A, leaving a vacancy at section C which is taken by student 7. This leaves a vacancy in section D which is taken by student 2. We now have a vacancy at section A. But we assumed that section A has an extra position, so we can remove this assumption and have gained a simpler graph.
If the path never returns to section A, undo the moves and mark A as an invalid starting point. Retry with another section.
If there are no valid sections left we are finished.
Right now we have the following situation:
+-----+ +-----+ +-----+
| C | | A |1<---[4]---| B |
+-----+ +-----+ +-----+
| 1
| ^
[1] |
| |
| [8]
V |
1 |
+-----+ |
| D |--------------/
+-----+
We repeat the trick with another random section, and this solves the graph.
If you start with several students currently not assigned, you add an extra dummy section as their starting point. Of course, this means that there must be vacancies in some sections, or the problem is not solvable.
Note that due to the order in the queue, it can be possible that there is no solution.
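The procedure can be sketched in Python. This is an illustration under simplifying assumptions, not a complete scheduler: every section is full, only the student at the head of a queue may move, and the function name `resolve` and its data layout are made up for the example. Instead of moving students and undoing the moves, it first follows the chain "section, head of its queue, section that student would vacate" and applies the moves only if the chain returns to its starting section.

```python
def resolve(current, queues):
    """current: student -> section; queues: section -> FIFO list of students.

    Returns the final placement and the list of (student, section) moves.
    """
    current = dict(current)
    queues = {s: list(q) for s, q in queues.items()}
    moves = []
    progress = True
    while progress:
        progress = False
        for start in queues:
            path, seen, section = [], set(), start
            # Follow queue heads until a section repeats or a queue is empty.
            while section not in seen and queues.get(section):
                seen.add(section)
                student = queues[section][0]
                path.append((student, section))
                section = current[student]  # the seat this student frees
            if path and section == start:
                # The chain closed on the starting section: rotate everyone.
                for student, target in path:
                    queues[target].pop(0)
                    current[student] = target
                    moves.append((student, target))
                progress = True
                break
    return current, moves


# Data from the table above (students 3 and 6 do not want to move).
current = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"}
queues = {"A": [5, 4], "B": [8], "C": [7], "D": [2, 1]}
final, moves = resolve(current, queues)
print(moves)
print(final)
```

On the example, the first rotation moves students 5, 7 and 2, and the second moves students 4, 8 and 1. Genuine vacancies and the dummy section for unassigned students are not handled here.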
This is actually a graph problem. You can think of each of these waiting list dependencies as edges on a directed graph. If this graph has a cycle, then you have one of the situations you described. Once you have identified a cycle, you can choose any point to "break" the cycle by "overfilling" one of the classes, and you will know that things will settle correctly because there was a cycle in the graph.
