Relational algebra - what is the proper way to represent a 'having' clause?

This is a small part of a homework question so I can understand the whole.
SQL query to list car prices that occur more than once:
select car_price from cars
group by car_price
having count (car_price) > 1;
The general form of this in relational algebra is
γ gl, al (R)
where γ (gamma) is the grouping operator, gl is the list of attributes to group by, and al is a list of aggregations.
The relational algebra:
γ car_price, count(car_price) (cars)
How is the having clause written in that statement? Is there a shorthand? If not, do I just need to select from that relation? Like this?
σ count(car_price) > 1 (γ car_price, count(car_price) (cars))
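A minimal sketch of that idea in Python, assuming relations are modelled as sets of tuples. The helpers group_count, select and project are hypothetical stand-ins for γ, σ and π, and the car data is made up. The HAVING clause becomes a selection on the output of the grouping, followed by a projection back onto car_price.

```python
from collections import Counter

# A relation is modelled as a frozenset of tuples; attribute positions are fixed by convention.
# cars(car_id, car_price): hypothetical sample data.
cars = frozenset({(1, 10000), (2, 15000), (3, 10000), (4, 20000), (5, 15000)})

def group_count(rel, key_index):
    """γ key, count(key) -> cnt (rel): one (key, cnt) tuple per distinct key value."""
    counts = Counter(row[key_index] for row in rel)
    return frozenset(counts.items())

def select(rel, predicate):
    """σ predicate (rel): keep the tuples that satisfy the predicate."""
    return frozenset(row for row in rel if predicate(row))

def project(rel, indices):
    """π indices (rel): keep only the listed attribute positions (duplicates vanish)."""
    return frozenset(tuple(row[i] for i in indices) for row in rel)

# π car_price (σ cnt > 1 (γ car_price, count(car_price) -> cnt (cars)))
grouped = group_count(cars, key_index=1)
repeated_prices = project(select(grouped, lambda row: row[1] > 1), [0])
print(sorted(repeated_prices))
```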

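select car_price
from (select car_price, count(car_price) as cnt from cars group by car_price) as grouped
where cnt > 1;
Yes: apply the selection to the result of the grouping. This works because of relational closure: every operator returns a relation, so its result can be the input of another operator.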
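One way to check that the nested form matches the HAVING query, assuming a throwaway in-memory SQLite database and a hypothetical cars table with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cars (car_id integer primary key, car_price integer)")
conn.executemany(
    "insert into cars (car_price) values (?)",
    [(10000,), (15000,), (10000,), (20000,), (15000,)],
)

# The original query with a HAVING clause.
having = conn.execute(
    "select car_price from cars group by car_price having count(car_price) > 1"
).fetchall()

# The same result obtained by selecting from the grouped relation.
nested = conn.execute(
    "select car_price from ("
    "  select car_price, count(car_price) as cnt from cars group by car_price"
    ") as grouped where cnt > 1"
).fetchall()

print(sorted(having), sorted(nested))
```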

For a more or less precise answer to the actual question asked, "Relational algebra - what is the proper way to represent a 'having' clause?", it needs to be stated first that the question itself seems to suggest, or presume, that there exists such a thing as "THE" relational algebra, and that presumption is simply untrue!
An algebra is a set of operators, and anyone can define any set of operators they like, meaning anyone can define any algebra they like! In his most recent publication, Hugh Darwen mentions that RESTRICT is not a fundamental operator of the algebra, though many others do consider it one.
Especially with respect to aggregations and summaries, there is little consensus as to how those should be incorporated in a relational algebra. Defining operators such as COUNT() (which take a relation as an argument and return an integer) as part of the algebra might be problematic with respect to the closure property of the algebra, precisely because such operators do not return a relation.
So the sorry, but nevertheless most appropriate, answer here seems to be that a conclusive answer to this question is almost impossible to give.
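To make the closure point concrete, here is a small Python sketch, again modelling relations as frozensets of tuples. The function names are illustrative only. A COUNT that returns a plain integer leaves the algebra. A grouping with an empty grouping list instead returns a one-tuple relation, which further operators can consume.

```python
def count(rel):
    """COUNT as a function: relation in, integer out (not closed over relations)."""
    return len(rel)

def summarize_count(rel):
    """γ with an empty grouping list and count(*) -> n: the result is a one-tuple relation."""
    return frozenset({(len(rel),)})

def select(rel, predicate):
    """σ predicate (rel)."""
    return frozenset(row for row in rel if predicate(row))

cars = frozenset({(1, 10000), (2, 15000), (3, 10000)})

n = count(cars)                   # an int: select(n, ...) would make no sense
summary = summarize_count(cars)   # a relation: it can be restricted like any other
print(n, select(summary, lambda row: row[0] > 1))
```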

Related

How to SortBy two parameters in OCL?

I need to sort a collection of Persons by two parameters: first by surname and then by name. How can I do something like this in OCL?
The sortedBy function sorts elements using the criterion expressed in its body and the < relation between the gathered results.
In your case, assuming that you have a surname attribute, the following statement sorts the collection c using the < operator on each gathered surname (that is, < on strings):
c->sortedBy(p | p.surname)
One idea is to compute a single string key by concatenating the surname and the name. Thus, if you have:
George Smith
Garry Smith
George Smath
The comparison would then be made between "Smith_George", "Smith_Garry" and "Smath_George", which in lexicographical order would be sorted as:
George Smath (Smath_George)
Garry Smith (Smith_Garry)
George Smith (Smith_George)
Finally, the OCL query would be (assuming surname and name are existing attributes):
c->sortedBy(p | p.surname + '_' + p.name)
This little trick does the job, but it is not "exactly" a two-key comparison for sortedBy.
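For comparison, the same compound-key trick in Python. This is a sketch; the Person record and its field names are assumptions, and the sample names come from the example above.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    surname: str

people = [Person("George", "Smith"), Person("Garry", "Smith"), Person("George", "Smath")]

# Python analogue of c->sortedBy(p | p.surname + '_' + p.name):
# one compound string key, compared lexicographically.
by_compound_key = sorted(people, key=lambda p: p.surname + "_" + p.name)

for p in by_compound_key:
    print(p.name, p.surname)
```

One caveat: the separator only preserves the intended order if it sorts before any character that can follow a surname. In ASCII, '_' sorts after the uppercase letters, so a surname that continues with an uppercase letter could be misplaced.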
The OCL sortedBy(lambda) looks very different from Java's sort(comparator), since it requires a projection of each object as the metric for sorting. However, if the projection is self, you get the Java-like functionality: with sortedBy(p | p), the sorting depends on the < operation defined for p.
To facilitate this, the Eclipse OCL prototype of a future OCL introduces an OclComparable type with a compareTo method, which enables all relational operations to be realized, provided that your custom type extends the OclComparable type.
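A rough Python analogue of the OclComparable idea. This is not the Eclipse OCL API; compare_to is a hypothetical method name. The idea is to define one three-way comparison and derive < from it, so that sorting by the object itself, as in sortedBy(p | p), uses the type's own ordering.

```python
from functools import total_ordering

@total_ordering
class Person:
    def __init__(self, name: str, surname: str) -> None:
        self.name = name
        self.surname = surname

    def compare_to(self, other: "Person") -> int:
        """Three-way comparison: negative, zero or positive (surname first, then name)."""
        mine, theirs = (self.surname, self.name), (other.surname, other.name)
        return (mine > theirs) - (mine < theirs)

    # total_ordering derives <=, > and >= from __eq__ and __lt__.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, Person) and self.compare_to(other) == 0

    def __lt__(self, other: "Person") -> bool:
        return self.compare_to(other) < 0

people = [Person("George", "Smith"), Person("Garry", "Smith"), Person("George", "Smath")]
# Like sortedBy(p | p): the ordering comes from the type itself.
print([f"{p.name} {p.surname}" for p in sorted(people)])
```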
(A similar OclSummable type, with zero() and sum() operations, supports Collection::sum() generically; e.g. String realizes sum as concatenation.)
Thanks, you inspired me. I just raised http://issues.omg.org/browse/OCL25-213, whose text is:
The sortedBy iteration provides an elegant solution to a sort problem in which the sort metric is a projection of the sorted object. Thus sortedBy(p|p.name) or just sortedBy(name) is short and avoids the opportunities for typos from a more conventional exposition involving comparison of two objects. The natural solution may well be an efficient one for large collections with non-trivial metrics.
However, the sortedBy solution is unfamiliar, and so confusing for newcomers, and it is unsuitable for multi-key sorting, for which an artificial compound single key may need to be constructed.
One solution might be to provide a more conventional iterator such as sort(p1, p2 | comparison-expression) allowing a two key sort:
sort(p1, p2 | let diff1 = p1.key1.compareTo(p2.key1) in
if diff1 <> 0 then diff1 else p1.key2.compareTo(p2.key2) endif)
However this has poor readability and ample opportunities for typos.
Alternatively sortedBy with a Tuple-valued metric might support multiple keys as:
sortedBy(Tuple{first=key1,second=key2})
(The alphabetical order of the Tuple part names determines the priority.)
(Since sortedBy is declaratively clear and compact, implementations for which it would be inefficient, such as small collections or trivial metrics, can be optimized into their sort() equivalents.)
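Both proposals in the issue have direct Python counterparts, sketched here on the same sample data (field names assumed). The first is a comparator in the style of sort(p1, p2 | ...), adapted with functools.cmp_to_key. The second is a tuple-valued key in the style of sortedBy(Tuple{...}). Note that Python tuples compare positionally, whereas the proposal orders Tuple parts by name.

```python
from functools import cmp_to_key
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    surname: str

def compare(a: str, b: str) -> int:
    """Three-way string comparison, standing in for compareTo."""
    return (a > b) - (a < b)

def two_key(p1: Person, p2: Person) -> int:
    # sort(p1, p2 | let diff1 = ... in if diff1 <> 0 then diff1 else ... endif)
    diff1 = compare(p1.surname, p2.surname)
    return diff1 if diff1 != 0 else compare(p1.name, p2.name)

people = [Person("George", "Smith"), Person("Garry", "Smith"), Person("George", "Smath")]

by_comparator = sorted(people, key=cmp_to_key(two_key))
# sortedBy(Tuple{first=surname, second=name}): tuples compare key by key.
by_tuple_key = sorted(people, key=lambda p: (p.surname, p.name))

print(by_comparator == by_tuple_key)
```

The tuple key is shorter and leaves less room for typos, which matches the issue's argument for the Tuple-valued metric.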

expressing constraints in relational algebra

I need to express this constraint in relational algebra:
I have a table with one column containing all possible values: ALL_VAL,
and a table with the values from ALL_VAL that do not match some rule: NOT_FIT_VAL,
and I can calculate FIT_VAL = ALL_VAL - NOT_FIT_VAL.
What I need is a constraint: FIT_VAL must contain at least one item.
I am using the not-equal sign with the empty set:
ALL_VAL,
NOT_FIT_VAL
FIT_VAL = ALL_VAL - NOT_FIT_VAL
FIT_VAL <> {empty}
But I am not sure that <> (not equal) is allowed at all in relational algebra;
I have not found a single book or article that shows an example of it or says that I can use it.
I would like some clarification about it, and the correct expression.
Thank you.
Strictly speaking, the expression "FIT_VAL <> {empty}" is not a relational expression (it does not produce a relation, but a truth value instead), therefore it is a bit problematic to consider such expressions as being "valid relational algebra expressions".
But that is strictly speaking, and I don't know how much slack your textbook cuts its readers/users in that area. Under the "strictly speaking" approach, it is even outright impossible to use relational algebra to define constraints, because the definition of a constraint must, almost by definition, produce a boolean result (does the database satisfy it or not?). That is probably why it is so exceptional to see relational algebra being used to express/define database constraints!
Another approach to defining database constraints using relational algebra is to define a relational expression that plays the role of a "faults expression", and then implicitly, tacitly assume that the rule is that the result of evaluating this expression must at all times be empty. But that is (AFAIK) an entirely private approach of mine, and I'd be surprised if you had also found it in a textbook.
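An illustration of both readings applied to the question's constraint, in Python with relations as frozensets of tuples. The sample values are made up, and the faults expression below is one possible construction, not taken from a textbook. The "strict" version is a boolean test that sits outside the algebra. The faults version stays inside it. Projecting FIT_VAL onto the empty attribute list yields the one-tuple, zero-attribute relation (often called TABLE_DEE) when FIT_VAL is non-empty, and the empty relation otherwise. So DEE minus that projection is empty exactly when the constraint holds.

```python
# Relations as frozensets of tuples; ALL_VAL and NOT_FIT_VAL have one attribute.
ALL_VAL = frozenset({(1,), (2,), (3,)})
NOT_FIT_VAL = frozenset({(1,), (3,)})

FIT_VAL = ALL_VAL - NOT_FIT_VAL   # relational difference

# Reading 1: a truth-valued comparison, which is not itself a relational expression.
constraint_holds = FIT_VAL != frozenset()

# Reading 2: a "faults expression" that must evaluate to the empty relation.
DEE = frozenset({()})             # no attributes, exactly one (empty) tuple

def project_no_attributes(rel):
    """π over the empty attribute list: DEE if rel has any tuple, otherwise empty."""
    return frozenset(() for _ in rel)

faults = DEE - project_no_attributes(FIT_VAL)

print(constraint_holds, faults)
```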

Identify similar expressions?

How can I verify that two expressions are logically equivalent?
For example:
(a+b) <=> (b+a) or (a+(b+c)) <=> ((a+b)+c) or (a && b) <=> (b && a), etc.
I want an optimized solution that will identify the duplicates among thousands of expressions.
Contrary to a comment by #japreiss, there is no open research question here. In the context of propositional expressions like the examples given, logical equivalence is very well understood. (I'm assuming OP is using + to mean logical OR rather than numeric addition.)
A question for the OP: are you interested in doing this by hand or automating it via programming code?
There are multiple approaches, but the one taught in most introductory logic courses and textbooks is to construct separate truth tables for the left-hand side and the right-hand side, and see whether the final values on each row are identical.
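The truth-table method can be sketched in a few lines of Python. This assumes each expression is given as a Python function of a variable-assignment dict, with + read as logical OR; a real system would parse the expressions first.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Env = Dict[str, bool]

def equivalent(lhs: Callable[[Env], bool], rhs: Callable[[Env], bool],
               variables: Sequence[str]) -> bool:
    """Build the truth table row by row and compare both sides on every row."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if lhs(env) != rhs(env):
            return False
    return True

# (a+(b+c)) <=> ((a+b)+c), reading + as logical OR
assoc = equivalent(lambda e: e["a"] or (e["b"] or e["c"]),
                   lambda e: (e["a"] or e["b"]) or e["c"],
                   ["a", "b", "c"])
# a && b versus a || b: not equivalent
and_vs_or = equivalent(lambda e: e["a"] and e["b"], lambda e: e["a"] or e["b"], ["a", "b"])
print(assoc, and_vs_or)
```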
If you have a computer system that can already evaluate the truth of an expression like (a+(b+c)) and ((a+b)+c), then that same system can probably be given the whole expression (a+(b+c)) <=> ((a+b)+c) and can tell you whether or not it is a tautology.
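With regard to comparing thousands of expressions to find duplicates, one technique is to convert each expression to a canonical form and then just use string comparison to see which ones are identical. Plain conjunctive normal form is not unique, so equivalent expressions can yield different CNF strings; a canonical variant is needed, for example the full (canonical) CNF over the same fixed, sorted list of variables, with literals and clauses written in sorted order.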
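A sketch of the bucketing idea in Python. It swaps in a different canonical form from the one named above: the truth-table column over a fixed, sorted variable list, which is equally canonical and simpler to compute. Expressions are again assumed to be Python functions of an assignment dict. The cost is linear in the number of expressions but exponential in the number of variables. With many variables, a different canonical form such as a reduced ordered BDD is the usual choice.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Env = Dict[str, bool]

def signature(expr: Callable[[Env], bool], variables: Sequence[str]) -> Tuple[bool, ...]:
    """The truth-table column over a fixed variable order: equal iff logically equivalent."""
    return tuple(bool(expr(dict(zip(variables, values))))
                 for values in product([False, True], repeat=len(variables)))

def group_duplicates(exprs: Dict[str, Callable[[Env], bool]],
                     variables: Sequence[str]) -> List[List[str]]:
    """Bucket expression labels by signature and return the buckets with duplicates."""
    order = sorted(variables)
    buckets: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
    for label, expr in exprs.items():
        buckets[signature(expr, order)].append(label)
    return [labels for labels in buckets.values() if len(labels) > 1]

exprs = {
    "a+b": lambda e: e["a"] or e["b"],
    "b+a": lambda e: e["b"] or e["a"],
    "a&&b": lambda e: e["a"] and e["b"],
    "b&&a": lambda e: e["b"] and e["a"],
    "a": lambda e: e["a"],
}
print(group_duplicates(exprs, ["a", "b"]))
```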

Maximum difference between columns using relational algebra

Is it possible to obtain the maximum difference between two columns (for example starting and ending weights)?
Right now I'm leaning towards no, as this would require a new column with the difference between the two columns for each row, and then taking the max of that. Doing it the way I originally intended doesn't work either, since arithmetic operations are not allowed in the conditions of select operations (e.g. SIGMA (c1 - c2 < c3 - c4)(Table) is not allowed).
Disclosure: this is part of a homework question.
It can be done, exactly in the way you planned, but you need generalized projection for that. The generalized projection is the operator
Π(E1, E2,..., En)R
where R is a relation, and E1...En are expressions in the form a⊕b, where a and b are attributes of R or constants, and ⊕ is an arbitrary binary operator between them. The result is a relation with attributes E1...En.
This would allow you to project the differences into a new relation (R' := Π(x-y)R), then find the maximum on that, just as you planned.
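A minimal Python sketch of that plan, assuming a relation R(x, y) stored as a frozenset of tuples with made-up start and end weights. The helper names are hypothetical. The generalized projection computes x - y per tuple, and a max aggregation over the result gives the answer.

```python
# Relation R(x, y) as a frozenset of (x, y) tuples; hypothetical start/end weights.
R = frozenset({(80, 72), (95, 81), (70, 69)})

def generalized_project(rel, *exprs):
    """Π E1, ..., En (rel): each Ei is a function of a tuple, e.g. lambda t: t[0] - t[1]."""
    return frozenset(tuple(e(t) for e in exprs) for t in rel)

def max_of(rel, index=0):
    """Aggregate max over one attribute, returned as a one-tuple relation."""
    return frozenset({(max(t[index] for t in rel),)})

R_prime = generalized_project(R, lambda t: t[0] - t[1])   # R' := Π x-y (R)
print(max_of(R_prime))
```

Because a relation is a set, two rows with the same difference collapse into one tuple of R'; that does not affect the maximum, but it would matter for a sum or a count.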
If we're not allowed to use generalized projection, then I think we have no means to actually subtract one attribute from another, or to calculate anything from them: the definition of projection allows only attribute names, and the definition of selection allows only expressions of the form aθb, where a and b are attributes or constants and θ is a binary comparison operator. (This is logical in its way, because if we have a relation R(X,Y), then we have no idea about the types of X and Y, which makes operations on them quite meaningless.)
I think generalized projection is a great extension to relational algebra. It's obviously immensely useful in real life, and it can be defended even from a more scientific point of view: if we allow comparison operators on the values, such as "X > 50", then we have already made assumptions about the type, which renders that point moot. Your instructor may disagree, though.
If you're looking to do this in the real world, you should be able to do this with a subquery (or a view, which amounts to much the same thing), something like:
select max(diff) from (
  select high - low as diff from blah blah blah
) as diffs;
Whether this applies to the abstract world of relational algebra, I couldn't say. I'm too busy fixing those damn real-world problems :-)
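A quick check of the subquery approach, assuming an in-memory SQLite database and a hypothetical readings table in place of "blah blah blah". Many databases require the alias on a derived table, which is why the query names it diffs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table readings (high integer, low integer)")   # hypothetical table
conn.executemany("insert into readings values (?, ?)", [(80, 72), (95, 81), (70, 69)])

(max_diff,) = conn.execute(
    "select max(diff) from (select high - low as diff from readings) as diffs"
).fetchone()
print(max_diff)
```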

How to combine and optimize a predicate, generally?

I'm doing some work on a Complex Event Processing system. It supports filtering of sets of records based on members of those records, using a query language. The language supports logical, arithmetic and user-defined operators over arbitrary members.
Here's an example of a supported query:
( MemberA > MemberB ) &&
( #in MemberC { "str1", "str2" } ) &&
( com.foo.Bar.myPred( MemberD, MemberE ) )
My problem is that I want to combine queries into one super query, and then I want to optimize that super query to eliminate redundancies, tautologies and contradictions. e.g. I want to take
A > 0
and combine it with
A > 1
which is quite easy:
A > 0 || A > 1
but then I want to optimize it so that it reduces to
A > 0
If there are any URLs or books that discuss this general topic, I'd appreciate knowing about them.
Books? I think there are a few, but you should most likely look for articles in this area.
What you may want to look at are SMT solvers that can work with your domain of queries. You feed them a mathematical formalization of your expression language and state axioms for the relations you support. They can then, for instance, decide whether the statement "one predicate implies another" is a tautology; in your example, A > 1 implies A > 0, so A > 0 || A > 1 can be reduced to A > 0.
Note that automated solutions to this task are necessarily partial: with arbitrary arithmetic and user-defined operators, deciding such implications can be beyond the theoretical capabilities of Turing machines (i.e. computers). You won't find a single right solution to your problem.
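To show the kind of rewrite a solver would justify, here is a deliberately narrow Python sketch. It is not an SMT solver; it is a hand-written rule for one special case, and GreaterThan is a hypothetical representation of an atomic predicate. In a disjunction, any disjunct that implies another disjunct is redundant, because (p || q) is equivalent to q whenever p implies q. For strict lower bounds on the same member, p implies q exactly when p's bound is at least q's.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class GreaterThan:
    """Hypothetical representation of the atomic predicate `member > bound`."""
    member: str
    bound: float

def implies(p: GreaterThan, q: GreaterThan) -> bool:
    """p implies q when both constrain the same member and p's bound is at least q's."""
    return p.member == q.member and p.bound >= q.bound

def simplify_or(disjuncts: List[GreaterThan]) -> List[GreaterThan]:
    """Drop every disjunct that implies another one: (p || q) == q whenever p implies q."""
    unique = list(dict.fromkeys(disjuncts))          # remove exact duplicates, keep order
    return [p for p in unique
            if not any(q != p and implies(p, q) for q in unique)]

print(simplify_or([GreaterThan("A", 0), GreaterThan("A", 1)]))
```

Extending this by hand to conjunctions, other operators and user-defined predicates quickly becomes the job an SMT solver is designed for.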
