Relational algebra - recode column values - relational-algebra

Suppose I have a table 'animals' whose rows represent different animals, with a column species that might have values like 'cat', 'dog', 'horse', 'cow', etc. Suppose I am only interested in whether the animal is a dog or not. In SQL (at least in MySQL) I can write a query like select (species='dog') as isDog from animals to return 1 for dogs and 0 otherwise. How can I express this in RA? It is not selection, because we are not limiting rows. Can I use the project operator even though my expression (species='dog') is not an attribute as such? Or how should I deal with this?
EDIT:
I want to achieve what would result by using the project-operator on a column that does not exist but is rather based on the truth value of a statement. For example the table animals containing rows with just one column 'species' having rows: cat, dog, horse, cow. I need the boolean value that could be renamed to 'isDog' that would result in values 0,1,0,0 (1=true, 0=false). I get this information in MySQL by selecting (species='dog') as isDog and I wonder if it is valid RA to use the project-operator with (species='dog') to pick such a dynamically created column, or is there some other way to deal with this?

TL;DR To introduce specific values into a relational algebra expression you have to have a way to write table literals. Usually the necessary operators are not made explicit, but on the other hand algebra exercises frequently use some kind of notation for example values.
For your case using the simplest additional relations (with an ad hoc table literal notation similar to SQL VALUES):
(restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
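The expression above can be tried out with a minimal Python sketch. It assumes relations are modelled as lists of dicts; the helper names `restrict`, `natural_join` and `union`, the NAME attribute and the sample rows are all hypothetical and chosen only for illustration.

```python
# Relations modelled as lists of dicts (attribute name -> value); a sketch only.

def restrict(relation, predicate):
    """Keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]

def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def union(left, right):
    """Rows of either relation, without duplicates."""
    result = []
    for row in left + right:
        if row not in result:
            result.append(row)
    return result

# Hypothetical sample data; NAME is an assumed extra attribute.
animals = [
    {"NAME": "Tom", "SPECIES": "cat"},
    {"NAME": "Rex", "SPECIES": "dog"},
    {"NAME": "Ed", "SPECIES": "horse"},
    {"NAME": "Daisy", "SPECIES": "cow"},
]

is_dog_1 = [{"ISDOG": 1}]  # plays the role of TABLE{ISDOG}{<1>}
is_dog_0 = [{"ISDOG": 0}]  # plays the role of TABLE{ISDOG}{<0>}

result = union(
    natural_join(restrict(animals, lambda r: r["SPECIES"] == "dog"), is_dog_1),
    natural_join(restrict(animals, lambda r: r["SPECIES"] != "dog"), is_dog_0),
)
```

Because the one-row literal tables share no attribute with Animals, each natural join here degenerates to a cross join, which simply appends the ISDOG value to every restricted row.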
If you want a reference to a relational algebra with an operator for general calculations see author Chris Date's EXTEND operator and his classic textbook An Introduction to Database Systems, 8th Edition.
(More generally: To output a table with rows with new columns with values that are arbitrary functions of values in input rows we have to have access to tables corresponding to operators. Either we use a table literal or we assume a boolean function and use its name as a table name.)
But it turns out that the relational model was designed so that:
Every algebra operator corresponds to a certain logic operator.
NATURAL JOIN & AND
RESTRICT theta & AND theta
UNION & OR
MINUS & AND NOT
PROJECT all but C & EXISTS C
etc
Every nested algebra expression corresponds to a certain nested logic expression.
Animals &
"animal named NAME is AGE years old ... and is of species SPECIES"
restrict SPECIES=dog Animals &
"animal named NAME is AGE years old ... and is of species SPECIES" AND SPECIES=dog
A proposition is a statement. A predicate is a statement template. If each base table holds the rows that make a true proposition from a predicate parameterized by its columns then a query holds the rows that make a true proposition from a corresponding predicate parameterized by its columns.
-- table of rows where
animal named NAME is AGE years old ... and is of species SPECIES
Animals
-- table of rows where
animal named NAME is AGE years old ... and is of species SPECIES
AND if SPECIES=dog then ISDOG=1 ELSE ISDOG=0
-- ie rows where
animal named NAME is AGE years old ... and is of species SPECIES
AND (SPECIES=dog AND ISDOG=1 OR SPECIES<>dog AND ISDOG=0)
-- ie rows where
animal named NAME is AGE years old ... and is of species SPECIES
AND SPECIES=dog AND ISDOG=1
OR animal named NAME is AGE years old ... and is of species SPECIES
AND SPECIES<>dog AND ISDOG=0
(restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
So you can just use logic, the language of precision in engineering (including software-), science (including computer-) and mathematics, to describe your result tables.
-- table of rows where
animal named NAME is AGE years old ... and is of species SPECIES
AND ISDOG=(if SPECIES=dog then 1 else 0)
So you can use table expressions and/or logic expressions in specifications, whichever happens to be clearer, on a (sub)expression-by-(sub)expression basis.
Animals
natural join
table of rows where if SPECIES=dog THEN ISDOG=1 ELSE ISDOG=0
(The table corresponding to that IF expression has a row for every string, and the 'dog' row is the only one with a 1.)
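As a rough illustration of joining Animals to a table that represents the IF expression, here is a small Python sketch. The real function table has a row for every string, so the sketch assumes a finite stand-in restricted to the species values that actually occur; the `natural_join` helper and the list-of-dicts representation are hypothetical.

```python
def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

# Sample rows taken from the question: cat, dog, horse, cow.
animals = [{"SPECIES": s} for s in ("cat", "dog", "horse", "cow")]

# Finite stand-in for the (conceptually infinite) function table:
# one row per species value present, with ISDOG = 1 only for 'dog'.
species_values = {row["SPECIES"] for row in animals}
is_dog_table = [
    {"SPECIES": s, "ISDOG": 1 if s == "dog" else 0}
    for s in sorted(species_values)
]

# SPECIES is the shared attribute, so each animal row picks up exactly one ISDOG.
extended = natural_join(animals, is_dog_table)
```

Each animal row matches exactly one row of the function table, so the join adds a column without adding or removing rows.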
(Nb SQL ON & WHERE have this form of a table on the left and predicate with functions on the right.)
An algebra expression calculates the rows that satisfy its corresponding logic expression.
It might not be obvious to you what equivalent relation expressions correspond to what equivalent logic expressions and vice versa. But all that matters is that your clients understand the algebra and/or logic in the specification and your programmers can write an equivalent SQL expression.
Instead of natural joining a table representing arguments to a table representing a function in this way, you can 1) cross join each one-row table holding a function result to the restriction of the argument table to the rows that give that result, then 2) union the cross joins.
(restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
See also: Is multiplication allowed in relational algebra?
Re querying relationally: Relational algebra for banking scenario
Re understanding the semantics of relational algebra & SQL (plus more links): Is there any rule of thumb to construct SQL query from a human-readable description?
PS SQL SELECT does things that projection does, but it also does other things that aren't projection, which get done in the algebra by rename, join and/or table literals. What you want isn't projection. It is called EXTEND by author Chris Date. I would advise anyone to use/reference Date's algebra. That said, adding RESTRICT/WHERE and EXTEND on arbitrary logic expressions (wffs & terms) begs the question of how one deals with the logic expressions algebraically. This answer explains that/how you can always algebraically express the logic expressions given literal and/or operator tables.

Related

Joining tables with table type slows down calculation?

I have a big calculation that joins together about 10 tables and calculates some values from the result. I want to write a function that allows me to replace one of the joined tables (let's call it Table A) with a table (type) I give as an input parameter.
I have defined row and table types for table A like
create or replace TYPE t_tableA_row AS OBJECT(*All Columns of Table A*);
create or replace TYPE t_tableA_table as TABLE OF t_tableA_row;
And the same for the types of the calculation I need as an output of the function.
My function looks like this
create or replace FUNCTION calculation_VarInput (varTableA t_tableA_table)
RETURN t_calculationResult_table AS
result_ t_calculationResult_table;
BEGIN
SELECT t_calculationResult_row (*All Columns of Calculation Result*)
BULK COLLECT INTO result_
FROM (*The calculation*);
RETURN result_;
END;
If I test this function with the normal calculation that just uses Table A (ignoring the input parameter), it works fine and takes about 3 seconds. However, if I replace Table A with varTableA (the input parameter that is a table type of Table A), the calculation takes so long I have never seen it finish.
When I use table A for the calculation it looks like this
/*Inside the calculation*/
*a bunch of tables being joined*
JOIN TableA A On A.Value = B.SomeOtherValue
JOIN *some other tables*
When I use varTableA it's
/*Inside the calculation*/
*a bunch of tables being joined*
JOIN TABLE(varTableA ) A On A.Value = B.SomeOtherValue
JOIN *some other tables*
Sorry for not posting the exact code but the calculation is huge and would really bloat this post.
Any ideas why using the table type when joining makes the calculation so much slower when compared to using the actual table?
Your function encapsulates some selection logic in a function and so hides information from the optimizer. This may lead the optimizer to make bad or inefficient decisions.
Oracle has gathered statistics for TableA, so the optimizer knows how many rows it has, which columns are indexed and so on. Consequently it can figure out the best access path for the table. It has no stats for TABLE(varTableA ), so it assumes it will return 8192 (i.e. 8k) rows. This could change the execution plan if, say, the original TableA returned 8 rows, or 80000. You can check this easily enough by running EXPLAIN PLAN for both versions of the query.
If that is the problem, add a /*+ cardinality */ hint to the query that accurately reflects the number of rows in the function's result set. The hint (a hint, not a function) tells the optimizer the number of rows it should use in its calculation.
"I don't want to actually change the values in my tables permanently, I just want to know what the calculation result would be if some values were different."
Why not use a view instead? A simple view which selects from TableA and applies the required modifications in its projection. Of course I know nothing about your data and how you want to manipulate it, so this may be impractical for all sorts of reasons. But it's where I would start.

Oracle SQL sub query vs inner join

First, I read about the SELECT statement in the Oracle docs.
I have some questions about Oracle's SELECT behaviour when my query contains SELECT, JOIN and WHERE.
See below for details:
My sample table:
[ P_IMAGE_ID ]
IMAGE_ID (PK)
FILE_NAME
FILE_TYPE
...
...
[ P_IMG_TAG ]
IMG_TAG_ID (PK)
IMAGE_ID (FK)
TAG
...
...
My requirement is: get the distinct images whose tag is "70702".
Method 1: Select -> Join -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';
I think the query behaviour should be like:
join tables -> apply WHERE clause -> distinct select
I use Oracle SQL developer to get the explain plan:
Method 1 costs 76.
Method 2: Select -> Where -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
(
SELECT PTAG.IMAGE_ID
FROM P_IMG_TAG PTAG
WHERE PTAG.TAG = '70702'
);
I think the second query behaviour should be like:
apply WHERE clause -> apply WHERE clause -> distinct select
I use Oracle SQL developer to get the explain plan too:
Method 2 costs 76 too. Why?
I believed that applying the WHERE clause first, to reduce the work the database does and avoid joining tables, should perform better than the join query. But now that I have tested it, I am confused: why are the costs of the two methods equal?
Or have I misunderstood something?
My questions:
Why are the costs of the two methods above equal?
If the subselect on Tag = '70702' returns thousands or millions of rows, is the join better?
If the subselect on Tag = '70702' returns few rows, is the subselect better because it reduces the data the query processes?
When I use method 1 (Select -> Join -> Where -> Distinct), does the database join the tables before applying the WHERE clause?
Someone told me that if I move the WHERE condition Tag = '70702' into the JOIN clause
(i.e. INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID AND PTAG.TAG = '70702') performance may be better. Is that right?
I read the topics "subselect vs outer join" and "subquery or inner join", but both are for SQL Server, and I am not sure the same applies to Oracle.
The DBMS takes your query and executes something. But it doesn't execute steps that correspond to SQL statement parts in the order they appear in an SQL statement.
Read about "relational query optimization", which could just as well be called "relational query implementation". Eg for Oracle.
Any language processor takes declarations and calls as input and implements the described behaviour in terms of internal data structures and operations, maybe through one or more levels of "intermediate code" running on a "virtual machine", eventually down to physical machines. But even just staying in the input language, SQL queries can be rearranged into other SQL queries that return the same value but perform significantly better under simple and general implementation assumptions. Just as you know that your question's queries always return the same thing for a given database, the DBMS can know. Part of how it knows is that there are many rules for taking a relational algebra expression and generating a different but same-valued expression. Certain rewrite rules apply under certain limited circumstances. There are rules that take into consideration SQL-level relational things like primary keys, unique columns, foreign keys and other constraints. Other rules use implementation-oriented SQL-level things like indexes and statistics. This is the "relational query rewriting" part of relational query optimization.
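To see that the question's two formulations are same-valued, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite as a stand-in for Oracle, with trimmed-down tables and made-up rows, so it says nothing about Oracle's plans or costs; it only checks that both queries return the same rows for one sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P_IMAGE_ID (IMAGE_ID INTEGER PRIMARY KEY, FILE_NAME TEXT);
    CREATE TABLE P_IMG_TAG (
        IMG_TAG_ID INTEGER PRIMARY KEY,
        IMAGE_ID INTEGER REFERENCES P_IMAGE_ID(IMAGE_ID),
        TAG TEXT
    );
    -- Made-up sample rows; image 1 carries the tag twice on purpose.
    INSERT INTO P_IMAGE_ID VALUES (1, 'a.png'), (2, 'b.png'), (3, 'c.png');
    INSERT INTO P_IMG_TAG VALUES
        (10, 1, '70702'), (11, 1, '70702'), (12, 2, '99999'), (13, 3, '70702');
""")

# Method 1 from the question: join, then filter, then DISTINCT.
join_query = """
    SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
    FROM P_IMAGE_ID PID
    INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID
    WHERE PTAG.TAG = '70702'
"""

# Method 2 from the question: filter through an IN subquery.
in_query = """
    SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
    FROM P_IMAGE_ID PID
    WHERE PID.IMAGE_ID IN (
        SELECT PTAG.IMAGE_ID FROM P_IMG_TAG PTAG WHERE PTAG.TAG = '70702'
    )
"""

join_rows = sorted(conn.execute(join_query).fetchall())
in_rows = sorted(conn.execute(in_query).fetchall())
```

On the same connection, prefixing either query with SQLite's EXPLAIN QUERY PLAN shows the plan the engine chose, which is roughly the counterpart of the explain plan the question obtained from Oracle SQL Developer.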
Even when two different but equivalent queries generate different plans, the cost can be similar because the plans are so similar. Here, both a HASH and SORT index are UNIQUE. (It would be interesting to know what the few top plans were for each of your queries. It is quite likely that those few are the same for both, but that the plan that is more directly derived from the particular input expression is the one that is offered when there's little difference.)
The way to get the DBMS to find good query plans is to write the most natural expression of a query that you can find.

Oracle - select statement alias one column and wildcard to get all remaining columns

New to SQL. Pardon me if this question is a basic one. Is there a way for me to do this below
SELECT COLUMN1 as CUSTOM_NAME, <wildcard for remaining columns as is> from TABLE;
I only want COLUMN1 to appear once in the final result
There is no way to make that kind of dynamic SELECT list with regular SQL*.
This is a good thing. Programming gets more difficult the more dynamic it is. Even the simple * syntax, while useful in many contexts, causes problems in production code. The Oracle SQL grammar is already more complicated than most traditional programming languages; adding a little meta-language to describe what the queries return could be a nightmare.
*Well, you could create something using Oracle data cartridge, or DBMS_XMLGEN, or a trick with the PIVOT clause. But each of those solutions would be incredibly complicated and certainly not as simple as just typing the columns.
This is about as close as you will get.
It is very handy for putting the important columns up front,
while being able to scroll to the others if needed. COLUMN1 will end up being there twice.
SELECT COLUMN1 as CUSTOM_NAME,
aliasName.*
FROM TABLE aliasName;
If you have many columns, it might be worth generating a full column list automatically instead of relying on the * selector.
So a two step approach would be to generate the column list with custom first N columns and unspecified order of the other columns, then use this generated list in your actual select statement.
-- select comma separated column names from table with the first columns being in specified order
select
LISTAGG(column_name, ', ') WITHIN GROUP (
ORDER BY decode(column_name,
'FIRST_COLUMN_NAME', 1,
'SECOND_COLUMN_NAME', 2) asc) "Columns"
from user_tab_columns
where table_name = 'TABLE_NAME';
Replace TABLE_NAME, FIRST_COLUMN_NAME and SECOND_COLUMN_NAME by your actual names, adjust the list of explicit columns as needed.
Then execute the query and use the result, which should look like
FIRST_COLUMN_NAME, SECOND_COLUMN_NAME, OTHER_COLUMN_NAMES
Of course this is overhead for 5-ish columns, but if you ever run into a company database with a three-digit number of columns, this can be interesting.
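The same two-step idea (read the column names from the catalog, then build the SELECT list) can be sketched from client code. The following uses Python's sqlite3 module and SQLite's PRAGMA table_info as a stand-in for Oracle's user_tab_columns; the table demo_table, its columns and the helper select_with_alias are hypothetical, and the sketch assumes trusted, simple identifiers because names are interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (COLUMN1 TEXT, COLUMN2 INTEGER, COLUMN3 TEXT)")
conn.execute("INSERT INTO demo_table VALUES ('x', 1, 'y')")

def select_with_alias(conn, table, column, alias):
    """Build a SELECT that aliases one column and lists the rest unchanged."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    names = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in names:
        raise ValueError(f"{column} is not a column of {table}")
    # Aliased column first, then every other column exactly once.
    parts = [f"{column} AS {alias}"] + [n for n in names if n != column]
    return f"SELECT {', '.join(parts)} FROM {table}"

sql = select_with_alias(conn, "demo_table", "COLUMN1", "CUSTOM_NAME")
cursor = conn.execute(sql)
headers = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

The generated statement is ordinary static SQL, so it can be pasted into a script or view definition once the column list looks right.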

Select distinct results, when using with() operator in Yii

How can I select only distinct records, from relational table, when using with() operator in Yii?
I'm getting my models (records) like that:
$probe = Probes::model()->with(array
(
'user',
'results',
'results.answer',
'survey',
'survey.questions',
'survey.questions.question',
'survey.questions.question.answers',
'manager'
))->findByPk($id);
I want to make sure, that survey.questions relation returns only distinct records. But it seems, that I don't see any way to achieve this (or I'm blind / not educated enough).
When giving relational table name / alias as array:
'results.question'=>array('alias'=>'results_question'),
the distinct key is not among those, that can be used in such array (as modifier).
I tried very ugly, bumpy way of changing select from default * to DISTINCT *:
'survey.questions'=>array('select'=>'distinct'),
But this has (of course?) failed:
Active record "SurveysQuestions" is trying to select an invalid column "distinct". Note, the column must exist in the table or be an expression with alias.
How can I achieve this (it seemed so obvious and easy), if it is possible at all this way (using with())? If not, then please advise how to get distinct records in a relational table any other way (other than manually filtering results using foreach, which is what I'm doing right now and which is ugly).
You could set CDbCriteria::distinct to true:
'survey.questions'=>array('distinct'=>true),

How to join two datatable for dynamic column in LINQ

I have two DataTables, dt1 and dt2 (generated at runtime), and I have to apply an inner join to them (EmpId is the same in both tables).
But the number of columns and their names are dynamic and depend on the database.
Both tables contain the same column names. Table 1 contains the leave taken by employee "P". He has not taken any sick leave, so that value is null.
EmpId  Empname  SickLeave  Casual Leave
1      P        (null)     1
and table 2 has values like
EmpId  Empname  SickLeave  Casual Leave
1      P        5          5
This table contains the total leave given by the company to an employee (max leave).
I have to join these tables and show a result like this:
EmpId  Empname  SickLeave  Casual Leave
1      P        0/5        1/5
So I want to know how I can join these two DataTables and show a result like this using EF and LINQ. (Here two kinds of leave are given, i.e. sick leave and casual leave, but there may be two, three or four depending on the database, and their names can also change according to the database.)
If anyone has an idea, please guide me.
If you really need dynamic Linq to entities then
OPTION A) string to lambda
Dynamic Expressions and Queries in LINQ
System.Linq.Dynamic can be found at following links
http://msdn.microsoft.com/en-US/vstudio/bb894665.aspx
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
http://www.scottgu.com/blogposts/dynquery/dynamiclinqcsharp.zip
How to convert a String to its equivalent LINQ Expression Tree?
OPTION B) Build expression trees
A more thorough approach is to build expression trees. Build expression trees with code found here:
http://msdn.microsoft.com/en-us/library/system.linq.expressions.aspx
Dynamic LINQ and Dynamic Lambda expressions?
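Independently of which LINQ option is chosen, the join logic itself is small. The following is a language-neutral sketch in Python rather than C#, assuming each table is a list of dicts whose leave columns are discovered at runtime; the function merge_leave and its parameters are hypothetical, and a null "taken" value is assumed to mean 0.

```python
def merge_leave(taken_rows, max_rows, key="EmpId", fixed=("EmpId", "Empname")):
    """Inner-join two tables on `key`, rendering each leave column as taken/max."""
    max_by_key = {row[key]: row for row in max_rows}
    merged = []
    for taken in taken_rows:
        limits = max_by_key.get(taken[key])
        if limits is None:
            continue  # inner join: skip employees without a max-leave row
        out = {name: taken[name] for name in fixed}
        # Leave columns are whatever non-fixed columns the max table has.
        for column in limits:
            if column in fixed:
                continue
            used = taken.get(column) or 0  # treat missing or null as 0
            out[column] = f"{used}/{limits[column]}"
        merged.append(out)
    return merged

# Sample rows from the question.
dt1 = [{"EmpId": 1, "Empname": "P", "SickLeave": None, "Casual Leave": 1}]
dt2 = [{"EmpId": 1, "Empname": "P", "SickLeave": 5, "Casual Leave": 5}]

result = merge_leave(dt1, dt2)
```

Nothing in the function names a leave type, so adding or renaming leave columns in the database changes only the data, not the code.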
