How to select multiple attributes along with grouping operation in Relational Algebra Queries? - relational-algebra

Say I have a table called T with attributes a1, a2, a3, a4, a5. I want to convert the following SQL query to a relational algebra query.
SELECT a1,a2,a3, COUNT(a4) AS counta4 FROM T GROUP BY a1;
When I try to do this in relational algebra as follows,
RENAME (a1,counta4) (a1 GROUPING COUNT a4 (T))
the result relation will only contain a1 and counta4 as its attributes.
But I want to include all a1,a2,a3,counta4 as the result relation's attributes.
How to achieve this in relational algebra using only SELECT, PROJECT, RENAME, UNION, INTERSECTION, MINUS, CROSS PRODUCT(CARTESIAN PRODUCT), JOIN, DIVISION, GROUPING operations?
For instance, let's say table T is the AUTHOR table and its attributes are as follows.
AUTHOR(authorID,articleUrl,firstName,lastName,articleTitle,articleWordsCount)
Now I want to select firstName,lastName, and the number of articles each author wrote.
So my SQL query will be,
SELECT authorID,firstName,lastName,COUNT(articleUrl) AS numberOfArticles FROM AUTHOR GROUP BY authorID;
The syntax for GROUPING:
<grouping attributes> GROUPING <function list> (Relation)
<function list> is a list of <aggregate function> <attribute name> pairs.
Reference: Fundamentals of Database Systems (7th edition) by Elmasri (Ramez Elmasri) & Navathe (Shamkant B. Navathe) - page 260 (8.4.2 Aggregate functions and Grouping)
Link: https://www.auhd.site/upfiles/elibrary/Azal2020-01-22-12-28-11-76901.pdf
The above example as a relational algebra query,
RENAME (authorID,numberOfArticles) (authorID GROUPING COUNT articleUrl (AUTHOR))
The result relation of this relational algebra query has only authorID and numberOfArticles as attributes, but I need all the attributes mentioned in the SQL query: authorID, firstName, lastName, numberOfArticles.
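A minimal sketch of two common ways to keep the extra attributes, assuming firstName and lastName are determined by authorID (the sample rows are invented for illustration). In the textbook notation the first option would be authorID, firstName, lastName GROUPING COUNT articleUrl (AUTHOR), and the second would JOIN the result of authorID GROUPING COUNT articleUrl (AUTHOR) with PROJECT authorID, firstName, lastName (AUTHOR) on authorID. The Python/SQLite code below only checks that both give the same rows; it is not relational algebra itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AUTHOR (authorID INTEGER, articleUrl TEXT, firstName TEXT, "
    "lastName TEXT, articleTitle TEXT, articleWordsCount INTEGER)"
)
# Hypothetical sample data: author 1 wrote two articles, author 2 wrote one.
conn.executemany(
    "INSERT INTO AUTHOR VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "u1", "Ann", "Lee", "T1", 100),
        (1, "u2", "Ann", "Lee", "T2", 200),
        (2, "u3", "Bob", "Ray", "T3", 300),
    ],
)

# Option 1: widen the grouping attributes to include the names.
by_wide_group = conn.execute(
    "SELECT authorID, firstName, lastName, COUNT(articleUrl) AS numberOfArticles "
    "FROM AUTHOR GROUP BY authorID, firstName, lastName ORDER BY authorID"
).fetchall()

# Option 2: group by authorID only, then join back to the projected names.
by_join = conn.execute(
    "SELECT n.authorID, n.firstName, n.lastName, c.numberOfArticles "
    "FROM (SELECT DISTINCT authorID, firstName, lastName FROM AUTHOR) AS n "
    "JOIN (SELECT authorID, COUNT(articleUrl) AS numberOfArticles "
    "      FROM AUTHOR GROUP BY authorID) AS c "
    "ON n.authorID = c.authorID ORDER BY n.authorID"
).fetchall()

print(by_wide_group)
print(by_join)
```

If the names were not determined by authorID, the two options could differ, so the assumption matters.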

Related

join and group-by in Hadoop Pig

I often see people use either group by or join for the same problem. Suppose I have a student table and a score table and want to find each student's name together with the related course scores. It seems this can be solved with either join or group by, and I am wondering about the pros and cons of the two solutions. The data structure and code are posted below. Thanks.
table students:
student ID, student name, student email address
score table:
student ID, course ID, score
student_scores = group students by (studentId) inner, scores by (studentId);
student_scores = join students by studentId, scores by studentId;
The Pig Latin manual's section on JOIN says:
Note the following about the GROUP/COGROUP and JOIN operators:
The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).
I am not sure these count as pros and cons, but the two operators are different.
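The difference in output shape can be sketched in plain Python. This is not Pig; it only imitates the nested result of GROUP/COGROUP and the flat result of JOIN, and the sample students and scores are made up.

```python
# Hypothetical sample data matching the two table layouts above.
students = [(1, "Ann", "ann@example.com"), (2, "Bob", "bob@example.com")]
scores = [(1, "math", 90), (1, "art", 85), (2, "math", 70)]

# JOIN-like result: one flat tuple per matching (student, score) pair.
joined = [s + sc for s in students for sc in scores if s[0] == sc[0]]

# COGROUP-like result: one entry per key holding two nested bags,
# the student tuples and the score tuples for that studentId.
grouped = {}
for s in students:
    grouped.setdefault(s[0], ([], []))[0].append(s)
for sc in scores:
    grouped.setdefault(sc[0], ([], []))[1].append(sc)

print(joined)
print(grouped)
```

The flat form repeats the student columns once per score, while the nested form keeps one entry per student with all of that student's scores inside it.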

In Oracle selection from two tables, is the relative position of WHERE statements relevant for performance?

I have this homework to do and too much time on my hands, so I want the answer to be perfect instead of just correct. It is all theoretical with no real database behind it. I wondered, whether it would make a performance difference to switch position of WHERE statements as follows:
Order is a table of orders with an order number as primary key. Customer is a table of customers with customer number Cno as primary key. Cno is foreign key in table Order. My task is to formulate a select statement that gives me all the orders for customers named 'Meier', Name is a data column of the customer table.
My first idea is to connect the tables with the foreign key and filter with the name:
SELECT O.*
FROM Order O, Customer C
WHERE O.Cno = C.Cno
AND C.Name = 'Meier';
I visualize this statement to make a LARGE table with all customer and order data and then to filter with the name.
Then I had the idea I might make the intermediate 'table' smaller to win some performance, so I thought I might reduce the amount of customers to fetch orders for first. I thought of this:
SELECT O.*
FROM Customer C, Order O
WHERE C.Name = 'Meier'
AND O.Cno = C.Cno;
Now I 'see' that imaginary intermediate table including only the orders for customer 'Meier', so it is no longer so large.
Then again, I thought Oracle probably optimizes all of this away on its own, so I don't need to worry about it.
Would there be a difference in the performance of these queries, just because the sequence of two where-clause statements is reversed?
The order doesn't matter because Oracle's optimizer will look at all the WHERE clauses and figure out the most efficient way to satisfy that query.
You should also keep your table statistics up to date, because the optimizer relies on them heavily.
You may also do something like
SELECT O.*
FROM (select Cno from Customer where Name = 'Meier') C, Order O
WHERE O.Cno = C.Cno;
In 99.99% of cases the execution plan will be the same, but you can influence it with hints. For example
SELECT O.*
FROM (select /*+ materialize */ Cno from Customer where Name = 'Meier') C, Order O
WHERE O.Cno = C.Cno;
will first subset Customer into a temporary table and then join it with Order. When you have more than two tables you can use the ORDERED hint.
SELECT /*+ ORDERED */ O.*
FROM Customer C, Order O, Some_other_table T
WHERE C.Name = 'Meier'
AND O.Cno = C.Cno
AND O.Cno = T.Cno;
will always join Customer to Order first and then join the result to Some_other_table, regardless of statistics, which can sometimes be stale.
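The claim that the order of the WHERE conditions does not change the result can be checked with a small sketch. It uses SQLite from Python rather than Oracle, so it shows only that both statements return the same rows and says nothing about Oracle's optimizer. The table is named Orders here because ORDER is a reserved word, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Cno INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Ono INTEGER PRIMARY KEY, Cno INTEGER REFERENCES Customer(Cno));
INSERT INTO Customer VALUES (1, 'Meier'), (2, 'Schmidt');
INSERT INTO Orders VALUES (10, 1), (11, 2), (12, 1);
""")

# Join condition first, name filter second.
join_first = (
    "SELECT O.* FROM Orders O, Customer C "
    "WHERE O.Cno = C.Cno AND C.Name = 'Meier' ORDER BY O.Ono"
)
# Name filter first, join condition second.
filter_first = (
    "SELECT O.* FROM Customer C, Orders O "
    "WHERE C.Name = 'Meier' AND O.Cno = C.Cno ORDER BY O.Ono"
)

rows_join_first = conn.execute(join_first).fetchall()
rows_filter_first = conn.execute(filter_first).fetchall()
print(rows_join_first == rows_filter_first)

# Optional: inspect the plan SQLite chose for each statement.
for sql in (join_first, filter_first):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(row[-1])
```

On a real Oracle database the equivalent check would be to compare the execution plans of the two statements.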

Linq query for one to many

My question involves MVC and a LINQ query. I will try to keep it simple without going into the details of the model, view, etc. Say I have two tables, T1 and T2. T1 holds restaurant details and T2 holds restaurant image paths. T2 rows contain a restaurantID. If T2 has more than one row of image paths for a restaurant and I only need the first image path from T2 in the LINQ query, how would I write such a query? I have simplified the question; in fact the query has six table joins related to the restaurants. I created a view model that contains only the fields I want to display, and I populate it in the controller, which is also where the query lives.
When I join T2 to the query, I get all the Restaurants details together with the images. But the view repeats the same Restaurant as many times as the number of table rows in T2 which is not what I want. This is the problem from the way I set the query. The query uses joins. I only need the first row from T2 while I get all from the Restaurant details. I failed to find an example for such requirement on the web so far. Your directions will be much appreciated.
Serhat Albayoglu
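On your join you can use into to make it a group join, and then in the select take FirstOrDefault from the group:
var query = from t in context.T1
            join t2 in context.T2 on t.Id equals t2.RestaurantID into tgroup
            select new
            {
                Restaurant = t,
                // First image path for this restaurant, or null if it has none.
                ImagePath = tgroup.Select(img => img.path).FirstOrDefault()
            };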
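The idea behind the group join, one output row per restaurant carrying only its first image, can be sketched outside LINQ as well. The following plain Python is only an illustration of the logic; the field names and sample rows are assumptions.

```python
# Hypothetical rows standing in for T1 (restaurants) and T2 (image paths).
restaurants = [{"Id": 1, "Name": "Cafe A"}, {"Id": 2, "Name": "Bistro B"}]
images = [
    {"RestaurantID": 1, "path": "/img/a1.jpg"},
    {"RestaurantID": 1, "path": "/img/a2.jpg"},
]

# Remember only the first path seen for each restaurant.
first_image = {}
for img in images:
    first_image.setdefault(img["RestaurantID"], img["path"])

# One result per restaurant; None plays the role of FirstOrDefault's default.
result = [
    {"Name": r["Name"], "ImagePath": first_image.get(r["Id"])}
    for r in restaurants
]
print(result)
```

Each restaurant appears exactly once, regardless of how many image rows it has.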

Linq query to order by on multiple related data

Consider the below tables with the data
Customers -> Orders -> Items
Customers ->
A
B
C
Orders ->
o1 - A,
o2 - A,
o3 - B,
o4 - C
OrderItems ->
o1 - Item1,
o1 - Item2,
o2 - Item3,
o3 - Item2,
o4 - Item1
Item ->
Item1,
Item2,
Item3,
Item4
We have a similar mapping as above in our DB.
Now, in LINQ, I would like to get a list of customers sorted by their items, joined into a comma-separated string,
eg:
Customer Items
C Item1
A Item1, Item2
B Item2
I've tried something like this
Customer.OrderBy( cust => string.Join(",", cust.Orders
.SelectMany( order=>order.OrderItems)
.Select( orderItem=> orderItem.Item.Name)
.OrderBy(item=>item)));
but string.Join is not allowed inside LINQ statements.
The items do not need to be displayed in my grid, but I need the customers sorted by the comma-separated items.
I also don't want this done at the UI level, because the sorting has to be applied to an IQueryable customer object to which other filters are added before it is executed later.
What I need is a LINQ OrderBy query on the IQueryable Customer object that returns an IQueryable.
When you are querying the Customer collection, your query will be converted to sql instructions. Since you mentioned other filters, I suppose it's a big table and you don't want to display / get all rows.
Even if string.Join were allowed in EF, your order clause operates on a complex expression that has to be evaluated for each row to determine the correct result order (see this question for an example of what your string.Join would do).
You need to somehow simplify your order clause or store the item list string into an sql field. If you can't store this string, you can try to create a stored procedure, a view or process the data using linq to objects (in this case, apply all filters using EF, get the results using .ToList() and apply your order filter using regular linq to objects).
Be aware that if you don't simplify your query and the customer table is big, you'll face some performance issues.
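The in-memory fallback (filter in the database, materialize, then sort by the joined item string) can be sketched as follows. This is plain Python standing in for LINQ to Objects, using the sample data from the question; the variable and function names are made up for illustration.

```python
# Sample data from the question: order -> customer, and (order, item) pairs.
orders = {"o1": "A", "o2": "A", "o3": "B", "o4": "C"}
order_items = [
    ("o1", "Item1"), ("o1", "Item2"), ("o2", "Item3"),
    ("o3", "Item2"), ("o4", "Item1"),
]
customers = ["A", "B", "C"]  # stands in for the already filtered result

def item_key(customer):
    """Build the sorted, comma-separated item string used as the sort key."""
    names = sorted(item for order, item in order_items if orders[order] == customer)
    return ",".join(names)

sorted_customers = sorted(customers, key=item_key)
print([(c, item_key(c)) for c in sorted_customers])
```

The key is recomputed per customer, which is the per-row cost the answer warns about; on a large table this only makes sense after the other filters have reduced the row count.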

Entity Framework 4 generated queries are joining full tables

I have two entities: Master and Details.
When I query them, the resulting query to database is:
SELECT [Extent2]."needed columns listed here", [Extent1]."needed columns listed here"
FROM (SELECT * [Details]."all columns listed here"...
FROM [dbo].[Details] AS [Details]) AS [Extent1]
LEFT OUTER JOIN [dbo].[Master] AS [Extent2] ON [Extent1].[key] = [Extent2].[key]
WHERE [Extent1].[filterColumn] = @p__linq__0
My question is: why isn't the filter applied in the inner query? How can I get such a query? I've tried a lot of EF and LINQ expressions.
What I need is something like:
SELECT <anything needed>
FROM Master LEFT JOIN Details ON Master.key = Details.Key
WHERE filterColumn = @param
I'm getting a full sequential scan on both tables, and in my production environment I have millions of rows in each table.
Thanks a lot !!
Sometimes Entity Framework does not produce the best query. You can try any of the following to optimize it:
Modify the LINQ statement (test with LINQPad)
Create a stored procedure and map it to return an entity
Create a view that handles the join and map the view to a new entity
