Oracle: Which one of the queries is more efficient? - oracle

Query 1
select student.identifier,
id_tab.reporter_name,
non_id_tab.reporter_name
from student_table student
inner join id_table id_tab on (student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2'))
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
Query 2
select student.identifier,
id_tab.reporter_name,non_id_tab.reporter_name
from student_table student,
id_table id_tab,
id_table non_id_tab
where student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2')
and student.non_reporter_id = non_id_tab.reporter_id
Since these two queries produce exactly same output,I am assuming they are syntactically same(please correct me if I am wrong).
I was wondering whether either of them is more efficient that the other.
Can anyone help me here please?

I would rewrite it as follows, using the ON only for JOIN conditions and moving the filters to a WHERE condition:
...
from student_table student
inner join id_table id_tab on ( student.reporter_id = id_tab.reporter_id )
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
where student.is_NEW = 'Y'
and id_tab.name in('name1','name2')
This should give a more readable query; however, no matter how you write it (the ANSI join is highly preferrable), you should check the explain plans to understand how the query will be executed.

In terms of performance, there should be no difference.
Execution Plans created by the Oracle optimizer do not differ.
In terms of readability, joining tables inside the WHERE clause is an old style (SQL89).
From SQL92 and higher, it is recommended to use the JOIN syntax.

Related

How to simplify this postgres query

How can I simplify this query?
What I am trying to do is derive the column S9_Unlock via a subquery in which I only look for user_ids which are returned from the main query but this looks very awkward to me, especially as this query here is just an excerpt. In reality I am doing multiple of these subqueries to derive different columns...
SELECT userid, CAST(to_char(S9_unlock,'YYYY/MM/DD') AS timestamp) AS "S9_Unlock"
FROM (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id) AS master
LEFT JOIN (
SELECT MIN(s3.unl) AS "S9_Unlock", s3.user_id
FROM (
SELECT user_id, challenge_codes.created AS unl,
MAX /* Check if license contains Suite9 */
(CASE WHEN substring(bundle_article_code,1,6) = 'BuSu90' THEN 1 ELSE 0 END) AS "S9_Unlock"
FROM licensing_db.serial_numbers
LEFT JOIN licensing_db.licenses ON licenses.id = serial_numbers.license_id
LEFT JOIN user_db.users ON users.id = licenses.user_id
LEFT JOIN licensing_db.challenge_codes ON challenge_codes.serial_number_id = serial_numbers.id
WHERE user_id IN (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id
)
GROUP BY user_id, challenge_codes.created) AS s3
)
WHERE "S9_Unlock" = 1
AND s3.unl IS NOT NULL
GROUP BY s3.user_id) AS "S9_Unlock" ON "S9_Unlock".user_id = master.userid
In your query you have two sub-queries that are identical; this screams for a CTE.
In the sub-query on licensing issues you can filter out the valid licenses after the GROUP BY clause using a HAVING clause. Make that a WITH QUERY too and you end up with the rather more readable:
WITH inv AS (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON ca.invoice_id = inv.id
LEFT JOIN shop_db.cart_items AS ci ON ci.cart_id = ca.id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
), s3 AS (
SELECT u.user_id, min(cc.created) AS first_unlocked, bundle_article_code
FROM licensing_db.serial_numbers AS sn
LEFT JOIN licensing_db.licenses AS lic ON lic.id = sn.license_id
LEFT JOIN user_db.users AS u ON u.id = lic.user_id
LEFT JOIN licensing_db.challenge_codes AS cc ON cc.serial_number_id = sn.id
WHERE u.user_id IN (SELECT userid FROM inv)
GROUP BY u.user_id, bundle_article_code
HAVING bundle_article_code LIKE 'BuSu90%'
AND first_unlocked IS NOT NULL
)
SELECT userid, date_trunc('day', first_unlocked) AS "S9_Unlock"
FROM inv
LEFT JOIN s3 ON s3.user_id = inv.userid;
So the main query is now reduced to 3 lines and both the WITH-QUERY's perform a logically self-contained query of the database. The other sub-queries you refer to can similarly become a WITH-QUERY and then you assemble them in the main query. Remember that you can refer to earlier named queries in the list of with-queries, as is shown above with inv being referred to by s3. While such CTE's are syntactically not providing new functionality (except for the RECURSIVE variant), they do make complex queries much more readable and therefore easier to maintain.
Another approach would be to factor out logical sub-components (such as the inv sub-query) and make a VIEW out of those. Then you can simply reference the view in the main query. Making the whole thing a view is probably also a good idea if you want to make the query more flexible. What if you want to query for Suite9.1 ('BuSu91%') on 27 March 2014? Taken those literals out and then using them as WHERE clauses in a view makes your query more versatile; this can be either with sub-queries or with the complete CTE.
(Please check if the semantics are still right in the s3 with-query because without your table structures and sample data I ccannot test my code above.)
Instead of solving your problem as one big monolithic relational sql query, I would seriously consider going the "procedural" way, by using the built-in "plpgsql" language of postgresql. This could bring a lot of clarity in your application.

Unable to use order by in sub query in HQL

I have this HQL where I need a subquery. I know it's not legal to make a subquery in order by, but I can't figure out how to do it
SELECT OBJECT(l) FROM InboundNotification l
INNER JOIN l.item item
WHERE l.job = ? ORDER BY (SELECT SUM(itemInst.qty)
FROM ItemInst itemInst
WHERE itemInst.receivedFromNotification_id = l.id) DESC, item.localId DESC
The above fails since I have the subquery in order by. How can I reconfigure it so this will work?
A sort in the Java code is not a option here even though it's almost as efficient.
ok, i haven't a notion of hql, but I'm gonna assume it's something like other query languages dive in here given that this question has remained unanswered for so long.
could you rewrite the query so it's something like this:
SELECT OBJECT(l), SUM(itemInst.qty) theSum
FROM InboundNotification l
INNER JOIN l.item item WHERE l.job = ?
INNER JOIN ItemInst on ItemInst.KEY = l.KEY
WHERE itemInst.receivedFromNotification_id = l.id)
GROUP BY OBJECT(l)
ORDER BY theSum
where ItemInst.KEY = l.KEY shows the appropriate relationship for your situation (if such a relationship exists)

Writing a LINQ query to traverse many tables

I wanted to write a LINQ query based on the SQL below.
Basically this strategy seems really confusing - why start from MerchantGroupMerchant and do 2 'from' statements?
Problem: Is there a simpler way to write this LINQ query?
var listOfCampaignsMerchantIsInvolvedIn =
(from merchantgroupactivity in uow.MerchantGroupActivities
from merchantgroupmerchant in uow.MerchantGroupMerchants
where merchantgroupmerchant.MerchantU.Id == merchantUIDGuid
select new
{
merchantgroupactivity.ActivityU.CampaignU.Id
}).Distinct();
Here is the table structure:
and the SQL:
SELECT DISTINCT Campaign.ID
FROM Campaign
INNER JOIN Activity
ON ( Campaign.CampaignUID = Activity.CampaignUID )
INNER JOIN MerchantGroupActivity
ON ( Activity.ActivityUID = MerchantGroupActivity.ActivityUID )
INNER JOIN MerchantGroup
ON ( MerchantGroup.MerchantGroupUID = MerchantGroupActivity.MerchantGroupUID )
INNER JOIN MerchantGroupMerchant
ON ( MerchantGroupMerchant.MerchantGroupUID = MerchantGroup.MerchantGroupUID )
INNER JOIN Merchant
ON ( Merchant.MerchantUID = MerchantGroupMerchant.MerchantUID )
WHERE Merchant.ID = 'M1'
No, not really, even if you use views to partially or completely reduce query size your execution plan will still look the same in the end (and execute just as fast/slow). If you have to traverse 5 joins then you have to traverse 5 joins, the only cure is "shorting" the model by introducing links between say merchant and activity or merchant and campaign. You can accomplish this by either introducing the M2M table between them (at the cost of manual maintenance), but I would not recommend it unless retrieval is really an issue. If this query is too slow you should check for existence of indexes on all join FK fields.

Why does this query result in a MERGE JOIN CARTESIAN in Oracle?

Here is my query:
select count(*)
from email_prod_junc j
inner join trckd_prod t5 on j.trckd_prod_sk = t5.trckd_prod_sk
inner join prod_brnd b on t5.prod_brnd_sk = b.prod_brnd_sk
inner join email e on j.email_sk = e.email_sk
inner join dm_geography_sales_pos_uniq u on (u.emp_sk = e.emp_sk and u.prod_brnd_sk = b.prod_brnd_sk)
The explain plan says:
Cartesian Join between DM_GEOGRAPHY_SALES_POS_UNIQ and EMAIL_PROD_JUNC.
I don't understand why because there is a join condition for each table.
I solved this by adding the ORDERED hint:
select /*+ ordered */
I got the information from here
If you specify the tables in the order you want them joined and use this hint, Oracle won't spend time trying to figure out the optimal join order, it will just join them as they are ordered in the FROM clause.
Without knowing your indexes and the full plan, it's hard to say why this is happening exactly. My best guess is that EMAIL_PROD_JUNC and DM_GEOGRAPHY_SALES_POS_UNIQ are relatively small and that there's an index on TRCKD_PROD(trckd_prod_sk, prod_brnd_sk). If that's the case, then the optimizer may have decided that the Cartesian on the two smaller tables is less expensive than filtering TRCKD_PROD twice.
I would speculate that it happens because of the on (x and y) condition of the last inner join. Oracle probably doesn't know how to optimize the multi-statement condition, so it does a full join, then filters the result by the condition after the fact. I'm not really familiar with Oracle's explain plan, so I can't say that with authority
Edit
If you wanted to test this hypothesis, you could try changing the query to:
inner join dm_geography_sales_pos_uniq u on u.emp_sk = e.emp_sk
where u.prod_brnd_sk = b.prod_brnd_sk
and see if that eliminates the full join from the plan

What is the difference between using Join in Linq and "Olde Style" pre ANSI join syntax?

This question follows on from a question I asked yesterday about why using the join query on my Entities produced horrendously complicated SQL. It seemed that performing a query like this:
var query = from ev in genesisContext.Events
join pe in genesisContext.People_Event_Link
on ev equals pe.Event
where pe.P_ID == key
select ev;
Produced the horrible SQL that took 18 seconds to run on the database, whereas joining the entities through a where clause (sort of like pre-ANSI SQL syntax) took less than a second to run and produced the same result
var query = from pe in genesisContext.People_Event_Link
from ev in genesisContext.Events
where pe.P_ID == key && pe.Event == ev
select ev;
I've googled all over but still don't understand why the second is produces different SQL to the first. Can someone please explain the difference to me? When should I use the join keyword
This is the SQL that was produced when I used Join in my query and took 18 seconds to run:
SELECT
1 AS [C1],
[Extent1].[E_ID] AS [E_ID],
[Extent1].[E_START_DATE] AS [E_START_DATE],
[Extent1].[E_END_DATE] AS [E_END_DATE],
[Extent1].[E_COMMENTS] AS [E_COMMENTS],
[Extent1].[E_DATE_ADDED] AS [E_DATE_ADDED],
[Extent1].[E_RECORDED_BY] AS [E_RECORDED_BY],
[Extent1].[E_DATE_UPDATED] AS [E_DATE_UPDATED],
[Extent1].[E_UPDATED_BY] AS [E_UPDATED_BY],
[Extent1].[ET_ID] AS [ET_ID],
[Extent1].[L_ID] AS [L_ID]
FROM [dbo].[Events] AS [Extent1]
INNER JOIN [dbo].[People_Event_Link] AS [Extent2] ON EXISTS (SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent3].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent3]
WHERE [Extent2].[E_ID] = [Extent3].[E_ID] ) AS [Project1] ON 1 = 1
LEFT OUTER JOIN (SELECT
[Extent4].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent4]
WHERE [Extent2].[E_ID] = [Extent4].[E_ID] ) AS [Project2] ON 1 = 1
WHERE ([Extent1].[E_ID] = [Project1].[E_ID]) OR (([Extent1].[E_ID] IS NULL) AND ([Project2].[E_ID] IS NULL))
)
WHERE [Extent2].[P_ID] = 291
This is the SQL that was produce using the ANSI Style syntax (and is fairly close to what I would write if I were writing the SQL myself):
SELECT * FROM Events AS E INNER JOIN People_Event_Link AS PE ON E.E_ID=PE.E_ID INNER JOIN PEOPLE AS P ON P.P_ID=PE.P_ID
WHERE P.P_ID = 291
Neither of the above queries are entirely "correct." In EF, it is generally correct to use the relationship properties in lieu of either of the above. For example, if you had a Person object with a one to many relationship to PhoneNumbers in a property called Person.PhoneNumbers, you could write:
var q = from p in Context.Person
from pn in p.PhoneNumbers
select pn;
The EF will build the join for you.
In terms of the question above, the reason the generated SQL is different is because the expression trees are different, even though they produce equivalent results. Expression trees are mapped to SQL, and you of course know that you can write different SQL which produces the same results but with different performance. The mapping is designed to produce decent SQL when you write a farily "conventional" EF query.
But the mapping is not so smart as to take a very unconventional query and optimize it. In your first query, you state that the objects must be equivalent. In the second, you state that the ID property must be equivalent. My sample query above says "just get the details for this one record." The EF is designed to work with the way I show, principally, but also handles scalar equivalence well.

Resources