A LEFT JOIN B ON #IncludeBJoin = 1 AND B.AID = A.ID WHERE #IncludeBJoin = 0 OR B.ID IS NOT NULL - performance

I know that the temptation to use generic stored procedures for code reuse can be a trap, but I don't know if this method of parameterizing a query has anything but a constant performance cost.
SELECT
...
FROM
A
LEFT JOIN
B ON
#IncludeBJoin = 1 AND
B.AID = A.ID
WHERE
#IncludeBJoin = 0
OR B.ID IS NOT NULL -- If included, only include INNER JOIN results
The intent of the above is to short-circuit conditionally unnecessary join to B when only results from A are necessary. This is the simple case, but a practical case would have many such optional joins. I'd like to avoid distinct stored procedures for each combination if it is worth the performance cost.
Then, what about this extension?
SELECT
...
FROM
A
LEFT JOIN
B ON
#IncludeBJoin = 1 AND
B.AID = A.ID
WHERE
#IncludeBJoin = 0
OR B.ID IS NOT NULL
GROUP BY A... WITH ROLLUP -- Extension
HAVING (#AggregatesOnly = 0) OR (GROUPING(A...) = 1) -- Extension
The intent of the above is to always return the rolled-up aggregates and optionally include the component rows. If #AggregatesOnly = 1, such that I only want the groupings, what performance price would I pay?
Execution plans seems somewhat magical to me as their representations can seem conceptually distant from the original statement definition, and the inner-workings of the SQL execution engine are not well documented publically AFAIK. I've read that procedural logic such as the use of IF statements in stored procedures can really kill the benefit of a compiled stored procedure. Does these short-circuiting mechanisms have the same effect, even though they seem more "set-based"?

Related

How to associate three tables?

I have three tables ABC. I hope to display all records of A while linking BC. Field a1 of A is associated with field b1 of B. a1 may be empty, so I wrote
A.a1=B.b1(+)
The two fields c and d of table C are associated with A.a2 and B.b2 respectively
so i wrote
A.a2 = C.c(+) and B.b2 = C.d(+)
The total sql is as follows
select A.a1, A.a2, B.e,C.f,
from A, B, C
where A.a1=B.b1(+)
and A.a2 = C.c(+)
and B.b2 = C.d(+)
But the prompt says that a table can only have one external link at most.
I tried to use case when to display the information of B and C,
select A.a1, A.a2,
case when a1 is null then null
else (selece B.e from B
where B.b1=A.a1) end,
case when a1 is null then null
when (selece B.e from B
where B.b1=A.a1) is null then null
else (selece C.f from C
where C.c=A.a2 and C.d=B.b2) end
from A
but is there any other better association method?
"Older" (I believe lower than 21c) Oracle database versions won't let you outer join one table to two or more other tables using the "old" Oracle's (+) outer join operator.
But, if you switch to JOINs, then you won't have that problem. Something like this:
select *
from b left join a on a.a1 = b.b1
left join c on c.d = b.b2 and c.c = a.a2
(Fetch columns you want, include conditions you need, but - that's the general idea.)

Oracle: Which one of the queries is more efficient?

Query 1
select student.identifier,
id_tab.reporter_name,
non_id_tab.reporter_name
from student_table student
inner join id_table id_tab on (student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2'))
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
Query 2
select student.identifier,
id_tab.reporter_name,non_id_tab.reporter_name
from student_table student,
id_table id_tab,
id_table non_id_tab
where student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2')
and student.non_reporter_id = non_id_tab.reporter_id
Since these two queries produce exactly same output,I am assuming they are syntactically same(please correct me if I am wrong).
I was wondering whether either of them is more efficient that the other.
Can anyone help me here please?
I would rewrite it as follows, using the ON only for JOIN conditions and moving the filters to a WHERE condition:
...
from student_table student
inner join id_table id_tab on ( student.reporter_id = id_tab.reporter_id )
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
where student.is_NEW = 'Y'
and id_tab.name in('name1','name2')
This should give a more readable query; however, no matter how you write it (the ANSI join is highly preferrable), you should check the explain plans to understand how the query will be executed.
In terms of performance, there should be no difference.
Execution Plans created by the Oracle optimizer do not differ.
In terms of readability, joining tables inside the WHERE clause is an old style (SQL89).
From SQL92 and higher, it is recommended to use the JOIN syntax.

Linq-to-SQL left join on left join/multiple left joins in one Linq-to-SQL statement

I'm trying to rewrite SQL procedure to Linq, it all went well and works fine, as long as it works on small data set. I couldn't really find answer to this anywhere. Thing is, I have 3 joins in the query, 2 are left joins and 1 is inner join, they all join to each other/like a tree. Below you can see SQL procedure:
SELECT ...
FROM sprawa s (NOLOCK)
LEFT JOIN strona st (NOLOCK) on s.ident = st.id_sprawy
INNER JOIN stan_szczegoly ss (NOLOCK) on s.kod_stanu = ss.kod_stanu
LEFT JOIN broni b (NOLOCK) on b.id_strony = st.ident
What I'd like to ask you is a way to translate this to Linq. For now I have this:
var queryOne = from s in db.sprawa
join st in db.strona on s.ident equals st.id_sprawy into tmp1
from st2 in tmp1.DefaultIfEmpty()
join ss in db.stan_szczegoly on s.kod_stanu equals ss.kod_stanu
join b in db.broni on st2.ident equals b.id_strony into tmp2
from b2 in tmp2.DefaultIfEmpty()
select new { };
Seems alright, but when checked with SQL Profiler, query that is sent to database looks like that:
SELECT ... FROM [dbo].[sprawa] AS [Extent1]
LEFT OUTER JOIN [dbo].[strona] AS [Extent2]
ON [Extent1].[ident] = [Extent2].[id_sprawy]
INNER JOIN [dbo].[stan_szczegoly] AS [Extent3]
ON [Extent1].[kod_stanu] = [Extent3].[kod_stanu]
INNER JOIN [dbo].[broni] AS [Extent4]
ON ([Extent2].[ident] = [Extent4].[id_strony]) OR
(([Extent2].[ident] IS NULL) AND ([Extent4].[id_strony] IS NULL))
As you can see both SQL queries are bit different. Effect is the same, but latter works incomparably slower (less than a second to over 30 minutes). There's also a union made, but it shouldn't be the problem. If asked for I'll paste code for it.
I'd be grateful for any advice on how to better the performance of my Linq statement or how to write it in a way that is translated properly.
I guess I found the solution:
var queryOne = from s in db.sprawa
join st in db.strona on s.ident equals st.id_sprawy into tmp1
where tmp1.Any()
from st2 in tmp1.DefaultIfEmpty()
join ss in db.stan_szczegoly on s.kod_stanu equals ss.kod_stanu
join b in db.broni on st2.ident equals b.id_strony into tmp2
where tmp2.Any()
from b2 in tmp2.DefaultIfEmpty()
select new { };
In other words where table.Any() after each into table statement. It doesn't make translation any better but has sped up execution time from nearly 30minutes(!) to about 5 seconds.
This has to be used carefully though, because it MAY lead to losing some records in result set.

Best way to exclude records from multiple tables

I got the following tables (just an example): vehicles, vehicle_descriptions, vehicle_parts
vehicles have 1 to many with vehicle_descriptions and vehicle_parts. There may not be a corresponding vehicle_description/part for a given vehicle.
SELECT * FROM vehicles
LEFT OUTER JOIN vehicles d ON vehicles.vin = d.vin AND d.summary NOT LIKE 'honda'
LEFT OUTER JOIN
(SELECT SUM(desc_total) FROM vehicle_descriptions WHERE NOT LIKE desc 'honda' GROUP BY vin) b
ON vehicles.vin = vehicle_b.vin
LEFT OUTER JOIN
(SELECT SUM(part_count) FROM vehicle_parts WHERE part_for NOT LIKE 'honda' GROUP BY vin) c ON vehicles.vin = c.vin
If either vehicle_desc, vehicles, or part contains the exclusion term, the whole record should not show up in the result set. The query above will return a record even if one of the tables contain the exclusion term Honda. How would I fix the above query?
You're not using any of the information in either sum() as part of what you show, just to decide whether to include the vehicle. And you're doing an unnecessary self join in your first clause. Generally in situations like this, the "exists" and "not exists" clauses work well. So what about this? I'll use Oracle syntax, you can convert to ANSI of course.
SELECT * FROM vehicles v where summary <> 'honda'
and not exists (select 1 from vehicle_descriptions d where d.vin = v.vin and d.desc <> 'honda')
and not exists (select 1 from vehicle_parts p where p.vin = v.vin and p.part_for <> 'honda')

What is the difference between using Join in Linq and "Olde Style" pre ANSI join syntax?

This question follows on from a question I asked yesterday about why using the join query on my Entities produced horrendously complicated SQL. It seemed that performing a query like this:
var query = from ev in genesisContext.Events
join pe in genesisContext.People_Event_Link
on ev equals pe.Event
where pe.P_ID == key
select ev;
Produced the horrible SQL that took 18 seconds to run on the database, whereas joining the entities through a where clause (sort of like pre-ANSI SQL syntax) took less than a second to run and produced the same result
var query = from pe in genesisContext.People_Event_Link
from ev in genesisContext.Events
where pe.P_ID == key && pe.Event == ev
select ev;
I've googled all over but still don't understand why the second is produces different SQL to the first. Can someone please explain the difference to me? When should I use the join keyword
This is the SQL that was produced when I used Join in my query and took 18 seconds to run:
SELECT
1 AS [C1],
[Extent1].[E_ID] AS [E_ID],
[Extent1].[E_START_DATE] AS [E_START_DATE],
[Extent1].[E_END_DATE] AS [E_END_DATE],
[Extent1].[E_COMMENTS] AS [E_COMMENTS],
[Extent1].[E_DATE_ADDED] AS [E_DATE_ADDED],
[Extent1].[E_RECORDED_BY] AS [E_RECORDED_BY],
[Extent1].[E_DATE_UPDATED] AS [E_DATE_UPDATED],
[Extent1].[E_UPDATED_BY] AS [E_UPDATED_BY],
[Extent1].[ET_ID] AS [ET_ID],
[Extent1].[L_ID] AS [L_ID]
FROM [dbo].[Events] AS [Extent1]
INNER JOIN [dbo].[People_Event_Link] AS [Extent2] ON EXISTS (SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent3].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent3]
WHERE [Extent2].[E_ID] = [Extent3].[E_ID] ) AS [Project1] ON 1 = 1
LEFT OUTER JOIN (SELECT
[Extent4].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent4]
WHERE [Extent2].[E_ID] = [Extent4].[E_ID] ) AS [Project2] ON 1 = 1
WHERE ([Extent1].[E_ID] = [Project1].[E_ID]) OR (([Extent1].[E_ID] IS NULL) AND ([Project2].[E_ID] IS NULL))
)
WHERE [Extent2].[P_ID] = 291
This is the SQL that was produce using the ANSI Style syntax (and is fairly close to what I would write if I were writing the SQL myself):
SELECT * FROM Events AS E INNER JOIN People_Event_Link AS PE ON E.E_ID=PE.E_ID INNER JOIN PEOPLE AS P ON P.P_ID=PE.P_ID
WHERE P.P_ID = 291
Neither of the above queries are entirely "correct." In EF, it is generally correct to use the relationship properties in lieu of either of the above. For example, if you had a Person object with a one to many relationship to PhoneNumbers in a property called Person.PhoneNumbers, you could write:
var q = from p in Context.Person
from pn in p.PhoneNumbers
select pn;
The EF will build the join for you.
In terms of the question above, the reason the generated SQL is different is because the expression trees are different, even though they produce equivalent results. Expression trees are mapped to SQL, and you of course know that you can write different SQL which produces the same results but with different performance. The mapping is designed to produce decent SQL when you write a farily "conventional" EF query.
But the mapping is not so smart as to take a very unconventional query and optimize it. In your first query, you state that the objects must be equivalent. In the second, you state that the ID property must be equivalent. My sample query above says "just get the details for this one record." The EF is designed to work with the way I show, principally, but also handles scalar equivalence well.

Resources