making inner join linq or using [table].[joiningTable].column is the same? - performance

I have recently started working on linq and I was wondering suppose I have 2 related tables Project (<=with fkAccessLevelId) and AccessLevel and I want to just select values from both tables. Now there are 2 ways I can select values from these tables.
The one i commonly use is:
(from P in DataContext.Projects
join AL in DataContext.AccessLevel
on P.AccessLevelId equals AL.AccessLevelId
select new
{
ProjectName = P.Name,
Access = AL.AccessName
}
Another way of doing this would be:
(from P in DataContext.Projects
select new
{
ProjectName = P.Name,
Access = P.AccessLevel.AccessName
}
What i wanted to know is which of these way is efficient if we increase the number of table say 5-6 with 1-2 tables containing thousands of records...?

You should take a look at the SQL generated. You have to understand that there are several main performance bottle necks in a Linq query (in this case I assume a OMG...Linq to SQL?!?!) the usual main bottle neck is the SQL query on the server.
Typically SQL Server has a very good optimizer, so actually, given the same query, refactored, the perf is pretty uniform.
However in your case, there is a very real difference in the two queries. A project with no Access Level would not appear in the first query, whilst the second query would return with a null AccessName. In effect you would be comparing a LEFT JOIN to an INNER JOIN.
TL:DR For SQL Server/Linq to Entity Framework queries that do the same thing should give similar performance. However your queries are far from similar.

Related

Most efficient way to select in bulk from a multi million records table

I'm interested in getting and doing some processing on all the entities A returned by a query of the form:
SELECT * FROM A a WHERE a.id not in (select b.id from B)
Where A is a "complex" entity in the sense that it inherits (InheritanceTyped.Joined) from other entities and that several of its attributes are other entities (#OneToOne and #ManyToOne).
The query itself takes a few minutes to yield results hence my desire to execute it as few as possible.
Here are the different approaches i tried to get those A elements as efficiently as possible :
Pagination using setFirstResult/ setMaxResults
Do the job, but pretty slowly as the query seems to be executed everytime.(around 50 elements processed/sec)
Getting IDs first, A objects next
Keeping all the IDs in memory is doable, so I execute once
SELECT a.id FROM A a WHERE a.id not in (select b.id from B)
and then select a from A a WHERE a.id= :id, which goes relatively fast as the id column is indexed. This is currently the solution that is the most efficient with (around 100 elements processed/sec)
Using ScollableResults I had high hope with this solution, but it ended up being slower than other alternatives, leaving me at around 20 elements processed/sec ...
As a neophyte, I don't know what other options to investigate, or if I did something wrong in any of my attempts.
Hence my questions:
Are there (factually) other approaches to efficiently tackle this kind of problem ?
Is it normal that ScrollableResults performed so poorly ? Is there something I should have paid attention to while implementing this solution?
EDIT:
Here's the execution plan

Oracle query with multiple tables

I am trying to display volunteer information with duty and what performance is allocated.
I want to display this information. However, when I run the query, it did not gather the different date from same performance. And also availability_date is mixed up. Is it right query for it? I am not sure it is right query.
Could you give me some feedback for me?
Thanks.
Query is here.
SELECT Production.name, performance.performance_date, volunteer_duty.availability_date, customer.name "Customer volunteer", volunteer.volunteerid, membership.name "Member volunteer", membership.membershipid
FROM Customer, Membership, Volunteer, volunteer_duty, duty, performance_duty, performance, production
WHERE
Customer.customerId (+) = Membership.customerId AND
Membership.membershipId = Volunteer.membershipId AND
volunteer.volunteerid = volunteer_duty.volunteerid AND
duty.dutyid = volunteer_duty.dutyid AND
volunteer_duty.dutyId = performance_duty.dutyId AND
volunteer_duty.volunteerId = performance_duty.volunteerId AND
performance_duty.performanceId = performance.performanceId AND
Performance.productionId = production.productionId
--Added image--
Result:
The query seems reasonable, in terms of it having what appear to be the appropriate join conditions between all the tables. It's not clear to me what issue you are having with the results; it might help if you explained in more detail and/or showed a relevant subset of the data.
However, since you say there is some issue related to availability_date, my first thought is that you want to have some condition on that column, to ensure that a volunteer is available for a given duty on the date of a given performance. This might mean simply adding volunteer_duty.availability_date = performance.performance_date to the query conditions.
My more general recommendation is to start writing the query from scratch, adding one table at a time, and using ANSI join syntax. This will make it clearer which conditions are related to which joins, and if you add one table at a time hopefully you will see the point at which the results are going wrong.
For instance, I'd probably start with this:
SELECT production.name, performance.performance_date
FROM production
JOIN performance ON production.productionid = performance.productionid
If that gives results that make sense, then I would go on to add a join to performance_duty and run that query. Et cetera.
I suggest that you explicitly write JOINS, instead of using the WHERE-Syntax.
Using INNER JOINs the query you are describing, could look like:
SELECT *
FROM volunteer v
INNER JOIN volunteer_duty vd ON(v.volunteerId = vd.colunteerId)
INNER JOIN performance_duty pd ON(vd.dutyId = pd.dutyId AND vd.volunteerId = pd.colunteerId)
INNER JOIN performance p ON (pd.performanceId = p.performanceId)

LINQ - Using where or join - Performance difference?

Based on this question:
What is difference between Where and Join in linq?
My question is following:
Is there a performance difference in the following two statements:
from order in myDB.OrdersSet
from person in myDB.PersonSet
from product in myDB.ProductSet
where order.Persons_Id==person.Id && order.Products_Id==product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
and
from order in myDB.OrdersSet
join person in myDB.PersonSet on order.Persons_Id equals person.Id
join product in myDB.ProductSet on order.Products_Id equals product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
I would always use the second one just because it´s more clear.
My question is now, is the first one slower than the second one?
Does it build a cartesic product and filters it afterwards with the where clauses ?
Thank you.
It entirely depends on the provider you're using.
With LINQ to Objects, it will absolutely build the Cartesian product and filter afterwards.
For out-of-process query providers such as LINQ to SQL, it depends on whether it's smart enough to realise that it can translate it into a SQL join. Even if LINQ to SQL doesn't, it's likely that the query engine actually performing the query will do so - you'd have to check with the relevant query plan tool for your database to see what's actually going to happen.
Side-note: multiple "from" clauses don't always result in a Cartesian product - the contents of one "from" can depend on the current element of earlier ones, e.g.
from file in files
from line in ReadLines(file)
...
My question is now, is the first one slower than the second one? Does it build a cartesic product and filters it afterwards with the where clauses ?
If the collections are in memory, then yes. There is no query optimizer for LinqToObjects - it simply does what the programmer asks in the order that is asked.
If the collections are in a database (which is suspected due to the myDB variable), then no. The query is translated into sql and sent off to the database where there is a query optimizer. This optimizer will generate an execution plan. Since both queries are asking for the same logical result, it is reasonable to expect the same efficient plan will be generated for both. The only ways to be certain are to
inspect the execution plans
or measure the IO (SET STATISTICS IO ON).
Is there a performance difference
If you find yourself in a scenario where you have to ask, you should cultivate tools with which to measure and discover the truth for yourself. Measure - not ask.

Linq-to-NHibernate OrderBy Not Working

I'm trying order a Linq to NHibernate query by the sum of it's
children.
session.Linq<Parent>().OrderBy( p => p.Children.Sum( c => c.SomeNumber ) ).ToList()
This does not seem to work. When looking at NHProf, I can see that it
is ordering by Parent.Id. I figured that maybe it was returning the
results and ordering them outside of SQL, but if I add a .Skip
( 1 ).Take( 1 ) to the Linq query, it still orders by Parent.Id.
I tried doing this with an in memory List and it works just
fine.
Am I doing something wrong, or is this an issue with Linq to
NHibernate?
I'm sure I could always return the list, then do the operations on
them, but that is not an ideal workaround, because I don't want to
return all the records.
To order a query by an aggregate value, you need to use a group by query. In your example you need to use a 'group by' with a join.
The equivelant SQL would be something like:
select id, sum(child.parentid) as childsum from dbo.Parent
inner join child on
parent.id= child.parentid
group by id
order by childsum desc
If only Linq2NH could do this for us... but it can't sadly. You can do it in HQL as long as you're ok with getting the id, and the sum back, but not the whole object.
I battled with Linq2NH for months before I abandoned it. Its slow, doesn't support 2nd level cache and only supports VERY basic queries. (at least when I abandoned it 6 months ago - it may have come along leaps and bounds!) Its fine if you are doing a simple home-made application. If your application is even remotely complex ditch it and spend a bit of time getting to know HQL: a little harder to understand but much more powerful.

Outer Joins with Subsonic 3.0

Does anyone know of a way to do a left outer join with SubSonic 3.0 or another way to approach this problem? What I am trying to accomplish is that I have one table for departments and another table for divisions. A department can have multiple divisions. I need to display a list of departments with the divisions it contains. Getting back a collection of departments which each contain a collection of divisions would be ideal, but I would take a flattened result table too.
Using the LINQ syntax seems to be broken (I am new to LINQ though and may be using it wrong), for example this throws an ArgumentException error:
var allDepartments = from div in Division.All()
join dept in Department.All() on div.DepartmentId equals dept.Id into divdept
select divdept;
So I figured I could fall back to using the SubSonic query syntax. This code however generates an INNER JOIN instead of an OUTER JOIN:
List<Department> allDepartments = new Select()
.From<Department>()
.LeftOuterJoin<Division>(DepartmentsTable.IdColumn, DivisionsTable.DepartmentIdColumn)
.ExecuteTypedList<Department>();
Any help would be appreciated. I am not having much luck with SubSonic 3. I really enjoyed using SubSonic 2 and may go back to that if I can't figure out something as basic as a left join.
Getting back a collection of departments which each contain a collection of divisions would be ideal
SubSonic does this for you (if you setup your relationships correctly in the database), just select all Departments:
var depts = Model.Department.All();
There will be a property in each item of depts named Divisions, which contains a collection of Division objects.

Resources