Cross-Join Syntax in Entity Framework / IQueryable

I'm trying to deepen my education about IQueryable custom providers and expression trees. I'm interested in custom parsing a cross-join (viz. SelectMany), and I'm trying to understand if that is exactly what EF is doing when it handles this:
var infoQuery =
from cust in db.Customers
from ord in cust.Orders
where cust.City == "London"
select ord;
Allegedly EF can handle cross joins, though the syntax in that link does not look right to me. Then I found a link with the title "Cross Product Queries" for EF. The syntax looks "correct," but the article itself speaks as if these are normal inner joins rather than cross joins.
Indeed, the code snippet above comes from that last article - and leaves me wondering if EF simply says "I know how these two entities are related, so I will form the inner-join automatically."
What is the real story with EF and this alleged "cross join" sample?
Footnote
As I try to build my own IQueryable LINQ provider, the educational goal I've set for myself is to make my own query context for the code snippet above, so that when ToList() is called on the query:
A Console.WriteLine() is automatically fired that prints "This is a cross join of: Customer and Order"
The == operator is magically converted into a != before the query is fully interpreted (by an ExpressionVisitor perhaps, not sure).
If someone knows of articles or has snippets of code that would speed my educational goal, please do share! :)
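For the second goal, here is a minimal sketch of what an ExpressionVisitor that flips == into != might look like. It works on a standalone lambda rather than a full custom provider, and the class name EqualityFlipper and the isLondon lambda are made up for illustration; a real provider would run such a visitor over the whole query expression before translating it.

```csharp
using System;
using System.Linq.Expressions;

// A standalone predicate, standing in for the "where cust.City == ..." part of a query.
Expression<Func<string, bool>> isLondon = city => city == "London";

// Rewrite the tree; visiting a lambda yields a lambda of the same delegate type.
var flipped = (Expression<Func<string, bool>>)new EqualityFlipper().Visit(isLondon);

Console.WriteLine(isLondon.Compile()("London")); // True
Console.WriteLine(flipped.Compile()("London"));  // False

// Hypothetical visitor (name is an assumption): replaces every == node with !=.
class EqualityFlipper : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        if (node.NodeType == ExpressionType.Equal)
        {
            // Visit the operands too, so nested comparisons are also rewritten.
            return Expression.NotEqual(Visit(node.Left), Visit(node.Right));
        }
        return base.VisitBinary(node);
    }
}
```

The same visitor could presumably also override VisitMethodCall to spot a SelectMany call and print the "cross join" message, but that part is not shown here.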

Look closely at the syntax:
from cust in db.Customers
from ord in cust.Orders // note the "cust." prefix
select ...
Because of cust.Orders this is a regular inner join. It's even the preferred way to do a join, because it is far more succinct than the regular join statement.
I don't understand the title of this "Cross-Product Queries" article. Firstly, because as far as I know "cross product" applies to three-dimensional vectors not relational algebra. Secondly, because there is not a single cross join in the examples, only inner joins. Maybe what they're trying to say is that the above syntax looks like a cross join? But it isn't, so it's only confusing to use the word so prominently in the title.
This code snippet
from cust in db.Customers
from ord in db.Orders // note the "db." prefix
select ...
is a true cross join (or Cartesian product). If there are n customers and m orders the result set contains n * m rows. Hardly useful with orders and customers, but it can be useful to get all combinations of elements in two sequences. This construct can also be useful if you want to join but also need a second condition in the join, like
from cust in db.Customers
from ord in db.Orders
where cust.CustomerId == ord.CustomerId && ord.OrderDate > DateTime.Today
The where clause effectively turns it into an inner join. Maybe not the best example, but there are cases where this comes in handy, because the join - on - equals syntax only accepts equality comparisons.
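The three shapes discussed above can be compared with a small LINQ to Objects sketch. The customers and orders arrays are made-up in-memory data, and orders.Where(...) stands in for the cust.Orders navigation property, so this only illustrates the row counts, not the SQL that EF would generate.

```csharp
using System;
using System.Linq;

// Made-up sample data: 3 customers, 3 orders (customer 3 has no orders).
var customers = new[]
{
    (Id: 1, City: "London"),
    (Id: 2, City: "Paris"),
    (Id: 3, City: "London"),
};
var orders = new[]
{
    (OrderId: 10, CustomerId: 1),
    (OrderId: 11, CustomerId: 1),
    (OrderId: 12, CustomerId: 2),
};

// Second "from" does not depend on cust: a true cross join (n * m rows).
var cross = from cust in customers
            from ord in orders
            select (cust.Id, ord.OrderId);

// Second "from" depends on cust (stand-in for cust.Orders): behaves like an inner join.
var correlated = from cust in customers
                 from ord in orders.Where(o => o.CustomerId == cust.Id)
                 select (cust.Id, ord.OrderId);

// Cross join plus a where clause: same rows as the inner join.
var filtered = from cust in customers
               from ord in orders
               where cust.Id == ord.CustomerId
               select (cust.Id, ord.OrderId);

Console.WriteLine(cross.Count());      // 9
Console.WriteLine(correlated.Count()); // 3
Console.WriteLine(filtered.Count());   // 3
```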

Related

making inner join linq or using [table].[joiningTable].column is the same?

I have recently started working with LINQ and I was wondering: suppose I have 2 related tables, Project (which holds the foreign key fkAccessLevelId) and AccessLevel, and I just want to select values from both tables. There are 2 ways I can select values from these tables.
The one I commonly use is:
(from P in DataContext.Projects
join AL in DataContext.AccessLevel
on P.AccessLevelId equals AL.AccessLevelId
select new
{
ProjectName = P.Name,
Access = AL.AccessName
})
Another way of doing this would be:
(from P in DataContext.Projects
select new
{
ProjectName = P.Name,
Access = P.AccessLevel.AccessName
})
What I wanted to know is which of these ways is more efficient if we increase the number of tables to, say, 5-6, with 1-2 of them containing thousands of records?
You should take a look at the SQL generated. A LINQ query (I assume LINQ to SQL in this case) has several possible performance bottlenecks, and the main one is usually the SQL query running on the server.
Typically SQL Server has a very good optimizer, so actually, given the same query, refactored, the perf is pretty uniform.
However, in your case there is a very real difference between the two queries. A project with no AccessLevel would not appear in the first query, whilst the second query would return it with a null AccessName. In effect you would be comparing an INNER JOIN to a LEFT JOIN.
TL;DR: On SQL Server, LINQ to SQL or Entity Framework queries that do the same thing should give similar performance. However, your two queries do not do the same thing.

Oracle query with multiple tables

I am trying to display volunteer information together with the duty and the performance each volunteer is allocated to.
However, when I run the query, it does not pick up the different dates for the same performance, and availability_date is mixed up. Is this the right query for it? I am not sure.
Could you give me some feedback?
Thanks.
Here is the query:
SELECT Production.name, performance.performance_date, volunteer_duty.availability_date, customer.name "Customer volunteer", volunteer.volunteerid, membership.name "Member volunteer", membership.membershipid
FROM Customer, Membership, Volunteer, volunteer_duty, duty, performance_duty, performance, production
WHERE
Customer.customerId (+) = Membership.customerId AND
Membership.membershipId = Volunteer.membershipId AND
volunteer.volunteerid = volunteer_duty.volunteerid AND
duty.dutyid = volunteer_duty.dutyid AND
volunteer_duty.dutyId = performance_duty.dutyId AND
volunteer_duty.volunteerId = performance_duty.volunteerId AND
performance_duty.performanceId = performance.performanceId AND
Performance.productionId = production.productionId
The query seems reasonable, in terms of it having what appear to be the appropriate join conditions between all the tables. It's not clear to me what issue you are having with the results; it might help if you explained in more detail and/or showed a relevant subset of the data.
However, since you say there is some issue related to availability_date, my first thought is that you want to have some condition on that column, to ensure that a volunteer is available for a given duty on the date of a given performance. This might mean simply adding volunteer_duty.availability_date = performance.performance_date to the query conditions.
My more general recommendation is to start writing the query from scratch, adding one table at a time, and using ANSI join syntax. This will make it clearer which conditions are related to which joins, and if you add one table at a time hopefully you will see the point at which the results are going wrong.
For instance, I'd probably start with this:
SELECT production.name, performance.performance_date
FROM production
JOIN performance ON production.productionid = performance.productionid
If that gives results that make sense, then I would go on to add a join to performance_duty and run that query. Et cetera.
I suggest that you write explicit JOINs instead of using the WHERE syntax.
Using INNER JOINs, the query you are describing could look like:
SELECT *
FROM volunteer v
INNER JOIN volunteer_duty vd ON (v.volunteerId = vd.volunteerId)
INNER JOIN performance_duty pd ON (vd.dutyId = pd.dutyId AND vd.volunteerId = pd.volunteerId)
INNER JOIN performance p ON (pd.performanceId = p.performanceId)

LINQ - Using where or join - Performance difference?

Based on this question:
What is difference between Where and Join in linq?
My question is following:
Is there a performance difference in the following two statements:
from order in myDB.OrdersSet
from person in myDB.PersonSet
from product in myDB.ProductSet
where order.Persons_Id==person.Id && order.Products_Id==product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
and
from order in myDB.OrdersSet
join person in myDB.PersonSet on order.Persons_Id equals person.Id
join product in myDB.ProductSet on order.Products_Id equals product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
I would always use the second one just because it's clearer.
My question now is: is the first one slower than the second one?
Does it build a Cartesian product and filter it afterwards with the where clauses?
Thank you.
It entirely depends on the provider you're using.
With LINQ to Objects, it will absolutely build the Cartesian product and filter afterwards.
For out-of-process query providers such as LINQ to SQL, it depends on whether it's smart enough to realise that it can translate it into a SQL join. Even if LINQ to SQL doesn't, it's likely that the query engine actually performing the query will do so - you'd have to check with the relevant query plan tool for your database to see what's actually going to happen.
Side-note: multiple "from" clauses don't always result in a Cartesian product - the contents of one "from" can depend on the current element of earlier ones, e.g.
from file in files
from line in ReadLines(file)
...
My question now is: is the first one slower than the second one? Does it build a Cartesian product and filter it afterwards with the where clauses?
If the collections are in memory, then yes. There is no query optimizer for LinqToObjects - it simply does what the programmer asks in the order that is asked.
If the collections are in a database (which is suspected due to the myDB variable), then no. The query is translated into SQL and sent off to the database, where there is a query optimizer. This optimizer will generate an execution plan. Since both queries are asking for the same logical result, it is reasonable to expect the same efficient plan will be generated for both. The only ways to be certain are to inspect the execution plans or to measure the IO (SET STATISTICS IO ON).
Is there a performance difference
If you find yourself in a scenario where you have to ask, you should cultivate tools with which to measure and discover the truth for yourself. Measure - not ask.
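For the in-memory case, one way to see the difference without a profiler is to count how often the matching logic runs. This is only a sketch with made-up data; the Match and Key helpers are hypothetical counters added for the measurement and are not part of LINQ.

```csharp
using System;
using System.Linq;

// Made-up data: 100 orders spread over 10 persons.
var orders = Enumerable.Range(0, 100).Select(i => (Id: i, PersonId: i % 10)).ToArray();
var persons = Enumerable.Range(0, 10).Select(i => (Id: i, Name: "P" + i)).ToArray();

// Hypothetical helper: counts how often the where predicate is evaluated.
int predicateCalls = 0;
bool Match(int a, int b) { predicateCalls++; return a == b; }

// Hypothetical helper: counts how often a join key is computed.
int keyCalls = 0;
int Key(int k) { keyCalls++; return k; }

// Two "from" clauses plus "where": every (order, person) pair is tested.
var viaWhere = (from o in orders
                from p in persons
                where Match(o.PersonId, p.Id)
                select (o.Id, p.Name)).ToList();

// "join": LINQ to Objects builds a lookup on the inner keys, then probes it per order.
var viaJoin = (from o in orders
               join p in persons on Key(o.PersonId) equals Key(p.Id)
               select (o.Id, p.Name)).ToList();

Console.WriteLine(viaWhere.Count);   // 100
Console.WriteLine(viaJoin.Count);    // 100
Console.WriteLine(predicateCalls);   // 1000 (100 * 10 pairs)
Console.WriteLine(keyCalls);         // 110  (100 outer keys + 10 inner keys)
```

Both queries return the same 100 rows; only the amount of work differs, and only for LINQ to Objects. Against a database, the execution plan is what counts.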

How do I do a left outer join in LINQ?

Can anybody explain in detail how to implement a left outer join in LINQ?
The key aspect here is DefaultIfEmpty().
Take a look at the following article to get a basic understanding. The example here is demonstrated for LINQ to SQL.
http://smehrozalam.wordpress.com/2009/06/10/c-left-outer-joins-with-linq/
If you are looking for a LINQ to Objects example, then have a look at this:
http://www.hookedonlinq.com/OuterJoinSample.ashx
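As a minimal LINQ to Objects sketch of the DefaultIfEmpty() pattern (the customers and orders data here is made up, and this is not taken from either linked article): a group join collects the matching orders per customer, and DefaultIfEmpty() supplies a single null when there are none, so the customer still appears in the result.

```csharp
using System;
using System.Linq;

// Made-up sample data: Bob has no orders.
var customers = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob") };
var orders = new[] { (CustomerId: 1, Product: "Tea"), (CustomerId: 1, Product: "Jam") };

var rows = (from cust in customers
            // "into" turns the join into a group join: one group of orders per customer.
            join ord in orders on cust.Id equals ord.CustomerId into matches
            // DefaultIfEmpty() yields a single null when the group is empty.
            from product in matches.Select(o => o.Product).DefaultIfEmpty()
            select (Customer: cust.Name, Product: product)).ToList();

foreach (var row in rows)
{
    Console.WriteLine($"{row.Customer}: {row.Product ?? "(no orders)"}");
}
// Ann: Tea
// Ann: Jam
// Bob: (no orders)
```

Projecting to the product name before calling DefaultIfEmpty() is a choice made for this sketch: the sample orders are value tuples, so the default element would otherwise be an all-default tuple rather than null.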

Outer Joins with Subsonic 3.0

Does anyone know of a way to do a left outer join with SubSonic 3.0 or another way to approach this problem? What I am trying to accomplish is that I have one table for departments and another table for divisions. A department can have multiple divisions. I need to display a list of departments with the divisions it contains. Getting back a collection of departments which each contain a collection of divisions would be ideal, but I would take a flattened result table too.
Using the LINQ syntax seems to be broken (I am new to LINQ though and may be using it wrong), for example this throws an ArgumentException error:
var allDepartments = from div in Division.All()
join dept in Department.All() on div.DepartmentId equals dept.Id into divdept
select divdept;
So I figured I could fall back to using the SubSonic query syntax. This code however generates an INNER JOIN instead of an OUTER JOIN:
List<Department> allDepartments = new Select()
.From<Department>()
.LeftOuterJoin<Division>(DepartmentsTable.IdColumn, DivisionsTable.DepartmentIdColumn)
.ExecuteTypedList<Department>();
Any help would be appreciated. I am not having much luck with SubSonic 3. I really enjoyed using SubSonic 2 and may go back to that if I can't figure out something as basic as a left join.
Getting back a collection of departments which each contain a collection of divisions would be ideal
SubSonic does this for you (if you set up your relationships correctly in the database); just select all Departments:
var depts = Model.Department.All();
There will be a property in each item of depts named Divisions, which contains a collection of Division objects.
