Linq Expression Syntax - How to make it more readable?

I am in the process of writing something that will use Linq to combine results from my database, via Linq2Sql and an in-memory list of objects in order to find out which of my in-memory objects match something on the database.
I've come up with the query in both expression and query syntax.
Expression Syntax
var query = order.Items.Join(productNonCriticalityList,
i => i.ProductID,
p => p.ProductID,
(i, p) => i);
Query Syntax
var query =
from p in productNonCriticalityList
join i in order.Items
on p.ProductID equals i.ProductID
select i;
I realise that we have all the code completion goodness with expression syntax, and I do actually use that more. Mainly because it's easier to create re-usable chunks of filter code that can be chained together to form more complex filters.
For a join, though, the latter seems far more readable to me, but maybe that is because I am used to writing T-SQL.
So, am I missing a trick or is it just a matter of getting used to it?

I agree with the other responders that the exact question you're asking is simply a matter of preference. Personally, I mix the two forms depending upon which is clearer for the specific query that I'm writing.
If I have one comment though, I would say that the query looks like it might load all of the items from the order. That might be fine for a single order one time, but if you're looping through lots of orders, it might be more efficient to load all of the items for all of the orders in one go (you might want to additionally filter by date or customer, or whatever though). If you do that, you might get better results by switching the query around:
var productIds = (from p in productNonCriticalityList
                  orderby p.ProductID
                  select p.ProductID).Distinct();
var orderItems = from i in dc.OrderItems
                 where productIds.Contains(i.ProductID)
                 // && additional filtering here
                 select i;
It's a bit backwards at first glance, but it could save you from loading in all the order items and also from sending lots of queries. It works because the where productIds.Contains(...) call can be converted to where i.ProductID in (1, 2, 3, 4, 5) in SQL. Of course, you'd have to judge it based on the expected number of order items, and the number of product IDs.
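To make the shape of that filter concrete, here is a minimal in-memory sketch. The OrderItem class and the FilterByProductIds helper are hypothetical names made up for illustration; only ProductID comes from the question. It runs against plain collections, so it shows the Where/Contains pattern rather than the SQL that LINQ to SQL would generate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an order line; only ProductID is from the question.
public class OrderItem
{
    public int OrderID { get; set; }
    public int ProductID { get; set; }
}

public static class ContainsFilterSketch
{
    // Keeps only the items whose ProductID appears in the given ID list.
    // This is the same "list.Contains(item.Key)" shape as the query above.
    public static List<OrderItem> FilterByProductIds(
        IEnumerable<OrderItem> items, IEnumerable<int> productIds)
    {
        // A HashSet removes duplicate IDs and gives fast membership checks in memory.
        var idSet = new HashSet<int>(productIds);
        return items.Where(i => idSet.Contains(i.ProductID)).ToList();
    }
}
```

Calling ContainsFilterSketch.FilterByProductIds(items, ids) returns the matching items in their original order. Against a real data context you would keep the query as written above so the filtering happens on the database.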

It really all comes down to preference. Some people just hate the idea of query-like syntax in their code. I for one appreciate the query syntax; it is declarative and quite readable. Like you said though, the chainability of the first example is a nice thing to have. I guess for my money I would keep it in query syntax until I felt I needed to begin chaining calls.

I used to feel the same way. Now I find query syntax easier to read and write, particularly when things get complicated. As much as it irked me to type it the first time, 'let' does wonderful things in ways that would not be readable in Expression Syntax.
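As a rough illustration of the point about 'let', here is the same small query written both ways. The LetSketch class and its method names are made up for this example. In query syntax the intermediate value gets a name once; in method syntax it has to be carried along in an anonymous type.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LetSketch
{
    // Query syntax: 'let' names the trimmed value once and reuses it.
    public static List<string> LongWordsQuery(IEnumerable<string> lines)
    {
        return (from line in lines
                let trimmed = line.Trim()
                where trimmed.Length > 3
                orderby trimmed.Length, trimmed
                select trimmed.ToUpperInvariant()).ToList();
    }

    // Method syntax: the intermediate value travels in an anonymous type.
    public static List<string> LongWordsMethod(IEnumerable<string> lines)
    {
        return lines.Select(line => new { line, trimmed = line.Trim() })
                    .Where(x => x.trimmed.Length > 3)
                    .OrderBy(x => x.trimmed.Length)
                    .ThenBy(x => x.trimmed)
                    .Select(x => x.trimmed.ToUpperInvariant())
                    .ToList();
    }
}
```

Both methods return the same list for the same input; the difference is only in how much plumbing the reader has to look past.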

I prefer the Query syntax when it's complex and Expression syntax when it's a simple query.
If a DBA were to read the C# code to see what SQL we are using, they would understand and digest the Query syntax more easily.
Taking a simple example:
Query
var col = from o in orders
orderby o.Cost ascending
select o;
Expression
var col2 = orders.OrderBy(o => o.Cost);
To me, the Expression syntax is an easier choice to understand here.
Another example:
Query
var col9 = from o in orders
orderby o.CustomerID, o.Cost descending
select o;
Expression
var col6 = orders.OrderBy(o => o.CustomerID).
ThenByDescending(o => o.Cost);
Both are easy to read and understand, however if the query was
//returns same results as above
var col5 = from o in orders
orderby o.Cost descending
orderby o.CustomerID
select o;
//NOTE the ordering of the orderby's
That looks a little confusing to me, as the fields are in a different order and it appears a little backwards.
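Here is a small sketch that puts the two orderings side by side. The Order class and the OrderBySketch helper are assumptions made for the example; only CustomerID and Cost come from the queries above. The two-clause form gives the same result here because OrderBy in LINQ to Objects is a stable sort, so the later orderby keeps the earlier Cost ordering among rows with equal CustomerID. Other LINQ providers may not behave the same way, so treat this as an in-memory demonstration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical order type with just the two fields used above.
public class Order
{
    public int CustomerID { get; set; }
    public decimal Cost { get; set; }
}

public static class OrderBySketch
{
    // One orderby clause: primary key first, then the tie-breaker.
    public static List<Order> SingleClause(IEnumerable<Order> orders)
    {
        return (from o in orders
                orderby o.CustomerID, o.Cost descending
                select o).ToList();
    }

    // Two orderby clauses: the LAST clause is the primary sort,
    // which is why the fields appear in reverse order.
    public static List<Order> TwoClauses(IEnumerable<Order> orders)
    {
        return (from o in orders
                orderby o.Cost descending
                orderby o.CustomerID
                select o).ToList();
    }
}
```

The second form is equivalent to orders.OrderByDescending(o => o.Cost).OrderBy(o => o.CustomerID), which re-sorts the whole sequence rather than refining the first sort.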
For Joins
Query
var col = from c in customers
join o in orders on
c.CustomerID equals o.CustomerID
select new
{
c.CustomerID,
c.Name,
o.OrderID,
o.Cost
};
Expression:
var col2 = customers.Join(orders,
c => c.CustomerID,o => o.CustomerID,
(c, o) => new
{
c.CustomerID,
c.Name,
o.OrderID,
o.Cost
}
);
I find that Query is better.
My summary would be use whatever looks easiest and fastest to understand given the query at hand. There is no golden rule of which to use. However, if there are a lot of joins, I'd go with Query syntax.

Well, both statements are equivalent. So you could use either of them, depending on the surrounding code and which is more readable. In my project I decide which syntax to use based on those two conditions.
Personally I would write the expression syntax in one line, but this is a matter of taste.

Related

What should you use for joining in LINQ, Query syntax or method syntax?

I would like to know in terms of performance is there any difference between using a query syntax or method syntax (Lambda expressions) for joining two entities?
I already know that in general there is no difference in terms of result between query syntax and method syntax. However, for joining, which of these is better to use performance-wise?
Here is the sample code:
var queryResult = (from p in People
join i in Incomes
on p.PersonId equals i.PersonId
select new { p.PersonId, p.Name, p.Age, i.Amount }
).ToList();
var lambdaResult = People.Join(Incomes,
p => p.PersonId,
i => i.PersonId,
(p, i) => new { p.PersonId, p.Name, p.Age, i.Amount }).ToList();
I have already gone through these websites, but nothing is mentioned about joins:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/query-syntax-and-method-syntax-in-linq
LINQ - Query syntax vs method chains & lambda
There is no difference. Your first version (query language) is translated lexically into the second one (method syntax) before "real" compilation. The query language is only syntactic sugar and transformed into method calls. These calls are then compiled (if possible - the translation itself does not care about the correctness of the result, e.g. if People.Join even is valid C# and there is such a Join method in whatever People might be).
There may be a difference in that this translation uses an explicit Select call instead of the resultSelector parameter of the Join method, but even that does not measurably impact performance.
This article by Jon Skeet helped me understand the transformation from query language to method syntax.
To answer the question "What should you use": this is really up to you. Consider:
what is more readable/understandable to you (and your co-workers!)
complex queries often are more readable in query syntax, the SQL-like style can be easier to read than a long chain of method calls
Note that every query syntax expression can be expressed as method calls, but not all method calls can be expressed in query syntax
mixing both syntaxes in a single query is often more confusing than sticking to one of them
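To illustrate the point that the translation is purely syntactic, here is a minimal sketch with a made-up Box<T> type that is not a collection at all. The names Box and QueryTranslationSketch are assumptions for this example. Because Box<T> has a suitably shaped Select method, the compiler accepts a from ... select query over it and rewrites it into that method call.

```csharp
using System;

// Hypothetical single-value container; it does not implement IEnumerable<T>.
public class Box<T>
{
    public T Value { get; }

    public Box(T value) { Value = value; }

    // A method named Select with this shape is all the compiler needs
    // in order to accept a "from ... select" query over Box<T>.
    public Box<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Box<TResult>(selector(Value));
    }
}

public static class QueryTranslationSketch
{
    // Query syntax over a non-collection type.
    public static int ViaQuerySyntax(int start)
    {
        var result = from x in new Box<int>(start) select x + 1;
        return result.Value;
    }

    // The method call that the query above is rewritten into.
    public static int ViaMethodSyntax(int start)
    {
        return new Box<int>(start).Select(x => x + 1).Value;
    }
}
```

Both methods return start + 1. Adding a where clause to the query would fail to compile until Box<T> also gained a Where method, which is the "if possible" part mentioned above.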

LINQ syntax - ordering of criteria

I'm trying to understand LINQ syntax and getting stuck. So I've got this line which gets all of the people with the postcode I'm searching for
IQueryable<int> PersonIDsWithThisPostcode = _context.Addresses
.Where(pst => pst.Postcode.Contains(p))
.Select(b => b.PersonID);
This line then only returns people in PersonIDsWithThisPostcode
persons = persons.Where(ps => PersonIDsWithThisPostcode.Contains(ps.PersonID));
I'd have expected it to be something along the lines of this, where you're looking at a container, then checking against a subset of values to see what you want.
persons = persons.Where(ps => ps.PersonID.Contains(PersonIDsWithThisPostcode));
So from a SQL point-of-view I'd think of it something like this
bucket = bucket.Where(bucket.Contains(listoffish));
but it seems to act like this
bucket = bucket.Where(listoffish.Contains(bucket));
I've read through lots of documentation but I can't get my head around this apparently simple notion. Any help to explain this way of thinking would be appreciated.
Thanks
If PersonID is an int you can't use ps.PersonID.Contains because an int is not a collection (or string which would search a substring).
The only correct way is to search your PersonId in a collection which is the PersonIDsWithThisPostcode-query that returns all matching PersonIds.
A single PersonID doesn't contain a collection but a collection of PersonIds contains a single PersonId.
So this is correct; it returns all persons whose PersonID is in the other sequence:
persons = persons.Where(ps => PersonIDsWithThisPostcode.Contains(ps.PersonID));
and this not:
persons = persons.Where(ps => ps.PersonID.Contains(PersonIDsWithThisPostcode));
The syntax is reversed in comparison to SQL, which should come as no surprise, considering that C# and SQL are two different languages.
In SQL you place the list on the right, because IN operator reads "item in collection"
WHERE someId IN (100, 102, 113, 200, 219)
In C#, without regard to LINQ, you check if a collection contains an item using code that reads "collection contains item"
myList.Contains(someId);
When you use Contains in a LINQ query that gets translated to SQL, the LINQ provider translates one syntax to the other, so that C# programmers do not have to think about the difference.
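Here is a small in-memory sketch of the postcode example that shows both kinds of Contains together. The Address and Person classes and the PeopleWithPostcode helper are hypothetical; only the property names come from the question. It uses plain collections instead of a database context, so nothing is translated to SQL here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes based on the property names in the question.
public class Address
{
    public int PersonID { get; set; }
    public string Postcode { get; set; }
}

public class Person
{
    public int PersonID { get; set; }
    public string Name { get; set; }
}

public static class PostcodeFilterSketch
{
    public static List<Person> PeopleWithPostcode(
        IEnumerable<Person> persons, IEnumerable<Address> addresses, string p)
    {
        // string.Contains: "does this postcode text contain the substring p?"
        var idsWithPostcode = new HashSet<int>(
            addresses.Where(a => a.Postcode.Contains(p))
                     .Select(a => a.PersonID));

        // Collection Contains: "does the set of IDs contain this person's ID?"
        // The collection is the receiver; the single item is the argument.
        return persons.Where(ps => idsWithPostcode.Contains(ps.PersonID)).ToList();
    }
}
```

In both calls the thing on the left is the container (a string of characters, or a set of IDs) and the argument is the thing being looked for.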

Joining two tables and returning multiple records as one row using LINQ

I'm trying to write a LINQ expression that will join two tables and return data in a format similar to what is possible using MySql's GROUP_CONCAT. I tried searching around on Google and SO, but all the results I found used MSSQL or were only using one table. The expression I have written now looks like this:
from d in division
join o in office on d.Id equals o.DivisionId
select new
{
id = d.Id,
cell = new string[] { d.DivisionName, o.OfficeName }
}
As expected, this returns a list of every division and what offices correspond to that division. The only problem is that since most divisions will have more than one office, I get a division back for each office in said division. Essentially I'm seeing results like this:
Division1: Office1
Division1: Office2
Division1: Office3
Division2: Office1
When I want to see:
Division1: Office1, Office2, Office3
Division2: Office1
I remember doing something a while ago with MySql that used GROUP_CONCAT, but I can't figure out what the equivalent of that would be using LINQ. I tried writing a method which had an IEnumerable<Office> parameter and built a string using the Aggregate extension method, but the way I have my LINQ expression written now, each Office is passed in rather than an IEnumerable<Office>. Is there a better way to approach this problem than what I'm doing now? I'm rather new to LINQ expressions, so I apologize if this is trivial.
You want a group join, e.g.
from d in division
join o in office on d.Id equals o.DivisionId into offices
select new
{
id = d.Id,
divisionName = d.DivisionName,
officeNames = offices.Select(o => o.OfficeName)
}
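For the GROUP_CONCAT-style output itself, here is a minimal in-memory sketch that combines the group join with string.Join. The Division and Office classes and the DivisionLines helper are assumptions based on the property names in the question. It is written against plain collections; a database provider may not be able to translate string.Join, in which case you would materialise the grouped results first and build the strings afterwards.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes based on the property names in the question.
public class Division
{
    public int Id { get; set; }
    public string DivisionName { get; set; }
}

public class Office
{
    public int DivisionId { get; set; }
    public string OfficeName { get; set; }
}

public static class GroupJoinSketch
{
    // Produces one line per division, with its office names comma-separated.
    public static List<string> DivisionLines(
        IEnumerable<Division> divisions, IEnumerable<Office> offices)
    {
        return (from d in divisions
                join o in offices on d.Id equals o.DivisionId into divisionOffices
                select d.DivisionName + ": " +
                       string.Join(", ", divisionOffices.Select(o => o.OfficeName)))
               .ToList();
    }
}
```

Because of the "into" clause, divisionOffices is the whole sequence of matching offices for each division, which is what lets the names be concatenated per division.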

Any way to make this more efficient? It runs fast just would like improvements if available

Any way to make this more efficient?
internal Func<entities, IQueryable<CategoryList>> GetCategoryListWithPostingCount =
CompiledQuery.Compile((entities entities) =>
from c in entities.Categories.Include("Postings_Categories")
where c.ParentCategoryID == null
orderby c.DisplayOrder
select new CategoryList
{
ParentCategoryName = c.CategoryName,
ParentCategoryID = c.CategoryID,
SubCategories = (
from s in entities.Categories
where s.ParentCategoryID == c.CategoryID
select new SubCategoryList
{
PostingCount = s.Postings_Categories.Count,
SubCategoryName = s.CategoryName,
SubCategoryID = s.CategoryID
})
});
Any suggestions for improvement would depend on us knowing a lot more than just the LINQ query. For example, if you have a lot of SubCategoryLists per Category, and if Category has a lot of scalar data attached to it (big description text, etc), it might be faster to pull the category list in a separate database round-trip. But it probably won't.
Looking at what tables are joined and adding the appropriate indices could make a difference, too.
But all in all, I'd say this is a case of premature optimization. The code looks clean, and I can tell what you're trying to do with it, so call that good for now. If you find that it's a slow point in your program, worry about it then.
PS--The question is tagged LINQ-to-SQL, but this looks like LINQ-to-Entities to me...

Translate an IQueryable instance to LINQ syntax in a string

I would like to find out if anyone has existing work surrounding formatting an IQueryable instance back into a LINQ C# syntax inside a string. It'd be a nice-to-have feature for an internal LINQ-to-SQL auditing framework I'm building. Once my framework gets the IQueryable instance from a data repository method, I'd like to output something like:
This LINQ query:
from ce in db.EiClassEnrollment
join c in db.EiCourse on ce.CourseID equals c.CourseID
join cl in db.EiClass on ce.ClassID equals cl.ClassID
join t in db.EiTerm on ce.TermID equals t.TermID
join st in db.EiStaff on cl.Instructor equals st.StaffID
where (ce.StudentID == studentID) && (ce.TermID == termID) && (cl.Campus == campusID)
select new { ce, cl, t, c, st };
Generates the following LINQ-to-SQL query:
DECLARE @p0 int;
DECLARE @p1 int;
DECLARE @p2 int;
SET @p0 = 777;
SET @p1 = 778;
SET @p2 = 779;
SELECT [t0].[ClassEnrollmentID], ..., [t4].[Name]
FROM [dbo].[ei_ClassEnrollment] AS [t0]
INNER JOIN [dbo].[ei_Course] AS [t1] ON [t0].[CourseID] = [t1].[CourseID]
INNER JOIN [dbo].[ei_Class] AS [t2] ON [t0].[ClassID] = [t2].[ClassID]
INNER JOIN [dbo].[ei_Term] AS [t3] ON [t0].[TermID] = [t3].[TermID]
INNER JOIN [dbo].[ei_Staff] AS [t4] ON [t2].[Instructor] = [t4].[StaffID]
WHERE ([t0].[StudentID] = @p0) AND ([t0].[TermID] = @p1) AND ([t2].[Campus] = @p2)
I already have the SQL output working as you can see. I just need to find a way to get the IQueryable to translate into a string representing its original LINQ syntax (with an acceptable translation loss). I'm not afraid of writing it myself, but I'd like to see if anyone else has done this first.
Everything IQueryable can be compiled into an Expression object. Expressions have a Body property representing the body of the lambda expression. You may be able to, while parsing your sources, compile each expression and then output the body, which should be normalized.
The best approach to this would be to read up on expression trees in C#. I think you may be able to use a visitor pattern over an IQueryable<T> type to recover the C# syntax. I know there are some implementations available for Expression<Func<T>>, but I can't recall ever seeing this done for a LINQ query.
UPDATE I got curious about this and started doing some research. You can access the underlying Expression Tree through the Expression property of an IQueryable<>. It looks like you would need to implement a LINQ provider that renders C# instead of SQL. This is very far from trivial. In fact I think it would be difficult to justify the amount of work that would be required unless this is an educational (non-commercial) project. But if you're undaunted, here is what looks like an excellent tutorial on LINQ providers. All the source code is available on Codeplex too.
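As a starting point for that kind of work, here is a minimal sketch of reading the Expression property and walking it with an ExpressionVisitor. The class names and the sample query are made up for illustration, and it uses an in-memory array wrapped with AsQueryable rather than a LINQ-to-SQL table. It only collects the names of the chained query operators; rendering real C# syntax would need far more than this.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Records the name of every method call found in an expression tree.
public class MethodNameCollector : ExpressionVisitor
{
    public List<string> Names { get; } = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Names.Add(node.Method.Name);
        // Keep walking so that nested calls (the source of this call) are seen too.
        return base.VisitMethodCall(node);
    }
}

public static class ExpressionInspectionSketch
{
    // Builds a small sample query and returns its underlying expression tree.
    public static Expression BuildSampleTree()
    {
        IQueryable<int> query = new[] { 1, 2, 3, 4 }
            .AsQueryable()
            .Where(n => n > 2)
            .Select(n => n * 10);
        return query.Expression;
    }

    // Returns operator names from the outermost call inwards.
    public static List<string> OperatorNames(Expression tree)
    {
        var collector = new MethodNameCollector();
        collector.Visit(tree);
        return collector.Names;
    }
}
```

The outermost call is visited first, so the names come back in reverse of the order they were chained. A renderer would override more Visit methods (lambdas, binary operators, member access) and emit text for each node instead of just recording names.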
I've done my own implementation for this since I could not find any existing work that was freely available or in source form. I put up a quick blog post about my work and included the entire C# source code for it. You can do some pretty neat stuff with it. Feel free to check it out.
http://bittwiddlers.org/?p=120
