I've been working with Entity Framework for a few weeks now, after years of working with LINQ to Objects and LINQ to SQL. A lot of times I like to write LINQ statements like this:
from student in db.Students
from score in student.Scores
where score > 90
select student;
With other forms of LINQ, this returns the distinct students who have at least one score greater than 90. In EF, however, this query returns one student for every score greater than 90.
Does anyone know if this behavior can be replicated in unit tests? Is it possible this is a bug in EF?
I don't like that SQL-like syntax (query syntax, to give it its proper name), especially once you start nesting queries.
var students = db.Students
    .Where(student => student.Scores.Any(score => score > 90))
    .ToList();
This snippet, using method syntax, does the same thing. I find it far more readable, and it's more explicit about the order of operations. And as far as I have experienced, EF hasn't yet shown a bug in its selection when using method syntax.
Edit
To actually answer your problem:
However, in EF this query returns one student for every score greater than 90.
I think this is due to a JOIN in the final SQL that gets run. This is why I avoid the SQL-like syntax: it becomes very hard to differentiate between what you want to retrieve (students) and what you want to filter on (scores).
Much like you would in SQL, you are joining the student and score data and then filtering that combined result, and it becomes harder to separate that result back out into a collection of students. I think this is the main cause of your issue. It's not a bug per se, but I think EF can only handle it one way.
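For what it's worth, this is roughly what the compiler turns those two from clauses into (a sketch of the standard query-syntax translation, not EF's exact SQL): the second from becomes a SelectMany, so every matching score produces its own student row.
// Roughly the method-syntax equivalent of the query in the question:
var studentsPerScore = db.Students
    .SelectMany(student => student.Scores, (student, score) => new { student, score })
    .Where(x => x.score > 90)
    .Select(x => x.student);
// One element per (student, score) pair that passes the filter,
// hence one student per score greater than 90.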
Alternative solutions to the above:
If it returns one student per score over 90, take the distinct students returned (see the sketch after this list). It should be the same result set.
Use more explicit parentheses () and formatting to nest separate SQL-like statements.
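As a sketch of the first alternative (same db.Students source as the question; I haven't profiled what SQL EF emits for it):
var distinctStudents = (from student in db.Students
                        from score in student.Scores
                        where score > 90
                        select student)
                       .Distinct()
                       .ToList();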
Note: I'm not saying it can't be done with SQL-like syntax. I am well aware that most of this answer is opinion based.
Related
AX allows you to enter basic SQL into View ranges. For example, in an AOT view's range, for the match value, you could enter (StatRepInterval.Name == 'Weekly'). This works nicely.
However, I need to do a more advanced lookup on a View, using a subquery. Can anyone suggest a way to do this?
This is what I would like to use, but I receive an error: "Query extended range failure: Syntax error near 34."
(StatRepInterval.Name == (SELECT FIRSTONLY StatRepInterval.Name FROM StatRepInterval WHERE StatRepInterval.PrintDirection == 1 ORDER BY StatRepInterval.Name DESC))
I've tried a lot of different variants of the subquery, from straight T-SQL to X++ SQL, but nothing seems to work.
Thanks for the help.
Sub-queries are not supported in query expressions.
This may be solved by using additional datasources with inner or outer joins as you observed.
See the spec and Axaptapedia on query expressions.
I found a way to do this. It isn't pretty, and I'm going to leave the question unanswered for a bit, should someone else have a more graceful solution.
Create a source View that contains all fields I wish to return, plus calculated fields that contain my subquery results.
Create a second View that uses the first as a data source, and applies all the necessary ranges.
Works pretty nicely.
Probably inefficient if there were large tables of data, but this is in a relatively small section of AX.
I'm looking for a clean way to write this Linq query.
Basically I have a collection of objects with ids; using NHibernate and LINQ, I need to check whether the NHibernate entity has a child (SubObj) collection in which all of the ids from my object collection exist.
If there was just one item this would work:
var objectImCheckingAgainst = ... //irrelevant
where Obj.SubObj.Any(a => a.id == objectImCheckingAgainst.Id)
Now I want to instead somehow pass a list of objectImCheckingAgainst and return true only if the Obj.SubObj collection contains all items in that list, matched on Id.
I like to use GroupJoin for this.
return objectImCheckingAgainst
    .GroupJoin(Obj.SubObj,
        a => a.Id,
        b => b.id,
        (a, b) => b.Any())
    .All(c => c);
I believe this query should be more or less self-explanatory, but essentially, this joins the two collections using their respective ids as keys, then groups those results. Then for each of those groupings, it determines whether any matches exist. Finally, it ensures that all groupings had matches.
A useful alternative that I sometimes use is .Count() == 1 instead of the .Any(). Obviously, the difference there is whether you want to support multiple elements with the same id matching. From your description, it sounded like that either doesn't matter or is enforced by another means. But that's an easy swap, either way.
An important concept in GroupJoin that I know is relevant, but may or may not be obvious, is that the first enumerable (which is to say, the first argument to the extension method, or objectImCheckingAgainst in this example) will have all its elements included in the result, but the second one may or may not. It's not like Join, where the ordering is irrelevant. If you're used to SQL, these are the elementary beginnings of a LEFT OUTER JOIN.
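A quick in-memory illustration of that point, with made-up numbers rather than anything from your entities:
// requires using System.Linq;
var outer = new[] { 1, 2, 4 };
var inner = new[] { 2, 2, 3 };

// Every element of 'outer' yields exactly one group, even when nothing matches;
// elements of 'inner' only show up when their key matches an outer key.
var groups = outer.GroupJoin(inner,
                             o => o,
                             i => i,
                             (o, matches) => new { Key = o, Count = matches.Count() });
// Result: { 1, 0 }, { 2, 2 }, { 4, 0 } -- the 3 from 'inner' never appears.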
Another way you could accomplish this, somewhat more simply but not as efficiently, would be to simply nest the queries:
return objectImCheckingAgainst.All(c => Obj.SubObj.Any(x => x.id == c.Id));
I say this because it's pretty similar to the example you provided.
I don't have any experience with NHibernate, but I know many ORMs (I believe EF included) will map this to SQL, so efficiency may or may not be a concern. But in general, I like to write LINQ as close to par as I can so it works as well in memory as against a database, so I'd go with the first one I mentioned.
I'm not well versed in LINQ to NHibernate, but when using LINQ against any SQL backend it's always important to keep an eye on the generated SQL. I think this where clause...
where Obj.SubObj.All(a => idList.Contains(a.id))
...will produce the best SQL (using an IN clause).
idList is a list of Ids extracted from the list of objectImCheckingAgainst objects.
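In other words, something along these lines (a sketch; objectImCheckingAgainst and the Id/id property names are taken from the question):
// Pull the ids out up front so the provider can translate Contains into an IN (...) clause.
var idList = objectImCheckingAgainst.Select(o => o.Id).ToList();

// Then, inside the query:
// where Obj.SubObj.All(a => idList.Contains(a.id))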
Based on this question:
What is difference between Where and Join in linq?
My question is following:
Is there a performance difference in the following two statements:
from order in myDB.OrdersSet
from person in myDB.PersonSet
from product in myDB.ProductSet
where order.Persons_Id == person.Id && order.Products_Id == product.Id
select new { order.Id, person.Name, person.SurName, product.Model, UrunAdı = product.Name };
and
from order in myDB.OrdersSet
join person in myDB.PersonSet on order.Persons_Id equals person.Id
join product in myDB.ProductSet on order.Products_Id equals product.Id
select new { order.Id, person.Name, person.SurName, product.Model, UrunAdı = product.Name };
I would always use the second one, just because it's clearer.
My question now is: is the first one slower than the second one?
Does it build a Cartesian product and then filter it with the where clauses?
Thank you.
It entirely depends on the provider you're using.
With LINQ to Objects, it will absolutely build the Cartesian product and filter afterwards.
For out-of-process query providers such as LINQ to SQL, it depends on whether it's smart enough to realise that it can translate it into a SQL join. Even if LINQ to SQL doesn't, it's likely that the query engine actually performing the query will do so - you'd have to check with the relevant query plan tool for your database to see what's actually going to happen.
Side-note: multiple "from" clauses don't always result in a Cartesian product - the contents of one "from" can depend on the current element of earlier ones, e.g.
from file in files
from line in ReadLines(file)
...
My question now is: is the first one slower than the second one? Does it build a Cartesian product and then filter it with the where clauses?
If the collections are in memory, then yes. There is no query optimizer for LinqToObjects - it simply does what the programmer asks in the order that is asked.
If the collections are in a database (which I suspect, given the myDB variable), then no. The query is translated into SQL and sent off to the database, where there is a query optimizer. This optimizer will generate an execution plan. Since both queries ask for the same logical result, it is reasonable to expect the same efficient plan to be generated for both. The only ways to be certain are to
inspect the execution plans
or measure the IO (SET STATISTICS IO ON).
Is there a performance difference
If you find yourself in a scenario where you have to ask, you should cultivate tools with which to measure and discover the truth for yourself. Measure - not ask.
I'm trying to order a Linq to NHibernate query by the sum of its children.
session.Linq<Parent>().OrderBy( p => p.Children.Sum( c => c.SomeNumber ) ).ToList()
This does not seem to work. When looking at NHProf, I can see that it is ordering by Parent.Id. I figured that maybe it was returning the results and ordering them outside of SQL, but if I add a .Skip( 1 ).Take( 1 ) to the Linq query, it still orders by Parent.Id.
I tried doing this with an in-memory List and it works just fine.
Am I doing something wrong, or is this an issue with Linq to NHibernate?
I'm sure I could always return the list and then do the operations on it, but that is not an ideal workaround, because I don't want to return all the records.
To order a query by an aggregate value, you need to use a group by query. In your example you need to use a 'group by' with a join.
The equivalent SQL would be something like:
select parent.id, sum(child.SomeNumber) as childsum
from dbo.Parent parent
inner join child
    on parent.id = child.parentid
group by parent.id
order by childsum desc
If only Linq2NH could do this for us... but sadly it can't. You can do it in HQL, as long as you're OK with getting the id and the sum back rather than the whole object.
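For reference, a sketch of what that HQL might look like (property names are taken from the question; I haven't run this exact query):
// Returns rows of (parent id, sum) rather than Parent entities.
var rows = session.CreateQuery(
        "select p.Id, sum(c.SomeNumber) " +
        "from Parent p join p.Children c " +
        "group by p.Id " +
        "order by sum(c.SomeNumber) desc")
    .List();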
I battled with Linq2NH for months before I abandoned it. It's slow, doesn't support the second-level cache, and only supports VERY basic queries (at least when I abandoned it 6 months ago - it may have come on leaps and bounds!). It's fine if you are doing a simple home-made application. If your application is even remotely complex, ditch it and spend a bit of time getting to know HQL: a little harder to understand, but much more powerful.
We need to produce a fairly complex dynamic query builder for retrieving reports on the fly. We're scratching our heads a little on what sort of data structure would be best.
It's really nothing more than holding a list of selectParts, a list of fromParts, a list of where criteria, order by, group by, that sort of thing, for persistence. When we start thinking about joins, especially outer joins, having clauses, and aggregate functions, things start getting a little fuzzy.
We're building it up interfaces-first for now and trying to think as far ahead as we can, but we'll definitely go through a series of refactorings as we discover limitations in our structures.
I'm posting this question here in the hopes that someone has already come up with something that we can base it on. Or know of some library or some such. It would be nice to get some tips or heads-up on potential issues before we dive into implementations next week.
I've done something similar a couple of times in the past. A couple of the bigger things spring to mind.
The where clause is the hardest to get right. If you divide things up into what I would call "expressions" and "predicates" it makes it easier.
Expressions - column references, parameters, literals, functions, aggregates (count/sum)
Predicates - comparisons, like, between, in, is null (predicates have expressions as children, e.g. expr1 = expr2). You also have composites such as and/or/not.
The whole where clause, as you can imagine, is a tree with a predicate at the root and possibly sub-predicates underneath, eventually terminating in expressions at the leaves.
To construct the HQL you walk the model (depth first usually). I used a visitor as I need to walk my models for other reasons, but if you don't have multiple purposes you can build the rendering code right into the model.
e.g. if you had
"where upper(column1) = :param1 AND ( column2 is null OR column3 between :param2 and :param3 )"
Then the tree is
Root
- AND
  - Equal
    - Function(upper)
      - ColumnReference(column1)
    - Parameter(param1)
  - OR
    - IsNull
      - ColumnReference(column2)
    - Between
      - ColumnReference(column3)
      - Parameter(param2)
      - Parameter(param3)
Then you walk the tree depth first and merge rendered bits of HQL on the way back up. The upper function for example would expect one piece of child HQL to be rendered and it would then generate
"upper( " + childHql + " )"
and pass that up to its parent. Something like Between expects three child HQL pieces.
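To make that concrete, here's a minimal sketch of what such a model might look like (class and member names are mine, not from any library; it needs using System.Collections.Generic and System.Linq):
// Minimal where-clause model: every node renders its own fragment of HQL.
abstract class Node
{
    public abstract string Render();
}

class ColumnReference : Node
{
    public string Name;
    public override string Render() => Name;
}

class Parameter : Node
{
    public string Name;
    public override string Render() => ":" + Name;
}

class Function : Node
{
    public string Name;
    public Node Argument;
    public override string Render() => Name + "( " + Argument.Render() + " )";
}

class Comparison : Node
{
    public string Operator;   // "=", "<", "like", ...
    public Node Left, Right;
    public override string Render() => Left.Render() + " " + Operator + " " + Right.Render();
}

class Composite : Node
{
    public string Operator;   // "and" / "or"
    public List<Node> Children = new List<Node>();
    public override string Render() =>
        "( " + string.Join(" " + Operator + " ", Children.Select(c => c.Render())) + " )";
}
With that in place, building new Comparison { Operator = "=", Left = new Function { Name = "upper", Argument = new ColumnReference { Name = "column1" } }, Right = new Parameter { Name = "param1" } } and calling Render() gives you the "upper( column1 ) = :param1" leg of the example above. A visitor can replace the Render methods if you need to walk the model for other purposes too.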
You can then re-use the expression model in the select/group by/order by clauses.
You can skip storing the group by, if you wish, by storing only the select and scanning it for aggregates before query construction. If there are one or more, just copy all the non-aggregate select expressions into the group by.
The from clause is just a table reference plus zero or more join clauses. Each join clause has a type (inner/left/right) and a table reference. A table reference is a table name plus an optional alias.
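A similarly rough sketch of that shape (again, the names are mine):
enum JoinType { Inner, Left, Right }

class TableReference
{
    public string TableName;
    public string Alias;          // optional
}

class JoinClause
{
    public JoinType Type;         // inner/left/right
    public TableReference Table;
}

class FromClause
{
    public TableReference Root;
    public List<JoinClause> Joins = new List<JoinClause>();
}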
Plus, if you ever get into wanting to parse a query language (or anything, really), I can highly recommend ANTLR. The learning curve is quite steep, but there are plenty of example grammars to look at.
HTH.
If you need an EJB-QL parser and data structures, EclipseLink (well, several of its internal classes) has a good one:
JPQLParseTree tree = org.eclipse.persistence.internal.jpa.parsing.jpql.JPQLParser.buildParserFor(" _the_ejb_ql_string_ ").parse();
JPQLParseTree contains all the data.
But generating EJB-QL back from a modified JPQLParseTree is something you have to do yourself.