How do I improve the performance of this simple LINQ?

I have two tables, one parent "Point" and one child "PointValue", connected by a single foreign key "PointID", making a one-to-many relation in SQL Server 2005.
I have a LINQ query:
var points = from p in ContextDB.Points
//join v in ContextDB.PointValues on p.PointID equals v.PointID
where p.InstanceID == instanceId
orderby p.PointInTime descending
select new
{
Point = p,
Values = p.PointValues.Take(16).ToList()
};
As you can see from the commented out join and the "Values" assignment, the "Point" table has a relation to "PointValue" (called "Points" and "PointValues" by LINQ).
When iterating through the "var points" IQueryable (say, when binding it to a GridView, etc.) the initial query is very fast, however iterating through the "Values" property is very slow. SQL Profiler shows me that for each value in the "points" IQueryable another query is executed.
How do I get this to be one query?
Interestingly, the initial query becomes very slow when the join is uncommented.

I think you want to use the DataLoadOptions.LoadWith method, described here:
http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.loadwith.aspx
In your case you would do something like the following, when creating your DataContext:
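// Configure the options before assigning them: DataLoadOptions cannot be
// modified once it has been attached to a DataContext.
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Point>(p => p.PointValues);
ContextDB.LoadOptions = options;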

You should make sure that the PointValues table has an index on the PointID column.
See also this SO question: Does Foreign Key improve query performance?

Related

How do I get unique Linq-to-Entities results when there is no primary key?

I have an entity model of an Oracle data source over which I have no control. Using this model I query a particular view to get all volume price breaks for a given product:
As you can see, this view has no primary key. Here is the LINQ I am using:
var db = GetOracleDataContext();
var result = db.ITEMPRICEBREAKS_V.Where(p => p.STOCKNO == stockId).ToList();
This works to some degree, but instead of returning four distinct records with their own quantities and pricing, it returns four identical records, each with the pricing and quantities of the first record ($4800, 0, 2).
I have no control over this view. Is there another way I can structure my LINQ query so that I can get the four distinct values?
Select only the fields you care about and use Distinct(). For example:
var result = db.ITEMPRICEBREAKS_V
.Where(p => p.STOCKNO == stockId)
.Select(p => p.Price)
.Distinct()
.ToList();
However, I'd strongly recommend, along with the other commenters, that you get a primary key involved.
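If you need more than one column, the same idea works with a projection to an anonymous type, because anonymous types compare by value. Below is a minimal in-memory sketch of that, not the asker's actual model: the PriceBreak class and its MinQty, MaxQty and Price members are hypothetical names, since the real view's columns are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctProjectionDemo
{
    // Hypothetical shape of a price-break row; the real view's columns are not shown.
    public class PriceBreak
    {
        public string StockNo { get; set; }
        public int MinQty { get; set; }
        public int MaxQty { get; set; }
        public decimal Price { get; set; }
    }

    // Projects to an anonymous type (compared member by member), removes
    // duplicates, then copies the survivors into tuples so they can be returned.
    public static List<Tuple<int, int, decimal>> DistinctBreaks(
        IEnumerable<PriceBreak> rows, string stockId)
    {
        return rows
            .Where(p => p.StockNo == stockId)
            .Select(p => new { p.MinQty, p.MaxQty, p.Price })
            .Distinct()
            .Select(b => Tuple.Create(b.MinQty, b.MaxQty, b.Price))
            .ToList();
    }

    public static void Main()
    {
        var rows = new[]
        {
            new PriceBreak { StockNo = "A1", MinQty = 0, MaxQty = 2, Price = 4800m },
            new PriceBreak { StockNo = "A1", MinQty = 3, MaxQty = 5, Price = 4500m },
            new PriceBreak { StockNo = "A1", MinQty = 3, MaxQty = 5, Price = 4500m }, // duplicate
            new PriceBreak { StockNo = "B2", MinQty = 0, MaxQty = 2, Price = 100m },
        };

        foreach (var b in DistinctBreaks(rows, "A1"))
        {
            Console.WriteLine(b);
        }
    }
}
```

Against a real provider the projection is translated to SQL rather than run in memory, so check the generated query before relying on it.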

Is it possible to detect if the selected item is the first in LINQ-to-SQL?

I wonder how I can build a query expression which understands the given item being selected is the first or not. Say I'm selecting 10 items from DB:
var query = db.Table.Take(10).Select(t => IsFirst ? t.Value1 : t.Value2);
There is an indexed variant of Select but that is not supported in LINQ-to-SQL. If it was supported my problems would be solved. Is there any other trick?
I could have used ROW_NUMBER() on T-SQL for instance, which LINQ-to-SQL uses but does not give access to.
I know I can Concat two queries and use the first expression in the first and so forth but I don't want to manipulate the rest of the query, just the select statement itself because the query is built at multiple places and this is where I want to behave differently on first row. I'll consider other options if that is not possible.
You can use the indexed overload, but you need to use the LINQ to Objects version:
var query =
db.Table.Take(10).AsEnumerable()
.Select((t, index) => index == 0 ? t.Value1 : t.Value2);
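To see the indexed overload on its own, here is a small sketch that runs entirely in LINQ to Objects. The Row class and the sample values are made up for illustration; in the real query the sequence would come from db.Table after AsEnumerable().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedSelectDemo
{
    // Hypothetical row type standing in for the Table entity in the question.
    public class Row
    {
        public string Value1 { get; set; }
        public string Value2 { get; set; }
    }

    // Returns Value1 for the first row and Value2 for every later row.
    public static List<string> PickValues(IEnumerable<Row> rows)
    {
        return rows
            .Take(10)
            .Select((t, index) => index == 0 ? t.Value1 : t.Value2)
            .ToList();
    }

    public static void Main()
    {
        var rows = new[]
        {
            new Row { Value1 = "a1", Value2 = "a2" },
            new Row { Value1 = "b1", Value2 = "b2" },
            new Row { Value1 = "c1", Value2 = "c2" },
        };

        Console.WriteLine(string.Join(", ", PickValues(rows))); // prints "a1, b2, c2"
    }
}
```

Everything after AsEnumerable() runs on the client, so the Take(10) should stay in front of it to keep the row limit in the SQL.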
If Table has a primary key, you could do this:
var result = (
    from t in db.Table.Take(10)
    let first = db.Table.Take(10).Select(ta => ta.PrimaryKey).First()
    select new
    {
        Value = (t.PrimaryKey == first ? t.Value1 : t.Value2)
    }
);

When to prefer joins expressed with SelectMany() over joins expressed with the join keyword in Linq

LINQ lets you express inner joins either with the join keyword or with
SelectMany() (i.e. a couple of from clauses) plus a where clause:
var personsToState = from person in persons
join state in statesOfUS
on person.State equals state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState)
{
System.Diagnostics.Debug.WriteLine(item);
}
// The same query can be expressed with the query operator SelectMany(), which is
// expressed as two from clauses and a single where clause connecting the sequences.
var personsToState2 = from person in persons
from state in statesOfUS
where person.State == state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState2)
{
System.Diagnostics.Debug.WriteLine(item);
}
My question: when does it make sense to use the join style and when the where style?
Does one style have performance advantages over the other?
For local queries Join is more efficient due to its keyed lookup as Athari mentioned, however for LINQ to SQL (L2S) you'll get more mileage out of SelectMany. In L2S a SelectMany ultimately uses some type of SQL join in the generated SQL depending on your query.
Take a look at questions 11 & 12 of the LINQ Quiz by Joseph/Ben Albahari, authors of C# 4.0 In a Nutshell. They show samples of different types of joins and they state:
With LINQ to SQL, SelectMany-based
joins are the most flexible, and can
perform both equi and non-equi joins.
Throw in DefaultIfEmpty, and you can
do left outer joins as well!
In addition, Matt Warren has a detailed blog post on this topic as it pertains to IQueryable / SQL here: LINQ: Building an IQueryable provider - Part VII.
Back to your question of which to use, you should use whichever query is more readable and allows you to easily express yourself and construct your end goal clearly. Performance shouldn't be an initial concern unless you are dealing with large collections and have profiled both approaches. In L2S you have to consider the flexibility SelectMany offers you depending on the way you need to pair up your data.
Join is more efficient: it uses the Lookup class (a variation of Dictionary that holds multiple values for a single key) to find matching values.
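For the local (LINQ to Objects) case, the difference can be sketched as follows. The Person and UsState classes and the sample data are hypothetical stand-ins for the question's sequences, and WithLookup is only an approximation of what Join does, not its actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinStylesDemo
{
    // Hypothetical element types standing in for the question's sequences.
    public class Person { public string Name { get; set; } public string State { get; set; } }
    public class UsState { public string USPS { get; set; } public string Name { get; set; } }

    // join keyword: the inner sequence is indexed by key once, then probed per person.
    public static List<string> WithJoin(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        return (from person in persons
                join state in states on person.State equals state.USPS
                select person.Name + " -> " + state.Name).ToList();
    }

    // Two from clauses: every person is compared against every state.
    public static List<string> WithSelectMany(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        return (from person in persons
                from state in states
                where person.State == state.USPS
                select person.Name + " -> " + state.Name).ToList();
    }

    // Roughly the keyed lookup spelled out by hand with ToLookup.
    public static List<string> WithLookup(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        ILookup<string, UsState> byCode = states.ToLookup(s => s.USPS);
        return persons
            .SelectMany(p => byCode[p.State], (p, s) => p.Name + " -> " + s.Name)
            .ToList();
    }

    public static void Main()
    {
        var persons = new[]
        {
            new Person { Name = "Ann", State = "WA" },
            new Person { Name = "Bob", State = "OR" },
            new Person { Name = "Cy", State = "ZZ" }, // no matching state
        };
        var states = new[]
        {
            new UsState { USPS = "WA", Name = "Washington" },
            new UsState { USPS = "OR", Name = "Oregon" },
        };

        foreach (var line in WithJoin(persons, states))
        {
            Console.WriteLine(line);
        }
    }
}
```

All three methods return the same pairs for this data. With n persons and m states, the where style performs on the order of n × m comparisons, while the keyed versions build the index once and then do one lookup per person.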

Can I force the auto-generated Linq-to-SQL classes to use an OUTER JOIN?

Let's say I have an Order table which has a FirstSalesPersonId field and a SecondSalesPersonId field. Both of these are foreign keys that reference the SalesPerson table. For any given order, either one or two salespersons may be credited with the order. In other words, FirstSalesPersonId can never be NULL, but SecondSalesPersonId can be NULL.
When I drop my Order and SalesPerson tables onto the "Linq to SQL Classes" design surface, the class builder spots the two FK relationships from the Order table to the SalesPerson table, and so the generated Order class has a SalesPerson field and a SalesPerson1 field (which I can rename to SalesPerson1 and SalesPerson2 to avoid confusion).
Because I always want to have the salesperson data available whenever I process an order, I am using DataLoadOptions.LoadWith to specify that the two salesperson fields are populated when the order instance is populated, as follows:
dataLoadOptions.LoadWith<Order>(o => o.SalesPerson1);
dataLoadOptions.LoadWith<Order>(o => o.SalesPerson2);
The problem I'm having is that Linq to SQL is using something like the following SQL to load an order:
SELECT ...
FROM Order O
INNER JOIN SalesPerson SP1 ON SP1.salesPersonId = O.firstSalesPersonId
INNER JOIN SalesPerson SP2 ON SP2.salesPersonId = O.secondSalesPersonId
This would make sense if there were always two salesperson records, but because there is sometimes no second salesperson (secondSalesPersonId is NULL), the INNER JOIN causes the query to return no records in that case.
What I effectively want here is to change the second INNER JOIN into a LEFT OUTER JOIN. Is there a way to do that through the UI for the class generator? If not, how else can I achieve this?
(Note that because I'm using the generated classes almost exclusively, I'd rather not have something tacked on the side for this one case if I can avoid it).
Edit: per my comment reply, the SecondSalesPersonId field is nullable (in the DB, and in the generated classes).
The default behaviour actually is a LEFT JOIN, assuming you've set up the model correctly.
Here's a slightly anonymized example that I just tested on one of my own databases:
class Program
{
    static void Main(string[] args)
    {
        using (TestDataContext context = new TestDataContext())
        {
            DataLoadOptions dlo = new DataLoadOptions();
            dlo.LoadWith<Place>(p => p.Address);
            context.LoadOptions = dlo;
            var places = context.Places.Where(p => p.ID >= 100 && p.ID <= 200);
            foreach (var place in places)
            {
                Console.WriteLine("{0}, {1}", place.ID, place.AddressID);
            }
        }
    }
}
This is just a simple test that prints out a list of places and their address IDs. Here is the query text that appears in the profiler:
SELECT [t0].[ID], [t0].[Name], [t0].[AddressID], ...
FROM [dbo].[Places] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[AddressID],
[t1].[StreetLine1], [t1].[StreetLine2],
[t1].[City], [t1].[Region], [t1].[Country], [t1].[PostalCode]
FROM [dbo].[Addresses] AS [t1]
) AS [t2] ON [t2].[AddressID] = [t0].[AddressID]
WHERE ([t0].[PlaceID] >= @p0) AND ([t0].[PlaceID] <= @p1)
This isn't exactly a very pretty query (your guess is as good as mine as to what that 1 as [test] is all about), but it's definitely a LEFT JOIN and doesn't exhibit the problem you seem to be having. And this is just using the generated classes; I haven't made any changes.
Note that I also tested this on a dual relationship (i.e. a single Place having two Address references, one nullable, one not), and I get the exact same results. The first (non-nullable) gets turned into an INNER JOIN, and the second gets turned into a LEFT JOIN.
It has to be something in your model, like changing the nullability of the second reference. I know you say it's configured as nullable, but maybe you need to double-check? If it's definitely nullable then I suggest you post your full schema and DBML so somebody can try to reproduce the behaviour that you're seeing.
If you make the secondSalesPersonId field in the database table nullable, LINQ-to-SQL should properly construct the Association object so that the resulting SQL statement will do the LEFT OUTER JOIN.
UPDATE:
Since the field is nullable, your problem may be in explicitly declaring dataLoadOptions.LoadWith<>(). I'm running a similar situation in my current project where I have an Order, but the order goes through multiple stages. Each stage corresponds to a separate table with data related to that stage. I simply retrieve the Order, and the appropriate data follows along, if it exists. I don't use the dataLoadOptions at all, and it does what I need it to do. For example, if the Order has a purchase order record, but no invoice record, Order.PurchaseOrder will contain the purchase order data and Order.Invoice will be null. My query looks something like this:
DC.Orders.Where(a => a.Order_ID == id).SingleOrDefault();
I try not to micromanage LINQ-to-SQL...it does 95% of what I need straight out of the box.
UPDATE 2:
I found this post that discusses the use of DefaultIfEmpty() in order to populate child entities with null if they don't exist. I tried it out with LINQPad on my database and converted that example to lambda syntax (since that's what I use):
ParentTable.GroupJoin
(
ChildTable,
p => p.ParentTable_ID,
c => c.ChildTable_ID,
(p, aggregate) => new { p = p, aggregate = aggregate }
)
.SelectMany (a => a.aggregate.DefaultIfEmpty (),
(a, c) => new
{
ParentTableEntity = a.p,
ChildTableEntity = c
}
)
From what I can figure out from this statement, the GroupJoin expression relates the parent and child tables, while the SelectMany expression aggregates the related child records. The key appears to be the use of the DefaultIfEmpty, which forces the inclusion of the parent entity record even if there are no related child records. (Thanks for compelling me to dig into this further...I think I may have found some useful stuff to help with a pretty huge report I've got on my pipeline...)
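The same GroupJoin / SelectMany / DefaultIfEmpty pattern can be tried in memory. This is only a sketch: the Parent and Child classes, their members and the sample rows are invented for the example, and it runs in LINQ to Objects rather than against a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftJoinDemo
{
    // Hypothetical parent/child types; the child points back at the parent via ParentId.
    public class Parent { public int Id { get; set; } public string Name { get; set; } }
    public class Child { public int ParentId { get; set; } public string Label { get; set; } }

    // Left outer join: every parent appears at least once, with a null child
    // (rendered here as "(none)") when no child row matches.
    public static List<string> LeftJoin(IEnumerable<Parent> parents, IEnumerable<Child> children)
    {
        return parents
            .GroupJoin(
                children,
                p => p.Id,
                c => c.ParentId,
                (p, matches) => new { p, matches })
            .SelectMany(
                a => a.matches.DefaultIfEmpty(),
                (a, c) => a.p.Name + ": " + (c == null ? "(none)" : c.Label))
            .ToList();
    }

    public static void Main()
    {
        var parents = new[]
        {
            new Parent { Id = 1, Name = "P1" },
            new Parent { Id = 2, Name = "P2" }, // has no children
        };
        var children = new[]
        {
            new Child { ParentId = 1, Label = "x" },
            new Child { ParentId = 1, Label = "y" },
        };

        foreach (var line in LeftJoin(parents, children))
        {
            Console.WriteLine(line);
        }
        // prints:
        // P1: x
        // P1: y
        // P2: (none)
    }
}
```

Without DefaultIfEmpty() the empty group for P2 would contribute nothing to SelectMany, and the result would behave like an inner join.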
UPDATE 3:
If the goal is to keep it simple, then it looks like you're going to have to reference those salesperson fields directly in your Select() expression. The reason you're having to use LoadWith<>() in the first place is because the tables are not being referenced anywhere in your query statement, so the LINQ engine won't automatically pull that information in.
As an example, given this structure:
MailingList            ListCompany
===========            ===========
List_ID (PK)           ListCompany_ID (PK)
ListCompany_ID (FK)    FullName (string)
I want to get the name of the company associated with a particular mailing list:
MailingLists.Where(a => a.List_ID == 2).Select(a => a.ListCompany.FullName)
If that association has NOT been made, meaning that the ListCompany_ID field in the MailingList table for that record is equal to null, this is the resulting SQL generated by the LINQ engine:
SELECT [t1].[FullName]
FROM [MailingLists] AS [t0]
LEFT OUTER JOIN [ListCompanies] AS [t1] ON [t1].[ListCompany_ID] = [t0].[ListCompany_ID]
WHERE [t0].[List_ID] = @p0

Insert to 2 tables in single query using LINQ

I need to insert to two tables in a single query. Is this possible to do in LINQ?
At present I am calling InsertOnSubmit() twice.
If your tables have a primary key/foreign key relationship to each other, then you also have two objects which you can link to each other:
InternetStoreDataContext db = new InternetStoreDataContext();
Category c = new Category();
c.name = "Accessories";
Product p = new Product();
p.name = "USB Mouse";
c.Products.Add(p);
//and finally
db.Categories.InsertOnSubmit(c);
db.SubmitChanges();
That adds your object and all linked objects when submitting the changes.
Note that for that to work, you must have a primary key in both tables. Otherwise LINQ doesn't offer you the linking possibility.
Here are good examples of using LINQ to SQL: http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
The database submit doesn't happen until you call SubmitChanges. There is no tangible cost associated with multiple calls to InsertOnSubmit - so why not just do that?
This will still result in two TSQL INSERT commands - it simply isn't possible to insert into two tables in a single regular INSERT command.
