LINQ to SQL deferred execution and materialization - linq

I am sort of new to this. Curious what happens in the following situation?
var q = //MY LINQ TO SQL QUERY.Select(...)
.........
.........
var c = q.Count();
.........
.........
var x = q.Where(....).Select(....);
var y = x.ToList();//or something such that forces materialization
var d = q.Count();//same as c
var e = x.Count();
var f = y.Count();
How many round trips to the database actually happen here? One at Count() and another at Where()? Or does LINQ retain what it materialized during Count()?
Or does it depend on what the Where(..) contains, e.g. whether it references the database again or only what was obtained as part of 'q' (or other .NET collections)?
Edit:
Updated my code with a couple of other scenarios. Please correct my answers below:
q -no db trip
c -yes, but translates to aggregate qry - select Count(*) and not a result set (as per answer below)
x -no db trip. No matter what is written in the Where(..)
y - yes
d - yes - does not *reuse* c
e - yes - select count(*) EVEN THOUGH x was already materialized during y
f - no db trip

When you call Count it does not materialize the entire data set. Rather it prepares and executes a query like
SELECT COUNT(*) FROM ...
Using ExecuteScalar to get the result.
When you call Where and Select it does not materialize anything (assuming q is an IQueryable). Instead it's just preparing a query like
SELECT col1, col2, ... FROM ...
But it doesn't actually execute it at that point. It will only execute the query when you call GetEnumerator on q. You'll rarely do this directly, but anything like the following will cause your query to be executed:
var arry = q.ToArray();
var list = q.ToList();
foreach(var rec in q) ...
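Each of these executes the query against the database every time it runs: LINQ to SQL does not cache a query's results, so two foreach loops over q issue two database queries. If you need to iterate more than once, materialize once (e.g. with ToList()) and iterate the resulting list. Likewise, if you create a new IQueryable based on q (e.g. var q2 = q.Where(...)), it is not tied to any result set already fetched for q, so enumerating it queries the database again.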
I tested out your code in LINQPad, and it seems all your analyses are correct.
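The counting behaviour can be imitated without a database. The sketch below is only an analogy in LINQ to Objects: the local function Source() and the roundTrips counter are hypothetical stand-ins, where each enumeration of the source plays the role of one trip to the database. (Against a real LINQ to SQL query, Count() would send a SELECT COUNT(*) rather than fetch rows, but the number of trips follows the same pattern.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: every enumeration of Source() counts as one "round trip".
int roundTrips = 0;

IEnumerable<int> Source()
{
    roundTrips++;                       // runs when enumeration starts, not when Source() is called
    for (int i = 1; i <= 10; i++)
        yield return i;
}

var q = Source().Select(n => n * 2);    // deferred: nothing has run yet
int afterDefinition = roundTrips;       // 0

var c = q.Count();                      // trip 1
var x = q.Where(n => n > 10);           // deferred: still 1
var y = x.ToList();                     // trip 2 (materializes x)
var d = q.Count();                      // trip 3 (does not reuse c)
var e = x.Count();                      // trip 4 (does not reuse y)
var f = y.Count();                      // no trip: y is an in-memory list

Console.WriteLine($"c={c} d={d} e={e} f={f} trips={roundTrips}");
// prints "c=10 d=10 e=5 f=5 trips=4"
```

This matches the q/c/x/y/d/e/f breakdown in the question: only c, y, d and e cause a trip.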

Related

Entity Framework Core + Count with Group By

I have a table which contains ~600k records and 33 columns. In my project I am using EF Core (2.0.1) to retrieve data from database. I am having issues with below code:
var theCounter = (from f in _context.tblData.Take(100000)
group f by f.TypeId into data
select new DataDto { ID = data.Key, Count = data.Count() }).ToList();
This code is part of a REST API, and when I test it from SoapUI I get a timeout error. When I tested the code with
Take(1000)
it works fine. There are around 300 unique TypeIds. Any ideas how I can make it work?
-- EDIT 1:
Here is what I see when debugging the code:
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'Count()' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (131ms) [Parameters=[@__p_0='?'], CommandType='Text', CommandTimeout='30']
SELECT [t2].[Id], [t2].[at], [t2].[add], [t2].[AddDate], [t2].[aftc], [t2].[aftcd], [t2].[aid], [t2].[afl], [t2].[prdid], [t2].[cid], [t2].[TypeId], [t2].[env], [t2].[ext], [t2].[extddcode], [t2].[fn], [t2].[fn], [t2].[fic], [t2].[gid], [t2].[grp], [t2].[hnm], [t2].[IP], [t2].[icid], [t2].[ln], [t2].[lg], [t2].[pcid], [t2].[ret], [t2].[rts], [t2].[rnam], [t2].[sled], [t2].[seq], [t2].[sid], [t2].[styp]
FROM (
SELECT TOP(@__p_0) [t1].[Id], [t1].[at], [t1].[add], [t1].[AddDate], [t1].[aftc], [t1].[aftcd], [t1].[aid], [t1].[afl], [t1].[prdid], [t1].[cid], [t1].[TypeId], [t1].[env], [t1].[ext], [t1].[extddcode], [t1].[fn], [t1].[fn], [t1].[fic], [t1].[gid], [t1].[grp], [t1].[hnm], [t1].[IP], [t1].[icid], [t1].[ln], [t1].[lg], [t1].[pcid], [t1].[ret], [t1].[rts], [t1].[rnam], [t1].[sled], [t1].[seq], [t1].[sid], [t1].[styp]
FROM [TblData] AS [t1]
) AS [t2]
WHERE [t2].[TypeId] IS NOT NULL
ORDER BY [t2].[TypeId]
I think it is not translated properly. Any ideas why?
-- EDIT 2:
I have changed my queries to:
var query = _context.TblData
.Select(a => new {ID = a.Id, TypeId= a.TypeId})
.Distinct();
var q1 = query.GroupBy(p => p.TypeId)
.Select(g => new DataDto {TypeId= g.Key, Count = g.Count()});
return await q1.ToListAsync();
But it was translated to:
SELECT DISTINCT [a0].[Id], [a0].[TypeId] AS [TypeId]
FROM [tblData] AS [a0]
ORDER BY [a0].[TypeId]
When I checked directly in the database this query takes 14 seconds to execute. Any idea why it was not translated to something like:
SELECT DISTINCT [a0].[Id], COUNT([TypeId]) AS [TypeId]
FROM [tblData] AS [a0]
GROUP BY COUNT([a0].[Id])
ORDER BY [a0].[TypeId]
I had to upgrade EF Core to version 2.1; with that version the LINQ query is translated properly into SQL.
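For reference, the query shape in question is a GroupBy followed directly by a projection of the key and an aggregate. The sketch below runs that shape in memory with LINQ to Objects; the rows array is a hypothetical stand-in for _context.TblData, and the assumption is that, from EF Core 2.1 on, the same shape written against the DbSet is what gets translated to a SQL GROUP BY instead of being evaluated locally.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for _context.TblData (only the two relevant columns).
var rows = new[]
{
    new { Id = 1, TypeId = 7 },
    new { Id = 2, TypeId = 7 },
    new { Id = 3, TypeId = 9 },
    new { Id = 4, TypeId = 7 },
    new { Id = 5, TypeId = 9 },
    new { Id = 6, TypeId = 4 },
};

// Group by the key and project only the key plus an aggregate.
var counts = rows
    .GroupBy(r => r.TypeId)
    .Select(g => new { TypeId = g.Key, Count = g.Count() })
    .OrderBy(c => c.TypeId)
    .ToList();

Console.WriteLine(string.Join(", ", counts.Select(c => $"{c.TypeId}:{c.Count}")));
// prints "4:1, 7:3, 9:2"
```

Checking the generated SQL in the log, as done in EDIT 1, is still the way to confirm that no "evaluated locally" warning appears for the real query.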

converting sql to linq woes

At my job our main application was written long ago before n-tier was really a thing, ergo - it has tons and tons of business logic being handled in stored procs and such.
So we have finally decided to bite the bullet and make it not suck so bad. I have been tasked with converting a 900+ line sql script to a .NET exe, which I am doing in C#/Linq. Problem is...for the last 5-6 years at another job, I had been doing Linq exclusively, so my SQL has gotten somewhat rusty, and some of the things I am converting I have never tried to do before in Linq, so I'm hitting some roadblocks.
Anyway, enough whining.
I'm having trouble with the following sql statement, I think due to the fact that he is joining on a temp table and a derived table. Here's the SQL:
insert into #processedBatchesPurgeList
select d.pricebatchdetailid
from pricebatchheader h (nolock)
join pricebatchstatus pbs (nolock) on h.pricebatchstatusid = pbs.pricebatchstatusid
join pricebatchdetail d (nolock) on h.pricebatchheaderid = d.pricebatchheaderid
join
( -- Grab most recent REG.
select
item_key
,store_no
,pricebatchdetailid = max(pricebatchdetailid)
from pricebatchdetail _pbd (nolock)
join pricechgtype pct (nolock) on _pbd.pricechgtypeid = pct.pricechgtypeid
where
lower(rtrim(ltrim(pct.pricechgtypedesc))) = 'reg'
and expired = 0
group by item_key, store_no
) dreg
on d.item_key = dreg.item_key
and d.store_no = dreg.store_no
where
d.pricebatchdetailid < dreg.pricebatchdetailid -- Make sure PBD is not most recent REG.
and h.processeddate < @processedBatchesPurgeDateLimit
and lower(rtrim(ltrim(pbs.pricebatchstatusdesc))) = 'processed' -- Pushed/processed batches only.
So that's raising an overall question first: how to handle temp tables in Linq? This script uses about 10 of them. I currently have them as List. The problem is, if I try to .Join() on one in a query, I get the "Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator." error.
I was able to get the join to the derived table to work using 2 queries, just so a single one wouldn't get nightmarishly long:
var dreg = (from _pbd in db.PriceBatchDetails.Where(pbd => pbd.Expired == false && pbd.PriceChgType.PriceChgTypeDesc.ToLower().Trim() == "reg")
group _pbd by new { _pbd.Item_Key, _pbd.Store_No } into _pbds
select new
{
Item_Key = _pbds.Key.Item_Key,
Store_No = _pbds.Key.Store_No,
PriceBatchDetailID = _pbds.Max(pbdet => pbdet.PriceBatchDetailID)
});
var query = (from h in db.PriceBatchHeaders.Where(pbh => pbh.ProcessedDate < processedBatchesPurgeDateLimit)
join pbs in db.PriceBatchStatus on h.PriceBatchStatusID equals pbs.PriceBatchStatusID
join d in db.PriceBatchDetails on h.PriceBatchHeaderID equals d.PriceBatchHeaderID
join dr in dreg on new { d.Item_Key, d.Store_No } equals new { dr.Item_Key, dr.Store_No }
where d.PriceBatchDetailID < dr.PriceBatchDetailID
&& pbs.PriceBatchStatusDesc.ToLower().Trim() == "processed"
select d.PriceBatchDetailID);
So that query gives the expected results, which I am holding in a List, but then I need to join the results of that query to another one selected from the database, which is leading me back to the aforementioned "Local sequence cannot be used..." error.
That query is this:
insert into #pbhArchiveFullListSaved
select h.pricebatchheaderid
from pricebatchheader h (nolock)
join pricebatchdetail d (nolock)
on h.pricebatchheaderid = d.pricebatchheaderid
join #processedBatchesPurgeList dlist
on d.pricebatchdetailid = dlist.pricebatchdetailid -- PBH list is restricted to PBD purge list rows that have PBH references.
group by h.pricebatchheaderid
The join there on #processedBatchesPurgeList is the problem I am running into.
So uh...help? I have never written SQL like this, and certainly never tried to convert it to Linq.
As pointed out by the comments above, this is no longer being rewritten as Linq.
Was hoping to get a performance improvement along with achieving better SOX compliance, which was the whole reason for the rewrite in the first place.
I'm happy with just satisfying the SOX compliance issues.
Thanks, everyone.
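For anyone hitting the same "Local sequence cannot be used..." error: the message itself names the one operator that does accept a local list, Contains. The sketch below shows that shape with in-memory data only; details and purgeList are hypothetical stand-ins for db.PriceBatchDetails and the list holding #processedBatchesPurgeList. Against a real DataContext a Contains over a local list is sent as an IN (...) clause with one parameter per element, so it only suits lists that stay well below SQL Server's limit of about 2100 parameters. For larger sets, leaving the first query as an un-materialized IQueryable and joining to it is the usual alternative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for db.PriceBatchDetails.
var details = new[]
{
    new { PriceBatchHeaderID = 1, PriceBatchDetailID = 10 },
    new { PriceBatchHeaderID = 1, PriceBatchDetailID = 11 },
    new { PriceBatchHeaderID = 2, PriceBatchDetailID = 20 },
    new { PriceBatchHeaderID = 3, PriceBatchDetailID = 30 },
    new { PriceBatchHeaderID = 3, PriceBatchDetailID = 31 },
};

// Local list playing the role of #processedBatchesPurgeList.
var purgeList = new List<int> { 10, 11, 30 };

// Filter with Contains instead of joining on the local list;
// Distinct() stands in for the SQL "group by h.pricebatchheaderid".
var headerIds = details
    .Where(d => purgeList.Contains(d.PriceBatchDetailID))
    .Select(d => d.PriceBatchHeaderID)
    .Distinct()
    .OrderBy(id => id)
    .ToList();

Console.WriteLine(string.Join(", ", headerIds));
// prints "1, 3"
```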

Finding strings that are not in DB already

I have some bad performance issues in my application. One of the big operations is comparing strings.
I download a list of strings, approximately 1000 - 10000. These are all unique strings.
Then I need to check if these strings already exist in the database.
The linq query that I'm using looks like this:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var selection = from a in allNewStrings
where !(from o in context.Items
select o.TheUniqueString).Contains(a)
select a;
Am I doing something wrong or how could I make this process faster preferably with Linq?
Thanks.
Your query runs the database subquery once for every element in allNewStrings, i.e. 1000 - 10000 times, so it's extremely inefficient.
Try fetching the existing strings separately and materializing them, so that the database query is executed only once:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var uniqueStrings = (from o in context.Items
select o.TheUniqueString).ToList();
var selection = from a in allNewStrings
where !uniqueStrings.Contains(a)
select a;
Now you can see that the last query could be written using Except, which is more efficient for set operations like the one in your example:
var selection = allNewStrings.Except(uniqueStrings);
An alternative solution would be to use a HashSet:
var set = new HashSet<string>(DownloadAllStrings());
set.ExceptWith(context.Items.Select(s => s.TheUniqueString));
The set will now contain the strings that are not in the DB.
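A minimal runnable sketch of the HashSet approach, using in-memory arrays as hypothetical stand-ins for DownloadAllStrings() and for context.Items.Select(s => s.TheUniqueString):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the downloaded strings and the strings already in the DB.
IEnumerable<string> downloaded = new[] { "alpha", "beta", "gamma", "delta" };
IEnumerable<string> alreadyInDb = new[] { "beta", "delta", "omega" };

// Build the set once, then remove everything the database already has.
// ExceptWith enumerates its argument a single time.
var missing = new HashSet<string>(downloaded);
missing.ExceptWith(alreadyInDb);

// HashSet does not guarantee an order, so sort before printing.
var ordered = missing.OrderBy(s => s, StringComparer.Ordinal).ToList();
Console.WriteLine(string.Join(", ", ordered));
// prints "alpha, gamma"
```

Note that HashSet<string> compares strings ordinally and case-sensitively by default. If the database column uses a case-insensitive collation, passing a comparer such as StringComparer.OrdinalIgnoreCase to the HashSet constructor may be closer to what the database considers a duplicate.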

Wait for DomainContext.Load<t> from an entityquery with joins to complete (returning new type via 'select new')

My app consolidates data from other DBs for reporting purposes. We can't link the databases, so all the data processing has to be done in code - this is fine as we want to allow manual validation during the imports.
Certain users will be able to start an update through the Silverlight 4 front end.
I have 3 tables in database x that are fed from one EF4 Model (ModelX). I want to join those tables together, select specific columns and return the result as a new entity that exists in a different EF4 Model (ModelY). I'm using this query:
var myQuery = from i in DBx.table1 from it in DBx.table2 from h in DBx.table3 where (i.id==it.id && h.otherid == i.otherid) select new ModelYServer {Name = i.name,Thing = it.thing, Stuff = h.stuff};
The bit I'm stuck on is how to execute that query and wait until the asynchronous call has completed. Normally, I'd use:
DomainContext.Load<T>(myQuery).Completed += (sender,args) =>
{List<T> myList = ((LoadOperation<T>)sender).Entities.ToList();};
but I can't pass myQuery (an IEnumerable) into the DomainContext.Load() as that expects an EntityQuery. The dataset is very large, and is taking up to 30 seconds to return, so I definitely need to wait before continuing.
So can anyone tell me how I can wait for the IEnumerable query to complete, or suggest a better way of doing this (there very likely is one).
Thanks
Mick
One simple way is just to force it to evaluate by calling ToList:
var query = from i in DBx.table1
join it in DBx.table2 on i.id equals it.id
join h in DBx.table3 on i.otherid equals h.otherid
select new ModelYServer {
Name = i.name,
Thing = it.thing,
Stuff = h.stuff
};
// This will block until the results have been fetched
var results = query.ToList();
// Now use results...
(I've changed your where clause into joins on the earlier tables, as that's what you were effectively doing and this is more idiomatic, IMO.)

LINQ subquery question

Can anybody tell me how I would get the records in the first statement that are not in the second statement (see below)?
from or in TblOrganisations
where or.OrgType == 2
select or.PkOrgID
Second query:
from o in TblOrganisations
join m in LuMetricSites
on o.PkOrgID equals m.FkSiteID
orderby m.SiteOrder
select o.PkOrgID
If you only need the IDs then Except should do the trick:
var inFirstButNotInSecond = first.Except(second);
Note that Except treats the two sequences as sets. This means that any duplicate elements in first won't be included in the results. I suspect that this won't be a problem since the name PkOrgID suggests a unique ID of some kind.
(See the documentation for Enumerable.Except and Queryable.Except for more info.)
Do you need the whole records, or just the IDs? The IDs are easy...
var ids = firstQuery.Except(secondQuery);
EDIT: Okay, if you can't do that, you'll need something like:
var secondQuery = ...; // As you've already got it
var query = from or in TblOrganisations
where or.OrgType == 2
where !secondQuery.Contains(or.PkOrgID)
select ...;
Check the SQL it produces, but I think it should do the right thing. Note that there's no point in performing any ordering in the second query - or even the join against TblOrganisations. In other words, you could use:
var query = from or in TblOrganisations
where or.OrgType == 2
where !LuMetricSites.Select(m => m.FkSiteID).Contains(or.PkOrgID)
select ...;
Use Except:
var filtered = first.Except(second);
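The difference between the Except and the !Contains variants only shows up when the first sequence has duplicates. A small in-memory sketch, where the arrays first and second are hypothetical stand-ins for the results of the two queries:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the IDs returned by the first and second query.
var first = new[] { 1, 2, 2, 3, 4 };
var second = new[] { 3, 4, 5 };

// Except treats both sequences as sets: the duplicate 2 is collapsed.
var viaExcept = first.Except(second).ToList();

// A Where/!Contains filter keeps every matching element, duplicates included.
var viaContains = first.Where(id => !second.Contains(id)).ToList();

Console.WriteLine(string.Join(", ", viaExcept));    // prints "1, 2"
Console.WriteLine(string.Join(", ", viaContains));  // prints "1, 2, 2"
```

With a unique key such as PkOrgID the two give the same result, so the choice mostly comes down to whether you need only the IDs (Except) or the whole records (Contains filter).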
