EF - Linq Expression and using a List of Ints to get best performance

So I have a list (table) of about 100k items, and I want to retrieve all the rows that match a given list of keys.
I have something like this.
The key on the Sections table is NOT a primary key, so I'm expecting each value in listOfKeys to match a few rows.
List<int> listOfKeys = new List<int>(){1,3,44};
var allSections = Sections.Where(s => listOfKeys.Contains(s.id));
I don't know if it makes a difference, but generally listOfKeys will only have between 1 and 3 items.
I'm using the Entity Framework.
So my question is, is this the best / fastest way to include a list in a LINQ expression?
I'm assuming that it isn't better to use another .NET ICollection data object. Should I be using a Union or something?
Thanks

Assuming listOfKeys contains only a small number of items and is a local list (not from the database), say fewer than 50, then it's OK. The generated query will basically be WHERE id IN (...) or WHERE id = ... OR id = ... ..., and that's fine for the database engine to handle.
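For illustration, a minimal sketch of the whole round trip using the names from the question; the SQL in the comment is only an assumption of the rough shape EF generates, and it varies by EF version:
List<int> listOfKeys = new List<int> { 1, 3, 44 };

// Translates to something like: SELECT ... FROM Sections WHERE id IN (1, 3, 44)
var allSections = Sections
    .Where(s => listOfKeys.Contains(s.id))
    .ToList(); // executes a single query against the database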

A Join would probably be more efficient:
var allSections =
from s in Sections
join k in listOfKeys on s.id equals k
select s;
Or, if you prefer the extension method syntax:
var allSections = Sections.Join(listOfKeys, s => s.id, k => k, (s, k) => s);

Related

Combining LINQ Queries to reduce database calls

I have 2 queries that work; I was hoping to combine them to reduce the database calls.
var locations = from l in db.Locations
where l.LocationID.Equals(TagID)
select l;
I do the above because I need l.Name, but is there a way to take the above results and put them into the query below?
articles = from a in db.Articles
           where
           (
               from l in a.Locations
               where l.LocationID.Equals(TagID)
               select l
           ).Any()
           select a;
Will I actually be reducing any database calls here?
This seems a bit complicated because Locations appears to be a multi-valued property of Articles, and you only want to load the matching one. According to this answer to a similar question, you need to use a select to return them separately in one go, e.g.
var articles = from a in db.Articles
select new {
Article = a,
Location = a.Locations.Where(l => l.LocationId == TagId)
};
First failed attempt using join:
var articlesAndLocations = from a in db.Articles
join l in a.Locations
on l.LocationID equals TagID
select new { Article = a, Location = l };
(I usually use the other LINQ syntax though so apologies if I've done something stupid there.)
Could you not use the Include() method here to pull in the locations which are associated with each article, then select both the article and the location objects? Or the properties you need from each.
The Include method will ensure that you don't need to dip into the DB twice, while still letting you access properties on related entities.
You would need to use a Contains method over an IEnumerable, I believe; something like this:
var tagIdList = new List<int> { TagID };
var articles = from a in db.Articles.Include("Locations")
               where a.Locations.Any(l => tagIdList.Contains(l.LocationID))
               select new { Article = a, a.Locations };
(Untested)

How to compare two tables in linq to sql?

I am trying to compare two tables (i.e. values, count, etc.) in LINQ to SQL, but I can't work out how to achieve it. I tried the following,
Table1.Any(i => i.itemNo == Table2.itemNo)
It gives an error. Could you please help me?
Thanks in Advance.
How about:
var isDifferent =
    Table1.Zip(Table2, (j, k) => j.itemNo == k.itemNo).Any(m => !m);
EDIT
If LINQ to SQL does not support Zip, pull the tables into memory first:
var one = Table1.ToList();
var two = Table2.ToList();
var isDifferent =
one.Zip(two, (j, k) => j.itemNo == k.itemNo).Any(m => !m);
If the tables are very large this could cause performance problems. In that case you will need a much more sophisticated solution; if so, please ask.
EDIT2
If the tables are very large you don't want to fetch all the data from the server and hold it in memory. Additionally, LINQ and SQL Server do not guarantee the order of the rows unless you specify an order in the query. This becomes especially relevant for large result sets returned by a multi-processor server, where the effects of parallelism are likely to come into play.
I suggest that LINQ to SQL doesn't really cater well for your scenario, so you will have to help it out using ExecuteQuery, something like this:
string zipQuery =
    @"SELECT TOP 1 1
      FROM [Table1] [one]
      WHERE NOT EXISTS (
          SELECT * FROM [Table2] [two] WHERE [two].[itemNo] = [one].[itemNo]
      )
      UNION ALL
      SELECT TOP 1 1
      FROM [Table2] [two]
      WHERE NOT EXISTS (
          SELECT * FROM [Table1] [one] WHERE [one].[itemNo] = [two].[itemNo]
      )
      UNION ALL
      SELECT 0";
var isDifferent = context.ExecuteQuery<int>(zipQuery).Max() == 1;
This will do the select on the server without returning lots of data to the client, but I think you will agree it is much more complicated.
EDIT3
Okay, the zip approach should be fine for 1000 rows. I've read your comment and I suggest changing the code accordingly.
var one = Table1.ToList();
var two = Table2.ToList();
var isDifferent =
one.Count != two.Count ||
one.Zip(two, (o, t) => o.itemNo == t.itemNo).Any(m => !m);
You should probably consider putting an order by on the list retrievals, like this:
var one = Table1.OrderBy(o => o.itemNo).ToList();
Strictly, the results of a LINQ to SQL query come back in any order unless an order is specified.
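Putting the pieces together, a minimal sketch of the whole comparison, assuming both tables expose an int itemNo property:
var one = Table1.OrderBy(x => x.itemNo).ToList();
var two = Table2.OrderBy(x => x.itemNo).ToList();

// Different row counts, or any pair whose itemNo values differ, means the tables differ.
var isDifferent =
    one.Count != two.Count ||
    one.Zip(two, (o, t) => o.itemNo == t.itemNo).Any(equal => !equal);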

Finding strings that are not in DB already

I have some bad performance issues in my application. One of the big operations is comparing strings.
I download a list of strings, approximately 1000 - 10000. These are all unique strings.
Then I need to check whether these strings already exist in the database.
The linq query that I'm using looks like this:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var selection = from a in allNewStrings
where !(from o in context.Items
select o.TheUniqueString).Contains(a)
select a;
Am I doing something wrong, or how could I make this process faster, preferably with LINQ?
Thanks.
Your query runs the same unique-strings subquery once for every element in allNewStrings (1,000 to 10,000 times), so it's extremely inefficient.
Query the unique strings separately so that it is executed only once:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var uniqueStrings = from o in context.Items
select o.TheUniqueString;
var selection = from a in allNewStrings
where !uniqueStrings.Contains(a)
select a;
Now you can see that the last query could be written using Except, which is more efficient for set operations like this:
var selection = allNewStrings.Except(uniqueStrings);
An alternative solution would be to use a HashSet:
var set = new HashSet<string>(DownloadAllStrings());
set.ExceptWith(context.Items.Select(s => s.TheUniqueString));
The set will now contain the strings that are not in the DB.

How to optimize this linq to objects query?

I'm matching some in-memory lists of entities with a .Contains (subselect) query to filter out old users from new ones.
Checking for performance problems, I saw this:
The oldList mostly has around 1000 users in it, while the new list varies from 100 to 500. Is there a way to optimize this query?
Absolutely - build a set instead of checking a list each time:
// Change string to whatever the type of UserID is.
var oldUserSet = new HashSet<string>(oldList.Select(o => o.UserID));
var newUsers = NewList.Where(n => !oldUserSet.Contains(n.UserID))
.ToList();
The containment check on a HashSet should be O(1) assuming few hash collisions, instead of the O(N) of checking each against the whole sequence (for each new user).
You could make a HashSet<T> of your user IDs in advance. This will cause the Contains to become an O(1) operation:
var oldSet = new HashSet<int>(oldList.Select(o => o.UserID));
var newUsers = NewList.Where(n => !oldSet.Contains(n.UserID)).ToList();
While those HashSet<T> answers are straightforward and simple, some may prefer a LINQ-centric solution.
LINQ to Objects implements Join and GroupJoin with a hash-based lookup. Just use one of those - this example uses GroupJoin:
List<User> newUsers =
(
from n in NewList
join o in oldList on n.UserId equals o.UserId into oldGroup
where !oldGroup.Any()
select n
).ToList();
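If you prefer the extension method syntax, the same anti-join can be written with GroupJoin directly; this is just a sketch, assuming UserId is the key property on both types:
List<User> newUsers = NewList
    .GroupJoin(oldList,
               n => n.UserId,
               o => o.UserId,
               (n, oldGroup) => new { n, oldGroup })
    .Where(x => !x.oldGroup.Any())
    .Select(x => x.n)
    .ToList();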

LINQ subquery question

Can anybody tell me how I would get the records in the first statement that are not in the second statement (see below)?
from or in TblOrganisations
where or.OrgType == 2
select or.PkOrgID
Second query:
from o in TblOrganisations
join m in LuMetricSites
on o.PkOrgID equals m.FkSiteID
orderby m.SiteOrder
select o.PkOrgID
If you only need the IDs then Except should do the trick:
var inFirstButNotInSecond = first.Except(second);
Note that Except treats the two sequences as sets. This means that any duplicate elements in first won't be included in the results. I suspect that this won't be a problem since the name PkOrgID suggests a unique ID of some kind.
(See the documentation for Enumerable.Except and Queryable.Except for more info.)
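To illustrate the set semantics with LINQ to Objects (hypothetical values):
var first = new[] { 1, 1, 2, 3 };
var second = new[] { 2 };
var result = first.Except(second); // 1 and 3 - the duplicate 1 appears only once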
Do you need the whole records, or just the IDs? The IDs are easy...
var ids = firstQuery.Except(secondQuery);
EDIT: Okay, if you can't do that, you'll need something like:
var secondQuery = ...; // As you've already got it
var query = from or in TblOrganisations
where or.OrgType == 2
where !secondQuery.Contains(or.PkOrgID)
select ...;
Check the SQL it produces, but I think it should do the right thing. Note that there's no point in performing any ordering in the second query - or even the join against TblOrganisations. In other words, you could use:
var query = from or in TblOrganisations
where or.OrgType == 2
where !LuMetricSites.Select(m => m.FkSiteID).Contains(or.PkOrgID)
select ...;
Use Except:
var filtered = first.Except(second);
