EF 6.2 code first, simple query takes very long - performance

In an old DB application I'd like to start moving towards code first approach.
There are a lot of SPs, triggers, functions, etc. in the database which make things error prone.
As a starter, I'd like a proof of concept, so I created a new solution and imported the entire database (Add new item -> ADO.NET entity data model -> Code First from database).
As a simple first shot I wanted to query 1 column of 1 table. The table contains about 5k rows and the result delivers 3k strings. This takes over 90 seconds now!
Here's the code of the query:
static void Main(string[] args)
{
using (var db = new Model1())
{
var theList = db.T_MyTable.AsNoTracking()
.Where(t => t.SOME_UID != null)
.OrderBy(t => t.SOMENAME)
.Select(t => t.SOMENAME)
.ToList();
foreach (var item in theList)
{
Console.WriteLine(item);
}
Console.WriteLine("Number of names: " + theList.Count());
}
Console.ReadKey();
}
In the generated table code I added the column type "VARCHAR" to all of the string fields/column properties:
[Column(TypeName = "VARCHAR")] // this I added to all of the string properties
[StringLength(50)]
public string SOME_UID { get; set; }
I assume I'm missing an important step; I can't believe a code-first query is this slow.

I figured out that the root cause is the huge context that needs to be built, consisting of over 1000 tables/files.
How I found the problem: using the profiler I observed that the expected query hits the database after about 90 seconds, telling me that the query itself is fast. Then I tried the same code in a new project, where I only imported the single table I access in the code.
Further proof that it's context related comes from executing the query twice in the same session: the second run completed in milliseconds.
Key point: if you have a legacy database with a lot of tables, don't use one single DbContext that contains all the tables (except for initializing the database); use several smaller, domain-specific ones, each with only the tables you need for the given domain context. Entities can exist in multiple DbContexts; tailor the relationships (e.g. by "Ignore"-ing them where not required) and use lazy loading where appropriate. These things help to boost performance.
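The first-run versus second-run comparison described above can be reproduced with a small timing helper. This is only a sketch: WarmupTimer and TimeTwice are hypothetical names, not part of EF, and the helper simply runs whatever delegate it is given twice.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of EF): times two consecutive runs of the same action.
public static class WarmupTimer
{
    // Returns the elapsed time of the first and of the second invocation.
    public static (TimeSpan First, TimeSpan Second) TimeTwice(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();                      // first run: includes any one-time setup cost
        stopwatch.Stop();
        TimeSpan first = stopwatch.Elapsed;

        stopwatch.Restart();
        action();                      // second run: setup is already done
        stopwatch.Stop();
        return (first, stopwatch.Elapsed);
    }
}
```

Passing the query from the question as the action, e.g. () => db.T_MyTable.AsNoTracking().Select(t => t.SOMENAME).ToList(), should show a large first value and a small second value if building the context model is what takes the time.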

Related

Getting max value on server (Entity Framework)

I'm using EF Core but I'm not really an expert with it, especially when it comes to details like querying tables in a performant manner...
So what I try to do is simply get the max-value of one column from a table with filtered data.
What I have so far is this:
protected override void ReadExistingDBEntry()
{
using Model.ResultContext db = new();
// Filter Tabledata to the Rows relevant to us. the whole Table may contain 0 rows or millions of them
IQueryable<Measurement> dbMeasuringsExisting = db.Measurements
.Where(meas => meas.MeasuringInstanceGuid == Globals.MeasProgInstance.Guid
&& meas.MachineId == DBMatchingItem.Id);
if (dbMeasuringsExisting.Any())
{
// the max value we're interested in. Still dbMeasuringsExisting could contain millions of rows
iMaxMessID = dbMeasuringsExisting.Max(meas => meas.MessID);
}
}
The equivalent SQL to what I want would be something like this.
select max(MessID)
from Measurement
where MeasuringInstanceGuid = Globals.MeasProgInstance.Guid
and MachineId = DBMatchingItem.Id;
While the above code works (it returns the correct value), I think it has a performance issue when the database table is getting larger, because the max filtering is done at the client-side after all rows are transferred, or am I wrong here?
How to do it better? I want the database server to filter my data. Of course I don't want any SQL script ;-)
This can be addressed by casting the selected value to a nullable type, so that an empty result yields null instead of an error, and then applying a default value for the int. Alternatively, you can just assign the result to a nullable int. Note that this assumes the ID is an integer. The same principle would apply to a Guid as well.
int MaxMessID = dbMeasuringsExisting.Max(p => (int?)p.MessID) ?? 0;
There is no need for the Any() statement as that causes an additional trip to the database which is not desirable in this case.
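To see the nullable cast at work without a database, here is a minimal sketch that runs the same expression against an in-memory IQueryable. The Measurement class is a cut-down stand-in for the entity in the question (MachineId as an int is an assumption), and MaxSketch is a hypothetical name.

```csharp
using System;
using System.Linq;

// Cut-down stand-in for the entity from the question (assumed shape).
public sealed class Measurement
{
    public int MachineId { get; set; }
    public int MessID { get; set; }
}

public static class MaxSketch
{
    // Returns the highest MessID for the machine, or 0 when no row matches.
    // Casting to int? makes Max return null for an empty set instead of throwing.
    public static int MaxMessIdOrDefault(IQueryable<Measurement> measurements, int machineId)
    {
        return measurements
            .Where(m => m.MachineId == machineId)
            .Max(m => (int?)m.MessID) ?? 0;
    }
}
```

Without the cast, Max over an empty in-memory sequence of int throws an InvalidOperationException, which is the case the Any() check in the question was guarding against.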

ADO.NET - Data Adapter Fill Method - Fill Dataset with rows modified in SQL

I am using ADO.NET with a Data Adapter to fill a Dataset in my .NET Core 3.1 project.
The first run of the Fill method occurs when my program starts, so I have an in-memory cache to use with my business/program logic. When I then make any changes to the tables using EF Core, once the changes have been saved I run the Data Adapter Fill method again to re-populate the Dataset with the updates from the tables that were modified in SQL through EF Core.
Reading various docs for a number of days now, what I'm unclear about is whether the Data Adapter Fill method overwrites all of the existing table rows in the Dataset each time the fill method is called? i.e if I'm loading a dataset with a table from SQL that has 10k rows, is it going to overwrite all 10k rows that exist in the dataset, even if 99% of the rows have not changed?
The reason I am going down the Dataset route is that I want to keep an in-memory cache of the various tables from SQL, so I can query the data as fast as possible without sending queries to SQL all the time.
The solution I want is something along the lines of the Data Adapter Fill method, but I don't want the Dataset to be overwritten for any rows that have not been modified in SQL since the last run.
Is this how things are working already? or do I have to look for another solution?
Below is just an example of the Adapter Fill method.
public async Task<AdoNetResult> FillAlarmsDataSet()
{
string connectionString = _config.GetConnectionString("DefaultConnection");
try
{
string cmdText1 = "SELECT * FROM [dbo].[Alarm] ORDER BY Id;" +
"SELECT * FROM [dbo].[AlarmApplicationRole] ORDER BY Id;";
dataAdapter = new SqlDataAdapter(cmdText1, connectionString);
// Create table mappings
dataAdapter.TableMappings.Add("Alarm", "Alarm");
dataAdapter.TableMappings.Add("AlarmApplicationRole", "AlarmApplicationRole");
alarmDataSet = new DataSet
{
Locale = CultureInfo.InvariantCulture
};
// Create and fill the DataSet
await Task.Run(() => dataAdapter.Fill(alarmDataSet));
return AdoNetResult.Success;
}
catch (Exception ex)
{
// Return the task with details of the exception
return AdoNetResult.Failed(ex);
}
}

How to improve performance of an EF Core query which uses several Includes

I've got this query, which I'll simplify for brevity:
public Task<User> GetByIdAsync(Guid userId)
{
    return MyContext
        .Users
        //Bunch of Includes
        //Most of which have a ThenInclude
        //Followed by another ThenInclude
        .FirstOrDefaultAsync(u => u.Id == userId);
}
When run for around 100 users, it takes over 15 seconds (running locally on my machine). Not great.
I've tried using AsNoTracking(), as well as changing it to use a compiled query like so:
private static readonly Func<MyContext, Guid, Task<User>> _getByIdAsync =
    EF.CompileAsyncQuery((MyContext context, Guid userId) =>
        context
            .Users
            //Same Includes as above
            .Where(u => u.Id == userId)
            .FirstOrDefault());
public async Task<User> GetByIdAsync(Guid userId)
{
    // The compiled delegate takes the context instance as its first argument
    return await _getByIdAsync(MyContext, userId);
}
Still no difference.
I've had a look at this answer for a relevant thread, which suggests using plain old SQL:
https://stackoverflow.com/a/16977218/9778248
And I've had a look at this answer, which mentions clustered indexes:
https://stackoverflow.com/a/55557237/9778248
I certainly can't exclude any of the Includes as the client depends on all this info. Redesigning is also not an option at this stage.
Questions
Are there any other options that can improve performance?
I can't see any CLUSTERED or NONCLUSTERED tags in any of my child table indexes. Is this worth looking into and if so, can I be pointed to any documentation that explains how I can go about updating using EF (or without)?
There are several options, but which one works best depends on your data.
You call .FirstOrDefaultAsync(u => u.Id == userId), which means that for 100 users you go to the database 100 times, so 15,000 ms / 100 = 150 ms per request. To improve on that, try to fetch all 100 users at once using an IN clause, like .Where(u => userIds.Contains(u.Id))
Example.
// For a query that returns a sequence, EF.CompileAsyncQuery yields an IAsyncEnumerable (EF Core 3.0+)
private static readonly Func<MyContext, List<Guid>, IAsyncEnumerable<User>> _getByIdsAsync =
    EF.CompileAsyncQuery((MyContext context, List<Guid> userIds) =>
        context
            .Users
            //Same Includes as above
            .Where(u => userIds.Contains(u.Id)));
I know nothing about your data structure, but if you can write the LINQ using joins it could be faster, because for many-to-many relationships EF may go to the database once per dependency within a single request.
Example of how you can query using joins:
var query = (from users in context.Users
             join otherTable in context.OtherTable on users.Id equals otherTable.UserId
             select new { users, otherTable }).ToList();
EF tries to be general purpose, but since you know your data you can sometimes do better. I had a similar problem when my repository methods fetched data one by one; I then wrote a new method that fetched the data for an array of IDs and took care of the joined data itself, because doing it fast through EF alone was basically impossible. So what I am saying is: in one request load all the one-to-one data, then use another query to grab the many-to-many data you need.
You can also get the generated SQL query, using this sample:
public Task<User> GetByIdAsync(Guid userId)
{
    var query = MyContext
        .Users
        //Bunch of Includes
        //Most of which have a ThenInclude
        //Followed by another ThenInclude
        ;
    // ToSql() is not built into EF Core; it is a custom extension method (EF Core 5+ offers ToQueryString() instead)
    var sql = query.ToSql(); // <--------------------------- sql query
    return query.FirstOrDefaultAsync(u => u.Id == userId);
}
and use the SQL query to profile and see if it's using indexes.
And lastly, I really hate methods like public IQueryable GetByIdAsync(Guid userId). The problem is that most of the time you don't need all those Includes, but you start using them more and more and become dependent on them. This is why I would recommend using EF without the repository pattern: EF itself is a repository, so get only the data you need from the database.

How to avoid Query Plan re-compilation when using IEnumerable.Contains in Entity Framework LINQ queries?

I have the following LINQ query executed using Entity Framework (v6.1.1):
private IList<Customer> GetFullCustomers(IEnumerable<int> customersIds)
{
IQueryable<Customer> fullCustomerQuery = GetFullQuery();
return fullCustomerQuery.Where(c => customersIds.Contains(c.Id)).ToList();
}
This query is translated into fairly nice SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[FirstName] AS [FirstName]
-- ...
FROM [dbo].[Customer] AS [Extent1]
WHERE [Extent1].[Id] IN (1, 2, 3, 5)
However, I get a very significant performance hit on a query compilation phase. Calling:
ELinqQueryState.GetExecutionPlan(MergeOption? forMergeOption)
Takes ~50% of the time of each request. Digging deeper, it turned out that query gets re-compiled every time I pass different customersIds.
According to an MSDN article, this is expected behavior, because an IEnumerable that is used in a query is considered volatile and is part of the SQL that is cached. That's why the SQL is different for every different combination of customersIds, and it always has a different hash, which is what is used to get the compiled query from the cache.
Now the question is: How can I avoid this re-compilation while still querying with multiple customersIds?
This is a great question. First of all, here are a couple of workarounds that come to mind (they all require changes to the query):
First workaround
This one may be a bit obvious and unfortunately is not generally applicable: if the selection of items you would need to pass over to Enumerable.Contains already exists in a table in the database, you can write a query that calls Enumerable.Contains on the corresponding entity set in the predicate instead of bringing the items into memory first. An Enumerable.Contains call over data in the database should result in some kind of JOIN-based query that can be cached. E.g. assuming no navigation properties between Customers and SelectedCustomers, you should be able to write the query like this:
var q = db.Customers.Where(c =>
db.SelectedCustomers.Select(s => s.Id).Contains(c.Id));
The syntax of the query with Any is a bit simpler in this case:
var q = db.Customers.Where(c =>
db.SelectedCustomers.Any(s => s.Id == c.Id));
If you don't already have the necessary selection data stored in the database, you probably won't want the overhead of having to store it, so you should consider the next workaround.
Second workaround
If you know beforehand that you will have a relatively manageable maximum number of elements in the list you can replace Enumerable.Contains with a tree of OR-ed equality comparisons, e.g.:
var list = new [] {1,2,3};
var q = db.Customers.Where(c =>
list[0] == c.Id ||
list[1] == c.Id ||
list[2] == c.Id );
This should produce a parameterized query that can be cached. If the list varies in size from query to query, this should produce a different cache entry for each list size. Alternatively you could use a list with a fixed size and pass some sentinel value that you know will never match the value argument, e.g. 0, -1, or alternatively just repeat one of the other values. In order to produce such predicate expression programmatically at runtime based on a list, you might want to consider using something like PredicateBuilder.
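As a rough sketch of building such a fixed-size OR tree at runtime, the following uses only System.Linq.Expressions rather than PredicateBuilder. OrPredicateSketch, ValueHolder and Build are hypothetical names. Each value is read through a property on a holder object on the assumption that EF then emits it as a SQL parameter rather than a literal; that assumption should be checked against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class OrPredicateSketch
{
    // Hypothetical holder: the value is accessed via a property instead of a bare constant.
    public sealed class ValueHolder
    {
        public int Value { get; set; }
    }

    // Builds x => key(x) == v0 || key(x) == v1 || ... with exactly fixedSize comparisons.
    // Unused slots are filled with a sentinel that is assumed never to match a real key.
    public static Expression<Func<T, bool>> Build<T>(
        Expression<Func<T, int>> keySelector,
        IReadOnlyList<int> values,
        int fixedSize,
        int sentinel)
    {
        if (fixedSize < 1) throw new ArgumentOutOfRangeException(nameof(fixedSize));
        if (values.Count > fixedSize) throw new ArgumentException("More values than slots.", nameof(values));

        Expression body = null;
        for (int i = 0; i < fixedSize; i++)
        {
            var holder = new ValueHolder { Value = i < values.Count ? values[i] : sentinel };
            Expression slot = Expression.Property(Expression.Constant(holder), nameof(ValueHolder.Value));
            Expression comparison = Expression.Equal(keySelector.Body, slot);
            body = body == null ? comparison : Expression.OrElse(body, comparison);
        }
        return Expression.Lambda<Func<T, bool>>(body, keySelector.Parameters);
    }
}
```

A call such as db.Customers.Where(OrPredicateSketch.Build<Customer>(c => c.Id, ids, 10, -1)) would then always produce a predicate with ten comparisons, whatever the number of ids up to ten.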
Potential fixes and their challenges
On one hand, changes necessary to support caching of this kind of query using CompiledQuery explicitly would be pretty complex in the current version of EF. The key reason is that the elements in the IEnumerable<T> passed to the Enumerable.Contains method would have to translate into a structural part of the query for the particular translation we produce, e.g.:
var list = new [] {1,2,3};
var q = db.Customers.Where(c => list.Contains(c.Id)).ToList();
The enumerable “list” looks like a simple variable in C#/LINQ but it needs to be translated to a query like this (simplified for clarity):
SELECT * FROM Customers WHERE Id IN(1,2,3)
If list changes to new [] {5,4,3,2,1}, we would have to generate the SQL query again!
SELECT * FROM Customers WHERE Id IN(5,4,3,2,1)
As a potential solution, we have talked about leaving generated SQL queries open with some kind of special place holder, e.g. store in the query cache that just says
SELECT * FROM Customers WHERE Id IN(<place holder>)
At execution time, we could pick this SQL from the cache and finish the SQL generation with the actual values. Another option would be to leverage a Table-Valued Parameter for the list if the target database can support it. The first option would probably work ok only with constant values, the latter requires a database that supports a special feature. Both are very complex to implement in EF.
Auto compiled queries
On the other hand, for automatic compiled queries (as opposed to explicit CompiledQuery) the issue becomes somewhat artificial: in this case we compute the query cache key after the initial LINQ translation, hence any IEnumerable<T> argument passed should have already been expanded into DbExpression nodes: a tree of OR-ed equality comparisons in EF5, and usually a single DbInExpression node in EF6. Since the query tree already contains a distinct expression for each distinct combination of elements in the source argument of Enumerable.Contains (and therefore for each distinct output SQL query), it is possible to cache the queries.
However, even in EF6 these queries are not cached, not even in the auto-compiled case. The key reason is that we expect the variability of elements in a list to be high (this has to do with the variable size of the list but is also exacerbated by the fact that we normally don't parameterize values that appear as constants to the query, so a list of constants will be translated into constant literals in SQL), so with enough calls to a query with Enumerable.Contains you could produce considerable cache pollution.
We have considered alternative solutions to this as well, but we haven't implemented any yet. So my conclusion is that you would be better off with the second workaround in most cases if as I said, you know the number of elements in the list will remain small and manageable (otherwise you will face performance issues).
Hope this helps!
As of now, this is still a problem in Entity Framework Core when using the SQL Server Database Provider.
💡 Still on Entity Framework 6 (non-core)? Skip to the next section.
I wrote QueryableValues to solve this problem in a flexible and performant way; with it you can compose the values from an IEnumerable<T> in your query, like if it were another entity in your DbContext.
In contrast to other solutions out there, QueryableValues achieves this level of performance by:
Resolving with a single round-trip to the database.
Preserving the query's execution plan regardless of the provided values.
Usage example:
// Sample values.
IEnumerable<int> values = Enumerable.Range(1, 10);
// Using a Join.
var myQuery1 =
from e in dbContext.MyEntities
join v in dbContext.AsQueryableValues(values) on e.Id equals v
select new
{
e.Id,
e.Name
};
// Using Contains.
var myQuery2 =
from e in dbContext.MyEntities
where dbContext.AsQueryableValues(values).Contains(e.Id)
select new
{
e.Id,
e.Name
};
You can also compose complex types!
It's available as a nuget package and the project can be found here. It's distributed under the MIT license.
The benchmarks speak for themselves.
An Alternative for Entity Framework 6 (non-core)
🎉 NEW! QueryableValues EF6 Edition has arrived!
I'll explain how to manually provide some of the functionality of QueryableValues on this legacy version of Entity Framework, specifically, the ability to compose an IEnumerable<int> with any of your entities in the same way that QueryableValues does on EF Core. You can use this same technique to support collections of other simple types like long, string, etc.
Requirements
Must use the SQL Server provider
Must use the database-first strategy OR you already have a way to map a TVF using the code-first strategy
Instructions Summary
Create a method that takes an IEnumerable<int> and returns XML.
Create a TVF in your database that takes XML and returns a rowset.
Add the TVF to the EDMX using the designer.
Encapsulate the code that glues the functions created on step 1 and 2 and return an IQueryable<int>.
Use the IQueryable<int> in your queries as desired.
Instructions
1. Create a method that takes an IEnumerable<int> and returns XML
This method will serialize the provided values as XML, so later on it can be transmitted as a parameter in your query.
static string GetXml<T>(IEnumerable<T> values)
{
var sb = new StringBuilder();
using (var stringWriter = new System.IO.StringWriter(sb))
{
var settings = new System.Xml.XmlWriterSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
using (var xmlWriter = System.Xml.XmlWriter.Create(stringWriter, settings))
{
xmlWriter.WriteStartElement("R");
foreach (var value in values)
{
xmlWriter.WriteStartElement("V");
xmlWriter.WriteValue(value);
xmlWriter.WriteEndElement();
}
xmlWriter.WriteEndElement();
}
}
return sb.ToString();
}
If the above method is provided with new[] { 1, 2, 3 }, it will return an XML string with the following structure:
<R><V>1</V><V>2</V><V>3</V></R>
2. Create a TVF in your database that takes XML and returns a rowset
The following table-valued function (TVF) will take the XML created by the previous function and project it as a rowset with a single column (V), that can then be used from SQL Server's side in your query. Must be created in the database associated with your EDMX file, so it can be added to your EDMX model in the next step.
CREATE FUNCTION dbo.udf_GetIntValuesFromXml
(
@Values XML
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT I.value('. cast as xs:integer?', 'int') AS V
FROM @Values.nodes('/R/V') N(I)
)
The above function when provided with the <R><V>1</V><V>2</V><V>3</V></R> XML, will return the following rowset:
V
1
2
3
3. Add the TVF to the EDMX using the designer
Table-Valued Functions (TVFs) - EF Docs
After adding this function to your EDMX model, ensure to save the changes to the EDMX file so that your DbContext generated code is up to date.
4. Encapsulate the code that glues the functions created on step 1 and 2 and return an IQueryable<int>
The following code encapsulates the XML serializer function explained above and everything else you need on the .NET side to make this work:
using System.Collections.Generic;
using System.Linq;
using System.Text; // needed for StringBuilder
public static class QueryableValuesClassicDbContextExtensions
{
private static string GetXml<T>(IEnumerable<T> values)
{
var sb = new StringBuilder();
using (var stringWriter = new System.IO.StringWriter(sb))
{
var settings = new System.Xml.XmlWriterSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
using (var xmlWriter = System.Xml.XmlWriter.Create(stringWriter, settings))
{
xmlWriter.WriteStartElement("R");
foreach (var value in values)
{
xmlWriter.WriteStartElement("V");
xmlWriter.WriteValue(value);
xmlWriter.WriteEndElement();
}
xmlWriter.WriteEndElement();
}
}
return sb.ToString();
}
public static IQueryable<int> AsQueryableValues(this IQueryableValuesClassicDbContext dbContext, IEnumerable<int> values)
{
return dbContext.GetIntValuesFromXml(GetXml(values));
}
}
public interface IQueryableValuesClassicDbContext
{
IQueryable<int> GetIntValuesFromXml(string xml);
}
The IQueryableValuesClassicDbContext interface is intended to be explicitly implemented on your DbContext class to provide access to the TVF that was added to the EDMX model.
You can do this by creating a partial class for your DbContext. For example, if your DbContext name is TestDbContext:
using System.Linq;
partial class TestDbContext : IQueryableValuesClassicDbContext
{
IQueryable<int> IQueryableValuesClassicDbContext.GetIntValuesFromXml(string xml)
{
return udf_GetIntValuesFromXml(xml).Select(i => i.Value);
}
}
5. Use the IQueryable<int> in your queries as desired (via AsQueryableValues)
using (var db = new TestDbContext())
{
var valuesQuery = db.AsQueryableValues(new[] { 1, 2, 3, 4, 5 });
var resultsUsingContains = db.MyEntity
.Where(i => valuesQuery.Contains(i.MyEntityID))
.Select(i => new { i.MyEntityID, i.PropA })
.ToList();
var resultsUsingJoin = (
from i in db.MyEntity
join v in valuesQuery on i.MyEntityID equals v
select new { i.MyEntityID, i.PropA }
)
.ToList();
}
Below is the T-SQL generated behind the scenes for the above EF queries. As you can see, it's completely parameterized.
exec sp_executesql N'SELECT
[Extent1].[MyEntityID] AS [MyEntityID],
[Extent1].[PropA] AS [PropA]
FROM [dbo].[MyEntity] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[udf_GetIntValuesFromXml](@Values) AS [Extent2]
WHERE ([Extent2].[V] = [Extent1].[MyEntityID]) AND ([Extent2].[V] IS NOT NULL)
)',N'@Values nvarchar(4000)',@Values=N'<R><V>1</V><V>2</V><V>3</V><V>4</V><V>5</V></R>'
exec sp_executesql N'SELECT
[Extent1].[MyEntityID] AS [MyEntityID],
[Extent1].[PropA] AS [PropA]
FROM [dbo].[MyEntity] AS [Extent1]
INNER JOIN [dbo].[udf_GetIntValuesFromXml](@Values) AS [Extent2] ON [Extent1].[MyEntityID] = [Extent2].[V]',N'@Values nvarchar(4000)',@Values=N'<R><V>1</V><V>2</V><V>3</V><V>4</V><V>5</V></R>'
Limitations
The provided IEnumerable<int> is enumerated at query build time, not at execution time.
The final query cannot reference more than one IQueryable<T> returned by the AsQueryableValues extension method. This is another limitation around composing the same TVF more than once. EF will create two parameters with the same name, which is illegal and you will get the following error:
A parameter named 'Values' already exists in the parameter collection. Parameter names must be unique in the parameter collection.
Incorrect type used for the XML type parameter of the TVF (notice the use of nvarchar instead of xml in the T-SQL above). This is a deficiency in the EF infrastructure (ObjectParameter) that's used to compose the TVF. Not using the correct parameter type has a detrimental effect in performance due to the implicit casting that must be done by SQL Server.
Conclusion
Despite the limitations, this is still a robust solution when compared to not using parameterized T-SQL queries. To understand the underlying issue that this mitigates you can continue reading here.
Legal Stuff
Feel free to use the code and examples above as you wish. I'm releasing it under the MIT license:
MIT License
Copyright (c) Carlos Villegas (yv989c)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
I had this exact challenge. Here is how I tackled this problem for either strings or longs in an extension method for IQueryables.
To limit cache pollution, we always build the query with a multiple of m parameters (m is configurable), so 1 * m, 2 * m, etc. If the setting is 15, the query plans have 15, 30, 45, etc. parameters, depending on the number of elements in the Contains (we don't know it in advance, but it is probably less than 100). This limits the number of query plans to 3 if the biggest Contains has 45 elements or fewer.
The remaining parameters are filled with a placeholder value that (we know) doesn't exist in the database. In this case '-1'.
Resulting query part;
... WHERE [Filter1].[SomeProperty] IN (@p__linq__0,@p__linq__1, (...) ,@p__linq__19)
... @p__linq__0='SomeSearchText1',@p__linq__1='SomeSearchText2',@p__linq__2='-1',
(...) ,@p__linq__19='-1'
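The rounding-up and padding step on its own can be sketched as follows, independent of EF. ParameterPadding and PadToMultiple are hypothetical names used only for illustration; the set-count formula mirrors the one in the extension method further down.

```csharp
using System;
using System.Collections.Generic;

public static class ParameterPadding
{
    // Pads the values with the placeholder until the count is a multiple of itemsPerSet.
    // At least one full set is always produced, even for an empty input.
    public static List<T> PadToMultiple<T>(IReadOnlyCollection<T> values, int itemsPerSet, T placeholder)
    {
        if (itemsPerSet < 1) throw new ArgumentOutOfRangeException(nameof(itemsPerSet));

        // Integer ceiling of values.Count / itemsPerSet, with a minimum of one set.
        int sets = Math.Max(1, (values.Count + itemsPerSet - 1) / itemsPerSet);

        var padded = new List<T>(values);
        while (padded.Count < sets * itemsPerSet)
        {
            padded.Add(placeholder);
        }
        return padded;
    }
}
```

With itemsPerSet = 15, an input of 2 search texts is padded to 15 entries and an input of 16 is padded to 30, so only a handful of distinct parameter counts ever reach the database.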
Usage:
ICollection<string> searchtexts = .....ToList();
//or
//ICollection<long> searchIds = .....ToList();
//this setting determines how many different query plans can result (parameter counts are multiples of it)
int itemsPerSet = 15;
IQueryable<MyEntity> myEntities = (from c in dbContext.MyEntities
select c)
.WhereContains(d => d.SomeProperty, searchtexts, "-1", itemsPerSet);
The extension method:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
namespace MyCompany.Something.Extensions
{
public static class IQueryableExtensions
{
public static IQueryable<T> WhereContains<T, U>(this IQueryable<T> source, Expression<Func<T,U>> propertySelector, ICollection<U> identifiers, U placeholderThatDoesNotExistsAsValue, int cacheLevel)
{
if(!(propertySelector.Body is MemberExpression))
{
throw new ArgumentException("propertySelector must be a MemberExpression", nameof(propertySelector));
}
var propertyExpression = propertySelector.Body as MemberExpression;
var propertyName = propertyExpression.Member.Name;
return WhereContains(source, propertyName, identifiers, placeholderThatDoesNotExistsAsValue, cacheLevel);
}
public static IQueryable<T> WhereContains<T, U>(this IQueryable<T> source, string propertyName, ICollection<U> identifiers, U placeholderThatDoesNotExistsAsValue, int cacheLevel)
{
return source.Where(ContainsPredicateBuilder<T, U>(identifiers, propertyName, placeholderThatDoesNotExistsAsValue, cacheLevel));
}
public static Expression<Func<T, bool>> ContainsPredicateBuilder<T,U>(ICollection<U> ids, string propertyName, U placeholderValue, int cacheLevel = 20)
{
if(cacheLevel < 1)
{
throw new ArgumentException("cacheLevel must be greater than or equal to 1", nameof(cacheLevel));
}
Expression<Func<T, bool>> predicate;
var propertyIsNullable = Nullable.GetUnderlyingType(typeof(T).GetProperty(propertyName).PropertyType) != null;
// fill a list of cachableLevel number of parameters for the property, equal the selected items and padded with the placeholder value to fill the list.
Expression finalExpression = Expression.Constant(false);
var parameter = Expression.Parameter(typeof(T), "x");
/* factor makes sure that this query part contains a multiple of m parameters (i.e. 20, 40, 60, ...),
* so the number of query plans is limited even if lots of users have more than m items selected */
int factor = Math.Max(1, (int)Math.Ceiling((double)ids.Count / cacheLevel));
for (var i = 0; i < factor * cacheLevel; i++)
{
U id = placeholderValue;
if (i < ids.Count)
{
id = ids.ElementAt(i);
}
var temp = new { id };
var constant = Expression.Constant(temp);
var field = Expression.Property(constant, "id");
var member = Expression.Property(parameter, propertyName);
if (propertyIsNullable)
{
member = Expression.Property(member, "Value");
}
var expression = Expression.Equal(member, field);
finalExpression = Expression.OrElse(finalExpression, expression);
}
predicate = Expression.Lambda<Func<T, bool>>(finalExpression, parameter);
return predicate;
}
}
}
This is really a huge problem, and there's no one-size-fits-all answer. However, when most lists are relatively small, diverga's "Second Workaround" works well. I've built a library distributed as a NuGet package to perform this transformation with as little modification to the query as possible:
https://github.com/bchurchill/EFCacheContains
It's been tested out in one project, but feedback and user experiences would be appreciated! If any issues come up please report on github so that I can follow-up.

Using "Any" or "Contains" when context not saved yet

Why isn't the exception triggered? Linq's "Any()" is not considering the new entries?
MyContext db = new MyContext();
foreach (string email in new[] { "asdf@gmail.com", "asdf@gmail.com" })
{
    Person person = new Person();
    person.Email = email;
    if (db.Persons.Any(p => p.Email.Equals(email)))
    {
        throw new Exception("Email already used!");
    }
    db.Persons.Add(person);
}
db.SaveChanges();
Shouldn't the exception be triggered on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an Excel file of persons and iterate over it, adding every row as a person to db.Persons and checking that their emails aren't already used in the db. The problem arises when there are repeated emails in the worksheet itself (two rows with the same email).
Yes - queries (by design) are only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email)) ||
    db.Persons.Local.Any(p => p.Email.Equals(email)))
However - since YOU are in control of what's added to the store wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. If you were to set MyContext db = new MyContext() again, it would be, but you wouldn't do that in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent dups being entered, it should happen elsewhere - on the client or checking the C# collection before you start writing it to the db.
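Checking the C# collection before anything is written could look roughly like this. It is a sketch only: DuplicateEmailCheck and FindDuplicates are hypothetical names, and comparing emails case-insensitively is an assumption that may not match your rules.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateEmailCheck
{
    // Returns every email that appears again after its first occurrence in the input.
    public static List<string> FindDuplicates(IEnumerable<string> emails)
    {
        // Assumption: emails are treated as case-insensitive.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();
        foreach (string email in emails)
        {
            // HashSet.Add returns false when the value is already present.
            if (!seen.Add(email))
            {
                duplicates.Add(email);
            }
        }
        return duplicates;
    }
}
```

Running this over the worksheet rows first catches repeated emails inside the file, and the database query then only needs to cover emails that were persisted earlier.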

Resources