What is the benefit of using IQueryable and LINQ queries?

I have a project that implements its own configuration classes:
IconSizesConfigSection : ConfigurationSection
IconSizesCollection : ConfigurationElementCollection
IconSize : ConfigurationElement
The Config class contains this property:
public IQueryable<IconSize> IconSizes
{
    get
    {
        IconSizesConfigSection configInfo =
            (IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig");
        return configInfo.IconSizes.OfType<IconSize>().AsQueryable<IconSize>();
    }
}
The IconSizes property returns an IconSizesCollection, which derives from ConfigurationElementCollection; that class in turn implements ICollection and IEnumerable.
In another class I have this code:
var previewIconSize = Config.IconSizes.FirstOrDefault(c => c.Name == "AvatarSize");
Why does this use deferred execution?
Why call AsQueryable<IconSize>() on the collection first and then query it with LINQ and deferred execution?
Are there any benefits compared with using a simple List?

In this case, there is no practical benefit. Using IQueryable is helpful when query rewriting/translation can optimize performance. In the example provided you will actually incur a small performance cost.
One example of using IQueryable in a helpful way is the significant performance increase gained when lazily translating and evaluating queries against a database or web service. This will perform significantly better than the alternative of pulling massive result sets and applying query logic in active memory with a "simple List".
The way you can tell that using IQueryable is detrimental in your case is that the collection is already loaded into memory when you begin the query.

Both IEnumerable and IQueryable use deferred execution. The difference is that IQueryable is used to cross boundaries such as database queries, Entity Framework queries or OData queries.
When an IQueryable is iterated over, the query is translated to the remote provider's idiom and executed there. When the response is received from the remote provider, it is translated to a local object representation.
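To see the deferral half of that in isolation (no remote provider required), here is a minimal self-contained sketch; the list contents and the console output are purely illustrative:

using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Building the query executes nothing; it only composes the pipeline.
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Mutate the source after the query was defined...
        numbers.Add(4);

        // ...execution happens here, during enumeration, so 4 is included.
        Console.WriteLine(string.Join(", ", evens)); // prints "2, 4"
    }
}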

Deferred Execution is good because your user may never use the result set and hence there would have been no point querying the data source.
There may be some LINQ methods your user can't use unless they cast the result to IQueryable, which means you might restrict what they can do or force them to cast/copy the list into something more useful.
If you use a List, then you're hard-coding your solution to a List. Do you care what the implementation of the collection is? Does your user? Probably not, as long as it supports the necessary interfaces.
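For example, something like the following keeps the collection choice private (a sketch; LoadIconSizes is a hypothetical helper that could return a List<IconSize>, an array, or a lazy iterator):

public IEnumerable<IconSize> IconSizes
{
    // Callers program against the interface; the concrete collection type
    // stays an implementation detail that can change without breaking them.
    get { return LoadIconSizes(); }
}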

Related

OData accent-insensitive filter

How can I apply an accent-insensitive filter? In OData, the "eq" operator is case- and accent-sensitive. The case part is easy to fix with "tolower", but I have not found a simple solution for accents. I know contains is supposed to be accent-insensitive, but if I use contains to filter by "São José" I only get "São José" and "São José dos Campos" back; "Sao Jose" is missing.
The following example filtering by "Florianopolis" is expected to return "Florianópolis", but it does not:
url: api/cidades/get?$filter=contains(nome, 'Florianopolis')
[HttpGet]
[EnableQuery]
public ActionResult<IQueryable<CidadeDTO>> Get()
{
    try
    {
        return Ok(_mapper.Map<IEnumerable<CidadeDTO>>(_db.Cidades));
    }
    catch (System.Exception e)
    {
        return BadRequest(e.GetBaseException().Message);
    }
}
It should bring those back as well, like Entity Framework does.
If your OData model is mapped directly to your EF models AND an IQueryable<T> expression is passed into Ok(), then the query is passed through to the database engine as SQL:
SELECT * FROM Cidades WHERE nome LIKE '%Florianopolis%'
When that occurs, the Collation settings in the database connection will determine the comparison matching logic.
When your database collation is case- and accent-insensitive, but your data is still filtered as if it were not, that is an indication that an IEnumerable<T> has been passed into Ok() and the comparison logic is being evaluated in C#, which by default is sensitive to both case and accent. Unfortunately, this also means it is very likely that the entire data table was loaded into memory first so that the filter could be applied.
In your case the OData model is mapped to DTO expressions that are mapped to the EF models via AutoMapper, and that is where the generated query breaks down. By calling Map() you are loading ALL records from the EF table and leaving the $filter criteria to be applied by the EnableQueryAttribute.
For OData query conventions to be applied automatically you must return an IQueryable<T> from your method, or at least pass an IQueryable<T> into the Ok() response handler. With AutoMapper, you can use the Queryable Extensions to satisfy the IQueryable<T> requirement:
Queryable Extensions
When using an ORM such as NHibernate or Entity Framework with AutoMapper’s standard mapper.Map functions, you may notice that the ORM will query all the fields of all the objects within a graph when AutoMapper is attempting to map the results to a destination type.
...
ProjectTo must be the last call in the chain. ORMs work with entities, not DTOs. So apply any filtering and sorting on entities and, as the last step, project to DTOs.
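For reference, the standard shape of that approach in the question's controller would be roughly the following (a sketch; it assumes the injected _mapper exposes its ConfigurationProvider):

using AutoMapper.QueryableExtensions;

[HttpGet]
[EnableQuery]
public IQueryable<CidadeDTO> Get()
{
    // ProjectTo composes the DTO mapping into the IQueryable expression tree,
    // so the provider can translate the mapping plus any appended options to SQL.
    return _db.Cidades.ProjectTo<CidadeDTO>(_mapper.ConfigurationProvider);
}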
In OData, the last requirement (about ProjectTo) is still problematic, because the EnableQueryAttribute will append the query options to the IQueryable<T> response after the projection. That can still end up materializing the entire table into memory (as IEnumerable<T>) before the filter is applied, which is incredibly inefficient. This behaviour is what you generally observe when someone complains about poor performance from an OData implementation; it is not always AutoMapper, but it is usually this pattern of loading the data source into memory in its entirety and then filtering it. Following the default guidance for AutoMapper will lead you in this direction.
Instead we need to use an additional package: AutoMapper.Extensions.ExpressionMapping that will give us access to the UseAsDataSource extension method.
UseAsDataSource
Mapping expressions to one another is tedious and produces long, ugly code.
UseAsDataSource().For<DTO>() makes this translation clean by not having to explicitly map expressions. It also calls ProjectTo<TDto>() for you, where applicable.
This changes your implementation to the following:
[HttpGet]
[EnableQuery]
public ActionResult<IQueryable<CidadeDTO>> Get()
{
    return Ok(_db.Cidades.UseAsDataSource().For<CidadeDTO>());
}
Don't fall into the trap of assuming that AutoMapper is necessary or best practice for an OData API implementation. If you are not using the unique features that AutoMapper provides then adding an additional abstraction layer can end up over-complicating your solution.
I'm not against AutoMapper, I use it a lot for Integrations, ETL, GraphQL and non-DDD style data schemas where the DTO models are significantly different to the underlying EF/data storage models. But it is a maintenance and performance overhead that a simple DDD data model and OData API based solution can easily do without.
Don't hire an excavator when a simple shovel will do the job.
AutoMapper is a convention-based object mapper that is useful when you want to change structure between implementation layers in your code; traditionally you might map Business Domain models, which may represent aggregates or have flattened structures, to highly normalised Database models.
OData's mapping is convention-based too. It was designed to facilitate many of the same operations that AutoMapper provides, with the exception of flattening and unflattening models; those operations are deferred to the EF engine. The types exposed via OData mapping are DTOs.
If your DTO models have the same relational structure as your EF models, then you would generally not use AutoMapper at all. The OData Edm mapping is optimised specifically to manage this type of workload and has been integrated directly into the serialization layer, making the Edm types truly Data Transfer Objects that only exist over the wire and in the client.
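In that AutoMapper-free setup the controller reduces to something like this (a sketch, assuming Cidade is registered as an entity set in the Edm model):

[HttpGet]
[EnableQuery]
public IQueryable<Cidade> Get()
{
    // EnableQuery appends $filter/$orderby/$top to this expression tree,
    // and EF translates the whole composed query to SQL.
    return _db.Cidades;
}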
This did the job
[HttpGet]
public ActionResult<IQueryable<PessoaDTO>> Get(ODataQueryOptions<Pessoa> options)
{
    try
    {
        var queryResult = options.ApplyTo(_db.Pessoas);
        return Ok(queryResult);
    }
    catch (System.Exception e)
    {
        return BadRequest(e.GetBaseException().Message);
    }
}

IEnumerable<T>.Count() vs List<T>.Count with Entity Framework

I am retrieving a list of items using Entity Framework and if there are some items retrieved I do something with them.
var items = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList();
if (items.Count != 0)
{
    // Do something...
}
The if statement could also be written as
if (items.Count() != 0)
{
    // Do something...
}
In the first case, the .Count is a List<T>.Count property. In the second case, the .Count() is IEnumerable<T>.Count() extension method.
Although both approaches achieve the same result, is one preferred over the other? (Is there perhaps some difference in performance?)
Enumerable.Count<T> (the extension method for IEnumerable<T>) just calls Count if the underlying type is an ICollection<T>, so for List<T> there is no difference.
Queryable.Count<T> (the extension method for IQueryable<T>) will use the underlying query provider, which in many cases will push the count down to the actual SQL, which will perform faster than counting the objects in memory.
If a filter is applied (e.g. Count(i => i.Name == "John")), or if the underlying type is not an ICollection<T>, the collection is enumerated to compute the count.
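The boundary is where the difference becomes visible. A sketch against a hypothetical EF context, using the question's own table:

// Counting on the IQueryable side is translated to SELECT COUNT(*) ... in SQL:
int expired = db.MyTable.Count(t => t.Expiration < DateTime.Now);

// Counting after ToList() pulls every matching row into memory first;
// List<T>.Count is then just a property read:
int expiredToo = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList().Count;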
is one more preferred than the other?
I generally prefer to use Count() since 1) it's more portable (the underlying type can be anything that implements IEnumerable<T> or IQueryable<T>) and 2) it's easier to add a filter later if necessary.
As Tim states in his comment, I also prefer using Any() to Count() > 0 since it doesn't have to actually count the items - it will just check for the existence of one item. Conversely I use !Any() instead of Count() == 0.
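Applied to the question's query, the existence check can stay entirely on the database side (a sketch):

// EF translates Any() to an EXISTS query; no rows are materialized.
if (db.MyTable.Any(t => t.Expiration < DateTime.Now))
{
    // Do something...
}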
It depends on the underlying collection and where LINQ will be pulling from. For example, if the source is SQL, calling .ToList() will pull back the entire result set and then count it in memory, whereas the .Count() extension method is translated into a SQL COUNT statement on the database side. In that case there will be an obvious performance difference.
For just a standard List or Collection it's as stated in D. Stanley's answer.
I would say that it depends on what's going on inside the if block. If you're simply doing the check to determine whether to perform a sequence of operations on the underlying enumeration, then it's probably not needed in any event. Simply iterate over the enumeration (omitting ToList as well). If you're not using the collection inside the if block, then you should avoid using ToList and definitely use Any over any Count/Count() method.
Once you've performed the ToList then you're no longer using Entity Framework and I expect that Count() is only marginally slower than Count since, if the underlying collection is ICollection<T> it defers to that implementation. The only overhead would be determining whether it implements that interface.
http://msdn.microsoft.com/en-us/library/bb338038.aspx
Remarks:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

Linq To Sql Where does not call overridden Equals

I'm currently working on a project where I'm going to do a lot of comparison of similar non-database objects (call them service layer objects for this discussion) and objects retrieved from the database via LINQ to SQL. For the sake of this discussion, assume I have a service layer Product object with a string field that is represented in the database. However, in the database there is also a primary key Id that is not represented in the service layer.
Accordingly (as I often do for unit testing etc), I overrode Equals(Object), Equals(Product), and GetHashCode and implemented IEquatable with the expectation that I would be able to write code like this:
myContext.Products.Where(p => p.Equals(passedInProduct)).SingleOrDefault();
And so forth.
The Equals override is tested and works. The objects are mutable so the usual caveats apply to the GetHashCode override. However, for the purposes of this example, the objects are not modified except by LtS and could be made readonly.
Here's the simple test:
1. Create a test object in memory and commit it to the LtS context. Committing populates a few auto-generated fields on the test object.
2. Create another, identical test object in memory (a separate reference).
3. Attempt to retrieve the first object from the database using the second object as the criteria (see the code line above).
// Setup
string productDesc = "12A";
Product testProduct1 = _CreateTestProductInDatabase(productDesc);
Product testProduct2 = _CreateTestProduct(productDesc);
// check setup
Product retrievedProduct1 = ProductRepo.Retrieve(testProduct1);
//Assert.IsNotNull(retrievedProduct1);
// execute - try to retrieve the 'equivalent' product object
Product retrievedProduct2 = ProductRepo.Retrieve(testProduct2);
A simplified version of Retrieve (the removed cruft is just parameter checks, etc.):
using (var dbContext = new ProductDataContext())
{
    Product retrievedProduct = dbContext.Products
        .Where(p => p.Equals(product))
        .SingleOrDefault();
    return retrievedProduct;
}
NB: The overridden Equals method knows not to care about the auto-generated fields from the database and only looks at the string that is represented in the service layer.
Here's what I observed:
Retrieve on testProduct1 succeeds (no surprise, equal by reference)
Retrieve on testProduct2 fails (null)
The overridden Equals method called in the Retrieve method is never hit during either Retrieve call.
However, the overridden Equals method is called multiple times by the context on SubmitChanges (invoked when creating the first test object in the database), where it works as expected.
Statically, the compiler knows the type of the objects involved and is able to resolve the overridden Equals.
So my specific questions:
Am I trying to do something ill-advised? Seems like a straightforward use of Equals.
Corollary to the first question: alternate suggestions for dealing with LINQ to SQL equality checking while keeping the comparison details inside the objects rather than in the repository
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
I'm as much interested in understanding as in making my Equals calls work. But I would also love to learn how to make this 'pattern' work, rather than just understand why it appears to be an 'anti-pattern' in the context of LtS and C#.
Please don't suggest I just filter directly on the context with Where statements. Obviously I can remove the Equals call and do that. However, some of the other objects (not presented here) are large and a bit complicated. For the sake of maintenance and clarity, I want to keep the knowledge of how an object compares itself to another of its own type in one place, ideally as part of the object in question.
Some other things I tried that didn't change the behavior:
Overloaded and used == instead
Casting the lambda variable to the type p => (Product)p
Getting an IQueryable object first and calling Equals in the Where clause
Some other things I tried that didn't work:
Creating a static ProductEquals(Product first, Product second) method: System.NotSupportedException: has no supported translation to SQL.
Thanks StackOverflow contributors!
Re Possible dups: I've read ~10 other questions. I'd love a pointer to an exact duplicate but most don't seem to directly address what seems to be an oddity of LinqToSql.
Am I trying to do something ill-advised?
Absolutely. Consider what LINQ to SQL does: it creates a SQL representation of your query. It doesn't know what your overridden Equals method does, so can't translate that logic into SQL.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
You'd need to do something with expression trees to represent the equality that way - and then build those expression trees up into a full query. It won't be fun, but it should be possible. It will affect how you build all your queries though.
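One way to keep that knowledge on the object while staying translatable is to expose an expression factory instead of (or alongside) the Equals override. A sketch, with EqualsExpression as a hypothetical member and Description standing in for the question's string field:

using System;
using System.Linq;
using System.Linq.Expressions;

public partial class Product
{
    // Returns an expression tree the provider can inspect, comparing only
    // the fields the service layer cares about (not the auto-generated ones).
    public static Expression<Func<Product, bool>> EqualsExpression(Product other)
    {
        return p => p.Description == other.Description;
    }
}

// Usage: the provider now sees plain member equality it can turn into SQL.
Product retrieved = dbContext.Products
    .Where(Product.EqualsExpression(passedInProduct))
    .SingleOrDefault();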
I would have expected most database representations to be ID-based though, so you should be able to just compare IDs for equality. Usually when I've seen attempts to really model data in an OO fashion but store it in a database, the leakiness of the abstraction has caused a lot of pain.
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
Presumably SubmitChanges is working against a set of in-memory objects to work out what's changed - it doesn't have to do any conversion to SQL to do that part.

Is there an implementation of IQueryable over DbDataReader?

I have a lot of existing code which uses raw ADO.NET (DbConnection, DbDataReader, etc). I would like to transition to using LINQ to SQL for new code, but for now put both the existing and new code behind a unified set of Repository classes.
One issue I have is this: I would like the Repository classes to expose result sets as IQueryable<> which I get for free with LINQ to SQL. How do I wrap my existing DbDataReader result sets in an IQueryable? Do I have to implement IQueryable over DbDataReader from scratch?
Note I am aware of LINQ to DataSet, but I don't use DataSets because of memory scale issues, as the result sets I deal with can be quite large (order of 1000s). This implies that the IQueryable over DbDataReader implementation will need to be efficient as well (i.e. don't cache results in memory).
I can't see any benefit in implementing IQueryable<T> - that suggests more functionality than is actually available - however, you could expose it as an IEnumerable<T> easily enough, with the caveat that it is once-only. An iterator block would be a reasonable choice:
public static IEnumerable<IDataRecord> AsEnumerable(this IDataReader reader)
{
    while (reader.Read())
    {
        yield return reader; // a bit dangerous
    }
}
The "a bit dangerous" is because the caller could cast it back and abuse it...

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
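A minimal sketch of that streaming style (the file name and filter are hypothetical):

// File.ReadLines is lazy: one line in memory at a time, no matter how big
// the file is. A ToList() anywhere in this chain would buffer everything.
int errorCount = File.ReadLines("app.log")
    .Where(line => line.Contains("ERROR"))
    .Count();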
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as its default (which is not IEnumerable, but rather IQueryable). That way, if you want to filter even further based on some criteria, the filtering STILL happens on the SQL side until the query is evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsEnumerable();
}

// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
    var clients = GetClients().Where(client => client.Owner == currentUser);
    return clients.ToArray();
}
Do you see what's happening there? The GetClients method is going to force a download of ALL 'clients' from the database... THEN the Where clause in the GetClientsForLoggedInUser method filters it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
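A sketch of that cache-then-protect pattern (names hypothetical):

private List<Widget> _cache;

public IEnumerable<Widget> Widgets
{
    get
    {
        // Materialize the expensive query once, then hand out only
        // forward iteration so callers can't mutate the shared cache.
        if (_cache == null)
            _cache = ComputeWidgets().ToList(); // hypothetical expensive query
        return _cache;
    }
}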
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends on whether you need to modify the collection. I like to use an array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<>? The lowest common denominator is the best solution (especially when you see Timothy's answer).
