Using local methods for querying data from DocumentDB - linq

I have the following function in my base repository
public IEnumerable<T> GetAll(ClaimsPrincipal user)
{
return client.CreateDocumentQuery<T>(collectionLink)
.Where(doc => doc.Type == DocType)
.AsEnumerable()
.Where(doc =>
doc.Owner == user.Id()
|| user.InGroup(doc.Owner)
|| doc.Public);
}
I have two extension methods on the ClaimsPrincipal class. Id() just returns .FindFirst("user_id").Value and .InGroup(<id>) checks if the user has a group-membership to the group that owns the document.
However, am I correct in assuming that once I call .AsEnumerable() the query goes to the database with only the first where-clause, returns everything it matches, and then does the second where-clause on the client side?

However, am I correct in assuming that once I call .AsEnumerable() the query goes to the database
Not quite.
When this method returns, it won't have hit the database at all. AsEnumerable() doesn't force the query to execute - it's basically just a cast to IEnumerable<T>.
You're right in saying the first Where is performed at the database and the second Where is performed client-side, but "returns everything it matches" suggests that's done in bulk - which it might be, but it may be streamed as well. The client-side Where will certainly stream, but whether the IQueryable<T> implementation fetches everything eagerly is an implementation detail.
If you're really only interested in which filters are in the database and which are local though, you're correct.

Related

Why is Entity Framework's AsEnumerable() downloading all data from the server?

What is the explanation for EF downloading all result rows when AsEnumerable() is used?
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
What I would like it to do, is to download only enough to gather 100 rows that would satisfy the Id % 2 == 0 condition (most likely just around 200 rows).
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
I suppose that it does not work like that for a reason and I'd like to hear a good argument supporting that design decision.
NOTE: This is a completely contrived example and I know normally you should not use EF this way, but I found this in some existing code and was just surprised my assumptions turned out to be incorrect.
The short answer: The reason for the different behaviors is that, when you use IQueryable directly, a single SQL query can be formed for your entire LINQ query; but when you use IEnumerable, the entire table of data must be loaded.
The long answer: Consider the following code.
context.Logs.Where(x => x.Id % 2 == 0)
context.Logs is of type IQueryable<Log>. IQueryable<Log>.Where is taking an Expression<Func<Log, bool>> as the predicate. The Expression represents an abstract syntax tree; that is, it's more than just code you can run. Think of it as being represented in memory, at runtime, like this:
Lambda (=>)
Parameters
Variable: x
Body
Equals (==)
Modulo (%)
PropertyAccess (.)
Variable: x
Property: Id
Constant: 2
Constant: 0
The LINQ-to-Entities engine can take context.Logs.Where(x => x.Id % 2 == 0) and mechanically convert it into a SQL query that looks something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0;
If you change your code to context.Logs.Where(x => x.Id % 2 == 0).Take(100), the SQL query becomes something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0
LIMIT 100;
This is entirely because the LINQ extension methods on IQueryable use Expression instead of just Func.
Now consider context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0). The IEnumerable<Log>.Where extension method is taking a Func<Log, bool> as a predicate. That is only runnable code. It cannot be analyzed to determine its structure; it cannot be used to form a SQL query.
Entity Framework and Linq use lazy loading. It means (among other things) that they will not run the query until they need to enumerate the results: for instance using ToList() or AsEnumerable(), or if the result is used as an enumerator (in a foreach for instance).
Instead, it builds a query using predicates, and returns IQueryable objects to further "pre-filter" the results before actually returning them. You can find more infos here for instance. Entity framework will actually build a SQL query depending on the predicates you have passed it.
In your example:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
From the Logs table in the context, it fetches all, returns a IEnumerable with the results, then filters the result, takes the first 100, then lists the results as a List.
On the other hand, just removing the AsEnumerable solves your problem:
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
Here it will build a query/filter on the result, then only once the ToList() is executed, query the database.
It also means that you can dynamically build a complex query without actually running it on the DB it until the end, for instance:
var logs = context.Logs.Where(a); // first filter
if (something) {
logs = logs.Where(b); // second filter
}
var results = logs.Take(100).ToList(); // only here is the query actually executed
Update
As mentionned in your comment, you seem to already know what I just wrote, and are just asking for a reason.
It's even simpler: since AsEnumerable casts the results to another type (a IQueryable<T> to IEnumerable<T> in this case), it has to convert all the results rows first, so it has to fetch the data first. It's basically a ToList in this case.
Clearly, you understand why it's better to avoid using AsEnumerable() the way you do in your question.
Also, some of the other answers have made it very clear why calling AsEnumerable() changes the way the query is performed and read. In short, it's because you are then invoking IEnumrable<T> extension methods rather than the IQueryable<T> extension methods, the latter allowing you to combine predicates before executing the query in the database.
However, I still feel that this doesn't answer your actual question, which is a legitimate question. You said (emphasis mine):
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
My question to you is: what made you conclude that this is true?
I would argue that, because you are using IEnumrable<T> instead of IQueryable<T>, it's true that the query being performed in the database will be a simple:
select * from logs
... without any predicates, unlike what would have happened if you had used IQueryable<T> to invoke Where and Take.
However, the AsEnumerable() method call does not fetch all the rows at that moment, as other answers have implied. In fact, this is the implementation of the AsEnumerable() call:
public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
return source;
}
There is no fetching going on there. In fact, even the calls to IEnumerable<T>.Where() and IEnumerable<T>.Take() don't actually start fetching any rows at that moment. They simply setup wrapping IEnumerables that will filter results as they are iterated on. The fetching and iterating of the results really only begins when ToList() is called.
So when you say:
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
... again, my question to you would be: doesn't it do that already?
If your table had 1,000,000 rows, I would still expect your code snippet to only fetch up to 100 rows that satisfy your Where condition, and then stop fetching rows.
To prove the point, try running the following little program:
static void Main(string[] args)
{
var list = PretendImAOneMillionRecordTable().Where(i => i < 500).Take(10).ToList();
}
private static IEnumerable<int> PretendImAOneMillionRecordTable()
{
for (int i = 0; i < 1000000; i++)
{
Console.WriteLine("fetching {0}", i);
yield return i;
}
}
... when I run it, I only get the following 10 lines of output:
fetching 0
fetching 1
fetching 2
fetching 3
fetching 4
fetching 5
fetching 6
fetching 7
fetching 8
fetching 9
It doesn't iterate through the whole set of 1,000,000 "rows" even though I am chaining Where() and Take() calls on IEnumerable<T>.
Now, you do have to keep in mind that, for your little EF code snippet, if you test it using a very small table, it may actually fetch all the rows at once, if all the rows fit within the value for SqlConnection.PacketSize. This is normal. Every time SqlDataReader.Read() is called, it never only fetches a single row at a time. To reduce the amount of network call roundtrips, it will always try to fetch a batch of rows at a time. I wonder if this is what you observed, and this mislead you into thinking that AsEnumerable() was causing all rows to be fetched from the table.
Even though you will find that your example doesn't perform nearly as bad as you thought, this would not be a reason not to use IQueryable. Using IQueryable to construct more complex database queries will almost always provide better performance, because you can then benefit from database indexes, etc to fetch results more efficiently.
AsEnumerable() eagerly loads the DbSet<T> Logs
You probably want something like
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable();
The idea here is that you're applying a predicate filter to the collection before actually loading it from the database.
An impressive subset of the world of LINQ is supported by EF. It will translate your beautiful LINQ queries into SQL expressions behind the scenes.
I have come across this before.
The context command is not executed until a linq function is called, because you have done
context.Logs.AsEnumerable()
it has assumed you have finished with the query and therefore compiled it and returns all rows.
If you changed this to:
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable()
It would compile a SQL statement that would get only the rows where the id is modular 2.
Similarly if you did
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
that would create a statement that would get the top 100...
I hope that helps.
LinQ to Entities has a store expression formed by all the Linq methods before It goes to an enumeration.
When you use AsEnumerable() and then Where() like this:
context.Logs.Where(...).AsEnumerable()
The Where() knows that the previous chain call has a store expression so he appends his predicate to It for lazy loading.
The overload of Where that is being called is different if you call this:
context.Logs.AsEnumerable().Where(...)
Here the Where() only knows that his previous method is an enumeration (it could be any kind of "enumerable" collection) and the only way that he can apply his condition is iterating over the collection with the IEnumerable implementation of the DbSet class, which must to retrieve the records from the database first.
I don't think you should ever use this:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
The correct way of doing things would be:
context.Logs.AsQueryable().Where(x => x.Id % 2 == 0).Take(100).ToList();
Answer with explanations here:
What's the difference(s) between .ToList(), .AsEnumerable(), AsQueryable()?
Why use AsQueryable() instead of List()?

Does this extension method efficiently materialize my IQueryable?

So I recently discovered that you can force Entity Framework not to translate your projection into SQL by specifying a Func<T, TResult> to the .Select() extension method rather than an expression. This is useful when you want to transform your queried data, but that transformation should happen in your code rather than on the database.
For example, when using EF5's new Enum support and trying to project that to a string property in a DTO, whereas this would fail:
results.Select(r => new Dto { Status = r.Status.ToString() })
this would work:
results.Select(new Func<Record, Dto>(r => new Dto { Status = r.Status.ToString() }));
because in the first (expression) case, EF can't figure out how to translate Status.ToString() to something the SQL database could perform, but as per this article Func predicates aren't translated.
Once I had this working, it wasn't much of a leap to create the following extension method:
public static IQueryable<T> Materialize<T>(this IQueryable<T> q)
{
return q.Select(new Func<T, T>(t => t)).AsQueryable();
}
So my question is - are there any pitfalls I should be wary of when using this? Is there a performance impact - either in injecting this do-nothing projection into the query pipeline or by causing EF to not send the .Where() clause to the server and thereby send all the results over the wire?
The intention is to still use a .Where() method to filter the results on the server, but then to use .Materialize() before .Select() so that the provider doesn't try to translate the projection to SQL Server:
return Context.Record
.Where(r => // Some filter to limit results)
.Materialize()
.Select(r => // Some projection to DTO, etc.);
Simply using AsEnumerable should do the same:
return Context.Record
.Where(r => // Some filter to limit results)
.AsEnumerable() // All extension methods now accept Func instead of Expression
.Select(r => // Some projection to DTO, etc.);
There is also no reason in your Materialize method to go back to IQueryable because it is not a real IQueryable translated to another query any more. It is just IEnumerable.
In terms of performance you should be OK. Everything before materialization is evaluated in the database and everything after materialization in your code. Moreover in both your and my example query has still deferred execution - it is not executed until something enumerates the query.
There is one problem though: number of columns fetched to client. In first case, it would be something like select Status from Record and in another select Status, field2, field3, field4 from Record

Cannot form a select statement for query in silverlight

I want to do something like
from table1
where col5="abcd"
select col1
I did like
query_ = From g In DomainService.GetGEsQuery Select New GE With {.Desc = g.codDesc}
"This cause a runtime error, i tried various combinations but failed"
please help.
I'm assuming your trying to do this on the client side. If so you could do something like this
DomainService.Load(DomainService.GetGEsQuery().Where(g => g.codDesc == "something"), lo =>
{
if (lo.HasError == false)
{
List<string> temp = lo.Entities.Select(a => a.Name).ToList();
}
}, null);
you could also do this in the server side (which i would personally prefer) like this
public IQueryable<string> GetGEStringList(string something)
{
return this.ObjectContext.GE.Where(g => g.codDesc == something).Select(a => a.Name);
}
Hope this helps
DomainService.GetGEsQuery() returns an IQueryable, that is only useful in a subsequent asynchronous load. Your are missing the () on the method call, but that is only the first problem.
You can apply filter operations to the query returned using Where etc, but it still needs to be passed to the Load method of your domain context (called DomainService in your example).
The example Jack7 has posted shows an anonymous callback from the load method which then accesses the results inside the load object lo and extracts just the required field with another query. Note that you can filter the query in RIA services, but not change the basic return type (i.e. you cannot filter out unwanted columns on the client-side).
Jack7's second suggestion to implement a specific method server-side, returning just the data you want, is your best option.

Test LINQ to SQL expression

I am writing an application that works with MS SQL database via LINQ to SQL. I need to perform filtering sometimes, and occasionally my filtering conditions are too complicated to be translated into SQL query. While I am trying to make them translatable, I want my application to at least work, though slow sometimes.
LINQ to SQL data model is hidden inside repositories, and I do not want to provide several GetAll method overloads for different cases and be aware of what overload to use on upper levels. So I want to test my expression inside repository to be translatable and, if no, perform in-memory query against the whole data set instead of throwing NotSupportedException on query instantiating.
This is what I have now:
IQueryable<TEntity> table = GetTable<TEntity>();
IQueryable<TEntity> result;
try
{
result = table.Where(searchExpression);
//this will test our expression
//consuming as little resources as possible (???)
result.FirstOrDefault();
}
catch (NotSupportedException)
{
//trying to perform in-memory search if query could not be constructed
result = table
.AsEnumerable()
.Where(searchExpression.Compile())
.AsQueryable();
}
return result;
searchExpression is Expression<Func<TEntity, bool>>
As you see, I am using FirstOrDefault to try to instantiate the query and throw the exception if it cannot be instantiated. However, it will perform useless database call when the expression is good. I could use Any, Count or other method, and it may well be a bit less expensive then FirstOrDefault, but still all methods that come to my mind make a costly trip to database, while all I need is to test my expression.
Is there any alternative way to say whether my expression is 'good' or 'bad', without actual database call?
UPDATE:
Or, more generally, is there a way to tell LINQ to make in-memory queries when it fails to construct SQL, so that this testing mechanism would not be needed at all?
Instead of
result.FirstOrDefault();
would it be sufficient to use
string sqlCommand = dataContext.GetCommand(result).CommandText;
?
If the expression does not generate valid Sql, this should throw a NotSupportedException, but it does not actually execute the sqlCommand.
I think this will solve your problem:
IQueryable<TEntity> table = GetTable<TEntity>();
IQueryable<TEntity> result;
try
{
return table.Where(searchExpression).ToList();
}
catch (NotSupportedException)
{
//trying to perform in-memory search if query could not be constructed
return table
.AsEnumerable()
.Where(searchExpression.Compile())
.ToList();
}
So the method returns is the expression is converted to valid SQL. Otherwise it catches the exception and runs the query in memory. This should work but it doesn't answer your question if it's possible to check if a specific searchExpression can be converted. I don't think such a thing exists.

How do you query an object set and in that same query filter an attached entity collection?

I am using Entity Framework for the first time and noticed that the entities object returns entity collections.
DBEntities db = new DBEntities();
db.Users; //Users is an ObjectSet<User>
User user = db.Users.Where(x => x.Username == "test").First(); //Is this getting executed in the SQL or in memory?
user.Posts; //Posts is an EntityCollection<Post>
Post post = user.Posts.Where(x => x.PostID == "123").First(); //Is this getting executed in the SQL or in memory?
Do both ObjectSet and EntityCollection implement IQueryable? I am hoping they do so that I know the queries are getting executed at the data source and not in memory.
EDIT: So apparently EntityCollection does not while ObjectSet does. Does that mean I would be better off using this code?
DBEntities db = new DBEntities();
User user = db.Users.Where(x => x.Username == "test").First(); //Is this getting executed in the SQL or in memory?
Post post = db.Posts.Where(x => (x.PostID == "123")&&(x.Username == user.Username)).First(); // Querying the object set instead of the entity collection.
Also, what is the difference between ObjectSet and EntityCollection? Shouldn't they be the same?
Thanks in advance!
EDIT: Sorry, I'm new to this. I'm trying to understand. Attached EntityCollections are lazy loaded, so if I access them then memory is populated with them. Rather than doing two querys to the object sets like in my last edit, I am curious if this query would be more what I was after:
DBEntities db = new DBEntities();
User user = (from x in db.Users
from y in x.Posts
where x.Username == "test"
where y.PostID == 123
select x).First();
ObjectSet<T> does implement IQueryable<T>, but EntityCollection<T> does not.
The difference is that ObjectSet<T> is meant to be used for querying directly (which is why it does implement the interface). EntityCollection<T>, on the other hand, is used for the "many" end of a result set, typically returned in a query done on an ObjectSet<T>. As such, it impelments IEnumerable<T>, but not IQueryable<T> (as it's already the populated results of a query).
I was almost ready to say yes, they both do. Luckily I check the documentation first.
EntityCollection does not implement IQueryable.
As for the difference, ObjectSet<TEntity> represents the the objects generated from a table in a database. EntityCollection<TEntity> represents a collection of entity objects on the 'Many' side of One to Many or Many to Many relationship.

Resources