Help me understand entity framework 4 caching for lazy loading

Help me understand entity framework 4 caching for lazy loading - caching

I am getting some unexpected behaviour with entity framework 4.0 and I am hoping someone can help me understand this. I am using the northwind database for the purposes of this question. I am also using the default code generator (not poco or self tracking). I am expecting that anytime I query the context for the framework to only make a round trip if I have not already fetched those objects. I do get this behaviour if I turn off lazy loading. Currently in my application I am breifly turning on lazy loading and then turning it back off so I can get the desired behaviour. That pretty much sucks, so please help. Here is a good code example that can demonstrate my problem.
Public Sub ManyRoundTrips()
context.ContextOptions.LazyLoadingEnabled = True
Dim employees As List(Of Employee) = context.Employees.Execute(System.Data.Objects.MergeOption.AppendOnly).ToList()
'makes unnessesary round trip to the database, I just loaded the employees'
MessageBox.Show(context.Employees.Where(Function(x) x.EmployeeID < 10).ToList().Count)
context.Orders.Execute(System.Data.Objects.MergeOption.AppendOnly)
For Each emp As Employee In employees
'makes unnessesary trip to database every time despite orders being pre loaded.'
Dim i As Integer = emp.Orders.Count
Next
End Sub
Public Sub OneRoundTrip()
context.ContextOptions.LazyLoadingEnabled = True
Dim employees As List(Of Employee) = context.Employees.Include("Orders").Execute(System.Data.Objects.MergeOption.AppendOnly).ToList()
MessageBox.Show(employees.Where(Function(x) x.EmployeeID < 10).ToList().Count)
For Each emp As Employee In employees
Dim i As Integer = emp.Orders.Count
Next
End Sub
Why is the first block of code making unnessesary round trips?

Your expectation is not correct. Queries always query the DB. Always. That's because LINQ is always converted to SQL.
To load an object from the context if it's already been fetched and from the DB if it hasn't, use ObjectContext.GetObjectByKey().

The first 'unnecessary' trip is necessary - you did a new query and the database could have changed in the meantime. If you used the employees variable instead (where you have stored the result of the query) it wouldn't need to make a trip to the database.
The second one is necessary because you are asking it to fetch the Orders for each Employee. With Lazy loading and no Include() it hasn't read the Orders until you ask it to with emp.Orders.Count().
Remember that until you start iterating on a query (or call some method that requires it to iterate) LINQ to EF does nothing. If you save that query in a variable and then call .Count() on it there will be a round trip. If you use that same query and start enumerating it there will be another round trip. If the entities in that query themselves have relationships and lazy loading is on each time you access one there will be yet another round trip.
Your second example shows how to do this right if you know ahead of time that you want the Orders, that's eager loading. Notice how you don't go back to the context to ask it again for the employees, you reuse the one you've already loaded.

Related

Memory/Efficiency with Linq and large data sets

So you know the background I'm coming from, I've been a professional programmer for over twelve years. My best language by far is C# but I've done C, C++, and most recently objectiveC. I've done a lot of work accessing data in databases but I haven't done as much UI work as most people (Except in IOS).
Recently I've begun using the Entity framework in C# for a job and I must say I wish I'd discovered it sooner. I wouldn't say it's the best thing since sliced bread but it's pretty damned close. After using it for a while it got me thinking about best practices and usage as compared to the old school method of using IDBConnections and IDBCommands for everything.
I was coding for a situation where I was going to be listing the contents of a table of users from a database in a bound data grid with the intention of giving the user the ability to do standard CRUD stuff. I started off by making an User class and a IUserManager interface with a corresponding implementation. Each user is assigned to a department and naturally there'd need to be a way to perform CRUD on departments too so I added a Department class, an IDepartmentManager interface and an implementation for that too. I set it up so that the grid bound on the results of the .GetAll() method on the IUserManager interface. Then I started filling in the guts.
I don't have the code in front of me any more but I basically used IDBConnection to tap into the datastore with an IDBCommand using a SQL query. Then I called command.ExecuteReader() and iterated the .Read() method on the IDataReader object. Using the ordinal for each column I pulled out the data, validated it and slipped it into a User class and added the class to a Dictionary that the method would then return. All the DB classes are of course IDisposable so wrapping them in a using takes care of cleaning up the mess.
Pretty standard stuff, I've done it a bazillion times.
That's when I realized that the departmentId I was pulling from the DB wasn't what I wanted to display in my grid. Telling someone 'this guy is in department 7' isn't as useful as saying 'this guy is in accounting'. So I first toyed with modding my query to get both the departmentId and name, and storing the name on the user object for display later. Then I decided to give the user a Department class instance that it would hang onto during it's lifetime that would be populated. That's when I converted the guts to linq.
public Dictionary<int, User> GetAll()
{
var result = new Dictionary<int, User>();
using (var datastore = new myEntities())
{
result = (from user in datastore.userInfoes
join department in datastore.userDepartmentInfoes on user.departmentID equals department.departmentID
select new User()
{
UserIndex = user.id,
FirstName = user.firstName,
LastName = user.lastName,
Department = new Department()
{
DepartmentId = user.departmentID.Value,
DepartmentName = department.departmentName,
},
Username = user.userName,
}
).ToDictionary(x => x.UserIndex, x => x);
}
return result;
}
That's where I started thinking (read: over-analysing probably)
The implementation I had would work just fine. It would even work pretty well for a small dataset. It'll even work fine for a largish dataset (say 10,000). Even if you counted every person in the company I currently work for five times over you'd have less than a thousand people.
But what if for a second I worked for a really big honking company that had 10 million employees? That would result in the departmentName strings being duplicated potentially millions of times.
That also got me thinking that unlike IOS's MVC implementation this particular situation wasn't going to query just enough users to fill the screen and then handle paging and stuff. As soon as the calling code refresh the data binding it was going to pull all 10 million users all at once and pass back the collection. That's going to be slow.
So that leaves me with the idea in my head that this method is both slow and inefficient with larger data sets. Not only that but the fact that there might be 2 million instances of 'Accounting' held with this data set it is going to be a major memory hog. We're also kind of defeating the purpose of a relational database here because of the Department class inside the User. In the DB you just have a departmentId int foreign key referencing an entry in another table. The link only occurs when you cross reference to the other table and even then there's really only one 'Accounting' string at any one time. In the above code you're going to have a whole lot of 'Accounting' strings floating around waiting to be cleaned up.
An MVC scenario would basically 'know' that it takes X number of entries to fill the grid's viewable area. It would only query X at a time starting from index Y and as the user navigated it would query and display additional records as needed. That's a heck of a lot better than querying all 10 million and letting them hang out somewhere whether they're displayed or not.
Like I said, I may very well be over-analysing this. I might also be incorrect in some of my assumptions with the way linq works. But in the interest of learning I figured I had to ask: What is the best way to do something like this? Is this sort of thing ok for small datasets? Would the whole thing be better off as an MCV implementation rather than pulling in the entire dataset to be displayed in the grid?

If you need the whole set of data in memory - you will have to load it anyway. I am sure you will not list 10kk users in a grid, right? The techniques that comes up is paging. Check this article from msdn with examples.
As for departments objects, does your UserInfo has a foreign key to the department? If so you should just have userInfo.Department available to you and no joins are needed.
If you bind the department data to the grid columns, why having the property of Department type? I assume your User class is something you bind to UI. Flatten it out into:
class User
{
Username
UserIndex
FirstName
LastName
DepartmentId
DepartmentName
}
What is the purpose of GetAll()? You return a dictionary and it feels like you need to enable lookups by id. Or do you use the result to enumerate the users?
For lookups, consider talking to the database to get you a single user data when needed. Implement caching if makes sense next.
For enumeration, do not return dictionary - that is all-in-memory object, return IEnumerable with yielded (paged?) results or even better IQueryable so that calling GetAll() doesn't execute the sql call right away, and the calling code can scope the call down by adding necessary filters

Performace issue using Foreach in LINQ

I am using an IList<Employee> where i get the records more then 5000 by using linq which could be better? empdetailsList has 5000
Example :
foreach(Employee emp in empdetailsList)
{
Employee employee=new Employee();
employee=Details.GetFeeDetails(emp.Emplid);
}
The above example takes a lot of time in order to iterate each empdetails where i need to get corresponding fees list.
suggest me anybody what to do?

Linq to SQL/Linq to Entities use a deferred execution pattern. As soon as you call For Each or anything else that indirectly calls GetEnumerator, that's when your query gets translated into SQL and performed against the database.
The trick is to make sure your query is completely and correctly defined before that happens. Use Where(...), and the other Linq filters to reduce as much as possible the amount of data the query will retrieve. These filters are built into a single query before the database is called.
Linq to SQL/Linq to Entities also both use Lazy Loading. This is where if you have related entities (like Sales Order --> has many Sales Order Lines --> has 1 Product), the query will not return them unless it knows it needs to. If you did something like this:
Dim orders = entities.SalesOrders
For Each o in orders
For Each ol in o.SalesOrderLines
Console.WriteLine(ol.Product.Name)
Next
Next
You will get awful performance, because at the time of calling GetEnumerator (the start of the For Each), the query engine doesn't know you need the related entities, so "saves time" by ignoring them. If you observe the database activity, you'll then see hundreds/thousands of database roundtrips as each related entity is then retrieved 1 at a time.
To avoid this problem, if you know you'll need related entities, use the Include() method in Entity Framework. If you've got it right, when you profile the database activity you should only see a single query being made, and every item being retrieved by that query should be used for something by your application.

If the call to Details.GetFeeDetails(emp.Emplid); involves another round-trip of some sort, then that's the issue. I would suggest altering your query in this case to return fee details with the original IList<Employee> query.

Can I use the auto-generated Linq-to-SQL entity classes in 'disconnected' mode?

Suppose I have an automatically-generated Employee class based on the Employees table in my database.
Now suppose that I want to pass employee data to a ShowAges method that will print out name & age for a list of employees. I'll retrieve the data for a given set of employees via a linq query, which will return me a set of Employee instances. I can then pass the Employee instances to the ShowAges method, which can access the Name & Age fields to get the data it needs.
However, because my Employees table has relationships with various other tables in my database, my Employee class also has a Department field, a Manager field, etc. that provide access to related records in those other tables. If the ShowAges method were to invoke any of those methods, this would cause lots more data to be fetched from the database, on-demand.
I want to be sure that the ShowAges method only uses the data I have already fetched for it, but I really don't want to have to go to the trouble of defining a new class which replicates the Employee class but has fewer methods. (In my real-world scenario, the class would have to be considerably more complex than the Employee class described here; it would have several 'joined' classes that do need to be populated, and others that don't).
Is there a way to 'switch off' or 'disconnect' the Employees instances so that an attempt to access any property or related object that's not already populated will raise an exception?
If not, then I assume that since this must be a common requirement, there might be an already-established pattern for doing this sort of thing?

maybe not the answer you're looking for,but how about projecting the results of your query into a more light-weight POCO, eg:
var employeePOCOs = from e in l2sEmployees
select new EmployeePOCO
{
Id = e.Id,
Name = e.FirstName + " " + e.LastName
};
where EmployeePOCO is a predefined class
would that help? I've used this when returning Entity Framework objects back through an AJAX call where the output was going to JSON, and it seemed to do the trick.

One way to do this is to 'detach' the entity from its database context. Take a look at an answer I gave to a similar question. It shows you a couple ways of detaching entities.

Using Linq SubmitChanges without TimeStamp and StoredProcedures the same time

I am using Sql tables without rowversion or timestamp. However, I need to use Linq to update certain values in the table. Since Linq cannot know which values to update, I am using a second DataContext to retrieve the current object from database and use both the database and the actual object as Input for the Attach method like so:
Public Sub SaveCustomer(ByVal cust As Customer)
Using dc As New AppDataContext()
If (cust.Id > 0) Then
Dim tempCust As Customer = Nothing
Using dc2 As New AppDataContext()
tempCust = dc2.Customers.Single(Function(c) c.Id = cust.Id)
End Using
dc.Customers.Attach(cust, tempCust)
Else
dc.Customers.InsertOnSubmit(cust)
End If
dc.SubmitChanges()
End Using
End Sub
While this does work, I have a problem though: I am also using StoredProcedures to update some fields of Customer at certain times. Now imagine the following workflow:
Get customer from database
Set a customer field to a new value
Use a stored procedure to update another customer field
Call SaveCustomer
What happens now, is, that the SaveCustomer method retrieves the current object from the database which does not contain the value set in code, but DOES contain the value set by the stored procedure. When attaching this with the actual object and then submit, it will update the value set in code also in the database and ... tadaaaa... set the other one to NULL, since the actual object does not contain the changed made by the stored procedure.
Was that understandable?
Is there any best practice to solve this problem?

If you make changes behind the back of the ORM, and don't use concurrency checking - then you are going to have problems. You don't show what you did in step "3", but IMO you should update the object model to reflect these changes, perhaps using OUTPUT TSQL paramaters. Or; stick to object-oriented.
Of course, doing anything without concurrency checking is a good way to lose data - so my preferred option is simply "add a rowversion". Otherwise, you could perhaps read the updated object out and merge things... somehow guessing what the right data is...

If you're going to disconnect your object from one context and use another one for the update, you need to either retain the original object, use a row version, or implement some sort of hashing routine in your database and retain the hash as part of your object. Of these, I highly recommend the Rowversion option as well. Using the current value as the original value like you are trying to do is only asking for concurrency problems.

Using LINQ with stored procedure that returns multiple instances of the same entity per row

Our development policy dictates that all database accesses are made via stored procedures, and this is creating an issue when using LINQ.
The scenario discussed below has been somewhat simplified, in order to make the explanation easier.
Consider a database that has 2 tables.
Orders (OrderID (PK), InvoiceAddressID (FK), DeliveryAddressID (FK) )
Addresses (AddresID (PK), Street, ZipCode)
The resultset returned by the stored procedure has to rename the address related columns, so that the invoice and delivery addresses are distinct from each other.
OrderID InvAddrID DelAddrID InvStreet DelStreet InvZipCode DelZipCode
1 27 46 Main St Back St abc123 xyz789
This, however, means that LINQ has no idea what to do with these columns in the resultset, as they no longer match the property names in the Address entity.
The frustrating thing about this is that there seems to be no way to define which resultset columns map to which Entity properties, even though it is possible (to a certain extent) to map entity properties to stored procedure parameters for the insert/update operations.
Has anybody else had the same issue?
I'd imagine that this would be a relatively common scenarios, from a schema point of view, but the stored procedure seems to be the key factor here.

Have you considered creating a view like the below for the stored procedure to select from? It would add complexity, but allow LINQ to see the Entity the way you wanted.
Create view OrderAddress as
Select o.OrderID
,i.AddressID as InvID
,d.AddressID as DelID
...
from Orders o
left join Addresses i
on o.InvAddressID= i.AddressID
left join Addresses d
on o.DelAddressID = i.AddressID

LINQ is a bit fussy about querying data; it wants the schema to match. I suspect you're going to have to bring that back into an automatically generated type, and do the mapping to you entity type afterwards in LINQ to objects (i.e. after AsEnumerable() or similar) - as it doesn't like you creating instances of the mapped entities manually inside a query.
Actually, I would recommend challenging the requirement in one respect: rather than SPs, consider using UDFs to query data; they work similarly in terms of being owned by the database, but they are composable at the server (paging, sorting, joinable, etc).
(this bit a bit random - take with a pinch of salt)
UDFs can be associated with entity types if the schema matches, so another option (I haven't tried it) would be to have a GetAddress(id) udf, and a "main" udf, and join them:
var qry = from row in ctx.MainUdf(id)
select new {
Order = ctx.GetOrder(row.OrderId),
InvoiceAddress = ctx.GetAddress(row.InvoiceAddressId),
DeliveryAddress = ctx.GetAddress(row.DeliveryAddressId)) };
(where the udf just returns the ids - actually, you might have the join to the other udfs, making it even worse).
or something - might be too messy for serious consideration, though.

If you know exactly what columns your result set will include, you should be able to create a new entity type that has properties for each column in the result set. Rather than trying to pack the data into an Order, for example, you can pack it into an OrderWithAddresses, which has exactly the structure your stored procedure would expect. If you're using LINQ to Entities, you should even be able to indicate in your .edmx file that an OrderWithAddresses is an Order with two additional properties. In LINQ to SQL you will have to specify all of the columns as if it were an entirely unrelated data type.
If your columns get generated dynamically by the stored procedure, you will need to try a different approach: Create a new stored procedure that only pulls data from the Orders table, and one that only pulls data from the addresses table. Set up your LINQ mapping to use these stored procedures instead. (Of course, the only reason you're using stored procs is to comply with your company policy). Then, use LINQ to join these data. It should be only slightly less efficient, but it will more appropriately reflect the actual structure of your data, which I think is better programming practice.

I think I understand what you're after, but I could wildy off...
If you mock up classes in a DBML (right-click -> new -> class) that are the same structure as your source tables, you could simply create new objects based on what is read from the stored procedure. Using LINQ to objects, you could still query your selection. It's more code, but it's not that hard to do. For example, mock up your DBML like this:
Pay attention to the associations http://geeksharp.com/screens/orders-dbml.png
Make sure you pay attention to the associations I added. You can expand "Parent Property" and change the name of those associations to "InvoiceAddress" and "DeliveryAddress." I also changed the child property names to "InvoiceOrders" and "DeliveryOrders" respectively. Notice the stored procedure up top called "usp_GetOrders." Now, with a bit of code, you can map the columns manually. I know it's not ideal, especially if the stored proc doesn't expose every member of each table, but it can get you close:
public List<Order> GetOrders()
{
// our DBML classes
List<Order> dbOrders = new List<Order>();
using (OrderSystemDataContext db = new OrderSystemDataContext())
{
// call stored proc
var spOrders = db.usp_GetOrders();
foreach (var spOrder in spOrders)
{
Order ord = new Order();
Address invAddr = new Address();
Address delAddr = new Address();
// set all the properties
ord.OrderID = spOrder.OrderID;
// add the invoice address
invAddr.AddressID = spOrder.InvAddrID;
invAddr.Street = spOrder.InvStreet;
invAddr.ZipCode = spOrder.InvZipCode;
ord.InvoiceAddress = invAddr;
// add the delivery address
delAddr.AddressID = spOrder.DelAddrID;
delAddr.Street = spOrder.DelStreet;
delAddr.ZipCode = spOrder.DelZipCode;
ord.DeliveryAddress = delAddr;
// add to the collection
dbOrders.Add(ord);
}
}
// at this point I have a List of orders I can query...
return dbOrders;
}
Again, I realize this seems cumbersome, but I think the end result is worth a few extra lines of code.

this it isn't very efficient at all, but if all else fails, you could try making two procedure calls from the application one to get the invoice address and then another one to get the delivery address.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio