How can I speed up IEnumerable<T> database access - linq

I am using EF6 code first to execute a query that pulls a large amount of data for parallelized GPU processing. The linq query returns an IEnumerable.
IEnumerable<DatabaseObject> results = ( from items in _myContext.DbSet
select items).Include("Table1").Include("Table2");
Now, I need to perform some statistical analysis on the complete set of data, and present the result to the user.
Unfortunately, because of the sheer size of the returned data, just calling results.ToList() takes an extremely long time to complete, and I haven't even begun the parallelized processing of the data yet!
Is there anything I can do to make this more efficient, other than reducing the amount of data being pulled? That is not an option, since the complete set of data needs to be processed.
EDIT 1
My current code first is as follows:
public class Orders
{
    [Key]
    public virtual DateTime ServerTimeId { get; set; }
    public string Seller { get; set; }
    public decimal Price { get; set; }
    public decimal Volume { get; set; }
    public List<Table1> Tables1 { get; set; }
    public List<Table2> Tables2 { get; set; }
}
Although my query speeds up significantly without .Include, if I do not use .Include("Tables1").Include("Tables2") these fields are null in the final result for this query:
var result = ( from items in _context.DbOrders
select items ).Include("Tables1").Include("Tables2");
In my DbContext, I have defined:
public DbSet<Orders> DbOrders { get; set; }
If there is a way to force EF6 to populate these tables without the use of .Include, then I'd be very pleased if someone could instruct me.

You can load the main table, DbOrders and the child tables separately into the context:
_myContext.Configuration.ProxyCreationEnabled = false;
_myContext.DbOrders.Load();
_myContext.Table1.Load();
_myContext.Table2.Load();
Now the context is fully charged with the data you need. I hope you won't run into an out of memory exception (because then the whole approach collapses).
Entity Framework executes relationship fixup, which means that it populates the navigation properties DbOrders.Table1 and DbOrders.Table2.
There are two reasons for disabling proxy creation:
The materialized objects will be as lightweight as possible.
Lazy loading is disabled; otherwise it would be triggered whenever you access a navigation property.
Now you can continue working with the data by accessing the Local collection:
from entity in _myContext.DbOrders.Local
...
You can try to speed up the process further by unmapping all database fields that you don't need. This makes the SQL result sets smaller and the materialized objects even lighter. To achieve that, you may have to create a dedicated context.

Related

.NET Core - .Include() results in loop causing Visual Studio debug to crash

My database has a one-to-many relation between "UsageRecord" and "Dimension"
This is modelled as follows (using a Database-First approach):
public partial class Dimension
{
...
public virtual ICollection<UsageRecord> UsageRecord { get; set; }
}
Usage Record class:
public partial class UsageRecord
{
public long Id { get; set; }
...
public long DimensionId { get; set; }
public virtual Dimension Dimension { get; set; }
}
So, if I query the list of UsageRecords (EagerLoading):
_context.Set<UsageRecord>().Where(x => x.ProductId == productId).ToList()
I get a list of UsageRecord objects that I can navigate through while debugging.
Please notice that the Dimension object is null, and this is correct since I haven't included it in the query.
Now, if I try to include it, the application crashes:
_context.Set<UsageRecord>().Where(x => x.ProductId == productId).Include(p => p.Dimension).ToList();
Postman exits with a 502 error, and the VS Debug first shows a list of question marks "?" before crashing.
I think this is due to the fact that by Including the Dimension object, this loops through the list of UsageRecords attached and then the Dimension again and again.
How can I avoid it?
To return the result of your LINQ query without the crash, you can do one of the following:
Configure your serializer to ignore loops
Create a view model for your controller's action
Use anonymous type from Select result in your controller's action
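As a rough illustration of the first two options, here is a minimal sketch. It assumes .NET 6 or later, where System.Text.Json provides ReferenceHandler.IgnoreCycles; the Name property on Dimension, the UsageRecordView class, and the sample data are hypothetical and only serve the example. In an ASP.NET Core application the same option would normally be set on the JSON options of the MVC pipeline rather than by calling the serializer directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Dimension
{
    public long Id { get; set; }
    public string Name { get; set; }   // hypothetical property, not in the original model
    public List<UsageRecord> UsageRecord { get; set; } = new List<UsageRecord>();
}

public class UsageRecord
{
    public long Id { get; set; }
    public long DimensionId { get; set; }
    public Dimension Dimension { get; set; }
}

// Hypothetical view model: flat, with no reference back to the parent.
public class UsageRecordView
{
    public long Id { get; set; }
    public string DimensionName { get; set; }
}

public static class CycleDemo
{
    // Builds two records that point at one Dimension, which points back at them.
    public static List<UsageRecord> BuildSample()
    {
        var dim = new Dimension { Id = 10, Name = "Region" };
        var r1 = new UsageRecord { Id = 1, DimensionId = 10, Dimension = dim };
        var r2 = new UsageRecord { Id = 2, DimensionId = 10, Dimension = dim };
        dim.UsageRecord.Add(r1);
        dim.UsageRecord.Add(r2);
        return new List<UsageRecord> { r1, r2 };
    }

    // Option 1: tell the serializer to write null where it would re-enter an object.
    public static string SerializeIgnoringCycles(List<UsageRecord> records)
    {
        var options = new JsonSerializerOptions
        {
            ReferenceHandler = ReferenceHandler.IgnoreCycles
        };
        return JsonSerializer.Serialize(records, options);
    }

    // Option 2: project to a view model so there is no cycle to serialize.
    public static List<UsageRecordView> ToViews(IEnumerable<UsageRecord> records)
    {
        return records
            .Select(r => new UsageRecordView { Id = r.Id, DimensionName = r.Dimension?.Name })
            .ToList();
    }

    public static void Main()
    {
        var records = BuildSample();
        Console.WriteLine(SerializeIgnoringCycles(records));
        Console.WriteLine(JsonSerializer.Serialize(ToViews(records)));
    }
}
```

The view model route tends to be the more robust of the two, since the shape of the response no longer depends on which navigation properties happen to be loaded.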

Linq Pivot Query on Column with Unknown Values

Say I have the following class:
public class Sightings
{
public string CommonName { get; set; }
public string ScientificName { get; set; }
public string TimePeriod { get; set; }
public bool Seen { get; set; }
}
I would like to create a pivot query on TimePeriod. If I knew what the values in TimePeriod were ahead of time I would know how to write the query. However, I don't know. They could be years (e.g., 2007, 2008, 2009 etc.) or they could be months (e.g., Jan-2004, Feb-2004, March-2004) or they could be quarters (e.g., Q1-2004, Q2-2005, Q3-2004 etc.). It really depends on how the user wants to view the data.
Is it possible to write a Linq query that can do this, when the values in the pivot column are not known?
I've considered writing the values to an Excel sheet, creating a pivot sheet and then reading the values out of the sheet. I may have to do that if a Linq query won't work.
Do you know the most granular values that the user can choose?
If, for example, it's months, then you can simply write a query for months (see Is it possible to Pivot data using LINQ?).
If the user then chooses years, it should be simple to aggregate those values to years.
If they want quarters, then you can aggregate to quarters, and so on.
It may not be the best or most generic solution, but it should get the job done quite easily.
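The idea above can be sketched in LINQ to Objects without knowing the pivot values in advance: group by species, then group again by whatever bucket the user picked, and discover the column headers at run time. This is only a sketch under assumptions: the bucket function, the "Mon-YYYY" format of TimePeriod, and the sample data are hypothetical, and a cell is treated as seen if any sighting in that bucket was seen.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sightings
{
    public string CommonName { get; set; }
    public string ScientificName { get; set; }
    public string TimePeriod { get; set; }
    public bool Seen { get; set; }
}

public static class PivotDemo
{
    // One row per species; each row maps a period bucket to a "seen" flag.
    // The bucket function turns the stored (most granular) period into the
    // period the user wants to view, e.g. "Jan-2004" -> "2004".
    public static Dictionary<string, Dictionary<string, bool>> Pivot(
        IEnumerable<Sightings> data, Func<string, string> bucket)
    {
        return data
            .GroupBy(s => s.CommonName)
            .ToDictionary(
                g => g.Key,
                g => g.GroupBy(s => bucket(s.TimePeriod))
                      .ToDictionary(p => p.Key, p => p.Any(s => s.Seen)));
    }

    public static void Main()
    {
        var data = new List<Sightings>
        {
            new Sightings { CommonName = "Robin", TimePeriod = "Jan-2004", Seen = true },
            new Sightings { CommonName = "Robin", TimePeriod = "Feb-2004", Seen = false },
            new Sightings { CommonName = "Robin", TimePeriod = "Jan-2005", Seen = false },
            new Sightings { CommonName = "Wren",  TimePeriod = "Feb-2004", Seen = true },
        };

        // Assumed format "Mon-YYYY": aggregate months to years.
        Func<string, string> toYear = p => p.Split('-')[1];

        // Column headers are discovered from the data, not hard-coded.
        var columns = data.Select(s => toYear(s.TimePeriod)).Distinct().ToList();
        var rows = Pivot(data, toYear);

        Console.WriteLine("Name\t" + string.Join("\t", columns));
        foreach (var row in rows)
        {
            var cells = columns.Select(
                c => row.Value.TryGetValue(c, out var seen) && seen ? "Y" : "-");
            Console.WriteLine(row.Key + "\t" + string.Join("\t", cells));
        }
    }
}
```

Passing p => p as the bucket keeps the data at month level, so the same method covers each granularity the user might choose.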

NHibernate Many-To-Many Performance Issue

My application has the following entities (with a many-to-many relationship between Product and Model):
public class TopProduct {
public virtual int Id { get; set; }
public virtual Product Product { get; set; }
public virtual int Order { get; set; }
}
public class Product {
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual IList<Model> Models { get; set; }
}
public class Model {
public virtual string ModelNumber { get; set; }
public virtual IList<Product> Products { get; set; }
}
Note: A product could have 1000s of models.
I need to display a list of TopProducts and the first 5 models (ordered alphabetically) against each one.
For example say I have the following query:
var topProducts = session.Query<TopProduct>()
.Cacheable()
.Fetch(tp => tp.Product).ThenFetchMany(p => p.Models)
.OrderBy(tp => tp.Order)
.ToList();
If I now say:
foreach (var topProduct in topProducts) {
var models = topProduct.Product.Models.Take(5).ToList();
...
}
This executes extremely slowly as it retrieves an item from the second level cache for each model. Since there could be 1000s of models against a product, it would need to retrieve 1000s of items from the cache the second time it is executed.
I have been racking my brain trying to think of a better way of doing this but so far I am out of ideas. Unfortunately my model and database cannot be modified at this stage.
I'd appreciate the help. Thanks
The key to your problem is understanding how entity and query caching work.
Entity caching stores, essentially, the POID of an entity and its property values.
When you want to get/initialize an instance, NH will first check the cache to see if the values are there, in order to avoid a db query.
Query caching, on the other hand, stores a query as the key (to simplify, let's say it's the command text and the parameter values), and a list of entity ids as the value (this is assuming your result is a list of entities, and not a projection)
When NH executes a cacheable query, it will see if the results are cached. If they are, it will load the proxies from those ids. Then, as you use them, it will initialize them one by one, either from the entity cache or from the db.
Collection cache is similar.
Usually, getting many second-level cache hits for those entity loads is a good thing. Unless, of course, you are using a distributed cache located in a separate machine, in which case this is almost as bad as getting them from the db.
If that is the case, I suggest you skip caching the query.

List vs IEnumerable vs IQueryable when defining Navigation property

I want to create a new model object named Movie_Type in my ASP.NET MVC web application. What are the differences if I define the navigation property of this class as List, IQueryable or ICollection, as follows?
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public List<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public IQueryable<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public ICollection<Movie> Movies { get; set; }
}
Edit:-
@Tomas Petricek.
Thanks for your reply. In my case I am using the database-first approach, and I use the DbContext template to map my tables, which automatically created ICollection for all the navigation properties. So my questions are:
1. Does this mean that ICollection is not always the best choice, and that I should change the automatically generated classes to best fit my case?
2. I can choose between lazy and eager loading by using .Include, such as
var courses = db.Courses.Include(c => c.Department);
regardless of what type I use to define the navigation properties. So I do not understand your point.
3. I have never found any examples or tutorials that use IQueryable to define navigation properties, so what might be the reason?
BR
You cannot use a navigation property of type IQueryable<T>. You must use ICollection<T> or some collection type which implements ICollection<T> - like List<T>. (IQueryable<T> does not implement ICollection<T>.)
The navigation property is simply an object or a collection of objects in memory or it is null or the collection is empty.
It is never loaded from the database when you load the parent object which contains the navigation property from the database.
You either have to say explicitly that you want to load the navigation property together with the parent, which is eager loading:
var movieTypes = context.Movie_Types.Include(m => m.Movies).ToList();
// no option to filter or sort the movies collection here.
// It will always load the full collection into memory
Or it will be loaded by lazy loading (which is enabled by default if your navigation property is virtual):
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
// one new database query as soon as you access properties of mt.Movies
foreach (var m in mt.Movies)
{
Console.WriteLine(m.Title);
}
}
The last option is explicit loading which comes closest to your intention I guess:
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
IQueryable<Movie> mq = context.Entry(mt).Collection(m => m.Movies).Query();
// You can use this IQueryable now to apply more filters
// to the collection or sorting, for example:
mq.Where(m => m.Title.StartsWith("A")) // filter by title
.OrderBy(m => m.PublishDate) // sort by date
.Take(10) // take only the first ten of result
.Load(); // populate now the nav. property
// again this was a database query
foreach (var m in mt.Movies) // contains only the filtered movies now
{
Console.WriteLine(m.Title);
}
}
There are two possible ways of looking at things:
Is the result stored in memory as part of the object instance?
If you choose ICollection, the result will be stored in memory - this may not be a good idea if the data set is very large or if you don't always need to get the data. On the other hand, when you store the data in memory, you will be able to modify the data set from your program.
Can you refine the query that gets sent to the SQL server?
This means that you would be able to use LINQ over the returned property and the additional LINQ operators would be translated to SQL - if you don't choose this option, additional LINQ processing will run in memory.
If you want to store data in memory, then you can use ICollection. If you want to be able to refine the query, then you need to use IQueryable. Here is a summary table:
| | Refine query | Don't change query |
|-----------------|--------------|--------------------|
| In-memory | N/A | ICollection |
| Lazy execution | IQueryable | IEnumerable |
IEnumerable is more of a standard, as it is the lowest common denominator.
IQueryable can be returned if you want to give the caller extra querying functionality without writing ten repository methods to handle varying query scenarios.
A downside is that Count() on an IEnumerable can be slow, because it may have to enumerate every item; if the object implements ICollection, however, that interface is checked first and its Count is used without enumerating.
Also be aware that if you return IQueryable to an untrusted caller, they can do some casting and method calls on it and get access to the context, connection and connection string, run queries, etc.
Also note that NHibernate, for example, has a query object you can pass to a repository to specify options. With Entity Framework you need to return IQueryable to let the caller refine the query criteria.
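The remark about Count() can be checked with a small LINQ to Objects experiment. This is only a sketch; the Lazy iterator and the Enumerated counter are hypothetical helpers used to make the enumeration visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Number of items the lazy sequence has produced so far.
    public static int Enumerated;

    // A lazily evaluated sequence: nothing runs until it is enumerated.
    public static IEnumerable<int> Lazy(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Enumerated++;
            yield return i;
        }
    }

    public static void Main()
    {
        // Typed as IEnumerable<int>, but the object is a List<int>, which
        // implements ICollection<int>, so Count() reads its Count property.
        IEnumerable<int> list = new List<int> { 1, 2, 3 };
        Console.WriteLine(list.Count());

        // A plain iterator has no Count property, so Count() must walk it.
        Enumerated = 0;
        Console.WriteLine(Lazy(3).Count());
        Console.WriteLine(Enumerated); // every item was produced just to count them
    }
}
```

With an IQueryable backed by a database the situation differs again: Count() is translated into a query that runs on the server rather than enumerating rows in memory.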
The collection that entity framework actually creates for you if you use virtual navigation properties implements ICollection, but not IQueryable, so you cannot use IQueryable for your navigation properties, as Slauma says.
You are free to define your properties as IEnumerable, as ICollection extends IEnumerable, but if you do this then you will lose your ability to add new child items to these navigation properties.

Beginner EF4 / CodeFirst / MVC3 help

Although I love what I'm learning, I'm finding it a struggle and need some help
I've been using these two tutorials which I think are awesome:
http://weblogs.asp.net/scottgu/archive/2010/07/16/code-first-development-with-entity-framework-4.aspx
http://msdn.microsoft.com/en-us/data/gg685467
Currently my main problem/confusion is:
I have a CodeFirst table/entity, and I don't know how to correctly get data from other tables/entities to show in my views:
public class Car {
public int ID { get; set; }
public string Name { get; set; }
public int EngineID { get; set; }
public virtual Engine Engine { get; set; }
}
public class Engine {
public int ID { get; set; }
public string Name { get; set; }
public string Manufacturer { get; set; }
// (plus a whole lot of other things)
}
Now when I create a View for Cars (using the List type/option) I get a nice autogenerated list
@foreach (var item in Model) {
<tr>
<td>@item.ID</td>
<td>@item.Name</td>
<td>@item.EngineID</td>
</tr>
}
Perfect... except EngineID is mostly worthless to the viewer, and I want to show Engine.Name instead
So I assumed I could use EF lazy loading:
<td>@item.Engine.Name</td>
Unfortunately when I tried that, it says my ObjectContext has been disposed so can't get any further data requiring a connection
Then I tried going to the controller and including the Engine.Name
var cars = (from c in db.Cars.Include("Engine.Name") select c);
Which tells me: Entities.Engine does not declare a navigation property with the name 'Name'
... ? Lies
Include("Engine") works fine, but all I want is the Name, and Include("Engine") is loading a large amount of things I don't want
Previously in a situation like this I have created a view in the DB for Car that includes EngineName as well. But with CodeFirst and my noobness I haven't found a way to do this
How should I be resolving this issue?
I thought perhaps I could create a Model pretty much identical to the Car entity, but add Engine.Name to it. This would be nice as I could then reuse it in multiple places, but I am at a loss on how to populate it etc
Wanting to learn TDD as well but the above is already frustrating me :p
Ps any other tutorial links or handy things to read will be greatly appreciated
It isn't lying: you are trying to include a property that is a second level down without giving EF a way to navigate to it. Name is a scalar property of Engine, not a navigation property, so it cannot be passed to Include. If you let EF generate your DB with this structure, it would likely have made a navigation table called something like Car_Engine, and if you include the name without the object it HAS mapped, then there is no navigation property for it in your new object.
The simple way around this is to go:
(from c in db.Cars.Include("Engine") select new { c, EngineName = c.Engine.Name })
If you still get navigation property errors, then you might need to make sure you are mapping to your schema correctly. This can be done with EntityTypeConfiguration classes using the fluent API, which is very powerful.
This of course won't help in strongly typing your car object to show in MVC.
If you'd like to get around this, your gut feeling is right. It's pretty common to use viewmodels that are read only (by design, not necessarily set to readonly) classes that provide simple views of your data.
Personally I keep my model quite clean and then have another project with viewmodels and a presentation project to populate them. I'd avoid using overlapping entities in your core model, as it might lead to unpredictable behaviour in the data context and, at the least, a persistence nightmare when updating multiple entities (i.e. who's responsible for updating the engine name?).
Using your viewmodels, you can have a class called CarSummaryView or something similar with only the data you want on it. This also solves the issue of being vulnerable to overposting or underposting on your site. It can be populated by the above query quite easily.
PS There's a bunch of advantages to using viewmodels beyond just not loading full hierarchies. One of the biggest is the intrinsic benefit of avoiding over- and underposting scenarios.
There's loads of ways to implement viewmodels, but as a simple CarView example:
public class CarView
{
public int ID { get; set; }
public string Name { get; set; }
public string EngineName { get; set; }
}
This should be clearly separated from your entity model. In a big project, you'd have a single viewmodels project that the presenter can return, but in a smaller one you just need them in the same layer as the service code.
To populate it directly from the query, you can do the following (no Include is needed, because the projection selects only the columns it uses):
List<CarView> cars = (from c in db.Cars select new CarView { ID = c.ID, Name = c.Name, EngineName = c.Engine.Name }).ToList();
