NHibernate bulk loading child objects for multiple parents - performance

Let's say we have a Category table and a Product table. Each Product references a Category, so each Category has many Products.
I want to load many Categories without their Products (to decrease DB access time), then check which Categories we actually need and narrow them down to a much smaller subset.
After this I need to load all Products for the selected Categories and attach them to the Categories with a single DB query.
I can load the Products separately, but in that case they will not be attached to the Categories.

This can be achieved with HQL and futures
given the entities and maps as follows,
public class Category
{
    private IList<Product> _products;
    public Category()
    {
        _products = new List<Product>();
    }
    public virtual int Id { get; set; }
    public virtual string CategoryName { get; set; }
    public virtual IList<Product> Products
    {
        get { return _products; }
        set { _products = value; }
    }
}
public class CategoriesClassMap : ClassMap<Category>
{
    public CategoriesClassMap()
    {
        Table("Categories");
        Id(x => x.Id).GeneratedBy.Native();
        Map(x => x.CategoryName);
        HasMany<Product>(c => c.Products).LazyLoad();
    }
}
public class Product
{
    public virtual int Id { get; set; }
    public virtual string ProductName { get; set; }
    public virtual Category Category { get; set; }
}
public class ProductSClassMap : ClassMap<Product>
{
    public ProductSClassMap()
    {
        Table("Products");
        Id(x => x.Id).GeneratedBy.Native();
        Map(x => x.ProductName);
        References<Category>(x => x.Category).Not.Nullable();
    }
}
With the following HQL, the categories and their products are loaded in a single query:
var categories = session.CreateQuery("from Category c join fetch c.Products where c.Id in (1,2)")
.Future<Category>().Distinct().ToList();
It only fetches data related to category ids 1 and 2. The generated SQL looks like this:
select category0_.Id as Id1_0_, products1_.Id as Id3_1_, category0_.CategoryName as Category2_1_0_, products1_.ProductName as ProductN2_3_1_, products1_.Category_id as Category3_3_1_, products1_.Category_id as Category3_0__, products1_.Id as Id0__ from Categories category0_ inner join Products products1_ on category0_.Id=products1_.Category_id where category0_.Id in (1 , 2);
The same approach (using futures) is applicable to QueryOver or Criteria.

A solution for this is built into NHibernate natively. It is called:
19.1.5. Using batch fetching
NHibernate can make efficient use of batch fetching, that is, NHibernate can load several uninitialized proxies if one proxy is accessed (or collections). Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can tune batch fetching: on the class and the collection level.
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation at runtime: You have 25 Cat instances loaded in an ISession, each Cat has a reference to its Owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call cat.Owner on each, NHibernate will by default execute 25 SELECT statements, to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
NHibernate will now execute only three queries, the pattern is 10, 10, 5.
You may also enable batch fetching of collections. For example, if each Person has a lazy collection of Cats, and 10 persons are currently loaded in the ISession, iterating through all persons will generate 10 SELECTs, one for every call to person.Cats. If you enable batch fetching for the Cats collection in the mapping of Person, NHibernate can pre-fetch collections:
<class name="Person">
<set name="Cats" batch-size="3">
...
</set>
</class>
SUMMARY: There is an optimization mapping setting: batch-size="25".
We can use it on the class level (it is then applied to many-to-one relations pointing at that class) or on collections (directly on the one-to-many relation).
This leads to very few SELECT statements being needed to load a complex object graph. The most important benefit is that we can use paging (Take(), Skip()) when we query the root entity, because the result contains no multiplied rows.
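The "10, 10, 5" pattern quoted above is plain chunking arithmetic. The following is a minimal sketch of that arithmetic only; BatchFetchMath and BatchPattern are hypothetical names, no NHibernate API is involved, and NHibernate itself may choose the individual batch sizes differently.

```csharp
using System;
using System.Collections.Generic;

public static class BatchFetchMath
{
    // Splits `pending` uninitialized proxies into SELECTs that each
    // load at most `batchSize` of them (simplified model, an assumption).
    public static List<int> BatchPattern(int pending, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var pattern = new List<int>();
        while (pending > 0)
        {
            int take = Math.Min(batchSize, pending);
            pattern.Add(take);
            pending -= take;
        }
        return pattern;
    }

    public static void Main()
    {
        // 25 cats, batch-size="10" on Person: three SELECTs.
        Console.WriteLine(string.Join(", ", BatchPattern(25, 10))); // prints "10, 10, 5"
        // Without batching every proxy costs its own SELECT.
        Console.WriteLine(BatchPattern(25, 1).Count);               // prints "25"
    }
}
```

Under this model the number of round trips drops from N to roughly N divided by the batch size, while each root row still appears exactly once in the root query.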
Check also this, with even some more links...

Related

How can I speedup IEnumerable<T> database access

I am using EF6 code first to execute a query that pulls a large amount of data for parallelized GPU processing. The linq query returns an IEnumerable.
IEnumerable<DatabaseObject> results = ( from items in _myContext.DbSet
select items).Include("Table1").Include("Table2");
Now, I need to perform some statistical analysis on the complete set of data, and present the result to the user.
Unfortunately, because of the sheer size of the returned data, just doing a
results.ToList() is taking an extremely long time to complete... and I haven't even begun the parallelized processing of the data as yet!
Is there anything that I can do to make this more efficient other than reducing the amount of data being pulled? Reducing it is not an option, since it is the complete set of data that needs to be processed.
EDIT 1
My current code first is as follows:
public class Orders
{
    [Key]
    public virtual DateTime ServerTimeId { get; set; }
    public string Seller { get; set; }
    public decimal Price { get; set; }
    public decimal Volume { get; set; }
    public List<Table1> Tables1 { get; set; }
    public List<Table2> Tables2 { get; set; }
}
Although my query speeds up significantly when I do not use .Include, if I do not use .Include("Tables1").Include("Tables2") these fields are null
in the final result for this query:
var result = ( from items in _context.DbOrders
select items ).Include("Tables1").Include("Tables2");
In my DbContext, I have defined:
public DbSet<Orders> DbOrders { get; set; }
If there is a way to force EF6 to populate these tables without the use of .Include, then I'd be very pleased if someone could instruct me.
You can load the main table, DbOrders and the child tables separately into the context:
_myContext.Configuration.ProxyCreationEnabled = false;
_myContext.DbOrders.Load();
_myContext.Table1.Load();
_myContext.Table2.Load();
Now the context is fully charged with the data you need. I hope you won't run into an out of memory exception (because then the whole approach collapses).
Entity Framework executes relationship fixup, which means that it populates the navigation properties DbOrders.Table1 and DbOrders.Table2.
Proxy creation is disabled for two reasons:
The materialized objects will be as light-weight as possible.
Lazy loading is disabled; otherwise it would be triggered when you access a navigation property.
Now you can continue working with the data by accessing the Local collection:
from entity in _myContext.DbOrders.Local
...
You can further try to speed up the process by unmapping all database fields that you don't need. This makes the SQL result sets smaller and the materialized objects even lighter. To achieve that, you may have to create a dedicated context.
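Conceptually, relationship fixup stitches separately loaded rows together by their foreign key. The following in-memory sketch shows that idea only; OrderRow, DetailRow, their properties and Fixup.Attach are hypothetical names, and Entity Framework's own fixup is internal and works differently in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parent and child types standing in for the mapped entities.
public class OrderRow
{
    public int Id { get; set; }
    public List<DetailRow> Details { get; } = new List<DetailRow>();
}

public class DetailRow
{
    public int Id { get; set; }
    public int OrderId { get; set; } // foreign key to OrderRow.Id
}

public static class Fixup
{
    // Attaches each child to its parent in one pass over each list.
    public static void Attach(IEnumerable<OrderRow> orders, IEnumerable<DetailRow> details)
    {
        // ToLookup groups the children by foreign key once;
        // a missing key yields an empty sequence, not an exception.
        ILookup<int, DetailRow> byOrder = details.ToLookup(d => d.OrderId);
        foreach (var order in orders)
        {
            order.Details.AddRange(byOrder[order.Id]);
        }
    }

    public static void Main()
    {
        var orders = new List<OrderRow> { new OrderRow { Id = 1 }, new OrderRow { Id = 2 } };
        var details = new List<DetailRow>
        {
            new DetailRow { Id = 10, OrderId = 1 },
            new DetailRow { Id = 11, OrderId = 1 },
            new DetailRow { Id = 12, OrderId = 3 } // parent not loaded, stays unattached
        };
        Attach(orders, details);
        Console.WriteLine(orders[0].Details.Count); // prints "2"
        Console.WriteLine(orders[1].Details.Count); // prints "0"
    }
}
```

The same stitching applies to any parent/child pair loaded with two separate queries: two result sets and one grouping pass, instead of one wide join.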

NHibernate Many-To-Many Performance Issue

My application has the following entities (with a many-to-many relationship between Product and Model):
public class TopProduct {
public virtual int Id { get; set; }
public virtual Product Product { get; set; }
public virtual int Order { get; set; }
}
public class Product {
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual IList<Model> Models { get; set; }
}
public class Model {
public virtual string ModelNumber { get; set; }
public virtual IList<Product> Products { get; set; }
}
Note: A product could have 1000s of models.
I need to display a list of TopProducts and the first 5 models (ordered alphabetically) against each one.
For example say I have the following query:
var topProducts = session.Query<TopProduct>()
.Cacheable()
.Fetch(tp => tp.Product).ThenFetchMany(p => p.Models)
.OrderBy(tp => tp.Order)
.ToList();
If I now say:
foreach (var topProduct in topProducts) {
var models = topProduct.Product.Models.Take(5).ToList();
...
}
This executes extremely slowly as it retrieves an item from the second level cache for each model. Since there could be 1000s of models against a product, it would need to retrieve 1000s of items from the cache the second time it is executed.
I have been racking my brain trying to think of a better way of doing this but so far I am out of ideas. Unfortunately my model and database cannot be modified at this stage.
I'd appreciate the help. Thanks
The key to your problem is understanding how entity and query caching work.
Entity caching stores, essentially, the POID of an entity and its property values.
When you want to get/initialize an instance, NH will first check the cache to see if the values are there, in order to avoid a db query.
Query caching, on the other hand, stores a query as the key (to simplify, let's say it's the command text and the parameter values), and a list of entity ids as the value (this is assuming your result is a list of entities, and not a projection)
When NH executes a cacheable query, it will see if the results are cached. If they are, it will load the proxies from those ids. Then, as you use them, it will initialize them one by one, either from the entity cache or from the db.
Collection cache is similar.
Usually, getting many second-level cache hits for those entity loads is a good thing. Unless, of course, you are using a distributed cache located on a separate machine, in which case this is almost as bad as getting them from the db.
If that is the case, I suggest you skip caching the query.

List vs IEnumerable vs IQueryable when defining Navigation property

I want to create a new model object named Movie_Type in my ASP.NET MVC web application. What will be the differences if I define the navigation property of this class to be either List, ICollection or IQueryable as follows?
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public List<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public IQueryable<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public ICollection<Movie> Movies { get; set; }
}
Edit:-
@Tomas Petricek,
Thanks for your reply. In my case I am using the database-first approach and then I use the DbContext template to map my tables, which automatically created ICollection for all the navigation properties. So my questions are:
1. Does this mean that it is not always the best choice to use ICollection, and that I should change the automatically generated classes to best fit my case?
2. Secondly, I can choose between lazy and eager loading by using .Include, such as
var courses = db.Courses.Include(c => c.Department);
regardless of what I am using to define the navigation properties. So I cannot understand your point.
3. I have never found any examples or tutorials that use IQueryable to define the navigation properties, so what might be the reason?
BR
You cannot use a navigation property of type IQueryable<T>. You must use ICollection<T> or some collection type which implements ICollection<T> - like List<T>. (IQueryable<T> does not implement ICollection<T>.)
The navigation property is simply an object or a collection of objects in memory or it is null or the collection is empty.
It is never loaded from the database when you load the parent object which contains the navigation property from the database.
You either have to explicitly say that you want to load the navigation property together with the parent, which is eager loading:
var movieTypes = context.Movie_Types.Include(m => m.Movies).ToList();
// no option to filter or sort the movies collection here.
// It will always load the full collection into memory
Or it will be loaded by lazy loading (which is enabled by default if your navigation property is virtual):
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
// one new database query as soon as you access properties of mt.Movies
foreach (var m in mt.Movies)
{
Console.WriteLine(m.Title);
}
}
The last option is explicit loading which comes closest to your intention I guess:
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
    IQueryable<Movie> mq = context.Entry(mt).Collection(m => m.Movies).Query();
    // You can use this IQueryable now to apply more filters
    // to the collection or sorting, for example:
    mq.Where(m => m.Title.StartsWith("A")) // filter by title
        .OrderBy(m => m.PublishDate) // sort by date
        .Take(10) // take only the first ten of result
        .Load(); // populate now the nav. property
    // again this was a database query
    foreach (var m in mt.Movies) // contains only the filtered movies now
    {
        Console.WriteLine(m.Title);
    }
}
There are two possible ways of looking at things:
Is the result stored in memory as part of the object instance?
If you choose ICollection, the result will be stored in memory - this may not be a good idea if the data set is very large or if you don't always need to get the data. On the other hand, when you store the data in memory, you will be able to modify the data set from your program.
Can you refine the query that gets sent to the SQL server?
This means that you would be able to use LINQ over the returned property and the additional LINQ operators would be translated to SQL - if you don't choose this option, additional LINQ processing will run in memory.
If you want to store data in memory, then you can use ICollection. If you want to be able to refine the query, then you need to use IQueryable. Here is a summary table:
| | Refine query | Don't change query |
|-----------------|--------------|--------------------|
| In-memory | N/A | ICollection |
| Lazy execution | IQueryable | IEnumerable |
IEnumerable is more of a standard, as it is the least common denominator.
IQueryable can be returned if you want to give the caller extra querying functionality without having 10 repository methods to handle varying querying scenarios.
A downside is that Count() on an IEnumerable could be slow, but if the object implements ICollection then that interface is checked for the count first, without having to enumerate all items.
Also be aware that if you return IQueryable to an untrusted caller, they can do some casting and method calls on the IQueryable and get access to the context, connection, connection string, run queries, etc.
Also note that NHibernate, for example, has a query object you can pass to a repository to specify options. With Entity Framework you need to return IQueryable to enhance querying criteria.
The collection that entity framework actually creates for you if you use virtual navigation properties implements ICollection, but not IQueryable, so you cannot use IQueryable for your navigation properties, as Slauma says.
You are free to define your properties as IEnumerable, as ICollection extends IEnumerable, but if you do this then you will lose your ability to add new child items to these navigation properties.
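The point about Count() can be checked with LINQ to Objects alone. This is a small sketch, with CountDemo, Lazy and Enumerated as hypothetical names: a lazily produced sequence has to be walked to be counted, while a List<T> exposed as IEnumerable<T> is counted through its ICollection<T>.Count property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Counts how many elements the lazy sequence has actually produced.
    public static int Enumerated;

    // Lazy sequence: nothing runs until someone enumerates it.
    public static IEnumerable<int> Lazy(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Enumerated++;
            yield return i;
        }
    }

    public static void Main()
    {
        Enumerated = 0;
        int lazyCount = Lazy(1000).Count(); // walks all 1000 elements
        Console.WriteLine($"{lazyCount} items, {Enumerated} enumerated"); // prints "1000 items, 1000 enumerated"

        IEnumerable<int> list = Lazy(1000).ToList(); // materialized once
        Enumerated = 0;
        int listCount = list.Count(); // uses ICollection<T>.Count, no enumeration
        Console.WriteLine($"{listCount} items, {Enumerated} enumerated"); // prints "1000 items, 0 enumerated"
    }
}
```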

How to use a Dictionary or Hashtable for LINQ query performance underneath an OData service

I am very new to OData (only started on it yesterday) so please excuse me if this question is too dumb :-)
I have built a test project as a Proof of Concept for migrating our current web services to OData. For this test project, I am using Reflection Providers to expose POCO classes via OData. These POCO classes come from in-memory cache. Below is the code so far:
public class DataSource
{
    public IQueryable<Category> CategoryList
    {
        get
        {
            List<Category> categoryList = GetCategoryListFromCache();
            return categoryList.AsQueryable();
        }
    }
    // below method is only required to allow navigation
    // from Category to Product via OData urls
    // eg: OData.svc/CategoryList(1)/ProductList(2) and so on
    public IQueryable<Product> ProductList
    {
        get
        {
            return null;
        }
    }
}
[DataServiceKeyAttribute("CategoryId")]
public class Category
{
public int CategoryId { get; set; }
public string CategoryName { get; set; }
public List<Product> ProductList { get; set; }
}
[DataServiceKeyAttribute("ProductId")]
public class Product
{
public int ProductId { get; set; }
public string ProductName { get; set; }
}
To the best of my knowledge, OData is going to use LINQ behind the scenes to query these in-memory objects, ie: List in this case if somebody navigates to OData.svc/CategoryList(1)/ProductList(2) and so on.
Here is the problem though: In the real world scenario, I am looking at over 18 million records inside the cache representing over 24 different entities.
The current production web services make very good use of .NET Dictionary and Hashtable collections to ensure very fast look ups and to avoid a lot of looping. So to get to a Product having ProductID 2 under Category having CategoryID 1, the current web services just do 2 look ups, ie: first one to locate the Category and the second one to locate the Product inside the Category. Something like a btree.
I wanted to know how could I follow a similar architecture with OData where I could tell OData and LINQ to use Dictionary or Hashtables for locating records rather than looping over a Generic List?
Is it possible using Reflection Providers or I am left with no other choice but to write my custom provider for OData?
Thanks in advance.
You will need to process expression trees, so you will need at least a partial IQueryable implementation over the underlying LINQ to Objects. You don't need a full-blown custom provider for this though; just return your IQueryable from the properties on the context class.
In that IQueryable you would have to recognize filters on the "key" properties (.Where(p => p.ProductId == 2)) and translate them into a dictionary/hashtable lookup. Then you can use LINQ to Objects to process the rest of the query.
But if the client issues a query with a filter that doesn't touch the key property, it will end up doing a full scan. Your custom IQueryable could, however, detect that and fail such a query if you choose so.
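The key-recognition step can be sketched without writing a whole IQueryable. The helper below is a simplified stand-in, not a query provider: KeyedLookup, TryGetKey and Where are hypothetical names, and it only recognizes the exact shape p => p.ProductId == <int constant>. Anything else falls back to a full scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
}

public static class KeyedLookup
{
    // True when the predicate is exactly: p => p.ProductId == <int constant>.
    public static bool TryGetKey(Expression<Func<Product, bool>> predicate, out int key)
    {
        key = 0;
        if (predicate.Body is BinaryExpression eq
            && eq.NodeType == ExpressionType.Equal
            && eq.Left is MemberExpression member
            && member.Expression == predicate.Parameters[0]
            && member.Member.Name == nameof(Product.ProductId)
            && eq.Right is ConstantExpression constant
            && constant.Value is int value)
        {
            key = value;
            return true;
        }
        return false;
    }

    public static IEnumerable<Product> Where(
        IDictionary<int, Product> index,
        Expression<Func<Product, bool>> predicate)
    {
        if (TryGetKey(predicate, out int key))
        {
            // Key filter: a single dictionary lookup, no looping.
            return index.TryGetValue(key, out Product hit)
                ? new[] { hit }
                : Enumerable.Empty<Product>();
        }
        // Any other filter: full scan with LINQ to Objects.
        return index.Values.Where(predicate.Compile());
    }

    public static void Main()
    {
        var index = new Dictionary<int, Product>
        {
            [1] = new Product { ProductId = 1, ProductName = "Apple" },
            [2] = new Product { ProductId = 2, ProductName = "Banana" }
        };
        Console.WriteLine(Where(index, p => p.ProductId == 2).Single().ProductName);             // prints "Banana"
        Console.WriteLine(Where(index, p => p.ProductName.StartsWith("A")).Single().ProductName); // prints "Apple"
    }
}
```

A real implementation would do this inside an IQueryProvider while visiting the Where call, and would also have to handle captured variables and keys on the right-hand side of the comparison, which this sketch deliberately ignores.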

MVC2/LINQ Repository pattern help for a beginner

I have the following method in my repository that returns a mix of two objects from my database,
public IQueryable<ICustomersAndSitesM> CustomerAndSites
{
    get
    {
        return from customer in customerTable
               join site in customerSitesTable
               on customer.Id equals site.CustomerId
               select new CustomersAndSitesMix(customer, site);
    }
}
This is my interface code for ICustomersAndSitesM:
interface ICustomersAndSitesM
{
IQueryable<Customer> Customers { get; }
IQueryable<CustomerSite> CustomerSites { get; }
}
I'm struggling to work out how and where to define CustomersAndSitesMix. Should this be a separate class or a method in the interface? And will it need to have definitions for both the customer and the customer site?
I would set CustomersAndSitesMix up as a class with all the properties you need from your other two tables. I am not sure why you need this interface, but I would also change the return type of your method to List<CustomersAndSitesMix> and add a ToList() to the end of your LINQ query. You would then return an inner join of both tables, which may be what you want, or you may want to add an argument, say CustomerID, to your function so you could pass it in and get only a subset.
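A minimal in-memory sketch of that suggestion follows. The property names on Customer and CustomerSite, the Repository class and the optional customerId parameter are assumptions made for illustration; the original tables are not shown, and a database-backed LINQ provider may restrict which constructors can be used inside select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes of the two source tables.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CustomerSite
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public string Address { get; set; }
}

// Plain class holding just the columns needed from both tables.
public class CustomersAndSitesMix
{
    public CustomersAndSitesMix(Customer customer, CustomerSite site)
    {
        CustomerId = customer.Id;
        CustomerName = customer.Name;
        SiteAddress = site.Address;
    }

    public int CustomerId { get; }
    public string CustomerName { get; }
    public string SiteAddress { get; }
}

public static class Repository
{
    // Inner join of both tables, optionally limited to one customer.
    public static List<CustomersAndSitesMix> CustomerAndSites(
        IEnumerable<Customer> customerTable,
        IEnumerable<CustomerSite> customerSitesTable,
        int? customerId = null)
    {
        var query = from customer in customerTable
                    join site in customerSitesTable
                        on customer.Id equals site.CustomerId
                    where customerId == null || customer.Id == customerId
                    select new CustomersAndSitesMix(customer, site);
        return query.ToList();
    }

    public static void Main()
    {
        var customers = new[] { new Customer { Id = 1, Name = "Acme" }, new Customer { Id = 2, Name = "Globex" } };
        var sites = new[]
        {
            new CustomerSite { Id = 1, CustomerId = 1, Address = "North" },
            new CustomerSite { Id = 2, CustomerId = 1, Address = "South" },
            new CustomerSite { Id = 3, CustomerId = 2, Address = "East" }
        };
        Console.WriteLine(CustomerAndSites(customers, sites).Count);    // prints "3"
        Console.WriteLine(CustomerAndSites(customers, sites, 2).Count); // prints "1"
    }
}
```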

Resources