How to use a Dictionary or Hashtable for LINQ query performance underneath an OData service

I am very new to OData (only started on it yesterday) so please excuse me if this question is too dumb :-)
I have built a test project as a proof of concept for migrating our current web services to OData. For this test project, I am using Reflection Providers to expose POCO classes via OData. These POCO classes come from an in-memory cache. Below is the code so far:
public class DataSource
{
    public IQueryable<Category> CategoryList
    {
        get
        {
            List<Category> categoryList = GetCategoryListFromCache();
            return categoryList.AsQueryable();
        }
    }
    // below property is only required to allow navigation
    // from Category to Product via OData urls
    // eg: OData.svc/CategoryList(1)/ProductList(2) and so on
    public IQueryable<Product> ProductList
    {
        get
        {
            return null;
        }
    }
}
[DataServiceKeyAttribute("CategoryId")]
public class Category
{
    public int CategoryId { get; set; }
    public string CategoryName { get; set; }
    public List<Product> ProductList { get; set; }
}
[DataServiceKeyAttribute("ProductId")]
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
}
To the best of my knowledge, OData is going to use LINQ behind the scenes to query these in-memory objects (i.e. the List in this case) if somebody navigates to OData.svc/CategoryList(1)/ProductList(2) and so on.
Here is the problem, though: in the real-world scenario, I am looking at over 18 million records inside the cache, representing more than 24 different entities.
The current production web services make very good use of .NET Dictionary and Hashtable collections to ensure very fast lookups and to avoid a lot of looping. So to get to the Product with ProductId 2 under the Category with CategoryId 1, the current web services do just 2 lookups: the first to locate the Category and the second to locate the Product inside that Category. Something like a B-tree.
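To make the two-lookup idea concrete, here is a minimal sketch of such a cache. It assumes products are stored per category in a Dictionary<int, Product> keyed by ProductId (the question's POCO uses a List<Product> instead); CacheLookup and FindProduct are hypothetical names, not part of the original code.

```csharp
using System.Collections.Generic;

// Hypothetical cache shape modelled on the question's POCOs; the real cache may differ.
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
}

public class Category
{
    public int CategoryId { get; set; }
    public string CategoryName { get; set; }
    // Products indexed by ProductId instead of kept in a List<Product>.
    public Dictionary<int, Product> Products { get; } = new Dictionary<int, Product>();
}

public static class CacheLookup
{
    // Two hash lookups, no list scan: first the category, then the product inside it.
    // Returns null when either key is not in the cache.
    public static Product FindProduct(Dictionary<int, Category> categories, int categoryId, int productId)
    {
        if (categories.TryGetValue(categoryId, out Category category) &&
            category.Products.TryGetValue(productId, out Product product))
        {
            return product;
        }
        return null;
    }
}
```

Each TryGetValue is an expected constant-time hash lookup, so the cost of resolving CategoryList(1)/ProductList(2) does not depend on how many records the cache holds.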
I want to know how I could follow a similar architecture with OData, where I could tell OData and LINQ to use a Dictionary or Hashtable for locating records rather than looping over a generic List.
Is it possible using Reflection Providers, or am I left with no choice but to write my own custom provider for OData?
Thanks in advance.

You will need to process expression trees, so you will need at least a partial IQueryable implementation over the underlying LINQ to Objects. You don't need a full-blown custom provider for this, though: just return your IQueryable from the properties on the context class.
In that IQueryable you would have to recognize filters on the "key" properties (.Where(p => p.ProductId == 2)) and translate them into a dictionary/hashtable lookup. Then you can use LINQ to Objects to process the rest of the query.
But if the client issues a query with a filter that doesn't touch the key property, it will end up doing a full scan. Your custom IQueryable could, however, detect that and reject such a query if you choose.
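The "recognize a key filter" step can be sketched on its own, leaving out the IQueryable/IQueryProvider plumbing a real implementation would need. This is a hypothetical helper (KeyLookup, TryGetKey and Filter are illustrative names, not WCF Data Services API): it inspects the Where predicate's expression tree, uses the dictionary when the predicate is exactly p.ProductId == value, and otherwise falls back to a full scan. It deliberately handles only a bare equality; combined filters such as p.ProductId == 2 && ... would need an ExpressionVisitor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
}

// Hypothetical helper, not part of WCF Data Services.
public static class KeyLookup
{
    // True when the predicate has the shape p => p.ProductId == value
    // (either operand order); `key` receives the evaluated value.
    // Assumes the value side does not itself reference p.
    public static bool TryGetKey(Expression<Func<Product, bool>> predicate, out int key)
    {
        key = 0;
        ParameterExpression p = predicate.Parameters[0];
        if (predicate.Body is BinaryExpression eq && eq.NodeType == ExpressionType.Equal)
        {
            Expression valueSide = IsKeyAccess(eq.Left, p) ? eq.Right
                                 : IsKeyAccess(eq.Right, p) ? eq.Left
                                 : null;
            if (valueSide != null)
            {
                // Works for literals (p.ProductId == 2) and captured variables (p.ProductId == id).
                key = Expression.Lambda<Func<int>>(valueSide).Compile()();
                return true;
            }
        }
        return false;
    }

    private static bool IsKeyAccess(Expression e, ParameterExpression p) =>
        e is MemberExpression m && m.Expression == p && m.Member.Name == nameof(Product.ProductId);

    // Key filter -> single dictionary lookup; anything else -> full scan of the values.
    // (A stricter implementation could throw here instead, rejecting non-key queries.)
    public static IEnumerable<Product> Filter(Dictionary<int, Product> index,
                                              Expression<Func<Product, bool>> predicate)
    {
        if (TryGetKey(predicate, out int key))
        {
            return index.TryGetValue(key, out Product hit) ? new[] { hit } : Array.Empty<Product>();
        }
        return index.Values.Where(predicate.Compile());
    }
}
```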

Related

Entity Splitting For One-To-Many table relationships

Following this article (What are best practices for multi-language database design?), I have split all my database tables in two: the first table contains only language-neutral data (primary key, etc.) and the second table contains one record per language, with the localized data plus the ISO code of the language. The relationship between the two tables is one-to-many.
Here is a screenshot of the data model: https://dl.dropboxusercontent.com/u/17099565/datamodel.jpg
Because the website has 8 languages, for each record in table "CourseCategory" I have 8 records in table "CourseCategoryContents". The same happens with "Course" and "CourseContent".
Then I use Entity Splitting in order to have only one entity for the Course Category and one entity for the Course:
public class CourseCategoryConfiguration : EntityTypeConfiguration<WebCourseCategory>
{
    public CourseCategoryConfiguration()
    {
        Map(m =>
        {
            m.Properties(i => new { i.Id, i.Order, i.Online });
            m.ToTable("CourseCategories");
        });
        Map(m =>
        {
            m.Properties(i => new { i.LanguageCode, i.Name, i.Permalink, i.Text, i.MetaTitle, i.MetaDescription, i.MetaKeywords });
            m.ToTable("CourseCategoryContents");
        });
    }
}
public class CourseConfiguration : EntityTypeConfiguration<WebCourse>
{
    public CourseConfiguration()
    {
        Map(m =>
        {
            m.Properties(i => new { i.Id, i.CategoryId, i.Order, i.Label, i.ThumbnailUrl, i.HeaderImageUrl });
            m.ToTable("Courses");
        });
        Map(m =>
        {
            m.Properties(i => new { i.LanguageCode, i.Name, i.Permalink, i.Text, i.MetaTitle, i.MetaDescription, i.MetaKeywords, i.Online });
            m.ToTable("CourseContents");
        });
    }
}
Then, to retrieve the courses in a desired language including their category, I do this:
using (WebContext dbContext = new WebContext())
{
    // all courses of all categories in the desired language
    return dbContext.Courses
        .Include(course => course.Category)
        .Where(course => course.LanguageCode == lan
                      && course.Category.LanguageCode == lan)
        .ToList();
}
Entity splitting works fine with one-to-one relationships, but here I have one-to-many relationships.
The website has contents (CourseCategories and Courses) in 3 languages ("en", "de", "fr").
EF correctly returns all the Courses with their Category in the right language (e.g. in English), but returns each record 3 times. This is because I have the CourseCategory in 3 languages too.
The only working solution I came up with is to avoid ".Include(Category)": first get all the courses in the desired language, then, in a foreach loop, retrieve each Course's Category in that language. I don't like this lazy-loading approach; I would like to retrieve all the desired data in one shot.
Thanks!
The best solution is to map the tables to the model as they are; your Course model class will then have a navigation property of type ICollection<CourseCategoryContent>.
In this case you just project this model to a DTO or ViewModel, according to your application design.
For example, your model will look like this:
public class Course
{
    public int Id {get; set;}
    public int Order {get; set;}
    public ICollection<CourseCategoryContent> CourseCategoryContents {get; set;}
}
public class CourseCategoryContent
{
    public string LanguageId {get; set;}
    public string Name {get; set;}
}
Then just create a new DTO or ViewModel like this:
public class CourseDTO
{
    public int Id {get; set;}
    public int Order {get; set;}
    public string Name {get; set;}
}
Finally, do the projection:
public IQueryable<CourseDTO> GetCourseDTOQuery(string lang)
{
    return dbContext.Courses.Select(x => new CourseDTO
    {
        Id = x.Id,
        Order = x.Order,
        // the inner lambda parameter must not reuse the name "lang"
        Name = x.CourseCategoryContents.FirstOrDefault(c => c.LanguageId == lang).Name,
    });
}
Note that the return type is IQueryable, so you can apply further filtering, ordering or grouping to it before the query hits the database.
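A minimal sketch of the same projection that runs without a database, assuming an in-memory list exposed through AsQueryable() stands in for dbContext.Courses. CourseQueries is a hypothetical name. The Where/Select/FirstOrDefault chain is used instead of FirstOrDefault(...).Name so that LINQ to Objects returns null rather than throwing when a course has no row in the requested language; EF would translate either form to SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

public class CourseCategoryContent
{
    public string LanguageId { get; set; }
    public string Name { get; set; }
}

public class Course
{
    public int Id { get; set; }
    public int Order { get; set; }
    public ICollection<CourseCategoryContent> CourseCategoryContents { get; set; }
}

public class CourseDTO
{
    public int Id { get; set; }
    public int Order { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper: `courses` stands in for dbContext.Courses.
public static class CourseQueries
{
    public static IQueryable<CourseDTO> GetCourseDTOQuery(IQueryable<Course> courses, string lang) =>
        courses.Select(x => new CourseDTO
        {
            Id = x.Id,
            Order = x.Order,
            // Yields null (not an exception) when no content row matches `lang`.
            Name = x.CourseCategoryContents
                    .Where(c => c.LanguageId == lang)
                    .Select(c => c.Name)
                    .FirstOrDefault()
        });
}
```

Callers can keep composing, e.g. CourseQueries.GetCourseDTOQuery(courses, "de").OrderBy(d => d.Order).ToList(); nothing executes until ToList(), and against EF the OrderBy would become part of the generated SQL.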
hope this helped
There is no fix-all answer, I'm afraid; every approach involves a compromise.
I've used both the database approach (10+ language-dependent tables) and the resource file approach in fairly large projects. If the data is static and doesn't change (i.e. you don't charge a different price or whatever), I would definitely consider abstracting language away from your database model, using resource keys and loading your data from files.
The reason for this is the problem you are experiencing right now: you can't filter includes (this may have changed in EF6 perhaps? I know it's on the list of things to do). You might be able to get away with reading everything into memory and filtering it there, like you're doing, but that wasn't very performant for us, and I had to write stored procedures that I passed the ISO language code to and executed through EF.
From a maintenance point of view, resource files were easier as well. For the DB project I had to write an admin console so people could log on and edit the values for different languages, etc. With resource files I just copy-pasted the values into Excel and emailed them to the people we use for translation.
It depends on the complexity of your project and what you prefer; I'd still consider both approaches in future.
TL;DR: the options I've found are:
1) filter in memory
2) lazy load with filter
3) write stored procedure to EF and map that result
4) use resources instead
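Option 1 (filter in memory) could look roughly like this, assuming the courses have already been loaded together with all their language rows (for example via Include). InMemoryLanguageFilter and NamesFor are hypothetical names. The catch, as noted above, is that the rows for every language still travel from the database.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes: a course plus all of its language rows, already loaded.
public class CourseContent
{
    public string LanguageCode { get; set; }
    public string Name { get; set; }
}

public class Course
{
    public int Id { get; set; }
    public List<CourseContent> Contents { get; } = new List<CourseContent>();
}

public static class InMemoryLanguageFilter
{
    // Picks the row for `lan` per course after loading; returns Id -> localized name
    // (null when a course has no row in that language).
    public static Dictionary<int, string> NamesFor(IEnumerable<Course> loadedCourses, string lan) =>
        loadedCourses.ToDictionary(
            c => c.Id,
            c => c.Contents.Where(x => x.LanguageCode == lan).Select(x => x.Name).FirstOrDefault());
}
```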
Hope this helps
EDIT: After looking at the diagram, it looks like you may need to search against the language-dependent values? In that case resources probably won't work. If you're just letting users navigate from a menu, then you're good to go.

NHibernate Many-To-Many Performance Issue

My application has the following entities (with a many-to-many relationship between Product and Model):
public class TopProduct {
    public virtual int Id { get; set; }
    public virtual Product Product { get; set; }
    public virtual int Order { get; set; }
}
public class Product {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<Model> Models { get; set; }
}
public class Model {
    public virtual string ModelNumber { get; set; }
    public virtual IList<Product> Products { get; set; }
}
Note: A product could have 1000s of models.
I need to display a list of TopProducts and the first 5 models (ordered alphabetically) against each one.
For example, say I have the following query:
var topProducts = session.Query<TopProduct>()
    .Cacheable()
    .Fetch(tp => tp.Product).ThenFetchMany(p => p.Models)
    .OrderBy(tp => tp.Order)
    .ToList();
If I now say:
foreach (var topProduct in topProducts) {
    var models = topProduct.Product.Models.Take(5).ToList();
    ...
}
This executes extremely slowly, as it retrieves an item from the second-level cache for each model. Since there could be 1000s of models per product, it needs to retrieve 1000s of items from the cache the second time it is executed.
I have been racking my brain trying to think of a better way of doing this but so far I am out of ideas. Unfortunately my model and database cannot be modified at this stage.
I'd appreciate the help. Thanks
The key to your problem is understanding how entity and query caching work.
Entity caching stores, essentially, the POID of an entity and its property values.
When you want to get/initialize an instance, NH will first check the cache to see if the values are there, in order to avoid a db query.
Query caching, on the other hand, stores a query as the key (to simplify, let's say it's the command text and the parameter values), and a list of entity ids as the value (this assumes your result is a list of entities, not a projection).
When NH executes a cacheable query, it will see if the results are cached. If they are, it will load the proxies from those ids. Then, as you use them, it will initialize them one by one, either from the entity cache or from the db.
Collection cache is similar.
Usually, getting many second-level cache hits for those entity loads is a good thing. Unless, of course, you are using a distributed cache located on a separate machine, in which case this is almost as bad as getting them from the db.
If that is the case, I suggest you skip caching the query.
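The caching behaviour described above can be illustrated with a toy model in plain C#. This is not NHibernate's API or implementation; ToyCache only mimics the idea that the query cache stores ids, and that each id is then resolved through a separate entity-cache read, which is why 1000s of models mean 1000s of cache round trips.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model for illustration only: NOT NHibernate's API or internals.
public class ToyCache
{
    // Query cache: query key -> ids of the resulting entities (not the entities themselves).
    public Dictionary<string, List<int>> QueryCache { get; } = new Dictionary<string, List<int>>();

    // Entity cache: id -> cached property values (reduced to one string here).
    public Dictionary<int, string> EntityCache { get; } = new Dictionary<int, string>();

    // Number of individual entity-cache reads; with a distributed cache each one
    // would be a separate network round trip.
    public int EntityCacheReads { get; private set; }

    // Returns null on a query-cache miss (the real thing would go to the database).
    public List<string> RunCachedQuery(string queryKey)
    {
        if (!QueryCache.TryGetValue(queryKey, out List<int> ids))
            return null;

        // Each id is hydrated on its own: one entity-cache lookup per result row.
        return ids.Select(id =>
        {
            EntityCacheReads++;
            return EntityCache[id];
        }).ToList();
    }
}
```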

List vs IEnumerable vs IQueryable when defining Navigation property

I want to create a new model object named Movie_Type in my ASP.NET MVC web application. What will be the differences if I define the navigation property of this class as a List, an ICollection or an IQueryable, as follows?
public partial class Movie_Type
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public List<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public IQueryable<Movie> Movies { get; set; }
}
OR
public partial class Movie_Type
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public ICollection<Movie> Movies { get; set; }
}
Edit:
@Tomas Petricek:
Thanks for your reply. In my case I am using the database-first approach, and then I use the DbContext template to map my tables, which automatically creates ICollection for all the navigation properties. So my questions are:
1. Does this mean that ICollection is not always the best choice, and that I should change the automatically generated classes to best fit my case?
2. Secondly, I can choose between lazy and eager loading by using .Include, such as
var courses = db.Courses.Include(c => c.Department);
regardless of which type I use to define the navigation properties. So I don't understand your point.
3. I have never found any examples or tutorials that use IQueryable to define navigation properties, so what might be the reason?
BR
You cannot use a navigation property of type IQueryable<T>. You must use ICollection<T> or some collection type which implements ICollection<T> - like List<T>. (IQueryable<T> does not implement ICollection<T>.)
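A quick way to check the interface claim, using reflection on standard collection types (NavigationTypeCheck is a hypothetical helper, not an EF API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NavigationTypeCheck
{
    // Can a value of type `candidate` be used where ICollection<T> is required?
    public static bool IsCollection<T>(Type candidate) =>
        typeof(ICollection<T>).IsAssignableFrom(candidate);
}
```

List<T> and HashSet<T> both pass this check, while IQueryable<T> and IEnumerable<T> do not, because neither interface derives from ICollection<T>.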
The navigation property is simply an object or a collection of objects in memory or it is null or the collection is empty.
It is never loaded from the database when you load the parent object which contains the navigation property from the database.
You either have to explicitly say that you want to load the navigation property together with the parent, which is eager loading:
var movieTypes = context.Movie_Types.Include(m => m.Movies).ToList();
// no option to filter or sort the movies collection here.
// It will always load the full collection into memory
Or it will be loaded by lazy loading (which is enabled by default if your navigation property is virtual):
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
    // one new database query as soon as you access properties of mt.Movies
    foreach (var m in mt.Movies)
    {
        Console.WriteLine(m.Title);
    }
}
The last option is explicit loading which comes closest to your intention I guess:
var movieTypes = context.Movie_Types.ToList();
foreach (var mt in movieTypes)
{
    IQueryable<Movie> mq = context.Entry(mt).Collection(m => m.Movies).Query();
    // You can use this IQueryable now to apply more filters
    // to the collection or sorting, for example:
    mq.Where(m => m.Title.StartsWith("A")) // filter by title
      .OrderBy(m => m.PublishDate)         // sort by date
      .Take(10)                            // take only the first ten of result
      .Load();                             // populate now the nav. property
    // again this was a database query
    foreach (var m in mt.Movies) // contains only the filtered movies now
    {
        Console.WriteLine(m.Title);
    }
}
There are two possible ways of looking at things:
Is the result stored in memory as part of the object instance?
If you choose ICollection, the result will be stored in memory - this may not be a good idea if the data set is very large or if you don't always need to get the data. On the other hand, when you store the data in memory, you will be able to modify the data set from your program.
Can you refine the query that gets sent to the SQL server?
This means that you would be able to use LINQ over the returned property and the additional LINQ operators would be translated to SQL - if you don't choose this option, additional LINQ processing will run in memory.
If you want to store data in memory, then you can use ICollection. If you want to be able to refine the query, then you need to use IQueryable. Here is a summary table:
| | Refine query | Don't change query |
|-----------------|--------------|--------------------|
| In-memory | N/A | ICollection |
| Lazy execution | IQueryable | IEnumerable |
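The difference between the two rows of the table can be seen with LINQ to Objects alone. In this sketch (QueryShapes is a hypothetical name), the IQueryable version records Where as an expression tree that a provider such as EF could translate to SQL, while the IEnumerable version passes a compiled delegate that can only run in memory.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryShapes
{
    // Queryable.Where: the lambda is captured as an expression tree,
    // which a provider such as EF could translate into SQL.
    public static IQueryable<int> Refinable(IEnumerable<int> source) =>
        source.AsQueryable().Where(x => x > 1);

    // Enumerable.Where: the lambda is a compiled delegate, so it can only run in memory.
    public static IEnumerable<int> InMemory(IEnumerable<int> source) =>
        source.Where(x => x > 1);

    // Name of the outermost operator recorded in the query's expression tree.
    public static string OutermostOperator(IQueryable<int> query) =>
        query.Expression is MethodCallExpression call ? call.Method.Name : null;
}
```

Both return the same items; only the IQueryable one still carries a description of the query that another operator (or a database provider) can build on.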
IEnumerable is more of a standard, as it is the lowest common denominator.
IQueryable can be returned if you want to give the caller extra querying functionality without having 10 repository methods to handle varying querying scenarios.
A downside of IEnumerable is that Count() can be slow, because it may have to enumerate every item; but if the object implements ICollection<T>, Count() checks that interface first and reads its Count without enumerating the items.
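A sketch of that Count() behaviour, using two hypothetical instrumented types: one that is only an IEnumerable<int>, and one that also implements ICollection<int>. Both record how many items were pulled through their enumerator.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: a plain IEnumerable<int> that records how many items were pulled from it.
public class CountingSequence : IEnumerable<int>
{
    private readonly int _size;
    public int ItemsEnumerated { get; private set; }
    public CountingSequence(int size) { _size = size; }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 0; i < _size; i++)
        {
            ItemsEnumerated++;
            yield return i;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical: the same instrumentation, but the type also implements ICollection<int>.
public class CountingCollection : ICollection<int>
{
    private readonly List<int> _items;
    public int ItemsEnumerated { get; private set; }
    public CountingCollection(int size) { _items = Enumerable.Range(0, size).ToList(); }

    public int Count => _items.Count;
    public bool IsReadOnly => false;
    public void Add(int item) => _items.Add(item);
    public void Clear() => _items.Clear();
    public bool Contains(int item) => _items.Contains(item);
    public void CopyTo(int[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);
    public bool Remove(int item) => _items.Remove(item);

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int i in _items)
        {
            ItemsEnumerated++;
            yield return i;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Calling Enumerable.Count() through an IEnumerable<int> reference walks all items of a CountingSequence, but for a CountingCollection it returns the Count property and ItemsEnumerated stays at 0.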
Also be aware that if you return IQueryable to an untrusted caller, they can do some casting and method calls on the IQueryable and get access to the context, the connection, the connection string, run queries, etc.
Also note that NHibernate, for example, has a query object you can pass to a repository to specify options. With Entity Framework you need to return IQueryable to enhance the query criteria.
The collection that Entity Framework actually creates for you if you use virtual navigation properties implements ICollection, but not IQueryable, so you cannot use IQueryable for your navigation properties, as Slauma says.
You are free to define your properties as IEnumerable, since ICollection extends IEnumerable, but if you do this you will lose the ability to add new child items to these navigation properties.

Beginner EF4 / CodeFirst / MVC3 help

Although I love what I'm learning, I'm finding it a struggle and need some help
I've been using these two tutorials which I think are awesome:
http://weblogs.asp.net/scottgu/archive/2010/07/16/code-first-development-with-entity-framework-4.aspx
http://msdn.microsoft.com/en-us/data/gg685467
Currently my main problem/confusion is:
I have a Code First table/entity, and I don't know how to correctly get data from other tables/entities to show in my views:
public class Car {
    public int ID { get; set; }
    public string Name { get; set; }
    public int EngineID { get; set; }
    public virtual Engine Engine { get; set; }
}
public class Engine {
    public int ID { get; set; }
    public string Name { get; set; }
    public string Manufacturer { get; set; }
    // (plus a whole lot of other things)
}
Now when I create a View for Cars (using the List type/option) I get a nice autogenerated list:
@foreach (var item in Model) {
    <tr>
        <td>@item.ID</td>
        <td>@item.Name</td>
        <td>@item.EngineID</td>
    </tr>
}
Perfect... except EngineID is mostly worthless to the viewer, and I want to show Engine.Name instead
So I assumed I could use EF lazy loading:
<td>@item.Engine.Name</td>
Unfortunately when I tried that, it says my ObjectContext has been disposed so can't get any further data requiring a connection
Then I tried going to the controller and including Engine.Name:
var cars = (from c in db.Cars.Include("Engine.Name") select c);
Which tells me: Entities.Engine does not declare a navigation property with the name 'Name'
... ? Lies
Include("Engine") works fine, but all I want is the Name, and Include("Engine") is loading a large amount of things I don't want
Previously in a situation like this I have created a view in the DB for Car that includes EngineName as well. But with CodeFirst and my noobness I haven't found a way to do this
How should I be resolving this issue?
I thought perhaps I could create a Model pretty much identical to the Car entity, but add Engine.Name to it. This would be nice as I could then reuse it in multiple places, but I am at a loss on how to populate it etc
Wanting to learn TDD as well but the above is already frustrating me :p
PS: any other tutorial links or handy things to read would be greatly appreciated.
It isn't lying: Include only accepts navigation properties, and Name is a plain (scalar) property of Engine rather than a navigation property, so there is nothing for EF to navigate to. Include("Engine") works because Engine is a navigation property of Car.
The simple way around this is:
(from c in db.Cars.Include("Engine") select new { c, EngineName = c.Engine.Name })
If you still get navigation property errors, you might need to make sure you are mapping to your schema correctly. This can be done with EntityTypeConfiguration classes using the fluent API, which is very powerful.
This of course won't help in strongly typing your car object to show in MVC.
If you'd like to get around this, your gut feeling is right. It's pretty common to use viewmodels that are read only (by design, not necessarily set to readonly) classes that provide simple views of your data.
Personally, I keep my model quite clean and then have another project with view models and a presentation project to populate them. I'd avoid using overlapping entities in your core model, as that might lead to unpredictable behaviour in the data context and, at the very least, a persistence nightmare when updating multiple entities (i.e. who's responsible for updating the engine name?).
Using view models, you can have a class called CarSummaryView or similar with only the data you want on it. This also solves the issue of being vulnerable to overposting or underposting on your site. It can be populated by the above query quite easily.
PS: There are a bunch of advantages to using view models beyond just not loading full hierarchies. One of the biggest is the intrinsic protection they give you against over- and underposting scenarios.
There's loads of ways to implement viewmodels, but as a simple CarView example:
public class CarView
{
public int ID { get; set; }
public string Name { get; set; }
public string EngineName { get; set; }
}
This should be clearly separated from your entity model. In a big project you'd have a single view models project that the presenter can return, but in a smaller one you just need them in the same layer as the service code.
To populate it directly from the query, you can do the following (no Include is needed, since the projection reads c.Engine.Name directly, and Include("Engine.Name") would throw the error from the question):
List<CarView> cars = (from c in db.Cars select new CarView() { ID = c.ID, Name = c.Name, EngineName = c.Engine.Name }).ToList();
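The same projection can be tried without a database. In this sketch an in-memory list exposed via AsQueryable() stands in for db.Cars, and CarViews is a hypothetical helper. Against EF, a Select like this is translated into a join that fetches only the projected columns, which is why the whole Engine entity never has to be loaded.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Engine
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Manufacturer { get; set; }
}

public class Car
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int EngineID { get; set; }
    public virtual Engine Engine { get; set; }
}

public class CarView
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string EngineName { get; set; }
}

public static class CarViews
{
    // Hypothetical helper; `cars` stands in for db.Cars.
    public static List<CarView> ToViews(IQueryable<Car> cars) =>
        (from c in cars
         select new CarView { ID = c.ID, Name = c.Name, EngineName = c.Engine.Name }).ToList();
}
```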

Seeking recommendation for 3-tiered LINQ Query in Entity Framework

I currently have a LINQ query that is correctly retrieving all relevant poll questions and their associated responses. In this query, I'm using the .Include() method to retrieve the responses. I like this approach because it makes the code in my View simple -- basically I have a @foreach for the responses nested inside a @foreach for the questions.
Now, I'd like to add response-specific information such as # of votes today, # of votes this week and # of votes overall. Again, these would be retrieved and displayed for each response of each question.
Is there an efficient LINQ solution that would allow me to continue using my .Include() method and my nested #foreach loops or do I need to scrap the .Include() method and use joins to pull everything together?
If it matters for performance reasons, this is being written in ASP.NET MVC 3.
Thanks in advance for your opinions/suggestions.
I like this approach because it makes the code in my View simple -- basically I have a @foreach for the responses nested inside a @foreach for the questions.
Personally, I wouldn't be satisfied with this. Why write loops in your view when you can use display templates? As for including the # of votes today, # of votes this week and # of votes overall, the answer, as always, is to use a view model that is specifically tailored to the needs of the view:
public class QuestionViewModel
{
    public int VotesToday { get; set; }
    public int VotesThisWeek { get; set; }
    public int TotalVotes { get; set; }
    public IEnumerable<ResponseViewModel> Responses { get; set; }
}
Then you would pass an IEnumerable<QuestionViewModel> to your view, which will look like this:
@model IEnumerable<AppName.Models.QuestionViewModel>
@Html.DisplayForModel()
and in ~/Views/Shared/DisplayTemplates/QuestionViewModel.cshtml:
@model AppName.Models.QuestionViewModel
<div>@Model.VotesToday</div>
<div>@Model.VotesThisWeek</div>
<div>@Model.TotalVotes</div>
@Html.DisplayFor(x => x.Responses)
and in ~/Views/Shared/DisplayTemplates/ResponseViewModel.cshtml:
@model AppName.Models.ResponseViewModel
<div>@Model.Body</div>
Now, that's a clean view.
Let's move to the controller now:
public class QuestionsController : Controller
{
    private readonly IQuestionsRepository _repository;
    public QuestionsController(IQuestionsRepository repository)
    {
        _repository = repository;
    }
    public ActionResult Index()
    {
        IEnumerable<Question> model = _repository.GetQuestions();
        IEnumerable<QuestionViewModel> viewModel = Mapper
            .Map<IEnumerable<Question>, IEnumerable<QuestionViewModel>>(model);
        return View(viewModel);
    }
}
Here we have abstracted the data access away into a repository so that the controller should never know anything about EF or whatever data access technology you are using. A controller should only know about your model, your view model and abstraction of how to manipulate the model (in this case the repository interface).
As far as the conversion between your model and the view model is concerned, you could use AutoMapper (the Mapper.Map<TSource, TDest> part in my example).
As far as the repository is concerned, that's an implementation detail: whether you perform one query or three against your database is up to you. All that's needed is that you are able to aggregate the required information.
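As an illustration of that aggregation step, here is a hypothetical sketch that computes the three counts per response from raw vote records. The Vote type is an assumption (the question doesn't show its schema), and "this week" is taken to mean the last seven days including today; adjust both to your data. A repository backed by EF could run the same GroupBy/Count shape against the database instead of in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vote record; the real schema is not shown in the question.
public class Vote
{
    public int ResponseId { get; set; }
    public DateTime CastAt { get; set; }
}

public class ResponseVoteCounts
{
    public int VotesToday { get; set; }
    public int VotesThisWeek { get; set; }
    public int TotalVotes { get; set; }
}

public static class VoteAggregator
{
    // Groups votes by response and counts three time windows per group.
    // Assumption: "this week" = the last 7 days including today.
    public static Dictionary<int, ResponseVoteCounts> CountVotes(IEnumerable<Vote> votes, DateTime now)
    {
        DateTime today = now.Date;
        DateTime weekStart = today.AddDays(-6);
        return votes.GroupBy(v => v.ResponseId).ToDictionary(
            g => g.Key,
            g => new ResponseVoteCounts
            {
                VotesToday = g.Count(v => v.CastAt >= today),
                VotesThisWeek = g.Count(v => v.CastAt >= weekStart),
                TotalVotes = g.Count()
            });
    }
}
```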
