How to map a collection efficiently in an ORM? - performance

I have a simple model containing two entities:
Post entity - an aggregate root, which can have a list of Comment -> Comments
Comment entity
When adding a new comment, let's say like this:
post.Comments.Add(newComment);
The system issues a SELECT to load all existing comments before adding the new one. If there are hundreds of comments, that is a heavy operation. Is it possible to avoid the load? Thanks for any suggestions.

This is a classic problem with ORMs. There is no silver bullet here.
If you load all the comments in advance, you waste a lot of bandwidth, since the comments might not all be needed.
On the other hand, if you don't load the comments in advance and someone does something like:
for (var i = 0; i < 100; i++) {
    doSomethingWith(post.Comments[i]);
}
That causes a hundred fetches from the database, which is the SELECT N+1 problem.
So it's an opinionated design choice. Some ORMs fetch only the post; others provide special syntax such as .Include to indicate that you're also interested in the comments; and some ignore the problem completely and waste requests.
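To make the trade-off concrete, here is a minimal sketch in Python with the standard-library sqlite3 module. It is not tied to any particular ORM; the post and comment tables, their columns, and the run helper are assumptions made up for illustration. It counts the round trips for per-item lazy access versus a single query for the whole collection.

```python
import sqlite3

# Hypothetical schema for illustration only; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
""")
conn.execute("INSERT INTO post (id, title) VALUES (1, 'Hello')")
conn.executemany(
    "INSERT INTO comment (post_id, body) VALUES (?, ?)",
    [(1, f"comment {i}") for i in range(100)],
)

query_count = 0

def run(sql, params=()):
    """Execute a query and count it as one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# Lazy, per-item access: one query for the ids, then one query per comment.
query_count = 0
ids = [row[0] for row in run(
    "SELECT id FROM comment WHERE post_id = ? ORDER BY id", (1,))]
lazy_bodies = [
    run("SELECT body FROM comment WHERE id = ?", (cid,))[0][0] for cid in ids
]
lazy_queries = query_count

# Eager access: a single query fetches every comment of the post.
query_count = 0
eager_bodies = [row[0] for row in run(
    "SELECT body FROM comment WHERE post_id = ? ORDER BY id", (1,))]
eager_queries = query_count

print(lazy_queries, eager_queries)
```

Both variants return the same data; the difference is only in how many times the database is hit, which is what the loading strategy of an ORM decides for you.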

You're concerned about the performance of inserting a new comment, so the options below may help:
Reuse the DB connection from NHibernate and write your own INSERT command.
Use Dapper to write your INSERT query: https://github.com/StackExchange/dapper-dot-net
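The idea behind both options is the same: write the new row directly instead of going through the loaded collection. A minimal sketch of that idea in Python with sqlite3 follows; it is not NHibernate or Dapper code, and the schema and the add_comment helper are hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post(id),
        body TEXT NOT NULL
    );
    INSERT INTO post (id, title) VALUES (1, 'Hello');
""")
conn.executemany(
    "INSERT INTO comment (post_id, body) VALUES (?, ?)",
    [(1, f"old comment {i}") for i in range(300)],
)

def add_comment(conn, post_id, body):
    """Insert one comment row directly; existing comments are never read."""
    cur = conn.execute(
        "INSERT INTO comment (post_id, body) VALUES (?, ?)",
        (post_id, body),
    )
    conn.commit()
    return cur.lastrowid

new_id = add_comment(conn, 1, "Nice post")
total = conn.execute(
    "SELECT COUNT(*) FROM comment WHERE post_id = ?", (1,)
).fetchone()[0]
```

The cost of the insert stays constant no matter how many comments the post already has. The trade-off is that any in-memory copy of the collection is not updated by this write.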

Related

Is there a way to sort a content query by the value of a field programmatically?

I'm working on a portal based on Orchard CMS. We're using Orchard to manage the "normal" content of the site, as well as to model what's essentially data for a small application embedded in it.
We figured that doing it that way is "recommended" for working in Orchard, and that it would save us duplicating a bunch of effort in features that Orchard already provides, mainly generating a good enough admin UI. This is also why we're using fields wherever possible.
However, for said application, the client wants to be able to display the data in the regular UI in a garden-variety datagrid that can be filtered, sorted, and paged.
I first tried to implement this by cobbling together a page with a bunch of form elements for the filtering, above a projection with filters bound to query string parameters. However, I ran into the following issues with this approach:
Filters for numeric fields crash when the value is missing - as would be pretty common to indicate that the given field shouldn't be considered when filtering. (This I could achieve by changing the implementation in the Orchard source, which would however make upgrading trickier later. I'd prefer to keep anything I haven't written untouched.)
It seems the sort order can only be defined in the administration UI. It doesn't seem to support tokens that would allow the field to sort by to be changed when querying.
So I decided to dump that approach and switched to trying to do this with just MVC controllers that access data using IContentQuery. However, there I found out that:
I have no clue how, if at all, it's possible to sort the query based on field values.
Or, for that matter, how / if I can filter.
I did take a look at the code of Orchard.Projections, however, how it handles sorting is pretty inscrutable to me, and there doesn't seem to be a straightforward way to change the sort order for just one query either.
So, is there any way to achieve what I need here with the rest of the setup (which isn't little) unchanged, or am I in a trap here, and I'll have to move every single property I wish to use for sorting / filtering into a content part and code the admin UI myself? (Or do something ludicrous, like create one query for every sortable property and direction.)
EDIT: Another thought I had was having my custom content part duplicate the fields that are displayed in the datagrids into Hibernate-backed properties accessible to query code, and whenever the content item is updated, copy values from these fields into the properties before saving. However, again, I'm not sure if this is feasible, and how I would be able to modify a content item just before it's saved on update.
Right, so I have actually done something similar. I ended up going down both routes. First I created some custom filters for projections so I could manage filters on the frontend. It turned out pretty cool, but in the end projections lacked the raw querying power I needed: I needed to filter and sort based on joins to aggregated tables, and I didn't know how to do that in projections, or whether their way of building queries would even allow it. I then decided to move all my data into a record so I could query and filter it. This felt like the right way to go about it, since if I was building a UI to filter records, it made sense that those records should be defined in code. However, I was sorting on users, where each site had different registration data associated with users, and (I think the following is a terrible affliction many Orchard devs suffer from) I wanted to build a reusable, modular system so I wouldn't have to change anything, ever!
It didn't really work out quite like I hoped, but to eventually answer the question in your title: yes, you can query fields. Orchard Projections builds an index that it uses for querying fields. You can access these index records in HQL, get the ids of the content items, then call GetMany to get them all. I did this several years ago and can't remember much, but I do remember having a distinctly unenjoyable time with it, haha. So once you have an NHibernate session, you can write your HQL:
select distinct civr.Id
from Orchard.ContentManagement.Records.ContentItemVersionRecord civr
join civr.ContentItemRecord cir
join cir.FieldIndexPartRecord fipr
join fipr.StringFieldIndexRecord sfir
This just shows you how to join to the field indexes. There are several, one for each data type; this is the string one I'm joining here. They are all basically the same, with a PropertyName and a Value field. HQL allows you to add conditions to your join, so we can use that to join only the relevant field index records. If you have a field called Group attached directly to your content type, it would look like this:
join fipr.StringFieldIndexRecord sfir
with sfir.PropertyName = 'MyContentType.Group.'
where sfir.Value = 'HR'
If your field is attached to a part, replace MyContentType with the name of your part. HQL is pretty awesome; you can learn more here: https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/queryhql.html But I dunno, it gave me a headache, haha. At least HQL has documentation though, unlike Orchard's query layer. You can also always fall back to pure SQL when HQL won't do what you want; the NHibernate session has an option to run SQL queries.
Your other option is to index your content types with Lucene (easy if you are using fields), then filter and search by that. I quite liked using that, although sometimes indexes get corrupted or need to be rebuilt. So I've found it dangerous to rely on for something that populates pages regularly.
Pretty much whatever you do, you should accept a two-step approach: one query to filter and sort, then a GetMany call on the content manager to load the content items. Good luck!
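The two-step pattern (an id query that filters and sorts, then a bulk load by id) can be sketched outside Orchard as follows. This is Python with sqlite3, not Orchard or NHibernate code; the content_item and string_field_index tables, the City field, and both helper functions are assumptions invented for the illustration.

```python
import sqlite3

# Hypothetical stand-ins for content items and a string field index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE string_field_index (
        content_item_id INTEGER, property_name TEXT, value TEXT);
""")
conn.executemany("INSERT INTO content_item VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO string_field_index VALUES (?, ?, ?)", [
    (1, "MyContentType.Group.", "HR"),
    (2, "MyContentType.Group.", "IT"),
    (3, "MyContentType.Group.", "HR"),
    (1, "MyContentType.City.", "Oslo"),
    (3, "MyContentType.City.", "Bergen"),
])

def find_ids(group):
    """Step 1: filter on one indexed field, sort on another, return ids only."""
    rows = conn.execute("""
        SELECT g.content_item_id
        FROM string_field_index g
        JOIN string_field_index c
          ON c.content_item_id = g.content_item_id
         AND c.property_name = 'MyContentType.City.'
        WHERE g.property_name = 'MyContentType.Group.' AND g.value = ?
        ORDER BY c.value
    """, (group,)).fetchall()
    return [r[0] for r in rows]

def get_many(ids):
    """Step 2: load the full items in one query, keeping the order of ids."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM content_item WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]

hr_ids = find_ids("HR")
hr_items = get_many(hr_ids)
```

The bulk load does not guarantee any ordering by itself, so the second step reorders the rows to match the sorted id list from the first step.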
You can use indexing and the Orchard Search API for this. Sebastien demoed something similar to what you're trying to achieve at Orchard Harvest recently: https://www.youtube.com/watch?v=7v5qSR4g7E0

Linq security - hide columns

I am struggling a bit with what is probably a simple matter, or something I misunderstand... Anyway, using Linq with Entity Framework code first, I am trying to keep some of my tables inaccessible from the client, without success.
Using Breeze, I have made a datacontext that holds only the dbsets I want exposed, fine.
The problem appears when I write a query using .expand(). For example, let's say I have a Posts table which I want to expose, and an Owner table that I want to hide.
Using a query like:
var query = EntityQuery
    .from('Posts')
    .expand('Owner');
I can still see all the columns from Owner.
So the question is: in Linq, how am I supposed to secure/protect/hide the tables, and/or specific columns, that I want to hide?
After some digging, all I have found is the [JsonIgnore] attribute, which seems insufficient to me.
What is the best way to do this? I feel I am missing something probably huge here, but it's the end of the day around here...
Thanks
If you are using Breeze's WebApi implementation, Breeze also supports ODataQueryOptions.
This allows you to mark up your controller methods so as to limit how the query is interpreted. For example, to allow only filtering and paging (top and skip) on your 'Posts' query, and therefore exclude the ability to "expand" or "select" 'Owner' from any 'Posts' request, you could do the following.
[Queryable(AllowedQueryOptions = AllowedQueryOptions.Filter | AllowedQueryOptions.Top | AllowedQueryOptions.Skip)]
public IQueryable<Posts> Posts() {
    ....
}
Ok apparently my question was already addressed here:
Risks of using OData and IQueryable
I just found it.

When would it be worth it to maintain an inverse relationship in Doctrine2?

In the Doctrine manual, under Constrain relationships as much as possible, it gives the advice "Eliminate nonessential associations" and "avoid bidirectional associations if possible". I don't understand what criteria would make an association "essential".
I say this because it seems that you would often want to go from the One side of a One-to-Many association rather than from the Many side. For example, I would want to get all of a User's active PhoneNumbers, rather than get all active PhoneNumbers and their associated User. This becomes more important when you have to traverse multiple One-to-Many relations, e.g. if you wanted to see all Users with a MissedCall from the last two days (MissedCall->PhoneNumber->User).
This is how the simple case would look with an inverse association:
SELECT * FROM User u
LEFT JOIN u.PhoneNumbers p WITH p.active
It would make it more sensible if there were a way to go across a given relation in the opposite direction in DQL, like the following raw SQL:
SELECT * FROM User u
LEFT JOIN PhoneNumber p ON p.User_id = u.id AND p.active
Can someone explain why they give this advice, and in what cases it would be worth ignoring?
-- Edit --
If there are mitigating factors or other workarounds, please give me simple example code or a link.
I do not see any way to traverse a relation's inverse when that inverse is not defined, so I'm going to assume that building custom DQL is not in fact a solution -- there are some joins that are trivial with SQL that are impossible with DQL, and hydration probably wouldn't work anyway. This is why I don't understand why adding inverse relations is a bad idea.
Using Doctrine, I only define relationships when they're needed. This means that all of the relationships defined are actually used in the codebase.
For projects with a large team working on different areas of the project, not everyone will be accustomed to Doctrine, its current configuration, and eager/lazy loading of relationships. If you define bi-directional relationships where they aren't essential and possibly don't make sense, it could lead to extra queries for data that:
may not be used
may have been selected previously
Defining only essential relationships gives you greater control over how you and your team traverse your data, and reduces extra or overly large queries.
Updated 22/08/2011
By essential relationships, I mean the ones you use. It doesn't make sense to define a relationship you wouldn't use. For example:
\Entity\Post has a defined relationship to both \Entity\User and \Entity\Comment
Use $post->user to get author
Use $post->comments to get all comments
\Entity\User has a defined relationship to both \Entity\Post and \Entity\Comment
Use $user->posts to get all user posts
Use $user->comments to get all user comments
\Entity\Comment only has a relationship to \Entity\User
Use $comment->user to get author
Cannot use $comment->post as I don't retrieve the post it belongs to in my application
I wouldn't think of them as "Inverse" relationships. Think of them as "Bi-directional", if using the data in both directions makes sense. If it doesn't make sense, or you wouldn't use the data that way around, don't define it.
I hope this makes sense
I think this is a great question, and am looking forward to others' answers.
Generally, I've boiled the advice you cited down to the following rule of thumb:
If I don't need to access the (inverse) association inside my entity, then I typically make it unidirectional. In your example of users and (missed) calls, I'd probably keep it unidirectional, and let some service class or repository handle putting together custom DQL for the odd occurrence when I needed to get a list of all users with recent missed calls. That's a case I'd consider exceptional -- most of the time, I'm just interested in a particular user's calls, so the unidirectional relationship works (at least until I've got so many records that I feel the need to optimize).
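As a rough illustration of the "let a repository handle the odd query" idea, here is a sketch in Python with sqlite3 rather than Doctrine or DQL. The tables, sample rows, dates, and the users_with_missed_calls_since function are all hypothetical. The association is stored only on the many side (phone_number.user_id, missed_call.phone_number_id), and the exceptional lookup starts from that side.

```python
import sqlite3

# Hypothetical schema; the foreign keys live only on the "many" side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone_number (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user(id),
        active INTEGER
    );
    CREATE TABLE missed_call (
        id INTEGER PRIMARY KEY,
        phone_number_id INTEGER REFERENCES phone_number(id),
        called_at TEXT
    );
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO phone_number VALUES (10, 1, 1), (20, 2, 1);
    INSERT INTO missed_call VALUES
        (100, 10, '2011-08-21'),
        (101, 10, '2011-08-22'),
        (102, 20, '2011-08-01');
""")

def users_with_missed_calls_since(conn, since):
    """Repository-style query: walk MissedCall -> PhoneNumber -> User."""
    rows = conn.execute("""
        SELECT DISTINCT u.name
        FROM missed_call m
        JOIN phone_number p ON p.id = m.phone_number_id
        JOIN user u ON u.id = p.user_id
        WHERE m.called_at >= ?
        ORDER BY u.name
    """, (since,)).fetchall()
    return [r[0] for r in rows]

recent = users_with_missed_calls_since(conn, "2011-08-20")
```

Starting the query from the owning (many) side means no inverse association has to exist on the user for this lookup to work.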

LINQ Projection in Entity Framework

I posted a couple of questions about filtering in an eager loading query, and I guess the EF does not support filtering inside of the Include statement, so I came up with this.
I want to perform a simple query where I get a ChildProduct by SKU number together with its PriceTiers, filtered for IsActive.
Dim ChildProduct = ChildProductRepository.Query.
    Where(Function(x) x.Sku = Sku).
    Select(Function(x) New With {
        .ChildProduct = x,
        .PriceTiers = x.PriceTiers.
            Where(Function(y) y.IsActive).
            OrderBy(Function(y) y.QuantityStart)
    }).Select(Function(x) x.ChildProduct).Single
Is there a more efficient way of doing this? Am I on the right track at all? It does work.
Another thing I really don't understand is why does this work? Do you just have to load an object graph and the EF will pick up on that and see that these collections belong to the ChildProduct even though they are inside of an anonymous type?
Also, what are the standards for formatting a long LINQ expression?
Is there a more efficient way of doing this? Am I on the right track at all?
Nope, that's about the way you do this in EF and yes, you're on the right track.
Another thing I really don't understand is why does this work?
This is considered to be a bit of a hack, but it works because EF analyzes the whole expression and generates one query (it would look about the same as if you just used Include, but with the PriceTiers collection filtered). As a result, you get your ChildProducts with the PriceTiers populated (and correctly filtered). Obviously, you don't need the PriceTiers property of your anonymous class (you discard it by just selecting x.ChildProduct), but adding it to the LINQ query tells EF to add the join and the extra where to the generated SQL. As a result, the ChildProduct contains all you need.
If this functionality is critical, create a stored procedure and link Entity Framework to it.

What do you do with a one-off piece of data that needs to be persisted?

Recently I've been requested to add on something for the administrator of a site where he can 'feature' something.
For this discussion let's say it's a 'featured article'.
So naturally we already have a database model of 'articles' and it has ~20 columns as it is so I really do not feel like bloating it anymore than it already is.
My options:
Tack on a 'featured' bool (or int) and realize that only one thing will be featured at any given time
Create a new model to hold this and any other feature-creep items that might pop up.
I take your suggestions! ;)
What do you do in this instance? I come across this every now and then and I just hate having to tack on one more column to something. This information DOES need to be persisted.
I'd probably just add a simple two-column table that's basically a key-value store. Then add a new row with values like (featured_article_id, 45), or whatever the first featured ID is.
Edit: as pointed out in the comments by rmeador, it should be noted that this is only a good solution as long as things stay relatively simple. If you need to store more complex data, consider figuring out a more flexible solution.
If only one article can be featured at a time it is a waste to add a bool column. You should go up a level and add a column for the FeaturedArticleID. Do you have a Site_Settings table?
You could use an extensible model like having a table of attributes, and then a linking table to form a many-to-many relationship between articles and attributes. This way, these sorts of features do not require the schema to be modified.
Have some kind of global_settings table with parameter_name and parameter_value columns. Put the featured article id there.
For quick-and-dirty stuff like this, I like to include some sort of Settings table:
CREATE TABLE Settings (
    SettingName NVARCHAR(250) NOT NULL,
    SettingValue NVARCHAR(250)
)
If you need per-user or per-customer settings, instead of global ones, you could add a column to identify it to that specific user/customer. Then, you could just add a row for "FeaturedArticle" and parse the ID from a string. It's not super optimized, but plaintext is very flexible, which sounds like exactly what you need.
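A minimal sketch of reading and writing such a settings row, in Python with sqlite3. The set_setting and get_int_setting helpers are hypothetical, and the PRIMARY KEY on SettingName is an added assumption (the table above does not declare one) so that each setting has exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SettingName is made the primary key so a name maps to one row.
conn.execute("""
    CREATE TABLE Settings (
        SettingName TEXT PRIMARY KEY,
        SettingValue TEXT
    )
""")

def set_setting(name, value):
    """Store the value as plain text, replacing any previous value."""
    conn.execute(
        "INSERT OR REPLACE INTO Settings (SettingName, SettingValue) "
        "VALUES (?, ?)",
        (name, str(value)),
    )
    conn.commit()

def get_int_setting(name, default=None):
    """Read a setting and parse it as an integer ID."""
    row = conn.execute(
        "SELECT SettingValue FROM Settings WHERE SettingName = ?", (name,)
    ).fetchone()
    if row is None or row[0] is None:
        return default
    return int(row[0])

set_setting("FeaturedArticle", 45)
set_setting("FeaturedArticle", 46)  # featuring a new article replaces the old
featured_id = get_int_setting("FeaturedArticle")
```

Because only one article is featured at a time, changing it is a single-row update and the articles table itself stays untouched.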
