How to work around a potential performance issue when using a Grails hasMany relation? - performance

Given the following domain classes:
class Post {
SortedSet tags
static hasMany = [tags: Tag]
}
class Tag {
static belongsTo = Post
static hasMany = [posts: Post]
}
From my understanding so far, using a hasMany will result in hibernate Set mapping.
However, in order to maintain uniqueness/order, Hibernate needs to load the entire set from the database and compare their hashes.
This could lead to a significant performance problem with adding and deleting posts/tags
if their sets get large. What is the best way to work around this issue?

There is no order ensured by Hibernate/GORM in the default mapping. Therefore, it doesn't have to load elements from the database in order to do the sorting. You will have your hands on a bunch of ids, but that's that extent of it.
See 19.5.2:
http://www.hibernate.org/hib_docs/reference/en/html/performance-collections.html
In general, Hibernate/GORM is going to have better performance than you expect. Unless and until you can actually prove a real-world performance issue, trust in the framework and don't worry about it.

The ordering of the set is guaranteed by the Set implementation, ie, the SortedSet. Unless you use a List, which keeps track of indexes on the db, the ordering is server-side only.
If your domain class is in a SortedSet, you have to implement Comparable in order to enable the proper sorting of the set.
The question of performance is not really a question per se. If you want to access a single Tag, you should get it by its Id. If you want the sorted tags, well, the sort only makes sense if you are looking at all Tags, not a particular one, so you end up retrieving all Tags at once. Since the sorting is performed server-side and not db-side, there is really not much difference between a SortedSet and a regular HashSet in regards to Db.

The Grails docs seems to be updated:
http://grails.org/doc/1.0.x/
In section 5.2.4 they discuss the potential performance issues for the collection types.
Here's the relevant section:
A Note on Collection Types and Performance
The Java Set type is a collection that doesn't allow duplicates. In order to ensure uniqueness when adding an entry to a Set association Hibernate has to load the entire associations from the database. If you have a large numbers of entries in the association this can be costly in terms of performance.
The same behavior is required for List types, since Hibernate needs to load the entire association in-order to maintain order. Therefore it is recommended that if you anticipate a large numbers of records in the association that you make the association bidirectional so that the link can be created on the inverse side. For example consider the following code:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
book.author = author
book.save()
In this example the association link is being created by the child (Book) and hence it is not necessary to manipulate the collection directly resulting in fewer queries and more efficient code. Given an Author with a large number of associated Book instances if you were to write code like the following you would see an impact on performance:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
author.addToBooks(book)
author.save()

Related

Laravel / Eloquent special relation type based on parsed string attribute

I have developed a system where various classes have attributes consisting of a custom formula. The formula can contain special tokens which refer to different types of object. For example an object of class FruitSalad may have the following attribute;
$contents = "[A12] + [B76]";
In somewhat abstract terms, this means "add apple 12 to banana 76". It can also get significantly more complex than that with as many as 15 or 20 references to other objects involved in one formula.
I have a trait which passes formulae such as this and each time it finds a reference to a model (i.e. "[A12]") it gets it from the database with A::find(12) and adds it to an array of component objects which can be used for other processes later on in the request.
So, in essence, it's a relationship. But instead of a pivot table to describe the relationship, there is a formula on the parent model which can include references to child models.
This is all working. Yay! But it's really inefficient because there are so many tiny queries to get single models as formulae are parsed. One request may quite easily result in hundreds of queries. Oops.
I see two potential options;
1. Get all my apples and bananas from the database at the start of the request and get them from an in-memory store instead of from the database when parsing a formula (is this the repository pattern??).
2. Create a custom relation type (something like hasManyFromFormula) which makes eager loading work so that the parsing becomes much simpler because the relevant apples and bananas would already be loaded into the parent model.
Is there a precedent for this? As for why I am doing it like this, it would a bit tough to explain in brief but suffice to say it is to support a highly configurable data retrieval system which supports as-yet unknown input data configurations.
Help!
Thanks,
Geoff
Am not completely sure if it is the best solution, but in the end I created a new directory class for basic components and then set it up in the app service provider as a singleton. The constructor for the directory class loaded all models of several relevant classes and made them available as collections throughout the app.

Laravel Repository pattern and many to many relation

In our new project we decided to use hexagonal architecture. We decided to use repository pattern to gain more data access abstraction. We are using command bus pattern as service layer.
In our dashboard page we need a lot of data and because of that we should use 3 level many to many relations (user -> projects -> skills -> review) and also skills should be active(status=1).
The problem rises here, where should i put this?
$userRepository->getDashboardData($userId).
2.$userRepository->getUser($userId)->withProjects()->withActiveSkills()->withReviews();
3.$user = $userRepository->getById();
$projects = $projectRepository->getByUserId($user->id);
$skills = $skillRepository->getActiveSkillsByProjectsIds($projectIds);
In this case, I couldn't find the benefits of repository pattern except coding to interface which can be achived with model interfac.
I think solution 3 is prefect but it adds a lot of work.
You have to decide (for example) from an object-oriented perspective if a "User" returned is one that has a collection of skills within it. If so, your returned user will already have those objects.
In the case of using regular objects, try to avoid child entities unless it makes good sense. Like, for example.. The 'User' entity is responsible for ensuring that the child entities play by the business rules. Prefer to use a different repository to select the other types of entities based on whatever other criteria.
Talking about a "relationship" in this way makes me feel like you're using ActiveRecord because otherwise they'd just be child objects. The "relationship" exists in the relational database. It only creeps into your objects if you're mixing database record / object like with AR.
In the case of using ActiveRecord objects, you might consider having specific methods on the repository to load the correctly configured member objects. $members->allIncludingSkills() or something perhaps. This is because you have to solve for N+1 when returning multiple entities. Then, you need to use eager-loading for the result set and you don't want to use the same eager loading configuration for every request.. Therefore, you need a way to delineate configurations per request.. One way to do this is to call different methods on the repository for different requests.
However, for me.. I'd prefer not to have a bunch of objects with just.. infinite reach.. For example.. You can have a $member->posts[0]->author->posts[0]->author->posts[0]->author->posts[0].
I prefer to keep things as 'flat' as possible.
$member = $members->withId($id);
$posts = $posts->writtenBy($member->id);
Or something like that. (just typing off the top of my head).
Nobody likes tons of nested arrays and ActiveRecord can be abused to the point where its objects are essentially arrays with methods and the potential for infinite nesting. So, while it can be a convenient way to work with data. I would work to prevent abusing relationships as a concept and keep your structures as flat as possible.
It's not only very possible to code without ORM 'relationship' functionality.. It's often easier.. You can tell that this functionality adds a ton of trouble because of just how many features the ORM has to provide in order to try to mitigate the pain.
And really, what's the point? It just keeps you from having to use the ID of a specific Member to do the lookup? Maybe it's easier to loop over a ton of different things I guess?
Repositories are really only particularly useful in the ActiveRecord case if you want to be able to test your code in isolation. Otherwise, you can create scopes and whatnot using Laravel's built-in functionality to prevent the need for redundant (and consequently brittle) query logic everywhere.
It's also perfectly reasonable to create models that exist SPECIFICALLY for the UI. You can have more than one ActiveRecord model that uses the same database table, for example, that you use just for a specific user-interface use-case. Dashboard for example. If you have a new use-case.. You just create a new model.
This, to me.. Is core to designing systems. Asking ourselves.. Ok, when we have a new use-case what will we have to do? If the answer is, sure our architecture is such that we just do this and this and we don't really have to mess with the rest.. then great! Otherwise, the answer is probably more like.. I have no idea.. I guess modify everything and hope it works.
There's many ways to approach this stuff. But, I would propose to avoid using a lot of complex tooling in exchange for simpler approaches / solutions. Repository is a great way to abstract away data persistence to allow for testing in isolation. If you want to test in isolation, use it. But, I'm not sure that I'm sold much on how ORM relationships work with an object model.
For example, do we have some massive Member object that contains the following?
All comments ever left by that member
All skills the member has
All recommendations that the member has made
All friend invites the member has sent
All friends that the member has established
I don't like the idea of these massive objects that are designed to just be containers for absolutely everything. I prefer to break objects into bits that are specifically designed for use-cases.
But, I'm rambling. In short..
Don't abuse ORM relationship functionality.
It's better to have multiple small objects that are specifically designed for a use-case than a few large ones that do everything.
Just my 2 cents.

do any mongodb ORMs allow you to alias fields?

I just watched this: http://blog.mongodb.org/post/38467892360/mongodb-schema-design-insights-and-tradeoffs-from
One suggestion that came out of the talk: in docs that will be replicated many times, try to make the field names as small as possible:
Reduce collection size by always using short field names as a
convention. This will help you save memory over time.
Choose "u" over "publicationUrl". Makes sense if you're talking about millions of rows. However, big readability problem there. It might be obvious that the value is a url, but what sort of url is it?
This might be solvable in the ORM though. Do any ORMs that interface with MongoDb allow you to say that 'u' in the db would map to 'publicationUrl' in the code? When you have things like a.u in code, that's pretty poor readability; article.u isn't much better.
(Ruby and node.js tags are there because those are the languages that I work with mongo in. Feel free to add tags.)
Per this discussion, Mongoose allows for virtual field names with getters and setters. Unfortunately virtuals can't be used in queries, and other server side operations such as map-reduce. The discussion also suggests this plugin for aliases as well which seems to address the query issue, but I suspect that it would also have trouble with more complex server side operations.
This is easy to with the Ruby ORM Mongoid. Here is an example straight from the docs:
class Band
include Mongoid::Document
field :n, as: :name, type: String
end
band = Band.new(name: "Placebo")
band.attributes #=> { "n" => "Placebo" }
criteria = Band.where(name: "Placebo")
criteria.selector #=> { "n" => "Placebo" }
I have used Mongoid on quite a few projects (albeit, all small ones) and really enjoy working with it. The docs are really great, and there is a section in the docs about performance as well.
Doctrine MongoDB ODM allows you to set an alias for your field while your object including getters and setters can remain readable f.e.:
/** #String(name="pUrl") */
private $publicationUrl;
Annotations Reference — Doctrine MongoDB ODM 1.0.0-BETA2 documentation — Field

What is the better way to cache fields of referenced document in Mongoid?

What is the better way to cache fields of referenced document in Mongoid?
Currently I use additional fields for that:
class Trip
include Mongoid::Document
belongs_to :driver
field :driver_phone
field :driver_name
end
class Driver
include Mongoid::Document
field :name
field :phone
end
May be it would be more clear to store cache as nested object so in mongo it would be stored as:
{ driver_cache: { name: "john", phone: 12345 } }
I thought about embedded document with 1-1 relation? Is that right choice?
Author of Mongoid (Durran Jordan) suggested folowing option
This gem looks handy for this type of thing:
https://github.com/logandk/mongoid_denormalize
Great question. I authored and maintain mongoid_alize, a gem for denormalizing mongoid relations. And so I struggled with this question.
As of 0.3.1 alize now stores data for a denormalized one-to-one in a Hash, very much like your second example above w/ driver_cache. Previous versions, however, stored data in separate fields (your first example).
The change was motivated by many factors, but largely to make the handling of one-to-one's and one-to-many's consistent (alize can denormalize one-to-one, one-to-many, and many-to-many). one-to-many's have always been handled by storing the data in a array of hashes. So storing one-to-one's as just a Hash becomes a much more symmetrical design.
It also solved several other problems. You can find a more detailed explanation here - https://github.com/dzello/mongoid_alize#release-030
Hope that helps!
Either way, you're fine.
First approach seems slightly better because it explicitly states what data you're caching. Also, it (probably) requires less work from Mongoid :-)
Alexey,
I would recommend thinking about how the data will be used. If you always use the driver information in the context of a trip object then embedding is probably the proper choice.
If, however, you will use that information in other contexts perhaps it would be better as it's own collection as you have created it.
Also, consider embedding the trips inside the driver objects. That may or may not make sense given what your app is trying to do but logically it would make sense to have a collection of drivers that each have a set of trips (embedded or not) instead of having trips that embed drivers. I can see that scenario (where a trip is always though of in the context of a driver) to be more common than the above.
-Tyler
Alternative hash storage of your cacheable data:
field :driver_cache, type: Hash
(Keep in mind, internally the keys will be converted to strings.)

Which one do you prefer for Searching/Reporting DataTable or DTO or Domain Class?

The project currently I am working in requires a lot of searhing/filtering pages. For example I have a comlex search page to get Issues by data,category,unit,...
Issue Domain Class is complex and contains lots of value objects and child objects.
.I am wondering how people deal with Searching/Filtering/Reporting for UI. As far As I know I have 3 options but none of them make me happier.
1.) Send parameters to Repository/DAO to Get DataTable and Bind DataTable to UI Controls.For Example to ASP.NET GridView
DataTable dataTable =issueReportRepository.FindBy(specs);
.....
grid.DataSource=dataTable;
grid.DataBind();
In this option I can simply by pass the Domain Layer and query database for given specs. And I dont have to get fully constructed complex Domain Object. No need for value objects,child objects,.. Get data to displayed in UI in DataTable directly from database and show in the UI.
But If have have to show a calculated field in UI like method return value I have to do this in the DataBase because I don't have fully domain object. I have to duplicate logic and DataTable problems like no intellisense etc...
2.)Send parameters to Repository/DAO to Get DTO and Bind DTO to UI Controls.
IList<IssueDTO> issueDTOs =issueReportRepository.FindBy(specs);
....
grid.DataSource=issueDTOs;
grid.DataBind();
In this option is same as like above but I have to create anemic DTO objects for every search page. Also For different Issue search pages I have to show different parts of the Issue Objects.IssueSearchDTO, CompanyIssueTO,MyIssueDTO....
3.) Send parameters to Real Repository class to get fully constructed Domain Objects.
IList<Issue> issues =issueRepository.FindBy(specs);
//Bind to grid...
I like Domain Driven Design and Patterns. There is no DTO or duplication logic in this option.but in this option I have to create lot's of child and value object that will not shown in the UI.Also it requires lot's ob join to get full domain object and performance cost for needles child objects and value objects.
I don't use any ORM tool Maybe I can implement Lazy Loading by hand for this version but It seems a bit overkill.
Which one do you prefer?Or Am I doing it wrong? Are there any suggestions or better way to do this?
I have a few suggestions, but of course the overall answer is "it depends".
First, you should be using an ORM tool or you should have a very good reason not to be doing so.
Second, implementing Lazy Loading by hand is relatively simple so in the event that you're not going to use an ORM tool, you can simply create properties on your objects that say something like:
private Foo _foo;
public Foo Foo
{
get {
if(_foo == null)
{
_foo = _repository.Get(id);
}
return _foo;
}
}
Third, performance is something that should be considered initially but should not drive you away from an elegant design. I would argue that you should use (3) initially and only deviate from it if its performance is insufficient. This results in writing the least amount of code and having the least duplication in your design.
If performance suffers you can address it easily in the UI layer using Caching and/or in your Domain layer using Lazy Loading. If these both fail to provide acceptable performance, then you can fall back to a DTO approach where you only pass back a lightweight collection of value objects needed.
This is a great question and I wanted to provide my answer as well. I think the technically best answer is to go with option #3. It provides the ability to best describe and organize the data along with scalability for future enhancements to reporting/searching requests.
However while this might be the overall best option, there is a huge cost IMO vs. the other (2) options which are the additional design time for all the classes and relationships needed to support the reporting needs (again under the premise that there is no ORM tool being used).
I struggle with this in a lot of my applications as well and the reality is that #2 is the best compromise between time and design. Now if you were asking about your busniess objects and all their needs there is no question that a fully laid out and properly designed model is important and there is no substitute. However when it comes to reporting and searching this to me is a different animal. #2 provides strongly typed data in the anemic classes and is not as primitive as hardcoded values in DataSets like #1, and still reduces greatly the amount of time needed to complete the design compared to #3.
Ideally I would love to extend my object model to encompass all reporting needs, but sometimes the effort required to do this is so extensive, that creating a separate set of classes just for reporting needs is an easier but still viable option. I actually asked almost this identical question a few years back and was also told that creating another set of classes (essentially DTOs) for reporting needs was not a bad option.
So to wrap it up, #3 is technically the best option, but #2 is probably the most realistic and viable option when considering time and quality together for complex reporting and searching needs.

Resources