I just watched this: http://blog.mongodb.org/post/38467892360/mongodb-schema-design-insights-and-tradeoffs-from
One suggestion that came out of the talk: in docs that will be replicated many times, try to make the field names as small as possible:
Reduce collection size by always using short field names as a
convention. This will help you save memory over time.
Choose "u" over "publicationUrl". Makes sense if you're talking about millions of rows. However, big readability problem there. It might be obvious that the value is a url, but what sort of url is it?
This might be solvable in the ORM though. Do any ORMs that interface with MongoDb allow you to say that 'u' in the db would map to 'publicationUrl' in the code? When you have things like a.u in code, that's pretty poor readability; article.u isn't much better.
(Ruby and node.js tags are there because those are the languages that I work with mongo in. Feel free to add tags.)
Per this discussion, Mongoose allows for virtual field names with getters and setters. Unfortunately virtuals can't be used in queries, and other server side operations such as map-reduce. The discussion also suggests this plugin for aliases as well which seems to address the query issue, but I suspect that it would also have trouble with more complex server side operations.
This is easy to with the Ruby ORM Mongoid. Here is an example straight from the docs:
class Band
include Mongoid::Document
field :n, as: :name, type: String
end
band = Band.new(name: "Placebo")
band.attributes #=> { "n" => "Placebo" }
criteria = Band.where(name: "Placebo")
criteria.selector #=> { "n" => "Placebo" }
I have used Mongoid on quite a few projects (albeit, all small ones) and really enjoy working with it. The docs are really great, and there is a section in the docs about performance as well.
Doctrine MongoDB ODM allows you to set an alias for your field while your object including getters and setters can remain readable f.e.:
/** #String(name="pUrl") */
private $publicationUrl;
Annotations Reference — Doctrine MongoDB ODM 1.0.0-BETA2 documentation — Field
Related
What is the better way to cache fields of referenced document in Mongoid?
Currently I use additional fields for that:
class Trip
include Mongoid::Document
belongs_to :driver
field :driver_phone
field :driver_name
end
class Driver
include Mongoid::Document
field :name
field :phone
end
May be it would be more clear to store cache as nested object so in mongo it would be stored as:
{ driver_cache: { name: "john", phone: 12345 } }
I thought about embedded document with 1-1 relation? Is that right choice?
Author of Mongoid (Durran Jordan) suggested folowing option
This gem looks handy for this type of thing:
https://github.com/logandk/mongoid_denormalize
Great question. I authored and maintain mongoid_alize, a gem for denormalizing mongoid relations. And so I struggled with this question.
As of 0.3.1 alize now stores data for a denormalized one-to-one in a Hash, very much like your second example above w/ driver_cache. Previous versions, however, stored data in separate fields (your first example).
The change was motivated by many factors, but largely to make the handling of one-to-one's and one-to-many's consistent (alize can denormalize one-to-one, one-to-many, and many-to-many). one-to-many's have always been handled by storing the data in a array of hashes. So storing one-to-one's as just a Hash becomes a much more symmetrical design.
It also solved several other problems. You can find a more detailed explanation here - https://github.com/dzello/mongoid_alize#release-030
Hope that helps!
Either way, you're fine.
First approach seems slightly better because it explicitly states what data you're caching. Also, it (probably) requires less work from Mongoid :-)
Alexey,
I would recommend thinking about how the data will be used. If you always use the driver information in the context of a trip object then embedding is probably the proper choice.
If, however, you will use that information in other contexts perhaps it would be better as it's own collection as you have created it.
Also, consider embedding the trips inside the driver objects. That may or may not make sense given what your app is trying to do but logically it would make sense to have a collection of drivers that each have a set of trips (embedded or not) instead of having trips that embed drivers. I can see that scenario (where a trip is always though of in the context of a driver) to be more common than the above.
-Tyler
Alternative hash storage of your cacheable data:
field :driver_cache, type: Hash
(Keep in mind, internally the keys will be converted to strings.)
For a while, I've been working on building a little Ruby library to interface with CouchDB, a neat little document database with a HTTP interface. Key features are:
document objects are glorified hashes
the JavaScript Map/Reduce functions are written in native Ruby, and parsed into JavaScript using S Expressions
you can interface with multiple Couch databases
it should integrate well with micro-frameworks like Camping
I want to be able to do something like this:
#recipes = Recipes.all
Where "Recipes" is a class defining a couple of required keys that the document has (the class name is automatically used as a "kind" key).
But then in tough times I might want to do something like this:
#recipes.each do |recipe|
recipe.cost = "too much!!"
recipe.push!
end
Now, obviously to be able to "push" like that, I either need the database to be.. somewhere in scope.. or for the document object itself to hold a reference to the database object? How is this done in well-established ORMs like ActiveRecord?
I don't want to have to do, you know, recipe.push!(#couch_database_object), or whatever, because that's yucky! But I don't wan to be some scope-polluting scumbag.
Any advice?
I'm working on a Sinatra-based project that uses the Datamapper ORM. I'd like to be able to define criteria for the DM validations in an external YAML file so that less-experienced users of the system can easily tweak the setup. I have this working pretty well as a proof-of-concept, but I suspect there could be a much easier or a least less processor-intensive way to approach this.
Right now, the script loads the YAML file and generates the DM classes with a series of eval statements (I know this already places me on thin ice). The problem is that this process has to happen with every request. My bright idea is to check the YAML for changes, regenerate the classes and export to static source if changes are detected, and include the static files if no changes are detected.
This is proving more difficult than I anticipated because exporting code blocks to strings for serialization isn't as trivial as I expected.
Is this ridiculous? Am I approaching this in an entirely wrong-headed way?
I'm new to Ruby and the world of ORMs, so please forgive my ignorance.
Thanks!
DM validations in an external YAML file so that less-experienced users of the system can easily tweak the setup
A DSL for a DSL. Not having seen your YAML I still wonder how much easier than the DM Validations it really can get?
require 'dm-validations'
class User
include DataMapper::Resource
property :name, String
# Manual validation
validates_length_of :name, :max => 42
# Auto-validation
property :bio, Text, :length => 100..500
end
Instead of going for YAML I would provide the less-experienced users with a couple of relevant validation examples and possibly also a short guideline based on the dm-validations documentation.
It does seem a little crazy to go and put everything in YAML, as that's only a shade easier than writing the validations in Ruby. What you could do is make a DSL in Ruby that makes defining validations much easier, then expose that to your users instead of the whole class.
The project currently I am working in requires a lot of searhing/filtering pages. For example I have a comlex search page to get Issues by data,category,unit,...
Issue Domain Class is complex and contains lots of value objects and child objects.
.I am wondering how people deal with Searching/Filtering/Reporting for UI. As far As I know I have 3 options but none of them make me happier.
1.) Send parameters to Repository/DAO to Get DataTable and Bind DataTable to UI Controls.For Example to ASP.NET GridView
DataTable dataTable =issueReportRepository.FindBy(specs);
.....
grid.DataSource=dataTable;
grid.DataBind();
In this option I can simply by pass the Domain Layer and query database for given specs. And I dont have to get fully constructed complex Domain Object. No need for value objects,child objects,.. Get data to displayed in UI in DataTable directly from database and show in the UI.
But If have have to show a calculated field in UI like method return value I have to do this in the DataBase because I don't have fully domain object. I have to duplicate logic and DataTable problems like no intellisense etc...
2.)Send parameters to Repository/DAO to Get DTO and Bind DTO to UI Controls.
IList<IssueDTO> issueDTOs =issueReportRepository.FindBy(specs);
....
grid.DataSource=issueDTOs;
grid.DataBind();
In this option is same as like above but I have to create anemic DTO objects for every search page. Also For different Issue search pages I have to show different parts of the Issue Objects.IssueSearchDTO, CompanyIssueTO,MyIssueDTO....
3.) Send parameters to Real Repository class to get fully constructed Domain Objects.
IList<Issue> issues =issueRepository.FindBy(specs);
//Bind to grid...
I like Domain Driven Design and Patterns. There is no DTO or duplication logic in this option.but in this option I have to create lot's of child and value object that will not shown in the UI.Also it requires lot's ob join to get full domain object and performance cost for needles child objects and value objects.
I don't use any ORM tool Maybe I can implement Lazy Loading by hand for this version but It seems a bit overkill.
Which one do you prefer?Or Am I doing it wrong? Are there any suggestions or better way to do this?
I have a few suggestions, but of course the overall answer is "it depends".
First, you should be using an ORM tool or you should have a very good reason not to be doing so.
Second, implementing Lazy Loading by hand is relatively simple so in the event that you're not going to use an ORM tool, you can simply create properties on your objects that say something like:
private Foo _foo;
public Foo Foo
{
get {
if(_foo == null)
{
_foo = _repository.Get(id);
}
return _foo;
}
}
Third, performance is something that should be considered initially but should not drive you away from an elegant design. I would argue that you should use (3) initially and only deviate from it if its performance is insufficient. This results in writing the least amount of code and having the least duplication in your design.
If performance suffers you can address it easily in the UI layer using Caching and/or in your Domain layer using Lazy Loading. If these both fail to provide acceptable performance, then you can fall back to a DTO approach where you only pass back a lightweight collection of value objects needed.
This is a great question and I wanted to provide my answer as well. I think the technically best answer is to go with option #3. It provides the ability to best describe and organize the data along with scalability for future enhancements to reporting/searching requests.
However while this might be the overall best option, there is a huge cost IMO vs. the other (2) options which are the additional design time for all the classes and relationships needed to support the reporting needs (again under the premise that there is no ORM tool being used).
I struggle with this in a lot of my applications as well and the reality is that #2 is the best compromise between time and design. Now if you were asking about your busniess objects and all their needs there is no question that a fully laid out and properly designed model is important and there is no substitute. However when it comes to reporting and searching this to me is a different animal. #2 provides strongly typed data in the anemic classes and is not as primitive as hardcoded values in DataSets like #1, and still reduces greatly the amount of time needed to complete the design compared to #3.
Ideally I would love to extend my object model to encompass all reporting needs, but sometimes the effort required to do this is so extensive, that creating a separate set of classes just for reporting needs is an easier but still viable option. I actually asked almost this identical question a few years back and was also told that creating another set of classes (essentially DTOs) for reporting needs was not a bad option.
So to wrap it up, #3 is technically the best option, but #2 is probably the most realistic and viable option when considering time and quality together for complex reporting and searching needs.
Given the following domain classes:
class Post {
SortedSet tags
static hasMany = [tags: Tag]
}
class Tag {
static belongsTo = Post
static hasMany = [posts: Post]
}
From my understanding so far, using a hasMany will result in hibernate Set mapping.
However, in order to maintain uniqueness/order, Hibernate needs to load the entire set from the database and compare their hashes.
This could lead to a significant performance problem with adding and deleting posts/tags
if their sets get large. What is the best way to work around this issue?
There is no order ensured by Hibernate/GORM in the default mapping. Therefore, it doesn't have to load elements from the database in order to do the sorting. You will have your hands on a bunch of ids, but that's that extent of it.
See 19.5.2:
http://www.hibernate.org/hib_docs/reference/en/html/performance-collections.html
In general, Hibernate/GORM is going to have better performance than you expect. Unless and until you can actually prove a real-world performance issue, trust in the framework and don't worry about it.
The ordering of the set is guaranteed by the Set implementation, ie, the SortedSet. Unless you use a List, which keeps track of indexes on the db, the ordering is server-side only.
If your domain class is in a SortedSet, you have to implement Comparable in order to enable the proper sorting of the set.
The question of performance is not really a question per se. If you want to access a single Tag, you should get it by its Id. If you want the sorted tags, well, the sort only makes sense if you are looking at all Tags, not a particular one, so you end up retrieving all Tags at once. Since the sorting is performed server-side and not db-side, there is really not much difference between a SortedSet and a regular HashSet in regards to Db.
The Grails docs seems to be updated:
http://grails.org/doc/1.0.x/
In section 5.2.4 they discuss the potential performance issues for the collection types.
Here's the relevant section:
A Note on Collection Types and Performance
The Java Set type is a collection that doesn't allow duplicates. In order to ensure uniqueness when adding an entry to a Set association Hibernate has to load the entire associations from the database. If you have a large numbers of entries in the association this can be costly in terms of performance.
The same behavior is required for List types, since Hibernate needs to load the entire association in-order to maintain order. Therefore it is recommended that if you anticipate a large numbers of records in the association that you make the association bidirectional so that the link can be created on the inverse side. For example consider the following code:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
book.author = author
book.save()
In this example the association link is being created by the child (Book) and hence it is not necessary to manipulate the collection directly resulting in fewer queries and more efficient code. Given an Author with a large number of associated Book instances if you were to write code like the following you would see an impact on performance:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
author.addToBooks(book)
author.save()