
What is the best way to cache fields of a referenced document in Mongoid?
Currently I use additional fields for that:
class Trip
  include Mongoid::Document
  belongs_to :driver
  field :driver_phone
  field :driver_name
end

class Driver
  include Mongoid::Document
  field :name
  field :phone
end
Maybe it would be clearer to store the cache as a nested object, so in Mongo it would be stored as:
{ driver_cache: { name: "john", phone: 12345 } }
I also thought about an embedded document with a 1-1 relation. Is that the right choice?
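For comparison, here is what the two storage shapes look like as plain hashes (a sketch only; the field names follow the question, the values are illustrative):

```ruby
# Plain-Ruby sketch of the two cache shapes (no Mongoid required).
driver = { "name" => "john", "phone" => "12345" }

# First approach: separate flat fields on the trip document
trip_flat = {
  "driver_name"  => driver["name"],
  "driver_phone" => driver["phone"],
}

# Second approach: a single nested cache hash
trip_nested = {
  "driver_cache" => { "name" => driver["name"], "phone" => driver["phone"] },
}
```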

The author of Mongoid (Durran Jordan) suggested the following option:
This gem looks handy for this type of thing:
https://github.com/logandk/mongoid_denormalize

Great question. I wrote and maintain mongoid_alize, a gem for denormalizing Mongoid relations, so I have wrestled with this question myself.
As of 0.3.1, alize stores data for a denormalized one-to-one in a Hash, very much like your second example above with driver_cache. Previous versions, however, stored data in separate fields (your first example).
The change was motivated by many factors, but largely to make the handling of one-to-ones and one-to-manys consistent (alize can denormalize one-to-one, one-to-many, and many-to-many relations). One-to-manys have always been handled by storing the data in an array of hashes, so storing a one-to-one as just a Hash makes for a much more symmetrical design.
It also solved several other problems. You can find a more detailed explanation here - https://github.com/dzello/mongoid_alize#release-030
Hope that helps!

Either way, you're fine.
The first approach seems slightly better because it explicitly states what data you're caching. Also, it (probably) requires less work from Mongoid :-)

Alexey,
I would recommend thinking about how the data will be used. If you always use the driver information in the context of a trip object, then embedding is probably the proper choice.
If, however, you will use that information in other contexts, perhaps it would be better as its own collection, as you have created it.
Also, consider embedding the trips inside the driver objects. That may or may not make sense given what your app is trying to do, but logically it would make sense to have a collection of drivers that each have a set of trips (embedded or not) instead of having trips that embed drivers. I can see that scenario (where a trip is always thought of in the context of a driver) being more common than the above.
-Tyler

Alternative hash storage of your cacheable data:
field :driver_cache, type: Hash
(Keep in mind, internally the keys will be converted to strings.)
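The key conversion can be illustrated in plain Ruby (a sketch of the behavior, not Mongoid's internal code): symbol keys written into a Hash field come back as strings when read from the database.

```ruby
# Symbol keys are stored as strings; Hash#transform_keys shows the effect.
cache  = { name: "john", phone: "12345" }
stored = cache.transform_keys(&:to_s)
# stored => { "name" => "john", "phone" => "12345" }
```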

Related

What is a "fully-hydrated User object" in the context of GraphQL? [duplicate]

When someone talks about hydrating an object, what does that mean?
I see a Java project called Hydrate on the web that transforms data between different representations (RDBMS to OO to XML). Is this the general meaning of object hydration: transforming data between representations? Could it mean reconstructing an object hierarchy from a stored representation?
Hydration refers to the process of filling an object with data. An object which has not yet been hydrated has been instantiated and represents an entity that does have data, but the data has not yet been loaded into the object. This is something that is done for performance reasons.
Additionally, the term hydration is used when discussing plans for loading data from databases or other data sources. Here are some examples:
You could say that an object is partially hydrated when you have only loaded some of the fields into it, but not all of them. This can be done because those other fields are not necessary for your current operations. So there's no reason to waste bandwidth and CPU cycles loading, transferring, and setting this data when it's not going to be used.
Additionally, there are some ORM's, such as Doctrine, which do not hydrate objects when they are instantiated, but only when the data is accessed in that object. This is one method that helps to not load data which is not going to be used.
With respect to the more generic term hydrate
Hydrating an object is taking an object that exists in memory, that doesn't yet contain any domain data ("real" data), and then populating it with domain data (such as from a database, from the network, or from a file system).
From Erick Robertson's comments on this answer:
deserialization == instantiation + hydration
If you don't need to worry about blistering performance, and you aren't debugging performance optimizations that are in the internals of a data access API, then you probably don't need to deal with hydration explicitly. You would typically use deserialization instead so you can write less code. Some data access APIs don't give you this option, and in those cases you'd also have to explicitly call the hydration step yourself.
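The "deserialization == instantiation + hydration" split can be sketched in Ruby (the class and fields are purely illustrative, not from any framework):

```ruby
require "json"

# Illustrative domain class
class User
  attr_accessor :name, :email
end

# Instantiation: an empty, not-yet-hydrated object.
# (allocate skips initialize, similar in spirit to PHP's
# ReflectionClass#newInstanceWithoutConstructor shown later.)
user = User.allocate

# Hydration: fill the instance with domain data parsed from a stored format
data = JSON.parse('{"name":"Ada","email":"ada@example.com"}')
user.name  = data["name"]
user.email = data["email"]
```

A typical deserializer performs both steps for you in one call; doing them separately is mainly useful when you need fine-grained control over what gets loaded and when.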
For a bit more detail on the concept of Hydration, see Erick Robertson's answer on this same question.
With respect to the Java project called hydrate
You asked about this framework specifically, so I looked into it.
As best as I can tell, I don't think this project used the word "hydrate" in a very generic sense. I see its use in the title as an approximate synonym for "serialization". As explained above, this usage isn't entirely accurate:
See: http://en.wikipedia.org/wiki/Serialization
translating data structures or object state into a format that can be stored [...] and reconstructed later in the same or another computer environment.
I can't find the reason behind their name directly on the Hydrate FAQ, but I got clues to their intention. I think they picked the name "Hydrate" because the purpose of the library is similar to the popular sound-alike Hibernate framework, but it was designed with the exact opposite workflow in mind.
Most ORMs, Hibernate included, take an in-memory object-model oriented approach, with the database taking second consideration. The Hydrate library instead takes a database-schema oriented approach, preserving your relational data structures and letting your program work on top of them more cleanly.
Metaphorically speaking, still with respect to this library's name: Hydrate is like "making something ready to use" (like re-hydrating Dried Foods). It is a metaphorical opposite of Hibernate, which is more like "putting something away for the winter" (like Animal Hibernation).
The decision to name the library Hydrate, as far as I can tell, was not concerned with the generic computer programming term "hydrate".
When using the generic computer programming term "hydrate", performance optimizations are usually the motivation (or debugging existing optimizations). Even if the library supports granular control over when and how objects are populated with data, the timing and performance don't seem to be the primary motivation for the name or the library's functionality. The library seems more concerned with enabling end-to-end mapping and schema-preservation.
While it is somewhat redundant vernacular as Merlyn mentioned, in my experience it refers only to filling/populating an object, not instantiating/creating it, so it is a useful word when you need to be precise.
This is a pretty old question, but it seems that there is still confusion over the meaning of the following terms. Hopefully, this will disambiguate.
Hydrate
When you see descriptions that say things like, "an object that is waiting for data, is waiting to be hydrated", that's confusing and misleading. Objects don't wait for things, and hydration is just the act of filling an object with data.
Using JavaScript as the example:
const obj = {}; // empty object
const data = { foo: true, bar: true, baz: true };
// Hydrate "obj" with "data"
Object.assign(obj, data);
console.log(obj.foo); // true
console.log(obj.bar); // true
console.log(obj.baz); // true
Anything that adds values to obj is "hydrating" it. I'm just using Object.assign() in this example.
Since the terms "serialize" and "deserialize" were also mentioned in other answers, here are examples to help disambiguate the meaning of those concepts from hydration:
Serialize
console.log(JSON.stringify({ foo: true, bar: true, baz: true }));
Deserialize
console.log(JSON.parse('{"foo":true,"bar":true,"baz":true}'));
In PHP, you can instantiate a class from its name, without invoking its constructor, like this:
require "A.php";
$className = "A";
$class = new \ReflectionClass($className);
$instance = $class->newInstanceWithoutConstructor();
Then you can hydrate the instance by invoking setters (or setting public attributes).

Laravel / Eloquent special relation type based on parsed string attribute

I have developed a system where various classes have attributes consisting of a custom formula. The formula can contain special tokens which refer to different types of object. For example an object of class FruitSalad may have the following attribute;
$contents = "[A12] + [B76]";
In somewhat abstract terms, this means "add apple 12 to banana 76". It can also get significantly more complex than that with as many as 15 or 20 references to other objects involved in one formula.
I have a trait which parses formulae such as this, and each time it finds a reference to a model (e.g. "[A12]") it fetches it from the database with A::find(12) and adds it to an array of component objects which can be used by other processes later in the request.
So, in essence, it's a relationship. But instead of a pivot table to describe the relationship, there is a formula on the parent model which can include references to child models.
This is all working. Yay! But it's really inefficient because there are so many tiny queries to get single models as formulae are parsed. One request may quite easily result in hundreds of queries. Oops.
I see two potential options;
1. Get all my apples and bananas from the database at the start of the request and get them from an in-memory store instead of from the database when parsing a formula (is this the repository pattern??).
2. Create a custom relation type (something like hasManyFromFormula) which makes eager loading work so that the parsing becomes much simpler because the relevant apples and bananas would already be loaded into the parent model.
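The batching in option 1 can be sketched in a few lines (the regex assumes the single-letter token format shown above; the grouping step and names are illustrative, not Eloquent code):

```ruby
# Scan the formula once, batch ids per model class, so each class needs
# only one find-many query instead of one query per token.
formula = "[A12] + [B76] + [A3]"

refs = formula.scan(/\[([A-Z])(\d+)\]/)   # => [["A", "12"], ["B", "76"], ["A", "3"]]
ids_by_class = refs
  .group_by(&:first)
  .transform_values { |pairs| pairs.map { |_, id| id.to_i } }
# ids_by_class => { "A" => [12, 3], "B" => [76] }
# An Eloquent-style batch fetch could then run one whereIn per class.
```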
Is there a precedent for this? As for why I am doing it like this, it would be a bit tough to explain in brief, but suffice to say it is to support a highly configurable data-retrieval system which supports as-yet-unknown input data configurations.
Help!
Thanks,
Geoff
I'm not completely sure it is the best solution, but in the end I created a directory class for the basic components and registered it in the app service provider as a singleton. The directory class's constructor loads all models of the several relevant classes and makes them available as collections throughout the app.

do any mongodb ORMs allow you to alias fields?

I just watched this: http://blog.mongodb.org/post/38467892360/mongodb-schema-design-insights-and-tradeoffs-from
One suggestion that came out of the talk: in docs that will be replicated many times, try to make the field names as small as possible:
Reduce collection size by always using short field names as a
convention. This will help you save memory over time.
Choosing "u" over "publicationUrl" makes sense if you're talking about millions of rows, but it creates a big readability problem: it might be obvious that the value is a URL, but what sort of URL is it?
This might be solvable in the ORM, though. Do any ORMs that interface with MongoDB allow you to say that 'u' in the db maps to 'publicationUrl' in the code? When you have things like a.u in code, that's pretty poor readability; article.u isn't much better.
(Ruby and node.js tags are there because those are the languages that I work with mongo in. Feel free to add tags.)
Per this discussion, Mongoose allows virtual field names with getters and setters. Unfortunately, virtuals can't be used in queries or in other server-side operations such as map-reduce. The discussion also suggests this plugin for aliases, which seems to address the query issue, but I suspect it would still have trouble with more complex server-side operations.
This is easy to do with the Ruby ORM Mongoid. Here is an example straight from the docs:
class Band
  include Mongoid::Document
  field :n, as: :name, type: String
end

band = Band.new(name: "Placebo")
band.attributes #=> { "n" => "Placebo" }

criteria = Band.where(name: "Placebo")
criteria.selector #=> { "n" => "Placebo" }
I have used Mongoid on quite a few projects (albeit, all small ones) and really enjoy working with it. The docs are really great, and there is a section in the docs about performance as well.
Doctrine MongoDB ODM allows you to set an alias for your field while your object, including getters and setters, can remain readable, e.g.:
/** @String(name="pUrl") */
private $publicationUrl;
Annotations Reference — Doctrine MongoDB ODM 1.0.0-BETA2 documentation — Field

aws-sdk-ruby AWS::Record::Base records sharing the same domain

We are using the aws-sdk for ruby, specifically AWS::Record::Base.
For various reasons we need to put records of various objects within the same domain in sdb.
The approach we thought we'd use here would be to add an attribute to each object that contains the object name and then include that in the where clause of finder methods when obtaining objects from sdb.
My questions to readers are:
what are your thoughts on this approach?
how would this best be implemented tidily? How is it best to add a default attribute to an object without defining it explicitly in each model? Is overriding find or where in the finder methods sufficient to ensure that objects obtained from SDB are filtered on the new default attribute?
Thoughts appreciated.
It really depends on your problem, but I find it slightly distasteful. Variant records are fine and dandy, but when you start out with apples and dinosaurs and they have no common attributes, this approach has no benefit that I know of [aside from conserving your (seemingly pointless) quota of 250 SimpleDB domains]. If your records have something in common, then I can see where this approach might be useful, but someone scarred like me by legacy systems with variant records in Btrieve (achieved through C unions) has a hardwired antipathy toward this approach.
The cleanest approach I can think of is to have your models share a common parent through inheritance. The parent could then know of the child types, and implement the query appropriately. However, this design is definitely not SOLID and violates the Law of Demeter.
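The discriminator-attribute idea from the question can be sketched in plain Ruby (no aws-sdk calls; the attribute name, data, and finder are all illustrative — a real version would inject the clause into the models' finder methods):

```ruby
# A shared "domain" holding records of different object types,
# each tagged with a discriminator attribute.
RECORDS = [
  { "object_type" => "Apple",    "id" => 1 },
  { "object_type" => "Dinosaur", "id" => 2 },
  { "object_type" => "Apple",    "id" => 3 },
]

# Every query against the shared domain is scoped by the type attribute.
def find_all_of_type(type)
  RECORDS.select { |r| r["object_type"] == type }
end
```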

How to work around a potential performance issue when using a Grails hasMany relation?

Given the following domain classes:
class Post {
    SortedSet tags
    static hasMany = [tags: Tag]
}

class Tag {
    static belongsTo = Post
    static hasMany = [posts: Post]
}
From my understanding so far, using hasMany will result in a Hibernate Set mapping.
However, in order to maintain uniqueness/order, Hibernate needs to load the entire set from the database and compare the elements' hashes.
This could lead to a significant performance problem with adding and deleting posts/tags if their sets get large. What is the best way to work around this issue?
There is no order ensured by Hibernate/GORM in the default mapping. Therefore, it doesn't have to load elements from the database in order to sort them. You will have your hands on a bunch of ids, but that's the extent of it.
See 19.5.2:
http://www.hibernate.org/hib_docs/reference/en/html/performance-collections.html
In general, Hibernate/GORM is going to have better performance than you expect. Unless and until you can actually prove a real-world performance issue, trust in the framework and don't worry about it.
The ordering of the set is guaranteed by the Set implementation, i.e., the SortedSet. Unless you use a List, which keeps track of indexes in the db, the ordering is server-side only.
If your domain class is in a SortedSet, you have to implement Comparable in order to enable the proper sorting of the set.
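The Comparable requirement looks like this in Ruby terms (an analog only; a Grails domain class would implement java.lang.Comparable, and the Tag field here is illustrative):

```ruby
# A sorted collection relies on <=> to order its elements.
class Tag
  include Comparable
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Define the natural ordering used by sort / sorted collections
  def <=>(other)
    name <=> other.name
  end
end

sorted = [Tag.new("zebra"), Tag.new("apple")].sort
# sorted.map(&:name) => ["apple", "zebra"]
```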
The question of performance is not really a question per se. If you want to access a single Tag, you should get it by its id. If you want the sorted tags, the sort only makes sense when you are looking at all Tags, not a particular one, so you end up retrieving all Tags at once anyway. Since the sorting is performed server-side rather than db-side, there is really not much difference between a SortedSet and a regular HashSet as far as the db is concerned.
The Grails docs seem to have been updated:
http://grails.org/doc/1.0.x/
In section 5.2.4 they discuss the potential performance issues for the collection types.
Here's the relevant section:
A Note on Collection Types and Performance
The Java Set type is a collection that doesn't allow duplicates. In order to ensure uniqueness when adding an entry to a Set association, Hibernate has to load the entire association from the database. If you have a large number of entries in the association, this can be costly in terms of performance.
The same behavior is required for List types, since Hibernate needs to load the entire association in order to maintain order. Therefore, it is recommended that if you anticipate a large number of records in the association, you make the association bidirectional so that the link can be created on the inverse side. For example, consider the following code:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
book.author = author
book.save()
In this example the association link is being created by the child (Book), and hence it is not necessary to manipulate the collection directly, resulting in fewer queries and more efficient code. Given an Author with a large number of associated Book instances, if you were to write code like the following you would see an impact on performance:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
author.addToBooks(book)
author.save()
