How to extract Mongoid documents based on a field value in the first or last embedded document?

I wish to find Order documents based on a field in the last embedded Notification document.
In the example below I wish to find all pending orders that have one or more embedded notifications, and where the last notification has a datetime between 5 and 10 days old.
My attempt here doesn't seem to do the trick...:
Order.where(status: 'pending').gte('notifications.last.datetime' => 5.days.ago).lte('notifications.last.datetime' => 10.days.ago)
Here are the two models:
class Order
  include Mongoid::Document
  field :datetime, type: DateTime
  field :status, type: String, default: 'pending'
  embeds_many :notifications, :inverse_of => :order
end

class Notification
  include Mongoid::Document
  field :datetime, type: DateTime
  embedded_in :order, :inverse_of => :notifications
end

The main issue of the question seems to be how to refer to the LAST element of an array in the query.
Unfortunately, that is impossible as of MongoDB 2.4.
The simplest way to implement this would be to use a negative index to point to an element from the end of an array, like 'notifications.-1.datetime', but that doesn't work. (Refer to [#SERVER-5565] Handle negative array offsets consistently - MongoDB.)
To make matters worse, it also seems impossible to solve this with the Aggregation Framework. There is no way to
add an array index to each element when $unwinding ([#SERVER-4588] aggregation: add option to $unwind to emit array index - MongoDB) or
select the index of an array dynamically when $projecting ([#SERVER-4589] aggregation: need an array indexing operator - MongoDB).
Therefore, the only option you have seems to be to change the schema to match what you want. The simplest way is to add one more field to Order that contains the datetime of the last Notification.
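A minimal sketch of that schema change, assuming a field named last_notification_at (the field name and the manual sync step are my own illustration, not part of the original answer):
class Order
  include Mongoid::Document
  field :datetime, type: DateTime
  field :status, type: String, default: 'pending'
  field :last_notification_at, type: DateTime # denormalized copy
  embeds_many :notifications, :inverse_of => :order
end

# keep the copy in sync whenever a notification is appended
order.notifications.create(datetime: Time.now)
order.update_attribute(:last_notification_at, order.notifications.last.datetime)

# the range query then becomes straightforward (and indexable)
Order.where(status: 'pending')
     .gte(last_notification_at: 10.days.ago)
     .lte(last_notification_at: 5.days.ago)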
Update:
You can first get all candidates from the server, and then narrow them down on the client side to get the final result set. This involves no schema change. If the database is relatively small, or some degradation of performance is acceptable, this might be the best solution.
# server side: fetch candidates -- any order with at least one
# notification inside the range
query = Order.where(status: 'pending').elem_match(
  notifications: { datetime: { '$gte' => 10.days.ago, '$lte' => 5.days.ago } })
# client side: keep only orders whose LAST notification is in the range
query.select do |order|
  # for the FIRST notification instead: order.notifications[0].datetime
  datetime = order.notifications[order.notifications.size - 1].datetime
  10.days.ago <= datetime && datetime <= 5.days.ago
end.each do |order|
  p order # result
end

I know it comes a little late, but hey, better late than never. :P
You can use JavaScript in where:
Order.where("this.notifications[this.notifications.length - 1].datetime > new Date('#{5.days.ago}')")
Just found that out, and it was a huge relief not having to change my models. Hope that helps!
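Note that this runs as a server-side $where JavaScript clause, so it cannot use indexes and will be slow on large collections. A hedged sketch (untested) of the same trick extended to the full 5-to-10-day window from the question:
Order.where(status: 'pending').where(
  "var n = this.notifications; n && n.length && " +
  "n[n.length - 1].datetime >= new Date('#{10.days.ago}') && " +
  "n[n.length - 1].datetime <= new Date('#{5.days.ago}')")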

Related

Mongoid's .includes() Not Populating Relations

I am using Mongoid v4.0.2, and I'm running into an interesting issue using .includes(). I have a record that represents invoices, which has a list of charges.
I want to query for a single invoice and have the charges be populated after I run the query. According to the docs (search for "Eager Loading"), I should be able to do something like this to have Mongoid populate the charges:
Invoice.includes(:charges).find_by({ _id: <objectId> })
When I get the record back, the charges are still showing up as a list of ObjectIds, and removing the .includes() seems to have no effect one way or the other. I've verified each charge exists in the record I'm querying for, so I'm confused about why they aren't populating.
I believe I have the data models set up correctly, but I'll include them here for completeness.
class Invoice
  include Mongoid::Document
  has_many :charges
  field :status, type: String
  field :created, type: Time, default: -> { Time.now }
end

class Charge
  include Mongoid::Document
  field :created, type: Time, default: -> { Time.now }
  field :transactionId, type: String
  field :category, type: String
  field :amount, type: Float
  field :notes, type: String
  belongs_to :invoices
end
There is no reason to use includes if you are only finding one document. Just find the document and then access the relation. Either way, 2 database requests will be issued.
The only time includes provides a performance increase is when you are loading multiple relations for multiple documents. What Mongoid does is load the queried documents, gather all of the ids that should be queried across those documents, and then fetch all of the relations in one database call using the :id.in => ids feature. In your case, there is no point in doing this.
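For illustration, a hypothetical case where includes does pay off (the status filter is made up), loading charges for many invoices in two queries instead of one per invoice:
# 1 query for the invoices + 1 query for all of their charges,
# rather than 1 + N separate charge lookups
invoices = Invoice.includes(:charges).where(status: 'open').to_a
invoices.each do |invoice|
  invoice.charges.each { |charge| puts charge.amount }
end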

How to write code to NOT include null value fields into mongodb

class State
  include Mongoid::Document
  embeds_many :cities
  field :name
end

class City
  include Mongoid::Document
  embedded_in :state
  field :name
  field :population
  field ...
end
I don't want to include fields with nil values in MongoDB,
nsw = State.new name: 'NSW'
if number_of_people
  nsw.cities.create name: 'Sydney', population: number_of_people
else
  nsw.cities.create name: 'Sydney'
end
so it is necessary to check whether or not each field is empty or nil. But the problem is that when there are many fields in City, the code looks ugly.
How can I improve this and write smarter code?
You need to define a custom class method in the City model like the following:
def self.create_persistences(fields = {})
  attributes = {}
  fields.each do |key, value|
    attributes[key] = value if value
  end
  create attributes
end
and in your controller, call this method without the conditions hassle:
nsw.cities.create_persistences name: 'Sydney', population: number_of_people
Note: you could also override the create method on your model instead of defining a new one, but in my opinion it's better not to override something you may use in other parts of the code.
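For reference, a shorter equivalent using Hash#reject (my own variation, not part of the answer above); note that it drops only nil values, whereas the version above also drops false:
def self.create_persistences(fields = {})
  create fields.reject { |_key, value| value.nil? }
end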
Now that we know what you are doing, the answer seems clear. But I think your question needs an edit to make that explicit.
So what you have is data from some source that you are using to populate your new model. So at some stage you are going to have a hash, or at least some way of constructing a hash, from however your data is organized. Take the following (short form, but the same thing):
info = { name: "Sydney", population: 100 }
City.new(info)

info = { name: "Melbourne", population: 80, info: "fun" }
City.new(info)

info = { name: "Adelaide" }
City.new(info)
So (at least in my testing), you are going to get each document with only the specified fields created each time.
So dynamically using the hash (and hopefully you are even reading your data in that way) is going to be a lot smarter than testing each value in code; see the sketch after this answer.
If you have to do a lot of value testing just to build up a hash, then you have problems that no-one here can fix. But building hashes should be easy.
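For instance, a hypothetical sketch where the attribute hashes come straight from parsed source data (the file name and JSON layout are made up), so absent fields simply never appear in the hash:
require 'json'
# each parsed hash contains only the keys present in the source,
# so each embedded document gets only those fields
JSON.parse(File.read('cities.json')).each do |info|
  nsw.cities.create(info)
end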

Sequel join: ".id" is returning the id of the other table

I have a Ruby app which uses Ramaze and Sequel.
I have the classes Tag and Tagging which have this relationship:
class Tag < Sequel::Model
  one_to_many :taggings, :class => 'Thoth::Tagging'
end

class Tagging < Sequel::Model(:taggings)
  many_to_one :tag, :class => 'Thoth::Tag'
end
I want to return a list of tags in order of popularity, the ones that have the most taggings (filtering out any that have less than three taggings). I'm doing that this way:
tags = Tag.left_outer_join(:taggings, :tag_id => :id).group(:tag_id).having("count(*) > 2").order("count(*) desc").all
This does return what seem to be tag objects, but when I call .id on them, I get the id of a tagging that points to the tag, rather than the id of the tag itself.
On closer inspection, the results are quite different from a regular find:
> tag_regular = Tag[2]
=> #<Thoth::Tag #values={:title=>nil, :position=>nil, :parent_id=>1, :name=>"academic", :id=>2}>
> tag_from_join = Tag.join(:taggings, :tag_id => :id).group(:tag_id).having("count(*) > 2").order("count(*) desc").all.select{|tag| tag.name == "academic"}.first
=> #<Thoth::Tag #values={:tag_id=>2, :post_id=>5, :title=>nil, :position=>nil, :parent_id=>1, :name=>"academic", :id=>1611, :created_at=>nil}>
In both cases I get a Thoth::Tag, but the values are quite different, based on the different fields in the join I suppose.
All I actually need to do is get a list of regular tag objects sorted by the number of taggings, but in an efficient single-query way. Is there a better way?
The default selection is *, so you are selecting columns from both tags and taggings. Because Sequel returns records as a hash keyed by column name, if you have an id column in both tables, the taggings columns will override the same-named columns from tags.
If you only want the columns from tags, add select_all(:tags) to the dataset.
The Sequel master branch has a table_select plugin that will handle this situation by default.
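A minimal sketch of that fix applied to the query from the question (untested; depending on your database's strictness you may need to group by every selected column):
# select only tags.* so taggings.id can no longer shadow tags.id
tags = Tag.left_outer_join(:taggings, :tag_id => :id).
  select_all(:tags).
  group(:tag_id).
  having("count(*) > 2").
  order("count(*) desc").
  all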

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with Twitter data. I'm using Ruby 1.9.3 and the Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields), and then writing the new documents back into Mongo.
The code below works, but it runs relatively slowly, as I loop through using .each and then insert new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []},
                  {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name'],
                           }]
                }
    @collection.save(@document)
  end # end if
end # end each
Any help is greatly appreciated.
In your case there is no way to make this much faster. One thing you could do is retrieve the documents in bulk, process them, and then reinsert them in bulk, but it would still be slow.
To speed this up you need to do all the processing server side, where the data already exists.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility but slower execution (still much faster than your current approach), MongoDB's MapReduce framework.
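For the bulk-write part specifically, here is a hedged sketch (the batch size and the reduced field list are my own choices; with the 1.x mongo gem that matches Ruby 1.9.3, Collection#insert accepts an array of documents and writes them in one round trip):
batch = []
cursor.each do |r|
  next unless r['lang'] == 'en'
  batch << {:id => r['id'],
            :score => r['retweet_count'] + r['favorite_count'],
            :timestamp => Time.now.strftime("%d/%m/%Y %H:%M")}
  if batch.size >= 1000
    @collection.insert(batch) # one round trip for 1000 documents
    batch.clear
  end
end
@collection.insert(batch) unless batch.empty?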
What exactly are you doing? Why not go pure Ruby, or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
What I've understood from your code is that you actually create a completely new document, and I think that's wrong.
You can do it on the Ruby side like this:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end # end if
end # end each
And of course you can include Mongoid::Timestamps in your model, and it handles the created_at and updated_at attributes for you (it creates them itself).
In the mongo shell it's a little harder:
first select your database with use my_db, then the following code will do what you want
db.models.find({something: your_param}).forEach(function(doc) {
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters are, but they are easy to create. Also, Mongoid really does lazy loading, so if you don't touch an attribute, it won't load it; you can actually save a lot of time by not using every attribute.
And these methods change the existing document rather than creating a new one.

multiple levels of associated db objects to YAML

I need to create a 'List' object from the following db tables. I've already done this in a Rails/DataMapper application, but now I need to get specific lists into and out of a db through YAML.
List
Categories
Items
Item choices
e.g. given a list identifier, pull the initial list, the categories for that list, the items for those categories, and the choices for those items into some object, then output it as a YAML file.
My first step is to output a specific list to YAML; this shouldn't be a unique situation, and I'm sure others have solved it before. From reading, I'm guessing I need a multilevel hash of some sort, but all I've been able to do so far is get the list and categories, i.e. this is a bit out of my range right now, and I'm only working from the command line.
I'm asking for two things, really, to help sharpen my skill set:
guidance on working with a multiple-level, nested hash situation to properly serialize an object for YAML, given a series of associated db tables
whether there is an easier way that someone has already solved.
The built-in to_json (doc) method already allows you to easily nest related records and choose what you want to output:
List.all.to_json(:only => {}, :include => {
  :categories => { :only => {}, :include => {
    :items => { :only => :your_attribute_name }
  } }
})
The next step is to convert it to YAML:
ActiveSupport::JSON.decode(your_json).to_yaml
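Putting it together, a hypothetical end-to-end sketch for a single list (the item_choices association name and the attribute lists are placeholders for your schema):
json = List.find(list_id).to_json(:only => [:name], :include => {
  :categories => { :only => [:name], :include => {
    :items => { :only => [:name], :include => {
      :item_choices => { :only => [:name] }
    } }
  } }
})
File.write("list_#{list_id}.yml", ActiveSupport::JSON.decode(json).to_yaml)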
Hope this helps
