Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with twitter data. I'm using Ruby 1.9.3 and Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below works, but it runs relatively slowly because I loop through the results with .each and then insert the new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name']}]
                }
    @collection.save(@document)
  end #end.if
end #end.each
Any help is greatly appreciated.

In your case there is no way to make this much faster. One thing you could do is retrieve the documents in bulk, process them, and then reinsert them in bulk, but it would still be slow.
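A minimal sketch of that batch approach, assuming the 2.x+ Ruby driver's Mongo::Collection#insert_many (with the 1.x driver used in the question, Collection#insert accepted an array of documents instead). BATCH_SIZE, build_doc and copy_in_batches are hypothetical names; the cursor can be any Enumerable of tweet hashes, and the collection any object with insert_many.

```ruby
# Sketch only: BATCH_SIZE, build_doc and copy_in_batches are hypothetical names.
BATCH_SIZE = 1000

# Builds the new document shape from the question for one raw tweet hash.
def build_doc(r, timestamp)
  {
    id: r['id'],
    id_str: r['id_str'],
    retweet_count: r['retweet_count'],
    favorite_count: r['favorite_count'],
    score: r['retweet_count'] + r['favorite_count'],
    created_at: r['created_at'],
    timestamp: timestamp,
    user: [{ id: r['user']['id'],
             id_str: r['user']['id_str'],
             screen_name: r['user']['screen_name'] }]
  }
end

# Reads the source in slices of batch_size, keeps English tweets, and writes
# each slice with a single insert_many call instead of one save per document.
# Returns the number of documents written.
def copy_in_batches(cursor, collection, batch_size = BATCH_SIZE)
  timestamp = Time.now.strftime('%d/%m/%Y %H:%M')
  written = 0
  cursor.each_slice(batch_size) do |slice|
    docs = slice.select { |r| r['lang'] == 'en' }
                .map { |r| build_doc(r, timestamp) }
    next if docs.empty?

    collection.insert_many(docs) # one round trip per slice
    written += docs.size
  end
  written
end
```

Because filtering happens after slicing, a written batch can be smaller than batch_size; that keeps memory bounded without a separate buffer. As the answer says, this only cuts round trips: every document still travels to Ruby and back.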
To speed this up you need to do all the processing server side, where the data already exist.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility at the cost of slower execution (though still much faster than anything your current approach can reach), MongoDB's MapReduce framework.
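A sketch of the aggregation route, built as plain Ruby data so it can be inspected before running it. It assumes MongoDB 2.6+, whose $out stage writes the results into a collection, so the 16 MB limit then applies to each output document rather than to the whole result. scored_tweets_pipeline and the output collection name are hypothetical.

```ruby
# Hypothetical helper: returns the pipeline as plain data. Needs MongoDB 2.6+
# for $out; run it with collection.aggregate(pipeline) and iterate the result.
def scored_tweets_pipeline(screen_name, out_collection)
  [
    # Same filter as the question, plus the lang == "en" check done in Ruby there.
    { '$match' => { 'user.screen_name' => screen_name,
                    'entities.urls' => [],
                    'lang' => 'en' } },
    # Keep the fields the question copies and compute score on the server.
    { '$project' => { 'id' => 1, 'id_str' => 1,
                      'retweet_count' => 1, 'favorite_count' => 1,
                      'created_at' => 1,
                      'user.id' => 1, 'user.id_str' => 1,
                      'user.screen_name' => 1,
                      'score' => { '$add' => ['$retweet_count', '$favorite_count'] } } },
    # Write the results to a collection instead of returning one big document.
    { '$out' => out_collection }
  ]
end
```

With the 2.x+ driver, aggregate returns a lazy view, so something like raw.aggregate(scored_tweets_pipeline(users[cur], 'scored_tweets')).to_a is needed to actually run it. Note that $out replaces the target collection on every run, and that user stays an embedded document here rather than the one-element array the question builds.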

What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
From your code, I understand that you actually create a completely new document, and I think that's wrong.
On the Ruby side (with Mongoid) you can do it like this:
cursor = YourModel.where(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end #end.if
end #end.each
And of course you can include Mongoid::Timestamps in your model; it maintains the created_at and updated_at attributes for you (it creates them itself).
In the mongo shell it's a little harder. First switch to your database with use my_db; then the following code does what you want:
db.models.find({something: your_param}).forEach(function (doc) {
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Date();
  db.models.save(doc);
});
I don't know what your parameters were, but they are easy to build. Also, Mongoid criteria are evaluated lazily, and you can restrict the loaded fields with .only(...), so you can save a lot of time by not loading every attribute.
And these methods change the existing documents rather than creating new ones.
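For comparison, a sketch of the same in-place change done entirely on the server in one command, with no documents loaded into Ruby. It assumes MongoDB 4.2+ (which added pipeline-style updates and the $$NOW variable) and a Ruby driver recent enough to accept an array of stages as the update argument of update_many. score_in_place is a hypothetical name, and unlike the question it stores the timestamp as a date, not a formatted string.

```ruby
# Hypothetical helper: scores every matching English tweet in place.
# Assumes MongoDB 4.2+ and a driver that accepts a pipeline (array) update.
def score_in_place(collection, filter = {})
  collection.update_many(
    filter.merge('lang' => 'en'),
    [{ '$set' => {
      # Computed per document on the server from its own fields.
      'score' => { '$add' => ['$retweet_count', '$favorite_count'] },
      # $$NOW is the server's current time (a BSON date).
      'timestamp' => '$$NOW'
    } }]
  )
end
```

Usage would look like score_in_place(client[:models], 'user.screen_name' => 'someone'), where client[:models] is an assumed collection handle.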

Related

Optimizing Postgres queries using Sequel gem

I have a Ruby script using the Sequel gem to access my Postgres database.
At the moment, here are my calls:
@db[:items].insert_conflict.insert(:sku => sku, :name => itemName)
dbItem = Item.where(:sku => sku).first
dbPrice = Price.create(:price => foundPrice, :quantity => quantity, :status => status)
dbItem.add_price(dbPrice)
store.add_price(dbPrice)
Unfortunately, insert_conflict.insert returns the key if the row is inserted and nil if it already exists; otherwise I could use that one call to get the actual object that was inserted or already existed (see the second line). Is there any other way to reduce this to one call?
As for the last three calls, I believe since I'm adding relationships between three different schemas, there's no way to reduce this. But I'm including it just in case.

Mongodb and Ruby gem - Check if record exists

I have a simple Ruby script (no Rails, Sinatra, etc.) that uses the Mongo gem to insert records into my DB as part of a Redis/Resque worker.
Occasionally, instead of doing a fresh insert, I'd like to update a counter field on an existing record. I can do this easily with Rails/MySQL. What's the quickest way to do this in pure Ruby with MongoDB?
Thanks,
Ed
The Ruby client library for MongoDB is very convenient and easy to use. So, to update a document in MongoDB, use something similar to this:
#!/usr/bin/ruby
require 'mongo'

database = Mongo::Connection.new.db("yourdatabasename")
collection = database["collection"]
# get the document
x = collection.find_one({"_id" => "12312132"})
# change the document
x["count"] = (x["count"] || 0) + 1
# update it in mongodb
collection.update({"_id" => x["_id"]}, x)
You might want to check out the manual for updating documents in MongoDB as well.
Thanks to envu's direction, I went with upsert in the end. Here is an example snippet of how to use it with the Ruby client:
link_id = @globallinks.update(
  { "url" => "http://somevalue.com" },
  {
    '$inc' => { "totalcount" => 1 },
    '$set' => { "timelastseen" => Time.now }
  },
  { :upsert => true }
)
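The same upsert with the 2.x+ Ruby driver, where the 1.x update(selector, document, :upsert => true) call became update_one(filter, update, upsert: true). A minimal sketch: record_link is a hypothetical helper name, and the collection is assumed to be something like client[:globallinks]. When no document matches, the upsert creates one with url set from the filter and totalcount 1.

```ruby
# Hypothetical helper: one atomic round trip that either bumps the counter on
# an existing link document or creates it with totalcount 1.
def record_link(collection, url, now = Time.now)
  collection.update_one(
    { 'url' => url },                       # filter (also seeds a new document)
    { '$inc' => { 'totalcount' => 1 },      # atomic increment, no read needed
      '$set' => { 'timelastseen' => now } },
    upsert: true                            # insert when nothing matches
  )
end
```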

Mongoid: multiple checks on a single field

I need to select transactions with the same type as a given transaction. And I need to check that it doesn't return all transactions with the nil type.
With ActiveRecord I can easily write:
given_transaction = Transaction.first
needed_transactions = Transaction.where('type != nil and type = ?', given_transaction.type)
and all works
When I try to write the same thing with Mongoid:
needed_transactions = Transaction.where(:type => given_transaction.type, :type.ne => nil)
It generates the following query:
"query"=>{:type=>{"$ne"=>"planned"}}
In other words, mongoid ignores the first check and only uses the last check on the field.
I tried "all_of", "all_in", "and" — and still can't find the working solution.
Maybe I am doing something wrong... My world is going upside down because of this... :(((
From the fine manual:
All queries in Mongoid are Criteria, which is a chainable and lazily evaluated wrapper to a MongoDB dynamic query.
And looking at the Criteria docs for where, we see a bunch of examples with a single condition. But remember the chainability mentioned above; perhaps you're looking for this:
needed_transactions = Transaction.where(:type => given_transaction.type).where(:type.ne => nil)
The Criteria#and docs might make good reading as well:
Adds another simple expression that must match in order to return results. This is the same as Criteria#where and is mostly here for syntactic sugar.
MONGOID
# Match all people with last name Jordan and first name starting with d.
Person.where(last_name: "Jordan").and(first_name: /^d/i)
MONGODB QUERY SELECTOR
{ "last_name" : "Jordan", "first_name" : /^d/i }
I have to admit that I don't understand why you're checking :type twice like that though; if given_transaction.type.nil? is possible then you could deal with that without even querying your database.
And BTW, with ActiveRecord you'd want to say this:
Transaction.where('type is not null and type = ?', given_transaction.type)
As far as the strange query you're getting is concerned, when you do this:
Transaction.where(:type => given_transaction.type, :type.ne => nil)
Mongoid ends up trying to build a Hash with two values for the :type key:
{ :type => 'planned' }
{ :type => { :$ne => nil } }
and somehow it ends up replacing the nil with 'planned'. I don't know the internal details of Mongoid's where or the methods it patches into Symbol; I'm just backtracking from the observed behavior.
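The lost condition is ordinary Hash behavior: one key holds one value. This plain-Ruby sketch (not how Mongoid is implemented; merge_field_conditions is a hypothetical helper) shows the clobbering and two ways a selector can keep both checks on type.

```ruby
# Plain Ruby hashes, not Mongoid internals.
equal_cond   = { 'type' => 'planned' }
not_nil_cond = { 'type' => { '$ne' => nil } }

# A Hash holds one value per key, so merging keeps only the last condition.
clobbered = equal_cond.merge(not_nil_cond) # only the $ne condition survives

# Option 1: keep both conditions side by side under $and.
combined_and = { '$and' => [equal_cond, not_nil_cond] }

# Option 2: merge the operators for each field into one operator document.
# A bare value is rewritten as $eq (an operator available since MongoDB 3.0).
def merge_field_conditions(*conds)
  conds.each_with_object({}) do |cond, acc|
    cond.each do |field, value|
      ops = value.is_a?(Hash) ? value : { '$eq' => value }
      (acc[field] ||= {}).merge!(ops)
    end
  end
end

combined_ops = merge_field_conditions(equal_cond, not_nil_cond)
# combined_ops is { 'type' => { '$eq' => 'planned', '$ne' => nil } }
```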

Find documents including element in Array field with mongomapper?

I am new to mongodb/mongomapper and can't find an answer to this.
I have a mongomapper class with the following fields
key :author_id, Integer
key :partecipant_ids, Array
Let's say I have a "record" with the following attributes:
{ :author_id => 10, :partecipant_ids => [10,15,201] }
I want to retrieve all the objects where the partecipant with id 15 is involved.
I did not find any mention in the documentation.
The strange thing is that previously I was doing this query
MessageThread.where :partecipant_ids => [15]
which worked, but after (maybe) some change in the gem/mongodb version it stopped working.
Unfortunately I don't know which version of mongodb and mongomapper I was using before.
In the current versions of MongoMapper, this will work:
MessageThread.where(:partecipant_ids => 15)
And this should work as well...
MessageThread.where(:partecipant_ids => [15])
...because plucky autoexpands that to:
MessageThread.where(:partecipant_ids => { :$in => [15] })
(see https://github.com/jnunemaker/plucky/blob/master/lib/plucky/criteria_hash.rb#L121)
I'd say take a look at your data and try out queries in the Mongo console to make sure you have a working query. MongoDB queries translate directly to MM queries except for the above (and a few other minor) caveats. See http://www.mongodb.org/display/DOCS/Querying
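To see why 15 and [15] behave differently, here is a toy model in plain Ruby of how MongoDB compares a condition with an array field: a scalar matches if any element equals it, a whole array must match the stored array exactly, and $in matches if any element is in the list. It is a simplification (it ignores nested arrays and other operators), and field_matches? is a hypothetical name.

```ruby
# Toy model (not MongoDB's real matcher) of equality and $in against a field.
def field_matches?(field_value, condition)
  if condition.is_a?(Hash) && condition.key?('$in')
    # $in: the field (or any of its elements) must appear in the list.
    candidates = field_value.is_a?(Array) ? field_value : [field_value]
    return !(candidates & condition['$in']).empty?
  end
  # Whole-value equality, e.g. [10, 15, 201] == [10, 15, 201].
  return true if field_value == condition

  # Element match: a scalar condition matches any equal array element.
  field_value.is_a?(Array) && field_value.include?(condition)
end

ids = [10, 15, 201]
field_matches?(ids, 15)                # true
field_matches?(ids, [15])              # false
field_matches?(ids, { '$in' => [15] }) # true
```

So without plucky's rewrite to $in, a raw :partecipant_ids => [15] condition would only find threads whose participant list is exactly [15].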

Retrieving array of ids in Mongoid

How do you retrieve an array of IDs in Mongoid?
arr=["id1","id2"]
User.where(:id=>arr)
You can do this easily if you are retrieving another attribute
User.where(:nickname.in=>["kk","ll"])
But I am wondering how to do this in mongoid -> this should be a very simple and common operation
Remember that the ID is stored as :_id and not :id . There is an id helper method, but when you do queries, you should use :_id:
User.where(:_id.in => arr)
Often I find it useful to get a list of ids to do complex queries, so I do something like:
user_ids = User.only(:_id).where(:foo => :bar).distinct(:_id)
Post.where(:user_id.in => user_ids)
Or simply:
arr = ['id1', 'id2', 'id3']
User.find(arr)
The above method suggested by browsersenior doesn't seem to work anymore, at least for me. What I do is:
User.criteria.id(arr)
user_ids = User.only(:_id).where(:foo => :bar).map(&:_id)
Post.where(:user_id.in => user_ids)
The solution above works fine when the number of users is small, but it requires a lot of memory when there are thousands of users.
User.only(:_id).where(:foo => :bar).map(&:_id)
will create a list of User objects with nil in each field except id.
The solution (for mongoid 2.5):
User.collection.master.where(:foo => :bar).to_a.map {|o| o['_id']}
