Retrieving array of ids in Mongoid - ruby

How do you retrieve an array of IDs in Mongoid?
arr=["id1","id2"]
User.where(:id=>arr)
You can do this easily if you are retrieving another attribute
User.where(:nickname.in=>["kk","ll"])
But I am wondering how to do this for IDs in Mongoid. This should be a very simple and common operation.

Remember that the ID is stored as :_id and not :id . There is an id helper method, but when you do queries, you should use :_id:
User.where(:_id.in => arr)
Often I find it useful to get a list of ids to do complex queries, so I do something like:
user_ids = User.only(:_id).where(:foo => :bar).distinct(:_id)
Post.where(:user_id.in => user_ids)

Or simply:
arr = ['id1', 'id2', 'id3']
User.find(arr)

The above method suggested by browsersenior doesn't seem to work anymore, at least for me. What I do is:
User.criteria.id(arr)

user_ids = User.only(:_id).where(:foo => :bar).map(&:_id)
Post.where(:user_id.in => user_ids)
The solution above works fine when the number of users is small, but it requires a lot of memory when there are thousands of users.
User.only(:_id).where(:foo => :bar).map(&:_id)
creates a list of User objects with nil in every field except the id.
The solution (for Mongoid 2.5):
User.collection.master.where(:foo => :bar).to_a.map {|o| o['_id']}


What is the difference between activerecord's attribute and read_attribute?

I'm able to run both, and they return the same value:
user = User.new(name: 'John')
user.attributes['name']
=> 'John'
user.read_attribute('name')
=> 'John'
Is one more performant than the other? Are there cases where I would use one over the other?
Thanks!
attributes returns a hash of all attributes for the user, and ['first_name'] just looks up one key in that hash, whereas read_attribute returns only the single attribute asked for. You don't really need either method to access the name; calling the attribute's own reader does the same thing and makes the code a lot cleaner:
user = User.new(name:'John')
user.name
=> 'John'
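To make the difference concrete, here is a minimal plain-Ruby sketch. SketchRecord is a hypothetical stand-in written for this illustration, not ActiveRecord itself; it only mimics the three access paths discussed above (the full hash, a single-attribute read, and a plain reader method).

```ruby
# SketchRecord is a hypothetical stand-in for a model, not ActiveRecord.
class SketchRecord
  def initialize(attrs = {})
    # Store every attribute under a string key.
    @attributes = attrs.each_with_object({}) { |(k, v), h| h[k.to_s] = v }
  end

  # Builds and returns a copy of the whole attribute hash.
  def attributes
    @attributes.dup
  end

  # Reads one value without building a new hash.
  def read_attribute(name)
    @attributes[name.to_s]
  end

  # Plain reader, comparable to a generated attribute accessor.
  def name
    read_attribute(:name)
  end
end

user = SketchRecord.new(name: 'John')
via_hash     = user.attributes['name']      # whole hash first, then one key
via_read     = user.read_attribute('name')  # single lookup
via_accessor = user.name                    # the usual way to read it
```

All three return the same value in this sketch; the only difference is how much work is done to get there.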

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with twitter data. I'm using Ruby 1.9.3 and Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below is working but runs relatively slow as I loop through using .each and then insert new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name']
                           }]
                }
    @collection.save(@document)
  end # end if
end # end each
Any help is greatly appreciated.
In your case there is no way to make this much faster. One thing you could do is retrieve the documents in batches, process them, and then reinsert them in batches, but it would still be slow.
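As a rough illustration of the batching idea, here is a plain-Ruby sketch. InMemoryCollection and its insert_many method are hypothetical stand-ins so the example runs without a database; the name of the real bulk-insert call depends on the driver version and is not assumed here. The field names come from the question's code.

```ruby
# InMemoryCollection is a hypothetical stand-in for a driver collection.
class InMemoryCollection
  attr_reader :docs, :write_calls

  def initialize
    @docs = []
    @write_calls = 0
  end

  # One call stores a whole batch of documents.
  def insert_many(batch)
    @write_calls += 1
    @docs.concat(batch)
  end
end

# Build the new document for one source row.
def build_document(row, now)
  {
    :id => row['id'],
    :score => row['retweet_count'] + row['favorite_count'],
    :timestamp => now.strftime('%d/%m/%Y %H:%M')
  }
end

# Walk the rows in slices and write each slice with a single call.
def process_in_batches(rows, target, batch_size)
  now = Time.now
  rows.each_slice(batch_size) do |slice|
    docs = slice.select { |r| r['lang'] == 'en' }
                .map { |r| build_document(r, now) }
    next if docs.empty?
    target.insert_many(docs)
  end
  target
end

sample_rows = [
  { 'id' => 1, 'lang' => 'en', 'retweet_count' => 2, 'favorite_count' => 3 },
  { 'id' => 2, 'lang' => 'fr', 'retweet_count' => 9, 'favorite_count' => 9 },
  { 'id' => 3, 'lang' => 'en', 'retweet_count' => 0, 'favorite_count' => 1 },
  { 'id' => 4, 'lang' => 'en', 'retweet_count' => 5, 'favorite_count' => 5 }
]
result = process_in_batches(sample_rows, InMemoryCollection.new, 2)
```

With a batch size of 2, the four sample rows produce two write calls instead of one per document.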
To speed this up you need to do all the processing server side, where the data already exist.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility at the cost of slower execution (still much faster than your current approach could ever be), MongoDB's MapReduce framework.
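For the aggregation route, the pipeline itself is just an array of hashes, so it can be sketched without a server. The stages below mirror the question's filter and score calculation; 'scored_tweets' is a hypothetical output collection name, the $out stage assumes MongoDB 2.6 or later, and how the pipeline is handed to the driver is left out because that call differs between driver versions.

```ruby
# Server-side version of the loop: filter, compute the score, write the result.
# 'scored_tweets' is a hypothetical collection name chosen for this sketch.
pipeline = [
  # Same filter as the original find plus the language check.
  { '$match' => { 'lang' => 'en', 'entities.urls' => [] } },
  # Keep the wanted fields and add the computed score.
  { '$project' => {
    'id' => 1,
    'id_str' => 1,
    'retweet_count' => 1,
    'favorite_count' => 1,
    'created_at' => 1,
    'score' => { '$add' => ['$retweet_count', '$favorite_count'] }
  } },
  # Write the results to another collection (assumes MongoDB 2.6+).
  { '$out' => 'scored_tweets' }
]
```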
What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
From what I understand of your code, you actually create a completely new document, and I think that's wrong.
You can do it like this on the Ruby side:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end # end if
end # end each
And of course you can include Mongoid::Timestamps in your model, and it handles your created_at and updated_at attributes (it creates them itself).
In the mongo shell it's a little harder.
First you select your database with use my_db; then the following code will generate what you want:
db.models.find({something: your_param}).forEach(function(doc) {
  doc.score = doc.retweet_count + doc.favorite_count
  doc.timestamp = new Timestamp()
  db.models.save(doc)
});
I don't know what your parameters were, but they are easy to create. Also, Mongoid does lazy loading, so if you don't use an attribute, it won't be loaded. You can save a lot of time by not using every attribute.
Both of these methods change the existing document and won't create another one.

DataMapper first_or_create always set certain field?

I'm using the following with datamapper to create/get a new user from my db:
user = User.first_or_create({:id => data['id']})
This gets the user with id = data['id'] or creates it if it doesn't already exist.
I want to know how to set other attributes/fields of the returned object regardless of whether it is a new record or existing?
Is the only way to do this to then call user.update({:field => value ...}) or is there a better way to achieve this?
Well, you could write it as one line:
(User.first_or_create(:id => data['id'])).update(:field => value)
with hashes for the parameters if you wish (or if you need to specify more than one); however, it's worth noting that this will only work if the model as specified by the first_or_create is valid. If :name were a required field, for instance, then this wouldn't work:
(User.first_or_create({:id => data['id'], :name => "Morse"})).update(:name => "Lewis")
as the creation in the first part would fail.
You could get around this by specifying the parameters needed for a new record with something like
(User.first_or_create({:id => data['id'], :name => "Morse"}, {:name => "Lewis"})).update(:name => "Lewis")
but this is unusually painful, and is difficult to read.
Also note that using first_or_create with an :id will attempt to create a model with that specific :id, if such a record doesn't exist. This might not be what you want.
Alternatively, you can use first_or_new. You can't call update on an object created using this, however, as the record won't exist (although I believe this might have worked in previous versions of DataMapper).
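The "fetch or create, then always apply the same attributes" pattern can be sketched in plain Ruby. SketchStore and upsert are hypothetical names written for this illustration; they are not DataMapper and only mimic the behaviour described above with an in-memory hash.

```ruby
# SketchStore is a hypothetical in-memory stand-in, not DataMapper.
class SketchStore
  def initialize
    @rows = {}
  end

  # Return the row with this id, creating it from `defaults` when absent.
  def first_or_create(id, defaults = {})
    @rows[id] ||= { :id => id }.merge(defaults)
  end

  def size
    @rows.size
  end
end

# Fetch or create the row, then apply the attributes either way.
def upsert(store, id, attrs)
  row = store.first_or_create(id, attrs)
  row.merge!(attrs)
end

store  = SketchStore.new
first  = upsert(store, 7, { :name => 'Morse' })  # creates the row
second = upsert(store, 7, { :name => 'Lewis' })  # finds it and updates it
```

Passing the attributes as creation defaults as well as to the update keeps a new record valid from the start, which is the same reason the longer DataMapper one-liner above repeats them.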
Just for anyone coming across this answer, User.first_or_create({:id => data['id']}) does NOT "get the user with id = data['id'] or creates it if it doesn't already exist." It actually gets the first record in the table and changes its id to data['id'].
To actually get the first record with that id, or create it if it doesn't exist, you need to use a where clause:
User.where(id: data['id']).first_or_create

multiple levels of associated db objects to YAML

I need to create a 'List' object from the following db tables. I've already done this in a rails/datamapper application, but now I have a need to get specific lists into and out of a db through YAML.
List
Categories
Items
Item choices
e.g. given a list identifier, pull the initial list, the categories for that list, the items for those categories, and the choices for those items into some object, then output as a yaml file.
My first step is output a specific list to yaml, this shouldn't be a unique situation and I'm sure others have solved it before. From reading I'm guessing I need a multilevel hash of some sort, but all I've been able to do so far is get list and category...i.e. this is a bit out of my range right now, and I'm only working from the command line.
I'm asking for two things really to assist in sharpening my skill set:
guidance on working with a multiple level, nested hash situation to properly serialize an object for yaml, given a series of associated db tables
if there is an easier way that someone has already solved.
The included to_json method already allows you to easily nest related records and choose what you want to output:
List.all.to_json(:only => {}, :include => {
  :categories => { :only => {}, :include => {
    :items => { :only => :your_attribute_name }
  } }
})
The next step is to convert it to YAML:
ActiveSupport::JSON.decode(your_json).to_yaml
Hope this helps
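If you would rather build the nested hash by hand, a minimal sketch using only Ruby's standard yaml library could look like this. The rows hash and every column name in it (list_id, category_id, item_id, title, name, label) are assumptions made up for the illustration; in practice the rows would come from your own models.

```ruby
require 'yaml'

# Hypothetical rows as they might come back from the four tables.
rows = {
  :list => { :id => 1, :title => 'Groceries' },
  :categories => [{ :id => 10, :list_id => 1, :name => 'Fruit' }],
  :items => [{ :id => 100, :category_id => 10, :name => 'Apple' }],
  :choices => [
    { :id => 1000, :item_id => 100, :label => 'Red' },
    { :id => 1001, :item_id => 100, :label => 'Green' }
  ]
}

# Turn the flat rows into list -> categories -> items -> choices.
def nest_list(rows)
  choices_by_item = rows[:choices].group_by { |c| c[:item_id] }
  items_by_category = rows[:items].group_by { |i| i[:category_id] }
  own_categories = rows[:categories].select { |c| c[:list_id] == rows[:list][:id] }

  categories = own_categories.map do |cat|
    items = items_by_category.fetch(cat[:id], []).map do |item|
      labels = choices_by_item.fetch(item[:id], []).map { |c| c[:label] }
      { 'name' => item[:name], 'choices' => labels }
    end
    { 'name' => cat[:name], 'items' => items }
  end

  { 'title' => rows[:list][:title], 'categories' => categories }
end

nested     = nest_list(rows)
yaml_text  = nested.to_yaml          # serialize for the file
round_trip = YAML.load(yaml_text)    # read it back into a nested hash
```

Using string keys and plain arrays keeps the YAML free of Ruby-specific tags, so the file can be loaded again without any custom classes.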

ActiveRecord: Find through multiple instances

Say I have the following in my controller:
@category1
@category2
and I want to find all stores associated with those two categories...
@stores = @category1.stores + @category2.stores
This does work, but unfortunately it returns a plain Array rather than an AR::Base Array, and as such, I can't do things like pagination, scope, etc...
It seems to me like there's a built-in way of finding through multiple instance association... isn't there?
# @stores = @category1.stores + @category2.stores
# If you want to call API methods, you can just add conditions with the category ids
@stores = Store.find(:all, :conditions => ['category_id IN (?)', [a, b]])
With ActiveRecord, whenever you're finding a set of unique model objects, calling find on that model is usually your best bet.
Then all you need to do is constrain the join table with the categories you care about.
@stores = Store.all(:joins => :categories,
  :conditions => ['category_stores.category_id in (?)', [@category1.id, @category2.id]])
