How to update or insert on a Sequel dataset? - ruby

I just started using Sequel in a really small Sinatra app. Since I've got only one DB table, I don't need to use models.
I want to update a record if it exists or insert a new record if it does not. I came up with the following solution:
rec = $nums.where(:number => n, :type => t)
if !rec.empty?  # issues a SELECT to check for an existing row
  rec.update(:counter => Sequel[:counter] + 1)
else
  $nums.insert(:number => n, :counter => 1, :type => t)
end
Where $nums is the DB[:numbers] dataset.
I believe that this way isn't the most elegant implementation of "update or insert" behavior.
How should it be done?

You should probably not check before updating/inserting, because:
This is an extra db call.
This could introduce a race condition.
What you should do instead is to test the return value of update:
rec = $nums.where(:number => n, :type => t)
if 1 != rec.update(:counter => Sequel[:counter] + 1)
  $nums.insert(:number => n, :counter => 1, :type => t)
end

Sequel 4.25.0 (released July 31st, 2015) added insert_conflict for Postgres v9.5+
Sequel 4.30.0 (released January 4th, 2016) added insert_conflict for SQLite
This can be used to insert a row or replace an existing one. On SQLite, for example:
DB[:table_name].insert_conflict(:replace).insert(number: n, type: t, counter: c)
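On Postgres the conflict target and the update to perform can be spelled out explicitly. A sketch under the assumption of the question's numbers table with a unique index on (number, type), incrementing the stored counter on conflict:
DB[:numbers].insert_conflict(
  target: [:number, :type],
  update: { counter: Sequel[:numbers][:counter] + 1 }
).insert(number: n, type: t, counter: 1)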

I believe you can't have it much cleaner than that (although some databases have specific upsert syntax, which might be supported by Sequel). You can just wrap what you have in a separate method and pretend that it doesn't exist. :)
Just a couple of suggestions (a sketch follows the list):
Enclose everything within a transaction.
Create unique index on (number, type) fields.
Don't use global variables.
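A minimal sketch of the first two suggestions together, using the table from the question (the add_index call belongs in your schema setup or a migration):
# one-time setup: a unique index so concurrent upserts cannot create duplicates
DB.add_index :numbers, [:number, :type], unique: true

DB.transaction do
  rec = DB[:numbers].where(:number => n, :type => t)
  if 1 != rec.update(:counter => Sequel[:counter] + 1)
    DB[:numbers].insert(:number => n, :counter => 1, :type => t)
  end
end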

You could use the upsert gem, except it doesn't currently work for updating counters. Hopefully a future version will - ideas welcome!

Related

Optimizing Postgres queries using Sequel gem

I have a Ruby script using the Sequel gem to access my Postgres database.
At the moment, here are my calls:
@db[:items].insert_conflict.insert(:sku => sku, :name => itemName)
dbItem = Item.where(:sku => sku).first
dbPrice = Price.create(:price => foundPrice, :quantity => quantity, :status => status)
dbItem.add_price(dbPrice)
store.add_price(dbPrice)
Unfortunately, insert_conflict.insert returns the key if a row is inserted and nil if it already exists; otherwise I'd be able to use just one call to get the actual object that was inserted or already existed (see the second line). Is there any other way to reduce this to one call?
As for the last three calls, I believe since I'm adding relationships between three different schemas, there's no way to reduce this. But I'm including it just in case.
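For the first two calls, one Postgres-only trick might help: force a do-update on conflict so that RETURNING always yields the row, whether it was just inserted or already existed. A hedged sketch (column names taken from the question, not verified against the poster's schema):
row = @db[:items]
        .returning(:id, :sku, :name)
        .insert_conflict(target: :sku, update: { name: Sequel[:excluded][:name] })
        .insert(:sku => sku, :name => itemName)
        .first
Note this returns a plain hash rather than an Item model instance, so it only collapses the first two calls if a hash is enough for the rest of the code.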

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with twitter data. I'm using Ruby 1.9.3 and Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below is working but runs relatively slowly, as I loop through using .each and then insert new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name']}]}
    @collection.save(@document)
  end # end if
end # end each
Any help is greatly appreciated.
In your case there is no way to make this much faster. One thing you could do is retrieve the documents in bulk, process them, and then reinsert them in bulk, but it would still be slow.
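A rough sketch of that batching idea. It uses insert_many from the modern Ruby driver (the 1.x gem in the question would need a different call), and build_document stands in for the hash-building code from the question:
batch = []
cursor.each do |r|
  next unless r['lang'] == 'en'
  batch << build_document(r)            # hypothetical helper building the new hash
  if batch.size >= 1000
    new_collection.insert_many(batch)   # one round trip per 1000 documents
    batch.clear
  end
end
new_collection.insert_many(batch) unless batch.empty?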
To speed this up you need to do all the processing server side, where the data already exist.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility at the cost of slower execution (still much faster than your current approach), MongoDB's MapReduce framework.
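The aggregation route might look roughly like this from Ruby (field names from the question; the scored_tweets output collection and the $out stage, which needs MongoDB 2.6+, are assumptions):
raw.aggregate([
  { '$match'   => { 'user.screen_name' => users[cur],
                    'entities.urls'    => [],
                    'lang'             => 'en' } },
  { '$project' => { 'id_str'           => 1,
                    'retweet_count'    => 1,
                    'favorite_count'   => 1,
                    'created_at'       => 1,
                    'user.id_str'      => 1,
                    'user.screen_name' => 1,
                    'score' => { '$add' => ['$retweet_count', '$favorite_count'] } } },
  { '$out' => 'scored_tweets' }  # write the results server side
])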
What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
What I understand from your code is that you actually create a completely new document, and I think that's wrong.
You can do that like this on the Ruby side:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end # end if
end # end each
And of course you can include Mongoid::Timestamps in your model; it handles the created_at and updated_at attributes for you (it creates them itself).
In the mongo shell it's a little harder: first select your database with use my_db, then the following code does what you want:
db.models.find({something: your_param}).forEach(function(doc) {
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters were, but they're easy to create. Also, Mongoid does lazy loading, so if you never touch an attribute it won't be loaded; you can actually save a lot of time by not using every attribute.
These methods change the existing document rather than creating a new one.
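If you want to lean on that, Mongoid's only can restrict which fields are fetched from the server in the first place; a small sketch (model and field names are assumptions):
YourModel.where(lang: 'en').only(:retweet_count, :favorite_count).each do |r|
  puts r.retweet_count + r.favorite_count
end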

Mongodb and Ruby gem - Check if record exists

I've a simple Ruby script (no rails, sinatra etc.) that uses the Mongo gem to insert records into my DB as part of a Redis/Resque worker.
On occasion, instead of doing a fresh insert, I'd like to update a counter field on an existing record. I can do this handily enough with Rails/MySQL. What's the quickest way of doing this in pure Ruby with MongoDB?
Thanks,
Ed
The Ruby client library for MongoDB is very convenient and easy to use. So, to update a document in MongoDB, use something similar to this:
#!/usr/bin/ruby
require 'mongo'
database = Mongo::Connection.new.db("yourdatabasename")
collection = database["collection"]
# get the document
x = collection.find_one({"_id" => "12312132"})
# change the document
x["count"] = (x["count"] || 0) + 1
# update it in mongodb
collection.update({"_id" => x["_id"]}, x)
You might want to check out the manual for updating documents in MongoDB as well.
Thanks to envu's direction I went with upsert in the end. Here is an example snippet of how to use it with the Ruby client:
link_id = @globallinks.update(
  {:url => "http://somevalue.com"},
  {
    '$inc' => {:totalcount => 1},
    '$set' => {:timelastseen => Time.now}
  },
  {:upsert => true}
)

DataMapper first_or_create always set certain field?

I'm using the following with datamapper to create/get a new user from my db:
user = User.first_or_create({:id => data['id']})
This gets the user with id = data['id'] or creates it if it doesn't already exist.
I want to know how to set other attributes/fields of the returned object regardless of whether it is a new record or existing?
Is the only way to do this to then call user.update({:field => value ...}) or is there a better way to achieve this?
Well, you could write it as one line:
(User.first_or_create(:id => data['id'])).update(:field => value)
with hashes for the parameters if you wish (or if you need to specify more than one); however, it's worth noting that this will only work if the model as specified by the first_or_create is valid. If :name were a required field, for instance, then this wouldn't work:
(User.first_or_create({:id => data['id'], :name => "Morse"})).update(:name => "Lewis")
as the creation in the first part would fail.
You could get around this by specifying the parameters needed for a new record with something like
(User.first_or_create({:id => data['id'], :name => "Morse"}, {:name => "Lewis"})).update(:name => "Lewis")
but this is unusually painful, and is difficult to read.
Also note that using first_or_create with an :id will attempt to create a model with that specific :id, if such a record doesn't exist. This might not be what you want.
Alternatively, you can use first_or_new. You can't call update on an object created using this, however, as the record won't exist (although I believe this might have worked in previous versions of DataMapper).
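With first_or_new the usual flow is to set the attributes on the returned instance and save it yourself; a minimal sketch (the name field is illustrative):
user = User.first_or_new(:id => data['id'])
user.name = "Lewis"  # set fields regardless of whether the record is new
user.save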
Just for anyone coming across this answer, User.first_or_create({:id => data['id']}) does NOT "get the user with id = data['id'] or creates it if it doesn't already exist." It actually gets the first record in the table and changes its id to data['id'].
To actually get the first record with that id, or create it if it doesn't exist, you need to use a where clause:
User.where(id: data['id']).first_or_create

Ruby: DataMapper and has n, :xyz, :through => Resource

I've encountered following issue:
there are 2 models, X and Y, associated with each other like this: has n, :<name>, :through => Resource. When I do something like x.ys = array_with_500_ys it takes a really long time, because DataMapper inserts only one association per query (insert into xs_ys(x_id, y_id) values(xid, yid)).
The question is: how to make this faster?
Thanks.
Because DataMapper abstracts the back end away, the standard behaviour is to insert one record at a time via SQL (or whatever store you are using).
Assuming you are using an SQL backend, such as Postgres, you could drop back to raw SQL, and do the following:
x = X.first
query = "INSERT INTO xs_ys(x_id, y_id) VALUES"
vals = []
array_with_500_ys.each do |y|
  vals << "(#{x.id}, #{y.id})"
end
repository.adapter.execute(query + vals.join(','))
This creates one 'insert', passing all records to be inserted. Not sure if this would be any faster, but you could put it into a background job if you need the app not to time out for the user.
