Optimizing Postgres queries using Sequel gem - ruby

I have a Ruby script using the Sequel gem to access my Postgres database.
At the moment, here are my calls:
@db[:items].insert_conflict.insert(:sku => sku, :name => itemName)
dbItem = Item.where(:sku => sku).first
dbPrice = Price.create(:price => foundPrice, :quantity => quantity, :status => status)
dbItem.add_price(dbPrice)
store.add_price(dbPrice)
Unfortunately, insert_conflict.insert returns the key if the row was inserted and nil if it already exists; otherwise I'd be able to use that one call to get the actual object that was inserted or already existed and drop the Item.where lookup on the second line. Is there any other way to reduce this to one call?
As for the last three calls, I believe since I'm adding relationships between three different schemas, there's no way to reduce this. But I'm including it just in case.
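One possible way to collapse the first two calls into a single round trip on PostgreSQL 9.5+ is an upsert with RETURNING; since DO UPDATE always affects a row, RETURNING always hands the row back. This is a rough, untested sketch and assumes a unique constraint on sku:
row = @db[:items]
  .returning                                             # RETURNING *
  .insert_conflict(target: :sku,
                   update: { name: Sequel[:excluded][:name] })
  .insert(:sku => sku, :name => itemName)
  .first
dbItem = Item.load(row)   # wrap the returned hash in a model instance without re-querying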

Related

Sequel gem: Why does eager().all work but eager().first doesn't?

I'm storing a user's profile fields in a separate table, and want to look up a user by email address (for password reset). I'm trying to determine the best approach, and ran into this unexpected inconsistency in behaviour.
Schema
create_table(:users) do
  String :username, primary_key: true
  ...
end
create_table(:user_fields) do
  primary_key :id
  foreign_key :user_id, :users, type: String, null: false
  String :label, null: false
  String :value, null: false
end
Console Session
This version works (look up the field, eager load its associated user, call .all, take the first one):
irb(main):005:0> a = UserField.where(label: 'email', value: 'testuser@test.com').eager(:user).all[0]
I, [2015-09-29T17:54:06.273263 #147] INFO -- : (0.000176s) SELECT * FROM `user_fields` WHERE ((`label` = 'email') AND (`value` = 'testuser@test.com'))
I, [2015-09-29T17:54:06.273555 #147] INFO -- : (0.000109s) SELECT * FROM `users` WHERE (`users`.`username` IN ('testuser'))
=> #<UserField @values={:id=>2, :user_id=>"testuser", :label=>"email", :value=>"testuser@test.com"}>
irb(main):006:0> a.user
=> #<User @values={:username=>"testuser"}>
You can see both queries (field and user) are kicked off together, and when you try to access a.user, the data's already loaded.
But when I try calling .first in place of .all:
irb(main):007:0> b = UserField.where(label: 'email', value: 'testuser@test.com').eager(:user).first
I, [2015-09-29T17:54:25.832064 #147] INFO -- : (0.000197s) SELECT * FROM `user_fields` WHERE ((`label` = 'email') AND (`value` = 'testuser@test.com')) LIMIT 1
=> #<UserField @values={:id=>2, :user_id=>"testuser", :label=>"email", :value=>"testuser@test.com"}>
irb(main):008:0> b.user
I, [2015-09-29T17:54:27.887718 #147] INFO -- : (0.000172s) SELECT * FROM `users` WHERE (`username` = 'testuser') LIMIT 1
=> #<User @values={:username=>"testuser"}>
The eager load fails -- it doesn't kick off the second query for the user object until you try to reference it with b.user.
What am I failing to understand about the Sequel gem API here? And what's the best way to load a model instance based on the attributes of its associated models (find a user by email address)?
Eager loading only makes sense when loading multiple objects. And in order to eager load, you need all of the current objects first, in order to get all associated objects in one query. With each, you don't have access to all current objects first, since you are iterating over them.
You can use the eager_each plugin if you want Sequel to handle things internally for you, though note that it makes dataset.first do something similar to dataset.all.first for eagerly loaded datasets. But it's better to not eager load if you only need one object, and to call all if you need to eagerly load multiple ones.
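For the "find a user by email address" case here, a minimal untested sketch of a direct lookup using the models above (no eager loading, since only one object is wanted; the inner dataset becomes an IN (subquery)):
user = User.first(
  username: UserField.where(label: 'email', value: 'testuser@test.com').select(:user_id)
)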

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with twitter data. I'm using Ruby 1.9.3 and Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below is working but runs relatively slow as I loop through using .each and then insert new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name']}]
                }
    @collection.save(@document)
  end #end.if
end #end.each
Any help is greatly appreciated.
In your case there is no way to make this much faster. One thing you could do is retrieve the documents in bulk, process them, and then reinsert them in bulk, but it would still be slow.
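A rough sketch of that batching idea, assuming the legacy Ruby mongo driver used in the question (its Collection#insert accepts an array of documents) and an arbitrary batch size:
batch = []
cursor.each do |r|
  next unless r['lang'] == "en"
  # build the same document as in the question, just collect it instead of saving it
  batch << { :id        => r['id'],
             :score     => r['retweet_count'] + r['favorite_count'],
             :timestamp => Time.now.strftime("%d/%m/%Y %H:%M") }
  if batch.size >= 1000
    @collection.insert(batch)   # one round trip for the whole batch
    batch = []
  end
end
@collection.insert(batch) unless batch.empty?   # flush the remainder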
To really speed this up, you need to do all the processing server side, where the data already lives.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility but slower execution (still much faster than your current approach can be), MongoDB's MapReduce framework.
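For illustration, a hedged sketch of the aggregation route through the same driver; it assumes the driver exposes Collection#aggregate, MongoDB 2.6+ for the $out stage, and a made-up output collection name:
pipeline = [
  { '$match'   => { 'user.screen_name' => users[cur],
                    'entities.urls'    => [],
                    'lang'             => 'en' } },
  { '$project' => { 'id_str'          => 1,
                    'retweet_count'   => 1,
                    'favorite_count'  => 1,
                    'created_at'      => 1,
                    'score'           => { '$add' => ['$retweet_count', '$favorite_count'] },
                    'user'            => { 'id'          => '$user.id',
                                           'id_str'      => '$user.id_str',
                                           'screen_name' => '$user.screen_name' } } },
  { '$out' => 'scored_tweets' }   # writes the results server side, no per-document round trips
]
raw.aggregate(pipeline)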
What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
From what I understand of your code, you actually create a completely new document, and I think that's wrong.
You can do it like this on the Ruby side:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end #end.if
end #end.each
And of course you can include Mongoid::Timestamps in your model and it will handle the created_at and updated_at attributes for you (it creates them itself).
Doing it in the Mongo shell is a little harder:
first you select your database with use my_db, then the following code will do what you want:
db.models.find({something: your_param}).forEach(function(doc){
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters were, but they're easy to create, and Mongoid really does lazy loading, so if you never touch an attribute it won't be loaded. You can actually save a lot of time by not using every attribute.
And these methods change the existing document rather than creating another one.

How to update or insert on Sequel dataset?

I just started using Sequel in a really small Sinatra app. Since I've got only one DB table, I don't need to use models.
I want to update a record if it exists or insert a new record if it does not. I came up with the following solution:
rec = $nums.where(:number => n, :type => t)
if $nums.select(1).where(rec.exists)
  rec.update(:counter => :counter + 1)
else
  $nums.insert(:number => n, :counter => 1, :type => t)
end
Where $nums is the DB[:numbers] dataset.
I believe that this way isn't the most elegant implementation of "update or insert" behavior.
How should it be done?
You should probably not check before updating/inserting, because:
This is an extra db call.
This could introduce a race condition.
What you should do instead is to test the return value of update:
rec = $nums.where(:number => n, :type => t)
if 1 != rec.update(:counter => Sequel[:counter] + 1)
  $nums.insert(:number => n, :counter => 1, :type => t)
end
Sequel 4.25.0 (released July 31st, 2015) added insert_conflict for Postgres v9.5+
Sequel 4.30.0 (released January 4th, 2016) added insert_conflict for SQLite
This can be used to either insert or update a row, like so:
DB[:table_name].insert_conflict(:update).insert( number:n, type:t, counter:c )
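For the counter case in this question, a rough untested sketch of the PostgreSQL 9.5+ form, assuming a unique constraint on (number, type):
$nums.insert_conflict(
  target: [:number, :type],
  update: { counter: Sequel[:counter] + 1 }   # bump the existing row's counter
).insert(number: n, type: t, counter: 1)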
I believe you can't have it much cleaner than that (although some databases have specific upsert syntax, which might be supported by Sequel). You can just wrap what you have in a separate method and pretend that it doesn't exist. :)
Just a couple of suggestions:
Enclose everything within a transaction (a rough sketch follows this list).
Create a unique index on the (number, type) fields.
Don't use global variables.
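A rough sketch combining the first two suggestions with the update-then-insert approach above (again assuming a unique index on (number, type); Sequel::UniqueConstraintViolation needs a reasonably recent Sequel):
begin
  $nums.db.transaction do
    rec = $nums.where(:number => n, :type => t)
    if rec.update(:counter => Sequel[:counter] + 1) == 0
      $nums.insert(:number => n, :counter => 1, :type => t)
    end
  end
rescue Sequel::UniqueConstraintViolation
  retry   # a concurrent insert won the race; the update will succeed on retry
end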
You could use upsert, except it doesn't currently work for updating counters. Hopefully a future version will - ideas welcome!

DataMapper first_or_create always set certain field?

I'm using the following with datamapper to create/get a new user from my db:
user = User.first_or_create({:id => data['id']})
This gets the user with id = data['id'] or creates it if it doesn't already exist.
I want to know how to set other attributes/fields of the returned object regardless of whether it is a new record or existing?
Is the only way to do this to then call user.update({:field => value ...}) or is there a better way to achieve this?
Well, you could write it as one line:
(User.first_or_create(:id => data['id'])).update(:field => value)
with hashes for the parameters if you wish (or if you need to specify more than one); however, it's worth noting that this will only work if the model as specified by the first_or_create is valid. If :name were a required field, for instance, then this wouldn't work:
(User.first_or_create({:id => data['id'], :name => "Morse"})).update(:name => "Lewis")
as the creation in the first part would fail.
You could get around this by specifying the parameters needed for a new record with something like
(User.first_or_create({:id => data['id'], :name => "Morse"}, {:name => "Lewis"})).update(:name => "Lewis")
but this is unusually painful, and is difficult to read.
Also note that using first_or_create with an :id will attempt to create a model with that specific :id, if such a record doesn't exist. This might not be what you want.
Alternatively, you can use first_or_new. You can't call update on an object created using this, however, as the record won't exist (although I believe this might have worked in previous versions of DataMapper).
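For completeness, a small sketch of the first_or_new route; the :field attribute is purely illustrative:
user = User.first_or_new(:id => data['id'])
user.attributes = { :field => value }   # set fields whether the record is new or existing
user.save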
Just for anyone coming across this answer, User.first_or_create({:id => data['id']}) does NOT "get the user with id = data['id'] or creates it if it doesn't already exist." It actually gets the first record in the table and changes its id to data['id'].
To actually get the first record with that id, or create it if it doesn't exist, you need to use a where clause:
User.where(id: data['id']).first_or_create

multiple levels of associated db objects to YAML

I need to create a 'List' object from the following db tables. I've already done this in a rails/datamapper application, but now I have a need to get specific lists into and out of a db through YAML.
List
Categories
Items
Item choices
e.g. given a list identifier, pull the initial list, the categories for that list, the items for those categories, and the choices for those items into some object, then output as a yaml file.
My first step is to output a specific list to YAML; this shouldn't be a unique situation, and I'm sure others have solved it before. From reading, I'm guessing I need a multi-level hash of some sort, but all I've been able to do so far is get the list and category levels, i.e. this is a bit out of my range right now, and I'm only working from the command line.
I'm asking for two things, really, to help sharpen my skill set:
guidance on working with a multi-level, nested hash in order to properly serialize an object to YAML, given a series of associated db tables
whether there is an easier way that someone has already solved.
The included to_json (doc) method already allows you to easily nest related records and choose what you want to output:
List.all.to_json(:only => {}, :include => {
  :categories => { :only => {}, :include => {
    :items => { :only => :your_attribute_name }
  } }
})
The next step is to convert it to YAML:
ActiveSupport::JSON.decode(your_json).to_yaml
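Going the other way (loading the YAML back into the db) could look roughly like this; it assumes ActiveRecord-style associations named after the tables in the question and is illustrative only:
require 'yaml'

YAML.load_file('lists.yml').each do |list_attrs|
  list = List.create(list_attrs.except('categories'))
  Array(list_attrs['categories']).each do |cat_attrs|
    category = list.categories.create(cat_attrs.except('items'))
    Array(cat_attrs['items']).each do |item_attrs|
      item = category.items.create(item_attrs.except('choices'))
      Array(item_attrs['choices']).each { |choice_attrs| item.choices.create(choice_attrs) }
    end
  end
end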
Hope this helps
