Optimizing nested activerecord query - ruby

ActiveRecord question: how can I optimize this query?
Prefectures (have many) Cities (have many) Shops (have many) Sales (have many) Brands
I'd like to get one sale per prefecture that has not yet ended, and then list the brands available at each sale.
The nesting is making this tricky for me.
Here is what I came up with. It's pretty ugly, and I think it could be optimized at the query level rather than fetching all unfinished sales.
#Get all sales which are yet to finish, ordered by finish date
upcoming_sales = Sale.includes([{:shop => {:city => :prefecture}}, :brands])
.where("finish > ?", Date.today)
.order('start ASC, finish ASC')
.select(['shop.city.prefecture.name', 'brands.name'])
# filter down to a single sale per prefecture
@sales = upcoming_sales.each_with_object({}) { |s, o|
o[s.shop.city.prefecture.name] ||= s
}
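The in-memory filtering step can be tried out without a database. The sketch below uses a hypothetical `SaleRow` struct in place of the real `Sale` model (the struct, its fields and the sample data are assumptions for illustration only) and shows how `||=` keeps the first sale seen for each prefecture:

```ruby
# Hypothetical stand-in for a Sale record; only the fields used here.
SaleRow = Struct.new(:id, :prefecture_name, :start)

# Keep the first sale seen for each prefecture. Because the input is
# already sorted by start date, "first seen" means "earliest start".
def first_sale_per_prefecture(sales)
  sales.each_with_object({}) do |sale, by_prefecture|
    by_prefecture[sale.prefecture_name] ||= sale
  end
end

# Illustrative data, already ordered by start date.
SAMPLE_SALES = [
  SaleRow.new(1, "Tokyo", 1),
  SaleRow.new(2, "Osaka", 2),
  SaleRow.new(3, "Tokyo", 3),
].freeze

first_sale_per_prefecture(SAMPLE_SALES).each do |prefecture, sale|
  puts "#{prefecture}: sale id #{sale.id}"
end
```

This still loads every unfinished sale first, which is the cost the question wants to avoid.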

How about something like this?
class Sale < ActiveRecord::Base
belongs_to :shop
has_many :brands
def self.first_sale_per_prefecture()
# Subquery: the highest unfinished sale id in each prefecture.
first_sale_id_per_prefecture = %(
select max(sales.id)
from sales
inner join shops on sales.shop_id = shops.id
inner join cities on shops.city_id = cities.id
where sales.finish > #{connection.quote(Date.today)}
group by cities.prefecture_id)
where("sales.id in (#{first_sale_id_per_prefecture})").includes(:brands, :shop => {:city => :prefecture})
end
end
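One detail to watch when building SQL text by hand is how a date is interpolated. A bare `Date` turns into an unquoted `2011-12-13`, which a database reads as integer arithmetic rather than a date. The sketch below only builds strings (the helper names are made up for illustration) to show the difference:

```ruby
require "date"

# Interpolating the Date directly yields an unquoted literal.
def unquoted_condition(date)
  "finish > #{date}"
end

# Wrapping the ISO 8601 form in single quotes yields a date literal.
def quoted_condition(date)
  "finish > '#{date.iso8601}'"
end

sample = Date.new(2011, 12, 13)
puts unquoted_condition(sample) # finish > 2011-12-13
puts quoted_condition(sample)   # finish > '2011-12-13'
```

Inside ActiveRecord, a bind placeholder such as `where("finish > ?", Date.today)`, as used in the question, avoids the issue entirely.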

You could get the upcoming sales, join through shops => cities => prefectures, and SELECT DISTINCT prefecture_id.
This would ensure you only have one sale per prefecture. Something like this:
@sales = Sale.includes([{:shop => {:city => :prefecture}}, :brands])
.order('finish DESC')
.where("finish > ?", Date.today)
.joins(:shop => { :city => :prefecture })
.select('sales.*, DISTINCT prefecture.id')

I'm going to try this using Arel
class Sale < ActiveRecord::Base
belongs_to :shop
class << self
# Returns only Sale objects with obj.finish > today
# add on other ActiveRecord queries:
# Sale.unfinished.all
# Sale.unfinished.all :limit => 10
def unfinished
where(Sale.arel_table[:finish].gt(Date.today))
end
# should select only one set of unfinished sales,
# and where the prefecture name is distinct
def distinct_prefecture
Sale.unfinished.joins({:shop => {:city => :prefecture}}).where(Prefecture.arel_table[:name].distinct)
end
end
end
Then, where you want it:
# the joins should already cover shop, city and prefecture
@sales = Sale.distinct_prefecture
.includes(:brands)
.order('start ASC, finish ASC')
@brand_list = @sales.collect { |s| s.brands }
If you want a limited result, this should be ok:
@sales = Sale.distinct_prefecture
.limit(10)
.includes(:brands)
.order('start ASC, finish ASC')

Related

How to group and sum by foreign key?

I have these two models in my Rails app:
class Person < ApplicationRecord
has_many :payments
end
class Payment < ApplicationRecord
belongs_to :person
end
How can I group the payments by person and order them by amount?
Right now I have...
Payment.group(:person_id).sum("amount")
...which works but doesn't include the persons' names. It returns something like this:
{ 1 => 1200.00, 2 => 2500.00 }
How can I replace the IDs / integers with the persons' names and also sort the whole thing by amount?
Thanks for any help!
Just be a bit more specific:
Payment.select('people.name, SUM(payments.amount)').joins(:person).group(:person_id)
Assuming that the persons table is named people in your application.
This will return the ActiveRecord::Relation that you can work with:
Person.joins(:payments).group('persons.id').select("persons.id, persons.name, sum(payments.amount) as amounts_summ")
This only works if names are unique. Assuming the Person model has a name attribute, the solution can look like this:
Payment.joins(:person).group(:name).order('sum_amount DESC').sum(:amount)
It generates the query
SELECT SUM("payments"."amount") AS sum_amount, "name" AS name FROM "payments" INNER JOIN "persons" ON "persons"."id" = "payments"."persons_id" GROUP BY "name" ORDER BY sum_amount DESC
and returns a hash like this:
=> {"Mike"=>22333.0, "John"=>5676.0, "Alex"=>2000.0, "Carol"=>2000.0}
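If the totals are already in hand as an id-keyed hash, the renaming and sorting can also be done in plain Ruby. This is a sketch under assumptions: `NAMES_BY_ID` is a hypothetical id-to-name lookup with made-up names, and the helper name is invented for illustration.

```ruby
# Totals keyed by person_id, in the shape shown in the question.
TOTALS_BY_ID = { 1 => 1200.00, 2 => 2500.00 }.freeze
# Hypothetical id-to-name lookup; the names are illustrative.
NAMES_BY_ID = { 1 => "Mike", 2 => "John" }.freeze

# Sort by amount, largest first, then swap each id for its name.
def totals_by_name(totals, names)
  totals
    .sort_by { |_id, amount| -amount }
    .map { |id, amount| [names.fetch(id), amount] }
    .to_h
end

p totals_by_name(TOTALS_BY_ID, NAMES_BY_ID)
```

As with grouping by name in SQL, two people who share a name would collapse into one key here, so an array of pairs is safer when names are not unique.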

Rails ActiveRecord query does not use all required relations in `FROM`

I'm trying to maintain/fix an outdated plugin (redmine_backlogs) in Redmine for my company's productivity workflow; I am not even remotely conversant in the subtleties of Ruby (let alone Rails), but by virtue of having glanced at the code, I am the company guru on the matter... so.
Upfront clarity: I'm looking for help on troubleshooting Ruby on Rails code - namely:
The application I am debugging is doing something like
SELECT COUNT(*) FROM tableA WHERE tableB.id = ? ...
whereas I am expecting of course
SELECT COUNT(*) FROM tableA,tableB WHERE tableB.id = ? ...
The log reads as follows:
ActiveRecord::StatementInvalid (Mysql2::Error: Unknown column 'releases.id' in 'where clause': SELECT COUNT(*) FROM `projects` WHERE (projects.status <> 9) AND (releases.id = 10 OR (projects.status <> 9 AND ( 'system' = 'none' OR (projects.lft >= 91 AND projects.rgt <= 92 AND 'none' = 'tree') OR (projects.lft > 91 AND projects.rgt < 92 AND 'none' IN ('hierarchy', 'descendants')) OR (projects.lft < 91 AND projects.rgt > 92 AND 'none' = 'hierarchy'))))):
plugins/redmine_backlogs/app/controllers/rb_master_backlogs_controller.rb:43:in `_menu_new'
plugins/redmine_backlogs/app/controllers/rb_master_backlogs_controller.rb:62:in `menu'
lib/redmine/sudo_mode.rb:63:in `sudo_mode'
The active record item comes from this section of code in plugins/redmine_backlogs/app/controllers/rb_master_backlogs_controller.rb
...
elsif @release # menu for release
projects = @release.shared_to_projects(@project)
else # menu for product backlog
projects = @project.projects_in_shared_product_backlog
end
#make the submenu or single link
if !projects.empty? # <<<<<<<<<<<< ----------------------- Line 43 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
if projects.length > 1
links << {:label => l(label_new), :url => '#', :sub => []}
...
The projects object being tested for emptiness is generated by the call to @release.shared_to_projects, which is defined in ./app/models/rb_release.rb; its relevant sections are:
class RbRelease < ActiveRecord::Base
...
unloadable
belongs_to :project, :inverse_of => :releases # <<<<<<<---- Is this where the association is being asserted ?
has_many :issues, :class_name => 'RbStory', :foreign_key => 'release_id', :dependent => :nullify
has_many :rb_release_burnchart_day_cache, :dependent => :delete_all, :foreign_key => 'release_id'
validates_presence_of :project_id, :name, :release_start_date, :release_end_date # <<<<------- or here ....?
validates_inclusion_of :status, :in => RELEASE_STATUSES
validates_inclusion_of :sharing, :in => RELEASE_SHARINGS
...
def shared_to_projects(scope_project)
@shared_projects ||=
begin
# Project used when fetching tree sharing
r = self.project.root? ? self.project : self.project.root
# Project used for other sharings
p = self.project
Project.visible.scoped(:include => :releases,
:conditions => ["#{RbRelease.table_name}.id = #{id}" +
" OR (#{Project.table_name}.status <> #{Project::STATUS_ARCHIVED} AND (" +
" 'system' = ? " +
" OR (#{Project.table_name}.lft >= #{r.lft} AND
...
So evidently we're relying on something from the main Project object to do something with the templated query - checking ../../app/models/project.rb I reach a dead end as to what to check for next... The RbRelease has declared its association with Project, but still the query does not seem to include that in the FROM clause before stating its conditions.
My naive question is: how would I fix the relationship association so that the SQL query is correctly built ?
==============
(previously there was also this issue as part of the question, but it turned out to be unrelated)
Started GET "/redmine/rb/master_backlog/afcd123_ghg/menu?project_id=9&authenticity_token=<REDACTED>" for 10.0.3.1 at 2017-06-21 15:21:36 +0000
Processing by RbMasterBacklogsController#menu as JSON
Parameters: {"project_id"=>"afcd123_ghg", "authenticity_token"=>"<REDACTED>"}
Current user: tai (id=125)
DEPRECATION WARNING: Relation#first with finder options is deprecated. Please build a scope and then call #first on it instead. (called from rb_project_settings at /usr/share/redmine/plugins/redmine_backlogs/lib/backlogs_project_patch.rb:193)
Completed 200 OK in 16ms (Views: 0.6ms | ActiveRecord: 7.0ms)
This issue was very application specific; I post the actual solution here as I managed to track down someone else who has been maintaining this plugin.
The relationship has to be explicitly stated in a joins() method call, onto which the rest gets tacked.
That is, declaring the association has no effect on this query, because it is a direct query on the Project model.
Project.visible.joins('LEFT OUTER JOIN releases ON releases.project_id = projects.id').
includes(:releases).
where("#{RbRelease.table_name}.id = #{id}" +
...

Cache queries when creating sub records?

I have an application which handles orders with line items. The line items come in as part of the order in JSON format, e.g.:
{
"customer_id":24,
"line_items":[
{
"variant_id":"1423_101_10",
"quantity":"5",
"product_id":"1423"
},
{
"variant_id":"2396_101_12",
"quantity":"3",
"product_id":"2396"
}
]
}
So this will set up an order in the orders table, e.g.:
id | customer_id
1 | 24
And line items in the line_items table, e.g.:
id | order_id | product_id | variant_id | quantity | price*
1 | 1 | 1423 | 1423_101_10 | 5 | 10
2 | 1 | 2396 | 2396_101_10 | 3 | 15
*price doesn't come from the order JSON, it's retrieved via a lookup
However, when the new records are created it does a SELECT for the order for each line_item added. This wouldn't be an issue in the example above, but this application can and does have hundreds and sometimes thousands of line items for a particular order, so it seems like it's inefficient and potentially a cause of the Heroku server running out of memory. Is there a way to only load the Order once, rather than for each line item?
Another potential bottleneck is that a lookup is done against a Products table to get the price. In the example above, there's no possible caching, but if multiple variants of the same Product are selected, it seems inefficient to look up the Product each time when it may already have been loaded. For example, 1423_101_10, 1423_101_12, 1423_102_10 and 1423_102_12 are all the same Product with the same price. Is it better to try and cache Products already looked up or would that complicate things further?
Edit:
Completely forgot to add any code!
Order Model:
class Order < ActiveRecord::Base
has_many :line_items, :dependent => :destroy
Line Item Model:
class LineItem < ActiveRecord::Base
before_create :set_price
belongs_to :order
belongs_to :product, :primary_key => "product_id", :conditions => proc { "season = '#{order.season}'" }
def set_price
write_attribute :price, product.prices[order.currency] if price.nil? && product && order
end
Product Model:
class Product < ActiveRecord::Base
Edit 2:
OrdersController (simplified)
class OrdersController < ApplicationController
def create
@order = Order.new(order_params)
authorize! :create, @order
if @order.save
render_order_json
end
end
def order_params
permitted = params.permit(:customer_id, :line_items => line_item_params)
permitted[:line_items_attributes] = permitted.delete("line_items") if permitted["line_items"]
permitted
end
def line_item_params
[:product_id, :variant_id, :quantity]
end
Edit 3: An example of the SQL I see reported:
Order Load (1.0ms) SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 ORDER BY "orders"."id" ASC LIMIT 1 [["id", 1]]
Product Load (1.0ms) SELECT "products".* FROM "products" WHERE "products"."product_id" = $1 AND (season = 'AW14') ORDER BY "products"."id" ASC LIMIT 1 [["product_id", 1423]]
SQL (2.0ms) INSERT INTO "line_items" ("order_id", "price", "product_id", "quantity", "variant_id") VALUES ($1, $2, $3, $4, $5) RETURNING "id" [["order_id", 1], ["price", 10.0], ["product_id", 1423], ["quantity", 5], ["variant_id", "1423_101_10"]]
Order Load (1.0ms) SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 ORDER BY "orders"."id" ASC LIMIT 1 [["id", 1]]
Product Load (2.0ms) SELECT "products".* FROM "products" WHERE "products"."product_id" = $1 AND (season = 'AW14') ORDER BY "products"."id" ASC LIMIT 1 [["product_id", 2396]]
SQL (1.0ms) INSERT INTO "line_items" ("order_id", "price", "product_id", "quantity", "variant_id") VALUES ($1, $2, $3, $4, $5) RETURNING "id" [["order_id", 1], ["price", 15.0], ["product_id", 2396], ["quantity", 3], ["variant_id", "2396_101_10"]]
If you want to speed up the create action, you have several options:
removing the database-intensive callbacks
speeding up the callbacks through caching
delaying creation so that it runs in a background task
Depending on your application's needs, these might be viable options, listed in order of their impact on your code base and infrastructure.
That ordering depends on what you already have set up, so it might be the other way around.
If you remove the callback (set_price) that causes the 1+n problem in your create code, you will have to write a lookup method that fetches all the prices at once and applies them to the order.
Caching could go into the set_price method, so that each lookup is only done once. You will have to take care of cache expiry when a price changes, which might be non-trivial.
A background-job library like resque or sidekiq can take the order and do all the processing without the response timing out. You will then have to check asynchronously whether the order has been processed before showing it in the frontend.
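The caching option can be sketched in plain Ruby. Everything named here is an assumption for illustration: `PriceCache`, `PRICE_TABLE` and `price_line_items` are hypothetical, and the block passed to the cache stands in for the real Product query. The idea is a hash that performs the lookup only on the first request for each product_id:

```ruby
# Memoizes price lookups per product_id so each product is fetched once.
class PriceCache
  attr_reader :lookups

  # `fetcher` is a hypothetical callable standing in for the real
  # Product query; it receives a product_id and returns a price.
  def initialize(&fetcher)
    @lookups = 0
    @prices = Hash.new do |hash, product_id|
      @lookups += 1
      hash[product_id] = fetcher.call(product_id)
    end
  end

  def price_for(product_id)
    @prices[product_id]
  end
end

# Illustrative price table in place of the products table.
PRICE_TABLE = { "1423" => 10, "2396" => 15 }.freeze

# Returns the price for each variant plus the number of real lookups made.
def price_line_items(variant_ids)
  cache = PriceCache.new { |product_id| PRICE_TABLE.fetch(product_id) }
  prices = variant_ids.map { |variant| cache.price_for(variant.split("_").first) }
  [prices, cache.lookups]
end

p price_line_items(%w[1423_101_10 1423_101_12 1423_102_10 2396_101_12])
```

Keeping the cache alive only for the duration of one order creation sidesteps the expiry problem, at the cost of not sharing lookups between requests.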
In the end it just took a bit of a workflow change to speed up the Order creation.
Instead of calling set_price in a before_create callback, it's now called from the OrdersController via the Order model. So now my code looks like:
Order Model:
class Order < ActiveRecord::Base
has_many :line_items, :dependent => :destroy
def set_prices
self.line_items.each do |item|
item.set_price
end
end
LineItem model:
class LineItem < ActiveRecord::Base
belongs_to :order
belongs_to :product, :primary_key => "product_id", :conditions => proc { "season = '#{order.season}'" }
def set_price
self.price = Product.where(:product_id => product_id, :season => season).first.prices[currency]
end
OrdersController:
class OrdersController < ApplicationController
def create
@order = Order.new(order_params)
authorize! :create, @order
@order.set_prices
if @order.save
render_order_json
end
end
def order_params
permitted = params.permit(:customer_id, :line_items => line_item_params)
permitted[:line_items_attributes] = permitted.delete("line_items") if permitted["line_items"]
permitted
end
def line_item_params
[:product_id, :variant_id, :quantity]
end
A related question, "Stop child models updating when parent is updated", also helped speed things up.

How can I avoid duplication in a join query using Sequel with Postgres on Sinatra?

I want to do a simple join. I have two tables: "candidates" and "notes".
Not all candidates have notes written about them, some candidates have more than one note written about them. The linking fields are id in the candidates table and candidate_id in the notes table. The query is:
people = candidates.where(:industry => industry).where("country = ?", country).left_outer_join(:notes, :candidate_id => :id).order(Sequel.desc(:id)).map do |row|
{
:id => row[:id],
:first => row[:first],
:last => row[:last],
:designation => row[:designation],
:company => row[:company],
:email => row[:email],
:remarks => row[:remarks],
:note => row[:note]
}
end
It works kind of fine and gets all the specified candidates from the candidates table and the notes from the notes table but where there is more than one note it repeats the name of the candidate. In the resulting list, person "abc" appears twice or three times depending on the number of notes associated with that person.
I am not actually printing the notes in the HTML result just a "tick" if that person has notes and "--" if no notes.
I want the person's name to appear only once. I have tried adding distinct in every conceivable place in the query but it made no difference.
Any ideas?
In order for distinct to work, you need to make sure you are only selecting columns that you want to be distinct on. You could try adding this to the query
.select(:candidates__id, :first, :last, :designation, :company, :email, :remarks, Sequel.as(Sequel.~(:note => nil), :note)).distinct
But you may be better off using a subselect instead of a join to check for the existence of notes (assuming you are using a decent database):
candidates.where(:industry => industry, :country=>country).select_append(Sequel.as({:id=>DB[:notes].select(:candidate_id)}, :note)).order(Sequel.desc(:id)).map do |row|
{ :id => row[:id], :first => row[:first], :last => row[:last], :designation => row[:designation], :company => row[:company], :email => row[:email], :remarks => row[:remarks], :note => row[:note] }
end
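If changing the query is not an option, the duplicate rows can also be collapsed in Ruby after the join has run. This is a different technique from the two queries above, shown as a sketch with made-up rows and a hypothetical helper name: group the joined rows by candidate id and reduce the notes to a single flag.

```ruby
# Rows as a left outer join would return them: one per (candidate, note) pair.
JOINED_ROWS = [
  { id: 2, first: "Ann", note: "called" },
  { id: 2, first: "Ann", note: "emailed" },
  { id: 1, first: "Bob", note: nil },
].freeze

# Collapse to one row per candidate, replacing the note text with a flag.
def one_row_per_candidate(rows)
  rows.group_by { |row| row[:id] }.map do |_id, group|
    candidate = group.first.reject { |key, _value| key == :note }
    candidate.merge(has_notes: group.any? { |row| row[:note] })
  end
end

one_row_per_candidate(JOINED_ROWS).each do |row|
  puts "#{row[:first]}: #{row[:has_notes] ? 'tick' : '--'}"
end
```

This transfers every note row from the database, so the subselect is likely the better choice when candidates have many notes.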

How can I avoid running singular expressions against arrays in activerecord and rails 3?

I am sorry if I am asking the question poorly. I have a Rails 3.1 app with models (simplified) like so:
class Employee < ActiveRecord::Base
has_many :merged_children, :class_name => 'Employee', :foreign_key => "merge_parent_id"
has_many :timesheets
def total_time
merged_children.timesheets.in_range(range).hours_minutes.sum
end
end
class Timesheet < ActiveRecord::Base
belongs_to :employee
def in_range(range)
# filter records based on transaction_date in range
end
def hours_minutes
(hours + minutes/60.0).to_f
end
end
Note: The in_range method acts as a scope, essentially, and hours_minutes is a calculation. hours_minutes is valid for each timesheet record in the resulting dataset, and then total_time should sum those values and return the amount.
The "total_time" method is not working because employee.merged_children returns an array and timesheets is meant to run against a single Employee object.
Is there any way to structure the "total_time" so that it still sends one query to the db? It seems inelegant to iterate over the merged_children array, issuing a query for each. Not sure if a direct call to an Arel table would help or hurt, but I am open to ideas.
If we get it right, the resulting SQL should effectively look something like:
SELECT sum(hours + minutes/60.0)
FROM employees e1 join employees e2 on e1.id = e2.merge_parent_id join timesheets t on t.employee_id = e2.id
WHERE e1.id = [@employee.id] and t.transaction_date BETWEEN [@range.begin] and [@range.end]
Thanks so much!
The easiest thing here might be to add
has_many :children_timesheets, :through => :merged_children, :source => :timesheets
to your Employee model. Then (assuming in_range is actually a scope, or a class method that does a find)
children_timesheets.in_range(...)
should be the collection of timesheets you're interested in, and you can do something like
children_timesheets.in_range(...).collect(&:hours_minutes).sum
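The final `collect(&:hours_minutes).sum` step runs in Ruby over records that have already been loaded. A minimal sketch of that arithmetic, using a hypothetical `TimesheetRow` struct in place of the real model:

```ruby
# Hypothetical stand-in for a Timesheet row.
TimesheetRow = Struct.new(:hours, :minutes) do
  # Same calculation as the model's hours_minutes method.
  def hours_minutes
    hours + minutes / 60.0
  end
end

# Sum the decimal hours across a collection of timesheets.
def total_hours(timesheets)
  timesheets.map(&:hours_minutes).sum
end

puts total_hours([TimesheetRow.new(1, 30), TimesheetRow.new(2, 15)])
```

This issues one query for the timesheets but does the addition in Ruby; pushing the sum into SQL avoids instantiating each row.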
Untested with actual data.
range = ((2.days.ago)...(1.day.ago)) # earlier bound first; a reversed range matches nothing
merge_parent = Employee.find(some_id)
Timesheet.where(:transaction_date => range)
.joins(:employee).where(:employees => {:merge_parent_id => merge_parent.id})
.sum('hours*60 + minutes')
(0.3ms) SELECT SUM(hours*60 + minutes) AS sum_id FROM "timesheets" INNER JOIN "employees" ON "employees"."id" = "timesheets"."employee_id" WHERE "employees"."merge_parent_id" = 1 AND ("timesheets"."created_at" >= '2011-12-13 03:04:35.085416' AND "timesheets"."created_at" < '2011-12-12 03:04:
Returns "0" for me. So hopefully it will return something nicer for you
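The logged SQL above has a lower bound that is later than its upper bound, which may be why the sum came back as 0. The effect of bound order can be checked with plain Ruby ranges; the timestamps and the helper name below are made up for illustration:

```ruby
# Fixed timestamps (seconds since the epoch) keep the example deterministic.
EARLIER = Time.at(1_000)
LATER   = Time.at(2_000)
SAMPLE  = Time.at(1_500)

# True when the time falls inside the range.
def in_range?(range, time)
  range.cover?(time)
end

puts in_range?(LATER...EARLIER, SAMPLE) # reversed bounds
puts in_range?(EARLIER...LATER, SAMPLE) # earlier bound first
```

A range built with the later time first covers nothing, and the equivalent SQL condition matches no rows.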
