If I have a model of type Foo that has many child records of type Bar, I'd like to be able to show a list of Foo records and show the number of child Bar records. So I have something like...
@foos.each do |foo|
  puts foo.name
  puts foo.bars.count
end
How can I avoid the N+1 problem on my aggregates? In other words, I don't want a new SELECT COUNT(*)... query for each row. I could simply create a SQL view and map it to a new Model, but is there a simpler approach?
DataMapper is fickle about these things, so I'll give you a couple of options to try that may work depending on what your real code looks like.
1. Just change count to size, i.e., puts foo.bars.size. DM's strategic eager loading can sometimes work with this approach.
2. Force an eager load before the @foos.each loop, and change count to size, e.g.
@foos = Foo.all(...)
@foos.bars.to_a
@foos.each do |foo|
  puts foo.name
  puts foo.bars.size
end
3. Issue a raw SQL query before your @foos.each loop that returns structs with foo ids and bar counts, #map those into a Hash keyed by foo id, and look them up inside the loop. (I've only had to resort to this level of nonsense once or twice; I'd recommend fiddling with #1 and #2 for a bit before trying it.)
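For illustration, option 3 might look roughly like this sketch (it assumes DataMapper's raw-query interface via repository(:default).adapter.select and a bars.foo_id foreign key; adjust the names to your schema):
# One query for all the counts, then a Hash lookup per row.
counts = repository(:default).adapter.select(
  "SELECT foo_id, COUNT(*) AS bar_count FROM bars GROUP BY foo_id"
)
bar_counts = Hash[counts.map { |row| [row.foo_id, row.bar_count] }]

@foos.each do |foo|
  puts foo.name
  puts bar_counts.fetch(foo.id, 0)
end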
Given a foo table, a bar table, and a foos_bars join table, all three with id columns, the approach to getting bars with their foos that the documentation seems to imply is something like:
class Foo < ROM::Relation[:sql]
  def with_foos_bars
    qualified.inner_join(:foos_bars, foo_id: :id)
  end

  def with_bars
    with_foos_bars.qualified.inner_join(:bars, id: :bar_id)
  end
end
However, #qualified only applies to the class, so this actually just qualifies "Foo" twice, while we need to qualify at least two of the tables to get a usable SQL query. The same seems to be the case for #prefix. Omitting #qualified and #prefix simply leads to an ambiguous SQL query.
To clarify: the question is how does one join through a join table in Ruby Object Mapper?
You need to use symbol column names with Sequel naming conventions for now, so something like this:
class Foo < ROM::Relation[:sql]
  def with_foos_bars
    qualified.inner_join(
      :foos_bars, foos_bars__foo_id: :foos__id
    )
  end

  def with_bars
    with_foos_bars.qualified.inner_join(
      :bars, bars__id: :foos_bars__bar_id
    )
  end
end
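For reference, the Sequel-style table__column symbols are what remove the ambiguity: :foos_bars__foo_id is rendered as foos_bars.foo_id in the generated SQL. A rough usage sketch (my addition, assuming the relation is registered in a finalized ROM container as :foos; the container access API differs between ROM versions):
foos = rom.relation(:foos) # hypothetical container lookup; adjust to your ROM version
foos.with_bars.to_a        # foos joined through foos_bars to bars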
The plan is to provide new interfaces that will simplify this, although I gotta say this simple naming convention has been working well for me. Having said that, there's definitely room for improvement here.
I hope this helps.
I want to write a Kiba ETL script that reads from a CSV source and writes to a CSV destination, with a list of transformation rules among which the second transformer is an aggregation performing an operation such as select name, sum(euro) group by name.
Kiba ETL Script file
source CsvSource, 'users.csv', col_sep: ';', headers: true, header_converters: :symbol
transform VerifyFieldsPresence, [:name, :euro]
transform AggregateFields, { sum: :euro, group_by: :name }
transform RenameField, from: :euro, to: :total_amount
destination CsvDestination, 'result.csv', [:name, :total_amount]
users.csv
date;euro;name
7/3/2015;10;Jack
7/3/2015;85;Jill
8/3/2015;6;Jack
8/3/2015;12;Jill
9/3/2015;99;Mack
result.csv (expected result)
total_amount;name
16;Jack
97;Jill
99;Mack
ETL transformers execute one after the other on a single row at a time, but my second transformer's behavior depends on the entire collection of rows, which I can't access in the class passed to the transform method.
transform AggregateFields, { sum: :euro, group_by: :name }
Is there any way this behavior can be achieved using the Kiba gem?
Thank you in advance.
EDIT: it's 2020, and Kiba ETL v3 includes a much better way to do this. Check out this article https://thibautbarrere.com/2020/03/05/new-in-kiba-etl-v3 for all the relevant information.
Kiba author here! You can achieve that in many different ways, depending mainly on the data size and your actual needs. Here are a couple of possibilities.
Aggregating using a variable in your Kiba script
require 'awesome_print'
require 'bigdecimal'

# convert the amount column from String to BigDecimal
transform do |r|
  r[:amount] = BigDecimal.new(r[:amount])
  r
end

# accumulate per-name totals in a plain Hash as the rows stream through
total_amounts = Hash.new(0)

transform do |r|
  total_amounts[r[:name]] += r[:amount]
  r
end

post_process do
  # pretty print here, but you could save to a CSV too
  ap total_amounts
end
This is the simplest way, yet it is quite flexible.
It will keep your aggregates in memory though, so this may be good enough or not, depending on your scenario. Note that currently Kiba is mono-threaded (but "Kiba Pro" will be multi-threaded), so there is no need to add a lock or use a thread-safe structure for the aggregate, for now.
Calling TextQL from post_process blocks
Another quick and easy way to aggregate is to generate a non-aggregated CSV file first, then leverage TextQL to actually do the aggregation, like this:
destination CsvDestination, 'non-aggregated-output.csv', [:name, :amount]
post_process do
query = <<SQL
select
name,
/* apparently sqlite has reduced precision, round to 2 for now */
round(sum(amount), 2) as total_amount
from tbl group by name
SQL
textql('non-aggregated-output.csv', query, 'aggregated-output.csv')
end
With the following helpers defined:
def system!(command)
  raise "Failed to run command #{command}" unless system(command)
end
def textql(source_file, query, output_file)
  system! "cat #{source_file} | textql -header -output-header=true -sql \"#{query}\" > #{output_file}"
  # this one uses csvfix to pretty print the table
  system! "cat #{output_file} | csvfix ascii_table"
end
Be careful with the precision though when doing computations.
Writing an in-memory aggregating destination
A useful trick that can work here is to wrap a given destination with a class to do the aggregation. Here is how it could look:
class InMemoryAggregate
  def initialize(sum:, group_by:, destination:)
    @aggregate = Hash.new(0)
    @sum = sum
    @group_by = group_by
    # this relies a bit on the internals of Kiba, but not too much
    @destination = destination.shift.new(*destination)
  end

  def write(row)
    # do not write, but count here instead
    @aggregate[row[@group_by]] += row[@sum]
  end

  def close
    # use close to actually do the writing
    @aggregate.each do |k, v|
      # reformat BigDecimal additions here
      value = '%0.2f' % v
      @destination.write(@group_by => k, @sum => value)
    end
    @destination.close
  end
end
which you can use this way:
# convert your string into an actual number
transform do |r|
  r[:amount] = BigDecimal.new(r[:amount])
  r
end

destination CsvDestination, 'non-aggregated.csv', [:name, :amount]

destination InMemoryAggregate,
  sum: :amount, group_by: :name,
  destination: [
    CsvDestination, 'aggregated.csv', [:name, :amount]
  ]

post_process do
  system!("cat aggregated.csv | csvfix ascii_table")
end
The nice thing about this version is that you can reuse your aggregator with different destinations (like a database one, or anything else).
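For instance, the wiring for a database destination might look like this (SqlDestination and its arguments here are purely hypothetical, just to illustrate the reuse):
destination InMemoryAggregate,
  sum: :amount, group_by: :name,
  destination: [
    SqlDestination, 'postgres://localhost/etl', :totals # hypothetical class and args
  ]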
Note though that this will keep all the aggregates in memory, like the first version.
Inserting into a store with aggregating capabilities
Another way (especially useful if you have very large volumes) is to send the resulting data into something that will be able to aggregate the data for you. It could be a regular SQL database, Redis, or anything more fancy, which you would then be able to query as needed.
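As one possible sketch (not from the original answer), a destination could stream rows into SQLite and let the database do the grouping; the SqliteAggregate name, the table layout, and the sqlite3 gem usage below are my assumptions:
require 'sqlite3'

# Sketch: push each row into SQLite, then ask the database for the aggregate.
class SqliteAggregate
  def initialize(db_path:)
    @db = SQLite3::Database.new(db_path)
    @db.execute('CREATE TABLE IF NOT EXISTS rows (name TEXT, amount REAL)')
  end

  def write(row)
    @db.execute('INSERT INTO rows (name, amount) VALUES (?, ?)',
                [row[:name], row[:amount].to_f])
  end

  def close
    @db.execute('SELECT name, SUM(amount) FROM rows GROUP BY name').each do |name, total|
      puts "#{name}: #{total}"
    end
    @db.close
  end
end

# used like any other destination:
# destination SqliteAggregate, db_path: 'agg.db'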
So as I said, the implementation will largely depend on your actual needs. Hope you will find something that works for you here!
Let's say I have this:
@objects = SampleObject.all
Then I want to check if @objects is blank. I could do the following:
unless @objects.blank?
  @objects.each do |object|
  end
else
  # ..
end
However, doing so will trigger Rails to execute a SELECT COUNT(*) query.
So instead, I could do something like:
unless @objects.length > 0
Is there a way to override .blank? for a particular class? Say:
def self.empty?
  self.length > 0 ? false : true
end
You should use ActiveRecord::Relation#any? method:
if @objects.any?
  # ...
end
which is (in this case) the negation of the ActiveRecord::Relation#empty? method:
unless @objects.empty?
  # ...
end
blank? uses empty?, as the blank? source code shows:
# File activesupport/lib/active_support/core_ext/object/blank.rb, line 13
def blank?
  respond_to?(:empty?) ? empty? : !self
end
Now the docs about empty? say:
Returns true if the collection is empty.
If the collection has been loaded or the :counter_sql option is provided, it is equivalent to collection.size.zero?.
If the collection has not been loaded, it is equivalent to collection.exists?.
If the collection has not already been loaded and you are going to fetch the records anyway, it is better to check collection.length.zero?.
So, it really depends on whether the collection is loaded or not.
Both empty? and any? use SELECT COUNT(*) if the collection isn't loaded (reference). I think in your case SampleObject.all will be lazily loaded, as @Marek said, hence the COUNT calls.
For your case I don't think you can avoid the COUNT call, since you want to fetch all records, and eager loading all records just to avoid a second call to the db feels pointless (solving one performance issue by causing a bigger one). However, if it's a subset collection, I believe there will be no second COUNT call.
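To make the loaded/not-loaded distinction concrete, here is a rough sketch (my illustration; the exact query any? issues varies by Rails version):
objects = SampleObject.all # lazy relation, no query yet
objects.any?               # not loaded: issues a COUNT (or EXISTS) query
objects.to_a               # loads the records with SELECT * FROM sample_objects
objects.any?               # already loaded: answered in memory, no extra query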
SampleObject.all returns an ActiveRecord::Relation object, so you can also use blank?.
blank? - returns true if the relation is blank.
Thus you can write:
unless @objects.blank?
  # ...
end
You can call ActiveRecord::Relation#to_a to execute the query immediately:
@objects = SampleObject.all.to_a # runs the query, returns an array
if @objects.any? # no query, this is Enumerable#any?
  # ...
end
I have a group of radio buttons returning the following hash:
{"1"=>"1", "3"=>"2"}
The key represents the event_id and the value represents the regoption_id. I need to insert these into the subscriptions table, preferably all at once.
I tried the following:
params[:children].each do |child|
  Subscription.create({:event_id => child[0], :regoption_id => child[1]}).save
end
This ends up saving just one radio group, not all in the hash. Any ideas on how to do this?
There's a gem called activerecord-import that will insert multiple records efficiently. It works with many popular DB backends, and will just do the right thing with most of them. This does exactly what you want: it accepts an array of object instances, or an array-of-hashes-of-values, and inserts them into a table in a single statement.
Here's a usage example right from the gem documentation:
books = []
10.times do |i|
  books << Book.new(:name => "book #{i}")
end
Book.import books
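Applied to the params hash from the question, that might look roughly like this (a sketch; it assumes activerecord-import is installed and the subscriptions columns match):
# build unsaved Subscription objects from the {event_id => regoption_id} pairs
subscriptions = params[:children].map do |event_id, regoption_id|
  Subscription.new(:event_id => event_id, :regoption_id => regoption_id)
end

# insert them all with a single INSERT statement
Subscription.import subscriptions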
What's the most efficient way to iterate through an entire table using Datamapper?
If I do this, does Datamapper try to pull the entire result set into memory before performing the iteration? Assume, for the sake of argument, that I have millions of records and that this is infeasible:
Author.all.each do |a|
  puts a.title
end
Is there a way that I can tell Datamapper to load the results in chunks? Is it smart enough to know to do this automatically?
Thanks, Nicolas, I actually came up with a similar solution. I've accepted your answer since it makes use of Datamapper's dm-pagination system, but I'm wondering if this would do equally as well (or worse):
offset = 0
while authors = Author.slice(offset, CHUNK) do
  authors.each do |a|
    # do something with a
  end
  offset += CHUNK
end
DataMapper will run just one SQL query for the example above, so it will have to keep the whole result set in memory.
I think you should use some sort of pagination if your collection is big.
Using dm-pagination you could do something like:
PAGE_SIZE = 20
pager = Author.page(:per_page => PAGE_SIZE).pager # This will run a count query
(1..pager.total_pages).each do |page_number|
  Author.page(:per_page => PAGE_SIZE, :page => page_number).each do |a|
    puts a.title
  end
end
You can play around with different values for PAGE_SIZE to find a good trade-off between the number of sql queries and memory usage.
What you want is the dm-chunked_query plugin: (example from the docs)
require 'dm-chunked_query'

MyModel.each_chunk(20) do |chunk|
  chunk.each do |resource|
    # ...
  end
end
This will allow you to iterate over all the records in the model, in chunks of 20 records at a time.
EDIT: the example above had an extra #each after #each_chunk, and it was unnecessary. The gem author updated the README example, and I changed the above code to match.