I have a performance problem when inserting around 13,000 records into the device's database.
Is there any way to optimize this? Is it possible to put all of these into one transaction? I believe it is currently creating one transaction per insert, which apparently has a diabolical effect on speed.
Currently this takes around 10 minutes, including converting the CSV to a hash (which doesn't seem to be the bottleneck).
Stupidly I am not using RhoSync...
Thanks
Set up a transaction around the inserts and then only commit at the end.
From their FAQ.
http://docs.rhomobile.com/faq#how-can-i-seed-a-large-amount-of-data-into-my-application-with-rhom
db = ::Rho::RHO.get_src_db('Model')
db.start_transaction
begin
  items.each do |item|
    # create hash of attribute/value pairs
    data = {
      :field1 => item['value1'],
      :field2 => item['value2']
    }
    # Creates a new Model object and saves it
    new_item = Model.create(data)
  end
  db.commit
rescue
  db.rollback
end
I've found this technique to be a tremendous speed up.
Use a fixed schema rather than a property bag, and you can use one transaction (see the link below for how).
http://docs.rhomobile.com/rhodes/rhom#perfomance-tips.
This question was answered by someone else (HAYAKAWA Takashi) on Google Groups.
I've been flirting with Redis for a while now.
I watched this series some time ago and it was awesome. I've been through some of the documentation, and the mention of the time complexity of the queries blew me away; this is something that's rarely covered in web materials but is of huge importance for app building.
Anyhow, I'm trying to make my app use Redis on the consumer end so users can fetch the data as fast as possible.
So I'm trying to save some objects to a hash like this:
$redis->hmset("taxi_car", array(
    "brand" => "Toyota",
    "model" => "Yaris",
    "license number" => "RO-01-PHP",
    "year of fabrication" => 2010,
    "nr_stats" => 0
));
as found here, and this works nicely.
However I can't find a way to delete the whole entry anywhere.
Did I get this hash thing wrong?
Following this example, I would like to delete the entry with a given license number. All I could find is how to delete the license number field from the object:
$redis->hdel("taxi_car", "license number");
and I can't figure out how to delete the whole hash (please correct me if "row" is not the proper word here).
Another problem is that it seems this only allows me to save a single taxi_car in Redis. How do I set a UUID so I can have multiple taxi cars?
I'm going to play with this a bit, any help is welcome. Thanks!
To delete a key of any type, Hash included, call the Redis DEL command.
To have multiple keys, give them different names, e.g. taxi_car:1, taxi_car:2 etc.
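The question's snippets are PHP, but here is a rough sketch in Ruby using the redis gem; the taxi_car:<id> key naming and the counter key are just assumed conventions, not anything Redis requires:
require "redis"

redis = Redis.new

# Give each car its own key, e.g. "taxi_car:1", "taxi_car:2", ...
# INCR on a counter key is one simple way to hand out unique ids.
id = redis.incr("taxi_car:next_id")

redis.mapped_hmset("taxi_car:#{id}",
                   "brand"               => "Toyota",
                   "model"               => "Yaris",
                   "license number"      => "RO-01-PHP",
                   "year of fabrication" => 2010,
                   "nr_stats"            => 0)

# DEL removes the whole hash (the entire "row") in one call.
redis.del("taxi_car:#{id}")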
User.find(:all, :order => "RANDOM()", :limit => 10) was the way I did it in Rails 3.
User.all(:order => "RANDOM()", :limit => 10) is how I thought Rails 4 would do it, but this is still giving me a Deprecation warning:
DEPRECATION WARNING: Relation#all is deprecated. If you want to eager-load a relation, you can call #load (e.g. `Post.where(published: true).load`). If you want to get an array of records from a relation, you can call #to_a (e.g. `Post.where(published: true).to_a`).
You'll want to use the order and limit methods instead. You can get rid of the all.
For PostgreSQL and SQLite:
User.order("RANDOM()").limit(10)
Or for MySQL:
User.order("RAND()").limit(10)
As the random function differs between databases, I would recommend using the following code:
User.offset(rand(User.count)).first
Of course, this is useful only if you're looking for only one record.
If you want to get more than one, you could do something like:
User.offset(rand(User.count) - 10).limit(10)
The - 10 is to ensure you get 10 records in case rand returns a number greater than count - 10.
Keep in mind you'll always get 10 consecutive records.
I think the best solution is really ordering randomly in the database.
But if you need to avoid a database-specific random function, you can use the pluck and shuffle approach.
For one record:
User.find(User.pluck(:id).shuffle.first)
For more than one record:
User.where(id: User.pluck(:id).sample(10))
I would suggest making this a scope as you can then chain it:
class User < ActiveRecord::Base
  scope :random, -> { order(Arel::Nodes::NamedFunction.new('RANDOM', [])) }
end
User.random.limit(10)
User.active.random.limit(10)
While not the fastest solution, I like the brevity of:
User.ids.sample(10)
The .ids method returns an array of User IDs, and .sample(10) picks 10 random values from this array.
I strongly recommend this gem for random records; it is specially designed for tables with lots of data rows:
https://github.com/haopingfan/quick_random_records
All other answers perform badly with a large database, except this gem:
quick_random_records only costs 4.6ms in total.
the accepted answer User.order('RAND()').limit(10) costs 733.0ms.
the offset approach costs 245.4ms in total.
the User.all.sample(10) approach costs 573.4ms.
Note: my table only has 120,000 users. The more records you have, the bigger the performance difference will be.
UPDATE: performance on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) costs 1384.0ms
the quick_random_records gem costs only 6.4ms in total
For MySQL this worked for me:
User.order("RAND()").limit(10)
You could call .sample on the records, like: User.all.sample(10)
The answer from @maurimiranda, User.offset(rand(User.count)).first, is not good if we need to get 10 random records, because User.offset(rand(User.count) - 10).limit(10) returns a sequence of 10 records starting from a random position; they are not "totally random". So we would need to call that function 10 times to get 10 truly random records.
Besides that, offset is also not good if the random function returns a high value. If your query looks like offset: 10000 and limit: 20, it generates 10,020 rows and throws away the first 10,000 of them, which is very expensive. So calling offset/limit 10 times is not efficient.
So I think that if we just want one random user, User.offset(rand(User.count)).first may be better (at least we can improve it by caching User.count).
But if we want 10 or more random users, then User.order("RAND()").limit(10) should be better.
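For reference, the "call that function 10 times" approach would look roughly like this (a sketch only; the same user can be picked twice, so deduplicate if that matters):
count = User.count   # cache this if counting is expensive
random_users = Array.new(10) { User.offset(rand(count)).first }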
Here's a quick solution. I'm currently using it with over 1.5 million records and getting decent performance. The best solution would be to cache one or more random record sets, and then refresh them with a background worker at a desired interval.
Create a random_records_helper.rb file:
module RandomRecordsHelper
  def random_user_ids(n)
    user_ids = []
    user_count = User.count
    # note: this assumes ids run contiguously from 1 to User.count with no gaps
    n.times { user_ids << rand(1..user_count) }
    user_ids
  end
end
In the controller:
@users = User.where(id: random_user_ids(10))
This is much quicker than the .order("RANDOM()").limit(10) method - I went from a 13 sec load time down to 500ms.
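A rough sketch of the caching idea mentioned above, assuming Rails.cache is configured; the cache key, expiry, and sample size are arbitrary, and a background job could rewrite the key on whatever schedule you need:
def cached_random_users(n = 10)
  ids = Rails.cache.fetch("random_user_ids", :expires_in => 10.minutes) do
    User.pluck(:id).sample(n)   # recomputed only when the cache entry expires
  end
  User.where(:id => ids)
end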
Let's say I have a large query (for the purposes of this exercise say it returns 1M records) in MongoDB, like:
users = Users.where(:last_name => 'Smith')
If I loop through this result, working with each member, with something like:
users.each do |user|
# Some manipulation to "user"
# Some calculation for "user"
...
# Saving "user"
end
I'll often get a Mongo cursor timeout (as the reserved database cursor exceeds the default timeout length). I know I can extend the cursor timeout, or even turn it off, but this isn't always the most efficient approach. So one way I get around this is to change the code to:
users = Users.where(:last_name => 'Smith')
user_array = []
users.each do |u|
user_array << u
end
THEN, I can loop through user_array (since it's a Ruby array), doing manipulations and calculations, without worrying about a MongoDB timeout.
This works fine, but there has to be a better way--does anyone have a suggestion?
If your result set is so large that it causes cursor timeouts, it's not a good idea to load it entirely into RAM.
A common approach is to process records in batches, as sketched below.
Get 1000 users (sorted by _id).
Process them.
Get another batch of 1000 users whose _id is greater than the _id of the last processed user.
Repeat until done.
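A minimal sketch of that loop, assuming Mongoid (which the question's Users.where syntax suggests); the batch size and the block body are placeholders:
batch_size = 1000
last_id = nil

loop do
  criteria = Users.where(:last_name => 'Smith')
  criteria = criteria.where(:_id.gt => last_id) if last_id

  batch = criteria.asc(:_id).limit(batch_size).to_a   # short query, no long-lived cursor
  break if batch.empty?

  batch.each do |user|
    # manipulation / calculation / saving for "user"
  end

  last_id = batch.last.id
end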
For a long running task, consider using rails runner.
runner runs Ruby code in the context of Rails non-interactively. For instance:
$ rails runner "Model.long_running_method"
For further details, see:
http://guides.rubyonrails.org/command_line.html
Let me set the stage: My application deals with gift cards. When we create cards they have to have a unique string that the user can use to redeem it with. So when someone orders our gift cards, like a retailer, we need to make a lot of new card objects and store them in the DB.
With that in mind, I'm trying to see how quickly I can have my application generate 100,000 cards. Database expert I am not, so I need someone to explain this little phenomenon: when I create 1,000 cards, it takes 5 seconds; when I create 100,000 cards, shouldn't it take about 500 seconds?
Now I know what you're going to want to see: the card creation method I'm using, since the first assumption would be that it's getting slower because it's checking the uniqueness of more and more cards as it goes along. But I can show you my rake task:
desc "Creates cards for a retailer"
task :order_cards, [:number_of_cards, :value, :retailer_name] => :environment do |t, args|
t = Time.now
puts "Searching for retailer"
#retailer = Retailer.find_by_name(args[:retailer_name])
puts "Retailer found"
puts "Generating codes"
value = args[:value].to_i
number_of_cards = args[:number_of_cards].to_i
codes = []
top_off_codes(codes, number_of_cards)
while codes != codes.uniq
codes.uniq!
top_off_codes(codes, number_of_cards)
end
stored_codes = Card.all.collect do |c|
c.code
end
while codes != (codes - stored_codes)
codes -= stored_codes
top_off_codes(codes, number_of_cards)
end
puts "Codes are unique and generated"
puts "Creating bundle"
#bundle = #retailer.bundles.create!(:value => value)
puts "Bundle created"
puts "Creating cards"
#bundle.transaction do
codes.each do |code|
#bundle.cards.create!(:code => code)
end
end
puts "Cards generated in #{Time.now - t}s"
end
def top_off_codes(codes, intended_number)
(intended_number - codes.size).times do
codes << ReadableRandom.get(CODE_LENGTH)
end
end
I'm using a gem called readable_random for the unique code. If you read through all of that code, you'll see that it does all of its uniqueness testing before it ever starts creating cards. It also writes status updates to the screen while it's running, and it always sits for a while at the card-creation step, while it flies through the uniqueness tests. So my question to the Stack Overflow community is: why is my database slowing down as I add more cards? Why is this not a linear function with respect to time per card? I'm sure the answer is simple and I'm just a moron who knows nothing about data storage. And if anyone has any suggestions, how would you optimize this method, and how fast do you think you could get it to create 100,000 cards?
(When I plotted my times on a graph and did a quick curve fit to get my line formula, I calculated how long it would take to create 100,000 cards with my current code, and it came out to about 5.5 hours. That may be completely wrong, I'm not sure, but if it stays on the curve I fitted, it would be right around there.)
Not an answer to your question, but a couple of suggestions on how to make the insert faster:
Use Ruby's Hash to eliminate duplicates: use your card codes as hash keys and add them to a hash until it grows to the desired size. You can also use the Set class instead (though I doubt it's any faster than Hash).
Use a bulk insert into the database instead of a series of INSERT queries. Most DBMSs offer this possibility: create a text file with the new records and tell the database to import it. Here are links for MySQL and PostgreSQL.
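A rough sketch combining both ideas, reusing names from the rake task above (ReadableRandom, CODE_LENGTH, number_of_cards, @bundle); the cards table and its code/bundle_id columns are assumptions, and a raw multi-row INSERT skips ActiveRecord validations, callbacks, and timestamps:
require "set"

# Build unique codes in memory first (Set membership checks are O(1)).
codes = Set.new
codes << ReadableRandom.get(CODE_LENGTH) while codes.size < number_of_cards

conn = ActiveRecord::Base.connection
values = codes.map { |code| "(#{conn.quote(code)}, #{conn.quote(@bundle.id)})" }

# One multi-row INSERT instead of 100,000 separate INSERT statements.
conn.execute("INSERT INTO cards (code, bundle_id) VALUES #{values.join(', ')}")
You would still want the check against codes already stored in the cards table, as the original task does.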
My first thoughts would be around transactions - if you have 100,000 pending changes waiting to be committed in the transaction that would slow things down a little, but any decent DB should be able to handle that.
What DB are you using?
What indexes are in place?
Any DB optimisations, e.g. clustered tables/indexes?
Not sure about the Ruby transaction support - is the @bundle.transaction line something from ActiveModel or another library you are using?
I have a collection of users:
users = User.all()
I want to pass a subset of the user collection to a method.
Each subset should contain 1000 items (or less on the last iteration).
some_method(users)
So say users has 9500 items in it, I want to call some_method 10 times, 9 times passing 1000 items and the last time 500.
You can use the Enumerable#each_slice method:
User.all.each_slice(1000) do |subarray|
  some_method subarray
end
but that would first pull all the records from the database.
However, I guess you could make something like this:
def ar_each_slice(scope, size)
  (scope.count.to_f / size).ceil.times do |i|
    yield scope.scoped(:offset => i * size, :limit => size)
  end
end
and use it as in:
ar_each_slice(User.scoped, 1000) do |slice|
  some_method slice.all
end
It will first get the number of records (using COUNT), then fetch them 1000 at a time using OFFSET and LIMIT, passing each slice to your block.
Since Rails 2.3 one can specify batch_size:
User.find_in_batches(:batch_size => 1000) do |users|
  some_method(users)
end
In this case, the framework will run a select query for every 1000 records, which keeps memory usage low when you are processing a large number of records.
I think you should divide it into subsets manually.
For example:
some_method(users[0..999])
I forgot about using :batch_size but Chandra suggested it. That's the right way to go.
Using .all will ask the database to retrieve all records, passing them to Ruby to hold and then iterate over internally. That is a really bad way to handle it if your database will be growing, because the glob of records will make the DBM work harder as it grows, and Ruby will have to allocate more and more space to hold them. Your response time will grow as a result.
A better solution is to use the :limit and :offset options to tell the DBM to successively find the first 1000 records at offset 0, then the next 1000 records at offset 1000, and so on. Keep looping until there are no more records.
You can determine how many times you'll have to loop by doing a .count before you begin asking, which is really fast unless your where-clause is beastly, or simply loop until you get no records back.
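A minimal sketch of that loop; the batch size is arbitrary, and ordering by id keeps the pages stable between queries:
batch_size = 1000
offset = 0

loop do
  users = User.order(:id).limit(batch_size).offset(offset).to_a
  break if users.empty?

  some_method(users)
  offset += batch_size
end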