I've encountered the following issue:
There are two models, X and Y, associated with each other like this: has n, :<name>, :through => Resource. When I do something like x.ys = array_with_500_ys it takes a really long time, because DataMapper inserts only one association per query (insert into xs_ys(x_id, y_id) values(xid, yid)).
The question is: how can I make this faster?
Thanks.
Because DataMapper has abstracted the back end, the standard behaviour is to insert one record at a time via SQL (or whatever store you are using).
Assuming you are using an SQL backend, such as Postgres, you could drop back to raw SQL, and do the following:
x = X.first
query = "INSERT INTO xs_ys(x_id, y_id) VALUES"
vals = []
array_with_500_ys.each do |y|
  vals << "(#{x.id}, #{y.id})"
end
repository.adapter.execute(query + vals.join(','))
This creates a single insert that passes all the records at once. I'm not sure how much faster it will be, but you could also move it into a background job if you need to keep the request from timing out for the user.
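If a single 500-row statement ever becomes unwieldy, a variation on the same idea is to batch the VALUES list. This is only a sketch reusing the snippet above; the chunk size of 100 is arbitrary:
# Sketch: same raw-SQL approach as above, issued in batches instead of one statement.
# x, array_with_500_ys and repository come from the snippet above.
array_with_500_ys.each_slice(100) do |batch|
  values = batch.map { |y| "(#{x.id}, #{y.id})" }.join(',')
  repository.adapter.execute("INSERT INTO xs_ys(x_id, y_id) VALUES #{values}")
end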
When simply displaying large amounts of data (over 100k records) my code works well, and I paginate on the server.
However, when I need to sort this data I'm stuck. I'm only sorting on the page, and NOT sorting on ALL the records related to this one customer.
How can I paginate but also sort across all the records of my customer and NOT simply sort the records returned from the server side pagination?
I'm also using Bootstrap Table to display all my data.
Here is my code that gets all the customers:
def get_customers
  @data_to_return = []
  @currency = current_shop.country_currency
  customers = current_shop.customers.limit(records_limit).offset(records_offset)#.order("#{sort_by}" " " "#{sort_order}")
  customers.each do |customer|
    @data_to_return.push(
      state: false,
      id: customer.id,
      email: customer.email,
      accepts_marketing: customer.accepts_marketing,
      customer_status: customer.customer_status,
      tags: customer.tags)
  end
  sort_customers
end
And then this is the sort_customers method:
def sort_customers
  fixed_data = @data_to_return.sort_by { |hsh| hsh[sort_by] }
  customer_size = current_shop.customers.length
  if sort_order == "ASC"
    fixed_data
  else
    fixed_data.reverse!
  end
  render json: { "total": customer_size, "rows": fixed_data }
end
In the above code you can see that @data_to_return comes from get_customers, and it's limited. But I don't want to return ALL the customers, for many reasons.
How can I sort across all the records, but only return the paginated subset?
You should actually sort at the model/query level, not at the Ruby level.
The difference is basically:
# sort in ruby
relation.sort_by { |item| foo(item) }
# sort in database - composes with pagination
relation.order('column_name ASC/DESC')
In the first case, the relation is implicitly executed, enumerated and converted to an array before sort_by is called. If you did pagination (manually or with kaminari), you will get just that page of data.
In the second case, you are actually composing the limit, offset and where (limit and offset are used under the hood by kaminari anyway; where is implicit when you use associations) with an order, so your database would execute
SELECT `customers`.* FROM `customers`
WHERE ...
ORDER BY ...
LIMIT ...
OFFSET ...
which will return the correct data.
A good option is to define scopes in the model, like
class Customer < ApplicationRecord
  scope :sorted_by_email, ->(ascending = true) { order("email #{ascending ? 'ASC' : 'DESC'}") }
end
# in controller
customers = current_shop.customers.
              limit(records_limit).
              offset(records_offset).
              sorted_by_email(false)
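If you need to keep your dynamic sort_by/sort_order parameters rather than a fixed scope, here is a rough sketch of the same idea with a whitelisted column and direction. SORTABLE_COLUMNS is an assumption; everything else follows your code:
# Sketch: order in the database, then limit/offset, so sorting spans all records.
SORTABLE_COLUMNS = %w[email accepts_marketing customer_status].freeze  # assumed whitelist

def get_customers
  column    = SORTABLE_COLUMNS.include?(sort_by) ? sort_by : 'email'
  direction = sort_order.to_s.casecmp('DESC').zero? ? 'DESC' : 'ASC'
  customers = current_shop.customers.
                order("#{column} #{direction}").
                limit(records_limit).
                offset(records_offset)
  rows = customers.map do |customer|
    { state: false,
      id: customer.id,
      email: customer.email,
      accepts_marketing: customer.accepts_marketing,
      customer_status: customer.customer_status,
      tags: customer.tags }
  end
  render json: { total: current_shop.customers.count, rows: rows }
end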
You can resolve the sorting and pagination issue using the DataTables library, which works client side. It's a jQuery library. With this approach you need to load all the data into the page, and then it works very well.
Below are the references, please check:
DataTables jQuery library
DataTables gem for Rails
You can try these; they work very well, and you can customise them as well.
I've set up MongoDB on an Ubuntu AWS instance. I also have something like 920 files ranging in size from 5 MB to 2 GB or so.
Once each unzipped text file is uniq'd with uniq, I run the following script to insert them into the DB:
require 'mongo'
require 'bson'
Mongo::Logger.logger.level = ::Logger::FATAL
puts "Working..."
db = Mongo::Client.new([ 'localhost:27017' ], :database => 'supers')
coll = db[:hashes]
# suppressors = File.open('_combined.txt')
suppressors = Dir['./_uniqued_*.txt']
count = suppressors.count
puts "Found #{count}"
suppressors.each_with_index do |fileroute, i|
  suppressor = File.open(fileroute, 'r')
  percentage = ((i+1) / count.to_f * 100).round(2)
  puts "Working on `#{fileroute}` (#{i+1}/#{count} - #{percentage})"
  c = 0
  suppressor.each_line do |hash|
    c += 1
    coll.update_one({ :_id => hash }, { :$inc => { :count => 1 } }, { upsert: true })
    puts "Processed 50k records for #{fileroute}" if c % 50_000 == 0
  end
end
end
The idea is, if the record already exists, the $inc will set the count to 2 or 3 so I'll be able to find all the duplicates by running a query on the DB later.
I connected to the instance via RoboMongo and at first every time I refreshed the following query:
db.getCollection('hashes').count({})
I'd see that it was filling up the DB very quickly. There are lots of files, but I figured I'd leave it overnight.
However, after some time the result got stuck at 3788104. I was worried there was some hard size limit (df says I'm only using 35% of the HDD space).
Is there something in the config file which automatically limits the amount of records which can be inserted or something?
PS: is it just me or is either upsert or .each_line incredibly slow?
MongoDB's update model is based on write concerns, meaning that calling the function updateOne alone is no guarantee of success.
If the version of MongoDB is at least 2.6, the function updateOne will return a document with information about any errors. If the version of MongoDB is older, an explicit call of the getLastError command will return the document with possible errors.
If the database does not contain all the desired documents, it is likely that this returned document will contain errors.
In both cases, the write concern can be adjusted to the desired level, i.e. it gives control over how many mongo instances must have propagated the change for it to be considered a success.
(Note: I am not familiar with the Ruby driver, this is assuming it behaves like the shell).
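For completeness, here is a rough sketch of checking the write result with the Ruby driver used in the question (option names may vary across driver versions, so treat this as an assumption rather than a reference):
require 'mongo'

# Sketch: ask for an acknowledged write and inspect the returned result.
client = Mongo::Client.new([ 'localhost:27017' ],
                           :database => 'supers',
                           :write => { :w => 1 })     # write concern: primary must acknowledge
coll = client[:hashes]

result = coll.update_one({ :_id => 'somehash' },      # 'somehash' is a placeholder value
                         { :$inc => { :count => 1 } },
                         { upsert: true })

puts result.acknowledged?    # false would mean the server never confirmed the write
puts result.modified_count   # 1 if an existing document was incremented
puts result.upserted_count   # 1 if a new document was inserted instead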
I am new to Ruby and Mongo and am working with twitter data. I'm using Ruby 1.9.3 and Mongo gems.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below is working but runs relatively slowly, as I loop through using .each and then insert new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name'],
                           }]
                }
    @collection.save(@document)
  end #end.if
end #end.each
Any help is greatly appreciated.
In your case there is no way to make this much faster. One thing you could do is retrieve the documents in bulk, process them and then reinsert them in bulk, but it would still be slow.
To speed this up you need to do all the processing server side, where the data already exists.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility at the cost of slower execution (still much faster than your current approach), MongoDB's MapReduce framework.
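As a rough illustration of doing the work server side, here is a hedged sketch using the aggregation framework through the driver collection from your code (the field names come from your snippet, the output collection name scored_tweets is made up, and the $out stage needs MongoDB 2.6+):
# Sketch: compute score on the server and write the results to another collection,
# so no documents are round-tripped through Ruby.
raw.aggregate([
  { '$match'   => { 'lang' => 'en', 'entities.urls' => [] } },
  { '$project' => { 'id_str'           => 1,
                    'retweet_count'    => 1,
                    'favorite_count'   => 1,
                    'created_at'       => 1,
                    'user.id_str'      => 1,
                    'user.screen_name' => 1,
                    'score' => { '$add' => ['$retweet_count', '$favorite_count'] } } },
  { '$out'     => 'scored_tweets' }
])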
What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
What I've understood from your code is that you actually create a completely new document, and I think that's wrong.
On the Ruby side you can do it like this:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end #end.if
end #end.each
And of course you can include Mongoid::Timestamps in your model, and it handles the created_at and updated_at attributes for you (it creates them itself).
In the mongo shell it's a little harder:
First select your database with use my_db, then the following code will do what you want:
db.models.find({something: your_param}).forEach(function(doc){
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters are, but they're easy to create, and Mongoid really does do lazy loading, so if you don't touch an attribute it won't load it. You can actually save a lot of time by not using every attribute.
Both of these methods change the existing document and won't create another one.
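On the lazy-loading point, here is a rough sketch of loading only the fields the loop touches, using Mongoid's only and set (YourModel is the assumed model name from the snippet above; set writes the field atomically without a full save):
# Sketch: project only the needed fields, then update the score atomically.
YourModel.where(lang: 'en').only(:retweet_count, :favorite_count).each do |r|
  r.set(score: r.retweet_count + r.favorite_count)
end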
I have two models with the appropriate foreign key created in the people table:
class Person < ActiveRecord::Base
  belongs_to :family
end

class Family < ActiveRecord::Base
  has_many :people
end
If I do the following I get an object - @family_members - as an instance variable and I have no problems:
@family_members = Family.find(1)
I can access the 'child' people table fields easily in my view:
@family_members.people.first_name
However, if I use the arel way with "where" etc. I get an "ActiveRecord::Relation", not a normal object, which leaves me stumped as to how to access the same "first_name" field from the people table like I accessed above:
@family_members = Family.where(:id => 1)
or even
@family_members = Family.joins(:people).where(:id => 1)
(is the "joins" even required??)
I understand that using ".first" will cause the query to run:
@family_members = Family.where(:id => 1).first
But it returns an array, not an object, so if I use this in my view:
@family_members.people.first_name
I get a "method 'people' unknown" error.
How can I access the 'first_name' field of the people table like I did with the object created by "find" but using an ActiveRecord relation?
* added information 7/15 ********
To clarify what I am looking for -- here is what I would have written if I were writing SQL instead of Arel:
SELECT f.home_phone, f.address, p.first_name, p.last_name, p.birthday
FROM families f INNER JOIN people p ON p.family_id = f.id WHERE f.id = 1
With that query's results loaded into a result set I could access:
myResultSet("home_phone") -- the home_phone from the families table
myResultSet("address") -- the address from the families table
myResultSet("first_name") -- the first_name from the people table
myResultSet("birthday") -- the birthday from the people table
If the two tables in the query have a same-named field I would just use "AS" to request one of the fields by another name.
I have used this kind of query/result set for many years in web apps and I am trying to deduce how to do the same in Rails and ActiveRecord.
@family_members.people.first_name shouldn't ever work, so I'm surprised you find it working ... @family_members contains a Family object, and @family_members.people is an array of Person objects.
The fact that you're calling it @family_members makes me think you're expecting it to be an array of Persons... in which case the correct code would be...
@family_members = Family.find(1).people # finds the people in the Family with id 1
If you expect @family_members to contain just the first family member, then...
@family_members = Family.find(1).people.first
If you want an array of first names of all family members, then...
@family_members = Family.find(1).people # finds the people in the Family with id 1
@family_members.map {|member| member.first_name} # array of first_name
@family_members = Family.find(1) and @family_members = Family.where(:id => 1) are functionally identical: both retrieve the Family object with id 1 from the database, which in each case may contain zero, one, or multiple people.
Just to be clear, the "1" in all the examples above refers to which Family object is retrieved, not which Person in the Family.
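A minimal sketch of the distinction, using the same names as above:
family   = Family.find(1)          # a single Family instance
relation = Family.where(:id => 1)  # an ActiveRecord::Relation, not yet executed
relation.first == family           # => true once the relation is actually queried
# people is an association on one Family record, so call it on a record:
Family.where(:id => 1).first.people.map(&:first_name)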
I just started using Sequel in a really small Sinatra app. Since I've got only one DB table, I don't need to use models.
I want to update a record if it exists or insert a new record if it does not. I came up with the following solution:
rec = $nums.where(:number => n, :type => t)
if $nums.select(1).where(rec.exists)
  rec.update(:counter => :counter + 1)
else
  $nums.insert(:number => n, :counter => 1, :type => t)
end
Where $nums is DB[:numbers] dataset.
I believe that this way isn't the most elegant implementation of "update or insert" behavior.
How should it be done?
You should probably not check before updating/inserting, because:
This is an extra db call.
This could introduce a race condition.
What you should do instead is to test the return value of update:
rec = $nums.where(:number => n, :type => t)
if 1 != rec.update(:counter => :counter + 1)
  $nums.insert(:number => n, :counter => 1, :type => t)
end
Sequel 4.25.0 (released July 31st, 2015) added insert_conflict for Postgres v9.5+
Sequel 4.30.0 (released January 4th, 2016) added insert_conflict for SQLite
This can be used to either insert or update a row, like so:
DB[:table_name].insert_conflict(:update).insert(number: n, type: t, counter: c)
I believe you can't have it much cleaner than that (although some databases have specific upsert syntax, which might be supported by Sequel). You can just wrap what you have in a separate method and pretend that it doesn't exist. :)
Just a couple of suggestions (a rough sketch follows the list):
Enclose everything within a transaction.
Create a unique index on the (number, type) fields.
Don't use global variables.
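Here is a sketch of the first two suggestions combined with the update-then-insert approach above (table and column names follow the question; Sequel.expr is used for the counter expression):
# Sketch: the unique index guards against duplicate (number, type) pairs,
# and the transaction keeps the update-then-insert pair atomic.
DB.alter_table(:numbers) do
  add_index [:number, :type], :unique => true
end

DB.transaction do
  rec = DB[:numbers].where(:number => n, :type => t)
  if rec.update(:counter => Sequel.expr(:counter) + 1) != 1
    DB[:numbers].insert(:number => n, :counter => 1, :type => t)
  end
end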
You could use upsert, except it doesn't currently work for updating counters. Hopefully a future version will - ideas welcome!