Delete entire voltrb store collection

store.widgets.clear does not seem to persist the deletion to the database.
So I tried:
store.widgets.each do |i|
  i.destroy
end
And this destroys only half of the records in the DB.
Any suggestions on how I can remove an entire store collection?

I need to add a .destroy_all (or make .clear do that), but haven't gotten around to it. The issue you're seeing with .each is that it's iterating through the array while deleting from it. For now you can do:
store.widgets.reverse.each(&:destroy)
I'll prioritize the destroy_all.
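In the meantime, here is a minimal sketch of wrapping that workaround in a helper; the destroy_all name is purely illustrative and not part of Volt yet:
# Illustrative only, not Volt API: a tiny helper around the reverse.each
# workaround above. Iterating in reverse means each destroy removes the item
# at the end of the remaining range, so the iteration does not skip records
# the way a forward .each does.
def destroy_all(collection)
  collection.reverse.each(&:destroy)
end

destroy_all(store.widgets)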

Related

Delete hanging ferret index entry after record deleted

I'm using ActsAsFerret which is a Rails wrapper for the Ruby-based Ferret search engine.
Sometimes it seems a record is deleted using raw SQL rather than via ActiveRecord (i.e. record.destroy), and when this happens an index entry is left behind in the Ferret index.
I can see it in this example, where School #136574 has left a hanging index entry: In the search below I'm using the lazy option, which says "Just get the data out of the Ferret index and don't bother hitting the database":
>> @schools = School.search(params)
=> [#<FerretResult wrapper for School with id 136574>, #<FerretResult wrapper for School with id 55814>]
>> @schools.collect(&:id)
ActiveRecord::RecordNotFound: Couldn't find School with ID=136574
This causes a serious problem because if you try to do anything with the results, it calls School.find(id) and you get an exception like the one above.
I did School.rebuild_index, thinking that would fix it, but it hasn't.
I can work around it - it seems that calling .collect on the results has been overridden to convert them into their corresponding objects (causing the exception), so I can do something like this:
@schools = @schools.select { |result| School.find_by_id(result.id) }; @schools.size
This does work, but it's slow, and a bit of a pain in the ass.
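For what it's worth, a quicker variant of that workaround is sketched below. It assumes the lazy FerretResult wrappers respond to .id (as the select above suggests), that .each is not overridden the way .collect is, and that Rails 2-era find(:all, :conditions => ...) is available; it hits the database once instead of once per result.
# Hedged sketch, not ActsAsFerret API: filter out hanging results with a single query.
result_ids = []
@schools.each { |result| result_ids << result.id } # assumes .each is not overridden like .collect

live_ids = School.find(:all, :select => 'id',
                       :conditions => { :id => result_ids }).map { |s| s.id }

@schools = @schools.select { |result| live_ids.include?(result.id) }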
Does anyone know a way to delete the hanging index records?

How do I find and remove duplicate mongo documents with ruby

I have a collection in Mongo with duplicates on a specific key that I need to remove all but one of. The Map Reduce solutions don't seem to make it clear how to remove all but one of the duplicates. I am using Ruby; how can I do this in a somewhat efficient way? My current solution is unbelievably slow!
I currently just iterate over an array of the duplicate keys and delete the first document that is returned, but this only works if there is at most one duplicate document for each key, and it is really slow.
dupes.each do |key|
  $mongodb.collection("some_collection").remove($mongodb.collection("some_collection").find({key: key}).first)
end
I think you should use MongoDB's ensureIndex() to remove the duplicates. For instance, in your case, where you want to drop the documents that are duplicated on the key duplicate_key, you can do
db.duplicate_collection.ensureIndex({'duplicate_key' : 1},{unique: true, dropDups: true})
where duplicate_collection is the collection where your duplicate documents are. This operation will preserve only a single document when there are duplicates for a given key.
After the operation, if you want to remove the index, just do a dropIndex operation. For details, see the MongoDB documentation.
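For reference, here is a hedged sketch of the same approach from Ruby, assuming the legacy mongo 1.x driver used elsewhere in this thread; the :drop_dups option and the default index name "duplicate_key_1" are assumptions, so check your driver version.
# Hedged sketch, legacy mongo 1.x driver assumed; option and index names may differ.
coll = $mongodb.collection("duplicate_collection")
coll.ensure_index([["duplicate_key", Mongo::ASCENDING]], :unique => true, :drop_dups => true)
# Once the duplicates are gone, the index can be dropped again if it is not wanted;
# "duplicate_key_1" is MongoDB's default name for the index created above.
coll.drop_index("duplicate_key_1")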
A lot of solutions suggest Map Reduce (which is fast and fine) but I implemented a solution in Ruby that seems pretty fast as well and makes it easy to leave the one document from each duplicate set.
Basically you find all your duplicate keys by adding them to a hash; any time you find a key that is already in the hash, you add that document's id to an array, which you then use for a bulk removal at the end.
all_keys = {}
dupes = []
dupe_key = "some_key"
$mongodb.collection("some_collection").find.each do |doc|
  if all_keys[doc[dupe_key]]
    # key already seen: mark this document for bulk removal
    dupes << doc["_id"]
  else
    all_keys[doc[dupe_key]] = 1
  end
end
$mongodb.collection("some_collection").remove({_id: {"$in" => dupes}})
The only issue with this method is that it potentially won't work if the total list of keys/dupe ids can't be stored in memory. The map reduce solution would probably be best at that point.
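If it's the list of duplicate ids that threatens to blow memory, here is a hedged sketch of the same single-pass scan that flushes removals in batches; the set of seen keys still has to fit in memory, so this only helps with the dupes array, and the batch size of 1,000 is arbitrary.
require 'set'

# Hedged sketch: same scan as above, but removing duplicates in batches so the
# dupes array never grows unbounded. Assumes the legacy mongo 1.x driver API.
seen  = Set.new
batch = []
coll  = $mongodb.collection("some_collection")

coll.find({}, :fields => ["_id", "some_key"]).each do |doc|
  if seen.include?(doc["some_key"])
    batch << doc["_id"]
    if batch.size >= 1_000
      coll.remove({ "_id" => { "$in" => batch } })
      batch.clear
    end
  else
    seen << doc["some_key"]
  end
end
coll.remove({ "_id" => { "$in" => batch } }) unless batch.empty?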

collect all xpath results into an array in ruby

I have a very poorly coded JSP that I am trying to run automation on. There is a series of checkboxes with names (no IDs) of "delete[x]", where x is the item number of the populated item. I am trying to select all the checkboxes so I can delete every entry. Here is what I have:
check_boxes = []
check_boxes.push(@browser.checkbox(:xpath, "//input[contains(@name,'delete')]"))
puts check_boxes.size
check_boxes.each do |check_box|
  check_box.set
end
The problem with this is that it only selects the first instance (node) that matches the xpath to dump into the array. I know I can iterate through the xpath, adding an index to the node, and then put in a rescue that drops me out when the index goes out of bounds, but that seems like the dirty way to do it.
I know there is an "as" method that gets a set of anchors, and I was wondering if there is a method like that for grabbing the whole set of checkboxes.
I don't think the problem is the xpath itself. It is the @browser.checkbox that is causing only the first checkbox to be returned.
If you want all matching checkboxes, you should use (notice the plural):
@browser.checkboxes
Note that checkboxes returns a collection of checkboxes. Unless you are doing something really fancy, you usually do not need to convert it to an array.
You can simply do:
@browser.checkboxes(:name => /delete/).each do |checkbox|
  checkbox.set
end

RhoMobile 13,000 inserts causing issues due to time

I have a problem (due to time) when inserting around 13,000 records into the device's database.
Is there any way to optimize this? Is it possible to put these all into one transaction? I believe it is currently creating one transaction per insert, which apparently has a diabolical effect on speed.
Currently this takes around 10 minutes, which includes converting the CSV to a hash (that doesn't seem to be the bottleneck).
Stupidly I am not using RhoSync...
Thanks
Set up a transaction around the inserts and then only commit at the end.
From their FAQ.
http://docs.rhomobile.com/faq#how-can-i-seed-a-large-amount-of-data-into-my-application-with-rhom
db = ::Rho::RHO.get_src_db('Model')
db.start_transaction
begin
  items.each do |item|
    # create hash of attribute/value pairs
    data = {
      :field1 => item['value1'],
      :field2 => item['value2']
    }
    # Creates a new Model object and saves it
    new_item = Model.create(data)
  end
  db.commit
rescue
  db.rollback
end
I've found this technique to be a tremendous speed up.
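If a single transaction around all 13,000 inserts turns out to be too much for the device, a hedged variant is to commit in chunks using the same API; the batch size of 500 below is arbitrary.
# Hedged sketch: same transaction technique as above, committed in chunks.
db = ::Rho::RHO.get_src_db('Model')
items.each_slice(500) do |batch|
  db.start_transaction
  begin
    batch.each do |item|
      Model.create(:field1 => item['value1'], :field2 => item['value2'])
    end
    db.commit
  rescue
    db.rollback
  end
end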
Use a fixed schema rather than a property bag, and you can use one transaction (see the link below for how).
http://docs.rhomobile.com/rhodes/rhom#perfomance-tips
This question was answered by someone else on Google Groups (HAYAKAWA Takashi).

Ruby - Feedzirra and updates

Trying to get my head around Feedzirra here.
I have it all set up and everything, and can even get results and updates, but something odd is going on.
I came up with the following code:
def initialize(feed_url)
  @feed_url = feed_url
  @rssObject = Feedzirra::Feed.fetch_and_parse(@feed_url)
end

def update_from_feed_continuously()
  @rssObject = Feedzirra::Feed.update(@rssObject)
  if @rssObject.updated?
    puts @rssObject.new_entries.count
  else
    puts "nil"
  end
end
Right, what I'm doing above is starting with the full feed and then only getting updates. I'm sure I must be doing something stupid, as even though I'm able to get the updates and store them in the same instance variable, after the first time I'm never able to get them again.
Obviously this happens because I'm overwriting my instance variable with only the updates, and I lose the full feed object.
I then thought about changing my code to this:
def update_from_feed_continuously()
  feed = Feedzirra::Feed.update(@rssObject)
  if feed.updated?
    puts feed.new_entries.count
  else
    puts "nil"
  end
end
Well, I'm not overwriting anything, so that should be the way to go, right?
WRONG. This means I'm doomed to always ask for updates to the same static feed object: although I get the updates in a variable, I never actually update my "static feed object", so newly added items keep getting appended to feed.new_entries because, as far as it is concerned, they are still new.
I'm sure I'm missing a step here, but I'd really appreciate it if someone could shed some light on it. I've been going through this code for hours and can't get to grips with it.
Obviously it should work fine, if I did something like:
if feed.updated?
  puts feed.new_entries.count
  @rssObject = initialize(@feed_url)
else
Because that would reinitialize my instance variable with a brand new feed object, and the updates would come again.
But that also means that any update added at that exact moment would be lost, and it's massive overkill anyway, as I'd have to load the whole thing again.
Thanks in advance!
How to do updates is a bit counterintuitive with the current API. This example shows the best way to do it:
# I'm using Atom here, but it could be anything. You don't need to know ahead of time.
# It will parse out to the correct format when it updates.
feed_to_update = Feedzirra::Parser::Atom.new
feed_to_update.feed_url = some_stored_feed_url
feed_to_update.etag = some_stored_feed_etag
feed_to_update.last_modified = some_stored_feed_last_modified
last_entry = Feedzirra::Parser::AtomEntry.new
last_entry.url = the_url_of_the_last_entry_for_a_feed
feed_to_update.entries = [last_entry]
updated_feed = Feedzirra::Feed.update(feed_to_update)
updated_feed.updated? # => nil if there is nothing new
updated_feed.new_entries # => [] if nothing new otherwise a collection of feedzirra entries
updated_feed.etag # => same as before if nothing new. although could change with comments added to entries.
updated_feed.last_modified # => same as before if nothing new. although could change with comments added to entries.
Basically, you'll have to save off four pieces of data (feed_url, last_modified, etag, and the url of the most recent entry). Then when you want to do updates you construct a new feed object and call update on that.
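For completeness, here is a minimal sketch of the other half of that flow: after calling update, stash the four pieces of state wherever you persist things (a database row, a YAML file, etc.) so the next poll can rebuild the stub feed. The stored_* names are placeholders, not Feedzirra API, and new_entries.first is assumed to be the newest entry.
# Minimal sketch; stored_* variables stand in for wherever you persist state.
if updated_feed.updated?
  stored_feed_etag          = updated_feed.etag
  stored_feed_last_modified = updated_feed.last_modified
  stored_last_entry_url     = updated_feed.new_entries.first.url # assumes newest-first ordering
end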
I think a more obvious solution would be to add an :if_modified_since option to the fetch_and_parse method of the Feed class; see https://github.com/pauldix/feedzirra/blob/master/lib/feedzirra/feed.rb#L116 and https://github.com/pauldix/feedzirra/blob/master/lib/feedzirra/feed.rb#L206
You can reset @rssObject to the updated feed.
feed = Feedzirra::Feed.update(@rssObject)
if feed.updated?
  puts feed.new_entries.count
  @rssObject = feed
else
  puts 'nil'
end
The number of entries in @rssObject will keep growing as new entries are found. So if the first fetch finds 10 entries, and the next finds 10 new entries, @rssObject.entries.size will be 20.
Note that you can do this regardless of whether update finds new entries. If feed.updated? is false, feed will be the original feed object, @rssObject.
