Ruby data job with Mongoid

I'm trying to use Ruby and Mongoid to extract some data from an Oracle database into my MongoDB so I can perform a couple of operations on it.
I created my 'Record' class, which includes Mongoid::Document, set up all my fields, assigned the data coming out of the Oracle database, and now have all my objects stored in an array.
My question is: how do I save them?
Here's my code:
records = []
query = db.report # Sequel dataset

query.each do |row|
  r = Record.new # Mongoid model
  r.directory_name = row[:directory_name]
  r.directory_code = row[:directory_id]
  r.directory_edition = row[:edition]
  r.last_updated = row[:updated]
  r.canvass = row[:canvass_id]
  r.specialty_item = row[:item]
  r.production_number = row[:prodnr]
  r.status = row[:exposure_status]
  r.scanned_date = row[:scandate]
  r.customer_id = row[:customer_id]
  r.sales_rep = row[:sales_rep_name]
  r.phone = row[:phone]
  r.customer_name = row[:customer_name]
  records << r
end

You would need to do Record.collection.insert(records). Note that this skips any validations you have written in your Mongoid model, but it is a little faster than creating Mongoid records and saving them one by one, since it uses the Ruby mongo driver directly. You should only do this if you know the data is consistent.
If you want to run all the validations on your data before saving it in MongoDB, you should create a model instance per row instead of putting raw rows in an array.
So you can persist the extracted data in MongoDB in three ways, according to your preference.
Insert all records at once using the mongo driver, but beware that the array you are creating can be huge:
query.each do |row|
  # ... build each record as above ...
end
Record.collection.insert(records)
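If the source table is large, a middle ground is to flush fixed-size batches so the in-memory array stays small. A minimal sketch under the same setup as above; the batch size of 1000 is arbitrary, and as_document is assumed here as the Mongoid call that turns a model into the plain hash the driver expects:
batch = []
query.each do |row|
  # ... build r as above ...
  batch << r.as_document # plain hash for the driver
  if batch.size >= 1000
    Record.collection.insert(batch)
    batch.clear
  end
end
Record.collection.insert(batch) unless batch.empty? # flush the remainder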
Insert one record at a time using the mongo driver (replace records << r with the line shown):
query.each do |row|
  # ... build r as above ...
  Record.collection.insert(r)
end
Insert one record at a time using Mongoid, with all the validations and callbacks (replace records << r with the line shown):
query.each do |row|
  # ... build r as above ...
  r.save
end
Update: I missed that you are already creating the records, hence the mongo driver suggestions. If you want to use the mongo driver directly, you should use a hash instead of a Mongoid model, i.e., instead of
r = Record.new
r.status = row[:status]
# copy more data
you should do
r = {}
r[:status] = row[:status]
# copy more data
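Putting that together with the bulk insert, the whole job would look something like the sketch below (same Sequel dataset and column names as in the question; only a few of the fields are shown):
records = []
query.each do |row|
  records << {
    :directory_name => row[:directory_name],
    :directory_code => row[:directory_id],
    :status => row[:exposure_status]
    # ... remaining fields ...
  }
end
Record.collection.insert(records)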

Put the sample results in an array, a hash, etc. using Ruby and oci8

I use Ruby 2.3 and the ruby-oci8 gem. I want to run this select query:
stm = "select * from DATASERVICEUSERS t where boss<>100 and loginad is not null"
res = CONN.exec(stm).fetch_hash do |row|
  # do something with row
end
CONN.logoff
How can I get the whole result set into an array or hash at once, instead of cycling through each record? I just need a collection of the elements returned by this query.
oci8 doesn't provide that. The .exec method produces a cursor that you need to process like your code demonstrates. You can fill up an array with an array of fields, or a hash.
Here is an example for an array:
records = []
conn.exec(sql) { |record| records << record }
# records: [["xxxx", "xxxx"], ["yyyy", "yyyy"], ...]
I know this is quite an old question, but I've come across this problem. I'm not as well versed in Ruby, but oci8 2.2.7 actually provides fetch_hash:
https://www.rubydoc.info/gems/ruby-oci8/OCI8/Cursor#fetch_hash-instance_method
Here's an example from my use case:
records = []
dataCursor = @odb.exec(queryUUNRData)
while (data = dataCursor.fetch_hash) != nil
  records.push(data)
end
dataCursor.close
The resulting dataset already includes the column names as hash keys.
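As the question's own snippet hints, fetch_hash also accepts a block, which walks every remaining row for you, so the while loop above can be condensed (a sketch under the same assumptions, with the same hypothetical @odb handle):
records = []
dataCursor = @odb.exec(queryUUNRData)
dataCursor.fetch_hash { |row| records << row }
dataCursor.close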

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with Twitter data. I'm using Ruby 1.9.3 and the mongo gem.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing new documents into Mongo.
The code below works, but it runs relatively slowly because I loop through with .each and insert the new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})
cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    # Commit to Mongo
    @document = {:id => r['id'],
                 :id_str => r['id_str'],
                 :retweet_count => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score => score,
                 :created_at => r['created_at'],
                 :timestamp => timestamp,
                 :user => [{:id => r['user']['id'],
                            :id_str => r['user']['id_str'],
                            :screen_name => r['user']['screen_name']}]}
    @collection.save(@document)
  end # end if
end # end each
Any help is greatly appreciated.
In your case there is no way to make this much faster on the client. One thing you could do is retrieve the documents in batches, process them, and then reinsert them in batches, but it would still be slow.
To speed this up you need to do all the processing server side, where the data already lives.
You should use MongoDB's aggregation framework if the result document does not exceed 16 MB, or, for more flexibility but slower execution (still much faster than your current approach), MongoDB's MapReduce framework.
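For illustration, the scoring step could be pushed server side with an aggregation pipeline along these lines. This is only a sketch: it assumes the old mongo gem's Collection#aggregate, mirrors the field names from the question, and writes into a hypothetical scored_tweets collection via $out (MongoDB 2.6+):
pipeline = [
  {'$match'   => {'lang' => 'en', 'entities.urls' => []}},
  {'$project' => {
    'id_str'           => 1,
    'retweet_count'    => 1,
    'favorite_count'   => 1,
    'created_at'       => 1,
    'user.id_str'      => 1,
    'user.screen_name' => 1,
    # compute the score on the server instead of in Ruby
    'score' => {'$add' => ['$retweet_count', '$favorite_count']}}},
  {'$out' => 'scored_tweets'} # results land in a new collection
]
raw.aggregate(pipeline)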
What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
What I've understood from your code is that you actually create a completely new document, and I think that's wrong.
On the Ruby side you can do it like this:
cursor = YourModel.find(params)
cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end # end if
end # end each
And of course you can include Mongoid::Timestamps in your model so it handles the created_at and updated_at attributes for you (it creates them itself).
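A minimal sketch of such a model, with hypothetical field names taken from the question:
class YourModel
  include Mongoid::Document
  include Mongoid::Timestamps # adds created_at and updated_at

  field :lang, type: String
  field :retweet_count, type: Integer
  field :favorite_count, type: Integer
  field :score, type: Integer
end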
In the mongo shell it's a little harder. First you select your database with use my_db, then the following code will generate what you want:
db.models.find({something: your_param}).forEach(function(doc) {
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters are, but they are easy to create. Also, Mongoid does lazy loading, so if you don't ask for an attribute, it won't load it; you can actually save a lot of time by not using every attribute.
And these methods change the existing documents rather than creating new ones.
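The attribute-limiting behaviour mentioned above is exposed through Mongoid's only criterion; a sketch with the question's field names:
YourModel.where(lang: "en")
         .only(:retweet_count, :favorite_count)
         .each do |r|
  # only the selected fields are fetched from the server
  puts r.retweet_count + r.favorite_count
end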

Getting an object from mongoDB with Mongoid

Simple enough situation. I've got a MongoDB database with a bunch of information from a previous developer. However, I have limited information on the model that came before, and I DON'T have access to the original model class. I've been tinkering with the MongoDB driver to get some more information out of it (Mongoid will have to be used eventually to map the objects back out), as follows:
# The flow is as follows:
#   Connection -> Databases -> Database -> Collection -> Hash info

# Set up the connection. You can supply attributes in the form of
# ("db", portno), but most of the time it will pick up the defaults.
conn = Mongo::Connection.new

# Database info
mongodbinfo = conn.database_names
conn.database_info.each { |info| puts info.inspect }

db = conn.db("db_name_here")
db.collection_names.each { |collection| puts collection.inspect }

collection = db.collection("model_name_here")
puts collection.inspect
collection.find.each do |row|
  puts row.inspect
  puts row.class
end
Each row is a separate object, and as MongoDB works, each object/document is a BSON object.
So the bottom-line question is: how do I de-serialize the BSON into a model using Mongoid?
P.S. Feel free to use the above code if you're trying to figure out an unfamiliar MongoDB; it's been handy for debugging, IMHO.
So this was a bust.
In the end I used the MongoDB driver to manually pull the data out with queries. However, creating the objects was far more difficult.
It's better to have the actual model when using an ORM.
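For anyone in the same spot: once the driver inspection above has revealed the collection name and some field names, one option is to declare a skeleton Mongoid model over that collection and let Mongoid map the documents back onto it. A sketch with hypothetical names (store_in here uses Mongoid 3+ syntax; older versions take a plain string):
class LegacyRecord
  include Mongoid::Document
  store_in collection: "model_name_here" # hypothetical collection name

  # declare only the fields you discovered via row.inspect
  field :name, type: String
  field :created_at, type: Time
end

LegacyRecord.all.each { |doc| puts doc.attributes.inspect }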

Replacing a value in a JSON array with Sinatra

I have records with a 'resources' field which can contain multiple resources. When I return this data, I need to iterate over this field and return an individual record for each value in the field. I am currently using Sinatra and am able to iterate over the fields okay, but I am having difficulty replacing the field in the JSON array.
For example:
event: Name
resources: resourceA, resourceB, resourceC
This record needs to be returned as 3 unique records/events with only one resource per record.
With the code listed below, I am getting three records, but all three come back with the same resource value (resourceC).
Here is my code
docs = @db.view('lab/events', :startkey => params[:startDate], :endkey => endSearch)['rows']
rows = Array.new
docs.each do |doc|
  resources = doc['value']['resources'].split(",")
  resources.each do |r|
    doc['value']['resources'] = r
    rows.push(doc['value'])
  end
end
Any help is greatly appreciated.
Thanks
Chris
If you use the Ruby gem "json", you can convert the JSON string to a hash:
require 'json'
converted_hash = JSON(json_string).to_hash
This should be much easier to manage.
You can then turn the hash to a JSON string:
new_json_string = converted_hash.to_json
Basically what is happening is that Ruby sees all three records as the same record, so as the hash value is updated on one record, it impacts all the other records created from the same doc. To get around this, I actually needed to create a duplicate record each time through and modify its value.
docs = @db.view('lab/events', :startkey => params[:startDate], :endkey => endSearch)['rows']
rows = Array.new
docs.each do |doc|
  resources = doc['value']['resources'].split(",")
  resources.each do |r|
    newDoc = doc['value'].dup # <= create a duplicate record and update its value
    newDoc["resources"] = r
    rows.push(newDoc)
  end
end
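The underlying issue is plain Ruby reference semantics, which a tiny standalone example makes obvious. Note that dup is a shallow copy; that is enough here because only the top-level 'resources' key is replaced:
a = { 'resources' => 'resourceA' }
b = a               # b and a point at the same hash
b['resources'] = 'resourceC'
puts a['resources'] # => resourceC

c = a.dup           # c is an independent top-level copy
c['resources'] = 'resourceB'
puts a['resources'] # => resourceC (unchanged)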

Ruby: DataMapper and has n, :xyz, :through => Resource

I've encountered following issue:
there are 2 models: X and Y, they're associated with each other like this: has n, :<name>, :through => Resouce; when i'm doing something like x.ys = array_with_500_ys it takes really long time because DataMapper inserts only one association per query (insert into xs_ys(x_id, y_id) values(xid, yid)). This takes really long.
The question is: how to make this faster?
Thanks.
Because DataMapper abstracts the back end, the standard behaviour is to insert one record at a time as SQL (or whatever you are using).
Assuming you are using an SQL backend, such as Postgres, you could drop back to raw SQL and do the following:
x = X.first
query = "INSERT INTO xs_ys (x_id, y_id) VALUES "
vals = []
array_with_500_ys.each do |y|
  vals << "(#{x.id}, #{y.id})"
end
repository.adapter.execute(query + vals.join(','))
This creates one 'insert', passing all records to be inserted. Not sure if this would be any faster, but you could put it into a background job if you need the app not to time out for the user.
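If the association arrays grow much beyond 500, a variation on the same idea is to send the VALUES in fixed-size slices so no single statement grows unbounded. A sketch under the same assumptions (the slice size is arbitrary, and the interpolated ids are taken to be integer serial keys, so no quoting is attempted):
x = X.first
array_with_500_ys.each_slice(500) do |batch|
  values = batch.map { |y| "(#{x.id}, #{y.id})" }.join(',')
  repository.adapter.execute("INSERT INTO xs_ys (x_id, y_id) VALUES #{values}")
end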
