Getting an object from MongoDB with Mongoid - Ruby

Simple enough situation. I've got a MongoDB database with a bunch of information from a previous developer. However, I have limited information on the model that came beforehand and I don't have access to the original model class. I've been tinkering with the MongoDB driver to get some more information out of it (Mongoid will have to be used eventually to map the objects back out), as follows.
# The flow is: connection -> databases -> database -> collection -> document hashes

require 'mongo'

# Set up the connection. You can supply ("host", port) arguments,
# but most of the time it will pick up the defaults.
conn = Mongo::Connection.new

# Database info
mongodbinfo = conn.database_names
conn.database_info.each { |info| puts info.inspect }

db = conn.db("db_name_here")
db.collection_names.each { |collection| puts collection.inspect }

collection = db.collection("model_name_here")
puts collection.inspect

collection.find.each do |row|
  puts row.inspect
  puts row.class
end
Each row is a separate object and, as MongoDB works, each object/document comes back as a BSON document.
So the bottom-line question is: how do I deserialize the BSON back into a model using Mongoid?
P.S. Feel free to use the above code if you're trying to explore an unfamiliar MongoDB database; it's been handy for debugging, IMHO.

So this was a bust.
In the end I used the MongoDB driver to manually pull the data out with queries. However, creating the object was far more difficult.
It's better to have the actual model when using an ORM.
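
For what it's worth, a minimal sketch of how a Mongoid document class can be pointed at an existing collection without knowing the full schema. The class name and field below are assumptions, and Mongoid 4+ is assumed for Mongoid::Attributes::Dynamic:

class LegacyModel
  include Mongoid::Document
  include Mongoid::Attributes::Dynamic # keep fields that aren't declared here

  store_in collection: "model_name_here" # point at the existing collection

  # declare only the fields you have confirmed exist; everything else stays dynamic
  field :name, type: String
end

doc = LegacyModel.first       # Mongoid deserializes the BSON document for you
puts doc.attributes.inspect   # the full hash, including undeclared fields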

Related

Django: How do I cache object from get_object() in a class-based view?

I have been struggling for hours with this: I just can't figure out a proper way to cache an object from a queryset result (object = queryset.get()) in order to avoid hitting the database again on each view request.
This is my current (simplified) code. As you can see, I override get_object() to add some extra data (not only the today variable), check whether the object is in the session, and add the object to the session.
views.py

from datetime import datetime

from django.core.cache import cache
from django.core.cache.utils import make_template_fragment_key
from django.http import Http404
from django.views.generic import DetailView

from myapp import MyModel

class myClassView(DetailView):
    model = MyModel

    def get_object(self, queryset=None):
        if queryset is None:
            queryset = self.get_queryset()
        pk = self.kwargs.get(self.pk_url_kwarg, None)
        if pk is not None:
            queryset = queryset.filter(pk=pk)
        else:
            raise AttributeError("My error message.")
        try:
            today = datetime.today().strftime('%Y%m%d')
            cache_key = make_template_fragment_key('some_name', [pk, today])
            if cache.has_key(cache_key):
                object = self.request.session[cache_key]
                return object
            else:
                object = queryset.get()
                object.id = my_id      # my_id is part of the extra data set elsewhere
                object.today = today
                # Add object to session
                self.request.session[cache_key] = object
        except queryset.model.DoesNotExist:
            raise Http404("Error 404")
        return object
The above only works if I add the following:
settings.py
SESSION_SERIALIZER = 'django.contrib.sessions.serializers.PickleSerializer'
But I don't like this hack, since it is not secure for Django 1.6 and newer versions because, according to How To Use Sessions (Django 1.7 documentation):
If the SECRET_KEY is not kept secret and you are using the PickleSerializer, this can lead to arbitrary remote code execution
If I don't add the SESSION_SERIALIZER line I get a "django object is not JSON serializable" error. Then, however, my code breaks elsewhere and I get KeyError errors when trying to pull data from the session. That issue is solved by converting my string keys into integers; before changing the settings file, Django converted the string keys into integers automatically when session data was requested.
So, considering this session-serializer security issue, I'd prefer another option. I read here and here about caching get_object(), but I just can't see how to fit that into my get_object() bit. I tried...
if cache.has_key(cache_key):
    self._object = super(myClassView, self).get_object(queryset=None)
    return self._object
...but it fails. This seems like the best solution so far, but how do I fit it into my code? Or is there a better idea? I'm all ears. Thanks!
You should step back and reassess the situation. What are you trying to achieve?
get_object is the method the detail view calls to fetch one specific object from the database.
The first time you access this method, the queryset is evaluated and the object is fetched from the database.
In order to cache that object you need a good cache backend like Redis or Memcached in place, so that you can do a simple write-through cache operation:
if cache.has_key(cache_key):
    object = cache.get(cache_key)
    return object
else:
    object = queryset.get(pk=pk)
    cache.set(cache_key, object)
    return object
Note that Django objects are serialized when they are stored in the cache backend and deserialized back into objects when retrieved.
That approach is just a starting point: you cache the object the first time the lookup misses.
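
A minimal sketch of how that could sit inside get_object, dropping the session storage entirely; the key scheme and timeout below are assumptions:

from django.core.cache import cache
from django.views.generic import DetailView

from myapp import MyModel

class myClassView(DetailView):
    model = MyModel

    def get_object(self, queryset=None):
        pk = self.kwargs.get(self.pk_url_kwarg)
        cache_key = 'mymodel:%s' % pk           # assumed key scheme
        obj = cache.get(cache_key)
        if obj is None:
            # fall back to the normal DetailView lookup (raises Http404 if missing)
            obj = super(myClassView, self).get_object(queryset)
            cache.set(cache_key, obj, 60 * 60)  # assumed timeout: one hour
        return obj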
You can also hook post_save and post_delete signals to refresh the cached copy every time the model is saved or deleted:

from django.core.cache import cache
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

@receiver(post_save, sender=MyModel)
@receiver(post_delete, sender=MyModel)
def add_MyModel_to_cache(sender, **kwargs):
    object = kwargs['instance']
    cache.set(cache_key, object)
You have to carefully review what you want to cache and when, as it is very easy to misjudge how requests will actually hit the cache.
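
Since the handler only receives the instance, the cache key has to be derivable from the instance itself. A minimal sketch, reusing the imports above, assuming the key is built from the primary key, and evicting on delete rather than re-setting:

@receiver(post_save, sender=MyModel)
def refresh_my_model_cache(sender, instance, **kwargs):
    cache.set('mymodel:%s' % instance.pk, instance)

@receiver(post_delete, sender=MyModel)
def evict_my_model_cache(sender, instance, **kwargs):
    cache.delete('mymodel:%s' % instance.pk)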

Return database results in the same JSON (parent and children)

I have Team and Players classes and want to return the data in one JSON string which contains the Team info and, at the same time, all the information about its players.
class Team < ActiveRecord::Base
  has_many :players
end

class Players < ActiveRecord::Base
  belongs_to :team
end
I know how to retrieve the information about the team and the players, but not in the same query. Another problem is that I don't know how to merge the resulting JSONs into one JSON.
team = Team.last
team_json = team.to_json
players_json = team.players.to_json
How can I query the info about the Team and its Players in a single query? I tried:

@team = Team.includes(:players).where(players: { team_id: Team.last }).last.to_json
and it only returns information about the team. I want JSON like:

- id
- name
- players
  - player
  - player
In case that's impossible, how can I merge all the information from the two queries into one JSON?
You can write a "join" to pull the players in alongside the team information; at that point you'll have a structure with everything needed to create the JSON. See "12 Joining Tables" in the Active Record documentation for more information.
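
With Active Record you can also let the serializer include the association for you; a minimal sketch, assuming the Team/Players models above (the name column on players is an assumption):

team = Team.includes(:players).last
puts team.to_json(include: :players)
# => {"id":1,"name":"bears","players":[{"id":1,"name":"fred","team_id":1}, ...]}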
Or, you can make two separate queries, then build a slightly more complex JSON hash or array, allowing you to output both sets of data as one larger serialized object. For instance:
require 'json'

team = {
  'name' => 'bears'
}

players = {
  '1' => 'fred',
  '2' => 'joe'
}

puts(
  {
    'team' => team,
    'players' => players
  }.to_json
)
Here's the output:
{"team":{"name":"bears"},"players":{"1":"fred","2":"joe"}}
Here's the data parsed back into Ruby objects:
data = '{"team":{"name":"bears"},"players":{"1":"fred","2":"joe"}}'
JSON[data]
# => {"team"=>{"name"=>"bears"}, "players"=>{"1"=>"fred", "2"=>"joe"}}
Also, since you're using Sinatra, it's not necessary to use Active Record. Sequel is a very good ORM, and is my personal favorite when working with Sinatra. You might find it easier to work with.
An alternative to manual serialization is ActiveModel::Serializer, which lets you define relationships between objects and gives you finer-grained control over what to include when you serialize, what to filter out, and which related objects to preload. Another option is Rabl, which also has quite a nice API.
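
A minimal sketch of what that could look like with ActiveModel::Serializer; the serializer and attribute names are assumptions based on the models above:

class PlayerSerializer < ActiveModel::Serializer
  attributes :id, :name
end

class TeamSerializer < ActiveModel::Serializer
  attributes :id, :name
  has_many :players
end

# then, in a controller:
# render json: Team.last, serializer: TeamSerializer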
If you're just playing around with a small amount of JSON this might be overkill, but it's good practice to be more organized.

Ruby: serialise a model to represent it in a different form

I have a set of legacy database tables that I cannot normalize out to what should have been done in the first place, e.g. one big table with 200 columns.
I'm building an API and would like to present this data to the consumer in a better shape, and perhaps address the database issues at a later stage; there are many backend systems that rely on the data, so changes are not easy.
I want to represent the current database schema using Active Record, but perform a transformation into a new model that is used purely for presentation to an API consumer as JSON data.
Current database schema:

Products table (200 columns)

New model:

Product
  + Pricing
  + Assets
  + Locations
  + Supplier
I could hard-code a JSON string in a template, but feel that would be a very poor approach.
What approach or gem would you recommend to tackle this best?
I have looked at :
RABL
ActiveModel::Serializers
If you define an as_json method that returns a hash, ActiveRecord will take care of the serialization for you. E.g.
class Product < ActiveRecord::Base
  def as_json(options = {})
    {
      product: <product value>,
      pricing: <pricing value>,
      # ... etc.
    }
  end
end
Now you can do:
> Product.first.to_json
=> "{\"product\":<product_value> ... }"
You can even render these as JSON from a controller via:

render json: @model
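
Fleshing that out for the layout in the question might look roughly like this; the column names (price_cents, currency, supplier_name) are placeholders, not real columns from the 200-column table:

class Product < ActiveRecord::Base
  def as_json(options = {})
    {
      id: id,
      pricing: {
        price_cents: price_cents,   # assumed column
        currency: currency          # assumed column
      },
      supplier: {
        name: supplier_name         # assumed column
      }
      # ... assets, locations, etc. grouped the same way
    }
  end
end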

Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with Twitter data. I'm using Ruby 1.9.3 and the mongo gem.
I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields), and then writing new documents back into Mongo.
The code below works but runs relatively slowly, as I loop through with .each and insert the new documents into Mongo one at a time.
My Question: How can this be structured to process and insert in bulk?
cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []}, {:fields => params})

cursor.each do |r|
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")

    # Commit to Mongo
    @document = {:id             => r['id'],
                 :id_str         => r['id_str'],
                 :retweet_count  => r['retweet_count'],
                 :favorite_count => r['favorite_count'],
                 :score          => score,
                 :created_at     => r['created_at'],
                 :timestamp      => timestamp,
                 :user           => [{:id          => r['user']['id'],
                                      :id_str      => r['user']['id_str'],
                                      :screen_name => r['user']['screen_name']}]}
    @collection.save(@document)
  end # end if
end # end each
Any help is greatly appreciated.
In your case there is no way to make this much faster on the client. One thing you could do is retrieve the documents in bulk, process them, and then reinsert them in bulk, but it would still be slow.
To really speed this up you need to do all the processing server side, where the data already lives.
You should either use MongoDB's aggregation framework, if the result document does not exceed 16 MB, or, for more flexibility but slower execution (still much faster than your current approach), MongoDB's MapReduce framework.
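
If you do stay client side, you can at least batch the writes instead of calling save per document; a minimal sketch (the legacy 1.x driver's Collection#insert accepts an array of hashes, and the batch size of 500 is an arbitrary choice):

batch = []
cursor.each do |r|
  next unless r['lang'] == "en"
  batch << { :id => r['id'], :score => r['retweet_count'] + r['favorite_count'] } # plus the other fields
  if batch.size >= 500
    @collection.insert(batch)   # one round trip for the whole batch
    batch = []
  end
end
@collection.insert(batch) unless batch.empty?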
What exactly are you doing? Why not go pure Ruby or pure mongo (well, that's Ruby too)? And why do you really need to load every single attribute?
What I've understood from your code is that you create a completely new document, and I think that's wrong.
You can do it like this on the Ruby side:
cursor = YourModel.find(params)

cursor.each do |r|
  if r.lang == "en"
    r.score = r.retweet_count + r.favorite_count
    r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
    r.save
  end # end if
end # end each
And of course you can include Mongoid::Timestamps in your model; it handles the created_at and updated_at attributes for you (it creates them itself).
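
A minimal sketch of what such a model could look like; the field list is taken from the question's documents, the class itself is an assumption:

class YourModel
  include Mongoid::Document
  include Mongoid::Timestamps   # adds created_at / updated_at automatically

  field :lang,           type: String
  field :retweet_count,  type: Integer
  field :favorite_count, type: Integer
  field :score,          type: Integer
  field :timestamp,      type: String
end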
Doing the same thing directly in the mongo shell is a little more work.
First select your database with use my_db, then the following code will do what you want:
db.models.find({something: your_param}).forEach(function(doc) {
  doc.score = doc.retweet_count + doc.favorite_count;
  doc.timestamp = new Timestamp();
  db.models.save(doc);
});
I don't know what your parameters are, but they're easy to create. Also, Mongoid really does lazy loading, so if you don't touch an attribute it won't be loaded; you can actually save a lot of time by not using every attribute.
And these methods change the existing documents rather than creating new ones.

Ruby data job with Mongoid

I'm trying to use Ruby and Mongoid to extract some data from an Oracle database into my MongoDB database, in order to perform a couple of operations on it.
The question is:
I created my Record class, which includes Mongoid::Document, set up all my fields, have already assigned the data coming out of the Oracle database, and have all my objects stored in an array.
Now my question is: how do I save them?
Here's my piece of code
records = []

query = db.report # Sequel object
query.each do |row|
  r = Record.new # Mongoid class
  r.directory_name = row[:directory_name]
  r.directory_code = row[:directory_id]
  r.directory_edition = row[:edition]
  r.last_updated = row[:updated]
  r.canvass = row[:canvass_id]
  r.specialty_item = row[:item]
  r.production_number = row[:prodnr]
  r.status = row[:exposure_status]
  r.scanned_date = row[:scandate]
  r.customer_id = row[:customer_id]
  r.sales_rep = row[:sales_rep_name]
  r.phone = row[:phone]
  r.customer_name = row[:customer_name]
  records << r
end
You would need to do Record.collection.insert(records). Note that this will skip any validations you have written in your Mongoid model, but it will be a little faster than creating Mongoid records and saving them, as it uses the Ruby mongo driver directly. You should only do this if you know the data is consistent.
If you want to run all the validations on your data before saving it to MongoDB, you should save each record through the model instead of collecting them in an array.
So you can persist the extracted data in MongoDB in three ways, according to your preference:

Insert all records at once using the mongo driver, but beware that the array you are building can get huge:

query.each do |row|
  .....
end
Record.collection.insert(records)

Insert one record at a time using the mongo driver (replace records << r with the line below):

query.each do |row|
  .....
  Record.collection.insert(r)
end

Insert one record at a time using Mongoid, with all the validations and callbacks (replace records << r with the line below):

query.each do |row|
  .....
  r.save
end
Update: I missed that you are already creating the records, hence the mongo driver suggestions. If you want to use the mongo driver directly, you should build a hash instead of a Mongoid model, i.e. instead of:

r = Record.new
r.status = row[:status]
# copy more data

you should do:

r = {}
r[:status] = row[:status]
# copy more data
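
If the dataset is big enough that the single records array worries you, a middle ground is to flush it in slices; a minimal sketch (the slice size of 1000 is an arbitrary choice):

records.each_slice(1000) do |batch|
  Record.collection.insert(batch)   # one driver round trip per 1000 documents
end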
