Caching dataset results in Sequel and Sinatra - ruby

I'm building an API with Sinatra, using Sequel as the ORM on a PostgreSQL database.
I have some complex datasets to query in a paging style, so I'd like to keep the dataset in a cache to serve the following page requests after the first call.
I've read that Sequel datasets are cached by default, but I need to keep the object between two requests to benefit from this behavior.
So I was wondering where to put this object so it can be retrieved later if the same query is called again, rather than building a whole new dataset each time.
I tried Sinatra's session hash, but I got a TypeError: can't dump anonymous class #<Class:0x000000028c13b8> when putting the dataset object in it.
I'm wondering whether to use memcached for that.
Any advice on the best way to do this would be very much appreciated, thanks.

Memcached or Redis (using LRU) would likely be appropriate solutions for what you are describing. The Ruby Dalli gem makes it pretty easy to get started with memcached. You can find it at https://github.com/mperham/dalli.
On the GitHub page you will see the following basic example:
require 'dalli'
options = { :namespace => "app_v1", :compress => true }
dc = Dalli::Client.new('localhost:11211', options)
dc.set('abc', 123)
value = dc.get('abc')
This illustrates the basics of using the gem. Keep in mind that memcached is simply a key/value store with LRU (least recently used) eviction. This means you allocate memory to memcached and let your keys expire organically unless there is a reason to expire a key manually.
From there, it's simply a matter of attempting to fetch a key from memcached and only running your real query if no match is found.
found = dc.get('my_unique_key')
unless found
  # Do your Sequel query here
  dc.set('my_unique_key', 'value_goes_here')
end
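To make that concrete, here is a minimal sketch of the pattern with Sequel and Dalli, assuming a local PostgreSQL connection string and a helper name (cached_page) of my own choosing. The dataset object itself is built from anonymous classes and can't be marshalled (that's the TypeError you saw), so the idea is to cache the materialized rows for each page rather than the dataset:

require 'digest'
require 'dalli'
require 'sequel'

DB    = Sequel.connect('postgres://localhost/mydb')   # assumed connection string
CACHE = Dalli::Client.new('localhost:11211', :namespace => "app_v1", :compress => true)

# Hypothetical helper: cache one page of results, keyed on the query's SQL,
# the page number and the page size. Plain datasets return Arrays of Hashes,
# which marshal cleanly, unlike the dataset object itself.
def cached_page(dataset, page, per_page = 25, ttl = 300)
  key  = "page:#{Digest::MD5.hexdigest(dataset.sql)}:#{page}:#{per_page}"
  rows = CACHE.get(key)
  unless rows
    rows = dataset.limit(per_page, (page - 1) * per_page).all
    CACHE.set(key, rows, ttl)
  end
  rows
end

# Usage: the second request for the same page is served from memcached.
users_page_2 = cached_page(DB[:users].order(:id), 2)

Because the key is derived from the dataset's SQL, the same query for the same page hits memcached instead of PostgreSQL, and the TTL keeps stale pages from living forever.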

Related

If I eager load associated child records, does that mean future WHERE retrievals won't dig through the database again?

Just trying to understand... if at the start of some method I eager load a record and its associated children like this:
@object = Object.includes(:children).where(email: "test@example.com").first
Then does that mean that if I later have to look through that object's children, this will not generate more database queries?
I.e.,
@found_child = @object.children.where(type_of_child: "this type").first
Unfortunately not - using ActiveRecord::Relation methods such as where will query the database again.
You could however filter the data without any further queries, using the standard Array / Enumerable methods:
@object.children.detect { |child| child.type_of_child == "this type" }
It will generate another database query in your case.
Eager loading is used to avoid N+1 queries: all the associated objects are loaded up front. But this doesn't help when you filter that list with where later on; Rails will then build a new query and run it.
That said: in your example the includes actually makes your code slower, because it loads the associated objects but cannot use them.
I would change your example to:
@object = Object.find_by(email: "test@example.com")
@found_child = @object.children.find_by(type_of_child: "this type")
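To make the difference concrete, here is a small hypothetical contrast using the names from the question, assuming children has been eager loaded:

object = Object.includes(:children).where(email: "test@example.com").first

# Filters the records already loaded into memory by includes - no extra SQL
object.children.detect { |child| child.type_of_child == "this type" }

# Builds and runs a brand new SQL query, ignoring the preloaded records
object.children.where(type_of_child: "this type").first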

Ruby, MongoDB and large documents

I have a populated MongoDB database.
Now I need to add huge amounts of additional data to my documents (log file data). This data exceeds the BSON size limit.
Document too large: This BSON document is limited to 16777216 bytes. (BSON::InvalidDocument)
A simplified example of my situation would look like this:
cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
data = {:name => "Customer1", :data1 => "some value", :log_file => "A" * 17_000_000}
coll.save data
What is the best way to add this huge amount of data?
Could I use GridFS to store those files and link the GridFS file handle to the correct document?
Could I then access the GridFS file during queries?
I would suggest two approaches:
GridFS, with instructions here: https://github.com/mongodb/mongo-ruby-driver/wiki/GridFS
Advantage: uses an already existing service (MongoDB) to store the files, so it is presumably the easiest to implement and the cheapest, since you already have the infrastructure.
Disadvantage: not necessarily the best use of a database that wants to keep its working set in memory, especially if it's used for other storage as well.
S3 - store links to a hosted data service (such as Amazon S3) which is designed for file storage (redundant, replicated and highly available). In this case you just upload the files and store a pointer to their S3 location in your DB (see the sketch below).
Advantage: keeps your DB leaner and is probably cheaper, since you keep your mongo machines optimised for doing mongo things (i.e. high memory) and take advantage of the really cheap file storage on S3, as well as its near-infinite scalability.
Disadvantage: harder to implement, since you need to write your own code to do this, though there may be off-the-shelf solutions somewhere.
There is some more useful discussion in this SO post.
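For illustration, a rough sketch of the S3 approach, assuming the aws-sdk-s3 gem with credentials configured in the environment, and reusing the coll collection handle from the question (the bucket and key names here are made up):

require 'aws-sdk-s3'

s3  = Aws::S3::Resource.new(:region => 'us-east-1')
obj = s3.bucket('my-log-bucket').object('logs/customer1.log')   # hypothetical bucket/key
obj.put(:body => File.read('customer1.log'))

# Store only a pointer to the uploaded file in the mongo document
coll.save({:name => "Customer1", :data1 => "some value", :log_file_url => obj.public_url})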
Maybe you can split your documents up and reference the parts. See this SO post: syntax for linking documents in mongodb
The paragraph about document growth finally solved my question. (Found by following Konrad's link.)
http://docs.mongodb.org/manual/core/data-model-operations/#data-model-document-growth
What I am now basically doing is this:
cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
grid = Grid.new db
# store data
id = grid.put "A" * 17_000_000
data = {:name => "Customer1", :data1 => "some value", :log_file => id}
coll.save data

# access data
cust = coll.find({:name => "Customer1"})
id = cust.first["log_file"]
data = grid.get(id)   # a GridIO object; call data.read to get the file contents

Mongoid has_many relationship causes Rack cookie error in Sinatra

I'm writing an application using Mongoid 3.1 and Sinatra on Ruby 1.9.3. I have a model called Order that has_many Items. Whenever I try to append an Item to an Order's items, I run into problems. I have the following route, slightly summarized:
order = session[:user].get_order(Time.now)
order.items << Item.new
order.save
"Hi, mom!" # Garbage page so that I know nothing else is called.
Doing that once is okay; doing it twice causes the following error:
Warning! Rack::Session::Cookie data size exceeds 4K.
Warning! Rack::Session::Cookie failed to save session. Content dropped.
I've been banging my head against the wall trying to get it to stop doing this. Why is the session loading all my items? Am I not using the has_many relationship correctly?
Your User model probably has_many :orders. Ruby is probably calling Marshal.dump to dump your user object into the cookie. You can imagine this might get huge. You should do the following:
Only store the user_id in the session.
Store your session server-side instead of in the cookie.
You'll need to use different middleware to store your session server-side. See this page for an example of storing your session in memcache. Since you're already using mongo, you could use Rack::Session::Mongo.
Even though you're not using Rails, the Rails guide on session security is useful reading. [link]
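Here is a minimal sketch of the first option in Sinatra, reusing the User, Order and Item models from the question (the login route and the email field are hypothetical):

enable :sessions

helpers do
  # Reload the user from Mongoid on each request instead of marshalling it into the cookie
  def current_user
    @current_user ||= User.find(session[:user_id]) if session[:user_id]
  end
end

post '/login' do
  user = User.where(:email => params[:email]).first
  session[:user_id] = user.id.to_s   # store only the id, which stays well under 4K
  redirect '/'
end

get '/order' do
  order = current_user.get_order(Time.now)
  order.items << Item.new
  order.save
  "Hi, mom!"
end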

How to cache Zend Lucene search results in Code Igniter?

I'm not sure if this is the best way to go about this, but my aim is to have pagination of my lucene search results.
I thought it would make sense to run the search, store all the results in the cache, and then have a page function on my results controller that could return any particular subset of results from the cached results.
Is this a bad approach? I've never used caching of any sort, so I don't know where to begin. The CI Caching Driver looked promising, but everything I tried throws a server error. I don't know if I need to install APC, or Memcached, or what to do.
Help!
Lucene is a search engine built for scale. You can push it pretty far before the need to cache search results arises. I would suggest you use the default settings and run with it.
If you still feel the need for a cache, first look at the Lucene FAQ; the next level would perhaps be something along the lines of memcached.
Hope it helps!
Zend Search Lucene indexes on the file system and, as the answer above states, is built for scale. Unless you are indexing hundreds of thousands of documents, caching is not really necessary - especially since all you would effectively be doing is taking data from one file and storing it in another.
On the other hand, if you are only storing, say, a product id in your search index and then selecting the products from the database when you get a result, it's well worth caching. This can easily be achieved by using Zend_Cache.
A basic example of Zend Db caching is here:
$frontendOptions = array(
    'automatic_serialization' => true
);
$backendOptions = array(
    'cache_dir' => YOUR_CACHE_PATH_ON_THE_FILE_SYSTEM,
    'file_name_prefix' => 'my_cache_prefix',
);
$cache = Zend_Cache::factory(
    'Core',
    'File',
    $frontendOptions,
    $backendOptions
);
Zend_Db_Table_Abstract::setDefaultMetadataCache($cache);
This should be added to your bootstrap file in an _initDbCache (call it whatever you want) method.
Of course that is a very simple implementation and does not achieve full result caching, more information on Zend Caching with Zend Db can be found here.

Pagination with MongoDB

I have been using MongoDB and RoR to store logging data. I am pulling the data out and looking to page the results. Has anyone done paging with MongoDB, or does anyone know of any online resources that might help me get started?
Cheers
Eef
Pagination in MongoDB can be accomplished by using a combination of limit() and skip().
For example, assume we have a collection called users in our active database.
>> db.users.find().limit(3)
This retrieves a list of the first three user documents for us. Note, this is essentially the same as writing:
>> db.users.find().skip(0).limit(3)
For the next three, we can do this:
>> db.users.find().skip(3).limit(3)
This skips over the first three user records, and gives us the next three. If there is only one more user in your database, don't worry; MongoDB is smart enough to only return data that is present, and won't crash.
This can be generalised like so, and is roughly equivalent to what you would do in a web application. Assuming we have a variable called PAGE_SIZE, set to 3, and an arbitrary PAGE_NUMBER:
>> db.users.find().skip(PAGE_SIZE * (PAGE_NUMBER - 1)).limit(PAGE_SIZE)
I cannot speak directly as to how to employ this method in Ruby on Rails, but I suspect the Ruby MongoDB library exposes these methods.
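For what it's worth, here is a rough sketch with the classic mongo Ruby gem (the database and collection names are made up); Mongoid and newer drivers expose equivalent skip and limit options:

require 'mongo'
include Mongo

PAGE_SIZE = 3

cli  = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
coll = cli.db("testdb").collection("users")

# Fetch one page of users; :skip and :limit mirror the shell examples above
def page_of_users(coll, page_number, page_size = PAGE_SIZE)
  coll.find({}, :skip => page_size * (page_number - 1), :limit => page_size).to_a
end

first_page  = page_of_users(coll, 1)   # users 1-3
second_page = page_of_users(coll, 2)   # users 4-6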
