Sinatra + Heroku: Store data >4k in a Session

I have a Sinatra app that will be running on Heroku.
It has a fairly long-running method (approx. 1 to 3 seconds) that creates a list of data. I've been storing this data in the session so that when my user needs to access it again I avoid having to regenerate the list (they use it often).
In some instances the data is over 4 KB, which means I can no longer store it in the session.
Rack::Session::Pool works perfectly, except that it is not compatible with Heroku (since dynos do not share memory).
Could someone suggest how I might best store this data?
I've considered writing it to my SQL database, since a SELECT would be less expensive than the original generation of the list, but surely there is a better way?

If you don't want to use a database, then how about something like memcache? Heroku has a memcache add-on that you could use.
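For what it's worth, here is a minimal sketch of that approach in Sinatra, assuming the Dalli gem and the environment variables a Heroku memcache add-on typically sets; generate_expensive_list is a stand-in for the 1-3 second method:

    require 'sinatra'
    require 'dalli'
    require 'json'

    enable :sessions

    # The add-on provides the server address and credentials via the environment.
    CACHE = Dalli::Client.new(ENV['MEMCACHE_SERVERS'],
                              username: ENV['MEMCACHE_USERNAME'],
                              password: ENV['MEMCACHE_PASSWORD'],
                              expires_in: 60 * 60)  # keep entries for an hour

    get '/list' do
      # Keep only a small ID in the cookie session; the big list lives in memcache.
      key = "list-#{session[:user_id]}"
      data = CACHE.get(key)
      if data.nil?
        data = generate_expensive_list  # hypothetical 1-3 second method
        CACHE.set(key, data)
      end
      data.to_json
    end

This keeps the cookie under the 4 KB limit while the list itself can be as large as memcached allows (1 MB per item by default).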

Related

Is it necessary for memcached to replicate its data?

I understand that memcached is a distributed caching system. However, is it entirely necessary for memcached to replicate? The objective is to persist sessions in a clustered environment.
For example, if we have memcached running on, say, 2 servers, both with data on them, and server #1 goes down, could we potentially lose the session data that was stored on it? In other words, what should we expect to happen should any memcached server (storing data) go down, and how would it affect our sessions in a clustered environment?
At the end of the day, will it be up to us to add some fault tolerance to our application? For example, if a key doesn't exist, possibly because the server it was on went down, re-query the source and store the value back in memcached?
From what I'm reading, it appears to lean in this direction, but I would like confirmation: https://developers.google.com/appengine/articles/scaling/memcache#transient
Thanks in advance!
Memcached has its own fault tolerance built in, so you don't need to add it to your application. I think providing an example will show why this is the case. Let's say you have 2 memcached servers set up in front of your database (let's say it's MySQL). Initially, when you start your application, there will be nothing in memcached. When your application needs to get data, it will first check in memcached, and if the data doesn't exist there it will read it from the database and insert it into memcached before returning it to the user. For writes, you will make sure that you insert the data into both your database and memcached. As your application continues to run, it will populate the memcached servers with a bunch of data and take load off of your database.
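That read/write flow is the classic cache-aside pattern. A minimal Ruby sketch, assuming a Dalli client in CACHE and hypothetical fetch_from_db/write_to_db helpers:

    def read(key)
      value = CACHE.get(key)
      if value.nil?
        value = fetch_from_db(key)  # cache miss: fall back to the database
        CACHE.set(key, value)       # warm the cache for the next reader
      end
      value
    end

    def write(key, value)
      write_to_db(key, value)       # the database stays the source of truth
      CACHE.set(key, value)         # keep the cache consistent
    end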
Now suppose one of your memcached servers crashes and you lose half of your cached data. Right after the crash your application will go to the database more frequently, and your application logic will continue to insert data into memcached, except everything will go to the server that didn't crash. The only consequence here is that your cache is smaller and your database might need to do a little more work if everything doesn't fit into the cache. Your memcached client should also be able to handle the crash, since it can figure out where your remaining healthy memcached servers are and will automatically hash values onto them accordingly. So, in short, you don't need any extra logic for failure situations in memcached, since the memcached client should take care of this for you. You just need to understand that memcached servers going down might mean your database has to do a lot of extra work. I also wouldn't recommend re-populating the cache after a failure. Just let the cache warm itself back up, since there's no point in loading items that you aren't going to use in the near future.
m03geek also made a post where he mentioned that you could use Couchbase, and this is true, but I want to add a few things to his response about the pros and cons. First off, Couchbase has two bucket (database) types: the Memcached bucket and the Couchbase bucket. The Memcached bucket is plain memcached, and everything I wrote above is valid for it. The only reasons you might want to go with Couchbase if you are going to use the Memcached bucket are that you get a nice web UI providing stats about your memcached cluster and making it easy to add and remove servers. You can also get paid support down the road for Couchbase.
The Couchbase bucket is totally different in that it is not a cache but an actual database. You can completely drop your backend database and just use this bucket type. One nice thing about the Couchbase bucket is that it provides replication and therefore prevents the cold-cache problem that memcached has. I would suggest reading the Couchbase documentation if this sounds interesting to you, since there are a lot of features you get with the Couchbase bucket.
This paper about how Facebook uses memcached might be interesting too.
https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
Couchbase's embedded memcached and "vanilla" memcached have some differences. One of them, as far as I know, is that Couchbase's memcached servers act as one: if you store your key-value pair on one server, you'll be able to retrieve it from any other server in the cluster. Vanilla memcached "clusters", on the other hand, are usually built with sharding, which means the app side must know which server contains the desired key (as the sketch below shows, the client library typically handles this hashing).
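To illustrate, a sketch of client-side sharding with the Dalli Ruby client; the server names are made up:

    require 'dalli'

    # The client's consistent hashing decides which server each key lands on.
    cache = Dalli::Client.new(['cache1.example.com:11211',
                               'cache2.example.com:11211'])
    cache.set('session:abc123', { 'user_id' => 42 })
    cache.get('session:abc123')  # reads must go through the same ring of servers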
My opinion is that replicating memcached data is unnecessary. Modern data centers provide close to 99% uptime, so if one of your memcached servers goes down some day, only some of your online users will need to log in again.
Also, on many websites you can see a "Remember me" checkbox that sets a cookie which can be used to restore the session. If your users have that cookie, they will not even notice that one of your servers was down. (That's the answer to your question about adding some fault tolerance to your application.)
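A minimal sketch of such a cookie in Sinatra; save_remember_token and current_user are hypothetical helpers, and a real implementation should look the random token up server-side:

    require 'sinatra'
    require 'securerandom'

    post '/login' do
      token = SecureRandom.hex(32)              # random token, stored with the user
      save_remember_token(current_user, token)  # hypothetical persistence helper
      response.set_cookie('remember_me',
                          value: token,
                          expires: Time.now + 30 * 24 * 3600,  # 30 days
                          httponly: true)
      redirect '/'
    end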
But you can always use something like HAProxy and replicate all your session data on 2 or more independent servers. In this case, to store 1 user session you'll need N times more RAM, where N is the number of replicas.
Another way is to use Couchbase to store sessions. A Couchbase cluster supports replicas "out of the box" and it also stores data on disk, so if your node (or all nodes) suddenly shuts down or reboots, the session data will not be lost.
Short answer: memcached with a "remember me" cookie and without replication should be enough.

Store 600 KB of session data per registered user in Heroku

For a given application, we would need to store about 600 KB of data in the web session per registered user who connects to our website. We would have about 1,000 registered users in parallel, hence we need to store about 600 MB of session data.
The reason we need so much data in the session is to avoid frequently querying a table with about 1 billion rows in the database.
I understood that Heroku stores session information in the database. This is fine, as it means the session data is available across dynos (no session affinity).
Is there a more efficient way of storing information across dynos? Reading the docs, I found MemCachier.
My questions are the following:
Do you think storing that amount of session data in the database would be performant enough?
Would you suggest caching systems other than MemCachier for storing session information available across different dynos?
Thanks a lot for your help !
Olivier
Heroku does not store session information at all -- how session information is stored depends entirely on your application and your application's framework and that will work in the same way regardless of whether you are deployed on Heroku or any other system.
As far as what kind of storage is sensible, however: it sounds like cookie storage is right out, due to the volume of data. Database storage was the de facto default for web applications for a long time, and there's nothing wrong with it. Memcached would be faster; how much faster depends on your configuration (are you using connection pooling? does each page view hit the database for something else anyway? what is your caching setup like?). But as long as you're sure this strategy of storing so much info in session data is sound, the difference between database and memcached storage will not be great.
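If you do go the memcached route on Heroku, here is a minimal sketch of the idea, assuming the Dalli gem and the environment variables the MemCachier add-on sets; query_billion_row_table is a stand-in for the expensive query:

    require 'dalli'

    CACHE = Dalli::Client.new(ENV['MEMCACHIER_SERVERS'],
                              username: ENV['MEMCACHIER_USERNAME'],
                              password: ENV['MEMCACHIER_PASSWORD'])

    # Keep only the user ID in the cookie session; the ~600 KB blob lives
    # in the cache under a per-user key.
    def load_user_blob(user_id)
      CACHE.get("user-blob-#{user_id}") || begin
        blob = query_billion_row_table(user_id)          # hypothetical expensive query
        CACHE.set("user-blob-#{user_id}", blob, 30 * 60) # cache for 30 minutes
        blob
      end
    end

Note that 600 KB fits under memcached's default 1 MB item size limit, but not by much.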

What is the best way to store data by key and value?

I store data with:
HttpContext.Current.Application.Add(appKey, value);
And read the data back with:
HttpContext.Current.Application[appKey];
This has the advantage for me of looking a value up by key, but after a short time (about 20 minutes) it stops working and I cannot find [appKey], because the IIS application life cycle discards the data.
I want to know: is there another way to store my data by key and value?
I do not want SQL Server, files, etc.; I want to store the data on the server, not on the client.
I store some user data in it.
Thanks for your help.
Since IIS may recycle and throw away any cache/memory contents at any time, the only way to get data persisted is to store it outside IIS. Some examples (and yes, I included the ones you stated you didn't want just to make the list a bit more complete; feel free to skip them):
A SQL database (there are quite a few free ones if the price is prohibitive)
A NoSQL database (same thing there, quite a few free ones and usually simpler to use for key/value)
File (which you also stated you didn't want)
Some kind of external memory cache, à la AppFabric Cache or memcached.
Cookies (somewhat limited in size and not secure in any way by default)
You could create a persistent cookie on the user's machine so that the session doesn't expire, or increase the session timeout to a value that works better for your situation/users:
How to create persistent cookies in asp.net?
Session timeout in ASP.NET
You're talking about persisting data beyond the scope of a session. So you're going to have to use some form of persistent storage (Database, File, Caching Server).
Have you considered using AppFabric? It's actually pretty easy to implement. You could either access it directly from your code using the NuGet packages, or you could just configure it as a session store. (I think) doing the latter would get rid of the session timeout issue.
Do you understand that whatever you decide to store in Application will be available to all users of your application?
Now, regarding your actual question: what kind of data do you plan on storing? If it's user-sensitive data, then it probably makes sense to store it in the session. If it's client-specific and doesn't contain any sensitive information, then cookies are probably a reasonable way forward.
If it is indeed application-wide data that must be the same for every user of your application, then you can make configuration changes to ensure it doesn't expire after 20 minutes.

Application data in Sinatra

Say I have some objects that need to be created only once in the application yet accessed from within multiple requests. The objects are immutable. What is the best way to do this?
Store them in the session.
If you don't want to lose them after a server restart, then use a database (SQLite, which is a single file, for example).
You want to persist your objects. Normally you'd do this with an ORM like ActiveRecord or DataMapper, depending on what is available to you. If you want something dead simple without migrations, and you have access to a MongoDB, use MongoMapper.
If the object is used only for some time and is then discarded (and recreated if needed again), use a caching mechanism like memcached or Redis.
If setting up such services is too heavy and you want to avoid it, and, say, you are using Debian/Ubuntu, then save your objects into files (with Marshal) under /dev/shm, which is effectively memory (see the sketch after this list).
If the structure of the data is complex, then go with SQLite as suggested above.
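A minimal sketch of the Marshal-to-/dev/shm idea mentioned above; the path is illustrative and build_immutable_objects is a hypothetical setup method:

    path = '/dev/shm/myapp-objects.bin'

    # Write once at application start-up:
    objects = build_immutable_objects  # hypothetical expensive setup
    File.binwrite(path, Marshal.dump(objects))

    # Reload from within any request:
    objects = Marshal.load(File.binread(path))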

Multiple webRole instances at Azure and session state

I have a web role with some data stored in Session. The data is some tens of small variables (strings) and one or two big objects (several megabytes). I need to run this web role in multiple instances. Since two requests from a single user can go to different instances, Session becomes useless, so I am looking for the most efficient and simplest method of storing volatile user data in this case. I know that I can store it in cookies on the client side, but this will fail for the big objects. I also know that I can store data in Azure storage, but this seems more complicated than Session. Can anybody suggest a method that is both efficient and simple, like session state? Or perhaps a workaround to get session state working correctly when multiple instances are enabled?
This may help
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/7ddc0ca8-0cc5-4549-b44e-5b8c39570896
You need to use a session state store other than in-memory. In Azure you can use the Cache service, Table storage, or SQL Server to share session data between instances.
