In my application I want to keep a large amount of data in memory, specific to the user currently accessing my web application, in a user-specific session. As far as I know, Play Framework uses a cookie to store session data, which has a limit of 4 KB. How can I have much larger session data? Do Ehcache or memcached help here? The session should expire some time after the user's last activity.
If the session data is cacheable, it is better to keep it in a cache with the userId as the key and clear it when the user logs off. Have it reloaded from the DB on a relevant DB update/delete. Keeping the content in an external cache like memcached will help you scale well and will let you move to a distributed cache in the long run, if required. Check this interesting article on Share Nothing.
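For illustration, a minimal sketch of that pattern using Play 1.x's Cache API (an assumption about your Play version; the key prefix, the 30-minute expiry and loadFromDatabase are placeholders, not anything prescribed by Play):

import play.cache.Cache;

public class UserDataCache {

    private static final String EXPIRY = "30mn";       // Play 1.x duration string

    // Big per-user data lives in the cache keyed by userId, not in the cookie session.
    public static Object lookup(String userId) {
        String key = "userdata_" + userId;
        Object data = Cache.get(key);
        if (data == null) {
            data = loadFromDatabase(userId);           // cache miss: rebuild from the DB
            Cache.set(key, data, EXPIRY);
        }
        return data;
    }

    // Call this on logout, or whenever the relevant DB rows are updated/deleted.
    public static void evict(String userId) {
        Cache.delete("userdata_" + userId);
    }

    private static Object loadFromDatabase(String userId) {
        return new Object();                           // placeholder for your real query
    }
}

Note that the cache expiry counts from when the entry was set, so re-setting the value on each lookup is one way to approximate "expires N minutes after the user's last activity".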
The idea with Play is to dispel the need for the server-side session and the keeping of lots of information in memory. The problem with the in-memory approach is that you tie the user to the specific server where their data is held, whereas Play's share-nothing approach means you can scale horizontally easily, without worrying about sticky sessions and the like.
The options you have are:
- store transient data in a temporary database that can be accessed via a userId or another unique identifier of the user's session. This database would be the equivalent of your server-side session.
- use a cache. However, the idea of a cache is that if the information is not in the cache, it can be retrieved from the database (or other source) instead; a cache does not have to guarantee that the data will be available (see the sketch below). In the case of an in-memory cache (like Ehcache) behind a load-balanced set of servers, you may not be able to guarantee that all requests go back to the same server, so data in the cache may not be available on all servers for a particular session.
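As a rough cache-aside sketch (assuming the Ehcache 2.x API and a cache region named userData already configured in ehcache.xml; loadFromDatabase is a placeholder for your real query):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class UserDataLookup {

    // Cache-aside: the cache is only an optimization, the database stays the source of truth.
    public static Object lookup(String userId) {
        Cache cache = CacheManager.getInstance().getCache("userData");
        Element element = cache.get(userId);
        if (element != null) {
            return element.getObjectValue();        // hit: no DB round trip
        }
        Object data = loadFromDatabase(userId);     // miss: this node simply reloads
        cache.put(new Element(userId, data));
        return data;
    }

    private static Object loadFromDatabase(String userId) {
        return new Object();                        // placeholder for the real query
    }
}

Because a miss is always recoverable from the database, it does not matter which server in the pool handles the request.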
The answer to your question depends on your use case, but I think the database is your best approach based on the information you have supplied.
This might be super stupid. Shoot me, but I was in a strange mood yesterday and thought about the following:
What if I store webapp data in a persistent way, just by using sessions? So I store a session cookie with a hash, long enough that it can't be brute-forced, and then just save all stored data in the session. I also set the session lifetime to unlimited...
Would there be any use for this? :D
Not really. Most session state implementations keep the sessions in memory. On app restart (or hardware failure, etc.) memory is cleared and the session cache is lost.
You could do so if you have your sessions stored in a database rather than in-proc, but that could be a bit of work depending on what platform you're working with. It's slower as well.
Generally you don't want to keep sessions very large, because if they are in-proc you're going to eat up your server's memory real fast. Even if you go with the database approach, sessions are still often kept in in-memory temp tables, which will eat up the RAM of the database server instead.
Sessions should be lightweight and non-essential to the application's functionality. For anything important that must be persisted, keep it in a database.
I have been given a requirement to persist user data once the user has authenticated initially. We don't want to hit the database to look up the user every time they navigate to a new view etc...
I have a User class that is [Serializable] so it could be stored in a session. I am using SQL Server for session state as well. I was thinking of storing the object in session but I really hate doing that.
How are developers handling this type of requirement these days?
Three ways:
- Encrypting data in cookies and sending it to the client, decrypting it whenever you need it (a sketch follows this list)
- Storing it server side by an Id (e.g. UserId) in Cache, Session, or any other server-side storage (which is safer than a cookie)
- Using a second-level caching strategy if you use an ORM
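For the first option, a rough, platform-agnostic sketch in Java of encrypting and decrypting a cookie value with AES-GCM (the class name and key handling are purely illustrative; on ASP.NET you would normally reach for the framework's built-in protection facilities instead):

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class CookieCrypto {

    private static final int IV_LEN = 12;       // 96-bit nonce, the usual choice for GCM
    private static final int TAG_BITS = 128;    // authentication tag length

    // Produces a cookie-safe (URL-safe Base64) string: IV followed by ciphertext+tag.
    public static String encrypt(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    public static String decrypt(SecretKey key, String cookieValue) throws Exception {
        byte[] in = Base64.getUrlDecoder().decode(cookieValue);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
        byte[] pt = cipher.doFinal(in, IV_LEN, in.length - IV_LEN);   // throws if tampered with
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();            // in practice: a fixed server-side secret
        String cookie = encrypt(key, "userId=42;role=admin");
        System.out.println(decrypt(key, cookie));
    }
}

The key of course lives only on the server; the client just carries an opaque blob it cannot read or modify without detection.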
Assuming your user object is not huge and does not change often, I think it is acceptable to store it in the session.
Since you already have a SQL Server session you will be making SP calls to pull/push the data anyway, and adding a small object to that should have minimal perf impact compared to other options like persisting it down to the client and sending it back on every request.
I would also consider the server a much more secure place to keep this info.
You want to minimize the number of times you write to the session (which requests a lock) when it is stored in SQL, as it is implemented in a sealed class that exclusively locks the session. If any other requests in this session require write access to the SQL session, they will be blocked by the initial request until it releases the session lock. (There are some new hooks in .NET 4 for allowing you to change the SessionStateBehavior in the pipeline before the session is accessed.)
You might consider a session state server (AppFabric) if the perf of your SQL session store is an issue.
Currently our DNS routes the user to the correct datacenter, and then we have a round-robin situation for the servers. We currently store the session information in the cookie, but it's grown too large, so we want to move it out of the browser and into a database. I'm worried that if we create a mid-tier box that they all hit, the response times will be affected. It's not feasible to store the session info on all machines because we're talking about 200M+ unique sessions a month. Any suggestions, thoughts?
A job for memcached or, if you want to save session data to disk, MemcacheDB.
Memcached is a free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
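To make that concrete, a small sketch of keeping the session blob in memcached keyed by session id, using the spymemcached Java client (my choice of client; the host, port, key prefix and 30-minute TTL are assumptions, and any memcached client in your language works the same way):

import net.spy.memcached.MemcachedClient;
import java.io.IOException;
import java.io.Serializable;
import java.net.InetSocketAddress;

public class SessionStore {

    private static final int TTL_SECONDS = 30 * 60;    // sessions expire 30 minutes after the last write

    private final MemcachedClient client;

    public SessionStore(String host, int port) throws IOException {
        client = new MemcachedClient(new InetSocketAddress(host, port));
    }

    public void save(String sessionId, Serializable sessionData) {
        client.set("session:" + sessionId, TTL_SECONDS, sessionData);
    }

    // Any web server in the pool can do this lookup; null means the session
    // expired or was evicted, so fall back to the database (or start a fresh session).
    public Object load(String sessionId) {
        return client.get("session:" + sessionId);
    }

    public void destroy(String sessionId) {
        client.delete("session:" + sessionId);          // e.g. on logout
    }

    public void close() {
        client.shutdown();
    }
}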
Let's understand the role of browser-based cookies
- Cookies are stored per browser profile.
- The same user logged on from different computers or browsers is considered different users.
State cookies are mixed with user cookies
Segregate the cookies:
- Long-term state cookies, e.g. the currently-remembered userId.
- Session-state cookies
- User cookies
Reading that your site is only beginning to consider server-side cookies implies that a segregation of cookies has not yet been done. User cookies should be stored on the server as much as possible, so that when a user logs on at another computer or browser, the preferences and shopping carts are preserved. Your development team has to decide, for some cookies (shopping carts, for example), whether they are session-state or user-info cookies.
User cookies
These need to be accessible across the web site, regardless of where the user logs in. Your developers have to decide, when a user updates a preference or shopping cart, how immediately that change should be visible if the same userId is logged in at another location.
This means you have to implement a distributed database system. You have a master DB server. Let us say you have 20 web servers, each with its own database.
Store only frequently changed cookies on the local db and leave the infrequently changing cookies on the master.
Every time a cookie is updated at a local db, an update flag is queued for the master. The cookie record in the master is not updated; it is only marked as stale, with the location number where the fresh data resides. So if that userid somehow gets activated 3,000 miles away simultaneously, that session would find the stale records, trigger a request to copy those records from the fresh location to its own local db and to the master db, and the records would no longer be marked as stale on the master db.
Then you schedule a regular sync of the most frequently used cookies. The frequency of the sync could be nightly, or could depend on the results of your characterization of cookie modification.
First, your programmers would need to write a routine to log all cookie read/writes. You should collect a week's worth of cookie read/write activity to perform your initial component analysis.
You perform simple statistical characterization per cookie, userid and frequency of change. Then you slide along your preferences, deciding which cookies are pushed to all the local dbs and which stay on the master. The decision balances the size of the cookie block on the local dbs against the frequency of database sync you are willing to allow. This means not every user has the same set of cookies propagated. Of course, your programmers would need to write routines to automate the regular recharacterization. Rather than working per user, you might wish to lighten the processing load of cookie propagation by grouping your users using cluster analysis. Maybe the grouping of users for your site is so obvious that you need not perform cluster analysis at all.
You might be surprised to find that most of the cookies fall into the longer-than-weekly-update bucket, or, in a worse case, the daily-update bucket. The worst case you should accept is hourly updates for cookie fields which are not pushed onto the local dbs. You want to increase the chances that a cookie access occurs on the local db rather than being pulled from the master database. So when a user decides to click on "preferences", which is seldom changed, you preemptively pull the preferences records from the master while distracting the user with some frills like "have you considered previewing our new service?", "would you like to answer our usability survey?", "new Gibson rant, would you comment?", etc., until the "preferences" cookies are copied over.
The characterization of cookies could be done per userid, or per cluster of users to decide which cookie field to push around to local dbs.
It is simpler to characterize per userid because it barely involves any statistical analysis skills on the part of the programmer. The disadvantage is that the web server would have to make decisions for each of 200 million users. The database cookie table would be
Cookie[id, param, value, expectedMutationInterval].
Your web server would decide, per user, which cookies to push regularly based on the threshold time:
SELECT param, value
FROM Cookie
WHERE expectedMutationInterval < $thresholdTime
  AND id = $userId
You have to perform a regular recharacterization of cookies to update expectedMutationInterval per user per cookie. A simple SQL query would be able to perform the update of expectedMutationInterval. A more complex analysis could be performed to produce the value expectedMutationInterval.
If each cookie field change is logged by time, userid and ipaddr, then your cookie log table would be
CookieLog[id, time, ipaddr, param, value].
which would help your automated recharacterization routine decide what fields to push depending on the dayofweek/month/season and location/region/ipaddr.
Then, after removing user-info cookies from the browser, if you still find your session cookies overflowing, you decide which session cookies to push to the browser and which stay on the local server. You use the same master-local db analysis technique, but now to decide between the local db and pushing to the browser. You leave your least frequently accessed session cookies on the local server, either as session attributes or in an in-memory db. When a client finds a cookie is missing, it makes a request to the server for the cookie, while sacrificing some least recently/frequently used cookie space on the browser to accommodate placing that fresh cookie.
Since these are session cookies, they need not be propagated to other locations, because if the same userid is logged on 3,000 miles away, that session should have its own set of session cookies.
Characterization of browser cookies is an irony because, for AJAX apps, the client accesses the cookies without letting the server know. Letting the server know might defeat the purpose of placing the cookies in the browser in the first place. So you would have to choose idle times to send cookie accesses to the server to log, for characterization purposes.
Such a level of granularity is good for cookies that are short in length (parameter value + parameter name), be they session-based or user-based cookies.
Therefore, if your parameter names and values of cookie fields are long, you might seek to quantize them.
However, quantization is a little more complex. Browser cookies have a lot of commonality. Just like any quantization/compression method, you look for the clusters of commonalities and assign each commonality block a signature. Then the cookies are stored in terms of the quantized signature.
How do you facilitate quantization of browser-based cookies? Using GWT as an example, use the Dictionary or Map class.
e.g., the cookie "%1"="^$Kdm3i" might translate to LastConnectedFriend=MohammadAli#jinnah.
You should not even need to perform characterization here: for example, why store your cookie as "LastConnectedFriend" when you could map it to "%1"? When a user logs in, why not map the most frequently accessed friends, etc., and place that map on the GWT/AJAX launching page? In that way you can shorten your session cookie lengths.
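A toy sketch of such a mapping table in plain Java (GWT's Dictionary would be the client-side counterpart); the "%1"/LastConnectedFriend names come from the example above, everything else is made up for illustration:

import java.util.HashMap;
import java.util.Map;

public class CookieDictionary {

    private final Map<String, String> longToShort = new HashMap<>();
    private final Map<String, String> shortToLong = new HashMap<>();

    // Register one "quantized" signature for a verbose cookie field name.
    public void define(String longName, String shortName) {
        longToShort.put(longName, shortName);
        shortToLong.put(shortName, longName);
    }

    public String shorten(String longName) { return longToShort.getOrDefault(longName, longName); }
    public String expand(String shortName) { return shortToLong.getOrDefault(shortName, shortName); }

    public static void main(String[] args) {
        CookieDictionary dict = new CookieDictionary();
        dict.define("LastConnectedFriend", "%1");
        System.out.println(dict.shorten("LastConnectedFriend"));   // "%1" -- what goes on the wire
        System.out.println(dict.expand("%1"));                     // back to the verbose name
    }
}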
So, is your company looking for a statistical programmer? Disclaimer: this is written off-the-cuff and might need some factual realignment.
I've used ColdFusion sessions for quite a while, so I know how they are used, but now I need to know how they work, so that I can plan for scaling my website.
Is a ColdFusion user 'session' simply a quick method to set up 2 cookies (CFTOKEN and CFID) and an associated server-side memory structure (the SESSION scope)? Does it do anything else? I'm trying to identify the overhead associated with user sessions versus other methods such as cookies.
Your understanding of them is basically correct, although they are not bound to the cookies. The cookies are a recording of a token; that token can get passed in the URL string if cookies are not enabled in the browser.
There are 2 main advantages I see of saving things in session instead of cookies:
- You control the session scope. People can't edit the data in the session scope without you providing them an interface. Cookies can be modified by the client.
- Complex data like structures, arrays, objects, and network sessions (FTP, exchange) can be stored there.
Their memory overhead is "low" but that's a relative term. Use the ColdFusion Admin Server Monitor to drill into how much memory your sessions are actually using.
First of all, Session is a scope: a secure and efficient way to keep current user attributes like permissions or preferences. Not sure what you mean by "other methods", but I doubt that you'll be able to keep complex data structures (query, object, array) in cookies.
Second, the application server provides you with really handy event handlers specifically for sessions: onSessionStart() and onSessionEnd().
Third, sessions can be pretty easily shared and clustered: between CF applications or between CF and J2EE.
Sessions are per-user memory space assigned within a particular application space within the JVM memory. The two cookies are pointers to (the token of) that memory space. Yes, there is overhead to using sessions (RAM, swap space, etc.), but unless you're shoving massive amounts of data into the session scope, it shouldn't be that bad.
One aspect of sessions not mentioned is that they have a lifetime: by default 20 minutes (of inactivity). This lifetime can be set by application, but can never be more than the limit set in ColdFusion Administrator.
If memory usage is a concern the time limit could be reduced, although there's still much that depends on the Java garbage collection.
Session variables are normally kept in the web server's RAM.
In a cluster, each request made by a client can be handled by a different cluster node. right?!
So, in this case...
What happens with session variables? Aren't they stored in the node's RAM?
How will the other nodes handle my request correctly if they don't have my session variables, or at least all of them?
Is this issue handled by the web server (Apache, IIS) or by the language runtime (PHP, ASP.NET, Ruby, JSP)?
EDIT: Is there some solution for Classic ASP?
To extend #yogman's answer.
Memcached is pure awesomeness! It's a high performance and distributed object cache.
And even though I mentioned "distributed", it's basically as simple as starting one instance on one of your spare/idle servers; you configure it with an IP, a port and how much RAM to use, and you're done:
memcached -d -u www -m 2048 -l 10.0.0.8 -p 11211
(Runs memcached in daemon mode, as user www, 2048 MB (2 GB) of RAM on IP 10.0.0.8 with port 11211.)
From then on, you ask memcached for data and if the data is not yet cached you pull it from the original source and store it in memcached. I'm sure you are familiar with cache basics.
In a cluster environment you can link up your memcached instances into a cluster and replicate the cache across your nodes. Memcached runs on Linux, Unix and Windows; start it anywhere you have spare RAM and start using your resources.
APIs for memcached should be generally available. I'm saying should because I only know of Perl, Java and PHP. But I am sure that e.g. in Python people have means to leverage it as well. There is a memcached wiki, in case you need pointers, or let me know in the comments if I was raving too much. ;)
There are 3 ways to store session state in ASP.NET. The first is in-process, where the variables are stored in memory. The second is to use a session state service by putting the following in your web.config file:
<sessionState
    mode="StateServer"
    stateConnectionString="tcpip=127.0.0.1:42424"
    sqlConnectionString="data source=127.0.0.1;user id=sa;password="
    cookieless="false"
    timeout="20" />
As you can see in the stateConnectionString attribute, the session state service can be located on a different computer.
The third option is to use a centralized SQL database. To do that, you put the following in your web.config:
<sessionState
    mode="SQLServer"
    stateConnectionString="tcpip=127.0.0.1:42424"
    sqlConnectionString="data source=SERVERNAME;user id=sa;password="
    cookieless="false"
    timeout="20"
/>
More details on all of these options are written up here: http://www.ondotnet.com/pub/a/dotnet/2003/03/24/sessionstate.html
Get a Linux machine and set up http://www.danga.com/memcached. Its speed is unbeatable compared to other approaches (for example cookies, form hidden variables, or databases).
As with all sorts of things, "it depends".
There are different solutions and approaches.
As mentioned, there's the concept of a centralized store for session state (database, memcached, shared file system, etc.).
There are also cluster wide caching systems available that make local data available to all of the machines in the cluster. Conceptually it's similar to the centralized session state store, but this data isn't persistent. Rather it lives within the individual nodes and is replicated using some mechanism provided by your provider.
Another method is server pinning. When a client hits the cluster the first time, some mechanism (typically a load balancer fronting the cluster) pins the client to a specific server. In a typical client lifespan, that client will spend their entire time on a single machine.
For the failover mechanism, each machine of the cluster is paired with another machine, so any session changes are shared with the paired machine. Should the client's pinned machine encounter an issue, the client will hit another machine. At this point, perhaps by way of cookies, the new machine sees that it's not the original machine for the client, so it pings both the original machine and the paired machine for the client's session data.
At that point the client may well be pinned to the new machine.
Different platforms do it in different ways, including having no session state at all.
With Hazelcast, you can either use a Hazelcast distributed map to store and share sessions across the cluster or let Hazelcast Webapp Manager do everything for you. Please check out the docs for details. Hazelcast is a distributed/partitioned, super light and easy, free data distribution solution for Java.
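For illustration, a minimal sketch of the distributed-map approach (assuming a reasonably recent Hazelcast release; the map name "sessions" and the plain Serializable payload are placeholders):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.io.Serializable;
import java.util.Map;

public class ClusterSessions {

    // Every node that starts an instance with the same config joins the same cluster,
    // and the "sessions" map is partitioned (with backups) across those nodes.
    private static final HazelcastInstance hz = Hazelcast.newHazelcastInstance();
    private static final Map<String, Serializable> sessions = hz.getMap("sessions");

    public static void put(String sessionId, Serializable data) {
        sessions.put(sessionId, data);
    }

    public static Object get(String sessionId) {
        return sessions.get(sessionId);     // visible from any node in the cluster
    }

    public static void remove(String sessionId) {
        sessions.remove(sessionId);         // e.g. on logout or expiry
    }
}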
Regards,
-talip
http://www.hazelcast.com
To achieve load balancing for classic ASP, you may store the user-specific values in the database and pass a unique reference id in the URL as follows.
Maintain a session table in the database which generates a unique id for each record. The first time you want to store session specific data, generate a record in your session table and store the session values in it. Obtain the unique id of the new session record and re-write all links in your web application to send the unique id as part of querystring.
In every subsequent page where you need the session data, query the session table with the unique id passed in the querystring.
Example:
Consider your website to have 4 pages: Login.asp, welcome.asp, taskList.asp, newtask.asp
When the user logs in using the login.asp page, after validating the user, create a record in the session table and store the required session-specific values (let's say the user's login date/time for this example). Obtain the new session record's unique id (let's say the unique id is abcd).
Append all links in your website with the unique id as below:
welcome.asp?sessionId=abcd
tasklist.asp?sessionId=abcd
newtask.asp?sessionId=abcd
Now, if in any of the above web pages you want to show the user's login date/time, you just have to query your session table with the sessionId parameter (abcd in this case) and display it to the user.
Since the unique value identifying the session is a part of the URL, any of your web servers serving the user will be able to display the correct login date/time value.
Hope this helps.
In ASP.NET you can persist session data to an SQL Server database which is common to all web servers in the cluster.
Once configured (in the web.config for your site), the framework handles all of the persistence for you and you can access the session data as normal.
As Will said, most load-balancing approaches will use some sort of stickiness in the way they distribute forthcoming requests from the same client, meaning that a unique client will hit the same server unless that actual server goes down.
That minimizes the need to distribute session data, meaning that only in the eventual failure of a server would a client lose their session. Depending on your app, this is more or less critical. In most cases, it is not a big issue.
Even the simplest way of load balancing (round-robin DNS lookups) will give some sort of stickiness, since most browsers will cache the actual lookup and therefore keep going to the first record they received, AFAIK.
It's usually the runtime that is responsible for the session data; in PHP, for example, it's possible to define your own session handler, which can persist the data into a database. By default PHP stores session data in files, and it might be possible to share these files on a SAN or equivalent in order to share session data. This was just a theory I had but never got around to testing, since we decided that losing sessions wasn't critical and didn't want that single point of failure.