Race condition with Redis session - Ruby

My application stores critical data in the session.
Since I'm using AJAX and users can simply open the application in several tabs, there WILL be parallel requests.
Limiting it to one request at a time is unfortunately not possible.
My original attempt to solve this was an after_filter in my application_controller that calls a method to detect changes made by other workers and merge them into its own session object before saving it.
Unfortunately, that mitigated the problem but did not solve it completely.
It seems to me that between my after_filter and the middleware that actually saves my session, ActionDispatch::Session::RedisStore, there is still a gap big enough for another worker to write its own session.
The only solution I can think of to close this gap is this:
Write a class that inherits from the middleware
Teach it to execute the "merge code" in get_session and set_session
Replace the original middleware with my class via config.middleware.swap
Before I do this I would like to ask for opinions and advice or, ideally, a better solution. Messing with the middleware seems too dangerous to do without asking for advice first.

Since you said the data in the session is critical, I think it's better to synchronize requests from each user while keeping concurrency between different users.
For example, you can fire up several Rails processes, each listening on its own port, and put a load balancer (e.g. Nginx) in front of those processes.
A load balancer with a sticky-session feature is perfect, but IP hash is also acceptable.
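For reference, the gap described in the question is a classic read-modify-write race on a serialized session value: two workers read the same blob, each writes back its own version, and the last write wins. If serializing requests per user is not an option, one generic, store-level way to close that gap is optimistic locking with Redis WATCH/MULTI. The sketch below is only an illustration of that idea, written in TypeScript with the ioredis client rather than as Rails middleware; the key layout and merge logic are assumptions.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

type SessionData = Record<string, unknown>;

// Merge a set of changes into the stored session without losing concurrent
// writes: WATCH the key, read and merge, then write inside MULTI/EXEC.
// If another worker modified the key in between, EXEC aborts (returns null)
// and we retry against the fresh state.
async function mergeIntoSession(
  key: string,
  changes: SessionData,
  maxRetries = 5
): Promise<void> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    await redis.watch(key);
    const raw = await redis.get(key);
    const current: SessionData = raw ? JSON.parse(raw) : {};
    const merged = { ...current, ...changes };

    const result = await redis.multi().set(key, JSON.stringify(merged)).exec();
    if (result !== null) {
      return; // EXEC succeeded; no concurrent write slipped in.
    }
    // EXEC was aborted because the watched key changed; loop and re-merge.
  }
  throw new Error(`could not merge session ${key} after ${maxRetries} attempts`);
}
```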

Related

What does the stale-while-revalidate cache strategy mean?

I am trying to implement different cache strategies using a ServiceWorker. For the following strategies, the way to implement them is completely clear:
Cache first
Cache only
Network first
Network only
For example, to implement the cache-first strategy, in the fetch hook of the service worker I first ask the CacheStorage (or any other cache) for the requested URL; if it exists I respondWith it, and if not I respondWith the result of a network request.
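In code, that cache-first handler boils down to a few lines; a minimal TypeScript sketch (assuming the webworker lib typings, since this runs in the service worker's global scope):

```typescript
// Runs in the service worker; assumes the TypeScript webworker lib typings.
declare const self: ServiceWorkerGlobalScope;

// Cache-first: answer from CacheStorage when possible, otherwise hit the network.
self.addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(
    caches.match(event.request).then((cached) => cached ?? fetch(event.request))
  );
});
```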
But for the stale-while-revalidate strategy, according to this definition from Workbox, I have the following questions:
First, about the mechanism itself: does stale-while-revalidate mean "use the cache until the network responds, then use the network data", or just "use the network response to renew the cached data for the next time"?
If the network response is only cached for the next time, what scenarios are a real use case for that?
And if the network response should replace the cached one immediately in the app, how could that be done in a service worker? The fetch hook will already have resolved with the cached data, so the network data can no longer be delivered via respondWith.
Yes, it means exactly that. The idea is simple: respond immediately from the cache, then refresh the cache in the background for the next time.
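In service worker terms, that means responding with the cached copy (if there is one) while a background fetch refreshes the cache. A minimal TypeScript sketch (the cache name is arbitrary, and the webworker lib typings are assumed):

```typescript
declare const self: ServiceWorkerGlobalScope;

self.addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(
    caches.open('swr-v1').then(async (cache) => {
      const cached = await cache.match(event.request);
      // Kick off the revalidation regardless of whether we had a cached copy.
      const refresh = fetch(event.request).then((response) => {
        if (response.ok) {
          cache.put(event.request, response.clone());
        }
        return response;
      });
      // Keep the worker alive until the background refresh has finished.
      event.waitUntil(refresh.catch(() => undefined));
      // Stale copy now, fresh copy next time; fall back to the network if cold.
      return cached ?? refresh;
    })
  );
});
```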
As for #2: any scenario where it is not important to always get the very latest version of the page/app =) I'm using the stale-while-revalidate strategy on two different web applications, one for public transportation services and one for displaying restaurant menu information. Many sites/apps are just fine with this, but of course not all.
One very important thing to note here on #2:
You could, e.g., use stale-while-revalidate only for static assets. This way your HTML, JS, CSS, images, etc. would be cached and quickly served to the user, but the data fetched dynamically from an API could still be fresh. For some apps this works, for others not so well; it depends completely on the app. Of course you have to remember not to change the semantics of your API if the user is running a previous version of the app.
Not possible in any automatic way. What you could do, however, is implement a message channel between the service worker and the "regular" JS code on the page using the postMessage API. You could listen for certain messages on the page and then, from the service worker, send a message when an important change has happened and the cache has been updated. Then you could either show the user a prompt saying that the page really needs to be reloaded right now, or even force a reload from JS. You would, of course, need to put the logic for determining when an important update has happened into the service worker.
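A rough sketch of that message channel (TypeScript; the message shape and the point at which the worker decides to notify are made up for illustration):

```typescript
// --- service worker side: after updating the cache, notify every open page ---
declare const self: ServiceWorkerGlobalScope;

async function notifyPagesOfUpdate(url: string): Promise<void> {
  const pages = await self.clients.matchAll({ type: 'window' });
  for (const page of pages) {
    page.postMessage({ type: 'CACHE_UPDATED', url });
  }
}

// --- page side (separate script): listen and decide whether to prompt/reload ---
navigator.serviceWorker.addEventListener('message', (event: MessageEvent) => {
  if (event.data?.type === 'CACHE_UPDATED') {
    // Show a "new content available, please reload" banner here,
    // or force it immediately with window.location.reload().
  }
});
```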

Spring Session - asynchronous call handling

Does Spring Session management take care of asynchronous calls?
Say we have multiple controllers and each one is reading/writing different session attributes. Will there be a concurrency issue, given that the session object is written to and read from the external store as a whole, rather than attribute by attribute?
We are facing such an issue: attributes set by one controller are not present in the next read... it is intermittent and depends on which other controllers execute in parallel.
When we used the session object from the container we never faced this issue... presumably because attributes are set/get directly on the session object in memory.
The general use case for the session is storing some user-specific data. If I am understanding your context correctly, your issue describes a scenario in which a user, for example while being authenticated from two devices (say a PC and a phone, hence within the bounds of the same session), hits your backend with requests so quickly that you face concurrency issues around reading and writing the session data.
This is not a common (nor, IMHO, a reasonable) scenario for the session, so projects such as spring-data-redis or spring-data-gemfire won't support it out of the box.
The good news is that spring-session was built with flexibility in mind, so you can of course achieve what you want. You could implement your own version of SessionRepository and manually synchronize the relevant methods (for example via Redis distributed locks). But before doing that, check your design and make sure you are using the session for the right data-storage job.
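For a concrete picture of what "manually synchronize via Redis distributed locks" could mean, here is a minimal sketch of the locking pattern itself, shown with TypeScript and the ioredis client purely for illustration; a custom SessionRepository would do the equivalent around its save/find methods in Java, and the key names, timings, and retry policy here are assumptions.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Compare-and-delete release: only the holder of the token may remove the lock.
const RELEASE_SCRIPT = `
  if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
  else
    return 0
  end
`;

// Run `work` while holding a per-session lock, so concurrent requests for the
// same session are serialized while different sessions stay concurrent.
async function withSessionLock<T>(sessionId: string, work: () => Promise<T>): Promise<T> {
  const lockKey = `session-lock:${sessionId}`;
  const token = Math.random().toString(36).slice(2);

  // Spin until we own the lock; NX = set only if absent, PX = auto-expire (ms).
  while ((await redis.set(lockKey, token, 'PX', 5000, 'NX')) !== 'OK') {
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  try {
    return await work();
  } finally {
    await redis.eval(RELEASE_SCRIPT, 1, lockKey, token);
  }
}
```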
This question is very similar in nature to your last question. And, you should read my answer to that question before reading my response/comments here.
The previous answer (and insight) posted by the anonymous user is fairly accurate.
Anytime you have a highly concurrent (Web) application/environment where many different, simultaneous HTTP requests are coming in and accessing the same HTTP session, there is always a possibility of lost updates caused by race conditions between competing concurrent HTTP requests. This is due to the very nature of a Servlet container (e.g. Apache Tomcat or Eclipse Jetty), since each HTTP request is processed by, and in, a separate Thread.
Not only does the HTTP session object provided by the Servlet container need to be Thread-safe, but so too do all the application domain objects that your Web application puts into the HTTP session. So, be mindful of this.
In addition, most HTTP session implementations, such as Apache Tomcat's, or even Spring Session's implementations backed by different session management providers (e.g. Spring Session Data Redis, or Spring Session Data GemFire), make extensive use of "deltas" to send only the changes (or differences) to the Session state, thereby minimizing the chance of lost updates due to race conditions.
For instance, if the HTTP session currently has an attribute key/value of 1/A, and HTTP request 1 (processed by Thread 1) reads the HTTP session (with only 1/A) and adds an attribute 2/B, while another concurrent HTTP request 2 (processed by Thread 2) reads the same HTTP session by session ID (seeing the same initial session state with 1/A) and wants to add 3/C, then as Web application developers we expect the end result and HTTP session state, after requests 1 and 2 in Threads 1 and 2 complete, to include all the attributes: [1/A, 2/B, 3/C].
However, if two (or more) competing HTTP requests are both modifying, say, HTTP session attribute 1/A, and HTTP request/Thread 1 wants to set the attribute to 1/B while the competing HTTP request/Thread 2 wants to set the same attribute to 1/C, then who wins?
Well, it turns out the last one wins, or rather, the last Thread to write the HTTP session state wins, and the result could be either 1/B or 1/C. That is indeterminate and subject to the vagaries of scheduling, network latency, load, etc. In fact, it is nearly impossible to reason about which one will happen, much less rely on it happening consistently.
While our anonymous user provided some context with, say, a user using multiple devices (a Web browser and perhaps a mobile device, smart phone or tablet) concurrently, reproducing this sort of error with a single user, or even a few users, would not be impossible, but it would be very improbable.
But if we think about this in a production context, where you might have several hundred Web application instances spread across multiple physical machines, VMs, or containers, load balanced by some network load balancer/appliance, and then throw in the fact that many Web applications today are "single page apps" (highly sophisticated, no-longer-thin clients making JavaScript and AJAX calls), then we begin to understand that this scenario is much more likely, especially in a highly loaded Web application; think Amazon or Facebook. Not only many concurrent users, but many concurrent requests by a single user, given all the dynamic, asynchronous calls a Web application can make.
Still, as our anonymous user pointed out, this does not excuse the Web application developer from responsibly designing and coding our Web application.
In general, I would say the HTTP session should only be used to track very minimal (i.e. in quantity) and necessary information to maintain a good user experience and preserve the proper interaction between the user and the application as the user transitions through different parts or phases of the Web app, like tracking preferences or items (in a shopping cart). In general, the HTTP session should not be used to store "transactional" data. To do so is to get yourself into trouble. The HTTP session should be a primarily read-heavy data structure (rather than write-heavy), particularly because the HTTP session can be, and most likely will be, accessed from multiple Threads.
Of course, different backing data stores (like Redis, and even GemFire) provide locking mechanisms. GemFire even provides cache-level transactions, which is very heavyweight and arguably not appropriate when processing Web interactions managed in and by an HTTP session object (not to be confused with transactions). Even locking is going to introduce serious contention and latency into the application.
Anyway, all of this is to say that you very much need to be conscious of the interactions and data access patterns; otherwise you will find yourself in hot water, so be careful, always!
Food for thought!

Best way to establish the initial connection with a server for REST calls?

I've been building some apps that connect to a SQL backend. I use AJAX calls to hit WebMethods, a Web API, etc.
I notice that the very first call to the SQL backend retrieves data fairly slowly. I can only assume this is because it must negotiate credentials before retrieving the data. It probably caches this somewhere, so any calls made afterwards come back very fast.
I'm wondering if there's an ideal, or optimal, way to initialize this connection.
My thought was to make a simple GET call right when the page loads (grabbing something very small, like a single entry). I probably wouldn't be using the returned data in any useful way, other than to ensure that any calls afterwards come back faster.
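In other words, a fire-and-forget request issued once the page loads; a small TypeScript sketch of the idea (the /api/warmup endpoint name is made up):

```typescript
// Fire-and-forget warm-up request issued as soon as the page loads.
// Its only purpose is to get connection/credential negotiation and any
// server-side caching out of the way before the user triggers real calls.
window.addEventListener('load', () => {
  fetch('/api/warmup', { method: 'GET' })
    .then(() => {
      // The response is intentionally ignored.
    })
    .catch(() => {
      // A failed warm-up is harmless; real calls will still work.
    });
});
```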
Is this an okay way to approach fixing the initial delay? I'd love to hear how others handle this.
Cheers!
There are a number of reasons why your first call could be slower than subsequent ones:
Depending on your server platform, code may be compiled when first executed
You may not have an active DB connection in your connection pool
The database may not have cached indices or data on the first call
Some VM platforms may take a while to allocate sufficient resources to your server if it has been idle for a while.
One way I deal with those types of issues on the server side is to add startup code to my web service that fetches data likely to be used by many callers when the service first initializes (e.g. lookup tables, user credential tables, etc).
If you only control the client, consider that you may well wish to monitor server health (I use the open source monitoring platform Zabbix. There are also many commercial web-based monitoring solutions). Exercising the server outside of end-user code is probably better than making an extra GET call from a page that an end user has loaded.

User closes the browser without logging out

I am developing a social network in ASP.NET MVC 3. Every user must have the ability to see connected people.
What is the best way to do this?
I added a flag to the Contact table in my database, and I set it to true when the user logs in and to false when he logs out.
But the problem with this solution is that when the user closes the browser without logging out, he will still appear connected.
The only way to truly know that a user is currently connected is to maintain some sort of connection between the user and the server. Two options immediately come to mind:
Use JavaScript to periodically call your server via AJAX. You would have one endpoint on your server that updates a "last connected time" status, and a second endpoint that users poll to see who is online.
Use a WebSocket to maintain a persistent connection with your server.
Option 1 should be fairly easy to implement. The main thing to keep in mind is that this will increase the number of requests coming into your server, and you will have to plan accordingly in order to handle the traffic this could generate. You have some control over the load on your server by configuring how often the JavaScript timer calls back to it.
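A minimal client-side sketch of option 1 in TypeScript (the endpoint paths, payload shape, and polling interval are all made up):

```typescript
const HEARTBEAT_MS = 30_000;

// Report presence: tell the server this user is still connected.
async function sendHeartbeat(): Promise<void> {
  await fetch('/api/presence/heartbeat', { method: 'POST' });
}

// Ask the server who has been heard from recently.
async function fetchOnlineUsers(): Promise<string[]> {
  const response = await fetch('/api/presence/online');
  return response.ok ? response.json() : [];
}

// Periodically update our own "last connected time" and refresh the list.
setInterval(() => {
  sendHeartbeat().catch(() => { /* ignore transient network errors */ });
  fetchOnlineUsers()
    .then((users) => {
      // ...render the "connected people" list from `users` here...
    })
    .catch(() => { /* ignore */ });
}, HEARTBEAT_MS);
```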
Option 2 could be a little more involved if you did it without library support. Of course, there are libraries out there, such as SignalR, that make this really easy to do. This also has an impact on the performance of your site, since each user maintains a persistent connection. The advantage of this approach is that it reduces the need for the polling in option 1. If you use this approach it would also be very easy to push a message to user A that user B has gone offline.
I guess I should also mention a really easy third option: if your site is fairly interactive, you could just track the last time each user made a request to it. This may not give you enough accuracy to determine whether a user is "connected", though.

Session timeout in web applications

The session timeout in web applications typically denotes the idle time, i.e. the period during which the user doesn't interact with the application.
Now, what if there is an automated script that posts a request every 5 minutes - wouldn't that user's session go on endlessly? And if so, won't this approach heavily load the application, affecting its performance in the long run?
Running an automated call to the server, say via an AJAX request, will keep the session alive. Typically that's the point though. An interesting side effect of this is that if the request happens predictably and regularly, you can use it as a "ping" to determine if the user's browser is still open. If one or two pings are missed, you can close the session earlier and actually free up resources sooner than if you just let the session time out.
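A rough server-side sketch of that "missed ping" idea, assuming a Node/TypeScript backend and a made-up /ping endpoint (the question itself is platform-agnostic, so treat this purely as an illustration):

```typescript
import { createServer } from 'node:http';

// The page pings /ping every PING_MS; if a session misses two pings in a row,
// treat the browser as closed and free the session early. Session
// identification and teardown are stubbed out for brevity.
const PING_MS = 5 * 60 * 1000;
const lastPing = new Map<string, number>();

const server = createServer((req, res) => {
  if (req.url === '/ping') {
    // In a real app the session id would come from the session cookie.
    const sessionId = req.headers.cookie ?? 'anonymous';
    lastPing.set(sessionId, Date.now());
    res.writeHead(204);
    res.end();
    return;
  }
  res.writeHead(404);
  res.end();
});

// Sweep for sessions that have missed two consecutive pings.
setInterval(() => {
  const cutoff = Date.now() - 2 * PING_MS;
  for (const [sessionId, seenAt] of lastPing) {
    if (seenAt < cutoff) {
      lastPing.delete(sessionId);
      // ...invalidate the corresponding server-side session here...
    }
  }
}, PING_MS);

server.listen(3000);
```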
Yes, and Yes.
This is why, if you're going to write an application for the web, you really want to find a way to implement it without using server-side sessions. Usually you will be able to find ways to implement the same functionality using cookies; then the session data is client-side, so who cares if it stays active permanently.
I did something similar for an application that relies heavily on session data.
What I did was set the IIS session timeout to a relatively low number, say 10 minutes, and then have a timed AJAX call that pings a blank page every 5 minutes.
The overhead of this is actually fairly low, as all you are doing is requesting a blank page, and if a person closes their browser, the session ends within 10 minutes.
You want to keep the session as small as possible. That said, if everyone starts doing that, of course it will load your application, with or without sessions. If you think your users are compelled to do that, consider why: either your application is missing an important feature or it is forcing them into something.
Now, regardless of that, if you are expecting lots of users to be active at the same time, so many that a single server won't do, then you will end up having the session out of process. If the session is in SQL Server, it is just saved data, so in that case we wouldn't be talking about memory usage.
Well... I guess "it depends". The first question you should ask yourself is whether you even need session state.
If you have an automated process, my guess is that you don't really need to use session.
In that case, either turn it off or don't worry about it.
I guess your session table would be a little bit larger, but on the other hand you won't be tearing down and recreating the session. I don't see how this would "heavily load" the application. I suppose it would depend on the application itself and how much memory is used to maintain session state.
It would allow the user's session to go on endlessly, as long as they have their browser open. If you need to keep a session alive for an extended period of time, you could also track sessions via the DB rather than in memory.
Also, if you are worried about indefinitely open sessions, you could enforce a timeout measured from when the session was opened, in addition to the idle timeout.
