Avoid persisting memory state between requests in FastAPI - multi-tenant

Is there a way to deploy a FastAPI application so that memory state cannot be persisted between requests? The goal is to avoid leaking any data between requests in a multi-tenant application.
Starting up the application from scratch for every request does not seem feasible, since it takes too long. Is there a way to launch the application once per instance of the service, but have individual requests handled by workers or threads that are purged after the request is handled, so that any static property, singleton instance and the like is destroyed and the next request is handled with clean memory?

FastAPI is basically stateless by default. It actually takes extra work to persist data across requests, through methods such as connection pooling, reading a value from Redis, and so on. If you consider things such as starting up the server, loading a configuration, setting up path redirects, and so on to be "state", then FastAPI will not work for your purposes.
When you say "memory state", it sounds like you are trying to partition instances of the FastAPI server off from each other so that they do not even use the same memory. This is not going to be a viable solution because most web servers, FastAPI included, are not designed for this type of segregation. By default, the requests from one tenant will not have anything to do with the requests from another tenant unless you write additional code that allows them to become related; separating the concerns of the different tenants is therefore a matter for the programmer, not the server's memory.
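To make that default concrete, here is a minimal, hypothetical sketch (the TenantContext class and the X-Tenant-Id header are made up for illustration): anything built through FastAPI's dependency injection is created per request and discarded afterwards, so the only state that survives between requests is what you deliberately keep at module level (globals, singletons, caches).

```python
from fastapi import Depends, FastAPI, Header

app = FastAPI()

# DANGER ZONE: a module-level object like this lives for the whole worker
# process and would be shared across requests (and tenants).
# shared_cache = {}

class TenantContext:
    """Built fresh for every request and garbage-collected afterwards."""
    def __init__(self, tenant_id: str):
        self.tenant_id = tenant_id
        self.scratch = {}  # per-request scratch space, never shared

def tenant_context(x_tenant_id: str = Header(...)) -> TenantContext:
    # FastAPI calls this once per request; the resulting object is not
    # cached beyond the lifetime of that request.
    return TenantContext(tenant_id=x_tenant_id)

@app.get("/items")
def read_items(ctx: TenantContext = Depends(tenant_context)) -> dict:
    ctx.scratch["loaded"] = True  # dies with the request
    return {"tenant": ctx.tenant_id}
```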
Instead, if you absolutely cannot let requests from multiple tenants inhabit the same memory, you'd be better off giving each tenant its own subdomain at the DNS level. Spin up a VPS and an instance of your FastAPI program for each of them. That will truly prevent the requests from one tenant from sharing any memory or state with the others.

Related

Is there a way to update a cached in-memory value on all running instances of a serverless function? (AWS, Google, Azure or OpenWhisk)

Suppose I am running a serverless function with a global state variable which is cached in memory. Assuming that the value is cached on multiple running instances, how would an update to the global state be broadcast to every serverless instance so that they all hold the updated value?
Is this possible in any of the serverless frameworks?
It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
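As a concrete illustration of that "each instance opens its own connection" idea, here is a rough sketch, assuming the functions can reach a Redis server (the hostname and channel name are made up) and assuming the platform keeps the instance unpaused (more on that below): every instance subscribes to a channel and refreshes its in-memory copy whenever an update is published.

```python
import json
import threading

import redis

CACHED_STATE = {"value": None}  # this instance's in-memory copy

r = redis.Redis(host="cache.example.internal", port=6379)

def _listen_for_updates():
    pubsub = r.pubsub()
    pubsub.subscribe("global-state-updates")
    for message in pubsub.listen():
        if message["type"] == "message":
            # Another component published a new value; refresh the local copy.
            CACHED_STATE["value"] = json.loads(message["data"])

# Started once per instance, e.g. during cold start.
threading.Thread(target=_listen_for_updates, daemon=True).start()

def handler(event, context):
    # Serve from the instance-local copy kept fresh by the subscriber thread.
    return {"state": CACHED_STATE["value"]}
```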
However, there is also something built into most serverless frameworks that may stop you, even if you can open long-lived connections from each instance that allow each of them to be reliably messaged at least once. To help keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. It also goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can, so that the CPU it was consuming can be allocated to containers that are not yet paused. They will try to tune their cluster so that containers remain unpaused for as short a time as possible after a request/event is handled. So this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where instead of caching occurring in each function instance, it occurs in a shared service designed for fast caching, like Redis or Memcached. Then each function can check that shared caching service for the data before retrieving it from its source. Many cloud providers who provide serverless compute options also provide managed databases like these. So you can often deploy it all to the same place.
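A rough sketch of that stateless, shared-cache approach, assuming a reachable Redis instance (the hostname, key name and 60-second expiry are illustrative): each invocation checks the shared cache first and only falls back to the source of truth on a miss, so there is no per-instance copy to keep in sync.

```python
import json

import redis

r = redis.Redis(host="cache.example.internal", port=6379)

def load_config_from_source():
    # Placeholder for the real lookup (database, API call, etc.).
    return {"feature_flag": True}

def handler(event, context):
    cached = r.get("global-config")
    if cached is not None:
        config = json.loads(cached)          # cache hit: shared, always current
    else:
        config = load_config_from_source()   # cache miss: go to the source
        r.set("global-config", json.dumps(config), ex=60)
    return {"config": config}
```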
Also, if you can't switch to a stateless design, you could switch to a pull model for caching instead of a push model. Instead of having updates pushed out to each function instance to refresh its cached data, each function would pull fresh data from the source when it detects that the data stored in its memory has expired.
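A minimal sketch of the pull model, with an assumed 30-second TTL and made-up names: each instance keeps its own copy plus a timestamp and re-reads from the source once the copy is older than the TTL, so updates propagate within one TTL without anything being pushed to the instances.

```python
import time

_TTL_SECONDS = 30
_cache = {"value": None, "fetched_at": 0.0}  # this instance's copy

def load_from_source():
    # Placeholder for the real lookup (database, config service, etc.).
    return {"feature_flag": True}

def get_global_state():
    now = time.monotonic()
    if _cache["value"] is None or now - _cache["fetched_at"] > _TTL_SECONDS:
        _cache["value"] = load_from_source()  # refresh the expired copy
        _cache["fetched_at"] = now
    return _cache["value"]

def handler(event, context):
    return {"state": get_global_state()}
```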

Is it worth having multiple instances on the same machine to scale a system to handle more requests?

As I mentioned in the title, if I want to scale the system to handle more requests, then the created instances should be on different servers, right? For example, with Tomcat in the context of Spring, Tomcat can handle up to 200 requests in parallel because it can create 200 threads, and each thread handles one request. If I would like to handle more than 200 requests at the same time, then I should create multiple instances on different servers and create a load balancer to balance between the instances. It does not make any sense to have multiple instances on the same machine, right?
Is it worth it? You'll have to judge. More important: You'll have to measure.
You can configure the thread pool of a single instance to handle more connections - but that won't help if CPU, memory or I/O capacity is already fully saturated. However, it doesn't really matter whether that saturation comes from one process or from many.
It could make sense to have multiple processes if they all place different demands on the underlying infrastructure. But if one of them already saturates some resource, then it doesn't make sense to add more processes. Or more threads, for that matter.

Spring Session - asynchronous call handling

Does Spring Session management take care of asynchronous calls?
Say that we have multiple controllers and each one is reading/writing different session attributes. Will there be a concurrency issue, given that the entire session object (not just the individual attributes) is written to and read from the external servers?
We are facing such an issue that the attributes set from a controller are not present in the next read... this is an intermittent issue depending on the execution of other controllers in parallel.
When we use the session object from the container we never faced this issue... assuming that it is a direct attribute set/get happening right on to the session object in the memory.
The general use case for the session is storing some user-specific data. If I am understanding your context correctly, your issue describes a scenario in which a user, while for example being authenticated from two devices (for example a PC and a phone, hence within the bounds of the same session), is hitting your backend with requests so fast that you face concurrency issues around reading and writing the session data.
This is not a common (and IMHO reasonable) scenario for the session, so projects such as spring-data-redis or spring-data-gemfire won't support it out of the box.
The good news is that spring-session was built with flexibility in mind, so you could of course achieve what you want. You could implement your own version of SessionRepository and manually synchronize (for example via Redis distributed locks) the relevant methods. But, before doing that, check your design and make sure you are using session for the right data storage job.
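The repository itself would be Java code implementing Spring Session's SessionRepository, but the synchronization idea is language-agnostic. Here is a hedged sketch of the locking pattern (shown in Python to match the other examples on this page; the hostname, key layout and timeouts are made up): the save path takes a per-session distributed lock around the read-modify-write.

```python
import redis

r = redis.Redis(host="session-store.example.internal", port=6379)

def save_attributes(session_id: str, attributes: dict) -> None:
    key = f"session:{session_id}"
    # redis-py's Lock is built on SET NX PX; blocking_timeout bounds how long
    # we wait for a competing request to release the per-session lock.
    with r.lock(f"lock:{key}", timeout=5, blocking_timeout=2):
        current = {k.decode(): v.decode() for k, v in r.hgetall(key).items()}  # read
        current.update(attributes)                                             # modify
        r.hset(key, mapping=current)                                           # write
```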
This question is very similar in nature to your last question, and you should read my answer to that question before reading my response/comments here.
The previous answer (and insight) posted by the anonymous user is fairly accurate.
Anytime you have a highly concurrent (Web) application/environment where many different, simultaneous HTTP requests are coming in, accessing the same HTTP session, there is always a possibility for lost updates caused by race conditions between competing concurrent HTTP requests. This is due to the very nature of a Servlet container (e.g. Apache Tomcat, or Eclipse Jetty) since each HTTP request is processed by, and in, a separate Thread.
Not only does the HTTP session object provided by the Servlet container need to be Thread-safe, but so too do all the application domain objects that your Web application puts into the HTTP session. So, be mindful of this.
In addition, most HTTP session implementations, such as Apache Tomcat's, or even Spring Session's implementations backed by different session management providers (e.g. Spring Session Data Redis, or Spring Session Data GemFire), make extensive use of "deltas" to send only the changes (or differences) to the session state, thereby minimizing the chance of lost updates due to race conditions.
For instance, suppose the HTTP session currently has an attribute key/value of 1/A. HTTP request 1 (processed by Thread 1) reads the HTTP session (with only 1/A) and adds an attribute 2/B, while another concurrent HTTP request 2 (processed by Thread 2) reads the same HTTP session, by session ID (seeing the same initial session state with 1/A), and wants to add 3/C. As Web application developers, we expect the HTTP session state, after requests 1 & 2 in Threads 1 & 2 complete, to include the attributes [1/A, 2/B, 3/C].
However, if 2 (or even more) competing HTTP requests are both modifying, say, HTTP session attribute 1/A, and HTTP request/Thread 1 wants to set the attribute to 1/B while the competing HTTP request/Thread 2 wants to set the same attribute to 1/C, then who wins?
Well, it turns out the last one wins, or rather, the last Thread to write the HTTP session state wins, and the result could be either 1/B or 1/C. It is indeterminate and subject to the vagaries of scheduling, network latency, load, and so on. In fact, it is nearly impossible to reason about which one will happen, much less which will happen consistently.
While our anonymous user provided some context with, say, a user using multiple devices (a Web browser and perhaps a mobile device... smart phone or tablet) concurrently, reproducing this sort of error with a single user, or even multiple users, would not be impossible, but it is very improbable.
But if we think about this in a production context, where you might have, say, several hundred Web application instances spread across multiple physical machines, VMs, or containers, load balanced by some network load balancer/appliance, and then throw in the fact that many Web applications today are "single page apps" (highly sophisticated, no longer thin, but thick clients making JavaScript and AJAX calls), then we begin to understand that this scenario is much more likely, especially in a highly loaded Web application; think Amazon or Facebook. Not only many concurrent users, but many concurrent requests by a single user, given all the dynamic, asynchronous calls that a Web application can make.
Still, as our anonymous user pointed out, this does not excuse us, as Web application developers, from responsibly designing and coding our Web applications.
In general, I would say the HTTP session should only be used to track a very minimal (i.e. in quantity) and necessary amount of information needed to maintain a good user experience and preserve the proper interaction between the user and the application as the user transitions through different parts or phases of the Web app, like tracking preferences or items (in a shopping cart). In general, the HTTP session should not be used to store "transactional" data. To do so is to get yourself into trouble. The HTTP session should primarily be a read-heavy data structure (rather than write-heavy), particularly because the HTTP session can be, and most likely will be, accessed from multiple Threads.
Of course, different backing data stores (like Redis, and even GemFire) provide locking mechanisms. GemFire even provides cache-level transactions, which is very heavy and arguably not appropriate when processing Web interactions managed in and by an HTTP session object (not to be confused with transactions). Even locking is going to introduce serious contention and latency to the application.
Anyway, all of this is to say that you very much need to be conscious of the interactions and data access patterns, otherwise you will find yourself in hot water, so be careful, always!
Food for thought!

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory across the entire session as the user operates on them via the web-based interface. Think of it as something like a web UI front end to Photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once, and I need to support a few hundred concurrent users. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
Tasks performed are CPU bound
Stateful; most calls will be some sort of RPC, and the user will perform multiple actions that interact with the stateful objects held in server-side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the CPU-bound tasks from the web server to a bunch of dedicated servers that do the background processing. The question is how to best hook the 2 tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-time tasks where any worker node can simply grab a job from a queue, execute it and forget the state. My needs are a little different, since there could be multiple 'tasks' that need to be 'sticky'; for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seem to be geared towards background tasks that can be processed at any time, rather than a system like mine that has to provide user feedback.
My question is: is there an off-the-shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work on monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC so that after the first response you include the information required to get the second message to the correct destination.
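As a hedged sketch of combining the RPC pattern with that kind of routing (queue names like workspace.<id> are an assumption, not a built-in feature): give every workspace its own queue, make the worker that picked up step 1 the only consumer of that queue, and reply via the reply_to and correlation_id properties so the web tier can match responses to its requests.

```python
import json

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.example.internal")
)
channel = connection.channel()

WORKSPACE_ID = "42"
queue_name = f"workspace.{WORKSPACE_ID}"
channel.queue_declare(queue=queue_name)

def handle_task(ch, method, properties, body):
    task = json.loads(body)
    result = {"status": "done", "step": task.get("step")}  # run the CPU-bound work here
    # Reply to the queue named in reply_to, echoing correlation_id so the
    # web tier can match this response to the request it sent.
    ch.basic_publish(
        exchange="",
        routing_key=properties.reply_to,
        properties=pika.BasicProperties(correlation_id=properties.correlation_id),
        body=json.dumps(result),
    )
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=queue_name, on_message_callback=handle_task)
channel.start_consuming()
```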
0MQ has this as a basic pattern which will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (it doesn't need a broker, though devices can provide one). This may not handle stickiness by default, but it should be possible to write your own routing layer to handle it.
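For comparison, a bare-bones pyzmq sketch of that fan-out pattern (the addresses are hypothetical): PUSH hands each task to exactly one connected PULL worker, and any workspace stickiness would have to be added in your own routing layer on top, for example with ROUTER/DEALER sockets keyed by workspace id.

```python
import zmq

context = zmq.Context()

def producer():
    # Web tier: PUSH distributes each task to exactly one available worker.
    sender = context.socket(zmq.PUSH)
    sender.bind("tcp://*:5557")
    sender.send_json({"workspace": "42", "step": 1})

def worker():
    # Worker process: PULL receives its share of the fanned-out work.
    receiver = context.socket(zmq.PULL)
    receiver.connect("tcp://web-tier.example.internal:5557")
    task = receiver.recv_json()
    print("processing", task)
```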
Don't discount HTTP for this either. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can easily use their ELB in front of an auto scaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.

Thin server with application state

I need to build a web service with application state. By this I mean the web service needs to load and process a lot of data before being ready to answer requests, so a Rails-like approach, where normally you don't keep state at the application level between two requests, doesn't look appropriate.
I was wondering whether a good approach would be a daemon (using Daemon-Kit, for instance) embedding a simple web server like Thin. The daemon would load and process the initial data.
But I feel it would be better to use Thin directly (launched with Rack). In this case, how can I initialize and maintain my application state?
EDIT: There will be thousands of requests per second, so having to read the app state from files or a DB for each one is not efficient. I need to use global variables, and I am wondering what the cleanest way is to initialize and store them in a Ruby/Thin environment.
You could maintain state in a number of ways.
A database, including NoSQL databases like Memcache or Redis
A file, or multiple files
Global variables or class variables, assuming the server never gets restarted/reloaded
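The question is about Ruby/Thin, but the third option works the same way in any long-running server process: build the expensive state once when the worker boots and let request handlers only read it. A rough analogue, sketched in Python/WSGI to match the other examples on this page (the names are made up):

```python
def load_application_state():
    # Expensive one-time work: parse files, warm caches, build indexes, etc.
    return {"lookup_table": {i: i * i for i in range(1_000)}}

# Runs once per worker process at load time, not per request.
APP_STATE = load_application_state()

def app(environ, start_response):
    # Handlers only read APP_STATE; keep writes out of the hot path (or guard
    # them) since multiple threads may be serving requests concurrently.
    body = str(APP_STATE["lookup_table"][2]).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```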
