Thin server with application state - ruby

I need to build a webservice with application state. By this I mean the webservice needs to load and process a lot of data before being ready to answer requests, so a Rails-like approach where normally you don't keep state at the application level between two requests doesn't look appropriate.
I was wondering if a good approach was a daemon (using Daemon-Kit for instance) embedding a simple web server like Thin. The daemon would load and process the initial data.
But I feel it would be better to use Thin directly (launched with Rack). In this case, how can I initialize and maintain my application state?
EDIT: There will be thousands of requests per second, so reading the app state from files or a DB on each one is not efficient. I need to use global variables, and I am wondering what is the cleanest way to initialize and store them in a Ruby/Thin environment.

You could maintain state a number of ways.
A database, including NoSQL databases like Memcache or Redis
A file, or multiple files
Global variables or class variables, assuming the server never gets restarted/reloaded
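For the global-variable approach in a plain Rack/Thin setup, one way is to load everything into a class-level store at the top of `config.ru`, before the server starts accepting requests. A minimal sketch (the `AppState` class and its data-loading logic are illustrative, not from any specific library):

```ruby
# Hold application state at class level so a bare Rack app (run under
# Thin via `thin start -R config.ru`) can read it on every request.
class AppState
  class << self
    attr_reader :data

    # Do the expensive load/processing once, before the server
    # accepts requests (e.g. at the top of config.ru).
    def load!
      @data ||= expensive_initialization
    end

    private

    def expensive_initialization
      { ready: true }   # stand-in for the real data processing
    end
  end
end

AppState.load!

# Each request reads the in-memory state directly, with no file or
# database access. In config.ru you would then write: run APP
APP = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, ["ready=#{AppState.data[:ready]}"]]
end
```

Since the state lives in the process, this only works as long as the Thin process isn't restarted and you run a single process (or accept that each process holds its own copy).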

Related

Is there a way to update cached in-memory value on all running instance of a serverless function? (AWS,Google,Azure or OpenWhisk)

Suppose I am running a serverless function with a global state variable that is cached in memory. Assuming the value is cached on multiple running instances, how would an update to the global state be broadcast to every serverless instance?
Is this possible in any of the serverless frameworks?
It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
However, there is also something built into most serverless frameworks that may stop you, even if you can open long-lived connections from each instance that allow each of them to be reliably messaged at least once. To help keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. And, it goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can so that the CPU it consumed can be allocated to containers not yet paused instead. They will try to tune their cluster so that containers remain unpaused for a duration as short as possible after a request/event is handled. So, this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where instead of caching occurring in each function instance, it occurs in a shared service designed for fast caching, like Redis or Memcached. Then each function can check that shared caching service for the data before retrieving it from its source. Many cloud providers who provide serverless compute options also provide managed databases like these. So you can often deploy it all to the same place.
Alternatively, if you can't move to a stateless design, you could switch to a pull model for caching instead of a push model. Instead of having updates pushed out to each function instance to refresh its cached data, each function would pull fresh data from the source whenever it detects that the data held in its memory has expired.
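The pull model described above can be sketched as a small TTL cache inside each function instance (the class and method names here are hypothetical):

```ruby
# Each instance caches a value with a time-to-live and re-fetches
# from the source when it expires; no broadcast to instances needed.
class TtlCache
  def initialize(ttl_seconds, &fetcher)
    @ttl = ttl_seconds
    @fetcher = fetcher     # block that pulls fresh data from the source
    @value = nil
    @fetched_at = nil
  end

  def value
    if @fetched_at.nil? || (Time.now - @fetched_at) > @ttl
      @value = @fetcher.call      # pull fresh data on expiry
      @fetched_at = Time.now
    end
    @value
  end
end

# Usage: each instance refreshes independently on its own schedule.
config = TtlCache.new(30) { { feature_enabled: true } } # stand-in fetch
```

Each instance may serve slightly stale data for up to one TTL, which is the trade-off for not needing a push channel.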

avoid persisting memory state between requests in FastAPI

Is there a way to deploy a FastAPI application so that memory state cannot be persisted between requests? The goal is to avoid leaking any data between requests in a multi-tenant application.
Starting up the application from scratch for every request seems not feasible since it takes too long. Is there a way in which the application is launched for every instance of the service but individual requests are handled by workers or threads that get purged after the request is handled so that any static property, singleton instance and such is destroyed and the next request is handled with clean memory?
FastAPI is basically stateless by default. It actually takes extra work to persist data across requests, through mechanisms such as connection pooling, reading a value from Redis, and so on. If you consider things such as starting up the server, loading a configuration, setting up path redirects, and so on to be "state", then FastAPI will not work for your purposes.
When you say "memory state", it sounds like you are trying to partition instances of the FastAPI server off from each other so that they do not even use the same memory. This is not going to be a viable solution because most web servers, FastAPI included, are not designed for this type of segregation. By default, the requests from one tenant will not have anything to do with the requests from another tenant unless you write additional code that relates them; so separating the concerns of the different tenants is a matter for the programmer, not the server's memory.
Instead, if you absolutely cannot let requests from multiple tenants inhabit the same memory, you'd be better off giving different tenants their own subdomain at the DNS level. Spin up a VPS and an instance of your FastAPI program for each of them. That will truly prevent the requests from one tenant from sharing any memory or state with the others.

How to allow sinatra poll for data smartly

I am wanting to design an application where the back end is constantly polling different sensors while the front end (sinatra) allows for this data to be viewed either via json api, or by simply displaying the results in html.
What considerations should I take into account when developing such an application, and how should I structure it for best scaling and ease of maintenance?
My first thought is to simply let sinatra poll the sensors every time it receives a request to the proper end points, but this seems like it could bog down quite fast, especially since some sensors only update themselves every couple of seconds.
My second thought is to have a background process (or thread) poll the sensors and store the values for sinatra. When a request is received sinatra can then simply poll the background process for a cached value (or pull it from the threaded code) and present it to the client.
I like the second thought more, but I am not sure how I would develop the "background application" so that sinatra could poll it for data to present to the client. The other option would be for sinatra to thread the sensor polling code so that it can simply grab values from it inside the same process rather than requesting it from another process.
Do note that this application will also be responsible for automation of different relays and such based off the sensors; sinatra is only responsible for relaying the status of the sensors to the user. I think separating the backend (automation + sensor information) into a background process/daemon from the frontend (sinatra) would be ideal, but I am not sure how I would fetch the data for sinatra.
Anyone have any input on how I could structure this? If possible I would also appreciate a sample application that simply displays the idea that I could adopt and modify.
Thanks
Edit:
After a bit more research I have discovered drb (distributed ruby http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html) which allows you to make remote calls on objects over the network. This may be a suitable solution to this problem as the daemon can automate the relays, read the sensors and store the values in class objects, and then present the class objects over drb so that sinatra can call the getters on the remote object to obtain up to date data from the daemon. This is what I initially wanted to attempt to do.
What do you guys think? Is this advisable for such an application?
I have decided to go with Sinatra, DRB, and Daemons to meet the requirements I have stated above.
The web front end will run in its own process and only serve up statistical information via DRB interactions with the backend. This will allow quick response times for the clients and allow me to separate front end code from backend code.
The backend will run in its own process and constantly poll the sensors for updates and store them as class objects with getters so that Sinatra can fetch the information over DRB when required. It will also use the gathered information for automation that is project specific.
Finally the backend and frontend will be wrapped with a Daemons wrapper so that the project will have the capabilities of starting, restarting, stopping, run status, and automatic restarting of the Daemons if it crashes or quits for what ever reason.
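The DRb layout above can be sketched roughly as follows. The class names, URI, and sensor logic are all illustrative, and in practice the two halves live in separate processes (the daemon and the Sinatra app):

```ruby
require 'drb/drb'

# --- backend process (the daemon) ---
# The polling loop writes sensor readings into this store; Sinatra
# calls its getters remotely over DRb.
class SensorStore
  def initialize
    @readings = {}
    @mutex = Mutex.new    # DRb serves requests on its own threads
  end

  def update(name, value)
    @mutex.synchronize { @readings[name] = value }
  end

  def reading(name)
    @mutex.synchronize { @readings[name] }
  end
end

store = SensorStore.new
store.update(:temperature, 21.5)   # the polling loop would do this

DRb.start_service('druby://localhost:8787', store)

# --- frontend process (Sinatra) would do something like ---
remote = DRbObject.new_with_uri('druby://localhost:8787')
remote.reading(:temperature)   # => 21.5, fetched over DRb
```

Because DRb serves each call on its own thread, guarding the shared hash with a mutex keeps the store safe when the polling loop and Sinatra hit it concurrently.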
Source information:
http://phrogz.net/drb-server-for-long-running-web-processes
http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html
http://www.sinatrarb.com/
https://github.com/thuehlinger/daemons

Best way to initialize initial connection with a server for REST calls?

I've been building some apps that connect to a SQL backend. I use ajax calls to hit WebMethods, a WebAPI, etc.
I notice that the first call to the SQL backend retrieves the data fairly slowly. I can only assume this is because it must negotiate credentials before retrieving the data. It probably caches this somewhere, so any calls made afterwards come back very fast.
I'm wondering if there's an ideal, or optimal way, to initialize this connection.
My thought was to make a simple GET call right when the page loads (grabbing something very small, like a single entry). I probably wouldn't be using the returned data in any useful way, other than to ensure that any calls afterwards come back faster.
Is this an okay way to approach fixing the initial delay? I'd love to hear how others handle this.
Cheers!
There are a number of reasons that your first call could be slower than subsequent ones:
Depending on your server platform, code may be compiled when first executed
You may not have an active DB connection in your connection pool
The database may not have cached indices or data on the first call
Some VM platforms may take a while to allocate sufficient resources to your server if it has been idle for a while.
One way I deal with those types of issues on the server side is to add startup code to my web service that fetches data likely to be used by many callers when the service first initializes (e.g. lookup tables, user credential tables, etc).
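That start-up prefetch can be sketched as follows (shown in Ruby for illustration; the cache keys and loader methods are stand-ins for the real DB queries):

```ruby
# Warm caches once during service initialization, before the first
# client request arrives. Running the loaders here also forces the
# connection pool to open its first DB connection early.
class WarmCache
  def initialize
    @store = {}
  end

  # Called once at service start-up.
  def warm!
    @store[:lookup_tables] = load_lookup_tables
    @store[:user_roles]    = load_user_roles
  end

  def [](key)
    @store[key]
  end

  private

  # Stand-ins for the real database queries.
  def load_lookup_tables
    { countries: %w[US FR JP] }
  end

  def load_user_roles
    { admin: 1, viewer: 2 }
  end
end

cache = WarmCache.new
cache.warm!   # subsequent requests read from memory
```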
If you only control the client, consider that you may well wish to monitor server health (I use the open source monitoring platform Zabbix. There are also many commercial web-based monitoring solutions). Exercising the server outside of end-user code is probably better than making an extra GET call from a page that an end user has loaded.

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU bound background processing. The application is also stateful on the server side and the domain objects needs to be held in memory across the entire session as the user operates on it via the web based interface. Think of it as something like a web UI front end to photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once, and I need to support a few hundred concurrent users. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
Tasks performed are CPU bound
Stateful, most calls will be some sort of RPC, the user will make multiple actions that interact with the stateful objects held in server side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the CPU-bound tasks from the web server to a bunch of dedicated servers that do the background processing. The question is how to best hook up the two tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-off tasks where any worker node can simply grab a job from a queue, execute it and forget the state. My needs are a little different since there could be multiple 'tasks' that need to be 'sticky'; for example, if step 1 is started on node 1 then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seems to be geared towards background tasks that can be processed anytime rather than a system that has to provide user feedback that I'm dealing with.
My question is, is there an off the shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC so that after the first response you include the information required to get the second message to the correct destination.
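One simple way to get the stickiness described in the question is deterministic queue selection: hash the workspace id to a fixed per-worker queue so every step for that workspace reaches the same worker. A sketch (the queue naming scheme and worker count are illustrative):

```ruby
require 'digest'

WORKER_COUNT = 4   # one queue per dedicated worker process

# Map a workspace id to a stable queue name. The same workspace
# always hashes to the same queue, so step 2 lands on the worker
# that handled step 1 and still holds the in-memory state.
def queue_for(workspace_id)
  index = Digest::MD5.hexdigest(workspace_id.to_s).to_i(16) % WORKER_COUNT
  "workspace_tasks.worker#{index}"
end

queue_for(42)   # every task for workspace 42 goes to this queue
```

The trade-off is that a hot workspace can't be rebalanced away from its worker mid-session, and changing WORKER_COUNT remaps existing workspaces, so it fits best when sessions are short-lived relative to scaling events.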
0MQ has this as a basic pattern which will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (it doesn't need a broker, though devices can provide one). This may not handle stickiness by default, but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this as well. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can use their ELB easily in front of an autoscaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.
