Is there a way to update a cached in-memory value on all running instances of a serverless function? (AWS, Google, Azure or OpenWhisk) - aws-lambda

Suppose I am running a serverless function with a global state variable that is cached in memory. Assuming the value is cached on multiple running instances, how would an update to the global state be broadcast to every serverless instance?
Is this possible in any of the serverless frameworks?

It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
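To see why duplicate requests are not a reliable broadcast, suppose (a simplifying, hypothetical model, not any provider's documented algorithm) that each request is routed to one of K instances uniformly at random and independently. The probability that N requests reach every instance at least once then follows from inclusion-exclusion: P = sum over j = 0..K of (-1)^j * C(K, j) * (1 - j/K)^N. A minimal Python sketch; the function names are illustrative only:

```python
from fractions import Fraction
from math import comb


def coverage_probability(instances: int, requests: int) -> Fraction:
    """Probability that `requests` independent, uniformly random routings hit
    every one of `instances` backends at least once (inclusion-exclusion).

    Assumes a hypothetical uniform-random load balancer.
    """
    total = Fraction(0)
    for j in range(instances + 1):
        # Alternately subtract/add the chance that a fixed set of j instances is missed.
        total += (-1) ** j * comb(instances, j) * Fraction(instances - j, instances) ** requests
    return total


def requests_needed(instances: int, target: float) -> int:
    """Smallest number of duplicate requests that reaches all instances with probability >= target."""
    n = instances
    while coverage_probability(instances, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Sending exactly one request per instance almost never covers them all.
    print(float(coverage_probability(10, 10)))  # 10!/10**10, about 0.00036
    print(requests_needed(10, 0.99))
```

Even the number returned by `requests_needed` only buys a probability, not a guarantee, and real platforms often favor reusing warm instances, which makes coverage worse than this model.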
However, there is also something built into most serverless frameworks that may stop you, even if you can open long-lived connections from each instance that allow them to be reliably messaged at least once each. To keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. And, it goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can so that the CPU it consumed can be allocated to containers not yet paused instead. They will try to tune their cluster so that containers remain unpaused for a duration as short as possible after a request/event is handled. So, this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where, instead of caching in each function instance, the caching happens in a shared service designed for fast caching, like Redis or Memcached. Each function then checks that shared caching service for the data before retrieving it from its source. Many cloud providers that offer serverless compute also offer managed caching services like these, so you can often deploy everything in the same place.
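The shared-cache approach is usually called cache-aside. Here is a minimal Python sketch; the `SharedCache` class is a hypothetical in-process stand-in for an external Redis or Memcached service (a real deployment would use a client library for that service), and `load_from_source` is a placeholder for your own data access:

```python
from typing import Callable, Dict, Optional


class SharedCache:
    """In-process stand-in for a shared cache service such as Redis or Memcached.

    In a real deployment every function instance would connect to the same
    external cache; a dict is used here only so the sketch runs on its own.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_cache_aside(cache: SharedCache, key: str, load_from_source: Callable[[str], str]) -> str:
    """Check the shared cache first; on a miss, load from the source and populate the cache."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_source(key)
    cache.set(key, value)
    return value
```

With this design an update is a single `cache.set(key, new_value)` against the shared service, and every function instance sees it on its next read, so nothing has to be broadcast.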
Alternatively, if you don't switch to a stateless design, you could switch from a push model for caching to a pull model. Instead of having updates pushed out to each function instance to refresh its cached data, each function instance would pull fresh data from the source when it detects that the data stored in its memory has expired.
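A pull model can be as simple as a time-to-live check on the in-memory copy. This minimal Python sketch assumes a module-level dictionary (which survives between invocations only while the same instance stays warm) and a hypothetical `load_from_source` callable; the `clock` parameter exists only so the behavior can be tested:

```python
import time
from typing import Callable, Dict, Tuple

# Module-level (global) state: it survives between invocations only while
# the same function instance stays warm.
_local_cache: Dict[str, Tuple[str, float]] = {}


def get_fresh(
    key: str,
    load_from_source: Callable[[str], str],
    max_age_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the in-memory value, re-pulling it once it is older than max_age_seconds."""
    now = clock()
    entry = _local_cache.get(key)
    if entry is not None:
        value, fetched_at = entry
        if now - fetched_at < max_age_seconds:
            return value  # still fresh: no network call
    value = load_from_source(key)  # expired or missing: pull a fresh copy
    _local_cache[key] = (value, now)
    return value
```

The trade-off is bounded staleness: each instance may serve a value up to `max_age_seconds` old, but no instance needs to be reachable from outside.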

Related

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
Are all Service Fabric service instances single-threaded? I created a stateless Web API service with one instance and a method that did a Task.Delay and then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: Thinking about this, it is probably to do with the setup of OWIN Web API. Could it be blocking by session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (that can take several hours). Is there a recommended pattern that I can use for this in service fabric? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part so I will comment on the second part: "long-running operations".
Long-running operations/workflows were being handled long before Service Fabric came about. For this reason, we can stand on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill. If it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure supporting the chosen design pattern.
I hope that helps
With regard to your second point: it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources and eventually comes back with a result (A)? Or is it the kind of long-running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long running workloads (A), but more for writing horizontally-scalable, highly-available systems.
If you were absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break those long-running tasks down so they could be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so that it is interruptible and its state is persisted in a way that lets it be resumed from another node in the cluster.
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself the future pain if possible.
To the first point: this is purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.

Why does my Redis key show up only minutes after being stored?

I have a handler function on AWS Lambda that is connecting to a Redis instance to store a single key in the cache. The function has completed successfully but the key in Redis shows up minutes (or more) after the fact.
This behavior is observable on both Heroku Redis and Redis Cloud, they're both hosted solutions.
I can't for the life of me figure out what's causing this lag. My Redis knowledge is practically zero, I know how to store a list using LPUSH and how to trim that list using LTRIM.
The writer to Redis uses this Node client while I observe the lag using redis-cli on my local machine.
Is it common to experience this kind of lag in the setup I describe? What can I do to debug this?
I'm purposefully ignoring most of the information in the question and would like to refer only to the alleged symptom, namely that
key show up only minutes after being stored
This behavior is impossible with Redis: given Redis' design, any change to the data is immediately visible. That said, the only scenario in which what you're describing could be remotely possible is when you're writing to a Redis master server and reading from a very badly lagged slave. I can assure you that this is not the case with Redis Cloud, however.
The main reason is that the Lambda container starts to sleep as soon as your function terminates, and the Redis client you are using exposes only asynchronous APIs.
Note that the API is entirely asynchronous. To get data back from the server, you'll need to use a callback.
I'm assuming that the asynchronous SET is the last action performed in your Lambda function. Once that is called, the underlying Lambda container goes to sleep, and most likely, the actual SET action hasn't finished its job yet. Therefore, the record will not show in Redis until the exact same Lambda container was called to execute your function again, and finished the job that it was supposed to finish on the last execution. This is probably the lag that you are experiencing.
To test whether this is true, add a sleep of a couple of seconds at the end of your function to keep the Lambda container from going to sleep immediately, and see if the lag is still there.
I would also recommend not using asynchronous APIs inside Lambda functions. They add state to your Lambda computation, which AWS itself advises against in the Lambda documentation.
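If you do use an asynchronous client, the key is to wait for the write to complete before the handler returns. This Python sketch (the question used a Node client) contrasts the two cases. `FakeAsyncRedis` is a hypothetical stand-in that simulates network latency; and `asyncio.run`, which cancels any tasks still pending when the handler returns, is only an analogy for the container being frozen, not a description of how Lambda works internally:

```python
import asyncio
from typing import Dict, List


class FakeAsyncRedis:
    """Hypothetical stand-in for an asynchronous Redis client."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}

    async def lpush(self, key: str, value: str) -> None:
        await asyncio.sleep(0.01)  # simulated network round trip
        self.lists.setdefault(key, []).insert(0, value)


async def handler_fire_and_forget(client: FakeAsyncRedis) -> str:
    # Problem: the write is scheduled, but the handler returns before it finishes.
    asyncio.create_task(client.lpush("events", "hello"))
    return "ok"


async def handler_awaited(client: FakeAsyncRedis) -> str:
    # Fix: wait until the write has completed before returning.
    await client.lpush("events", "hello")
    return "ok"


if __name__ == "__main__":
    lost = FakeAsyncRedis()
    asyncio.run(handler_fire_and_forget(lost))
    print(lost.lists)  # {}: the pending write was cancelled when the handler returned

    saved = FakeAsyncRedis()
    asyncio.run(handler_awaited(saved))
    print(saved.lists)  # {'events': ['hello']}
```

Awaiting the write (or, in a callback-style client, returning only from the write's callback) is more robust than the sleep used for the diagnostic test, because it waits exactly as long as the write takes.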

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory across the entire session as the user operates on them via the web-based interface. Think of it as something like a web UI frontend to Photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each server instance can only support around 4-8 "workspaces" at once, and I need to support a few hundred concurrent users. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
Tasks performed are CPU bound
Stateful, most calls will be some sort of RPC, the user will make multiple actions that interact with the stateful objects held in server side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser, which then sends the CPU-bound tasks to a bunch of dedicated servers that do the background processing. The question is how best to hook the two tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-time tasks, where any worker node can simply grab a job from a queue, execute it and forget the state. My needs are a little different, since there could be multiple "tasks" that need to be "sticky"; for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seems to be geared towards background tasks that can be processed anytime rather than a system that has to provide user feedback that I'm dealing with.
My question is, is there an off the shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work on monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too much. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC, so that after the first response you include the information required to get the second message to the correct destination.
0MQ has this as a basic pattern which will fanout work as needed. I've only played with this but it is simpler to code and possibly simpler to maintain (as it doesn't need a broker, devices can provide one though). This may not handle stickiness by default but it should be possible to write your own routing layer to handle it.
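A custom routing layer for stickiness mostly needs a table from workspace to worker. This minimal Python sketch is hypothetical and transport-agnostic (it would sit in front of RabbitMQ, 0MQ or plain HTTP calls): the first task for a workspace goes to the least-loaded worker, later tasks for that workspace go to the same worker, and a per-worker cap mirrors the 4-8 workspaces per server from the question. If there are several frontends, the table itself would have to live somewhere they all share, which this sketch does not handle:

```python
from typing import Dict, List


class StickyRouter:
    """Hypothetical routing layer for stateful workspaces."""

    def __init__(self, workers: List[str], max_workspaces_per_worker: int) -> None:
        self._load: Dict[str, int] = {w: 0 for w in workers}
        self._owner: Dict[str, str] = {}
        self._capacity = max_workspaces_per_worker

    def route(self, workspace_id: str) -> str:
        owner = self._owner.get(workspace_id)
        if owner is not None:
            return owner  # sticky: this worker holds the workspace in memory
        worker = min(self._load, key=self._load.__getitem__)  # least-loaded worker
        if self._load[worker] >= self._capacity:
            # Every worker is full: a signal to add capacity (e.g. scale out).
            raise RuntimeError("no worker has room for another workspace")
        self._owner[workspace_id] = worker
        self._load[worker] += 1
        return worker

    def close(self, workspace_id: str) -> None:
        """Free the worker slot when the user's session ends."""
        worker = self._owner.pop(workspace_id, None)
        if worker is not None:
            self._load[worker] -= 1
```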
Don't discount HTTP for this as well. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can use their ELB easily in front of an autoscaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.

Policy for EC2 and ELB based on number of transcoding processes on each instance

I need to transcode a massive number of audio files on a series of auto-scaling instances behind an ELB. The core of the transcoding script is based on Node.js and FFmpeg. Queuing is impossible because users are not patient! I need to control the number of transcodings on each instance to avoid the CPU hitting 100%.
My questions:
A- Is there any way to define a policy for ELB to control the number of connections to each instance? If not, is there any parameter to control the average CPU utilization on each instance and add a new instance once a trigger level is reached? (I have found this slide, but it is not complete.) If it adds a new instance on the fly, how long does it take for the new instance to be fully operational and serving users (i.e. does auto scaling have long latency)?
B- Is there an alternative architecture to achieve the same transcoding solution? (I have included my current idea in this question as a drawing.) I cannot use third-party solutions like Transcoding.com; I need my own native solution.
C- I use Node.js on each instance and show progress in the user's browser over a socket. From the browser side I regularly send AJAX requests to the Node.js side to get the progress information. Does this mechanism have a problem with sticky sessions?
Thank you.
If your scaling needs to take place in response to individual requests on the server (i.e. a single request would require X number of machines to execute in the desired timeframe), then autoscaling is probably not the answer for you, as there will be a delay while the new instances become active. You could also face much higher costs running the service this way, since you might scale up and down a number of times in response to individual requests, being charged the one-hour minimum for each instance that is started.
If, however, you want autoscaling to, for example, increase your fleet by 50% during peak times when you get spikes in request volume (i.e. you already have many servers serving many requests, but you need to keep latency down during peak hours by adding more instances), then autoscaling should work just fine for you.
There are any number of triggers you can configure to control scaling events in such a case.
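Scaling triggers control the size of the fleet; the question's other requirement, capping concurrent transcodes on each instance, can be handled inside the instance itself. This is a minimal Python sketch of that idea using a non-blocking semaphore (in Node.js the equivalent would be a counter checked before spawning FFmpeg). The class name and the "busy" response are hypothetical choices, not an ELB feature:

```python
import threading
from typing import Callable


class TranscodeSlots:
    """Caps the number of transcodes running at once on a single instance."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run_if_free(self, job: Callable[[], None]) -> bool:
        """Run `job` if a slot is free; return False immediately if the instance is busy."""
        if not self._slots.acquire(blocking=False):
            return False  # caller can answer "busy" (e.g. HTTP 503) instead of overloading the CPU
        try:
            job()
        finally:
            self._slots.release()  # free the slot even if the job fails
        return True
```

Rejecting work at a fixed cap keeps each instance's CPU below saturation, and a sustained rate of rejections or high average CPU is then a natural input to the scaling triggers mentioned above.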
ELB does support session affinity ("sticky" sessions).
You will want to use an AWS SDK. Normally you'd use one of the official ones for C#, Ruby etc. Since you're on node.js, try using this SDK on github to monitor, throttle and create instance connection pools etc.
https://github.com/awssum/awssum
there's also AWS2JS
https://github.com/SaltwaterC/aws2js

Thin server with application state

I need to build a webservice with application state. By this I mean the webservice needs to load and process a lot of data before being ready to answer requests, so a Rails-like approach where normally you don't keep state at the application level between two requests doesn't look appropriate.
I was wondering if a good approach was a daemon (using Daemon-Kit for instance) embedding a simple web server like Thin. The daemon would load and process the initial data.
But I feel it would be better to use Thin directly (launched with Rack). In this case, how can I initialize and maintain my application state?
EDIT: There will be thousands of requests per second, so reading the app state from files or a DB on each one is not efficient. I need to use global variables, and I am wondering what is the cleanest way to initialize and store them in a Ruby/Thin environment.
You could maintain state a number of ways.
A database, including NoSQL databases like Memcache or Redis
A file, or multiple files
Global variables or class variables, assuming the server never gets restarted/reloaded
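The third option boils down to loading the expensive state once per process and then serving every request from memory. The question is about Ruby/Thin; this is a language-neutral Python sketch of the same idea, with a hypothetical `load` callable standing in for the initial data processing and a lock so that concurrent first requests do not load the data twice:

```python
import threading
from typing import Callable, Dict, Optional

_app_state: Optional[Dict[str, int]] = None
_app_state_lock = threading.Lock()


def get_app_state(load: Callable[[], Dict[str, int]]) -> Dict[str, int]:
    """Load the expensive state once per process, then serve it from memory."""
    global _app_state
    if _app_state is None:
        with _app_state_lock:
            if _app_state is None:  # another thread may have loaded it while we waited
                _app_state = load()
    return _app_state
```

As the list above notes, this state lives only as long as the process: a restart or reload triggers a fresh load, and a server running several worker processes holds one copy per process.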
