What is wrong in my amazon autoscalling? - amazon-ec2

I have an EC2 instance that I want to scale based on the number of messages in a SQS queue. If there are many messages (for 5 minutes) I want to pop up a new EC2, for consuming faster the messages. Then if the messages are few (for 5 minutes), I want to pop down the oldest EC2. This way, if the service that consumes the messages stops for some reason, I will terminate the old EC2, and the service will run.
I have created an AutoScalling for this. I have set the TerminationPolicy to OldestInstance, but it works as I expect only if I set just one zone (eg: eu-west-1a): it creates a new instance and terminates the oldest each time. But if I have 3 regions (eu-west-1a, eu-west-1b, eu-west-1c), it just launches and terminates the instances not in the OldestInstance manner. Or, at least, not as I expect: delete the oldest every time. Is there something linked to different zones? On this pace I have not found anything about it, except for the default policy.
And even if the case linking to multiple zones from default policy is applied, I can have maximum only 2 instances that turn at the same time. And they are always launched in a new zone.

This is probably the key paragraph:
When you customize the termination policy, Auto Scaling first assesses the Availability Zones for any imbalance. If an Availability Zone has more instances than the other Availability Zones that are used by the group, then Auto Scaling applies your specified termination policy on the instances from the imbalanced Availability Zone. If the Availability Zones used by the group are balanced, then Auto Scaling selects an Availability Zone at random and applies the termination policy that you specified.
I interpret this to mean that if you have instances in multiple zones, and those zones are already balanced, then AWS will select a zone at random AND THEN pick the oldest instance, within the randomly selected zone - it won't pick the oldest instances across AZ's, it picks and random AZ and then the oldest instance is terminated within that AZ.

Related

Concurrency within a single lambda#edge instance?

My understanding is that a single lambda#edge instance can only handle one request at a time, and AWS will spin up new instances if all existing instances are serving a request.
My lambda has a heavy instance startup cost (~2 seconds) but a very light execution cost. It triggers on viewer requests, which always come in batches of ~20 (loading a single-page application). This means one user loading the app, on a cold start, will start ~20 lambda instances and take ~2 seconds.
But due to the very light execution cost, a single lambda instance could handle all 20 requests and it would still take only ~2 seconds.
An extra advantage is, since each instance connects to a 3rd party service on startup, there would be only 1 open connection instead of 20.
Is this possible?
Lambda#edge doesn’t support reserved nor provisioned concurrency.
Here is the link to the documentation for reference: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-restrictions.html#lambda-at-edge-function-restrictions
That being said, with 2s cold start, you might consider using standard lambda.
Also, can’t you reduce that cold start somehow?

What can cause a Cloud Run instance to not be reused despite continuous load?

Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput/ decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my cloud run app.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a spring-boot app to cloud run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold-start), handle a single request, die, and then get started again. They are not being reused as described in the docs, and as is happening when I set max-instances to 1. Official docs on concurrency
Instead, I expect 3 container instances to be started, which then each requests according to max-concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances is fluctuating wildly, once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instaces=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some month ago, in private Alpha, I performed tests and I observed the same behavior. After discussion with Google team, I understood that instances are over provisioned "in case of": an instances crashes, an instances is preempted, the traffic suddenly increase,...
The trade-off of this is that you will have more cold start that your max instances values. Worse, you will be charged for this over provisioned cold start -> this is not an issue because Cloud Run has a huge free tier that covers this kind of glitches.
Going deeper in the logs (you can do it by creating a sink of Cloud Run logs into BigQuery and then by requesting them), even if there is more instances up than your max instances, only your max instances are active in the same time. I'm not sure to be clear. With your parameters, that means, if you have 5 instances up in the same time, only 3 serve the traffic at the same point of time
This part is not documented because it evolves constantly for find the best balance between over-provisioning and lack of ressources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs, and will try to scale to the amount. If a sudden burst of requests occur, Cloud Run will instantiate a larger number of instances as a response. This is done in order to adapt to a possible higher number of network requests beyond what it is currently serving, with attempts to take into consideration the length of time it will take for the existing instance to complete loading the request. Per the documentation, it is possible that the amount of container instances can go above the max instance value when it spikes.
You mentioned with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing behavior of 429s as well as the instances spiking could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting, because of the cold start time you mention, when instances are serving the first request(s), by design, the number of concurrent requests is actually hard set to 1. Once things are fully ready,only then the concurrency setting you have chosen is applied.
Was there some specific reason you chose 3 and 3 for Max Instance settings and concurrency? Also how was the concurrency set when you had max instance set to 1? Perhaps you could try tinkering up further the concurrency (max 80) and /or Max instances (high limit up to 1000) and see if that removes the 429s.

Spring and scheduled tasks on different Data Centers

I have one spring scheduler , which I will be deploying in 2 different data center.
My data centers will be in active and passive mode. I am looking for a mechanism where passive data center scheduler start working where that data center become active .
We can do it using manually changing some configurations to true/false but , I am looking for a automated process.
-Initial state:
Data center A active - Scheduler M is running.
Data center B passive - Scheduler M is turned off.
-May be after 3 days.
Data center A passive - Scheduler M turned off.
Data center B active - Scheduler M is starting
I don't know your business requirements but unless you want multiple instances running but only one active, the purpose you will have a load balancer would be to spread the load to multiple instances of the same application rather to stick with only one instance.
Anyway I think an easy way of doing this without using a very sophisticated mechanism (coming with a lot of complexity depending where you run your application) would be this:
Have shared location such as a semaphore table in your database storing the ID of the application instance owning the scheduler process
Have a timeout set for each task. Say if the scheduler is supposed to run every two minutes set the timeout to two minutes.
Have your schedulers always kick off on all application instances
Once the tasks kicks off first check if it is the one owning the processing. If yes do the work, if not go at point 7.
After doing the work record the time stamp of the task completion in the semaphore table
Wait for the time to pass for the next kick off
If not the one owning the processing check when the task last run in the semaphore table. If the time since last run is greater than the timeout set for that process take the ownership of the process (recording your application instance id in the semaphore table)
We applied this and it ran very well with one of our applications. In reality it was much more complex than explained above as we had a lot of application instances and we had to avoid starting an ownership battle between them. To address this we put in place a "Permission to process request" concept so no matter how many instances wanted to take control it was only one which was granted.
For another application with similar requirements we used a much much easier way to achieve this but the price we paid was some extra learning curve in using ILock from Hazelcast IMGB framework. That is really very easy but keep in mind the Hazelcat community edition comes with absolutely no security and paying for a Hazelcast license just to achieve this may be a bit of expense.
Again all depends on you use case, for us the semaphore table was good enough in first scenario but prove bad in the second one as the multiple processes trying to update the same table at the same time ended up with a lot of database contention which took us to Hazelcast.
Other ideas would be a custom health check implementation that could trigger activating one scheduler or the other depending of response received.
Hope that helps, just ideas from our experience. Good luck.

Update operation concurrency on multiple nodes

I have a single application , maintained on two different nodes on cloud. I have a scheduler in the application which triggers every 5 minutes, which perform some update operation in database. How can I avoid the two operations to cause anomaly in database. Is there a way one application may know, that other instance is already been triggered or any sort of inter node communication that may happen in cloud foundry.
Many Thanks
A couple options come to mind for Cloud Foundry:
Create a distributed "lock" with your database. This could be as simple as a table or record in the DB that the scheduler checks out first before it does anything else. Once it has the lock, the scheduler can work. If it fails to obtain the lock, it goes back to sleep. Then when it's done, it returns the lock.
If you have lots of work to do, you could divide it into sections and have locks for each section, that way you could spread the work out across your different instances. This gets more complicated though, so you'd have to weigh the advantages against the extra complication to see if it's worth it for your use case.
Only run the scheduler on the first node. You can determine the first node by looking at your application instance number. Either the env variable CF_INSTANCE_INDEX or VCAP_APPLICATION, which contains JSON and has an instance_index property. For either option, the value will be 0 for the first instance. If it's 0, the scheduler runs. If it's greater than zero, the scheduler doesn't run.
Hope that helps!

Aerospike's behavior on EC2

In my test setup on EC2 I have done following:
One Aerospike server is running in ZoneA (say Aerospike-A).
Another node of the same cluster is running in ZoneB (say Aerospike-B).
The application using above cluster is running in ZoneA.
I am initializing AerospikeClinet like this:
hosts= new Host[];
hosts[0] = new Host(PUBLIC_IP_OF_AEROSPIKE-A, 3000);
AerospikeClient client = new AerospikeClient(policy, hosts);
With above setup I am getting below behavior:
Writes are happening on both Aerospike-A and Aerospike-B.
Reads are only happening on Aerospike-A (data is around 1million records, occupying 900MB of memory and 1.3 GB of disk)
Question: Why are reads not going to both the nodes?
If I take Aerospike-B down, everything works perfectly. There is no outage.
If I take Aerospike-A down, all the writes and reads start failing. I've waited for 5 mins for other node to take traffic but it didn't work.
Questions:
a. In above scenario, I would expect Aerospike-B to take all the traffic. But this is not happening. Is there anything I am doing wrong?
b. Should I be giving both the hosts while initializing the client?
c. I had executed "clinfo -v 'config-set:context=service;paxos-recovery-policy=auto-dun-all'" on both the nodes. Is that creating a problem?
In EC2 you should place all the nodes of the cluster in the same AZ of the same region. You can use the rack awareness feature to set up nodes in two separate AZs, however you will be paying with a latency hit on each one of your writes.
Now what your seeing is likely due to misconfiguration. Each EC2 machine has a public IP and a local IP. Machines sitting on the same subnet may access each other through the local IP, but a machine from a different AZ cannot. You need to make sure your access-address is set to the public IP in the event that your cluster nodes are spread across two availability zones. Otherwise you have clients which can reach some of the nodes, lots of proxy events as the cluster nodes try to compensate and move your reads and writes to the correct node for you, and weird issues with data upon nodes leaving or joining the cluster.
For more details:
http://www.aerospike.com/docs/operations/configure/network/general/
https://github.com/aerospike/aerospike-client-python/issues/56#issuecomment-117901128

Resources