I have an EC2 Auto Scaling group with scale-out triggered by a step scaling policy. The threshold crossing seems to be detected correctly, but the resulting action is delayed by 4-5 minutes. This seems to be reflected in the alarm logs:
"newState": {
"stateValue": "ALARM",
"stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [548.0 (03/01/19 20:27:00)] was greater than the threshold (160.0) (minimum 1 datapoint for OK -> ALARM transition).",
"stateReasonData": {
"version": "1.0",
"queryDate": "2019-01-03T20:31:31.936+0000",
"startDate": "2019-01-03T20:27:00.000+0000",
"statistic": "Sum",
"period": 60,
"recentDatapoints": [
548
],
"threshold": 160
}
}
And the ASG activity log:
At 2019-01-03T20:31:31Z a monitor alarm scale-out-alarm in state ALARM triggered policy scale-out-policy changing the desired capacity from 1 to 4. At 2019-01-03T20:32:01Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 4.
Notice that the threshold was crossed at 20:27:00, while the action was taken at 20:31:31. These seem to correlate with the "startDate" and the "queryDate" in the log, although I haven't found any documentation of these properties.
Is this just a matter of random delays in CloudWatch, or is there another reason for such a delay?
The ASG hadn't been scaled for a long time before that, so the delay doesn't seem to be related to warmup/cooldown.
Both EvaluationPeriods and DatapointsToAlarm are 1.
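For reference, a minimal sketch of how the alarm could be created with the AWS CLI, using the values visible in the logs above (60-second period, Sum statistic, threshold 160, single datapoint); the alarm name matches the activity log, while the load balancer dimension and the scaling policy ARN are placeholders:
aws cloudwatch put-metric-alarm \
  --alarm-name scale-out-alarm \
  --namespace AWS/ApplicationELB \
  --metric-name RequestCount \
  --dimensions Name=LoadBalancer,Value=app/my-alb/0123456789abcdef \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 1 \
  --datapoints-to-alarm 1 \
  --threshold 160 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions <scaling-policy-arn>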
After some further investigation, it seems that the delay is much larger when the alarm is based on the ALB's RequestCount metric than when it is based on EC2 CPUUtilization. Does that make sense?
I'm trying to run k6 performance test cases using the following scenario: hit a fixed number of API requests per minute. For example: produce 500 messages per minute, then check how the API behaves. The load should be the same the next time we run the test case.
This is easily achievable with the constant-arrival-rate executor:
import http from 'k6/http';

export const options = {
  scenarios: {
    // scenario keys that start with a digit must be quoted in JavaScript
    '500_mps': {
      executor: 'constant-arrival-rate',
      duration: '10m',
      rate: 500,        // 500 iterations...
      timeUnit: '1m',   // ...per minute
      preAllocatedVUs: 10,
      maxVUs: 100,
    },
  },
};

export default function () {
  http.get('your api');
}
The above code will try to run the default function 500 times per minute for 10 minutes. It will use at most 100 VUs; if your API is too slow, k6 will not start more VUs than that and you will not reach the target load.
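As a rough sanity check (simple arrival-rate arithmetic, not something from the k6 docs): the number of VUs you need is approximately the arrival rate multiplied by the average iteration duration. 500 requests per minute is about 8.3 requests per second, so if each request takes around 2 seconds you need roughly 17 concurrent VUs; the maxVUs: 100 above leaves plenty of headroom unless responses slow down dramatically.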
I'm currently working on using parallelism in a Flux. Right now I'm having problems with backpressure. In our case we have a fast producing service we want to consume, but we are much slower.
With a normal Flux this works so far, but we want to have parallelism. What I see when I'm using the approach with
.parallel(2)
.runOn(Schedulers.parallel())
is that there is a big request at the beginning, which takes quite a long time to process. A different problem also occurs here: if we take too long to process, we somehow seem to trigger a cancel event in the producer service (we consume it via a WebFlux REST call), although no cancel event is seen on the consumer side.
But back to problem 1: how can I bring this back into sync with the producer? I know about the prefetch parameter of the .parallel() method, but it does not work as I expect.
A minimal example would be something like this:
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers
import java.time.Instant
import java.util.concurrent.atomic.AtomicInteger

// small helper that prints a timestamp and the current thread, as seen in the logs below
fun log(message: String) = println("${Instant.now()} - ${Thread.currentThread()} $message")

fun main() {
    val atomicInteger = AtomicInteger(0)
    val receivedCount = AtomicInteger(0)
    val processedCount = AtomicInteger(0)

    Flux.generate<Int> {
        it.next(atomicInteger.getAndIncrement())
        println("Emitted ${atomicInteger.get()}")
    }.doOnEach { it.get()?.let { receivedCount.addAndGet(1) } }
        .parallel(2, 1)                  // parallelism = 2, prefetch = 1
        .runOn(Schedulers.parallel())    // default prefetch (256) on the rails
        .flatMap {
            Thread.sleep(200)            // simulate slow processing
            log("Work on $it")
            processedCount.addAndGet(1)
            Mono.just(it * 2)
        }.subscribe {
            log("Received ${receivedCount.get()} and processed ${processedCount.get()}")
        }

    Thread.sleep(25000)                  // keep the JVM alive while the flux runs
}
where I can observe logs like this
...
Emitted 509
Emitted 510
Emitted 511
Emitted 512
Emitted 513
2022-02-02T14:12:58.164465Z - Thread[parallel-1,5,main] Work on 0
2022-02-02T14:12:58.168469Z - Thread[parallel-2,5,main] Work on 1
2022-02-02T14:12:58.241966Z - Thread[parallel-1,5,main] Received 513 and processed 2
2022-02-02T14:12:58.241980Z - Thread[parallel-2,5,main] Received 513 and processed 2
2022-02-02T14:12:58.442218Z - Thread[parallel-2,5,main] Work on 3
2022-02-02T14:12:58.442215Z - Thread[parallel-1,5,main] Work on 2
2022-02-02T14:12:58.442315Z - Thread[parallel-2,5,main] Received 513 and processed 3
2022-02-02T14:12:58.442338Z - Thread[parallel-1,5,main] Received 513 and processed 4
So how can I adjust this so that I use parallelism but stay in backpressure/sync with my producer? The only way I got it to work was with a semaphore acquired before the ParallelFlux and released after the work, but that is not really a nice solution.
OK, for this scenario it turned out to be crucial that the prefetch of both parallel and runOn is set very low, here to 1.
With the default of 256, we requested too much from our producer at once, so a cancel event had already occurred because of the long gap between the first block of requests (filling the prefetch buffer) and the next one, when the Flux decided to fill the buffer again.
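A minimal sketch of that change, reusing the pipeline from the example above; the prefetch value of 1 passed to both parallel and runOn is the relevant part, the rest is unchanged:
Flux.generate<Int> { it.next(atomicInteger.getAndIncrement()) }
    .parallel(2, 1)                  // parallelism = 2, prefetch = 1
    .runOn(Schedulers.parallel(), 1) // prefetch = 1 here as well (default is 256)
    .flatMap {
        Thread.sleep(200)            // simulated slow consumer
        Mono.just(it * 2)
    }
    .subscribe { log("Processed $it") }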
I'm programming a contacts export from our database to Google Contacts using the Google People API. I'm sending the requests over URL via Google Apps Script.
The code below - using https://people.googleapis.com/v1/people:batchCreateContacts - works for 13 to about 15 single requests, but then Google returns this error message:
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
For speed, I send the requests in batches of 10 parallel requests.
I have the following two questions regarding this problem:
Why, for creating contacts, would I hit a quota regarding read requests?
Given the picture link below, why would sending 2 batches of 10 simultaneous requests (more precisely: 13 to 15 single requests) hit that quota limit anyway?
quota limit of 90 read requests per user per minute as displayed on console.cloud.google.com
Thank you for any clarification!
Further reading: https://developers.google.com/people/api/rest/v1/people/batchCreateContacts
let payloads = [];
let lengthPayloads;
let limitPayload = 200;
/*Break up contacts in payload limits*/
contacts.forEach(function (contact, index) /*contacts is an array of objects for the API*/
{
if(!(index%limitPayload))
{
lengthPayloads = payloads.push(
{
'readMask': "userDefined",
'sources': ["READ_SOURCE_TYPE_CONTACT"],
'contacts': []
}
);
}
payloads[lengthPayloads-1]['contacts'].push(contact);
}
);
Logger.log("which makes "+payloads.length+" payloads");
let parallelRequests = [];
let lengthParallelRequests;
let limitParallelRequest = 10;
/*Break up payloads in parallel request limits*/
payloads.forEach(function (payload, index)
{
if(!(index%limitParallelRequest))
lengthParallelRequests = parallelRequests.push([]);
parallelRequests[lengthParallelRequests-1].push(
{
'url': "https://people.googleapis.com/v1/people:batchCreateContacts",
'method': "post",
'contentType': "application/json",
'payload': JSON.stringify(payload),
'headers': { 'Authorization': "Bearer " + token }, /*token is a token of a single user*/
'muteHttpExceptions': true
}
);
}
);
Logger.log("which makes "+parallelRequests.length+" parallelrequests");
let responses;
parallelRequests.forEach(function (parallelRequest)
{
responses = UrlFetchApp.fetchAll(parallelRequest); /* error occurs here*/
responses = responses.map(function (response) { return JSON.parse(response.getContentText()); });
responses.forEach(function (response)
{
if(response.error)
{
Logger.log(JSON.stringify(response));
throw response;
}
else Logger.log("ok");
}
);
}
);
Output of logs:
which makes 22 payloads
which makes 3 parallelrequests
ok (15 times)
(the error message)
I had raised the same issue in Google's issue tracker.
It seems that a single BatchCreateContacts or BatchUpdateContacts call consumes six (6) units of the "Critical Read Requests" quota per request. I still did not get an answer as to why we are hitting a critical read request limit when creating/updating contacts.
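If that figure is right, the arithmetic matches the observed failure point: at 6 read units per batch call, the 15 calls that succeeded account for 6 * 15 = 90 units, which is exactly the 90 per user per minute limit shown in the console, so the next call within the same minute gets rejected (and any ordinary read requests made beforehand would push the failure even earlier, to the 13th or 14th call).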
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
There are two types of quotas: project based quotas and user based quotas. Project based quotas are limits placed upon your project itself. User based quotas are more like flood protection: they limit the number of requests a single user can make over a period of time.
When you send a batch request with 10 requests in it, it counts as ten requests, not as a single batch request. If you run these in parallel, then you are definitely going to overflow the requests per minute per user quota.
Slow down, this is not a race.
Why, for creating contacts, would I hit a quota regarding read requests?
I would chalk it up to a bad error message.
Given the picture link below, why would sending 13 to 15 requests hit that quota limit anyway? ((there are 3 read requests before this code)) quota limit of 90 read requests per user per minute as displayed on console.cloud.google.com
Well, you are sending 13 * 10 = 130 requests per minute, which would exceed the requests per minute limit. There is also no way of knowing how fast your system is running; it could be going faster, and it will depend upon what else the server is doing when it gets your requests and which minute they are actually recorded in.
My advice is to just respect the quota limits and not try to understand why; there are too many variables on Google's servers to be able to track down what exactly a minute is. You could send 100 requests in 10 seconds, then try to send another 100 after 55 seconds and get the error; you could also get the error after 65 seconds, depending upon when they hit the server and when the server finished processing your initial 100 requests.
Again, slow down.
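As a sketch of what "slow down" could look like in the question's Apps Script code (an assumption about a workable pacing strategy, not an official recommendation): send the batches sequentially and pause between them with Utilities.sleep so the per-user per-minute quota is not exhausted. The 15-second pause is a guess and may need tuning against the actual quota.
parallelRequests.forEach(function (parallelRequest, index)
{
    if(index > 0) Utilities.sleep(15000); /* hypothetical pause before every batch after the first */
    let responses = UrlFetchApp.fetchAll(parallelRequest);
    responses.forEach(function (response)
    {
        let parsed = JSON.parse(response.getContentText());
        if(parsed.error)
        {
            Logger.log(JSON.stringify(parsed));
            throw parsed;
        }
        else Logger.log("ok");
    }
    );
}
);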
I have a number of servers being pinged by heartbeat. What I'm trying to figure out how to do is:
"Watcher 1" run every minute - When a server has been consistently down for a minute (no monitor.status = up for any documents for one particular address in heartbeat for the past 60 seconds of documents), send an email notification, then trigger a "watcher" to evaluate again every 24 hours.
"Watcher 2" - At the 24 hour mark, if the server has still been consistently down (no monitor.status = up for any document for that one address pinged in any of the heartbeat documents returned for that server for the past 24 hours), trigger another email.
During this "wait 24 hour" period, suspend "Watcher 1" running every minute on the server
Is something like this doable with the watcher/alerting functionality (without customization)? I've seen workflows posted before about a
simple downtime notification, but not sure if the rest of this can be performed.
I think you should be looking at the throttling option in Watcher. As per the Elasticsearch documentation:
During the watch execution, once the condition is met, a decision is
made per configured action as to whether it should be throttled. The
main purpose of action throttling is to prevent too many executions of
the same action for the same watch.
"actions" : {
"email_administrator" : {
"throttle_period": "24h",
"email" : {
"to" : "sys.admino#host.domain",
"subject" : "Encountered {{ctx.payload.hits.total}} errors",
"body" : "Too many error in the system, see attached data",
"attachments" : {
"attached_data" : {
"data" : {
"format" : "json"
}
}
},
"priority" : "high"
}
}
}
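A rough sketch of how "Watcher 1" could then be wired together, assuming a heartbeat-* index pattern and that the host can be identified by a field such as url.domain (the index name, field names, and query are assumptions, not tested configuration): the watch runs every minute, searches for "up" documents for that host in the last minute, fires when none are found, and the 24h throttle_period keeps the email from repeating every minute while the server stays down.
{
  "trigger" : { "schedule" : { "interval" : "1m" } },
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "heartbeat-*" ],
        "body" : {
          "query" : {
            "bool" : {
              "filter" : [
                { "term" : { "monitor.status" : "up" } },
                { "term" : { "url.domain" : "my-server.example.com" } },
                { "range" : { "@timestamp" : { "gte" : "now-1m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "eq" : 0 } }
  },
  "actions" : {
    "email_administrator" : {
      "throttle_period" : "24h",
      "email" : {
        "to" : "sys.admin@host.domain",
        "subject" : "Server has been down for at least one minute"
      }
    }
  }
}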
I get an index rejected execution exception during concurrent index document operations.
rejected execution of org.elasticsearch.transport.TcpTransport$RequestHandler#6d1cb827
on EsThreadPoolExecutor
[
index,
queue capacity = 200,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#5d6lae0c
[
Running,
pool size = 32,
active threads = 32,
queued tasks = 312,
completed tasks = 32541513
]
]
I tried to visit the following URL, but the index queue field is always 0:
/_cat/thread_pool?v&h=id,type,name,size,largest,active,queue_size,queue
Question 1: the queue capacity is 200, so why is the number of tasks in the queue 312 (over 200)?
Question 2: how can I view the number of tasks currently in the queue?