Sentry Rate Limiting - aws-lambda

We have a couple of AWS Lambda functions that use the same Sentry client key for error reporting. Recently we started to receive rate limiting errors:
{
  "errorType": "SentryError",
  "errorMessage": "HTTP Error (429)",
  "name": "SentryError",
  "stack": [
    "SentryError: HTTP Error (429)",
    "    at new SentryError (/var/task/node_modules/@sentry/utils/dist/error.js:9:28)",
    "    at ClientRequest.<anonymous> (/var/task/node_modules/@sentry/node/dist/transports/base/index.js:212:44)",
    "    at Object.onceWrapper (events.js:421:26)",
    "    at ClientRequest.emit (events.js:314:20)",
    "    at ClientRequest.EventEmitter.emit (domain.js:483:12)",
    "    at HTTPParser.parserOnIncomingClient (_http_client.js:601:27)",
    "    at HTTPParser.parserOnHeadersComplete (_http_common.js:122:17)",
    "    at TLSSocket.socketOnData (_http_client.js:474:22)",
    "    at TLSSocket.emit (events.js:314:20)",
    "    at TLSSocket.EventEmitter.emit (domain.js:483:12)"
  ]
}
However, no actual errors occur in the Lambda functions. From what we understand, Sentry records every Lambda execution and sends this information as a trace. We are using a low tracesSampleRate, but the error still occurs.
AWSLambda.init({
  dsn: process.env.SENTRY_DNS,
  sampleRate: 1.0,
  tracesSampleRate: 0.1,
});
I don't know if this is relevant, but the functions are not inside our VPC; they use the AWS IP address pool.
We tried to find any indication of errors related to rate limiting in the Sentry dashboard, but without success.
Thanks!

I started getting this issue in one of our systems around the same time you posted this. Something that stopped it happening was to disable tracing altogether (tracesSampleRate: 0). Not truly a fix, but enough to clear our error logs for now while we investigate a proper one.
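For reference, a minimal sketch of that workaround, assuming the same AWSLambda wrapper and options used in the question:

AWSLambda.init({
  dsn: process.env.SENTRY_DNS,
  sampleRate: 1.0,      // keep reporting actual errors
  tracesSampleRate: 0,  // disable performance tracing entirely
});

Errors are still captured; only the per-invocation transaction/trace events stop, which appears to be what was hitting the 429 rate limit.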

Related

Google.apis returns error code 400 after creating the maximum number of service account keys

We are using the Google.apis SDK version 1.36.1 in order to create service account keys for GCP service accounts.
When we reach the maximum number of keys (10), instead of getting a meaningful error message / error code we receive a generic 400 error code with a "Precondition check failed." message.
We used to get error code 429, indicating we had reached the maximum number of keys.
Current GoogleApiException object:
Google.GoogleApiException: Google.Apis.Requests.RequestError
Precondition check failed. [400]
Errors [
  Message[Precondition check failed.] Location[ - ] Reason[failedPrecondition] Domain[global]
]
The current return code does not provide us with enough information. Is there any other way for us to know the reason for the failure?
This error message is also related to limits. You can take the official documentation for the Classroom API as an example.
I have found myself in a similar situation where we were deleting service account keys and immediately creating new ones. We were getting the same error because there is a delay on Google's side: it can take 60-90 seconds for the key deletion to go through before you are able to create a new one.
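If you are hitting this delete-then-recreate case, waiting and retrying is usually enough. Below is a rough, language-agnostic sketch of that retry (shown in Node.js even though the question uses the .NET SDK); createKey, the error code check, and the failedPrecondition test are assumptions about how your client surfaces the error, not part of the Google SDK:

// Wraps any async key-creation call and retries while the deleted key
// is still propagating (roughly 60-90 seconds according to the answer above).
async function createKeyWithRetry(createKey, { retries = 4, delayMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await createKey(); // your call into the IAM client goes here
    } catch (err) {
      // Assumption: the client exposes the HTTP code and the failedPrecondition reason.
      const isPrecondition =
        err.code === 400 && /failedPrecondition|Precondition check failed/i.test(String(err.message));
      if (!isPrecondition || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}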

How do I find the request ID for a failed Lambda invocation?

On my AWS Lambda dashboard, I see a spike in failed invocations. I want to investigate these errors by looking at the logs for these invocations. Currently, the only thing I can do to filter these invocations is get the timeline of the failed invocations and then look through the logs.
Is there a way I can search for failed invocations, i.e. ones that did not return a 200, and get a request ID that I can then lookup in CloudWatch Logs?
You can use AWS X-Ray for this by enabling active tracing in the AWS Lambda dashboard.
In the X-Ray dashboard you can:
- view traces
- filter them by status code
- see all the details of the invocation, including the request id, total execution time, etc., such as:
{
  "Document": {
    "id": "ept5e8c459d8d017fab",
    "name": "zucker",
    "start_time": 1595364779.526,
    "trace_id": "1-some-trace-id-fa543548b17a44aeb2e62171",
    "end_time": 1595364780.079,
    "http": {
      "response": {
        "status": 200
      }
    },
    "aws": {
      "request_id": "abcdefg-69b5-hijkl-95cc-170e91c66110"
    },
    "origin": "AWS::Lambda",
    "resource_arn": "arn:aws:lambda:eu-west-1:12345678:function:major-tom"
  },
  "Id": "52dc189d8d017fab"
}
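The same trace summaries can also be pulled programmatically. A rough sketch with the AWS SDK for JavaScript (the region and the filter expression are assumptions; adjust them to the status codes you care about):

const AWS = require('aws-sdk');
const xray = new AWS.XRay({ region: 'eu-west-1' });

// Lists trace summaries from the last hour that contain an error or fault,
// so the aws.request_id can then be looked up in CloudWatch Logs.
async function listFailedTraces() {
  const { TraceSummaries } = await xray.getTraceSummaries({
    StartTime: new Date(Date.now() - 60 * 60 * 1000),
    EndTime: new Date(),
    FilterExpression: 'fault OR error', // or e.g. 'http.status = 500'
  }).promise();
  return TraceSummaries.map((t) => t.Id);
}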
What I understand from your question is that you are more interested in finding out why your Lambda invocation has failed rather than finding the request id for the failed invocation.
You can do this by following the steps below:
1. Go to your Lambda function in the AWS console.
2. There will be three tabs, named Configuration, Permissions, and Monitoring.
3. Click on the Monitoring tab. Here you can see the number of invocations, the error count and success rate, and other metrics as well. Click on the Error metric; you will see at what time the invocation errors happened. You can read more in Lambda function metrics.
4. If you already know the time at which your function failed, ignore step 3.
5. Now scroll down. You will find the section called CloudWatch Logs Insights. Here you will see logs for all the invocations that happened within the specified time range.
6. Adjust the time range under this section. You can choose a predefined range like 1h, 3h, 1d, etc., or a custom time range.
7. Now click on the log stream link after the above filter has been applied. It will take you to the CloudWatch console, where you can see the logs (see the query sketch below).
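If you would rather run the Logs Insights query programmatically, here is a rough sketch with the AWS SDK for JavaScript; the log group name, region, and the error pattern in the query are assumptions to adapt to your function:

const AWS = require('aws-sdk');
const logs = new AWS.CloudWatchLogs({ region: 'eu-west-1' });

// Finds log lines from failed invocations in the last hour and returns them
// together with @requestId, which can then be searched in the log streams.
async function findFailedInvocations() {
  const end = Math.floor(Date.now() / 1000);
  const { queryId } = await logs.startQuery({
    logGroupName: '/aws/lambda/my-function', // assumption: your function's log group
    startTime: end - 3600,
    endTime: end,
    queryString: 'fields @timestamp, @requestId, @message ' +
      '| filter @message like /ERROR|Task timed out/ ' +
      '| sort @timestamp desc | limit 50',
  }).promise();

  // Poll until the query completes, then return the matching rows.
  let result;
  do {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    result = await logs.getQueryResults({ queryId }).promise();
  } while (result.status === 'Running' || result.status === 'Scheduled');

  return result.results;
}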

How To Prevent DynamoDb exception ProvisionedThroughputExceededException in Laravel

I am updating a certain number of records over a long period of time, and I have no certainty about when the records will be produced. Sometimes, when many records are produced at the same time, I get an error log entry saying that I hit the ProvisionedThroughputExceededException.
I'd like to prevent this exception from happening, or at least be able to catch it (and then re-throw it so that I don't alter the logic), but all I get is the error log below:
[2019-02-12 15:50:48] local.ERROR: Error executing "UpdateItem" on "https://dynamodb.eu-central-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://dynamodb.eu-central-1.amazonaws.com` resulted in a `400 Bad Request` response:
The Log continues and we can find a little more detail:
ProvisionedThroughputExceededException (client): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. -
{
  "__type": "com.amazonaws.dynamodb.v20120810#ProvisionedThroughputExceededException",
  "message": "The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API."
}
{"exception":"[object] (Aws\\DynamoDb\\Exception\\DynamoDbException(code: 0): Error executing \"UpdateItem\" on \"https://dynamodb.eu-central-1.amazonaws.com\"; AWS HTTP error: Client error: `POST https://dynamodb.eu-central-1.amazonaws.com` resulted in a `400 Bad Request` response:
So, the exception was thrown, but it looks like it is already caught somewhere, while I'd love to catch it myself, even if only to keep track of it, and ideally to avoid the exception altogether.
Is there a way to do so?
To prevent the exception, the obvious answer would be "use autoscaling on the DynamoDB capacity". And that's what I did, with a certain degree of success: when a spike in requests arose I still got the exception, but on average autoscaling worked pretty well. Here is the CloudFormation snippet for autoscaling:
MyTableWriteScaling:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 250
    MinCapacity: 5
    ResourceId: !Join ["/", ["table", !Ref myTable ]]
    ScalableDimension: "dynamodb:table:WriteCapacityUnits"
    ServiceNamespace: "dynamodb"
    RoleARN: {"Fn::GetAtt": ["DynamoDbScalingRole", "Arn"]}

WriteScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: !Join ['-', [!Ref 'AWS::StackName', 'MyTable', 'Write', 'Scaling', 'Policy']]
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref MyTableWriteScaling
    ScalableDimension: dynamodb:table:WriteCapacityUnits
    ServiceNamespace: dynamodb
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: DynamoDBWriteCapacityUtilization
      ScaleInCooldown: 1
      ScaleOutCooldown: 1
      TargetValue: 60
  DependsOn: MyTableWriteScaling
That said, I still had the Exception. I knew that the throttled requests would eventually be written, but I was looking for a way to prevent the exception, since I could not catch it.
The way to do it was introduced by Amazon on November 28, 2018, and it is DynamoDB on-demand.
Quite usefully, in the announcement we read:
DynamoDB on-demand is useful if your application traffic is difficult to predict and control, your workload has large spikes of short duration, or if your average table utilization is well below the peak.
Configuring on-demand in CloudFormation couldn't be easier:
HotelStay:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    ...
Changing the BillingMode and removing the ProvisionedThroughput property prevented this kind of exception from being thrown; they're just gone.
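For an existing table that is not managed by CloudFormation, the same switch can be made in place, for example with the AWS CLI (the table name is a placeholder):

aws dynamodb update-table --table-name MyTable --billing-mode PAY_PER_REQUEST

The billing mode change takes effect without recreating the table, so no data migration is needed.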

Getting 401 Unauthorized error when threads in JMeter increase

I am running a JMeter script where I get an access token and use it in my HTTP Request samplers (by putting Bearer ${AccessToken} in the Header Manager of each request). My HTTP requests are grouped into multiple Simple Controllers.
There are 70 HTTP GET Requests and ONE Thread takes around 20 seconds to execute them all.
Now, when the number of threads increases (say to 3 or more), I start getting 401 errors for a few requests:
{
  "statusCode": 401,
  "error": "Unauthorized",
  "message": "Bad token",
  "attributes": {
    "error": "Bad token"
  }
}
Eventually the 401 errors grow as the number of threads increases, even keeping the ramp-up time low (e.g. for 5 threads, ramp-up time = 30 sec).
JMeter Script snapshot
I have checked: my access token call always returns a different token, and each new thread uses its own token, so I am not sure where the issue is :(
So far I have not used any think times; maybe that is one of the issues, but I am not sure.
Looking at your HTTP GET response, the issue is most likely caused by an incorrect value of AccessToken.
Make sure you are passing the correct AccessToken with each request.
If you have a recorded script log, check where this access token originates from and make sure your Regular Expression Extractor is extracting it correctly (see the example below).
For more information on extracting variables and reusing them in the script, you can read this article.
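As an illustration only: assuming the login response is JSON with an access_token field (an assumption about your API, not something from the question), a Regular Expression Extractor on the token request could be configured roughly like this:

Apply to:           Main sample only
Field to check:     Body
Reference Name:     AccessToken
Regular Expression: "access_token"\s*:\s*"([^"]+)"
Template:           $1$
Match No.:          1

A JSON Extractor with the JSON path $.access_token would work just as well; the important part is that every thread extracts its own token before the GET requests run.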

Elasticsearch DB giving Client request timeout [Status Code 504, error : Gateway Time-Out]

I am trying to send requests to an Elasticsearch DB. Most of the time I get this error:
{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}
Currently I have 24 GB of data in the Elasticsearch DB.
System configuration: 8 cores, 4 GB RAM, Ubuntu.
I have only one node in the cluster.
I am unable to figure out why I am getting the timeout issue so frequently.
Is it because of the size of the data I have?
Is it because of the size of the data I have?
No, I would not say so. We have about 100 GB in our one-node cluster and it works fine.
As for troubleshooting your problem, it is really hard to say anything, as you have not given much info:
Is the 24 GB in one index or more?
What kind of queries are you using?
What is your heap size? (A quick way to check it is shown below.)
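To check the heap, the _cat/nodes API reports the current heap usage and maximum per node (host and port here assume the default local setup):

curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'

If heap.percent sits constantly high, the 4 GB of RAM (and therefore a small heap) is a likely cause of the slow responses behind the gateway timeout.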
