I am updating a number of records over a long period of time, and I have no control over when the records will be produced. Sometimes, when many records are produced at the same time, I get an error log entry saying that I hit the ProvisionedThroughputExceededException.
I'd like to prevent this exception from happening, or at least be able to catch it (and then re-throw it so that I don't alter the logic), but all I get is the error log below:
[2019-02-12 15:50:48] local.ERROR: Error executing "UpdateItem" on "https://dynamodb.eu-central-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://dynamodb.eu-central-1.amazonaws.com` resulted in a `400 Bad Request` response:
The log continues with a little more detail:
ProvisionedThroughputExceededException (client): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. -
{
"__type": "com.amazonaws.dynamodb.v20120810#ProvisionedThroughputExceededException",
"message": "The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API."
}
{"exception":"[object] (Aws\\DynamoDb\\Exception\\DynamoDbException(code: 0): Error executing \"UpdateItem\" on \"https://dynamodb.eu-central-1.amazonaws.com\"; AWS HTTP error: Client error: `POST https://dynamodb.eu-central-1.amazonaws.com` resulted in a `400 Bad Request` response:
So the exception was thrown, but it looks like it is already being caught somewhere, while I'd love to catch it myself, even if only to keep track of it, and ideally to avoid the exception altogether.
Is there a way to do so?
To prevent the exception, the obvious answer would be "Use Auto Scaling on the DynamoDB capacity". And that's what I did, with some degree of success: when a spike in requests arose I still got the exception, but on average autoscaling worked pretty well. Here is the CloudFormation snippet for autoscaling:
MyTableWriteScaling:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 250
    MinCapacity: 5
    ResourceId: !Join ["/", ["table", !Ref myTable]]
    ScalableDimension: "dynamodb:table:WriteCapacityUnits"
    ServiceNamespace: "dynamodb"
    RoleARN: !GetAtt DynamoDbScalingRole.Arn

WriteScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: !Join ['-', [!Ref 'AWS::StackName', 'MyTable', 'Write', 'Scaling', 'Policy']]
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref MyTableWriteScaling
    ScalableDimension: dynamodb:table:WriteCapacityUnits
    ServiceNamespace: dynamodb
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: DynamoDBWriteCapacityUtilization
      ScaleInCooldown: 1
      ScaleOutCooldown: 1
      TargetValue: 60
  DependsOn: MyTableWriteScaling
That said, I still got the exception. I knew that the throttled requests would eventually be written, but I was looking for a way to prevent the exception, since I could not catch it.
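As an aside, the SDKs do expose the error for handling: the call can be wrapped and the built-in retry behaviour tuned. Below is a minimal boto3 sketch of that pattern, offered only as an illustration, since the question uses the PHP SDK (where the equivalent would be catching the DynamoDbException shown in the log); the table name, key, and attribute names are placeholders.

# Hypothetical sketch: catching DynamoDB throttling with boto3 and tuning its retries.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Ask the SDK to retry throttled calls more aggressively before giving up.
config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
dynamodb = boto3.client("dynamodb", config=config)

def update_item(key_value, new_value):
    try:
        dynamodb.update_item(
            TableName="MyTable",  # placeholder table name
            Key={"pk": {"S": key_value}},
            UpdateExpression="SET #v = :v",
            ExpressionAttributeNames={"#v": "value"},
            ExpressionAttributeValues={":v": {"S": new_value}},
        )
    except ClientError as err:
        # Only the throttling error is tracked here; everything else is re-raised untouched.
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            print("Write was throttled; logging it and re-raising")
        raise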
The way to prevent it was introduced by Amazon on November 28, 2018, and it is DynamoDB on-demand.
Quite usefully, in the announcement we read:
DynamoDB on-demand is useful if your application traffic is difficult to predict and control, your workload has large spikes of short duration, or if your average table utilization is well below the peak.
Configuring on-demand in CloudFormation couldn't be easier:
HotelStay:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    ...
Changing the BillingMode and removing the ProvisionedThroughput property prevented this kind of exception from being thrown; they are just gone forever.
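If the table already exists and is not managed through CloudFormation, the same switch can be made with the UpdateTable API; a minimal boto3 sketch (the table name is a placeholder, reusing the one from the snippet above):

# Sketch: switching an existing table to on-demand billing via UpdateTable.
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.update_table(
    TableName="HotelStay",          # placeholder table name
    BillingMode="PAY_PER_REQUEST",  # drop provisioned capacity in favour of on-demand
)

Note that DynamoDB only allows switching a table's billing mode once every 24 hours.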
Related
I am running cdk deploy in my Textract pipeline folder for large document processing. However, when I run this program I get the error below.
The error:
| CREATE_FAILED | AWS::Lambda::Function | S3BatchProcessor6C619AEA
Resource handler returned message: "Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its minimum value of [10]. (Service: Lambda, Status Code: 400, Request ID: 7f6d1305-e248-4745-983e-045eccde562d)" (RequestToken: 9c84827d-502e-5697-b023-e0be45f8d451, HandlerErrorCode: InvalidRequest)
By default, AWS provides an account-level limit of at most 1000 concurrent executions.
In your case, the reserved concurrency configured across all the Lambdas in your account leaves less than the required minimum of 10 unreserved executions. In other words, the deployment must satisfy:
1000 - (sum of ReservedConcurrentExecutions across all Lambdas) >= 10
The deployment fails because adding the new function's reserved concurrency would break this condition.
There are two possible solutions here:
Reduce the reserved concurrency of some Lambdas so that the condition above holds (a quick way to check the current numbers is sketched below), or
Raise the account concurrency limit by requesting a quota increase from AWS Support (the AWS documentation on Lambda quotas has the details).
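To see where the account currently stands, the numbers in the condition above can be read from the Lambda GetAccountSettings API; a minimal boto3 sketch (not part of the original answer):

# Sketch: inspect the account-level concurrency limits referenced above.
import boto3

lambda_client = boto3.client("lambda")
settings = lambda_client.get_account_settings()

limit = settings["AccountLimit"]["ConcurrentExecutions"]              # e.g. 1000 by default
unreserved = settings["AccountLimit"]["UnreservedConcurrentExecutions"]

print(f"Total concurrency limit:       {limit}")
print(f"Unreserved concurrency left:   {unreserved}")
print(f"Reserved across all functions: {limit - unreserved}")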
We have set up an EFK stack for our project, and since yesterday Kibana seems to be down. When we initially troubleshot it, we found the following errors:
Readiness probe failed: Error: Got HTTP code 503 but expected a 200 & Readiness probe failed: Error: Got HTTP code 000 but expected a 200
Later we found the same issue with the Elasticsearch pod as well. Along with this, we found the following issue with the data request limit:
FATAL
{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent]
Data too large, data for [indices:admin/template/get] would be
[1036909172/988.8mb], which is larger than the limit of
[1020054732/972.7mb], real usage: [1036909056/988.8mb], new bytes
reserved: [116/116b], usages [request=0/0b, fielddata=420/420b,
in_flight_requests=67310/65.7kb, model_inference=0/0b,
eql_sequence=0/0b,
accounting=110294544/105.1mb]","bytes_wanted":1036909172,"bytes_limit":1020054732,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent]
Data too large, data for [indices:admin/template/get] would be
[1036909172/988.8mb], which is larger than the limit of
[1020054732/972.7mb], real usage: [1036909056/988.8mb], new bytes
reserved: [116/116b], usages [request=0/0b, fielddata=420/420b,
in_flight_requests=67310/65.7kb, model_inference=0/0b,
eql_sequence=0/0b,
accounting=110294544/105.1mb]","bytes_wanted":1036909172,"bytes_limit":1020054732,"durability":"PERMANENT"},"status":429}
We have tried changing the READINESS_PROBE_TIMEOUT, initial delay, timeout, probe period, success threshold, and failure threshold. We also tried increasing the indices breaker limit, but the change is not being picked up; the error still reports the old limits. We also tried addressing the circuit_breaking_exception by adding ES_JAVA_OPTS values.
Nothing seems to be working; any help would be appreciated.
The same phenomenon occurred during our service operation. The issue comes down to a memory shortage, so there are several ways to approach it:
Physical memory expansion (scale out)
Add nodes or hardware, since the available memory is insufficient.
Lower the load through monitoring
If circuit_breaking_exception keeps appearing in the logs, build a monitoring mechanism that reduces the load.
Setting java_opts
You can raise the memory the JVM is allowed to use, but it is pointless if the hardware does not have enough memory (a sketch of this is below).
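To make the last point concrete, here is a hedged sketch of adjusting the parent breaker threshold through the Elasticsearch cluster settings API using Python and requests. The endpoint URL is a placeholder and authentication is omitted; raising the threshold only buys temporary headroom, the lasting fix is more heap via ES_JAVA_OPTS (e.g. -Xms2g -Xmx2g) together with matching container memory.

# Sketch: dynamically change the parent circuit breaker limit on Elasticsearch.
import requests

ES_URL = "http://elasticsearch:9200"  # placeholder endpoint

resp = requests.put(
    f"{ES_URL}/_cluster/settings",
    json={
        "persistent": {
            # The default depends on the version and on indices.breaker.total.use_real_memory
            # (95% or 70% of the heap). This only moves the threshold; the heap itself is
            # set via ES_JAVA_OPTS on the Elasticsearch container.
            "indices.breaker.total.limit": "98%"
        }
    },
    timeout=10,
)
resp.raise_for_status()
print(resp.json())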
We are using the Google.Apis SDK, version 1.36.1, to create service account keys for GCP service accounts.
When we reach the maximum number of keys (10), instead of getting a meaningful error message / error code, we receive a generic 400 error code with a "Precondition check failed." message.
We used to get error code 429, indicating that we had reached the maximum number of keys.
The current GoogleApiException object:
Google.GoogleApiException: Google.Apis.Requests.RequestError
Precondition check failed. [400]
Errors [
Message[Precondition check failed.] Location[ - ] Reason[failedPrecondition] Domain[global]
]
The current return code does not provide us with enough information. Is there any other way for us to know the reason for the failure?
This error message is also related to limits. You can take the official documentation for the Classroom API as an example.
I have found myself in a similar situation, where we were deleting service account keys in order to immediately create new ones. We were getting the same error because there is a delay in the system: it can take 60-90 seconds for the key deletion to propagate before you can create the key again.
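One way to cope with that propagation delay (and with the 10-key limit itself) is to retry the key creation with a backoff when the failedPrecondition error comes back. The sketch below uses the Python googleapiclient purely as an illustration, since the question uses the .NET SDK; the project and service account names are placeholders and application default credentials are assumed.

# Sketch: retry service account key creation while the previous deletion propagates.
import time

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

iam = build("iam", "v1")
name = "projects/my-project/serviceAccounts/my-sa@my-project.iam.gserviceaccount.com"  # placeholder

def create_key_with_retry(max_attempts=5, delay_seconds=30):
    for attempt in range(1, max_attempts + 1):
        try:
            return iam.projects().serviceAccounts().keys().create(name=name, body={}).execute()
        except HttpError as err:
            # 400 failedPrecondition / 429 can show up while the old key is still being
            # removed, or while the 10-key limit is still in effect.
            if err.resp.status in (400, 429) and attempt < max_attempts:
                time.sleep(delay_seconds)
                continue
            raise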
The full text of the message is:
{code: 1012, message: "Transaction is temporarily banned"}
This would indicate that the transaction is held somewhere in the Substrate runtime mempool, or something of that nature, but it is not entirely clear what possible causes can trigger this, or what the eventual outcome might be.
For example,
1) Is it that too many transactions have been sent from a given account, IP address, or something else? Has some threshold been reached?
2) Is the transaction actually invalid, or not?
3) The use of the word "temporary" suggests a delay in processing, not an outright rejection of the transaction. Does this therefore suggest that the transaction is valid, but delayed? If so, for how long?
The comments in the Substrate runtime's core/rpc/src/author/errors.rs and core/transaction-pool/graph/src/errors.rs are no clearer about what the outcome is.
In front of the mempool there is a transaction blacklist, which can trigger this error. Specifically, this error means that a transaction with the same hash was either:
Part of a recently mined block, or
Detected as invalid during block production and removed from the pool.
Additionally, this error can occur when:
The transaction reaches its longevity, i.e. it is not mined for TransactionValidation::longevity blocks after being imported into the pool.
By default, longevity is set to u64::max, so this should normally not be the problem.
In any case, -ltxpool=log should reveal more details about this error.
A transaction is only temporarily banned because it will be removed from the blacklist when either:
30 minutes pass
There are more than 4,000 transactions on the blacklist
Check out core/transaction-pool/graph/src/rotator.rs.
I have created an API using Django REST Framework.
The API communicates with GCP Cloud Storage to store profile images (around 1 MB per picture).
While performing load testing (around 1000 requests/s) against that server, I encountered the following error.
It seems to be a GCP Cloud Storage max request rate issue, but I am unable to figure out a solution for it.
Exception Type: SSLError at /api/v1/users
Exception Value: HTTPSConnectionPool(host='www.googleapis.com', port=443): Max retries exceeded with url: /storage/v1/b/<gcp-bucket-name>?projection=noAcl (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
Looks like you have the answer to your question here:
"...buckets have an initial IO capacity of around 1000 write requests
per second...As the request rate for a given bucket grows, Cloud
Storage automatically increases the IO capacity for that bucket"
Therefore it auto-scales automatically. The only thing is that you need to increase the requests/s gradually, as described here:
"If your request rate is expected to go over these thresholds, you should start with a request rate below or near the thresholds and then double the request rate no faster than every 20 minutes"
It looks like your bucket should get an increase in I/O capacity, so this will work in the future.
You are actually right at the edge (1000 req/s), and I guess this is what is causing your error.
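While the bucket's capacity ramps up, a client-side exponential backoff keeps the load test from failing outright on transient connection errors. The sketch below is an assumption about how the upload is wired: upload_profile_image stands in for whatever call the Django view currently makes to Cloud Storage.

# Sketch: exponential backoff around the Cloud Storage call while the bucket scales up.
import random
import time

def upload_with_backoff(upload_profile_image, *args, max_attempts=5, base_delay=0.5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return upload_profile_image(*args, **kwargs)
        except Exception:
            # SSL/connection errors during a throttled burst are treated as retryable here;
            # narrow this except clause to the exact exception types your client raises.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: 0.5s, 1s, 2s, 4s (plus noise).
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.2))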