I have implemented an API that adheres to Snowflake's Asynchronous External Function specification.
In our system, we use AWS API Gateway, a Lambda function, and a third-party API (TPA).
In our scenario, we store certain information in a Snowflake table and try to enrich this table using a Snowflake external user-defined function.
We are able to enrich the table when the number of records is small. If we try to enrich about 3 million records, then after a certain time our TPA starts sending HTTP 429. This is an indicator telling our Lambda function to slow down the rate of Snowflake's requests.
We understand this, and the moment the Lambda function gets the HTTP 429, it sends the HTTP 429 back to Snowflake in any polling/POST requests. We expected Snowflake to slow down its requests rather than throw an error and stop processing.
Below is the response we send to Snowflake:
{
"statusCode" : 429
}
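For completeness, our Lambda-side pass-through is essentially this sketch (`call_tpa` is a stub standing in for the real third-party call):

```python
import json

def call_tpa(rows):
    # Stub: the real implementation POSTs the batch to the third-party
    # API and returns its HTTP status code.
    return 429

def handler(event, context):
    # Forward the batch; if the TPA is throttling us, propagate the
    # 429 back to Snowflake unchanged so it can slow down.
    status = call_tpa(event.get("body"))
    if status == 429:
        return {"statusCode": 429}
    return {"statusCode": 200, "body": json.dumps({"data": []})}
```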
The situation is reproducible, and it looks like Snowflake is not respecting HTTP 429 in the request-reply pattern.
Snowflake does handle HTTP 4xx responses when working with external functions.
Have you engaged support? I have worked with customers who had this issue, and the Snowflake team is able to review it.
AWS API Gateway has a default limit of 10,000 requests per second.
Please review Designing High Performance External Functions
Remote services should return HTTP response code 429 when overloaded.
If Snowflake sees HTTP 429, Snowflake scales back the rate at which it
sends rows, and retries sending batches of rows that were not
processed successfully.
Your options for resolution are:
Work with AWS to increase your API Gateway rate limit.
However, some proxy services, including Amazon API Gateway and Azure
API Management, have default usage limits. When the request rate
exceeds the limit, these proxy services throttle requests. If
necessary, you might need to ask AWS or Azure to increase your quota
on your proxy service.
or
Try using a smaller warehouse, so that Snowflake sends a lower volume of requests to API Gateway per second. The obvious drawback is that your job runs slower.
Setup:
We have gRPC pods running in a k8s cluster. The service mesh we use is Linkerd. Our gRPC microservices are written in Python (asyncio gRPC as the concurrency mechanism), with the exception of the entry point. That microservice is written in golang (using the Gin framework). We have an AWS API GW that talks to an NLB in front of the golang service. The golang service communicates with the backend via NodePort services.
Requests to our gRPC Python microservices can take a while to complete: 8 s on average, up to 25 s at the 99th percentile. To handle the load from clients, we've scaled horizontally and spawned more pods to handle concurrent requests.
Problem:
When we send multiple requests to the system, even sequentially, we sometimes notice that a request goes to the same pod as an ongoing request. This new request can end up getting "queued" on the server side (not fully "queued"; some progress is made when context switches happen). The issue with queueing like this is that:
The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap from API GW).
The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s, which are expected given our hard 30s cap.
What's strange is that we have other pods available, but for some reason the load balancer isn't routing requests to those pods smartly. It's possible that Linkerd's smarter load balancing doesn't work well in our high-latency situation (we need to look into this further; however, that will require a big overhaul of our system).
One thing I wanted to try is to stop this queuing up of requests. I want the service to immediately reject a request if one is already in progress, and have the client (meaning the golang service) retry. The retry will hopefully hit a different pod (do let me know if that won't happen). To do this, I set "maximum_concurrent_rpcs" to 1 on the server side (Python server). When I sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions (even with only one server pod). What I do notice is that the requests no longer happen in parallel on the server; they happen sequentially (I think that's a step in the right direction: the first request doesn't get starved). That said, I'm not seeing the RESOURCE_EXHAUSTED error in golang. I do see a delay between the entry time in the golang client and the entry time in the Python service. My guess is that the queuing is now happening client-side (or potentially still server-side, but not visibly to me).
I then saw online that requests may get queued up on the client side as a default behavior in HTTP/2. I tried to test this in a custom Python client that mimics the golang one:
import grpc

# Limit the channel to a single concurrent HTTP/2 stream.
channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)
# create stub to server with channel…
However, I'm not seeing any change here either. (Note: this is a dummy test client; eventually I'll need to make this run in golang. Any help there would be appreciated as well.)
Questions:
How can I get the desired effect here? Meaning: the server sends RESOURCE_EXHAUSTED if it's already handling a request, the golang client retries, and the retry hits a different pod?
Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!
I have an AWS Lambda function that launches an AWS Batch job. I call the Lambda function from R like this:
result <- httr::POST(url, body = toJSON(job, auto_unbox = TRUE))
Where url is (some details redacted):
https://XXXXXXXXXX.execute-api.ca-central-1.amazonaws.com/Prod/job
This works great when the requests are submitted sequentially. However, if I submit the jobs from even a small cluster (e.g., 10 nodes), I get a lot of 502 responses, which IIUC means the Lambda API endpoint is refusing the connection due to excessive traffic.
If I throttle the requests it works as desired.
But that does not seem like very high traffic (at most, 10 concurrent requests). My questions are: 1) am I interpreting the 502 response correctly, and 2) what are the concurrent request limits for Lambda requests via API Gateway?
Based on the helpful comments above, it became apparent that the problem was not concurrent requests but timeouts in the Lambda function. This was evident in the logs. So when receiving 502 responses from your Lambda API endpoint, inspect the CloudWatch logs for further details, including timeouts.
I'd like to set an arbitrary timer on my GraphQL requests. Say I make a request, and if it takes longer than 10 seconds, Apollo sends back an error.
Thoughts?
Would I need to do this with both the Apollo client and the Apollo server (say, for additional service requests such as databases, etc.)?
There are three different places where timeouts might make sense:
1. For the connection to the server
To have a timeout for requests sent to the server, you could build a wrapper around the network interface, which would reject query promises after x seconds.
2. For the query resolution on the GraphQL server
To implement a per-query timeout on the server, you could put the query start time on the context at the start of the query, and wrap each resolve function with a function that either returns the promise from the resolver, or rejects when the timeout has elapsed.
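As an illustration of that wrapping idea outside the Apollo ecosystem, here is a minimal sketch in Python with asyncio (`with_timeout` and the names involved are mine, not an Apollo API):

```python
import asyncio

def with_timeout(resolver, timeout_s):
    """Wrap an async resolver so it rejects once its time budget elapses."""
    async def wrapped(*args, **kwargs):
        # wait_for cancels the pending resolver and raises TimeoutError
        # if timeout_s seconds pass without a result.
        return await asyncio.wait_for(resolver(*args, **kwargs), timeout_s)
    return wrapped
```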
3. For the connection between your GraphQL server and the backends
To implement timeouts for requests to the backend, you can simply make the fetch requests to the backends time out after a certain amount of time.
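In Python, for example, this is just a timeout argument on the HTTP call; `fetch_backend` is a made-up helper using only the standard library:

```python
import json
import urllib.request

def fetch_backend(url, timeout_s=5.0):
    # urlopen raises an error (socket.timeout / URLError) if no
    # response arrives within timeout_s seconds.
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```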
PS: It's worth noting that the solutions above will cause queries or requests to time out, but they won't cancel them, which means that your server or backends will most likely continue doing work that is now wasted. However, cancelling is an entirely different problem, and it's also harder to address.
Currently the documentation just says:
If Connect API endpoints receive too many requests associated with the same application or access token in a short time window, they might respond with a 429 Too Many Requests error. If this occurs, try your request again at a later time.
Much appreciated!
Currently, Connect API rate limits are on the order of 10 QPS. This limit might change in the future and should not be relied on.
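Since the limit is only loosely documented, a client is better off treating any 429 as a back-off signal rather than assuming a fixed QPS. A minimal sketch; `send` is a placeholder for whatever issues the Connect API request and returns its HTTP status:

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=0.5):
    """Retry while the request comes back 429, doubling the wait each time."""
    for attempt in range(max_retries):
        status = send()
        if status != 429:
            return status
        # Exponential backoff: 0.5s, 1s, 2s, ...
        time.sleep(base_delay * (2 ** attempt))
    # Last attempt; a persistent 429 is left to the caller to handle.
    return send()
```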
My application sends notifications to customers in bulk. For example, at 8am every day a back-end system generates notifications for 50K customers, and those notifications should be delivered within a reasonable time.
During performance testing I discovered that sending a single push request to the C2DM server takes about 400 ms, which is far too long. I was told that a production quota may provide better performance, but will it bring the time down to 10 ms?
Besides, I need C2DM performance figures before going to production because they may affect the implementation: sending requests from multiple threads, using an asynchronous HTTP client, etc.
Does anyone know of C2DM server benchmarks or any performance-related server implementation guidelines?
Thanks,
Artem
I use App Engine, which makes delivering a newsletter to 1 million users a very painful task, mostly because the C2DM API doesn't support delivery to multiple users, so it's necessary to create one HTTP request per user.
The time for the C2DM server to respond will depend on your latency to the Google servers. In my case (App Engine) it's very small.
My advice is to create as many threads as possible, since the threads will mostly be waiting on network I/O. If you ever exceed the quota, you can always ask for permission for more traffic.
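The fan-out-over-threads advice can be sketched with a thread pool; `push_one` here is a stand-in for the per-user C2DM HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor

def push_all(user_ids, push_one, workers=64):
    """Fan per-user pushes out over a thread pool.

    Threads mostly block on network I/O, so a pool far larger than the
    CPU count keeps many requests in flight at once.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in the order of user_ids.
        return list(pool.map(push_one, user_ids))
```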