Idempotency with Redis in a multi-threaded environment - Spring

I'm creating a POST API in which I'm using Redis for idempotency.
I'm taking an idempotency-key in the header, which gets saved in Redis.
When the same request comes in again, I return the cached message.
In Redis I'm saving it as idempotency-key: message-body with HTTP status.
During load testing I sent the same request, with the same idempotency-key, 30 times.
As expected, 5 out of 30 times the same request was processed and stored in Redis again, because a new request came in before the first one finished.
How can I avoid this in Redis without making the API slow?
I did not find much material on the net.
Apart from Redis, I only have DynamoDB as a centralized DB.
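One common way to close that race is to make the key reservation atomic with Redis's SET ... NX (set only if the key does not exist), so only the first of the concurrent duplicates wins. Below is a minimal sketch using redis-py; the IN_PROGRESS placeholder, the TTLs, and the process_request callback are illustrative rather than from the question, and in Spring the equivalent call is opsForValue().setIfAbsent(key, value, timeout) on a RedisTemplate.

import redis

r = redis.Redis(host="localhost", port=6379)

def handle_post(idempotency_key, process_request):
    # Atomically claim the key: only the first concurrent duplicate succeeds.
    claimed = r.set(idempotency_key, "IN_PROGRESS", nx=True, ex=3600)
    if not claimed:
        cached = r.get(idempotency_key)
        if cached == b"IN_PROGRESS":
            # A duplicate arrived before the first request finished:
            # reject it (or tell the client to retry) instead of processing it again.
            return 409, "request already in progress"
        return 200, cached  # replay the stored body/status
    status, body = process_request()
    # Replace the placeholder with the real response for later replays.
    r.set(idempotency_key, body, ex=3600)
    return status, body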

Related

What is the best way to share events between Google Cloud Run containers

I have a service which is running on many Cloud Run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from Elasticsearch.
I would have expected ES to have a "listening" type of connection, such as Firebase offers, but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross-container sync when using Cloud Run? Would Pub/Sub be the best solution here?
It's unusual but not impossible to achieve.
First of all, you have to understand the instance life cycle: the CPU is allocated only when a request is being processed. Otherwise, the CPU is throttled (below 5%). That's also why you pay only while your instance is processing a request, and not while the instance is kept warm (and offloaded after a while).
That being said, it's totally useless and inefficient to update instances in the background when no request is being processed.
Therefore, the idea is to perform the sync when the instance receives a request. The downside is that this will increase the request latency (the instance first syncs its cache and then processes the request).
Finally, the solution is to store, somewhere central, the date of the latest cache update, and to keep that same information in each instance. When the instance receives a request, the first thing it does is compare its own cache date with the central data date.
If they are the same, no problem, continue the processing.
If the central data date is newer than the instance's date, update the instance data, and then process the request.
You can store the data, and the date of that data, in Firestore for instance, or in Memorystore, or in any other database.
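As a rough illustration of that comparison (assuming the central date lives in Memorystore under a made-up key cache:last_update; Firestore would work the same way):

import time
import redis

r = redis.Redis(host="10.0.0.3", port=6379)  # placeholder Memorystore address
local_cache = {"data": None, "updated_at": 0.0}

def refresh_from_elasticsearch():
    ...  # placeholder: re-fetch the updated documents from Elasticsearch

def publish_update():
    # Called by the container ("A") that just changed the data.
    r.set("cache:last_update", time.time())

def handle_request(request):
    central_date = float(r.get("cache:last_update") or 0)
    if central_date > local_cache["updated_at"]:
        # Central data is newer than this instance's copy: sync before serving.
        local_cache["data"] = refresh_from_elasticsearch()
        local_cache["updated_at"] = central_date
    # ...continue processing the request with local_cache["data"]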
Pub/Sub can also be a solution, but it is more complex to implement. Each instance, when it starts, has to create a pull subscription on a topic. When the instance is killed, you have to delete that subscription.
Then, when a request comes in, your instance has to pull the subscription, get the messages, if any, and update its local cache.
This could be faster than the previous solution, but harder to implement.
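A hedged sketch of that pull-subscription approach with the google-cloud-pubsub client (project, topic, and helper names are placeholders):

import uuid
from google.api_core import exceptions
from google.cloud import pubsub_v1

PROJECT = "my-project"  # placeholder project and topic names
TOPIC_PATH = f"projects/{PROJECT}/topics/cache-updates"

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT, "cache-updates-" + uuid.uuid4().hex)

def refresh_local_cache():
    ...  # placeholder: re-fetch from Elasticsearch

def on_instance_start():
    # One pull subscription per instance; delete it again on shutdown.
    subscriber.create_subscription(request={"name": sub_path, "topic": TOPIC_PATH})

def on_request():
    # Drain pending messages; any message means the local cache is stale.
    try:
        response = subscriber.pull(
            request={"subscription": sub_path, "max_messages": 10}, timeout=2
        )
    except exceptions.DeadlineExceeded:
        return  # nothing pending
    if response.received_messages:
        refresh_local_cache()
        subscriber.acknowledge(request={
            "subscription": sub_path,
            "ack_ids": [m.ack_id for m in response.received_messages],
        })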

Disallow queuing of requests in gRPC microservices

SetUp:
We have gRPC pods running in a k8s cluster. The service mesh we use is Linkerd. Our gRPC microservices are written in Python (asyncio gRPC as the concurrency mechanism), with the exception of the entry point. That microservice is written in Golang (using the Gin framework). We have an AWS API GW that talks to an NLB in front of the Golang service. The Golang service communicates with the backend via NodePort services.
Requests on our gRPC Python microservices can take a while to complete. Average is 8s, up to 25s in the 99th %ile. In order to handle the load from clients, we've horizontally scaled, and spawned more pods to handle concurrent requests.
Problem:
When we send multiple requests to the system, even sequentially, we sometimes notice that requests go to the same pod as an ongoing request. What can happen is that this new request ends up getting "queued" on the server side (not fully "queued"; some progress gets made when context switches happen). The issue with queueing like this is that:
The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap from API GW).
The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s which are expected from our hard 30s cap.
What's strange is that we have other pods available, but for some reason the load balancer isn't routing requests to those pods smartly. It's possible that Linkerd's smarter load balancing doesn't work well for our high-latency situation (we need to look into this further; however, that will require a big overhaul of our system).
One thing I wanted to try is to stop this queuing up of requests. I want the service to immediately reject a request if one is already in progress, and have the client (meaning the Golang service) retry. The client retry will hopefully hit a different pod (do let me know if that won't happen). In order to do this, I set "maximum_concurrent_rpcs" to 1 on the server side (Python server). When I sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions (even when there is only one server pod). What I do notice is that the requests no longer happen in parallel on the server; they happen sequentially (I think that's a step in the right direction, since the first request doesn't get starved). That being said, I'm not seeing the RESOURCE_EXHAUSTED error in Golang. I do see a delay between the entry time in the Golang client and the entry time in the Python service. My guess is that the queuing is now happening client-side (or potentially still server-side, but it's not visible to me)?
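For reference, here is a minimal sketch of where that server-side setting goes on an asyncio gRPC server; the servicer registration is a placeholder for the real generated code, and per the grpc docs RPCs beyond the limit should be rejected with RESOURCE_EXHAUSTED:

import asyncio
import grpc

async def serve():
    # Reject (rather than queue) RPCs beyond the first in-flight one.
    server = grpc.aio.server(maximum_concurrent_rpcs=1)
    # add_MyServicer_to_server(MyServicer(), server)  # generated registration goes here
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()

asyncio.run(serve())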
I then saw online that requests may get queued up on the client side as a default behavior in HTTP/2. I tried to test this out in a custom Python client that mimics the Golang one:
import grpc

channel = grpc.insecure_channel(
    "<some address>",
    # channel argument intended to cap concurrent HTTP/2 streams at 1
    options=[("grpc.max_concurrent_streams", 1)],
)
# create stub to server with channel…
However, I'm not seeing any change here either. (Note, this is a test dummy client; eventually I'll need to make this run in Golang. Any help there would be appreciated as well.)
Questions:
How can I get the desired effect here? Meaning the server sends RESOURCE_EXHAUSTED if it is already handling a request, the Golang client retries, and it hits a different pod?
Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!

Snowflake's Asynchronous External Function not respecting HTTP status 429

I have implemented an API which adheres to Snowflake's asynchronous external function pattern.
In our system we are using AWS API Gateway, a Lambda function, and a third-party API (TPA).
In our scenario, we store certain information in a Snowflake table and try to enrich this table using a Snowflake external user-defined function.
We are able to enrich the table if the number of records is small. If we try to enrich 3 million records, then after a certain time our TPA starts sending HTTP 429. This is an indicator telling our Lambda function to slow down the rate of Snowflake's requests.
We understand this, and the moment the Lambda function gets the HTTP 429, it sends HTTP 429 back to Snowflake in any polling/POST request. The expectation is that Snowflake will slow down its requests rather than throwing an error and stopping further processing.
Below is the response we send to Snowflake:
{
    "statusCode": 429
}
And this is a consistent situation in which it looks like Snowflake is not respecting HTTP 429 in the request-reply pattern.
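For clarity, this is roughly the Lambda-side pattern described above (call_third_party_api is a stand-in for the real TPA call):

import json

def call_third_party_api(event):
    # Stand-in for the real TPA call; returns (http_status, rows).
    return 200, [[0, "enriched-value"]]

def lambda_handler(event, context):
    status, rows = call_third_party_api(event)
    if status == 429:
        # Propagate the throttle; Snowflake is documented to scale back and
        # retry the batches that were not processed.
        return {"statusCode": 429}
    # Normal reply: "data" is a list of [row_index, value] rows.
    return {"statusCode": 200, "body": json.dumps({"data": rows})}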
Snowflake does handle HTTP 4xx responses when working with external functions.
Have you engaged support? I have worked with customers having this issue, and the Snowflake team is able to review it.
AWS API Gateway has a default limit of 10,000 rps.
Please review Designing High Performance External Functions:
Remote services should return HTTP response code 429 when overloaded.
If Snowflake sees HTTP 429, Snowflake scales back the rate at which it
sends rows, and retries sending batches of rows that were not
processed successfully.
Your options for resolution are:
Work with AWS to increase your API Gateway rate limit.
However, some proxy services, including Amazon API Gateway and Azure
API Management, have default usage limits. When the request rate
exceeds the limit, these proxy services throttle requests. If
necessary, you might need to ask AWS or Azure to increase your quota
on your proxy service.
or
Try using a smaller warehouse, so that Snowflake sends less volume to API Gateway per second. This has the obvious drawback of your job running slower.

Fetching large webhook in an optimal way

So I was receiving a webhook from SendGrid comprising 10,000 records. At my endpoint, I am using a RabbitMQ queue to process the webhook further. The issue I am facing is in the MongoDB update process, as it is experiencing too much load. I've searched around and didn't find any results for this issue. What I was thinking of doing is to send data to MongoDB in chunks after consuming the message, or is there any way I can receive the webhook in chunks before pushing it to my queue?
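If it helps, one way to do the chunked MongoDB writes after consuming the RabbitMQ message is an unordered bulk_write per batch; this is only a sketch with pymongo, and the collection and field names are made up:

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
collection = client["sendgrid"]["events"]  # placeholder database/collection names

def update_in_chunks(records, chunk_size=500):
    # records: the list parsed from the webhook payload consumed off RabbitMQ
    for i in range(0, len(records), chunk_size):
        chunk = records[i:i + chunk_size]
        ops = [
            UpdateOne({"sg_event_id": r["sg_event_id"]}, {"$set": r}, upsert=True)
            for r in chunk
        ]
        # Unordered: MongoDB applies the batch without stopping at the first error.
        collection.bulk_write(ops, ordered=False)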

Laravel Nginx Every Second Request Handle

I want to ask about request handling.
The case is: if the server receives the same POST request or the same key parameter, the request should be held or stored first until there are no more requests for 5 seconds (or some other condition), and only after that is it processed to the database.
So the client sends a POST every second, but the server doesn't need to write to the database every second, only the most recent request.
Has anyone dealt with such a need before? What methods, packages, or tools are needed?
I use Laravel, Nginx, and MariaDB.
Thank you.
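One pattern that fits this is a per-key debounce: keep only the latest payload for each key in a fast store, and have a worker persist it once the key has been quiet for 5 seconds. The sketch below uses Python and Redis purely to show the idea (all names are made up); in Laravel the same shape is typically a cache entry plus a delayed queued job.

import time
import redis

r = redis.Redis()
QUIET_PERIOD = 5  # seconds with no new request before we persist

def save_to_database(payload):
    ...  # placeholder: the actual MariaDB write

def receive_post(key, payload):
    # Keep only the most recent payload (assumed to be a JSON string) per key.
    r.hset("debounce:" + key, mapping={"payload": payload, "last_seen": time.time()})

def flush_worker():
    # Runs in a cron/queue worker; persists entries that have gone quiet.
    while True:
        for redis_key in r.scan_iter("debounce:*"):
            entry = r.hgetall(redis_key)
            if time.time() - float(entry[b"last_seen"]) >= QUIET_PERIOD:
                save_to_database(entry[b"payload"])
                r.delete(redis_key)
        time.sleep(1)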
