How to handle http stream responses from within a Substrate offchain worker? - substrate

Starting from the Substrate's Offchain Worker recipe that leverages the Substrate http module, I'm trying to handle http responses that are delivered as streams (basically interfacing a pubsub mechanism with a chain through a custom pallet).
Non-stream responses are perfectly handled as-is and reflecting them on-chain with signed transactions is working for me, as advertised in the doc.
However, when the responses are streams (meaning the http requests are never completed), I can only see the stream data logs in my terminal when I shut down the Substrate node. Trying to reflect each received chunk as a signed transaction doesn't work either: I can also see my logs only on node shut down, and the transaction is never sent (which makes sense since the node is down).
Is there an existing pattern for this use case? Is there a way to get the stream observed in background (not in the offchain worker runtime)?
Actually, would it be a good practice to keep the worker instance running ad vitam for this http request? (knowing that in my configuration the http request is sent only once, via a scheme of command queue - in the pallet storage - that gets cleaned at each block import).

Related

Azure web app - how to avoid multiple users triggering the same endpoint at time

Is there any way to set a limit on the number of requests for the azure web app (asp.net web API)?
I have an endpoint that runs for a long time and I like to avoid multiple triggers while the application is processing one request.
Thank you
This would have to be a custom implementation which can be done in a few ways
1. Leverage a Queue
This involves separating the background process into a separate execution flow by using a queue. So, your code would be split up into two parts
API Endpoint that receives the request and inserts a message into a queue
Separate Method (or Service) that listens on the queue and processes messages one by one
The second method could either be in the same Web App or could be separated into a Function App. The queue could be in Azure Service Bus, which your Web App or Function would be listening in on.
This approach has the added benefits of durability since if the web app or function were to crash and you would like to ensure that requests are all processed, the message would be processed again in order if not completed on the queue.
2. Distributed Lock
This approach is simpler but lacks durability. Here you would simply use an in-memory queue to process requests but ensure only one is being processed at a time but having the method acquire a lock which the following requests would wait for before being processed.
You could leverage blob storage leases as a option for distributed locks.

Disallow queuing of requests in gRPC microservices

SetUp:
We have gRPC pods running in a k8s cluster. The service mesh we use is linkerd. Our gRPC microservices are written in python (asyncio grpcs as the concurrency mechanism), with the exception of the entry-point. That microservice is written in golang (using gin framework). We have an AWS API GW that talks to an NLB in front of the golang service. The golang service communicates to the backend via nodeport services.
Requests on our gRPC Python microservices can take a while to complete. Average is 8s, up to 25s in the 99th %ile. In order to handle the load from clients, we've horizontally scaled, and spawned more pods to handle concurrent requests.
Problem:
When we send multiple requests to the system, even sequentially, we sometimes notice that requests go to the same pod as an ongoing request. What can happen is that this new request ends up getting "queued" in the server-side (not fully "queued", some progress gets made when context switches happen). The issue with queueing like this is that:
The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap from API GW).
The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s which are expected from our hard 30s cap.
What's strange is that we have other pods available, but for some reason the loadbalancer isn't routing it to those pods smartly. It's possible that linkerd's smarter load balancing doesn't work well for our high latency situation (we need to look into this further, however that will require a big overhaul to our system).
One thing I wanted to try doing is to stop this queuing up of requests. I want the service to immediately reject the request if one is already in progress, and have the client (meaning the golang service) retry. The client retry will hopefully hit a different pod (do let me know if that won’t happen). In order to do this, I set the "maximum_concurrent_rpcs" to 1 on the server-side (Python server). When i sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions (even under the condition when there is only 1 server pod). What I do notice is that the requests are no longer happening in parallel on the server, they happen sequentially (I think that’s a step in the right direction, the first request doesn’t get starved). That being said, I’m not seeing the RESOURCE_EXHAUSTED error in golang. I do see a delay between the entry time in the golang client and the entry time in the Python service. My guess is that the queuing is now happening client-side (or potentially still server side, but it’s not visible to me)?
I then saw online that it may be possible for requests to get queued up on the client-side as a default behavior in http/2. I tried to test this out in custom Python client that mimics the golang one with:
channel = grpc.insecure_channel(
"<some address>",
options=[("grpc.max_concurrent_streams", 1)]
)
# create stub to server with channel…
However, I'm not seeing any change here either. (Note, this is a test dummy client - eventually i'll need to make this run in golang. Any help there would be appreciated as well).
Questions:
How can I get the desired effect here? Meaning server sends resource exhausted if already handling a request, golang client retries, and it hits a different pod?
Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!

Microservices asynchronous response

I come across many blog that say using rabbitmq improve the performance of microservices due to asynchronous nature of rabbitmq.
I don't understand in that case how the the http response is send to end user I am elaborating my question below more clearly.
user send a http request to microservice1(which is user facing service)
microservice1 send it to rabbitmq because it need some service from microservice2
microservice2 receive the request process it and send the response to rabbitmq
microservice1 receive the response from rabbitmq
NOW how this response is send to browser?
Does microservice1 waits untill it receive the response from rabbitmq?
If yes then how it become aynchronous??
It's a good question. To answer, you have to imagine the server running one thread at a time. Making a request to a microservice via RestTemplate is a blocking request. The user clicks a button on the web page, which triggers your spring-boot method in microservice1. In that method, you make a request to microservice2, and the microservice1 does a blocking wait for the response.
That thread is busy waiting for microservice2 to complete the request. Threads are not expensive, but on a very busy server, they can be a limiting factor.
RabbitMQ allows microservice1 to queue up a message to microservice2, and then release the thread. Your receive message will be trigger by the system (spring-boot / RabbitMQ) when microservice2 processes the message and provides a response. That thread in the thread pool can be used to process other users' requests in the meantime. When the RabbitMQ response comes, the thread pool uses an unused thread to process the remainder of the request.
Effectively, you're making the server running microservice1 have more threads available more of the time. It only becomes a problem when the server is under heavy load.
Good question , lets discuss one by one
Synchronous behavior:
Client send HTTP or any request and waits for the response HTTP.
Asynchronous behavior:
Client sends the request, There's another thread that is waiting on the socket for the response. Once response arrives, the original sender is notified (usually, using a callback like structure).
Now we can talk about blocking vs nonblocking call
When you are using spring rest then each call will initiate new thread and waiting for response and block your network , while nonblocking call all call going via single thread and pushback will return response without blocking network.
Now come to your question
Using rabbitmq improve the performance of microservices due to
asynchronous nature of rabbitmq.
No , performance is depends on your TPS hit and rabbitmq not going to improve performance .
Messaging give you two different type of messaging model
Synchronous messaging
Asynchronous messaging
Using Messaging you will get loose coupling and fault tolerance .
If your application need blocking call like response is needed else cannot move use Rest
If you can work without getting response go ahaead with non blocking
If you want to design your app loose couple go with messaging.
In short above all are architecture style how you want to architect your application , performance depends on scalability .
You can combine your app with rest and messaging and non-blocking with messaging.
In your scenario microservice 1 could be rest blocking call give call other api using rest template or web client and or messaging queue and once get response will return rest json call to your web app.
I would take another look at your architecture. In general, with microservices - especially user-facing ones that must be essentially synchronous, it's an anti-pattern to have ServiceA have to make a call to ServiceB (which may, in turn, call ServiceC and so on...) to return a response. That condition indicates those services are tightly coupled which makes them fragile. For example: if ServiceB goes down or is overloaded in your example, ServiceA also goes offline due to no fault of its own. So, probably one or more of the following should occur:
Deploy the related services behind a facade that encloses the entire domain - let the client interact synchronously with the facade and let the facade handle talking to multiple services behind the scenes.
Use MQTT or AMQP to publish data as it gets added/changed in ServiceB and have ServiceA subscribe to pick up what it needs so that it can fulfill the user request without explicitly calling another service
Consider merging ServiceA and ServiceB into a single service that can handle requests without having to make external calls
You can also send the HTTP request from the client to the service, set the application-state to waiting or similar, and have the consuming application subscribe to a eventSuccess or eventFail integration message from the bus. The main point of this idea is that you let daisy-chained services (which, again, I don't like) take their turns and whichever service "finishes" the job publishes an integration event to let anyone who's listening know. You can even do things like pass webhook URI's with the initial request to have services call the app back directly on completion (or use SignalR, or gRPC, or...)
The way we use RabbitMQ is to integrate services in real-time so that each service always has the info it needs to be responsive all by itself. To use your example, in our world ServiceB publishes events when data changes. ServiceA only cares about, and subscribes to a small subset of those events (and typically only a field or two of the event data), but it knows within seconds (usually less) when B has changed and it has all the information it needs to respond to requests. Each service literally has no idea what other services exist, it just knows events that it cares about (and that conform to a contract) arrive from time-to-time and it needs to pay attention to them.
You could also use events and make the whole flow async. In this scenario microservice1 creates an event representing the user request and then return a requested created response immediately to the user. You can then notify the user later when the request is finished processing.
I recommend the book Designing Event-Driven Systems written by Ben Stopford.
I asked a similar question to Chris Richardson (www.microservices.io). The result was:
Option 1
You use something like websockets, so the microservice1 can send the response, when it's done.
Option 2
microservice1 responds immediately (OK - request accepted). The client pulls from the server repeatedly until the state changed. Important is that microservice1 stores some state about the request (ie. initial state "accepted", so the client can show the spinner) which is modified, when you finally receive the response (ie. update state to "complete").

Front-facing REST API with an internal message queue?

I have created a REST API - in a few words, my client hits a particular URL and she gets back a JSON response.
Internally, quite a complicated process starts when the URL is hit, and there are various services involved as a microservice architecture is being used.
I was observing some performance bottlenecks and decided to switch to a message queue system. The idea is that now, once the user hits the URL, a request is published on internal message queue waiting for it to be consumed. This consumer will process and publish back on a queue and this will happen quite a few times until finally, the same node servicing the user will receive back the processed response to be delivered to the user.
An asynchronous "fire-and-forget" pattern is now being used. But my question is, how can the node servicing a particular person remember who it was servicing once the processed result arrives back and without blocking (i.e. it can handle several requests until the response is received)? If it makes any difference, my stack looks a little like this: TomCat, Spring, Kubernetes and RabbitMQ.
In summary, how can the request node (whose job is to push items on the queue) maintain an open connection with the client who requested a JSON response (i.e. client is waiting for JSON response) and receive back the data of the correct client?
You have few different scenarios according to how much control you have on the client.
If the client behaviour cannot be changed, you will have to keep the session open until the request has not been fully processed. This can be achieved employing a pool of workers (futures/coroutines, threads or processes) where each worker keeps the session open for a given request.
This method has few drawbacks and I would keep it as last resort. Firstly, you will only be able to serve a limited amount of concurrent requests proportional to your pool size. Lastly as your processing is behind a queue, your front-end won't be able to estimate how long it will take for a task to complete. This means you will have to deal with long lasting sessions which are prone to fail (what if the user gives up?).
If the client behaviour can be changed, the most common approach is to use a fully asynchronous flow. When the client initiates a request, it is placed within the queue and a Task Identifier is returned. The client can use the given TaskId to poll for status updates. Each time the client requests updates about a task you simply check if it was completed and you respond accordingly. A common pattern when a task is still in progress is to let the front-end return to the client the estimated amount of time before trying again. This allows your server to control how frequently clients are polling. If your architecture supports it, you can go the extra mile and provide information about the progress as well.
Example response when task is in progress:
{"status": "in_progress",
"retry_after_seconds": 30,
"progress": "30%"}
A more complex yet elegant solution would consist in using HTTP callbacks. In short, when the client makes a request for a new task it provides a tuple (URL, Method) the server can use to signal the processing is done. It then waits for the server to send the signal to the given URL. You can see a better explanation here. In most of the cases this solution is overkill. Yet I think it's worth to mention it.
One option would be to use DeferredResult provided by spring but that means you need to maintain some pool of threads in request serving node and max no. of active threads will decide the throughput of your system. For more details on how to implement DeferredResult refer this link https://www.baeldung.com/spring-deferred-result

How to manage microservice failure?

Let's say, I have several micro-services (REST API), the problem is, if one service is not accessible (let's call service "A" ) the data which was sending to service "A" will be saved in temporary database. And after service worked, the data will be sent again.
Question:
1. Should I create the service which pings to service "A" in every 10 seconds to know service works or not? Or is it possible to do it by task queue? Any suggestions?
Polling is a waste of bandwidth. You want to use a transactional queue.
Throw all your outbound messages in the queue, and have some other process to handle the messages.
How this will work is - after your process reads from the queue, and tries to send to the REST service:
If it works, commit the transaction (for the queue)
If it doesn't work, don't commit. Start a delay (minutes, seconds - you know best) until you read from the queue again.
You can use Circuit Breaker pattern for e.g. hystrix circuit breaker from netflix.
It is possible to open circuit-breaker base on a timeout or when service call fails or inaccessible.
There are multiple dimensions to your question. First you want to consider using an infrastructure that provides resilience and self healing. Meaning you want to deploy a cluster of containers, all containing your Service A. Now you use a load balancer or API gateway in front of your service to distribute calls/load. It will also periodically check for the health of your service. When it detects a container does not respond correctly it can kill the container and start another one. This can be provided by a container infrastructure such as kubernetes / docker swarm etc.
Now this does not protect you from losing any requests. In the event that a container malfunctions there will still be a short time between the failure and the next health check where requests may not be served. In many applications this is acceptable and the client side will just re-request and hit another (healthy container). If your application requires absolutely not losing requests you will have to cache the request in for example an API gateway and make sure it is kept until a Service has completed it (also called Circuit Breaker). An example technology would be Netflix Zuul with Hystrix. Using such a Gatekeeper with built in fault tolerance can increase the resiliency even further. As a side note - Using an API gateway can also solve issues with central authentication/authorization, routing and monitoring.
Another approach to add resilience / decouple is to use a fast streaming / message queue, such as Apache Kafka, for recording all incoming messages and have a message processor process them whenever ready. The trick then is to only mark the messages as processed when your request was served fully. This can also help in scenarios where faults can occur due to large number of requests that cannot be handled in real time by the Service (Asynchronous Decoupling with Cache).
Service "A" should fire a "ready" event when it becomes available. Just listen to that and resend your request.

Resources