ReplyingKafkaTemplate timeout in Spring-Kafka - spring-boot

I very regularly experience timeouts when using ReplyingKafkaTemplate and the request/reply pattern for synchronous calls. It has been suggested that this is caused by the consumers, and that appears to be correct, because what resolves it is renaming the consumer group on the calling side of the request/reply and redeploying the service. This is not an optimal solution, however, and I would like to prevent the error from occurring in the first place. To recap: Service A makes a synchronous request to Service B, and after a while, apparently due to some kind of staleness, the requests start timing out. The only fix I have found is to rename the consumer group in Service A and redeploy, but I need a better resolution. Any suggestions?
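For reference, the calling side in Service A is wired up roughly like this; the topic names, group id, timeout, and class/method names below are placeholders rather than our real configuration:

import java.time.Duration;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.requestreply.ReplyingKafkaTemplate;
import org.springframework.kafka.requestreply.RequestReplyFuture;

public class ServiceAKafkaConfig {

    @Bean
    public ReplyingKafkaTemplate<String, String, String> replyingTemplate(
            ProducerFactory<String, String> pf,
            ConcurrentKafkaListenerContainerFactory<String, String> factory) {
        ConcurrentMessageListenerContainer<String, String> replies =
                factory.createContainer("service-b-replies");                  // reply topic (placeholder)
        replies.getContainerProperties().setGroupId("service-a-reply-group");  // the group I keep renaming
        ReplyingKafkaTemplate<String, String, String> template =
                new ReplyingKafkaTemplate<>(pf, replies);
        template.setDefaultReplyTimeout(Duration.ofSeconds(10));               // Spring Kafka 2.3+
        return template;
    }

    // The synchronous call that eventually starts timing out:
    public String callServiceB(ReplyingKafkaTemplate<String, String, String> template,
                               String payload) throws Exception {
        RequestReplyFuture<String, String, String> future =
                template.sendAndReceive(new ProducerRecord<>("service-b-requests", payload));
        ConsumerRecord<String, String> reply = future.get(10, TimeUnit.SECONDS); // times out here
        return reply.value();
    }
}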

Related

Spring Cloud Data Flow (SCDF) : Http Client Processor app's retry mechanism causing issues during retry

I'm using the Http Client Processor app, one of the standard apps of SCDF, in one of my streams. The purpose of this app is to make HTTP calls against a provided URL with a message payload. I tried to enable the retry mechanism of this app by setting the boolean httpclient.retry.enabled to 'true'.
But when I do that, it tries to repost the message to the HTTP endpoint even if the first attempt is successful. It looks like it is working on the principle of 'write at least once'. The problem with this approach is that it creates duplicates in the target system.
Is there a way to configure it to 'write just once if the call is successful, otherwise retry'? If not, can we expect a fix from Spring?
The Http Client Processor is no longer supported. I recommend upgrading to HttpRequestProcessor, which uses the common retry mechanism included in the messaging binder. The behavior is as you describe: the request will be retried only if the consumer fails to acknowledge the message. With at-least-once guarantees, you still have the potential for duplicates.

Using transactional bus inside consumer

I have a REST API gateway which calls one of the microservices with the MassTransit request client. This request is not durable and is meant to live for a short time - essentially it is just a replacement for "traditional" synchronous (HTTP/gRPC/etc.) gateway-to-microservice communication.
On the microservice side I have a consumer which under the hood uses a DbContext and a transaction (EF Core) to perform some work in the database. After the work is done, it should publish a "WorkDoneEvent" (to be consumed later by other microservices) and return the result of the work to the API gateway. The event must be published atomically along with the transaction used to perform the work. It does not matter whether the API gateway receives the response or retries the request - as soon as the transaction is committed, both the work result and the publication of "WorkDoneEvent" must be guaranteed.
As far as I know, this is normally done with a transactional outbox, which first saves the published event to the database within the same transaction as the work, and then some process constantly "polls" the outbox, tries to send the message to the broker, and removes the message from the outbox once it succeeds.
MassTransit seems to have transactional outbox built in: https://masstransit-project.com/advanced/middleware/transactions.html#transactional-bus.
However, the docs clearly state:
Never use the TransactionalBus or TransactionalEnlistmentBus when writing consumers. These tools are very specific and should be used only in the scenarios described.
And this is exactly what I want to do...
Why should I not do it?
I'd suggest using the InMemoryOutbox, which is part of MassTransit. It's significantly lighter weight, is designed to work in a consumer, and will not publish your events until after the consumer has completed (but prior to acknowledging the message at the broker). The only consideration is that your consumer should be idempotent (which needs to be the case in your approach as well) and if the operation was already performed on a retry, it should republish the events.
There are videos, articles, and a sample to go along with it.

How to manage microservice failure?

Let's say I have several microservices (REST APIs). The requirement is: if one service (let's call it service "A") is not accessible, the data that was being sent to service "A" is saved in a temporary database, and after the service is working again, the data is sent again.
Question:
1. Should I create a service which pings service "A" every 10 seconds to know whether it is working or not? Or is it possible to do this with a task queue? Any suggestions?
Polling is a waste of bandwidth. You want to use a transactional queue.
Throw all your outbound messages in the queue, and have some other process to handle the messages.
How this will work is - after your process reads from the queue, and tries to send to the REST service:
If it works, commit the transaction (for the queue)
If it doesn't work, don't commit. Start a delay (minutes, seconds - you know best) until you read from the queue again.
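A rough sketch of that loop with a plain JMS transacted session - the queue name, the delay, and sendToRestService are illustrative placeholders:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

public class OutboundQueueDrainer {

    public void drainQueue(ConnectionFactory cf) throws Exception {
        Connection connection = cf.createConnection();
        connection.start();
        // Transacted session: nothing is removed from the queue until commit()
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageConsumer consumer = session.createConsumer(session.createQueue("outbound.requests"));
        while (true) {
            Message message = consumer.receive(5000);
            if (message == null) {
                continue;
            }
            try {
                sendToRestService(message);  // placeholder: the actual call to service "A"
                session.commit();            // it worked: remove the message from the queue
            } catch (Exception e) {
                session.rollback();          // it didn't: the message stays on the queue
                Thread.sleep(30_000);        // delay before reading from the queue again
            }
        }
    }

    // placeholder for the REST call to service "A"
    private void sendToRestService(Message message) throws Exception {
    }
}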
You can use the Circuit Breaker pattern, e.g. the Hystrix circuit breaker from Netflix.
It is possible to open the circuit breaker based on a timeout, or when the service call fails or the service is inaccessible.
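For instance, a minimal sketch of a Hystrix command wrapping the call to service "A" - the group key, callServiceA, and the fallback behaviour are illustrative:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class ServiceACommand extends HystrixCommand<String> {

    private final String payload;

    public ServiceACommand(String payload) {
        super(HystrixCommandGroupKey.Factory.asKey("ServiceA"));
        this.payload = payload;
    }

    @Override
    protected String run() throws Exception {
        // The actual REST call; a timeout or exception here counts as a failure,
        // and enough failures open the circuit so further calls skip straight to the fallback.
        return callServiceA(payload);
    }

    @Override
    protected String getFallback() {
        // Runs when the call fails or the circuit is open,
        // e.g. park the payload in the temporary database for a later retry.
        return "queued-for-retry";
    }

    // placeholder for the real HTTP call to service "A"
    private String callServiceA(String payload) throws Exception {
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}

It is then executed with new ServiceACommand(payload).execute().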
There are multiple dimensions to your question. First, you want to consider using an infrastructure that provides resilience and self-healing. Meaning you want to deploy a cluster of containers, all containing your Service A. Then you use a load balancer or API gateway in front of your service to distribute calls/load. It will also periodically check the health of your service; when it detects that a container does not respond correctly, it can kill the container and start another one. This can be provided by a container infrastructure such as Kubernetes or Docker Swarm.
Now, this does not protect you from losing requests. In the event that a container malfunctions, there will still be a short time between the failure and the next health check where requests may not be served. In many applications this is acceptable, and the client side will just re-request and hit another (healthy) container. If your application requires that absolutely no requests are lost, you will have to cache the request, for example in an API gateway, and make sure it is kept until a service has completed it (also called Circuit Breaker). An example technology would be Netflix Zuul with Hystrix. Using such a gatekeeper with built-in fault tolerance can increase the resiliency even further. As a side note, using an API gateway can also solve issues with central authentication/authorization, routing, and monitoring.
Another approach to adding resilience and decoupling is to use a fast streaming/message queue, such as Apache Kafka, to record all incoming messages and have a message processor process them whenever it is ready. The trick then is to only mark the messages as processed when your request has been served fully. This can also help in scenarios where faults occur because a large number of requests cannot be handled in real time by the service (asynchronous decoupling with a cache).
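A rough sketch of that idea with a plain Kafka consumer and manual offset commits - the broker address, topic, group id, and handleRequest are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RequestLogProcessor {

    public void processIncomingRequests() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "request-processor");
        props.put("enable.auto.commit", "false");   // commit only after the request is fully served
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("incoming-requests"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    handleRequest(record.value());   // placeholder: serve the request fully
                }
                consumer.commitSync();               // only now are these messages marked as processed
            }
        }
    }

    // placeholder for handing the request to the actual service
    private void handleRequest(String payload) {
    }
}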
Service "A" should fire a "ready" event when it becomes available. Just listen to that and resend your request.

Commit blocks using spring-amqp and rabbitmq when disk_free_limit threshold is reached

We are using rabbitmq 3.0.1 on CentOS 6, and as a client Spring spring-rabbit version 1.1.2.RELEASE. (I know these aren't the latest versions, see later).
We send messages to rabbitmq via this client. These messages are initiated via an external REST call: someone else calls our web service, which updates the database and sends the AMQP message. I would like to be informed if rabbitmq blocks the client - for instance if the disk_free_limit threshold is reached.
Importantly, I would like to be informed in the same thread as that processing the web request, so that I can rollback the transaction.
Our web service can also update a database (within a transaction, obviously). Normally this works fine. However, under certain circumstances rabbitmq can block our web server - the most obvious being when the disk_free_limit is reached. This blocks the web server thread indefinitely. The external caller of the web service will obviously time out after a sensible period, but the thread in our web service doesn't - it stays around and keeps the resources, and importantly the transaction, open.
The web server thread is blocking because the channel is transactional. It isn't the initial message which blocks, it is the commit. I assume rabbitmq is blocking because it can't persist the message or something like that. The thread blocks until rabbitmq sends the commit-ok message back. The relevant code is deep within the rabbitmq client implementation, in com.rabbitmq.client.impl.ChannelN:
public Tx.CommitOk txCommit() throws IOException
{
    return (Tx.CommitOk) exnWrappingRpc(new Tx.Commit()).getMethod();
}
and this eventually calls the following method from com.rabbitmq.client.impl.AMQChannel
public T getReply() throws ShutdownSignalException
{
    return _blocker.uninterruptibleGetValue();
}
The preferable solution for this would be some sort of timeout on the txCommit - then I could throw an exception and fail the web service with a 500 or whatever. I can't find any way of doing this.
What I have found is:
addBlockedListener - this adds a listener for the notification rabbitmq sends when it becomes blocked. This is good, but the notification is handled by another thread, so I can't fail the web service from it. Using this I can at least log the fact that rabbitmq is blocked, through syslog or whatever (see the sketch below). However, this isn't available in the version we run - we would have to upgrade to the latest, which we would prefer not to do because of the testing it would imply.
setConnectionTimeout(int) - this sets the connection timeout for the initial connection to rabbitmq. This doesn't apply in my case, because rabbitmq is up and running and accepts the connection.
AmqpTemplate.setReplyTimeout() - as shown above, this reply timeout does not apply to the commit.
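For completeness, a minimal sketch of what the addBlockedListener approach looks like against the plain Java client - it needs a newer amqp-client than we currently run, and the host name is a placeholder:

import java.io.IOException;
import java.util.concurrent.TimeoutException;
import com.rabbitmq.client.BlockedListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class BlockedConnectionLogger {

    public static Connection connectWithBlockedLogging() throws IOException, TimeoutException {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit-host");   // placeholder
        Connection connection = factory.newConnection();
        connection.addBlockedListener(new BlockedListener() {
            public void handleBlocked(String reason) {
                // Runs on a client thread, not the web request thread,
                // so we can only log/alert here - the blocked commit stays blocked.
                System.err.println("RabbitMQ blocked the connection: " + reason);
            }
            public void handleUnblocked() {
                System.err.println("RabbitMQ unblocked the connection");
            }
        });
        return connection;
    }
}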
I fully understand that this situation (disk_free_limit threshold is breached) is a situation which should not occur in a production system. However, I would like to be able to cope nicely with this situation so that my application behaves nicely when one of its components (rabbitmq) has a problem.
So, what other options do I have? Is there any way, short of rewriting portions of the Spring AMQP client or removing the transactionality, of doing what I want?

Spring's JMS Design Question: Decouple processing of messages

I'm using a message listener based on Spring's DefaultMessageListenerContainer to process some messages from MQ. After I receive a message, I have to make a Web Service (WS) call. However, I don't want to do this in the onMessage method, because it would block onMessage until the WS invocation succeeds, and this introduces latency in dequeuing messages from the queue. How can I decouple the invocation of the web service by calling it outside of the onMessage method, without impacting the dequeuing of messages?
Thanks,
I think you might actually want to invoke the web service from your onMessage. Why do you want to dequeue messages quickly, then delay further processing? If you do what you're saying, you'd probably have to introduce another level of queueing, or some sort of temporary "holding" collection, which is redundant. The point of the queue is to hold messages, and your message listener will pull them off and process them as quickly as possible.
If you are looking for a way to maximize throughput on the queue, you might think about making it multi-threaded, so that you have multiple threads pulling messages off the queue to invoke the web service. You can easily do this by setting the "concurrentConsumers" configuration on the DefaultMessageListenerContainer. If you set concurrentConsumers to 5, you'll have 5 threads pulling messages off the queue to process. It does get tricky if you have to maintain ordering on the messages, but there may be solutions to that problem if that's the case.
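A rough sketch of that setting in Java config - the connection factory, queue name, and listener bean are placeholders:

import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;
import org.springframework.context.annotation.Bean;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class JmsConfig {

    @Bean
    public DefaultMessageListenerContainer listenerContainer(ConnectionFactory connectionFactory,
                                                             MessageListener webServiceInvokingListener) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("inbound.queue");   // placeholder queue name
        container.setMessageListener(webServiceInvokingListener);
        container.setConcurrentConsumers(5);             // five threads pulling messages off the queue
        return container;
    }
}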
I agree with the answer provided before me; however, I have seen use cases like this quite often in practice, so I'm adding my two cents. It can be valid in some cases that you don't want to do time-consuming work in your onMessage thread (which is pulling messages from the queue).
We have something similar in one workflow: if the user selects some XYZ option on the GUI, the server needs to connect to another external web service to get ABCD. In this case we do not make the call to the web service in the onMessage thread; we use a thread pool to dispatch and handle that call.
If something goes wrong during the web service call, we broadcast that to the GUI as a separate message; there is a request id which is preserved across messages so that the GUI can correlate the error messages. You can use an ExecutorService implementation to submit the task.
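A rough sketch of that approach - the pool size and callWebService are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.jms.Message;
import javax.jms.MessageListener;

public class DispatchingListener implements MessageListener {

    private final ExecutorService pool = Executors.newFixedThreadPool(10);

    @Override
    public void onMessage(Message message) {
        // Hand the slow web service call to the pool so the listener thread returns quickly.
        pool.submit(() -> callWebService(message));
    }

    private void callWebService(Message message) {
        // placeholder: invoke the external web service and, on error,
        // broadcast a separate message back to the GUI using the preserved request id
    }
}

One caveat: once onMessage returns, the message is considered consumed, so a failure inside the pooled task will not trigger a redelivery - the error broadcast back to the GUI has to cover that case.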
hope it helps.
