MassTransit Mediator MessageNotConsumedException - masstransit

I noticed a weird issue in one of our applications: from time to time, we get MessageNotConsumedException errors on API requests that we route via MT's Mediator.
As you will see below, we have configured a custom LogFilter<T>, which implements IFilter<ConsumeContext<T>>. It ensures that we log each mediator message before and after consuming, or write a 'ConsumeFailure' log if any consumer throws an exception.
When the error occurs, we see the following sequence of events in the logs:
T 0 : PreConsume logged
T +5ms: PostConsume logged
T +6ms: R-FAULT logged (I believe this logging is made by MT's internals?)
T +9ms: API Request 500 response logged, with `MessageNotConsumedException` as internal error
In the production environment, we see these errors with various timings: they happen in requests taking as 'little' as 9ms, through several seconds, up to 30+ seconds.
I've tried to reproduce this problem in my local development environment and did manage to produce the same sequence of events, but only by adding a delay of 35 seconds inside the consumer (see the GetSomethingByIdHandler class below for the consumer body).
If I reduce the delay to 30s or less, the response is fine.
Since the production errors are happening with very low handling times in the consumer, I suspect what I'm able to reproduce is not exactly the same.
However I'd still like to understand why I'm getting the MessageNotConsumedException, since while debugging I can easily step through my entire consumer (after the delay has elapsed) and happily reach the context.RespondAsync() call without any problems. Also while stepping through the consumer, the context.CancellationToken has not been cancelled.
I also came across this question, which sounds exactly like what I'm having, however I did add the HttpContext scope as documented. To be fair, I didn't try this change in production yet, but my local issue with the 35s delay remains unchanged.
I have the MassTransit mediator configured as follows:
services.AddHttpContextAccessor();
services.AddMediator(x =>
{
    x.AddConsumer<GetSomethingByIdHandler>();
    x.ConfigureMediator((context, cfg) =>
    {
        // The order of the middleware matters, so don't change this
        cfg.UseHttpContextScopeFilter(context); // Extension method & friends copy/pasted from https://masstransit-project.com/usage/mediator.html#http-context-scope
        cfg.UseConsumeFilter(typeof(LogFilter<>), context);
    });
});
The LogFilter which is configured is the following class:
public class LogFilter<T> : IFilter<ConsumeContext<T>> where T : class
{
private readonly ILogger<LogFilter<T>> _logger;
public LogFilter(ILogger<LogFilter<T>> logger)
{
_logger = logger;
}
public void Probe(ProbeContext context) => context.CreateScope("log-filter");
public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
{
LogPreConsume(context);
try
{
await next.Send(context);
}
catch (Exception exception)
{
LogConsumeException(context, exception);
throw;
}
LogPostConsume(context);
}
private void LogPreConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}",
typeof(T).Name,
"PreConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime());
private void LogPostConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}",
typeof(T).Name,
"PostConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime);
private void LogConsumeException(ConsumeContext<T> context, Exception exception) => _logger.LogError(exception,
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with sent time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}"
+ " and message {@message}",
typeof(T).Name,
"ConsumeFailure",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime,
context.Message);
}
I then have a controller method which looks like this:
[Route("[controller]")]
[ApiController]
public class SomethingController : ControllerBase
{
private readonly IMediator _mediator;
public SomethingController(IMediator mediator)
{
_mediator = mediator;
}
[HttpGet("{somethingId}")]
public async Task<IActionResult> GetSomething([FromRoute] int somethingId, CancellationToken ct)
{
var query = new GetSomethingByIdQuery(somethingId);
var response = await _mediator
.CreateRequestClient<GetSomethingByIdQuery>()
.GetResponse<Something>(query, ct);
return Ok(response.Message);
}
}
The consumer which handles this request is as follows:
public record GetSomethingByIdQuery(int SomethingId);
public class GetSomethingByIdHandler : IConsumer<GetSomethingByIdQuery>
{
public async Task Consume(ConsumeContext<GetSomethingByIdQuery> context)
{
await Task.Delay(35000, context.CancellationToken);
await context.RespondAsync(new Something{Name = "Something cool"});
}
}

MessageNotConsumedException is thrown when a message is sent using mediator and that message is not consumed by a consumer. That wouldn't typically be a transient error since one would expect that the consumer remains configured/connected to the mediator for the lifetime of the application.

Immediately return first emitted value from two Monos while continuing to process the other asynchronously

I have two data sources, each returning a Mono:
class CacheCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
class MasterCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
Callers to my application are hitting a Spring WebFlux controller:
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
public Flux<Entity> createCustomer(@RequestBody Customer customer) {
    return customerService.createNewCustomer(customer);
}
As long as either data source successfully completes its create operation, I want to immediately return a success response to the caller. However, I still want my service to continue processing the result of the other Mono, so that any error it encounters can be logged.
The problem seems to be that as soon as a value is returned to the controller, a cancel signal is propagated back through the stream by Spring WebFlux and, thus, no information is logged about a failure.
Here's one attempt:
public Flux<Entity> createCustomer(final Customer customer) {
var cacheCreate = cacheClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in cache"));
var masterCreate = masterClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in master"));
return Flux.firstWithValue(cacheCreate, masterCreate)
.onErrorMap((err) -> new Exception("Customer creation failed in cache and master"));
}
Flux.firstWithValue() is great for emitting the first non-error value, but then whichever source is lagging behind is cancelled, meaning that any error is never logged out. I've also tried scheduling these two sources on their own Schedulers and that didn't seem to help either.
How can I perform these two calls asynchronously, and emit the first value to the caller, while continuing to listen for emissions on the slower source?
You can achieve that by transforming your operators into "hot" publishers using the share() operator.
The first subscriber launches the upstream operator, and additional subscribers get back the result cached from the first subscriber:
Further Subscriber will share [...] the same result.
Once a second subscription has been done, the publisher is not cancellable:
It's worth noting this is an un-cancellable Subscription.
So, to achieve your requirement:
1. Apply share() on each of your operators.
2. Launch a subscription on the shared publishers to trigger processing.
3. Use the shared operators in your pipeline (here firstWithValue).
Example:
import java.time.Duration;
import reactor.core.publisher.Mono;
public class TestUncancellableMono {
// Mock a Mono that succeeds quickly
static Mono<String> quickSuccess() {
return Mono.delay(Duration.ofMillis(200)).thenReturn("SUCCESS !");
}
// Mock a mono taking more time and ending in error.
static Mono<String> longError() {
return Mono.delay(Duration.ofSeconds(1))
.<String>then(Mono.error(new Exception("ERROR !")))
.doOnCancel(() -> System.out.println("CANCELLED"))
.doOnError(err -> System.out.println(err.getMessage()));
}
public static void main(String[] args) throws Exception {
// Transform to hot publisher
var sharedQuick = quickSuccess().share();
var sharedLong = longError().share();
// Trigger launch
sharedQuick.subscribe();
sharedLong.subscribe();
// Subscribe back to get the cached result
Mono
.firstWithValue(sharedQuick, sharedLong)
.subscribe(System.out::println, err -> System.out.println(err.getMessage()));
// Wait for subscription to end.
Thread.sleep(2000);
}
}
The output of the sample is:
SUCCESS !
ERROR !
We can see that error message has been propagated properly, and that upstream publisher has not been cancelled.
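For readers without Reactor on the classpath, the same idea (start both sources eagerly, attach logging to each source rather than to the combined result, and hand the caller the first success) can be sketched with the JDK's CompletableFuture. This is not the share() technique itself, only a rough analogue: FirstSuccessSketch, firstSuccessful and source are hypothetical names made up for this illustration, and the delays are arbitrary.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FirstSuccessSketch {
    /** Completes with the first successful value; fails only if every source fails. */
    static <T> CompletableFuture<T> firstSuccessful(List<CompletableFuture<T>> sources) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        for (CompletableFuture<T> source : sources) {
            source.whenComplete((value, error) -> {
                if (error == null) {
                    result.complete(value); // no-op if another source already won
                } else if (failures.incrementAndGet() == sources.size()) {
                    result.completeExceptionally(
                        new IllegalStateException("all sources failed", error));
                }
            });
        }
        return result;
    }

    /** Hypothetical source that succeeds or fails after a delay. */
    static CompletableFuture<String> source(String name, long delayMs, boolean fail) {
        return CompletableFuture.supplyAsync(() -> {
            if (fail) throw new IllegalStateException(name + " failed");
            return name + " ok";
        }, CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    public static void main(String[] args) {
        CompletableFuture<String> cache = source("cache", 50, false);
        CompletableFuture<String> master = source("master", 300, true);
        // Logging hangs off the source itself, so it still fires after a winner is chosen.
        CompletableFuture<Void> masterLogged = master.handle((value, error) -> {
            if (error != null) System.out.println("logged: " + error.getMessage());
            return null;
        });
        System.out.println(firstSuccessful(List.of(cache, master)).join());
        masterLogged.join(); // only so the demo waits for the slow source
    }
}
```

Because nothing in firstSuccessful cancels the losing future, the slower source runs to completion and its own logging callback still sees the error, which is the behaviour the question asks for.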

How to lock async operation\limit number of executions in a web-api controller

I have a web-api endpoint whose execution I would like to control according to an internal async operation. The objective of this endpoint is to limit the reconnection attempts to an external resource.
In the controller I have a singleton service with a counter property, which counts every reconnect attempt.
The problem is that the time between HTTP requests (let's say 1 second) is smaller than the duration of the inner async operation (let's say 10 seconds), so different threads start the async operation while the counter has not yet been incremented.
The outcome of this code is lots of:
"Trying to connect, number of attempts: 0" entries in the logs.
I thought of using lock on the code block, but an await operation cannot be inside the body of a lock.
See my code:
public class SomeController : ControllerBase
{
private static readonly object LockObject = new object();
private readonly ISomeSingletonService _someSingletonService;
public SomeController(ISomeSingletonService someSingletonService)
{
_someSingletonService = someSingletonService;
}
[HttpPost("Connect")] // This is reached every second
public async Task Connect()
{
/*lock (LockObject) //can't do that because await cannot be in the body of lock
{*/
logger.LogInformation($"Trying to connect, number of attempts: {_someSingletonService.ReconnectionsAttempts}");
if (_someSingletonService.ReconnectionsAttempts < maxReconnectionsAttempts)
{
await someAsyncOeration(); // This operation can last few seconds
_someSingletonService.ReconnectionsAttempts++;
}
/*}*/
}
}

Pub Sub Messages still in queue but not pulled

I have a simple shell script that connects to GCP and tries to pull Pub/Sub messages from a topic.
When launched, it checks whether any message exists, performs a simple action if so, then acks the message and loops.
It looks like this:
while true
do
    gcloud pubsub subscriptions pull...
    # Do something
    gcloud pubsub subscriptions ack ...
done
Randomly, it does not pull the messages: they stay in the queue.
So we tried wrapping the pull in a retry loop (about 5 retries) to avoid those issues. It works better but not perfectly, and I also think it is a bit shabby.
This issue also happened on other projects, which were migrated from a shell script to Java (for other reasons); there we used a pull subscription, and it now works perfectly.
We are probably doing something wrong, but I don't know what.
I have read that gcloud sometimes pulls fewer messages than are really in the Pub/Sub queue:
https://cloud.google.com/sdk/gcloud/reference/pubsub/subscriptions/pull
But it should at least pull one. In our case, at random times, no messages are pulled at all.
Is there something to improve here?
In general, relying on a shell script that uses gcloud to retrieve messages and do something with them is not going to be an efficient way to use Cloud Pub/Sub. It is worth noting that the lack of messages being returned in pull is not indicative of a lack of messages; it just means that messages could not be returned before the pull request's deadline. The gcloud subscriptions pull command sets the returnImmediately property (see info in pull documentation) to true, which basically means that if there aren't messages already quickly accessible in memory, then no messages are going to be returned. This property is deprecated and should not be set to true, so that is probably something that we need to explore changing in gcloud.
You would be better off writing a subscriber using the client libraries that sets up a stream and continuously retrieves messages. If your intention is to run this only periodically, then you could write a job that reads messages and waits some time after messages have not been received and shuts down. Again, this would not guarantee that all messages would be consumed that are available, but it would be true in most cases.
A version of this in Java would look like this:
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.atomic.AtomicLong;
import org.joda.time.DateTime;
/** A basic Pub/Sub subscriber for purposes of demonstrating use of the API. */
public class Subscriber implements MessageReceiver {
private final String PROJECT_NAME = "my-project";
private final String SUBSCRIPTION_NAME = "my-subscription";
private com.google.cloud.pubsub.v1.Subscriber subscriber;
private AtomicLong lastReceivedTimestamp = new AtomicLong(0);
private Subscriber() {
ProjectSubscriptionName subscription =
ProjectSubscriptionName.of(PROJECT_NAME, SUBSCRIPTION_NAME);
com.google.cloud.pubsub.v1.Subscriber.Builder builder =
com.google.cloud.pubsub.v1.Subscriber.newBuilder(subscription, this);
try {
this.subscriber = builder.build();
} catch (Exception e) {
System.out.println("Could not create subscriber: " + e);
System.exit(1);
}
}
@Override
public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
// Process message
lastReceivedTimestamp.set(DateTime.now().getMillis());
consumer.ack();
}
private void run() {
subscriber.startAsync();
while (true) {
long now = DateTime.now().getMillis();
long currentReceived = lastReceivedTimestamp.get();
if (currentReceived > 0 && ((now - currentReceived) > 30000)) {
subscriber.stopAsync();
break;
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
System.out.println("Error while waiting for completion: " + e);
}
}
System.out.println("Subscriber has not received message in 30s. Stopping.");
subscriber.awaitTerminated();
}
public static void main(String[] args) {
Subscriber s = new Subscriber();
s.run();
System.exit(0);
}
}

How to use HttpContext inside Task.Run

There are some posts that explain how to tackle this, but they couldn't help me much.
I am logging the request/response in middleware. It works when I use 'await' with Task.Run(), but since that waits for the logging to complete, there is a performance issue.
When I remove the await, as below, it runs fast but logs nothing, since the HttpContext instance is not available inside the parallel thread.
public class LoggingHandlerMiddleware
{
private readonly RequestDelegate next;
private readonly ILoggerManager _loggerManager;
public LoggingHandlerMiddleware(RequestDelegate next, ILoggerManager loggerManager)
{
this.next = next;
_loggerManager = loggerManager;
}
public async Task Invoke(HttpContext context, ILoggerManager loggerManager, IWebHostEnvironment environment)
{
_ = Task.Run(() =>
{
AdvanceLoggingAsync(context, _loggerManager, environment);
});
...
}
private async Task AdvanceLoggingAsync(HttpContext context, ILoggerManager loggerManager, IWebHostEnvironment environment, bool IsResponse = false)
{
    var result = string.Empty;
    context.Request.EnableBuffering(); // Throws ExecutionContext.cs not found
    result += $"ContentType:{context.Request.ContentType},";
    using (StreamReader reader = new StreamReader(context.Request.Body, Encoding.UTF8, true, 1024, true))
    {
        result += $"Body:{await reader.ReadToEndAsync()}";
        context.Request.Body.Position = 0;
    }
    loggerManager.LogInfo($"Advance Logging Content(Request)-> {result}");
}
}
How can I leverage Task.Run() performance with accessing HttpContext?
Well, you can extract what you need from the context, build the string you want to log, and then pass that string to the task you run.
However, firing and forgetting a task is not good. If it throws an exception, you risk bringing down the server, or at least you will have a very hard time getting information about the error.
If you are concerned about the logging performance, it is better to add what you need to log to a message queue and have a process that responds to new messages in the queue and writes them to the log file.

Elasticsearch : AssertionError while getting index name from alias

We have been using an Elasticsearch plugin in our project. While getting the index name from an alias, we get the error below.
Error
{
"error": "AssertionError[Expected current thread[Thread[elasticsearch[Seth][http_server_worker][T#2]{New I/O worker #20},5,main]] to not be a transport thread. Reason: [Blocking operation]]", "status": 500
}
Code
String realIndex = client.admin().cluster().prepareState()
.execute().actionGet().getState().getMetaData()
.aliases().get(aliasName).iterator()
.next().key;
What causes this issue? I googled it but didn't find any help.
From the look of the error, this operation is not allowed on the transport thread, because it would block that thread until the result comes back. You need to run it without blocking the transport thread.
public String getIndexName() {
    // Holds the index name. The anonymous listener needs a final reference, hence the holder.
    final IndexNameHolder result = new IndexNameHolder();
    getTransportClient().admin().cluster().prepareState().execute(new ActionListener<ClusterStateResponse>() {
        @Override
        public void onResponse(ClusterStateResponse response) {
            result.indexName = response.getState().getMetaData().aliases().get("alias").iterator().next().key;
        }
        @Override
        public void onFailure(Throwable e) {
            // Handle failures
        }
    });
    // Note: the listener is invoked asynchronously, so indexName may still be unset at this point.
    return result.indexName;
}
There is another overload of execute() that takes a listener. You need to implement your own listener; in my answer, I use an anonymous implementation of ActionListener.
I hope it helps.
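One caveat with the listener approach: the callback runs later, on another thread, so a method that returns right after registering it cannot count on the holder being filled. A non-blocking alternative is to give the caller a future that the listener completes. The sketch below is self-contained and uses no Elasticsearch classes: Listener and lookupAlias are hypothetical stand-ins for ActionListener and the execute(listener) call, and the "_v1" index name is made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AliasLookupSketch {
    /** Hypothetical stand-in for a callback interface such as ActionListener. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Throwable e);
    }

    /** Hypothetical async API: resolves an alias on a worker thread, then calls the listener. */
    static void lookupAlias(String alias, ExecutorService worker, Listener<String> listener) {
        worker.submit(() -> {
            if (alias.isEmpty()) {
                listener.onFailure(new IllegalArgumentException("empty alias"));
            } else {
                listener.onResponse(alias + "_v1"); // made-up index name for the demo
            }
        });
    }

    /** Bridges the callback API to a future instead of returning a half-filled holder. */
    static CompletableFuture<String> getIndexName(String alias, ExecutorService worker) {
        CompletableFuture<String> future = new CompletableFuture<>();
        lookupAlias(alias, worker, new Listener<String>() {
            @Override
            public void onResponse(String response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Throwable e) {
                future.completeExceptionally(e);
            }
        });
        return future; // returned immediately; completed later by the listener
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // The continuation runs once the listener fires; getIndexName itself never blocks.
            getIndexName("customers", worker)
                .thenAccept(index -> System.out.println("Resolved index: " + index))
                .join(); // join() only so the demo does not exit early
        } finally {
            worker.shutdown();
        }
    }
}
```

In a real handler the join() would be dropped and the response written from inside the continuation, so the calling thread is never parked waiting for the cluster state.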
