I'm new to MassTransit and have a question about how I should handle a failure to consume a message. Given the code below, I am consuming INotificationRequestContract messages. As you can see, the code will throw and never complete.
public class NotificationConsumerWorker : IConsumer<INotificationRequestContract>
{
    private readonly ILogger<NotificationConsumerWorker> _logger;
    private readonly INotificationCreator _notificationCreator;

    public NotificationConsumerWorker(ILogger<NotificationConsumerWorker> logger, INotificationCreator notificationCreator)
    {
        _logger = logger;
        _notificationCreator = notificationCreator;
    }

    public Task Consume(ConsumeContext<INotificationRequestContract> context)
    {
        try
        {
            throw new Exception("Horrible error");
        }
        catch (Exception e)
        {
            // >>>>> insert code here to put the message back for later consumption. <<<<<
            _logger.LogError(e, "Failed to consume message");
            throw;
        }
    }
}
How do I best handle a scenario like this, where consumption fails? In my specific case, this is likely to occur when a required external service is unavailable.
I can see two solutions:
Put the message back, or cancel the consumption, so that it will be tried again later.
Store the message locally in a database and wrap this in my own retry method (I would prefer not to, for simplicity's sake).
The exceptions section of the documentation provides sufficient guidance for dealing with consumer exceptions.
There are two retry approaches, which can be used in combination:
Message Retry, which waits in-process, while the message remains locked, before the next attempt. These delays should therefore be short, to deal with transient issues.
Message Redelivery, which delays the message using either the broker delayed delivery, or a message scheduler, so that it is redelivered to the receive endpoint at some point in the future.
Once all retry/redelivery attempts are exhausted, the message is moved to the _error queue.
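For reference, a minimal sketch of combining both approaches on a receive endpoint; the endpoint name and intervals are illustrative, and UseDelayedRedelivery assumes a message scheduler or the RabbitMQ delayed-exchange plugin is available:

using System;
using MassTransit;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddMassTransit(x =>
{
    x.AddConsumer<NotificationConsumerWorker>();
    x.UsingRabbitMq((context, cfg) =>
    {
        cfg.ReceiveEndpoint("notification-request", e =>
        {
            // Longer, delayed redeliveries for outages of the external service
            e.UseDelayedRedelivery(r => r.Intervals(
                TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(30)));
            // Short, in-process retries for transient failures
            e.UseMessageRetry(r => r.Interval(3, TimeSpan.FromSeconds(5)));
            e.ConfigureConsumer<NotificationConsumerWorker>(context);
        });
    });
});

With this in place, the consumer can simply throw (as in the question) and the pipeline takes care of retrying, redelivering, and eventually moving the message to the _error queue.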
I have two data sources, each returning a Mono:
class CacheCustomerClient {
    Mono<Entity> createCustomer(Customer customer)
}
class MasterCustomerClient {
    Mono<Entity> createCustomer(Customer customer)
}
Callers to my application are hitting a Spring WebFlux controller:
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
public Flux<Entity> createCustomer(@RequestBody Customer customer) {
    return customerService.createNewCustomer(customer);
}
As long as either data source successfully completes its create operation, I want to immediately return a success response to the caller; however, I still want my service to continue processing the result of the other Mono, so that any error it encounters can be logged.
The problem seems to be that as soon as a value is returned to the controller, a cancel signal is propagated back through the stream by Spring WebFlux and, thus, no information is logged about a failure.
Here's one attempt:
public Flux<Entity> createCustomer(final Customer customer) {
    var cacheCreate = cacheClient
        .createCustomer(customer)
        .doOnError(WebClientResponseException.class,
            err -> log.error("Customer creation failed in cache"));
    var masterCreate = masterClient
        .createCustomer(customer)
        .doOnError(WebClientResponseException.class,
            err -> log.error("Customer creation failed in master"));
    return Flux.firstWithValue(cacheCreate, masterCreate)
        .onErrorMap(err -> new Exception("Customer creation failed in cache and master"));
}
Flux.firstWithValue() is great for emitting the first non-error value, but whichever source is lagging behind is then cancelled, meaning any error it produces is never logged. I've also tried scheduling the two sources on their own Schedulers, which didn't seem to help either.
How can I perform these two calls asynchronously, and emit the first value to the caller, while continuing to listen for emissions on the slower source?
You can achieve this by transforming your publishers into "hot" publishers using the share() operator:
The first subscriber triggers the upstream publisher, and additional subscribers get the result cached from that first subscription:
Further Subscriber will share [...] the same result.
Once a second subscription has been made, the publisher is no longer cancellable:
It's worth noting this is an un-cancellable Subscription.
So, to achieve your requirement:
Apply share() to each of your publishers.
Launch a subscription on the shared publishers to trigger processing.
Use the shared publishers in your pipeline (here, firstWithValue).
Example:
import java.time.Duration;
import reactor.core.publisher.Mono;
public class TestUncancellableMono {
    // Mock a mono succeeding quickly.
    static Mono<String> quickSuccess() {
        return Mono.delay(Duration.ofMillis(200)).thenReturn("SUCCESS !");
    }

    // Mock a mono taking more time and ending in error.
    static Mono<String> longError() {
        return Mono.delay(Duration.ofSeconds(1))
                   .<String>then(Mono.error(new Exception("ERROR !")))
                   .doOnCancel(() -> System.out.println("CANCELLED"))
                   .doOnError(err -> System.out.println(err.getMessage()));
    }

    public static void main(String[] args) throws Exception {
        // Transform to hot publishers
        var sharedQuick = quickSuccess().share();
        var sharedLong = longError().share();

        // Trigger launch
        sharedQuick.subscribe();
        sharedLong.subscribe();

        // Subscribe back to get the cached result
        Mono.firstWithValue(sharedQuick, sharedLong)
            .subscribe(System.out::println, err -> System.out.println(err.getMessage()));

        // Wait for the subscriptions to end.
        Thread.sleep(2000);
    }
}
The output of the sample is:
SUCCESS !
ERROR !
We can see that the error message was propagated properly, and that the upstream publisher was not cancelled.
At first it works fine, but after a certain time (1-2 hours) it crashes with the following exception in the server logs:
ERROR 1 --- [-ignite-server%] : JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.IgniteDeploymentCheckedException: Failed to obtain deployment for class: com.event.audit.AuditEventListener$$Lambda$1484/0x0000000800a7ec40]]
public static void remoteListener(Ignite ignite) {
    // This optional local callback is called for each event notification
    // that passed the remote predicate listener.
    IgniteBiPredicate<UUID, CacheEvent> locLsnr = new IgniteBiPredicate<UUID, CacheEvent>() {
        @Override public boolean apply(UUID nodeId, CacheEvent evt) {
            System.out.println("Listener caught an event");
            // --- My custom code to persist the event in another cache
            return true; // continue listening
        }
    };
    IgnitePredicate<CacheEvent> remoteListener = cacheEvent -> {
        return true;
    };
    // Register event listeners on all nodes to listen for cache events.
    UUID lsnrId = ignite.events(ignite.cluster()).remoteListen(locLsnr, remoteListener,
        EVT_CACHE_OBJECT_PUT, EVT_CACHE_OBJECT_REMOVED);
}
As I understand it, you are trying to perform cache operations in the event listener:
// --- My custom code to persist the event in another cache
Event listeners are invoked while internal locks are held, so it is a bad idea to perform any other cache operations inside them. I suppose this could be the root cause of your issue.
Try changing your design: for example, you can add the caught event to a queue, then read from that queue in another thread and save the data into the other cache, as sketched below.
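A minimal sketch of that hand-off, assuming a single daemon consumer thread and an illustrative target cache name ("auditEvents"):

import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.events.CacheEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgniteBiPredicate;

public class QueuedAuditListener {
    private static final BlockingQueue<CacheEvent> QUEUE = new LinkedBlockingQueue<>();

    public static void register(Ignite ignite) {
        // The local callback only enqueues; no cache operations run under the event lock.
        IgniteBiPredicate<UUID, CacheEvent> locLsnr = (nodeId, evt) -> QUEUE.offer(evt);
        ignite.events(ignite.cluster()).remoteListen(locLsnr, evt -> true,
            EventType.EVT_CACHE_OBJECT_PUT, EventType.EVT_CACHE_OBJECT_REMOVED);

        // A separate thread drains the queue and performs the cache writes.
        Thread writer = new Thread(() -> {
            IgniteCache<UUID, CacheEvent> audit = ignite.getOrCreateCache("auditEvents");
            try {
                while (true) {
                    audit.put(UUID.randomUUID(), QUEUE.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }
}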
I have a simple shell script that connects to GCP and tries to pull Pub/Sub messages from a topic.
When launched, it checks whether any messages exist, performs a simple action if so, then acks the message and loops.
It looks like this:
while true
do
    gcloud pubsub subscriptions pull ...
    # Do something
    gcloud pubsub subscriptions ack ...
done
Randomly, it does not pull the messages: they stay in the queue and are not pulled.
So we added a retry loop (something like 5 attempts) around the pull, which works better but not perfectly. It also feels a bit shabby...
The same issue occurred on other projects that were migrated from shell scripts to Java (for other reasons); those use a pull subscription and now work perfectly!
We must be doing something wrong, but I don't know what...
I have read that gcloud sometimes pulls fewer messages than are actually in the Pub/Sub queue:
https://cloud.google.com/sdk/gcloud/reference/pubsub/subscriptions/pull
But it should at least pull one... In our case, randomly, no messages are pulled at all.
Is there something to improve here?
In general, relying on a shell script that uses gcloud to retrieve messages and do something with them is not going to be an efficient way to use Cloud Pub/Sub. It is worth noting that the lack of messages being returned in pull is not indicative of a lack of messages; it just means that messages could not be returned before the pull request's deadline. The gcloud subscriptions pull command sets the returnImmediately property (see info in pull documentation) to true, which basically means that if there aren't messages already quickly accessible in memory, then no messages are going to be returned. This property is deprecated and should not be set to true, so that is probably something that we need to explore changing in gcloud.
You would be better off writing a subscriber using the client libraries that sets up a stream and continuously retrieves messages. If your intention is to run this only periodically, you could write a job that reads messages and shuts down after no messages have been received for some time. Again, this would not guarantee that all available messages are consumed, but it would hold in most cases.
A version of this in Java would look like this:
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.atomic.AtomicLong;
import org.joda.time.DateTime;

/** A basic Pub/Sub subscriber for purposes of demonstrating use of the API. */
public class Subscriber implements MessageReceiver {
    private final String PROJECT_NAME = "my-project";
    private final String SUBSCRIPTION_NAME = "my-subscription";

    private com.google.cloud.pubsub.v1.Subscriber subscriber;
    private AtomicLong lastReceivedTimestamp = new AtomicLong(0);

    private Subscriber() {
        ProjectSubscriptionName subscription =
            ProjectSubscriptionName.of(PROJECT_NAME, SUBSCRIPTION_NAME);
        com.google.cloud.pubsub.v1.Subscriber.Builder builder =
            com.google.cloud.pubsub.v1.Subscriber.newBuilder(subscription, this);
        try {
            this.subscriber = builder.build();
        } catch (Exception e) {
            System.out.println("Could not create subscriber: " + e);
            System.exit(1);
        }
    }

    @Override
    public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
        // Process message
        lastReceivedTimestamp.set(DateTime.now().getMillis());
        consumer.ack();
    }

    private void run() {
        subscriber.startAsync();
        while (true) {
            long now = DateTime.now().getMillis();
            long currentReceived = lastReceivedTimestamp.get();
            if (currentReceived > 0 && ((now - currentReceived) > 30000)) {
                subscriber.stopAsync();
                break;
            }
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                System.out.println("Error while waiting for completion: " + e);
            }
        }
        System.out.println("Subscriber has not received message in 30s. Stopping.");
        subscriber.awaitTerminated();
    }

    public static void main(String[] args) {
        Subscriber s = new Subscriber();
        s.run();
        System.exit(0);
    }
}
I am using MassTransit with RabbitMQ. As part of a deployment procedure, at some point I need my service to disconnect and stop receiving any messages.
Assuming that I won't need the bus until the next restart of the service, is it okay to use bus.StopAsync()?
Is there a way to get a list of endpoints and then stop them from listening?
You should StopAsync the bus, and then when ready, call StartAsync to bring it back up (or start it at the next service restart).
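For illustration, a minimal sketch of that lifecycle, assuming the bus is resolved as IBusControl (the wrapper class is hypothetical):

using System.Threading.Tasks;
using MassTransit;

public class BusLifecycle
{
    private readonly IBusControl _bus;

    public BusLifecycle(IBusControl bus) => _bus = bus;

    // Called from the deployment procedure: disconnect and stop consuming.
    public Task DisconnectAsync() => _bus.StopAsync();

    // Called only if the bus is needed again before the next service restart.
    public Task ReconnectAsync() => _bus.StartAsync();
}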
To stop receiving messages without stopping the bus, I needed a solution that prevents the consume pipeline from consuming any type of message. I tried observers, but without success. I ended up with a custom consume message filter.
The filter looks like this:
public class ConsumersBlockingFilter<T> :
    IFilter<ConsumeContext<T>>
    where T : class
{
    public void Probe(ProbeContext context)
    {
        context.CreateFilterScope("messageFilter");
    }

    public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
    {
        // Check if the service is degraded (true for this demo)
        var isServiceDegraded = true;

        if (isServiceDegraded)
        {
            // Suspend the message for 5 seconds
            await Task.Delay(TimeSpan.FromMilliseconds(5000), context.CancellationToken);
            if (!context.CancellationToken.IsCancellationRequested)
            {
                // Republish the message
                await context.Publish(context.Message);
                Console.WriteLine($"Message {context.MessageId} has been republished");
            }
            // NotifyConsumed to avoid a skipped message
            await context.NotifyConsumed(TimeSpan.Zero, "messageFilter");
        }
        else
        {
            // The next filter in the pipe is called
            await next.Send(context);
        }
    }
}
The main idea is to delay with the cancellation token and then republish the message. After that, call context.NotifyConsumed to skip the remaining pipeline filters and return normally.
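For completeness, here is one hedged way to wire the filter into the consume pipeline, using the generic UseConsumeFilter registration available in recent MassTransit versions (the consumer name is illustrative):

services.AddMassTransit(x =>
{
    x.AddConsumer<NotificationConsumerWorker>();
    x.UsingRabbitMq((context, cfg) =>
    {
        // Apply the blocking filter to every consumed message type
        cfg.UseConsumeFilter(typeof(ConsumersBlockingFilter<>), context);
        cfg.ConfigureEndpoints(context);
    });
});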
I have a CRM plugin registered on Create (synchronous, post-operation) of a custom entity that performs some actions, and I want the Create operation to succeed in spite of errors in the plugin. For performance reasons, I also want the plugin to fire immediately when a record is created, so making the plugin asynchronous is undesirable. I've implemented this by doing something like the following:
public class FooPlugin : IPlugin
{
    public FooPlugin(string unsecureInfo, string secureInfo) { }

    public void Execute(IServiceProvider serviceProvider)
    {
        // Boilerplate (declared outside the try so the catch block can use it)
        var context = (IPluginExecutionContext)serviceProvider.GetService(typeof(IPluginExecutionContext));
        var serviceFactory = (IOrganizationServiceFactory)serviceProvider.GetService(typeof(IOrganizationServiceFactory));
        IOrganizationService service = serviceFactory.CreateOrganizationService(context.UserId);

        try
        {
            // Additional validation omitted
            var targetEntity = (Entity)context.InputParameters["Target"];
            UpdateFrobber(service, (EntityReference)targetEntity["new_frobberid"]);
            CreateFollowUpFlibber(service, targetEntity);
            CloseTheEntity(service, targetEntity);
        }
        catch (Exception ex)
        {
            // Send an email but do not re-throw the exception
            // because we don't want a failure to roll back the transaction.
            try
            {
                SendEmailForException(ex, context);
            }
            catch { }
        }
    }
}
However, when an error occurs (e.g. in UpdateFrobber(...)), the service client receives this exception:
System.ServiceModel.FaultException`1[Microsoft.Xrm.Sdk.OrganizationServiceFault]:
There is no active transaction. This error is usually caused by custom plug-ins
that ignore errors from service calls and continue processing.
Server stack trace:
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ref ProxyRpc rpc)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(ref MessageData msgData, Int32 type)
at Microsoft.Xrm.Sdk.IOrganizationService.Create(Entity entity)
at Microsoft.Xrm.Sdk.Client.OrganizationServiceProxy.CreateCore(Entity entity)
at Microsoft.Xrm.Sdk.Client.OrganizationServiceProxy.Create(Entity entity)
at Microsoft.Xrm.Client.Services.OrganizationService.<>c__DisplayClassd.<Create>b__c(IOrganizationService s)
at Microsoft.Xrm.Client.Services.OrganizationService.InnerOrganizationService.UsingService(Func`2 action)
at Microsoft.Xrm.Client.Services.OrganizationService.Create(Entity entity)
at MyClientCode() in MyClientCode.cs: line 100
My guess is that this happens because UpdateFrobber(...) uses the IOrganizationService instance derived from the plugin, so any CRM service calls it makes participate in the same transaction as the plugin, and if those "child" operations fail, the entire transaction is rolled back. Is this correct? Is there a "safe" way to ignore an error from a "child" operation in a synchronous plugin? Perhaps a way of instantiating an IOrganizationService instance that doesn't reuse the plugin's context?
In case it's relevant, we're running CRM 2013, on-premises.
You cannot ignore unhandled exceptions from child plugins when your plugin is participating in a database transaction.
However, when your plugin is running on-premises in partially trusted mode, you can actually create an OrganizationServiceProxy instance of your own and use that to access CRM. Be sure to reference the server your plugin is executing on, to avoid "double hop" problems.
If it is really needed, I would create an ExecuteMultipleRequest with ContinueOnError = true; for your email, you could just check the ExecuteMultipleResponse...
But it looks a bit like overkill.
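If you do go that route, here is a minimal sketch (the helper class and the "frobber" entity are hypothetical stand-ins for the child operations):

using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;

public static class FrobberBatch
{
    // 'frobber' is a hypothetical entity standing in for the child updates.
    public static void UpdateIgnoringErrors(IOrganizationService service, Entity frobber)
    {
        var request = new ExecuteMultipleRequest
        {
            Settings = new ExecuteMultipleSettings
            {
                ContinueOnError = true, // don't stop at the first failing child operation
                ReturnResponses = true  // needed to inspect individual faults afterwards
            },
            Requests = new OrganizationRequestCollection
            {
                new UpdateRequest { Target = frobber }
            }
        };

        var response = (ExecuteMultipleResponse)service.Execute(request);
        foreach (var item in response.Responses)
        {
            if (item.Fault != null)
            {
                // e.g. send the notification email with item.Fault.Message
            }
        }
    }
}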
You can catch exceptions if running in async mode. Be sure to verify your mode when catching the exception.
Sample Code:
bool errored = false;
try
{
    // exMultReq is the ExecuteTransactionRequest built earlier.
    ExecuteTransactionResponse response =
        (ExecuteTransactionResponse)service.Execute(exMultReq);
}
catch (Exception ex)
{
    errored = true;
    var innermessage = ex.InnerException?.Message ?? string.Empty;
    if (context.Mode == 0) // 0 = synchronous, 1 = asynchronous
        throw new InvalidPluginExecutionException(
            $"Execute Multiple Transaction Failed.\n{ex.Message}\n{innermessage}", ex);
}
if (errored == true)
{
    // Do more stuff to handle it, such as logging the failure.
}
It is not possible to do so for a synchronous plugin.
A more detailed summary, explaining the execution modes and use cases, can be found on my blog: https://helpfulbit.com/handling-exceptions-in-plugins/
Cheers.