Are there limitations on short durations when scheduling events from the Initially block in a MassTransit saga? - masstransit

I'm working on a POC for using MassTransit sagas to handle state changes in a system for grant applications. I'm using MassTransit 8.0.0-develop.394, .NET 6, EF Core 6.0.2 and ActiveMQ Artemis 1.19.0.
In the final solution, applicants can register their application and prepare the data over several weeks. A few days before the deadline, another external system will be populated with data that will be used to validate the application data. Application data entered before the validation data is populated should just be scheduled for later validation, but data entered afterwards should be validated immediately. I think MassTransit sagas with scheduled events look like a good fit for this.
In the POC I schedule the validation start time some 10 seconds after the program starts, and use a shorter and shorter delay in the schedule until I end up scheduling with a delay of TimeSpan.Zero.
From looking in the database I noticed that some of the scheduled events somehow get lost when I run the POC with an empty saga repository, but everything works fine when I rerun the program with existing sagas in the database. I use the same scheduling code in Initially and in DuringAny, which makes me think that there might be some limitation on how short a delay it is safe to use when scheduling saga events.
Note 1: I've switched to not scheduling the event in the saga when there is less than 1 second until validation can start; in that case I just publish the validation message directly, so this issue is not blocking me at the moment.
Note 2: I noticed this when running the POC from the command line and checking the database manually. I've tried to reproduce it in a test using the TestHarness, and also using ActiveMQ Artemis and InMemoryRepository, but with no luck. I've been able to reproduce it (more or less consistently) with a test using Artemis and the EF Core repository. I must admit that the test got quite complex, with a lot of Task.Delay and other stuff, so it might be hard to follow the logic, but I can post it here if anyone thinks it would help.
Update 2: following Chris Patterson's recommendation, cfg.UseMessageRetry and cfg.UseInMemoryOutbox are now configured in the SagaDefinition and not on the bus.
Here is the updated code where MassTransit is configured
private static ServiceProvider BuildServiceProvider()
{
    return new ServiceCollection()
        .AddDbContext<MySagaDbContext>(builder =>
        {
            MySagaDbContextFactory.Apply(builder);
        })
        .AddMassTransit(cfg =>
        {
            cfg.AddDelayedMessageScheduler();
            cfg.UsingActiveMq((context, config) =>
            {
                config.Host("artemis", 61616, configureHost =>
                {
                    configureHost.Username("admin");
                    configureHost.Password("admin");
                });
                config.EnableArtemisCompatibility();
                config.UseDelayedMessageScheduler();
                config.ConfigureEndpoints(context);
            });
            cfg.AddSagaStateMachine<MyStateMachine, MySaga, MySagaDefinition<MySaga>>()
                .EntityFrameworkRepository(x =>
                {
                    x.ConcurrencyMode = ConcurrencyMode.Optimistic;
                    x.ExistingDbContext<MySagaDbContext>();
                });
        })
        .AddLogging(configure =>
        {
            configure.AddFilter("MassTransit", LogLevel.Error); // Filter out all retry warnings
            configure.AddFilter("Microsoft", LogLevel.None);
            configure.AddSimpleConsole(options =>
            {
                options.UseUtcTimestamp = true;
                options.TimestampFormat = "HH:mm:ss.fff ";
            });
        })
        .BuildServiceProvider(true);
}
Here is the updated saga definition code
public class MySagaDefinition<TSaga> : SagaDefinition<TSaga> where TSaga : class, ISaga
{
    protected override void ConfigureSaga(IReceiveEndpointConfigurator endpointConfigurator, ISagaConfigurator<TSaga> consumerConfigurator)
    {
        endpointConfigurator.UseMessageRetry(r => r.Intervals(10, 50, 100, 500, 1000));
        endpointConfigurator.UseInMemoryOutbox();
    }
}

If you are scheduling messages from a saga, or really producing any messages from a saga, you should always have the following middleware components configured:
cfg.UseMessageRetry(r => r.Intervals(50,100,1000));
cfg.UseInMemoryOutbox();
That will ensure that messages produced by the saga are:
Only produced if the saga is successfully saved to the repository
Produced after the saga has been saved to the repository
More details are available in the documentation.
The reason is that with a short delay, the scheduled message is likely being delivered before the saga instance has been saved, so the scheduled event doesn't correlate to an existing saga instance.
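To make the ordering argument concrete, here is a minimal conceptual sketch in Python. It is not MassTransit code and does not use its API: the `InMemoryOutbox` class, the `handle` function, and the dictionary and list standing in for the saga repository and the broker are all hypothetical. It only illustrates the idea that messages are buffered and released after a successful save, and dropped otherwise.

```python
class InMemoryOutbox:
    """Hypothetical stand-in: buffers outgoing messages until the state is saved."""

    def __init__(self, transport):
        self._transport = transport  # list standing in for the message broker
        self._pending = []

    def publish(self, message):
        # Nothing reaches the transport yet; the message is only buffered.
        self._pending.append(message)

    def flush(self):
        # Called after a successful save: release the buffered messages.
        self._transport.extend(self._pending)
        self._pending.clear()

    def discard(self):
        # Called when the save fails: the buffered messages are never sent.
        self._pending.clear()


def handle(repository, transport, saga_id, fail_on_save=False):
    """Simulated saga handler: schedule an event, then try to save the instance."""
    outbox = InMemoryOutbox(transport)
    outbox.publish({"type": "ValidationDue", "saga_id": saga_id})
    try:
        if fail_on_save:
            raise RuntimeError("simulated concurrency conflict")
        repository[saga_id] = {"state": "AwaitingValidation"}
    except RuntimeError:
        outbox.discard()
        return False
    outbox.flush()
    return True
```

Without the buffering step, the message in this toy model would sit on the transport before `repository[saga_id]` exists, which is the race described above. A retry policy then simply calls the handler again after a failed save.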

Related

Axon - Cannot emit query update in different microservice

I'm struggling with a situation where I want to emit a query update via queryUpdateEmitter, but from a different module (microservice). My application is built from microservices, and both services are connected to the same Axon Server. The first service creates a subscriptionQuery and sends some commands. After a while (a few commands and events later), the second service handles an event and emits an update for the query the first service subscribed to. Unfortunately, the emitted update doesn't seem to reach the subscriber. The query classes are exactly the same and sit in the same packages.
Subscription:
@GetMapping("/refresh")
public Mono<MovieDTO> refreshMovies() {
    commandGateway.send(
            new CreateRefreshMoviesCommand(UUID.randomUUID().toString()));
    SubscriptionQueryResult<MovieDTO, MovieDTO> refreshedMoviesSubscription =
            queryGateway.subscriptionQuery(
                    new GetRefreshedMoviesQuery(),
                    ResponseTypes.instanceOf(MovieDTO.class),
                    ResponseTypes.instanceOf(MovieDTO.class)
            );
    return refreshedMoviesSubscription.updates().next();
}
Emitter:
@EventHandler
public void handle(DataRefreshedEvent event) {
    log.info("[event-handler] Handling {}, movieId={}",
            event.getClass().getSimpleName(),
            event.getMovieId());
    queryUpdateEmitter.emit(GetRefreshedMoviesQuery.class, query -> true,
            Arrays.asList(
                    MovieDTO.builder().aggregateId("as").build(),
                    MovieDTO.builder().aggregateId("be").build()));
}
Is this even possible in the newest version of Axon? A similar configuration within a single service works as expected.
Edit
I have found a workaround for this situation:
The second service, instead of emitting the update via queryUpdateEmitter, publishes an event with the list of movies
The first service handles this event and then emits the update via queryUpdateEmitter
But I'd still like to know if there is a way to do this using queries only, because it seems natural to me (commandGateway and eventGateway work as expected; queryUpdateEmitter is the exception).
This follows from the implementation of the QueryUpdateEmitter (regardless of whether you use Axon Server).
The QueryUpdateEmitter stores a set of update handlers, referencing the issued subscription queries. However, it only maintains the subscription queries handled by the given JVM (the QueryUpdateEmitter implementation is not distributed).
Its intent is to be used within the component (typically a Query Model "projector") that answers queries about a given model, updates the model, and emits those updates.
Hence, placing the QueryUpdateEmitter operations in a different (micro)service from the one where the query is handled will not work.
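The behaviour described in this answer can be illustrated with a small Python sketch. This is not Axon's implementation: `LocalUpdateEmitter` and its methods are hypothetical names, and each instance stands in for the emitter living inside one JVM. It shows why an emit in one process finds no subscribers registered in another, and why the event-based workaround from the question does work.

```python
class LocalUpdateEmitter:
    """Hypothetical per-process registry of subscription-query update handlers."""

    def __init__(self):
        self._handlers = {}  # query type name -> list of callbacks

    def subscribe(self, query_type, callback):
        self._handlers.setdefault(query_type, []).append(callback)

    def emit(self, query_type, update):
        """Deliver the update to local subscribers; return how many were notified."""
        handlers = self._handlers.get(query_type, [])
        for callback in handlers:
            callback(update)
        return len(handlers)


# One emitter per simulated service (one per JVM in the real setup).
service_a_emitter = LocalUpdateEmitter()
service_b_emitter = LocalUpdateEmitter()

received = []
service_a_emitter.subscribe("GetRefreshedMoviesQuery", received.append)

# Emitting in service B reaches nobody: the subscription is only known to A.
notified_from_b = service_b_emitter.emit("GetRefreshedMoviesQuery", ["as", "be"])

# Workaround: B publishes an event, A handles it and emits locally.
def service_a_event_handler(event):
    return service_a_emitter.emit("GetRefreshedMoviesQuery", event["movies"])

notified_from_a = service_a_event_handler({"movies": ["as", "be"]})
```

In this toy model `notified_from_b` is 0 and `notified_from_a` is 1, which mirrors what the question observed.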

Using setTimeout() to Schedule pushes

Since scheduled push is not available on Parse, I'm using setTimeout() to schedule pushes.
I'm using back4app.
// I call this cloud code
Parse.Cloud.define("pushMultiple", async (request) => {
    // Use setTimeout to send out a push 1 hour later (60 * 60 * 1000 ms)
    setTimeout(pushout, 60 * 60 * 1000);
});
// The function that sends the notification
const pushout = () => {
    Parse.Push.send({
        channels: ["t1g.com"],
        data: { alert: "The Giants won against the Mets 2-3." }
    }, { useMasterKey: true });
};
My code works fine. So my question is this:
1) Is my method reliable?
2) What are the disadvantages of this approach?
3) How many setTimeout() calls can be queued on the server? Is there any sort of limit?
T.I.A
Why don't you use scheduled cron jobs? I believe back4app supports them. Save the necessary push information to the database, then run a cloud code job every "x" amount of time; when a push's send time has come, the job sends it. With setTimeout(), I believe the server has to keep the cloud code instance (or a reference to it) alive, which means your cloud code is still "working" even though it is only waiting, and you waste resources. I also believe back4app has a cloud code timeout, so even if you call setTimeout() with one hour, the cloud code will be terminated once that timeout is reached.
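The store-then-poll approach suggested here can be sketched as follows. This is an illustrative Python sketch only, not Parse or back4app code: the record layout (`send_at`, `sent`, `channel`, `alert`) and the function names are assumptions, and a plain list stands in for the database table of scheduled pushes.

```python
from datetime import datetime, timezone


def due_pushes(scheduled, now):
    """Return the stored pushes whose send time has arrived and that are still unsent."""
    return [p for p in scheduled if not p["sent"] and p["send_at"] <= now]


def run_push_job(scheduled, now, send):
    """One run of the periodic job: send every due push and mark it as sent."""
    count = 0
    for push in due_pushes(scheduled, now):
        send(push["channel"], push["alert"])
        push["sent"] = True  # marking it prevents a resend on the next run
        count += 1
    return count


# Example data: one push is already due, one is scheduled for later.
scheduled = [
    {"channel": "t1g.com", "alert": "Game result",
     "send_at": datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), "sent": False},
    {"channel": "t1g.com", "alert": "Later news",
     "send_at": datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc), "sent": False},
]
delivered = []
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first_run = run_push_job(scheduled, now, lambda ch, alert: delivered.append((ch, alert)))
second_run = run_push_job(scheduled, now, lambda ch, alert: delivered.append((ch, alert)))
```

Because the schedule lives in storage rather than in a pending timer, a restart or a cloud code timeout between runs does not lose the push; the next run of the job picks it up.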

Testing MassTransit endpoint configuration in Autofac module

I have some endpoint configuration code in an Autofac module that registers consumers based on conventions, and I'd like to unit test it. I'm not trying to verify any behaviour of any consumers; I just want to check that my setup code is doing what I need it to do. I'm using InMemoryTestHarness, but consuming doesn't seem to be working and I'm not sure about the relationship between configuring the bus and registering consumer test harnesses.
To allow the host to be swapped between Rabbit for prod and in memory for tests I have this in my module:
Func<Action<IReceiveConfigurator>, IBusControl> BusFactory = receiveConfig => Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    cfg.Host(rabbitMqUrl, hostCfg =>
    {
        hostCfg.Username(rabbitMqUsername);
        hostCfg.Password(rabbitMqPassword);
    });
    receiveConfig(cfg);
});
For the actual consumer registration in my module I have:
// code to scan assembly and build a list of queue definitions with consumers
...
// consumer registration
builder.AddMassTransit(x =>
{
    foreach (var consumerType in consumerTypes)
        x.AddConsumer(consumerType);
    x.AddBus(context => BusFactory(cfg =>
    {
        foreach (var queueDef in queueDefs)
            cfg.ReceiveEndpoint(queueDef.QueueName, e =>
            {
                foreach (var consumerDef in queueDef.ConsumerDefs)
                    e.ConfigureConsumer(context, consumerDef.ConsumerType);
            });
    }));
});
For the unit test setup I am doing:
harness = new InMemoryTestHarness();
var module = new MassTransitModule(typeof(TestMessageConsumer).Assembly)
{
    BusFactory = (receiveConfig) =>
    {
        harness.OnConfigureBus += cfg => receiveConfig(cfg);
        Task.WaitAll(harness.Start());
        return harness.BusControl;
    }
};
var builder = new ContainerBuilder();
builder.RegisterModule(module);
container = builder.Build();
// ensure bus initialisation runs
container.Resolve<IBusControl>();
I've verified in the unit test that Autofac can resolve IBus, IBusControl and concrete consumer classes, as well as, given a message type T an IConsumer<T>.
In my tests, if I do:
await harness.InputQueueSendEndpoint.Send(new TestMessage());
harness.Consumed.Select<TestMessage>().Any().ShouldBeTrue();
then the test first waits on the harness.Consumed line for 30 seconds and then fails (Any() returns false). I get the same behaviour if I register a consumer harness, and I'm also worried that registering a consumer harness doesn't actually verify my registration.
Have I misunderstood something about the test harness? How would I verify that my consumer config is correct? Is the harness.Consumed line taking 30 seconds an indication that I've completely misused the test harness? So many questions...
Thanks,
Daniel
EDIT
Based on the comment from Chris Patterson I've updated my registration to use the MassTransit Autofac integration methods (code updated above) but still getting the same problem.
The test harness creates its own bus instance, and the Consumer, Saga, etc. methods add additional harnesses to that same test harness. If you're resolving a bus from the container as part of your test, you're stuck using that bus. The one in the harness is of no use to you, as are the methods in that harness.
You should separate the testing of your consumers from testing the container registration. And while you're at it, why not use the built-in container support for configuring endpoints, etc. instead of writing it yourself? I believe there is an extension method for .AddMassTransit to AddConsumersFromContainer where you specify the container. This makes it usable with previously loaded modules that added consumers to the container, where the bus is in its own module.

Performance Azure function with multiple output bindings

Hello all who read this,
We have written a router function on Azure, in an App Service plan, that receives messages from IoT Hub and, depending on the message type, routes each message to another event hub.
Previously we had 6 output bindings to event hubs in this function.
Recently we added 3 more message types, so 3 more output bindings to 3 more event hubs.
No processing of the messages happens in this function, but what we see now is that we spend 16 times more time in the routing function.
Is there a performance issue with having multiple output bindings?
We don't see an increase in the load of incoming messages.
We are running on Azure Functions 1.0 (Runtime version: 1.0.12205.0 (~1))
Regards Ben
Simplified sample code of the routing function:
public static class IotHubRouterFunction
{
    [FunctionName("IotHubRouterFunction")]
    public static void Run(
        [EventHubTrigger("%iothub%", Connection = "IothubRouterListen")] EventData myEventHubData,
        [EventHub("%msg1-eventhub%", Connection = "msg1event")] ICollector<EventData> eventHub4Dmsg1Event,
        [EventHub("%msg2-eventhub%", Connection = "msg2event")] ICollector<EventData> eventHub4Dmsg2Event,
        [EventHub("%msg3-eventhub%", Connection = "msg3event")] ICollector<EventData> eventHub4Dmsg3Event,
        //... like 6 more bindings like this
        ILogger logger)
    {
        try
        {
            var messageType = GetValue(myEventHubData.Properties, "type");
            // routing: add the incoming event to the collector that matches its type
            switch (messageType)
            {
                case "msg1event":
                {
                    eventHub4Dmsg1Event.Add(myEventHubData);
                    break;
                }
                case "msg2event":
                {
                    eventHub4Dmsg2Event.Add(myEventHubData);
                    break;
                }
                case "msg3event":
                {
                    eventHub4Dmsg3Event.Add(myEventHubData);
                    break;
                }
                //6 more cases like this
                default:
                {
                    logger.LogError("Unrouteable message of type: {messageType}", messageType);
                    break;
                }
            }
        }
        catch (Exception ex)
        {
            //removed
        }
    }
}
With 6 bindings the messages fly through the router function in 50 ms.
With 9 bindings the messages crawl through the router function in 800 ms.
CPU usage on the app plan also rose by 30% (we scaled out so we have it under control, but why so much, and what is causing this?)
A little late with the follow-up on what happened:
In the end we found out what was going on.
We have several instances of our app plan, but the old monitoring solution showed the CPU and memory averaged over all instances of the app plan.
By switching to the newer metrics and Azure monitoring we were able to drill down into the separate instances of the app plan and the instances of the functions.
We found that one function was running as three instances: two of them ran normally, but the third had crashed its internal app pool, consumed all the CPU it could get hold of, and did absolutely nothing.
We restarted the function and all issues were gone.
We are still wondering whether it was something in our code that made it go through the roof, or something that happened in Azure that made it go crazy.
:-s
When you run an Azure Function under an App Service plan, you have to watch performance parameters such as scaling. Have you checked that your function is not getting overloaded?
On the other hand, this approach looks wrong to me as a design. With this many bindings there could be performance issues, and what if you need to add more bindings in the future? If you are not performing any operation on the messages, you shouldn't take on the overhead of redirecting them.
Event Grid
We can use Event Grid for that. The IoT hub publishes events to a topic, and the events are consumed by subscribers, in your case the other event hubs. You also get the advantages of micro-billing (serverless) and auto scaling. https://learn.microsoft.com/en-us/azure/event-grid/overview

Laravel Scheduling in clustered environment

I am working with scheduling in Laravel 5.3. Previously, I was using one server to host the Laravel application. Now that I am using two servers to run the Laravel app, how do I ensure that both servers are not running the same jobs at the same time?
Recently, I saw an Event method called "withoutOverlapping()". See https://laravel.com/docs/5.3/scheduling#preventing-task-overlaps
In my case, withoutOverlapping() cannot help me as I am working in a clustered environment.
Are there any workarounds or suggestions regarding this?
First of all, decide whether it is critical to avoid running a task multiple times.
For example, if your app uses a task to do some sort of cleanup, there is almost no drawback to running it on every server (who cares if you try to delete messages that are 10+ minutes old twice?)
If it is absolutely critical to run every task only once, you'll need to define a "main server" that executes tasks, and a secondary server that just answers requests but does not perform any task. This is quite trivial: give every environment a different name in its .env, and test against that when you define the scheduler tasks.
This is the easiest way. Seriously, don't bother building a database locking mechanism or whatever to synchronise tasks across servers. Even operating systems struggle to properly manage synchronisation between threads on the same machine, so why would you want to implement the same across different machines?
Here's what I've done when I ran into the same problems with load balancing:
abstract class MutexCommand extends Command {
    private $hash = null;

    public function cleanup() {
        if (is_string($this->hash)) {
            Redis::del($this->hash);
            $this->hash = null;
        }
    }

    protected abstract function generateHash();
    protected abstract function handleInternal();

    public final function handle() {
        // Release the lock even if the process ends unexpectedly
        register_shutdown_function([$this, "cleanup"]);
        try {
            $this->hash = $this->generateHash();
            // Set a value atomically if it does not exist. Will fail if it does exist.
            // Essentially setnx is the mechanism to acquire the lock
            if (!Redis::setnx($this->hash, true)) {
                $this->hash = null; // Prevent it from being cleaned up
                throw new Exception("Already running");
            }
            $this->handleInternal();
        } finally {
            $this->cleanup();
        }
    }
}
Then you can write your commands:
class ThisShouldNotOverlap extends MutexCommand {
    public function generateHash() {
        return "Unique key for mutex, you can just use the class name if you want by doing return static::class";
    }
    public function handleInternal() { /* do stuff */ }
}
Then whenever you try to run the same command on multiple instances, one will successfully acquire the "lock" and the others should fail.
Of course this assumes that you are using a non-clustered Redis cache.
If you are not using Redis then there are probably similar locking mechanisms you can implement in other caches; if you are using a clustered Redis then you may need to use the Redlock locking mechanism.
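The set-if-absent locking idea from this answer can be shown without Redis in a short Python sketch. Everything here is hypothetical: `FakeKeyValueStore` is an in-memory stand-in whose `setnx` only mimics the "set if not exists" semantics, and `run_exclusive` is an illustrative helper, not part of Laravel or any Redis client.

```python
class FakeKeyValueStore:
    """In-memory stand-in for a shared cache with set-if-absent semantics."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        # Returns True only for the caller that created the key.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def run_exclusive(store, key, task):
    """Run task only if the lock for key can be acquired; return whether it ran."""
    if not store.setnx(key, "1"):
        return False  # someone else holds the lock, so skip this run
    try:
        task()
    finally:
        store.delete(key)  # always release the lock we acquired
    return True


# Simulate a second server trying to start the same task while the first one runs.
store = FakeKeyValueStore()
results = []

def first_server_task():
    results.append("ran")
    results.append(run_exclusive(store, "cleanup", lambda: results.append("second")))

first_ran = run_exclusive(store, "cleanup", first_server_task)
later_ran = run_exclusive(store, "cleanup", lambda: None)
```

Note that the contender that fails to acquire the lock never deletes the key, which is the same reason the PHP class resets its hash to null before throwing.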
Essentially no, there's no natural way in Laravel to know whether another Laravel app has the same job on the job dispatcher.
There are some options for finding a solution:
Create an intermediate app that manages the jobs from the other apps.
Allow only one app to dispatch jobs.
Use worker queues; there are some packages for this. I would recommend using Laravel 5 with WebSockets and asynchronous queues.
First of all, the Laravel scheduler isn't designed to work in a clustered environment. It was never intended to be used that way.
I would suggest having a dedicated cron instance that manages your Laravel scheduler jobs.
