I have a task that is potentially long running (hours). The task is performed by multiple workers (AWS ECS instances in my case) that read from a message queue (AWS SQS in my case), and I have multiple users adding messages to the queue. The problem: if Bob adds 5000 messages to the queue, enough to keep the workers busy for 3 days, and then Alice comes along wanting to process just 5 tasks, Alice will need to wait 3 days before any of her tasks even start.
I would like to feed messages to the workers from Alice and Bob at an equal rate as soon as Alice submits tasks.
I have solved this problem in another context by creating multiple queues (subqueues), one per user (or even per batch a user submits), and alternating between all the subqueues when a consumer asks for the next message; a sketch of that approach follows below.
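For concreteness, here is a minimal sketch of that consumer-side rotation, assuming the AWS SDK for Java v2 and a hypothetical list of per-user queue URLs:

import java.util.List;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

// Hypothetical round-robin consumer; the queue URLs and names are made up.
public class RoundRobinConsumer {
    private final SqsClient sqs = SqsClient.create();
    private final List<String> subqueueUrls; // one queue URL per user (or batch)
    private int next = 0;

    public RoundRobinConsumer(List<String> subqueueUrls) {
        this.subqueueUrls = subqueueUrls;
    }

    // Cycle through the subqueues, skipping empty ones, until a message is found.
    public Message nextMessage() {
        for (int i = 0; i < subqueueUrls.size(); i++) {
            String url = subqueueUrls.get(next);
            next = (next + 1) % subqueueUrls.size();
            List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(url)
                    .maxNumberOfMessages(1)
                    .build()).messages();
            if (!messages.isEmpty()) {
                return messages.get(0);
            }
        }
        return null; // every subqueue is currently empty
    }
}

Because empty subqueues are skipped, Alice's 5 tasks start flowing immediately even while Bob's 5000 are still queued.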
This seems, at least in my world, to be a common problem, and I'm wondering if anyone knows of an established way of solving it.
I don't see any solution with ActiveMQ. I've looked a little at Kafka, with its ability to round-robin partitions in a topic, and that may work. Right now, I'm implementing something using Redis.
I would recommend Cadence Workflow instead of queues as it supports long running operations and state management out of the box.
In your case I would create a workflow instance per user. Every new task would be sent to the user's workflow via the signal API. The workflow instance would then queue up the received tasks and execute them one by one.
Here is an outline of the implementation:
public interface SerializedExecutionWorkflow {

    @WorkflowMethod
    void execute();

    @SignalMethod
    void addTask(Task t);
}

public interface TaskProcessorActivity {

    @ActivityMethod
    void process(Task poll);
}

public class SerializedExecutionWorkflowImpl implements SerializedExecutionWorkflow {

    private final Queue<Task> taskQueue = new ArrayDeque<>();
    private final TaskProcessorActivity processor = Workflow.newActivityStub(TaskProcessorActivity.class);

    @Override
    public void execute() {
        // Signals received while an activity runs are appended to taskQueue;
        // the workflow completes once the queue drains, and signalWithStart
        // (shown below) starts a fresh run for any tasks that arrive later.
        while (!taskQueue.isEmpty()) {
            processor.process(taskQueue.poll());
        }
    }

    @Override
    public void addTask(Task t) {
        taskQueue.add(t);
    }
}
And here is the code that enqueues a task to the workflow through the signal method (signalWithStart starts the workflow if it is not already running; otherwise it just delivers the signal):
private void addTask(WorkflowClient cadenceClient, Task task) {
    // Set workflowId to userId
    WorkflowOptions options = new WorkflowOptions.Builder().setWorkflowId(task.getUserId()).build();
    // Use the workflow interface stub to start/signal the workflow instance
    SerializedExecutionWorkflow workflow = cadenceClient.newWorkflowStub(SerializedExecutionWorkflow.class, options);
    BatchRequest request = cadenceClient.newSignalWithStartRequest();
    request.add(workflow::execute);
    request.add(workflow::addTask, task);
    cadenceClient.signalWithStart(request);
}
Cadence offers a lot of other advantages over using queues for task processing.
Built-in exponential retries with an unlimited expiration interval (see the retry sketch after this list)
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running, heartbeating operations
Ability to implement complex task dependencies. For example, chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. With queues, for example, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
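To illustrate the retry point above, here is a hedged sketch of attaching retry options to the activity stub with the Cadence Java client (all timeouts and intervals are arbitrary assumptions, and builder names may differ slightly between client versions):

import com.uber.cadence.activity.ActivityOptions;
import com.uber.cadence.common.RetryOptions;
import java.time.Duration;

// Hypothetical retry configuration; tune the values to your workload.
ActivityOptions activityOptions = new ActivityOptions.Builder()
        .setScheduleToCloseTimeout(Duration.ofHours(1))
        .setRetryOptions(new RetryOptions.Builder()
                .setInitialInterval(Duration.ofSeconds(1))
                .setBackoffCoefficient(2.0)
                .setExpiration(Duration.ofDays(1))
                .build())
        .build();
TaskProcessorActivity processor =
        Workflow.newActivityStub(TaskProcessorActivity.class, activityOptions);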
See the presentation that goes over the Cadence programming model.
Related
We have 5 topics and we want to have a service that scales to, for example, 5 instances of the same app.
This would mean that I would want to dynamically (via, for example, Redis locking or a similar mechanism) determine which instance should listen to which topic.
I know that we could have 1 topic with 5 partitions, and each node in the same consumer group would pick up a partition. Also, if we have a separately deployed service, we can set the topic via properties.
The issue is that those two options are not suitable for our situation, and we want to see if it is possible to do it the way I explained above.
@PostConstruct
private void postConstruct() {
    // Do logic via Redis locking or similar to determine the topic
    dynamicallyDeterminedVariable = // SOME LOGIC
}

@KafkaListener(topics = "#{dynamicallyDeterminedVariable}")
void listener(String data) {
    LOG.info(data);
}
Yes, you can use SpEL for the topic name:
#{@someOtherBean.whichTopicToUse()}
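A minimal sketch of how that can look (the bean and method names are hypothetical, mirroring the expression above):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Hypothetical bean that decides, once at startup, which topic this instance consumes.
@Component("someOtherBean")
public class TopicSelector {
    public String whichTopicToUse() {
        // e.g. acquire a Redis lock here and return the topic this instance won
        return "topic-1";
    }
}

@Component
public class DynamicTopicListener {
    private static final Logger LOG = LoggerFactory.getLogger(DynamicTopicListener.class);

    // The SpEL expression is evaluated once, when the listener container is created
    @KafkaListener(topics = "#{@someOtherBean.whichTopicToUse()}")
    void listener(String data) {
        LOG.info(data);
    }
}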
Below is a sample POC developed with an ASP.NET Core 6.0 API that uses MassTransit and RabbitMQ to simulate a simple publish/subscribe with a MassTransit consumer. However, when the code is executed it results in the creation of 2 exchanges and 1 queue in RabbitMQ.
Program.cs
builder.Services.AddMassTransit(msConfig =>
{
    msConfig.AddConsumers(Assembly.GetEntryAssembly());

    msConfig.UsingRabbitMq((hostcontext, cfg) =>
    {
        cfg.Host("localhost", 5700, "/", h =>
        {
            h.Username("XXXXXXXXXXX");
            h.Password("XXXXXXXXXXX");
        });

        cfg.ConfigureEndpoints(hostcontext);
    });
});
OrderConsumer.cs
public class OrderConsumer : IConsumer<OrderDetails>
{
    readonly ILogger<OrderConsumer> _logger;

    public OrderConsumer(ILogger<OrderConsumer> logger)
    {
        _logger = logger;
    }

    public Task Consume(ConsumeContext<OrderDetails> context)
    {
        _logger.LogInformation("Message picked by OrderConsumer. OrderId : {OrderId}", context.Message.OrderId);
        return Task.CompletedTask;
    }
}
Model
public class OrderDetails
{
    public int OrderId { get; set; }
    public string OrderName { get; set; }
    public int Quantity { get; set; }
}
Controller
readonly IPublishEndpoint _publishEndpoint;

[HttpPost("PostOrder")]
public async Task<ActionResult> PostOrder(OrderDetails orderDetails)
{
    await _publishEndpoint.Publish<OrderDetails>(orderDetails);
    return Ok();
}
Output from Asp.Net
As highlighted, 2 exchanges are created: Sample:OrderDetails and Order.
However, Sample:OrderDetails is bound to the Order exchange,
and the Order exchange routes to the Order queue.
So the question is about the 2 exchanges that got created: I am not sure whether that's by design or a mistake in the code that led to both being created, and if it is by design, why the need for 2 exchanges?
I was pondering the same question when I first started playing with MassTransit, and in the end came to understand it as follows:
You are routing two types of messages via MassTransit, events and commands. Events are multicast to potentially multiple consumers, commands to a single consumer. Every consumer has their own input queue to which messages are being routed via exchanges.
For every message type, MassTransit by default creates one fanout exchange based on the message type and one fanout exchange and one queue for every consumer of this message.
This makes absolute sense for events, as you are publishing events using the event type (with no idea who or if anyone at all will consume it), so in your case, you publish to the OrderDetails exchange. MassTransit has to make sure that all consumers of this event are bound to this exchange. In this case, you have one consumer, OrderConsumer. MassTransit by default generates the name of the consumer exchange based on the type name of this consumer, removing the Consumer suffix. The actual input queue for this consumer is bound to this exchange.
So you get something like this:
EventTypeExchange => ConsumerExchange => ConsumerQueue
or in your case:
Sample:OrderDetails (based on the type Sample.OrderDetails) => Order (based on the type OrderConsumer) => Order (again based on the OrderConsumer type)
For commands this is a bit less obvious, because a command can only ever be consumed by one consumer. In fact you can actually tell MassTransit not to create the exchanges based on the command type. However, what you would then have to do is route commands not based on the command type, but on the command handler type, which is really not a good approach as now you would have to know - when sending a command - what the type name of the handler is. This would introduce coupling that you really do not want. Thus, I think it's best to keep the exchanges based on the command type and route to them, based on the command type.
As Chris (the author of MassTransit) mentions in the MassTransit RabbitMQ deep dive video (YouTube), this setup also allows you to do interesting things like siphon off messages to another queue for monitoring/auditing/debugging, just by creating a new queue and binding it to the existing fanout exchange.
All the above is based on me playing with the framework, so it's possible I got some of this wrong, but it does make sense to me at least. RabbitMQ is extremely flexible with its routing options, so Chris could have chosen a different approach (e.g. Brighter, a "competing" library, uses RabbitMQ differently to achieve the same result), but this one has merit as well.
MassTransit also, unlike some other frameworks such as NServiceBus or Brighter, doesn't really technically distinguish or care about the semantic difference between the two: you can send or publish a command just as you can an event.
I have a scenario where we need to keep polling a database table for all active users and perform an API call to fetch any unread emails from their inboxes. My approach is to use two verticles: one for polling and another for fetching emails for a user. When the first verticle finds a user, it sends a message (the userId) to the second verticle through the event bus to fetch emails. That way, I can increase the number of second-verticle instances when there are lots of users.
I found the following two ways to poll the database for active users and then perform an API call for each user:
vertx.setPeriodic
vertx.executeBlocking
But the manual mentions that for long-running/polling tasks, it's better to create an application-managed thread to handle the task.
Is my approach to the problem correct, or is there a better approach?
If I should go with an application-managed thread, can you please illustrate with an example?
Thanks.
You can create a dedicated worker thread pool for that, and run your periodic tasks on it:
public class PeriodicWorkerExample {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // A dedicated single-threaded worker pool, so the periodic task can
        // block without stalling an event-loop thread
        vertx.deployVerticle(new MyPeriodicWorker(), new DeploymentOptions()
                .setWorker(true)
                .setWorkerPoolSize(1)
                .setWorkerPoolName("periodic"));
    }
}

class MyPeriodicWorker extends AbstractVerticle {
    @Override
    public void start() {
        vertx.setPeriodic(1000, (r) -> {
            System.out.println(Thread.currentThread().getName());
        });
    }
}
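To tie this back to your two-verticle design: the periodic worker can push each active userId onto the event bus, and a second, scalable verticle can consume them. A rough sketch under stated assumptions (the event-bus address, the DB query, and the class names are all made up):

import io.vertx.core.AbstractVerticle;
import java.util.List;

// Hypothetical polling verticle; deploy it as a worker so the DB query may block.
class PollingWorker extends AbstractVerticle {
    @Override
    public void start() {
        vertx.setPeriodic(60_000, id -> {
            for (String userId : findActiveUsers()) {
                vertx.eventBus().send("email.fetch", userId); // hand off to a fetcher
            }
        });
    }

    private List<String> findActiveUsers() {
        return List.of("user-1", "user-2"); // placeholder for the real DB query
    }
}

// Hypothetical consumer verticle; deploy several instances to scale out.
class EmailFetcher extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().<String>consumer("email.fetch", msg ->
                System.out.println("Fetching unread emails for " + msg.body()));
    }
}

Deploying EmailFetcher with new DeploymentOptions().setInstances(n) gives you the scale-out you described.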
I am trying to define a monitor in which I receive events and then handle them on multiple contexts (roughly equating to threads, if I understand correctly). I know I can write

spawn myAction() to myNewContext;

and this will run that action in the new context.
However, I want to have an action which will respond to an event when it comes into my monitor:

on all trigger() as t {
    doMyThing()
}
on all otherTrigger() as ot {
    doMyOtherThing()
}

Can I define my on all in a way that uses a specific context? Something like

on all trigger() as t in myContext {
    doMyThing()
}
on all otherTrigger() as t in myOtherContext {
    doMyOtherThing()
}

If not, what is the best way to define this in Apama EPL? Also, could I have multiple contexts handling the same events when they arrive, round-robin style?
Apama events from external receivers (i.e. the outside world) are delivered only to public contexts, including the 'main' context. So depending on your architecture, you can either spawn your action to a public context

// set the receivesInput parameter to true to make this context public
spawn myAction() to context("myContext", true);

...

action myAction() {
    on all trigger() as t {
        doMyThing();
    }
}

or spawn your action to a private context and set up an event forwarder in a public context, usually the main context (which will always exist):

spawn myAction() to context("myNewContext");

on all trigger() as t {
    send t to "myChannel"; // forward all trigger events to the "myChannel" channel
}

...

action myAction() {
    monitor.subscribe("myChannel"); // receive all events delivered to the "myChannel" channel
    on all trigger() as t {
        doMyThing();
    }
}

Spawning to a private context and leveraging the channels system is generally the better design, as it only sends events to the contexts that care about them.
To extend a bit on Madden's answer (I don't have enough rep to comment yet), the private context and forwarders approach is also the only way to achieve true round-robin: otherwise all contexts will receive all events. The easiest approach is to use a partitioning strategy (e.g. IDs ending in 0 go to context-0, or you have one context per machine you're monitoring, etc.), because then each concern is tracked in the same context and you don't have to share state.
Also could I have multiple contexts handling the same events when they arrive, round robin style?
This isn't entirely clear to me. What benefit are you aiming for here? If you're looking to reduce latency by having the "next available" context pick up the event, this probably isn't the right way to achieve it: deciding which context processes the event requires inter-context communication and coordination, which will increase latency. If you want multiple contexts to process the same events (e.g. one context runs your temperature spike rule, and another runs your long-term temperature average rule, but both take temperature readings as inputs), then that's a good approach, but it's not what I'd have called round-robin.
I am currently working with an implementation based on:
org.springframework.integration.support.leader.LockRegistryLeaderInitiator
It has multiple candidate roles, so that only one application instance within the cluster is elected as leader for each role. During initialisation of the cluster, if the autoStartup property is set to true, the first application instance that is initialised will be elected as leader for all roles. This is something that we want to avoid; instead we want a fair distribution of the lead roles across the cluster.
One possible solution might be that, once the cluster is ready and properly initialised, an endpoint is invoked that executes:
lockRegistryLeaderInitiator.start()
for all instances in the cluster, so that the election process starts and the roles are fairly distributed across instances. One drawback is that this needs to be part of the deployment process, which adds complexity.
What is the proposed best practice here? Are there any plans for related features, for example to autoStartup the leader election only when X application instances are available?
I suggest you take a look at the Spring Cloud Bus project. I don't know its details, but it looks like your idea of autoStartup = false for all the LockRegistryLeaderInitiator instances, started later by some distributed event, is the way to go.
I'm not sure what we can do for you from the Spring Integration perspective; this really doesn't feel like its responsibility, and the coordination and rebalancing should be done via some other tool. Fortunately, all our Spring projects can be used together as a single platform.
I think with the Bus you really can track the number of instances that have joined the cluster and decide yourself when and how to publish a StartLeaderInitiators event.
It would be relatively easy with the Zookeeper LeaderInitiator because you could check in zookeeper for the instance count before starting it.
It's not so easy with the lock registry because there's no inherent information about instances; you would need some external mechanism (such as zookeeper, in which case, you might as well use ZK).
Or, you could use something like Spring Cloud Bus (with RabbitMQ or Kafka) to send a signal to all instances that it's time to start electing leadership.
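For illustration, a minimal sketch of that "start on a signal" idea, assuming autoStartup = false on every initiator. The event type here is made up; with Spring Cloud Bus it would typically extend RemoteApplicationEvent so that it reaches all instances:

import java.util.List;
import org.springframework.context.ApplicationEvent;
import org.springframework.context.event.EventListener;
import org.springframework.integration.support.leader.LockRegistryLeaderInitiator;
import org.springframework.stereotype.Component;

// Made-up trigger event; broadcast it once the cluster is fully initialised.
class StartLeaderInitiatorsEvent extends ApplicationEvent {
    StartLeaderInitiatorsEvent(Object source) {
        super(source);
    }
}

@Component
class LeaderElectionStarter {
    private final List<LockRegistryLeaderInitiator> initiators;

    LeaderElectionStarter(List<LockRegistryLeaderInitiator> initiators) {
        this.initiators = initiators;
    }

    // Kick off leader election on every initiator when the signal arrives
    @EventListener(StartLeaderInitiatorsEvent.class)
    public void startElection() {
        initiators.forEach(LockRegistryLeaderInitiator::start);
    }
}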
I found a very simple approach to do this.
You can add a scheduled task to each node which periodically tries to yield leaderships if the node holds too many of them.
For example, if you have N nodes and 2*N roles, and you want a completely fair leadership distribution (each node holding only two leaderships), you can use something like this:
@Component
@RequiredArgsConstructor
public class FairLeaderDistributor {

    private final List<LeaderInitiator> initiators;

    @Scheduled(fixedDelay = 300_000) // once per 5 minutes
    public void yieldExcessLeaderships() {
        initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(2) // keep only 2 leaderships
                .forEach(Context::yield);
    }
}
Once all nodes are up, you will eventually get a completely fair leadership distribution.
You can also implement dynamic distribution based on the current active node count if you use the Zookeeper LeaderInitiator implementation.
The current number of participants can easily be retrieved from the Curator LeaderSelector::getParticipants method.
You can get the LeaderSelector via reflection, from the LeaderInitiator.leaderSelector field.
@Slf4j
@Component
@RequiredArgsConstructor
public class DynamicFairLeaderDistributor {

    final List<LeaderInitiator> initiators;

    @SneakyThrows
    private static int getParticipantsCount(LeaderInitiator leaderInitiator) {
        Field field = LeaderInitiator.class.getDeclaredField("leaderSelector");
        field.setAccessible(true);
        LeaderSelector leaderSelector = (LeaderSelector) field.get(leaderInitiator);
        return leaderSelector.getParticipants().size();
    }

    @Scheduled(fixedDelay = 5_000)
    public void yieldExcessLeaderships() {
        int rolesCount = initiators.size();
        if (rolesCount == 0) return;
        int participantsCount = getParticipantsCount(initiators.get(0));
        if (participantsCount == 0) return;
        int maxLeadershipsCount = (rolesCount - 1) / participantsCount + 1;
        log.info("rolesCount={}, participantsCount={}, maxLeadershipsCount={}", rolesCount, participantsCount, maxLeadershipsCount);
        initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(maxLeadershipsCount)
                .forEach(Context::yield);
    }
}