How to republish events from an event log? - event-sourcing

I see the notion of "republishing" events from an event log mentioned everywhere, but it's never really described in detail.
The problem I am thinking of is the following: a producer maintains an event log and publishes every event to a queue; a consumer connects to the queue and receives all the events produced.
Consider a case where there are two consumers (C1, C2) and one producer (P1).
Let's say that:
Producer P1 is started
Consumer C1 connects to the queue
P1 produces events E1, E2, E3
C1 consumes E1, E2, E3
Consumer C2 connects to the queue
P1 produces E4, E5, E6
C1 consumes E4, E5, E6
C2 consumes E4, E5, E6
At this point C2 has missed all the events that previously happened! How does:
C2 request that events E1, E2, E3 be republished?
C2 avoid getting events out of order (i.e. getting E4 before getting E1, E2, E3)?
If anyone has some insights, much appreciated.

Here is what I am thinking.
Synchronizing Consumers with Producer
The consumer records the last event sequence number received from each foreign context. It can:
- record it in the aggregate table, in a latest_event_per_context column (this only works for one foreign context); or
- record it in a context relation with a foreign key to the aggregate table: Table(aggregate_id, context_name, sequence_number), where the sequence number is that of the latest event received from the given context; this keeps the concern in the application layer (see the sketch below); or
- record it as an event EventReceived(event, context), leaving it to the domain to discard duplicates.
The producer sequences all the events it sends out.
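To make the checkpointing idea concrete, here is a minimal sketch. Everything in it (the Event record, the EventSource interface, replayFrom) is a hypothetical shape for illustration, not a real library API: on connect, the consumer asks the producer to republish everything after its last recorded sequence number, and it discards anything at or below the checkpoint, which answers both the replay question and the ordering question.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here (Event, EventSource, replayFrom) are hypothetical, for illustration only.
record Event(String context, long sequenceNumber, String payload) {}

interface EventSource {
    // Republish all events of a context with sequence number >= fromSeq, in order.
    List<Event> replayFrom(String context, long fromSeq);
}

class CheckpointingConsumer {
    private final Map<String, Long> lastSeqPerContext = new HashMap<>();

    // On (re)connect: request a replay of everything after the checkpoint,
    // so C2 receives E1..E3 before it applies the live events E4..E6.
    void catchUp(String context, EventSource source) {
        long lastSeen = lastSeqPerContext.getOrDefault(context, 0L);
        source.replayFrom(context, lastSeen + 1).forEach(this::onEvent);
    }

    // Normal delivery path: the checkpoint makes processing idempotent,
    // so replayed and live deliveries can safely overlap.
    void onEvent(Event e) {
        long lastSeen = lastSeqPerContext.getOrDefault(e.context(), 0L);
        if (e.sequenceNumber() <= lastSeen) {
            return; // duplicate or already-seen event: discard
        }
        apply(e);
        lastSeqPerContext.put(e.context(), e.sequenceNumber());
    }

    void apply(Event e) { /* project the event into local state */ }
}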

Check out https://geteventstore.com/
When a new subscriber connects, it has the ability to replay a stream. GES offloads the duty of tracking what has already been received onto the clients.

Related

Delay start of consumer forces rebalance of group

We need to delay the start of a consumer.
Here's what we need:
Start consumer A (reading topic "xyz")
When consumer A has processed all messages, we need to start consumer B (reading topic "zyx")
After reading this:
How to find no more messages in kafka topic/partition & reading only after writing to topic is done
We set idleEventInterval on containerProperties of consumer A:
containerProperties.setIdleEventInterval(30000L);
and on consumer B:
container.setAutoStartup(false);
then we have:
@EventListener
public void handleListenerContainerIdleEvent(ListenerContainerIdleEvent event) {
    if (canStartContainer(event.getListenerId())) {
        // start consumer B's container once consumer A has gone idle
        Optional.ofNullable(containers.get("container-b"))
                .ifPresent(AbstractMessageListenerContainer::start);
    }
}
We found that it's exactly what we need, and it works fine, but we faced one problem: when consumer B starts, it forces a rebalance of all the other consumers.
Can we avoid it?
Request joining group due to: group is already rebalancing
Revoke previously assigned partitions
(Re-)joining group
It's not a big issue, but we use ConsumerSeekAware to reset the offset via seekToBeginning, so the topic is read twice.
You should not use the same group.id for consumers on different topics; it causes an unnecessary rebalance, as you have found out.
Use different group.ids for consumers on different topics.
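For example, a sketch (the listener ids, topics, and group names are placeholders, and it assumes a spring-kafka version where @KafkaListener supports the groupId and autoStartup attributes): each topic gets its own consumer group, so starting consumer B no longer disturbs consumer A's group.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TwoTopicListeners {

    // Consumer A in its own group.
    @KafkaListener(id = "consumer-a", topics = "xyz", groupId = "group-a")
    public void listenA(String message) {
        // process records from topic "xyz"
    }

    // Consumer B in a different group, started manually once A is idle.
    @KafkaListener(id = "consumer-b", topics = "zyx", groupId = "group-b", autoStartup = "false")
    public void listenB(String message) {
        // process records from topic "zyx"
    }
}
A rebalance only ever involves the members of one consumer group, so with separate groups the late start of consumer B cannot trigger seekToBeginning on consumer A again.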

multiple Spring Kafka Consumer one class multiple groups

I am experimenting with Apache Kafka and Spring Boot.
Is there a way to set up multiple listeners in one JVM, each of them associated with a different group (so that each responds to an event) on a topic?
Is there a good pattern for this?
@KafkaListener(topics = "${slappy.consumer.topic}", groupId = "t1, t2, t3, t4, t5, t6, t7, t8, t9, t10")
public void listenToGroup(@Payload WackyMessage message) {
    log.info("Wacky Consumer with processItemId of received message: '{}'", message);
    process(message, ProcessItemIdentifiers.omni_silo0_deleter);
}
Sounds like this is the sort of thing Akka is for. I was hoping I could just do naked Spring. I am not sure if I want to add Akka into the mix at this point.
Your proposed groupId = "t1, t2, t3, t4, t5, t6, t7, t8, t9, t10" solution is logically wrong. You still have the same listener method, so what is the point of calling it so many times for the same record in the Kafka topic?
If you had different @KafkaListener method signatures, I would agree about putting different groups on them; each of them would then really receive the same message, and it could be done in parallel as well, since each @KafkaListener spins up its own listener container process.
On the other hand, you can leverage the @EventListener abstraction from Spring to distribute a single record from a single @KafkaListener to any number of subscribers of that event. In that case you don't need to think about different groups: a single one is enough to consume from Kafka, and everything else is processed inside the target application.
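A minimal sketch of that fan-out (the group and class names are placeholders; WackyMessage is the question's payload type): one @KafkaListener republishes each record as a Spring application event, and any number of @EventListener methods react to it.
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
class KafkaBridge {

    private final ApplicationEventPublisher publisher;

    KafkaBridge(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    // The only Kafka consumer: one group, one container.
    @KafkaListener(topics = "${slappy.consumer.topic}", groupId = "single-group")
    void onRecord(WackyMessage message) {
        publisher.publishEvent(message); // fan out inside the application
    }
}

@Component
class SubscriberOne {
    @EventListener
    void handle(WackyMessage message) { /* first in-process subscriber */ }
}

@Component
class SubscriberTwo {
    @EventListener
    void handle(WackyMessage message) { /* second in-process subscriber */ }
}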

Understanding reactor's FluxProcessor.wrap(upstream, downstream)

Processors (Subjects in RxJava) act both as Publishers and Subscribers, so they can subscribe to a Publisher and, in addition, be subscribed to, passing along the values they receive from the upstream Publisher:
Publisher
|
\/
Processor
|
\/
Subscriber
How does FluxProcessor.wrap() fit into this schema? For instance, I would like to use FluxProcessor.wrap to create a FluxProcessor that gets its values from a Flux.range() and can be subscribed to in order to receive those values.
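A minimal sketch of what wrap() does (assuming Reactor 3.x, where FluxProcessor.wrap and UnicastProcessor are available): it fuses a write side (any Subscriber) with a read side (any Publisher) into a single Processor, so values pushed into the result come out through the downstream pipeline.
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxProcessor;
import reactor.core.publisher.UnicastProcessor;

public class WrapDemo {
    public static void main(String[] args) {
        // Write side: a processor we can push values into.
        UnicastProcessor<Integer> upstream = UnicastProcessor.create();
        // Read side: any Publisher derived from the write side.
        Flux<Integer> downstream = upstream.map(i -> i * 2);

        // wrap() fuses the two: the result's Subscriber half delegates to `upstream`,
        // and its Publisher half delegates to `downstream`.
        FluxProcessor<Integer, Integer> processor = FluxProcessor.wrap(upstream, downstream);

        processor.subscribe(v -> System.out.println("got " + v));
        Flux.range(1, 5).subscribe(processor); // prints got 2, got 4, ..., got 10
    }
}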

Spring AMQP RabbitMQ: how to make sure two parallel consumers don't grab the same task at the same time?

I have two systems integrated with RabbitMQ.
Background
The client sends multiple request messages from a Spring-AMQP outbound gateway to a RabbitMQ direct exchange. The exchange round-robin dispatches those messages to multiple workers; the workers are independent, located on different desktops, and run the same worker code in parallel, each processing different messages from the exchange via a SimpleMessageListenerContainer.
Logic Flow
Similar to the RabbitMQ tutorial's multi-worker direct exchange setup.
Client ----- sends requests (5 tasks) to ----> RabbitMQ-DirectExchange
Then RabbitMQ-DirectExchange distributes those 5 tasks to the workers:
PC1 (Worker1), PC2 (Worker2)
ExchangeType & my Bindings
<!-- rabbit connection factory, rabbit template, and rabbit admin -->
<rabbit:connection-factory id="connectionFactory"
                           host="local IP address"
                           username="guest"
                           password="guest"
                           channel-cache-size="10" />
<rabbit:template id="amqpTemplate"
                 connection-factory="connectionFactory"
                 reply-timeout="600000"
                 exchange="JobRequestDirectExchange" />
<rabbit:admin connection-factory="connectionFactory" id="rabbitAdmin" />
<rabbit:direct-exchange name="taskRequests" auto-delete="false" durable="true">
    <rabbit:bindings>
        <rabbit:binding queue="jobRequests" key="request.doTask" />
    </rabbit:bindings>
</rabbit:direct-exchange>
<rabbit:queue name="jobRequests" auto-delete="false" durable="true" />
Worker - the consumer configuration
<rabbit:listener-container id="workerContainer"
                           acknowledge="auto"
                           prefetch="1"
                           connection-factory="connectionFactory">
    <rabbit:listener ref="taskWorker" queue-names="jobRequests" />
</rabbit:listener-container>
The Worker class is a simple POJO that processes the request and completes the task.
Using: RabbitMQ 3.2.2 with Spring-Integration-Amqp 2.2
What I expect
I expect that Worker1 receives some of the tasks while Worker2 picks up the rest.
I want the workers to work through all 5 tasks in parallel, each worker doing only one task at a time and being given the next one after it finishes (the rabbit listener is set to prefetch=1).
Such as
worker1: t2 t3 t5
worker2: t1 t4
But
After lots of runtime tests, sometimes it does the tasks correctly:
Worker1------task4 task1
Worker2------task3 task2 task5
While sometimes it goes wrong, like this:
Worker1------task4 task1
Worker2------task4 task2 task1
Apparently, task4 and task1 were picked up by worker1 and worker2 at the same time.
Runtime test:
I checked that the client correctly sends the task1..task5 request messages to the exchange, but each worker receives different tasks every time. There is a common case that seems to trigger the wrong dispatching:
There are 5 tasks (t1, t2, t3, t4, t5) at the exchange, to be sent to 2 parallel workers (w1, w2).
w1 got tasks: t2 t1 t4
w2 got tasks: t3 t1
With round-robin dispatch, w1 and w2 receive tasks in sequence:
w1 gets t2 and w2 gets t3.
While t2 and t3 are running, the exchange sends t1 to w1 and waits for w1's ack.
Suppose t2 takes longer to finish than t3, so w2 is free while w1 is still busy.
When w2 finishes t3, it receives t1 as well, because w2 is not busy and the broker has not yet received an ack for t1.
My understanding is:
Both w1 and w2 are now doing the same task t1. Whichever of them finishes t1 first sends an ack back, and the broker dequeues one task message. Since t1 gets finished twice, the broker dequeues one more message than it should; that is how the t5 message gets dequeued even though nobody processed it. All 5 messages end up acked and dequeued, but the two workers did t1 twice and never did t5.
What should I do to prevent two parallel workers from grabbing the same message from the same Rabbit queue?
I tried the auto-ack way, and the messages are correctly acked. But while the broker is waiting for a worker's ack, it may redispatch a message that is unacked but has already been delivered to another worker.
I have also thought about synchronizing the sent-out messages, or giving them priorities, but I have no clear vision of how to accomplish that.
I am grateful to hear any ideas about this problem. Thanks!
One thing I can think of that could cause these duplicated deliveries is a consumer closing the channel before sending an ack.
In that case, the RabbitMQ broker will requeue the message and set its redelivered flag to true. From the RabbitMQ docs:
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
If, while testing, you close one of the worker processes before it sends an ack, or if a worker faults, this is very likely to happen. You can examine the redelivered flag to avoid the message being processed again by a different consumer, if that is the case.
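In Spring AMQP the flag is exposed on the message properties, so a listener can branch on it. A sketch (alreadyProcessed and process are hypothetical helpers, and, as the docs above say, the flag is only a hint that the message may have been seen before):
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;

public class TaskWorker implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // redelivered == true only hints that the message MAY have been seen before
        if (Boolean.TRUE.equals(message.getMessageProperties().isRedelivered())
                && alreadyProcessed(message.getMessageProperties().getMessageId())) {
            return; // skip the probable duplicate
        }
        process(message); // hypothetical task execution
    }

    private boolean alreadyProcessed(String messageId) {
        // look the id up in some dedup store (assumed to exist)
        return false;
    }

    private void process(Message message) {
        // do the actual work
    }
}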
Another thing I've noticed is the prefetch setting in your consumer configuration. You should set this to a higher value (tuned for your needs) instead of leaving it at just 1. You can learn more about prefetch here.
Hope that helps!
I tried for a long time to work out a Spring-configured way to implement this feature, but failed.
Eventually I came up with a workable solution using the RabbitMQ Java Client API.
Using a Spring asynchronous gateway with the Quartz scheduler, it always had problems sending messages as needed; I suspect the reason is something multi-threading related.
At the beginning, I thought it was because the Channel instance may be accessed concurrently by multiple threads, in which case confirms are not handled properly:
An important caveat to this is that confirms are not handled properly when a Channel is shared between multiple threads. In that scenario, it is therefore important to ensure that the Channel instance is not accessed concurrently by multiple threads.
The above is from http://www.rabbitmq.com/javadoc/com/rabbitmq/client/Channel.html
Finally, I decided to give up on the Spring way and went back to the RabbitMQ API (before, I configured the gateway/channels in Spring XML; now I declare the exchange and channels programmatically with the RabbitMQ Java client), and I added RabbitMQ RPC for asynchronous callbacks. Now everything works fine for the current requirement.
So in summary, the final solution for my requirement is:
Use the RabbitMQ Java Client API to declare the exchange/channels/bindings/routing keys,
for both the client and the server side.
Use RabbitMQ RPC to implement the asynchronous callback feature.
(I followed RabbitMQ's Java tutorial: http://www.rabbitmq.com/tutorials/tutorial-six-java.html)
Did you try setting the concurrentConsumers property on the listener container, as discussed here?
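For reference, a sketch of that property in Java configuration (it assumes the connectionFactory and taskWorker beans from the XML above, and that TaskWorker implements MessageListener):
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;

@Bean
SimpleMessageListenerContainer workerContainer(ConnectionFactory connectionFactory, TaskWorker taskWorker) {
    SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.setQueueNames("jobRequests");
    container.setPrefetchCount(1);            // one unacked message per consumer
    container.setConcurrentConsumers(2);      // two competing consumers in this container
    container.setMessageListener(taskWorker); // assumes TaskWorker implements MessageListener
    return container;
}
Note that each delivery still goes to exactly one consumer: competing consumers on one queue never receive the same message unless the broker requeues it after a lost ack.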

[BOOST][MSM] Can a Guard be a post-condition of the Action, instead of a precondition?

I am new to MSM, and to the UML state machine standard as well. I have done some state machine design before, using the State design pattern, but this time I want to learn to use Boost MSM instead of cooking things up again.
One thing that really confuses me is the Guard. I want to do this: in state S1, I receive an event E1 and perform some action A1; based on the result of A1, I should either transition to a new state S2 or stay in the same state S1.
Using MSM, I cannot make Guard G1 depend on the result of Action A1: in MSM's model, G1 is a precondition deciding whether A1 is executed at all, not a result of executing A1.
Two solutions I can think of are:
1. Introduce a pseudo choice state, post_S1, where in its on_entry I perform the action A1, and have a guard G1 test the result of this action, then either go back to S1 or proceed to S2.
// Start      Event   Action   Next      Guard
   S1         E1      none     post_S1   none
   post_S1    none    none     S2        G1
   post_S1    none    none     S1        G1' (the negation of G1)
2. Move the action A1 code into guard G1 (after all, A1 is a function call, which I can make return a boolean), so my transition row would be:
// Start      Event   Action   Next      Guard
   S1         E1      none     S2        G1=A1
Am I using MSM right? Is there a better practice for solving this problem? In my application, I would have A LOT of these pseudo choice states, which I would really like to avoid.
Thanks!
Zongjun
This is what the UML standard defines: guards are preconditions.
You have several ways to reach your goal; my personal taste in this case would be:
Within State S1, add an internal transition on event E1.
This transition would have A1 as an action. Within A1, execute the action, then check the result.
If the result means "stay where you are", stop.
Else, call (still within A1) fsm.template process_event(E2); where E2 is a new event moving you to S2.
I suggest this way because it will save you some compile time; states are expensive ;-)
This is the easiest way. Again, there are others, like using eUML to make A1 return a result and then adding an if_ in the transition table, but that is much more advanced.
HTH,
Christophe
// Start   Event   Action   Next   Guard
   S1      E1      none     S2     Result_of_A1
Inside the guard function itself, we perform the action A1 and then return true or false based on A1's result. This way, if the guard is false we stay in S1; otherwise we move to S2.
This saves the pseudo choice state. The choice state is useful, but if many states transition on a "post condition" rather than a "pre condition", the transition table fills up with these pseudo choice states; compared to the table above, that is messier.
