Transactional outbox pattern vs. ChainedKafkaTransactionManager in Spring Boot microservices

Using Spring Kafka's ChainedKafkaTransactionManager, I cannot see any point in implementing the transactional outbox pattern in a Spring Boot microservices context.
Putting the message producer (i.e. KafkaTemplate's send method) and the DB operation in the same transactional block solves exactly the problem the outbox pattern is meant to solve:
if any exception is raised in the transactional code, neither is the DB operation committed nor is the message read on the consumer side (configured with read_committed).
This way I don't need an additional table nor any kind of CDC code. In summary, the Spring Kafka way of transaction synchronization seems much easier to use and implement to me than any implementation of the transactional outbox pattern.
Am I missing anything?
@Bean
public ChainedKafkaTransactionManager<Object, Object> chainedTransactionManager(
        JpaTransactionManager transactionManager,
        KafkaTransactionManager<Object, Object> kafkaTransactionManager) {
    return new ChainedKafkaTransactionManager<>(transactionManager, kafkaTransactionManager);
}

@Bean
@Primary
public JpaTransactionManager transactionManager(EntityManagerFactory entityManagerFactory) {
    return new JpaTransactionManager(entityManagerFactory);
}

@Bean
public KafkaTransactionManager<Object, Object> kafkaTransactionManager(
        ProducerFactory<Object, Object> producerFactory) {
    return new KafkaTransactionManager<>(producerFactory);
}

@Transactional(value = "chainedTransactionManager")
public Customer createCustomer(Customer customer) {
    customer = customerRepository.save(customer);
    kafkaTemplate.send("customer-created-topic", "Customer created");
    return customer;
}
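Note that this setup only works if the producer factory is transactional; a minimal sketch of that prerequisite (the broker address and serializers here are illustrative, not taken from the original post):

@Bean
public ProducerFactory<Object, Object> producerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    DefaultKafkaProducerFactory<Object, Object> factory = new DefaultKafkaProducerFactory<>(props);
    // Setting a transaction-id prefix makes the factory (and any KafkaTemplate built
    // from it) transactional, which KafkaTransactionManager requires.
    factory.setTransactionIdPrefix("tx-");
    return factory;
}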

I think it doesn't give you the same level of safety. What if something fails between the Kafka commit and the DB commit?
https://medium.com/dev-genius/transactional-integration-kafka-with-database-7eb5fc270bdc
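To make the comparison concrete, here is a rough sketch of what the outbox alternative looks like: the event is written to an outbox table in the same local DB transaction as the business data, and a separate relay publishes it to Kafka afterwards (the outbox entity and repository names are illustrative):

@Transactional // a single local DB transaction, no Kafka call inside it
public Customer createCustomer(Customer customer) {
    customer = customerRepository.save(customer);
    // Persisted atomically with the business data; a relay (poller or CDC)
    // publishes the event to Kafka later, retrying until it succeeds.
    outboxRepository.save(new OutboxEvent("customer-created-topic", "Customer created"));
    return customer;
}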

You get a weaker guarantee if the data you are trying to update is external to Kafka:
Note that exactly-once semantics is guaranteed within the scope of Kafka Streams’ internal processing only; for example, if the event streaming app written in Streams makes an RPC call to update some remote stores, or if it uses a customized client to directly read or write to a Kafka topic, the resulting side effects would not be guaranteed exactly once.
https://www.confluent.fr/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
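For reference, the read_committed setting mentioned in the question is a plain consumer property; a minimal sketch (bootstrap address and group id are illustrative):

Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-consumers");      // illustrative
// Only deliver records from committed Kafka transactions; aborted writes stay invisible.
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);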

Related

Spring Batch Parallel processing with JMS

I implemented a Spring Batch project that reads from a WebLogic JMS queue (a custom item reader, not message driven), then passes the JMS message data to an item writer (chunk = 1) where I call some APIs and write to the database.
However, I am trying to implement parallel JMS processing: reading JMS messages in parallel and passing them to the writer without waiting for the previous processes to complete.
I used a DefaultMessageListenerContainer in a previous project and it offers parallel consumption of JMS messages, but in this project I have to use the Spring Batch framework.
I tried the easiest solution (a multi-threaded step), but it didn't work: JmsException: "invalid blocking receive when another receive is in progress", which probably means that my reader is stateful.
I thought about using remote partitioning, but then I would have to read all messages and put the data into step execution contexts before calling the slave steps, which isn't really efficient when dealing with a large number of messages.
I looked a little into remote chunking; I understand that it passes data via queue channels, but I can't see the point of reading from JMS and putting messages into a local queue for slave workers.
How can I approach this?
My code:
@Bean
Step step1() {
    return steps.get("step1").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
            .reader(reader()).processor(processor()).writer(writer())
            .listener(stepListener()).build();
}

@Bean
Job job(@Qualifier("step1") Step step1) {
    return jobs.get("job").start(step1).build();
}
JMS code:
@Override
public void initQueueConnection() throws NamingException, JMSException {
    Hashtable<String, String> properties = new Hashtable<>();
    properties.put(Context.INITIAL_CONTEXT_FACTORY, env.getProperty(WebLogicConstant.JNDI_FACTORY));
    properties.put(Context.PROVIDER_URL, env.getProperty(WebLogicConstant.JMS_WEBLOGIC_URL_RECEIVE));
    InitialContext vInitialContext = new InitialContext(properties);
    QueueConnectionFactory vQueueConnectionFactory = (QueueConnectionFactory) vInitialContext
            .lookup(env.getProperty(WebLogicConstant.JMS_FACTORY_RECEIVE));
    vQueueConnection = vQueueConnectionFactory.createQueueConnection();
    vQueueConnection.start();
    vQueueSession = vQueueConnection.createQueueSession(false, 0);
    Queue vQueue = (Queue) vInitialContext.lookup(env.getProperty(WebLogicConstant.JMS_QUEUE_RECEIVE));
    consumer = vQueueSession.createConsumer(vQueue, "JMSCorrelationID IS NOT NULL");
}

@Override
public Message receiveMessages() throws NamingException, JMSException {
    return consumer.receive(20000);
}
Item reader:
@Override
public Message read() throws Exception {
    return jmsServiceReceiver.receiveMessages();
}
Thanks! I'll appreciate the help :)
There's a BatchMessageListenerContainer in the spring-batch-infrastructure-tests sub project.
https://github.com/spring-projects/spring-batch/blob/d8fc58338d3b059b67b5f777adc132d2564d7402/spring-batch-infrastructure-tests/src/main/java/org/springframework/batch/container/jms/BatchMessageListenerContainer.java
Message listener container adapted for intercepting the message reception with advice provided through configuration.
To enable batching of messages in a single transaction, use the TransactionInterceptor and the RepeatOperationsInterceptor in the advice chain (with or without a transaction manager set in the base class). Instead of receiving a single message and processing it, the container will then use a RepeatOperations to receive multiple messages in the same thread. Use with a RepeatOperations and a transaction interceptor. If the transaction interceptor uses XA then use an XA connection factory, or else the TransactionAwareConnectionFactoryProxy to synchronize the JMS session with the ongoing transaction (opening up the possibility of duplicate messages after a failure). In the latter case you will not need to provide a transaction manager in the base class - it only gets in the way and prevents the JMS session from synchronizing with the database transaction.
Perhaps you could adapt it for your use case.
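For illustration, a rough sketch of the advice-chain wiring that javadoc describes, assuming the class is copied into your own project (the connection factory, queue name, and batch size here are illustrative):

BatchMessageListenerContainer container = new BatchMessageListenerContainer();
container.setConnectionFactory(connectionFactory);
container.setDestinationName("my-queue"); // illustrative queue name
container.setMessageListener(messageListener);

// Repeat interceptor: receive up to 10 messages per transaction instead of one.
RepeatTemplate repeatTemplate = new RepeatTemplate();
repeatTemplate.setCompletionPolicy(new SimpleCompletionPolicy(10));
RepeatOperationsInterceptor repeatInterceptor = new RepeatOperationsInterceptor();
repeatInterceptor.setRepeatOperations(repeatTemplate);

// Transaction interceptor so the whole batch commits or rolls back together.
TransactionInterceptor txInterceptor = new TransactionInterceptor(
        transactionManager, new MatchAlwaysTransactionAttributeSource());

container.setAdviceChain(new Advice[] { txInterceptor, repeatInterceptor });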
I was able to do so with a multi-threaded step:
// Jobs and steps
@Bean
Step stepDetectionIncoherencesLiq(@Autowired StepBuilderFactory steps) {
    int threadSize = Integer.parseInt(env.getProperty(PropertyConstant.THREAD_POOL_SIZE));
    return steps.get("stepDetectionIncoherencesLiq").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
            .reader(reader()).processor(processor()).writer(writer())
            .readerIsTransactionalQueue()
            .faultTolerant()
            .taskExecutor(taskExecutor())
            .throttleLimit(threadSize)
            .listener(stepListener())
            .build();
}
And a JmsItemReader with a JmsTemplate instead of creating sessions and connections explicitly; the template manages connections, so I no longer get the JMS exception (JmsException: "invalid blocking receive when another receive is in progress"):
@Bean
public JmsItemReader<Message> reader() {
    JmsItemReader<Message> itemReader = new JmsItemReader<>();
    itemReader.setItemType(Message.class);
    itemReader.setJmsTemplate(jmsTemplate());
    return itemReader;
}
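The jmsTemplate() bean itself isn't shown in the answer; a plausible sketch of it, reusing the question's property constants (the connection factory wiring is an assumption):

@Bean
public JmsTemplate jmsTemplate() {
    // Connection factory bean assumed to be defined elsewhere in the configuration.
    JmsTemplate template = new JmsTemplate(queueConnectionFactory);
    template.setDefaultDestinationName(env.getProperty(WebLogicConstant.JMS_QUEUE_RECEIVE));
    // JmsItemReader rejects the default indefinite wait, so a finite timeout is required.
    template.setReceiveTimeout(20000L);
    // Roll the message back onto the queue if the chunk transaction fails.
    template.setSessionTransacted(true);
    return template;
}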

Is 1-phase commit (ChainedTransactionManager) really necessary in this scenario vs. no transaction management?

I have a Spring Boot application with a @JmsListener that receives a message from a queue, stores it in a database and sends it to another queue.
I wanted a minimal transactional guarantee, so 1-phase commit works for me. After a lot of reading I found I could use the ChainedTransactionManager to coordinate the DataSource and JMS resources:
@Configuration
public class TransactionConfiguration {

    @Bean
    public ChainedTransactionManager transactionManager(JpaTransactionManager jpaTm, JmsTransactionManager jmsTm) {
        return new ChainedTransactionManager(jmsTm, jpaTm);
    }

    @Bean
    public JpaTransactionManager jpaTransactionManager(EntityManagerFactory emf) {
        return new JpaTransactionManager(emf);
    }

    @Bean
    public JmsTransactionManager jmsTransactionManager(ConnectionFactory connectionFactory) {
        return new JmsTransactionManager(connectionFactory);
    }
}
Queue listener:
@Transactional(transactionManager = "transactionManager")
@JmsListener(...)
public void process(@Payload String message) {
    // Write to DB
    // Send to output queue
}
The resulting order of operations is:
START MESSAGING TX
START DB TX
READ MESSAGE
WRITE DB
SEND MESSAGE
COMMIT DB TX
COMMIT MESSAGING TX
If the DB commit fails, the message will be reprocessed.
If the DB commit succeeds but the messaging commit fails, the message will also be reprocessed. This is not a problem since I can guarantee the idempotency of the DB write operation.
Now my doubt is: suppose I hadn't configured the ChainedTransactionManager and the listener looked like this (no @Transactional):
@JmsListener(...)
public void process(@Payload String message) {
    // Write to DB
    // Send to output queue
}
Doesn't this behave the same as the other example despite not coordinating the commits? (I've verified that on SQL exceptions the message is redelivered.)
RECEIVE MESSAGE
WRITE DB + COMMIT
SEND MESSAGE + COMMIT
If the DB commit failed, the message would be reprocessed.
If it succeeded and the send operation failed, the message would be reprocessed as well.
So is the ChainedTransactionManager really necessary in this case?
UPDATE: Debugging the Spring Boot autoconfiguration (JmsAnnotationDrivenConfiguration)...
@Bean
@ConditionalOnMissingBean(name = "jmsListenerContainerFactory")
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(
        DefaultJmsListenerContainerFactoryConfigurer configurer, ConnectionFactory connectionFactory) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    configurer.configure(factory, connectionFactory);
    return factory;
}
...the DefaultJmsListenerContainerFactoryConfigurer configures the factory with factory.setSessionTransacted(true), because there is no JtaTransactionManager defined:
if (this.transactionManager != null) {
    factory.setTransactionManager(this.transactionManager);
}
else {
    factory.setSessionTransacted(true);
}
With setSessionTransacted(true), according to the Spring docs, I would get the message rollback and redelivery behaviour I need on DB (or any other) exceptions:
Local resource transactions can simply be activated through the sessionTransacted flag on the listener container definition. Each message listener invocation will then operate within an active JMS transaction, with message reception rolled back in case of listener execution failure. Sending a response message (via SessionAwareMessageListener) will be part of the same local transaction, but any other resource operations (such as database access) will operate independently. This usually requires duplicate message detection in the listener implementation, covering the case where database processing has committed but message processing failed to commit.
That explains why I'm getting the behaviour I expect without needing to configure the ChainedTransactionManager.
After all this, could you tell me whether it makes sense (i.e. whether it adds some guarantee I'm missing) to use the ChainedTransactionManager in this case?
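For reference, the sessionTransacted behaviour the quoted docs describe can also be enabled explicitly when defining a custom container factory; a minimal sketch (bean wiring is illustrative):

@Bean
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(ConnectionFactory connectionFactory) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    // Each listener invocation runs in a local JMS transaction; a listener
    // exception rolls the receive back and the broker redelivers the message.
    factory.setSessionTransacted(true);
    return factory;
}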

Why does Amazon SQS not support transactions?

In Spring, RabbitMQ supports transactions. However, Amazon SQS does not support transactions in Spring.
I'm sorry, I have added some more content.
I tested two message queues (RabbitMQ and Amazon SQS) in Spring, as shown below.
My goal is to put the user's email on the queue, to trigger the signup-completion email, only when the user completes signup without an exception.
// RabbitMQ configuration class
@Bean
public ConnectionFactory rabbitConnectionFactory() {
    CachingConnectionFactory connectionFactory =
            new CachingConnectionFactory("localhost", 5672);
    connectionFactory.setUsername("guest");
    connectionFactory.setPassword("guest");
    return connectionFactory;
}

@Bean
public SimpleMessageListenerContainer messageListenerContainer() {
    SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
    container.setConnectionFactory(rabbitConnectionFactory());
    container.setQueueNames(queue);
    container.setMessageListener(exampleListener());
    container.setTransactionManager(platformTransactionManager);
    container.setChannelTransacted(true);
    return container;
}

@Bean
public RabbitTemplate producerRabbitTemplate() {
    RabbitTemplate rabbitTemplate = new RabbitTemplate(rabbitConnectionFactory());
    rabbitTemplate.setQueue(queue);
    rabbitTemplate.setMandatory(true);
    rabbitTemplate.setChannelTransacted(true);
    return rabbitTemplate;
}

// UserService class
@Autowired
private final UserRepository userRepository;
@Autowired
private final RabbitTemplate rabbitTemplate;

@Transactional
public User createUser(final User user) {
    rabbitTemplate.convertAndSend("spring-boot", user.getEmail()); // signup-completion email
    final User savedUser = userRepository.save(user);
    if (true) throw new RuntimeException();
    return savedUser;
}
However, the logic above throws a RuntimeException.
RabbitMQ will not send the data to the queue, thanks to the @Transactional annotation in Spring, when an exception occurs.
// Amazon SQS configuration class
@Bean
public QueueMessagingTemplate queueMessagingTemplate(AmazonSQSAsync amazonSqs) {

// UserService class
@Autowired
private final UserRepository userRepository;
@Autowired
private QueueMessagingTemplate messagingTemplate;

@Transactional
public User createUser(final User user) {
    messagingTemplate.convertAndSend("spring-boot", user.getEmail()); // signup-completion email
    final User savedUser = userRepository.save(user);
    if (true) throw new RuntimeException();
    return savedUser;
}
However, SQS will send the data to the queue even if an exception occurs.
Does anyone know why this is?
TL;DR: How can I solve this problem?
Don't try to use transactions for this; come up with some way to make the system eventually consistent.
It sounds like you want to perform queue.sendMessage and repository.save as if they were a single transaction: either both get committed, or neither does. The problem is that the "transaction" isn't really transactional, even in your Rabbit example.
Consider the basics of how a transaction works:
Consider the basics of how a transaction works:
some sort of begin step that signifies the following changes are part of the transaction
changes are made (but not yet visible to readers)
some sort of commit step that atomically commits the changes (making them visible to readers)
However, in your case the queue and the repository are separate systems, backed by separate network resources, that don't talk to each other. There is no atomic commit here, and without an atomic commit you cannot have a true transaction. It "works" in your demo only because the exception is thrown separately from the code that is doing the writes.
Consider this case to more clearly illustrate:
@Transactional
public User createUser(final User user) {
    messagingTemplate.convertAndSend("spring-boot", user.getEmail());
    final User savedUser = userRepository.save(user);
    return savedUser;
}
When the commit step starts, it needs to make both the message (in the queue) and the user (in the repo) visible.
To do this, a network call must be made to both the queue and the repo.
The fact that these are two separate calls is the source of the problem: if one succeeds and the other fails, you end up in an inconsistent state.
You might say "transactions can be rolled back", but rolling back would involve yet another network call, which of course can also fail.
There are ways to make the overall distributed system transactional, but it's a very complex problem. It's often much easier and faster to allow for temporary inconsistency and have mechanisms in place to make the system eventually consistent.
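One common way to get that eventual consistency is an outbox-style relay: persist the pending notification in the same DB transaction as the user, then publish it outside the transaction. A rough sketch (the PendingEmail entity, its repository, and the relay schedule are illustrative):

@Transactional
public User createUser(final User user) {
    final User savedUser = userRepository.save(user);
    // Committed atomically with the user; no SQS call inside the transaction.
    pendingEmailRepository.save(new PendingEmail(savedUser.getEmail()));
    return savedUser;
}

// A scheduled relay publishes pending rows and marks them sent. A crash between
// send and markSent can produce duplicates, so the consumer must be idempotent.
@Scheduled(fixedDelay = 5000)
public void relayPendingEmails() {
    for (PendingEmail pending : pendingEmailRepository.findAllUnsent()) {
        messagingTemplate.convertAndSend("spring-boot", pending.getEmail());
        pending.markSent();
        pendingEmailRepository.save(pending);
    }
}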

Spring kafka Batch Listener- commit offsets manually in Batch

I am implementing a Spring Kafka batch listener, which reads a list of messages from a Kafka topic and posts the data to a REST service.
I would like to understand offset management in case the REST service goes down: the offsets for the batch should not be committed, and the messages should be reprocessed on the next poll. I have read the Spring Kafka documentation, but I am confused about the difference between the listener error handler and the seek-to-current container error handlers for batch listeners. I am using spring-boot-2.0.0.M7, and below is my code.
Listener Config:
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
    // factory.getContainerProperties().setPollTimeout(3000);
    factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.BATCH);
    factory.setBatchListener(true);
    return factory;
}
@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
            env.getProperty("spring.kafka.consumer.enable-auto-commit"));
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
            env.getProperty("spring.kafka.consumer.auto-commit-interval"));
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
    return propsMap;
}
Listener Class:
@KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
    if (payloadList.size() > 0) {
        // Post to the service
    }
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);

    @Override
    public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
        LOGGER.info("Exception occurred while processing::" + thrownException.getMessage());
    }
}
How should I handle the Kafka listener so that if something happens while processing a batch of records, I don't lose data?
With Apache Kafka we never lose data. There is an offset in the partition log, so we can seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets: the current consumer holds its state in memory. We only need to commit for the benefit of other, new consumers in the same group, which take over when the current one dies. Independently of any error, the current consumer always moves on to poll new data past its current in-memory offset.
So, to reprocess the same data in the same consumer, we definitely have to use a seek operation to move the consumer back to the desired position. That's why Spring Kafka introduced the SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers
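Applied to the batch listener above, that could look like the following sketch. In later spring-kafka 2.x versions there is a dedicated SeekToCurrentBatchErrorHandler for batch listeners; whether it is available depends on your spring-kafka version, so treat this as an assumption to verify:

@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setBatchListener(true);
    // On a listener exception, seek all unprocessed partitions back so the whole
    // failed batch is redelivered by the next poll() instead of being skipped.
    factory.getContainerProperties().setBatchErrorHandler(new SeekToCurrentBatchErrorHandler());
    return factory;
}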

Spring Boot JMS listener with ActiveMQ is very slow

I have a Spring Boot application that consumes my custom serializable messages from an ActiveMQ queue. It works so far; however, the consumption rate is very poor: only 1 - 20 msg/sec.
@JmsListener(destination = "${channel.consumer.destination}", concurrency = "${channel.consumer.maxConcurrency}")
public void receive(IMessage message) {
    processor.process(message);
}
The above is a snippet of my channel consumer class. It has a processor instance (injected/autowired; inside it I have an @Async service, so I assume the main thread is released as soon as the message enters the @Async method), and it uses Spring Boot's default ActiveMQ connection factory, which I configure via application properties:
# ACTIVEMQ (ActiveMQProperties)
spring.activemq.broker-url= tcp://localhost:61616?keepAlive=true
spring.activemq.in-memory=true
spring.activemq.pool.enabled=true
spring.activemq.pool.expiry-timeout=1
spring.activemq.pool.idle-timeout=30000
spring.activemq.pool.max-connections=50
A few things worth mentioning:
1. I run everything (Eclipse, ActiveMQ, MySQL) on my local laptop.
2. Before this, I also tried custom connection factories (default AMQ, pooling, and caching) combined with a custom thread-pool task executor, but still got the same result. (The original post included a performance snapshot here, updated every second.)
3. I also noticed in JVM Monitor that the used heap keeps growing.
I want to know:
1. Is there something wrong or missing in my setup? I can't even reach hundreds of messages per second.
2. Does a method annotated with @JmsListener execute processing asynchronously or synchronously?
3. If possible and supported, how can I use a traditional synchronous receive() with Spring Boot properly and elegantly?
Thank you!
I'm just working on something similar. I have defined a DefaultJmsListenerContainerFactory in my JMSConfiguration class (Spring configuration) like this:
@Bean
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(CachingConnectionFactory connectionFactory) {
    // Settings based on https://bsnyderblog.blogspot.sk/2010/05/tuning-jms-message-consumption-in.html
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory() {
        @Override
        protected void initializeContainer(DefaultMessageListenerContainer container) {
            super.initializeContainer(container);
            container.setIdleConsumerLimit(5);
            container.setIdleTaskExecutionLimit(10);
        }
    };
    factory.setConnectionFactory(connectionFactory);
    factory.setConcurrency("10-50");
    factory.setCacheLevel(CACHE_CONSUMER);
    factory.setReceiveTimeout(5000L);
    factory.setDestinationResolver(new BeanFactoryDestinationResolver(beanFactory));
    return factory;
}
As you can see, I took those values from https://bsnyderblog.blogspot.sk/2010/05/tuning-jms-message-consumption-in.html. It's from 2010, but I could not find anything newer or better so far.
I have also defined Spring's CachingConnectionFactory Bean as a ConnectionFactory:
@Bean
public CachingConnectionFactory buildCachingConnectionFactory(@Value("${activemq.url}") String brokerUrl) {
    // Settings based on https://bsnyderblog.blogspot.sk/2010/02/using-spring-jmstemplate-to-send-jms.html
    ActiveMQConnectionFactory activeMQConnectionFactory = new ActiveMQConnectionFactory();
    activeMQConnectionFactory.setBrokerURL(brokerUrl);
    CachingConnectionFactory cachingConnectionFactory = new CachingConnectionFactory(activeMQConnectionFactory);
    cachingConnectionFactory.setSessionCacheSize(10);
    return cachingConnectionFactory;
}
This setting will also help JmsTemplate with sending.
So my answer to you is: set the values of your connection pool as described in the link. Also, I guess you can delete spring.activemq.in-memory=true, because (based on the documentation) the "in-memory" property is ignored when you specify a custom broker URL.
Let me know if this helped.
G.
