Handling transactions with Kafka Streams and Spring Cloud Stream

I am developing a microservices-based app relying on Kafka and Kafka Streams. I am using Spring Boot and Spring Cloud Stream for that, and I am having trouble handling transactions for Kafka Streams operations. I know that there is no problem with handling transactions with a plain Kafka consumer and producer, but when I try to add Kafka Streams processing in the middle it becomes tricky for me.
The example case is:
In one of my services, an order request for a product is consumed from topic A.
Inventory info is consumed from topic B.
This service produces inventory updates to topic B, but it is also responsible for publishing events about products being ready for shipping (to topic C).
When receiving an order request from topic A, I want to check (by processing topic B) whether the inventory for the particular product is sufficient, and publish an event with either success or failure (regarding that order) to topic C.
At the same time I need to update the inventory (subtract the quantity that is, let's say, reserved for shipping) so that for the next order I have actual values from topic B. I want to post the success event to topic C and update the inventory on topic B within one transaction.
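Conceptually, I imagine something like the rough sketch below (the topic names and value types are my own placeholders, using the Kafka Streams binder's functional style; I assume processing.guarantee=exactly_once would be what makes both writes atomic):

    import java.util.function.BiFunction;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Rough sketch only: placeholder topic names and primitive value types.
    @Configuration
    public class OrderProcessor {

        @Bean
        public BiFunction<KStream<String, Integer>,  // topic A: productId -> ordered quantity
                          KTable<String, Integer>,   // topic B: productId -> stock level
                          KStream<String, String>>   // topic C: productId -> "SUCCESS"/"FAILURE"
                process() {
            return (orders, inventory) -> {
                KStream<String, int[]> joined =
                        orders.join(inventory, (qty, stock) -> new int[] { qty, stock });
                // write the decremented stock back to topic B for successful orders
                joined.filter((productId, qs) -> qs[1] >= qs[0])
                      .mapValues(qs -> qs[1] - qs[0])
                      .to("topic-B", Produced.with(Serdes.String(), Serdes.Integer()));
                // emit the order outcome to topic C via the output binding;
                // with processing.guarantee=exactly_once both writes would be
                // part of one Kafka transaction
                return joined.mapValues(qs -> qs[1] >= qs[0] ? "SUCCESS" : "FAILURE");
            };
        }
    }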
Is that possible in Spring Cloud Stream with Kafka Streams? And if yes, how can I manage to do that?

Related

One partition multiple consumers same group, consumer IDs

We have one topic with one partition due to message-ordering requirements. We have two consumers running on different servers with the same set of configurations, i.e. groupId, consumerId, consumerGroup. That is:
1 Topic -> 1 Partition -> 2 Consumers
When we deploy the consumers, the same code is deployed on both servers. We noticed that when a message comes in, both consumers consume it rather than only one processing it. The reason for having consumers running on two separate servers is that if one server crashes, at least the other can continue processing messages. But it looks like when both are up, both consume the messages. Reading the Kafka docs, it says that if we have more consumers than partitions then some stay idle, but we don't see that happening. Is there anything we are missing on the configuration side apart from consumerId and groupId? Thanks
As @Gary Russell said, as long as the two consumer instances each have their own consumer group, they will both consume every event that is written to the topic. Just put them into the same consumer group instead. You can provide a consumer group id in consumer.properties; a minimal sketch follows below.
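For illustration, a sketch with the plain Kafka consumer API (broker address, group and topic names are made up):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SingleActiveConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            // The SAME group.id on both servers: Kafka then assigns the single
            // partition to one instance; the other stays idle until failover.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                // poll loop omitted
            }
        }
    }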

Best way to track/trace a JSON object (time-series data) as it flows through a system of microservices on an IoT platform

We are working on an IoT platform which ingests many device parameter values (time series) every second from many devices. Once each JSON (a batch of multiple parameter values captured at a particular instant) is ingested, what is the best way to track it as it flows through many microservices downstream in an event-driven way?
We use Spring Boot predominantly and all the services are containerised.
E.g.: Option 1 - Is associating a UUID with each object and then idempotently updating its state in Redis as each microservice processes it ideal? The problem is that each microservice would now be tied to Redis, and we have seen Redis performance degrade as the number of API calls to it increases, since it is single-threaded (we can scale it out, though).
Option 2 - Zipkin?
Note: We use Kafka/RabbitMQ to process the messages in a distributed way, as you mentioned here. My question is about a strategy to track each of these messages and its status (to enable replay if needed, to attain exactly-once delivery). Let's say message1 is being processed by Service A, Service B and Service C. Right now we have trouble telling whether the message failed to get processed at Service B or at Service C, as we get a lot of messages.
A better approach would be to use Kafka instead of Redis.
Create a topic for every microservice and keep moving the packet from one topic to the next after processing:
topic(raw-data) - |MS One| - topic(processed-data-1) - |MS Two| - topic(processed-data-2) ... etc
Keep appending the results to the same object and keep moving it down the line, until every microservice has processed it, as in the sketch below.
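A hedged sketch of one such hop with Spring Cloud Stream's functional model (DeviceBatch with a getTrail() list is a made-up type, and the binding-to-topic mapping is an assumption):

    import java.util.function.Function;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // One pipeline hop: consume from the previous topic, append this
    // service's marker to the object, and publish to the next topic.
    // The bindings enrich-in-0 / enrich-out-0 would be mapped to
    // processed-data-1 / processed-data-2 in the application config.
    @Configuration
    public class MsTwo {

        @Bean
        public Function<DeviceBatch, DeviceBatch> enrich() {
            return batch -> {
                batch.getTrail().add("MS-Two"); // record processing status for tracking/replay
                return batch;
            };
        }
    }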

Spring Boot Kafka - Message management with different consumers

My application is built with Spring Boot and runs in a cluster (two different instances on OpenShift).
Every instance has one consumer that reads messages from a replicated topic.
I would like to find a mechanism that blocks the reading of a message from the replicated topic if it has already been read by one of the two consumers.
Example:
CONSUMER CLIENT A -- READS MSG_1 --> BROKER_1
- Offset increases
- Commit OK
CONSUMER CLIENT B -- DOES NOT READ MSG_1 --> BROKER_1
-- Correct, because MSG_1 was already committed
Now BROKER_1 goes down and the new leader is BROKER_2.
How can I block the already-read message on BROKER_2?
Thanks all!
Giuseppe.
Replication factor doesn't control if or how consumers read messages; the partition count does. If the topic only has one partition, then only one consumer instance in the group is able to read messages, and all other instances are "blocked". And if the message has already been read and committed, then it doesn't matter which broker is the leader, because offsets are maintained per consumer group for the topic, not per replica (see the sketch below).
If you have more than one partition and you still want to block consumers from being able to read data, then you'll need to implement some external, coordinated lock, via ZooKeeper for example.
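For illustration, a sketch with the Java AdminClient (broker address and topic name assumed) of creating such a single-partition, replicated topic:

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    // One partition: only one consumer in the group reads at a time.
    // Replication factor 3: another broker can take over as leader, and the
    // committed offsets still apply, since they belong to the consumer group,
    // not to any particular replica.
    public class CreateSinglePartitionTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(List.of(new NewTopic("my-topic", 1, (short) 3)))
                     .all().get();
            }
        }
    }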

Kafka Streams Exactly Once Consumer Groups

When I turn on exactly-once processing I get the following error. NOTE: Our applications are very secure and we only give Kafka users and consumers access to the resources that they explicitly need.
2019-04-22 15:28:09 INFO (kafka.authorizer.logger)233 - Principal = User:xxx is Denied Operation = Describe from host xxx.xxx.xxx.xxx on resource = TransactionalId:application_consumer-0_16
With exactly-once processing, does Kafka Streams use a consumer group per stream task instead of one consumer group across all stream tasks?
With exactly-once enabled, there is still only one consumer group, and its name is the same as the application.id. However, instead of using one producer per thread, one producer per task is used.
What you need is permission for transactions. The TransactionalId the error reports comes from the producer of task 0_16. Each producer uses its own transactional id, constructed as <application.id>-<taskId>, so the ACL needs to cover all of them (see the sketch below).
For details, compare the docs: https://docs.confluent.io/current/kafka/authorization.html#using-acls
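For example, a hedged sketch using the Java AdminClient (assuming the broker uses the ACL-based authorizer; the principal and the id prefix are taken from the error above):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    // Grant WRITE (which implies DESCRIBE) on every transactional id starting
    // with the application.id, so all per-task producers are covered at once.
    public class GrantTransactionalIdAcl {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                AclBinding binding = new AclBinding(
                        new ResourcePattern(ResourceType.TRANSACTIONAL_ID,
                                "application_consumer", PatternType.PREFIXED),
                        new AccessControlEntry("User:xxx", "*",
                                AclOperation.WRITE, AclPermissionType.ALLOW));
                admin.createAcls(List.of(binding)).all().get();
            }
        }
    }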

Spring Cloud | Gather response from multiple destinations

I'm wondering whether Spring Cloud Stream can be a good fit for a specific system we're thinking of building from the ground up. There is currently a monolith (ESB) in use, but we are looking to benefit from the goodness of microservices (the Spring Cloud ecosystem especially).
We receive requests from the input source (a JMS queue, ActiveMQ to be specific) at the rate of 5 requests/second.
We will need different routing rules (based on the payload or some derived logic) to route each message to different output destinations (say A, B, C). The output destinations are JMS queues.
Finally, we'll have to receive the 3 responses from A, B and C (by listening to a different set of queues) and mash up the final response. This response is finally dispatched to another output channel (which is another JMS queue).
There are a few corner cases, such as when the response from A takes more than 5 seconds: then we'll want to mash up the responses of B and C with an error object for A. The same goes for B and C.
Also, the destinations A, B and C are dynamic. We could have more target systems, D, E etc., in the future, and we'd like not to have to change the main orchestration layer when a new system is introduced.
Is Spring Cloud Stream the right choice? I'm looking for more specific pointers on aggregating the responses from multiple JMS queues (with timeouts) and mashing up the response.
What you are describing is fully covered by the Aggregator EIP or its more powerful friend, Scatter-Gather.
Both of them are available in Spring Integration:
Aggregator
Scatter-Gather
So, you will need some correlationKey to be able to gather all the responses into the same group and aggregate them in the end.
There is also a group-timeout option which allows you to release the group when not all replies have arrived after some time; see the sketch below.
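As a hedged sketch with the Spring Integration Java DSL (the channel names and the 5-second timeout are assumptions based on your description):

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.dsl.IntegrationFlow;
    import org.springframework.integration.dsl.IntegrationFlows;

    @Configuration
    public class MashupFlow {

        @Bean
        public IntegrationFlow scatterGatherFlow() {
            return IntegrationFlows.from("requests")
                    .scatterGather(
                            // scatter: send a copy of the request to A, B and C;
                            // applySequence(true) sets the correlation headers
                            // the gatherer uses as its correlationKey
                            scatterer -> scatterer
                                    .applySequence(true)
                                    .recipient("serviceA")
                                    .recipient("serviceB")
                                    .recipient("serviceC"),
                            // gather: release when all 3 replies arrive, or after
                            // the group timeout with whatever replies are present
                            gatherer -> gatherer
                                    .releaseStrategy(group -> group.size() == 3)
                                    .groupTimeout(5000)
                                    .sendPartialResultOnExpiry(true))
                    .channel("mashupOutput")
                    .get();
        }
    }

Replacing a missing reply with an error object would be done in the gatherer's output processing, and adding destinations D, E later means only extending the recipient list, not the orchestration logic.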
