Kafka streams - exactly once configuration

I was trying to use the exactly-once capabilities of Kafka using the Kafka Streams library. I've only configured processing.guarantee as exactly_once. Along with this, there is a need to have transaction state stored in an internal topic (__transaction_state).
My question is, how do I customize the name of that topic? If the Kafka cluster is shared by multiple consumers, does each consumer need a different topic for transaction management?
Thanks
Murthy

You don't need to worry about the topic __transaction_state -- it's an internal topic that will be automatically created for you -- you don't need to create it manually and it will always have this name (it's not possible to customize the name). It will be used for all producers that use transactions.
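For reference, a minimal sketch of the Streams-side configuration; the application id, broker address, and topic names below are placeholders:

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class ExactlyOnceApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-eos-app");        // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            // The only EOS-specific setting; __transaction_state is created and managed by the brokers.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("input-topic").to("output-topic");    // placeholder topology
            new KafkaStreams(builder.build(), props).start();
        }
    }

The single processing.guarantee setting is all that is needed on your side; the transaction topic itself is handled by the brokers.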

Related

kafka streams - can I use kafka streams processing in cases where the source is not a kafka topic?

I have an application (call it smscb-router) as shown in the diagram.
It reads data from a legacy system (sms).
Based on the content (callback type), I have to put it into the corresponding outgoing topic (such as billing-n-cdr, dr-cdr, ...).
I think the Streams API is better suited in this case, as it has the map functionality to do the content-mapping check. What I am unsure about is whether I can read source data from a non-Kafka-topic source.
All the examples that I see in internet blogs explain streaming apps in the context of reading from a source topic and writing to other destination topics.
So, is it possible to read from a non-topic source, such as, say, a Redis store, or a message queue such as RabbitMQ?
We had a recent implementation where we had to poll an .xml file from a network-attached drive and convert it into Kafka events, i.e. publish each record to an output topic. As such, we wouldn't even call it something developed with the Streams API; it is just a Kafka Producer component.
Java File Poller Module (Quartz time-based) -> XML Schema Management -> Kafka Producer Component -> Output Topic (Kafka Broker).
And you get all the native features of the Kafka Producer API in terms of retries, and you can use producer.send() with a callback (async) or producer.send().get() (sync).
Hope this helps. The Streams API is meant for big and rather complex processing that needs to be normalized through stateful operations.
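A bare-bones sketch of such a producer component, assuming string payloads and placeholder broker/topic names:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class XmlRecordPublisher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.RETRIES_CONFIG, 3); // native retry handling of the Producer API

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String record = "<parsed-xml-record/>"; // placeholder for one parsed XML record
                // Asynchronous send with a callback...
                producer.send(new ProducerRecord<>("output-topic", record),
                        (metadata, exception) -> {
                            if (exception != null) exception.printStackTrace();
                        });
                // ...or block on the returned Future to make the send effectively synchronous.
                producer.send(new ProducerRecord<>("output-topic", record)).get();
            }
        }
    }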
Thanks,
Christopher
Kafka Streams is only about topic-to-topic data streaming.
All external systems should be integrated by another method:
Ideally Kafka Connect, for example with this connector:
https://docs.confluent.io/kafka-connect-rabbitmq-source/current/overview.html
You may also use a manual consumer for the first step, but it is always better to reuse all the availability mechanisms built into Kafka Connect (no code, just some JSON config).
In your diagram I would recommend adding one topic and one producer or one connector in front of your pink component; then it can become a fully standard Kafka Streams microservice.
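Once the callback events are on an input topic, the routing itself is a small Kafka Streams topology. A sketch assuming string payloads, with the input topic name invented for illustration and the output topics taken from your question; the predicates stand in for the real callback-type check:

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class SmscbRouter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "smscb-router");      // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // "sms-callbacks" stands in for the topic fed by the connector/producer in front.
            KStream<String, String> callbacks = builder.stream("sms-callbacks");
            // Route by callback type; the content checks are placeholders.
            callbacks.filter((key, value) -> value.contains("BILLING")).to("billing-n-cdr");
            callbacks.filter((key, value) -> value.contains("DR")).to("dr-cdr");

            new KafkaStreams(builder.build(), props).start();
        }
    }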

Kafka connect with EventStoreDB

I'm working on a small academic project: event sourcing with EventStoreDB and Apache Kafka as a broker. The idea is to get events from EventStoreDB and push them to Kafka for further distribution. I saw that Apache Kafka has connectors to different DB systems but didn't find any connector for EventStoreDB.
How can I create (code, or use an existing one) a Kafka connector to EventStoreDB, so these two systems would be able to transfer events vice versa, from Kafka to EventStoreDB and from EventStoreDB to Kafka?
There is no official Kafka Connect connector between Kafka and EventStoreDB, and I haven't heard of any unofficial one so far. Still, there is a tool called Replicator that enables replicating data from EventStoreDB to Kafka (https://replicator.eventstore.org/docs/features/sinks/kafka/). It's open-sourced, so you can either use it or check the implementation.
For the EventStoreDB to Kafka direction, I recommend using the subscriptions mechanism: catch-up if you need an ordering guarantee, persistent if ordering is not critical: https://developers.eventstore.com/clients/grpc/subscriptions.html. The crucial part here is to define how to map EventStoreDB streams to Kafka topics and partitions. Typically you'd expect to have at least an ordering guarantee at the stream level, so events from a single stream should land on the same partition.
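One simple way to get that per-stream ordering is to use the EventStoreDB stream id as the Kafka record key, so the default partitioner sends all events of one stream to the same partition. A minimal sketch; the topic name and string serialization are placeholder assumptions, and the stream id and event payload are whatever your subscription handler gives you:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class EventStoreToKafkaForwarder {
        private final KafkaProducer<String, String> producer;

        public EventStoreToKafkaForwarder(String bootstrapServers) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            this.producer = new KafkaProducer<>(props);
        }

        // Called from the subscription handler; keying by streamId keeps all events of one
        // stream on one partition, preserving the per-stream ordering mentioned above.
        public void forward(String streamId, String eventJson) {
            producer.send(new ProducerRecord<>("eventstore-events", streamId, eventJson)); // placeholder topic
        }
    }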
For the Kafka to EventStoreDB integration, you could either write your own pass-through service or try to use the HTTP sink connector (e.g. https://docs.confluent.io/kafka-connect-http/current/overview.html). EventStoreDB exposes an HTTP API (https://developers.eventstore.com/clients/http-api/v5/introduction/). Side note: this API (AtomPub-based) may be replaced with another HTTP API in the future, so the structure may change.
You can use Event Store Replicator, which has a Kafka sink.
Keep in mind that it doesn't do anything with regard to event schemas, so things like Kafka Streams and KSQL might not work properly.
The sink was created solely for the purpose of pushing events to Kafka when Kafka is used as a message broker.

More than one JDBC connector, one single topic

I have a requirement to consume two exactly identical tables in different databases (the locations also differ) and publish to the same topic.
The Kafka JDBC connector docs don't explain how it manages the high-watermark, so I thought I'd check what the best practice is in this scenario:
1. Can we keep 2 separate JDBC connectors publishing to separate topics?
2. Can we keep 2 separate JDBC connectors publishing to the same topic?
If we choose option 2, how does the Kafka JDBC connector manage the case where messages arrive in the tables concurrently at the same time? How does it manage different database time zones?
Can we keep 2 separate JDBC connectors publishing to separate topics?
Yes.
Can we keep 2 separate JDBC connectors publishing to the same topic?
Yes.
How does the Kafka JDBC connector manage the case where messages arrive in the tables concurrently at the same time?
You'll get both messages on the target topic. Your consumer will need logic to deal with the duplication, if there is any. You could use a Single Message Transform to set the key on the message written to the topic and use that as part of the de-duplication.
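For illustration, one of the two JDBC source connector configurations with such a transform might look like the sketch below (standalone .properties form; the connection details, table, and id column are placeholder assumptions to be checked against the JDBC source connector docs). The second connector would differ only in its name and connection.url; with the same topic.prefix and table name, both publish to the same topic.

    name=source-db1
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:postgresql://db1.example.com:5432/sales
    table.whitelist=orders
    mode=timestamp+incrementing
    timestamp.column.name=updated_at
    incrementing.column.name=id
    topic.prefix=db-
    # Set the record key from the table's primary key so consumers can de-duplicate.
    transforms=createKey
    transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
    transforms.createKey.fields=id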

Using AggregateApplicationBuilder with a local binder

I'm trying to aggregate different Sink and Source spring boot applications using the AggregateApplicationBuilder as described here: http://docs.spring.io/spring-cloud-stream/docs/current-SNAPSHOT/reference/htmlsingle/#_aggregation
Since I expect in-process communication, I don't want to set up a Kafka or RabbitMQ binder. How do I configure a local one? I found that a spring-cloud-stream-binder-local exists, but it has been at M2 for a long time and is not included in a release train.
How can I use the AggregateApplicationBuilder with no external system dependency?
Thanks
With the AggregateApplicationBuilder you don't have to configure a binder for the in-process communication of the directly bound channels within the aggregated application. The binder is required only if the aggregate application itself needs to consume messages from or produce messages to a broker. If the aggregated application is self-contained, then there is no need for a binder at all.
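For example, a self-contained aggregate along the lines of the linked documentation needs no binder configuration at all; the timer source and logging sink below are illustrative placeholders for your own Source and Sink apps:

    import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
    import org.springframework.cloud.stream.aggregate.AggregateApplicationBuilder;
    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.messaging.Sink;
    import org.springframework.cloud.stream.messaging.Source;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.annotation.InboundChannelAdapter;
    import org.springframework.integration.annotation.Poller;
    import org.springframework.integration.annotation.ServiceActivator;
    import org.springframework.integration.core.MessageSource;
    import org.springframework.messaging.support.GenericMessage;

    public class SampleAggregateApplication {

        public static void main(String[] args) {
            // The source's output channel is bound directly to the sink's input channel
            // in-process by the aggregate builder; no binder properties are configured anywhere.
            new AggregateApplicationBuilder()
                    .from(TimerSourceApplication.class)
                    .to(LoggingSinkApplication.class)
                    .run(args);
        }

        @Configuration
        @EnableAutoConfiguration
        @EnableBinding(Source.class)
        public static class TimerSourceApplication {
            @Bean
            @InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "1000"))
            public MessageSource<String> tick() {
                return () -> new GenericMessage<>("ping");
            }
        }

        @Configuration
        @EnableAutoConfiguration
        @EnableBinding(Sink.class)
        public static class LoggingSinkApplication {
            @ServiceActivator(inputChannel = Sink.INPUT)
            public void log(String payload) {
                System.out.println("Received: " + payload);
            }
        }
    }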

Is it possible for Spring-XD to listen to more than one JMS broker at a time?

I've managed to get Spring XD working for a scenario where I have data coming in from one JMS broker.
I am potentially facing a scenario where data ingestion could happen from different sources, thereby needing me to connect to different brokers.
Based on my current understanding, I'm not quite sure how to do this, as there exists a JMS config file which allows you to set up only one broker.
Is there a workaround to this?
At the moment, you would have to create a separate jms-[provider]-infrastructure-context.xml for each broker (in modules/common); say, call the provider activemq2.
Then use --provider=activemq2 in the module definition.
(I recently used this technique to test the sonicmq and hornetq providers.)
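For example, a stream reading from the second broker would then just name that provider in its definition (the stream name and log sink below are illustrative):

    xd:> stream create --name jms2stream --definition "jms --provider=activemq2 | log" --deploy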

Resources