kafka streams - can I use kafka streams processing in cases where the source is not a kafka topic? - apache-kafka-streams

I have an application (call it smscb-router) as shown in the diagram.
It reads data from a legacy system (sms).
Based on the content (callback type), I have to put into corresponding outgoing topic (such as billing-n-cdr, dr-cdr, ...)
I think streams API is better suited in this case, as it has the map functionality to do the content mapping check. What I am unsure is, can I read source data from a non-kafka-topic source.
All the examples that I see on the internet blogs, explain steaming apps with the context of reading from a source topic and put to other destination topics.
So, is this possible to read from a non-topic source, such as say a redis store, or a message queue such as RabbitMQ?

We had a recent implementation, where we had to poll an .xml file from a network attached drive and convert it into the KAFKA Events i.e. publishing each record into an output topic. In such, we wont even call it as something we have developed using a Streams API, but it is just a KAFKA Producer Component.
Java File Poller Module (Quartz time based) -> XML Schema Management -> KAFKA Producer Component -> Output Topic (KAFKA Broker).
And you will get all native features of KAKFA Producer API in terms of retries and you can use producer.send (Sync) or producer.send.get(Asyn) with call-back.
Hope this helps. Streams API is meant for big and something very complex that to be normalized through using Stateful operations.
Thanks,
Christopher

Kafka Streams is only about Topic to Topic Data Streaming
All external system should be integrated by another method :
Ideally Kafka Connect : for example with this one :
https://docs.confluent.io/kafka-connect-rabbitmq-source/current/overview.html
You may also use a manual consumer for the first step, but it always better to reuse all availability mecanism built in Kafka Connect. (No code, just some Json config).
In your schema i would recommend to add one topic and one producer or one connector in front of your Pink Component, then it can become a fully standard Kafka Streams microservice.

Related

Kafka connect with EventStoreDB

I'm working on a small academic project - Event sourcing with EventStoreDB and Apache Kafka as a broker. The idea is that get events from EventStoreDB and push them to Kafka for further distribution. I saw Apache Kafka has connections to different DB systems but didn't find any connector with EvenStoreDB.
How can I create(code or use existing one) Kafka connector to EventStoreDB, so these two systems would be able to transfer events vise-versa, from Kafka to EventStoreDB and from EventStoreDB to Kafka?
There is no official Kafka Connect Connector between Kafka and EventStoreDB, and I haven't heard about any unofficial so far. Still, there is a tool called Replicator that enables replicating data from EventStoreDB to Kafka (https://replicator.eventstore.org/docs/features/sinks/kafka/). It's open-sourced, so you can either use it or check the implementation.
For the EventStoreDB to Kafka, I recommend using the subscriptions mechanism: catch-up if you need an ordering guarantee, persistent if ordering is not critical: https://developers.eventstore.com/clients/grpc/subscriptions.html. The crucial part here is to define how to map EventStoreDB streams to Kafka topics and partitions. Typically you'd expect to have at least an ordering guarantee on the stream level, so single stream events should land to the same partition.
For Kafka to EventStoreDB integration, you could either write your own pass-through service or try to use the HTTP sink connector (e.g. https://docs.confluent.io/kafka-connect-http/current/overview.html). EventStoreDB exposes HTTP API (https://developers.eventstore.com/clients/http-api/v5/introduction/). Sidenote, this API (Atom pub based) may be replaced with another HTTP API in the future, so the structure may change.
You can use Event Store Replicator, which has a Kafka sink.
Keep in mind that it doesn't do anything with regards to events schema, so things like Kafka Streams and KSQL might not work properly.
The sink was created solely for the purpose of pushing events to Kafka being used as a message broker.

Is it possible sending websocket messages to a kafka topic?

I am trying to find a way to consume messages that being sent by a websocket to a kafka topic (the messages are sent by the websocket to the address 'ws://address:port/topic_name' and I want to add all of those messages to a kafka topic).
I read about kafka connect and tried to find a way to do it with it but it doesnt seem to work...
thanks in advance :)
There is no Kafka Connector to a socket in Confluent Platform.
I work in a team that use Kafka in production and our source is a socket, so your options are to use platforms that support this socket->Kafka producing, or write one by yourself.
About possible platforms, I think most of them will be overkill though you can utilize them for this problem, some options are:
1. NiFi or MiniFi for smaller loads, use PublishKafka Processor
2. StreamSets with Kafka Producer Destination
3. Apache Flume- not very recommended, this project is stops to evolve.
If you wish to write your own producer, you basically have to create a listener on this port, and produce the incoming messages to Kafka; if this is a web socket, just get the payload of the requests and produce them to Kafka.
Example Kafka Producer Code can be copied from tutorialspoint simple producer example*
Here are some open-source projects examples:
1. https://github.com/DataReply/kafka-connect-socket-source
2. https://github.com/kafka-socket/miniature_engine
3. https://github.com/dhanuka84/kafka-connect-tcp
4. https://github.com/krux/tcp-stream-kafka-producer
The idea of Kafka connect is that you have some sort of external integration that serves as storage. This can be SAP, Salesforce, RDBMS, MQ or anything else that has state. You websocket endpoint does not have data, you can not poll it it is someone else that is invoking it and there fore the data is transfered. Now if you know who is actualy holding the data than you can potentialy build a conector using this guide. https://docs.confluent.io/current/connect/devguide.html
For your particular case, the best you can do is either to use Kafka Producer API https://docs.confluent.io/current/clients/producer.html
and from your websocket enpoint use this producer to post a message to the topic, or even better if you are using spring you can use a higher level abstraction, that will be KafkaTemplate https://docs.spring.io/spring-kafka/reference/html/#sending-messages.
Full disclosure: I work for MigratoryData.
You can check out MigratoryData's solution for Kafka. MigratoryData is a scalable WebSocket server. The MigratoryData Source/Sink Connector for Kafka makes use of Kafka Connect API and can be used to stream data in real-time from Kafka to WebSocket clients and vice versa. The main advantage of the solution is it extends Kafka messaging to WebSocket clients while preserving Kafka's key features like guaranteed delivery, message ordering, etc.

Kafka streams - exactly once configuration

I was trying to use exactly once capabilities of kafka using kafka streams library. I've only configured proessing.guarantee as exactly_once. Along with this, there is a need to have transaction state stored in a internal topic (__transaction_state).
My question is, how to customize the name of the topic? if kafka cluster is being shared by multiple consumers, does each customer need a different topic for transaction management?
Thanks
Murthy
You don't need to worry about the topic __transaction_state -- it's an internal topic that will be automatically created for you -- you don't need to create it manually and it will always have this name (it's not possible to customize the name). It will be used for all producers that use transactions.

Reactive stream Kafka Stream Fan-out to http actors

I'm very new to Akka Streaming and reactive streaming. I have a question: is it possible to have a rest API receiving a message dropping it on Kafka Bus, and the Kafka streaming consumer then aggregates the messages in a max. time window and retrun the answer back?
How to implement such a system? Or where to start?
Thanks
For a rest API you can consider the Kafka REST Proxy: https://github.com/confluentinc/kafka-rest
Or course you can instead build your own using akka-http and akka-stream-kafka.
As to windowing, I'm sure it can be done in akka streams but personally, I'd suggest using Kafka Streams as the first port of call:
http://docs.confluent.io/current/streams/developer-guide.html#windowing
I'm not sure what exactly you mean by returning the answer back, but if you follow the approach above, you can use use REST Proxy to consume the windowed-aggregated messages or you can build a REST service that queries the Kafka Streams state stores via the so-called "interactive queries". This post shows how to do it using javax.ws.rs: https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ but for a reactive application you can do the same using akka-http instead (I'm implementing this exact thing on one of my projects).

Connection between Apache Kafka and JMS

I was wondering could Apache Kafka communicate and send messages to JMS? Can I establish connection between them? For example, I'm using JMS in my system and it should send messages to the other system that uses Kafka
answering bit late, but if I understood correctly the requirement.
If the requirement is synchronous messaging from
client->JMS->Kafka --- > consumer
then following is not the solution, but if its ( and most likely) the async requirement like:
client->JMS | ----> Kafka ---> consumer
then, this would be related to KafkaConnect framework which is solving the problem of how to integrate different sources and sinks with Kafka.
http://docs.confluent.io/2.0.0/connect/
http://www.confluent.io/product/connectors
so what you need is a JMSSourceConnector.
Not directly. And the two are incomparable concepts. JMS is a vendor-neutral API specification of a messaging service.
While Kafka may be classified as a messaging service, it is not compatible with the JMS API, and to the best of my knowledge there is no trivial way of adapting JMS to fit Kafka's use cases without making significant compromises.
However, if your needs are simply to move messages between Kafka and a JMS-compliant broker, then this can easily be achieved by either writing a simple relay app that consumes from one and publishes onto another, or use something like Kafka Connect, which has pre-canned sinks for most data sources, including JMS brokers, databases, etc.
If the requirement is the reverse of the previous answer:
Kafka Producer -> Kafka Broker -> JMS Broker -> JMS Consumer
then you would need a KafkaConnect Sink like the following one from Data Mountaineer
http://docs.datamountaineer.com/en/latest/jms.html

Resources