Kafka Streams Add New Source to Running Application - apache-kafka-streams

Is it possible to add another source topic to the existing topology of a running Kafka Streams Java application? Based on the Javadoc (https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/KafkaStreams.html) I am guessing the answer is no.
My Use Case:
A REST API call signals that a new source topic should be processed by an existing processor. Source topics are stored in a DB and used to generate the topology.
I believe the only option is to shut down the app and restart it, allowing the new topic to be picked up.
Is there any option to add the source topic without shutting down the app?

You cannot modify the program while it's running. As you point out, to change anything, you need to stop the program and create a new Topology. Depending on your program and the change, you might actually need to reset the application before restarting it. Cf. https://docs.confluent.io/current/streams/developer-guide/app-reset-tool.html
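A minimal sketch of that restart flow, assuming the topic list has already been reloaded from the DB and the existing processor is handed in as a `ProcessorSupplier` (the class and parameter names here are illustrative, not from the question):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.ProcessorSupplier;

public class TopologyRebuilder {

    // Called after the REST endpoint has stored the new source topic in the DB
    // and the full topic list has been reloaded from there.
    public KafkaStreams rebuildAndRestart(KafkaStreams current,
                                          Properties config,
                                          List<String> sourceTopics,
                                          ProcessorSupplier<String, String> existingProcessor) {
        // The topology of a live application cannot be changed, so stop it first.
        current.close();

        // Rebuild the topology from all source topics, including the new one.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream(sourceTopics)
               .process(existingProcessor);

        Topology topology = builder.build();
        KafkaStreams restarted = new KafkaStreams(topology, config);
        restarted.start();
        return restarted;
    }
}
```

Whether the application state also needs to be reset (per the reset-tool link above) depends on what the processor does and how the change affects existing state.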

Related

How to debug Spring Cloud Data flow sink application that is silently failing to write

I have a sink application that fails to write to the DB, but I am having trouble debugging it. Note that I also asked a more specific question here, but this question on SO is more general: how should I go about debugging an SCDF stream pipeline when no errors come up?
What I'm trying to do
I am trying to follow a tutorial (specifically, this tutorial) which uses some prebuilt applications. Everything is up and running with no error messages, and the source application is correctly writing to Kafka. However, the sink seems to be failing to write anything.
Note that I do see the debugging guide here:
https://dataflow.spring.io/docs/stream-developer-guides/troubleshooting/debugging-stream-apps/#sinks
However, this seems to only be relevant when you are writing your own sink.
I am not asking about how to solve my issue per se, but rather about debugging protocol for SCDF apps in general. What is the best way to go about debugging in these kinds of situations where no errors come up but the core functionality isn't working?
Assuming you know how to view the logs and there are no error messages, the next step is to turn on DEBUG logging for Spring Integration. You can set the property logging.level.org.springframework.integration=DEBUG on the sink; that will log any messages coming into the sink.

ruby-kafka: is it possible to publish to two kafka instances at the same time

The current flow of the project I'm working on involves pushing to a local Kafka using the ruby-kafka gem.
Now the need has arisen to add a producer for a remote Kafka and duplicate the messages there as well.
I'm looking for a better way than calling Kafka.new(...) twice...
Could you please help me? Do you happen to have any ideas?
Another approach to consider would be writing the data once from your application and then asynchronously replicating the messages from one Kafka cluster to another. There are multiple ways of doing this, including Apache Kafka's MirrorMaker, Confluent's Replicator, Uber's uReplicator, etc.
Disclaimer: I work for Confluent.

SCDF. WSDL Source : Spring Cloud Task or Spring Cloud Stream or any other solution?

We have a requirement to get data from a SOAP web service where the same records are exposed. Each record is then transformed and written to the DB.
We are the active side, and at certain intervals we check whether a new record has appeared.
Our main goals are:
to have a scheduler for setting intervals
to have a mechanism to retry if something goes wrong (e.g. a lost connection)
to have visual control of the process: to check where something got stuck (like the dashboard in SCDF)
Since there is no sample WSDL source app, I guess the Task (or Stream?) has to be written by ourselves. But what should we use for repeating and scheduling?
I need your advice on choosing the right approach.
I'm not tied to the SCDF solution if another is more suitable.
If you intend to consume SOAP messages directly from external services, you could either build a custom Spring Cloud Stream source or a simple Spring Batch/Spring Cloud Task application. Both options provide resiliency patterns, including retries.
However, if the upstream data is not real-time, you would choose the Task path, because streams are long-running and never terminate. Tasks, on the other hand, run for a finite period of time, terminate, and free up resources. There's also the option to use the platform-specific scheduler implementation to launch the Task periodically on a recurring schedule.
From the SCDF dashboard, you can design/build Composed Tasks, including the state transitions and the desired downstream operation.
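If the Task path is chosen, a rough sketch could look like the following. The SoapClient and RecordRepository interfaces are hypothetical placeholders for the WSDL client and the DB write; retries use Spring Retry, and the recurring launch comes from the platform scheduler or SCDF scheduling rather than from the Task itself:

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;
import org.springframework.retry.support.RetryTemplate;

@EnableTask
@SpringBootApplication
public class WsdlPollTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(WsdlPollTaskApplication.class, args);
    }

    // Hypothetical collaborators, shown only so the sketch reads end to end.
    interface SoapClient { java.util.List<String> fetchNewRecords(); }
    interface RecordRepository { void save(String record); }

    @Bean
    public CommandLineRunner pollOnce(SoapClient soapClient, RecordRepository repository) {
        // One Task execution = one poll; the scheduler launches the Task at the desired interval.
        return args -> {
            RetryTemplate retry = RetryTemplate.builder()
                    .maxAttempts(3)
                    .fixedBackoff(5000)   // retry on e.g. a lost connection
                    .build();

            retry.execute(ctx -> {
                // Call the SOAP service and write each new record to the DB
                // (the transformation step is omitted here).
                soapClient.fetchNewRecords().forEach(repository::save);
                return null;
            });
        };
    }
}
```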

Spring boot applications high availability

We have a microservice which is developed using Spring Boot. A couple of the functionalities it implements are:
1) A scheduler that triggers, at a specified time, a file download using WebHDFS and processes it; once the data is processed, it sends an email to users with a summary of the processing.
2) Reading messages from Kafka and, once the data is read, sending an email to users.
We are now planning to make this application highly available, in either an active-active or active-passive setup. The problem we are facing is that if both instances of the application are running, both will try to download the file / read the data from Kafka, process it, and send emails. How can this be avoided? That is, how do we ensure that only one instance triggers the download and processes it?
Please let me know if there is a known solution for this kind of scenario, as it seems to be common in most projects. Is a master-slave/leader-election approach the correct solution?
Thanks
Let the service download the file, extract the information, and publish it via Kafka.
Check beforehand whether the information was already processed by querying Kafka or a local DB.
You could also publish a DataProcessed event that triggers the EmailService, which sends the corresponding email.
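A small sketch of that idea, with a hypothetical ProcessedFileRepository standing in for the "already processed?" check against a local DB and a plain Spring Kafka KafkaTemplate publishing the DataProcessed event:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class FileProcessingService {

    // Hypothetical repository interface for the "already processed?" check.
    public interface ProcessedFileRepository {
        boolean existsByFileId(String fileId);
        void markProcessed(String fileId);
    }

    private final ProcessedFileRepository repository;
    private final KafkaTemplate<String, String> kafka;

    public FileProcessingService(ProcessedFileRepository repository,
                                 KafkaTemplate<String, String> kafka) {
        this.repository = repository;
        this.kafka = kafka;
    }

    public void process(String fileId) {
        // If another instance already handled this file, skip it.
        if (repository.existsByFileId(fileId)) {
            return;
        }

        String summary = downloadAndProcess(fileId);

        // Record the work and publish an event; the EmailService consumes this
        // topic and sends the corresponding email.
        repository.markProcessed(fileId);
        kafka.send("data-processed", fileId, summary);
    }

    private String downloadAndProcess(String fileId) {
        // Placeholder for the actual WebHDFS download and processing logic.
        return "summary-for-" + fileId;
    }
}
```

Note that this check-then-act is only safe against a race between the two instances if marking the file as processed is atomic, e.g. enforced by a unique constraint on the file id.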

Re-ordering of messages - Artemis

I have to run two instances of the same application that read messages from 'queue-1' and write them back to another queue 'queue-2'. 
I need the messages inside the two queues to be ordered by a specific property (a sequence number) which is initially added to every message by the producer. As per the documentation, inside queue-1 the order of messages will be preserved, as messages are sent by a single producer. But because multiple consumers read, process, and send the processed messages to queue-2, the order of messages inside queue-2 might be lost.
So my task is to make sure that messages are delivered to queue-2 in the same order as they were read from queue-1. I have implemented the re-sequencer pattern from Apache Camel to re-order messages inside queue-2. The re-sequencer works fine but results in data-transfer overhead, since the Camel routes run locally.
Thinking about doing it in a better way, I have three questions: 
Does Artemis inherently support re-ordering of messages inside a queue using a property such as a sequence number?
Is it possible to run the routes inside the server? If yes, can you give an example or a link to the documentation?
Some Artemis features, such as divert (split), require modifying the broker configuration (the broker.xml file). Is there a way to do this programmatically and dynamically, so I can decide when to start diverting messages? I know this can be accomplished by using Camel, but I want everything to be running in the server.
Does Artemis inherently support re-ordering of messages inside a queue using a property such as a sequence number?
No. Camel is really the best solution here in my opinion.
Is it possible to run the routes inside the server? If yes, can you give an example or a link to the documentation?
You should be able to do the same kind of thing in Artemis as in ActiveMQ 5.x using a web application with a Camel context. The 5.x doc is here.
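For what it's worth, the route such a web application's Camel context would run is small. A hedged sketch, assuming a "jms" component already wired to the Artemis connection factory and a numeric "sequenceNumber" header set by the producer:

```java
import org.apache.camel.builder.RouteBuilder;

public class ResequenceRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:queue:queue-1")
            // Stream resequencer: buffers messages and releases them in
            // ascending order of the sequenceNumber header.
            .resequence(header("sequenceNumber")).stream()
            .to("jms:queue:queue-2");
    }
}
```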
Some Artemis features, such as divert (split), require modifying the broker configuration (the broker.xml file). Is there a way to do this programmatically and dynamically, so I can decide when to start diverting messages?
You can use the Artemis management methods to create, modify, and delete diverts programmatically (or administratively) at runtime. However, these modifications will be volatile (i.e. they won't survive a broker restart).
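As a rough illustration of driving those management methods over JMX (the JMX URL, broker name, divert name, and addresses below are made up; check the ActiveMQServerControl Javadoc of your Artemis version for the exact createDivert overloads):

```java
import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import org.apache.activemq.artemis.api.core.management.ActiveMQServerControl;

public class DivertAtRuntime {

    public static void main(String[] args) throws Exception {
        // Connect to the broker's JMX endpoint (URL and broker name are examples).
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName serverName =
                    new ObjectName("org.apache.activemq.artemis:broker=\"mybroker\"");
            ActiveMQServerControl control =
                    JMX.newMBeanProxy(mbs, serverName, ActiveMQServerControl.class, false);

            // Create a (volatile) divert at runtime: copy messages arriving on
            // queue-1's address to queue-2's address.
            control.createDivert("my-divert",   // divert name
                                 "my-divert",   // routing name
                                 "queue-1",     // source address
                                 "queue-2",     // forwarding address
                                 false,         // non-exclusive: keep delivering to the source too
                                 null,          // no filter
                                 null);         // no transformer

            // Later, when diverting should stop:
            control.destroyDivert("my-divert");
        } finally {
            connector.close();
        }
    }
}
```

As noted above, diverts created this way are volatile and will not survive a broker restart.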
