Spring Cloud Stream with Apache Kafka Streams

This is my application: it consumes data from a Kafka topic, and the computed results are sent to another topic.
@SpringBootApplication
@EnableBinding(KStreamProcessor.class)
public class WordCountProcessorApplication {

    @StreamListener("input")
    @SendTo("output")
    public KStream<?, WordCount> process(KStream<?, String> input) {
        return input
                .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
                .groupBy((key, value) -> value)
                .windowedBy(TimeWindows.of(5000))
                .count(Materialized.as("WordCounts-multi"))
                .toStream()
                .map((key, value) -> new KeyValue<>(null, new WordCount(key.key(), value,
                        new Date(key.window().start()), new Date(key.window().end()))));
    }

    public static void main(String[] args) {
        SpringApplication.run(WordCountProcessorApplication.class, args);
    }
}
How can I print a log statement before each record is consumed from the Kafka topic?

Add a Transformer at the beginning and end of your topology.
See this discussion, where there was a request to have the framework automatically add custom transformers to the topology; it was decided that the workaround of adding your own is sufficient.
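As an illustration only, a minimal sketch that logs each record as it is consumed and again just before the result is produced. Here peek() is used as a lighter alternative to a full Transformer, and an SLF4J logger field named logger is assumed:
@StreamListener("input")
@SendTo("output")
public KStream<?, WordCount> process(KStream<?, String> input) {
    return input
            // log every record as soon as it is consumed from the input topic
            .peek((key, value) -> logger.info("Consumed key={}, value={}", key, value))
            .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
            .groupBy((key, value) -> value)
            .windowedBy(TimeWindows.of(5000))
            .count(Materialized.as("WordCounts-multi"))
            .toStream()
            .map((key, value) -> new KeyValue<>(null, new WordCount(key.key(), value,
                    new Date(key.window().start()), new Date(key.window().end()))))
            // log every result just before it is sent to the output topic
            .peek((key, value) -> logger.info("Producing WordCount: {}", value));
}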

Related

How to read a message from Kafka topic on demand

How can I read a message from a Kafka topic on demand? I have the topic name, offset, and partition ID; using these three parameters, how can I retrieve a specific message from the topic? Is it possible using Spring Kafka?
I am using spring boot 2.2.4.RELEASE
create consumer
assign the topic/partition
seek
poll for one record
close consumer
@SpringBootApplication
public class So64759726Application {

    public static void main(String[] args) {
        SpringApplication.run(So64759726Application.class, args);
    }

    @Bean
    ApplicationRunner runner(ConsumerFactory<String, String> cf) {
        return args -> {
            // create the consumer; try-with-resources closes it when done
            try (Consumer<String, String> consumer = cf.createConsumer()) {
                // assign the topic/partition
                TopicPartition tp = new TopicPartition("so64759726", 0);
                consumer.assign(Collections.singleton(tp));
                // seek to the desired offset
                consumer.seek(tp, 2);
                // poll for one record
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                System.out.println(records.iterator().next().value());
            }
        };
    }

}
application.properties
spring.kafka.consumer.max-poll-records=1
UPDATE
Since this answer was posted, the KafkaTemplate now has receive() methods for on-demand consumption.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#kafka-template-receive
ConsumerRecord<K, V> receive(String topic, int partition, long offset);
ConsumerRecord<K, V> receive(String topic, int partition, long offset, Duration pollTimeout);
ConsumerRecords<K, V> receive(Collection<TopicPartitionOffset> requested);
ConsumerRecords<K, V> receive(Collection<TopicPartitionOffset> requested, Duration pollTimeout);
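For example, a rough usage sketch with an autowired KafkaTemplate; the topic, partition, and offset values here are illustrative:
ConsumerRecord<String, String> record = kafkaTemplate.receive("so64759726", 0, 2L);
System.out.println(record.value());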

How to send keyed message to Kafka using Spring Cloud Stream Supplier

I want to use Spring Cloud Stream to produce keyed messages (messages with a specific key) to Kafka.
@SpringBootApplication
public class SpringCloudStreamKafkaApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringCloudStreamKafkaApplication.class, args);
    }

    @Bean
    Supplier<DataRecord> process() {
        return () -> new DataRecord(42L);
    }
}
What do I need to change in the Supplier code to provide the key?
Is it possible with the new functional style of API (using lambdas)?
Thank you.
Return a Message<?> and set the KafkaHeaders.MESSAGE_KEY header:
@Bean
Supplier<Message<String>> process() {
    return () -> MessageBuilder.withPayload("foo")
            .setHeader(KafkaHeaders.MESSAGE_KEY, "bar".getBytes())
            .build();
}
(This assumes the default key serializer, byte[].)
EDIT
This will be called endlessly.
If you want to send a finite stream, I believe you have to switch to the reactive model.
@Bean
Supplier<Flux<Message<String>>> processFinite() {
    Message<String> msg1 = MessageBuilder.withPayload("foo")
            .setHeader(KafkaHeaders.MESSAGE_KEY, "bar".getBytes())
            .build();
    Message<String> msg2 = MessageBuilder.withPayload("baz")
            .setHeader(KafkaHeaders.MESSAGE_KEY, "qux".getBytes())
            .build();
    return () -> Flux.just(msg1, msg2);
}
There is also Flux.fromStream(myStream), which will complete when the underlying stream ends.
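For illustration, a small sketch that builds the finite flux from a Java Stream; the payloads and the choice of key are assumptions:
@Bean
Supplier<Flux<Message<String>>> processFromStream() {
    return () -> Flux.fromStream(
            Stream.of("foo", "baz")
                    // key each message by its upper-cased payload (illustrative choice)
                    .map(payload -> MessageBuilder.withPayload(payload)
                            .setHeader(KafkaHeaders.MESSAGE_KEY, payload.toUpperCase().getBytes())
                            .build()));
}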
EDIT2
You can also use the StreamBridge.
https://docs.spring.io/spring-cloud-stream/docs/3.1.4/reference/html/spring-cloud-stream.html#_sending_arbitrary_data_to_an_output_e_g_foreign_event_driven_sources
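For illustration, a minimal sketch that sends a keyed message on demand through StreamBridge; the component and the binding name process-out-0 are assumptions and must match one of your configured output bindings:
@Component
public class KeyedSender {

    private final StreamBridge streamBridge;

    public KeyedSender(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    public void send(String key, String payload) {
        // binding name is illustrative; it must match an output binding
        streamBridge.send("process-out-0", MessageBuilder.withPayload(payload)
                .setHeader(KafkaHeaders.MESSAGE_KEY, key.getBytes())
                .build());
    }
}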

Spring Cloudstream 3 + RabbitMQ configuration to existing queue

I'm learning Spring Cloud Stream and cannot map the Function<String, String> onto an existing queue.
I'm just creating the hello world app from the Spring Cloud documentation, but I don't really understand the part about binding names.
I have q.test (existing) in my RabbitMQ broker, but when I use this code and configuration, my app always creates a new queue q.test.anonymous.someRandomString.
Anybody has configuration example for this?
@SpringBootApplication
public class CloudstreamApplication {

    public static void main(String[] args) {
        SpringApplication.run(CloudstreamApplication.class, args);
    }

    @Bean
    public Function<String, String> uppercase() {
        return value -> {
            System.out.println("Received: " + value);
            return value.toUpperCase();
        };
    }
}
application.yml
spring.cloud.stream:
  function.bindings:
    uppercase-in-0: q.test
  bindings:
    uppercase-in-0.destination: q.test
Thanks
See the binder documentation - Using Existing Queues/Exchanges.
If you have an existing exchange/queue that you wish to use, you can completely disable automatic provisioning as follows, assuming the exchange is named myExchange and the queue is named myQueue:
spring.cloud.stream.bindings.<binding name>.destination=myExchange
spring.cloud.stream.bindings.<binding name>.group=myQueue
spring.cloud.stream.rabbit.bindings.<binding name>.consumer.bindQueue=false
spring.cloud.stream.rabbit.bindings.<binding name>.consumer.declareExchange=false
spring.cloud.stream.rabbit.bindings.<binding name>.consumer.queueNameGroupOnly=true
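Applied to the uppercase-in-0 binding from the question, the configuration might look like this sketch; the exchange name q.test.exchange is an assumption, so substitute whatever exchange your existing q.test queue is actually bound to:
spring.cloud.stream:
  bindings:
    uppercase-in-0:
      destination: q.test.exchange   # existing exchange (illustrative name)
      group: q.test                  # existing queue name
  rabbit:
    bindings:
      uppercase-in-0:
        consumer:
          bindQueue: false
          declareExchange: false
          queueNameGroupOnly: true   # use the group alone as the queue name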

Spring Batch question for email summary at the end of all jobs

We have approximately 20 different Spring Batch jobs (some running as microservices, some lumped together in one Spring Boot app). What I need to do is gather all the errors encountered by ALL the jobs, as well as the number of records processed, and summarize it all in an email.
I have implemented ItemListenerSupport as a start:
public class BatchItemListener extends ItemListenerSupport<BaseDomainDataObject, BaseDomainDataObject> {

    private final static Log logger = LogFactory.getLog(BatchItemListener.class);
    private final static Map<String, Integer> numProcessedMap = new HashMap<>();
    private final static Map<String, String> errorMap = new HashMap<>();

    @Override
    public void onReadError(Exception ex) {
        logger.error("Encountered error on read", ex);
    }

    @Override
    public void onProcessError(BaseDomainDataObject item, Exception ex) {
        String msgBody = ExceptionUtils.getStackTrace(ex);
        errorMap.put(item.toString(), msgBody);
    }

    @Override
    public void onWriteError(Exception ex, List<? extends BaseDomainDataObject> items) {
        logger.error("Encountered error on write", ex);
        numProcessedMap.computeIfAbsent("numErrors", val -> items.size());
    }

    @Override
    public void afterWrite(List<? extends BaseDomainDataObject> items) {
        logger.info("Logging successful number of items written...");
        numProcessedMap.computeIfAbsent("numSuccess", val -> items.size());
    }
}
But how do I access the errors I accumulate in the listener when my batch jobs are finally finished? Right now I don't even have a good way to know when they are all finished. Any suggestions? Does Spring Batch provide something better for summarizing jobs?
Spring Batch does not provide a way to orchestrate jobs. The closest you can get out of the box is a "master" job with multiple steps of type JobStep that delegate to your sub-jobs. With this approach, you can do the aggregation in a JobExecutionListener#afterJob configured on the master job.
Otherwise, you can use Spring Cloud Data Flow and create a composed task of all your jobs.
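A minimal sketch of that master-job approach, assuming two existing sub-job beans named subJob1 and subJob2 (all names here are illustrative):
@Configuration
public class MasterJobConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Autowired
    private JobLauncher jobLauncher;

    // wrap an existing sub-job in a JobStep so the master job can run it
    private Step jobStep(String name, Job subJob) {
        return steps.get(name)
                .job(subJob)
                .launcher(jobLauncher)
                .build();
    }

    @Bean
    public Job masterJob(Job subJob1, Job subJob2) {
        return jobs.get("masterJob")
                .start(jobStep("subJob1Step", subJob1))
                .next(jobStep("subJob2Step", subJob2))
                .listener(summaryListener())
                .build();
    }

    @Bean
    public JobExecutionListener summaryListener() {
        return new JobExecutionListenerSupport() {
            @Override
            public void afterJob(JobExecution jobExecution) {
                // all sub-jobs have finished by the time this runs: aggregate the
                // counts/errors collected by BatchItemListener and send the summary email
            }
        };
    }
}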

KStream with Testbinder - Spring Cloud Stream Kafka

I recently started looking into Spring Cloud Stream for Kafka and have struggled to make the TestBinder work with KStreams. Is this a known limitation, or have I just overlooked something?
This works fine:
String processor:
@StreamListener(TopicBinding.INPUT)
@SendTo(TopicBinding.OUTPUT)
public String process(String message) {
    return message + " world";
}
String test:
@Test
@SuppressWarnings("unchecked")
public void testString() {
    Message<String> message = new GenericMessage<>("Hello");
    topicBinding.input().send(message);
    Message<String> received = (Message<String>) messageCollector.forChannel(topicBinding.output()).poll();
    assertThat(received.getPayload(), equalTo("Hello world"));
}
But when I try to use KStream in my processor, I can't get the TestBinder to work.
KStream processor:
@SendTo(TopicBinding.OUTPUT)
public KStream<String, String> process(
        @Input(TopicBinding.INPUT) KStream<String, String> events) {
    return events.mapValues((value) -> value + " world");
}
KStream test:
@Test
@SuppressWarnings("unchecked")
public void testKstream() {
    Message<String> message = MessageBuilder
            .withPayload("Hello")
            .setHeader(KafkaHeaders.TOPIC, "event.sirism.dev".getBytes())
            .setHeader(KafkaHeaders.MESSAGE_KEY, "Test".getBytes())
            .build();
    topicBinding.input().send(message);
    Message<String> received = (Message<String>)
            messageCollector.forChannel(topicBinding.output()).poll();
    assertThat(received.getPayload(), equalTo("Hello world"));
}
As you might have noticed, I omitted @StreamListener from the KStream processor. Without it the test binder doesn't seem to find the handler, but with it the application fails on startup.
Is this a known bug/limitation, or am I just doing something stupid here?
The test binder is only for MessageChannel-based binders (subclasses of AbstractMessageChannelBinder). The KStreamBinder does not use MessageChannels.
You can test using the real binder and an embedded Kafka broker, provided by the spring-kafka-test module.
Also see this issue.
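For illustration, a rough sketch of such a test against the real Kafka Streams binder with an embedded broker; the topic names, binding properties, and assertion are assumptions based on the processor above:
@SpringBootTest(properties = {
        "spring.cloud.stream.bindings.input.destination=words-in",
        "spring.cloud.stream.bindings.output.destination=words-out",
        "spring.cloud.stream.kafka.streams.binder.brokers=${spring.embedded.kafka.brokers}" })
@EmbeddedKafka(topics = { "words-in", "words-out" })
public class KStreamProcessorTests {

    @Autowired
    private EmbeddedKafkaBroker broker;

    @Test
    public void processAppendsWorld() {
        // produce a record to the real input topic
        Map<String, Object> producerProps = KafkaTestUtils.producerProps(broker);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps,
                new StringSerializer(), new StringSerializer())) {
            producer.send(new ProducerRecord<>("words-in", "Test", "Hello"));
            producer.flush();
        }

        // consume the transformed record from the real output topic
        Map<String, Object> consumerProps = KafkaTestUtils.consumerProps("test-group", "true", broker);
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps,
                new StringDeserializer(), new StringDeserializer())) {
            broker.consumeFromAnEmbeddedTopic(consumer, "words-out");
            ConsumerRecord<String, String> received = KafkaTestUtils.getSingleRecord(consumer, "words-out");
            assertThat(received.value(), equalTo("Hello world"));
        }
    }
}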
