Spring Cloud Stream when there are a lot of different event types

I would like some advice regarding the use of Spring Cloud Stream.
Currently my service uses Spring Boot and implements some event-based approaches, but the events are not sent to any kind of broker; they are simply handled by handlers in separate threads.
I am interested in the Spring Cloud Stream technology.
I have implemented CustomMessageRoutingCallback as shown in this example: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback.
The problem is that declaring all consumers in the configuration this way sounds like a pain:
@Bean
public Consumer<Menu> menuConsumer() {
    return menu -> log.info(menu.toString());
}
I have around 50-60 different event types. Is there any way to register consumers dynamically? Or would it be better to declare a consumer with some raw input type, then deserialize the message in the consumer and manually route it to the right handler (a rough sketch of what I mean is below)?
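For reference, this is roughly the manual-routing alternative I have in mind. The "eventType" header, the Menu/Order types and the handler map are just placeholders for illustration, not real code from my project:
@Bean
public Consumer<Message<byte[]>> rawEventConsumer(ObjectMapper objectMapper) {
    // placeholder registries: header value -> target class / handler
    Map<String, Class<?>> typesByName = Map.of(
            "menu", Menu.class,
            "order", Order.class);
    Map<String, java.util.function.Consumer<Object>> handlersByName = Map.of(
            "menu", event -> log.info("menu event: {}", event),
            "order", event -> log.info("order event: {}", event));

    return message -> {
        // assumes the producer sets an "eventType" header as a type discriminator
        String eventType = (String) message.getHeaders().get("eventType");
        Class<?> targetType = typesByName.get(eventType);
        if (targetType == null) {
            log.warn("Unknown event type: {}", eventType);
            return;
        }
        try {
            Object event = objectMapper.readValue(message.getPayload(), targetType);
            handlersByName.get(eventType).accept(event);
        } catch (IOException e) {
            log.error("Failed to deserialize {} event", eventType, e);
        }
    };
}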

This really has nothing to do with s-c-stream and is more of an architectural question. If you have 50+ different event types, having that many different consumers would be the least of your issues. The question I would be asking is: is it really feasible to trust a single application to process that many different event types? What if processing a single event results in a system failure? Are you willing to live with none of the events being processed until the problem is fixed?
This is just an example, but there are many other architectural questions that would need to be answered before you can select a technology.

A possible option is to create a common interface for your events:
@Bean
public Consumer<CommonInterfaceType> menuConsumer() {
    return commonInterfaceTypeObj -> commonInterfaceTypeObj.doSomething();
}
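Something along these lines, where each event type carries its own handling logic (the names here are illustrative only):
public interface CommonInterfaceType {
    void doSomething();
}

public class MenuCreatedEvent implements CommonInterfaceType {
    private String name;

    @Override
    public void doSomething() {
        // event-specific handling lives in the event class itself
        System.out.println("menu created: " + name);
    }
}
The single consumer then only depends on the interface; whether pushing the handling logic into the event classes is acceptable depends on how different your 50-60 event types really are.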

Related

Processing incoming payloads as batch not working as expected in spring-cloud-streams

I say 'not working as expected', but actually it's more like 'I don't really know if I'm doing the proper work here'; I feel like I'm mixing stuff from different approaches that doesn't really fit together.
Right now I've been using Spring Cloud Streams to process String-type messages from a PubSub subscription and so far so good, message in message out without much of a hassle.
What I'm trying to achieve now is to gather, let's say, 1000 messages, process them and send them altogether to another PubSub Topic. Still unsure about sending them as a List or individually like now, but all at the same time (this shouldn't be related to this question though).
Now I just discovered the following property.
spring.cloud.stream.bindings.input.consumer.batch-mode=true
Together with the following ones more specific to the GCP stuff.
spring.cloud.gcp.pubsub.publisher.batching.enabled=true
spring.cloud.gcp.pubsub.publisher.batching.delay-threshold-seconds=300
spring.cloud.gcp.pubsub.publisher.batching.element-count-threshold=100
So first question is... Are they linked by any means? Must I have the first one together with the other three?
What happened after I added the previous properties to my application.properties file is actually no change at all. Messages keep arriving and leaving the application without any issue and with no batch approach whatsoever.
Currently using the functional features the following way.
@Bean
public Function<Message<String>, String> sampleFunction() {
    return message -> {
        // ... stream processing in here
        return processedString;
    };
}
I was expecting this to crash with some error, since the method only receives a String, not a list of Strings. Since it didn't crash, I modified the method above to receive a list of Strings (maybe Spring does some magic behind the scenes to still receive the messages as String but collect them in a list for the method to process afterwards?).
@Bean
public Function<Message<List<String>>, String> sampleFunction() {
    return messages -> {
        // ... stream processing in here
        return processedString;
    };
}
But this just crashes, since it's trying to parse a single String message as a List of Strings.
How could I prepare the code to batch all those String messages into a List? Is there any example of this?
...batch-mode only works with binders that support it (e.g. Kafka, RabbitMQ). It doesn't look like the GCP binder supports it (I see no references to the property).
https://github.com/spring-cloud/spring-cloud-gcp/blob/master/spring-cloud-gcp-pubsub-stream-binder/src/main/java/org/springframework/cloud/gcp/stream/binder/pubsub/PubSubMessageChannelBinder.java
https://docs.spring.io/spring-cloud-stream/docs/3.1.0/reference/html/spring-cloud-stream.html#_batch_consumers
Publisher batching is not related to consumer batching.
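For reference, with a binder that does support batch mode (Kafka, for example), the consumer side is declared against a List payload and enabled per binding, roughly like this (the binding name below follows the default functional naming convention):
# application.properties
spring.cloud.stream.bindings.sampleFunction-in-0.consumer.batch-mode=true

@Bean
public Function<List<String>, String> sampleFunction() {
    return messages -> {
        // the binder delivers the whole batch as one List
        return String.join(",", messages);
    };
}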

Kafka Streams - override default addSink implementation / custom producer

This is my first post here and I am not sure if this was covered before, but here goes: I have a Kafka Streams application, using the Processor API, with the following topology:
1. Consume data from an input topic (processor.addSource())
2. Insert data into a DB (processor.addProcessor())
3. Produce its process status to an output topic (processor.addSink())
The app works great; however, for traceability purposes, I need to have in the logs the moment Kafka Streams produced a message to the output topic, as well as its RecordMetadata (topic, partition, offset).
Example below:
KEY="MY_KEY" OUTPUT_TOPIC="MY-OUTPUT-TOPIC" PARTITION="1" OFFSET="1000" STATUS="SUCCESS"
I am not sure if there is a way to override the default Kafka Streams producer to add this logging, or maybe to create my own producer and plug it into the addSink process. I partially achieved it by implementing my own ExceptionHandler (default.production.exception.handler), but that only covers the exceptions.
Thanks in advance,
Guilherme
If you configure the streams application to use a ProducerInterceptor, then you should be able to get the information you need. Specifically, implementing the onAcknowledgement() will provide access to everything you listed above.
To configure interceptors in a streams application:
Properties props = new Properties();
// add this configuration in addition to your other streams configs
props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), Collections.singletonList(MyProducerInterceptor.class));
You can provide more than one interceptor if desired, just add the class name and change the list implementation from a singleton to a regular List. Execution of the interceptors follows the order of the classes in the list.
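A minimal sketch of what MyProducerInterceptor could look like; onAcknowledgement() receives the RecordMetadata and any exception, so the topic/partition/offset part of your example is covered here (the log format just mirrors your example):
public class MyProducerInterceptor implements ProducerInterceptor<String, String> {

    private static final Logger log = LoggerFactory.getLogger(MyProducerInterceptor.class);

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        return record; // pass the record through unchanged
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            log.info("OUTPUT_TOPIC=\"{}\" PARTITION=\"{}\" OFFSET=\"{}\" STATUS=\"SUCCESS\"",
                    metadata.topic(), metadata.partition(), metadata.offset());
        } else {
            log.error("STATUS=\"FAILURE\"", exception);
        }
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}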
EDIT: Just to be clear, you can override the provided Producer in Kafka Streams via the KafkaClientSupplier interface, but IMHO using an interceptor is the cleaner approach. But which direction to go is up to you. You pass in your KafkaClientSupplier in an overloaded Kafka Streams constructor.

bind destinations dynamically for producers and consumers (Spring)

I'm trying to send and receive messages to channels/topics whose destination names are in a database, so they can be added/modified/deleted at runtime, but I'm surprised at how little I have found on the web. I'm using Spring Cloud Stream so that I can swap the underlying broker.
To send messages to dynamically bound destinations I'm going with BinderAwareChannelResolver.resolveDestination(target).send(message), but I haven't found something that works like it to receive messages.
My questions are:
1. Is there something similar?
2. How can the message be processed periodically, as @StreamListener does?
3. And not as important, but can you create a subscriber automatically in case there is none?
Thanks for any help!
This is a bit out of scope of the original design of the framework. But I would further question your architecture... If you truly desire to subscribe to an unlimited number of destinations, I wonder why? What is the underlying business requirement?
Keep in mind that even if we were to do it somehow, it would require creating a message listener container dynamically for each new destination, which would raise more questions, such as: how long would such a container have to live, since eventually you would run out of resources?
If, however, you are simply asking about the possibility of mapping multiple destinations to a single channel, so that all messages go to the same message handler (e.g., a StreamListener), then you can simply use the input destination property and define multiple destinations delimited by commas, as shown below.
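For example, assuming a binding named input and three existing topics (the topic names here are made up):
spring.cloud.stream.bindings.input.destination=orders,payments,shipments
All messages from those destinations will then be delivered to the same handler.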

"ObjectMessage usage is generally discouraged", what to use instead?

The ActiveMQ docs state:
Although ObjectMessage usage is generally discouraged, as it introduces coupling of class paths between producers and consumers, ActiveMQ supports them as part of the JMS specification
Having not had much experience with message busses, I have been approaching them as conceptually similar to SOAP web services, where you specify the service interface contract for consumers, who then construct equivalent class proxies.
What I am trying to achieve is:
Publishers in some way indicate the schema of the message
Subscribers in some way know the schema of the message
ObjectMessage solves this problem, although not in the nicest way given the noted classpath coupling. As far as I can see the other message types provide minimal guidance to the consumer as to the expected message format (e.g. consumers would have to assume that a MapMessage contained certain keys with certain value types).
Is there another reasonable way to accomplish this, or is this not even something I should be pursuing?
Since the idea is for publishers/subscribers to know about the schema, the first step is definitely to give the payload a structure using JSON/Protobuf (not a big fan of XML personally), and then to pass the data as either a TextMessage or a BytesMessage.
As for how publishers/subscribers communicate that schema, a couple of ways to achieve this:
The subscriber learns about the schema via the publisher's javadoc or sample invocations (sounds fine for simple use-cases).
Have a centralized config that the publisher publishes to and the subscriber picks up from. This config could live in a database or an application that serves out configurations. An effective implementation would ensure that neither publisher nor subscriber breaks if there are modifications.
Advantages of this approach over the ObjectMessage approach:
No tight coupling of the payload (i.e. jar upgrades/attribute changes, etc.)
Significant performance improvement - here's an example where a Java class with a String and an int takes 3.7x more than directly storing the int and String as bytes.
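A rough sketch of the TextMessage variant with plain JMS and Jackson; the OrderPlaced type is made up for illustration, and the session/producer/consumer setup is assumed to already exist:
// producer side: serialize a structured payload to JSON and send it as text
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(new OrderPlaced("order-42", 3));
TextMessage outbound = session.createTextMessage(json);
producer.send(outbound);

// consumer side: parse the JSON back into whatever type the subscriber maintains
TextMessage inbound = (TextMessage) consumer.receive();
OrderPlaced event = mapper.readValue(inbound.getText(), OrderPlaced.class);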

Solution for composite events with Apache Kafka?

Architecture question: We have an Apache Kafka based eventing system and multiple systems producing / sending events. Each event has some data including an ID and I need to implement a "ID is complete"-event. Example:
Event_A(id)
Event_B(id)
Event_C(id)
are received asynchronously, and only once all 3 events are received do I need to send an Event_Complete(id). The problem is that we have multiple clusters of consumers and our database is eventually consistent.
A simple way would be to use the eventually consistent DB to store which events we have for each ID and add a "cron" job to catch race conditions eventually.
It feels like a problem that might have been solved out there already. So my question is, is there a better way to do it (without introducing a consistent datastore to the picture)?
Thanks a bunch!
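Edit: to make the idea a bit more concrete, here is a rough Kafka Streams sketch of the "track which event types have arrived per id" approach, keeping the state in a local state store instead of our eventually consistent DB. The stream/topic names and the assumption that the value carries the event type are made up, and deduplicating the completion signal is not handled here:
// merge the three per-type streams; assume the value is the event type ("A", "B" or "C")
KStream<String, String> all = eventA.merge(eventB).merge(eventC);

all.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
   .aggregate(() -> "",
              (id, eventType, seen) -> seen.contains(eventType) ? seen : seen + eventType,
              Materialized.with(Serdes.String(), Serdes.String()))
   .toStream()
   .filter((id, seen) -> seen.contains("A") && seen.contains("B") && seen.contains("C"))
   .mapValues(seen -> "Event_Complete")
   .to("event-complete", Produced.with(Serdes.String(), Serdes.String()));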
