Publishing multiple events that share attributes on one Kafka topic - spring-boot

I need to publish multiple messages representing employee journey events from the same project, and I need to use only one topic to publish these messages since they all belong to the same project. In some cases a message may contain extra fields. For example:
all messages share (id, name, type, date), and
some events have more fields, like (course id, course name). So I intend to use one parent object called "Journey" that contains an "Event" object, and I will create multiple child objects like 'LMSEvent' that extend this Event, etc. as needed. I am also using Jackson + Spring Boot over REST APIs to do the needed cast based on the type attribute. Finally, I send this message to Kafka directly, so each object contains its own properties.
For the consumer, I will use the strategy pattern and apply the required logic per type if needed.
The message size will not be very big, and I don't expect many more distinct attributes per event.
I am looking to know if this approach is good or not, and if it is not, what the alternative is.

I think that in general it is a good approach. Whether to have a single message schema per topic or multiple schemas is always a good question, and both options have bright sides and drawbacks; you can read more about it in Martin Kleppmann's article.
Once you have decided to have multiple events on a single topic, you can use the same approach to serializing and deserializing events all the way through, starting from the REST API and then in the Kafka producer and consumer. @JsonTypeInfo and @JsonSubTypes do the job:
@JsonTypeInfo(
        use = JsonTypeInfo.Id.NAME,
        include = JsonTypeInfo.As.EXISTING_PROPERTY,
        property = "type")
@JsonSubTypes({
        @JsonSubTypes.Type(value = LMSEvent.class, name = "LMSEvent"),
        @JsonSubTypes.Type(value = YetAnotherEvent.class, name = "YetAnotherEvent")
})
public interface Event {

    String getType();

    default boolean hasType(String type) {
        return getType().equalsIgnoreCase(type);
    }

    default <T> T getConcreteEvent(Class<T> clazz) {
        return clazz.cast(this);
    }
}
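For illustration, a concrete subtype might look like the sketch below; the course fields are assumptions based on the LMS example in the question:
// Hypothetical concrete event; the "type" property drives Jackson's subtype resolution.
public class LMSEvent implements Event {

    private String id;
    private String name;
    private String date;
    private String courseId;   // assumed extra field
    private String courseName; // assumed extra field

    @Override
    public String getType() {
        return "LMSEvent";
    }

    // getters and setters omitted for brevity
}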
When you consume these messages using spring-kafka, you can write some very neat code where each method consumes a concrete event type, so you don't need to write any dirty casting on your own:
@Component // the class-level listener must be a Spring bean to be picked up
@KafkaListener(topics = "someEvents", containerFactory = "myKafkaContainerFactory")
public class MyKafkaHandler {

    private static final Logger logger = LoggerFactory.getLogger(MyKafkaHandler.class);

    @KafkaHandler
    void handleLMSEvent(LMSEvent event) {
        // ...
    }

    @KafkaHandler
    void handleYetAnotherEvent(YetAnotherEvent yetAnotherEvent) {
        // ...
    }

    @KafkaHandler(isDefault = true)
    void handleDefault(@Payload Object unknown,
                       @Header(KafkaHeaders.OFFSET) long offset,
                       @Header(KafkaHeaders.RECEIVED_PARTITION) int partitionId,
                       @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        logger.info("Server received unknown message {},{},{}", offset, partitionId, topic);
    }
}
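For completeness, here is a minimal sketch of what the referenced myKafkaContainerFactory could look like, assuming JSON values are deserialized into the Event hierarchy so that @JsonTypeInfo resolves the concrete subtype; the package name is a placeholder:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Event> myKafkaContainerFactory(
        KafkaProperties kafkaProperties) {
    // Deserialize values into the Event hierarchy; @JsonTypeInfo picks the subtype.
    JsonDeserializer<Event> valueDeserializer = new JsonDeserializer<>(Event.class);
    valueDeserializer.addTrustedPackages("com.example.events"); // assumed package

    ConsumerFactory<String, Event> consumerFactory = new DefaultKafkaConsumerFactory<>(
            kafkaProperties.buildConsumerProperties(),
            new StringDeserializer(),
            valueDeserializer);

    ConcurrentKafkaListenerContainerFactory<String, Event> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    return factory;
}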

Related

How to consume messages from a Debezium topic from Quarkus?

I'm trying to set up an application which produces change events with MySQL+Debezium+Kafka. I'd like to consume messages from the Debezium topic with a Quarkus Microprofile application.
I'm using the following configuration on the Quarkus side to capture incoming messages:
mp.messaging.incoming.customers.connector=smallrye-kafka
mp.messaging.incoming.customers.topic=dbserver1.inventory.customers
mp.messaging.incoming.customers.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
That works; however, the change event captured with a StringDeserializer does not contain just the changed record:
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"}],"optional":true,"name":"dbserver1.inventory.customers.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"}],"optional":true,"name":"dbserver1.inventory.customers.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"dbserver1.inventory.customers.Envelope"},"payload":{"before":null,"after":{"id":1005,"first_name":"myname","last_name":"myusername","email":"amail#mail.com"},"source":{"version":"1.3.0.Final","connector":"mysql","name":"dbserver1","ts_ms":1603634203000,"snapshot":"false","db":"inventory","table":"customers","server_id":223344,"gtid":null,"file":"mysql-bin.000003","pos":364,"row":0,"thread":6,"query":null},"op":"c","ts_ms":1603634203419,"transaction":null}}
How can I extract the changed data from this huge JSON, which in my case is:
{"id":1005,"first_name":"myname","last_name":"myusername","email":"amail@mail.com"}
Should I keep using a StringDeserializer and iterate through the JSON payload with JSON-B, or is there a better solution?
I don't think there's a better approach for that; however, since the text is JSON, a custom deserializer that extends JsonbDeserializer would work:
@RegisterForReflection
public class CustomerDeserializer extends JsonbDeserializer<Customer> {

    public CustomerDeserializer() {
        super(Customer.class); // Kafka instantiates deserializers via a no-arg constructor
    }

    @Override
    public Customer deserialize(String topic, byte[] data) {
        JsonReader reader = Json.createReader(new StringReader(new String(data)));
        JsonObject jsonObject = reader.readObject();
        JsonObject payload = jsonObject.getJsonObject("payload");
        String firstName = payload.getJsonObject("after").getString("first_name");
        String lastName = payload.getJsonObject("after").getString("last_name");
        String email = payload.getJsonObject("after").getString("email");
        return new Customer(firstName, lastName, email);
    }
}
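You would then register this deserializer for the channel in place of the StringDeserializer; the package name here is an assumption:
mp.messaging.incoming.customers.value.deserializer=org.acme.CustomerDeserializer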
Edit: You can find the full Debezium example here.

Spring-AMQP - routing based on message headers

As per the documentation: https://docs.spring.io/spring-amqp/docs/2.2.5.RELEASE/reference/html/#async-annotation-driven
We can have different handlers for messages based on the converted class type, like:
#RabbitListener(id="multi", queues = "someQueue")
#SendTo("my.reply.queue")
public class MultiListenerBean {
#RabbitHandler
public String thing2(Thing2 thing2) {
...
}
#RabbitHandler
public String cat(Cat cat) {
...
}
#RabbitHandler
public String hat(#Header("amqp_receivedRoutingKey") String rk, #Payload Hat hat) {
...
}
#RabbitHandler(isDefault = true)
public String defaultMethod(Object object) {
...
}
}
I believe this won't be great for performance, since it has to do trial and error to cast the incoming payload.
Instead, how can I filter based on a condition, say a header value? E.g., if header['operation'] == "order", then cast the message payload to the Order class.
"This I believe won't be great in performance since it has to do a trial and error to cast the incoming payload."
Usually, type information is conveyed in headers, and the MessageConverter uses that information to create the payload; there is no "trial and error".
If you don't use one of the supplied converters, you can create your own, based on header['operation'].
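As a rough sketch (not a converter the library supplies), such a converter keyed on the 'operation' header could look like this; the Order class comes from the question, while the header-to-class mapping convention and error handling are assumptions:
import java.io.IOException;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageProperties;
import org.springframework.amqp.support.converter.MessageConversionException;
import org.springframework.amqp.support.converter.MessageConverter;

public class OperationHeaderMessageConverter implements MessageConverter {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public Object fromMessage(Message message) throws MessageConversionException {
        // Route on the custom "operation" header instead of type headers.
        Object operation = message.getMessageProperties().getHeaders().get("operation");
        try {
            if ("order".equals(operation)) {
                return mapper.readValue(message.getBody(), Order.class);
            }
            throw new MessageConversionException("Unknown operation: " + operation);
        } catch (IOException e) {
            throw new MessageConversionException("Failed to read payload", e);
        }
    }

    @Override
    public Message toMessage(Object object, MessageProperties messageProperties)
            throws MessageConversionException {
        try {
            // Hypothetical convention: lower-cased simple class name as the operation.
            messageProperties.setHeader("operation", object.getClass().getSimpleName().toLowerCase());
            return new Message(mapper.writeValueAsBytes(object), messageProperties);
        } catch (JsonProcessingException e) {
            throw new MessageConversionException("Failed to write payload", e);
        }
    }
}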

SpringBoot RabbitMQ - how to reduce boilerplate for many topics (events)?

I wonder if there is a way to reduce the amount of boilerplate code when initializing many RabbitMQ queues/bindings in Spring Boot.
Following an event-driven approach, my app produces around 50 types of events (it will be split into several smaller apps later, but still).
Each event goes to an exchange of type "topic".
Some events are consumed by other apps, and some are additionally consumed by the same app that sends them.
Let's consider the publishing-and-self-consuming case.
In Spring Boot, for each event I need to declare:
- the routing key name in config (like "event.item.purchased")
- the queue name used to consume that event inside the same app ("queue.event.item.purchased")
- a matching configuration properties class field, a variable itemPurchasedRoutingKey, or a constant in code which keeps the property name (like ${event.item.purchased})
- a bean for Queue creation (with a name featuring the event name), like itemPurchasedQueue
- a bean for Binding creation (with a name featuring the event name) and the routing key name, like itemPurchasedBinding, which is constructed with itemPurchasedQueue.bind(...itemPurchasedRoutingKey)
- a RabbitListener for the event, with the annotation containing the queue name (which can't be defined at runtime)
So there are 6 places where "item purchased" is mentioned in one form or another.
The amount of boilerplate code is just killing me :)
With 50 events it's very easy to make a mistake: when adding a new event, you need to remember to update all 6 places.
Ideally, for each event I'd like to:
- specify the routing key in config; the queue name can be built upon it by appending a common prefix (specific to the app)
- use some annotation or an alternative RabbitListener which automatically declares the queue (by routing key + prefix), binds to it, and listens for events
Is there a way to optimize this?
I thought about custom annotations, but RabbitListener doesn't like dynamic queue names, and Spring Boot can't find the beans for queues and bindings if I declare them inside some util method.
Maybe there is a way to declare all that stuff in code, but it's not the Spring way, I believe :)
So I ended up with manual bean declaration, using one bind() method per bean:
@Configuration
@EnableConfigurationProperties(RabbitProperties::class)
class RabbitConfiguration(
    private val properties: RabbitProperties,
    private val connectionFactory: ConnectionFactory
) {
    @Bean
    fun admin() = RabbitAdmin(connectionFactory)

    @Bean
    fun exchange() = TopicExchange(properties.template.exchange)

    @Bean
    fun rabbitMessageConverter() = Jackson2JsonMessageConverter(
        jacksonObjectMapper()
            .registerModule(JavaTimeModule())
            .registerModule(Jdk8Module())
            .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
            .enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL)
    )

    @Value("\${okko.rabbit.queue-prefix}")
    lateinit var queuePrefix: String

    fun <T> bind(routingKey: String, listener: (T) -> Mono<Void>): SimpleMessageListenerContainer {
        val queueName = "$queuePrefix.$routingKey"
        val queue = Queue(queueName)
        admin().declareQueue(queue)
        admin().declareBinding(BindingBuilder.bind(queue).to(exchange()).with(routingKey)!!)
        val container = SimpleMessageListenerContainer(connectionFactory)
        container.addQueueNames(queueName)
        container.setMessageListener(MessageListenerAdapter(MessageHandler(listener), rabbitMessageConverter()))
        return container
    }

    internal class MessageHandler<T>(private val listener: (T) -> Mono<Void>) {
        // NOTE: don't change the name of this method; rabbit needs it
        fun handleMessage(message: T) {
            listener.invoke(message).subscribeOn(Schedulers.elastic()).subscribe()
        }
    }
}
@Service
@Configuration
class EventConsumerRabbit(
    private val config: RabbitConfiguration,
    private val routingKeys: RabbitEventRoutingKeyConfig
) {
    @Bean
    fun event1() = handle(routingKeys.event1)

    @Bean
    fun event2() = handle(routingKeys.event2)

    // ...

    // note: bind() returns the listener container, which is exposed as the bean
    private fun <T> handle(routingKey: String): SimpleMessageListenerContainer =
        config.bind<T>(routingKey) {
            log.debug("consume rabbit event: $it")
            // ... handle event, return Mono<Void>
        }

    companion object {
        private val log by logger()
    }
}
@Configuration
@ConfigurationProperties("my.rabbit.routing-key.event")
class RabbitEventRoutingKeyConfig {
    lateinit var event1: String
    lateinit var event2: String
    // ...
}

Field Grouping for a Kafka Spout

Can field grouping be done on tuples emitted by a Kafka spout? If yes, how does Storm get to know the fields in a Kafka record?
Field grouping (and grouping in general) in Storm applies to bolts, not to spouts. This is done via the InputDeclarer class.
When you call setBolt() on TopologyBuilder, an InputDeclarer is returned.
The Kafka spout declares its output fields like any other component. My explanation is based on the current implementation of KafkaSpout.
In the KafkaSpout.java class, we see a declareOutputFields method that calls the getOutputFields() method of the KafkaConfig scheme:
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(_spoutConfig.scheme.getOutputFields());
}
By default, KafkaConfig uses RawMultiScheme, which implements this method as follows:
@Override
public Fields getOutputFields() {
    return new Fields("bytes");
}
So what does this mean? If you declare a bolt that reads tuples from KafkaSpout using fieldsGrouping, you know that every tuple whose "bytes" field is equal will be executed by the same task. If you want to emit other fields, you should implement a new scheme for your needs.
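As a rough sketch of such a scheme, the following emits the raw record as a UTF-8 string under a named field so a downstream bolt can group on it (assuming the Storm 1.x org.apache.storm.spout.Scheme interface, where deserialize takes a ByteBuffer; older versions take a byte[]):
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.storm.spout.Scheme;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class StringMessageScheme implements Scheme {

    // Decode each Kafka record into a single declared "message" field.
    @Override
    public List<Object> deserialize(ByteBuffer ser) {
        String message = StandardCharsets.UTF_8.decode(ser).toString();
        return new Values(message);
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("message");
    }
}
A bolt could then use fieldsGrouping(spoutName, new Fields("message")) on this spout's output.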
TL;DR
The default implementation of KafkaSpout declares the following output fields in declareOutputFields:
new Fields("topic", "partition", "offset", "key", "value");
So in the topology-building code you can directly do:
topologyBuilder.setSpout(spoutName, mySpout, parallelismHintSpout);
topologyBuilder.setBolt(boltName, myBolt, parallelismHintBolt).fieldsGrouping(spoutName, new Fields("key"));
Details: A little digging into the code shows the following.
In the Kafka spout, declareOutputFields is implemented this way:
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    RecordTranslator<K, V> translator = kafkaSpoutConfig.getTranslator();
    for (String stream : translator.streams()) {
        declarer.declareStream(stream, translator.getFieldsFor(stream));
    }
}
It gets the fields from the RecordTranslator interface, whose instance is fetched from kafkaSpoutConfig, i.e. KafkaSpoutConfig<K, V>. KafkaSpoutConfig<K, V> extends CommonKafkaSpoutConfig (this is slightly different in version 1.1.1, though). The builder of this class defaults to a DefaultRecordTranslator. If you check the Fields in that class's implementation, you will find:
public static final Fields FIELDS = new Fields("topic", "partition", "offset", "key", "value");
So we can use Fields("key") directly in fields grouping in topology code:
topologyBuilder.setBolt(boltName, myBolt, parallelismHintBolt).fieldsGrouping(spoutName, new Fields("key"));

Why is MassTransit using ConstructorHandling.AllowNonPublicDefaultConstructor for message deserialization?

I'm trying to incorporate MassTransit in a project that also uses NHibernate. NHibernate requires me to have at least a default constructor with protected internal visibility.
I ran into the following problem: messages can be published without any problem, but the handlers receive the message objects with uninitialized members. After some debugging and inspection of the MassTransit sources, I found out that this is caused by MassTransit using the setting ConstructorHandling.AllowNonPublicDefaultConstructor during deserialization, which causes my protected internal default constructor to be called instead of the parameterized constructor. I managed to reproduce this behavior; see the code below.
What's the reason behind MassTransit's use of AllowNonPublicDefaultConstructor, and is there any way to change this behavior?
class Program
{
    public class TestClass
    {
        private readonly string _someString;

        public string SomeString
        {
            get { return _someString; }
        }

        public TestClass(string someString)
        {
            _someString = someString;
        }

        protected internal TestClass()
        {
            _someString = "uninitialized";
        }
    }

    static void Main(string[] args)
    {
        var obj = new TestClass("Hello World");
        var serializerSettings = new JsonSerializerSettings
        {
            ConstructorHandling = ConstructorHandling.AllowNonPublicDefaultConstructor,
            ContractResolver = new ... // MassTransit contract resolver that includes private setters
        };
        string serializedObject = JsonConvert.SerializeObject(obj, serializerSettings);
        var deserializedObj = JsonConvert.DeserializeObject<TestClass>(serializedObject, serializerSettings);
        // deserializedObj.SomeString == "uninitialized"
    }
}
Messages shouldn't have any logic in them at all. Messages are contracts. Any logic in them will only end up biting you again and again. :( We will always use a default, no-parameter constructor. If there isn't one, we won't deserialize your message.
We suggest that you always consume interfaces instead of concrete types to help enforce the removal of logic from message types. But if you really want to have this behaviour, you'll need to write your own serializer.
If you want to discuss further, I suggest you join the mailing list: groups.google.com/group/masstransit-discuss.
