Searching for an architecture approach for converting Kafka messages to other formats - spring-boot

We're using Kafka as a broker that takes notifications from different message sources and routes them to one or more target apps like Slack or e-mail. With such an approach, each Kafka message has to be converted into a different output format, such as JSON or an e-mail body, before it is sent to the target app.
I thought of having Spring Boot microservices at the target ends that take the message from Kafka, convert it into the target format using one of the common template languages like Velocity or FreeMarker, and then forward the converted result to the given target app.
Would you agree with such an approach, or are there better ways, caveats, or even no-gos to doing it this way? What about performance? Any experience with this?
Thanks for your honest assessment.

Why not have a single serialization format and let each service deserialize the payload for its own use case? Templating with something like Velocity or FreeMarker seems like a concern specific to each target, independent of the data used to populate the template. Maybe focus on broadcasting the raw data.
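A minimal sketch of that idea, assuming Spring Kafka, Jackson, and FreeMarker (the topic name, template name, and flat map model are placeholders, not a prescribed design):

    import java.io.StringWriter;
    import java.util.Map;

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import freemarker.template.Configuration;
    import freemarker.template.Template;

    // Sketch: every consumer reads the same raw JSON and owns its target-specific rendering.
    @Component
    public class EmailNotificationConsumer {

        private final ObjectMapper mapper = new ObjectMapper();
        private final Configuration freemarker;

        public EmailNotificationConsumer() {
            // Assumed template location under src/main/resources/templates.
            freemarker = new Configuration(Configuration.VERSION_2_3_31);
            freemarker.setClassForTemplateLoading(getClass(), "/templates");
        }

        @KafkaListener(topics = "notifications") // assumed topic name
        public void onMessage(String payload) throws Exception {
            // One shared wire format (JSON) for all targets...
            Map<String, Object> model =
                    mapper.readValue(payload, new TypeReference<Map<String, Object>>() {});
            // ...and a template that only this consumer knows about.
            Template template = freemarker.getTemplate("email.ftl"); // assumed template name
            StringWriter body = new StringWriter();
            template.process(model, body);
            deliver(body.toString());
        }

        private void deliver(String body) {
            // hand off to the actual e-mail gateway (placeholder)
        }
    }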

Related

What is the best way to pass Kafka Headers in distributed applications?

My team and I are currently working on a project in which we're developing a distributed Kafka processing application, and we want to know how to conveniently pass Kafka headers through its components.
What does this mean? Consider the following example:
Component A receives events and, based on their content, generates a unique header with metadata about each event. After processing, the result is written to Topic A together with the generated header.
Component B and Component C consume these events, do their own processing, and write their results to Topic B and Topic C. These components don't use the header generated by Component A.
But Component D needs it, so Component B and Component C must receive the header and pass it through.
Our system is a bit bigger than this example, which is why we asked ourselves: what is the best way to pass Kafka headers through these components? Is there an automatic way?
We considered the following approaches:
Don't use Kafka headers at all and pass the metadata through the body.
Use interceptors (if that is even possible).
FYI we use Spring Boot.
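For example, the manual pass-through we would otherwise have to write in Component B looks roughly like this (a sketch only; topic names and the transform are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    // Rough sketch of the manual header pass-through in Component B.
    @Component
    public class ComponentB {

        private final KafkaTemplate<String, String> kafkaTemplate;

        public ComponentB(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        @KafkaListener(topics = "topic-a") // placeholder topic name
        public void process(ConsumerRecord<String, String> record) {
            ProducerRecord<String, String> out =
                    new ProducerRecord<>("topic-b", record.key(), transform(record.value()));
            // Copy every inbound header so downstream Component D still sees them.
            record.headers().forEach(header -> out.headers().add(header));
            kafkaTemplate.send(out);
        }

        private String transform(String value) {
            return value; // real processing goes here
        }
    }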
Thanks in advance,
Michael

"ObjectMessage usage is generally discouraged", what to use instead?

The ActiveMQ docs state:
Although ObjectMessage usage is generally discouraged, as it introduces coupling of class paths between producers and consumers, ActiveMQ supports them as part of the JMS specification.
Having not had much experience with message buses, I have been approaching them as conceptually similar to SOAP web services, where you specify the service interface contract for consumers, who then construct equivalent class proxies.
What I am trying to achieve is:
Publishers in some way indicate the schema of the message
Subscribers in some way know the schema of the message
ObjectMessage solves this problem, although not in the nicest way given the noted classpath coupling. As far as I can see the other message types provide minimal guidance to the consumer as to the expected message format (e.g. consumers would have to assume that a MapMessage contained certain keys with certain value types).
Is there another reasonable way to accomplish this, or is this not even something I should be pursuing?
Since the idea is for publishers/subscribers to know the schema, the first step is definitely to give the payload a structure using JSON or Protobuf (not a big fan of XML personally), and then to pass the data as either a TextMessage or a BytesMessage.
There are a couple of ways for publishers and subscribers to communicate that schema:
The subscriber learns the schema from the publisher's Javadoc or sample invocations (sounds fine for simple use cases).
Have a centralized config that the publisher publishes to and the subscriber picks up from. This config could live in a database or in an application that serves out configurations. An effective implementation would ensure that neither publisher nor subscriber breaks when there are modifications.
Advantages of this approach over the ObjectMessage approach:
No tight coupling of the payload (e.g. jar upgrades, attribute changes, etc.)
A significant performance improvement - here's an example where serializing a Java class holding a string and an int takes 3.7 times more space than storing the int and string directly as bytes.
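As a rough illustration of the TextMessage-plus-JSON idea (the broker URL, queue name, and schema property are all assumptions):

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import org.apache.activemq.ActiveMQConnectionFactory;

    import com.fasterxml.jackson.databind.ObjectMapper;

    // Sketch: publish a JSON payload as a TextMessage tagged with a schema identifier,
    // so subscribers can resolve the format without any classpath coupling.
    public class JsonPublisher {

        public static void main(String[] args) throws Exception {
            // Assumed broker URL and queue name.
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer =
                        session.createProducer(session.createQueue("orders"));

                String json = new ObjectMapper()
                        .writeValueAsString(java.util.Map.of("orderId", "o-42", "quantity", 3));
                TextMessage message = session.createTextMessage(json);
                // The "schema" property is one possible convention for versioning payloads.
                message.setStringProperty("schema", "OrderPlaced-v1");
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }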

Talend Open Studio for ESB 5.2 Route to Job Optimisation/Performance Issue

Using Talend ESB 5.2.0, I want to create a mediation route that calls a processing job on the payload of an inbound request to a CXF messaging endpoint; however, my current implementation suffers from performance issues with large payloads.
I've investigated the issue and found that the bottleneck is in marshalling the inbound XML payload from the tRouteInput component into the internal row structure for processing, using a tXMLMap.
Is it possible, using a built-in type converter in the route, to marshal the internal row structure from the route and stream it through POJOs or transport objects that are cheaper to process in the job? Or is there a better way to marshal XML to Talend's internal row structure from a route using a less expensive transform?
Any thoughts would be welcome.
Cheers,
mids
It turns out that the issue was caused by the format of the inbound XML payload: when more than one loop element maps to separate output flows from the tXMLMap, Talend generates relative links between the items of each output flow, which enables more advanced loop-based processing if required.
This caused the large memory overhead that led to poor throughput.
Since we didn't require any of that more advanced processing in the XML-to-row conversion, we overcame the issue by splitting the payload into its distinct loop elements with tReplicate and tExtractXMLField components before mapping out of the XML in separate tXMLMaps, which avoids the auto-generation of those links. - mids

ETL, Esper or Drools?

The question environment relates to Java EE and Spring.
I am developing a system that can start and stop arbitrary TCP (or other) listeners for incoming messages. These messages may need to be authenticated, and they need to be parsed and stored in other entities, which model the fields they store.
So, for example, if I have property1 with two text fields, FillLevel1 and FillLevel2, I could receive TCP messages that specify both fill levels in text as F1=100;F2=90.
Later I could add another field, say FillLevel3, and start receiving messages like F1=xx;F2=xx;F3=xx. But this is a conscious decision on the part of the system modeler.
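To make the format concrete, here is a minimal parsing sketch (the class name is illustrative; the real difficulty is letting end users redefine how F1/F2 map onto entity fields):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal sketch: split "F1=100;F2=90" into an ordered field map.
    // A new field such as F3 simply shows up as another entry.
    public final class FieldParser {

        public static Map<String, String> parse(String message) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String pair : message.split(";")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2) {
                    fields.put(kv[0].trim(), kv[1].trim());
                }
            }
            return fields;
        }
    }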
My question is: what do you think is better for parsing and storing the messages? One option is ETL (using Pentaho, which is used in another system), where you store the raw message and use a task executor to consume the messages one by one, storing the transformed results according to your rules.
One could instead use Esper or Drools to do the same thing, storing rules and executing them with a timer, but I am not sure how dynamic you can get with rule authoring (the rules have to be created by end users in a running system, preferably in the most user-friendly way, i.e. no scripts or code, only a GUI).
The end user should be able to change the parse rules. It is also possible that the end user might want to change the archived data as well (for example, if a new FillLevel value is added, one would like to put FillLevel=-99 into the previous records to keep the data consistent).
Please ask for clarification; I have the feeling that I need to revise this question a bit.
Thanks
Well, Esper is a great CEP engine, but Drools has its own implementation, Drools Fusion, which integrates really well with jBPM. That would be a good choice.

How do I write message queue handling in an object-oriented way?

If you had to write code that takes messages from a message queue and updates a table in a database, how would you structure it in a good OO way? The messages are XML data, one node per row in the table. The rows in the table could be updated, deleted, or inserted.
I don't believe you've provided enough information for a good answer. What do the messages look like? Do they vary in content/type, or are they all just "messages"? Do they interact with each other, or is this just a data-format conversion? One of the keys to OO development is realizing that the "find the nouns-n-verbs" game (which is about as much as you've described) rarely leads to the best solution. It certainly won't be the worst, but you'll end up with data aggregation and a bunch of procedural code.
Procedural code isn't bad, though. Why does it need to be OO? Does the problem itself require polymorphism and data hiding? Is there any complex behavior you are trying to model? There's no shame in using a non-OO solution when the problem is simple.
Normally, with OO implementations of message queues, you write the classes that represent the individual types of messages yourself. To the extent that the message types you expect are derivatives of each other, this gives you your class hierarchy for the messages.
With configuration-based persistence frameworks, you can set up persistence for these classes directly.
Then there are one or more classes, probably just one, that listen to the message queue and simply persist the messages. It doesn't have to be more elaborate than that.
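A minimal sketch of that shape, assuming insert/update/delete row messages (all type and field names are illustrative):

    import java.util.Map;

    // Illustrative hierarchy: one message subtype per row operation.
    public abstract class RowMessage {
        protected String table;
    }

    class InsertRow extends RowMessage {
        Map<String, Object> values;
    }

    class UpdateRow extends RowMessage {
        String key;
        Map<String, Object> changedValues;
    }

    class DeleteRow extends RowMessage {
        String key;
    }

A listener can then dispatch on the subtype and hand each message to the persistence layer.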
The best way to keep your code OO when doing messaging, or when dealing with any kind of middleware, is to hide the middleware APIs from your code and just deal with business logic.
e.g. see these examples:
POJO Consuming, which is pretty much the use case you describe, and
POJO Producing, if you ever need to send messages to a message queue.
Then you just need to define what your Data Transfer Objects look like and how you want to encode things on the wire in XML/JSON/whatever.
The great thing about this approach is that your code is now totally middleware agnostic - you could swap out your message queue and use a database, a JavaSpace, in-memory SEDA, files, or any other communication protocol or middleware API.
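For instance, a minimal consumer in the spirit of the POJO Consuming example, assuming Apache Camel with the ActiveMQ component (the endpoint URI is a placeholder):

    import org.apache.camel.Consume;

    // Pure business logic: Camel binds this method to the endpoint, so no
    // middleware types appear in the class at all.
    public class RowUpdateService {

        // Placeholder endpoint URI; swap it to move to another transport.
        @Consume(uri = "activemq:queue:row-updates")
        public void onRowUpdate(String xml) {
            // parse the XML and apply the insert/update/delete to the table
        }
    }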
