Filter-Interceptor-Kafka topics - spring

Hi, I am trying to build an analytics engine for real-time analysis of the URLs/events used by clients, as well as to log API performance.
Following is the logic I am planning to implement:
1. Create a filter to intercept URLs.
2. Package the filter as a reusable JAR containing the interception logic, using Spring MVC interceptors.
3. The interceptor will produce and publish events to Kafka if the URL pattern matches (a rough sketch is below).
My question is whether this is the best approach, or whether there is a better alternative, keeping in mind the high traffic flowing into the APIs.
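For reference, a rough sketch of step 3, assuming Spring MVC 5.x with spring-kafka's KafkaTemplate; the /api/ pattern, topic name, and class name are only illustrative:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.servlet.HandlerInterceptor;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical interceptor: publishes a small event to Kafka when the request URI matches a pattern.
public class UrlAnalyticsInterceptor implements HandlerInterceptor {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public UrlAnalyticsInterceptor(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        String uri = request.getRequestURI();
        if (uri.startsWith("/api/")) {                  // URL pattern to track (assumption)
            // send() is asynchronous, so the request thread is not blocked on the broker
            kafkaTemplate.send("url-events", uri, System.currentTimeMillis() + " " + uri);
        }
        return true;                                    // always let the request continue
    }
}
```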

If the filtering is done one message at a time, it could also be done in Kafka Connect using the Single Message Transforms feature: https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
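As a sketch of what a custom Single Message Transform can look like (the class name and the /api/ marker are assumptions; returning null from apply() tells Connect to drop the record):

```java
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Hypothetical SMT that keeps only records whose value contains a marker string.
public class UrlPatternFilter<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        Object value = record.value();
        // Returning null drops the record from the pipeline.
        return (value != null && value.toString().contains("/api/")) ? record : null;
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}
```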

Related

How to route based on content with high performance?

In NiFi, I am listening to a single Kafka topic and, based on routing logic, calling the respective process group.
However, if we give the RouteOnContent processor a regular expression to check for the occurrence of a string, will that affect performance? How do we achieve good performance while routing based on a condition?
Would it be more efficient to do the split at the KSQL / stream-processing level into different topics and have NiFi read from the different topics?
Running a regex on the content of each message is an inefficient approach; consider whether you can change your approach to one of the following:
- Have your producers write the necessary metadata into a Kafka header, which lets you use the much more efficient RouteOnAttribute processor in NiFi. This is still message-at-a-time, which has throughput limitations.
- If your messages conform to a schema, use the more efficient KafkaRecord processors in NiFi with a QueryRecord approach, which will significantly boost throughput.
- If you cannot modify the source data and the regex logic is involved, it may be more efficient to use a small Kafka Streams app to split the topic before processing the data further downstream (see the sketch after this list).
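For illustration, a small Kafka Streams splitter along those lines might look roughly like this; the topic names and the "ORDER" marker are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

// Hypothetical splitter: fans a single input topic out into per-type topics,
// so NiFi can consume each route from its own topic with no content regex.
public class TopicSplitter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");        // input topic (assumption)
        events.filter((key, value) -> value.contains("ORDER")).to("events-orders");
        events.filter((key, value) -> !value.contains("ORDER")).to("events-other");

        new KafkaStreams(builder.build(), props).start();
    }
}
```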

How to do 2-phase commit between two micro-services (Spring Boot)?

I have two micro-services, A and B, which connect to separate databases. From micro-service A I need to persist (save) objects of both A and B in the same transaction. How can I achieve this?
I am using Spring micro-services with Netflix OSS. Please give suggestions on the best way to achieve a 2-phase commit.
You cannot implement a traditional transaction system across micro-services in a distributed environment.
You should use the Event Sourcing + CQRS technique; because events are atomic, you gain something similar to transactions or 2PC in a monolithic system.
Another possible way is transaction-log mining, which I think LinkedIn uses, but it has its own pros and cons; for example, the binary logs of different databases are different, and even within the same kind of database there are differences between versions.
I suggest that you use Event Sourcing + CQRS, store events in an event store, and then aim for eventual consistency (per the CAP theorem) by transferring multiple events between micro-services A and B and updating the domain state at each step.
It is suggested that you use a message broker like ActiveMQ, RabbitMQ, or Kafka for sending event-sourced events between different microservices, and store them in an event store like MySQL or another system.
Another benefit of this approach, besides mimicking transactions, is that you get a complete audit log.
This is an architecture (microservices) problem. Spring Boot and Netflix OSS do not offer a direct solution; you have to implement your own. Look into event-driven architecture; it can give you some ideas.
You could try the SAGA pattern: https://microservices.io/patterns/data/saga.html
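For illustration only, a minimal Spring Boot sketch of the event-driven alternative described in these answers; Order, OrderRepository, the topic name, and the group id are hypothetical:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Service A: commits its own data in a local transaction, then emits an event for service B.
// Order and OrderRepository are hypothetical placeholders for A's own persistence layer.
@Service
public class OrderService {

    private final OrderRepository orderRepository;
    private final KafkaTemplate<String, String> kafka;

    public OrderService(OrderRepository orderRepository, KafkaTemplate<String, String> kafka) {
        this.orderRepository = orderRepository;
        this.kafka = kafka;
    }

    @Transactional
    public void createOrder(Order order) {
        orderRepository.save(order);                         // local transaction of service A only
        // Note: a plain send here is a "dual write"; in practice an outbox table or
        // transaction-log mining (mentioned above) is used so the event is not lost if the commit fails.
        kafka.send("order-created", order.getId(), order.asJson());
    }
}

// Service B: reacts in its own local transaction; on failure it would publish a compensating
// event rather than rolling back service A.
@Service
class OrderCreatedListener {

    @KafkaListener(topics = "order-created", groupId = "service-b")
    public void onOrderCreated(String payload) {
        // persist B's part of the data here, then emit success/failure events as needed
    }
}
```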

Looking For A Scalable PubSub Solution Or Alternative

I'm currently looking for the best architecture for an IM app I'm trying to build.
The app consists of channels, each with a couple of thousand subscribed users. Each user is subscribed to only one channel at a time and can publish to and read from that channel. Users may move rapidly between channels.
I initially considered using XMPP PubSub (via ejabberd or MongooseIM), but as far as I understand it was added as an afterthought and is not very scalable.
I also thought about using a message queue protocol like AMQP, but I'm not sure that's what I'm looking for from the IM aspect.
Is my concern regarding the XMPP PubSub justified? And if so, do you know of a better solution?
Take a look at Redis and Kafka. Both are scalable and performant.
Based on your inputs, I imagine the following primary use cases for the IM application.
**Use cases**
1. Many new users keep registering with the system and subscribing to one of the channels.
2. Many existing users change their subscription from one channel to another.
3. Many existing users keep publishing messages to channels.
4. Many existing users keep receiving messages as subscribers.
XMPP is a natural fit for the 3rd and 4th use cases. ejabberd is one of the proven, highly scalable platforms to go with.
For the 2nd use case, you probably have logic something like this:
- a) update the user's channel info in the DB
- b) make the user listen to the new channel
- c) change the user's publishing topic to the other channel, and so on
Whenever you need to do multiple operations like this, I strongly recommend using Kafka to perform them asynchronously.
For the 1st use case, provide registration through REST APIs so that registration can be done from any device. While registering a user, you may have several operations, as follows:
- 1) register the user in the DB
- 2) create the IM account internally
- 3) send an email or SMS for confirmation, and so on
Here too, perform the 1st operation as part of the REST API service logic, and perform the 2nd and 3rd operations asynchronously using Kafka. That is, your service logic performs the 1st operation synchronously and raises an event to Kafka; consumers then handle the 2nd and 3rd operations asynchronously (a sketch follows).
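A rough sketch of that flow, assuming Spring Boot with spring-kafka; UserRepository, the endpoint, topic, and group names are hypothetical:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical registration endpoint: step 1) runs synchronously, steps 2) and 3) are
// handed off to Kafka and performed asynchronously by a consumer.
@RestController
public class RegistrationController {

    private final UserRepository userRepository;               // placeholder for the user DB (assumption)
    private final KafkaTemplate<String, String> kafka;

    public RegistrationController(UserRepository userRepository, KafkaTemplate<String, String> kafka) {
        this.userRepository = userRepository;
        this.kafka = kafka;
    }

    @PostMapping("/register")
    public ResponseEntity<Void> register(@RequestParam String username, @RequestParam String email) {
        userRepository.save(username, email);                  // 1) register the user in the DB (sync)
        kafka.send("user-registered", username);               // raise the event for the async steps
        return ResponseEntity.accepted().build();
    }
}

// Consumer that performs the remaining steps asynchronously.
class RegistrationWorker {

    @KafkaListener(topics = "user-registered", groupId = "registration-workers")
    public void onUserRegistered(String username) {
        // 2) create the IM (XMPP) account, e.g. against ejabberd
        // 3) send a confirmation email or SMS
    }
}
```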
The system can scale well only if all layers/subsystems scale well. From that perspective, the tech stack below may help you scale:
REST APIs + Kafka + ejabberd (XMPP)

Solution for composite events with Apache Kafka?

Architecture question: we have an Apache Kafka-based eventing system and multiple systems producing/sending events. Each event has some data including an ID, and I need to implement an "ID is complete" event. Example:
Event_A(id)
Event_B(id)
Event_C(id)
are received asynchronously, and only once all 3 events have been received do I need to send an Event_Complete(id). The problem is that we have multiple clusters of consumers and our database is eventually consistent.
A simple way would be to use the eventually consistent DB to store which events we have received for each ID, and add a "cron" job to eventually catch race conditions.
It feels like a problem that may already have been solved out there. So my question is: is there a better way to do it (without introducing a consistent datastore into the picture)?
Thanks a bunch!
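For reference, a minimal sketch of the "simple way" described above; EventStore and EventPublisher are hypothetical abstractions over the eventually consistent DB and the Kafka producer, and a periodic "cron" job would re-run the same check for incomplete IDs to catch completions missed because of eventual consistency:

```java
import java.util.Set;

// Hypothetical abstraction over the eventually consistent DB.
interface EventStore {
    void markReceived(String id, String eventType);
    Set<String> receivedEvents(String id);
    boolean markCompleteOnce(String id);   // atomic "first writer wins" flag (assumption)
}

// Hypothetical abstraction over the Kafka producer for Event_Complete(id).
interface EventPublisher {
    void publishComplete(String id);
}

public class CompletionTracker {

    private final EventStore store;
    private final EventPublisher publisher;

    public CompletionTracker(EventStore store, EventPublisher publisher) {
        this.store = store;
        this.publisher = publisher;
    }

    // Called for every incoming Event_A/B/C; eventType would be "A", "B", or "C".
    public void onEvent(String id, String eventType) {
        store.markReceived(id, eventType);
        Set<String> seen = store.receivedEvents(id);
        // markCompleteOnce guards against two consumer clusters racing on the last event
        // and both publishing Event_Complete.
        if (seen.containsAll(Set.of("A", "B", "C")) && store.markCompleteOnce(id)) {
            publisher.publishComplete(id);
        }
    }
}
```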

Logstash/not logstash for kafka-elasticsearch integration?

I read that Elasticsearch rivers/river plugins are deprecated, so we cannot have direct Kafka-Elasticsearch integration. If we want to do this, we need some Java (or any other language) layer in between that puts the data from Kafka into Elasticsearch using its APIs.
On the other hand, with Kafka-Logstash-Elasticsearch we get rid of the middle layer above and achieve the same through Logstash with just configuration. But I am not sure whether having Logstash in between is an overhead or not.
And is my understanding right?
Thanks in advance for the inputs.
Regards,
Priya
Your question is quite general. It would be good to understand your architecture, its purpose, and the assumptions you have made.
Kafka, as stated in its documentation, is a massively scalable publish-subscribe messaging system. My assumption is that you use it as a data broker in your architecture.
Elasticsearch, on the other hand, is a search engine, so I assume you use it as a data access/search/aggregation layer.
These two separate systems require connectors to create a proper data pipeline. That's where Logstash comes in: it allows you to create a streaming data connection between, in your case, Kafka and Elasticsearch, and it also allows you to mutate the data on the fly, depending on your needs.
Ideally, Kafka carries raw data events, while Elasticsearch stores documents that are useful to your data consumers (web or mobile applications, other systems, etc.), which can be quite different from the raw data format. If you need to transform the data between its raw form and the ES document, that's where Logstash can be handy (see the filter stage).
Another approach could be to use Kafka Connect, or to build custom tools, e.g. based on Kafka Streams or consumers, but it really depends on the concepts of your architecture: purpose, stack, data requirements, and more.
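For comparison, the kind of small "middle layer" the question mentions (a plain Kafka consumer that indexes each message into Elasticsearch over its HTTP API) might look roughly like this; the topic, index, and host names are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical middle layer: consumes JSON messages from Kafka and indexes each one
// into Elasticsearch via its HTTP _doc endpoint (no Logstash involved).
public class KafkaToElasticsearch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "es-indexer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("app-events"));                       // topic name (assumption)
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // index the raw JSON document into the "app-events" index
                    HttpRequest request = HttpRequest
                            .newBuilder(URI.create("http://localhost:9200/app-events/_doc"))
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString(record.value()))
                            .build();
                    http.send(request, HttpResponse.BodyHandlers.ofString());
                }
            }
        }
    }
}
```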
