Spring Batch and Kafka - spring-boot

I am a junior programmer in banking. I want to build a microservice system that gets data from Kafka, processes it, saves it to a database, and then sends the final data to a client app. What technology can I use? I plan to use Spring Batch and Kafka. Can these technologies be used in my project, or is there a better alternative?

To process data from a Kafka topic, I recommend using the Kafka Streams API, especially through Spring's Kafka Streams support.
Kafka Streams and Spring
To store the data in a database, you should use a Kafka Connect sink connector.
Kafka Connect
This approach is very common and easy if your company has a Kafka ecosystem.

In terms of alternatives, here you will find an interesting comparison:
https://scramjet.org/blog/welcome-to-the-family
3 in 1 serverless
Scramjet takes a slightly different approach - 3 platforms in one.
Both the free product for installation on your own server (https://hub.scramjet.org/) and the cloud platform are available. The cloud platform is currently also free in its beta version (https://scramjet.org/#join-beta).

Related

Kafka connect with EventStoreDB

I'm working on a small academic project: event sourcing with EventStoreDB and Apache Kafka as a broker. The idea is to get events from EventStoreDB and push them to Kafka for further distribution. I saw that Apache Kafka has connectors for different DB systems, but I didn't find any connector for EventStoreDB.
How can I create (code myself or use an existing one) a Kafka connector for EventStoreDB, so that these two systems can transfer events in both directions, from Kafka to EventStoreDB and from EventStoreDB to Kafka?
There is no official Kafka Connect connector between Kafka and EventStoreDB, and I haven't heard of an unofficial one so far. Still, there is a tool called Replicator that enables replicating data from EventStoreDB to Kafka (https://replicator.eventstore.org/docs/features/sinks/kafka/). It's open source, so you can either use it or check the implementation.
For EventStoreDB to Kafka, I recommend using the subscriptions mechanism: catch-up if you need an ordering guarantee, persistent if ordering is not critical: https://developers.eventstore.com/clients/grpc/subscriptions.html. The crucial part here is to define how to map EventStoreDB streams to Kafka topics and partitions. Typically you'd expect to have at least an ordering guarantee at the stream level, so all events of a single stream should land in the same partition.
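To make the stream-to-partition mapping concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical, and the hash is a simple array hash chosen for illustration; Kafka's own default partitioner hashes the record key with murmur2 instead. The point is only that the partition depends on nothing but the stream ID.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: maps an EventStoreDB stream ID to a Kafka partition index. */
final class StreamPartitionMapper {

    private StreamPartitionMapper() {}

    /**
     * Returns a partition in [0, partitionCount) that depends only on the stream ID,
     * so every event of one stream is routed to the same partition.
     * Illustrative hash only; this is not Kafka's murmur2-based partitioner.
     */
    static int partitionFor(String streamId, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        byte[] keyBytes = streamId.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, partitionCount);
    }
}
```

In practice you would probably not compute the partition yourself: using the stream ID as the Kafka record key lets the default partitioner give you the same per-stream guarantee, as long as the number of partitions does not change.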
For Kafka to EventStoreDB integration, you could either write your own pass-through service or try to use the HTTP sink connector (e.g. https://docs.confluent.io/kafka-connect-http/current/overview.html). EventStoreDB exposes an HTTP API (https://developers.eventstore.com/clients/http-api/v5/introduction/). As a side note, this API (AtomPub based) may be replaced with another HTTP API in the future, so the structure may change.
You can use Event Store Replicator, which has a Kafka sink.
Keep in mind that it doesn't do anything with regard to event schemas, so things like Kafka Streams and KSQL might not work properly.
The sink was created solely for the purpose of pushing events to Kafka being used as a message broker.

Spring Kafka JDBC Connector compatibility

Is the Kafka Connect JDBC connector compatible with the Spring-Kafka library?
I followed https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/ and am still somewhat confused.
Let's say you want to consume from a Kafka topic and write to a JDBC database. Some of your options are:
Use a plain Kafka consumer to consume from the topic and use the JDBC API to write the consumed records to the database.
Use Spring Kafka to consume from the Kafka topic and Spring's JdbcTemplate or Spring Data to write to the database.
Use Kafka Connect with the JDBC connector as a sink to read from the topic and write to a table.
So, as you can see:
The Kafka JDBC connector is a specialised component that can only do one job.
The Kafka consumer is a very generic component that can do many jobs, but you will be writing a lot of code. In fact, it is the foundational API that other frameworks build on and specialise.
Spring Kafka simplifies this and lets you deal with Kafka records as Java objects, but it doesn't tell you how to write those objects to your DB.
So they are alternative solutions for the same task. Having said that, you may have a flow where different segments are controlled by different teams; each segment can use any of these options, and the Kafka topic acts as the joining channel.

How to update data in real-time

I have a small stock-market application built with Spring Boot, and whenever a product is updated I want to serve the updated product to the clients in real time.
Does it make sense to use a message queue like RabbitMQ together with SSE (Server-Sent Events) for this, or is there a more sensible solution?
Solution
Publish your updated data to some channel
Your clients should subscribe to that channel to get updated feed in real-time.
Tools
Use an in-house setup of RabbitMQ, ActiveMQ, Kafka, or another open-source tool, and implement WebSocket for front-end applications.
Use a commercial service like Google Cloud Pub/Sub.
Use a ready-made, fully packaged solution with supported SDKs for backend and frontend, such as https://www.pubnub.com/.
For this you can use any of:
Spring Integration
Web Sockets
JMS
Spring Integration is an implementation of Enterprise Integration Patterns and is ideal for processing data asynchronously in real time.
However, looking at your scope, this is only a publisher-subscriber pattern, so it can be solved with JMS.
With JMS, the subscribers/consumers can register and de-register dynamically. It also provides ways to have fallbacks and tracking.
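The publisher-subscriber idea behind all of these suggestions can be sketched in a few lines of plain Java. This is only an in-memory illustration with hypothetical names (`ProductUpdateChannel`, string payloads); in a real setup a JMS topic, a message broker, or a WebSocket layer would play the role of the channel.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory channel; a broker or JMS topic would replace it in production. */
final class ProductUpdateChannel {

    // CopyOnWriteArrayList lets subscribers come and go while a publish is in progress.
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    /** Registers a subscriber and returns a handle that de-registers it again. */
    Runnable subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
        return () -> subscribers.remove(subscriber);
    }

    /** Delivers the update to every currently registered subscriber. */
    void publish(String update) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(update);
        }
    }
}
```

A client would call `subscribe(...)` when it connects, keep the returned `Runnable`, and run it when it disconnects; the backend calls `publish(...)` whenever a product changes.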

For a Spring enterprise web application with multiple instances, what is the way to retrieve the offset value from Kafka and store it?

I'm working on an enterprise web application that has a requirement to read from a Kafka system and then trigger events. Can anyone suggest a way to get the offset and also an ideal way to store the offset (Ideal way should be able to handle accessing by multiple instances of the application)?
Note:
I'm using spring-kafka and am open to any further suggestions.
Thanks in advance.
With recent versions of Kafka, the offset is stored in a Kafka topic. Kafka keeps track of the consumer offset for each partition in a topic named __consumer_offsets, which is a compacted topic. In other words, Kafka itself keeps track of the offset for each consumer group.
Spring for Apache Kafka provides several options for when the offset is committed.
In earlier versions of Kafka, offsets were often stored externally; it's now a lot simpler.
There may still be use cases for external storage, but such scenarios are all supported by Spring Kafka, especially with the upcoming 2.0 release.
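To see why several application instances can share offsets without extra storage, here is a purely conceptual model in plain Java. It is not how Kafka is implemented and the names are hypothetical; it only shows that the committed position is keyed by consumer group, topic, and partition rather than by application instance. Returning 0 for a group that never committed is a simplification made for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Conceptual model only: Kafka keeps this state in the __consumer_offsets topic. */
final class CommittedOffsets {

    // Key is "group/topic/partition"; value is the next offset to read.
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    /** Records that the group has processed everything before nextOffset. */
    void commit(String group, String topic, int partition, long nextOffset) {
        offsets.put(key(group, topic, partition), nextOffset);
    }

    /** Returns the committed position, or 0 if the group never committed (simplified). */
    long position(String group, String topic, int partition) {
        return offsets.getOrDefault(key(group, topic, partition), 0L);
    }
}
```

If one instance in a group commits offset 42 for a partition and then goes away, another instance in the same group that takes over the partition resumes from 42, while a different group keeps its own independent position.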

Is Spring XD the right tool choice?

We're building an M2M IoT platform and part of the ecosystem is a Big Data storage and analytics component.
The platform connects devices at one end and provides a streaming data output using ActiveMQ to interface with the Big Data application layer.
Right now I'm designing this middle layer, which accepts machine data, runs real-time processes, and stores the data in a Hadoop storage module.
From what I see, Spring XD seems to be able to orchestrate this process from ingestion, to filtering, processing, analytics and export to Hadoop.
However, I do not know anyone who has done something like this. Has anyone here done something similar? I need your feedback on the choice of tool for the middleware.
Spring XD is great with RabbitMQ; for ActiveMQ you can use the JMS connector.
For more information, take a look at Spring Integration, which is the main underpinning and has been around forever.
Spring XD runs on YARN or ZooKeeper, which are very solid.
I have seen it used for orchestration of big data in a few places.

Resources