Checkpointing with Spring AWS Integration - spring-boot

According to Spring release notes, spring-integration-aws.1.1.0.M1 does not include DynamoDB MetaDataStore implementation. There is still ConcurrentMetadataStore class which is a key-value based store and based on implementation I suppose it maps streams with latest sequence number read. But it does not use any data store as to retrieve checkpoints.
I am using spring integration for kinesis consuming and need to implement checkpointing. I am wondering if I need to do it manually by connecting to DynamoDB and always update checkpoints or there is another way of doing it using spring framework?
P.S: I can't use Spring Cloud KinesisBinderConfiguration as I dynamically consume events from a list of configurable streams.
Thank you

If you are not talking about Spring Cloud Stream and the AWS Kinesis Binder implementation, then I don't see any blockers for you to upgrade your solution to the Spring Integration AWS 2.0 and go ahead with already provided DynamoDbMetaDataStore.
Or if that is so hard for you to move to the Spring Integration 5.0, then you simply can consider to copy/paste an implementation to your own class and inject it into the KinesisMessageDrivenChannelAdapter: https://github.com/spring-projects/spring-integration-aws/blob/master/src/main/java/org/springframework/integration/aws/metadata/DynamoDbMetaDataStore.java
Although it is really available in the 1.1.0.RELEASE - I don't see reason for your to stick with the 1.1.0.M1: https://spring.io/blog/2017/11/27/spring-integration-for-aws-1-1-ga-available

Related

Spring Cloud Stream with Project Reactor Stability

I want to use Spring Cloud Stream for consuming and processing Apache Kafka queues and writing them to MongoDB. I saw that there is an option of using the library so that functions will be Reactive, or Imperative. In most Spring projects the imperative way is the default, but as for my understanding, in spring cloud stream the reactive paradigm is the default.
I wonder what is considered the most “stable” api e.g. what is recommended to use for enterprise?
Reactive API is stable and yes we provide support for it. In other words you can write functions using reactive API (e.g., Function<Flux, Flux>).
However, i want to be very clear that support for API does not mean support for the full stack of reactive capabilities since those actually depend on source and targets which are not reactive.
That said, with Kafka you can rely on native reactive support provided by Kafka itself and Spring Cloud Stream using Kafka Streams binder - https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.5/reference/html/spring-cloud-stream-binder-kafka.html#_kafka_streams_binder

Spring Reactive Stack with Spring for Apache Kafka

In a few words:
I'm trying to decide between using the default Spring for Apache Kafka stack, KafkaTemplate or the pair, ReactiveKafkaProducerTemplate and ReactiveKafkaConsumerTemplate for my Reactor based application.
Some more context:
In the company I work we're developing a high-disponibility application aiming to publish a set of requests directly to a Kafka Broker. Since this is an API centric application expecting to receive a few millions of requests per week, we decided to go with a stack based on the Project Reactor with Spring WebFlux and Kotlin.
After doing some digging I've discovered that the Spring for Apache Kafka has a simple wrapper designed around the Reactor Kafka implementation, but this wrapper lacks a lot of the functionalities present in the default KafkaTemplate mentioned before, things like: A Metrics Binder out of the box (for prometheus integration), associated factories, extensive documentation, Auto configuration, etc.
I'm trying to understand what I'm really giving up when using the default implementation in favor of the Reactive one. Am I giving up back pressure functionality? Am I sacrificing the Reactive Stack present in my application? Will this be a toll in the future? Does anyone has some experience in working with a Reactive Stack alongside a non-reactive solution?
I have, also, a few concerns regarding the DLT flow facilitated in the default implementation, things like the SeekToCurrentErrorHandler strategy

logstash vs spring cloud data flow, which one is suitable for data preprocessing?

I'm using spring boot along with elasticsearch to make a search system on my website.
I've some data that i need to push in elastic search, this data ( a product for example ) must be processed before ( passed to another micro-service that filters the JSON, adds some fields for a better search result do some calculations and return the object i want to store ). is it possible to do so with log stash, or do i need to use Spring Cloud Data Flow ? thanks in advance.
what i want to do:
save a product ( product service )
log the saved product or stream it.
process it before storage ( another service )
save the document ( elastic search server )
Thanks in advance.
Obviously it depends on various factors but I can try to provide some insights on Spring Cloud Data Flow from the technical standpoint.
If you want to construct a streaming pipeline where your filtering apps are connected via a messaging system that does this flow of data processing, you can checkout Spring Cloud Data Flow.
Spring Cloud Data Flow (and the underlying framework supports such as Spring Cloud Stream and Spring Cloud Task) provides the operational benefits over how you manage your streaming pipelines but it may not make sense if you don't need a data pipeline with a messaging system etc., In those cases, you would just stick to a simple Spring Boot app that does this whole filtering model. As soon as you start exploring the distribution of these applications loosely coupled via messaging system, Spring Cloud Data Flow would be handy.
Please checkout SCDF guide to understand some of the features and recipes to know more about what SCDF can offer and choose what fits in your case.

Azure alternative to spring cloud dataflow process

I'm looking for the azure alternative for the Data flow model of Data Source-processor-sink.
I want the three entities to be separate microservices. I want to use messaging as a link between these three.
Basically, Source app takes the data from another service and sends it to processor while processor app acts on it and sends relevant notification/alert to sink.
I'm aware I can use rabbitmq for the messaging but I need to know which one will be better in azure - service bus topics or eventhub? and how can I use them?
At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Unless we have this, the out-of-the-box or the custom apps cannot be built as a messaging-microservice app, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microserivces in to a data pipeline (i.e., source-processor-sink) via the DSL/Drag-and-Drop GUI.
Microsoft was exploring the binder implementation in the past; possibly it would end up in Azure Spring Boot project. Feel free to drop an issue on their backlog.

Spring-Kafka vs. Spring-Cloud-Stream (Kafka)

Using Kafka as a messaging system in a microservice architecture what are the benefits of using spring-kafka vs. spring-cloud-stream + spring-cloud-starter-stream-kafka ?
The spring cloud stream framework supports more messaging systems and has therefore a more modular design. But what about the functionality ? Is there a gap between the functionality of spring-kafka and spring-cloud-stream + spring-cloud-starter-stream-kafka ?
Which API is better designed?
Looking forward to read about your opinions
Spring Cloud Stream with kafka binder rely on Spring-kafka. So the former has all functionalities supported by later, but the former will be more heavyweight. Below are some points help you make the choice:
If you might change kafka into another message middleware in the future, then Spring Cloud stream should be your choice since it hides implementation details of kafka.
If you want to integrate other message middle with kafka, then you should go for Spring Cloud stream, since its selling point is to make such integration easy.
If you want to enjoy the simplicity and not accept performance overhead, then choose spring-kafka
If you plan to migrate to public cloud service such as AWS Kensis, Azure EventHub, then use spring cloud stream which is part of spring cloud family.
Use Spring Cloud Stream when you are creating a system where one channel is used for input does some processing and sends it to one output channel. In other words it is more of an RPC system to replace say RESTful API calls.
If you plan to do an event sourcing system, use Spring-Kafka where you can publish and subscribe to the same stream. This is something that Spring Cloud Stream does not allow you do do easily as it disallows the following
public interface EventStream {
String STREAM = "event_stream";
#Output(EventStream.STREAM)
MessageChannel publisher();
#Input(EventStream.STREAM)
SubscribableChannel stream();
}
A few things that Spring Cloud Stream helps you avoid doing are:
setting up the serializers and deserializers

Resources