Consume data from specified shard - spring

I'm trying to implement a Kinesis consumer in a Spring Boot application. The main problematic requirement for me is that I need to consume data from one specific shard only (I have its name). As I understand it, this isn't possible with spring-cloud-stream-binder-aws-kinesis, or am I missing something? Maybe it's possible to implement with the KCL? It looks possible to do with a custom client. Either way, I need to use SubscribeToShard somehow.
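For the custom-client route, here is a minimal sketch with the AWS SDK v2 KinesisAsyncClient, assuming you have already registered an enhanced fan-out consumer (SubscribeToShard requires a consumer ARN from RegisterStreamConsumer); the consumer ARN and shard ID below are placeholders:

```java
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.ShardIteratorType;
import software.amazon.awssdk.services.kinesis.model.StartingPosition;
import software.amazon.awssdk.services.kinesis.model.SubscribeToShardEvent;
import software.amazon.awssdk.services.kinesis.model.SubscribeToShardRequest;
import software.amazon.awssdk.services.kinesis.model.SubscribeToShardResponseHandler;

public class SingleShardConsumer {

    public static void main(String[] args) {
        KinesisAsyncClient client = KinesisAsyncClient.create();

        SubscribeToShardRequest request = SubscribeToShardRequest.builder()
                .consumerARN("arn:aws:kinesis:...:consumer/...") // placeholder: enhanced fan-out consumer ARN
                .shardId("shardId-000000000000")                 // placeholder: the shard you were given
                .startingPosition(StartingPosition.builder()
                        .type(ShardIteratorType.LATEST)
                        .build())
                .build();

        SubscribeToShardResponseHandler handler = SubscribeToShardResponseHandler.builder()
                .onError(t -> System.err.println("Subscription failed: " + t.getMessage()))
                .subscriber(event -> {
                    if (event instanceof SubscribeToShardEvent) {
                        ((SubscribeToShardEvent) event).records()
                                .forEach(r -> System.out.println(r.data().asUtf8String()));
                    }
                })
                .build();

        // Each subscription lasts up to 5 minutes; resubscribe in a loop for continuous consumption.
        client.subscribeToShard(request, handler).join();
    }
}
```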

Related

How can I retrieve Kafka messages inside a controller in Spring Boot?

The messages created by the producer are all being consumed as expected.
The thing is, I need to create an endpoint to retrieve the latest messages from the consumer.
Is there a way to do it?
Like an on-demand consumer?
I found this SO post, but it only covers consuming the last N records. I want to consume the latest without caring about offsets.
Spring Kafka Consumer, rewind consumer offset to go back 'n' records
I'm working with Kotlin but if you have the answer in Java I don't mind either.
There are several ways to create listener containers dynamically; you can then start/stop them on demand. To get the records back into the controller, you'd need to use something like a blocking queue, or make the controller itself a MessageListener (a sketch of the blocking-queue approach follows the links below).
These answers show a couple of techniques for creating containers on demand:
How to dynamically create multiple consumers in Spring Kafka
Kafka Consumer in spring can I re-assign partitions programmatically?
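As a rough illustration of the blocking-queue approach, a sketch that spins up a container per request and drains whatever arrives within a short window; the topic name and endpoint path are hypothetical, and it assumes auto.offset.reset=latest in the consumer configuration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LatestMessagesController {

    private final ConsumerFactory<String, String> consumerFactory;

    public LatestMessagesController(ConsumerFactory<String, String> consumerFactory) {
        this.consumerFactory = consumerFactory;
    }

    @GetMapping("/messages/latest") // hypothetical endpoint
    public List<String> latest() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        ContainerProperties props = new ContainerProperties("my-topic"); // hypothetical topic
        props.setGroupId("on-demand-" + System.nanoTime()); // fresh group, so 'latest' starts at the log end
        props.setMessageListener((MessageListener<String, String>) rec -> queue.add(rec.value()));

        KafkaMessageListenerContainer<String, String> container =
                new KafkaMessageListenerContainer<>(consumerFactory, props);
        container.start();
        try {
            List<String> results = new ArrayList<>();
            String value;
            // collect whatever arrives within a 2-second window
            while ((value = queue.poll(2, TimeUnit.SECONDS)) != null) {
                results.add(value);
            }
            return results;
        } finally {
            container.stop();
        }
    }
}
```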

How can I get built-in metrics and add custom metrics in Spring Boot Kafka Streams?

I'm having trouble adding custom metrics in Kafka Streams.
I made a Kafka Streams application with Spring Boot like this (Kafka Streams with Spring Boot, Baeldung) and deployed several instances of this app on k8s.
I want to know the average number of processed messages per second for each app instance, and it exists in the Kafka Streams built-in thread metrics (process-rate) (ref. Kafka Streams Metrics).
But that metric uses thread-id as a tag key, so each app instance ends up with different metric tag keys.
I'd like that metric value to appear under the same tag key on every app instance.
So I came up with a solution: use the built-in metric value to register a new custom metric.
But there's no specific information about how to read built-in metric values in source code and add a custom metric.
The reference describes a way to add custom metrics, but gives no specifics on how to apply it in source code.
Is there a way to solve this problem? Or is there any other way?
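As an illustration of the proposed solution, a hedged sketch: read the per-thread process-rate values from KafkaStreams#metrics() and expose their sum as a single Micrometer gauge with a stable name. The metric name app.stream.process.rate is hypothetical; the "stream-thread-metrics" group and "process-rate" name follow Kafka's documented thread metrics:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.apache.kafka.streams.KafkaStreams;

public class ProcessRateGauge {

    // Registers one gauge per app instance, summing process-rate over all stream threads.
    public static void bind(KafkaStreams streams, MeterRegistry registry) {
        Gauge.builder("app.stream.process.rate", streams, s -> // hypothetical metric name
                        s.metrics().entrySet().stream()
                                .filter(e -> "stream-thread-metrics".equals(e.getKey().group())
                                        && "process-rate".equals(e.getKey().name()))
                                .mapToDouble(e -> ((Number) e.getValue().metricValue()).doubleValue())
                                .sum())
                .description("Sum of per-thread process-rate for this instance")
                .register(registry);
    }
}
```

In a Spring Boot app, the KafkaStreams object can be obtained from the StreamsBuilderFactoryBean (getKafkaStreams()) once the streams are running; alternatively, a Micrometer MeterFilter could rewrite the thread-id tag instead.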

Message Aggregation using SQS and SpringBoot

I have a use case/situation wherein SQS (standard) will be flooded with messages (north of 500k+); a microservice (Spring Boot based) listens to these events, consumes them, and makes a REST API call (batch-based) to a 3rd-party SaaS system (have attached a high-level diagram for the same).
The limitation here is that the Spring Boot consumer can receive a max of 10 messages from SQS, transform the payload, and make the REST API call with these 10 messages (records).
Is there a way to aggregate these messages to, say, 100 messages before making the REST API call (assuming that the target SaaS system accepts 100 records of data)? Would Spring Batch help in this case?
Should I look at a different stack for this kind of need? Any help/guidance is much appreciated.
Thanks
What you are describing is actually the chunk-oriented processing model of Spring Batch: items could be read from the queue, accumulated in chunks of 100 items (that is the configurable chunk-size) and posted to your REST API in bulk mode.
Spring Batch handles the chunking of items (and much more) for you. So yes, even though I'm biased, I believe Spring Batch is a very good option for your use case.
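A minimal sketch of such a step using the Spring Batch 5 builder API; sqsItemReader and bulkApiWriter are hypothetical beans standing in for your SQS reader and the REST client that posts each chunk:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class AggregationStepConfig {

    @Bean
    public Step aggregateStep(JobRepository jobRepository,
                              PlatformTransactionManager txManager,
                              ItemReader<String> sqsItemReader,   // hypothetical bean: reads one SQS message at a time
                              ItemWriter<String> bulkApiWriter) { // hypothetical bean: POSTs the whole chunk in one call
        return new StepBuilder("aggregateStep", jobRepository)
                .<String, String>chunk(100, txManager) // the configurable chunk-size
                .reader(sqsItemReader)
                .writer(bulkApiWriter) // invoked with a Chunk of up to 100 items
                .build();
    }
}
```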
Maybe you should try the Aggregator from Spring Integration.

The Aggregator combines a group of related messages, by correlating and storing them until the group is deemed to be complete. At that point, the aggregator creates a single message by processing the whole group and sends the aggregated message as output.
https://docs.spring.io/spring-integration/reference/html/aggregator.html
And please refer to this GitHub repo for Spring Integration with AWS services:
https://github.com/spring-projects/spring-integration-aws/tree/main/src/test/java/org/springframework/integration/aws
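To make that concrete, a hedged sketch of an aggregator flow using the Spring Integration 6 Java DSL with spring-integration-aws; the queue name and SaasClient are hypothetical, and the channel adapter's exact constructor depends on your spring-integration-aws / AWS SDK version:

```java
import java.util.List;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.aws.inbound.SqsMessageDrivenChannelAdapter;
import org.springframework.integration.dsl.IntegrationFlow;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;

@Configuration
public class SqsAggregationFlowConfig {

    @Bean
    public IntegrationFlow sqsAggregationFlow(SqsAsyncClient sqs, SaasClient saasClient) {
        return IntegrationFlow
                .from(new SqsMessageDrivenChannelAdapter(sqs, "my-queue")) // hypothetical queue name
                .aggregate(a -> a
                        .correlationStrategy(m -> "batch")         // one rolling group for all messages
                        .releaseStrategy(g -> g.size() >= 100)     // release once 100 messages are collected
                        .groupTimeout(5_000)                       // ...or after 5s, whichever comes first
                        .sendPartialResultOnExpiry(true)
                        .expireGroupsUponCompletion(true))         // allow the next group to form
                .handle(message -> saasClient.postBatch((List<?>) message.getPayload())) // aggregated payload is a List
                .get();
    }

    // Hypothetical REST client for the downstream SaaS system.
    public interface SaasClient {
        void postBatch(List<?> records);
    }
}
```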
I'm assuming you have multiple instances of your application and can scale up easily if required (since you have 500k+ messages). But still, your application is prone to data loss, so building a reliable system is always challenging. Since you are already on the cloud, maybe you should think about utilizing different cloud services.
I think for your case you should have a look at Amazon Kinesis Data Streams and Kinesis Data Firehose.
You can refer to this:
https://aws.amazon.com/blogs/big-data/stream-data-to-an-http-endpoint-with-amazon-kinesis-data-firehose/

Development compromises in using Spring Cloud Stream

The case for event-driven microservices such as Spring Cloud Stream is their asynchronous nature, which I agree makes them more scalable.
But I have an issue regarding how to code it in a way where I don't lose certain key features that I have access to in synchronous services.
In a servlet-based MS, I make full use of servlet context variables and servlet-based Spring autowiring functions.
For example, I lean heavily on HTTP headers to carry metadata between microservices without having to impact the payload. But in Spring Cloud Stream using Kafka, Kafka doesn't support message headers of any kind! I lose that immediately if I use SCS. Putting them into the payload causes all sorts of changes in my model classes if I define the attributes explicitly. Yes, I can use a simple HashMap to simulate the HTTP header object, but it really seems like reinventing the wheel to me.
On the autowiring side: I maintain an audit log record per request, which I implement by declaring a request-scoped HashMap bean and autowiring it into any method in the servlet's call stack that needs to append data to the audit log. Basically it's just a global variable to hold some data within a single request. But in SCS, again, I lose that because bean scopes that rely on servlets are not available.
So far, there seems to be a lot of trade-offs that I have to make just to make Spring Cloud Stream work for me.
I thought about an alternative approach where I use SCS just to create an entry point, but the Source method would just take the event, use a Processor to construct an HTTP request, and send the request along to an HTTP endpoint. But why go through all that trouble then?
Hoping that some more experienced devs can shed some light on how they leverage SCS.
#feicipet Thanks for the detailed question. Let me try to address some of your concerns in the order you have listed them:
+1
+1
I am not sure why you are referring to it as servlet-based instead of Spring-based? Those are features provided by Spring, but read on. . .
Spring Cloud Stream doesn't use Kafka; the end user does, while Spring Cloud Stream provides the Kafka binder that allows Spring Cloud Stream to integrate with Kafka. Furthermore, while Kafka indeed did not support headers prior to version 0.11, Spring Cloud Stream always supported and will continue to support headers even with pre-0.11 Kafka, embedding them in the Message and then extracting them on the consumer side into proper Message headers, completely transparently to the end user. In other words, one could assume that Kafka supported headers simply by using Spring Cloud Stream. With Kafka 0.11+, headers are supported natively and we have adjusted to that with the same level of transparency.
So, you don't need to put anything in the payload. Just create an appropriate Message<payload, headers> and SCSt will take care of the rest regardless of the broker (Kafka, Rabbit, Foo etc.).
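For illustration, a minimal sketch of carrying metadata as message headers; the header names and the binding name are hypothetical, and StreamBridge is the newer functional-style API (older versions used Source.output().send(...) in the same way):

```java
import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.stereotype.Component;

@Component
public class OrderPublisher {

    private final StreamBridge streamBridge;

    public OrderPublisher(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    public void publish(String orderJson, String correlationId) {
        // Metadata travels as headers, so the payload/model classes stay untouched.
        Message<String> message = MessageBuilder
                .withPayload(orderJson)
                .setHeader("x-correlation-id", correlationId)   // hypothetical header
                .setHeader("x-source-service", "order-service") // hypothetical header
                .build();
        streamBridge.send("orders-out-0", message); // hypothetical binding name
    }
}
```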
Yes, you do, simply due to the fact that, as you alluded to earlier, SCSt promotes an asynchronous and stateless architecture. However, I do not agree that what you are trying to accomplish is impossible. Rather, it is accomplishable, just not the way you are describing; there are other ways to maintain context, and I would be more than glad to discuss it as a separate topic.
I would not call them trade-offs, rather a difference in architecture, which has its benefits, but it is not a one-size-fits-all architecture, and therefore its viability should be discussed within the context of a concrete use case.
+1. You don't have to separate it as Source and Processor. You can simply create a custom Source app with an exposed REST endpoint and custom processing logic. However, we are currently working on enhancements in the framework to ensure that you can do the same with the existing starter apps.
Obviously we have touched on many points here and some of them would probably need to be debated further, but I hope this clears up some of your concerns.
Cheers

Communication between Spring instances behind a load balancer

I have a few instances of Spring apps running behind a load balancer. I am using EHCache as a caching system on each of these instances.
Let's say I receive a request that is refreshing a part of the cache on one instance. I need a way to tell the other instances to refresh their cache (or to replicate it).
I'm more interested in a solution based on Spring and not just cache replication, because there are other scenarios similar to this one that require the same solution.
How can I achieve this?
There is no simple Spring solution for this; it depends on the requirements. You can use any kind of pub/sub, like a JMS topic, to notify your nodes. The problem with that approach is that you cannot guarantee consistency: the other nodes can still read the old data for a while. In my current project we use Redis. We configured it as a cache with Spring Data Redis, and there's no need to notify the other nodes since the cache is shared. In non-cache scenarios we also use Redis as a pub/sub service.
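If you stick with per-instance EHCache, a rough sketch of the notification approach with Spring Data Redis pub/sub; the channel name, cache name, and the convention of sending the evicted key as the message body are all assumptions:

```java
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.listener.ChannelTopic;
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

@Configuration
public class CacheSyncConfig {

    private static final String CHANNEL = "cache-evictions"; // hypothetical channel name

    // Every instance subscribes and evicts the key it receives.
    @Bean
    RedisMessageListenerContainer cacheSyncListener(RedisConnectionFactory factory,
                                                    CacheManager cacheManager) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(factory);
        container.addMessageListener((message, pattern) -> {
            String key = new String(message.getBody()); // message body = cache key to evict
            Cache cache = cacheManager.getCache("myCache"); // "myCache" is hypothetical
            if (cache != null) {
                cache.evict(key);
            }
        }, new ChannelTopic(CHANNEL));
        return container;
    }

    // The instance that refreshed its cache publishes the key it changed, e.g.:
    // stringRedisTemplate.convertAndSend("cache-evictions", key);
}
```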