Spring Batch remote partitioning how to shutdown slaves

I want to use Spring Batch remote partitioning to handle large workloads on the cloud, and spin up/shutdown VMs on demand.
However, when configuring the slave steps, I'm using the StepExecutionRequestHandler to handle the step requests from a JMS queue. Right now the application just hangs. How can I shut down the application after the queue is depleted?

How can I shut down the application after the queue is depleted?
In a remote partitioning setup, workers are listeners on a queue from which StepExecutionRequests arrive. The question is how a listener can know that the queue is depleted. This is a tricky design problem. There are some known solutions, like the "End-Of-Stream" message or "poison pill" record, but those are tricky too, since you have to make sure all listeners get one such message.
If you are using Spring Cloud Task to launch your workers, you can use the DeployerPartitionHandler which provides an elegant way to dynamically create workers on demand up to a maximum configurable number. You can find more details about it here: https://docs.spring.io/spring-cloud-task/docs/2.0.0.RELEASE/reference/htmlsingle/#batch-partitioning and an example in this github repo: https://github.com/mminella/scaling-demos/blob/master/partitioned-demo/src/main/java/io/spring/batch/partitiondemo/configuration/BatchConfiguration.java#L75
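For reference, here is a condensed sketch of the manager-side wiring, adapted from the Spring Cloud Task 2.x docs linked above; the artifact coordinates, step name, worker cap, and application name are placeholders:

```java
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;
import org.springframework.cloud.task.batch.partition.DeployerPartitionHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;

@Configuration
public class ManagerConfiguration {

    @Bean
    public DeployerPartitionHandler partitionHandler(TaskLauncher taskLauncher,
            JobExplorer jobExplorer, ResourceLoader resourceLoader) {
        // The worker app artifact launched on demand for each partition
        // (placeholder coordinates)
        Resource workerApp = resourceLoader.getResource(
                "maven://io.spring.batch:worker-app:1.0.0");

        // "workerStep" is the name of the step the worker should execute
        DeployerPartitionHandler handler = new DeployerPartitionHandler(
                taskLauncher, jobExplorer, workerApp, "workerStep");
        handler.setMaxWorkers(4);                      // cap on concurrent workers
        handler.setApplicationName("partitioned-job"); // used to name launched tasks
        return handler;
    }
}
```

Since workers are launched per partition and exit when their step completes, nothing hangs waiting on a queue.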
The icing on the cake is that this is based on Spring Cloud Deployer, which means you can use it on any cloud provider that implements the SCD SPI. Here is how to do it for:
Kubernetes: https://docs.spring.io/spring-cloud-task/docs/2.0.0.RELEASE/reference/htmlsingle/#_notes_on_developing_a_batch_partitioned_application_for_the_kubernetes_platform
Cloud Foundry: https://docs.spring.io/spring-cloud-task/docs/2.0.0.RELEASE/reference/htmlsingle/#_notes_on_developing_a_batch_partitioned_application_for_the_cloud_foundry_platform

Related

Create and cleanup instance specific rabbitMQ instances

I have a set of microservices using Spring Boot REST. These microservices will be deployed in an autoscaled and load-balanced environment. One of these services is responsible for managing the system's configuration. When the other microservices start up, they obtain the configuration from this service. If and when the configuration is updated, I need to inform all currently running microservice instances to update their cached configuration.
I am considering using RabbitMQ with a fanout exchange. In this solution, each instance at startup will create its queue and bind that queue to the exchange. When there is a configuration change, the configuration service will publish an update to all queues currently bound to that exchange.
However, as service instances are deleted, I cannot figure out how I would delete the queue specific to that instance. I googled but could not find a complete working example of a solution.
Any help or advice?
The idea and solution are correct. What you are missing is that those queues, created by your consumer services, can be declared with auto-delete=true: https://www.rabbitmq.com/queues.html. As long as your service is up, the queue is there as well. When you stop your service, its consumers are stopped and unsubscribed. At the moment the last consumer is unsubscribed, the queue is deleted from the broker.
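For illustration, a minimal Spring AMQP sketch of such a per-instance declaration; the exchange name config.updates is made up, and AnonymousQueue declares an exclusive, auto-delete queue with a broker-generated name:

```java
import org.springframework.amqp.core.AnonymousQueue;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.FanoutExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ConfigUpdateListenerConfig {

    @Bean
    public FanoutExchange configExchange() {
        return new FanoutExchange("config.updates"); // exchange name is a placeholder
    }

    @Bean
    public Queue instanceQueue() {
        // Exclusive, auto-delete queue with a broker-generated name;
        // it disappears when this instance's connection goes away
        return new AnonymousQueue();
    }

    @Bean
    public Binding binding(Queue instanceQueue, FanoutExchange configExchange) {
        return BindingBuilder.bind(instanceQueue).to(configExchange);
    }
}
```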
On the other hand, I would suggest looking into the Spring Cloud Bus project, which is aimed at exactly this kind of task: https://spring.io/projects/spring-cloud-bus.

How can I connect to multiple Rabbitmq nodes with Spring Rabbitmq?

I am writing a service with Spring and I am using Spring AMQP in order to connect to Rabbitmq.
I have two RabbitMQ clusters: one is only for publishing messages (the messages are sent to the other cluster via the federation plugin), and the other cluster is for declaring queues that end users will consume from.
The nodes sit behind an AWS load balancer; each cluster has its own LB.
I am using CachingConnectionFactory, RabbitTemplate, and RabbitAdmin in my code, and I want to have connections to all the nodes so I can use them.
For the cluster that will contain the queues, I added queue-master-locator=random to the config so new queues will be declared on all the nodes in the cluster, even if my service does not have a connection to them.
With the cluster that publishes messages I have more of a problem, because I need a direct connection from my service to each of the nodes so I will be able to spread the load between them.
So my problem is, how do I create connections in my service to all the nodes in the cluster so they will all be used for declaring queues and sending messages?
Then, once I have some sort of solution to this issue, the next issue will be what happens when a new node is added to the cluster. How can I create a connection to it and start using it as well?
I am using RabbitMQ 3.7.9, Spring 2.0.5, Spring AMQP 2.0.5.
Thanks a lot!
There is currently no mechanism to do anything like that.
By default, Spring AMQP opens only one connection (optionally two, one for publishing, one for consuming).
Even when using CacheMode.CONNECTION, you'll get a new connection for each consumer (and connections will be created and cached on demand for producers), but you won't get any control over which node each one connects to; that's a function of the LB.
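For reference, a minimal sketch of what CacheMode.CONNECTION looks like in configuration; the host name is a placeholder, and with an LB in front, node selection still happens at the LB:

```java
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory.CacheMode;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public CachingConnectionFactory connectionFactory() {
        // Host name is a placeholder; pointing it at the LB means the
        // broker node is still chosen by the LB, not by the framework
        CachingConnectionFactory cf = new CachingConnectionFactory("my-cluster-lb");
        // One connection per consumer; producer connections are created
        // and cached on demand
        cf.setCacheMode(CacheMode.CONNECTION);
        return cf;
    }
}
```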
The framework does provide the LocalizedQueueConnectionFactory which will try to consume from the node that hosts a queue, but it won't work with a load balancer in place.
In general, however, such optimization is rarely needed.
Are you trying to solve an actual problem you are experiencing now, or something that you perceive that might be a problem?
It is generally best not to perform premature optimization.

How to make Spring kafka client distributed

I have messages coming in from Kafka, so I am planning to write a listener with an onMessage method. I want to process the messages and push them into Solr.
So my question is more architectural: I have worked on web apps all my career, so in the big data world, how do I deploy the Spring Kafka listener so that I can process thousands of messages a second?
How do I make my Spring code use multiple nodes to distribute the load?
I am planning to write a Spring Boot application to run in a Tomcat container.
If you use the same group id for all instances, different partitions will be assigned to different consumers (instances of your application).
So, be sure that you have specified enough partitions in the topic you are going to consume from.
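For illustration, a minimal sketch of such a listener; the topic name, group id, and the Solr call are placeholders:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class SolrIndexingListener {

    // Every instance of the application uses the same group id, so the
    // broker assigns each instance a disjoint subset of the topic's partitions
    @KafkaListener(topics = "incoming-messages", groupId = "solr-indexer")
    public void onMessage(String message) {
        indexIntoSolr(message); // hypothetical helper that pushes the result into Solr
    }

    private void indexIntoSolr(String message) {
        // ... SolrJ indexing call would go here ...
    }
}
```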

SCDF: Can I use an outside microservice as a source?

I am trying to work through a solution where the workflow is like this:
User hits a microservice to upload images
That microservice de-duplicates the image and if it really is new, queues it up for processing
The processing chain lives in Spring Cloud Dataflow
The microservice already exists, and we are trying to extend it to do the fancy processing. My initial cut was to use the Http Source from the sample starter pack, since that was something I didn't have to create. The problem is that the source doesn't register itself with the Spring discovery server, so there is no way to get an endpoint without making gross assumptions (like that it lives on the dataflow server at port XYZ).
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.
The lifecycle of the server is decoupled from the apps it deploys, that was intentional.
I'm not following your thoughts on how Data Flow could connect the start of the queue, but from your description there are a few things you could do:
You would need to modify the app in order to have it registered with Eureka, but this is a very simple operation, no more than a few lines of code:
You can either start from a stream-app perspective (https://start-scs.cfapps.io/): select the http source and your binder, then add the spring-cloud-netflix library as well as @EnableDiscoveryClient on the main Boot class.
Or start with http://start.spring.io: select Stream Rabbit or Stream Kafka, add the Web and Netflix libraries, then add the @EnableDiscoveryClient and @EnableBinding annotations and create a simple HTTP endpoint for your use case.
In either case it should be a small addition.
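A rough sketch of the second option, assuming the classic annotation-based Spring Cloud Stream programming model with a Rabbit or Kafka binder on the classpath; the endpoint path and class name are made up:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@EnableDiscoveryClient       // registers the app with Eureka
@EnableBinding(Source.class) // binds an output channel to the middleware
@RestController
public class HttpSourceApplication {

    private final Source source;

    public HttpSourceApplication(Source source) {
        this.source = source;
    }

    @PostMapping("/images")
    public void accept(@RequestBody byte[] image) {
        // forward the upload onto the stream's output channel
        this.source.output().send(MessageBuilder.withPayload(image).build());
    }

    public static void main(String[] args) {
        SpringApplication.run(HttpSourceApplication.class, args);
    }
}
```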
You can also open an issue at https://github.com/spring-cloud-stream-app-starters/http/issues suggesting that we add @EnableDiscoveryClient to the http source app; we can take that into consideration in our next iteration as well.
I'll try to clarify a few bits.
upload images -> if it really is new -> queues it up for processing
Upon a new upload event, you'd want to process the image. Here's a similar use case, but more of a real-time streaming-style solution; it is not exactly what you're looking to do, but I thought it might be useful.
Porting the image-processing code to a Spring Cloud Stream application is as simple as adding @EnableBinding(Processor.class). It is the same business logic: whether you're running it separately or orchestrating it via SCDF, it is still a standalone microservice. However, SCDF expects it to be a Source, Processor, Sink, or Task application type. We will be opening this up to support arbitrary "functions" (lambdas) in a future release.
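For illustration, a hypothetical image-processor using that annotation; the processing method is a stand-in for your existing business logic:

```java
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@EnableBinding(Processor.class)
public class ImageProcessorConfiguration {

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public byte[] process(byte[] image) {
        // Same business logic whether it runs standalone or is deployed by SCDF
        return doFancyProcessing(image);
    }

    private byte[] doFancyProcessing(byte[] image) {
        return image; // placeholder for the real processing
    }
}
```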
We can create a Queue endpoint and send the data directly to a Queue source that receives the outside event and forwards it to an SCDF queue.
This is one of the standard solutions. You can directly consume new events (images) from a queue/topic and process them in the image-processor that we created in the previous step. The named-channel support in the DSL facilitates just that.
What would be awesome is if DataFlow could connect the start of the queue for me, without repackaging the microservice as a Source.
I'm not sure I understand this. If I were to guess, you're looking for a "named channel" as the source, and that is supported.
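For illustration, a hypothetical stream definition that consumes from a named destination; the destination and app names are made up:

```
stream create image-pipeline --definition ":image-uploads > image-processor | log"
stream deploy image-pipeline
```

Anything the existing microservice publishes to the image-uploads destination then flows into the processor, without repackaging the microservice as a Source.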
The major issue with Spring Cloud Data Flow is that it does not automatically start up deployed streams when the server starts up, and we need to be reasonably sure that the microservice is always up.
The moment you deploy a stream in SCDF, all the individual steps included in the DSL (i.e., the stream definition) are resolved and deployed as standalone apps in the target runtime (Cloud Foundry, Kubernetes, etc.). Once deployed, lifecycle management is left to the platform where the apps run; SCDF does not retain or track the app states.

How to create physical queue in JMS at run time

I want to know how to create a physical queue in JMS at run time.
When I searched for this, I found Creating JMS Queues at runtime.
But when I read http://activemq.apache.org/how-do-i-create-new-destinations.html, I learned that the approach mentioned in Creating JMS Queues at runtime does not create a physical queue on the server side.
Please correct me if I'm wrong. If anyone knows how to create a physical queue at run time, please reply.
Thanks in advance.
The creation of "normal" queues is not addressed by the JMS standard. Depending on what you want to do, there are two approaches:
use temporary queues -> however, they have many restrictions; most commonly they are used for request-reply scenarios
use the API of the JMS provider -> however, your solution will then depend on that specific provider
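For contrast, the temporary-queue route is plain JMS; a minimal ActiveMQ-flavored sketch (the broker URL is a placeholder):

```java
import javax.jms.Connection;
import javax.jms.Session;
import javax.jms.TemporaryQueue;

import org.apache.activemq.ActiveMQConnectionFactory;

public class TempQueueExample {

    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Lives only as long as the connection; typically used as a
        // reply-to destination in request-reply scenarios
        TemporaryQueue replyQueue = session.createTemporaryQueue();

        connection.close(); // the temporary queue is deleted with the connection
    }
}
```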
The JMS standard only addresses sending and receiving data from objects like queues and topics. Creation of JMS artefacts is vendor-specific and most often requires using:
1) specific vendor APIs (not JMS)
2) command/admin messages aimed at the JMS server (command agents on ActiveMQ)
3) the JMX API
I have used the JMX method, which is the most powerful, but also the most work.
JMX Method for ActiveMQ (version 5.0+)
a) JMS Server Setup
1) Enable JMX in the ActiveMQ startup scripts and activemq.xml files
2) If you are authenticating to the server, make sure your user has admin privileges set up in activemq.xml (see http://activemq.apache.org/security.html)
3) Restart the ActiveMQ server
b) Your Client Code
1) Create an instance of org.apache.activemq.broker.jmx.BrokerViewMBean (you will need to connect with some JMX connectivity code, which is a bit messy)
2) Use its addQueue method, which will create a queue on the server
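A hedged sketch of those two steps; the JMX URL, broker object name, and queue name are placeholders to adjust for your broker:

```java
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import org.apache.activemq.broker.jmx.BrokerViewMBean;

public class RuntimeQueueCreator {

    public static void main(String[] args) throws Exception {
        // Default ActiveMQ JMX URL; adjust host/port for your broker
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();

            // Object name for ActiveMQ 5.8+; older 5.x versions use
            // "org.apache.activemq:BrokerName=localhost,Type=Broker"
            ObjectName brokerName = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=localhost");

            BrokerViewMBean broker = MBeanServerInvocationHandler.newProxyInstance(
                    connection, brokerName, BrokerViewMBean.class, true);

            // Creates the physical queue on the broker
            broker.addQueue("my.new.queue");
        }
    }
}
```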
(The process is similar for HornetQ, but since you mentioned ActiveMQ I have omitted the HornetQ details here.)
I have used this method myself and it works.
An alternative is to use Command Agents in ActiveMQ, but I have no personal experience with these. They are special messages that contain admin commands and may do what you want as well.
