I'm using ActiveMQ version 5.15.10 and the broker gets blocked from time to time for 10 to 15 minutes - jms

I have one k8s environment with a single broker and multiple producers and consumers. Producers are grouped by customer and each customer has 3 to 8 producers, all of them sharing the same Group ID. A group reset is done by one of the producers roughly every 2 minutes.
On the other side of the broker I have consumers dynamically created and destroyed depending on the size of the broker queue; there can be from 2 to 4 consumers.
About 4-5 times per week I find cases in which the broker stops sending messages to consumers for 10 to 15 minutes. This seems to coincide with moments in which one or two consumers are terminated because of a spot reclaim. Some new consumers are created by k8s, and in some cases they even start getting a few messages, but then the broker stops sending messages for 10 to 15 minutes.
I have tried several things. I increased the interval at which the group reset is done: before it was done every few seconds and now it is every 120 seconds. Initially it looked like this was helping, but after several weeks I'm not so sure.
I have tried to reproduce the problem in a dev environment with no luck.
One of my guesses was that it is related to the prefetchSize, and I want to try setting it to 1, but it will take some time to be able to test this in production.
I have configured these values:
<policyEntry queue=">" consumersBeforeDispatchStarts="2" timeBeforeDispatchStarts="30000"/>
Could it be a random bug counting the consumers in the broker?
The prefetchSize is right now 1000; could the problem be the messages that were dispatched to the terminated consumers but never processed?
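If lowering the prefetch turns out to be worth testing, it can be done from the consumer side without changing the broker configuration. A minimal sketch, assuming a plain JMS consumer built on the ActiveMQ 5.x client (the broker URL and queue name are placeholders):

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class LowPrefetchConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL; replace with your own.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker:61616");
        // Only one unacknowledged message is dispatched to this consumer at a time,
        // so a terminated consumer cannot sit on a large prefetched backlog.
        factory.getPrefetchPolicy().setQueuePrefetch(1);

        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("MY.QUEUE"));
        // ... receive and process messages ...
    }
}

The same effect can be had by appending jms.prefetchPolicy.queuePrefetch=1 to the connection URI instead of calling the setter.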

Related

What can cause a Cloud Run instance to not be reused despite continuous load?

Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput/ decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my cloud run app.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a spring-boot app to cloud run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold-start), handle a single request, die, and then get started again. They are not being reused as described in the docs, and as is happening when I set max-instances to 1. Official docs on concurrency
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances is fluctuating wildly, once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, in the private Alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, and so on.
The trade-off is that you will have more cold starts than your max instances value. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue because Cloud Run has a large free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of Cloud Run logs into BigQuery and then querying them), even if there are more instances up than your max instances, only your max instances are active at the same time. To put it with your parameters: if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
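For reference, a sketch of such a log sink, assuming a BigQuery dataset named cloud_run_logs already exists in the project (the sink and dataset names are placeholders):

gcloud logging sinks create cloud-run-sink \
  bigquery.googleapis.com/projects/my-project/datasets/cloud_run_logs \
  --log-filter='resource.type="cloud_run_revision"'

The sink's writer identity then needs write access on the dataset before rows start flowing.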
This part is not documented because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and will try to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response. This is done in order to adapt to a possible higher number of network requests beyond what it is currently serving, while attempting to take into consideration the length of time it will take for the existing instances to finish handling requests already in flight. Per the documentation, it is possible for the number of container instances to go above the max instances value when traffic spikes.
You mentioned with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing behavior of 429s as well as the instances spiking could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting that, because of the cold start time you mention, when instances are serving the first request(s), by design the number of concurrent requests is hard-set to 1. Only once things are fully ready is the concurrency setting you have chosen applied.
Was there some specific reason you chose 3 and 3 for max instances and concurrency? Also, how was the concurrency set when you had max instances set to 1? Perhaps you could try raising the concurrency further (max 80) and/or max instances (upper limit of 1000) and see if that removes the 429s.
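If you want to experiment without a full redeploy, updating the existing service should be enough. A sketch with illustrative values only (same service, project and region assumed; depending on your gcloud version you may still need the beta component):

gcloud run services update my-service --project=[...] --region=[...] --platform=managed --concurrency=80 --max-instances=10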

Creation of parallel threads for bulk request handling?

I have a REST service and want to handle almost 100 requests in parallel. I set the number of threads and the number of connections to create to 100 in my application.yml, but I still do not see 100 connections created to handle requests.
Here is what I did in my application.yml:
server.tomcat.max-threads=100
server.tomcat.max-connections=100
I am using YourKit to see the internals, but at startup only 10 connections are created to handle requests. Even when I send multiple requests, the count of request-handling threads does not increase; it remains at 10. See the attachment I took from YourKit.
You're setting max threads. Not minimum threads. Tomcat in this case has decided the minimum should be 10.
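Tomcat grows the pool on demand up to the configured maximum, so under sustained concurrent load the thread count will rise above 10 on its own. If you want more threads kept warm up front, raise the minimum as well; assuming a pre-2.3 Spring Boot version (which the server.tomcat.max-threads name suggests), the property would be:

server.tomcat.min-spare-threads=100

On Spring Boot 2.3 and later the equivalent properties are server.tomcat.threads.max and server.tomcat.threads.min-spare.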

Spring Boot Kafka: Commit cannot be completed since the group has already rebalanced

Today, in my single-instance Spring Boot Kafka application, I faced the following issue:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot
be completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between
subsequent calls to poll() was longer than the configured
max.poll.interval.ms, which typically implies that the poll loop is
spending too much time message processing. You can address this either
by increasing the session timeout or by reducing the maximum size of
batches returned in poll() with max.poll.records.
What may be the reason for this and how can I fix it? As far as I understand, my consumer was blocked for a long time and didn't respond to the heartbeat, and I should adjust Kafka properties in order to address it. Could you please tell me exactly which properties I should adjust, and where: on the Kafka side or on the Spring Kafka side of my application?
By default Kafka will return a batch of records of at least fetch.min.bytes (default 1), up to either max.poll.records (default 500) or fetch.max.bytes (default 52428800); otherwise it will wait up to fetch.max.wait.ms (default 500) before returning a batch of data. Your consumer is expected to do some work on that data and then call poll() again. Your consumer's work is expected to be completed within max.poll.interval.ms (default 300000, i.e. 5 minutes). If poll() is not called before expiration of this timeout, the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.
So to fix your issue, reduce the number of records returned per poll with max.poll.records, or increase the max.poll.interval.ms property, so that the consumer does not time out and trigger a rebalance. Both are consumer settings, so they go on your application (Spring Kafka) side rather than on the broker.
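A minimal sketch of passing those two settings into a Spring Kafka consumer configuration (the values are illustrative only; tune them to your actual processing time):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ConsumerTuning {
    // Extra consumer properties to merge into your ConsumerFactory configuration.
    public static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        // Fewer records per poll() means less work between consecutive polls.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);
        // Allow up to 10 minutes of processing between polls before the
        // consumer is considered failed and a rebalance is triggered.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000);
        return props;
    }
}

If you configure the consumer through Spring Boot properties instead, spring.kafka.consumer.max-poll-records and spring.kafka.consumer.properties.max.poll.interval.ms map to the same settings.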

In Spring how to configure a queue with multiple conditions

I have configured a queue in my XML file in such a way that it is populated after processing records. I need to add a condition to the queue so that it becomes active only when the queue depth reaches 1000 or a delay time is reached. Is that possible? I am not using JMS.

Laravel 4 Queues, how to do multithreading with queue:listen?

I need asynchronous, quick processing of everything in the queue. The jobs consist of cURL requests, so it takes forever doing them one by one (they're basically the same as sleep(3)). I'd like all messages in the queue to run at the same time, or at least set a limit like 50. The reason I'm using a queue for this, and not just running them instantly, is that I need to make sure that if anything fails, it tries again.
Use the queue with iron.io IronMQ push queues; the queue shouldn't fail, but in the unlikely event it does, there is a log.
See this link for reference http://blog.iron.io/2013/05/laravel-4-ironmq-push-queues-insane.html
From memory, you get 10 million requests free per month with IronMQ.
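A sketch of the push-queue wiring in Laravel 4, assuming an IronMQ queue named curl-jobs and a publicly reachable endpoint (both names are placeholders). Because IronMQ pushes each message to your endpoint as it arrives, the jobs are handled in parallel by your web server instead of one by one by queue:listen.

php artisan queue:subscribe curl-jobs http://your-app.example.com/queue/receive

// routes.php: marshal each pushed message into the normal queue job handler.
Route::post('queue/receive', function () {
    return Queue::marshal();
});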
