Azure Cloud Service - auto scale by queue count includes invisible messages (not ready for processing)

We just developed a system that integrates an Azure queue with an Azure Cloud Service to process batch items. One requirement was the ability to schedule items for processing in the future. For example, we batch an item now but tell it not to start for 5 hours.
This is built right into Azure queues via AddMessage's initialVisibilityDelay parameter, so we did not see it as an issue. However, we just noticed that when we add auto scale to our Cloud Service, it goes off the total number of items in the queue. In our situation we added 100,000 queue items to be processed 5 days from now, yet auto scale treats those 100,000 as if they were ready to go right now.
So we would basically have dozens of instances of our app running for 5 days before these messages can even be processed.
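For reference, the delayed enqueue looks roughly like this; the question uses the .NET AddMessage overload, but the legacy azure-storage Java SDK exposes the same parameters (the queue name and connection-string variable below are placeholders):
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class DelayedEnqueue {
    public static void main(String[] args) throws Exception {
        CloudQueue queue = CloudStorageAccount
                .parse(System.getenv("STORAGE_CONNECTION_STRING"))
                .createCloudQueueClient()
                .getQueueReference("batch-items");
        // time-to-live 0 = service default (7 days); the message stays invisible for 5 hours
        queue.addMessage(new CloudQueueMessage("batch item payload"), 0, 5 * 3600, null, null);
    }
}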
I feel like there is something simple we are missing here.
Any feedback would be very helpful.
Anthony

Have you considered using one queue for the waiting messages and another queue for the actual messages to be processed and scaling on that latter queue?
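A rough sketch of that split, again assuming the legacy azure-storage Java SDK and placeholder queue names: producers put delayed messages into a "scheduled-items" queue using initialVisibilityDelay, a small dispatcher copies each message into "ready-items" once it becomes visible, and the auto scale rule is pointed at "ready-items" only:
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class ScheduledQueueDispatcher {
    public static void main(String[] args) throws Exception {
        CloudQueueClient client = CloudStorageAccount
                .parse(System.getenv("STORAGE_CONNECTION_STRING"))
                .createCloudQueueClient();
        CloudQueue scheduled = client.getQueueReference("scheduled-items");
        CloudQueue ready = client.getQueueReference("ready-items");

        while (true) {
            CloudQueueMessage msg = scheduled.retrieveMessage();
            if (msg == null) {
                Thread.sleep(5_000);               // nothing visible yet
                continue;
            }
            // The message is now due: hand it to the queue the workers scale on.
            ready.addMessage(new CloudQueueMessage(msg.getMessageContentAsString()));
            scheduled.deleteMessage(msg);
        }
    }
}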

Related

How to design notification system that sends real-time alerts created by users

I've been thinking about how to design a system that supports user-created scheduled alerts. My problem is that once the alerts are created and inserted into a database, I don't know the best way to go about scheduling them. Polling the database to see which alerts need to go out next doesn't seem entirely right to me.
What are some ways this could be handled at a scale where, say, a million users could each create their own custom alerts, like "change baby diaper at 3pm every day"?
This problem is well suited to cloud platforms. For example, you could use GCP Cloud Scheduler to invoke a cloud function when the alert is due to be sent. The cloud function then calls some API to alert the user.
If cloud platforms are not an option, you could have your application spawn a new thread when an alert is created, and sleep that thread for a certain duration. When it wakes up, it sends the alert. Less elegant and less scalable than the first solution, but it would still work.
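A minimal sketch of that second approach; sendAlert(...) is a placeholder for whatever notification call the application actually makes:
import java.time.Duration;
import java.time.Instant;

public class AlertScheduler {
    // Spawn a thread per alert and sleep it until the alert is due.
    public void schedule(String alertMessage, Instant dueAt) {
        Thread worker = new Thread(() -> {
            try {
                long millis = Duration.between(Instant.now(), dueAt).toMillis();
                if (millis > 0) {
                    Thread.sleep(millis);           // wait until the alert is due
                }
                sendAlert(alertMessage);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutting down, drop the alert
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    private void sendAlert(String alertMessage) {
        // placeholder for the real notification call (push, SMS, email, ...)
        System.out.println("ALERT: " + alertMessage);
    }
}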

Load balance Kafka consumer multiple instances

I have a consumer that reads messages and writes them to a time-series database. We have multiple instances of the time-series database running as a cluster on multiple physical machines.
Our plan is to deploy the consumer on Kubernetes so that I can scale out to more instances if needed, load-balanced, with all of them pointing to the single time-series service that is running.
Now the issue I'm running into is that if I have 5 instances consuming the same topic, they all work individually, meaning each of them gets the message payload and saves it just like a single instance would.
What we want is:
if one consumer is busy, the message should go to the next free instance, rather than being consumed by every running instance. By scaling or load-balancing I mean behaviour like a normal load-balanced application, the way a Spring Boot app works when you scale it on Kubernetes.
So is there any way to make the consumers load-balance, so that each message is processed only once, whether it is consumed by the 1st, 2nd, or 3rd instance, like a normal app behind a load balancer?
If anyone has ideas about this: how is it going to behave, and what kind of output are we going to get, if we do this with a Kafka Spring Boot application?

What are the retry settings for a Pub/Sub subscriber and how do I set them correctly in a Spring application?

I have a Spring service subscribing to messages from a topic in Google Cloud Pub/Sub (pull). It is working correctly in general, but I want more control over resent messages. My service sometimes needs to nack a message, or just let the ackDeadline pass, so that I get the message again later. While testing with single messages, a nacked message comes back to me almost immediately, and the ones I don't ack or nack at all come back after the 10-second default ackDeadline. I would like to postpone the repeated consumption of these messages, and I thought the retry settings were designed for such cases.
I should mention as well that I am currently testing locally with an emulator and create the subscription from code. I am using PubSubAdmin for management.
Following this documentation, I have tried to set these configuration values in my profile config, like this:
spring.cloud.gcp.pubsub.subscriber.retry.initial-retry-delay-seconds: 4
spring.cloud.gcp.pubsub.subscriber.retry.max-attempts: 5
spring.cloud.gcp.pubsub.subscriber.retry.initial-rpc-timeout-seconds: 4
spring.cloud.gcp.pubsub.subscriber.retry.max-rpc-timeout-seconds: 8
spring.cloud.gcp.pubsub.subscriber.retry.max-retry-delay-seconds: 7
spring.cloud.gcp.pubsub.subscriber.retry.total-timeout-seconds: 3000
but it had no effect on when the messages come back.
Do I misunderstand the meaning of the retry settings? Maybe they only take effect when there are connection problems, but not for nacked or unacknowledged messages? Or do I have to set them when creating the subscriptions via Deployment Manager, and am not allowed to set them from code? Or maybe setting them in (development) profile configs won't work with PubSubAdmin?
Thanks for any suggestions!
edit: I want the first retry to happen after 5 seconds, but the next retry 10 seconds later, and so on. I also want to set the maximum number of retries. So what I am not interested in is simply setting the ackDeadline to a bigger number.
edit 2: why nacking: one of the services (let's call it a bridge) subscribes to the messages, has to validate each message and, if it is OK, pass it on to another external system. This service acts as a bridge to that system, as we can't work on the second system directly. In some cases a message needs some extra information, so the bridge tries to fetch it from somewhere else (there are a lot of microservices involved), and it sometimes happens that at that moment the extra information is not there (yet). So the first idea was to not ack the message and let it come back later. But I don't want to ask every 10 seconds for the next 7 days (with the ackDeadline); I want to try just a few times, and if the information is not there after 2 hours, it never will be. So we tried to nack and hoped the retry settings could help manage the resending. But as they don't, I suppose the only way to go will be to build something in the bridge to manage these messages myself. Maybe store message IDs and the number of retries, so that I can ack after, for example, 5 attempts and push the message to another topic to deal with it differently. Or are there any better solutions known?
Cloud Pub/Sub does not provide exponential backoff for specific messages. A nack has no effect other than to tell Cloud Pub/Sub that you were not able to handle the message.
I could provide a more useful answer if you were to document why you needed to nack the messages. If you are unable to handle the current load, you can use the flow control options described here to reduce the number of outstanding messages or bytes to your client. If you have messages that are known to be bad, you should instead ack them after pushing to another dead letter topic to be handled separately.
Response to edit 2:
If you have this scenario where the action to supplement the messages can fail, implement whatever backoff mechanism you want on that action yourself in your service. Set the max ack extension period when constructing your subscriber (setMaxAckExtensionPeriod in Java) to ensure that your client will keep extending the ack deadline for each message long enough for your chain of retries.
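Roughly what that looks like with the google-cloud-pubsub Java client; the project and subscription names are placeholders, and depending on the client version the Duration type is org.threeten.bp.Duration or java.time.Duration:
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import org.threeten.bp.Duration;

public class LongRetrySubscriber {
    public static void main(String[] args) {
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "my-subscription");

        Subscriber subscriber = Subscriber.newBuilder(subscription, (message, consumer) -> {
                    // enrich/validate the message here with your own backoff;
                    // ack once it succeeds, or after you give up and re-publish it
                    consumer.ack();
                })
                // keep extending the ack deadline for up to 2 hours per message
                .setMaxAckExtensionPeriod(Duration.ofHours(2))
                .build();
        subscriber.startAsync().awaitRunning();
    }
}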
Edit 2
Note that Pub/Sub now has built in support for Dead Lettering.
You can use PubSubSubscriberTemplate.modifyAckDeadline() to programmatically extend the deadlines of a batch of messages retrieved through pull. Each individual AcknowledgeablePubsubMessage also has a modifyAckDeadline() method, if you only need to extend the deadline for a select few stragglers.
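For example, a hypothetical helper along these lines (shown with the org.springframework.cloud.gcp import paths; newer releases moved to com.google.cloud.spring, and the subscription name and deadlines are placeholders):
import org.springframework.cloud.gcp.pubsub.core.subscriber.PubSubSubscriberTemplate;
import org.springframework.cloud.gcp.pubsub.support.AcknowledgeablePubsubMessage;
import java.util.List;

public class DeadlineExtender {
    private final PubSubSubscriberTemplate subscriberTemplate;

    public DeadlineExtender(PubSubSubscriberTemplate subscriberTemplate) {
        this.subscriberTemplate = subscriberTemplate;
    }

    public List<AcknowledgeablePubsubMessage> pullWithExtraTime() {
        // synchronously pull up to 10 messages from a placeholder subscription
        List<AcknowledgeablePubsubMessage> batch =
                subscriberTemplate.pull("my-subscription", 10, true);
        // give the whole batch an extra 60 seconds before redelivery...
        subscriberTemplate.modifyAckDeadline(batch, 60);
        // ...or extend just one straggler: batch.get(0).modifyAckDeadline(120);
        return batch;
    }
}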
If all messages on that particular subscription need to have a longer acknowledgement period, a default can be set in GCP Console by editing the subscription and updating the "Acknowledgement Deadline" field.

Azure Queue delayed message

I'm seeing some strange behaviour with Azure queue messages on a production deployment:
Some of the messages in the queues appear with a big delay: minutes, and sometimes 10 minutes.
Before you ask about setting a delayTimeout when we put the message on the queue: we do not set a delayTimeout for these messages, so each message should appear almost immediately after it is placed in the queue.
At those moments we do not have a big load, so my instances have no workload and are able to process messages fast, but the messages just don't appear.
Our service processes millions of messages per month, and we have been able to identify 10-50 messages processed with a very big delay, which makes us miss our SLA with our customers.
Does anyone have any idea what the reason could be?
How can we overcome it?
Has anyone faced similar issues?
Some general ideas for troubleshooting:
Are you certain that the message was queued up for processing, i.e. the queue.AddMessage operation returned successfully and then you were waiting 10 minutes? That means you can rule out any client-side retry policies etc. as the cause of the problem.
Is there any chance that the time calculation could be subject to some kind of clock skew problem? E.g. if one of the worker roles pulling messages has its clock out of sync with the other worker roles, you could see this.
Is it possible that in the situations where the message appears to be delayed, a worker role responsible for pulling the messages is actually failing or crashing? If the client calls GetMessage but does not respond with an appropriate acknowledgement within the time specified by the invisibilityTimeout setting, then the message will become visible again, as the Queue Service assumes the client did not process the message. You could tell if this was a contributing factor by looking at the dequeue count on the messages that are taking longer. More information can be found here: http://msdn.microsoft.com/en-us/library/dd179474.aspx.
Is it possible that the number of workers you have pulling items from the queue is insufficient at certain times of the day, and the delays are simply caused by the queue being populated faster than you can pull messages from it?
Have you enabled logging for queues and then looked to see if you can find the specific operations (look at E2ELatency and ServerLatency)?
http://blogs.msdn.com/b/windowsazurestorage/archive/tags/analytics+2d00+logging+_2600_amp_3b00_+metrics/. You should also enable client logging and try to determine if the client is having connectivity problems and the retry logic is possibly kicking in.
And finally if none of these appear to help can you please send me the server logs (and ideally the client side logs as well) along with your account information (no passwords) to JAHOGG at Microsoft dot com.
Jason
Azure Service Bus has a property on the BrokeredMessage class called ScheduledEnqueueTimeUtc; it allows you to set a time for when the message is made available on the queue (effectively creating a delay).
Are you sure that your code is not setting this property? That might be the cause of the delay.
You can find more info on this at this url: https://www.amido.com/azure-service-bus-how-to-delay-a-message-being-sent-to-the-queue/
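For illustration, this is the Java-client equivalent of setting that property (BrokeredMessage itself is the .NET class; this fragment is only a sketch of what to look for in your own code):
import com.microsoft.azure.servicebus.Message;
import java.time.Instant;

public class ScheduledSendExample {
    public static void main(String[] args) {
        Message delayed = new Message("payload");
        // Service Bus will hold this message back for 10 minutes before delivering it,
        // which is exactly the kind of delay described in the question.
        delayed.setScheduledEnqueueTimeUtc(Instant.now().plusSeconds(600));
        // sending "delayed" through a QueueClient would then deliver it 10 minutes later
    }
}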
If you are using WebJobs to process messages from the queue, it can be due to WebJobs configuration.
From an MSDN forum post by pranav rastogi:
Starting with 0.4.0-beta, the (WebJobs) SDK implements a random exponential back-off algorithm. As a result of this if there are no messages on the queue, the SDK will back off and start polling less frequently.
The following setting allows you to configure this behavior.
MaxPollingInterval: when a queue remains empty, this is the longest period of time to wait before checking for a message again. The default is 10 minutes.
using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        JobHostConfiguration config = new JobHostConfiguration();
        // poll an empty queue at least once a minute instead of the 10-minute default
        config.Queues.MaxPollingInterval = TimeSpan.FromMinutes(1);
        JobHost host = new JobHost(config);
        host.RunAndBlock();
    }
}

Message queue that delivers delayed messages and checks for uniqueness

I have an online service that receives incoming events (a few every second). The service needs to run a job when there have been no events for 30 seconds or more. The service is distributed across several PCs and uses Amazon Web Services (SQS and SimpleDB) as a backbone.
I understand how I can schedule a job when there IS an incoming event (just put a message into the message queue and you are done), but how can I schedule a job when the condition is "NO EVENTS FOR X SECONDS"?
Ideally I would want a message queue that does not allow duplicate messages, allows scheduling for the future, and allows adjusting the "delivery date" on each message.
Is there such a message queue implementation?
Can this problem be solved at all without persisting some data in a database?
Thank you
Both BizTalk and SQL Server Service Broker fit your requirements. If they are too heavyweight, you could write a simple service that peeks at the queue every couple of seconds and times out if it does not see anything for 30 seconds. That would be more difficult to scale horizontally across machines, however.
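A rough sketch of such a polling service against SQS with the AWS SDK for Java v1; the queue URL and runIdleJob() are placeholders, and message deletion and coordination across several machines are left out:
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class IdleWatcher {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/events";
        long lastEventMillis = System.currentTimeMillis();

        while (true) {
            int received = sqs.receiveMessage(
                            new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(2))
                    .getMessages().size();
            if (received > 0) {
                lastEventMillis = System.currentTimeMillis();   // activity seen, reset the timer
            } else if (System.currentTimeMillis() - lastEventMillis >= 30_000) {
                runIdleJob();                                   // 30 s of silence: fire the job
                lastEventMillis = System.currentTimeMillis();
            }
        }
    }

    private static void runIdleJob() {
        System.out.println("No events for 30 seconds, running job");
    }
}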
