queue.size in Sidekiq is not updated in real time - ruby

I'm using Sidekiq with ActiveJob. I want to balance the queues, so I use this approach:
while queue.size < 10
  # This should add one job to the queue right away, but it doesn't:
  # it takes some time for the job to enter the queue.
  SomeJob.perform_later(some_args)
end
This fails badly: it schedules 50, 60 or more jobs. The cause is that jobs do not show up in the queue immediately; queue.size returns 0 for a few seconds before it reflects the real queue size.
UPDATE:
I found the issue. It turns out the class I use to schedule the jobs is a configured one; at some point the configuration applied SomeJob.set(wait: wait_time), and wait_time was 0. ActiveJob puts such a job into the scheduled set for a short time (less than a second or so) before it enters the queue. This is why queue.size didn't reflect what I expected to be in the queue.
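For anyone hitting the same thing: you can see where the jobs actually sit during that window by inspecting both the queue and the scheduled set through Sidekiq's API. A minimal sketch, assuming the queue is named 'default' (the name is an assumption):

require 'sidekiq/api'

queue = Sidekiq::Queue.new('default')  # jobs that are ready to be worked
scheduled = Sidekiq::ScheduledSet.new  # jobs waiting for their scheduled time

puts "enqueued: #{queue.size}, scheduled: #{scheduled.size}"
# A job enqueued via SomeJob.set(wait: 0) briefly appears in the scheduled
# set before Sidekiq's poller moves it into the queue.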

This is happening because queue is already initialized, and you're not reinitializing a new queue object every time a job is enqueued. It won't "update in real time" as you say (similar to how you'd have to call #reload on an ActiveRecord object).
More efficient than reinitializing, same effect:
size = queue.size
max_queue_size = 10
# Clamp at zero so we never try to enqueue a negative number of jobs.
number_of_jobs_to_perform = [max_queue_size - size, 0].max
number_of_jobs_to_perform.times do
  SomeJob.perform_later(args)
end
Edit: if you really must re-check the live size on every iteration, wrap it in a proc and call it, e.g. Proc.new { queue.size }.call.times do ...
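For illustration, a minimal sketch of that proc-based re-check against a live Sidekiq queue (the 'default' queue name and the target size of 10 are assumptions):

require 'sidekiq/api'

current_size = Proc.new { Sidekiq::Queue.new('default').size }
while current_size.call < 10
  SomeJob.perform_later(some_args)
end
# Note: the enqueue-to-visible delay described in the question still applies,
# so this loop can overshoot; the proc only avoids caching a stale size.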

Related

Lambda SQS integration: Batch Size vs MaxBatchingWindow

I'm integrating a Lambda function with a standard queue in SQS.
I came across the two parameters batchSize and maxBatchingWindow. My original thinking was that either condition, the number of messages in the queue reaching batchSize, or maxBatchingWindow seconds elapsing since the first message came in, would trigger the Lambda. In other words, whichever condition is satisfied first invokes the Lambda. I couldn't find enough clarification about these two parameters in the documentation.
As a result, I ran an experiment, setting batchSize = 3 and maxBatchingWindow = 300 seconds while setting reservedConcurrency = 1 for the Lambda. Then I quickly (<< 5 min) created 3 messages in the queue manually. However, I didn't observe the Lambda being invoked after 5 minutes (300 s). Specifically, the Number Of Messages Sent metric of SQS shows a new data point at xx:54:15, while the log group for the Lambda updates around xx:59:53 (the Lambda does nothing intensive, just prints the value of event, so I'm sure that is the right execution).
Does that mean that once maxBatchingWindow is set greater than 0, it becomes the only requirement to invoke the Lambda, even if batchSize has been met?
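For reference, these two settings correspond to the batch_size and maximum_batching_window_in_seconds parameters of the Lambda event source mapping. A minimal sketch of the setup from the experiment using the aws-sdk-lambda gem (the queue ARN and function name are placeholders):

require 'aws-sdk-lambda'

client = Aws::Lambda::Client.new
client.create_event_source_mapping(
  event_source_arn: 'arn:aws:sqs:us-east-1:123456789012:my-queue', # placeholder
  function_name: 'my-function',                                    # placeholder
  batch_size: 3,                           # batchSize in the question
  maximum_batching_window_in_seconds: 300  # maxBatchingWindow in the question
)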

How do I wait for a worker process to finish but also limit its time to do so?

I am writing a Windows service in VB.Net that will go out to some devices and log data points of information. I am using a BackgroundWorker to do that so the service itself stays responsive. I have a timer that runs every second and checks the minute component of the current time. Each time the minute component changes, I check which devices need to be polled; some are every minute, some every 5, some every 10, etc. These processes can take a few seconds or over a minute (I only rerun the worker if it's not already running, and log an error if the last run took longer than the data retrieval interval).
In the OnStop event for the service I want to make sure the workers all shut down. I call CancelAsync on the worker, and the worker checks for cancellation so it can hopefully exit cleanly (i.e., check cancellation; if not cancelled, retrieve data, save it into the database, loop).
My problem is I don't want to use a sleep statement, as it will lock everything, but I also don't want the service to never shut down. For example, I currently have this:
Protected Overrides Sub OnStop()
    ' Add code here to perform any tear-down necessary to stop your service.
    My.Application.Log.WriteEntry("ServiceABC shutting down for device " & DeviceID)
    ServiceTimer.Enabled = False
    If DataRetrievalBackgroundWorker.IsBusy Then
        DataRetrievalBackgroundWorker.CancelAsync()
        ' Wait while the worker is still busy, but give up after 15 seconds.
        Dim x As Integer = 0
        While DataRetrievalBackgroundWorker.IsBusy AndAlso (x < 15)
            Threading.Thread.Sleep(1000)
            x += 1
        End While
    End If
End Sub
This should work since the background worker is on another thread, correct? Is there a better way to handle this?
You're close. If you don't want Sleep(1000) to lock things up a full second at a time, do a Sleep(1) instead and exit as soon as the worker is done:
'Dim x As Integer = 0
'While DataRetrievalBackgroundWorker.IsBusy AndAlso (x < 15)
'    Threading.Thread.Sleep(1000)
'    x += 1
'End While

' Exit as soon as the worker finishes, or after 15 seconds, whichever comes first.
Dim T As Date = Now.AddSeconds(15)
While DataRetrievalBackgroundWorker.IsBusy AndAlso Now() < T
    Threading.Thread.Sleep(1)
End While

KafkaConsumer poll() behavior understanding

Trying to understand (new to Kafka) how the poll event loop in Kafka works.
Use case: 25 records on the topic, max poll size set to 5.
max.poll.interval.ms = 5000 // 5 seconds (the default is 300000)
max.poll.records = 5
Sequence of tasks:
Poll the records from the topic.
Process the records in a for loop.
Run some processing logic that either passes or fails.
If the logic passes, the record's offset is added to a map.
The map is then committed using a commitSync call.
If the logic fails, the loop breaks, and whatever succeeded before that point is committed. The problem starts after this.
The next poll just keeps moving on in batches of 5 even after the error. Is that expected?
What we basically expect is that the loop breaks, the offsets up to the last successfully processed message are committed, and the next poll continues from the failed message.
Example: in the first batch, 5 messages are polled; offsets 1 and 2 succeed and are committed, then offset 3 fails. Yet the poll calls keep moving to the next batches (5-10, 10-15, and so on). If there is an error in between, we expect processing to stop at that point: the next poll should start from offset 3 in the first case, or from offset 8 if the failure is at offset 8 in the second batch, not from wherever the next max-poll batch begins. If it matters, this is a Spring Boot project and enable.auto.commit is false.
I have tried finding this in the documentation, but with no luck.
I also tried tweaking max.poll.interval.ms, but it did not help.
EDIT: Not accepting the answer because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is in milliseconds, not seconds, so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
If you are using the consumer yourself (which it sounds like), you must do the seeks.
You can manually seek back to the first offset of the poll for all the assigned partitions on failure. I am not sure how to do this with the Spring consumer.
Sample code for seeking the offset back to the beginning of the poll, for a plain (non-Spring) consumer.
In the code below I am getting the records list per partition and then getting the offset of the first record to seek to.
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.ConsumerRecords

// Rewind each partition touched by this poll to the first record it returned.
def seekBack(records: ConsumerRecords[String, String]): Unit = {
  records.partitions().asScala.foreach { partition =>
    val partitionedRecords = records.records(partition)
    val offset = partitionedRecords.get(0).offset()
    consumer.seek(partition, offset)
  }
}
One caveat: doing this unconditionally in production is bad, since you only want to seek back for transient errors; otherwise you will end up retrying the same failing record infinitely.

How resque checks when to run a job?

I have found this Resque extension:
https://github.com/elucid/resque-delayed
And I can see that I can schedule delayed jobs. My question is: how does it check for delayed jobs? If I have 5000 delayed jobs spread over one month, I hope it doesn't check all of them every 10 seconds.
So how is it being done?
It does not have to check all the delayed jobs. It maintains a sorted set in Redis, the jobs being sorted by their scheduled time. See the code at:
https://github.com/elucid/resque-delayed/blob/master/lib/resque-delayed/resque-delayed.rb
Each time the daemon wakes up, only the first item of the set needs to be checked (using a ZRANGEBYSCORE command). The daemon fetches the relevant jobs one by one until the polling query returns no result, then it sleeps again.
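For illustration, here is that one-by-one polling pattern as a minimal Ruby sketch with the redis-rb gem (the 'delayed_queue' key name is an assumption; scores are epoch timestamps):

require 'redis'

redis = Redis.new
loop do
  # Fetch the single most overdue job, if any is due by now.
  job = redis.zrangebyscore('delayed_queue', '-inf', Time.now.to_f, limit: [0, 1]).first
  break if job.nil?                 # nothing due: go back to sleep
  redis.zrem('delayed_queue', job)  # remove it from the set
  # ... push the job onto its normal Resque queue here ...
end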
Performance could be further improved by fetching the jobs n at a time. That could be implemented with a server-side Lua script as the polling query:
-- Fetch up to 10 jobs whose score (due time) is at most ARGV[1], then
-- remove them from the sorted set in the same atomic round trip.
local res = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, 10)
if #res > 0 then
  redis.call('ZREMRANGEBYRANK', KEYS[1], 0, #res - 1)
  return res
else
  return false
end
In one round trip, this script gets up to 10 jobs (if available) and deletes them from the zset. That is much better than the 11 ZRANGEBYSCORE and 10 ZREM calls currently required by Resque-delayed.
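If it helps, this is how the script could be invoked from Ruby with redis-rb (SCRIPT holds the Lua source above; the key name is again an assumption):

jobs = redis.eval(SCRIPT, keys: ['delayed_queue'], argv: [Time.now.to_f])
# jobs is an array of up to 10 due payloads; when the script returns false,
# Redis sends a nil reply, so jobs is nil when nothing was due.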

Ruby and Rails Async

I need to perform a long-running operation in Ruby/Rails asynchronously.
Googling around, one of the options I found is Sidekiq.
class WeeklyReportWorker
  include Sidekiq::Worker

  def perform(user, product, year = Time.now.year, week = Date.today.cweek)
    report = WeeklyReport.build(user, product, year, week)
    report.save
  end
end

# call WeeklyReportWorker.perform_async('user', 'product')
Everything works great! But there is a problem.
If I keep calling this async method every few seconds while the heavy operation itself takes a minute to run, things won't work.
Let me put it in example.
5.times { WeeklyReportWorker.perform_async('user', 'product') }
Now my heavy operation will be performed 5 times. Optimally it should have been performed only once or twice, depending on whether the execution of the first operation started before the 5th async call was made.
Do you have tips how to solve it?
Here's a naive approach. I'm a Resque user; maybe Sidekiq has something better to offer.
def perform(user, product, year = Time.now.year, week = Date.today.cweek)
  # First, make a name for the lock key. Include all the arguments, so that
  # another perform with the same arguments won't do any work while the
  # first one is still running.
  lock_key_name = make_lock_key_name(user, product, year, week)
  Sidekiq.redis do |redis| # Sidekiq uses Redis; let us leverage that.
    # Protection from a race condition: INCR is atomic, so the very first
    # caller sets the value to 1 and all subsequent INCRs return greater
    # values. Anything other than 1 means another copy of this operation is
    # already running, so we quit without touching the key (deleting it here
    # would release the lock held by the running copy).
    return if redis.incr(lock_key_name) != 1
    begin
      # Finally, perform the business logic.
      report = WeeklyReport.build(user, product, year, week)
      report.save
    ensure
      redis.del lock_key_name # Drop the lock key so the operation may run again.
    end
  end
end
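The make_lock_key_name helper is not defined above; a minimal hypothetical implementation could just join the arguments into a namespaced string:

# Hypothetical helper, not part of the original answer.
def make_lock_key_name(user, product, year, week)
  "weekly_report:lock:#{user}:#{product}:#{year}:#{week}"
end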
I am not sure I understood your scenario well, but how about looking at this gem:
https://github.com/collectiveidea/delayed_job
So instead of doing:
5.times { WeeklyReportWorker.perform_async('user', 'product') }
You can do:
5.times { WeeklyReportWorker.delay.perform('user', 'product') }
Out of the box, this will make the worker process the second job after the first one finishes, but only if you use the default settings (because by default there is only one worker process).
The gem offers possibilities to:
Put jobs on a queue;
Have different queues for different jobs if that is required;
Have more than one worker to process a queue (for example, you can start 4 workers on a 4-CPU machine for higher efficiency);
Schedule jobs to run at exact times, or after a set amount of time after enqueueing the job (or, by default, for immediate background execution); see the sketch below.
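For illustration, a minimal sketch of those scheduling options (the run_at value and queue name are arbitrary examples; 5.minutes.from_now assumes ActiveSupport):

# Run on the default queue as soon as a worker is free.
WeeklyReportWorker.delay.perform('user', 'product')

# Run no earlier than 5 minutes from now, on a named queue.
WeeklyReportWorker.delay(run_at: 5.minutes.from_now, queue: 'reports')
                  .perform('user', 'product')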
I hope it can help you as it did me.
