Reliable timing with EventMachine periodic timers - ruby

My objective is to have a system that broadcasts an ad every 10 minutes for 37,500 cities. It takes around 5 minutes to do the DB queries, calculations, and AMQP IO for all cities.
The code is roughly structured like:
EventMachine.add_periodic_timer(10 * 60) do  # interval in seconds (10 minutes)
  the_task_that_takes_five_minutes
end
What I'm finding is that even though the timer is set for 10-minute intervals, and even though the task takes less than ten minutes, the command fires at 15-minute intervals (the time it takes to complete the task plus the EM period).
If we make the assumption that the task will never take longer than 10 minutes, how would I go about ensuring that the period of the timer is always exactly 10 minutes from the previous run regardless of the task processing time?
EM basically seems to set the next timer after the task is run, not before.
I've tried a simple EM.defer around the task batch itself. I assumed this would open up the thread for setting the next timer but this doesn't solve the issue.
Can I get away with the following?
def do_stuff
  # Schedule the next run first (delay in seconds), then do the work.
  EventMachine.add_timer(10 * 60) do
    do_stuff
  end
  the_task_that_takes_five_minutes
end
do_stuff
I know I can do that sort of thing in JavaScript because the timer wouldn't execute inside the do_stuff call stack. Is this true for EventMachine?

This is just an idea, but maybe the timer could fire a function that only broadcasts the ad, without doing all the calculations.
calculations = Calc.new              # initial results, computed up front
EventMachine.add_periodic_timer(10 * 60) do
  ad_broadcasting calculations       # broadcast what is already computed
  EM.defer do                        # recompute on EM's thread pool
    calculations = Calc.new
  end
end
I'm not sure whether deferring the calculations there would keep EM from waiting for them.
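Another way to keep runs on a fixed 10-minute grid is to compute each delay from the original start time rather than from the moment the task finished. The following is a minimal sketch in plain Ruby. The helper name delay_until_next_run is hypothetical, and the assumption is that its result would be passed to EventMachine.add_timer; it has not been tested inside a running reactor.

```ruby
PERIOD = 10 * 60 # seconds

# Hypothetical helper: seconds from `now` until the next slot on a fixed grid
# anchored at `started_at`. Slots that were already missed are skipped.
def delay_until_next_run(started_at, now, period = PERIOD)
  elapsed = now - started_at
  period - (elapsed % period)
end

started_at = Time.at(0)
# The task started on the grid and took 5 minutes, so 5 minutes remain.
remaining = delay_until_next_run(started_at, started_at + 5 * 60)
puts remaining
```

Because the delay is always measured against the same anchor, a slow run shortens the next wait instead of pushing every later run back.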

Related

The correct use of timers in a thread group (Until now my timers get ignored)

My goal is to simulate 500 users that perform certain requests on the website in an amount of time of five minutes.
To make the test come as close as possible to reality, I want to add a thinking time between requests (here: two seconds). The problem is that no matter what I do, the timers get ignored. To give you an example, I would like to perform a login request every 2 seconds. Here is the data of the thread group:
Number of Threads: 500
Ramp-Up Period: 300
Loop Count: 1
What I have done so far to achieve this:
I used the Constant Timer and put it as a child of my request. That didn't work; the timer just gets ignored, no matter what value I use.
I tried the Constant Throughput Timer, but that didn't work either; the values get ignored.
What am I doing wrong? I added a screenshot so you can see where I put the Constant Timer in my test plan.
Screenshots of my testplan:
In your case you can work without timers: set the Ramp-Up Period to Number of Threads * 2 (seconds) so that a thread starts approximately every 2 seconds.
So in your case just set Ramp-Up Period: 1000 (and remove the timer).
You are using the wrong timer. Constant Timer just adds a delay of 5 seconds before each request. If you want JMeter to perform a login every 2 seconds, you should consider switching to Constant Throughput Timer.
Remember that Constant Throughput Timer is only precise at the minute level, so you might need to play with the ramp-up period at the Thread Group level in order to limit the thread execution rate during the first 60 seconds. Alternatively, you can consider using the Throughput Shaping Timer plugin.
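The numbers in the answers above follow from simple arithmetic, sketched here in Ruby. The method names are hypothetical and are not part of JMeter; the only JMeter-specific assumption is that the Constant Throughput Timer target is expressed in samples per minute.

```ruby
# Illustrative arithmetic only; these helpers are not JMeter APIs.

# Ramp-up (seconds) so that one thread starts roughly every `gap` seconds.
def ramp_up_for(threads, gap)
  threads * gap
end

# Target throughput in samples per minute for one request every `gap` seconds.
def samples_per_minute(gap)
  60.0 / gap
end

puts ramp_up_for(500, 2)    # ramp-up for 500 threads, one every 2 seconds
puts samples_per_minute(2)  # throughput target for one login every 2 seconds
```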

JMeter Threads, Ramp-up and Loop: what is the correct usage?

I am under the impression that I use the 3 settings (Threads, Ramp-up and Loop) to simulate X (Threads) users/threads over Y (Ramp-up) seconds and do this Z (Loop) times. For example, I want 10 users every 1 second for 1 hour, which equates to 10 Threads, 1 Second Ramp-up and 3600 Loops.
But :)
Others seem to be using it differently ... as in, if they want the same as above, they would set Threads to 36000, Ramp-up to 3600 seconds and Loop to 1.
I tend to think the first approach is correct based on #a it reads better :) and #b why would you have a setting based in seconds to indicate the length of your test
Can anyone give me a definitive answer or are both options plausible?
Firstly 36000 threads in the second example seems very high! http://wiki.apache.org/jmeter/HowManyThreads reports people using 1000. So the 2nd scenario may not even work.
The two scenarios you describe are not exactly the same, and I'm not sure if either is exactly what you want.
In the first, the 10 Threads plus 1 second Ramp-up means all 10 Threads will be in use after 1 second. The 10 Threads will then do their actions in parallel 3600 times. You have not mentioned anything that will make the test take 1 hour; it will take as long as it takes to loop through 3600 times. To make it take an hour (assuming the actions don't take longer than 1 second) you would need something like a Constant Throughput Timer within your loop, which controls the speed of the loop so that it takes an hour.
In the second, 10 threads would be created in second #1 and start doing their loop, another 10 in the second#2, etc, all the way up to 1 hour (second#3600). If the actions take longer than one second, then you would have more than 10 threads running in any one second.
The first approach is much clearer. The 2nd one is a misuse of the Ramp-Up; it's not being used to provide a ramp-up to 36000 threads, but to try and schedule 10 threads to run at once.
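To see why the two configurations behave differently, it can help to count how many threads start in each second. This is a rough Ruby sketch that assumes JMeter spreads thread starts evenly across the ramp-up period; the helper name starts_per_second is made up for illustration.

```ruby
# Count thread starts per whole second, assuming thread i starts at
# i * ramp_up / threads seconds (an even spread over the ramp-up period).
def starts_per_second(threads, ramp_up)
  counts = Hash.new(0)
  (0...threads).each do |i|
    second = (i * ramp_up) / threads  # integer division gives the whole second
    counts[second] += 1
  end
  counts
end

first  = starts_per_second(10, 1)         # 10 Threads, 1 second Ramp-up
second = starts_per_second(36_000, 3_600) # 36000 Threads, 3600 seconds Ramp-up

puts first.inspect
puts second.values.uniq.inspect
puts second.size
```

Under this assumption the first setup starts all 10 threads within the first second and then relies on the loop count, while the second keeps starting 10 new threads in every one of the 3600 seconds.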

What is a good way to design and build a task scheduling system with lots of recurring tasks?

Imagine you're building something like a monitoring service, which has thousands of tasks that need to be executed in given time interval, independent of each other. This could be individual servers that need to be checked, or backups that need to be verified, or just anything at all that could be scheduled to run at a given interval.
You can't just schedule the tasks via cron though, because when a task is run it needs to determine when it's supposed to run the next time. For example:
schedule server uptime check every 1 minute
first time it's checked the server is down, schedule next check in 5 seconds
5 seconds later the server is available again, check again in 5 seconds
5 seconds later the server is still available, continue checking at 1 minute interval
A naive solution that came to mind is to simply have a worker that runs every second or so, checks all the pending jobs and executes the ones that need to be executed. But how would this work if the number of jobs is something like 100 000? It might take longer to check them all than the worker's ticking interval, and the more tasks there are, the longer the poll interval becomes.
Is there a better way to design a system like this? Are there any hidden challenges in implementing this, or any algorithms that deal with this sort of a problem?
Use a priority queue (with the priority based on the next execution time) to hold the tasks to execute. When you're done executing a task, you sleep until the time for the task at the front of the queue. When a task comes due, you remove and execute it, then (if it's recurring) compute the next time it needs to run, and insert it back into the priority queue based on its next run time.
This way you have one sleep active at any given time. Insertions and removals have logarithmic complexity, so it remains efficient even if you have millions of tasks (e.g., inserting into a priority queue that has a million tasks should take about 20 comparisons in the worst case).
There is one point that can be a little tricky: if the execution thread is waiting until a particular time to execute the item at the head of the queue, and you insert a new item that goes at the head of the queue, ahead of the item that was previously there, you need to wake up the thread so it can re-adjust its sleep time for the item that's now at the head of the queue.
We encountered this same issue while designing Revalee, an open source project for scheduling triggered callbacks. In the end, we wrote our own priority queue class (we called ours a ScheduledDictionary) to handle the use case you outlined in your question. As a free, open source project, the complete source code (C#, in this case) is available on GitHub. I'd recommend that you check it out.
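The priority-queue approach described above can be sketched in Ruby as follows. This is an illustrative binary min-heap with a simulated clock, not Revalee's ScheduledDictionary; the class name TaskQueue and the task names are invented for the example, and a real worker would sleep until the head entry is due instead of looping immediately.

```ruby
# Minimal binary min-heap keyed on run time (illustrative, not production code).
class TaskQueue
  def initialize
    @heap = []
  end

  # Earliest entry without removing it, or nil when empty.
  def peek
    @heap.first
  end

  # Insert [run_at, name] and sift it up: O(log n).
  def push(run_at, name)
    @heap << [run_at, name]
    i = @heap.size - 1
    while i > 0
      parent = (i - 1) / 2
      break if @heap[parent][0] <= @heap[i][0]
      @heap[parent], @heap[i] = @heap[i], @heap[parent]
      i = parent
    end
  end

  # Remove the earliest entry and sift the replacement down: O(log n).
  def pop
    return nil if @heap.empty?
    top = @heap.first
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      i = 0
      loop do
        left = 2 * i + 1
        right = left + 1
        smallest = i
        smallest = left if left < @heap.size && @heap[left][0] < @heap[smallest][0]
        smallest = right if right < @heap.size && @heap[right][0] < @heap[smallest][0]
        break if smallest == i
        @heap[i], @heap[smallest] = @heap[smallest], @heap[i]
        i = smallest
      end
    end
    top
  end
end

# Simulated run over 120 seconds; intervals are in seconds.
intervals = { "check_a" => 60, "check_b" => 5 }
queue = TaskQueue.new
intervals.each { |name, every| queue.push(every, name) }

executed = []
while (entry = queue.peek) && entry[0] <= 120
  run_at, name = queue.pop
  executed << [run_at, name]
  queue.push(run_at + intervals[name], name) # reschedule the recurring task
end
puts executed.size
```

A task that needs a different interval after a failed check would simply push itself back with a shorter delay, which is the case cron cannot express.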

EventMachine tick interval?

There is a method EventMachine.next_tick (http://eventmachine.rubyforge.org/EventMachine.html#next_tick-class_method). How big is the tick interval? How to control it? Can the tick interval be set?
EventMachine ticks basically match each run of the reactor event loop. Using next_tick will run the block on the next available run of the reactor loop. Whether this means the next actual run or, more likely, some point in the near future depends on whether other events are waiting to be picked up by the reactor loop. For instance, any blocks of code that were queued using add_timer or add_periodic_timer are run first, then other events like incoming network traffic are processed.
A "tick" in EventMachine isn't really a measurement of time; it's a counter of the number of times the reactor loop executes. If you have blocking operations in your reactor loop, then each tick will take longer to process.
If you need to know approximately when your code should be run, then use add_timer or add_periodic_timer instead of next_tick. But as there's no guarantee that the reactor loop will be available at the exact moment the timer should fire, it's almost impossible to use EventMachine for accurate timer intervals.
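The idea that a tick is a pass of the loop rather than a unit of time can be illustrated with a toy loop. This is not EventMachine's implementation and leaves out timers and network events; the class ToyReactor is invented purely for the illustration.

```ruby
# Toy reactor: only shows that a "tick" counts loop passes, not elapsed time.
class ToyReactor
  attr_reader :ticks

  def initialize
    @ticks = 0
    @queue = []
  end

  # Queue a block for the next pass of the loop.
  def next_tick(&block)
    @queue << block
  end

  # Run passes until no work is left. Blocks queued during a pass
  # wait for the following pass.
  def run
    until @queue.empty?
      @ticks += 1
      batch = @queue
      @queue = []
      batch.each(&:call)
    end
  end
end

reactor = ToyReactor.new
log = []
reactor.next_tick do
  log << [:first, reactor.ticks]
  reactor.next_tick { log << [:second, reactor.ticks] }
end
reactor.run
puts log.inspect
```

If the first block slept for a few seconds, the second would still run on tick 2, only later in wall-clock time.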

How to run my script x times a day? (ruby on linux)

I want to run my ruby script x times a day (the number might change) on my linux box. What would be the best way to do so if I do not want it to happen at the same time? I want the time (hour and minute) to be random
I was thinking of using at command. The script would be called by at in x hours/minutes or so and then the script would set up another call by at. Not sure if there is any better way or only ruby way.
I'd consider using the at program to run the programs (instead of using cron directly, because cron really only works on a fixed schedule). I'd also create a program (I'd use Perl; you'll use Ruby) to schedule a random delay until the next time the job is executed.
You'll need to consider whether it is crucial that the job is executed 'x' times in 24 hours, and how the randomness should work. What is the range of variation in times? For example, you might have a cron job run at midnight plus 7 minutes, say, which then schedules 'x' at jobs spaced evenly through the day, with a random deviation in the schedule of ±30 minutes. Or you might prefer an alternative that schedules the jobs with an average gap of 24/x hours and a random deviation of some amount. The difference is that the first mechanism guarantees that you get x events in the day (unless you make things too extreme); the second might sometimes only get x-1 events, or x+1 events, in 24 hours.
I think scheduler solutions are a bit limiting. To get the most flexible random behaviour, turn your script into a daemon and code the loop / wait yourself.
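The first mechanism (x evenly spaced slots, each shifted by a random deviation) might be sketched like this in Ruby. The method name daily_schedule is illustrative, and turning the resulting times into at jobs is left out.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Return x run times (seconds since midnight), one per evenly sized slot,
# each moved by a random offset of at most max_deviation seconds.
def daily_schedule(x, max_deviation, rng = Random.new)
  gap = SECONDS_PER_DAY / x
  (0...x).map do |i|
    base = i * gap + gap / 2 # middle of the i-th slot
    offset = rng.rand(-max_deviation..max_deviation)
    (base + offset).clamp(0, SECONDS_PER_DAY - 1)
  end.sort
end

times = daily_schedule(6, 30 * 60)
puts times.map { |t| format("%02d:%02d", t / 3600, (t % 3600) / 60) }.join(", ")
```

Because every slot produces exactly one time, the day always contains x runs, which is the guarantee the second mechanism gives up.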
For Ruby there seems to be this: http://raa.ruby-lang.org/project/daemons/
I guess you can set up a cron job that calls a bash script which delays execution by a random time, but I don't know if you can do it somehow inside the cron job itself.
You can find some information on how to do that on this site and if you don't know about crontab and cronjobs you can find more information about that here.
If you want to run X times a day, set your crontab entry to:
0 */X * * * command_to_run
where X is the hourly interval you want to fire your job on to get the desired number of executions/day. For instance, use 2 to fire off every two hours for a total of 12 executions/day.
In your code use this at the top to force it to sleep a random time up to that cron interval:
# How long the program takes to run, in seconds. Be liberal unless having
# two instances running is OK.
EXECUTION_TIME = 10
INTERVAL = 2 * 60 * 60 - EXECUTION_TIME
sleep(rand(INTERVAL))
The idea is that cron will start your program at a regular interval, but then it will sleep some random number of seconds within that interval before continuing.
Change the value for EXECUTION_TIME to however long you think it will take for the code to run, to give it a chance to finish before the next interval occurs. Change the "2" in the INTERVAL to whatever your cron interval is.
I haven't tested this but it should work, or at least get you on the right path.
