I want to run my Ruby script x times a day (the number might change) on my Linux box. What would be the best way to do this if I do not want it to happen at the same times every day? I want the time (hour and minute) to be random.
I was thinking of using the at command: the script would be called by at in x hours/minutes or so, and would then schedule its next run with another at call. I'm not sure whether there is a better way, or a Ruby-only way.
I'd consider using the at program to run the programs (instead of using cron directly, because cron really only works on a fixed schedule). I'd also create a program (I'd use Perl; you'll use Ruby) to schedule a random delay until the next time the job is executed.
You'll need to consider whether it is crucial that the job is executed x times in 24 hours, and how the randomness should work: what is the acceptable range of variation in times? For example, you might have a cron job run at midnight plus 7 minutes, say, which then schedules x at jobs spaced evenly through the day, each with a random deviation of ±30 minutes. Or you might prefer an alternative that schedules the jobs with an average gap of 24/x hours and some random deviation. The difference is that the first mechanism guarantees you get x events in the day (unless you make the deviations too extreme); the second might sometimes give only x-1 events, or x+1 events, in 24 hours.
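A minimal sketch of the first mechanism, written in Python for illustration (the script path, run count, and jitter are assumptions; run it once a day from cron, e.g. at 00:07):

#!/usr/bin/env python3
# Schedule RUNS_PER_DAY `at` jobs spread evenly through the day, each shifted
# by a random amount. The path and constants below are placeholders.
import random
import subprocess

RUNS_PER_DAY = 6                              # x runs per day
JITTER_MINUTES = 30                           # +/- deviation around each slot
COMMAND = "/usr/bin/ruby /path/to/script.rb"  # hypothetical script

slot = 24 * 60 // RUNS_PER_DAY                # minutes between evenly spaced slots
for i in range(RUNS_PER_DAY):
    offset = i * slot + random.randint(-JITTER_MINUTES, JITTER_MINUTES)
    offset = max(offset, 1)                   # keep the job in the future
    # `at` reads the command to run from standard input
    subprocess.run(["at", "now", "+", str(offset), "minutes"],
                   input=COMMAND, text=True, check=True)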
I think scheduler-based solutions are a bit limiting; for the most flexible random behavior, turn your script into a daemon and code the loop/wait logic yourself.
For Ruby there seems to be this: http://raa.ruby-lang.org/project/daemons/
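If you go that route, the core of the daemon is just a loop with a randomized wait; here is a minimal sketch in Python (the script path and intervals are placeholders):

import random
import subprocess
import time

RUNS_PER_DAY = 6                            # x runs per day on average
AVERAGE_GAP = 24 * 60 * 60 // RUNS_PER_DAY  # average seconds between runs
JITTER = 30 * 60                            # +/- 30 minutes of randomness

while True:
    # Hypothetical job: replace with the real work (or the Ruby script).
    subprocess.run(["/usr/bin/ruby", "/path/to/script.rb"])
    time.sleep(AVERAGE_GAP + random.randint(-JITTER, JITTER))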
I guess you can set up a cron job that calls a bash script which delays execution by a random time, but I don't know if you can do that directly inside the cron job itself.
There is plenty of information online on how to do that; if you aren't familiar with crontab and cron jobs, it is worth reading up on those first.
If you want to run the job several times a day, set your crontab entry to:
0 */X * * * command_to_run
where X is the interval in hours between runs (that is, 24 divided by the number of executions you want per day). For instance, use 2 to fire off every two hours for a total of 12 executions per day.
In your code, add this at the top to force it to sleep for a random amount of time within that cron interval:
# How long the program takes to run, in seconds. Be liberal unless having
# two instances running is OK.
EXECUTION_TIME = 10
INTERVAL = 2 * 60 * 60 - EXECUTION_TIME
sleep(rand(INTERVAL))
The idea is that cron will start your program at a regular interval, but then it will sleep some random number of seconds within that interval before continuing.
Change the value for EXECUTION_TIME to however long you think it will take for the code to run, to give it a chance to finish before the next interval occurs. Change the "2" in the INTERVAL to whatever your cron interval is.
I haven't tested this but it should work, or at least get you on the right path.
My question is based on THIS question.
According to that question, I should consider using --array=0-60000%200 to limit the number of jobs running in parallel to 200 in SLURM. It seems to me that it takes up to a minute to launch a new job every time an old job finishes. Given the number of jobs I am planning to run, I might be wasting a lot of time this way.
I wrote a "most probably" very inefficient alternative, consisting in a script that launches the jobs, checking the number of jobs in the queue and adding jobs if I am still bellow the max number of jobs allowed and while I reached the max number of parallel jobs, sleep for 5 seconds, as follows:
#!/bin/bash
# Iterate the procedure $1 times ($1 = 60000).
for ((i=0; i<=$1; i++))
do
    # Wait until a queued job finishes: count my jobs in the queue
    # (the +/-1 line for the squeue header does not matter here).
    q=$(squeue -u myuserName | wc -l)
    while [ $q -gt 200 ]    # maximum number of parallel jobs set to 200
    do
        sleep 5
        q=$(squeue -u myuserName | wc -l)
    done
    # Run the job with sbatch
    sbatch...
done
It seems to do a better job than my previous method. Nevertheless, I would like to know: how inefficient is this implementation in reality, and why? Could I be harming the scheduling efficiency of other users on the same cluster?
Thank you.
SLURM needs some time to process the job list and decide which job should be the next to run, especially if the backfill scheduler is in place and there are lots of jobs in the queue. You are not losing one minute per job because of the job array; it is SLURM that needs that minute to decide, and it will need the same minute for any other job of any other user, with or without job arrays.
With your approach your jobs also lose priority: every time one of your jobs finishes, you launch a new one, and that new job is placed at the back of the queue. Also, SLURM has to manage hundreds of independent jobs instead of a single array job that accounts for the 60000 you need.
If you are alone on the cluster, maybe there is no big difference between the two approaches, but if your cluster is full, your manual approach puts a slightly higher load on SLURM and your jobs will finish quite a lot later than with the job array (simply because once the array reaches the front of the queue, all 60000 tasks are effectively at the front, whereas your manually submitted jobs go to the back of the queue every time one of them finishes).
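For reference, the job-array approach replaces the whole submission loop with a single call such as sbatch --array=0-60000%200 job.sh, and each array task finds its own work item through the SLURM_ARRAY_TASK_ID environment variable. A minimal Python payload might look like this (the per-task work and file naming are placeholders):

import os

# SLURM sets SLURM_ARRAY_TASK_ID to this task's index within the array,
# so each of the ~60000 tasks can select its own input without any loop.
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])

# Hypothetical per-task work: e.g. process the task_id-th input file.
input_file = f"inputs/chunk_{task_id}.dat"
print(f"task {task_id}: processing {input_file}")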
The minimum value for the EC2 instance StatusCheckFailed interval seems to be one minute. Is it possible to reduce this to 2 failures over 15 seconds?
We have a requirement to detect failures quickly, in the 10-15 second range. Are there any other ways to accomplish this?
I don't believe you can set the resolution of the status check to less than 1 minute. One potential workaround would be to implement a Lambda function that essentially performs a status check (via your own code) on a more frequent interval, triggered by a cron-style schedule.
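As a sketch only (the health endpoint, namespace, metric name, and instance ID below are all made up): such a Lambda handler could probe the instance itself and publish a custom CloudWatch metric for an alarm to watch. Note that scheduled rules fire at most once per minute, so for 10-15 second detection the handler would need to loop and probe several times within each invocation.

import urllib.request

import boto3

HEALTH_URL = "http://203.0.113.10/health"    # hypothetical endpoint on the instance
INSTANCE_ID = "i-0123456789abcdef0"          # hypothetical instance ID
cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    """Probe the instance and publish 1 (healthy) or 0 (unhealthy) to CloudWatch."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as response:
            healthy = 1 if response.status == 200 else 0
    except Exception:
        healthy = 0

    cloudwatch.put_metric_data(
        Namespace="Custom/InstanceHealth",   # made-up namespace
        MetricData=[{
            "MetricName": "Healthy",
            "Dimensions": [{"Name": "InstanceId", "Value": INSTANCE_ID}],
            "Value": healthy,
        }],
    )
    return {"healthy": healthy}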
There are several partitions on the cluster I work on. With sinfo I can see the time limit for each partition. I submitted my code to the mid1 partition, which has a time limit of 8-00:00:00, which I understand to mean 8 days. I had to wait 1-15:23:41 in the queue, i.e. nearly 1 day and 15 hours. However, my code ran for only 00:02:24, which is about 2.5 minutes (and the solution was converging). Also, I did not set a time limit in the file submitted with sbatch. The reason given for my code stopping was:
JOB 3216125 CANCELLED AT 2015-12-19T04:22:04 DUE TO TIME LIMIT
So why was my code stopped if I did not exceed the time limit? I asked the people responsible for the cluster about this, but they did not reply.
Look at the value of DefaultTime in the output of scontrol show partitions. This is the time limit allocated to your job when you do not specify one yourself with --time.
Most probably this value is set to 2 minutes to force you to specify a sensible time limit (within the limits of the partition).
Imagine you're building something like a monitoring service, which has thousands of tasks that need to be executed at given intervals, independently of each other. These could be individual servers that need to be checked, or backups that need to be verified, or anything else that can be scheduled to run at a given interval.
You can't just schedule the tasks via cron though, because when a task is run it needs to determine when it's supposed to run the next time. For example:
schedule a server uptime check every 1 minute
the first time it's checked, the server is down, so schedule the next check in 5 seconds
5 seconds later the server is available again; check again in 5 seconds
5 seconds later the server is still available, so go back to the 1-minute check interval
A naive solution that came to mind is to simply have a worker that runs every second or so, checks all the pending jobs, and executes the ones that are due. But how would this work if the number of jobs is something like 100,000? It might take longer to check them all than the worker's tick interval, and the more tasks there are, the longer each polling pass becomes.
Is there a better way to design a system like this? Are there any hidden challenges in implementing this, or any algorithms that deal with this sort of a problem?
Use a priority queue (ordered by next execution time) to hold the tasks to execute. When you're done executing a task, you sleep until the time of the task at the front of the queue. When a task comes due, you remove and execute it, then (if it's recurring) compute the next time it needs to run and insert it back into the priority queue based on that run time.
This way you have one sleep active at any given time. Insertions and removals have logarithmic complexity, so it remains efficient even if you have millions of tasks (e.g., inserting into a priority queue that has a million tasks should take about 20 comparisons in the worst case).
There is one point that can be a little tricky: if the execution thread is waiting until a particular time to execute the item at the head of the queue, and you insert a new item that goes at the head of the queue, ahead of the item that was previously there, you need to wake up the thread so it can re-adjust its sleep time for the item that's now at the head of the queue.
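A minimal sketch of that design in Python (the task function and timings are placeholders; a real system would also want error handling and likely a pool of worker threads):

import heapq
import itertools
import threading
import time

class TaskScheduler:
    """A heap ordered by next run time, plus a condition variable so that
    inserting an earlier task wakes the sleeping worker (the tricky case
    described above)."""

    def __init__(self):
        self._heap = []                       # entries: (run_at, seq, task_fn)
        self._counter = itertools.count()     # tie-breaker for equal run times
        self._cond = threading.Condition()

    def schedule(self, run_at, task_fn):
        """Schedule task_fn to run at UNIX time run_at (thread-safe)."""
        with self._cond:
            entry = (run_at, next(self._counter), task_fn)
            heapq.heappush(self._heap, entry)
            if self._heap[0] is entry:        # new earliest task: wake the worker
                self._cond.notify()

    def run_forever(self):
        while True:
            with self._cond:
                while not self._heap:
                    self._cond.wait()         # nothing scheduled yet
                run_at, _, task_fn = self._heap[0]
                delay = run_at - time.time()
                if delay > 0:
                    # Sleep until the head is due, or until notify() wakes us
                    # because something earlier was inserted; then re-check.
                    self._cond.wait(timeout=delay)
                    continue
                heapq.heappop(self._heap)
            next_run = task_fn()              # run the task outside the lock
            if next_run is not None:          # recurring task: put it back
                self.schedule(next_run, task_fn)

A recurring task can return its next run time from its task function (for example time.time() + 60 after a successful check, or time.time() + 5 after a failure), which reproduces the adaptive check interval described in the question.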
We encountered this same issue while designing Revalee, an open source project for scheduling triggered callbacks. In the end we wrote our own priority queue class (we called ours a ScheduledDictionary) to handle the use case you outlined in your question. The complete source code (C#, in this case) is freely available on GitHub; I'd recommend that you check it out.
Is there some way, like a script we can run n times in a loop, to benchmark classic ASP CPU computation time and disk I/O from server to server? That is, run it on a local workstation, then a local Pentium dev server with a rotational disk, then an entry-level production server with RAID 01, then a virtualized cloud production server, then a quad-core Xeon server with SSDs, etc., so we can see "how much faster" each server is than the last. The only tricky part is testing SQL and I/O. Perhaps the script could compute something at random and then write a 10 GB file to disk, or a series of 1 GB files, and time how long it takes to create them, write them, copy/move them, read them back in, and calculate an MD5 on them, spitting the results out to the client in a loop of n times. What we're trying to do is prove, with a controlled set of code and tasks, server performance from one machine to the next.
-- EDIT --
Of course, as pointed out by jbwebtech below, VBS does indeed have its own Timer() function, which I had completely forgotten about. And so...
Well, you can just record the time before you run each process, run the process, and then subtract that start time from the current time...
<%
Dim start
start = Timer()
'Run my process...
...
...
Response.Write(Timer() - start)
%>
It's perhaps not perfect, but it will give you back a fairly useful reading. Timer() returns the number of seconds that have elapsed since midnight, as a floating-point value with sub-second precision, so the difference between the two readings is the time taken to execute the code block, in seconds. Divide by 60 or 3600 if you want minutes or hours, and note that a run spanning midnight would need special handling.