I am calling a program, let's say myprogram, from the terminal (in OS X Mavericks), but sometimes it gets stuck due to external problems out of my control. This tends to happen approximately every half hour.
myprogram basically has to perform a large number of small subtasks, which are saved in a file that is read at the start of every new execution, so there is no need to recompute everything from the beginning.
I would like to fully automate the restarting of the program by killing it and starting it again, in the following way:
Start the program.
Kill it after 30 minutes (the program will probably be stuck).
Restart it (back to step 1).
Any ideas on how to do this? My knowledge of bash scripting is not exactly great...
The following script can serve as a wrapper for myprogram:
#!/bin/bash
# Wrapper: run myprogram, kill it after 30 minutes, then start it again.
while true                 # infinite loop (stop the wrapper itself with Ctrl-C or kill)
do
./myprogram &              # start myprogram in the background
PID=$!                     # remember its PID
sleep 1800                 # wait 30 minutes (OS X's sleep only accepts seconds)
kill -9 "$PID"             # force-kill myprogram (it is probably stuck by now)
wait "$PID" 2>/dev/null    # reap the killed process to avoid leaving a zombie
done
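A shorter variation is possible if GNU coreutils is installed (this is an assumption about your setup; Homebrew installs the tools with a g prefix, so the command is gtimeout):

#!/bin/bash
# Alternative sketch: let gtimeout do the killing instead of sleep + kill.
while true
do
gtimeout -s KILL 1800 ./myprogram   # forcibly kill myprogram after 30 minutes
done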
You could use a wrapper, but an infinite loop is not an optimal solution. If you want to relaunch a program on a timer, or depending on its exit code, and you are on OS X, you should use launchd configuration files (XML property lists) and load them with launchctl.
KeepAlive <boolean or dictionary of stuff>
This optional key is used to control whether your job is to be kept continuously running or to let demand and conditions control the invocation. The default is false and therefore only demand will start the job. The value may be set to true to unconditionally keep the job alive. Alternatively, a dictionary of conditions may be specified to selectively control whether launchd keeps a job alive or not. If multiple keys are provided, launchd ORs them, thus providing maximum flexibility to the job to refine the logic and stall if necessary. If launchd finds no reason to restart the job, it falls back on demand based invocation. Jobs that exit quickly and frequently when configured to be kept alive will be throttled to conserve system resources.
SuccessfulExit <boolean>
If true, the job will be restarted as long as the program exits with an exit status of zero. If false, the job will be restarted in the inverse condition. This key implies that "RunAtLoad" is set to true, since the job needs to run at least once before we can get an exit status.
...
ExitTimeOut <integer>
The amount of time launchd waits before sending a SIGKILL signal. The default value is 20 seconds. The value zero is interpreted as infinity.
For more information on launchd & plists, visit:
https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/launchd.plist.5.html
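As a rough sketch, a property list for this job could look like the following. The label com.example.myprogram and the path /usr/local/bin/myprogram are placeholders; the keys are the ones described above. Save it in ~/Library/LaunchAgents and load it with launchctl load.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.myprogram</string>        <!-- placeholder label -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/myprogram</string> <!-- placeholder path -->
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <dict>
        <!-- restart the job only when it exits with a non-zero status -->
        <key>SuccessfulExit</key>
        <false/>
    </dict>
    <key>ExitTimeOut</key>
    <integer>20</integer>
</dict>
</plist>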
My problem:
Each night, my crontab launches several nightly tests on a supercomputer running PBS under CentOS 6.5. When launched, the jobs wait in the queue. When the scheduler allows them to run, my jobs start. It is quite common for the scheduler to launch all the jobs at exactly the same time (even though my crontab launched them at separate moments).
I can't modify the main part of the job (but I can add things before it). Each job starts with an update of a common SVN repository. But when the jobs start simultaneously, I get an error due to concurrent updates on the same repository. I want to avoid that.
What I expect:
When launched by the scheduler, the job could wait a few seconds before starting. One solution would be to wait a random time before starting, but the risk of two jobs picking the same random delay grows quickly with the number of tests I run in parallel. If I reduce that risk by choosing from a large range of random delays, I have to wait too long (locking unused resources on the supercomputer).
I suppose it's possible to store the information "I will launch now, the others have to wait one minute" for each job, in a concurrency-safe manner, but I don't know how. What I imagine is a kind of mutex, but one that only introduces a delay rather than blocking until the other job finishes.
A solution without MPI is preferred.
Of course, I'm open to other solutions. Any help is welcome.
Call your script from a wrapper that attempts to obtain an exclusive lock on a lock file first. For example:
{
flock -x 200 # take an exclusive lock on file descriptor 200
# your script/code here
} 200> /var/lock/myscript
The name of the lock file doesn't really matter, as long as you have write permission to open it. When this wrapper runs, it will first attempt to get an exclusive lock on /var/lock/myscript. If another script already has the lock, it will block until the lock becomes available.
Note that there are no arbitrary wait times; each script will run as soon as possible, in the order in which they first attempt to obtain the lock. This means you can also start the jobs simultaneously; the operating system will manage the access to the lock and the ordering.
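As a sketch of what the beginning of each PBS job could look like (the lock file name and the working-copy path are placeholders, and flock(1) is assumed to be available, as it normally is on CentOS as part of util-linux):

#!/bin/bash
# Serialize the SVN update across jobs that start at the same time.
{
flock -x 200                      # blocks until the exclusive lock is free
svn update /path/to/working/copy  # placeholder path; only one job updates at a time
} 200> /var/lock/svn-update

# ... the rest of the job runs here, outside the lock ...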
Here is a solution using GNU parallel.
It might seem a bit counter-intuitive at first to use this tool, but if you set the maximum number of jobs to run at a time to 1, it can simulate a job queue that runs multiple jobs in sequence without any overlaps.
You can observe the desired effect of this command with the following example:
seq 1 5 | parallel -j1 -k 'echo {}; sleep 1'
-j1 sets the maximum number of jobs running at a time to 1, while -k preserves the order.
To apply this to your original problem, create a file that lists the script files to run, one per line. Then pipe the contents of that file to parallel to run the scripts in sequence and in order:
cat file | parallel -j1 -k bash {}
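For instance, assuming the jobs are plain shell scripts with placeholder names job1.sh, job2.sh, job3.sh:

# Build the list of job scripts, one per line ...
printf '%s\n' job1.sh job2.sh job3.sh > file

# ... then run them one at a time, preserving the order
parallel -j1 -k bash {} < file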
I'm making an online judge, but I can't measure the running time of the files to be judged and I can't stop them. If a submitted program has an infinite loop, my judge system falls into an infinite loop too.
What should I do?
You should simply run the "to be judged" files in a separate thread and kill it after a time limit if it has not terminated successfully before then.
This way you only need to wait in your time-limiting thread and then kill the other one if it has not finished already.
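If the judge runs the submissions as separate processes from a shell, one way to sketch this is with the timeout(1) command from GNU coreutils instead of a thread (the file names and the 2-second limit are placeholders):

#!/bin/bash
# Run the submitted program under a time limit; exit status 124 means the limit was hit.
timeout 2s ./submission < input.txt > output.txt
status=$?

if [ "$status" -eq 124 ]; then
    echo "Time limit exceeded"
elif [ "$status" -ne 0 ]; then
    echo "Runtime error (exit status $status)"
else
    echo "Finished within the time limit"
fi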
I have a Java program with an infinite loop, which I want to start at a specific time and kill after two hours. I can start the program, and it keeps running until I manually kill it. Is there a way in Oozie (Hue) to start and kill the job periodically?
If you can find a way to kill the action with a shell script (from an arbitrary node), you should be able to use the Oozie shell action to kill it.
That being said, the way to go here would seem to be:
Pass an end time to your loop (or to a wrapper around your loop).
There is a <timeout> option in the Oozie Coordinator... but the name is confusing: actually, that option does not apply to a running job.
I have used a workaround for a similar requirement: just tell the program when it is supposed to stop!
in the Coordinator, generate a Workflow parameter with the desired end time for that execution, for instance "nominal start time +55 minutes" i.e. ${coord:formatTime(coord:dateOffset(coord:nominalTime(), 55, 'MINUTE'), 'yyyy-MM-dd HH:mm:ss z')}
in the Workflow, pass that parameter as an argument to your Java program
in the Java program, find a way to honor that desired end time argument -- i.e. if you run an infinite loop, break out of it when the current time is past the limit; otherwise start a dedicated watchdog thread that calls a brutal System.exit() when the limit is reached, etc.
How can I ensure that multiple instances of a certain program are always running?
Let's say that I want to make sure that 4 instances of a certain program are always running.
If one instance is killed, new one should start.
If 5 instances are running, one should be killed.
This is not really a shell question, because the approach is the same, whichever shell you are using.
I think the cleanest solution is to have a "watchdog", which checks the running processes (using ps) and, if necessary, starts a new one or kills an unnecessary one.
One way - which I have used in a similar situation - is to write a cron job which regularly (say, every 5 minutes) starts the watchdog and lets it do its work.
If such an interval is too long for your case (i.e. if you need to check more often than once a minute), you could have the watchdog run continuously, in a loop. You will still need a cron job that in turn checks on the watchdog from time to time, just in case the watchdog itself dies. In that case you might consider running it as a daemon.
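A minimal sketch of such a watchdog, assuming the program is called myprogram and four instances are wanted (both taken from the question; the path is a placeholder). Run it from cron, or wrap it in a loop with a sleep if you need tighter checking:

#!/bin/bash
# Watchdog sketch: keep exactly TARGET instances of NAME running.
NAME=myprogram        # placeholder program name
TARGET=4              # desired number of instances

running=$(pgrep -x "$NAME" | wc -l)

if [ "$running" -lt "$TARGET" ]; then
    # too few: start the missing instances
    for _ in $(seq $((TARGET - running))); do
        nohup "/path/to/$NAME" >/dev/null 2>&1 &   # placeholder path
    done
elif [ "$running" -gt "$TARGET" ]; then
    # too many: kill the surplus (here simply the last PIDs listed by pgrep)
    pgrep -x "$NAME" | tail -n $((running - TARGET)) | xargs kill
fi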
I have a thread in some console process, whose code is not available to me. Is there a way to get notified when its state changes (e.g. becomes idle)?
Maybe a hooks API or even a kernel-mode API (if there is no other way...)?
Expansion:
I have a legacy console app that I have to operate from my program. The only way to interact with it, obviously, is via stdin. So I run it as a new Process and send commands to its stdin. I take some text from its stdout, but it is not entirely predictable, so I cannot rely on it to know when the app has finished its current work and is ready to receive the next command. The flow is something like this: I run it, send (e.g.) command1, wait for it to finish its work (some CPU load and some IO operations), then issue command2, and so on, until the last command. When it has finished working on the last command, I can close it gracefully (send an exit command). The thing is that I have several processes of this console exe working simultaneously.
I can:
use timeouts since the last stdout output was received - not good, because the gap can be anywhere from a millisecond to an hour
parse stdout using (e.g.) a regex to wait for expected output - not good... the output is wholly unpredictable. It's almost random
using timers, poll its threads' states and act only when all of them are in a wait state (and not waiting for IO), and at least one is waiting for user input (see this) - not good either: if I use many processes simultaneously, it creates an unnecessary, disproportionate burden on the system.
So I like the last option, but instead of polling I would rather have events fired when these threads become idle.
I hope it explains the issue well...