Oozie: periodically kill a job - hadoop

I have a Java program with an infinite loop, which I want to start at a specific time and kill after two hours. I can start the program and it keeps running until I manually kill it. Is there a way in Oozie (Hue) to start and kill the job periodically?

If you can find a way to kill the action with a shell script (from an arbitrary node), you should be able to use the Oozie shell action to kill it.
That being said, the way to go here would seem to be:
Pass an end time to your loop (or to a wrapper around your loop).
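For the wrapper route, a minimal sketch, assuming GNU coreutils timeout is available on the worker nodes and using myloop.jar as a placeholder for your program (neither name comes from the question):

#!/bin/bash
# Wrapper for an Oozie shell action: timeout sends SIGTERM after two hours,
# then SIGKILL 60 seconds later if the JVM ignores the TERM signal.
timeout --kill-after=60s 2h java -jar myloop.jar "$@"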

There is a <timeout> option in the Oozie Coordinator... but the name is confusing: actually, that option does not apply to a running job.
I have used a workaround for a similar requirement: just tell the program when it is supposed to stop!
in the Coordinator, generate a Workflow parameter with the desired end time for that execution, for instance "nominal start time +55 minutes" i.e. ${coord:formatTime(coord:dateOffset(coord:nominalTime(), 55, 'MINUTE'), 'yyyy-MM-dd HH:mm:ss z')}
in the Workflow, pass that parameter as an argument to your Java program
in the Java program, find a way to honor that desired end time argument -- i.e. if you run an infinite loop, break out of it when the current time is past the limit; otherwise fork a dedicated watchdog thread that waits for the deadline and then does a brutal System.exit(), etc.
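If you would rather not modify the Java code, the same end-time argument could also be honored by a shell wrapper around the java call; this is a variation on the answer above, not part of it. A rough sketch, assuming GNU date and coreutils timeout on the worker node, with the coordinator-formatted end time passed as the first argument and myloop.jar as a placeholder:

#!/bin/bash
# $1 is the end time passed from the coordinator, e.g. "2017-05-01 12:55:00 UTC"
END_EPOCH=$(date -d "$1" +%s)      # GNU date parses the coordinator's format
NOW=$(date +%s)
REMAINING=$(( END_EPOCH - NOW ))

if [ "$REMAINING" -gt 0 ]; then
    # stop the JVM once the coordinator-supplied deadline is reached
    timeout "${REMAINING}s" java -jar myloop.jar
fi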

Related

PowerShell Jobs - Monitor running jobs in another parallel job

I'm looking for a way to monitor several long-running jobs in a session from another, parallel job. However, there is no way to pass the current session's jobs (Get-Job) as a parameter into another job, unless I assign them to a variable and process them one by one in a pipeline, which is time consuming.
I might need to end up doing something like this, even though it keeps the session busy: https://gallery.technet.microsoft.com/scriptcenter/Monitor-and-display-808ce573 . The downside of this approach is that I cannot stop the jobs and/or interact with them in any way until all of them are completed/failed, which is why I was trying to find a solution in a parallel monitoring job.

bash: how to wait some time to avoid simultaneous runs of a script?

My problem:
Each night, my crontab launches several nightly tests on a supercomputer running PBS under CentOS 6.5. When launched, the jobs wait in the queue. When the scheduler allows them to run, my jobs start. It is quite common that the scheduler launches all the jobs at exactly the same time (even if my crontab launched them at separate moments).
I can't modify the main part of the job (but I can add things before it). Each job starts with an update of a common SVN repository. But when the jobs start simultaneously, I get an error due to concurrent updates on the same repository. I want to avoid that.
What I expect:
When launched by the scheduler, each job could wait a few seconds before starting. One solution would be to wait a random time before starting, but the risk of two jobs drawing the same random delay grows quickly with the number of tests I run in parallel. If I reduce this risk by choosing from a large random range, I have to wait too long (locking unused resources on the supercomputer).
I suppose it's possible to store the information "I am launching now, the others have to wait for one minute" for each job, in a thread-safe manner, but I don't know how. What I imagine is a kind of mutex, but one that only induces a delay rather than a lock held until the end.
A solution without MPI is preferred.
Of course, I'm open to other solutions. Any help is welcome.
Call your script from a wrapper that attempts to obtain an exclusive lock on a lock file first. For example:
{
    flock -x 200            # take an exclusive lock on file descriptor 200 (blocks until available)
    # your script/code here
} 200> /var/lock/myscript
The name of the lock file doesn't really matter, as long as you have write permission to open it. When this wrapper runs, it will first attempt to get an exclusive lock on /var/lock/myscript. If another script already has the lock, it will block until the lock becomes available.
Note that there are no arbitrary wait times; each script will run as soon as possible, in the order in which they first attempt to obtain the lock. This means you can also start the jobs simultaneously; the operating system will manage the access to the lock and the ordering.
Here is a solution using GNU parallel.
It might seem a bit counter-intuitive at first to use this tool, but if you set the maximum number of jobs to run at a time to 1, it can simulate a job queue that runs multiple jobs in sequence without any overlaps.
You can observe the desired effect of this command with this example:
seq 1 5 | parallel -j1 -k 'echo {}; sleep 1'
-j1 sets max jobs running at a time to 1 while -k preserves the order.
To apply this to your original problem, create a file that lists the script files to run, one per line. We can then pipe the content of that file to parallel to make the scripts run in sequence and in order.
cat file | parallel -j1 -k bash {}

How to restart a program in terminal periodically?

I am calling a program, let's say myprogram, from the terminal (in OS X Mavericks), but sometimes it gets stuck due to external problems out of my control. This tends to happen approximately every half hour.
myprogram basically has to perform a large quantity of small subtasks, which are saved in a file that is read in every new execution, so there is no need to recompute everything from the beginning.
I would like to fully automate restarting the program by killing it and starting it again, in the following way:
Start the program.
Kill it after 30 minutes (the program will probably be stuck).
Restart it (back to step 1).
Any ideas on how to do this? My knowledge of bash scripting is not exactly great...
The following script can serve as a wrapper script for myprogram:
#!/bin/bash
while true              # begin infinite loop (you'll have to kill the wrapper manually)
do
    ./myprogram &       # execute myprogram and background it
    PID=$!              # get the PID of myprogram
    sleep 1800          # sleep 30 minutes (some sleep implementations also accept 30m)
    kill -9 "$PID"      # kill myprogram
done
You could use a wrapper, but an infinite loop is not an optimal solution. If you are looking to relaunch a program, on a timer or not, depending on its exit code, and you are on OS X, you should use launchd configuration files (XML property lists) and load them with launchctl.
KeepAlive <boolean or dictionary of stuff>
This optional key is used to control whether your job is to be kept continuously running or to let
demand and conditions control the invocation. The default is false and therefore only demand will start
the job. The value may be set to true to unconditionally keep the job alive. Alternatively, a dictionary
of conditions may be specified to selectively control whether launchd keeps a job alive or not. If
multiple keys are provided, launchd ORs them, thus providing maximum flexibility to the job to refine
the logic and stall if necessary. If launchd finds no reason to restart the job, it falls back on
demand based invocation. Jobs that exit quickly and frequently when configured to be kept alive will
be throttled to conserve system resources.
SuccessfulExit <boolean>
If true, the job will be restarted as long as the program exits and with an exit status of zero.
If false, the job will be restarted in the inverse condition. This key implies that "RunAtLoad"
is set to true, since the job needs to run at least once before we can get an exit status.
...
ExitTimeOut <integer>
The amount of time launchd waits before sending a SIGKILL signal. The default value is 20 seconds. The
value zero is interpreted as infinity.
For more information on launchd & plists, visit:
https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/launchd.plist.5.html
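As an illustration only (the label, paths, and key choices below are assumptions, not something from the question), a launch agent that relaunches myprogram whenever it exits could be created and loaded like this; the 30-minute kill itself would still come from the program or a wrapper such as the one above:

# write a minimal property list into ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/local.myprogram.plist <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.myprogram</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/myprogram</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF

# load the agent; launchd will now restart myprogram whenever it exits
launchctl load ~/Library/LaunchAgents/local.myprogram.plist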

Asterisk: don't wait for AGI script (bash) to finish before continuing in dialplan

I have an Asterisk dialplan that executes a bash script that matches the callerID with a database to geolocate the caller (by matching country and area codes). Since the database is quite large (global scale), it takes up to 15 seconds to finish.
I need to run this script immediately after answering the call (in case the user hangs up before the call is finished), but don't want the user to wait for the script execution. The return values should ideally be processed at the end of the dialplan just before the hangup.
Q1: I found http://www.voip-info.org/wiki/view/Asterisk+AGI#Forkandcontinuedialplan which deals with my problem with regard to Perl scripts. How do I accomplish the same in bash? I know I can send any bash script to the background by adding a "&" at the end, but I have no idea how to do that in the dialplan / when using AGI scripts.
Q2: How can I process the values even if the user hung up before / the dialplan "exited non-zero"?
Thanks for your help!
Use the FastAGI interface, or fire a UserEvent with an AMI listener.
AGI is not designed to work the way you want, so it will not work like that.
Sure, you can use the nohup command to get an immortal bash script, but that is not how it is meant to be used.
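To make the nohup idea concrete, a hedged sketch of a bash AGI script that reads the AGI variables Asterisk sends on stdin and then detaches the slow lookup so the dialplan continues immediately; geolookup.sh and the /tmp path are hypothetical placeholders:

#!/bin/bash
# Asterisk sends "agi_variable: value" lines on stdin, terminated by a blank
# line; read them first so the channel is not left waiting for input.
CALLERID=""
UNIQUEID=""
while read -r line && [ -n "$line" ]; do
    case "$line" in
        agi_callerid:*) CALLERID=${line#agi_callerid: } ;;
        agi_uniqueid:*) UNIQUEID=${line#agi_uniqueid: } ;;
    esac
done

# detach the slow database lookup; the result file can be read later in the
# dialplan (e.g. just before hangup) or by a hangup handler
nohup /usr/local/bin/geolookup.sh "$CALLERID" \
    > "/tmp/geo_${UNIQUEID}.txt" 2>/dev/null &

exit 0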

how to automatically run a bash script when my qsub jobs are finished on a server?

I would like to run a script when all of the jobs that I have sent to a server are done.
for example, I send
ssh server "for i in config*; do qsub ./run 1 $i; done"
And I get back a list of the jobs that were started. I would like to automatically start another script on the server to process the output from these jobs once all are completed.
I would appreciate any advice that would help me avoid the following inelegant solution:
If I save each of the 1000 job IDs from the above call in a separate file, I could check the contents of each file against the current list of running jobs, i.e. the output from a call to:
ssh server qstat
I would only need to check every half hour, but I would imagine that there is a better way.
It depends a bit on what job scheduler you are using and what version, but there's another approach that can be taken too if your results-processing can also be done on the same queue as the job.
One very handy way of managing lots of related jobs in more recent versions of Torque (and with Grid Engine, and others) is to launch the individual jobs as a job array (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#-t). This requires mapping the individual runs to numbers somehow, which may or may not be convenient; but if you can do it for your jobs, it greatly simplifies managing them: you can qsub them all in one line, and you can qdel or qhold them all at once (while still having the capability to deal with jobs individually).
If you do this, then you could submit an analysis job with a dependency on the array of jobs, which will only run once all of the jobs in the array are complete (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#dependencyExamples). Submitting the job would look like:
qsub -W depend=afterokarray:427[] analyze.sh
where analyze.sh contains the script to do the analysis, and 427 is the job id of the array of jobs you launched. (The [] means: only run after all are completed.) The syntax differs for other schedulers (e.g., SGE/OGE), but the ideas are the same.
Getting this right can take some doing, and certainly Tristan's approach has the advantage of being simple and working with any scheduler; but learning to use job arrays in this situation may be worth your time if you'll be doing a lot of this.
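A rough sketch of the two submissions, assuming Torque-style arrays, 1000 inputs named config1 ... config1000, and a run_one.sh that selects its input from $PBS_ARRAYID (all of these names are placeholders):

# run_one.sh would contain something like:  ./run 1 "config${PBS_ARRAYID}"

ARRAY_ID=$(qsub -t 1-1000 run_one.sh)   # prints something like 427[].server
ARRAY_ID=${ARRAY_ID%%.*}                # keep the "427[]" part (exact format may vary by version)

# analyze.sh runs only after every element of the array has exited successfully
qsub -W depend=afterokarray:"$ARRAY_ID" analyze.sh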
Something you might consider is having each job script just touch a filename in a dedicated folder like $i.jobdone, and in your master script, you could simply use ls *.jobdone | wc -l to test for the right number of jobs done.
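A small sketch of that idea, assuming each job finishes with touch "jobdone/$i.jobdone" and that exactly 1000 jobs were submitted (both are assumptions; adjust to taste):

#!/bin/bash
EXPECTED=1000
# poll once a minute until every job has dropped its marker file
while [ "$(ls jobdone/*.jobdone 2>/dev/null | wc -l)" -lt "$EXPECTED" ]; do
    sleep 60
done
./process_output.sh    # hypothetical post-processing script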
You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time it took, count of jobs done at the time, whatever) if you cycle around waiting for specific ids.
I'd write a small C program to do the waiting and collecting (if you have permissions to upload and run executables), but you can easily use the bash wait built-in for roughly the same purpose, albeit with less flexibility.
Edit: small example.
#!/bin/bash
...
waitfor=''
for task in "${tasks[@]}"; do    # tasks: array holding the commands to run (placeholder)
    "$task" &                    # launch the task in the background
    waitfor="$waitfor $!"        # collect its PID
done
wait $waitfor                    # left unquoted on purpose so the PIDs word-split
...
If you run this script in the background, it won't bother you, and whatever comes after the wait line will run when your jobs are over.
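If you also want the exit statuses mentioned above, a variant sketch that waits on each PID individually (task1.sh etc. are placeholders):

#!/bin/bash
pids=()
for t in ./task1.sh ./task2.sh ./task3.sh; do
    "$t" &               # launch each task in the background
    pids+=("$!")         # remember its PID
done

fail=0
for pid in "${pids[@]}"; do
    wait "$pid" || { echo "PID $pid exited non-zero" >&2; fail=1; }
done
exit "$fail"             # non-zero if any job failed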
