How do I create a Stack or LIFO for GNU Parallel in Bash - bash

While my original problem was solved in a different manner (see comment thread under this question, as well as the edits to this question), I was able to create a stack/LIFO for GNU Parallel in Bash. So I will edited my background/question to reflect a situation where it could be needed.
Background
I am using GNU Parallel to process files with a Bash script. As the files are processed, more files are created and new commands need to be added to parallel's list. I am not able to give parallel a complete list of commands, as information is generated as the initial files are processed.
I need a way to add the lines to parallel's list while it is running.
Parallel will also need to wait for a new line if nothing is in the queue and exit once the queue is finished.
Solution
First I created a fifo:
mkfifo /tmp/fifo
Next I created a bash file that cat's the file and pipes the output to parallel, which checks for the end_of_file line. (I wrote this with help from the accepted answer as well as from here)
#!/bin/bash
while true;
do
cat /tmp/fifo
done | parallel --ungroup --gnu --eof "end_of_file" "{}"
Then I write to the pipe with this command, adding lines to parallel's queue:
echo "command here" > /tmp/fifo
With this setup, all new commands are added to the queue. Once the queue is full parallel will begin processing it. This means that if you have slots for 32 jobs (32 processors), then you will need to add 32 jobs in order to start the queue.
If parallel is occupying all of its processors, it will put the job on hold until a processor becomes available.
By using the --ungroup argument, parallel will process/output jobs as they are added to the queue once the queue is full.
Without the --ungroup argument, parallel waits until a new slot is needed to complete a job. From the accepted answer:
Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.

From http://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-queue-system-batch-manager
There is a a small issue when using GNU parallel as queue system/batch manager: You have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time, and job will start immediately if free slots are available. Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.

Related

How to make sbatch wait until last submitted job is *running* when submitting multiple jobs?

I'm running a numerical model which parameters are in a "parameter.input" file. I use sbatch to submit multiple iterations of the model, with one parameter in the parameter file changing every time. Here is the loop I use:
#!/bin/bash -l
for a in {01..30}
do
sed -i "s/control_[0-9][0-9]/control_${a}/g" parameter.input
sbatch --time=21-00:00:00 run_model.sh
sleep 60
done
The sed line changes a parameter in the parameter file. The
run_model.sh file runs the model.
The problem: depending on the resources available, a job might run immediately or stay pending for a few hours. With my default loop, if 60 seconds is not enough time to find resources for job n to run, the parameter file will be modified while job n is pending, meaning job n will run with the wrong parameters. (and I can't wait for job n to complete before submitting job n+1 because each job takes several days to complete)
How can I force batch to wait to submit job n+1 until job n is running?
I am not sure how to create an until loop that would grab the status of job n and wait until it changes to 'running' before submitting job n+1. I have experimented with a few things, but the server I use also hosts another 150 people's jobs, and I'm afraid too much experimenting might create some issues...
Use the following to grab the last submitted job's ID and its status, and wait until it isn't pending anymore to start the next job:
sentence=$(sbatch --time=21-00:00:00 run_model.sh) # get the output from sbatch
stringarray=($sentence) # separate the output in words
jobid=(${stringarray[3]}) # isolate the job ID
sentence="$(squeue -j $jobid)" # read job's slurm status
stringarray=($sentence)
jobstatus=(${stringarray[12]}) # isolate the status of job number jobid
Check that the job status is 'running' before submitting the next job with:
if [ "$jobstatus" = "R" ];then
# insert here relevant code to run next job
fi
You can insert that last snippet in an until loop that checks the job's status every few seconds.

Queue using several processes to launch bash jobs

I need to run many (hundreds) commands in shell, but I only want to have a maximum of 4 processes running (from the queue) at once. Each process will last several hours.
When a process finishes I want the next command to be "popped" from the queue and executed.
I also want to be able to add more process after the beginning, and it will be great if I could remove some jobs from the queue, or at least empty the queue.
I have seen solutions using makefile, but this only work if I have all my list of commands before the beginning. Also tried using mkfifo sjobq, and others, but I never could reach my needs...
Does anyone have code to solve this problem?
Edit: In response to Mark Setchell
The solution with tail -f and parallel is almost perfect, but when I do it, it always keep not launching the last 4 commands until I add more, and so on, I don't know why, and it is quite troublesome...
As for Redis, good solution also, but it takes more time to master all of it.
Thanks !
Use GNU Parallel to make a job queue like this:
# Clear out file containing job queue
> jobqueue
# Start GNU Parallel processing jobs from queue
# -k means "keep" output in order
# -j 4 means run 4 jobs at a time
tail -f jobqueue | parallel -k -j 4
# From another terminal, submit 40 jobs to the queue
for i in {1..40}; do echo "sleep 5;date +'%H:%M:%S Job $i'"; done >> jobqueue
Another option is to use REDIS - see my answer here Run several jobs parallelly and Efficiently

Hold a bash script on PBS status without Torque

I've access to a low priority queue on a large national system. I can allocate in the queue only 1 job at the time.
The PBS job contains a program who is not likely to complete before the wall-time ends. Jobs on hold can't be queued in a number that exceeds 3.
It means that:
I can not use -W depend=afterok:$ID_of_previous_job . The script would submit all the job at once, but just the first 3 will enter the queue (the last 2 in H state)
I can not modify the submission script with a last line that submit the next_job (it is very likely that the actual program won't finish before the walltime ends and then the last line is not executed.
I can not install any software so I am limited to use a Bash Script, rather than Torque
I'd rather not use a "time check" script (such as: every 5 minute check if previous_job is over)
Is it possible to use a while and or sleep ?
Option 1
To use a while and sleep requires you to do something very similar to a time check script:
#!/bin/bash
jobid=`submit the first job`
while [[ -z `qstat ${jobid} | grep C` ]]; do
sleep 5
done
# submit the new job once the loop is done, after checking the exit status if desired
Option 2 - may be TORQUE only, not sure:
Perhaps a better way, suggested by Dmitri Chubarov in the comments, would be to use the per-job epilogue option. To do this the compute nodes have to be able to submit jobs, but since you were considering having the final line of the job do it then this seems like a possibility.
Add to the job a perjob epilogue by adding this line to the script:
#PBS -l epilogue=/path/to/script
And then have the script:
#!/bin/bash
# check exit code if desired, its argument 10 to the script
# submit the next job

DATASTAGE: how to run more instance jobs in parallel using DSJOB

I have a question.
I want to run more instance of same job in parallel from within a script: I have a loop in which I invoke jobs with dsjob and without option "-wait" and "-jobstatus".
I want that jobs completed before script termination, but I don't know how to verify if job instance terminated.
I though to use wait command but it is not appropriate.
Thanks in advance
First,you should assure job compile option "Allow Multiple Instance" choose.
Second:
#!/bin/bash
. /home/dsadm/.bash_profile
INVOCATION=(1 2 3 4 5)
cd $DSHOME/bin
for id in ${INVOCATION[#]}
do
./dsjob -run -mode NORMAL -wait test demo.$id
done
project -- test
job -- demo
$id -- invocation id
the two line in shell scipt:guarantee the environment path can work.
Run the jobs like you say without the -wait, and then loop around running dsjob -jobinfo and parse the output for a job status of 1 or 2. When all jobs return this status, they are all finished.
You might find, though, that you check the status of the job before it actually starts running and you might pick up an old status. You might be able to fix this by first resetting the job instance and waiting for a status of "Not running", prior to running the job.
Invoke the jobs in loop without wait or job-status option
after your loop , check the jobs status by dsjob command
Example - dsjob -jobinfo projectname jobname.invocationid
you can code one more loop for this also and use sleep command under that
write yours further logic as per status of the jobs
but its good to create Job Sequence to invoke this multi-instance job simultaneously with the help of different invoaction-ids
create a sequence job if these are in same process
create different sequences or directly create different scripts to trigger these jobs simultaneously with invocation- ids and schedule in same time.
Best option create a standard generalized script where each thing will be getting created or getting value as per input command line parameters
Example - log files on the basis of jobname + invocation-id
then schedule the same script for different parameters or invocations .

Making qsub block until job is done?

Currently, I have a driver program that runs several thousand instances of a "payload" program and does some post-processing of the output. The driver currently calls the payload program directly, using a shell() function, from multiple threads. The shell() function executes a command in the current working directory, blocks until the command is finished running, and returns the data that was sent to stdout by the command. This works well on a single multicore machine. I want to modify the driver to submit qsub jobs to a large compute cluster instead, for more parallelism.
Is there a way to make the qsub command output its results to stdout instead of a file and block until the job is finished? Basically, I want it to act as much like "normal" execution of a command as possible, so that I can parallelize to the cluster with as little modification of my driver program as possible.
Edit: I thought all the grid engines were pretty much standardized. If they're not and it matters, I'm using Torque.
You don't mention what queuing system you're using, but SGE supports the '-sync y' option to qsub which will cause it to block until the job completes or exits.
In TORQUE this is done using the -x and -I options. qsub -I specifies that it should be interactive and -x says run only the command specified. For example:
qsub -I -x myscript.sh
will not return until myscript.sh finishes execution.
In PBS you can use qsub -Wblock=true <command>

Resources