I'm trying to use a Slurm-operated cluster to run LS-Dyna (a finite-element simulation program with a limited number of licenses available on my cluster). I am trying to write my batch scripts so that I do not waste processing time due to this license limit (as well as to improve legibility when running 'squeue' commands) by using job arrays -but I'm having trouble making that work.
I want to run identical Bash scripts in a variety of FEM meshes, each of which I have organized into different subfolders.
Given this folder structure on my cluster...
cluster root
|
...
|
|-+ my scratch space's root
|
|-+ this project
|
|--+ lat_-5mm
| |- runCurrentLine.bash
| |- other files
|
|--+ lat_-4.75mm
| |- runCurrentLine.bash
| |- other files
|
|--+ lat_-4.5mm
| |- runCurrentLine.bash
| |- other files
|
...
|
|--+ lat_5mm
| |- runCurrentLine.bash
| |- other files
|
|
|-sendDynaRuns.bash
|-other dependencies
...I'm trying to submit "runCurrentLine.bash" in each folder by running the following script in my login node.
#!/bin/bash
iter=0
for foldernow in */; do
# change to subdirectory for current line iteration
cd "./${foldernow}";
# make Slurm and user happy
echo "sending LS Dyna simulation for ${pos}mm line..."
sleep 1
# first line only: send batch, and get job ID
if [ "${iter}" == 0 ];then
# send the batch...
jobID=$(sbatch -J "Dyna" --array="${iter}"%15 runCurrentLine.bash)
# ...ensure that Slurm's output shows on console (which includes the job ID)...
echo "${jobID}"
# ...and extract the job ID and save as a variable
jobID=$(echo "${jobID}" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?')
# subsequent lines: add current line to job array
else
scontrol update --jobid="${jobID}" --array="${iter}"%15 runCurrentLine.bash
fi
# prepare to move onto next position
iter=$((iter+1))
cd ../
done
This setup properly sends the batch job for the first line, at -0.25mm*. However, for the second line onwards, it doesn't seem to do the same thing... This is what I end up getting on my console:
*: I intended the "lat_xmm" folders to be numerically ordered, but Unix doesn't seem to recognize that
$ ./sendDynaRuns.bash
sending LS Dyna simulation for -0.25mm line...
Submitted batch job 1081040
sending LS Dyna simulation for 0.25mm line...
sbatch: error: Batch job submission failed: Invalid job id specified
sending LS Dyna simulation for -0.5mm line...
sbatch: error: Batch job submission failed: Invalid job id specified
I know that runCurrentLine.bash runs just fine if I manually send it as a batch (and it runs to completion within the time limit I specified in-file, mainly since it doesn't have to compete with other lines for open licenses). What should I do to be able to get my code to work?
Thank you in advance!
As state by #Poshi, you cannot add jobs to an existing array.
I would create a submission script like this one:
#!/bin/bash
#SBATCH --array=1-<nb of folders>%15
# ALL OTHER SLURM SBATCH DIRECTIVES HERE
folders=(lat_*)
foldernow=${folders[$SLURM_TASK_ARRAY_ID]}
cd $foldernow && ./runCurrentLine.bash
The only drawback is that you need setup explicitly the number of jobs the array based on the number of folders.
Related
I have a simulation program in fortran which takes the input from a .dat. This file has 100.000 lines which takes really long to run. The program take the first line, run all the simulations and write in a .out the result and pass to the next line. I have a computer with 16 cpu so how can I do to split my data in 16 parts and run it separatly in each of the cpus? I am running in a machine with ubuntu. It is totally independent each line from the other.
For example my data is HeadData10000.dat, then I have a file simulation.ini with the name of the input data in this case: HeadData10000.dat and with the name of the output data. So the file simulation.ini will look like that
HeadData10000.dat
outputdata.out
Then now I have two computer so I split my HeadData10000.dat y two files and I do two simulation.ini for each input data and I run it like this in each computer: ./simulation.exe<./simulation.ini.
Assuming your list of 100,000 jobs is called "jobs.txt" and looks like this:
JobA
JobB
JobC
JobD
You could run this:
parallel 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
If you want to do a dry run to see what that would do without doing anything:
parallel --dry-run 'printf "{}\n{.}.out" | ./simulation.exe' < jobs.txt
Sample Output
printf "JobA\nJobA.out" | ./simulation.exe
printf "JobB\nJobB.out" | ./simulation.exe
printf "JobC\nJobC.out" | ./simulation.exe
printf "JobD\nJobD.out" | ./simulation.exe
If you have multiple servers available, look at using the -S parameter to GNU Parallel to spread the jobs across the machines. Also, look at the --eta and --bar parameters for getting progress reports.
I used printf "line1 \n line2" to generate two lines of input in order to avoid having to create, and later delete 100,000 files.
By default, GNU Parallel will keep 1 job per CPU core running, so there will always be 16 jobs running on your 16-core machine, but you can change that to, say, 8 if you want to with parallel -j 8. You can also specify the number of jobs to run on your second (and subsequent) machines.
I'm looking for a possibility to export all failed jobs (resolved and not) of a day into a file (text, csv, xml,..)
Tendency is, I will not be able to check all resolved/forced-ok jobs which failed all throughout the day unless I do it manually by placing in a spreadsheet.
Does anybody know if there is such an utility? We're currently using Control-M in Version 7.0 on Server
You can schedule a job do so :-
run below script as command line with passing two argument , %%PARM1 %%PARM2
you need to update two filed in it :-
1. NDP time of your Environment , I have used as 0930
2. Control-M environment name
3. your email ID in last line of mailx and file path as per your system .
***you can use mutt -a if mailx -a is not working in your system for sending email with attached file .
----------------------------------
now job :-
Job Type : Command
File Path : not reuired
C ommand : path/report.sh %%PARM1 %%PARM2
rest all as normal ,
but don't forget to define PARM1 and PRAM2 in auto edit variable
PARM1 = %%$PREV
PARM2 = %%$DATE
-------------------------
Script
***********************************
report.sh
------------------------------------------------
#!/bin/bash
env=< Control-M user name > # Use control-M name
ctmlog list $1 0930 $2 0930 | grep NOTOK > $1_failedjob.txt # update time to NDP time ,i used 0930
cut -d'|' -f2,3,4,5,8 $1_failedjob.txt | sed 's/|/,/g' > $1_failed.csv
awk 'BEGIN {print "DATE,TIME,JOBNAME\t,ORDERID\t,STATUS";}
{print $0;}
END { print "\tReport generated\n";}' $1_failed.csv
rm $1_failedjob.txt
echo " Last 24 Hour failed job list " | mailx -s "Failed Job list for $1" -a "absolute path of file $1_failed.csv" youremail#domain.com
exit 0`
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Apart from using this , you can always ask your ops team to send an report by exporting failed job for a particular time and date from Control-m EM GUI
I am totally green in those kind of things, I tried reading some tutorials and still couldn't do it on my own.
Here lies the problem: I have 2 files (bulid, compile) which are somehow supposed to take other files and create a working warcraft3 map.
There's an instruction I followed:
To build:_
$ ./scripts/compile # most basic way of calling compile
$ ./scripts/build # most basic way of building a map
build
Script which takes an unportected map and applies any build settings passed via argv
upon on it then turns it into a working warcraft3 map file.
Options | Default | Description
----------------|-------------------------|---------------------------------------------------------------------
env | beta | map environment: each environment has default build settings
debug_script | false | debug this build script
do_jasshelper | true | turns vJass & ZINC into JASS
do_compile | true | turns ../src into out.j. When false, looks for {map_script_path}
do_optimizer | false | uses Vexorian's map optimizer to protect and make the map run faser
do_widgetizer | false | uses PitzerMike's map widgetizer to make map load faster
debug | false | whether the --debug flag should be passed to jasshelper
launchwc3 | false | whether the script should launch wc3 with the map loaded on exit
map_unpro_path | base-maps/{highest}.w3x | the base map file to inject script into
map_script_path | ../out.j | map script path to load into map
map_output_path | ITT_{commit}_{time}.w3x | path where to put the compiled map
setting up
This application requires Ruby
$ git clone git#github.com:theQuazz/island-troll-tribes
$ cd island-troll-tribes
$ scripts/build
It might be a easy thing to do, but I am really bad at this kind of stuff, if someone could explain what to do step by step I would greatly appreciate it
I am submitting a toy array job in slurm. My command line is
$ sbatch -p development -t 0:30:0 -n 1 -a 1-2 j1
where j1 is script:
#!/bin/bash
echo job id is $SLURM_JOB_ID
echo array job id is $SLURM_ARRAY_JOB_ID
echo task id id $SLURM_ARRAY_TASK_ID
When I submit this, I get an error:
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...OK
--> Verifying availability of your home dir (/home1/03400/myname)...OK
--> Verifying availability of your work dir (/work/03400/myname)...OK
--> Verifying availability of your scratch dir (/scratch/03400/myname)...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (development)...OK
--> Verifying job request is within current queue limits...OK
--> Checking available allocation (PRJ-1234)...OK
sbatch: error: Batch job submission failed: Invalid job array specification
The same job works fine without the array specification:
$ sbatch -p development -t 0:30:0 -n 1 j1
This post is a bit old, but in case it happens for other people, I have had the same issue but the accepted answer did not suggest what was the problem in my case.
This error (sbatch: error: Batch job submission failed: Invalid job array specification) can also be raised when the array size is too large.
From https://slurm.schedmd.com/slurm.conf.html
MaxArraySize
The maximum job array size. The maximum job array task index value will be one less than MaxArraySize to allow for an index value of zero. Configure MaxArraySize to 0 in order to disable job array use. The value may not exceed 4000001. The value of MaxJobCount should be much larger than MaxArraySize. The default value is 1001.
To check the value, the slurm.conf file should be accessible by all slurm users (still according to 1) and may be found somewhere near /etc/slurm.conf (see https://slurm.schedmd.com/slurm.conf.html#lbAM, in my case I found it at path /etc/slurm/slurm.conf).
The syntax for your array specification is correct. But the printout you paste is not standard Slurm, I guess you are working on Stampede ; they have their own sbatch wrapper.
What you could do is use the -vvv option to sbatch to see exactly what Slurm sees:
$ sbatch -vvv -p development -t 0:30:0 -n 1 -a 1-2 j1 |& grep array
This should return
sbatch: array : 1-2
and if it does not it means the information is somehow lost somewhere.
What you can try is remove the array specification from the submission command line and insert it in the submission script, like this:
$ sbatch -p development -t 0:30:0 -n 1 j1
with j1 being
#!/bin/bash
#SBATCH -a 1-2
echo job id is $SLURM_JOB_ID
echo array job id is $SLURM_ARRAY_JOB_ID
echo task id id $SLURM_ARRAY_TASK_ID
The next step is to contact the system administrators with the information you will get from running the above tests and ask for help.
I am trying a code in shell script. while I am trying to convert the code from batch script to shell script I am getting an error.
BATCH FILE CODE
:: Create a file with all latest snapshots
FOR /F "tokens=5" %%a in (' ec2-describe-snapshots ^|find "SNAPSHOT" ^|sort /+64') do set "var=%%a"
set "latestdate=%var:~0,10%"
call ec2-describe-snapshots |find "SNAPSHOT"|sort /+64 |find "%latestdate%">"%EC2_HOME%\Working\SnapshotsLatest_%date-today%.txt"
CODE IN SHELL SCRIPT
#Create a file with all latest snapshots
FOR snapshot_date in $(' ec2-describe-snapshots | grep -i "SNAPSHOT" |sort /+64') do set "var=$snapshot_date"
set "latestdate=$var:~0,10"
ec2-describe-snapshots |grep -i "SNAPSHOT" |sort /+64 | grep "$latestdate">"$EC2_HOME%/SnapshotsLatest_$today_date"
I want to sort the snapshots according to dates and to save the snapshots that are created in latest date in a file.
SAMPLE OUTPUT OF ece-describe-snapshots:
SNAPSHOT snap-5e20 vol-f660 completed 2013-12-10T08:00:30+0000 100% 109030037527 10 2013-12-10: Daily Backup for i-2111 (VolID:vol-f9a0 InstID:i-2601)
It will contain records like this
I got this code :
latestdate=$(ec2-describe-snapshots | grep ^SNAPSHOT | sort -k 5 | awk '{print $5}')
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate | > "$EC2_HOME/SnapshotsLatest_$today_date"
but getting this error :
grep: 2013-12-10T09:55:34+0000: No such file or directory
grep: 2013-12-11T04:16:49+0000: No such file or directory
grep: 2013-12-11T04:17:57+0000: No such file or directory
i have some snapshots made on amazon, i want to find the latest snapshots made on a date and then want to store them in a file. like date 2013-12-10 snapshots made on this date should be stored in file. Contents of snapshotslatest file should be
SNAPSHOT snap-c17f3 vol-f69a0 completed 2013-12-04T09:24:50+0000 100% 109030037527 10 2013-12-04: Daily Backup for Sanjay_Test_Machine (VolID:vol-f66409a0 InstID:i-26048111)
SNAPSHOT snap-c7d617f9 vol-3d335f6b completed 2013-12-04T09:24:54+0000 100% 109030037527 10 2013-12-04: Daily Backup for sacht_VPC (VolID:vol-3db InstID:i-ed6)
please not that if there are snapshots created on 2013-12-10, 2013-12-11, 2013-12-12. It means that the latest_date should be 2013-12-12 and all the snaphshot created on 2013-12-12 should be saved in file.
Any suggestion or lead is appreciated.
Neither the batch script nor the shell script you posted are a good starting point so let's start from scratch. Sorry, this is too big for a comment.
You want to find the latest snapshots made on a date and then want to store them in a file.
What does that mean?
Do the snapshot files have a timestamp in their name or in their content?
If not - UNIX does not store file creation timestamps so is a last-modified timestamp adequate?
Do you literally want to concatenate all of your snapshot files into one singe file or do you want to create a file that has a list of the snapshot file names?
Post some sample input (e.g. some snapshot file names and contents if that's where the timestamp is stored) and the expected output given that input.
Update your question to address all of the above, do not try to reply in a comment.
Minor issue, you don't need a pipe when re-directing output, so your line to save should be
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate > "$EC2_HOME/SnapshotsLatest_$today_date"
Now the main issue here, is that the grep is messed up. I haven't worked with amazon snapshots, but judging by your example descriptions, you should be doing something like
latestdate=$(ec2-describe-snapshots | grep -oP "\d+-\d+-\d+" | sort -r | head -1)
This will get all the dates containing the form dddd-dd-dd from the file (I'm assuming the two dates in each snapshot line always match up), sort them in reverse order (latest first) and take the head which is the latest date, storing it in $latestdate.
Then to store all snapshots with the given date do something like
ec2-describe-snapshots | grep -oP "SNAPSHOT(.*?)$lastdateT(.*?)\)" > "$EC2_HOME/SnapshotsLatest_$today_date"
This will get all text starting with SNAPSHOT, containing the given date, and ending in a closing ")" and save it. Note, you may have to mess around with it a bit, if ")" can be present elsewhere.