sbatch error: Memory specification can not be satisfied

I want to submit a sequential job, but I get:
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
This is my .sh file:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=job-8-0
#SBATCH --mem=64000mb
#SBATCH --exclusive
module purge
module load gcc-8.3.0-gcc-4.8.5-tu6ftrf
echo "Starting job-8-0"
echo "Starting at `date`"
cd code
srun gcc -Wno-return-type file1.cpp file2.cpp file3.cpp file4.cpp file5.cpp main.cpp -o myExperiment -lstdc++ -lm
srun ./myExperiment 8 0
echo "Experiment 8-0 finished with exit code $? at: `date`"
The node login01 info is:
NodeName=login01 Arch=x86_64 CoresPerSocket=8
CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=29.91
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:8
NodeAddr=10.0.50.0 NodeHostName=login01 Version=17.11
OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
RealMemory=1 AllocMem=0 FreeMem=65001 Sockets=2 Boards=1
State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
BootTime=2021-05-25T13:13:10 SlurmdStartTime=2021-05-25T16:35:31
CfgTRES=cpu=32,mem=1M,billing=32
AllocTRES=
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=gres/gpu count too low (0 < 8) [slurm@2021-06-28T13:38:40]
There are also other nodes with FreeMem=122000, 121000, etc., which is more than 64000 MB.
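(For reference, a rough way to compare the memory Slurm has configured for each node against the free memory it reports is something like the sketch below; %m and %e are the sinfo format codes for configured and free memory in MB.)
# Node name, configured memory (MB), free memory (MB), state
sinfo -N -o "%N %m %e %T"
# Or inspect a single node
scontrol show node login01 | grep -E 'RealMemory|FreeMem'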
These are the specifications of the supercomputer:
• OS: Linux CentOS 7
• 300 compute nodes
• Each node has:
o 2 CPUs: Xeon E5-2650 8 Cores 2.000GHz (total 16 cores)
o 2 dual AMD FirePro S10000 GPUs
o Memory : 128 GB RAM
• Scheduler: Slurm
When I open nodes.conf, it contains:
NodeName=login01 NodeAddr=10.0.50.0 CPUs=32 Procs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=IDLE
which is the same for all nodes.
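For comparison, a node line that also declares its memory would look roughly like this (a sketch only: RealMemory is in MB, and 128000 simply assumes the 128 GB per node listed above; when RealMemory is omitted, Slurm falls back to its default of 1 MB, which matches the RealMemory=1 shown by scontrol):
NodeName=login01 NodeAddr=10.0.50.0 CPUs=32 Procs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=128000 State=IDLE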
This is slurm.conf:
ClusterName=sanam
ControlMachine=mgmt01
ControlAddr=10.0.1.254
SlurmUser=slurm
SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=2
GresTypes=gpu
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none
AccountingStorageHost=10.0.1.254
include /etc/slurm/nodes.conf
include /etc/slurm/partitions.conf
#include /etc/slurm/gres.conf
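(Side note: running slurmd -C on a compute node prints the node definition Slurm detects for that node, including RealMemory, which can be compared against what is in nodes.conf. A small sketch:)
# Run on the node itself; prints a NodeName=... line with the detected CPUs, sockets and RealMemory
slurmd -C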
What causes the "Memory specification can not be satisfied" error?
Should I specify the RAM and CPU with srun, specifically on the second srun that runs the experiment?
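For what it's worth, this is roughly what giving the experiment step its own limits would look like (a sketch; the values are placeholders, and a step can never get more than the enclosing job allocation):
# Hypothetical per-step request for the experiment step
srun --mem=64000 --cpus-per-task=1 ./myExperiment 8 0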

Related

multiple srun jobs within a single sbatch killed unexpectedly

I was trying to run multiple srun jobs within a single sbatch script on a cluster. The sbatch script is as follows:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 64
#SBATCH --time=200:00:00
#SBATCH -p amd_256
for i in {0..6} ;
do
    cd ${i}
    ( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 & )
    sleep 20
    cd ..
done
cd 7/
srun -c 8 ./MD 100 20 300 20 20 0 0 > log.out 2>&1
cd ..
wait
In this script I launch multiple srun job steps. One problem is that steps 0-6 are killed once the 7th step finishes. Here is the error message I get for steps 0-6:
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 3801214.0 ON j2308 CANCELLED AT 2021-12-22T11:02:22 ***
srun: error: j2308: task 0: Terminated
Any idea on how to fix this?
The line
( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 & )
creates a subshell and puts the srun into the background inside that subshell. So the wait call in the last line doesn't know about those background processes, as they belong to a different shell/process. And since the batch script is then finished, the job gets terminated.
Try this:
( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 ) &
As an example: Try
( sleep 60 & )
wait
and
( sleep 60 ) &
wait
to see the difference.
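Putting that together, the loop from the question would then look roughly like this (a sketch keeping the original arguments):
for i in {0..6} ;
do
    cd ${i}
    # Background the whole subshell so the final wait sees it
    ( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 ) &
    sleep 20
    cd ..
done
cd 7/
srun -c 8 ./MD 100 20 300 20 20 0 0 > log.out 2>&1
cd ..
wait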

out of range for vm-bytes stress-ng

We are using the following:
stress-ng: 0.07.16-1
Debian: 9.13
kernel: 4.19.133
I wanted to start a stress test on a machine with 1.5 TB of RAM. Usually we start the stress test like this:
sudo systemd-run --slice=system.slice stress-ng -m64 -c64 -f64 --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep --timeout 48h
And this time I got back this error:
Value 1 is out of range for vm-bytes, allowed: 4096 .. 4294967296
Can someone help me out here?
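One workaround sketch, under the assumption that --vm-bytes in this stress-ng version is a per-worker value capped at the 4294967296 bytes the error reports: split 90% of MemAvailable across the 64 vm workers and clamp it below that bound. This does not explain the odd "Value 1" in the message, so treat it as an untested guess:
# Sketch: 90% of MemAvailable split over 64 vm workers, capped at 4 GiB (4194304 kB) per worker
PER_WORKER_KB=$(awk '/MemAvailable/{v=int($2*0.9/64); if (v>4194304) v=4194304; print v}' /proc/meminfo)
sudo systemd-run --slice=system.slice stress-ng -m64 -c64 -f64 \
  --vm-bytes ${PER_WORKER_KB}k --vm-keep --timeout 48h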

Why is bash breaking MPI job control loop

I'm attempting to use a simple bash script to sequentially run a batch of MPI jobs. This script works perfectly when running serial code (I am using Fortran 90), but for some reason bash breaks out of the loop when I attempt to execute MPI code.
I already found a work-around to the problem. I just wrote essentially the exact same script in Perl and it worked like a charm. I just really want to understand the issue here because I prefer the simplicity of bash and it perfectly fits my own scripting needs in almost all other cases.
I've tried running the MPI code as a background process and using wait with the same result. If I run the jobs in the background without using wait, bash does not break out of the loop, but it stacks up jobs until eventually crashing. The goal is to run the executable sequentially for each parameter set anyway, I just wanted to note that the loop is not broken in that case.
Bash script, interp.sh. Usage: ./interp.sh <prog> <inputfile>
#!/bin/bash
PROG=$1
IFILE=$2
kount=0 # Counter variable for looping through input file
sys=0 # Counter variable to store how many times model has been run
while IFS="\n" read -r line
do
    kount=$(( $kount + 1 ))
    if [ $(( kount % 2 )) -eq 1 ]   # if kount is odd, then expect headers
    then
        unset name defs
        sys=$(( $sys + 1 ))
        name=( $line )              # parse headers
        defs=${#name[*]}
        k=$(( $defs - 1 ))
    else                            # if kount is even, then expect numbers
        unset vals
        vals=( $line )              # parse parameters
        for i in $( seq 0 $k )
        do
            # Define variables using header names and set their values
            printf -v "${name[i]}" "${vals[i]}"
        done
        # Print input variable values
        echo $a $b $c $d $e $nPROC
        # Run executable
        mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG
    fi
done < $IFILE
Input file, input.dat:
a b c d e nPROC
1 2 3 4 5 2
nPROC
3
nPROC
4
nPROC
5
nPROC
6
nPROC
7
nPROC
8
Sample MPI f90 code, main.f90:
program main
    use mpi
    implicit none
    integer :: i, ierr, myID, nPROC
    integer, parameter :: foolen = 100000
    double precision, dimension(0:foolen) :: foo

    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nPROC, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myID, ierr)

    if ( myID .eq. 0 ) then
        do i=0,foolen
            foo(i) = i
        end do
    else
        do i=0,foolen
            foo(i) = i
        end do
    end if

    call MPI_FINALIZE(ierr)
end program
Sample makefile:
COMP=mpif90
EXT=f90
CFLAGS=-Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=all \
       -fbacktrace
MPIflags=--oversubscribe --hostfile my_hostfile
PROG=main.x
INPUT=input.dat
OUTPUT=output
OBJS=main.o

$(PROG): $(OBJS)
	$(COMP) $(CFLAGS) -o $(PROG) $(OBJS) $(LFLAGS)

main.o: main.f90
	$(COMP) -c $(CFLAGS) main.f90

%.o: %.f90
	$(COMP) -c $(CFLAGS) $<

run:
	make && make clean
	./interp.sh $(PROG) $(INPUT)

clean:
	rm -f *.o DONE watch
my_hostfile
localhost slots=4
Note that if the mpiexec line is commented out, the script runs as expected. The output looks like this:
1 2 3 4 5 2
1 2 3 4 5 3
1 2 3 4 5 4
1 2 3 4 5 5
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
These are the parameter values which are supposed to be passed to the MPI code in each loop. However, when mpiexec is called in the script, only the first set of parameters is read and passed.
I apologize if all that is a bit excessive, I just wanted to provide all that is needed for testing. Any help solving the issue in bash or explanation of why this happens would be greatly appreciated!
mpiexec is consuming stdin, and therefore reads all of the remaining lines meant for the loop. After the first iteration stdin is empty and the loop ends.
This issue occurs not only with loops that call mpiexec from within, but also with loops around other commands that consume stdin by default, such as ssh.
The general solution is to add < /dev/null so that the offending command reads from /dev/null instead of consuming the loop's stdin. Some commands have a special flag that replaces the redirect, such as ssh -n.
So the solution in this case is to add the redirect at the end of the line where mpiexec is called:
mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG < /dev/null
There are some standard I/O issues to pay attention to in the case of mpiexec, detailed here: https://www.open-mpi.org/doc/v3.0/man1/mpiexec.1.php#toc14
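An alternative that leaves the mpiexec line untouched is to feed the loop from a separate file descriptor, so that nothing inside the loop can consume its input. A sketch of the idea, using the names from the question:
# Read the input file on fd 3 instead of stdin, so mpiexec cannot drain it
while read -r -u 3 line
do
    echo "read: $line"
    mpiexec -np 2 --oversubscribe --hostfile my_hostfile ./main.x
done 3< input.dat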

cgroups blkio subsystem is not counting the block write byte count properly for container applications

I am working on Linux kernel 3.14 and I have enabled cgroups and the blkio subsystem to check the write byte count of a block device from container and host applications.
However, I have problems getting the written bytes from the cgroup blkio throttling counters for the container application.
It works for the main hierarchy (e.g. /sys/fs/cgroup/blkio/blkio.throttle.io_service_bytes), but not for the deeper ones (e.g. /sys/fs/cgroup/blkio/lxc/web, where the container name is web).
I created a small test script (checkWrite), which simply enters the cgroup it is started in (pwd) and writes 1 MB of data.
#!/bin/bash
SIZE=1M
DST="/home/root"
#check if we are in the /sys/fs/cgroup/ dir
if [ ! -e ./tasks ]; then
    echo "Error, this script must be started in a cgroup blkio directory"
    echo "Start in or below /sys/fs/cgroup/blkio !"
    exit -1
fi
echo "Using the cgroup: ${PWD##*/cgroup}"
# add myself to cgroup
echo $$ > tasks
mygroup=`cat /proc/$$/cgroup | grep blkio`
echo "we're now in bklio cgroup: ${mygroup}"
# call sync to let kernel store data
sync
sleep 1
# fetch current written byte count for eMMC
before=$(cat blkio.throttle.io_service_bytes | grep "179:24 Write")
echo "before writing: ${before}"
echo "writing ${SIZE} random data to ${DST}/DELME ..."
dd if=/dev/urandom of=${DST}/DELME bs=${SIZE} count=1
sync
sleep 2
# fetch current written byte count for eMMC
after=$(cat blkio.throttle.io_service_bytes | grep "179:24 Write")
echo "after writing: ${after}"
written=$((${after##* }-${before##* }))
written=$((written/1024))
echo "written = ${after##* }B - ${before##* }B = ${written}kB"
rm -rf ${DST}/DELME
The output is:
/sys/fs/cgroup/blkio# ~/checkWrite
Using the cgroup: /blkio
we're now in bklio cgroup: 3:blkio:/ <- this task is in this blkio cgroup now
before writing: 179:24 Write 200701952 <- from blkio.throttle.io_service_bytes
writing 1M random data to /var/opt/bosch/dynweb/DELME ...
1+0 records in
1+0 records out
after writing: 179:24 Write 201906176
written = 201906176B - 200701952B = 1176kB   <- fairly ok
/sys/fs/cgroup/blkio/lxc/web# ~/checkWrite
Using the cgroup: /blkio/system.slice
we're now in bklio cgroup: 3:blkio:/system.slice
before writing: 179:24 Write 26064896
writing 1M random data to /var/opt/bosch/dynweb/DELME ...
1+0 records in
1+0 records out
after writing: 179:24 Write 26130432
written = 26130432B - 26064896B = 64kB   <- far too little
Do I misunderstand the handling?
If this is not the right way, how can I monitor/watch/read block device writes from the container applications?
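One thing that may be worth checking (an assumption, not a confirmed explanation): with cgroup v1 blkio, buffered writes are typically accounted at writeback time to the root cgroup rather than to the child cgroup that issued them, so a direct write may be attributed more faithfully. A hedged variant of the dd line:
# Sketch: bypass the page cache so the write is charged to the calling cgroup
dd if=/dev/urandom of=${DST}/DELME bs=1M count=1 oflag=direct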

Mac OS X: GNU parallel can't find the number of cores on a remote server

I used Homebrew to install GNU parallel on my Mac so I can run some tests remotely on my university's servers. I was quickly running through the tutorials, but when I ran
parallel -S <username>@$SERVER1 echo running on ::: <username>@$SERVER1
I got the message
parallel: Warning: Could not figure out number of cpus on <username@server> (). Using 1.
Possibly related: I never added parallel to my path, and got a warning that "parallel" wasn't a recognized command, but parallel ran anyway and still echoed correctly. This particular server has 16 cores; how can I get parallel to recognize them?
GNU Parallel is less tested on OS X as I do not have access to an OS X installation, so you have likely found a bug.
GNU Parallel has since 20120322 used these to find the number of CPUs:
sysctl -n hw.physicalcpu
sysctl -a hw 2>/dev/null | grep [^a-z]physicalcpu[^a-z] | awk '{ print \$2 }'
And the number of cores:
sysctl -n hw.logicalcpu
sysctl -a hw 2>/dev/null | grep [^a-z]logicalcpu[^a-z] | awk '{ print \$2 }'
Can you test what output you get from those?
Which version of GNU Parallel are you using?
As a workaround you can force GNU Parallel to detect 16 cores:
parallel -S 16/<username>@$SERVER1 echo running on ::: <username>@$SERVER1
Since version 20140422 you have been able to export your path to the remote server:
parallel --env PATH -S 16/<username>@$SERVER1 echo running on ::: <username>@$SERVER1
That way you just need to add the directory where parallel lives on the server to your PATH on the local machine. E.g. if parallel on the remote server is in /home/u/user/bin/parallel:
PATH=$PATH:/home/u/user/bin parallel --env PATH -S <username>@$SERVER1 echo running on ::: <username>@$SERVER1
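To see what GNU Parallel itself detects on either machine, it can also be asked directly (a sketch, assuming the installed versions support these options):
# What parallel detects locally
parallel --number-of-cpus
parallel --number-of-cores
# What it detects when started on the remote server
ssh <username>@$SERVER1 parallel --number-of-cores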
Information for Ole
My iMac (OS X Mavericks on an Intel Core i7) gives the following, which all looks correct:
sysctl -n hw.physicalcpu
4
sysctl -a hw
hw.ncpu: 8
hw.byteorder: 1234
hw.memsize: 17179869184
hw.activecpu: 8
hw.physicalcpu: 4
hw.physicalcpu_max: 4
hw.logicalcpu: 8
hw.logicalcpu_max: 8
hw.cputype: 7
hw.cpusubtype: 4
hw.cpu64bit_capable: 1
hw.cpufamily: 1418770316
hw.cacheconfig: 8 2 2 8 0 0 0 0 0 0
hw.cachesize: 17179869184 32768 262144 8388608 0 0 0 0 0 0
hw.pagesize: 4096
hw.busfrequency: 100000000
hw.busfrequency_min: 100000000
hw.busfrequency_max: 100000000
hw.cpufrequency: 3400000000
hw.cpufrequency_min: 3400000000
hw.cpufrequency_max: 3400000000
hw.cachelinesize: 64
hw.l1icachesize: 32768
hw.l1dcachesize: 32768
hw.l2cachesize: 262144
hw.l3cachesize: 8388608
hw.tbfrequency: 1000000000
hw.packages: 1
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.supplementalsse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.x86_64: 1
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.rdrand: 0
hw.optional.f16c: 0
hw.optional.enfstrg: 0
hw.optional.fma: 0
hw.optional.avx2_0: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.rtm: 0
hw.optional.hle: 0
hw.cputhreadtype: 1
hw.machine = x86_64
hw.model = iMac12,2
hw.ncpu = 8
hw.byteorder = 1234
hw.physmem = 2147483648
hw.usermem = 521064448
hw.pagesize = 4096
hw.epoch = 0
hw.vectorunit = 1
hw.busfrequency = 100000000
hw.cpufrequency = 3400000000
hw.cachelinesize = 64
hw.l1icachesize = 32768
hw.l1dcachesize = 32768
hw.l2settings = 1
hw.l2cachesize = 262144
hw.l3settings = 1
hw.l3cachesize = 8388608
hw.tbfrequency = 1000000000
hw.memsize = 17179869184
hw.availcpu = 8
sysctl -n hw.logicalcpu
8
