"no space" error when using shell script with ulimit stack at 32K - shell

The objective is to collect information about the top 10 processes using memory. This will help to identify the top user over a period of time. The following script is being used, but it stops after a while with a "no space" error once the 32K limit is reached.
#!/usr/bin/ksh
while :
do
today=`date +"%Y%m%d_%H%M%S"`
top=`svmon -P`
sum=`svmon -P -t10 -O summary=basic`
echo "$today" >> svmonps.out
echo "$top" >> svmonps.out
echo "$sum" >> svmonps.out
sleep 30
done
exit 0
The current ulimit -a setting for stack (kbytes) is 32768. Can we modify the script to continue in spite of the ulimit restriction?
Thanks in advance.

You can give up using those variables and sub-shells.
Run date and svmon directly, writing to standard output, and append to svmonps.out when you call your script.
Check this out:
#!/usr/bin/ksh
while :; do
date +"%Y%m%d_%H%M%S"
svmon -P
svmon -P -t10 -O summary=basic
sleep 30
done
exit 0
And when you run your script, invoke it like this to append to your file:
$ myScript >> svmonps.out
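If you prefer to keep the redirection inside the script, a minimal sketch of the same idea is to redirect the whole loop once (here with a bounded loop and a dummy echo standing in for the svmon calls, so it terminates):

```shell
#!/bin/sh
# Sketch: redirect the entire loop once, so nothing is accumulated in
# shell variables and no per-command appends are needed.
# "sample payload" stands in for the svmon output.
i=0
while [ "$i" -lt 3 ]; do
    date +"%Y%m%d_%H%M%S"
    echo "sample payload"
    i=$((i + 1))
done >> svmonps.out
```

In the real script the loop would stay infinite and run the svmon commands; the single redirection on `done` applies to every command inside the loop.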

Related

How to submit a job array in Hoffman2 if I have a limit of 500 jobs?

I need to submit a job array of 100'000 jobs on Hoffman2. I have a limit of 500. Thus, starting at job 500, I get the following error:
Unable to run job: job rejected: Only 500 jobs are allowed per user (current job count: 500). Job of user "XX" would exceed that limit. Exiting.
Right now the submission Bash code is:
#!/bin/bash
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
I tried to modify my code according to some Slurm documentation, but apparently it does not work for Hoffman2 (in Slurm, adding % lets you set the number of simultaneously running jobs):
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000%500
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
Do you know how I can modify my submission Bash code in order to always have 500 running jobs?
Assuming that your job is visible as
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh
in the output of ps -ef, you could try something quick and dirty like:
#!/bin/bash
maxproc=490
while : ; do
qproc=$(ps -ef | grep '/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh' | grep -v grep | wc -l)
if [ "$qproc" -lt $maxproc ] ; then
submission_code #with correct arguments
fi
sleep 10 # or anytime that you feel appropriate
done
Of course, this shows only the principle; you may need to do some testing on whether there are more submission processes running. I also assumed the submission code backgrounds itself. And so on. But you'll get the idea.
A possible approach (free of busy waiting and ugliness of that kind) is to track the number of jobs on the client side, cap their total count at 500 and, each time any of them finishes, immediately start a new one to replace it. (This is, however, based on the assumption that the client script outlives the jobs.) Concrete steps:
Make the qsub tool block and (passively) wait for the completion of its remote job. Depending on the particular qsub implementation, it may have a -sync flag or something more complex may be needed.
Keep exactly 500, no more and, if possible, no fewer waiting instances of qsub. This can be automated by using this answer or this answer and setting MAX_PARALLELISM to 500 there. qsub itself would be started from the do_something_and_maybe_fail() function.
Here’s a copy&paste of the Bash outline from the answers linked above, just to make this answer more self-contained. Starting with a trivial and runnable harness / dummy example (with a sleep instead of a qsub -sync):
#!/bin/bash
set -euo pipefail
declare -ir MAX_PARALLELISM=500 # pick a limit
declare -i pid
declare -a pids=()
do_something_and_maybe_fail() {
### qsub -sync "$@" ... ### # add the real thing here
sleep $((RANDOM % 10)) # remove this :-)
return $((RANDOM % 2 * 5)) # remove this :-)
}
for pname in some_program_{a..j}{00..60}; do # 610 items
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n \
&& echo "${pids[pid]} succeeded" 1>&2 \
|| echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail & # forking here
pids[$!]="${pname}"
echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
wait -n "$((pid))" \
&& echo "${pids[pid]} succeeded" 1>&2 \
|| echo "${pids[pid]} failed with ${?}" 1>&2
done
The first loop needs to be adjusted for the specific use case. An example follows, assuming that the right do_something_and_maybe_fail() implementation is in place and that one_command_per_line.txt is a list of arguments for qsub, one invocation per line, with an arbitrary number of lines. (The script could accept a file name as an argument or just read the commands from standard input, whatever works best.) The rest of the script would look exactly like the boilerplate above, keeping the number of parallel qsubs at MAX_PARALLELISM at most.
while read -ra job_args; do
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n \
&& echo "${pids[pid]} succeeded" 1>&2 \
|| echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail "${job_args[@]}" & # forking here
pids[$!]="${job_args[*]}"
echo "${#pids[@]} running" 1>&2
done < /path/to/one_command_per_line.txt

crontab: I have the error "TERM environment variable not set"

I'm using the top command in a shell script; when it's executed from crontab I get the error "TERM environment variable not set".
Below is my script:
#!/bin/bash
HOST=`hostname`
echo "---------------------------------------------------------------------------"
echo "Check cpu load & Memory with top on $HOST at $(date +%d/%m/%y-%H:%M:%S)"
echo "---------------------------------------------------------------------------"
echo ""
/usr/bin/top -n 1
echo ""
echo ""
echo "------check zombie process---"
/usr/bin/top -n 1 |grep zombie
echo "-----------------------------"
Result after the crontab script is executed:
00 14 * * * /home/doug/topcommand.sh > /home/doug/check_`hostname`_`date +\%Y\%m\%d`.log 2>&1
TERM environment variable not set.
I'm expecting a top command result.
You need to run top in batch mode for it to work in a crontab.
Use /usr/bin/top -b -n 1
It may also be useful to specify the output width you want with -w, e.g., -w 512 to get very long lines of output.
Ref: https://man7.org/linux/man-pages/man1/top.1.html
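Putting those two flags together, a sketch of the fixed script might look like this (-w 512 is optional and assumes a procps-ng top):

```shell
#!/bin/bash
# Sketch: -b (batch mode) makes top write plain text without needing a
# TTY, so cron can run it and no TERM variable is required.
HOST=$(hostname)
echo "Check cpu load & memory with top on $HOST at $(date +%d/%m/%y-%H:%M:%S)"
/usr/bin/top -b -n 1 -w 512
echo "------check zombie process---"
/usr/bin/top -b -n 1 | grep zombie
```

The crontab entry from the question can stay as it is; only the top invocations inside the script change.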

How to wait in bash till a shell script is finished?

Right now I'm using this script for a program:
export FREESURFER_HOME=$HOME/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
cd /home/ubuntu/fastsurfer
datadir=/home/ubuntu/moya/data
fastsurferdir=/home/ubuntu/moya/output
mkdir -p $fastsurferdir/logs # create log dir for storing nohup output log (optional)
while read p ; do
echo $p
nohup ./run_fastsurfer.sh --t1 $datadir/$p/orig.nii \
--parallel --threads 16 --sid $p --sd $fastsurferdir > $fastsurferdir/logs/out-${p}.log &
sleep 3600s
done < /home/ubuntu/moya/data/subjects-list.txt
Instead of using sleep 3600s (as the program needs around an hour), I'd like to wait until all processes (several PIDs) are finished.
If this is the right way, can you tell me how to do that?
BR Alex
wait will wait for all background processes to finish (see help wait). So all you need is to run wait after creating all of the background processes.
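A minimal sketch of that pattern, with sleep standing in for the nohup ./run_fastsurfer.sh calls:

```shell
#!/bin/bash
# Sketch: start every job in the background, then a single `wait`
# blocks until all of them have exited.
for t in 1 2 1; do
    sleep "$t" &    # stands in for: nohup ./run_fastsurfer.sh ... &
done
wait                # returns only when all background jobs are done
echo "all jobs finished"
```

In the question's loop this means dropping the sleep 3600s, keeping the & on the nohup line, and adding one wait after done.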
This may be more than what you are asking for but I figured I would provide some methods for controlling the number of threads you want to have running at once. I find that I always want to limit the number for various reasons.
Explanation
The following will limit concurrent threads to max_threads running at one time. I am also using the main design pattern, so we have a main() that runs the script and a run_jobs function that handles the launching and waiting. I read the subjects list into an array, then traverse that array as we launch threads. On each pass the loop either launches a thread (while fewer than max_threads are running) or just sleeps; once a slot frees up it starts another thread. When all subjects have been launched, it waits for any remaining ones to finish. If you want something more simplistic I can do that as well.
#!/usr/bin/env bash
export FREESURFER_HOME=$HOME/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
typeset max_threads=4
typeset subjects_list="/home/ubuntu/moya/data/subjects-list.txt"
typeset subjectsArray
run_jobs() {
local parent="$$"
local num_children=0
local i=0
while true ; do
# count our direct children; subtract 1 for the ps in the pipeline
num_children=$(ps --no-headers -o pid --ppid=$parent | wc -w) ; ((num_children-=1))
echo "Children: $num_children"
if [[ ${num_children} -lt ${max_threads} && $i -lt ${#subjectsArray[@]} ]] ;then
# launch the next subject in the background
./run_fastsurfer.sh --t1 $datadir/${subjectsArray[$i]}/orig.nii \
--parallel --threads 16 --sid ${subjectsArray[$i]} --sd $fastsurferdir &
((i+=1))
elif [[ $i -ge ${#subjectsArray[@]} ]] ;then
break # every subject has been launched
fi
sleep 10
done
wait # block until the remaining background jobs finish
}
main() {
cd /home/ubuntu/fastsurfer
datadir=/home/ubuntu/moya/data
fastsurferdir=/home/ubuntu/moya/output
mkdir -p $fastsurferdir/logs # create log dir for storing nohup output log (optional)
mapfile -t subjectsArray < ${subjects_list}
run_jobs
}
main
Note: I did not run this code since you have not provided enough information to actually do so.

testing a program in bash

I wrote a program in c++ and now I have a binary. I have also generated a bunch of tests for testing. Now I want to automate the process of testing with bash. I want to save three things in one execution of my binary:
execution time
exit code
output of the program
Right now I am stuck with a script that only tests that the binary does its job and returns 0, and doesn't save any of the information I mentioned above. My script looks like this:
#!/bin/bash
if [ "$#" -ne 2 ]; then
echo "Usage: testScript <binary> <dir_with_tests>"
exit 1
fi
binary="$1"
testsDir="$2"
for test in $(find $testsDir -name '*.txt'); do
testname=$(basename $test)
encodedTmp=$(mktemp /tmp/encoded_$testname)
decodedTmp=$(mktemp /tmp/decoded_$testname)
printf 'testing on %s...\n' "$testname"
if ! "$binary" -c -f $test -o $encodedTmp > /dev/null; then
echo 'encoder failed'
rm "$encodedTmp"
rm "$decodedTmp"
continue
fi
if ! "$binary" -u -f $encodedTmp -o $decodedTmp > /dev/null; then
echo 'decoder failed'
rm "$encodedTmp"
rm "$decodedTmp"
continue
fi
if ! diff "$test" "$decodedTmp" > /dev/null ; then
echo "result differs with input"
else
echo "$testname passed"
fi
rm "$encodedTmp"
rm "$decodedTmp"
done
I want to save the output of $binary in a variable and not send it to /dev/null. I also want to record the run time using the bash time keyword.
As you asked for the output to be saved in a shell variable, I tried answering this without using output redirection, which saves output in (temporary) text files that then have to be cleaned up.
Saving the command output
You can replace this line
if ! "$binary" -c -f $test -o $encodedTmp > /dev/null; then
with
if ! output=$("$binary" -c -f $test -o $encodedTmp); then
Using command substitution saves the program output of $binary in the shell variable. Command substitution (combined with shell variable assignment) also allows exit codes of programs to be passed up to the calling shell so the conditional if statement will continue to check if $binary executed without error.
You can view the program output by running echo "$output".
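A small sketch of this behavior, with `printf ...; false` standing in for a failing "$binary" invocation:

```shell
#!/bin/sh
# Sketch: command substitution captures stdout in `output`, while the
# substituted command's exit status still drives the `if`.
if ! output=$(printf 'encoded data'; false); then
    echo "binary failed; captured output: $output"
fi
```

The exit status of a plain assignment is the exit status of the command substitution, which is why the `if !` test keeps working.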
Saving the time
Without a more sophisticated form of Inter-Process Communication, there’s no way for a shell that’s a sub-process of another shell to change the variables or the environment of its parent process, so the only way I could save both the time and the program output was to combine them in one variable:
if ! time_output=$( (time "$binary" -c -f $test -o $encodedTmp) 2>&1 ); then
Since time prints its profiling information to stderr, I use the parentheses operator to run the command in a subshell whose stderr can be redirected to stdout. The program output and the output of time can be viewed by running echo "$time_output", which should return something similar to:
<program output>
<blank line>
real 0m0.041s
user 0m0.000s
sys 0m0.046s
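Here is a runnable sketch of that capture, with echo standing in for "$binary" (bash, since time here is the shell keyword):

```shell
#!/bin/bash
# Sketch: run the command under the `time` keyword inside a group and
# merge stderr (where time reports) into the captured output.
time_output=$( { time echo "program output"; } 2>&1 )
echo "$time_output"
```

Both the program's stdout and the real/user/sys lines end up in the one variable, as described above.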
You can get the exit status in bash by using $? and print it out with echo $?.
And to catch the output of time, you could use something like this:
{ time sleep 1 ; } 2> time.txt
Or you can save the output of the program and the execution time at once:
(time ls) > out.file 2>&1
You can save output to a file using output redirection. Just change first /dev/null line:
if ! "$binary" -c -f $test -o $encodedTmp > /dev/null; then
to
if ! "$binary" -c -f $test -o $encodedTmp > prog_output; then
then change second and third /dev/null lines respectively:
if ! "$binary" -u -f $encodedTmp -o $decodedTmp >> prog_output; then
if ! diff "$test" "$decodedTmp" >> prog_output; then
To measure program execution time, put
start=$(date +%s)
on the first line, then
end=$(date +%s)
echo "Execution time in seconds: " $((end-start)) >> prog_output
at the end.
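As a self-contained sketch of this timing approach (sleep stands in for the binary under test; resolution is one second):

```shell
#!/bin/sh
# Sketch: wall-clock timing with date +%s around the command under test.
start=$(date +%s)
sleep 2            # stands in for running "$binary" on a test case
end=$(date +%s)
echo "Execution time in seconds: $((end - start))"
```

Note that this measures elapsed wall-clock time only; unlike time, it gives no user/sys CPU breakdown.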

create a lock file in bash to avoid duplicate execution

I'm not very good at bash. I've been modifying some code to create a lock file so a cron job doesn't execute a second time if the first process hasn't finished.
LOCK_FILE=./$(hostname)-lock
(set -C; : > $LOCK_FILE) 2> /dev/null
if [ $? != "0" ]; then
echo "already running (lock file exists); exiting..."
exit 1
fi
trap 'rm $LOCK_FILE' INT TERM EXIT
When I run it for the first time I get the "already running" message, as if the file already existed.
Perhaps I'm missing something.
#!/bin/sh
(
# Wait for lock on /tmp/lock
flock -x -w 10 200 || exit 127 # you can use or not use -w
#your stuff here
) 200> /tmp/lock
Check the man page for flock.
This is the tool for you.
And it comes with an example in its man page :)
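To see the serialization in action, here is a sketch running two flock-protected sections concurrently; whichever acquires the lock second blocks until the first releases it:

```shell
#!/bin/bash
# Sketch: two concurrent critical sections guarded by the same lock
# file; flock guarantees one finishes before the other starts.
critical() {
    (
        flock -x 200 || exit 127
        echo "start $1"
        sleep 1
        echo "end $1"
    ) 200> /tmp/demo.lock
}
critical A &
critical B &
wait
```

The output always shows a complete start/end pair before the other section begins, which is exactly the duplicate-execution guard the cron job needs.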
