I wrote the following bash script, which works all right, apart from some random moments when it freezes completely and doesn't evolve further past a certain value of a0
export OMP_NUM_THREADS=4
N_SIM=15000
N_NODE=1
for ((i = 1; i <= $N_SIM; i++))
do
index=$((i))
a0=$(awk "NR==${index} { print \$2 }" Intensity_Wcm2_versus_a0_10_20_10_25_range.txt)
dirname="a0_${a0}"
if [ -d "${dirname}" ]; then
cd -P -- "${dirname}" # enter the directory because it exists already
if [ -f "ParticleBinning0.h5" ]; then # move to next directory because the sim has been already done and results are there
cd ..
echo ${a0}
echo We move to the next directory because ParticleBinning0.h exists in this one already.
continue 1
else
awk -v s="a0=${a0}" 'NR==6 {print s} 1 {print}' ../namelist_for_smilei.py > namelist_for_smilei_a0included.py
echo ${a0}
mpirun -n 1 ../smilei namelist_for_smilei_a0included.py 2&> smilei.log
cd ..
fi
else
mkdir -p $dirname
cd $dirname
awk -v s="a0=${a0}" 'NR==6 {print s} 1 {print}' ../namelist_for_smilei.py > namelist_for_smilei_a0included.py
echo ${a0}
mpirun -n 1 ../smilei namelist_for_smilei_a0included.py 2&> smilei.log
cd ..
fi
done
I need to let this to run for 12 hours or so in order for it to complete all the 15,000 simulations.
One mpirun -n 1 ../smilei namelist_for_smilei.py 2&> smilei.log command takes 4 seconds to run on average.
Sometimes it just stops at one value of a0 and the last printed value of a0 on the screen is say a0_12.032131.
And it stays like this, stays like this, for no reason.
There's no output being written in the smilei.log from that particularly faulty a0_12.032131 folder.
So I don't know what has happened with this particular value of a0.
Any value of a0 is not particularly important, I can live without the computations for that 1 particular value of a0.
I have tried to use the timeout utility in Ubuntu to somehow make it advance past any value of a0 which takes more than 2 mins to run. If it takes more than that to run, it clearly failed and stops the whole process running forwards.
It is beyond my capabilities to write such a script.
How shall a template look like for my particular pipeline?
Thank you!
It seems that this mpirun program is hanging. As you said you could use the timeout utility to terminate its execution after a reasonable amount of time has passed:
timeout --signal INT 2m mpirun...
Depending on how mpirun handles signals it may be necessary to use KILL instead of INT to terminate the process.
Related
I am a high school student attempting to write a script in bash that will submit jobs using the "qsub" command on a supercomputer utilizing a different number of cores. This script will then take the data on the number of cores and the time it took for the supercomputer to complete the simulation from each of the generated log files, called "log.lammps", and store this data in a separate file.
Because it will take each log file a different amount of time to be completely generated, I followed the steps from
https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found
to have my script proceed when the last line of the log file with the string "Total wall time: " was generated.
Currently, I am using the following code in a loop so that this can be run for all the specified number of cores:
( tail -f -n0 log.lammps & ) | grep -q "Total wall time:"
However, running the script with this piece of code resulted in the log.lammps file being truncated and the script not completing even when the log.lammps file was completely generated.
Is there any other method for my script to only proceed when the submitted job is completed?
One way to do this is touch a marker file once you're complete, and wait for that:
#start process:
rm -f finished.txt;
( sleep 3 ; echo "scriptdone" > log.lammps ; true ) && touch finished.txt &
# wait for the above to complete
while [ ! -e finished.txt ]; do
sleep 1;
done
echo safe to process log.lammps now...
You could also use inotifywait, or a flock if you want to avoid busy waiting.
EDIT:
to handle the case where one of the first commands might fail, grouped first commands, and then added true to the end such that the group always returns true, and then did && touch finished.txt. This way finished.txt gets modified even if one of the first commands fails, and the loop below does not wait forever.
Try the following approach
# run tail -f in background
(tail -f -n0 log.lammps | grep -q "Total wall time:") > out 2>&1 &
# process id of tail command
tailpid=$!
# wait for some time or till the out file hqave data
sleep 10
# now kill the tail process
kill $tailpid
I tend to do this sort of thing with:
http://stromberg.dnsalias.org/~strombrg/notify-when-up2.html
and
http://stromberg.dnsalias.org/svn/age/trunk/
So something like:
notify-when-up2 --greater-than-or-equal-to 0 'age /etc/passwd' 10
This doesn't look for a specific pattern in your file - it looks for when the file stops changing for a 10 seconds. You can look for a pattern by replacing the age with a grep:
notify-when-up2 --true-command 'grep root /etc/passwd'
notify-when-up2 can do things like e-mail you, give a popup, or page you when a state changes. It's not a pretty approach in some cases, compared to using wait or whatever, but I find myself using a several times a day.
HTH.
This problem is a homework, I know there would be much easier ways of solving this problem, but it is what it is.
The question goes as follows:
-I have a bash script that does some creates some files.
-I have to pass this first argument of this script ( which is a directory ) to another script trough a pipeline ( | ).
The problem comes in when I do try and pass this directory as an argument, in the other script, the argument I receive is null.
This is the whole code, in this exact order.
Nothing was left out.
The first script receives at least 11 argument:
if [[ $# -lt 11 ]]; then
echo "Not enough arguments." >&2;
exit 1;
fi;
The script will check if the first argument is an actual directory:
if [[ ! -d $1 ]]; then
echo "Not a valid first argument." >&2;
exit 1;
fi;
Here, I'll save my first directory here so that I can shift, and then I'm going to declare more stuff that i need
Directory=$1;
shift;
N=$#
name_array=("$#")
Pos=0;
Afterwards, the script will use all the other arguments to create .txt files with those names, and add a number of non-null lines in each .txt file ( So, argument2.txt has N-1 lines, argument3.txt has N-2 lines ... argumentN.txt has 1 line.) Afterwards, I have to change their permissions to 600.
while [[ $# -gt 0 ]]; do
if [[ -a "$Directory"/"$1".txt ]]; then
rm "$Directory"/"$1".txt;
fi
touch "$Directory"/"$1".txt
for((i=0;i<N;i++)); do
echo "$N" >> "$Directory"/"$1".txt;
done;
chmod u=rw "$Directory"/"$1".txt;
N=$(($N-1));
name_array["$Pos"]="$Directory"/"$1".txt;
Pos=$(($Pos + 1));
shift;
done;
Problem comes here, I'm trying to echo the first directory..
echo $Directory;
I have tried this as well, with the same results as the one above
echo "$(cd "$(dirname "$Directory")" && pwd -P)/$(basename "$Directory")";
.. like this so I can use in the Shell the pipeline commands.
The second script will receive the directory, enter it and search trough all the files that have been created just now.
In the Unix terminal, I use this:
./FirstScript.bash 1 2 3 4 5 6 7 8 9 10 11 12 | ./SecondScript.bash
Thank you in advance!
Thanx for being honest, and of course we're not going to do your homework, but,...
Most of the code seems pretty good. There are some smaller issues, which you might easily solve with shellcheck (if it isn't installed, install it or use the web version), and I would not put ; at the end of each line.
So, instead of running the complete pipeline at once, break it down to find where your problem lies. If you run
./FirstScript.bash 1 2 3 4 5 6 7 8 9 10 11 12
If the output is as expected (so, with the echo $Directory), and the files are created, then the first script is good. So, in this case, the directory 1 will contain the files 2 3 4 5 6 7 8 9 10 11 12, and the output will be 1.
Then, you will run the second script
./SecondScript.bash
and as input you will give:
1
^D
(that is control-d on Unix). And then the second script should do what you want (or not).
In this way you can debug the scripts separately.
(quick face-palm question: is 1 really the directory you want, or did you just forget the directory name as first argument?)
It seems that by executing code in PS0 and PS1 variables (which are eval'ed before and after a prompt command is run, as I understand) it should be possible to record time of each running command and display it in the prompt. Something like that:
user#machine ~/tmp
$ sleep 1
user#machine ~/tmp 1.01s
$
However, I quickly got stuck with recording time in PS0, since something like this doesn't work:
PS0='$(START=$(date +%s.%N))'
As I understand, START assignment happens in a sub-shell, so it is not visible in the outer shell. How would you approach this?
I was looking for a solution to a different problem and came upon this question, and decided that sounds like a cool feature to have. Using #Scheff's excellent answer as a base in addition to the solutions I developed for my other problem, I came up with a more elegant and full featured solution.
First, I created a few functions that read/write the time to/from memory. Writing to the shared memory folder prevents disk access and does not persist on reboot if the files are not cleaned for some reason
function roundseconds (){
# rounds a number to 3 decimal places
echo m=$1";h=0.5;scale=4;t=1000;if(m<0) h=-0.5;a=m*t+h;scale=3;a/t;" | bc
}
function bash_getstarttime (){
# places the epoch time in ns into shared memory
date +%s.%N >"/dev/shm/${USER}.bashtime.${1}"
}
function bash_getstoptime (){
# reads stored epoch time and subtracts from current
local endtime=$(date +%s.%N)
local starttime=$(cat /dev/shm/${USER}.bashtime.${1})
roundseconds $(echo $(eval echo "$endtime - $starttime") | bc)
}
The input to the bash_ functions is the bash PID
Those functions and the following are added to the ~/.bashrc file
ROOTPID=$BASHPID
bash_getstarttime $ROOTPID
These create the initial time value and store the bash PID as a different variable that can be passed to a function. Then you add the functions to PS0 and PS1
PS0='$(bash_getstarttime $ROOTPID) etc..'
PS1='\[\033[36m\] Execution time $(bash_getstoptime $ROOTPID)s\n'
PS1="$PS1"'and your normal PS1 here'
Now it will generate the time in PS0 prior to processing terminal input, and generate the time again in PS1 after processing terminal input, then calculate the difference and add to PS1. And finally, this code cleans up the stored time when the terminal exits:
function runonexit (){
rm /dev/shm/${USER}.bashtime.${ROOTPID}
}
trap runonexit EXIT
Putting it all together, plus some additional code being tested, and it looks like this:
The important parts are the execution time in ms, and the user.bashtime files for all active terminal PIDs stored in shared memory. The PID is also shown right after the terminal input, as I added display of it to PS0, and you can see the bashtime files added and removed.
PS0='$(bash_getstarttime $ROOTPID) $ROOTPID experiments \[\033[00m\]\n'
As #tc said, using arithmetic expansion allows you to assign variables during the expansion of PS0 and PS1. Newer bash versions also allow PS* style expansion so you don't even need a subshell to get the current time. With bash 4.4:
# PS0 extracts a substring of length 0 from PS1; as a side-effect it causes
# the current time as epoch seconds to PS0time (no visible output in this case)
PS0='\[${PS1:$((PS0time=\D{%s}, PS1calc=1, 0)):0}\]'
# PS1 uses the same trick to calculate the time elapsed since PS0 was output.
# It also expands the previous command's exit status ($?), the current time
# and directory ($PWD rather than \w, which shortens your home directory path
# prefix to "~") on the next line, and finally the actual prompt: 'user#host> '
PS1='\nSeconds: $((PS1calc ? \D{%s}-$PS0time : 0)) Status: $?\n\D{%T} ${PWD:PS1calc=0}\n\u#\h> '
(The %N date directive does not seem to be implemented as part of \D{...} expansion with bash 4.4. This is a pity since we only have a resolution in single second units.)
Since PS0 is only evaluated and printed if there is a command to execute, the PS1calc flag is set to 1 to do the time difference (following the command) in PS1 expansion or not (PS1calc being 0 means PS0 was not previously expanded and so didn't re-evaluate PS1time). PS1 then resets PS1calc to 0. In this way an empty line (just hitting return) doesn't accumulate seconds between return key presses.
One nice thing about this method is that there is no output when you have set -x active. No subshells or temporary files in sight: everything is done within the bash process itself.
I took this as puzzle and want to show the result of my puzzling:
First I fiddled with time measurement. The date +%s.%N (which I didn't realize before) was where I started from. Unfortunately, it seems that bashs arithmetic evaluation seems not to support floating points. Thus, I chosed something else:
$ START=$(date +%s.%N)
$ awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$START') }' /dev/null
8.059526s
$
This is sufficient to compute the time difference.
Next, I confirmed what you already described: sub-shell invocation prevents usage of shell variables. Thus, I thought about where else I could store the start time which is global for sub-shells but local enough to be used in multiple interactive shells concurrently. My solution are temp. files (in /tmp). To provide a unique name I came up with this pattern: /tmp/$USER.START.$BASHPID.
$ date +%s.%N >/tmp/$USER.START.$BASHPID ; \
> awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$(cat /tmp/$USER.START.$BASHPID)') }' /dev/null
cat: /tmp/ds32737.START.11756: No such file or directory
awk: cmd. line:1: BEGIN { printf("%fs", 1491297723.111219300 - ) }
awk: cmd. line:1: ^ syntax error
$
Damn! Again I'm trapped in the sub-shell issue. To come around this, I defined another variable:
$ INTERACTIVE_BASHPID=$BASHPID
$ date +%s.%N >/tmp/$USER.START.$INTERACTIVE_BASHPID ; \
> awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)') }' /dev/null
0.075319s
$
Next step: fiddle this together with PS0 and PS1. In a similar puzzle (SO: How to change bash prompt color based on exit code of last command?), I already mastered the "quoting hell". Thus, I should be able to do it again:
$ PS0='$(date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}")'
$ PS1='$(awk "BEGIN { printf(\"%fs\", "$(date +%s.%N)" - "$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)") }" /dev/null)'"$PS1"
0.118550s
$
Ahh. It starts to work. Thus, there is only one issue - to find the right start-up script for the initialization of INTERACTIVE_BASHPID. I found ~/.bashrc which seems to be the right one for this, and which I already used in the past for some other personal customizations.
So, putting it all together - these are the lines I added to my ~/.bashrc:
# command duration puzzle
INTERACTIVE_BASHPID=$BASHPID
date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}"
PS0='$(date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}")'
PS1='$(awk "BEGIN { printf(\"%fs\", "$(date +%s.%N)" - "$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)") }" /dev/null)'"$PS1"
The 3rd line (the date command) has been added to solve another issue. Comment it out and start a new interactive bash to find out why.
A snapshot of my cygwin xterm with bash where I added the above lines to ./~bashrc:
Notes:
I consider this rather as solution to a puzzle than a "serious productive" solution. I'm sure that this kind of time measurement consumes itself a lot of time. The time command might provide a better solution: SE: How to get execution time of a script effectively?. However, this was a nice lecture for practicing the bash...
Don't forget that this code pollutes your /tmp directory with a growing number of small files. Either clean-up the /tmp from time to time or add the appropriate commands for clean-up (e.g. to ~/.bash_logout).
Arithmetic expansion runs in the current process and can assign to variables. It also produces output, which you can consume with something like \e[$((...,0))m (to output \e[0m) or ${t:0:$((...,0))} (to output nothing, which is presumably better). 64-bit integer support in Bash supports will count POSIX nanoseconds until the year 2262.
$ PS0='${t:0:$((t=$(date +%s%N),0))}'
$ PS1='$((( t )) && printf %d.%09ds $((t=$(date +%s%N)-t,t/1000000000)) $((t%1000000000)))${t:0:$((t=0))}\n$ '
0.053282161s
$ sleep 1
1.064178281s
$
$
PS0 is not evaluated for empty commands, which leaves a blank line (I'm not sure if you can conditionally print the \n without breaking things). You can work around that by switching to PROMPT_COMMAND instead (which also saves a fork):
$ PS0='${t:0:$((t=$(date +%s%N),0))}'
$ PROMPT_COMMAND='(( t )) && printf %d.%09ds\\n $((t=$(date +%s%N)-t,t/1000000000)) $((t%1000000000)); t=0'
0.041584565s
$ sleep 1
1.077152833s
$
$
That said, if you do not require sub-second precision, I would suggest using $SECONDS instead (which is also more likely to return a sensible answer if something sets the time).
As correctly stated in the question, PS0 runs inside a sub-shell which makes it unusable for this purpose of setting the start time.
Instead, one can use the history command with epoch seconds %s and the built-in variable $EPOCHSECONDS to calculate when the command finished by leveraging only $PROMPT_COMMAND.
# Save start time before executing command (does not work due to PS0 sub-shell)
# preexec() {
# STARTTIME=$EPOCHSECONDS
# }
# PS0=preexec
# Save end time, without duplicating commands when pressing Enter on an empty line
precmd() {
local st=$(HISTTIMEFORMAT='%s ' history 1 | awk '{print $2}');
if [[ -z "$STARTTIME" || (-n "$STARTTIME" && "$STARTTIME" -ne "$st") ]]; then
ENDTIME=$EPOCHSECONDS
STARTTIME=$st
else
ENDTIME=0
fi
}
__timeit() {
precmd;
if ((ENDTIME - STARTTIME >= 0)); then
printf 'Command took %d seconds.\n' "$((ENDTIME - STARTTIME))";
fi
# Do not forget your:
# - OSC 0 (set title)
# - OSC 777 (notification in gnome-terminal, urxvt; note, this one has preexec and precmd as OSC 777 features)
# - OSC 99 (notification in kitty)
# - OSC 7 (set url) - out of scope for this question
}
export PROMPT_COMMAND=__timeit
Note: If you have ignoredups in your $HISTCONTROL, then this will not report back for a command that is re-run.
Following #SherylHohman use of variables in PS0 I've come with this complete script. I've seen you don't need a PS0Time flag as PS0Calc doesn't exists on empty prompts so _elapsed funct just exit.
#!/bin/bash
# string preceding ms, use color code or ascii
_ELAPTXT=$'\E[1;33m \uf135 '
# extract time
_printtime () {
local _var=${EPOCHREALTIME/,/};
echo ${_var%???}
}
# get diff time, print it and end color codings if any
_elapsed () {
[[ -v "${1}" ]] || ( local _VAR=$(_printtime);
local _ELAPSED=$(( ${_VAR} - ${1} ));
echo "${_ELAPTXT}$(_formatms ${_ELAPSED})"$'\n\e[0m' )
}
# format _elapsed with simple string substitution
_formatms () {
local _n=$((${1})) && case ${_n} in
? | ?? | ???)
echo $_n"ms"
;;
????)
echo ${_n:0:1}${_n:0,-3}"ms"
;;
?????)
echo ${_n:0:2}","${_n:0,-3}"s"
;;
??????)
printf $((${_n:0:3}/60))m+$((${_n:0:3}%60)),${_n:0,-3}"s"
;;
???????)
printf $((${_n:0:4}/60))m$((${_n:0:4}%60))s${_n:0,-3}"ms"
;;
*)
printf "too much!"
;;
esac
}
# prompts
PS0='${PS1:(PS0time=$(_printtime)):0}'
PS1='$(_elapsed $PS0time)${PS0:(PS0time=0):0}\u#\h:\w\$ '
img of result
Save it as _prompt and source it to try:
source _prompt
Change text, ascii codes and colors in _ELAPTXT
_ELAPTXT='\e[33m Elapsed time: '
#!/bin/bash
z=1
b=$(date)
while [[ $z -eq 1 ]]
do
a=$(date)
if [ "$a" == "$b" ]
then
b=$(date -d "+7 days")
rsync -v -e ssh user#ip_address:~/sample.tgz /home/kartik2
sleep 1d
fi
done
I want to rsync a file every week !! But if I start this script on every boot the file will be rsynced every time the system starts !! How to alter the code to satisfy week basis rsync ? ( PS- I don't want to do this through cronjob - school assignment)
You are talking about having this run for weeks, right? So, we have to take into account that the system will be rebooted and it needs to be run unattended. In short, you need some means of ensuring the script is run at least once every week even when no one is around. The options look like this
Option 1 (worst)
You set a reminder for yourself and you log in every week and run the script. While you may be reliable as a person, this doesn't allow you to go on vacation. Besides, it goes against our principle of "when no one is around".
Option 2 (okay)
You can background the process (./once-a-week.sh &) but this will not reliable over time. Among other things, if the system restarts then your script will not be operating and you won't know.
Option 3 (better)
For this to be reliable over weeks one option is to daemonize the script. For a more detailed discussion on the matter, see: Best way to make a shell script daemon?
You would need to make sure the daemon is started after reboot or system failure. For more discussion on that matter, see: Make daemon start up with Linux
Option 4 (best)
You said no cron but it really is the best option. In particular, it would consume no system resources for the 6 days, 23 hours and 59 minutes when it does not need to running. Additionally, it is naturally resilient to reboots and the like. So, I feel compelled to say that creating a crontab entry like the following would be my top vote: #weekly /full/path/to/script
If you do choose option 2 or 3 above, you will need to make modifications to your script to contain a variable of the week number (date +%V) in which the script last successfully completed its run. The problem is, just having that in memory means that it will not be sustained past reboot.
To make any of the above more resilient, it might be best to create a directory where you can store a file to serve as a semaphore (e.g. week21.txt) or a file to store the state of the last run. Something like once-a-week.state to which you would write a value when run:
date +%V > once-a-week.state # write the week number to a file
Then to read the value, you would:
file="/path/to/once-a-week.state" # the file where the week number is stored
read -d $'\x04' name < "$file"
echo "$name"
You would then need to check to see if the week number matched this present week number and handle the needed action based on match or not.
#!/bin/bash
z=1
b=$(cat f1.txt)
while [[ $z -eq 1 ]]
do
a=$(date +"%d-%m-%y")
if [ "$a" == "$b" ] || [ "$b" == "" ] || [$a -ge $b ]
then
b=$(date +"%d-%m-%y" -d "+7 days")
echo $b > f1.txt
rsync -v -e ssh HOST#ip:~/sample.tgz /home/user
if [ $? -eq 0 ]
then
sleep 1d
fi
fi
done
This code seems to works well and good !! Any changes to it let me know
So, I want to write a bash script that are a sequence of steps and ill identify it as "task#". However, each step is only completed and can run as long as the user wants.
Do task1
if keypressed stop task1 and move on #this is the part I need help with. There can be up to 10 of these move on steps.
Do task2
...
kina like top; it keeps doing stuff until you hit q to quite, however, i want to move on to the next thing
you can use read builtin command with option -t and -n
while :
do
# TASK 1
date
read -t 1 -n 1 key
if [[ $key = q ]]
then
break
fi
done
# TASK 2
date +%s
kev's great solution works well even in Bash 3.x., but it introduces a 1-second delay (-t 1) in every loop iteration.
In Bash 3.x, the lowest supported value for -t (timeout) is 1 (second), unfortunately.
Bash 4.x supports 0 and fractional values, however:
A solution that supports an arbitrary key such as q requires a nonzero -t value, but you can specify a value very close to 0 to minimize the delay:
#!/bin/bash
# !! BASH 4.x+ ONLY
while :; do
# Loop command
date
# Check for 'q' keypress *waiting very briefly* and exit the loop, if found.
read -t 0.01 -r -s -N 1 && [[ $REPLY == 'q' ]] && break
done
# Post-loop command
date +%s
Caveat: The above uses 0.01 as the almost-no-timeout value, but, depending on your host platform, terminal program and possibly CPU speed/configuration, a larger value may be required / a smaller value may be supported. If the value is too small, you'll see intermittent error setting terminal attributes: Interrupted system call errors - if anyone knows why, do tell us.
Tip of the hat to jarno for his help with the following:
Using -t 0, works as follows, according to help read (emphasis added):
If TIMEOUT is 0, read returns
immediately, without trying to read any data, returning
success only if input is available on the specified
file descriptor.
As of Bash v4.4.12 and 5.0.11, unfortunately, -t 0 seems to ignore -n / -N, so that only an ENTER keypress (or a sequence of keypresses ending in ENTER) causes read to indicate that data is available.
If anyone knows whether this is a bug or whether there's a good reason for this behavior, do let us know.
Therefore, only with ENTER as the quit key is a -t 0 solution currently possible:
#!/bin/bash
# !! BASH 4.x+ ONLY
while :; do
# Loop command
date
# Check for ENTER keypress and, after clearing the input buffer
# with a dummy `read`, exit the loop.
read -t 0 -r -N 1 && { read -r; break; }
done
# Post-loop command
date +%s