Monitoring a log file until it is complete - bash

I am a high school student attempting to write a bash script that submits jobs to a supercomputer with the "qsub" command, using a different number of cores for each run. The script should then take, from each generated log file (called "log.lammps"), the number of cores and the time the supercomputer took to complete the simulation, and store this data in a separate file.
Because each log file takes a different amount of time to be fully generated, I followed the steps from
https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found
to have my script proceed once the last line of the log file, containing the string "Total wall time: ", has been written.
Currently, I am using the following code in a loop so that this can be run for all the specified number of cores:
( tail -f -n0 log.lammps & ) | grep -q "Total wall time:"
However, running the script with this piece of code resulted in the log.lammps file being truncated and the script not completing even when the log.lammps file was completely generated.
Is there any other method for my script to only proceed when the submitted job is completed?

One way to do this is to touch a marker file once you're complete, and wait for that:
# start the process:
rm -f finished.txt
( sleep 3 ; echo "scriptdone" > log.lammps ; true ) && touch finished.txt &
# wait for the above to complete
while [ ! -e finished.txt ]; do
    sleep 1
done
echo "safe to process log.lammps now..."
You could also use inotifywait, or flock, if you want to avoid busy waiting.
EDIT:
To handle the case where one of the first commands might fail, I grouped the first commands and added true at the end so that the group always returns true, then did && touch finished.txt. This way finished.txt still gets created even if one of the first commands fails, and the loop below does not wait forever.
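For the inotifywait alternative mentioned above, a minimal sketch (assuming the inotify-tools package is installed, and that the job opens log.lammps once and only closes it when it is finished, which may not hold for every batch system) could look like:
#!/bin/bash
# Sketch: block until log.lammps is closed by its writer, then check for the
# final line. No polling loop while the job is running.
while [ ! -e log.lammps ]; do sleep 1; done   # wait for the file to appear
inotifywait -qq -e close_write log.lammps     # block until the writer closes it
if grep -q "Total wall time:" log.lammps; then
    echo "log.lammps is complete, safe to process"
fi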

Try the following approach:
# run tail -f piped into grep in the background
(tail -f -n0 log.lammps | grep -q "Total wall time:") > out 2>&1 &
# process id of the backgrounded pipeline
tailpid=$!
# wait for some time, or until the out file has data
sleep 10
# now kill the background process
kill $tailpid
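If a fixed sleep is hard to tune for your jobs, a simpler variation (a sketch, not part of the answer above) is to drop tail entirely and just poll the file with grep until the closing line shows up; it still busy-waits, but only once every 10 seconds:
# Poll the log until the final line appears, then continue.
until grep -q "Total wall time:" log.lammps 2>/dev/null; do
    sleep 10
done
echo "job finished, safe to process log.lammps"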

I tend to do this sort of thing with:
http://stromberg.dnsalias.org/~strombrg/notify-when-up2.html
and
http://stromberg.dnsalias.org/svn/age/trunk/
So something like:
notify-when-up2 --greater-than-or-equal-to 0 'age /etc/passwd' 10
This doesn't look for a specific pattern in your file - it looks for when the file stops changing for 10 seconds. You can look for a pattern by replacing the age with a grep:
notify-when-up2 --true-command 'grep root /etc/passwd'
notify-when-up2 can do things like e-mail you, give a popup, or page you when a state changes. It's not a pretty approach in some cases, compared to using wait or whatever, but I find myself using it several times a day.
HTH.

Related

Bash subshell to file

I'm looping over a large file; for each line I'm running some commands, and when they finish I want their entire output to be appended to a file.
Since there's nothing stopping me from running multiple commands at once, I tried to run this in the background with &.
It doesn't work as expected: output is appended to the file as each subshell finishes, but not in the order the lines appear in the input.
#!/bin/bash
while read -r line; do
    (
        echo -e "$line\n-----------------"
        trivy image --severity CRITICAL $line
        # or any other command that might take 1-2 seconds
        echo "============="
    ) >> vulnerabilities.txt &
done < images.txt
Where am I wrong?
Consider using GNU Parallel to get lots of things done in parallel. In your case:
parallel -k -a images.txt trivy image --severity CRITICAL > vulnerabilities.txt
The -k keeps the output in order. Add --bar or --eta for progress reports. Add --dry-run to see what it would do without actually doing anything. Add -j ... to control the number of parallel jobs at any one time - by default, it will run one job per CPU core at a time - so it will basically keep all your cores busy till the jobs are done.
If you want to do more processing on each line, you can declare a bash function and call that with each line as its parameter, for example as sketched below.
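A minimal sketch of that bash-function route (scan_image is a made-up name, not something from the question, and the function is exported so the shells that parallel spawns can see it):
#!/bin/bash
# Hypothetical helper wrapping the per-image work from the question.
scan_image() {
    echo -e "$1\n-----------------"
    trivy image --severity CRITICAL "$1"
    echo "============="
}
export -f scan_image   # make the function available to parallel's subshells

# -k keeps the output in input order, as with the one-liner above.
parallel -k -a images.txt scan_image > vulnerabilities.txt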

How to use timeout for this nested Bash script?

I wrote the following bash script, which works all right apart from some random moments when it freezes completely and doesn't evolve further past a certain value of a0:
export OMP_NUM_THREADS=4
N_SIM=15000
N_NODE=1
for ((i = 1; i <= $N_SIM; i++))
do
    index=$((i))
    a0=$(awk "NR==${index} { print \$2 }" Intensity_Wcm2_versus_a0_10_20_10_25_range.txt)
    dirname="a0_${a0}"
    if [ -d "${dirname}" ]; then
        cd -P -- "${dirname}" # enter the directory because it exists already
        if [ -f "ParticleBinning0.h5" ]; then # move to next directory because the sim has been already done and results are there
            cd ..
            echo ${a0}
            echo We move to the next directory because ParticleBinning0.h exists in this one already.
            continue 1
        else
            awk -v s="a0=${a0}" 'NR==6 {print s} 1 {print}' ../namelist_for_smilei.py > namelist_for_smilei_a0included.py
            echo ${a0}
            mpirun -n 1 ../smilei namelist_for_smilei_a0included.py 2&> smilei.log
            cd ..
        fi
    else
        mkdir -p $dirname
        cd $dirname
        awk -v s="a0=${a0}" 'NR==6 {print s} 1 {print}' ../namelist_for_smilei.py > namelist_for_smilei_a0included.py
        echo ${a0}
        mpirun -n 1 ../smilei namelist_for_smilei_a0included.py 2&> smilei.log
        cd ..
    fi
done
I need to let this run for 12 hours or so in order for it to complete all 15,000 simulations.
One mpirun -n 1 ../smilei namelist_for_smilei.py 2&> smilei.log command takes 4 seconds to run on average.
Sometimes it just stops at one value of a0, and the last printed value of a0 on the screen is, say, a0_12.032131.
And it stays like this, for no apparent reason.
There is no output being written to smilei.log in that particular faulty a0_12.032131 folder,
so I don't know what has happened with this particular value of a0.
No single value of a0 is particularly important; I can live without the computations for that one value.
I have tried to use the timeout utility in Ubuntu to make the script advance past any value of a0 that takes more than 2 minutes to run. If a simulation takes more than that, it has clearly failed and is stopping the whole process from moving forward.
It is beyond my capabilities to write such a script.
What should a template for my particular pipeline look like?
Thank you!
It seems that this mpirun program is hanging. As you said, you could use the timeout utility to terminate its execution after a reasonable amount of time has passed:
timeout --signal INT 2m mpirun...
Depending on how mpirun handles signals it may be necessary to use KILL instead of INT to terminate the process.
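Applied to the loop in the question, the mpirun line would become something like the sketch below. The 2-minute limit and the exit-status check are assumptions, timed_out_a0.txt is a made-up file name, and 2&> is written here as &> (which appears to be what was intended, sending both stdout and stderr to the log):
# Give each simulation at most 2 minutes, then move on. timeout exits
# with status 124 when it had to kill the command.
timeout --signal INT 2m \
    mpirun -n 1 ../smilei namelist_for_smilei_a0included.py &> smilei.log
if [ $? -eq 124 ]; then
    echo "a0=${a0} timed out, skipping" >> ../timed_out_a0.txt
fi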

delay between runs in bash on cluster

I have to submit a large number of jobs on a cluster, I have a script like:
#!/bin/bash
for runname in bcc BNU Can CNRM GFDLG GFDLM
do
    cd given_directory/$runname
    cat another_directory | while read LINE ; do
        qsub $LINE
    done
done
There are 4000 lines in the file, i.e. 4000 jobs for each runname.
The number of jobs a user can have submitted on the cluster at any given time is limited.
So I want to delay the process between runs in the for-loop until one batch, e.g. all the runs in the bcc directory, is done.
How can I do that? Is there a command I can put after the first done to make the script wait until bcc is done and then move on to BNU?
One option is to use a counter to monitor how many jobs are currently submitted, and wait when the limit is reached. Querying the number of jobs can be a costly operation for the head node, so it is better not to do it after every submitted job. Here, it is done at most once every SLEEP seconds.
#!/bin/bash
TARGET=4000
SLEEP=300

# Count the current jobs, pending or running
get_job_count(){
    # The grep is to remove the header, there may be a better way.
    qstat -u $USER | grep $USER | wc -l
}

# Wait until the number of jobs is under the limit, then submit.
submit_when_possible(){
    while [ $COUNTER -ge $TARGET ]; do
        sleep $SLEEP
        COUNTER=$(get_job_count)
    done
    qsub $1
    let "COUNTER++"
}

# Global job counter
COUNTER=$(get_job_count)

for RUNNAME in bcc BNU Can CNRM GFDLG GFDLM
do
    cd given_directory/$RUNNAME
    # Read the job list with a redirection rather than "cat file | while",
    # so the COUNTER updates made inside the loop are not lost in a pipeline subshell.
    while read JOB ; do
        submit_when_possible $JOB
    done < another_directory
done
Note: the script is untested, so it may need minor fixes, but the idea should work.

how to check only lines that were added since last check in bash

I have a dilemma.
I need to check log files using a bash script.
The script needs to run every 5-10 minutes (set in crontab) and send an email if there is a warning or error in the logs.
But it has to check only the lines that were added since the last check, not go through the whole file again and again.
I don't know how to check only the lines that were added since the last check, or the lines that were added in the last 10 minutes.
Sleep won't work in my situation because the script shouldn't be running all the time; it should run once every 5-10 minutes.
If your file is not too big, you could try storing the previous number of lines of your log file in an environment variable and comparing it with the current line count. Something like the following script should work:
#!/bin/bash
# here, detect_errors is a placeholder name for the function you already developed
# we pass only the lines we haven't seen yet to our detect_errors function
detect_errors "$(head -n $(($(wc -l < log.file) - OLD_LINE_COUNT)) log.file)"
# we update the new value of $OLD_LINE_COUNT
export OLD_LINE_COUNT="$(wc -l < log.file)"
DETAILED EXPLANATION
head -n X myfile : displays the first X lines of myfile
$(( ... )) : arithmetic calculation in bash
wc -l < log.file : line count of our file (the redirection keeps the file name out of wc's output, so the result can be used in the arithmetic)
$OLD_LINE_COUNT : environment variable where we store the line count of the previous iteration (equal to 0 when we first launch the script)
If the file is unchanged, $(wc -l < log.file) - OLD_LINE_COUNT will be 0, and head -n 0 returns nothing.
If the log file is too big, wc -l will take a lot of time, so in that case my method wouldn't be recommended.
EDIT: I did not ask how your log file grows. If the additional lines are added AT THE END OF THE FILE, you should use tail -n instead of head -n.
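One caveat with the environment-variable approach: a cron job starts with a fresh environment on every run, so the exported value will not survive between invocations. A variant of the same idea that keeps the old count in a small state file might look like the sketch below (.log.offset is a made-up file name, and it assumes new lines are appended at the end, so tail -n is used as the edit above suggests):
#!/bin/bash
# Sketch: remember how many lines were already processed in a state file,
# then hand only the newly appended lines to detect_errors.
STATE=.log.offset                          # hypothetical state file
OLD=$(cat "$STATE" 2>/dev/null || echo 0)  # 0 on the very first run
NEW=$(wc -l < log.file)
if [ "$NEW" -gt "$OLD" ]; then
    detect_errors "$(tail -n $((NEW - OLD)) log.file)"
fi
echo "$NEW" > "$STATE"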

Why is the output from these parallel processes not messed up?

Everything is executing perfectly.
words.dict file contains one word per line:
$ cat words.dict
cat
car
house
train
sun
today
station
kilometer
house
away
chapter.txt file contains plain text:
$ cat chapter.txt
The cars are very noisy today.
The train station is one kilometer away from his house.
The script below adds to the result.txt file all the words from words.dict that grep does not find in the chapter.txt file, using 10 parallel greps:
$ cat psearch.sh
#!/bin/bash --
> result.txt
max_parallel_p=10
while read line ; do
    while [ $(jobs | wc -l) -gt "$max_parallel_p" ]; do sleep 1; done
    fgrep -q "$line" chapter.txt || printf "%s\n" "$line" >> result.txt &
done < words.dict
wait
A test:
$ ./psearch.sh
$ cat result.txt
cat
sun
I thought the tests would generate mixed-up words in result.txt, something like:
csat
un
But it really seems to work.
Please have a look and explain to me why.
Background jobs are not threads. With a multi-threaded process you can get that effect. The reason is that each process has just one standard output stream (stdout). In a multi-threaded program all threads share the same output stream, so an unprotected write to stdout can lead to garbled output as you describe. But you do not have a multi-threaded program.
When you use the & operator, bash creates a new child process with its own stdout stream. Generally (it depends on implementation details) this is flushed on a newline, so even though the file might be shared, the granularity is by line.
There is a slim chance that two processes could flush to the file at exactly the same time, but your code, with subprocesses and a sleep, makes it highly unlikely.
You could try taking out the newline from the printf, but given the inefficiency of the rest of the code, and the small dataset, it is still unlikely. It is quite possible that each process is complete before the next starts.
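A quick way to see the line-level granularity for yourself (a sketch, not part of the question): let a few background jobs append tagged lines to the same file and check that no individual line gets split, even though the order is arbitrary:
#!/bin/bash
# Each background job appends 100 tagged lines to the same file.
> mixed.txt
for tag in A B C D; do
    for i in $(seq 1 100); do
        printf "%s %03d\n" "$tag" "$i" >> mixed.txt
    done &
done
wait
# Lines from different jobs interleave, but no single line is torn apart:
grep -cvE '^[A-D] [0-9]{3}$' mixed.txt   # prints 0 if every line is intact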
