How to parallelize a csh script with nested loops

I have four different experiment factors.
For each experiment I need to vary three parameters and call a Fortran program. I pass the parameters to the Fortran program using a here-document (<< EOF) construction.
Here is an example of the code:
set expfac=(0.1 0.3 0.5 0.7)
set fact1=(1 3 2 8)
set fact2=(9 2 1 4)
set fact3=(5 6 1 4)
@ exp = 1
while ( $exp <= $#expfac )
    foreach i ($fact1)
        foreach k ($fact2)
            foreach h ($fact3)
                ./PROGRAM << EOF
expfac |$expfac[$exp]
fact1 |${i}
fact2 |${k}
fact3 |${h}
EOF
            end
        end
    end
    @ exp += 1
end
How can I parallelize this with respect to the while loop? Using GNU parallel, maybe?

My csh skills are crap, so here it is in bash:
parallel --header : './PROGRAM << EOF
expfac |{expfac}
fact1 |{fact1}
fact2 |{fact2}
fact3 |{fact3}
EOF' ::: expfac 0.1 0.3 0.5 0.7 ::: fact1 1 3 2 8 ::: fact2 9 2 1 4 ::: fact3 5 6 1 4
The above will unfortunately fail if the values include characters that need shell quoting (" & \ ' * and the like). From your example it seems that will not be a problem for you, though.
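If quoting ever does become an issue, one way around it (a sketch, not from the original answer; the wrapper name run_program is made up, and exported functions require that parallel be started from bash) is to wrap the call in a shell function and feed the parameters with printf instead of a here-document:
run_program() {
    # printf passes the four quoted arguments straight to ./PROGRAM,
    # so special characters survive intact
    printf 'expfac |%s\nfact1 |%s\nfact2 |%s\nfact3 |%s\n' "$1" "$2" "$3" "$4" | ./PROGRAM
}
export -f run_program
parallel run_program {1} {2} {3} {4} ::: 0.1 0.3 0.5 0.7 ::: 1 3 2 8 ::: 9 2 1 4 ::: 5 6 1 4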

Related

Plotting to dumb terminal without a data file

I have been using a script I created some time ago to monitor the convergence of some numerical calculations. It extracts some data with awk, writes it to a few files, and then uses gnuplot to plot the data in a dumb terminal. It works OK, but lately I have been wondering whether I am writing too much to disk for such a task, and I am curious whether there is a way to have gnuplot plot the output of awk without writing it to a file first.
Here is the script I wrote:
#!/bin/bash
#
input=$1
#
timing=~/tmp/time.dat
nriter=~/tmp/nriter.dat
totenconv=~/tmp/totenconv.dat
#
test=false
while ! $test; do
    clear
    awk '/total cpu time/ {print $9-p;p=$9}' $input | tail -n 60 > $timing
    awk '/     total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a," ",$5}' $input | tail -n 60 > $nriter
    awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a," ",$5}' $input > $totenconv
    gnuplot <<__EOF
set term dumb feed 160, 40
set multiplot layout 2, 2
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iterations"
plot '${nriter}' using 0:1 with lines title 'TotEn' axes x1y1
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iteration"
plot '${nriter}' using 0:2 with lines title 'Accuracy' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
#set format y "%-4.7f"
set xlabel "nr. iteration"
plot '${totenconv}' using 1 with lines title 'TotEnConv' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
set format y "%-4.0f"
set xlabel "nr. iteration"
plot '${timing}' with lines title 'Timing (s)' axes x1y1
#plot '${totenconv}' using 2 with lines title 'AccuracyConv' axes x1y1
__EOF
    # tail -n 5 $input
    # echo -e "\n"
    date
    iter=$(grep " total energy" $input | wc -l)
    conviter=$(awk '/!/' $input | wc -l)
    echo "number of iterations = " $iter " converged iterations = " $conviter
    sleep 10s
    if grep -q "JOB DONE" $input; then
        grep '!' $input
        echo -e "\n"
        echo "Job finished"
        rm $nriter
        rm $totenconv
        rm $timing
        date
        test=true
    else
        test=false
    fi
done
This produces a nice grid of four plots when the data is available, but it would be great if I could avoid writing to disk all the time. I don't need this data once the calculation is finished; it is only for monitoring.
Also, is there a better way to do this? Or is gnuplot the only option?
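One way to drop the temporary files without changing tools (a sketch, not from the answers; it reuses the script's $input variable): gnuplot can read from a pipe when the quoted "file" name starts with <, so the awk output never touches the disk. Two of the four panels would look like this:
gnuplot <<__EOF
set term dumb feed 160, 40
set multiplot layout 2, 2
# a "filename" starting with < makes gnuplot read the command's stdout instead of a file
plot "< awk '/total cpu time/ {print \$9-p;p=\$9}' ${input} | tail -n 60" using 0:1 with lines title 'Timing (s)'
plot "< awk '/!/{a=\$5; nr[NR+2]}; NR in nr{print a}' ${input}" using 1 with lines title 'TotEnConv'
unset multiplot
__EOF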
Edit: I am detailing what the awk bits in the script are doing, as requested by @theozh:
awk '/total cpu time/ {print $9-p;p=$9}' $input - this one searches for the pattern total cpu time, which appears many times in the file $input, and looks at column 9 of each matching line. There it finds a number, which is a time in seconds, and it prints the difference between that number and the one found before it.
awk '/     total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a," ",$5}' $input - this searches for the pattern total energy (there are 5 spaces before the word total), takes the number it finds in column 4, then goes to the line below the one with the pattern and takes the number found in column 5.
awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a," ",$5}' $input - here it searches for the pattern ! and takes the number in column 5 of that line, then goes 2 lines below and takes the number in column 5.
awk works with lines, and each line is divided into columns. For example, the line below:
This is an example
has 4 columns separated by the space character.
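For instance, on two fabricated lines (not from a real output file), the first one-liner prints the running differences of column 9:
printf 'a b c d e total cpu time 10.5\na b c d e total cpu time 13.0\n' \
    | awk '/total cpu time/ {print $9-p; p=$9}'
# prints 10.5 (p starts out empty, i.e. 0) and then 13.0-10.5 = 2.5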
Thank you for your awk explanations, I learned something useful once again.
I don't want to say that the gnuplot-only solution will be straightforward, efficient and easy to understand, but it can be done.
The assumption is that the columns or items are separated by spaces.
The ingredients are the following:
since gnuplot 5.0 you have datablocks (e.g. $Data), and since gnuplot 5.2.0 you can address their lines via index, e.g. $Data[i]. Check help datablocks. Datablocks are not files on disk but data in memory.
writing data to a datablock via plot ... with table, check help table.
to check whether a string is contained within another string you can use strstrt(), check help strstrt.
use the ternary operator (check help ternary) to create a filter
to get the nth item in a string (separated by spaces) check help word.
! is the negation (check help unary)
although there is a line counter $0 in gnuplot (check help pseudocolumns), it will be reset to 0 if you have a double empty line. That's why I would use my own counter, e.g. via n=0 and n=n+1.
As far as I know, if you're using your gnuplot script in bash, you have to escape the gnuplot $ with \$, e.g. \$Data.
In order to mimic tail -n 60, i.e. only plot the last 60 datapoints of a datablock, you can use, e.g.
plot $myNrIter u ($0>|$myNrIter|-60 ? $0 : NaN):1 w lp pt 7 ti "Accuracy"
Again, it is maybe not easy to follow. The code below can maybe still be optimized.
The following might serve as a starting point and I hope you can adapt it to your needs.
Code:
### mimic an awk script using gnuplot
reset session
# if you have a file you would first need to load it 1:1 into a datablock
# see here: https://stackoverflow.com/a/65316744/7295599
$Data <<EOD
# some header of some minimal example data
1 2 3 4 5 6 7 8 9
1 2 total cpu time 6 7 8 9.1
something else
1 2 total cpu time 6 7 8 9.2
1 total energy 4.1 5 6 7 8 9
1 2 3 4 5.1 6 7 8 9
! 2 3 4 5.01 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.11 exclamation mark
1 2 total cpu time 6 7 8 9.4
1 total energy 4.2 5 6 7 8 9
1 2 3 4 5.2 6 7 8 9
1 2 total cpu time 6 7 8 9.5
# again something else
! 2 3 4 5.02 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.22 exclamation mark
1 2 total cpu time 6 7 8 9.9
1 total energy 4.3 5 6 7 8 9
1 2 3 4 5.3 6 7 8 9
! 2 3 4 5.03 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.33 exclamation mark
EOD
set datafile missing NaN # missing data NaN
set datafile commentschar '' # no comment lines
found(n,s) = strstrt($Data[n],s)>0 # returns true or 1 if string s is found in line n of datablock
item(n,col) = word($Data[n],col) # returns column col of line n of datablock
set table $myTiming
myFilter(n,col) = found(n,'total cpu time') ? (p0=p1,p1=item(n,col),p1-p0) : NaN
plot n=(p1=NaN,0) $Data u (n=n+1, myFilter(n,9)) w table
set table $myNrIter
myFilter(n,col1,col2) = found(n,' total energy') && !found(n,'!') ? \
sprintf("%s %s",item(n,col1),item(n+1,col2)) : NaN
plot n=0 $Data u (n=n+1, myFilter(n,4,5)) w table
set table $myTotenconv
myFilter(n,col1,col2) = found(n,'!') ? sprintf("%s %s",item(n,col1),item(n+2,col2)) : NaN
plot n=0 $Data u (n=n+1, myFilter(n,5,5)) w table
unset table
print $myTiming
print $myNrIter
print $myTotenconv
set multiplot layout 2,2
plot $myNrIter u 0:1 w lp pt 7 ti "Accuracy"
plot $myNrIter u 0:2 w lp pt 7 ti "TotEnConv"
plot $myTotenconv u 0:1 w lp pt 7 ti "AccuracyConv"
plot $myTiming u 0:1 w lp pt 7 ti "Timing (s)"
unset multiplot
### end of code
Result: (printout and plot)
0.1
0.2
0.1
0.4
4.1 5.1
4.2 5.2
4.3 5.3
5.01 5.11
5.02 5.22
5.03 5.33

Shell scripting output

What will be the output of the following script?
#!/bin/bash
secondLoop="A B C D E F G"
counter=0
for a in 6 7 8 9
do
    for b in "$secondLoop"
    do
        let "counter+=1"
    done
done
echo "This script has $counter iterations"
Options:
This script has 28 iterations.
This script has 21 iterations.
This script has 4 iterations.
This script has 0 iterations.
It should be 4 iterations, because of the quotes in
for b in "$secondLoop"
which make the variable expand to a single word (spaces included), so the inner loop runs exactly once for each of the 4 outer iterations.
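A quick demonstration (not part of the quiz) of why the quotes matter; without them the inner loop would run 7 times per pass, for 4 * 7 = 28:
secondLoop="A B C D E F G"
for b in "$secondLoop"; do echo "[$b]"; done   # one iteration: [A B C D E F G]
for b in $secondLoop; do echo "[$b]"; done     # seven iterations: [A] [B] ... [G]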

Bash loop with multiple variables as input

I'm trying to run a command using different variables as parameters. This is written as a bash script.
for i, j, k in $(seq 2 0.1 6), $(seq 2 0.25 5.5), $(seq 1 1 10)
do
p.p_s_e r=100 a_t=S res=$i lam=$j s=$k sig=10 >> $k_lam_$j_res_$i.log
p.p_s_e r=100 a_t=S res=$i lam=$j s=$k sig=20 >> $k_lam_$j_res_$i.log
p.p_s_e r=100 a_t=S res=$i lam=$j s=$k sig=40 >> $k_lam_$j_res_$i.log
done
When I run this, the program does not take any of the values I am trying to give it. Sorry I can't be more clear about what I am trying to do. p.p_s_e is the program, the following X=y are variables, and I need to output to be written into a file. I think it's the way I am using for, do, done loops.
You just need three nested loops (four, if you also iterate over the sig values). The other problem is the log filename: in $k_lam_$j_res_$i.log, bash parses $k_lam_ and $j_res_ as variable names, so you need braces, as in ${k}_lam_${j}_res_${i}.log; see the demonstration after the code.
for i in $(seq 2 0.1 6); do
    for j in $(seq 2 0.25 5.5); do
        for k in $(seq 1 1 10); do
            for sig in 10 20 40; do
                p.p_s_e r=100 a_t=S res=$i lam=$j s=$k sig=$sig
            done >> ${k}_lam_${j}_res_${i}.log
        done
    done
done
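To see what bash does with the unbraced names (a small demonstration, with hypothetical values):
k=1; j=2; i=3
echo "$k_lam_$j_res_$i.log"        # $k_lam_ and $j_res_ are (unset) variable names: prints "3.log"
echo "${k}_lam_${j}_res_${i}.log"  # braces delimit the names: prints "1_lam_2_res_3.log"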

Split number string arbitrarily using bash into fixed number of variables

I have a string with 3000 elements (NOT in series) in bash,
sections='1 2 4 ... 3000'
I am trying to split this string into x chunks of length n.
I want x to be typically between 3 and 10. Each chunk may not be of the same length.
Each chunk is the input to a job.
Looking at https://unix.stackexchange.com/questions/122499/bash-split-a-list-of-files
and using bash arrays, my first attempt looks like this:
#! /bin/bash
nArgs=10
nChunkSize=10
z="0 1 2 .. 1--"
zs=(${z// / })
echo ${zs[#]}
for i in $nArgs; do
echo "Creating argument: "$i
startItem=$i*$nChunkSize
zArg[$i] = ${zs[#]:($startItem:$chunkSize}
done
echo "Resulting args"
for i in $nArgs; do
echo "Argument"${zArgs[$1]}
done
The above is far from working I'm afraid. Any pointers on the ${zs[#]:($startItem:$chunkSize} syntax?
For an input of 13 elements:
z='0 1 2 3 4 5 6 7 8 10 11 12 15'
nChunks=3 and nArgs=4
I would like to obtain an array with 3 elements, zs with content
zs[0] = '0 1 2 3'
zs[1] = '4 5 6 7'
zs[2] = '8 10 11 12 15'
Each zs will be used as arguments to subsequent jobs.
First note: This is a bad idea. It won't work reliably with arbitrary (non-numeric) contents, as bash doesn't have support for nested arrays.
output=( )
sections_str='1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 3000'
batch_size=4

read -r -a sections <<<"$sections_str"
for ((i=0; i<${#sections[@]}; i+=batch_size)); do
    current_pieces=( "${sections[@]:i:batch_size}" )
    output+=( "${current_pieces[*]}" )
done

declare -p output # to view your output
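For the sample sections_str above, declare -p should show four chunks (exact formatting varies slightly between bash versions):
declare -a output=([0]="1 2 4 5" [1]="6 7 8 9" [2]="10 11 12 13" [3]="14 15 16 3000")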
Notes:
zs=( $z ) is buggy. For example, any * inside your list will be replaced with a list of filenames in the current directory. Use read -a to read into an array in a reliable way that doesn't depend on shell configuration other than IFS (which can be scoped to just that one line with IFS=' ' read -r -a).
${array[@]:start:count} expands to up to count items from your array, starting at position start.
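A trivial demonstration of that slicing syntax:
arr=(a b c d e f)
echo "${arr[@]:2:3}"   # prints: c d e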

reset row number count in awk

I have a file like this
file.txt
0 1 a
1 1 b
2 1 d
3 1 d
4 2 g
5 2 a
6 3 b
7 3 d
8 4 d
9 5 g
10 5 g
.
.
.
I want to reset the row number count in the first column ($1) to 0 whenever the value of the field in the second column ($2) changes, using awk or a bash script.
result
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
.
.
.
As long as you don't mind a bit of excess memory usage, and the second column is sorted, I think this is the most fun (a[$2]+++0 parses as (a[$2]++) + 0, i.e. a post-incremented per-key counter forced to a number):
awk '{$1=a[$2]+++0;print}' input.txt
This awk one-liner seems to work for me:
[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
Let's break apart the script and see what it does.
prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
{$1=first;first++} -- For every line, set the first field, then increment the variable we're using to set the first field.
1 -- this is awk short-hand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print".
Pretty basic, really.
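As a one-line illustration of that idiom (not from the original answer):
printf 'a\nb\n' | awk '1'   # prints both lines: the always-true condition triggers the default {print}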
The one catch of course is that when you change the value of any field in awk, it rewrites the line using whatever field separators are set, which by default is just a space. If you want to adjust this, you can set your OFS variable:
[ghoti@pc ~]$ awk -vOFS=" " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0 1 a
1 1 b
Salt to taste.
A pure bash solution :
file="/PATH/TO/YOUR/OWN/INPUT/FILE"
count=0
old_trigger=0
while read a b c; do
if ((b == old_trigger)); then
echo "$((count++)) $b $c"
else
count=0
echo "$((count++)) $b $c"
old_trigger=$b
fi
done < "$file"
This solution (IMHO) has the advantage of using a readable algorithm. I like what the other answers offer, but they are not as easy for beginners to follow.
NOTE:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for let, if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
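For example (a two-line illustration of both uses, not from the linked page):
x=3
if ((x > 2)); then echo "x is greater than 2"; fi   # expression nonzero -> exit status 0
((x += 1))                                          # side effect, equivalent to let "x += 1"
echo "$x"                                           # prints 4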
Perl solution:
perl -naE '
    $dec = $F[0] if defined $old and $F[1] != $old;
    $F[0] -= $dec;
    $old = $F[1];
    say join "\t", @F[0,1,2];'
$dec is subtracted from the first column on every line. When the second column changes (its previous value is stored in $old), $dec is set to the current first-column value so that the count starts from zero again. The defined condition is needed for the first line to work.
