How to parallel process a function, with loops - bash

So I have this function, I want this function to run everything that It contains in itself at the same time. So far it isn't working, and according to other sources, this is how you do it. The function itself works if its not in parallel.
#!/bin/bash
foo () {
cd ${HOME}/sh/path/to/script/execute
for f in *.sh; do #goes to "execute" directory and executes all
#scripts the current directory "execute" basically run-parts without cron
cd ~/sh/path/to/script
while IFS= read -r l1 #Line 1 in master.txt
IFS= read -r l2 #Line 2 in master.txt
IFS= read -r l3 #Line 3 in master.txt
do
cd /dev/shm/arb
echo ${l1} > arg.txt & echo ${l2} > arg2.txt & echo ${l3} > arg3.txt
cd ${HOME}/sh/path/to/script/execute
bash -H ${f} #executes all scripts inside "execute" folder
cd ~/sh/path/to/script/here
./here.sh &
cd ~/sh/path/to/script &
done <master.txt
done
}
export -f foo
parallel ::: foo
Results in
#No result at all....., just buffers. htop doesn't acknowledge any
#processes, and when this runs its pretty taxing on the cores.
master.txt content
In case this is relevant:
apple_fruit
apple_veggie
veggie_fruit
#apple changes
pear_fruit
pear_veggie
veggie_fruit
#pear changes
cucumber_fruit
...
I'm very new to using parallel, and don't know how it works in advanced(and basic) situations so would the loops interfere? And if it does interfere, is there a workaround?

The result is probably going to be something like:
inner() {
script="$1"
parallel -N3 "'$script' {}; here.sh {}" :::: master.txt
}
export -f inner
parallel inner ::: ${HOME}/sh/path/to/script/execute/*.sh
This will call each of the scripts in ${HOME}/sh/path/to/script/execute/ (and here.sh) with 3 arguments from master.txt like this:
${HOME}/sh/path/to/script/execute/script1.sh apple_fruit apple_veggie veggie_fruit
You need to change the scripts so that:
They get the arguments from the command line (not from arg.txt, arg2.txt, arg3.txt).
They send their output to stdout

Related

Parallel subshells doing work and report status

I am trying to do work in all subfolders in parallel and describe a status per folder once it is done in bash.
suppose I have a work function which can return a couple of statuses
#param #1 is the folder
# can return 1 on fail, 2 on sucess, 3 on nothing happend
work(){
cd $1
// some update thing
return 1, 2, 3
}
now I call this in my wrapper function
do_work(){
while read -r folder; do
tput cup "${row}" 20
echo -n "${folder}"
(
ret=$(work "${folder}")
tput cup "${row}" 0
[[ $ret -eq 1 ]] && echo " \e[0;31mupdate failed \uf00d\e[0m"
[[ $ret -eq 2 ]] && echo " \e[0;32mupdated \uf00c\e[0m"
[[ $ret -eq 3 ]] && echo " \e[0;32malready up to date \uf00c\e[0m"
) &>/dev/null
pids+=("${!}")
((++row))
done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
echo "waiting for pids ${pids[*]}"
wait "${pids[#]}"
}
and what I want is, that it prints out all the folders per line, and updates them independently from each other in parallel and when they are done, I want that status to be written in that line.
However, I am unsure subshell is writing, which ones I need to capture how and so on.
My attempt above is currently not writing correctly, and not in parallel.
If I get it to work in parallel, I get those [1] <PID> things and [1] + 3156389 done ... messing up my screen.
If I put the work itself in a subshell, I don't have anything to wait for.
If I then collect the pids I dont get the response code to print out the text to show the status.
I did have a look at GNU Parallel but I think I cannot have that behaviour. (I think I could hack it that the finished jobs are printed, but I want all 'running' jobs are printed, and the finished ones get amended).
Assumptions/undestandings:
a separate child process is spawned for each folder to be processed
the child process generates messages as work progresses
messages from child processes are to be displayed in the console in real time, with each child's latest message being displayed on a different line
The general idea is to setup a means of interprocess communications (IC) ... named pipe, normal file, queuing/messaging system, sockets (plenty of ideas available via a web search on bash interprocess communications); the children write to this system while the parent reads from the system and issues the appropriate tput commands.
One very simple example using a normal file:
> status.msgs # initialize our IC file
child_func () {
# Usage: child_func <unique_id> <other> ... <args>
local i
for ((i=1;i<=10;i++))
do
sleep $1
# each message should include the child's <unique_id> ($1 in this case);
# parent/monitoring process uses this <unique_id> to control tput output
echo "$1:message - $1.$i" >> status.msgs
done
}
clear
( child_func 3 & )
( child_func 5 & )
( child_func 2 & )
while IFS=: read -r child msg
do
tput cup $child 10
echo "$msg"
done < <(tail -f status.msgs)
NOTES:
the (child_func 3 &) construct is one way to eliminate the OS message re: 'background process completed' from showing up in stdout (there may be other ways but I'm drawing a blank at the moment)
when using a file (normal, pipe) OP will want to look at a locking method (flock?) to insure messages from multiple children don't stomp each other
OP can get creative with the format of the messages printed to status.msgs in conjunction with parsing logic in the parent's while loop
assuming variable width messages OP may want to look at appending a tput el on the end of each printed message in order to 'erase' any characters leftover from a previous/longer message
exiting the loop could be as simple as keeping count of the number of child processes that send a message <id>:done, or keeping track of the number of children still running in the background, or ...
Running this at my command line generates 3 separate lines of output that are updated at various times (based on the sleep $1):
# no ouput to line #1
message - 2.10 # messages change from 2.1 to 2.2 to ... to 2.10
message - 3.10 # messages change from 3.1 to 3.2 to ... to 3.10
# no ouput to line #4
message - 5.10 # messages change from 5.1 to 5.2 to ... to 5.10
NOTE: comments not actually displayed in console
Based on #markp-fuso's answer:
printer() {
while IFS=$'\t' read -r child msg
do
tput cup $child 10
echo "$child $msg"
done
}
clear
parallel --lb --tagstring "{%}\t{}" work ::: folder1 folder2 folder3 | printer
echo
You can't control exit statuses like that. Try this instead, rework your work function to echo status:
work(){
cd $1
# some update thing &> /dev/null without output
echo "${1}_$status" #status=1, 2, 3
}
And than set data collection from all folders like so:
data=$(
while read -r folder; do
work "$folder" &
done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
wait
)
echo "$data"

linux for loop two variables each time

I have several files in a directory and I want to run some linux packages on these files by every two of them, like ERR1045141_1 with ERR1045141_2 and ERR1045144_1 with ERR1045144_2 and so on. So I write a for loop for this but it is not working.
files:
ERR1045141_1.fastq.gz
ERR1045141_2.fastq.gz
ERR1045144_1.fastq.gz
ERR1045144_2.fastq.gz
ERR1045145_1.fastq.gz
ERR1045145_2.fastq.gz
ERR1045146_1.fastq.gz
ERR1045146_2.fastq.gz
ERR1045148_1.fastq.gz
ERR1045148_2.fastq.gz
ERR1045149_1.fastq.gz
ERR1045149_2.fastq.gz
ERR1045151_1.fastq.gz
ERR1045151_2.fastq.gz
ERR1045152_1.fastq.gz
ERR1045152_2.fastq.gz
ERR1045154_1.fastq.gz
ERR1045154_2.fastq.gz
codes:
files=ls
for (( i=0; i<${#files[#]} ; i+=2 )) ; do
echo "${files[i]}" "${files[i+1]}"
done
It did not work and I am not sure is the files=ls has something wrong.Or any better way to do it.please advise.
Try the following if you are sure about the existence of the second file:
for file1 in ERR*_1*
do
file2=`echo $file1 | sed 's/_1/_2/g'`
echo $file1 $file2
done
No, what you really want to do is to process all the 1 files, performing some action on it and its associated 2 file.
You can do that with something as simple as the for loop in this complete test program:
#!/usr/bin/env bash
doSomethingWith() {
echo "[$1] [$2]"
}
touch 'xERR1045141_1.fastq.gz' 'xERR1045141_2.fastq.gz'
touch 'xERR1045144_1.fastq.gz' 'xERR1045144_2.fastq.gz'
touch 'xERR1045145_1.fastq.gz' 'xERR1045145_2.fastq.gz'
touch 'xERR1045146_1.fastq.gz' 'xERR1045146_2.fastq.gz'
touch 'xERR1045148_1.fastq.gz' 'xERR1045148_2.fastq.gz'
touch 'xERR1045149_1.fastq.gz' 'xERR1045149_2.fastq.gz'
touch 'xERR1045151_1.fastq.gz' 'xERR1045151_2.fastq.gz'
touch 'xERR1045152_1.fastq.gz' 'xERR1045152_2.fastq.gz'
touch 'xERR1045154_1.fastq.gz' 'xERR1045154_2.fastq.gz'
touch 'xERR 45154_1.fastq.gz' 'xERR 45154_2.fastq.gz'
for file1 in xERR*_1.fastq.gz ; do
file2="${file1/_1/_2}"
doSomethingWith "${file1}" "${file2}"
done
rm -rf xERR*.fastq.gz
This program outputs:
[xERR1045141_1.fastq.gz] [xERR1045141_2.fastq.gz]
[xERR1045144_1.fastq.gz] [xERR1045144_2.fastq.gz]
[xERR1045145_1.fastq.gz] [xERR1045145_2.fastq.gz]
[xERR1045146_1.fastq.gz] [xERR1045146_2.fastq.gz]
[xERR1045148_1.fastq.gz] [xERR1045148_2.fastq.gz]
[xERR1045149_1.fastq.gz] [xERR1045149_2.fastq.gz]
[xERR1045151_1.fastq.gz] [xERR1045151_2.fastq.gz]
[xERR1045152_1.fastq.gz] [xERR1045152_2.fastq.gz]
[xERR1045154_1.fastq.gz] [xERR1045154_2.fastq.gz]
[xERR 45154_1.fastq.gz] [xERR 45154_2.fastq.gz]
to show that the names are being handled correctly.
Note that I've named the files xERR* so as not to clash with your own files. You should adjust the loop to handle your own files once you're satisfied it will work okay.
And, just as an aside, if you don't want to do anything except for those cases where both files exist, you can simply replace the "action" line with something like:
[[ -f "${file2}" ]] && doSomethingWith "${file1}" "${file2}"
This will bypass those where the 2 file is not a regular file.

Bash Scripting : How to loop over X number of files, take input and write to a file in the same line

So I have a program written in C that takes in some parameters: calling it allcell
some sample parameters: -m 1800 -n 9
the files being analyzed: cfdT100-0.trj, cfdT100-1.trj, cfdT100-2.trj, cfdT100-3.trj, ... cfdT100-19.trj
file being fed: template.file
out file: result.file
$ allcell -m 1800 -n 9 cfdT100-[0-19].trj < template.file > result.file
But when I htop, I see that only cfdT100-0.trj, cfdT100-1.trj and cfdT100-9.trj are being read. How do I make the shell read all the files from 0-19 ?
Additionally, when I write a script file to automate this, how should I enclose the line? Will this work:
"$($ allcell -m 1800 -n 9 cfdT100-[0-19].trj < template.file > result.file)"
I believe you want to change your glob expression to cfdT100-{0..19}.trj instead.
neech#nicolaw.uk:~ $ echo cfdT100-{0..19}.trj
cfdT100-0.trj cfdT100-1.trj cfdT100-2.trj cfdT100-3.trj cfdT100-4.trj cfdT100-5.trj cfdT100-6.trj cfdT100-7.trj cfdT100-8.trj cfdT100-9.trj cfdT100-10.trj cfdT100-11.trj cfdT100-12.trj cfdT100-13.trj cfdT100-14.trj cfdT100-15.trj cfdT100-16.trj cfdT100-17.trj cfdT100-18.trj cfdT100-19.trj
Your quoting on the scripted version looks acceptable. Just change the glob.
use recursion function for infinite loop
a()
{
echo "apple"
a
}
a
This the will make a infinite loop

Write a script to put a series of files in sequence

I am beginning in scripting and I am trying to write a script in bash. I need a script to write a sequence of several file names that are numbered from 1 to 50 inside one file. These are trajectory files from MD simulations. My idea was to write something like:
for valor in {1..50}
do
echo "
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-$valor.dcd" > Traj.bash
exit
However, I just got one file with the following line:
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-50.dcd
exit
But what I really want is something like:
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-1.dcd -dcd traj-2.dcd -dcd traj-3.dcd ... -dcd traj-50.dcd
exit
How can I solve this problem?
You need to read a bit more about bash brace expansion. You can do this:
{
echo "#!/bin/bash"
echo "catdcd -o Traj-all.dcd -stride 10" "-dec traj-"{1..50}".dcd"
# ^^^^^^^^^^^^^^^^^^^^^^^^^
} > Traj.bash
The underlined part is where the brace expansion will get expanded by the shell into
-dec traj-1.dcd -dec traj-2.dcd ... -dec traj-50.dcd
You don't need to explicitly end your script with exit -- the shell will exit by itself when it runs out of commands.
> truncates the file on open. Either only use it once before the loop to create the file and then append (>>) within the loop, or redirect the entire loop.
> foo
for ...
do ...
echo ... >> foo
done
...
{
for ...
do ...
echo ...
done
} > foo

Xcode bad coloring with shell script?

It seems that Xcode really sucks at coloring the shell script. For example, if you copy the following snippet into your Xcode, most of the whole chunk is colored red.
### Creation of the GM template by averaging all (or following the template_list for) the GM_nl_0 and GM_xflipped_nl_0 images
cat <<stage_tpl3 > fslvbm2c
#!/bin/sh
if [ -f ../template_list ] ; then
template_list=\`cat ../template_list\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list\`
else
template_list=\`echo *_struc.* | sed 's/_struc\./\./g'\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list | sort -u\`
echo "WARNING - study-specific template will be created from ALL input data - may not be group-size matched!!!"
fi
for g in \$template_list ; do
mergelist="\$mergelist \${g}_struc_GM_to_T"
done
\$FSLDIR/bin/fslmerge -t template_4D_GM \$mergelist
\$FSLDIR/bin/fslmaths template_4D_GM -Tmean template_GM
\$FSLDIR/bin/fslswapdim template_GM -x y z template_GM_flipped
\$FSLDIR/bin/fslmaths template_GM -add template_GM_flipped -div 2 template_GM_init
stage_tpl3
chmod +x fslvbm2c
fslvbm2c_id=`fsl_sub -j $fslvbm2b_id -T 15 -N fslvbm2c ./fslvbm2c`
echo Creating first-pass template: ID=$fslvbm2c_id
### Estimation of the registration parameters of GM to grey matter standard template
/bin/rm -f fslvbm2d
T=template_GM_init
for g in `$FSLDIR/bin/imglob *_struc.*` ; do
echo "${FSLDIR}/bin/fsl_reg ${g}_GM $T ${g}_GM_to_T_init $REG -fnirt \"--config=GM_2_MNI152GM_2mm.cnf\"" >> fslvbm2d
done
chmod a+x fslvbm2d
fslvbm2d_id=`$FSLDIR/bin/fsl_sub -j $fslvbm2c_id -T $HOWLONG -N fslvbm2d -t ./fslvbm2d`
echo Running registration to first-pass template: ID=$fslvbm2d_id
### Creation of the GM template by averaging all (or following the template_list for) the GM_nl_0 and GM_xflipped_nl_0 images
cat <<stage_tpl4 > fslvbm2e
#!/bin/sh
if [ -f ../template_list ] ; then
template_list=\`cat ../template_list\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list\`
else
template_list=\`echo *_struc.* | sed 's/_struc\./\./g'\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list | sort -u\`
echo "WARNING - study-specific template will be created from ALL input data - may not be group-size matched!!!"
fi
for g in \$template_list ; do
mergelist="\$mergelist \${g}_struc_GM_to_T_init"
done
\$FSLDIR/bin/fslmerge -t template_4D_GM \$mergelist
\$FSLDIR/bin/fslmaths template_4D_GM -Tmean template_GM
\$FSLDIR/bin/fslswapdim template_GM -x y z template_GM_flipped
\$FSLDIR/bin/fslmaths template_GM -add template_GM_flipped -div 2 template_GM
stage_tpl4
chmod +x fslvbm2e
fslvbm2e_id=`fsl_sub -j $fslvbm2d_id -T 15 -N fslvbm2e ./fslvbm2e`
echo Creating second-pass template: ID=$fslvbm2e_id
It would look like this.
Is there a way whereby I can fix the Xcode coloring issue?
What's confusing Xcode's syntax highlighting here is specifically the combination of heredocs (<<EOF) and escaped backticks (\`).
There's no way to fix it as-is, but, so long as there is no substituted content in the heredocs, you can use a quoted heredoc to remove the requirement for escaped backticks in the first place:
cat <<'stage_tpl3' > fslvbm2c
...
template_list=`cat ../template_list`
...
stage_tpl3
When the terminating label for a heredoc is enclosed in quotes, substitutions within the heredoc are disabled. It works the exact same way, and Xcode is able to highlight it more gracefully. (As a bonus, it's also easier to read and write the script without all the backslashes in the way!)
As an aside, note that it's conventional to always use the label "EOF" for heredocs. Some editors special-case this for syntax highlighting. It's also easier to spot than something specific to the document.

Resources