Bash Scripting: How to loop over X number of files, take input and write to a file in the same line - bash

So I have a program written in C that takes in some parameters; let's call it allcell.
Some sample parameters: -m 1800 -n 9
The files being analyzed: cfdT100-0.trj, cfdT100-1.trj, cfdT100-2.trj, cfdT100-3.trj, ... cfdT100-19.trj
The file being fed in: template.file
The output file: result.file
$ allcell -m 1800 -n 9 cfdT100-[0-19].trj < template.file > result.file
But when I check htop, I see that only cfdT100-0.trj, cfdT100-1.trj and cfdT100-9.trj are being read. How do I make the shell read all the files from 0 to 19?
Additionally, when I write a script file to automate this, how should I enclose the line? Will this work:
"$($ allcell -m 1800 -n 9 cfdT100-[0-19].trj < template.file > result.file)"

I believe you want to change your glob expression to cfdT100-{0..19}.trj instead. The pattern [0-19] is a character class that matches a single character (here 0, 1 or 9), whereas the brace expansion {0..19} generates the whole sequence from 0 through 19.
neech@nicolaw.uk:~ $ echo cfdT100-{0..19}.trj
cfdT100-0.trj cfdT100-1.trj cfdT100-2.trj cfdT100-3.trj cfdT100-4.trj cfdT100-5.trj cfdT100-6.trj cfdT100-7.trj cfdT100-8.trj cfdT100-9.trj cfdT100-10.trj cfdT100-11.trj cfdT100-12.trj cfdT100-13.trj cfdT100-14.trj cfdT100-15.trj cfdT100-16.trj cfdT100-17.trj cfdT100-18.trj cfdT100-19.trj
Your quoting on the scripted version looks acceptable, though you should drop the stray $ before allcell inside the command substitution. Just change the glob the same way.
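For a standalone script, a minimal sketch (assuming it is run from the directory containing the .trj files and template.file) could be:
#!/bin/bash
# Brace expansion generates cfdT100-0.trj ... cfdT100-19.trj as separate arguments
allcell -m 1800 -n 9 cfdT100-{0..19}.trj < template.file > result.file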

Use a recursive function for an infinite loop:
a()
{
echo "apple"
a
}
a
This will make an infinite loop.

Related

Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then fed this into the fasta_files channel.
The process trimming and its script take this channel as input, and then ideally I would output all the $sample".trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#! /usr/bin/env nextflow
params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"
process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path trimmed_files into trimmed_channel

    // the shell script to be run:
    """
    #!/usr/bin/env bash
    mkdir trimming_report
    cd /home/usr/Nextflow

    #Finding and renaming my FASTQ files
    for file in FASTQ/*.fastq.gz; do
        [ -f "\$file" ] || continue
        name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }')   #renaming fastq files.
        sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
        echo "Found" "\$name" "from:" "\$sample"
        if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
            trim_galore -j 8 "\$file" -o FASTQ #trim the files
            mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
        else
            echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
        fi
    done

    trimmed_files="FASTQ/*_trimmed.fq.gz"
    echo \$trimmed_files
    """
}
The script in the process works fine. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know; any help is appreciated!
Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier; however, doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you do, but probably want to avoid:
Changing directory inside your NF process: don't do this, it entirely breaks the whole concept of Nextflow's work folder setup.
Writing a bash loop inside a NF process: if you set up your channels correctly, there should only be one task per spawned process.
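A minimal sketch of the reworked process along those lines (assuming trim_galore writes its *_trimmed.fq.gz output into the task's work directory when no -o is given, and that one FASTQ file arrives per task) could look like:
process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path "*_trimmed.fq.gz" into trimmed_channel

    """
    trim_galore -j 8 "${fasta_file}"
    """
}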
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default, and when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs will cause the process execution to be skipped and the stored data to be produced as the actual results.
This example uses the Nextflow DSL 2 for my convenience, but DSL 2 is not strictly required:
nextflow.enable.dsl=2
params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"
process trim_galore {

    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {

    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )

    results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'
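To take advantage of the caching described above, a subsequent run would simply add the -resume option, for example:
nextflow run script.nf \
    -resume \
    -ansi-log false \
    --fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'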

Shell script: Copy file and folder N times

I have two items:
a .json file
a folder with random content
where <transaction> is an id plus a sequential number (id1, id2, ... idn).
I'd like to replicate this structure (.json + folder) n times. I mean:
I'd like to have an id1.json and an id1 folder, an id2.json and an id2 folder, ... an idn.json and an idn folder.
Is there any way (shell script) to populate this content?
It would be something like:
for (i=0,i<n,i++) {
copy "id" file to "id+i" file
copy "id" folder to "id+i" folder
}
Any ideas?
Your shell syntax is off but after that, this should be trivial.
#!/bin/bash
for ((i=0; i<$1; i++)); do
    cp "id".json "id$i".json
    cp -r "id" "id$i"
done
This expects the value of n as the sole argument to the script (which is visible inside the script in $1).
The C-style for((...)) loop is Bash only, and will not work with sh.
A proper production script would also check that it received the expected parameter in the expected format (a single positive number) but you will probably want to tackle such complications when you learn more.
Additionally, here is a version that works with sh:
#!/bin/sh
test -e id.json || { (>&2 echo "id.json not found") ; exit 1 ; }
{
    seq 1 "$1" 2> /dev/null ||
        (>&2 echo "usage: $0 transaction-count") && exit 1
} |
while read i
do
    cp "id".json "id$i".json
    cp -r "id" "id$i"
done
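For example, assuming id.json and an id folder already exist in the current directory, and the script is saved under the hypothetical name copyn.sh, a run might look like:
sh copyn.sh 3
This creates id1.json, id2.json and id3.json plus the folders id1, id2 and id3, each copied from id.json and id respectively.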

Write a script to put a series of files in sequence

I am a beginner at scripting and I am trying to write a bash script. I need a script that writes a sequence of several file names, numbered from 1 to 50, inside one file. These are trajectory files from MD simulations. My idea was to write something like:
for valor in {1..50}
do
echo "
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-$valor.dcd" > Traj.bash
exit
However, I just got one file with the following line:
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-50.dcd
exit
But what I really want is something like:
#!/bin/bash
catdcd -o Traj-all.dcd -stride 10 -dcd traj-1.dcd -dcd traj-2.dcd -dcd traj-3.dcd ... -dcd traj-50.dcd
exit
How can I solve this problem?
You need to read a bit more about bash brace expansion. You can do this:
{
echo "#!/bin/bash"
echo "catdcd -o Traj-all.dcd -stride 10" "-dec traj-"{1..50}".dcd"
# ^^^^^^^^^^^^^^^^^^^^^^^^^
} > Traj.bash
The underlined part is where the brace expansion will get expanded by the shell into
-dec traj-1.dcd -dec traj-2.dcd ... -dec traj-50.dcd
You don't need to explicitly end your script with exit -- the shell will exit by itself when it runs out of commands.
> truncates the file on open. Either only use it once before the loop to create the file and then append (>>) within the loop, or redirect the entire loop.
> foo
for ...
do ...
echo ... >> foo
done
...
{
for ...
do ...
echo ...
done
} > foo
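Applied to the catdcd case from the question, a sketch of the redirect-the-whole-block pattern (assuming the 50 traj-N.dcd files and that the goal is the single command line shown in the question) might be:
{
    echo "#!/bin/bash"
    printf 'catdcd -o Traj-all.dcd -stride 10'
    # append one -dcd option per trajectory file, all on the same line
    for valor in {1..50}; do
        printf ' -dcd traj-%d.dcd' "$valor"
    done
    printf '\n'
} > Traj.bash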

Xcode bad coloring with shell script?

It seems that Xcode really sucks at coloring shell scripts. For example, if you copy the following snippet into Xcode, most of the chunk is colored red.
### Creation of the GM template by averaging all (or following the template_list for) the GM_nl_0 and GM_xflipped_nl_0 images
cat <<stage_tpl3 > fslvbm2c
#!/bin/sh
if [ -f ../template_list ] ; then
template_list=\`cat ../template_list\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list\`
else
template_list=\`echo *_struc.* | sed 's/_struc\./\./g'\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list | sort -u\`
echo "WARNING - study-specific template will be created from ALL input data - may not be group-size matched!!!"
fi
for g in \$template_list ; do
mergelist="\$mergelist \${g}_struc_GM_to_T"
done
\$FSLDIR/bin/fslmerge -t template_4D_GM \$mergelist
\$FSLDIR/bin/fslmaths template_4D_GM -Tmean template_GM
\$FSLDIR/bin/fslswapdim template_GM -x y z template_GM_flipped
\$FSLDIR/bin/fslmaths template_GM -add template_GM_flipped -div 2 template_GM_init
stage_tpl3
chmod +x fslvbm2c
fslvbm2c_id=`fsl_sub -j $fslvbm2b_id -T 15 -N fslvbm2c ./fslvbm2c`
echo Creating first-pass template: ID=$fslvbm2c_id
### Estimation of the registration parameters of GM to grey matter standard template
/bin/rm -f fslvbm2d
T=template_GM_init
for g in `$FSLDIR/bin/imglob *_struc.*` ; do
echo "${FSLDIR}/bin/fsl_reg ${g}_GM $T ${g}_GM_to_T_init $REG -fnirt \"--config=GM_2_MNI152GM_2mm.cnf\"" >> fslvbm2d
done
chmod a+x fslvbm2d
fslvbm2d_id=`$FSLDIR/bin/fsl_sub -j $fslvbm2c_id -T $HOWLONG -N fslvbm2d -t ./fslvbm2d`
echo Running registration to first-pass template: ID=$fslvbm2d_id
### Creation of the GM template by averaging all (or following the template_list for) the GM_nl_0 and GM_xflipped_nl_0 images
cat <<stage_tpl4 > fslvbm2e
#!/bin/sh
if [ -f ../template_list ] ; then
template_list=\`cat ../template_list\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list\`
else
template_list=\`echo *_struc.* | sed 's/_struc\./\./g'\`
template_list=\`\$FSLDIR/bin/remove_ext \$template_list | sort -u\`
echo "WARNING - study-specific template will be created from ALL input data - may not be group-size matched!!!"
fi
for g in \$template_list ; do
mergelist="\$mergelist \${g}_struc_GM_to_T_init"
done
\$FSLDIR/bin/fslmerge -t template_4D_GM \$mergelist
\$FSLDIR/bin/fslmaths template_4D_GM -Tmean template_GM
\$FSLDIR/bin/fslswapdim template_GM -x y z template_GM_flipped
\$FSLDIR/bin/fslmaths template_GM -add template_GM_flipped -div 2 template_GM
stage_tpl4
chmod +x fslvbm2e
fslvbm2e_id=`fsl_sub -j $fslvbm2d_id -T 15 -N fslvbm2e ./fslvbm2e`
echo Creating second-pass template: ID=$fslvbm2e_id
It would look like this.
Is there a way whereby I can fix the Xcode coloring issue?
What's confusing Xcode's syntax highlighting here is specifically the combination of heredocs (<<EOF) and escaped backticks (\`).
There's no way to fix it as-is, but, so long as there is no substituted content in the heredocs, you can use a quoted heredoc to remove the requirement for escaped backticks in the first place:
cat <<'stage_tpl3' > fslvbm2c
...
template_list=`cat ../template_list`
...
stage_tpl3
When the terminating label for a heredoc is enclosed in quotes, substitutions within the heredoc are disabled. It works the exact same way, and Xcode is able to highlight it more gracefully. (As a bonus, it's also easier to read and write the script without all the backslashes in the way!)
As an aside, note that it's conventional to always use the label "EOF" for heredocs. Some editors special-case this for syntax highlighting. It's also easier to spot than something specific to the document.
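As a quick standalone illustration of the quoting rule (independent of the FSL script above):
# Unquoted label: $HOME and `hostname` are expanded before the text is written out
cat <<EOF
home is $HOME on `hostname`
EOF

# Quoted label: everything is taken literally, so no backslash-escaping is needed
cat <<'EOF'
home is $HOME on `hostname`
EOF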

Run a string as a command within a Bash script

I have a Bash script that builds a string to run as a command.
Script:
#! /bin/bash
matchdir="/home/joao/robocup/runner_workdir/matches/testmatch/"
teamAComm="`pwd`/a.sh"
teamBComm="`pwd`/b.sh"
include="`pwd`/server_official.conf"
serverbin='/usr/local/bin/rcssserver'
cd $matchdir
illcommando="$serverbin include='$include' server::team_l_start = '${teamAComm}' server::team_r_start = '${teamBComm}' CSVSaver::save='true' CSVSaver::filename = 'out.csv'"
echo "running: $illcommando"
# $illcommando > server-output.log 2> server-error.log
$illcommando
which does not seem to supply the arguments correctly to the $serverbin.
Script output:
running: /usr/local/bin/rcssserver include='/home/joao/robocup/runner_workdir/server_official.conf' server::team_l_start = '/home/joao/robocup/runner_workdir/a.sh' server::team_r_start = '/home/joao/robocup/runner_workdir/b.sh' CSVSaver::save='true' CSVSaver::filename = 'out.csv'
rcssserver-14.0.1
Copyright (C) 1995, 1996, 1997, 1998, 1999 Electrotechnical Laboratory.
2000 - 2009 RoboCup Soccer Simulator Maintenance Group.
Usage: /usr/local/bin/rcssserver [[-[-]]namespace::option=value]
[[-[-]][namespace::]help]
[[-[-]]include=file]
Options:
help
display generic help
include=file
parse the specified configuration file. Configuration files
have the same format as the command line options. The
configuration file specified will be parsed before all
subsequent options.
server::help
display detailed help for the "server" module
player::help
display detailed help for the "player" module
CSVSaver::help
display detailed help for the "CSVSaver" module
CSVSaver Options:
CSVSaver::save=<on|off|true|false|1|0|>
If save is on/true, then the saver will attempt to save the
results to the database. Otherwise it will do nothing.
current value: false
CSVSaver::filename='<STRING>'
The file to save the results to. If this file does not
exist it will be created. If the file does exist, the results
will be appended to the end.
current value: 'out.csv'
If I just paste the command /usr/local/bin/rcssserver include='/home/joao/robocup/runner_workdir/server_official.conf' server::team_l_start = '/home/joao/robocup/runner_workdir/a.sh' server::team_r_start = '/home/joao/robocup/runner_workdir/b.sh' CSVSaver::save='true' CSVSaver::filename = 'out.csv' (in the output after "running: ") it works fine.
You can use eval to execute a string:
eval $illcommando
your_command_string="..."
output=$(eval "$your_command_string")
echo "$output"
I usually place commands in parentheses, $(commandStr); if that doesn't help, I find bash debug mode great: run the script as bash -x script.
Don't put your commands in variables, just run them directly:
matchdir="/home/joao/robocup/runner_workdir/matches/testmatch/"
PWD=$(pwd)
teamAComm="$PWD/a.sh"
teamBComm="$PWD/b.sh"
include="$PWD/server_official.conf"
serverbin='/usr/local/bin/rcssserver'
cd $matchdir
$serverbin include=$include server::team_l_start=${teamAComm} server::team_r_start=${teamBComm} CSVSaver::save='true' CSVSaver::filename='out.csv'
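If the command really does have to be assembled ahead of time, a bash array preserves the word boundaries that a plain string loses. A minimal sketch of that approach:
# each array element is one argument, so values containing spaces stay intact
cmd=( "$serverbin" include="$include"
      server::team_l_start="$teamAComm"
      server::team_r_start="$teamBComm"
      CSVSaver::save='true' CSVSaver::filename='out.csv' )
"${cmd[@]}" > server-output.log 2> server-error.log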
./me casts raise_dead()
I was looking for something like this, but I also needed to reuse the same string minus two parameters so I ended up with something like:
my_exe ()
{
mysql -sN -e "select $1 from heat.stack where heat.stack.name=\"$2\";"
}
This is something I use to monitor openstack heat stack creation. In this case I expect two conditions, an action 'CREATE' and a status 'COMPLETE' on a stack named "Somestack"
To get those variables I can do something like:
ACTION=$(my_exe action Somestack)
STATUS=$(my_exe status Somestack)
if [[ "$ACTION" == "CREATE" ]] && [[ "$STATUS" == "COMPLETE" ]]
...
Here is my gradle build script that executes strings stored in heredocs:
current_directory=$( realpath "." )
GENERATED=${current_directory}/"GENERATED"
build_gradle=$( realpath build.gradle )
## touch because .gitignore ignores this folder:
touch $GENERATED
COPY_BUILD_FILE=$( cat <<COPY_BUILD_FILE_HEREDOC
cp
$build_gradle
$GENERATED/build.gradle
COPY_BUILD_FILE_HEREDOC
)
$COPY_BUILD_FILE
GRADLE_COMMAND=$( cat <<GRADLE_COMMAND_HEREDOC
gradle run
--build-file
$GENERATED/build.gradle
--gradle-user-home
$GENERATED
--no-daemon
GRADLE_COMMAND_HEREDOC
)
$GRADLE_COMMAND
The lone ")" are kind of ugly. But I have no clue how to fix that asthetic aspect.
To see all commands that are being executed by the script, add the -x flag to your shebang line, and execute the command normally:
#! /bin/bash -x
matchdir="/home/joao/robocup/runner_workdir/matches/testmatch/"
teamAComm="`pwd`/a.sh"
teamBComm="`pwd`/b.sh"
include="`pwd`/server_official.conf"
serverbin='/usr/local/bin/rcssserver'
cd $matchdir
$serverbin include="$include" server::team_l_start="${teamAComm}" server::team_r_start="${teamBComm}" CSVSaver::save='true' CSVSaver::filename='out.csv'
Then if you sometimes want to ignore the debug output, redirect stderr somewhere.
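For example, assuming the script is saved under the hypothetical name script.sh, the -x trace written to stderr can be discarded or captured separately:
./script.sh 2> /dev/null   # discard the trace (this also hides any other stderr output)
./script.sh 2> debug.log   # or keep it in a file for later inspection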
For me, echo XYZ_20200824.zip | grep -Eo '[[:digit:]]{4}[[:digit:]]{2}[[:digit:]]{2}' was working fine, but I was unable to store the output of the command in a variable.
I had the same issue; I tried eval but didn't get any output.
Here is the answer to my problem:
cmd=$(echo XYZ_20200824.zip | grep -Eo '[[:digit:]]{4}[[:digit:]]{2}[[:digit:]]{2}')
echo $cmd
My output is now 20200824
