Shell script: Copy file and folder N times - bash

I have two items:
a <transaction>.json file
a <transaction> folder with random content
where <transaction> is id + a sequential number (id1, id2, ... idn).
I'd like to replicate this structure (.json + folder) n times. I mean:
I'd like to have id1.json and an id1 folder, id2.json and an id2 folder, ... idn.json and an idn folder.
Is there any way (with a shell script) to generate this content?
It would be something like:
for (i=0,i<n,i++) {
copy "id" file to "id+i" file
copy "id" folder to "id+i" folder
}
Any ideas?

Your shell syntax is off but after that, this should be trivial.
#!/bin/bash
for((i=0;i<$1;i++)); do
    cp "id".json "id$i".json
    cp -r "id" "id$i"
done
This expects the value of n as the sole argument to the script (which is visible inside the script in $1).
The C-style for((...)) loop is Bash only, and will not work with sh.
A proper production script would also check that it received the expected parameter in the expected format (a single positive number) but you will probably want to tackle such complications when you learn more.
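The parameter check mentioned here could look something like the following. It's only a sketch: is_positive_int and copy_n are names I made up, and I start counting at 1 (rather than 0) so that the copies are named id1 ... idn as in the question.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if $1 is a positive integer
# (digits only, no sign, no leading zero).
is_positive_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        0*)          return 1 ;;   # zero, or a leading zero
        *)           return 0 ;;
    esac
}

# copy_n N: copy id.json and the id folder to id1 ... idN
# in the current directory.
copy_n() {
    if [ "$#" -ne 1 ] || ! is_positive_int "$1"; then
        echo "usage: copy_n transaction-count" >&2
        return 1
    fi
    local i=1
    while [ "$i" -le "$1" ]; do
        cp "id.json" "id$i.json" || return 1
        cp -r "id" "id$i" || return 1
        i=$((i + 1))
    done
}
```

In a script you would call it as copy_n "$@" and exit with its status, so a missing or malformed argument prints the usage line instead of silently copying nothing.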

Additionally, here is a version that works with sh:
#!/bin/sh
test -e id.json || { (>&2 echo "id.json not found") ; exit 1 ; }
{
seq 1 "$1" 2> /dev/null ||
(>&2 echo "usage: $0 transaction-count") && exit 1
} |
while read i
do
cp "id".json "id$i".json
cp -r "id" "id$i"
done


Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then fed this into the fasta_files channel.
The trimming process and its script take this channel as input, and then ideally, I would output all the "$sample".trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#! /usr/bin/env nextflow
params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"
process trimming {
input:
file fasta_file from fasta_files
output:
path trimmed_files into trimmed_channel
// the shell script to be run:
"""
#!/usr/bin/env bash
mkdir trimming_report
cd /home/usr/Nextflow
#Finding and renaming my FASTQ files
for file in FASTQ/*.fastq.gz; do
[ -f "\$file" ] || continue
name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files.
sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
echo "Found" "\$name" "from:" "\$sample"
if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
trim_galore -j 8 "\$file" -o FASTQ #trim the files
mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
else
echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
fi
done
trimmed_files="FASTQ/*_trimmed.fq.gz"
echo \$trimmed_files
"""
}
The script in the process works fine. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know. Any help is appreciated!
Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier. However, doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you are doing that you probably want to avoid:
Changing directory inside your NF process. Don't do this; it entirely breaks the whole concept of Nextflow's /work folder setup.
Writing a bash loop inside a NF process. If you set up your channels correctly, there should only be 1 task per spawned process, so no loop is needed.
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default, and when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs will cause the process execution to be skipped and will produce the stored data as the actual results.
This example uses Nextflow DSL 2 for my convenience, but that is not strictly required:
nextflow.enable.dsl=2
params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"
process trim_galore {
tag { "${sample}:${fastq_file}" }
publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
}
cpus 8
input:
tuple val(sample), path(fastq_file)
output:
tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
path "${fastq_file}_trimming_report.txt", emit: trimming_report
"""
trim_galore \\
-j ${task.cpus} \\
"${fastq_file}"
"""
}
workflow {
Channel.fromPath( params.fastq_files )
| map { tuple( it.getSimpleName(), it ) }
| set { sample_fastq_files }
results = trim_galore( sample_fastq_files )
results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'

batch processing : File name comparison error

I have written a program (Cifti_subject_fmri) which checks whether file names match in two folders and then executes a set of instructions:
#!/bin/bash -- fix_mni_paths
source activate ciftify_v1.0.0
export SUBJECTS_DIR=/scratch/m/mchakrav/dev/functional_data
export HCP_DATA=/scratch/m/mchakrav/dev/tCDS_ciftify
## make the $SUBJECTS_DIR if it does not already exist
mkdir -p ${HCP_DATA}
SUBJECTS=`cd $SUBJECTS_DIR; ls -1d *` ## list of my subjects
HCP=`cd $HCP_DATA; ls -1d *` ## List of HCP Subjects
cd $HCP_DATA
## submit the files to the queue
for i in $SUBJECTS;do
for j in $HCP ; do
if [[ $i == $j ]];then
parallel "echo ciftify_subject_fmri $i/filtered_func_data.nii.gz $j fMRI " ::: $SUBJECTS |qbatch --walltime '05:00:00' --ppj 8 -c 4 -j 4 -N ciftify_subject_fmri -
fi
done
done
When I run this code on the cluster, I get an error which says
./Cifti_subject_fmri: [[AS1: command not found
The command ciftify_subject_fmri is part of the ciftify toolbox. To execute, it requires the following arguments:
ciftify_subject_fmri <func.nii.gz> <Subject> <NameOffMRI>
I have 33 subjects [AS1 - AS33], each with its own func.nii.gz file located in the SUBJECTS directory. The results need to be populated in the HCP directory, and fMRI is the name of the file format.
Could someone kindly let me know why I am getting an error in the loop?
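For what it's worth, bash prints "[[AS1: command not found" when [[ is not followed by a space (for example if [[$i == $j ]]), because [[AS1 is then looked up as a command name. The code as posted does have the spaces, so the script that actually ran may differ from what is shown here. The nested loops can also be avoided by checking directly whether a matching entry exists in the second directory. A minimal sketch, assuming each subject is a directory; common_subjects is a made-up name:

```shell
#!/bin/bash
# Print the names of the subdirectories of $1 that also exist
# as subdirectories of $2, one per line.
common_subjects() {
    local subjects_dir="$1" hcp_dir="$2" path name
    for path in "$subjects_dir"/*/; do
        [ -d "$path" ] || continue        # glob matched nothing
        name=$(basename "$path")
        # Note the spaces after "[" and before "]"; "[[" and "]]"
        # need them in exactly the same way.
        if [ -d "$hcp_dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Each printed name could then be turned into one ciftify_subject_fmri command line and the whole list piped to qbatch once, instead of calling parallel for every matching pair.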

Saving function output into a variable named in an argument

I have an interesting problem that I can't seem to find the answer for. I am creating a simple app that will help my dev department auto launch docker containers with NginX and config files. My problem is, for some reason I can't get the bash script to store the name of a folder, while scanning the directory. Here is an extremely simple example of what I am talking about....
#!/bin/bash
getFolder() {
local __myResultFolder=$1
local folder
for d in */ ; do
$folder=$d
done
__myResultFolder=$folder
return $folder
}
getFolder FOLDER
echo "Using folder: $FOLDER"
I then save that simple script as folder_test.sh and put it in a folder where there is only one folder, change owner to me, and give it correct permissions. However, when I run the script I keep getting the error...
./folder_test.sh: 8 ./folder_test.sh: =test_folder/: not found
I have tried putting the $folder=$d part in different types of quotes, but nothing works. I have tried $folder="'"$d"'", $folder=`$d`, $folder="$d" but none of it works. Driving me insane, any help would be greatly appreciated. Thank you.
If you want to save your result into a named variable, what you're doing is called "indirect assignment"; it's covered in BashFAQ #6.
One way is the following:
#!/bin/bash
# ^^^^ not /bin/sh; bash is needed for printf -v
getFolder() {
local __myResultFolder=$1
local folder d
for d in */ ; do
folder=$d
done
printf -v "$__myResultFolder" %s "$folder"
}
getFolder folderName
echo "$folderName"
Other approaches include:
Using read:
IFS= read -r -d '' "$__myResultFolder" < <(printf '%s\0' "$folder")
Using eval (very, very carefully):
# note \$folder -- we're only trusting the destination variable name
# ...not trusting the content.
eval "$__myResultFolder=\$folder"
Using namerefs (only in bash 4.3 or newer):
getFolder() {
local -n __myResultFolder=$1
# ...your other logic here...
__myResultFolder=$folder
}
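A complete version of the nameref variant might look like the following. It assumes bash 4.3 or newer; the -d guard and the final test, which makes the function return non-zero when no folder is found, are my additions.

```shell
#!/bin/bash
# Requires bash 4.3+ for "local -n".
getFolder() {
    # Nameref: assignments to __myResultFolder land in the variable
    # whose name the caller passed as $1.
    local -n __myResultFolder=$1
    local folder="" d
    for d in */; do
        [ -d "$d" ] || continue    # skip the literal "*/" when nothing matches
        folder=$d
    done
    __myResultFolder=$folder
    [ -n "$folder" ]               # exit status 0 only if a folder was found
}

getFolder FOLDER && echo "Using folder: $FOLDER"
```

The caller's variable must not itself be called __myResultFolder, since a nameref pointing at its own name is rejected as circular. That is the reason for the unusual prefix.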
The culprit is the line
$folder=$d
Because $folder is empty at that point, the line expands to =test_folder/. The shell then literally treats =test_folder/ as the name of an executable to run, but does not find a file of that name. Change it to
folder=$d
Also, a bash function's return value is restricted to an integer exit status (0 to 255), so you cannot return a string to the caller. If you want to send a non-zero return code to the caller when $folder is empty, you could add the line
if [ -z "$folder" ]; then return 1; else return 0; fi
Or, if you want to return a string value from the function, do not use return; just echo the name and capture it with command substitution around the function call, i.e.
getFolder() {
local __myResultFolder=$1
local folder
for d in */ ; do
folder=$d
done
__myResultFolder=$folder
echo "$folder"
}
folderName=$(getFolder FOLDER)
echo "$folderName"
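The two suggestions can be combined: print the name on stdout and use the exit status to report whether anything was found. A sketch (the -d guard and the "No folder found" message are my additions):

```shell
#!/bin/bash
# Prints the last subdirectory of the current directory;
# returns 1 (and prints nothing) if there is none.
getFolder() {
    local folder="" d
    for d in */; do
        [ -d "$d" ] || continue    # skip the literal "*/" when nothing matches
        folder=$d
    done
    [ -n "$folder" ] || return 1
    printf '%s\n' "$folder"
}

# The assignment's exit status is that of the command substitution,
# so it can be tested directly.
if folderName=$(getFolder); then
    echo "Using folder: $folderName"
else
    echo "No folder found" >&2
fi
```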

How can I edit a .conf file easily?

So I read the easiest way to use .conf files for bash scripts is to use source to load such files. Now, what if I want to edit this file ?
Some code I found does a really good job :
function set_config(){
sed -i "s/^\($1\s*=\s*\).*\$/\1$2/" $conf_file
}
But if the variable is not yet defined, it doesn't define it. It also doesn't check whether the parameters are passed correctly, isn't secure, doesn't handle default values, etc.
Do reliable tools/code already exist to edit .conf files which contain key="value" pairs? For instance, I would like to be able to do things like this:
conf_file="my_script.conf"
conf_load $conf_file #should create the file if it doesn't exist !
read=$(conf_get_value "data" "default_value") #should read the value with key "data", defaulting to "default_value"
if [[ $? = 0 ]] #we should be able to know if the read was successful
then
echo "Successfully read value for field \"data\" : $read"
else
echo "Default value for field \"data\" : $read"
fi
conf_set "something_new" "a great value!" #should add the key "something_new" as it doesn't exist
conf_set "data" "new_value" #should edit the value with key "data"
if [[ $? = 0 ]]
then
echo "Edit successful !"
else #something went wrong :-/
echo "Edit failed !"
fi
before running this code, the conf file would contain
data="some_value"
and after it would be
data="new_value"
something_new="a great value!"
and the code should output
Successfully read value for field "data" : some_value
Edit successful !
I am using bash version 4.3.30 .
Thanks for your help.
I'd do that with awk since it's rather good at tokenizing:
# overwrite config's entries for KEY with VALUE or else appends the definition
# Usage: set_config KEY VALUE
set_config() {
    [ -n "$1" ] && awk -F= -v key="$1" -v new="$1=\"$2\"" '
        $1 == key { $0 = new; key_found = 1; }
        { print }
        END { if (!key_found) { print new; } }
    ' "$conf_file" > "$conf_file.new" \
    && cat "$conf_file.new" > "$conf_file" && rm "$conf_file.new"
}
If run without arguments, set_config() will do nothing and return false. If run with only one argument, it will create an empty value (outputting KEY="").
The awk command parses the .conf file line by line, looking for each definition of the given key and altering it to the new value. All lines are then printed (with or without modification), preserving the original order. If the key hasn't yet been found by the end of the file, this appends the new definition.
Because you can't pipe a file atop itself, this gets saved with a ".new" extension and then copied atop the original in a manner that preserves permissions. The ".new" copy is then removed. I used && to ensure that these never happen if an error occurred earlier in the function.
Also note that the type of ".conf file" you're referring to (the type you source with a POSIX shell) will never have spaces around its equals signs, so the \s* parts of your sed command aren't needed.
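For the read side of the question, a matching getter could be sketched along the same lines. get_config is a hypothetical name, it relies on the same $conf_file variable as set_config, and it assumes values are written as KEY="value" with no spaces around the equals sign:

```shell
#!/bin/bash
# Print the value stored for KEY and return 0, or print DEFAULT and
# return 1 if the key is absent (or the file cannot be read).
# Usage: get_config KEY [DEFAULT]
get_config() {
    local value
    value=$(awk -F= -v key="$1" '
        $1 == key {
            v = substr($0, length(key) + 2)   # everything after "KEY="
            gsub(/^"|"$/, "", v)              # drop one pair of surrounding quotes
            found = 1                         # last definition wins, as with source
        }
        END { if (found) print v; exit !found }
    ' "$conf_file" 2>/dev/null)
    if [ $? -eq 0 ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$2"
        return 1
    fi
}
```

Used as read=$(get_config data default_value), the exit status tells you whether the value came from the file or from the default, which is what the question's if [[ $? = 0 ]] check expects.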

accessing newly created directory in shell script

I'm attempting to make a new folder, a duplicate of the input, and then tar the contents of that folder. I can't figure out why - but it seems like instead of searching the contents of my newly created directory - it is searching my entire computer... returning lines such as
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Sine/Sine - Vocal 1.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Sine/Sine - Vocal 2.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Sine/Triangle - Arp.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Sine/Triangle - Asym 4.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Sine/Triangle - Eml.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Square is a folder
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Square/Square - Arp.raw is a file
/Applications/GarageBand.app/Contents/Frameworks/MAAlchemy.framework/Resources/Libraries/WaveOsc/Square/Square - Bl Saw.raw is a file
Can you guys spot a simple error?
BTW, I know that the script to tar isn't present yet, but that will be easy once I can navigate the new folder.
#!/bin/bash
##--- deal with help args ------------------
##
print_help_message() {
printf "Usage: \n"
printf "\t./`basename $0` <input_dir> <output_dir>\n"
printf "where\n"
printf "\tinput_dir : (required) the input directory.\n"
printf "\toutput_dir : (required) the output directory.\n"
}
if [ "$1" == "help" ]; then
print_help_message
exit 1
fi
## ------ get cli args ----------------------
##
if [ $# == 2 ]; then
INPUT_DIR="$1"
OUTPUT_DIR="$2"
fi
## ------ tree traversal function -----------
##
mkdir "$2"
cp -r "$1"/* "$2"/
## ------ return output dir name ------------
##
return_output_dir() {
echo $OUTPUT_DIR/$(basename $(basename $(dirname $1)))
}
bt() {
output_dir="$1"
for filename in $output_dir/*; do
if [ -d "${filename}" ]; then
echo "$filename is a folder"
bt $filename
else
echo "$filename is a file"
fi
done
}
## ------ main ------------------------------
##
main() {
bt $return_output_dir
exit 0
}
main
Well, I can tell you why it's doing that, but I'm not clear on what it's supposed to be doing, so I'm not sure how to fix it. The immediate problem is that return_output_dir is a function, not a variable, so in the command bt $return_output_dir the $return_output_dir part expands to ... nothing, and bt gets run with no argument. That means that inside bt, output_dir gets set to the empty string, so for filename in $output_dir/* becomes for filename in /*, which iterates over the top-level items on your boot volume.
There are a number of other things that're confusing/weird about this code:
The function main() doesn't seem to serve any purpose -- some of the main-line code is outside it (notably, the argument parsing stuff), some inside, for no apparent reason. Having a main function is required in some languages, but in a shell script it generally makes more sense to just put the main code inline. (Also, functions shouldn't exit, they should return.)
You have variables named both OUTPUT_DIR and output_dir. Use distinct names. Also, it's generally best to stick to lowercase (or mixed-case) variable names, to avoid conflicts with the variables that're used by the shell and other programs.
You copy $1 and $2 into INPUT_DIR and OUTPUT_DIR, then continue to use $1 and $2 rather than the more-clearly-named variables you just copied them into.
output_dir is changed in the recursive function, but not declared as local; this means that inner invocations of bt will be changing the values that outer ones might try to use, leading to weirdness. Declare function-local variables as local to avoid trouble.
$(basename $(basename $(dirname $1))) doesn't make sense. Suppose $1 is "/foo/bar/baz/quux": then dirname $1 returns /foo/bar/baz, basename /foo/bar/baz returns "baz", and basename baz returns "baz" again. The second basename isn't doing anything! And in any case, I'm pretty sure the whole thing isn't doing what you expect it to.
What directory is bt supposed to be recursing through? Nothing in how you call it has any reference to either INPUT_DIR or OUTPUT_DIR.
As a rule, you should put variable references in double-quotes (e.g. for filename in "$output_dir"/* and bt "$filename"). You do this in some places, but not others.
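Putting the local and quoting suggestions together, a corrected bt might look like this. It's a sketch that only covers the traversal; the guard against an empty argument is my addition, to keep the loop from ever falling through to /*:

```shell
#!/bin/bash
# Recursively list the contents of the directory given as $1.
bt() {
    # Refuse an empty argument, so "$dir"/* can never become /*.
    [ -n "$1" ] || { echo "bt: missing directory argument" >&2; return 1; }
    local dir="$1" entry             # local: recursive calls keep their own copies
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue  # empty directory: the glob stays literal
        if [ -d "$entry" ]; then
            echo "$entry is a folder"
            bt "$entry"
        else
            echo "$entry is a file"
        fi
    done
}
```

Called as bt "$OUTPUT_DIR", it would list the copied tree, and names containing spaces (like the GarageBand files in the output above) are passed through intact.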
