Conditional use of functions? - bash

I created a bash script that parses ASCII files into comma-delimited output. It has worked great. Now, a new layout for these files is being gradually introduced.
My script now has two parsing functions (one per layout), and I want to call one or the other depending on a specific marker that is present in the ASCII file header. The script is structured like this:
#!/bin/bash
function parseNewfile() {...parse stuff...return stuff...}
function parseOldfile() {...parse stuff...return stuff...}

#loop thru ASCII files array
i=0
while [ $i -lt $len ]; do
    #check if file contains marker for new layout
    grep CSVHeaderBox output_$i.ASC
    #calls parsing function based on exit code
    if [ $? -eq 0 ]
    then
        CXD=`parseNewfile`
    else
        CXD=`parseOldfile`
    fi
    echo ${array[$i]} | awk -v cxd=`echo $CXD` ....
    let i++
done >> ${outdir}/outfile.csv
...
The script does not err out. It always calls the original function "parseOldfile" and ignores the new one, even when I specifically feed my script several files with the new layout.
What I am trying to do seems very trivial. What am I missing here?
EDIT: Samples of old and new file layouts.
1) OLD File Layout
F779250B
=====BOX INFORMATION=====
Model = R15-100
Man Date = 07/17/2002
BIST Version = 3.77
SW Version = 0x122D
SW Name = v1b1645
HW Version = 1.1
Receiver ID = 00089787556
=====DISK INFORMATION=====
....
2) NEW File Layout
F779250B
=====BOX INFORMATION=====
Model = HR22-100
Man Date = 07/17/2008
BIST Version = 7.55
SW Version = 0x066D
SW Name = v18m1fgu
HW Version = 2.3
Receiver ID = 028910170936
CSVHeaderBox:Platform,ManufactureDate,BISTVersion,SWVersion,SWName,HWRevision,RID
CSVValuesBox:HR22-100,20080717,7.55,0x66D,v18m1fgu,2.3,028910170936
=====DISK INFORMATION=====
....

This may not solve your problem, but here is a potential performance boost: instead of
grep CSVHeaderBox output_$i.ASC
#calls parsing function based on exit code
if [ $? -eq 0 ]
use
if grep -q CSVHeaderBox output_$i.ASC
grep -q exits successfully on the first match, so it doesn't have to scan the whole file. Plus, you don't have to bother with the $? variable.
Don't do this:
awk -v cxd=`echo $CXD`
Do this:
awk -v cxd="$CXD"
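Putting both of those fixes together, the loop from the question might look like this (a sketch only; the parsing functions and the trailing awk arguments are elided exactly as in the original):

i=0
while [ $i -lt $len ]; do
    # branch directly on grep's exit status; -q suppresses the match output
    if grep -q CSVHeaderBox "output_$i.ASC"; then
        CXD=$(parseNewfile)
    else
        CXD=$(parseOldfile)
    fi
    echo "${array[$i]}" | awk -v cxd="$CXD" ....
    let i++
done >> "${outdir}/outfile.csv"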

I'm not sure if this solves the OP's requirement.
What's the need for awk if your function knows how to parse the file?
#!/bin/bash
function f1() {
    echo "f1() says $#"
}
function f2() {
    echo "f2() says $#"
}

FUN="f1"
${FUN} "foo"
FUN="f2"
${FUN} "bar"
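The same idea applies to the question: store the parser's name in a variable and capture its output with command substitution (a sketch; it assumes the parse functions write their result to stdout):

FUN=parseOldfile
grep -q CSVHeaderBox "output_$i.ASC" && FUN=parseNewfile
CXD=$("$FUN")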

I am a bit embarrassed to write this, but I solved my "problem".
After gedit (I am on Ubuntu) complained several dozen times about "trailing spaces", I copied and pasted my code into a new file and re-ran my script.
It worked.
I have no explanation why.
Thanks to everyone for taking the time.
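One plausible explanation (an assumption; the OP never confirmed the cause) is that the original file contained invisible characters, such as carriage returns from DOS line endings or stray whitespace, which the copy-paste silently dropped. They can be made visible and removed like this (myscript.sh stands in for the affected script):

# cat -A marks each line end with '$' and shows a carriage return as '^M'
cat -A myscript.sh | grep -n '\^M\| \$'
# strip DOS line endings, if those turn out to be the culprit
sed -i 's/\r$//' myscript.sh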

Related

Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and fed this into the fasta_files channel.
The process trimming and its script take this channel as input; ideally, I would output all the $sample".trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#!/usr/bin/env nextflow
params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"

process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path trimmed_files into trimmed_channel

    // the shell script to be run:
    """
    #!/usr/bin/env bash
    mkdir trimming_report
    cd /home/usr/Nextflow

    #Finding and renaming my FASTQ files
    for file in FASTQ/*.fastq.gz; do
        [ -f "\$file" ] || continue
        name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files.
        sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
        echo "Found" "\$name" "from:" "\$sample"
        if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
            trim_galore -j 8 "\$file" -o FASTQ #trim the files
            mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
        else
            echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
        fi
    done

    trimmed_files="FASTQ/*_trimmed.fq.gz"
    echo \$trimmed_files
    """
}
The script in the process works fine on its own. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know; any help is appreciated!
Nextflow does not export the variable trimmed_files into its own scope unless you tell it to do so using the env output qualifier; however, doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as the output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you do, but probably want to avoid:
Changing directory inside your NF process: don't do this, as it breaks the whole concept of Nextflow's work folder setup.
Writing a bash loop inside a NF process: if you set up your channels correctly, there should be only one task per spawned process.
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default, and when the pipeline is launched with the -resume option, additional attempts to execute a process with the same set of inputs will cause the process execution to be skipped and will produce the stored data as the actual results.
This example uses the Nextflow DSL 2 for my convenience, but is not strictly required:
nextflow.enable.dsl=2

params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"

process trim_galore {
    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {
    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )
    results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'

Saving function output into a variable named in an argument

I have an interesting problem that I can't seem to find the answer for. I am creating a simple app that will help my dev department auto-launch docker containers with NginX and config files. My problem is that, for some reason, I can't get the bash script to store the name of a folder while scanning a directory. Here is an extremely simple example of what I am talking about....
#!/bin/bash
getFolder() {
    local __myResultFolder=$1
    local folder
    for d in */ ; do
        $folder=$d
    done
    __myResultFolder=$folder
    return $folder
}
getFolder FOLDER
echo "Using folder: $FOLDER"
I then save that simple script as folder_test.sh, put it in a folder where there is only one subfolder, change the owner to me, and give it the correct permissions. However, when I run the script I keep getting the error...
./folder_test.sh: 8 ./folder_test.sh: =test_folder/: not found
I have tried putting the $folder=$d part in different types of quotes, but nothing works. I have tried $folder="'"$d"'", $folder=`$d`, and $folder="$d", but none of them work. This is driving me insane; any help would be greatly appreciated. Thank you.
If you want to save your result into a named variable, what you're doing is called "indirect assignment"; it's covered in BashFAQ #6.
One way is the following:
#!/bin/bash
#      ^^^^ not /bin/sh; bash is needed for printf -v
getFolder() {
    local __myResultFolder=$1
    local folder d
    for d in */ ; do
        folder=$d
    done
    printf -v "$__myResultFolder" %s "$folder"
}
getFolder folderName
echo "$folderName"
Other approaches include:
Using read:
IFS= read -r -d '' "$__myResultFolder" < <(printf '%s\0' "$folder")
Using eval (very, very carefully):
# note \$folder -- we're only trusting the destination variable name
# ...not trusting the content.
eval "$__myResultFolder=\$folder"
Using namevars (only available in bash 4.3 or newer):
getFolder() {
    local -n __myResultFolder=$1
    # ...your other logic here...
    __myResultFolder=$folder
}
The culprit is the line
$folder=$d
Since $folder is empty at that point, the line expands to =test_folder/, which the shell then tries to run as a command, and it finds no executable of that name. Change it to
folder=$d
Also, a bash function's return value is restricted to integer exit statuses (0-255); you cannot return a string to the caller. If you wanted to send a non-zero return code to the calling function when $folder is empty, you could add a line
if [ -z "$folder" ]; then return 1; else return 0; fi
Or, if you want to return a string value from the function, do not use return; just echo the value and use command substitution at the call site (the destination-variable argument is then no longer needed):
getFolder() {
    local folder d
    for d in */ ; do
        folder=$d
    done
    echo "$folder"
}
folderName=$(getFolder)
echo "$folderName"

Get bash function path from name

A hitchhiker, wearied by the time a function is taking to complete, wishes to find where a function is located, so that he can inspect the function for himself by editing the file at that location. He does not wish to print the function body to the shell, but simply to get the path of the script file containing the function. Our hitchhiker only knows the name of his function, which is answer_life.
Imagine he has a function within a file universal-questions.sh, defined like this, the path of which is not known to our hitchhiker:
function answer_life() {
    sleep $(date --date='7500000 years' +%s)
    echo "42"
}
Another script, called hitchhiker-helper-scripts.sh, is defined below. It has the function above source'd within it (the hitchhiker doesn't understand source either, I guess. Just play ball.):
source "/usr/bin/universal-questions.sh"

function find_life_answer_script() {
    # Print the path of the script containing `answer_life`
    somecommand "answer_life" # Should output the path of the script containing the function.
}
So this, my intrepid scripter, is where you come in. Can you replace the comment with code in find_life_answer_script that allows our hitchhiker to find where the function is located?
In bash operating in extended debug mode, declare -F will give you the function name, line number, and path (as sourced):
function find_life_answer_script() {
    ( shopt -s extdebug; declare -F answer_life )
}
Like:
$ find_life_answer_script
answer_life 3 ./universal-questions.sh
Running a sub-shell lets you set extdebug mode without affecting any prior settings.
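If the hitchhiker wants only the path, he can keep just the third field of that output (a small sketch building on the answer above; the function name find_life_answer_path is mine):

function find_life_answer_path() {
    # declare -F under extdebug prints: name line path
    # cut from field 3 onward so paths containing spaces survive
    ( shopt -s extdebug; declare -F answer_life ) | cut -d' ' -f3-
}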
Your hitchhiker can also try to find the answer this way:
script=$(readlink -f "$0")
# collect the files sourced by this script (quoted or bare paths)
mapfile -t sources < <(grep -oP 'source\s+"?\K[\w/.-]+' "$script")
for s in "${sources[@]}"; do
    if grep -qP 'function\s+answer_life' "$s"; then
        echo "$s: Congrats! Here is your answer!"
    else
        echo "$s: Nothing is here :("
    fi
done
This is for the case where debug mode turns out to be unavailable on some planet :)

Are there any existing methods for importing functions from other scripts without sourcing the entire script?

I am working on a large shell program and need a way to import functions from other scripts as required, without polluting the global scope with all the internal functions from those scripts.
UPDATE: However, those imported functions have internal dependencies, so each imported function must be executed in the context of its own script.
I came up with the solution below and wonder whether an existing strategy is already out there and, if not, whether this is perhaps a really bad idea?
PLEASE TAKE A LOOK AT THE POSTED SOLUTION BEFORE RESPONDING
example usage of my solution:
main.sh
import user get_name
import user set_name

echo "hello $(get_name)"
echo "Enter a new user name :"
while true; do
    read user_input < /dev/tty
done
set_name $user_input
user.sh
import state

set_name () {
    state save "user_name" "$1"
}

get_name () {
    state get_value "user_name"
}
As one approach, you could put a comment in the script to indicate where you want to stop sourcing:
$ cat script
fn() { echo "You are running fn"; }
#STOP HERE
export var="Unwanted name space pollution"
And then, if you are using bash, source it like this:
source <(sed '/#STOP HERE/q' script)
<(...) is process substitution; our process, sed '/#STOP HERE/q' script, just extracts the lines from script until the stop line is reached.
Adding more precise control
We can select particular sections from a file if we add both start and stop flags:
$ cat script
export var1="Unwanted name space pollution"
#START
fn1() { echo "You are running fn1"; }
#STOP
export var2="More unwanted name space pollution"
#START
fn2() { echo "You are running fn2"; }
#STOP
export var3="More unwanted name space pollution"
And then source the file like this:
source <(sed -n '/#START/,/#STOP/p' script)
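For example, sourcing only the flagged sections defines the functions while leaving the unwanted exports untouched (a quick demonstration against the script file above):

source <(sed -n '/#START/,/#STOP/p' script)
fn1                   # -> You are running fn1
fn2                   # -> You are running fn2
echo "${var1:-unset}" # -> unset (the export lines were never sourced)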
Create a standalone shell script that does this:
it takes two arguments, the file name and the function name;
it sources the input file first;
it then uses declare -f with the function name to print that function's definition.
In your code you can then include functions like this:
eval "$(./importfunctions.sh filename functionname)"
What is happening here:
step 1 basically reads the file and sources it in a new shell environment, then echoes the function declaration;
step 2 evals that declaration into our main code.
So the final result is as if we had written just that function in our main script.
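A minimal sketch of that helper (the name importfunctions.sh comes from the answer; the rest is assumed):

#!/bin/bash
# importfunctions.sh FILE FUNCTION
# Source FILE in this child shell, then print only FUNCTION's definition.
source "$1"
declare -f "$2"

Because the helper runs as a child process, nothing else from the sourced file leaks into the caller; only the single printed definition gets eval'd there.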
When the function bodies in the script are indented up to the closing } and all definitions start with the keyword function, you can include specific functions without changing the original files:
largeshell.sh
#!/bin/bash
function demo1 {
    echo "d1"
}
function demo2 {
    echo "d2"
}
function demo3 {
    echo "d3"
}
function demo4 {
    echo "d4"
}

echo "Main code of largeshell... "
demo2
Now source demo1() and demo3(), leaving demo4() out:
source <(sed -n '/^function demo1 /,/^}/p' largeshell.sh)
source <(sed -n '/^function demo3 /,/^}/p' largeshell.sh)
demo1
demo4   # fails: command not found
Or source all the functions in a loop:
for f in demo1 demo3; do
    echo "sourcing $f"
    source <(sed -n '/^function '$f' /,/^}/p' largeshell.sh)
done
demo1
demo4   # still fails: command not found
You can make it fancier by sourcing a special script that will:
grep the calling script for all strings starting with largeshell., like largeshell.demo1;
generate a function largeshell.demo1 that calls demo1;
and source all the functions that are actually called.
Your new script will then look like:
source function_includer.sh
largeshell.demo1
largeshell.demo4
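A rough sketch of what that function_includer.sh could look like (entirely hypothetical; the answer only outlines the idea):

#!/bin/bash
# function_includer.sh -- for every largeshell.NAME reference in the calling
# script ($0), define a wrapper that lazily sources and runs the real function.
for ref in $(grep -o 'largeshell\.[A-Za-z_][A-Za-z0-9_]*' "$0" | sort -u); do
    fn=${ref#largeshell.}
    eval "${ref}() {
        source <(sed -n '/^function ${fn} /,/^}/p' largeshell.sh)
        ${fn} \"\$@\"
    }"
done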
EDIT:
You might want to reconsider your requirements.
The solution above is not only slow, but it will also make life hard for the people who maintain largeshell.sh. As soon as they refactor their code or replace it with something in another language, they will have to refactor, test and deploy your code as well.
A better path is to extract the functions from largeshell.sh into smaller files ("modules") and put them in a shared directory (shlib?).
With names like sqlutil.sh, datetime.sh, formatting.sh, mailstuff.sh and comm.sh, you can pick the include files you need (and largeshell.sh can include them all).
It's been a while and it would appear that my original solution is the best one out there. Thanks for the feedback.

How can I edit a .conf file easily?

So I read that the easiest way to use .conf files in bash scripts is to source them. Now, what if I want to edit such a file?
Some code I found does a really good job:
function set_config(){
    sed -i "s/^\($1\s*=\s*\).*\$/\1$2/" $conf_file
}
But if the variable is not yet defined, it doesn't define it; nor does it check whether the parameters are passed correctly; it isn't secure; it doesn't handle default values; etc.
Do reliable tools or code already exist for editing .conf files which contain key="value" pairs? For instance, I would like to be able to do things like this:
conf_file="my_script.conf"
conf_load $conf_file #should create the file if it doesn't exist !
read=$(conf_get_value "data" "default_value") #should read the value with key "data", defaulting to "default_value"
if [[ $? = 0 ]] #we should be able to know if the read was successful
then
    echo "Successfully read value for field \"data\" : $read"
else
    echo "Default value for field \"data\" : $read"
fi
conf_set "something_new" "a great value!" #should add the key "something_new" as it doesn't exist
conf_set "data" "new_value" #should edit the value with key "data"
if [[ $? = 0 ]]
then
    echo "Edit successful !"
else #something went wrong :-/
    echo "Edit failed !"
fi
Before running this code, the conf file would contain
data="some_value"
and after it would be
data="new_value"
something_new="a great value!"
and the code should output
Successfully read value for field "data" : some_value
Edit successful !
I am using bash version 4.3.30.
Thanks for your help.
I'd do that with awk, since it's rather good at tokenizing:
# overwrite config's entries for KEY with VALUE or else append the definition
# Usage: set_config KEY VALUE
set_config() {
    [ -n "$1" ] && awk -F= -v key="$1" -v new="$1=\"$2\"" '
        $1 == key { $0 = new; key_found = 1 }
        { print }
        END { if (!key_found) { print new } }
    ' "$conf_file" > "$conf_file.new" \
      && cat "$conf_file.new" > "$conf_file" && rm "$conf_file.new"
}
If run without arguments, set_config() will do nothing and return false. If run with only one argument, it will create an empty value (outputting KEY="").
The awk command parses the .conf file line by line, looking for each definition of the given key and altering it to the new value. All lines are then printed (with or without modification), preserving the original order. If the key hasn't yet been found by the end of the file, this appends the new definition.
Because you can't pipe a file atop itself, this gets saved with a ".new" extension and then copied atop the original in a manner that preserves permissions. The ".new" copy is then removed. I used && to ensure that these never happen if an error occurred earlier in the function.
Also note that the type of ".conf file" you're referring to (the type you source with a POSIX shell) will never have spaces around its equals signs, so the \s* parts of your sed command aren't needed.
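Run against the question's sample file, the function behaves as requested (a quick demonstration; conf_file must be set beforehand, since the function reads it from the surrounding scope):

conf_file="my_script.conf"
printf 'data="some_value"\n' > "$conf_file"

set_config data new_value                  # rewrites the existing key
set_config something_new "a great value!"  # appends the missing key

cat "$conf_file"
# data="new_value"
# something_new="a great value!"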
