How to recall a string in shell script

I made a script like this:
#! /usr/bin/bash
a=`ls ../wrfprd/wrfout_d0${i}* | cut -c22-25`
b=`ls ../wrfprd/wrfout_d0${i}* | cut -c27-28`
c=`ls ../wrfprd/wrfout_d0${i}* | cut -c30-31`
d=`ls ../wrfprd/wrfout_d0${i}* | cut -c33-34`
f=$a$b$c$d
echo $f
sed "s/.* startdate=.*/export startdate=${f}/g" ./post_process > post_process2
The echo command works and prints 2008042118, which is what I want, but in the file post_process2 the line comes out as just export startdate=, without the value of f. How can I get it to produce a line like export startdate=2008042118?

First -- don't use ls here -- it's both expensive in terms of performance (compared to globbing, which is performed internal to the shell without starting any external programs), and doesn't guarantee useful output for the full range of possible filenames, making its use in this context inherently bug-prone. A better way to retrieve pieces from a filename, assuming a ksh-derived shell such as bash or zsh, would look like this:
#!/bin/bash
# this is an array, but we're only going to use the first element
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
# note: bash offsets are 0-based, one less than cut's 1-based columns
f=${file:21:4}${file:26:2}${file:29:2}${file:32:2}
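Those fixed offsets count from the start of the whole path, so they silently break if the directory prefix ever changes. A slightly more robust sketch slices the basename instead, assuming the conventional wrfout naming wrfout_dNN_YYYY-MM-DD_HH:MM:SS (an assumption; adjust the offsets if your names differ):
#!/bin/bash
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
base=${file##*/}                                     # e.g. wrfout_d01_2008-04-21_18:00:00
f=${base:11:4}${base:16:2}${base:19:2}${base:22:2}   # YYYY MM DD HH -> 2008042118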
Second, don't use sed to modify code -- doing so requires that your runtime user have permission to modify its own code, and moreover invites injection vulnerabilities. Just write your content out to a data file:
printf '%s\n' "$f" >startdate.txt
...and, in your second script, to read in the value from that file:
# if the shebang is #!/bin/bash
startdate=$(<startdate.txt)
# if the shebang is #!/bin/sh
startdate=$(cat startdate.txt)

Related

Some tips to improve a bash script for counting fastq files

Hi guys, I have this bash one-liner that I wish to turn into a script:
for i in `ls *.fastq.gz`; do echo $(zcat ${i} | wc -l)/4|bc; done
I would like to make it a script that reads from a data dir and prints the result alongside the name of each file.
I tried to put the dir in front of it, as 'data/*.fastq.gz', but got an error: No such dir exists...
I would like some like this:
name1.fastq.gz 1898516
name2.fastq.gz 2467421
namen.fastq.gz 1234532
I am not experienced in bash.
Could you guys help me out?
Thanks
Take the dir as an argument, but default to the current dir if it's not set.
dir="${1-.}"
Then put it in the glob: "$dir"/*.fastq.gz
As well:
Quote variables and command expansions.
Don't parse ls.
Don't trust echo with arbitrary data (filenames). Use printf instead.
Use an end-of-options flag -- when giving filenames to commands.
I prefer not to have any inline command expansions, but that's just personal preference.
Putting it together:
#!/bin/bash
dir="${1-.}"
for file in "$dir"/*.fastq.gz; do
    printf '%s ' "$file"
    lines="$(zcat -- "$file" | wc -l)"
    bc <<< "$lines/4" # using a here-string (a bash feature)
done
There is no need to shell out to bc for integer math (dividing by 4), or to use ls to enumerate the files. The original version will do with minor changes:
#!/bin/bash
dir="${1-.}"
for i in "$dir"/*.fastq.gz; do
    lines=$(zcat "${i}" | wc -l)
    printf '%s %d\n' "$i" "$((lines/4))"
done
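One caveat for both versions: if nothing matches the glob, the loop body still runs once with the literal pattern and zcat fails confusingly. In bash you can guard against that with nullglob; a minimal sketch of the same loop with the guard added:
#!/bin/bash
shopt -s nullglob   # unmatched globs now expand to nothing instead of themselves
dir="${1-.}"
for i in "$dir"/*.fastq.gz; do
    lines=$(zcat -- "$i" | wc -l)
    printf '%s %d\n' "$i" "$((lines/4))"
done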

Correct syntax and usage of `cat` command?

(This question is a follow-up on this comment, in an answer about git hooks)
I'm far too unskilled in bash (so far) to fully understand the remark and how to act on it. More specifically, I've been advised to avoid using the bash command cat this way:
echo "$current_branch" $(cat "$1") > "$1"
because the order of operations depends on the specific shell, and it could end up destroying the contents of the passed argument, that is, the commit message itself, if I got it right?
Also, how to "save the contents in a separate step"?
Would the following make any sense?
tmp = "$1"
echo "$current_branch" $(cat $tmp) > "$1"
The proposed issue is not about overwriting variables or arguments, but about the fact that both reading from and writing to a file at the same time is generally a bad idea.
For example, this command may look like it will just write a file to itself, but instead it truncates it:
cat myfile > myfile # Truncates the file to size 0
However, this is not a problem in your specific command. It is guaranteed to work in a POSIX-compliant shell, because the specified order of operations performs redirections after expansions:
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Double-however, it's still a bit fragile in the sense that seemingly harmless modifications may trigger the problem, such as if you wanted to run sed on the result. Since the redirection (> "$1") and command substitution $(cat "$1") are now in separate commands, the POSIX definition no longer saves you:
# Command may now randomly result in the original message being deleted
echo "$current_branch $(cat "$1")" | sed -e 's/(c)/©/g' > "$1"
Similarly, if you refactor it into a function, it will also suddenly stop working:
# Command will now always delete the original message
modify_message() {
    echo "$current_branch $(cat "$1")"
}
modify_message "$1" > "$1"
You can avoid this by writing to a temporary file and then replacing your original.
tmp=$(mktemp) || exit
echo "$current_branch $(cat "$1")" > "$tmp"
mv "$tmp" "$1"
In my opinion, it's better to save to another file. You may try something like:
echo "$current_branch" > tmp
cat "$1" >> tmp
# or merge the two lines above into:
# echo "$current_branch" $(cat "$1") > tmp
mv tmp "$1"
However, I am not sure whether my understanding is right, or whether there are better solutions.
This is what I consider the core of the question: it is hard to know the "precedence" of the $() block and >. If > were applied first, echo "$current_branch" would rewrite the file "$1" and drop its original content, which would be a disaster. If $() is evaluated first, everything works as expected. Still, the risk exists, and we should avoid it.
A command group would be far better than a command substitution here. Note the similarity to Geno Chen's answer.
{
    echo "$current_branch"
    cat "$1"
} > tmp && mv tmp "$1"

Making bash script with command already containing '$1'

Somewhere I found this command that sorts lines in an input file by number of characters(1st order) and alphabetically (2nd order):
while read -r l; do echo "${#l} $l"; done < input.txt | sort -n | cut -d " " -f 2- > output.txt
It works fine but I would like to use the command in a bash script where the name of the file to be sorted is an argument:
$ cat numbersort.sh
#!/bin/sh
while read -r l; do echo "${#l} $l"; done < $1 | sort -n | cut -d " " -f 2- > sorted-$1
Entering numbersort.sh input.txt doesn't give the desired result, probably because $1 is already in use as an argument for something else.
How do I make the command work in a shell script?
There's nothing wrong with your original script when used with simple arguments that don't involve quoting issues. That said, there are a few bugs addressed in the below version:
#!/bin/bash
while IFS= read -r line; do
    printf '%d %s\n' "${#line}" "$line"
done <"$1" | sort -n | cut -d " " -f 2- >"sorted-$1"
Use #!/bin/bash if your goal is to write a bash script; #!/bin/sh is the shebang for POSIX sh scripts, not bash.
Clear IFS to avoid pruning leading and trailing whitespace from input and output lines.
Use printf rather than echo to avoid ambiguities in the POSIX standard (see http://pubs.opengroup.org/onlinepubs/009604599/utilities/echo.html, particularly the APPLICATION USAGE and RATIONALE sections).
Quote expansions ("$1" rather than $1) to prevent them from being word-split or glob-expanded.
Note also that this creates a new file rather than operating in-place. If you want something that operates in-place, tack a && mv -- "sorted-$1" "$1" on the end.
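For reference, the in-place variant would then read:
#!/bin/bash
while IFS= read -r line; do
    printf '%d %s\n' "${#line}" "$line"
done <"$1" | sort -n | cut -d " " -f 2- >"sorted-$1" && mv -- "sorted-$1" "$1"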

Processing arguments from file/cli/stdin in bash

I can see myself ending up writing a lot of scripts which do some thing based on some arguments on the command line.
This then progresses to doing more or less the same thing multiple times automated with a scheduler.
To prevent myself having to create a new job for each variation on the arguments, I would like to create a simple script skeleton which I can use to quickly create scripts which take the same arguments from:
The command line
A file from a path specified on the command line
From stdin until eof
My initial approach for taking arguments or config from a TAB delim file was as follows:
if [ -f "$1" ]; then
echo "Using config file '$1'"
IFS=' '
cat $1 | grep -v "^#" | while read line; do
if [ "$line" != "" ]; then
echo $line
#call fn with line as args
fi
done
unset IFS
elif [ -d "$1" ]; then
echo "Using cli arguments..."
#call fn with $1 $2 $3 etc...
else
echo "Read from stdin, ^d will terminate"
IFS=' '
while read line; do
if [ "$(echo $line | grep -v "^#")" != "" ]; then
#call fn with line as args
fi
done
unset IFS
fi
So to all those who have doubtless done this kind of thing before:
How did/would you go about it?
Am I being too procedural - could this be better done with awk or similar?
Is this the best approach anyway?
Not sure whether I'm a bit wide of the mark, but it sounds like you are trying to reinvent xargs.
If you have a script that is normally invoked like this:
$ your_script.sh -d foo bar baz
You can get the parameters from stdin as follows:
$ xargs your_script.sh
-d foo
bar
baz
^D
Or from a file:
$ cat config_file | xargs your_script.sh
(assuming that config_file has the following content)
-d foo bar
baz
Or from multiple config files
$ cat config_file1 config_file2 | xargs your_script.sh
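One caveat: xargs applies its own word-splitting and quote-processing to its input, which misbehaves on arguments containing spaces or quotes. GNU xargs accepts -d '\n' (a GNU extension, not POSIX) to treat each input line as exactly one argument:
$ xargs -d '\n' your_script.sh < config_file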
Can you think of a standard Unix utility that behaves as you describe? (No, I can't.) That suggests that you are slightly off-target with your goal.
The testing of -f "$1" and -d "$1" is not conventional, but if your script only works on directories, maybe it makes sense.
Ultimately, I think you need an interface like:
your_cmd [-f argumentlist] [file ...]
The explicit but optional -f argumentlist allows you to specify the file to read from on the command line. Otherwise, the files specified on the command line are processed, unless there are no such arguments, in which case the file names to be processed are read from standard input. This is a lot closer to a conventional organization. We can debate about the handling of file names with spaces and newlines in the names some other time.
The core of your code will be written to accept/process one file name at a time. This might be written as a shell function, which allows the maximum reuse.
while getopts f: opt
do
    case $opt in
    (f) while read file; do shell_function "$file"; done < "$OPTARG"; exit 0;;
    (*) : Error handling etc;;
    esac
done
shift $(($OPTIND - 1))
case $# in
(0) while read file; do shell_function "$file"; done; exit 0;;
(*) for file in "$@"; do shell_function "$file"; done; exit 0;;
esac
It is not very hard to ring the variations on this. It is also tolerably compact.
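A few hypothetical invocations, assuming the skeleton is saved as your_cmd and shell_function is defined:
your_cmd -f argumentlist               # file names read from the file "argumentlist"
your_cmd file1 file2 file3             # file names taken from the command line
printf '%s\n' file1 file2 | your_cmd   # file names read from standard input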

Execute command after checking file type

I am working on a bash script which executes a command depending on the file type. I want to use the file command, not the file extension, to determine the type, but I am bloody new to this scripting stuff, so if someone can help me I would be very thankful! Thanks!
Here is the script I want to add the function to:
#!/bin/bash
export PrintQueue="/root/xxx";
IFS=$'\n'
for PrintFile in $(/bin/ls -1 ${PrintQueue}); do
    lpr -r ${PrintQueue}/${PrintFile};
done
The point is: all files which are PDFs should be printed with the lpr command, all others with ooffice -p.
You are going through a lot of extra work. Here's the idiomatic code, I'll let the man page provide the explanation of the pieces:
#!/bin/sh
for path in /root/xxx/* ; do
    case `file --brief "$path"` in
    PDF*) cmd="lpr -r" ;;
    *)    cmd="ooffice -p" ;;
    esac
    eval $cmd \"$path\"
done
Some notable points:
using sh instead of bash increases portability and narrows the choices of how to do things
don't use ls when a glob pattern will do the same job with less hassle
the case statement has surprising power
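For instance, glob patterns and alternation let a single case replace a whole if/elif chain; a generic sketch, not tied to the printing script:
# Patterns are globs; | separates alternatives; the first match wins
case "$answer" in
    [Yy]|[Yy][Ee][Ss]) echo "proceeding" ;;
    [Nn]*)             echo "aborting"; exit 1 ;;
    *)                 echo "please answer yes or no" ;;
esac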
First, two general shell programming issues:
Do not parse the output of ls. It's unreliable and completely useless. Use wildcards, they're easy and robust.
Always put double quotes around variable substitutions, e.g. "$PrintQueue/$PrintFile", not $PrintQueue/$PrintFile. If you leave the double quotes out, the shell performs wildcard expansion and word splitting on the value of the variable. Unless you know that's what you want, use double quotes. The same goes for command substitutions $(command).
Historically, implementations of file have had different output formats, intended for humans rather than parsing. Most modern implementations have an option to output a MIME type, which is easily parseable.
#!/bin/bash
print_queue="/root/xxx"
for file_to_print in "$print_queue"/*; do
    case "$(file -bi "$file_to_print")" in   # -b drops the "name:" prefix so the patterns can match
        application/pdf\;*|application/postscript\;*)
            lpr -r "$file_to_print";;
        application/vnd.oasis.opendocument.*)
            ooffice -p "$file_to_print" &&
                rm "$file_to_print";;
        # and so on
        *) echo 1>&2 "Warning: $file_to_print has an unrecognized format and was not printed";;
    esac
done
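If your file supports --mime-type (GNU file does; treat it as an assumption elsewhere), it omits the charset suffix entirely, which makes the patterns simpler still:
case "$(file --brief --mime-type "$file_to_print")" in
    application/pdf|application/postscript) lpr -r "$file_to_print";;
    application/vnd.oasis.opendocument.*)   ooffice -p "$file_to_print";;
esac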
#!/bin/bash
PRINTQ="/root/docs"
OLDIFS=$IFS
IFS=$(echo -en "\n\b")
for file in $(ls -1 "$PRINTQ")
do
    # ls prints bare names, so prefix the queue directory when touching the file
    type=$(file --brief "$PRINTQ/$file" | awk '{print $1}')
    if [ "$type" == "PDF" ]
    then
        echo "[*] printing $file with LPR"
        lpr "$PRINTQ/$file"
    else
        echo "[*] printing $file with OPEN-OFFICE"
        ooffice -p "$PRINTQ/$file"
    fi
done
IFS=$OLDIFS
