I can see myself ending up writing a lot of scripts which do something based on arguments given on the command line.
This then progresses to doing more or less the same thing multiple times, automated with a scheduler.
To prevent myself having to create a new job for each variation on the arguments, I would like to create a simple script skeleton which I can use to quickly create scripts which take the same arguments from:
The command line
A file from a path specified on the command line
Standard input, until EOF
My initial approach for taking arguments or config from a tab-delimited file was as follows:
if [ -f "$1" ]; then
echo "Using config file '$1'"
IFS=' '
cat $1 | grep -v "^#" | while read line; do
if [ "$line" != "" ]; then
echo $line
#call fn with line as args
fi
done
unset IFS
elif [ -d "$1" ]; then
echo "Using cli arguments..."
#call fn with $1 $2 $3 etc...
else
echo "Read from stdin, ^d will terminate"
IFS=' '
while read line; do
if [ "$(echo $line | grep -v "^#")" != "" ]; then
#call fn with line as args
fi
done
unset IFS
fi
So to all those who have doubtless done this kind of thing before:
How did/would you go about it?
Am I being too procedural - could this be better done with awk or similar?
Is this the best approach anyway?
Not sure whether I'm a bit wide of the mark, but it sounds like you are trying to reinvent xargs.
If you have a script that is normally invoked like this:
$ your_script.sh -d foo bar baz
You can get the parameters from stdin as follows:
$ xargs your_script.sh
-d foo
bar
baz
^D
Or from a file
$ cat config_file | xargs your_script.sh
(assuming that config_file has the following content)
-d foo bar
baz
Or from multiple config files
$ cat config_file1 config_file2 | xargs your_script.sh
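One caveat worth knowing: by default xargs packs as many arguments as it can into a single invocation of your script. If you instead want one invocation per input line (so each line of config_file becomes one run of the script), a sketch using the standard -L option:
$ xargs -L 1 your_script.sh < config_file
With the config_file shown above, this runs your_script.sh -d foo bar and then your_script.sh baz as two separate invocations.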
Can you think of a standard Unix utility that behaves as you describe? (No, I can't.) That suggests that you are slightly off-target with your goal.
The testing of -f "$1" and -d "$1" is not conventional, but if your script only works on directories, maybe it makes sense.
Ultimately, I think you need an interface like:
your_cmd [-f argumentlist] [file ...]
The explicit but optional -f argumentlist allows you to specify the file to read from on the command line. Otherwise, the files specified on the command line are processed, unless there are no such arguments, in which case the file names to be processed are read from standard input. This is a lot closer to a conventional organization. We can debate about the handling of file names with spaces and newlines in the names some other time.
The core of your code will be written to accept/process one file name at a time. This might be written as a shell function, which allows the maximum reuse.
while getopts f: opt
do
    case $opt in
    (f) while read -r file; do shell_function "$file"; done < "$OPTARG"; exit 0;;
    (*) : Error handling etc;;
    esac
done
shift $(($OPTIND - 1))
case $# in
(0) while read -r file; do shell_function "$file"; done; exit 0;;
(*) for file in "$@"; do shell_function "$file"; done; exit 0;;
esac
It is not very hard to ring the variations on this. It is also tolerably compact.
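For illustration, here is a hypothetical shell_function stub and a few invocations, assuming the skeleton above is saved as your_cmd (both names are placeholders, not from the original):
# hypothetical stub; replace the body with the real per-file processing
shell_function() {
    printf 'Processing %s\n' "$1"
}
$ your_cmd -f list.txt                    # names read from list.txt
$ your_cmd a.txt b.txt                    # names taken from the command line
$ printf '%s\n' a.txt b.txt | your_cmd    # names read from standard input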
This part of my script is comparing each line of a file to find a preset string. If the string does NOT exist as a line in the file, it should append it to the end of the file.
STRING=foobar
cat "$FILE" | while read LINE
do
if [ "$STRING" == "$LINE" ]; then
export ISLINEINFILE="yes"
fi
done
if [ ! "$ISLINEINFILE" == yes ]; then
echo "$LINE" >> "$FILE"
fi
However, it appears as if $LINE and $ISLINEINFILE are both cleared upon finishing the do loop. How can I avoid this?
Using shell
If we want to make just the minimal change to your code to get it working, all we need to do is switch the input redirection:
string=foobar
while read line
do
    if [ "$string" == "$line" ]; then
        islineinfile="yes"
    fi
done <"$file"
if [ ! "$islineinfile" == yes ]; then
    echo "$string" >> "$file"
fi
In the above, we changed cat "$file" | while ... done to while ... done <"$file". With this one change, the while loop is no longer in a subshell and, consequently, shell variables created in the loop live on after the loop completes.
Using sed
I believe that the whole of your script can be replaced with:
sed -i.bak '/^foobar$/H; ${x;s/././;x;t; s/$/\nfoobar/}' file*
The above adds the line foobar to the end of each file that doesn't already have a line that matches ^foobar$.
The above shows file* as the final argument to sed. This will apply the change to all files matching the glob. You could list specific files individually if you prefer.
The above was tested on GNU sed (linux). Minor modifications may be needed for BSD/OSX sed.
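For readers less fluent in sed, here is a commented rendering of the same program (the annotations are mine, not the original author's); save it as append.sed and run sed -i.bak -f append.sed file*:
# append each line matching ^foobar$ to the hold space
/^foobar$/H
# on the last line of each file (-i processes each file separately):
${
  # bring the hold space into the pattern space
  x
  # this substitution succeeds only if the hold space is non-empty,
  # i.e. only if a foobar line was seen
  s/././
  # restore the original last line
  x
  # if the substitution succeeded, foobar already exists: branch to end
  t
  # otherwise append foobar after the last line
  s/$/\nfoobar/
}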
Using GNU awk (gawk)
awk -i inplace -v s="foobar" '$0==s{f=1} {print} ENDFILE{if (f==0) print s; f=0}' file*
Like the sed command, this can tackle multiple files all in one command.
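If your awk lacks -i inplace (it was added in gawk 4.1), a portable sketch of the same idea for a single file, going through a temporary file:
awk -v s="foobar" '$0==s{f=1} {print} END{if (!f) print s}' file > file.tmp && mv file.tmp file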
Why does my variable set in a do loop disappear?
It disappears because it is set in a component of a shell pipeline. Most shells run each part of a pipeline in a subshell. By Unix design, variables set in a subshell cannot affect their parent shell or any other already-running shell.
How can I avoid this?
There are several ways:
The simplest is to use a shell that doesn't run the last component of a pipeline in a subshell. This is the default behavior of ksh, e.g. with this shebang:
#!/bin/ksh
Bash behaves the same way when the lastpipe option is set:
shopt -s lastpipe
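A minimal sketch showing the effect:
#!/bin/bash
shopt -s lastpipe    # takes effect because job control is off in scripts
echo foobar | while read -r line; do seen=$line; done
echo "$seen"         # prints "foobar": the loop ran in the current shell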
You might use the variable in the same subshell that sets it. Note that your original script's indentation is misleading: it might suggest that the if block is inside the pipeline, which isn't the case. Enclosing the whole block in parentheses rectifies that and is the minimal change (two extra characters) to make it work:
STRING=foobar
cat "$FILE" | ( while read LINE
    do
        if [ "$STRING" == "$LINE" ]; then
            export ISLINEINFILE="yes"
        fi
    done
    if [ ! "$ISLINEINFILE" == yes ]; then
        echo "$LINE" >> "$FILE"
    fi
)
The variable would still be lost after that block though.
You might simply avoid the pipeline, which is straightforward in your case, the cat being unnecessary:
STRING=foobar
while read LINE
do
    if [ "$STRING" == "$LINE" ]; then
        export ISLINEINFILE="yes"
    fi
done < "$FILE"
if [ ! "$ISLINEINFILE" == yes ]; then
    echo "$LINE" >> "$FILE"
fi
You might use another algorithmic approach, like using sed or gawk as suggested by John1024.
See also https://unix.stackexchange.com/a/144137/2594 for standard compliance details.
I made a script like this:
#! /usr/bin/bash
a=`ls ../wrfprd/wrfout_d0${i}* | cut -c22-25`
b=`ls ../wrfprd/wrfout_d0${i}* | cut -c27-28`
c=`ls ../wrfprd/wrfout_d0${i}* | cut -c30-31`
d=`ls ../wrfprd/wrfout_d0${i}* | cut -c33-34`
f=$a$b$c$d
echo $f
sed "s/.* startdate=.*/export startdate=${f}/g" ./post_process > post_process2
The echo command works and gives 2008042118, which is what I want, but the line in post_process2 comes out as export startdate=, without the value of f. I want to produce a line like export startdate=2008042118.
First -- don't use ls here -- it's both expensive in terms of performance (compared to globbing, which is performed internal to the shell without starting any external programs), and doesn't guarantee useful output for the full range of possible filenames, making its use in this context inherently bug-prone. A better way to retrieve pieces from a filename, assuming a ksh-derived shell such as bash or zsh, would look like this:
#!/bin/bash
# this is an array, but we're only going to use the first element
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
f=${file:22:4}${file:27:2}${file:30:2}${file:33:2}
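One thing to double-check against a real filename: cut -c counts columns starting from 1, while ${var:offset:length} counts from 0, so offsets taken from a cut command may need shifting by one. A tiny illustration:
s=abcdef
echo "$s" | cut -c2-3   # prints "bc" (1-based columns)
echo "${s:2:2}"         # prints "cd" (0-based offset)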
Second, don't use sed to modify code -- doing so requires that your runtime user have permission to modify its own code, and moreover invites injection vulnerabilities. Just write your content out to a data file:
printf '%s\n' "$f" >startdate.txt
...and, in your second script, to read in the value from that file:
# if the shebang is #!/bin/bash
startdate=$(<startdate.txt)
# if the shebang is #!/bin/sh
startdate=$(cat startdate.txt)
Somewhere I found this command that sorts lines in an input file by number of characters (1st order) and alphabetically (2nd order):
while read -r l; do echo "${#l} $l"; done < input.txt | sort -n | cut -d " " -f 2- > output.txt
It works fine but I would like to use the command in a bash script where the name of the file to be sorted is an argument:
$ cat numbersort.sh
#!/bin/sh
while read -r l; do echo "${#l} $l"; done < $1 | sort -n | cut -d " " -f 2- > sorted-$1
Entering numbersort.sh input.txt doesn't give the desired result, probably because $1 is already in use as an argument for something else.
How do I make the command work in a shell script?
There's nothing wrong with your original script when used with simple arguments that don't involve quoting issues. That said, there are a few bugs addressed in the below version:
#!/bin/bash
while IFS= read -r line; do
    printf '%d %s\n' "${#line}" "$line"
done <"$1" | sort -n | cut -d " " -f 2- >"sorted-$1"
Use #!/bin/bash if your goal is to write a bash script; #!/bin/sh is the shebang for POSIX sh scripts, not bash.
Clear IFS to avoid pruning leading and trailing whitespace from input and output lines
Use printf rather than echo to avoid ambiguities in the POSIX standard (see http://pubs.opengroup.org/onlinepubs/009604599/utilities/echo.html, particularly APPLICATION USAGE and RATIONALE sections).
Quote expansions ("$1" rather than $1) to prevent them from being word-split or glob-expanded
Note also that this creates a new file rather than operating in-place. If you want something that operates in-place, tack a && mv -- "sorted-$1" "$1" on the end.
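As a quick sanity check, a hypothetical run (sample data invented for illustration):
$ cat input.txt
banana
kiwi
apple
$ ./numbersort.sh input.txt
$ cat sorted-input.txt
kiwi
apple
banana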
I use this loop to iterate through files or, if there are no files, read stdin:
#!/bin/bash
set -e
cat "$#" | while read arr ; do
echo "Got this line ${arr}"
done
The problem is that if the file doesn't exist, it doesn't error out.
You can see the full example here:
https://github.com/StackExchange/blackbox/blob/master/tools/mk_rpm_fpmdir
cat does return error code 1 if any file is not found. However, the error doesn't cause the program to stop.
How can I iterate through $@ and fail if a file is not found?
There are several available options. One is to simply ditch cat, and thus have no pipeline (which has the advantageous side effect of avoiding subshell use, and thus allowing changes to shell state -- variables set, etc -- to last beyond the inner loop):
set -e
for f; do # in "$@" is implicit
    while read -r -a arr; do
        printf 'Got line: '
        printf '%q ' "${arr[@]}"
        printf '\n'
    done <"$f"
done
The above also goes out of its way to print the content read in a way that entirely preserves the array, printing its contents unambiguously (distinguishing an array containing the two elements 'foo bar' and 'baz' from one containing the three elements 'foo', 'bar' and 'baz').
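To see the disambiguation, compare (hypothetical input):
$ printf '%q ' 'foo bar' baz; printf '\n'
foo\ bar baz
$ printf '%q ' foo bar baz; printf '\n'
foo bar baz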
Another is to set pipefail, which will cause a pipeline to be considered failed if any component of a pipeline returns a nonzero exit status, including cat:
set -e
set -o pipefail
cat "$@" | while IFS= read -r; do
    printf 'Got line: %q\n' "$REPLY"
done
This works by overriding the default behavior, in which only the exit status of the last (rightmost) command in a pipeline determines the overall exit status of the command.
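A short demonstration of the difference (the missing file name is hypothetical):
$ cat no-such-file | cat; echo $?
cat: no-such-file: No such file or directory
0
$ set -o pipefail
$ cat no-such-file | cat; echo $?
cat: no-such-file: No such file or directory
1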
for f in "${#}"; do
test -f "${f}" || exit # fail if file doesn't exist
cat "${f}" | awk '{print "Got this line" $0}'
done
I am working on a bash script which executes a command depending on the file type. I want to use the file command, not the file extension, to determine the type, but I am bloody new to this scripting stuff, so if someone can help me I would be very thankful!
Here is the script I want to include the function in:
#!/bin/bash
export PrintQueue="/root/xxx";
IFS=$'\n'
for PrintFile in $(/bin/ls -1 ${PrintQueue}); do
    lpr -r ${PrintQueue}/${PrintFile};
done
The point is, all files which are PDFs should be printed with the lpr command, all others with ooffice -p
You are going through a lot of extra work. Here's the idiomatic code, I'll let the man page provide the explanation of the pieces:
#!/bin/sh
for path in /root/xxx/* ; do
    case `file --brief "$path"` in
    PDF*) cmd="lpr -r" ;;
    *)    cmd="ooffice -p" ;;
    esac
    eval $cmd \"$path\"
done
Some notable points:
using sh instead of bash increases portability and narrows the choices of how to do things
don't use ls when a glob pattern will do the same job with less hassle
the case statement has surprising power
First, two general shell programming issues:
Do not parse the output of ls. It's unreliable and completely useless. Use wildcards, they're easy and robust.
Always put double quotes around variable substitutions, e.g. "$PrintQueue/$PrintFile", not $PrintQueue/$PrintFile. If you leave the double quotes out, the shell performs wildcard expansion and word splitting on the value of the variable. Unless you know that's what you want, use double quotes. The same goes for command substitutions $(command).
Historically, implementations of file have had different output formats, intended for humans rather than parsing. Most modern implementations have an option to output a MIME type, which is easily parseable.
#!/bin/bash
print_queue="/root/xxx"
for file_to_print in "$print_queue"/*; do
    # -b omits the leading file name, so the patterns match from the start
    case "$(file -bi "$file_to_print")" in
        application/pdf\;*|application/postscript\;*)
            lpr -r "$file_to_print";;
        application/vnd.oasis.opendocument.*)
            ooffice -p "$file_to_print" &&
            rm "$file_to_print";;
        # and so on
        *) echo 1>&2 "Warning: $file_to_print has an unrecognized format and was not printed";;
    esac
done
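For reference, file -bi output on GNU/Linux typically looks like the following (exact details vary between file versions and platforms), which is what the case patterns above match against:
$ file -bi document.pdf
application/pdf; charset=binary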
#!/bin/bash
PRINTQ="/root/docs"
OLDIFS=$IFS
IFS=$(echo -en "\n\b")
for file in $(ls -1 "$PRINTQ")
do
    # ls prints bare names, so prefix $PRINTQ when touching the file itself
    type=$(file --brief "$PRINTQ/$file" | awk '{print $1}')
    if [ "$type" == "PDF" ]
    then
        echo "[*] printing $file with LPR"
        lpr "$PRINTQ/$file"
    else
        echo "[*] printing $file with OPEN-OFFICE"
        ooffice -p "$PRINTQ/$file"
    fi
done
IFS=$OLDIFS