Correct syntax and usage of `cat` command? - bash

(This question is a follow-up on this comment, in an answer about git hooks)
I'm far too unskilled in bash (so far) to understand fully the remark and how to act accordingly. More specifically, I've been advised to avoid using bash command cat this way :
echo "$current_branch" $(cat "$1") > "$1"
because the order of operations depends on the specific shell and it could end up destroying the contents of the passed argument, so the commit message itself if I got it right?
Also, how to "save the contents in a separate step"?
Would the following make any sense?
tmp = "$1"
echo "$current_branch" $(cat $tmp) > "$1"

The proposed issue is not about overwriting variables or arguments, but about the fact that both reading from and writing to a file at the same time is generally a bad idea.
For example, this command may look like it will just write a file to itself, but instead it truncates it:
cat myfile > myfile # Truncates the file to size 0
However, this is not a problem in your specific command. It is guaranteed to work in a POSIX compliant shell because the order of operations specify that redirections will happen after expansions:
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Double-however, it's still a bit fragile in the sense that seemingly harmless modifications may trigger the problem, such as if you wanted to run sed on the result. Since the redirection (> "$1") and command substitution $(cat "$1") are now in separate commands, the POSIX definition no longer saves you:
# Command may now randomly result in the original message being deleted
echo "$current_branch $(cat "$1")" | sed -e 's/(c)/©/g' > "$1"
Similarly, if you refactor it into a function, it will also suddenly stop working:
# Command will now always delete the original message
modify_message() {
echo "$current_branch $(cat "$1")"
}
modify_message "$1" > "$1"
You can avoid this by writing to a temporary file, and then replace your original.
tmp=$(mktemp) || exit
echo "$current_branch $(cat "$1")" > "$tmp"
mv "$tmp" "$1"

In my opinion, it's better to save to another file.
You may try something like
echo "$current_branch" > tmp
cat "$1" >> tmp # merge these into
# echo "$current_branch" $(cat "$1") > tmp
# may both OK
mv tmp "$1"
However I am not sure if my understanding is right, or there are some better solutions.
This is what I considered as the core of question. It is hard to decide the "precedence" of $() block and >. If > is executed "earlier", then echo "$current_branch" will rewrite "$1" file and drop the original content of "$1", which is a disaster. If $() is executed "earlier", then everything works as expected. However, there exists a risk, and we should avoid it.

A command group would be far better than a command substitution here. Note the similarity to Geno Chen's answer.
{
echo "$current_branch"
cat "$1"
} > tmp && mv tmp "$1"

Related

An escaping of the variable within bash script

My bash script writes an another bash script using printf.
printf "#!/bin/bash
HOME=${server}
file=gromacs*
file_name=\$(basename "\${file}")
date=\$(date +"\%m_\%d_\%Y")
for sim in \${HOME}/* ; do
if [[ -d \$sim ]]; then
simulation=$(basename "\$sim")
pushd \${sim}
cp \$file \${server}/\${results}/\${file_name}.\${simulation}.\${date}
echo "\${file_name}\ from\ \${simulation}\ has\ been\ collected!"
popd
fi
done" > ${output}/collecter.sh
Here there is a problem in escappiong of the elements within date variable
date=\$(date +"\%m_\%d_\%Y")
where the below part did not work properly
"\%m_\%d_\%Y"
it results in incomplete of a new bash script produced by printf.
How it should be fixed?
Thanks!
Use a quoted heredoc.
{
## print the header, and substitute our own value for HOME
printf '#!/bin/bash\nHOME=%q\n' "$server"
## EVERYTHING BELOW HERE UNTIL THE EOF IS LITERAL
cat <<'EOF'
file=( gromacs* )
(( ${#file[#]} == 1 )) && [[ -e $file ]] || {
echo "ERROR: Exactly one file starting with 'gromacs' should exist" >&2
exit 1
}
file_name=$(basename "${file}")
date=$(date +"%m_%d_%Y")
for sim in "$HOME"/* ; do
if [[ -d $sim ]]; then
simulation=$(basename "$sim")
(cd "${sim}" && exec cp "$file" "${server}/${results}/${file_name}.${simulation}.${date}")
echo "${file_name} from ${simulation} has been collected!"
fi
done
EOF
} >"${output}/collecter.sh"
Note:
Inside a quoted heredoc (cat <<'EOF'), no substitutions are performed, so no escaping is needed. We're thus able to write our code exactly as we want it to exist in the generated file.
When generating code, use printf %q to escape values in such a way as to evaluate back to their original values. Otherwise, a variable containing $(rm -rf ~) could cause the given command to be run (if it were substituted inside literal single quotes, making the contents $(rm -rf ~)'$(rm -rf ~)' would escape them).
Glob expansions return a list of results; the proper data type in which to store their results is an array, not a string. Thus, file=( gromacs* ) makes the storage of the result in an array explicit, and the following code checks for both the case where we have more than one result, and the case where our result is the original glob expression (meaning no matches existed).
All expansions need to be quoted to prevent string-splitting. This means "$HOME"/*, not $HOME/* -- otherwise you'll have problems whenever a user has a home directory containing whitespace (and yes, this does happen -- consider Windows-derived platforms where you have /Users/Firstname Lastname, or sites where you've mounted a volume for home directories off same).
pushd and popd are an interactive extension, as opposed to a tool intended for writing scripts. Since spawning an external program (such as cp) involves a fork() operation, and any directory change inside a subshell terminates when that subshell does, you can avoid any need for them by spawning a subshell, cd'ing within that subshell, and then using the exec builtin to replace the subshell's PID with that of cp, thus preventing the fork that would otherwise have taken place to allow cp to be started in a separate process from the shell acting as its parent.
You have to escapes in printf with esscaps, e. g.:
date=\$(date +"%%m_%%d_%%Y")
should print date=\$(date +"%m_%d_%Y"). But you should avoid using printf, because you don't use it's capabilities. Instead you could cat the string to the file:
cat > ${output}/collecter.sh <<'END'
HOME=${server}
...
done
END
This would allow you to avoid many escapes, and make the code more readable.
Try this
date=\$(date +\"%%m_%%d_%%Y\")"

How to recall a string in shell script

I made a script like this:
#! /usr/bin/bash
a=`ls ../wrfprd/wrfout_d0${i}* | cut -c22-25`
b=`ls ../wrfprd/wrfout_d0${i}* | cut -c27-28`
c=`ls ../wrfprd/wrfout_d0${i}* | cut -c30-31`
d=`ls ../wrfprd/wrfout_d0${i}* | cut -c33-34`
f=$a$b$c$d
echo $f
sed "s/.* startdate=.*/export startdate=${f}/g" ./post_process > post_process2
echo command works and gives 2008042118 that is what I want but in file post_process2 is like this export startdate= and can not recall variable f. I want to produce a line like export startdate=2008042118
First -- don't use ls here -- it's both expensive in terms of performance (compared to globbing, which is performed internal to the shell without starting any external programs), and doesn't guarantee useful output for the full range of possible filenames, making its use in this context inherently bug-prone. A better way to retrieve pieces from a filename, assuming a ksh-derived shell such as bash or zsh, would look like this:
#!/bin/bash
# this is an array, but we're only going to use the first element
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
f=${file:22:4}${file:27:2}${file:30:2}${file:33:2}
Second, don't use sed to modify code -- doing so requires that your runtime user have permission to modify its own code, and moreover invites injection vulnerabilities. Just write your content out to a data file:
printf '%s\n' "$f" >startdate.txt
...and, in your second script, to read in the value from that file:
# if the shebang is #!/bin/bash
startdate=$(<startdate.txt)
# if the shebang is #!/bin/sh
startdate=$(cat startdate.txt)

Get first line of stdin which is not in a file

I am trying to write a function in a bash script that gets lines from stdin and picks out the first line which is not contained in a file.
Here is my approach:
doubles=file.txt
firstnotdouble(){
while read input_line; do
found=0;
cat $doubles |
while read double_line; do
if [ "$input_line" = "$double_line" ]
then
found=1;
break
fi
done
if [ $found -eq 0 ] # no double found, echo and break!
then
echo $input_line
break
fi
done
}
After some debugging attempts I realized that when found is set to 1 in the first if block, it does not keep its value until the next if block. That's why it's not working. Why does the script act as if there were two found variables in different "scopes"?
The second question would be if the approach as a whole could be optimized.
As indicated in the comments, the issue with environment variables is that the commands in a pipeline (that is, a series of commands separated by |) run in subshells, and each subshell has its own environment variables. You could have avoided the problem by avoiding the UUOC (useless use of cat), writing:
while read ...; do ... done < "$doubles"
instead of the pipeline.
A (much) faster way than using a while read loop repeatedly through the doubles file is to use grep:
# Specify the file to be scanned as the first argument
firstnotdouble() {
while IFS= read -r double_line; do
if ! grep -qxF "$double_line" "$1"; then
echo "$double_line"
return
fi
done
return 1
}
In the grep:
-q suppress print out, and stop on first match
-x pattern must match the entire line
-F pattern is a simple string instead of a regular expression.
In the read:
IFS= avoids spaces being trimmed
-r avoids backslashes being deleted
With GNU grep, you could use -xF -m1 (or even -xFm1 if you like being cryptic) instead of -qxF, and then leave out the echo. The grep extension -m N limits the number of matches found to N.

Pass CAT command as argument to function

I'm trying to write a very simple bash script that modifies a number of files, and I'm outputting the results of each command to a log as a check whether the command was completed successfully. Everything appears to be working except I can't pass CAT with variables to my script -- I keep getting a cat: >>: No such file or directory error.
#! /bin/bash
file1="./file1"
file2="./file2"
check () {
if ( $1 > /dev/null ) then
echo " $1 : completed" | tee -a log
return 0;
else
echo "ERR> $1 : command failed" | tee -a log
return 1;
fi
}
check "cp $file1 $file1.bak" # this works fine
check "sed -i s/text/newtext/g $file1" # this works, too
check "cat $file1 >> $file2" # this does not work
I've tried any number of combinations of quoting the command. The only way that I can get it to work is by using the following:
check $(cat $file1 >> $file2)
However, this does not pass the command itself to check only the return value, so $1 in function check carries /dev/null and not the command performed, which is not the particular behaviour I want.
Just for completeness, the log file looks like:
cp ./file1 ./file1.bak : completed
sed -i s/text/newtext/g ./file1 : completed
ERR> cat ./file1 >> ./file2 : command failed
I'm sure the solution is rather simple, but it has eluded me for a few hours and no amount of Google searches has yielded any help. Thanks for having a look.
The problem is that the I/O redirection in your cat command is not being interpreted as I/O redirection but rather as a simple argument to the cat command. It's not cat so much as the I/O redirection that is causing grief. Trying a pipeline would also give you problems.
Options available to remedy this include:
check "cp $file1 $file2" # Use copy instead of cat and I/O redirection; clobbers file2
check "eval cat $file1 >> $file2" # Use eval to handle I/O redirection, piping, etc
If either $file1 or $file2 contains shell special characters, the eval option is dangerous.
The cp command substitutes something that works without needing I/O redirection. You could even use a (microscopic) shell script to handle the job — where your script executes the shell script, and the shell script handles the redirection:
#!/bin/sh
exec cat ${1:?} >> ${2:?}
This generates a default error message if either argument 1 or 2 is missing (but doesn't object to extra arguments).
EDIT: the approach I first tried below doesn't quite work. There's another trick that can rescue this, even without resorting to bash magic, but it's getting ugly.
The >> redirection occurs at the wrong level, in this case. You wind up asking cat to read ./file, then a file named >>, then ./file2. To get the redirection to occur you'll need to do it elsewhere (see below), or invoke eval.
I'd recommend not using eval, but instead, rejiggering the logic of function check instead. You can redirect check at the top level, e.g.,:
check() {
if "$#"; then
echo " $# : completed" | tee -a log
return 0
fi
echo "ERR> $# : failed, status $?" | tee -a log
return 1
}
check cp "$file1" "$file.bak" # doesn't print anything
check sed -i s/text/newtext/g "$file1" >/dev/null # does print, so >/dev/null
check cat "$file1" >> "$file2"
(The double quotes here in the invocations of check are in case file1 and/or file2 ever acquire meta-characters like * or ;, or white space, etc.)
EDIT: as #cdm and #rici note, this fails for the append-to-file cases, because check's output is redirected even for the tee command. Again the redirection is happening at the wrong level. It's possible to fix this by adding another level of indirection:
append_to_file() {
local fname
fname="$1"
shift
"$#" >> "$fname"
}
check cp "$file1" "$file.bak"
check append_to_file /dev/null sed -e s/text/newtext/g "$file1"
check append_to_file "$file2" cat "$file1"
Now, though, the completed and failure messages log append_to_file at the front, which is really pretty klunky. I think I'd go back to eval instead.

Check execute command after cheking file type

I am working on a bash script which execute a command depending on the file type. I want to use the the "file" option and not the file extension to determine the type, but I am bloody new to this scripting stuff, so if someone can help me I would be very thankful! - Thanks!
Here the script I want to include the function:
#!/bin/bash
export PrintQueue="/root/xxx";
IFS=$'\n'
for PrintFile in $(/bin/ls -1 ${PrintQueue}) do
lpr -r ${PrintQueue}/${PrintFile};
done
The point is, all files which are PDFs should be printed with the lpr command, all others with ooffice -p
You are going through a lot of extra work. Here's the idiomatic code, I'll let the man page provide the explanation of the pieces:
#!/bin/sh
for path in /root/xxx/* ; do
case `file --brief $path` in
PDF*) cmd="lpr -r" ;;
*) cmd="ooffice -p" ;;
esac
eval $cmd \"$path\"
done
Some notable points:
using sh instead of bash increases portability and narrows the choices of how to do things
don't use ls when a glob pattern will do the same job with less hassle
the case statement has surprising power
First, two general shell programming issues:
Do not parse the output of ls. It's unreliable and completely useless. Use wildcards, they're easy and robust.
Always put double quotes around variable substitutions, e.g. "$PrintQueue/$PrintFile", not $PrintQueue/$PrintFile. If you leave the double quotes out, the shell performs wildcard expansion and word splitting on the value of the variable. Unless you know that's what you want, use double quotes. The same goes for command substitutions $(command).
Historically, implementations of file have had different output formats, intended for humans rather than parsing. Most modern implementations have an option to output a MIME type, which is easily parseable.
#!/bin/bash
print_queue="/root/xxx"
for file_to_print in "$print_queue"/*; do
case "$(file -i "$file_to_print")" in
application/pdf\;*|application/postscript\;*)
lpr -r "$file_to_print";;
application/vnd.oasis.opendocument.*)
ooffice -p "$file_to_print" &&
rm "$file_to_print";;
# and so on
*) echo 1>&2 "Warning: $file_to_print has an unrecognized format and was not printed";;
esac
done
#!/bin/bash
PRINTQ="/root/docs"
OLDIFS=$IFS
IFS=$(echo -en "\n\b")
for file in $(ls -1 $PRINTQ)
do
type=$(file --brief $file | awk '{print $1}')
if [ $type == "PDF" ]
then
echo "[*] printing $file with LPR"
lpr "$file"
else
echo "[*] printing $file with OPEN-OFFICE"
ooffice -p "$file"
fi
done
IFS=$OLDIFS

Resources