How to pass a string literal containing newlines to grep from a bash script

I am trying to pass the strings extracted from a file as input to grep using the -F (fixed strings) option.
From the grep man page, the expected format is newline-separated:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
How can this be done in bash? I have:
#!/bin/bash
INFILE=$1
DIR=$2
# Create a newline-separated string array
STRINGS="";
while read -r string; do
STRINGS+=$'\n'$string;
done < <(strings $INFILE);
cd $DIR
for file in *; do
grep -Frn \"$STRINGS\" .
done;
But grep reports an error at run time regarding input formatting. grep is interpreting the passed strings as separate arguments, hence the need to pass them as one large string literal.
Debugging bash with -x and passing the script itself as the first parameter (INFILE) gives:
+ grep -Frn '"' '#!/bin/bash' 'INFILE=$1' 'DIR=$2' [...]

Try the following:
#!/bin/bash
inFile=$1
dir=$2
# Read all lines output by `strings` into a single variable using
# a command substitution, $(...).
# Note that trailing newlines are trimmed, but grep still recognizes
# the last line.
strings="$(strings "$inFile")"
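cd "$dir" || exit
# -r already recurses from ".", so looping over the files would only
# repeat the same search once per directory entry.
# -e keeps a pattern that starts with "-" from being read as an option.
grep -Frn -e "$strings" .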
strings outputs each string found in the target file on its own line, so you can use its output as-is, via a command substitution ($(...)).
On a side note: strings is used to extract strings from binary files, and strings are only included if they're at least 4 ASCII(!) characters long and are followed by a newline or NUL.
Note that while the POSIX spec for strings does mandate locale-awareness with respect to character interpretation, both GNU strings and BSD/macOS strings recognize 7-bit ASCII characters only.
If, by contrast, your search strings come from a text file from which you want to strip empty and blank lines, use strings="$(awk 'NF>0' "$inFile")"
Double-quote your variable references and command substitutions to ensure that their values are used as-is.
Do not use \" unless you want to pass a literal " char. to the target command - as opposed to an unquoted one that has syntactical meaning to the shell.
In your particular case, \"$STRINGS\" breaks down as follows:
An unquoted reference to variable $STRINGS - because the enclosing " are \-escaped and therefore literals.
The resulting string - "<value-of-$STRINGS>" - due to $STRINGS being unquoted, is then subject to word-splitting
(and globbing), i.e., split into multiple arguments by whitespace. As a result, because grep expects the search term(s) as a single argument, the command breaks.
Do not use all-uppercase shell variable names in order to avoid conflicts with environment variables and special shell variables.
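To make the quoting point concrete, here is a minimal sketch, assuming a throwaway temporary directory and made-up sample data (the file name data.txt and the variable names are illustrative, not from the question). It shows that a double-quoted variable holding newline-separated fixed strings reaches grep -F as one argument, and that each line acts as an alternative.

```shell
#!/bin/bash
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha line' 'beta line' 'gamma line' > "$tmpdir/data.txt"

# Two fixed strings separated by a newline, held in ONE variable.
patterns=$'alpha\ngamma'

# Quoted: grep receives a single argument containing both patterns,
# and a line matches if it contains either of them.
matches=$(grep -F -e "$patterns" "$tmpdir/data.txt")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

This should print the "alpha line" and "gamma line" entries and skip "beta line".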

Related

Source grep expression from array

I am passing input to grep from a previously declared variable that contains multiple lines. My goal is to extract only certain lines.
As I increase the argument count in grep, the readability goes down.
var1="
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212"
echo "$var1"
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
grep -e 'file1\|^_id=\|_type\|date_found\|whateverelse' <<< $var1
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
My idea was to pass the parameters from an array, which would increase readability:
declare -a grep_array=(
"^_id=\|"
"_type\|"
"date_found\|"
"whateverelse"
)
echo ${grep_array[@]}
^_id=\| _type\| date_found\| whateverelse
grep -e '${grep_array[@]}' <<<$var1
---- no results
How can I pass grep a pattern with multiple OR conditions that is defined somewhere else, rather than on one line?
As I add more arguments, the readability and manageability go down.
Your idea is right, but there are a couple of issues in the logic. First, single quotes suppress all expansion, so '${grep_array[@]}' reaches grep as a literal string. Second, even inside double quotes, an expansion of the form ${array[@]} produces each array element as a separate word. While you wanted to pass a single regexp string to grep, the shell expands the array into its constituents and the command is evaluated as
grep -e '^_id=\|' '_type\|' 'date_found\|' whateverelse
which means that every string after the first is now treated as a file name instead of as part of the regexp.
So to let grep treat your whole array content as a single string use the ${array[*]} expansion. Since this particular type of expansion uses the IFS character for joining the array content, you get a default space (default IFS value) between the words if it is not reset. The syntax below resets the IFS value in a sub-shell and prints out the expanded array content
grep -e "$(IFS=; printf '%s' "${grep_array[*]}")" <<<"$var1"
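As a possible variation, assuming you are free to drop the trailing \| from each array element, you could join the elements with newlines instead, since grep treats each line of a pattern as an alternative. The sample data below is made up for illustration.

```shell
#!/bin/bash
# Hypothetical input; "other=ignored" is there to show a non-matching line.
var1='_id=1234
_type=document
other=ignored
date_found=988657890'

# One alternative per element; no trailing \| is needed.
grep_array=( '^_id=' '_type' 'date_found' )

# Join the elements with newlines inside a command substitution, so the
# IFS change never reaches the rest of the script.
pattern=$(IFS=$'\n'; printf '%s' "${grep_array[*]}")

result=$(grep -e "$pattern" <<<"$var1")
printf '%s\n' "$result"
```

Keeping one alternative per element also makes it easy to add or remove a condition without touching the separators.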

bash print words on multiple lines in a single line

I am writing a shell script in which I write a header that has 30 (and growing) column names. Right now, I have an echo statement that works and looks like this:
echo "Column_Name1, Column_Name2,Column_Name30"
While this works, the readability is poor. If I want to add a column, it's a bit of a nightmare to look at the screen and work out whether it is already in there. Of course, I can search my way out. Is it possible to do something like this with echo or printf and get the CSV on one line?
echo " Column_Name1,
Column_Name2,
Column_Name30"
and get the output as
Column_Name1,Column_Name2,Column_Name30
You can add backslash as the line continuation:
echo "Column_Name1,"\
"Column_Name2,"\
"Column_Name30"
From the bash manual:
The backslash character ‘\’ may be used to remove any special meaning
for the next character read and for line continuation.
Decouple the definition of the header and printing it, and use an array to store the column names.
headers=(
Column_Name1
Column_Name2
Column_Name30
)
(IFS=","; printf '%s\n' "${headers[*]}")
The elements of the array are joined by the first character of IFS when ${headers[*]} is expanded. The subshell is used so you don't have to worry about restoring the previous value of IFS.
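If the join is needed in more than one place, one option is to wrap it in a small function. This is only a sketch: join_by_comma is a hypothetical helper name, and it relies on bash's local to confine the IFS change instead of a subshell.

```shell
#!/bin/bash
headers=(
  Column_Name1
  Column_Name2
  Column_Name30
)

# Hypothetical helper: joins its arguments with commas.
# "local IFS" limits the change to this function call.
join_by_comma() {
  local IFS=','
  printf '%s\n' "$*"
}

header_line=$(join_by_comma "${headers[@]}")
printf '%s\n' "$header_line"
```

Inside the function, "$*" plays the same role as "${headers[*]}": the arguments are joined by the first character of IFS.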
Convenience solution, using paste:
If you don't mind the (probably negligible) overhead of invoking an external utility (paste) to build your string, you can combine it with a (literal, in this case) here-doc:
paste -s -d, - <<'EOF'
Column_Name1
Column_Name2
Column_Name30
EOF
yields
Column_Name1,Column_Name2,Column_Name30
The above acts like a single-quoted string, due to the opening delimiter, 'EOF', being quoted.
Omit the enclosing '...' to treat the string like a double-quoted string, i.e., with expansions being performed (allowing the inclusion of variable references, command substitutions, and arithmetic expansions).
If you take care to use actual leading tabs (\t) in your here-doc (multiple spaces do not work), you can even introduce indentation, by prepending - to the opening delimiter:
# !! Only works with actual *tabs* as the leading whitespace.
paste -s -d, - <<-'EOF'
Column_Name1
Column_Name2
Column_Name30
EOF
More efficient solution, using line continuation:
POSIX-compatible shells support line continuation even inside double-quoted strings, "..." (but not inside single-quoted ones, '...').
That means that any \<newline> sequence inside a double-quoted string is removed:
echo "\
Column_Name1,\
Column_Name2,\
Column_Name30\
"
Given that a here-document with an unquoted opening delimiter is treated like a double-quoted string, you can do the following:
cat <<EOF
Column_Name1,\
Column_Name2,\
Column_Name30
EOF
Note:
Using <<-EOF with to-be-stripped leading tabs (\t) for readability is not an option here, because the line continuations will still include them.
To take advantage of line continuation, it is invariably the interpolating (expanding) here-doc variety that must be used; therefore, you may need to \-escape $ instances to ensure their literal use.
Both commands again yield the desired single-line string:
Column_Name1,Column_Name2,Column_Name30
echo "foo bar" | (IFS=" "; xargs -n 1 echo)
yields
foo
bar

Why does echo "$out" split output onto multiple lines, if quotes suppress word-splitting?

I have very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, the ls command prints its output on a single line for all files/dirs (if the line is long enough to contain the output), so I expected that using double quotes would prevent my shell from splitting words onto separate lines, while omitting quotes would split them.
Please tell me: why does using quotes (which are meant to prevent word-splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, the default is to print one filename per line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
Literal Question (Re: Quoting)
It's the very act of splitting your value into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both those arguments on the same line.
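A small sketch can make the difference visible by counting the arguments a command receives; count_args is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/bash
# The same two-line value that out=`ls` would produce in the example.
out=$'directory1\nfile2'

# Hypothetical helper: prints how many arguments it was given.
count_args() { echo "$#"; }

unquoted=$(count_args $out)     # word-splitting yields two arguments
quoted=$(count_args "$out")     # one argument, newline preserved

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS this reports two arguments for the unquoted expansion and one for the quoted expansion.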
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results.
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
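The globbing side effect can be reproduced with a short sketch, assuming an otherwise empty temporary directory and default shell options (no nullglob or failglob); the file names and the count_args helper are made up for illustration.

```shell
#!/bin/bash
# Work in a fresh temporary directory containing exactly two files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit
touch a.txt b.txt

# A value with a lone * surrounded by whitespace.
value='x * y'

# Hypothetical helper: prints how many arguments it was given.
count_args() { echo "$#"; }

unquoted=$(count_args $value)    # split into x, *, y; then * is expanded
quoted=$(count_args "$value")    # passed through untouched

echo "unquoted: $unquoted, quoted: $quoted"

cd / && rm -rf "$tmpdir"
```

The unquoted expansion ends up with four arguments (x, a.txt, b.txt, y), while the quoted one stays a single argument.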
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#names[@]} files" # print number of filenames
printf '- %q\n' "${names[@]}"
fi

In shell scripting, how do I ensure all characters in a variable are passed literally?

Say I have this command:
printf $text | perl program.pl
How do I guarantee that everything in the $text variable is passed literally? For example, if $text contains hello"\n, how do I make sure that's exactly what gets passed to program.pl, without the newline or quotation mark (or any conceivable character) being interpreted as a special character?
Quotes!
printf '%s' "$text" | ...
Don't ever expand variables unquoted if you care about preserving their contents precisely. Also, don't ever pass a dynamic string as a format string when you want it to be treated as literal data.
If you want backslash sequences to be interpreted -- for instance, the two-character sequence \n to be changed to a single newline -- and your shell is bash, use printf '%b' "$text" instead. If you want byte-for-byte accuracy, %s is the Right Thing (and works on any POSIX-compliant shell). If you want escaping for interpretation by another shell (which would be appropriate if, say, you were passing content as part of a ssh command line), then the appropriate format string (for bash only) is %q.
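The difference between %s and %b can be checked with the value from the question. This is a minimal sketch; the sentinel character is only there because command substitution strips trailing newlines.

```shell
#!/bin/bash
text='hello"\n'

# %s copies the value byte for byte: the backslash and the n stay as they are.
literal=$(printf '%s' "$text")

# %b interprets backslash sequences, so \n becomes a real newline.
# Append a sentinel "x" and remove it afterwards to keep that newline.
interpreted=$(printf '%b' "$text"; printf x)
interpreted=${interpreted%x}

printf 'literal:     %s (%d characters)\n' "$literal" "${#literal}"
printf 'interpreted: %d characters\n' "${#interpreted}"
```

The literal copy keeps all eight characters, while the interpreted one is seven characters long because the two-character sequence \n collapsed into a single newline.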

How to escape a previously unknown string in regular expression?

I need to egrep a string that isn't known before runtime and that I'll get via shell variable (shell is bash, if that matters). Problem is, that string will contain special characters like braces, spaces, dots, slashes, and so on.
If I know the string I can escape the special characters one at a time, but how can I do that for the whole string?
Running the string through a sed script to prefix each special character with \ could be an idea, I still need to rtfm how such a script should be written. I don't know if there are other, better, options.
I did read re_format(7) but it seems there is no such thing like "take the whole next string as literal"...
EDIT: to avoid false positives, I should also add newline detection to the pattern, eg. egrep '^myunknownstring'
If you need to embed the string into a larger expression, sed is how I would do it.
s_esc="$(echo "$s" | sed 's/[^-A-Za-z0-9_]/\\&/g')" # backslash special characters
inv_ent="$(egrep "^item [0-9]+ desc $s_esc loc .+$" inventory_list)"
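A more conservative variant, sketched here under the assumption that the string is embedded in an extended regular expression, escapes only the ERE metacharacters. Backslashing every non-alphanumeric character can backfire with some grep implementations (in GNU grep, for example, \< is a word-boundary operator rather than a literal <). The helper name escape_ere and the sample strings are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: backslash-escape only the characters that are
# special in an ERE outside bracket expressions: \ . ^ $ | ( ) * + ? { [
escape_ere() {
  printf '%s\n' "$1" | sed 's/[\.^$|()*+?{[]/\\&/g'
}

s='a.b (x+y) [1]'
s_esc=$(escape_ere "$s")
printf '%s\n' "$s_esc"

# Only the first input line contains the string literally; in the second
# line an unescaped "." would have matched the "X".
match=$(printf '%s\n' 'item a.b (x+y) [1] end' 'item aXb (x+y) [1] end' |
  grep -E "^item $s_esc end\$")
printf '%s\n' "$match"
```

printf '%s\n' is used instead of echo so that values starting with a dash or containing backslashes are passed to sed unchanged.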
Use the -F flag to make the PATTERN a fixed literal string
$ var="(.*+[a-z]){3}"
$ echo 'foo bar (.*+[a-z]){3} baz' | grep -F "$var" -o
(.*+[a-z]){3}
Are you trying to protect the string from being incorrectly interpreted as bash syntax or are you trying to protect parts of the string from being interpreted as regular expression syntax?
For bash protection:
grep supports the -f switch:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
No escaping is necessary inside the file. Just make it a file containing a single line (and thus one pattern) which can be produced from your shell variable if that's what you need to do.
# example trivial regex
var='^r[^{]*$'
pattern=/tmp/pattern.$$
rm -f "$pattern"
echo "$var" > "$pattern"
egrep -f "$pattern" /etc/passwd
rm -f "$pattern"
Just to illustrate the point.
Try it with -F instead as another poster suggested for regex protection.
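In bash, the temporary file can probably be avoided altogether by handing grep a process substitution as the pattern file. The following is a sketch under that assumption, combined with -F so the pattern is taken as a fixed string; the sample input lines are made up.

```shell
#!/bin/bash
# A string full of regex metacharacters that should match literally.
var='(.*+[a-z]){3}'

# <(...) gives grep a readable file name whose content is the pattern,
# so nothing has to be created or removed under /tmp.
result=$(printf '%s\n' 'foo (.*+[a-z]){3} bar' 'no match here' |
  grep -F -f <(printf '%s\n' "$var"))

printf '%s\n' "$result"
```

Process substitution is a bash feature (also available in ksh and zsh), so this form is not suitable for a plain POSIX sh script.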
