Source grep expression from array - bash

I am passing input to grep from a previously declared variable that contains multiple lines. My goal is to extract only certain lines.
As the number of patterns I pass to grep grows, readability goes down.
var1="
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212"
echo "$var1"
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
grep -e 'file1\|^_id=\|_type\|date_found\|whateverelse' <<< $var1
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
My idea was to pass the patterns from an array, which would increase readability:
declare -a grep_array=(
"^_id=\|"
"_type\|"
"date_found\|"
"whateverelse"
)
echo ${grep_array[@]}
^_id=\| _type\| date_found\| whateverelse
grep -e "${grep_array[@]}" <<<$var1
---- no results
How can I pass multiple OR conditions to grep from somewhere other than a single line?
As the number of arguments grows, readability and manageability suffer.

Your idea is right, but you've got a couple of issues in the logic. The ${array[@]} type of expansion puts the contents of the array on the command line as separate words, split at whitespace. While you wanted to pass a single regexp string to grep, the shell expanded the array into its constituents and evaluates the command as
grep -e '^_id=\|' '_type\|' 'date_found\|' whateverelse
which means only the first of your regexp strings is used as the pattern, and the rest are treated as file names instead of regexp strings.
So to let grep treat your whole array content as a single string, use the ${array[*]} expansion. Since this particular type of expansion joins the array content with the first character of IFS, you get a space (the default IFS value) between the words if IFS is not reset. The syntax below resets IFS in a sub-shell and prints out the expanded array content:
grep -e "$(IFS=; printf '%s' "${grep_array[*]}")" <<<"$var1"
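A variation on the same idea (my own sketch, not from the answer above): keep the alternation operator out of the array entirely and join the plain patterns at call time, which keeps each array entry readable:

```shell
#!/usr/bin/env bash
# Sketch: store plain BRE fragments in the array (no trailing \|) and
# join them with \| only when building the final expression.
patterns=(
  '^_id='
  '_type'
  'date_found'
  'whateverelse'
)

# Print the first pattern, then prefix every remaining one with \| .
regex=$(printf '%s' "${patterns[0]}"; printf '\\|%s' "${patterns[@]:1}")

var1='_id=1234
_type=document
date_found=988657890
whateverelse=1211121212'

grep -e "$regex" <<<"$var1"
```

With GNU grep, \| is the BRE alternation operator, so all four lines match.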

Related

What does read -r -a BUILD_ARGS_ARRAY <<< "$@" mean in bash?

I came across this piece of code in build.include:
set -u
prepare_build_args() {
IFS=',' read -r -a BUILD_ARGS_ARRAY <<< "$@"
for i in ${BUILD_ARGS_ARRAY[@]}; do
BUILD_ARGS+="--build-arg $i "
done
}
I have difficulty understanding this code because I am new to shell.
Is IFS a variable assigned the value ','? Why is it followed by a read command?
What do -r and -a mean? And what does <<< do?
BUILD_ARGS_ARRAY is not defined beforehand, and there is set -u, which means an unassigned variable will be treated as an error. Is it a problem of scope? And what does [@] mean?
Finally, in my understanding BUILD_ARGS stores everything from BUILD_ARGS_ARRAY, but it is not returned out of the prepare_build_args function?
Looking through the Bash manual might be helpful.
IFS is the Internal Field Separator, setting it before the read command applies it only for that command.
The read builtin command option -r stops backslashes mangling the data, and -a reads into an array (BUILD_ARGS_ARRAY in this case).
<<< is a here string which feeds the function's arguments (the expansion of "$@") to the read command.
BUILD_ARGS_ARRAY is set by the read command. The [@] Bash syntax expands the array.
Variable scope is global unless the local builtin is used.
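The per-command IFS assignment can be seen in a short sketch (variable names are mine):

```shell
#!/usr/bin/env bash
# IFS=',' applies only to this one read command: the here string is split
# on commas, and IFS itself is left untouched afterwards.
IFS=',' read -r -a fields <<< "alpha,beta,gamma"

printf '%s\n' "${fields[@]}"
```

Afterwards IFS still holds its default value (space, tab, newline).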
In short, this code:
Concatenates all your function's arguments together into a single string (normally this is what "$*" does, but "$@" does it as well when used in a context where its result is evaluated as a single string).
Splits that string on commas, storing the result in an array named BUILD_ARGS_ARRAY
Takes the array, concatenates its elements into a single string (again!), splits that string on whitespace, expands each component generated by that split as a glob, and iterates over the glob results.
For each glob result, appends the string --build-arg <result> to BUILD_ARGS.
This is extremely buggy, and should never be used by anyone. To go into why in more detail:
"$@" is intended for use where its result can be treated as a list. Expanding it in a string context throws away the original division between arguments, replacing them with the first character in IFS (in the context in which the expansion is done, not the expansion of the read in which the result is consumed).
The unquoted ${foo[@]} expansions make the behavior of this code sensitive to whether your arguments contain globbing characters, and if so, which files exist in the directory it's run in and whether the nullglob, failglob, or similar options are set. See the shellcheck warning SC2068.
The net effect of this operation is to build a string which is presumably going to be expanded in generating a command line. Strings cannot be safely used in this way in the general case; see BashFAQ #50 describing the pitfalls, caveats, and alternative approaches.
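For contrast, one safer shape (my own sketch, not the original build.include code) keeps the arguments in arrays from start to finish, so no unquoted expansion, re-splitting, or globbing ever happens:

```shell
#!/usr/bin/env bash
set -u

# Safer sketch: BUILD_ARGS is an array, and each incoming argument is
# split on commas with an IFS scoped to the read alone.
prepare_build_args() {
  local arg part
  local -a parts
  BUILD_ARGS=()
  for arg in "$@"; do
    IFS=',' read -r -a parts <<< "$arg"
    for part in "${parts[@]}"; do
      BUILD_ARGS+=(--build-arg "$part")
    done
  done
}

prepare_build_args "A=1,B=2" "C=3"
printf '%s\n' "${BUILD_ARGS[@]}"
```

Because BUILD_ARGS is an array, it can later be expanded as "${BUILD_ARGS[@]}" on a docker command line with every argument boundary intact.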

How to pass string literal containing newlines to grep from bash script

I am trying to pass the "strings" from a file as input to grep using the -F (fixed string) parameter.
From the grep man page, the expected format is newline-separated:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
How can this be done in bash? I have:
#!/bin/bash
INFILE=$1
DIR=$2
# Create a newline-separated string array
STRINGS="";
while read -r string; do
STRINGS+=$'\n'$string;
done < <(strings $INFILE);
cd $DIR
for file in *; do
grep -Frn \"$STRINGS\" .
done;
But grep reports a run-time error about the input format: it is interpreting the passed strings as separate arguments -- hence the need to pass them as one large string literal.
Debugging bash with -x and passing the first parameter (INFILE) as the script itself gives:
+ grep -Frn '"' '#!/bin/bash' 'INFILE=$1' 'DIR=$2' [...]
Try the following:
#!/bin/bash
inFile=$1
dir=$2
# Read all lines output by `string` into a single variable using
# a command substitution, $(...).
# Note that the trailing newlines is trimmed, but grep still recognizes
# the last line.
strings="$(strings "$inFile")"
cd "$dir"
for file in *; do
grep -Frn "$strings" .
done
strings outputs each string found in the target file on its own line, so you can use its output as-is, via a command substitution ($(...)).
On a side note: strings is used to extract strings from binary files, and strings are only included if they're at least 4 ASCII(!) characters long and are followed by a newline or NUL.
Note that while the POSIX spec for strings does mandate locale-awareness with respect to character interpretation, both GNU strings and BSD/macOS strings recognize 7-bit ASCII characters only.
If, by contrast, your search strings come from a text file from which you want to strip empty and blank lines, use strings="$(awk 'NF>0' "$inFile")"
Double-quote your variable references and command substitutions to ensure that their values are used as-is.
Do not use \" unless you want to pass a literal " char. to the target command - as opposed to an unquoted one that has syntactical meaning to the shell.
In your particular case, \"$STRINGS\" breaks down as follows:
An unquoted reference to variable $STRINGS - because the enclosing " are \-escaped and therefore literals.
The resulting string - "<value-of-$STRINGS>" - due to $STRINGS being unquoted, is then subject to word-splitting
(and globbing), i.e., split into multiple arguments by whitespace. As a result, because grep expects the search term(s) as a single argument, the command breaks.
Do not use all-uppercase shell variable names in order to avoid conflicts with environment variables and special shell variables.
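The effect of each quoting style is easy to see with a toy argument counter (the helper name is mine, purely illustrative):

```shell
#!/usr/bin/env bash
# Illustrative helper: report how many arguments it received.
count_args() { echo "$#"; }

strings=$'first line\nsecond line'

quoted=$(count_args "$strings")     # one argument, newlines preserved
unquoted=$(count_args $strings)     # word-split into four arguments
escaped=$(count_args \"$strings\")  # literal " chars glued onto split words

echo "$quoted $unquoted $escaped"   # -> 1 4 4
```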

How to grep from a single line

I'm using a weather API that outputs all data in a single line. How do I use grep to get the values for "summary" and "apparentTemperature"? My command of regular expressions is basically nonexistent, but I'm ready to learn.
{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}
Thank you!
How do I use grep to get the values for "summary" and "apparentTemperature"?
You use grep's -o flag, which makes it output only the matched part.
Since you don't know much about regex, I suggest you instead learn to use a JSON parser, which would be more appropriate for this task.
For example with jq, the following command would extract the current summary :
<whatever is your JSON source> | jq '.currently.summary'
Assume your single-line data is contained in a variable called DATA_LINE.
If you are certain the field is only present once in the whole line, you could do something like this in Bash:
if
[[ "$DATA_LINE" =~ \"summary\":\"([^\"]*)\" ]]
then
summary="${BASH_REMATCH[1]}"
echo "Summary field is : $summary"
else
echo "Summary field not found"
fi
You would have to do that once for each field, unless you build a more complex matching expression that assumes fields are in a specific order.
As a note, the matching expression \"summary\":\"([^\"]*)\" finds the first occurrence in the data of a substring consisting of :
"summary":" (double quotes included), followed by
([^\"]*) a sub-expression formed of a sequence of zero or more characters other than a double quote : this is in parentheses to make it available later as an element in the BASH_REMATCH array, because this is the value you want to extract
and finally a closing double quote; this is not strictly necessary, but protects against reading from a truncated data line.
For apparentTemperature the code will be a bit different because the field does not have the same format.
if
[[ "$DATA_LINE" =~ \"apparentTemperature\":([^,]*), ]]
then
apparentTemperature="${BASH_REMATCH[1]}"
echo "Apparent temperature field is : $apparentTemperature"
else
echo "Apparent temperature field not found"
fi
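If you need several fields, the two snippets can be folded into one helper. The function name and interface below are mine, not part of the answer; it tries the quoted-string form first, then falls back to a bare value:

```shell
#!/usr/bin/env bash
json='{"currently":{"summary":"Clear","apparentTemperature":-3.34}}'

# Illustrative helper: extract a field by key, handling both
# "key":"string" and "key":bareword formats.
json_field() {
  local data=$1 key=$2
  if [[ $data =~ \"$key\":\"([^\"]*)\" ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  elif [[ $data =~ \"$key\":([^,}]*) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

json_field "$json" summary              # -> Clear
json_field "$json" apparentTemperature  # -> -3.34
```

As with the snippets above, this is a line-oriented shortcut, not a JSON parser; nested or escaped data will confuse it.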
This is fairly easily understood if your skills are limited - like mine! Assuming your string is in a variable called $LINE:
summary=$(sed -e 's/.*summary":"//' -e 's/".*//' <<< "$LINE")
Then check:
echo $summary
Clear
That executes two sed commands (each introduced with -e). The first substitutes everything up to and including summary":" with nothing, and the second substitutes the first remaining double quote and everything after it with nothing.
Extract apparent temperature:
appTemp=$(sed -e 's/.*apparentTemperature"://' -e 's/,.*//' <<< "$LINE")
Then check:
echo $appTemp
-3.34
As Aaron mentioned, a JSON parser like jq is the right tool for this, but since the question was about grep, let's see one way to do it.
Assuming your API return value is in $json:
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'
The patterns you see in the parentheses are lookbehind and lookahead assertions for context matching. They can be used with the -P Perl-regex option and will not be captured in the output.
summary=$(<<< "$json" grep -oP '(?<="summary":").*?(?=",)')
apparentTemperature=$(<<< "$json" grep -oP '(?<="apparentTemperature":).*?(?=,)')

Correctly allow word splitting of command substitution in bash

I write, maintain and use a healthy amount of bash scripts. I would consider myself a bash hacker and strive to someday be a bash ninja (need to learn more awk first). One of the most important features/frustrations of bash to understand is how quotes, and the subsequent parameter expansion, work. This is well documented, and for a good reason: many pitfalls, bugs and newbie-traps exist in the mysterious world of quoted parameter expansion and word splitting. For this reason, the advice is to "double quote everything," but what if I want word splitting to occur?
In multiple style guides I cannot find an example of safe and proper use of word splitting after command substitution.
What is the correct way to use unquoted command substitution?
Example:
I don't need help getting this command working, but it seems to be a violation of established patterns, if you would like to give feedback on this command, please keep it in comments
docker stats $(docker ps | awk '{print $NF}' | grep -v NAMES)
The command substitute returns output such as:
container-1 container-3 excitable-newton
This one-liner uses the command substitution to spit out the names of each of my running docker containers and then feeds them, with word splitting, as separate inputs to the docker stats command, which takes an arbitrary-length list of container names and gives back some info about them.
If I used:
docker stats "$(docker ps | awk '{print $NF}' | grep -v NAMES)"
There would be one string of newline separated container names passed to docker stats.
This seems like a perfect example of when I would want word splitting, but shellcheck disagrees, is this somehow unsafe? Is there an established pattern for using word-splitting after expansion or substitution?
The safe way to capture output from one command and pass it to another is to temporarily capture the output in an array. This allows splitting on arbitrary delimiters and prevents unintentional splitting or globbing while capturing output as more than one string to be passed on to another command.
If you want to read a space-separated string into an array, use read -a:
read -r -a names < <(docker ps | awk '{print $NF}' | grep -v NAMES)
printf 'Found name: %s\n' "${names[@]}"
Unlike the unquoted-expansion approach, this doesn't expand globs. Thus, foo[bar] can't be replaced with a filesystem entry named foob, or with an empty string if no such filesystem entry exists and the nullglob shell option is set. (Likewise, * will no longer be replaced with a list of files in the current directory).
To go into detail regarding behavior: read -r -a reads up to a delimiter (a newline by default; the first character of the argument to -d if that option is given, or a NUL if that argument is empty), and splits the result into fields based on characters within IFS -- a set which, by default, contains the newline, the tab, and the space; it then assigns those split results to an array.
This behavior does not meaningfully vary based on shell-local configuration, except for IFS, which can be modified scoped to the single command.
mapfile -t and readarray -t are similarly consistent in behavior, and likewise recommended if portability constraints do not prevent their use.
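A minimal mapfile sketch (sample data is mine):

```shell
#!/usr/bin/env bash
# mapfile -t reads one array element per input line and strips the
# trailing newline from each; readarray is a synonym for mapfile.
mapfile -t names < <(printf '%s\n' container-1 container-3 excitable-newton)

echo "${#names[@]}"   # -> 3
echo "${names[2]}"    # -> excitable-newton
```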
By contrast, array=( $string ) is much more dependent on the shell's configuration and settings, and will behave badly if the shell's configuration is left at defaults:
When using array=( $string ), if set -f is not set, each word created by splitting $string is evaluated as a glob, with further variances based in behavior depending on the shopt settings nullglob (which would cause a pattern which didn't expand to any contents to result in an empty set, rather than the default of expanding to the glob expression itself), failglob (which would cause a pattern which didn't expand to any contents to result in a failure), extglob, dotglob and others.
When using array=( $string ), the value of IFS used for the split operation cannot be easily and reliably altered in a manner scoped to this single operation. By contrast, one can run IFS=: read to force read to split only on :s without modifying the value of IFS outside the scope of that single value; no equivalent for array=( $string ) exists without storing and re-setting IFS (which is an error-prone operation; some common idioms [such as assignment to oIFS or a similar variable name] operate contrary to intent in common scenarios, such as failing to reproduce an unset or empty IFS at the end of the block to which the temporary modification is intended to apply).
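A short sketch of the command-scoped form (sample data is mine). The glob character survives untouched and IFS keeps its default value after the read:

```shell
#!/usr/bin/env bash
# IFS=: is scoped to this read alone: the string is split on colons,
# no glob expansion happens, and IFS is unchanged afterwards.
path_like='/usr/bin:*:/usr/local/bin'
IFS=: read -r -a parts <<< "$path_like"

printf '%s\n' "${parts[@]}"
echo "${#parts[@]}"   # -> 3
```

Getting the same colon split out of the array=( $string ) idiom would require globally modifying IFS, after which the lone '*' word would also be subject to glob expansion.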
Thanks to @I'L'I's pointing to an example of a valid exception to the "quote everything" rule, my code does appear to be an exception to the rule.
In my particular use case, using docker container names, the risk of accidental globbing or expansion is low due to the constraints on container names. However, @Charles Duffy provided a surefire and safe way to word-split one command's output before feeding it into the next command: read the first output into an array using the bash builtin read (I found readarray better suited my case).
readarray -t names < <(docker ps | awk '{print $NF}' | grep -v NAMES)
docker stats "${names[@]}"
This pattern allows for the output from the first command to be fed to the second command as properly split, separate arguments while avoiding unwanted globbing or splitting. Unfortunately my slick one-liner will perish in favor of safety.

How do I take shell input literally? (i.e. keeping quotes etc. intact)

I am trying to write a bash script that I will use to replace my egrep command. I want to be able to take the exact same input that is given to my script and feed it to egrep.
i.e.
#!/bin/bash
PARAMS=$@
`egrep "$PARAMS"`
But I have noticed that if I echo what I am executing, that the quotes have been removed as follows:
./customEgrep -nr "grep my ish" *
returns
egrep -nr grep my ish (file list from the expanded *)
Is there a way that I can take the input literally so I can use it directly with egrep?
You want this:
egrep "$@"
The quotes you type are not passed to the script; they're used to determine word boundaries. Using "$#" preserves those word boundaries, so egrep will get the same arguments as it would if you ran it directly. But you still won't see quotation marks if you echo the arguments.
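A sketch of that behavior (the wrapper and helper names are mine):

```shell
#!/usr/bin/env bash
# show prints its argument count and each argument in brackets;
# run_wrapper forwards its arguments with "$@", boundaries intact.
show() { echo "got $# args"; printf '[%s]\n' "$@"; }

run_wrapper() {
  show "$@"
}

run_wrapper -nr "grep my ish" somefile
```

The quoted multi-word pattern arrives at show as a single argument, just as it would if egrep were called directly.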
" is a special character; you need to use an escape character in order to retrieve a literal ".
use
./customEgrep -nr "\"grep my ish\"" *
If you don't need to do any parameter expansion in the argument, you can use
single quotes to avoid the need to escape the double quotes:
./customerEgrep -nr '"grep my ish"' *
$@ is special when quoted. Try:
value=$( egrep "$@" )
It's not clear to me why you are using backticks and ignoring the result, so I've used the $() syntax and assigned the value.
If for some reason you want to save the parameters to use later, you can also do things like:
for i; do args="$args '$i'"; done # Save the arguments
eval grep $args # Pass the arguments to grep without resetting $1,$2,...
eval set $args # Restore the arguments
grep "$@" # Use the restored arguments
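If the goal is to save the arguments for later use, bash arrays avoid the eval gymnastics entirely; this is a sketch with names of my choosing:

```shell
#!/usr/bin/env bash
# Store the arguments once in an array; expanding it later with
# "${saved_args[@]}" reproduces the original word boundaries exactly.
saved_args=()

save_args() { saved_args=("$@"); }

save_args -nr "grep my ish" file1 file2
echo "${#saved_args[@]}"            # -> 4
printf '[%s]\n' "${saved_args[@]}"
```

A later call such as grep "${saved_args[@]}" would then receive the arguments unchanged, with no eval required.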
