bash or perl inline capture of multiple pattern matches

A simple case:
content='"id":"1234"abcdefg"id":"3456"'
I need to get the ids out:
perl -le '@m = ( $content =~ /\"id\"\:\"(.*)\"/g); print for @m'
This is not working, but I need a one-liner to run on Confluence JSON output for processing by Ansible.

In Perl:
perl -sE 'say for $str =~ /"id":"(.*?)"/g' -- -str="$content"

Use grep:
grep -Po 'id":"\K.*?(?=")' <<< "$content"
-P enables Perl-compatible regexes.
-o prints only the matching parts instead of whole lines.
\K makes grep discard what it has matched so far (here id":") from the reported match.
.*? matches as few characters as possible ...
(?=") ... until a " is the next character. The " will not be included in the match.
Output for your example:
1234
3456
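Where grep was built without -P support, a rough equivalent is to match the whole "id":"..." pair with -o and then cut out the value. This is only a sketch, and it assumes the values never contain an escaped double quote:

```shell
content='"id":"1234"abcdefg"id":"3456"'

# -o prints each match on its own line; -E is enough here, no PCRE needed.
# Splitting a match such as "id":"1234" on double quotes puts the value in field 4.
ids=$(grep -oE '"id":"[^"]*"' <<< "$content" | cut -d'"' -f4)

printf '%s\n' "$ids"
```

The result is one id per line, which is usually easy to consume from a calling tool.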

Related

bash - extract part of variable that starts with digit

I have the following variable 'VAR1' in a bash script:
VAR1="/path/to/file/190909_AAA_ZZZ/"
I now want to create a variable (VAR2) that only contains the part "190909".
I want to do this by extracting the part that starts with any 6 digits (190909) until the next "_"
How can this be achieved?
VAR2 = ${grep ... $VAR1} ???

Please try the following:
VAR1="/path/to/file/190909_AAA_ZZZ/"
[[ $VAR1 =~ ([0-9]{6})_ ]] && VAR2=${BASH_REMATCH[1]}
echo "$VAR2"
Output:
190909
Note that uppercase names are not recommended for ordinary variables, because they can collide with environment variables and variables the shell itself uses.

You may use this sed command:
var1="/path/to/file/190909_AAA_ZZZ/"
var2=$(sed -E 's~.*/([0-9]{6})_.*~\1~' <<< "$var1")
echo "$var2"
190909
RegEx Details:
.*/: Match anything (greedy) till we match /
([0-9]{6}): Match 6 digits and capture it in group #1
_.*: Match _ and everything until end
Replacement is \1 which is to put captured group #1 value back.
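One caveat with the substitution approach: when the pattern does not match, sed prints the input line unchanged, so var2 would end up holding the whole path. A possible guard is to suppress default output with -n and print only after a successful substitution with the p flag. In this sketch, extract_date and the second sample path are made up for illustration:

```shell
extract_date() {
  # -n suppresses automatic printing; the trailing p prints only if the substitution happened.
  sed -nE 's~.*/([0-9]{6})_.*~\1~p' <<< "$1"
}

good=$(extract_date "/path/to/file/190909_AAA_ZZZ/")
bad=$(extract_date "/path/to/file/no_digits_here/")

echo "good='$good' bad='$bad'"
```

An empty result can then be tested with [[ -z $var2 ]] before the value is used.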

PCRE grep and Perl equivalents:
$ VAR1="/path/to/file/190909_AAA_ZZZ/"
$ grep -oP '[0-9]{6}(?=_)' <<< "$VAR1"
190909
$ perl -nE 'say for /([0-9]{6}(?=_))/' <<< "$VAR1"
190909
Explanation on regex: https://regex101.com/r/D2NVOl/1

Substring from a string in bash using scripting language

How can we fetch a substring from a string in bash using scripting language?
Example:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
The substring I want is everything before ".URL" in the full string.

With Parameter Expansion, you can do:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
echo "${fullstring%.URL*}"
prints:
mnuLOCNMOD
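The % operator removes the shortest matching suffix. Its siblings %%, # and ## remove the longest suffix, the shortest prefix and the longest prefix respectively. A small sketch, using a made-up string that contains .URL twice, shows the difference:

```shell
s="menu.URL = http://example.com/page.URL?x=1"

shortest_suffix=${s%.URL*}    # cuts from the last ".URL" onward
longest_suffix=${s%%.URL*}    # cuts from the first ".URL" onward
shortest_prefix=${s#*.URL}    # drops everything through the first ".URL"
longest_prefix=${s##*.URL}    # drops everything through the last ".URL"

printf '%s\n' "$shortest_suffix" "$longest_suffix" "$shortest_prefix" "$longest_prefix"
```

For the string in the question, which contains .URL only once, % and %% give the same result.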

$ fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
$ sed -r 's/^(.*)\.URL.*$/\1/g' <<< "$fullstring"
mnuLOCNMOD
$

You can use grep:
echo "mnuLOCNMOD.URL = javas" | grep -oP '\w+(?=\.URL)'
and assign the result to a string. I used a positive lookahead (?=regex) because it's a zero length assertion, meaning that it'll be matched but won't be displayed.
Run grep --help to find out what o and P flags stand for.

Parameter Expansion is the way to go.
If you are interested in a simple grep:
% fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
% grep -o '^[^.]*' <<<"$fullstring"
mnuLOCNMOD

fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
menuID=$(echo "$fullstring" | cut -f 1 -d '.')
Here the dot is used as the field separator and the first field is kept.
This works in .sh files.

To offer yet another alternative: Bash's regular-expression matching operator, =~:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
echo "$([[ $fullstring =~ ^(.*)'.URL' ]] && echo "${BASH_REMATCH[1]}")"
Note how the (one and only) capture group ((.*)) is reported through element 1 of the special "${BASH_REMATCH[@]}" array variable.
While in this case l3x's parameter expansion solution is simpler, =~ generally offers more flexibility.
awk offers an easy solution as well:
echo "$(awk -F'\\.URL' '{ print $1 }' <<<"$fullstring")"

bash script command output execution doesn't assign full output when using backticks

I have often used backticks (``) to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and I get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
If I use $() to capture the output instead of ``, it works fine. Why do backticks not work?

Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o prints each match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")

First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
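The reason the sed expression behaves differently inside backticks is a quoting rule: within backticks, a backslash followed by $, a backtick or another backslash is consumed by the shell before the command runs, even inside single quotes. So sed receives s/$/\n/g (which acts at end of line) rather than s/\$/\n/g (which replaces a literal dollar sign). The $(...) form has no such rule. A minimal demonstration, with variable names chosen only for illustration:

```shell
# Same command, two substitution styles.
with_backticks=`printf '%s' 'a\$b'`
with_dollar_paren=$(printf '%s' 'a\$b')

printf 'backticks: %s\n' "$with_backticks"      # the backslash was removed before printf ran
printf '$(...):    %s\n' "$with_dollar_paren"   # the backslash survives
```

Doubling the backslash inside backticks would also work, but $(...) avoids the issue and nests more cleanly.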
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"

bash script to extract ALL matches of a regex pattern

I found this but it assumes the words are space separated.
result="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $result
do
if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
then
match="$match $word"
fi
done
POST EDITED
Re-naming for clarity:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data
do
if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
then
match="$match $word"
fi
done
echo $match
Original left so comments asking about result continue to make sense.

Use grep -o
-o, --only-matching show only the part of a line matching PATTERN
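Applied to the sample data from the question, that could look like the sketch below. It assumes the text between ADDNAME and HELLO consists of lowercase letters only, which sidesteps greedy matching without needing -P:

```shell
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# -o emits each match on its own line; mapfile (bash 4+) reads those lines into an array.
mapfile -t matches < <(grep -oE 'ADDNAME[0-9]{2}[[:lower:]]*HELLO' <<< "$data")

printf 'found %d matches\n' "${#matches[@]}"
printf '%s\n' "${matches[@]}"
```

If the text in between is unconstrained, a non-greedy PCRE pattern is the simpler route.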

Edit: answer to edited question:
for string in $(echo "$result" | grep -Po "ADDNAME[0-9]{2}.*?HELLO"); do
match="${match:+$match }$string"
done
Original answer:
If you're using Bash version 3.2 or higher, you can use its regex matching.
string="string to search 99 with 88 some 42 numbers"
pattern="[0-9]{2}"
for word in $string; do
[[ $word =~ $pattern ]]
if [[ ${BASH_REMATCH[0]} ]]; then
match="${match:+$match }${BASH_REMATCH[0]}"
fi
done
The result will be "99 88 42".

Not very elegant - and there are problems because of greedy matching - but this more or less works:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data \
"ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
"ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO"
do
echo $word
done |
sed -e '/ADDNAME[0-9][0-9][a-z]*HELLO/{
s/\(ADDNAME[0-9][0-9][a-z]*HELLO\)/ \1 /g
}' |
while read line
do
set -- $line
for arg in "$@"
do echo $arg
done
done |
grep "ADDNAME[0-9][0-9][a-z]*HELLO"
The first loop echoes three lines of data - you'd probably replace that with cat or I/O redirection. The sed script uses a modified regex to put spaces around the patterns. The last loop breaks up the 'space separated words' into one 'word' per line. The final grep selects the lines you want.
The regex is modified with [a-z]* in place of the original .* because the pattern matching is greedy. If the data between ADDNAME and HELLO is unconstrained, then you need to think about using non-greedy regexes, which are available in Perl and probably Python and other modern scripting languages:
#!/bin/perl -w
while (<>)
{
while (/(ADDNAME\d\d.*?HELLO)/g)
{
print "$1\n";
}
}
This is a good demonstration of using the right tool for the job.

Is it possible to do a grep with keywords stored in the array?

Is it possible to do a grep with keywords stored in the array.
Here is the possible code snippet; how can I correct it?
args=("key1" "key2" "key3")
cat file_name |while read line
echo $line | grep -q -w ${args[c]}
done
At the moment, I can search for only one keyword. I would like to search for all the keywords stored in the args array.

args=("key1" "key2" "key3")
pat=$(echo ${args[@]}|tr " " "|")
grep -Eow "$pat" file
Or with the shell
args=("key1" "key2" "key3")
while read -r line
do
for i in "${args[@]}"
do
case "$line" in
*"$i"*) echo "found: $line";;
esac
done
done <"file"

You can use some bash expansion magic to prefix each element with -e and pass each element of the array as a separate pattern. This may avoid some precedence issues where your patterns may interact badly with the | operator:
$ grep ${args[@]/#/-e } file_name
The downside to this is that you cannot have any spaces in your patterns because that will split the arguments to grep. You cannot put quotes around the above expansion, otherwise you get "-e pattern" as a single argument to grep.
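One way around the whitespace limitation is to build the option list as a second array, so that each pattern stays a single argument. This is a sketch; grep_opts, the patterns and the sample file contents are made up for illustration:

```shell
args=("key one" "key2" "key3")

# Build: -e "key one" -e "key2" -e "key3", one array element per word grep should see.
grep_opts=()
for pattern in "${args[@]}"; do
  grep_opts+=(-e "$pattern")
done

# Sample input standing in for file_name.
sample=$(mktemp)
printf '%s\n' "has key one here" "nothing relevant" "has key3" > "$sample"

found=$(grep "${grep_opts[@]}" "$sample")
printf '%s\n' "$found"
rm -f "$sample"
```

Because the array expansion is quoted, a pattern containing spaces reaches grep intact.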

This is one way:
args=("key1" "key2" "key3")
keys=${args[@]/%/\\|} # result: key1\| key2\| key3\|
keys=${keys// } # result: key1\|key2\|key3\|
keys=${keys%\\|} # result: key1\|key2\|key3 (a trailing \| adds an empty alternative that matches every line)
grep "${keys}" file_name
Edit:
Based on Pavel Shved's suggestion:
( IFS="|"; keys="${args[*]}"; keys="${keys//|/\\|}"; grep "${keys}" file_name )
The first version as a one-liner:
keys=${args[@]/%/\\|}; keys=${keys// }; keys=${keys%\\|}; grep "${keys}" file_name
Edit2:
Even better than the version using IFS:
printf -v keys "%s\\|" "${args[@]}"; keys=${keys%\\|}; grep "${keys}" file_name

I tend to use process substitution for everything. It's convenient when combined with grep's -f option:
Obtain patterns from FILE, one per line.
(Depending on the context, you might even want to combine that with -F, -x or -w, etc., for awesome effects.)
So:
#! /usr/bin/env bash
t=(8 12 24)
seq 30 | grep -f <(printf '%s\n' "${t[@]}")
and I get:
8
12
18
24
28
I basically write a pseudo-file with one item of the array per line, and then tell grep to use each of these lines as a pattern.
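Note that 18 and 28 appear because each pattern is matched as a substring. If whole-line matches are wanted, adding -F (fixed strings) and -x (match the entire line) to the same command narrows the result, as in this sketch:

```shell
t=(8 12 24)

# -F treats the patterns as literal strings, -x requires the whole line to match.
exact=$(seq 30 | grep -Fx -f <(printf '%s\n' "${t[@]}"))
printf '%s\n' "$exact"
```

With these flags only 8, 12 and 24 are printed.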

The command
( IFS="|" ; grep --perl-regexp "${args[*]}" ) <file_name
searches the file for each keyword in an array. It does so by constructing regular expression word1|word2|word3 that matches any word from the alternatives given (in perl mode).
If there is a way to join array elements into a string delimited by a sequence of characters (namely, \|), it could be done without Perl regexps.
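Such a join can be sketched as a small helper function. join_by is a hypothetical name, not a bash builtin. The example joins with | for use with grep -E; passing '\|' as the separator instead would produce the form GNU grep accepts in basic regular expressions:

```shell
# join_by SEP ITEM...  prints the items joined by SEP.
join_by() {
  local sep=$1
  shift
  local out=${1-}
  [ "$#" -gt 0 ] && shift
  local item
  for item in "$@"; do
    out+=$sep$item
  done
  printf '%s\n' "$out"
}

args=("key1" "key2" "key3")
pattern=$(join_by '|' "${args[@]}")
echo "$pattern"

# Use the joined pattern as an extended regex on some made-up input lines.
hits=$(printf '%s\n' "line with key2" "unrelated line" | grep -E "$pattern")
echo "$hits"
```

Unlike the IFS approach, the separator here can be longer than one character.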

Perhaps something like this:
cat file_name | while read -r line
do
for arg in "${args[@]}"
do
echo "$line" | grep -q -w "$arg"
done
done
Not tested!
