bash script to extract ALL matches of a regex pattern - bash

I found this but it assumes the words are space separated.
result="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $result
do
if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
then
match="$match $word"
fi
done
POST EDITED
Re-naming for clarity:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data
do
if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
then
match="$match $word"
fi
done
echo $match
Original left so comments asking about result continue to make sense.

Use grep -o
-o, --only-matching show only the part of a line matching PATTERN
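Applied to the sample string from the question, this could look like the sketch below. It assumes a grep that supports -o and -E (GNU and BSD grep both do). It also swaps the question's .* for [[:lower:]]*, on the assumption that only lowercase letters sit between ADDNAME and HELLO, because plain grep has no non-greedy quantifier.

```shell
# Sample string from the question: no spaces between the records.
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# grep -o emits one match per line; paste joins those lines with spaces.
match=$(printf '%s\n' "$data" |
    grep -oE 'ADDNAME[0-9]{2}[[:lower:]]*HELLO' |
    paste -s -d ' ' -)

echo "$match"
```

If the text between the markers can contain anything, the bracket expression is too strict and a non-greedy engine (grep -P or Perl) is the safer choice.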

Edit: answer to edited question:
for string in $(echo "$result" | grep -Po 'ADDNAME[0-9]{2}.*?HELLO'); do
match="${match:+$match }$string"
done
Original answer:
If you're using Bash version 3.2 or higher, you can use its regex matching.
string="string to search 99 with 88 some 42 numbers"
pattern="[0-9]{2}"
for word in $string; do
[[ $word =~ $pattern ]]
if [[ ${BASH_REMATCH[0]} ]]; then
match="${match:+$match }${BASH_REMATCH[0]}"
fi
done
The result will be "99 88 42".

Not very elegant - and there are problems because of greedy matching - but this more or less works:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data \
"ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
"ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO"
do
echo $word
done |
sed -e '/ADDNAME[0-9][0-9][a-z]*HELLO/{
s/\(ADDNAME[0-9][0-9][a-z]*HELLO\)/ \1 /g
}' |
while read line
do
set -- $line
for arg in "$@"
do echo $arg
done
done |
grep "ADDNAME[0-9][0-9][a-z]*HELLO"
The first loop echoes three lines of data - you'd probably replace that with cat or I/O redirection. The sed script uses a modified regex to put spaces around the patterns. The last loop breaks up the 'space separated words' into one 'word' per line. The final grep selects the lines you want.
The regex is modified with [a-z]* in place of the original .* because the pattern matching is greedy. If the data between ADDNAME and HELLO is unconstrained, then you need to think about using non-greedy regexes, which are available in Perl and probably Python and other modern scripting languages:
#!/bin/perl -w
while (<>)
{
while (/(ADDNAME\d\d.*?HELLO)/g)
{
print "$1\n";
}
}
This is a good demonstration of using the right tool for the job.

Related

Cut string of numbers at letter in bash

I have a string such as plantford1775.274.284b63.11.
I have been using identity=$( echo "$identity" | cut -d'.' -f3) to cut at each dot, and then choose the third section. I am left with 284b63.
The format of this part is always a letter, sandwiched by varying amounts of numbers. I would like to take the first few numbers before the letter. An example code line would be this:
identity=$( echo "$identity" | cut -d'anyletter' -f1)
What do I replace anyletter with to cut at whatever letter is listed there, so that I end with a string of 284?
This can be done in a single awk command; the following was written and tested against your sample:
echo "$identity" | awk -F'.' '{sub(/[^0-9].*/,"",$3);print $3}'
Explanation: the output of echo is piped to awk as standard input. awk splits each line on . (the field separator), sub() deletes everything in the third field from the first non-digit character onward, and the remaining digits are printed.
Try:
echo plantford1775.274.284b63.11 | cut -d. -f3 | sed 's/[a-z].*//'
Or a slight variation on the REGEX, with [[...]] in bash:
v="plantford1775.274.284b63.11"
[[ $v =~ ^[^.]+\.[^.]+\.([^.]+).*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284b63
Or if you are only interested in the digits before the letter:
[[ $v =~ ^[^.]+\.[^.]+\.([[:digit:]]+)[^.]+.*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284
With bash, using the =~ operator :
[[ $identity =~ [^.]*\.[^.]*\.([0-9]+) ]] && identity=${BASH_REMATCH[1]}
or, in POSIX shell:
identity=${identity#*.*.}
identity=${identity%%[!0-9]*}
or, using sed:
identity=$(sed 's/[^.]*\.[^.]*\.\([0-9]*\).*/\1/' <<< "$identity")
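The two POSIX expansions above can be traced step by step. This is a small sketch using the sample value from the question; the variable names third and digits are illustrative only, and [!0-9] is the POSIX spelling of a negated bracket expression in shell patterns.

```shell
identity="plantford1775.274.284b63.11"

# '#' removes the shortest prefix matching *.*. (everything through the second dot)
third=${identity#*.*.}       # 284b63.11

# '%%' removes the longest suffix that begins with a non-digit
digits=${third%%[!0-9]*}     # 284

echo "$digits"
```

Neither step starts a subprocess, which matters if this runs inside a tight loop.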
Maybe you can use a bash regex and get the result from $BASH_REMATCH.
[[ "$identity" =~ ([0-9]+)[a-z][0-9]+ ]] && identity="${BASH_REMATCH[1]}"
Say we have
identity=284b63
then you can do a
lead=${identity%[a-z]*}
to set lead to 284. Feel free to adapt the pattern to upper case letters and/or other separators.
If this part is always a letter sandwiched between varying numbers of digits and you want to match that format, you can also use GNU awk: set the field separator to . and apply a pattern with a capture group to the third field.
The pattern captures one or more digits at the start of the field, then requires one or more characters in [a-z] followed by a digit.
echo "$identity" | awk -F'.' 'match($3, /^([0-9]+)[a-z]+[0-9]/, ary) {print ary[1]}'
Output
284
Or using sed with a pattern matching the first 2 dots and the capture group after the 2nd dot:
identity=$(sed 's/^[^.]\+\.[^\.]\+\.\([0-9]\+\)[a-z]\+[0-9].*/\1/' <<< "$identity")

Q: find longest string using for loop (Bash)

I'm learning bash, and I have an assignment where I need to iterate through a list of strings using a for loop and return the longest string.
This is what I've written:
max=-1
word=""
list=`cat random-text.txt | tr -s [:space:] " " | sed -r 's/([.* ])/\1\n/g' | grep -E "^a.*" | sed -r 's/(.*)[[:space:]]/\1/' | tr -s [:space:] " "`
for i in $list; do
int=`$i | wc -c`
if [ $int > $max ]; then
max=$int
word=$i
fi
done
echo The longest word in $infile that starts with $char is $i
That's probably a bit messy, but I'm having trouble with the for loop. (I need the echo at the end to print the longest string found while iterating through the list.)
** that's a part of a longer script I've written, I
Thanks in advance, much appreciated!
For some reason, when I run this script I get an error that says: "Command 'an' not found".
That's because you mistakenly wrote $i | to feed the contents of variable i to wc, which makes the shell try to run the word itself as a command; the correct form is wc -c <<< "$i" (in Bash). Better still, just use int=${#i}.
Then in $int > $max the > is interpreted as an output redirection; the correct arithmetic comparison operator is -gt.
Finally you don't echo the longest word found, but rather the last processed one; change $i to $word there.
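Putting those three corrections together, the loop might look like the sketch below. The function name longest_word and the sample words are made up for illustration, and the original pipeline that builds $list is left out.

```shell
# Print the longest of the words passed as arguments.
# The body runs in a subshell (parentheses) so max/word do not leak out.
longest_word() (
    max=-1
    word=""
    for i in "$@"; do
        int=${#i}                        # string length, no wc needed
        if [ "$int" -gt "$max" ]; then   # -gt compares numbers; > would redirect
            max=$int
            word=$i
        fi
    done
    printf '%s\n' "$word"                # the stored winner, not the last $i
)

longest_word apple an avocado ant
```

With the sample words this prints avocado. On a tie the first of the longest words wins, because -gt only replaces the stored word when a strictly longer one appears.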

bash or perl inline capture of multiple pattern matches

Simple thing :
content='"id":"1234"abcdefg"id":"3456"'
need to get the id's out:
perl -le '@m = ( $content =~ /\"id\"\:\"(.*)\"/g); print for @m'
Not working but need a one liner for use on confluence json/output for processing by Ansible.
In Perl:
perl -sE 'say for $str =~ /"id":"(.*?)"/g' -- -str="$content"
Use
grep -Po 'id":"\K.*?(?=")' <<< "$content"
-P enables perl regexes.
-o prints only the matching parts instead of whole lines.
\K lets grep forget that it matched id":" before.
.*? matches as few characters as possible ...
(?=") ... until a " is the next character. The " will not be included in the match.
Output for your example:
1234
3456
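Where grep lacks -P (BSD/macOS grep, for instance), a similar result can be had by matching each whole "id":"..." pair with -o and trimming the wrapper with sed. This is an alternative technique, not the one from the answer above, and it assumes the id values contain no double quotes.

```shell
content='"id":"1234"abcdefg"id":"3456"'

# Step 1: grep -o prints each "id":"..." pair on its own line.
# Step 2: sed removes the leading "id":" and the trailing quote.
ids=$(printf '%s\n' "$content" |
    grep -o '"id":"[^"]*"' |
    sed -e 's/^"id":"//' -e 's/"$//')

printf '%s\n' "$ids"
```

For real JSON a JSON-aware tool is more robust than any regex; this only covers the flat sample shown in the question.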

bash script command output execution doesn't assign full output when using backticks

I have often used backticks (``) to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() to capture the output instead of ``, it works fine. Why don't backticks work?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P, grep uses Perl-compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o outputs every match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
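The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see. Inside backticks, \$ is reduced to a plain $ before the command runs (even within single quotes), so sed received s/$/\n/g, which matches end-of-line rather than a literal dollar sign. $() leaves the backslash alone, which is why that form worked.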
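How backticks and $() treat a backslash before a dollar sign can be checked without sed at all. Inside backticks the shell removes a backslash that precedes $, even within single quotes, whereas $() passes the text through untouched. A minimal sketch (variable names are illustrative) that only prints the script text each form would hand to sed:

```shell
# Backticks: the backslash before $ is stripped while the command is read.
with_backticks=`printf '%s\n' 's/\$/\n/g'`

# $( ): the quoted text reaches printf unchanged.
with_dollar_paren=$(printf '%s\n' 's/\$/\n/g')

printf 'backticks: %s\n' "$with_backticks"
printf '$( ):      %s\n' "$with_dollar_paren"
```

The first line shows s/$/\n/g and the second s/\$/\n/g, which is the whole difference between the failing and the working assignment in the question.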
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"

Get any string between 2 string and assign a variable in bash

I cannot get this to work. I only want to get the string between 2 others in bash. Like this:
FOUND=$(echo "If show <start>THIS WORK<end> then it work" | **the magic**)
echo $FOUND
It seems so simple...
sed -n 's/.*<start>\(.*\)<end>.*/\1/p'
This can be done in bash without any external commands such as awk and sed. When doing a regex match in bash, the results of the match are put into a special array called BASH_REMATCH. The second element of this array contains the match from the first capture group.
data="If show <start>THIS WORK<end> then it work"
regex="<start>(.*)<end>"
[[ $data =~ $regex ]] && found="${BASH_REMATCH[1]}"
echo $found
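Another route that needs no external commands, and no regex either, is plain parameter expansion. This is a different technique from the BASH_REMATCH one above; the sketch assumes each marker appears once and uses a hypothetical helper variable tmp.

```shell
data="If show <start>THIS WORK<end> then it work"

tmp=${data#*"<start>"}      # drop everything up to and including <start>
found=${tmp%%"<end>"*}      # drop <end> and everything after it

echo "$found"
```

Quoting the markers inside the expansion keeps them literal, so characters such as * or ? in a marker would not be treated as glob patterns.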
This can also be done using perl regex in grep (GNU specific):
found=$(grep -Po '(?<=<start>).*(?=<end>)' <<< "If show <start>THIS WORK<end> then it work")
echo "$found"
If your string contains <start> and <end>, this will work: set FS so that both < and > act as field separators.
[jaypal:~/Temp] FOUND=$(echo "If show <start>THIS WORK<end> then it work" |
awk -v FS="[<>]" '{print $3}')
[jaypal:~/Temp] echo $FOUND
THIS WORK
