Extract a substring from a variable between two patterns in bash with special characters - bash

I am trying to extract a substring between two patterns from a bash variable that has special characters in it.
The variable:
MQ_URI=ssl://b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com:61617?jms.prefetchPolicy.queuePrefetch=0
What I've tried so far:
echo "$MQ_URI" | sed -E 's/.*ssl:// (.*) :61617.*/\1/'
Got me this in response:
sed: -e expression #1, char 12: unknown option to `s'
Also tried with grep:
echo $MQ_URI | grep -o -P '(?<=ssl://).*(?=:61617jms.prefetchPolicy.queuePrefetch=0)
The output I need is everything between: "ssl://" and ":61617?jms.prefetchPolicy.queuePrefetch=0"
which is : "b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com"

Using bash
$ mq_uri=${mq_uri##*/}
$ mq_uri=${mq_uri//:*}
$ echo "$mq_uri"
b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com
sed
$ sed -E 's~[^-]*/([^?]*):.*~\1~' <<< "$mq_uri"
b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com
grep
$ grep -Po '[^-]*/\K[^:]*' <<< "$mq_uri"
b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com
awk
$ awk -F'[/:]' '{print $4}' <<< "$mq_uri"
b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com
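The parameter-expansion approach can be generalized into a small helper. This is only a sketch: the function name `between` and its argument order are invented for illustration, and it assumes both markers are literal strings. Quoting the markers inside the expansion keeps characters such as ? or * from being treated as glob patterns.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answers above): print the text between
# the first occurrence of a start marker and the first end marker after it.
between() {
  local s=$1 start=$2 end=$3
  s=${s#*"$start"}   # drop everything up to and including the start marker
  s=${s%%"$end"*}    # drop the end marker and everything after it
  printf '%s\n' "$s"
}

MQ_URI='ssl://b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com:61617?jms.prefetchPolicy.queuePrefetch=0'
between "$MQ_URI" 'ssl://' ':61617?'
```

If a marker is missing, the corresponding expansion leaves the string unchanged, so callers may want to check for that case first.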

If this is what you expect
echo "$MQ_URI" | sed -E 's#.*ssl://(.*):61617.*#\1#'
b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com
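Replace the s command's delimiter with # or any other character that does not occur in the pattern; the unescaped slashes in ssl:// were ending the expression early.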

With your shown samples and attempts, please try the following.
## Create a shell variable named `mq_uri` here,
## to be used in all of the following solutions.
mq_uri="ssl://b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com:61617?jms.prefetchPolicy.queuePrefetch=0"
1st solution: Using awk's match function along with the split function.
awk 'match($0,/^ssl:.*:61617\?/){split(substr($0,RSTART,RLENGTH),arr,"[/:]");print arr[4]}' <<<"$mq_uri"
2nd solution: Using GNU grep with its -oP options and \K to get the required output.
grep -oP '^ssl:\/\/\K[^:]*(?=:61617\?)' <<<"$mq_uri"
3rd solution: Using awk's match function along with gsub to globally substitute values to get the required output.
awk 'match($0,/^ssl:.*:61617\?/){val=substr($0,RSTART,RLENGTH);gsub(/^ssl:\/\/|:.*\?/,"",val);print val}' <<<"$mq_uri"
4th solution: Using the array argument of the match function, which is specific to GNU awk.
awk 'match($0,/^ssl:\/\/(.*):61617\?/,arr){print arr[1]}' <<<"$mq_uri"
5th solution: A perl one-liner.
perl -pe 's/ssl:\/\/(.*):61617\?.*/$1/' <<<"$mq_uri"
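For comparison, the same capture can be done without any external tool, using bash's own `[[ =~ ]]` operator. This is a sketch rather than one of the solutions above; the variable names `re` and `host` are chosen here for illustration.

```bash
#!/usr/bin/env bash
mq_uri='ssl://b-7dda5da6-59a5-4150-8e2f-16534985665-1.mq.us-east-1.amazonaws.com:61617?jms.prefetchPolicy.queuePrefetch=0'

# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
# ([^:]+) captures everything up to the first colon after ssl://.
re='^ssl://([^:]+):61617\?'

if [[ $mq_uri =~ $re ]]; then
  host=${BASH_REMATCH[1]}   # first capture group
  printf '%s\n' "$host"
fi
```

The regex is an ERE, so the ? must be escaped to match a literal question mark, just as in the awk and grep patterns.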

Related

extract string between '$$' characters - $$extractabc$$

I am working on a shell script and am new to it. I want to extract the string between double $$ characters, for example:
input:
$$extractabc$$
output
extractabc
I tried grep and sed, but could not get it to work. Any suggestions are welcome!
You could do
awk -F"$" '{print $3}' file.txt
assuming the file contained input:$$extractabc$$ output:extractabc. awk splits your data into pieces using $ as a delimiter. First item will be input:, next will be empty, next will be extractabc.
You could use sed like so to get the same info.
sed -e 's/.*\$\$\(.*\)\$\$.*/\1/' file.txt
sed looks for the text between the $$ pairs and outputs it. Conceptually the pattern is .*$$(.*)$$.* (the .* is greedy, but bear with me; the dollars are escaped in the command so they are always taken literally):
.* matches any character, zero or more times, before $$
then the string must contain $$
after that $$ there may be any character, zero or more times
then the string must contain another $$
and possibly some more characters after it
between the two $$ is (.*), a capture group; the text it matches can be referred to as \1
sed replaces the whole line with that captured text and prints it
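The greediness mentioned above matters once a line contains more than one pair of $$. The following sketch contrasts the two behaviours; the function names and the sample input are made up for illustration.

```bash
#!/usr/bin/env bash
# Greedy leading .* : yields the text between the LAST pair of $$.
last_pair()  { sed 's/.*\$\$\(.*\)\$\$.*/\1/'; }

# [^$]* cannot cross a $, so this yields the text between the FIRST pair.
first_pair() { sed 's/[^$]*\$\$\([^$]*\)\$\$.*/\1/'; }

printf '%s\n' 'a$$one$$b$$two$$c' | last_pair
printf '%s\n' 'a$$one$$b$$two$$c' | first_pair
```

With a single pair, as in the question, both variants give the same result.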
Using grep PCRE (where available) and look-around:
$ echo '$$extractabc$$' | grep -oP "(?<=\\$\\$).*(?=\\$\\$)"
extractabc
echo '$$extractabc$$' | awk '{gsub(/\$\$/,"")}1'
extractabc
Here is another variation:
echo '$$extractabc$$' | awk -F'[$][$]' 'NF==3 {print $2}'
It tests whether there are two sets of $$ and only then prints what is between them.
It also works for input like blabla$$some_data$$moreblabla
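One pitfall with this kind of input: inside double quotes bash expands $$ to the process ID of the current shell, so a sample string containing $$ has to be single-quoted to stay literal. A short sketch (the variable names are illustrative only):

```bash
#!/usr/bin/env bash
double_quoted="$$extractabc$$"   # each $$ is replaced by the shell's PID
single_quoted='$$extractabc$$'   # stays exactly as typed

printf '%s\n' "$double_quoted"   # PID, then extractabc, then PID again
printf '%s\n' "$single_quoted"

# With literal dollars, a bracket expression makes a safe awk separator,
# because a multi-character -F value is treated as a regex.
printf '%s\n' "$single_quoted" | awk -F'[$][$]' 'NF==3 {print $2}'
```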
How about removing all the $ in the input?
$ echo '$$extractabc$$' | sed 's/\$//g'
extractabc
Same with tr
$ echo '$$extractabc$$' | tr -d '$'
extractabc

Adding double quotes to beginning, end and around commas in bash variable

I have a shell script that accepts a parameter that is comma delimited,
-s 1234,1244,1567
That is passed into a JSON field of a curl PUT request. The JSON needs the values in the format "1234","1244","1567".
Currently, I am passing the parameter with the quotes already in it:
-s "\"1234\",\"1244\",\"1567\"", which works, but the users are complaining that it's too much typing and hard to do. So I'd like to just take a comma-delimited list like the one at the top and programmatically stick the quotes in.
Basically, I want a parameter to be passed in as 1234,2345 and end up as a variable that is "1234","2345"
I've read that the easiest approach here is to use sed, but I'm really not familiar with it and all of my efforts are failing.
You can do this in BASH:
$> arg='1234,1244,1567'
$> echo "\"${arg//,/\",\"}\""
"1234","1244","1567"
awk to the rescue!
$ awk -F, -v OFS='","' -v q='"' '{$1=$1; print q $0 q}' <<< "1234,1244,1567"
"1234","1244","1567"
or shorter with sed
$ sed -r 's/[^,]+/"&"/g' <<< "1234,1244,1567"
"1234","1244","1567"
translating this back to awk
$ awk '{print gensub(/([^,]+)/,"\"\\1\"","g")}' <<< "1234,1244,1567"
"1234","1244","1567"
You can use this:
QV=$(echo 1234,2345,56788 | sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g')
result:
echo "$QV"
"1234","2345","56788"
This just adds a double quote at the start and at the end, and replaces every comma with quote/comma/quote.
easy to do with sed
$ echo '1234,1244,1567' | sed 's/[0-9]*/"\0"/g'
"1234","1244","1567"
[0-9]* zero or more consecutive digits; since * is greedy, it will try to match as many as possible
"\0" double quote the matched pattern, entire match is by default saved in \0
g global flag, to replace all such patterns
In case, \0 isn't recognized in some sed versions, use & instead:
$ echo '1234,1244,1567' | sed 's/[0-9]*/"&"/g'
"1234","1244","1567"
Similar solution with perl
$ echo '1234,1244,1567' | perl -pe 's/\d+/"$&"/g'
"1234","1244","1567"
Note: Using * instead of + with perl will give
$ echo '1234,1244,1567' | perl -pe 's/\d*/"$&"/g'
"1234""","1244""","1567"""
""$
I think this difference between sed and perl is similar to this question: GNU sed, ^ and $ with | when first/last character matches
Using sed:
$ echo 1234,1244,1567 | sed 's/\([0-9]\+\)/\"\1\"/g'
"1234","1244","1567"
i.e., replace every string of digits with the same string in quotes, using a backreference (\1).
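Since the value arrives as a script parameter, the quoting can also be wrapped in a small bash function that splits on commas and rebuilds the list. This is a sketch only; the name `quote_csv` is hypothetical, and it assumes the items themselves contain no commas or double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn 1234,1244,1567 into "1234","1244","1567".
quote_csv() {
  local IFS=, item out=
  local -a items
  read -ra items <<< "$1"          # split the argument on commas
  for item in "${items[@]}"; do
    out+="${out:+,}\"$item\""      # comma before every item but the first
  done
  printf '%s\n' "$out"
}

quote_csv 1234,1244,1567
```

The result can then be captured with something like `ids=$(quote_csv "$arg")` before it is placed into the JSON body.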

how to remove last comma from line in bash using "sed or awk"

Hi, I want to remove the last comma from a line. For example:
Input:
This,is,a,test
Desired Output:
This,is,a test
I am able to remove the last comma if it is also the last character of the string, using the command below (however, this is not what I want):
echo "This,is,a,test," |sed 's/,$//'
This,is,a,test
The same command does not work if there are more characters after the last comma in the line.
echo "This,is,a,test" |sed 's/,$//'
This,is,a,test
I am able to get the result in a dirty way by chaining multiple commands. Is there an alternative that achieves the same with an awk or sed regex? (This is what I want.)
echo "This,is,a,test" |rev |sed 's/,/ /' |rev
This,is,a test
$ echo "This,is,a,test" | sed 's/\(.*\),/\1 /'
This,is,a test
$ echo "This,is,a,test" | perl -pe 's/.*\K,/ /'
This,is,a test
In both cases, .* will match as much as possible, so only the last comma will be changed.
You can use a regex that matches not-comma, and captures that group, and then restores it in the replacement.
echo "This,is,a,test" |sed 's/,\([^,]*\)$/ \1/'
Output:
This,is,a test
All the answers so far are based on regex. Here is a non-regex way to remove the last comma:
s='This,is,a,test'
awk 'BEGIN{FS=OFS=","} {$(NF-1)=$(NF-1) " " $NF; NF--} 1' <<< "$s"
This,is,a test
In GNU awk too, since the question is tagged with it:
$ echo This,is,a,test|awk '$0=gensub(/^(.*),/,"\\1 ","g",$0)'
This,is,a test
One way to do this is by using Bash Parameter Expansion.
$ s="This,is,a,test"
$ echo "${s%,*} ${s##*,}"
This,is,a test
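One caveat with the parameter-expansion form: if the string contains no comma, both expansions return the whole string, so it would be printed twice with a space in between. A guarded sketch (the function name `replace_last_comma` is made up here):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: replace the last comma with a space,
# leaving strings without any comma untouched.
replace_last_comma() {
  local s=$1
  if [[ $s == *,* ]]; then
    # ${s%,*}  : everything before the last comma
    # ${s##*,} : everything after the last comma
    printf '%s %s\n' "${s%,*}" "${s##*,}"
  else
    printf '%s\n' "$s"
  fi
}

replace_last_comma 'This,is,a,test'
replace_last_comma 'nocomma'
```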

Text Manipulation using sed or AWK

I get the following result in my script when I run it against my services. The result differs depending on the service, but the text pattern shown below is similar. The result of my script is assigned to var1. I need to extract data from this variable.
$var1=HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to strip the name of the service list from $var1. So the end result should be printed on separate line as follow:
svc1
svc2
svc3
svc4
svc5
svc6
Can you please help with this?
Regards
Using sed and grep:
sed 's/[^ ]* :\|,//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes every run of non-whitespace characters that precedes a colon (together with the colon), and every comma. grep then outputs the remaining service names one per line.
Using gnu grep and gnu sed:
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect!
As far as I'm aware this will work on any grep:
echo "$var1" | grep -o 'svc[0-9]\+'
Matches "svc" followed by one or more digits. You can also enable the "highly experimental" Perl regexp mode with -P, which means you can use the \d digit character class and don't have to escape the + any more:
grep -Po 'svc\d+' <<<"$var1"
In bash you can use <<< (a Here String) which supplies "$var1" to grep on the standard input.
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
This would be a good job for awk:
awk -F': ' '{n=split($2,a,", "); for (i=1; i<=n; i++) print a[i]}'
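For the original single-line value, the same result can also be sketched in plain bash by walking over the whitespace-separated words. This assumes, based only on the sample shown, that every non-service word either contains a * or is a lone colon; the function name `list_services` is hypothetical.

```bash
#!/usr/bin/env bash
var1='HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6'

# Hypothetical helper: print each service name on its own line.
list_services() {
  local word
  local -a words
  read -ra words <<< "$1"          # split on whitespace, no globbing
  for word in "${words[@]}"; do
    # Skip the HOST*...*serviceList labels and the colon separators.
    [[ $word == *'*'* || $word == ':' ]] && continue
    printf '%s\n' "${word%,}"      # strip a trailing comma, if any
  done
}

list_services "$var1"
```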

shell command to truncate/cut a part of string

I have a file with the contents below. I have a command that prints the version number from it, but I need to drop the last component of the version.
file.spec:
Version: 3.12.0.2
Command used:
VERSION=($(grep -r "Version:" /path/file.spec | awk '{print ($2)}'))
echo $VERSION
Current output : 3.12.0.2
Desired output : 3.12.0
There is absolutely no need for external tools like awk, sed etc. for this simple task if your shell is POSIX-compliant (which it should be) and supports parameter expansion:
$ cat file.spec
Version: 3.12.0.2
$ version=$(<file.spec)
$ version="${version#* }"
$ version="${version%.*}"
$ echo "${version}"
3.12.0
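The steps above assume the file holds only the Version line. For a spec file with several lines, a sketch along the same lines could read the file line by line and apply the expansion to the matching one. The function name `spec_version` and the extra `Name: demo` line in the demo file are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Version value without its last component.
spec_version() {
  local key value
  # "|| [[ -n $key ]]" also handles a last line without a trailing newline.
  while read -r key value || [[ -n $key ]]; do
    if [[ $key == Version: ]]; then
      printf '%s\n' "${value%.*}"   # drop the last dot and what follows it
      return 0
    fi
  done < "$1"
  return 1                          # no Version: line found
}

# Demo with a throwaway file (contents are made up apart from the Version line).
spec=$(mktemp)
printf 'Name: demo\nVersion: 3.12.0.2\n' > "$spec"
spec_version "$spec"
rm -f "$spec"
```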
Try this:
VERSION=($(grep -r "Version:" /path/file.spec| awk '{print ($2)}' | cut -d. -f1-3))
cut splits the string on the field delimiter (-d), and -f selects the desired fields.
You could use this single awk script awk -F'[ .]' '{print $2"."$3"."$4}':
$ VERSION=$(awk -F'[ .]' '{print $2"."$3"."$4}' /path/file.spec)
$ echo $VERSION
3.12.0
Or this single grep
$ VERSION=$(grep -Po 'Version: \K\d+[.]\d+[.]\d+' /path/file.spec)
$ echo $VERSION
3.12.0
But you never need grep and awk together.
If you only grep a single file, -r makes no sense.
Also, based on the output of your command line, this grep should work:
grep -Po '(?<=Version: )(\d+\.){2}\d+' /path/file.spec
gives you:
3.12.0
\K is also nice: it works for both fixed-length and variable-length prefixes (since PCRE 7.2), and another answer here uses it. But I feel a look-behind is easier to read when the prefix has a fixed length.

Resources