extract string between '$$' characters - $$extractabc$$ - shell

I am working on shell script and new to it. I want to extract the string between double $$ characters, for example:
input:
$$extractabc$$
output
extractabc
I used grep and sed but not working out. Any suggestions are welcome!

You could do
awk -F"$" '{print $3}' file.txt
assuming the file contained input:$$extractabc$$ output:extractabc. awk splits your data into pieces using $ as a delimiter. First item will be input:, next will be empty, next will be extractabc.
You could use sed like so to get the same info.
sed -e 's/.*$$\(.*\)$$.*/\1/' file.txt
sed looks for information between $$s and outputs that. The goal is to type something like this .*$$(.*)$$.*. It's greedy but just stay with me.
looks for .* - i.e. any character zero or more times before $$
then the string should have $$
after $$ there'll be any character zero or more times
then the string should have another $$
and some more characters to follow
between the 2 $$ is (.*). String found between $$s is given a placeholder \1
sed finds such information and publishes it

Using grep PCRE (where available) and look-around:
$ echo '$$extractabc$$' | grep -oP "(?<=\\$\\$).*(?=\\$\\$)"
extractabc

echo '$$extractabc$$' | awk '{gsub(/\$\$/,"")}1'
extractabc

Here is an other variation:
echo "$$extractabc$$" | awk -F"$$" 'NF==3 {print $2}'
It does test of there are two set of $$ and only then prints whats between $$
Does also work for input like blabla$$some_data$$moreblabla

How about remove all the $ in the input?
$ echo '$$extractabc$$' | sed 's/\$//g'
extractabc
Same with tr
$ echo '$$extractabc$$' | tr -d '$'
extractabc

Related

How do you remove a section of of a file name after underscore including the underscore using bash? [duplicate]

How can I remove all text after a character, in this case a colon (":"), in bash? Can I remove the colon, too? I have no idea how to.
In Bash (and ksh, zsh, dash, etc.), you can use parameter expansion with % which will remove characters from the end of the string or # which will remove characters from the beginning of the string. If you use a single one of those characters, the smallest matching string will be removed. If you double the character, the longest will be removed.
$ a='hello:world'
$ b=${a%:*}
$ echo "$b"
hello
$ a='hello:world:of:tomorrow'
$ echo "${a%:*}"
hello:world:of
$ echo "${a%%:*}"
hello
$ echo "${a#*:}"
world:of:tomorrow
$ echo "${a##*:}"
tomorrow
An example might have been useful, but if I understood you correctly, this would work:
echo "Hello: world" | cut -f1 -d":"
This will convert Hello: world into Hello.
$ echo 'hello:world:again' |sed 's/:.*//'
hello
I know some solutions:
# Our mock data:
A=user:mail:password
With awk and pipe:
$ echo $A | awk -v FS=':' '{print $1}'
user
Via bash variables:
$ echo ${A%%:*}
user
With pipe and sed:
$ echo $A | sed 's#:.*##g'
user
With pipe and grep:
$ echo $A | egrep -o '^[^:]+'
user
With pipe and cut:
$ echo $A | cut -f1 -d\:
user
egrep -o '^[^:]*:'
trim off everything after the last instance of ":"
grep -o '^.*:' fileListingPathsAndFiles.txt
and if you wanted to drop that last ":"
grep -o '^.*:' file.txt | sed 's/:$//'
#kp123: you'd want to replace : with / (where the sed colon should be \/)
Let's say you have a path with a file in this format:
/dirA/dirB/dirC/filename.file
Now you only want the path which includes four "/". Type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-4 -d"/"
and your output will be
/dirA/dirB/dirC
The advantage of using cut is that you can also cut out the uppest directory as well as the file (in this example), so if you type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-3 -d"/"
your output would be
/dirA/dirB
Though you can do the same from the other side of the string, it would not make that much sense in this case as typing
$ echo "/dirA/dirB/dirC/filename.file" | cut -f2-4 -d"/"
results in
dirA/dirB/dirC
In some other cases the last case might also be helpful. Mind that there is no "/" at the beginning of the last output.

Using Bash to delete a bracket delimited substring from string variable

I have this string in a variable:
strVar="Hello World [randomSubstring].zip"
I would like to extract [randomSubstring], where that substring inside the brackets could be anything.
The expected result must be something like this:
echo "$strVar"
Hello World .zip
I tried several combinations with grep and awk but without success, I am using CentOS 7.
echo "Hello World [RNVE5Z].zip" | grep -oP '(?<=[).*(?=])'
echo "Hello World [RNVE5Z].zip" | awk -F"["" '{print $1}' | awk -F"]" '{print $2}'
Bash only:
echo ${strVar/\[*\]/}
Just use bash's substring manipulation:
echo ${strVar/\[*\]/}
I would prefer it over an external call to sed, except there is more to be done, why I use sed anyway:
echo $strVar | sed 's/\[.*\]//'
I don't think there is an elegant solution for grep but might be wrong. In awk I'm not that fluent.

Using sed to extract strings from a text file

I have text data in this form:
^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]
^quite/quite[ADV]+ADV ^different/different[ADJ]+ADJ ^not/not[PART]
^necessarily/necessarily[ADV]+ADV ^more/more[ADV]+ADV
^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$
And I want it to be processed to this form:
Well John have a quite different not necessarily more elaborate theology
Basically, I need every string between the starting character / and the ending character [.
Here is what I tried, but I just get empty files...
#!/bin/bash
for file in probe/*.txt
do sed '///,/[/d' $file > $file.aa
mv $file.aa $file
done
awk to the rescue!
$ awk -F/ -v RS=^ -v ORS=' ' '{print $1}' file
Well John has a quite different not necessarily more elaborated theology
Explanation set record separator (RS) to ^ to separate your logical groups, also set the field separator (FS) to / and print the first field as your requirement. Finally, setting the output field separator (OFS) to space (instead of the default new line) keeps the extracted fields on the same line.
With GNU grep and Perl compatible regular expressions (-P):
$ echo $(grep -Po '(?<=/)[^[]*' infile)
Well John have a quite different not necessarily more elaborate theology
-o retains just the matches, (?<=/) is a positive look-behind ("make sure there is a /, but don't include it in the match"), and [^[]* is "a sequence of characters other than [".
grep -Po prints one match per line; by using the output of grep as arguments to echo, we convert the newlines into spaces (could also be done by piping to tr '\n' ' ').
cat file|grep -oE "\/[^\[]*\[" |sed -e 's#^/##' -e 's/\[$//' | tr -s "\n" " "

Adding double quotes to beginning, end and around comma's in bash variable

I have a shell script that accepts a parameter that is comma delimited,
-s 1234,1244,1567
That is passed to a curl PUT json field. Json needs the values in a "1234","1244","1567" format.
Currently, I am passing the parameter with the quotes already in it:
-s "\"1234\",\"1244\",\"1567\"", which works, but the users are complaining that its too much typing and hard to do. So I'd like to just take a comma delimited list like I had at the top and programmatically stick the quotes in.
Basically, I want a parameter to be passed in as 1234,2345 and end up as a variable that is "1234","2345"
I've come to read that easiest approach here is to use sed, but I'm really not familiar with it and all of my efforts are failing.
You can do this in BASH:
$> arg='1234,1244,1567'
$> echo "\"${arg//,/\",\"}\""
"1234","1244","1567"
awk to the rescue!
$ awk -F, -v OFS='","' -v q='"' '{$1=$1; print q $0 q}' <<< "1234,1244,1567"
"1234","1244","1567"
or shorter with sed
$ sed -r 's/[^,]+/"&"/g' <<< "1234,1244,1567"
"1234","1244","1567"
translating this back to awk
$ awk '{print gensub(/([^,]+)/,"\"\\1\"","g")}' <<< "1234,1244,1567"
"1234","1244","1567"
you can use this:
echo QV=$(echo 1234,2345,56788 | sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g')
result:
echo $QV
"1234","2345","56788"
just add double quotes at start, end, and replace commas with quote/comma/quote globally.
easy to do with sed
$ echo '1234,1244,1567' | sed 's/[0-9]*/"\0"/g'
"1234","1244","1567"
[0-9]* zero more consecutive digits, since * is greedy it will try to match as many as possible
"\0" double quote the matched pattern, entire match is by default saved in \0
g global flag, to replace all such patterns
In case, \0 isn't recognized in some sed versions, use & instead:
$ echo '1234,1244,1567' | sed 's/[0-9]*/"&"/g'
"1234","1244","1567"
Similar solution with perl
$ echo '1234,1244,1567' | perl -pe 's/\d+/"$&"/g'
"1234","1244","1567"
Note: Using * instead of + with perl will give
$ echo '1234,1244,1567' | perl -pe 's/\d*/"$&"/g'
"1234""","1244""","1567"""
""$
I think this difference between sed and perl is similar to this question: GNU sed, ^ and $ with | when first/last character matches
Using sed:
$ echo 1234,1244,1567 | sed 's/\([0-9]\+\)/\"\1\"/g'
"1234","1244","1567"
ie. replace all strings of numbers with the same strings of numbers quoted using backreferencing (\1).

Text Manipulation using sed or AWK

I get the following result in my script when I run it against my services. The result differs depending on the service but the text pattern showing below is similar. The result of my script is assigned to var1. I need to extract data from this variable
$var1=HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to strip the name of the service list from $var1. So the end result should be printed on separate line as follow:
svc1
svc2
svc3
svc4
svc5
svc6
Can you please help with this?
Regards
Using sed and grep:
sed 's/[^ ]* :\|,\|//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes every non-whitespace before a colon and commas. Grep just outputs the resulting services one per line.
Using gnu grep and gnu sed:
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect!
As far as I'm aware this will work on any grep:
echo "$var1" | grep -o 'svc[0-9]\+'
Matches "svc" followed by one or more digits. You can also enable the "highly experimental" Perl regexp mode with -P, which means you can use the \d digit character class and don't have to escape the + any more:
grep -Po 'svc\d+' <<<"$var1"
In bash you can use <<< (a Here String) which supplies "$var1" to grep on the standard input.
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
This would be a good job for awk:
awk -F': ' '{split($2,a,", "); for (i in a) print a[i]}'

Resources