How to delete everything outside of {} braces in Bash?

I need to delete all data appearing outside of { and } braces. For example, here is the line in $variable:
The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch.
The output, after deleting everything outside the paired {'s and }'s, would just be:
{{went}}{{restaurant}}{fish}
All braces appear in pairs.
I've found the post Delete all data outside square brackets, which is similar and deals with square brackets, but my attempts to modify its two answers failed, because both [ and { can have several meanings in the code: either as literal characters in the original data, or as syntax used by sed, awk, or regular expressions. This is what I tried, based on the answers in the other post:
awk -F '\{\}\{\}' '{for (i=2; i<NF; i+=2) printf "[%s]%s", $i, OFS; print ""}' <<< "$variable"
sed -e 's/^[^\{]*//;s/\}[^\{]*\[/\} \[/g;s/[^{]*$//;' <<< "$variable"
How can I modify one of these so that it deletes all data outside of the braces?

Here is a solution using grep. The -P flag (GNU grep) enables Perl-compatible regular expressions, which also support non-greedy quantifiers, and -o prints only the matched parts, each on its own line:
echo "The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch." |
grep -Po '{?{[^{}]+}}?'
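Because grep -o prints each match on its own line, the command above produces three lines rather than the single line asked for. A minimal sketch that joins the matches back together, assuming a grep with -o (GNU and BSD grep both have it) and swapping -P for -E with a similar ERE, since -P is GNU-specific:

```shell
variable='The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch.'

# -E: extended regex; \{+ and \}+ match one or more literal braces
# -o: print each match on its own line
# tr -d '\n' then deletes the newlines, gluing the matches together
result=$(grep -Eo '\{+[^{}]+\}+' <<< "$variable" | tr -d '\n')
printf '%s\n' "$result"
```

Piping the original -P command into tr -d '\n' joins its output the same way.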

$ echo "The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch." |
sed -r 's/(^|\})[^{}]+(\{|$)/\1\2/g'
{{went}}{{restaurant}}{fish}
or with GNU awk for FPAT:
$ echo "The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch." |
gawk -v FPAT='{[^}]+}+' -v OFS= '{$1=$1}1'
{{went}}{{restaurant}}{fish}

This might work for you (GNU sed):
sed 's/[^{]*\(\({{*[^}]*}}*\)*\)/\1/g' file
or:
sed -r 's/[^{]*((\{+[^}]*\}+)*)/\1/g' file
Assuming all { and } are balanced.
N.B. This avoids alternation.

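Here's another way using GNU sed (the \| alternation inside a basic regular expression is a GNU extension, so this is not strictly vanilla sed):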
sed 's/^[^{]*\|[^}]*$//g; s/}[^{}]*{/}{/g' <<< "$variable"
Results:
{{went}}{{restaurant}}{fish}
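The \| alternation in that command is a GNU extension to basic regular expressions. A sketch of the same idea for a strictly POSIX sed splits the alternation into two separate substitutions (assumption: the braces are balanced, as stated in the question):

```shell
variable='The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch.'

# 1) drop everything before the first "{"
# 2) drop everything after the last "}"
# 3) collapse any text between a "}" and the next "{"
result=$(sed 's/^[^{]*//; s/[^}]*$//; s/}[^{}]*{/}{/g' <<< "$variable")
printf '%s\n' "$result"
```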

A little late to the party. Here is a Perl solution:
perl -ne'print for /{[^}]+}+/g'
or, if you prefer a newline at the end:
perl -ne'print for /{[^}]+}+/g }{ print "\n"'
$ echo "The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch." |
perl -ne'print for /{[^}]+}+/g }{ print "\n"'
{{went}}{{restaurant}}{fish}
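For comparison, a pure-Bash sketch that needs no external tool: it repeatedly matches the leftmost run of braces with [[ =~ ]] and appends BASH_REMATCH[1] to the result. The function name keep_braced is hypothetical, and like the answers above it assumes balanced, non-nested braces of the {word} or {{word}} form.

```shell
# keep_braced (hypothetical name): print only the {...} / {{...}} groups of $1
keep_braced() {
  local rest=$1 out=''
  # one or more "{", text without braces, one or more "}"
  local re='(\{+[^{}]*\}+)'
  while [[ $rest =~ $re ]]; do
    out+=${BASH_REMATCH[1]}
    # drop everything up to and including this match, then search again
    rest=${rest#*"${BASH_REMATCH[1]}"}
  done
  printf '%s\n' "$out"
}

variable='The fish {{went}} to the {{restaurant}} to eat some {fish} for lunch.'
keep_braced "$variable"
```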

Related

Use sed to transform a comma space separated list into a comma separated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path='a/b/Test b/c/Test c/d/Test'):
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to add " at the start and end of the string?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier (the trailing 7 is just an always-true pattern that prints the record, like the more common 1):
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You can simply add the outer double quotes around the result if you have a single line of text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines, use one of these:
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The POSIX BRE pattern [[:space:]]\{1,\} (equivalent to the POSIX ERE [[:space:]]+) matches one or more whitespace characters, and & in the replacement inserts the matched text back into the result.
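If you would rather avoid sed, Bash can do the quoting itself: read -a splits the string on whitespace into an array, and printf repeats its format once per element. A minimal sketch, assuming a single line of space-separated items that themselves contain no whitespace:

```shell
s='a/b/Test b/c/Test c/d/Test'

# read -a splits on IFS (spaces, tabs, newlines by default) into an array
read -ra parts <<< "$s"
# printf reuses its format for every argument: wrap each in quotes, append ", "
printf -v joined '"%s", ' "${parts[@]}"
# remove the separator left after the last element
joined=${joined%, }
printf '%s\n' "$joined"
```

With an empty input the array is empty and printf still applies its format once, so the result is "" (one empty quoted item) rather than nothing; add a length check if that matters.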

How to get the last field (the file name) of a path with awk?

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, given the input path /home/parent/child/filename, I would like to get filename.
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard-coding $5, which would be incorrect if my input had the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
awk splits each line into fields based on a field separator that you can define. So, by setting the field separator to /, you can say:
awk -F "/" '{print $NF}' input
As NF holds the number of fields in the current record, printing $NF prints the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
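Because NF is just a number, ordinary arithmetic on it reaches other fields counted from the end. A small sketch (the variable names are only for illustration):

```shell
path='/home/parent/child1/child2/filename'

# With FS set to "/", $NF is the last field and $(NF-1) the one before it.
# The parentheses matter: $NF-1 would parse as ($NF)-1, i.e. "filename"
# converted to the number 0, minus 1.
last=$(awk -F/ '{ print $NF }' <<< "$path")
parent_dir=$(awk -F/ '{ print $(NF-1) }' <<< "$path")
printf '%s %s\n' "$last" "$parent_dir"
```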
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element of the @F autosplit array
Another option is to use Bash parameter expansion:
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
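One difference from basename: ${foo##*/} returns an empty string when the path ends in a slash, while basename strips trailing slashes first (basename /home/parent/child/ prints child). A sketch of a helper that handles that case before applying the same expansion; the name last_component is hypothetical:

```shell
# last_component (hypothetical name): roughly what basename prints, without forking
last_component() {
  local p=$1
  # drop trailing slashes, but keep a lone "/"
  while [[ $p == */ && $p != / ]]; do
    p=${p%/}
  done
  if [[ $p == / ]]; then
    printf '/\n'
  else
    # remove the longest prefix ending in "/"
    printf '%s\n' "${p##*/}"
  fi
}

last_component /home/parent/child/filename
last_component /home/parent/child/
```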
I know I'm about 5 years late, and thanks for all the proposals; I used to do it this way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to see there are better ways.
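This should be a comment on the basename answer, but I don't have enough reputation.
If you do not use double quotes, basename will not work with a path that contains a space, because the shell splits it into two arguments and basename treats the second one as a suffix to remove: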
$ basename /home/foo/bar foo/bar.png
bar
It works with quotes:
$ basename "/home/foo/bar foo/bar.png"
bar.png
Example with a file:
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while IFS= read -r b; do basename "$b"; done < a
filename1
filename2
filename3
I know I'm about 3 years late on this, but...
you should consider parameter expansion: it's built in and faster, since no external command is started.
If your input is in a variable, say $var1, just use ${var1##*/}. See below:
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
You can skip all of that complex regex:
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd-to-last:
mawk '$!--NF=$NF' FS='/'
child2
3rd-to-last field:
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-to-last:
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
child0
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
parent
Major caveat: gawk/nawk differ slightly from mawk in how they track multiple, potentially conflicting, decrements to NF, so apart from the first solution (for the last field), the rest currently only work in mawk 1/2.
Just realized it's much, much cleaner this way in mawk/gawk/nawk (updated so that the root "/" still gets printed):
echo '/home/parent/child1/child2/filename' | …
awk ++NF FS='.+/' OFS=
filename
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'

Replace string by regex

I have a bunch of strings like "{one}two", where "{one}" can vary and "two" is always the same. I need to replace each original string with "three{one}", where "three" is also constant. It could easily be done with Python, for example, but I need it done with shell tools like sed or awk.
If I understand correctly, you want:
{one}two --> three{one}
{two}two --> three{two}
{n}two --> three{n}
sed with a backreference will do that:
echo "{one}two" | sed 's/\(.*\)two$/three\1/'
The search stores all the text up to your fixed string, and the replacement prepends your new string to the stored text. sed's .* is greedy, so it grabs all the text up to the final fixed string even if the variable part repeats it (e.g., {two}two is still remapped to three{two} properly).
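The same greedy-capture idea works with Bash's own [[ =~ ]] operator, which stores the captured text in BASH_REMATCH. A sketch, with swap_suffix as a hypothetical helper name; inputs that do not end in two are printed unchanged:

```shell
# swap_suffix (hypothetical name): turn "Xtwo" into "threeX"
swap_suffix() {
  local s=$1
  # greedy (.*) takes everything before the final "two", as in the sed answer
  local re='^(.*)two$'
  if [[ $s =~ $re ]]; then
    printf 'three%s\n' "${BASH_REMATCH[1]}"
  else
    printf '%s\n' "$s"
  fi
}

swap_suffix '{one}two'   # three{one}
swap_suffix '{two}two'   # three{two}
```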
Using sed:
s="{one}two"
sed 's/^\(.*\)two/three\1/' <<< "$s"
three{one}
echo "XXXtwo" | sed -E 's/(.*)two/three\1/'
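Here's a Bash-only solution (${string%two} removes the trailing "two"; ${string/two/} would remove the first "two" instead, which breaks inputs like {two}two):
string="{one}two"
echo "three${string%two}"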
With GNU awk (gensub() is a gawk extension):
awk '{a=gensub(/(.*)two/,"three\\1","g"); print a}' <<< "{one}two"
Output:
three{one}
awk '/{.*}two/ { split($0,s,"}"); print "three"s[1]"}" }' <<< "{one}two"
does also output
three{one}
Here, awk selects the matching lines and then splits them on "}", which means each line must not contain any "}" other than the one closing the field.
With GNU sed:
$ echo 'foo {one}two bar' | sed -r 's/(\{[^}]*\})two/three\1/g'
foo three{one} bar
Without -r (basic regular expressions):
$ echo 'foo {one}two bar' | sed 's/\({[^}]*}\)two/three\1/g'
foo three{one} bar

How to increment a variable caught with sed?

I have a sed script that would catch a phrase objID="x", where x can be any positive integer.
I would like to increment it by a constant value, say 100, throughout the whole file. How can I do that?
sed 's/objID="\(\d\)"/objID="\1"/g'
What should I change in it?
Try this:
With Perl:
$ echo 'objID="1"' |
perl -pe 's/(objID=")(\d+)(")/sprintf "%s%s%s", $1, $2+1, $3/ge'
objID="2"
With awk (the double quotes are the field separator, so they have to be printed back explicitly):
$ echo 'objID="1"' | awk -F'"' '/objID=/{print $1 "\"" ($2+1) "\"" $3}'
objID="2"
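As I commented, awk or perl would do the job more easily; however, if sed is a hard requirement, take a look at this example.
(GNU sed required: the e flag runs the resulting pattern space as a shell command, so the shell strips the double quotes, as the output shows, and the whole line must be safe to execute.)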
kent$ echo 'objID="7"'|sed -r 's/(objID=")([0-9]+)(")/echo \1$((100+\2))\3/ge'
objID=107
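The question asks for +100 everywhere in the file, while the awk answer only touches the first quoted value on a line. A sketch in plain POSIX awk (no GNU extensions) that adds a constant to every objID="N" on each line, using match(), RSTART and RLENGTH; bump_objid and the inc variable are hypothetical names:

```shell
# bump_objid (hypothetical name): add $1 (default 100) to every objID="N"
bump_objid() {
  awk -v inc="${1:-100}" '{
    out = ""
    # match() sets RSTART (start) and RLENGTH (length) of the leftmost match
    while (match($0, /objID="[0-9]+"/)) {
      # the digits sit after the 7 chars objID=" and before the closing quote
      num = substr($0, RSTART + 7, RLENGTH - 8)
      out = out substr($0, 1, RSTART - 1) "objID=\"" (num + inc) "\""
      # continue scanning after this match
      $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
  }'
}

printf '%s\n' '<item objID="7"/>' | bump_objid 100   # <item objID="107"/>
```

Run it as bump_objid 100 < file > newfile to process a whole file.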

Get specific string

I need to get a specific string from a bigger string:
From these Abcd1234_Tot9012_tore.dr or Abcd1234_Tot9012.tore.dr
I want to get the number between Tot and the following _ or ., so I should get 9012. Importantly, the number of characters before and after the number may vary.
Could anyone give me a nice solution for this? Thanks in advance!
This should also work if you are only looking for the number after Tot (GNU awk required: the three-argument form of match() is a gawk extension):
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012_tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012.tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
I know this is tagged as bash/sed but perl is clearer for this kind of task, in my opinion. In case you're interested:
perl -lne 'print $1 if /Tot([0-9]+)[._]/' input.txt
-n tells perl to loop the one-liner over each input line without printing anything by default, -l adds a newline to each print (otherwise the results run together), and -e supplies the code.
The regex is readable as: match Tot, followed by a number, followed by either a dot or an underscore; capture the number (that's what the parens are for). As it's the first/capture group it's assigned to the $1 variable, which then is printed.
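The same regular expression can be used without perl via Bash's [[ =~ ]] operator, where the capture group ends up in BASH_REMATCH[1]. A sketch; tot_number is a hypothetical helper name, and it prints nothing (and returns a non-zero status) when there is no match:

```shell
# tot_number (hypothetical name): print the digits between "Tot" and "_" or "."
tot_number() {
  local re='Tot([0-9]+)[._]'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

tot_number 'Abcd1234_Tot9012_tore.dr'   # 9012
tot_number 'Abcd1234_Tot9012.tore.dr'   # 9012
```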
Pure Bash:
string="Abcd1234_Tot9012_tore.dr" # or ".tore.dr"
string=${string##*_Tot}
string=${string%%[_.]*}
echo "$string"
Remove longest leading part ending with '_Tot'.
Remove longest trailing part beginning with '_' or '.'.
Result:
9012
awk
string="Abcd1234_Tot9012_tore.dr"
num=$(awk -F'Tot|[._]' '{print $3}' <<<"$string")
sed
string="Abcd1234_Tot9012_tore.dr"
num=$(sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string")
Example
$ string="Abcd1234_Tot9012_tore.dr"; awk -F'Tot|[._]' '{print $3}' <<<"$string"
9012
$ string="Abcd1234_Tot9013.tore.dr"; sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string"
9013
You can use perl one-liner:
perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
Test:
[jaypal:~/Temp] cat file
Abcd1234_Tot9012_tore.dr
Abcd1234_Tot9012.tore.dr
[jaypal:~/Temp] perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
9012
9012
Using grep you can do:
str=Abcd1234_Tot9012.tore.dr; grep -o "Tot[0-9]*" <<< $str|grep -o "[0-9]*$"
OUTPUT:
9012
This might work for you:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/Tot[^0-9]*\([0-9]*\)[_.].*/\n\1/;s/.*\n//'
9012
9012
This works equally as well:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/.*Tot\([0-9]*\).*/\1/'
9012
9012
