sed regexp in a bash script - bash

I want to extract a certain part of a string, if it exists. I'm interested in the XML filename, i.e. I want what's between an "_" and ".xml".
This is ok, it prints "555"
MYSTRING=`echo "/sdd/ee/publ/xmlfile_555.xml" | sed 's/^.*_\([0-9]*\).xml/\1/'`
echo "STRING = $MYSTRING"
This is not ok because it returns the whole string. In this case I don't want any result.
It prints "/sdd/ee/publ/xmlfile.xml"
MYSTRING=`echo "/sdd/ee/publ/xmlfile.xml" | sed 's/^.*_\([0-9]*\).xml/\1/'`
echo "STRING = $MYSTRING"
Any ideas how to get an "empty" result in the second case?
thanks!

You just need to tell sed to keep its mouth shut if it doesn't find a match. The -n option is used for that.
MYSTRING=`echo "/sdd/ee/publ/xmlfile_555.xml" | sed -n 's/^.*_\([0-9]*\)\.xml/\1/p'`
I only made two changes to what you had: the aforementioned -n option to sed, and the p flag that comes after the s/// command, which tells sed to print the output only if the substitution was successfully done.
EDIT: I've also escaped the final . as suggested in the comments.
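To check both cases from the question side by side, here is a minimal sketch that wraps the same sed call in a function. The name extract_id is made up for illustration, and the trailing $ anchor is an addition that is not in the answer above.

```shell
# extract_id is a hypothetical helper name, not part of the original answer.
# Prints the digits between the last "_" and ".xml", or nothing at all
# when the input does not have that shape.
extract_id() {
    printf '%s\n' "$1" | sed -n 's/^.*_\([0-9]*\)\.xml$/\1/p'
}

MYSTRING=$(extract_id "/sdd/ee/publ/xmlfile_555.xml")
echo "STRING = $MYSTRING"

MYSTRING=$(extract_id "/sdd/ee/publ/xmlfile.xml")
echo "STRING = $MYSTRING"
```

The first call should give "STRING = 555"; the second leaves MYSTRING empty because sed -n prints only when the substitution succeeds.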

Try this?
basename /sdd/ee/publ/xmlfile_555.xml | awk -F_ '{print $2}'
The output is 555.xml
With the other one.
basename /sdd/ee/publ/xmlfile.xml | awk -F_ '{print $2}'
The output is an empty string.

$ path=/sdd/ee/publ/xmlfile_555.xml
$ echo ${path##*/}
xmlfile_555.xml
$ path=${path##*/}
$ echo ${path%.xml}
xmlfile_555
$ path=${path%.xml}
$ echo ${path##*_}
555
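One thing to watch with the expansions above: if the name has no "_", ${path##*_} removes nothing and you get the whole name back, which is the same problem the question started with. A rough sketch that guards against it follows. The function name id_from_path is hypothetical, and unlike the sed answer it does not check that the extracted part is numeric.

```shell
# id_from_path is a hypothetical helper name used only for this sketch.
id_from_path() {
    local name=${1##*/}           # drop the directory part
    case $name in
        *_*.xml) ;;               # has an underscore and ends in .xml: go on
        *) return 0 ;;            # anything else: print nothing
    esac
    name=${name%.xml}             # drop the .xml suffix
    printf '%s\n' "${name##*_}"   # keep what follows the last underscore
}

id_from_path /sdd/ee/publ/xmlfile_555.xml
id_from_path /sdd/ee/publ/xmlfile.xml
```

Only the first call prints anything (555).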

Related

How to get all the group names in given subscription az cli [duplicate]

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
Use the fact that awk splits the lines in fields based on a field separator, that you can define. Hence, defining the field separator to / you can say:
awk -F "/" '{print $NF}' input
as NF refers to the number of fields of the current record, printing $NF means printing the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the @F autosplit array
Another option is to use bash parameter substitution.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
I know I'm about 5 years late, and thanks for all the proposals. I used to do it this way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to see there are better ways.
This should be a comment on the basename answer, but I don't have enough reputation points.
If you do not use double quotes, basename will not work on a path that contains a space character:
$ basename /home/foo/bar foo/bar.png
bar
It works with quotes " ":
$ basename "/home/foo/bar foo/bar.png"
bar.png
file example
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while read b ; do basename "$b" ; done < a
filename1
filename2
filename3
I know I'm like 3 years late on this but....
you should consider parameter expansion, it's built-in and faster.
if your input is in a var, let's say, $var1, just do ${var1##*/}. Look below
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
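A small difference from basename worth knowing: with a trailing slash, ${var1##*/} expands to an empty string, while basename still returns the last component. A minimal sketch that strips trailing slashes first is shown here. The function name last_component is made up, and a bare "/" is not handled specially (it yields an empty line).

```shell
# last_component is a hypothetical helper name for this sketch.
last_component() {
    local p=$1
    # Drop trailing slashes, but stop if only "/" is left.
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    printf '%s\n' "${p##*/}"
}

last_component '/home/parent/child1/filename'
last_component '/home/parent/child 1/'
```

This should print filename and then child 1.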
you can skip all of that complex regex :
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd to last :
mawk '$!--NF=$NF' FS='/'
child2
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-last :
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
child0
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
parent
major caveat :
- `gawk/nawk` has a slight discrepancy with `mawk` regarding
- how it tracks multiple,
- and potentially conflicting, decrements to `NF`,
- so other than the 1st solution regarding last field,
- the rest for now, are only applicable to `mawk-1/2`
just realized it's much much cleaner this way in mawk/gawk/nawk :
echo '/home/parent/child1/child2/filename' | …
'
awk ++NF FS='.+/' OFS= # updated such that
# root "/" still gets printed
'
filename
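If readability and portability matter more than brevity, the n-th field from the end can also be written with plain field arithmetic, which should behave the same in mawk, gawk and nawk. This is only a sketch; the function name nth_from_last and the variable n are made up for illustration.

```shell
# nth_from_last is a hypothetical helper: $1 is n (1 = last field),
# and the paths are read from stdin, one per line.
nth_from_last() {
    awk -F/ -v n="$1" 'NF >= n { print $(NF - n + 1) }'
}

echo '/home/parent/child1/child2/filename' | nth_from_last 1
echo '/home/parent/child1/child2/filename' | nth_from_last 2
echo '/home/parent/child1/child2/filename' | nth_from_last 4
```

For this input the three calls should print filename, child2 and parent. Lines with fewer than n fields are skipped.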
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'

Replace string by regex

I have a bunch of strings like "{one}two", where "{one}" could be different and "two" is always the same. I need to replace the original string with "three{one}", where "three" is also constant. It could easily be done with Python, for example, but I need it to be done with shell tools, like sed or awk.
If I understand correctly, you want:
{one}two --> three{one}
{two}two --> three{two}
{n}two --> three{n}
SED with a backreference will do that:
echo "{one}two" | sed 's/\(.*\)two$/three\1/'
The search stores all text up to your fixed string, and the replacement prepends your new string to the stored text. sed is greedy by default, so it should grab all text up to your fixed string even if there's some repeat in the variable part (e.g., {two}two will still remap to three{two} properly).
Using sed:
s="{one}two"
sed 's/^\(.*\)two/three\1/' <<< "$s"
three{one}
echo "XXXtwo" | sed -E 's/(.*)two/three\1/'
Here's a Bash only solution:
string="{one}two"
echo "three${string/two/}"
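A caveat on ${string/two/}: it removes the first occurrence of "two", so an input like {two}two comes out as three{}two. Stripping only the trailing "two" with ${string%two} avoids that. Here is a minimal sketch; the function name move_suffix is made up for illustration.

```shell
# move_suffix is a hypothetical helper name for this sketch.
move_suffix() {
    case $1 in
        *two) printf 'three%s\n' "${1%two}" ;;   # strip only the trailing "two"
        *)    printf '%s\n' "$1" ;;              # leave other input untouched
    esac
}

move_suffix "{one}two"
move_suffix "{two}two"
```

This should print three{one} and three{two}.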
awk '{a=gensub(/(.*)two/,"three\\1","g"); print a}' <<< "{one}two"
Output:
three{one}
awk '/{.*}two/ { split($0,s,"}"); print "three"s[1]"}" }' <<< "{one}two"
does also output
three{one}
Here, we are using awk to find the correct lines, and then split on "}" (which means your lines should not contain more than the one to indicate the field).
Through GNU sed,
$ echo 'foo {one}two bar' | sed -r 's/(\{[^}]*\})two/three\1/g'
foo three{one} bar
Basic sed,
$ echo 'foo {one}two bar' | sed 's/\({[^}]*}\)two/three\1/g'
foo three{one} bar

how to extract string appears after one particular string in Shell

I am working on a script where I am grepping lines that contains -abc_1.
I need to extract string that appear just after this string as follow :
option : -abc_1 <some_path>
I have used following code :
grep "abc_1" | awk -F " " {print $4}
This code is failing if there are more spaces used between string , e.g :
option   :   -abc_1      <some_path>
It will be helpful if I can extract the path somehow without bothering of spaces.
thanks
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
<some_path>
If you do not specify a field separator, awk uses one or more blanks as the separator.
PS you do not need both grep and awk
With sed you can do the search and the filter in one step:
sed -n 's/^.*abc_1 *\([^ ]*\).*$/\1/p'
The -n option suppresses printing, but the p command at the end still prints if a successful substitution was made.
perl -lne ' print $1 if(/-abc_1 (.*)/)' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
try this grep only way:
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
or if the white spaces were fixed:
grep -Po '^option : -abc_1 \K.*' file
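If the line is already in a shell variable, bash's own regex matching can do the same job without starting grep, sed or awk. This is a sketch that assumes bash (not plain sh); the function name value_after_flag is made up for illustration.

```shell
# value_after_flag is a hypothetical helper name for this sketch (bash only).
value_after_flag() {
    local re='-abc_1[[:space:]]+([^[:space:]]+)'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] holds the first capture group
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

value_after_flag 'option : -abc_1 <some_path>'
value_after_flag 'option    :    -abc_1      <some_path>'
```

Both calls should print <some_path>, whatever the amount of whitespace. A line without -abc_1 prints nothing.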

bash scripting removing optional <Integer><colon> prefix

I have a list with all of the content is like:
1:NetworkManager-0.9.9.0-28.git20131003.fc20.x86_64
avahi-0.6.31-21.fc20.x86_64
2:irqbalance-1.0.7-1.fc20.x86_64
abrt-addon-kerneloops-2.1.12-2.fc20.x86_64
mdadm-3.3-4.fc20.x86_64
I need to remove the N: but leave the rest of strings as is.
Have tried:
cat service-rpmu.list | sed -ne "s/#[#:]\+://p" > end.list
cat service-rpmu.list | egrep -o '#[#:]+' > end.list
both result in an empty end.list
//* the N:, just denotes an epoch version */
With sed:
sed 's/^[0-9]\+://' your.file
Output:
NetworkManager-0.9.9.0-28.git20131003.fc20.x86_64
avahi-0.6.31-21.fc20.x86_64
irqbalance-1.0.7-1.fc20.x86_64
abrt-addon-kerneloops-2.1.12-2.fc20.x86_64
mdadm-3.3-4.fc20.x86_64
Btw, your list looks like the output of a grep command with the option -n. If this is true, then omit the -n option there. Also it is likely that your whole task can be done with a single sed command.
awk -F: '{ sub(/^.*:/,""); print}' sample
Here is another way with awk:
awk -F: '{print $NF}' service-rpmu.list
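Note that splitting on ":" and printing $NF keeps only the text after the last colon, so it would also cut a line that has a colon somewhere else. If that matters, a pure bash loop that strips only a leading number plus colon might look like this sketch. It assumes bash, and the function name strip_epoch is made up for illustration.

```shell
# strip_epoch is a hypothetical helper name for this sketch (bash only).
# Reads lines on stdin and removes a leading "<digits>:" if there is one.
strip_epoch() {
    local line re='^[0-9]+:(.*)$'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        else
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' '1:NetworkManager-0.9.9.0-28.git20131003.fc20.x86_64' \
              'avahi-0.6.31-21.fc20.x86_64' | strip_epoch
```

With the real file it would be called as strip_epoch < service-rpmu.list > end.list.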

Get specific string

I need to get a specific string from a bigger string:
From these Abcd1234_Tot9012_tore.dr or Abcd1234_Tot9012.tore.dr
I want to get those numbers which are between Tot and _ or . , so I should get 9012. Important thing is that the number of characters before and after these numbers may vary.
Could anyone give me a nice solution for this? Thanks in advance!
This should also work if you are looking only for numbers after Tot
[srikanth#myhost ~]$ echo "Abcd1234_Tot9012_tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
[srikanth#myhost ~]$ echo "Abcd1234_Tot9012.tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
I know this is tagged as bash/sed but perl is clearer for this kind of task, in my opinion. In case you're interested:
perl -ne 'print $1 if /Tot([0-9]+)[._]/' input.txt
-ne tells perl to loop the specified one-liner over the input file without printing anything by default.
The regex is readable as: match Tot, followed by a number, followed by either a dot or an underscore; capture the number (that's what the parens are for). As it's the first/capture group it's assigned to the $1 variable, which then is printed.
Pure Bash:
string="Abcd1234_Tot9012_tore.dr" # or ".tore.dr"
string=${string##*_Tot}
string=${string%%[_.]*}
echo "$string"
Remove longest leading part ending with '_Tot'.
Remove longest trailing part beginning with '_' or '.'.
Result:
9012
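One limitation of the two expansions above: when the input has no "_Tot" at all, nothing is stripped by the first step, so you get part of the original string back instead of an empty result. A sketch using bash's regex matching, which prints nothing in that case, is shown below. It assumes bash, and the function name tot_number is made up for illustration.

```shell
# tot_number is a hypothetical helper name for this sketch (bash only).
tot_number() {
    local re='Tot([0-9]+)[._]'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] holds the digits captured after "Tot"
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

tot_number "Abcd1234_Tot9012_tore.dr"
tot_number "Abcd1234_Tot9012.tore.dr"
```

Both calls should print 9012.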
awk
string="Abcd1234_Tot9012_tore.dr"
num=$(awk -F'Tot|[._]' '{print $3}' <<<"$string")
sed
string="Abcd1234_Tot9012_tore.dr"
num=$(sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string")
Example
$ string="Abcd1234_Tot9012_tore.dr"; awk -F'Tot|[._]' '{print $3}' <<<"$string"
9012
$ string="Abcd1234_Tot9013.tore.dr"; sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string"
9013
You can use perl one-liner:
perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
Test:
[jaypal:~/Temp] cat file
Abcd1234_Tot9012_tore.dr
Abcd1234_Tot9012.tore.dr
[jaypal:~/Temp] perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
9012
9012
Using grep you can do:
str=Abcd1234_Tot9012.tore.dr; grep -o "Tot[0-9]*" <<< $str|grep -o "[0-9]*$"
OUTPUT:
9012
This might work for you:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/Tot[^0-9]*\([0-9]*\)[_.].*/\n\1/;s/.*\n//'
9012
9012
This works equally well:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/.*Tot\([0-9]*\).*/\1/'
9012
9012

Resources