Extract all ip addresses with sed and awk from a string - bash

It is simple to extract all IP addresses from a string with grep.
string="221.11.165.237xxxx221.11.165.233\n
219.158.9.97ttttt219.158.19.137"
echo $string |grep -oP "(\d+\.){3}\d+"
221.11.165.237
221.11.165.233
219.158.9.97
219.158.19.137
The regex pattern is simple: (\d+\.){3}\d+.
I want to do the same job with sed and awk.
For sed:
echo $string | sed 's/^\(\(\d\+\.\)\{3\}\d\+\)$/\1/g'
221.11.165.237xxxx221.11.165.233\n 219.158.9.97ttttt219.158.19.137
For awk:
echo $string |gawk 'match($0,/(\d+\.){3}\d+/,k){print k}'
echo $string |awk '/(\d+\.){3}\d+/{print}'
How can I fix this for sed and gawk (awk)?
The expected output is the same as with grep:
221.11.165.237
221.11.165.233
219.158.9.97
219.158.19.137

Very few tools will recognize \d as meaning digits. Just use [0-9] or [[:digit:]] instead:
$ echo "$string" | awk -v RS='([0-9]+\\.){3}[0-9]+' 'RT{print RT}'
221.11.165.237
221.11.165.233
219.158.9.97
219.158.19.137
The above uses GNU awk for multi-char RS and RT. With any awk:
$ echo "$string" | awk '{while ( match($0,/([0-9]+\.){3}[0-9]+/) ) { print substr($0,RSTART,RLENGTH); $0=substr($0,RSTART+RLENGTH) } }'
221.11.165.237
221.11.165.233
219.158.9.97
219.158.19.137
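As a rough sketch, the portable loop above can be wrapped in a shell function and run against the sample input. The function name extract_ips is made up for illustration, and the pattern spells out the four groups instead of using the {3} interval, on the assumption that some older awk builds do not support interval expressions.

```shell
# Sample input from the question, with a real newline between the two lines.
string='221.11.165.237xxxx221.11.165.233
219.158.9.97ttttt219.158.19.137'

# extract_ips is a hypothetical helper name, not something from the thread.
extract_ips() {
    awk '{
        # Peel off one dotted quad at a time until none remain on the line.
        while (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            print substr($0, RSTART, RLENGTH)
            $0 = substr($0, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' "$string" | extract_ips
```

With the sample string this prints the same four addresses as the grep command in the question.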

This might work for you (GNU sed):
sed '/\n/!s/[0-9.]\+/\n&\n/;/^\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}\n/P;D' file
Insert newlines either side of strings consisting only of numbers and periods. If a line contains only an IP address print it.
An easier-on-the-eye rendition uses the -r option:
sed -r '/\n/!s/[0-9.]+/\n&\n/;/^([0-9]{1,3}\.){3}[0-9]{1,3}\n/P;D' <<<"$string"

As you weren't specific about what could appear between the IP addresses, I assumed that only digits and periods occur in an IP address and treated everything else as a separator:
echo "$string" | sed -r 's/[^0-9.]+/\n/g'
echo "$string" | awk '1' RS="[^0-9.]+"
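The same idea, treating every character other than a digit or a period as a separator, can also be sketched with tr, which does not depend on GNU extensions. split_on_non_ip_chars is a hypothetical helper name, and like the commands above it does not check that each piece is really a dotted quad.

```shell
# Sample input from the question, with a real newline between the two lines.
string='221.11.165.237xxxx221.11.165.233
219.158.9.97ttttt219.158.19.137'

# split_on_non_ip_chars is a hypothetical helper name.
split_on_non_ip_chars() {
    # -c complements the set (everything except digits and periods),
    # -s squeezes each run of replaced characters into a single newline.
    tr -cs '0-9.' '\n'
}

printf '%s\n' "$string" | split_on_non_ip_chars
```

Each candidate address ends up on its own line, so the result can be piped into a stricter filter if validation is needed.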

Related

Get only numbers in output

I need to get only numbers from this:
release/M_0.1.0
So I need to extract the following with bash:
0.1.0.
I have tried this but cannot finish it:
echo "release/M_0.1.0" | awk -F'/' '{print $2}'
And what if the string looks like relea234se/sdf23_4Mm0.1.0.8? How do I get only 0.1.0.8? Note that the version can be any dotted digits, such as 0.2 or 1.9.1.
Please check whether this grep command works:
echo "release/M_0.1.0" | grep -Eo '[0-9.]+'
You could also use parameter expansion to remove everything up through the last character that isn't a digit or a dot.
$: ver() { echo "${1//*[^.0-9]/}"; }
$: ver release/M_0.1.0
0.1.0
$: ver relea234se/sdf23_4Mm0.1.0.8
0.1.0.8
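A variant of the same idea, sketched with the ## operator (remove the longest matching prefix), which is also available in plain POSIX sh. The name ver_suffix is hypothetical, and the sketch assumes the version is always the trailing run of digits and dots.

```shell
# ver_suffix is a hypothetical name for illustration.
ver_suffix() {
    # ## removes the longest prefix matching the pattern; the pattern ends at
    # the last character that is neither a digit nor a dot.
    printf '%s\n' "${1##*[!0-9.]}"
}

ver_suffix release/M_0.1.0              # prints 0.1.0
ver_suffix relea234se/sdf23_4Mm0.1.0.8  # prints 0.1.0.8
```

If the argument contains only digits and dots, the pattern does not match and the string is printed unchanged.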
With sed you can do:
echo "release/M_0.1.0" | sed 's#.*_##'
Output:
0.1.0
Assuming your input always looks like the samples shown:
echo "$var" | awk -F'_' '{print $2}'
OR could use sub:
echo "$var" | awk '{sub(/.*_/,"")} 1'
With simple bash you could use:
echo "${var#*_}"
echo release/M_0.1.0 | awk -F\_ '{print $2}'
0.1.0
Take your pick:
$ var='relea234se/sdf23_4Mm0.1.0.8'
$ [[ $var =~ .*[^0-9.](.*) ]] && echo "${BASH_REMATCH[1]}"
0.1.0.8
$ echo "$var" | sed 's/.*[^0-9.]//'
0.1.0.8
$ echo "$var" | awk -F'[^0-9.]' '{print $NF}'
0.1.0.8
If the data is in a file named d (tried with GNU sed):
sed -E 's/^relea.*[^0-9.]([0-9][0-9.]*)$/\1/' d

How to group every 2 columns together and change the separator between them?

Input:
echo "1021,fra,1022,eng,1023,qad" | sed or awk ...
Expected output:
1021-fra,1022-eng,1023-qad
echo "1021,fra,1022,eng,1023,qad" |sed 's/\([^,][^,]*\),\([^,][^,]*\)/\1-\2/g'
1021-fra,1022-eng,1023-qad
With GNU sed:
echo "1021,fra,1022,eng,1023,qad" |sed -r 's/([^,]+),([^,]+)/\1-\2/g'
Here's one way to do it, with a little cheat:
echo "1021,fra,1022,eng,1023,qad" | sed -e 's/,\([a-z]\)/-\1/g'
That is, replace every comma followed by a letter with a hyphen followed by that letter.
In case it helps, here's another version cheating a bit differently:
echo "1021,fra,1022,eng,1023,qad" | sed -e 's/\([0-9]\),/\1-/g'
That is, replace every digit followed by a comma with that digit and a hyphen.
Here is an awk version:
echo "1021,fra,1022,eng,1023,qad" | awk -F, '{for (i=1;i<NF;i++) printf "%s%s",$i,(i%2?"-":",");print $NF}'
1021-fra,1022-eng,1023-qad
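For readability, the awk loop above can be spread over several lines and wrapped in a function. This is only a sketch; pair_fields is a hypothetical name, and the second call shows what happens when the number of fields is odd.

```shell
# pair_fields is a hypothetical name for the awk loop shown above.
pair_fields() {
    awk -F, '{
        for (i = 1; i < NF; i++) {
            # After an odd-numbered field print "-", after an even one print ",".
            printf "%s%s", $i, (i % 2 ? "-" : ",")
        }
        print $NF   # last field, followed by a newline
    }'
}

echo "1021,fra,1022,eng,1023,qad" | pair_fields   # prints 1021-fra,1022-eng,1023-qad
echo "1021,fra,1022" | pair_fields                # prints 1021-fra,1022
```

With an odd field count the last field simply stays unpaired.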

Remove a word from a string bash

I have the string
file="this-is-a-{test}file"
I want to remove {test} from this string.
I used
echo $file | sed 's/[{][^}]*//'
but this returned me
this-is-a-}file
How can I remove } too?
Thanks
Also try this bash-only one-liner as an alternative:
s="this-is-a-{test}file"
echo ${s/\{test\}/}
You can use sed with the correct regex:
s="this-is-a-{test}file"
sed 's/{[^}]*}//' <<< "$s"
this-is-a-file
Or this awk:
awk -F '{[^}]*}' '{print $1 $2}' <<< "$s"
this-is-a-file
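If the word inside the braces is not known in advance, the parameter-expansion approach can be generalized. The following is a minimal sketch that uses only prefix and suffix removal, so it should also run in plain POSIX sh. strip_braced is a hypothetical helper name, and it removes only the first {...} group.

```shell
# strip_braced is a hypothetical helper; it removes the first {...} group.
strip_braced() {
    open='{'
    close='}'
    case $1 in
        *"$open"*"$close"*)
            prefix=${1%%"$open"*}      # text before the first "{"
            rest=${1#"$prefix"}        # text from the first "{" onward
            suffix=${rest#*"$close"}   # text after the "}" that follows it
            printf '%s%s\n' "$prefix" "$suffix"
            ;;
        *)
            printf '%s\n' "$1"         # no braced group: leave the string alone
            ;;
    esac
}

strip_braced "this-is-a-{test}file"   # prints this-is-a-file
```

Keeping the braces in variables avoids having to escape them inside the expansions.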

Use Awk to extract substring

Given a hostname in the format aaa0.bbb.ccc, I want to extract the first substring before the ., that is, aaa0 in this case. I use the following awk script to do so:
echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'
Running the script on machine A produces aaa0, while running it on machine B produces only aaa, without the trailing 0. Both machines run Ubuntu/Linaro, but A has a newer awk (gawk version 3.1.8) while B has an older one (mawk version 1.2).
I am asking in general, how to write a compatible awk script that performs the same functionality ...
You just want to set the field separator as . using the -F option and print the first field:
$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0
Same thing but using cut:
$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0
Or with sed:
$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0
Even grep:
$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0
Or just use cut:
echo aaa0.bbb.ccc | cut -d'.' -f1
I am asking in general, how to write a compatible awk script that
performs the same functionality ...
Solving the problem in your question is easy (check the other answers).
Writing an awk script that is portable to every awk implementation and version (gawk/nawk/mawk...) is really hard, even with gawk's --posix option.
For example:
some awks operate on strings in terms of characters, some in terms of bytes
some support the \x escape, some do not
FS is interpreted differently
restrictions on keywords/reserved words and their abbreviations differ
some operators are restricted, e.g. **
even within the same implementation (gawk, for example), versions 4.0 and 3.x differ
the implementation of certain functions also differs (your problem is one example, see below)
All the points above are general remarks. Back to your problem: it only involves fundamental awk features, and a line like awk '{print $x}' works in every awk.
There are two reasons why your awk line behaves differently on gawk and mawk:
You used the substr() function incorrectly. This is the main cause. You have substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. awk string indexes, the arrays filled by split(), etc. are 1-based.
gawk and mawk implement substr() differently, so they disagree on what a start index of 0 means.
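The first point can be checked with a small sketch: the asker's script with the start index changed from 0 to 1. first_label is a hypothetical function name; everything else comes from the question.

```shell
# first_label is a hypothetical name; the awk body is the asker's script
# with the substr() start index corrected to 1.
first_label() {
    awk '{
        # awk strings are indexed from 1, so the substring starts at 1
        # and runs up to the character just before the first dot.
        if (match($0, /\./)) {
            print substr($0, 1, RSTART - 1)
        }
    }'
}

echo aaa0.bbb.ccc | first_label   # prints aaa0
```

Because the start index is now valid, the result no longer depends on how a particular awk treats an index of 0.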
You don't need awk for this...
echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc
echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc
x=aaa0.bbb.ccc; echo ${x/.*/}
Heavier options:
sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc
You do not need any external command at all, just use Parameter Expansion in bash:
hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
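For reference, here is a short sketch of how the four prefix and suffix removal operators behave on the same value. All four are standard parameter expansions, and the variable name is the one used above.

```shell
hostname=aaa0.bbb.ccc

echo "${hostname%%.*}"   # longest suffix matching ".*" removed  -> aaa0
echo "${hostname%.*}"    # shortest suffix matching ".*" removed -> aaa0.bbb
echo "${hostname#*.}"    # shortest prefix matching "*." removed -> bbb.ccc
echo "${hostname##*.}"   # longest prefix matching "*." removed  -> ccc
```

The doubled operator is the one needed here, because the hostname contains more than one dot.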
If you don't want to change the input field separator, you can use the split function:
echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'
documentation:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep
and store the pieces in array and the separator
strings in the seps array.
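A small sketch of the three-argument form, which does not need the optional seps array from the signature above. host_parts is a hypothetical name; the point is that split() returns the number of pieces and numbers them from 1.

```shell
# host_parts is a hypothetical name for illustration.
host_parts() {
    awk '{
        # split() returns the number of pieces; elements are numbered from 1.
        n = split($2, parts, ".")
        print n, parts[1], parts[n]
    }'
}

echo "some aaa0.bbb.ccc text" | host_parts   # prints 3 aaa0 ccc
```

The field separator used for $1, $2, ... is left untouched, so the rest of the script can keep splitting on whitespace.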
awk is still the cleanest approach:
mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc
aaa0
If there's stuff before or after:
mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]' <<< 'some aaa0.bbb.ccc text'
aaa0

Replace blank spaces with commas in a string in bash shell

I would like to replace blank spaces/white spaces in a string with commas.
STR1=This is a string
to
STR1=This,is,a,string
Without using external tools:
echo ${STR1// /,}
Demo:
$ STR1="This is a string"
$ echo ${STR1// /,}
This,is,a,string
See bash: Manipulating strings.
Just use sed:
echo $STR1 | sed 's/ /,/g'
or the pure bash way:
echo ${STR1// /,}
kent$ echo "STR1=This is a string"|awk -v OFS="," '$1=$1'
STR1=This,is,a,string
Note:
Consecutive blanks are replaced with a single comma, because awk's default field splitting treats a run of blanks as one separator.
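The difference is easiest to see on input that really contains consecutive blanks. The sketch below compares the awk approach with tr on a made-up sample line. The awk program is written as an action rather than the bare pattern '$1=$1', since a bare pattern prints nothing when the first field evaluates as false (for example 0).

```shell
# Made-up sample with two and then three consecutive spaces.
line='This  is   a string'

# Default field splitting treats each run of blanks as one separator,
# so rebuilding the record with OFS="," yields single commas.
printf '%s\n' "$line" | awk -v OFS=, '{ $1 = $1; print }'

# tr maps characters one to one, so every space becomes its own comma.
printf '%s\n' "$line" | tr ' ' ','
```

The first command prints This,is,a,string, while the second keeps one comma per original space.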
This might work for you:
echo 'STR1=This is a string' | sed 'y/ /,/'
STR1=This,is,a,string
or:
echo 'STR1=This is a string' | tr ' ' ','
STR1=This,is,a,string
How about
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]]/,/g')"
echo "$StrFix"
Output:
This,is,a,string
If you have multiple adjacent spaces in your string and want to reduce them to just 1 comma, then change the sed to
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]][[:space:]]*/,/g')"
echo "$StrFix"
Output:
This,is,a,string
I'm using a non-standard sed, and so have used [[:space:]][[:space:]]* to indicate one or more "white-space" characters (including tabs, VT, maybe a few others). In a modern sed, I would expect [[:space:]]+ to work as well, provided extended regular expressions are enabled (for example with sed -E).
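A quick sketch comparing the two spellings on input that mixes spaces and a tab. The function names are hypothetical, and the -E variant assumes a sed that accepts that option (GNU and BSD sed do).

```shell
# Both function names are hypothetical.
squeeze_to_commas_bre() {
    # Basic regular expression: one whitespace character followed by any more.
    sed 's/[[:space:]][[:space:]]*/,/g'
}

squeeze_to_commas_ere() {
    # Extended regular expression: + means "one or more".
    sed -E 's/[[:space:]]+/,/g'
}

printf 'This  is\ta   string\n' | squeeze_to_commas_bre   # prints This,is,a,string
printf 'This  is\ta   string\n' | squeeze_to_commas_ere   # prints This,is,a,string
```

Both forms collapse each run of whitespace into a single comma; the basic-regex form is the safer choice when the sed in use is unknown.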
STR1=`echo $STR1 | sed 's/ /,/g'`
