Replacing Nth Occurrence from End of Line via Sed - bash

Do you know of a better or easier way of using sed or Bash to replace from the last occurrence of a regex match from the end of a line without using rev?
Here is the rev way to match the third from last occurrence of the letter 's' – still forward matching, yet utilizing rev to match from the last character.
echo "a declaration of sovereignty need no witness" | rev | sed 's/s/S/3' | rev
a declaration of Sovereignty need no witness
Update -- a generalized solution based on Cyruses answer:
SEARCH="one"
OCCURRENCE=3
REPLACE="FOUR"
SED_DELIM=$'\001'
SEARCHNEG=$(sed 's/./[^&]*/g' <<< "${SEARCH}")
sed -r "s${SED_DELIM}${SEARCH}((${SEARCHNEG}${SEARCH}){$((${OCCURRENCE}-1))}${SEARCHNEG})\$${SED_DELIM}${REPLACE}\1${SED_DELIM}" <<< "one one two two one one three three one one"
Note: To be truly generic it should escape regex items from the LHS.

With GNU sed:
sed -r 's/s(([^s]*s){2}[^s]*)$/S\1/' file
Output:
a declaration of Sovereignty need no witness
See: The Stack Overflow Regular Expressions FAQ

Using awk:
str='a declaration of sovereignty need no witness'
awk -v r='S' -v n=3 'BEGIN{FS=OFS="s"} {
for (i=1; i<=NF; i++) printf "%s%s", $i, (i<NF)?(i==NF-n?r:OFS):ORS}' <<< "$str"
Output:
a declaration of Sovereignty need no witness

You first need to count the number of occurances and figure out which one to replace:
echo "importantword1 importantword2 importantword3 importantword4 importantword5 importantword6" |
grep -o "importantword" | wc -l
Use that for your sed string
echo "importantword1 importantword2 importantword3 importantword4 importantword5 importantword6" |
sed 's/\(\(.*importantword.*\)\{3\}\)importantword\(\(.*importantword.*\)\{2\}\)/\1ExportAntword\3/'
importantword1 importantword2 importantword3 ExportAntword4 importantword5 importantword6
You get the same answer with setting variables for left and right:
l=3
r=2
echo "importantword1 importantword2 importantword3 importantword4 importantword5 importantword6" |
sed 's/\(\(.*importantword.*\)\{'$l'\}\)importantword\(\(.*importantword.*\)\{'$r'\}\)/\1ExportAntword\3/'
or
str="\(.*importantword.*\)"
echo "importantword1 importantword2 importantword3 importantword4 importantword5 importantword6" |
sed 's/\('${str}'\{'$l'\}\)importantword\('${str}'\{'$r'\}\)/\1ExportAntword\3/'

In Perl, you can use a look-ahead assertion:
echo "a declaration of sovereignty need no witness" \
| perl -pe 's/s(?=(?:[^s]*s[^s]*){2}$)/S/'
(?=...) is the look-ahead assertion, it means "if followed by ...", but the ... part is not matched and therefore not substituted
[^s]*s[^s]* means there's only one s wrapped possibly by non-ses
{2} is the number of repetitions, i.e. we want two s'es after the one we want to replace

Related

Use sed to transform a comma space seperated list into a comma seperated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I dont quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to get the first and last part of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add double quotes if you have a single line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
See the online demo
If you have multiple lines use
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
See this online demo
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.

sed pattern parts as input for other bash function

I'm trying to replace floating-point numbers like 1.2e + 3 with their integer value 1200. For this I use sed in the following way:
echo '"1.2e+04"' | sed "s/\"\([0-9]\+\.[0-9]\+\)e+\([0-9]\+\)\"/$(echo \1*10^\2|bc -l)/"
but the pattern parts \1 and \2 doesn't get evaluated in the echo.
Is there a way to solve this problem with sed?
Thanks in advance
Within the double quotes, \1 and \2 are interpreted as literal 1 and 2.
You need to put additional backslashes to escape them. In addition, $(command substitution) in
sed replacement seems not to work when combined with back references.
If you are using GNU sed, you can instead say something like:
echo '"1.2e+04"' | sed "s/\"\([0-9]\+\.[0-9]\+\)e+\([0-9]\+\)\"/echo \"\\1*10^\\2\"|bc -l/;e"
which yields:
12000.0
If you want to chop off the decimal point, you'll know what to do ;-).
If you are happy with awk command like this can do the work:
echo 1.2e+4|awk '{printf "%d",$0}'
It is perhaps better to use perl (or other typed language) to manage the variable types:
echo '"1.2e+04"' | perl -lane 'my $a=$_;$a=~ s/"//g;print sprintf("%.10g",$a);print $a;'
In any case, your sed expression is incorrect, it should be:
echo '"1.2e+04"' | sed "s/\"\([0-9]\+\.[0-9]\+\)e+\([0-9]\+\)\"/$(echo \1*10^\3 + \2*10^$(echo \3 - 1 | bc -l)|bc -l)/"
The best way to solve the problem properly is to use an advanced combination of # tshiono and # Romeo solutions:
sed "s/\(.*\)\([0-9]\+\.[0-9]\+e+[0-9]\+\)\(.*\)/printf '\1'\; echo \2 |awk '{printf \"%d\",\$0}'\;printf '\3'\;/e"
So it is possible to convert all such floats into arbitrary contexts.
for example:
echo '"1.2e+04"' | sed "s/\(.*\)\([0-9]\+\.[0-9]\+e+[0-9]\+\)\(.*\)/printf '\1'\; echo \2 |awk '{printf \"%d\",\$0}'\;printf '\3'\;/e"
outputs
"12000"
and
echo 'abc"1.2e+04"def' | sed "s/\(.*\)\([0-9]\+\.[0-9]\+e+[0-9]\+\)\(.*\)/printf '\1'\; echo \2 |awk '{printf \"%d\",\$0}'\;printf '\3'\;/e"
outputs
abc"12000"def

How to remove all but the last 3 parts of FQDN?

I have a list of IP lookups and I wish to remove all but the last 3 parts, so:
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
would become
163data.com.cn
I have spent hours searching for clues, including parameter substitution, but the closest I got was:
$ string="98.254.237.114.broad.lyg.js.dynamic.163data.com.cn"
$ string1=${string%.*.*.*}
$ echo $string1
Which gives me the inverted answer of:
98.254.237.114.broad.lyg.js.dynamic
which is everything but the last 3 parts.
A script to do a list would be better than just the static example I have here.
Using CentOS 6, I don't mind if it by using sed, cut, awk, whatever.
Any help appreciated.
Thanks, now that I have working answers, may I ask as a follow up to then process the resulting list and if the last part (after last '.') is 3 characters - eg .com .net etc, then to just keep the last 2 parts.
If this is against protocol, please advise how to do a follow up question.
if parameter expansion inside another parameter expansion is supported, you can use this:
$ s='98.254.237.114.broad.lyg.js.dynamic.163data.com.cn'
$ # removing last three fields
$ echo "${s%.*.*.*}"
98.254.237.114.broad.lyg.js.dynamic
$ # pass output of ${s%.*.*.*} plus the extra . to be removed
$ echo "${s#${s%.*.*.*}.}"
163data.com.cn
can also reverse the line, get required fields and then reverse again.. this makes it easier to use change numbers
$ echo "$s" | rev | cut -d. -f1-3 | rev
163data.com.cn
$ echo "$s" | rev | cut -d. -f1-4 | rev
dynamic.163data.com.cn
$ # and easy to use with file input
$ cat ip.txt
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
foo.bar.123.baz.xyz
a.b.c.d.e.f
$ rev ip.txt | cut -d. -f1-3 | rev
163data.com.cn
123.baz.xyz
d.e.f
echo $string | awk -F. '{ if (NF == 2) { print $0 } else { print $(NF-2)"."$(NF-1)"."$NF } }'
NF signifies the total number of field separated by "." and so we want the last piece (NF), last but 1 (NF-1) and last but 2 (NF-2)
$ echo $string | awk -F'.' '{printf "%s.%s.%s\n",$(NF-2),$(NF-1),$NF}'
163data.com.cn
Brief explanation,
Set the field separator to .
Print only last 3 field using the awk parameter $(NF-2), $(NF-1),and $NF.
And there's also another option you may try,
$ echo $string | awk -v FPAT='[^.]+.[^.]+.[^.]+$' '{print $NF}'
163data.com.cn
It sounds like this is what you need:
awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
e.g.
$ echo "$string" | awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
163data.com.cn
but with just 1 sample input/output it's just a guess.
wrt your followup question, this might be what you're asking for:
$ echo "$string" | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
163data.com.cn
$ echo 'www.google.com' | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
google.com
Version which uses only bash:
echo $(expr "$string" : '.*\.\(.*\..*\..*\)')
To use it with a file you can iterate with xargs:
File:
head list.dat
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
98.254.34.56.broad.kkk.76onepi.co.cn
98.254.237.114.polst.a65dal.com.cn
iterating the whole file:
cat list.dat | xargs -I^ -L1 expr "^" : '.*\.\(.*\..*\..*\)'
Notice: it won't be very efficient in large scale, so you need to consider by your own whether it is good enough for you.
Regexp explanation:
.* \. \( .* \. .* \. .* \)
\___| | | | |
| \------------------------/> brakets shows which part we extract
| | |
| \-------/> the \. indicates the dots to separate specific number of words
|
|
-> the rest and the final dot which we are not interested in (out of brakets)
details:
http://tldp.org/LDP/abs/html/string-manipulation.html -> Substring Extraction

Replace pipe character "|" with escaped pip character "\|" in string in bash script

I am trying to replace a pipe character in an String with the escaped character in it:
Input: "text|jdbc"
Output: "text\|jdbc"
I tried different things with tr:
echo "text|jdbc" | tr "|" "\\|"
...
But none of them worked.
Any help would be appreciated.
Thank you,
tr is good for one-to-one mapping of characters (read "translate").
\| is two characters, you cannot use tr for this. You can use sed:
echo 'text|jdbc' | sed -e 's/|/\\|/'
This example replaces one |. If you want to replace multiple, add the g flag:
echo 'text|jdbc' | sed -e 's/|/\\|/g'
An interesting tip by #JuanTomas is to use a different separator character for better readability, for example:
echo 'text|jdbc' | sed -e 's_|_\\|_g'
You can take advantage of the fact that | is a special character in bash, which means the %q modifier used by printf will escape it for you:
$ printf '%q\n' "text|jdbc"
text\|jdbc
A more general solution that doesn't require | to be treated specially is
$ f="text|jdbc"
$ echo "${f//|/\\|}"
text\|jdbc
${f//foo/bar} expands f and replaces every occurance of foo with bar. The operator here is /; when followed by another /, it replaces all occurrences of the search pattern instead of just the first one. For example:
$ f="text|jdbc|two"
$ echo "${f/|/\\|}"
text\|jdbc|two
$ echo "${f//|/\\|}"
text\|jdbc\|two
You can try with awk:
echo "text|jdbc" | awk -F'|' '$1=$1' OFS="\\\|"

Bash index of first character not given

So basically something like expr index '0123 some string' '012345789' but reversed.
I want to find the index of the first character that is not one of the given characters...
I'd rather not use RegEx, if it is possible...
You can remove chars with tr and pick the first from what is left
left=$(tr -d "012345789" <<< "0123_some string"); echo ${left:0:1}
_
once you have the char to find the index follow the same
expr index "0123_some string" ${left:0:1}
5
Using gnu awk and FPAT you can do this:
str="0123 some string"
awk -v FPAT='[012345789]+' '{print length($1)}' <<< "$str"
4
awk -v FPAT='[02345789]+' '{print length($1)}' <<< "$str"
1
awk -v FPAT='[01345789]+' '{print length($1)}' <<< "$str"
2
awk -v FPAT='[0123 ]+' '{print length($1)}' <<< "$str"
5
I know this is in Perl but I got to say that I like it:
$ perl -pe '$i++while s/^\d//;$_=$i' <<< '0123 some string'
4
In case of 1-based index you can use $. which is initialized at 1 when dealing with single lines:
$ perl -pe '$.++while s/^\d//;$_=$.' <<< '0123 some string'
5
I'm using \d because I assume that you by mistake left out the number 6 from the list 012345789
Index is currently pointing to the space:
0123 some string
^ this space
Even if shell globing might look similar, it is not a regex.
It could be done in two steps: cut the string, count characters (length).
#!/bin/dash
a="$1" ### string to process
b='0-9' ### range of characters not desired.
c=${a%%[!$b]*} ### cut the string at the first (not) "$b".
echo "${#c}" ### Print the value of the position index (from 0).
It is written to work on many shells (including bash, of course).
Use as:
$ script.sh "0123_some string"
4
$ script.sh "012s3_some string"
3

Resources