String manipulation via script - bash

I am trying to get a substring between &DEST= and the next & or a line break.
For example :
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
In this I need to extract "SFO"
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
In this I need to extract "SANFRANSISCO"
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
In this I need to extract "SANJOSE"
I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification of the text is to mask the dest value with X character.
So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.
Output :
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Please let me know how to achieve this in script (Preferably shell or bash script).
Thanks.

$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=PORTORICA
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
$ sed -E 's/^.*&DEST=([^&]*)[&]*.*$/\1/' file
SFO
PORTORICA
SANFRANSISCO
SANJOSE
should do it

Replacing airports with an equal number of Xs
Let's consider this test file:
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
To replace the strings after &DEST= with an equal length of X and using GNU sed:
$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
To replace the file in-place:
sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
The above was tested with GNU sed. For BSD (OSX) sed, try:
sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
Or, to change in-place with BSD(OSX) sed, try:
sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
If there is some reason why it is important to use the shell to read the file line-by-line:
while IFS= read -r line
do
echo "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
How it works
Let's consider this code:
search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
-E
This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.
:a
This creates a label a.
s/('"$search_str"'X*)[^X&]/\1X/
This looks for $search_str followed by any number of X followed by any character that is not X or &. Because of the parens, everything except that last character is saved into group 1. This string is replaced by group 1, denoted \1 and an X.
ta
In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.
This test-and-jump causes the substitution to be repeated as many times as necessary.
Replacing multiple tags with one sed command
$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Answer for original question
Using shell
$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo ${s%%&*}
SFO
How it works:
${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.
${s%%&*} is suffix removal_. It removes all text from the first & to the end of the string.
Using awk
$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO
How it works:
-F'[=\n]'
This tells awk to treat either an equal sign or a newline as the field separator
$1=="DEST"{print $2}
If the first field is DEST, then print the second field.
RS='&'
This sets the record separator to &.

With GNU bash:
while IFS= read -r line; do
[[ $line =~ (.*&DEST=)(.*)((&.*|$)) ]] && echo "${BASH_REMATCH[1]}fooooo${BASH_REMATCH[3]}"
done < file
Output:
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=fooooo

Replace the characters between &DEST and & (or EOL) with x's:
awk -F'&DEST=' '{
printf("%s&DEST=", $1);
xlen=index($2,"&");
if ( xlen == 0) xlen=length($2)+1;
for (i=0;i<xlen;i++) printf("%s", "X");
endstr=substr($2,xlen);
printf("%s\n", endstr);
}' file

Related

sed multiple replacements with line range

I have a file with below records
user1,fuser1,luser1,user1#test.com,data,user1
user2,fuser2,luser2,user2#test.com,data,user2
user3,fuser3,luser3,user3#test.com,data,user3
I wanted to perform some text replacements from
user1,fuser1,luser1,user1#test.com,data,user1
to
New_user1,New_fuser1,New_luser1,New_user1#test.com,data,New_user1
so I wrote below sed script.
sed -i -e 's/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g' file
This works perfect. Now I have a requirement that I want to replace in specific line range.
start=2
end=3
sed -i -e ''${start},${end}'s/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g' file
but this command is replacing pattern in all lines. example output is,
user1,New_fuser1,New_luser1,user1#test.com,data,New_user1
user2,New_fuser2,New_luser2,user2#test.com,data,New_user2
user3,New_fuser3,New_luser3,user3#test.com,data,New_user3
Looks like range is getting applied only to first expression and remaining expressions are getting applied on whole file. How to apply this range to all expressions?
You can use awk variables to use for this functionality, controlling the row and column numbers used for replacing
awk -vFS="," -vOFS="," -v columnStart=2 -v columnEnd=3 -v rowStart=1 -v rowEnd=2 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
where the awk variables columnStart, columnEnd, rowStart and rowStart determine which columns and rows to replace with , as the de-limiter adopted.
For your input file:-
$ cat input-file
user1,fuser1,luser1,user1#test.com,data,user1
user2,fuser2,luser2,user2#test.com,data,user2
user3,fuser3,luser3,user3#test.com,data,user3
Assuming I want to do replacement in lines 2 and 3 from columns 3-4, I can set-up my awk as
awk -vFS="," -vOFS="," -v columnStart=3 -v columnEnd=4 -v rowStart=2 -v rowEnd=3 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
user2,fuser2,New_luser2,New_user2#test.com,data,user2
user3,fuser3,New_luser3,New_user3#test.com,data,user3
To apply on the say the last column, set the columnStart and columnEnd to the same value e.g. say on column 6 and on last line only.
awk -vFS="," -vOFS="," -v columnStart=6 -v columnEnd=6 -v rowStart=3 -v rowEnd=3 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
user3,fuser3,luser3,user3#test.com,data,New_user3
When using GNU Sed (present on Ubuntu, probably Debian, and probably others).
There is a feature which makes this easy:
https://www.gnu.org/software/sed/manual/sed.html#Common-Commands
A group of commands may be enclosed between { and } characters. This
is particularly useful when you want a group of commands to be
triggered by a single address (or address-range) match.
Example: perform substitution then print the second input line:
$ seq 3 | sed -n '2{s/2/X/ ; p}'
X
Given the original question, this should do the trick:
sed -i -e '2,3 {s/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g}' file
The following works for me:
START=2
NUM=1
sed -i -e "$START,+${NUM} s/user/New_user/g; $START,+${NUM} s/fuser/New_fuser/g; $START,+${NUM} s/luser/New_luser/g" file
As you can see, there are several changes:
The line range has to be present at each expression
The range should be represented (in this case) as the start line number and number of lines (the number of affected lines is NUM+1)
You put extra apostrophe symbols.
Using a single s command:
start=1
end=2
sed -e "$start,$end s/\([fl]*\)user/New_\1user/g" file
[fl]*user will match user with optional f or l first letter
output:
New_user1,New_fuser1,New_luser1,New_user1#test.com,data,New_user1
New_user2,New_fuser2,New_luser2,New_user2#test.com,data,New_user2
user3,fuser3,luser3,user3#test.com,data,user3

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note the last 6 characters: d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed you can even simplify the pattern using the -r option. Now you won't need to escape {} and () any more. Using ~ as the regex delimiter allows to use the path separator / without need to escape it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Explained simple the pattern does the following: It matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
and replaces it by
/$1/$2/$3/$0
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed sed 's|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|'
Assuming info is a correct MD5 (and only) string
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=$(echo "${md5:0-2:2}")
p2=$(echo "${md5:0-4:2}")
p1=$(echo "${md5:0-6:2}")
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.

Cut the first and the last part of a string in bash

I have a string having this formats:
aa_bb_cc_dd
aa_bb_cc_dd_ee_ff
I want to obtain:
bb_cc
bb_cc_dd_ee
I've tried 'cut', but I didn't manage to obtain what I wanted.
when using bash you can use built-ins for this task:
strip_headtail() {
local s=$1
## strip the head
s=${s#*_}
## strip the tail
s=${s%_*}
echo ${s}
}
strip_headtail aa_bb_cc_dd
strip_headtail aa_bb_cc_dd_ee_ff
you might want to check the bash-manual (man bash) for more information on this.
search for Remove matching prefix pattern resp. Remove matching suffix pattern.
With awk:
$ echo "aa_bb_cc_dd
aa_bb_cc_dd_ee_ff" | awk -F_ '{for(i=1;i<NF;i++) $i=$(i+1); NF=NF-2}1' OFS=_
bb_cc
bb_cc_dd_ee
Explanation
-F_ and OFS=_ set input and output field separator as _.
{for(i=1;i<NF;i++) $i=$(i+1); NF=NF-2} set each field as the next one, so the nth will be the (n+1)th. Then, decrease number of fields in 2.
With sed:
$ echo "aa_bb_cc_dd
aa_bb_cc_dd_ee_ff" | sed -e 's/^[^_]*_//' -e 's/_[^_]*$//'
bb_cc
bb_cc_dd_ee
Explanation
sed -e is used to do multiple commands.
's/^[^_]*_//' delete from the beginning up to first _.
's/_[^_]*$//' delete from last _ up to the end of line.

sed right align a group of text

this question originated from string pattaren-matching using awk , basically we are splitting a line of text in multiple groups based on a regex pattern, and then printing two groups only. Now the question is can we right align a group while printing through sed?
below is an example
$cat input.txt
it is line one
it is longggggggg one
itttttttttt is another one
now
$sed -e 's/\(.*\) \(.*\) \(.*\) \(.*\)/\1 \3/g' input.txt
it splits and prints group 1 and 3, but the output is
it line
it longggggggg
itttttttttt another
my question is can we do it through sed so that the output comes as
it line
it longggggggg
itttttttttt another
I did it with awk but I feel it can be done through sed, but I am not able to get how I am going to get the length of the second group and then pad correct number of spaces in between the groups, I am open to any suggestions to try out.
This might work for you (GNU sed):
sed -r 's/^(.*) .* (.*) .*$/\1 \2/;:a;s/^.{1,40}$/ &/;ta;s/^( *)(\S*)/\2\1/' file
or:
sed -r 's/^(.*) .* (.*) .*$/printf "%-20s%20s" \1 \2/e' file
You can use looping in sed to achieve what you want:
#!/bin/bash
echo 'aa bb cc dd
11 22 33333333 44
ONE TWO THREEEEEEEEE FOUR' | \
sed -e 's/\(.*\) \(.*\) \(.*\) \(.*\)/\1 \3/g' \
-e '/\([^ ]*\) \([^ ]*\)/ { :x ; s/^\(.\{1,19\}\) \(.\{1,19\}\)$/\1 \2/g ; tx }'
The two 19's control the width of your columns. The :x is a label which is looped to by tx whenever the preceding substitution succeeded. (You could add a p; before tx to "debug" it.
It most easy to use awk in this case...
You could too use a bash loop to calculate the number of space and run this command on the line covered :
while read; do
# ... calculate $SPACE ...
echo $REPLY|sed "s/\([^\ ]*\)\ *[^\ ]*\ *\([^\ ]*\)/\1$SPACES\2/g"
done < file
But I prefer use awk for do all that (or other advanced shell languages ​​such as Perl, Python, PHP shell mode, ...)
TemplateSpace=" "
TemplateSize=${#TemplateSpace}
sed "
# split your group (based on word here but depend on your real need)
s/^ *\(\w\) \(\w\) \(\w\) \(\w\).*$/\1 \3/
# align
s/$/${TemplateSpace}/
s/^\(.\{${TemplateSize}\}\).*$/\1/
s/\(\w\) \(\w\)\( *\)/\1 \3\2/
"
or more simple for avoiding TemplateSize (and there are no dot in content)
TemplateSpace="............................................................."
and replace
s/^\(.\{${TemplateSize}\}.*$/\1/
by
s/^\(${TemplateSpace}\).*$/\1/
s/\./ /g
Del columns 2 and 4. Right justify resulting col 2 at line length of 23 chars.
sed -e '
s/[^ ]\+/ /4;
s/[^ ]\+//2;
s/^\(.\{23\}\).*$/\1/;
s/\(^[^ ]\+[ ]\+\)\([^ ]\+\)\([ ]\+\)/\1\3\2/;
'
or gnu sed with extended regex:
sed -r '
s/\W+\w+\W+(\w+)\W+\w+$/\1 /;
s/^(.{23}).*/\1/;
s/(+\W)(\w+)(\W+)$/\1\3\2/
'
This question is old, but I like to see it as a puzzle.
While I love the loop solution for its brevity, here is one without a loop or shell help.
sed -E "s/ \w+ (\w+) \w+$/ \1/;h;s/./ /g;s/$/# /;s/( *)#\1//;x;H;x;s/\n//;s/^( *)(\w+)/\2\1/"
or without extended regex
sed "s/ .* \(.*\) .*$/ \1/;h;s/./ /g;s/$/# /;s/\( *\)#\1//;x;H;x;s/\n//;s/^\( *\)\([^ ]*\)/\2\1/"

How do I insert a newline/linebreak after a line using sed

It took me a while to figure out how to do this, so posting in case anyone else is looking for the same.
For adding a newline after a pattern, you can also say:
sed '/pattern/{G;}' filename
Quoting GNU sed manual:
G
Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space.
EDIT:
Incidentally, this happens to be covered in sed one liners:
# insert a blank line below every line which matches "regex"
sed '/regex/G'
This sed command:
sed -i '' '/pid = run/ a\
\
' file.txt
Finds the line with: pid = run
file.txt before
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
; Error log file
and adds a linebreak after that line inside file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
; Error log file
Or if you want to add text and a linebreak:
sed -i '/pid = run/ a\
new line of text\
' file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
new line of text
; Error log file
A simple substitution works well:
sed 's/pattern.*$/&\n/'
Example :
$ printf "Hi\nBye\n" | sed 's/H.*$/&\nJohn/'
Hi
John
Bye
To be standard compliant, replace \n by backslash newline :
$ printf "Hi\nBye\n" | sed 's/H.*$/&\
> John/'
Hi
John
Bye
sed '/pattern/a\\r' file name
It will add a return after the pattern while g will replace the pattern with a blank line.
If a new line (blank) has to be added at end of the file use this:
sed '$a\\r' file name
Another possibility, e.g. if You don't have an empty hold register, could be:
sed '/pattern/{p;s/.*//}' file
Explanation:
/pattern/{...} = apply sequence of commands, if line with pattern found,
p = print the current line,
; = separator between commands,
s/.*// = replace anything with nothing in the pattern register,
then automatically print the empty pattern register as additional line)
The easiest option -->
sed 'i\
' filename

Resources