sed multiple replacements with line range - bash

I have a file with the records below:
user1,fuser1,luser1,user1#test.com,data,user1
user2,fuser2,luser2,user2#test.com,data,user2
user3,fuser3,luser3,user3#test.com,data,user3
I wanted to perform some text replacements from
user1,fuser1,luser1,user1#test.com,data,user1
to
New_user1,New_fuser1,New_luser1,New_user1#test.com,data,New_user1
so I wrote the sed script below.
sed -i -e 's/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g' file
This works perfectly. Now I need to apply the replacements only to a specific line range.
start=2
end=3
sed -i -e ''${start},${end}'s/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g' file
But this command replaces the pattern on all lines. Example output:
user1,New_fuser1,New_luser1,user1#test.com,data,New_user1
user2,New_fuser2,New_luser2,user2#test.com,data,New_user2
user3,New_fuser3,New_luser3,user3#test.com,data,New_user3
It looks like the range is applied only to the first expression, and the remaining expressions are applied to the whole file. How can I apply the range to all expressions?

You can use awk variables for this, controlling the row and column numbers in which the replacement happens:
awk -vFS="," -vOFS="," -v columnStart=2 -v columnEnd=3 -v rowStart=1 -v rowEnd=2 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
where the awk variables columnStart, columnEnd, rowStart and rowEnd determine which columns and rows to modify, with , as the delimiter.
For your input file:
$ cat input-file
user1,fuser1,luser1,user1#test.com,data,user1
user2,fuser2,luser2,user2#test.com,data,user2
user3,fuser3,luser3,user3#test.com,data,user3
Assuming I want to do replacement in lines 2 and 3 from columns 3-4, I can set-up my awk as
awk -vFS="," -vOFS="," -v columnStart=3 -v columnEnd=4 -v rowStart=2 -v rowEnd=3 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
user2,fuser2,New_luser2,New_user2#test.com,data,user2
user3,fuser3,New_luser3,New_user3#test.com,data,user3
To apply it to, say, the last column only, set columnStart and columnEnd to the same value, e.g. column 6 on the last line only:
awk -vFS="," -vOFS="," -v columnStart=6 -v columnEnd=6 -v rowStart=3 -v rowEnd=3 \
'NR>=rowStart&&NR<=rowEnd{for(i=columnStart; i<=columnEnd; i++) \
$i="New_"$i; print }' file
user3,fuser3,luser3,user3#test.com,data,New_user3

With GNU sed (present on Ubuntu, probably Debian, and probably others), there is a feature that makes this easy:
https://www.gnu.org/software/sed/manual/sed.html#Common-Commands
A group of commands may be enclosed between { and } characters. This
is particularly useful when you want a group of commands to be
triggered by a single address (or address-range) match.
Example: perform substitution then print the second input line:
$ seq 3 | sed -n '2{s/2/X/ ; p}'
X
Given the original question, this should do the trick:
sed -i -e '2,3 {s/user/New_user/g; s/fuser/New_fuser/g; s/luser/New_luser/g}' file
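As a minimal sketch of the same grouping driven by the question's start and end variables: the example below builds a throwaway sample file with mktemp (an illustrative path, not part of the original), uses the single [fl]*user pattern so the substitutions cannot interfere with each other, and adds a second, purely illustrative command (s/#/@/) to show that every command inside the braces is limited to the range. The ; before } is there for sed implementations that are stricter than GNU sed.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
tmp=$(mktemp)
printf '%s\n' \
  'user1,fuser1,luser1,user1#test.com,data,user1' \
  'user2,fuser2,luser2,user2#test.com,data,user2' \
  'user3,fuser3,luser3,user3#test.com,data,user3' > "$tmp"

start=2
end=3

# Double quotes let the shell expand $start and $end; both commands inside
# the braces run only on lines 2-3. s/#/@/ is an illustrative extra command.
result=$(sed -e "${start},${end}{s/\([fl]*\)user/New_\1user/g;s/#/@/;}" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Line 1 is printed unchanged, while lines 2 and 3 get both the New_ prefixes and the @ replacement. Add -i once the output looks right.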

The following works for me:
START=2
NUM=1
sed -i -e "$START,+${NUM} s/user/New_user/g; $START,+${NUM} s/fuser/New_fuser/g; $START,+${NUM} s/luser/New_luser/g" file
As you can see, there are several changes:
The line range has to be present at each expression
The range is written (in this case) as the start line number plus a count of following lines, so the number of affected lines is NUM+1. Note that the addr,+N form is a GNU sed extension.
The extra apostrophes from your command are removed.

Using a single s command:
start=1
end=2
sed -e "$start,$end s/\([fl]*\)user/New_\1user/g" file
[fl]*user matches user preceded by any number of f or l characters, which covers user, fuser and luser.
output:
New_user1,New_fuser1,New_luser1,New_user1#test.com,data,New_user1
New_user2,New_fuser2,New_luser2,New_user2#test.com,data,New_user2
user3,fuser3,luser3,user3#test.com,data,user3

Related

Unix sed command - global replacement is not working

I have a scenario where I want to replace runs of multiple double quotes inside the data with a single double quote. Because the input is comma-delimited and every column is enclosed in double quotes, I ran into the issue explained below.
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
I tried the approach below, but only the first occurrence is handled and I am not sure what the issue is.
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
Step 1: replace every ,"", with ,"NULL",
Step 2: replace every run of multiple quotes ("" or """ or """") with a single "
Step 3: revert step 1 by replacing ,"NULL", with ,"",
But only the first occurrence is changed, and the rest looks like this:
If input is :
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakdown:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: Set the field separator and output field separator to , for all lines of Input_file. Then loop over each field of the line; if a field is not empty (that is, not exactly ""), globally replace every run of one or more " with a single ". Finally, print the line.
With sed, you can match one or more pairs of "" using a group, followed by a single ", and use a single " as the replacement:
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
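#!/bin/sh -x
# ed script: fix the quotes on line 4 (CSV field 4 of the sample record), then save.
cat > ed1 <<EOF
4s/"""/"/
wq
EOF
cp file stack
# One field per line, so the field number becomes the line number.
tr ',' '\n' < stack > f2
ed -s f2 < ed1
# Join the lines back into a single comma-separated record.
paste -s -d, f2 > stack
rm -v ./f2
rm -v ./ed1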
The point of this is that if you have a big CSV record all on one line and you want to edit a specific field, then, if you know the field number, you can convert all the commas to newlines and use the field number as a line number to substitute, append after it, or insert before it with ed, and then convert back to CSV.
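For comparison, the same "edit one field by number" idea can be sketched directly in awk without temporary files. This is only an illustration under the assumption that no field contains a comma of its own; the variable names record and field are made up for the example.

```shell
# Sample record from the question; field 4 holds the tripled quotes.
record='"int","","123","abd"""sf123","top"'
field=4   # hypothetical: number of the field to edit

# Split on commas, replace the first """ in the chosen field with a single ",
# and print the rebuilt record (the trailing 1 is awk's "print" shorthand).
fixed=$(printf '%s\n' "$record" |
  awk -F, -v OFS=, -v n="$field" '{ sub(/"""/, "\"", $n) } 1')
printf '%s\n' "$fixed"
```

Only the selected field is touched, so the empty "" field elsewhere in the record stays as it is.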

Update version number in property file using bash

I am new to bash scripting and I need help with awk. I have a property file with a version inside and I want to update it.
version=1.1.1.0
and I use awk to do that
file="version.properties"
awk -F'["]' -v OFS='"' '/version=/{
split($4,a,".");
$4=a[1]"."a[2]"."a[3]"."a[4]+1
}
;1' $file > newFile && mv newFile $file
but I am getting a strange result: version="1.1.1.0""...1
Could someone please help me with this?
You mentioned in your comment you want to update the file in place. You can do that in a one-liner with perl:
perl -pe '/^version=/ and s/(\d+\.\d+\.\d+\.)(\d+)/$1 . ($2+1)/e' -i version.properties
Explanation
-e is followed by a script to run. With -p and -i, the effect is to run that script on each line, and modify the file in place if the script changes anything.
The script itself, broken down for explanation, is:
/^version=/ and # Do the following on lines starting with `version=`
s/ # Make a replacement on those lines
(\d+\.\d+\.\d+\.)(\d+)/ # Match x.y.z.w, and set $1 = `x.y.z.` and $2 = `w`
$1 . ($2+1)/ # Replace x.y.z.w with a copy of $1, followed by w+1
e # This tells Perl the replacement is Perl code rather
# than a text string.
Example run
$ cat foo.txt
version=1.1.1.2
$ perl -pe '/^version=/ and s/(\d+\.\d+\.\d+\.)(\d+)/$1 . ($2+1)/e' -i foo.txt
$ cat foo.txt
version=1.1.1.3
This is not the best way, but here's one fix.
Test case
I am assuming the input file has at least one line that is exactly version=1.1.1.0.
$ awk -F'["]' -v OFS='"' '/version=/{
> split($4,a,".");
> $4=a[1]"."a[2]"."a[3]"."a[4]+1
> }
> ;1' <<<'version=1.1.1.0'
Output:
version=1.1.1.0"""...1
The """ is because you are assigning to field 4 ($4). When you do that, awk adds field separators (OFS) between fields 1 and 2, 2 and 3, and 3 and 4. Three OFS => """, in your example.
Minimal change
$ awk -F'["]' -v OFS='"' '/version=/{
split($1,a,".");
$1=a[1]"."a[2]"."a[3]"."a[4]+1;
print
}
' <<<'version=1.1.1.0'
version=1.1.1.1
Two changes:
Change $4 to $1
Since the input field separator (-F) is ["], $4 is whatever would be after the third " (if there were any in the input). Therefore, split($4, ...) splits an empty field. The contents of the line, before the first " (if any), are in $1.
print at the end instead of ;1
The 1 after the closing curly brace is the next condition, and there is no action specified. The default action is to print the current line, as modified, so the 1 triggers printing. Instead, just print within your action when you are done processing. That way your action is self-contained. (Of course, if you needed to do other processing, you might want to print later, after that processing.)
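Following the field-separator reasoning above, here is a hedged alternative sketch that splits on the dots instead of on quotes, so the last field is the number to increment. It assumes the line has the form version= followed by dot-separated numbers with no quotes; the temporary file and the name= line are invented for the demonstration.

```shell
# Illustrative property file (hypothetical contents).
props=$(mktemp)
printf 'name=demo\nversion=1.1.1.9\n' > "$props"

# With . as both input and output separator, $NF is the last version
# component. Assigning to it rebuilds the line with dots; the trailing 1
# prints every line, changed or not.
bumped=$(awk -F. -v OFS=. '/^version=/ { $NF = $NF + 1 } 1' "$props")
printf '%s\n' "$bumped"
rm -f "$props"
```

Lines that do not start with version= are never assigned to, so awk prints them exactly as read. To update the file itself, redirect to a new file and move it over the original, as in the question.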
You can use the = as the delimiter, like this:
awk -F= -v v=1.0.1 '$1=="version"{printf "version=\"%s\"\n", v}' file.properties

String manipulation via script

I am trying to get a substring between &DEST= and the next & or a line break.
For example :
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
In this I need to extract "SFO"
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
In this I need to extract "SANFRANSISCO"
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
In this I need to extract "SANJOSE"
I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification of the text is to mask the dest value with X character.
So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.
Output :
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Please let me know how to achieve this in script (Preferably shell or bash script).
Thanks.
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=PORTORICA
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
$ sed -E 's/^.*&DEST=([^&]*)[&]*.*$/\1/' file
SFO
PORTORICA
SANFRANSISCO
SANJOSE
That should do it.
Replacing airports with an equal number of Xs
Let's consider this test file:
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
To replace the strings after &DEST= with an equal length of X and using GNU sed:
$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
To replace the file in-place:
sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
The above was tested with GNU sed. For BSD (OSX) sed, try:
sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
Or, to change in-place with BSD(OSX) sed, try:
sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
If there is some reason why it is important to use the shell to read the file line-by-line:
while IFS= read -r line
do
echo "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
How it works
Let's consider this code:
search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
-E
This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.
:a
This creates a label a.
s/('"$search_str"'X*)[^X&]/\1X/
This looks for $search_str followed by any number of X followed by any character that is not X or &. Because of the parens, everything except that last character is saved into group 1. This string is replaced by group 1, denoted \1 and an X.
ta
In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.
This test-and-jump causes the substitution to be repeated as many times as necessary.
Replacing multiple tags with one sed command
$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Answer for original question
Using shell
$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo ${s%%&*}
SFO
How it works:
${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.
${s%%&*} is suffix removal. It removes all text from the first & to the end of the string.
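The same prefix and suffix removal can be combined into a small masking function. This is a sketch rather than part of the original answer: mask_dest is a hypothetical helper name, the variables are global for simplicity, and sed is used only to turn each character of the value into X so that the code does not depend on bash-specific substitution.

```shell
# mask_dest LINE: print LINE with the value after the first &DEST= replaced
# by the same number of X characters. Hypothetical helper, for illustration.
mask_dest() {
  line=$1
  case $line in
    *'&DEST='*) ;;                        # has a DEST value: continue below
    *) printf '%s\n' "$line"; return ;;   # nothing to mask
  esac
  prefix=${line%%'&DEST='*}   # text before the first &DEST=
  rest=${line#*'&DEST='}      # text after the first &DEST=
  dest=${rest%%'&'*}          # the value: up to the next & or end of line
  tail=${rest#"$dest"}        # whatever follows the value
  masked=$(printf '%s' "$dest" | sed 's/./X/g')   # same length, all X
  printf '%s&DEST=%s%s\n' "$prefix" "$masked" "$tail"
}

mask_dest 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
mask_dest 'MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE'
```

In a while IFS= read -r line loop, calling mask_dest "$line" for each line and redirecting the loop's output to a new file would produce the masked copy.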
Using awk
$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO
How it works:
-F'[=\n]'
This tells awk to treat either an equal sign or a newline as the field separator
$1=="DEST"{print $2}
If the first field is DEST, then print the second field.
RS='&'
This sets the record separator to &.
With GNU bash:
while IFS= read -r line; do
[[ $line =~ (.*&DEST=)([^&]*)(&.*|$) ]] && echo "${BASH_REMATCH[1]}fooooo${BASH_REMATCH[3]}"
done < file
Output:
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=fooooo
Replace the characters between &DEST and & (or EOL) with x's:
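awk -F'&DEST=' '{
printf("%s&DEST=", $1);
# xlen is the position of the next & in the text after &DEST=,
# or one past the end when there is no further &.
xlen=index($2,"&");
if ( xlen == 0) xlen=length($2)+1;
# Print one X per character of the value (xlen-1 characters).
for (i=1;i<xlen;i++) printf("%s", "X");
endstr=substr($2,xlen);
printf("%s\n", endstr);
}' file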

Separating joined columns with awk

I have a data file which looks like the following:
0.00000-130250.92921 28880.20200-159131.13121 301.58706
0.05000-130250.73120 28156.69202-158407.42322 294.03167
0.10000-130250.79137 28237.16138-158487.95275 294.87198
0.15000-130250.81209 28168.63042-158419.44250 294.15634
0.20000-130250.82418 28149.57611-158400.40029 293.95736
0.25000-130250.88438 28069.57135-158320.45573 293.12189
0.30000-130251.06059 28071.30576-158322.36635 293.14000
0.35000-130250.96639 28084.46351-158335.42990 293.27741
as you can see some of the columns which start with "-" sign are
joined to the previous one, for instance: 0.35000-130250.96639
this should be 0.35000 and -130250.96639. I can separate the
columns with VIM but I wanted to know if it is possible to do that
with AWK.
Thanks.
You can use sed: replace each - with a space and -:
sed -e 's/-/ -/g' input > output
The /g means globally, i.e. it replaces all occurrences on each line, not just the first one.
Using just awk
awk '{ gsub("-"," -") ; print }'

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note that the last 6 characters of the hash (d8 40 32) form the directory levels of the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk command does it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to the empty string, so that every character becomes a separate field (this works in GNU awk, but POSIX leaves the behaviour of an empty FS unspecified). OFS="/" sets the output field separator to /, which print uses between its arguments.
print ... $(NF-1)$NF, $0 prints the penultimate and last fields joined together, then the whole string. Each comma in the print statement is replaced by the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
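Because per-character splitting with an empty FS is not guaranteed in every awk, a variant using substr may be more portable. The following is a sketch of that idea, not the answerer's code; it assumes each input line is a single hash of at least six characters, and the variable names hash and path are illustrative.

```shell
hash=13febd65d65112badd0aa90a15d84032

# Take the last six characters two at a time with substr, then append the
# full hash, joining the pieces with / in the printf format.
path=$(printf '%s\n' "$hash" | awk '{
  n = length($0)
  printf "%s/%s/%s/%s\n", substr($0, n-5, 2), substr($0, n-3, 2), substr($0, n-1, 2), $0
}')
printf '%s\n' "$path"
```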
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
With GNU sed you can simplify the pattern using the -r option (extended regular expressions), so that {} and () no longer need to be escaped. Using ~ as the regex delimiter lets you write the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply, the pattern matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
that is, group 1 is the whole hash and groups 2 to 4 are its last three character pairs, and replaces it with
\2/\3/\4/\1
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
This assumes the input is a valid MD5 string and nothing else.
First of all, thanks to all of the responders. This was extremely quick!
In the meantime I also wrote my own script and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5:0-2:2}
p2=${md5:0-4:2}
p1=${md5:0-6:2}
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
