sed: replace a character only between two positions - bash

Sorry for this apparently simple question, but I have spent too long searching everywhere for a solution and trying different sed options.
I just need to replace all dots by commas in a text file, but just between two positions.
As an example, from:
1.3.5.7.9
to
1.3,5,7.9
So, replace . by , between positions 3 to 7.
Thanks!
EDITED: sorry, I tried to simplify the problem, but as none of the first 3 answers work due to a lack of detail in my question, let me go a bit deeper. The important point is replacing all dots with commas in an interval of positions without knowing the rest of the string:
Here some text. I don't want to change. 10.000 usd 234.566 usd Continuation text.
More text. No need to change this part. 345 usd 76.433 usd Text going on. So on.
This is a fixed width text file, in columns, and I need to change the international format of numbers, replacing dots by commas. I just know the initial and final positions where I need to search and eventually replace the dots. Obviously, not all figures have dots (only those over 1000).
Thanks.

Rewriting the answer after the clarification of the question:
This is hard to handle with sed only, but can be simplified with other standard utilities like cut and paste:
$ start=40
$ end=64
$ paste -d' ' <(cut -c -$((start-1)) example.txt) \
> <(cut -c $((start+1))-$((end-1)) example.txt | sed 'y/./,/') \
> <(cut -c $((end+1))- example.txt)
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
(The > characters just mark continuation of the previous line; the < characters are real.) This is of course very inefficient, but conceptually simple.
I used all the +1 and -1 adjustments to get rid of extra spaces. Not sure if you need them.
A pure sed solution (brace yourself):
$ sed "s/\(.\{${start}\}\)\(.\{$((end-start))\}\)/\1\n\2\n/;h;s/.*\n\(.*\)\n.*/\1/;y/./,/;G;s/^\(.*\)\n\(.*\)\n\(.*\)\n\(.*\)$/\2\1\4/" example.txt
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
GNU sed:
$ sed -r "s/(.{${start}})(.{$((end-start))})/\1\n\2\n/;h;s/.*\n(.*)\n.*/\1/;y/./,/;G;s/^(.*)\n(.*)\n(.*)\n(.*)$/\2\1\4/" example.txt
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
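To make the long one-liner above easier to follow, here is a sketch that wraps the same command in a function and runs it on the short example from the question. The function name replace_in_range and its two arguments (the number of leading characters to leave alone, then the last position to edit) are made up for illustration, and it assumes GNU sed, since \n in the replacement text is not portable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): read lines from stdin, skip the
# first $1 characters, and turn dots into commas up to position $2.
# Assumes GNU sed, which understands \n in the replacement text.
replace_in_range() {
  local skip=$1 end=$2
  # 1. s///  split the line into: head \n middle \n tail
  # 2. h     save that split line in the hold space
  # 3. s///  keep only the middle part in the pattern space
  # 4. y     turn its dots into commas
  # 5. G     append the hold space: new-middle \n head \n middle \n tail
  # 6. s///  reassemble as head, new-middle, tail
  sed "s/\(.\{${skip}\}\)\(.\{$((end - skip))\}\)/\1\n\2\n/;h;s/.*\n\(.*\)\n.*/\1/;y/./,/;G;s/^\(.*\)\n\(.*\)\n\(.*\)\n\(.*\)$/\2\1\4/"
}

echo '1.3.5.7.9' | replace_in_range 2 7
```

With GNU sed this prints 1.3,5,7.9. Lines shorter than the end position are not handled sensibly, because the first substitution then fails to match.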

I tried to simplify the regex, but it is more permissive.
echo 1.3.5.7.9 | sed -r "s/^(...).(.).(..)/\1,\2,\3/"
1.3,5,7.9
PS: It doesn't work with BSD sed.

$ echo "1.3.5.7.9" |
gawk -v s=3 -v e=7 '{
print substr($0,1,s-1) gensub(/\./,",","g",substr($0,s,e-s+1)) substr($0,e+1)
}'
1.3,5,7.9
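gensub is specific to gawk. If only a plain awk is available, the same idea can probably be written with gsub on a temporary variable. This is a sketch under that assumption; the function name dots_to_commas is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace dots with commas between character positions
# $1 and $2 (1-based, inclusive) on every line read from stdin.
dots_to_commas() {
  awk -v s="$1" -v e="$2" '{
    mid = substr($0, s, e - s + 1)   # the slice to edit
    gsub(/\./, ",", mid)             # gsub edits the variable in place
    print substr($0, 1, s - 1) mid substr($0, e + 1)
  }'
}

echo '1.3.5.7.9' | dots_to_commas 3 7
```

This prints 1.3,5,7.9, the same result as the gawk version.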

This is rather awkward to do in pure sed. If you're not strictly constrained to sed, I suggest using another tool to do this. Ed Morton's gawk-based solution is probably the least-awkward (no pun intended) way to solve this.
Here's an example of using sed to do the grunt work, but wrapped in a bash function for simplicity:
function transform () {
    local line=$1 start=$2 end=$3 front back
    # Save beginning and end of line
    front=$(printf '%s\n' "$line" | sed -e "s/^\(.\{$start\}\).*$/\1/")
    back=$(printf '%s\n' "$line" | sed -e "s/^.\{$end\}//")
    # Translate characters
    line=$(printf '%s\n' "$line" | sed -e 'y/./,/')
    # Restore unmodified beginning/end
    # (front and back are pasted into the replacement text, so this breaks
    # if they contain /, & or \)
    printf '%s\n' "$line" | sed -e "s/^.\{$start\}/$front/" -e "s/^\(.\{$end\}\).*$/\1$back/"
}
Call this function like:
$ transform "1.3.5.7.9" 3 7
1.3,5,7.9
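If the line is already in a shell variable, bash's own substring and pattern expansions can do the same job without calling sed at all. A minimal sketch, assuming bash (not plain sh); transform_bash is a made-up name that takes the same arguments as transform above.

```shell
#!/usr/bin/env bash
# Hypothetical variant of transform: same arguments, no external tools.
transform_bash() {
  local line=$1 start=$2 end=$3
  local front=${line:0:start}            # first $start characters, untouched
  local middle=${line:start:end-start}   # the slice to translate
  local back=${line:end}                 # everything after position $end
  printf '%s\n' "${front}${middle//./,}${back}"
}

transform_bash "1.3.5.7.9" 3 7
```

This prints 1.3,5,7.9. Unlike the sed-based function, it does not care whether the line contains /, & or backslashes.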

Thank you all.
What I found elsewhere (no credit to me) as simple solutions:
For fixed-width files:
awk 'BEGIN{FS=OFS=""} {for (j=2;j<=5;j++) if ($j==".") $j=","} 1'
This changes all dots into commas from the 2nd character position to the 5th. It relies on an empty FS splitting the line into single characters, which gawk (among others) does but POSIX leaves unspecified.
For tab-delimited files:
awk 'BEGIN{FS=OFS="\t"} {for (j=2;j<=5;j++) gsub(/\./,",",$j)} 1'
This changes all dots into commas from the 2nd field to the 5th.
Hope that helps someone; I couldn't imagine it would be so tough in the beginning.

Related

How can I mask 200 characters of each line in a file with 3000 long lines?

I have a fixed width text data file. Each line is 3000 characters long. I need to mask (change to 'X') all the characters between position 1000 and 1200. There are no delimiters in the file; each field is known by its position in the line.
If I only needed to change 10 characters I could use sed:
sed -i -r 's/^(.{999}).{10}(.*)/\1XXXXXXXXXX\2/'
But writing a sed command with 200 X's does not seem like a good idea.
I tried using awk, but it returns different values for some lines because of spaces in the data.
"But writing a sed command with 200 X's does not seem like a good idea."
Let's do it anyway, but script it:
sed -E 's/^(.{999}).{200}/\1'"$(yes X | head -n200 | tr -d '\n')"'/'
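Because it just so happens that 1000 % 200 == 0, I think we could also replace the sixth 200-character block (note that this covers positions 1001 to 1200, one position later than the command above):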
sed -E 's/.{200}/'"$(yes X | head -n200 | tr -d '\n')"'/6'
My go-to tools are, in order of increasing ability to get stuff done, sed, awk and python. You may want to consider stepping up :-)
In any case, this can be done in awk with some initial setup, something like:
BEGIN {x="XXXXXXXXXX"; x=x""x""x""x""x; x=x""x""x""x}
which gives you (10, then 50, then) 200 X's.
Then you can just fiddle with $0, which is the whole line regardless of spacing. Depending on what you actually meant by "between positions 1000 and 1200", the numbers below may be slightly different but you should get the idea:
{ print substr($0,1,999)""x""substr($0,1200) }
You can see how this will behave in the following snippet, replacing character positions 3 through 6 on each line:
pax> printf "hello there\ngoodbye\n" | awk '
...> BEGIN {x="X";x=x""x;x=x""x}
...> {print substr($0,1,2)""x""substr($0,7)}'
heXXXXthere
goXXXXe
This might work for you (GNU sed):
sed -E '1{x;:a;/^x{200}/!s/^/x/;ta;x};G;s/^(.{999}).{200}(.*)\n(.*)/\1\3\2/' file
Prime the hold space with a string containing 200 x's. Append the hold space to the current line and using substitution replace the intended string with the mask.
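Another way to build the mask is to let printf pad an empty string to the wanted width and turn the padding into X's. The sketch below wraps that in a function; the name mask_columns and its arguments (first position, number of characters) are made up for illustration, and it assumes a sed that accepts -E.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite $2 characters starting at position $1
# (1-based) with X on every line read from stdin.
mask_columns() {
  local first=$1 count=$2 mask
  # printf pads an empty string to $count spaces; tr turns them into X's
  mask=$(printf '%*s' "$count" '' | tr ' ' 'X')
  sed -E "s/^(.{$((first - 1))}).{${count}}/\1${mask}/"
}

printf 'hello there\ngoodbye\n' | mask_columns 3 4
```

This prints heXXXXthere and goXXXXe, matching the awk demonstration above. For the original question the call would be something like mask_columns 1000 200, depending on which positions are meant.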

Sed creating duplicates

I have used the command sed in shell to remove everything except for numbers from my string.
Now, my string contains three 0s among other numbers and after running
sed 's/[^0-9]*//g'
Instead of three 0s, I now have 0, 01 and 02.
How can I prevent sed from doing that so that I can have the three 0s?
sample of the string:
0 cat
42 dog
24 fish
0 bird
0 tiger
5 fly
Now that we know that digits in filenames in the output from the du utility caused the problem (tip of the hat to Lars Fischer), simply use cut to extract only the first column (which contains the data of interest, the size in blocks of each file or subdirectory):
du -a "$var" | cut -f1
du outputs tab-separated data, and a tab is also cut's default separator, so all that is needed is to ask for the 1st field (-f1).
In hindsight, your problem was unrelated to sed; your sample data simply wasn't representative of your actual data. It's always worth creating an MCVE (Minimal, Complete, and Verifiable Example) when asking a question.
Try this:
du -a "$var" | sed -e 's/[[:space:]].*//' -e 's/[^0-9]*//g'
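The effect is easy to reproduce without du. The sketch below feeds simulated du -a output (size, a tab, then a name containing digits) through both commands; the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Simulated `du -a` output: size, a tab, then a name that contains digits.
# (The file names here are invented for the example.)
sample=$(printf '0\t./file01\n0\t./file02\n42\t./dog\n')

# Stripping non-digits from the whole line glues the size to the name's digits.
printf '%s\n' "$sample" | sed 's/[^0-9]*//g'

# Taking only the first tab-separated field keeps just the sizes.
printf '%s\n' "$sample" | cut -f1
```

The sed command prints 001, 002 and 42, while cut prints 0, 0 and 42.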

remove everything but first and last 6 digits

I am trying to remove everything but the first number and last 6 digits from every line in a file. So far I have removed everything but the last 6 digits using sed like so:
sed -r 's/.*(.{6})/\1/' test
Would there be a way for me to modify this so that I keep the first number too? This number can be any length but will always be followed by a space. Basically, I would like to get rid of /home/usr/file and only keep 123456789 123456. Any help would be greatly appreciated!
Input line:
123456789 /home/usr/file123456
Desired Output:
123456789 123456
echo 5 /home/usr/file123456 | awk '{print $1,substr($2,length($2)-5,6)}'
Do the same thing you did for the end at the beginning.
sed -r 's/(.).*(.{6})/\1\2/' test
(I have no idea how efficient this is however. It might need to back-track for the length of the final match.)
To grab the first "field" (space separated) and the last six characters you can use:
sed -r 's/([^[:space:]]*) .*(.{6})/\1 \2/' test
Though I think the awk solution is generally a better idea.
$ echo '123456789 /home/usr123/file123456' | sed -r 's/ .*(.{6})/ \1/'
123456789 123456
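When the line is already in a shell variable, plain parameter expansion can pull out the same two pieces without sed or awk. A small sketch; first_and_last6 is a made-up name for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first space-delimited field of $1 followed
# by the last six characters of $1.
first_and_last6() {
  local line=$1 first head last
  first=${line%% *}        # drop everything from the first space onward
  head=${line%??????}      # the line minus its last six characters
  last=${line#"$head"}     # what remains after removing that prefix
  printf '%s %s\n' "$first" "$last"
}

first_and_last6 '123456789 /home/usr/file123456'
```

This prints 123456789 123456. For a whole file it would have to run inside a while read loop, which is usually slower than the sed or awk versions.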

Limiting SED to the first 10 characters of a line

I'm running sed as a part of a shell script to clean up bind logs for insertion into a database.
One of the sed commands is the following:
sed -i 's/-/:/g' $DPath/named.query.log
This turns out to be problematic as it disrupts any resource requests that also include a dash (I'm using : as a delimiter for an awk statement further down).
My question is how do I limit the sed command above to only the first ten characters of the line? I haven't seen a specific switch that does this, and I'm nowhere near good enough with RegEx to even start on developing one that works. I can't just use regex to match the preceding numbers because it's possible that the pattern could be part of a resource request. Heck, I can't even use pattern matching for ####-##-## because, again, it could be part of the resource.
Any ideas are much appreciated.
It's [almost always] simpler with awk:
awk '{target=substr($0,1,10); gsub(/-/,":",target); print target substr($0,11)}' file
I think the shortest solution, and perhaps the simplest, is provided by sed itself, rather than awk[ward]:
sed "h;s/-/:/g;G;s/\(..........\).*\n........../\1/"
Explanation:
(h) copy everything to the hold space
(s) do the substitution (to the entire pattern space)
(G) append the hold space, with a \n separator
(s) delete the characters up to the tenth after the \n, but keep the first ten.
Some test code:
echo "--------------------------------" > foo
sed -i "h;s/-/:/g;G;s/\(..........\).*\n........../\1/" foo
cat foo
::::::::::----------------------
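To see the intermediate state the explanation describes, it can help to stop after G and look at the pattern space. The log line below is made up for illustration, not real bind output.

```shell
#!/usr/bin/env bash
# An invented log line: dashes in the date (first ten characters) and in the name.
line='2012-04-07 10:15:00 query: foo-bar.example.com'

# After h, s and G the pattern space holds the edited copy, a newline,
# and the untouched original, so this prints two lines:
printf '%s\n' "$line" | sed 'h;s/-/:/g;G'

# The final s keeps ten characters of the edited copy and splices in the
# original from its eleventh character on:
printf '%s\n' "$line" | sed 'h;s/-/:/g;G;s/\(..........\).*\n........../\1/'
```

The last command prints 2012:04:07 10:15:00 query: foo-bar.example.com, so the dash in foo-bar survives. Note that the final substitution needs at least ten characters on the line to match.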
I'm not sure how to make sed do it per se; however, I do know that you can feed sed the first 10 characters and then paste the rest back in, like so:
paste -d"\0" <(cut -c1-10 $DPath/named.query.log | sed 's/\-/:/g') <(cut -c11- $DPath/named.query.log)
You can do the following:
cut -c 1-10 $DPath/named.query.log | sed 's/-/:/g'
The cut statement takes only the first 10 characters of each line in that file. Redirect the output to a file if you want to keep it; as written, it just prints to your terminal.

Remove nth character from middle of string using Shell

I've been searching google for ever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get that installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parentheses; YMMV.
"I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me."
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in parentheses. Those parts can then be referred to in the REPLACEMENT part of the command as \1, \2, and so on. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the parentheses in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find on Linux), you can get the same result with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
This should work:
sed -e 's~,~~4g' file.txt
It removes the 4th comma and every comma after it (combining a number with the g flag is a GNU sed extension).
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt
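Since the question is really about fixed character positions (and colrm was not available), cut can also do this by listing the positions to keep. A minimal sketch, assuming every line has its commas at exactly the 16th and 19th characters:

```shell
#!/usr/bin/env bash
# Keep characters 1-15, 17-18 and 20 onward, i.e. drop the 16th and 19th.
echo '2222,H,73.82,04,07,2012' | cut -c 1-15,17-18,20-
```

This prints 2222,H,73.82,04072012. Like the position-based sed answer, it gives wrong results if a column is ever wider or narrower than in the sample.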

Resources