Replace the last character in string - bash

How can I just replace the last character (it's a }) from a string? I need everything before the last character but replace the last character with some new string.
I tried many things with awk and sed but didn't succeed.
For example:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
}'
should become:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cf2 Its red now
}'
This replaces the last occurrence of:
}
with
\\cf2 Its red now
}

sed would do this:
# replace '}' in the end
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/}$/\\cf2 Its red now}/'
# replace any last character
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/\(.\)$/\\cf2 Its red now\1/'

Replacing the trailing } could be done like this (with $ as the PS1 prompt and > as the PS2 prompt):
$ str="...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
> \\f0
> }"
$ echo "$str"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
}
$ echo "${str%\}}\cf2 It's red now
}"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
\cf2 It's red now
}
$
The first 3 lines assign your string to my variable str. The next 4 lines show what's in the string. The 2 lines:
echo "${str%\}}\cf2 It's red now
}"
contain a (grammar-corrected) substitution of the material you asked for, and the last lines echo the substituted value.
Basically, ${str%tail} removes the string tail from the end of $str; I remember it because "percent" ends in 't' for tail (and the analogous ${str#head} uses "hash", which starts with 'h' for head).
See shell parameter expansion in the Bash manual for the remaining details.
If you don't know the last character, you can use a ? metacharacter to match the end instead:
echo "${str%?}and the extra"
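The expansions described above can be tried on a throwaway variable. This is a minimal sketch, assuming bash or another POSIX shell; the variable names and the sample text are made up for illustration:

```shell
# Hypothetical sample string whose last character is the } to replace
str='line one
}'

# ${var%pattern} strips the shortest match of pattern from the end
replaced_brace="${str%\}}NEW}"

# ? matches any single character, so this drops whatever the last character is
replaced_any="${str%?}NEW"

# ${var#pattern} is the mirror image: it strips from the start
without_head="${str#line }"

printf '%s\n' "$replaced_brace" "$replaced_any" "$without_head"
```

Because everything happens inside the shell, no external process is started and the newlines in the string need no special handling.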

First make a string with newlines
str=$(printf "%s\n%s\n%s" '\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural' '\\f0' "}'")
Now look for the last } in your string and insert the new text, followed by a newline, in front of it.
The $ address makes sure the substitution is applied only on the last line; & stands for the matched string.
echo "${str}" |sed '$ s/}[^}]$/\\\\cf2 Its red now\n&/'
The above solution only works when the } is on the last line. It becomes more difficult when you also want to support str2:
str2=$(printf "Extra } here.\n%s\nsome other text" "${str}")
You cannot match the } on the last line. Removing the $ address applies the substitution to every line, so other lines containing } could be changed as well (which is why I added a } at the beginning of str2). You only want to replace the last one.
The 1 flag makes sed replace only the first match on each line; it does not stop later lines from matching, so the pattern itself has to be specific enough to match only the line you want. Reaching the last } rather than the first is done by reversing the order of the lines with tac. Since you will tac again after the replacement, the sed replacement string has to list its parts in the opposite order.
echo "${str2}" | tac |sed 's/}[^}]$/&\n\\\\cf2 Its red now/1' |tac
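If the text is already in a shell variable, the last } can also be located without tac, using parameter expansion. This is a sketch under the assumption that the shell is bash (or POSIX sh) and that the string contains at least one }; the names before, after and insert are illustrative:

```shell
# Same shape as str2: a } early on, the } to change near the end, then more text
str2=$(printf '%s\n%s\n%s\n%s' 'Extra } here.' '\\f0' "}'" 'some other text')

insert='\\cf2 Its red now'

before=${str2%\}*}    # shortest suffix starting at a } removed: text before the last }
after=${str2##*\}}    # longest prefix ending in } removed: text after the last }

result="${before}${insert}
}${after}"

printf '%s\n' "$result"
```

Only the last } is affected, wherever it sits in the string, and the inserted text is never interpreted as a regular expression or replacement pattern.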

In awk:
$ awk ' BEGIN { RS=OFS=FS="" } $NF="\\\\cf2 Its red now\n}"' file
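RS="" switches awk to paragraph mode, where records are separated by blank lines, so a file without blank lines is read as a single record (change it to suit your needs)
OFS=FS="" puts each character into its own field (an empty FS behaves this way in GNU awk; POSIX leaves it unspecified)
$NF="\\\\cf2 Its red now\n}" replaces the last field, that is the last character of the record (here }), with the quoted text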

awk '{sub(/\\f0/,"\\f0\n\\\\\cf2 Its red now")}1' file
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cf2 Its red now
}'

Related

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -E (extended regex support) enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
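\| - escaped pipe; with -E an unescaped | means alternation, so the backslash tells sed to match a literal pipe. The other pipes need no escaping because they sit inside a bracket expression ([^|]) or in the replacement text, where | is always literal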
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into the array map using the split function with . as the delimiter. Then strip the leading and trailing "**" from the first element of the array with the substr function to expose the module name as inmod. Compare this to the mod variable and, if there is a match, change the 3rd delimited field to the variable vers. Print every line with the shorthand 1.
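An unescaped pipe is only special in extended regular expressions (sed -E). In a basic regular expression a plain | is literal, while GNU sed treats the escaped \| as alternation. Your pattern therefore begins with an empty alternative, which matches at the start of the line, and the replacement is inserted there; that is the duplicate you see.
There's no reason to use extended regex here; stick with basic regex: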
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt
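The basic-regex approach can be checked on a single sample line. This is a small sketch, assuming a POSIX shell and sed; module and version come from the question, while the line variable and the [^|]* pattern (which avoids hard-coding the old version number) are illustrative additions:

```shell
module='barak'
version='2001.01.2012031541'
line='|**barak**.version|2001.0132012031539|'

# In a basic regular expression a plain | is literal; only * and . are escaped here.
# The s command replaces whatever sits between the last two pipes.
updated=$(printf '%s\n' "$line" |
    sed "/|\*\*$module\*\*\.version|/ s/|[^|]*|\$/|$version|/")

printf '%s\n' "$updated"
```

The sketch assumes that $module and $version contain no characters that are special to sed (such as /, & or \); values that do would need escaping first.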

How to replace last n characters in the kth occurrence of a line containing a certain substring using sed or awk?

Suppose I have a file that resembles the following format:
\\ Random other lines \\
...
27861NA+ NA+89122 13.480 11.554 10.082
27862NA+ NA+89123 2.166 5.896 10.108
27863NA+ NA+89124 8.289 6.843 3.090
27864NA+ NA+89125 12.972 5.936 4.498
27865CL- CL-89126 13.914 2.125 12.915
27866CL- CL-89127 12.050 13.907 3.559
...
\\ Random other lines \\
I am trying to find a way of replacing the last 24 characters of each line with a string that I have prepared, for the first 3 instances of lines in the file that contain the string "NA+".
For example, my output would ideally look like:
\\ Random other lines \\
...
27861NA+ NA+89122 my first string hello
27862NA+ NA+89123 my second string foo
27863NA+ NA+89124 my final string bar $$
27864NA+ NA+89125 12.972 5.936 4.498
27865CL- CL-89126 13.914 2.125 12.915
27866CL- CL-89127 12.050 13.907 3.559
...
\\ Random other lines \\
So far, I have found a sed command that will remove the last 24 characters from every line in the file:
sed 's/.\{24\}$//' myfile.txt
And also an awk command that will return the kth line that contains the desired substring:
awk '/NA+/{i++}i==1' myfile.txt
Does anyone have an idea about how I could replace the last 24 characters in the 1st, 2nd, and 3rd lines of my file that each contain a certain substring?
With single awk:
awk -v str="my string" '!f && /NA\+/{ f=1; n=NR+3 }n && n>NR{ $4=$5=""; $3=str }1' myfile.txt
string="my first string hello"
awk -v string="$string" '/NA\+/ && ++cnt < 4 { print substr($0, 1, length($0) - 24) string; next } { print }' myfile.txt
Using awk, set a string and pass it to awk with -v. For each line containing NA+, increment the variable cnt. While cnt is less than 4, print everything but the last 24 characters of that line, followed by the string. Print every other line unchanged.
This might work for you (GNU sed):
sed '/NA+/{x;s/\n/&/3;x;ta;H;s/.\{24\}$/some string/;b;:a;n;ba}' file
This uses the hold space (HS) to keep a count of the number of lines the script has seen of the required string (NA+). Once it has seen n (in this case n=3) such lines it just prints the remainder of the file.
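The question asks for a different replacement on each of the first three matching lines. One way to sketch that in POSIX awk is to pass the replacement strings as extra arguments and pull them out of ARGV before any file is read. The function name replace_tails and the example strings are made up for illustration; the 24-character width is taken from the question:

```shell
replace_tails() {
    # $1: input file; remaining arguments: one replacement string per matching line
    file=$1
    shift
    awk -v n="$#" '
        BEGIN {
            # ARGV[1] is the file; the replacement strings follow it.
            # Blanking them keeps awk from trying to open them as files.
            for (i = 1; i <= n; i++) { repl[i] = ARGV[i + 1]; ARGV[i + 1] = "" }
        }
        index($0, "NA+") && ++seen <= n {
            # keep everything except the last 24 characters, then append the new text
            $0 = substr($0, 1, length($0) - 24) repl[seen]
        }
        { print }
    ' "$file" "$@"
}

# Example call (assumes myfile.txt exists):
# replace_tails myfile.txt ' my first string hello' ' my second string foo' ' my final string bar $$'
```

index() is used instead of a regex so that the + in NA+ is matched literally, and the strings travel as arguments rather than through -v so that backslashes in them are left alone.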

How to select text in a file until a certain string using grep, sed or awk?

I have a huge file (this is just a sample) and I would like to select all lines with "Ph_gUFAC1083" and everything after them until reaching a header that doesn't have the code (in this example Ph_gUFAC1139)
>uce_353_Ph_gUFAC1083 |uce_353
TTTAGCCATAGAAATGCAGAAATAATTAGAAGTGCCATTGTGTACAGTGCCTTCTGGACT
GGGCTGAAGGTGAAGGAGAAAGTATCATACTATCCTTGTCAGCTGCAAGGGTAATTACTG
CTGGCTGAAATTACTCAACATTTGTTTATAAGCTCCCCAGAGCATGCTGTAAATAGATTG
TCTGTTATAGTCCAATCACATTAAAACGCTGCTCCTTGCAAACTGCTACCTCCTGTTTTC
TGTAAGCTAGACAGAGAAAGCCTGCTGCTCACTTACTGAGCACCAAGCACTGAAGAGCTA
TGTTTAATGTGATTGTTTTCATTAGCTCTTCTCTGTCTGATATTACATTTATAATTTGCT
GGGCTTGAAGACTGGCATGTTGCATTGCTTTCATTTACTGTAGTAAGAGTGAATAGCTCT
AT
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
CATGGAAAACGAGGAAAAGCCATATCTTCCAGGCCATTAATATTACTACGGAGACGTCTT
CATATCGCCGTAATTACAGCAGATCTCAAAGTGGCACAACCAAGACCAGCACCAAAGCTA
AAATAACTCGCAGGAGCAGGCGAGCTGCTTTTGCAGCCCTCAGTCCCAGAAATGCTCGGT
AGCTTTTCTTAAAATAGACAGCCTGTAAATAAGGTCTGTGAACTCAATTGAAGGTGGCTG
TTTCTGAATTAGTCAGCCCTCACAAGGCTCTCGGCCTACATGCTAGTACATAAATTGTCC
ACTTTACCACCAGACAAGAAAGATTAGAGTAATAAACACGGGGCATTAGCTCAGCTAGAG
AAACACACCAGCCGTTACGCACACGCGGGATTGCCAAGAACTGTTAACCCCACTCTCCAG
AAACGCACACAAAAAAACAAGTTAAAGCCATGACATCATGGGAA
>uce_4300_Ph_gUFAC1139 |uce_4300
ATTAAAAATACAATCCTCATGTTTGCATTTTGCAGTCGTCAACAAGAAATTGAAGAGAAA
CTCATAGAGGAAGAAACTGCTCGAAGGGTGGAAGAACTTGTAGCTAAACGCGTGGAAGAA
GAGCTGGAGAAAAGAAAGGATGAGATTGAGCGAGAGGTTCTCCGCAGGGTGGAGGAGGCT
AAGCGCATCATGGAAAAACAGTTGCTCGAAGAACTCGAGCGACAGCGACAAGCTGAACTT
GCAGCACAAAAAGCCAGAGAGGTAACGCTCGGTCGTTTGGAAAGTAGAGACAGTCCATGG
CAAAACTTTCAGTGTCGGTTTGTGCCTCCTGTTCGGTTCAGAAAGAGATGGAATACAGCA
AATCTAATTCCCTTCTCATATAAACTTGCATTGCTGCGAAACTTAATTTCTAGCCTATTC
AGAGGAGCTCACTGATATTTAAACAGTTACTCTCCTAAAACCTGAACAAGGATACTTGAT
TCTTAATGGAACTGACCTACATATTTCAGAATTGTTTGAAACTTTTGCCATGGCTGCAGG
ATTATTCAGCAGTCCTTTCATTTT
>uce_1039_Ph_gUFAC1139 |uce_1039
ATTAGTGGAATACAAATATGCAAAAACCAAACAGTTTGGTGCTATAATGTGAAAAGAAAT
TTACACCAATCTTATTTTTAATTTGTATGGGAACATTTTTACCACAAATTCCATATTTTA
ATAATACTATCCCAACTCTATTTTTTAGACTCATTTTGTCACTGTTTTGTAACAGAAACA
CTGTAAATATTATAGATGTGGTAAACTATTATACTTGTTTTCTTATAAATGAAATGATCT
GTGCCAACACTGACAAAATGAATTAATGTGTTACTAAGGCAACAGTCACATTATATGCTT
TCTCTTTCACAGTATGCGGTAGAGCATATGGTTTACTCTTAATGGAACACTAGCTTCTCA
TTAACATACCAGTAGCAATGTCAGAACTTACAAACCAGCATAACAGAGAAATGGAAAAAC
TTATAAATTAGACCCTTTCAGTATTATTGAGTAGAAAATGACTGATGTTCCAAGGTACAA
TATTTAGCTAATACAGTGCCCTTTTCTGCATCTTTCTTCTCAAAGGAAAAAAAAATCCTC
AAAAAAAACCAGAGCAAGAAACCTAACTTTTTCTTGT
I already tried several alternatives without success, the closest I reached was
sed -n '/Ph_gUFAC1083/, />/p' file.txt
that gave me that:
>uce_2347_Ph_gUFAC1083 |uce_2347
GCTTTTCTATGCAGATTTTTTCTAATTCTCTCCCTCCCCTTGCTTCTGTCAGTGTGAAGC
CCACACTAAGCATTAACAGTATTAAAAAGAGTGTTATCTATTAGTTCAATTAGACATCAG
ACATTTACTTTCCAATGTATTTGAAGACTGATTTGATTTGGGTCCAATCATTTAAAAATA
AGAGAGCAGAACTGTGTACAGAGCTGTGTACAGATATCTGTAGCTCTGAAGTCTTAATTG
CAAATTCAGATAAGGATTAGAAGGGGCTGTATCTCTGTAGACCAAAGGTATTTGCTAATA
CCTGAGATATAAAAGTGGTTAAATTCAATATTTACTAATTTAGGATTTCCACTTTGGATT
TTGATTAAGCTTTTTGGTTGAAAACCCCACATTATTAAGCTGTGATGAGGGAAAAAGCAA
CTCTTTCATAAGCCTCACTTTAACGCTTTATTTCAAATAATTTATTTTGGACCTTCTAAA
G
>uce_353_Ph_gUFAC1083 |uce_353
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
Do you know how to do it using grep, sed or awk?
Thx
$ awk '/^>/{if(match($0,"Ph_gUFAC1083")){s=1} else s=0}s' file
I used a simple criterion for your request:
If a line starts with >, check whether it contains "Ph_gUFAC1083"; if it does, set s=1, otherwise set s=0.
For a line that doesn't start with >, the value of s is retained.
The final s in the awk command decides whether the line is printed (s=1) or not (s=0).
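The flag technique described above can be wrapped so that the code to search for is a parameter. This is a sketch, assuming a POSIX shell and awk; the function name select_blocks is hypothetical, and index() is used so the code is matched as a literal string rather than a regex:

```shell
select_blocks() {
    # $1: code to look for in header lines, $2: FASTA-style file
    awk -v code="$1" '
        /^>/ { keep = (index($0, code) > 0) }   # header line: decide for the whole block
        keep                                    # print every line while the flag is set
    ' "$2"
}

# Example call (assumes file.txt exists):
# select_blocks Ph_gUFAC1083 file.txt
```

Sequence lines never change the flag, so a block is printed or skipped as a whole, based only on its header.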
If what you want is every line with Ph_gUFAC1139 plus block of lines after that line until the next line starting with >, then the following awk snippet might do:
$ awk 'BEGIN {RS=ORS=">"} /Ph_gUFAC1139/' file.txt
This uses the > character as a record separator, then simply displays records that contain the text you're interested in.
If you wanted to be able to provide the search string using a variable, you'd do it something like this:
$ val="Ph_gUFAC1139"
$ awk -v s="$val" 'BEGIN {RS=ORS=">"} $0 ~ s' file.txt
UPDATE
A comment mentions that the solution above shows trailing record separators rather than leading ones. You can adapt your output to match your input by reversing this order manually:
awk 'BEGIN { RS=ORS=">" } /Ph_gUFAC1139/ { printf "%s%s",ORS,$0 }' file.txt
Note that in the initial examples, a "match" of the regex invokes awk's default "action", which is to print the record. The default action is used when no action is specified in the script. The code immediately above includes an explicit action, which prints the record preceded by the separator.
This might work for you (GNU sed):
sed '/^>/h;G;/Ph_gUFAC1083/P;d' file
Store each line beginning with > in the hold space (HS) and then append the HS to every line. If any line contains the string Ph_gUFAC1083, print the first line in the pattern space (PS) and discard everything else.
N.B. the regexp for the match may be amended to /\n.*Ph_gUFAC1083/ if the string match may occur in any line.
This program prints every block whose header line contains Ph_gUFAC1083 and stops printing at the next header line that does not contain it (Ph_gUFAC1139 in the sample):
cat inp.txt |
awk '
BEGIN { begin = 0 }
{
    # Pass blank lines through unchanged
    if ($0 ~ /^$/) {
        print $0
        next
    }
    # A header line decides whether the block that follows is printed
    if ($0 ~ /^>/) {
        begin = ($0 ~ /Ph_gUFAC1083/) ? 1 : 0
    }
    # Print header and sequence lines while inside a wanted block
    if (begin == 1) {
        print $0
    }
}'

insert a string at specific position in a file by SED awk

I have a string which I need to insert at a specific position in a file.
The file contains multiple semicolons (;) and I need to insert the string just before the last ";".
Is this possible with sed?
Please post an explanation with the command, as I am new to shell scripting.
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line; the ; then makes the regexp backtrack to the last ;. The \(.*\) also creates the back reference \1. Put together, the RHS of the s command inserts jjjjj before the last ;.
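Since the question keeps the text to insert in a variable, the same greedy match can be driven from the shell. This is a minimal sketch, assuming a POSIX shell and that the inserted text contains nothing special in a sed replacement (no &, \ or /); the variable names are illustrative:

```shell
string='jjjjj'

# Double quotes let the shell expand $string; \( \) and \1 pass through to sed unchanged
result=$(printf '%s\n' 'adad;sfs;sdfsf;fsdfs' | sed "s/\(.*\);/\1 $string;/")

printf '%s\n' "$result"
```

If the text might contain such characters, it would have to be escaped before being placed in the replacement.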
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;", repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r enables extended regular expressions
# is the delimiter; the usual / separator can be replaced by any other character
we match a ;, then one or more characters that are not ;, then the final ; ($ means end of the line)
the part between the two ; is captured with ()
finally, we replace the match with "; jjjjj;" followed by the captured part
Edit: POSIX version (more portable):
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;]\{1,\}\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed is greedy by default and will pick as many characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
UPDATE: the sample input no longer has a ";" at the end, so something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts by setting FS (field separator) and OFS (output field separator) to a semicolon. NF in awk stands for the number of fields, so $(NF-1) is the second-to-last field. In this awk command, $(NF-1)=$(NF-1) " jjjjj" simply appends jjjjj to the second-to-last field.

Replacing quotation marks with "``" and "''"

I have a document containing many " marks, but I want to convert it for use in TeX.
TeX uses 2 ` marks for the beginning quote mark, and 2 ' mark for the closing quote mark.
I only want to make changes to these when " appears on a single line in an even number (e.g. there are 2, 4, or 6 "'s on the line). For e.g.
"This line has 2 quotation marks."
--> ``This line has 2 quotation marks.''
"This line," said the spider, "Has 4 quotation marks."
--> ``This line,'' said the spider, ``Has 4 quotation marks.''
"This line," said the spider, must have a problem, because there are 3 quotation marks."
--> (unchanged)
My sentences never break across lines, so there is no need to check on multiple lines.
There are few quotes with single quotes, so I can manually change those.
How can I convert these?
This is my one-liner, which works for me:
awk -F\" '{if((NF-1)%2==0){res=$0;for(i=1;i<NF;i++){to="``";if(i%2==0){to="'\'\''"}res=gensub("\"", to, 1, res)};print res}else{print}}' input.txt >output.txt
And here is the long version of this one-liner with comments:
BEGIN {
    FS = "\""                           # set field separator to double quote before the first line is read
}
{
    if ((NF-1) % 2 == 0) {              # if the number of double quotes in the line is even
        res = $0                        # save original line to res variable
        for (i = 1; i < NF; i++) {      # for each double quote
            to = "``"                   # replace current occurrence of double quote by ``
            if (i % 2 == 0) {           # if it is a closing quote, replace by ''
                to = "''"
            }
            # replace the first remaining " by to in res and save the result to res
            # (gensub is a GNU awk function)
            res = gensub("\"", to, 1, res)
        }
        print res                       # print resulting line
    } else {
        print                           # print original line when nothing to change
    }
}
You may run this script by:
awk -f replace-quotes.awk input.txt >output.txt
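gensub is specific to GNU awk. The same idea can be sketched without it by splitting each line on the double quote and gluing the fields back together with alternating TeX quotes. The function name tex_quotes is made up for this example, and the sketch assumes a POSIX shell and awk:

```shell
tex_quotes() {
    # Reads text on stdin; q carries a single quote into the awk program
    awk -F'"' -v q="'" '
        NF > 1 && NF % 2 == 1 {             # NF-1 quotes on the line, so odd NF means an even count
            out = $1
            for (i = 2; i <= NF; i++)       # field i follows the (i-1)th double quote
                out = out (i % 2 == 0 ? "``" : q q) $i
            $0 = out
        }
        { print }
    '
}

# Example call (assumes input.txt exists):
# tex_quotes < input.txt > output.txt
```

Lines with an odd number of double quotes have an even field count and fall through to the plain print, so they are left unchanged.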
Here's my one-liner using repeated sed's:
cat file.txt | sed -e 's/"\([^"]*\)"/`\1`/g' | sed '/"/s/`/\"/g' | sed -e 's/`\([^`]*\)`/``\1'\'''\''/g'
(note: it won't work correctly if there are already back-ticks (`) in the file but otherwise should do the trick)
EDIT:
Removed back-tick bug by simplifying, now works for all cases:
cat file.txt | sed -e 's/"\([^"]*\)"/``\1'\'\''/g' | sed '/"/s/``/"/g' | sed '/"/s/'\'\''/"/g'
With comments:
cat file.txt |                          # read file
sed -e 's/"\([^"]*\)"/``\1'\'\''/g' |   # initial replace
sed '/"/s/``/"/g' |                     # revert `` to " on lines with extra "
sed '/"/s/'\'\''/"/g'                   # revert '' to " on lines with extra "
Using awk
awk '{n=gsub("\"","&")}!(n%2){while(n--){Q=(n%2?"`":q);sub("\"",Q Q)}}1' q=\' in
Explanation
awk '{
    n = gsub("\"", "&")          # set n to the number of double quotes in the current line
}
!(n % 2) {                       # if the number of quotes is even
    while (n--) {                # once per double quote; n is already decremented inside the body
        Q = (n % 2 ? "`" : q)    # odd n: backtick (opening), even n: single quote (closing)
        sub("\"", Q Q)           # replace the next double quote with two of whatever Q is
    }
}
1                                # print every line, changed or not
' q=\' in                        # set the q variable to a single quote and pass the file "in" as input
Using sed
sed '/^\([^"]*"[^"]*"[^"]*\)*$/s/"\([^"]*\)"/``\1'\'\''/g' in
This might work for you:
sed 'h;s/"\([^"]*\)"/``\1'\'\''/g;/"/g' file
Explanation:
Make a copy of the original line h
Replace pairs of "'s s/"\([^"]*\)"/``\1'\'\''/g
Check for odd " and if found revert to original line /"/g
