How to replace characters within a substring with sed or awk? - bash

I need to replace special characters from some file names (and only file names) in an HTML document. I know how to replace special characters in the whole text with tr or sed, I know how to replace the file name with another given string with sed (e.g. 's,src="\([^"]*\)",src="newprefixtofilename_\1"'), but I am not sure sed can in some way match characters inside what I get in \1?
If sed is not able to do this, how can I do it e.g. with awk? It is probably possible to isolate the " delimited strings that are prefixed with src= and do a gsub on these only. I can assume that src= appears only in tags (so no "real" html parsing) and that there is only one string to match per file line.
Example input line:
<img src="spécial.png"> Spécial
<img src="piètre.png"> Some text including "piètre"
Desired output with [éî] replaced by [ei] only in filenames:
<img src="special.png"> Spécial
<img src="pietre.png"> Some text including "piètre"

You can't do this with sed directly (I don't know about awk, though). First create a secondary file in which every non-ASCII character is transliterated to its ASCII equivalent, then parse the differences between the two files and replace accordingly.
I strongly suggest trying it on test data first.
# Transliterate non-ASCII characters to ASCII
$ iconv -f utf-8 -t ascii//translit files.html > tmp.txt
# Create arrays (IFS if files have spaces, otherwise redundant)
$ IFS=$'\n'
$ FROM=($(diff files.html tmp.txt | grep '^<.*<img' | sed -r 's/.*src="([^"]*)".*/\1/'))
$ TO=($(diff files.html tmp.txt | grep '^>.*<img' | sed -r 's/.*src="([^"]*)".*/\1/'))
# Rename files (mv spécial.png special.png)
$ for ((i=0; i < ${#FROM[@]}; i++)); do mv "${FROM[$i]}" "${TO[$i]}"; done
# Change html src attributes
$ for ((i=0; i < ${#FROM[@]}; i++)); do sed -i "s/${FROM[$i]}/${TO[$i]}/" files.html; done
# End result
$ cat files.html
<img src="special.png"> Spécial
<img src="pietre.png"> Some text including "piètre"

Restating the requirement: replace special characters (é->e, î->i) only inside src="..." tokens.
Assuming the HTML is reasonably formatted (more specifically, each full img tag is on one line), this can be achieved by replacing each special character with its own 's' command.
The first expression handles é->e, the second î->i:
sed -e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g'
The above solution will not handle a src that contains the same special character more than once (e.g., src="xîzîtîFi.png"). If this is an issue, and a small fixed number of repeats is acceptable (3 in the example below), then repeat each expression:
# é->e
sed -e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g'
I'm sure it is possible to use labels and branching to repeat the above substitution in a loop, and so handle an unlimited number of special characters.
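A minimal sketch of that label/branch idea, assuming a sed that supports the `:label` and `t` commands (GNU and BSD sed both do). The function name `strip_src_accents` is hypothetical, and è is added to the list only because the sample input contains it. Each pass rewrites one accented character inside a src="..." token, and `t a` jumps back to the label as long as any substitution succeeded:

```shell
# Hypothetical helper: reads HTML on stdin, writes the result to stdout.
strip_src_accents() {
  # :a marks the top of the loop. Each s command replaces ONE accented
  # character inside a src="..." token. 't a' branches back to :a if any
  # substitution succeeded since the last branch, so the loop only ends
  # once no accented character is left inside any src attribute.
  sed -e ':a' \
      -e 's,\(src="[^"]*\)é\([^"]*"\),\1e\2,' \
      -e 's,\(src="[^"]*\)è\([^"]*"\),\1e\2,' \
      -e 's,\(src="[^"]*\)î\([^"]*"\),\1i\2,' \
      -e 't a'
}

printf '%s\n' '<img src="xîzîtîFi.png"> Spécial' | strip_src_accents
```

Text outside the src attribute is left alone because every pattern is anchored on the literal src=" prefix and cannot cross a closing quote.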
Renaming files
The other part of the problem (renaming the files themselves) can use sed's transliterate command 'y'. Something like:
for file in FILELIST ; do
new_name=$(echo "$file" | sed -e 'y/éî/ei/')
if [ "$file" != "$new_name" ] ; then
mv "$file" "$new_name"
fi
done

Related

How to replace a particular string with another in UNIX shell script

Could you please let me know how to replace a particular string present in a text file or ksh file in the server with another string ?
For example :-
I have 10 files present in the path /file_sys/file in which i have to replace the word "BILL" to "BILLING" in all the 10 files.
Works for me:
I created a file 'test' with this content: "This is a simple test". Now I execute this call to the sed command:
sed -i 's/ is / is not /' test
Afterwards the file 'test' contains this content: "This is not a simple test"
If your sed utility does not support the -i flag, then there is a somewhat awkward workaround:
sed 's/ is / is not /' test > tmp_test && mv tmp_test test
This should work. Please find the testing as well.
$ cat > file1
I am a BILL boy
$ sed 's/\([[:space:][:punct:]]\)BILL\([[:space:][:punct:]]\)/\1BILLING\2/g' file1 > file2
$ cat file2
I am a BILLING boy
Using sed:
sed 's/\bBILL\b/BILLING/g' file
For inplace:
sed --in-place 's/\bBILL\b/BILLING/g' file
A little for loop might assist for dealing with multiple files, and here I'm assuming -i option is not available:
for file in $(grep -wl BILL /file_sys/file/*); do
echo "$file"
sed -e 's/\bBILL\b/BILLING/g' "$file" > tmp
mv tmp "$file"
done
Here's what's happening:
grep -w: search for all (and only) files containing the whole word BILL
grep -l: list the file names (rather than the matching content)
$(....): execute what's inside the parentheses (command substitution)
for file in: loop over each item in the list (each file with BILL in it)
echo $file: print each file name we loop over
sed command: replace the word BILL (here, specifically delimited with word boundaries "\b") with BILLING, writing into a tmp file
mv command: move the tmp file back to the original name (replacing the original)
You can easily test this without actually changing anything - e.g. just print the file name, or just print the contents (to make sure you've got what you expect before replacing the original files).
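One way to do such a dry run, as a rough sketch: the function name `preview_bill_changes` is made up for illustration, and `\b` assumes GNU sed as in the commands above. It prints each affected file name followed by only the lines that would change, and never writes to the files:

```shell
# Hypothetical helper: preview_bill_changes DIR
preview_bill_changes() {
  dir=$1
  for file in "$dir"/*; do
    # Only look at regular files that contain the whole word BILL.
    [ -f "$file" ] || continue
    grep -qw BILL "$file" || continue
    echo "== $file"
    # -n plus the p flag prints only lines where a substitution happened;
    # without -i nothing is written back to the file.
    sed -n 's/\bBILL\b/BILLING/gp' "$file"
  done
}

# Example call (path taken from the question):
# preview_bill_changes /file_sys/file
```

Once the preview looks right, the same sed expression can be used in the loop that actually rewrites the files.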

sed not writing to file

I am having trouble using sed to substitute values and write to a new file. It writes to a new file, but fails to change any values. Here is my code:
cd/mydirectory
echo "Enter file name:"
read file_input
file1= "$file_input"
file1= "$file1.b"
file2= "$file_input"
file2= "${file2}Ins.b"
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
I simply want to substitute whatever text was after cats with the value 300. Whenever I run this script it doesn't overwrite the previous value with 300. Any suggestions?
Try changing
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
to
sed "s/cats.*/cats300/g" $file1 > $file2
To replace text, you typically use sed like sed "s/foo/bar/g" file_in > file_out, which replaces all occurrences of foo with bar in file_in and redirects the output to file_out.
Edit
I noticed that you are redirecting the output to the same file - you can't do that. You have 2 options:
Redirect the results to another file, with a different filename. e.g.:
sed "s/cats.*/cats300/g" $file1 > $file2.tmp
Note the .tmp after $file2
Use the -i flag (if using GNU sed):
sed -i "s/cats.*/cats300/g" $file1
The -i stands for in-place editing.
I think this modified version of your script should work:
echo "Enter file name:"
read file_input
file1="$file_input" # No space after '='
file1="$file1.b" # No space after '='
file2="$file_input" # No space after '='
file2="${file2}Ins.b" # No space after '='
sed 's/!cats!.*/!cats!300!/g' "$file1" > "$file2"
Note the single quotes around the sed expression: with them, there's no need to escape the !s in your expression. Note also the double quotes around "$file1" and "$file2": if one of those variables contains spaces, this will prevent your command from breaking.
Some further remarks:
As pointed by jim, you may want to use the GNU sed -i option.
Your regex will currently replace everything after !cats! in matching lines. If there are several occurrences of !cats! on a line, only one will remain. If instead you just want to replace the value between two ! delimiters, consider using the following sed command:
sed 's/!cats![^!]*/!cats!300/g'
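To make the difference concrete, here is a small comparison on a made-up input line (the sample data and the two function names are assumptions for illustration only):

```shell
# Hypothetical helpers; both read stdin.
replace_to_end_of_line() { sed 's/!cats!.*/!cats!300!/g'; }
replace_one_field()      { sed 's/!cats![^!]*/!cats!300/g'; }

line='a!cats!12!dogs!7!'

# .* is greedy, so everything after !cats! is swallowed:
printf '%s\n' "$line" | replace_to_end_of_line   # prints: a!cats!300!

# [^!]* stops at the next !, so the other fields survive:
printf '%s\n' "$line" | replace_one_field        # prints: a!cats!300!dogs!7!
```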

How to delete the string which is present in parameter from file in unix

I have redirected some string into one parameter for ex: ab=jyoti,priya, pranit
I have one file : Name.txt which contains -
jyoti
prathmesh
John
Kelvin
pranit
I want to delete the records from Name.txt that are contained in the ab parameter.
Please suggest if this can be done ?
If ab is a shell variable, you can easily turn it into an extended regular expression, and use it with grep -E:
grep -E -x -v "${ab//,/|}" Name.txt
The string substitution ${ab//,/|} returns the value of $ab with every , substituted with a | which turns it into an extended regular expression, suitable for passing as an argument to grep -E.
The -v option says to remove matching lines.
The -x option specifies that the match needs to cover the whole input line, so that a short substring will not cause an entire longer line to be removed. Without it, ab=prat would cause prathmesh to be removed.
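A small sketch of the same filter wrapped in a function, to show the effect of -x. The function name `remove_listed` is hypothetical, and `tr` is used here as a portable stand-in for the bash-only expansion ${ab//,/|}; it assumes the list contains no spaces or regex metacharacters:

```shell
# Hypothetical helper: remove_listed LIST FILE
remove_listed() {
  list=$1   # comma-separated values, e.g. "jyoti,priya,pranit"
  file=$2
  # Turn "a,b,c" into the extended regex "a|b|c"
  # (portable equivalent of the bash expansion ${ab//,/|}).
  pattern=$(printf '%s' "$list" | tr ',' '|')
  # -x: whole-line matches only; -v: print the lines that do NOT match.
  grep -E -x -v "$pattern" "$file"
}

# Example, using the file from the question:
# remove_listed "jyoti,priya,pranit" Name.txt
# would print prathmesh, John and Kelvin, one per line.
```

With a list such as "prat", nothing is removed, because -x requires the whole line to match.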
If you really require a sed solution, the transformation should be fairly trivial. grep -E -v -x 'aaa|bbb|ccc' is equivalent to sed '/^\(aaa\|bbb\|ccc\)$/d' (with some dialects disliking the backslashes, and others requiring them).
To do an in-place edit (modify Name.txt without a temporary file), try this:
sed -i "/^\(${ab//,/\|}\)\$/d" Name.txt
This is not entirely robust against strings containing whitespace or other shell metacharacters, but if you just need
Try with
sed -e 's/\bjyoti\b//g;s/\bpriya\b//g' < Name.txt
(using \b assuming you need word boundaries)
this will do it:
for param in $(echo "$ab" | sed -e 's/ //g' -e 's/,/ /g') ; do res=$(sed -e "/^$param\$/d" Name.txt); echo "$res" > Name.txt; done
echo "$res"

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - option, which tells sed to read its script from standard input. If that is the case, you can create a temporary script like this instead:
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
Another approach, if you don't feel very confident with sed and think you will have forgotten in a week what those voodoo symbols mean, could be to use IFS in a more efficient way:
while IFS=":" read -r PATTERN REPLACEMENT # Read each line of the file, splitting the fields on ":"
do
sed -i~ "s/${PATTERN}/${REPLACEMENT}/g" *.js # -i~ keeps a backup and is accepted by both GNU and BSD sed
done < myFile.txt
The only pitfall I can see (there may be more) is that if either PATTERN or REPLACEMENT contains a slash (/), it will break your sed expression.
You can change the sed separator to a non-printable character and you should be safe.
Anyway, if you know what's in your myFile.txt, you can just use any separator that does not occur in it.
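A minimal sketch of the non-printable separator idea, using the control character \001 as the delimiter of the s command. The function name `replace_all` is hypothetical, and the values are assumed to be single-line strings:

```shell
# Hypothetical helper: replace_all PATTERN REPLACEMENT   (reads stdin)
replace_all() {
  # \001 is very unlikely to appear in real patterns, so slashes in
  # PATTERN or REPLACEMENT no longer terminate the s command early.
  sep=$(printf '\001')
  sed "s${sep}$1${sep}$2${sep}g"
}

printf '%s\n' 'src/app.js' | replace_all 'src/app' 'lib/main'   # prints: lib/main.js
```

This only addresses the delimiter: regex metacharacters in the pattern, and & or backslashes in the replacement, are still interpreted by sed.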

Unexpected substitution for & with sed

I have a CSV file containing some special characters and their HTML entity names
ex: htm.csv
À,&Agrave;
Á,&Aacute;
Â,&Acirc;
Ã,&Atilde;
É,&Eacute;
Ê,&Ecirc;
Í,&Iacute;
Ó,&Oacute;
Ô,&Ocirc;
Õ,&Otilde;
and I have a number of .php files where these special characters are present. I have written a shell script
#!/bin/bash
IFS=","
while read orig html
do
for fl in *.php; do
mv $fl $fl.old
sed 's/'$orig'/'$html'/g' $fl.old > $fl
done
done< "htm.csv"
but the problem is when using the contents of $html, it is printing the contents of $orig instead of "&".
& is a special character meaning "the whole matched string" in the s/// command. Use \&.
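A quick illustration of the difference, with two throwaway functions whose names are made up for this example:

```shell
# Hypothetical helpers; both read stdin.
entity_unescaped() { sed 's/À/&Agrave;/g'; }    # & expands to the matched text
entity_escaped()   { sed 's/À/\&Agrave;/g'; }   # \& is a literal ampersand

printf '%s\n' 'À la carte' | entity_unescaped   # prints: ÀAgrave; la carte
printf '%s\n' 'À la carte' | entity_escaped     # prints: &Agrave; la carte
```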
Use any character as a command delimiter, here is an example:
sed -Ei "s|$k|$j|g" filename.txt
In addition to the special characters you can also make the commands a bit safer and shorter:
There's no need for mv if your sed supports -i (in-place replacement)
To avoid setting IFS for the rest of the commands you can limit its scope
Escape & in $html
The result:
#!/bin/bash
while IFS="," read orig html
do
for fl in *.php
do
sed -i 's/'$orig'/'${html//&/\\&}'/g' "$fl"
done
done < "htm.csv"
Please add an example if it doesn't work for you. There could be other special characters which would have to be escaped.
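If other special characters do show up in the replacement text, one possible approach is to escape them all before building the sed command. This is a sketch only: `escape_replacement` is a hypothetical helper, it assumes single-line values, and it assumes / is the delimiter of the final s command:

```shell
# Hypothetical helper: escape_replacement STRING
# Prefixes &, / and \ with a backslash so that sed treats them literally
# in the replacement part of s/.../.../.
escape_replacement() {
  # '|' is used as the delimiter here so that '/' can sit inside the
  # bracket expression without ending the s command.
  printf '%s\n' "$1" | sed 's|[&/\]|\\&|g'
}

html='&Agrave;'
safe=$(escape_replacement "$html")
printf '%s\n' 'À' | sed "s/À/$safe/g"   # prints: &Agrave;
```

The pattern side ($orig) would need its own escaping if it could contain regex metacharacters; that is not covered here.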
