Unexpected substitution for & with sed - shell

I have a CSV file containing some special characters and their HTML entity names
ex: htm.csv
À,&Agrave;
Á,&Aacute;
Â,&Acirc;
Ã,&Atilde;
É,&Eacute;
Ê,&Ecirc;
Í,&Iacute;
Ó,&Oacute;
Ô,&Ocirc;
Õ,&Otilde;
and I have a number of .php files where these special characters are present. I have written a shell script
#!/bin/bash
IFS=","
while read orig html
do
for fl in *.php; do
mv $fl $fl.old
sed 's/'$orig'/'$html'/g' $fl.old > $fl
done
done< "htm.csv"
The problem is that when the contents of $html are substituted in, sed prints the contents of $orig in place of each "&".

& is a special character meaning "the whole matched string" in the s/// command. Use \&.
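A minimal sketch of the difference, using one character/entity pair as sample input (the input text is only an illustration):

```shell
# Unescaped & in the replacement stands for the matched text (here: À)
printf 'À\n' | sed 's/À/&Agrave;/'     # prints: ÀAgrave;

# Escaped \& is a literal ampersand
printf 'À\n' | sed 's/À/\&Agrave;/'    # prints: &Agrave;
```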

You can use any character as the delimiter of the s command; here is an example:
sed -Ei "s|$k|$j|g" filename.txt

In addition to escaping the special characters, you can also make the commands a bit safer and shorter:
There's no need for mv if your sed supports -i (in-place replacement)
To avoid setting IFS for the rest of the commands, you can limit its scope to the read command
Escape & in $html
The result:
#!/bin/bash
while IFS="," read orig html
do
for fl in *.php
do
sed -i 's/'$orig'/'${html//&/\\&}'/g' "$fl"
done
done < "htm.csv"
Please add an example if it doesn't work for you. There could be other special characters which would have to be escaped.
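If the replacement text may contain other characters that are special on the right-hand side of s///, one option is to escape them with sed itself before building the command. The sketch below assumes / is the delimiter and that the replacement contains no newlines; escape_replacement is a hypothetical helper name, not part of the original script.

```shell
# escape_replacement is a hypothetical helper, not part of the original script.
# It prefixes &, / and \ with a backslash so that sed treats them literally
# in the replacement half of s/// (newlines in the input are not handled).
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[&/\\]|\\&|g'
}

# Example: substitute À with the literal text &Agrave;
html='&Agrave;'
printf 'À la carte\n' | sed "s/À/$(escape_replacement "$html")/g"
```

The search pattern ($orig) would need its own escaping if it could contain regex metacharacters; that is not covered here.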

Related

How to replace characters within a substring with sed or awk?

I need to replace special characters from some file names (and only file names) in an HTML document. I know how to replace special characters in the whole text with tr or sed, I know how to replace the file name with another given string with sed (e.g. 's,src="\([^"]*\)",src="newprefixtofilename_\1"'), but I am not sure sed can in some way match characters inside what I get in \1?
If sed is not able to do this, how can I do it e.g. with awk? It is probably possible to isolate the " delimited strings that are prefixed with src= and do a gsub on these only. I can assume that src= appears only in tags (so no "real" HTML parsing) and that there is only one string to match per file line.
Example input line:
<img src="spécial.png"> Spécial
<img src="piètre.png"> Some text including "piètre"
Desired output with [éî] replaced by [ei] only in filenames:
<img src="special.png"> Spécial
<img src="pietre.png"> Some text including "piètre"
You can't do this with sed directly (I don't know about awk, though). First you need to create a secondary file in which every non-ASCII character is replaced by its ASCII transliteration, then parse and replace the differences.
I strongly suggest trying it on test data first.
# Transliterate non-ASCII characters to ASCII
$ iconv -f utf-8 -t ascii//translit files.html > tmp.txt
# Create arrays (IFS if files have spaces, otherwise redundant)
$ IFS=$'\n'
$ FROM=($(diff files.html tmp.txt | grep '^<.*<img' | sed -r 's/.*src="([^"]*)".*/\1/'))
$ TO=($(diff files.html tmp.txt | grep '^>.*<img' | sed -r 's/.*src="([^"]*)".*/\1/'))
# Rename files (mv spécial.png special.png)
$ for ((i=0; i < ${#FROM[@]}; i++)); do mv "${FROM[$i]}" "${TO[$i]}"; done
# Change html src attributes
$ for ((i=0; i < ${#FROM[@]}; i++)); do sed -i "s/${FROM[$i]}/${TO[$i]}/" files.html; done
# End result
$ cat files.html
<img src="special.png"> Spécial
<img src="pietre.png"> Some text including "piètre"
Stating the requirement: replace special characters (é->e, î->i), only inside src="..." tokens.
Assuming the HTML files are reasonably formatted (more specifically, the full img tag is on one line), this can be achieved by replacing each of the special characters with its own s command.
First line é->e, second line î->i
sed -e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g'
The above solution will not handle a src that contains the same special character more than once (e.g., src="xîzîtîFi.png"). If this is an issue, and a small fixed number of repeats is acceptable (up to three in the example below), then repeat each command:
# é->e
sed -e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)é\([^"]*"\),src="\1e\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g' \
-e 's,src="\([^"]*\)î\([^"]*"\),src="\1i\2,g'
I'm sure it is possible to use labels and branching to perform the above substitution more elegantly and handle an unlimited number of occurrences.
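A possible sketch of the label/branch variant: each s command is retried with t for as long as it still finds a match, so the number of occurrences per src is not limited. The function name fix_src and the labels e and i are illustrative choices, and the sketch keeps the same assumption that each src="..." sits on one line.

```shell
# fix_src is a hypothetical wrapper: it reads HTML on stdin (or from file
# arguments) and writes the result to stdout.
fix_src() {
    # :e / te loops the é->e substitution until no é is left inside src="...";
    # :i / ti does the same for î->i. Text outside src="..." is not touched.
    sed -e ':e' -e 's,\(src="[^"]*\)é\([^"]*"\),\1e\2,' -e 'te' \
        -e ':i' -e 's,\(src="[^"]*\)î\([^"]*"\),\1i\2,' -e 'ti' "$@"
}

printf '%s\n' '<img src="xéyéz.png"> Spécial' | fix_src
```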
Renaming files
The other question (renaming the files) can leverage sed's transliterate command (y). Something like:
for file in FILELIST ; do
new_name=$(printf '%s\n' "$file" | sed -e 'y/éî/ei/')
if [ "$file" != "$new_name" ] ; then
mv -- "$file" "$new_name"
fi
done

What does the at sign before a dollar sign (@$VAR) do in a sed string in a shell script?

What does @$VAR mean in shell? I don't get the use of @ in this case.
I encountered the following shell file while working on my dotfiles repo
#!/usr/bin/env bash
KEY="$1"
VALUE="$2"
FILE="$3"
touch "$FILE"
if grep -q "$1=" "$FILE"; then
sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE"
else
echo "export $KEY=\"$VALUE\"" >> "$FILE"
fi
and I'm struggling with understanding the sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE" line, especially the use of @.
When using sed, you do not have to use a / character as the delimiter for the substitute action.
The @ or % characters are also perfectly fine options to use instead:
echo A | sed s/A/B/
echo A | sed s@A@B@
echo A | sed s%A%B%
In the command
sed "s@$KEY=.*@$KEY=\"$VALUE\"@" -i "$FILE"
the character @ is used as a delimiter in the s command of sed. The general form of the s (substitute) command is
s<delim><searchPattern><delim><replaceString><delim>[<flags>]
where the most commonly used <delim> is /, but other characters are sometimes used, especially when either <searchPattern> or <replaceString> contain (or might contain) slashes.
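A short sketch of why a delimiter other than / helps when the value is a path. The key name and the path are made-up sample values:

```shell
KEY=EDITOR_PATH               # sample key, chosen for illustration
VALUE=/usr/local/bin/vim      # sample value containing slashes

# With / as the delimiter, the slashes in $VALUE would end the replacement
# early and sed would reject the command. With @ they are ordinary characters.
printf 'export EDITOR_PATH="old"\n' | sed "s@$KEY=.*@$KEY=\"$VALUE\"@"
# prints: export EDITOR_PATH="/usr/local/bin/vim"
```

The chosen delimiter must in turn not appear in the key or the value.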

find and replace in place with grep and sed (and make a log for the files changed)

My script is as follow (variables are defined above by user input):
grep -RlI $OLD $PATH > $LIST
while read line
do
FILE=echo $line
sed -i '' -e 's|$OLD|$NEW|g' $FILE
done < $LIST
It seems to work except that sed fails because
"sed: -i may not be used with stdin"
What am I doing wrong? Maybe that's the wrong approach for what I am trying to do?
(which, by the way, is to replace occurrences of a string in many files, AND to create a file that lists all files that contain a match.)
Many thanks,
C
Try replacing
FILE=echo $line
with
FILE="$line"
sed is complaining because the $FILE variable doesn't contain anything, or just contains whitespace. Examine the contents of the file referenced by $LIST; make sure there are no empty lines or lines with just whitespace.
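For reference, a sketch of how the whole loop could look, under a few assumptions: the function name and its arguments are illustrative, the search string contains no | and no regex metacharacters that need escaping, and the log file lives outside the searched directory. It uses -i.bak (which keeps a backup of each file) because that form is accepted by both GNU and BSD sed. It also avoids a variable named PATH, since overwriting PATH stops the shell from finding grep and sed.

```shell
# replace_and_log OLD NEW DIR LIST  (hypothetical helper)
# Writes the names of all files under DIR that contain OLD to LIST,
# then replaces OLD with NEW in each of them.
replace_and_log() {
    old=$1; new=$2; dir=$3; list=$4
    # -R: recurse, -l: print only file names, -I: skip binary files
    grep -RlI -e "$old" "$dir" > "$list"
    while IFS= read -r file; do
        # Double quotes, so the shell expands $old and $new
        sed -i.bak -e "s|$old|$new|g" "$file"
    done < "$list"
}

# Usage (paths are placeholders):
#   replace_and_log 'old text' 'new text' ./site /tmp/changed-files.txt
```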
sed -i -r 's/\$[[:alnum:]]{32}-[[:digit:]]{8}\$[[:alnum:]+\.\_\-]{2,3}@[[:alnum:]+\.\_\-]*/****/' *.log
My variant, which replaces data like $1BC29B36F623BA82AAF6724FD3B16718-17082022$2sy@domain4.name with ****

sed not writing to file

I am having trouble using sed to substitute values and write to a new file. It writes to a new file, but fails to change any values. Here is my code:
cd/mydirectory
echo "Enter file name:"
read file_input
file1= "$file_input"
file1= "$file1.b"
file2= "$file_input"
file2= "${file2}Ins.b"
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
I simply want to substitute whatever text was after cats with the value 300. Whenever I run this script it doesn't overwrite the previous value with 300. Any suggestions?
Try changing
sed "/\!cats!/s/\!cats!.*/cats!300!/g $file1>$file2
to
sed "s/cats.*/cats300/g" $file1 > $file2
To replace text, you typically use sed like sed "s/foo/bar/g" file_in > file_out, which replaces all occurrences of foo with bar in file_in and redirects the output to file_out.
Edit
I noticed that you are redirecting the output to the same file - you can't do that. You have 2 options:
Redirect the results to another file, with a different filename. e.g.:
sed "s/cats.*/cats300/g" $file1 > $file2.tmp
Note the .tmp after $file2
Use the -i flag (if using GNU sed):
sed -i "s/cats.*/cats300/g" $file1
The -i stands for in-place editing.
I think this modified version of your script should work:
echo "Enter file name:"
read file_input
file1="$file_input" # No space after '='
file1="$file1.b" # No space after '='
file2="$file_input" # No space after '='
file2="${file2}Ins.b" # No space after '='
sed 's/!cats!.*/!cats!300!/g' "$file1" > "$file2"
Note the single quotes around sed expression: with them, there's no need to escape the !s in your expression. Note also the double quotes around "$file1" and "$file2": if one of those variables contain spaces, this will prevent your command from breaking.
Some further remarks:
As pointed out by jim, you may want to use the GNU sed -i option.
Your regex will currently replace everything after !cats! in matching lines. If there were several occurrences of !cats! on your line, only one would remain. If instead you just want to replace the value between two ! delimiters, you may consider using the following sed command instead:
sed 's/!cats![^!]*/!cats!300/g'
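To make the difference concrete, here is a small comparison on a made-up sample line (the !dogs! field is only there to show what gets lost):

```shell
line='a!cats!7!dogs!9!'   # sample input, not from the original file

# .* is greedy: everything after !cats! is replaced, including !dogs!9!
printf '%s\n' "$line" | sed 's/!cats!.*/!cats!300!/g'
# prints: a!cats!300!

# [^!]* stops at the next !, so only the value after !cats! changes
printf '%s\n' "$line" | sed 's/!cats![^!]*/!cats!300/g'
# prints: a!cats!300!dogs!9!
```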

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
Another approach, if you don't feel very confident with sed and think you will forget in a week what those voodoo symbols mean, is to use IFS in a more efficient way:
# Read the ":"-separated fields of each line in myFile.txt; IFS is set only for read
while IFS=":" read -r PATTERN REPLACEMENT
do
sed -i~ "s/${PATTERN}/${REPLACEMENT}/g" *.js
done < myFile.txt
The only pitfall I can see (there may be more) is that if either PATTERN or REPLACEMENT contains a slash (/), it will break your sed expression.
You can change the sed separator to a non-printable character and you should be safe.
Alternatively, if you know what is in myFile.txt, you can just use any character that does not appear in it.
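A sketch of the non-printable separator idea, assuming the byte 0x01 never occurs in myFile.txt and that the file has no blank lines. replace_pairs is a hypothetical helper name; -i~ keeps a backup with a ~ suffix and is accepted by both GNU and BSD sed.

```shell
# replace_pairs MAPFILE FILE...  (hypothetical helper)
# MAPFILE holds one from:to pair per line.
replace_pairs() {
    mapfile_path=$1; shift
    delim=$(printf '\001')     # control character used as the s delimiter
    while IFS=: read -r pattern replacement; do
        sed -i~ "s${delim}${pattern}${delim}${replacement}${delim}g" "$@"
    done < "$mapfile_path"
}

# Usage:
#   replace_pairs myFile.txt *.js
```

Slashes in either field are then harmless, although regex metacharacters in the pattern and & or \ in the replacement keep their special meaning.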
