replace words using grep and sed in shell - shell

I have some 150 files and in them I want to remove this following code:
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript" SRC="/height.js"></SCRIPT>
What I'm doing is:
sed -e 's/<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript" SRC="/height.js"></SCRIPT>/ /' file_names
This doesn't seem to work.
I want to remove this script from all the files in one go. How can I do that?

You have to worry about the slashes in the text you are replacing.
Either: use '\/' for each slash,
Or: cheat and use '.' to match any character at the point where the slash should appear.
The alternative exploits the improbability of a file containing the HTML. Theoretically, if you don't like the second alternative, you should also use '\.' at each point where '.' appears in the string you're looking at.
sed -e 's/<SCRIPT LANGUAGE="JavaScript" TYPE="text.javascript" SRC=".height.js"><.SCRIPT>/ /' file_names
This is copied from your example and slashes are replaced by dots. However, supplying all the file names on the command line like that will simply write the output as the concatenation of all the edited files to standard output.
Classically, to edit files more or less in situ, you'd write:
tmp=${TMPDIR:-/tmp}/xxx.$$
trap 'rm -f $tmp; exit 1' 0 1 2 3 13 15
for file in ...list...
do
sed -e '...' $file > $tmp
mv $tmp $file
done
rm -f $tmp
trap 0
This includes reasonably bullet-proof clean-up of the temporary - it is not perfect. This variant backs up the original before replacing it with the edited version:
tmp=${TMPDIR:-/tmp}/xxx.$$
trap 'rm -f $tmp; exit 1' 0 1 2 3 13 15
for file in ...list...
do
sed -e '...' $file > $tmp
mv $file $file.bak
mv $tmp $file
done
rm -f $tmp
trap 0
With GNU sed, you can use the '-i' or '--in-place' option to overwrite the files; you can use '--in-place=.bak' to create backup copies of each file in file.bak.

You need to escape the special characters with an extra backslash.
Note that the output will also all go to the console. If you want 150 separate output files, you might want to look at the xargs command, something like:
ls -1 | xargs -t -i 'sed -e -i "replace comment" {}'
Be aware that the sed '-i' option will edit the files in place so get your replacement right first and back the files up!

You should pipe the output
e.g.
sed -e 's/<.*SRC="\/height.js".*>//g' < foo.html > foo1.html

You don't have to use the / with the s command:
sed 's|old|new|' files...
If you want to do inline-replace, then the first step is to back up all of your files then issue this command, substitute 'old' and 'new' with strings of your choice:
sed -i .bak 's|old|new|' files...
Because inline replacement is very dangerous, I can't emphasize enough that you must back up all of your files before running the command.
Did I mention that you should back up all of your files?

Related

need to clean file via SED or GREP

I have these files
NotRequired.txt (having lines which need to be remove)
Need2CleanSED.txt (big file , need to clean)
Need2CleanGRP.txt (big file , need to clean)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading above file and want to remove lines from Need2Clean???.txt, trying via SED and GREP but no success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
Looks as if the Variable and characters [] making some problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never do command file > file btw as the shell might decide to execute the > file first and so empty file before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
Your assumption is correct. The [...] construct looks for any characters in that set, so you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can probably put the sed command in where you're directing the file in:
done < replace.txt|sed -e 's:\[:\\[:' -e 's:\]:\\]:'
Finally, you can use sed on each HKline variable:
HKline=$( echo $HKline | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
try gnu sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed process are chained into one by shell pipe
NotRequired.txt is 'slurped' by sed -z all at once and substituted its \n and [ meta-char with | and \[ respectively of which the 2nd process uses it as regex script for the input file, ie. Need2CleanSED.txt. 1st process output;
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
add -u ie. unbuffered, option to evade from batch process, sort of direct i/o

Remove middle of filenames

I have a list of filenames like this in bash
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz
And I want them to look like this
UTSHoS10_R1.fq.gz
UTSHoS10_R2.fq.gz
UTSHoS11_R1.fq.gz
UTSHoS11_R2.fq.gz
UTSHoS12_R1.fq.gz
UTSHoS12_R2.fq.gz
I do not have the perl rename command and sed 's/_Other*160418./_/' *.gz
is not doing anything. I've tried other rename scripts on here but either nothing occurs or my shell starts printing huge amounts of code to the console and freezes.
This post (Removing Middle of Filename) is similar however the answers given do not explain what specific parts of the command are doing so I could not apply it to my problem.
Parameter expansions in bash can perform string substitutions based on glob-like patterns, which allows for a more efficient solution than calling an extra external utility such as sed in each loop iteration:
for f in *.gz; do echo mv "$f" "${f/_Other_*-TTAGGA_R_160418./_}"; done
Remove the echo before mv to perform actual renaming.
You can do something like this in the directory which contains the files to be renamed:
for file_name in *.gz
do
new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name");
mv "$file_name" "$new_file_name";
done
The pattern (_[^.]*\.) starts matching from the FIRST _ till the FIRST . (both inclusive). [^.]* means 0 or more non-dot (or non-period) characters.
Example:
AMD$ ls
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
AMD$ for file_name in *.gz
> do new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name")
> mv "$file_name" "$new_file_name"
> done
AMD$ ls
UTSHoS10_R1.fq.gz UTSHoS10_R2.fq.gz UTSHoS11_R2.fq.gz UTSHoS12_R1.fq.gz UTSHoS12_R2.fq.gz
Pure Bash, using substring operation and assuming that all file names have the same length:
for file in UTS*.gz; do
echo mv -i "$file" "${file:0:9}${file:38:8}"
done
Outputs:
mv -i UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz UTSHoS10_R1.fq.gz
mv -i UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz UTSHoS10_R2.fq.gz
mv -i UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz UTSHoS11_R2.fq.gz
mv -i UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz UTSHoS11_R2.fq.gz
mv -i UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz UTSHoS12_R1.fq.gz
mv -i UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz UTSHoS12_R2.fq.gz
Once verified, remove echo from the line inside the loop and run again.
Going with your sed command, this can work as a bash one-liner:
for name in UTSH*fq.gz; do newname=$(echo $name | sed 's/_Other.*160418\./_/'); echo mv $name $newname; done
Notes:
I've adjusted your sed command: it had an * without a preceeding . (sed takes a regular expression, not a globbing pattern). Similarly, the dot needs escaping.
To see if it works, without actually renaming the files, I've left the echo command in. Easy to remove just that to make it functional.
It doesn't have to be a one-liner, obviously. But sometimes, that makes editing and browsing your command-line history easier.

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the textfile.txt have been duplicated, in their original location with a .bak added to them IE: infected_site/some_directory/infected_file.htm and infected_file.htm.bak
effectively making sure we have a backup.
All I need to do now is write an AWK command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files. Using just the uppercase script as the starting point, and the lower case script is too generic and could delete legitimate text
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled you can do this to clean out the bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You should probably meant this (using your use of cat which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement suppose to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$i.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$i.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$i.awk"
done < filelist.txt
Also note I put quotes around file names. This helps prevent problems if file name has a space in it.

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
Another approach, if you don't feel very confident with sed and think you are going to forget in a week what the meaning of that voodoo symbols is, could be using IFS in a more efficient way:
IFS=":"
cat myFile.txt | while read PATTERN REPLACEMENT # You feed the while loop with stdout lines and read fields separated by ":"
do
sed -i "s/${PATTERN}/${REPLACEMENT}/g"
done
The only pitfall I can see (it may be more) is that if whether PATTERN or REPLACEMENT contain a slash (/) they are going to destroy your sed expression.
You can change the sed separator with a non-printable character and you should be safe.
Anyway, if you know whats on your myFile.txt you can just use any.

Redirect output from sed 's/c/d/' myFile to myFile

I am using sed in a script to do a replace and I want to have the replaced file overwrite the file. Normally I think that you would use this:
% sed -i 's/cat/dog/' manipulate
sed: illegal option -- i
However as you can see my sed does not have that command.
I tried this:
% sed 's/cat/dog/' manipulate > manipulate
But this just turns manipulate into an empty file (makes sense).
This works:
% sed 's/cat/dog/' manipulate > tmp; mv tmp manipulate
But I was wondering if there was a standard way to redirect output into the same file that input was taken from.
I commonly use the 3rd way, but with an important change:
$ sed 's/cat/dog/' manipulate > tmp && mv tmp manipulate
I.e. change ; to && so the move only happens if sed is successful; otherwise you'll lose your original file as soon as you make a typo in your sed syntax.
Note! For those reading the title and missing the OP's constraint "my sed doesn't support -i": For most people, sed will support -i, so the best way to do this is:
$ sed -i 's/cat/dog/' manipulate
Yes, -i is also supported in FreeBSD/MacOSX sed, but needs the empty string as an argument to edit a file in-place.
sed -i "" 's/old/new/g' file # FreeBSD sed
If you don't want to move copies around, you could use ed:
ed file.txt <<EOF
%s/cat/dog/
wq
EOF
Kernighan and Pike in The Art of Unix Programming discuss this issue. Their solution is to write a script called overwrite, which allows one to do such things.
The usage is: overwrite file cmd file.
# overwrite: copy standard input to output after EOF
opath=$PATH
PATH=/bin:/usr/bin
case $# in
0|1) echo 'Usage: overwrite file cmd [args]' 1>&2; exit 2
esac
file=$1; shift
new=/tmp/overwr1.$$; old=/tmp/overwr2.$$
trap 'rm -f $new $old; exit 1' 1 2 15 # clean up
if PATH=$opath "$#" >$new
then
cp $file $old # save original
trap '' 1 2 15 # wr are commmitted
cp $new $file
else
echo "overwrite: $1 failed, $file unchanged" 1>&2
exit 1
fi
rm -f $new $old
Once you have the above script in your $PATH, you can do:
overwrite manipulate sed 's/cat/dog/' manipulate
To make your life easier, you can use replace script from the same book:
# replace: replace str1 in files with str2 in place
PATH=/bin:/usr/bin
case $# in
0|2) echo 'Usage: replace str1 str2 files' 1>&2; exit 1
esac
left="$1"; right="$2"; shift; shift
for i
do
overwrite $i sed "s#$left#$right#g" $i
done
Having replace in your $PATH too will allow you to say:
replace cat dog manipulate
You can use sponge from the moreutils.
sed "s/cat/dog/" manipulate | sponge manipulate
Perhaps -i is gnu sed, or just an old version of sed, but anyways. You're on the right track. The first option is probably the most common one, the third option is if you want it to work everywhere (including solaris machines)... :) These are the 'standard' ways of doing it.
To change multiple files (and saving a backup of each as *.bak):
perl -p -i -e "s/oldtext/newtext/g" *
replaces any occurence of oldtext by newtext in all files in the current folder. However you will have to escape all perl special characters within oldtext and newtext using the backslash
This is called a “Perl pie” (mnemonic: easy as a pie)
The -i flag tells it do do in-place replacement, and it should be ok to use single (“'”) as well as double (“””) quotes.
If using ./* instead of just *, you should be able to do it in all sub-directories
See man perlrun for more details, including how to take a backup file of the original.
using sed:
sed -i 's/old/new/g' ./* (used in GNU)
sed -i '' 's/old/new/g' ./* (used in FreeBSD)
-i option is not available in standard sed.
Your alternatives are your third way or perl.
A lot of answers, but none of them is correct. Here is the correct and simplest one:
$ echo "111 222 333" > file.txt
$ sed -i -s s/222/444/ file.txt
$ cat file.txt
111 444 333
$
Workaround using open file handles:
exec 3<manipulate
Prevent open file from being truncated:
rm manipulate
sed 's/cat/dog/' <&3 > manipulate

Resources