sed delete not working with cat variable - bash

I have a file named test-domain, the contents of which contain the line 100.am.
When I do this, the line with 100.am is deleted from the test-domain file, as expected:
for x in $(echo 100.am); do sed -i "/$x/d" test-domain; done
However, if instead of echo 100.am, I read each line from a file named unwanted-lines, it does NOT work.
for x in $(cat unwanted-lines); do sed -i "/$x/d" test-domain; done
This is even if the only contents of unwanted-lines is one line, with the exact contents 100.am.
Does anyone know why sed delete line works if you use echo in your variable, but not if you use cat?

fgrep -v -f unwanted-lines test-domain > /tmp/Buffer
mv /tmp/Buffer test-domain
sed is not interesting in this case due to multiple call in shell (poor efficiency and lot of ressources used). The way to still use sed is to preload line to delete, and make a search base on this preloaded info but very heavy compare to fgrep in this case

Does anyone know why sed delete line works if you use echo in your
variable, but not if you use cat?
I believe that your file containing unwanted lines contains CR+LF line endings due to which it doesn't work when you use the file. You could strip the CR in your loop:
for x in $(cat unwanted-lines); do x="${x//$'\r'}"; sed -i "/$x/d" test-domain; done

One better strategy than yours would be to use a genuine editor, e.g., ed, as so:
ed -s test-domain < <(
shopt -s extglob
while IFS= read -r l; do
[[ $l = *([[:space:]]) ]] && continue
l=${l//./\\.}
echo "g/$l/d"
done < unwanted-lines
echo "wq"
)
Caveat. You must make sure that the file unwanted-lines doesn't contain any character that could clash with ed's regexps and commands. I have already included a match for a period (i.e., replace . with \.).
This method is quite efficient, as you're not forking so many times on sed, writing temp files, renaming them, etc.
Another possibility would be to use grep, but then you won't have the editing option ed offers.
Remark. ed is the standard editor.

why not just applying the sed command on your file?
sed -i '/.*100\.am/d' your_file

Related

need to clean file via SED or GREP

I have these files
NotRequired.txt (having lines which need to be remove)
Need2CleanSED.txt (big file , need to clean)
Need2CleanGRP.txt (big file , need to clean)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading above file and want to remove lines from Need2Clean???.txt, trying via SED and GREP but no success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
Looks as if the Variable and characters [] making some problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never do command file > file btw as the shell might decide to execute the > file first and so empty file before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
Your assumption is correct. The [...] construct looks for any characters in that set, so you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can probably put the sed command in where you're directing the file in:
done < replace.txt|sed -e 's:\[:\\[:' -e 's:\]:\\]:'
Finally, you can use sed on each HKline variable:
HKline=$( echo $HKline | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
try gnu sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed process are chained into one by shell pipe
NotRequired.txt is 'slurped' by sed -z all at once and substituted its \n and [ meta-char with | and \[ respectively of which the 2nd process uses it as regex script for the input file, ie. Need2CleanSED.txt. 1st process output;
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
add -u ie. unbuffered, option to evade from batch process, sort of direct i/o

Removing lines from multiple files with sed command

So, disclaimer: I am pretty new to using bash and zsh, so there is a chance the answer is really simple. Nonetheless. I checked previous postings and couldn't find anything. (edit: I have tried this in both bash and zsh shells- same problem.)
I have a directory with many files and am trying to remove the first line from each file.
So say the directory contains: file1.txt file2.txt file3.txt ... etc.
I am using the sed command (non-GNU):
sed -i -e "1d" *.txt
For some reason, this is only removing the first line of the first file. I thought that the *.txt would affect all files matching the pattern in directory. Strangely, it is creating the file duplicates with -e appended, but both the duplicate and original are the same.
I tried this with other commands (e.g. ls *.txt) and it works fine. Is there something about sed I am missing?
Thank you in advance.
Different versions of sed in differing operating systems support various parameters.
OpenBSD (5.4) sed
The -i flag is unavailable. You can use the following /bin/sh syntax:
for i in *.txt
do
f=`mktemp -p .`
sed -e "1d" "${i}" > "${f}" && mv -- "${f}" "${i}"
done
FreeBSD (11-CURRENT) sed
The -i flag requires an extension, even if it's empty. Thus must be written as sed -i "" -e "1d" *.txt
GNU sed
This looks to see if the argument following -i is another option (or possibly a command). If so, it assumes an in-place modification. If it appears to be a file extension such as ".bak", it will rename the original with the ".bak" and then modify it into the original file's name.
There might be other variations on other platforms, but those are the three I have at hand.
use it without -e !
for one file use:
sed -i '1d' filename
for all files use :
sed -i '1d' *.txt
or
files=/path/to/files/*.extension ; for var in $files ; do sed -i '1d' $var ; done
.for me i use ubuntu and debian based systems , this method is working for me 100% , but for other platformes i'm not sure , so this is other method :
replace first line with emty pattern , and remove empty lines , (double commands):
for files in $(ls /path/to/files/*.txt); do sed -i "s/$(head -1 "$files")//g" "$files" ; sed -i '/^$/d' "$files" ; done
Note: if your files contain splash '/' , then it will give error , so in this case sed command should look like this ( sed -i "s[$(head -1 "$files")[[g" )
hope that's what you're looking for :)
The issue here is that the line number isn't reset when sed opens a new file, so 1 only matches the first line of the first file.
One solution is to use a shell loop, calling sed once for each file. Gumnos' answer shows how to do this in the most widely compatible way, although if you have a version of sed supporting the -i flag, you could do this instead:
for i in *.txt; do
sed -i.bak '1d' "$i"
done
It is possible to avoid creating the backup file by passing an empty suffix but personally, I don't think it's such a bad thing. One day you'll be grateful for it!
It appears that you're not working with GNU tools but if you were, I would recommend using GNU awk for this task. The variable FNR is useful here, as it keeps track of the record number for each file individually, allowing you to do this:
gawk -i inplace 'FNR>1' *.txt
Using the inplace extension, this allows you to remove the first line from each of your files, by only printing the lines where FNR is greater than 1.
Testing it out:
$ seq 5 > file1
$ seq 5 > file2
$ gawk -i inplace 'FNR>1' file1 file2
$ cat file1
2
3
4
5
$ cat file2
2
3
4
5
The last argument you are passing to the Sed is the problem
try something like this.
var=(`find *txt`)
for file in "${var[#]}"
do
sed -i -e 1d $file
done
This did the trick for me.

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the textfile.txt have been duplicated, in their original location with a .bak added to them IE: infected_site/some_directory/infected_file.htm and infected_file.htm.bak
effectively making sure we have a backup.
All I need to do now is write an AWK command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files. Using just the uppercase script as the starting point, and the lower case script is too generic and could delete legitimate text
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled you can do this to clean out the bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You should probably meant this (using your use of cat which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement suppose to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$i.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$i.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$i.awk"
done < filelist.txt
Also note I put quotes around file names. This helps prevent problems if file name has a space in it.

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
Another approach, if you don't feel very confident with sed and think you are going to forget in a week what the meaning of that voodoo symbols is, could be using IFS in a more efficient way:
IFS=":"
cat myFile.txt | while read PATTERN REPLACEMENT # You feed the while loop with stdout lines and read fields separated by ":"
do
sed -i "s/${PATTERN}/${REPLACEMENT}/g"
done
The only pitfall I can see (it may be more) is that if whether PATTERN or REPLACEMENT contain a slash (/) they are going to destroy your sed expression.
You can change the sed separator with a non-printable character and you should be safe.
Anyway, if you know whats on your myFile.txt you can just use any.

Redirect output from sed 's/c/d/' myFile to myFile

I am using sed in a script to do a replace and I want to have the replaced file overwrite the file. Normally I think that you would use this:
% sed -i 's/cat/dog/' manipulate
sed: illegal option -- i
However as you can see my sed does not have that command.
I tried this:
% sed 's/cat/dog/' manipulate > manipulate
But this just turns manipulate into an empty file (makes sense).
This works:
% sed 's/cat/dog/' manipulate > tmp; mv tmp manipulate
But I was wondering if there was a standard way to redirect output into the same file that input was taken from.
I commonly use the 3rd way, but with an important change:
$ sed 's/cat/dog/' manipulate > tmp && mv tmp manipulate
I.e. change ; to && so the move only happens if sed is successful; otherwise you'll lose your original file as soon as you make a typo in your sed syntax.
Note! For those reading the title and missing the OP's constraint "my sed doesn't support -i": For most people, sed will support -i, so the best way to do this is:
$ sed -i 's/cat/dog/' manipulate
Yes, -i is also supported in FreeBSD/MacOSX sed, but needs the empty string as an argument to edit a file in-place.
sed -i "" 's/old/new/g' file # FreeBSD sed
If you don't want to move copies around, you could use ed:
ed file.txt <<EOF
%s/cat/dog/
wq
EOF
Kernighan and Pike in The Art of Unix Programming discuss this issue. Their solution is to write a script called overwrite, which allows one to do such things.
The usage is: overwrite file cmd file.
# overwrite: copy standard input to output after EOF
opath=$PATH
PATH=/bin:/usr/bin
case $# in
0|1) echo 'Usage: overwrite file cmd [args]' 1>&2; exit 2
esac
file=$1; shift
new=/tmp/overwr1.$$; old=/tmp/overwr2.$$
trap 'rm -f $new $old; exit 1' 1 2 15 # clean up
if PATH=$opath "$#" >$new
then
cp $file $old # save original
trap '' 1 2 15 # wr are commmitted
cp $new $file
else
echo "overwrite: $1 failed, $file unchanged" 1>&2
exit 1
fi
rm -f $new $old
Once you have the above script in your $PATH, you can do:
overwrite manipulate sed 's/cat/dog/' manipulate
To make your life easier, you can use replace script from the same book:
# replace: replace str1 in files with str2 in place
PATH=/bin:/usr/bin
case $# in
0|2) echo 'Usage: replace str1 str2 files' 1>&2; exit 1
esac
left="$1"; right="$2"; shift; shift
for i
do
overwrite $i sed "s#$left#$right#g" $i
done
Having replace in your $PATH too will allow you to say:
replace cat dog manipulate
You can use sponge from the moreutils.
sed "s/cat/dog/" manipulate | sponge manipulate
Perhaps -i is gnu sed, or just an old version of sed, but anyways. You're on the right track. The first option is probably the most common one, the third option is if you want it to work everywhere (including solaris machines)... :) These are the 'standard' ways of doing it.
To change multiple files (and saving a backup of each as *.bak):
perl -p -i -e "s/oldtext/newtext/g" *
replaces any occurence of oldtext by newtext in all files in the current folder. However you will have to escape all perl special characters within oldtext and newtext using the backslash
This is called a “Perl pie” (mnemonic: easy as a pie)
The -i flag tells it do do in-place replacement, and it should be ok to use single (“'”) as well as double (“””) quotes.
If using ./* instead of just *, you should be able to do it in all sub-directories
See man perlrun for more details, including how to take a backup file of the original.
using sed:
sed -i 's/old/new/g' ./* (used in GNU)
sed -i '' 's/old/new/g' ./* (used in FreeBSD)
-i option is not available in standard sed.
Your alternatives are your third way or perl.
A lot of answers, but none of them is correct. Here is the correct and simplest one:
$ echo "111 222 333" > file.txt
$ sed -i -s s/222/444/ file.txt
$ cat file.txt
111 444 333
$
Workaround using open file handles:
exec 3<manipulate
Prevent open file from being truncated:
rm manipulate
sed 's/cat/dog/' <&3 > manipulate

Resources