Remove Lines in Multiple Text Files that Begin with a Certain Word - bash

I have hundreds of text files in one directory. For every file, I want to delete all the lines that begin with HETATM. I need a csh or bash solution.
I would think you would use grep, but I'm not sure.

Use sed like this:
sed -i -e '/^HETATM/d' *.txt
to process all files in place.
-i means "in place".
-e means to execute the command that follows.
/^HETATM/ means "find lines starting with HETATM", and the following d means "delete".
Make a backup first!
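If your sed supports it, you can have sed make the backups for you by supplying a suffix to -i (GNU sed syntax; the .bak suffix is just an example):
sed -i.bak -e '/^HETATM/d' *.txt
Each original is then preserved as file.txt.bak alongside the edited file.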
If you really want to do it with grep, you could do this:
#!/bin/bash
for f in *.txt
do
grep -v "^HETATM" "$f" > "$$.tmp" && mv "$$.tmp" "$f"
done
It writes the output of grep to a temporary file (named after the shell's PID, $$) and only overwrites your original file if the command executes successfully.
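A slightly more defensive variant of the same loop uses mktemp instead of $$, so concurrent runs can't collide on the same temp file (a sketch, same glob assumed):
#!/bin/bash
for f in *.txt
do
  tmp=$(mktemp) || exit 1
  # note: grep exits non-zero when no lines remain, so an all-HETATM file is left untouched
  grep -v "^HETATM" "$f" > "$tmp" && mv "$tmp" "$f"
done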

Using the -v option of grep to get all the lines that do not match:
grep -v '^HETATM' input.txt > output.txt

Related

need to clean file via SED or GREP

I have these files
NotRequired.txt (contains the lines that need to be removed)
Need2CleanSED.txt (big file, needs cleaning)
Need2CleanGRP.txt (big file, needs cleaning)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the above file and want to remove its lines from Need2Clean???.txt. I am trying via sed and grep but with no success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [] characters are causing the problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a literal string rather than a regexp, so you don't have to worry about escaping RE metachars like [, and you don't need to wrap it in a shell loop - that one execution of grep removes all the undesirable lines.
Never do command file > file, btw: the shell performs the > file redirection before command runs, and so empties file before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
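A minimal illustration of the hazard and the fix (hypothetical file names):
# BAD: the shell opens and truncates data.txt before grep ever reads it
grep -v 'pattern' data.txt > data.txt
# GOOD: write to a temp file, then rename it over the original
grep -v 'pattern' data.txt > tmp && mv tmp data.txt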
Your assumption is correct. The [...] construct looks for any characters in that set, so you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can filter the file with sed as you feed it into the loop (using bash process substitution):
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")
Finally, you can use sed on each HKline variable:
HKline=$( echo "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
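If you want to escape every BRE metacharacter rather than just the brackets, a common sed idiom is this (a sketch, not tied to this question's data):
HKline=$( printf '%s\n' "$HKline" | sed 's![][\.*^$/]!\\&!g' )
The bracket expression lists each metacharacter (including / so the result is safe inside a /.../ address), and \\& prefixes whatever matched with a backslash.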
Try GNU sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained into one by a shell pipe.
NotRequired.txt is 'slurped' all at once by sed -z, which substitutes each \n with | and escapes the [ and ] metachars as \[ and \]; the second process then uses that output (via -f -) as its regex script for the input file, i.e. Need2CleanSED.txt. The first process outputs:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option to avoid batch buffering - a sort of direct I/O.
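To check the generated delete script before applying it, you can run just the first sed on its own and inspect its output:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt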

Removing lines from multiple files with sed command

So, disclaimer: I am pretty new to using bash and zsh, so there is a chance the answer is really simple. Nonetheless, I checked previous postings and couldn't find anything. (Edit: I have tried this in both bash and zsh shells - same problem.)
I have a directory with many files and am trying to remove the first line from each file.
So say the directory contains: file1.txt file2.txt file3.txt ... etc.
I am using the sed command (non-GNU):
sed -i -e "1d" *.txt
For some reason, this is only removing the first line of the first file. I thought that *.txt would affect all files matching the pattern in the directory. Strangely, it is creating duplicates of the files with -e appended to their names, but the duplicate and the original are identical.
I tried this with other commands (e.g. ls *.txt) and it works fine. Is there something about sed I am missing?
Thank you in advance.
Different versions of sed in differing operating systems support various parameters.
OpenBSD (5.4) sed
The -i flag is unavailable. You can use the following /bin/sh syntax:
for i in *.txt
do
f=$(mktemp -p .)
sed -e "1d" "${i}" > "${f}" && mv -- "${f}" "${i}"
done
FreeBSD (11-CURRENT) sed
The -i flag requires an extension, even if it's empty. Thus it must be written as sed -i "" -e "1d" *.txt
GNU sed
GNU sed takes an optional backup suffix attached directly to -i. With bare -i it modifies the file in place; with a suffix such as -i.bak it renames the original with the ".bak" extension and writes the modified output under the original file's name.
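So with GNU sed both of these forms work (the .bak suffix is just an example):
sed -i '1d' *.txt        # edit in place, no backup
sed -i.bak '1d' *.txt    # keeps each original as file.txt.bak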
There might be other variations on other platforms, but those are the three I have at hand.
Use it without -e!
For one file, use:
sed -i '1d' filename
For all files, use:
sed -i '1d' *.txt
or
for var in /path/to/files/*.extension ; do sed -i '1d' "$var" ; done
For me, on Ubuntu and Debian-based systems, this method works 100%, but for other platforms I'm not sure, so here is another method:
Replace the first line with an empty pattern, then remove the empty lines (two commands):
for files in /path/to/files/*.txt; do sed -i "s/$(head -1 "$files")//g" "$files" ; sed -i '/^$/d' "$files" ; done
Note: if your first lines contain a slash '/', this will give an error, so in that case the sed command should use a different delimiter, like this ( sed -i "s[$(head -1 "$files")[[g" )
hope that's what you're looking for :)
The issue here is that the line number isn't reset when sed opens a new file, so 1 only matches the first line of the first file.
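You can see this with a quick test (hypothetical files); without -i, sed treats its file arguments as one continuous stream:
$ printf 'a\nb\n' > f1
$ printf 'c\nd\n' > f2
$ sed '1d' f1 f2
b
c
d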
One solution is to use a shell loop, calling sed once for each file. Gumnos' answer shows how to do this in the most widely compatible way, although if you have a version of sed supporting the -i flag, you could do this instead:
for i in *.txt; do
sed -i.bak '1d' "$i"
done
It is possible to avoid creating the backup file by passing an empty suffix but personally, I don't think it's such a bad thing. One day you'll be grateful for it!
It appears that you're not working with GNU tools but if you were, I would recommend using GNU awk for this task. The variable FNR is useful here, as it keeps track of the record number for each file individually, allowing you to do this:
gawk -i inplace 'FNR>1' *.txt
Using the inplace extension, this allows you to remove the first line from each of your files, by only printing the lines where FNR is greater than 1.
Testing it out:
$ seq 5 > file1
$ seq 5 > file2
$ gawk -i inplace 'FNR>1' file1 file2
$ cat file1
2
3
4
5
$ cat file2
2
3
4
5
The last argument you are passing to sed is the problem.
Try something like this:
var=( $(find *txt) )
for file in "${var[@]}"
do
sed -i -e '1d' "$file"
done
This did the trick for me.

Ubuntu: prepend rm to each file name in a file

I have a file containing file names.
file1
file2
file3
file4
I want to create a shell script that adds 'rm' in front:
rm file1
rm file2
rm file3
rm file4
How do I prepend rm to each file name?
You can do that many ways - sed, vim, perl, awk.
Or you can simply use xargs like this:
xargs rm < filelist
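If any of the names in filelist contain spaces, GNU xargs can be told to split strictly on newlines (the -- guards against names starting with a dash):
xargs -d '\n' rm -- < filelist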
If you really insist on editing filelist, use sed:
sed 's/^/rm /g' filelist > newscript
(which means: find the start of each line, ^, and replace it with "rm "; the /g flag is actually redundant here, since ^ can only match once per line).
You can even edit filelist in-place using sed -i:
sed -i 's/^/rm /g' filelist
I think mvp's answer is the best, but if you're talking about changing your current file list to a shell script with rm inserted before each filename, you can do this simply with any good text editor that supports find and replace with regular expressions.
Search term : ^(.)
Replacement : rm \1
Vi one-liner :
:%s/^/rm /
Another way could be using a simple shell script.
#!/bin/sh
FILE=$1
while read line
do
echo "Removing $line"
rm "$line"
done < "$FILE"
You could then run it as sh multirm.sh filelist
If you simply want to add rm into the file, you could use awk for that.
awk '{ print "rm", $1 }' filelist
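Note that $1 is only the first whitespace-separated field; if the file names can contain spaces, print the whole line instead:
awk '{ print "rm", $0 }' filelist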

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentFragm have been enumerated in a text file correctly.
The files listed in filelist.txt have been duplicated in their original location with a .bak added to them, i.e. infected_site/some_directory/infected_file.htm and infected_file.htm.bak,
effectively making sure we have a backup.
All I need to do now is write an awk command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files - using just the uppercase </SCRIPT> as the starting point, since the lowercase script tag is too generic and could delete legitimate text.
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled, you can clean out the .bak files like this:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You probably meant this (keeping your use of cat, which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > "$i.awk"
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement supposed to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note I put quotes around file names. This helps prevent problems if file name has a space in it.
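For extra robustness, while read can be hardened so that leading blanks and backslashes in file names survive (a sketch of the same loop):
while IFS= read -r file   # IFS= keeps leading whitespace, -r keeps backslashes literal
do
cp -f "$file" "$file.bak"
done < filelist.txt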

sed command creates randomly named files

I recently wrote a script that does a sed command, to replace all the occurrences of "string1" with "string2" in a file named "test.txt".
It looks like this:
sed -i 's/string1/string2/g' test.txt
The catch is, "string1" does not necessarily exist in test.txt.
I notice after executing a bunch of these sed commands, I get a number of empty files, left behind in the directory, with names that look like this:
"sed4l4DpD"
Does anyone know why this might be, and how I can correct it?
The argument to -i is the suffix appended to the backup copy of the original file. Also, you need -e for the command.
Here's how you use it:
sed -i '2' -e 's/string1/string2/g' test.txt
This will create a file called test.txt2 that is the backup of test.txt
To edit the file in place without keeping a backup copy, change the -i value to '' (i.e. blank):
sed -i '' -e 's/string1/string2/g' test.txt
EDIT II
Here's actual command-line output from a Mac (Snow Leopard) that shows my modified answer is correct.
NOTE: On a Linux server, there must be no space between -i and the suffix.
> echo "this is a test" > test.txt
> cat test.txt
this is a test
> sed -i '2' -e 's/a/a good/' test.txt
> ls test*
test.txt test.txt2
> cat test.txt
this is a good test
> cat test.txt2
this is a test
> sed -i '' -e 's/a/a really/' test.txt
> ls test*
test.txt test.txt2
> cat test.txt
this is a really good test
I wasn't able to reproduce this with a quick test (using GNU sed 4.2.1) -- but strace did show sed creating a file called sedJd9Cuy and then renaming it to tmp (the file named on the command line).
It looks like something is going wrong after sed creates the temporary file and before it's able to rename it.
My best guess is that you've run out of room in the filesystem; you're able to create a new empty file, but unable to write to it.
What does df . say?
EDIT:
I still don't know what's causing the problem, but it shouldn't be too difficult to work around it.
Rather than
sed -i 's/string1/string2/g' test.txt
try something like this:
sed 's/string1/string2/g' test.txt > test.txt.$$ && mv -f test.txt.$$ test.txt
Something is going wrong with the way sed creates and then renames a text file to replace your original file. The above command uses sed as a simple input-output filter and creates and renames the temporary file separately.
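If the script might run more than once at a time, mktemp is a safer way to name the temporary file than $$ (a sketch):
tmp=$(mktemp) && sed 's/string1/string2/g' test.txt > "$tmp" && mv -f "$tmp" test.txt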
So after much testing last night, it turns out that sed was creating these files when trying to operate on an empty string. The way I was getting the array of "$string1" arguments was through a grep command, which was malformed. What I wanted from the grep was all lines containing something of the type "Text here '.'".
For example, the string "Text here 'ABC.DEF'" in a file should have been caught by grep, and then the ABC.DEF portion of the string would be substituted with ABC_DEF. Unfortunately, the grep I was using would also catch lines of the type "Text here ''" (that is, with nothing between the quotes). When the script later attempted a sed replacement using this empty string, the randomly named file was created (probably because sed died).
Thanks for all your help in understanding how sed works.
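A simple guard would have skipped the empty strings (a sketch; the variable names are placeholders from the description above):
[ -n "$string1" ] && sed -i "s/$string1/$string2/g" test.txt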
It's better if you do it this way:
cat large_file | sed 's/string1/string2/g' > file_filtered
