File will be modified after while loop reads file in Bash

I need to pass each line of a text file to a program. I use a while loop to read the file line by line and pass each line to the program. My script is:
tail -n +2 output.txt | while IFS=' ' read ln
do
line=(${ln})
prog $(line) > newout
grep "runtime\|opt" newout | sed -e 's/ $/\n/' > res.txt
done
I do not modify output.txt anywhere, yet its content changes and only the heading line is left. Why is the script doing that?

Related

Concatenating files in Bash

Suppose I am writing a shell script foo.bash to concatenate the contents of test/*.txt with a comma like that:
> cat test/x.txt
a b c
> cat test/y.txt
1 2 3
> foo.bash test
a b c,
1 2 3
How would you write such a script?
Could you please try the following (in case you want to concatenate the files line by line, with a comma between them)?
paste -d, *.txt
EDIT2: To concatenate the contents of all .txt files with a comma between them, try the following (requires GNU awk).
awk 'ENDFILE{print ","} 1' *.txt | sed '$d'
What about
for file in /tmp/test/*.txt; do
echo -n "$(cat "$file"),"
done | sed 's/.$//'
or maybe
for file in /tmp/test/*.txt; do
sed 's/$/,/' "$file"
done | sed '$ s/,$//'
You could use a regex to achieve it. The command below grabs the content of every file and appends it after the name of the file ($ARGV).
$ grep -ER '*'
a.txt:a
b.txt:b
c.txt:c
$ perl -pe 's/^(.*)\n$/$ARGV:\1,/;' * > file.txt
$ cat file.txt
a.txt:a,b.txt:b,c.txt:c,
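The approaches above can be combined into a small sketch of what foo.bash might do. This is only one possible reading of the question: it assumes a comma should follow the last line of every file except the final one. The function name join_with_comma and the throwaway demo files are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print every *.txt file in directory $1, joining
# consecutive files with a comma at the end of the earlier file.
join_with_comma() {
    first=yes
    for file in "$1"/*.txt; do
        [ -e "$file" ] || continue          # skip the literal pattern if nothing matches
        [ "$first" = yes ] || printf ',\n'  # separator before every file but the first
        # $(cat ...) drops the file's trailing newline, so the comma printed
        # on the next iteration lands right after the last character.
        printf '%s' "$(cat "$file")"
        first=no
    done
    [ "$first" = yes ] || printf '\n'       # final newline if anything was printed
}

# Demo with throwaway files (assumed names x.txt and y.txt).
demo_dir=$(mktemp -d)
printf 'a b c\n' > "$demo_dir/x.txt"
printf '1 2 3\n' > "$demo_dir/y.txt"
joined=$(join_with_comma "$demo_dir")
printf '%s\n' "$joined"
rm -rf "$demo_dir"
```

The files are processed in glob order, so the result depends on how the file names sort.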

need to clean file via SED or GREP

I have these files
NotRequired.txt (contains the lines that need to be removed)
Need2CleanSED.txt (big file, needs to be cleaned)
Need2CleanGRP.txt (big file, needs to be cleaned)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the file above and want to remove its lines from the Need2Clean???.txt files. I tried sed and grep, but without success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [ ] characters are causing the problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never run command file > file, by the way: the shell sets up the > file redirection, truncating file, before command starts, so command finds the file already empty. Always do command file > tmp && mv tmp file instead.
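A minimal, self-contained sketch of the temp-file pattern with grep -vF -f. The sample lines and the temporary directory are invented for the demonstration; only the file names come from the question.

```shell
#!/bin/bash
# Throwaway working directory with small stand-ins for the real files.
work=$(mktemp -d)
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[wat-7600-1_414]' > "$work/NotRequired.txt"
printf '%s\n' 'keep me' '[abc-xyz_pqr-pe2_123] drop' 'keep [other]' '[wat-7600-1_414]' \
    > "$work/Need2CleanGRP.txt"

# -F: treat every pattern line as a fixed string, so [ and ] need no escaping.
# -v: print only the lines that match none of the patterns.
# Write to a temporary file first and replace the original only if grep succeeded.
grep -vF -f "$work/NotRequired.txt" "$work/Need2CleanGRP.txt" > "$work/tmp" &&
    mv "$work/tmp" "$work/Need2CleanGRP.txt"

cleaned=$(cat "$work/Need2CleanGRP.txt")
printf '%s\n' "$cleaned"
rm -rf "$work"
```

One caveat of chaining with &&: grep exits with status 1 when it selects no lines at all, so if every line of the big file were unwanted, the mv would be skipped and the original left untouched.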
Your assumption is correct. The [...] construct matches any single character in that set, so you have to preface ("escape") the brackets with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can run sed on the file as it is fed into the loop, using bash process substitution:
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}")
Finally, you can use sed on each HKline variable:
HKline=$( printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
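Escaping alone is not enough for the loop in the question, because the single quotes in sed -i '/$HKline/d' stop the shell from expanding the variable, so sed searches for the literal text $HKline. A hedged sketch of the per-line variant follows; the sample data and the variable name pattern are assumptions, and only [ and ] are escaped, so other regex metacharacters or a / in a line would still need handling.

```shell
#!/bin/bash
work=$(mktemp -d)
printf '%s\n' '[lon-abc-tkt_1202]' > "$work/NotRequired.txt"
printf '%s\n' 'first' '[lon-abc-tkt_1202]' 'last' > "$work/Need2CleanSED.txt"

while IFS= read -r HKline; do
    # Escape [ and ] so sed treats them as literal characters.
    pattern=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:g' -e 's:\]:\\]:g')
    # Double quotes let the shell expand $pattern before sed sees the script.
    # A temp file is used instead of sed -i, whose syntax differs between sed versions.
    sed "/$pattern/d" "$work/Need2CleanSED.txt" > "$work/tmp" &&
        mv "$work/tmp" "$work/Need2CleanSED.txt"
done < "$work/NotRequired.txt"

remaining=$(cat "$work/Need2CleanSED.txt")
printf '%s\n' "$remaining"
rm -rf "$work"
```

This still rewrites the big file once per pattern line, so the single grep -vF -f command remains the faster choice for large inputs.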
Try GNU sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained by a shell pipe.
The first sed reads NotRequired.txt all at once (-z), replaces each \n with |, escapes the [ and ] metacharacters as \[ and \], and wraps the result in /.../d. The second sed uses that output as its script (-f -) and applies it to the input file, i.e. Need2CleanSED.txt. Output of the first process:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option to the second sed if you want its output flushed line by line instead of in batches.

Problems using grep and sed in bash

I'm extracting domains, subdomains and ips from a text file using:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "extra-domains.txt" | sed 's/www.//' | sort -u > outputfile.txt
I put this in a bash script so I can run it more quickly, as extract-domains.sh text-with-domains.txt:
#!/bin/bash
FILE="$1"
while read LINE; do
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "$LINE" | sed 's/www.//' | sort -u > outputfile.txt
done < ${FILE}
but I keep getting multiple errors with "No such file or directory" when running the bash.
Can anyone give me a hand? Thanks.
The way you wrote it, grep takes "$LINE" as a file name. Is that what it is supposed to do?
Edit: There is no point in using a while loop to read your file line by line; it will be much slower. You should probably write your script like this:
#!/bin/bash
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "$1" |
sed 's/www\.//' |
sort -u
and call it like this:
extract-domains.sh "extra-domains.txt" > outputfile.txt
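If per-line processing really is needed, each line has to reach grep on standard input rather than as a file name argument. A minimal sketch, using invented sample text; the sed expression is anchored and escaped here (s/^www\.//), which is a small change from the original:

```shell
#!/bin/bash
work=$(mktemp -d)
printf '%s\n' 'visit www.example.com today' 'mail.example.org and 192.168.0.1' \
    > "$work/extra-domains.txt"

# Each line is piped into grep on stdin. With no file argument, grep reads
# stdin instead of trying to open a file named after the line's content.
while IFS= read -r LINE; do
    printf '%s\n' "$LINE" | grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+'
done < "$work/extra-domains.txt" | sed 's/^www\.//' | sort -u > "$work/outputfile.txt"

domains=$(cat "$work/outputfile.txt")
printf '%s\n' "$domains"
rm -rf "$work"
```

Note that the redirection to the output file sits after done, so the file is written once for the whole loop instead of being overwritten on every iteration.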

How to redirect tail of multiple files to a new file with newlines?

I suspect this is an easy one. I have a directory of files and I need the last line from each file grouped into a new file.
I used:
tail -q myFile_seed*.csv > output.csv
But the output file is one long line. Is there a simple way to redirect with newlines so that each file is on its own line?
It appears that your files do not end with the usual newline (\n) after the final line. In this case, you'll need to handle each file separately, rather than have tail process them all at once.
for f in myFile_seed*.csv; do
tail -n 1 "$f"
printf "\n"
done > output.csv
You can do:
tail -q -n 1 myFile_seed*.csv > output.csv
Another option, as a one-liner:
ls myFile_seed*.csv | xargs -ifile sh -c "tail -n 1 file; echo " > output.csv
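When only some of the files lack a final newline, printing an extra newline unconditionally would leave blank lines in the result. A hedged sketch that adds the newline only where it is missing; the two sample files are invented for the demonstration.

```shell
#!/bin/bash
work=$(mktemp -d)
printf 'a,1\nb,2' > "$work/myFile_seed1.csv"      # no trailing newline
printf 'c,3\nd,4\n' > "$work/myFile_seed2.csv"    # ends with a newline

for f in "$work"/myFile_seed*.csv; do
    tail -n 1 "$f"
    # tail -c 1 prints the file's last byte. Command substitution strips a
    # newline, so a non-empty result means the final newline is missing.
    if [ -n "$(tail -c 1 "$f")" ]; then
        printf '\n'
    fi
done > "$work/output.csv"

last_lines=$(cat "$work/output.csv")
printf '%s\n' "$last_lines"
rm -rf "$work"
```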

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the textfile.txt have been duplicated, in their original location with a .bak added to them IE: infected_site/some_directory/infected_file.htm and infected_file.htm.bak
effectively making sure we have a backup.
All I need to do now is write an AWK command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files. Using just the uppercase script as the starting point, and the lower case script is too generic and could delete legitimate text
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled you can do this to clean out the bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You probably meant this (keeping your use of cat, which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement supposed to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
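Besides the efficiency point, for i in $(cat ...) splits the list on any whitespace, not just on newlines. A small sketch of the difference, using a made-up list in which one file name contains a space:

```shell
#!/bin/bash
work=$(mktemp -d)
printf '%s\n' 'site/index.htm' 'site/about us.htm' > "$work/filelist.txt"

# for + $(cat ...): the unquoted substitution is split on spaces as well as
# newlines, so "site/about us.htm" arrives as two separate words.
for_count=0
for i in $(cat "$work/filelist.txt"); do
    for_count=$((for_count + 1))
done

# while read: one iteration per line. IFS= and -r keep the line unchanged.
while_count=0
while IFS= read -r file; do
    while_count=$((while_count + 1))
done < "$work/filelist.txt"

printf 'for loop saw %s items, while loop saw %s\n' "$for_count" "$while_count"
rm -rf "$work"
```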
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note that I put quotes around file names. This helps prevent problems if a file name has a space in it.
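Putting the pieces together, here is a hedged end-to-end sketch of the backup-then-clean loop. It leaves out the awk step, since its intended input is still unclear, and it writes the cleaned text back over the original rather than to a separate .awk file, which is an assumption about the goal. The sample site directory and file contents are invented.

```shell
#!/bin/bash
# Throwaway "infected site" with one affected file (note the space in its name).
site=$(mktemp -d)
list=$(mktemp)
printf '%s\n' '<html>' 'var createDocumentFragm = evil();' '<p>ok</p>' \
    > "$site/index page.htm"
printf '%s\n' '<html><p>clean</p></html>' > "$site/clean.htm"

# Build the list first, so backups created below cannot end up in it.
grep -r -l 'var createDocumentFragm' "$site" > "$list"

while IFS= read -r file; do
    cp -f "$file" "$file.bak"                                   # keep a backup
    sed 's/var createDocumentFragm.*//' "$file.bak" > "$file"   # clean from the backup
done < "$list"

# Record the outcome before cleaning up the demo files.
hits=$(grep -c 'createDocumentFragm' "$site/index page.htm" || true)
if [ -f "$site/index page.htm.bak" ]; then backup_made=yes; else backup_made=no; fi
printf 'matches left: %s, backup made: %s\n' "$hits" "$backup_made"
rm -rf "$site" "$list"
```

Reading from the backup and writing to the original avoids the command file > file trap, because input and output are never the same file.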

Resources