need to clean file via SED or GREP - bash

I have these files:
NotRequired.txt (contains the lines that need to be removed)
Need2CleanSED.txt (big file, needs cleaning)
Need2CleanGRP.txt (big file, needs cleaning)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the above file and want to remove its lines from Need2Clean???.txt. I tried sed and grep, but without success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [] characters are causing a problem.

What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F, the above treats each line of NotRequired.txt as a string rather than a regexp, so you don't have to worry about escaping RE metacharacters like [. You also don't need to wrap it in a shell loop: that one command removes all the undesirable lines in a single execution of grep.
Never do command file > file, by the way: the shell truncates file when it sets up the > file redirection, before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
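As a rough illustration of the command above, here is a small self-contained sketch. The temporary directory and the sample lines written to Need2CleanGRP.txt are invented for the demonstration; only the grep/mv idiom comes from the answer.

```shell
# Sketch only: the sample data is made up; everything happens in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[lon-abc-tkt_1202]' > "$workdir/NotRequired.txt"
printf '%s\n' 'keep this line' '[abc-xyz_pqr-pe2_123]' \
    'host=[lon-abc-tkt_1202] up' 'keep this too' > "$workdir/Need2CleanGRP.txt"

# -F: fixed strings, -f: read patterns from a file, -v: print the non-matching lines
grep -vF -f "$workdir/NotRequired.txt" "$workdir/Need2CleanGRP.txt" > "$workdir/tmp" &&
mv "$workdir/tmp" "$workdir/Need2CleanGRP.txt"

cleaned=$(cat "$workdir/Need2CleanGRP.txt")
printf '%s\n' "$cleaned"
rm -rf "$workdir"
```

Two things to be aware of: -F matches substrings, so the third sample line is removed as well (add -x if only whole-line matches should go), and grep exits non-zero when it prints no lines, so the && skips the mv if every line was removed.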

Your assumption is correct. The [...] construct matches any single character in that set, so you have to prefix ("escape") the brackets with \. Note also that '/$HKline/d' is in single quotes, so the shell never expands the variable; the sed script needs double quotes. The easiest way to escape the brackets is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can run the sed command at the point where you feed the file into the loop (bash process substitution):
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")
Finally, you can use sed on each HKline variable:
HKline=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')
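Putting the per-line escaping together with the original loop might look like the sketch below. It assumes GNU or POSIX sed, escapes only [ and ] (enough for the sample patterns, not for arbitrary regex metacharacters), and uses invented sample data plus a temp file instead of sed -i.

```shell
# Sketch only: sample data is made up; only [ and ] are escaped.
workdir=$(mktemp -d)
printf '%s\n' '[wat-7600-1_414]' '[indo-pak_isu-5_761]' > "$workdir/NotRequired.txt"
printf '%s\n' 'alpha' '[wat-7600-1_414]' 'beta [indo-pak_isu-5_761]' 'gamma' \
    > "$workdir/Need2CleanSED.txt"

while IFS= read -r HKline; do
    # turn [ into \[ and ] into \] so sed treats them literally
    escaped=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:g' -e 's:\]:\\]:g')
    # double quotes so the shell expands $escaped inside the sed script
    sed "/$escaped/d" "$workdir/Need2CleanSED.txt" > "$workdir/tmp" &&
    mv "$workdir/tmp" "$workdir/Need2CleanSED.txt"
done < "$workdir/NotRequired.txt"

cleaned_sed=$(cat "$workdir/Need2CleanSED.txt")
printf '%s\n' "$cleaned_sed"
rm -rf "$workdir"
```

This still rewrites the big file once per pattern, so it is mainly useful for seeing why the original loop failed.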

Try GNU sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained by a shell pipe.
The first process slurps NotRequired.txt all at once (sed -z), replaces each \n with |, escapes each [ and ] as \[ and \], and wraps the result in /…/d. The second process reads that as its script (-f -) and applies it as a regex to the input file, i.e. Need2CleanSED.txt. Output of the first process:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option if you want the output flushed as lines are processed rather than in batches.

Related

sed doesn't catch all sets of doubles

I've written a sed script to replace all ^^ with NULL. It seems, though, that sed only catches a pair and does not include the second ^ of that pair as it continues to search.
echo "^^^^" | sed 's/\^\^/\^NULL\^/g'
produces
^NULL^^NULL^
when it should produce
^NULL^NULL^NULL^
Try with a loop to apply your command again to modified pattern space:
echo "^^^^" | sed ':a;s/\^\^/\^NULL\^/;t a;'
To edit a file in place on OSX, try the -i flag and multiline command:
sed -i '' ':a
s/\^\^/\^NULL\^/
t a' file
With GNU sed:
sed -i ':a;s/\^\^/\^NULL\^/;t a;' file
or simply redirect the command to a temporary file before renaming it:
sed ':a;s/\^\^/\^NULL\^/;t a;' file > tmp && mv tmp file
I really like SLePort's solution, but since it is not working for you, you can try this (tested on Linux, not Mac):
echo "^^^^" | sed 's/\^\^/\^NULL\^/g; s//\^NULL\^/g'
It does the same as the former solution, but explicitly, with a second pass instead of looping with labels.
You can omit the pattern in the second command and sed will use the previous pattern.
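To see the difference between a single /g pass and the loop, the two can be run side by side. This is only a sketch; the variable names are mine, and the loop is split across -e options so it does not depend on how a particular sed parses semicolons after labels.

```shell
# One pass with /g: each match consumes both carets, so overlapping pairs are missed.
single_pass=$(printf '%s\n' '^^^^' | sed 's/\^\^/^NULL^/g')

# Loop: substitute once, then branch back to label a for as long as a substitution succeeded.
looped=$(printf '%s\n' '^^^^' | sed -e ':a' -e 's/\^\^/^NULL^/' -e 'ta')

printf '%s\n' "$single_pass" "$looped"
```

The first line printed is the incomplete result from the question, the second the fully substituted string.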

Unix shell scripting, need assign the text files values to the sed command

I was trying to add the lines from a text file to the sed command.
observered_list.txt
Uncaught SlingException
cannot render resource
IncludeTag Error
Recursive invocation
Reference component error
I need it to be coded like the following:
sed '/Uncaught SlingException\|cannot render resource\|IncludeTag Error\|Recursive invocation\|Reference component error/ d'
Please help me do this.
I would suggest you create a sed script and delete each pattern consecutively:
while IFS= read -r pattern; do
printf "/%s/ d;\n" "$pattern"
done < observered_list.txt > remove_patterns.sed
# now invoke sed on the file you want to modify
sed -f remove_patterns.sed file_to_clean
Alternatively you could construct the sed command like this:
pattern=
while read -r line; do
pattern=$pattern'\|'$line
done < observered_list.txt
# strip off the leading \| (there is no trailing one, so the second expansion is a no-op)
pattern=${pattern#\\\|}
pattern=${pattern%\\\|}
printf "sed '/%s/ d'\n" "$pattern"
# you still need to invoke the command, it's just printed
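A small end-to-end run of the generated-script approach could look like this. The log file name app.log and its contents are invented for the sketch, and it assumes no pattern contains a / or other regex metacharacters.

```shell
# Sketch only: app.log and its lines are made up for the demonstration.
workdir=$(mktemp -d)
printf '%s\n' 'Uncaught SlingException' 'IncludeTag Error' > "$workdir/observered_list.txt"
printf '%s\n' 'INFO start' 'ERROR Uncaught SlingException in x' \
    'WARN IncludeTag Error' 'INFO done' > "$workdir/app.log"

# Turn every pattern line into a sed delete command: /pattern/d
while IFS= read -r pattern; do
    printf '/%s/d\n' "$pattern"
done < "$workdir/observered_list.txt" > "$workdir/remove_patterns.sed"

filtered=$(sed -f "$workdir/remove_patterns.sed" "$workdir/app.log")
printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Writing the script with > rather than >> means re-running the snippet does not accumulate duplicate commands.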
You can use grep for that:
grep -vFf /file/with/patterns.txt /file/to/process.txt
Explanation:
-v excludes lines of process.txt which match one of the patterns from output
-F treats patterns in patterns.txt as fixed strings instead of regexes (looks like this is desired here)
-f reads patterns from patterns.txt
Check man grep for further information.

How to add a line in sed if not match is found [duplicate]

I need to add the following line to the end of a config file:
include "/configs/projectname.conf"
to a file called lighttpd.conf
I am looking into using sed to do this, but I can't work out how.
How would I only insert it if the line doesn't already exist?
Just keep it simple :)
grep + echo should suffice:
grep -qxF 'include "/configs/projectname.conf"' foo.bar || echo 'include "/configs/projectname.conf"' >> foo.bar
-q be quiet
-x match the whole line
-F pattern is a plain string
https://linux.die.net/man/1/grep
Edit:
Incorporated @cerin's and @thijs-wouters' suggestions.
This would be a clean, readable and reusable solution using grep and echo to add a line to a file only if it doesn't already exist:
LINE='include "/configs/projectname.conf"'
FILE='lighttpd.conf'
grep -qF -- "$LINE" "$FILE" || echo "$LINE" >> "$FILE"
If you need to match the whole line use grep -xqF
Add -s to ignore errors when the file does not exist, creating a new file with just that line.
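Wrapped in a function, the idiom can be checked for idempotence. The function name append_line_once and the temporary file are my own choices for this sketch, not part of the answer.

```shell
# Hypothetical helper (the name is mine): append LINE to FILE unless an identical line exists.
append_line_once() {
    line=$1
    file=$2
    # -q quiet, -s no error if FILE is missing, -x whole line, -F fixed string
    # -- protects lines that start with "-"
    grep -qsxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

conf=$(mktemp)
append_line_once 'include "/configs/projectname.conf"' "$conf"
append_line_once 'include "/configs/projectname.conf"' "$conf"   # second call changes nothing
occurrences=$(grep -cxF -- 'include "/configs/projectname.conf"' "$conf")
echo "$occurrences"
rm -f "$conf"
```

printf is used instead of echo so that a line containing backslashes is written unchanged on every shell.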
Try this:
grep -q '^option' file && sed -i 's/^option.*/option=value/' file || echo 'option=value' >> file
Using sed, the simplest syntax:
sed \
-e '/^\(option=\).*/{s//\1value/;:a;n;ba;q}' \
-e '$aoption=value' filename
This would replace the parameter if it exists, else would add it to the bottom of the file.
Use the -i option if you want to edit the file in-place.
If you want to accept and keep white spaces, and in addition to remove the comment, if the line already exists, but is commented out, write:
sed -i \
-e '/^#\?\(\s*option\s*=\s*\).*/{s//\1value/;:a;n;ba;q}' \
-e '$aoption=value' filename
Please note that neither option nor value must contain a slash /, or you will have to escape it to \/.
To use bash-variables $option and $value, you could write:
sed -i \
-e '/^#\?\(\s*'${option//\//\\/}'\s*=\s*\).*/{s//\1'${value//\//\\/}'/;:a;n;ba;q}' \
-e '$a'${option//\//\\/}'='${value//\//\\/} filename
The bash expression ${option//\//\\/} quotes slashes, it replaces all / with \/.
Note: I just ran into a problem. In bash you may quote "${option//\//\\/}", but in the sh of busybox this does not work, so you should avoid the quotes, at least in non-Bourne shells.
All combined in a bash function:
# call option with parameters: $1=name $2=value $3=file
function option() {
# escape slashes so they cannot terminate the sed expressions
local name=${1//\//\\/}
local value=${2//\//\\/}
sed -i \
-e '/^#\?\(\s*'"${name}"'\s*=\s*\).*/{s//\1'"${value}"'/;:a;n;ba;q}' \
-e '$a'"${name}"'='"${value}" "$3"
}
Explanation:
/^\(option=\).*/: Match lines that start with option= and (.*) ignore everything after the =. The \(…\) encloses the part we will reuse as \1 later.
/^#\?\(\s*'"${option//\//\\/}"'\s*=\s*\).*/: Ignore commented out code with # at the beginning of the line. \? means «optional». The comment will be removed, because it is outside of the copied part in \(…\). \s* means «any number of white spaces» (space, tabulator). White spaces are copied, since they are within \(…\), so you do not lose formatting.
/^\(option=\).*/{…}: If a line matches /…/, then execute the next command. The command to execute is not a single command, but a block {…}.
s//…/: Search and replace. Since the search term is empty //, it applies to the last match, which was /^\(option=\).*/.
s//\1value/: Replace the last match with everything in \(…\), referenced by \1, and the text value.
:a;n;ba;q: Set label a, then read next line n, then branch b (or goto) back to label a, that means: read all lines up to the end of file, so after the first match, just fetch all following lines without further processing. Then q quit and therefore ignore everything else.
$aoption=value: At the end of file $, append a the text option=value
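Both branches of the command explained above (replace when present, append when absent) can be exercised on two small files. This sketch assumes GNU sed, since the one-line $a form and the semicolons after labels are GNU extensions; the function name and file contents are invented.

```shell
# Sketch only, GNU sed assumed; set_option_value is a hypothetical wrapper.
demo=$(mktemp -d)
printf '%s\n' 'a=1' 'option=old' 'b=2' > "$demo/has_option.conf"
printf '%s\n' 'a=1' 'b=2' > "$demo/no_option.conf"

set_option_value() {
    # first -e: rewrite the first option= line, then just copy the remaining lines
    # second -e: only reached on the last line when no option= line was found
    sed -e '/^\(option=\).*/{s//\1value/;:a;n;ba;q}' \
        -e '$aoption=value' "$1"
}

replaced=$(set_option_value "$demo/has_option.conf")
appended=$(set_option_value "$demo/no_option.conf")
printf '%s\n' "$replaced" '---' "$appended"
rm -rf "$demo"
```

Without -i the input files stay untouched, which makes it easy to compare both results before editing anything in place.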
More information on sed and a command overview is on my blog:
https://marc.wäckerlin.ch/computer/stream-editor-sed-overview-and-reference
If writing to a protected file, @drAlberT's and @rubo77's answers might not work for you since one can't sudo >>. A similarly simple solution, then, would be to use tee --append (or, on MacOS, tee -a):
LINE='include "/configs/projectname.conf"'
FILE=lighttpd.conf
grep -qF "$LINE" "$FILE" || echo "$LINE" | sudo tee --append "$FILE"
Here's a sed version:
sed -e '\|include "/configs/projectname.conf"|h; ${x;s/incl//;{g;t};a\' -e 'include "/configs/projectname.conf"' -e '}' file
If your string is in a variable:
string='include "/configs/projectname.conf"'
sed -e "\|$string|h; \${x;s|$string||;{g;t};a\\" -e "$string" -e "}" file
If, one day, someone else has to deal with this code as "legacy code", then that person will be grateful if you write less esoteric code, such as
grep -q -F 'include "/configs/projectname.conf"' lighttpd.conf
if [ $? -ne 0 ]; then
echo 'include "/configs/projectname.conf"' >> lighttpd.conf
fi
another sed solution is to always append it on the last line and delete a pre existing one.
sed -e '$a\' -e '<your-entry>' -e "/<your-entry-properly-escaped>/d"
"properly-escaped" means to put a regex that matches your entry, i.e. to escape all regex controls from your actual entry, i.e. to put a backslash in front of ^$/*?+().
this might fail on the last line of your file or if there's no dangling newline, I'm not sure, but that could be dealt with by some nifty branching...
Here is a one-liner sed which does the job inline. Note that it preserves the location of the variable and its indentation in the file when it exists. This is often important for the context, like when there are comments around or when the variable is in an indented block. Any solution based on "delete-then-append" paradigm fails badly at this.
sed -i '/^[ \t]*option=/{h;s/=.*/=value/};${x;/^$/{s//option=value/;H};x}' test.conf
With a generic pair of variable/value you can write it this way:
var=c
val='12 34' # it handles spaces nicely btw
sed -i '/^[ \t]*'"$var"'=/{h;s/=.*/='"$val"'/};${x;/^$/{s//'"$var"'='"$val"'/;H};x}' test.conf
Finally, if you also want to keep inline comments, you can do it with a capture group. E.g. if test.conf contains the following:
a=123
# Here is "c":
c=999 # with its own comment and indent
b=234
d=567
Then running this
var='c'
val='"yay"'
sed -i '/^[ \t]*'"$var"'=/{h;s/=[^#]*\(.*\)/='"$val"'\1/;s/'"$val"'#/'"$val"' #/};${x;/^$/{s//'"$var"'='"$val"'/;H};x}' test.conf
Produces that:
a=123
# Here is "c":
c="yay" # with its own comment and indent
b=234
d=567
As an awk-only one-liner:
awk -v s=option=value '/^option=/{$0=s;f=1} {a[++n]=$0} END{if(!f)a[++n]=s;for(i=1;i<=n;i++)print a[i]>ARGV[1]}' file
ARGV[1] is your input file. It is opened and written to in the for loop of the END block. Opening file for output in the END block replaces the need for utilities like sponge or writing to a temporary file and then mv-ing the temporary file to file.
The two assignments to array a[] accumulate all output lines into a. if(!f)a[++n]=s appends the new option=value if the main awk loop couldn't find option in file.
I have added some spaces (not many) for readability, but you really need just one space in the whole awk program, the space after print.
If file includes # comments they will be preserved.
Here's an awk implementation
/^option *=/ {
print "option=value"; # print this instead of the original line
done=1; # set a flag, that the line was found
next # all done for this line
}
{print} # all other lines -> print them
END { # end of file
if(done != 1) # haven't found /option=/ -> add it at the end of output
print "option=value"
}
Run it using
awk -f update.awk < /etc/fdm_monitor.conf > /etc/fdm_monitor.conf.tmp && \
mv /etc/fdm_monitor.conf.tmp /etc/fdm_monitor.conf
or
awk -f update.awk < /etc/fdm_monitor.conf | sponge /etc/fdm_monitor.conf
EDIT:
As a one-liner:
awk '/^option *=/ {print "option=value";d=1;next}{print}END{if(d!=1)print "option=value"}' /etc/fdm_monitor.conf | sponge /etc/fdm_monitor.conf
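The same replace-or-append logic can be parameterised. The sketch below is an assumption-laden variant, not the answer's code: the function name set_kv is hypothetical, it matches key= literally at the start of the line (no spaces before the =, unlike the regex above), and it goes through a temporary file rather than sponge.

```shell
# Hypothetical helper: set_kv KEY VALUE FILE
set_kv() {
    awk -v key="$1" -v val="$2" '
        index($0, key "=") == 1 { print key "=" val; found = 1; next }  # replace existing line
        { print }                                                       # copy everything else
        END { if (!found) print key "=" val }                           # append if never seen
    ' "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

demo=$(mktemp -d)
printf '%s\n' '# settings' 'option=old' 'other=1' > "$demo/a.conf"
printf '%s\n' '# settings' 'other=1' > "$demo/b.conf"
set_kv option value "$demo/a.conf"
set_kv option value "$demo/b.conf"
result_a=$(cat "$demo/a.conf")
result_b=$(cat "$demo/b.conf")
printf '%s\n' "$result_a" '---' "$result_b"
rm -rf "$demo"
```

Keep in mind that awk interprets backslash escapes in values passed with -v, so a value containing backslashes would need different handling.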
use awk
awk 'FNR==NR && /configs.*projectname\.conf/{f=1;next}f==0;END{ if(!f) { print "your line"}} ' file file
sed -i 's/^option.*/option=value/g' /etc/fdm_monitor.conf
grep -q "option=value" /etc/fdm_monitor.conf || echo "option=value" >> /etc/fdm_monitor.conf
here is an awk one-liner:
awk -v s="option=value" '/^option/{f=1;$0=s}7;END{if(!f)print s}' file
this doesn't do in-place change on the file, you can however :
awk '...' file > tmpfile && mv tmpfile file
Using sed, you could say:
sed -e '/option=/{s/.*/option=value/;:a;n;ba;q}' -e '$aoption=value' filename
This would replace the parameter if it exists, else would add it to the bottom of the file.
Use the -i option if you want to edit the file in-place:
sed -i -e '/option=/{s/.*/option=value/;:a;n;ba;q}' -e '$aoption=value' filename
sed -i '1 h
1 !H
$ {
x
s/^option.*/option=value/g
t
s/$/\
option=value/
}' /etc/fdm_monitor.conf
Load the whole file into the buffer; at the end, change every occurrence and, if no change occurred, append the line at the end.
The answers using grep are wrong. You need to add an -x option to match the entire line otherwise lines like #text to add will still match when looking to add exactly text to add.
So the correct solution is something like:
grep -qxF 'include "/configs/projectname.conf"' foo.bar || echo 'include "/configs/projectname.conf"' >> foo.bar
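The false positive described above is easy to reproduce. In this sketch the file content and variable names are made up; it only compares grep -qF with grep -qxF on the same one-line file.

```shell
# Sketch only: a file that contains the text solely as part of a commented-out line.
sample=$(mktemp)
printf '%s\n' '#text to add' > "$sample"

# Without -x the fixed string matches as a substring of the comment line.
if grep -qF 'text to add' "$sample"; then without_x=match; else without_x=nomatch; fi

# With -x the whole line has to be equal, so the comment line does not count.
if grep -qxF 'text to add' "$sample"; then with_x=match; else with_x=nomatch; fi

echo "without -x: $without_x, with -x: $with_x"
rm -f "$sample"
```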
Using sed: it will insert at the end of the file. You can also pass in variables as usual, of course.
grep -qxF "port=9033" "$lightconf"
if [ $? -ne 0 ]; then
sed -i "$ a port=9033" "$lightconf"
else
echo "port=9033 already added"
fi
Using a one-liner with sed:
grep -qxF "port=9033" "$lightconf" || sed -i "$ a port=9033" "$lightconf"
Using echo may not work under root, but it will work like this. Note that this will not let you automate things, if that is what you are after, since it might ask for a password.
I had a problem when I was trying to edit as root for a particular user. Just adding sudo -u $user_name in front of the command was the fix for me.
grep -qxF "port=9033" light.conf
if [ $? -ne 0 ]; then
sudo -u $user_name echo "port=9033" >> light.conf
else
echo "already there"
fi
I elaborated on kev's grep/sed solution by setting variables in order to reduce duplication.
Set the variables in the first line (hint: $_option shall match everything on the line up until the value [including any separator like = or :]).
_file="/etc/ssmtp/ssmtp.conf" _option="mailhub=" _value="my.domain.tld" \
sh -c '\
grep -q "^$_option" "$_file" \
&& sed -i "s/^$_option.*/$_option$_value/" "$_file" \
|| echo "$_option$_value" >> "$_file"\
'
Mind that the sh -c '...' just has the effect of widening the scope of the variables without the need for an export. (See Setting an environment variable before a command in bash not working for second command in a pipe)
You can use this function to find and replace config values:
#!/bin/bash
#Find and Replace config values
find_and_replace_config () {
file=$1
var=$2
new_value=$3
awk -v var="$var" -v new_val="$new_value" 'BEGIN{FS=OFS="="}match($1, "^\\s*" var "\\s*") {$2=" " new_val}1' "$file" > output.tmp && sudo mv output.tmp "$file"
}
find_and_replace_config /etc/php5/apache2/php.ini max_execution_time 60
If you want to run this command using a python script within a Linux terminal...
import os,sys
LINE = 'include '+ <insert_line_STRING>
FILE = <insert_file_path_STRING>
os.system('grep -qxF $"'+LINE+'" '+FILE+' || echo $"'+LINE+'" >> '+FILE)
The $ and double quotations had me in a jungle, but this worked.
Thanks everyone
Try:
LINE='include "/configs/projectname.conf"'
sed -n "\|$LINE|q;\$a $LINE" lighttpd.conf >> lighttpd.conf
Use the pipe as separator and quit if $LINE has been found. Otherwise, append $LINE at the end.
Since we only read the file in sed command, I suppose we have no clobber issue in general (it depends on your shell settings).
Using only sed I'd suggest the following solution:
sed -i \
-e 's#^include "/configs/projectname.conf"#include "/configs/projectname.conf"#' \
-e t \
-e '$ainclude "/configs/projectname.conf"' lighttpd.conf
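s replaces the line include "/configs/projectname.conf" with itself (using # as delimiter here)
t if the replacement was successful, skips the rest of the commands for that line
$a on the last line, appends include "/configs/projectname.conf" after it. Note that t only skips the append when the matching line is itself the last line; if the include appears earlier in the file, the line is appended again.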
Almost all of the answers work, but in my experience not in all scenarios or on every OS. The only thing that worked on both older and newer systems and on different flavours of OS is the following.
I needed to append the KUBECONFIG path to the bashrc file if it doesn't exist. So, what I did is:
I assume that it exists and delete it with sed.
Then I append the string I want with echo.
sed -i '/KUBECONFIG=/d' ~/.bashrc
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
I needed to edit a file with restricted write permissions so needed sudo. working from ghostdog74's answer and using a temp file:
awk 'FNR==NR && /configs.*projectname\.conf/{f=1;next}f==0;END{ if(!f) { print "your line"}} ' file > /tmp/file
sudo mv /tmp/file file

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the textfile.txt have been duplicated, in their original location with a .bak added to them IE: infected_site/some_directory/infected_file.htm and infected_file.htm.bak
effectively making sure we have a backup.
All I need to do now is write an AWK command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files. Using just the uppercase script as the starting point, and the lower case script is too generic and could delete legitimate text
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled you can do this to clean out the bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You probably meant this (keeping your use of cat, which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement supposed to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while IFS= read -r file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while IFS= read -r file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while IFS= read -r file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note I put quotes around file names. This helps prevent problems if file name has a space in it.
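A quick way to check that the quoted while-read loop copes with spaces in paths is to try it on a throwaway tree. The directory layout and file contents below are invented, and the sketch deliberately writes the cleaned text back over the original from the .bak copy instead of producing a .awk file, which is my simplification rather than the asker's intended pipeline.

```shell
# Sketch only: fake site with one infected file in a directory whose name contains a space.
workdir=$(mktemp -d)
mkdir -p "$workdir/infected_site/sub dir"
printf '%s\n' '<html>' 'var createDocumentFragm = evil();' '</html>' \
    > "$workdir/infected_site/sub dir/index.htm"
printf '%s\n' '<html>clean</html>' > "$workdir/infected_site/ok.htm"

grep -r -l 'var createDocumentFragm' "$workdir/infected_site" | while IFS= read -r file
do
    cp -f "$file" "$file.bak"                                      # keep a backup first
    sed 's/var createDocumentFragm.*//g' "$file.bak" > "$file"     # rewrite from the backup
done

cleaned_page=$(cat "$workdir/infected_site/sub dir/index.htm")
backup_made=no
[ -f "$workdir/infected_site/sub dir/index.htm.bak" ] && backup_made=yes
untouched_backup=no
[ -f "$workdir/infected_site/ok.htm.bak" ] && untouched_backup=yes
printf '%s\n' "$cleaned_page"
rm -rf "$workdir"
```

File names containing newlines would still break this loop; for such cases find with -exec is the more robust route.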

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
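To make the generated script visible, the sketch below builds it from a two-line myFile.txt and applies it to one sample file. The substitution pairs and app.js are invented, a trailing g flag is added to match the /g in the question, and the result goes through a temporary file so the example does not depend on any particular -i syntax.

```shell
# Sketch only: the from:to pairs and app.js are made up for the demonstration.
workdir=$(mktemp -d)
printf '%s\n' 'foo:bar' 'colour:color' > "$workdir/myFile.txt"
printf '%s\n' 'var foo = "colour";' > "$workdir/app.js"

# Each line "from:to" becomes the sed command "s:from:to:g"
sed -e 's/^/s:/' -e 's/$/:g/' "$workdir/myFile.txt" > "$workdir/script.sed"
generated=$(cat "$workdir/script.sed")

sed -f "$workdir/script.sed" "$workdir/app.js" > "$workdir/app.js.new" &&
mv "$workdir/app.js.new" "$workdir/app.js"
rewritten=$(cat "$workdir/app.js")

printf '%s\n' "$generated" '---' "$rewritten"
rm -rf "$workdir"
```

As with the original, a from or to value containing a colon would break the generated commands.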
Another approach, if you don't feel very confident with sed and think you are going to forget in a week what those voodoo symbols mean, could be using IFS in a more efficient way:
while IFS=":" read -r PATTERN REPLACEMENT # Read each line and split it into fields separated by ":"
do
sed -i~ "s/${PATTERN}/${REPLACEMENT}/g" *.js
done < myFile.txt
The only pitfall I can see (there may be more) is that if either PATTERN or REPLACEMENT contains a slash (/), it will break your sed expression.
You can change the sed separator to a non-printable character and you should be safe.
Anyway, if you know what's in your myFile.txt, you can just pick any character that doesn't occur in it.
