Chain grep commands to search for a pattern inside files that match another pattern - bash

How can I chain multiple grep commands?
For example, if I want to search recursively for all PHP files that are publicly accessible, i.e. those which contain $_user_location = 'public; and then search for "SendQueue()" inside all those files, what should I do?
A few of my failed attempts:
grep -rnw ./* -e "^.*user_location.*public" *.php | grep -i "^.*SendQueue().*" --color
grep -rnw ./* -e "^.*user_location.*public" *.php | xargs -0 -i "^.*SendQueue().*" --color

Print the grep results with the filename, extract the filenames, and pass those filenames to the second grep.
grep -H ..... | cut -d: -f1 | xargs -d'\n' grep ....
This works as long as there are no : characters in the filenames, and usually there are none.
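For the patterns from the question, a filled-in version might look something like this (a sketch; it assumes GNU grep and GNU xargs, and that the regexes match your actual code):
# Hypothetical filled-in form of the pipeline above; sort -u drops
# duplicate filenames when a file matches more than once.
grep -rHn --include='*.php' "_user_location = 'public" . \
    | cut -d: -f1 | sort -u \
    | xargs -r -d'\n' grep -n --color "SendQueue()"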
You could always do a plain old loop:
for i in *.php; do
    if grep -q .... "$i"; then
        grep .... "$i"
    fi
done
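Filled in with the question's two patterns, that loop might read (again just a sketch, with the patterns taken from the question):
# Hypothetical filled-in form of the loop above.
for i in *.php; do
    if grep -q "_user_location = 'public" "$i"; then
        grep -n --color "SendQueue()" "$i"
    fi
done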

Using awk:
$ awk '
/SendQueue\(\)/ {            # hash all SendQueue() containing records
    a[++i]=$0
}
/.*user_location.*public/ {  # if condition met, flag up
    f=1
}
END {
    if(f)                    # if flag up
        for(j=1;j<=i;j++)    # output all hashed records
            print a[j]
}' file
Testfile:
$_user_location = 'public;
SendQueue()
In the absence of sample output, you only get:
SendQueue()
For multiple files:
$ for f in *.php ; do awk ... $f ; done
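If you run it over several files like that, the output no longer says which file matched; one possible tweak (a sketch, not part of the original answer) is to store FILENAME with each record:
# Sketch: same idea, but prefix each stored record with the file name
# so the output stays attributable when looping over many files.
for f in *.php; do
    awk '
    /SendQueue\(\)/         { a[++i] = FILENAME ": " $0 }
    /user_location.*public/ { flag = 1 }
    END { if (flag) for (j = 1; j <= i; j++) print a[j] }
    ' "$f"
done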

If you add the -l option to your first grep, you'll get all the file names, which you can feed to your second grep, like:
grep -i "^.*SendQueue().*" --color $(grep -l ...)
assuming you don't have special characters in file names.
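Concretely, with the question's patterns, that could look something like this (a sketch; it assumes GNU grep for -r/--include and file names without spaces or glob characters):
# Hypothetical concrete form of the command above. Note that if the
# inner grep matches nothing, the outer grep will wait on stdin.
grep -n --color "SendQueue()" \
    $(grep -rl --include='*.php' "_user_location = 'public" .)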

Some alternatives that could be quicker...
1. Using sed
sed -s '/\(SendQueue()\|_user_location = \o47public\)/H;${ x;s/\n/ /g;/SendQueue.*_user_location\|_user_location.*SendQueue/F;};d' *.php
This could be written:
sed -s '
/\(SendQueue()\|_user_location = \o47public\)/H;
${
x;
s/\n/ /g;
/SendQueue.*_user_location\|_user_location.*SendQueue/F;
};
d' *.php
Or with find:
find /path -type f -name '*.php' -exec sed -s '
/\(SendQueue()\|_user_location = \o47public\)/H;
${
x;
s/\n/ /g;
/SendQueue.*_user_location\|_user_location.*SendQueue/F;
};
d' {} +
2. Using grep
But reading each file only once:
grep -c "\(SendQueue()\|_user_location = 'public\)" *.php | grep :2$
or
grep -c "\(SendQueue()\|_user_location = 'public\)" *.txt | sed -ne 's/:2$//p'
Then
find /path -type f -name '*.php' -exec grep -c \
"\(SendQueue()\|_user_location = 'public\)" {} + |
sed -ne 's/:2$//p'
Of course, this works only if you're sure each string can be present only once per file.
Remark
To ensure that no commented-out line will pollute the result, you could replace the regex with
"^[^#/]*\(SendQueue()\|_user_location = 'public\)"
in all submitted alternatives.
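Applied to the single-pass grep -c variant, for example, the guarded regex would give (a sketch):
# Sketch: same two-hit counting trick, ignoring lines that start with
# a comment marker before the pattern.
grep -c "^[^#/]*\(SendQueue()\|_user_location = 'public\)" *.php | grep ':2$'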

I can mention two ways of doing this:
POSIX way
You can use find(1) to do the recursive search. find is defined by POSIX and is most likely already present on your system.
find . -type f -name '*.php' -exec grep -q "\$_user_location.*=.*'public" {} \; -exec grep 'SendQueue()' {} +
Here is the explanation for what this command does:
-type f Look for regular files
-name '*.php' With the suffix .php
-exec grep -q ... {} \; Run the first grep on each file individually.
-exec grep ... {} + Run the second grep on the files that passed the previous test.
Ripgrep way
ripgrep is a really fast recursive grep tool. This will take much less search time, but you will need to obtain it separately.
rg --glob '*.php' -l "\$_user_location.*=.*'public" | xargs rg 'SendQueue\(\)'
Here is the explanation for what this command does:
--glob '*.php' Only looks inside files with the suffix .php
-l Only lists files that match
We enter the first query and pipe all the matching files to xargs
xargs runs rg with the second query and adds the received files as arguments so that ripgrep only searches those files.
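If the file names might contain spaces or other special characters, a NUL-delimited variant of the same pipeline should be safer (a sketch; it assumes rg's --null flag and GNU xargs):
# Sketch: same two-stage ripgrep search, NUL-delimited so unusual
# file names survive the trip through xargs.
rg --glob '*.php' -l --null "\$_user_location.*=.*'public" | xargs -0 rg 'SendQueue\(\)'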
Which one to use
ripgrep really shines on huge directories, but it isn't necessary for what you are asking. Picking find is enough for most cases. The time you will spend obtaining ripgrep will probably be more than the time you save by using it for this specific operation. ripgrep is a really nice tool regardless.
EDIT:
The find command has 2 -exec options:
-exec grep (...) {} \; This calls the grep command for each file match. This will run the following:
grep (query) file1.php
grep (query) file2.php
grep (query) file3.php
find tracks the command result for each file, and passes them to the next test if they succeed.
-exec grep (...) {} + This calls the command with all the files attached as arguments. This will expand as:
grep (query) file1.php file2.php file3.php

Related

sed to replace string in file only displayed but not executed

I want to find all files with a certain name (Myfile.txt) that do not contain a certain string (my-wished-string) and then run sed to do a replacement in the found files. I tried:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays all files with the wished name that are missing "my-wished-string"; it does not execute the replacement. Am I missing something here?
With a for loop and an invoked shell:
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
grep -H -E -L "my-wished-string" "$f" &&
sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
You might want to add a -q to grep and -n to sed to silence the printing/output to stdout
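One quieter variant of the loop body, as a sketch, is to test for the absence of the string with a negated grep -q instead of printing file names with -L:
# Sketch: same find/sh loop, silenced by replacing the -H -E -L grep
# with a negated grep -q; sed -i already edits in place without
# printing to stdout.
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
    grep -q "my-wished-string" "$f" ||
        sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +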
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line;
do
[ -z "$(sed -n '/my-wished-string/p' "${line}")" ] && echo "${line}" >> stack2
done < stack1
while read -r line;
do
sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file. But don't pipe the list of names to sed, just execute sed in the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op, but I suppose your actual strings are different than these.

Bash - Multiple replace with sed statement

I'm going mad over a script's performance.
Basically I have to replace 600 strings in more than 35000 files.
I have got something like this:
patterns=(
    oldText1 newText1
    oldText2 newText2
    oldText3 newText3
)
pattern_count=${#patterns[*]}
files=(`find \. -name '*.js'`);
files_count=${#files[*]}
for ((i=0; i < $pattern_count ; i=i+2)); do
    search=${patterns[i]};
    replace=${patterns[i+1]};
    echo -en "\e[0K\r Status "$proggress"%. Iteration: "$i" of " $pattern_count;
    for ((j=0; j < $files_count; j++)); do
        command sed -i s#$search#$replace#g ${files[j]};
        proggress=$(($i*100/$files_count));
        echo -en "\e[0K\r Inside the second loop: " $proggress"%. File: "$j" of "$files_count;
    done
    proggress=$(($i*100/$pattern_count));
    echo -en "\e[0K\r Status "$proggress"%. Iteration: "$i" of " $pattern_count;
done
But this takes tons of minutes. Is there another solution? Perhaps using sed just once rather than in a double loop?
Thanks a lot.
Create a proper sed script:
s/pattern1/replacement1/g
s/pattern2/replacement2/g
...
Run this script with sed -f script.sed file (or in whatever way is required).
You may create that sed script using your array:
printf 's/%s/%s/g\n' "${patterns[@]}" >script.sed
Applying it to the files:
find . -type f -name '*.js' -exec sed -i -f script.sed {} ';'
I don't quite know how GNU sed (which I assume you're using) handles multiple files when you use -i, but you may also want to try
find . -type f -name '*.js' -exec sed -i -f script.sed {} +
which may potentially be much more efficient (executing as few sed commands as possible). As always, test on data that you can afford to throw away after testing.
For more information about using -exec with find, see https://unix.stackexchange.com/questions/389705
You don't need to run sed multiple times over one file; you can separate sed commands with ';'.
You can also execute multiple seds in parallel.
For example:
patterns=(
oldText1 newText1
oldText2 newText2
oldText3 newText3
)
# construct a sed argument such as 's/old/new/g;s/old2/new2/g;...'
sedarg=$(
    for ((i = 0; i < ${#patterns[@]}; i += 2)); do
        echo -n "s/${patterns[i]}/${patterns[i+1]}/g;"
    done
)
# find all files named '*.js' and pass them to xargs with NUL as separator
# xargs will parse them:
#   -0         use NUL as the separator
#   --verbose  print each command line before execution (i.e. sed -i .... file)
#   -n1        pass one argument/one file to each sed
#   -P8        run 8 seds simultaneously (experiment with that value; it depends on how fast your CPU and hard drive are)
find . -type f -name '*.js' -print0 | xargs -0 --verbose -n1 -P8 sed -i "$sedarg"
If you need the progress bar that much, I guess you can count the lines xargs --verbose returns, or better, use parallel --bar; see this post.
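With GNU parallel installed, the same single-pass approach with a progress bar could look roughly like this (a sketch; the -j8 job count is an assumption to tune):
# Sketch: run the generated sed script through GNU parallel to get
# --bar progress output instead of xargs --verbose noise.
find . -type f -name '*.js' -print0 |
    parallel -0 --bar -j8 sed -i "$sedarg" {}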

Search presence of pattern in multiple files

I need to check whether all the files which I find under a parent directory contain a particular pattern or not.
Example:
./a/b/status: *foo*foo
./b/c/status: bar*bar
./c/d/status: foo
The command should return false, as the second file does not contain foo.
I am trying the command below but don't have a clue how to achieve this in a single command.
find . -name "status" | xargs grep -c "foo"
The -c option counts the number of times the pattern is found. You wouldn't need find here; rather, use the -r and --include options of grep.
$ grep -r -c foo --include=status
-r does a recursive search for the pattern foo in files whose name matches status.
Example. I have four files in three directories. Each has a single line:
$ cat a/1.txt b/1.txt b/2.txt c/1.txt
foobar
bar
foo
bazfoobar
With the above grep, you would get something like this,
$ grep -ir -c foo --include=1.txt
a/1.txt:1
b/1.txt:0
c/1.txt:1
You can count the number of files that do not contain "foo"; if the number is > 0, it means that at least one file does not contain "foo":
find . -type f -name "status" | xargs grep -c "foo" | grep ':0$' | wc -l
or
find . -type f -name "status" | xargs grep -c "foo" | grep -c ':0$'
or
optimized using iamauser's answer (thanks):
grep -ir -c "foo" --include=status | grep -c ':0$'
if all files in the tree are named "status", you can use the simpler command line:
grep -ir -c "foo" | grep -c ':0$'
With a check:
r=`grep -ir -c foo | grep -c ':0$'`
if [ "$r" != "0" ]; then
echo "false"
fi
If you want find to output a list of files that can be read by xargs, then you need to use:
find . -name "status" -print0 | xargs -0 grep foo` to avoid filenames with special characters (like spaces and newlines and tabs) in them.
But I'd aim for something like this:
find . -name "status" -exec grep "foo" {} \+
The \+ that terminates the -exec causes find to append all the files it finds onto a single invocation of the grep command. This is much more efficient than running grep once for each file found, which is what happens if you use \;.
And the default behaviour of grep will be to show the filename and match, as you've shown in your question. You can alter this behaviour with options like:
-h ... don't show the filename
-l ... show only the files that match, without the matching text,
-L ... show only the files that DO NOT match - i.e. the ones without the pattern.
This last one sounds like what you're actually looking for.
find . -name 'status' -exec grep -L 'foo' {} + | awk 'NF{exit 1}'
The exit status of the above will be 0 if all files contain 'foo' and 1 otherwise.
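In a script, the exit status of that pipeline can drive the check directly, for example (a sketch):
# Sketch: act on the exit status of the grep -L / awk pipeline above.
if find . -name 'status' -exec grep -L 'foo' {} + | awk 'NF{exit 1}'; then
    echo "all status files contain foo"
else
    echo "false"    # at least one status file does not contain foo
fi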

How to make this script grep only the 1st line

for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' \
| xargs grep -A1 -l 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST'
done
It's ignoring the -A1. The end result is that I just want it to show me files that contain any of the matching words, but only on the first line of the script. If there is a better, more efficient, less resource-intensive way, that would be great as well, as this will be run on very large shared servers.
Use awk instead:
for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' -exec \
        awk 'FNR == 1 && /GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME }' {} +
done
This will print the current input file if the first line matches. It's not ideal, since it will read all of each file. If your version of awk supports it, you can use
awk '/GLOBALS|.../ { print FILENAME } {nextfile}'
The nextfile command will execute after the first line has been tested, effectively skipping the rest of the file once awk has checked whether that line matches the regular expression.
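Plugged back into the original find invocation, that gives something like the following sketch (nextfile needs GNU awk or another awk that supports it):
# Sketch: test only line 1 of each file, then jump to the next file.
for i in USER; do
    find /home/"$i"/public_html/ -type f -iname '*.php' -exec \
        awk '/GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME } { nextfile }' {} +
done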
The following code is untested:
for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' | while read -r; do
head -n1 "$REPLY" | grep -q 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST' \
&& echo "$REPLY"
done
done
The idea is to loop over each find result, explicitly test the first line, and print the filename if a match was found. I don't like it though because it feels so clunky.
for j in $(find /home/$i/public_html/ -type f -iname '*.php');
do result=$(head -n 1 "$j" | grep "$stuff");
[[ x$result != x ]] && echo "$j: $result";
done
You'll need a little more effort to skip leading blank lines. fgrep will save resources.
A little perl would bring great improvement, but it's hard to type it on a phone.
Edit:
On a less cramped keyboard, I have inserted a less brief solution.

Bash script to limit a directory size by deleting files accessed last

I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
check the size of a directory /PATH/TO/FILES
if size in 1) is greater than X size, get a list of the files by access date
delete files in order until size is less than X
The benefit here is that for cache and backup directories, I will only delete what I need to in order to keep them within a limit, whereas the simplified method might go over the size limit if one day is particularly large. I'm guessing I need to use stat and a bash for loop?
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
echo "usage: $0 [directory] [maxsize in megabytes]" >&2
exit 1
fi
find "$DELETEDIR" -type f -printf "%T#::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
BEGIN { curSize=0; }
{
curSize += $3;
if (curSize > maxbytes) { print $2; }
}
' \
| tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Here's a simple, easy to read and understand method I came up with to do this:
DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}')
if [ "$DIRSIZE" -gt "$SOMELIMIT" ]
then
    for f in `ls -rt --time=atime /PATH/TO/FILES/*.tar`; do
        FILESIZE=`stat -c "%s" $f`
        FILESIZE=$(($FILESIZE/1024))
        DIRSIZE=$(($DIRSIZE - $FILESIZE))
        if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
            break
        fi
    done
fi
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
find /PATH/TO/FILES -name '*.tar' -type f \
    | sed 's/ /\\ /g' \
    | xargs stat -f "%a::%z::%N" \
    | sort -r \
    | awk -v maxsize="$X_SIZE" '
        BEGIN{curSize=0; FS="::"}
        {curSize += $2}
        curSize > maxsize{print $3}
    ' \
    | sed 's/ /\\ /g' \
    | xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
Now we have a list of all the files we are interested in, in order from latest accessed to earliest accessed. Then the awk script adds up all the sizes as it goes through the list, and begins outputting them when it gets over $X_SIZE. The files that are not output this way will be the ones kept, the other file names go to sed again to escape any spaces and then to xargs, which runs rm them.
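Note that the stat -f format string above is the BSD/macOS syntax; with GNU coreutils the same idea might be sketched as follows (%X is the last-access epoch, %s the size in bytes, %n the name; this is an assumption-laden sketch, and file names containing newlines would still break the final step):
# Sketch: GNU stat equivalent of the BSD pipeline above, NUL-delimited
# into xargs where possible.
find /PATH/TO/FILES -name '*.tar' -type f -print0 \
    | xargs -0 stat -c "%X::%s::%n" \
    | sort -rn \
    | awk -v maxsize="$X_SIZE" '
        BEGIN { curSize = 0; FS = "::" }
        { curSize += $2 }
        curSize > maxsize { print $3 }
    ' \
    | tr '\n' '\0' \
    | xargs -0 -r rm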
