Search presence of pattern in multiple files - shell

I need to check whether all the files I find under a parent directory contain a particular pattern.
./a/b/status: *foo*foo
./b/c/status: bar*bar
./c/d/status: foo
The command should return false, as file 2 does not contain foo.
I am trying the command below, but I don't have a clue how to achieve this in a single command.
find . -name "status" | xargs grep -c "foo"

The -c option counts the number of lines on which the pattern is found in each file (not the number of occurrences). You don't need find; use the -r and --include options of grep instead.
$ grep -r -c foo --include=status
-r does a recursive search for the pattern foo in files whose name matches status.
Example. I have four files in three directories. Each have a single line;
$ cat a/1.txt b/1.txt b/2.txt c/1.txt
With the above grep, you would get something like this,
$ grep -ir -c foo --include=1.txt

You can count the number of files that do not contain "foo". If that number is greater than 0, at least one file does not contain "foo":
find . -type f -name "status" | xargs grep -c "foo" | grep ':0$' | wc -l
find . -type f -name "status" | xargs grep -c "foo" | grep -c ':0$'
Optimized using iamauser's answer (thanks):
grep -ir -c "foo" --include=status | grep -c ':0$'
If all files in the tree are named "status", you can use the simpler command line:
grep -ir -c "foo" | grep -c ':0$'
With a check:
r=`grep -ir -c foo | grep -c ':0$'`
if [ "$r" != "0" ]; then
echo "false"
fi
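One edge case with the find | xargs variants above: when only a single file is passed to grep -c, grep prints just the count without the "file:" prefix, so ':0$' never matches. A minimal sketch that forces the prefix with -H, assuming GNU grep and find (the function name count_files_without is made up for illustration):

```shell
# Hypothetical helper: print how many files called NAME under DIR lack PATTERN.
count_files_without() {
    pattern=$1
    name=$2
    dir=${3:-.}
    # -H forces the "file:count" form even when only one file is found,
    # so the ':0$' filter below always has a colon to anchor on.
    find "$dir" -type f -name "$name" -exec grep -c -H -e "$pattern" {} + |
        grep -c ':0$' || true   # grep -c exits 1 when the count is 0
}
```

Calling count_files_without foo status . prints 0 when every status file contains foo.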

If you want find to output a list of files that can be read safely by xargs, then you need to use:
find . -name "status" -print0 | xargs -0 grep foo
to avoid problems with filenames that have special characters (like spaces, newlines and tabs) in them.
But I'd aim for something like this:
find . -name "status" -exec grep "foo" {} \+
The \+ to terminate the -exec causes find to append all the files it finds onto a single instance of the grep command. This is much more efficient than running grep once for each file found, which it would do if you used \;.
And the default behaviour of grep will be to show the filename and match, as you've shown in your question. You can alter this behaviour with options like:
-h ... don't show the filename
-l ... show only the files that match, without the matching text,
-L ... show only the files that DO NOT match - i.e. the ones without the pattern.
This last one sounds like what you're actually looking for.

find . -name 'status' -exec grep -L 'foo' {} + | awk 'NF{exit 1}'
The exit status of the above will be 0 if all files contain 'foo' and 1 otherwise.
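The same idea can be wrapped in a function so it reads naturally in an if statement. This is only a sketch: the name all_files_contain and its argument order are made up here, and grep -L is assumed to be available (GNU and BSD grep both have it).

```shell
# Hypothetical helper: succeed only if every file called NAME under DIR
# contains PATTERN.
all_files_contain() {
    pattern=$1
    name=$2
    dir=${3:-.}
    # grep -L lists the files WITHOUT a match; an empty list means all matched.
    missing=$(find "$dir" -type f -name "$name" -exec grep -L -e "$pattern" {} +)
    [ -z "$missing" ]
}
```

Usage would look like: if all_files_contain foo status .; then echo true; else echo false; fi. Note that it also succeeds when no file named status exists at all.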


Chain grep commands to search for a pattern inside files that match another pattern

How can I chain multiple grep commands?
For example, if I want to search recursively for all PHP files that are publicly accessible, i.e. those which contain $_user_location = 'public; and then search for SendQueue() inside all these files, what should I do?
Few of my failed attempts :
grep -rnw ./* -e "^.*user_location.*public" *.php | grep -i "^.*SendQueue().*" --color
grep -rnw ./* -e "^.*user_location.*public" *.php | xargs -0 -i "^.*SendQueue().*" --color
Print grep results with filename, extract filenames and pass those filenames to second grep.
grep -H ..... | cut -d: -f1 | xargs -d'\n' grep ....
Works as long as there are no : in filenames and usually there are none.
You could always do a plain old loop:
for i in *.php; do
if grep -q .... "$i"; then
grep .... "$i"
fi
done
Using awk:
$ awk '
/SendQueue\(\)/ { # hash all SendQueue() containing records
    a[++i]=$0
}
/.*user_location.*public/ { # if condition met, flag up
    f=1
}
END {
    if(f) # if flag up
        for(j=1;j<=i;j++) # output all hashed records
            print a[j]
}' file
$_user_location = 'public;
In the lack of sample output you only get:
For multiple files:
$ for f in *.php ; do awk ... "$f" ; done
If you add the -l option to your first grep, you'll get all the file names, which you can feed to your second grep, like:
grep -i "^.*SendQueue().*" --color $(grep -l ...)
assuming you don't have special characters in file names.
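If the file names may contain spaces or other special characters, the same two-step idea can be made safe by passing NUL-terminated names between the two greps. A rough sketch, assuming GNU grep (-Z with -l, --include) and GNU xargs (-0, -r); the function name grep_in_matching_files is hypothetical:

```shell
# Hypothetical helper: search for SECOND only inside the *.php files under DIR
# that contain FIRST.
grep_in_matching_files() {
    first=$1
    second=$2
    dir=${3:-.}
    # -l prints only file names, -Z terminates each name with NUL,
    # and xargs -0 reads them back without splitting on whitespace.
    grep -rlZ --include='*.php' -e "$first" "$dir" |
        xargs -0 -r grep -n -H -e "$second"
}
```

For the question's case this would be called as grep_in_matching_files "_user_location = 'public" 'SendQueue()' . (both patterns are basic regular expressions, where the parentheses are literal).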
Some alternatives that could be quicker:
1. Using sed
sed -s '/\(SendQueue()\|_user_location = \o47public\)/H;${ x;s/\n/ /g;/SendQueue.*_user_location\|_user_location.*SendQueue/F;};d' *.php
This could be written as:
sed -s '
    /\(SendQueue()\|_user_location = \o47public\)/H;
    ${
        x;
        s/\n/ /g;
        /SendQueue.*_user_location\|_user_location.*SendQueue/F;
    };
    d' *.php
Or with find:
find /path -type f -name '*.php' -exec sed -s '
    /\(SendQueue()\|_user_location = \o47public\)/H;
    ${
        x;
        s/\n/ /g;
        /SendQueue.*_user_location\|_user_location.*SendQueue/F;
    };
    d' {} +
2. Using grep
But reading each file only 1 time
grep -c "\(SendQueue()\|_user_location = 'public\)" *.php | grep :2$
grep -c "\(SendQueue()\|_user_location = 'public\)" *.php | sed -ne 's/:2$//p'
find /path -type f -name '*.php' -exec grep -c \
"\(SendQueue()\|_user_location = 'public\)" {} + |
sed -ne 's/:2$//p'
Of course, this works only if you're sure each string can appear only once per file.
To ensure no commented line pollutes the result, you could replace the regex with
"^[^#/]*\(SendQueue()\|_user_location = 'public\)"
In all submited alternatives
I can mention two ways of doing this:
You can use find(1) in order to do recursive search. find is defined by POSIX and is most likely included in your system.
find . -type f -name '*.php' -exec grep -q "\$_user_location.*=.*'public" {} \; -exec grep 'SendQueue()' {} +
Here is the explanation for what this command does:
-type f Look for files
-name '*.php' With the suffix .php
-exec grep -q ... {} \; Run the first grep on each file individually; only files where it succeeds move on.
-exec grep ... {} + Run the second grep on the files that matched the first.
Ripgrep way
ripgrep is a really fast recursive grep tool. This will take much less search time, but you will need to obtain it separately.
rg --glob '*.php' -l "\$_user_location.*=.*'public" | xargs rg 'SendQueue\(\)'
Here is the explanation for what this command does:
--glob '*.php' Only looks inside files with the suffix .php
-l Only lists files that match
We enter the first query and pipe all the matching files to xargs
xargs runs rg with the second query and adds the received files as arguments so that ripgrep only searches those files.
Which one to use
ripgrep really shines on huge directories, but it really isn't necessary otherwise for what you are asking. Picking find is enough for most cases. The time you will spend obtaining ripgrep will probably be more than the time you will save by using it for this specific operation. ripgrep is a really nice tool regardless.
The find command has 2 -exec options:
-exec grep (...) {} \; This calls the grep command for each file match. This will run the following:
grep (query) file1.php
grep (query) file2.php
grep (query) file3.php
find tracks the command result for each file, and passes them to the next test if they succeed.
-exec grep (...) {} + This calls the command with all the files attached as arguments. This will expand as:
grep (query) file1.php file2.php file3.php
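To see the difference between the two terminators for yourself, here is a small sketch that uses echo in place of grep (the function name show_exec_forms and the output labels are made up):

```shell
# Hypothetical demo: compare how often find invokes the command.
show_exec_forms() {
    dir=${1:-.}
    # \; runs the command once per file: one output line per file.
    find "$dir" -type f -name '*.php' -exec echo 'per-file:' {} \;
    # + appends as many files as fit onto one invocation: usually one line.
    find "$dir" -type f -name '*.php' -exec echo 'batched:' {} +
}
```

With three PHP files in the directory, this prints three per-file: lines and a single batched: line listing all three.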

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
This is what I have so far. It is able to extract the substrings, but it still has doubles. Since I'm already in the loop, I don't see how I can remove the doubles.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
echo "$line" | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filenames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf "%f\n" |
# extract the pattern (print only the names that contain it)
sed -n 's/.*\([0-9]\{4\}Q[0-9]\).*/\1/p' |
# iterate
while IFS= read -r file; do
echo "$file"
done
I used -printf %f to print just the filename instead of the full path. GNU sed has a -z option that you can use with -print0 (or -printf "%f\0").
With the way you wanted to do this, if your files have no newline in the name, there is no need to loop over the list in bash (as a rule of thumb, try to avoid while read line, it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
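Putting the pieces together, a sketch of a helper that yields each quarter exactly once (assumes GNU find for -printf; the function name list_quarters is made up):

```shell
# Hypothetical helper: print each distinct yearQn token found in the names
# of the .pdf files directly inside DIR, one per line, sorted.
list_quarters() {
    dir=$1
    # %f prints only the base name, so the directory path cannot match.
    find "$dir" -maxdepth 1 -type f -name '*.pdf' -printf '%f\n' |
        grep -oE '[0-9]{4}Q[0-9]' |
        sort -u
}
```

Since the tokens contain no whitespace, the result can be looped over directly: for q in $(list_quarters original); do echo "$q"; done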
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u

How to count files in subdir and filter output in bash

Hi hoping someone can help, I have some directories on disk and I want to count the number of files in them (as well as dir size if possible) and then strip info from the output. So far I have this
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dir's that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter based on max number of files so > 4 for example, I would like to capture filetype as a variable for each remaining result e.g ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like to remove all the results from disk. With find I normally just do something like -exec rm -rf {} \; but I'm not clear how it would work here.
Thanks a lot
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times as that's total rubbish
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; { print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h {};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l);')
files_count=`ls -keys | nl`
For instance:
ls | nl
nl prints each line prefixed with its line number.
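For what the question actually asks, the three separate searches can be folded into a single pass. This is only a sketch under some assumptions: directory names contain no newlines, "number of files" means regular files below the directory, and the function name list_big_dirs plus the tab-separated output layout are made up.

```shell
# Hypothetical helper: for every directory under ROOT whose name ends in ",d"
# and that holds more than MIN files, print: count, size, filetype, path.
list_big_dirs() {
    root=$1
    min=$2
    find "$root" -type d -name '*,d' | while IFS= read -r dir; do
        count=$(find "$dir" -type f | wc -l)
        if [ "$count" -gt "$min" ]; then
            size=$(du -sh "$dir" | cut -f1)
            base=${dir##*/}     # e.g. zvehicleswheel_diff.jpg,d
            ext=${base%,d}      # drop the trailing ",d"
            ext=${ext##*.}      # keep what follows the last dot, e.g. jpg
            printf '%s\t%s\t%s\t%s\n' "$count" "$size" "$ext" "$dir"
        fi
    done
}
```

list_big_dirs . 4 lists the candidates; once the listing looks right, the printf line could be swapped for rm -rf -- "$dir" to delete them.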

Bash script to limit a directory size by deleting files accessed last

I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
1. check the size of a directory /PATH/TO/FILES
2. if the size in 1) is greater than X, get a list of the files by access date
3. delete files in order until the size is less than X
The benefit here is for cache and backup directories, I will only delete what I need to to keep it within a limit, whereas the simplified method might go over size limit if one day is particularly large. I'm guessing I need to use stat and a bash for loop?
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
echo "usage: $0 [directory] [maxsize in megabytes]" >&2
exit 1
fi
find "$DELETEDIR" -type f -printf "%T@::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
BEGIN { curSize=0; }
{
curSize += $3;
if (curSize > maxbytes) { print $2; }
}
' \
| tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Here's a simple, easy to read and understand method I came up with to do this:
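DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}') # 1K blocks with GNU du
if [ "$DIRSIZE" -gt "$SOMELIMIT" ]; then
for f in `ls -rt --time=atime /PATH/TO/FILES/*.tar`; do
FILESIZE=`stat -c "%s" "$f"` # in bytes
rm -f "$f"
DIRSIZE=$((DIRSIZE - FILESIZE / 1024))
if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then break; fi
done
fi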
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -r \
| awk -v maxSize="$X_SIZE" '
BEGIN{curSize=0; FS="::"}
{curSize += $2}
curSize > maxSize {print $3}
' \
| sed 's/ /\\ /g' \
| xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
Now we have a list of all the files we are interested in, in order from latest accessed to earliest accessed. Then the awk script adds up all the sizes as it goes through the list, and begins outputting file names once the running total gets over $X_SIZE. The files that are not output this way are the ones kept; the other file names go to sed again to escape any spaces and then to xargs, which runs rm on them.
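Before deleting anything it can help to look at what would be removed. Below is a dry-run sketch of the same sort-and-accumulate idea, assuming GNU find (-printf with %A@ for the access time) and file names without newlines; the function name list_excess_files is made up.

```shell
# Hypothetical helper: print the files under DIR that would have to go so that
# the most recently accessed files add up to at most MAXBYTES.
list_excess_files() {
    dir=$1
    maxbytes=$2
    # One line per file: access time (epoch seconds), size in bytes, path.
    find "$dir" -type f -printf '%A@::%s::%p\n' |
        sort -rn |
        awk -F '::' -v max="$maxbytes" '
            { total += $2 }                     # running size, newest first
            total > max {                       # everything past the limit
                sub(/^[^:]*::[^:]*::/, "")      # strip time and size fields
                print
            }'
}
```

The output is one path per line, oldest access last. Feeding it to xargs -d '\n' rm (GNU xargs) would perform the actual deletion.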

BASH: How to remove all files except those named in a manifest?

I have a manifest file which is just a list of newline separated filenames. How can I remove all files that are not named in the manifest from a folder?
I've tried to build a find ./ ! -name "filename" command dynamically:
command="find ./ ! -name \"MANIFEST\" "
for line in `cat MANIFEST`; do
command=${command}"! -name \"${line}\" "
done
command=${command} -exec echo {} \;
But the files remain.
[Note:] I know this uses echo. I want to check what my command does before using it.
Solution:(thanks PixelBeat)
ls -1 > ALLFILES
sort MANIFEST MANIFEST ALLFILES | uniq -u | xargs rm
Without temp file:
ls -1 | sort MANIFEST MANIFEST - | uniq -u | xargs rm
Both work whether or not the files are sorted.
For each file in current directory grep filename in MANIFEST file and rm file if not matched.
for file in *
do grep -q -x -F -- "$file" PATH_TO_YOUR_MANIFEST || rm "$file"
done
Using the "set difference" pattern from
(find ./ -type f -printf "%P\n"; cat MANIFEST MANIFEST; echo MANIFEST) |
sort | uniq -u | xargs -r rm
Note I list MANIFEST twice in case there are files listed there that are not actually present.
Also note the above supports files in subdirectories
figured it out:
ls -1 > ALLFILES
comm -3 MANIFEST ALLFILES | xargs rm
Just for fun, a Perl 1-liner... not really needed in this case but much more customizable/extensible than Bash if you want something fancier :)
$ ls
1 2 3 4 5 M
$ cat M
$ perl -e '{use File::Slurp; %M = map {chomp; $_ => 1} read_file("M"); $M{M}=1; \
foreach $f (glob("*")) {next if $M{$f}; unlink "$f"||die "Can not unlink: $!\n" };}'
$ ls
1 3 M
The above can be even shorter if you pass the manifest on STDIN
perl -e '{%M = map {chomp; $_ => 1} <>; $M{M}=1; \
foreach $f (glob("*")) {next if $M{$f};unlink "$f"||die "Can not unlink: $!\n" };}' M
Assumes that MANIFEST is already sorted:
find -type f -printf %P\\n | sort | comm -3 MANIFEST - | xargs rm
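Another way to get the same set difference without sorting is to let grep filter the file list against the manifest. This is a sketch only, assuming GNU find (-printf) and names without newlines; the function name list_unlisted is made up, and it only prints the candidates so the result can be checked first.

```shell
# Hypothetical helper: print the files under DIR (relative paths) that are
# not named in MANIFEST.
list_unlisted() {
    manifest=$1
    dir=${2:-.}
    # -F: manifest lines are fixed strings, -x: they must match a whole line,
    # -v: keep only the paths that match none of them.
    ( cd "$dir" && find . -type f -printf '%P\n' ) |
        grep -vxF -f "$manifest" || true   # grep exits 1 when nothing is left
}
```

If the manifest lives inside the directory, list its own name in it so it is kept. After checking the output, the deletion could be done with: list_unlisted MANIFEST . | xargs -d '\n' -r rm --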
