bash: find with grep in if always true

OK, so this code works:
if grep -lq something file.txt ; then
So why doesn't something like this work? What am I doing wrong?
if find . -name file.txt -exec grep -lq something {} \;
It's always true as long as the directory exists.

From the find man page:
Exit Status
find exits with status 0 if all files are processed successfully, greater than 0 if errors occur. This is deliberately a very broad description, but if the return value is non-zero, you should not rely on the correctness of the results of find.

What you're getting back from your command is the exit status of find, not of grep. find almost always returns an exit status of zero as long as the query itself is valid.
I was thinking this might work:
find . -name file.txt -print0 | xargs -0 grep -lq something
But beware of the exit-status semantics: GNU xargs, at least, exits 0 only if every invocation of grep succeeded (and 123 if any invocation returned non-zero). So if grep runs in multiple batches, you get an "every batch matched" test rather than an "any file matched" test. With a single small batch, as in your command, this won't be an issue.
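If you need "any file matched" semantics no matter how xargs batches the arguments, one sketch is to test for non-empty output instead (dropping -q so that -l can print):
find . -name file.txt -print0 | xargs -0 grep -l something | grep -q .
The pipeline's exit status then comes from the final grep -q, which succeeds if at least one filename came through.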

A very simple way is to check whether find's output is empty (note that -q has to go here, since it would suppress the filename that -l prints):
output=$( find . -name file.txt -exec grep -l something {} \; )
if [ -n "$output" ]; then
    : # found
else
    : # not found
fi
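With GNU find you can also short-circuit without a helper function; this is just a sketch, since -quit is a GNU extension:
if find . -name file.txt -exec grep -q something {} \; -print -quit | grep -q .
then
    echo "found"
fi
Here the -exec acts as a test, -print emits the first file that passes it, and -quit stops the search immediately.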

One approach that short-circuits as soon as a file containing the desired content is found (presuming that your intent is to check whether any file matches, as opposed to whether every file matches):
check_for_content() {
    target=$1; shift
    while IFS= read -r -d '' filename; do
        if grep -q -e "$target" "$filename"; then
            return 0
        fi
    done < <(find "$@" -print0)
    return 1
}
Usage:
check_for_content thing-to-look-for -type f -name file.txt

Related

Aggregate files with common prefix but don't repeat header in Bash

I have a bunch of files with different common prefixes that I would like to aggregate together. Files with the same prefix all have the same header, and I don't want that header to end up in my aggregate file more than once. I've written a function that takes the common prefix as an argument, finds all the files matching that prefix, prints all but the first line of each to the aggregate output file, collects the header, and prepends it to the output file with cat.
aggr () {
    outfile=${1}_aggregate.txt
    find . -name "${1}_*.txt" -exec tail -n+2 {} \; > $outfile
    fl=`find . -name "${1}_*.txt" -print -quit`
    header=`head -n1 $fl`
    cat - $outfile <<< "$header" > tmp.txt && mv tmp.txt $outfile
}
This generally works well, but when the find command takes a long time to run, I sometimes don't get a header in my output file. From my logs, I can see the following error after the correct header string is echoed:
mv: cannot stat ‘tmp.txt’: No such file or directory
I'm not entirely sure what is happening, but it seems like the cat command that adds the header is sometimes executed before find has finished. Then the command fails to produce the tmp.txt file, and consequently the mv never happens. I modified my function by adding wait after the find commands, but it did not resolve the issue. Any suggestions? I'm at a loss as to why this happens only with some files.
aggr () {
    outfile=${1}_aggregate.txt
    find . -name "${1}_*.txt" -exec tail -n+2 {} \; > $outfile
    wait
    fl=`find . -name "${1}_*.txt" -print -quit`
    wait
    header=`head -n1 $fl`
    cat - $outfile <<< "$header" > tmp.txt && mv tmp.txt $outfile
}
I can't say why cat seemingly succeeds while tmp.txt doesn't exist; the && should prevent the mv from running unless cat returned successfully, and cat should always be writing at least the contents of outfile, even if some kind of race condition exists in the header handling...
That said, I can propose a modification to your script that might make it more robust, and that saves you the multiple invocations of the find command, making it faster if you have a larger dataset (as I suspect you do):
aggr () {
    header=0
    outfile=${1}_aggregate.txt
    find . -name "${1}_*.txt" -print0 |
    while IFS= read -r -d '' file; do
        if [ $header -eq 0 ]; then
            header=1
            cp "$file" "$outfile"
        else
            tail -n+2 "$file" >> "$outfile"
        fi
    done
}
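Usage is the same as with your original function; for a hypothetical prefix "experiment":
aggr experiment   # aggregates experiment_*.txt into experiment_aggregate.txt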
Hope this helps!

Find single line files and move them to a subfolder

I am using the following bash line to find text files in a subfolder with a given pattern inside them and move them to a subfolder:
find originalFolder/ -maxdepth 1 -type f -exec grep -q 'mySpecificPattern' {} \; -exec mv -i {} destinationFolder/ \;
Now, instead of grepping for a pattern, I would like to move the files to a subfolder if they consist of only a single line (of text): how can I do that?
You can do it this way:
while IFS= read -r -d '' file; do
    [[ $(wc -l < "$file") -eq 1 ]] && echo mv -i "$file" destinationFolder/
done < <(find originalFolder/ -maxdepth 1 -type f -print0)
Note the use of echo in front of mv so that you can verify the output before actually executing mv. Once you're satisfied with the output, remove the echo.
Using wc as shown above is the most straightforward way, although it reads each entire file to determine its length. It's also possible to do the length check with awk, whose exit function lets you fit it into a find command.
find . -type f -exec awk 'NR==2 { exit 1 } END { exit (NR==1 ? 0 : 1) }' {} \; -print
The command exits with status 0 if there was exactly one input record at end-of-file, and it bails out with status 1 as soon as line 2 is encountered, so it never reads more than two lines of any file; this should easily outrun wc if large files are a performance concern.
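Plugged back into the mover from the question, the whole thing might look like this (an untested sketch):
find originalFolder/ -maxdepth 1 -type f -exec awk 'NR==2 { exit 1 } END { exit (NR==1 ? 0 : 1) }' {} \; -exec mv -i {} destinationFolder/ \;
The second -exec only runs for files where the awk test returned success.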

How to make this script grep only the 1st line

for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' \
        | xargs grep -A1 -l 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST'
done
It's ignoring the -A1. The end result I want is for it to show me files that contain any of the matching words, but only on the first line of the script. If there is a better, more efficient, less resource-intensive way, that would be great as well, as this will be run on very large shared servers.
Use awk instead:
for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' -exec \
        awk 'FNR == 1 && /GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME }' {} +
done
Note that the pattern and its { action } must be on the same line in awk; splitting them turns the pattern into a separate print-the-line rule.
This prints the current file's name if its first line matches. It's not ideal, since it still reads all of each file. If your version of awk supports it, you can use
awk '/GLOBALS|.../ { print FILENAME } {nextfile}'
The nextfile command executes after the first line, so awk skips the rest of the file once it has tested whether that line matches the regular expression.
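Put together with the find from your loop, a sketch (assuming your awk supports nextfile, as gawk does) would be:
find /home/$i/public_html/ -type f -iname '*.php' -exec awk '/GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME } { nextfile }' {} +
Because nextfile fires on the first line of every file, only that line is ever tested against the pattern.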
The following code is untested:
for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' | while read -r; do
        head -n1 "$REPLY" | grep -q 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST' \
            && echo "$REPLY"
    done
done
The idea is to loop over each find result, explicitly test its first line, and print the filename if a match was found. I don't like it, though, because it feels clunky.
for j in $(find /home/$i/public_html/ -type f -iname '*.php'); do
    result=$(head -n1 "$j" | grep "$stuff")
    [[ -n $result ]] && echo "$j: $result"
done
You'll need a little more effort to skip leading blank lines. fgrep will save resources.
A little perl would bring great improvement, but it's hard to type on a phone.
Edit:
On a less cramped keyboard, I've inserted the less brief solution above.

Is there a way to pipe from a variable?

I'm trying to find all files in a file structure above a certain file size, list them, then delete them. What I currently have looks like this:
filesToDelete=$(find $1 -type f -size +$2k -ls)
if [ -n "$filesToDelete" ]; then
    echo "Deleting files..."
    echo $filesToDelete
    $filesToDelete | xargs rm
else
    echo "no files to delete"
fi
Everything works, except the $filesToDelete | xargs rm, obviously. Is there a way to use pipe on a variable? Or is there another way I could do this? My google-fu didn't really find anything, so any help would be appreciated.
Edit: Thanks for the information everyone. I will post the working code here now for anyone else stumbling upon this question later:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
    find $1 -type f -size +$2k -exec sh -c 'f={}; echo "deleting file $f"; rm $f' {} \;
else
    echo "no files above" $2 "kb found"
fi
As already pointed out, you don't need to pipe a var in this case. But just in case you need it in some other situation, you can use
xargs rm <<< $filesToDelete
or, more portably
echo $filesToDelete | xargs rm
Beware of spaces in file names.
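To be safe with arbitrary file names, a null-delimited sketch (assuming GNU find and xargs) sidesteps the problem entirely:
find "$1" -type f -size +"$2"k -print0 | xargs -0 --no-run-if-empty rm --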
To also output the value together with piping it, use tee with process substitution:
echo "$x" | tee >( xargs rm )
You can directly use -exec to perform an action on the files that find finds:
find $1 -type f -size +$2k -exec rm {} \;
The -exec action makes find execute the given command once for every match. Within the command, {} stands for the current match, and \; marks the end of the -exec clause.
If you want to perform more than one action, -exec sh -c "..." does the job. For example, here you can both print the names of the files about to be removed... and remove them. Note the f={} trick to store the name of the file so that it can be used later in echo and rm:
find $1 -type f -size +$2k -exec sh -c 'f={}; echo "removing $f"; rm $f' {} \;
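One caveat: splicing {} into the sh -c string breaks on file names containing spaces or quotes. A safer sketch passes the name as a positional parameter instead:
find $1 -type f -size +$2k -exec sh -c 'echo "removing $1"; rm "$1"' sh {} \;
Here the trailing sh becomes $0 inside the sub-shell and the matched file becomes $1.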
In case you want to print a message if no matches were found, you can use wc -l to count the number of matches (if any) and do an if / else condition with it:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
find $1 -type f -size +$2k -exec rm {} \;
else
echo "no matches found"
fi
wc is a command that counts words, lines and bytes (see man wc for more info). wc -l counts the number of lines, so command | wc -l counts the lines printed by command.
Then we use the if [ $(command | wc -l) -ge 1 ] check, which does an integer comparison: if the value is greater than or equal to 1, do what follows; otherwise, do what is in the else branch.
Buuuut the previous approach runs find twice, which is a bit inefficient. And since -exec sh -c opens a sub-shell, we cannot rely on a variable to keep track of the number of files handled. Why? Because a sub-shell cannot assign values to variables of its parent shell.
Instead, let's store the names of the deleted files in a file, and then count its lines:
find . -name "*.txt" -exec sh -c 'f={}; echo "$f" >> /tmp/findtest; rm $f' {} \;
if [ -s /tmp/findtest ]; then #check if the file is empty
echo "file has $(wc -l < /tmp/findtest) lines"
# you can also `cat /tmp/findtest` here to show the deleted files
else
echo "no matches"
fi
Note that you can cat /tmp/findtest to see the deleted files, or use echo "$f" without the redirection to report each file as it is removed. Running rm /tmp/findtest once the process is finished is also a good idea.
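A possible refinement, assuming mktemp is available: a fresh temporary file avoids clobbering an existing /tmp/findtest, and its path can be passed in as a parameter:
log=$(mktemp)
find . -name "*.txt" -exec sh -c 'echo "$1" >> "$2"; rm "$1"' sh {} "$log" \;
[ -s "$log" ] && echo "deleted $(wc -l < "$log") files"
rm "$log"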
You don't need to do all this. You can directly use the find command to get the files over a particular size limit and delete them using xargs.
This should work:
#!/bin/bash
if [ $(find $1 -type f -size +$2k | wc -l) -eq 0 ]; then
    echo "No Files to delete"
else
    echo "Deleting the following files"
    find $1 -type f -size +$2k -exec ls {} \+
    find $1 -type f -size +$2k -exec ls {} \+ | xargs rm -f
    echo "Done"
fi

Can find take a file as argument from stdin?

I have a list of 3,900 ID numbers, and I need to find the matching files on our FTP server.
Finding one file is quite simple, e.g.
find . -name "*IDNumber*" -exec ls '{}' ';' -print
but how do I do this for 3,900 ID numbers? I created a file with the IDs, like so:
028892663163
028923481973
...
but how do I pass the list of ID numbers as an argument? Can you provide some pointers?
Thanks!
I would try to reduce the number of times you have to invoke find:
find . -type f -print | grep -f id.file | xargs cp -t target_dir
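Since the IDs are fixed strings rather than regular expressions, adding -F makes the grep filter both faster and safer (cp -t assumes GNU coreutils):
find . -type f -print | grep -F -f id.file | xargs cp -t target_dir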
You may try to optimize it by running find with more than one ID at a time.
With bash (100 at a time, you may try with more):
c= p=
while IFS= read -r; do
    p+=" -name '*$REPLY*' -o "
    (( ++c ))
    (( c % 100 )) || {
        eval find . ${p% -o }
        p=
    }
done < id_list_all
[[ $p ]] &&
    eval find . ${p% -o }
Figured it out. I put all my 3,900 ID numbers in a file, outfile, then typed this command line:
cat outfile | while read line
do
    find . -name "$line" -exec cp '{}' /target_directory ';' -print
done
Worked awesome!
I read your question wrong the first time... I was thinking of arguments going from find to other things, but what you want is arguments from a file passed to find. So here's the correct answer, with xargs:
xargs --max-args=1 -I X -d '\n' find . -name X -exec [...] < your_list
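For the copy case in the question, a concrete (untested) instance might be, reusing /target_directory from your loop and the substring match from your first example:
xargs --max-args=1 -I X -d '\n' find . -name "*X*" -exec cp {} /target_directory \; < outfile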
