Execute a loop in bash over keywords - bash

I have this script; it searches my mail files and, if a keyword is found, moves the files to another location.
How can I make it work for multiple keywords? For example, I might have 11 KEYs, and I would not want to copy and paste the find command over and over.
DIRF='move/from'
DIRT='move/to'
KEY='discount'
find $DIRF -type f -exec grep -ilR "$KEY" {} \; | xargs -I % mv % $DIRT

Why are you using find here at all?
You are already telling grep to operate recursively (-R), so just point it at $DIRF and be done. -R is also pointless if you only ever give it individual files (from find's -type f).
Also grep takes a pattern that can do alternation. Just use that.
grep -RilE 'KEY1|KEY2|KEY3|KEY4' "$DIRF"
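If you then want to complete the original task and move the matches, a minimal sketch (assuming GNU grep and xargs: -Z makes grep separate the file names with NUL bytes so paths containing spaces survive the pipe):
# List files matching any keyword, NUL-separated, then move them.
grep -RilEZ 'KEY1|KEY2|KEY3|KEY4' "$DIRF" | xargs -0 -I % mv % "$DIRT"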

for KEY in "discount" "other_value" "other_value2"
do
    find "$DIRF" -type f -exec grep -il "$KEY" {} \; | xargs -I % mv % "$DIRT"
done
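Alternatively, to avoid rescanning the tree once per keyword, you can join the keys into a single alternation pattern first. A sketch, assuming bash and that none of the keys contain regex metacharacters:
KEYS=(discount other_value other_value2)
# Join the array elements with '|' to form one extended-regex alternation.
PATTERN=$(IFS='|'; echo "${KEYS[*]}")
grep -RilE "$PATTERN" "$DIRF" | xargs -I % mv % "$DIRT"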

Related

Shell command to list file names where the match is found

I am trying to search through a list of binary files to find some keywords on Mac.
The following works to list all the matches, but it doesn't show me which files they were found in:
find . -type f -exec strings {} \; | grep "Bv9qtsZRgspQliITY4"
Is there any trick to do this?
Using -exec with a wee ‘script’:
find . -type f \
-exec sh -c 'strings "$1" | grep -q "Bv9qtsZRgspQliITY4"' -- {} \; \
-print
The above will print the paths of all the matching files. If you also want to print the matches you can use:
find . -type f \
-exec sh -c 'strings "$1" | grep "Bv9qtsZRgspQliITY4"' -- {} \; \
-print
This will, however, print the paths after the matches. If this is not desirable, then you can use:
find . -type f \
-print \
-exec sh -c 'strings "$1" | grep "Bv9qtsZRgspQliITY4"' -- {} \;
This, on the other hand, will print all paths, even non-matching ones. To print only matching paths and their matches:
find . -type f \
-exec sh -c 'strings "$1" | grep -q "Bv9qtsZRgspQliITY4"' -- {} \; \
-print \
-exec grep "Bv9qtsZRgspQliITY4" {} \;
This will run grep twice on matching files, which will make it slower. If this is a problem, the matches can be stored in a variable, and if there are any, the path printed first and then the matches. This is left as an exercise for the reader.*
* Let me know if I should post it here.
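For reference, a minimal sketch of that exercise (same pattern as above; it relies on the assignment taking grep's exit status, so printf only runs for matching files):
find . -type f -exec sh -c '
    matches=$(strings "$1" | grep "Bv9qtsZRgspQliITY4") &&
    printf "%s\n%s\n" "$1" "$matches"
' -- {} \;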
Try:
grep -rl "Bv9qtsZRgspQliITY4" .
Explanation of options:
-r: search recursively
-l: don't print the contents of the file, just print the filename.
Optionally, you might want to use -i to search case-insensitively.
The problem with your idea is that you're piping the output of strings into grep. The filename is only passed to strings, meaning that nothing that comes after strings knows the filename.
I'm not quite sure about portability, but if you are using GNU's version of grep, then you can use --files-with-matches
-l, --files-with-matches print only names of FILEs containing matches
Then you can use something like this:
grep --recursive --files-with-matches "Bv9qtsZRgspQliITY4" *
Well, if it's only to print the names of the files, don't use find but grep:
grep -arn . -e 'soloman'
./testo.txt:1:soloman
-a : Search in binary files
-r : recursive
-n : show the line number of each match
And keep it simple.
If you don't want to see the words matched in your output simply add -l, --files-with-matches:
user@DESKTOP-RR909JI ~/projects/search
$ grep -arl . -e 'soloman'
./testo.txt
You can use
# this will list all the files containing given text in current directory
# i to ignore case
# l to list files with matches
# R to read and process all files in that directory, recursively, following all symbolic links
grep -iRl "your-text-to-find" ./
# for case sensitive search
grep -Rl "your-text-to-find" ./

How to grep a large number of files?

I am trying to grep 40k files in the current directory and I am getting this error:
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like other programming languages: much of the interpretation of a line is done by the shell, which expands it before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes the result to the command as arguments. The line prefixed with + is the command that is actually being executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the echo command actually execute.
When you have 40K-plus files and you do grep *, you're expanding that * to the names of those 40,000-plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long comes from.
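You can see the limit on your system with a quick check (POSIX getconf reports the kernel's maximum argument-list size in bytes):
$ getconf ARG_MAX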
Fortunately, Unix has a way around this dilemma:
$ find . -maxdepth 1 -name "*.kaks" -type f | xargs grep -f A01/genes.txt
The find . -maxdepth 1 -name "*.kaks" -type f will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory (-maxdepth is a global option, so GNU find wants it before the other tests). The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs appends those names to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass another batch of files to grep. This way, grep gets executed maybe three or ten times (depending on the size of the command line buffer), and all of the files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -maxdepth 1 -name "*.kaks" -type f -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this too:
$ find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} \;
This will execute grep once for each and every file found, instead of, as xargs does, running grep once per batch of files it can fit on the command line. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
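For completeness, find's + terminator (POSIX) batches arguments much the way xargs does, so you can get the same batching behaviour without the pipe:
$ find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} +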
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
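The xargs variant to race it against would look like this (again assuming GNU find and xargs for -print0/-0):
$ time find . -maxdepth 1 -name "*.kaks" -type f -print0 | xargs -0 grep -f A01/genes.txt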
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists there one or more times? If it is "good enough" to know the string occurs at least once, you can specify -m 1 to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
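Applied to the inner loop above, that would look like this (a sketch; -m is supported by GNU and BSD grep):
for f in *.kaks; do
    grep -H -m 1 "$i" "$f"
done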
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

Bash find filter and copy - trouble with spaces

So after a lot of searching and trying to interpret others' questions and answers to my needs, I decided to ask for myself.
I'm trying to take a directory structure full of images and place all the images (regardless of extension) in a single folder. In addition to this, I want to be able to remove images matching certain filenames in the process. I have a find command working that outputs all the filepaths for me
find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'
but if I try to use that to copy files, I have trouble with the spaces in the filenames.
cp `find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'` out/
What am I doing wrong, and is there a better way to do this?
With the caveat that it won't work if files have newlines in their names:
find . -type f -exec file -i -- {} + |
awk -vFS=: -vOFS=: '$NF ~ /image/{NF--;printf "%s\0", $0}' |
xargs -0 cp -t out/
(Based on an answer by Jonathan Leffler and subsequent comment discussion with him and @devnull.)
The find command works well if none of the file names contain any newlines. Within broad limits, the grep command works OK under the same circumstances. The sed command works fine as long as there are no colons in the file names. However, given that there are spaces in the names, the use of $(...) (command substitution, also indicated by back-ticks `...`) is a disaster. Unfortunately, xargs isn't readily a part of the solution; it splits on spaces by default. Because you have to run file and grep in the middle, you can't easily use the -print0 option to (GNU) find and the -0 option to (GNU) xargs.
In some respects, it is crude, but in many ways, it is easiest if you write an executable shell script that can be invoked by find:
#!/bin/bash
for file in "$@"
do
if file -i -- "$file" | grep -i -q "$file:.*image"
then cp "$file" out/
fi
done
This is a little painful in that it invokes file and grep separately for each name, but it is reliable. The file command is even safe if the file name contains a newline; the grep is probably not.
If that script is called 'copyimage.sh', then the find command becomes:
find . -type f -exec ./copyimage.sh {} +
And, given the way the grep command is written, the copyimage.sh file won't be copied, even though its name contains the magic word 'image'.
Pipe the results of your find command to
xargs -l --replace cp "{}" out/
Example of how this works for me on Ubuntu 10.04:
atomic@atomic-desktop:~/temp$ ls
img.png img space.png
atomic@atomic-desktop:~/temp$ mkdir out
atomic@atomic-desktop:~/temp$ find -type f -exec file -i \{\} \; | grep -i image | sed 's/\:.*//' | xargs -l --replace cp -v "{}" out/
`./img.png' -> `out/img.png'
`./img space.png' -> `out/img space.png'
atomic@atomic-desktop:~/temp$ ls out
img.png img space.png
atomic@atomic-desktop:~/temp$

Using find and xargs, how can I stop execution on errors without crapping out

In my script I have the following 3 commands
Basically, what it is trying to do is:
create symlinks to a certain bunch of files, based on their filenames, in a temp directory
change the names of the symlinks to match the current date
move the symlinks from the temp directory to their proper location
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
mv $DataTempPath/* $DataSetPath
This will be inserted as a cron job to run every 15 minutes, which is not a problem when the source directory contains valid data.
However, when it doesn't contain any files, I get errors from the second find command and from the mv command.
What I want, I guess, is a way of not executing the last two lines of the script if the first one does not create any new links.
GNU xargs supports a --no-run-if-empty parameter that, to quote the documentation "If the standard input is completely empty, do not run the command. By default, the command is run once even if there is no input".
This should help avoid the xargs error (assuming you are running GNU xargs)
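Applied to your script, that would be (the short form of --no-run-if-empty is -r):
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -r -n2 mv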
Check the status of the command:
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
if [[ $? == 0 ]]; then
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
mv $DataTempPath/* $DataSetPath
fi
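Note that find exits 0 even when it matches nothing, so the status check above will not actually catch the empty case. A sketch of an alternative (assuming GNU find for -print -quit): test whether any symlinks actually landed in the temp directory before proceeding:
if [ -n "$(find "$DataTempPath" -type l -print -quit)" ]; then
    find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
    mv "$DataTempPath"/* "$DataSetPath"
fi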

More elegant use of find for passing files grouped by directory?

This script has taken me too long (!!) to compile, but I finally have a reasonably nice script which does what I want:
find "$#" -type d -print0 | while IFS= read -r -d $'\0' dir; do
find "$dir" -iname '*.flac' -maxdepth 1 ! -exec bash -c '
metaflac --list --block-type=VORBIS_COMMENT "$0" 2>/dev/null | grep -i "REPLAYGAIN_ALBUM_PEAK" &>/dev/null
exit $?
' {} ';' -exec bash -c '
echo Adding ReplayGain tags to "$0"/\*.flac...
metaflac --add-replay-gain "${@:1}"
' "$dir" {} '+'
done
The purpose is to search the file tree for directories containing FLAC files, test whether any are missing the REPLAYGAIN_ALBUM_PEAK tag, and scan all the files in that directory for ReplayGain if they are missing.
The big stumbling block is that all the FLAC files for a given album must be passed to metaflac as one command, otherwise metaflac doesn't know they're all one album. As you can see, I've achieved this using find ... -exec ... +.
What I'm wondering is if there's a more elegant way to do this. In particular, how can I skip the while loop? Surely this should be unnecessary, because find is already iterating over the directories?
You can probably use xargs to achieve it.
For example, if you are looking for the text foo in all your files, you'd have something like
find . -type f | xargs grep foo
xargs passes each result from the left-hand expression (find) to the command invoked on the right.
Then, if no existing command achieves what you want, you can always create a function and pass it to xargs.
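A sketch of that function approach (assuming bash; export -f makes the function visible to the shell that xargs spawns, and process_one is a hypothetical name):
process_one() {
    # Per-file work goes here; "$1" is one path from find.
    echo "processing: $1"
}
export -f process_one
find . -type f -print0 | xargs -0 -I {} bash -c 'process_one "$1"' -- {}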
I can't comment on the flac commands themselves, but as for the rest:
find . -name '*.flac' \
! -exec bash -c 'metaflac --list --block-type=VORBIS_COMMENT "$1" | grep -qi "REPLAYGAIN_ALBUM_PEAK"' -- {} \; \
-execdir bash -c 'metaflac --add-replay-gain *.flac' \;
You just find the relevant files, and then treat the directory each one is in.
