Copy files that mention at least one word from a list - bash

I want to look through 100K+ text files from a directory and copy to another directory only the ones which contain at least one word from a list.
I tried an if statement with grep and cp, but I have no idea how to make it work this way.
for filename in *.txt
do
grep -o -i "cultiv" "protec" "agricult" $filename|wc -w
if [ wc -gt 0 ]
then cp $filename ~/desktop/filepath
fi
done
Obviously this does not work, but I have no idea how to store the wc result, compare it to 0, and act only on those files.
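For comparison, the loop in the question can work without wc at all. grep -q prints nothing and exits with status 0 as soon as it finds a match, and if can test that status directly. (If you do want a number, count=$(grep -c -i -E 'cultiv|protec|agricult' "$filename") followed by [ "$count" -gt 0 ] also works, since grep -c counts matching lines.) A minimal bash sketch; the function name copy_if_mentions is hypothetical, and because it starts one grep per file it will be slower than letting a single grep -l scan many files.

```shell
# Hypothetical helper (bash): copy every *.txt file in the current directory
# that mentions at least one of the words, case-insensitively, into "$1".
copy_if_mentions() {
  local dest=$1 filename
  for filename in *.txt; do
    # If no .txt files exist, the glob stays literal; skip it.
    [ -e "$filename" ] || continue
    # grep -q prints nothing and exits 0 on the first match, so its exit
    # status can drive the if directly; no wc or stored count is needed.
    if grep -q -i -E 'cultiv|protec|agricult' -- "$filename"; then
      cp -- "$filename" "$dest"
    fi
  done
}
```

Usage: run copy_if_mentions ~/desktop/filepath from inside the directory that holds the text files.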

Use the -l option to have grep print all the filenames that match the pattern. Then use xargs to pass these as arguments to cp.
grep -l -E -i 'cultiv|protec|agricult' *.txt | xargs cp -t ~/desktop/filepath --
The -t option is a GNU cp extension; it lets you put the destination directory first, so that the command works with xargs.
If your cp lacks that option (for example, the BSD cp on macOS), use the -J option of BSD xargs to substitute the filenames in the middle of the command.
grep -l -E -i 'cultiv|protec|agricult' *.txt | xargs -J {} cp -- {} ~/desktop/filepath
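With 100K+ files, the *.txt glob itself may expand to a command line that is too long for grep ("Argument list too long"). One workaround is to let find hand the names to grep in batches. A minimal sketch, assuming GNU cp and GNU xargs (for -t and -r); the function name copy_matching_many is hypothetical.

```shell
# Hypothetical helper: copy matching *.txt files from a directory that may
# hold very many files. Assumes GNU cp (-t) and GNU xargs (-r).
copy_matching_many() {
  local src=$1 dest=$2
  # find passes the names to grep in batches (-exec ... {} +), so no single
  # command line exceeds the argument-length limit, unlike a *.txt glob.
  # grep --null ends each printed filename with a NUL byte and xargs -0
  # splits on NULs, so names with spaces or newlines survive intact.
  # xargs -r skips running cp when nothing matched.
  find "$src" -maxdepth 1 -type f -name '*.txt' \
    -exec grep -l --null -i -E 'cultiv|protec|agricult' -- {} + \
    | xargs -0 -r cp -t "$dest" --
}
```

Usage: copy_matching_many . ~/desktop/filepath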

Related

How to copy files found with grep on OSX

I want to copy files I've found with grep on an OS X system, where the cp command doesn't have a -t option.
A previous post's solution for doing something like this relied on the -t flag of cp. Like that poster, though, I want to take the file list I get from grep and then run a command over it, something like:
grep -lr "foo" --include=*.txt * 2>/dev/null | xargs cp -t /path/to/targetdir
Less efficient than cp -t, but this works:
grep -lr "foo" --include=*.txt * 2>/dev/null |
xargs -I{} cp "{}" /path/to/targetdir
Explanation:
For filenames | xargs cp -t destination, xargs changes the incoming filenames into this format:
cp -t destination filename1 ... filenameN
i.e., it only runs cp once (actually, once for every few thousand filenames: xargs splits the list across several cp commands if a single command line would exceed the system's argument-length limit).
For filenames | xargs -I{} cp "{}" destination, on the other hand, xargs changes the incoming filenames into this format:
cp "filename1" destination
...
cp "filenameN" destination
i.e., it runs cp once for each incoming filename, which is much slower. For a large number (e.g., >10k) of very small (e.g., <10 KB) files, I'd guess it could even be thousands of times slower. But it does work :)
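A quick way to see the difference is a dry run: put echo in front of cp so xargs prints the commands instead of running them. A sketch; preview_batched and preview_per_file are hypothetical helper names.

```shell
# Dry-run helpers: echo shows the command xargs would build, without copying.
preview_batched() {
  # All input filenames are appended to one cp command (batched).
  xargs echo cp -t "$1"
}
preview_per_file() {
  # -I{} makes xargs build one cp command per input line.
  xargs -I{} echo cp {} "$1"
}
```

For example, printf '%s\n' a.txt b.txt | preview_batched /tmp/out prints cp -t /tmp/out a.txt b.txt, while the same input piped to preview_per_file /tmp/out prints cp a.txt /tmp/out and cp b.txt /tmp/out on separate lines.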
PS: Another popular technique is to use find's -exec action instead of xargs, e.g., https://stackoverflow.com/a/5241677/1563960
Yet another option is, if you have admin privileges or can persuade your sysadmin, to install the coreutils package as suggested here, and follow the steps but for cp rather than ls.

How to pipe mdfind to grep with a pattern and then cp

I have been trying to come up with an mdfind command to locate certain files. I am not using find because it takes too long to search across a Windows drive, and I am on a Mac. I have indexed the drive with mdutil and now simply want to search for files whose names match a pattern, for example "/Volumes/DRIVE/SOME/PATH/DAD14-BLAH-BLAH.jpg". Is there a simpler way to use mdfind to look for a JPEG larger than 500k and grep the path for a pattern? Below is the code I have come up with, but it returns no results. Any help is deeply appreciated.
cat filelist.txt | while read -r FILE;
do mdfind -onlyin /Volumes/DRIVE/ 'kMDItemKind = "*image" && kMDItemFSSize > 500000' -name "$FILE" -0
| xargs -0 -I{} grep -i -E '.*\/[a-zA-Z]{1,3}[0-9]+.*\.(jpe?g|png|tiff?|psd)' {}
| xargs -0 -I{} cp -a {} ./images; done;
Bass
You don't want to use xargs for the grep command. Doing so means grepping the contents of the found files for matches of the pattern. You want to actually grep the output of mdfind.
That also means you don't want to use -0 with mdfind. You want each file path to be on a separate line, since grep is going to output the matching lines. Therefore, you don't want to use -0 with the final xargs command, either.
You probably also want to require that the extension is at the end of the string, and that the explicit slash (/) in your pattern is the last slash in the string.
while IFS= read -r FILE; do
  mdfind -onlyin /Volumes/DRIVE/ 'kMDItemKind = "*image" && kMDItemFSSize > 500000' -name "$FILE" |
    grep -i -E '.*\/[a-zA-Z]{1,3}[0-9]+[^/]*\.(jpe?g|png|tiff?|psd)$' |
    xargs -I{} cp -a {} ./images   # the pipes end each line; a line may not start with |
done < filelist.txt
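Because mdfind is macOS-only, the grep filter can be checked on its own by feeding it sample paths. A minimal sketch; filter_image_paths is a hypothetical name, and the leading .*\/ from the pattern is shortened to / since an unanchored grep pattern does not need the .* prefix.

```shell
# Hypothetical helper: keep only paths (one per line on stdin) whose final
# component starts with 1-3 letters followed by digits and ends in an image
# extension. [^/]* cannot cross a slash, so the match is in the last component.
filter_image_paths() {
  grep -i -E '/[a-zA-Z]{1,3}[0-9]+[^/]*\.(jpe?g|png|tiff?|psd)$'
}
```

For example, of /Volumes/DRIVE/SOME/PATH/DAD14-BLAH-BLAH.jpg, /Volumes/DRIVE/DAD14/notes.jpg and /Volumes/DRIVE/SOME/PATH/DAD14-BLAH-BLAH.txt, only the first path is printed: in the second, DAD14 is a directory, and the third has the wrong extension.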

How do I use grep to search the current directory for all files having a given string and then move these files to a new folder?

I have managed to do this separately using
grep -r "zone 19" path
mkdir zone19
find . -name "ListOfFilesfromGrep" -exec mv -i {} zone19 \;
I just don't know how to combine the two, that is, how to input the list of files I get from grep into the find command.
You should use grep from within find:
find /path/to/dir -type f -exec grep -q "zone 19" {} \; -exec mv -i {} zone19 \;
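This works because find treats -exec as a test that succeeds when the command exits with status 0, so the mv runs only for files in which grep -q found a match. One catch: if the destination directory lies inside the tree being searched, find may visit it and try to move files that are already there. A minimal sketch that prunes it; move_matching is a hypothetical name, its third argument is a directory name relative to the search root, and the root should be given without a trailing slash so that -path matches.

```shell
# Hypothetical helper: move files under "$1" containing "$2" into "$1/$3".
move_matching() {
  local root=$1 pattern=$2 dest="$1/$3"
  mkdir -p -- "$dest"
  # -path ... -prune skips the destination directory itself, so files that
  # were already moved are not examined again. The first -exec acts as a
  # test: the mv runs only when grep -q exits 0 (a match was found).
  find "$root" -path "$dest" -prune -o -type f \
    -exec grep -q -e "$pattern" {} \; -exec mv -i {} "$dest" \;
}
```

Usage: move_matching . 'zone 19' zone19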
You could try
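grep -lr "zone 19" path | while IFS= read -r f; do mv -i -- "$f" zone19 </dev/tty; done
-l prints the names of the files that contain a match; the while ... done loop moves the files one by one. IFS= and read -r keep leading spaces and backslashes in names intact, and </dev/tty makes mv -i read its answer from the terminal instead of consuming the next filename from the pipe.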
Using GNU versions of the standard tools:
grep -l will give you the filenames.
mv -t will move to a given directory.
xargs -r will invoke a command using arguments from stdin, but only if there's at least one.
Combine them like this:
grep -l -r -e 'zone 19' path | xargs -r mv -i -t 'zone19'
Or, if your filenames might contain newlines etc.:
grep -lZr -e 'zone 19' path | xargs -0r mv -it 'zone19'
You can pipe the result from grep and use xargs:
grep -lr "zone 19" path | xargs <command>
<command> is run with the filenames grep prints as its arguments (xargs passes many of them per invocation). Note that the -o flag tells grep to show only the matching parts.
The command below moves all files containing the string "Hello" to the folder zone19:
grep Hello * |cut -f1 -d":"|sort -u|xargs -I {} mv {} zone19

Scaling up grep find and copy to large folder (xargs?)

I would like to search a directory for any file that matches any of a list of words. If a file matches, I would like to copy that file into a new directory. I created a small batch of test files and got the following code working:
cp `grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation'` '/Users/newlocation'
Unfortunately, when I run this code on a large folder with a few thousand files, it says the argument list is too long for cp. I think I need to loop this or use xargs, but I can't figure out how to make the conversion.
The minimal change from what you have would be:
grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs cp -t '/Users/newlocation'
But don't use that. You never know when you will encounter a filename with spaces or newlines in it, so null-terminated names should be used. On Linux/GNU, add the -Z option to grep and -0 to xargs:
grep -Zlir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 cp -t '/Users/newlocation'
On Macs (and AIX, HP-UX, Solaris, *BSD), the grep options change slightly but, more importantly, the GNU cp -t option is not available. A workaround is:
grep -lir --null 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 -I fname cp fname '/Users/newlocation'
This is less efficient because a new instance of cp has to be run for each file to be copied.
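The per-file cost can be avoided without cp -t by letting a small sh -c wrapper reorder the arguments: xargs appends a batch of filenames after the destination, which sh -c receives as $0, so cp runs once per batch. A minimal sketch; copy_matching_batched is a hypothetical name, and it uses -E with plain | alternation instead of \| so the pattern stays in standard extended-regex syntax. Like the original command, it copies everything into one folder, so files with the same name overwrite each other.

```shell
# Hypothetical helper: recursive, case-insensitive search of "$1"; copy
# matching files into "$2" in batches, without relying on GNU cp -t.
copy_matching_batched() {
  local src=$1 dest=$2
  # grep --null ends each filename with a NUL; xargs -0 splits on NULs.
  # xargs appends a batch of filenames after "$dest", so inside sh -c the
  # destination arrives as $0 and the filenames as "$@". The [ "$#" ... ]
  # guard covers GNU xargs, which runs the command once even on empty input.
  grep -lir --null -E 'word|word2|word3|word4|word5' -- "$src" |
    xargs -0 sh -c '[ "$#" -gt 0 ] || exit 0; cp -- "$@" "$0"' "$dest"
}
```

Usage: copy_matching_batched '/Users/originallocation' '/Users/newlocation'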
An alternative for those without grep -r, using find + grep -E (egrep) + xargs. It assumes no two files in different folders share a name, since they all get copied into the same folder. It also replaces the awkward word\|word2\|word3\|word4\|word5 alternation with extended-regex syntax:
find . -type f -exec grep -E -l 'word|word2|word3|word4|word5' {} + | xargs -I {} cp {} /LARGE_FOLDER

grep a pattern in list of zip files recursively

I am using the following command on the command line to get the matching lines:
find . -name "*.gz"|xargs gzcat|grep -e "pattern1" -e "pattern2"
I now need to find only the names of the files in which the pattern is present.
How can I do that on the command line?
grep -l is no use, since I am running gzcat through xargs before grep.
Check whether zgrep is available. If it is:
find . -name '*.gz' -exec zgrep -l -e ".." -e ".." {} +
If you don't have it, just copy it from a machine that has it (every Linux system I use has it by default); it's a simple shell script.
ripgrep
For example, use ripgrep; it's very efficient, especially for large files:
rg -z -e "pattern1" -e "pattern2" *.gz
or:
rg -z "pattern1|pattern2" .
or:
rg -zf pattern.file .
where pattern.file contains your patterns, one per line.
-z/--search-zip Search in compressed files (such as gz, bz2, xz, and lzma).
for i in $(find . -name "*.gz"); do gzcat "$i" | grep -qe "n1" -e "n2" && echo "$i"; done
Untested. This does everything inside find, so if you have loads of .gz files you won't have performance problems: each gzcat/grep runs as soon as find reaches a file, and nothing is piped out of find:
find . -iname '*.gz' -exec bash -c 'gzcat "$1" | grep -q -e "pattern1" -e "pattern2" && echo "$1"' _ {} \;
In bash, I'd do something like this (untested):
find . -name '*.gz' | while IFS= read -r f; do gzcat "$f" | grep -q -e "pattern1" -e "pattern2" && echo "$f"; done
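A sketch of the same loop that also survives spaces and newlines in filenames, assuming bash (for read -d '' and process substitution). It uses gzip -dc instead of gzcat, because gzcat is not always installed on Linux while gzip -dc works on both GNU and BSD systems. list_gz_matching is a hypothetical name; it takes the search root and one extended regex.

```shell
# Hypothetical helper (bash): print .gz files under "$1" whose decompressed
# contents match the extended regex "$2".
list_gz_matching() {
  local root=$1 pattern=$2 f
  # find -print0 and read -d '' pass names NUL-delimited, so spaces and
  # newlines in filenames survive.
  while IFS= read -r -d '' f; do
    # gzip -dc decompresses to stdout; grep -q stops at the first match.
    if gzip -dc -- "$f" | grep -q -E -- "$pattern"; then
      printf '%s\n' "$f"
    fi
  done < <(find "$root" -type f -name '*.gz' -print0)
}
```

Usage: list_gz_matching . 'pattern1|pattern2'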
grep/zgrep/zegrep
Use zgrep or zegrep to look for pattern in compressed files using their uncompressed contents (both GNU/Linux and BSD/Unix).
On BSD/Unix systems, you can also use grep itself (the BSD version) with -Z; on macOS, -z works as well.
Few examples:
zgrep -E -r "pattern1|pattern2|pattern3" .
zegrep "pattern1|pattern2|pattern3" **/*.gz
grep -z -e "pattern1" -e "pattern2" *.gz # BSD/Unix only.
Note: ** matches files recursively only when recursive globbing is enabled (shopt -s globstar in bash; zsh supports it by default); otherwise, use -r.
-R/-r/--recursive Recursively search subdirectories listed.
-E/--extended-regexp Interpret pattern as an extended regular expression (like egrep).
-Z (BSD), -z/--decompress (BSD/macOS) Force grep to behave as zgrep.
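To use the ** form in bash (version 4 or later), globstar has to be switched on first. A minimal sketch that runs in a subshell so the options do not leak into the calling shell; list_gz_globstar is a hypothetical name. It also enables nullglob so that zgrep is never started without file arguments, since it would otherwise wait on standard input.

```shell
# Hypothetical helper (bash 4+): list .gz files under "$1" whose decompressed
# contents match the extended regex "$2", using ** instead of -r.
list_gz_globstar() (
  # The body is a subshell ( ... ), so shopt and cd only affect this call.
  shopt -s globstar nullglob
  cd -- "$1" || exit 1
  # With globstar, **/*.gz matches .gz files in this directory and below.
  files=(**/*.gz)
  # nullglob leaves the array empty when nothing matches; stop early,
  # because zgrep with no file arguments would read standard input.
  [ "${#files[@]}" -gt 0 ] || exit 0
  zgrep -l -E "$2" "${files[@]}"
)
```

Usage: list_gz_globstar . 'pattern1|pattern2|pattern3' prints matching paths relative to the given directory.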

Resources