How to copy files found with grep on OSX - macos

I'm wanting to copy files I've found with grep on an OSX system, where the cp command doesn't have a -t option.
A previous posts' solution for doing something like this relied on the -t flag in cp. However, like that poster, I want to take the file list I receive from grep and then execute a command over it, something like:
grep -lr "foo" --include=*.txt * 2>/dev/null | xargs cp -t /path/to/targetdir

Less efficient than cp -t, but this works:
grep -lr "foo" --include=*.txt * 2>/dev/null |
xargs -I{} cp "{}" /path/to/targetdir
Explanation:
For filenames | xargs cp -t destination, xargs changes the incoming filenames into this format:
cp -t destination filename1 ... filenameN
i.e., it only runs cp once (actually, once for every few thousand filenames -- xargs breaks the command line up if it would be too long for the shell).
For filenames | xargs -I{} cp "{}" destination, on the other hand, xargs changes the incoming filenames into this format:
cp "filename1" destination
...
cp "filenameN" destination
i.e., it runs cp once for each incoming filename, which is much slower. For a large number (e.g., >10k) of very small (e.g., <10k) files, I'd guess it could even be thousands of times slower. But it does work :)
PS: Another popular technique is use find's exec function instead of xargs, e.g., https://stackoverflow.com/a/5241677/1563960

Yet another option is, if you have admin privileges or can persuade your sysadmin, to install the coreutils package as suggested here, and follow the steps but for cp rather than ls.

Related

Copy files that have at least the mention of one certain word

I want to look through 100K+ text files from a directory and copy to another directory only the ones which contain at least one word from a list.
I tried doing an if statement with grep and cp but I have no idea how to make it to work this way.
for filename in *.txt
do
grep -o -i "cultiv" "protec" "agricult" $filename|wc -w
if [ wc -gt 0 ]
then cp $filename ~/desktop/filepath
fi
done
Obviously this does not work but I have no idea how to store the wc result and then compare it to 0 and only act on those files.
Use the -l option to have grep print all the filenames that match the pattern. Then use xargs to pass these as arguments to cp.
grep -l -E -i 'cultiv|protec|agricult' *.txt | xargs cp -t ~/desktop/filepath --
The -t option is a GNU cp extension, it allows you to put the destination directory first so that it will work with xargs.
If you're using a version without that option, you need to use the -J option to xargs to substitute in the middle of the command.
grep -l -E -i 'cultiv|protec|agricult' *.txt | xargs -J {} cp -- {} ~/desktop/filepath

Scaling up grep find and copy to large folder (xargs?)

I would like to search a directory for any file that matches any of a list of words. If a file matches, I would like to copy that file into a new directory. I created a small batch of test files and got the following code working:
cp `grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation'` '/Users/newlocation'
Unfortunately, when I run this code on a large folder with a few thousand files it says the argument list is too long for cp. I think I need to loop this or use a xargs but I can't figure out how to make the conversion.
The minimal change from what you have would be:
grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs cp -t '/Users/newlocation'
But, don't use that. Because you never know when you will encounter a filename with spaces or newlines in it, null-terminated strings should be used. On linux/GNU, add the -Z option to grep and -0 to xargs:
grep -Zlir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 cp -t '/Users/newlocation'
On Macs (and AIX, HP-UX, Solaris, *BSD), the grep options change slightly but, more importantly, the GNU cp -t option is not available. A workaround is:
grep -lir --null 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 -I fname cp fname '/Users/newlocation'
This is less efficient because a new instance of cp has to be run for each file to be copied.
Alternative solution for those without grep -r. Using find + egrep + xargs , hope there is no file with same file name in different folders. Secondly, I replaced the ugly style of word\|word2\|word3\|word4\|word5
find . -type f -exec egrep -l 'word|word2|word3|word4|word5' {} \; |xargs -i cp {} /LARGE_FOLDER

What is causing these whitespace problems on Mac OS X bash?

I am trying to untar several files at once (I know I can do it differently but I want to make this work because it should work).
So I do:
ls -1 *.gz | xargs tar xf
This produces one command from xargs with all files, and fails. -1 is optional - fails the same way without it.
In fact,
ls -1 *.gz | xargs -t echo
echo host1_logs.tar.gz host2_logs.tar.gz host3_logs.tar.gz host5_logs.tar.gz
host1_logs.tar.gz host2_logs.tar.gz host3_logs.tar.gz host5_logs.tar.gz
I tried unsetting IFS, setting it to newline.
How do I make xargs on Mac OS X actually work?
Bonus question, in light of linebreak/loop problems I had with other commands before I wonder if Mac terminal is just broken and I should replace all utilities with GNU.
Use the -n argument to force xargs to run the given command with only a single argument:
ls -1 *.gz | xargs -n 1 echo
Otherwise, it tries to use each line from the input as a separate argument to the same invocation of the command.
Note that this will fail if any of the matched file names contain newlines, since ls has no way of producing output that distinguishes such names from a sequence of newline-free file names. (That is, there is no option to ls similar to the -print0 argument to find, which is commonly used in pipelines like find ... -print0 | xargs -0 to guard against such file names.)
Your question implies that you realize that you could do something like:
for f in *.gz; do
tar xf "$f"
done
which is unlikely to be noticeably slower than any attempt at using xargs. In each case the process of spawning multiple tar processes is likely to outweigh any differences in looping in bash vs the internal loop in xargs.
Basically you are passing all filenames to tar at once, which is not what you want as you have noticed. The above xargs -n 1 answer is neater, but you could also use the -I flag to run the tar command for each of the arguments (also useful for multi-parameter commands like mvor cp):
ls *.gz | xargs -I {} tar xzvf {}

Bash find filter and copy - trouble with spaces

So after a lot of searching and trying to interpret others' questions and answers to my needs, I decided to ask for myself.
I'm trying to take a directory structure full of images and place all the images (regardless of extension) in a single folder. In addition to this, I want to be able to remove images matching certain filenames in the process. I have a find command working that outputs all the filepaths for me
find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'
but if I try to use that to copy files, I have trouble with the spaces in the filenames.
cp `find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'` out/
What am I doing wrong, and is there a better way to do this?
With the caveat that it won't work if files have newlines in their names:
find . -type f -exec file -i -- {} + |
awk -vFS=: -vOFS=: '$NF ~ /image/{NF--;printf "%s\0", $0}' |
xargs -0 cp -t out/
(Based on answer by Jonathan Leffler and subsequent comments discussion with him and #devnull.)
The find command works well if none of the file names contain any newlines. Within broad limits, the grep command works OK under the same circumstances. The sed command works fine as long as there are no colons in the file names. However, given that there are spaces in the names, the use of $(...) (command substitution, also indicated by back-ticks `...`) is a disaster. Unfortunately, xargs isn't readily a part of the solution; it splits on spaces by default. Because you have to run file and grep in the middle, you can't easily use the -print0 option to (GNU) find and the -0 option to (GNU) xargs.
In some respects, it is crude, but in many ways, it is easiest if you write an executable shell script that can be invoked by find:
#!/bin/bash
for file in "$#"
do
if file -i -- "$file" | grep -i -q "$file:.*image"
then cp "$file" out/
fi
done
This is a little painful in that it invokes file and grep separately for each name, but it is reliable. The file command is even safe if the file name contains a newline; the grep is probably not.
If that script is called 'copyimage.sh', then the find command becomes:
find . -type f -exec ./copyimage.sh {} +
And, given the way the grep command is written, the copyimage.sh file won't be copied, even though its name contains the magic word 'image'.
Pipe the results of your find command to
xargs -l --replace cp "{}" out/
Example of how this works for me on Ubuntu 10.04:
atomic#atomic-desktop:~/temp$ ls
img.png img space.png
atomic#atomic-desktop:~/temp$ mkdir out
atomic#atomic-desktop:~/temp$ find -type f -exec file -i \{\} \; | grep -i image | sed 's/\:.*//' | xargs -l --replace cp -v "{}" out/
`./img.png' -> `out/img.png'
`./img space.png' -> `out/img space.png'
atomic#atomic-desktop:~/temp$ ls out
img.png img space.png
atomic#atomic-desktop:~/temp$

Why ls command combined with xargs and cp move only 10 files?

I have a command that copies file from one dir to another
FILE_COLLECTOR_PATH="/var/www/";
FILE_BACKUP_PATH='/home/'
ls $FILE_COLLECTOR_PATH | head -${1} | xargs -i basename {} | xargs -t -i cp $FILE_COLLECTOR_PATH{} "${FILE_BACKUP_PATH}{}-`date +%F%H%M%S%N`"
I loop it in a shell script like,
#!/bin/sh
SLEEP=120
FILE_COLLECTOR_PATH="/var/www/";
FILE_BACKUP_PATH='/home/'
while true
do
ls $FILE_COLLECTOR_PATH | head -${1} | xargs -i basename {} | xargs -t -i cp $FILE_COLLECTOR_PATH{} "${FILE_BACKUP_PATH}{}-`date +%F%H%M%S%N`"
sleep ${SLEEP}
done
But it seems to move only 10 files and not all files in the dir, Why? It should suppose to move all files.
In general, don't try to parse the output of ls in a script. You can end up with many different types of subtle problems. There is almost always a better tool for the job. Many times, this tool is find. For example, to generate a list of all of the files in a directory and do something to each of them, you would do something like this:
find <search directory> -maxdepth 1 -type f -print0 | xargs -0i basename {} ...
The -print0 and -0 arguments allow find and xargs to communicate filenames in a way that handles special characters (like spaces) correctly.
The find command has other options that you may find useful in a backup script (which is what it appears you are building). Options like -mmin and -newer will enable you to only back up files that have changed since the last iteration.
Try doing
ls -1
instead of just ls, because ls by default don't displays files on a newline (tail expect newlines) for each files when ls -1 does.

Resources