Scaling up grep find and copy to large folder (xargs?) - bash

I would like to search a directory for any file that matches any of a list of words. If a file matches, I would like to copy that file into a new directory. I created a small batch of test files and got the following code working:
cp `grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation'` '/Users/newlocation'
Unfortunately, when I run this code on a large folder with a few thousand files it says the argument list is too long for cp. I think I need to loop this or use a xargs but I can't figure out how to make the conversion.

The minimal change from what you have would be:
grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs cp -t '/Users/newlocation'
But, don't use that. Because you never know when you will encounter a filename with spaces or newlines in it, null-terminated strings should be used. On linux/GNU, add the -Z option to grep and -0 to xargs:
grep -Zlir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 cp -t '/Users/newlocation'
On Macs (and AIX, HP-UX, Solaris, *BSD), the grep options change slightly but, more importantly, the GNU cp -t option is not available. A workaround is:
grep -lir --null 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 -I fname cp fname '/Users/newlocation'
This is less efficient because a new instance of cp has to be run for each file to be copied.

Alternative solution for those without grep -r. Using find + egrep + xargs , hope there is no file with same file name in different folders. Secondly, I replaced the ugly style of word\|word2\|word3\|word4\|word5
find . -type f -exec egrep -l 'word|word2|word3|word4|word5' {} \; |xargs -i cp {} /LARGE_FOLDER

Related

Error - script move files related to name file inside folder

Hi guys i was building a script to order my files related to my studies file, but i don't understand why the prompt give me this error
error 1.1
mv: cannot stat 'filefilefilefilefilefilefilefilefilefilefilefile.pdf'$'\n': File name too long
that's mean i have to rename all long files? exists an other way to prevent this error?
the example below it's the script that has generated the error
Script 1 - move all greped files that contain business inside their name file and move them inside auto_folder_business
mkdir -p /mnt/c/Users/alber/Desktop/testfileorder/auto_folder_business
ls /mnt/c/Users/alber/Desktop/testfileorder | egrep -i 'business.' | xargs -0 -I '{}' mv '{}' /mnt/c/Users/alber/Desktop/testfileorder/auto_folder_business
In the example above I had also this other error
error 1.2
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
that i solved inserting -0 option,
despite this i tried to generalize this process writing this snippet
script 2 - move all greped files that contain the inserted keyword inside their name file and move them inside auto_folder_business
#!/bin/sh
read -p "file to order: --> " fetching_keyword
mypath=/mnt/c/Users/alber/Desktop/testfileorder/auto_folder_$fetching_keyword/
echo $mypath
mkdir -p $mypath
ls /mnt/c/Users/alber/Desktop/testfileorder |
egrep -i "$fetching_keyword" |
xargs -0 -I {} mv -n {} $mypath
also here I have an other error I think they are related
error 2
mv: cannot stat 'Statino (1).pdf'$'\n''Statino (2).pdf'$'\n''Statino (3).pdf'$'\n''Statino (4).pdf'$'\n''Statino.pdf'$'\n''auto_folder_statino'$'\n': No such file or directory
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
I'm not understanding what I'm doing wrong...
xargs -0 expects to read null-terminated lines from standard input. ls and egrep are both writing newline-terminated lines, so their entire output is being read as a single input here.
The quickest fix is to utilize find -print0.
find /mnt/c/Users/alber/Desktop/testfileorder -iname "*${fetching_keyword}*" -print0 | \
xargs -0 -I {} mv -n {} "$mypath"
...but in this specific case, you might just want to use -exec at that point.
find /mnt/c/Users/alber/Desktop/testfileorder -iname "*${fetching_keyword}*" \
-exec mv -n {} "${mypath}" \;

How to copy files found with grep on OSX

I'm wanting to copy files I've found with grep on an OSX system, where the cp command doesn't have a -t option.
A previous posts' solution for doing something like this relied on the -t flag in cp. However, like that poster, I want to take the file list I receive from grep and then execute a command over it, something like:
grep -lr "foo" --include=*.txt * 2>/dev/null | xargs cp -t /path/to/targetdir
Less efficient than cp -t, but this works:
grep -lr "foo" --include=*.txt * 2>/dev/null |
xargs -I{} cp "{}" /path/to/targetdir
Explanation:
For filenames | xargs cp -t destination, xargs changes the incoming filenames into this format:
cp -t destination filename1 ... filenameN
i.e., it only runs cp once (actually, once for every few thousand filenames -- xargs breaks the command line up if it would be too long for the shell).
For filenames | xargs -I{} cp "{}" destination, on the other hand, xargs changes the incoming filenames into this format:
cp "filename1" destination
...
cp "filenameN" destination
i.e., it runs cp once for each incoming filename, which is much slower. For a large number (e.g., >10k) of very small (e.g., <10k) files, I'd guess it could even be thousands of times slower. But it does work :)
PS: Another popular technique is use find's exec function instead of xargs, e.g., https://stackoverflow.com/a/5241677/1563960
Yet another option is, if you have admin privileges or can persuade your sysadmin, to install the coreutils package as suggested here, and follow the steps but for cp rather than ls.

How to find many files from txt file in directory and subdirectories, then copy all to new folder

I can't find posts that help with this exact problem:
On Mac Terminal I want to read a txt file (example.txt) containing file names such as:
20130815 144129 865 000000 0172 0780.bmp
20130815 144221 511 000003 1068 0408.bmp
....100 more
And I want to search for them in a certain folder/subfolders (example_folder). After each find, the file should be copied to a new folder x (new_destination).
Your help would be much appreciated!
Chers,
Mo
You could use a piped command with a combination of ls, grep, xargs and cp.
So basically you start with getting the list of files
ls
then you filter them with egrep -e, grep -e or whatever flavor of grep Mac uses for their terminal. If you want to find all files ending with text you can use the regex .txt$ (which means ends with '.txt')
ls | egrep -e "yourRegexExpression"
After that you get an input stream, but cp doesn't work with input streams and only takes a bunch of arguments, that's why we use xargs to convert it to arguments. The final step is to add the flag -t to the argument to signify that the next argument is the target directory.
ls | egrep -e "yourRegexExpression" | xargs cp -t DIRECTORY
I hope this helps!
Edit
Sorry I didn't read the question well enough, I updated to be match your problem. Here you can see that the egrep command compiles a rather large regex string with all the file names in this way (filename1|filename2|...|fileN). The $() evaluates the command inside and uses the tr to translate newLines to "|" for the regex.
ls | egrep -e "("$(cat yourtextfile.txt | tr "\n" "|")")" | xargs cp -t DIRECTORY
You could do something like:
$ for i in `cat example.txt`
find /search/path -type f -name "$i" -exec cp "{}" /new/path \;
This is how it works, for every line within example.txt:
for i in `cat example.txt`
it will try to find a file matching the line $i in the defined path:
find /search/path -type f -name "$i"
And if found it will copy it to the desired location:
-exec cp "{}" /new/path \;

Changing file content using sed in bash [duplicate]

How do I find and replace every occurrence of:
subdomainA.example.com
with
subdomainB.example.com
in every text file under the /home/www/ directory tree recursively?
find /home/www \( -type d -name .git -prune \) -o -type f -print0 | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
-print0 tells find to print each of the results separated by a null character, rather than a new line. In the unlikely event that your directory has files with newlines in the names, this still lets xargs work on the correct filenames.
\( -type d -name .git -prune \) is an expression which completely skips over all directories named .git. You could easily expand it, if you use SVN or have other folders you want to preserve -- just match against more names. It's roughly equivalent to -not -path .git, but more efficient, because rather than checking every file in the directory, it skips it entirely. The -o after it is required because of how -prune actually works.
For more information, see man find.
The simplest way for me is
grep -rl oldtext . | xargs sed -i 's/oldtext/newtext/g'
Note: Do not run this command on a folder including a git repo - changes to .git could corrupt your git index.
find /home/www/ -type f -exec \
sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
Compared to other answers here, this is simpler than most and uses sed instead of perl, which is what the original question asked for.
All the tricks are almost the same, but I like this one:
find <mydir> -type f -exec sed -i 's/<string1>/<string2>/g' {} +
find <mydir>: look up in the directory.
-type f:
File is of type: regular file
-exec command {} +:
This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending
each selected file name at the end; the total number of invocations of the command will be much less than the number of
matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of
`{}' is allowed within the command. The command is executed in the starting directory.
For me the easiest solution to remember is https://stackoverflow.com/a/2113224/565525, i.e.:
sed -i '' -e 's/subdomainA/subdomainB/g' $(find /home/www/ -type f)
NOTE: -i '' solves OSX problem sed: 1: "...": invalid command code .
NOTE: If there are too many files to process you'll get Argument list too long. The workaround - use find -exec or xargs solution described above.
cd /home/www && find . -type f -print0 |
xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'
For anyone using silver searcher (ag)
ag SearchString -l0 | xargs -0 sed -i 's/SearchString/Replacement/g'
Since ag ignores git/hg/svn file/folders by default, this is safe to run inside a repository.
This one is compatible with git repositories, and a bit simpler:
Linux:
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'
Mac:
git grep -l 'original_text' | xargs sed -i '' -e 's/original_text/new_text/g'
(Thanks to http://blog.jasonmeridth.com/posts/use-git-grep-to-replace-strings-in-files-in-your-git-repository/)
To cut down on files to recursively sed through, you could grep for your string instance:
grep -rl <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g
If you run man grep you'll notice you can also define an --exlude-dir="*.git" flag if you want to omit searching through .git directories, avoiding git index issues as others have politely pointed out.
Leading you to:
grep -rl --exclude-dir="*.git" <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g
A straight forward method if you need to exclude directories (--exclude-dir=..folder) and also might have file names with spaces (solved by using 0Byte for both grep -Z and xargs -0)
grep -rlZ oldtext . --exclude-dir=.folder | xargs -0 sed -i 's/oldtext/newtext/g'
An one nice oneliner as an extra. Using git grep.
git grep -lz 'subdomainA.example.com' | xargs -0 perl -i'' -pE "s/subdomainA.example.com/subdomainB.example.com/g"
Simplest way to replace (all files, directory, recursive)
find . -type f -not -path '*/\.*' -exec sed -i 's/foo/bar/g' {} +
Note: Sometimes you might need to ignore some hidden files i.e. .git, you can use above command.
If you want to include hidden files use,
find . -type f -exec sed -i 's/foo/bar/g' {} +
In both case the string foo will be replaced with new string bar
find /home/www/ -type f -exec perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
find /home/www/ -type f will list all files in /home/www/ (and its subdirectories).
The "-exec" flag tells find to run the following command on each file found.
perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
is the command run on the files (many at a time). The {} gets replaced by file names.
The + at the end of the command tells find to build one command for many filenames.
Per the find man page:
"The command line is built in much the same way that
xargs builds its command lines."
Thus it's possible to achieve your goal (and handle filenames containing spaces) without using xargs -0, or -print0.
I just needed this and was not happy with the speed of the available examples. So I came up with my own:
cd /var/www && ack-grep -l --print0 subdomainA.example.com | xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'
Ack-grep is very efficient on finding relevant files. This command replaced ~145 000 files with a breeze whereas others took so long I couldn't wait until they finish.
or use the blazing fast GNU Parallel:
grep -rl oldtext . | parallel sed -i 's/oldtext/newtext/g' {}
grep -lr 'subdomainA.example.com' | while read file; do sed -i "s/subdomainA.example.com/subdomainB.example.com/g" "$file"; done
I guess most people don't know that they can pipe something into a "while read file" and it avoids those nasty -print0 args, while presevering spaces in filenames.
Further adding an echo before the sed allows you to see what files will change before actually doing it.
Try this:
sed -i 's/subdomainA/subdomainB/g' `grep -ril 'subdomainA' *`
According to this blog post:
find . -type f | xargs perl -pi -e 's/oldtext/newtext/g;'
#!/usr/local/bin/bash -x
find * /home/www -type f | while read files
do
sedtest=$(sed -n '/^/,/$/p' "${files}" | sed -n '/subdomainA/p')
if [ "${sedtest}" ]
then
sed s'/subdomainA/subdomainB/'g "${files}" > "${files}".tmp
mv "${files}".tmp "${files}"
fi
done
If you do not mind using vim together with grep or find tools, you could follow up the answer given by user Gert in this link --> How to do a text replacement in a big folder hierarchy?.
Here's the deal:
recursively grep for the string that you want to replace in a certain path, and take only the complete path of the matching file. (that would be the $(grep 'string' 'pathname' -Rl).
(optional) if you want to make a pre-backup of those files on centralized directory maybe you can use this also: cp -iv $(grep 'string' 'pathname' -Rl) 'centralized-directory-pathname'
after that you can edit/replace at will in vim following a scheme similar to the one provided on the link given:
:bufdo %s#string#replacement#gc | update
You can use awk to solve this as below,
for file in `find /home/www -type f`
do
awk '{gsub(/subdomainA.example.com/,"subdomainB.example.com"); print $0;}' $file > ./tempFile && mv ./tempFile $file;
done
hope this will help you !!!
For replace all occurrences in a git repository you can use:
git ls-files -z | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
See List files in local git repo? for other options to list all files in a repository. The -z options tells git to separate the file names with a zero byte, which assures that xargs (with the option -0) can separate filenames, even if they contain spaces or whatnot.
A bit old school but this worked on OS X.
There are few trickeries:
• Will only edit files with extension .sls under the current directory
• . must be escaped to ensure sed does not evaluate them as "any character"
• , is used as the sed delimiter instead of the usual /
Also note this is to edit a Jinja template to pass a variable in the path of an import (but this is off topic).
First, verify your sed command does what you want (this will only print the changes to stdout, it will not change the files):
for file in $(find . -name *.sls -type f); do echo -e "\n$file: "; sed 's,foo\.bar,foo/bar/\"+baz+\"/,g' $file; done
Edit the sed command as needed, once you are ready to make changes:
for file in $(find . -name *.sls -type f); do echo -e "\n$file: "; sed -i '' 's,foo\.bar,foo/bar/\"+baz+\"/,g' $file; done
Note the -i '' in the sed command, I did not want to create a backup of the original files (as explained in In-place edits with sed on OS X or in Robert Lujo's comment in this page).
Happy seding folks!
just to avoid to change also
NearlysubdomainA.example.com
subdomainA.example.comp.other
but still
subdomainA.example.com.IsIt.good
(maybe not good in the idea behind domain root)
find /home/www/ -type f -exec sed -i 's/\bsubdomainA\.example\.com\b/\1subdomainB.example.com\2/g' {} \;
Here's a version that should be more general than most; it doesn't require find (using du instead), for instance. It does require xargs, which are only found in some versions of Plan 9 (like 9front).
du -a | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'
If you want to add filters like file extensions use grep:
du -a | grep "\.scala$" | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'
For Qshell (qsh) on IBMi, not bash as tagged by OP.
Limitations of qsh commands:
find does not have the -print0 option
xargs does not have -0 option
sed does not have -i option
Thus the solution in qsh:
PATH='your/path/here'
SEARCH=\'subdomainA.example.com\'
REPLACE=\'subdomainB.example.com\'
for file in $( find ${PATH} -P -type f ); do
TEMP_FILE=${file}.${RANDOM}.temp_file
if [ ! -e ${TEMP_FILE} ]; then
touch -C 819 ${TEMP_FILE}
sed -e 's/'$SEARCH'/'$REPLACE'/g' \
< ${file} > ${TEMP_FILE}
mv ${TEMP_FILE} ${file}
fi
done
Caveats:
Solution excludes error handling
Not Bash as tagged by OP
If you wanted to use this without completely destroying your SVN repository, you can tell 'find' to ignore all hidden files by doing:
find . \( ! -regex '.*/\..*' \) -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'
Using combination of grep and sed
for pp in $(grep -Rl looking_for_string)
do
sed -i 's/looking_for_string/something_other/g' "${pp}"
done
perl -p -i -e 's/oldthing/new_thingy/g' `grep -ril oldthing *`
to change multiple files (and saving a backup as *.bak):
perl -p -i -e "s/\|/x/g" *
will take all files in directory and replace | with x
called a “Perl pie” (easy as a pie)

Bash find filter and copy - trouble with spaces

So after a lot of searching and trying to interpret others' questions and answers to my needs, I decided to ask for myself.
I'm trying to take a directory structure full of images and place all the images (regardless of extension) in a single folder. In addition to this, I want to be able to remove images matching certain filenames in the process. I have a find command working that outputs all the filepaths for me
find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'
but if I try to use that to copy files, I have trouble with the spaces in the filenames.
cp `find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'` out/
What am I doing wrong, and is there a better way to do this?
With the caveat that it won't work if files have newlines in their names:
find . -type f -exec file -i -- {} + |
awk -vFS=: -vOFS=: '$NF ~ /image/{NF--;printf "%s\0", $0}' |
xargs -0 cp -t out/
(Based on answer by Jonathan Leffler and subsequent comments discussion with him and #devnull.)
The find command works well if none of the file names contain any newlines. Within broad limits, the grep command works OK under the same circumstances. The sed command works fine as long as there are no colons in the file names. However, given that there are spaces in the names, the use of $(...) (command substitution, also indicated by back-ticks `...`) is a disaster. Unfortunately, xargs isn't readily a part of the solution; it splits on spaces by default. Because you have to run file and grep in the middle, you can't easily use the -print0 option to (GNU) find and the -0 option to (GNU) xargs.
In some respects, it is crude, but in many ways, it is easiest if you write an executable shell script that can be invoked by find:
#!/bin/bash
for file in "$#"
do
if file -i -- "$file" | grep -i -q "$file:.*image"
then cp "$file" out/
fi
done
This is a little painful in that it invokes file and grep separately for each name, but it is reliable. The file command is even safe if the file name contains a newline; the grep is probably not.
If that script is called 'copyimage.sh', then the find command becomes:
find . -type f -exec ./copyimage.sh {} +
And, given the way the grep command is written, the copyimage.sh file won't be copied, even though its name contains the magic word 'image'.
Pipe the results of your find command to
xargs -l --replace cp "{}" out/
Example of how this works for me on Ubuntu 10.04:
atomic#atomic-desktop:~/temp$ ls
img.png img space.png
atomic#atomic-desktop:~/temp$ mkdir out
atomic#atomic-desktop:~/temp$ find -type f -exec file -i \{\} \; | grep -i image | sed 's/\:.*//' | xargs -l --replace cp -v "{}" out/
`./img.png' -> `out/img.png'
`./img space.png' -> `out/img space.png'
atomic#atomic-desktop:~/temp$ ls out
img.png img space.png
atomic#atomic-desktop:~/temp$

Resources