Copy all files with a certain extension from all subdirectories - bash

On Unix, I want to copy all files with a certain extension (all Excel files) from all subdirectories to another directory. I have the following command:
cp --parents `find -name \*.xls*` /target_directory/
The problems with this command are:
It copies the directory structure as well, and I only want the files (so all files should end up in /target_directory/)
It does not copy files with spaces in the filenames (which are quite a few)
Any solutions for these problems?

--parents is copying the directory structure, so you should get rid of that.
As you've written it, find runs first and its output is placed on the command line, where the shell splits it on whitespace, so cp can't distinguish the spaces separating the filenames from the spaces within a filename. It's better to do something like
$ find . -name \*.xls -exec cp {} newDir \;
in which cp is executed once for each file that find finds and is passed the filename intact, spaces included.
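Here is a minimal, self-contained sketch of the same technique, assuming a POSIX sh and find. The src/ and target_directory/ tree and the file names are made up for the demo. Ending -exec with + instead of \; lets find hand many paths at once to a small sh -c wrapper, which moves the destination argument out of the way before calling cp. Filenames with spaces stay intact, and cp is not started once per file.

```bash
# Throwaway demo tree (hypothetical names), including a filename with spaces.
work=$(mktemp -d)
src=$work/src
dest=$work/target_directory
mkdir -p "$src/q1" "$src/q2/deep" "$dest"
: > "$src/q1/budget.xls"
: > "$src/q2/deep/sales report.xlsx"
: > "$src/q2/notes.txt"

# With "{} +", find appends as many paths as fit on one command line.
# The wrapper receives the destination as $1, drops it, and passes the rest to cp.
find "$src" -type f -name '*.xls*' -exec sh -c '
  dest=$1
  shift
  cp -- "$@" "$dest"
' sh "$dest" {} +

ls "$dest"
```

Afterwards, target_directory should contain budget.xls and sales report.xlsx, with no subdirectories and no notes.txt. With GNU cp, -exec cp -t "$dest" {} + is a shorter way to get the same effect without the wrapper.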
Instead of all the above, you could use zsh and simply type
$ cp **/*.xls target_directory
zsh can expand wildcards to include subdirectories and makes this sort of thing very easy.

From all of the above, I came up with this version.
This version also works for me in the macOS Recovery terminal.
find ./ -name '*.xls*' -exec cp -prv '{}' '/path/to/targetDir/' ';'
It looks in the current directory and recursively in all of its subdirectories for Excel files (*.xls*), and copies them all into the target directory.
cp flags are:
p - preserve attributes of the file (mode, ownership, timestamps)
r - recursive (only matters for directories; it has no effect on regular files)
v - verbose (shows you what's being copied)

I had a similar problem. I solved it using:
find dir_name -name '*.mp3' -exec cp -vuni '{}' "../dest_dir" ";"
find replaces '{}' with each matched path, and ";" ends the -exec command, so cp runs once for each file.

I also had to do this myself. I did it via the --parents argument for cp (a GNU extension, which recreates each file's directory path under DESTPATH):
find SOURCEPATH -name 'filename*.txt' -exec cp --parents {} DESTPATH \;

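In Bash 4.0 and later, the zsh solution also works, but only after enabling the globstar option (without it, ** behaves like a single * and matches just one directory level):
shopt -s globstar
cp **/*.extension /dest/dir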
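Here is a sketch of the Bash version, assuming Bash 4.0 or later; the throwaway src/ and dest/ directories are hypothetical. It turns on globstar so that ** recurses, and nullglob so that an empty match does not pass the literal pattern to cp. It also collects the matches in an array so that names with spaces are kept whole.

```bash
# Needs bash 4.0 or later: globstar makes ** match across directory levels,
# nullglob makes a pattern with no matches expand to nothing.
shopt -s globstar nullglob

work=$(mktemp -d)
mkdir -p "$work/src/a/b" "$work/dest"
: > "$work/src/top.xls"
: > "$work/src/a/b/deep file.xls"

(
  cd "$work/src" || exit 1
  files=(**/*.xls)            # matches top.xls as well as a/b/deep file.xls
  if (( ${#files[@]} > 0 )); then
    cp -- "${files[@]}" "$work/dest"
  fi
)
```

All matches still end up on a single cp command line, so a very large tree can hit the same "argument list too long" limit as any other glob. find with -exec ... + avoids that.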

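find [SOURCEPATH] -type f -name '[PATTERN]' |
while IFS= read -r P; do cp --parents "$P" [DEST]; done
You may remove --parents, but then there is a risk of collision: files with the same name in different directories will overwrite each other. (Reading names line by line still breaks on filenames that contain newlines.)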
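If you do want a flat copy without --parents, one way to avoid collisions is to pick a new name when the target already exists. This sketch assumes a POSIX sh and find. The name_1.ext, name_2.ext suffix scheme and the demo directories are arbitrary choices for illustration. Because the existence check and the copy are separate steps, it is not safe if another process writes to the destination at the same time.

```bash
work=$(mktemp -d)
src=$work/src
dest=$work/flat
mkdir -p "$src/a" "$src/b" "$src/c d" "$dest"
echo a > "$src/a/report.xls"
echo b > "$src/b/report.xls"
echo c > "$src/c d/my file.xlsx"

find "$src" -type f -name '*.xls*' -exec sh -c '
  dest=$1
  shift
  for f in "$@"; do
    base=${f##*/}
    # Split "name.ext" so a counter can go before the extension.
    case $base in
      *.*) stem=${base%.*} ext=.${base##*.} ;;
      *)   stem=$base ext= ;;
    esac
    target=$dest/$base
    n=1
    # Try name_1.ext, name_2.ext, ... until the name is unused.
    while [ -e "$target" ]; do
      target=$dest/${stem}_$n$ext
      n=$((n + 1))
    done
    cp -- "$f" "$target"
  done
' sh "$dest" {} +
```

Both report.xls files survive, as report.xls and report_1.xls. Which one gets the suffix depends on the order in which find visits them.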

On macOS Ventura 13.1, in zsh, I saw the following error when there were too many files to copy (the expanded glob exceeded the system's limit on the total size of a command's arguments, ARG_MAX):
zsh: argument list too long: cp
I had to use find together with cp to get the files copied to my destination:
find ./module/*/src -name \*.java -print | while IFS= read -r filelocation; do cp "$filelocation" mydestinationlocation; done
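Here is a variant of that pipeline that also copes with unusual names and stays under the argument-length limit. It assumes find -print0 and xargs -0, which are available in GNU and BSD/macOS find and xargs. The module/*/src layout mirrors the command above, but the files and the dest/ directory are made up for the demo. xargs splits the list into as many cp invocations as needed.

```bash
work=$(mktemp -d)
mkdir -p "$work/module/app/src/com" "$work/module/lib/src" "$work/dest"
: > "$work/module/app/src/com/Main.java"
: > "$work/module/lib/src/Helper Util.java"

# -print0 and xargs -0 delimit names with NUL bytes, so spaces and newlines survive.
# xargs starts as many sh processes as needed to stay under the system limit.
find "$work"/module/*/src -type f -name '*.java' -print0 |
  xargs -0 sh -c '
    dest=$1
    shift
    [ "$#" -gt 0 ] || exit 0   # GNU xargs runs the command once even with no input
    cp -- "$@" "$dest"
  ' sh "$work/dest"
```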

Related

Bash: CP will consider file with extension as duplicate as file without it

I'm running
cp -dR "${SOURCE_DIR}" "${OUTPUT_DIR}"
And there's one directory that contains both a file named something and one named something.exe.
cp is failing because it considers both of them the same file. Can this be forced somehow?
You can use this find-based alternative:
find source/ -type d -exec mkdir target/{} \; -o -type f -exec cp -d {} target/{} \;
It recurses over the contents of the source/ directory, using mkdir to create the directories it encounters under the target directory and cp to copy the files one by one. Note that target/ must already exist, and the copies end up under target/source/, because the paths find prints include the starting point.
I expect this to be quite a bit slower than your original cp -R. If you have the rsync binary available (it is not included with Git Bash by default, AFAIK), you should give it a try: it might not have the same unfortunate interaction with Git Bash's .exe handling that you found with cp -R, and it should be faster than my solution.
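Here is a sketch of the same file-by-file approach for a POSIX sh, with made-up demo contents. It strips the source prefix itself, so files land directly under target/ instead of target/source/. It also passes paths to sh as positional parameters rather than embedding {} inside a longer argument, a case where POSIX leaves the substitution implementation-defined. It is not tested under Git Bash, so whether it avoids the .exe aliasing there remains an open question.

```bash
work=$(mktemp -d)
src=$work/source
dst=$work/target
mkdir -p "$src/bin/sub" "$dst"
: > "$src/bin/something"
: > "$src/bin/something.exe"
: > "$src/bin/sub/readme"

# Walk source/ in pre-order (a directory before its contents), recreate each
# directory under target/, and copy each regular file to the matching path.
find "$src" -exec sh -c '
  src=$1 dst=$2
  shift 2
  for p in "$@"; do
    rel=${p#"$src"}
    rel=${rel#/}
    [ -n "$rel" ] || continue        # skip the starting directory itself
    if [ -d "$p" ]; then
      mkdir -p "$dst/$rel"
    elif [ -f "$p" ]; then
      cp -- "$p" "$dst/$rel"
    fi
  done
' sh "$src" "$dst" {} +
```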

Running a bash find with file cp parameter error python script

I'd like to copy a file_list to another location. This is being called in a python script. I have
find <sourceaddress> -exec cp '{}' <destaddress> | .* rm
but it tells me an exact parameter is missing. It runs though it gives a prompt from the command line and from the script just does nothing.
I think you are missing "\;" at the end. I am not sure what the .* rm does; assuming you want to remove the files after copying them, you can use mv instead of cp.
To copy only the files from one directory to another:
find <srcdirectory> -type f -exec cp '{}' <destdirectory> \;
If you want to move the files instead, use mv:
find <srcdirectory> -type f -exec mv '{}' <destdirectory> \;
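Here is a small demo of the move variant in a throwaway directory, assuming POSIX find and mv. It uses -type f so that only regular files are moved. Without that test, the first path find reports is the starting directory itself, and mv would move the whole source tree into the destination. Files from subdirectories are flattened into dest/, and the (now empty) subdirectories stay behind.

```bash
work=$(mktemp -d)
mkdir -p "$work/src/nested" "$work/dest"
: > "$work/src/one.txt"
: > "$work/src/nested/two.txt"

# Only regular files are passed to mv; directories, including src itself, are left in place.
find "$work/src" -type f -exec mv -- {} "$work/dest" \;
```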

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -r, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
This needs GNU cp and zsh: (.) is a zsh glob qualifier that matches only regular files, and in GNU cp, cp --parents dir1/file1 dir2 copies file1 to dir2/dir1/file1.
t=/newpath;for d in *searchstring*/;do mkdir -p "$t/$d";cp "$d"* "$t/$d";done
find *searchstring*/ -maxdepth 1 -type f -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -maxdepth 2 -ipath '*searchstring*/*' -type f -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entirety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash, but -J is specific to BSD/macOS xargs; GNU xargs has no -J, and xargs -I % rsync -vi -a % /path/to/target/ < sourcelist.txt is the closest equivalent there. It will also fail if one of the lines in sourcelist.txt contains whitespace or various special characters, so it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync behaves differently if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while IFS= read -r line; do
  # Remove any trailing slash, just in case
  source=${line%%/}
  # Make sure the source exists before we try to copy it
  if [ -d "$source" ]; then
    rsync -vi -a "$source" /path/to/target/
  fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no -depth option, as find has. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d option creates the directories that were specified in sourcelist.txt, if they exist.
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while IFS= read -r line; do
  if [ -d "$line" ]; then
    # Strip trailing slashes, as before
    source=${line%%/}
    # Grab the directory name from the source path
    target=${source##*/}
    # --exclude='*/' skips every subdirectory, so only the top-level files are copied
    rsync -vi -a --exclude='*/' "$source/" "/path/to/target/$target/"
  fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
Does all this make sense? Does it match your requirements?
You can use find's ipath argument:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
Combining both, you could use the command below. The directory name is passed to sh as a positional parameter instead of being spliced into the script text, which avoids the security problem with specially crafted names that was pointed out in the comments:
find . -maxdepth 1 -type d -iname '*searchString*' -exec sh -c 'mkdir -p "/newPath/$1" && cp "$1"/* "/newPath/$1/" 2>/dev/null' sh '{}' ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.
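Here is a POSIX sh sketch of that loop. It copies only the regular files at the top of each matching folder and silently skips subfolders, instead of letting cp print "omitting directory" errors. *Search*, fooSearch1 and newPath are placeholders for the demo. Unlike find -iname, the glob is case-sensitive, and * does not match hidden files.

```bash
work=$(mktemp -d)
mkdir -p "$work/fooSearch1/skipme" "$work/other" "$work/newPath"
: > "$work/fooSearch1/keep.txt"
: > "$work/fooSearch1/skipme/deep.txt"
: > "$work/other/ignored.txt"

cd "$work" || exit 1
t=$work/newPath
for d in *Search*/; do
  [ -d "$d" ] || continue          # no match: the pattern stays literal
  mkdir -p "$t/$d"
  for f in "$d"*; do
    [ -f "$f" ] || continue        # copy regular files only, skip subdirectories
    cp -- "$f" "$t/$d"
  done
done
```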

Renaming Subdirectories and Files

I have a script using a for loop that would rename folders and files. The script would take the list of files and folders and rename them conditionally. I would invoke the script using the command:
find test/* -exec ./replace.sh {} \;
My replace.sh script would contain something similar to:
for i in $#
mv $OLDFILE $NEWFILE
done
$OLDFILE and $NEWFILE have been set previously, and I don't believe any problems will arise from them.
My problem arises when I hit upon subdirectories. Originally, I would have folders like:
folder_1
-file1
-file2
When my script changes folder_1 into folderX1, the next argument, folder_1/file1, would be invalid, because the changed path would be folderX1/file1. I figured I could create a stack with a list of the folders being changed and pop them later to rename the files, but this seems hard in bash. Is there a better method that I am missing?
P.S. I could run the program several times to go through all the subdirectories, but this doesn't seem efficient.
You can add -depth to the find command. This will process the directory's files before the directory itself. See man find for details.
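Your find usage is problematic. The first argument is the starting point for the search, so you don't want to use a glob there. If you want only the files in test/ and not any of its subdirectories, limit the depth with -maxdepth 1 (GNU and BSD find) or, on BSD/macOS find, -depth 1. Note that this is not the same as the argument-less -depth Olaf suggested, which only changes the traversal order.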
You don't really need to use a separate script to handle this rename. It can be done within the find command line, if you don't mind a little mess.
To handle just the top-level of files, you could do this:
$ touch foo.txt bar.txt baz.ext
$ find . -maxdepth 1 -type f -name \*.txt -exec bash -c 'mv -v "$1" "${1/.txt/.csv}"' _ {} \;
./foo.txt -> ./foo.csv
./bar.txt -> ./bar.csv
$
But your concern is valid: find walks the tree while -exec is running, so if an -exec renames a directory before find has descended into it, find will look for the old path and the renames below it will fail.
I suspect your quickest solution is to do this in TWO stages (not several): one for files, followed by one for directories. (Or change the order, I don't think it should matter.)
$ mkdir foo_1; touch red_2 foo_1/blue_3
$ find . -type f -name \*_\* -exec bash -c 'mv -v "$1" "${1%_?}X${1##*_}"' _ {} \;
./foo_1/blue_3 -> ./foo_1/blueX3
./red_2 -> ./redX2
$ find . -type d -name \*_\* -exec bash -c 'mv -v "$1" "${1%_?}X${1##*_}"' _ {} \;
./foo_1 -> ./fooX1
Bash parameter expansion will get you a long way.
Another option, depending on your implementation of find, is the -d option:
-d Cause find to perform a depth-first traversal, i.e., directories
are visited in post-order and all entries in a directory will be
acted on before the directory itself. By default, find visits
directories in pre-order, i.e., before their contents. Note, the
default is not a breadth-first traversal.
So:
$ mkdir -p foo_1/bar_2; touch red_3 foo_1/blue_4 foo_1/bar_2/green_5
$ find . -d -name \*_\* -exec bash -c 'mv -v "$1" "${1%_?}X${1##*_}"' _ {} \;
./foo_1/bar_2/green_5 -> ./foo_1/bar_2/greenX5
./foo_1/bar_2 -> ./foo_1/barX2
./foo_1/blue_4 -> ./foo_1/blueX4
./foo_1 -> ./fooX1
./red_3 -> ./redX3
$
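Here is a POSIX sh sketch of the depth-first approach, with a demo tree matching the example above. It uses -depth, the POSIX spelling of the post-order traversal described for -d. It passes each path to sh as a positional parameter instead of splicing {} into the script, and it renames only the last path component. As a variation on the %_? pattern used above, it replaces the last underscore in a name no matter how many characters follow it. An existing file with the new name would be overwritten.

```bash
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p foo_1/bar_2
touch red_3 foo_1/blue_4 foo_1/bar_2/green_5

# -depth visits a directory's contents before the directory itself,
# so renaming foo_1 cannot invalidate paths that are still to be processed.
find . -depth -name '*_*' -exec sh -c '
  for p in "$@"; do
    dir=${p%/*}
    base=${p##*/}
    # Replace the last underscore in the name itself; leave the parent path alone.
    mv -- "$p" "$dir/${base%_*}X${base##*_}"
  done
' sh {} +
```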

Find files, rename in place unix bash

This should be relatively trivial but I have been trying for some time without much luck.
I have a directory, with many sub-directories, each with their own structure and files.
I am looking to find all .java files within any directory under the working directory, and rename them to a particular name.
For example, I would like to name all of the java files test.java.
If the directory structure is as follows:
./files/abc/src/abc.java
./files/eee/src/foo.java
./files/roo/src/jam.java
I want to simply rename to:
./files/abc/src/test.java
./files/eee/src/test.java
./files/roo/src/test.java
Part of my problem is that the paths may have spaces in them.
I don't need to worry about renaming classes or anything inside the files, just the file names in place.
If there is more than one .java file in a directory, I don't mind if it is overwritten or if a prompt asks me what to do (either is OK; it is unlikely that there is more than one in each directory).
What I have tried:
I have looked into mv and find, but when I pipe them together, I seem to be doing it wrong. I want to make sure the files stay in their current location and are renamed, not moved.
GNU find, as well as BSD and macOS find, has an -execdir action, which runs the command from the directory containing the matched file.
find . -name '*.java' -execdir mv {} test.java \;
If your version of find doesn't support -execdir then you can get the job done with:
find . -name '*.java' -exec bash -c 'mv "$1" "${1%/*}"/test.java' -- {} \;
If your find command (like mine) doesn't support -execdir, try the following:
find . -name "*.java" -exec bash -c 'mv "$1" "$(dirname "$1")/test.java"' _ {} \;
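Here is a POSIX sh sketch of the -exec fallback that assumes nothing beyond POSIX find and mv. The files/abc and "files/e e" trees are a made-up demo, including a directory name with a space. It skips files already named test.java, so mv never tries to rename a file onto itself. mv -i asks before overwriting when a directory holds more than one .java file.

```bash
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p files/abc/src "files/e e/src"
: > files/abc/src/abc.java
: > "files/e e/src/foo.java"

# Rename each *.java file to test.java inside its own directory.
find . -type f -name '*.java' ! -name test.java -exec sh -c '
  for f in "$@"; do
    mv -i -- "$f" "${f%/*}/test.java"
  done
' sh {} +
```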
