Bash - replace all files, of a certain type, recursively, with a single file, retaining the original file name

Suppose I have a directory that I need to traverse recursively, finding each file of a certain type (*.jpeg), and replace each matching occurrence with one unique file (placeholder.jpeg) BUT I want to keep the original filename.
I am replacing all jpeg images in a large directory structure with one small placeholder image, but I need to keep the original file names intact.
So a recursive find-file-by-type-and-replace, but keeping the original file name.
Here is what almost works:
#! /bin/bash
shopt -s globstar
for f in /tmp/**/*.jpeg
do
cp /tmp/placeholder.jpeg "$f"
done

It's not clear to me that your placeholder file is always outside the directory structure where you're doing the replacement.
Various find versions have a -samefile option to detect a particular file, so you could avoid an invocation of cp that's going to emit its "... are the same file" warning:
find /tmp \
-name '*.jpeg' ! -samefile /tmp/placeholder.jpeg \
-exec cp /tmp/placeholder.jpeg {} \;
You could test by replacing -exec cp ... with -exec echo cp ... to see the expected copies without doing them.
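For instance, the dry run would then look like this (the same command with echo prepended):
find /tmp \
-name '*.jpeg' ! -samefile /tmp/placeholder.jpeg \
-exec echo cp /tmp/placeholder.jpeg {} \;
Each line of output shows a copy that would have been made, without touching any files.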

Try this: find /tmp/ -name \*.jpeg -print0 | xargs -0I'{}' cp placeholder.jpeg {}
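Note that this one-liner does not skip the placeholder itself; if your find supports the -samefile test shown above, a combined sketch would be:
find /tmp/ -name '*.jpeg' ! -samefile /tmp/placeholder.jpeg -print0 | xargs -0 -I{} cp /tmp/placeholder.jpeg {}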

Related

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, PDFs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production users need a record of which files used to exist, so that when they browse a project folder, they can use the unaffected files but also refer to a text file in the same directory which records which files used to be there, and thus explains why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory.
-empty
This tells find to only look for empty files.
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
When this command is run, bash executes the string printf "%s\n" "$@" >>deletedFiles.txt, and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $@, it does not include $0; as usual, it expands to $1, $2, .... Thus, we add the placeholder none so that it is assigned to $0, which we ignore, and the complete list of file names is assigned to "$@" (see the short demo after this walkthrough).
-delete
This deletes each empty file.
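To see the role of the none placeholder in isolation, here is a minimal demo (the arguments a, b and c are made up):
bash -c 'printf "%s\n" "$@"' none a b c
This prints a, b and c, one per line; none is consumed as $0 and never printed.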
Why not simply
find . -type f -size 0 -exec rm -v {} + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
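With the sed script in place, deletedFiles.txt would then contain just:
bar
foo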
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from @JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CDs to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to @John1024 for the suggestion of using the -empty test rather than -size 0.

Move only files recursively from multiple directories into one directory with mv

I currently have ~40k RAW images that are in a nested directory structure. (Some folders have as many as 100 subfolders filled with files.) I would like to move them all into one master directory, with no subfolders. How could this be accomplished using mv? I know the -r switch will copy recursively, but this copies folders as well, and I do not wish to have subdirectories in the master folder.
If your photos are in /path/to/photos/ and its subdirectories, and you want to move them into /path/to/master/, and you want to select them by extension .jpg, .JPG, .png, .PNG, etc.:
find /path/to/photos \( -iname '*.jpg' -o -iname '*.png' \) -type f -exec mv -nv -t '/path/to/master' -- {} +
If you don't want to filter by extension, and just move everything (i.e., all the files):
find /path/to/photos -type f -exec mv -nv -t '/path/to/master' -- {} +
The -n option so as to not overwrite existing files (optional if you don't care) and -v option so that mv shows what it's doing (very optional).
The -t option to mv is to specify the target directory, so that we can stack all the files to be moved at the end of the command (see the + delimiter of -exec). If your mv doesn't support -t:
find /path/to/photos \( -iname '*.jpg' -o -iname '*.png' \) -type f -exec mv -nv -- {} '/path/to/master' \;
but this will be less efficient, as one instance of mv will be created for each file.
Btw, this moves the files, it doesn't copy them.
Remarks.
The directory /path/to/master must already exist (it will not be created by this command).
Make sure the directory /path/to/master is not in /path/to/photos. It would make the thing awkward!
Make use of -execdir option of find:
find /path/of/images -type f -execdir mv '{}' /master-dir \;
As per man find:
-execdir utility [argument ...] ;
The -execdir primary is identical to the -exec primary with the exception that utility will be executed from the directory that holds the current file. The filename substituted for the string ``{}'' is not qualified.
Since -execdir makes find execute the given command from each file's directory, only the base filename is moved, without any parent path.
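One caveat: if files in different subdirectories share a base name, later moves will overwrite earlier ones in /master-dir. A hedged variant using mv's -n (no-clobber) option, where your mv supports it, keeps the first file found instead:
find /path/of/images -type f -execdir mv -n '{}' /master-dir \;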
find <base location of files> -type f -name '*.raw' -exec mv {} master \;
If your hierarchy is only one level deep, here is another way using the automated tools of StringSolver:
mv -a firstfolder/firstfile.raw firstfile.raw
The -a option immediately applies the similar transformation to all similar files at nesting level 1 (i.e. for all other subfolders).
If you do not trust the system, you can use other options such as -e to explain the transformation or -t to test it on all files.
DISCLAIMER: I am a co-author of this work for academic purposes, and working on a bash script renderer. But the system is already available for testing purposes.

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -R, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
for GNU cp and zsh
. is a qualifier for regular files in zsh
cp --parents dir1/file1 dir2 copies file1 to dir2/dir1 in GNU cp
t=/newpath;for d in *searchstring*/;do mkdir -p "$t/$d";cp "$d"* "$t/$d";done
find *searchstring*/ -type f -maxdepth 1 -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -ipath '*searchstring*/*' -type f -maxdepth 2 -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entirety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash. But it will fail if one of the lines in sourcelist.txt contains whitespace, or various special characters. So it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync has different behaviour if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
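To illustrate the trailing-slash behaviour with the /foo/bar/slurm example:
rsync -a /foo/bar/slurm /path/to/target/    # creates /path/to/target/slurm/...
rsync -a /foo/bar/slurm/ /path/to/target/   # copies slurm's contents directly into /path/to/target/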
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
# Remove any trailing slash, just in case
source=${line%%/}
# make sure source exist before we try to copy it
if [ -d "$source" ]; then
rsync -vi -a "$source" /path/to/target/
fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no -depth option, as find has. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d option tells rsync to transfer the directories themselves without recursing into them, so each directory listed in sourcelist.txt is created at the target (if it exists at the source).
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
if [ -d "$line" ]; then
# Strip trailing slashes, as before
source=${line%%/}
# Grab the directory name from the source path
target=${source##*/}
rsync -vi -a "$source/" "/path/to/target/$target/"
fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
Does all this make sense? Does it match your requirements?
You can use find's ipath argument:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
(As pointed out in the comments, the following solution has security issues for specially crafted names)
Combining both, you could use this:
find . -maxdepth 1 -type d -iname '*searchString*' -exec sh -c "mkdir -p '/newPath/{}'; cp "{}/*" '/newPath/{}/' 2>/dev/null" ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.

shell entering each folder and zip content

So I have some folder
|-Folder1
||-SubFolder1
||-SubFolder2
|-Folder2
||-SubFolder3
||-SubFolder4
Each subfolder contains several jpgs that I want to zip to the root folder...
I'm a little bit stuck on "How to enter each folder"
Here is my code:
find ./ -type f -name '*.jpg' | while IFS= read i
do
foldName=${PWD##*/}
zip ../../foldName *
done
Better still would be to store FolderName+SubFolderName and give that to the zip command as the archive name...
Zipping JPEGs (for Compression) is Usually Wasted Effort
First of all, attempting to compress already-compressed formats like JPEG files is usually a waste of time, and can sometimes result in archives that are larger than the original files. However, it is sometimes useful to do so for the convenience of having a bunch of files in a single package.
Just something to keep in mind. YMMV.
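If the single-package convenience is all you want, zip's -0 (store) option skips compression entirely, for example (archive name made up):
zip -0 photos.zip *.jpg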
Use Find's -execdir Flag
What you need is the find utility's -execdir flag. The GNU find man page says:
-execdir command {} +
Like -exec, but the specified command is run from the subdirectory containing the matched file, which is not normally the directory in which you started find.
For example, given the following test corpus:
cd /tmp
mkdir -p foo/bar/baz
touch foo/bar/1.jpg
touch foo/bar/baz/2.jpg
you can zip the entire set of files with find while excluding the path information with a single invocation. For example:
find /tmp/foo -name \*jpg -execdir zip /tmp/my.zip {} +
Use Zip's --junk-paths Flag
The zip utility on many systems supports a --junk-paths flag. The man page for zip says:
--junk-paths
Store just the name of a saved file (junk the path), and do not
store directory names.
So, if your find utility doesn't support -execdir, but you do have a zip that supports junking paths, you could do this instead:
find /tmp/foo -name \*jpg -print0 | xargs -0 zip --junk-paths /tmp/my.zip
You can use dirname to get the directory a file or directory is located in.
You can also simplify the find command to search only for directories by using -type d. Then you should use basename to get only the name of the subdirs:
find ./*/* -type d | while read line; do
zip --junk-paths "$(basename "$line")" "$line"/*.jpg
done
Explanation
find ./*/* -type d
will print out all directories located in ./*/* which will result in all subdirs of directories located in the current dir
while read line reads each line from the stream and stores it in the variable "line". Thus $line will be the relative path to the subdir, e.g. "Folder1/Subdir2"
"$(basename $line)" returns the only the name of the subdir, e.g. "Subdir2"
Update: add --junk-paths to the zip command if you do not want the directory paths to be stored in the zip file.
So a little check, I finally got something working:
find ./*/* -type d | while read line; do
#printf '%s\n' "$line"
zip ./"$line" "$line"/*.jpg
done
But this creates an archive containing:
Subfolder.zip
Folder
|-Subfolder
||-File1.jpg
||-File2.jpg
||-File3.jpg
Instead I would like it to be:
Subfolder.zip
|-File1.jpg
|-File2.jpg
|-File3.jpg
So I tried using basename and dirname in different combinations... Always got some error...
And just to learn: what if I wanted the new archive to be created in the same root directory as "Folder"?
Ok finally got it!
find ./* -name \*.zip -type f -print0 | xargs -0 rm -rf
find ./*/* -type d | while read line; do
#printf '%s\n' "$line"
zip --junk-paths ./"$line" "$line"/*.jpg
done
find . -name \*.zip -type f -mindepth 2 -exec mv -- '{}' . \;
In the first line I simply remove all existing .zip files;
then I zip everything, and in the final line I move all the zip files to the root directory!
Thanks everybody for your help!

Copy all files with a certain extension from all subdirectories

Under unix, I want to copy all files with a certain extension (all excel files) from all subdirectories to another directory. I have the following command:
cp --parents `find -name \*.xls*` /target_directory/
The problems with this command are:
It copies the directory structure as well, and I only want the files (so all files should end up in /target_directory/)
It does not copy files with spaces in the filenames (which are quite a few)
Any solutions for these problems?
--parents is copying the directory structure, so you should get rid of that.
The way you've written this, the find executes, and the output is put onto the command line such that cp can't distinguish between the spaces separating the filenames, and the spaces within the filename. It's better to do something like
$ find . -name \*.xls -exec cp {} newDir \;
in which cp is executed once for each filename that find finds, and is passed the filename correctly.
Instead of all the above, you could use zsh and simply type
$ cp **/*.xls target_directory
zsh can expand wildcards to include subdirectories and makes this sort of thing very easy.
From all of the above, I came up with this version.
This version also works for me in the mac recovery terminal.
find ./ -name '*.xls' -exec cp -prv '{}' '/path/to/targetDir/' ';'
It will look in the current directory and recursively in all of the subdirectories for files with the xls extension. It will copy them all to the target directory.
cp flags are:
-p - preserve attributes of the file
-r - recursive
-v - verbose (shows you what's being copied)
I had a similar problem. I solved it using:
find dir_name -name '*.mp3' -exec cp -vuni '{}' "../dest_dir" ";"
The '{}' and ";" execute the copy on each file.
I also had to do this myself. I did it via the --parents argument for cp:
find SOURCEPATH -name 'filename*.txt' -exec cp --parents {} DESTPATH \;
In 2022 the zsh solution also works in Linux Bash, provided the globstar shell option is enabled:
cp **/*.extension /dest/dir
works as expected.
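In bash that looks like the following (using the .xls extension from this question):
shopt -s globstar   # enable recursive ** matching in bash
cp **/*.xls /dest/dir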
find [SOURCEPATH] -type f -name '[PATTERN]' |
while read P; do cp --parents "$P" [DEST]; done
You may remove the --parents, but then there is a risk of collision if multiple files bear the same name.
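For example (paths invented for illustration):
cp --parents a/report.xls b/report.xls /dest   # yields /dest/a/report.xls and /dest/b/report.xls
Without --parents, the second report.xls would overwrite the first in /dest.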
On macOS Ventura 13.1, on zsh, I saw the following error when there were too many files to copy:
zsh: argument list too long: cp
I had to use find along with cp to get the files copied to my destination:
find ./module/*/src -name \*.java -print | while read filelocation; do cp "$filelocation" mydestinationlocation; done
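A more defensive sketch of the same loop, for filenames that might contain leading whitespace or newlines (bash syntax, same made-up paths):
find ./module/*/src -name '*.java' -print0 | while IFS= read -r -d '' filelocation; do cp "$filelocation" mydestinationlocation; done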
