Making archive from files with same names in different directories - bash

I have some files with same names but under different directories. For example, path1/filea, path1/fileb, path2/filea, path2/fileb,....
What is the best way to make the files into an archive? These directories contain many other files that I don't want to include in the archive. Off the top of my head, I think of using Bash, probably ar, tar and other commands, but I am not sure exactly how to do it.
Renaming the files seems to make the file names a little complicated. I tend to keep the directory structure inside the archive. Or I might be wrong. Other ideas are welcome!
Thanks and regards!
EDIT:
Examples would be really nice!

You can use tar with the --exclude PATTERN option. See the man page for more.
To exclude files, you can see this page for examples.

You may give the find command multiple directories to search through.
# example: create archive of .tex files
find -x LaTeX-files1 LaTeX-files2 -name "*.tex" -print0 | tar --null --no-recursion -uf LaTeXfiles.tar --files-from -
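The same find-to-tar pipeline can be tried on the layout from the question. This is a minimal sketch, assuming GNU tar and a throwaway tree; the names path1, path2, unwanted and same-names.tar are made up for the demo.

```shell
#!/bin/sh
# Sketch only: all directory and file names below are made up for the demo.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p path1 path2
touch path1/filea path1/fileb path1/unwanted \
      path2/filea path2/fileb path2/unwanted

# Select only the wanted names, NUL-separated, and let tar read the list.
# --null must come before --files-from so that the list is parsed correctly.
find path1 path2 -type f \( -name filea -o -name fileb \) -print0 |
  tar --null --no-recursion -cf same-names.tar --files-from -

# List the members: the path1/ and path2/ prefixes keep equal names apart.
tar -tf same-names.tar
```

Because each member keeps its directory prefix, the two filea entries do not collide when the archive is extracted.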

To recursively copy only files with filename "filea" or "fileb" from /path/to/source to /path/to/archive, you could use:
rsync -avm --include='file[ab]' -f 'hide,! */' /path/to/source/ /path/to/archive/
'*/' is a pattern which matches 'any directory'
'! */' matches anything which is not a directory (i.e. a file)
'hide,! */' means hide all files
Filter rules are applied in order, and the first rule that matches is applied.
--include='file[ab]' has precedence, so if a file matches 'file[ab]', it is included.
Any other file gets excluded from the list of files to transfer.
Another alternative is to use the find...exec pattern:
mkdir /path/to/archive
cd /path/to/source
find . -type f -iname "file[ab]" -exec cp --parents '{}' /path/to/archive ";"
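To see what the find ... -exec cp --parents pattern leaves behind, here is a small sketch on a temporary tree. It assumes GNU cp (for --parents); the source and archive directories are hypothetical stand-ins for /path/to/source and /path/to/archive.

```shell
#!/bin/sh
# Sketch only: a temporary tree stands in for /path/to/source and /path/to/archive.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/source/path1" "$demo/source/path2/deep" "$demo/archive"
touch "$demo/source/path1/filea" "$demo/source/path1/other" \
      "$demo/source/path2/deep/fileb"

cd "$demo/source"
# --parents recreates each file's relative directory under the destination.
find . -type f -name 'file[ab]' -exec cp --parents {} "$demo/archive" \;

# Show what was copied.
find "$demo/archive" -type f
```

Running find from inside the source directory matters here: the relative path that find prints is the path that cp --parents recreates.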

What I have used to make a tar ball for the files with same name in different directories is
$ find <path> -name <filename> -exec tar -rvf data.tar '{}' \;
i.e. tar [-]r --append
Hope this helps.

Related

How to exclude a list of files and folders while using tar? [duplicate]

Is there a simple shell command/script that supports excluding certain files/folders from being archived?
I have a directory that needs to be archived, with a subdirectory that has a number of very large files I do not need to back up.
Not quite solutions:
The tar --exclude=PATTERN command matches the given pattern and excludes those files, but I need specific files & folders to be ignored (full file path), otherwise valid files might be excluded.
I could also use the find command to create a list of files, exclude the ones I don't want to archive, and pass the list to tar, but that only works for a small number of files. I have tens of thousands.
I'm beginning to think the only solution is to create a file with a list of files/folders to be excluded, then use rsync with --exclude-from=file to copy all the files to a tmp directory, and then use tar to archive that directory.
Can anybody think of a better/more efficient solution?
EDIT: Charles Ma's solution works well. The big gotcha is that the --exclude='./folder' MUST be at the beginning of the tar command. Full command (cd first, so backup is relative to that directory):
cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
You can have multiple exclude options for tar so
$ tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
etc will work. Make sure to put --exclude before the source and destination items.
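A quick way to check an exclude pattern before running a real backup is to try it on a scratch tree and list the archive. The following is a sketch assuming GNU tar; the src, keep and backup.tgz names are invented for the demo.

```shell
#!/bin/sh
# Sketch only: a scratch tree stands in for /folder_to_backup.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/src/folder" "$demo/src/upload/folder2" "$demo/src/keep"
touch "$demo/src/folder/big.bin" "$demo/src/upload/folder2/big.bin" \
      "$demo/src/keep/a.txt"

# cd first, so member names (and therefore the exclude patterns) start with ./
cd "$demo/src"
tar --exclude='./folder' --exclude='./upload/folder2' -zcf "$demo/backup.tgz" .

# Inspect the result: neither excluded directory should be listed.
tar -tzf "$demo/backup.tgz"
```

The patterns are written with the same ./ prefix that the member names get from archiving ".", which is why they match.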
You can exclude directories with --exclude for tar.
If you want to archive everything except /usr you can use:
tar -zcvf /all.tgz / --exclude=/usr
In your case perhaps something like
tar -zcvf archive.tgz arc_dir --exclude=dir/ignore_this_dir
Possible options to exclude files/directories from backup using tar:
Exclude files using multiple patterns
tar -czf backup.tar.gz --exclude=PATTERN1 --exclude=PATTERN2 ... /path/to/backup
Exclude files using an exclude file filled with a list of patterns
tar -czf backup.tar.gz -X /path/to/exclude.txt /path/to/backup
Exclude files using tags by placing a tag file in any directory that should be skipped
tar -czf backup.tar.gz --exclude-tag-all=exclude.tag /path/to/backup
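The exclude-file and tag-file variants can be combined. Below is a rough sketch, assuming GNU tar; the site tree, exclude.txt contents and exclude.tag name are illustrative only.

```shell
#!/bin/sh
# Sketch only: the tree, exclude.txt and exclude.tag are made up for the demo.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/site/cache" "$demo/site/logs" "$demo/site/pages"
touch "$demo/site/cache/x.tmp" "$demo/site/logs/app.log" \
      "$demo/site/pages/index.html"

# Mark the cache directory as "skip me" by dropping a tag file into it.
touch "$demo/site/cache/exclude.tag"

# One pattern per line in the exclude file.
printf '%s\n' '*.log' > "$demo/exclude.txt"

tar -czf "$demo/backup.tar.gz" -C "$demo" \
    -X "$demo/exclude.txt" --exclude-tag-all=exclude.tag site

tar -tzf "$demo/backup.tar.gz"
```

GNU tar prints a notice on stderr for each directory it skips because of a tag file; that is informational, not an error.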
old question with many answers, but I found that none were quite clear enough for me, so I would like to add my try.
if you have the following structure
/home/ftp/mysite/
with following file/folders
/home/ftp/mysite/file1
/home/ftp/mysite/file2
/home/ftp/mysite/file3
/home/ftp/mysite/folder1
/home/ftp/mysite/folder2
/home/ftp/mysite/folder3
So, you want to make a tar file that contains everything inside /home/ftp/mysite (to move the site to a new server), but file3 is just junk, and everything in folder3 is also not needed, so we will skip those two.
we use the format
tar -czvf <name of tar file> <what to tar> <any excludes>
where c = create, z = gzip compression, v = verbose (you can see the files as they are added, useful to make sure none of the files you exclude are being added), and f = file.
so, my command would look like this
cd /home/ftp/
tar -czvf mysite.tar.gz mysite --exclude='file3' --exclude='folder3'
Note that the excluded files/folders are relative to the root of your tar (I have tried the full path here, relative to /, but I cannot make that work).
hope this will help someone (and me next time I google it)
You can use standard "ant notation" to exclude directories relative.
This works for me and excludes any .git or node_module directories:
tar -cvf myFile.tar --exclude=**/.git/* --exclude=**/node_modules/* -T /data/txt/myInputFile.txt 2> /data/txt/myTarLogFile.txt
myInputFile.txt contains:
/dev2/java
/dev2/javascript
This exclude pattern handles filename suffix like png or mp3 as well as directory names like .git and node_modules
tar --exclude={*.png,*.mp3,*.wav,.git,node_modules} -Jcf ${target_tarball} ${source_dirname}
I've experienced that, at least with the Cygwin version of tar I'm using ("CYGWIN_NT-5.1 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin" on a Windows XP Home Edition SP3 machine), the order of options is important.
While this construction worked for me:
tar cfvz target.tgz --exclude='<dir1>' --exclude='<dir2>' target_dir
that one didn't work:
tar cfvz --exclude='<dir1>' --exclude='<dir2>' target.tgz target_dir
This, while tar --help reveals the following:
tar [OPTION...] [FILE]
So, the second command should also work, but apparently it doesn't seem to be the case...
Best rgds,
I found this somewhere else so I won't take credit, but it worked better than any of the solutions above for my mac specific issues (even though this is closed):
tar zc --exclude __MACOSX --exclude .DS_Store -f <archive> <source(s)>
After reading all these good answers for different versions and having solved the problem for myself, I think there are some very small details that are very important, and unusual in general GNU/Linux use, that aren't stressed enough and deserve more than comments.
So I'm not going to try to answer the question for every case, but instead try to record where to look when things don't work.
IT IS VERY IMPORTANT TO NOTICE:
THE ORDER OF THE OPTIONS MATTERS: putting the --exclude before the file option and the directories to back up is not the same as putting it after them. This was unexpected, at least to me, because in my experience the order of options usually doesn't matter in GNU/Linux commands.
Different tar versions expect these options in a different order: for instance, @Andrew's answer indicates that in GNU tar v1.26 and 1.28 the excludes come last, whereas in my case, with GNU tar 1.29, it's the other way around.
THE TRAILING SLASHES MATTER: at least in GNU tar 1.29, there shouldn't be any.
In my case, for GNU tar 1.29 on Debian stretch, the command that worked was
tar --exclude="/home/user/.config/chromium" --exclude="/home/user/.cache" -cf file.tar /dir1/ /home/ /dir3/
The quotes didn't matter, it worked with or without them.
I hope this will be useful to someone.
If you are trying to exclude Version Control System (VCS) files, tar already supports two interesting options about it! :)
Option : --exclude-vcs
This option excludes files and directories used by the following version control systems: CVS, RCS, SCCS, SVN, Git, Arch, Bazaar, Mercurial, and Darcs.
As of version 1.32, the following files are excluded:
CVS/, and everything under it
RCS/, and everything under it
SCCS/, and everything under it
.git/, and everything under it
.gitignore
.gitmodules
.gitattributes
.cvsignore
.svn/, and everything under it
.arch-ids/, and everything under it
{arch}/, and everything under it
=RELEASE-ID
=meta-update
=update
.bzr
.bzrignore
.bzrtags
.hg
.hgignore
.hgtags
_darcs
Option : --exclude-vcs-ignores
When archiving directories that are under some version control system (VCS), it is often convenient to read exclusion patterns from this VCS' ignore files (e.g. .cvsignore, .gitignore, etc.). This option provides such a possibility.
Before archiving a directory, see if it contains any of the following files: .cvsignore, .gitignore, .bzrignore, or .hgignore. If so, read ignore patterns from these files.
The patterns are treated much as the corresponding VCS would treat them, i.e.:
.cvsignore
Contains shell-style globbing patterns that apply only to the directory where this file resides. No comments are allowed in the file. Empty lines are ignored.
.gitignore
Contains shell-style globbing patterns. Applies to the directory where .gitignore is located and all its subdirectories.
Any line beginning with a # is a comment. Backslash escapes the comment character.
.bzrignore
Contains shell globbing patterns and regular expressions (if prefixed with RE:). Patterns affect the directory and all its subdirectories.
Any line beginning with a # is a comment.
.hgignore
Contains POSIX regular expressions. The line "syntax: glob" switches to shell globbing patterns. The line "syntax: regexp" switches back. Comments begin with a #. Patterns affect the directory and all its subdirectories.
Example
tar -czv --exclude-vcs --exclude-vcs-ignores -f path/to/my-tar-file.tar.gz path/to/my/project/
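To check what --exclude-vcs drops, one can archive a tiny fake project and list the result. This is a sketch assuming GNU tar; the project layout is invented, and only --exclude-vcs is exercised here because --exclude-vcs-ignores is not available in older tar releases.

```shell
#!/bin/sh
# Sketch only: a fake project with an empty .git directory, made up for the demo.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/project/.git" "$demo/project/src"
touch "$demo/project/.git/config" "$demo/project/.gitignore" \
      "$demo/project/src/main.c"

# --exclude-vcs skips VCS metadata such as .git/ and .gitignore.
tar -czf "$demo/project.tar.gz" --exclude-vcs -C "$demo" project

tar -tzf "$demo/project.tar.gz"
```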
I'd like to show another option I used to get the same result as the previous answers. I had a similar case where I wanted to back up my Android Studio projects together in one tar file to upload to MediaFire. Using the du command to find the large files, I found that I didn't need some directories, such as:
build, linux and .dart_tool
Starting from Charles Ma's answer, I modified it a little so that the command can be run from the parent directory of my Android directory.
tar --exclude='*/build' --exclude='*/linux' --exclude='*/.dart_tool' -zcvf androidProjects.tar Android/
It worked like a charm.
Ps. Sorry if this kind of answer is not allowed, if this is the case I will remove.
For Mac OSX I had to do
tar -zcv --exclude='folder' -f theOutputTarFile.tar folderToTar
Note the -f after the --exclude=
For those who have issues with it, some versions of tar only work properly without the './' in the exclude value.
$ tar --version
tar (GNU tar) 1.27.1
Command syntax that works:
tar -czvf ../allfiles-butsome.tar.gz * --exclude=acme/foo
These will not work:
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=./acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='./acme/foo'
$ tar --exclude=./acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='./acme/foo' -czvf ../allfiles-butsome.tar.gz *
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=/full/path/acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='/full/path/acme/foo'
$ tar --exclude=/full/path/acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='/full/path/acme/foo' -czvf ../allfiles-butsome.tar.gz *
I agree the --exclude flag is the right approach.
$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'
A word of warning for a side effect that I did not find immediately obvious:
The exclusion of 'fileA' in this example will search for 'fileA' RECURSIVELY!
Example: a directory with a single subdirectory containing a file of the same name (data.txt)
data.txt
config.txt
--+dirA
| data.txt
| config.docx
If using --exclude='data.txt' the archive will not contain EITHER data.txt file. This can cause unexpected results if archiving third party libraries, such as a node_modules directory.
To avoid this issue make sure to give the entire path, like --exclude='./dirA/data.txt'
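The difference between the two forms can be checked on the layout above. A minimal sketch, assuming GNU tar with its default (unanchored) exclude matching; the archive names by-name.tar and by-path.tar are made up.

```shell
#!/bin/sh
# Sketch only: recreates the data.txt / dirA layout in a temporary directory.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/proj/dirA"
touch "$demo/proj/data.txt" "$demo/proj/config.txt" \
      "$demo/proj/dirA/data.txt" "$demo/proj/dirA/config.docx"
cd "$demo/proj"

# A bare name matches at every depth; a full relative path matches one file.
tar --exclude='data.txt' -cf "$demo/by-name.tar" .
tar --exclude='./dirA/data.txt' -cf "$demo/by-path.tar" .

# Count the data.txt members left in each archive.
by_name=$(tar -tf "$demo/by-name.tar" | grep -c 'data\.txt' || true)
by_path=$(tar -tf "$demo/by-path.tar" | grep -c 'data\.txt' || true)
echo "excluded by name: $by_name data.txt left; excluded by path: $by_path left"
```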
After reading this thread, I did a little testing on RHEL 5 and here are my results for tarring up the abc directory:
This will exclude the directories error and logs and all files under the directories:
tar cvpzf abc.tgz abc/ --exclude='abc/error' --exclude='abc/logs'
Adding a wildcard after the excluded directory will exclude the files but preserve the directories:
tar cvpzf abc.tgz abc/ --exclude='abc/error/*' --exclude='abc/logs/*'
To avoid possible 'xargs: Argument list too long' errors due to the use of find ... | xargs ... when processing tens of thousands of files, you can pipe the output of find directly to tar using find ... -print0 | tar --null ....
# archive a given directory, but exclude various files & directories
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
-or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 |
gnutar --null --no-recursion -czf archive.tar.gz --files-from -
#bsdtar --null -n -czf archive.tar.gz -T -
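A scaled-down version of the same prune-and-pipe idea, using relative paths so it can be run anywhere. This is a sketch assuming GNU find and GNU tar (invoked as tar rather than gnutar); data, skipdir and skipfile are hypothetical names.

```shell
#!/bin/sh
# Sketch only: data/, skipdir and skipfile are made-up names for the demo.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/data/keep" "$demo/data/skipdir"
touch "$demo/data/keep/a" "$demo/data/keep/b" \
      "$demo/data/skipdir/c" "$demo/data/skipfile"
cd "$demo"

# -prune stops find from descending into skipdir; the second branch prints
# everything else except skipfile. tar must not recurse on its own, because
# find has already listed every entry that should go in.
find data -type d -path 'data/skipdir' -prune \
  -o -not -path 'data/skipfile' -print0 |
  tar --null --no-recursion -czf pruned.tar.gz --files-from -

tar -tzf pruned.tar.gz
```

No argument list is built on a command line here, so the number of files is not limited by the shell's argument length.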
You can also use one of the "--exclude-tag" options depending on your needs:
--exclude-tag=FILE
--exclude-tag-all=FILE
--exclude-tag-under=FILE
The folder hosting the specified FILE will be excluded.
Use the find command in conjunction with the tar append (-r) option. This way you can add files to an existing tar in a single step, instead of a two pass solution (create list of files, create tar).
find /dir/dir -prune ... -o etc etc.... -exec tar rvf ~/tarfile.tar {} \;
You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if you've already figured out the find command you want to use to select the files the archive, pipe it into cpio to create the tar file:
find ... | cpio -o -H ustar | gzip -c > archive.tar.gz
With GNU tar v1.26, the --exclude needs to come after the archive file and backup directory arguments, should have no leading or trailing slashes, and prefers no quotes (single or double). So relative to the PARENT directory to be backed up, it's:
tar cvfz /path_to/mytar.tgz ./dir_to_backup --exclude=some_path/to_exclude
tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt
-X indicates a file which contains a list of filenames which must be excluded from the backup. For Instance, you can specify *~ in this file to not include any filenames ending with ~ in the backup.
Success case:
1) If you give the full path of the directory to back up, use full paths in the excludes too.
tar -zcvf /opt/ABC/BKP_27032020/backup_27032020.tar.gz --exclude='/opt/ABC/csv/' --exclude='/opt/ABC/log/' /opt/ABC
2) If you give a path relative to the current directory, use relative paths in the excludes too.
tar -zcvf backup_27032020.tar.gz --exclude='ABC/csv/' --exclude='ABC/log/' ABC
Failure case:
If you give a relative path for the directory to back up but full paths in the excludes, it won't work:
tar -zcvf /opt/ABC/BKP_27032020/backup_27032020.tar.gz --exclude='/opt/ABC/csv/' --exclude='/opt/ABC/log/' ABC
Note: putting the excludes before or after the backup directory is fine.
It seems to be impossible to exclude directories with absolute paths.
As soon as ANY of the paths are absolute (source or/and exclude) the exclude command will not work. That's my experience after trying all possible combinations.
Check it out
tar cvpzf zip_folder.tgz . --exclude=./public --exclude=./tmp --exclude=./log --exclude=fileName
I want to have fresh front-end version (angular folder) on localhost.
Also, git folder is huge in my case, and I want to exclude it.
I need to download it from server, and unpack it in order to run application.
Compress angular folder from /var/lib/tomcat7/webapps, move it to /tmp folder with name angular.23.12.19.tar.gz
Command :
tar --exclude='.git' -zcvf /tmp/angular.23.12.19.tar.gz /var/lib/tomcat7/webapps/angular/
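Your best bet is to use find with tar, via xargs (to handle the large number of arguments). For example:
find / -print0 | xargs -0 tar cjf tarfile.tar.bz2
Be aware that if the list is long enough for xargs to run tar more than once, each cjf run overwrites the archive written by the previous one, so check that the result is complete.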
Possible redundant answer but since I found it useful, here it is:
While a FreeBSD root (i.e. using csh) I wanted to copy my whole root filesystem to /mnt but without /usr and (obviously) /mnt. This is what worked (I am at /):
tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvf -)
My whole point is that it was necessary (by putting the ./) to tell tar that the excluded directories were part of the greater directory being copied.
My €0.02
I had no luck getting tar to exclude a 5 Gigabyte subdirectory a few levels deep. In the end, I just used the unix Zip command. It worked a lot easier for me.
So for this particular example from the original post
(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz . )
The equivalent would be:
zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*
(NOTE: Here is the post I originally used that helped me https://superuser.com/questions/312301/unix-zip-directory-but-excluded-specific-subdirectories-and-everything-within-t)
The following bash script should do the trick. It uses the answer given here by Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam
echo -n "Please enter the path to the directories to tar "
read pathin
echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \;|xargs`
echo $pathin
echo tar -czvf $nam.tar.gz $excludes $pathin
This will print out the command you need and you can just copy and paste it back in. There is probably a more elegant way to provide it directly to the command line.
Just change *.CC for any other common extension, file name or regex you want to exclude and this should still work.
EDIT
Just to add a little explanation: find generates a list of files matching the chosen pattern (in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude 'one entry from the list'. The backslashes (\) are escape characters for the ' marks.

copy only the files in a directory

I have 2 directories
dir1/results1/a.xml
dir1/results1/b.txt
and
dir2/results2/c.xml
dir2/results2/d.txt
I want to copy only the files in dir2/results2 folder into dir1/results1 folder so that the result is like this:
dir1/results1/a.xml
dir1/results1/b.txt
dir1/results1/c.xml
dir1/results1/d.txt
I tried the shell command
cp -R dir2/results2/ dir1/results1/
but it is getting copied as
dir1/results1/a.xml
dir1/results1/b.txt
dir1/results1/results2
what is the right way to do it?
In your concrete case,
cp dir2/results2/* dir1/results1
would do what you want. It would not work well in two cases:
If you have files starting with a period, for instance dir2/results2/.abc. These files would not be copied.
If you have subdirectories in dir2/results2. While they indeed would not be copied (as you required, because you want to copy only files, not directories), you would get an error message, which is at least not elegant.
There are solutions to both problems, so if this is an issue for you, create a separate post with the respective topic.
(UPDATE) If the filename expansion would generate an argument list that is longer than the allowed maximum (for instance, if there are many files in the directory, or files with long names), my solution would not work either. In this case, something like the following would work:
find dir2/results2 -maxdepth 1 -type f | xargs -I {} --no-run-if-empty cp {} dir1/results1
This would also solve the problems with the hidden files, which I have mentioned above.
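A variant that avoids both the glob and xargs is to let find run cp itself. The sketch below assumes a find that supports -maxdepth (GNU and BSD find both do); the .hidden file and subdir are added to the question's layout purely to show the edge cases.

```shell
#!/bin/sh
# Sketch only: the question's layout plus a hidden file and a subdirectory.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p dir1/results1 dir2/results2/subdir
touch dir1/results1/a.xml dir1/results1/b.txt \
      dir2/results2/c.xml dir2/results2/d.txt \
      dir2/results2/.hidden dir2/results2/subdir/e.txt

# -maxdepth 1 -type f: regular files directly in results2, dot files included.
find dir2/results2 -maxdepth 1 -type f -exec cp {} dir1/results1/ \;

ls -A dir1/results1
```

Running one cp per file is slower than a single glob expansion, but it is not subject to the argument length limit and it copes with spaces in names.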
(cd dir2/results2 && find . -maxdepth 1 -type f -print0 | tar --null -T - -cf -) | (cd dir1/results1 && tar -xf -)
Handles all cases, including dot files and very large files, but won't copy subdirs. Remove the -maxdepth 1 to copy subdirs. Requires GNU tar.
The tar command is very handy for that.
Give this a try:
tar cf - -C dir2/results2 . | ( cd dir1/results1 ; tar xf - )
It will not only copy plain files but also anything else found in dir2/results2, such as directories etc.

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -R, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
for GNU cp and zsh
. is a qualifier for regular files in zsh
cp --parents dir1/file1 dir2 copies file1 to dir2/dir1 in GNU cp
t=/newpath;for d in *searchstring*/;do mkdir -p "$t/$d";cp "$d"* "$t/$d";done
find *searchstring*/ -type f -maxdepth 1 -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -ipath '*searchstring*/*' -type f -maxdepth 2 -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entirety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash. But it will fail if one of the lines in sourcelist.txt contains whitespace, or various special characters. So it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync has different behaviour if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
# Remove any trailing slash, just in case
source=${line%%/}
# make sure source exist before we try to copy it
if [ -d "$source" ]; then
rsync -vi -a "$source" /path/to/target/
fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no -maxdepth option, as find has. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d option creates the directories that were specified in sourcelist.txt, if they exist.
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
if [ -d "$line" ]; then
# Strip trailing slashes, as before
source=${line%%/}
# Grab the directory name from the source path
target=${source##*/}
rsync -vi -a "$source/" "/path/to/target/$target/"
fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
Does all this make sense? Does it match your requirements?
You can use find's -ipath test:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
(As pointed out in the comments, the following solution has security issues for specially crafted names)
Combining both, you could use this:
find . -maxdepth 1 -type d -iname '*searchString*' -exec sh -c "mkdir -p '/newPath/{}'; cp "{}/*" '/newPath/{}/' 2>/dev/null" ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.
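A variation on that loop which copies only regular files, so cp never complains about subfolders, might look like the sketch below. It assumes a find with -maxdepth; the folder names and the target variable are hypothetical stand-ins for the real search string and /newPath.

```shell
#!/bin/sh
# Sketch only: a-searchString-1, unrelated and $target are made-up names.
set -eu
demo=$(mktemp -d)
target="$demo/newPath"
mkdir -p "$demo/work/a-searchString-1/sub" "$demo/work/unrelated" "$target"
touch "$demo/work/a-searchString-1/top.txt" \
      "$demo/work/a-searchString-1/sub/nested.txt" \
      "$demo/work/unrelated/skip.txt"
cd "$demo/work"

# The trailing / in the glob restricts matches to directories.
for d in *searchString*/; do
  mkdir -p "$target/$d"
  # Only regular files directly inside the matched folder are copied.
  find "$d" -maxdepth 1 -type f -exec cp {} "$target/$d" \;
done

find "$target" -type f
```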

Copy all files with a certain extension from all subdirectories

Under unix, I want to copy all files with a certain extension (all excel files) from all subdirectories to another directory. I have the following command:
cp --parents `find -name \*.xls*` /target_directory/
The problems with this command are:
It copies the directory structure as well, and I only want the files (so all files should end up in /target_directory/)
It does not copy files with spaces in the filenames (which are quite a few)
Any solutions for these problems?
--parents is copying the directory structure, so you should get rid of that.
The way you've written this, the find executes, and the output is put onto the command line such that cp can't distinguish between the spaces separating the filenames, and the spaces within the filename. It's better to do something like
$ find . -name \*.xls -exec cp {} newDir \;
in which cp is executed for each filename that find finds, and passed the filename correctly. Here's more info on this technique.
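The effect on names containing spaces can be tried on a scratch tree. A minimal sketch; the src and target directories and the file names are invented for the demo.

```shell
#!/bin/sh
# Sketch only: src/, target/ and the spreadsheet names are made up.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p "src/q1 reports" src/q2 target
touch "src/q1 reports/sales 2020.xls" src/q2/costs.xlsx src/q2/notes.txt

# find hands each path to cp as a single argument, so spaces are harmless.
# The pattern is quoted so the shell does not expand it before find sees it.
find src -type f -name '*.xls*' -exec cp {} target/ \;

ls target
```

All matches land directly in target/, so two files with the same name in different subdirectories would overwrite each other.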
Instead of all the above, you could use zsh and simply type
$ cp **/*.xls target_directory
zsh can expand wildcards to include subdirectories and makes this sort of thing very easy.
From all of the above, I came up with this version.
This version also works for me in the mac recovery terminal.
find ./ -name '*.xls' -exec cp -prv '{}' '/path/to/targetDir/' ';'
It will look in the current directory and recursively in all of the subdirectories for files with the xls extension. It will copy them all to the target directory.
cp flags are:
p - preserve attributes of the file
r - recursive
v - verbose (shows you what's being copied)
I had a similar problem. I solved it using:
find dir_name -name '*.mp3' -exec cp -vuni '{}' "../dest_dir" ";"
The '{}' and ";" execute the copy on each file.
I also had to do this myself. I did it via the --parents argument for cp:
find SOURCEPATH -name 'filename*.txt' -exec cp --parents {} DESTPATH \;
In 2022 the zsh solution also works in Bash on Linux, provided the globstar shell option is enabled (shopt -s globstar):
cp **/*.extension /dest/dir
works as expected.
find [SOURCEPATH] -type f -name '[PATTERN]' |
while read P; do cp --parents "$P" [DEST]; done
you may remove the --parents but there is a risk of collision if multiple files bear the same name.
On macOS Ventura 13.1, on zsh, I saw the following error when there were too many files to copy:
zsh: argument list too long: cp
I had to use the find command along with cp to get the files copied to my destination:
find ./module/*/src -name \*.java -print | while IFS= read -r filelocation; do cp "$filelocation" mydestinationlocation; done

Unix tar: do not preserve full pathnames

When I try to compress files and directories with tar using absolute paths, the absolute path is preserved in the resulting compressed file. I need to use absolute paths to tell tar where the folder I wish to compress is located, but I only want it to compress that folder – not the whole path.
For example, tar -cvzf test.tar.gz /home/path/test – where I want to compress the folder test. However, what I actually end up compressing is /home/path/test. Is there anything that can be done to avoid this? I have tried playing with the -C operand to no avail.
This is ugly... but it works...
I had this same problem but with multiple folders; I just wanted to flatten all the files out. You can use the --transform option to pass a sed expression and... it works as expected.
This is the expression:
's/.*\///g' (delete everything up to and including the last '/')
This is the final command:
tar --transform 's/.*\///g' -zcvf tarballName.tgz */*/*.info
Use -C to specify the directory from which the files look like you want, and then specify the files as seen from that directory:
tar -cvzf test.tar.gz -C /home/path test
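The effect of -C is easy to confirm by listing the archive. A small sketch; the home/path/test tree is recreated under a temporary directory rather than at the real /home/path.

```shell
#!/bin/sh
# Sketch only: $demo/home/path stands in for the real /home/path.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/home/path/test"
touch "$demo/home/path/test/file.txt"

# -C changes directory before the following names are read, so members are
# stored as test/... instead of the full path leading to them.
tar -czf "$demo/test.tar.gz" -C "$demo/home/path" test

tar -tzf "$demo/test.tar.gz"
```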
multi-directory example
tar cvzf my.tar.gz example.zip -C dir1 files_under_dir1 -C dir2 files_under_dir2
The files under dir1 and dir2 are then stored without a path prefix.
tar can perform transformations on the filenames on the way in and out of the archive. Consider this example that stores a bunch of files in a flat tarfile:
in the root ~/ directory
find logger -name \*.sh | tar -cvf test.tar -T - --xform='s|^.*/||' --show-transformed
The -T - option tells tar to read a list of files from stdin, and --xform='s|^.*/||' applies the sed expression to all the filenames after they are read and before they are stored. --show-transformed is just a nicety to show you the file names after they are transformed; the default is to show the names as they are read.
There are no doubt other ways besides using find to specify files to archive. For instance, if you have globstar set in bash, you can use ** patterns to wildcard any number of directories, shortening the previous to this:
tar -cvf test.tar --xform='s|^.*/||' --show-transformed logger/**/*.sh
You’ll have to judge what is best for your situation given the files you’re after.
find -type f | tar --transform 's/.*\///g' -zcvf comp.tar.gz -T -
Where find -type f finds all the files in the directory tree and using tar with --transform compresses them without including the folder structure. This is very useful if you want to compress only the files that are the result of a certain search or the files of a specific type like:
find -type f -name "*.txt" | tar --transform 's/.*\///g' -zcvf comp.tar.gz -T -
Unlike the other answers, you don't have to include */*/* specifying the depth of the directory. find handles that for you.
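A NUL-separated variant of the same idea, which also copes with spaces or newlines in file names, could look like this. It is a sketch assuming GNU tar; the logger tree and flat.tar are made-up names.

```shell
#!/bin/sh
# Sketch only: the logger/ tree and flat.tar are made up for the demo.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p logger/a/b logger/c
touch logger/a/b/one.sh logger/c/two.sh logger/c/readme.txt

# The sed expression removes everything up to and including the last slash,
# so each member is stored under its bare file name.
find logger -type f -name '*.sh' -print0 |
  tar --null -T - --transform='s|^.*/||' -cf flat.tar

tar -tf flat.tar
```

Flattening discards the directory part, so files that share a name in different directories end up as duplicate members; keep the directory structure if that can happen.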

Resources