copy only the files in a directory - shell

I have 2 directories
dir1/results1/a.xml
dir1/results1/b.txt
and
dir2/results2/c.xml
dir2/results2/d.txt
I want to copy only the files in dir2/results2 folder into dir1/results1 folder so that the result is like this:
dir1/results1/a.xml
dir1/results1/b.txt
dir1/results1/c.xml
dir1/results1/d.txt
I tried the shell command
cp -R dir2/results2/ dir1/results1/
but it is getting copied as
dir1/results1/a.xml
dir1/results1/b.txt
dir1/results1/results2
what is the right way to do it?

In your concrete case,
cp dir2/results2/* dir1/results1
would do what you want. It would not work well in two cases:
If you have files starting with a period (for instance dir2/results2/.abc), they would not be copied.
If you have subdirectories in dir2/results2, they would indeed not be copied (as you required, since you want to copy only files, not directories), but you would get an error message, which is at least not elegant.
There are solutions to both problems, so if this is an issue for you, create a separate post with the respective topic.
(UPDATE) If the filename expansion would generate an argument list longer than the system allows (for instance, if there are many files in the directory, or files with very long names), my solution would not work either. In this case, something like the following would:
find dir2/results2 -maxdepth 1 -type f | xargs --no-run-if-empty -I {} cp {} dir1/results1
This would also solve the problem with hidden files that I mentioned above.
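A variant that avoids xargs entirely, assuming GNU find and GNU cp (for the -t option), which batches the copies much as xargs would:
find dir2/results2 -maxdepth 1 -type f -exec cp -t dir1/results1 {} +
Here -exec ... {} + appends as many file names as fit on one command line, and -t names the target directory up front so the file arguments can come last.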

(cd dir2/results2 && find . -maxdepth 1 -type f -print0 | tar --null -T - -cf -) | (cd dir1/results1 && tar -xf -)
Handles all cases, including dot files and very large numbers of files, but won't copy subdirectories. Remove the -maxdepth 1 to copy subdirectories too. Requires GNU tar.

The tar command is very handy for that.
Give this a try:
tar cf - -C dir2/results2 . | ( cd dir1/results1 ; tar xf - )
It will not only copy plain files but also any other ones found in dir2/results2, such as directories etc.
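If your tar is GNU tar, a minimal variant of the same idea that avoids the subshell by using -C on the extracting side as well:
tar -cf - -C dir2/results2 . | tar -xf - -C dir1/results1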

Related

Bash script to recursively copy files and folders when subdir is not present

I have lots of projects archived under a directory tree, some of which have a .git folder in them.
What I'd like to do is recursively copy those files and directories to a new destination, keeping the current structure - EXCEPT for those directories containing a .git folder, in which case the script should run a command (let's say "echo", I'll change it later) followed by the folder name, without creating or copying it.
Any help would be much appreciated.
Edit: I'll try to explain myself better: I need to copy every single file and directory, except for those containing .git, which should be skipped and their path should be passed to another command. In this example, path a/b/c/d and its subfolders should be skipped entirely and a/b/c/d should be displayed using echo (just for brevity, I'll replace it with a different command later):
a
a/b
a/b/c
a/b/c/d/.git
a/b/c/d/e
a/b/c/d/f/g
a/b/c/e
a/b/d
a/c
b
b/c
...
IIUC, the following find one-liner will do the job:
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c 'if [ -e "$1/.git" ]; then echo "not copying $1"; else cp -r -- "$1" /tmp/copy-here; fi' sh {} \;
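The one-liner above only looks one level deep. For the fully recursive behaviour the question describes, a rough sketch using find's -prune (assuming GNU find, and keeping the hypothetical /tmp/copy-here destination):
# report every directory that contains a .git folder, without descending into it
find . -type d -exec test -e '{}/.git' \; -print -prune
# copy everything else, pruning those same directories and preserving the tree
find . -type d -exec test -e '{}/.git' \; -prune -o -type f -exec cp --parents -- '{}' /tmp/copy-here \;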

find files unique to different paths BASH

I have a suspicion that a few years ago someone accidentally copied a folder structure from /home/data to /home/something/data. Since then /home/data has had many updates and changes.
What is the easiest way to check if there are any files in /home/something/data unique (by name and location) to that location, to help me confirm if everything in there was a copy from /home/data?
Using diff -r dir1 dir2, you can recursively scan directories for differences in structure and content. Additional flags can tweak the output and behavior to your liking.
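For this particular question, diff's brief mode is handy: files that exist on only one side are reported as "Only in ..." lines, which you can filter:
diff -rq /home/data /home/something/data | grep '^Only in /home/something/data'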
Use rsync in dry-run mode to see if copying /home/something/data into /home/data would actually copy any data.
rsync -r --dry-run /home/something/data/ /home/data/
(The trailing slash on the source matters: without it, rsync would compare against /home/data/data.) If a file under /home/something/data is identical to a file under /home/data, it would not be copied, and rsync --dry-run will not report it.
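One caveat: rsync's default quick check compares only size and modification time. If timestamps might differ between the two trees, --checksum (-c) compares file contents instead, at the cost of reading every file:
rsync -rcnv /home/something/data/ /home/data/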
You may or may not like this approach, it can take a while to scan all files but I generally have a good feeling when I do it.
Go to the top of each directory structure and run a find and get the md5 checksums of each and every file - your switches may vary as I am on OSX
cd /home/data
find . -type f -exec md5 -r {} + > /tmp/a
cd /home/something/data
find . -type f -exec md5 -r {} + > /tmp/b
When they are finished, run the output files through sort and uniq -u to tell you the lines that only appear once (they should all appear twice if the files are the same in both directories):
sort /tmp/a /tmp/b | uniq -u
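If you only care about names and locations rather than contents, comparing sorted file lists with comm is cheaper than checksumming everything (the /tmp/list files here are just hypothetical scratch files):
(cd /home/data && find . -type f | sort) > /tmp/list1
(cd /home/something/data && find . -type f | sort) > /tmp/list2
comm -13 /tmp/list1 /tmp/list2
comm -13 suppresses lines unique to the first list and lines common to both, leaving only the paths that exist solely under /home/something/data.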

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -R, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
for GNU cp and zsh
. is a qualifier for regular files in zsh
cp --parents dir1/file1 dir2 copies file1 to dir2/dir1 in GNU cp
t=/newpath; for d in *searchstring*/; do mkdir -p "$t/$d"; cp "$d"* "$t/$d"; done
find *searchstring*/ -maxdepth 1 -type f -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -maxdepth 2 -ipath '*searchstring*/*' -type f -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entirety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash. But it will fail if one of the lines in sourcelist.txt contains whitespace, or various special characters. So it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync has different behaviour if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while IFS= read -r line; do
    # Remove any trailing slash, just in case
    source=${line%%/}
    # Make sure the source exists before we try to copy it
    if [ -d "$source" ]; then
        rsync -vi -a "$source" /path/to/target/
    fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no depth-limiting option like find's -maxdepth. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d (--dirs) option copies the directories named in sourcelist.txt themselves, without recursing into their contents.
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while IFS= read -r line; do
    if [ -d "$line" ]; then
        # Strip trailing slashes, as before
        source=${line%%/}
        # Grab the directory name from the source path
        target=${source##*/}
        rsync -vi -a "$source/" "/path/to/target/$target/"
    fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
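The general rule, with illustrative paths:
rsync -a src dest/     # creates dest/src/...
rsync -a src/ dest/    # copies the contents of src directly into dest/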
Does all this make sense? Does it match your requirements?
You can use find's -ipath option:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
(As pointed out in the comments, the following solution has security issues for specially crafted names)
Combining both, you could use this:
find . -maxdepth 1 -type d -iname '*searchString*' -exec sh -c "mkdir -p '/newPath/{}'; cp {}/* '/newPath/{}/' 2>/dev/null" ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.

Unix tar: do not preserve full pathnames

When I try to compress files and directories with tar using absolute paths, the absolute path is preserved in the resulting compressed file. I need to use absolute paths to tell tar where the folder I wish to compress is located, but I only want it to compress that folder – not the whole path.
For example, tar -cvzf test.tar.gz /home/path/test – where I want to compress the folder test. However, what I actually end up compressing is /home/path/test. Is there anything that can be done to avoid this? I have tried playing with the -C option to no avail.
This is ugly... but it works...
I had this same problem, but with multiple folders; I just wanted to flatten all the files out. You can use the --transform option to pass a sed expression, and... it works as expected.
This is the expression:
's/.*\///g' (deletes everything up to and including the last '/')
This is the final command:
tar --transform 's/.*\///g' -zcvf tarballName.tgz */*/*.info
Use -C to specify the directory from which the files look like you want, and then specify the files as seen from that directory:
tar -cvzf test.tar.gz -C /home/path test
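To confirm what was stored, list the archive; the entries should start with test/ rather than home/path/test/:
tar -tzf test.tar.gz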
A multi-directory example:
tar cvzf my.tar.gz example.zip -C dir1 files_under_dir1 -C dir2 files_under_dir2
The files under dir1 and dir2 are stored without their leading paths.
tar can perform transformations on the filenames on the way in and out of the archive. Consider this example that stores a bunch of files in a flat tarfile:
Run from your home directory (~/):
find logger -name \*.sh | tar -cvf test.tar -T - --xform='s|^.*/||' --show-transformed
The -T - option tells tar to read a list of files from stdin; the --xform='s|^.*/||' applies the sed expression to all the filenames after they are read and before they are stored. --show-transformed is just a nicety to show you the file names after they are transformed; the default is to show the names as they are read.
There are no doubt other ways besides using find to specify files to archive. For instance, if you have globstar set in bash (shopt -s globstar), you can use ** patterns to wildcard any number of directories, shortening the previous to this:
tar -cvf test.tar --xform='s|^.*/||' --show-transformed logger/**/*.sh
You’ll have to judge what is best for your situation given the files you’re after.
find -type f | tar --transform 's/.*\///g' -zcvf comp.tar.gz -T -
Where find -type f finds all the files in the directory tree and using tar with --transform compresses them without including the folder structure. This is very useful if you want to compress only the files that are the result of a certain search or the files of a specific type like:
find -type f -name "*.txt" | tar --transform 's/.*\///g' -zcvf comp.tar.gz -T -
Unlike the other answers, you don't have to include */*/* specifying the depth of the directory. find handles that for you.

Making archive from files with same names in different directories

I have some files with same names but under different directories. For example, path1/filea, path1/fileb, path2/filea, path2/fileb,....
What is the best way to make the files into an archive? Under these directories there are many other files that I don't want in the archive. Off the top of my head, I think of using Bash, probably ar, tar and other commands, but am not sure how exactly to do it.
Renaming the files seems to make the file names a little complicated, so I would rather keep the directory structure inside the archive. Or I might be wrong. Other ideas are welcome!
Thanks and regards!
EDIT:
Examples would be really nice!
You can use tar with the --exclude PATTERN option; see the man page for details and examples of excluding files.
You may give the find command multiple directories to search through.
# example: create archive of .tex files
find -x LaTeX-files1 LaTeX-files2 -name "*.tex" -print0 | tar --null --no-recursion -uf LaTeXfiles.tar --files-from -
To recursively copy only files with filename "filea" or "fileb" from /path/to/source to /path/to/archive, you could use:
rsync -avm --include='file[ab]' -f 'hide,! */' /path/to/source/ /path/to/archive/
'*/' is a pattern which matches 'any directory'
'! */' matches anything which is not a directory (i.e. a file)
'hide,! */' means hide all files
Filter rules are applied in order, and the first rule that matches is applied.
--include='file[ab]' has precedence, so if a file matches 'file[ab]', it is included.
Any other file gets excluded from the list of files to transfer.
Another alternative is to use the find...exec pattern:
mkdir /path/to/archive
cd /path/to/source
find . -type f -iname "file[ab]" -exec cp --parents '{}' /path/to/archive ";"
What I have used to make a tarball of files with the same name in different directories is
find <path> -name <filename> -exec tar -rvf data.tar '{}' \;
(-r / --append appends the named files to the archive).
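One caveat: -r cannot append to a compressed archive, so build the plain data.tar first and compress it afterwards if needed:
gzip data.tar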
Hope this helps.