cp -r * except don't copy any .pdf files - copy a directory subtree while excluding files with a given extension - bash

Editor's note: In the original form of the question the aspect of copying an entire subtree was not readily obvious.
How do I copy all the files from one directory subtree to another but omit all files of one type?
Does bash handle regex?
Something like: cp -r !*.pdf /var/www/ .?
EDIT 1
I have a find expression: find /var/www/ -not -iname "*.pdf"
This lists all the files that I want to copy. How do I pipe this to a copy command?
EDIT 2
This works so long as the argument list is not too long:
sudo cp `find /var/www/ -not -iname "*.pdf"` .
EDIT 3
One issue though is that I am running into issues with losing the directory structure.

Bash can't help here, unfortunately.
Many people use either tar or rsync for this type of task because each of them is capable of recursively copying files, and each provides an --exclude argument for excluding certain filename patterns. tar is more likely to be installed on a given machine, so I'll show you that.
Assuming you are currently in the destination directory, the shell command:
tar -cC /var/www . | tar -x
will copy all files from /var/www into the current directory recursively.
To filter out the PDF files, use:
tar -cC /var/www --exclude '*.pdf' . | tar -x
Multiple --exclude arguments can be given, so:
tar -cC /var/www --exclude '*.pdf' --exclude '*.txt' . | tar -x
would exclude .txt files as well.
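For comparison, if rsync happens to be available, the same copy-with-exclusions can be done in one pass; a minimal sketch, assuming the current directory is the destination (the trailing slash on the source makes rsync copy its contents rather than the directory itself):
rsync -a --exclude='*.pdf' --exclude='*.txt' /var/www/ .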

K. A. Buhr's helpful answer is a concise solution that reflects the intent well and is easily extensible if multiple extensions should be excluded.
Trying to do it with POSIX utilities and POSIX-compliant options alone requires a slightly different approach:
cp -pR /var/www/. . && find . -name '*.pdf' -exec rm {} +
In other words: copy the whole subtree first, then remove all *.pdf files from the destination subtree.
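If several extensions need to be excluded, the cleanup step extends naturally with find's POSIX -o (OR) operator; for instance, a sketch that also removes *.txt files:
find . \( -name '*.pdf' -o -name '*.txt' \) -exec rm {} +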
Note:
-p preserves the original files' attributes in terms of file timestamps, ownership, and permission bits (tar appears to do that by default); without -p, the copies will be owned by the current user and receive new timestamps (though the permission bits are preserved).
Using cp has one advantage over tar: you get more control over how symlinks among the source files are handled, via the -H, -L, and -P options - see the POSIX spec. for cp and the short sketch after these notes.
tar invariably seems to copy symlinks as-is.
-R supersedes the legacy -r option for cp, as the latter's behavior with non-regular files is ill-defined - see the RATIONALE section in the POSIX spec. for cp
Neither -iname for case-insensitive matching nor -delete is part of the POSIX spec. for find, but both GNU find and BSD/macOS find support them.
Note how source path /var/www/. ends in /. to ensure that its contents are copied to the destination path (as opposed to putting everything into a www subfolder).
With BSD cp, /var/www/ (trailing /) would work too, but GNU cp treats /var/www and /var/www/ the same.
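To illustrate the symlink note above, a minimal sketch using the POSIX cp options (same example paths as before):
cp -pRL /var/www/. .    # -L: follow symlinks, copying the files they point to
cp -pRP /var/www/. .    # -P: copy the symlinks themselves, as symlinks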
As for your questions and solution attempts:
Does bash handle regex?
In the context of filename expansion (globbing), Bash only understands patterns, not regexes (Bash does have the =~ regex-matching operator for string matching inside [[ ... ]] conditionals, however).
As a nonstandard extension, Bash implements the extglob shell option, which adds additional constructs to the pattern-matching notation to allow for more sophisticated matching, such as !(...) for negating matches, which is what you're looking for.
If you combine that with another nonstandard shell option, globstar (**, Bash v4+), you can construct a single pattern that matches all items except a given sub-pattern across an entire subtree:
/var/www/**/!(*.pdf)
does find all non-PDF filesystem items in the subtree of /var/www/.
However, combining that pattern with cp won't work as intended: with -R, any subdirs. are still copied in full; without -R, subdirs. are ignored altogether.
Caveats:
By default, patterns (globs) ignore hidden items unless explicitly matched (* will only match non-hidden items). To include them, set shell option dotglob first.
Matching is case-sensitive by default; turn on shell option nocaseglob to make it case-insensitive.
find /var/www/ -not -iname "*.pdf" in essence yields the same as the extended glob above, except with case-insensitive matching, hidden items invariably included, and the output paths (generally) not in the same order.
However, copying the output paths to their intended destination is the nontrivial part: you'd have to construct analogous subdirs. in the destination dir. on the fly, and you'd have to do so for each input path separately, which will also be quite slow.
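For completeness, here is a rough sketch of that on-the-fly approach, assuming Bash plus GNU or BSD find (run from the destination directory); as noted, it is correct but slow for large trees:
(cd /var/www && find . -type f ! -iname '*.pdf' -print0) |
  while IFS= read -r -d '' f; do
    mkdir -p -- "${f%/*}"          # recreate the relative subdirectory here
    cp -p -- "/var/www/$f" "$f"    # then copy the file itself
  done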
Your own attempt, sudo cp `find /var/www/ -not -iname "*.pdf"` ., falls short in several respects:
As you've discovered yourself, this copies all matching items into a single destination directory.
The output of the command substitution, `...`, is subject to shell expansions, namely word-splitting and filename expansion, which may break the command, notably with filenames with embedded spaces.
Note: As written, all destination items will be owned by the root user.

Edit: As per @mklement0's comment below, these solutions are not suitable for directory tree recursion; they only work on a single directory, as in the original form of the OP's question.
@rorschach: Yes, you can do this.
Using cp:
Set your Bash shell's extglob option and type:
shopt -s extglob #You can set this in your shell startup to enable it by default
cp /var/www/!(*.pdf) .
If you wish to turn off (unset) this (or any other) shell option, use:
shopt -u extglob #or whatever shell option you wish to unset
Using find
If you prefer using find, you can use xargs to execute the operation you would like Bash to perform:
find /var/www/ -maxdepth 1 ! -iname "*.pdf" | xargs -I{} cp {} .
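If any of the names might contain spaces, a null-delimited variant of the same pipeline is safer (a sketch; -print0 and xargs -0 are supported by GNU and BSD find/xargs, though not strictly POSIX):
find /var/www/ -maxdepth 1 ! -iname "*.pdf" -print0 | xargs -0 -I{} cp {} .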

Related

How to delete all files in a dir except ones with a certain pattern in their name?

I have a lot of kernel .deb files from my custom kernels. I would like to write a bash script that deletes all the old files except the ones associated with the currently installed kernel version. My script:
#!/bin/bash
version='uname -r'
$version
dir=~/Installed-kernels
ls | grep -v '$version*' | xargs rm
Unfortunately, this deletes all files in the dir.
How can I get the currently installed kernel version and use it as a parameter? Each .deb I want to keep contains the kernel version (5.18.8) but also has other strings in its name (linux-headers-5.18.8_5.18.8_amd64.deb).
Edit: I am only deleting .deb files inside the noted directory. The current list of file names in the tree are
linux-headers-5.18.8-lz-xan1_5.18.8-lz-1_amd64.deb
linux-libc-dev_5.18.8-lz-1_amd64.deb
linux-image-5.18.8-lz-xan1_5.18.8-lz-1_amd64.deb
This can be done as a one-liner, though I've preserved your variables:
#!/bin/bash
version="$(uname -r)"
dir="$HOME/Installed-kernels"
find "$dir" -maxdepth 1 -type f -not -name "*$version*" -print0 |xargs -0 rm
To set a variable to the output of a command, you need either $(…) or `…`, ideally wrapped in double-quotes to preserve spacing. A tilde isn't always interpreted correctly when passed through variables, so I expanded that out to $HOME.
The find command is much safer to parse than the output of ls, plus it lets you better filter things. In this case, -maxdepth 1 will look at just that directory (no recursion), -type f seeks only files, and -not -name "*$version*" removes paths or filenames that match the kernel version (which is a glob, not a regex—you'd otherwise have to escape the dots). Also note those quotes; we want find to see the asterisks, and without the quotes, the shell will expand the glob prematurely. The -print0 and corresponding -0 ensure that you preserve spacing by delimiting entries with null characters.
You can remove the prompts regarding read-only files with rm -f.
If you also want to delete directories, remove the -type f part and add -r to the end of that final line.
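A sketch of that variant; note the added -mindepth 1, without which the directory named by $dir would itself match the pattern and be removed:
find "$dir" -mindepth 1 -maxdepth 1 -not -name "*$version*" -print0 | xargs -0 rm -r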

Copying a file into multiple directories in bash

I have a file I would like to copy into about 300,000 different directories, these are themselves split between two directories, e.g.
DirA/Dir001/
...
DirB/Dir149000/
However when I try:
cp file.txt */*
It returns:
bash: /bin/cp: Argument list too long
What is the best way of copying a file into multiple directories, when you have too many to use cp?
The answer to the question as asked is find.
find . -mindepth 2 -maxdepth 2 -type d -exec cp script.py {} \;
But of course @triplee is right... why make so many copies of a file?
You could, of course, instead create links to the file...
find . -mindepth 2 -maxdepth 2 -type d -exec ln script.py {} \;
The options -mindepth 2 -maxdepth 2 limit the recursive search of find to elements exactly two levels deep from the current directory (.). The -type d matches all directories. -exec then executes the command (up to the closing \;), for each element found, replacing the {} with the name of the element (the two-levels-deep subdirectory).
The links created are hard links. That means if you edit the script in one place, the change is visible in all the places. The script is, for all intents and purposes, in all the places, with none of them being any less "real" than the others. (This concept can be surprising to those not used to it.) Use ln -s if you instead want to create "soft" links, which are mere references to "the one, true" script.py in the original location.
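One caveat about the soft-link variant: a relative target such as script.py is resolved relative to each subdirectory, so the links would dangle; a sketch that sidesteps this by linking to an absolute path (assuming the script lives in the directory you run this from):
find . -mindepth 2 -maxdepth 2 -type d -exec ln -s "$PWD/script.py" {} \;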
The beauty of find ... -exec ... {}, as opposed to many other ways to do it, is that it will work correctly even for filenames with "funny" characters in them, including but not limited to spaces or newlines.
But still, you should really only need one script. You should fix the part of your project where you need that script in every directory; that is the broken part...
Extrapolating from the answer to your other question you seem to have code which looks something like
for TGZ in $(find . -name "file.tar.gz")
do
    mkdir -p work
    cd work
    tar xzf $TGZ
    python script.py
    cd ..
    rm -rf work
done
Of course, the trivial fix is to replace
python script.py
with
python ../script.py
and voilà, you no longer need a copy of the script in each directory at all.
I would further advise refactoring out the cd and changing script.py so you can pass it the directory to operate on as a command-line argument. (Briefly, import sys and examine the value of sys.argv[1] though you'll often want to have option parsing and support for multiple arguments; argparse from the Python standard library is slightly intimidating, but there are friendly third-party wrappers like click.)
As an aside, many beginners seem to think the location of your executable is going to be the working directory when it executes. This is obviously not the case; or /bin/ls would only list files in /bin.
To get rid of the cd problem mentioned in a comment, a minimal fix is
for tgz in $(find . -name "file.tar.gz")
do
    mkdir -p work
    tar -C work -x -z -f "$tgz"
    (cd work; python ../script.py)
    rm -rf work
done
Again, if you can change the Python script so it doesn't need its input files in the current directory, this can be simplified further. Notice also the preference for lower case for your variables, and the use of quoting around variables which contain file names. The use of find in a command substitution is still slightly broken (it can't work for file names which contain whitespace or shell metacharacters) but maybe that's a topic for a separate question.
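If that robustness matters, the same loop can be driven by find -exec instead of a command substitution; a rough sketch of the identical logic:
find . -name "file.tar.gz" -exec sh -c '
  for tgz in "$@"; do
    mkdir -p work
    tar -C work -x -z -f "$tgz"
    (cd work && python ../script.py)
    rm -rf work
  done
' sh {} +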

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -R, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
for GNU cp and zsh
. is a qualifier for regular files in zsh
cp --parents dir1/file1 dir2 copies file1 to dir2/dir1 in GNU cp
t=/newpath;for d in *searchstring*/;do mkdir -p "$t/$d";cp "$d"* "$t/$d";done
find *searchstring*/ -type f -maxdepth 1 -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -ipath '*searchstring*/*' -type f -maxdepth 2 -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entirety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash. But it will fail if one of the lines in sourcelist.txt contains whitespace, or various special characters. So it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync has different behaviour if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
    # Remove any trailing slash, just in case
    source=${line%%/}
    # make sure the source exists before we try to copy it
    if [ -d "$source" ]; then
        rsync -vi -a "$source" /path/to/target/
    fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no -depth option, as find has. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d option creates the directories that were specified in sourcelist.txt, if they exist.
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
    if [ -d "$line" ]; then
        # Strip trailing slashes, as before
        source=${line%%/}
        # Grab the directory name from the source path
        target=${source##*/}
        rsync -vi -a "$source/" "/path/to/target/$target/"
    fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
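To illustrate the trailing-slash rule with made-up paths:
rsync -a /foo/bar /dest/     # creates /dest/bar/... (the directory itself)
rsync -a /foo/bar/ /dest/    # copies the contents of bar straight into /dest/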
Does all this make sense? Does it match your requirements?
You can use find's -ipath argument:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
(As pointed out in the comments, the following solution has security issues for specially crafted names)
Combining both, you could use this:
find . -maxdepth 1 -type d -iname '*searchString*' -exec sh -c "mkdir -p '/newPath/{}'; cp {}/* '/newPath/{}/' 2>/dev/null" ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.

rsync : Recursively sync all files while ignoring the directory structure

I am trying to create a bash script for syncing music from my desktop to a mobile device. The desktop is the source.
Is there a way to make rsync recursively sync files but ignore the directory structure? If a file was deleted from the desktop, I want it to be deleted on the device as well.
The directory structure on my desktop is something like this.
Artist1/
Artist1/art1_track1.mp3
Artist1/art1_track2.mp3
Artist1/art1_track3.mp3
Artist2/
Artist2/art2_track1.mp3
Artist2/art2_track2.mp3
Artist2/art2_track3.mp3
...
The directory structure that I want on the device is:
Music/
art1_track1.mp3
art1_track2.mp3
art1_track3.mp3
art2_track1.mp3
art2_track2.mp3
art2_track3.mp3
...
Simply:
rsync -a --delete --include='*.mp3' --exclude='*' \
    pathToSongs/Theme*/Artist*/. destuser@desthost:Music/.
would do the job if your path hierarchy has a fixed number of levels.
WARNING: if two song files have exactly the same name, they will end up in the same destination directory and your backup will miss one of them!
Otherwise, and to answer strictly the question of ignoring the directory structure, you could use bash's shopt -s globstar feature:
shopt -s globstar
rsync -a --delete --include='*.mp3' --exclude='*' \
    pathToSongsRoot/**/. destuser@desthost:Music/.
Either way, there is no need to fork a find command.
Recursively sync all files while ignoring the directory structure
To answer the question strictly, the match must not be limited to a single extension:
shopt -s globstar
rsync -d --delete sourceRoot/**/. destuser@desthost:destRoot/.
With this, directories will be copied too, but without content. All files and directories would be stored at the same level in destRoot/.
WARNING: If different files with the same name exist in different directories, they will simply overwrite one another at the destination during the rsync, and only one of them, effectively chosen at random, will survive.
Maybe this is a recent option, but I see --no-relative mentioned in the documentation for --files-from, and it worked great.
find SourceDir -name \*.mp3 | rsync -av --files-from - --no-relative . DestinationDir/
The answer to your question: No, rsync cannot do this alone. But with some help from other tools, we can get there... After a few tries I came up with this:
rsync -d --delete $(find . -type d|while read d ; do echo $d/ ; done) /targetDirectory && rmdir /targetDirectory/* 2>&-
The difficulty is this: To enable deletion of files at the target position, you need to:
specify directories as sources for rsync (it doesn't delete if the source is a list of files).
give it the complete list of sources at once (rsync within a loop will give you the contents of the last directory only at the target).
end the directory names with a slash (otherwise it creates the directories at the target directory)
So the command substitution (the stuff enclosed in the $( )) does this: It finds all directories and adds a slash (/) at the end of the directory names. Now rsync sees a list of source directories, all terminated with a slash, and so copies their contents to the target directory. The option -d tells it not to copy recursively.
The second trick is the rmdir /targetDirectory/* which removes the empty directories which rsync created (although we didn't ask it to do that).
I tested that here, and deletion of files removed in the source tree worked just fine.
If you can make a list of files, you've already solved the problem.
Try:
find /path/to/src/ -name \*.mp3 > list.txt
rsync -avi --no-relative --progress --files-from=list.txt / user@server:/path/to/dest
If you run the script again for new files, it will only copy the missing files.
If you don't like the intermediate list, then try a single command (but the logic is different):
find /path/to/src/ -name \*.mp3 -type f \
-exec rsync -avi --progress {} user@server:/path/to/dest/ \;
In this case, rsync is invoked once per file, every time, since with this form of the command you cannot build the file list beforehand.

How to use the .* wildcard in bash but exclude the parent directory (..)?

There are often times that I want to execute a command on all files (including hidden files) in a directory. When I try using
chmod g+w * .*
it changes the permissions on all the files I want (in the directory) and all the files in the parent directory (that I want left alone).
Is there a wildcard that does the right thing or do I need to start using find?
You will need two glob patterns to cover all the potential “dot files”: .[^.]* and ..?*.
The first matches all directory entries with two or more characters where the first character is a dot and the second character is not a dot. The second picks up entries with three or more characters that start with .. (this excludes .. because it only has two characters and starts with a ., but includes (unlikely) entries like ..foo).
chmod g+w .[^.]* ..?*
This should work well in almost all shells and is suitable for scripts.
For regular interactive use, the patterns may be too difficult to remember. For those cases, your shell might have a more convenient way to skip . and ...
zsh always excludes . and .. from patterns like .*.
With bash, you have to use the GLOBIGNORE shell variable.
# bash
GLOBIGNORE=.:..
echo .*
You might consider setting GLOBIGNORE in one of your bash customization files (e.g. .bash_profile/.bash_login or .bashrc).
Beware, however, becoming accustomed to this customization if you often use other environments.
If you run a command like chmod g+w .* in an environment that is missing your customization, then you will unexpectedly end up including . and .. in your command.
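One way to avoid that trap is to scope the variable to a subshell for a single command, so nothing leaks into the rest of the session; a sketch:
( GLOBIGNORE=.:..; chmod g+w .* )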
Additionally, you can configure the shells to include “dot files” in patterns that do not start with an explicit dot (e.g. *).
# zsh
setopt glob_dots
# bash
shopt -s dotglob
# show all files, even “dot files”
echo *
Usually I would just use . .[a-zA-Z0-9]* since my file names tend to follow certain rules, but that won't catch all possible cases.
You can use:
chmod g+w $(ls -1a | grep -v '^\.\.$')
which will basically list all the files and directories, strip out the parent directory then process the rest. Beware of spaces in file names though, it'll treat them as separate files.
Of course, if you just want to do files, you can use:
find . -maxdepth 1 -type f -exec chmod g+w {} ';'
or, yet another solution, which should do all files and directories except the .. one:
for i in * .* ; do if [[ ${i} != ".." ]] ; then chmod g+w "$i"; fi done
but now you're getting into territory where scripts or aliases may be necessary.
What I did was
tar --directory my_directory --file my_directory.tar --create `ls -A my_directory/`
Works just fine: ls -A my_directory expands to everything in the directory except . and .. No weird globs, and it all fits on a single line.
ps: Perhaps someone will tell me why this is not a good idea. :p
How about:
shopt -s dotglob
chmod g+w ./*
Since you may not want to set dotglob for the rest of your bash session you can set it for a single set of commands by running in a subprocess like so:
$ (shopt -s dotglob; chmod g+w ./*)
If you are sure that two-character hidden file names will never be used, then the simplest option is just to do:
chmod g+w * .??*
