Copying a file into multiple directories in bash

I have a file I would like to copy into about 300,000 different directories; these are themselves split between two directories, e.g.
DirA/Dir001/
...
DirB/Dir149000/
However, when I try:
cp file.txt */*
It returns:
bash: /bin/cp: Argument list too long
What is the best way of copying a file into multiple directories, when you have too many to use cp?

The answer to the question as asked is find.
find . -mindepth 2 -maxdepth 2 -type d -exec cp script.py {} \;
But of course @triplee is right... why make so many copies of a file?
You could, of course, instead create links to the file...
find . -mindepth 2 -maxdepth 2 -type d -exec ln script.py {} \;
The options -mindepth 2 -maxdepth 2 limit the recursive search of find to elements exactly two levels deep from the current directory (.). The -type d matches all directories. -exec then executes the command (up to the closing \;), for each element found, replacing the {} with the name of the element (the two-levels-deep subdirectory).
The links created are hard links. That means that if you edit the script in one place, the change shows up in all the places. The script is, for all intents and purposes, in all the places at once, with none of them being any less "real" than the others. (This concept can be surprising to those not used to it.) Use ln -s if you instead want to create "soft" (symbolic) links, which are mere references to "the one, true" script.py in the original location.
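For symbolic links, keep in mind that a relative target like script.py would be resolved relative to each link's own directory, so an absolute path is the safer choice; a minimal sketch, assuming the script lives in the current directory:
find . -mindepth 2 -maxdepth 2 -type d -exec ln -s "$PWD/script.py" {} \;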
The beauty of find ... -exec ... {}, as opposed to many other ways to do it, is that it will work correctly even for filenames with "funny" characters in them, including but not limited to spaces or newlines.
But still, you should really only need one script. You should fix the part of your project where you need that script in every directory; that is the broken part...

Extrapolating from the answer to your other question you seem to have code which looks something like
for TGZ in $(find . -name "file.tar.gz")
do
mkdir -p work
cd work
tar xzf $TGZ
python script.py
cd ..
rm -rf work
done
Of course, the trivial fix is to replace
python script.py
with
python ../script.py
and voilà, you no longer need a copy of the script in each directory at all.
I would further advise refactoring out the cd and changing script.py so you can pass it the directory to operate on as a command-line argument. (Briefly, import sys and examine the value of sys.argv[1], though you'll often want to have option parsing and support for multiple arguments; argparse from the Python standard library is slightly intimidating, but there are friendly third-party wrappers like click.)
As an aside, many beginners seem to think the location of your executable is going to be the working directory when it executes. This is obviously not the case; or /bin/ls would only list files in /bin.
To get rid of the cd problem mentioned in a comment, a minimal fix is
for tgz in $(find . -name "file.tar.gz")
do
mkdir -p work
tar -C work -x -z -f "$tgz"
(cd work; python ../script.py)
rm -rf work
done
Again, if you can change the Python script so it doesn't need its input files in the current directory, this can be simplified further. Notice also the preference for lower case for your variables, and the use of quoting around variables which contain file names. The use of find in a command substitution is still slightly broken (it can't work for file names which contain whitespace or shell metacharacters) but maybe that's a topic for a separate question.
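If you do want to handle arbitrary file names here as well, a minimal sketch (still assuming the archives are all named file.tar.gz, as above) is to let find emit NUL-delimited paths and read them in a loop:
find . -name "file.tar.gz" -print0 |
while IFS= read -r -d '' tgz
do
    mkdir -p work
    tar -C work -x -z -f "$tgz"
    (cd work; python ../script.py)
    rm -rf work
done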

Related

Check if file is in a folder with a certain name before proceeding

So, I have this simple script which converts videos in a folder into a format which the R4DS can play.
#!/bin/bash
scr='/home/user/dpgv4/dpgv4.py';mkdir -p 'DPG_DS'
find '../Exports' -name "*1080pnornmain.mp4" -exec python3 "$scr" {} \;
The problem is, some of the videos are invalid and won't play, and I've moved those videos to a different directory inside the Exports folder. What I want to do is check to make sure the files are in a folder called new before running the python script on them, preferably within the find command. The path should look something like this:
../Exports/(anything here)/new/*1080pnornmain.mp4
Please note that (anything here) text does not indicate a single directory, it could be something like foo/bar, foo/b/ar, f/o/o/b/a/r, etc.
You cannot use -name because the search is on the path now. My first solution was:
find ./Exports -path '**/new/*1080pnornmain.mp4' -exec python3 "$scr" {} \;
But, as @dan pointed out in the comments, it is wrong because it uses the globstar wildcard (**) unnecessarily:
This checks if /new/ is somewhere in the preceding path, it doesn't have to be a direct parent.
So, the star is not enough here. Another possibility, using find only, could be this one:
find ./Exports -regex '.*/new/[^\/]*1080pnornmain.mp4' -exec python3 "$scr" {} \;
This regex matches:
any number of nested folders before new with .*/new
any character (except / to leave out further subpaths) + your filename with [^\/]*1080pnornmain.mp4
Performance could degrade somewhat, given that it uses regular expressions.
Instead of the -exec option of the find command, you can also pipe the find output to xargs, which can pack many filenames into fewer command invocations. Note that xargs -0 has to be paired with find's -print0 so that filenames are NUL-delimited (and that -I '{}' still runs one command per file, so the main benefit here is safe handling of arbitrary filenames):
find ./Exports -regex '.*/new/[^\/]*1080pnornmain.mp4' -print0 | xargs -0 -I '{}' python3 "$scr" '{}'

Using both command substitution and executing a shell within GNU "find" exec command

I am a bash newbie, and I'm trying to do something that seems fairly straightforward but am having issues.
I am trying to search for a file with a pretty generic but nonunique name (e.g. analysis.uniqueExt, but also maybe sorted_result.uniqueExt) that can be within one specific subdirectory of a directory that was found from a different 'find' query. Then I would like to copy that file to my personal directory whilst also renaming the file to something more descriptive that hints at its original location.
Here is an example of what I have tried:
case=/home/data/ABC_123 # In reality this is coming from a different query successfully
specific_id=ABC_123 # This was extracted from the previous variable
OUTDIR=/my/personal/directory
mkdir -p $OUTDIR/$specific_id
find $case/subfolder/ -type f -name "*.uniqueExt" -exec sh -c 'cp "$1" ${OUTDIR}/${specific_id}/$(basename "$1")' sh {} \;
This doesn't work because OUTDIR and specific_id are not scoped in the inner shell created by the -exec command.
So I tried to do this another way:
find $case/subfolder/ -type f -name "*.uniqueExt" -exec cp {} ${OUT_DIR}/${specific_id}/$(basename {}) \;
However now I cannot extract the basename of the file found in the 'find' query as I have not invoked a shell to do so.
Is there a way I can either properly scope my variables in example #1 or execute the basename function in example #2 to accomplish this? Or maybe there is a totally different solution (possibly involving multiple -exec calls? Or maybe just piping the find results to xargs?).
Thanks for your help!
You need to export the variables since you're using them in a different shell process than the one you assigned them in.
Exporting variables makes them available in descendant processes.
export specific_id=ABC_123 # This was extracted from the previous variable
export OUTDIR=/my/personal/directory
However, you don't really need to use the shell for this. You can use
find $case/subfolder/ -type f -name "*.uniqueExt" -exec cp -t "$OUTDIR/$specific_id/" {} +
You don't have to call basename yourself, because copying a file to a target directory automatically uses the basename as the destination filename.
In my version, I use the -t option so I can put the destination directory first. This allows it to use the + variant to put all the found filenames in a single command, rather than running cp separately for each file.
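If you do want to keep the sh -c approach from the first attempt, another option (a sketch; it assumes the destination directory already exists) is to pass the destination as a positional argument to the inner shell instead of exporting it:
find "$case/subfolder/" -type f -name "*.uniqueExt" -exec sh -c 'cp "$1" "$2/"' sh {} "$OUTDIR/$specific_id" \;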

cp -r * except don't copy any .pdf files - copy a directory subtree while excluding files with a given extension

Editor's note: In the original form of the question the aspect of copying an entire subtree was not readily obvious.
How do I copy all the files from one directory subtree to another but omit all files of one type?
Does bash handle regex?
Something like: cp -r !*.pdf /var/www/ .?
EDIT 1
I have a find expression: find /var/www/ -not -iname "*.pdf"
This lists all the files that I want to copy. How do I pipe this to a copy command?
EDIT 2
This works so long as the argument list is not too long:
sudo cp `find /var/www/ -not -iname "*.pdf"` .
EDIT 3
One issue though is that I am running into issues with losing the directory structure.
Bash can't help here, unfortunately.
Many people use either tar or rsync for this type of task because each of them is capable of recursively copying files, and each provides an --exclude argument for excluding certain filename patterns. tar is more likely to be installed on a given machine, so I'll show you that.
Assuming you are currently in the destination directory, the shell command:
tar -cC /var/www . | tar -x
will copy all files from /var/www into the current directory recursively.
To filter out the PDF files, use:
tar -cC /var/www --exclude '*.pdf' . | tar -x
Multiple --exclude arguments can be given, so:
tar -cC /var/www --exclude '*.pdf' --exclude '*.txt' . | tar -x
would exclude .txt files as well.
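Since rsync was mentioned above as the other common tool for this, here is the equivalent form for comparison (a sketch, assuming rsync is installed; the trailing slash on the source copies its contents rather than the directory itself):
rsync -a --exclude='*.pdf' /var/www/ .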
K. A. Buhr's helpful answer is a concise solution that reflects the intent well and is easily extensible if multiple extensions should be excluded.
Trying to do it with POSIX utilities and POSIX-compliant options alone requires a slightly different approach:
cp -pR /var/www/. . && find . -name '*.pdf' -exec rm {} +
In other words: copy the whole subtree first, then remove all *.pdf files from the destination subtree.
Note:
-p preserves the original files' attributes in terms of file timestamps, ownership, and permission bits (tar appears to do that by default); without -p, the copies will be owned by the current user and receive new timestamps (though the permission bits are preserved).
Using cp has one advantage over tar: you get more control over how symlinks among the source files are handled, via the -H, -L, and -P options - see the POSIX spec. for cp.
tar invariably seems to copy symlinks as-is.
-R supersedes the legacy -r option for cp, as the latter's behavior with non-regular files is ill-defined - see the RATIONALE section in the POSIX spec. for cp
Neither -iname for case-insensitive matching nor -delete are part of the POSIX spec. for find, but both GNU find and BSD/macOS find support them.
Note how source path /var/www/. ends in /. to ensure that its contents are copied to the destination path (as opposed to putting everything into a www subfolder).
With BSD cp, /var/www/ (trailing /) would work too, but GNU cp treats /var/www and /var/www/ the same.
As for your questions and solution attempts:
Does bash handle regex?
In the context of filename expansion (globbing), Bash only understands patterns, not regexes (Bash does have the =~ regex-matching operator for string matching inside [[ ... ]] conditionals, however).
As a nonstandard extension, Bash implements the extglob shell option, which adds additional constructs to the pattern-matching notation to allow for more sophisticated matching, such as !(...) for negating matchings, which is what you're looking for.
If you combine that with another nonstandard shell option, globstar (**, Bash v4+), you can construct a single pattern that matches all items except a given sub-pattern across an entire subtree:
/var/www/**/!(*.pdf)
does find all non-PDF filesystem items in the subtree of /var/www/.
However, combining that pattern with cp won't work as intended: with -R, any subdirs. are still copied in full; without -R, subdirs. are ignored altogether.
Caveats:
By default, patterns (globs) ignore hidden items unless explicitly matched (* will only match non-hidden items). To include them, set shell option dotglob first.
Matching is case-sensitive by default; turn on shell option nocaseglob to make it case-insensitive.
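To see the pattern and the shell options mentioned above in action, something like this should simply list the matches (a sketch; globstar requires Bash 4+):
shopt -s extglob globstar dotglob nocaseglob
printf '%s\n' /var/www/**/!(*.pdf)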
find /var/www/ -not -iname "*.pdf" in essence yields the same as the extended glob above, except with case-insensitive matching, hidden items invariably included, and the output paths (generally) not in the same order.
However, copying the output paths to their intended destination is the nontrivial part: you'd have to construct analogous subdirs. in the destination dir. on the fly, and you'd have to do so for each input path separately, which will also be quite slow.
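For illustration only, a per-file sketch of that approach (slow, and assuming the current directory is the destination) might look like:
find /var/www -type f ! -iname '*.pdf' -exec sh -c '
  for f in "$@"; do
    dest=./"${f#/var/www/}"          # path relative to the source root
    mkdir -p "$(dirname "$dest")"    # recreate the subdirectory structure
    cp -p "$f" "$dest"
  done
' sh {} +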
Your own attempt, sudo cp `find /var/www/ -not -iname "*.pdf"` ., falls short in several respects:
As you've discovered yourself, this copies all matching items into a single destination directory.
The output of the command substitution, `...`, is subject to shell expansions, namely word-splitting and filename expansion, which may break the command, notably with filenames with embedded spaces.
Note: As written, all destination items will be owned by the root user.
Edit: As per @mklement0's comment below, these solutions are not suitable for directory tree recursion; they will only work on one directory, as per the OP's original form of the question.
@rorschach: Yes, you can do this.
Using cp:
Set your Bash shell's extglob option and type:
shopt -s extglob #You can set this in your shell startup to enable it by default
cp /var/www/!(*.pdf) .
If you wish to turn off (unset) this (or any other) shell option, use:
shopt -u extglob #or whatever shell option you wish to unset
Using find
If you prefer using find, you can use xargs to execute the operation you would like Bash to perform:
find /var/www/ -maxdepth 1 ! -iname "*.pdf" | xargs -I{} cp {} .

Copying files with specific size to other directory

It's an interview question. The interviewer asked this "basic" shell scripting question once he understood I don't have experience in shell scripting. Here is the question:
Copy files with a size greater than 500 K from one directory to another directory.
I could do it right away in C, but it seems difficult in a shell script since I've never tried one. I am familiar with basic Unix commands, so I gave it a try, but all I could manage was to extract the relevant file names using the command below.
du -sk * | awk '{ if ($1>500) print $2 }'
Also, let me know a good book of shell scripting examples.
It can be done in several ways. I'd try and use find:
find "$FIRSTDIRECTORY" -size +500k -exec cp {} "$SECONDDIRECTORY" \;
To limit the search to the top-level directory, use the -maxdepth option.
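Put together, with proper quoting, that could look like this (a sketch; it assumes both variables are set and the destination directory exists):
find "$FIRSTDIRECTORY" -maxdepth 1 -type f -size +500k -exec cp {} "$SECONDDIRECTORY" \;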
du recurses into subdirectories, which is probably not desired (you could have asked for clarification if that point was ambiguous). More likely you were expected to use ls -l or ls -s to get the sizes.
But what you did works to select some files and print their names, so let's build on it. You have a command that outputs a list of names. You need to put the output of that command into the command line of a cp. If your du|awk outputs this:
Makefile
foo.c
bar.h
you want to run this:
cp Makefile foo.c bar.h otherdirectory
So how you do that is with COMMAND SUBSTITUTION which is written as $(...) like this:
cd firstdirectory
cp $(du -sk * | awk '{ if ($1>500) print $2 }') otherdirectory
And that's a functioning script. The du|awk command runs first, and its output is used to build the cp command. There are a lot of subtle drawbacks that would make it unsuitable for general use, but that's how beginner-level shell scripts usually are.
find . -mindepth 1 -maxdepth 1 -type f -size +BYTESc -exec cp -t DESTDIR {} +
The c suffix on the size is essential; the size is in bytes. Otherwise, you get probably-unexpected rounding behaviour in determining the result of the -size check. If the copying is meant to be recursive, you will need to take care of creating any destination directory also.
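If you do need the directory structure preserved, GNU cp's --parents option can recreate it under the destination; a sketch assuming GNU cp, and taking "500 K" to mean 500,000 bytes:
find . -type f -size +500000c -exec cp --parents -t DESTDIR {} +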

How can I use terminal to copy and rename files from multiple folders?

I have a folder called "week1", and in that folder there are about ten other folders that all contain multiple files, including one called "submit.pdf". I would like to be able to copy all of the "submit.pdf" files into one folder, ideally using Terminal to expedite the process. I've tried cp week1/*/submit.pdf week1/ as well as cp week1/*/*.pdf week1/, but it only ended up copying one file. I just realized that it has been overwriting each file every time, which is why I'm stuck with one... is there any way I can prevent that from happening?
You don't indicate your OS, but if you're using GNU cp, you can use cp week1/*/submit.pdf --backup=t week1/ to have it (arbitrarily) number files that already exist; but, that won't give you any real way to identify which is which.
You could, perhaps, do something like this:
for file in week1/*/submit.pdf; do cp "$file" "${file//\//-}"; done
… which will produce files named something like "week1-subdir-submit.pdf"
For what it's worth, the "${var/s/r}" notation means to take var, but before inserting its value, search for s (\/, meaning /, escaped because of the other special / in that expression), and replace it with r (-), to make the unique filenames.
Edit: There's actually one more / in there, which makes it replace every instance rather than just the first, so the full syntax is:
"${var//\//-}"
i.e., take var and replace every instance of / with -.
find to the rescue! Rule of thumb: if you can list the files you want with find, you can copy them. So first try this:
$ cd your_folder
$ find . -type f -iname 'submit.pdf'
Some notes:
find . means "start finding from the current directory"
-type f means "only find regular files" (i.e., not directories)
-iname 'submit.pdf' means "... with the case-insensitive name 'submit.pdf'". You don't need the quotes here, but if you want to search using wildcards, you do. E.g.:
~ foo$ find /usr/lib -iname '*.So*'
/usr/lib/pam/pam_deny.so.2
/usr/lib/pam/pam_env.so.2
/usr/lib/pam/pam_group.so.2
...
If you want to search case-sensitive, just use -name instead of -iname.
When this works, you can copy each file by using the -exec command. exec works by letting you specify a command to use on hits. It will run the command for each file find finds, and put the name of the file in {}. You end the sequence of commands by specifying \;.
So to echo all the files, do this:
$ find . -type f -iname submit.pdf -exec echo Found file {} \;
To copy them one by one:
$ find . -type f -iname submit.pdf -exec cp {} /destination/folder \;
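To avoid the overwriting problem from the question (every file is named submit.pdf), you could combine this with the renaming idea from the first answer; a sketch, with /destination/folder as a placeholder:
find . -type f -iname submit.pdf -exec bash -c '
  for f in "$@"; do
    rel=${f#./}                                   # drop the leading ./
    cp "$f" "/destination/folder/${rel//\//-}"    # e.g. week1-subdir-submit.pdf
  done
' bash {} +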
Hope this helps!
