Using both command substitution and executing a shell within GNU "find" exec command - bash

I am a bash newbie, and I'm trying to do something that seems fairly straightforward but am having issues.
I am trying to search for a file with a pretty generic but nonunique name (e.g. analysis.uniqueExt, but also maybe sorted_result.uniqueExt) that can be within one specific subdirectory of a directory that was found from a different 'find' query. Then I would like to copy that file to my personal directory whilst also renaming the file to something more descriptive that hints to its origin location.
Here is an example of what I have tried:
case=/home/data/ABC_123 # In reality this is coming from a different query successfully
specific_id=ABC_123 # This was extracted from the previous variable
OUTDIR=/my/personal/directory
mkdir -p "$OUTDIR/$specific_id"
find "$case/subfolder/" -type f -name "*.uniqueExt" -exec sh -c 'cp "$1" "${OUTDIR}/${specific_id}/$(basename "$1")"' sh {} \;
This doesn't work because OUTDIR and specific_id are not scoped in the inner shell created by the -exec command.
So I tried to do this another way:
find "$case/subfolder/" -type f -name "*.uniqueExt" -exec cp {} "${OUTDIR}/${specific_id}/$(basename {})" \;
However now I cannot extract the basename of the file found in the 'find' query as I have not invoked a shell to do so.
Is there a way I can either properly scope my variables in example #1 or execute the basename function in example #2 to accomplish this? Or maybe there is a totally different solution (possibly involving multiple -exec calls? Or maybe just piping the find results to xargs?).
Thanks for your help!

You need to export the variables since you're using them in a different shell process than the one you assigned them in.
Exporting variables makes them available in descendant processes.
export specific_id=ABC_123 # This was extracted from the previous variable
export OUTDIR=/my/personal/directory
However, you don't really need to use the shell for this. You can use
find "$case/subfolder/" -type f -name "*.uniqueExt" -exec cp -t "$OUTDIR/$specific_id/" {} +
You don't have to call basename yourself, because copying a file to a target directory automatically uses the basename as the destination filename.
In my version, I use the -t option (a GNU cp extension) so I can put the destination directory first. The + variant requires {} to be the last argument, so this lets find pass all the found filenames to a single cp command, rather than running cp separately for each file.
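If a shell is still wanted inside -exec, for instance to rename each copy, an alternative to export is to pass the outer variables as positional parameters. The following is a minimal sketch rather than a definitive solution: the temporary demo tree and the `${id}_` filename prefix are assumptions made for illustration and do not come from the question.

```shell
#!/bin/sh
# Hypothetical demo tree standing in for /home/data/ABC_123 and the personal directory.
demo=$(mktemp -d)
case_dir="$demo/data/ABC_123"
specific_id=ABC_123
outdir="$demo/personal"
mkdir -p "$case_dir/subfolder" "$outdir/$specific_id"
: > "$case_dir/subfolder/analysis.uniqueExt"

# The first two arguments after "sh" become $1 (destination directory) and
# $2 (id used as a filename prefix); the found files follow them.
find "$case_dir/subfolder/" -type f -name '*.uniqueExt' -exec sh -c '
    dest=$1
    id=$2
    shift 2
    for f in "$@"; do
        cp "$f" "$dest/${id}_$(basename "$f")"
    done
' sh "$outdir/$specific_id" "$specific_id" {} +

# Record what was copied, then clean up the demo tree.
copied=$(ls "$outdir/$specific_id")
printf '%s\n' "$copied"
rm -rf "$demo"
```

Because the values travel as arguments, nothing has to be exported and the inline script stays in single quotes, so filenames with spaces or quotes are not re-parsed by the shell.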

Related

What is a good way to move into a directory and then run a command on the file inside it using a bash shell one-liner

I would like to find txt files with the find command, change into the directory of each found file, and then apply a command to the file, all in a bash shell one-liner.
For example, this command works, but acmd is executed in the current directory.
$ find . -name "*.txt" | xargs acmd
I would like to run acmd in the txt file's directory.
Does anyone have a good idea?
From the find man page:
-execdir command ;
-execdir command {} +
Like -exec, but the specified command is run from the subdirectory
containing the matched file, which is not normally the directory in
which you started find. This a much more secure method for invoking
commands, as it avoids race conditions during resolution of the paths
to the matched files. As with the -exec action, the `+' form of
-execdir will build a command line to process more than one matched
file, but any given invocation of command will only list files that
exist in the same subdirectory. If you use this option, you must
ensure that your $PATH environment variable does not reference `.';
otherwise, an attacker can run any commands they like by leaving an
appropriately-named file in a directory in which you will run
-execdir. The same applies to having entries in $PATH which are empty
or which are not absolute directory names. If find encounters an
error, this can sometimes cause an immediate exit, so some pending
commands may not be run at all. The result of the action depends on
whether the + or the ; variant is being used; -execdir command {} +
always returns true, while -execdir command {} ; returns true only if
command returns 0.
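To make the excerpt concrete, here is a small sketch of -execdir in use. It assumes a throwaway directory tree created with mktemp, and it uses pwd as a stand-in for the asker's acmd so that the working directory of each invocation is visible.

```shell
#!/bin/sh
# Throwaway tree with .txt files in two different directories.
demo=$(mktemp -d)
mkdir -p "$demo/docs/guide"
: > "$demo/docs/a.txt"
: > "$demo/docs/guide/b.txt"

# -execdir runs the command from the directory that contains each match,
# so pwd reports a different directory for a.txt and b.txt.
report=$(cd "$demo" && find . -name '*.txt' \
    -execdir sh -c 'printf "%s in %s\n" "$1" "$(pwd)"' sh {} \; | sort)
printf '%s\n' "$report"
rm -rf "$demo"
```

As the man page warns, GNU find refuses to run -execdir when $PATH contains `.`, an empty entry, or a relative directory, so the sketch only works with a clean $PATH.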
Just for completeness, the other option would be to do:
$ find . -name '*.txt' -print0 | xargs -0 -I{} sh -c 'echo "for file $(basename "$1"), the directory is $(dirname "$1")"' sh {}
for file schedutil.txt, the directory is ./Documentation/scheduler
for file devices.txt, the directory is ./Documentation/admin-guide
for file kernel-parameters.txt, the directory is ./Documentation/admin-guide
for file gdbmacros.txt, the directory is ./Documentation/admin-guide/kdump
...
i.e. have xargs "defer to a shell". In use cases where -execdir suffices, go for it.

Check if file is in a folder with a certain name before proceeding

So, I have this simple script which converts videos in a folder into a format which the R4DS can play.
#!/bin/bash
scr='/home/user/dpgv4/dpgv4.py';mkdir -p 'DPG_DS'
find '../Exports' -name "*1080pnornmain.mp4" -exec python3 "$scr" {} \;
The problem is, some of the videos are invalid and won't play, and I've moved those videos to a different directory inside the Exports folder. What I want to do is check to make sure the files are in a folder called new before running the python script on them, preferably within the find command. The path should look something like this:
../Exports/(anything here)/new/*1080pnornmain.mp4
Please note that (anything here) text does not indicate a single directory, it could be something like foo/bar, foo/b/ar, f/o/o/b/a/r, etc.
You cannot use -name because the search is on the path now. My first solution was:
find ./Exports -path '**/new/*1080pnornmain.mp4' -exec python3 "$scr" {} \;
But, as @dan pointed out in the comments, it is wrong. The globstar wildcard (**) is unnecessary, and because a wildcard in -path also matches /, the pattern accepts too much:
This checks if /new/ is somewhere in the preceding path, it doesn't have to be a direct parent.
So, a plain wildcard is not enough here. Another possibility, using find only, could be this one:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -exec python3 "$scr" {} \;
This regex matches:
any number of nested folders before new with .*/new
any character (except / to leave out further subpaths) + your filename with [^/]*1080pnornmain\.mp4
Performance may degrade somewhat, given that it uses regular expressions.
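Alternatively, instead of using the -exec option of the find command, you can pipe find's output to xargs, which spawns fewer processes when it can batch many files into one command. With -I it still runs the command once per file, so the gain here is small. Pair -print0 with xargs -0 so that unusual filenames survive the pipe:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -print0 | xargs -0 -I '{}' python3 "$scr" '{}'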
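The difference between the two tests can be checked on a small throwaway tree. This is only a sketch: the directory and file names below (foo, bar, rejected, clip_, old_) are made up for the demonstration.

```shell
#!/bin/sh
# One video directly inside a "new" folder, one in a subfolder of "new".
demo=$(mktemp -d)
mkdir -p "$demo/Exports/foo/bar/new" "$demo/Exports/foo/new/rejected"
: > "$demo/Exports/foo/bar/new/clip_1080pnornmain.mp4"
: > "$demo/Exports/foo/new/rejected/old_1080pnornmain.mp4"

# In -path, * also matches /, so files deeper below new/ are accepted too.
by_path=$(cd "$demo" && find ./Exports -path '*/new/*1080pnornmain.mp4' | sort)

# [^/]* cannot cross a directory boundary, so new/ must be the direct parent.
by_regex=$(cd "$demo" && find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' | sort)

echo "-path matches:"
printf '%s\n' "$by_path"
echo "-regex matches:"
printf '%s\n' "$by_regex"
rm -rf "$demo"
```

Running the tests with the default -print action first, as here, is a cheap way to confirm the selection before attaching -exec python3 "$scr" {} \; to it.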

Copying a file into multiple directories in bash

I have a file I would like to copy into about 300,000 different directories, these are themselves split between two directories, e.g.
DirA/Dir001/
...
DirB/Dir149000/
However when I try:
cp file.txt */*
It returns:
bash: /bin/cp: Argument list too long
What is the best way of copying a file into multiple directories, when you have too many to use cp?
The answer to the question as asked is find.
find . -mindepth 2 -maxdepth 2 -type d -exec cp script.py {} \;
But of course @triplee is right... why make so many copies of a file?
You could, of course, instead create links to the file...
find . -mindepth 2 -maxdepth 2 -type d -exec ln script.py {} \;
The options -mindepth 2 -maxdepth 2 limit the recursive search of find to elements exactly two levels deep from the current directory (.). The -type d matches all directories. -exec then executes the command (up to the closing \;), for each element found, replacing the {} with the name of the element (the two-levels-deep subdirectory).
The links created are hard links. That means that if you edit the script in place in one location, the change shows up in all locations. The script is, for all intents and purposes, in all the places, with none of them being any less "real" than the others. (This concept can be surprising to those not used to it.) Use ln -s if you instead want to create "soft" links, which are mere references to "the one, true" script.py in the original location.
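A quick way to see both kinds of link behave is the sketch below. It assumes a temporary directory and a single hypothetical target directory DirA/Dir001; the name script_link.py is made up so that both links can live side by side.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir -p "$demo/DirA/Dir001"
echo 'print("v1")' > "$demo/script.py"

# Hard link: a second name for the same file contents.
ln "$demo/script.py" "$demo/DirA/Dir001/script.py"
# Soft link: a reference to the original path (absolute, so it resolves from anywhere).
ln -s "$demo/script.py" "$demo/DirA/Dir001/script_link.py"

# Append to the original in place; both links should see the new line.
echo 'print("v2")' >> "$demo/script.py"

hard_lines=$(grep -c print "$demo/DirA/Dir001/script.py")
soft_lines=$(grep -c print "$demo/DirA/Dir001/script_link.py")
echo "hard link sees $hard_lines lines, soft link sees $soft_lines lines"
rm -rf "$demo"
```

One caveat: tools that replace a file instead of writing into it (GNU sed -i is one example) create a new file under the old name, so existing hard links keep the previous contents.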
The beauty of find ... -exec ... {}, as opposed to many other ways to do it, is that it will work correctly even for filenames with "funny" characters in them, including but not limited to spaces or newlines.
But still, you should really only need one script. You should fix the part of your project where you need that script in every directory; that is the broken part...
Extrapolating from the answer to your other question you seem to have code which looks something like
for TGZ in $(find . -name "file.tar.gz")
do
    mkdir -p work
    cd work
    tar xzf $TGZ
    python script.py
    cd ..
    rm -rf work
done
Of course, the trivial fix is to replace
python script.py
with
python ../script.py
and voilà, you no longer need a copy of the script in each directory at all.
I would further advise refactoring out the cd and changing script.py so you can pass it the directory to operate on as a command-line argument. (Briefly, import sys and examine the value of sys.argv[1], though you'll often want to have option parsing and support for multiple arguments; argparse from the Python standard library is slightly intimidating, but there are friendly third-party wrappers like click.)
As an aside, many beginners seem to think the location of your executable is going to be the working directory when it executes. This is obviously not the case; otherwise /bin/ls would only list files in /bin.
To get rid of the cd problem mentioned in a comment, a minimal fix is
for tgz in $(find . -name "file.tar.gz")
do
    mkdir -p work
    tar -C work -x -z -f "$tgz"
    (cd work && python ../script.py)
    rm -rf work
done
Again, if you can change the Python script so it doesn't need its input files in the current directory, this can be simplified further. Notice also the preference for lower case for your variables, and the use of quoting around variables which contain file names. The use of find in a command substitution is still slightly broken (it can't work for file names which contain whitespace or shell metacharacters) but maybe that's a topic for a separate question.
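For the whitespace problem mentioned last, one common pattern is to let find hand the paths to an inline shell loop instead of capturing them with a command substitution. The sketch below only assumes a temporary tree with spaces in the directory names, and it uses echo as a placeholder for the tar and python steps.

```shell
#!/bin/sh
# Demo tree whose directory names contain spaces.
demo=$(mktemp -d)
mkdir -p "$demo/run one" "$demo/run two"
: > "$demo/run one/file.tar.gz"
: > "$demo/run two/file.tar.gz"

# find passes each path as a separate argument, so "$@" keeps them intact.
# echo stands in for the real per-archive work.
seen=$(find "$demo" -name 'file.tar.gz' -exec sh -c '
    for tgz in "$@"; do
        echo "would process: $tgz"
    done
' sh {} + | sort)
printf '%s\n' "$seen"
rm -rf "$demo"
```

The paths never pass through word splitting, so names with spaces, newlines, or glob characters reach the loop unchanged.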

How to copy and rename all .yml.sample files to be .yml in Linux?

In bash I want to copy all .yml.sample files in a Git repository (recursively) and rename them to just have a .yml extension.
Eg. test.yml.sample would be copied to test.yml
Here’s as close as I’ve got, but I'm not clear on how to strip .sample off the end of the file name when I copy.
find . -depth -name "*.yml.sample" -exec sh -c 'cp "$1" "${1%/.sample/}"' _ {} \;
This should work:
find . -depth -name "*.yml.sample" -exec sh -c 'cp -p "$1" "${1%.yml.sample}.yml"' _ {} \;
The -name "*.yml.sample" test selects the files. For each one, -exec runs cp with the found path available as $1; the destination is ${1%.yml.sample}.yml, which strips the .yml.sample suffix from the path and appends .yml as the new extension.
Note I also added the -p option to preserve the attributes from the source file on the copied file. You might not need that, but I think it can be helpful when doing copies like this.
And—since this shell logic can be confusing—in terms of the _ {} \;, it breaks down as this:
_ {}: As explained in this answer on the Unix/Linux Stack Exchange site, “The way this works is bash takes the parameters after -c as arguments, _ {} is needed so that the contents of {} is assigned to $1 not $0.”
\;: The ; marks the end of the -exec command for find. If you wrote it unescaped, as _ {} ;, the shell you typed the command into would treat the ; as its own command separator and find would never see it. Escaping it as \; (or quoting it as ';') passes a literal ; through to find, which then knows where the -exec command ends and runs it once for each matched file. Read up on -exec command ; here.
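The same idea can be tried safely on a scratch directory. This sketch makes a few assumptions of its own: the tree is created with mktemp, the file names are invented, it uses the shorter ${src%.sample} (which leaves the .yml part in place), and it uses the {} + form so one shell handles many files.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir -p "$demo/config/nested dir"
: > "$demo/config/test.yml.sample"
: > "$demo/config/nested dir/db.yml.sample"

# "_" fills $0, so the found paths start at $1 and are all available in "$@".
find "$demo" -name '*.yml.sample' -exec sh -c '
    for src in "$@"; do
        # ${src%.sample} removes the trailing ".sample" from the path.
        cp -p "$src" "${src%.sample}"
    done
' _ {} +

# List the copies that were created, relative to the demo directory.
copies=$(cd "$demo" && find . -name '*.yml' | sort)
printf '%s\n' "$copies"
rm -rf "$demo"
```

The % operator removes the shortest matching suffix, so only the final .sample is dropped and the rest of the path, including the directory part, is left untouched.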
I think you can use a tool like mmv, to mass rename all the files you need.
mmv \*.yml.sample \#1.yml
The above line should work... just make sure to test it first. Hope this helps!
Edit: If you want to copy and rename, all in one step, you can use the -c flag. That will preserve the original file, and will make a copy using the rename mask.
mmv -c \*.yml.sample \#1.yml

Bash find: changing matched name for use in -exec

I'm writing a deploy script, and I need to run a less compiler against all .less files in a directory. This is easy to do with the following find command:
find -name "*.less" -exec plessc {} {}.css \;
After running this command on a folder with a file named main.less, I'm left with a file named main.less.css, but I want it to be main.css.
I know I can easily strip the .less portion of the resulting files with this command: rename 's/\.less//' *.css but I'm hoping to learn something new about using -exec.
Is it possible to modify the name of the file that matches while using it in the -exec parameter?
Thanks!
Your find command is using a couple of non-standard GNU extensions:
You do not state where to search; this is an error in POSIX, but GNU find selects the current directory in that case.
You use a non-isolated {} (as part of {}.css); POSIX does not require find to expand it in that case.
Here is a one liner that should work with most find implementations and fix your double extension issue:
find . -name "*.less" -exec sh -c 'plessc "$0" "$(dirname "$0")/$(basename "$0" less)css"' {} \;
On Solaris 10 and older, sh -c should be replaced by ksh -c if the PATH isn't POSIX compliant.
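The name manipulation in that one-liner can be checked in isolation, without plessc installed. In this sketch css_name is a hypothetical helper introduced only for the demonstration, and the input paths are made up.

```shell
#!/bin/sh
# Rebuild the output name as the one-liner does:
# directory part + basename with the "less" suffix removed + "css".
css_name() {
    printf '%s/%scss\n' "$(dirname "$1")" "$(basename "$1" less)"
}

out1=$(css_name ./styles/main.less)
out2=$(css_name "./my styles/site.less")
printf '%s\n' "$out1" "$out2"
```

The second argument to basename is a suffix to strip, which is why main.less becomes "main." and appending css yields main.css.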
No, it is not possible to do it directly. You can only use {} to directly insert the full filename. However, in exec, you COULD put in other things like awk. Or you can redirect output to another program via pipes.
From the find man page:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until
an argument consisting of `;' is encountered. The string `{}'
is replaced by the current file name being processed everywhere
it occurs in the arguments to the command, not just in arguments
where it is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or quoted to
protect them from expansion by the shell. See the EXAMPLES
section for examples of the use of the -exec option. The
specified command is run once for each matched file. The command
is executed in the starting directory. There are unavoidable
security problems surrounding use of the -exec action; you
should use the -execdir option instead.

Resources