Check if file is in a folder with a certain name before proceeding - bash

So, I have this simple script which converts videos in a folder into a format which the R4DS can play.
#!/bin/bash
scr='/home/user/dpgv4/dpgv4.py';mkdir -p 'DPG_DS'
find '../Exports' -name "*1080pnornmain.mp4" -exec python3 "$scr" {} \;
The problem is, some of the videos are invalid and won't play, and I've moved those videos to a different directory inside the Exports folder. What I want to do is check to make sure the files are in a folder called new before running the python script on them, preferably within the find command. The path should look something like this:
../Exports/(anything here)/new/*1080pnornmain.mp4
Please note that (anything here) text does not indicate a single directory, it could be something like foo/bar, foo/b/ar, f/o/o/b/a/r, etc.

You cannot use -name, because -name only tests the file's base name and the search is on the path now. My first solution was:
find ./Exports -path '**/new/*1080pnornmain.mp4' -exec python3 "$scr" {} \;
But, as @dan pointed out in the comments, it is wrong: find's -path has no globstar, so ** behaves like a single *, and * in -path also matches /:
This checks if /new/ is somewhere in the preceding path, it doesn't have to be a direct parent.
So, a glob wildcard is not enough here. Another possibility, using find only, could be this one:
find ./Exports -regex '.*/new/[^\/]*1080pnornmain.mp4' -exec python3 "$scr" {} \;
This regex matches:
any number of nested folders before new with .*/new
any character (except / to leave out further subpaths) + your filename with [^\/]*1080pnornmain.mp4
Performance may degrade because it uses regular expressions.
Instead of using the -exec option of the find command, you can also pass the output of find to xargs. Use -print0 together with xargs -0 so that file names containing spaces or newlines are passed through intact (note that with -I, xargs still starts one process per file, so this is no faster than -exec ... \;):
find ./Exports -regex '.*/new/[^\/]*1080pnornmain.mp4' -print0 | xargs -0 -I '{}' python3 "$scr" '{}'
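To check which paths the regex selects before running the converter on them, a minimal sketch on a throwaway tree could look like this. The directory and file names are made up for illustration, the dot before mp4 is escaped here, and the matches are only printed rather than passed to the Python script.

```bash
# Build a scratch tree with hypothetical file names.
tmp=$(mktemp -d)
mkdir -p "$tmp/Exports/a/b/new/deeper" "$tmp/Exports/c"
touch "$tmp/Exports/a/b/new/clip1080pnornmain.mp4" \
      "$tmp/Exports/a/b/new/deeper/clip1080pnornmain.mp4" \
      "$tmp/Exports/c/clip1080pnornmain.mp4"

# [^/]* cannot cross a slash, so only direct children of "new" match.
matches=$(cd "$tmp" && find ./Exports -type f -regex '.*/new/[^/]*1080pnornmain\.mp4')
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Only the file directly inside new is printed; the copy under new/deeper and the one under c are skipped.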

Related

Use Find and xargs to delete dups in arraylist

I have an arraylist of files and I am trying to use rm with xargs to remove them, like:
dups=["test.csv","man.csv","teams.csv"]
How can I pass the complete dups array to find and delete these files?
I want to make changes below to make it work
find ${dups[@]} -type f -print0 | xargs -0 rm
Your find command is wrong.
# XXX buggy: read below
find foo bar baz -type f -print0
means look in the paths foo, bar, and baz, and print any actual files within those. (If one of the paths is a directory, it will find all files within that directory. If one of the paths is a file in the current directory, it will certainly find it, but then what do you need find for?)
If these are files in the current directory, simply
rm -- "${dups[@]}"
(notice also how to properly quote the array expansion).
If you want to look in all subdirectories for files with these names, you will need something like
find . -type f \( -name "test.csv" -o -name "man.csv" -o -name "teams.csv" \) -delete
or perhaps
find . -type f -regextype egrep -regex '.*/(test\.csv|man\.csv|teams\.csv)' -delete
though the -regex features are somewhat platform-dependent (try find -E instead of find -regextype egrep on *BSD/MacOS to enable ERE regex support).
Notice also how find has a built-in predicate -delete so you don't need the external utility rm at all. (Though if you wanted to run a different utility, find -exec utility {} + is still more efficient than xargs. Some really old find implementations didn't have the + syntax for -exec but you seem to be on Linux where it is widely supported.)
Building this command line from an array is not entirely trivial; I have proposed a duplicate which has a solution to a similar problem. But of course, if you are building the command from Java, it should be easy to figure out how to do this on the Java side instead of passing in an array to Bash; and then, you don't need Bash at all (you can pass this to find directly, or at least use sh instead of bash because the command doesn't require any Bash features).
I'm not a Java person, but from Python this would look like
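import subprocess

# Example input; in practice this list comes from the caller.
dups = ["test.csv", "man.csv", "teams.csv"]

command = ["find", ".", "-type", "f"]
prefix = "("
for filename in dups:
    # The first test is preceded by "(", every later one by "-o".
    command.extend([prefix, "-name", filename])
    prefix = "-o"
command.extend([")", "-delete"])
subprocess.run(command, check=True, encoding="utf-8")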
Notice how the backslashes and quotes are not necessary when there is no shell involved.
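If the list does live in a Bash array, the same construction can be sketched in Bash by collecting the find tests in a second array. This is only an illustration: it assumes dups is non-empty, works on a scratch directory with made-up files, and uses -print where the real command would use -delete.

```bash
dups=("test.csv" "man.csv" "teams.csv")

# Collect: -name test.csv -o -name man.csv -o -name teams.csv
args=()
for name in "${dups[@]}"; do
    if [ "${#args[@]}" -gt 0 ]; then
        args+=(-o)
    fi
    args+=(-name "$name")
done

# Scratch tree so nothing real is touched.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/test.csv" "$tmp/sub/man.csv" "$tmp/keep.csv"

# Swap -print for -delete once the listed files are the intended ones.
found=$(cd "$tmp" && find . -type f \( "${args[@]}" \) -print | sort)
printf '%s\n' "$found"

rm -rf "$tmp"
```

Quoting "${args[@]}" keeps each file name a single argument, even if it contains spaces.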

Find Command Exclude Hidden files when using empty flag

I am looking for a way to use the find command to tell if a folder has no files in it. I have tried using the -empty flag, but since I am on macOS the system files the OS places in the directory such as .DS_Store cause find to not consider the directory empty. I have tried telling find to ignore .DS_Store but it still considers the directory not empty because that file is present.
Is there a way to have find exclude certain files from what it considers -empty? Also is there a way to have find return a list of directories with no visible files?
The -empty predicate is rather simple: it is true for a directory only if it has no entries other than . and .. (so a lone .DS_Store is enough to make it false).
Kind of an ugly solution, but you can use -exec to run another find in each directory which will implement your criteria for deciding what directories you want to include.
Below:
the outer find will execute sh -c for each directory in /starting/point
sh will execute another find with different criteria.
the inner find will print the first match and then quit
read will consume the output (if any) of the inner find. read will have an exit status of 0 only if the inner find printed at least one line, non-zero otherwise
if there was no output from the inner find, the outer find's -exec predicate will evaluate to false
since -exec is followed by -o, the following -print action will be executed only for those directories which do not match the inner find's criteria
find /starting/point \
-type d \( \
-exec sh -c \
'find "$1" -mindepth 1 -maxdepth 1 ! -name ".*" -print -quit | read' \
sh {} \; \
-o -print \
\)
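A variation on the same idea, sketched here under the assumption that "visible" simply means "name does not start with a dot", lets a single sh handle many directories via -exec ... {} + and uses a glob instead of an inner find. The directory names are hypothetical and the tree is a scratch one.

```bash
tmp=$(mktemp -d)
mkdir -p "$tmp/only_hidden" "$tmp/has_file" "$tmp/truly_empty"
touch "$tmp/only_hidden/.DS_Store" "$tmp/has_file/notes.txt"

result=$(
    cd "$tmp" && find . -type d -exec sh -c '
        for dir in "$@"; do
            visible=no
            for entry in "$dir"/*; do
                # An unmatched glob stays literal, so test that the entry exists.
                if [ -e "$entry" ] || [ -L "$entry" ]; then
                    visible=yes
                    break
                fi
            done
            if [ "$visible" = no ]; then
                printf "%s\n" "$dir"
            fi
        done
    ' sh {} + | sort
)
printf '%s\n' "$result"

rm -rf "$tmp"
```

The default glob * skips dot files, which is what makes a directory holding only .DS_Store count as having no visible entries.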
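Also note that 'find FOLDER -empty' is somewhat tricky: -empty also matches empty regular files, so the command prints every empty file and empty subdirectory inside FOLDER rather than telling you whether FOLDER itself is empty. Add -maxdepth 0 -type d to test only FOLDER.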
Maybe not exactly what was asked, but I prefer the brute force approach if I want to avoid a no-match error on using FOLDER/*. In tcsh:
ls -d FOLDER/* >& /dev/null
if !($status) COMMANDS FOLDER/* ...
A variation of this might be usable here (like also using
ls -d FOLDER/.* | wc -l
and drawing the desired conclusions from the combined results).

Using both command substitution and executing a shell within GNU "find" exec command

I am a bash newbie, and I'm trying to do something that seems fairly straightforward but am having issues.
I am trying to search for a file with a pretty generic but nonunique name (e.g. analysis.uniqueExt, but also maybe sorted_result.uniqueExt) that can be within one specific subdirectory of a directory that was found from a different 'find' query. Then I would like to copy that file to my personal directory whilst also renaming the file to something more descriptive that hints to its origin location.
Here is an example of what I have tried:
case=/home/data/ABC_123 # In reality this is coming from a different query successfully
specific_id=ABC_123 # This was extracted from the previous variable
OUTDIR=/my/personal/directory
mkdir -p $OUTDIR/$specific_id
find $case/subfolder/ -type f -name "*.uniqueExt" -exec sh -c 'cp "$1" ${OUTDIR}/${specific_id}/$(basename "$1")' sh {} \;
This doesn't work because OUTDIR and specific_id are not scoped in the inner shell created by the -exec command.
So I tried to do this another way:
find $case/subfolder/ -type f -name "*.uniqueExt" -exec cp {} ${OUTDIR}/${specific_id}/$(basename {}) \;
However now I cannot extract the basename of the file found in the 'find' query as I have not invoked a shell to do so.
Is there a way I can either properly scope my variables in example #1 or execute the basename function in example #2 to accomplish this? Or maybe there is a totally different solution (possibly involving multiple -exec calls? Or maybe just piping the find results to xargs?).
Thanks for your help!
You need to export the variables since you're using them in a different shell process than the one you assigned them in.
Exporting variables makes them available in descendant processes.
export specific_id=ABC_123 # This was extracted from the previous variable
export OUTDIR=/my/personal/directory
However, you don't really need to use the shell for this. You can use
find $case/subfolder/ -type f -name "*.uniqueExt" -exec cp -t "$OUTDIR/$specific_id/" {} +
You don't have to call basename yourself, because copying a file to a target directory automatically uses the basename as the destination filename.
In my version, I use the -t option so I can put the destination directory first. This allows it to use the + variant to put all the found filenames in a single command, rather than running cp separately for each file.
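If the copies should also be renamed, or cp -t is not available (it is a GNU option), another way to get the values into the inner shell is to pass them as positional arguments instead of exporting them. The sketch below assumes a naming scheme of ID_originalname, which is only one guess at what "more descriptive" could mean, and runs against a scratch tree with made-up paths.

```bash
tmp=$(mktemp -d)
case_dir="$tmp/data/ABC_123"     # stand-in for $case from the question
specific_id=ABC_123
OUTDIR="$tmp/out"
mkdir -p "$case_dir/subfolder" "$OUTDIR/$specific_id"
touch "$case_dir/subfolder/analysis.uniqueExt"

# $1 = output directory, $2 = id, remaining arguments = files found.
find "$case_dir/subfolder/" -type f -name '*.uniqueExt' -exec sh -c '
    outdir=$1
    id=$2
    shift 2
    for f in "$@"; do
        cp "$f" "$outdir/$id/${id}_$(basename "$f")"
    done
' sh "$OUTDIR" "$specific_id" {} +

copied=$(ls "$OUTDIR/$specific_id")
printf '%s\n' "$copied"

rm -rf "$tmp"
```

Because the values arrive as "$1" and "$2", nothing has to be exported and the single-quoted script needs no outer-shell expansion.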

BASH - execute command for all files with some extension

I have to execute a command in bash for all files in a folder with the extension ".prot".
The command is called "bezogener_Spannungsgradient" and it is invoked like this:
bezogener_Spannungsgradient filename.prot
Thanks!
find . -maxdepth 1 -name \*.prot -exec bezogener_Spannungsgradient {} \;
-maxdepth <depth> keeps find from recursing into subdirectories beyond the given depth.
-name <pattern> limits find to files matching the pattern. The escape is necessary to keep bash from expanding the pattern into a list of matching files before find sees it.
-exec <cmd> {} \; executes <cmd> on each found file (replacing {} with the filename). If the command is capable of processing a list of files, use + instead of \;.
I generally recommend becoming familiar with the many other options of find; it's one of the most underestimated tools out there. ;-)
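The difference between \; and + can be seen with a small stand-in command that reports how many file arguments it received. This is a sketch on a scratch directory; the inline sh -c script takes the place of bezogener_Spannungsgradient, and the file names are made up.

```bash
tmp=$(mktemp -d)
touch "$tmp/a.prot" "$tmp/b.prot" "$tmp/c.txt"

# "\;" starts the command once per matching file.
per_file=$(find "$tmp" -maxdepth 1 -name '*.prot' \
    -exec sh -c 'echo "got $# file(s)"' sh {} \;)

# "+" appends as many matching files as fit onto one command line.
batched=$(find "$tmp" -maxdepth 1 -name '*.prot' \
    -exec sh -c 'echo "got $# file(s)"' sh {} +)

printf '%s\n' "per file:" "$per_file" "batched:" "$batched"

rm -rf "$tmp"
```

With two matching files, the first form reports one file twice and the second form reports two files once.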
You could do this:
for f in *.prot; do
    bezogener_Spannungsgradient "$f"
done

How can I use terminal to copy and rename files from multiple folders?

I have a folder called "week1", and in that folder there are about ten other folders that all contain multiple files, including one called "submit.pdf". I would like to be able to copy all of the "submit.pdf" files into one folder, ideally using Terminal to expedite the process. I've tried cp week1/*/submit.pdf week1/ as well as cp week1/*/*.pdf week1/, but it only ended up copying one file. I just realized that it has been overwriting the file every time, which is why I'm stuck with one... is there any way I can prevent that from happening?
You don't indicate your OS, but if you're using GNU cp, you can use cp week1/*/submit.pdf --backup=t week1/ to have it (arbitrarily) number files that already exist; but that won't give you any real way to identify which is which.
You could, perhaps, do something like this:
for file in week1/*/submit.pdf; do cp "$file" "${file//\//-}"; done
… which will produce files named something like "week1-subdir-submit.pdf"
For what it's worth, the "${var/s/r}" notation means to take var, but before inserting its value, search for s (\/, meaning /, escaped because of the other special / in that expression), and replace it with r (-), to make the unique filenames.
Edit: There's actually one more / in there, to make it match multiple times, so the syntax (spaced out here for illustration only) is:
"${ var / / \/ / - }"
which means: take var and replace every instance of / with -.
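The difference between the single-slash and double-slash forms can be seen on one sample path (the path itself is made up for illustration):

```bash
file='week1/alice/submit.pdf'

# One slash after the name: replace only the first match.
first_only=${file/\//-}

# Two slashes: replace every match.
all=${file//\//-}

printf '%s\n' "$first_only" "$all"
```

This prints week1-alice/submit.pdf followed by week1-alice-submit.pdf.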
find to the rescue! Rule of thumb: If you can list the files you want with find, you can copy them. So try first this:
$ cd your_folder
$ find . -type f -iname 'submit.pdf'
Some notes:
find . means "start finding from the current directory"
-type f means "only find regular files" (i.e., not directories)
-iname 'submit.pdf' means "... with the case-insensitive name 'submit.pdf'". You don't need the quotation marks here, but you do need them if you want to search using wildcards. E.g.:
~ foo$ find /usr/lib -iname '*.So*'
/usr/lib/pam/pam_deny.so.2
/usr/lib/pam/pam_env.so.2
/usr/lib/pam/pam_group.so.2
...
If you want to search case-sensitive, just use -name instead of -iname.
When this works, you can copy each file by using the -exec command. exec works by letting you specify a command to use on hits. It will run the command for each file find finds, and put the name of the file in {}. You end the sequence of commands by specifying \;.
So to echo all the files, do this:
$ find . -type f -iname submit.pdf -exec echo Found file {} \;
To copy them one by one:
$ find . -type f -iname submit.pdf -exec cp {} /destination/folder \;
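Since every match is named submit.pdf, copying them all into one folder this way would still overwrite earlier copies. One possible workaround, sketched here on a scratch tree with made-up folder names, is to prefix each copy with the name of its parent folder. It assumes the parent folder names are unique.

```bash
tmp=$(mktemp -d)
mkdir -p "$tmp/week1/alice" "$tmp/week1/bob" "$tmp/collected"
echo a > "$tmp/week1/alice/submit.pdf"
echo b > "$tmp/week1/bob/submit.pdf"

# $1 = destination folder, remaining arguments = files found.
(cd "$tmp/week1" && find . -type f -name submit.pdf -exec sh -c '
    dest=$1
    shift
    for f in "$@"; do
        parent=$(basename "$(dirname "$f")")
        cp "$f" "$dest/${parent}-submit.pdf"
    done
' sh "$tmp/collected" {} +)

copied=$(ls "$tmp/collected")
printf '%s\n' "$copied"

rm -rf "$tmp"
```

Each copy keeps a hint of where it came from, for example alice-submit.pdf and bob-submit.pdf.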
Hope this helps!

Resources