Use Find and xargs to delete dups in arraylist - bash

I have an array list of file names and I am trying to use rm with xargs to remove those files, like:
dups=["test.csv","man.csv","teams.csv"]
How can I pass the complete dups array to find and delete these files?
I want to change the command below to make it work:
find ${dups[@]} -type f -print0 | xargs -0 rm

Your find command is wrong.
# XXX buggy: read below
find foo bar baz -type f -print0
means look in the paths foo, bar, and baz, and print any actual files within those. (If one of the paths is a directory, it will find all files within that directory. If one of the paths is a file in the current directory, it will certainly find it, but then what do you need find for?)
If these are files in the current directory, simply
rm -- "${dups[@]}"
(notice also how to properly quote the array expansion).
If you want to look in all subdirectories for files with these names, you will need something like
find . -type f \( -name "test.csv" -o -name "man.csv" -o -name "teams.csv" \) -delete
or perhaps
find . -type f -regextype egrep -regex '.*/(test\.csv|man\.csv|teams\.csv)' -delete
though the -regex features are somewhat platform-dependent (try find -E instead of find -regextype egrep on *BSD/MacOS to enable ERE regex support).
Notice also how find has a built-in predicate -delete so you don't need the external utility rm at all. (Though if you wanted to run a different utility, find -exec utility {} + is still more efficient than xargs. Some really old find implementations didn't have the + syntax for -exec but you seem to be on Linux where it is widely supported.)
Building this command line from an array is not entirely trivial; I have proposed a duplicate which has a solution to a similar problem. But of course, if you are building the command from Java, it should be easy to figure out how to do this on the Java side instead of passing in an array to Bash; and then, you don't need Bash at all (you can pass this to find directly, or at least use sh instead of bash because the command doesn't require any Bash features).
I'm not a Java person, but from Python this would look like
import subprocess
dups = ["test.csv", "man.csv", "teams.csv"]
command = ["find", ".", "-type", "f"]
prefix = "("
for filename in dups:
    # "(" goes before the first -name, "-o" before each later one
    command.extend([prefix, "-name", filename])
    prefix = "-o"
command.extend([")", "-delete"])
subprocess.run(command, check=True, encoding="utf-8")
Notice how the backslashes and quotes are not necessary when there is no shell involved.
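For comparison, here is a minimal sketch of assembling the same predicate list from a Bash array. The helper name build_name_args and the array name_args are made up for this illustration, and the example prints with -print instead of deleting, so it is safe to try.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the global array name_args with
#   -name A -o -name B -o -name C ...
# for every argument it receives.
build_name_args() {
    name_args=()
    local name
    for name in "$@"; do
        # Put -o between predicates, but not before the first one.
        if [ "${#name_args[@]}" -gt 0 ]; then
            name_args+=(-o)
        fi
        name_args+=(-name "$name")
    done
}

dups=("test.csv" "man.csv" "teams.csv")
build_name_args "${dups[@]}"

# An empty \( \) is a syntax error for find, so skip the call in that case.
# -print is a dry run; swap it for -delete once the list looks right.
if [ "${#name_args[@]}" -gt 0 ]; then
    find . -type f \( "${name_args[@]}" \) -print
fi
```

Keeping every word in its own array element means file names with spaces survive intact, which is the same reason the array expansion has to be quoted.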

Related

bash, delete all files with a pattern name

I need to delete all files with a pattern name:  2020*.js
Inside a specific directory: server/db/migrations/
And then show what has been deleted: | xargs
I'm trying this:
find . -name 'server/db/migrations/2020*.js' #-delete | xargs
But nothing is deleted, and nothing is shown.
What am I doing wrong?
The immediate problem is that -name only looks at the last component of the file name (so 2020xxx.js) and cannot match anything with a slash in it. You can use the -path predicate but the correct solution is to simply delete these files directly:
rm -v server/db/migrations/2020*.js
The find command is useful when you need to traverse subdirectories.
Also, piping the output from find to xargs does not do anything useful; if find prints the names by itself, xargs does not add any value, and if it doesn't, well, xargs can't do anything with an empty input.
If indeed you want to traverse subdirectories, try
find server/db/migrations/ -type f -name '2020*.js' -print -delete
If your shell supports ** you could equally use
rm -v server/db/migrations/**/2020*.js
which however has a robustness problem if there can be very many matching files (you get "command line too long"). In that scenario, probably fall back to find after all.
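The difference between -name and -path described above can be checked in a throwaway directory. This is only a sketch: the file names are invented, and the sandbox comes from mktemp, so nothing real is touched.

```bash
#!/usr/bin/env bash
sandbox=$(mktemp -d) || exit 1
mkdir -p "$sandbox/server/db/migrations/old"
touch "$sandbox/server/db/migrations/2020a.js" \
      "$sandbox/server/db/migrations/old/2020b.js" \
      "$sandbox/server/db/migrations/2021c.js"

# -name is compared against the last path component only, so a pattern that
# contains "/" can never match (GNU find may also print a warning about it).
by_name=$(cd "$sandbox" && find . -name 'server/db/migrations/2020*.js')

# -path is compared against the whole path as find prints it,
# including the leading "./" of the starting point.
by_path=$(cd "$sandbox" && find . -path './server/db/migrations/2020*.js')

echo "matched by -name: ${by_name:-<nothing>}"
echo "matched by -path: ${by_path:-<nothing>}"
rm -rf "$sandbox"
```

Only the file directly inside migrations/ is reported by the -path variant here, because the name inside old/ does not follow the "migrations/2020" prefix.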
You're looking for something like this:
find server/db/migrations -type f -name '2020*.js' -delete -print
You can try this (with -path instead of -name, because -name never matches a pattern that contains a slash):
find . -path './server/db/migrations/2020*.js' | xargs rm

Check if file is in a folder with a certain name before proceeding

So, I have this simple script which converts videos in a folder into a format which the R4DS can play.
#!/bin/bash
scr='/home/user/dpgv4/dpgv4.py';mkdir -p 'DPG_DS'
find '../Exports' -name "*1080pnornmain.mp4" -exec python3 "$scr" {} \;
The problem is, some of the videos are invalid and won't play, and I've moved those videos to a different directory inside the Exports folder. What I want to do is check to make sure the files are in a folder called new before running the python script on them, preferably within the find command. The path should look something like this:
../Exports/(anything here)/new/*1080pnornmain.mp4
Please note that (anything here) text does not indicate a single directory, it could be something like foo/bar, foo/b/ar, f/o/o/b/a/r, etc.
You cannot use -name because the search is on the path now. My first solution was:
find ./Exports -path '**/new/*1080pnornmain.mp4' -exec python3 "$scr" {} \;
But, as @dan pointed out in the comments, it is wrong because it uses the globstar wildcard (**) unnecessarily:
This checks if /new/ is somewhere in the preceding path, it doesn't have to be a direct parent.
So, the star is not enough here. Another possibility, using find only, could be this one:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -exec python3 "$scr" {} \;
This regex matches:
any number of nested folders before new with .*/new
any character (except / to leave out further subpaths) + your filename with [^/]*1080pnornmain\.mp4
Performance could degrade, given that it uses regular expressions.
You can also pipe the output of find to xargs instead of using -exec. In that case, use -print0 together with xargs -0 so that file names containing spaces or newlines are passed intact. Note that with -I, xargs still starts one python3 process per file, just like -exec ... \; does, so this is not faster here:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -print0 | xargs -0 -I '{}' python3 "$scr" '{}'
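To see why the -path pattern is too permissive while the regex is not, the two can be compared on a small made-up tree. The directory and file names below are assumptions chosen for the demonstration.

```bash
#!/usr/bin/env bash
sandbox=$(mktemp -d) || exit 1
mkdir -p "$sandbox/Exports/a/new/sub" "$sandbox/Exports/b/c/new"
touch "$sandbox/Exports/a/new/x1080pnornmain.mp4" \
      "$sandbox/Exports/a/new/sub/y1080pnornmain.mp4" \
      "$sandbox/Exports/b/c/new/z1080pnornmain.mp4"

# In -path patterns "*" also matches "/", so new/ may be any ancestor:
# the file under new/sub/ is reported as well.
with_path=$(cd "$sandbox" && find ./Exports -path '*/new/*1080pnornmain.mp4')

# [^/]* cannot cross a "/", so new/ has to be the direct parent.
with_regex=$(cd "$sandbox" && find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4')

printf -- '-path matches:\n%s\n\n-regex matches:\n%s\n' "$with_path" "$with_regex"
rm -rf "$sandbox"
```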

How to use find and prename to reformat directory names recursively?

I am trying to find all directories that start with a year in brackets, such as this:
[1990] Nature Documentary
and then rename them removing brackets and inserting a dash in between.
1990 - Nature Documentary
The find command below seems to find the results; however, I could not prefix the pattern with ^ to mark the start of the directory name, otherwise it's not returning hits.
I am pretty sure I need to use -exec or -execdir, but I am not sure how to store the found pattern and manipulate it.
find . -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]] *'
With [p]rename:
-depth -exec prename -n 's/\[(\d{4})]([^\/]+)$/$1 -$2/' {} +
Drop -n if the output looks good.
Without prename, you'd need a shell script with several hardly intelligible parameter expansions:
-depth -exec sh -c '
  for dp; do
    yr=${dp##*/\[} yr=${yr%%]*}
    echo mv "$dp" "${dp%/*}/$yr -${dp##*/\[????]}"
  done' sh {} +
Remove echo to apply changes.
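The parameter expansions are easier to follow when taken one at a time. The following sketch uses a made-up path and a few extra variable names (base, rest, new) that are not part of the answer above; it only computes the new name and renames nothing.

```bash
#!/usr/bin/env bash
# Hypothetical path, in the form find would pass it to the script.
dp='./docs/[1990] Nature Documentary'

base=${dp##*/}         # drop the longest prefix ending in "/"  -> [1990] Nature Documentary
yr=${base#\[}          # drop the leading "["                   -> 1990] Nature Documentary
yr=${yr%%]*}           # drop everything from the first "]" on  -> 1990
rest=${base#\[????]}   # drop "[", four characters and "]"      -> " Nature Documentary"

new="${dp%/*}/$yr -$rest"   # ${dp%/*} is the parent directory, here ./docs
echo "$new"
```

The leading space that remains in rest is what supplies the blank after the dash in the final name.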
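You can use the rename command (-depth makes find list children before their parents, so renaming a parent does not invalidate a path that is still to be processed; the brackets sit outside the capture group so that they are dropped):
find . -depth -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]\] *' | rename -n 's/\[(\d{4})\] ([\w,\s]+)$/$1 - $2/'
Note: Nothing is renamed until you remove the -n option.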

BASH - execute command for all files with some extension

I have to execute a command in bash for all files in a folder with the extension ".prot".
The command is called "bezogener_Spannungsgradient" and it is invoked like this:
bezogener_Spannungsgradient filename.prot
Thanks!
find . -maxdepth 1 -name \*.prot -exec bezogener_Spannungsgradient {} \;
-maxdepth <depth> keeps find from recursing into subdirectories beyond the given depth.
-name <pattern> limits find to files matching the pattern. The escape is necessary to keep bash from expanding the find option into a list of matching files.
-exec <cmd> {} \; executes <cmd> on each found file (replacing {} with the filename). If the command is capable of processing a list of files, use + instead of \;.
I generally recommend becoming familiar with the many other options of find; it's one of the most underestimated tools out there. ;-)
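The difference between the two -exec terminators can be made visible by letting the executed command report how many arguments it received. This is a sketch in a scratch directory; the three .prot files are invented, and sh -c stands in for the real program.

```bash
#!/usr/bin/env bash
sandbox=$(mktemp -d) || exit 1
touch "$sandbox/a.prot" "$sandbox/b.prot" "$sandbox/c.prot"

# "\;" starts the command once per file: every run sees exactly one argument.
per_file=$(find "$sandbox" -maxdepth 1 -name '*.prot' \
               -exec sh -c 'echo "$#"' sh {} \;)

# "+" appends as many file names as fit onto a single command line.
batched=$(find "$sandbox" -maxdepth 1 -name '*.prot' \
              -exec sh -c 'echo "$#"' sh {} +)

echo "argument counts with \\; : $(printf '%s ' $per_file)"
echo "argument counts with +  : $batched"
rm -rf "$sandbox"
```

With only three files the + form needs a single invocation; for very long lists find splits them into as many invocations as the command-line length limit requires.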
You could do this:
for f in *.prot; do
  bezogener_Spannungsgradient "$f"
done
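One caveat with the loop: when no file matches, Bash by default leaves the pattern unexpanded, so the body runs once with the literal string *.prot. The sketch below counts loop iterations in an empty scratch directory to show this, and uses the Bash option nullglob as one possible remedy.

```bash
#!/usr/bin/env bash
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1       # empty directory: no .prot files here

runs_default=0
for f in *.prot; do           # unmatched glob stays as the literal "*.prot"
    runs_default=$((runs_default + 1))
done

shopt -s nullglob             # unmatched globs now expand to nothing
runs_nullglob=0
for f in *.prot; do
    runs_nullglob=$((runs_nullglob + 1))
done
shopt -u nullglob

echo "loop iterations by default:    $runs_default"
echo "loop iterations with nullglob: $runs_nullglob"
cd / && rm -rf "$sandbox"
```

A portable alternative that also works in plain sh is to start the loop body with [ -e "$f" ] || continue.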

Unix find: list of files from stdin

I'm working in Linux & bash (or Cygwin & bash).
I have a huge--huge--directory structure, and I have to find a few needles in the haystack.
Specifically, I'm looking for these files (20 or so):
foo.c
bar.h
...
quux.txt
I know that they are in a subdirectory somewhere under . (the current directory).
I know I can find any one of them with
find . -name foo.c -print
This command takes a few minutes to execute.
How can I print the names of these files with their full directory name? I don't want to execute 20 separate finds--it will take too long.
Can I give find the list of files from stdin? From a file? Is there a different command that does what I want?
Do I have to first assemble a command line for find with -o using a loop or something?
If your directory structure is huge but not changing frequently, it is good to run
cd /to/root/of/the/files
find . -type f -print > ../LIST_OF_FILES.txt # the next one is sometimes handy too
find . -type d -print > ../LIST_OF_DIRS.txt
After that you can find anything really fast (with grep, sed, etc.) and only need to update the file lists when the tree changes. (It is a simplified replacement if you don't have locate.)
So,
grep '/foo\.c$' ../LIST_OF_FILES.txt # list all foo.c in the tree
When you want to find a list of files, you can try the following:
grep -F -f wanted_file_list.txt < ../LIST_OF_FILES.txt
or directly with the find command
find . -type f -print | grep -F -f wanted_file_list.txt
Here -F (the modern spelling of fgrep) treats each pattern as a fixed string, and -f reads the patterns from a file, one per line, so you can easily grep the input for multiple patterns. Note that the patterns match anywhere in a line, so foo.c also matches a path ending in xfoo.c.
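Because fixed-string patterns (fgrep -f, or equivalently grep -F -f) match anywhere in a line, the list lookup can report more than the wanted names. The following sketch, with invented sample data, contrasts it with an awk filter that compares only the last path component; the awk variant is a suggestion of mine, not part of the answer above.

```bash
#!/usr/bin/env bash
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1

# Made-up sample data standing in for the real lists.
printf '%s\n' ./src/foo.c ./old/notfoo.c ./bak/foo.c.orig ./inc/bar.h > LIST_OF_FILES.txt
printf '%s\n' foo.c bar.h > wanted_file_list.txt

# Substring match: also reports notfoo.c and foo.c.orig.
loose=$(grep -F -f wanted_file_list.txt LIST_OF_FILES.txt)

# Exact match on the basename: the first pass stores the wanted names, the
# second prints lines whose final "/"-separated field is one of them.
exact=$(awk -F/ 'NR == FNR { want[$0]; next } $NF in want' \
            wanted_file_list.txt LIST_OF_FILES.txt)

printf 'grep -F -f:\n%s\n\nawk, exact basename:\n%s\n' "$loose" "$exact"
cd / && rm -rf "$sandbox"
```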
You shouldn't need to run find twenty times.
You can construct a single command with multiple filename specifiers:
find . \( -name 'file1' -o -name 'file2' -o -name 'file3' \) -exec echo {} \;
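When the names arrive on stdin, as the question asks, the -o chain can be assembled in a loop before find is started once. A minimal sketch follows; the function name find_listed is hypothetical, and it assumes one name per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one file name per line from stdin and runs a
# single find over the directory given as the first argument.
# Note that -name takes shell patterns, so names containing *, ? or [
# are treated as patterns rather than literal strings.
find_listed() {
    local root=$1 name
    local -a args=()
    while IFS= read -r name; do
        [ -n "$name" ] || continue              # skip blank lines
        if [ "${#args[@]}" -gt 0 ]; then
            args+=(-o)                          # -o only between predicates
        fi
        args+=(-name "$name")
    done
    [ "${#args[@]}" -gt 0 ] || return 0         # nothing requested
    find "$root" -type f \( "${args[@]}" \) -print
}
```

It could then be called as find_listed . < wanted_file_list.txt, which walks the tree only once regardless of how many names are listed.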
Is the locate(1) command an acceptable answer? Nightly it builds an index, and you can query the index quite quickly:
$ time locate id_rsa
/home/sarnold/.ssh/id_rsa
/home/sarnold/.ssh/id_rsa.pub
real 0m0.779s
user 0m0.760s
sys 0m0.010s
I gave up executing a similar find command in my home directory at 36 seconds. :)
If nightly doesn't work, you could run the updatedb(8) program by hand once before running locate(1) queries. /etc/updatedb.conf (updatedb.conf(5)) lets you select specific directories or filesystem types to include or exclude.
Yes, assemble your command line.
Here's a way to process a list of files from stdin and assemble your (FreeBSD) find command to use extended regular expression matching (n1|n2|n3).
For GNU find you may have to use one of the following options to enable extended regular expression matching:
-regextype posix-egrep
-regextype posix-extended
echo '
foo\\.c
bar\\.h
quux\\.txt
' | xargs bash -c '
IFS="|";
find -E "$PWD" -type f -regex "^.*/($*)$" -print
echo find -E "$PWD" -type f -regex "^.*/($*)$" -print
' arg0
# note: "$*" uses the first character of the IFS variable as array item delimiter
(
IFS='|'
set -- 1 2 3 4 5
echo "$*" # 1|2|3|4|5
)

Resources