Unix find: list of files from stdin - bash

I'm working in Linux & bash (or Cygwin & bash).
I have a huge--huge--directory structure, and I have to find a few needles in the haystack.
Specifically, I'm looking for these files (20 or so):
foo.c
bar.h
...
quux.txt
I know that they are in a subdirectory somewhere under . (the current directory).
I know I can find any one of them with
find . -name foo.c -print. This command takes a few minutes to execute.
How can I print the names of these files with their full directory name? I don't want to execute 20 separate finds--it will take too long.
Can I give find the list of files from stdin? From a file? Is there a different command that does what I want?
Do I have to first assemble a command line for find with -o using a loop or something?

If your directory structure is huge but not changing frequently, it is good to run
cd /to/root/of/the/files
find . -type f -print > ../LIST_OF_FILES.txt   # the next one is sometimes handy too
find . -type d -print > ../LIST_OF_DIRS.txt
After that you can find anything really FAST (with grep, sed, etc.) and update the file lists only when the tree changes. (It is a simplified replacement for locate, if you don't have it.)
So,
grep '/foo\.c$' LIST_OF_FILES.txt   # list every foo.c in the tree
When you want to find a list of files, you can try the following:
fgrep -f wanted_file_list.txt < LIST_OF_FILES.txt
or directly with the find command:
find . -type f -print | fgrep -f wanted_file_list.txt
The -f for fgrep means "read the patterns from a file", so you can easily grep the input for multiple patterns. (fgrep is equivalent to grep -F; newer GNU grep versions warn that fgrep is obsolescent.)
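fgrep -f matches each wanted name anywhere in a line, so foo.c would also match ./lib/myfoo.c or ./foo.cpp. A minimal sketch of an exact match on the last path component instead, assuming LIST_OF_FILES.txt was produced by the find command above and wanted_file_list.txt holds one bare file name per line (the function name match_basenames is hypothetical):

```shell
#!/usr/bin/env bash
# Print every path in an index file whose final component (basename) is exactly
# one of the names listed in a wanted-names file.
# Usage: match_basenames wanted_file_list.txt LIST_OF_FILES.txt
match_basenames() {
  local wanted=$1 index=$2
  # While reading the first file (NR == FNR), remember each name as an array key.
  # For the second file, split each path on "/" and print it if its last field
  # ($NF) is one of the wanted names.
  awk -F/ 'NR == FNR { want[$0]; next } ($NF in want)' "$wanted" "$index"
}
```

Because the comparison is a hash lookup on the basename, the cost stays linear in the size of the index no matter how many names you are looking for.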

You shouldn't need to run find twenty times.
You can construct a single command with multiple file name tests joined by -o (OR):
find . \( -name 'file1' -o -name 'file2' -o -name 'file3' \) -print
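If the twenty names live in a file, the -o expression can be assembled with a small loop, so find still walks the tree only once. A minimal sketch, assuming one name per line in the list file (the function name find_names is hypothetical). Note that -name treats *, ? and [ in a name as glob characters:

```shell
#!/usr/bin/env bash
# Assemble a single find expression  \( -name n1 -o -name n2 ... \)  from a file
# that lists one file name per line, then walk the tree only once.
# Usage: find_names wanted_file_list.txt [start directory...]
find_names() {
  local list=$1 name
  local -a expr=()
  shift
  [ $# -gt 0 ] || set -- .                  # default start directory
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue              # skip blank lines
    if [ ${#expr[@]} -gt 0 ]; then
      expr+=(-o)                            # join successive tests with -o (OR)
    fi
    expr+=(-name "$name")
  done < "$list"
  [ ${#expr[@]} -gt 0 ] || return 1         # empty list: nothing to search for
  find "$@" \( "${expr[@]}" \) -print
}
```

Building the expression in a bash array keeps every name a separate, intact argument, so names with spaces need no extra quoting.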

Is the locate(1) command an acceptable answer? It builds an index nightly, and you can query the index quite quickly:
$ time locate id_rsa
/home/sarnold/.ssh/id_rsa
/home/sarnold/.ssh/id_rsa.pub
real 0m0.779s
user 0m0.760s
sys 0m0.010s
I gave up executing a similar find command in my home directory at 36 seconds. :)
If the nightly update isn't recent enough, you could run the updatedb(8) program by hand once before running locate(1) queries. /etc/updatedb.conf (updatedb.conf(5)) lets you select specific directories or filesystem types to include or exclude.

Yes, assemble your command line.

Here's a way to read a list of file names from stdin and assemble a (FreeBSD) find command that uses extended regular expression matching (n1|n2|n3).
GNU find has no -E option; instead, put one of the following options before -regex:
-regextype posix-egrep
-regextype posix-extended
echo '
foo\\.c
bar\\.h
quux\\.txt
' | xargs bash -c '
IFS="|";
find -E "$PWD" -type f -regex "^.*/($*)$" -print
echo find -E "$PWD" -type f -regex "^.*/($*)$" -print   # show the assembled command
' arg0
# note: "$*" joins the positional parameters using the first character of IFS as the separator
# (xargs strips one level of backslashes, so foo\\.c above reaches find as foo\.c)
(
IFS='|'
set -- 1 2 3 4 5
echo "$*" # 1|2|3|4|5
)
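For GNU find (Linux, Cygwin), a minimal sketch of the same idea, assuming the already-escaped names are passed as arguments instead of on stdin (the function name regex_find is hypothetical). GNU find's -regex matches the whole path, so the ^ and $ anchors are not needed:

```shell
#!/usr/bin/env bash
# GNU find version of the answer above: GNU find has no -E, so select ERE
# syntax with -regextype, which must come before -regex on the command line.
# Each argument must already be regex-escaped (e.g. 'foo\.c').
# Usage: regex_find DIR 'foo\.c' 'bar\.h' 'quux\.txt'
regex_find() {
  local dir=$1
  shift
  local IFS='|'          # makes "$*" below join the names with "|"
  find "$dir" -regextype posix-extended -type f -regex ".*/($*)" -print
}
```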

Related

How to use find and prename to reformat directory names recursively?

I am trying to find all directories that start with a year in brackets, such as this:
[1990] Nature Documentary
and then rename them removing brackets and inserting a dash in between.
1990 - Nature Documentary
The find command below seems to find the right directories; however, I could not prefix the pattern with ^ to mark the start of the directory name, otherwise it returns no hits.
I am pretty sure I need to use -exec or -execdir, but I am not sure how to store the found pattern and manipulate it.
find . -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]] *'
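With [p]rename (note that -name patterns are globs that must match the whole file name, so they are already anchored at both ends and a ^ would be matched literally):
find . -depth -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]] *' -exec prename -n 's/\[(\d{4})]([^\/]+)$/$1 -$2/' {} +
Drop -n if the output looks good. (-depth makes find list a directory's contents before the directory itself, so nested matches are renamed before their parents.)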
Without prename, you'd need a shell script with several hard-to-read parameter expansions:
find . -depth -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]] *' -exec sh -c '
for dp; do
yr=${dp##*/[} yr=${yr%%]*}
echo mv "$dp" "${dp%/*}/$yr -${dp##*/\[????]}"
done' sh {} +
Remove echo to apply the changes.
You can use the (Perl) rename command, which reads file names from standard input when none are given on the command line:
find . -depth -type d -name '\[[[:digit:]][[:digit:]][[:digit:]][[:digit:]]\] *' | rename -n 's/\[(\d{4})\] ([^\/]+)$/$1 - $2/'
Note: nothing is renamed until you delete the -n option.
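For systems with neither prename nor Perl rename, a pure-bash sketch of the same rename. The function name rename_year_dirs and the DRY_RUN switch are hypothetical; bash's [[ =~ ]] captures the year and title into BASH_REMATCH:

```shell
#!/usr/bin/env bash
# Turn "[1990] Title" directories into "1990 - Title".
# find -depth lists children before their parents, so renaming a parent never
# invalidates a path that is still waiting to be processed.
# DRY_RUN=1 (hypothetical switch for this sketch) prints the mv commands only.
rename_year_dirs() {
  local root=${1:-.} dp base parent new
  local re='^\[([0-9]{4})] (.+)$'
  while IFS= read -r -d '' dp; do
    base=${dp##*/}                 # last path component
    parent=${dp%/*}                # everything before it
    [[ $base =~ $re ]] || continue
    new="$parent/${BASH_REMATCH[1]} - ${BASH_REMATCH[2]}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'mv %q %q\n' "$dp" "$new"
    else
      mv -- "$dp" "$new"
    fi
  done < <(find "$root" -depth -type d -name '\[[0-9][0-9][0-9][0-9]] *' -print0)
}
```

Run it as DRY_RUN=1 rename_year_dirs . first, in the same spirit as rename -n.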

Loop through all files in a directory and subdirectories using Bash [duplicate]

I know how to loop through all the files in a directory, for example:
for i in *
do
<some command>
done
But I would like to go through all the files in a directory, including (particularly!) all the ones in the subdirectories. Is there a simple way of doing this?
The find command is very useful for that kind of thing. The following loop works as long as your file names don't contain white space or other special characters:
for i in $(find . -type f -print)
do
stuff
done
The command generates path names relative to the starting point of the search (the first argument to find).
As pointed out, this will fail if your filenames contain spaces or some other characters.
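A whitespace-safe variant of the same loop, as a minimal sketch (the function name for_each_file is hypothetical; it needs bash for read -d '' and process substitution):

```shell
#!/usr/bin/env bash
# Whitespace-safe replacement for  for i in $(find ...): find separates names
# with NUL bytes (-print0) and read -d '' splits on exactly those bytes, so
# spaces, tabs, newlines and glob characters in names survive intact.
# Usage: for_each_file DIR   (prints each path; replace printf with your command)
for_each_file() {
  local root=${1:-.} f
  while IFS= read -r -d '' f; do
    printf '%s\n' "$f"        # your per-file command goes here, e.g. stuff "$f"
  done < <(find "$root" -type f -print0)
}
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell, so variables set inside the loop are still visible after it.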
You can also use the -exec option, which avoids the problem with spaces in file names. It executes the given command for each file found; the braces are a placeholder for the file name:
find . -type f -exec command {} \;
find and xargs are great tools for recursively processing the contents of directories and sub-directories. For example
find . -type f -print0 | xargs -0 command
will run command on batches of files from the current directory and its sub-directories. The -print0 and -0 arguments avoid the usual problems with filenames that contain spaces, quotes or other metacharacters.
If command just takes one argument, you can limit the number of files passed to it with -L1.
find . -type f -print0 | xargs -0 -L1 command
And as suggested by alexgirao, xargs can also name arguments, using -I, which gives some flexibility if command takes options. -I implies -L1.
find . -type f -print0 | xargs -0 -Iarg command arg --option
recurse() {
local path=$1 i
if [ -d "$path" ] ; then
for i in "$path/"*      # note: * does not match hidden files
do
recurse "$i"
done
elif [ -f "$path" ] ; then
do-something "$path"    # placeholder for your own command
fi
}
Call recurse with the directory you want to start from as its first argument.
Ex: recurse /path
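With bash 4 or later, the shell's globstar option can do the recursion for you. A minimal sketch, where the function name walk_globstar is hypothetical and the function body runs in a subshell so the shopt settings do not leak into the caller:

```shell
#!/usr/bin/env bash
# With globstar, "**" matches files and directories at any depth.
# dotglob includes hidden entries; nullglob makes an empty match expand to nothing.
# Note: before bash 4.3, ** also descended into symlinked directories.
walk_globstar() (
  root=${1:-.}
  shopt -s globstar dotglob nullglob
  for f in "$root"/**; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"      # regular files only; replace with do-something "$f"
    fi
  done
)
```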

How do I recursively find files with specific names and join using ImageMagick in Terminal?

I have created an ImageMagick command to join images with certain names:
convert -append *A_SLIDER.jpg *B_SLIDER.jpg out.jpg
I have lots of folders with files named *A_SLIDER.jpg and *B_SLIDER.jpg next to each other (only ever one pair in a directory).
I would like to recursively search a directory with many folders and execute the command to join the images.
If it is possible to name the output image based on the input images that would be great e.g.
=> DOGS_A_SLIDER.jpg and DOGS_B_SLIDER.jpg would combine to DOGS_SLIDER.jpg
Something like this, but back up first and try on a sample directory only!
#!/bin/bash
find . -name "*A_SLIDER*" -execdir bash -c '
out=$(ls *A_SLIDER*)
out=${out/_A/}
convert -append *A_SLIDER* *B_SLIDER* "$out"' \;
This finds all files whose names contain "A_SLIDER", goes to the containing directory and starts bash there. While there, it gets the name of the file and removes the _A part to form the output file name. Then it runs ImageMagick's convert on the _A_ file and the corresponding _B_ file to produce the output file.
Or, a slightly more concise suggestion from @gniourf_gniourf (thank you):
#!/bin/bash
find . -name "*A_SLIDER.jpg" -type f -execdir bash -c 'convert -append "$1" "${1/_A_/_B_}" "${1/_A/}"' _ {} \;
The "find" command will recursively search folders:
$ find . -name "*.jpg" -print
That will display all the filenames. You might instead want "-iname" which does case-insensitive filename matching.
You can add a command line with "-exec", in which "{}" is replaced by the name of the file. You must terminate the command line with "\;":
$ find . -name "*.jpg" -exec ls -l {} \;
You can use sed to edit the name of a file:
$ echo DOGS_A_SLIDER.jpg | sed 's=_.*$=='
DOGS
Can you count on all of your "B" files being named the same as the corresponding "A" files? That is, you will not have "DOGS_A_SLIDER.jpg" and "CATS_A_SLIDER.jpg" in the same directory. If so, something like the following isn't everything you need, but will contribute to your solution:
$ find . -type f -name "*.jpg" -exec sh -c 'echo "$1" | sed "s=_.*=="' sh {} \;
That particular sed script will do the wrong thing if you have any directory names with underscores in them.
"find . -type f" finds only regular files, and it may run modestly faster than without the -type. Use "-type d" to find directories.
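Putting the pieces above together, a sketch that pairs each *_A_SLIDER.jpg with its *_B_SLIDER.jpg by stripping a fixed suffix, so underscores elsewhere in the path (including directory names) don't matter. It assumes the naming scheme from the question; the function name join_sliders and the DRY_RUN switch are hypothetical:

```shell
#!/usr/bin/env bash
# For every *_A_SLIDER.jpg, look for its *_B_SLIDER.jpg partner in the same
# directory and append the two vertically into *_SLIDER.jpg.
# DRY_RUN=1 (hypothetical switch for this sketch) prints the commands instead.
join_sliders() {
  local root=${1:-.} a b out
  while IFS= read -r -d '' a; do
    b=${a%_A_SLIDER.jpg}_B_SLIDER.jpg      # same prefix, B instead of A
    out=${a%_A_SLIDER.jpg}_SLIDER.jpg      # DOGS_A_SLIDER.jpg -> DOGS_SLIDER.jpg
    if [ ! -f "$b" ]; then
      printf 'skipping %s: no partner %s\n' "$a" "$b" >&2
      continue
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'convert -append %q %q %q\n' "$a" "$b" "$out"
    else
      convert -append "$a" "$b" "$out"
    fi
  done < <(find "$root" -type f -name '*_A_SLIDER.jpg' -print0)
}
```

Try DRY_RUN=1 join_sliders . on a copy of the tree before running it for real.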

how to grep large number of files?

I am trying to grep 40k files in the current directory and i am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is unlike other programming languages because much of the meaning of a line comes from the shell expanding it before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x lets you see how the shell actually expands the glob and then passes the result to the command as arguments. The line starting with + is the command that is actually executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Only then does the shell actually run echo.
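When you have 40K plus files and you run grep $i *.kaks, the shell expands that *.kaks to the names of those 40,000 plus files before grep even has a chance to execute. The combined size of all those names exceeds the kernel's limit on the argument list for a new program (ARG_MAX), and that's where the error message /usr/bin/grep: Argument list too long is coming from.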
Fortunately, Unix has a way around this dilemma:
$ find . -maxdepth 1 -name "*.kaks" -type f | xargs grep -f A01/genes.txt
The find . -maxdepth 1 -name "*.kaks" -type f will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories. (GNU find expects options such as -maxdepth before tests such as -name and prints a warning otherwise.) The -f A01/genes.txt makes grep read all the search patterns from that file at once, so you no longer need the outer for loop.
The find command pipes the names of the files into xargs, and xargs appends them to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve: it knows how long the command line buffer is, runs grep once the buffer is full, and then passes the next batch of files to another grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of your files are searched.
Unfortunately, xargs uses whitespace as a separator for the file names. If your file names contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -maxdepth 1 -name "*.kaks" -type f -print0 | xargs -0 grep -f A01/genes.txt
The -print0 causes find to separate the file names not with newlines, but with the NUL character. The -0 option tells xargs that the separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this:
$ find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} \;
This executes grep once for each and every file found, whereas xargs runs grep on as many files as it can fit on each command line. The advantage is that the file names never pass through a pipe, so there is no separator problem at all. However, starting one grep per file is likely to be slower. (Ending the -exec with {} + instead of \; makes find batch the file names the way xargs does.)
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -maxdepth 1 -name "*.kaks" -type f -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
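Combining the suggestions above into one command: a sketch using -exec ... {} +, which batches names like xargs but needs no separator handling. The function name search_genes is hypothetical, -H keeps the file name on every match even when a batch happens to contain a single file, and -F assumes genes.txt holds plain strings rather than regular expressions:

```shell
#!/usr/bin/env bash
# Search every *.kaks file in DIR for any of the patterns listed (one per line)
# in a pattern file, without ever building one huge command line.
# Usage: search_genes A01/genes.txt [DIR] > A01/A01.result.txt
search_genes() {
  local patterns=$1 dir=${2:-.}
  # {} + lets find pass as many names per grep invocation as fit in ARG_MAX.
  find "$dir" -maxdepth 1 -type f -name '*.kaks' \
    -exec grep -H -F -f "$patterns" {} +
}
```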
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use grep's recursive option:
for i in $(cat A01/genes.txt); do
grep -r -- "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
find . -iregex '.*\.kaks$' -exec grep -H -- "$i" {} +
done > A01/A01.result.txt
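Put another for loop inside your outer one. The *.kaks in a for loop is expanded by the shell itself rather than passed to a new program, so it is not subject to the argument-list limit:
for i in $(cat A01/genes.txt); do
for f in *.kaks; do
grep -H -- "$i" "$f"
done
done > A01/A01.result.txt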
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string occurs in it at least once? If it is "good enough" to know that the string occurs one or more times, you can pass -m 1 to grep and it will stop reading the file after the first match (or use -l to print only the names of matching files), which could potentially save lots of time.
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.”, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

Programmatically performing the same operation in multiple same-level subfolders

I want to write a shell script that would perform a single operation - pdflatex - on multiple files of the same type in same-level subfolders. Here's a simplified version of the directory structure:
/
|-- a/
|   `-- a1.tex
|-- b/
|   `-- b1.tex
`-- c/
    `-- c1.tex
What I'd like to do is have the script launch from /, and perform pdflatex on all the .tex files without having to manually include all of those subdirs in the actual script file.
The algorithm is simple enough on paper:
Go to directory /
Find all directories matching regex given in script (may also use wildcard expression)
Go down n levels, again following regex as needed
Perform pdflatex on all .tex files in current directory
... But I'm not sure how to implement this in a Unix shell script.
Do shell scripts allow for this kind of operation at all? If so, what would an implementation look like?
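find is what you seem to be looking for. Run it from the top directory of the tree:
find . -mindepth 1 -maxdepth 1 -type f -exec pdflatex {} \;
This would execute pdflatex on all files directly in the current directory.
In order to perform the action only for files n levels down, replace 1 with n+1 for both the -mindepth and -maxdepth options above; for a1.tex, b1.tex and c1.tex (one level down), use 2.
In order to restrict it to file names matching *.tex, say
find . -mindepth 2 -maxdepth 2 -type f -name "*.tex" -exec pdflatex {} \;
pdflatex writes its output files into the directory it runs in; use -execdir instead of -exec if you want each .pdf next to its .tex file.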
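The algorithm from the question can also be written directly as a shell loop over a directory glob. A minimal sketch; compile_tex_dirs is a hypothetical name, its second argument is a shell glob such as '*' or '*/*' (one level per * component) rather than a regex, and PDFLATEX is a hypothetical override so the sketch can be tried without a TeX installation:

```shell
#!/usr/bin/env bash
# For every directory matching a glob under the top directory, cd into it and
# run pdflatex on each .tex file there, so each .pdf lands next to its source.
# The function body is a subshell, so shopt and variables don't leak out.
# Usage: compile_tex_dirs [TOP] [DIR_GLOB]
compile_tex_dirs() (
  top=${1:-.}
  dir_glob=${2:-*}
  cmd=${PDFLATEX:-pdflatex}          # hypothetical override, see lead-in
  shopt -s nullglob                  # a pattern with no matches expands to nothing
  for d in "$top"/$dir_glob/; do     # trailing "/" keeps only directories
    (
      cd "$d" || exit 1              # exits only this inner subshell
      for f in *.tex; do
        "$cmd" "$f"
      done
    )
  done
)
```

For example, compile_tex_dirs . '*' covers the a/, b/ and c/ layout from the question.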
The only thing I might add to what others have said is a couple of parameters, since the question asks for a little reusable program.
pdflatexdirs
#!/bin/bash
# usage: pdflatexdirs [search pattern] [maxdepth]
pattern='*.tex'
if [ -n "$1" ]; then
pattern="${1}.tex"
fi
depth=()
if [ -n "$2" ]; then
depth=(-maxdepth "$2")
fi
# borrowed from @devnull ;)
find . -mindepth 1 "${depth[@]}" -type f -name "$pattern" -exec pdflatex {} \;
Then chmod u+x pdflatexdirs
Now call it as
pdflatexdirs [optional search pattern] [optional maxdepth]
With this script:
The default search pattern is *.tex
If you want to specify maxdepth, you must also provide a search pattern
When you do provide a search pattern, .tex is appended to whatever you pass
