Script to find recursively the number of files with a certain extension - shell

We have a highly nested directory structure, where we have a directory, let's call it 'my Dir', appearing many times in our hierarchy. I am interested in counting the number of "*.csv" files in all directories named 'my Dir' (yes, there is a whitespace in the name). How can I go about it?
I tried something like this, but it does not work:
find . -type d -name "my Dir" -exec ls "{}/*.csv" \; | wc -l

If you want to count the number of files matching the pattern '*.csv' under "my Dir", then:
don't ask for -type d; ask for -type f
don't ask for -name "my Dir" if you really want -name '*.csv'
don't try to ls "{}/*.csv" on each match: find hands ls the braces-and-glob as one literal argument, and no shell ever expands the glob, so the listing fails; even where a glob does expand, counting ls output is fragile
also beware of embedding {} inside a larger string for -exec: POSIX only requires the substitution when {} appears as an argument all by itself, and splicing {} into shell code is an injection hazard!
For counting files from find, I like to use a trick I learned from Stéphane Chazelas on U&L; for example, from: Counting files in Linux:
find "my Dir" -type f -name '*.csv' -printf . | wc -c
This requires GNU find, as -printf is a GNU extension to the POSIX standard.
It works by looking within "my Dir" (relative to the current working directory) for files that match the pattern; for each matching file, it prints a single dot (period); that's all piped to wc, which counts the number of characters (periods) that find produced -- the number of matching files.
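If GNU find isn't available, a close POSIX equivalent is to print one fixed line per match and count lines; since only a literal dot ever reaches the pipe, odd filenames can't skew the count (a sketch; it does fork one printf per file):
find "my Dir" -type f -name '*.csv' -exec printf '.\n' \; | wc -l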

You could exclude all paths that are not under a 'my Dir' directory:
find . -type f -not '(' -not -path '*/my Dir/*' -prune ')' -name '*.csv'

Another solution is to use the -path predicate to select your files.
find . -path '*/my Dir/*.csv'
Counting the number of occurrences could be a simple matter of piping to wc -l, though this will obviously produce the wrong result if some of the files contain newlines in their names. (This is slightly pathological, but definitely something you want to cover in production code.) A common arrangement is to just print a newline for every found file, instead of its name.
find . -path '*/my Dir/*.csv' -printf '.\n' | wc -l
(The -printf predicate is not in POSIX but it's not hard to replace with an -exec or similar.)
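One such replacement counts in batches, avoiding a fork per file: each sh invocation reports how many names it was handed, and awk sums the batches (a sketch using only POSIX tools):
find . -path '*/my Dir/*.csv' -exec sh -c 'echo "$#"' sh {} + |
  awk '{ total += $1 } END { print total + 0 }'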

Related

Grep through the results of a 'find' command

I am trying to do a simple search through files:
1. Find all files that match a name pattern
2. Grep through the results of step 1 and keep only the files whose contents contain a specific string
I tried,
find . -name rio.yml -exec grep "my pattern" \;
What's best practice for something like this?
If you just want the paths that contain the match, do:
find . -name rio.yml -type f -exec grep -q "my pattern" {} \; -print
(Given that you're already filtering on the name, the -type f may be redundant, but I find it helpful when grepping.) You could use grep -l instead, but it's often convenient to build a pipeline to xargs with -print0, so this is a good pattern to know.
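For example, the NUL-safe pipeline version of the same idea (grep -l lists each file with at least one match):
find . -name rio.yml -type f -print0 | xargs -0 grep -l "my pattern"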
To get the filename which contains some string, you need to use grep -l
find . -name rio.yml -exec grep -l "my pattern" {} \;
To get the full paths of the files, you can use $(pwd) in place of the search directory.
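For instance, run from the directory you want the absolute paths anchored at:
find "$(pwd)" -name rio.yml -type f -exec grep -l "my pattern" {} \;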

Why is my `find` command giving me errors relating to ignored directories?

I have this find command:
find . -type f -not -path '**/.git/**' -not -path '**/node_modules/**' | xargs sed -i '' s/typescript-library-skeleton/xxx/g;
for some reason it's giving me these warnings/errors:
find: ./.git/objects/3c: No such file or directory
find: ./.git/objects/3f: No such file or directory
find: ./.git/objects/41: No such file or directory
I even tried using:
-not -path '**/.git/objects/**'
and got the same thing. Anybody know why the find is searching in the .git directory? Seems weird.
why is the find searching in the .git directory?
GNU find is clever and supports several optimizations over a naive implementation:
It can flip the order of -size +512b -name '*.txt' and check the name first, because the name is already known from reading the directory, while querying the size requires an extra syscall.
It can count the hard links of a directory to determine the number of subdirectories, and once it has seen them all, it no longer needs to check the remaining entries for -type d or recurse into them.
It can even rewrite (-B -or -C) -and -A so that, if the checks are equally costly and free of side effects, -A is evaluated first, hoping to reject the file after one test instead of two.
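As an aside, you can watch these rewrites happen: GNU find's -D tree debug option prints the expression tree in its original and optimised form. The debug output goes to stderr, so discarding stdout leaves just the trees:
find -D tree -O3 . -size +512k -name '*.txt' >/dev/null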
However, it is not yet clever enough to realize that -not -path '*/.git/*' means that if you find a directory .git then you don't even need to recurse into it because all files inside will fail to match.
Instead, it dutifully recurses, finds each file and matches it against the pattern as if it were a black box.
To explicitly tell it to skip a directory entirely, you can instead use -prune. See How to exclude a directory in find . command
Both more efficient and more correct would be to avoid the default -print action, change -not -path ... to -prune, and ensure that xargs is only used with NUL-delimited input:
find . -name .git -prune -o \
     -name node_modules -prune -o \
     -type f -print0 | xargs -0 sed -i '' s/typescript-library-skeleton/xxx/g
Note the following points:
We use -prune to tell find to not even recurse down the undesired directories, rather than -not -path ... to tell it to discard names in those directories after they were found.
We put the -prunes before the -type f, so we're able to match directories for pruning.
We have an explicit action, not depending on the default -print. This is important because the default -print effectively comes with a set of parentheses: find ... behaves like find '(' ... ')' -print, not like find ... -print, if no explicit action is given.
We use xargs only with the -0 argument enabling NUL-delimited input, and the -print0 action on the find side to generate a NUL-delimited list of names. NUL is the only character which cannot be present in an arbitrary file path (yes, newlines can be present) -- and thus the only character which is safe to use to separate paths. (If the -0 extension to xargs and the -print0 extension to find are not guaranteed to be available, use -exec sed -i '' ... {} + instead).
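Spelled out, that portable fallback replaces the xargs stage entirely (the sed -i '' spelling is the BSD sed form the question already uses):
find . -name .git -prune -o \
     -name node_modules -prune -o \
     -type f -exec sed -i '' s/typescript-library-skeleton/xxx/g {} +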

Moving multiple files in subdirectories (and/or splitting strings by multichar delimiter) [bash]

So basically, I have a folder with a bunch of subfolders all with over 100 files in them. I want to take all of the mp3 files (really generic extension since I'll have to do this with jpg, etc.) and move them to a new folder in the original directory. So basically the file structure looks like this:
/.../dir/recup1/file1.mp3
/.../dir/recup2/file2.mp3
... etc.
and I want it to look like this:
/.../dir/music/file1.mp3
/.../dir/music/file2.mp3
... etc.
I figured I would use a bash script that looked along these lines:
#!/bin/bash
STR=`find ./ -type f -name \*.mp3`
FILES=(echo $STR | tr ".mp3 " "\n")
for x in $FILES
do
echo "> [$x]"
done
I just have it echo for now, but eventually I would want to use mv to get it to the correct folder. Obviously this doesn't work though because tr sees each character as a delimiter, so if you guys have a better idea I'd appreciate it.
(FYI, I'm running netbook Ubuntu, so if there's a GUI way akin to Windows' search, I would not be against using it)
If the music folder exists, then the following should work:
find /path/to/search -type f -iname "*.mp3" -exec mv {} /path/to/music \;
An -exec command must be terminated with a ; (so you usually need to type \; or ';' to avoid interpretation by the shell) or a +. The difference is that with ;, the command is called once per file; with +, it is called as few times as possible (usually once, but there is a maximum length for a command line, so it might be split up) with all filenames.
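One caveat with + here: find requires {} to be the last argument before the +, while mv wants the destination last, so the batched form needs GNU mv's -t option (an assumption: GNU coreutils):
find /path/to/search -type f -iname '*.mp3' -exec mv -t /path/to/music {} +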
You can do it like this:
find /some/dir -type f -iname '*.mp3' -exec mv \{\} /where/to/move/ \;
The \{\} part will be replaced by the found file name/path. The \; part sets the end for the -exec part, it can't be left out.
If you want to print what was found, just add a -print flag like:
find /some/dir -type f -iname '*.mp3' -print -exec mv \{\} /where/to/move/ \;
HTH

Piping find to find

I want to pipe a find result to a new find. What I have is:
find . -iname "2010-06*" -maxdepth 1 -type d | xargs -0 find '{}' -iname "*.jpg"
Expected result: Second find receives a list of folders starting with 2010-06, second find returns a list of jpg's contained within those folders.
Actual result: "find: ./2010-06 New York\n: unknown option"
Oh darn. I have a feeling it concerns the format of the output that the second find receives as input, but my only idea was to suffix -print0 to first find, with no change whatsoever.
Any ideas?
You need two things: -print0, and more importantly, -I{} on xargs; otherwise the {} doesn't do anything.
find . -iname "2010-06*" -maxdepth 1 -type d -print0 | xargs -0 -I{} find '{}' -iname '*.jpg'
Useless use of xargs.
find 2010-06* -iname "*.jpg"
At least GNU find accepts multiple paths to search. The -maxdepth 1 and -type d restrictions are implicitly taken care of by the shell glob (use 2010-06*/ with a trailing slash to match directories only).
How about
find . -iwholename "./2010-06*/*.jpg"
etc?
Although you did say that you specifically want this find + pipe problem to work, it's inefficient to fork an extra find command. Since you are specifying -maxdepth 1, you are not traversing subdirectories, so just use a for loop with shell expansion.
for file in *2010-06*/*.jpg
do
echo "$file"
done
If you want to find all jpg files inside each of the 2010-06* folders recursively, there is also no need to use multiple finds or xargs
for directory in 2010-06*/
do
find "$directory" -iname "*.jpg" -type f
done
Or just
find 2010-06* -type f -iname "*.jpg"
Or even better, if you have bash 4 and above
shopt -s globstar
shopt -s nullglob
for file in 2010-06*/**/*.jpg
do
echo "$file"
done

Find all files with a filename beginning with a specified string?

I have a directory with roughly 100000 files in it, and I want to perform some function on all files beginning with a specified string, which may match tens of thousands of files.
I have tried
ls mystring*
but this returns the bash error 'Argument list too long'. My next plan was to use
find ./mystring* -type f
but this has the same issue.
The code needs to look something like
for FILE in `find ./mystring* -type f`
do
#Some function on the file
done
Use find with a wildcard:
find . -name 'mystring*'
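To then run your function on each match without the word-splitting problems of the backtick loop, one robust pattern hands each batch of names to an inline shell (a sketch; replace the printf with your real processing):
find . -name 'mystring*' -type f -exec sh -c '
  for file in "$@"; do
    # quoting keeps paths with spaces or newlines intact
    printf "processing %s\n" "$file"
  done
' sh {} +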
ls | grep "^abc"
will give you all files beginning with the string abc (which is what the OP specifically required).
It operates only on the current directory, whereas find recurses into subdirectories.
To use find for only files starting with your string, try
find . -name 'abc*'
If you want to restrict your search to files only, consider using -type f in your search.
You can also use -iname for a case-insensitive search.
Example:
find /path -iname 'yourstring*' -type f
You could also perform operations on the results without a pipe or xargs.
Example:
Search for files and show their size in MB
find /path -iname 'yourstring*' -type f -exec du -sm {} \;
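If you also want a grand total, du's -c flag appends one (assuming GNU du for the -m megabyte unit; on a very long file list, find may run du in several batches, each with its own total line):
find /path -iname 'yourstring*' -type f -exec du -cm {} + | tail -n 1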
