find *.cpp, *.hpp, *.c and *.h files using regex and 'find' command - bash

I have the following:
find . -type f | grep -E '\.(cpp|hpp|c|h)$'
How can I do this using just find command?

find . -type f \( -iname '*.cpp' -o -iname '*.hpp' -o -iname '*.c' -o -iname '*.h' \)
. - searches from the current directory
-type f - matches only entries that are regular files
\( ... \) - groups the name tests; without the group, -type f would apply only to the first -iname, because the implicit AND binds tighter than -o
-iname '*.cpp' - matches the file name case-insensitively against the glob *.cpp; the quotes stop the shell from expanding * to local filenames, and the dot needs no escaping because it is literal in a glob
-o stands for "OR", so multiple conditions can be "ORed"
Also, as Matteo_Ragni stated in his answer, you can use the -regex option, which may be much simpler in your case.
With GNU find you can use -regex with a suitable pattern. Note that a pattern like this does not work:
find . -type f -regex '.*\.[hpp|cpp|c|h]'
The square brackets form a bracket expression that matches a single character (h, p, c or |), so this never matches .cpp or .hpp. As suggested by @MatteoRagni, it is better to use a group with alternation:
find -regex ".*\.\(hpp\|cpp\|c\|h\)$"
Which syntax works depends mostly on which regular expression types your find supports and which one is the default. For more information, refer to the documentation of the -regex and -regextype options and the corresponding regular expression types.
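As a rough check of the points above, the following sketch builds a throwaway directory tree and compares the grouped -iname form, the ungrouped form, and the GNU -regex form. The file names and the variable names (tmp, by_name, by_regex, ungrouped) are invented for the demo, and GNU find is assumed for -regex with the default emacs syntax.

```shell
#!/usr/bin/env bash
# Sketch only: the layout and file names are made up for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub"
touch "$tmp/src/main.cpp" "$tmp/src/util.hpp" "$tmp/src/sub/old.c" \
      "$tmp/src/sub/old.h" "$tmp/src/notes.txt"
mkdir "$tmp/src/trap.h"   # a directory whose name looks like a header

# Grouped -o tests: -type f applies to every alternative.
by_name=$(cd "$tmp" && find . -type f \( -iname '*.cpp' -o -iname '*.hpp' \
    -o -iname '*.c' -o -iname '*.h' \) | LC_ALL=C sort)

# Ungrouped: -type f only guards the first -iname, so the directory slips in.
ungrouped=$(cd "$tmp" && find . -type f -iname '*.cpp' -o -iname '*.hpp' \
    -o -iname '*.c' -o -iname '*.h' | LC_ALL=C sort)

# GNU find -regex (assumed): the regex has to match the whole path.
by_regex=$(cd "$tmp" && find . -type f -regex '.*\.\(hpp\|cpp\|c\|h\)' | LC_ALL=C sort)

printf 'grouped:\n%s\n' "$by_name"
printf 'ungrouped:\n%s\n' "$ungrouped"
rm -rf "$tmp"
```

The grouped and -regex forms should list the same four files, while the ungrouped form also reports the trap.h directory.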

Just for reference, without find command but with bash options and pathname expansion:
shopt -s globstar extglob
ls **/?(*.[ch]pp|*.[ch])
shopt is the bash builtin command to set (with -s) options.
bash option globstar enables the ** pattern in order to browse into nested directories and option extglob enables the ?(...) pattern to match the different file extensions.
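A small sketch of the same idea inside a script, collecting the matches in an array instead of passing them to ls. It uses @(...) (exactly one of the alternatives) rather than ?(...), and adds nullglob so an empty result gives an empty array; both choices, the file names, and the array name sources are assumptions of this example, not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch only: throwaway files with invented names.
shopt -s globstar extglob nullglob

tmp=$(mktemp -d)
mkdir -p "$tmp/sub/deep"
touch "$tmp/a.c" "$tmp/sub/b.hpp" "$tmp/sub/d.cpp" "$tmp/sub/deep/c.h" "$tmp/e.txt"

start_dir=$PWD
cd "$tmp" || exit 1
# ** descends into nested directories; @(...) picks one of the extensions.
sources=( **/*.@([ch]pp|[ch]) )
cd "$start_dir" || exit 1

printf '%s\n' "${sources[@]}"
rm -rf "$tmp"
```

Unlike find, this expands everything in the shell, so very large trees can produce very long argument lists.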

Grep to find all occurrences of .c .h .cpp
grep '\.[chp]' file


Unix shell-scripting: Can find-result be made independent of string capitalization?

First I'm not a star with shell-scripting, more used to programming in Python, but have to work with an existing Python script which calls Unix commands via subprocess.
In this script we use 2 find commands to check if 2 certain strings can be found in an xml file / file-name:
FIND_IN_FILE_CMD: find <directory> -name *.xml -exec grep -Hnl STRING1|STRING2 {} +
FIND_IN_FILENAME_CMD: find <directory> ( -name *STRING1*xml -o -name *STRING2*xml )
The problem we saw is that STRING1 and STRING2 are not always written capitalized.
Now I can do something like STRING1|STRING2|String1|String2|string1|string2 and ( -name *STRING1*xml -o -name *STRING2*xml -o -name *String1*xml -o -name *String2*xml -o -name *string1*xml -o -name *string2*xml ), but I was wondering if there was something more efficient to do this check in one go which basically matches all different writing styles.
Can anybody help me with that?
Both of your commands have syntax errors:
$ find -name *.xml -exec grep -Hnl STRING1|STRING2 {} +
bash: STRING2: command not found
find: missing argument to `-exec'
This is because you cannot have an unquoted | in a shell command, as that is taken as a pipe symbol. As you can see above, the shell tries to execute STRING2 as a command. In any case, grep does not treat | as alternation unless you use the -E flag or, if your grep supports it, the -P flag. For vanilla grep, you need STRING1\|STRING2 (a GNU extension to basic regular expressions), and the pattern still has to be quoted.
All implementations of grep should support the POSIX-mandated -i and -E options:
-E
Match using extended regular expressions. Treat each pattern specified as an ERE, as described in XBD Extended Regular Expressions. If any entire ERE pattern matches some part of an input line excluding the terminating <newline>, the line shall be matched. A null ERE shall match every line.
-i
Perform pattern matching in searches without regard to case; see XBD Regular Expression General Requirements.
This means you can use -i for case insensitive matching and -E for extended regular expressions, making your command:
find <directory> -name '*.xml' -exec grep -iEHnl 'STRING1|STRING2' {} +
Note how I also quoted the *.xml since without the quotes, if any xml files are present in the directory you ran the command in, then *.xml would be expanded by the shell to the list of xml files in that directory.
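To see the corrected command in action, here is a minimal sketch on a scratch directory. The file names and contents are invented, STRING1 and STRING2 stand in for the real search strings, and -H and -n are left out because -l only prints file names anyway.

```shell
#!/usr/bin/env bash
# Sketch only: sample files with made-up contents.
tmp=$(mktemp -d)
printf 'value: String1\n' > "$tmp/a.xml"
printf 'value: string2\n' > "$tmp/b.xml"
printf 'value: other\n'   > "$tmp/c.xml"
printf 'STRING1\n'        > "$tmp/d.txt"   # right content, wrong extension

# -i ignores case, -E enables | alternation, -l lists matching files.
matches=$(find "$tmp" -name '*.xml' -exec grep -iEl 'STRING1|STRING2' {} + | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Only a.xml and b.xml should be listed, regardless of how the strings are capitalized inside them.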
Your next command also has issues:
$ find ( -name *STRING1*xml -o -name *STRING2*xml )
bash: syntax error near unexpected token `-name'
This is because ( has a special meaning in the shell (it opens a subshell), so you need to escape it (\(). As for case-insensitive matching, GNU find, the default on Linux, has an -iname option which is equivalent to -name but case insensitive. If you are using GNU find, then you can do:
find <directory> \( -iname '*STRING1*xml' -o -iname '*STRING2*xml' \)
If your find doesn't have -iname, you are stuck with writing out all possible permutations. In all cases, however, you will need to quote the patterns and escape the parentheses as I have done above.
If you are going to continue using find, just replace -name with the case insensitive version -iname.
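The sketch below runs the -iname version against a few invented file names. It also shows one possible fallback for a find without -iname: spelling out each letter as a bracket expression, which only needs plain -name. That fallback is an addition of this example rather than something from the answers above.

```shell
#!/usr/bin/env bash
# Sketch only: file names are made up for the demo.
tmp=$(mktemp -d)
touch "$tmp/a_string1.xml" "$tmp/B_String2.XML" "$tmp/c_other.xml" "$tmp/string1.txt"

# Case-insensitive match; note the quoted patterns and escaped parentheses.
with_iname=$(cd "$tmp" && find . \( -iname '*STRING1*xml' -o -iname '*STRING2*xml' \) \
    | LC_ALL=C sort)

# Fallback using only -name: each letter becomes a two-character class.
with_brackets=$(cd "$tmp" && find . \
    \( -name '*[Ss][Tt][Rr][Ii][Nn][Gg]1*[Xx][Mm][Ll]' \
    -o -name '*[Ss][Tt][Rr][Ii][Nn][Gg]2*[Xx][Mm][Ll]' \) | LC_ALL=C sort)

printf '%s\n' "$with_iname"
rm -rf "$tmp"
```

Both variants should report the same two .xml files.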

Makefile with find command results in error "paths must precede expression"

I have the following Makefile which should find all .tex files starting with prefix "slides" and then compile all these latex files:
TSLIDES = $(shell find . -maxdepth 1 -iname 'slides*.tex' -printf '%f\n')
TPDFS = $(TSLIDES:%.tex=%.pdf)
all: $(TPDFS)
$(TPDFS): %.pdf: %.tex
	latexmk -pdf $<
However, I keep getting the error messages (I am pretty sure it used to work and am very confused why I am getting this error now...)
/usr/bin/find: paths must precede expression: `slides01-intro.tex'
/usr/bin/find: possible unquoted pattern after predicate `-iname'?
In the manual, I found this
NON-BUGS
Operator precedence surprises
The command find . -name afile -o -name bfile -print will never print
afile because this is actually equivalent to find . -name afile -o \(
-name bfile -a -print \). Remember that the precedence of -a is
higher than that of -o and when there is no operator specified
between tests, -a is assumed.
“paths must precede expression” error message
$ find . -name *.c -print
find: paths must precede expression
Usage: find [-H] [-L] [-P] [-Olevel] [-D ... [path...] [expression]
This happens because *.c has been expanded by the shell resulting in
find actually receiving a command line like this:
find . -name frcode.c locate.c word_io.c -print
That command is of course not going to work. Instead of doing things
this way, you should enclose the pattern in quotes or escape the
wildcard:
$ find . -name '*.c' -print
$ find . -name \*.c -print
But this does not help in my case, as I have used quotes to avoid shell expansion. Any idea how I can fix this? (I have also tried TSLIDES = $(shell find . -maxdepth 1 -iname 'slides*.tex') in the first line of my Makefile, but it exits with the same error.)
EDIT: I am on Windows and use Git Bash (which is based on mingw-w64).
You should always make very clear up-front in questions using Windows, that you're using Windows. Running POSIX-based tools like make on Windows always requires a bit of extra work. But I'm assuming based on the mingw-w64 label that you are, in fact, on Windows.
I tried your example on my GNU/Linux system and it worked perfectly. My suspicion is that your version of GNU make is invoking Windows cmd.exe instead of a POSIX shell like bash. In Windows cmd.exe, the single-quote character ' is not treated like a quote character.
Try replacing your single quotes with double-quotes " and see if it works:
TSLIDES = $(shell find . -maxdepth 1 -iname "slides*.tex" -printf "%f\n")
I'm also not sure if the \n will be handled properly. But you don't really need it, you can just use -print (or even, in GNU find, leave it out completely as it's the default action).
I'm not a Windows person so the above might not help but it's my best guess. If not please edit your question and provide more details about the environment you're using: where you got your version of make, where you're running it from, etc.

Use Find and xargs to delete dups in arraylist

I have arraylist of files and I am trying to use rm with xargs to remove files like:
dups=["test.csv","man.csv","teams.csv"]
How can I pass the complete dups array to find and delete these files?
I want to make changes below to make it work
find ${dups[@]} -type f -print0 | xargs -0 rm
Your find command is wrong.
# XXX buggy: read below
find foo bar baz -type f -print0
means look in the paths foo, bar, and baz, and print any actual files within those. (If one of the paths is a directory, it will find all files within that directory. If one of the paths is a file in the current directory, it will certainly find it, but then what do you need find for?)
If these are files in the current directory, simply
rm -- "${dups[@]}"
(notice also how to properly quote the array expansion).
If you want to look in all subdirectories for files with these names, you will need something like
find . -type f \( -name "test.csv" -o -name "man.csv" -o -name "teams.csv" \) -delete
or perhaps
find . -type f -regextype egrep -regex '.*/(test\.csv|man\.csv|teams\.csv)' -delete
though the -regex features are somewhat platform-dependent (try find -E instead of find -regextype egrep on *BSD/MacOS to enable ERE regex support).
Notice also how find has a built-in predicate -delete so you don't need the external utility rm at all. (Though if you wanted to run a different utility, find -exec utility {} + is still more efficient than xargs. Some really old find implementations didn't have the + syntax for -exec but you seem to be on Linux where it is widely supported.)
Building this command line from an array is not entirely trivial; I have proposed a duplicate which has a solution to a similar problem. But of course, if you are building the command from Java, it should be easy to figure out how to do this on the Java side instead of passing in an array to Bash; and then, you don't need Bash at all (you can pass this to find directly, or at least use sh instead of bash because the command doesn't require any Bash features).
I'm not a Java person, but from Python this would look like
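import subprocess
# Example input; in practice this list comes from the caller
dups = ["test.csv", "man.csv", "teams.csv"]
command = ["find", ".", "-type", "f"]
prefix = "("
for filename in dups:
    # The first iteration opens the group, later ones add -o between the tests
    command.extend([prefix, "-name", filename])
    prefix = "-o"
command.extend([")", "-delete"])
subprocess.run(command, check=True, encoding="utf-8")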
Notice how the backslashes and quotes are not necessary when there is no shell involved.
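For comparison, the same construction can be sketched directly in Bash, assuming dups is a real Bash array. The helper array name_tests and the scratch files are invented for this example, and -delete is assumed to be available (GNU and BSD find have it).

```shell
#!/usr/bin/env bash
# Sketch only: scratch files so nothing real gets deleted.
dups=("test.csv" "man.csv" "teams.csv")

tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/test.csv" "$tmp/a/man.csv" "$tmp/a/b/teams.csv" "$tmp/a/keep.csv"

# Build "-o -name X" for every entry; -name treats X as a glob pattern.
name_tests=()
for name in "${dups[@]}"; do
  name_tests+=(-o -name "$name")
done

# Drop the leading -o with the :1 slice; skip find for an empty array.
if ((${#name_tests[@]})); then
  find "$tmp" -type f \( "${name_tests[@]:1}" \) -delete
fi

remaining=$(cd "$tmp" && find . -type f | LC_ALL=C sort)
printf '%s\n' "$remaining"
rm -rf "$tmp"
```

Because each array element is passed as its own argument, names containing spaces survive intact.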

executable files in linux using (perm)?

I'm trying to write out a list of the names of everything under the /etc directory that is executable by all other users and whose name starts or ends with a number.
find /etc "(" -name "[0-9]*" -o -name "*[0-9]" ")" -perm -o=x -print
But every time I get a wrong answer, can you help?
If you're using the zsh shell, you can get that list of files with its advanced filename generation globbing; no external programs needed. In particular, using a recursive glob, alternation, and a glob qualifier that matches world-executable files:
zsh$ printf "%s\n" /etc/**/([0-9]*|*[0-9])(X)
/etc/alternatives/animate-im6
/etc/alternatives/c89
/etc/alternatives/c99
/etc/alternatives/compare-im6
/etc/alternatives/composite-im6
...
/etc/X11
/etc/X11/fonts/Type1
/etc/xdg/xdg-xubuntu/xfce4
/etc/xdg/xfce4
/etc/xfce4
Do a setopt glob_dots first to match filenames starting with . like find does. Otherwise they get skipped.
If you're using find, you need the -mode form of the argument to -perm to select files with at least the given permission bits (which is actually what you have in your question, and it works for me):
find /etc \( -name "[0-9]*" -o -name "*[0-9]" \) -perm -o=x
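A quick way to convince yourself of the -perm -o=x behaviour is to try it on a scratch directory instead of /etc. The file names and modes below are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: four files with made-up names and modes.
tmp=$(mktemp -d)
touch "$tmp/1start" "$tmp/end2" "$tmp/3private" "$tmp/plain"
chmod 755 "$tmp/1start" "$tmp/end2" "$tmp/plain"   # world-executable
chmod 750 "$tmp/3private"                          # no execute bit for others

# Name starts or ends with a digit AND at least the o=x bit is set.
found=$(cd "$tmp" && find . \( -name '[0-9]*' -o -name '*[0-9]' \) -perm -o=x \
    | LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Here 3private is skipped because others cannot execute it, and plain is skipped because its name has no leading or trailing digit.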

Bash/sed - use sed in variable

I would like to use sed to delete and replace some characters in a bash script.
#!/bin/bash
DIR="."
file_extension=".mkv|.avi|.mp4"
files=`find $DIR -maxdepth 1 -type f -regex ".*\.\(mkv\|avi\|mp4\)" -printf "%f\n"`
In order to simplify $files, I would like to use $file_extension in it, i.e. change .mkv|.avi|.mp4 to mkv\|avi\|mp4
How can I do that with sed ? Or maybe an easier alternative ?
No need for sed; bash has basic substitution operators built in. The basic syntax for a replace-all operation is ${variable//pattern/replacement}, but unfortunately it can't be nested so you need a helper variable. For clarity, I'll even use two:
file_extension_without_dot="${file_extension//./}" # mkv|avi|mp4
file_extension_regex="${file_extension_without_dot//|/\\|}" # mkv\|avi\|mp4
files=$(find "$DIR" -maxdepth 1 -type f -regex ".*\.\($file_extension_regex\)" -printf "%f\n")
If your find supports it, you could also consider using a different -regextype (see find -regextype help) so you don't need quite so many backslashes anymore.
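The two substitutions and the -regextype alternative can be tried side by side as follows. This is a sketch that assumes GNU find (for -regextype, -maxdepth and -printf); the sample files and the names emacs_files and ere_files are invented.

```shell
#!/usr/bin/env bash
# Sketch only: GNU find assumed; sample files are made up.
file_extension=".mkv|.avi|.mp4"

file_extension_without_dot="${file_extension//./}"                    # mkv|avi|mp4
file_extension_regex="${file_extension_without_dot//|/\\|}"           # mkv\|avi\|mp4

tmp=$(mktemp -d)
touch "$tmp/a.mkv" "$tmp/b.avi" "$tmp/c.mp4" "$tmp/d.txt"

# Default (emacs) syntax needs the backslashed group and alternation.
emacs_files=$(find "$tmp" -maxdepth 1 -type f \
    -regex ".*\.\($file_extension_regex\)" -printf '%f\n' | LC_ALL=C sort)

# posix-extended syntax takes the undotted list as is.
ere_files=$(find "$tmp" -maxdepth 1 -type f -regextype posix-extended \
    -regex ".*\.($file_extension_without_dot)" -printf '%f\n' | LC_ALL=C sort)

printf '%s\n' "$file_extension_regex" "$emacs_files"
rm -rf "$tmp"
```

With -regextype posix-extended only the dot removal is needed, which saves the second substitution.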
