Unix shell-scripting: Can find-result be made independent of string capitalization? - shell

First, I'm no expert at shell scripting and am more used to programming in Python, but I have to work with an existing Python script which calls Unix commands via subprocess.
In this script we use 2 find commands to check if 2 certain strings can be found in an xml file / file-name:
FIND_IN_FILE_CMD: find <directory> -name *.xml -exec grep -Hnl STRING1|STRING2 {} +
FIND_IN_FILENAME_CMD: find <directory> ( -name *STRING1*xml -o -name *STRING2*xml )
The problem we saw is that STRING1 and STRING2 are not always written capitalized.
Now I can do something like STRING1|STRING2|String1|String2|string1|string2 and ( -name *STRING1*xml -o -name *STRING2*xml -o -name *String1*xml -o -name *String2*xml -o -name *string1*xml -o -name *string2*xml ), but I was wondering if there was something more efficient to do this check in one go which basically matches all different writing styles.
Can anybody help me with that?

Both of your commands have syntax errors:
$ find -name *.xml -exec grep -Hnl STRING1|STRING2 {} +
bash: STRING2: command not found
find: missing argument to `-exec'
This is because you cannot have an unquoted | in a shell command, as that is taken as a pipe symbol. As you can see above, the shell tries to execute STRING2 as a command. In any case, grep does not treat | as alternation unless you use the -E flag or, if your grep supports it, the -P flag. With plain GNU grep (basic regular expressions) you would need STRING1\|STRING2, quoted so that the shell leaves it alone.
All implementations of grep should support the POSIX-mandated -i and -E options:
-E
Match using extended regular expressions. Treat each pattern specified as an ERE, as described in XBD Extended Regular Expressions. If any entire ERE pattern matches some part of an input line excluding the terminating <newline>, the line shall be matched. A null ERE shall match every line.
-i
Perform pattern matching in searches without regard to case; see XBD Regular Expression General Requirements.
This means you can use -i for case insensitive matching and -E for extended regular expressions, making your command:
find <directory> -name '*.xml' -exec grep -iEHnl 'STRING1|STRING2' {} +
Note how I also quoted the *.xml: without the quotes, if any xml files are present in the directory you ran the command in, *.xml would be expanded by the shell to the list of xml files in that directory.
Your next command also has issues:
$ find ( -name *STRING1*xml -o -name *STRING2*xml )
bash: syntax error near unexpected token `-name'
This is because ( has a special meaning in the shell (it opens a subshell), so you need to escape it (\(). As for case-insensitive matching, GNU find, the default on Linux, has an -iname option which is equivalent to -name but case insensitive. If you are using GNU find, then you can do:
find <directory> \( -iname '*STRING1*xml' -o -iname '*STRING2*xml' \)
If your find doesn't have -iname, you are stuck with writing out all possible permutations. In all cases, however, you will need to quote the patterns and escape the parentheses as I have done above.
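If -iname is unavailable, one way to shorten the list of spellings is a bracket expression per letter, since -name takes shell glob patterns. A minimal sketch, assuming a made-up search string "foo" and a throwaway directory (neither comes from the question):

```shell
# Throwaway sandbox; directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/FOO_report.xml" "$dir/Foo-data.xml" "$dir/bar.xml"

# [Ff][Oo][Oo] matches "foo" in any capitalization using only plain -name.
matches=$(find "$dir" -name '*[Ff][Oo][Oo]*xml' | sort)
printf '%s\n' "$matches"

rm -rf "$dir"
```

Each additional search string becomes one more -o -name test inside the escaped parentheses, as in the command above.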

If you are going to continue using find, just replace -name with the case insensitive version -iname.

Related

Makefile with find command results in error "paths must precede expression"

I have the following Makefile which should find all .tex files starting with prefix "slides" and then compile all these latex files:
TSLIDES = $(shell find . -maxdepth 1 -iname 'slides*.tex' -printf '%f\n')
TPDFS = $(TSLIDES:%.tex=%.pdf)
all: $(TPDFS)
$(TPDFS): %.pdf: %.tex
	latexmk -pdf $<
However, I keep getting the error messages (I am pretty sure it used to work and am very confused why I am getting this error now...)
/usr/bin/find: paths must precede expression: `slides01-intro.tex'
/usr/bin/find: possible unquoted pattern after predicate `-iname'?
In the manual, I found this
NON-BUGS
Operator precedence surprises
The command find . -name afile -o -name bfile -print will never print
afile because this is actually equivalent to find . -name afile -o \(
-name bfile -a -print \). Remember that the precedence of -a is
higher than that of -o and when there is no operator specified
between tests, -a is assumed.
“paths must precede expression” error message
$ find . -name *.c -print
find: paths must precede expression
Usage: find [-H] [-L] [-P] [-Olevel] [-D ... [path...] [expression]
This happens because *.c has been expanded by the shell resulting in
find actually receiving a command line like this:
find . -name frcode.c locate.c word_io.c -print
That command is of course not going to work. Instead of doing things
this way, you should enclose the pattern in quotes or escape the
wildcard:
$ find . -name '*.c' -print
$ find . -name \*.c -print
But this does not help in my case, as I have used quotes to avoid shell expansion. Any idea how I can fix this? (I have also tried TSLIDES = $(shell find . -maxdepth 1 -iname 'slides*.tex') in the first line of my Makefile, but it exits with the same error.)
EDIT: I am on Windows and use Git Bash (which is based on mingw-w64).
You should always make very clear up-front in questions using Windows, that you're using Windows. Running POSIX-based tools like make on Windows always requires a bit of extra work. But I'm assuming based on the mingw-w64 label that you are, in fact, on Windows.
I tried your example on my GNU/Linux system and it worked perfectly. My suspicion is that your version of GNU make is invoking Windows cmd.exe instead of a POSIX shell like bash. In Windows cmd.exe, the single-quote character ' is not treated like a quote character.
Try replacing your single quotes with double-quotes " and see if it works:
TSLIDES = $(shell find . -maxdepth 1 -iname "slides*.tex" -printf "%f\n")
I'm also not sure if the \n will be handled properly. But you don't really need it, you can just use -print (or even, in GNU find, leave it out completely as it's the default action).
I'm not a Windows person so the above might not help but it's my best guess. If not please edit your question and provide more details about the environment you're using: where you got your version of make, where you're running it from, etc.

How to expand $() inside find -exec command

I have a mongodump which I want to import, and I'm trying to do this using the find command. Something like this:
find *.bson -type f -exec echo mongoimport --db=abc --collection=$(echo '{}' | sed s/.bson//g) {} \;
The command substitution isn't evaluated the way I expected. What I need is
mongoimport --db=abc --collection=a a.bson
but what I'm getting is
mongoimport --db=abc --collection=a.bson a.bson
My attempt at using sed to strip the .bson suffix from '{}' isn't working. I know it's not a blocker, but I'd like to know whether this is possible.
Any suggestions?
The problem is twofold:
Shell expansions: Before a command is executed in a shell environment, the shell (sh/bash/ksh/zsh) will perform a sequence of expansions to build up the actual command that is being executed. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion. Hence, before the find command will be executed, it will perform all substitutions, including the command substitution located in the exec statement. Ergo, the command is equivalent to:
$ find *.bson -type f -exec echo mongoimport --db=abc --collection={} {} \;
A way forward would be to prohibit the command substitution by using single-quotes, however this leads to problem two.
find's exec statement is limited: The command that -exec can execute is limited to an external utility with optional arguments. Various shell features are therefore not recognized. Using shell built-ins, functions, conditionals, pipelines, redirections etc. directly with -exec is not possible unless they are wrapped in something like a sh -c child shell.
Hence the answer would be something in the line of:
$ find *.bson -type f -exec /usr/bin/sh -c 'echo mongoimport --db=abc --collection=$(echo {} | sed s/.bson//g) {}' \;
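Embedding {} inside the sh -c string is fragile, because a filename containing quotes or $ would be interpreted as shell code. A hedged variant of the same idea passes the found paths as positional parameters and strips the suffix with basename instead of sed. The sandbox directory and file names below are assumptions for illustration, and the commands are only echoed, as in the question:

```shell
# Throwaway sandbox; file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/a.bson" "$dir/my data.bson"

# The child sh receives the found paths as "$@"; the trailing "sh" fills $0.
cmds=$(find "$dir" -type f -name '*.bson' -exec sh -c '
  for f in "$@"; do
    echo mongoimport --db=abc --collection="$(basename "$f" .bson)" "$f"
  done' sh {} + | sort)
printf '%s\n' "$cmds"

rm -rf "$dir"
```

Dropping the echo would run mongoimport for real; with -exec ... {} + the child shell is started once per batch of files rather than once per file.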
A different strategy for this problem:
Use find with the -printf option (GNU find) to generate the commands.
The result is a list of commands to execute, one per line.
After inspecting and testing the commands, save the find output into a file and run that file as a bash script.
Or pipe the output directly into bash.
1. Inspect the find result:
find . -type f -name "*.bson" -printf "mongoimport --db=abc --collection=%f %f\n" | sed 's/\.bson//'
Note that sed replaces only the first .bson match on each line, which is the one in the collection name. Do not use the g option.
2. Run the inspected find output:
find . -type f -name "*.bson" -printf "mongoimport --db=abc --collection=%f %f\n" | sed 's/\.bson//' | bash

find *.cpp, *.hpp, *.c and *h files using regex and 'find' command

I have the following:
find . -type f | grep -E '\.(cpp|hpp|c|h)$'
How can I do this using just find command?
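find . -type f \( -iname '*.cpp' -o -iname '*.hpp' -o -iname '*.c' -o -iname '*.h' \)
. - searches the current directory
-type f - matches only regular files
-iname '*.cpp' - matches the file name case-insensitively against a glob: * stands for any string, followed by the literal extension. The pattern is quoted so the shell does not expand it to local filenames; the . needs no escaping because it is literal in a glob.
-o stands for "OR", so multiple conditions can be ORed
\( ... \) - groups the -iname tests so that -type f applies to all of them; without the parentheses, the implicit "AND" binds tighter than -o and -type f would only apply to the first -iname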
Also, as Matteo_Ragni stated in his answer, you can use the -regex option, which may be much simpler in your case.
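With GNU find you can use -regex with a suitable pattern. Note that -regex matches against the whole path, and that alternation needs a group, not a bracket expression ([hpp|cpp|c|h] would match just one character):
find . -type f -regextype posix-extended -regex '.*\.(hpp|cpp|c|h)'
As suggested by @MatteoRagni, the same pattern in find's default (Emacs-style) regex syntax is: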
find -regex ".*\.\(hpp\|cpp\|c\|h\)$"
It depends mostly on which regular expression types your find supports and which one is the default. For more information, refer to the documentation of the -regex and -regextype options and the corresponding regular expression types.
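For a find without -regex support, bracket expressions in plain -name patterns can cover the same four extensions. This is a sketch under the assumption that case-sensitive matching is acceptable; the sandbox and file names are made up:

```shell
# Throwaway sandbox; file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/main.cpp" "$dir/util.hpp" "$dir/old.c" "$dir/api.h" \
      "$dir/notes.txt" "$dir/x.cc"

# *.[ch] covers .c and .h; *.[ch]pp covers .cpp and .hpp.
sources=$(find "$dir" -type f \( -name '*.[ch]' -o -name '*.[ch]pp' \) | sort)
printf '%s\n' "$sources"

rm -rf "$dir"
```

The parentheses matter here for the same precedence reason as with the longer -o chain.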
Just for reference, without find command but with bash options and pathname expansion:
shopt -s globstar extglob
ls **/?(*.[ch]pp|*.[ch])
shopt is the bash builtin command to set (with -s) options.
bash option globstar enables the ** pattern in order to browse into nested directories and option extglob enables the ?(...) pattern to match the different file extensions.
To filter a list of names stored in a file for .c, .h, .cpp and .hpp entries with grep:
grep -E '\.(cpp|hpp|c|h)$' file

combine find and grep into a single command

How to combine below two into one line without changing the first one?
# find / -name sshd_config -print
# grep -i permitrootlogin <sshd_config path>
I came up with the following, but don't know whether it gives the correct result in different cases
cat `find / -name sshd_config -print` |grep permitrootlogin
Don't do cat $(...) [$() is the modern replacement for backticks] -- that doesn't work reliably if your filenames contain special characters (spaces, wildcards, etc).
Instead, tell find to invoke cat for you, with as many filenames passed to each cat invocation as possible:
find / -name sshd_config -exec cat -- '{}' + | grep permitrootlogin
...or, even better, ignore cat altogether and just pass the filenames to grep literally:
find / -name sshd_config -exec grep -h -e permitrootlogin -- /dev/null '{}' +
Replace the -h with -H if you want filenames to be shown.
You could do something like that:
find / -name "somefilename" -print0 | xargs -0 grep "something"
The xargs command turns what it reads on standard input into arguments for grep.
I guess what you want is to use the output of find / -name sshd_config -print (which should be the path of the sshd_config file) as the second argument to grep (so that the sshd_config file gets searched for your string).
There are several ways to achieve this.
Commands in back-quotes (`) are replaced by their output. So
grep permitrootlogin `find / -name sshd_config -print`
will be replaced by
grep permitrootlogin /path/to/the/sshd_config
which will search /path/to/the/sshd_config for permitrootlogin.
The same happens with
grep permitrootlogin $(find / -name sshd_config -print)
As another answer already mentions, this syntax has some advantages over the back-ticks. Namely, it can be nested.
However, this still runs into a problem when the path where the file is found contains spaces. As both backticks and $(...) just perform text substitution, such a path would be passed as several arguments to grep, each probably being an invalid path. (/path/to the/sshd_config would become /path/to and the/sshd_config.)
Rather than working around this with fancy quoting and escaping strategies, remember that UNIX commands were already designed for being used in combination, usually by pipes. Indeed find has a -print0 action which will separate paths of found files by \0, so that they can be distinguished from paths containing whitespace. Alas, grep can't process a zero-delimited list of files and still wants the files to search as invocation arguments, not on stdin.
This is where xargs comes into play. It applies stuff it gets on stdin as arguments to other commands. And with its -0 option, it interprets stdin as a zero-delimited list instead of treating whitespace as delimiters.
So
find / -name sshd_config -print0 | xargs -0 grep permitrootlogin
should have you covered.
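The splitting problem described above can be reproduced in a sandbox. This is only a sketch: the directory layout and file content are made up, and -i is added because the real directive is usually spelled PermitRootLogin:

```shell
# Sandbox with a space in the path; all names here are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/my dir"
echo 'PermitRootLogin no' > "$dir/my dir/sshd_config"

# Unquoted substitution: the path is split at the space, so grep is handed
# two names that do not exist and fails.
if grep -i permitrootlogin $(find "$dir" -name sshd_config) 2>/dev/null; then
  unquoted=worked
else
  unquoted=failed
fi

# NUL-delimited hand-off keeps the path intact as one argument.
safe=$(find "$dir" -name sshd_config -print0 | xargs -0 grep -ih permitrootlogin)

printf 'unquoted substitution: %s\n' "$unquoted"
printf 'print0 | xargs -0:     %s\n' "$safe"

rm -rf "$dir"
```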
| is a pipe, which means that the standard output of cat `find / -name sshd_config -print` goes to the standard input of grep permitrootlogin, so you just have to be sure what the output of the first command is.

Unix find: list of files from stdin

I'm working in Linux & bash (or Cygwin & bash).
I have a huge--huge--directory structure, and I have to find a few needles in the haystack.
Specifically, I'm looking for these files (20 or so):
foo.c
bar.h
...
quux.txt
I know that they are in a subdirectory somewhere under . (the current directory).
I know I can find any one of them with
find . -name foo.c -print. This command takes a few minutes to execute.
How can I print the names of these files with their full directory name? I don't want to execute 20 separate finds--it will take too long.
Can I give find the list of files from stdin? From a file? Is there a different command that does what I want?
Do I have to first assemble a command line for find with -o using a loop or something?
If your directory structure is huge but not changing frequently, it is good to run
cd /to/root/of/the/files
find . -type f -print > ../LIST_OF_FILES.txt # the next one is sometimes handy too
find . -type d -print > ../LIST_OF_DIRS.txt
After that you can find anything really fast (with grep, sed, etc.) and update the file lists only when the tree changes. (It is a simplified replacement if you don't have locate.)
So,
grep '/foo\.c$' ../LIST_OF_FILES.txt # list every foo.c in the tree
When you want to find a list of files, you can try the following:
grep -F -f wanted_file_list.txt < ../LIST_OF_FILES.txt
or directly with the find command:
find . -type f -print | grep -F -f wanted_file_list.txt
The -f option tells grep to read the patterns from a file (one per line), and -F (the modern spelling of fgrep) treats them as fixed strings, so you can easily search the input for multiple names at once.
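Fixed-string matching finds the wanted names anywhere in a path, so foo.c would also hit foo.cpp or xfoo.c. If exact file names matter, one possible refinement is to compare only the last path component with awk. A sketch, with a made-up tree and list file:

```shell
# Throwaway sandbox; the tree and the wanted list are hypothetical.
dir=$(mktemp -d)
mkdir -p "$dir/src/deep"
touch "$dir/src/foo.c" "$dir/src/deep/bar.h" "$dir/src/foo.cpp" "$dir/src/xfoo.c"
printf '%s\n' foo.c bar.h > "$dir/wanted_file_list.txt"

# First input (the list): remember each wanted name.
# Second input ("-", stdin): print paths whose last /-separated field is wanted.
found=$(cd "$dir" && find . -type f -print |
  awk -F/ 'NR == FNR { want[$0]; next } $NF in want' wanted_file_list.txt - |
  sort)
printf '%s\n' "$found"

rm -rf "$dir"
```

The same awk filter works on a saved file list in place of the live find output.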
You shouldn't need to run find twenty times.
You can construct a single command with multiple -name tests joined by -o:
find . \( -name 'file1' -o -name 'file2' -o -name 'file3' \) -exec echo {} \;
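With twenty or so names, the -o chain can be assembled from a list instead of typed by hand. A minimal sketch in plain sh, assuming a hypothetical wanted.txt with one file name per line (names containing glob characters would be treated as patterns by -name):

```shell
# Throwaway sandbox; the tree and wanted.txt are hypothetical.
dir=$(mktemp -d)
mkdir -p "$dir/a/b"
touch "$dir/a/foo.c" "$dir/a/b/quux.txt" "$dir/a/other.c"
printf '%s\n' foo.c bar.h quux.txt > "$dir/wanted.txt"

# Build: -name foo.c -o -name bar.h -o -name quux.txt
set --
while IFS= read -r name; do
  [ -n "$name" ] || continue          # skip blank lines
  if [ "$#" -eq 0 ]; then
    set -- -name "$name"
  else
    set -- "$@" -o -name "$name"
  fi
done < "$dir/wanted.txt"

# A single traversal checks every wanted name.
hits=$(find "$dir/a" -type f \( "$@" \) -print | sort)
printf '%s\n' "$hits"

rm -rf "$dir"
```

Using the positional parameters as the argument list keeps each name a separate word, so names with spaces survive intact.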
Is the locate(1) command an acceptable answer? Nightly it builds an index, and you can query the index quite quickly:
$ time locate id_rsa
/home/sarnold/.ssh/id_rsa
/home/sarnold/.ssh/id_rsa.pub
real 0m0.779s
user 0m0.760s
sys 0m0.010s
I gave up executing a similar find command in my home directory at 36 seconds. :)
If nightly doesn't work, you could run the updatedb(8) program by hand once before running locate(1) queries. /etc/updatedb.conf (updatedb.conf(5)) lets you select specific directories or filesystem types to include or exclude.
Yes, assemble your command line.
Here's a way to process a list of files from stdin and assemble your (FreeBSD) find command to use extended regular expression matching (n1|n2|n3).
For GNU find you may have to use one of the following options to enable extended regular expression matching:
-regextype posix-egrep
-regextype posix-extended
echo '
foo\\.c
bar\\.h
quux\\.txt
' | xargs bash -c '
IFS="|";
find -E "$PWD" -type f -regex "^.*/($*)$" -print
echo find -E "$PWD" -type f -regex "^.*/($*)$" -print
' arg0
# note: "$*" uses the first character of the IFS variable as array item delimiter
(
IFS='|'
set -- 1 2 3 4 5
echo "$*" # 1|2|3|4|5
)
