Shell script string search where '#' is not in 1st character position - shell

This string search was provided by Paul.R (much appreciated, Paul):
find dir -type f -print0 | xargs -0 grep -F -f strings.txt
Note: I am using the above command to search a directory tree recursively for hard-coded path names within shell scripts. Because of the limitations of my Unix environment (TRU64), I cannot use grep's -r switch for the directory search, hence the solution above.
As an additional criterion, I would like to extend this search to exclude any matching line whose first character is "#" (the comment symbol).
Would appreciate any feedback.
Thanks...Evan

I'm assuming you're just trying to limit the results in the output from the command you posted. If that's so, then how about
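find dir -type f -print0 | xargs -0 grep -F -f strings.txt /dev/null | grep -v '^[^:]*:#'
When grep searches more than one file it prefixes every output line with the file name and a colon, so the final filter skips that prefix before testing for the # character; a plain ^# would only be tested against the start of the file name. The /dev/null operand keeps the prefix present even if xargs happens to pass a single file. This assumes the file names contain no colons.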

This solution will not work if the path and file of the files is contained in strings.txt, but it might work in your situation.
find dir -type f -print0 | xargs -0 grep -v '^#' | grep -F -f strings.txt
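The same idea can be sketched without xargs, which may help where -print0 and -0 are unavailable. This is only an illustration under assumptions: the function name search_uncommented is made up, file names are assumed to contain no colons, and the find in use is assumed to support the `-exec ... {} +` form.

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts):
#   search_uncommented DIR STRINGS_FILE
# Prints "file:line" for every line that contains one of the fixed strings
# and does not start with '#'.
search_uncommented() {
    dir=$1
    strings=$2
    # /dev/null forces grep to print the "file:" prefix even for one file.
    find "$dir" -type f -exec grep -F -f "$strings" /dev/null {} + |
        # Skip the "file:" prefix, then drop lines whose text begins with '#'.
        grep -v '^[^:]*:#'
}
```

Called as `search_uncommented dir strings.txt`, it prints only the matches that are not on comment lines.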

Related

How to find many files from txt file in directory and subdirectories, then copy all to new folder

I can't find posts that help with this exact problem:
On Mac Terminal I want to read a txt file (example.txt) containing file names such as:
20130815 144129 865 000000 0172 0780.bmp
20130815 144221 511 000003 1068 0408.bmp
....100 more
And I want to search for them in a certain folder/subfolders (example_folder). After each find, the file should be copied to a new folder x (new_destination).
Your help would be much appreciated!
Cheers,
Mo
You could use a piped command with a combination of ls, grep, xargs and cp.
So basically you start with getting the list of files
ls
then you filter them with egrep -e, grep -e or whatever flavor of grep Mac uses for their terminal. If you want to find all files ending in .txt you can use the regex \.txt$ (which means ends with '.txt')
ls | egrep -e "yourRegexExpression"
After that you get an output stream, but cp doesn't read file names from standard input; it only takes arguments, which is why we use xargs to convert the stream into arguments. The final step is to add the -t flag to cp to signify that the next argument is the target directory (-t is a GNU cp option; the cp that ships with macOS may not accept it).
ls | egrep -e "yourRegexExpression" | xargs cp -t DIRECTORY
I hope this helps!
Edit
Sorry, I didn't read the question well enough; I have updated the answer to match your problem. Here you can see that the egrep command compiles a rather large regex string with all the file names in this way (filename1|filename2|...|fileN). The $() evaluates the command inside and uses tr to translate newlines to "|" for the regex.
ls | egrep -e "("$(cat yourtextfile.txt | tr "\n" "|")")" | xargs cp -t DIRECTORY
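A hedged alternative to building one large regex: treat each listed name as a fixed string that must match a whole file name. This sidesteps two likely problems with the alternation (the final newline becomes a trailing "|", and the dots in the names are regex metacharacters) and also copes with the spaces in the example names. The function name copy_listed and its argument order are made up for this sketch, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts):
#   copy_listed LIST_FILE SEARCH_DIR DEST_DIR
# Copies every file under SEARCH_DIR whose base name appears as a full line
# in LIST_FILE into DEST_DIR.
copy_listed() {
    list=$1
    src=$2
    dest=$3
    find "$src" -type f | while IFS= read -r path; do
        # -F: fixed strings, -x: the whole line must match, -q: status only.
        if printf '%s\n' "${path##*/}" | grep -F -x -q -f "$list"; then
            cp "$path" "$dest"/
        fi
    done
}
```

For the question's layout this would be called as `copy_listed example.txt example_folder new_destination`.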
You could do something like:
while IFS= read -r name; do
find /search/path -type f -name "$name" -exec cp {} /new/path \;
done < example.txt
This is how it works. For every line within example.txt (read whole, so that names containing spaces stay intact):
while IFS= read -r name
it will try to find a file matching the line $name in the defined path:
find /search/path -type f -name "$name"
And if found it will copy it to the desired location:
-exec cp {} /new/path \;

How to pipe mdfind to grep with a pattern and then cp

I have been trying to come up with a mdfind to locate certain files. I am not using find because it takes too long to search across a windows drive and I am on a Mac. I have indexed using mdutil and now simply want to search for files with the pattern where the file in the path starts with example. "/Volumes/DRIVE/SOME/PATH/DAD14-BLAH-BLAH.jpg". There must be a simpler way to use mdfind to look for a jpg greater than 500k and grep the path with a pattern? Below is the code I have come up with but no results are returned. Any help is deeply appreciated.
cat filelist.txt | while read -r FILE;
do mdfind -onlyin /Volumes/DRIVE/ 'kMDItemKind = "*image" && kMDItemFSSize > 500000' -name "$FILE" -0
| xargs -0 -I{} grep -i -E '.*\/[a-zA-Z]{1,3}[0-9]+.*\.(jpe?g|png|tiff?|psd)' {}
| xargs -0 -I{} cp -a {} ./images; done;
Bass
You don't want to use xargs for the grep command. Doing so means grepping the contents of the found files for matches of the pattern. You want to actually grep the output of mdfind.
That also means you don't want to use -0 with mdfind. You want each file path to be on a separate line, since grep is going to output the matching lines. Therefore, you don't want to use -0 with the final xargs command, either.
You probably want to require that the extension is at the end of the string. And you want the explicit slash (/) in your pattern to be the last slash in the string.
cat filelist.txt | while read -r FILE;
do mdfind -onlyin /Volumes/DRIVE/ 'kMDItemKind = "*image" && kMDItemFSSize > 500000' -name "$FILE" |
grep -i -E '.*\/[a-zA-Z]{1,3}[0-9]+[^/]*\.(jpe?g|png|tiff?|psd)$' |
xargs -I{} cp -a {} ./images; done
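To see the path filter in isolation, the grep stage can be tried on a few sample paths without involving mdfind. A minimal sketch: the function name filter_paths is made up, and the sample paths other than the one from the question are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads one path per line on stdin and keeps those whose
# last component starts with 1-3 letters followed by digits and ends in one
# of the listed image extensions.
filter_paths() {
    grep -i -E '/[a-zA-Z]{1,3}[0-9]+[^/]*\.(jpe?g|png|tiff?|psd)$'
}

printf '%s\n' \
    '/Volumes/DRIVE/SOME/PATH/DAD14-BLAH-BLAH.jpg' \
    '/Volumes/DRIVE/DAD14/notes.txt' \
    '/Volumes/DRIVE/x/ABCD12.jpg' |
    filter_paths
```

Only the first path should pass: the second has the wrong extension and its matching component is not the last one, and the third has four leading letters.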

grep cannot read filename after find folders with spaces

Hi after I find the files and enclose their name with double quotes with the following command:
FILES=$(find . -type f -not -path "./.git/*" -exec echo -n '"{}" ' \; | tr '\n' ' ')
I do a for loop to grep a certain word inside each file that matches find:
for f in $FILES; do grep -Eq '(GNU)' $f; done
but grep complains about each entry that it cannot find file or directory:
grep: "./test/test.c": No such file or directory
whereas echo $FILES produces:
"./.DS_Store" "./.gitignore" "./add_license.sh" "./ads.add_lcs.log" "./lcs_gplv2" "./lcs_mit" "./LICENSE" "./new test/test.js" "./README.md" "./sxs.add_lcs.log" "./test/test.c" "./test/test.h" "./test/test.js" "./test/test.m" "./test/test.py" "./test/test.pyc"
EDIT
found the answer here. works perfectly!
The issue is that your variable contains filenames surrounded by literal " quotes. The shell does not re-parse quotes that come out of a variable expansion, so they become part of each name, and names containing spaces are still split.
But worse, find's -exec cmd {} \; executes cmd separately for each file, which can be inefficient. As mentioned by @TomFenech in the comments, you can use -exec cmd {} + to search as many files within a single cmd invocation as possible.
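The literal-quote problem can be reproduced in a few lines. This is just an illustrative sketch; the function name show_split is made up, and it assumes the default IFS.

```shell
#!/bin/sh
# Hypothetical demo: split a string the way an unquoted $FILES would be split.
show_split() {
    # Unquoted on purpose: word splitting happens on whitespace only,
    # and the " characters are kept as ordinary data.
    set -- $1
    printf '%d words, first word: %s\n' "$#" "$1"
}

show_split '"./new test/test.js" "./test/test.c"'
```

With two quoted names, one of which contains a space, the shell ends up with three words, and the first one still carries its leading quote character.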
A better approach for recursive search is usually to let find output filenames to search, and pipe its results to xargs in order to grep inside as many filenames together as possible. Use -print0 and -0 respectively to correctly support filenames with spaces and other separators, by splitting results by a null character instead - this way you don't need quotes, reducing possibility of bugs.
Something like this:
find . -type f -not -path './.git/*' -print0 | xargs -0 egrep '(GNU)'
However in your question you had grep -q in a loop, so I suspect you may be looking for an error status (found/not found) for each file? If so, you could use -l instead of -q to make grep list matching filenames, and then pipe/send that output to where you need the results.
find . -print0 | xargs -0 egrep -l pattern > matching_filenames
Also note that grep -E (or egrep) uses extended regular expressions, which means parentheses create a regex group. If you want to search for files containing (GNU) (with the parentheses) use grep -F or fgrep instead, which treats the pattern as a string literal.
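The difference between the two modes can be checked on a small scratch file. A minimal sketch; the sample text is invented for illustration.

```shell
#!/bin/sh
# Two sample lines: only the second contains the literal text "(GNU)".
tmp=$(mktemp)
printf 'GNU tools\nlicensed under (GNU) GPL\n' > "$tmp"

# -E: the parentheses form a group, so the pattern is effectively just GNU.
ere_count=$(grep -E -c '(GNU)' "$tmp")
# -F: the pattern is a fixed string, parentheses included.
fixed_count=$(grep -F -c '(GNU)' "$tmp")

echo "lines matched with -E: $ere_count, with -F: $fixed_count"
rm -f "$tmp"
```

Here -E counts both lines while -F counts only the line that really contains the parentheses.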

From UNIX shell, how to find all files containing a specific string, then print the 4th line of each file?

I want to find all files within the current directory that contain a given string, then print just the 4th line of each file.
grep --null -l "$yourstring" * | # List all the files containing your string
xargs -0 -n 1 sed -n '4{p;q;}' # Print the fourth line of each of those files (one sed run per file).
Different editions of grep have slightly different incantations of --null, but it's usually there in some form. Read your manpage for details.
Update: I believe one of the null file list incantations of grep is a reasonable solution that will cover the vast majority of real-world use cases, but to be entirely portable, if your version of grep does not support any null output it is not perfectly safe to use it with xargs, so you must resort to find.
find . -maxdepth 1 -type f -exec grep -q "$yourstring" {} \; -exec sed -n '4{p;q;}' {} \;
Because find arguments can almost all be used as predicates, the -exec grep -q… part filters the files that are eventually fed to sed down to only those that contain the required string.
From other user:
grep -Frl string . | xargs -n 1 sed -n 4p
Give a try to the below GNU find command,
find . -maxdepth 1 -type f -exec grep -l 'yourstring' {} \; | xargs -I {} awk 'NR==4{print; exit}' {}
It finds all the files in the current directory which contains specific string, and prints the line number 4 present in each file.
This while loop should work (in bash):
while read -d '' -r file; do
echo -n "$file: "
sed '4q;d' "$file"
done < <(grep --null -l "some-text" *.txt)
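For comparison, the find-based variant can be wrapped in a small function that runs sed once per matching file, so that "line 4" is counted per file rather than across all files. A sketch under assumptions: the function name fourth_lines is made up, the string is treated as fixed text (-F), and -maxdepth is assumed to be available (it is not part of POSIX find).

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts):
#   fourth_lines STRING DIR
# Prints line 4 of every regular file directly inside DIR that contains STRING.
fourth_lines() {
    # The first -exec acts as a filter; the second runs only for files that
    # passed it. '4{p;q;}' prints line 4 and then stops reading the file.
    find "$2" -maxdepth 1 -type f \
        -exec grep -q -F -e "$1" {} \; \
        -exec sed -n '4{p;q;}' {} \;
}
```

Used as `fourth_lines "$yourstring" .` for the case in the question.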

Bash find filter and copy - trouble with spaces

So after a lot of searching and trying to interpret others' questions and answers to my needs, I decided to ask for myself.
I'm trying to take a directory structure full of images and place all the images (regardless of extension) in a single folder. In addition to this, I want to be able to remove images matching certain filenames in the process. I have a find command working that outputs all the filepaths for me
find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'
but if I try to use that to copy files, I have trouble with the spaces in the filenames.
cp `find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'` out/
What am I doing wrong, and is there a better way to do this?
With the caveat that it won't work if files have newlines in their names:
find . -type f -exec file -i -- {} + |
awk -vFS=: -vOFS=: '$NF ~ /image/{NF--;printf "%s\0", $0}' |
xargs -0 cp -t out/
(Based on answer by Jonathan Leffler and subsequent comments discussion with him and @devnull.)
The find command works well if none of the file names contain any newlines. Within broad limits, the grep command works OK under the same circumstances. The sed command works fine as long as there are no colons in the file names. However, given that there are spaces in the names, the use of $(...) (command substitution, also indicated by back-ticks `...`) is a disaster. Unfortunately, xargs isn't readily a part of the solution; it splits on spaces by default. Because you have to run file and grep in the middle, you can't easily use the -print0 option to (GNU) find and the -0 option to (GNU) xargs.
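The effect of unquoted command substitution on a name with a space can be shown with a tiny experiment. This is only a sketch: count_args is a made-up helper, the file name is the one from the example session, and the default IFS is assumed.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

name='img space.png'

# Unquoted substitution: the result is split on whitespace into two words.
unquoted=$(count_args $(printf '%s\n' "$name"))
# Quoted substitution: the result stays a single argument.
quoted=$(count_args "$(printf '%s\n' "$name")")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Quoting fixes the single-name case, but it does not help when one substitution has to produce several names, which is why the approaches here hand the names over one at a time or NUL-separated.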
In some respects, it is crude, but in many ways, it is easiest if you write an executable shell script that can be invoked by find:
#!/bin/bash
for file in "$@"
do
if file -i -- "$file" | grep -i -q "$file:.*image"
then cp "$file" out/
fi
done
This is a little painful in that it invokes file and grep separately for each name, but it is reliable. The file command is even safe if the file name contains a newline; the grep is probably not.
If that script is called 'copyimage.sh', then the find command becomes:
find . -type f -exec ./copyimage.sh {} +
And, given the way the grep command is written, the copyimage.sh file won't be copied, even though its name contains the magic word 'image'.
Pipe the results of your find command to
xargs -l --replace cp "{}" out/
Example of how this works for me on Ubuntu 10.04:
atomic@atomic-desktop:~/temp$ ls
img.png img space.png
atomic@atomic-desktop:~/temp$ mkdir out
atomic@atomic-desktop:~/temp$ find -type f -exec file -i \{\} \; | grep -i image | sed 's/\:.*//' | xargs -l --replace cp -v "{}" out/
`./img.png' -> `out/img.png'
`./img space.png' -> `out/img space.png'
atomic@atomic-desktop:~/temp$ ls out
img.png img space.png
atomic@atomic-desktop:~/temp$
