Use lines in a file as filenames for grep? - shell

I have a file which contains filenames (and the full path to them) and I want to search for a word within all of them.
some pseudo-code to explain:
grep keyword <all files specified in files.txt>
or
cat files.txt > grep keyword
cat files.txt | grep keyword
The problem is that I can only get grep to search the filenames, not the contents of the actual files.

cat files.txt | xargs grep keyword
or
grep keyword `cat files.txt`
or (equivalent to previous but harder to mis-read)
grep keyword $(cat files.txt)
should do the trick.
Pitfalls:
If files.txt contains file names with spaces, all of these solutions will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
The second (cat) version can result in a very long command line (which might fail when exceeding the limits of your environment). The first (xargs) version handles long input automatically; xargs offers several options to control the details.
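A minimal sketch of the find -print0 / xargs -0 workaround mentioned above, assuming the file list is produced directly by find (the directory is just a placeholder) rather than read from files.txt:
find /some/dir -type f -print0 | xargs -0 grep keyword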

Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.
so:
cat files.txt | xargs grep keyword
is the way to go

tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
tr converts the newline separators to NUL characters, so spaces inside file names are no longer treated as separators (note the corresponding -0 option to xargs).
xargs batches many file names into each grep invocation, and -r prevents it from starting grep at all if there are no files.
LANG=C tells grep to use the fast byte-based matching routines rather than the slower locale-aware ones.
grep -F uses fast fixed-string matching rather than slower regular-expression matching.

bash, ksh & zsh version:
grep keyword $(<files.txt)

It's been a long time since I last wrote a bash shell script, but you could store the result of the first grep (the one finding all filenames) in an array and iterate over it, issuing further grep commands; see the sketch below.
A good starting point should be the bash scripting guide.
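A minimal sketch of that array approach (it assumes bash 4+ for mapfile, and that the names in files.txt contain no embedded newlines):
mapfile -t files < files.txt          # one array element per line of files.txt
for f in "${files[@]}"; do            # quoting keeps spaces inside names intact
    grep keyword "$f"
done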

Related

What is the best way to process multiple lines and extract a large list of specific strings? [duplicate]

I'm after a grep-type tool to search for purely literal strings. I'm looking for the occurrence of a line of a log file, as part of a line in a separate log file. The search text can contain all sorts of regex special characters, e.g., []().*^$-\.
Is there a Unix search utility which would not use regex, but just search for literal occurrences of a string?
You can use grep for that, with the -F option.
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings
That's either fgrep or grep -F which will not do regular expressions. fgrep is identical to grep -F but I prefer to not have to worry about the arguments, being intrinsically lazy :-)
grep -> grep
fgrep -> grep -F (fixed)
egrep -> grep -E (extended)
rgrep -> grep -r (recursive, on platforms that support it).
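A quick, made-up demonstration of the difference:
printf 'axxb\na.*b\n' | grep 'a.*b'      # regex: prints both lines
printf 'axxb\na.*b\n' | grep -F 'a.*b'   # fixed string: prints only the literal a.*b line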
Pass -F to grep.
You can also use awk, since it can match fixed strings and also has programming capabilities, e.g.:
awk '{for(i=1;i<=NF;i++) if($i == "mystring") {print "do data manipulation here"} }' file
cat list.txt
one:hello:world
two:2:nothello
three:3:kudos
grep --color=always -F"hello
three" list.txt
output
one:hello:world
three:3:kudos
I really like the -P flag available in GNU grep for selective ignoring of special characters.
It makes grep -P "^some_prefix\Q[literal]\E$" possible
From the grep manual:
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular
expressions (PCREs). This option is experimental when
combined with the -z (--null-data) option, and grep -P may
warn of unimplemented features.
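For example (the log line is made up purely to illustrate \Q...\E):
printf 'prefix [ERROR] (disk)*\n' | grep -P 'prefix \Q[ERROR] (disk)*\E$'
Everything between \Q and \E is matched literally, so [ ] ( ) * need no escaping.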

Piping the contents of a file to ls

I have a file called "input.txt" that contains one line:
/bin
I would like to make the contents of the file be the input of the command ls
I tried doing
cat input.txt | ls
but it doesn't output the list of files in the /bin directory
I also tried
ls < input.txt
to no avail.
You are looking for the xargs (transpose arguments) command.
xargs ls < input.txt
You say you want /bin to be the "input" to ls, but that's not correct; ls doesn't do anything with its input. Instead, you want /bin to be passed as a command-line argument to ls, as if you had typed ls /bin.
Input and arguments are completely different things; feeding text to a command as input is not the same as supplying that text as an argument. The difference can be blurred by the fact that many commands, such as cat, will operate on either their input or their arguments (or both) – but even there, we find an important distinction: what they actually operate on is the content of files whose names are passed as arguments.
The xargs command was specifically designed to transform between those two things: it interprets its input as a whitespace-separated list of command-line arguments to pass to some other command. That other command is supplied to xargs as its command-line argument(s), in this case ls.
Thanks to the input redirection provided by the shell via <, the arguments xargs supplies to ls here come from the input.txt file.
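A tiny demonstration of that difference, assuming input.txt contains just the line /bin:
cat input.txt | ls       # ls ignores its standard input and lists the current directory
xargs ls < input.txt     # xargs turns the line into an argument, so this runs: ls /bin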
There are other ways of accomplishing the same thing; for instance, as long as input.txt does not have so many files in it that they won't fit in a single command line, you can just do this:
ls $(< input.txt)
Both the above command and the xargs version will treat any spaces in the input.txt file as separating filenames, so if you have filenames containing space characters, you'll have to do more work to interpret the file properly. Also, note that if any of the filenames contain wildcard/"glob" characters like ? or * or [...], the $(<...) version will expand them as wildcard patterns, while xargs will not.
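To see that last point, suppose (hypothetically) that input.txt contained the single line *.txt:
ls $(< input.txt)        # the shell expands *.txt to matching names in the current directory
xargs ls < input.txt     # ls receives the literal argument *.txt, an error unless a file by that name exists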
ls takes the filenames from its command line, not its standard input, which | ls and ls < file would use.
If you have only one file listed in input.txt and the filename doesn't contain trailing newlines, it's enough to use (note quotes):
ls "$(cat input.txt)"
Or, in almost any shell other than plain POSIX sh:
ls "$(< input.txt)"
If there are many filenames in the file, you'd want to use xargs, but to deal with whitespace in the names, use -d "\n" (with GNU xargs) to take each line as a filename.
xargs -d "\n" ls < input.txt
Or, if you need to handle filenames with newlines, you can separate them using NUL bytes in the input, and use
xargs -0 ls < input.txt
(This also works even if there's only one filename.)
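One way such a NUL-separated list could be produced (the directory names are only examples):
printf '%s\0' /bin /usr/bin > input.txt   # each name followed by a NUL byte
xargs -0 ls -d < input.txt                # runs: ls -d /bin /usr/bin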
Try xargs:
cat input.txt | xargs ls

In bash, how to find files which contain the string "test", excluding binary files

find . -type f |xargs grep string |awk -F":" '{print $1}' |uniq
The command above gets the names of all files that contain the string "test", but the result includes binary files.
The problem is how to exclude binary files.
Thank you all.
If I understand properly, you want to get the name of all the files in the directory and its subdirectories that contain the string string, excluding binary files.
Reading grep's friendly manual, I was able to catch this:
-I Process a binary file as if it did not contain matching data;
this is equivalent to the --binary-files=without-match option.
Amazing!
Now how about I get rid of find. Is this possible with just grep? Oh, two lines below, still in the funky manual, I read this:
-R, -r, --recursive
Read all files under each directory, recursively; this is
equivalent to the -d recurse option.
That seems great, doesn't it?
How about getting only the file name? Still in grep's funny manual, I read:
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
Yay! I think we're done:
grep -IlR 'string' .
Remarks.
I also tried to find make me a sandwich in the manual, but my version of grep doesn't seem to support it. YMMV.
The manual is located at man grep.
As William Pursell rightly comments, the -R and -I switches are not available in all implementations of grep. If your grep possesses the make me a sandwich option, it will very likely support the -R and -I switches. YMMV.
The version of Unix that I work with does not support grep -I or grep -R.
I tried the command:
file `find ./` | grep text | cut -d: -f1 | xargs grep "test"
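A slightly more robust sketch of the same idea (it still breaks on file names containing spaces, colons, or newlines):
find . -type f -exec file {} + | grep text | cut -d: -f1 | xargs grep "test"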

assigning pipeline result in a variable

I would like to count the number of *.sav files in the $1 folder.
I tried to use the following command but it doesn't work, what is the problem?
numOfFiles=`ls $1 | grep -o *\.sav | wc -w`
Do not parse the output from ls
Use bash arrays instead.
shopt -s nullglob
sav_files=("$1"/*.sav)
echo "${#sav_files[@]}"
The main problem is that *.sav is not a regular expression, it's a glob. You likely wanted to grep for \.sav, which you have to escape once for the shell and once again because of the antiquated `` syntax:
numOfFiles=`ls $1 | grep -o \\\\.sav | wc -w`
Additionally, you should not parse the output of ls, and wc -w will cause filenames with spaces to be counted multiple times.
A better solution would be to use find:
numOfFiles=$(find "$1" -maxdepth 1 -name '*.sav' | wc -l)
For an even better, Bash specific solution, see 1_CR's answer.
I don't want to chase hypothetical problems in file counting, but parsing the output of ls can sometimes be a bad idea. Try it yourself; create a file:
touch 'aa bb
cc.sav'
This creates a file with a space and a newline in its name. Pretty much any ls parsing will fail on it.
1_CR's solution is nice and elegant.
If for some reason you don't like it, you can use find, and avoid wc, because wc fails if the last line doesn't end with a newline (wc -l doesn't count lines but \n characters).
The following gives the correct answer:
numOfFiles=$(find . -maxdepth 1 -name \*.sav -print0 | grep -zc .)
find with -print0 prints all found files NUL-terminated, and grep -z reads NUL-delimited "lines". The -c simply counts the lines matching the pattern . (any character).
Your regex is wrong: to grep, * is a repetition operator, but you don't specify what to repeat.
Anyway, the grep is unnecessary; just specify the pattern as a shell glob.
numOfFiles=`ls "$1"/*.sav | wc -l`
Note the quotes around $1 and the use of line count rather than word count; wc -w would return a count of two for a file named my file.sav.
If you do use grep for something, it's usually a good idea to put the search pattern in single quotes, so you don't have to backslash-escape shell metacharacters. (Even with the backslash, your pattern would end up unescaped because the shell eats one level of escapes in unquoted strings.)
If you have any matches on *.sav in the current directory, the unquoted regex would be expanded as a glob by the shell, and produce interesting results which could be very hard to debug.
Personally I would recommend using quotes around the variables to prevent spaces being a problem.
And additionally, wc -l is a better solution since it counts lines instead of words (files with spaces might count twice).
So try this instead:
numOfFiles=$(ls -1 "$1/"*.sav | wc -l)

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with NULs instead of newlines. Since file names can have newlines in them, delimiting with newlines makes it impossible to parse the output of grep safely, but NUL is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of whitespace-delimited items and execute a given command, passing as many of those items as possible (one per parameter) to it (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four. This is not safe for file names because, again, file names might contain embedded spaces and newlines.
The -0 switch to xargs changes it from looking for a newline delimiter to a null delimiter. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this so that the specified replacement string is substituted with the input. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
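With those assignments, the output is:
my file
this.is.a.file
no extension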
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
If the following produces the commands you want, pipe its output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\).ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that they match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.
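For example, using the question's placeholders (keep the echo until the printed commands look right):
grep -l '<pattern>' directory/*.extension1 | xargs -I{} echo rm {}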
