How to print the file names from which I grep some lines - bash

I'm trying to get some lines from several json files using the following code:
cat $(find ./*/*/folderA/*DTI*.json) | grep -i -E '(phaseencodingdirection|phaseencodingaxis)' > phase_direction
It worked! The problem is that I don't know which line comes from which file.
With this find ./*/*/preprocessing/*DTI*.json -type f -printf "%f\n" I can print those names, but they appear at the end and not in order with their respective phaseencodingdirection|phaseencodingaxis extracted lines.
I don't know how to combine those commands so that each extracted line is printed together with the name of the file it came from.
Could you help me?

the problem is that I don't know which line comes from which file
Well no, you don't, because you have concatenated the contents of all the files into a single stream. If you want to be able to identify at the point of pattern matching which file each line comes from then you have to give that information to grep in the first place. Like this, for example:
find ./*/*/folderA/*DTI*.json |
xargs grep -i -E -H '(phaseencodingdirection|phaseencodingaxis)' > phase_direction
The xargs program converts lines read from its standard input into arguments to the specified command (grep in this case). The -H option to grep causes it to list the filename of each match along with the matching line itself.
Alternatively, this variation on the same thing is a little simpler, and closer in some senses to the original:
grep -i -E -H '(phaseencodingdirection|phaseencodingaxis)' \
$(find ./*/*/folderA/*DTI*.json) > phase_direction
That takes xargs out of the picture, and moves the command substitution directly to the argument list of grep.
But now observe that if the pattern ./*/*/folderA/*DTI*.json does not match any directories then find isn't actually doing anything useful for you. There is then no directory recursion to be done, and you haven't specified any tests, so the command substitution will simply expand to all the paths that match the pattern, just like the pattern would do if expanded without find. Thus, this is probably best of all:
grep -i -E -H '(phaseencodingdirection|phaseencodingaxis)' \
./*/*/folderA/*DTI*.json > phase_direction
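To see what -H adds to the output, here is a small self-contained sketch. The directory names (sub-01, ses-1) and the JSON contents are made up for illustration; only the grep invocation is taken from the answer above.

```shell
#!/usr/bin/env bash
# Build a throwaway tree that mimics the ./*/*/folderA/*DTI*.json layout.
# All directory names, file names and JSON values here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sub-01/ses-1/folderA" "$demo/sub-02/ses-1/folderA"
printf '{\n  "PhaseEncodingDirection": "j-"\n}\n' > "$demo/sub-01/ses-1/folderA/a_DTI_run1.json"
printf '{\n  "PhaseEncodingAxis": "j"\n}\n' > "$demo/sub-02/ses-1/folderA/b_DTI_run1.json"

# Same command as above, run from inside the demo tree.
# Every matching line is prefixed with the path of the file it came from.
matches=$(cd "$demo" && grep -i -E -H '(phaseencodingdirection|phaseencodingaxis)' ./*/*/folderA/*DTI*.json)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Each output line has the form path:matching line, so the file name and the extracted line stay together.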

Use the filenames as arguments to grep rather than cat.
grep -i -H -E '(phaseencodingdirection|phaseencodingaxis)' $(find ./*/*/folderA/*DTI*.json) > phase_direction
The -H option forces grep to include filenames in the output even if there's only one file.
But since your arguments to find are filenames, not directories to search recursively, there's no need to use it at all. Just pass the wildcard directly to grep. There's also no need to begin with ./. Any non-absolute pathname is interpreted relative to the current directory.
grep -i -H -E '(phaseencodingdirection|phaseencodingaxis)' */*/folderA/*DTI*.json > phase_direction

You may use recursive grep:
grep -iER 'phaseencodingdirection|phaseencodingaxis' --include='*DTI*.json' */*/folderA

Related

Grep to Print all file content [duplicate]

How can I modify grep so that it prints the full file if any of its lines matches the pattern, instead of printing just the matching line?
I tried using (say) grep -C2 to print two lines above and two below the match, but this doesn't always work because the number of lines is not fixed.
I am not just searching a single file; I am searching an entire directory where some files may contain the given pattern, and I want those files to be printed completely.
I am also running grep on the result of a previous grep, without the first grep's output being printed.
Simple grep + cat combination:
grep -q 'pattern' file && cat file
Use grep's -l option to list the paths of files with matching contents, then print the contents of these files using cat.
grep -lR 'regex' 'directory' | xargs -d '\n' cat
The command from above cannot handle filenames with newlines in them.
To overcome the filename with newlines issue and also allow more sophisticated checks you can use the find command.
The following command prints the content of all regular files in directory.
find 'directory' -type f -exec cat {} +
To print only the content of files whose content matches the regexes regex1 and regex2, use
find 'directory' -type f \
-exec grep -q 'regex1' {} \; -and \
-exec grep -q 'regex2' {} \; \
-exec cat {} +
The linebreaks are only for better readability. Without the \ you can write everything into one line.
Note the -q for grep. That option suppresses grep's output. grep's exit status tells find whether to print a file or not.
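A minimal sketch of the chained -exec tests in a scratch directory. The file names and contents are invented for the demonstration; the find expression follows the one above, using the implicit "and" between primaries.

```shell
#!/usr/bin/env bash
# Hypothetical sample files: only both.txt contains both patterns.
demo=$(mktemp -d)
printf 'alpha\nbeta\n' > "$demo/both.txt"
printf 'alpha only\n'  > "$demo/one.txt"
printf 'gamma\n'       > "$demo/none.txt"

# Each "-exec grep -q ... \;" acts as a test through grep's exit status.
# cat only receives the files that passed both tests.
out=$(find "$demo" -type f \
  -exec grep -q 'alpha' {} \; \
  -exec grep -q 'beta' {} \; \
  -exec cat {} +)
printf '%s\n' "$out"

rm -rf "$demo"
```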

Passing filepaths containing spaces with xargs

I'm trying to use xargs to pass the contents of a variable containing zero or more filepaths separated by newlines to another command and have been having inconsistent success.
My input is the output of this:
newHTK=`grep -Fxv -f $TMPFILE /Users/foo/.htk`
Which generates the aforementioned list of filenames. Here's where things go wrong (or sometimes inexplicably right):
echo "$newHTK" | xargs -L 1 xattr -w com.apple.metadata:kMDItemFinderComment htk
The intention is to use each line in $newHTK as a filename argument for xattr. What usually happens is xattr splits the input at the spaces. I think I might need to escape the filenames coming out of the echo command or somehow enclose them in double quotation marks (any advice on an easy way to do this would be appreciated). But if that's the case, why did it work for some of the files?
You can use the xargs -I flag to do this (if you have it; I don't know how portable it is).
grep -Fxv -f $TMPFILE /Users/foo/.htk | xargs -I % xattr -w com.apple.metadata:kMDItemFinderComment htk %
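The effect of -I on paths with spaces can be checked without touching extended attributes. In this sketch ls -d stands in for the xattr call, and the two file names are hypothetical.

```shell
#!/usr/bin/env bash
# Two sample files, one with a space in its name (names are made up).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"
list=$(printf '%s\n' "$demo/plain.txt" "$demo/with space.txt")

# With -I, xargs treats every input line as one item, so the embedded
# space is not used as a separator. ls -d is only a stand-in for xattr.
out=$(printf '%s\n' "$list" | xargs -I % ls -d %)
printf '%s\n' "$out"

rm -rf "$demo"
```

If the space had been treated as a separator, ls would have reported missing files instead of echoing both paths back.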

In bash, how to batch show the text of certain line in files?

I want to batch show the text of a certain line of the files in a certain directory. Usually this can be done with the following commands:
for file in `find ./ -name "results.txt"`;
do
sed -n '12p' < ${file};
done
The 12th line of each file named "results.txt" contains the text I want to output.
But I wonder whether we can use a pipeline to do this operation. I have tried the following command:
find ./ -name "results.txt" | xargs sed -n '12p'
or
find ./ -name "results.txt" | xargs sed -n '12p' < {} \;
But neither works.
Could you give some advice or recommend some references, please?
All are welcome. Thanks in advance!
This should do it
find ./ -name results.txt -exec sed '12!d' {} ';'
@Steven Penny's answer is the most elegant and best-performing solution, but to shed some light on why your solution didn't work:
find ./ -name "results.txt" | xargs sed -n '12p'
causes all filenames(1) to be passed at once(2) to sed. Since sed counts lines cumulatively, across input files, only 1 line will be printed for all input files, namely line 12 from the first input file.
Keeping in mind that find's -exec action is the best solution, if you still wanted to solve this problem with xargs, you'd have to use xargs' -I option as follows, so as to ensure that sed is called once per input line (filename) (% is a self-chosen placeholder):
find ./ -name "results.txt" | xargs -I % sed '12q;d' %
Footnotes:
(1) with word splitting applied, which would break with paths with embedded spaces, but that's a separate issue.
(2) assuming they don't make the entire command exceed the max. length of a command line; either way, multiple filenames are passed at once.
As an aside: parsing command output with for as in your first snippet is NEVER a good idea - see http://mywiki.wooledge.org/ParsingLs and http://mywiki.wooledge.org/BashFAQ/001
Your use of xargs results in running sed with multiple file arguments. But as you can see, sed doesn't reset the record number to 1 when it starts reading a new file. For example, try running the following command against files with more than 12 lines each.
sed -n '12p' x.txt y.txt
If you want to use xargs, you might consider using awk:
find . -name 'results.txt' | xargs awk 'FNR==12'
P.S: I personally like using the for loop.
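The difference between sed's cumulative line count and awk's per-file FNR is easy to reproduce. A minimal sketch with two three-line files, using line 2 instead of line 12 to keep the sample short; the file names follow the x.txt and y.txt example above.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'x1\nx2\nx3\n' > "$demo/x.txt"
printf 'y1\ny2\ny3\n' > "$demo/y.txt"

# sed numbers lines across all input files, so "line 2" exists only once.
sed_out=$(sed -n '2p' "$demo/x.txt" "$demo/y.txt")

# awk's FNR restarts at 1 for each file, so every file contributes its line 2.
awk_out=$(awk 'FNR==2' "$demo/x.txt" "$demo/y.txt")

printf 'sed:\n%s\n' "$sed_out"
printf 'awk:\n%s\n' "$awk_out"

rm -rf "$demo"
```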

In bash, how to find files which there is "test" string in,Exclude binary files

find . -type f |xargs grep string |awk -F":" '{print $1}' |uniq
The command above gets the names of all files that contain the string "test", but the result includes binary files.
The problem is how to exclude binary files.
Thank you all.
If I understand properly, you want to get the name of all the files in the directory and its subdirectories that contain the string string, excluding binary files.
Reading grep's friendly manual, I was able to catch this:
-I     Process a binary file as if it did not contain matching data;
       this is equivalent to the --binary-files=without-match option.
Amazing!
Now how about I get rid of find. Is this possible with just grep? Oh, two lines below, still in the funky manual, I read this:
-R, -r, --recursive
       Read all files under each directory, recursively; this is
       equivalent to the -d recurse option.
That seems great, doesn't it?
How about getting only the file name? Still in grep's funny manual, I read:
-l, --files-with-matches
       Suppress normal output; instead print the name of each input
       file from which output would normally have been printed. The
       scanning will stop on the first match. (-l is specified by
       POSIX.)
Yay! I think we're done:
grep -IlR 'string' .
Remarks.
I also tried to find make me a sandwich in the manual, but my version of grep doesn't seem to support it. YMMV.
The manual is located at man grep.
As William Pursell rightly comments, the -R and -I switches are not available in all implementations of grep. If your grep possesses the make me a sandwich option, it will very likely support the -R and -I switches. YMMV.
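A quick way to check whether a given grep honours -I is to try it on one text file and one file containing a NUL byte. This is only a sketch: the file names are invented, and it assumes a grep that supports -I and -R (GNU and BSD grep do; others may not, as noted above).

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'a test line\n'   > "$demo/notes.txt"
printf 'test\000data\n'  > "$demo/blob.bin"   # the NUL byte marks it as binary

# Without -I both files are listed; with -I the binary file is skipped.
with_binary=$(cd "$demo" && grep -lR 'test' . | sort)
text_only=$(cd "$demo" && grep -IlR 'test' . | sort)

printf 'without -I:\n%s\n' "$with_binary"
printf 'with -I:\n%s\n' "$text_only"

rm -rf "$demo"
```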
The version of Unix that I work with does not support grep -I or grep -R.
I tried the command:
file `find ./` | grep text | cut -d: -f1 | xargs grep "test"

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs reads items from its standard input and executes a given command (echo if no command is given), passing as many of those items as fit as its arguments. By default the items are delimited by blanks and newlines. Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four: the line two three is split into two separate arguments. This is not safe for file names because they may contain spaces and, again, embedded newlines.
The -0 switch to xargs changes the delimiter from whitespace to the null character. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
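The -0 and -I modes can also be compared by counting the arguments the command actually receives. In this sketch sh -c is only a probe and the underscore fills $0; neither is part of the solution itself.

```shell
#!/usr/bin/env bash
# With -0, items are split on NUL only, so "two three" stays one argument
# and all three items are passed to a single command invocation.
with_null=$(printf 'one\0two three\0four\0' | xargs -0 sh -c 'echo "$#"' _)

# With -I, the command runs once per input line and {} is replaced by
# the whole line, so every invocation sees exactly one argument.
with_replace=$(printf 'one\ntwo three\nfour\n' | xargs -I{} sh -c 'echo "$#:$1"' _ {})

printf '%s\n' "$with_null" "$with_replace"
```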
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the argument to -c and is the script which will be executed by bash. Any arguments following the script are assigned as the arguments to the script, starting at $0. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
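The way bash -c assigns the trailing arguments can be verified directly. A small sketch, using a made-up file name in place of the value xargs would substitute for {}:

```shell
#!/usr/bin/env bash
# The first argument after the script becomes $0, the next one $1.
# 'my file.extension1' is a hypothetical stand-in for the {} replacement.
out=$(bash -c 'printf "%s\n" "$0" "$1" "${1%.*}.extension2"' -- 'my file.extension1')
printf '%s\n' "$out"
```

The three printed lines are the dummy $0, the file name as received in $1, and the derived name with the alternate extension.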
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
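Before running a deleting pipeline on real data it can be tried in a scratch directory. The sketch below uses invented file names and the pattern needle; the pipeline itself follows the one described above.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox: one matching pair and one pair that must survive.
demo=$(mktemp -d)
mkdir "$demo/directory"
printf 'needle\n' > "$demo/directory/my file.extension1"
printf 'other\n'  > "$demo/directory/my file.extension2"
printf 'hay\n'    > "$demo/directory/keep.extension1"
printf 'other\n'  > "$demo/directory/keep.extension2"

# NUL-delimited file names from grep, one bash invocation per name.
grep --null -l 'needle' "$demo"/directory/*.extension1 |
  xargs -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}

remaining=$(ls "$demo/directory")
printf '%s\n' "$remaining"

rm -rf "$demo"
```

Only the keep.* pair should be left, including the file whose name contains a space being removed correctly.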
If this does what you want, pipe the output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1$/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while IFS= read -r file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
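Put together, the find variant with the dry-run loop might look like the sketch below. The scratch directory and file names are hypothetical, and read is given IFS= and -r here so that leading blanks and backslashes in names are preserved.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'RE here\n' > "$demo/a.ext1"; : > "$demo/a.ext2"
printf 'nothing\n' > "$demo/b.ext1"; : > "$demo/b.ext2"

# find prints only the *.ext1 files for which grep -q succeeded;
# the loop echoes the rm commands instead of running them (dry run).
plan=$(find "$demo" -name '*.ext1' -exec grep -q 'RE' {} \; -print |
  while IFS= read -r file; do
    echo rm "$file" "${file%.ext1}.ext2"
  done)
printf '%s\n' "$plan"

rm -rf "$demo"
```

As with the grep version, removing echo turns the dry run into the real deletion.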
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.
