get last line from grep search on multiple files - shell

I'm currently having a problem with a grep command.
I've found how to show only the last line of a grep search:
grep PATTERN FILE_NAME | tail -1
I've also found how to run a grep search over multiple selected files:
find . -name "FILE_NAME" | xargs -I name grep PATTERN name
Now I would like to get only the last line of the grep result for each single file.
I tried this:
find . -name "FILE_NAME" | xargs -I name grep PATTERN name | tail -1
This returns only the last match from the last file, whereas I would like the last matching line for every file.

for f in $(find . -name "FILE_NAME"); do grep PATTERN "$f" | tail -1; done
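The loop above word-splits on whitespace in paths, so a minimal sketch that survives such names might be (assuming bash and a find that supports -print0, e.g. GNU or BSD find):
find . -name "FILE_NAME" -print0 | while IFS= read -r -d '' f; do
    grep PATTERN "$f" | tail -1
done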

Sort has a -u (unique) option that allows you to select just one line from many. Try this:
grep PATTERN FILENAMES* | tac | sort -u -t: -k1,1
Explanation:
Grep will return one line for each match in a file. This looks like:
$ grep match file*
file1.txt:match
file1.txt:match2
file2.txt:match3
file2.txt:match4
And what we want is two lines from that output:
$ ???
file1.txt:match2
file2.txt:match4
You can treat this as a sort of table, in which the first column is the filename and the second is the match, where the column separator is the ':' character.
Our first pipe reverses the output:
$ grep match file* | tac
file2.txt:match4
file2.txt:match3
file1.txt:match2
file1.txt:match
Our second pipe, to sort, says: keep just the first line for each unique key (-u), where the key is the first column (-k1,1, key from column 1 to column 1), and split the data into columns with ':' as the delimiter (-t:). It sorts the output as well. And its output:
$ grep match file* | tac | sort -u -t: -k1,1
file1.txt:match2
file2.txt:match4
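If tac is not available (stock macOS, for instance), a rough awk sketch of the same idea keeps only the last match seen for each file; note that for-in traversal order is unspecified, so the files may print in any order:
grep match file* | awk -F: '{last[$1]=$0} END{for (f in last) print last[f]}'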

An alternative to this could be done with awk instead of grep. A POSIX version would read:
awk '(FNR==1)&&s{print s; s=""}/PATTERN/{s=$0}END{if(s) print s}' file1 file2 file3 ...
Using GNU awk, you can use BEGINFILE and ENDFILE:
awk 'BEGINFILE{s=""}/PATTERN/{s=$0}ENDFILE{if(s) print s}' file1 file2 file3 ...
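If the file list comes from find, as in the question, one way to combine the two might be (a sketch, assuming GNU find's -exec ... + and gawk):
find . -name "FILE_NAME" -exec gawk 'BEGINFILE{s=""}/PATTERN/{s=$0}ENDFILE{if(s) print s}' {} +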

You can use find to execute commands too:
find . -name "<file-name-to-find>" -exec sh -c 'grep "<pattern-to-match>" "$1" | tail -1' sh {} \;
{} is the file name; take care with shell globbing and expansion when writing the command. Note that appending a single | tail -1 to the whole find would reproduce the OP's problem (one line overall), so the grep | tail pair has to run once per file, as above.

Another way to find the last line is to reverse the file and output the first match.
find . -name "FILE_NAME" | xargs -I name sh -c 'tac name|sed -n "/PATTERN/{p;q}"'

You could start with grep's -B (before) parameter. For example, to get 5 lines before the match:
duli#i5 /etc/php5/apache2 $ grep -i -B5 timezone php.ini
[CLI Server]
; Whether the CLI web server uses ANSI color coding in its terminal output.
cli_server.color = On
[Date]
; Defines the default timezone used by the date functions
; http://php.net/date.timezone
;date.timezone =

Get the last line of each file (prefixed with the file name), then filter the output based on the pattern:
find . -name "*" -exec tail -v -n1 {} \; | grep "some_string" -B1
On macOS, you have to do it a slightly different way:
find . -name "*" | xargs tail -1 | grep "some_string" -B1

Seven years too late to the party. A slow way, modifying the OP's line of command:
find . -name "FILE_NAME" | xargs -I name sh -c "grep PATTERN name | tail -1"
If you need to show the file name in each line:
find . -name "FILE_NAME" | xargs -I name sh -c "grep -H PATTERN name | tail -1"

There is a solution without the need for loops that gives what the OP wants:
find . -type f -exec sh -c "fgrep print {} /dev/null |tail -1" \;
./tway.pl:print map(lambda x : x[1], filter(lambda x : x[0].startswith('volume'), globals().items()))
./txml.py: print("%s does not exist: %s\n" % (host, error))
./utils.py:print combine_dicts(a, b, operator.mul)
./xml_example.py:print ET.tostring(root, method="text")
For comparison, running without the tail -1 gives too many lines per file but proves the above works:
find . -type f -exec sh -c "fgrep print {} /dev/null" \;
gives:
./tway.pl:print map(lambda x : x[1], filter(lambda x : x[0].startswith('volume'), globals().items()))
./txml.py: print("%s resolved to --> %s\n" % (host, ip))
./txml.py: print("%s does not exist: %s\n" % (host, error))
./utils.py:print "a", a
./utils.py:print "b", b
./utils.py:print combine_dicts(a, b, operator.mul)
./xml_example.py: print ">>"
./xml_example.py: print ET.tostring(e, method="text")
./xml_example.py: print "<<"
./xml_example.py:print ET.tostring(root, method="text")
EDIT - remove the /dev/null if you don't want the filename included in the output.

The sed version
# As soon as we find pattern
# we save that line in hold space
save_pattern_line='/PATTERN/{h;d}'
# switch pattern and hold space
switch_spaces='x'
# At the end of the file
# if the pattern is in the pattern space
# (which we swapped with our hold space)
# switch again, print and exit
eof_print='${/PATTERN/{x;p;d}}'
# Else, switch pattern and hold space
switch_spaces='x'
find . -name 'FILE_NAME' |
xargs sed -s -n -e "$save_pattern_line" \
                -e "$switch_spaces" \
                -e "$eof_print" \
                -e "$switch_spaces"

The quickest way to do this would be to take the last line (or more) of each file and then grep through that output. So:
tail -1 filenames.* | grep "what you want to grep for"

Related

How to get largest file in directory in bash?

I have the following question. In the variable DIR there is an unknown number of files and folders. I would like to get the name and size in bytes of the largest one, in the order: name size. For example: file.txt 124.
I tried:
cd $DIR
du -a * | sort | head -1
But it does not show the size in bytes, and the output is in size name format. How can I improve it, please?
This should do the trick:
ls -larS | awk -F' {1,}' 'END{print $NF," ",$5}'
ls produces a long listing reverse-sorted by size; awk then prints the last field ($NF, the name) and the 5th field (the size) of the last line, using one or more spaces as the field separator. Because of the reverse sort order, that last line is the largest file.
Edit:
It was mentioned that a space in the file name might cause an issue. My first suggestion is: don't use spaces in filenames, it is just plain wrong. But if you have to:
ls -larS | awk -F' {1,}' 'END{for (i=9; i<=NF; i++) printf $i" "; print " ",$5}'
will handle the space, or two, or three, or however many.
What about the following pipeline? I'm using GNU findutils and GNU coreutils. If you work on a Mac you might have to install them.
find -maxdepth 1 -type f -printf '%s %f\0' \
| sort -z -k1,1nr \
| head -zn1 \
| cut -zd' ' -f2-
Explanation:
find -maxdepth 1 -type f -printf '%s %f\0'
Find files in the current folder and print them along with their filesize in bytes, zero terminated. Zero terminated because filenames may contain newlines in UNIX.
sort -z -k1,1nr
Sort the listing by the filesize in bytes, column 1, in reverse order (largest first). -z reads input zero terminated.
head -zn1
prints the first item, which is the largest, after the previous sorting. -z reads input zero terminated
cut -zd' ' -f2-
Cut off the filesize, print only the filename. -z reads input zero terminated.
A variation which should produce the exact output requested:
find -maxdepth 1 -type f -printf "%f %s\0" \
| sort -znr -k2 \
| head -zn1 \
| tr "\0" "\n"

Search for multiple patterns which included double quotes in single file and comment above and below two lines

I have a very large file where I need to search for 40 patterns.
If a pattern matches in the file, I need to comment out the two lines before and the two lines after the match.
The patterns look like the ones shown below:
1. create_rev -name "2x_8_PLL"
2. create_generated_rev -name "76_L"
3. create_rev -name "PCS_T0"
4. create_generated_rev -name "x544_P"
If I only needed to search for a single pattern, I could execute the gvim command below to accomplish the task:
:g/create_rev -name "2x_8_PLL"/-2,+2s/^/#
But there are 40-plus search patterns. How can I search/grep for all of them so that I get the expected output shown below:
#pp
#oo
create_rev -name "2x_8_PLL"
#aa
#bb
hh
#ii
#jj
create_generated_rev -name "76_L"
#cc
#dd
create_rev -name "PCS_T0"
#ee
#ff
gg
This might work for you (GNU grep and sed):
grep -A2 -B2 -nFf targets file |sed -En 's/^([0-9]+)-.*/\1s#^###/p' |sed -f - file
Use grep to output the lines in the file that match the lines in the targets file. The output is line-numbered and includes the two lines before and after each match; the context lines are joined to their numbers with - rather than :, which is what the next step keys on.
Those context lines are piped into sed and used as addresses for a generated sed script that inserts a # at the start of each addressed line.
The sed script created from the output of the first sed invocation (by way of the -f command line option and -, which reads the script from stdin) is used in the second sed invocation, which edits the source file.
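For illustration, the intermediate script fed to the second sed consists of lines like the following (line numbers hypothetical for the sample data), each one inserting a # at the start of its addressed line:
1s#^###
2s#^###
4s#^###
5s#^###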
Another solution using sed only:
sed -E 's/.*/\\#\\n.*\\n&\\n.*\\n#bb/' targets |
sed -Ee ':a;N;s/\n/&/4;Ta' -f - -e 'bc;:b;s/^([^#])/#\1/mg;s/^#//m3;:c;P;D' file
Assuming that when you say "pattern" what you really want is full-line string matching, here is an approach using any awk in any shell on every Unix box. It handles overlapping ranges by commenting them, as presumably required, without double-commenting them as could happen with other solutions:
$ cat tst.awk
ARGIND==1 {
targets[$0]
next
}
ARGIND==2 {
if ($0 in targets) {
for (i=FNR-2; i<=FNR+2; i++) {
if (i != FNR) {
hits[i]
}
}
}
next
}
FNR in hits {
$0 = "#" $0
}
{ print }
$ awk -f tst.awk targets file file
#pp
#oo
create_rev -name "2x_8_PLL"
#aa
#bb
hh
#ii
#jj
create_generated_rev -name "76_L"
#cc
#dd
create_rev -name "PCS_T0"
#ee
#ff
gg
$ cat targets
create_rev -name "2x_8_PLL"
create_generated_rev -name "76_L"
create_rev -name "PCS_T0"
create_generated_rev -name "x544_P"
The above uses GNU awk for ARGIND. If you don't have GNU awk, note that comparing FILENAME to ARGV[1] and ARGV[2] misfires here, because the data file is read twice under the same name; instead, emulate ARGIND with a counter that advances at the start of each file, as sketched below.
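A minimal sketch of that emulation (it assumes no empty input files, since the counter only advances on a file's first record; the rest of the script is unchanged):
FNR==1 { argind++ }
argind==1 { targets[$0]; next }
argind==2 { if ($0 in targets) { for (i=FNR-2; i<=FNR+2; i++) if (i != FNR) hits[i] }; next }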
If ed is available/acceptable, here is a way with some help from the shell.
The script myscript:
#!/bin/sh
targets=$1
file=$2
{
ed -s "$targets" <<'EOF'
g|.|t.\
-1s|^|g/|\
s|$|/-2;+1s/^\\(#\\)\\{0,1\\}\\(.*\\)/#\\2/\\|\
+1s|.*|;+2;+1s/^\\(#\\)\\{0,1\\}\\(.*\\)/#\\2/|
$a
,p
Q
.
,p
Q
EOF
} | ed -s "$file"
./myscript targets file
Remove the first ,p to silence the output to stdout.
Change the first Q to w if in-place editing is needed.
Memory issues from ed could occur depending on how big the file is.

Ignoring lines from grep matching any element in a bash array

I have an array (superStringIgnoreArray) containing superstrings like "formula", "forest", "foreign", "fortify", and I am running the following grep lines:
eval 'find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI "$hitWord" >> "$OUTPUT_FILE"'
eval 'find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always "$hitWord" | more'
Where hitWord in this instance would be "for".
How can I return all hits that do not match any entry in my superStringIgnoreArray? (so lines containing "for", "form", "fort" "fork" "forming" would be returned, but "fortify", "forest", etc would not).
Example output:
srcToSearch/open_source_licenses.txt:12:source software packages. One or more such open_source_licenses.txt files may there**for**e
srcToSearch/open_source_licenses.txt:19:-- **For** vCenter Server 5.5u2 GA, the license in**for**mation listed in Parts 2,
srcToSearch/open_source_licenses.txt:22:-- **For** vCenter Server on Linux Virtual Appliance 5.5u2 GA, the license
srcToSearch/open_source_licenses.txt:23:in**for**mation listed in Parts 1, 2, 3, 4, 5 and 6 are applicable.
srcToSearch/open_source_licenses.txt:29:document. This list is provided **for** your convenience; please read further if
grep + bash solution:
superStringIgnoreArray=("formula" "forest" "foreign" "fortify")
grep -HniIr "$hitWord" "$SEARCH_DIR"/* \
| grep -v -f <(printf '%s\n' "${superStringIgnoreArray[@]}") | tee "$OUTPUT_FILE"
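Here printf expands the array one element per line, so the second grep treats it as a file of patterns via -f; with the sample array, the stream fed to grep -v -f is:
formula
forest
foreign
fortify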
Since you're outputting the filenames, chaining another grep won't be trivial, but you can achieve the same with awk:
$ grep -HniIFr "$hitWord" "$SEARCH_DIR" |
awk 'BEGIN {OFS=FS=":"}
NR==FNR {a[tolower($0)]; next}
{f=$1;n=$2;$1=$2="";
for(k in a) if(tolower($0)~k) next}
{$1=f;$2=n;print}' blacklist -
Here awk limits the matching to what follows the filename, using : as the delimiter. If your hitWord is a literal, adding -F to the grep will help; awk is still doing pattern matching, though. tolower() makes the second step case-insensitive too.
Since the delimiter ":" can appear within the body, we can't depend on $3 in awk; instead, store $1 and $2, remove them from the line, match, and add them back before printing. At this point you could fold the first grep's functionality into this awk as well.
However, without the -o flag, this and other line-based solutions will fail when an actual match and an unwanted match occur on the same line. If the unwanted superstrings are few, perhaps a negative lookbehind/lookahead pattern is a better solution.
If your blacklist is not a file but an array, you can use process substitution as in the other answer; replace the blacklist file with:
... | awk '...' <(printf '%s\n' "${superStringIgnoreArray[@]}") -

Count multiple occurrences of some text for each file under a directory

I am trying to count multiple occurrences of some text for each file under a directory. The following script is close to what I want but it does not count multiple occurrences on the same line:
grep -rc 'blah' /some/path --include \*.txt
For example given two files:
foo.txt
blah, hey blah
some more text
bar.txt
something blah
The above script produces:
foo.txt:1
bar.txt:1
But the output I am looking for is*:
foo.txt:2
bar.txt:1
I know that the total number of occurrences can be found in one file using grep and then piping the results to word count:
grep -oh 'blah' foo.txt|wc -l
How can I do this for multiple files to achieve the output as in my example* above?
Update
The best solution I could come up with is as follows:
find /some/path -name '*.txt'|awk '{print "echo -n '\''"
$0 "\: '\'' && grep -oh '\''blah'\'' " $0 "|wc -l"}'|bash
grep -o prints each match on a new line; then count them up:
dir=$1
grep -Hor --include '*.txt' 'blah' "$dir" |
uniq -c|
# output after uniq
# 3 dir/f0.txt:blah
# 2 dir/f1.txt:blah
awk '{file=gensub(/^.+\/|:.+/, "", "g", $2); print file ":" $1}'
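If gensub and grep -Hor are unavailable (both are GNU extensions), a more portable sketch runs the count per file via find; it assumes a grep that supports -o (GNU or BSD) and, like the above, counts every occurrence rather than matching lines:
find /some/path -name '*.txt' -exec sh -c '
  for f; do
    n=$(grep -o blah "$f" | wc -l)     # one line per occurrence, counted
    printf "%s:%d\n" "$f" "$n"
  done' sh {} +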

Only get hash value using md5sum (without filename)

I use md5sum to generate a hash value for a file.
But I only need to receive the hash value, not the file name.
md5=`md5sum ${my_iso_file}`
echo ${md5}
Output:
3abb17b66815bc7946cefe727737d295 ./iso/somefile.iso
How can I 'strip' the file name and only retain the value?
A simple array assignment works... Note that the first element of a Bash array can be addressed by just the name without the [0] index, i.e., $md5 contains only the 32-character hash from md5sum.
md5=($(md5sum file))
echo $md5
# 53c8fdfcbb60cf8e1a1ee90601cc8fe2
Using AWK:
md5=`md5sum ${my_iso_file} | awk '{ print $1 }'`
You can use cut to split the line on spaces and return only the first such field:
md5=$(md5sum "$my_iso_file" | cut -d ' ' -f 1)
On Mac OS X:
md5 -q file
md5="$(md5sum "${my_iso_file}")"
md5="${md5%% *}" # remove the first space and everything after it
echo "${md5}"
Another way is to do:
md5sum filename | cut -f 1 -d " "
cut will split the line at each space and return only the first field.
By leaning on head:
md5_for_file=`md5sum ${my_iso_file}|head -c 32`
One way:
set -- $(md5sum $file)
md5=$1
Another way:
md5=$(md5sum $file | while read sum file; do echo $sum; done)
Another way:
md5=$(set -- $(md5sum $file); echo $1)
(Do not try that with backticks unless you're very brave and very good with backslashes.)
The advantage of these solutions is that they invoke only md5sum and the shell, rather than other programs such as awk or sed. Whether that actually matters is a separate question; you'd probably be hard-pressed to notice the difference.
If you need to print it and don't need a newline, you can use:
printf $(md5sum filename)
md5=$(md5sum < $file | tr -d ' -')
md5=`md5sum ${my_iso_file} | cut -b-32`
md5sum puts a backslash before the hash if there is a backslash in the file name. The first 32 characters or anything before the first space may not be a proper hash.
It will not happen when using standard input (file name will be just -), so pixelbeat's answer will work, but many others will require adding something like | tail -c 32.
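Building on that, one way to sidestep the escaping entirely is to hash standard input so that no file name appears in the output at all; a minimal sketch combining the redirection with the parameter-expansion trick shown earlier:
md5=$(md5sum < "$my_iso_file")   # output is "<hash>  -"
md5=${md5%% *}                   # keep everything before the first space
echo "$md5"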
If you're concerned about screwy filenames:
md5sum < "${file_name}" | awk NF=1
f244e67ca3e71fff91cdf9b8bd3aa7a5
Other, messier ways to deal with this:
md5sum "${file_name}" | awk NF=NF OFS= FS=' .*$'
or
| awk '_{ exit }++_' RS=' '
f244e67ca3e71fff91cdf9b8bd3aa7a5
to do it entirely inside awk :
mawk 'BEGIN {
__ = ARGV[ --ARGC ]
_ = sprintf("%c",(_+=(_^=_<_)+_)^_+_*++_)
RS = FS
gsub(_,"&\\\\&",__)
( _=" md5sum < "((_)(__)_) ) | getline
print $(_*close(_)) }' "${file_name}"
f244e67ca3e71fff91cdf9b8bd3aa7a5
Well, I had the same problem today, but I was trying to get the file MD5 hash when running the find command.
I got the most voted question and wrapped it in a function called md5 to run in the find command. The mission for me was to calculate the hash for all files in a folder and output it as hash:filename.
md5() { md5sum "$1" | awk '{ printf "%s",$1 }'; }
export -f md5
find -type f -exec bash -c 'md5 "$0"' {} \; -exec echo -n ':' \; -print
So, I took some pieces from here and also from "'find -exec' a shell function in Linux".
For the sake of completeness, a way with sed using a regular expression and a capture group:
md5=$(md5sum "${my_iso_file}" | sed -r 's:\\*([^ ]*).*:\1:')
The regular expression captures everything up to the first space in a group (the leading \\* also strips the backslash that md5sum prefixes for escaped filenames). To get a capture group working in sed, you need to match the whole line and replace it with the group.
(More about sed and capture groups here: How can I output only captured groups with sed?)
As delimiter in sed, I use colons because they are not valid in file paths and I don't have to escape the slashes in the filepath.
Another way:
md5=$(md5sum ${my_iso_file} | sed 's/ .*//')
md5=$(md5sum < index.html | head -c -4)
