how to find every file in my repo that has a specific word in the last line? - bash

In other words, how to combine the tail and find/grep commands in bash.
I want to find all the files (including files in subdirectories) in my repo that have a specific word in the last line, say FIX. I tried grep -Rl "FIX" to display all the files containing "FIX", but I don't know how to combine the tail command with it. Can anyone help?

Run tail on all the files at once and then grep the output for FIX. When given more than one file name, tail prints a ==> filename <== header before each file's output, so the file names appear alongside the last lines.
find -type f -exec tail -n1 {} + | grep FIX
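Plain grep FIX only shows the matching last lines themselves; to also keep the header that names the file, one possible variation (not part of the original answer) is to ask grep for one line of leading context:
find -type f -exec tail -n1 {} + | grep -B1 FIX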
Or use ** to find all files and subdirectories, then run tail on each of them one at a time:
shopt -s globstar
for file in **; do
[[ -f $file ]] && tail -n1 "$file" | grep -q FIX && echo "$file"
done
Or use find to find all matches and pipe it to a while read loop:
find -type f -print0 | while IFS= read -rd '' file; do
tail -n1 "$file" | grep -q FIX && echo "$file"
done
Or do the same thing but with -exec + and an explicit sub-shell (the trailing sh becomes $0 of the sub-shell, and the file names become its positional parameters, which the bare for file loops over):
find -type f -exec sh -c 'for file; do tail -n1 "$file" | grep -q FIX && echo "$file"; done' sh {} +

If you want to know if the last line matches a pattern, use sed and restrict the match to the last line with $. sed doesn't easily give a return value or do pretty printing of the filename like grep, but it gets the job done.
find . -exec sh -c "sed -n '$ { /FIX/p; }' {} | grep -q . " \; -print
Here, we use -n to suppress printing, and then print (with /p) only when the last line matches the pattern /FIX/. The output is piped to grep to get a return value that find uses to decide whether or not to -print the name.
Or, you can avoid using grep for the return by doing something like:
find . -exec awk 'END{ exit ! match($0, "FIX")}' {} \; -print

Related

Linux: Pipe `find` text file list | `dos2unix` | `dd` command

What I'm attempting to do:
Line 1: find any .txt or .TXT file and pipe them into next command
Line 2: convert the .txt file to unix format (get rid of Windows line endings)
Line 3: delete the last line of the file, which is always blank
find "${TEMPDIR}" -name *.[Tt][Xx][Tt] | /
xargs dos2unix -k | /
dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )
I can't pipe the output (the file name) of xargs dos2unix -k into the third line; I get the following error:
stat: cannot stat '': No such file or directory
tail: cannot open '' for reading: No such file or directory
dd: failed to open '': No such file or directory
Clearly I've wrongly assumed that "$_" will be enough to pass the output through the pipe.
How can I pipe the output (a text file) from xargs dos2unix -k into the third line, dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )?
The solution for line 3 comes from an answer to another question on SO about removing the last line from a file, with this answer in particular being touted as a good solution for large files: https://stackoverflow.com/a/17794626/893766
Can this help?
find "${TEMPDIR}" -iname '*.txt' -exec dos2unix "{}" \; -exec sed -i '$d' "{}" \;
You can try substituting dos2unix with an explicit replacement:
find "${TEMPDIR}" -iname '*.txt' -exec cat {} \; |
tr -d '\r' |
...
As the Windows newline is \r\n, you remove all occurrences of \r with the tr command.
About the find command: you can use the -iname option for case-insensitive search and -exec to run a command.
If the file is really big, you are already messing up the efficiency by rewriting it with tr. Then, you are reading it a second time with tail just to get the index of the last line.
The least inefficient fix I can come up with is to replace dos2unix and dd with a single Perl command that performs both functions, so you only read and write the output file once: it strips the carriage return from each line and delays printing by one line, so the final (blank) line is never written back.
find "$TMPDIR" -iname '*.txt' -exec perl -i -ne '
print $line if defined $line; ($line = $_) =~ s/\015$//' {} \;
Your attempt to use $_ for the current file name doesn't work. The value of $_ is the last argument of the previously completed command; in the middle of a pipeline, nothing has completed yet. One possible workaround (which I include only for illustration, not as a recommended solution) would be to run everything in a sh -c script via xargs -I {}, where {} gives you access to the file name, similarly to how it works in find -exec.
find "$TMPDIR" -iname '*.txt' -print0 |
xargs -r0 sh -c 'dos2unix -k "{}"
if=/dev/null of="{}" bs=1 seek=$(
echo $(stat --format=%s "{}" ) - $( tail -n1 "{}" | wc -c) | bc)
I added -print0 and the corresponding xargs -0, as well as xargs -r, as illustrations of good form, and -I {} so that xargs actually substitutes the file name; though the zero-terminated text format is a GNU find extension not generally found on other platforms.
(Privately, I would probably also replace the seek calculation with a simple Awk script, rather than expend three processes on performing a subtraction.)
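For illustration, a sketch of that idea (not part of the original answer; the file variable is a placeholder): a single awk pass computes the byte offset at which the last line starts, which is exactly the value dd needs for seek=, assuming the file already has Unix line endings and single-byte characters (hence LC_ALL=C):
file=some.txt   # placeholder name
# offset of the start of the last line = sum of (length + newline) over the preceding lines
seek=$(LC_ALL=C awk '{ offset = total; total += length($0) + 1 } END { print offset + 0 }' "$file")
dd if=/dev/null of="$file" bs=1 seek="$seek"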

List files whose last line doesn't contain a pattern

The very last line of my file should be "#"
If I run tail -n 1 * | grep -L "#", the result is (standard input), obviously because it's being piped.
I was hoping for a grep solution rather than reading the entire file just to search the last line.
for i in *; do tail -n 1 "$i" | grep -q -v '#' && echo "$i"; done
You can use sed for that:
sed -n 'N;${/pattern/!p}' file
The above command prints all lines of the file if its last line doesn't contain the pattern.
However, it looks like I misunderstood you: you only want to print the names of those files whose last line doesn't match the pattern. In this case I would use find together with the following (GNU) sed command:
find -maxdepth 1 -type f -exec sed -n '${/pattern/!F}' {} \;
The find command iterates over all files in the current folder and executes the sed command. $ addresses the last line of input. If /pattern/ doesn't match (!), the F command prints the file name.
The solution above looks nice and executes fast, but it has a drawback: it will not print the names of empty files, since their last line is never reached and $ will never match.
For a stable solution I would suggest putting the commands into a script:
script.sh
#!/bin/bash
# Check whether the file is empty ...
if [ ! -s "$1" ] ; then
echo "$1"
else
# ... or check whether the last line contains the pattern
sed -n '${/pattern/!F}' "$1"
# If you don't have GNU sed you can use this
# (( $(tail -n1 "$1" | grep -c pattern) )) || echo "$1"
fi
make it executable
chmod +x script.sh
And use the following find command:
find -maxdepth 1 -type f -exec ./script.sh {} \;
Consider this one-liner:
while read name ; do tail -n1 "$name" | grep -q \# || echo "$name" does not contain the pattern ; done < <( find -type f )
It uses tail to get the last line of each file and grep to test that line against the pattern. Performance will not be the best on many files because two new processes are started in each iteration.
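If the per-file processes matter, one batched alternative (a sketch assuming GNU awk's ENDFILE rule, not part of the original answer) reads all the files in a single process and checks the last line of each; the FNR == 0 test also reports empty files, whose last line cannot contain the pattern:
find . -type f -exec gawk 'ENDFILE { if (FNR == 0 || $0 !~ /#/) print FILENAME }' {} +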

From UNIX shell, how to find all files containing a specific string, then print the 4th line of each file?

I want to find all files within the current directory that contain a given string, then print just the 4th line of each file.
grep --null -l "$yourstring" * | # List all the files containing your string
xargs -0 -n 1 sed -n '4{p;q}' # Print the fourth line of said files.
Different editions of grep have slightly different incantations of --null, but it's usually there in some form. Read your manpage for details.
Update: I believe one of the null file list incantations of grep is a reasonable solution that will cover the vast majority of real-world use cases, but to be entirely portable, if your version of grep does not support any null output it is not perfectly safe to use it with xargs, so you must resort to find.
find . -maxdepth 1 -type f -exec grep -q "$yourstring" {} \; -exec sed -n '4{p;q}' {} \;
Because find arguments can almost all be used as predicates, the -exec grep -q… part filters the files that are eventually fed to sed down to only those that contain the required string.
From other user:
grep -Frl string . | xargs -n 1 sed -n 4p
Give the GNU find command below a try:
find . -maxdepth 1 -type f -exec grep -l 'yourstring' {} \; | xargs -I {} awk 'NR==4{print; exit}' {}
It finds all the files in the current directory that contain the specific string, and prints line number 4 of each file.
This loop should work:
while read -d '' -r file; do
echo -n "$file: "
sed '4q;d' "$file"
done < <(grep --null -l "some-text" *.txt)

How to find files containing exactly 16 lines?

I have to find files containing exactly 16 lines in Bash.
My idea is:
find -type f | grep '/^...$/'
Does anyone know how to utilise find + grep or maybe find + awk?
Then:
Move the matching files to another directory.
Delete all non-matching files.
I would just do:
wc -l **/* 2>/dev/null | awk '$1=="16"'
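Note that in bash ** only recurses into subdirectories when the globstar option is enabled (as in the first question above); without it, ** behaves like *. A minimal sketch:
shopt -s globstar
wc -l **/* 2>/dev/null | awk '$1=="16"'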
Keep it simple:
find . -type f |
while IFS= read -r file
do
size=$(wc -l < "$file")
if (( size == 16 ))
then
mv -- "$file" /wherever/you/like
else
rm -f -- "$file"
fi
done
If your file names can contain newlines then google for the find and read options to handle that.
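For reference, a sketch of the same loop using the null-delimited find/read idiom shown in the first question above, which is safe even for names containing newlines (assuming GNU find's -print0):
find . -type f -print0 |
while IFS= read -rd '' file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done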
You should use grep instead of wc, because wc counts newline characters (\n) and will not count the last line if it doesn't end with a newline.
e.g.
grep -cH '' * 2>/dev/null | awk -F: '$2==16'
For a more correct approach (without error messages, and without the "argument list too long" error), you should combine it with the find and xargs commands, like:
find . -type f -print0 | xargs -0 grep -cH '' | awk -F: '$2==16'
If you don't want to count empty lines (only lines that contain at least one character), you can replace the '' with '.'. And instead of awk, you can use a second grep, like:
find . -type f -print0 | xargs -0 grep -cH '.' | grep ':16$'
This will find all files that contain 16 non-empty lines, and so on.
GNU sed
sed -E '/^.{16}$/!d' file
A pure bash version:
#!/usr/bin/bash
for f in *; do # Look for files in the present dir
[ ! -f "$f" ] && continue # Skip not simple files
cnt=0
# Count the first 17 lines
while ((cnt<17)) && read x; do ((++cnt)); done<"$f"
if [ $cnt == 16 ] ; then echo "Move '$f'"
else echo "Delete '$f'"
fi
done
This snippet will do the work:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then echo "file $0 has 16 lines"; else echo "file $0 doesn'"'"'t have 16 lines"; fi' {} \;
Hence, if you need to delete the files that are not 16 lines long, and move those who are 16 lines long to folder /my/folder, this will do:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then mv -nv "$0" /my/folder; else rm -v "$0"; fi' {} \;
Observe the quoting for "$0" so that it's safe regarding any file name with funny symbols in it (spaces, ...).
I'm using the -v option so that rm and mv are verbose (I like to know what's happening). The -n option to mv is no-clobber: a security to not overwrite an existing file; this option might not be available if you have an old system.
The good thing about this method. It's really safe regarding any filename containing funny symbols.
The bad thing(s). It forks a bash and a grep and an mv or rm for each file found. This can be quite slow. This can be fixed using trickier stuff (while still remaining safe regarding funny symbols in filenames). If you really need it, I can give you a possible answer. It will also break if a file can't be (re)moved.
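For illustration, one version of that "trickier stuff" (a sketch, not part of the original answer) hands find's results to a single bash per batch, reusing the for-loop-over-arguments idiom from the first question; it still runs grep and mv/rm once per file, but no longer forks a separate bash each time:
find . -type f -readable -exec bash -c '
    for f; do
        if (( $(grep -m 17 -c "" "$f") == 16 )); then
            mv -nv -- "$f" /my/folder
        else
            rm -v -- "$f"
        fi
    done' bash {} +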
Remark. I'm using the -readable option to find, so that it only considers files that are readable. If you have this option, use it, you'll have a more robust command!
I would go with
find . -type f | while read f ; do
[[ "${f##*/}" =~ ^.{16}$ ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
or
find . -type f | while read f ; do
[[ $(echo -n "${f##*/}" | wc -c) -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
Replace <any_directory> with the directory you actually want to move the files to.
BTW, the find command will descend into sub-directories. If you don't want this, then you should change the find command to fit your needs.

grep a pattern in list of zip files recursively

I am using the following command on the command line to get the pattern-matched lines.
find . -name "*.gz"|xargs gzcat|grep -e "pattern1" -e "pattern2"
I now need to find only the file names where the pattern is present.
How can I do it on the command line?
grep -l is of no use since I am using xargs gzcat before grep.
Check if you have zgrep available. And then, if yes:
find . -name '*.gz' -exec zgrep -l -e ".." -e ".." {} +
If you don't have it - well, just copy it from some machine that has it (all linuxes I use have it by default) - it's a simple bash script.
ripgrep
Use ripgrep, for example; it's very efficient, especially for large files:
rg -z -e "pattern1" -e "pattern2" *.gz
or:
rg -z "pattern1|pattern2" .
or:
rg -zf pattern.file .
Where pattern.file is a file containing all your patterns separated by a new line character.
-z/--search-zip Search in compressed files (such as gz, bz2, xz, and lzma).
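For example, pattern.file might contain (illustrative contents, one pattern per line):
pattern1
pattern2
And since the question asks only for the matching file names, ripgrep's -l/--files-with-matches flag prints just those:
rg -z -l -f pattern.file .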
for i in $(find . -name "*.gz"); do gzcat "$i" | grep -q -e "pattern1" -e "pattern2" && echo "$i"; done
Untested; this does everything inside find, so even with loads of gz files you won't have performance problems: each gzcat/grep runs as soon as a file is found, and nothing is piped out to another process:
find . -iname '*.gz' -exec bash -c 'gzcat "$1" | grep -q -e "pattern1" -e "pattern2" && echo "$1"' {} {} \;
In bash, I'd do something like this (untested):
find . -name '*.gz' | while read -r f ; do gzcat "$f" | grep -q -e "pattern1" -e "pattern2" && echo "$f" ; done
grep/zgrep/zegrep
Use zgrep or zegrep to look for pattern in compressed files using their uncompressed contents (both GNU/Linux and BSD/Unix).
On Unix, you can also use grep itself (the BSD version) with -Z, including -z on macOS.
A few examples:
zgrep -E -r "pattern1|pattern2|pattern3" .
zegrep "pattern1|pattern2|pattern3" **/*.gz
grep -z -e "pattern1" -e "pattern2" *.gz # BSD/Unix only.
Note: when you have the globbing option (globstar) enabled, ** matches files recursively; otherwise use -r.
-R/-r/--recursive Recursively search subdirectories listed.
-E/--extended-regexp Interpret pattern as an extended regular expression (like egrep).
-Z (BSD), -z/--decompress (BSD/macOS) Force grep to behave as zgrep.
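Since the question asks only for the file names, -l works with zgrep just as with plain grep, for example (a sketch using the globbing form from above):
zgrep -l -E "pattern1|pattern2" **/*.gz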
