Search using reference file and print matching lines - bash

I have a folder structure as shown below, under ./all_files:
-rwxrwxrwx reference_file.txt
drwxrwxrwx file1.txt
drwxrwxrwx file2.txt
drwxrwxrwx file3.txt
reference_file.txt has filenames as shown below
$cat reference_file.txt
file1.txt
file2.txt
data in file1.txt and file2.txt are as shown below:
$cat file1.txt
step_1
step_2
step_3
Now, I have to extract a particular step, say step_2, from each file.
Note 1: the file name must be present in reference_file.txt.
Note 2: step_2 is not always on line 2.
Note 3: the search should be performed recursively.
I have used the script below:
#!/bin/sh
for i in cat reference_file.txt;
do
find . -type f -name $i | grep -v 'FS*' | xargs grep -F 'step_2'
done<reference_file.txt
After running the above code I got no output.
# bash -x script.sh
+ for i in cat reference_file.txt
+ find . -type f -name **cat**
+ xargs grep -F 'step_2'
+ for i in cat **reference_file.txt**
+ find . -type f -name reference_file.txt
+ xargs grep -F 'step_2'
Added New requirement:
target=step_XX_2, where XX can be anything and should be ignored by the search, so that the desired output will be: step_ab_2 step_cd_2 step_ef_2

I think this is what you are trying to achieve. Please let me know:
EDIT: my previous version did not search recursively.
Further edits: Note that using process substitution for find means that this script MUST be run under bash and not sh.
Further edit for change in specification: note the change to target and the -E option to grep instead of -F.
#!/bin/bash
target='step_.*?_?2'
while read -r name
do
# EDIT: exclude certain directories
if [[ $name == "old1" || $name == "old2" ]]
then
# do the next iteration of the loop
continue
fi
while read -r fname
do
if [[ $fname != FS* ]]
then
# Display the filename (grep -H is not in POSIX)
if out=$(grep -E "$target" "$fname")
then
echo "$fname: $out"
fi
fi
done < <(find . -type f -name "$name")
done < reference_file.txt
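A target pattern can be checked in isolation by feeding grep -E a few sample lines before running the whole script. The sketch below is only an illustration: the sample step names and the stricter pattern ^step_([^_]+_)?2$ are assumptions, not part of the script above. It is meant to accept step_2 and step_XX_2 while rejecting names such as step_12.

```shell
# Assumed stricter alternative to the target above: "step_", an optional
# "XX_" part (XX = any run of non-underscore characters), then "2" at end of line.
pattern='^step_([^_]+_)?2$'

# Hypothetical sample lines; only step_2, step_ab_2 and step_cd_2 should pass.
matches=$(printf '%s\n' step_1 step_2 step_ab_2 step_cd_2 step_12 step_3 |
  grep -E "$pattern")
printf '%s\n' "$matches"
```

If the steps can carry leading or trailing text on the same line, the ^ and $ anchors would need to be relaxed.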
Note that your trace (bash -x) uses bash but your #! line uses sh. They are different, so you should be consistent about which shell you use.
I have dropped xargs, which reads strings from standard input and executes a program with those strings as arguments. Since we already have the argument strings for grep, we don't need it.
Your grep -v 'FS*' probably doesn't do what you expect. The regular expression FS* means "F followed by zero or more S's", which is not the same as shell pattern matching (globbing). In my solution I have used FS* because I am using the shell, not grep.
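The difference between the two readings of FS* can be seen by testing the same names both ways. This is a small sketch with made-up names (FSdata.txt, Foo.txt, file1.txt are assumptions); it uses case for the shell pattern, which follows the same matching rules as [[ ... ]] but also works in plain sh.

```shell
# Regular expression FS*: an "F" followed by zero or more "S", anywhere in the line.
regex_matches() { printf '%s\n' "$1" | grep -q 'FS*'; }

# Shell pattern FS*: the whole string must start with the two characters "FS".
glob_matches() { case $1 in FS*) return 0 ;; *) return 1 ;; esac; }

# Hypothetical names chosen to show where the two disagree.
for name in FSdata.txt Foo.txt file1.txt; do
  if regex_matches "$name"; then r=yes; else r=no; fi
  if glob_matches "$name"; then g=yes; else g=no; fi
  printf '%-12s regex=%s glob=%s\n' "$name" "$r" "$g"
done
```

Foo.txt is the interesting case: the regular expression matches it (it contains an F), while the shell pattern does not (it does not start with FS).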

I believe this question is a duplicate of this
What you need is
#!/bin/sh
for i in `cat reference_file.txt`
do find . -type f -name $i | grep -v 'FS*' | xargs grep -F 'step_2'
done
Note the backticks, and do not read reference_file.txt twice: the done<reference_file.txt redirection is no longer needed.
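The effect of the missing command substitution, which is what the bash -x trace in the question shows, can be reproduced in a few lines. This is a sketch using a temporary stand-in for reference_file.txt (the temporary file is an assumption), and it uses $(...), which is equivalent to the backticks.

```shell
# Hypothetical reference file with two names in it.
ref=$(mktemp)
printf 'file1.txt\nfile2.txt\n' > "$ref"

# Without command substitution the loop sees two literal words:
# "cat" and the path of the reference file.
literal_words=$(for i in cat "$ref"; do printf '[%s]' "$i"; done)

# With command substitution the loop sees the contents of the file.
substituted=$(for i in $(cat "$ref"); do printf '[%s]' "$i"; done)

printf 'literal:     %s\nsubstituted: %s\n' "$literal_words" "$substituted"
rm -f "$ref"
```

Note that the substituted form splits on any whitespace, so it only suits file names without spaces; a while read loop avoids that limitation.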

Related

Search and print matching string present in both files

I have ./all_files/reference_file.txt which has data as shown below.
reference_file.txt contains filenames as shown
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
data in all files are as shown below:
step_1
step_2
step_3
step_4
Now, I have to extract a particular step, say step_2, from each file.
Note 1: the file name must be present in reference_file.txt.
Note 2: step_2 is not always on line 2.
Note 3: the search should be performed recursively.
I have used the code below:
#!/bin/sh
while read f; do
if [ -f "$f" ]; then
find . -type f -name "*.txt" | xargs grep -l -F 'step_2' "$f"
fi
done <reference_file.txt
Please help me with this.
Two changes to your code:
1. Remove the if: find is supposed to locate the file, and the -f test returns false whenever the file is not in the current directory.
2. Pass $f to find as the -name argument.
Below is the updated example:
while read f; do
find . -type f -name "${f}" | xargs grep -l -F 'step_2'
done <reference_file.txt
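The pipeline above relies on xargs splitting its input on whitespace, so it can misbehave on paths containing spaces. A possible variant, sketched here under the assumption that letting find run grep directly is acceptable, uses -exec ... {} + instead. The function name search_listed is made up for illustration.

```shell
# search_listed NAMES_FILE STRING
# For every file name listed in NAMES_FILE, print the paths below the current
# directory that carry that name and contain the fixed string STRING.
search_listed() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue            # skip blank lines in the list
    # find hands the paths straight to grep, so spaces in paths are safe.
    find . -type f -name "$name" -exec grep -l -F -- "$2" {} +
  done < "$1"
}

# Usage, from inside ./all_files:
#   search_listed reference_file.txt step_2
```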

How can I get xargs to do something with the input, then do another thing?

I'm in zsh.
I'd like to do something like:
find . -iname *.md | xargs cat && echo "---" > all_slides_with_separators_in_between.md
Of course this cats all the slides, then appends a single "---" at the end instead of after each slide.
Is there an xargs way of doing this? Can I replace cat && echo "---" with some inline function or do block?
Very strangely, when I create a file cat---.sh with the contents
cat $1
echo ---
and run
find . -iname *.md | xargs ./cat---.sh
it only executes for the first result of find.
Replace cat---.sh with cat and it runs on both files.
There's no need to use xargs at all here. Following is a properly paranoid approach (robust against files with spaces, files with newlines, files with literal backslashes in their names, etc):
while IFS= read -r -d '' filename; do
printf '---\n'
cat -- "$filename"
done < <(find . -iname '*.md' -print0) >all_slides_with_separators.md
However -- you don't even need that either: find can do all the work itself, both printing the separator and calling cat!
find . -iname '*.md' -printf '---\n' -exec cat -- '{}' ';' >all_slides_with_separators.md
A common usage pattern is xargs sh -c 'command; another' _ where the entire shell script in the quotes will have access to the command-line arguments. The underscore is because the first argument to sh -c will be assigned to $0 (where you'd often see e.g. -sh in a ps listing).
find . -iname '*.md' |
xargs sh -c 'for x; do
cat "$x" && echo "---"
done' _ > all_slides_with_separators_in_between.md
As noted in the comments, you should probably investigate find -print0 and the corresponding xargs -0 option in GNU find (and maybe install it if you don't have it).
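Combining the two ideas, the sh -c loop can be fed NUL-separated names. This is a sketch assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do); the function name concat_with_separators is made up for illustration.

```shell
# concat_with_separators DIR
# Print every *.md file below DIR (case-insensitive), each followed by a "---" line.
concat_with_separators() {
  find "$1" -type f -iname '*.md' -print0 |
    xargs -0 sh -c 'for x; do cat -- "$x" && echo "---"; done' _
}

# Usage:
#   concat_with_separators . > all_slides_with_separators_in_between.md
```

Because the names are separated by NUL bytes rather than whitespace, paths with spaces or newlines reach cat intact.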
You can do something like this, but it can be insecure in some cases (see comments):
find . -iname '*.md' | xargs -I % sh -c '{ cat %; echo "----"; }' > output.txt
You'll rarely need find in zsh; its globbing facilities cover nearly every use case of find.
for f in (#i)**/*.md; do
cat $f
print -- "---"
done > all_slides.md
This looks in the current directory hierarchy for every file that matches *.md in a case-insensitive manner.
For even more efficiency, replace cat $f with < $f; zsh itself will read the file and write its contents to standard output.
Using GNU Parallel it looks like this:
parallel cat {}\; print -- --- ::: **/*.md

Making a file out of all the files with given a string

Create a file that contains the contents of all files in the current folder that contain a given string (passed as, say, argument 1). The contents are placed one after the other (each file appended to the end). The name of the new file is the given string.
I thought of the following but it doesn't work:
grep $1 * >> fnames #places all the names of the right files in a file
for x in fnames
do
cat x >> $1 #concat the files from the list
done
rm fnames
On the same note, is there a site that has solved exercises like this or examples?
You can do something like this using process substitution:
shopt -s nullglob
while read -r file; do
cat "$file"
done < <(grep -l "search-pattern" *) > /path/to/newfile
This is assuming your directory only has files and no sub-directories.
You will need to use find with grep if there are sub-directories as well:
find . -maxdepth 1 -type f -exec grep -q "search-pattern" {} \; -print0 |
xargs -0 cat > /path/to/newfile
How about (assuming you aren't worried about files with spaces or newlines or shell globs/etc. in their names since those will not work here correctly):
for O in $(grep -l -- "$1" *)
do
cat "$O" >> "$1"
done
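One detail the loops above leave open is that the output file lives in the same directory as the inputs, so a second run may pick it up as a match. A possible way around that, sketched here with a made-up function name (collect_matching) and assuming the string is also a valid file name, is to collect into a temporary file and skip any earlier output file.

```shell
# collect_matching STRING
# Concatenate every regular file in the current directory that contains the
# fixed string STRING into a file named STRING.
collect_matching() {
  tmp=$(mktemp) || return 1
  for f in ./*; do
    [ -f "$f" ] || continue               # regular files only
    [ "$f" = "./$1" ] && continue         # skip an output file from an earlier run
    if grep -q -F -- "$1" "$f"; then
      cat -- "$f" >> "$tmp"
    fi
  done
  mv -- "$tmp" "./$1"
}

# Usage:
#   collect_matching "search-pattern"
```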

List files whose last line doesn't contain a pattern

The very last line of my file should be "#".
If I run tail -n 1 * | grep -L "#", the result is (standard input), obviously because the input is being piped.
I was hoping for a grep solution that searches just the last line, rather than reading the entire file.
for i in *; do tail -n 1 "$i" | grep -q -v '#' && echo "$i"; done
You can use sed for that:
sed -n '1h;1!H;${/pattern/!{x;p}}' file
The above (GNU) sed command prints all lines of file if its last line doesn't contain the pattern.
However, it looks like I misunderstood you: you only want to print the names of those files whose last line doesn't match the pattern. In this case I would use find together with the following (GNU) sed command:
find -maxdepth 1 -type f -exec sed -n '${/pattern/!F}' {} \;
The find command iterates over all files in the current folder and executes the sed command on each. $ addresses the last line of input. If /pattern/ isn't found (!), then F prints the file name.
The solution above looks nice and executes fast, but it has a drawback: it does not print the names of empty files, since the last line is never reached and $ never matches.
For a stable solution I would suggest to put the commands into a script:
script.sh
#!/bin/bash
# Check whether the file is empty ...
if [ ! -s "$1" ] ; then
echo "$1"
else
# ... or if the last line contains a pattern
sed -n '${/pattern/!F}' "$1"
# If you don't have GNU sed you can use this
# (($(tail -n1 "$1" | grep -c pattern))) || echo "$1"
fi
Make it executable:
chmod +x script.sh
And use the following find command:
find -maxdepth 1 -type f -exec ./script.sh {} \;
Consider this one-liner:
while read name ; do tail -n1 "$name" | grep -q \# || echo "$name" does not contain the pattern ; done < <( find -type f )
It uses tail to get the last line of each file and grep to test that line against the pattern. Performance will not be the best on many files because two new processes are started in each iteration.
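If the per-file process cost matters more than avoiding a full read, a single awk process can inspect many files in one go. This is a sketch with a made-up function name (files_without_final_pattern) and some stated trade-offs: it reads every file completely, it treats the pattern as an awk regular expression, and it silently skips empty files because awk never sees a line from them.

```shell
# files_without_final_pattern PATTERN FILE...
# Print the name of each FILE whose last line does not match PATTERN.
files_without_final_pattern() {
  pat=$1
  shift
  [ "$#" -gt 0 ] || return 0   # with no file arguments awk would wait on stdin
  awk -v pat="$pat" '
    # First line of a new file: judge the file that just ended.
    FNR == 1 && NR > 1 { if (last !~ pat) print name }
    { last = $0; name = FILENAME }
    END { if (NR > 0 && last !~ pat) print name }
  ' "$@"
}

# Usage:
#   files_without_final_pattern '#' ./*
```

Passing names with a leading ./ keeps awk from mistaking a file name that contains = for a variable assignment.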

How to find files containing exactly 16 lines?

I have to find files containing exactly 16 lines in Bash.
My idea is:
find -type f | grep '/^...$/'
Does anyone know how to utilise find + grep or maybe find + awk?
Then:
Move the matching files to another directory.
Delete all non-matching files.
I would just do:
wc -l **/* 2>/dev/null | awk '$1=="16"'
Keep it simple:
find . -type f |
while IFS= read -r file
do
size=$(wc -l < "$file")
if (( size == 16 ))
then
mv -- "$file" /wherever/you/like
else
rm -f -- "$file"
fi
done
If your file names can contain newlines then google for the find and read options to handle that.
You should use grep instead of wc, because wc counts newline characters (\n) and so will not count the last line if it doesn't end with a newline.
e.g.
grep -cH '' * 2>/dev/null | awk -F: '$2==16'
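The difference only shows up when the final line has no trailing newline. A small sketch using a temporary three-line file (the file and its contents are assumptions for illustration):

```shell
# Three lines, but no newline after the last one.
tmp=$(mktemp)
printf 'one\ntwo\nthree' > "$tmp"

wc_count=$(wc -l < "$tmp")           # counts newline characters: 2
grep_count=$(grep -c '' < "$tmp")    # counts lines, including the unterminated one: 3

echo "wc -l: $wc_count, grep -c: $grep_count"
rm -f "$tmp"
```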
For a more correct approach (without error messages, and without the "argument list too long" error), combine it with find and xargs, like this:
find . -type f -print0 | xargs -0 grep -cH '' | awk -F: '$2==16'
If you don't want to count empty lines (that is, count only lines that contain at least one character), replace '' with '.'. And instead of awk you can use a second grep, like this:
find . -type f -print0 | xargs -0 grep -cH '.' | grep ':16$'
This will find all files that contain exactly 16 non-empty lines, and so on.
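GNU sed (note that this keeps the lines of file that are exactly 16 characters long; it does not count lines):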
sed -E '/^.{16}$/!d' file
A pure bash version:
#!/usr/bin/bash
for f in *; do # Look for files in the present dir
[ ! -f "$f" ] && continue # Skip not simple files
cnt=0
# Count the first 17 lines
while ((cnt<17)) && read x; do ((++cnt)); done<"$f"
if [ $cnt == 16 ] ; then echo "Move '$f'"
else echo "Delete '$f'"
fi
done
This snippet will do the work:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then echo "file $0 has 16 lines"; else echo "file $0 doesn'"'"'t have 16 lines"; fi' {} \;
Hence, if you need to delete the files that are not 16 lines long, and move those who are 16 lines long to folder /my/folder, this will do:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then mv -nv "$0" /my/folder; else rm -v "$0"; fi' {} \;
Observe the quoting for "$0" so that it's safe regarding any file name with funny symbols in it (spaces, ...).
I'm using the -v option so that rm and mv are verbose (I like to know what's happening). The -n option to mv is no-clobber: a safeguard against overwriting an existing file; this option might not be available if you have an old system.
The good thing about this method: it's really safe regarding any file name containing funny symbols.
The bad thing(s): it forks a bash, a grep, and an mv or rm for each file found, which can be quite slow. This can be fixed using trickier stuff (while still remaining safe regarding funny symbols in file names). If you really need it, I can give you a possible answer. It will also break if a file can't be (re)moved.
Remark: I'm using the -readable option to find, so that it only considers files that are readable. If you have this option, use it; you'll have a more robust command!
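One way to cut down on the per-file shell, offered here as a sketch rather than as the "trickier stuff" the answer has in mind, is to let find pass many files to a single sh with -exec ... {} +. The function name move_or_delete_by_lines is made up, and the sketch assumes the destination lies outside the searched tree and that base names do not collide there. It still runs one grep and one mv or rm per file.

```shell
# move_or_delete_by_lines SRC DEST
# Move files below SRC that have exactly 16 lines into DEST; delete the others.
move_or_delete_by_lines() {
  src=$1
  dest=$2
  find "$src" -type f -exec sh -c '
    dest=$1
    shift
    for f; do
      n=$(grep -c "" < "$f")       # line count, final unterminated line included
      [ -n "$n" ] || continue      # unreadable file: leave it alone
      if [ "$n" -eq 16 ]; then
        mv -- "$f" "$dest"/
      else
        rm -- "$f"
      fi
    done
  ' sh "$dest" {} +
}

# Usage:
#   move_or_delete_by_lines . /my/folder
```

File names travel as arguments rather than through a pipe, so names with spaces or newlines remain safe, as in the one-shell-per-file version.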
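I would go with the following, which tests whether the file name (without its directory) is exactly 16 characters long: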
find . -type f | while read f ; do
[[ "${f##*/}" =~ ^.{16}$ ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
or
find . -type f | while read f ; do
[[ $(echo -n "${f##*/}" | wc -c) -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
Replace <any_directory> with the directory you actually want to move the files to.
BTW, the find command will descend into sub-directories. If you don't want this, change the find command to fit your needs.

Resources