Search and print matching string present in both files - shell

I have ./all_files/reference_file.txt, which contains filenames as shown below:
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
The data in all the files is as shown below:
step_1
step_2
step_3
step_4
Now I need to extract a particular step, say step_2, from each file.
Note1: the file name must be present in reference_file.txt.
Note2: step_2 is not always on line 2.
Note3: the search should be recursive.
I have used the code below:
#!/bin/sh
while read f; do
if [ -f "$f" ]; then
find . -type f -name "*.txt" | xargs grep -l -F 'step_2' "$f"
fi
done <reference_file.txt
Please help me with this.

Two changes to your code:
Remove the if: find is supposed to locate the file, and the if test returns false whenever the file is not in the current directory.
Pass $f to find as its -name argument.
Below is the updated example:
while IFS= read -r f; do
    # find hands the matching paths straight to grep (no xargs needed)
    find . -type f -name "$f" -exec grep -l -F 'step_2' {} +
done <reference_file.txt
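To see the loop work end to end, here is a minimal sketch that recreates a small version of the question's layout in a temporary directory and wraps essentially the same loop in a function, using find's -exec ... {} + form to hand the found paths to grep. The function name list_matching and the subdirectory names sub and sub/deeper are hypothetical, chosen only for the demo.

```shell
#!/bin/sh
# list_matching REF STRING (hypothetical helper): for every file name listed
# in REF, find files with that name anywhere below ".", then print the ones
# that contain STRING as a fixed string
list_matching() {
    while IFS= read -r f; do
        find . -type f -name "$f" -exec grep -l -F -- "$2" {} +
    done < "$1"
}

# Demo layout in a throwaway directory (names are hypothetical)
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir -p sub/deeper
printf 'file1.txt\nfile2.txt\n' > reference_file.txt
printf 'step_1\nstep_2\nstep_3\n' > sub/file1.txt
printf 'step_4\nstep_2\n' > sub/deeper/file2.txt
printf 'step_2\n' > sub/file3.txt    # not listed in reference_file.txt
list_matching reference_file.txt step_2
# prints ./sub/file1.txt and then ./sub/deeper/file2.txt
```

Because find starts at ".", both subdirectories are searched, and file3.txt is ignored because it is not listed in reference_file.txt.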

Related

Search using reference file and print matching lines

I have a folder structure as shown below:
./all_files
-rwxrwxrwx reference_file.txt
drwxrwxrwx file1.txt
drwxrwxrwx file2.txt
drwxrwxrwx file3.txt
reference_file.txt contains the filenames shown below:
$ cat reference_file.txt
file1.txt
file2.txt
The data in file1.txt and file2.txt is as shown below:
$ cat file1.txt
step_1
step_2
step_3
Now I need to extract a particular step, say step_2, from each file.
Note1: the file name must be present in reference_file.txt.
Note2: step_2 is not always on line 2.
Note3: the search should be recursive.
I have used the script below:
#!/bin/sh
for i in cat reference_file.txt;
do
find . -type f -name $i | grep -v 'FS*' | xargs grep -F 'step_2'
done<reference_file.txt
After running the code above, I got no output:
# bash -x script.sh
+ for i in cat reference_file.txt
+ find . -type f -name cat
+ xargs grep -F 'step_2'
+ for i in cat reference_file.txt
+ find . -type f -name reference_file.txt
+ xargs grep -F 'step_2'
New requirement:
target=step_XX_2, where XX can be anything and should be treated as a wildcard in the search, so that the desired output is: step_ab_2 step_cd_2 step_ef_2
I think this is what you are trying to achieve. Please let me know:
EDIT: my previous version did not search recursively.
Further edits: Note that using process substitution for find means that this script MUST be run under bash and not sh.
Further edit for change in specification: note the change to target and the -E option to grep instead of -F.
#!/bin/bash
# step_2 or step_XX_2 where XX is anything (extended regex, used with grep -E)
target='step_(.*_)?2'
while read -r name
do
    # EDIT: skip these names if they appear in reference_file.txt
    if [[ $name == "old1" || $name == "old2" ]]
    then
        # do the next iteration of the loop
        continue
    fi
    while read -r fname
    do
        # test the basename, since find prefixes every path with ./
        if [[ ${fname##*/} != FS* ]]
        then
            # Display the filename (grep -H is not in POSIX)
            if out=$(grep -E "$target" "$fname")
            then
                echo "$fname: $out"
            fi
        fi
    done < <(find . -type f -name "$name")
done < reference_file.txt
Note that your trace (bash -x) uses bash but your #! line uses sh. They are different shells, so you should be consistent about which one you use.
I have also dropped xargs, which reads strings from standard input and runs a program with those strings as arguments. Since we already have the file name to pass to grep, we don't need it.
Your grep -v 'FS*' probably doesn't do what you expect. The regular expression FS* means "F followed by zero or more S's", so it matches any line that contains an F at all. That is not the same as shell pattern matching (globbing). In my solution I use FS* inside a [[ ... ]] test, where it is a shell glob rather than a grep regex.
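A quick way to see the difference is to feed the same paths through both filters: the regex FS* excludes every path containing an uppercase F, while the glob FS* applied to the basename only excludes names starting with FS. The sample paths and the helper names regex_filter and glob_filter are made up for illustration.

```shell
#!/bin/sh
# regex_filter: grep treats FS* as a regex ("F, then zero or more S"),
# so -v drops every line that contains an uppercase F anywhere
regex_filter() {
    grep -v 'FS*'
}

# glob_filter: the shell treats FS* as a glob, so only paths whose
# basename starts with "FS" are dropped
glob_filter() {
    while IFS= read -r p; do
        case ${p##*/} in
            FS*) ;;                    # skip FS... files
            *) printf '%s\n' "$p" ;;
        esac
    done
}

paths='./FSold.txt
./file1.txt
./Final/file2.txt'

printf '%s\n' "$paths" | regex_filter   # prints ./file1.txt only
printf '%s\n' "$paths" | glob_filter    # prints ./file1.txt and ./Final/file2.txt
```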
I believe this question is a duplicate of this one.
What you need is:
#!/bin/sh
for i in `cat reference_file.txt`
do
    # skip paths whose basename starts with FS, then search the rest
    find . -type f -name "$i" | grep -v '/FS[^/]*$' | xargs grep -F 'step_2'
done
Note the backticks, and do not read reference_file.txt twice: your version also redirected it into the loop with done<reference_file.txt.
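The trace in the question shows why the backticks matter: without them, the loop iterates over the two literal words cat and reference_file.txt. A small sketch, using a throwaway copy of reference_file.txt in a temporary directory:

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'file1.txt\nfile2.txt\n' > reference_file.txt

# Without backticks: the words "cat" and "reference_file.txt" themselves
for i in cat reference_file.txt; do
    echo "literal: $i"
done

# With backticks, the output of cat is split into words, one per file name
# (this breaks on names containing spaces or glob characters)
for i in `cat reference_file.txt`; do
    echo "substituted: $i"
done
```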

Search and print matching string

I have a folder structure as shown below:
./all_files
-rwxrwxrwx reference_file.txt
drwxrwxrwx file1.txt
drwxrwxrwx file2.txt
drwxrwxrwx file3.txt
reference_file.txt contains the filenames shown below:
file1.txt
file2.txt
The data in file1.txt and file2.txt is as shown below:
step_1
step_2
step_3
Now I need to extract a particular step, say step_2, from each file.
Note1: the file name must be present in reference_file.txt.
Note2: step_2 is not always on line 2.
Note3: the search should be recursive.
I have tried the scripts below.
This one does not give what I expected; it displays blank and unwanted lines:
#!/bin/sh
while read f; do
find . -type f -name "${f}" | xargs grep -l -F 'step_2'
done <reference_file.txt
It searches only one folder and does not search recursively.
#!/bin/sh
for i in reference_file.txt
do
find . -type f -name "${f}" | xargs grep -l -F 'step_2'
done
Please help me with this.
It works for me after removing the -l option:
for f in `cat reference_file.txt`; do find . -type f -name "$f" | xargs grep 'step_2' ; done
I can see that it works recursively as well. Here is the output when I run it from outside dir1 (a directory structure I created for testing):
./dir1/file1:step_2:
./dir1/file2:step_2:
If the recursion issue persists, please share more information, such as the directory structure and file names.
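The -l option makes grep print only the names of matching files, which is why removing it brings back the step_2 lines. One caveat: grep prefixes each line with the file name only when it is given more than one file, so if find locates a single file the output is the bare line. Appending /dev/null as an extra operand is a common portable way to force the prefix. A minimal sketch with two throwaway files:

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
printf 'step_1\nstep_2\n' > "$demo/file1"
printf 'step_2\nstep_3\n' > "$demo/file2"

# -l: only the names of the files that contain a match
grep -l 'step_2' "$demo/file1" "$demo/file2"

# no -l, several files: every matching line, prefixed with its file name
grep 'step_2' "$demo/file1" "$demo/file2"

# no -l, a single file: bare lines, no prefix; adding /dev/null as a
# second operand (it never matches) forces the file name prefix
grep 'step_2' "$demo/file1"
grep 'step_2' "$demo/file1" /dev/null
```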

Making a file out of all the files containing a given string

Create a file that contains the contents of all the files in the current folder that contain a given string (passed, say, as argument 1), one after the other, with each file appended to the end. The new file should be named after the given string.
I thought of the following but it doesn't work:
grep $1 * >> fnames #places all the names of the right files in a file
for x in fnames
do
cat x >> $1 #concat the files from the list
done
rm fnames
On a related note, is there a site with solved exercises or examples like this?
You can do something like this using process substitution:
shopt -s nullglob
while read -r file; do
cat "$file"
done < <(grep -l "search-pattern" *) > /path/to/newfile
This assumes your directory contains only files and no subdirectories.
If there are subdirectories as well, use find with grep so that only regular files are passed on:
find . -maxdepth 1 -type f -exec grep -q "search-pattern" {} \; -print0 |
xargs -0 cat > /path/to/newfile
How about the following? (This assumes you aren't worried about file names containing spaces, newlines, or glob characters, since those will not work correctly here.)
for O in $(grep -l -- "$1" *)
do
    cat "$O" >> "$1"
done
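For names with spaces, looping over the glob itself avoids word splitting. Here is a minimal bash sketch of the original exercise under a few assumptions: the string is matched literally (grep -F), only regular files in the current directory are considered, and the output file (named after the string) is skipped so it is never appended to itself. The function name concat_matching is hypothetical.

```shell
#!/bin/bash
# concat_matching STRING (hypothetical helper): append every regular file in
# the current directory that contains STRING to a file named STRING
concat_matching() {
    local pattern=$1 f
    for f in ./*; do
        [[ -f $f ]] || continue                  # skip directories etc.
        [[ $f == "./$pattern" ]] && continue     # never read the output file
        if grep -q -F -- "$pattern" "$f"; then
            cat -- "$f" >> "$pattern"
        fi
    done
}

# Demo in a throwaway directory (file names are made up)
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'foo one\n' > 'a b.txt'
printf 'foo two\n' > c.txt
printf 'bar\n' > d.txt
concat_matching foo
cat foo    # contents of "a b.txt" followed by c.txt
```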

Renaming files but keeping them in their present subdirectory

I have a script that renames HTML files based on information in their tags. It goes through the current directory and all subdirectories and performs the renaming recursively. However, after renaming the files, it moves them into the working directory I run the script from. How can I make sure the files remain in their subdirectories instead of being moved to the working directory?
Here is what I am working with:
#!/usr/bin/env bash
for f in `find . -type f | grep \.htm`
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv ./"$f" "${title//[ ]/-}".htm
done
Never use this construct:
for f in `find . -type f | grep \.htm`
because the loop fails on file names that contain spaces, and the grep is unnecessary since find has a -name option for that. Use this instead:
# -name takes a glob, not a regex: this matches .htm and .html names
find . -type f -name '*.htm*' -print |
while IFS= read -r f
This:
awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF}
can be reduced and clarified to:
awk 'BEGIN{IGNORECASE=1;FS="</?title>";RS=""}
Note that using EOF was misleading: EOF is just an undefined variable, so it contains the null string, and an empty RS puts awk into paragraph mode (your first record extends to the first blank line, not to the end of the file). You could have used RS=bubba and gotten the same effect, but setting RS to the empty string is clearer. I'm not saying it's what you SHOULD be doing, but it's a clearer implementation of what you ARE doing.
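The paragraph-mode behaviour is easy to check by counting records. A minimal sketch with made-up input: with the default RS every line is a record, while RS="" (or an undefined variable such as EOF, whose value is the empty string) splits records at blank lines.

```shell
#!/bin/sh
# Made-up input: two "paragraphs" separated by one blank line
input='<title>First</title>
line two

<title>Second</title>'

# Default RS (newline): one record per line, including the blank one
printf '%s\n' "$input" | awk 'END { print NR }'                       # prints 4
# RS="": paragraph mode, records end at blank lines
printf '%s\n' "$input" | awk 'BEGIN { RS = "" } END { print NR }'     # prints 2
# RS=EOF: EOF is just an unset variable, so this is the same as RS=""
printf '%s\n' "$input" | awk 'BEGIN { RS = EOF } END { print NR }'    # prints 2
```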
Finally, putting it all back together, something like this should work for you:
find . -type f -name '*.htm*' -print |
while IFS= read -r f
do
    title=$( awk 'BEGIN{IGNORECASE=1;FS="</?title>";RS=""} {print $2}' "$f" )
    # rebuild the target path from the file's own directory
    mv -- "$f" "$(dirname -- "$f")/${title//[ ]/-}.htm"
done
Try:
mv ./"$f" "$(dirname "$f")/${title//[ ]/-}".htm
Note that your for f in `find ...` loop will fail on any file name containing a space or newline. You can avoid that with a line like:
find . -type f -name '*.htm' -exec myrename.sh {} \;
where the renaming code is in a script called myrename.sh.
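The answer does not show myrename.sh itself, so here is one possible sketch. It assumes bash (for ${title//[ ]/-}) and, instead of the paragraph-mode awk used above, swaps in a line-based awk that lowercases each line and uses index(), which assumes the <title> element sits on a single line but works the same in any POSIX awk (IGNORECASE only has an effect in GNU awk). Files without a title are left alone. The function name rename_by_title is hypothetical.

```shell
#!/bin/bash
# myrename.sh (one possible version): rename each HTML file given as an
# argument after its <title>, keeping it in its own directory
rename_by_title() {
    local f title dir
    for f in "$@"; do
        # first line holding <title>...</title> (any case); print the text between
        title=$(awk '
            {
                low = tolower($0)
                s = index(low, "<title>")
                e = index(low, "</title>")
                if (s && e > s) { print substr($0, s + 7, e - s - 7); exit }
            }' "$f")
        [[ -n $title ]] || continue          # no title: leave the file alone
        dir=$(dirname -- "$f")
        mv -- "$f" "$dir/${title//[ ]/-}.htm"
    done
}

rename_by_title "$@"
```

Since find passes one path at a time to the script, each file is renamed inside the directory find found it in.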
shopt -s globstar nullglob
for f in **/*.htm
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv "$f" "$(dirname "$f")/${title//[ ]/-}".htm
done

How to create a backup of files' lines containing "foo"

Basically, I have a directory and subdirectories that need to be scanned for .csv files. From there, I want to copy all lines containing "foo" from each CSV found into a new file (in the same directory as the original) whose name reflects the file it came from.
So far I have
find -type f -name "*.csv" | xargs egrep -i "foo" > foo.csv
which yields a single backup file (foo.csv) with everything in it, and the location each line was found in becomes part of the data. I want neither of those.
What I want:
For example, if I have:
csv1.csv
csv2.csv
and they both have lines containing "foo", I would like those lines copied to:
csv1_foo.csv
csv2_foo.csv
and I don't want anything extra in the backups other than the full line containing "foo" from the original file, i.e. I don't want the original file name in the backup data, which is what my current code produces.
Also, I should note that I'm using egrep even though my example doesn't use a regex. I will be using a regex in my search when I apply this to my specific scenario, so that probably needs to be taken into account when naming the new file. If that seems too difficult, an answer that doesn't account for regex would be fine.
Thanks ahead of time!
Try this if it helps:
find . -type f -name "*.csv" | xargs -I {} sh -c 'grep -E -i "foo" "$1" > "${1%.csv}_foo.log"' sh {}
You can try this:
$ find . -type f -exec grep -H foo '{}' \; | perl -ne '`echo $2 >> $1_foo` if /(.*):(.*)/'
It uses:
find to iterate over files
grep to print file path:line tuples (-H switch)
perl to append those lines to the output files (using backticks to run echo; it could be done more cleanly).
You can also try:
find . -type f -name "*.csv" -a ! -name "*_foo.csv" | while IFS= read -r f; do
    grep foo "$f" > "${f%.csv}_foo.csv"
done
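Building on the last answer, here is a sketch that also addresses the regex concern from the question by keeping the search pattern separate from the label used in the output name. It assumes POSIX sh and find; grep -E -i gives the same matching as egrep -i. The function name backup_matches and its LABEL argument are hypothetical. Matches are written to a temporary file first, so CSVs without any match do not get an empty backup.

```shell
#!/bin/sh
# backup_matches REGEX LABEL (hypothetical helper): for every *.csv below ".",
# write the lines matching REGEX (extended regex, case-insensitive) to
# NAME_LABEL.csv next to the original; earlier *_LABEL.csv outputs are skipped
backup_matches() {
    find . -type f -name '*.csv' ! -name "*_$2.csv" -exec sh -c '
        regex=$1 label=$2
        shift 2
        for f do
            out=${f%.csv}_$label.csv
            # keep the backup only if at least one line matched
            if grep -E -i -- "$regex" "$f" > "$out.tmp"; then
                mv -- "$out.tmp" "$out"
            else
                rm -f -- "$out.tmp"
            fi
        done
    ' sh "$1" "$2" {} +
}

# Demo in a throwaway directory (file names and contents are made up)
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir sub
printf 'foo,1\nbar,2\n' > csv1.csv
printf 'x,3\nFOO,4\n' > sub/csv2.csv
printf 'none,5\n' > csv3.csv
backup_matches 'foo' foo
```

After the demo, csv1_foo.csv and sub/csv2_foo.csv hold only the matching lines, and csv3.csv gets no backup because nothing in it matches.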
