I read file names from one file, say FILE_A, and search for each of them in another file, say FILE_B, using a script, say script.sh.
I want to print the file names that are not present in FILE_B.
I tried the code below, but it didn't work.
The code in script.sh is as follows:
#!/bin/bash
while read line
do
grep -v "$line" FILE_B
done<FILE_A
Please help me: why is it not working, and what is the solution?
grep can read its input from a file; no need for a loop.
grep -Fxvf FILE_A FILE_B
The -F option specifies that the input is literal strings, not regular expressions. Otherwise an input which contains regex metacharacters would not match itself; or not only itself. For example, the regular expression a.c matches "aac", "abc", etc.
The -x option requires a full-line match. Otherwise, the input "bc" would match on any line containing it as a substring, such as "abcd".
The -v option says to print non-matching lines instead of matching.
Finally, the lowercase -f option specifies a file name as its argument to use as input for the patterns to match.
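A small worked example may help with the argument order: the file given to -f supplies the patterns, and the other file is the one being filtered. The sketch below uses invented sample data (a.txt, b.txt and so on are made up) and puts FILE_B after -f, so the output is the lines of FILE_A that are missing from FILE_B, which is the direction the question asks for:

```shell
# Sample data in a scratch directory (contents are invented for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$tmpdir/FILE_A"
printf '%s\n' b.txt d.txt       > "$tmpdir/FILE_B"

# Patterns come from FILE_B; FILE_A is the file being filtered.
# Prints the lines of FILE_A that do not appear as whole lines in FILE_B.
missing=$(grep -Fxvf "$tmpdir/FILE_B" "$tmpdir/FILE_A")
printf '%s\n' "$missing"

rm -rf "$tmpdir"
```

Swapping the two file names gives the opposite report: the lines of FILE_B that are not in FILE_A.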
comm is good for this, but it requires the input files to be sorted. If that's not a problem:
# lines in FILE_A that are not in FILE_B
comm -23 <(sort FILE_A) <(sort FILE_B)
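If your shell has no process substitution (plain sh, for instance), the same idea should work with sorted temporary copies. This is only a sketch; the scratch directory and the sample file contents are assumptions made for the example:

```shell
# Sample data (invented) in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' c.txt a.txt b.txt > "$tmpdir/FILE_A"
printf '%s\n' d.txt b.txt       > "$tmpdir/FILE_B"

# comm needs sorted input, so sort each file into a temporary copy first.
sort "$tmpdir/FILE_A" > "$tmpdir/A.sorted"
sort "$tmpdir/FILE_B" > "$tmpdir/B.sorted"

# -2 drops lines unique to the second file, -3 drops lines common to both,
# which leaves only the lines unique to the first file.
only_in_a=$(comm -23 "$tmpdir/A.sorted" "$tmpdir/B.sorted")
printf '%s\n' "$only_in_a"

rm -rf "$tmpdir"
```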
No extra linefeed between while and do
grep -v expr file prints all the lines of file that do not contain expr. What you want is just the result: whether it was found or not. You need to test the exit status.
Try:
#!/bin/bash
while read line
do
grep -q "$line" FILE_B || echo "$line"
done<FILE_A
grep exits with status 0 if a matching line was found. Chaining echo with || means: run echo only when the exit status is non-zero, i.e. when $line was not found.
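One caveat with the loop above: grep treats $line as a regular expression and accepts a match anywhere in a line. A hedged sketch of the difference, using made-up sample data and adding -F (literal) and -x (whole line) for the strict variant:

```shell
# Sample data (invented): neither line of FILE_A appears verbatim in FILE_B.
tmpdir=$(mktemp -d)
printf '%s\n' 'a.c' 'bc'    > "$tmpdir/FILE_A"
printf '%s\n' 'abc' 'abcd'  > "$tmpdir/FILE_B"

# Plain grep: "a.c" matches "abc" as a regex, "bc" matches as a substring,
# so nothing is reported as missing.
loose=$(while IFS= read -r line; do
    grep -q -e "$line" "$tmpdir/FILE_B" || printf '%s\n' "$line"
done < "$tmpdir/FILE_A")

# Literal, whole-line matching: both entries are reported as missing.
strict=$(while IFS= read -r line; do
    grep -Fxq -e "$line" "$tmpdir/FILE_B" || printf '%s\n' "$line"
done < "$tmpdir/FILE_A")

printf 'loose:  [%s]\nstrict: [%s]\n' "$loose" "$strict"
rm -rf "$tmpdir"
```

The -e before "$line" is there so that an entry starting with a dash is not taken for an option.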
This script works but does not print what you want. For each filename in FILE_A it prints all the OTHER filenames in FILE_B. Instead you should print the filename yourself if grep does not find it:
while read line
do
grep "$line" FILE_B >/dev/null || echo "$line"
done <FILE_A
Use this instead
#!/bin/bash
while read line
do
if ! grep -qw "$line" FILE_B
then
echo "$line"
fi
done < FILE_A
I am new to Shell scripting, and am writing a Korn shell script.
My aim is to search for each line of fileA.txt in 4 separate files (let's call them fileA.txt, fileB.txt, fileC.txt and fileD.txt). I need to print "not found", in a separate file, for the lines from fileA.txt that were found in none of the four files.
So I came up with the following if statement. I am trying to combine the 4 grep commands using &&, and doing a logical NOT (!), since I only need the lines that were found in none of the 4 files.
for i in $(<fileA.txt);
do
if !((grep -q $i fileB.txt) && (grep -q $i fileB.txt) && (grep -q $i fileC.txt) && (grep -q $i fileD.txt)); then
print "$i not found in either of 4 files"
fi
done
I know there's something definitely wrong with the syntax, but being a beginner in shell scripting, I can't figure it out.
It doesn't answer the question you asked, and thus violates SO policy, but there's a way to solve your actual problem with awk in one pass that I can't fit in a reasonable comment:
awk 'FNR==NR{a[$0];next} {for(p in a)if($0~p){delete a[p]}} \
END{for(p in a)print "notfound: ",p}' patternfile data1 data2 data3 etc
The notfound: is just for clarity, you can change or omit it as desired.
The output values (patterns that were not found in any data file) are not necessarily in the same order as they were in patternfile; if you care about that:
awk 'FNR==NR{a[$0]=FNR;next} {for(p in a)if($0~p){delete a[p]}} \
END{for(p in a)print a[p],p}' patternfile data1 data2 data3 etc | sort -k1n | cut -d' ' -f2-
# or in GNU awk v4+ only
awk 'FNR==NR{a[$0]=FNR;next} {for(p in a)if($0~p){delete a[p]}} \
END{PROCINFO["sorted_in"]="@val_num_asc";for(p in a)print p}' patternfile data1 data2 data3 etc
Your question is also ambiguous about 'lines'; do you mean each line in patternfile should occur as a line in one of the data files, or can it occur within a line but not necessarily the whole line? Also, are the values in the patternfile only data characters or are any of them special characters that match something different in the data? For example with grep defaults as you posted (or awk with ~ as I have above) if patternfile contains a line boojum.. that item will be considered found if a data file contains any of the following lines:
boojum..
boojumXY
the snark was a boojum!!
OTOH the patternfile line ^abc will match:
abc
abcdefghi
but will NOT match:
^abc
You can get full-line match in grep with option -x, literal (non-regex) match with -F, or both. These can also be achieved in awk but differently.
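For the awk side, one possible way to get a literal, whole-line comparison is to test array membership instead of using ~. This is a sketch, not the only approach; the sample files and the array name want are assumptions made for the example:

```shell
# Sample data (invented). Only "^abc" occurs verbatim as a data line.
tmpdir=$(mktemp -d)
printf '%s\n' 'boojum..' '^abc' 'snark' > "$tmpdir/patternfile"
printf '%s\n' 'boojumXY' '^abc'         > "$tmpdir/data1"
printf '%s\n' 'abcdefghi'               > "$tmpdir/data2"

# First file: remember each line with its line number.
# Other files: a data line that is exactly equal to a remembered line removes it.
# END: whatever is left was never seen; sort restores patternfile order.
notfound=$(awk '
    FNR == NR    { want[$0] = FNR; next }
    ($0 in want) { delete want[$0] }
    END          { for (p in want) print want[p], p }
' "$tmpdir/patternfile" "$tmpdir/data1" "$tmpdir/data2" | sort -k1,1n | cut -d' ' -f2-)

printf '%s\n' "$notfound"
rm -rf "$tmpdir"
```

Because membership is an exact string comparison, boojum.. is not considered found by boojumXY, and ^abc is found only by the literal line ^abc.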
You don't need the parentheses. In fact, because you are using &&, you don't need 3 separate calls to grep.
while IFS= read -r line; do
if ! grep -q "$line" fileB.txt fileC.txt fileD.txt; then
print "$line not found in any of the 3 files"
fi
done < fileA.txt
You don't even need the loop; this pattern is covered by the -f option:
if ! grep -f fileA.txt fileB.txt fileC.txt fileD.txt; then
...
fi
I want to read a file word by word and use each word in that text file as an input to grep.
To read the file word by word I have used the following code:
for word in $(<filename)
do
echo "$word"
done
Now when I replaced
echo "$word"
with
grep -i "$word"
I'm not getting any output.
The following will read the file word by word and apply grep using the read word as input:
#!/bin/bash
while read line; do
for word in $line; do
grep -i "<REGULAR_EXPRESSION_HERE>" "$word"
done
done < filename
The reason you are not getting any output is that grep expects two arguments. If you leave out the filename argument, it will wait for you to type in the text to grep from; it is reading standard input. (This is what allows you to use it in a pipeline, like command | grep error.)
Anyway, what you are attempting is already built into grep. Just pass it the file of search expressions as an argument to -f.
grep -irf filename .
where -r says to search recursively through all the files in a directory and . is the current directory.
Note, however, that this will search for matches anywhere on a line. If your input file contains dog then grep will find a match on lines which contain dogmatic or endogenous; and if it contains an empty line, it will match all lines in all files. Maybe look at the -w and/or -x options (as well as perhaps -F to disarm any regex specials in the input) to address these issues.
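To make the substring issue concrete, here is a small sketch with invented sample files (words and text are hypothetical names) that counts matching lines with and without -w and -F:

```shell
# Sample data (invented): one search word, three lines of text.
tmpdir=$(mktemp -d)
printf '%s\n' dog > "$tmpdir/words"
printf '%s\n' 'the dog barked' 'a dogmatic view' 'endogenous growth' > "$tmpdir/text"

# Default matching: "dog" is found anywhere in a line, so all three lines count.
substring_hits=$(grep -c -i -f "$tmpdir/words" "$tmpdir/text")

# -w requires whole-word matches, -F treats the patterns as literal strings.
word_hits=$(grep -c -i -w -F -f "$tmpdir/words" "$tmpdir/text")

printf 'substring: %s, whole word: %s\n' "$substring_hits" "$word_hits"
rm -rf "$tmpdir"
```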
See if this serves your purpose:
$ grep -o "\S*" filename | grep -i "<your regex here>"
The first grep in the pipeline flattens the file to one word per line. The second grep then searches those words for your regex.
Note: This answer assumes that the individual words in file are the data you want to grep in. If those are supposed to be interpreted as filenames, refer to higuaro's answer.
This is what worked for me
while read line
do
output=`grep -i "$line" /filepath/*`
if [ $? -eq 0 ]; then
echo "$line present in file : $output"
fi
done <filename
I have a command that lists the full 8 level deep path of all folders we are backing up.
I also have a command that enumerates all 8 level deep folders on the system.
Both of these are stored as variables in a bash script.
I'm trying to put together a loop that takes file 1, uses its first line as a variable in an if/then/else, and then moves on through the end of the file.
I've tried so many things, but it's beyond my skill set to provide an example that won't confuse the reader of this post.
TempFile1=/ifs/data/scripts/ConfigMonitor/TempFile1.txt
TempFile2=/ifs/data/scripts/ConfigMonitor/TempFile2.txt
find /ifs/*/*/backup -maxdepth 4 -mindepth 4 -type d > $TempFile1
isi snapshot schedules list -v | grep Path: | awk '{print $2}' > $TempFile2
list line 1 on $TempFile1
Grep for line 1 within $TempFile2
if result yielded then
echo found
else
echo fullpath not being backed up
fi
Use Grep's -f Flag
grep(1) says:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
Therefore, the following should work:
grep -f patterns_to_match.txt file_to_examine.txt
Faster Reporting
Another way to think about this is that you can ask GNU grep to show you all the matches:
echo 'Lines that match a pattern in your pattern file.'
grep -f patterns_to_match.txt file_to_examine.txt
and then show you all the lines that don't match any of the patterns:
echo 'Lines that do not match any patterns in your pattern file.'
grep -f patterns_to_match.txt -v file_to_examine.txt
This is likely to be faster and more efficient than looping through the file one line at a time in Bash. You may or may not get similar results with a grep other than GNU grep; while the -f and -v flags are specified by POSIX, I only tested it against GNU grep 2.16, so your mileage may vary.
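Applied to the question, a sketch might look like the following. The sample paths are invented, and -F and -x are added on the assumption that each line of both files is one complete path that should be compared literally:

```shell
# Stand-ins for the two files from the question, filled with invented paths.
tmpdir=$(mktemp -d)
TempFile1=$tmpdir/TempFile1.txt   # every folder found on the system
TempFile2=$tmpdir/TempFile2.txt   # folders that have a snapshot schedule
printf '%s\n' /ifs/a/b/backup/1/2/3/4 /ifs/a/b/backup/1/2/3/5 > "$TempFile1"
printf '%s\n' /ifs/a/b/backup/1/2/3/4                         > "$TempFile2"

# Folders present in both lists.
backed_up=$(grep -Fx -f "$TempFile2" "$TempFile1")

# Folders on the system with no matching schedule path.
not_backed_up=$(grep -Fxv -f "$TempFile2" "$TempFile1")

printf 'found: %s\n' "$backed_up"
printf 'fullpath not being backed up: %s\n' "$not_backed_up"
rm -rf "$tmpdir"
```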
This should iterate through Tempfile1.txt and grep for the line in TempFile2.txt.
while read line; do
if grep "$line" /path/to/TempFile2.txt > /dev/null
then
echo "Found $line"
else
echo "Did not find $line"
fi
done < Tempfile1.txt
Tempfile1.txt:
a
b
c
Tempfile2.txt
b
d
z
Output:
Did not find a
Found b
Did not find c
I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file consists solely of hex characters (234324fc234ba253069). How do I extract it? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file, using line start and line end (^ and $) as delimiters, but I am obviously missing something...
Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
Through awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Based on the search pattern given, awk and grep print the matched line.
^ # start
[a-f0-9]\+ # hex characters without capital A-F one or more times
$ # End
sed can do it:
sed -n '/^[a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me. Note also that there is no need to use \1 to print the match back. It is handy in many cases, but here it is overkill because you want to print the whole line. Just sed -n '/pattern/p' does the job, as shown above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9]*$/{p;q}' file
Another approach is to let printf decide when the line is hexadecimal:
while read line
do
printf "%f\n" "0x"$line >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER executes successfully if the number is indeed hexadecimal. Otherwise, it returns an error.
Hence, using printf ... >/dev/null 2>&1 && echo "$line" does not let printf print anything (redirects to /dev/null) but then prints the line if it was hexadecimal.
For your given file, it returns:
$ while read line; do printf "%f\n" "0x"$line >/dev/null 2>&1 && echo "$line"; done < a
234324fc234ba253069
Using grep -E (the current spelling of egrep) you can restrict your regex to select lines that only match valid hex characters, i.e. [a-fA-F0-9]:
grep -E '^[a-fA-F0-9]+$' file
234324fc234ba253069
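The POSIX character class [[:xdigit:]] covers the same set of characters (0-9, a-f, A-F) and may read more clearly. A minimal sketch, recreating the sample file from the question in a scratch directory:

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
EOF

# + requires at least one character, so empty lines are not reported.
hex_line=$(grep -E '^[[:xdigit:]]+$' "$tmpdir/file")
printf '%s\n' "$hex_line"

rm -rf "$tmpdir"
```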
I have a list of files (~1000) and there is 1 file per line in my text file named: 'files.txt'
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed '${1}q;d' files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this input script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
"So that it reads the n'th line of my files.txt and feeds that string to my created .C macro."
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to avoid printing by default, then prints the $ith line. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
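If you prefer to keep the check and the lookup together, they could be wrapped in a small function. This is only a sketch: nth_line is a hypothetical helper name, the sample file contents are invented, and the case pattern is one way to reject anything that is not a string of digits:

```shell
# Hypothetical helper: print line $1 of file $2, or fail on a bad line number.
nth_line() {
    case $1 in
        ''|*[!0-9]*)
            echo "ERROR: invalid line number: $1" >&2
            return 1
            ;;
    esac
    # Print the requested line, then quit without reading the rest of the file.
    sed -n "${1}{p;q;}" "$2"
}

# Sample data (invented) to try it out.
tmpdir=$(mktemp -d)
printf '%s\n' first.root second.root third.root > "$tmpdir/files.txt"

b=$(nth_line 2 "$tmpdir/files.txt")
printf '%s\n' "$b"

rm -rf "$tmpdir"
```

Inside MakeMacro.sh this would be called as b=$(nth_line "$1" files.txt) || exit 1.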
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read file
do
((i++))
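cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: the delimiter must be unquoted (EOF, not 'EOF'), otherwise $file is not expanded. With <<-, leading tabs are stripped, so if you indent the here-document body and the EOF line, indent them with tabs, not spaces.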
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.
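If the end goal is one macro per line of files.txt, the same awk idea can probably write all of them in a single pass, with no need to call the script once per line. A sketch under assumptions: the output directory variable dir and the sample file names are invented, and /MYPATHNAME/ is kept as the placeholder from the question:

```shell
# Sample data (invented) in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' run1.root run2.root > "$tmpdir/files.txt"

# For every input line, write MyMacro_<line number>.C into dir.
# close() keeps the number of simultaneously open files small.
awk -v dir="$tmpdir" '{
    out = dir "/MyMacro_" NR ".C"
    printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n", $0 > out
    close(out)
}' "$tmpdir/files.txt"

# The statement is on the second line of each generated file.
second=$(sed -n 2p "$tmpdir/MyMacro_2.C")
printf '%s\n' "$second"

rm -rf "$tmpdir"
```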