how to stop grep creating empty file if no results - bash

I'm comparing two files to find the lines in one that are not in the other, using grep -v -f file1.txt file2.txt > result.txt
Let's say my files look like this:
file1.txt
alex
peter
zoey
file2.txt
alex
john
peter
zoey
So result.txt should contain john
This is run inside a Jenkins job, and Jenkins is not happy when an empty result.txt is created because there are no differences between the two files.
I can't just do a blind diff/no diff output, I specifically need to know which lines differ if there are any.
Is there a neater way to run this command to not create a file if there are no results?

EDIT: Try conditionally running the command with the quiet option -q first (it exits as soon as one match is found, which saves time). Once that first match is found (meaning the files differ), run the same command again and redirect the output to your file.
Try this: (EDIT taken from Charles Duffy's comment)
#!/usr/bin/env bash
if grep -qvf file1.txt file2.txt
then
    grep -vf file1.txt file2.txt > output.txt
    echo "Differences exist. File output.txt created."
else
    echo "No difference between the files detected"
fi
Or with less code and one line:
grep -qvf file1.txt file2.txt && grep -vf file1.txt file2.txt > output.txt

Could you do something as simple as removing the file if it's empty?
Solution #1:
grep -v -f file1.txt file2.txt > result.txt
[[ $? -ne 0 ]] && 'rm' -f result.txt
grep returns a non-zero exit code when it finds no matches (i.e., when the output is empty)
quoting the rm command ensures no alias is invoked
the -f silences any messages should the file not exist (shouldn't happen, but it doesn't hurt to code for it anyway)
Solution #2:
grep -v -f file1.txt file2.txt > result.txt
[[ ! -s result.txt ]] && 'rm' -f result.txt
-s => file exists and is greater than 0 in size
! -s => file doesn't exist or file exists and is 0 in size
'rm' -f ... same explanation as for solution #1
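Either solution can also be folded onto one line; a minimal sketch of Solution #1 in that style:
grep -v -f file1.txt file2.txt > result.txt || 'rm' -f result.txt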

You can use this awk to create the output file only when there is data to write:
awk 'NR==FNR{a[$1]; next} !($1 in a){b[$1]} END{
if (length(b) > 0) for (i in b) print i > "result.txt"}' file1.txt file2.txt
The output file result.txt will only be created when there is output data to write, thanks to the length(b) > 0 check.
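Note that calling length() on an array is a GNU awk extension. If portability matters, a counter does the same job in any POSIX awk (a sketch):
awk 'NR==FNR{a[$1]; next} !($1 in a){b[++n]=$1} END{
if (n > 0) for (i=1; i<=n; i++) print b[i] > "result.txt"}' file1.txt file2.txt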

This should work for your case
out=$(grep -v -f file1.txt file2.txt); [[ -n "$out" ]] && echo "$out" > results.txt

Using grep && grep. Positive result:
$ grep -q -m 1 -v -f file1 file2 && grep -v -f file1 file2 > out1
$ cat out1
john
and negative:
$ grep -q -m 1 -v -f file1 file1 && grep -v -f file1 file1 > out2
$ cat out2
cat: out2: No such file or directory
The first grep exits after its first match to speed things up, but in the worst case the total execution time can still be twice the runtime of a single grep.
Another way, another awk:
$ awk -v outfile=out3 '
    NR==FNR {
        a[$0]
        next
    }
    !($0 in a) {
        print $0 > outfile    # file is only opened when there is a match
    }' file1 file2
$ cat out3
john
That awk won't recognize partial matches.
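If whole-line comparison is what you want from grep itself, the -F (fixed strings) and -x (match whole lines) flags avoid both regex interpretation and partial matches; out4 here is just an illustrative name:
grep -vFxf file1 file2 > out4
The empty-file problem is unchanged, so combine this with one of the emptiness checks above.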

Related

Combine multiple files into one including the file name

I have been looking around trying to combine multiple text files into one, including the name of each file.
My current file content is:
1111,2222,3333,4444
What I'm after is:
File1,1111,2222,3333,4444
File1,1111,2222,3333,4445
File1,1111,2222,3333,4446
File1,1111,2222,3333,4447
File2,1111,2222,3333,114444
File2,1111,2222,3333,114445
File2,1111,2222,3333,114446
I found multiple examples of how to combine them all, but nothing that combines them while including the file name.
Could you please try the following, assuming your input file names have a .csv extension:
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.csv > output_file
After seeing the OP's comments: if the file extensions are .txt, then try:
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.txt > output_file
Assuming all your files have a .txt extension and contain only one line as in the example, you can use the following code:
for f in *.txt; do echo "$f,$(cat "$f")"; done > output.log
where output.log is the output file.
Well, it works:
printf "%s\n" *.txt |
xargs -n1 -d $'\n' bash -c 'xargs -n1 -d $'\''\n'\'' printf "%s,%s\n" "$1" <"$1"' --
First, output a newline-separated list of files.
Then, for each file, xargs executes bash,
which in turn runs xargs for each line of the file,
and that executes printf "%s,%s\n" <filename> <line> for each line of input.
Tested in repl.
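If the quoting gymnastics are off-putting, a plain loop does the same thing (a sketch; output.log as above):
for f in *.txt; do
    while IFS= read -r line; do
        printf '%s,%s\n' "$f" "$line"
    done < "$f"
done > output.log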
Solved using grep "" *.txt -I > $filename (note that grep separates the file name from the line with : rather than ,).

Creating a script that checks to see if each word in a file is in another file

I am pretty new to Bash and scripting in general and could use some help. Each word in the first file is separated by \n, while the second file could contain anything. If a string from the first file is not found in the second file, I want to output it. Pretty much: check whether these words are in those words, and tell me the ones that are not.
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. It is straightforward and not optimized, but it does the trick (I think):
while read line; do
    fgrep -q "$line" file2.txt || echo "$line"
done < file1.txt
There is a funny version below, with 4 parallel fgrep jobs and the use of an additional result.txt file:
> result.txt
nb_parallel=4
while read line; do
    while [ $(jobs | wc -l) -gt "$nb_parallel" ]; do sleep 1; done
    fgrep -q "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 to run more parallel fgrep jobs, depending on the number of CPUs and cores and the IOPS available.
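If GNU parallel happens to be installed, it can manage the job pool instead of the hand-rolled jobs | wc -l check (a sketch, assuming GNU parallel's :::: file-input syntax):
parallel -j 4 'fgrep -q {} file2.txt || echo {}' :::: file1.txt > result.txt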
With the -f flag you can tell grep to read its patterns from a file.
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not catch substring matches. To fix that, we use the result of grep -Fof file1.txt file2.txt (all the relevant keywords) as the pattern list.
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following:
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
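Note that comm expects both of its inputs to be sorted; if yours may not be, sort them on the fly with process substitution:
comm -23 <(sort file1.txt) <(sort file2.txt)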
I know you were looking for a script, but I don't think there is any reason for one; if you still want a script, you can just run the commands from a script.
A similar awk:
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

How could I redirect file name into counts by tab using one line commands in bash?

I have some files in FASTA format and want to count their reads, and I would like output containing the file names and their corresponding counts.
input file names:
1.fa
2.fa
3.fa
...
I tried:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
Problem:
It gives me out.txt, but with duplicated file names and the counts separated by ':'. However, I need a tab and unique file names:
1.fa:7323580
1.fa:7323580
2.fa:5591179
2.fa:5591179
...
Suggested solution
grep -c '>' *.fa | sed 's/:/'$'\t'/ > out.txt
The $'\t' is a Bash-ism called ANSI C Quoting.
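For what it's worth, GNU awk can do the whole job without sed, since its ENDFILE block runs after each input file (a GNU-awk-only sketch):
# GNU awk only: ENDFILE runs after each input file is finished
gawk '/>/ {c++} ENDFILE {printf "%s\t%d\n", FILENAME, c; c=0}' *.fa > out.txt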
Analysis of what went wrong
Your code is:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
It isn't a good idea to parse the output of the ls command. However, if your file names are well behaved (roughly, in the portable filename character set, which is [-A-Za-z0-9._]), you'll be reasonably OK.
Your grep command, though, is confused. It is:
grep -c '>' $i > echo $i >> out.txt
That could be written more clearly as:
grep -c '>' $i $i > echo >> out.txt
This means 'count the number of lines containing > in $i, and then in $i again, and send the output first to a file named echo, and then append it to out.txt'. Since the last redirection wins, the file echo is created but left empty. You get the file name included in the output because there are two files to search; with only one file, you wouldn't get the file name. (One way to ensure you get file names with regular (not -c or -l) grep is to scan /dev/null too. Many versions of grep also provide an option to request the name explicitly, though POSIX doesn't mandate one; both BSD grep and GNU grep use -H.)
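You can see the 'last redirection wins' behaviour for yourself:
$ echo hi > a >> b
$ cat a            # prints nothing: the first redirection still created/truncated the file
$ cat b
hi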
So, that's why you got the double file names and entries in your output.
Try this:
for i in $(ls -t -v *.fa)
do
c=$(grep -c '>' "$i")   # with a single file, grep -c prints just the count
printf '%s\t%s\n' "$i" "$c" >> out.txt
done

Using awk to put a header in a text file

I have lots of text files and need to put a header on each one of them depending of the data on each file.
This awk command accomplishes the task:
awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' my_text.file
But this prints to the screen, and I want to put this output at the top of each of my files, saving the modification under the same file name.
Here is what I've tried:
for i in *.txt
do
echo Processing ${i}
cat awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' "${i}" ${i} > $$.tmp && mv $$.tmp "${i}"
done
So I guess I can't use cat to put them as a header, or am I doing something wrong?
Thanks in advance
UPDATE:
with awk:
awk 'BEGIN{print "header"}1' test.txt
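To apply that to every file and save the result under the same name, a temp-file loop works (a sketch; the header line is whatever you need):
for i in *.txt; do
    awk 'BEGIN{print "header"}1' "$i" > "$$.tmp" && mv "$$.tmp" "$i"
done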
without awk, with cat & echo:
cat <(echo "header") test.txt
or using tac (reverse the file, append the header at the end, reverse again):
tac test.txt > tmp && echo "header" >> tmp && tac tmp > test.txt && rm tmp
I THINK what you're trying to do with your loop is:
for i in *.txt
do
    echo "Processing $i"
    awk 'NR==1{first=$1}{sum+=$1}END{last=$1;print NR,last,"L"}' "$i" > $$.tmp &&
    cat "$i" >> $$.tmp &&
    mv $$.tmp "$i"
done
but it's not clear what you're really trying to do, since you never use first or sum, and setting last from $1 in the END section is a bad idea: it is not guaranteed to work across all awks, and there's a simple alternative, shown below.
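The portable idiom is to capture the field on every line instead of relying on $1 still being intact in END:
awk 'NR==1{first=$1} {sum+=$1; last=$1} END{print NR, last, "L"}' "$i"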
If you update your question with some sample input and expected output we can help you.

Removing lines based on column values read from file

I use the following code to extract lines from input_file with a certain value in the first column. The values on which the extraction is based are in "one_column.txt":
while read file
do
    awk -v col="$file" '$1==col {print $0}' input_file >> output_file
done < one_column.txt
My question is, how do I extract the lines where the first column does not match any of the values in one_column.txt? In other words, how do I extract only the remaining lines from input_file that don't end up in output_file?
grep -vf can do it:
grep -vf output_file input_file
grep -f reads the patterns to match from a file; grep -v inverts the match.
Test
$ cat a
hello
good
bye
$ cat b
hello
good
bye
you
all
$ grep -f a b
hello
good
bye
$ grep -vf a b ## opposite
you
all
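Since the extraction is keyed on the first column, an awk one-pass sketch avoids grep matching the values anywhere in the line (remaining_file is an illustrative name):
awk 'NR==FNR {skip[$1]; next} !($1 in skip)' one_column.txt input_file > remaining_file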
