Diff command for two files and output to third - shell

I just have a small problem comparing two files with the diff command in a shell script. Say I have two ASCII files, file1.txt and file2.txt, with these contents:
file1.txt
blah/blah2/content.fits/
blah3/blah4/content2.fits/
blah5/blah6/content3.fits/
blah7/blah8/content4.fits/
file2.txt
content.fits
content2.fits
I would now like to compare the two files based on the .fits file names, but write the output to an ASCII file keeping the formatting of file1.txt; i.e., in this particular example the output file after comparing these two should contain:
blah5/blah6/content3.fits/
blah7/blah8/content4.fits/
Any ideas?

You can use this awk to get that output; it loads the names from file2.txt into an array, then prints each line of file1.txt whose next-to-last /-separated field is not in that array:
awk -F/ 'FNR==NR {a[$1];next} !($(NF-1) in a)' file2.txt file1.txt
blah5/blah6/content3.fits/
blah7/blah8/content4.fits/
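For the record, a grep alternative (a sketch, not from the original answer): -F treats the lines of file2.txt as fixed-string patterns and -v inverts the match. Note that this matches substrings anywhere on the line, so it could over-match if one name happens to be contained in another:
grep -vFf file2.txt file1.txt
blah5/blah6/content3.fits/
blah7/blah8/content4.fits/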

Related

Extract lines from a list that has a double-repeated character

I have a text file, and I need to extract the lines from it that contain a character repeated twice.
For example, I have
cat-dog-eat
men-boy
I need to extract the lines where - appears twice,
and the desired output is:
cat-dog-eat
Given that:
kent$ cat file
cat-dog-eat
men-boy
a-b-c-d-e
To get the lines that have exactly two -s (with - as the separator, such lines split into exactly three fields):
awk -F'-' 'NF==3' file
cat-dog-eat
To get the lines that have at least two -s:
awk -F'-' 'NF>2' file
cat-dog-eat
a-b-c-d-e
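For comparison, grep equivalents of the same two filters (a sketch; these count the separators directly rather than the fields):
kent$ grep -E '^([^-]*-){2}[^-]*$' file
cat-dog-eat
kent$ grep -E '^([^-]*-){2}' file
cat-dog-eat
a-b-c-d-e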

Using awk to extract specific line from all text files in a directory

I have a folder with 50 text files and I want to extract the first line from each of them at the command line and output this to a result.txt file.
I'm using the following command within the directory that contains the files I'm working with:
for files in *; do awk '{if(NR==1) print NR, $0}' *.txt; done > result.txt
When I run the command, the result.txt file contains 50 lines, but they're all from a single file in the directory rather than one line per file. The command appears to be looping over a single file 50 times rather than over each of the 50 files.
I'd be grateful if someone could help me understand where I'm going wrong with this.
Try this:
for i in *.txt; do head -1 "$i"; done > result.txt
OR
for files in *.txt; do awk 'NR==1 {print $0}' "$files"; done > result.txt
Your code has two problems:
You have an outer loop that iterates over *, but your loop body doesn't use $files. That is, you're invoking awk '...' *.txt 50 times. This is why any output from awk is repeated 50 times in result.txt.
Your awk code checks NR (the number of lines read so far), not FNR (the number of lines read within the current file). NR==1 is true only at the beginning of the very first file.
There's another problem: result.txt is created first, so it is included among *.txt. To avoid this, give it a different name (one that doesn't end in .txt) or put it in a different directory.
A possible fix:
awk 'FNR==1 {print NR, $0}' *.txt > result
Why not use head? For example, with find:
find mydir/ -type f -exec head -1 {} \; >> result.txt
If you want to follow your approach you need to specify the file and not use the wildcard with awk:
for files in *; do awk '{if(NR==1) print NR, $0}' "$files"; done > result.txt
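As a side note, if your head comes from GNU coreutils you can skip the loop entirely: when given several files, head prints a ==> file <== header before each one, and -q suppresses those headers. Writing to result instead of result.txt also avoids the self-inclusion problem noted above:
head -q -n 1 *.txt > result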

Unix: read a file line by line, check if a string exists in another file, and do the required operation

I need some assistance with the following.
File1.txt
aaa:/path/to/aaa:777
bob:/path/to/bbb:700
ccc:/path/to/ccc:600
File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
I should iterate over File2.txt, and if aaa exists in File1.txt, compare the file permissions. If the permissions for aaa are the same in both files, ignore it.
If they are different, write the line to Output.txt.
So in the above case:
Output.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
How can I achieve this in a Unix shell script? Please suggest.
I agree with the comment of @Marc that you should try something before asking here.
However, the following constructions are difficult to find when you have never seen them, so I'll give you something to study.
When you want to parse line by line, you can start with
while IFS=: read -r file path mode; do
comparewith=$(grep "^${file}:${path}:" File2.txt | cut -d: -f3)
# compare and output
done < File1.txt
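For study purposes, here is one way to fill in the # compare and output placeholder; it assumes you want File2.txt's version of the line appended to Output.txt whenever the entry exists in both files with different permissions:
while IFS=: read -r file path mode; do
    # permission recorded for the same name and path in File2.txt (empty if absent)
    comparewith=$(grep "^${file}:${path}:" File2.txt | cut -d: -f3)
    # write File2.txt's line only when the entry exists there with a different mode
    if [ -n "$comparewith" ] && [ "$comparewith" != "$mode" ]; then
        echo "${file}:${path}:${comparewith}" >> Output.txt
    fi
done < File1.txt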
For large files this approach will become very slow.
You can first filter the lines you want to compare from File2.txt.
You want to grep for strings like aaa:/path/to/aaa:, including the last :. With cut -d: -f1-2 you might be fine for your input file, but it may be safer to remove the last three characters:
sed 's/...$//' File1.txt
You can let grep read that output as a file of patterns, using process substitution with <():
grep -f <(sed 's/...$//' File1.txt) File2.txt
Your example files don't show the situation where both files contain an identical line (which you would want to skip); you will need another process substitution to handle that:
grep -v -f File1.txt <(grep -f <(sed 's/...$//' File1.txt ) File2.txt )
Another solution, worth trying yourself, is using awk (see What is "NR==FNR" in awk? for accessing 2 files).
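A sketch of that awk route, assuming the first two fields together act as the key and that you want File2.txt's line printed when the permissions disagree:
awk -F: 'FNR==NR {perm[$1 FS $2] = $3; next} ($1 FS $2) in perm && perm[$1 FS $2] != $3' File1.txt File2.txt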
comm - compare two sorted files line by line
According to the manual, comm -13 <file1> <file2> prints only the lines unique to <file2>:
$ ls
File1.txt File2.txt
$ cat File1.txt
aaa:/path/to/aaa:777
bbb:/path/to/bbb:700
ccc:/path/to/ccc:600
$ cat File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
$ comm -13 File1.txt File2.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
$ # Nice!
But it doesn't check for lines in <file1> that are "similar" to corresponding lines of <file2>. That is, it won't work as you want if File1.txt has the line BOB:/path/to/BOB:700 and File2.txt has BBB:/path/to/BBB:700, since it will print the latter (while you want it not to be printed).
It also won't do what you want if strings bbb:/path/to/bbb:700 and bbb:/another/path/to/bbb:700 are supposed to be "identical".
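One more caveat: comm requires its inputs to be sorted. The sample files happen to be sorted already; in general you would write:
comm -13 <(sort File1.txt) <(sort File2.txt)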

Using cut and grep commands in Unix

I have a file (file1.txt) with text as:
aaa,,,,,
aaa,10001781,,,,
aaa,10001782,,,,
bbb,10001783,,,,
My file2 contents are:
11111111
10001781
11111222
I need to search for the second field of file1 in file2 and delete the line from file1 if the pattern matches. So the output will be:
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Can I use grep and cut commands for this?
This prints lines from file1.txt only if the second field is not in file2:
$ awk -F, 'FNR==NR{a[$1]=1; next;} !a[$2]' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
How it works
This works by reading file2 and keeping track of all the values seen in an associative array a. Then, a line of file1.txt is printed only if its column 2 is not in a. In more detail:
FNR==NR{a[$1]=1; next;}
When reading file2, set a[$1] to 1 to signal that we have seen the value on this line. We then instruct awk to skip the rest of the commands and start over on the next line.
This section is only run for file2 because file2 is listed first on the command line and FNR==NR only when we are reading the first file listed on the command line. This is because FNR is the number of lines read from the current file and NR is the total number of lines read so far. These two are equal only for the first file.
!a[$2]
When reading file1.txt, a[$2] evaluates to true if column 2 was seen in file2. Since ! is negation, !a[$2] evaluates to true when column 2 was not seen. When this evaluates to true, the line is printed.
Alternative
This is the same logic, expressed in a slightly different style, as suggested in the comments by Tom Fenech:
$ awk -F, 'FNR==NR{a[$1]; next;} !($2 in a)' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Solution with grep
$ grep -vf file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
John1024's awk solution would be faster for large files, though.
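Also note that grep -vf matches the patterns anywhere on the line, not just in the second field. If you want it to match whole second fields only, one hedged refinement is to wrap each pattern in the comma delimiters before handing it to grep:
$ grep -vFf <(sed 's/.*/,&,/' file2) file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,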

Difference between two files

I have two files. I want to compare them, but the order of the rows is not the same in both files.
Can you please suggest the simplest method to compare the two files?
Example:
file1
My name is sumit.
My surname is vedi.
I like shell scripting.
file2
My surname is vedi.
My name is sumit.
I like shell scripting.
The difference between the files should be zero; however, the order of the rows is not the same.
Note: the files are huge.
The command below will probably do the trick:
diff <(sort file1) <(sort file2)
If the files are huge and you want to avoid a sort, you could use awk:
awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
The above command will only give the lines that are present in file2 but not in file1.
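To also get the lines present in file1 but not in file2, run it again with the file arguments swapped:
awk 'FNR==NR{a[$0];next}!($0 in a)' file2 file1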
