Shell scripts: diff command - shell

I have a small question about the diff command. I am comparing two ASCII files to check whether they differ and writing the result to a third ASCII file. My problem is that the order of the lines in the files shouldn't matter. For example, let's say we have:
file1.txt with
1
2
3
4
5
6
7
file2.txt with
1
3
2
4
so that when I run "diff" on them the output should just be:
5
6
7
i.e. the order of the lines in the two files shouldn't matter; it should just print whatever differs between the two files.

How about:
comm -3 <(sort file1.txt) <(sort file2.txt)

First sort both files, then pass the sorted files to the diff or comm command. There are many ways to do this.
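As a minimal sketch of the "sort first" idea, using the sample data from the question: if no line is repeated within a single file (an assumption, not something the question guarantees), the lines that belong to only one file are exactly those that occur once in the combined, sorted input. The output file name only_in_one.txt is made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' 1 2 3 4 5 6 7 > file1.txt
printf '%s\n' 1 3 2 4 > file2.txt

# Sort the two files together and keep the lines that occur exactly once.
# Assumption: no line is repeated within a single file.
sort file1.txt file2.txt | uniq -u > only_in_one.txt
cat only_in_one.txt
```

Unlike comm -3, this does not indent the lines that come from the second file, and it does not need process substitution.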

Related

How can I get output of 2 files with no-duplicates of lines [from any file]?

Given two files (each of which may contain duplicates) in the following format:
file1 (a file that contains only numbers), for example:
10
40
20
10
10
file2 (a file that contains only numbers), for example:
30
40
10
30
0
How can I print the contents of both files with the duplicates removed from each file separately?
For example, given the two files above, the output should be:
10
40
20
30
40
10
0
Note: the output may still contain duplicates (a number can appear at most twice, once from each file), but within each file's contribution the duplicates are removed.
How can I do this with sort, uniq, and cat in a single command?
That is, something like cat file1 file2 | sort | uniq (this command doesn't solve the problem, of course; it only illustrates what I mean by "a single command").
I'd be happy to hear your ideas on how to do this :)
If I understood the question correctly, this awk should do it while preserving the order:
awk 'FNR==1{delete a}!a[$0]++' file1 file2
If you don't need to preserve the order, it can be as simple as:
sort -u file1; sort -u file2
If you don't want to use a list (;), something like this is also an option:
cat <(sort -u file1) <(sort -u file2)
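For reference, here is a sketch of the order-preserving awk approach run against the sample files from the question. It swaps delete a for split("", seen), a long-standing idiom for emptying an array in awk implementations that do not accept delete on a whole array; the array name seen and the output file deduped.txt are made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' 10 40 20 10 10 > file1
printf '%s\n' 30 40 10 30 0 > file2

# split("", seen) empties the array at the first line of each file,
# so duplicates are tracked per file rather than across files.
awk 'FNR == 1 { split("", seen) } !seen[$0]++' file1 file2 > deduped.txt
cat deduped.txt
```

With the sample data this should reproduce the output listed in the question: 10, 40, 20 from file1, followed by 30, 40, 10, 0 from file2.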

Extract blocks of lines with sed

How would one use sed to extract n lines of a file every m lines?
Say my textfile looks like this:
myfile.dat:
1
2
3
4
5
6
7
8
9
10
Say that I want to extract blocks of three lines and then skip two lines, throughout the entire file, such that my output looks like this:
output.dat:
1
2
3
6
7
8
Any suggestions on how one could achieve this with sed?
Edit:
For my example I could just have used
sed -n 'p;n;p;n;p;n;n' myfile.dat > output.dat
or with GNU sed (not preferred due to portability)
sed '1~5b;2~5b;3~5b;d' myfile.dat > output.dat
However, I typically want to print blocks of 2450 lines from a file with 49 002 450 lines, such that my output file contains 247 450 lines.
This might work for you (GNU sed):
sed -n '1~5,+2p' file
Starting at line 1, print every fifth line and the two lines that follow it.
An alternative:
sed -n 'N;N;p;n;n' file
In your case the following would work. It uses a range pattern that starts on lines whose number modulo 5 is 1 and ends on lines whose number modulo 5 is 3:
awk 'NR%5==1, NR%5==3' myfile.dat
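For larger blocks, writing out the sed commands by hand stops being practical. One possible generalization, sketched here with awk rather than sed, takes the block length and the distance between block starts as variables. The names n and m are illustrative; for the large file in the question n would be 2450, while the value of m is not stated there and would have to be filled in.

```shell
# Sample data from the question.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > myfile.dat

# n = lines to keep per block, m = distance between block starts.
# (NR - 1) % m is the zero-based position of a line within its block.
awk -v n=3 -v m=5 '(NR - 1) % m < n' myfile.dat > output.dat
cat output.dat
```

With n=3 and m=5 this keeps lines 1 to 3 and 6 to 8, matching output.dat in the question.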

how to copy certain part of file to another file in bash script [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I am trying to do post-processing using a bash script. One of the steps is to copy the lines below a certain line of a file to another file. For example, my file result.txt looks like this:
6798
Atoms. Timestep: 0
1 -1.13977 -2.85824 -1.22655
1 -1.20925 -3.25439 -0.978641
1 -1.54565 -2.93301 -1.10555
1 -0.736167 -2.71201 -1.28063
1 -0.807178 -3.16856 -1.13489
1 -0.44354 -3.03183 -1.23103
1 -0.357782 -3.39866 -1.0277
1 -0.0431648 -3.05734 -1.23315
6798
Atoms. Timestep: 1
1 0.119814 -3.40483 -1.03653
1 0.382198 -3.03771 -1.23183
1 0.580509 -3.37434 -1.02215
1 0.818628 -3.00063 -1.21422
1 1.0244 -3.31469 -0.980957
1 1.27482 -2.97232 -1.15564
I want only the part below Atoms. Timestep: 1. It is very easy to do manually, but there are thousands of files like this, so I want a bash script to do it for me. I have tried several methods found via Google, but none of them quite fits my question. If you have experience with this, it would be nice if you posted your solution below. That may help others who have a similar problem.
Thank you in advance!
FINAL SOLUTION
My final solution, summarizing the answers below:
sed -n '/Atoms. Timestep: 1/, $ p' < result.txt > resultParsed.txt
sed -i '/Atoms. Timestep: 1/d' resultParsed.txt
The first command copies everything from the Atoms. Timestep: 1 line to the end of the file. The second command deletes that marker line, which isn't needed.
This copies everything in result.txt from the first match to the end of the file and saves it in resultParsed.txt:
sed -n '/Atoms. Timestep: 1/, $ p' < result.txt > resultParsed.txt
This copies the first match and the 6 lines after it from result.txt and saves them in resultParsed.txt:
sed -n '/Atoms. Timestep: 1/ , +6 p' < result.txt > resultParsed.txt
Environment: GNU sed version 4.1.5
Using awk
awk 'x+=/Atoms\. Timestep: 1/' file >newfile
This increments x when that string is found. From then on x is greater than zero, which awk treats as true, and the default action for a true pattern is to print the line. So all lines from the matching line onward are printed.
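If the marker line itself is not wanted, the extraction can also be done in one pass by deleting everything up to and including the first match. The sketch below uses an abbreviated copy of the sample data and assumes the marker line reads exactly Atoms. Timestep: 1 with no trailing whitespace; anchoring the pattern with ^ and $ keeps it from also matching lines such as Timestep: 10.

```shell
# Abbreviated sample data from the question.
cat > result.txt <<'EOF'
6798
Atoms. Timestep: 0
1 -1.13977 -2.85824 -1.22655
6798
Atoms. Timestep: 1
1 0.119814 -3.40483 -1.03653
1 0.382198 -3.03771 -1.23183
EOF

# Delete from line 1 through the first line that matches the marker,
# leaving only the lines below it.
sed '1,/^Atoms\. Timestep: 1$/d' result.txt > resultParsed.txt
cat resultParsed.txt
```

Note that with the address 1,/regex/ the pattern is only tested from line 2 onward, which is fine here because the first line of each file is the atom count.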

sed/awk - compare files, and return lines that have differences

I have two files
file1:
1
2
3
4
5
6
file2:
1
2a
3
4
5
6a
How can I script something that writes a third file listing the lines that differ, along with the line number and filename? I.e., since lines 2 and 6 differ, the output would be something like this, in a file named 'file3':
file3
file1;line 2;2
file2;line 2;2a
file1;line 6;6
file2;line 6;6a
Thank you as always!
You are probably looking for the diff command.
Type man diff in your shell for more details.
There is the diff command, as jkshah said. There is also the vimdiff tool.
If it's for occasional analysis rather than for use inside a script, vimdiff is very pleasant to use (since it runs in vim, you get vim commands such as regex search, as well as syntax highlighting). In other words, vimdiff gives you a better visual diff.
vimdiff file1 file2
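Neither diff nor vimdiff produces the exact format asked for in the question. As a rough sketch, an awk line-by-line comparison could generate it, assuming both files have the same number of lines (extra lines at the end of file1 would go unreported). The array name first is made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' 1 2 3 4 5 6 > file1
printf '%s\n' 1 2a 3 4 5 6a > file2

awk '
    # First file: remember each line under its line number.
    NR == FNR { first[FNR] = $0; next }
    # Second file: report pairs that differ. Appending "" forces a
    # string comparison, so 1 and 1.0 would count as different.
    ($0 "") != (first[FNR] "") {
        print ARGV[1] ";line " FNR ";" first[FNR]
        print FILENAME ";line " FNR ";" $0
    }
' file1 file2 > file3
cat file3
```

The file names in the output come from the command-line arguments, so the same program should work for other pairs of files.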

Shell line command Sorting command [duplicate]

This question already has answers here:
find difference between two text files with one item per line [duplicate]
(11 answers)
Closed 9 years ago.
I have a Masters.txt file (all records) and a New.txt file. I want to process New.txt against Masters.txt and output all the lines from New.txt that do not exist in Masters.txt.
I'm not sure whether this is something the sort -u command can do.
Sort both files first using sort, and then use the comm command to list the lines that exist only in New.txt and not in Masters.txt. Something like:
sort Masters.txt >masters_sorted.txt
sort New.txt >new_sorted.txt
comm -2 -3 new_sorted.txt masters_sorted.txt
comm produces three columns in its output by default: column 1 contains lines unique to the first file, column 2 contains lines unique to the second file, and column 3 contains lines common to both files. The -2 -3 switches suppress the second and third columns.
See this tutorial on the Linux comm command:
http://unstableme.blogspot.com/2009/08/linux-comm-command-brief-tutorial.html
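If the original line order of New.txt should be kept, grep is a possible alternative that needs no sorting. In the sketch below, -F treats each line of Masters.txt as a fixed string, -x requires whole-line matches, -v inverts the match, and -f reads the patterns from a file. The file contents and the output name new_only.txt are made up for illustration.

```shell
# Hypothetical sample data.
printf '%s\n' alpha beta gamma > Masters.txt
printf '%s\n' delta beta epsilon alpha > New.txt

# Print the lines of New.txt that match no whole line of Masters.txt.
grep -Fxv -f Masters.txt New.txt > new_only.txt
cat new_only.txt
```

For very large master files the sorted comm approach may scale better, since grep has to hold every pattern in memory.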

Resources