Find Difference in Files but only show lines that don't exist in the other - string-comparison

I have 2 files that I need to compare, but each file has many of the same entries on different lines. How can I compare 2 LARGE files (many GBs) and see only the lines that don't show up in the other?
Example:
File1:
Line 1: Hello
Line 2: Goodbye
Line 3: Cake
File2:
Line 1: Cake
Line 2: Hello
Line 3: Test
Line 4: Goodbye
Only Line 3 of File2 (Test) would be displayed, because it does not appear in File1; nothing else would be.
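A common way to handle files too large for memory is to sort both files first and then walk them in step, which is the same idea as comm -3 on sorted input. The Python sketch below assumes both inputs are already sorted in the same order Python uses for string comparison (for example with LC_ALL=C sort, whose byte order agrees with Python's ordering for ASCII or UTF-8 text). The function name unique_lines is hypothetical, not part of any library.

```python
from typing import Iterable, Iterator, Optional, Tuple


def unique_lines(sorted_a: Iterable[str], sorted_b: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Merge two sorted line streams and yield only the lines that are not shared.

    Yields (1, line) for lines found only in the first input and (2, line) for
    lines found only in the second. ASSUMPTION: both inputs are already sorted
    in Python's str order. Only one line per input is held in memory at a time.
    Duplicates are matched one-for-one, so an extra copy counts as unique.
    """
    it_a, it_b = iter(sorted_a), iter(sorted_b)
    a: Optional[str] = next(it_a, None)
    b: Optional[str] = next(it_b, None)
    while a is not None and b is not None:
        if a == b:   # present in both inputs: skip it in both
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:  # a sorts first, so the rest of b cannot contain it
            yield (1, a)
            a = next(it_a, None)
        else:        # b sorts first, so the rest of a cannot contain it
            yield (2, b)
            b = next(it_b, None)
    # Whatever remains in either stream has no counterpart in the other.
    while a is not None:
        yield (1, a)
        a = next(it_a, None)
    while b is not None:
        yield (2, b)
        b = next(it_b, None)
```

For the example above, unique_lines(["Cake", "Goodbye", "Hello"], ["Cake", "Goodbye", "Hello", "Test"]) yields only (2, "Test"). With real files, open the two pre-sorted files, strip the trailing newline from each line (so a final line without a newline still compares equal), and pass the two generators in.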

Related

How do I delete all lines from a file after (and including) a line that contains a defined string in a Bash script?

I'm hacking about a text file in the middle of a Bash script (on an RPI3B+ with OSMC installed) and trying to crop the file at the first line that contains the text "BLAH DE BLAH", deleting that line and everything after it in the same file.
For example (in the file filename.text):
This is the first line
This is the second line
This is the third line containing "BLAH DE BLAH"
This is the fourth line
This is the fifth line
Required output (in the file filename.text):
This is the first line
This is the second line
I've tried to investigate awk- and sed-related posts, but I'm finding it all so confusing, as I can't find anything that does exactly what I need (some split at certain line numbers, some run from the command line rather than a Bash script, some work before and after certain strings)... and I'm stuck. As you can see, I can't even work out how to format this post properly (my head hurts so much)!
Any help appreciated - thanks!
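With GNU sed,
sed -i '/BLAH DE BLAH/Q' filename.text
should do the job: Q quits without printing the current line, so the matching line and everything after it are dropped, and -i writes the result back to the same file.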
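For comparison, the same crop can be sketched in Python. This is a minimal sketch, assuming the file is small enough to read into memory; the helper names lines_before_marker and crop_file_at_marker are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def lines_before_marker(lines: Iterable[str], marker: str) -> List[str]:
    """Keep lines up to, but not including, the first line containing marker."""
    kept: List[str] = []
    for line in lines:
        if marker in line:  # first match: drop it and everything after it
            break
        kept.append(line)
    return kept


def crop_file_at_marker(path: str, marker: str) -> None:
    """Rewrite the file in place with only the lines before the marker line.

    ASSUMPTION: the whole file fits in memory (fine for a small text file).
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    p.write_text("".join(lines_before_marker(lines, marker)))
```

Calling crop_file_at_marker("filename.text", "BLAH DE BLAH") would leave only the first two lines of the example file. If the marker never appears, the file is rewritten unchanged.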

Modify a line below a specific line

I have a big file like this small example:
>ENSG00000002587|ENST00000002596
ATGGCCGCGCTGCTCCTGGGCGCGGTGCTGCTGGTGGCCCAGCCCCAGCTAGTGCCTTCC
>ENSG00000004059|ENST00000000233
ATGGGCCTCACCGTGTCCGCGCTCTTTTCGCGGATCTTCGGGAAGAAGCAGATGCGGATT
>ENSG00000003249|ENST00000002501
ATGGAGCCCCCGGAGGGCGCCGGCACCGGAGAGATCGTTAAGGAGGCTGAGGTGCCGCAG
GCTGCGCTGGGCGTCCCAGCCCAGGGGACAGGGGACAATGGCCACACGCCTGTGGAGGAG
>ENSG00000048028|ENST00000003302
ATGACTGCGGAGCTGCAGCAGGACGACGCGGCCGGCGCGGCAGACGGCCACGGCTCGAGC
TGCCAAATGCTGTTAAATCAACTGAGAGAAATCACAGGCATTCAGGACCCTTCCTTTCTC
CATGAAGCTCTGAAGGCCAGTAATGGTGACATTACTCAGGCAGTCAGCCTTCTCACTGAT
I want to remove the first 5 characters of every line that comes directly below a line starting with >.
I do not know how to do that on the command line. Does anyone know how?
Here is the expected output:
>ENSG00000002587|ENST00000002596
CGCGCTGCTCCTGGGCGCGGTGCTGCTGGTGGCCCAGCCCCAGCTAGTGCCTTCC
>ENSG00000004059|ENST00000000233
CCTCACCGTGTCCGCGCTCTTTTCGCGGATCTTCGGGAAGAAGCAGATGCGGATT
>ENSG00000003249|ENST00000002501
GCCCCCGGAGGGCGCCGGCACCGGAGAGATCGTTAAGGAGGCTGAGGTGCCGCAG
GCTGCGCTGGGCGTCCCAGCCCAGGGGACAGGGGACAATGGCCACACGCCTGTGGAGGAG
>ENSG00000048028|ENST00000003302
TGCGGAGCTGCAGCAGGACGACGCGGCCGGCGCGGCAGACGGCCACGGCTCGAGC
TGCCAAATGCTGTTAAATCAACTGAGAGAAATCACAGGCATTCAGGACCCTTCCTTTCTC
CATGAAGCTCTGAAGGCCAGTAATGGTGACATTACTCAGGCAGTCAGCCTTCTCACTGAT
sed -E '/^>/{N;s/\n.{5}/\n/}' file
find line starting with >
join that line with next
replace newline and five chars with just newline
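The same rule can be written as a small Python sketch, assuming the input is a list of lines like the example above. The function name trim_after_header and its n parameter are hypothetical.

```python
from typing import Iterable, List


def trim_after_header(lines: Iterable[str], n: int = 5) -> List[str]:
    """Drop the first n characters of each line that directly follows a '>' header.

    Only the first sequence line after a header is trimmed; continuation lines
    and the header lines themselves are copied unchanged.
    """
    out: List[str] = []
    prev_was_header = False
    for line in lines:
        is_header = line.startswith(">")
        if prev_was_header and not is_header:
            line = line[n:]  # remove the first n characters
        out.append(line)
        prev_was_header = is_header
    return out
```

One design difference from the sed one-liner: this version never joins lines, so if two header lines ever appear back to back, the second header is left intact instead of losing its first five characters.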

How to comment/edit multiple lines in vim/vi by using disjoint line numbers [duplicate]

I can use
:5,12s/foo/bar/g
to search for foo and replace it with bar between lines 5 and 12. How can I do that only on lines 5 and 12 (and not on the lines in between)?
Vim has special regular expression atoms that match in certain lines, columns, etc.; you can use them (possibly in addition to the range) to limit the matches:
:5,12s/\(\%5l\|\%12l\)foo/bar/g
See :help /\%l
You can do the substitution on line 5 and repeat it with minimal effort on line 12:
:5s/foo/bar
:12&
As pointed out by Ingo, :& forgets your flags. Since you are using /g, the correct command would be :&&:
:5s/foo/bar/g
:12&&
See :help :& and friends.
You could always add a c to the end. This will ask for confirmation for each and every match.
:5,12s/foo/bar/gc
Interesting question. Seems like there's only range selection and no multiple line selection:
http://vim.wikia.com/wiki/Ranges
However, if you have something special on lines 5 and 12, you could use the :g command. If your file looks like this (numbers only for reference):
1 line one
2 line one
3 line one
4 line one
5 enil one
6 line one
7 line one
8 line one
9 line one
10 line one
11 line one
12 enil one
And you want to replace one by eno on the lines where there's enil instead of line:
:g/enil/s/one/eno/
You could use ed, a line-oriented text editor with commands similar to those of vi and vim; it predates both.
In a script (using a here document, which feeds input to ed until the EndCommands marker), it would look like:
ed file <<EndCommands
5
s/foo/bar/g
12
s/foo/bar/g
wq
EndCommands
Obviously, the ed commands can be used on the command line also.
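Outside an editor, the same idea (substitute only on an explicit set of line numbers, leaving the lines in between alone) can be sketched in Python. This is an illustration only, assuming the text is already split into lines; substitute_on_lines is a hypothetical name, and Python's re syntax differs from Vim's (for example, alternation is | rather than \|).

```python
import re
from typing import Iterable, List, Set


def substitute_on_lines(lines: Iterable[str], targets: Set[int], pattern: str, repl: str) -> List[str]:
    """Replace every match of pattern with repl, but only on the 1-based line
    numbers in targets (the effect of :5s/foo/bar/g followed by :12&&).
    All other lines are returned unchanged."""
    return [
        re.sub(pattern, repl, line) if number in targets else line
        for number, line in enumerate(lines, start=1)
    ]
```

For example, substitute_on_lines(lines, {5, 12}, "foo", "bar") changes every foo to bar on lines 5 and 12 only.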

Echo string to .txt file with multiple lines - with Windows Batch file

I am attempting to create a Windows batch file that creates a .txt file with multiple lines. I've tried several solutions to insert a line break in the string, but to no avail. There are other similar questions and answers, but none of them address putting the entire string into a text file.
My batch file currently reads:
echo Here is my first line
Here is my second line > myNewTextFile.txt
pause
My goal is to have the text file read:
Here is my first line
Here is my second line
Obviously, this does not work as written. Does anyone know a simple way to make this happen?
(
echo Here is my first line
echo Here is my second line
echo Here is my third line
)>"myNewTextFile.txt"
pause
Just repeat the echo and >> for lines after the first. >> means that it should append to a file instead of creating a new file (or overwriting an existing file):
echo Here is my first line > myNewTextFile.txt
echo Here is my second line >> myNewTextFile.txt
echo Here is my third line >> myNewTextFile.txt
pause
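As a rough cross-check of what > and >> do, here is a Python analogy in which open mode "w" plays the role of > (create the file, or truncate an existing one) and mode "a" plays the role of >> (append, creating the file if needed). This is an analogy only, not how cmd.exe works internally; the function names and the temporary path are hypothetical.

```python
import os
import tempfile


def redirect_overwrite(path: str, line: str) -> None:
    """Rough analogue of: echo line > path (create or truncate, then write)."""
    with open(path, "w") as f:
        f.write(line + "\n")


def redirect_append(path: str, line: str) -> None:
    """Rough analogue of: echo line >> path (append, creating the file if needed)."""
    with open(path, "a") as f:
        f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical location: a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "myNewTextFile.txt")
    redirect_overwrite(path, "Here is my first line")
    redirect_append(path, "Here is my second line")
    redirect_append(path, "Here is my third line")
    with open(path) as f:
        print(f.read(), end="")
```

The order matters the same way it does in the batch file: using the overwrite form for a later line would discard everything written before it.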
While searching for something else, I stumbled on this now rather old question, and I think an additional little trick is worth mentioning.
All of the solutions above have problems with empty lines and with lines that start with an option of the echo command itself. Compare the output files in these examples:
call :data1 >file1.txt
call :data2 >file2.txt
exit /b
:data1
echo Next line is empty
echo
echo /? this line starts with /?
echo Last line
exit /b
:data2
echo:Next line is empty
echo:
echo:/? this line starts with /?
echo:Last line
exit /b
Now, file1.txt contains:
Next line is empty
ECHO is off.
Displays messages, or turns command-echoing on or off.
ECHO [ON | OFF]
ECHO [message]
Type ECHO without parameters to display the current echo setting.
Last line
While file2.txt contains:
Next line is empty
/? this line starts with /?
Last line
The use of echo: miraculously solves the issues with the output in file1.txt.
Besides the colon, there are other characters that you could 'paste' to echo, among them a dot, a slash, ... Try for yourself.
STEP 1: Enter Line 1 followed by the ^ character.
echo Here is my first line^
STEP 2: Hit RETURN key to get a prompt for more text
echo Here is my first line^
More?
STEP 3: Hit RETURN key once more to get a second prompt for more text
echo Here is my first line^
More?
More?
STEP 4: Continue line 2 from the second prompt
echo Here is my first line^
More?
More? Here is my second line
STEP 5: Hit the RETURN key to get 2 statements displayed on two separate lines
Results:
echo Here is my first line^
More?
More? Here is my second line
Here is my first line
Here is my second line
NOTE
However, if you wish to save this to a file, you can add a final step.
STEP 6: Add the > character followed by the filename to save the output to a file instead:
echo Here is my first line^
More?
More? Here is my second line >"myNewTextFile.txt"
Example from CMD

Going to a specific line number using Less in Unix

I have a file with around a million lines. I need to go to line number 320123 to check the data. How do I do that?
With n being the line number:
ng: Jump to line number n. Default is the start of the file.
nG: Jump to line number n. Default is the end of the file.
So to go to line number 320123, you would type 320123g.
Copy-pasted straight from Wikipedia.
To open at a specific line straight from the command line, use:
less +320123 filename
If you want to see the line numbers too:
less +320123 -N filename
You can also choose to display a specific line of the file at a specific line of the terminal, for when you need a few lines of context. For example, this will open the file with line 320123 on the 10th line of the terminal:
less +320123 -j 10 filename
You can use sed for this too -
sed -n '320123'p filename
This will print line number 320123.
If you want a range then you can do -
sed -n '320123,320150'p filename
If you want from a particular line to the very end then -
sed -n '320123,$'p filename
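The same line selection can be sketched in Python with itertools.islice. One design difference: without a q command, sed -n '320123'p keeps reading until the end of the file, while islice stops as soon as the last requested line has been read. The function name line_range is hypothetical.

```python
from itertools import islice
from typing import Iterable, List, Optional


def line_range(lines: Iterable[str], start: int, end: Optional[int] = None) -> List[str]:
    """Return lines start..end (1-based, inclusive), like sed -n 'start,end'p.

    end=None means "through the last line", like sed -n 'start,$'p.
    Reading stops once line `end` has been collected, so the rest of a
    large file is never read.
    """
    return list(islice(lines, start - 1, end))
```

With a real file, pass the open file object directly, e.g. line_range(f, 320123, 320150) inside a with open("filename") as f: block, and join the result to print it.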
From within less:
type a line number followed by g (or G) to jump to that line
Used alone, g and G take you to the first and last line of the file respectively; used with a number, they are equivalent.
For example, to go to line 320123 of a file, type 320123 and then press g.
Additionally, you can type -N inside less to toggle line numbers on and off. In fact, you can change most command-line options from inside the program this way, such as -j or -N.
NOTE: You can also provide the line number on the command line when starting less (less +number -N), which will be much faster than doing it from inside the program:
less +12345 -N /var/log/hugelogfile
This will open a file displaying the line numbers and starting at line 12345
Source: man 1 less and built-in help in less (less 418)
For editing, nano accepts a similar +n option on the command line, e.g.:
nano +16 file.txt
This opens file.txt at line 16.

Resources