Compare files SIDE by SIDE using win batch script - windows

I am doing the following, but have two issues with this (win 2003):
The output includes the first and last matching line :-(
The output is not side by side
Can these two issues be solved? Any suggestions? (I tried the /nnn and /lb options)
SCRIPT:
:bof
:: Switch off echo - the @ suppresses echoing of this line itself
@echo off
:: Clear screen
cls
:: Keep Locals contained to this batch file
setlocal
:: Set the two files and check that they exist
set fileA=X:\tst\pfsrc\alex.txt
set fileB=X:\tst\cbsrc\alex.txt
if not exist "%fileA%" echo %fileA% not found & goto :end
if not exist "%fileB%" echo %fileB% not found & goto :end
:: del .old file rename previous output file to .old
rem if exist X:\tst\cbsrc\resultPF1.old del X:\tst\cbsrc\resultPF1.old
rem if exist X:\tst\cbsrc\resultPF1.txt rename X:\tst\cbsrc\resultPF1.txt *.old
:: compare files
FC /c /l /n /w %fileA% %fileB%
:end
1st INPUT FILE:
new line
line1
line2
line3
line4
insert new line
and another new line
line5
line6
line7
and a line here
line8
line9
line10
what is this line?
line11
2nd INPUT FILE:
new line
alex
hart
was
here
line5
line6
line7
line8
line here
line9
line10
OUTPUT:
Comparing files X:\TST\PFSRC\alex.txt and X:\TST\CBSRC\ALEX.TXT
***** X:\TST\PFSRC\alex.txt
1: new line
2: line1
3: line2
4: line3
5: line4
6: insert new line
7: and another new line
8: line5
***** X:\TST\CBSRC\ALEX.TXT
1: new line
2: alex
3: hart
4: was
5: here
6: line5
*****
***** X:\TST\PFSRC\alex.txt
10: line7
11: and a line here
12: line8
13: line9
***** X:\TST\CBSRC\ALEX.TXT
8: line7
9: line8
10: line here
11: line9
*****
***** X:\TST\PFSRC\alex.txt
15: what is this line?
16: line11
***** X:\TST\CBSRC\ALEX.TXT
*****

The FC command always shows differences as sets of lines listed one after the other; there is no way to make it show the differences side by side. You may capture FC's output and process it in a Batch program so that the differences are displayed side by side, but be aware that such a program must recognize several particular cases in order to correctly show two sections side by side.
Some time ago I wrote such a program; I called it "FComp.bat" and you may download it here. For example:
C:\> FComp.bat /C /L /N /W 1stInputFile.txt 2ndInputFile.txt
Comparing files 1stInputFile.txt and 2NDINPUTFILE.TXT
============================== SECTION MODIFIED =============================
1: new line              | 1: new line
2: line1                 | 2: alex
3: line2                 | 3: hart
4: line3                 | 4: was
5: line4                 | 5: here
6: insert new line       | 6: line5
7: and another new line
8: line5
============================== SECTION MODIFIED =============================
10: line7                | 8: line7
11: and a line here      | 9: line8
12: line8                | 10: line here
13: line9                | 11: line9
OLD SECTION DELETED AT END OF FILE ===========================================
- 15: what is this line?
- 16: line11

Related

Print all the lines between two patterns in shell

I have a file which is the log of a script running in a daily cronjob. The log file looks like this:
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 20
Aug 21
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 21
The script writes the log starting with the date and ending with the date, with all the log lines in between.
Now when I try to get the logs for a single day using the command below -
sed -n '/Aug 19/,/Aug 19/p' filename
it displays the output as -
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
But if I try to get the logs for multiple dates, the logs of the last day are always missing.
Example - if I run the command
sed -n '/Aug 19/,/Aug 20/p' filename
the output looks like -
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
I have gone through this site and found some valuable input on a similar problem, but none of the solutions work for me. The links are Link 1
Link 2
The commands that I have tried are -
awk '/Aug 15/{a=1}/Aug 21/{print;a=0}a'
awk '/Aug 15/,/Aug 21/'
sed -n '/Aug 15/,/Aug 21/p'
grep -Pzo "(?s)(Aug 15(.*?)(Aug 21|\Z))"
but none of the commands gives me the logs for the last date; they all stop at the first matching timestamp of that date, as I have shown above.
I think you can use the following awk command to print the lines between Aug 19 and Aug 20:
awk '/Aug 19/||/Aug 20/{a++}a; a==4{a=0}' file
Brief explanation:
/Aug 19/||/Aug 20/: find the records that match Aug 19 or Aug 20
if the criterion is met, increment the counter with a++
if the counter a in front of the semicolon is greater than 0, the record is printed
final criterion: if a==4, reset a=0. Mind that this only works for the case in the example; if Aug 19 and Aug 20 occur more than 4 times in total, change the number 4 in the answer to match.
If you want to put the searched patterns into variables, modify the command as follows:
$ b="Aug 19"
$ c="Aug 20"
$ awk -v b="$b" -v c="$c" '$0 ~ c||$0 ~ b{a++}a; a==4{a=0}' file
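If the hard-coded a==4 bothers you, here is a rough alternative sketch (the names first, last and n are mine, not part of the original answer). It starts printing at the first line matching the first date and exits right after the second line matching the last date, relying on the fact that each date appears exactly twice, once as the opening and once as the closing marker:
$ awk -v first="Aug 19" -v last="Aug 20" '$0 ~ first && !f{f=1} f{print} f && $0 ~ last && ++n==2{exit}' file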
You may use multiple ranges by separating them with a semicolon; each /date/,/date/p range prints one day's block:
sed -n '/Aug 19/,/Aug 19/p;/Aug 20/,/Aug 20/p' filename
Could you please also try the following awk solution and let me know if it helps you.
awk '/Aug 19/||/Aug 20/{flag=1}; /Aug/ && (!/Aug 19/ && !/Aug 20/){flag=""} flag' Input_file
EDIT: Adding the output here as well so the OP can see the result. The idea is to set the flag on any Aug 19 or Aug 20 line, clear it on any other Aug line, and print only while the flag is set.
awk '/Aug 19/||/Aug 20/{flag=1}; /Aug/ && (!/Aug 19/ && !/Aug 20/){flag=""} flag' Input_file
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 20
The following approach is quite easy to understand conceptually...
Print all lines from Aug 19 onwards to end of file.
Reverse the order of the lines (with tac because tac is cat backwards).
Print all lines from Aug 21 onwards (in the reversed stream this keeps everything up to and including the last Aug 21 of the original).
Reverse the order of the lines back to the original order.
sed -ne '/Aug 19/,$p' filename | tac | sed -ne '/Aug 21/,$p' | tac
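Applied to the range the question actually asks about (Aug 19 through Aug 20), the same trick would presumably be, assuming the closing Aug 20 marker is the last Aug 20 line in the file:
sed -ne '/Aug 19/,$p' filename | tac | sed -ne '/Aug 20/,$p' | tac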

How to find duplicate lines in a file?

I have an input file with the following data:
line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3
I am trying to find all the duplicate lines. I tried
sort filename | uniq -c
but it does not seem to be working for me.
It gives me:
1 begin
1 end
1 line1
1 line1
1 line2
1 line3
1 line3
1 line5
1 line6
1 line7
1 line9
The question may seem a duplicate of Find duplicate lines in a file and count how many time each line was duplicated? but the nature of the input data is different.
Please suggest.
use this:
sort filename | uniq -d
man uniq
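For the sample input above this should print just line1 and line3. Note that your uniq -c output lists the same text twice with a count of 1 each, which suggests the lines are not byte-for-byte identical (trailing spaces or Windows line endings are common causes); stripping trailing whitespace first is a reasonable guess at a fix:
# strip trailing whitespace (including carriage returns), then report duplicate lines
sed 's/[[:space:]]*$//' filename | sort | uniq -d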
try
sort -u file
or
awk '!a[$0]++' file
You'll have to modify the standard de-dupe code just a tiny bit to account for this.
If you want a single unique copy of each duplicated line, it's very much the same idea:
{m,g}awk 'NF~ __[$_]++' FS='^$'
{m,g}awk '__[$_]++==!_'
If you want every copy of the duplicates printed, then whenever the condition yields true for the first time, print 2 copies of the line, and print each new match along the way (sketched below).
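A rough sketch of that idea in plain awk (my own illustration, not the compact form used above): when a line is seen for the second time it is printed twice, which covers its first occurrence, and every further repeat is printed once more.
awk 'seen[$0]++ { if (seen[$0] == 2) print; print }' file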
Usually it's waaaaaaaaay faster to first de-dupe, then sort, instead of the other way around.

awk or sed - delete specific lines

I've got this
line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
line11
line12
line13
line14
line15
I want this
line1
line3
line5
line6
line8
line10
line11
line13
line15
As you can see, the lines to be deleted alternate: x+2, then x+3, where x is the line number of the previously deleted line.
I tried this with awk, but it is not the right way:
awk '(NR)%2||(NR)%3' file > filev1
Any ideas why?
If I decipher your requirements correctly, then
awk 'NR % 5 != 2 && NR % 5 != 4' file
should do.
Based on Wintermute's logic :)
awk 'NR%5!~/^[24]$/' file
line1
line3
line5
line6
line8
line10
line11
line13
line15
or
awk 'NR%5~/^[013]$/' file
How it works:
We can see from your lines that the ones marked with * should be removed and the others kept.
line1
line2*
line3
line4*
line5
line6
line7*
line8
line9*
line10
line11
line12*
line13
line14*
line15
By grouping the data into blocks of 5 lines with NR%5,
we see that the lines to delete are numbers 2 and 4 in every block.
In NR%5!~/^[24]$/, the NR%5 part divides the data into blocks of 5,
and the /^[24]$/ part tells awk not to keep positions 2 or 4.
The ^ and $ are important so that lines like 12 or 47 are not deleted as well,
since 12 contains a 2. So we anchor the match: ^2$ and ^4$.
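If the grouping is hard to picture, a quick (hypothetical) way to see it is to print the remainder next to each line:
awk '{print NR, NR%5, $0}' file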
Using GNU sed, you can run the following command:
sed '2~5d;4~5d' test.txt
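The first~step address form is a GNU sed extension: 2~5 matches lines 2, 7, 12, ... and 4~5 matches lines 4, 9, 14, ... A quick way to check which lines an address selects is to print them instead of deleting them:
sed -n '2~5p;4~5p' test.txt
For the 15-line input above this prints exactly the lines marked with * (2, 4, 7, 9, 12 and 14).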

replacing word in shell script or sed

I am a newbie, but would like to create a script which does the following.
Suppose I have a file of the form
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
I would like to replace it in the form
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6
That is, at the start of each paragraph I would like to add the text \textbf{ and end that line with }. Is there a way to search for double end-of-lines? I am having trouble creating such a script with sed. Thank you!
Using awk you can write something like
$ awk '!f{ $0 = "\\textbf{"$0"}"; f++} 1; /^$/{f=0}' input
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6
What it does:
!f{ $0 = "\\textbf{"$0"}"; f++}
!f is true if the value of f is 0. For the first line, since f is not yet set, it evaluates to true. When it is true, awk performs the action part {}
$0 = "\\textbf{"$0"}" adds \textbf{ and } to the line
f++ increments the value of f so that later lines do not enter this action part until f is reset to zero
1 is always true. Since the action part is missing, awk performs the default action, which is to print the entire line
/^$/ The pattern matches an empty line
{f=0} If the line is empty, set f=0 so that the next line is modified by the first action part to include the changes
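The same program spread over several lines with comments, in case the one-liner is hard to read (the behaviour should be identical):
awk '
  !f   { $0 = "\\textbf{" $0 "}"; f++ }  # first line of the file or after a blank line: wrap it and raise the flag
  1                                      # always true: print the (possibly modified) line
  /^$/ { f = 0 }                         # blank line: the next non-blank line starts a new paragraph
' input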
An approach using sed
sed '/^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/};1{s/\(.*\)/\\textbf{\1}/}' my_file
Find all lines that contain only a newline character and append the next line to them: /^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/}
This marks the line below the blank line and modifies it.
Then find the first line in the file and do the same: 1{s/\(.*\)/\\textbf{\1}/}
Just use awk's paragraph mode (RS="" makes each blank-line-separated block one record, and FS="\n" makes each line of the block a separate field):
$ awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"} {$1="\\textbf{"$1"}"} 1' file
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6

Separate by blank lines in bash

I have an input like this:
Block 1:
line1
line2
line3
line4
Block 2:
line1
line2
Block 3:
line1
line2
line3
This is an example; is there an elegant way to print Block 2 and its lines only, without relying on their names? It would be like "separate the blocks by the blank lines and print the second block".
try this:
awk '!$0{i++;next;}i==1' yourFile
Considering performance, you can also add exit after the 2nd block has been processed:
awk '!$0{i++;next;}i==1;i>1{exit;}' yourFile
test:
kent$ cat t
Block 1:
line1
line2
line3
line4
Block 2:
line1
line2
Block 3:
line1
line2
line3
kent$ awk '!$0{i++;next;}i==1' t
Block 2:
line1
line2
kent$ awk '!$0{i++;next;}i==1;i>1{exit;}' t
Block 2:
line1
line2
Set the record separator to the empty string to separate on blank lines. To print the second block:
$ awk -v RS= 'NR==2{ print }'
(Note that this only separates on lines that do not contain any whitespace.
A line containing only white space is not considered a blank line.)
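Using the test file t from the earlier answer, this should give:
$ awk -v RS= 'NR==2{ print }' t
Block 2:
line1
line2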
