Comparing/finding the difference between two text files using findstr


I need to compare two text files and find the differences between them. I have an input file (input.txt) that is processed by a batch job, and the batch logs to an output file (successful.txt) the entries for which the job ran successfully.
In simple words, I need the difference between input.txt and successful.txt (input.txt minus successful.txt), and I was thinking of using findstr. It seems to work, BUT there is one part I don't understand: it always includes the last line of my input.txt in the output. Please note that there is no trailing space or line break after the last line of my input.txt.
In the example below, the line server1,db1 is present in both files but is still listed in the output. (It is always the last line of input.txt.)
D:\Scripts\dummy>type input.txt
server2,db2
server3,db3
server10,db10
server4,db4
server1,db11
server10,schema11
host1,sch2
host11,sql2
host11,sql3
server1,db1
D:\Scripts\dummy>type successful.txt
server1,db1
server2,db2
server3,db3
server4,db4
server10,db10
host1,sch2
host11,sql2
host11,sql3
D:\Scripts\dummy>findstr /vixg:successful.txt input.txt
server1,db11
server10,schema11
server1,db1
What am I doing wrong?
Cheers,
G

I could reproduce your results by removing the newline after the last line of input.txt. findstr is behaving as expected here, because it acts on newline-terminated lines. Since you say input.txt has no terminating newline, solution 1 is to add one to the end of input.txt.
Solution 2 would be to pipe the file into findstr:
type input.txt|findstr /vixg:successful.txt
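
For comparison only, since findstr is Windows-specific: on a Unix-like system the same difference (lines of input.txt that are not in successful.txt) can be sketched with grep. This is a minimal sketch and not a fix for findstr. It recreates the two files from the question in a scratch directory (the names demo_dir and pending are only illustrative) and deliberately writes input.txt without a final newline.

```shell
# Recreate the question's files in a scratch directory.
demo_dir=$(mktemp -d)

# All lines except the last one are newline-terminated ...
printf '%s\n' server2,db2 server3,db3 server10,db10 server4,db4 \
    server1,db11 server10,schema11 host1,sch2 host11,sql2 host11,sql3 \
    > "$demo_dir/input.txt"
# ... and the last line is appended without a trailing newline.
printf '%s' server1,db1 >> "$demo_dir/input.txt"

printf '%s\n' server1,db1 server2,db2 server3,db3 server4,db4 \
    server10,db10 host1,sch2 host11,sql2 host11,sql3 \
    > "$demo_dir/successful.txt"

# -v invert the match, -i ignore case, -x match whole lines only,
# -F treat patterns as fixed strings, -f read patterns from a file.
pending=$(grep -vixFf "$demo_dir/successful.txt" "$demo_dir/input.txt")
printf '%s\n' "$pending"

rm -rf "$demo_dir"
```

With GNU grep this prints only server1,db11 and server10,schema11; the unterminated last line is still compared as a whole line.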

Related

How to search for special shell characters (in Linux) from one massive file in another without changing them

I've got two massive files with millions of lines.
In the first file1 one of the lines is
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7
and in the 2nd file2 there are many lines containing the above one, e.g.,
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7.X5t,&&***b,ccc
I want to search for the lines from file1 in file2, and I face two problems:
1. The search itself clashes with special characters in any shell (sh, bash, csh, ...):
!F,k"H,#$5,#%J,$: event not found
I also tried egrep, awk, ack, ... with the same result.
How can I get around that? The nature of the strings to be searched does not allow me to treat them in any obvious way. E.g., I do not see how I can substitute something for, say, "!": if I introduce "\!", that would clash with "\!", which is also a string in file1 and file2. Note that all printable ASCII characters appear in all combinations in file1 and file2.
What I would apparently need is a shell (perhaps a virtual one) which has no special characters. Is there such a Unix shell?
2. How do I take the lines from file1 one by one, search for each in file2, and extract the matching lines from file2 into file3?

I solved the problem in the following way.
All shells and the search tools in them, as well as most editors (like vi, vim), have special characters built in. But not Emacs.
I used an Emacs keyboard macro as follows:
1. Split the Emacs window into 3 sub-windows, one above another. Put file1 in the top one, file2 in the middle one, and the output file (file3) in the bottom one.
2. Start the macro with "C-x (" with the cursor at the beginning of file1.
3. Copy the line. Go to the beginning of the next line.
4. Go to file2: C-x o. Search for the copied line. Copy the first line found that contains the line from file1. Go to the beginning of file2.
5. Go to file3. Paste the line from file2. Go to the next line.
6. Go to file1. Close the macro with "C-x )".
7. Repeat the macro as many times as there are remaining lines (say n) in file1: M-n C-x e (M = Esc).
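
As an alternative to the macro (not part of the answer above), here is a minimal sketch that assumes grep is available. With -F the patterns are treated as fixed strings instead of regular expressions, and with -f they are read from a file, so the lines of file1 never pass through the shell's parser. The sample lines are the ones from the question; the scratch directory and the variable name are only for illustration.

```shell
scratch=$(mktemp -d)

# A quoted heredoc delimiter ('EOF') stops the shell from expanding
# or interpreting anything inside the heredoc body.
cat > "$scratch/file1" <<'EOF'
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7
EOF
cat > "$scratch/file2" <<'EOF'
unrelated,line
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7.X5t,&&***b,ccc
EOF

# -F: fixed strings, no regex metacharacters.
# -f: take one pattern from each line of file1.
grep -F -f "$scratch/file1" "$scratch/file2" > "$scratch/file3"
cat "$scratch/file3"
```

Note that this writes every line of file2 that contains any line of file1, whereas the macro copies only the first hit for each line of file1.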

Appending a count to a code in multiple files and saving the result

I'm looking for a bit of help here. I'm a complete newbie!
I need to look in a file for a code matching the pattern A00000_00_A and append a count to it, so the first time it appears it is replaced with A00000_00_A_001, second time A00000_00_A_002 etc. The output needs to be written back to the same file. Each file only contains 1 code, but it appears multiple times.
After some digging I found:
perl -pi -e 's/Q\d{4,5}'_'\d{2}_./$&.'_'.++$A /ge' /users/documents/*.xml
but the issue is that the counter does not reset for each file.
That is, the output of the first file is, say, Q00390_01_A_1 to Q00390_01_A_7, while the second file is Q00391_01_A_8 to Q00391_01_A_10.
What I want is Q00390_01_A_1 to Q00390_01_A_7 in the first file and Q00391_01_A_1 to Q00391_01_A_3 in the second.
Does anyone have any idea on how to edit the above code to make it do that? I'm a total newbie so ideally an edit to what I have would be brilliant. Thanks

cd /users/documents/
for f in *.xml; do
    perl -pi -e 's/facs=.(Q|M)\d{4,5}_\d{2}_\w/$&."_".sprintf("%04d",++$A)/ge' "$f"
done
This matches the string facs= and any character, then "Q" or "M" followed by either four or five digits, then an underscore, then two digits, another underscore, and a word character. The entire match is then concatenated with an underscore and the value of $A, zero-padded to four digits. Because perl is started once per file inside the loop, $A starts from scratch for each file, so the count restarts at 0001.

grep listing false duplicates

I have the following data containing a subset of record numbers, formatted like so:
>head pilot.dat
AnalogPoint,206407
AnalogPoint,2584
AnalogPoint,206292
AnalogPoint,206278
AnalogPoint,206409
AnalogPoint,206410
AnalogPoint,206254
AnalogPoint,206266
AnalogPoint,206408
AnalogPoint,206284
I want to compare the list of entries to another subset file called "disps.dat" to find duplicates, which is formatted in the same way:
>head disps.dat
StatusPoint,280264
StatusPoint,280266
StatusPoint,280267
StatusPoint,280268
StatusPoint,280269
StatusPoint,280335
StatusPoint,280336
StatusPoint,280334
StatusPoint,280124
I used the command:
grep -f pilot.dat disps.dat > duplicate.dat
However, the output file "duplicate.dat" is listing records that exist in the second file "disps.dat", but do not exist in the first file.
(Note: both files are big, so the samples shown above don't contain duplicates, but I expect, and have confirmed, at least 10-12k duplicates in total.)
> head duplicate.dat
AnalogPoint,208106
AnalogPoint,208107
StatusPoint,1235220
AnalogPoint,217270
AnalogPoint,217271
AnalogPoint,217272
AnalogPoint,217273
AnalogPoint,217274
AnalogPoint,217275
AnalogPoint,217277
> grep "AnalogPoint,208106" pilot.dat
>
I tested the above command with a smaller sample of data (10 records), also formatted the same way, and the results were fine, so I'm a little confused about why it fails on the larger run.
I also tried feeding the patterns in as fixed strings with -F, thinking that the comma might be the source of the issue. Right now I am feeding the data through a 'for' loop and echoing each line, which runs very, very slowly, but at least it will help me rule out the regex possibility.

The -x or -w option is needed to get an exact match.
-x matches only whole lines, and -w requires the match to be bounded by non-word characters (or the start/end of the line), which works in my case because it rejects matches that are followed by further digits.
The issue is that a record in the first file such as:
"AnalogPoint,1"
would end up flagging records in the second file like:
"AnalogPoint,10"
"AnalogPoint,123"
"AnalogPoint,100200"
and so on.
Thanks to @Barmar for pointing out my issue.
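
The effect can be reproduced on a tiny sample. This is a minimal sketch: the file names match the question, but their contents here are made up from the records quoted in the answer, and the variable names are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'AnalogPoint,1' > "$tmp/pilot.dat"
printf '%s\n' 'AnalogPoint,10' 'AnalogPoint,1' 'AnalogPoint,123' \
    'AnalogPoint,100200' > "$tmp/disps.dat"

# Plain -f: a line matches as soon as it contains the pattern anywhere.
loose=$(grep -f "$tmp/pilot.dat" "$tmp/disps.dat")

# -F: fixed strings; -x: the pattern must match the whole line.
exact=$(grep -Fxf "$tmp/pilot.dat" "$tmp/disps.dat")

printf 'without -x: %s lines\n' "$(printf '%s\n' "$loose" | wc -l)"
printf 'with -x: %s\n' "$exact"

rm -rf "$tmp"
```

Without -x all four records are reported; with -x only the true duplicate AnalogPoint,1 remains.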

Replacing Middle Part of String Occurring Multiple Times

I have a file, that has variations of this line multiple times:
source = "git::https://github.com/ORGNAME/REPONAME.git?ref=develop"
I am passing through a tag name in a variable. I want to find every line that starts with source and update that line in the file to be
source = "git::https://github.com/ORGNAME/REPONAME.git?ref=$TAG"
This should be doable with awk or sed, but I am having some difficulty making it work. Any help would be much appreciated!
Best,
Keren
Edit: In this scenario it says "develop", but it could also be set to "feature/test1" or "0.0.1".
Edit2: The line with "source" is also indented by three or four spaces.

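This should do it (| is used as the s delimiter so that a tag containing a slash, such as feature/test1, does not end the expression early):
sed 's|^\([[:blank:]]*source.*[?]ref=\)[^"]*\("\)|\1'"$TAG"'\2|' file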

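With sed:
$ sed '/^[[:blank:]]*source/s|ref=develop"$|ref='"$TAG"'"|' file
This replaces ref=develop" at the end of the line with ref=, the value of $TAG and a closing quote, on lines that start with source (optionally indented). The variable is spliced in outside the single quotes so that the shell expands it; inside single quotes, $TAG would be written out literally. Note that this only handles lines whose current ref is develop.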
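
A small worked run, as a sketch: the file config.txt and its contents are hypothetical, modelled on the line in the question, and TAG is set to one of the values mentioned in the edit. | is used as the s delimiter so that the slash in the tag is not read as the end of the expression.

```shell
TAG='feature/test1'
work=$(mktemp -d)

# Hypothetical input: two indented source lines with different current refs.
cat > "$work/config.txt" <<'EOF'
module "a" {
   source = "git::https://github.com/ORGNAME/REPONAME.git?ref=develop"
}
module "b" {
    source = "git::https://github.com/ORGNAME/REPONAME.git?ref=0.0.1"
}
EOF

# \(...\) captures everything up to and including "?ref=";
# [^"]*" matches the old ref and its closing quote, whatever the ref is.
updated=$(sed 's|^\([[:blank:]]*source.*[?]ref=\)[^"]*"|\1'"$TAG"'"|' "$work/config.txt")
printf '%s\n' "$updated"

rm -rf "$work"
```

Both source lines end in ref=feature/test1" afterwards, and the other lines are left alone. A tag containing |, & or a backslash would still need escaping before being spliced into the sed expression.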

Unexpected command line behavior using commas in Windows/DOS batch file

Today, I wanted to test if filenames can contain commas and stumbled upon something else while opening cmd and trying these three tests:
echo a,b>a
This works as expected (it writes a,b to the file named a).
echo a>a,b
Does just the same! What happens here gets a bit clearer with the third test:
echo a>file,b this is a test
This will create a file named file containing a,b this is a test.
Now, three questions arise for me:
What is the explanation for this? If someone asked me, I would've guessed the comma separates commands or filenames, e.g. I would've expected the second test to create two files named a and b.
Is this behaviour documented somewhere?
Is it a cmd specific Windows extension or has it been like this since good old DOS times?

It's expected behaviour, as , ; = <space> and <tab> are delimiters for parameters.
If you put the commands into a batch file without echo OFF, you will see how cmd parses them:
test.bat
echo a,b>a
echo a>a,b
echo a>file,b this is a test
Output
C:\temp>test.bat
C:\temp>echo a,b 1>a
C:\temp>echo a,b 1>a
C:\temp>echo a,b this is a test 1>file
After a redirection, only the next token is relevant, the rest is part of the normal line content.
It's unimportant where the redirection occurs in a line.
But there is the rule that when more than one redirection exists for the same stream, the last one will win.
> file.txt echo hello> nul world > con
This will result in hello world at the console.
Btw. There is still an obscure behaviour with redirection and lines extended by carets (multilines).
echo one two three^
four
Result: one two three four
But
echo one two >con three^
four
Result: one two four

The comma is a standard delimiter in batch, as are ; = <space> and <tab>. Only the first token after > is taken as the redirection target; everything after the comma is passed to echo as further parameters. If you enclose a,b in quotes (echo a>"a,b"), the output goes to a file named a,b. You can also escape the delimiter with ^: echo a>a^,b
You can also try echo a>a=b; it behaves the same way.
