Looking for fast file comparison - windows

My mission is to compare a list of hundreds of files in Windows. Each comparison is between a pair of files.
The files could be binaries or text files (all kinds).
I'm looking for the fastest way to do this. I only have to check whether the contents are the same (I don't care about the content itself; I just have to report == or !=).
What could be the fastest way to do so? fc.exe? something else?
If fc.exe is the answer, are there any parameters that should accelerate response time?
I'd prefer to use an EXE that's part of standard Windows installation (but it's not a must).
THANK YOU

I'm assuming you want to do a binary comparison.
I would use the following to compare two files:
fc "file1" "file2" /b >nul && echo "file1" == "file2" || echo "file1" != "file2"
EDIT
If you have many very large files to compare, it may be worthwhile to compare the file sizes before using FC to compare the entire files. I used an indicator variable (same) so that I could define the actions to take for "same" or "different" just once, without resorting to CALLed subroutines. A CALL is relatively slow.
set "same="
for %%A in ("file1") do for %%B in ("file2") do (
if %%~zA equ %%~zB fc %%A %%B /b >nul && set "same=1"
)
if defined same (
echo "file1" == "file2"
) else (
echo "file1" != "file2"
)
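For comparison, here is a minimal bash sketch of the same size-first idea, for anyone doing this from a Unix-style shell (for example under WSL). It assumes GNU coreutils, where stat -c %s prints a file's size in bytes, and uses cmp -s, which compares silently and stops at the first differing byte. The function name files_equal is hypothetical.

```bash
#!/usr/bin/env bash
# files_equal FILE1 FILE2
# Exit status: 0 if the contents are identical, 1 if they differ,
# 2 if a file cannot be read. Assumes GNU stat (BSD/macOS stat differs).
files_equal() {
  local size1 size2
  size1=$(stat -c %s -- "$1") || return 2
  size2=$(stat -c %s -- "$2") || return 2
  # Cheap check first: files of different sizes cannot be equal.
  (( size1 == size2 )) || return 1
  # Same size: byte-by-byte compare; -s suppresses output, the status says it all.
  cmp -s -- "$1" "$2"
}

# Example:
# if files_equal "file1" "file2"; then echo "file1 == file2"; else echo "file1 != file2"; fi
```

Checking sizes first costs one metadata lookup per file and avoids reading any pair whose lengths already differ.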

You can get a CRC hex string of each file using a third party command line tool and compare the hex strings.
Depending on how you are comparing the sets of files, this method means each file only needs to be read once, even if it takes part in several comparisons.
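A minimal bash sketch of that idea, assuming sha256sum from GNU coreutils in place of a CRC tool: each file's digest is computed at most once and cached in an associative array, so a file that appears in several pairs is read only once. The names hash_of, same_by_hash, HASH_CACHE and HASH_RESULT are hypothetical.

```bash
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays) and sha256sum from GNU coreutils.
declare -A HASH_CACHE=()   # file path -> SHA-256 hex digest

# hash_of FILE: put FILE's digest in the global HASH_RESULT, hashing the
# file only on first request. A global is used instead of printing the
# digest because $(hash_of ...) would run in a subshell and lose the cache.
hash_of() {
  if [[ -z ${HASH_CACHE[$1]+set} ]]; then
    local out
    out=$(sha256sum < "$1") || return 1   # stdin form prints "<digest>  -"
    HASH_CACHE[$1]=${out%% *}
  fi
  HASH_RESULT=${HASH_CACHE[$1]}
}

# same_by_hash FILE1 FILE2: 0 if the digests match, 1 if not, 2 on error.
same_by_hash() {
  local h1
  hash_of "$1" || return 2
  h1=$HASH_RESULT
  hash_of "$2" || return 2
  [[ $h1 == "$HASH_RESULT" ]]
}
```

Matching digests mean the files are equal with overwhelming probability rather than certainty; if that matters, confirm each match with a byte comparison.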

@foxidrive's answer has merit, especially when the files are big and on the same physical drive: unless the comparing software reads big chunks of each file at a time, reading the two files alternately causes disk thrashing.
Two utilities are available:
Built in (on Windows 10 at least):
C:\Windows\System32\certutil.exe
It has a slightly awkward (enormous) parameter set, but it works well, does many other useful conversions, and supports a large number of hash algorithms (sorry, I lost the link to the full list); it definitely works with SHA1, SHA256, SHA384 and MD5.
Ex. certutil -hashfile "E:\huge.7z" md5
and Microsoft's fciv utility (which generates MD5 by default):
Google for "Microsoft File Checksum Integrity Verifier" and download it from Microsoft.
Ex. fciv bigFile.7z
I know this is an old question, but I haven't seen these two utilities mentioned much... Hope it helps someone.

Related

Script to move all files starting with the same 7 letters in a different folder named after first 7 chars of its future content

All files are in a directory (over 500 000 files), named in the following pattern
AR00001_1
AR00001_2
AR00001_3
AR00002_1
AR00002_2
AR00002_3
I need a script, either batch or Unix shell, that takes everything starting with AR00001 and moves it into a new folder called AR00001, and does the same for the AR00002 files, etc.
Here's what I've tried so far:
for f in *_*; do
DIR="$( echo ${f%.*} | tr '_' '/')"
mkdir -p "./$DIR"
mv "$f" "$DIR"
done
Thanks
// Update
Ran this in the CMD
for %F in (c:\test\*) do (md "d:\destination\%~nF"&move "%F" "d:\destination\%~nF\") >nul
It seems to be almost what I wanted, except that it does not take the first 7 characters as a substring; instead it creates a folder for each file :/ I'm trying to combine it with your solutions.
@echo off
setlocal enabledelayedexpansion
for %%a in (???????_*) do (
set "x=%%a"
set "x=!x:~0,7!"
md "!x!" 2>nul
move "!x!*" "!x!\" 2>nul
)
for every matching file do:
- get the first 7 characters
- create a folder with that name (ignore the error message if it already exists)
- move all files that start with those 7 characters (ignore the error messages if the files don't exist because they were already moved)
The following achieves the desired effect and checks for non-existence of the target directory each time before creating it.
@echo off
setlocal ENABLEDELAYEDEXPANSION
set "TOBASE=c:\target\"
set "MATCHFILESPEC=AR*"
for %%F in ("%MATCHFILESPEC%") do (
set "FILENAME=%%~nF"
set "TOFOLDER=%TOBASE%!FILENAME:~0,7!"
if not exist "!TOFOLDER!\" md "!TOFOLDER!"
move "%%F" "!TOFOLDER!" >nul
)
endlocal
In the move command, by moving only the current file rather than using a wildcard, we ensure that we don't consume file names that are about to come up on the next pass through the loop. This keeps it simple, on the assumption that efficiency is not of prime importance.
I'd recommend prototyping by creating batch files (with a .bat or .cmd extension) rather than trying to do complex tasks interactively using one-liners. The behaviour can be different, and there are more things you can do in a batch file, such as using setlocal to turn on delayed expansion of variables. It's also just a pain writing for loops using %F interactively, only to have to remember to convert all those to %%F, %%~nF, etc. when pasting into a batch file for posterity.
One word of caution: with 500,000 files in the folder, and all of the files having very similar prefixes, if your file system has 8.3 directory naming turned on (which is often the default) it is possible to run into problems using wildcards. This happens as the 8.3 namespace gets more and more busy and there are fewer and fewer options for ways the file name can be encoded in 8 characters. (The hash table fills up and starts overflowing into unexpected file names).
One solution is to turn that feature off on the server but that may have severe implications for any legacy applications. To see what the file looks like in 8.3 naming scheme, you can do, e.g.:
dir /x /p AR*
... which might give you something like (where the left hand name is the one converted to 8.3):
ARB900~1.TST AR15467_RW322.tst
AR85E3~1.TST AR15468_RW322.tst
ARDDFE~1.TST AR15469_RW322.tst
AR1547~1.TST AR15470_RW322.tst
AR1547~2.TST AR15471_RW322.tst
...
In this example, since the first two characters seem to be maintained, there should be no conflict.
So, for example, if I run for %a in (AR8*) do @echo %a I get what might at first seem to be an incorrect result:
AR15468_RW322.tst
AR18565_RW322.tst
AR20376_RW322.tst
AR14569_RW322.tst
AR17278_RW322.tst
...
But this is actually correct; it is all the files that match AR8* in both the long file name and short file name formats.
Edit: I am aware in retrospect that this solution looks very similar to Stephan's, and I had browsed through the existing answers before starting work on my own, so I should credit him. I will try and save face by pointing out a benefit of Stephan's solution. Its use of wildcards should circumvent any 8.3 naming issue: by specifying the wildcard as ???????_*, it only catches the long file names and won't match any of the converted 8.3 file names (all of which are devoid of underscores in that position). Similarly, a wildcard such as AR?????_* would do the same.
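To see why ???????_* only catches the long names, here is a small bash check using names from the dir /x listing above. It relies on bash's [[ ... == pattern ]] glob matching as a stand-in for the Windows matcher; that is an assumption, though for this pattern, where every ? is followed by another pattern character, the two should agree. The function name matches_long_pattern is hypothetical.

```bash
#!/usr/bin/env bash
# matches_long_pattern NAME: succeed if NAME fits ???????_*
# (exactly seven characters, then an underscore, then anything).
matches_long_pattern() {
  [[ $1 == ???????_* ]]
}

# Long names have "_" as the 8th character; the generated 8.3 names do not.
for name in AR15467_RW322.tst ARB900~1.TST AR1547~2.TST; do
  if matches_long_pattern "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```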
With bash, you'd write:
for f in *; do
[[ -d $f ]] && continue # skip existing directories
prefix=${f:0:7} # substring of first 7 characters
mkdir -p "$prefix" # create the directory if it does not exist
mv "$f" "$prefix" # and move the file
done
For the substring expansion, see https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion -- this is probably the bit you're missing.

Renaming large amount of files

I need a script that will rename a large number of files. I have a folder with a lot of files, each named by an ID. I also have a CSV file like this:
oldID;newID
oldID;newID
etc...
Every old and new ID is unique. I'd like to know the best way to do this, or get a little help with bash/batch.
The solution for batch is very similar to e0k's solution for bash; you read the file in one line at a time, split the line on semicolons, and rename the file accordingly.
for /f "tokens=1,2 delims=;" %%A in (ids.csv) do ren "%%A" "%%B"
This assumes that your IDs are in a file called ids.csv
If you are using bash (the shell used in the world of Linux, UNIX, etc.), you can use the following short script based on this internal field separator answer. This assumes that you are using a semicolon (;) as the delimiter in your "CSV" file and that there is only one such delimiter per line.
#!/bin/bash
while IFS=';' read -ra names; do
mv "${names[0]}" "${names[1]}";
done < translation.csv
where translation.csv is your file containing the name translations with an oldname;newname format.
If you are instead asking for a batch file (i.e. for Windows, DOS, etc.) then that is a different animal in a different world.
Given that your OS is some Unix (like Linux), and given that using CSV files was your own choice, there might be an easier way to go: mmv can rename many files in one go, using patterns to match the original files and allowing the matched strings to be used in the target file names. See http://ss64.com/bash/mmv.html.

How to output multiple user input values into a single variable in BATCH using Windows 7?

SOLVED!
Update: It figures that moments after posting for help (which is something I never do) I'd figure it out... I tend to overthink things, and that was the case here; it was just so simple! >.<
Solution:
(This worked under Windows 7 Ultimate 64 Bit)
Set var=
Set var=SomeText %var1% %var2% %var3%
Echo %var% > output.txt
See an explanation in my answer below.
I've been searching and trying several posts here similar to my question for hours with no success. I'm not new to Programming in BATCH but I have memory problems and thus can't always remember things. It also doesn't help that I program in other languages on other platforms which usually means I'm trying to use *nix shell commands in my Windows Batch scripts >.<
I've gotten quite close with some examples but nothing that works as needed.
Ideally, I'd like this to work on Windows 7, 8, 8.1, Vista and 10, as those are the intended targets.
This is what I need to accomplish:
The user will answer a series of questions, and each answer is stored in a .txt file (or a variable if you prefer; I just used text files because of a past project where I ran into issues with variables that couldn't be solved, and text files worked). The lines in each text file need to be output into a single text file, on a single line, which will then be read back in as a variable and run. Again, you could just use and combine the variables in your example if that's easier for you or both of us ;P
This is a snippet example of how I was doing it
SET file1=
SET /P file1=file1:%=%
ECHO %file1% > file1.txt
Then
copy /b file1.txt + file2.txt + file3.txt + file4.txt output.txt
Here is how I'd like the result to look
toolkit /S "C:\ToolKit Bravo\Data\etc" "D:\ToolKit Bravo\Data\Ops"
The "" quotation marks are necessary. The output MUST be EXACTLY as shown above for the example I've given. The "/S" & paths are variable NOT fixed!
Here is the best I've been able to come up with using variables..
"toolkit /S "C:\ToolKit Bravo\Data\etc" "D:\ToolKit Bravo\Data\Ops""
Update 2 - An explanation as requested:
The paths in the above example directly above this are not fixed! This was an Example Only. "toolkit" is fixed, this doesn't change. "/S" is an option selected by the user to pass on to the "toolkit". Both the source and destination paths are again input by the user in "quotation" marks. They're not fixed paths.
As you can see the result is surrounded by quotations which is NOT acceptable. And Please remember, I NEED the quotations around the paths in the end result, so removing them all is NOT an option!
Any help is greatly appreciated! Thank you for your time.
Just take all of the characters between the quotes.
SET X="toolkit /S "C:\ToolKit Bravo\Data\etc" "D:\ToolKit Bravo\Data\Ops""
ECHO %X%
SET Y=%x:~1,-1%
ECHO %Y%
Solution:
This solved my problem under Windows 7 Ultimate 64 Bit
Set var=
Set var=SomeText %var1% %var2% %var3%
Echo %var% > textfile.txt
Using the SET command I first made sure the variable or var for short was empty. Using this command:
Set var=
I then proceeded to create my variable using all of the other variables I had created and wanted to combine using this line of code:
Set var=SomeText %var1% %var2% %var3%
Note that I have preceded the variables with "SomeText". This is where I'll place the name of the .exe I'm passing the arguments to, but it can be anything you want included. I also need spaces between each variable so I've left spaces between them in the example code. If you don't want the spaces simply remove them, and you'll have 1234, instead of 1 2 3 4.
Finally I send the combined variable out to a .txt file.
Echo %var% > textfile.txt
However, you could also simply call the new variable now like this:
%var%

Windows batch: Remove lines from a merged file

I am currently making a batch file that merges both, the output from systeminfo, and ipconfig:
#ECHO OFF
pause
systeminfo > "%computername% SystemInfo.txt"
ipconfig >> "%computername% SystemInfo.txt"
"%computername% systeminfo.txt"
The code runs fine, and as far as I can tell it is independent of OS version and OS language. My problem, though, lies with the systeminfo dump. It lists all 100+ hotfixes that have ever been installed on the machine it runs on, making the txt file barely legible:
<useful info>
[01]: File 1
[02]: File 1
[03]: File 1
[04]: File 1
....
[150]: file 1
etc...
<useful info>
There's also another problem: this batch file has to run on computers with either Dutch or English Windows, meaning that I can't filter on words, because those hotfixes and the words will be different on every computer. Does anybody have a nice solution to this problem?
Note: I have seen it solved the other way around, leaving only the relevant info using findstr. But, because that depends on the language, it is not a viable option.
Edit: The hotfixes are named differently on different OSes as well, meaning that I can't filter on those either. Example: on the XP SP3 machine I tested, most of the list is made up of hotfixes called "[##]file1"; on Vista, however, you will see hex values in the list.
EDIT
My original answer did not work, but I have another idea that works as long as the number and order of the systeminfo headers are consistent. I am relying on Hotfix(s): always being the 31st header.
#echo off
setlocal enableDelayedExpansion
>systemInfo.txt (
set cnt=0
for /f "delims=" %%A in ('systeminfo') do (
set "ln=%%A"
if "!ln:~0,1!"==" " (if !cnt! neq 31 echo !ln!) else (
echo !ln!
set /a cnt+=1
)
)
ipconfig
)
If the number and/or order of the headers can change, then I don't see how there can be a solution other than to bite the bullet and look for the specific header text, accounting for all the languages you need to support.
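The header-counting idea in the batch script above can be sketched in bash, for example to experiment with a saved systeminfo dump. Like the batch version, it assumes that detail lines start with a space, header lines do not, and the section to drop belongs to the Nth header (31 for Hotfix(s): in this answer); empty lines are skipped, just as for /f skips them. The function name drop_section is hypothetical.

```bash
#!/usr/bin/env bash
# drop_section N: copy stdin to stdout, but omit the indented lines that
# follow the Nth non-indented (header) line. Header lines are always kept.
drop_section() {
  local target=$1 cnt=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue           # for /f skips empty lines too
    if [[ $line == " "* ]]; then
      # Indented detail line: keep it unless it belongs to header number N.
      if (( cnt != target )); then
        printf '%s\n' "$line"
      fi
    else
      printf '%s\n' "$line"              # header line
      cnt=$((cnt + 1))
    fi
  done
}

# Usage (31 = position of "Hotfix(s):" assumed by the batch answer):
# drop_section 31 < systeminfo.txt > filtered.txt
```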
Original failed answer
I don't know how reliable this is. It works for me on my machine, but it would not surprise me if on some machines it strips things it shouldn't.
>systemInfo.txt (
systeminfo|findstr /vxrc:" \[[0-9]*\]: [^ ]*"
ipconfig
)
All my hotfixes begin with KB, followed by a string of numbers. If this is always true, then the above could be improved as:
>systemInfo.txt (
systeminfo|findstr /vxrc:" \[[0-9]*\]: KB[0-9]*"
ipconfig
)
I don't want to figure it out, but you could format it based on the csv output from systeminfo.
systeminfo /fo csv > info.csv
The output, for any language will basically be:
(headers)"<col>", "<col>", "<col>" <...> "<col>"<newline>
(data)"<col>", "<col>", "<col>" <...> "<col>"
The hotfix column is the second-to-last column, so you could split each line on the quotes and ignore that field. It'll have a bunch of crap in it, but it will still be "hotfix, hotfix, hotfix," so you can just remove the whole quoted field.
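A rough bash sketch of that idea. It assumes every field is quoted, fields are separated by "," with no space between them, and no field contains the "," sequence itself. The column to drop is given as a position counted from the end, defaulting to 2 (second to last, as stated above); check the header row on your own machine first, since the column layout may differ between Windows versions. The function name drop_csv_field_from_end is hypothetical.

```bash
#!/usr/bin/env bash
# drop_csv_field_from_end [K]: read quoted CSV lines on stdin and write them
# back without the field K positions from the end (default 2 = second to last).
# Assumes every field is quoted, fields are separated by "," exactly, and no
# field contains the three characters "," itself.
drop_csv_field_from_end() {
  local from_end=${1:-2} sep=$'\x1f' q='","'
  local line out i n
  local -a fields
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}          # tolerate CRLF line endings
    line=${line#\"}             # drop the opening quote of the first field
    line=${line%\"}             # drop the closing quote of the last field
    line=${line//"$q"/$sep}     # turn each "," boundary into a unit separator
    IFS=$sep read -r -a fields <<< "$line"
    n=${#fields[@]}
    out=""
    for (( i = 0; i < n; i++ )); do
      (( i == n - from_end )) && continue   # skip the unwanted column
      out+="${out:+,}\"${fields[i]}\""       # re-quote and re-join the rest
    done
    printf '%s\n' "$out"
  done
}

# Usage: systeminfo /fo csv output saved as info.csv
# drop_csv_field_from_end 2 < info.csv > info-nohotfix.csv
```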
String manipulation in batch is awful if you ask me. If this were me, I'd do it in a language with a string library and call that instead.

Strange Windows DIR command behavior

I discovered this quite by accident while looking for a file with a number in the name. When I type:
dir *number*
(where number represents any number from 0 to 9 and with no spaces between the asterisks and the number)
at the cmd.exe command prompt, it returns various files that do not appear in any way to fit the search criteria. What's weird is that, depending on the directory, some numbers work and others don't. For example, in a directory associated with a website, I type the following:
dir *4*
and what is returned is:
Directory of C:\Ampps\www\includes\pages
04/30/2012 03:55 PM 153 inventory_list_retrieve.php
06/18/2012 11:17 AM 6,756 ix.html
06/19/2012 01:47 PM 257,501 jquery.1.7.1.js
3 File(s) 264,410 bytes
0 Dir(s) 362,280,906,752 bytes free
That just doesn't make any sense to me. Any clue?
The question is posed on Stack Overflow because the DIR command is often combined with FOR in batch programs. The strange DIR behavior would seem to make batch programs that use the DIR command potentially unreliable.
Edit: (additional note). Though much time has passed, I discovered another quirk with this that almost cost me a lot of work. I wanted to delete all .htm files in a particular directory tree. I realized just before doing it that *.htm matches .html files as well. Also, *.man matches .manifest, and there are probably others. Deleting all .html files in that particular directory would have been upsetting to say the least.
Wild cards at the command prompt are matched against both the long file name and the short "8.3" name if one is present. This can produce surprises.
To see the short names, use the /X option to the DIR command.
Note that this behavior is not in any way specific to the DIR command, and can lead to other (often unpleasant) surprises when a wild card matches more than expected on any command, such as DEL.
Unlike in *nix shells, replacement of a file pattern with the list of matching names is implemented within each command and not implemented by the shell itself. This can mean that different commands could implement different wild card pattern rules, but in practice this is quite rare as Windows provides API calls to search a directory for files that match a pattern and most programs use those calls in the obvious way. For programs written in C or C++ using the "usual" tools, that expansion is provided "for free" by the C runtime library, using the Windows API.
The Windows API in question is FindFirstFile() and its close relatives FindFirstFileEx(), FindNextFile(), and FindClose().
Oddly, although the documentation for FindFirstFile() describes its lpFileName parameter as "directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?)" it never actually defines what the * and ? characters mean.
The exact meaning of the file pattern has its history in the CP/M operating system of the mid-1970s, which strongly influenced (some might say "was directly copied" in place of "influenced" here) the design of MS-DOS. This has resulted in a number of "interesting" artifacts and behaviors. Some of this, at the DOS end of the spectrum, is described in this blog post from 2007, where Raymond Chen describes exactly how file patterns were implemented in DOS.
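The rule described above (a directory entry matches if either its long name or its 8.3 short name matches the pattern) can be modeled in bash as a sketch. bash's glob matching with nocasematch stands in for the Windows matcher, which is an assumption: they agree on simple patterns such as *1* or *.htm, but not on every DOS-era edge case. The example names come from the dir /X listing in this thread; the function name win_like_match is hypothetical.

```bash
#!/usr/bin/env bash
# win_like_match PATTERN LONG_NAME [SHORT_NAME]
# Succeeds if PATTERN matches the long name OR the 8.3 short name (if given),
# ignoring case, roughly the way cmd.exe wildcards pick directory entries.
win_like_match() {
  local pattern=$1 long=$2 short=${3-}
  (
    shopt -s nocasematch   # Windows file name matching ignores case
    # Unquoted $pattern on the right of == is treated as a glob pattern.
    [[ $long == $pattern ]] || [[ -n $short && $short == $pattern ]]
  )
}

# From the dir /X *1* listing: the only "1" is in the short name GDEBUG~1.MSI.
if win_like_match '*1*' 'gDEBugger-5_8.msi' 'GDEBUG~1.MSI'; then
  echo "gDEBugger-5_8.msi is listed"
fi
```

With a hypothetical short name such as INDEX~1.HTM for index.html, win_like_match '*.htm' index.html INDEX~1.HTM also succeeds, which is the *.htm/.html surprise described in the question's edit. The subshell keeps nocasematch from leaking into the caller's shell.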
Yep. You'll see that it also searches through short names if you try this:
dir /x *4*
(/x switch is for short names)
To filter on the long file names only, use:
dir /b | find "4"
A quote from RBerteig's answer:
Note that this behavior is not in any way specific to the DIR command,
and can lead to other (often unpleasant) surprises when a wild card
matches more than expected on any command, such as DEL.
The above is true even for the FOR command, which is very nasty.
for %A in (*4*) do @echo %A contains a 4
will also search the short names. The solution again would be to use FIND or FINDSTR to filter out the names in a more reliable manner.
for %A in (*) do @echo %A | >nul findstr 4 && echo %A contains a 4
Note - change %A to %%A if using the command within a batch file.
Combining FOR with FINDSTR can be a general purpose method to safely use any command that runs into problems with short file names. Simply replace ECHO with the problem command such as COPY or DEL.
It seems the dir command also searches short (8.3-style) file names under the hood.
When I call dir *1* this is what I get:
Volume in drive C is System
Volume Serial Number is F061-0B78
Directory of C:\Users\Piotrek\Desktop\Downloads
2012-05-20 17:33 23 639 040 gDEBugger-5_8.msi
2012-05-20 17:30 761 942 glew-1.7.0.zip
2012-05-20 17:11 9 330 176 irfanview_plugins_433_setup.exe
2012-05-24 20:17 4 419 192 SumatraPDF-2.1.1-install.exe
2012-05-15 22:55 3 466 248 TrueCrypt Setup 7.1a.exe
5 File(s) 1 127 302 494 bytes
There is a gDEBugger-5_8.msi file amongst listed ones, which apparently does not have any 1 character in it.
Everything becomes clear when I use the /X switch with the dir command, which makes dir also display the 8.3 file names. Output from a dir /X *1* command:
Volume in drive C is System
Volume Serial Number is F061-0B78
Directory of C:\Users\Piotrek\Desktop\Downloads
2012-05-20 17:33 23 639 040 GDEBUG~1.MSI gDEBugger-5_8.msi
2012-05-20 17:30 761 942 GLEW-1~1.ZIP glew-1.7.0.zip
2012-05-20 17:11 9 330 176 IRFANV~1.EXE irfanview_plugins_433_setup.exe
2012-05-24 20:17 4 419 192 SUMATR~1.EXE SumatraPDF-2.1.1-install.exe
2012-05-15 22:55 3 466 248 TRUECR~1.EXE TrueCrypt Setup 7.1a.exe
5 File(s) 1 127 302 494 bytes
Quote from dir's help:
/X This displays the short names generated for non-8dot3 file
names. The format is that of /N with the short name inserted
before the long name. If no short name is present, blanks are
displayed in its place.
