Can MD5 hash be inconsistent on large files (>2GB) on USB flashdrives?

Can MD5 hash be inconsistent on large files (>2GB) on USB flashdrives? - powershell-4.0

I have not seen any articles dealing with this specific set of issues/questions for PowerShell in my research (no direct answers found, only slightly similar ones).
Issue:
I have a PowerShell test-script that compares MD5 hash content of Gold Files (Standards) on a Windows 7,8,or 10 PC to files that have been written to USB drives (Copies or Reproductions of Standards). Basically verifying that files on the USB drive are exactly the same before sending them out the door. USB drives are NTFS formatted, and Read-Only (or Write denied) after files are transferred to it. If file name, size, or MD5 is different, the test-script FAILS the file. Issue arise inconsistently with larger files (>2GB), for example "SQLEXPRADV_x86_ENU.exe". Some pass, some do not?
When failures occur, they are consistent on what file fails (no random occurrence); however, the MD5 hash is different from the original file? Transferring USB file to local hard drive, still produces failure when comparing. Some USB drives with same file will pass. Both FAIL and PASS files have the same file size.
USB drive copies are made in different department/vendor, so I do not have details on exactly how files are transferred to USB drives.
$fileName = "C:\DISK1\ISSetupPrerequisites\{2917C802-9F7E-4eb4-87E6-6CF7E98BAB95}\SQLEXPRADV_x86_ENU.exe"
$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$file = [System.IO.File]::Open($fileName,[System.IO.Filemode]::Open,[System.IO.FileAccess]::Read)
$cHash = [System.BitConverter]::ToString($md5.ComputeHash($file))
================================
Questions:
1) Are there any issues using MD5 with large files (other than the chance to produce the same hash)? I know its a silly question, but I have to ask.
2) Would transferring a large file (>2GB) to a USB flash drive provide a chance of altering the MD5 hash content without changing file size? Can bits be flipped somehow by the USB micro-controller without changing file content? Any thing different with the nature of USB data storage that could cause concern?
3) Would the method of copying files from Gold/Standard files to the USB drive matter on a Windows system? Copy & Paste, Drag & Drop, or xcopy command, etc. (excluding indexing/meta-data; for example "systeminfo.ini")

Related

read/write to a disk without a file system

I would like to know if anybody has any experience writing data directly to disk without a file system - in a similar way that data would be written to a magnetic tape. In particular I would like to know if/how data is written in blocks, and whether a certain blocksize needs to be specified (like it does when writing to tape), and if there is a disk equivalent of a tape file mark, which separates the archives written to a tape.
We are creating a digital archive for over 1 PB of data, and we want redundancy built in to the system in as many levels as possible (by storing multiple copies using different storage media, and storage formats). Our current system works with tapes, and we have a mechanism for storing the block offset of each archive on each tape so we can restore it.
We'd like to extend the system to work with disk volumes without having to change much of the logic. Another advantage of not having a file system is that the solution would be portable across Operating Systems.
Note that the ability to browse the files on disk is not important in this application, since we are considering this for an archival copy of data which is not accessed independently. Also note that we would already have an index of the files stored in the application database, which we also write to the end of the tape/disk when it is almost full.
EDIT 27/05/2020: It seems that accessing the disk device as a raw/character device is what I'm looking for.

how does physical disk read work with volume shadow for ntfs?

my goal is to make a backup program reading a physical disk (with NTFS partitions) while using VSS for data consistency.
i use windows api's functions CreateFile with '\.\PhysicalDriveN'
as described here (basically, it allow me to access a disk as a big file)
https://support.microsoft.com/en-us/help/100027/info-direct-drive-access-under-win32
for tests i create volume shadows with this command
wmic shadowcopy call create Volume='C:\'
this is a temporary solution, i plan on using VSS via the program itself
My question is:
how are stored Volume shadows? does it stores data that have been modified since the volume shadow or does it store modification made since the last volume shadow?
in the first case:
when i read the disk, will i get consistent data (including ntfs metadata files)?
in the other case:
can i access a volume shadow the same way i would access a disk/partition? (in order to read hidden metadata files, etc)
-im am currenctly using windows 7 but planning on using it on differents version of windows server
-i've read a lot of microsoft doc about VSS but how it work seem really unclear for me (if you answer with one please explain a bit it meaning)
-i know that Volume shadows are stored in the folder "System Volume Information" as files with names like {3808876b-c176-4e48-b7ae-04046e6cc752}

"how are stored Volume shadows? does it stores data that have been modified since the volume shadow or does it store modification made since the last volume shadow?"
A hardware or software shadow copy provider uses one of the following methods for creating a shadow copy:(Answer by msdn doc)
Complete copy This method makes a complete copy (called a "full copy"
or "clone") of the original volume at a given point in time. This copy
is read-only.
Copy-on-write This method does not copy the original volume. Instead,
it makes a differential copy by copying all changes (completed write
I/O requests) that are made to the volume after a given point in time.
Redirect-on-write This method does not copy the original volume, and
it does not make any changes to the original volume after a given
point in time. Instead, it makes a differential copy by redirecting
all changes to a different volume.
"when i read the disk, will i get consistent data (including ntfs metadata files)?"
Even if an application does not have its files open in exclusive mode, it is possible—because of the finite time needed to open, back up, and close a file—that files copied to storage media may not all reflect the same application state.
"can i access a volume shadow the same way i would access a disk/partition? (in order to read hidden metadata files, etc)"
Requester Access to Shadow Copied Data
Paths on the shadow copied volume are obtained by replacing the root
of the original path with the device object. For example, given a path
on the original volume of "C:\DATABASE*.mdb" and a VSS_SNAPSHOT_PROP
instance of snapProp, you would obtain the path on the shadow copied
volume by concatenating snapProp.m_pwszSnapshotDeviceObject, "\",
and "\DATABASE*.mdb".

So i did more test and actually Shadow Volume are made at block level not file level. it mean that by using createfile with the path
\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1 it would work in a similar way than using createfile with the path \\.\C:
So yeah you can access a shadow copy file system, it have it own boot sector, mft, etc.

Creating virtual disk with arbitrary size

I want to do some "experimentation" to learn about the file systems starting from FAT16.
The idea is to use a C++ program to manipulate a disk at byte level and then see how it is read by Windows.
In short, format the disk to FAT16, create files, create directories, rename files, delete files, delete directories, change properties of files, see what happens if I tamper with sector number of files e.t.c. It wil all use the C++ readfile and writefile functions.
Having a "virtual disk" will make things considerably easier as no hardware will be made corrupt and the disk can be "reset" easily.
Yes, I am an electronic engineer so have to work at low level of hardware.

Undo a botched command prompt copy which concatenated all of my files

In a Windows 8 Command Prompt, I had a backup drive plugged in and I navigated to my User directory. I executed the command:
copy Documents G:/Seagate_backup/Documents
What I assumed was that copy would create the Documents directory on my backup drive and then copy the contents of the C: Documents directory into it. That is not what happened!
I proceeded to wipe my hard-drive and re-install the operating system, thinking I had backed up the important files, only to find out that copy seemingly concatenated all the C: Documents files of different types (.doc, .pdf, .txt, etc) into one file called "Documents." This file is of course unreadable but opening it in Notepad reveals what happened. I can see some of my documents which were plain text throughout the massively long file.
How do I undo this!!? It's terrible because I was actually helping a friend and was so sure of myself but now this has happened. The only thing I can think of doing is searching for some common separator amongst the concatenated files and write some sort of script to split the file back apart. But then I would have to guess the extensions of each of the pieces...

Merging files together in the fashion that copy uses, discards important file system information such as file size and file name. While the file name may not be as important the size is. Both parameters are used by the OS to discriminate files.
This problem might sound familiar if you have damaged your file allocation table before and all files disappeared. In both cases, you will end up with a binary blob (be it an actual disk or something like your file which might resemble a disk image) that lacks any size and filename information.
Fortunately, this is where a lot of file system recovery tools can help. They are specialized in matching patterns. Specifically they are looking for giveaway clues to what type a file is of, where it starts and what it's size is.
This is for instance enabled by many file types having a set of magic numbers that are used to allow a program to check if a file really is of the type that the extension claims to be.
In principle it is possible to undo this process more or less well.
You will need to use data recovery tools or other analysis tools like binwalk to extract the concatenated binary blob. Essentially the same tools that are used to recover deleted files should be able to extract your documents again. Without any filename of course. I recommend renaming the file to a disk image (.img) and either mounting it from within the operating system as a virtual harddisk (don't worry that it has no file system - it should show up as an unformatted drive) or directly using a data recovery tool or analysis tool which can read binary files (binwalk, for instance, can do that directly, but may not find all types of files as it's mainly for unpacking firmware images that may be assembled in the same or a similar way to how your files ended up).

Writing to /dev/loop USB image?

I've got an image that I write onto a bootable USB that I need to tweak. I've managed to mount the stick as /dev/loopX including allowing for the partition start offset, and I can read files from it. However writing back 'seems to work' (no errors reported) but after writing the resulting tweaked image to a USB drive, I can no longer read the tweaked files correctly.
The file that fails is large and also a compressed tarfile.
Is writing back in this manner simply a 'no-no' or is there some way to make this work?
If possible, I don't want to reformat the partition and rewrite from scratch because that will (I assume) change the UUID and then I need to go worry about the boot partition etc.

I believe I have the answer. When using losetup to create a writeable virtual device from the partition on your USB drive, you must specify the --sizelimit parameter as well as the offset parameter. If you don't then the resulting writes can go past the last defined sector on the partition (presumably requires your USB drive to have extra space). Linux reports no errors until later when you try to read. The key hints/evidence for this are that when reads (or (re)written data) fail, dmesg shows read errors attempting to read past the end of the drive. fsck tools such as dosfsck also indicate that the drive claims to be larger than it is.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio