What is the OS behavior (XP, Vista, Win7) when copying files with CopyFile?
When does it reserve the destination clusters? Which of the following is it?
- It reserves all destination clusters before starting to copy.
- It reserves some clusters, copies a portion of the file to them, then reserves additional clusters, copies the next portion to those new clusters, and so on.
The copy operation used by Explorer and cmd.exe reserves most of the disk space immediately, at least on my Windows 7 32-bit, as you can see by watching the free space on the volume. To the best of my recollection this behaviour has been the same in all versions of Windows since at least NT 4.
However, there are several caveats:
Explorer and cmd.exe don't (necessarily) use CopyFile.
This behaviour might be different in different versions of Windows, or depending on circumstances.
It might be only most of the destination clusters; for example, it might sometimes need to expand the MFT to complete the operation. I don't think this is likely, but I can't rule it out.
My recommendation:
If a slim possibility of the occasional failure is acceptable, test CopyFile and if it behaves as expected go ahead and use it.
If it isn't, consider doing the copy yourself (a sketch follows below). Unfortunately that last caveat might apply even then, but as I said I think it's probably not a significant risk.
You need to be prepared to cope with an unexpected failure either way since hardware faults, or perhaps even file system corruption, could cause the copy to fail part way through.
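For reference, here is a minimal sketch of that do-it-yourself approach, reserving the full destination size up front with SetFilePointerEx and SetEndOfFile before streaming the data across. The function name, buffer size and error handling are illustrative only, and this does not reproduce everything CopyFile does (attributes, alternate data streams, and so on):

#include <windows.h>
#include <vector>

// Sketch: copy src to dst, extending the destination to its final size first
// so the clusters are reserved (and the copy fails early) before any data moves.
bool CopyWithPreallocation(const wchar_t* src, const wchar_t* dst)
{
    HANDLE hSrc = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (hSrc == INVALID_HANDLE_VALUE) return false;

    HANDLE hDst = CreateFileW(dst, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (hDst == INVALID_HANDLE_VALUE) { CloseHandle(hSrc); return false; }

    LARGE_INTEGER size = {}, zero = {};
    GetFileSizeEx(hSrc, &size);

    // Reserve the destination up front: seek to the final size, set EOF, rewind.
    bool ok = SetFilePointerEx(hDst, size, nullptr, FILE_BEGIN) &&
              SetEndOfFile(hDst) &&
              SetFilePointerEx(hDst, zero, nullptr, FILE_BEGIN);

    std::vector<char> buf(1 << 20);                       // 1 MB chunks, arbitrary
    DWORD read = 0, written = 0;
    while (ok && ReadFile(hSrc, buf.data(), (DWORD)buf.size(), &read, nullptr) && read) {
        ok = WriteFile(hDst, buf.data(), read, &written, nullptr) && written == read;
    }

    CloseHandle(hSrc);
    CloseHandle(hDst);
    return ok;
}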
I am a computer engineering student studying Linux kernel development. My 4-man team was tasked to propose a kernel development project (to be implemented in 6 weeks), and we came up with a tentative "Self-Optimizing Hard Disk Drive Linux Kernel Module". I'm not sure if that title makes sense to the pros.
We based the proposal on this project.
The goal of the project is to minimize hard disk access times. The plan is to create a special partition where the "most commonly used" files are to be placed. An LKM will profile, analyze, plan, and redirect I/O operations to the hard disk. This LKM should primarily be able to predict and redirect all file access (on files with sizes of < 10 MB) with minimal overhead, and lessen average read/write access times to the hard disk. I believe Apple's HFS has this feature.
Can anybody suggest a starting point? I recently found a way to redirect I/O operations by intercepting system calls (by hijacking all the read/write ones). However, I'm not convinced that this is the best way to go. Is there a way to write a driver that redirects these read/write operations? Can we perhaps tap into the read/write cache to achieve the same effect?
Any feedback at all is appreciated.
You may want to take a look at Unionfs. You don't even need an LKM, just a user-space daemon which subscribes to inotify events, keeps statistics, and migrates files between partitions. Unionfs will combine both partitions into a single logical filesystem.
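A minimal sketch of what the statistics-gathering side of such a daemon could look like, assuming a Linux system with inotify available; the watched directory is a placeholder and the migration/Unionfs side is left out entirely:

#include <sys/inotify.h>
#include <unistd.h>
#include <cstdio>
#include <map>
#include <string>

// Sketch: count accesses per file in one directory. Migrating the hottest
// files (and the Unionfs setup) is not shown; "/data" is just a placeholder.
int main()
{
    int fd = inotify_init1(0);
    if (fd < 0) { perror("inotify_init1"); return 1; }

    if (inotify_add_watch(fd, "/data", IN_OPEN | IN_ACCESS | IN_MODIFY) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    std::map<std::string, unsigned long> hits;   // per-file access counts
    char buf[4096];

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0) break;
        for (char* p = buf; p < buf + len; ) {
            const inotify_event* ev = reinterpret_cast<const inotify_event*>(p);
            if (ev->len > 0)
                ++hits[ev->name];                // name of the file inside /data
            p += sizeof(inotify_event) + ev->len;
        }
        // A real daemon would periodically rank hits and move the hottest
        // small files to the fast partition, letting Unionfs hide the move.
    }
    close(fd);
    return 0;
}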
There are many ways in which such optimizations might be useful:
- accessing file A implies that access to file B is imminent. Example: a media player opening the icon file for a media file.
- accessing any file in some group G of files means that other files in the group will be accessed shortly. Example: MySQL receives a USE somedb command, which implies that all of that database's files (tables, indexes, etc.) will be accessed.
- a program that stops reading a sequential file has probably stalled or exited, so predictions of future accesses associated with that file should be abandoned.
- keeping multiple (yet transparent) copies of some frequently referenced files strategically sprinkled about lets reads use the copy nearest the disk heads. Example: uncached directories or small, frequently accessed settings files.
There are so many possibilities that I think at least 50% of an efficient solution would be a sensible, limited specification of which features you will attempt to implement and which you won't. It might be valuable to study how the aggressive file-caching mechanism in Microsoft's Vista disappointed.
Another problem you might encounter with a modern Linux distribution is how well the system already does much of what you plan to improve. In fact, measuring the improvement might be a big challenge. I suggest writing a benchmark program which opens and reads a series of files and precisely times the complete sequence. Run it several times with your improvements enabled and disabled. But you'll have to reboot in between runs to get valid timings.
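A rough sketch of such a benchmark is below; the file list comes from the command line, and as noted you would reboot (or at least drop the page cache) between runs so the kernel's own caching doesn't mask the result:

#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Sketch: open and read a fixed series of files and time the whole sequence.
// Run it with the module loaded and again without; reboot (or drop the page
// cache) between runs so the kernel's caching doesn't skew the comparison.
int main(int argc, char** argv)
{
    std::vector<std::string> files(argv + 1, argv + argc);   // files to exercise

    auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (const std::string& path : files) {
        std::ifstream in(path, std::ios::binary);
        char buf[64 * 1024];
        while (in) {
            in.read(buf, sizeof(buf));
            total += static_cast<std::size_t>(in.gcount());
        }
    }
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start);

    std::cout << "read " << total << " bytes in " << ms.count() << " ms\n";
    return 0;
}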
I wonder what kind of reliability guarantees NTFS provides about the data stored on it? For example, suppose I'm opening a file, appending to the end, then closing it, and the power goes out at a random time during this operation. Could I find the file completely corrupted?
I'm asking because I just had a system lock-up and found two of the files that were being appended to completely zeroed out. That is, of the right size, but made entirely of the zero byte. I thought this isn't supposed to happen on NTFS, even when things fail.
NTFS is a transactional file system, so it guarantees integrity - but only for the metadata (MFT), not the (file) content.
The short answer is that NTFS does metadata journaling, which assures valid metadata.
Other modifications (to the body of a file) are not journaled, so they're not guaranteed.
There are file systems that do journaling of all writes (e.g., AIX has one, if memory serves), but with them, you tend to get a tradeoff between disk utilization and write speed. IOW, you need a lot of "free" space to get decent performance -- they basically just do all writes to free space, and link that new data into the right spots in the file. Then they go through and clean out the garbage (i.e., free up parts that have since been overwritten, and usually coalesce the pieces of a file together as well). This can get slow if they have to do it very often though.
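If the appended bytes themselves need to survive a crash, you have to ask for that explicitly. A minimal sketch on Windows (the path and payload are whatever your application uses, error handling is trimmed, and FILE_FLAG_WRITE_THROUGH at open time is an alternative):

#include <windows.h>

// Sketch: append a record and force it to disk before treating the append as durable.
bool AppendDurably(const wchar_t* path, const void* data, DWORD len)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, FILE_SHARE_READ, nullptr,
                           OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER zero = {};
    SetFilePointerEx(h, zero, nullptr, FILE_END);   // position at end of file

    DWORD written = 0;
    bool ok = WriteFile(h, data, len, &written, nullptr) && written == len;

    // NTFS journals only metadata, so without an explicit flush the appended
    // bytes may still be sitting in the cache when the power goes out.
    if (ok) ok = FlushFileBuffers(h) != 0;

    CloseHandle(h);
    return ok;
}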
I have to know whether a specific vulnerability in Tcl 8.4 affects the Windows platform.
The vulnerability is: http://www.securityfocus.com/bid/15259/info
As per the link:
Operating systems with no difference in the maximum path lengths among differing file systems are not affected by this issue
I am using Tcl on Windows and want to know whether this vulnerability affects Tcl on Windows, and how.
Further, how can a person exploit this vulnerability on Windows?
Thanks
The Windows header files define MAX_PATH (260) as the usual maximum path size, but this isn't really universally applied. There are a number of ways to bypass this limit, in which case the effective path limit is, well, unlimited. Or 32,767 characters. Whichever is shorter.
Naming, Files, Paths and Namespaces has more info.
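For illustration, one common way such longer paths come about is the \\?\ prefix, which bypasses MAX_PATH parsing on the wide-character APIs; the path in this sketch is made up:

#include <windows.h>

int main()
{
    // The \\?\ prefix turns off MAX_PATH parsing on the wide-character APIs
    // and allows paths of up to roughly 32,767 characters. The path here is
    // made up purely for illustration.
    const wchar_t* longPath = L"\\\\?\\C:\\some\\very\\deep\\directory\\tree\\file.txt";

    HANDLE h = CreateFileW(longPath, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE)
        CloseHandle(h);
    return 0;
}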
While there exist common conventions regarding maximum file name and path length, certain file system drivers (or third-party file system implementations) might have their own limits, which can be lower than the commonly used ones.
That article does not mention any vulnerability of systems hosted on Windows to this at all; the standard recommended size of buffer to be allocated there is sufficiently long to hold any legal filename. This is specifically true for Tcl (Tk does not do directory scanning except via Tcl's interfaces).
Exploiting the vulnerability on Windows is going to be hard (and impossible with Tcl, which is very careful with buffer management). If you're on another platform, you are recommended to switch to a later patchlevel of Tcl; the current version is 8.4.19. (Actually, you're recommended to switch to the 8.5 series – currently 8.5.9 – as 8.4 has basically been EOLed; there will be maybe one more roll-up release on that branch, but bugfixes are now only committed to 8.4 for critical things like demonstrated security issues or build-chain problems.)
Note that, since Tcl has never allocated buffers for holding a whole path directly anyway, it's not clear how this sort of thing could cause an exploit in the first place. The article does state that there is no instance of this issue in the wild.
We need to link one of our executables with this flag as it uses lots of memory.
But why give one EXE file special treatment? Why not standardize on /LARGEADDRESSAWARE?
So the question is: is there anything wrong with using /LARGEADDRESSAWARE even if you don't need it? Why not use it as the standard for all EXE files?
Blindly applying the LargeAddressAware flag to your 32-bit executable plants a ticking time bomb!
By setting this flag you are testifying to the OS: "Yes, my application (and all DLLs loaded at runtime) can cope with memory addresses up to 4 GB, so don't restrict the virtual address space of the process to 2 GB; unlock the full range of 4 GB."
But can you really guarantee that?
Do you take responsibility for all the system DLLs, Microsoft redistributables, and third-party modules your process may use?
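One way to at least audit that is to walk the modules loaded into your process and check the IMAGE_FILE_LARGE_ADDRESS_AWARE bit in each PE header. A rough sketch (error handling trimmed; link against psapi):

#include <windows.h>
#include <psapi.h>
#include <cstdio>

// Does this loaded module advertise IMAGE_FILE_LARGE_ADDRESS_AWARE?
static bool IsLargeAddressAware(HMODULE mod)
{
    const IMAGE_DOS_HEADER* dos = reinterpret_cast<const IMAGE_DOS_HEADER*>(mod);
    const IMAGE_NT_HEADERS* nt  = reinterpret_cast<const IMAGE_NT_HEADERS*>(
        reinterpret_cast<const BYTE*>(mod) + dos->e_lfanew);
    return (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
}

int main()
{
    HMODULE mods[1024];
    DWORD needed = 0;
    if (!EnumProcessModules(GetCurrentProcess(), mods, sizeof(mods), &needed))
        return 1;

    for (DWORD i = 0; i < needed / sizeof(HMODULE); ++i) {
        char name[MAX_PATH];
        if (GetModuleFileNameA(mods[i], name, MAX_PATH))
            printf("%s  %s\n", IsLargeAddressAware(mods[i]) ? "LAA" : "   ", name);
    }
    return 0;
}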
Usually, memory allocation returns virtual addresses in low-to-high order, so unless your process consumes a lot of memory (or has a very fragmented virtual address space), it will never use addresses beyond the 2 GB boundary. This hides bugs related to high addresses.
If such bugs exist, they are hard to identify; they will sporadically show up "sooner or later". It's just a matter of time.
Luckily there is an extremely handy system-wide switch built into the Windows OS: for testing purposes, use the MEM_TOP_DOWN registry setting.
This forces all memory allocations to go from the top down, instead of the normal bottom up.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"AllocationPreference"=dword:00100000
(This is hex 0x100000; it requires a Windows reboot, of course.)
With this switch enabled you will identify issues "sooner" rather than "later"; ideally you'll see them right from the beginning.
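For quick, targeted experiments without the reboot, individual allocations can also be forced high with VirtualAlloc's MEM_TOP_DOWN flag. This only covers your own explicit allocations, so it is no substitute for the system-wide switch:

#include <windows.h>
#include <cstdio>

int main()
{
    // Ask for this one allocation at the highest available address. In a
    // 32-bit LAA process on a 64-bit OS this typically lands above 2 GB,
    // exactly where pointer-sign bugs start to bite.
    void* p = VirtualAlloc(nullptr, 64 * 1024,
                           MEM_COMMIT | MEM_RESERVE | MEM_TOP_DOWN,
                           PAGE_READWRITE);
    printf("allocated at %p\n", p);
    if (p) VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}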
Side note: for a first analysis I strongly recommend the VMMap tool (Sysinternals).
Conclusions:
When applying the LAA flag to your 32-bit executable, it is mandatory to fully test it on an x64 OS with the TopDown AllocationPreference switch set.
Issues in your own code you may be able to fix.
Just to name one very obvious example: use unsigned integers instead of signed integers for memory pointers (see the sketch after this list).
When you encounter issues in third-party modules, you need to ask the author to fix their bugs; until that is done, you had better remove the LargeAddressAware flag from your executable.
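A contrived sketch of the signed-pointer bug mentioned in the list above; the address is made up, but it is the kind of value you start seeing above the 2 GB line:

#include <cstdio>
#include <cstdint>

int main()
{
    // Pretend the allocator handed back an address above the 2 GB line, as it
    // will once the LAA flag is set and the address space fills up (or with
    // the TopDown switch enabled). The value is made up for the example.
    std::uintptr_t addr = 0x90000000u;

    int      asSigned   = static_cast<int>(addr);        // wrong type for an address
    unsigned asUnsigned = static_cast<unsigned>(addr);    // what a 32-bit build needs

    // The classic latent bug: "negative" is treated as "invalid pointer".
    if (asSigned < 0)
        printf("address 0x%08X wrongly rejected as invalid\n", asUnsigned);

    return 0;
}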
A note on testing:
The MemTopDown registry switch does not achieve the desired results for unit tests that are executed by a "test runner" that itself is not LAA-enabled.
See: Unit Testing for x86 LargeAddressAware compatibility
PS:
Also very "related" and quite interesting is the migration from 32-bit code to 64-bit.
For examples, see:
As a programmer, what do I need to worry about when moving to 64-bit windows?
https://www.sec.cs.tu-bs.de/pubs/2016-ccs.pdf (twice the bits, twice the trouble)
Because lots of legacy code is written with the expectation that "negative" pointers are invalid. Anything in the top two GB of a 32-bit process has the MSB set.
As such, it's far easier for Microsoft to play it safe and require that applications which (a) need the full 4 GB and (b) have been developed and tested in a large-memory scenario simply set the flag.
It's not - as you have noticed - that hard.
Raymond Chen, in his blog The Old New Thing, covers the issues with turning it on for all (32-bit) applications.
No, "legacy code" in this context (C/C++) is not exclusively code that plays ugly tricks with the MSB of pointers.
It also includes all the code that uses 'int' to store the difference between two pointers, or the length of a memory area, instead of using the correct type 'size_t': 'int', being signed, has 31 usable bits and cannot hold a value of more than 2 GB.
A way to cure a good part of your code is to go over it and correct all of those innocuous "mixing signed and unsigned" warnings. That should do a good part of the job, at least if you haven't defined functions where an argument of type int is actually a memory length.
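A small, contrived illustration of that second kind of legacy code; the length is made up, but once a memory length (or pointer difference) exceeds 2 GB, the signed 32-bit variable wraps negative:

#include <cstdio>
#include <cstddef>

int main()
{
    // A memory length that only shows up once single allocations grow past
    // 2 GB; the figure is made up for the example.
    std::size_t realLength = 0x90000000u;                 // about 2.25 GB

    int         asInt  = static_cast<int>(realLength);    // legacy: 'int' for a length
    std::size_t asSize = realLength;                       // the correct type

    if (asInt < 0)
        printf("length wrapped negative: %d (should be %zu)\n", asInt, asSize);

    return 0;
}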
However, that "legacy code" will probably appear to work right for quite a while, even if you correct nothing.
Things will only break when you allocate more than 2 GB in one block, or when you compare two unrelated pointers that are more than 2 GB apart.
As comparing unrelated pointers is technically undefined behaviour anyway, you won't encounter much code that does it (but you can never be sure).
And very frequently, even if you need more than 2 GB in total, your program never actually makes single allocations larger than that. In fact, on Windows, even with LARGEADDRESSAWARE you won't be able to allocate that much by default, given the way the memory is organized; you'd need to shuffle the system DLLs around to get a contiguous block of more than 2 GB.
But Murphy's law says that kind of code will break one day; it's just that it will happen long after you've enabled LARGEADDRESSAWARE without checking, when nobody remembers that this was done.
Windows XP Disk Defragmenter reports show a constant gap in disk usage on a number of disk partitions on my system. I'm not referring to the little transitory gaps that occur. In disk D below, the gap in question is the one under the word "defragmentation". In disk P below, the gap is the one under "usage before def", but a bigger one. The C partition doesn't have this anomaly. The size and placement pattern isn't obvious. It is as though there were an area, a no-man's land, that both the file system and the defragmenter avoid. These gaps survive daily use and defragmentation. I don't believe this is residue from a paging file -- it should show up in green, anyway. The Recycle Bin is empty.
Any ideas?
Disk D (20 Gig):
Disk P (40 Gig):
That is probably the space reserved for the MFT, which will only be used for files if the disk gets really full. This empty space allows it to grow for a while without getting fragmented.
References:
How NTFS reserves space for its Master File Table (MFT)
No idea what's causing this, but the defragger that comes with Win XP is Diskeeper Lite, which is not very good. A better defragger might get rid of the gap if nothing is actually reserving that space. I personally use O&O Defrag; it's not free, but there's a 30-day trial.
Defragging to the point that there are absolutely no gaps is not necessarily a good thing. Some OSs/FileSystems try to pack files in as tightly as possible and fill without gaps where possible.
The problem with this is that if any of the earlier files get changed or appended to, then you are either leaving an early gap (which will tend to cause fragments) or forcing the extra bit to be written at the next gap (creating a fragment again).
Defrag when you start getting weird behaviour (quite often it helps, even though it is not supposed to); however, you don't need to do it every day, nor is a totally defragmented drive a sign of a particularly healthy drive.
Like the poster above said, that's most likely the reserved zone for the MFT. When the drive is formatted, about 12.5% of the partition is reserved for the MFT, and this can grow as needed to accommodate new records if the initial allocation is used up. Mind you, the MFT can also fragment if the adjacent contiguous free space is not large enough to accommodate the expansion.
Regarding defragging: instead of defragging manually on a regular basis, save yourself the trouble and get Diskeeper. The newest version, i.e. 2008 Professional, is fully automatic and defrags in the background using idle resources. There is also a manual/scheduled defrag mode, but I don't see any reason to waste my time; it does a fine job running on automatic on my systems.