Clear file cache to repeat performance testing - performance

What tools or techniques can I use to remove cached file contents to prevent my performance results from being skewed? I believe I need to either completely clear, or selectively remove cached information about file and directory contents.
The application that I'm developing is a specialised compression utility, and is expected to do a lot of work reading and writing files that the operating system hasn't touched recently, and whose disk blocks are unlikely to be cached.
I wish to remove the variability I see in IO time when I repeat the task of profiling different strategies for doing the file processing work.
I'm primarily interested in solutions for Windows XP, as that is my main development machine, but I can also test using Linux, so I'm interested in answers for that environment too.
I tried SysInternals CacheSet, but clicking "Clear" doesn't measurably increase the time to re-read files I've just read a few times; that is, it doesn't restore the timings I see after a cold boot.

Use Sysinternals' RAMMap.
The Empty / Empty Standby List menu option will clear the Windows file cache.

For Windows XP, you should be able to clear the cache for a specific file by opening the file using CreateFile with the FILE_FLAG_NO_BUFFERING flag and then closing the handle. This isn't documented, and I don't know whether it works on later versions of Windows, but I used it long ago when writing test code to compare file compression libraries. I don't recall whether the choice of read or write access affected this trick.
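A minimal Python sketch of the open-and-close trick described above, calling CreateFileW through ctypes. The function name evict_file_from_cache is made up for illustration; the constants are the standard Win32 values. Because the cache-dropping effect is undocumented, treat this as something to verify with your own timings, not as a guaranteed flush.

```python
import ctypes
import sys

# Standard Win32 constants used by CreateFileW.
GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
OPEN_EXISTING = 3
FILE_FLAG_NO_BUFFERING = 0x20000000
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


def evict_file_from_cache(path):
    """Open `path` unbuffered and close it again (hypothetical helper).

    The answer above reports that this drops the file's cached pages;
    that side effect is undocumented, so this is only an experiment.
    """
    if sys.platform != "win32":
        raise NotImplementedError("CreateFileW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = ctypes.c_void_p
    kernel32.CreateFileW.argtypes = [
        ctypes.c_wchar_p,  # lpFileName
        ctypes.c_uint32,   # dwDesiredAccess
        ctypes.c_uint32,   # dwShareMode
        ctypes.c_void_p,   # lpSecurityAttributes
        ctypes.c_uint32,   # dwCreationDisposition
        ctypes.c_uint32,   # dwFlagsAndAttributes
        ctypes.c_void_p,   # hTemplateFile
    ]
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

    handle = kernel32.CreateFileW(
        path,
        GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        None,
        OPEN_EXISTING,
        FILE_FLAG_NO_BUFFERING,
        None,
    )
    if handle is None or handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    # No reads are issued; the handle is closed straight away.
    kernel32.CloseHandle(handle)
```

Calling evict_file_from_cache(r"C:\data\sample.bin") before each timed run would be the intended use; the path here is only an example.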

A command line utility can be found here
From the source:
EmptyStandbyList.exe is a command line tool for Windows (Vista and above) that can empty:
process working sets,
the modified page list,
the standby lists (priorities 0 to 7), or
the priority 0 standby list only.
Usage:
EmptyStandbyList.exe workingsets|modifiedpagelist|standbylist|priority0standbylist

A quick googling gives these options for Linux
Unmount and mount the partition holding the files
sync && echo 1 > /proc/sys/vm/drop_caches

#include <fcntl.h>
int posix_fadvise(int fd, off_t offset, off_t len, int advice);
with advice option POSIX_FADV_DONTNEED:
The specified data will not be accessed in the near future.
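A small Python sketch of the posix_fadvise approach for a single file, using os.posix_fadvise from the standard library (Unix only). The helper name drop_file_cache is made up for illustration. The advice is only a hint: the kernel may keep pages that are dirty or still in use elsewhere, so it is worth confirming the effect with timings.

```python
import os


def drop_file_cache(path):
    """Ask the kernel to discard cached pages of one file (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be discarded, so flush them to disk first.
        os.fsync(fd)
        # offset=0 and len=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Unlike writing to /proc/sys/vm/drop_caches, this needs no root privileges and only affects the file you name, which suits per-file benchmarking.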

I've found one technique (other than rebooting) that seems to work:
Run a few copies of MemAlloc
With each one, allocate large chunks of memory a few times
Use Process Explorer to observe the System Cache size reducing to very low levels
Quit the MemAlloc programs
It isn't selective though. Ideally I'd like to be able to clear the specific portions of memory being used for caching the disk blocks of files that I want to no longer be cached.

For a much better view of the Windows XP filesystem cache, try ATM by Tim Murgent. It lets you see both the filesystem cache Working Set size and the Standby List size in a more detailed and accurate view. For Windows XP you need the old version 1 of ATM, which is available for download here, since V2 and V3 require Server 2003, Vista, or higher.
You will observe that although Sysinternals CacheSet will reduce the "Cache WS Min", the actual data continues to exist in the Standby lists, from where it can be used until it has been replaced with something else. To replace it with something else, use a tool such as MemAlloc, flushmem by Chad Austin, or Consume.exe from the Windows Server 2003 Resource Kit Tools.

As the question also asked for Linux, there is a related answer here.
The command line tool vmtouch allows for adding and removing files and directories from the system file cache, amongst other things.

There's a Windows API call, SetSystemFileCacheSize (https://learn.microsoft.com/en-us/windows/desktop/api/memoryapi/nf-memoryapi-setsystemfilecachesize), that can be used to flush the file system cache. It can also be used to limit the cache size to a very small value. Looks perfect for these kinds of tests.

Related

Disable or flush page cache on Windows

I assume Windows has a similar concept to Linux's page cache for storing in memory data from disks, like files, executables and dynamic libraries. I wonder if it is possible at all to disable such cache or to the very least to clear/flush it.
This is called the Standby List under Windows. You can purge it globally, for one volume, or for one file handle.
Globally
You can do it using a readily available program from Microsoft Technet, by selecting Empty → Empty Standby List
Programmatically, you can achieve the same thing using the undocumented NtSetSystemInformation function, for details see line 239 in a program which does the same thing as the previously mentioned one, among other things.
For one file handle
Open the file with FILE_FLAG_NO_BUFFERING: The documentation is lying insofar as it says that you open the file without buffering, but the true, observable behavior on all Windows versions from Windows 98 up to Windows 8 is that it simply throws away the complete cache contents for that file (for everyone!) and doesn't repopulate the cache from reads that use this handle.
For one complete volume
A volume handle is just a file handle (a somewhat special one, but still), so assuming you have appropriate privileges to open a volume handle, you can do the same for a complete volume.
Also, as pointed out in the answer here, there seems to be a feature/bug (feature-bug?) which allows you to invalidate a volume's cache even without proper privileges, merely by attempting to open it without shared writes, at least under one recent version of Windows.
It makes perfect sense that this happens when any open which is valid for writing succeeds as you may change filesystem-internal data doing so (insofar it is a feature), but apparently it also works when opening the volume fails (which is a bug).

Does Windows XP have an equivalent to VAX/VMS Installed Shared Images?

Back in the good old/bad old days when I developed on VAX/VMS it had a feature called 'Installed Shared Images' whereby if one expected one's executable program would be run by many users concurrently one could invoke the INSTALL utility thus:
$ INSTALL
INSTALL> ADD ONES_PROGRAM.EXE/SHARE
INSTALL> EXIT
The /SHARE flag had the effect of separating out the code from the data so that concurrent users of ONES_PROGRAM.EXE would all share the code (on a read-only basis of course) but each would have their own copy of the data (on a read-write basis). This technique/feature saved Mbytes of memory (which was necessary in those days) as only ONE copy of the program's code ever needed to be resident in VAX memory irrespective of the number of concurrent users.
Does Windows XP have something similar? I can't figure out if the Control Panel's 'Add Programs/Features' is the equivalent (I think it is, but I'm not sure)
Many thanks for any info
Richard
p.s. INSTALL would also share Libraries as well as Programs in case you were curious
The Windows virtual memory manager will do this automatically for you. So long as the module can be loaded at the same address in each process, the physical memory for the code will be shared between each process that loads that module. That is true for all modules, libraries as well as executables.
This is achieved by the linker, which marks code segments as shareable and data segments as private to each process.
The bottom line is that you do not have to do anything explicit to make this happen.

What can be the reason for Windows error ERROR_DISK_FULL (112) when opening a NTFS alternate data stream?

My application writes some bytes of data to an alternate data stream. This works fine on all but one machine (Windows Server 2003 SP2).
Instead, CreateFile returns ERROR_DISK_FULL when I try to create an alternate data stream (on the root directory). I don't find the reason for this result, because...
There's plenty of space on that drive.
The drive is NTFS formatted (according to GetVolumeInformation).
The drive supports alternate data streams (according to GetVolumeInformation).
Edit: I can provide some more information about what the reason is not:
I added many streams on a test system which didn't show the error and wondered if the error might occur. It didn't. Instead after about 2000 Streams with long file names another error occurred and persisted: 1450 (ERROR_NO_SYSTEM_RESOURCES).
EDIT: Here is an example for one of the used file names:
char szStreamFileName[] = "C:\\:abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnoqrstuvwxyz012345";
EDIT: Our customer uses some corporate antivirus software from Avira on this server. Maybe this is the reason (Alternate data streams can be abused by malware).
After opening a support ticket at MS I know that there was a readonly flag set which one only can set (and reset) with undocumented Windows functions. Nobody knows who set this flag and why, but I sent them an image of the drive (after I got the machine from our customer) and so they figured it out. We only have a workaround in our application (We use another location if we detect this error). Meanwhile we know that some of our customers have this problem.
Are there any compressed/sparse files or alternate data streams?
Often backup applications receive ERROR_DISK_FULL errors attempting to back up compressed files and this causes quite a bit of confusion when there are still several gigabytes of free space on the drive. Other issues may also occur when copying compressed files. The goal of this blog is to give the reader a more thorough understanding of what really happens when you compress NTFS files.
From Understanding NTFS Compression
Just another possibility...
Did you check the number of currently opened files in your OS?
The OS supports a maximum number of reserved file handles; beyond that it reports ERROR_DISK_FULL or ERROR_NO_SYSTEM_RESOURCES.
And a second possibility...
The root directory is limited in the number of files it can hold. As I remember, it was 512 files in older versions of the OS. But NTFS supports an unlimited number of files in the root!
You might want to see what something like Sysinternals' Process Monitor utility captures when trying to create this file. It shows the return codes of the various APIs involved in the I/O stack, and one of them might give you a clue as to why 112 is being returned to you. Hopefully the level of detail in ProcMon is enough; if not, I imagine there are other, more detailed I/O trace facilities for Windows (but I don't know of them off the top of my head).
The filename you give is
char szStreamFileName[] = "C:\\:abcdefghijklm...
it starts with
C:\\:
Is that a typo in the post, or is there really a colon after the slash? I think that's an illegal filename.
If you try to copy a file larger than the destination filesystem's maximum file size from NTFS to FAT (2GB limit) or FAT32 (4GB limit), you may see this error.
Just a blind shot, but are the rights set properly?

Locking Executing Files: Windows does, Linux doesn't. Why?

I noticed when a file is executed on Windows (.exe or .dll), it is locked and cannot be deleted, moved or modified.
Linux, on the other hand, does not lock executing files and you can delete, move, or modify them.
Why does Windows lock when Linux does not? Is there an advantage to locking?
Linux has a reference-count mechanism, so you can delete the file while it is executing, and it will continue to exist as long as some process (which previously opened it) has an open handle for it. The directory entry for the file is removed when you delete it, so it cannot be opened any more, but processes already using this file can still use it. Once all processes using this file terminate, the file is deleted automatically.
Windows does not have this capability, so it is forced to lock the file until all processes executing from it have finished.
I believe that the Linux behavior is preferable. There are probably some deep architectural reasons, but the prime (and simple) reason I find most compelling is that in Windows, you sometimes cannot delete a file, you have no idea why, and all you know is that some process is keeping it in use. In Linux it never happens.
As far as I know, Linux does lock executables when they're running; however, it locks the inode. This means that you can delete the "file", but the inode is still on the filesystem, untouched, and all you really deleted is a link.
Unix programs use this way of thinking about the filesystem all the time: create a temporary file, open it, delete the name. Your file still exists, but the name is freed up for others to use and no one else can see it.
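The create, open, delete-the-name idiom can be sketched in a few lines of Python. The helper name use_after_unlink is made up for illustration, and the /proc lookup at the end assumes Linux; elsewhere that part is simply skipped.

```python
import os
import tempfile


def use_after_unlink():
    """Create a file, remove its name, and keep using it through the descriptor."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"still here")

    # Remove the directory entry; the inode survives while fd stays open.
    os.unlink(path)
    name_gone = not os.path.exists(path)

    # The data is still readable through the open descriptor.
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 100)

    # On Linux, /proc/self/fd/<fd> is a symlink that shows the old path
    # with " (deleted)" appended.
    link = None
    proc_entry = f"/proc/self/fd/{fd}"
    if os.path.islink(proc_entry):
        link = os.readlink(proc_entry)

    os.close(fd)  # last reference dropped: the kernel frees the file now
    return name_gone, data, link
```

The returned tuple lets you check each step: the name is gone, the contents are intact, and (on Linux) the kernel reports the descriptor as pointing at a deleted file.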
Linux does lock the files. If you try to overwrite a file that's executing you will get "ETXTBSY" (Text file busy). You can however remove the file, and the kernel will delete the file when the last reference to it is removed. (If the machine wasn't cleanly shut down, these files are the cause of the "Deleted inode had zero d-time" messages when the filesystem is checked: they weren't fully deleted, because a running process had a reference to them, and now they are.)
This has some major advantages: you can upgrade a process that's running by deleting the executable, replacing it, then restarting the process. Even init can be upgraded like this: replace the executable, send it a signal, and it'll re-exec() itself, without requiring a reboot. (This is normally done automatically by your package management system as part of its upgrade.)
Under Windows, replacing a file that's in use appears to be a major hassle, generally requiring a reboot to make sure no processes are running.
There can be some problems, such as if you have an extremely large logfile, and you remove it, but forget to tell the process that was logging to that file to reopen the file, it'll hold the reference, and you'll wonder why your disk didn't suddenly get a lot more free space.
You can also use this trick under Linux for temporary files: open the file, delete it, then continue to use the file. When your process exits (for no matter what reason, even power failure), the file will be deleted.
Programs like lsof and fuser (or just poking around in /proc/<pid>/fd) can show you what processes have files open that no longer have a name.
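As a rough illustration of the "poking around in /proc" approach, here is a Python sketch that lists the descriptors of one process whose file no longer has a name. It assumes Linux with /proc mounted; the function name deleted_open_files is made up, and inspecting another user's process generally needs suitable permissions.

```python
import os


def deleted_open_files(pid="self"):
    """Return (fd, target) pairs for descriptors of `pid` whose file name is gone."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were scanning
        # The kernel appends " (deleted)" to the path of an unlinked file.
        if target.endswith(" (deleted)"):
            found.append((int(name), target))
    return found
```

Running it against the PID of a logging daemon, for example, would show whether it is still holding a removed log file open, which is the disk-space surprise mentioned above.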
I think Linux / Unix doesn't use the same locking mechanics because they were built from the ground up as multi-user systems, which would expect the possibility of multiple users using the same file, maybe even for different purposes.
Is there an advantage to locking? Well, it could possibly reduce the number of pointers that the OS has to manage, but nowadays the savings are pretty negligible. The biggest advantage I can think of to locking is this: you save some user-visible ambiguity. If user A is running a binary file, and user B deletes it, then the actual file has to stick around until user A's process completes. Yet, if user B or any other user looks on the file system for it, they won't be able to find it, but it will continue to take up space. Not really a huge concern to me.
I think largely it's more a question of backwards compatibility with Windows' file systems.
I think you're too absolute about Windows. Normally, it doesn't allocate swap space for the code part of an executable. Instead, it keeps a lock on the executable and DLLs. If discarded code pages are needed again, they're simply reloaded. But with /SWAPRUN, these pages are kept in swap. This is used for executables on CD or network drives. Hence, Windows doesn't need to lock these files.
For .NET, look at Shadow Copy.
Whether executed code in a file should be locked or not is a design decision, and MS simply decided to lock because it has clear advantages in practice: that way you don't need to know which code in which version is used by which application. This is a major problem with Linux's default behaviour, which most people simply ignore. If system-wide libs are replaced, you can't easily know which apps use code from those libs; most of the time the best you can get is that the package manager knows some users of those libs and restarts them. But that only works for general and well-known things like Postgres and its libs. The more interesting scenarios arise if you develop your own application against some 3rd party libs and those get replaced, because most of the time the package manager simply doesn't know your app. And that's not only a problem of native C code: it can happen with almost everything. Just use httpd with mod_perl and some Perl libs installed using a package manager, and let the package manager update those Perl libs for whatever reason. It won't restart your httpd, simply because it doesn't know the dependencies. There are plenty of examples like this one, simply because any file can potentially contain code in use in memory by any runtime; think of Java, Python and the like.
So there's good reason to hold the opinion that locking files by default may be a good choice. You don't need to agree with those reasons, though.
So what did MS do? They simply created an API which gives the calling application the chance to decide whether files should be locked or not, but they decided that the default value of this API is to provide an exclusive lock to the first calling application. Have a look at the API around CreateFile and its dwShareMode argument. That is the reason why you might not be able to delete files in use by some application: it simply doesn't care about your use case, used the default values, and therefore got an exclusive lock from Windows for the file.
Please don't believe people who tell you that Windows doesn't use ref counting on HANDLEs or doesn't support hard links; that is completely wrong. Almost every API using HANDLEs documents its behaviour regarding ref counting, and you can easily read in almost any article about NTFS that it does indeed support hard links and always did. Since Windows Vista it has supported symlinks as well, and the support for hard links has been improved by providing APIs to enumerate all of them for a given file, etc.
Additionally, you may simply want to have a look at the structures used to describe a file in e.g. Ext4 compared to those of NTFS, which have a lot in common. Both work with the concept of extents, which separates data from attributes like file name, and inodes are pretty much just another name for an older, but similar concept of that. Even Wikipedia lists both file systems in its article.
There's really a lot of FUD around file locking in Windows compared to other OSs on the net, just like about defragmentation. Some of this FUD can be ruled out by simply reading a bit on the Wikipedia.
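To make the dwShareMode point concrete, here is a hedged Python sketch that opens a file through CreateFileW while explicitly permitting other openers to read, write, and request delete access. The function name open_allowing_delete is made up for illustration; the constants are the standard Win32 values. Passing 0 as the share mode instead would refuse every other open until the handle is closed.

```python
import ctypes
import sys

# Standard Win32 constants used by CreateFileW.
GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING = 3
FILE_ATTRIBUTE_NORMAL = 0x00000080
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

# Share everything: other handles may read, write, and ask for delete access.
SHARE_ALL = FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE


def open_allowing_delete(path):
    """Open `path` for reading without blocking other openers (hypothetical helper).

    Returns the raw handle; the caller must pass it to CloseHandle when done.
    """
    if sys.platform != "win32":
        raise NotImplementedError("CreateFileW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = ctypes.c_void_p
    kernel32.CreateFileW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
        ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
    ]
    handle = kernel32.CreateFileW(
        path, GENERIC_READ, SHARE_ALL, None,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None,
    )
    if handle is None or handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```

Whether a concurrent delete then succeeds also depends on how every other handle to the same file was opened, so this only shows the cooperative side of the sharing contract.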
NT variants have the openfiles command, which will show which processes have handles on which files. It does, however, require enabling the system global flag 'maintain objects list'.
openfiles /local /?
tells you how to do this, and also that a performance penalty is incurred by doing so.
Executables are progressively mapped to memory when run. What that means is that portions of the executable are loaded as needed. If the file is swapped out prior to all sections being mapped, it could cause major instability.

Unmovable Files on Windows XP

When I defragment my XP machine I notice that there is a block of "Unmovable Files". Is there a file attribute I can use to make my own files unmovable?
Just to clarify, I want a way to programmatically tell Windows that a file that I create should be unmovable. Is this possible, and if so, how can I do it?
Thanks,
Terry
A lot of system files cannot be moved after the system boots, such as the page file and registry database files.
This utility runs before Windows boots to defragment those files. I have it set to run at every boot, and it works well for me on several machines.
Note that the very first time you boot up with this utility set to run, it may take several minutes to defrag. After that first run though, it finishes in just 3 or 4 seconds.
Edit: To respond to your clarification: that link says Windows has marked the page file and registry files as open for exclusive access. So you should be able to do the same thing with the LockFile API call. However, that's not an attribute of the file itself. You'd have to actually run some background program that locks the file for exclusive access.
There are no file attributes that you can place on your files to mark them as immovable. The only way that a file cannot be moved (I think) during defragmentation is to have some other process have the file open (for read or write, I'm not even sure that you need to have the file open in exclusive mode or not).
Quite frankly, I cannot think of a reason that you'd want your files not to move, unless you have specific requirements about where on the disk platter your files reside. Defragmentation should generally lead to faster disk access, and that seems to be desirable in all cases :-)
This usually means that the file is in use by some process. If you're defragmenting, you'll likely see this with a lot of system files. If the file should legitimately be movable and is stuck (it's being held by a process that runs at startup but shouldn't be, for example), the most useful way of resolving the problem is to remove all permissions on the file, reboot, restore the permissions, and then get rid of the file/run the program that's trying to use it.
I suppose the ugly way is to have an application boot on startup, check every few seconds if defrag is running and if so open the file in exclusive mode.
This is really ugly and I don't recommend it unless there is no cleaner solution.
Terry, the answers all mention ways to prevent files from becoming unmovable during defragmentation. From your question it appears that you are in fact wanting to make your personal files unmovable. Can you please clarify what is appealing about making your files unmovable.
I assume you're using the defragger that comes with Windows. Some commercial ones like Diskeeper can move some of these files (usually system files). You can try their trial versions.
Contig might serve your purpose http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx
I'm relatively certain I ran across some methods/attributes you could access programmatically to do exactly what you want. This was back in NT4 days though, and my memory isn't that good.
For a somewhat more complete solution, try Raxco's PerfectDisk. While it is a commercial product, it does a very good job and supports boot-time defrag of system files. The first defrag takes longer than, say, Diskeeper, but it's a single-pass defragger and supports defragging with very little free space left on the drive. Overall it's a much smarter defrag program than any other I've seen, and it supports systems of any size.
http://www.raxco.com/
First, try to move (or delete) the files in safe mode. If you cannot, try to move (or delete) the files from Linux.
But be careful: if they are Windows system files, Windows will fail to boot.
Some reasons why files are unmovable: the file is too big, the file is open or in use, insufficient security privileges, the file is being accessed by other computers, and many other things.

Resources