I have noticed that many disk-backed storage mechanisms, such as SQLite and IStream (created on a file), perform better when they are opened exclusively.
For SQLite, see the "PRAGMA LOCKING_MODE" section of https://blog.devart.com/increasing-sqlite-performance.html
For IStream, the STGM constants documentation used by SHCreateStreamOnFileEx (https://learn.microsoft.com/zh-cn/windows/win32/stg/stgm-constants) says: "In transacted mode, sharing of STGM_SHARE_DENY_WRITE or STGM_SHARE_EXCLUSIVE can significantly improve performance because they do not require snapshots."
So I wonder whether, on Windows, an ordinary disk file also performs better if I open it for reading with the exclusive share mode. Until now, when opening a file for reading, I have only set its share mode to deny write rather than exclusive, even though no other process or thread tries to read the file at the same time.
Related
I read the following article:
http://surbhistechmusings.blogspot.com/2016/05/neo4j-performance-tuning.html
In it I came across this statement:
We loaded the files into OS cache
It is about loading a file (on local disk) into the OS cache.
Could you explain how to do such loading?
Is this cache held in memory?
And why does it help?
Most of the time this is done simply by reading the files.
Modern OSes fill the unused parts of your RAM with filesystem caches. This happens when apps open files: the first time, data is read from disk, but after that it is (transparently) read from memory. This is why an app you have just closed seems to start much faster the next time you launch it, provided you did not shut down your computer between the two launches.
Many programs, such as Kafka and PostgreSQL, rely on this mechanism.
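To see the effect described above, one option is to time two consecutive reads of the same file. The following is only a minimal sketch: the function names `timed_read` and `compare_reads` are made up for illustration, and the second read is only faster if the file was not already cached before the first one and the OS chose to keep it afterwards.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read the whole file in chunks; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    # buffering=0 skips Python's own buffer, so only the OS cache is involved.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(path):
    """Read the file twice; the first pass also warms the OS file cache."""
    first = timed_read(path)
    second = timed_read(path)
    return first, second
```

Calling `compare_reads` on a large file that has not been touched since boot should show the difference; on a file that was just written or read, both timings will be similar because it is already in the cache.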
I am the author of that blog. This is what I used: https://hoytech.com/vmtouch/
There is extensive literature on it that you can read up on.
The idea behind doing this is that almost every lookup is a random disk read, which is very slow; if the file is in the OS cache, it becomes much faster.
I am currently trying to load files into the page cache, and I want to load locked files, too. The goal is nothing more than proactively keeping a dataset in RAM to reduce loading times within third-party applications.
Shadow copies were my first thought, but unfortunately they seem to have a separate page cache.
So is there any way to work around the exclusive lock mechanism? For example, fetching the file fragment locations on disk, opening the whole disk and reading it directly (which I fear goes through yet another separate page cache anyway)?
Or is there a very different approach to steering the page cache, e.g. some Windows API that can be told to load a specific file into the page cache?
You can access locked files in Windows from a kernel-mode driver, or by using our RawDisk product. But for your task (speeding up DB file access) this won't work well, as the size of the Windows filesystem cache is limited (it won't accommodate GBs of data).
In general, if I were developing a large software project (for a small application the amount of work needed is simply too large), I'd do the following: create a virtual drive backed by in-memory storage, present the DB file to the application via that virtual disk, and flush the drive contents to disk asynchronously on change. All of this should be done in kernel mode (this is where development time grows to 12-15 man-months of work).
In theory the same can be done using one of our Virtual Storage products, but going back into user mode for callback handling would eliminate all that you gain from moving the data into RAM.
I'm writing a disk intensive win32 program. The first time it runs, it runs a lot slower while it scans the user's folders using FindFirstFile()/FindNextFile().
How can I repeat this first time performance without rebooting? Is there any way to force the system to discard everything in its disk cache?
I know that if I were reading a single file, I can disable caching by passing the FILE_FLAG_NO_BUFFERING flag to a call to CreateFile(). But it doesn't seem possible to do this when searching for files.
Have you thought about doing it on a different volume, and dismounting / remounting the volume? That will cause the vast majority of everything to be re-read from disk (though the cache down there won't care).
You need to create enough memory pressure to cause the memory manager and cache manager to discard the previously cached results. For the cache manager, you could try opening a large file (i.e. bigger than physical RAM) with caching enabled and then reading it backwards (to avoid any sequential I/O optimizations). The interactions between the VM and the cache manager are a little more complex and much more dependent on the OS version.
There are also caches on the controller (possibly, but unlikely) and on the disk drive itself (likely). There are specific IOCTLs to flush this cache, but in my experience, disk firmware is untested in this arena.
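The backwards read suggested above could look roughly like the sketch below. It assumes you have already created a file larger than physical RAM (the path is whatever you pass in); the name `iter_chunks_backwards` is only illustrative, and whether this actually evicts the data you care about depends on the OS version, as noted.

```python
import os


def iter_chunks_backwards(path, chunk_size=1024 * 1024):
    """Yield the file's contents chunk by chunk, from the end toward the start."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start at the end of the file
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)  # every read begins before the previous one
            yield f.read(step)


def read_backwards(path, chunk_size=1024 * 1024):
    """Read the whole file backwards and return the number of bytes read."""
    return sum(len(chunk) for chunk in iter_chunks_backwards(path, chunk_size))
```

The file is opened normally (cached), so each chunk passes through the file cache and competes with the entries you want pushed out.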
Check out the Clear function of CacheSet by SysInternals.
You could avoid a physical reboot by using a virtual machine.
I tried all the methods in the answers, including CacheSet, but they did not work for FindFirstFile()/FindNextFile(). Here is what worked:
Scanning files over the network. When scanning a shared drive, it seems that Windows does not cache the folders, so it is slow every time.
The simplest way to make any algorithm slower is to insert calls to Sleep(). This can reveal lots of problems in multi-threaded code, and that is what I was really trying to do.
I have a large file server machine which contains several terabytes of image data that I generally access in chunks. I'm wondering if there is anything special that I can do to hint to the OS that a specific set of documents should be preloaded into memory to improve the access time for that subset of files when they are loaded over a file share.
I can supply a parent directory that contains all of the files that comprise a given chunk before I start to access them.
The first thing that comes to mind is to simply write a service that will iterate through the files in the specified path, load them into process memory and then free the memory in hopes that the OS filesystem cache holds on to them, but I was wondering if there is a more explicit way to do this.
It would save a lot of work if I could re-use the existing file share access paradigm rather than requiring the access to these files to go through a memory caching layer.
The files in question will almost always be accessed in a readonly manner.
I'm working on Windows Server 2003/2008
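The "read everything and throw it away" service described above can be sketched in a few lines. This is an assumption-laden illustration, not a recommendation: `preload_directory` is a made-up name, the script would have to run on the file server itself so that the server's cache is the one being warmed, and nothing forces the OS to keep the data once other I/O competes for memory.

```python
import os


def preload_directory(root, chunk_size=1024 * 1024):
    """Read every file under root once; return (files_read, bytes_read)."""
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb", buffering=0) as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        # The data is discarded here; only the OS cache keeps it.
                        bytes_read += len(chunk)
                files_read += 1
            except OSError:
                # Locked or unreadable files are skipped rather than aborting.
                continue
    return files_read, bytes_read
```

Reading in fixed-size chunks keeps the service's own memory use small no matter how large the image files are.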
Two approaches come to mind:
1) Set the server to be optimized for file serving. This used to be in the properties for file & printer sharing, but seems to have gone away in Windows 2008. This is set via the registry in:
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache=1
HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Size=3
http://technet.microsoft.com/en-us/library/cc784562.aspx as ref.
2) Ensure that both endpoints run Windows Server 2008, or Windows Server 2008 on one side and Vista on the other. SMB 2.0 and the IP stack both received significant performance improvements. This may not be an option due to cost, organizational constraints, or procurement lead time, but I thought I'd mention it.
http://technet.microsoft.com/en-us/library/bb726965.aspx as ref.
What tools or techniques can I use to remove cached file contents to prevent my performance results from being skewed? I believe I need to either completely clear, or selectively remove cached information about file and directory contents.
The application that I'm developing is a specialised compression utility, and is expected to do a lot of work reading and writing files that the operating system hasn't touched recently, and whose disk blocks are unlikely to be cached.
I wish to remove the variability I see in IO time when I repeat the task of profiling different strategies for doing the file processing work.
I'm primarily interested in solutions for Windows XP, as that is my main development machine, but I can also test on Linux, so I am interested in answers for that environment too.
I tried SysInternals CacheSet, but clicking "Clear" doesn't result in a measurable increase (restoration to timing after a cold-boot) in the time to re-read files I've just read a few times.
Use the Sysinternals RAMMap tool.
The Empty / Empty Standby List menu option will clear the Windows file cache.
For Windows XP, you should be able to clear the cache for a specific file by opening the file using CreateFile with the FILE_FLAG_NO_BUFFERING flag and then closing the handle. This isn't documented, and I don't know if it works on later versions of Windows, but I used it long ago when writing test code to compare file compression libraries. I don't recall whether read or write access affected this trick.
A command line utility can be found here
from source:
EmptyStandbyList.exe is a command line tool for Windows (Vista and above) that can empty:
process working sets,
the modified page list,
the standby lists (priorities 0 to 7), or
the priority 0 standby list only.
Usage:
EmptyStandbyList.exe workingsets|modifiedpagelist|standbylist|priority0standbylist
A quick search turns up these options for Linux:
Unmount and mount the partition holding the files
sync && echo 1 > /proc/sys/vm/drop_caches
#include <fcntl.h>
int posix_fadvise(int fd, off_t offset, off_t len, int advice);
with advice option POSIX_FADV_DONTNEED:
The specified data will not be accessed in the near future.
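Python exposes the same call as os.posix_fadvise on platforms that provide it, so the per-file variant can be tried without writing C. The sketch below is a hedged illustration: `drop_file_from_cache` is a made-up name, the advice is only a hint that the kernel may ignore, and the fsync is there because pages that have not yet been written back cannot be dropped.

```python
import os


def drop_file_from_cache(path):
    """Hint that the file's cached pages are no longer needed.

    Returns False if this platform has no posix_fadvise, True otherwise.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # Write back dirty pages first; only clean pages can be discarded.
        os.fsync(fd)
        # offset=0 and len=0 cover the file from the start to its end.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

Unlike drop_caches, this needs no root privileges and touches only the one file, which makes it convenient between benchmark runs.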
I've found one technique (other than rebooting) that seems to work:
Run a few copies of MemAlloc
With each one, allocate large chunks of memory a few times
Use Process Explorer to observe the System Cache size reducing to very low levels
Quit the MemAlloc programs
It isn't selective though. Ideally I'd like to be able to clear the specific portions of memory being used for caching the disk blocks of files that I want to no longer be cached.
For a much better view of the Windows XP filesystem cache, try ATM by Tim Murgent. It shows both the filesystem cache Working Set size and the Standby List size in a more detailed and accurate view. For Windows XP you need the old version 1 of ATM, which is available for download here, since V2 and V3 require Server 2003, Vista, or higher.
You will observe that although Sysinternals CacheSet reduces the "Cache WS Min", the actual data continues to exist in the standby lists, from where it can be used until it has been replaced with something else. To replace it with something else, use a tool such as MemAlloc, flushmem by Chad Austin, or Consume.exe from the Windows Server 2003 Resource Kit Tools.
As the question also asked for Linux, there is a related answer here.
The command line tool vmtouch allows for adding and removing files and directories from the system file cache, amongst other things.
There's a Windows API call, SetSystemFileCacheSize (https://learn.microsoft.com/en-us/windows/desktop/api/memoryapi/nf-memoryapi-setsystemfilecachesize), that can be used to flush the file system cache. It can also be used to limit the cache size to a very small value. It looks perfect for these kinds of tests.
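As a rough sketch, SetSystemFileCacheSize can be called from Python through ctypes. According to the linked documentation, passing (SIZE_T)-1 for both the minimum and maximum size requests a flush, and the calling process must have the SE_INCREASE_QUOTA_NAME privilege enabled. This sketch does not enable that privilege, so expect it to return False unless the privilege is already in place; the wrapper name `flush_system_file_cache` is made up for illustration.

```python
import ctypes
import sys


def flush_system_file_cache():
    """Ask Windows to flush the system file cache; return True on success."""
    if sys.platform != "win32":
        return False  # the API only exists on Windows
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    set_size = kernel32.SetSystemFileCacheSize
    set_size.argtypes = [ctypes.c_size_t, ctypes.c_size_t, wintypes.DWORD]
    set_size.restype = wintypes.BOOL

    # (SIZE_T)-1 for both sizes is the documented way to request a flush.
    all_ones = ctypes.c_size_t(-1).value
    return bool(set_size(all_ones, all_ones, 0))
```

When the call fails, ctypes.get_last_error() gives the Win32 error code, which helps tell a missing privilege apart from other problems.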