Operating system cache: how to load data into it?

I read the following article:
http://surbhistechmusings.blogspot.com/2016/05/neo4j-performance-tuning.html
and came across this sentence:
We loaded the files into OS cache
It is about loading files (on local disk) into the OS cache.
Could you explain how to do such loading?
Also, is such a cache in memory?
And why does it help?

Most of the time this is done simply by reading the files.
Modern OSes fill the unused parts of your RAM with filesystem caches, and this happens whenever apps read files: the first time, the data is read from disk, but after that it is (transparently) served from memory. This is why an app you've just closed seems to start much faster the next time you launch it, provided you did not shut down your computer between the two launches.
A lot of software relies on this mechanism, such as Kafka, PostgreSQL and so on.
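A minimal Python sketch of the "just read the files" approach described above. The helper name `warm_file_cache` and the 1 MiB chunk size are illustrative assumptions. The optional `os.posix_fadvise(..., os.POSIX_FADV_WILLNEED)` hint only exists on some POSIX systems (e.g. Linux), so the code checks for it first. This only gets the data into the cache; the OS can still evict it later under memory pressure.

```python
import os


def warm_file_cache(path: str, chunk_size: int = 1 << 20) -> int:
    """Read a file end to end so its pages end up in the OS filesystem cache.

    The bytes are thrown away; the useful side effect is that the kernel
    keeps the pages it just read in RAM (until memory pressure evicts them).
    Returns the number of bytes read.
    """
    total = 0
    # buffering=0: no Python-level buffer; the kernel page cache is still used.
    with open(path, "rb", buffering=0) as f:
        # Optional hint on systems that support it (e.g. Linux): the whole
        # file (offset 0, length 0 = to the end) will be needed soon.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_WILLNEED)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    return total
```

For example, calling it once on each database store file before starting the database should let the first queries be served from RAM rather than disk, as long as the files fit in free memory.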

I am the author of that blog. This is what I used: https://hoytech.com/vmtouch/
You can read up on it; there is extensive literature around it.
The idea behind doing this is that almost every lookup is a random disk read, which is very slow; however, if the file is in the OS cache, it becomes much, much faster.
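vmtouch itself is a C tool. The following is only a rough Python approximation (an assumption, not vmtouch's code) of what its `-t` ("touch") mode does: map the file and read one byte from every page, so each page is brought into the page cache. The function name `touch_pages` is hypothetical. After this, the random lookups described above hit RAM instead of the disk, for as long as the pages stay cached.

```python
import mmap
import os


def touch_pages(path: str) -> int:
    """Fault every page of a file into memory, roughly what `vmtouch -t` does.

    The file is mapped read-only and one byte is read from each page; the
    access makes the kernel bring that page in (from disk if it was not
    cached yet). Returns the number of pages touched.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped
    page = mmap.PAGESIZE
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for offset in range(0, size, page):
            _ = m[offset]  # the read itself is what brings the page in
    return (size + page - 1) // page
```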

Related

Will Disk File Have Better Performance When Opened Exclusively?

I notice that many disk storage systems, such as SQLite and IStream (created on a file), get better performance when they are opened exclusively.
For SQLite, see the "PRAGMA LOCKING_MODE" section in https://blog.devart.com/increasing-sqlite-performance.html
For IStream, the documentation for SHCreateStreamOnFileEx, at https://learn.microsoft.com/zh-cn/windows/win32/stg/stgm-constants, says: "In transacted mode, sharing of STGM_SHARE_DENY_WRITE or STGM_SHARE_EXCLUSIVE can significantly improve performance because they do not require snapshots."
So I wonder whether, in Windows, a general disk file will get better performance if I open it in read mode together with exclusive share mode. In the past, when opening a file for reading, I only set its share mode to deny write instead of exclusive, even though no other processes or threads try to read the file at the same time.

Monitoring system (Windows): CPU and disk load log

I'm developing applications (services) for Windows and sometimes have problems with performance and resources (especially with MsSql). I need to know which service, application or OS component, developed by me or someone else, was loading the CPU or HDD at some moment in the past.
I want to be able to do this using some kind of stored data (a log), preferably with graphs.
Is there any way to do it?
Perfmon will be your built-in friend!
You can either log current performance counters in a user session, or let a background service track your preselected counters and check them afterwards.
You will find tons of explanations of how to use Perfmon. It has been part of every Windows version since NT4.

Cheating around exclusive access to locked files in Windows (7)

I am currently on a mission to load files into the page cache, and I want to load locked files, too. The goal is nothing more than proactively keeping a dataset in RAM, reducing loading times within third-party applications.
Shadow copies were my first thought, but unfortunately they seem to have separate page caches.
So is there any way of cheating around the exclusive lock mechanism? For example, fetching the file's fragment locations on disk, accessing the whole disk and reading them directly (which I fear uses yet another separate page cache anyway)?
Or is there a completely different approach to controlling the page cache, e.g. some Windows API that can be told to load a specific file into the page cache?
You can access locked files in Windows from a kernel-mode driver, or by using our RawDisk product. But for your task (speeding up DB file access) this won't work well, as the Windows filesystem cache size is limited (it won't accommodate GBs of data).
In general, if I were to develop a large software project (for a small application the amount of work needed is just enormous), I'd do the following: create a virtual drive backed by in-memory storage, present the DB file to the application via that virtual disk, and asynchronously flush the drive contents to disk on change. All of this should be done in kernel mode (this is where development time grows to 12-15 man-months of work).
In theory the same can be done using one of our Virtual Storage products, but going back into user mode for callback handling would eliminate all that you gain from moving the data into RAM.
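The answer describes a kernel-mode virtual drive; the sketch below is only a user-mode Python illustration of the underlying write-back idea: keep the data in RAM, let reads and writes hit memory, and flush to disk asynchronously on change. The class `WriteBackFile` is hypothetical, rewrites the whole file on every flush, and is not crash-safe (unflushed changes are lost if the process dies, and a crash mid-flush leaves a truncated file). It assumes a single writer thread that calls `close()` after its last write.

```python
import threading


class WriteBackFile:
    """Hypothetical in-memory file whose contents are flushed to disk asynchronously.

    Reads and writes only touch a bytearray in RAM; a background thread
    rewrites the whole file at `path` whenever the buffer has changed.
    Use it from a single writer thread and call close() after the last write.
    """

    def __init__(self, path: str):
        self.path = path
        self._data = bytearray()
        self._lock = threading.Lock()
        self._dirty = threading.Event()
        self._closing = False
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def write(self, offset: int, payload: bytes) -> None:
        with self._lock:
            end = offset + len(payload)
            if end > len(self._data):
                self._data.extend(b"\0" * (end - len(self._data)))  # grow, zero-filled
            self._data[offset:end] = payload
        self._dirty.set()  # wake the flusher; the caller never waits for the disk

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])

    def _flush_loop(self) -> None:
        while True:
            self._dirty.wait()
            self._dirty.clear()
            closing = self._closing  # read before the snapshot so no write is missed
            with self._lock:
                snapshot = bytes(self._data)
            # Not crash-safe: "wb" truncates first, so a crash here leaves a partial file.
            with open(self.path, "wb") as f:
                f.write(snapshot)
            if closing:
                return

    def close(self) -> None:
        """Request one last flush and wait for the background thread to finish."""
        self._closing = True
        self._dirty.set()
        self._flusher.join()
```

The answer's point about user-mode callbacks still applies: here every access goes through Python, which is exactly the kind of overhead a kernel-mode implementation avoids.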

How can I force Windows to clear all disk read cache data? [duplicate]

Possible Duplicate:
How to invalidate the file system cache?
I'm writing a disk-intensive Win32 program. The first time it runs, it runs a lot slower while it scans the user's folders using FindFirstFile()/FindNextFile().
How can I repeat this first-time performance without rebooting? Is there any way to force the system to discard everything in its disk cache?
I know that if I were reading a single file, I could disable caching by passing the FILE_FLAG_NO_BUFFERING flag to a call to CreateFile(). But it doesn't seem possible to do this when searching for files.
Have you thought about doing it on a different volume, and dismounting/remounting the volume? That will cause the vast majority of the data to be re-read from disk (though the caches further down, on the drive itself, won't be affected).
You need to create enough memory pressure to cause the memory manager and cache manager to discard the previously cached results. For the cache manager, you could try to open a large file (i.e. bigger than physical RAM) with caching enabled and then read it backwards (to avoid any sequential I/O optimizations). The interactions between the VM and the cache manager are a little more complex and much more dependent on the OS version.
There are also caches on the controller (possible, but unlikely) and on the disk drive itself (likely). There are specific IOCTLs to flush these caches, but in my experience, disk firmware is untested in this arena.
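A minimal Python sketch of the memory-pressure trick from the answer above: read a file larger than physical RAM, with normal cached I/O, from its last block back to its first. The function name `read_backwards` and the block size are assumptions. How effectively this evicts other cached data depends on the OS version, as the answer notes, and it does nothing about the controller or drive caches.

```python
import os


def read_backwards(path: str, block_size: int = 1 << 20) -> int:
    """Read a file from its last block to its first, with normal (cached) I/O.

    On a file bigger than physical RAM this creates memory pressure that can
    push previously cached data out of the filesystem cache; going backwards
    sidesteps the OS's sequential-read optimizations. Returns bytes read.
    """
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        # Start of the last (possibly partial) block; negative for an empty file.
        offset = ((size - 1) // block_size) * block_size
        while offset >= 0:
            f.seek(offset)
            total += len(f.read(block_size))
            offset -= block_size
    return total
```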
Check out the Clear function of CacheSet by Sysinternals.
You could avoid a physical reboot by using a virtual machine.
I tried all the methods in the answers, including CacheSet, but none of them worked for FindFirstFile()/FindNextFile(). Here is what worked:
Scanning files over the network. When scanning a shared drive, it seems that Windows does not cache the folders, so it is slow every time.
The simplest way to make any algorithm slower is to insert calls to Sleep(). This can reveal lots of problems in multi-threaded code, and that is what I was really trying to do.

Let's say I am writing my code and my PC dies: how necessary is it to do a complete scan if I don't want my later source code to be contaminated?

Let's say I am writing a Ruby on Rails program and, while I am editing a file, the machine blue-screens. In this case, how necessary is it to re-scan the whole hard drive if I don't want my future files to be damaged?
Say the OS is deleting a tmp file at the moment my computer crashes, and it still has some pointers to some sectors on the hard drive. If my newly created files happen to be in those sectors, then the next time the OS cleans up files, it may think the "left-over" sectors weren't cleaned last time, clean them again, and damage our source code. (This is especially worrying with Ruby on Rails, where source code can be generated by Rails rather than by us, so if a file is affected we may not know why our Rails server doesn't work.) We can rely on SVN, but what if the file is affected before we check it in?
I think the official answer will be: "always scan the disk after a crash or power outage, checking both the data and the free space, and attempt to fix any bad sectors". But nowadays, with hard drives so big, it could take 2 hours to scan everything, and especially at work, we cannot wait 2 hours in the middle of the day.
Does anyone know whether modern OSes, like XP, Vista, Mac OS, and Linux, keep our source code safe in these cases (for example, when the power cord was loose and the machine didn't shut down properly but simply died at 0% battery)? Do they structure writes to sectors so that at worst a sector is wasted, rather than sectors overlapping?
With a modern journaling file system (ext3/4, NTFS), the only problem would be that a file could be in a "half-written" state. Obviously scanning is not going to help with this (that's what backups are for). The file system metadata itself should not be corrupted. If you are using something like FAT, then yes, you should worry about this.
There's really only 1 issue here.
Is any file that was being written left in some kind of "half-written" state?
The primary cause of this would be the application/editor writing the file when the machine dies halfway through. In this case, the file being written is, well, half done. If it was overwriting the original file, the original file is "gone", and the new one is "half done". If you don't have a backup file, then, well, you have a problem.
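A common way applications avoid the "original gone, new one half done" situation (not something this answer prescribes) is to never overwrite the file in place: write the new contents to a temporary file in the same directory, force it to disk, and then rename it over the original. A minimal Python sketch, with `atomic_write` as a hypothetical helper name. On POSIX systems `os.replace()` is atomic when both paths are on the same file system, so after a crash you find either the old file or the new one; the extra directory `fsync` (POSIX only) makes the rename itself durable.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data`; on POSIX a crash leaves old or new, never half.

    The new contents go to a temporary file in the same directory, are forced
    to disk, and only then is the temporary file renamed over the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX when on the same file system
    except BaseException:
        os.unlink(tmp_path)  # drop the temporary file, the original stays intact
        raise
    # POSIX only: fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```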
As for a file having dangling pointers, references to sectors that were never written, or some such thing: that problem depends on your file system.
The major modern file systems are journaled and "won't allow" this to happen. You may have a "half-written" file, but that's because the application only got to write half of it, not because the file system lost track of a sector pointer.
If you're playing file system games for performance, or whatever (such as using UFS without logging), then you would want to run fsck to clean up the file system metadata.
But if you're using a modern operating system and file system (i.e. anything from the past 5 years), you won't have this problem.
Finally, if you do have version control running, just run "svn status": it will show you any "corrupted" files, since they will have changed.
I found some information on
http://en.wikipedia.org/wiki/Journaling_file_system
Journalized file systems
File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, and keeps track of each operation taking place that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal. Many UNIX file systems provide journaling including ReiserFS, JFS, and Ext3.
In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.
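To make the "writes some information twice" idea concrete, here is a toy user-space Python sketch of a redo journal over fixed-size blocks. Everything in it (`JournaledBlockFile`, the 16-byte blocks, the journal record layout of block index + CRC32 + block contents) is a hypothetical illustration; real journaling file systems do this inside the driver, usually for metadata, with batching and careful write ordering.

```python
import os
import struct
import zlib

BLOCK_SIZE = 16                 # tiny fixed-size blocks, for illustration only
HEADER = struct.Struct("<QI")   # journal record header: block index, CRC32 of the block


class JournaledBlockFile:
    """Toy block store showing the idea of a redo journal.

    Each update is written twice: first as a record appended to the journal
    (and forced to disk), then in place in the data file. If a crash interrupts
    the in-place write, replaying the journal on the next open repairs the block.
    """

    def __init__(self, data_path: str, journal_path: str):
        self.data_path = data_path
        self.journal_path = journal_path
        for p in (data_path, journal_path):
            if not os.path.exists(p):
                open(p, "wb").close()
        self.recover()

    def write_block(self, index: int, block: bytes) -> None:
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        # 1. Journal first: append the record and make it durable.
        with open(self.journal_path, "ab") as j:
            j.write(HEADER.pack(index, zlib.crc32(block)) + block)
            j.flush()
            os.fsync(j.fileno())
        # 2. Then update the block in its proper place.
        self._write_in_place(index, block)
        # 3. Checkpoint: the data file now holds the update, so the record can go.
        self._truncate_journal()

    def read_block(self, index: int) -> bytes:
        with open(self.data_path, "rb") as f:
            f.seek(index * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)

    def recover(self) -> None:
        """Replay every complete, checksum-valid journal record, then clear the journal."""
        with open(self.journal_path, "rb") as j:
            journal = j.read()
        pos, rec_size = 0, HEADER.size + BLOCK_SIZE
        while pos + rec_size <= len(journal):
            index, crc = HEADER.unpack_from(journal, pos)
            block = journal[pos + HEADER.size:pos + rec_size]
            if zlib.crc32(block) != crc:
                break  # torn record from a crash during step 1: data file never touched
            self._write_in_place(index, block)  # redoing an applied write is harmless
            pos += rec_size
        self._truncate_journal()

    def _write_in_place(self, index: int, block: bytes) -> None:
        with open(self.data_path, "r+b") as f:
            f.seek(index * BLOCK_SIZE)
            f.write(block)
            f.flush()
            os.fsync(f.fileno())

    def _truncate_journal(self) -> None:
        with open(self.journal_path, "wb") as j:
            j.flush()
            os.fsync(j.fileno())
```

If the machine dies while a journal record is being appended, the record is incomplete or fails its checksum and recovery ignores it (the data file was never touched); if it dies during the in-place write, the complete record is replayed on the next open, which is the "replaying a portion of the journal" step quoted above.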
