Memory mapping of files vs CreateFile/ReadFile [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
What are the drawbacks (if any) of using a memory-mapped file to read regular-sized files, compared with doing the same with the CreateFile/ReadFile combination?

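With ReadFile/WriteFile you have deterministic error handling semantics: each call returns a status you can check. When you use memory-mapped files, errors surface as a structured exception (EXCEPTION_IN_PAGE_ERROR) raised by whichever memory access touched the failing page.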
In addition, if the memory mapped file has to hit the disk (or even worse, the network) your memory read may take several seconds (or even minutes) to complete. Depending on your application, this can cause unexpected stalls.
If you use ReadFile/WriteFile, you can use the asynchronous (overlapped) variants of the API to control this behavior.
You also get more deterministic performance with ReadFile, especially if your I/O pattern is predictable: memory-mapped I/O is often random, whereas ReadFile is almost always serial (since ReadFile reads at the current file position and advances it).
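A minimal sketch of the two access patterns described above. It uses Python rather than raw Win32, an assumption made so the example stays portable: a plain file read stands in for ReadFile, and the standard `mmap` module (which uses CreateFileMapping/MapViewOfFile on Windows) stands in for a mapped view. It only contrasts the serial, position-advancing read with direct addressing. It does not reproduce the Windows error or stall behavior.

```python
import mmap
import os
import tempfile


def read_sequential(path, chunk_size=4096):
    """Read a file front to back in fixed-size chunks.

    Each f.read() starts at the current file position and advances it,
    the same serial pattern ReadFile follows.
    """
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            chunks.append(chunk)
    return b"".join(chunks)


def read_mapped(path):
    """Read the same file through a read-only memory mapping.

    There is no file position: bytes are addressed directly, and any page
    that is not resident is faulted in when the slice touches it. Mapping
    a zero-length file raises ValueError, so the file must be non-empty.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]


if __name__ == "__main__":
    data = os.urandom(10_000)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        assert read_sequential(tmp.name) == read_mapped(tmp.name) == data
    finally:
        os.remove(tmp.name)
```

Both functions return the same bytes. The difference lies in how the OS gets them into memory: explicit reads into a buffer in one case, page faults on first touch in the other.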

A big advantage of file mapping is that it doesn't influence the system cache. If your application does excessive I/O by means of ReadFile, the system cache will grow, consuming more and more physical memory. If your OS is 32-bit and you have much more than 1 GB of memory, then you're lucky, since on 32-bit Windows the size of the system cache is limited to 1 GB. Otherwise the system cache will consume all available physical memory, and the memory manager will soon start purging pages of other processes to disk, intensifying disk operations instead of reducing them. The effect is especially noticeable on 64-bit Windows, where the cache size is limited only by available physical memory. File mapping, on the other hand, doesn't lead to an overgrown system cache and at the same time doesn't degrade performance.

You'll need more complex code for establishing the file mapping than for just opening and reading. File mapping is intended for random access to a section of file. If you don't need that, just don't bother with file mapping.
Also, if you ever need to port your code to another platform, it will be much easier and faster if you don't use file mapping.
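For the case where file mapping does fit, random access to a section of a file, here is a sketch under the same Python assumption. Both functions return one section of a file: one via seek-and-read, the other by mapping only the window that is needed. The function names are hypothetical. The alignment rule comes from `mmap.ALLOCATIONGRANULARITY`, which mirrors the offset granularity MapViewOfFile requires. The extra bookkeeping in the second function illustrates the "more complex code" the answer warns about.

```python
import mmap
import os
import tempfile


def read_section_seek(path, offset, length):
    """Random access the ReadFile way: move the file position, then read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_section_mapped(path, offset, length):
    """Random access by mapping only the part of the file that is needed.

    The mapping offset must be a multiple of mmap.ALLOCATIONGRANULARITY
    (64 KiB on Windows, usually the page size on POSIX), so round it down
    and skip the extra leading bytes. The section must lie inside the file.
    """
    base = offset - (offset % mmap.ALLOCATIONGRANULARITY)
    delta = offset - base
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as m:
            return m[delta:delta + length]


if __name__ == "__main__":
    data = os.urandom(200_000)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        section = read_section_mapped(tmp.name, 70_000, 1_000)
        assert section == read_section_seek(tmp.name, 70_000, 1_000)
    finally:
        os.remove(tmp.name)
```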


Does GetWriteWatch work with Memory-Mapped Files?

I'm working with memory-mapped files (MMFs) holding very large datasets (depending on the input), where each file is ~50 GB and around 40 files are open at the same time. This varies: I can also have smaller or larger files, so the system should scale itself.
The MMF acts as a backing buffer, so as long as I have enough free memory, no paging should occur. The problem is that the Windows memory manager and my application operate autonomously. Under good conditions everything works fine, but the memory manager is obviously too slow once I enter low-memory conditions: the memory is full and the system starts to page (which is good), but I'm still allocating memory, because I don't get any information about the paging.
In the end I reach a state where the system stalls: the memory manager is paging and I'm still allocating.
So I've come to the point where I need to advise the memory manager: check the current memory conditions and trigger the paging myself. For that reason I wanted to use GetWriteWatch to find the memory regions I can flush.
Interestingly, GetWriteWatch does not work in my situation: it returns -1 without filling in the structures. So my question is: does GetWriteWatch work with MMFs?

Does GetWriteWatch work with Memory-Mapped Files?
I don't think so.
GetWriteWatch only works on memory allocated with the VirtualAlloc function using the MEM_WRITE_WATCH flag.
File mappings are mapped with the MapViewOfFile* functions, which do not accept this flag.

cache, does it make sense to have one? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
What I mean is: memory sizes keep getting larger, and operating systems and compilers keep getting smarter. So my question is: if I have to read data from a file, does it make sense to implement a cache? Isn't the operating system already keeping that data in memory?
Edit: OK, to be more practical: I have 1 TB of data spread across several files, and 180 GB of RAM. I need to read some of this data more than once. Does it make sense to implement a cache such as an LRU, or, when I read from a file (using C++), will the operating system be smart enough to have kept this data somewhere, so that it is read from memory instead of from disk?

It depends on the language and library you are using, but it is highly likely that you are actually already caching things in memory.
In general, you want to cache things that you are currently managing until you are ready to commit the updated data buffer back into the file on the disk simply because disk I/O is a very slow operation.
On files that are very big, you may not want to cache the entire data due to memory constraints, but you would still want to cache the block of data that you are currently managing.
Here's a general diagram of different means of storing data from the fastest (most expensive) to the slowest (least expensive):
CPU data registers -> CPU Cache -> RAM -> SSD -> Hard Disk -> keyboard, etc.
HowStuffWorks.com has a pretty good illustration of this hierarchy and the entire article itself is actually a pretty good read as well: http://computer.howstuffworks.com/computer-memory4.htm
EDIT: There is also another similar discussion here that you may want to check out as well.
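If you do decide on an application-level cache like the LRU the question mentions, a minimal sketch might look like the following. The question uses C++; Python is used here only to keep every example in one language. `BlockCache`, its block size, and its capacity are hypothetical. The OS page cache still sits underneath it, so a miss here is not necessarily a disk read.

```python
import os
import tempfile
from collections import OrderedDict


class BlockCache:
    """A small application-level LRU cache of fixed-size file blocks."""

    def __init__(self, f, block_size=4096, capacity=256):
        self.f = f                   # binary file object opened for reading
        self.block_size = block_size
        self.capacity = capacity
        self.blocks = OrderedDict()  # block index -> bytes, least recent first
        self.misses = 0              # reads that had to go to the file

    def _block(self, index):
        if index in self.blocks:
            self.blocks.move_to_end(index)  # mark as most recently used
            return self.blocks[index]
        self.misses += 1
        self.f.seek(index * self.block_size)
        data = self.f.read(self.block_size)
        self.blocks[index] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`, block by block."""
        out = bytearray()
        while length > 0:
            index, start = divmod(offset, self.block_size)
            piece = self._block(index)[start:start + length]
            if not piece:  # reached end of file
                break
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)


if __name__ == "__main__":
    data = os.urandom(20_000)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    with open(tmp.name, "rb") as f:
        cache = BlockCache(f, block_size=4096, capacity=2)
        assert cache.read(4000, 200) == data[4000:4200]  # spans blocks 0 and 1
        assert cache.read(4050, 50) == data[4050:4100]   # served from cache
        print("file reads:", cache.misses)               # prints "file reads: 2"
    os.remove(tmp.name)
```

Whether this pays off for 1 TB of data on 180 GB of RAM depends on the access pattern. A cache like this mainly helps when you can predict reuse better than the OS's generic policy can.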

Transferring 1-2 megabytes of data through regular files in Windows - is it slower than through RAM?

I'm passing 1-2 MB of data from one process to another, using a plain old file. Is it significantly slower than going through RAM entirely?
Before answering yes, please keep in mind that in modern Linux at least, when writing a file it is actually written to RAM, and then a daemon syncs the data to disk from time to time. So in that way, if process A writes 1-2 MB into a file, then process B reads them within 1-2 seconds, process B would simply read the cached memory. It gets even better than that, because in Linux, there is a grace period of a few seconds before a new file is written to the hard disk, so if the file is deleted, it's not written at all to the hard disk. This makes passing data through files as fast as passing them through RAM.
Now that is Linux, is it so in Windows?
Edit: Just to lay out some assumptions:
The OS is reasonably new - Windows XP or newer for desktops, Windows Server 2003 or newer for servers.
The file is significantly smaller than available RAM - let's say less than 1% of available RAM.
The file is read and deleted a few seconds after it has been written.

When you read or write a file, Windows will often keep some or all of the file resident in memory (on the Standby List), so that if it is needed again, it is just a soft page fault to map it into the process's memory space.
The algorithm for which pages of a file will be kept around (and for how long) isn't publicly documented. So the short answer is that if you are lucky, some or all of it may still be in memory. You can use the Sysinternals tool VMMap to see how much of your file is still in memory during testing.
If you want to increase your chances of the data remaining resident, then you should use Memory Mapped Files to pass the data between the two processes.
Good reading on Windows memory management:
Mysteries of Windows Memory Management Revealed

You can use FILE_ATTRIBUTE_TEMPORARY to hint that this data is never needed on disk:
A file that is being used for temporary storage. File systems avoid writing data back to mass storage if sufficient cache memory is available, because typically, an application deletes a temporary file after the handle is closed. In that scenario, the system can entirely avoid writing the data. Otherwise, the data is written after the handle is closed.
(i.e. you need to pass that flag to CreateFile, and call DeleteFile immediately after closing the handle).
But even if the file remains cached, you still have to copy the data twice: from process A to the cache (the WriteFile call), and from the cache to process B (the ReadFile call).
Using memory mapped files (MMF, as josh poley already suggested) has the primary advantage of avoiding one copy: the same physical memory pages are mapped into both processes.
An MMF can also be backed by the paging file instead of a regular file, which basically means that it stays in memory unless paging becomes necessary.
The major downside is that you can't easily grow the memory mapping to match changing demands; you are stuck with the initial size.
Whether that matters for a 1-2 MB data transfer depends mostly on how you acquire the data and what you do with it; in many scenarios the additional copy doesn't really matter.
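A small Python sketch of the single-copy property: two mappings of the same file see each other's writes directly, with no intermediate read into a private buffer. Both views live in one process here for simplicity. That is an assumption: the answer's scenario uses two processes. A file-backed mapping is used instead of a paging-file-backed one because sharing a paging-file-backed section between processes needs platform-specific code (a `tagname` on Windows).

```python
import mmap
import os
import tempfile


def shared_view_demo(size=4096):
    """Write through one view of a file and read it back through another.

    Both views map the same file, so the OS backs them with the same
    physical pages: the reader sees the writer's bytes without a
    ReadFile-style copy into a separate buffer.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # a view cannot extend past the end of the file
        with mmap.mmap(fd, size, access=mmap.ACCESS_WRITE) as writer, \
                mmap.mmap(fd, size, access=mmap.ACCESS_READ) as reader:
            writer[:5] = b"hello"
            return reader[:5]
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(shared_view_demo())  # prints b'hello'
```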

How does Windows execute a Win32 process? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
So when you open a PE (.exe) or call CreateProcess (from the Win32 API), the following procedure is followed:
The file header, the image sections, and also the DLLs the exe links against are mapped into the process's own virtual memory.
The CPU begins execution at the program's start address.
So here comes my question: all the instructions in the PE image use addresses relative to the process's own private address space (virtual memory), which begins at 0. Also, sometimes this memory is paged out by Windows into secondary memory (HDD). How does the CPU find out the real physical address in RAM? Also, how does Windows switch from one thread to another by priority to support multi-threading, and send idle instructions when the CPU is not fully used? After all these discoveries I'm starting to think that the machine code stored in PE files isn't really executed directly by the CPU, but instead in some Windows-managed environment. Can this be true, and if so, doesn't this slow down execution?
EDIT: OK, so the question should be rewritten as follows: "Are Windows processes executed in a core layout program or directly on the CPU?". I got the answer I wanted, so the question is solved anyway.

A complete answer would fill an entire book, but in short:
From a high-level view, finding the physical address is done by dividing the address by some constant (typically 4096), converting the address to its corresponding "page", and looking up that page in a table, which points to the index of the real, physical memory page, if one exists. Some or all of that may be done automatically by the CPU without anyone noticing, depending on the situation.
If a page does not exist, the OS will have to read the page from disk prior to letting the code that tried to access the page continue -- and not necessarily always into the same physical page.
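The single-level lookup described above can be modeled in a few lines of Python. This is a toy. The page table is a plain dict with hypothetical frame numbers, and a missing entry raises an exception where a real OS would service the page fault. Note how contiguous virtual pages can land in scattered physical frames.

```python
PAGE_SIZE = 4096  # the "typically 4096" divisor from the answer


class PageFault(Exception):
    """Raised when a virtual page has no physical frame (not resident)."""


def translate(vaddr, page_table):
    """Map a virtual address to a physical one through a one-level table."""
    page, offset = divmod(vaddr, PAGE_SIZE)  # which page, and where inside it
    frame = page_table.get(page)
    if frame is None:
        # A real OS would load the page into some free frame (not
        # necessarily the one it had before), update the table, and retry.
        raise PageFault(page)
    return frame * PAGE_SIZE + offset


# Hypothetical table: virtual pages 0-2 are contiguous, yet their physical
# frames are scattered; virtual page 3 is not resident.
PAGE_TABLE = {0: 7, 1: 2, 2: 9}

if __name__ == "__main__":
    print(hex(translate(0x1234, PAGE_TABLE)))  # page 1 -> frame 2: prints 0x2234
```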
In reality it's much more complex, as the table is really an entire hierarchy of tables, and in addition there is a small cache (typically around 50 entries) inside the CPU to do this task automatically for recently accessed pages, without firing an interrupt and running special kernel code.
So, depending on the situation, things might happen fully automatically and invisibly, or the OS kernel may be called, traversing an entire hierarchy of tables, and finally resorting to loading data from disk (and I haven't even considered that pages may have protections that prevent them from being accessed, or protections that will cause them being copied when written to, etc. etc.).
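A slightly more realistic toy adds the two things the previous paragraphs mention: a hierarchy of tables and a small translation cache (TLB). It assumes the classic 32-bit x86 split: 10-bit directory index, 10-bit table index, 12-bit offset. 64-bit Windows uses four levels, and real TLBs do not necessarily evict in LRU order, so treat `MMU` and its eviction policy as hypothetical. The 50-entry default follows the answer's "typically around 50 entries".

```python
from collections import OrderedDict

OFFSET_BITS = 12  # 4 KiB pages
INDEX_BITS = 10   # classic 32-bit x86 (non-PAE): 10 + 10 + 12 bits


class MMU:
    """Toy two-level page-table walker with a small LRU translation cache."""

    def __init__(self, directory, tlb_entries=50):
        self.directory = directory  # dir index -> {table index -> frame}
        self.tlb = OrderedDict()    # virtual page -> frame, least recent first
        self.tlb_entries = tlb_entries
        self.walks = 0              # number of full table walks performed

    def translate(self, vaddr):
        vpage = vaddr >> OFFSET_BITS
        offset = vaddr & ((1 << OFFSET_BITS) - 1)
        if vpage in self.tlb:  # TLB hit: no table walk at all
            self.tlb.move_to_end(vpage)
            return (self.tlb[vpage] << OFFSET_BITS) | offset
        self.walks += 1  # TLB miss: walk the directory, then the page table
        dir_index = vpage >> INDEX_BITS
        table_index = vpage & ((1 << INDEX_BITS) - 1)
        table = self.directory.get(dir_index)
        if table is None or table_index not in table:
            # Here the OS would take over and handle the page fault.
            raise LookupError(f"page fault at {vaddr:#x}")
        frame = table[table_index]
        self.tlb[vpage] = frame
        if len(self.tlb) > self.tlb_entries:
            self.tlb.popitem(last=False)  # evict the least recently used entry
        return (frame << OFFSET_BITS) | offset


if __name__ == "__main__":
    # Hypothetical mapping: directory entry 1, table entry 3 -> frame 42.
    mmu = MMU({1: {3: 42}})
    vaddr = (1 << (OFFSET_BITS + INDEX_BITS)) | (3 << OFFSET_BITS) | 0x10
    print(hex(mmu.translate(vaddr)), mmu.walks)         # prints 0x2a010 1
    print(hex(mmu.translate(vaddr + 0x20)), mmu.walks)  # prints 0x2a030 1
```

The second translation hits the TLB, so the walk counter stays at 1. This is the "fully automatic and invisible" path the answer describes.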
Multi-threading is "relatively simple" in comparison. It's done by having a timer fire an interrupt periodically (under Windows typically around every 16 milliseconds, but this can be adjusted) and running some code (the "scheduler") inside the interrupt handler, which decides whether to return to the current thread or switch to another thread's context and run that one instead.
In the particular case of Windows, the scheduler will always satisfy highest priority tasks first, and only consider lower priority tasks when no non-blocked higher priority tasks are left.
If no other tasks are running, the idle task (which has the lowest priority) runs. The idle task may perform tasks such as zeroing reclaimed memory pages "for free", or it may throttle down the CPU (or both).
Further, when a thread blocks (e.g. when reading a file or a socket), the scheduler runs even without a timer interrupt. This ensures that the CPU can be used for something useful during the time the blocked thread can't do anything.
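The scheduling rules above can be sketched as a toy model: strict priority, round robin among equal priorities, idle when nothing is runnable, and immediate rescheduling when a thread blocks. Everything here is a hypothetical simplification. The timer interrupt is just a method call, idle is modeled as priority 0, and priority boosts, quantum lengths, and multiple processors are ignored.

```python
from collections import deque


class Scheduler:
    """Toy strict-priority scheduler with round robin among equal priorities."""

    def __init__(self):
        self.ready = {}  # priority -> deque of runnable thread names

    def make_ready(self, name, priority):
        self.ready.setdefault(priority, deque()).append(name)

    def pick_next(self):
        # Serve the highest priority that has a runnable thread.
        for priority in sorted(self.ready, reverse=True):
            queue = self.ready[priority]
            if queue:
                return queue.popleft(), priority
        return "idle", 0  # nothing runnable: run the idle thread

    def on_quantum_end(self, name, priority):
        # Timer interrupt: requeue the running thread at the back of its
        # level so equal-priority peers get a turn, then pick again.
        if name != "idle":
            self.make_ready(name, priority)
        return self.pick_next()

    def on_block(self):
        # The running thread blocked (e.g. on file I/O): it is not requeued,
        # and the scheduler runs right away instead of waiting for the timer.
        return self.pick_next()


if __name__ == "__main__":
    s = Scheduler()
    s.make_ready("A", 8)
    s.make_ready("B", 8)
    s.make_ready("low", 4)
    current = s.pick_next()               # ('A', 8): highest priority first
    current = s.on_quantum_end(*current)  # ('B', 8): A goes to the back
    current = s.on_block()                # ('A', 8): B blocked, not requeued
    current = s.on_block()                # ('low', 4): no priority-8 thread left
    print(s.on_block())                   # prints ('idle', 0)
```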

Is Virtual Memory still relevant in today's world of inexpensive RAM? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 10 years ago.
Virtual memory was introduced to help run more programs with limited memory. But in today's environment of inexpensive RAM, is it still relevant?
Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve performance and program response times?
Is there any essential requirement for virtual memory in Windows apart from running more programs, as stated above? Something internal to Windows that we don't know about?

Some pedantry: virtual memory is not just the pagefile. The term encompasses a whole range of techniques that give the program the illusion that it has one single contiguous address space, some of which is the program's code, some of which is data, and some of which are DLLs or memory-mapped files.
So to your lead-in question: yes, virtual memory is required. It's what makes modern OS's work.

Don't disable virtual memory. 2GB is not nearly enough to even consider this. Regardless, you should always keep virtual memory on even if you do have enough since it will only ever be used when you actually need it. Much better to be safe than sorry since NOT having it active means you simply hit a wall, while having it active means your computer starts swapping to the hard drive but continues to run.

Yes, because it's the basis of all on-demand paging that occurs in a modern operating system, not just Windows.
Windows will always use all of your memory, if not for applications then for caching whatever you read from your hard drive. Because if that memory is not used, then you're throwing your investment in memory away. Basically, Windows uses your RAM as a big fat cache to your hard drives. And this happens all the time, as the relevant pages are only brought into main memory when you address the content of that page.

The question is really what is the use of a pagefile considering how much memory modern computers have and what's going on under the hood in the OS.
It's common for the Windows Task Manager to show not much physical memory being used while you're having many page faults. Win32 will never allocate all its physical memory; it always saves some for new resource needs. With a big pagefile vs. a small pagefile, Win32 will be slower to allocate physical memory to a process.
For a few days now I've been using a very small pagefile (200 MB fixed) in Vista with 3GB of addressable physical memory. I have had no crashes or problems. Haven't tried things like large video editing or many different processes open at once. I wouldn't recommend no pagefile since the OS can never shuffle pages around in physical memory leading to the development of holes. A large pagefile is fail-safe for people who wouldn't know how to manually increase the pagefile if a low memory warning pops up or the OS crashes.
Some points:
The kernel will use some of the physical memory, and this will be shared through VM mapping with all other processes. Other processes will be in the remaining physical memory. VM makes each process see a 4GB memory space, with the OS in the upper 2GB. Each process will need much less than 4GB of physical memory; this amount is its committed memory requirement. When programming, a malloc or new obtains memory that is typically not yet backed by physical pages; things like the first write to the memory bring those pages in. Some memory is immediately committed by the OS for each process.

Your question is really about using a page file, not virtual memory, as kdgregory said. Probably the most important use for virtual memory is letting the OS protect one process's memory from another process's memory, while still giving each process the illusion of a contiguous, flat virtual address space. The actual physical addresses can and will become fragmented, but the virtual addresses will appear contiguous.
Yes, virtual memory is vital. The page file, maybe not.

Grrr. Disk space is probably always going to be cheaper than RAM. One of my lab computers has 512MB of RAM. That used to be enough when I got it, but now it has slowed to a crawl swapping and I need to put more RAM in it. I am not running more software programs now than I was then, but they have all gotten more bloated, and they often spawn more "daemon" programs that just sit there doing nothing but wait for some event and use up memory. I look at my process list and the "in-memory" column for the file explorer is 40MB. For Firefox it's 162MB. Java's "update scheduler" jusched.exe uses another 3.6MB. And that's just the physical-memory, these numbers don't include the swap space.
So it's really important to save the quicker, more expensive memory for what can't be swapped out. I can spare tens of GB on my hard drive for swap space.
Memory is seen as cheap enough that the OS and many programs don't try to optimize any more. On the one hand it's great because it makes programs more maintainable and debuggable and quicker to develop. But I hate having to keep putting in more RAM into my computer.

A good explanation at
http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
To optimally size your paging file you should start all the applications you run at the same time, load typical data sets, and then note the commit charge peak (or look at this value after a period of time where you know maximum load was attained). Set the paging file minimum to be that value minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for). If you want to have some breathing room for potentially large commit demands, set the maximum to double that number.
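The quoted sizing rule reduces to a little arithmetic, sketched here as a hypothetical helper. The inputs are measurements you take yourself. In the usage lines, the 16 MB crash-dump figure is the mini-dump minimum mentioned in a later answer, and the RAM and peak values are invented examples.

```python
def paging_file_sizes(peak_commit_mb, ram_mb, crash_dump_min_mb):
    """Return (minimum, maximum) paging-file sizes in MB using the quoted rule.

    peak_commit_mb: commit charge peak observed under your heaviest load.
    crash_dump_min_mb: size needed for the crash-dump type you configured;
    the caller has to supply it, this function cannot know it.
    """
    minimum = peak_commit_mb - ram_mb
    if minimum <= 0:
        # The rule says "if the value is negative"; treating zero the same
        # way is an assumption of this sketch.
        minimum = crash_dump_min_mb
    maximum = 2 * minimum  # "set the maximum to double that number"
    return minimum, maximum


if __name__ == "__main__":
    print(paging_file_sizes(20_000, 16_384, 16))  # prints (3616, 7232)
    print(paging_file_sizes(12_000, 16_384, 16))  # prints (16, 32)
```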

Virtual memory is much more than simply an extension of RAM. In reality, virtual memory is a system that virtualizes access to physical memory. Applications are presented with a consistent environment that is completely independent of RAM size. This offers a number of important advantages quite apart from the increased memory availability. Virtual memory is an integral part of the OS and cannot possibly be disabled.
The pagefile is NOT virtual memory. Many sources have claimed this, including some Microsoft articles. But it is wrong. You can disable the pagefile (not recommended) but this will not disable virtual memory.
Virtual memory has been used in large systems for some 40 years now, and it is not going away anytime soon. The advantages are just too great. If virtual memory were eliminated, all current 32-bit applications (and 64-bit ones as well) would become obsolete.
Larry Miller
Microsoft MCSA

Virtual memory is a safety net for situations when there is not enough RAM available for all running applications. This was very common some time ago; today, when you can have large amounts of system RAM, it is less so.
Some say to leave the page file alone and let Windows manage it. Some people say that even if you have a lot of RAM, keeping a big pagefile cannot possibly hurt, because it will not be used. That is not true, since Windows does pre-emptive paging to prepare for spikes in memory demand. If that demand never comes, this is just wasted HDD activity, and we all know that the HDD is the slowest component of any system. Pre-emptive paging with enough RAM is just pointless, and the only thing it does is slow down any other disk activity happening at the same time, not to mention the additional disk wear. Plus, a big page file means gigabytes of locked disk space.
A lot of people point to Mark Russinovich's article to back up their strong belief that the page file should not be disabled under any circumstances, and that so many clever people at Microsoft have thought it through so thoroughly that we, little developers, should never question the default Windows policy on page file size. But even Russinovich himself writes:
Set the paging file minimum to be that value (Peak Commit Charge) minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for).
So if you have lots of RAM and your peak commit charge never exceeds 50% of your RAM, even when you open all your apps at once and then some, there is no need to have a page file at all. In those situations, 99.99% of the time you will never need more memory than your RAM.
Now, I am not advocating disabling the page file, but making it the size of your RAM or more is just a waste of space and unnecessary activity that can slow down something else. The page file gives you a safety net in those rare (with plenty of RAM) situations when the system does need more memory, and it keeps the system from running out of memory, which would most likely make it unstable and unusable.
The only real need for a page file is kernel dumps. If you need full kernel dumps, you need at least 400 MB of paging file, but if you are happy with mini dumps, the minimum is 16 MB. So, to get the best of both worlds, which is:
virtually no page file
safety net of virtual memory
I would suggest to configure Windows for mini kernel dumps, set minimum page file size to 16 MB and maximum to whatever you want. This way page file would be practically unused but would automatically expand after first out of memory error to prevent your system from being unusable. If you happen to have at least one out of memory issue you should of course reconsider your minimum size. If you really want to be safe make page file min. size 1 GB. For servers though you should be more careful.

Unfortunately, it is still needed because the windows operating system has a tendency to 'overcache'.
Plus, as stated above, 2GB isn't really enough to consider turning it off. Heck, I probably wouldn't turn it off until I had 8GB or more.
G-Man

Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve the performance and program response times?
I'm not totally sure about other platforms, but I had a Linux machine where the swap space had been accidentally disabled. When a process used all available memory, the machine basically froze for 5 minutes, the load average went to absurd numbers, and the kernel OOM killer kicked in and terminated several processes. Re-enabling swap fixed this entirely.
I never experienced any unnecessary swapping to disc - it only happened when I used all the available memory. Modern OS's (even 5-10 year old Linux distros) deal with swap-space quite intelligently, and only use it when required.
You can probably get by without swap space, since it's quite rare to reach 4GB of memory usage with a single process. With a 64-bit OS and, say, 8GB of RAM it's even rarer... but there's really no point in disabling swap space: you don't gain much (if anything), and when you run out of physical memory without it, bad things happen.
Basically - any half-decent OS should only use disc-swap (or virtual-memory) when required. Disabling swap only stops the OS being able to fall back on it, which causes the OOM killer to strike (and thus data-loss when processes are terminated).
