JMeter PerfMon Listener - Validating the Swap counters

I have a DB performance test and I am reading OS counters from the SQL Server box. The SWAP metric, which uses the Page In and Page Out counters, is showing huge values. As far as I know, PageIn should be < 100.
Can somebody help me understand these counters, as they go beyond 500,000?

You need to be concerned only if you have high page-out values, because page-ins also happen whenever code or data is demand-loaded from disk during normal operation:
If an application references a page and it is already in RAM, it is served straight from memory with no disk overhead. If it references a page that is currently stored on the hard disk and has to be read back into RAM, a "page-in" occurs. A "page-out" occurs when the system has to write a page from RAM out to the hard disk to free memory. Sustained page-outs slow the operation of the system down because they mean the system is short of RAM, and the paged-out data will later have to be read back in from disk.

I don't know where you got this "knowledge" that PageIn should be < 100. As per Microsoft's description of the metric:
Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\Pages Input/sec to the value of Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation.
Given that the page size in Windows is 4 KB, why do you expect page input to be limited to 100 pages (400 kilobytes) per second?
Try correlating JMeter's output with Windows Performance Monitor and you should see similar numbers (even equal ones if you use the same zoom levels).
Check out How to Monitor Your Server Health & Performance During a JMeter Load Test for more information on collecting OS metrics during JMeter tests.
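For a quick cross-check outside JMeter, the same counter can be sampled programmatically through the PDH API, which is what PerfMon itself sits on. A minimal sketch (my own illustration, not from the thread), assuming Windows Vista or later for PdhAddEnglishCounterW and linking against pdh.lib:

```cpp
// Sample \Memory\Pages Input/sec once a second, the way PerfMon does.
#include <windows.h>
#include <pdh.h>      // link with pdh.lib
#include <cstdio>

int main() {
    PDH_HQUERY query = nullptr;
    PDH_HCOUNTER counter = nullptr;
    if (PdhOpenQuery(nullptr, 0, &query) != ERROR_SUCCESS) return 1;
    if (PdhAddEnglishCounterW(query, L"\\Memory\\Pages Input/sec",
                              0, &counter) != ERROR_SUCCESS) return 1;

    PdhCollectQueryData(query);            // rate counters need two samples
    for (int i = 0; i < 10; ++i) {
        Sleep(1000);
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE value;
        if (PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE,
                                        nullptr, &value) == ERROR_SUCCESS)
            printf("Pages Input/sec: %.0f\n", value.doubleValue);
    }
    PdhCloseQuery(query);
    return 0;
}
```

Run alongside a load test, the printed values should track the PerfMon graph (and JMeter's PerfMon listener) sample for sample.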

Related

Used memory of my single-page application increases as time goes on

We have a single-page application which runs well at the beginning but slows down sharply as time goes on. I am trying to investigate the root cause.
I used Chrome DevTools to record the timeline for the initial page load and for a typical user operation. The JS Heap graph shows that memory usage is OK: it goes up and down periodically (due to garbage collection by the browser, maybe).
However, when I checked the Chrome Task Manager, I found that my page used 60 MB of memory initially, but after 1 hour (and some user operations) it had grown to 160 MB, while the JavaScript Memory column stayed stable. Later I observed that the memory usage never goes down.
I guess maybe there is a memory leak in our JavaScript code? But the JS heap seems OK. Does Chrome hold on to that memory and perhaps release it in the future (when, say, another process needs more memory)?
I googled but could not find an explanation for this. Could anybody help? Thanks.
It is because of an interval that is not cleared. It keeps calling a function too frequently.

Performance of the PAGE_WRITECOPY Windows internal memory mechanism

I need to implement an undo-redo feature in an application which reads a project file and makes a sequence of separate transactions changing the project's content. The project can be hundreds of MB large.
My idea is to implement the undo-redo on the basis of the copy-on-write (PAGE_WRITECOPY) memory mechanism. I assume that after the end of a transaction the application can access both the changed and the unchanged pages, compare them, identify the changed records, store the original record states in the dedicated undo stack, free the copied pages that are no longer needed, and restore the copy-on-write protection of the changed pages. I have two questions (see the sketch after them):
How and where can I find the addresses of the original (unchanged) pages?
What performance can I expect from such an implementation? The average size of the project's records is circa 100 bytes, so if a transaction changes 3000 records, that may involve changes to 100 or more 4K physical pages. Is copy-on-write memory performant enough to support routinely changing hundreds of physical pages on each step?
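A minimal sketch of one answer to the first question, under stated assumptions ("project.dat" is a hypothetical file name, error handling is mostly omitted, and this is an illustration rather than the asker's implementation): map the same file mapping object twice, once as a copy-on-write view that the transaction writes to, and once as a read-only view that keeps exposing the original content, then diff the two views page by page.

```cpp
#include <windows.h>
#include <cstdio>
#include <cstring>

int main() {
    HANDLE file = CreateFileW(L"project.dat", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, 0, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;
    DWORD size = GetFileSize(file, nullptr);

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_WRITECOPY,
                                        0, 0, nullptr);
    // Copy-on-write view: a write allocates a private copy of the touched page.
    char* working = (char*)MapViewOfFile(mapping, FILE_MAP_COPY, 0, 0, 0);
    // Read-only view: continues to show the original, unchanged file content.
    char* original = (char*)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);

    working[100] = 'X';   // stand-in for a transaction changing a record

    const DWORD kPageSize = 4096;
    for (DWORD off = 0; off < size; off += kPageSize) {
        DWORD n = (size - off < kPageSize) ? size - off : kPageSize;
        if (memcmp(working + off, original + off, n) != 0)
            printf("page at offset %lu changed\n", off); // harvest undo data here
    }

    UnmapViewOfFile(original);
    UnmapViewOfFile(working);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```

On the second question: copy-on-write faults are soft faults (a memory copy, no disk I/O), so a few hundred copied pages per transaction is usually cheap, but only measurement on the real workload can settle it.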

Why are memory-mapped files always mapped at page boundaries?

This is my first question here; I'm not sure if it is off-topic.
While self-studying, I have found the following statement regarding Operating Systems:
Operating systems that allow memory-mapped files always require files to be mapped at page boundaries. For example, with 4-KB pages, a file can be mapped in starting at virtual address 4096, but not starting at virtual address 5000.
This statement is explained in the following way:
If a file could be mapped into the middle of a page, a single virtual page would need two partial pages on disk to map it. The first page, in particular, would be mapped onto a scratch page and also onto a file page. Handling a page fault for it would be a complex and expensive operation, requiring copying of data. Also, there would be no way to trap references to unused parts of pages. For these reasons, it is avoided.
I would like to ask for help to understand this answer. Particularly, what does it mean to say that "a single virtual page would need two partial pages on disk to map it"? From what I found about memory-mapped files, virtual pages are mapped to files on disk, and not to a paging file. Is this what is meant by "partial page"?
Also, what is meant by "scratch page" here? I've tried to look up this term on books (Tanenbaum's "Modern Operating Systems" and "Structured Computer Organization") and on the Web, but haven't found it.
First of all, when reading books and documentation, always try to look critically at what you see. Authors sometimes use language like "there is no other way" just to promote the solution they are describing. Other ways are always possible.
Now to the matter. Modern operating systems always have a disk location for every allocated memory page. This makes sense: once it becomes necessary to evict a page from memory, it is already clear where to write it if it is "dirty", or it can simply be discarded if it has not been modified. This strategy is widely accepted, although alternative policies are possible too.
The disk location can be either the paging file or a memory-mapped file. The most common use of memory-mapped files is executables and DLLs. They are (almost) never modified: if a page of code has not been used for some time, discard it; if control comes back to it, reread it from the file.
In the abstract that you quoted, they say a single virtual page "would need two partial pages on disk to map it. The first page, in particular, would be mapped onto a scratch page." They present the situation as if there were only one solution. In fact, it is possible to allocate a page in the paging file for such a combined page and handle the appropriate data copying. It is also possible to have nothing in the paging file for such a page and assemble it from the files using a transient page: in 99% of cases a disk controller can read/write only from/to a page boundary, which means you would read from the first file into the memory page and from the second file into the transient page, copy the data over from the transient page, and immediately discard it.
As you see, it is perfectly possible to combine several files in one page; there is no problem in principle here. But the algorithms for handling this would be more complex and would consume more CPU cycles, and reconstructing such a page (if it gets discarded) would require reading from several different files. These days 4 KB is a rather small quantity, so saving 2 KB is not a huge gain. In my opinion, weighing the benefits against the cost, the benefits are not significant enough.
Virtual address pages (on every machine I've ever heard of) are aligned on page-sized boundaries. This is simply because it makes the math very easy. On x86, the page size is 4096 bytes, which is exactly 12 bits, so to figure out which virtual page an address refers to, you simply shift the address right by 12. If you were to map a disk block (assume 4096 bytes) to an address of 5000, it would start on page #1 (5000 >> 12 == 1) and end on page #2 (9095 >> 12 == 2).
Memory-mapped files work by mapping a chunk of virtual address space to the file, but the data is loaded on demand (indeed, the file could be far larger than physical memory and might not fit). When you first access a virtual address whose data isn't there (i.e., not in physical memory), the processor faults and the OS has to fetch the data. When you fetch the data, you need to fetch all of the data for the page, or else you wouldn't be able to clear the fault. If the addresses weren't aligned, you'd have to bring in multiple disk blocks to fill a single page. You can certainly do this; it's just messy and inefficient.
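A tiny sketch of that arithmetic, using the same numbers as the answer above (nothing here is platform-specific):

```cpp
// With 4096-byte pages the virtual page index is the address shifted right
// by 12, so a 4096-byte block placed at address 5000 straddles pages 1 and 2.
#include <cstdio>

int main() {
    const unsigned kPageShift = 12;        // 4096 == 1 << 12
    unsigned start = 5000;
    unsigned end = start + 4096 - 1;       // last byte of the block: 9095

    printf("start %u -> page %u\n", start, start >> kPageShift); // page 1
    printf("end   %u -> page %u\n", end, end >> kPageShift);     // page 2
    return 0;
}
```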

Windows Memory Workings - page tables and data

I was trying to understand the following:
I know that page tables are built by the virtual memory manager at some point for translation between virtual memory and physical memory. Since there are many processes running on a system, even though only one process is active at a time, I was wondering whether the page tables of inactive processes are moved to the page file at any point in time. Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows keeps the page tables for all processes on the system, although it would also make sense for them to be moved to the page file when the current process is switched out.
The same goes for writable (data) pages. Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point? On my machine, Task Manager says 1.5 GB of RAM is being utilized out of 3 GB, and the Performance tab shows 1.5 GB as system cache, so my understanding is that data stays in physical memory for all applications. But would there be a time when it needs to be moved to the paging file?
I was wondering whether the page tables of inactive processes are moved to the page file at any point in time?
Yes, page tables are pageable.
Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point?
As far as the Windows paging policy is concerned, there are two kinds of memory: pageable and non-pageable. It doesn't really matter which process a page belongs to, or even whether it belongs to the O/S itself; if it is pageable, it is subject to being paged out. So yes, Windows will page out process data pages if necessary.
I suggest reading the memory management chapter in the Windows Internals book; it should cover all of this.
You are actually asking two questions here:
What is the paging policy regarding page tables?
What is the paging policy for "writable data" pages (i.e., virtual memory with R/W permissions)?
First I'll correct you a little.
Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows keeps the page tables for all processes on the system
To be exact, it's the upper 2 GB that is reserved for Windows; more precisely, it may be accessed only in kernel mode, by the Windows kernel and drivers.
Now, this may surprise you, but kernel memory can be pageable too! So technically it is not important at all which portion of the 32-bit address space is visible in user or kernel mode; that is not related to paging.
Another correction: a virtual memory page may be resident in physical memory and saved to the page file at the same time. There's a common belief that the OS frees physical storage by saving pages to the page file on demand. Wrong.
Actually, Windows saves memory pages to the page file before they need to be freed. In fact, it dumps memory pages to the page file (besides those that are backed by other files, such as mapped sections) in the background. There are two reasons for this:
During high load, the OS can free memory pages quicker (since they're already saved)
In kernel mode, paging is not always possible. Drivers that run at high IRQL (i.e. serve the most time-critical events) may not access the physical storage drivers, hence paging is not possible.
So, the answers to your questions are:
I don't know for sure, as it depends on OS implementation details, but I see no reason why a per-process page table could not be paged out. It is needed during context switches and when modifying the process's virtual memory, and neither situation belongs to the time-critical events.
"Writable data" memory pages are definitely saved to the page file. Are they removed from physical memory? Only on demand, under memory load, in least-recently-used order.

Programmatically read a program's page fault count on Windows

I'd like my Windows C++ program to be able to read the number of hard page faults it has caused. The program isn't running as administrator. Edited to add: to be clear, I'm not that interested in the aggregate page fault count of the whole system.
It looks like ETW might export counters for this, but I'm having a lot of difficulty figuring out the API, and it's not clear what's accessible by regular users as compared to administrators.
Does anyone have an example of this functionality lying around? Or is it simply not possible on Windows?
(OT, but isn't it sad how much easier this is on *nix? getrusage() and you're done.)
As far as I can tell, the only way to do this would be to use ETW (Event Tracing for Windows) to monitor kernel Hard Page Fault events. The event payload has a thread ID that you might be able to correlate with an existing process (this is going to be non-trivial, by the way) to produce a running per-process count. I don't see any way to get historical information per process.
I can guarantee you that this is A Hard Problem, because Process Explorer supports only Page Faults (soft and hard combined) in its per-process display.
http://msdn.microsoft.com/en-us/magazine/ee412263.aspx
A page fault occurs when a sought-out page table entry is invalid. If the requested page needs to be brought in from disk, it is called a hard page fault (a very expensive operation), and all other types are considered soft page faults (a less expensive operation). A Page Fault event payload contains the virtual memory address for which a page fault happened and the instruction pointer that caused it. A hard page fault requires disk access to occur, which could be the first access to contents in a file or accesses to memory blocks that were paged out. Enabling Page Fault events causes a hard page fault to be logged as a page fault with a type Hard Page Fault. However, a hard fault typically has a considerably larger impact on performance, so a separate event is available just for a hard fault that can be enabled independently. A Hard Fault event payload has more data, such as file key, offset and thread ID, compared with a Page Fault event.
I think you can use GetProcessMemoryInfo() - please refer to http://msdn.microsoft.com/en-us/library/ms683219(v=vs.85).aspx for more information.
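For reference, a minimal sketch of that call (my own illustration; note that PROCESS_MEMORY_COUNTERS::PageFaultCount lumps soft and hard faults together, so it does not isolate the hard faults the question asks about, but it works without administrator rights for your own process):

```cpp
#include <windows.h>
#include <psapi.h>   // link with psapi.lib on older SDKs
#include <cstdio>

int main() {
    PROCESS_MEMORY_COUNTERS pmc = {};
    pmc.cb = sizeof(pmc);
    // Query the calling process; no special privileges required.
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        printf("Page faults (soft + hard): %lu\n", pmc.PageFaultCount);
        printf("Working set: %zu bytes\n", pmc.WorkingSetSize);
    }
    return 0;
}
```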
Yes, quite sad. Or you could just not assume Windows is so limited that it doesn't even provide a page fault counter, and look it up: Win32_PerfFormattedData_PerfOS_Memory.
There is a C/C++ sample on Microsoft's site that explains how to read performance counters: INFO: PDH Sample Code to Enumerate Performance Counters and Instances
You can copy/paste it, and I think you're interested in the "Memory" / "Page Reads/sec" counter, as described in this interesting article: The Basics of Page Faults
This is done with performance counters in Windows. It's been a while since I've done anything with them; I don't recall whether or not you need to run as administrator to query them.
[Edit]
I don't have example code to provide, but according to this page, you can get this information for a particular process:
Process : Page Faults/sec. This is an indication of the number of page faults that occurred due to requests from this particular process. Excessive page faults from a particular process are an indication usually of bad coding practices. Either the functions and DLLs are not organized correctly, or the data set that the application is using is being called in a less than efficient manner.
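If you want to try that counter from C++, here is a hedged sketch using PDH (the instance name "myapp" is hypothetical: it is the executable's image name without .exe; also note this counter reports soft and hard faults combined, so it still does not isolate hard faults):

```cpp
#include <windows.h>
#include <pdh.h>     // link with pdh.lib
#include <cstdio>

int main() {
    PDH_HQUERY query = nullptr;
    PDH_HCOUNTER counter = nullptr;
    if (PdhOpenQuery(nullptr, 0, &query) != ERROR_SUCCESS) return 1;
    if (PdhAddEnglishCounterW(query, L"\\Process(myapp)\\Page Faults/sec",
                              0, &counter) != ERROR_SUCCESS) return 1;

    PdhCollectQueryData(query);            // first sample primes the rate
    Sleep(1000);
    PdhCollectQueryData(query);

    PDH_FMT_COUNTERVALUE value;
    if (PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE,
                                    nullptr, &value) == ERROR_SUCCESS)
        printf("Page Faults/sec: %.0f\n", value.doubleValue);

    PdhCloseQuery(query);
    return 0;
}
```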
I don't think you need administrative credentials to enumerate the performance counters. There is a sample at CodeProject: Performance Counters Enumerator
