Oracle 11g direct I/O and write complete wait

Last week my database was experiencing high wait times (write complete waits) on disk. I decided to stop all applications from using this database. After that I shut the database down and decided to enable Oracle's direct I/O option, as advised by my vendor last year.
*I cannot put a screenshot here because I don't have enough reputation points.
So I changed the filesystemio_options parameter from ASYNCH to SETALL and started the database back up, and all applications are using it again.
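The change itself was a one-liner; a sketch of what I ran, assuming an spfile is in use (filesystemio_options is a static parameter, so it only takes effect after a restart):
SQL> ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP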
But now the disk seems busier than before, although no waits are detected. Is this normal?
*Again, no screenshot, for lack of reputation points.
For the disk I'm using a MegaRAID controller. c0t1d0 consists of 6 physical disks:
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 835.394 GB
Mirror Data : 835.394 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 3
Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None

Related

What part of the RAM is used by the system file cache in Windows?

According to general notions about the page cache, and this answer, the system file cache essentially uses all the RAM not used by any other process. This is, as far as I know, the case for the page cache in Linux.
Since the notion of "free RAM" is a bit blurry in Windows, my question is: what part of the RAM does the system file cache use? For example, is it the same as "Available RAM" in the Task Manager?
Yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. But not exactly. I'll go into details and explain how to measure it more precisely.
The file cache is not a process listed in the list of processes in the Task Manager. However, since Vista, its memory is managed like a process. Thus I'll explain a bit of memory management for processes, the file cache being a special case.
In Windows, the RAM used by a process has essentially two states: "Active" and "Standby":
"Active" RAM is displayed in the Task Manager and resource monitor as "In Use". It is also the RAM displayed for each process in the Task Manager.
"Standby" RAM is visible in the Resource monitor globally and for each process with RAMMap.
"Standby" + "Free" RAM is what is called "Available" in the task manager. "Free" RAM tends to be near 0 in Windows but you can meaningfully consider Standby RAM is free as well.
Standby RAM is considered as "not used for a while by the process". It is the part of the RAM that will be used to give new memory to processes needing it. But it still belongs to the process and could be used directly if the owning process suddenly access it (which is considered as unlikely by the system).
Thus the file cache has "Active" RAM and "Standby" RAM. "Active" RAM is somehow the cache for data recently accessed. "Standby" RAM is the cache for data accessed a while ago. The "Active" RAM of the file cache is usually relatively small. The Standby RAM of the file cache is most often all the RAM of your computer: Total RAM - Active RAM of all processes. Indeed, other processes rarely have Standby RAM because it tends to go to the file cache if you do disk I/O quite a bit.
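If you want to read the "Available" figure programmatically rather than from the Task Manager, here is a minimal Win32 C sketch using GlobalMemoryStatusEx; ullAvailPhys reports Standby + Free, matching the Task Manager value:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);   /* must be set before the call */
    if (GlobalMemoryStatusEx(&ms))
    {
        /* ullAvailPhys is Standby + Free, i.e. the "Available" figure */
        printf("Total RAM:     %llu MB\n",
               (unsigned long long)(ms.ullTotalPhys / (1024 * 1024)));
        printf("Available RAM: %llu MB\n",
               (unsigned long long)(ms.ullAvailPhys / (1024 * 1024)));
    }
    return 0;
}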
This is the information RAMMap displayed for a busy server doing a lot of I/O and computation. The file cache is the second row, called "Mapped file". Note that most of the 32 GB is either in the Active part of other processes or in the Standby part of the file cache.
So finally, yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. If you want to measure with more certainty, you can use RAMMap.
Your answer is not entirely accurate.
The file cache, also called the system cache, describes a range of virtual addresses. It has a physical working set that is tracked by MmSystemCacheWs, and that working set is a subset of all the mapped-file physical pages on the system.
The system cache is a range of virtual addresses, hence PTEs, that point to mapped file pages. The mapped file pages are brought in by a process creating a mapping or brought in by the system cache manager in response to a file read.
Existing pages that are needed by the file cache in response to a read become part of the system working set. If a page in a mapped file is not present then it is paged in and it becomes part of the system working set. When a page is in more than one working set (i.e. system and a process or process and another process), it is considered to be in a shared working set on programs like VMMap.
The actual mapped-file pages themselves are controlled by a section object (one per file), a data control area, subsection objects, and a segment object with prototype PTEs for the file. These are created the first time a process creates a mapping object for the file, or the first time the system cache manager creates the mapping (section) object because it needs to access the file in response to a file I/O operation performed by a process.
When the system cache manager needs to read from a file, it maps 256 KiB views of the file at a time and keeps track of each view in a VACB object. A process maps a variable-sized view of a file, typically the size of the whole file, and keeps track of this view in the process VAD. Mapping the view simply fills in PTEs to point to the physical pages that already contain the file, by looking at the prototype PTE for that range of the file and seeing what it contains. If the prototype PTE does not point to a physical page, the PTE is initialised to point to the prototype PTE instead and is left invalid; the fault is then resolved on demand, page by page, when the view is actually read.
The VACBs keep track of the 256 KiB views of files that the cache manager has opened, along with the virtual address range of each view, which corresponds to the 64 PTEs that service that range. There is no virtual or page-table external fragmentation, because all views are the same size, and no physical external fragmentation, because all pages in a view are 4 KiB. 256 KiB is the size chosen because a smaller view would mean too many VACB objects (64 times as many, taking up space), while a larger one would effectively cause a lot of internal fragmentation on reads and hence heavy virtual address pollution. Also, the VACB uses the lower bits of the virtual address to store the number of I/O operations currently being performed on that range, so a larger view size would require a few more bits in the VACB, or it would be able to handle fewer concurrent I/O operations.
If each view were the whole size of the file, virtual address pollution would build up quickly, because the whole of every file that is read would be mapped in; whole-file mappings are meant for user processes that knowingly map a complete file view into their virtual address space, expecting the whole file to be accessed. There would also be a lot of virtual external fragmentation, because the views would no longer all be the same size.
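To make the process-side mapping concrete, here is a minimal Win32 C sketch (the file name is a placeholder) of a user process mapping a whole-file view through a section (file-mapping) object; the cache manager's internal 256 KiB views are not something you drive from user mode:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* "example.dat" is a placeholder file name */
    HANDLE hFile = CreateFileW(L"example.dat", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    /* creating the file-mapping (section) object is what builds the
       control area / segment / prototype PTEs the first time around */
    HANDLE hSection = CreateFileMappingW(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hSection == NULL) { CloseHandle(hFile); return 1; }

    /* map a view of the whole file (length 0 = entire file); PTEs are
       filled lazily, and touching a non-resident page demand-faults it in */
    const unsigned char *view = (const unsigned char *)
        MapViewOfFile(hSection, FILE_MAP_READ, 0, 0, 0);
    if (view != NULL)
    {
        printf("first byte: %u\n", view[0]);  /* faults page 0 in on demand */
        UnmapViewOfFile(view);
    }
    CloseHandle(hSection);
    CloseHandle(hFile);
    return 0;
}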
As for executable images, they are mapped in separately, with prototype PTEs, physical pages, a control area, and segment and subsection objects all separate from those of the data-file mapping of the same file. The process maps the image in, but the kernel also maps the images for ntoskrnl.exe and hal.dll using large pages, and driver images live in the system PTE working set.

ext4 commit= mount option and dirty_writeback_centisecs

I'm trying to understand how bytes travel from write() to the physical disk platter, in order to tune my picture server's performance.
What I don't understand is the difference between these two: the commit= mount option and dirty_writeback_centisecs. They seem to concern the same process of writing changes to the storage device, yet they are different.
It is not clear to me which one fires first on my bytes' way to the disk.
Yeah, I just ran into this investigating mount options for an SDCard Ubuntu install on an ARM Chromebook. Here's what I can tell you...
Here's how to see the dirty and writeback amounts:
user#chrubuntu:~$ cat /proc/meminfo | grep "Dirty" -A1
Dirty: 14232 kB
Writeback: 4608 kB
(Edit: these Dirty and Writeback figures are rather high; I had a compile running when I took this.)
So data waiting to be written out is dirty. Dirty data can still be eliminated (if, say, a temporary file is created, used, and deleted before it goes to writeback, it never has to be written out). As dirty data is moved into writeback, the kernel tries to combine smaller adjacent dirty requests into single larger I/O requests; this is one reason why dirty_expire_centisecs is usually not set too low. Dirty data is usually put into writeback when (a) enough data is cached to reach vm.dirty_background_ratio, or (b) the data becomes vm.dirty_expire_centisecs centiseconds old (the default of 3000 is 30 seconds). Per vm.dirty_writeback_centisecs, a writeback daemon runs by default every 500 centiseconds (5 seconds) to actually flush out anything in writeback.
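You can check these knobs directly; the values shown below are the usual kernel defaults, yours may differ:
user#chrubuntu:~$ sysctl vm.dirty_writeback_centisecs vm.dirty_expire_centisecs vm.dirty_background_ratio
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.dirty_background_ratio = 10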
fsync will flush out an individual file (force it from dirty into writeback and wait until it has been flushed out of writeback), and sync does the same for everything. As far as I know, they do this ASAP, bypassing any attempt to balance disk reads and writes: the device is stalled doing 100% writes until the sync completes.
The default ext4 mount option commit=5 actually forces a sync every 5 seconds on that filesystem. This is intended to ensure that writes are not unduly delayed if there's heavy read activity (ideally losing at most 5 seconds of data if power is cut, or whatever). What I found with an Ubuntu install on an SDCard (in a Chromebook) is that this actually just leads to massive filesystem stalls roughly every 5 seconds if you're writing much to the card. ChromeOS uses commit=600, and I applied that on the Ubuntu side to good effect.
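If you want to try the same change, the option goes in /etc/fstab; a sketch, where the device and mount point are placeholders for your own:
# /etc/fstab
/dev/mmcblk1p2  /  ext4  defaults,commit=600  0  1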
dirty_writeback_centisecs configures the Linux kernel daemons related to virtual memory (hence the vm. prefix), which are in charge of writing back from RAM to all the storage devices. So if you configure dirty_writeback_centisecs and have 25 different storage devices mounted on your system, all 25 share the same writeback interval.
commit, on the other hand, is set per storage device (per filesystem, actually) and is related to the sync process rather than to the virtual memory daemons.
So you can see it as:
dirty_writeback_centisecs: writing from RAM to all filesystems
commit: each filesystem flushing its own data from RAM

What can my 32-bit app be doing that consumes gigabytes of physical RAM?

A co-worker mentioned to me a few months ago that one of our internal Delphi applications seems to be taking up 8 GB of RAM. I told him:
That's not possible
A 32-bit application only has a 32-bit virtual address space. Even if there was a memory leak, the most memory it could consume is 2 GB. After that allocations would fail (as there would be no empty space in the virtual address space). And in the case of a memory leak, the virtual pages will be swapped out to the pagefile, freeing up physical RAM.
But he noted that Windows Resource Monitor indicated that less than 1 GB of RAM was available on the system. And while our app was only using 220 MB of virtual memory: closing it freed up 8 GB of physical RAM.
So I tested it
I let the application run for a few weeks, and today I finally decided to test it.
First I looked at memory usage before closing the app, using Process Explorer:
the working set (RAM) is: 241 MB
total virtual memory used: 409 MB
And I used Resource Monitor to check memory used by the app, and total RAM in use:
virtual memory allocated by application: 252 MB
physical memory in use: 14 GB
And then memory usage after closing the app:
physical memory in use: 6.6 GB (7.4 GB less)
I also used Process Explorer to look at a breakdown of physical RAM use before and after. The only difference is that 8 GB of RAM really was uncommitted and now free:
Item                             Before        After
Commit Charge (K)                15,516,388    7,264,420
Physical Memory Available (K)     1,959,480    9,990,012
Zeroed Paging List (K)              539,212    8,556,340
Note: It's somewhat interesting that Windows would spend time instantly zeroing out all that memory, rather than simply putting it on a standby list and zeroing it as needed (i.e. as memory requests have to be satisfied).
None of those things explain what the RAM was doing (What are you doing just sitting there! What do you contain!?)
What is in that memory?
That RAM must contain something useful; it must have some purpose. For that I turned to SysInternals' RAMMap. It can break down memory allocations.
The only clue that RAMMap provides is that the 8 GB of physical memory was associated with something called Session Private. These Session Private allocations are not associated with any process (i.e. not my process):
Item             Before      After
Session Private  8,031 MB    276 MB
Unused           1,111 MB    8,342 MB
I'm certainly not doing anything with EMS, XMS, AWE, etc.
What could possibly be happening in a 32-bit non-Administrator application that is causing Windows to allocate an additional 7 GB of RAM?
It's not a cache of swapped-out items
It's not a SuperFetch cache
It's just there, consuming RAM.
Session Private
The only information about "Session Private" memory is from a blog post announcing RAMMap:
Session Private: Memory that is private to a particular logged in session. This will be higher on RDS Session Host servers.
What kind of app is this?
This is a 32-bit native Windows application (i.e. not Java, not .NET). Because it is a native Windows application it, of course, makes heavy use of the Windows API.
It should be noted that I wasn't asking people to debug the application; I was hoping a Windows developer out there would know why Windows might hold memory that I never allocated. Having said that, the only thing that changed recently (in the last 2 or 3 years) that could cause such behaviour is the feature that takes a screenshot every 5 minutes and saves it to the user's %LocalAppData% folder. A timer fires every five minutes:
QueueUserWorkItem(TakeScreenshotThreadProc);
And pseudo-code of the thread method:
void TakeScreenshotThreadProc(Pointer data)
{
    String szFolder = GetFolderPath(CSIDL_LOCAL_APPDATA);
    ForceDirectoryExists(szFolder);
    String szFile = szFolder + "\\" + FormatDateTime("yyyyMMdd'_'hhnnss", Now()) + ".jpg";

    Image destImage = new Image();
    try
    {
        CaptureDesktop(destImage);
        JPEGImage jpg = new JPEGImage();
        jpg.CopyFrom(destImage);
        jpg.CompressionQuality = 13;
        jpg.Compress();
        HANDLE hFile = CreateFile(szFile, GENERIC_WRITE,
                FILE_SHARE_READ | FILE_SHARE_WRITE, null, CREATE_ALWAYS,
                FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_ENCRYPTED, 0);
        //error checking elided
        try
        {
            Stream stm = new HandleStream(hFile);
            try
            {
                jpg.SaveToStream(stm);
            }
            finally
            {
                stm.Free();
            }
        }
        finally
        {
            CloseHandle(hFile);
        }
    }
    finally
    {
        destImage.Free();
    }
}
Most likely, somewhere in your application you are allocating system resources and not releasing them. Any WinAPI call that creates an object and returns a handle is a suspect. For example (be careful running this on a system with limited memory; if you don't have 6 GB free it will page badly):
program Project1;
{$APPTYPE CONSOLE}
uses
  Windows;
var
  b : array[0..3000000] of Byte;
  i : Integer;
begin
  for i := 1 to 2000 do
    CreateBitmap(1000, 1000, 3, 8, @b);  // handle is returned and never passed to DeleteObject
  ReadLn;
end.
This consumes 6GB of session memory due to the allocation of bitmap objects that are not subsequently released. Application memory consumption remains low because the objects are not created on the application's heap.
Without knowing more about your application, however, it is very difficult to be more specific. The above is one way to demonstrate the behaviour you are observing. Beyond that, I think you need to debug.
In this case, there are a large number of GDI objects allocated. This isn't necessarily indicative by itself, since applications often allocate a large number of small GDI objects rather than a large number of large ones (the Delphi IDE, for example, will routinely create more than 3,000 GDI objects, and this is not necessarily a problem).
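A quick way to watch for this kind of leak is to poll the process's GDI and User object counts with GetGuiResources; a minimal C sketch, assuming you sample it around the suspect code path:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* counts of GDI and User objects owned by this process; a count that
       climbs steadily while the program runs is the classic leak signature */
    DWORD gdi  = GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS);
    DWORD user = GetGuiResources(GetCurrentProcess(), GR_USEROBJECTS);
    printf("GDI objects: %lu, User objects: %lu\n", gdi, user);
    return 0;
}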
In #Abelisto's example (in the comments), by contrast:
program Project1;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  i : Integer;
  sr : TSearchRec;
begin
  // each FindFirst opens a search handle that is never released with FindClose
  for i := 1 to 1000000 do
    FindFirst('c:\*', faAnyFile, sr);
  ReadLn;
end.
Here the returned handles are not GDI objects but search handles (which fall under the general category of kernel objects), and we can see a large number of handles used by the process. Again, process memory consumption is low, but there is a large increase in session memory used.
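For kernel-object leaks like this one, the process handle count is the number to watch; a minimal C sketch using GetProcessHandleCount (where and how often you sample it is up to you):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* kernel-object handle count for this process (files, events, search
       handles, ...); sample it before and after the suspect code path */
    DWORD handles = 0;
    if (GetProcessHandleCount(GetCurrentProcess(), &handles))
        printf("open kernel handles: %lu\n", handles);
    return 0;
}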
Similarly, the objects might be User Objects - these are created by calls to things like CreateWindow, CreateCursor, or by setting hooks with SetWindowsHookEx. For a list of WinAPI calls that create objects and return handles of each type, see :
Handles and Objects : Object Categories -- MSDN
This can help you start to track down the issue by narrowing it to the type of call that could be causing the problem. It may also be in a buggy third-party component, if you are using any.
A tool like AQTime can profile Windows allocations, but I'm not sure whether there is a version that supports Delphi 5. There may be other allocation profilers that can help track this down.

mod_tile cache size limit

I have a TMS server with Apache mod_tile, Mapnik & renderd. I have 400 GB of free space in my cache folder.
I want to pre-render 11 or 12 zoom levels.
I tried the command "render_list -a -z 0 -Z 10 -v -n 4".
But my cache folder doesn't grow beyond 2.6 GB, and render_list says it finished, with no error message.
Even when I use my map (OpenLayers), missing tiles are rendered on the fly but not stored in the cache. Before I pre-rendered my tiles, they were stored in the cache.
I searched unsuccessfully, so I'm asking here: is there any option in mod_tile to manage the cache size and the cache replacement strategy?
Thanks for your answers.
Update: strangely, when I request tiles from level 11, they are stored in the cache correctly, and my cache grows. So is there a size limit per zoom level?

Why does copy-on-write virtual memory need to be backed by a disk page?

Reading about copy-on-write in Windows' memory management, I see that the system will find a free page in RAM for the shared memory (backed immediately by a disk page).
Why is it necessary to back the RAM page with a disk page? It isn't being swapped out; it was just created.
I remember that RAM pages only get swapped out when there aren't enough free RAM pages.
The system needs a guarantee that when a write eventually happens, space will be available. An allocation has to fail up front rather than succeed now and have the system run out of disk space later.
That doesn't mean the disk is written to; the page reservation is merely bookkeeping (commit charge).
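To see the bookkeeping in action, here is a minimal Win32 C sketch (the section size and all names are arbitrary) that creates a copy-on-write view; the commit charge for the potential private copies is taken when the view is mapped, even though nothing is written to disk at that point:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* a 64 KiB pagefile-backed section; the size here is arbitrary */
    HANDLE hSection = CreateFileMappingW(INVALID_HANDLE_VALUE, NULL,
                                         PAGE_READWRITE, 0, 64 * 1024, NULL);
    if (hSection == NULL) return 1;

    /* FILE_MAP_COPY = copy-on-write view; commit is charged for the view
       here, at map time -- the bookkeeping described above -- even though
       no page has been copied and nothing has touched the disk */
    unsigned char *view = (unsigned char *)
        MapViewOfFile(hSection, FILE_MAP_COPY, 0, 0, 0);
    if (view != NULL)
    {
        view[0] = 42;   /* first write: only now is a private copy made */
        UnmapViewOfFile(view);
    }
    CloseHandle(hSection);
    return 0;
}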
