Unexplained Azure webapp memory consumption - asp.net-web-api

I am debugging unexplained memory consumption in one of our Azure Web Apps. I have spent many hours digging through logs and memory dumps using dotMemory, PerfView, and the diagnostic tools, but I am still unable to figure out why our app's memory increases steadily throughout the day.
One of the memory dumps I downloaded was approximately 900 MB, but dotMemory shows only 26 MB for the largest objects.
I also ran countless diagnostic sessions with the Visual Studio diagnostic tools and confirmed that the app releases its resources at the end of every request.
Below is the private bytes consumption at the beginning and end of the day.
I verified multiple times in the heap that the application objects/modules from before and after a request were being released, and the screenshot above shows that they are.
I would appreciate it if you could share any pointers on how to find what is consuming the memory.

After spending a good number of hours researching, we found that static variables in one of our libraries were holding on to the memory and never releasing it. We had spotted those static variables in the Visual Studio memory snapshots, but refused to believe they were the actual cause of the issue and kept trying to find non-existent leaking objects.
We replaced the static variables with properties, and that resolved the memory leak.
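To illustrate the difference (a minimal sketch, not our actual code; the class and member names are hypothetical): a static field is a GC root for the lifetime of the process, so everything reachable from it survives every collection, while data hanging off a request-scoped instance is reclaimed along with its owner.

    using System.Collections.Generic;

    // LEAKY: the static field roots the list for the whole AppDomain,
    // so every buffer added to it survives every garbage collection.
    public static class ReportCache
    {
        public static List<byte[]> RenderedReports = new List<byte[]>();
    }

    // FIXED (the shape of the change we made): the property lives on an
    // instance whose lifetime is scoped to the request, so the GC can
    // reclaim everything once the request completes.
    public class RequestScopedReportCache
    {
        public List<byte[]> RenderedReports { get; } = new List<byte[]>();
    }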

Related

Debugging memory usage in a very short-lived application without Windows

I have a console application that runs for less than 500 ms but, according to BenchmarkDotNet, allocates more than 100 MB.
I am trying to figure out what those 100 MB are, because the number does not add up. However, I cannot find a tool to do so on Linux or Mac. Once the method the app calls returns, the GC can clean all that memory without problems, so it is not a leak I can see in a dump, unless I take the dump at the exact moment before the method exits. And I am not sure at which moment the algorithm's memory usage peaks.
I can take CPU traces using dotnet-trace and view them in the browser with Speedscope, but I cannot display a trace in Speedscope when using gc-verbose or gc-collect as the provider.
Is there a way with dotnet-trace to print the stats of the created objects to the console, or anything like that?
Try out dotnet-dump, and check out this article by Tess Ferrandez.
And maybe you can share a little bit more information, please.
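If timing a dump is difficult, one cross-platform alternative (not mentioned in the thread; a sketch assuming .NET Core 3.0 or later) is to measure allocations in-process with GC.GetTotalAllocatedBytes, which counts cumulative allocations rather than live objects, so it captures memory the GC has already reclaimed:

    using System;

    class AllocationProbe
    {
        static void Main()
        {
            // Cumulative bytes allocated by the process so far;
            // precise: true trades a little speed for an exact count.
            long before = GC.GetTotalAllocatedBytes(precise: true);

            RunSuspectAlgorithm(); // hypothetical stand-in for the method under test

            long after = GC.GetTotalAllocatedBytes(precise: true);
            Console.WriteLine($"Allocated ~{(after - before) / (1024.0 * 1024.0):F1} MB");
        }

        static void RunSuspectAlgorithm()
        {
            // Placeholder workload so the sketch runs as-is.
            for (int i = 0; i < 1000; i++)
                _ = new byte[100_000];
        }
    }

Running dotnet-counters monitor against the process will also show the allocation rate live, without taking a dump.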

Understanding CentOS Memory usage

I am not an OS expert, and I am having trouble understanding my server's memory usage. I need your advice to understand the following:
My server has 8 GB of RAM and operates as a web server. PHP, MySQL, and Apache processes consume the majority of the memory. When I issue the command "free" after the system is rebooted, I normally see something along these lines:
             total       used       free     shared    buffers     cached
Mem:       8059080    2277924    5781156          0        948     310852
-/+ buffers/cache:    1966124    6092956
Swap:      4194296          0    4092668
Obviously, sooner or later the free memory drops and the cached memory increases, and I assume there is nothing wrong with that, since the OS decides to cache what it can.
What I don't understand is that 1-2 days after the machine is rebooted, I see a slight increase in used swap. Doesn't this mean that the server has no free memory anymore and is using I/O instead? How can I find out which processes cause this?
I am asking Stack Overflow users because if I ask my hosting provider, I am sure they will ask for more money to increase the RAM.
Thanks.
This is perfectly normal. When the machine starts up, a large number of services also start up. As they run their startup code, read their configuration, and so on, they dirty some pages of memory. Many of these services will never run again. By writing this data to swap, the operating system accomplishes two things:
First, if it ever does encounter memory pressure, it can discard the pages without having to write them first, since it has already written them. Second, it can discard the pages to make more free memory to enlarge the cache.
The alternative is to keep information that hasn't been touched in days in physical memory. And that just doesn't make sense.
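To answer the "which processes" part concretely: on kernels that expose a VmSwap line in /proc/<pid>/status (2.6.34 and later; older CentOS kernels may not have it), you can sum that field per process. A rough sketch, not from the original answer:

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Lists processes by how much of their memory currently sits in swap,
    // by reading the VmSwap field from /proc/<pid>/status.
    class SwapByProcess
    {
        static void Main()
        {
            var rows = new List<(string Pid, string Name, long Kb)>();

            foreach (var dir in Directory.GetDirectories("/proc"))
            {
                string pid = Path.GetFileName(dir);
                if (!int.TryParse(pid, out _)) continue; // numeric dirs are processes

                string name = "?";
                long kb = 0;
                try
                {
                    foreach (var line in File.ReadAllLines($"/proc/{pid}/status"))
                    {
                        if (line.StartsWith("Name:"))
                            name = line.Substring(5).Trim();
                        else if (line.StartsWith("VmSwap:"))
                            kb = long.Parse(line.Substring(7).Trim().Split(' ')[0]);
                    }
                }
                catch (Exception) { continue; } // process exited or access denied

                if (kb > 0) rows.Add((pid, name, kb));
            }

            rows.Sort((a, b) => b.Kb.CompareTo(a.Kb));
            foreach (var (pid, name, kb) in rows)
                Console.WriteLine($"{pid,7}  {name,-20} {kb,10} kB in swap");
        }
    }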

Recursive code eating up whole memory

I have a C++ project with an extremely recursive function. When this project is deployed and run on the dedicated machine, it eats up to 12 GB of memory. Previously, the process used to fail at the 2 GB limit, so I converted the entire solution of 20 projects to the 64-bit platform in Visual Studio 2010. Now it no longer stops at 2 GB, but instead eats up almost my whole RAM: total usage is 15.9 GB out of 16 GB. I am using COM+ to run these applications under dllhost. My questions are:
How can I limit the memory being allocated, instead of letting it grow until the process fails?
Does Windows free the memory after the recursion is over? I do not see this process releasing memory until I kill it.
The terminating case is not the problem, I guess, because my job succeeds when using the Expiration Time option under Recycling in Component Services. Or is there some other problem?
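One generic mitigation, not from the original question: rewrite the extremely recursive function as a loop over an explicit stack, so the pending work lives in a single container whose size you can observe and cap, rather than in millions of machine stack frames. A sketch of the idea (shown in C# for brevity; the same shape works in C++ with std::stack):

    using System;
    using System.Collections.Generic;

    class ExplicitStackDemo
    {
        // Explicit budget: fail fast with a clear error instead of
        // consuming the whole machine.
        const int MaxPending = 1_000_000;

        class Node { public List<Node> Children = new List<Node>(); }

        // Recursive shape:  void Visit(Node n) { foreach (var c in n.Children) Visit(c); }
        // Iterative shape, with an explicit and bounded stack:
        static void Visit(Node root)
        {
            var pending = new Stack<Node>();
            pending.Push(root);
            while (pending.Count > 0)
            {
                if (pending.Count > MaxPending)
                    throw new InvalidOperationException("Traversal exceeded its memory budget");

                var n = pending.Pop();
                // ... process n here ...
                foreach (var child in n.Children)
                    pending.Push(child);
            }
        }

        static void Main() => Visit(new Node());
    }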

How to improve GWT hosted mode / compilation times?

At work we are using some pretty powerful machines: HP Z600s with dual Xeons @ 2.5 GHz and 8-16 GB of RAM. Unfortunately, due to ill-fitting company policies, we are forced to use 32-bit XP, so I made a PAE RAM drive out of the unused 4 GB of RAM.
Now the temporary files are on the RAM drive. I've also tried moving the whole project to the RAM drive, and then to an SSD, but there was no noticeable improvement in hosted-mode startup or compilation times.
I then ran Sysinternals' Process Monitor to see if there were any bottlenecks not visible via Task Manager or the hard-disk activity LED, but I did not see anything notable, except some buffer overflows whose meaning I don't understand.
I assume that OOPHM (out-of-process hosted mode) startup and GWT compilation performance are tied, so I'm using compilation times to benchmark the various changes.
I've activated and deactivated Hyper-Threading and Turbo Boost in the BIOS, but again saw no difference. Hyper-Threading even seems to make everything slower; I assume the context-switching penalty is higher with 16 logical cores than with 8. Turbo Boost does not seem to do anything; I assume it only works under Windows 7, as I have not succeeded in activating the driver. It should boost the cores from 2.5 GHz to 2.8 GHz.
I deactivated indexing and last-access timestamps on the NTFS drives, changed the performance setting from foreground to background and back, and used another instance of Eclipse: no change.
For compilation I've tried specifying different numbers of workers, more memory, and some other options. Anything above two workers increases compilation time.
Older HP machines (XW6600) seem to compile a bit faster, perhaps because of their 2.8 GHz clock, but their hosted mode seems to start up more slowly.
To sum up, memory usage is about 2.6 GB, pagefile usage is zero, the hard drive is not signaling much activity, and CPU activity is under 10% (a single core at about 50-70%), but the computer still seems to do nothing for some time while compiling or launching OOPHM GWT.
OK, now that I've tried everything I knew of and found on the Internet, is there anything else I can try? Will switching to 64-bit Windows 7 improve things much (that is due next year anyway)? Are there any hardware/software options I can tweak?
Later edit: I also ran RATT (a tracer from MS) to see if any interrupts were taking too long, but everything seems in order. Antivirus makes no difference. I benchmarked another GWT project against my mobile i7 (2630QM), and the i7 is about 70% faster, though it has about the same clock speed.
In most cases I've encountered, the reason for slow compilation or hosted-mode refresh is the way the application is written.
For compilation there are a few things to watch out for:
Don't keep unused classes and modules on your classpath; remove them if possible, because GWT parses them anyway at the precompile stage.
Watch out for GWT-RPC type explosion; this is usually the cause of most problems.
Be really careful with the ContextWithLookup interface; always try to minimize the number of methods in an interface which extends ContextWithLookup.
Try some other compilation options, like distributed compilation, soft permutations, or multi-JVM compilation (run the compiler with the system property -Dgwt.jjs.permutationWorkerFactory=com.google.gwt.dev.ExternalPermutationWorkerFactory and -localWorkers to specify the number of JVMs); see the example invocation below.
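For example, a multi-JVM compile could be launched like this from a Unix-style shell (the classpath, heap size, worker count, and module name are placeholders, not from the original answer):

    java -cp "src:gwt-user.jar:gwt-dev.jar" -Xmx1024m \
         -Dgwt.jjs.permutationWorkerFactory=com.google.gwt.dev.ExternalPermutationWorkerFactory \
         com.google.gwt.dev.Compiler -localWorkers 2 com.example.MyModule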
For hosted mode:
Use lazy loading where possible. The greatest problem I have seen with hosted mode is an application trying to initialize too many classes which aren't even used on the current screen; avoiding that can greatly speed up hosted-mode startup. Again, things like RPC type explosion affect hosted mode as well.
That's all I can advise. Things like a RAM disk can speed up options like -compileReport, since it can generate a huge number of files.

Does Windows Server 2003 SP2 tell the truth about Free System Page Table Entries?

We have some Win32 console applications running on Windows Server 2003 Service Pack 2 that regularly fail with this:
Error 1450 (ERROR_NO_SYSTEM_RESOURCES): "Insufficient system resources exist to complete the requested service."
All the documentation we've found suggests it is linked to the number of Free System Page Table Entries running out. We have 16GB RAM in these machines and use the /3GB Operating System switch to squeeze the Windows kernel into 1GB and allow our processes access to 3GB of address space. This drastically reduces the total number of Free System Page Table Entries, so combined with our heavy use of MapViewOfFile() it is perhaps not surprising that the kernel page table entries are running out.
However, when using Performance Monitor to view the Free System Page Table Entries counter, the value is around 36,000 on reboot and doesn't go down when our application starts. I find it hard to believe that our application, which opens many large memory-mapped files, doesn't have any effect on the kernel page table. If we can't believe the counter, it's much more difficult to test the effect of any system changes we make.
There is a promising Knowledge Base article, The Performance tool does not accurately show the available Free System Page Table entries in Windows Server 2003, but it says the problem has been fixed in Service Pack 1, and we are already on Service Pack 2.
Has anyone else struggled with or solved this issue?
Update: I have checked !sysptes in windbg (debugging the kernel) and the value matches the performance counter, around 36,000. I guess this is most likely to mean that there really are that many free page table entries and Windows is telling the truth. It does leave the question of why we're getting 1450 errors though, if the PTEs are not running out.
Further update: We never did get to the bottom of why the 1450 errors were occurring. However, instead we upgraded the OS on these servers to 64-bit Windows. This allows the existing 32-bit applications (without recompilation) to access a full 4GB of virtual address space, and lets the kernel memory area with those pesky Page Table Entries be as big as it likes too. I don't think we've had a 1450 error since.
Can you try the WinDbg command "!sysptes" to get system PTE information? I'm not sure if you can do this with a live kernel debug; you may have to get a memory dump.
I'm not sure why you assume that ERROR_NO_SYSTEM_RESOURCES is caused only by running out of free System Page Table Entries? As far as I know, such generic error codes are used for more than one resource type. In fact, the first Google hit suggests that running out of file cache memory may cause it too (a KB article on an XP bug that tripped this error mode).
In your case, I'd be checking the "Handle Count". Another possible problem is address space fragmentation. If you want to create a 1 GB file mapping view, you need 1 GB of free address space, and it has to be contiguous. If you map a 1 GB file, an 800 MB file, and another 1 GB file, then close the 800 MB one and open a 900 MB file, the 900 MB view may not fit in the hole that's left.
MS has two ways to let their 32-bit OSes "deal" with hardware that has 4 GB or more of RAM.
Option 1 is what you did: the /3GB switch in Boot.ini.
Option 1 Pros and Cons:
(CON) This option takes 1 GB away from the normal 2 GB kernel area, making the OS struggle to meet the demands of both paged pool allocations and kernel stack allocations. So a person might think that using the /3GB switch will help them, but really this option squeezes the 32-bit Windows OS into a slow death.
(CON) "But this gives my app 3 GB..." WRONG (hence this is a con). The catch is that ONLY applications that have been rebuilt by the vendor to be "/3GB switch aware" (linked with the /LARGEADDRESSAWARE flag) can really use the extra 1 GB. So the whole use of the /3GB switch is a really bad joke on everyone.
Read this link for a much better write-up:
http://blogs.technet.com/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx
Option 2: Use the /PAE switch in the Boot.ini.
Option 2 Pros and Cons:
(PRO) This is really the only option if you have more than 4 GB of RAM. It "tricks" an application by placing its complete memory footprint in RAM. Normally, only an application's "Working Set" memory is in RAM, and the remaining application memory goes into the Windows pagefile. What is an application's total memory requirement? It is called the "Virtual Size".
In my world, I deal with a big, fat, Java-based IBM product. The server running the "application" has 16 GB of RAM. I simply add the /PAE switch and watch (thanks to Sysinternals Process Explorer) the application's paging requests go from 200 KB per second up to 4 MB per second.
Question: "Why"?
Answer: The whole application is in RAM.
Question: "Does the application know that it is completely running in RAM?
Answer: No - It is running that same old way that it was always run, "THINKING" that it's has part of itself as the "Working Set" memory living in RAM and the remaining application memory requirements go into Windows Pagefile.
Yes, it is that flipping GOOD.
Please note: Microsoft has done a poor job of telling anyone about this great Windows OS option. Duh.
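For reference, a sketch of where these switches go in Boot.ini (the ARC paths and entry descriptions are placeholders; edit the entry for the OS you actually boot):

    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, /3GB" /fastdetect /3GB
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, /PAE" /fastdetect /PAE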
Try it and report back to Stack Overflow...
