Facing a memory leak because of a third-party analytics thread - LeakCanary

D/LeakCanary( 4167): * GC ROOT thread com.apsalar.sdk.ApsalarThread. (named 'ApsalarHTTPThread')
D/LeakCanary( 4167): * leaks .MainActivity instance
D/LeakCanary( 4167): * Device: samsung samsung GT-S7562 kylexx
D/LeakCanary( 4167): * Android Version: 4.0.4 API: 15 LeakCanary: 1.3.1
D/LeakCanary( 4167): * Durations: watch=5023ms, gc=829ms, heap dump=9032ms, analysis=47771ms
Does the above log say that the Apsalar thread is the cause of the memory leak?

Yes. The ApsalarHTTPThread holds a reference to your MainActivity while the thread is running, even after the MainActivity has finished, which prevents the activity from being garbage collected.
You can try passing getApplicationContext() instead of MainActivity.this.

Related

Why are OpenGL and CUDA contexts memory greedy?

I develop software that usually includes both OpenGL and the Nvidia CUDA SDK. Recently, I also started looking for ways to reduce the run-time memory footprint. I noticed the following (Debug and Release builds differ only by 4-7 MB):
Application startup - less than 1 MB total
OpenGL 4.5 context creation (+ GLEW loader init) - 45 MB total
CUDA 8.0 context (Driver API) creation - 114 MB total
If I create the OpenGL context in "headless" mode, the GL context uses 3 MB less, which probably goes to the default framebuffer allocation. That makes sense, as the window size is 640x360.
So after the OpenGL and CUDA contexts are up, the process already consumes 114 MB.
Now, I don't have deep knowledge of the OS-specific work that happens under the hood during GL and CUDA context creation, but 45 MB for GL and 68 MB for CUDA seems like a lot to me. I know that several megabytes usually go to system framebuffers and function pointers (and probably the bulk of the allocations happens on the driver side), but hitting over 100 MB with just "empty" contexts looks like too much.
I would like to know:
Why GL/CUDA context creation consumes such a considerable amount of memory?
Are there ways to optimize that?
The system setup under test:
Windows 10 64-bit, NVIDIA GTX 960 GPU (driver version 388.31), 8 GB RAM, Visual Studio 2015, 64-bit C++ console project.
I measure memory consumption using the Visual Studio built-in Diagnostic Tools -> Process Memory section.
UPDATE
I tried Process Explorer, as suggested by datenwolf. Here is a screenshot of what I got (my process at the bottom, marked in yellow):
I would appreciate some explanation of that info. I was always looking at "Private Bytes" in the "VS Diagnostic Tools" window, but here I also see "Working Set", "WS Private", etc. Which one correctly shows how much memory my process currently uses? 281,320 K looks like way too much, because, as I said above, at startup the process does nothing but create the CUDA and OpenGL contexts.
Partial answer: This is an OS-specific issue; on Linux, CUDA takes 9.3 MB.
I'm using CUDA (not OpenGL) on GNU/Linux:
CUDA version: 10.2.89
OS distribution: Devuan GNU/Linux Beowulf (~= Debian Buster without systemd)
Kernel: Linux 5.2.0
Processor: Intel x86_64
To check how much memory gets used by CUDA when creating a context, I ran the following C program (which also checks what happens after context destruction):
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <cuda.h>

static void print_allocation_stats(const char* s)
{
    printf("%s:\n", s);
    printf("--------------------------------------------------\n");
    malloc_stats();
    printf("--------------------------------------------------\n\n");
}

int main(void)
{
    print_allocation_stats("Initially");

    CUresult status = cuInit(0);
    if (status != CUDA_SUCCESS) { return EXIT_FAILURE; }
    print_allocation_stats("After CUDA driver initialization");

    int device_id = 0;
    unsigned flags = 0;
    CUcontext context_id;
    status = cuCtxCreate(&context_id, flags, device_id);
    if (status != CUDA_SUCCESS) { return EXIT_FAILURE; }
    print_allocation_stats("After context creation");

    status = cuCtxDestroy(context_id);
    if (status != CUDA_SUCCESS) { return EXIT_FAILURE; }
    print_allocation_stats("After context destruction");

    return EXIT_SUCCESS;
}
(Note that malloc_stats() is a glibc-specific function, not part of the standard library.)
Summarizing the results and snipping the irrelevant parts:

Point in program                    Total bytes    In-use     Max MMAP Regions    Max MMAP bytes
Initially                                135168       1632                   0                 0
After CUDA driver initialization         552960     439120                   2            307200
After context creation                  9314304    6858208                   8           6643712
After context destruction               7016448     580688                   8           6643712
So CUDA starts out using about 0.5 MB and, after creating a context, takes up 9.3 MB (dropping back to 7.0 MB when the context is destroyed). 9 MB is still a lot of memory for not having done anything; but maybe some of it is all-zeros, uninitialized, or copy-on-write, in which case it doesn't really occupy that much physical memory.
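One way to check how much of that is actually resident is to compare VmSize (total virtual size) against VmRSS (resident set) in /proc/self/status from inside the test program. A minimal sketch, assuming Linux and procfs (the helper name print_vm_usage is just an illustration):

/* Print VmSize (virtual) versus VmRSS (resident) for the current process,
 * using the fields documented in proc(5). Linux-specific. */
#include <stdio.h>
#include <string.h>

static void print_vm_usage(const char* label)
{
    FILE* f = fopen("/proc/self/status", "r");
    char line[256];

    if (f == NULL) { return; }
    printf("%s:\n", label);
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0) {
            fputs(line, stdout);
        }
    }
    fclose(f);
}

int main(void)
{
    print_vm_usage("Current process");
    return 0;
}

Calling a helper like this at the same points as print_allocation_stats() would show how much of the context's memory is actually backed by physical pages.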
It's possible that memory use improved dramatically over the two years between the driver releases for CUDA 8 and CUDA 10, but I doubt it. So it looks like your problem is Windows-specific.
Also, I should mention that I did not create an OpenGL context, which is the other part of the OP's question, so I haven't estimated how much memory that takes. The OP raises the question of whether the whole is greater than the sum of its parts, i.e. whether a CUDA context would take more memory if an OpenGL context existed as well; I believe this should not be the case, but readers are welcome to try and report...

What causes this error: "address already known to kernel for another [busy] synchronizer type"?

I have a customer who is getting their system log flooded with thousands of copies of this message:
Jul 25 11:21:33 athayer-mbp13 kernel[0]: PSYNCH: pid[52893]: address already known to kernel for another [busy] synchronizer type
The culprit is my app, but I can’t reproduce the problem and don’t have much of a clue to its cause. My app does disk searching, and this error happens about 15 hours into the life of the process. There is no excessive memory usage or file descriptor leakage. The app continues to operate normally, it’s just that these messages cause the system log to blow up to gigabyte proportions and fill up the boot disk.
I found the Darwin kernel code where the message is printed, but it's only a clue; it doesn't show the smoking gun:
http://opensource.apple.com//source/xnu/xnu-1699.32.7/bsd/kern/pthread_support.c
FAILEDUSERTEST("address already known to kernel for another (busy) synchronizer type\n");
It’s in this function:
/* find kernel waitqueue, if not present create one. Grants a reference */
int
ksyn_wqfind(user_addr_t mutex, uint32_t mgen, uint32_t ugen, uint32_t rw_wc, uint64_t tid, int flags, int wqtype, ksyn_wait_queue_t * kwqp)
Can anyone provide any insight into what’s going on?
Here’s the profile for the machine:
Model Name: MacBook Pro
Model Identifier: MacBookPro12,1
Processor Name: Intel Core i5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB
Boot ROM Version: MBP121.0167.B16
SMC Version (system): 2.28f7
Hardware UUID: 9205D058-90BF-541E-8E61-E75259ABC11F
System Software Overview:
System Version: OS X 10.11.4 (15E65)
Kernel Version: Darwin 15.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: athayer-mbp13
User Name: System Administrator (root)
Secure Virtual Memory: Enabled
system_integrity: integrity_enabled
Time since boot: 9 days 18:55
Possible Explanation
It's possible that you're being affected by an old kernel bug. If a pthread condition variable (part of the standard pthread mutex/condvar family of synchronization objects) is allocated but never waited on, there is a situation in which its object is never removed from a pthreads-internal registry on OS X.
If that happens, and if another mutex is later allocated that happens to end up at the same address in memory, and if that mutex is waited on, this error can occur, since the new mutex's ID will not match the one already registered for that address. This is distinct from a reallocation issue, where garbled/meaningless info would be found instead of a valid ID.
Workaround
The workaround is to ensure that you call a wait function on every mutex/condvar you create. Even a nanosecond wait will trigger "correct" destruction when it completes on a no-longer-used mutex. An example of the fix by the Chromium devs is linked below.
For example, you could wait one nanosecond/tick on a lock thus:
struct timespec time = { .tv_sec = 0, .tv_nsec = 1 };
pthread_cond_timedwait_relative_np(
    &some_condition_handle,
    &some_lock_handle,
    &time
);
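For reference, here is a minimal, self-contained sketch of that workaround, assuming macOS (pthread_cond_timedwait_relative_np is a non-portable Darwin extension) and a condvar that would otherwise never be waited on before teardown:

/* Sketch of the workaround: perform a one-nanosecond relative wait on a
 * condvar before destroying it, so the kernel-side synchronizer is
 * registered and then torn down correctly. macOS-only. */
#include <pthread.h>
#include <time.h>

int main(void)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    /* ... the condvar is created here but, in the problematic pattern,
       would never be signalled or waited on before teardown ... */

    struct timespec one_ns = { .tv_sec = 0, .tv_nsec = 1 };

    pthread_mutex_lock(&lock);                                  /* the mutex must be held */
    pthread_cond_timedwait_relative_np(&cond, &lock, &one_ns);  /* returns ETIMEDOUT almost immediately */
    pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return 0;
}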
Confounding Factors
The kernel bug may not be the real issue. There are a lot of confounding factors here:
The kernel source hasn't been published for 10.10 or 10.11, so the code being called that generates that error may not be the code that you found online.
As a result of that, the kernel bug I mentioned may not still exist, or may not be reachable in the same way.
The error line you published has square brackets ([]) around the word "busy", but the source you found has parentheses (()). The places in the code that print the two different messages are distinct from each other, so the problem lines might not be the ones you pointed out in your question.
Relevant Links
Article by the first (only?) person who has diagnosed this issue: http://rayne3d.com/blog/02-27-2014-rayne-weekly-devblog-4
The problem gets exhibited in the pthread source (or it was, in pthread 105.1.4), visible at this link (search in the page for 13782056): https://opensource.apple.com/source/libpthread/libpthread-105.1.4/src/pthread_cond.c
An example fix like the workaround listed above was made by the Chromium team when they were affected by a similar (the same?) issue: https://codereview.chromium.org/1323293005
The original Apple Developer Forum link appears to be defunct, though I might just be unable to access it: https://devforums.apple.com/thread/220316?tstart=0

What can my 32-bit app be doing that consumes gigabytes of physical RAM?

A co-worker mentioned to me a few months ago that one of our internal Delphi applications seems to be taking up 8 GB of RAM. I told him:
That's not possible
A 32-bit application only has a 32-bit virtual address space. Even if there were a memory leak, the most memory it could consume is 2 GB; after that, allocations would fail (as there would be no free space left in the virtual address space). And in the case of a memory leak, the leaked virtual pages would be swapped out to the pagefile, freeing up physical RAM.
But he noted that Windows Resource Monitor indicated that less than 1 GB of RAM was available on the system. And while our app was only using 220 MB of virtual memory, closing it freed up 8 GB of physical RAM.
So I tested it
I let the application run for a few weeks, and today I finally decided to test it.
First I looked at memory usage before closing the app, using Process Explorer:
the working set (RAM) is: 241 MB
total virtual memory used: 409 MB
And I used Resource Monitor to check memory used by the app, and total RAM in use:
virtual memory allocated by application: 252 MB
physical memory in use: 14 GB
And then memory usage after closing the app:
physical memory in use: 6.6 GB (7.4 GB less)
I also used Process Explorer to look at a breakdown of physical RAM use before and after. The only difference is that 8 GB of RAM really was uncommitted and now free:
Item                              Before        After
Commit Charge (K)                 15,516,388    7,264,420
Physical Memory Available (K)      1,959,480    9,990,012
Zeroed Paging List (K)               539,212    8,556,340
Note: It's somewhat interesting that Windows would waste time instantly zeroing out all that memory, rather than simply putting it on a standby list and zeroing it as needed (as memory requests have to be satisfied).
None of those things explain what the RAM was doing (What are you doing just sitting there! What do you contain!?)
What is in that memory?
That RAM must contain something useful; it must have some purpose. For that I turned to SysInternals' RAMMap. It can break down memory allocations.
The only clue that RAMMap provides is that the 8 GB of physical memory was associated with something called Session Private. These Session Private allocations are not associated with any process (i.e. not my process):
Item               Before      After
Session Private    8,031 MB    276 MB
Unused             1,111 MB    8,342 MB
I'm certainly not doing anything with EMS, XMS, AWE, etc.
What could possibly be happening in a 32-bit non-Administrator application that is causing Windows to allocate an additional 7 GB of RAM?
It's not a cache of swapped-out items
It's not a SuperFetch cache
It's just there, consuming RAM.
Session Private
The only information about "Session Private" memory is from a blog post announcing RAMMap:
Session Private: Memory that is private to a particular logged in session. This will be higher on RDS Session Host servers.
What kind of app is this?
This is a 32-bit native Windows application (i.e. not Java, not .NET). Because it is a native Windows application, it of course makes heavy use of the Windows API.
It should be noted that I wasn't asking people to debug the application; I was hoping a Windows developer out there would know why Windows might hold memory that I never allocated. Having said that, the only thing that has changed recently (in the last 2 or 3 years) that could cause such a thing is the feature that takes a screenshot every 5 minutes and saves it to the user's %LocalAppData% folder. A timer fires every five minutes:
QueueUserWorkItem(TakeScreenshotThreadProc);
And pseudo-code of the thread method:
void TakeScreenshotThreadProc(Pointer data)
{
    String szFolder = GetFolderPath(CSIDL_LOCAL_APPDATA);
    ForceDirectoryExists(szFolder);
    String szFile = szFolder + "\\" + FormatDateTime("yyyyMMdd'_'hhnnss", Now()) + ".jpg";

    Image destImage = new Image();
    try
    {
        CaptureDesktop(destImage);

        JPEGImage jpg = new JPEGImage();
        jpg.CopyFrom(destImage);
        jpg.CompressionQuality = 13;
        jpg.Compress();

        HANDLE hFile = CreateFile(szFile, GENERIC_WRITE,
                FILE_SHARE_READ | FILE_SHARE_WRITE, null, CREATE_ALWAYS,
                FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_ENCRYPTED, 0);
        //error checking elided
        try
        {
            Stream stm = new HandleStream(hFile);
            try
            {
                jpg.SaveToStream(stm);
            }
            finally
            {
                stm.Free();
            }
        }
        finally
        {
            CloseHandle(hFile);
        }
    }
    finally
    {
        destImage.Free();
    }
}
Most likely, somewhere in your application you are allocating system resources and not releasing them. Any WinAPI call that creates an object and returns a handle could be a suspect. For example (be careful running this on a system with limited memory - if you don't have 6 GB free it will page badly):
program Project1;

{$APPTYPE CONSOLE}

uses
  Windows;

var
  b : Array[0..3000000] of byte;
  i : integer;

begin
  for i := 1 to 2000 do
    CreateBitmap(1000, 1000, 3, 8, @b);
  ReadLn;
end.
This consumes 6GB of session memory due to the allocation of bitmap objects that are not subsequently released. Application memory consumption remains low because the objects are not created on the application's heap.
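For comparison, here is a hedged sketch of the same pattern in C with each handle released via DeleteObject, which keeps session memory from accumulating (purely illustrative; it does not imply your application calls CreateBitmap):

/* Same allocation pattern as above, but every GDI bitmap handle is
 * released immediately, so no session memory builds up. */
#include <windows.h>

int main(void)
{
    static BYTE bits[3000001];   /* matches the buffer size in the Delphi example */

    for (int i = 0; i < 2000; ++i) {
        HBITMAP bmp = CreateBitmap(1000, 1000, 3, 8, bits);
        if (bmp != NULL) {
            DeleteObject(bmp);   /* release the GDI object */
        }
    }
    return 0;
}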
Without knowing more about your application, however, it is very difficult to be more specific. The above is one way to demonstrate the behaviour you are observing. Beyond that, I think you need to debug.
In this case, there are a large number of GDI objects allocated - this isn't necessarily indicative, however, since there are often a large number of small GDI objects allocated in an application rather than a large number of large objects (The Delphi IDE, for example, will routinely create >3000 GDI objects and this is not necessarily a problem).
In @Abelisto's example (from the comments), by contrast:
program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  i : integer;
  sr : TSearchRec;

begin
  for i := 1 to 1000000 do
    FindFirst('c:\*', faAnyFile, sr);
  ReadLn;
end.
Here the returned handles are not GDI objects but rather search handles (which fall under the general category of kernel objects), and we can see that the process uses a large number of handles. Again, process memory consumption is low, but there is a large increase in session memory used.
Similarly, the objects might be User Objects - these are created by calls to things like CreateWindow, CreateCursor, or by setting hooks with SetWindowsHookEx. For a list of WinAPI calls that create objects and return handles of each type, see:
Handles and Objects : Object Categories -- MSDN
This can help you start to track down the issue by narrowing it to the type of call that could be causing the problem. It may also be in a buggy third-party component, if you are using any.
A tool like AQTime can profile Windows allocations, but I'm not sure if there is a version that supports Delphi 5. There may be other allocation profilers that can help track this down.
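Short of a full profiler, one quick check is to poll the per-process object counts from inside the application and watch which category grows over time. A rough sketch using the Win32 GetGuiResources and GetProcessHandleCount calls (the one-second sampling loop is arbitrary):

/* Periodically print the current process's GDI object, USER object and
 * kernel handle counts, to see which category keeps growing. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE self = GetCurrentProcess();

    for (int i = 0; i < 60; ++i) {
        DWORD gdi = GetGuiResources(self, GR_GDIOBJECTS);
        DWORD usr = GetGuiResources(self, GR_USEROBJECTS);
        DWORD handles = 0;
        GetProcessHandleCount(self, &handles);

        printf("GDI: %lu  USER: %lu  kernel handles: %lu\n",
               (unsigned long)gdi, (unsigned long)usr, (unsigned long)handles);
        Sleep(1000);   /* sample once per second */
    }
    return 0;
}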

Unexpected heap dumps for a Hello World Android app

I am learning about memory utilization using MAT in Eclipse, though I have run into a strange problem. Leaving aside heavy apps, I began with the most benign: the "Hello World" app. This is what I get as heap stats on a Nexus 5, ART runtime, Lollipop 5.0.1:
ID: 1
Heap Size: 25.429 MB
Allocated: 15.257 MB
Free: 10.172 MB
% Used: 60%
# Objects: 43487
My Heap dump gives me 3 Memory Leak suspects:
Overview
"Can't post the Pie Chart because of low reputation."
Problem Suspect 1
The class "android.content.res.Resources", loaded by "", occupies 10,166,936 (38.00%) bytes. The memory is accumulated in one instance of "android.util.LongSparseArray[]" loaded by "".
Keywords: android.util.LongSparseArray[], android.content.res.Resources
Problem Suspect 2
209 instances of "android.graphics.NinePatch", loaded by "", occupy 5,679,088 (21.22%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "".
Keywords: java.lang.Object[], android.graphics.NinePatch
Problem Suspect 3
8 instances of "java.lang.reflect.ArtMethod[]", loaded by "", occupy 3,630,376 (13.57%) bytes. Biggest instances:
java.lang.reflect.ArtMethod[62114] @ 0x70b19178 - 1,888,776 (7.06%) bytes
java.lang.reflect.ArtMethod[21798] @ 0x706f5a78 - 782,800 (2.93%) bytes
java.lang.reflect.ArtMethod[24079] @ 0x70a9db88 - 546,976 (2.04%) bytes
Keywords: java.lang.reflect.ArtMethod[]
All of this comes from this simple code:
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
    }
}
Questions
Why are the heap numbers so big?
Also, as a side note, the app was consuming 52 MB of RAM on the system.
Where are these 209 instances of NinePatch coming from? I merely created the project by doing "Create a new Project" in Eclipse.
The first leak suspect, Resources, comes up all the time in my analysis of apps. Is it really a suspect?
What is ArtMethod? Does it have something to do with the ART runtime?
In Lollipop the default runtime is ART (Android Runtime), which replaces the old Dalvik Runtime (DRT) used in older Android versions.
In KitKat, Google released an experimental version of ART to get feedback from users.
Dalvik uses JIT (just-in-time) compilation, which means the DEX code is converted to native code only when you open the application.
In ART, however, the DEX code is converted to native code during installation (AOT, ahead-of-time compilation). This native code is larger than the DEX code, so ART needs more RAM than DRT. The advantage of ART is that ART apps have better response times than DRT apps.
I faced this problem yesterday too. In your log the key word is "NinePatch". In my case the cause was a "fake" shadow: a tiny picture with an alpha channel that triggered a resource leak. It cost me about 60 MB of leaked memory.

Why is the core file larger than virtual memory?

I have a multithreaded program running which crashes after a day or two. Moreover the gdb backtrace of the core dump does not lead anywhere. There are no symbols at the point where it crashes.
Now, the machine that generates the core file has 3 GB of physical memory and 5 GB of swap space, but the core dump we get is around 25 GB. Isn't the core dump essentially a memory dump? Why is the core dump so large?
And can anyone give me a lead on how to debug such a situation?
If you are running a 64-bit OS then you can have file-backed mappings that exceed the amount of available physical memory plus swap space many times over.
Since kernel version 2.6.23, Linux provides a mechanism to control what gets included in the core dump file, called core dump filter. The value of the filter is a bit-field manipulated via the /proc/<pid>/coredump_filter file (see core(5) man page):
bit 0 (0x01) - anonymous private mappings (e.g. dynamically allocated memory)
bit 1 (0x02) - anonymous shared mappings
bit 2 (0x04) - file-backed private mappings
bit 3 (0x08) - file-backed shared mappings (e.g. shared libraries)
bit 4 (0x10) - ELF headers
bit 5 (0x20) - private huge pages
bit 6 (0x40) - shared huge pages
The default value is 0x33, which corresponds to dumping all anonymous mappings as well as the ELF headers (but only if the kernel is compiled with CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS) and the private huge pages. Reading from this file returns the hexadecimal value of the filter. Writing a new hexadecimal value to coredump_filter changes the filter for that particular process; e.g. to enable dumping of all possible mappings, one would run:
echo 0x7f > /proc/<pid>/coredump_filter
(where <pid> is the PID of the process)
The value of the core dump filter is inherited by child processes created by fork().
Some Linux distributions might change the filter value for the init process early in the OS boot stage, e.g. to enable dumping the file-backed mappings. This would then affect any process started later.
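A process can also set its own filter programmatically before any crash happens, by writing to /proc/self/coredump_filter. A minimal sketch, assuming Linux with a 2.6.23+ kernel (the value 0x03, which limits the dump to anonymous mappings, is only an example):

/* Restrict this process's core dumps to anonymous private and anonymous
 * shared mappings by writing to /proc/self/coredump_filter. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE* f = fopen("/proc/self/coredump_filter", "w");

    if (f == NULL) {
        perror("coredump_filter not available");
        return EXIT_FAILURE;
    }
    fprintf(f, "0x03");   /* bits 0 and 1: anonymous private + anonymous shared */
    fclose(f);

    /* ... rest of the program; a later crash produces a core dump that
       honors the filter written above ... */
    return EXIT_SUCCESS;
}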
A core dump contains more than just the state of the memory of the process. See the answer at https://stackoverflow.com/a/5321564/91757 for examples of other information included in the core dump (on Linux).
