Xamarin Android no longer printing MONO GC log messages

I have a Xamarin.Forms app that runs on Android, which has been in development for several years. Until recently, when running the app I would see MONO GC messages whenever a garbage collection occurred, similar to the following:
04-11 16:46:22.658 D/Mono (14892): GC_BRIDGE waiting for bridge processing to finish
04-11 16:46:22.661 D/Mono (14892): GC_TAR_BRIDGE bridges 424 objects 2303 colors 445 ignored 814 sccs 424 xref 150 cache 0/0 setup 0.16ms tarjan 2.39ms scc-setup 0.20ms gather-xref 0.06ms xref-setup 0.02ms cleanup 0.31ms
04-11 16:46:22.661 D/Mono (14892): GC_BRIDGE: Complete, was running for 49.28ms
04-11 16:46:22.661 D/Mono (14892): GC_MAJOR: (LOS overflow) time 27.89ms, stw 29.64ms los size: 2048K in use: 467K
04-11 16:46:22.661 D/Mono (14892): GC_MAJOR_SWEEP: major size: 3936K in use: 2558K
But now when I run the app I do NOT see any of these types of messages when a GC is performed. I see output from ART, such as:
[art] Starting a blocking GC Explicit
[art] Explicit concurrent mark sweep GC freed 3(72B) AllocSpace objects, 0(0B) LOS objects, 24% free, 1693KB/2MB, paused 712us total 30.887ms
When I use the environment.txt file to configure more verbose logging (e.g. MONO_LOG_LEVEL=info), I get one additional line of output from Mono, like:
[monodroid-gc] GC cleanup summary: 25 objects tested - resurrecting 25.
But this is not really useful to me. I want to see the information from GC_BRIDGE, GC_MAJOR, and GC_MAJOR_SWEEP. I have searched and searched and cannot find anyone else reporting this issue, but it seems that something changed in Xamarin.Android at some point.
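For reference, the environment.txt entries are plain name=value pairs; this is roughly what I am experimenting with (the MONO_LOG_MASK value is a guess based on the Mono documentation, so it may not be the switch that controls these messages):
MONO_LOG_LEVEL=info
MONO_LOG_MASK=gc
I have also seen the debug.mono.log system property suggested for enabling GC messages (adb shell setprop debug.mono.log gc), but I am not sure whether that is what controls the GC_BRIDGE/GC_MAJOR output.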

Related

What causes the dynamically allocated memory error messages in TwinCAT 4024 and how do I get rid of them?

We have a project which was made with 4022.29 originally. We also tried to run the project with TwinCAT 4024.x. When the configuration is activated for the first time on my local machine it runs fine. However, when I restart the project or activate the configuration I get the following error messages:
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance tried to free pointer 0xffff9e02fe620bd8 which was not allocated by the PLC instance.
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance tried to free pointer 0xffff9e02fe620b48 which was not allocated by the PLC instance.
… (~20 more error messages)
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance tried to free pointer 0xffff9e02fe61fe28 which was not allocated by the PLC instance.
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance did not free dynamically allocated memory block at address 0xffff9e02fe616878 of size 65.
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance did not free dynamically allocated memory block at address 0xffff9e02fe6167d8 of size 65.
… (~20 more error messages)
Error 27.08.2019 14:06:37 322 ms | 'Port_851' (851): PLC: PLC instance xxx Instance did not free dynamically allocated memory block at address 0xffff9e02fe615978 of size 55.
What causes these error messages? Why do they suddenly show up? Should I get rid of them and if yes how do I do it?
Partial answer
This answer could use some improvement and a better understanding. I'll post it here to collect some information on the solutions.
Origin
From Beckhoff support:
The error messages you receive point to dynamically allocated memory in the router memory (such as from __new()) or to interface pointers that were not released.
The mentioned error messages were introduced with the 4024 release. In older versions of TwinCAT we were not able to detect such memory leaks.
How I got rid of them
I am not quite sure how to put the above into my own words, due to my limited understanding. However, I did track down the origin of the error messages in our project.
Binary search
I used a binary search to track down the origin of the issue. First I kept disabling half of the active tasks of the project until the error messages disappeared. Then I re-enabled tasks until I had found the specific task causing the issue. After that I did the same with the code running in that task, enabling/disabling half of the remaining code each time, to narrow down the code causing the issues.
Origin
In the end I found a function block which used other function blocks as its input. When I changed the inputs from
VAR_INPUT
fbSomeFb : FB_SomeFB;
END_VAR
into
VAR_INPUT
fbSomeFb : REFERENCE TO FB_SomeFB;
END_VAR
the error messages disappeared when the project was restarted.
Fixed another issue
After getting rid of these error messages, another issue with this particular project was solved. We always had the issue that the PLC crashed and restarted when we activated a configuration or put it into config mode. This only happened on the machine PLC (not on any of our development PLCs).
You allocated memory on the heap for an object using the __NEW function, so you need to deallocate it. With dynamic memory allocation you need to deallocate an object once you're done using it.
The way to do it in TwinCAT is to use the __DELETE function.
If you're using __NEW in a Function Block (FB), you can simply deallocate the object in the FB_Exit(...) method by calling the __DELETE function there.
e.g.
In FB_Init(...) you put:
pData := __NEW(INT);
In FB_Exit(...) you put:
__DELETE(pData);
FB_Exit(...) will be called whenever your FB moves out of scope. This will automatically deallocate the object from memory.
If you don't want to use FB_Exit(...), you need to think carefully about the conditions under which your program should deallocate the object you created.
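To illustrate the pattern, here is a minimal sketch of an FB that owns a dynamically allocated object (the FB name and member are made up; the FB_Init/FB_Exit parameter lists follow the standard TwinCAT 3 signatures):
FUNCTION_BLOCK FB_Example
VAR
    pData : POINTER TO INT; // owned heap allocation
END_VAR

METHOD FB_Init : BOOL
VAR_INPUT
    bInitRetains : BOOL; // TRUE: retain variables are being initialized
    bInCopyCode  : BOOL; // TRUE: instance is being copied (online change)
END_VAR
pData := __NEW(INT); // allocate the object on the heap

METHOD FB_Exit : BOOL
VAR_INPUT
    bInCopyCode : BOOL; // TRUE: exit is due to an online change
END_VAR
IF pData <> 0 THEN
    __DELETE(pData); // releases the memory and sets pData back to 0
END_IF
Because FB_Exit runs when the instance is removed (for example on a download, reset, or online change), the allocation is released with the instance, which should avoid the "did not free dynamically allocated memory block" messages for this kind of allocation.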

.NET application handle leak, how to locate the source?

I have a .NET application running in a production environment (Windows XP + .NET 3.5 SP1) with a stable handle count of around 2,000, but in some unknown situation its handle count increases extremely fast and the application finally crashes (over 10,000 handles, as monitored with the PerfMon tool).
I've taken a memory dump during the increasing period (before the crash) and loaded it into WinDbg, which shows the overall handle summary:
0:000> !handle 0 0
7229 Handles
Type Count
None 19
Event 504
Section 6108
File 262
Port 15
Directory 3
Mutant 56
WindowStation 2
Semaphore 70
Key 97
Token 2
Process 3
Thread 75
Desktop 1
IoCompletion 9
Timer 2
KeyedEvent 1
  
So, no surprise, the leaking type is Section. Digging further:
0:000> !handle 0 ff Section
Handle 00007114
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.AKCHAC.CGOOBGKD
No object specific information available
Handle 00007134
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.GKCHAC.KCLBDGKD
No object specific information available
...
...
...
...
6108 handles of type Section
You can see that the BaseNamedObjects all follow the naming convention MSCTF.MarshalInterface.FileMap.IBC.***.*****.
Basically I am stuck here and cannot go any further to link this information back to my application.
Can anyone help?
[Edit0]
I tried several combinations of the GFlags command (+ust, or via the UI) with no luck; the dumps opened in WinDbg always showed nothing via !htrace. So I had to attach to the process instead, which finally got me the stack for the leaking handle above:
0:033> !htrace 1758
--------------------------------------
Handle = 0x00001758 - OPEN
Thread ID = 0x00000768, Process ID = 0x00001784
0x7c809543: KERNEL32!CreateFileMappingA+0x0000006e
0x74723917: MSCTF!CCicFileMappingStatic::Create+0x00000022
0x7473fc0f: MSCTF!CicCoMarshalInterface+0x000000f8
0x747408e9: MSCTF!CStub::stub_OutParam+0x00000110
0x74742b05: MSCTF!CStubIUnknown::stub_QueryInterface+0x0000009e
0x74743e75: MSCTF!CStubITfLangBarItem::Invoke+0x00000014
0x7473fdb9: MSCTF!HandleSendReceiveMsg+0x00000171
0x7474037f: MSCTF!CicMarshalWndProc+0x00000161
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\system32\USER32.dll -
0x7e418734: USER32!GetDC+0x0000006d
0x7e418816: USER32!GetDC+0x0000014f
0x7e4189cd: USER32!GetWindowLongW+0x00000127
--------------------------------------
And then I got stuck again: the stack does not seem to contain any of our user code. Any suggestions on how to move forward?
WinDbg isn't the ideal tool for memory leaks, especially not without preparation in advance.
There's a GFlags option (+ust) which can be enabled for a process to record the stack trace for handle allocations. If you don't have this flag enabled, you'll probably not get more information out of your dump. If you do have it, use !htrace to see the stack.
You can also try UMDH (user-mode dump heap), which is a free tool, or get something like Memory Validator, which certainly has better usability and so might pay off in the long run.
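To make this concrete, enabling the flag for your process image and then reading a trace would look roughly like this (the image name is a placeholder, and the handle value is just one of the Section handles from the !handle output above):
gflags /i YourApp.exe +ust
Restart the process so the flag takes effect, reproduce the leak, then in WinDbg (attached or on a fresh dump) run !htrace against one of the leaking handles, e.g.:
!htrace 7114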

Unexpected Heap Dumps for Hello World Android APP

I am learning about memory utilization using MAT in Eclipse, but I have run into a strange problem. Leaving the heavy apps aside, I began with the most benign one: the "Hello World" app. This is what I get as heap stats on a Nexus 5, ART runtime, Lollipop 5.0.1.
ID: 1
Heap Size: 25.429 MB
Allocated: 15.257 MB
Free: 10.172 MB
% Used: 60%
# Objects: 43487
My Heap dump gives me 3 Memory Leak suspects:
Overview
"Can't post the Pie Chart because of low reputation."
Problem Suspect 1
The class "android.content.res.Resources", loaded by "", occupies 10,166,936 (38.00%) bytes. The memory is
accumulated in one instance of "android.util.LongSparseArray[]" loaded
by "".
Keywords android.util.LongSparseArray[] android.content.res.Resources
Problem Suspect 2
209 instances of "android.graphics.NinePatch", loaded by "" occupy 5,679,088 (21.22%) bytes. These instances are
referenced from one instance of "java.lang.Object[]", loaded by
"" Keywords java.lang.Object[]
android.graphics.NinePatch
Problem Suspect 3
8 instances of "java.lang.reflect.ArtMethod[]", loaded by "" occupy 3,630,376 (13.57%) bytes. Biggest instances:
•java.lang.reflect.ArtMethod[62114] # 0x70b19178 - 1,888,776 (7.06%)
bytes. •java.lang.reflect.ArtMethod[21798] # 0x706f5a78 - 782,800
(2.93%) bytes. •java.lang.reflect.ArtMethod[24079] # 0x70a9db88 -
546,976 (2.04%) bytes. Keywords java.lang.reflect.ArtMethod[]
All of this comes from this simple piece of code:
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
    }
}
Questions
Why are the heap numbers so big?
Also, as a side note, the app was consuming 52 MB of RAM in the system.
Where are these 209 instances of NinePatch coming from? I merely created the project by doing "Create a new Project" in Eclipse.
The first leak suspect, Resources, comes up all the time in my analysis of apps. Is it really a suspect?
What is ArtMethod? Does it have something to do with the ART runtime?
In Lollipop the default runtime is ART, i.e. the Android Runtime, which replaces the old Dalvik runtime (DRT) used in older Android versions.
In KitKat, Google released an experimental version of ART to get feedback from users.
Dalvik uses JIT (just-in-time) compilation, which means the DEX code is converted to object code only when you open the application.
In ART, however, the DEX code is converted to object code (AOT, ahead-of-time compilation) during installation itself. The size of this object code is bigger than that of the DEX code, so ART needs more RAM than DRT. The advantage of ART is that ART apps have better response times than DRT apps.
Yesterday I faced this problem too. In your log the key word is "NinePatch". In my case the cause was a "fake" shadow: a tiny picture with an alpha channel which triggered a resource leak. It cost me about 60 MB of leaked memory.

Transparent huge pages disabled but compact_stall is not null

We noticed large JVM pauses during garbage collection where the user and system times are much smaller than the total time: [Times: user=3.99 sys=0.55, real=34.29 secs]. We suspected it could be due to memory management and checked the transparent huge pages configuration, which shows it is disabled:
/sys/kernel/mm/redhat_transparent_hugepage/enabled:always [never]
/sys/kernel/mm/redhat_transparent_hugepage/defrag:[always] never
/sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag:[yes] no
However, looking at the THP and related counters, we see a lot of compaction stalls:
egrep 'trans|thp|compact_' /proc/vmstat
nr_anon_transparent_hugepages 0
compact_blocks_moved 113682
compact_pages_moved 3535156
compact_pagemigrate_failed 0
compact_stall 1944
compact_fail 186
compact_success 1758
thp_fault_alloc 6
thp_fault_fallback 0
thp_collapse_alloc 15
thp_collapse_alloc_failed 0
thp_split 17
So the question is: why are the THP and compaction stall/fail counters not 0 if THP is disabled, and how can we disable compaction so it does not interfere with our JVM (which we believe is the reason for the long GC pauses)?
This is happening on RHEL 6.2, kernel 2.6.32-279.5.2.el6.x86_64, JVM 6u21 32-bit. Thanks!
To really get rid of THP you must make sure that not only the THP daemon is disabled, but also the THP defrag tool. defrag runs independently of THP, while the setting in /sys/kernel/mm/khugepaged/defrag only controls whether the THP daemon (khugepaged) may run defrag as well.
That means: even if your applications don't get the (potential) benefit of THP, the defragmentation process which makes your system stall is still active.
It is recommended to use the distribution-independent path for controlling the THP and defrag settings:
/sys/kernel/mm/transparent_hugepage/ (which may be a symlink to /sys/kernel/mm/redhat_transparent_hugepage)
This results in:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
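To check that the settings took effect, read the files back; the active value is shown in brackets, so both should now report [never]:
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
Note that these echo commands do not survive a reboot, so they typically also need to go into a boot script (e.g. /etc/rc.local on RHEL 6).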
If you are running a Java application and want to know whether THP/defrag is causing JVM pauses or stalls, it may be worth having a look at your GC log. With -XX:+PrintGCDetails enabled, you may observe "real" times that are significantly longer than the sys/user times.
In my case the following one-liner was sufficient:
less gc.log | grep sys=0 | grep user=0 | grep -P "real=[1-9]"
The earliest description of the negative effects of THP is, as far as I know, this blog post by Greg Rahn: http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/

Why are half the profiling samples "Unknown Frame(s)"?

I'm trying to profile our application by:
Compiling with no optimizations
Linking the C++ code with /profile and debug information.
Doing the command-line profiling dance:
vsperfcmd /start:sample /output:profile
vsperfcmd /globalon
vsperfcmd /launch:application.exe /timer:50000
The profiling works, but for some reason, about 50% of the samples are not identified:
Function Name Inclusive Samples Exclusive Samples
Unknown Frame(s) 55.01% 47.51% <-- WHAT IS THIS?
_wWinMainCRTStartup 54.79% 0.00%
[mfc100u.dll] 47.95% 1.56%
__tmainCRTStartup 42.75% 0.00%
I'm guessing that it is not one specific function it can't identify, but that it groups all unidentified functions into a single "function". This makes it hard to reason about, since that "function" will be called from many functions and similarly calls many functions, most of them unrelated.
One would think that it should at least be able to figure out which module each sample was taken from?
