Weblogic server down with outofmemory - weblogic-10.x

Is there any way to proactively detect that my server is about to go down with an out-of-memory error in the next few minutes?
Are there any steps to find that out? I want to know that the server is going to run out of memory a few minutes before it happens.

I believe one of the only solutions is to keep an eye on the log files. A standard message, '% of the server memory is free', is written to the log files at a defined time interval. If this figure is consistently low and shrinking, then you know something is wrong.
This document also suggests extra methods to add logging and increase alert levels for low memory conditions.
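As a rough illustration of watching that figure, here is a minimal Python sketch that pulls the free-memory percentages out of log lines and flags a low, shrinking trend. The exact wording of the log line, the sample lines, the 10% threshold and the window of 3 readings are all assumptions for the example, not WebLogic defaults; adjust the pattern to whatever your log actually contains.

```python
import re

# Assumed shape of the periodic health message; adjust to match your log.
FREE_MEMORY = re.compile(r"(\d+)% of the .*memory.* is free")

def free_memory_readings(lines):
    """Extract the free-memory percentages, in log order."""
    readings = []
    for line in lines:
        match = FREE_MEMORY.search(line)
        if match:
            readings.append(int(match.group(1)))
    return readings

def looks_like_trouble(readings, threshold=10, window=3):
    """True if the last `window` readings are all below `threshold`
    and none of them is higher than the reading before it."""
    recent = readings[-window:]
    if len(recent) < window:
        return False
    low = all(r < threshold for r in recent)
    shrinking = all(b <= a for a, b in zip(recent, recent[1:]))
    return low and shrinking

# Hypothetical sample lines, made up for this example.
sample = [
    "<Info> <Health> <37% of the total memory in the server is free>",
    "<Info> <Health> <9% of the total memory in the server is free>",
    "<Info> <Health> <6% of the total memory in the server is free>",
    "<Info> <Health> <4% of the total memory in the server is free>",
]
readings = free_memory_readings(sample)
alert = looks_like_trouble(readings)
```

A script like this could be run periodically against the tail of the server log and wired to whatever alerting you already have.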


Lost Duration while Debugging Apex CPU time limit exceeded

I'm open to posting the code in this section to work through the optimization, but it's a bit lengthy and complex, so instead I'm hoping that somebody can assist me with a few debugging questions I have. My goal is to find out what is causing my Apex CPU Time Limit Exceeded issue.
When using the Debug Log in its basic or normal layout I receive the message
Maximum CPU Time: 15062 out of 10,000 ** Close to Limit
I've optimized and rewritten various loops and queries several times now, and in each case this number ends up around the same value, which leads me to believe it is lying to me and that my actual usage far exceeds that number. So I switched the Log Panels of the Developer Console to Analysis in the hope of isolating exactly which loop, method, or area of the code is giving me a headache.
This leads me to my main question and problem.
Execution Tree, Performance Tree & Executed Units
All show me that my durations are UNDER the 10,000ms allowance. My largest consumption is 3,556.19ms, which is spent in the constructor of a wrapper class I created; there is a fair amount of logic there that builds a fairly complicated wrapper spanning 5-7 custom objects. Even with those 3,000ms, the remainder of the process shows negligible times, bringing my total to around 4,000ms. Again, my question is: why am I unable to see or find what is consuming all my time?
Incorrect Iteration Data
In addition to this, on the Performance tree there is a column of data that shows the number of iterations for each method. I know that my Production Org has 81 objects that would each call the constructor for my custom wrapper object, i.e. my constructor SHOULD be called 81 times, but instead it is called 32 times. So my other question is: can I rely on the iteration data in that column? Or does it stop counting at a certain point because it was iterating so many times? It's possible that one of my objects is corrupted or causing an infinite loop somehow, but I don't want to dig through all the data in search of that conclusion if it's a known issue that the iteration data is not accurate anyway.
System.Debug in the Production org
The last question is why my System.debug() lines are not displaying in my Developer Console on the production org. I've added several breadcrumbs throughout the code that would help me isolate which objects are making it through and which are not; however, I cannot see System.debug messages in any layout outside of my Sandbox.
Sorry for the wealth of questions but I did want to give an honest effort to better understand the debugging process in Salesforce. If this is a lost cause I'm happy to start sharing some code as well but hopefully some debugging tips can get me to the solution.
It's likely your debug log got truncated, see "Each debug log must be 20 MB or smaller. If it exceeds this amount, you won’t see everything you need." in https://trailhead.salesforce.com/en/content/learn/modules/apex_basics_dotnet/debugging_diagnostics
Download the log and search for text similar to "skipped 123456 bytes of detailed log" to confirm. If the log was truncated, some System.debug statements will simply not show up.
You might have to fine-tune the log levels (don't log validation rules and workflows? don't log every single variable assignment with "FINE" level etc). You might have to set all flags to NONE, then track only 1 particular class/trigger that you suspect (see https://help.salesforce.com/articleView?id=code_debug_log_classes.htm&type=5 and https://salesforce.stackexchange.com/questions/214380/how-are-we-supposed-to-use-debug-logs-for-a-specific-apex-class-only)
If it's truncated, it's possible the analysis tools give up. (I had mixed luck with the console, to be honest; sometimes https://apextimeline.herokuapp.com/ is great for getting an overview, but it will also fail to parse a 20 MB log.)
When all else fails, you can load the log into Notepad++ (or any editor of your choice), find the lines related to method entry and method exit (you might need a regular expression search), copy these filtered lines to Excel, play with "Text to Columns" and look at the timings manually to see if there's a record that causes the spike. It could be record #10 that's the problem; the fact that the limit is exhausted at #32 of 81 doesn't mean much. A regular expression search like (METHOD_ENTRY|METHOD_EXIT).*MyTriggerHandler\.onBeforeUpdate could be a good start. But the first thing is to make sure the log is not truncated.
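If Excel feels too manual, the same entry/exit pairing can be scripted. Below is a minimal Python sketch; it assumes each log line carries the elapsed nanoseconds in parentheses after the timestamp and ends with the method signature, and that the signature text is identical on the entry and exit lines. The sample lines and the Wrapper.build() name are made up, so check them against your real log before trusting the numbers.

```python
import re
from collections import defaultdict

# Assumed line shape: "HH:MM:SS.mmm (elapsed_nanos)|EVENT|...|Method.name()"
LINE = re.compile(r"\((\d+)\)\|(METHOD_ENTRY|METHOD_EXIT)\|.*\|([^|]+)$")

def method_durations_ms(lines):
    """Pair each METHOD_ENTRY with the next METHOD_EXIT of the same
    method and return one (method, duration_ms) tuple per call."""
    open_calls = defaultdict(list)  # method name -> stack of entry times
    calls = []
    for line in lines:
        match = LINE.search(line.strip())
        if not match:
            continue
        nanos = int(match.group(1))
        event = match.group(2)
        method = match.group(3)
        if event == "METHOD_ENTRY":
            open_calls[method].append(nanos)
        elif open_calls[method]:
            started = open_calls[method].pop()
            calls.append((method, (nanos - started) / 1_000_000))
    return calls

# Hypothetical log excerpt: two calls, the second one is the spike.
sample = [
    "10:00:00.001 (1000000)|METHOD_ENTRY|[5]|01p000000000001|Wrapper.build()",
    "10:00:00.041 (41000000)|METHOD_EXIT|[5]|01p000000000001|Wrapper.build()",
    "10:00:00.050 (50000000)|METHOD_ENTRY|[5]|01p000000000001|Wrapper.build()",
    "10:00:03.550 (3550000000)|METHOD_EXIT|[5]|01p000000000001|Wrapper.build()",
]
calls = method_durations_ms(sample)
slowest = max(calls, key=lambda call: call[1])
```

Listing the calls in order makes it easy to see which invocation (the 10th, the 32nd, ...) is the expensive one, which is the point of the manual approach above.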

RabbitMQ Message size limitation?

I am trying to gauge the performance of RabbitMQ when my message size increases to a few MB. However, even when I send a 32KB message, I get a "Resource temporarily unavailable" message from the server. There's no error in the log files, and there are no errors about reaching memory limits... How do I go about debugging this issue?
If it's of any help, I'm running this on an EC2 t1.micro instance, so 592MB RAM.
According to the bug you linked, someone recently (looks like after you left the link to the bug) left a comment that they can reliably reproduce the bug when the message size is >=15821 bytes.
I would recommend that you see if that also holds true for you -- i.e. can you also reproduce at that threshold -- and then evaluate if under that amount -- thus avoiding the bug documented in the issue above -- is a sufficient size for your needs. If not, you may want to try pika (https://github.com/pika/pika) and see if that works better with larger messages (one of the other comments on that bug suggests that pika did work for them with larger message sizes).
Another option that may work, depending on your exact use case, would be to include in the RabbitMQ message payload a key that allows you to fetch the large blob of data from wherever it's stored (Postgres, MongoDB, etc.) when you consume the message, and therefore avoid the bug. Perhaps not ideal if you really want to encapsulate everything inside the payload, but it may be a feasible workaround.
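To make the key-in-the-payload idea concrete, here is a minimal Python sketch. A dict stands in for the external store and a list stands in for the queue, so nothing here touches RabbitMQ, rabbitpy or pika; the function names and the message fields (blob_key, size) are hypothetical.

```python
import json
import uuid

blob_store = {}  # stand-in for Postgres, MongoDB, etc.
queue = []       # stand-in for a RabbitMQ queue

def publish_large(payload: bytes) -> str:
    """Store the blob externally and enqueue only a small pointer message."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    queue.append(json.dumps({"blob_key": key, "size": len(payload)}))
    return key

def consume() -> bytes:
    """Take the next pointer message and fetch (and remove) the blob."""
    message = json.loads(queue.pop(0))
    return blob_store.pop(message["blob_key"])

big = b"x" * (2 * 1024 * 1024)  # 2 MB payload
publish_large(big)
message_size = len(queue[0])    # size of what actually goes on the queue
received = consume()
```

The message on the queue stays well under a hundred bytes regardless of the blob size, which keeps it below the threshold mentioned in the bug report. The trade-off is that the consumer now depends on the store being reachable and on someone cleaning up blobs whose messages are never consumed.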
In terms of debugging, since it appears that this is a bug with rabbitpy itself, I think you would need to debug the actual rabbitpy library if you wanted to proceed on that front. Doable, but perhaps not feasible due to time, etc.

XPages performance - 2 apps on same server, 1 runs and 1 doesn't

We have been having a bit of a nightmare this last week with a business critical XPage application, all of a sudden it has started crawling really badly, to the point where I have to reboot the server daily and even then some pages can take 30 seconds to open.
The server has 12GB RAM, and 2 CPUs, I am waiting for another 2 to be added to see if this helps.
The database has around 100,000 documents in it, with no more than 50,000 displayed in any one view.
The same database set up as a training application, with far fewer documents and on the same server, always responds, even when the main copy is crawling.
There are a number of view panels in this application - I have read these are really slow. Should I get rid of them and replace with a Repeat control?
There are also Readers fields on the documents containing Roles, and Authors fields, as it's a workflow application.
I removed quite a few unnecessary views from the back end over the weekend to help speed it up but that has done very little.
Any ideas where I can check to see what's causing this massive performance hit? It's only really become unworkable in the last week but as far as I know nothing in the design has changed, apart from me deleting some old views.
Try to get more info about state of your server and application.
Hardware troubleshooting is summarized here: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Domino_Server_performance_troubleshooting_best_practices
From what you describe, only one of the two applications is slow, so it is more likely a code problem. The best thing is to profile your code: http://www.openntf.org/main.nsf/blog.xsp?permaLink=NHEF-84X8MU
To go deeper you can start to look for semaphore locks: http://www-01.ibm.com/support/docview.wss?uid=swg21094630, or look at javadumps: http://lazynotesguy.net/blog/2013/10/04/peeking-inside-jvms-heap-part-2-usage/ and NSDs: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Using_NSD_A_Practical_Guide/$file/HND202%20-%20LAB.pdf and at the garbage collector settings (see "Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit").
This presentation gives a good overview of Domino troubleshooting (among many others on the web).
Ok so we resolved the performance issues by doing a number of things. I'll list the changes we did in order of the improvement gained, starting with the simple tweaks that weren't really noticeable.
Defrag Domino drive - it was showing as 32% fragmented and I thought I was on to a winner but it was really no better after the defrag. Even though IBM docs say even 1% fragmentation can cause performance issues.
Reviewed all the main code in the application and took out a number of needless lookups that could be replaced with applicationScope variables. For instance, on the search page, one of the drop-downs got its choices by doing an @Unique lookup on all documents in the database. Changed it to a keyword and put that in the applicationScope.
Removed multiple checks on database.queryAccessRole and put the user's roles in a sessionScope.
DB had 103,000 documents - 70,000 of them were tiny little docs with about 5 fields on them. They don't need to be indexed by the FTIndex, so we moved them into a separate database and pointed the data source to that DB when these docs were needed. The FTIndex went from 500 MB to 200 MB = faster indexing and searches, but the overall performance of the app was still rubbish.
The big one - I finally got around to checking the application properties, advanced tab. I set the following options:
Optimize document table map (ran copystyle compact)
Don't overwrite free space
Don't support specialized response hierarchy
Use LZ1 compression (ran copystyle compact with options to change existing attachments -ZU)
Don't allow headline monitoring
Limit entries in $UpdatedBy and $Revisions to 10 (as per Domino documentation)
And also don't allow the use of stored forms.
Now I don't know which one of these options was the biggest gain, and not all of them will be applicable to your own apps, but after doing this the application flies! It's running like there are no documents in there at all, views load super fast, documents open like they should - quickly and everyone is happy.
Until the HTTP threads get locked out - that's another question of mine that I am about to post, so please take a look if you have any idea of what's going on :-)
Thanks to all who have suggested things to try.

SSIS Pipeline performance counters

I am trying to log the performance counters for my SSIS Pipeline, for things like Buffer memory, Buffers in use, Buffers spooled, etc.
I created a new log and added all those counters to it. Values are being logged to the file every 15 seconds, but every value logged is 0, no matter the time of execution or the element being counted.
Something is wrong, but I don't know what... and when googling it, I could find just a couple of people with the same problem, but no actual solution to it.
Any idea is appreciated.
Thanks!
Have you applied all the latest service packs?

How are Process Explorer's memory metrics: WS Private, WS Shareable, WS Shared columns calculated?

I am continuing my saga to understand memory consumption by a VB6 application.
The option that seems to work best so far is to monitor various memory metrics at key points at run-time and understand where big memory hogs are.
The main reason for studying this is to understand how the application's scalability in a multi-user Terminal Server (Citrix) environment is affected by changes in memory consumption (in simple terms, the more memory you use, the fewer users you can fit on the server).
I can get most memory metrics for the process using GetProcessMemoryInfo.
Process Explorer reports additional metrics WS Private, WS Shareable, WS Shared - which seem very interesting for my investigation.
So the question is: is there a standard/hidden API to get these metrics for a process? I would like to query these metrics programmatically, so that I can capture them at key spots during the application run and understand memory usage better.
See the QueryWorkingSet API. This looks rather nasty to use though, as it returns info on a per-page basis and would therefore leave it up to you to aggregate the totals. If there's a better method, please leave a comment and I'll delete this answer.
Also, if you have specific places in mind where you want to monitor changes in the working set, you might want to check out the InitializeProcessForWsWatch and GetWsChanges APIs -- these might make it easier to see how many pages have been faulted in rather than having to walk the entire page set before and after.
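To give an idea of what the per-page aggregation involves, here is a minimal Python sketch that works on the raw per-page flag values QueryWorkingSet returns. It does not call the API itself; it only decodes the documented PSAPI_WORKING_SET_BLOCK bit layout (Protection in bits 0-4, ShareCount in bits 5-7, Shared in bit 8, VirtualPage from bit 12). How the buckets map to Process Explorer's columns is my assumption (private = not shared, shareable = Shared bit set, shared = shareable with a share count above 1), as are the 4 KB page size and the make_flags helper, which only builds test values.

```python
def classify_pages(flags_list, page_size=4096):
    """Sum working-set bytes into private / shareable / shared buckets
    from raw PSAPI_WORKING_SET_BLOCK flag values."""
    totals = {"private": 0, "shareable": 0, "shared": 0}
    for flags in flags_list:
        share_count = (flags >> 5) & 0b111  # bits 5-7
        shared = (flags >> 8) & 0b1         # bit 8
        if not shared:
            totals["private"] += page_size
        else:
            totals["shareable"] += page_size
            # Assumption: a page counts as "shared" only if more than
            # one process currently has it mapped.
            if share_count > 1:
                totals["shared"] += page_size
    return totals

def make_flags(virtual_page, shared, share_count, protection=4):
    """Hypothetical helper: build a flags value for the demo below."""
    return (virtual_page << 12) | (shared << 8) | (share_count << 5) | protection

pages = [
    make_flags(0x10, shared=0, share_count=0),  # private page
    make_flags(0x11, shared=1, share_count=1),  # shareable, only us
    make_flags(0x12, shared=1, share_count=3),  # shareable and shared
]
totals = classify_pages(pages)
```

Note that ShareCount is only 3 bits wide, so it saturates at 7; that is enough to tell "more than one process" apart, but not to count sharers exactly.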
