API Manager (1.8.0) busy 'doing nothing' - performance

I have a clustered WSO2 deployment. The CPU is often at 30% (on a c2.large), and despite the CPU usage the server isn't processing requests; it just seems to be busy doing nothing.
It seems the SVN deepsync autocommit feature is the cause of the CPU consumption: if I switch off deepsync, or simply set autocommit to false, I don't see the same CPU spiking.
The logs seem to back up this theory as I see:
TID: [0] [AM] [2015-02-20 16:30:14,100] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNBasedArtifactRepository} - SVN adding files in /zzish/wso2am/repository/deployment/server {org.wso2.carbon.deployment.synchronizer.subversion.SVNBasedArtifactRepository}
TID: [0] [AM] [2015-02-20 16:30:52,932] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNBasedArtifactRepository} - No changes in the local working copy {org.wso2.carbon.deployment.synchronizer.subversion.SVNBasedArtifactRepository}
TID: [0] [AM] [2015-02-20 16:30:52,932] DEBUG {org.wso2.carbon.deployment.synchronizer.internal.DeploymentSynchronizer} - Commit completed at Fri Feb 20 16:30:52 UTC 2015. Status: false {org.wso2.carbon.deployment.synchronizer.internal.DeploymentSynchronizer}
and during this time the CPU spike occurs.
As per https://docs.wso2.com/display/CLUSTER420/SVN-based+Deployment+Synchronizer I am using svnkit-1.3.9.wso2v1.jar.
I am using an external SVN service (silksvn) in order to avoid having to run my own HA subversion service.
So I have three questions:
Is it possible to reduce the frequency of the deepsync service?
How can I further debug this performance issue? Running this hot smells like a bug.
Has anyone managed to get the git deployment sync working (Link to project on github) with AM 1.8.0?
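For reference, deepsync and autocommit are controlled in repository/conf/carbon.xml; the sketch below shows autocommit switched off, which is the workaround described above (element names follow the WSO2 clustering docs, but verify them against your Carbon version, and the SVN URL/credentials are placeholders):

```xml
<DeploymentSynchronizer>
    <Enabled>true</Enabled>
    <!-- false: nodes only check out changes; they never commit, so the
         periodic "SVN adding files" scan on every node goes away -->
    <AutoCommit>false</AutoCommit>
    <AutoCheckout>true</AutoCheckout>
    <RepositoryType>svn</RepositoryType>
    <SvnUrl>https://svn.example.com/repos/wso2</SvnUrl>
    <SvnUser>username</SvnUser>
    <SvnPassword>password</SvnPassword>
    <SvnUrlAppendTenantId>true</SvnUrlAppendTenantId>
</DeploymentSynchronizer>
```

In a typical setup only the node that accepts artifact changes (the manager) needs AutoCommit; worker nodes can run with checkout only.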

Related

io.netty.util.ThreadDeathWatcher$Entry occupying heap resulting in OOM

We are connecting to JBoss EAP 7.1.0 GA using JMS in our application. Over a period of 5-10 days we see that our Old Gen eventually runs out of heap because io.netty.util.ThreadDeathWatcher$Entry instances fail to be cleaned up and keep retaining heap.
I noticed that the JBoss client jar uses the bundled Netty library v4.1.9. Below are the Netty version properties:
netty-buffer.version=4.1.9.Final-redhat-1
netty-buffer.buildDate=2017-05-19 04\:27\:45 -0400
netty-buffer.commitDate=2017-05-19 10\:08\:15 +0200
netty-buffer.shortCommitHash=e75b1f8
netty-buffer.longCommitHash=e75b1f856d38a057ab0886166a1b2f9578c64c25
netty-buffer.repoStatus=dirty
#Generated by netty-parent/pom.xml
#Fri, 19 May 2017 04:28:16 -0400
I found a reference on Red Hat, but I could not access it. I wanted to see if anyone else has seen this problem and has a better way to control this behavior to avoid the OOM error.

PostgreSQL slow connect time on Windows

I noticed that the connection to PostgreSQL is pretty slow.
import psycopg2
import time

start_time = time.time()
try:
    db = psycopg2.connect("dbname='xx' user='xxx' host='127.0.0.1' password='xxx' port='5433'")
except Exception as e:
    print(e)
    exit(1)
print('connect time', time.time() - start_time)
Usual connect time is 2.5-3.5 seconds.
connect time 3.3095390796661377
It's pretty much the default configuration of a freshly installed PostgreSQL.
I turned off log_hostname, but it changed nothing. I have run both PostgreSQL 9.4 and 10, and both have the same problem.
I'm using this machine for development, but even so, I noticed it because my Django requests take 2.5-3.5 seconds, which makes it unbearable even for development.
Windows 10
Python 2/3
psycopg2 2.7.4
Here relevant logs with max debug from PostgreSQL
2018-03-19 21:24:43.654 +03 [10048] DEBUG: 00000: forked new backend, pid=21268 socket=5072
2018-03-19 21:24:43.654 +03 [10048] LOCATION: BackendStartup, postmaster.c:4099
2018-03-19 21:24:45.248 +03 [21268] LOG: 00000: connection received: host=127.0.0.1 port=9897
It forks a new backend and then only two seconds later logs connection received.
UPD
Even if I manage to avoid PostgreSQL's connection delay (for example via pgbouncer, or by running PostgreSQL in Docker), a request still takes 1.3-2 seconds, yet from the first packet sent to the last only 0.022 seconds elapse; I don't know what happens in the remaining time, but it is not network communication between client and server. The same code run inside Docker takes 0.025 seconds; from Windows it takes 1.3-2 seconds, with only 0.022 seconds of network interaction.
There are actually two problems, which might have the same cause or different ones:
1) PostgreSQL does not send the packet for 1.8 seconds, for an unknown reason.
2) Even with the first problem eliminated and network interaction down to 0.022 seconds, the whole operation still takes 1.3-2 seconds (using either psql or psycopg2).
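To narrow down where the time goes, it can help to time the bare TCP handshake separately from the full psycopg2.connect() call. The helper below is a generic sketch (the host/port in the usage comment mirror the question, but are assumptions about your setup):

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Time only the TCP handshake to the server, excluding all of
    libpq's startup work (authentication, SSL negotiation, etc.)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start

# Usage sketch (assumes PostgreSQL on 127.0.0.1:5433, as in the question):
#   handshake = tcp_connect_time('127.0.0.1', 5433)
# If the handshake is fast but psycopg2.connect() is slow, the delay is in
# the startup/auth phase or on the client side (e.g. Windows name/SSPI
# resolution), not in the network path itself.
```

Comparing this number against the psycopg2 timing from the snippet above tells you which of the two problems you are looking at.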

Periodic tns-12531: TNS: Cannot allocate memory

I have a problem that's been plaguing me for about a year now. I have Oracle 12.1.x.x installed on my machine. After a day or two the listener stops responding and the listener.log contains a bunch of TNS-12531 messages. If I reboot, the problem goes away and I'm fine for another day or two. I'm lazy and I hate rebooting, so I decided to finally track this down, but I'm having no luck. Since the alternative is to do work that I really don't want to do, I'm going to spend all my time researching this.
Some notes:
Windows 10 Pro
64-Bit
32 GB RAM
Generally, about 20GB free when the error occurs
I have several databases and it doesn't matter which DB is running
Restarting the DB doesn't help
Restarting the listener doesn't help
Only rebooting clears the problem
When I set TRACE_LEVEL_LISTENER = 16, I don't get much more info. Trace files are not written to
I can connect to the DB if I bypass the listener (ie, set ORACLE_SID=xxx and connect without a DB identifier)
All other network interactions seem to work fine after the listener stops
lsnrctl status hangs and adds another TNS-12531 to the listener.log
I have roughly the same config at home and this does not happen
Below is an example of a listener.log file:
Fri Jul 28 14:21:47 2017
System parameter file is D:\app\user\product\12.1.0\dbhome_1\network\admin\listener.ora
Log messages written to D:\app\user\diag\tnslsnr\LJ-Quad\listener\alert\log.xml
Trace information written to D:\app\user\diag\tnslsnr\LJ-Quad\listener\trace\ora_24288_14976.trc
Trace level is currently 16
Started with pid=24288
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=LJ-Quad)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
28-JUL-2017 14:22:06 * 12531
TNS-12531: TNS:cannot allocate memory
28-JUL-2017 14:22:47 * 12531
TNS-12531: TNS:cannot allocate memory
28-JUL-2017 14:26:24 * 12531
TNS-12531: TNS:cannot allocate memory
Thanks a bunch for any help you can provide!
Issue 1
This error can occur approximately after 2048 connections have been made via the listener when running on a non-English Windows installation.
Fix for Issue 1
Create a Windows User Group named Administrators on the computer where the listener.exe resides. This can fix the issue of the listener dying.
Reference: I'll post the link for the first issue as soon as I find it again
Issue 2
This error can also occur on Windows 64-Bit systems where the Desktop Application Heap is too small.
Fix for Issue 2
Try increasing the Desktop Application Heap. In Windows, the relevant registry entry is located at
HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows
As a note: don't invent a value yourself; rely on the documentation.
Basically, search for the registry entry and alter the third value of the key SharedSection=1024,20480,1024. This is a trial-and-error approach, but it seems to improve the listener's stability and memory issues.
Reference: TNS:cannot allocate memory - is there limit to the num databases on one box (Oracle Developer Community)
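For illustration, SharedSection appears inside the long Windows value under that key; the third number is the per-desktop heap (in KB) for non-interactive window stations, which is what services like the listener use (the before/after numbers below are examples, not a recommendation):

```
SharedSection=1024,20480,768    before: 768 KB non-interactive desktop heap (a common default)
SharedSection=1024,20480,1024   after:  raised per the trial-and-error fix above
```

Only the third value is changed; the first two govern the system-wide and interactive desktop heaps and should be left alone. A reboot is required for the change to take effect.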

Rails web app ridiculously long loading time

I'm new to Ruby on Rails and my mentor just handed me a Ruby on Rails web application. It's fairly large, but even then it's taking a ridiculously long time to load: 45 minutes! By the time it hits 20 minutes, the loading page of the app already displays an error saying 'Loading seems to be taking longer than usual, please refresh.'
I'm running Rails 4.2.4 on a Linux 14.04 server (in VirtualBox). I access the website from my host machine (Windows 8). The app uses jbuilder 1.2 for building JSON.
From the development.log I gathered that a ton of GET requests load all the assets. Here's a small selection of those:
Started GET "/assets/loader/loader.css?body=1" for 192.168.39.XXX at 2015-11-13 13:32:43 +0100
Started GET "/assets/reset.css?body=1" for 192.168.39.XXX at 2015-11-13 13:32:47 +0100
Started GET "/assets/bootstrap/bootstrap.css?body=1" for 192.168.39.XXX at 2015-11-13 13:32:50 +0100
Started GET "/assets/site/form.css?body=1" for 192.168.39.XXX at 2015-11-13 13:32:53 +0100
Started GET "/assets/temporary.css?body=1" for 192.168.39.XXX at 2015-11-13 13:32:57 +0100
Started GET "/assets/vendor/spectrum.css?body=1" for 192.168.39.XXX at 2015-11-13 13:33:01 +0100
Started GET "/assets/general.css?body=1" for 192.168.39.XXX at 2015-11-13 13:33:04 +0100
As you can see, each GET takes about 3-5 seconds, and the log file is 2225 lines long from ONE load.
Is there any way to speed up the process?
EDIT: I copied the entire application to a different folder and tried running it from there. Loading time was down to only a couple of minutes. I still get the error 'Loading seems to be taking longer than usual, please refresh.', so it isn't fixed at all.
I haven't used VirtualBox on Windows yet, but I had a similar problem with VirtualBox on macOS. There was a bug related to the VirtualBox shared folder. I switched to NFS sharing and the performance was greatly improved!
Sorry that I don't know how to configure NFS on Windows, but I hope this may help you.
Update:
Alternatively, I found a workaround for this issue. If you use Nginx (or Apache), just add this configuration option to your nginx.conf (or apache.conf) file.
In Nginx
sendfile off;
In Apache
EnableSendfile Off
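A minimal sketch of where the nginx directive goes (the root path is hypothetical; it stands in for wherever the VirtualBox shared folder is mounted):

```nginx
server {
    listen 80;
    # app served from the VirtualBox shared folder (example path)
    root /home/vagrant/myapp/public;
    # sendfile uses the kernel's in-kernel copy path, which is known to
    # serve stale or slow reads on vboxsf shared folders; disabling it
    # makes nginx read files through normal I/O instead
    sendfile off;
}
```

The same reasoning applies to Apache's EnableSendfile Off: both directives avoid the kernel sendfile path that misbehaves on shared-folder filesystems.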
Eventually I removed the shared folder, installed Samba on my Linux machine, and set it up as an anonymously shared folder. The folder is still shared with Windows, so I can use it as I always have, but without the loading-time issues or file-system problems.
I've also followed this tutorial to set up Apache with Passenger:
https://www.digitalocean.com/community/tutorials/how-to-setup-a-rails-4-app-with-apache-and-passenger-on-centos-6
which decreased my loading time to 5 minutes total.

Why does a hudson "mvn clean install" build take 3-6x longer than the same on the command line?

We are seeing relatively long build times on our CI server (Hudson) and they're starting to get in our way. I am aware Hudson does more than invoke Maven, and I would happily grant it 10-20% more time for the job, but an order-of-magnitude slowdown seems too much.
Anyone have an idea of why this might be and how to solve the problem? I will start by saying what is not the cause:
the virtual machine hudson is running in: on the command line, it takes roughly the same amount of time as my development PC
other concurrent tasks: I made sure there was nothing diverting resources from the build task
The Maven goals are literally clean and install, nothing fancy and resource-intensive like javadoc, checkstyle, etc. Looking at the Hudson build task console output, there seem to be delays when "Retrieving previous build number from [our Nexus artefact repository]", but I don't know of a simple way to measure the performance of this step, and publishing an artefact seems too simple an operation to justify the total difference in speed.
(problem also described in this thread)
Update:
We have upgraded Hudson/Jenkins to the latest release and have been able to use the timing plugin. Short version:
the good news: we now know nexus is causing the problem
the bad news: we still don't know why
More details
On one of our actual Maven projects (Maven build time: 3 min, Hudson build time: 9 min) we could see that Hudson also performs the build in 3 min, but then takes 6 minutes to upload the artefact to Nexus.
Performing a manual upload of another artefact using nexus' web UI, I was able to confirm the following:
the actual artefact upload is done in a fraction of the time (i.e., in several seconds)
after these several seconds, the artefact appears as <nexusworkdir>/nexus/storage/test/test2/test2/1.0.0/test2-1.0.0.rpm
The real puzzler is why nexus takes over a minute to create this file:
<nexusworkdir>/nexus/proxy/attributes/test/test2/test2/1.0.0/test2-1.0.0.rpm
As far as I can tell, it just calculates an MD5 and SHA1 signature and records general artefact information, but md5sum and sha1sum of a 75MB file take <1s to run...
Finally, it does not appear to be some sort of network time-out, because the delay seems to be roughly proportional to artefact size.
Any idea what nexus does after it receives an artefact is appreciated.
Update 2:
Setting the nexus log level to debug, nexus logs the following when an artefact is uploaded:
...
2011-04-05 14:38:53 DEBUG [jpsc28za2RtYQ==] - o.s.n.p.s.l.f.Defau~ - Copying stream with buffer size of: 4096
2011-04-05 14:39:55 DEBUG [ython-2.5.2.jar] - org.mortbay.log - RESPONSE /nexus/content/groups/public/org/python/jython/2.5.2/jython-2.5.2.jar 200
2011-04-05 14:40:07 DEBUG [-2.5.2.jar.sha1] - org.mortbay.log - REQUEST /nexus/content/groups/public/org/python/jython/2.5.2/jython-2.5.2.jar.sha1 on ...
2011-04-05 14:40:12 DEBUG [-2.5.2.jar.sha1] - org.mortbay.log - RESPONSE /nexus/content/groups/public/org/python/jython/2.5.2/jython-2.5.2.jar.sha1 200
2011-04-05 14:43:45 DEBUG [ndex.properties] - org.mortbay.log - REQUEST /nexus/content/groups/public/.index/nexus-maven-repository-index.properties on org.mortbay.jetty.HttpConnection#141a720
...
2011-04-05 14:44:04 DEBUG [ndex.properties] - o.s.n.p.m.m.M2Group~ - public retrieveItem() :: FOUND public:/.index/nexus-maven-repository-index.properties
2011-04-05 14:44:04 DEBUG [ndex.properties] - org.mortbay.log - RESPONSE /nexus/content/groups/public/.index/nexus-maven-repository-index.properties 200
2011-04-05 14:48:07 DEBUG [jpsc28za2RtYQ==] - o.s.n.p.a.DefaultAt~ - Storing attributes on UID=test:/test/test/1.0.1/test-1.0.1.rpm
...
2011-04-05 14:48:07 DEBUG [w/icon-info.gif] - org.mortbay.log - servlet holder=nexus
2011-04-05 14:48:08 DEBUG [w/icon-info.gif] - org.mortbay.log - RESPONSE /nexus/ext-2.3/resources/images/default/window/icon-info.gif 200
2011-04-05 14:49:01 DEBUG [c=1302007326656] - org.mortbay.log - REQUEST /nexus/service/local/log/config on org.mortbay.jetty.HttpConnection#1dbd88f
...
It appears to just sit there for a minute or so and then continues with its work. Any idea why Nexus does this is appreciated.
As discussed in the thread, I suspect your forked Maven isn't getting the JVM parameters passed. Can you use jconsole to check that the max heap allowed is what you have allocated in your MAVEN_OPTS?
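A quick way to sanity-check this (the heap value below is purely illustrative): run the following from the same environment Hudson uses, and compare what the JVM actually resolves against what you think you set.

```shell
# Illustrative value only -- set whatever you intend the forked Maven JVM to use
export MAVEN_OPTS="-Xmx1024m"
echo "MAVEN_OPTS=$MAVEN_OPTS"
# Ask a JVM what max heap it resolves for that flag (requires a JDK on PATH);
# compare this with what jconsole reports for the Hudson-forked Maven process
java $MAVEN_OPTS -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize || true
```

If the value jconsole shows for the forked process differs from this, Hudson is not passing MAVEN_OPTS through (which can differ between service and command-line startup, as asked below).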
Does it make any difference if you start Hudson as a service vs. starting Hudson from the command line?
Update:
Deploying to Nexus takes a load of RAM, much more than compiling (in my experience). Swapping due to low memory might be what's slowing it down.
