Disclaimer: I am running this on a mid-2012 MacBook Air (i7-3667U, 8 GB RAM) with the 64-bit JVM.
The test suite for an application (lein t) is running at what I would consider an abnormally slow speed. Most of the tests involve MongoDB (creating and dropping collections). Assuming the bottleneck was database I/O, I moved to MongoDB Enterprise, which allows running in memory,
with this mongo.conf:
storage:
  engine: inMemory
  dbPath: /Users/beoliver/data/testdb
  inMemory:
    engineConfig:
      inMemorySizeGB: 1
MongoDB is started with the flag --conf ~/path/to/mongo.conf
I also added these Java flags to the project
:jvm-opts ["-XX:-OmitStackTraceInFastThrow" "-Xmx4g" "-Xms1g"]
to try to avoid extra swapping.
This appeared to fix the issue, and the tests ran as follows:
time lein t
...
lein t 238.71s user 8.72s system 59% cpu 6:57.92 total
This is reasonable compared with the results from other team members.
But when I re-ran the tests, the run time went back to the original (around the half-hour mark):
lein t 252.53s user 13.76s system 16% cpu 26:52.45 total
CPU usage peaks at about 50% but is below 5% most of the time (including stretches where it idles below 1%).
Real memory size: 1.55 GB
Virtual memory size : 8.08 GB
Shared Memory Size: 18.0 MB
Private Memory Size : 1.67 GB
Has anyone had similar experiences? Any suggestions? Is there a good way of profiling this, better than staring at Activity Monitor?
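As a first step before profiling, the two `time` outputs above can be compared directly. This is a minimal sketch that only redoes the arithmetic behind the "% cpu" column; the function names are made up for illustration. If the CPU seconds are nearly the same in both runs while the wall-clock time differs by a factor of four, the extra time is spent waiting (I/O, locks, sleeps, swapping) rather than computing.

```python
def to_seconds(minutes, seconds):
    """Convert the m:ss.xx wall-clock figure printed by `time` to seconds."""
    return minutes * 60 + seconds


def cpu_utilisation(user_s, sys_s, wall_s):
    """Fraction of wall-clock time the process actually spent on a CPU."""
    return (user_s + sys_s) / wall_s


def waiting_seconds(user_s, sys_s, wall_s):
    """Wall-clock time not accounted for by CPU work."""
    return wall_s - (user_s + sys_s)


# Figures copied from the two runs quoted above.
fast_wall = to_seconds(6, 57.92)
slow_wall = to_seconds(26, 52.45)

fast = cpu_utilisation(238.71, 8.72, fast_wall)
slow = cpu_utilisation(252.53, 13.76, slow_wall)

print(f"fast run: {fast:.1%} on CPU, {waiting_seconds(238.71, 8.72, fast_wall):.0f} s waiting")
print(f"slow run: {slow:.1%} on CPU, {waiting_seconds(252.53, 13.76, slow_wall):.0f} s waiting")
```

The CPU time is almost identical in both runs, so a sampling profiler or periodic thread dumps taken during the idle stretches would probably be more informative than a CPU profiler.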
We have been running Proxmox VE since 5.0 (now on 6.4-15), and we have noticed performance degrading whenever there is heavy reading or writing.
We have 9 nodes, 7 of them running Ceph, with 56 OSDs (8 on each node). The OSDs are hard drives (HDD), WD Gold or better (4~12 TB). The nodes have 64/128 GB of RAM and dual-Xeon mainboards (various models).
We have already tried simple tests like "ceph tell osd.* bench", getting a stable 110 MB/s data transfer on each OSD, with a spread of +-10 MB/s during normal operations. Apply/commit latency is normally below 55 ms, with a couple of OSDs reaching 100 ms and one-third below 20 ms.
The front and back networks are both 1 Gbps (separated into VLANs). We are trying to move to 10 Gbps, but we ran into trouble that we are still trying to figure out how to solve (unstable OSD disconnections).
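A rough back-of-the-envelope comparison of the numbers above, as a sketch only: it assumes the bench figure means megabytes per second, and it ignores protocol overhead, seek behaviour, and replication traffic. The names are illustrative.

```python
def link_mb_per_s(gbps):
    """Theoretical payload ceiling of a link: 1 Gbps is 125 MB/s."""
    return gbps * 1000 / 8


osds_per_node = 8        # from the post
osd_bench_mb_s = 110     # from the post, ASSUMED to be MB/s

disk_mb_s = osds_per_node * osd_bench_mb_s
net_1g = link_mb_per_s(1)
net_10g = link_mb_per_s(10)

print(f"aggregate disk bandwidth per node: {disk_mb_s} MB/s")
print(f"1 Gbps link ceiling:  {net_1g:.0f} MB/s")
print(f"10 Gbps link ceiling: {net_10g:.0f} MB/s")
print(f"disks can outrun a 1 Gbps link by about {disk_mb_s / net_1g:.1f}x")
```

Under these assumptions a single 1 Gbps link carries roughly what one HDD can stream, so the network is a plausible bottleneck during heavy I/O and recovery.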
The pool is defined as "replicated" with 3 copies (2 needed to keep running). The total disk space is now 305 TB (72% used), and reweight is in use because some OSDs were getting much more data than others.
Virtual machines run on the same 9 nodes, most are not CPU intensive:
Avg. VM CPU Usage < 6%
Avg. Node CPU Usage < 4.5%
Peak VM CPU Usage 40%
Peak Node CPU Usage 30%
But I/O Wait is a different story:
Avg. Node IO Delay 11
Max. Node IO delay 38
Disk writing load is around 4 Mbytes/sec average, with peaks up to 20 Mbytes/sec.
Anyone with experience in getting better Proxmox+CEPH performance?
Thank you all in advance for taking the time to read,
Ruben.
Here are some Ceph pointers you could follow:
Get some good NVMe drives (one or two per server, but with 8 HDDs per server one should be enough) and use them for the DB/WAL. Make sure they have power-loss protection.
"ceph tell osd.* bench" is not that relevant to real-world load; I suggest trying some fio tests instead (see here).
Set osd_memory_target on the OSDs to at least 8 GB of RAM.
To save some writes on your HDDs (the data is not replicated X times), create your RBD pool as EC (erasure-coded), but please do some research first because there are tradeoffs; recovery, for example, takes extra CPU calculations.
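To make the replication versus erasure-coding tradeoff concrete, here is a small sketch of the raw-space overhead of each scheme. The k=4, m=2 profile and the 100 TB figure are hypothetical values chosen for illustration, not a recommendation for this cluster.

```python
def replicated_overhead(copies):
    """Raw bytes written per byte of user data in a replicated pool."""
    return float(copies)


def ec_overhead(k, m):
    """Raw bytes written per byte of user data with k data and m coding chunks."""
    return (k + m) / k


def raw_needed_tb(data_tb, overhead):
    """Raw capacity consumed by data_tb of user data."""
    return data_tb * overhead


data_tb = 100  # hypothetical amount of user data

print("replicated x3:", raw_needed_tb(data_tb, replicated_overhead(3)), "TB raw")
print("EC k=4, m=2: ", raw_needed_tb(data_tb, ec_overhead(4, 2)), "TB raw")
```

The lower write volume is paid for with extra CPU work on writes and recovery, which is the tradeoff mentioned above.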
All in all, hyper-converged clusters are good for training, small projects, and medium projects without a heavy workload. Keep in mind that planning is gold.
Just my 2 cents,
B.
I'm looking for a way to reduce latency (response time) at P99. The application runs on the JBoss application server. The current system configuration is 0.5 core and 2 GB of memory.
I suspect low TPS might be the reason for the higher P99, because the application's current usage at peak traffic is 0.25 core, averaging 0.025 core. Old-gen GC times are running at 1 s. Heap settings are -Xmx1366m -Xms512m, with metaspace at 250 MB.
Right now we use the parallel GC policy; will the G1GC policy help?
What else should I consider?
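To illustrate how 1 s GC pauses can dominate P99, here is a minimal sketch using the nearest-rank percentile definition. The 20 ms baseline latency and the request counts are invented for the example; only the 1 s pause comes from the question.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def simulate(total, slow, fast_ms=20, slow_ms=1000):
    """Latencies for `total` requests, `slow` of which hit a GC pause (hypothetical values)."""
    return [fast_ms] * (total - slow) + [slow_ms] * slow


# 0.5% of requests hit a pause: P99 is still the fast latency.
print(percentile(simulate(1000, 5), 99))
# 1.5% of requests hit a pause: P99 becomes the pause itself.
print(percentile(simulate(1000, 15), 99))
```

Once more than 1% of requests overlap a pause, P99 reports the pause length, so the share of requests that land inside a long old-gen collection matters more than the average latency.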
I'm executing pleskbackup on an Ubuntu 18.04 LTS server to create a full backup.
This task has already been running for over half a day, while the server is nowhere near capacity:
CPU: 1.5% (of 800%, 8 cores)
Memory: 3.9% (626 MB of 15.6 GB)
Is there any way to give this specific task more resources to speed it up?
I've already set its priority to the highest via htop.
Thanks in advance.
I am running a performance test in the perf environment.
The results are below:
CPU Utilization
Server     Apdex (T=0.5)  Resp. time  Throughput  Error rate  CPU usage  Memory
per001205  0.97           220 ms      2,670 rpm   0.0009 %    493.00%    2.2 GB
per001206  0.95           280 ms      2,670 rpm   0.0043 %    516.00%    2.4 GB
per011079  0.83           526 ms      2,670 rpm   0.0034 %    598.00%    2.5 GB
per011080  0.67           1,110 ms    2,670 rpm   0.0026 %    639.00%    2.6 GB
Can you comment on the average response time? Is it acceptable?
I can see CPU usage is more than 100%. Is that dangerous?
How should I improve this? I am running it with 250 users.
First of all, check out the "CPU usage mismatch or usage over 100%" article.
Consider another monitoring method: go to the hosts directly and check CPU usage with your operating system's built-in commands, or use the JMeter PerfMon plugin, to either confirm the picture or get an alternative view of the CPU load. Depending on the result, you have 2 options:
Either the individual servers' CPU usage is acceptable, and you can decide whether the throughput is good enough,
or you need to fix the issue in your application code: use profiling tools for the language your application is written in to detect the most CPU-intensive functions, and refactor them to be less processor-time-hungry.
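A figure above 100% usually means the tool counts each core as 100%, so the number has to be divided by the core count before judging it. A minimal sketch follows; the core count of 8 is a hypothetical value, since the question does not state how many cores the servers have.

```python
def normalised_cpu(percent_of_one_core, cores):
    """Convert a per-core-summed CPU figure into a share of total host capacity."""
    return percent_of_one_core / cores


cores = 8  # HYPOTHETICAL: not given in the question

# CPU usage figures from the results table above.
reported = {
    "per001205": 493.0,
    "per001206": 516.0,
    "per011079": 598.0,
    "per011080": 639.0,
}

for server, pct in reported.items():
    print(f"{server}: {normalised_cpu(pct, cores):.1f}% of host capacity")
```

With the real core count plugged in, values approaching 100% of host capacity on the slowest servers would point at CPU saturation as the cause of the higher response times.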
I am currently using Adobe Experience Manager (Java) for a client's site. It runs on OpenJDK:
#java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
It is running on Rackspace with the following:
vCPU: 4
Memory: 16GB
Guest OS: Red Hat Enterprise Linux 6 (64-bit)
Since it went into production I have been experiencing very slow performance from the application. It goes like this: I launch the app and everything is smooth, then 3 to 4 days later CPU usage spikes to 400% (~4000 users/day hit the site). I got a few OOM exceptions (1 or 2), but mostly the site just became exceptionally slow without ever throwing an OOM exception. Since I am a novice at Java memory management, I started reading about how it works and found tools like jstat. When the system was overwhelmed the second time around, I ran:
#top
I got the PID of the Java process, pressed Shift+H, and noted the PIDs of the threads with high CPU percentage. Then I ran
#sudo -uaem jstack <PID>
This gave me a thread dump. I converted the thread PIDs I had written down to hex and searched for those values in the dump. After all that, I finally found that, not surprisingly, it was the garbage collector that was flipping out for some reason.
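The decimal-to-hex lookup described above can be sketched as follows. HotSpot thread dumps list the native thread ID as `nid=0x...`; the sample dump lines and thread IDs below are invented for illustration.

```python
def nid_for(tid):
    """Format a decimal thread ID from top as the nid token used in a thread dump."""
    return f"nid=0x{tid:x}"


def find_threads(dump_text, tids):
    """Return the dump lines whose nid matches one of the given decimal thread IDs."""
    wanted = {nid_for(tid) for tid in tids}
    return [
        line for line in dump_text.splitlines()
        if any(token in wanted for token in line.split())
    ]


# Hypothetical excerpt of a thread dump.
sample_dump = """\
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f0000001000 nid=0x1a2b runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f0000002000 nid=0x1a2c runnable
"qtp-worker-17" prio=10 tid=0x00007f0000003000 nid=0x1b00 waiting on condition
"""

for line in find_threads(sample_dump, [6699, 6700]):
    print(line)
```

Matching whole tokens rather than substrings avoids confusing `nid=0x1a2` with `nid=0x1a2b`.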
I then read a lot about Java GC tuning and restarted the application with the following options:
java
-Dcom.day.crx.persistence.tar.IndexMergeDelay=0
-Djackrabbit.maxQueuedEvents=1000000
-Djava.io.tmpdir=/srv/aem/tmp/
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/srv/aem/tmp/
-Xms8192m -Xmx8192m
-XX:PermSize=256m
-XX:MaxPermSize=1024m
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:ParallelGCThreads=4
-XX:NewRatio=1
-Djava.awt.headless=true
-server
-Dsling.run.modes=publish
-jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start
-c crx-quickstart -i launchpad -p 4503
-Dsling.properties=conf/sling.properties
It looks like it is performing much better, but I think it probably needs more GC tuning.
When I run:
#sudo -uaem jstat -gcutil <PID>
I get this:
S0     S1     E       O        P       YGC    YGCT      FGC   FGCT       GCT
0.00   0.00   55.97   100.00   45.09   4725   521.233   505   4179.584   4700.817
This is 4 days after I restarted it.
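The jstat counters above can be turned into average pause times with a few divisions. This is a sketch of that arithmetic only; the uptime of exactly 4 days is an approximation taken from the text.

```python
def gc_summary(ygc, ygct, fgc, fgct, uptime_s):
    """Average pause per collection and the share of uptime spent in GC."""
    return {
        "avg_young_s": ygct / ygc,
        "avg_full_s": fgct / fgc,
        "gc_share": (ygct + fgct) / uptime_s,
    }


# YGC, YGCT, FGC, FGCT copied from the jstat output above.
summary = gc_summary(ygc=4725, ygct=521.233, fgc=505, fgct=4179.584,
                     uptime_s=4 * 24 * 3600)  # ASSUMED: exactly 4 days

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Full collections average roughly 8 seconds each while young collections take about a tenth of a second, and the old generation (O) is reported at 100%, so the full collections and whatever is filling the old generation look like the place to investigate first.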
When I run:
#sudo -uaem jstat -gccapacity <PID>
I get this:
NGCMN      NGCMX      NGC        S0C        S1C        EC
4194304.0  4194304.0  4194304.0  272896.0   279040.0   3636224.0
OGCMN      OGCMX      OGC        OC         PGCMN      PGCMX
4194304.0  4194304.0  4194304.0  4194304.0  262144.0   1048576.0
PGC        PC         YGC        FGC
262144.0   262144.0   4725       509
This is also 4 days after I restarted it.
These results are much better than when I started, but I think they can get even better. I'm not really sure what to do next as I'm no GC pro, so I was wondering whether you would have any tips or advice on how I could get better app/GC performance, and whether anything is obviously off, such as the ratios and sizes of the young and old generations.
How should I set the survivor and Eden sizes/ratios?
Should I change the GC type, for example to CMS or G1?
How should I proceed?
Any advice would be helpful.
Best,
Nicola
The young-to-old ratio is typically around 1:3, but it can vary depending on the application's
mix of short-lived and long-lived objects. If there are more short-lived objects, the
young space can be enlarged, for example to 2:3 (young:old). The reason for increasing the ratio is
to avoid scavenge GC cycles: when many short-lived objects are allocated, the young space
fills quickly and triggers a scavenge GC cycle, which in turn affects application performance. Raising
the ratio above its current value may reduce the number of scavenge GC cycles.
When the young space is increased, the survivor and Eden spaces increase accordingly.
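As a sketch of how the ratio maps to sizes: HotSpot's -XX:NewRatio=N sets old:young to N:1, so the young generation gets heap / (N + 1). The helper below is illustrative only and ignores the permanent generation and rounding done by the JVM.

```python
def young_old_sizes_mb(heap_mb, new_ratio):
    """Split a heap into (young, old) sizes for a given -XX:NewRatio value."""
    young = heap_mb / (new_ratio + 1)
    return young, heap_mb - young


heap_mb = 8192  # -Xms8192m -Xmx8192m from the question

for new_ratio in (1, 2, 3):
    young, old = young_old_sizes_mb(heap_mb, new_ratio)
    print(f"NewRatio={new_ratio}: young={young:.0f} MB, old={old:.0f} MB")
```

With NewRatio=1, as in the question, young and old each get 4096 MB, which matches the 4194304 KB reported for NGC and OGC by jstat -gccapacity.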
The CMS policy is used to reduce application pause times, while the G1 policy targets larger heaps
with high throughput. The GC policy can be changed based on the needs of the application.
Recommended Use Cases for G1 :
The first focus of G1 is to provide a solution for users running applications that require large heaps with limited GC latency.
This means heap sizes of around 6GB or larger, and stable and predictable pause time below 0.5 seconds.
Since you use an 8 GB heap, you can test the G1 GC policy in the same environment to check GC performance.