I want to monitor the JVM's CPU usage on a per-container basis.
I tried collecting it through the actuator, but the values don't seem to match, so I'm looking for another method.
Is there any way to collect the JVM's CPU usage? It would also be nice to know how the CPU limit is set.
Thank you :)
TL;DR: you can monitor your cluster with Metricbeat.
Out of the box it will only give you the system load, without much more detail.
That said, there are modules such as the Linux one which offer more information on CPU usage.
If none of those offer what you need, you might want to roll your own solution ^^
Or wait for someone with better knowledge to chime in.
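If you do roll your own, here is a minimal sketch of what that could look like, assuming a HotSpot JDK 11+ and cgroup v1 (under cgroup v2 the limit lives in /sys/fs/cgroup/cpu.max instead); it reads the JVM's own process CPU load plus the container's CPU quota, which also tells you what limit was set:

import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContainerCpuProbe {
    public static void main(String[] args) throws Exception {
        // CPU load of this JVM process as seen by the JVM itself (0.0 - 1.0).
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.printf("process cpu load: %.2f%%%n", os.getProcessCpuLoad() * 100);

        // Recent JDKs are container-aware, so this already reflects the cgroup limit.
        System.out.printf("available processors: %d%n", Runtime.getRuntime().availableProcessors());

        // cgroup v1 CPU quota and period; quota / period = cores the container may use.
        Path quota = Path.of("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
        Path period = Path.of("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
        if (Files.exists(quota) && Files.exists(period)) {
            long q = Long.parseLong(Files.readString(quota).trim());
            long p = Long.parseLong(Files.readString(period).trim());
            if (q > 0) { // -1 means no limit was set
                System.out.printf("cgroup cpu limit: %.2f cores%n", (double) q / p);
            }
        }
    }
}

Exposing those two numbers as gauges (for example via Micrometer) gives you a per-container view you can compare against what the actuator reports.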
I have just gotten into Kubernetes and really liking its ability to orchestrate containers. I had the assumption that when the app starts to grow, I can simply increase the replicas to handle the demand. However, now that I have run some benchmarking, the results confuse me.
I am running Laravel 6.2 w/ Apache on GKE with a single g1-small machine as the node. I'm only using NodePort service to expose the app since LoadBalancer seems expensive.
The benchmarking tools used are wrk and ab. When the replica count is increased to 2, requests/s somehow drops. I would expect requests/s to increase since there are 2 pods available to serve requests. Is there a bottleneck occurring somewhere, or is my understanding flawed? I hope someone can point out what I'm missing.
A g1-small instance is really tiny: you get 50% utilization of a single core and 1.7 GB of RAM. You don't describe what your application does or how you've profiled it, but if it's CPU-bound, then adding more replicas of the process won't help you at all; you're still limited by the amount of CPU that GCP gives you. If you're hitting the memory limit of the instance that will dramatically reduce your performance, whether you swap or one of the replicas gets OOM-killed.
The other thing that can affect this benchmark is that, sometimes, for a limited time, you can be allowed to burst up to 100% CPU utilization. So if you got an instance and ran the first benchmark, it might have used a burst period and seen higher performance, but then re-running the second benchmark on the same instance might not get to do that.
In short, you can't just crank up the replica count on a Deployment and expect better performance. You need to identify where in the system the actual bottleneck is. Monitoring tools like Prometheus that can report high-level statistics on per-pod CPU utilization can help. In a typical database-backed Web application the database itself is the bottleneck, and there's nothing you can do about that at the Kubernetes level.
I have a computation-intensive, long-running task. It can easily be split into sub-tasks, and it would also be fairly easy to aggregate the results later on. For example, Map/Reduce would work well.
I have to solve this on Cloud Foundry, and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load. Normally I use Spring Boot for developing my CF apps.
Any ideas on how to divide and conquer in an elastic way on CF are welcome. It would be great to have CF create as many instances as it sees fit, without having to configure the number of available application instances in the application itself. I also need to trigger the creation of instances by loading the CPUs enough to provoke auto-scaling.
I have to solve this on Cloud Foundry
It sounds like you're on the right track here. The main thing is that you need to write your app so that it can coexist with multiple instances of itself (or perhaps break it into a primary node that coordinates work and multiple worker apps). However you architect the app, being able to scale up instances is critical. You can then simply cf scale to add or remove instances and adjust capacity.
If you wanted to get clever, you could set up a pipeline to run your jobs. Step one would be to scale up the worker nodes of your app, step two would be to schedule the work to run, step three would be to clean up and scale down your nodes.
I'm suggesting this because manual scaling is going to be the simplest path forward (please read on for why).
and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load.
As to autoscaling, I think it's possible, but I also think it makes the problem more complicated than it needs to be. Auto-scaling by CPU on Cloud Foundry is not as simple as it seems. The way Linux reports CPU usage, you can exceed 100%; it's 100% per CPU core. Pair this with the fact that you may not know how many CPU cores your Cells have (for example, if you're using a public CF provider) and that the number of cores could change over time (if your provider changes hardware), and it becomes difficult to know at what point you should scale your application.
If you must autoscale, I would suggest trying to autoscale on some other metric. Which metrics are available will depend on the autoscaler tool you are using. The best option would be a custom metric; then you could use work-queue length or something else that's relevant to your application. If custom metrics are not supported, you could always hack together your own autoscaler that works with metrics relevant to your application (you can scale up and down by adjusting the instance count of your app via the CF API).
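As a rough illustration of that last idea, here is a minimal sketch of such a home-grown scaler. Everything in it is a placeholder or assumption: APP_NAME, JOBS_PER_INSTANCE and getQueueDepth() would need to be replaced with your own app name, tuning and queue backend, and instead of calling the CF API directly it simply shells out to the standard cf scale command:

import java.util.concurrent.TimeUnit;

public class QueueBasedScaler {

    private static final String APP_NAME = "my-worker-app"; // placeholder app name
    private static final int MIN_INSTANCES = 1;
    private static final int MAX_INSTANCES = 10;
    private static final int JOBS_PER_INSTANCE = 50;        // rough tuning assumption

    public static void main(String[] args) throws Exception {
        while (true) {
            int depth = getQueueDepth();
            // One instance per JOBS_PER_INSTANCE queued jobs, clamped to [MIN, MAX].
            int desired = Math.max(MIN_INSTANCES,
                    Math.min(MAX_INSTANCES, (int) Math.ceil(depth / (double) JOBS_PER_INSTANCE)));
            scaleTo(desired);
            TimeUnit.SECONDS.sleep(60); // poll once a minute
        }
    }

    private static void scaleTo(int instances) throws Exception {
        // "cf scale APP -i N" is the standard CLI way to change the instance count.
        new ProcessBuilder("cf", "scale", APP_NAME, "-i", String.valueOf(instances))
                .inheritIO()
                .start()
                .waitFor();
    }

    private static int getQueueDepth() {
        // Placeholder: read the queue length from RabbitMQ, a DB table, etc.
        return 0;
    }
}

Running something like this as a separate, single-instance app keeps the scaling logic out of the workers themselves.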
You might also be able to hack together a solution based on the metrics that your autoscaler does provide. For example, you could artificially inflate a metric that your autoscaler does support in proportion to the workload you need to process.
You could also just scale up when your work day starts and scale down at the end of the day. It's not dynamic, but it's simple and it will get you some efficiency improvements.
Hope that helps!
Dear fellow Apache Spark enthusiasts
I recently kicked off a sideline project with the goal of turning a couple of ODROID XU4 computers into a stand-alone Spark Cluster.
After setting up the cluster I ran into a problem that seems to be specific to heterogeneous multi-processors. Spark executor tasks run extremely slowly on the XU4 when using all 8 processors. The reason, as mentioned in a comment on my post below, is that Spark does not wait for the executors that have been kicked off on the slow processors.
http://forum.odroid.com/viewtopic.php?f=98&t=21369&sid=4276f7dc89a8d7825320e7f705011326&p=152415#p152415
One solution is to use fewer executor cores and to set the CPU affinity to not use the LITTLE processors. This is however a less than ideal solution.
Is there a way to ask Spark to wait a bit longer for feedback from slower executors? Obviously waiting too long will have a negative effect on performance. The positive effect of utilising all cores should however balance out the negative effect.
Thanks in advance for any help!
@Dikei's response highlights two potential causes, but it turns out the problem is not the one he suspects. I have the same setup as @TJVR, and it turns out the driver is missing heartbeats from executors. To address this, I added the following to spark-env.sh:
# Raise the worker and Akka timeouts so slow executors are not dropped,
# and consolidate shuffle files for better performance on ext4.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.worker.timeout=600 -Dspark.akka.timeout=200 -Dspark.shuffle.consolidateFiles=true"
export SPARK_JAVA_OPTS="-Dspark.worker.timeout=600 -Dspark.akka.timeout=200 -Dspark.shuffle.consolidateFiles=true"
This changes the default timeouts for executor heartbeats. I also set spark.shuffle.consolidateFiles to true to improve performance on my ext4 filesystem. These default changes allowed me to increase the core usage above one without frequently losing executors.
Spark does not kill slow executors, but will mark an executor as dead in two cases:
If the driver doesn't receive a heartbeat signal in a while (default: 120s): the executor has to regularly (default: 10s) send a heartbeat message to notify the driver that it is still alive. Network issues or a large GC pause can prevent these heartbeats from arriving.
The executor has crashed due to an exception in the code or a JVM runtime error, most likely caused by a GC pause as well.
In my opinion, it's likely that GC overhead has killed your slow executor and the driver has had to redo the task on a different executor. If this is the case, you can try splitting your data into smaller partitions, so that each executor has to process less data at a time.
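For example (a minimal sketch using the Java API; the data and the partition count of 64 are made up for illustration), repartitioning into more, smaller partitions means a slow LITTLE core only holds up a small slice of the job at a time:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RepartitionExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("repartition-example");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> data = sc.parallelize(
                    IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList()));

            // More, smaller partitions: each task does less work, so a straggling
            // core delays only a small slice of the job instead of a large one.
            JavaRDD<Integer> finer = data.repartition(64);

            long count = finer.map(x -> x * 2).count();
            System.out.println("count = " + count);
        }
    }
}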
Secondly, you should NOT set spark.speculation to true without testing. It's false by default for a reason; I've seen it do more harm than good in some cases.
Lastly, the following assumption might not hold true.
The positive effect of utilising all cores should however balance out the negative effect.
Slow executors (stragglers) can cause the program to perform much worse, depending on the workload. It's entirely possible that avoiding the slow cores will provide the best result.
WAS Liberty has auto-scaling feature and one of the metrics is the CPU metric.
My question is about the case when I have many JVMs being members of different clusters running on the same host. Is Liberty measuring the CPU consumption of each JVM or is it per host? If it is host-wide, then scaling decisions are almost meaningless because how does it know which JVM is actually consuming all the CPU time and needs to be scaled out? If I have 5 JVMs and 4 of those are not doing any work and one JVM is taking 99% of my CPU on the host, I only need to scale that JVM, not all five of them.
The manual says metrics trigger on averages across a cluster, so for the scenario you outline:
If they are all in a single cluster, there is really no scaling action required and it likely wouldn't help.
If the spinning JVM is in a single-member or very small cluster (without the other 4), then it could trip the CPU metric, depending on the average %cpu and the configuration.
Resource usage is compared against its threshold at each level (JVM, host, and cluster). When usage exceeds the threshold at one of these levels, the scale-out action is executed.