CPU Type and CPU Utilization metric for stackdriver monitoring - google-cloud-stackdriver

I want to show custom metrics such as CPU type and CPU utilization in Stackdriver Monitoring. Is there a monitoring API available for this?
Please let me know if you have any suggestions.

If you need these metrics from a non-GCP resource, such as AWS or an on-premises server or VM, you can use BindPlane, a product from the GCP Marketplace.
It has a source collector for host info that gathers things like CPU utilization, CPU temperature, CPU voltage, disk, memory, network performance, process info, free space on a file store, etc.
Here are the docs; it looks like it's in Alpha: https://docs.bindplane.bluemedora.com/docs/host
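On the API part of the question: Stackdriver Monitoring accepts custom metrics through the `projects.timeSeries.create` REST method. Below is a minimal sketch of the request body for one data point, not a definitive implementation. The metric type `custom.googleapis.com/host/cpu_utilization`, the `cpu_type` label, and the choice of the `global` monitored resource are assumptions made for illustration; adjust them to your own setup.

```python
from datetime import datetime, timezone

# Hypothetical metric name; custom metric types live under custom.googleapis.com/.
METRIC_TYPE = "custom.googleapis.com/host/cpu_utilization"


def build_time_series_body(project_id, cpu_type, utilization, when=None):
    """Build the JSON body for one custom-metric data point.

    The CPU type is attached as a metric label, the utilization as the value.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timeSeries": [
            {
                "metric": {
                    "type": METRIC_TYPE,
                    "labels": {"cpu_type": cpu_type},
                },
                # "global" is the simplest resource type for non-GCP hosts (assumption).
                "resource": {
                    "type": "global",
                    "labels": {"project_id": project_id},
                },
                "points": [
                    {
                        "interval": {"endTime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
                        "value": {"doubleValue": float(utilization)},
                    }
                ],
            }
        ]
    }
```

The resulting dictionary would be serialized with `json.dumps` and POSTed, with an OAuth token, to `https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries`.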

Related

Google Cloud Functions CPU Speed Setup

I was not able to find a setting in GCF for the CPU speed. The pricing calculator lists several options for the environment a function runs in, and as I understand it, the least performant option is used by default (800MHz?). So I wonder: is there an option in the Cloud Console, or at function deploy time, to set the CPU speed?
The memory size and CPU speed are connected. The details are on the pricing page:
https://cloud.google.com/functions/pricing
So, if you allocate 1024MB of memory to your function, you'll get a 1.4GHz CPU.
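Since the CPU speed follows from the memory setting, picking a speed means picking a memory tier. The sketch below shows that lookup. Only the 1024MB row is confirmed by the answer above; the other rows are values recalled from the pricing page and may be out of date, so treat them as assumptions and check the page before relying on them.

```python
# Memory tier (MB) -> CPU clock (MHz).
# Only 1024 -> 1400 is stated in this thread; the rest are assumptions
# recalled from the pricing page and may have changed.
CPU_MHZ_BY_MEMORY_MB = {
    128: 200,
    256: 400,
    512: 800,
    1024: 1400,
    2048: 2400,
}


def smallest_memory_for(min_mhz):
    """Return the smallest memory tier whose CPU is at least min_mhz, or None."""
    for memory_mb in sorted(CPU_MHZ_BY_MEMORY_MB):
        if CPU_MHZ_BY_MEMORY_MB[memory_mb] >= min_mhz:
            return memory_mb
    return None
```

The chosen tier is then passed at deploy time through the memory setting, for example the `--memory` flag of `gcloud functions deploy`.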

How do I find a marathon runaway process

I have a Mesos/Marathon system, and it is working well for the most part. There are upwards of 20 processes running, most of them using only part of a CPU. However, sometimes (especially during development), a process will spin up and start using as much CPU as is available. I can see on my system monitor that a CPU is pegged, but I can't tell which Marathon process is causing it.
Is there a monitoring app that shows CPU usage for Marathon jobs over time? This would also help with understanding scaling and CPU requirements. Tracking memory usage would be good, but it is secondary to CPU.
It seems that you haven't configured any isolation mechanism on your agent (slave) nodes. mesos-slave comes with an --isolation flag that defaults to posix/cpu,posix/mem, which means isolation at the process level (pretty much no isolation at all). Using cgroups/cpu,cgroups/mem isolation ensures that a task will be killed by the kernel if it exceeds its memory limit. Memory is a hard constraint that can be easily enforced.
Restricting CPU is more complicated. If you have a machine that offers 8 CPU cores to Mesos and each of your tasks is set to require cpu=2.0, you'll be able to run at most 4 tasks there. That part is easy, but at any given moment any of your 4 tasks may use all idle cores. If one of your jobs misbehaves, it can affect the other jobs running on the same machine. For restricting CPU utilization, see the Completely Fair Scheduler (or the related question How to understand CPU allocation in Mesos? for more details).
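The arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not Mesos code: it assumes the cgroups isolator grants 1024 `cpu.shares` per requested CPU and, when hard limits are turned on (the `--cgroups_enable_cfs` agent flag), a CFS quota of `cpus` times a 100 ms period. The function names are made up for this example.

```python
CPU_SHARES_PER_CPU = 1024   # cgroup cpu.shares per requested CPU (assumed default)
CFS_PERIOD_US = 100_000     # CFS period of 100 ms (assumed default)


def max_tasks(offered_cpus, cpus_per_task):
    """How many tasks of a given CPU request fit on one agent."""
    return int(offered_cpus // cpus_per_task)


def cgroup_cpu_settings(cpus, hard_limit=False):
    """Approximate cgroup values for a task requesting `cpus` CPUs.

    Shares only set a relative weight, so a task may still use idle cores.
    A CFS quota caps the task even when the machine is otherwise idle.
    """
    settings = {"cpu.shares": int(cpus * CPU_SHARES_PER_CPU)}
    if hard_limit:
        settings["cpu.cfs_period_us"] = CFS_PERIOD_US
        settings["cpu.cfs_quota_us"] = int(cpus * CFS_PERIOD_US)
    return settings
```

With 8 offered cores and cpu=2.0, `max_tasks(8, 2.0)` gives 4. Without the hard limit, only the shares entry is set, which is why one runaway task can still consume every idle core.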
Regarding monitoring, there are many possibilities; choose the one that suits your requirements. You can combine many of these solutions. Some are open source, others are enterprise-level products (in no particular order):
- collectd for gathering stats, Graphite for storing, Grafana for visualization
- Telegraf for gathering stats, InfluxDB for storing, Grafana for visualization
- Prometheus for storing and gathering data, Grafana for visualization
- Datadog for a cloud-based monitoring solution
- Sysdig platform for monitoring and deep insights
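For a quick look at which task is pegging a CPU before setting up any of these tools, the Mesos agent itself exposes per-executor resource statistics. The sketch below assumes the agent serves them as JSON at `/monitor/statistics` on port 5051 and that each entry carries `executor_id` plus a `statistics` object with `timestamp`, `cpus_user_time_secs` and `cpus_system_time_secs`; verify the endpoint and field names against your Mesos version. The helper names are made up for this example.

```python
import json
import urllib.request


def fetch_statistics(agent_url):
    """Fetch one snapshot, e.g. agent_url = "http://agent-host:5051" (assumed port)."""
    with urllib.request.urlopen(agent_url + "/monitor/statistics") as response:
        return json.load(response)


def cpu_usage_by_executor(before, after):
    """Return {executor_id: average CPUs used} between two snapshots."""
    old = {entry["executor_id"]: entry["statistics"] for entry in before}
    usage = {}
    for entry in after:
        stats = entry["statistics"]
        prev = old.get(entry["executor_id"])
        if prev is None:
            continue  # executor started after the first snapshot
        elapsed = stats["timestamp"] - prev["timestamp"]
        if elapsed <= 0:
            continue
        busy = (stats["cpus_user_time_secs"] + stats["cpus_system_time_secs"]) - (
            prev["cpus_user_time_secs"] + prev["cpus_system_time_secs"]
        )
        usage[entry["executor_id"]] = busy / elapsed
    return usage
```

Take two snapshots a few seconds apart and sort the result by value; for Marathon tasks the executor ID normally matches the task ID, so the top entry identifies the runaway job.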

How to Monitor Resource Utilization?

Is there a tool that logs system resource utilization (CPU, memory, I/O and network) over a period of time and generates graphs?
I need to monitor a system and identify the periods in which resources are highly utilized.
If any of you have experience with this kind of tool, kindly suggest one.
Thanks in advance.
Besides third-party tools, there is Windows Performance Monitor. It shows real-time graphs and can save the performance information to files that you can open and analyze later.
It provides multiple metrics for CPU, memory, I/O and network utilization, and shows an instance for each processor on the machine. It can also be used to monitor remote machines.
You can also create data collector sets to keep all monitored counters in a single component.
Performance Monitoring Getting Started Guide
Create a Data Collector Set to Monitor Performance Counters
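Once the counters are saved to a file, finding the busy period is a sliding-window search. The sketch below assumes the log was saved or exported as CSV, with the timestamp in the first column and one column per counter; the column name passed in is whatever header your log contains, and the function name is made up for this example.

```python
import csv
import io


def busiest_window(csv_text, column, window=3):
    """Return (start_timestamp, mean_value) of the busiest run of `window` samples."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    col = header.index(column)

    samples = []
    for row in data:
        try:
            samples.append((row[0], float(row[col])))
        except (ValueError, IndexError):
            continue  # skip blank or missing samples

    best = None
    for i in range(len(samples) - window + 1):
        mean = sum(value for _, value in samples[i:i + window]) / window
        if best is None or mean > best[1]:
            best = (samples[i][0], mean)
    return best
```

Timestamps are kept as plain strings, so the function reports the start of the busiest window exactly as it appears in the log.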
I think these tools will help you:
System-Resources-Monitoring
System Monitoring

monitoring application (CPU and cache usage) on single Linux box with 80 cores

I am looking for a performance monitoring tool for my application that will collect and visualize, in real time, the CPU and cache usage on a single Linux box such as an IBM System or HP ProLiant with a typical configuration of 8 processors / 80 cores.
The application is home-grown multithreaded C++ code that uses OpenMP.
The monitoring tool does not need to run 24 hours a day, and it does not need to send e-mail notifications.
I will run the tool just before sending commands to my apps; the apps will then execute the commands (this takes a few minutes at most). During this time interval I need to analyze:
- usage of cores
- data movement between processors
- usage of L1, L2, L3 caches
- some other metrics (help me here) which can help to find bottlenecks in application performance and resource utilization
I guess that tools like Nagios / Zabbix are too heavy for this task.
On the other hand, command-line tools like "top" and "sar" are not very convenient for 80 cores, and plotting (not necessarily real-time) would be nice to have...
While getting the per-core usage is rather easy, the other values might prove impractical to obtain, at least without running the application within a profiler of some sort.
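As an illustration of the easy part, per-core usage on Linux can be derived from two readings of `/proc/stat`. This is a minimal sketch with made-up function names; it treats idle plus iowait as idle time and the first eight counters (user through steal) as the total.

```python
def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines (cpu0, cpu1, ...) and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # user, nice, system, idle, iowait, irq, softirq, steal
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_utilization(before, after):
    """Return {core: fraction busy} between two /proc/stat readings."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for core, (busy1, total1) in second.items():
        busy0, total0 = first.get(core, (0, 0))
        delta = total1 - total0
        result[core] = (busy1 - busy0) / delta if delta > 0 else 0.0
    return result
```

Read `/proc/stat` once before the command is sent and once after it finishes, then pass both strings to `per_core_utilization`; the resulting 80 values are easy to plot with any charting tool.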
Measuring QPI utilization is highly non-trivial, if it is possible at all. Intel's VTune might be able to acquire such data, but only when running an instrumented version of your binaries.
Also, on x86 there is no way to figure out L1, L2 or L3 usage of any kind. You can read the low-level CPU counters to measure cache misses, though (but you would probably need instrumented/profiled binaries, and always with something like VTune or PAPI).
You could "easily" set up something that exposes all the lower-level performance counters via SNMP and collect the values with any standard SNMP-capable monitoring tool, but be aware that SNMP polling is something you don't want to happen more than 1-2 times per second. Alternatively, pull that info into something like collectd.
I also have the impression that you don't understand the problem domain of monitoring tools. They are not meant to be used as low-level analysis probes for finding application-level or system bottlenecks; at best you get some hints about which resource (from a 10,000-foot view) is running at full utilization. Monitoring and alerting tools are what operations staff use to understand which parts of their IT system are currently in use and how, to gather historical data and predict future resource utilization, and to be alerted when something breaks.
SiteScope, Hyperic or any combination of shell scripts, native OS utilities and a DB to store the results may do the job.

Amazon EC2 CPU or hard disk slower than my home Linux machine?

I'm using a small EC2 instance.
It's noticeably slower than my home Linux machine, which cost less than $800
(an about-average machine purchased 6 months ago).
I don't know whether the CPU or the hard disk is the bottleneck.
I wonder if there's a way to tell which.
Yes, if you want to monitor your EC2 instance, consider using Amazon CloudWatch ( http://aws.amazon.com/cloudwatch/ ). This service can monitor your instance's resources, such as CPU utilization, disk I/O and network traffic. Memory usage is not reported by default; it requires the CloudWatch agent or a custom metric. Basic monitoring is also included in the AWS free tier.
If you're looking for more detailed monitoring, consider the Server Density service ( http://www.serverdensity.com/cloud-monitoring/ ). It can monitor software installed on the server itself, such as the Apache service.

Resources