I'm working with Grafana in a Scala project and I get metrics like processing-time with a value of 141.2K.
Does anybody know which units the Grafana metrics are expressed in, for example mailbox-size, time-in-mailbox and processing-time?
The unit of a metric is decided by the person sending the metric. Grafana does not require a metric to have a unit; all Grafana does is show you the metric stored in some database.
Ask the person who wrote the metric-storing code to explain the unit of each metric.
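To make it concrete: the value that ends up in the database (and that Grafana later displays) is just a number, and nothing records which unit it was measured in. A minimal sketch in Python, with made-up metric names and a made-up Graphite host (your Scala instrumentation will look different, but the idea is the same):

    import socket, time

    def send_to_graphite(path, value, host="graphite.example.com", port=2003):
        # Carbon's plaintext protocol: "<path> <value> <timestamp>\n"
        with socket.create_connection((host, port)) as sock:
            sock.sendall(f"{path} {value} {int(time.time())}\n".encode())

    elapsed_seconds = 0.1412
    # One service might record this duration as milliseconds...
    send_to_graphite("app.processing-time", elapsed_seconds * 1_000)          # 141.2
    # ...another as microseconds; both arrive as bare numbers like your 141.2K reading.
    send_to_graphite("other-app.processing-time", elapsed_seconds * 1_000_000)  # 141200.0

Only the emitting code knows which convention it used, which is why the answer is to check (or ask about) that code.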
I have a problem with creating metrics and later triggering alerts based on those metrics. I have two data sources, both Elasticsearch. One contains documents (logs from a service) saying that a message was produced to Kafka; the second contains documents (also logs from a service) saying that a message was consumed. What I want to achieve is to trigger an alert if the ratio of produced to consumed messages drops below 1.
Unfortunately it is impossible to use Prometheus, for two reasons:
1) the counter resets each time the service is restarted;
2) the second service doesn't have (and won't have in a reasonable time frame) Prometheus integration.
The question is how to approach metrics and alerting based on these data sources. Is it possible? Maybe there is another way to achieve my goal?
The question is somewhat generic (no mapping or code is provided), so I'll outline an approach.
You can use a watcher on top of an aggregation that you create.
It's relatively straightforward to compute a percentage of consumed/produced messages, and based on that percentage you can trigger an alert via the watcher.
Take a look at this tutorial (from the official Elasticsearch channel) on how to do this, and also check the tutorials for your specific version of Elasticsearch. Alerting improved significantly between 5.x and 7.x: on 7.x you might be able to do this via the Kibana UI, while on 5.x you'll probably need to add the alert by indexing JSON into the appropriate .watcher indices.
I haven't used Grafana, but I believe the same approach can be applied: you'll need an aggregation as mentioned above, and then you add the alert (https://grafana.com/docs/grafana/latest/alerting/rules/).
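Whichever tool fires the alert, the core of it is just two counts and a ratio. Here is a minimal sketch as a plain scheduled script (a Watcher would wrap the same two queries and condition inside its JSON definition); the index names, the @timestamp field, and the 5-minute window are assumptions, not something taken from your setup:

    import requests

    ES = "http://localhost:9200"  # assumed Elasticsearch endpoint
    WINDOW = {"query": {"range": {"@timestamp": {"gte": "now-5m"}}}}

    def count(index):
        # _count returns the number of documents matching the query in that index
        resp = requests.post(f"{ES}/{index}/_count", json=WINDOW)
        resp.raise_for_status()
        return resp.json()["count"]

    produced = count("produced-logs")   # placeholder index names
    consumed = count("consumed-logs")

    ratio = produced / consumed if consumed else float("inf")
    if ratio < 1:  # the threshold from the question
        print(f"ALERT: produced/consumed dropped to {ratio:.2f} ({produced} vs {consumed})")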
I have two metrics already made.
The 1st metric represents the number of transactions started by the client.
The 2nd metric represents the number of transactions received by the server.
I want to get the number of transactions which failed (sent by the client but not received by the server), which is a simple subtraction.
Can I achieve this in Kibana?
There is a plugin for Kibana 5.0.0+. It is based on the core Metric plugin but gives you the ability to output custom aggregates on metric results by using custom formulas and/or JavaScript.
You can check more details here.
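If the plugin is not an option, the subtraction can also be pushed into the Elasticsearch query itself with a bucket_script pipeline aggregation. A rough sketch, assuming both counts live in the same index and can be told apart by a hypothetical event field (the index name, field names, and interval are all placeholders):

    import requests

    query = {
        "size": 0,
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},  # "interval" on ES < 7.x
                "aggs": {
                    "started":  {"filter": {"term": {"event": "started"}}},
                    "received": {"filter": {"term": {"event": "received"}}},
                    "failed": {
                        "bucket_script": {
                            # subtract the two filter counts inside each time bucket
                            "buckets_path": {"s": "started>_count", "r": "received>_count"},
                            "script": "params.s - params.r",
                        }
                    },
                },
            }
        },
    }

    resp = requests.post("http://localhost:9200/transactions/_search", json=query)
    for bucket in resp.json()["aggregations"]["per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["failed"]["value"])

The same aggregation body can be run from Kibana's Dev Tools console to check the numbers before building a visualization around it.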
I am trying to visualize time series data stored in Elasticsearch using Grafana.
I have the legend set up to show 2 decimal places, but it does not reflect in the UI.
The decimal places show up for other dashboard panels with a TSDB data source, so this issue is specific to using Grafana with Elasticsearch. Is there any other configuration I am missing here that would help me achieve this?
I just found out that Elasticsearch does not allow displaying values without some sort of aggregation, and in my case the aggregation results in the values getting rounded.
There was a related request in Kibana which did not seem to get much traction:
https://github.com/elastic/kibana/issues/3572
In short, this is not feasible as of Elasticsearch 2.x.
Right, so I am trying to get a list of metric names for a particular namespace (I'd rather it be per object, but I'm working with what I've got) using the AWS Ruby SDK, and CloudWatch has the list_metrics function. Awesome!
Except that list_metrics doesn't return which units and statistics a metric supports, which is a bit stupid, as you need both to request data from a metric.
If you're trying to dynamically build a list of metrics per namespace (which I am), you won't know which units or statistics a particular metric might support without knowing about the metrics beforehand, which makes using list_metrics to dynamically get a list of metrics pointless.
How do I get around this so I can build a hash in the correct format containing the metrics for any namespace, without knowing anything about a metric beforehand except for the hash structure?
Also, why is there no query for which metrics an object (DynamoDB, ELB, etc.) has?
It seems a logical thing to have, because a metric does not exist for an object unless that object has actually spat out data for the metric at least once (so I've been told), which means that even if you have a list of all the metrics a namespace supports, an object within that namespace won't necessarily have all of them.
CloudWatch is a very general-purpose tool, with a generic schema for all metric data in the MetricDatum structure, but individual metrics have no schema other than the data sent in practice.
So there is no object for DynamoDB, EC2, etc. that describes which metrics might be sent; there is only metric data that has already been sent under a particular namespace. The Amazon CloudWatch Namespaces, Dimensions, and Metrics Reference documents the metric schema for many or all of the metrics AWS services capture. I know that's not what you wanted.
You can query any CloudWatch metric using any of the statistics tracked by CloudWatch (SampleCount, Minimum, Maximum, Average, and Sum). CloudWatch requires that incoming metric data either include all of those statistics or consist of raw values from which the statistics can be calculated.
I don't know of any way to get the units other than to query the data and look at what is returned.
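For what it's worth, here is a sketch of that "ask for everything and see what comes back" approach. It's written with Python/boto3 purely for illustration (the Ruby SDK exposes the same list_metrics and get_metric_statistics operations), and the namespace is just an example:

    import boto3
    from datetime import datetime, timedelta

    cw = boto3.client("cloudwatch")
    STATS = ["SampleCount", "Minimum", "Maximum", "Average", "Sum"]  # always queryable

    # list_metrics only tells you namespace, name, and dimensions
    # (paginate with NextToken for large namespaces)
    for metric in cw.list_metrics(Namespace="AWS/DynamoDB")["Metrics"]:
        data = cw.get_metric_statistics(
            Namespace=metric["Namespace"],
            MetricName=metric["MetricName"],
            Dimensions=metric["Dimensions"],
            StartTime=datetime.utcnow() - timedelta(hours=1),
            EndTime=datetime.utcnow(),
            Period=300,
            Statistics=STATS,
        )
        # The unit only shows up on datapoints that were actually emitted.
        units = {dp["Unit"] for dp in data["Datapoints"]}
        print(metric["MetricName"], metric["Dimensions"], units or "no recent data")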
I've recently done a lot of research into Graphite with StatsD instrumentation. With the help of our developer operations team we managed to get multiple servers reporting metrics to Graphite and to combine all the metrics. This is partially what we are looking for; however, I want to be able to filter the metrics by server rather than having all the metrics averaged together. The purpose of this is to monitor metrics on a per-server basis, as many of our stats could also be used to visualize server uptime and performance. I haven't been able to find anything in my research about how this may be achieved, other than maybe some trickery with the aggregation rules.
You should include the server name as the first path component of the metric name being emitted. Graphite separates a metric name into path components using . as the delimiter. For example, you may want to use a naming schema like:
<data_center>_<environment>_<role>_<node_id>.gauges.cpu.idle_pct
This will cause each server to be listed as a separate category on http://graphite_hostname.com/dashboard/
If you need to perform aggregations across servers, you can do that at the Graphite layer, or you can emit the same metric under two different names: one whose first path component is the server name, and one whose first path component is a value shared across all the servers you want that metric aggregated across.
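As a rough sketch of both ideas over the StatsD plaintext UDP protocol (the host, port, and every path component below are placeholder values, not anything standard):

    import socket

    STATSD = ("statsd_hostname.com", 8125)
    node_id = "dc1_prod_web_web01"       # <data_center>_<environment>_<role>_<node_id>
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(line):
        sock.sendto(line.encode(), STATSD)

    # Per-server series: the server name is the first path component, so every host
    # shows up as its own top-level category in the Graphite tree.
    emit(f"{node_id}.cpu.idle_pct:87.5|g")

    # Same event under two names: the per-server counter keeps the breakdown, while
    # the shared name is summed by StatsD across every host that reports to it.
    emit(f"{node_id}.requests.completed:1|c")
    emit("all_servers.requests.completed:1|c")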