I am using a Gauge Vector in my application to collect and expose a particular labelled metric in the Prometheus metrics format. The problem is that once I have set a value for a particular set of labels, that series will be scraped by Prometheus until the application restarts and the metric is removed from memory, even if it is never collected again. This means that even if the metric is no longer valid (say it hasn't been set for a day), Prometheus will still scrape it as if it were fresh.
Is it possible to either set an expiry time for collected metrics or to remove the collected metric completely? Or are problems like this dealt with on the Prometheus server side?
These are the correct semantics. Prometheus deals with metrics, and metrics don't go away just because they haven't changed in a while. What you should be doing is keeping the gauge up to date.
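Keeping it up to date includes dropping a label set that has stopped being meaningful; the client libraries generally support this. A minimal sketch with the Java simpleclient (the metric and label names here are invented examples):

```java
import io.prometheus.client.Gauge;

public class QueueMetrics {
    // Gauge vector with one label; metric and label names are invented examples.
    static final Gauge queueDepth = Gauge.build()
            .name("queue_depth")
            .help("Current depth of each work queue.")
            .labelNames("queue")
            .register();

    static void update(String queue, double depth) {
        queueDepth.labels(queue).set(depth); // normal path: keep the value fresh
    }

    // Call this when the label set stops being meaningful, so the stale
    // series disappears from /metrics on the next scrape.
    static void retire(String queue) {
        queueDepth.remove(queue);
    }
}
```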
It sounds like you might want a logs-based monitoring system, such as the one provided by the ELK stack.
I have read over the Spring metrics docs and I have system metrics enabled in my application.yml file. According to the docs, this is supposed to give me metrics prefixed with process., system., and disk.. I see results for the first two of these, but I am not getting any metrics about disk space usage. I've even looked in the code and found a MeterBinder class named DiskSpaceMetrics that seems to send the two values disk.free and disk.total. Can someone please tell me how to get my Spring app to send these disk space metrics?
I am sending my metrics to AWS CloudWatch.
I found this question: How to enable DiskSpaceMetrics in io.micrometer. It seems to be about seeing the disk space values in Spring's actuator dashboard, and I do see the values there. What I want is for those values to be periodically reported as metric values.
It turns out that my app WAS sending out the metrics; the AWS CloudWatch console just wasn't showing them to me. I brought up the metrics just fine via Grafana. Even once I knew they were there, I could find no way to get the AWS console to show them to me. Strange. I might have to put in a request to AWS asking them what's up with that.
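For anyone who does need to register the binder by hand rather than rely on Spring Boot's auto-configuration, a minimal sketch (assuming a Micrometer MeterRegistry is already wired up, e.g. the CloudWatch one; the path to watch is an assumption):

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.system.DiskSpaceMetrics;

import java.io.File;

public class DiskMetricsConfig {
    // Publishes disk.free and disk.total for the given path.
    // The path to watch ("/") is an assumption; pick the mount you care about.
    static void bindDiskMetrics(MeterRegistry registry) {
        new DiskSpaceMetrics(new File("/")).bindTo(registry);
    }
}
```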
The requirement is to do root cause analysis using Prometheus and Grafana: I need to find out which method is taking the most time, and also the CPU and memory utilisation.
Can anyone please help me?
Integrate the Prometheus client library into your app (see the sketch after this list)
Configure a Grafana dashboard to visualize the metrics you want to monitor from your app
Additionally, you can integrate JMeter via the Prometheus Listener for JMeter plugin (it can be installed using the JMeter Plugins Manager)
Come up with a test plan which simulates real-life usage of your application
Run a stress test (start with 1 user and gradually increase the load until response time starts exceeding an acceptable threshold or errors start occurring)
Inspect the Grafana dashboard to see the reason the application is slow, i.e. trace the "slow" request down to the underlying function. Additionally, you can use a profiler tool, which can provide a better picture
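For the first step, a minimal sketch with the Prometheus Java client is below. The metric name, the port, and the method being timed are all assumptions; the hotspot collectors cover the CPU/memory side of the question:

```java
import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.DefaultExports;

public class App {
    // Per-method latency histogram; the metric name is an invented example.
    static final Histogram orderLatency = Histogram.build()
            .name("order_processing_seconds")
            .help("Time spent processing an order.")
            .register();

    public static void main(String[] args) throws Exception {
        DefaultExports.initialize();           // JVM CPU, memory, GC, thread collectors
        new HTTPServer(8081);                  // serves /metrics for Prometheus to scrape
        orderLatency.time(App::processOrder);  // wrap each call you want timed
    }

    static void processOrder() {
        // ... the application logic being measured ...
    }
}
```

Point Prometheus at the /metrics endpoint, then graph the histogram and the JVM gauges in Grafana to see which methods dominate the latency.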
I installed Elasticsearch, Logstash, and Kibana on an Ubuntu server. Before starting these services the CPU utilization was less than 5%, and within a minute of starting them it crossed 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.
There is not enough information in your question to give you a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up phase that consumes higher CPU until all services are registered and finish booting. On a fairly small machine this can consume more CPU and take longer to finish.
Folder write permissions. If any of the ELK components fails due to restricted access to directories it needs (for logging, for creating subfolders for sinceDB files, and so on), it can go into an infinite loop of retries, consuming high CPU the whole time.
Connection issues. ES should be the first component to start; if it fails, Kibana and Logstash will keep trying to connect to ES until the connection succeeds, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read the file referenced in its configuration, or if the parsing is bad or excessive (for example, the first "match" in the filter section matches only the least common case, so most events are tried against it and fail), it might consume high CPU.
For further investigation:
I suggest you do not start all of them together. Start ES first; if everything goes well, start Kibana, and lastly Logstash.
Check the logs of all the ELK components to find error messages, failures, etc.
For a better answer I would need the yaml of all 3 components (ES, Kibana, Logstash).
I would also need the Logstash configuration file.
I would recommend analysing the CPU cycles consumed by each of the Elasticsearch, Logstash, and Kibana processes.
Check specifically which of the above processes is consuming the most memory/CPU, for example via the top command.
Start only ES first and allow it to settle, with the node completely started, before starting Kibana and perhaps Logstash after that.
Send me the logs for each and I can assist if there are any errors.
I have set up the Flink UI for an application running in IntelliJ IDEA. I would like to get some streaming metrics, such as scheduling delay and processing time. However, I cannot find them anywhere in the UI. Is there some specific setup needed for that, or should I explicitly submit the app jar?
Currently, Flink UI for the job looks like this:
All of the task metrics are exposed in the web UI, as Dominik mentioned, but for other metric scopes (e.g., job metrics) only some selected metrics are displayed. You can access all of the metrics via the REST API or by connecting a metrics reporter to send the metrics to an external metrics system.
I don't think any attempt has been made to measure scheduling delay, but among the job metrics you will find things like restarting time and uptime.
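For example, those job-scope metrics can be fetched over the REST API; a sketch using Java 11's HttpClient, with a placeholder job id and the default localhost:8081 endpoint:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkJobMetrics {
    public static void main(String[] args) throws Exception {
        // Placeholder job id: copy the real one from the UI or from GET /jobs.
        String jobId = "7684be6004e4e955c2a558a9bc463f65";
        // uptime and restartingTime are among the job-scope metrics.
        URI uri = URI.create("http://localhost:8081/jobs/" + jobId
                + "/metrics?get=uptime,restartingTime");
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(uri).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON array of {"id": ..., "value": ...}
    }
}
```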
In the UI you should have a Task Metrics tab when you select the currently running job. This tab allows you to choose a task and see all the metrics that are available, although I am not sure whether scheduling delay is one of the currently available metrics.
Probably the better idea is to expose the metrics to a collector of your choice; you can find more info in the documentation: https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html.
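In the IDE scenario from the question, one way to do that is to wire a reporter into a local environment that also starts the web UI. A sketch, assuming flink-metrics-prometheus and flink-runtime-web are on the classpath; the reporter choice and port are assumptions:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalJobWithReporter {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Wire up the Prometheus reporter (needs flink-metrics-prometheus on the classpath).
        config.setString("metrics.reporter.prom.class",
                "org.apache.flink.metrics.prometheus.PrometheusReporter");
        config.setString("metrics.reporter.prom.port", "9249");

        // Local environment with the web UI, as when running inside the IDE
        // (needs flink-runtime-web on the classpath).
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);

        // Trivial pipeline just so there is something running to report on.
        env.fromElements(1, 2, 3).print();
        env.execute("metrics-demo");
    }
}
```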
I am trying to make Grafana display all my metrics (CPU, memory, etc).
I have already configured Grafana on my server, configured InfluxDB, and of course configured the JMeter Backend Listener, but I still cannot display all the graphs. Any idea what I should do to make it work?
It seems that system metrics (CPU/memory, etc.) are not in the scope of the JMeter Backend Listener implementation. Capturing those KPIs is actually part of the PerfMon plugin, which currently doesn't seem to support dumping the metrics to InfluxDB/Graphite (at least it doesn't work for me). It might be a good idea to raise such a request at https://groups.google.com/forum/#!forum/jmeter-plugins. Until this gets done, I guess you also have the option of using some alternative metric-collection tool to feed data into InfluxDB/Graphite. Which one depends on the server OS you want to monitor (e.g. Graphite-PowerShell-Functions for Windows, or collectd for everything else).
Are you sure that JMeter posts the data to InfluxDB? Did you see the default measurements created in InfluxDB?
I am able to send the data to InfluxDB using the Backend Listener. I have described the steps on this site:
http://www.testautomationguru.com/jmeter-real-time-results-influxdb-grafana/