Can't expose data as graph in nagios - jms

I am monitoring the MessageCount attribute of a JMS queue using Nagios. For that I am using the check_jmx plugin, which gives the output "JMX OK MessageCount=400". I configured a graph for this service, but when I click on the graph icon it shows no data available. The service is not generating any RRD file. How can I configure a graph for my message count monitoring service? In the graph I want to show the message count per hour. Do I have to use another plugin?

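Nagios graphing addons such as PNP4Nagios use the performance data of the plugin output, which is everything after the |. If the output really is just "JMX OK MessageCount=400", with no | part, there is no performance data to graph, which would explain the missing RRD file. Run the plugin on the command line and check whether it is outputting performance data, and try different verbosity options by adding -vvv to check_jmx.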
More info on performance data
check_jmx usage
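To make the "everything after the |" rule concrete, here is a minimal sketch that splits one line of plugin output into the status text and the performance data pairs. The function name is made up for illustration, and it assumes labels without spaces, so quoted multi-word labels are not handled.

```python
def split_plugin_output(line):
    """Split one line of plugin output into (status text, perfdata dict)."""
    text, _, perf = line.partition("|")
    metrics = {}
    for token in perf.split():
        label, sep, value = token.partition("=")
        if sep:
            # Keep only the value; drop the optional ;warn;crit;min;max fields.
            metrics[label.strip("'")] = value.split(";")[0]
    return text.strip(), metrics
```

An empty dict for the output you captured on the command line means the graphing addon has nothing to store.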

I can't tell from your text which plugin you are using to generate the graphs; I recommend PNP4Nagios.
Once installed, it works really well.
For your problem:
and it gives the output as "JMX OK MessageCount=400".
If this is already messages per hour, you don't have to change anything.
If it's not, you could add the current code of the plugin on your Nagios to your question, or modify it yourself (store the message count and a timestamp, then read them back on the next run to calculate your messages per hour).
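The store-and-compare idea could look roughly like the sketch below. The function name, the JSON state file and its layout are assumptions for illustration, not part of check_jmx. Note that MessageCount is a queue depth, so the difference can be negative when the queue drains.

```python
import json
import time
from pathlib import Path


def messages_per_hour(count, state_file, now=None):
    """Return the change in count per hour since the previous run.

    Returns None on the first run, when there is nothing to compare with.
    """
    now = time.time() if now is None else now
    path = Path(state_file)
    previous = json.loads(path.read_text()) if path.exists() else None
    # Remember the current sample for the next run.
    path.write_text(json.dumps({"count": count, "time": now}))
    if previous is None or now <= previous["time"]:
        return None
    elapsed = now - previous["time"]
    return (count - previous["count"]) * 3600.0 / elapsed
```

A wrapper around the plugin could then print the result after a | so that the graphing addon picks it up as performance data.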

How can I get a live view of syslog-ng logs in a web frontend?

I currently have syslog-ng set up to aggregate my logs. I want to show these logs in real time to my web frontend users. However, I have no clue how to do this. Is it possible to connect directly to syslog-ng using WebSockets? Or do I need to first pass the logs on to something like Elasticsearch? If so, how do I get my data live from Elasticsearch?
I found this table in the syslog-ng documentation, but I could not find any output destination that would solve my problem.
Unfortunately, there is currently no mechanism to export real-time log traffic for a generic destination. You could, however, write your configuration so that it places log information where a frontend can read it.
For instance, if you have a log statement delivering messages to elastic:
log {
    source(s_network);
    destination(d_elastic);
};
you could add an alternative destination to the same log statement, which would only serve as a buffer for exporting real-time log data. For instance:
log {
    source(s_network);
    destination(d_elastic);
    # in-line file destination, used only as a buffer for real-time export
    destination { file("/var/log/buffers/elastic_snapshot.$SEC" overwrite-if-older(59)); };
};
Notice the second destination in the log statement above: with curly braces you tell syslog-ng to use an in-line destination instead of a predefined one (you could also use a full-blown destination declaration, but I omitted that for brevity).
This new file destination writes all messages that elastic receives to a file. The file name contains the time-based macro $SEC, meaning that you'd get a series of files: one for each second of a minute.
Your frontend could just try to find the file with the latest timestamp and present that as the real-time traffic (from the last second).
The overwrite-if-older() option tells syslog-ng that if the file is older than 59 seconds, it should overwrite it instead of appending to it.
This is a bit hacky, and I even intend to implement what you have asked for in a generic way, but it's doable even today, as long as the syslog-ng configuration is in your control.
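On the frontend side, "find the file with the latest timestamp" could be sketched like this. The directory and file prefix are taken from the configuration above; the function name is made up, and using the modification time as "the timestamp" is an assumption.

```python
from pathlib import Path


def latest_snapshot(directory="/var/log/buffers", prefix="elastic_snapshot."):
    """Return the lines of the most recently modified snapshot file.

    Returns an empty list when no snapshot file exists yet.
    """
    files = [p for p in Path(directory).glob(prefix + "*") if p.is_file()]
    if not files:
        return []
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest.read_text(errors="replace").splitlines()
```

The newest file is the one syslog-ng is still writing to, so the frontend may see a partial second; polling about once per second and showing whatever is there should be good enough for a live view.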

How to properly create Prometheus metrics with unique field

I have a system that regularly downloads files and parses them. However, sometimes something might go wrong with the parsing, and I have the task of creating a Prometheus alert for when a certain file fails. My initial idea is to create a custom counter in Prometheus, something like processed_files_total, and use status as a label: if the file fails it has status FAILED, and if it succeeds, SUCCESS. The alert expression would then look like increase(processed_files_total{status="FAILED"}[24h]) > 0, and I hope that this will alert me when there is at least 1 file with failed status.
The problem is that I also want the exact filename in the alert message. Since each file has a unique name, I'm almost sure that it is not a good idea to put it in a label, e.g. filename={filename}. According to the Prometheus docs:
Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Is there any other way I can get the filename from the alert, or is this the way to go?
It's a good question.
I think the correct answer is that the alert should notify you that something failed and the resolution is to go to the app's logs to identify the specific file(s) that failed.
Lightning won't strike you for using the filename as a label value in Prometheus if you really must, but I think, as you suspect, that using an unbounded value should give you pause as to whether you're abusing the tool.
My hunch is that metrics are intrinsically about monitoring aggregate state (an unusual number of files are failing) rather than specific cases (why did this one fail); logs and tracing tools help with the specific cases.
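A rough sketch of that split between metrics and logs: the counter only ever sees the bounded status value, and the filename goes to the log. To keep it dependency-free, a plain collections.Counter stands in for the Prometheus counter here; a real exporter would use a Prometheus client library instead. The function and logger names are made up.

```python
import logging
from collections import Counter

logger = logging.getLogger("file_parser")

# Stand-in for a Prometheus counter called processed_files_total with a
# single bounded label (status). Not the Prometheus client API.
processed_files_total = Counter()


def record_result(filename, ok):
    """Count the outcome by status and log the filename only on failure."""
    status = "SUCCESS" if ok else "FAILED"
    processed_files_total[status] += 1
    if not ok:
        # The unbounded value lives in the logs, not in a label.
        logger.error("parsing failed for file %s", filename)
    return status
```

The alert then tells you that something failed in the last 24h, and a search for "parsing failed" in the logs tells you which file.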

Flink web UI: Monitor Metrics doesn't work

I run flink-1.9.0 on YARN (2.6.0-cdh5.11.1), but the Flink web UI metrics don't work, as shown below:
I guess you are looking at the wrong metrics. Because no data flows from one task to another (you can see only one box in the UI), there is nothing to show. The metrics you are looking at only show the data that flows from one Flink task to another. In your example everything happens within this one task.
Look at this example:
You can see two tasks sending data to the map-task which emits this data to another task. Therefore you see incoming and outgoing data.
But on the other hand, a source task never has incoming data (I must admit that this is confusing at first glance):
The number of records received is 0, but it sends a couple of records to the downstream task.
Back to your problem: what you can do is have a look at the operator metrics. In the metrics tab (the one at the very right) you can select, besides the task metrics, also some operator metrics. These metrics have names like 0.Map.numRecordsIn.
The name is assembled like this: <slot>.<operatorName>.<metricsname>. But be aware that these metrics are not recorded: you don't have any historic data, and once you leave this tab or remove a metric, the data collected until that point is gone. I would recommend using a proper metrics backend like InfluxDB, Prometheus or Graphite. You can find a description in the Flink docs.
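If you copy such names out of the UI, taking them apart is a one-liner. This sketch follows the <slot>.<operatorName>.<metricsname> pattern described above and assumes the operator name itself contains no dots, which may not hold for every job; the function name is made up.

```python
def split_operator_metric(name):
    """Split '<slot>.<operatorName>.<metricsname>' into its three parts."""
    # Split at the first two dots only, so a dotted metric name stays whole.
    slot, operator, metric = name.split(".", 2)
    return int(slot), operator, metric
```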
Hope that helped.

Run NiFi flow once and notify me when it is finished

I use the REST API in my program. I made a process group to convert a MongoDB collection to a JSON file:
I want to run the flow only one time, so I set the "Run schedule" to 10000 sec. I then want to stop the group once the data flow has run one time, so I added a Notify processor and a DistributedMapCacheService. But the DistributedMapCacheClientService of the Notify processor only communicates with the DistributedMapCacheService in NiFi itself; it never notifies my program.
I tried to use my own socket server, but I only get the message "nifi" and nothing more.
My question is: if I only want the flow to run once and then stop it, how do I know when I should stop it? Or is there some other way to achieve my purpose, like detecting whether the JSON file exists or using incremental data (if the flow runs twice, the data will be repeated twice)?
As @daggett said, you can do it in a synchronous way: use HandleHttpRequest as the trigger and HandleHttpResponse to manage the response.
For an asynchronous way, you have several options for the notification, like PutTCP, PostHTTP, GetHTTP, FTP, the file system, XMPP or whatever.
If the flow runs twice, whether you get duplicated elements depends on the processors you use; some of them keep state, others don't. If you are facing problems with repeated elements, you can use the DetectDuplicate processor.
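For the asynchronous HTTP option, the receiving side in your program could be as small as the sketch below: a listener that sets a flag when anything is POSTed to it, for example by a PostHTTP processor at the end of the flow. The names, the port handling and ignoring the request body are assumptions for illustration, not anything NiFi prescribes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the flow has reported that it is done.
flow_finished = threading.Event()


class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # the body is ignored in this sketch
        flow_finished.set()      # wake up whoever is waiting
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_listener(port=0):
    """Start the listener in a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), NotifyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Your program would start the flow, call flow_finished.wait() with a timeout, and stop the process group once the event is set.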

Immediately Display New Metrics

I am using Graphite and Coda Hale Metrics to track the number of times particular APIs are called and also the top 10 callers. I have assigned a metric to each user who calls the API and use Graphite to bring back the top 10.
The problem is that if it is a new user, i.e. a new metric, it will only be displayed in Graphite when the tool is refreshed. Has anyone come across a workaround for this? Is there some way Graphite can automatically detect new meters?
Just to be clear: I can see the top ten API callers for the last 30 minutes, unless it is a brand new user that has never logged in before.
It seems that graphite-web uses an on-disk index generated by a glorified find command. Another script is available that you can run from cron to update the metric index file.
Whenever you update the index file, the graphite-web process will detect it and reload it.
Since reloading the index might be heavy for a large number of metrics (1M), I would advise modifying the update script a bit to update the file conditionally (only if it is different, for instance).
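The "only if different" part could be sketched as follows. This is not the Graphite update script itself, just an illustration of the conditional write; the function name is made up, and it assumes the new index content fits in memory as one string.

```python
import os
import tempfile


def update_if_changed(path, new_content):
    """Rewrite path only when new_content differs from what is on disk.

    Returns True if the file was rewritten, False if it was left alone.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            if handle.read() == new_content:
                return False
    except FileNotFoundError:
        pass  # no index yet, so write it
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(new_content)
    # Swap the file in one step so readers never see a half-written index.
    os.replace(tmp, path)
    return True
```

Leaving the file untouched when nothing changed keeps its modification time stable, so anything watching the file has no reason to reload.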
EDIT: after testing, graphite-web does not seem to call the reloading code.
