I'm in the process of writing a Prometheus exporter in Go to expose metrics pushed from AIX servers. The AIX servers push their metrics (in JSON) to a central listener (the exporter program), which converts them to standard Prometheus metrics and exposes them for scraping.
The issue I have is that the hostname for the metrics is extracted from the pushed JSON. I store this as a label on each metric, e.g. njmon_memory_free{lpar="myhostname"}. While this works, it's less than ideal as there doesn't seem to be a way to relabel this to the usual instance label (njmon_memory_free{instance="myhostname"}). The Prometheus relabelling happens before the scrape, so the lpar label isn't there to be relabelled.
One option seems to be to rewrite the exporter so that the Prometheus server probes defined targets, each target being an lpar. For that to work, I'd need a means to filter the stored metrics by lpar so that only metrics relating to the target lpar are returned. Is this a practical solution, or am I forced to create a dedicated listener or URL for every lpar?
I'm posting my answer from the comments here, since it was helpful to the author:
Use "instance" label in exporter, not "lpar" (change exporter code)
Use "honor_labels: true" in Prometheus scrape_config
A small question regarding SpringBoot web applications and tracing, please.
By tracing, I mean traceId, spanId and parentId, and most of all, how to collect/scrape/poll those traces.
For example, logging:
SpringBoot can send logs over the wire to external systems such as Loki or Logstash (just to name a few). But this construct requires the web application to know the destinations and to send the logs there.
An alternative to this construct, which is non-invasive, is to just write the logs to a log file and let a log forwarder, for instance Filebeat or the Splunk forwarder, collect/scrape/poll the logs and ship them to the destinations, without the web application knowing anything about them.
Another example: metrics.
SpringBoot can send metrics to different metrics backends, such as Elastic or DataDog (just to name a few), using the corresponding micrometer-registry-abc dependency. But again, the web application needs to know about the destination.
An alternative to this construct is, for instance, to expose the /metrics or /prometheus endpoint and have something like Metricbeat or a Prometheus agent collect/scrape/poll those metrics. Here again, the web application does not need to know anything about the destinations; it is not intrusive at all.
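For example, the pull construct for metrics is nothing more than a scrape configuration on the Prometheus side; a sketch, assuming micrometer-registry-prometheus is on the classpath and the prometheus actuator endpoint is exposed (host and port are placeholders):

    scrape_configs:
      - job_name: 'spring-app'
        metrics_path: /actuator/prometheus     # endpoint exposed by Spring Boot Actuator
        static_configs:
          - targets: ['app-host:8080']         # placeholder application address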
My question is about traces, as in traceId, spanId and parentId.
SpringBoot can send traces to a Zipkin server, which is very popular.
However, there seems to be no construct to collect/scrape/poll traces.
             send/push                      collect/scrape/poll
logging      yes (TCP Logstash / Loki)      yes (Filebeat / Splunk forwarder)
metrics      yes (micrometer-registry-X)    yes (Prometheus agent / Metricbeat)
traces       yes (Zipkin)                   ?
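For reference, the push construct for traces on the application side is just configuration; a rough sketch, assuming Spring Boot 3 with Micrometer Tracing and the Zipkin reporter on the classpath (property names differ for Boot 2 / Sleuth, and the endpoint is a placeholder):

    management:
      tracing:
        sampling:
          probability: 1.0                                  # demo value: sample every request
      zipkin:
        tracing:
          endpoint: http://zipkin-host:9411/api/v2/spans    # placeholder Zipkin endpoint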
Question: what is the best way to have SpringBoot-generated traces collected/scraped/polled in a non-invasive way, please?
Thank you
I am trying to find a working example of how to use the remote write receiver in Prometheus.
Link : https://prometheus.io/docs/prometheus/latest/querying/api/#remote-write-receiver
I am able to send a request to the endpoint (POST /api/v1/write) and can authenticate with the server. However, I have no idea in what format I need to send the data.
The official documentation says that the data needs to be in Protobuf format and snappy-encoded. I know the libraries for them. I have a few metrics I need to send over to Prometheus at http://localhost:1234/api/v1/write.
The metrics I am trying to export are scraped from a metrics endpoint (http://127.0.0.1:9187/metrics) and look like this:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.11e-05
go_gc_duration_seconds{quantile="0.25"} 2.4039e-05
go_gc_duration_seconds{quantile="0.5"} 3.4507e-05
go_gc_duration_seconds{quantile="0.75"} 5.7043e-05
go_gc_duration_seconds{quantile="1"} 0.002476999
go_gc_duration_seconds_sum 0.104596342
go_gc_duration_seconds_count 1629
As of now, I can authenticate with my server via a POST request in Golang.
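For what it's worth, here is a minimal sketch of what a remote_write payload could look like in Go, assuming the prometheus/prompb, gogo/protobuf and golang/snappy packages; the metric name, labels and endpoint are placeholders and authentication is omitted:

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
        "time"

        "github.com/gogo/protobuf/proto"
        "github.com/golang/snappy"
        "github.com/prometheus/prometheus/prompb"
    )

    func main() {
        // Build a WriteRequest with a single sample; every label is part of the series identity.
        req := &prompb.WriteRequest{
            Timeseries: []prompb.TimeSeries{{
                Labels: []prompb.Label{
                    {Name: "__name__", Value: "go_gc_duration_seconds_count"},
                    {Name: "instance", Value: "127.0.0.1:9187"},
                },
                Samples: []prompb.Sample{
                    {Value: 1629, Timestamp: time.Now().UnixMilli()},
                },
            }},
        }

        // Marshal to protobuf, then snappy-compress the raw bytes.
        data, err := proto.Marshal(req)
        if err != nil {
            panic(err)
        }
        compressed := snappy.Encode(nil, data)

        httpReq, err := http.NewRequest("POST", "http://localhost:1234/api/v1/write", bytes.NewReader(compressed))
        if err != nil {
            panic(err)
        }
        httpReq.Header.Set("Content-Type", "application/x-protobuf")
        httpReq.Header.Set("Content-Encoding", "snappy")
        httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

        resp, err := http.DefaultClient.Do(httpReq)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }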
Please note that it isn't recommended to send application data to Prometheus via the remote_write protocol, since Prometheus is designed to scrape metrics from the targets specified in its config. This is known as the pull model, while you are trying to push metrics to Prometheus, aka the push model.
If you need to push application metrics to Prometheus, the following options exist:
Pushing metrics to pushgateway (a sketch follows after this list). Please read when to use the pushgateway before using it.
Pushing metrics to statsd_exporter.
Pushing application metrics to VictoriaMetrics (this is an alternative Prometheus-like monitoring system) via any supported text-based data ingestion protocol:
Prometheus text exposition format
Graphite
Influx line protocol
OpenTSDB
DataDog
JSON
CSV
All these protocols are much easier to implement and debug compared to the Prometheus remote_write protocol, since they are text-based, while remote_write is a binary protocol (basically, snappy-compressed protobuf messages sent over HTTP).
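As an example of the pushgateway option, the official Go client already implements the push protocol; a sketch, with the gateway address, job name and metric made up:

    package main

    import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/push"
    )

    func main() {
        // A throwaway gauge; name and help text are placeholders.
        completionTime := prometheus.NewGauge(prometheus.GaugeOpts{
            Name: "batch_job_last_completion_timestamp_seconds",
            Help: "Timestamp of the last completed batch job.",
        })
        completionTime.SetToCurrentTime()

        // Push the metric under the "batch_job" job label to a pushgateway instance.
        if err := push.New("http://pushgateway:9091", "batch_job").
            Collector(completionTime).
            Push(); err != nil {
            panic(err)
        }
    }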
I am trying to understand if there is any significant difference between the two.
Looking at the example, I noticed that it uses exactly the same binary and args (https://github.com/open-telemetry/opentelemetry-collector/blob/main/examples/demo/docker-compose.yaml). The only difference is the config files, which differ somewhat in their exporters/receivers.
So is the only difference which endpoint is used to collect/send traces?
No. Although the binary is the same, there is a difference in terms of deployment. The agent is a collector instance running on the same host as the application that emits the telemetry data. The agent then forwards this data to a gateway (one or more collector instances that receive data from multiple agents), and from there the data is sent to the configured backends (Jaeger, Zipkin, private vendors, etc.).
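To make that concrete, an agent's config typically just receives OTLP locally and exports it to the gateway; a rough sketch (the gateway endpoint is a placeholder, and the tls block may differ between collector versions):

    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      otlp:
        endpoint: gateway-collector:4317   # placeholder gateway address
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]

The gateway then carries the exporters for the actual backends (Jaeger, Zipkin, vendors).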
Recently I have been reading about the Elastic Stack and found out about this thing called Beats, which are basically lightweight shippers.
So the question is: if my service can talk directly to Elasticsearch, do I actually need Beats? From what I know, they're just kind of a proxy (?)
Hopefully my question is clear enough.
Not sure which Beat you are specifically referring to, but let's take Filebeat as an example.
Suppose application logs need to be indexed into Elasticsearch. The options are:
Post the logs directly to Elasticsearch
Save the logs to a file, then use Filebeat to index logs
Publish logs to a message broker like RabbitMQ or Kafka, then use Logstash input plugins to read from RabbitMQ or Kafka and index into Elasticsearch
Option 2 Benefits
Filebeat ensures that each log message gets delivered at least once. It can achieve this because it stores the delivery state of each event in its registry file. When the configured output is blocked and has not confirmed all events, Filebeat keeps trying to send them until the output acknowledges that it has received them.
Before shipping data to Elasticsearch, we can do additional processing or filtering: for example, drop some logs based on text in the log message, or add an extra field (e.g. add the application name to all logs so that multiple applications' logs can be indexed into a single index and filtered by application name on the consumption side). A minimal Filebeat config sketch follows below.
Essentially, Beats provide a reliable way of indexing data without adding much overhead to the system, since they are lightweight shippers.
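A minimal filebeat.yml sketch of option 2, assuming a Filebeat 7.x-style log input; the paths, field values and match text are placeholders:

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/myapp/*.log
        fields:
          app_name: myapp          # added to every event so multiple apps can share one index

    processors:
      - drop_event:                # drop noisy lines before they reach Elasticsearch
          when:
            contains:
              message: "DEBUG"

    output.elasticsearch:
      hosts: ["http://localhost:9200"]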
Option 3 provides the same benefits as option 2. It can be more useful when we want to ship the logs directly to an external system instead of storing them in a file on the local machine, e.g. for applications deployed in Docker/Kubernetes where we do not have much access or enough space to store files locally.
Beats are good as lightweight agents for collecting streaming data like log files, OS metrics, etc., where you need some sort of agent to collect and send. If you have a service that wants to put things into Elastic, then yes, by all means it can just use the REST/Java etc. APIs directly.
Filebeat offers a way to centralize live logs from multiple servers.
Let's say you are running multiple instances of an application on different servers and they are all writing logs.
You can ship all these logs to a single Elasticsearch index and analyze or visualize them from there.
A single static file doesn't need Filebeat to be moved into Elasticsearch.
I am going to be using Logstash to send a high volume of events to a broker. I have monitoring on the broker to check its health status, but I can't find much information on how to tell whether the Logstash process is healthy, or what the indicators of a failing process are.
For those of you who use Logstash: what are some ways you monitor it?
You can have a cronjob inject a heartbeat message and route such messages to some kind of monitoring system. If you already use Elasticsearch you could use it for this as well and write a script to ensure that you have reasonably recent heartbeat messages from all hosts that should be sending messages, but I'd prefer using e.g. Nagios or lovebeat-go.
This could be used to monitor the health of a single Logstash instance (i.e. you inject the heartbeat message into the same instance that feeds the monitoring software) but you could just as well use it to check the overall health of the whole pipeline.
Update: This got built into Logstash in 2015. See the announcement of the Logstash heartbeat plugin.
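A rough sketch of the plugin-based approach; the interval, index name and output are placeholders:

    input {
      heartbeat {
        interval => 10          # emit a heartbeat event every 10 seconds
        type     => "heartbeat"
      }
    }

    output {
      if [type] == "heartbeat" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "heartbeat-%{+YYYY.MM.dd}"
        }
      }
    }

The monitoring side then only has to check that recent heartbeat documents exist for every Logstash instance.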
If you're trying to monitor Logstash as a shipper, it's easy to write a script that compares the contents of the .sincedb* files to the actual files on disk to make sure they're in sync.
As an indexer, I'd probably skip ahead and query Elasticsearch for the number of documents being inserted.
@magnus' idea of a latency check is also good. I've used the log's timestamp and compared it to Elasticsearch's timestamp to compute the latency.