Aggregating collectd measures using StatsD

I am collecting system metrics using collectd, at small intervals to get accurate values. However, I want to aggregate these values locally using StatsD: StatsD should aggregate them and send them to Librato at longer intervals, which will reduce costs.
I have completed the basic setup of collectd and StatsD. How do I send data from collectd to StatsD?
The collectd StatsD plugin seems to be a replacement for StatsD itself and does not appear to provide this functionality.

It doesn't seem like there is any established plugin to accomplish this. If you're already satisfied with where and how collectd is sending the data, and just want to aggregate, you can use the aggregation plugin:
https://collectd.org/wiki/index.php/Plugin:Aggregation
If you really want to get the data into StatsD somehow, you might be able to use the collectd Network output plugin and have it point at StatsD's port (although you may have to manipulate the data somehow).
I think for the most part, though, these two exist in parallel: if you needed both, each daemon would send data to Librato separately, or you could consolidate by using only collectd with its StatsD plugin.
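For illustration, a minimal collectd.conf sketch of the aggregation plugin, assuming you want to fold the per-core CPU values into one averaged value per host (the plugin/type selectors here are just an example):
LoadPlugin aggregation
<Plugin "aggregation">
  <Aggregation>
    # Match the per-core CPU metrics and re-dispatch one aggregate per host
    Plugin "cpu"
    Type "cpu"
    GroupBy "Host"
    GroupBy "TypeInstance"
    CalculateAverage true
  </Aggregation>
</Plugin>
Values matching the selectors are re-dispatched as new aggregate metrics, which your output plugin then ships like any other value.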

Related

Publishing high-volume metrics from Lambdas?

I have a bunch of Lambdas written in Go that produce certain events which are pushed out to various systems. I would like to publish metrics to CloudWatch that slice these by event type. The volume is currently about 20,000 events per second, with peaks at about twice that much.
Due to the load, I can't publish these metrics one by one on each Lambda invocation (each invocation produces a single event). What approaches are available that are cheap and don't hit any limits?
You can try to utilize the shutdown phase of the Lambda lifecycle to publish your metrics.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html#runtimes-lifecycle-shutdown
To publish the metrics, I would suggest utilizing EMF (Embedded Metric Format), and combining multiple data points when calling the PutMetricData API, which also accepts an array so it can act as a batch.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html
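As a rough sketch of the EMF route, a Go Lambda can print one structured log line and let CloudWatch extract the metric from it asynchronously, so the function itself never calls PutMetricData; the namespace, dimension, and metric names below are made up for the example:
package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// emitEventMetric prints one EMF-formatted log line to stdout; CloudWatch
// Logs picks it up and turns it into a custom metric.
func emitEventMetric(eventType string, count int) error {
    doc := map[string]interface{}{
        "_aws": map[string]interface{}{
            "Timestamp": time.Now().UnixNano() / int64(time.Millisecond),
            "CloudWatchMetrics": []map[string]interface{}{
                {
                    "Namespace":  "MyApp/Events", // hypothetical namespace
                    "Dimensions": [][]string{{"EventType"}},
                    "Metrics":    []map[string]string{{"Name": "EventCount", "Unit": "Count"}},
                },
            },
        },
        "EventType":  eventType, // dimension value
        "EventCount": count,     // metric value
    }
    line, err := json.Marshal(doc)
    if err != nil {
        return err
    }
    fmt.Println(string(line))
    return nil
}

func main() {
    // In a real Lambda this would be called from the handler for each event.
    _ = emitEventMetric("order_created", 1)
}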

Configuring connectors for multiple topics on Kafka Connect Distributed Mode

We have producers that are sending the following to Kafka:
topic=syslog, ~25,000 events per day
topic=nginx, ~5,000 events per day
topic=zeek.xxx.log, ~100,000 events per day (total). In this last case there are 20 distinct zeek topics, such as zeek.conn.log and zeek.http.log
kafka-connect-elasticsearch instances function as consumers to ship data from Kafka to Elasticsearch. The hello-world Sink configuration for kafka-connect-elasticsearch might look like this:
# elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=24
topics=syslog,nginx,zeek.broker.log,zeek.capture_loss.log,zeek.conn.log,zeek.dhcp.log,zeek.dns.log,zeek.files.log,zeek.http.log,zeek.known_services.log,zeek.loaded_scripts.log,zeek.notice.log,zeek.ntp.log,zeek.packet_filtering.log,zeek.software.log,zeek.ssh.log,zeek.ssl.log,zeek.status.log,zeek.stderr.log,zeek.stdout.log,zeek.weird.log,zeek.x509.log
topic.creation.enable=true
key.ignore=true
schema.ignore=true
...
It can be invoked with bin/connect-standalone.sh. I realized that running (or attempting to run) tasks.max=24 when all the work is performed in a single process is not ideal. I know that using distributed mode would be a better alternative, but I am unclear on the performance-optimal way to submit connectors to distributed mode. Namely,
In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call? Or would it be best to break up multiple .properties configs + connectors (e.g. one for syslog, one for nginx, one for zeek.**) and submit them separately?
I understand that tasks should equal the number of topics × the number of partitions, but what dictates the number of workers?
Is there anywhere in the documentation that walks through best practices for a situation such as this where there is a noticeable imbalance of throughput for different topics?
In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call?
It'd be a JSON file, but yes.
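For example, the standalone properties above translate roughly into a JSON payload like this, posted to the Connect REST API (port 8083 is the default; connection.url is an assumption, and the topics list is truncated here):
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "24",
    "topics": "syslog,nginx,zeek.conn.log,zeek.http.log",
    "key.ignore": "true",
    "schema.ignore": "true",
    "connection.url": "http://localhost:9200"
  }
}
curl -X POST -H "Content-Type: application/json" --data @elasticsearch-sink.json http://localhost:8083/connectors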
what dictates the number of workers?
Up to you. JVM usage is one factor that you can monitor and scale on.
Not really any documentation that I am aware of.

Flink web UI: Monitor Metrics doesn't work

Running flink-1.9.0 on YARN (2.6.0-cdh5.11.1), but the Flink web UI metrics don't work, as shown below:
I guess you are looking at the wrong metrics. Since no data flows from one task to another (you can see only one box in the UI), there is nothing to show. The metrics you are looking at only show the data that flows from one Flink task to another. In your example, everything happens within that single task.
Look at this example:
You can see two tasks sending data to the map-task which emits this data to another task. Therefore you see incoming and outgoing data.
But on the other hand, a source task never has incoming data (I must admit this is confusing at first glance):
The number of records received is 0, but it sends a couple of records to the downstream task.
Back to your problem: what you can do is have a look at the operator metrics. In the Metrics tab (the one at the very right) you can select some operator metrics in addition to the task metrics. These metrics have names like 0.Map.numRecordsIn.
The name is assembled as <slot>.<operatorName>.<metricName>. But be aware that these metrics are not recorded: you don't have any historic data, and once you leave the tab or remove a metric, the data collected up to that point is gone. I would recommend using a proper metrics backend such as InfluxDB, Prometheus, or Graphite. You can find a description in the Flink docs.
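As an example of wiring up such a backend, a flink-conf.yaml sketch for the Prometheus reporter that ships with Flink (the port range is arbitrary, and the reporter jar has to be copied from opt/ to lib/ first):
# flink-conf.yaml
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
Prometheus can then scrape each TaskManager on its reporter port and keep the history that the web UI does not.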
Hope that helped.

How to define/send derived metrics in addition to built-in Aerospike metrics

I'm trying to ship Aerospike metrics to another node using some available methods, e.g., collectd.
For example, among the Aerospike monitoring metrics, given two fields: say X and Y, how can I define and send a derived metric like Z = X+Y or X/Y?
We could calculate it on the receiver side, but that degrades the overall performance of our application. I would appreciate your guidance.
Thanks.
It can't be done within the Aerospike collectd plugin, as the metrics are more or less shipped immediately once they are read. There's no variable that saves the metrics that have been shipped.
If you can use the Graphite plugin, it keeps track of all gathered metrics and then sends them once at the very end. You can add another stanza for your calculated metrics right before the nmsg line. You'll have to search through the msg[] array for your source metrics.
The Nagios plugin is a very different method. It's a single metric pull, so a wrapper script would be needed to run the plugin for each operand, and run the calculation in the wrapper.
Or you can supplement existing plugins with your own script(s) just for derived metrics. All of our monitoring plugins utilize the Aerospike Info Protocol and you can use asinfo to gather metrics for your operands similar to the previous Nagios method.
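A rough sketch of such a wrapper in Go, assuming asinfo is on the PATH; the two statistic names and the derived metric name are made up for the example:
package main

import (
    "fmt"
    "os/exec"
    "strconv"
    "strings"
)

// statistics runs `asinfo -v statistics` and parses its key=value;key=value output.
func statistics() (map[string]float64, error) {
    out, err := exec.Command("asinfo", "-v", "statistics").Output()
    if err != nil {
        return nil, err
    }
    stats := make(map[string]float64)
    for _, pair := range strings.Split(strings.TrimSpace(string(out)), ";") {
        kv := strings.SplitN(pair, "=", 2)
        if len(kv) != 2 {
            continue
        }
        if v, err := strconv.ParseFloat(kv[1], 64); err == nil {
            stats[kv[0]] = v
        }
    }
    return stats, nil
}

func main() {
    stats, err := statistics()
    if err != nil {
        panic(err)
    }
    // Hypothetical derived metric Z = X / Y over two made-up statistic names.
    x, y := stats["client_read_success"], stats["client_read_error"]
    if y != 0 {
        fmt.Printf("aerospike.derived.reads_per_error %f\n", x/y)
    }
}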

Saving multiple values with StatsD

I need to save data sets like (timestamp, event_name, event_value1, event_value2, event_value3, ...) with StatsD. I need this to track custom events in the web app I'm working on.
Official StatsD readme states that StatsD expects metrics to be sent in the format:
<metricname>:<value>|<type>
Is there any way to push multiple values, or any workaround to make this possible?
We're currently using Graphite as a backend service, but it can be changed for the sake of adding this feature.
You could use a naming convention to capture this information:
event_name-timestamp-event_value1_name:event_value1|<stat type>
event_name-timestamp-event_value2_name:event_value2|<stat type>
event_name-timestamp-event_value3_name:event_value3|<stat type>
etc.
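For illustration, the resulting metrics could then be pushed over UDP, several per packet separated by newlines; the event and value names here are made up, and 8125 is StatsD's default port:
echo "signup-1640995200-duration_ms:123|ms
signup-1640995200-payload_bytes:4567|g
signup-1640995200-retries:2|c" | nc -u -w1 localhost 8125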
