HashiCorp Consul is not publishing all the metrics

Consul isn't publishing all the metrics defined in its documentation. According to https://www.consul.io/docs/agent/telemetry.html#transaction-timing there should be transaction-timing metrics, but my agent only shows raft metrics and no txn or kvs ones. Has anyone observed this problem?
Command to enable Prometheus-style metrics:
consul agent -dev -hcl 'telemetry{prometheus_retention_time="24h" disable_hostname=true}'
Watch the metrics:
watch -n 1 -d "curl -s localhost:8500/v1/agent/metrics?format=prometheus|grep -v ^# | grep -E 'kvs|txn|raft'"

Metrics are exported only if they are available; i.e., if there have been no transactions or KV store operations, you will not see those metrics in the output.
I managed to see the kvs metrics with the example you provided. While running the Consul agent via the command in the question, open http://127.0.0.1:8500/ in a browser and click the Key/Value option in the top menu (you should end up at http://127.0.0.1:8500/ui/dc1/kv). Click Create to add a new key/value pair. After clicking Save you should see something like this in the terminal running the watch command:
consul_fsm_kvs{op="set",quantile="0.5"} 0.3572689890861511
consul_fsm_kvs{op="set",quantile="0.9"} 0.3572689890861511
consul_fsm_kvs{op="set",quantile="0.99"} 0.3572689890861511
consul_fsm_kvs_sum{op="set"} 0.3572689890861511
consul_fsm_kvs_count{op="set"} 1
consul_kvs_apply{quantile="0.5"} 2.6777150630950928
consul_kvs_apply{quantile="0.9"} 2.6777150630950928
consul_kvs_apply{quantile="0.99"} 2.6777150630950928
consul_kvs_apply_sum 2.6777150630950928
consul_kvs_apply_count 1
If there are no further transactions, some of these values will be reset to NaN, depending on the Prometheus metric type.
Similarly, to see the txn metrics you need to perform a Consul transaction, for example as sketched below.
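If you prefer the command line over the UI, here is a minimal sketch that should produce both kinds of samples against the dev agent from the question (the key name demo and the value are just placeholders):
# generate kvs metrics from the CLI instead of the UI
consul kv put demo hello
# generate txn metrics via the transaction API ("aGVsbG8=" is base64 for "hello")
curl -s -X PUT localhost:8500/v1/txn \
  -d '[{"KV": {"Verb": "set", "Key": "demo", "Value": "aGVsbG8="}}]'
After either command the corresponding kvs/txn summaries should appear in the watch output.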
Hope that helps you set up monitoring.

Related

Does any monitoring metrics/events exist for ClickHouse keeper?

I am considering using ClickHouse Keeper to replace ZooKeeper for data replication. ZooKeeper has lots of useful metrics for monitoring and convenient triage. I checked the ClickHouse documentation and the CurrentMetrics/ProfileEvents files but found no monitoring data similar to ZooKeeper's (https://zookeeper.apache.org/doc/r3.7.0/zookeeperMonitor.html).
Please point me in the right direction, thanks!
ClickHouse Keeper already supports the four-letter commands 'ruok' and 'mntr':
# echo 'mntr' | nc localhost 9181
zk_version v22.2.1.2764-testing-4fab6bec4ec53b66246a055919a4ed4c0610f650
zk_avg_latency 0
zk_max_latency 33
zk_min_latency 0
zk_packets_received 15430936
zk_packets_sent 15430936
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state standalone
zk_znode_count 4272
zk_watch_count 235
zk_ephemerals_count 111
zk_approximate_data_size 781777
zk_open_file_descriptor_count 203
zk_max_file_descriptor_count 18446744073709551615
zk_followers 0
zk_synced_followers 0
# echo 'ruok' | nc localhost 9181
imok
It is possible to export those in Prometheus format using external tools like https://github.com/dabealu/zookeeper-exporter
Future versions will have an embedded Prometheus exporter.
They are not implemented yet; there are plans to expose Keeper metrics through a Prometheus endpoint.
ClickHouse Keeper now has support for a Prometheus endpoint: https://github.com/ClickHouse/ClickHouse/pull/43087
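Once that endpoint is enabled in the Keeper configuration, you should be able to scrape it with something like the following; the port 9363 and the /metrics path are only the values commonly shown in the ClickHouse docs, so adjust them to whatever your <prometheus> config section actually uses:
curl -s http://localhost:9363/metrics | grep -i keeper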

How to read the TimeoutStartSec value of a systemd service from an application via D-Bus interfaces

In my service configuration TimeoutStartSec == 100s.
According to the man page, my application needs to notify systemd with sd_notify(READY=1) within those 100 s; if it does not, the service is put into the failed state.
https://www.freedesktop.org/software/systemd/man/systemd.service.html
But I want to do something (e.g. just print a log line saying that startup did not finish in time) before my service is actually set to the failed state.
Is there any chance to do that?
My idea is to create a timer with the same value as TimeoutStartSec == xx s, so that I can do something before the timer expires.
The problem is that TimeoutStartSec == xx is configured dynamically by the user in my project, so I would expect some D-Bus interface that lets me read TimeoutStartSec from my application.
I checked
https://www.freedesktop.org/wiki/Software/systemd/dbus/
but did not find a corresponding property.
I am using systemd on Linux, so I can freely use the systemd D-Bus interfaces.
I found a solution.
systemd actually provides that information:
dbus-send --system --dest=org.freedesktop.systemd1 --print-reply /org/freedesktop/systemd1/unit/ServiceName_2eservice \
org.freedesktop.DBus.Properties.Get string:org.freedesktop.systemd1.Service string:TimeoutStartUSec
Note: the service name has to be escaped to obtain the exact object path, i.e. ServiceName.service becomes ServiceName_2eservice.
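If the systemd command-line tools are available, the same property can also be read without composing the dbus-send call by hand; for example (ServiceName is a placeholder for your own unit):
systemctl show --property=TimeoutStartUSec ServiceName.service
busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/ServiceName_2eservice \
  org.freedesktop.systemd1.Service TimeoutStartUSec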

Invalid header field value in Go ONLY on kubernetes/CoreOS

I have a Go program that uses aws-sdk-go to talk to dynamodb. Dependencies are vendored. Go version 1.7.1. aws-sdk-go version 1.6.24. The program works as expected in all the following environments:
dev box from shell (Arch Linux)
docker container running on my dev box (Docker 1.13.1)
EC2 instance from shell (Ubuntu 16.04)
When I run the docker container on kubernetes (same one I tested on my dev box), I get the following error:
2017/03/02 22:30:13 DEBUG ERROR: Request dynamodb/GetItem:
---[ REQUEST DUMP ERROR ]-----------------------------
net/http: invalid header field value "AWS4-HMAC-SHA256 Credential=hidden\n/20170302/us-east-1/dynamodb/aws4_request, SignedHeaders=accept-encoding;content-length;content-type;host;x-amz-date;x-amz-target, Signature=483f56dd0b17d8945d3c2f2044b7f97e531190602f132a4d5f828264b3a2cff2" for key Authorization
-----------------------------------------------------
2017/03/02 22:30:13 DEBUG: Response dynamodb/GetItem Details:
---[ RESPONSE ]--------------------------------------
HTTP/0.0 000 status code 0
Content-Length: 0
Based on:
https://golang.org/src/net/http/transport.go
https://godoc.org/golang.org/x/net/lex/httplex#ValidHeaderFieldValue
It looks like the problem is with the header value validation, yet I am at a loss to understand why it works everywhere except on my k8s cluster. The cluster is composed of EC2 instances running the latest CoreOS stable AMI (CoreOS stable 1235.8.0).
The docker image that works on my dev machine is scratch based. To troubleshoot I created an image based on Ubuntu latest with a separate go program that just does a simple get item from dynamodb. When this image is run on my k8s cluster and the program run from an interactive shell, I get the same errors. I have confirmed I can ping the dynamodb endpoints from this env.
I am having a hard time troubleshooting this issue: am I missing something stupid here? Can someone point me in the right direction or have an idea of what is going on?
remember the "-n" when you do this:
echo -n key | base64
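Without -n, echo appends a newline that ends up inside the encoded value, which is easy to miss:
echo key | base64      # a2V5Cg==  <- encodes "key\n", the stray newline is included
echo -n key | base64   # a2V5      <- encodes just "key"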
The \n after hidden is certainly invalid. Not sure if it is actually there or somehow got inserted when you were cleansing for posting.
Consider:
package main

import (
    "fmt"

    "golang.org/x/net/lex/httplex"
)

func main() {
    fmt.Println("Is valid (without new line)", httplex.ValidHeaderFieldValue("AWS4-HMAC-SHA256 Credential=hidden/20170302/us-east-1/dynamodb/aws4_request, SignedHeaders=accept-encoding;content-length;content-type;host;x-amz-date;x-amz-target, Signature=483f56dd0b17d8945d3c2f2044b7f97e531190602f132a4d5f828264b3a2cff2"))
    fmt.Println("Is valid (with new line)", httplex.ValidHeaderFieldValue("AWS4-HMAC-SHA256 Credential=hidden\n/20170302/us-east-1/dynamodb/aws4_request, SignedHeaders=accept-encoding;content-length;content-type;host;x-amz-date;x-amz-target, Signature=483f56dd0b17d8945d3c2f2044b7f97e531190602f132a4d5f828264b3a2cff2"))
}
One guess would be that wherever the real hidden value is getting pulled from (config file, environment, etc.) mistakenly has the \n in there, and it is happily getting pulled into your header, but only in this environment.
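If the credentials come from a Kubernetes Secret, one way to check for a stray newline is to decode the value and look at the raw bytes; the secret and key names below are only placeholders for your own:
kubectl get secret aws-creds -o jsonpath='{.data.secret-access-key}' | base64 --decode | od -c | tail -n 2
A trailing \n in the od output would explain the invalid Authorization header.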

Logstash stuck when starting up

What's wrong with the following logstash configuration?
input {
  file {
    type => "access_log"
    # Wildcards work, here :)
    path => [ "/root/isaac/my_logs/access_logs/gw_access_log*" ]
    start_position => "beginning"
  }
}
output {
  stdout { debug => true }
  elasticsearch { embedded => true }
}
When running the above configuration, logstash is stuck on startup as follows:
[root@myvm logstash]# java -jar logstash-1.3.3-flatjar.jar agent -f logstash-complex.conf
Using milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.3.3/plugin-milestones {:level=>:warn}
More importantly, what are the ways to debug the issue?
I already checked that the files I am putting in the path do exist.
That isn't stuck, that's running.
You get this:
Using milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.3.3/plugin-milestones {:level=>:warn}
once Logstash has started successfully.
If you add "-- web" onto the end of your command, you should be able to see some output in the Kibana web interface.
If you aren't seeing messages appear in the console, first I would check that new entries are definitely being written to the file(s) that you're trying to tail. Since you're using the stdout output you should see the messages written to the console at the same time as they're going into the embedded Elasticsearch.
What I would suggest is you simplify your config by removing the elasticsearch output - this should speed up the startup time (it can take a minute or two for the embedded elasticsearch instance to start up) and focus on getting messages onto the console output first.
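To confirm the pipeline is alive while you watch the console, you can append a test line to one of the files matched by the gw_access_log* pattern from the question (the exact file name below is just an example):
echo "$(date) test entry" >> /root/isaac/my_logs/access_logs/gw_access_log.test
If the file input is working, that line should show up on stdout shortly afterwards.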
If you do want more verbose debug output from Logstash you can start the program with -v, -vv or -vvv for progressively more detailed debug information. E.g.:
java -jar logstash-1.3.3-flatjar.jar agent -f logstash-complex.conf -vvv
Fair warning that -vvv does produce a LOT of debug information, so start with -v and work your way up.

Missing values from JMeter results file when run remotely

When running a test which makes use of the JMeter-Plugins listener Response Times vs Threads or Active Threads Over Time, running the test plan remotely produces a results file in which the values used to plot the actual graph are missing; when run locally, all results are returned. E.g. when using Response Times vs Threads:
Example of a local result:
1383659591841,59,Example 1,200,OK,Example 1 1-579,text,true,183,22,22,59
Example of a remote result:
1383659859149,43,Example 1,200,OK,Example 1 1-575,text,true,183,43
Note that the last two fields are missing.
I would check the script definition on the two servers: maybe some configuration of the "Write results to file" listener has been changed.
Take the local .jmx file and copy it to the remote server.
Also, look for differences in the "# Results file configuration" section of the jmeter.properties file.
Make sure that on all of the slave/remote servers the jmeter.properties file within $JMETER_HOME/bin has the following setting:
jmeter.save.saveservice.thread_counts=true
By default this is set to false (and commented out)
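Instead of editing jmeter.properties on every slave, you can also push the property to all remote servers from the client when starting the test, e.g. (the test plan and results file names are just examples):
jmeter -n -t testplan.jmx -r -Gjmeter.save.saveservice.thread_counts=true -l results.jtl
The -G flag sends the property to the remote servers, which matches the requirement above that the slaves have it enabled.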
For more information:
JMeter Plugins Installation
