Setting up a Separate Monitoring Cluster for Elasticsearch - elasticsearch

I'm trying to set up a separate cluster (kibanacluster) to monitor my primary Elasticsearch cluster (marveltest). Below are the ES, Marvel and Kibana versions I'm using. The ES version is fixed for the moment; I can upgrade or downgrade the other components if needed.
kibana-4.4.1
elasticsearch-2.2.1
marvel-agent-2.2.1
The monitoring cluster and Kibana are both running on the host 192.168.2.124, and the primary cluster is running on a separate host, 192.168.2.116.
192.168.2.116: elasticsearch.yml
marvel.agent.exporter.es.hosts: ["192.168.2.124"]
marvel.enabled: true
marvel.agent.exporters:
  id1:
    type: http
    host: ["http://192.168.2.124:9200"]
Looking at the DEBUG logs on the monitoring cluster, I can see data coming in from the primary cluster, but it is getting "filtered" since the cluster name is different.
[2016-07-04 16:33:25,144][DEBUG][transport.netty          ] [nodek] connected to node [{#zen_unicast_2#}{192.168.2.124}{192.168.2.124:9300}]
[2016-07-04 16:33:25,144][DEBUG][transport.netty          ] [nodek] connected to node [{#zen_unicast_1#}{192.168.2.116}{192.168.2.116:9300}]
[2016-07-04 16:33:25,183][DEBUG][discovery.zen.ping.unicast] [nodek] [1] filtering out response from {node1}{Rmgg0Mw1TSmIpytqfnFgFQ}{192.168.2.116}{192.168.2.116:9300}, not same cluster_name [marveltest]
[2016-07-04 16:33:26,533][DEBUG][discovery.zen.ping.unicast] [nodek] [1] filtering out response from {node1}{Rmgg0Mw1TSmIpytqfnFgFQ}{192.168.2.116}{192.168.2.116:9300}, not same cluster_name [marveltest]
[2016-07-04 16:33:28,039][DEBUG][discovery.zen.ping.unicast] [nodek] [1] filtering out response from {node1}{Rmgg0Mw1TSmIpytqfnFgFQ}{192.168.2.116}{192.168.2.116:9300}, not same cluster_name [marveltest]
[2016-07-04 16:33:28,040][DEBUG][transport.netty          ] [nodek] disconnecting from [{#zen_unicast_2#}{192.168.2.124}{192.168.2.124:9300}] due to explicit disconnect call
[2016-07-04 16:33:28,040][DEBUG][discovery.zen            ] [nodek] filtered ping responses: (filter_client[true], filter_data[false]) --> ping_response{node [{nodek}{vQ-Iq8dKSz26AJUX77Ncfw}{192.168.2.124}{192.168.2.124:9300}], id[42], master [{nodek}{vQ-Iq8dKSz26AJUX77Ncfw}{192.168.2.124}{192.168.2.124:9300}], hasJoinedOnce [true], cluster_name[kibanacluster]}
[2016-07-04 16:33:28,053][DEBUG][transport.netty          ] [nodek] disconnecting from [{#zen_unicast_1#}{192.168.2.116}{192.168.2.116:9300}] due to explicit disconnect call
[2016-07-04 16:33:28,057][DEBUG][transport.netty          ] [nodek] connected to node [{nodek}{vQ-Iq8dKSz26AJUX77Ncfw}{192.168.2.124}{192.168.2.124:9300}]
[2016-07-04 16:33:28,117][DEBUG][discovery.zen.publish    ] [nodek] received full cluster state version 32 with size 5589

The issue is that you are mixing Marvel 1.x settings with Marvel 2.2 settings, and your other configuration also seems to be off, as Andrei pointed out in the comments.
marvel.agent.exporter.es.hosts: ["192.168.2.124"]
This isn't a setting known to Marvel 2.x. And depending on your copy/paste, it's also possible that the YAML is malformed due to whitespace:
marvel.agent.exporters:
id1:
type: http
host: ["http://192.168.2.124:9200"]
This should be:
marvel.agent.exporters:
  id1:
    type: http
    host: ["http://192.168.2.124:9200"]
As Andrei was insinuating, you have likely added the production node(s) to your discovery.zen.ping.unicast.hosts, which makes your monitoring node try to join their cluster. I suspect you can just delete that setting altogether in your monitoring cluster (see the sketch below).
[2016-07-04 16:33:26,533][DEBUG][discovery.zen.ping.unicast] [nodek] [1] filtering out response from {node1}{Rmgg0Mw1TSmIpytqfnFgFQ}{192.168.2.116}{192.168.2.116:9300}, not same cluster_name [marveltest]
This indicates that it's ignoring a node that it is connecting to because the other node (node1) isn't in the same cluster.
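For instance, once that setting is gone, the monitoring node's elasticsearch.yml can stay minimal; a sketch, assuming a single-node monitoring cluster on 192.168.2.124 (the node and cluster names are taken from the logs above):
# elasticsearch.yml on the monitoring node (sketch)
cluster.name: kibanacluster
node.name: nodek
network.host: 192.168.2.124
# no discovery.zen.ping.unicast.hosts pointing at 192.168.2.116 -- the production
# cluster pushes data via the Marvel HTTP exporter instead of joining this cluster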
Setting up a separate monitoring cluster is pretty straightforward, but it requires understanding the moving parts first.
You need a separate cluster with at least one node (most people get by with one node).
This separate cluster effectively has no knowledge about the cluster(s) it monitors. It only receives data.
You need to send the data from the production cluster(s) to that separate cluster.
The monitoring cluster interprets that data using Kibana + the Marvel UI plugin to display charts.
So, what you need:
Your production cluster needs to install marvel-agent on each node.
Each node needs to configure the exporter(s):
This is the same as you had before:
marvel.agent.exporters:
  id1:
    type: http
    host: ["http://192.168.2.124:9200"]
Kibana should talk to the monitoring cluster (192.168.2.124 in this example), and Kibana needs the same version of the Marvel UI plugin.
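Once the exporter is in place, a quick way to confirm the pipeline is to check that Marvel indices are being created on the monitoring cluster; a sketch, run against 192.168.2.124 (via Sense or curl):
GET _cat/indices/.marvel-es-*?v
If those indices show up but the UI stays empty, double-check that kibana.yml points its Elasticsearch URL at http://192.168.2.124:9200 rather than at the production cluster (the exact key name depends on the Kibana 4.x minor version).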

Related

connect dotCMS cluster to external elasticsearch

I'm trying to create a cluster of three servers with dotCMS 5.2.6 installed.
They have to interface with a second cluster of 3 elasticsearch nodes.
Despite my attempts to combine them, the best case I've obtained is with both dotCMS and Elasticsearch up and running, but from the dotCMS admin backend (Control panel > Configuration > Network) I always see my three servers with red status due to the index red status.
I have tested the following combinations:
In plugins/com.dotcms.config/conf/dotcms-config-cluster-ext.properties
AUTOWIRE_CLUSTER_TRANSPORT=false
es.path.home=WEB-INF/elasticsearch
Using AUTOWIRE_CLUSTER_TRANSPORT=true seems not to change the result
In plugins/com.dotcms.config/ROOT/dotserver/tomcat-8.5.32/webapps/ROOT/WEB-INF/elasticsearch/config/elasticsearch-override.yml
transport.tcp.port: 9301
discovery.zen.ping.unicast.hosts: first_es_server:9300, second_es_server:9300, third_es_server:9300
Using transport.tcp.port: 9300 causes a dotCMS startup failure with this error:
ERROR cluster.ClusterFactory - Unable to rewire cluster:Failed to bind to [9300]
Caused by: com.dotmarketing.exception.DotRuntimeException: Failed to bind to [9300]
Of course, port 9300 is listening on the three Elasticsearch nodes: they are configured with transport.tcp.port: 9300 and have no problem starting and forming their cluster.
Using transport.tcp.port: 9301, dotCMS can start and join the Elasticsearch cluster, but the index status is always red, even though indexing seems to work and nothing is apparently affected.
Using transport.tcp.port: 9309 (as suggested in the dotCMS online reference) or any other port number leads to the same result as the 9301 case, but in the dotCMS admin backend (Control panel > Configuration > Network) the index information for each machine still reports 9301 as the ES port.
Main Question
I would like to know where the ES port can be edited, considering that my Elasticsearch cluster is performing well (all indices are green) and the elasticsearch-override.yml within the dotCMS plugin doesn't affect the default 9301 reported by the backend.
Is the HTTP interface enabled on ES? If not, I would enable it and see what the cluster health is and what the index health is. It might be that you need to adjust your expected replicas.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-health.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html
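For example, in Sense/Dev Tools syntax against one of the ES nodes (the index name in the last request is a placeholder):
GET _cat/health?v
GET _cat/indices?v

# if the health issue turns out to be unassigned replica shards,
# lowering the replica count is one option (placeholder index name)
PUT dotcms_index_placeholder/_settings
{
  "index": { "number_of_replicas": 1 }
}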
FWIW, the upcoming version of dotCMS (5.3.0) does not support embedded Elasticsearch and requires a vanilla external ES node/cluster to connect to.

Kibana not showing monitoring data from external Elasticsearch node

Yesterday I set up a dedicated single monitoring node following this guide.
I managed to fire up the new monitoring node with the same ES version (6.6.0) as the cluster, then added these lines to the elasticsearch.yml file on all ES cluster nodes:
xpack.monitoring.exporters:
  id1:
    type: http
    host: ["http://monitoring-node-ip-here:9200"]
Then I restarted all nodes and Kibana (which is actually running on one of the nodes of the ES cluster).
Now I can see today's monitoring data indices being sent to the new external monitoring node, but Kibana shows a "You need to make some adjustments" message when accessing the "Monitoring" section:
We checked the `cluster defaults` settings for `xpack.monitoring.exporters`, and found the reason: `Remote exporters indicate a possible misconfiguration: id1`
Check that the intended exporters are enabled for sending statistics to the monitoring cluster, and that the monitoring cluster host matches the `xpack.monitoring.elasticsearch` setting in `kibana.yml` to see monitoring data in this instance of Kibana.
I already checked that all nodes can ping each other; also, I don't have X-Pack security, so I haven't created any additional "remote_monitor" user.
I followed the error message and tried to add xpack.monitoring.elasticsearch to the kibana.yml file, but I ended up with the following error:
FATAL ValidationError: child "xpack" fails because [child "monitoring" fails because [child "elasticsearch" fails because ["url" is not allowed]]]
Hope someone can help me figure out what's wrong.
EDIT #1
Solved: the problem was due to monitoring collection not being disabled on the monitoring cluster:
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": false
  }
}
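If you want to double-check that the setting took effect, listing the persistent cluster settings is enough:
GET _cluster/settings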
Additionally, I made a mistake in the kibana.yml configuration: xpack.monitoring.elasticsearch should have been xpack.monitoring.elasticsearch.hosts.
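Putting it together, the relevant kibana.yml line on the Kibana node looks roughly like this (reusing the placeholder host from above):
xpack.monitoring.elasticsearch.hosts: ["http://monitoring-node-ip-here:9200"]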
I had exactly the same problem, but the root cause was something different. Here, have a look.
Okay, I used to have the same problem: my Kibana did not show monitoring graphs, even though I had the monitoring index .monitoring-es-* available.
The root of the problem in my case was that my master nodes did not have the :9200 HTTP socket available from the LAN. That is, my config on the master nodes was:
...
transport.host: [ "192.168.7.190" ]
transport.port: 9300
http.port: 9200
http.host: [ "127.0.0.1" ]
...
As you can see, the HTTP socket is available only from within the host. I didn't want anyone making HTTP requests to the masters from the LAN, because there is no point in doing that.
However, as I understand it, Kibana does not only read data from the monitoring index .monitoring-es-*; it also makes some requests directly to the masters to get some information.
That was exactly why Kibana did not show anything about monitoring.
After I changed one line in the config on the master nodes to
http.host: [ "192.168.0.190", "127.0.0.1" ]
Kibana immediately started to show monitoring graphs.
I recreated this experiment several times, and now everything is working.
I also want to point out that, even though everything is fine now, my monitoring index .monitoring-es-* does NOT have "cluster_stats" documents.
So if your Kibana does not show monitoring graphs, I suggest the following (see the sketch below):
check if the index .monitoring-es-* exists
check if your master nodes can serve HTTP requests from the LAN
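A minimal way to check the first point, run from Kibana Dev Tools or curl against the cluster (a sketch):
GET _cat/indices/.monitoring-es-*?v
For the second point, requesting http://<master-ip>:9200/ from another machine on the LAN (the master IP being whatever you bound http.host to) should return the node's JSON banner rather than a connection error.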

ElasticSearch Client Node Loses Connection on AWS EC2 with Kernel Log "Setting Capacity to 83886080"

I have an ElasticSearch 2.4.4 cluster with 3 client nodes, 3 master nodes, and 4 data nodes, all on AWS EC2. Servers are running Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-104-generic x86_64). An AWS Application ELB is in front of the client nodes.
At random times, one of the clients will suddenly write this message to the kernel log:
Jan 17 05:54:51 localhost kernel: [2101268.191447] Setting capacity to 83886080
Note, this is the size of the primary boot drive in sectors (it's 40GB). After this message is received, the client node loses its connection to the other nodes in the cluster, and reports:
[2018-01-17 05:56:21,483][INFO ][discovery.zen ] [prod_es_233_client_1] master_left [{prod_es_233_master_2}{0Sat6dx9QxegO2rM03_o9A}{172.31.101.13}{172.31.101.13:9300}{data=false, master=true}],
reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
The kernel message seems to be coming from xen-blkfront.c
https://github.com/analogdevicesinc/linux/blob/8277d2088f33ed6bffaafbc684a6616d6af0250b/drivers/block/xen-blkfront.c#L2383
This problem seems unrelated to the number or type of requests to ES at the time, or any other load-related parameter. It just occurs randomly.
The Load Balancer will record 504s and 460s when attempting to contact the bad client. Other client nodes are not affected and return with normal speed.
Is this a problem with EC2's implementation of Xen?

ES upgraded from 2.4.6 to 5.3.2 - now fails to form the cluster

I did a cluster shutdown (all nodes), then upgraded node by node from 2.4.6 to 5.3.2 and started the cluster again node by node, but they all just continually complain like this:
[2017-10-26T23:21:48,072][WARN ][o.e.d.z.ZenDiscovery ] [d1r2n9] not enough master nodes discovered during pinging (found [[Candidate{node={d1r2n9}{jJ3HFWbhSfudgfaK4w-y8A}{yCVvctQ3TR6ye1k9txj6cg}{<ip>}{<ip>:9300}{rack=OPA3.4.16}, clusterStateVersion=-1}]], but needed [8]), pinging again
even though I've restarted all 14 nodes again.
In 2.4.6 we used:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [<list of nodes>]
but I had to remove this to start the 5.3.2 nodes; I believe I've read somewhere that 5.x now only uses unicast for cluster communication anyway.
Hints on how to get the cluster to rejoin are appreciated, TIA!
Solved: I still had to define the ping unicast host list :)
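For reference, a sketch of the relevant 5.x discovery settings (hostnames are placeholders; discovery.zen.ping.multicast.enabled is gone in 5.x, which is why that line had to be removed, but the unicast host list is still needed):
# elasticsearch.yml on each 5.3.2 node (sketch; hostnames are placeholders)
discovery.zen.ping.unicast.hosts: ["node1.example.com", "node2.example.com", "node3.example.com"]
# with 14 master-eligible nodes the usual quorum is 8, matching the "needed [8]" in the log
discovery.zen.minimum_master_nodes: 8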

Elastic search : [Oddball] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]

Kibana is unable to load the data from Elasticsearch. I can see the log below in Elasticsearch. I am using Elasticsearch version 1.4.2. Is this something related to load? Could anyone please help me?
[2015-11-05 22:39:58,505][DEBUG][action.bulk ] [Oddball] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
Elasticsearch by default runs at http://localhost:9200.
Make sure you have the proper URL in kibana.yml:
# Kibana is served by a back end server. This controls which port to use.
port: 5601
# The host to bind the server to.
#host: example.com
# The Elastic search instance to use for all your queries.
elasticsearch_url: "http://localhost:9200"
Also, in the Elasticsearch config elasticsearch.yml, provide the cluster name and http.cors.allow-origin:
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch
http.cors.allow-origin: "/.*/"
I could solve this by setting up a new node for Elasticsearch and clearing the unassigned shards by setting the replica count to 0, roughly as sketched below.
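A sketch of that replica change, assuming it should apply to all indices (narrow the index pattern if you only need it for some of them):
PUT _all/_settings
{
  "index": { "number_of_replicas": 0 }
}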

Resources