I am using Elasticsearch version 6.4.3, and it was working as expected until this morning.
When I run elasticsearch.exe, it stays in the initializing state without throwing any errors.
elasticsearch.log
[2022-06-07T12:17:24,504][INFO ][o.e.n.Node ] [JO01HQESKAVM02] initializing ...
[2022-06-07T12:17:24,770][INFO ][o.e.e.NodeEnvironment ] [JO01HQESKAVM02] using [1] data paths, mounts [[(C:)]], net usable_space [136.2gb], net total_space [499.4gb], types [NTFS]
[2022-06-07T12:17:24,770][INFO ][o.e.e.NodeEnvironment ] [JO01HQESKAVM02] heap size [15.9gb], compressed ordinary object pointers [true]
elasticsearch.yml
bootstrap.memory_lock: false
cluster.name: elasticsearch
http.port: 9200
network.host: 0.0.0.0
node.data: true
node.ingest: true
node.master: true
node.max_local_storage_nodes: 1
node.name: JO01HQESKAVM02
path.data: C:\ELK\Elasticsearch\data
path.logs: C:\ELK\Elasticsearch\logs
transport.tcp.port: 9300
path.repo: ["/mount/backups", "/mount/longterm_backups"]
xpack.license.self_generated.type: basic
xpack.security.enabled: false
Any help?
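One way to see where startup is stalling, assuming nothing else in the environment has changed, would be to temporarily raise the log level for the Elasticsearch packages; a minimal diagnostic sketch using the standard logger setting, to be removed once the hang is understood:
# elasticsearch.yml - temporary, shows which startup phase the node is stuck in
logger.org.elasticsearch: DEBUG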
We are running a Confluent ElasticsearchSinkConnector on a dedicated K8s Kafka Connect cluster. Everything seems to be working well, and records appear on our Elasticsearch cluster.
Once in a while we get an unrecoverable error, which fails the task(s) and requires a manual restart of the connector(s).
There are not many details about the error:
Caused by: org.apache.kafka.connect.errors.ConnectException: Bulk request failed
due to
Caused by: org.apache.http.ConnectionClosedException: Connection is closed
We are running with the following configurations:
Class: io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
Config:
batch.size: 1000
behavior.on.malformed.documents: warn
behavior.on.null.values: delete
connection.compression: true
connection.password: my-password
connection.timeout.ms: 30000
connection.url: https://es-http.com:9200
connection.username: elastic
errors.log.enable: true
errors.log.include.messages: true
errors.tolerance: all
key.converter: org.apache.kafka.connect.storage.StringConverter
read.timeout.ms: 30000
retry.backoff.ms: 60000
schema.ignore: true
Topics: my-topic
Transforms: ExtractField
transforms.ExtractField.field: metadata
transforms.ExtractField.type: org.apache.kafka.connect.transforms.ExtractField$Value
value.converter: org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable: false
Tasks Max: 10
We are running a 3-node Elasticsearch cluster from this image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0, not sure if that is relevant.
There are no extra logs on either the Elasticsearch cluster or the Kafka Connect cluster.
Any suggestions?
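Until the root cause is found, the manual restarts can at least be scripted against the Kafka Connect REST API; a minimal sketch, assuming a hypothetical Connect REST endpoint at connect:8083 and a connector named es-sink:
# Check which tasks are FAILED
curl -s connect:8083/connectors/es-sink/status
# Restart a single failed task (task 0 here) without touching the others
curl -X POST connect:8083/connectors/es-sink/tasks/0/restart
# Or restart the whole connector
curl -X POST connect:8083/connectors/es-sink/restart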
Problem
I have a running cluster and I would like to add a data node to it. The running cluster is
x.x.x.246
and the data node is
x.x.x.99
The servers can reach each other via ping.
Machine OS: CentOS 7
Elasticsearch: 7.6.1
configs:
here is elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
here is elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
Testing Elasticsearch on each machine
When I run systemctl start elasticsearch on each machine, it works well.
Test run on x.x.x.246:
curl -X GET "x.x.x.246:9200/_cluster/health?pretty"
shows: the number of nodes does not change.
curl -X GET "x.x.x.99:9200/_cluster/health?pretty"
shows:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
Edited:
here is elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
here is elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
log on x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
For node x.x.x.99 the entry for seed host is wrong. It should be as below:
discovery.seed_hosts: ["x.x.x.246:9300"]
The discovery.seed_hosts list is used to discover the master node: it contains the addresses of the master-eligible nodes, which also know the identity of the current master. Since the configuration of x.x.x.99 points to x.x.x.245 instead of x.x.x.246, the node x.x.x.99 is unable to discover the master.
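Once the seed host points at x.x.x.246, you can confirm that both nodes have joined; _cat/nodes is a standard API, and the IP below is taken from your config:
curl -X GET "x.x.x.246:9200/_cat/nodes?v"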
Following the discussion in the comments, the correct configuration should be:
Master node:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246]
cluster.initial_master_nodes: ["master"]
Note that if you want the above node to be master-only and not hold data, then set
node.data: false
Data node:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
Also, since node x.x.x.99 could not join the cluster, it has a stale cluster state (the log shows a cluster UUID mismatch). Delete the data folder on x.x.x.99 and restart the node.
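A minimal sketch of that reset, using the path.data from your config; note this deletes all local index data on x.x.x.99, so only run it on a node whose data you do not need:
sudo systemctl stop elasticsearch
# path.data from elasticsearch.yml; removing its contents discards the stale cluster UUID
sudo rm -rf /var/lib/elasticsearch/*
sudo systemctl start elasticsearch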
The reason it wasn't able to elect a master is the entry discovery.seed_hosts: ["x.x.x.245:9300"], which points to x.x.x.245, an address that belongs neither to the current master node nor to any other node in this cluster. As mentioned in the official ES docs, this setting is used to discover and elect a master node.
You should read in detail the two important settings related to master election:
discovery.seed_hosts
cluster.initial_master_nodes
You can turn on DEBUG logging for the discovery module to understand it better, by adding the line below to your elasticsearch.yml:
logger.org.elasticsearch.discovery: DEBUG
You can make a few modifications in both elasticsearch.yml files:
node.name has the same value in both nodes' elasticsearch.yml; each node should get a unique name.
It's better to just mention the IP without the port.
It's better to set network.host: 0.0.0.0 instead of the node IP in both elasticsearch.yml files.
node.data: true is the default, so there is no need to mention it.
So a better and more concise version looks like this:
Master node elasticsearch.yml
cluster.name: elasticsearch
node.name: master
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] -->note this
cluster.initial_master_nodes: ["x.x.x.246"] :- note this
Another data node elasticsearch.yml
cluster.name: elasticsearch
node.name: data
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] --> you need to change this and include both nodes
cluster.initial_master_nodes: ["x.x.x.246"]
Verify the master node
You can hit <your-any-node-ip>:9200/_cat/master, and this should return the elected master node, which in your case would be the node named master.
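For example (the node IP is from the configs above, and ?v just adds a header row):
curl -X GET "x.x.x.246:9200/_cat/master?v"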
I also had the same issue: when I was trying to access Elasticsearch from outside an AWS Windows server, I was not able to reach it. I then added
network.host: aws_private_ip
and after that the Elasticsearch service needed a restart, but restarting kept throwing an error. Finally, when I added the line below, it worked for me:
cluster.initial_master_nodes: node-1
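Putting those two settings together, a minimal sketch of the relevant elasticsearch.yml lines, where aws_private_ip and node-1 are placeholders for your own values:
network.host: aws_private_ip
node.name: node-1
# entries in cluster.initial_master_nodes must match a node.name
cluster.initial_master_nodes: node-1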
I have installed an EFK stack to collect nginx access logs.
With a fresh install I was able to send data from Fluentd to Elasticsearch without any problem. However, I then installed Search Guard to implement authentication on Elasticsearch and Kibana. I am now able to log in to Kibana and Elasticsearch with Search Guard's demo user credentials.
My problem now is that Fluentd is unable to connect to Elasticsearch. In the td-agent log I am getting the following messages:
2018-07-19 15:20:34 +0600 [warn]: #0 failed to flush the buffer. retry_time=5 next_retry_seconds=2018-07-19 15:20:34 +0600 chunk="57156af05dd7bbc43d0b1323fddb2cd0" error_class=Fluent::Plugin::ElasticsearchOutput::ConnectionFailure error="Can not reach Elasticsearch cluster ({:host=>\"<elasticsearch-ip>\", :port=>9200, :scheme=>\"http\", :user=>\"logstash\", :password=>\"obfuscated\"})!"
Here is my Fluentd config
<source>
@type forward
</source>
<match user_count.**>
@type copy
<store>
@type elasticsearch
host https://<elasticsearch-ip>
port 9200
ssl_verify false
scheme https
user "logstash"
password "<logstash-password>"
index_name "custom_user_count"
include_tag_key true
tag_key "custom_user_count"
logstash_format true
logstash_prefix "custom_user_count"
type_name "custom_user_count"
utc_index false
<buffer>
flush_interval 2s
</buffer>
</store>
</match>
sg_roles.yml:
sg_logstash:
cluster:
- CLUSTER_MONITOR
- CLUSTER_COMPOSITE_OPS
- indices:admin/template/get
- indices:admin/template/put
indices:
'custom*':
'*':
- CRUD
- CREATE_INDEX
'logstash-*':
'*':
- CRUD
- CREATE_INDEX
'*beat*':
'*':
- CRUD
- CREATE_INDEX
Can anyone help me with this?
It seems td-agent was using TLSv1 as the default.
I added ssl_version TLSv1_2 to the config and it is now working.
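For reference, a sketch of where that line goes in the <store> block of the config above; only the ssl_version line is new, everything else is unchanged:
<match user_count.**>
  @type copy
  <store>
    @type elasticsearch
    # ...existing settings from the config above...
    ssl_version TLSv1_2
  </store>
</match>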
This is in a Spring Boot project.
Gradle:
dependencies {
compile('org.springframework.boot:spring-boot-starter-data-elasticsearch')
compile('io.searchbox:jest:2.0.3')
runtime('net.java.dev.jna:jna')
}
config.yml:
spring:
data:
elasticsearch:
cluster-nodes: 10.19.132.207:9300
cluster-name: es
elasticsearch:
jest:
uris: http://10.19.132.207:9200
read-timeout: 10000
And my es config:
cluster.name: es
node.name: node-1
network.host: 0.0.0.0
transport.tcp.port: 9300
http.port: 9200
When I try to save data to ES, the console prints:
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{10.19.132.207}{10.19.132.207:9300}]
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:326) ~[elasticsearch-2.4.4.jar:2.4.4]
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:223) ~[elasticsearch-2.4.4.jar:2.4.4]
at org.elasticsearch.client.transport.support.TransportProxyClient.execute(TransportProxyClient.java:55) ~[elasticsearch-2.4.4.jar:2.4.4]
And my ES log prints:
java.lang.IllegalStateException: Received message from unsupported version: [2.0.0] minimal compatible version is: [5.0.0]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1323) ~[elasticsearch-5.2.2.jar:5.2.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[transport-netty4-5.2.2.jar:5.2.2]
How can I resolve this problem?
At first glance, it appears that neither 'spring-boot-starter-data-elasticsearch' nor 'jest 2.0.3' supports Elasticsearch 5: the logs show a 2.4.4 transport client talking to a 5.2.2 server. I'd try downgrading your Elasticsearch instance to 2.4.4 and see if that works.
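A quick way to confirm the mismatch, assuming the node is reachable over HTTP on port 9200: the root endpoint reports the server's version.number (5.2.2 here, per the server log), which must be compatible with the transport client on the classpath (2.4.4, per the stack trace).
# Prints the cluster name and the version.number of the running server
curl -X GET "http://10.19.132.207:9200/"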