I have set up an ELK stack, but Elasticsearch is not creating the index and no data is being loaded. The Elasticsearch and Logstash services are both running.
The details are below. I do not see anything relevant in the logs.
Elasticsearch config:
[root@aruba-elk2 rm_logs]# cat /etc/elasticsearch/elasticsearch.yml
# Elasticsearch config
#########################
cluster.name: log-cohort-test
node.name: aruba-elk2
node.master: true
path:
  data: /elk/lib/elasticsearch
  logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
bootstrap.system_call_filter: False
Logstash config:
[root@aruba-elk2 rm_logs]# cat /etc/logstash/logstash.yml
path.data: /var/lib/logstash
path.logs: /var/log/logstash
[root@aruba-elk2 rm_logs]# cat /etc/logstash/conf.d/logstash-syslog.conf
input {
  file {
    path => [ "/elk/rm_logs/*.txt" ]
    type => "rmlog"
  }
}
filter {
  if [type] == "rmlog" {
    grok {
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:hour1}:%{MINUTE:minute1},%{NUMBER}-%{WORD},%{USER:user},%{USER:user2} %{NUMBER:pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:number1} %{NUMBER:number2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:command},%{PATH:path}" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
  }
}
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["aruba-elk2:9200"]
      manage_template => false
      index => "rmlog-%{+YYYY.MM.dd}"
      #document_type => "messages"
    }
  }
}
Input data source:
[root@aruba-elk2 rm_logs]# cd /elk/rm_logs/
[root@aruba-elk2 rm_logs]# ls -ltrh | head
total 2.6M
-rw-r--r-- 1 root root 558 Jan 11 11:27 dbxchw092.txt
-rw-r--r-- 1 root root 405 Jan 11 11:27 dbxtx220.txt
-rw-r--r-- 1 root root 241 Jan 11 11:27 dbxcvm139.txt
-rw-r--r-- 1 root root 455 Jan 11 11:27 dbxcnl038.txt
-rw-r--r-- 1 root root 230 Jan 11 11:27 dbxchw052.txt
-rw-r--r-- 1 root root 143 Jan 11 11:27 dbxtx222.txt
-rw-r--r-- 1 root root 577 Jan 11 11:27 dbxtx224.txt
-rw-r--r-- 1 root root 274 Jan 11 11:27 dbxcvm082.txt
-rw-r--r-- 1 root root 281 Jan 11 11:27 dbxcsb003.txt
Sample from one of the data files above:
testhost-in2,19/01/11,06:34,04-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:00 rm -rf /test/ehf/users/arnav-090119-184844,/dv/ehf/users/arnav-090119-
testhost-in2,19/01/11,06:40,09-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:00 rm -rf /dv/ehf/users/arnav-090119-184844,/dv/ehf/users/arnav-090119-\
testhost-in2,19/01/11,06:45,14-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:01 rm -rf /
Logs:
Logstash logs:
[root@aruba-elk2 logstash]# cat logstash-plain.log
[2019-01-12T23:48:31,653][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.5.4"}
[2019-01-12T23:48:34,959][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>48, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-01-12T23:48:35,374][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://aruba-elk2:9200/]}}
[2019-01-12T23:48:35,588][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://aruba-elk2:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://aruba-elk2:9200/][Manticore::SocketException] Connection refused"}
[2019-01-12T23:48:35,608][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//aruba-elk2:9200"]}
[2019-01-12T23:48:36,063][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/var/lib/logstash/plugins/inputs/file/.sincedb_076330d5fd2c2b811bc1960a3d0547be", :path=>["/elk/rm_logs/*.txt"]}
[2019-01-12T23:48:36,095][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x424bb675 run>"}
[2019-01-12T23:48:36,155][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-01-12T23:48:36,156][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-12T23:48:36,542][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-12T23:48:40,796][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://aruba-elk2:9200/"}
[2019-01-12T23:48:40,855][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2019-01-12T23:48:40,859][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
Elasticsearch logs:
[root@aruba-elk2 elasticsearch]# cat gc.log.0.current | tail
2019-01-13T00:13:29.280+0530: 1237.781: Total time for which application threads were stopped: 0.0002681 seconds, Stopping threads took: 0.0000316 seconds
2019-01-13T00:13:31.281+0530: 1239.782: Total time for which application threads were stopped: 0.0003670 seconds, Stopping threads took: 0.0000586 seconds
2019-01-13T00:13:32.281+0530: 1240.782: Total time for which application threads were stopped: 0.0003134 seconds, Stopping threads took: 0.0000708 seconds
2019-01-13T00:13:37.282+0530: 1245.783: Total time for which application threads were stopped: 0.0004663 seconds, Stopping threads took: 0.0001315 seconds
2019-01-13T00:13:51.284+0530: 1259.785: Total time for which application threads were stopped: 0.0004230 seconds, Stopping threads took: 0.0000691 seconds
2019-01-13T00:13:57.286+0530: 1265.787: Total time for which application threads were stopped: 0.0008421 seconds, Stopping threads took: 0.0002697 seconds
2019-01-13T00:13:58.287+0530: 1266.787: Total time for which application threads were stopped: 0.0004467 seconds, Stopping threads took: 0.0000706 seconds
2019-01-13T00:14:11.288+0530: 1279.789: Total time for which application threads were stopped: 0.0004702 seconds, Stopping threads took: 0.0001105 seconds
2019-01-13T00:14:18.289+0530: 1286.790: Total time for which application threads were stopped: 0.0004123 seconds, Stopping threads took: 0.0000750 seconds
Any help would be appreciated.
I am issuing a snapshot creation command as follows:
curl -v -XPUT "http://localhost:9200/_snapshot/s3-backup/hospitals" -d '{\
"indices":"hospitals",\
"ignore_unavailable":true,\
"include_global_state":false\
}'
in order to create a snapshot named hospitals for the index named hospitals. However, when I check the status of the snapshot, it reports that it is snapshotting all the indices. Here it is:
curl -X GET "localhost:9200/_snapshot/s3-backup/hospitals"
{
  "snapshots": [
    {
      "snapshot": "hospitals",
      "uuid": "<uid>",
      "version_id": 5050299,
      "version": "5.5.2",
      "indices": [".kibana", "super_category", "professions", "white_page", "vertical",
                  "location", "postal_codes", "hospitals", "enhanced_services",
                  "yellow_pages_2", "listings", "public_sector"],
      "state": "IN_PROGRESS",
      "start_time": "2018-08-18T05:10:34.692Z",
      "start_time_in_millis": 1534569034692,
      "end_time": "1970-01-01T00:00:00.000Z",
      "end_time_in_millis": 0,
      "duration_in_millis": -1534569034692,
      "failures": [],
      "shards": {"total": 0, "failed": 0, "successful": 0}
    }
  ]
}
What am I doing wrong? Note that I am using the S3 Repository plugin.
Update
@AHT asked me to post the output of _cat/shards here:
vertical 0 p STARTED 27086 1.2gb <IP> prod-data-1
vertical 0 r STARTED 27086 1.3gb <IP> prod-data-2
white_page 0 p STARTED 10579484 25.7gb <IP> prod-data-1
white_page 0 r STARTED 10579484 25.7gb <IP> prod-data-2
professions 0 p STARTED 24425 823.4mb <IP> prod-data-1
professions 0 r STARTED 24425 823.4mb <IP> prod-data-2
location 0 p STARTED 15048 27.2mb <IP> prod-data-1
location 0 r STARTED 15048 26.5mb <IP> prod-data-2
enhanced_services 0 p STARTED 10385 15mb <IP> prod-data-1
enhanced_services 0 r STARTED 10385 15.7mb <IP> prod-data-2
postal_codes 0 r STARTED 56079 32.6mb <IP> prod-data-1
postal_codes 0 p STARTED 56079 32.6mb <IP> prod-data-2
.kibana 0 r STARTED 2 6.2kb <IP> prod-data-1
.kibana 0 p STARTED 2 6.2kb <IP> prod-data-2
yellow_pages_2 0 p STARTED 728676 4.3gb <IP> prod-data-1
yellow_pages_2 0 r STARTED 728676 3.3gb <IP> prod-data-2
super_category 0 r STARTED 2576 7.6mb <IP> prod-data-1
super_category 0 p STARTED 2576 7.6mb <IP> prod-data-2
listings 0 r STARTED 2468540 21.6gb <IP> prod-data-1
listings 0 p STARTED 2468540 21.7gb <IP> prod-data-2
public_sector 0 p STARTED 76444 741.4mb <IP> prod-data-1
public_sector 0 r STARTED 76444 733.9mb <IP> prod-data-2
hospitals 0 r STARTED 726 281.2kb <IP> prod-data-1
hospitals 0 p STARTED 726 293.4kb <IP> prod-data-2
I am running Elasticsearch on a 4GB instance with a 2GB heap. Elasticsearch runs fine, but after some requests it stops and then starts again on its own, with the following logs.
[2016-07-11 11:11:04,492][INFO ][discovery ] [Ev Teel Urizen] elasticsearch/9amXiXXmTTyV_l_s3t6gnw
[2016-07-11 11:11:07,568][INFO ][cluster.service ] [Ev Teel Urizen] new_master {Ev Teel Urizen}{9amXiXXmTTyV_l_s3t6gnw}{P.P.P.P}{P.P.P.P:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-07-11 11:11:07,611][INFO ][http ] [Ev Teel Urizen] publish_address {P.P.P.P:9200}, bound_addresses {P.P.P.P:9200}
[2016-07-11 11:11:07,611][INFO ][node ] [Ev Teel Urizen] started
[2016-07-11 11:11:07,641][INFO ][gateway ] [Ev Teel Urizen] recovered [1] indices into cluster_state
[2016-07-11 11:11:07,912][INFO ][cluster.routing.allocation] [Ev Teel Urizen] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[user-details-index][0]] ...]).
[2016-07-11 11:13:01,482][INFO ][node ] [Ev Teel Urizen] stopping ...
[2016-07-11 11:13:01,503][INFO ][node ] [Ev Teel Urizen] stopped
[2016-07-11 11:13:01,503][INFO ][node ] [Ev Teel Urizen] closing ...
[2016-07-11 11:13:01,507][INFO ][node ] [Ev Teel Urizen] closed
where P.P.P.P is the private IP of the instance.
EDIT: Logs after changing log-level to DEBUG
[2016-07-11 12:46:34,129][DEBUG][index.shard ] [Cameron Hodge] [user-details-index][0] recovery completed from [shard_store], took [106ms]
[2016-07-11 12:46:34,129][DEBUG][cluster.action.shard ] [Cameron Hodge] [user-details-index][0] sending shard started for target shard [[user-details-index][0], node[TUeznaAxQsqaFq6iDbVUVw], [P], v[149], s[INITIALIZING], a[id=j_DlCimrSoi7kff-9Ah9xw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-11T12:46:33.774Z]]], indexUUID [tStiDr3PRWaKIKCIHk5h0A], message [after recovery from store]
[2016-07-11 12:46:34,129][DEBUG][cluster.action.shard ] [Cameron Hodge] received shard started for target shard [[user-details-index][0], node[TUeznaAxQsqaFq6iDbVUVw], [P], v[149], s[INITIALIZING], a[id=j_DlCimrSoi7kff-9Ah9xw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-11T12:46:33.774Z]]], indexUUID [tStiDr3PRWaKIKCIHk5h0A], message [after recovery from store]
[2016-07-11 12:46:34,130][DEBUG][cluster.service ] [Cameron Hodge] processing [shard-started ([user-details-index][0], node[TUeznaAxQsqaFq6iDbVUVw], [P], v[149], s[INITIALIZING], a[id=j_DlCimrSoi7kff-9Ah9xw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-11T12:46:33.774Z]]), reason [after recovery from store]]: execute
[2016-07-11 12:46:34,131][INFO ][cluster.routing.allocation] [Cameron Hodge] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[user-details-index][0]] ...]).
[2016-07-11 12:46:34,131][DEBUG][cluster.service ] [Cameron Hodge] cluster state updated, version [4], source [shard-started ([user-details-index][0], node[TUeznaAxQsqaFq6iDbVUVw], [P], v[149], s[INITIALIZING], a[id=j_DlCimrSoi7kff-9Ah9xw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-11T12:46:33.774Z]]), reason [after recovery from store]]
[2016-07-11 12:46:34,131][DEBUG][cluster.service ] [Cameron Hodge] publishing cluster state version [4]
[2016-07-11 12:46:34,135][DEBUG][cluster.service ] [Cameron Hodge] set local cluster state to version 4
[2016-07-11 12:46:34,135][DEBUG][index.shard ] [Cameron Hodge] [user-details-index][0] state: [POST_RECOVERY]->[STARTED], reason [global state is [STARTED]]
[2016-07-11 12:46:34,153][DEBUG][cluster.service ] [Cameron Hodge] processing [shard-started ([user-details-index][0], node[TUeznaAxQsqaFq6iDbVUVw], [P], v[149], s[INITIALIZING], a[id=j_DlCimrSoi7kff-9Ah9xw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-11T12:46:33.774Z]]), reason [after recovery from store]]: took 23ms done applying updated cluster_state (version: 4, uuid: G_fovEaNRFOfF0lHjz0h2A)
[2016-07-11 12:47:00,583][DEBUG][indices.memory ] [Cameron Hodge] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2016-07-11 12:47:01,648][INFO ][node ] [Cameron Hodge] stopping ...
[2016-07-11 12:47:01,660][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing ... (reason [shutdown])
[2016-07-11 12:47:01,661][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing index service (reason [shutdown])
[2016-07-11 12:47:01,662][DEBUG][index ] [Cameron Hodge] [user-details-index] [0] closing... (reason: [shutdown])
[2016-07-11 12:47:01,664][DEBUG][index.shard ] [Cameron Hodge] [user-details-index][0] state: [STARTED]->[CLOSED], reason [shutdown]
[2016-07-11 12:47:01,664][DEBUG][index.shard ] [Cameron Hodge] [user-details-index][0] operations counter reached 0, will not accept any further writes
[2016-07-11 12:47:01,664][DEBUG][index.engine ] [Cameron Hodge] [user-details-index][0] flushing shard on close - this might take some time to sync files to disk
[2016-07-11 12:47:01,666][DEBUG][index.engine ] [Cameron Hodge] [user-details-index][0] close now acquiring writeLock
[2016-07-11 12:47:01,666][DEBUG][index.engine ] [Cameron Hodge] [user-details-index][0] close acquired writeLock
[2016-07-11 12:47:01,668][DEBUG][index.translog ] [Cameron Hodge] [user-details-index][0] translog closed
[2016-07-11 12:47:01,672][DEBUG][index.engine ] [Cameron Hodge] [user-details-index][0] engine closed [api]
[2016-07-11 12:47:01,672][DEBUG][index.store ] [Cameron Hodge] [user-details-index][0] store reference count on close: 0
[2016-07-11 12:47:01,672][DEBUG][index ] [Cameron Hodge] [user-details-index] [0] closed (reason: [shutdown])
[2016-07-11 12:47:01,672][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing index cache (reason [shutdown])
[2016-07-11 12:47:01,672][DEBUG][index.cache.query.index ] [Cameron Hodge] [user-details-index] full cache clear, reason [close]
[2016-07-11 12:47:01,673][DEBUG][index.cache.bitset ] [Cameron Hodge] [user-details-index] clearing all bitsets because [close]
[2016-07-11 12:47:01,673][DEBUG][indices ] [Cameron Hodge] [user-details-index] clearing index field data (reason [shutdown])
[2016-07-11 12:47:01,674][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing analysis service (reason [shutdown])
[2016-07-11 12:47:01,674][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing mapper service (reason [shutdown])
[2016-07-11 12:47:01,674][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing index query parser service (reason [shutdown])
[2016-07-11 12:47:01,680][DEBUG][indices ] [Cameron Hodge] [user-details-index] closing index service (reason [shutdown])
[2016-07-11 12:47:01,680][DEBUG][indices ] [Cameron Hodge] [user-details-index] closed... (reason [shutdown])
[2016-07-11 12:47:01,680][INFO ][node ] [Cameron Hodge] stopped
[2016-07-11 12:47:01,680][INFO ][node ] [Cameron Hodge] closing ...
[2016-07-11 12:47:01,685][INFO ][node ] [Cameron Hodge] closed
I am trying to set up an ES cluster on Windows machines using unicast discovery. I think I have made all the required configuration changes, but my ES nodes still do not form a cluster. Could someone please let me know what I am missing? My elasticsearch.yml configurations are below.
=======Node 8 Config=======
cluster.name: elasticsearch
node.name: NODE8
node.data: true
network.host: "10.249.167.8"
network.publish_host: "10.249.167.8"
network.bind: "10.249.167.8"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.249.167.9", "10.249.167.10", "10.249.167.8"]
transport.tcp.port: 9300
=======Node9 Config========
cluster.name: elasticsearch
node.name: NODE9
node.data: true
network.host: "10.249.167.9"
network.publish_host: "10.249.167.9"
network.bind: "10.249.167.9"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.249.167.9", "10.249.167.10", "10.249.167.8"]
transport.tcp.port: 9300
I can query both ES nodes individually, but they don't form a cluster:
Node 8 Get : http://10.249.167.8:9200/_cat/nodes?h=ip,port,heapPercent,name
10.249.167.8 9300 2 Cecilia Reyes
Node 9 Get : http://10.249.167.9:9200/_cat/nodes?h=ip,port,heapPercent,name
10.249.167.9 9300 9 Victorius
The startup logs follow. Any help would be greatly appreciated; I have been stuck on this for a while now.
[2016-02-13 01:08:06,395][WARN ][bootstrap ] unable to install syscall filter: syscall filtering not supported for OS: 'Windows Server 2012 R2'
[2016-02-13 01:08:06,645][INFO ][node ] [NODE8] version[2.1.1], pid[7628], build[40e2c53/2015-12-15T13:05:55Z]
[2016-02-13 01:08:06,645][INFO ][node ] [NODE8] initializing ...
[2016-02-13 01:08:07,020][INFO ][plugins ] [NODE8] loaded [cloud-azure], sites []
[2016-02-13 01:08:07,051][INFO ][env ] [NODE8] using [1] data paths, mounts [[(C:)]], net usable_space [94.6gb], net total_space [126.6gb], spins? [unknown], types [NTFS]
[2016-02-13 01:08:09,170][INFO ][node ] [NODE8] initialized
[2016-02-13 01:08:09,170][INFO ][node ] [NODE8] starting ...
[2016-02-13 01:08:09,357][INFO ][transport ] [NODE8] publish_address {10.249.167.8:9300}, bound_addresses {10.249.167.8:9300}
[2016-02-13 01:08:09,373][INFO ][discovery ] [NODE8] elasticsearch/i42Qv-qNSJaSoLRCt2e5tg
[2016-02-13 01:08:13,936][INFO ][cluster.service ] [NODE8] new_master {NODE8}{i42Qv-qNSJaSoLRCt2e5tg}{10.249.167.8}{10.249.167.8:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-02-13 01:08:13,983][INFO ][http ] [NODE8] publish_address {10.249.167.8:9200}, bound_addresses {10.249.167.8:9200}
[2016-02-13 01:08:13,983][INFO ][node ] [NODE8] started
[2016-02-13 01:08:16,715][INFO ][gateway ] [NODE8] recovered [1] indices into cluster_state
Node 9 Log========================================================================
[2016-02-13 01:08:44,988][WARN ][bootstrap ] unable to install syscall filter: syscall filtering not supported for OS: 'Windows Server 2012 R2'
[2016-02-13 01:08:45,237][INFO ][node ] [NODE9] version[2.1.1], pid[6468], build[40e2c53/2015-12-15T13:05:55Z]
[2016-02-13 01:08:45,237][INFO ][node ] [NODE9] initializing ...
[2016-02-13 01:08:45,601][INFO ][plugins ] [NODE9] loaded [cloud-azure], sites []
[2016-02-13 01:08:45,625][INFO ][env ] [NODE9] using [1] data paths, mounts [[(C:)]], net usable_space [113.6gb], net total_space [126.6gb], spins? [unknown], types [NTFS]
[2016-02-13 01:08:47,554][INFO ][node ] [NODE9] initialized
[2016-02-13 01:08:47,554][INFO ][node ] [NODE9] starting ...
[2016-02-13 01:08:47,753][INFO ][transport ] [NODE9] publish_address {10.249.167.9:9300}, bound_addresses {10.249.167.9:9300}
[2016-02-13 01:08:47,763][INFO ][discovery ] [NODE9] elasticsearch/ys7WjfT3QR2DqwLFr-m6Ew
[2016-02-13 01:08:52,292][INFO ][cluster.service ] [NODE9] new_master {NODE9}{ys7WjfT3QR2DqwLFr-m6Ew}{10.249.167.9}{10.249.167.9:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-02-13 01:08:52,342][INFO ][http ] [NODE9] publish_address {10.249.167.9:9200}, bound_addresses {10.249.167.9:9200}
[2016-02-13 01:08:52,342][INFO ][node ] [NODE9] started
[2016-02-13 01:08:53,649][INFO ][gateway ] [NODE9] recovered [0] indices into cluster_state
This looks like a split-brain problem.
It means the nodes have become separated and each one acts as master on its own.
Your logs clearly show that the nodes have formed two separate clusters: each node elected itself master with "[0] joins received".
To avoid split brain, set the minimum number of master-eligible nodes that must be visible before a master can be elected:
discovery.zen.minimum_master_nodes
It can be calculated as follows:
minimum master nodes = (N/2)+1
where N is the number of master-eligible nodes and N/2 is rounded down.
For example, if you have 3 nodes in your cluster, set:
discovery.zen.minimum_master_nodes: 2
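As a quick check of the formula above, here is a minimal Python sketch that computes the value for a few cluster sizes. The function name minimum_master_nodes is made up for illustration and is not part of any Elasticsearch API; the only assumption is that N/2 means integer (floor) division.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the quorum size (N // 2) + 1 for N master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Floor division, so an odd N rounds down before adding 1.
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        value = minimum_master_nodes(n)
        print(f"{n} master-eligible node(s) -> discovery.zen.minimum_master_nodes: {value}")
```

Note that with only two master-eligible nodes the result is 2, so neither node can elect a master while the other is unreachable.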
After configuring and installing Elasticsearch I got this error while checking the logs.
[2016-01-25 15:37:33,223][WARN ][bootstrap ] Unable to lock JVM Memory: error=12,reason=Cannot allocate memory
[2016-01-25 15:37:33,223][WARN ][bootstrap ] This can result in part of the JVM being swapped out.
[2016-01-25 15:37:33,224][WARN ][bootstrap ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
[2016-01-25 15:37:33,224][WARN ][bootstrap ] These can be adjusted by modifying /etc/security/limits.conf, for example:
# allow user 'elasticsearch' mlockall
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
[2016-01-25 15:37:33,224][WARN ][bootstrap ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
[2016-01-25 15:37:33,428][INFO ][node ] [node-1] version[2.1.0], pid[13298], build[72cd1f1/2015-11-18T22:40:03Z]
[2016-01-25 15:37:33,428][INFO ][node ] [node-1] initializing ...
[2016-01-25 15:37:33,508][INFO ][plugins ] [node-1] loaded [], sites []
[2016-01-25 15:37:33,528][INFO ][env ] [node-1] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [43.8gb], net total_space [49.9gb], spins? [unknown], types [rootfs]
[2016-01-25 15:37:35,022][INFO ][node ] [node-1] initialized
[2016-01-25 15:37:35,022][INFO ][node ] [node-1] starting ...
[2016-01-25 15:37:35,088][INFO ][transport ] [node-1] publish_address {10.155.153.74:9300}, bound_addresses {10.155.153.74:9300}
[2016-01-25 15:37:35,097][INFO ][discovery ] [node-1] Elasticsearch/M0pCcU6UQ1ShHxlOZ4U22w
[2016-01-25 15:37:38,157][INFO ][cluster.service ] [node-1] new_master {node-1}{M0pCcU6UQ1ShHxlOZ4U22w}{10.155.153.74}{10.155.153.74:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-01-25 15:37:38,195][INFO ][http ] [node-1] publish_address {10.155.153.74:9200}, bound_addresses {10.155.153.74:9200}
[2016-01-25 15:37:38,196][INFO ][node ] [node-1] started
[2016-01-25 15:37:38,250][INFO ][gateway ] [node-1] recovered [0] indices into cluster_state
[2016-01-25 15:37:45,458][INFO ][cluster.metadata ] [node-1] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config
I also checked that bootstrap.mlockall: true is set.
elasticsearch.yml file:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Elasticsearch
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
node.master: true
node.data: true
#
# Add custom attributes to the node:
#
# node.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
#
# Path to log files:
#
# path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.mlockall: true
#
# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
# available on the system and that the owner of the process is allowed to use this limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.155.153.74
network.publish_host: 10.155.153.74
network.bind_host: 10.155.153.74
#
# Set a custom port for HTTP:
#
http.port: 9200
discovery.zen.ping.multicast.enabled: false
# http.cors.enabled: true
# http.cors.allow-origin: http://tvmatp326579d.ad.infosys.com:5601/
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Elasticsearch nodes will find each other via unicast, by default.
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
#
# ---------------------------------- Various -----------------------------------
#
# Disable starting multiple nodes on a single system:
#
# node.max_local_storage_nodes: 1
#
# Require explicit names when deleting indices:
#
# action.destructive_requires_name: true
Can anybody tell me what the issue could be? Why is ES not able to lock JVM memory?
Update:
Set the environment variable ES_HEAP_SIZE.
Ref: Heap sizing
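The warning in the question reports RLIMIT_MEMLOCK soft and hard limits of 65536. To see which limits a process actually gets, something like the following Python sketch may help. It uses the standard resource module (Unix only); the helper names describe_limit and memlock_limits are made up for illustration. It reports the limits of the user running the script, so it is only meaningful when run as the same user, and started the same way, as Elasticsearch.

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits() -> tuple:
    """Return (soft, hard) RLIMIT_MEMLOCK for the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"RLIMIT_MEMLOCK soft={soft} hard={hard}")
```

If this still prints 65536 bytes after editing /etc/security/limits.conf, the new limits have probably not been picked up by that session or service yet.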
Question 2 (from the comments):
Make sure that ports 9200 and 9300 are not blocked by a firewall.
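One rough way to test this from another machine is a plain TCP connect to each port. The sketch below uses only the Python standard library; port_is_reachable is a hypothetical helper name, and the host in the commented usage is the address from the configuration in the question. A successful connect only shows that the port is reachable from where the script runs, not that Elasticsearch is healthy.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (host taken from the question; adjust as needed):
# for port in (9200, 9300):
#     print(port, port_is_reachable("10.155.153.74", port))
```

A False result for 9300 between two nodes would be consistent with a firewall blocking the transport port, although a node that is simply not running gives the same result.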