External access to Elasticsearch cluster

Using this link I can easily set up a 3-node cluster on a single host with docker-compose.
This is all fine if I just use ES via the included Kibana container.
However, I need to access this cluster from external hosts. This becomes problematic because the nodes inside the cluster are exposed through their Docker-internal IP addresses. The application uses the API call below to get the node addresses, and then of course errors out because it cannot reach them.
$ curl 172.16.0.146:9200/_nodes/http?pretty
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "es-cluster-test",
  "nodes" : {
    "hYCGiuBLQMK4vn5I3C3pQQ" : {
      "name" : "es01",
      "transport_address" : "192.168.48.3:9300",
      "host" : "192.168.48.3",
      "ip" : "192.168.48.3",
      "version" : "8.2.2",
      .....
How can I overcome this? I have tried exposing the 9200/9300 ports for all 3 nodes to different ports on the docker-host, and then adding a network.publish_host=172.16.0.146 environment setting to each node, but this results in three 1-node clusters.
Someone must have faced this one in the past...

Related

Is there any way to check whether an Elasticsearch cluster exists?

I am working on an Elasticsearch (ES) cluster monitoring dashboard where I want to onboard all my ES clusters. I am developing the dashboard from scratch. I want to add a button that lets the user enter the address/IP of an ES cluster (when onboarding it for the first time) and then hit a submit button. If that ES cluster exists, the user should be able to monitor it; if not, the dashboard should show an error message such as "Sorry, you have entered a wrong cluster address/IP". So, how can I determine whether an ES cluster exists?
A simple curl call to the ES cluster address and port should be enough to verify whether an ES cluster exists.
For example, to verify whether an ES cluster exists at http://localhost:9200, we would fire a curl call as follows:
curl -XGET "http://localhost:9200/"
If the ES cluster exists and you have permission to access it, it returns JSON like the following:
{
  "name" : "es01",
  "cluster_name" : "elasticsearch7",
  "cluster_uuid" : "xu49eNE6SuC1Z857kG2Q5g",
  "version" : {
    "number" : "7.16.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "4e6e4eab2297e949ec994e688dad46290d018022",
    "build_date" : "2022-01-06T23:43:02.825887787Z",
    "build_snapshot" : false,
    "lucene_version" : "8.10.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
Otherwise, curl reports an error like this:
curl: (7) Failed to connect to localhost port 9200: Connection refused
Note that you will need to use the equivalent HTTP call for your programming language. This example assumes a bash script.
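For a dashboard backend, the same check could look roughly like the Python sketch below. It only uses the standard library; the function names (looks_like_es_root, cluster_exists) and the 3-second timeout are my own choices for illustration, and treating "cluster_name" plus "version.number" as proof of an ES node is an assumption based on the response shown above.

```python
import http.client
import json
import urllib.request


def looks_like_es_root(payload: dict) -> bool:
    """Return True if a decoded JSON body resembles the Elasticsearch root response."""
    version = payload.get("version")
    return (
        isinstance(payload.get("cluster_name"), str)
        and isinstance(version, dict)
        and isinstance(version.get("number"), str)
    )


def cluster_exists(base_url: str, timeout: float = 3.0) -> bool:
    """Probe base_url (e.g. "http://localhost:9200/") and report whether ES answered."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: connection refused, timeouts, HTTP error statuses (e.g. 401).
        # ValueError: malformed URL or a body that is not valid JSON.
        # HTTPException: something is listening but does not speak HTTP.
        return False
    return isinstance(payload, dict) and looks_like_es_root(payload)
```

With this, the submit handler would call cluster_exists("http://localhost:9200/") and show the error message when it returns False. A cluster that requires credentials will also come back as False here, so you may want to handle HTTP 401 separately.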

Autobalance the shards in ElasticSearch

We have 4 Elasticsearch nodes running version 5.6.9 that, because of some earlier rules, hold an unbalanced number of shards.
We have found that we can move one shard at a time to another node, but that is incredibly slow.
Apart from creating a script that uses the ElasticSearch API to balance the shards, is there another way?
You can do so with Cluster Reroute, which allows manual changes to the allocation of individual shards in the cluster. Check out the Cluster Reroute docs.
POST /_cluster/reroute
{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    },
    {
      "allocate_replica" : {
        "index" : "test", "shard" : 1,
        "node" : "node3"
      }
    }
  ]
}
We found the issue: the system was not automatically rebalancing the cluster's indices because we had cluster.routing.rebalance.enable set to none.
We found the information here.
The problem we had with _cluster/reroute was that, according to the documentation, the system will try to rebalance itself again afterwards. Either way, thanks for your help.
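For anyone hitting the same thing, re-enabling rebalancing is a single call to the cluster settings API. Below is a minimal Python sketch that builds that request; the helper names are made up for illustration, and whether you want a transient or a persistent setting is your call (I default to transient here as an assumption).

```python
import json
import urllib.request

# Values accepted by cluster.routing.rebalance.enable.
VALID_MODES = {"all", "primaries", "replicas", "none"}


def rebalance_settings_body(mode: str = "all", persistent: bool = False) -> dict:
    """Build the JSON body for PUT /_cluster/settings."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown rebalance mode: {mode!r}")
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.rebalance.enable": mode}}


def build_settings_request(base_url: str, mode: str = "all",
                           persistent: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request against the given ES base URL."""
    body = json.dumps(rebalance_settings_body(mode, persistent)).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it is then urllib.request.urlopen(build_settings_request("http://localhost:9200")), where the URL is a placeholder for one of your nodes.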

How to get HBase IP address for Phoenix URL

I can ssh to the Hadoop Cluster and can run the hbase command. But I need to connect using the Phoenix JDBC driver which needs the IP address of the HBase server.
I tried the IP address I used for the cluster with no luck.
This is probably just a generic Hadoop question but where are the IP addresses configured?
If you know the Hadoop cluster's NameNodes, you can try pinging them or sending a curl request like the one below:
curl 'http://my-namenode-lv-101:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "SecurityEnabled" : false,
    "NNRole" : "NameNode",
    "HostAndPort" : "my-namenode-lv-101:8020",
    "LastHATransitionTime" : 1561605051455,
    "State" : "standby"
  } ]
}
If the State is "standby", that node is currently inactive, and you have to try the other NameNodes until you find the one whose State says "active". Example below:
curl 'http://my-namenode-lv-102:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "State" : "active",
    "SecurityEnabled" : false,
    "NNRole" : "NameNode",
    "HostAndPort" : "my-namenode-lv-102:8020",
    "LastHATransitionTime" : 1561605054944
  } ]
}
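If you have several NameNodes to check, picking the active one out of the JMX responses can be scripted. Here is a small Python sketch that works on the response bodies shown above; the function name active_namenode is my own, and it assumes each body has the "beans" layout from the curl output.

```python
import json
from typing import Iterable, Optional


def active_namenode(jmx_bodies: Iterable[str]) -> Optional[str]:
    """Return HostAndPort of the first NameNode reporting State == "active"."""
    for body in jmx_bodies:
        for bean in json.loads(body).get("beans", []):
            if bean.get("State") == "active":
                return bean.get("HostAndPort")
    # No NameNode in the given responses claimed to be active.
    return None
```

You would feed it the text returned by each NameNodeStatus query, one per NameNode.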
To connect to Phoenix/HBase, use the ZooKeeper address, the ZooKeeper port, and the value of the zookeeper.znode.parent setting configured in your cluster (all of these can be found in your hbase-site.xml file).
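As a rough illustration, the Python sketch below reads those three values from the contents of hbase-site.xml and assembles a URL in the jdbc:phoenix:<quorum>:<port>:<znode parent> form. The helper names are hypothetical, and the fallbacks of 2181 and /hbase are the stock HBase defaults, which I am assuming apply when a property is absent from the file.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Collect <property><name>/<value> pairs from a Hadoop-style config file."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def phoenix_jdbc_url(props: dict) -> str:
    """Assemble a Phoenix JDBC URL from ZooKeeper quorum, port and znode parent."""
    quorum = props["hbase.zookeeper.quorum"]
    # Assumed defaults when the property is missing from hbase-site.xml.
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    znode = props.get("zookeeper.znode.parent", "/hbase")
    return f"jdbc:phoenix:{quorum}:{port}:{znode}"
```

For example, phoenix_jdbc_url(read_hadoop_properties(open("hbase-site.xml").read())), with the path adjusted to wherever the file lives on your cluster.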

Can't connect to my proxied Elasticsearch node

I'm having issues with connecting from my Go client to my es node.
I have elasticsearch behind an nginx proxy that sets basic auth.
All settings are default in ES besides memory.
Via browser it works wonderfully, but not via this client:
https://github.com/olivere/elastic
I read the docs and it says it uses the /_nodes/http api to connect. Now this is probably where I did something wrong because the response from that api looks like this:
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "u6TqFjAvRBa3_4FndfKh4w" : {
      "name" : "u6TqFjA",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "5.6.2",
      "build_hash" : "57e20f3",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "http" : {
        "bound_address" : [
          "[::1]:9200",
          "127.0.0.1:9200"
        ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
I'm guessing I have to set the IPs to my actual IP/domain (my domain is like es01.somedomain.com).
So how do I correctly configure Elasticsearch so that my Go client can connect?
My config files for nginx look similar to this: https://www.elastic.co/blog/playing-http-tricks-nginx
Edit: I found a temporary solution by setting elastic.SetSniff(false) in the Options for the client, but I think that means I can't scale ES horizontally. So still looking for an alternative.
You are looking for the HTTP options, specifically http.publish_host and http.publish_port, which should be set to the publicly reachable address and port of the Nginx server proxying the ES node.
Note that with Elasticsearch listening on 127.0.0.1:9300 for the transport, you won't be able to form a cluster with nodes on other hosts. The transport can be configured similarly with the transport options.
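To check what a sniffing client will actually be told after you change these settings, you can pull the publish addresses out of the /_nodes/http response. A minimal Python sketch, assuming the response layout shown in the question (the function name http_publish_addresses is just for illustration):

```python
import json


def http_publish_addresses(nodes_http_body: str) -> dict:
    """Map each node name to the http.publish_address it advertises."""
    nodes = json.loads(nodes_http_body).get("nodes", {})
    return {
        # Fall back to the node id if the node has no "name" field.
        info.get("name", node_id): info["http"]["publish_address"]
        for node_id, info in nodes.items()
        if "publish_address" in info.get("http", {})
    }
```

If the values that come back are still 127.0.0.1:9200 (or a Docker-internal address), a client that sniffs will keep trying to connect there rather than to the proxy.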

Is it possible to organize data between elasticsearch shards based on stored data?

I want to build a data store with three nodes. The first one should keep all data, the second one data of the last month, the third data of the last week. Is it possible to automatically configure elasticsearch shards to relocate themselves between nodes so that this functionality is given?
If you want to move existing data from one node to another, you can use _cluster/reroute.
But combining this with automatic allocation can be dangerous: right after you move a shard to the target node, the cluster will try to rebalance itself evenly.
Alternatively, you can disable automatic allocation. In that case only your custom allocations apply, which can be really risky to manage for a large data set.
POST /_cluster/reroute
{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    },
    {
      "allocate_replica" : {
        "index" : "test", "shard" : 1,
        "node" : "node3"
      }
    }
  ]
}
Source: Elasticsearch rerouting
Also, you should read this: Customize document routing
