How to get HBase IP address for Phoenix URL - hadoop

I can ssh to the Hadoop cluster and run the hbase command, but I need to connect using the Phoenix JDBC driver, which needs the IP address of the HBase server.
I tried the IP address I used for the cluster, with no luck.
This is probably just a generic Hadoop question, but where are the IP addresses configured?

If you know the hostnames of the Hadoop cluster's NameNodes, you can ping them or send a curl request to their JMX endpoint, like below:
curl 'http://my-namenode-lv-101:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeStatus",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
"SecurityEnabled" : false,
"NNRole" : "NameNode",
"HostAndPort" : "my-namenode-lv-101:8020",
"LastHATransitionTime" : 1561605051455,
"State" : "standby"
} ]
}
If State is "standby", that NameNode is currently inactive. Try the other NameNodes until you find the one whose State is "active", for example:
curl 'http://my-namenode-lv-102:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeStatus",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
"State" : "active",
"SecurityEnabled" : false,
"NNRole" : "NameNode",
"HostAndPort" : "my-namenode-lv-102:8020",
"LastHATransitionTime" : 1561605054944
} ]
}
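The check above can be scripted. The following is a minimal sketch, assuming the NameNode web UI listens on port 50070 (the Hadoop 2.x default; Hadoop 3.x uses 9870) and that you pass in candidate hostnames such as the my-namenode-lv-101/102 examples. The function names `namenode_state` and `find_active_namenode` are hypothetical helpers, not part of any Hadoop client library.

```python
import json
import urllib.request

# Same JMX query as the curl commands above.
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def namenode_state(jmx_json: str) -> str:
    """Return the HA state ("active" or "standby") from a NameNodeStatus JMX response."""
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]


def find_active_namenode(hosts, port=50070, timeout=5.0):
    """Query each candidate NameNode and return the first host reporting "active".

    Unreachable hosts are skipped; returns None if no host is active.
    """
    for host in hosts:
        url = f"http://{host}:{port}{JMX_QUERY}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8")
        except OSError:
            continue  # connection refused, DNS failure, timeout or HTTP error
        if namenode_state(body) == "active":
            return host
    return None
```

For example, `find_active_namenode(["my-namenode-lv-101", "my-namenode-lv-102"])` would return `"my-namenode-lv-102"` for the two responses shown above.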

To connect to HBase through Phoenix, use the ZooKeeper quorum address, the ZooKeeper port and the value of zookeeper.znode.parent configured in your cluster (all of these can be found in your hbase-site.xml file).
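As a rough illustration, the values from hbase-site.xml can be turned into a Phoenix JDBC URL of the form jdbc:phoenix:<quorum>:<port>:<znode parent>. This sketch assumes the standard HBase property names (hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort, zookeeper.znode.parent) and falls back to the HBase defaults 2181 and /hbase when the last two are not set. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hbase_site(xml_text: str) -> dict:
    """Parse a Hadoop-style <configuration><property>... file into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def phoenix_jdbc_url(props: dict) -> str:
    """Build jdbc:phoenix:<quorum>:<port>:<znode parent> from hbase-site.xml properties."""
    quorum = props["hbase.zookeeper.quorum"]
    # Fall back to HBase defaults when these are not set explicitly.
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    znode_parent = props.get("zookeeper.znode.parent", "/hbase")
    return f"jdbc:phoenix:{quorum}:{port}:{znode_parent}"
```

On many distributions the file lives under /etc/hbase/conf/, but that path is an assumption; check where your cluster keeps it and read it with `open(...).read()` before calling `read_hbase_site`.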

Related

External access to Elasticsearch cluster

Using this link, I can easily set up a 3-node cluster on a single host with docker-compose.
This is all fine if I just use ES via the included Kibana container.
However, I need to access this cluster from external hosts. This is problematic because the nodes inside the cluster are exposed through their Docker-internal IP addresses. The application uses the API call below to get the addresses, and then, of course, errors out.
$ curl 172.16.0.146:9200/_nodes/http?pretty
{
"_nodes" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"cluster_name" : "es-cluster-test",
"nodes" : {
"hYCGiuBLQMK4vn5I3C3pQQ" : {
"name" : "es01",
"transport_address" : "192.168.48.3:9300",
"host" : "192.168.48.3",
"ip" : "192.168.48.3",
"version" : "8.2.2",
.....
How can I overcome this? I have tried exposing the 9200/9300 ports of all 3 nodes on different ports of the Docker host and then adding a network.publish_host=172.16.0.146 environment setting to each node, but this results in three 1-node clusters.
Someone must have faced this one in the past...

Is there any way to check whether an Elasticsearch cluster exists?

I am building an Elasticsearch (ES) cluster monitoring dashboard from scratch, and I want to onboard all my ES clusters into it. I want to add a button to the dashboard: clicking it lets the user enter the address/IP of an ES cluster (when onboarding that cluster for the first time) and then hit Submit. If that ES cluster exists, the user should be able to monitor it; if not, the dashboard should show an error message such as "Sorry, you have entered a wrong cluster address/IP". So how can I determine whether an ES cluster exists?
A simple curl request to the ES cluster's address and port should be enough to verify whether an ES cluster exists.
For example, to verify whether an ES cluster exists at http://localhost:9200, send the following request:
curl -XGET "http://localhost:9200/"
If the ES cluster exists and you have permission to access it, it returns JSON like the following:
{
"name" : "es01",
"cluster_name" : "elasticsearch7",
"cluster_uuid" : "xu49eNE6SuC1Z857kG2Q5g",
"version" : {
"number" : "7.16.3",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "4e6e4eab2297e949ec994e688dad46290d018022",
"build_date" : "2022-01-06T23:43:02.825887787Z",
"build_snapshot" : false,
"lucene_version" : "8.10.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Otherwise, curl reports an error such as:
curl: (7) Failed to connect to localhost port 9200: Connection refused
Note that you will need the equivalent HTTP request in whatever programming language your dashboard uses; the example above assumes a bash script.
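For a dashboard backend written in Python, the same check might look like the sketch below. It treats a cluster as existing only when GET / returns a body that resembles the JSON shown above (a cluster_name plus a version.number); any connection error, malformed URL or HTTP error status (including 401 on a secured cluster) is reported as "not found". The function names are hypothetical, and the heuristic is an assumption rather than an official API.

```python
import json
import urllib.request


def looks_like_es_root(body: str) -> bool:
    """Return True if a response body resembles Elasticsearch's GET / response."""
    try:
        info = json.loads(body)
    except ValueError:
        return False
    return (
        isinstance(info, dict)
        and "cluster_name" in info
        and isinstance(info.get("version"), dict)
        and "number" in info["version"]
    )


def es_cluster_exists(base_url: str, timeout: float = 5.0) -> bool:
    """Send GET <base_url>/ and report whether an Elasticsearch cluster answered."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return looks_like_es_root(resp.read().decode("utf-8", errors="replace"))
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status, or a malformed URL.
        return False
```

The dashboard would then call `es_cluster_exists("http://localhost:9200")` on submit and show the error message when it returns False.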

Can't connect to my proxied Elasticsearch node

I'm having trouble connecting from my Go client to my ES node.
I have Elasticsearch behind an nginx proxy that enforces basic auth.
All ES settings are default except memory.
It works fine via the browser, but not via this client:
https://github.com/olivere/elastic
I read the docs, and they say the client uses the /_nodes/http API to connect. This is probably where I went wrong, because the response from that API looks like this:
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "elasticsearch",
"nodes" : {
"u6TqFjAvRBa3_4FndfKh4w" : {
"name" : "u6TqFjA",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "5.6.2",
"build_hash" : "57e20f3",
"roles" : [
"master",
"data",
"ingest"
],
"http" : {
"bound_address" : [
"[::1]:9200",
"127.0.0.1:9200"
],
"publish_address" : "127.0.0.1:9200",
"max_content_length_in_bytes" : 104857600
}
}
}
}
I'm guessing I have to set the IPs to my actual IP/domain (my domain is like es01.somedomain.com).
So how do I correctly configure Elasticsearch so that my Go client can connect?
My config files for nginx look similar to this: https://www.elastic.co/blog/playing-http-tricks-nginx
Edit: I found a temporary solution by setting elastic.SetSniff(false) in the client options, but I think that means I can't scale ES horizontally, so I'm still looking for an alternative.
You are looking for the HTTP options, specifically http.publish_host and http.publish_port, which should be set to the publicly reachable address and port of the Nginx server proxying the ES node.
Note that with Elasticsearch listening on 127.0.0.1:9300 for the transport, you won't be able to form a cluster with nodes on other hosts. The transport can be configured similarly with the transport options.
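To check what a sniffing client would try to connect to after changing http.publish_host, you can inspect the http.publish_address values in the /_nodes/http response. This is a minimal sketch with hypothetical helper names; it assumes the response shape shown in the question and that newer ES versions may report the address as "hostname/ip:port", so only the part after the last slash is checked for loopback.

```python
import json


def http_publish_addresses(nodes_http_json: str) -> dict:
    """Map node name -> http.publish_address from a GET /_nodes/http response body."""
    nodes = json.loads(nodes_http_json)["nodes"]
    return {
        node["name"]: node["http"]["publish_address"]
        for node in nodes.values()
        if "http" in node
    }


def loopback_published_nodes(addresses: dict) -> list:
    """Return names of nodes whose published HTTP address is a loopback address."""
    loopback_prefixes = ("127.", "[::1]", "localhost")
    flagged = []
    for name, addr in addresses.items():
        host_part = addr.rsplit("/", 1)[-1]  # strip an optional "hostname/" prefix
        if host_part.startswith(loopback_prefixes):
            flagged.append(name)
    return sorted(flagged)
```

With the response from the question, the single node u6TqFjA is flagged because it publishes 127.0.0.1:9200, which a remote Go client cannot reach.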

Elasticsearch basics: TransportClient or not?

I set up a Graylog stack (Graylog / ES / Mongo). Everything went smoothly (well, almost). Yesterday I tried to get some info using the following command:
curl 'http://127.0.0.1:9200/_nodes/process?pretty'
{
"cluster_name" : "log_server_graylog",
"nodes" : {
"Znz_72SZSyikw6DEC4Wgzg" : {
"name" : "graylog-27274b66-3bbd-4975-99ee-1ee3d692c522",
"transport_address" : "127.0.0.1:9350",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "2.4.4",
"build" : "fcbb46d",
"attributes" : {
"client" : "true",
"data" : "false",
"master" : "false"
},
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 788,
"mlockall" : false
}
},
"XO77zz8MRu-OOSymZbefLw" : {
"name" : "test",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "2.4.4",
"build" : "fcbb46d",
"http_address" : "127.0.0.1:9200",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 946,
"mlockall" : false
}
}
}
}
It does look (to me, at least) like there are 2 nodes running. Someone on the ES IRC channel told me that there might be a transport client running (which shows up as a second node)...
I really don't understand where this transport client comes from. The person on IRC also told me that using a transport client used to be a common setup but is now discouraged. How can I change the config to follow ES best practices (which I couldn't find in the docs)?
FYI, my config file:
cat /etc/elasticsearch/elasticsearch.yml
cluster.name: log_server_graylog
node.name: test
path.data: /tt/elasticsearch/data
path.logs: /tt/elasticsearch/log
network.host: 127.0.0.1
action.destructive_requires_name: true
# The following are unnecessary since we set swappiness to 1; this should prevent ES memory from being swapped, except in an emergency
#bootstrap.mlockall: true
#bootstrap.memory_lock: true
Thanks
I found the answer on the Graylog IRC channel: the second node is the Graylog client created by... the Graylog server :)
So everything is normal and as expected.
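To tell such a client node apart from the real data node in the _nodes output, one option is to look at the node attributes: in the ES 2.x response above, the Graylog node reports client "true" and data "false". The sketch below assumes that ES 2.x attribute format (string values) and uses a hypothetical helper name.

```python
import json


def client_only_nodes(nodes_json: str) -> list:
    """Return names of nodes whose attributes mark them as non-data client nodes (ES 2.x style)."""
    nodes = json.loads(nodes_json)["nodes"]
    names = []
    for node in nodes.values():
        attrs = node.get("attributes", {})
        # ES 2.x reports these attributes as strings such as "true" / "false".
        if attrs.get("client") == "true" and attrs.get("data") == "false":
            names.append(node["name"])
    return sorted(names)
```

Applied to the output in the question, only the graylog-27274b66-... node is returned; the node named test has no such attributes and is the actual data node.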

Identify live Spark master at the time of spark-submit

I have a 5-node Spark cluster where 2 nodes are running a master. In an HA setup (via ZooKeeper), either one can be elected as the active master.
When submitting an application using the command
/bin/spark-submit --class SparkAggregator.java --deploy-mode cluster --supervise --master spark://host1:7077
I get the error:
Can only accept driver submissions in ALIVE state. Current state: STANDBY.
spark-submit does not allow multiple master names in --master.
Question:
How can I identify the elected master at the time of spark-submit?
Thanks
Pankaj
The --master option can take multiple Spark masters, so if you have more than one, list them separated by commas, e.g.:
/bin/spark-submit --class SparkAggregator.java --deploy-mode cluster --supervise --master spark://host1:7077,host2:7077,host3:7077
It will try to connect to all of them and use the first one that responds. This lets you use multiple masters in a cluster where only one is active and the rest are in standby.
Spark has a hidden API that tells you the status of the Spark cluster:
API request: http://SPARK_MASTER_IP:8080/json/
Output:
{
"url" : "spark://10.204.216.233:7077",
"workers" : [ {
"id" : "worker-20170606104140-10.204.217.96-40047",
"host" : "10.204.217.96",
"port" : 40047,
"webuiaddress" : "http://10.204.217.96:8081",
"cores" : 4,
"coresused" : 0,
"coresfree" : 4,
"memory" : 29713,
"memoryused" : 0,
"memoryfree" : 29713,
"state" : "ALIVE",
"lastheartbeat" : 1496760671542
}, {
"id" : "worker-20170606104144-10.204.219.15-42749",
"host" : "10.204.219.15",
"port" : 42749,
"webuiaddress" : "http://10.204.219.15:8081",
"cores" : 4,
"coresused" : 0,
"coresfree" : 4,
"memory" : 29713,
"memoryused" : 0,
"memoryfree" : 29713,
"state" : "ALIVE",
"lastheartbeat" : 1496760675649
}, {
"id" : "worker-20170606104151-10.204.217.249-35869",
"host" : "10.204.217.249",
"port" : 35869,
"webuiaddress" : "http://10.204.217.249:8081",
"cores" : 4,
"coresused" : 0,
"coresfree" : 4,
"memory" : 29713,
"memoryused" : 0,
"memoryfree" : 29713,
"state" : "ALIVE",
"lastheartbeat" : 1496760682270
} ],
"cores" : 12,
"coresused" : 0,
"memory" : 89139,
"memoryused" : 0,
"activeapps" : [ ],
"completedapps" : [ ],
"activedrivers" : [ ],
"status" : "ALIVE"
}
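The top-level "status" field in that response can be used to pick the live master before calling spark-submit. This is a minimal sketch, assuming the master web UI runs on its default port 8080 and that a standby master reports a status other than "ALIVE" (e.g. "STANDBY"); the function names are hypothetical.

```python
import json
import urllib.request


def alive_master_url(json_body: str):
    """Return the spark:// URL from a master's /json/ response if its status is ALIVE, else None."""
    info = json.loads(json_body)
    return info.get("url") if info.get("status") == "ALIVE" else None


def find_alive_master(hosts, ui_port=8080, timeout=5.0):
    """Query each master's /json/ endpoint and return the first ALIVE master's spark:// URL."""
    for host in hosts:
        try:
            with urllib.request.urlopen(f"http://{host}:{ui_port}/json/", timeout=timeout) as resp:
                url = alive_master_url(resp.read().decode("utf-8"))
        except (OSError, ValueError):
            continue  # master unreachable or response was not JSON
        if url is not None:
            return url
    return None
```

For the output above, `alive_master_url` returns "spark://10.204.216.233:7077", which could then be passed to --master.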
