Is thrift running on my HBase master? How to connect to it with Happybase? - hadoop

I am running the krejcmat/hadoop-hbase Docker image in pseudo-distributed mode, i.e. the master and slaves run in separate containers on the same machine. After starting the Hadoop cluster and HBase, I start the Thrift server on the master node with:
hbase thrift start -threadpool
I also expose port 9090 (the default Thrift port) when starting the container with --expose=9090. I want to use the Happybase library to connect from my host machine to the HBase instance running in the Hadoop cluster via the Thrift API. This is the command I use:
connection = happybase.Connection('hadoop-hbase-master', 9090)
But I receive the error:
TTransportException(message="Could not connect to ('hadoop-hbase-master', 9090)", type=1)
This means the Thrift API is not reachable. Is it because the Thrift server is not running? Should I use some sort of Thrift client on my host machine? Or should I run the Thrift server on one of the slaves instead of the master?
Thanks,
Sepideh

I start Thrift with
hbase thrift start
and then the following code works:
import happybase
connection = happybase.Connection('localhost')
You can give that a try.
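A minimal sketch of the same connection from the host machine, reusing the hostname and port from the question; it assumes the Thrift port is actually reachable from wherever the script runs (i.e. published by Docker, not just exposed):
import happybase

# Hostname and port taken from the question; adjust to wherever the
# HBase Thrift server is reachable from this machine.
connection = happybase.Connection('hadoop-hbase-master', port=9090, timeout=10000)

# A quick sanity check that the Thrift connection actually works.
print(connection.tables())
If this lists tables, the Thrift server is up and reachable; a TTransportException at this point means the problem is network-level (port not published to the host, wrong hostname, or the Thrift server not running).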

Related

Not Able to Start Hadoop Service Locally On Windows 7

I am trying to set up Hadoop locally on my Windows 7 computer, following the instructions at this link:
https://dimensionless.in/know-how-to-install-and-run-hadoop-on-windows-for-beginners/
I followed every single step, and Hadoop appears to be properly installed: running hadoop version in the Windows command prompt successfully returns the installed version, Hadoop 3.1.0.
However, it fails to start the nodes (namenode, datanode, YARN). I suspect it is related to the port, since I use local port 9000 in the core-site configuration (hdfs://localhost:9000). I checked whether the port is open by running telnet localhost 9000, and it reported that it failed to open the port.
Can anyone provide guidance on this issue? It looks like a port problem that is preventing the Hadoop services from starting up.
Thank you.
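For what it's worth, the same port check can be done with a few lines of Python instead of telnet; the host and port below are the ones from the core-site configuration above:
import socket

# Try to open a TCP connection to the NameNode port from core-site.xml.
try:
    with socket.create_connection(("localhost", 9000), timeout=5):
        print("port 9000 is open")
except OSError as exc:
    print("could not connect to port 9000:", exc)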

Configure hadoop-client to connect to hadoop in other machine/server

On server A I have Hadoop and Python scripts that perform tasks on Hadoop.
On server B I have Hive/Hadoop.
Is it possible to configure the hadoop-client on server A so that it connects to the Hadoop on server B?
It's not clear what Python library you are using, but assuming PySpark, you can copy or configure the HADOOP_CONF_DIR on your client machine, and it can communicate with any external Hadoop system.
At the very least, you'll need to configure a core-site.xml to communicate with HDFS and a hive-site.xml to communicate with Hive.
If you are using the PyHive library, you just connect to user@hiveserver2:10000.
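For example, a minimal PyHive sketch, assuming HiveServer2 on server B listens on its default port 10000; the hostname and username below are placeholders:
from pyhive import hive

# Connect from server A to HiveServer2 running on server B (placeholder host/user).
conn = hive.Connection(host="server-b.example.com", port=10000, username="hadoop")
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())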

What is the master URL in EC2 spark cluster

I have a Spark cluster launched with the spark-ec2 script.
(EDIT: after logging into the master) I can run Spark jobs locally on the master node with:
spark-submit --class myApp --master local myApp.jar
But I can't seem to run the job in cluster mode:
../spark/bin/spark-submit --class myApp --master spark://54.111.111.111:7077 --deploy-mode cluster myApp.jar
The IP address of the master was obtained from the AWS console.
I get the following errors:
WARN RestSubmissionClient: Unable to connect to server
Warning: Master endpoint spark://54.111.111.111:7077 was not a REST server. Falling back to legacy submission gateway instead.
Error connecting to master (akka.tcp://sparkMaster@54.111.111.111:7077).
Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@54.177.156.236:7077
No master is available, exiting.
How do I submit to an EC2 Spark cluster?
When you run with --master local you are not connecting to the master either; you are executing Spark operations in the same JVM as the application. (See the docs.)
Your application code may be wrong too. So first just try to run spark-shell on the master node. /root/spark/bin/spark-shell is configured to connect to the EC2 Spark master when started without flags. If that works, you can try spark-shell --master spark://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:7077 on your laptop. Be sure to use the external IP or hostname of the master machine.
If that works too, try running your application in client mode (without --deploy-mode cluster). Hopefully in the course of trying all these, you will figure out what was wrong with your original approach. Good luck!
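If you prefer to test from code rather than spark-shell, a rough PySpark equivalent of the client-mode check looks like this; the master URL reuses the placeholder hostname from above and must be replaced with your master's external address:
from pyspark import SparkConf, SparkContext

# Point the driver at the standalone master's external address (placeholder).
conf = (SparkConf()
        .setAppName("connectivity-check")
        .setMaster("spark://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:7077"))
sc = SparkContext(conf=conf)

# A trivial job to confirm that executors on the cluster are reachable.
print(sc.parallelize(range(100)).sum())
sc.stop()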
This has nothing to do with EC2; I had a similar error on my own server. I was able to resolve it by overriding SPARK_MASTER_IP in spark-env.sh.

How to use remote hadoop cluster

I have a Hadoop cluster deployed, and the client MapReduce program is running on another machine. How can I use that cluster?
If you have your jars on a client machine, install the hadoop-client packages on that machine and put the cluster's configuration details in its conf folder, so that you can submit jobs from the client machine to the remote cluster.
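A minimal sketch of what that looks like from the client machine, assuming the cluster's *-site.xml files have already been copied into /etc/hadoop/conf there; the jar name, main class, and paths are hypothetical:
import os
import subprocess

# Point the hadoop CLI at the copied cluster configuration.
env = dict(os.environ, HADOOP_CONF_DIR="/etc/hadoop/conf")

# Submit the MapReduce job; the configuration in HADOOP_CONF_DIR tells the
# client which remote NameNode and ResourceManager to talk to.
subprocess.run(
    ["hadoop", "jar", "my-job.jar", "com.example.MyJob", "/input", "/output"],
    env=env,
    check=True,
)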

phoenix hbase not connecting remotely

I have two Cloudera VMs, and on both I've configured Phoenix; it works fine as long as everything is on localhost.
When I try to connect to HBase on one VM from Phoenix on the other VM, I use this command:
$ ./sqlline.sh xxx.xx.xx.xx:2181
The connection is successful, but Phoenix still references the local HBase and not the remote one. Can anyone tell me where the problem is?
