Error while working on hive which is installed on edgenode - hadoop

I am new to hadoop/HIve learning and struggling to fix this, for a distributed hadoop environment where should hive and pig need to install, is this edge node or where my hadoop installed
Hadoop installed on different server say hadoopVM, 2 separate data nodes DN1, DN2 & Edge Nodes from where I can submit jobs to hadoop to load any files to HDFS
till here i have no issue, i am trying to install hive edge node and getting below error
Attached error which i am getting on edgenode server

It seems that the Meta Store service is not started. start the service by issuing the following command in one of the session and don't close that session, and parallel start another session and try to use hive.
Active session mode:
sudo hive --service metastore
Background service mode:
If you add "&&" then service will be started and keep running as a background process.
sudo hive --service metastore &&
Altarnative:
If you still facing the problem then this is the problem because the new version of MySQL, you can refer my answer at below link.
SemanticException in Hive Shell Mode

Related

Installing Hive 2.1.0 Interactive Query (LLAP) on Kerberized HDP 2.6.2 environment

I had a lot of issues surrounding the installation/activation of Hive 2.1.0 on our HDP 2.6.2 cluster. But finally I got it working, so I wanted to share the steps involved with the community. I got these steps from different sources, which I will also mention below each step. My specifications:
Clustered HDP 2.6.2 (hortonworks) environment
Kerberos
Hive 1.2.1000 -> Hive 2.1.0
Step 1: Enable Hive Interactive Query
Follow the steps on the Hortonworks website. This includes enabling YARN pre-emption and some other Yarn settings. After adjusting YARN your can enable Hive Interactive Query via Ambari. You also have to specify a default queue that is at least 20% of your total cluster capacity.
Source
Step 2: Kerberos related settings
Make sure you add the following settings to the custom hiveserver2-interactive site in Ambari. Where ${REALMNAME} is the name of your LDAP realm.
hive.llap.zk.sm.keytab.file=/etc/security/keytabs/hive.llap.zk.sm.keytab
hive.llap.zk.sm.principal=hive/_HOST#${REALMNAME}
hive.llap.daemon.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.llap.daemon.service.principal=hive/_HOST#${REALMNAME}
Now you have to put those 2 keytabs (basically the same keytabs) on every YARN node. This can be done manually or through Ambari (Kerberos service). Make sure those keytabs are chown hive:hadoop and have a chmod 440 (group read).
Note: you also need a user hive on all those nodes.
Source
Step 3: Zookeeper configuration
It could be that Hive is not recognized by Zookeeper, this will give acl errors when trying to start the HiveServer2 Interactive. To cope with this issue I added the right hive acl nodes through a zookeeper client host.
su -
# First, authenticate with the hive keytab
kinit hive/'hostname' -kt /etc/security/keytabs/hive.service.keytab
# Second, connect to a zookeeper client on your cluster
/usr/hdp/current/zookeeper-server/bin/zkCli.sh -server ${ZOOKEEPER_CLIENT}
# Third, check the current status of the user-hive acl
getAcl /llap-sasl/user-hive
# Fourth, If this is not there create the following nodes
create /llap-sasl/user-hive "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0 "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0/workers "" sasl:hive:cdrwa,world:anyone:r
# Fifth, change the llap-sasl node to add the user hive
setAcl /llap-sasl sasl:hive:cdrwa,world:anyone:r
Source 1, Source 2
Basically, this should work for Kerberized environments. If you got errors related to ACL, go back to your Zookeeper settings and look if everything is fine. If you have errors related to a missing Hive user, you should look of the hive user is added correctly to the nodes. If you have an error related to Kerberos (principal or keytabs) look if the keytabs are on the designated (YARN) nodes with the correct rights.

Hive Service Stops after few minutes automatically

I have setup hadoop and hive on aws ec2 server with 14.04 ubuntu, when i run a background process for hive after hadoop services the hive server stops working after some time, still facing the issue below is the given command
hive --service hiveserver2 &

H2O: unable to connect to h2o cluster through python

I have a 5 node hadoop cluster running HDP 2.3.0. I setup a H2O cluster on Yarn as described here.
On running following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following ouput
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage.On trying to connect to the web UI using the ip:54321, the browser tries to endlessly load the H2O admin page but nothing ever displays.
On forcefully terminating the init process I get the following error
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However if I try and use H2O with python without setting up a H2O cluster, everything runs fine.
I executed all commands as the root user. Root user has permissions to read and write from the /user/hdfs hdfs directory.
I'm not sure if this is a permissions error or that the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512mb per node is tiny - what is your use case? I would give the nodes some more memory.

NoServerForRegionException while running Hadoop MapReduce job on HBase

I am executing a simple Hadoop MapReduce program with HBase as an input and output.
I am getting the error:
java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for OutPut,,99999999999999 after 10 tries.
This exception appeared to us when there was difference in hbase version.
Our code was built with and running with 0.94.X version of hbase jars. Whereas the hbase server was running on 0.90.3.
When we changed our pom file with right version (0.90.3) of hbase jars it started working fine.
Query bin/hbase hbck and find in which machine Region server is running.
Make sure that all your Region server is up and running.
Use start regionserver for starting Region server
Even if Regionserver at the machine is started it may fail because of time sync.
Make sure you have NTP installed on all Regionserver nodes and HbaseMaster node.
As Hbase works on a key-value pair where it uses the Timestamp as the Index, So it allows a time skew less than 3 seconds.
Deleting (or move to /tmp) the WAL logs helped in our case:
hdfs dfs -mv /apps/hbase/data/MasterProcWALs/state-*.log /tmp

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I specified my metastore_db separately to some other place $HIVE_HOME/my_db/metastore_db in hive-site.xml
Hadoop runs fine, jps gives ResourceManager,NameNode,DataNode,NodeManager,SecondaryNameNode
Hive gets started perfectly,metastore_db & derby.log also created,and all hive commands run successfully,I can create databases,table,etc. But after few day later,when I run show databases,or show tables, get below error
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too and the accepted answer did not help me so will add my solution here for others:
My problem was I had a single machine with a pseudo distributed set up installed with hive. It was working fine with localhost as the host name. However when we decided to add multiple machines to the cluster we also decided to give the machines proper names "machine01, machine 02 etc etc".
I changed all the hadoop conf/*-site.xml files and the hive-site.xml file too but still had the error. After exhaustive research I realized that in the metastore db hive was picking up the URIs not from *-site files, but from the metastore tables in mysql. Where all the hive table meta data was saved are two tables SDS and DBS. Upon changing the DB_LOCATION_URI column and LOCATION in the tables DBS and SDS respectively to point to the latest namenode URI, I was back in business.
Hope this helps others.
reasons for this
If you changed your Hadoop/Hive version,you may be specifying previous hadoop version (which has ds.default.name=hdfs://localhost:54310 in core-site.xml) in your hive-0.9.0/conf/hive-env.sh
file
$HADOOP_HOME may be point to some other location
Specified version of Hadoop is not working
your namenode may be in safe mode ,run bin/hdfs dfsadmin -safemode leave or bin/hadoop dsfadmin -safemode leave
In case of fresh installation
the above problem can be the effect of a name node issue
try formatting the namenode using the command
hadoop namenode -format
1.Turn off your namenode from safe mode. Try the commands below:
hadoop dfsadmin -safemode leave
2.Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start

Resources