Hive JDBC connection gives an error if MapReduce is involved

I am working on a Hive JDBC connection in HDP 2.1.
The code works fine for queries that do not involve MapReduce, such as "select * from tablename". The same code fails when the query has a 'where' clause or names specific columns (which runs MapReduce in the background).
I have verified that the query is correct by executing it in HiveCLI.
I have also verified the table's read/write permissions for the user that runs the Java JDBC code.
The error is as follows:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:355)
at com.testing.poc.hivejava.HiveJDBCTest.main(HiveJDBCTest.java:25)

Today I also got this exception when submitting a Hive task from Java.
The output was:
org.apache.hive.jdbc.HiveDriver
org.apache.hive.jdbc.HiveDriver
hive_driver:org.apache.hive.jdbc.HiveDriver
hive_url:jdbc:hive2://10.174.242.28:10000/default
get connection success
Hive connection obtained successfully!
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
The same SQL works well when executed directly in Hive. Then I looked at the log in /var/log/hive/hadoop-cmf-hive-HIVESERVER2-cloud000.log.out and found the cause of this error:
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution
I used the following command:
sudo -u hdfs hadoop fs -chmod -R 777 /
This solved the error!
hive_driver:org.apache.hive.jdbc.HiveDriver
hive_url:jdbc:hive2://cloud000:10000/default
get connection success
Hive connection obtained successfully!
Heart beat
Insert executed successfully!
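The key detail in the HiveServer2 log line is the permission information at the end: `/user` is owned by `hdfs:supergroup` with mode `drwxr-xr-x`, while the job was submitted as `user=anonymous`. The Java sketch below uses hypothetical names and is not part of Hadoop. It parses that message and checks the owner/other write bit to show why the submission was denied and why a `777` mode let it through. The message does not list the user's groups, so the sketch assumes the user is not in the inode's group. It also ignores HDFS ACLs and superuser rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: explains an HDFS "Permission denied" message.
// Assumptions: the user is NOT in the inode's group (the message does not say),
// and HDFS ACLs / superuser privileges are ignored.
final class HdfsPermissionCheck {
    // Matches e.g.: user=anonymous, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    private static final Pattern DENIED = Pattern.compile(
            "user=([^,]+), access=([A-Z_]+), inode=\"([^\"]+)\":([^:]+):([^:]+):([-dlrwxstST]{10})");

    static final class Denial {
        final String user, access, inode, owner, group, mode;

        Denial(String user, String access, String inode, String owner, String group, String mode) {
            this.user = user;
            this.access = access;
            this.inode = inode;
            this.owner = owner;
            this.group = group;
            this.mode = mode;
        }
    }

    /** Returns the parsed fields, or null if the message does not have this shape. */
    static Denial parse(String message) {
        Matcher m = DENIED.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new Denial(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6));
    }

    /** Uses the owner 'w' bit if the user is the owner, otherwise the "other" 'w' bit. */
    static boolean canWrite(String user, String owner, String mode) {
        // mode = type char + owner rwx + group rwx + other rwx, e.g. "drwxr-xr-x"
        int writeIndex = user.equals(owner) ? 2 : 8;
        return mode.charAt(writeIndex) == 'w';
    }
}
```

For the log line in this answer, `parse` yields owner `hdfs` and mode `drwxr-xr-x`, and `canWrite("anonymous", "hdfs", "drwxr-xr-x")` is false. With `drwxrwxrwx` it becomes true.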

If you use beeline to execute the same queries, do you see the same behaviour as you get while running your test program?
The beeline client also uses the open-source JDBC driver and connects to the Hive server, which is similar to what you do in your program. HiveCLI, on the other hand, has Hive embedded in it and does not connect to a remote Hive server by default. You can use HiveCLI to connect to a remote HiveServer1, but I don't believe you can use it to connect to HiveServer2 (use beeline for HiveServer2).
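This thread uses two URL styles. `jdbc:hive://` is for HiveServer1, as in the `jdbc:hive://` connection code in a later question. `jdbc:hive2://` is for HiveServer2, which is what beeline and the `org.apache.hive.jdbc.HiveDriver` driver use. The small Java helper below is hypothetical; the two driver class names are the ones shipped with Hive.

```java
// Hypothetical helper showing the two Hive JDBC URL styles used in this thread.
final class HiveJdbcUrls {
    // Driver classes shipped with Hive for each server generation.
    static final String HIVESERVER1_DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
    static final String HIVESERVER2_DRIVER = "org.apache.hive.jdbc.HiveDriver";

    /** jdbc:hive2://host:port/db for HiveServer2, jdbc:hive://host:port/db for HiveServer1. */
    static String url(boolean hiveServer2, String host, int port, String database) {
        String scheme = hiveServer2 ? "jdbc:hive2://" : "jdbc:hive://";
        return scheme + host + ":" + port + "/" + database;
    }

    static String driverClass(boolean hiveServer2) {
        return hiveServer2 ? HIVESERVER2_DRIVER : HIVESERVER1_DRIVER;
    }
}
```

Mixing the two is a common source of confusion: a `jdbc:hive://` URL talks to HiveServer1, while beeline and `jdbc:hive2://` expect HiveServer2.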
For this error, you can take a look at the hive.log and hiveserver2.log on the server side to get more insight into what might have caused the MapReduce error.
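Here is a sketch of that log check. The Java helper filters log lines for the markers that identified the root cause in the logs quoted in this thread: `Job Submission failed`, `AccessControlException` and `Cannot initialize Cluster`. The class name and the marker list are assumptions; feed it lines from your own `hive.log` or HiveServer2 log, for example via `java.nio.file.Files.readAllLines`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the log lines that usually explain a MapRedTask failure.
final class HiveLogScan {
    // Markers taken from the log excerpts in this thread (an assumption; extend as needed).
    private static final String[] MARKERS = {
        "Job Submission failed",
        "AccessControlException",
        "Cannot initialize Cluster"
    };

    static List<String> interestingLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // record each matching line once
                }
            }
        }
        return hits;
    }
}
```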
Hope this helps.
Cheers,
Holman

Related

CDAP connectivity with Apache Hive

I have a Linux box with CDAP installed, and I configured the Hive Import and Export plugins in CDAP.
On the same machine I have Hadoop with Hive installed. I am able to start all of the Hadoop services (verified with the jps command) and to create and query Hive tables.
The actual problem is when I try to connect to Hive from CDAP: it cannot connect and throws the error message below.
Connection string: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken;
Output Directory: /tmp/hive (this directory already exists)
Error:
I tried changing the connection string to:
Option 1: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken; - Connection refused error
Option 2: jdbc:hive2:// - unable to instantiate error
Option 3: jdbc:hive2://localhost:10001/defaultdb;auth=deligateToken; - still not working
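It helps to see what each option actually asks the driver to do. Options 1 and 3 name a remote HiveServer2 on `localhost` (ports 10000 and 10001). Option 2 has no host at all, which the Hive JDBC driver treats as embedded mode: Hive runs inside the client process instead of connecting to a server. The parser below is a hypothetical sketch that only splits the URL; it does not validate values. The Hive JDBC documentation spells the auth value `delegationToken`, so the `deligateToken` spelling in the question may be worth double-checking.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits jdbc:hive2://host:port/db;key=value;... into parts.
// An empty host ("jdbc:hive2://") is treated as embedded mode (no remote server).
final class Hive2Url {
    final String host;       // null in embedded mode
    final int port;          // -1 if not given
    final String database;   // "default" if not given
    final Map<String, String> sessionVars = new LinkedHashMap<>();

    private Hive2Url(String host, int port, String database) {
        this.host = host;
        this.port = port;
        this.database = database;
    }

    boolean isEmbedded() {
        return host == null;
    }

    static Hive2Url parse(String url) {
        String prefix = "jdbc:hive2://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("not a HiveServer2 URL: " + url);
        }
        String[] parts = url.substring(prefix.length()).split(";");
        String hostPortDb = parts[0];

        int slash = hostPortDb.indexOf('/');
        String authority = slash < 0 ? hostPortDb : hostPortDb.substring(0, slash);
        String db = (slash < 0 || slash == hostPortDb.length() - 1)
                ? "default" : hostPortDb.substring(slash + 1);
        String host = null;
        int port = -1;
        if (!authority.isEmpty()) {
            int colon = authority.indexOf(':');
            host = colon < 0 ? authority : authority.substring(0, colon);
            if (colon >= 0) {
                port = Integer.parseInt(authority.substring(colon + 1));
            }
        }
        Hive2Url result = new Hive2Url(host, port, db);

        // Remaining ;-separated entries are key=value session settings (e.g. auth=...).
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                result.sessionVars.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return result;
    }
}
```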

Hive Derby issue

I installed hive-0.12.0 recently, but when I run queries in the Hive shell it shows the error below:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
This is contained in my hive-default.xml.template:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hduser/hive-0.12.0/metastore/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
Could anyone help?
The problem seems to be with your metastore. Since you are using Hive's default embedded Derby metastore, a lock file is left behind after an abnormal exit. Removing that lock file should solve the issue:
rm metastore_db/*.lck
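Here is the same cleanup in Java, as a minimal sketch. Like the `rm` above, it deletes only the `*.lck` files directly inside the `metastore_db` directory. The class name is hypothetical. As with the shell command, run it only when no Hive process is using the embedded Derby metastore, because Derby uses these lock files to stop a second process from opening the same database.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: Java equivalent of `rm metastore_db/*.lck`.
// Only run it when no Hive/Derby process is using the metastore.
final class DerbyLockCleanup {
    /** Deletes *.lck files directly inside metastoreDir and returns the paths removed. */
    static List<Path> removeLockFiles(Path metastoreDir) {
        List<Path> locks = new ArrayList<>();
        // Collect first, so the directory is not modified while it is being listed.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(metastoreDir, "*.lck")) {
            for (Path lock : stream) {
                locks.add(lock);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            for (Path lock : locks) {
                Files.delete(lock);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return locks;
    }
}
```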

Error While Executing Queries on Hive Tables in WSO2

I am getting the error below while executing any query (except select * from table) on Hive tables in WSO2. Please suggest a fix.
Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Is there something I am missing in the configuration of BAM?
You want to go to the JobTracker UI (port 50030 on the JobTracker machine). Find the failed job that matches your query and look for errors in the job log.
The Hive error just says the job failed, you need to know what error caused that failure.
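The Hive message itself only tells you which task failed. Below is a small Java sketch that pulls out the return code and the task class; the helper is hypothetical and its regex is based on the messages quoted in this thread. A `MapRedTask` failure means the real cause is in the MapReduce job logs, which is why the JobTracker UI is the place to look. A `DDLTask` failure, such as the Derby metastore error earlier, points somewhere else.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "return code N from <task class>" from a Hive error message.
final class HiveExecError {
    // Dotted Java class name without a trailing period (handles "...DDLTask. java.lang...").
    private static final Pattern RETURN_CODE =
            Pattern.compile("return code (\\d+) from ([\\w$]+(?:\\.[\\w$]+)*)");

    final int returnCode;
    final String taskClass;

    private HiveExecError(int returnCode, String taskClass) {
        this.returnCode = returnCode;
        this.taskClass = taskClass;
    }

    /** Returns null if the message has no "return code N from X" part. */
    static HiveExecError parse(String message) {
        Matcher m = RETURN_CODE.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new HiveExecError(Integer.parseInt(m.group(1)), m.group(2));
    }

    /** True for MapRedTask failures, whose real cause is in the MapReduce job logs. */
    boolean isMapReduceTask() {
        return taskClass.endsWith(".MapRedTask");
    }
}
```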

Hive JDBC client throws SQLException

I am connecting to a Hive installation using JDBC client code. I have created a test table with two columns (column1, column2), both of type string. When I execute simple queries like "select * from test" I get results in the Java program, but queries with where clauses and other complex queries throw the following exception:
"Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask"
I have tried changing the permissions of the HDFS directories where the file is stored, and of /tmp on the local filesystem, but this didn't work.
This is my connection code
Connection con = DriverManager.getConnection("jdbc:hive://"+host+":"+port+"/default", "", "");
Statement stmt = con.createStatement();
The error is thrown at the executeQuery() method.
Checking the logs on server gives the following exception:
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:478)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
The queries work when run from the command prompt but not in the JDBC client.
I am stuck on this. Any suggestions would be helpful.
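The exception asks you to check `mapreduce.framework.name`. Since the queries work from the command prompt but not through the JDBC client, one thing worth comparing is the value in the configuration that the Hive server process reads and the value the CLI reads. Which file the server actually loads depends on its classpath. Here is a minimal Java sketch, with a hypothetical helper name, that reads a single property from Hadoop-style `*-site.xml` content such as `mapred-site.xml`, using only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads one <property> value from Hadoop-style *-site.xml content.
final class HadoopConf {
    /** Returns the trimmed value of the named property, or null if it is not set. */
    static String get(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }
}
```

A `null` result means the property is not set in that file, so Hadoop falls back to whatever default or other configuration file applies.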
UPDATE
I am using the Cloudera CDH4 Hadoop/Hive distribution. The script that I ran is as follows:
#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop/client
HIVE_HOME=/usr/lib/hive
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}}
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH com.hive.test.HiveConnect
I changed HADOOP_CORE={{ls $HADOOP_HOME/hadoop-*-core.jar}} to HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}} because there was no jar file in my HADOOP_HOME starting with hadoop- and ending with -core.jar. Is this correct? Also, running the script gives the following error:
/usr/lib/hadoop/client/hadoop*core*.jar}}: No such file or directory
I also modified the script to add the Hadoop client jars to the classpath, because the script threw an error that the Hadoop FileReader class was not found. So I added the following as well:
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
This executes the class file and runs the query "select * from test" but fails on "select column1 from test".
Still no success and the same error.
Since it runs fine in the Hive shell, can you check whether the Hive shell and the Java program (with JDBC) are run by the same user?
Next, start the Thrift server.
cd to the Hive installation directory and issue this command:
bin/hive --service hiveserver &
You should see:
Starting Hive Thrift Server
A quick way to make sure HiveServer is running is to use the netstat command to check whether port 10000 is open and listening for connections:
netstat -nl | grep 10000
tcp 0 0 :::10000 :::* LISTEN
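netstat only works on the server itself. From the client machine, a rough equivalent is to try opening a TCP connection to the port. The Java sketch below (hypothetical class name) returns true if something accepts a connection on host:port within the timeout. It does not verify that the listener is actually Hive.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: client-side reachability check for a TCP port such as 10000.
final class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }
}
```

For example, `PortCheck.isListening("cloud000", 10000, 2000)` run from the machine with the JDBC client.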
Next, create a file called myhivetest.sh with the following contents,
replacing HADOOP_HOME, HIVE_HOME and package.yourMainClass according to your requirements:
#!/bin/bash
HADOOP_HOME=/your/path/to/hadoop
HIVE_HOME=/your/path/to/hive
# Sample data with \x01 (Ctrl-A, Hive's default field delimiter) between fields
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
# Command substitution $(...) captures the path of the Hadoop core jar
HADOOP_CORE=$(ls "$HADOOP_HOME"/hadoop-*-core.jar)
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
# Append every jar in Hive's lib directory
for i in "${HIVE_HOME}"/lib/*.jar ; do
  CLASSPATH=$CLASSPATH:$i
done
java -cp "$CLASSPATH" package.yourMainClass
Save myhivetest.sh and run chmod +x myhivetest.sh. You can then run the script with ./myhivetest.sh, which builds your classpath before invoking your Hive program.
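If you would rather build the classpath from Java than from bash, here is a sketch of the same idea. It adds `.`, the Hive `conf` directory, then every `*.jar` in the given directories, joined with the platform path separator. The class name is hypothetical. To include the Hadoop core jar, or the Hadoop client jars added in the question, pass that directory in `jarDirs` as well. The order differs slightly from the script, which puts `$HADOOP_CORE` before the conf directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: "." + conf dir + every *.jar in jarDirs, like the script's loop.
final class ClasspathBuilder {
    static String build(Path confDir, List<Path> jarDirs) {
        List<String> entries = new ArrayList<>();
        entries.add(".");
        entries.add(confDir.toString());
        for (Path dir : jarDirs) {
            List<String> jars = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toString());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            Collections.sort(jars); // directory listing order is unspecified; sort for stability
            entries.addAll(jars);
        }
        return String.join(File.pathSeparator, entries);
    }
}
```

The resulting string can be passed as the `-cp` argument when launching the Hive client program, for example through `ProcessBuilder`.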
Please follow the instructions here for details.
There are two modes: embedded mode and standalone mode.
You should use standalone mode.
For your information:
Hive is not a full-featured query engine like a DBMS such as MySQL, Oracle or Teradata.
Hive has limitations on how complex your queries can be (for example, very complex joins).
Hive runs most queries as Hadoop MapReduce jobs.
Check this tutorial for which types of queries are supported and which are not.
Hope this helps.
I had the same issue and managed to resolve it.
This error appeared when I ran the Hive JDBC client against a Hadoop cluster with /user accounts set up.
In such an environment, the ability to run MapReduce jobs depends entirely on permissions.
Because the connection string was wrong, the MapReduce framework could not set up its staging directories and launch the job.
Please check your connection string (if this error appears in a Hadoop cluster setup).
If the connection string looks like this:
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"hive", "");
Change it to
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"user1", "");
where user1 is a configured user on the cluster setup.
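Here is a sketch of the same fix using `java.sql.DriverManager.getConnection(String, Properties)` with the standard `user` and `password` keys. The helper class is hypothetical. It rejects an empty user name because, as the `user=anonymous` error earlier in this thread suggests, an empty user may end up submitting jobs as `anonymous`, which cannot create staging directories under `/user`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper: connect as a named cluster user instead of an empty/anonymous one.
final class HiveConnect {
    static Properties credentials(String user, String password) {
        if (user == null || user.isEmpty()) {
            throw new IllegalArgumentException("use a user configured on the cluster, e.g. user1");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password == null ? "" : password);
        return props;
    }

    static Connection open(String url, String user, String password) throws SQLException {
        // Requires the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath.
        return DriverManager.getConnection(url, credentials(user, password));
    }
}
```

For example: `HiveConnect.open("jdbc:hive2://cluster.xyz.com:10000/default", "user1", "")`.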
I was having similar issues. I am trying to query Hive using Oracle SQL Developer (http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html) combined with a third-party JDBC driver as described here: https://blogs.oracle.com/datawarehousing/entry/oracle_sql_developer_data_modeler. Yes, I know that I could use Hue to do this but I interact with many other databases, including Oracle, and it is nice to have a rich client that I can save SQL queries and simple reports directly on my machine.
I am running the latest version of Cloudera CDH (5.4) on a cluster on AWS.
I was able to issue simple queries such as "SELECT * FROM SAMPLE_07" and receive a result, but running "SELECT COUNT(*) FROM SAMPLE_07" would throw a JDBC error. I was able to solve this by creating a user in Hue, and entering this user information in the Oracle SQL Developer connection information dialog. After doing this, I was able to run both queries.
What was confusing is that I was able to run a simple SELECT statement without an error; what I am used to is that either a) I can log into a system and run queries or b) I can't. It is strange that it "sort of" works without the correct user ID, but I guess it is one of those strange Hadoop things.

java.sql.SQLException: Failed to start database 'metastore_db' ERROR, while initializing database using hive

I installed Hadoop and Hive on a 3-node cluster. I am able to log in to Hive from the cluster node where Hive is running.
[root@NODE_3 hive]# hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.2.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201304020248_306369127.txt
hive> show tables;
OK
Time taken: 1.459 seconds
hive>
But when I try to run some Hive tests on my cluster nodes, I get the error below.
Here it is trying to initialize the database as user=ashsshar (my username):
13/04/02 02:32:44 INFO mapred.JobClient: Cleaning up the staging area hdfs://scaj-ns/user/ashsshar/.staging/job_201304020010_0080
13/04/02 02:32:44 ERROR security.UserGroupInformation: PriviledgedActionException as:ashsshar (auth:SIMPLE) cause:java.io.IOException: javax.jdo.JDOFatalDataStoreException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
java.io.IOException: javax.jdo.JDOFatalDataStoreException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
I have tried two things:
1. Granting permissions on /var/lib/hive/metastore/metastore_db
2. Removing the lock files: rm /var/lib/hive/metastore/metastore_db/*lck
But I still get the same error.
It seems to be an issue with creating the metastore. I solved it by creating a directory and pointing the metastore at that directory as follows:
Step 1: create a directory under /home, say hive-metastore-dir.
Step 2: as the superuser, edit hive-site.xml (in /usr/lib/hive/conf), changing
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
to
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/hive-metastore-dir/metastore/metastore_db;create=true</value>
Step 3: start the CLI with sudo hive and run your queries.
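Below is a small Java sketch, using a hypothetical helper, that produces the step 2 `ConnectionURL` value for a directory the current user can write to. It follows the `metastore/metastore_db` layout used above. It only builds the string and checks that the parent directory exists and is writable; it does not create the Derby database.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: builds the embedded-Derby javax.jdo.option.ConnectionURL value.
final class DerbyMetastoreUrl {
    static String forDirectory(Path baseDir) {
        // The metastore can only be created where the user running Hive can write.
        if (!Files.isDirectory(baseDir) || !Files.isWritable(baseDir)) {
            throw new IllegalArgumentException("directory must exist and be writable: " + baseDir);
        }
        // Same layout as the answer: <baseDir>/metastore/metastore_db
        Path db = baseDir.toAbsolutePath().resolve("metastore").resolve("metastore_db");
        return "jdbc:derby:;databaseName=" + db + ";create=true";
    }
}
```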
Try launching the Hive client from a directory where the user has write access. By default, Hive tries to create temporary directories both locally and in HDFS when a shell is opened.
Follow these steps if you are using CDH:
1. Copy /usr/lib/hive/conf/hive-site.xml into /usr/lib/spark/conf/.
This should solve the "metastore_db" error.
Thanks
