I am connecting to a hive installation using a JDBC client code. I have created a test table with two columns(column1, column2) both string type. When i try executing simple queries like "select* from test" i get result in java program but queries with where clauses and other complex queries throw the following exception.
"Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask"
I have tried changing permissions of hdfs directories where file is present, /tmp on local directory but this didn't work.
This is my connection code
Connection con = DriverManager.getConnection("jdbc:hive://"+host+":"+port+"/default", "", "");
Statement stmt = con.createStatement();
Error is thrown at executeQuery() method
Checking the logs on server gives the following exception:
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:76)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:478)
at org.apache.hadoop.mapred.JobClient.(JobClient.java:457)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
The queries work when run on a command prompt but not in JDBC client.
I am stuck on this. Any suggestions would be helpful.
UPDATE
I am using cloudera CDH4 hadoop/hive distribution. The script that i ran is as follows
#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop/client
HIVE_HOME=/usr/lib/hive
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}}
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH com.hive.test.HiveConnect
I had change HADOOP_CORE={{ls $HADOOP_HOME/hadoop-*-core.jar}} to HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}} as there was no jar file in my hadoop_home starting with hadoop- and ending with -core.jar. Is this correct? Also running the script gives the following error
/usr/lib/hadoop/client/hadoop*core*.jar}}: No such file or directory
Also i have modified the script to add hadoop client jars to classpath as the script threw the error that hadoop fileReader not found. So i added the following as well.
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
This executes the class file and runs the query "select * from test" but fails on "select column1 from test".
Still no success and the same error.
Since, it is running fine with the hive shell, can you check if the user with which you are running the hive shell and the java program (with JDBC) are the same?
Next, Starting the Thrift Server
cd to where hive is -
Issue this commands -
bin/hive --service hiveserver &
you should see -
Starting Hive Thrift Server
A quick way to ensure the HiveServer is running is to use the netstat command to determine if port 10,000 is open and listening for connections:
netstat -nl | grep 10000
tcp 0 0 :::10000 :::* LISTEN
Next, create a file called myhivetest.sh and put the follwing inside
and replace HADOOP_HOME, HIVE_HOME and package.youMainClass according to your requirements-
#!/bin/bash
HADOOP_HOME=/your/path/to/hadoop
HIVE_HOME=/your/path/to/hive
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
HADOOP_CORE={{ls $HADOOP_HOME/hadoop-*-core.jar}}
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH package.youMainClass
Save the myhivetest.sh and do a chmod +x myhivetest.sh. You can run the bash script using ./myhivetest.sh, which will build your classpath before invoking your hive program.
Please follow the instruction here for details.
There are two ways embedded mode and standalone mode.
You should look for the standalone mode.
For your information:
Hive is not a extensive query engine akin to the DBMS like MySQL, Oracle and Teradata etc.
Hive has got limitations on the extent of complex queries you can make, like very complex joins etc.
Hive runs Hadoop MapReduce jobs when you do a query.
Check this tutorial for what type of queries are supported and which are not.
Hope this helps.
I had the same issue. I have managed to resolve the issue.
This error popped up when I was running the hive jdbc client on a hadoop cluster with /user accounts set up.
With such a environment set up, the ability to run map-reduce jobs were all based on permissions.
With the connection string being wrong, the map-reduce framework was not able to set up staging directories and trigger off the job.
Please look at your connection string [if this error is popping up in a hadoop-cluster setup].
If the connection string looks this way
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"hive", "");
Change it to
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"user1", "");
where user1 is a configured user on the cluster setup.
I was having similar issues. I am trying to query Hive using Oracle SQL Developer (http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html) combined with a third-party JDBC driver as described here: https://blogs.oracle.com/datawarehousing/entry/oracle_sql_developer_data_modeler. Yes, I know that I could use Hue to do this but I interact with many other databases, including Oracle, and it is nice to have a rich client that I can save SQL queries and simple reports directly on my machine.
I am running the latest version of Cloudera CDH (5.4) on a cluster on AWS.
I was able to issue simple queries such as "SELECT * FROM SAMPLE_07" and receive a result, but running "SELECT COUNT(*) FROM SAMPLE_07" would throw a JDBC error. I was able to solve this by creating a user in Hue, and entering this user information in the Oracle SQL Developer connection information dialog. After doing this, I was able to run both queries.
What was confusing about this is that I was able to run a simple SELECT statement and received no error -- what I am used to is either a) I can log into a system to run queries or b) I can't. Strange that it "sort of" works without the correct user ID but I guess one of those strange Hadoop things.
Related
I installed Apache Kylin, following the official installation guide http://kylin.apache.org/docs/install/index.html, in HDP sandbox 2.6
When I run the script, $KYLIN_HOME/bin/kylin.sh start, I got the error below:
What can I do to fix this error?
Thanks in advance
Check if Hive service is up in your ambari, when Hive service is down Kylin cannot find it and gives the error. Check for .bash_profile as well. When those two issues are addressed kylin should be able to find location of hive dependency.
Kylin uses the find-hive-dependency.sh script to setup the CLASSPATH. This script uses a Hive CLI command (I test it with beeline) to query Hive env vars and extract the CLASSPATH from them.
beeline connect to Hive using the properties at kylin_hive_conf.xml but for some reason (probably due to the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is stablished.
The Hive properties that causes the issue can be discarded for connecting to Hive to query the CLASSPATH, so, to fix this issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to line 34 aprox and modify the line
hive_env=${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'
Just remove ${hive_conf_properties}
Check Hive depedencies have been configured by running the command find-hive-dependency.sh.
Now $KYLIN_HOME/bin/kylin.sh start should works.
I am stuck at point as , how to use pyspark to fetch data from hive server using jdbc.
I am Trying to connect to HiveServer2 running on my local machine from pyspark using jdbc. All components HDFS,pyspark,HiveServer2 are on same machine.
Following is the code i am using to connect :
connProps={ "username" : 'hive',"password" : '',"driver" : "org.apache.hive.jdbc.HiveDriver"}
sqlContext.read.jdbc(url='jdbc:hive2://127.0.0.1:10000/default',table='pokes',properties=connProps)
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:hive://localhost:10000/default").option("driver", "org.apache.hive.jdbc.HiveDriver").option("dbtable", "pokes").option("user", "hive").option("password", "").load()
both methods used above are giving me same error as below:
org.apache.spark.sql.AnalysisException: java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection
to the given database. JDBC url =
jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
Terminating connection pool (set lazyInit to true if you expect to
start your database after your app).
ERROR XSDB6: Another instance of Derby may have already booted the database /home///jupyter-notebooks/metastore_db
metastore_db is located at same directory where my jupyter notebooks are created. but hive-site.xml is having different metastore location.
I have already checked reffering to other questions about same error saying other spark-shell or such process is running,but its not. Even if i try following command when HiveServer2 and HDFS are down i am getting same error
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
I am able to connect to hives using java program using jdbc. Am I missing something here? Please help.Thanks in advance.
Spark should not use JDBC to connect to Hive.
It reads from the metastore, and skips HiveServer2
However, Another instance of Derby may have already booted the database means that you're running Spark from another session, such as another Jupyter kernel that's still running. Try setting a different metastore location, or work on setting up a remote Hive metastore using a local Mysql or Postgres database and edit $SPARK_HOME/conf/hive-site.xml with that information.
From SparkSQL - Hive tables
spark = SparkSession \
.builder \
.appName("Python Spark SQL Hive integration example") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.enableHiveSupport() \
.getOrCreate()
# spark is an existing SparkSession
spark.sql("CREATE TABLE...")
I tried restarting my system, checked whether there is enough space or not and also made sure my hive server2 is running. But I'm getting these errors when given '$hive' in Cloudera.
Logging initialized using configuration in
file:/etc/hive/conf.dist/hive-log4j.properties
WARN: The method class
org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The process of starting Hive2 is changed, as Hive got deprecated. Usage of Beeline is recommended.
Beeline was developed specifically to interact with the new server. Unlike Hive CLI, which is an Apache Thrift-based client, Beeline is a JDBC client based on the SQLLine CLI — although the JDBC driver used communicates with HiveServer2 using HiveServer2’s Thrift APIs.
As Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), users and developers accordingly need to switch to the new client tool. However, there’s more to this process than simply switching the executable name from “hive” to “beeline”.
More information provided over here
Use the below command to enter into interactive mode. Beeline supports same commands that Hive server does. You can execute same script in Beeline without any modifications.
beeline -u jdbc:hive2://
To start the Hive metastore,
sudo service hive-metastore start
I am working on Hive-jdbc connection in HDP 2.1
Code is working fine for queries where mapreduce is not involved like "select * from tabblename". The same code is showing error when the query is modified with a 'where' clause or if we specify columnnames(which will run mapreduce in the the background).
I have verified the correctness of the query by executing it in HiveCLI.
Also I have verified the read/write permissions for the table for the user through which I am running the java-jdbc code.
The error is as follows
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:355)
at com.testing.poc.hivejava.HiveJDBCTest.main(HiveJDBCTest.java:25)
Today I also got this exception when I submit a hive task from java.
The following error:
org.apache.hive.jdbc.HiveDriverorg.apache.hive.jdbc.HiveDriverhive_driver:
org.apache.hive.jdbc.HiveDriverhive_url:jdbc:hive2://10.174.242.28:10000/defaultget
connection sessucess获取hive连接成功!
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I tried to use the sql execute in hive and it works well. Then I saw the log in /var/log/hive/hadoop-cmf-hive-HIVESERVER2-cloud000.log.out then I found the reason of this error. The following error:
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution
I used the following command :
sudo -u hdfs hadoop fs -chmod -R 777 /
This solved the error!
hive_driver:org.apache.hive.jdbc.HiveDriver
hive_url:jdbc:hive2://cloud000:10000/default
get connection sessucess
获取hive连接成功!
Heart beat
执行insert成功!
If you use beeline to execute the same queries, do you see the same behaviour as you get while running your test program?
The beeline client also uses the open source JDBC driver and connects to Hive server, which is similar to what you do in your program. HiveCLI on the other hand has Hive embedded in it and does not connect to a remote Hive server by default. You can use HiveCLI to connect to a remote Hive Server 1 but I don't believe you can use it to connect to Hive Server2 (use beeline for Hive Server 2).
For this error, you can take a look at the hive.log and hiveserver2.log on the server side to get more insight into what might have caused the MapReduce error.
Hope this helps.
Cheers,
Holman
I shutdown my HDFS client while HDFS and hive instances were running. Now when I relogged into Hive, I can't execute any of my DDL Tasks e.g. "show tables" or "describe tablename" etc. It is giving me the error as below
ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Can anybody suggest what do I need to do to get my metastore_db instantiated without recreating the tables? Otherwise, I have to duplicate the effort of creating the entire database/schema once again.
I have resolved the problem. These are the steps I followed:
Go to $HIVE_HOME/bin/metastore_db
Copied the db.lck to db.lck1 and dbex.lck to dbex.lck1
Deleted the lock entries from db.lck and dbex.lck
Log out from hive shell as well as from all running instances of HDFS
Re-login to HDFS and hive shell. If you run DDL commands, it may again give you the "Could not instantiate HiveMetaStoreClient error"
Now copy back the db.lck1 to db.lck and dbex.lck1 to dbex.lck
Log out from all hive shell and HDFS instances
Relogin and you should see your old tables
Note: Step 5 may seem a little weird because even after deleting the lock entry, it will still give the HiveMetaStoreClient error but it worked for me.
Advantage: You don't have to duplicate the effort of re-creating the entire database.
Hope this helps somebody facing the same error. Please vote if you find useful. Thanks ahead
I was told that generally we get this exception if we the hive console not terminated properly.
The fix:
Run the jps command, look for "RunJar" process and kill it using
kill -9 command
See: getting error in hive
Have you copied the jar containing the JDBC driver for your metadata db into Hive's lib dir?
For instance, if you're using MySQL to hold your metadata db, you wll need to copy
mysql-connector-java-5.1.22-bin.jar into $HIVE_HOME/lib.
This fixed that same error for me.
I faced the same issue and resolved it by starting the metastore service. Sometimes service might get stopped if your machine is re-booted or went down. You could start the service by running the command:
Login as $HIVE_USER
nohup hive --service metastore>$HIVE_LOG_DIR/hive.out 2>$HIVE_LOG_DIR/hive.log &
I had a similar problem with hive server and followed the below steps:
1. Go to $HIVE_HOME/bin/metastore_db
2. Copied the db.lck to db.lck1 and dbex.lck to dbex.lck1
3. Deleted the lock entries from db.lck and dbex.lck
4. Relogin from hive shell. It is working
Thanks
For instance, I use MySQL to hold metadata db, I copied
mysql-connector-java-5.1.22-bin.jar into $HIVE_HOME/lib folder
My error resolved
I also was facing the same problem, and figured out that I had both hive-deafult.xml and hive-site.xml(created manually by me),
I moved my hive-site.xml to hive-site.xml-template(as I was not needed this file) then
started hive, worked fine.
Cheers,
Ajmal
I have faced this issue and in my case it was while running hive command from command line.
I resolved this issue by running kinit command as I was using kerberized hive.
kinit -kt <your keytab file location> <kerberos principal>