I have a setup on an EC2 instance that uses Whirr to spin up new Hadoop instances. I have been trying to get Hive to work with this setup. Hive should be configured to use MySQL as the local metastore. The issue I am having is that every time I try to run a query like CREATE TABLE testers (foo INT, bark STRING); via the Hive interface, it just hangs and does not appear to be doing anything.
Any help would be appreciated.
I would first get the debug output from the Hive command line to see where it is hanging. Run the Hive shell with this parameter, and then paste the output of your command:
hive -hiveconf hive.root.logger=DEBUG,console
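If the debug output points at the metastore, it is also worth double-checking the MySQL settings in hive-site.xml. A minimal sketch, assuming a MySQL server on localhost with a database named metastore and credentials hiveuser/hivepass (all of these names are illustrative):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- illustrative host and database name -->
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value> <!-- illustrative credentials -->
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>

The MySQL connector jar also needs to be on Hive's classpath (typically dropped into $HIVE_HOME/lib); a DDL statement that hangs silently can simply be Hive waiting on a metastore connection that will never succeed.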
I am trying to set up Hive on my local machine. I started all Hadoop processes and set up the {hive}/bin path. At the command prompt I can run Hive commands, and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of hive-site.xml, how does Hive get information regarding the metastore and other configuration?
If you're running Hive queries from your local machine, which has Hadoop installed, hive-site.xml is not needed, as you are talking directly to hive/bin in the Hive installation directory. You don't need to tell Hive where to find Hive; in the absence of hive-site.xml, Hive falls back to its built-in defaults, which include an embedded Derby metastore created in the working directory.
If you wanted to run Hive commands from another machine, but interacting with Hive on your local machine, you'd need hive-site.xml.
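For the remote case, a minimal sketch of the relevant hive-site.xml entry, assuming the metastore service runs on a host named hive-host on the default port 9083 (the hostname is illustrative):

<property>
  <name>hive.metastore.uris</name>
  <!-- illustrative hostname; 9083 is the default metastore port -->
  <value>thrift://hive-host:9083</value>
</property>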
I have manually installed Hadoop and Hive on my Ubuntu 16.04 laptop. Hive was working fine, and I created a few test databases (Derby metastore).
After restarting the laptop, I found that Hive would start, but running any command like show databases gave an error.
I followed the solutions given on the web, i.e.:
1) rename metastore_db to metastore_db.tmp
2) run schematool (e.g. schematool -initSchema -dbType derby) to generate a new metastore_db
3) remove metastore_db.tmp (not removing it gives an error when you run Hive)
Now I am able to run Hive, but on running show databases I see only the default database.
Is there any way to add the databases I created previously (for example /user/hive/warehouse/computersalesdb.db, saved in the HDFS filesystem) to the newly generated metastore?
* UPDATE *
On further analysis I found that the metastore_db folder is created in whichever directory I run Hive from, so this seems to be the cause of the problem. The solution is:
1) As advised in the comment by @cricket_007, keep the metastore in MySQL or whatever other RDBMS you are using.
2) Always run Hive from the same folder.
3) Set the property javax.jdo.option.ConnectionURL in hive-site.xml to create the metastore in a specific folder, for example:
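A sketch of that property for the default Derby metastore, assuming an absolute path of /home/hduser/metastore_db (the path is illustrative):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- illustrative absolute path; keeps the metastore independent of the working directory -->
  <value>jdbc:derby:;databaseName=/home/hduser/metastore_db;create=true</value>
</property>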
Leaving this comment for the benefit of other newbies like me :D
I am connecting to a Hive installation using JDBC client code. I have created a test table with two columns (column1, column2), both of string type. When I try executing simple queries like "select * from test" I get results in the Java program, but queries with WHERE clauses and other more complex queries throw the following exception:
"Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask"
I have tried changing permissions of the HDFS directories where the file is present, and of /tmp on the local filesystem, but this didn't work.
This is my connection code:
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); // register the HiveServer1 JDBC driver
Connection con = DriverManager.getConnection("jdbc:hive://"+host+":"+port+"/default", "", "");
Statement stmt = con.createStatement();
The error is thrown at the executeQuery() method.
Checking the logs on the server gives the following exception:
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:478)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
The queries work when run from the command prompt, but not from the JDBC client.
I am stuck on this. Any suggestions would be helpful.
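For reference, the property named in the error normally lives in mapred-site.xml on the client side. A minimal sketch of what I verified there, assuming the cluster runs YARN (this is an assumption; MR1 clusters use classic instead):

<property>
  <name>mapreduce.framework.name</name>
  <!-- assumption: a YARN cluster; valid values are local, classic, and yarn -->
  <value>yarn</value>
</property>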
UPDATE
I am using the Cloudera CDH4 Hadoop/Hive distribution. The script that I ran is as follows:
#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop/client
HIVE_HOME=/usr/lib/hive
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}}
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH com.hive.test.HiveConnect
I had to change HADOOP_CORE={{ls $HADOOP_HOME/hadoop-*-core.jar}} to HADOOP_CORE={{ls $HADOOP_HOME/hadoop*core*.jar}}, as there was no jar file in my HADOOP_HOME starting with hadoop- and ending with -core.jar. Is this correct? Also, running the script gives the following error:
/usr/lib/hadoop/client/hadoop*core*.jar}}: No such file or directory
I have also modified the script to add the Hadoop client jars to the classpath, as the script threw an error that the Hadoop FileReader was not found. So I added the following as well:
for i in ${HADOOP_HOME}/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
This executes the class file and runs the query "select * from test", but fails on "select column1 from test".
Still no success, and the same error.
Since it runs fine from the Hive shell, can you check whether the user you are running the Hive shell as and the one running the Java (JDBC) program are the same?
Next, start the Thrift server.
cd to where Hive is installed and issue this command:
bin/hive --service hiveserver &
You should see:
Starting Hive Thrift Server
A quick way to ensure the HiveServer is running is to use the netstat command to determine whether port 10000 is open and listening for connections:
netstat -nl | grep 10000
tcp 0 0 :::10000 :::* LISTEN
Next, create a file called myhivetest.sh, put the following inside, and replace HADOOP_HOME, HIVE_HOME, and package.yourMainClass according to your requirements:
#!/bin/bash
HADOOP_HOME=/your/path/to/hadoop
HIVE_HOME=/your/path/to/hive
echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt
HADOOP_CORE=$(ls $HADOOP_HOME/hadoop-*-core.jar)  # command substitution to locate the Hadoop core jar
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH package.yourMainClass
Save myhivetest.sh and do a chmod +x myhivetest.sh. You can run the bash script using ./myhivetest.sh, which will build your classpath before invoking your Hive program.
Please follow the instructions here for details.
There are two modes: embedded and standalone.
You should look for the standalone mode.
For your information:
Hive is not an extensive query engine akin to DBMSs like MySQL, Oracle, and Teradata.
Hive has limitations on the extent of complex queries you can run, such as very complex joins.
Hive runs Hadoop MapReduce jobs when you execute a query.
Check this tutorial for what types of queries are supported and which are not.
Hope this helps.
I had the same issue and managed to resolve it.
This error popped up when I was running the Hive JDBC client on a Hadoop cluster with /user accounts set up.
In such an environment, the ability to run MapReduce jobs is based entirely on permissions.
Because the connection string was wrong, the MapReduce framework was not able to set up its staging directories and launch the job.
Please look at your connection string [if this error is popping up in a Hadoop cluster setup].
If the connection string looks like this:
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"hive", "");
Change it to
Connection con = DriverManager
.getConnection(
"jdbc:hive2://cluster.xyz.com:10000/default",
"user1", "");
where user1 is a user configured on the cluster.
I was having similar issues. I am trying to query Hive using Oracle SQL Developer (http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html) combined with a third-party JDBC driver, as described here: https://blogs.oracle.com/datawarehousing/entry/oracle_sql_developer_data_modeler. Yes, I know I could use Hue to do this, but I interact with many other databases, including Oracle, and it is nice to have a rich client in which I can save SQL queries and simple reports directly on my machine.
I am running the latest version of Cloudera CDH (5.4) on a cluster on AWS.
I was able to issue simple queries such as "SELECT * FROM SAMPLE_07" and receive a result, but running "SELECT COUNT(*) FROM SAMPLE_07" would throw a JDBC error. I was able to solve this by creating a user in Hue and entering that user's information in the Oracle SQL Developer connection dialog. After doing this, I was able to run both queries.
What was confusing about this is that I was able to run a simple SELECT statement and receive no error; what I am used to is either a) I can log into a system to run queries, or b) I can't. It is strange that it "sort of" works without the correct user ID, but I guess that is one of those strange Hadoop things.
I've just installed Hadoop on Windows using Cygwin, which works fine, and now I am installing Hive. I am running it as:
bin/hive -hiveconf java.io.tmpdir=/cygdrive/c/cygwin/tmp
OR
bin/hive -hiveconf java.io.tmpdir=/tmp
(both give the same problem), as I have found there is a bug with the Windows naming convention (https://issues.apache.org/jira/browse/HIVE-2388...).
When I run the above command, Hive seems to load fine, but when I enter "show tables;" I get no response. This is the same for all queries: CREATE TABLE etc. get no response either.
It's the same problem as this one:
http://mail-archives.apache.org/mod_mbox...
Any ideas?
I resolved a similar issue and successfully ran Hive after starting all of the Hadoop daemons:
NameNode
DataNode
JobTracker
TaskTracker
Then run queries from files using hive -f <filename>, instead of writing queries directly at the Hive command prompt. Alternatively, you can use bin/hive -e 'SHOW TABLES'
I had created a couple of tables in Hive and ran a few queries on them. Then I exited Hive and shut down Hadoop MapReduce and DFS. I came back the next day only to find that the tables had gone missing!
My Hive uses a local metastore. After a lot of searching, I saw only one similar issue posted by someone. The answer suggested that if a local metastore is used, Hive should be started from the same location every time. And I had done exactly that: I ran Hive from the master only, and never even logged into a slave. The metastore folder is still there. So what could have gone wrong? I checked the Hadoop datanode logs and the Hive metastore logs, but found nothing. Where can I find out what went wrong? Please help me with this. Also, what can be done to avoid such things?
If you use a local metastore, Hive creates metastore_db in the directory from which hiveserver2 is started. So if you start hiveserver2 from a different directory the next time, a new metastore_db will be created at that location, and this metastore_db will not have the metadata about your earlier tables.
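One way to avoid this is to pin the metastore to an absolute path in hive-site.xml instead of relying on the working directory. A sketch for the default Derby metastore, assuming a path of /var/lib/hive/metastore_db (the path is illustrative):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- illustrative absolute path -->
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true</value>
</property>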
Were you using a database the first day? Were you using it the second day?
Meaning:
hive> show databases;
OK
default
test
Time taken: 1.575 seconds
hive> use test;
hive> show tables;
OK
blah
Time taken: 0.141 seconds
hive> select * from blah;
If you forgot to use a database, or to create one, things could get messy.
Also, what does the following command return?
sudo -u hdfs hadoop fs -ls -R \