ClassNotFoundException: org.apache.spark.SparkConf with spark on hive - hadoop

I am trying to use Spark as the Hive execution engine, but I am getting the error below. Spark 1.5.0 is installed, and I am working with Hive 1.1.0 and Hadoop 2.7.0.
The hive_emp table was created as an ORC-format table in Hive.
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20150921072727_feba8363-258d-4d0b-8976-662e404bca88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:140)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org/apache/spark/SparkConf
I also set the Spark path and execution engine in the Hive shell.
hduser@ubuntu:~$ spark-shell
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.5.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> exit;
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.593 seconds
hive (Koushik)> set spark.home=/usr/local/src/spark;
I have also created a .hiverc, as shown below:
hduser@ubuntu:/usr/lib/hive/conf$ cat .hiverc
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
SET hive.execution.engine=spark;
The error details in DEBUG mode are given below:
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.625 seconds
hive (Koushik)> set hive --hiveconf hive.root.logger=DEBUG
> ;
hive (Koushik)> set hive.execution.engine=spark;
hive (Koushik)> desc hive_emp;
OK
col_name data_type comment
empid int
empnm varchar(50)
deptid int
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive (Koushik)> select * from hive_emp;
OK
hive_emp.empid hive_emp.empnm hive_emp.deptid
Time taken: 1.689 seconds
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20151015112525_c96a458b-34f8-42ac-ab11-52c32479a29a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.<init>(LocalHiveSparkClient.java:85)
at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.getInstance(LocalHiveSparkClient.java:69)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
hive (Koushik)>
I have executed the above insert twice, and it failed both times. Please find the hive.log generated today.

The reason for this error is that Hive is not able to find the Spark assembly jar.
Either export SPARK_HOME=/usr/local/src/spark or add the Spark assembly jar to the Hive lib folder; that will resolve the issue.
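A minimal sketch of both options, using the paths from this question (the assembly jar name must match your Spark build):
# option 1: point Hive at the Spark installation
export SPARK_HOME=/usr/local/src/spark
# option 2: put the Spark assembly jar on Hive's classpath
cp /usr/local/src/spark/lib/spark-assembly-1.5.0-hadoop2.6.0.jar /usr/lib/hive/lib/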

I too was facing the same issue, on my Ubuntu 14.04 VirtualBox. Here are the steps I followed to fix it:
hive> set spark.home=/usr/local/spark;
hive> set spark.master=local;
hive> SET hive.execution.engine=spark;
Then I added the spark-assembly jar file as shown below:
hive> ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;
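To make these settings survive across sessions, the same commands can go into .hiverc, like the one shown in the question (a sketch; the jar path is this answer's):
set spark.home=/usr/local/spark;
set spark.master=local;
SET hive.execution.engine=spark;
ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;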

Like you, I encountered the same problem when deploying Hive on Spark. After some research, I found that it was because Hive could not load the Spark jars, so I made the following changes to hive-env.sh.
Add in hive-env.sh:
# pay attention to your Spark path
export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=$SPARK_JARS
(Translated from Chinese:) This happens because Hive did not load Spark's jars at startup, so configuring the environment in hive-env.sh is enough. Pay attention to the paths; you can also skip the lzo jar at the bottom of my configuration and use the one above instead (it is identical, just without lzo). With lzo, only the last line differs:
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$SPARK_JARS

I was running into the same issue, and it was because Hive is not able to find the Spark files. There is a well-detailed set of steps if you are running Spark on YARN; I followed it for Spark 2.3 on YARN 3.0 with Hive 3.1:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
To run with YARN mode (yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib:
scala-library
spark-core
spark-network-common
Steps I used:
Got all the files for my version from /usr/hdp/current/spark2-client/jars.
Copied all of the above files from the Spark directory to /usr/hdp/current/hive-client.
Created symlinks in the Hive dir to those files, in the same dir, named without the version suffix and with just a .jar suffix (see the sketch after this list).
Restarted hiveserver2 to load the new files.
It worked, and now Hive can load the files and I can submit a Hive query as a Spark job on YARN. Note that the copied files keep a version suffix at the end, which is why the symlinks are needed.
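A sketch of the linking step, assuming the HDP-style paths above and example version numbers (adjust both to your install):
cd /usr/hdp/current/hive-client/lib
# the copied jars keep a version suffix; link version-free .jar names to them
ln -s scala-library-2.11.8.jar scala-library.jar
ln -s spark-core_2.11-2.3.0.jar spark-core.jar
ln -s spark-network-common_2.11-2.3.0.jar spark-network-common.jar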

Related

Hive remote postgres metastore

I was doing a multi-node setup using the Apache distribution. I was able to complete the Hadoop installation successfully (Hadoop 2.7.3).
When I tried Hive (Hive 2.3), it worked without issues with the default metastore (Derby). Then I changed hive-site.xml to point to my external Postgres DB.
I gave the host, username, and password as per the tutorial. But when I run the schema init, it fails as below, still showing Derby details, and initialization
fails. Has anybody faced the same issue?
bash-4.2$ /data/hive/bin/schematool -initSchema -dbType postgres --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.postgres.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.10.2.0 - (1582446))
Driver: Apache Derby Embedded JDBC Driver (version 10.10.2.0 - (1582446))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> !autocommit on
Autocommit status: true
0: jdbc:derby:> SET statement_timeout = 0
Error: Syntax error: Encountered "statement_timeout" at line 1, column 5. (state=42X01,code=30000)
Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:590)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:563)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Schema script failed, errorcode 2
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:980)
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:959)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:586)
... 8 more
*** schemaTool failed ***
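Note that the log still prints the Derby connection URL, which means the Postgres settings in hive-site.xml were never picked up. For reference, a minimal sketch of the properties that need to take effect (host, database name, and credentials are placeholders):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://myhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>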

Unable to read HiveServer2 configs from ZooKeeper

I use HDP 3.1, and I used Ambari to deploy the Hadoop cluster and Hive. After deployment, I can run hive in the shell successfully. I then deployed Apache Kylin 2.6, and it can sync Hive tables. But when I build the cube, I get the following error:
java.io.IOException: OS command error exit with return code: 1, error message: SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 10:04:53 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
19/02/15 10:04:53 [main]: WARN jdbc.HiveConnection: Failed to connect to datacenter3:10000
19/02/15 10:04:53 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify dfs.replication at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Cannot run commands specified using -e. No current connection
The command is:
hive -e "USE default;
I ran the hive command in the shell and it succeeded. The connection string is the same as the one used when building the cube in Kylin. I'm confused about why it succeeds in the shell but fails when building the cube.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 12:10:19 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://datacenter1:2181,datacenter2:>
You can try to add these two properties to hive-site.xml.
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist</name>
  <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
Finally, I found the root cause. There is a 'Cannot modify dfs.replication at runtime.' message in the error log. Kylin sets this property in $KYLIN_HOME/conf/kylin_hive_conf.xml, and when it runs the hive command it automatically appends the properties in that file. The final command looks like: hive --hiveconf dfs.replication=2 ..........
It looks like the dfs.replication property can't be appended to the hive command. I removed this property from kylin_hive_conf.xml, and it works now.
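For reference, the kind of entry to remove from $KYLIN_HOME/conf/kylin_hive_conf.xml would look like this (a sketch; the value 2 matches the appended command above):
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>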

Failed to initialize schema for HiveServer2 in Apache Hive 3.0.0 on Cygwin (Windows 10)

I already had a Hadoop 3.0.0 cluster consisting of 2 machines: 1 namenode + RM and 1 datanode. I tried to install Apache Hive 3.0.0 by following this document.
When I ran schematool -dbType derby -initSchema --verbose on Cygwin, an exception was thrown:
$ schematool -dbType derby -initSchema --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/BigSol/apache-hive-3.0.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/BigSol/hadoop-3.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 3.0.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 3.0.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 3.0.0
at org.apache.hadoop.hive.metastore.MetaStoreSchemaInfo.generateInitFileName(MetaStoreSchemaInfo.java:137)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:580)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:562)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1445)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
*** schemaTool failed ***
When viewing the line of code that threw the exception, I found that Hive tried to find a SQL schema located at %HIVE_HOME%\scripts\metastore\upgrade\derby\hive-schema-3.0.0.derby.sql.
I suspect that Cygwin mangled the path so that Hive didn't find that schema.
My questions:
How can I correct the path (or fix the problem)?
Are there batch files equivalent to the *.sh files in the %HIVE_HOME%\bin directory, as Hive 2.1.1 had?
I found the solution. After running schematool on a Linux machine and copying the metastore_db directory to the Windows machine, I managed to start HiveServer2, but the Beeline CLI said that the jar C:\cygdrive\c\BigSol\apache-hive-3.0.0-bin\lib\hive-beeline-3.1.0.jar was not found.
It turned out that Java under Cygwin parsed the wrong path. I made a symbolic link from C:\cygdrive\c to C:\ and it worked.
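A sketch of that link, run from an elevated Command Prompt (the exact command is my assumption; the fix above only describes the link itself):
mkdir C:\cygdrive
mklink /D C:\cygdrive\c C:\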

Unable to start Hive CLI Hadoop(MapR)

I am trying to access the Hive CLI. However, it fails to start with the following AccessControl issue.
Strangely enough, I am able to query Hive data from Hue without the AccessControl issue. However, the Hive CLI is not working.
I am on a MapR cluster.
Any help is much appreciated.
[<user_name>@<edge_node> ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-2.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/opt/mapr/hive/hive-2.1/conf/hive-log4j2.properties Async: true
2017-09-23 23:52:08,988 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-api-jdo-4.2.4.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-api-jdo-4.2.1.jar."
2017-09-23 23:52:08,993 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-core-4.1.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-core-4.1.6.jar."
2017-09-23 23:52:09,004 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-rdbms-4.1.19.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-rdbms-4.1.7.jar."
2017-09-23 23:52:09,038 INFO [main] DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
2017-09-23 23:52:09,039 INFO [main] DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2017-09-23 23:52:14,2251 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2172 Thread: 20235 mkdirs failed for /user/<user_name>, error 13
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User <user_name>(user id 50005586) has been denied access to create <user_name>
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:617)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:646)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.security.AccessControlException: User <user_name>(user id 50005586) has been denied access to create <user_name>
at com.mapr.fs.MapRFileSystem.makeDir(MapRFileSystem.java:1256)
at com.mapr.fs.MapRFileSystem.mkdirs(MapRFileSystem.java:1276)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1913)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.getDefaultDestDir(DagUtils.java:823)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.getHiveJarDirectory(DagUtils.java:917)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createJarLocalResource(TezSessionState.java:616)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:256)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.beginOpen(TezSessionState.java:220)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:614)
... 10 more
The error is saying you're denied access to create a directory in the file system. This is likely /user/<user_name>, which will need to be created by the HDFS / MapR FS super user.
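A sketch of what the super user would run (the placeholder username matches the redacted log):
hadoop fs -mkdir -p /user/<user_name>
hadoop fs -chown <user_name> /user/<user_name>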
I am able to query hive data from Hue without the AccessControl
Hue communicates via Thrift and HiveServer2.
Hive CLI bypasses HiveServer2 and is deprecated.
You should use Beeline instead.
beeline -n $(whoami) -u jdbc:hive2://hiveserver:10000/default
And if you're in a kerberized cluster, then you'll need some extra options there.
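For example, on a Kerberos-secured cluster the JDBC URL typically carries the HiveServer2 principal (a sketch; the realm and principal are placeholders):
beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM"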

Hive 0.9, Hbase 0.98.5 and Hadoop 1.2.1

I have a single node Hadoop system. Installed version is 1.2.1. I have installed Hbase 0.98.5 and then I have installed Hive 0.9.
All the processes are running on my node.
Process details:
[root@localhost /]# jps
24396 SecondaryNameNode
24152 NameNode
23954 Jps
24274 DataNode
24488 JobTracker
24607 TaskTracker
1282 HQuorumPeer
2429 HMaster
2589 HRegionServer
From HBase shell, I am able to retrieve my table:
--------Hbase Shell-----------------
hbase(main):001:0> scan 'nancy'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.5-hadoop1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
ROW COLUMN+CELL
2 column=cf:name, timestamp=1410461757734, value=Test
1 row(s) in 0.2220 seconds
I am also able to retrieve table list from Hive:
----------Hive Shell----------------------------------
hive> SHOW TABLES;
OK
pokes
Time taken: 3.195 seconds
I am able to create, populate tables in HBase and also in Hive. But I am unable to integrate Hive with Hbase.
When I am trying to register a table in Hive, I am getting the following error message:
hive> CREATE EXTERNAL TABLE hbase_table_2(key int,name string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:name")
TBLPROPERTIES("hbase.table.name"="nancy");
FAILED: Error in metadata: java.lang.IllegalArgumentException: Not a host:port pair: PBUF localhost��ɞ�ކ)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
After following various websites on how to resolve the error, I have also made the following changes:
I moved hbase-client*.jar, hbase-server*.jar, hbase-common*.jar, hbase-protocol*.jar, zookeeper*.jar, and guava*.jar from $HBASE_HOME/lib to the $HIVE_HOME/lib directory.
I copied the HBase configuration files from $HBASE_HOME/conf and the Hadoop configuration files from $HADOOP_HOME/conf to $HIVE_HOME/conf.
I copied hive-hbase-handler-0.9.0.jar and hive-common-0.9.0.jar from $HIVE_HOME/lib to $HADOOP_HOME/lib and $HBASE_HOME/lib.
Kindly give me some information on how to resolve the issue.
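For reference, the classic way (per the Hive HBase integration docs) to start the Hive CLI with the handler and HBase client jars on the aux path looks like the sketch below; the jar versions here are examples and must match your installation:
hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-common-0.98.5-hadoop1.jar,$HIVE_HOME/lib/zookeeper-3.4.6.jar,$HIVE_HOME/lib/guava-12.0.1.jar -hiveconf hbase.zookeeper.quorum=localhost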
