Hive on Tez doesn't work on Hue (Error: Client Version = unknown) after upgrading HDP 2.2 to 2.3 - hadoop

I have upgraded our Hadoop cluster from HortonWorks HDP 2.2 to 2.3 and made all the required changes to Hue (as described in the HortonWorks documentation), but the Hue client has the following issues when running Hive on Tez from the Hue browser, whereas Hive on Tez from the CLI works perfectly. Tez worked with Hue on HDP 2.2, so is there a problem between the Hue client and Tez on HDP 2.3?
Issue 1: After the upgrade from HDP 2.2 to 2.3, Tez still looks for the HDP 2.2 library files in both HDFS and the local filesystem.
HDP 2.2 location:
HDFS: /hdp/apps/2.2.9.0-3393
Local Files: /usr/hdp/2.2.9.0-3393
Temporary Solution to Issue 1: copied the 2.3 supporting files into the 2.2 locations.
HDFS:
hdfs dfs -cp /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz /hdp/apps/2.2.9.0-3393/tez/
Local Files:
cp /usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-0.14.0.2.2.9.0-3393.jar /usr/hdp/2.2.9.0-3393/hive/lib/
cp /usr/hdp/2.3.2.0-2950/hadoop/lib/jersey*.jar /usr/hdp/2.2.9.0-3393/hadoop/lib/
cp /usr/hdp/2.3.2.0-2950/hadoop-yarn/lib/jersey*.jar /usr/hdp/2.2.9.0-3393/hadoop-yarn/lib/
cp /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/lib/jersey*.jar /usr/hdp/2.2.9.0-3393/hadoop-mapreduce/lib/
Technically, Tez should be looking at the /usr/hdp/current directory, which points to 2.3.2.0-2950.
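For reference, a quick way to confirm what /usr/hdp/current actually resolves to after the upgrade (a rough sketch; hdp-select component names can vary slightly between HDP releases):
ls -l /usr/hdp/current/tez-client /usr/hdp/current/hive-client
hdp-select versions            # lists the HDP stack versions installed on the node
hdp-select status tez-client   # shows which stack version the tez-client link points to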
Issue 2: Running Hive on Tez through Hue gives the following error:
Error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
Through some research I found that when a Hive query does not require Tez execution, the Hue client version matches the AM version, whereas for any query that does need Tez execution the Hue client version is shown as Unknown.
Client version and AM version match when Tez execution is not required:
Created DAGAppMaster for application appattempt_1470224940790_0082_000001, versionInfo=[ component=tez-dag, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ] [INFO] [main] |app.DAGAppMaster|: Comparing client version with AM version, clientVersion=0.7.0.2.3.2.0-2950, AMVersion=0.7.0.2.3.2.0-2950
Client version and AM version don't match when Tez execution is required:
Created DAGAppMaster for application appattempt_1470224940790_0092_000001, versionInfo=[ component=tez-dag, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ]
Comparing client version with AM version, clientVersion=Unknown, AMVersion=0.7.0.2.3.2.0-2950
[ERROR] [main] |app.DAGAppMaster|: Incompatible versions found, clientVersion=Unknown, AMVersion=0.7.0.2.3.2.0-2950
Can anyone help me find a solution for the incompatible-version error when Tez is enabled through Hue on HDP 2.3?
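For anyone debugging the same thing: clientVersion=Unknown usually means the submitting side could not read a Tez version-info file at all, so it may be worth checking which tez-api jar HiveServer2/Hue ends up loading and what version string it carries. A rough sketch, assuming the standard HDP layout above and that the string lives in tez-api-version-info.properties inside the jar:
ls -l /usr/hdp/current/tez-client/tez-api-*.jar
unzip -p /usr/hdp/current/tez-client/tez-api-*.jar tez-api-version-info.properties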

Related

Hive tez query fails with java.io.IOException

When executing a long-running Hive Tez query, it occasionally fails with:
java.io.IOException: File hdfs://XXX with newer attempt ID 1 is smaller than the file hdfs://YYY with older attempt ID 0
In our 20-node HDP 3.1.5 cluster (Hive 3.1.0 and Tez 0.9.1), it fails roughly once in every 200 executions.
We were hitting HIVE-23354.
It seems to have no workaround; it is fixed in Hive 4.0.0.
I had the same issue with a query containing lots of big joins. Decreasing the size threshold for tables that are held in memory, namely hive.auto.convert.join.noconditionaltask.size (from 512 MB to 16 MB in my case), solved the problem for me.
Stack: HDP 3.1.4, Tez 0.9.1, Hive 3.1.0.
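As an illustration of that workaround (a minimal sketch; the JDBC URL and query file are placeholders, and 16777216 is 16 MB expressed in bytes), the threshold can be lowered per session:
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
        --hiveconf hive.auto.convert.join.noconditionaltask.size=16777216 \
        -f big_join_query.sql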

Hadoop 3.2.1 and HBase 2.2.3 incompatibility: ERROR in hadoop-functions.sh

I am running Hadoop 3.2.1 and HBase 2.2.3 on CentOS 8. I downloaded Hadoop from an Apache mirror following the Apache documentation precisely, and Hadoop works just fine. I then downloaded HBase from an Apache mirror and followed the Apache documentation precisely.
When I try to run hbase version I receive this error:
/usr/hdeco/hadoop/bin/../libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/usr/hdeco/hadoop/bin/../libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
HBase then goes on to print the version information. If I run hbase-daemon.sh start master, I receive the same error, but HMaster does not show up in jps.
As per the Apache HBase documentation, because I am running a Hadoop version greater than 3.0.0, I have deleted all the jar files in the hbase/lib directory that contain the word hadoop. I received the same error both before and after deleting these jars.
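For anyone reproducing this, the cleanup described above plus a quick check of whether the master actually started looks roughly like this (the HBase install path and log file pattern are assumptions; only the Hadoop path appears in the errors above):
rm /usr/hdeco/hbase/lib/*hadoop*.jar                    # drop HBase's bundled Hadoop jars
hbase-daemon.sh start master
jps | grep HMaster                                      # confirm the master process is running
tail -n 100 /usr/hdeco/hbase/logs/hbase-*-master-*.log  # the real startup failure is usually logged here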
This is my first time posting anywhere. If I have not included enough, or the right, information, please let me know what you would like me to provide.

java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook

I'm trying to build a cube on Kylin with Spark as engine type. The cluster contains the following tools:
OS image: 1.0-debian9
Apache Spark 2.4.4 (changed from 1.6.2)
Apache Hadoop 2.7.4
Apache Hive 1.2.1
I'm getting this error while building a cube:
java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V
at org.apache.hive.hcatalog.common.HiveClientCache.createShutdownHook(HiveClientCache.java:221)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:153)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:97)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:553)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:80)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I checked the Hive and Hadoop library jar directories for redundant jars and found two versions of every jar, for example hive-common-1.2.1.jar and hive-common.jar.
I tried moving either of them to a different location and resuming the cube build, but I got the same error. Any help on this would be greatly appreciated.
This is not a supported use case for Dataproc: if you need to use Spark 2.4.4, you should use Dataproc 1.4 or 1.5 instead of Dataproc 1.0, which comes with Spark 1.6.2.
Aside from this, the ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V method was added in Hive 2.3.0, but Spark uses a fork of Hive 1.2.1; that's why you need to use a Kylin version that supports Hive 1.2.1.
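One way to confirm which ShutdownHookManager overloads the job actually sees (a minimal sketch; the jar path is an assumption based on a typical Dataproc Hive layout):
javap -cp /usr/lib/hive/lib/hive-common-1.2.1.jar \
    org.apache.hive.common.util.ShutdownHookManager | grep addShutdownHook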
Regarding the duplicate jars: the version-less hive-common.jar is not a duplicate, it's a symbolic link to the versioned hive-common jar. You can verify this by listing it:
$ ls -al /usr/lib/hive/lib/hive-common.jar
lrwxrwxrwx 1 root root 21 Nov 9 09:20 /usr/lib/hive/lib/hive-common.jar -> hive-common-2.3.6.jar
I changed the Hive version to 2.1.0 and it worked for me. I decided to install this version of Hive after checking the Kylin download page and, in turn, looking at how other cloud platforms like AWS EMR and Microsoft Azure HDInsight package the Kylin 2.6.4 release.
Thanks, @Igor Dvorzhak, for your valuable suggestions.

Hadoop issue with Sqoop installation

I have Hadoop (pseudo-distributed mode), Hive, Sqoop and MySQL installed on my local machine.
But when I try to run Sqoop, it gives me the following error:
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.
Then I filled in the sqoop-env-template.sh file with all the required information. Below is a snapshot of the sqoop-env-template.sh file.
Even after providing the Hadoop and Hive paths, I face the same error.
I've installed
hadoop in /home/hduser/hadoop version 1.0.3
hive in /home/hduser/hive version 0.11.0
sqoop in /home/hduser/sqoop version 1.4.4
and mysql connector jar java-5.1.29
Could anybody please throw some light on what is going wrong?
sqoop-env-template.sh is a template, meaning it does not by itself get sourced by the configuration script. If you want Sqoop to pick up a custom configuration, make a copy of it as $SQOOP_HOME/conf/sqoop-env.sh.
Note: here is the relevant excerpt from bin/configure-sqoop for version 1.4.4:
SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}
if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
. "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi
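Concretely, using the install paths listed in the question, a minimal sqoop-env.sh could look something like this (which variables the template exposes can vary slightly by Sqoop version):
cp $SQOOP_HOME/conf/sqoop-env-template.sh $SQOOP_HOME/conf/sqoop-env.sh
# then edit $SQOOP_HOME/conf/sqoop-env.sh and set, for example:
#   export HADOOP_COMMON_HOME=/home/hduser/hadoop
#   export HADOOP_MAPRED_HOME=/home/hduser/hadoop
#   export HIVE_HOME=/home/hduser/hive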

run pig 0.7.0 error : ERROR 2998: Unhandled internal error

I have to connect Pig to a Hadoop cluster that is slightly modified from Hadoop 0.20.0. I chose Pig 0.7.0 and set PIG_CLASSPATH with
export PIG_CLASSPATH=$HADOOP_HOME/conf
When I run Pig, the following error is reported:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
So I copied hadoop-core.jar from $HADOOP_HOME over hadoop20.jar in $PIG_HOME/lib and ran "ant". Now I can run Pig, but when I use dump or store I get another error:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
at org.apache.pig.builtin.BinStorage.setStoreLocation(BinStorage.java:369)
...
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)
================================================================================
Has anyone encountered this error, or is my way of compiling wrong?
Thanks.
There is a section about this issue in the Pig FAQ which should give you a good idea of what's wrong. Here is the outline taken from that page:
This usually happens when you are connecting to a Hadoop cluster other than the standard Apache Hadoop 20.2 release. Pig bundles the standard Hadoop 20.2 jars in its release. If you want to connect to another version of Hadoop cluster, you need to replace the bundled Hadoop 20.2 jars with compatible jars. You can try the following steps (a command-level sketch follows the list):
do "ant"
copy hadoop jars from your hadoop installation to overwrite ivy/lib/Pig/hadoop-core-0.20.2.jar and ivy/lib/Pig/hadoop-test-0.20.2.jar
do "ant" again
cp pig.jar to overwrite pig-*-core.jar
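A rough translation of those steps into commands (a sketch only; the exact Hadoop core/test jar names depend on your modified 0.20 build, and pig-0.7.0-core.jar is the bundled jar name assumed here):
cd $PIG_HOME
ant
cp $HADOOP_HOME/hadoop-*core*.jar ivy/lib/Pig/hadoop-core-0.20.2.jar
cp $HADOOP_HOME/hadoop-*test*.jar ivy/lib/Pig/hadoop-test-0.20.2.jar
ant
cp pig.jar pig-0.7.0-core.jar   # overwrite the bundled pig-*-core.jar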
Some other tricks are also possible. You can use "bin/pig -secretDebugCmd" to inspect the command line of Pig. Make sure you are using the right version of Hadoop.
As pointed out in that FAQ section, if nothing works I would advise just upgrading to a recent version of Pig after 0.9.1; Pig 0.7 is a bit old.
The Pig (core) jar has a bundled Hadoop dependency, which may differ from the version you want to use. If you have an old Pig version (< 0.9) then you have the option to build a jar without Hadoop:
cd $PIG_HOME
ant jar-withouthadoop
cp $PIG_HOME/build/pig-x.x.x-dev-withouthadoop.jar $PIG_HOME
Then start Pig:
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:$PIG_HOME/pig-x.x.x-dev-withouthadoop.jar; ./pig
Newer Pig versions contain a prebuilt withouthadoop version (see this ticket), so you can skip the build process. Furthermore, when you run Pig it will pick up the withouthadoop jar from PIG_HOME rather than the bundled version, so you don't need to add withouthadoop.jar to PIG_CLASSPATH either (provided that you run Pig from $PIG_HOME/bin).
Back to your question:
Hadoop 0.20 and its modified variant (0.20-append?) can work even with the latest Pig distribution (0.11.1):
You just need to do the following:
unpack Pig 0.11.1
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf; ./pig
If you still get "Failed to create DataStorage", it's worth starting Pig with -secretDebugCmd as Charles Menguy suggested, so that you can see whether Pig picks up the right Hadoop version, etc.
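For reference, that debug invocation simply prints the full java command line (classpath included) that Pig would run:
cd $PIG_HOME/bin
./pig -secretDebugCmd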
Did you remember to run start-all.sh from /usr/local/bin? I ran into the same problem and I basically retraced my steps in configuring Hadoop itself. I am able to use Pig now.
