select query errored out in Hive - hadoop

I am using Hadoop 1.0.4 and Hive 1.2.1.
I am facing an issue with a select query in the Hive CLI; a snippet of the error log is attached. Please help me resolve the issue.

Thanks, Nirmal. It was resolved after upgrading the Hadoop version to 2.6.0.
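For anyone hitting the same mismatch, a quick way to confirm which versions Hive will actually run against (a minimal sketch, assuming both binaries are on the PATH):

# Confirm the versions in use; the error above went away after moving Hadoop to 2.6.0
hadoop version | head -n 1   # should print "Hadoop 2.6.0" after the upgrade
hive --version | head -n 1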

Related

Hive tez query fails with java.io.IOException

A long-running Hive Tez query occasionally fails with:
java.io.IOException: File hdfs://XXX with newer attempt ID 1 is smaller than the file hdfs://YYY with older attempt ID 0
On our 20-node HDP 3.1.5 cluster (Hive 3.1.0 and Tez 0.9.1), it fails roughly once in every 200 executions.
We were hitting HIVE-23354.
It seems to have no workaround; it is fixed in Hive 4.0.0.
I had the same issue with a query with lots of big joins. Decreasing the size threshold for tables that are joined in memory, namely hive.auto.convert.join.noconditionaltask.size (from 512 MB to 16 MB in my case), solved the problem for me.
Stack: HDP 3.1.4, Tez 0.9.1, Hive 3.1.0.
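A minimal sketch of that change, assuming the query is submitted through the hive CLI (the SELECT placeholder stands in for the failing join query):

# Lower the auto-convert-join threshold from 512 MB to 16 MB for this run only;
# 16777216 bytes = 16 MB
hive --hiveconf hive.auto.convert.join.noconditionaltask.size=16777216 \
     -e "SELECT ..."

If the smaller threshold does not hurt other workloads, the same value can be made permanent in hive-site.xml.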

Hive CLI startup throws error: Unrecognized Hadoop major version number: 1.0.4

I am facing the issue below when starting Hive/Beeline:
Logging initialized using configuration in jar:file:/home/mine/work/apache-hive-2.3.6-bin/lib/hive-common-2.3.6.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 1.0.4
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:91)
I followed the URL below to set up Hive:
https://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
Previously, I had Hadoop 1.2.1; I have now installed 2.7.3.
.bashrc contains:
mine@ubuntu:~$ echo $HADOOP_HOME
/home/mine/work/hadoop-2.7.3
mine@ubuntu:~$ echo $HIVE_HOME
/home/mine/work/apache-hive-2.3.6-bin
hive-env.sh contains:
export HADOOP_HOME=/home/mine/work/hadoop-2.7.3
Derby server started.
I do not understand where Hadoop 1.0.4 comes from. Is there a compatibility issue?
Kindly help me with your valuable suggestions.
Thanks in advance,
Try: export HADOOP_VERSION="2.7.3"
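A sketch of applying and verifying that suggestion, using the paths from the question (whether the launcher honors HADOOP_VERSION depends on the Hive scripts, so treat it as a workaround to test):

export HADOOP_HOME=/home/mine/work/hadoop-2.7.3
export HADOOP_VERSION="2.7.3"                    # the override suggested above
"$HADOOP_HOME"/bin/hadoop version | head -n 1    # should print "Hadoop 2.7.3"

If it still reports 1.0.4, look for a stale hadoop binary from the old install earlier in $PATH; running hash -r or opening a fresh shell helps after editing .bashrc.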

java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook

I'm trying to build a cube on Kylin with Spark as the engine type. The cluster contains the following tools:
OS image: 1.0-debian9
Apache Spark 2.4.4 (changed from 1.6.2)
Apache Hadoop 2.7.4
Apache Hive 1.2.1
I'm getting this error while building a cube:
java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V
at org.apache.hive.hcatalog.common.HiveClientCache.createShutdownHook(HiveClientCache.java:221)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:153)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:97)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:553)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:80)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I checked the Hive and Hadoop library JAR directories to see if there were any redundant JARs, and I found two versions of every type of JAR, for example hive-common-1.2.1.jar and hive-common.jar.
I tried moving either of them to a different location and resuming the cube-building process, but I got the same error. Any help on this would be greatly appreciated.
This is not a supported use case for Dataproc; if you need to use Spark 2.4.4, then you should use Dataproc 1.4 or 1.5 instead of Dataproc 1.0, which comes with Spark 1.6.2.
Aside from this, the ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V method was added in Hive 2.3.0, but Spark uses a fork of Hive 1.2.1; that's why you need to use a Kylin version that supports Hive 1.2.1.
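One way to confirm which overloads a given hive-common JAR actually ships (a sketch; the JAR path is an assumption for a standard layout):

javap -classpath /usr/lib/hive/lib/hive-common.jar \
      org.apache.hive.common.util.ShutdownHookManager | grep addShutdownHook

On a Hive 1.2.1 JAR the single-argument addShutdownHook(Runnable) overload from 2.3.0 should be absent, matching the NoSuchMethodError above.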
Regarding the duplicate JARs, the versionless hive-common.jar is not a duplicate; it's a symbolic link to the versioned hive-common JAR. You can verify this by listing it:
$ ls -al /usr/lib/hive/lib/hive-common.jar
lrwxrwxrwx 1 root root 21 Nov 9 09:20 /usr/lib/hive/lib/hive-common.jar -> hive-common-2.3.6.jar
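To rule out genuine duplicates across the whole directory, each JAR can be resolved the same way (a sketch assuming the same /usr/lib/hive/lib layout):

# Print each hive-* JAR next to its resolved target; symlinks like the one
# above are expected, while two unrelated regular files would be real duplicates
for jar in /usr/lib/hive/lib/hive-*.jar; do
  printf '%s -> %s\n' "$jar" "$(readlink -f "$jar")"
done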
I changed the Hive version to 2.1.0 and it worked for me. I settled on this Hive version by checking the Kylin download page and, in turn, the stacks other cloud platforms like AWS EMR and Microsoft Azure HDInsight use for the Kylin 2.6.4 release.
Thanks, @Igor Dvorzhak, for your valuable suggestions.

Error when trying to execute kylin.sh start in HDP Sandbox 2.6

I installed Apache Kylin in the HDP 2.6 sandbox, following the official installation guide: http://kylin.apache.org/docs/install/index.html
When I ran the script $KYLIN_HOME/bin/kylin.sh start, I got the error below:
What can I do to fix this error?
Thanks in advance
Check whether the Hive service is up in Ambari; when the Hive service is down, Kylin cannot find it and gives this error. Check your .bash_profile as well. Once those two issues are addressed, Kylin should be able to find the location of the Hive dependency.
Kylin uses the find-hive-dependency.sh script to set up the CLASSPATH. This script uses a Hive CLI command (I tested it with beeline) to query the Hive env vars and extract the CLASSPATH from them.
beeline connects to Hive using the properties in kylin_hive_conf.xml, but for some reason (probably due to the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is established.
The Hive properties that cause the issue can be discarded when connecting to Hive to query the CLASSPATH, so to fix the issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to approximately line 34, and modify the line
hive_env=`${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'`
by removing ${hive_conf_properties}.
Check that the Hive dependencies have been configured by running find-hive-dependency.sh.
Now $KYLIN_HOME/bin/kylin.sh start should work.
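For clarity, after removing ${hive_conf_properties} the line should read (a sketch; variable names as in the script line quoted above):

hive_env=`${beeline_shell} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'`

With that change, beeline connects without the problematic properties and still returns the env:CLASSPATH entry Kylin needs.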

Unable to use Cassandra from Presto

I have set up Presto 0.76 and Cassandra 2.1.2, and created a keyspace mykeyspace with a table in it. I started both the Cassandra and Presto daemons. When I try to query Cassandra using the Presto CLI, it returns:
presto:mykeyspace> select * from userinfo;
Query 20141216_181006_00021_me4u4 failed: replicate_on_write is not a column defined in this metadata
Is there any way to get around this?
Use the latest version, 0.88, which includes fixes for the Cassandra connector: http://prestodb.io/docs/current/release/release-0.88.html
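After upgrading the server and CLI, the original query can be retried directly (a sketch; host and port assume a default single-node setup):

# Point the CLI at the Cassandra catalog and rerun the failing query
./presto-cli --server localhost:8080 --catalog cassandra --schema mykeyspace \
             --execute 'SELECT * FROM userinfo;'

The replicate_on_write table option was removed in Cassandra 2.1, so older connector versions that still expect it in the schema metadata fail as above; the 0.88 release resolves this.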
