I'm trying to build a cube on Kylin with Spark as the engine type. The cluster contains the following tools:
OS image: 1.0-debian9
Apache Spark 2.4.4 (changed from 1.6.2)
Apache Hadoop 2.7.4
Apache Hive 1.2.1
I'm getting this error while building a cube:
java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V
at org.apache.hive.hcatalog.common.HiveClientCache.createShutdownHook(HiveClientCache.java:221)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:153)
at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:97)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:553)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:80)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I checked the Hive and Hadoop library jar directories for redundant jars and found two versions of every jar. For example: hive-common-1.2.1.jar and hive-common.jar.
I tried moving each of them in turn to a different location and resumed the cube build, but I got the same error. Any help on this would be greatly appreciated.
This is not a supported use case for Dataproc. If you need Spark 2.4.4, use Dataproc 1.4 or 1.5 instead of Dataproc 1.0, which comes with Spark 1.6.2.
Aside from this, the ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V method was added in Hive 2.3.0, but Spark uses a fork of Hive 1.2.1; that's why you need to use a Kylin version that supports Hive 1.2.1.
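A NoSuchMethodError like the one in the question usually means the class was loaded from a jar of a different version than the caller was compiled against. As a minimal diagnostic sketch (not part of Kylin, Hive, or Dataproc), the Python below scans one lib directory and lists which physical jars contain org.apache.hive.common.util.ShutdownHookManager, so you can see which Hive version supplies it. The function name is hypothetical, and /usr/lib/hive/lib is an assumed location; HCatalog or Spark jars may live in other directories on your cluster.

```python
import os
import zipfile
from typing import List


def jars_containing_class(lib_dir: str, class_name: str) -> List[str]:
    """Return the names of the .jar files in lib_dir that contain class_name.

    class_name is a dotted top-level Java class name, e.g.
    "org.apache.hive.common.util.ShutdownHookManager".
    """
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        # Skip non-jars and symlinked aliases so each physical jar is reported once.
        if not name.endswith(".jar") or os.path.islink(path):
            continue
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    matches.append(name)
        except zipfile.BadZipFile:
            # Not a readable jar; ignore it rather than abort the scan.
            continue
    return matches


if __name__ == "__main__":
    lib_dir = "/usr/lib/hive/lib"  # assumed location; adjust for your cluster
    if os.path.isdir(lib_dir):
        print(jars_containing_class(lib_dir, "org.apache.hive.common.util.ShutdownHookManager"))
```

Once you know the jar, `javap -cp <that jar> org.apache.hive.common.util.ShutdownHookManager` prints the class's public methods, so you can check whether an addShutdownHook(java.lang.Runnable) overload is actually there.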
Regarding duplicate jars, the versionless hive-common.jar is not a duplicate; it's a symbolic link to the versioned jar (hive-common-1.2.1.jar on your cluster). You can verify this by listing it, for example:
$ ls -al /usr/lib/hive/lib/hive-common.jar
lrwxrwxrwx 1 root root 21 Nov 9 09:20 /usr/lib/hive/lib/hive-common.jar -> hive-common-2.3.6.jar
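To run the same check over a whole lib directory at once, here is a rough Python sketch, assuming the versionless names are symlinks as in the listing: it maps every symlinked .jar to the file it resolves to, so anything not in the result is a real copy. The function name is hypothetical.

```python
import os
from typing import Dict


def versionless_jar_links(lib_dir: str) -> Dict[str, str]:
    """Map each symlinked .jar in lib_dir to the file name it resolves to.

    A name like hive-common.jar that resolves to hive-common-<version>.jar
    is an alias for that jar, not a second copy of the library.
    """
    links = {}
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        if name.endswith(".jar") and os.path.islink(path):
            # realpath follows the whole link chain; keep only the final file name.
            links[name] = os.path.basename(os.path.realpath(path))
    return links


if __name__ == "__main__":
    lib_dir = "/usr/lib/hive/lib"  # assumed location; adjust for your cluster
    if os.path.isdir(lib_dir):
        for link, target in versionless_jar_links(lib_dir).items():
            print(f"{link} -> {target}")
```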
I changed the Hive version to 2.1.0 and it worked for me. I chose this Hive version after checking the Kylin download page and the Kylin 2.6.4 release packages for other cloud platforms, such as AWS EMR and Microsoft Azure HDInsight.
Thanks, @Igor Dvorzhak, for your valuable suggestions.
How can I tell whether my cluster was set up using Hortonworks, Cloudera, or a plain installation of the Hadoop components?
Also, how can I find the port numbers of the various services?
It is difficult to identify the Hadoop distribution from port numbers, since the Apache, Hortonworks, and Cloudera distros use different port numbers.
Another option is to check for cluster management service agents:
Cloudera Manager: agent start-up script /etc/init.d/cloudera-scm-agent
Hortonworks: Ambari agent start-up script /etc/init.d/ambari-agent
Vanilla Apache Hadoop: no management agents on the server
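The agent check above can be sketched in Python as follows. This is a heuristic under stated assumptions: it only looks at the two init scripts named in the answer (not systemd units or other layouts), an Ambari agent is taken to mean Hortonworks, and the function and constant names are hypothetical. The `root` parameter exists so the check can be pointed at a mounted or test filesystem.

```python
import os

# Init scripts named in the answer; systemd units or other layouts are not checked.
AGENT_SCRIPTS = {
    "Cloudera": os.path.join("etc", "init.d", "cloudera-scm-agent"),
    "Hortonworks": os.path.join("etc", "init.d", "ambari-agent"),
}


def detect_by_agent(root: str = "/") -> str:
    """Guess the distribution from management-agent start-up scripts under root."""
    for distro, rel_path in AGENT_SCRIPTS.items():
        if os.path.exists(os.path.join(root, rel_path)):
            return distro
    # No agent script found: most likely a vanilla Apache installation.
    return "Apache"


if __name__ == "__main__":
    print(detect_by_agent())
```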
You can also check the Hadoop classpath; the following command prints it:
`hadoop classpath`
Most Hadoop distributions include the distro name in the classpath. If the classpath doesn't contain either of the keywords below, the setup is most likely a plain Apache installation.
hdp - (Hortonworks)
cdh - (Cloudera)
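A minimal Python sketch of the keyword check, assuming the `hadoop` CLI is on PATH. It is a plain substring test, so an unrelated path component that happens to contain "hdp" or "cdh" would misclassify the cluster; treat the result as a hint, not proof. Both function names are hypothetical.

```python
import shutil
import subprocess


def distro_from_classpath(classpath: str) -> str:
    """Classify a `hadoop classpath` string by the hdp / cdh keywords."""
    lowered = classpath.lower()
    if "hdp" in lowered:
        return "Hortonworks"
    if "cdh" in lowered:
        return "Cloudera"
    return "Apache"


def current_classpath() -> str:
    """Run `hadoop classpath` if the hadoop CLI is on PATH; return '' otherwise."""
    if shutil.which("hadoop") is None:
        return ""
    result = subprocess.run(["hadoop", "classpath"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    cp = current_classpath()
    print(distro_from_classpath(cp) if cp else "hadoop CLI not found")
```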
The simplest way is to run the hadoop version command. Its output shows which version of Hadoop you have, as well as which distribution (and distribution version) you are running. If you see words like cdh or hdp, cdh stands for Cloudera and hdp for Hortonworks.
For example, I am running Cloudera; in the output of the hadoop version command, the first line shows the Hadoop version followed by the distribution and its version.
Hope this helps.
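As a sketch of reading that output programmatically, assuming only that the first line has the form `Hadoop <version>` and that the distro keyword appears somewhere in the text (for some distributions it may only show up in the jar path printed on a later line), the hypothetical helper below extracts both. The sample strings in any test are illustrative, not captured from a real cluster.

```python
import re
from typing import Tuple


def parse_hadoop_version(output: str) -> Tuple[str, str]:
    """Return (version, distribution) from the text printed by `hadoop version`."""
    lines = output.strip().splitlines()
    # The first line is expected to look like "Hadoop <version>".
    match = re.match(r"Hadoop\s+(\S+)", lines[0]) if lines else None
    version = match.group(1) if match else "unknown"
    # Search the whole output, since the distro tag may appear on any line.
    lowered = output.lower()
    if "cdh" in lowered:
        distro = "Cloudera"
    elif "hdp" in lowered:
        distro = "Hortonworks"
    else:
        distro = "Apache or unrecognized"
    return version, distro
```

You would feed it the captured stdout of `hadoop version`, for example via `subprocess.run(["hadoop", "version"], capture_output=True, text=True).stdout`.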
The hdfs version command also shows the Hadoop version and its distribution.
I'm using Hadoop 0.23.8 in pseudo-distributed mode and HBase 0.94.8. My HBase master is failing with:
Server IPC version 5 cannot communicate with client version 4
I think this is because HBase is using hadoop-core-1.0.4.jar in its lib folder.
Now http://cloudfront.blogspot.in/2012/06/how-to-configure-habse-in-pseudo.html#.UYfPYkAW38s suggests I should replace this jar by copying:
the hadoop-core-*.jar from your HADOOP_HOME ...
but there are no hadoop-core-*.jars in 0.23.8.
Will this process work for 0.23.8, and if so, which jars should I be using?
TIA!
I gave up on this and am now using Hadoop 2.2.0, which works (more or less) well with HBase.
Has anyone tried/succeeded in installing Hue on Hadoop without Cloudera?
I have gotten to a point where I can reliably reproduce a hadoop cluster with hbase and hive and can set it all up in about 15 minutes. I'd love to have Hue along with all this without having to go back and redo my setup with Cloudera.
Check out slides #19 and #5: Hue is spreading everywhere and is compatible with Hadoop 0.20 / 1.2.0 / 2.2.0: http://gethue.com/hue-goes-to-paris-hug-france/
Hue has tarball releases that you are free to install. You can also simply clone the source code from GitHub (Hue is open source and Apache licensed): https://github.com/cloudera/hue and build the branch you want.
The upstream documentation is here, and CDH's is here.
Hue is also packaged in BigTop (and so based on Vanilla Hadoop).
Hue is a Django-based web server that acts as a view on top of Hadoop. So Hue just needs to be installed and then configured by adding the hosts of the NameNode, JobTracker, ResourceManager, Oozie, HiveServer, etc. to its hue.ini.
Also, as detailed on gethue.com/releases, the version you need might depend on your Hive version.
Note that without Cloudera's distribution your mileage may vary, but feel free to chime in on the Hue user list or gethue.com ;)
We are also looking into improving the Hue setup on Amazon AWS/EMR!
To build and run Hue 3.6.0 with Apache Hadoop 2.4.1:
$ git clone https://github.com/cloudera/hue.git (Note: releases/tag/release-3.6.0 is unstable; it's better to build from the latest master. I built from commit 87d6b2da1 of Aug 7, which is stable.)
$ cd hue
$ vi maven/pom.xml
change hadoop.version to 2.4.1
replace hadoop-core with hadoop-common
set hadoop-test version to 1.2.1
remove the files that require Hadoop MR1:
$ rm desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/mapred/ThriftJobTrackerPlugin.java
$ rm desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/thriftfs/ThriftJobTrackerPlugin.java
build Hue: $ make apps
configure Hue: $ vi desktop/conf/pseudo-distributed.ini
run the Hue server in dev mode: $ build/env/bin/hue runserver 0.0.0.0:8000
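The hadoop.version and hadoop-core edits to maven/pom.xml in the steps above could be scripted. This is a hedged sketch, not part of Hue's build: it assumes the version is declared as a literal `<hadoop.version>...</hadoop.version>` element and the dependency as a literal `<artifactId>hadoop-core</artifactId>`. It leaves the hadoop-test version change to you, since the source doesn't say where that version is declared. The function name is hypothetical.

```python
import re


def patch_hue_pom(pom: str, hadoop_version: str = "2.4.1") -> str:
    """Apply the hadoop.version and hadoop-core -> hadoop-common edits to pom text."""
    # Rewrite the value of the <hadoop.version> property, whatever it was before.
    pom = re.sub(
        r"<hadoop\.version>[^<]*</hadoop\.version>",
        f"<hadoop.version>{hadoop_version}</hadoop.version>",
        pom,
    )
    # Hadoop 2.x no longer ships a hadoop-core artifact; point at hadoop-common instead.
    return pom.replace(
        "<artifactId>hadoop-core</artifactId>",
        "<artifactId>hadoop-common</artifactId>",
    )
```

Read maven/pom.xml, pass its text through the function, and review the diff before writing it back.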
Follow the manual Hue installation steps in the Hortonworks documentation; they take you through the process step by step.
Quote: "...without Cloudera's distribution your mileage might vary...."
Indeed, it will vary A LOT! It would seem that the following is quite true:
Per the install guide:
http://cloudera.github.io/hue/docs-2.0.1/manual.html#_install_hue
NOTE:
Hue requires the Hadoop contained in Cloudera’s Distribution including Apache Hadoop (CDH), version 3 update 4 or later.
I've tried it and have run into walls with Hue trying to connect to Hive, Pig and Oozie.
At this stage - from my experience at least - Hue will NOT run on a standard Apache Hadoop installation using standard Apache tools like Hive and Pig. It must be a vintage of Cloudera’s Distribution.
If anyone has any other (positive) experiences installing Hue outside of the Cloudera’s Distribution, I'd be quite interested to hear about them...
Has anyone tried configuring Hue with the Apache frameworks rather than CDH? The documentation says to set the mapred.jobtracker.plugins property to org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin and check the JobTracker (JT) log files to make sure that the Thrift plugin has been loaded. But I don't see anything related to Thrift in the JT log files. It also looks like mapred.jobtracker.plugins is not defined in the mapred-site.xml for Hadoop 1.2.1, which is the latest stable release.
Did you add mapred.jobtracker.plugins to your mapred-site.xml? Hadoop has supported plugins since 1.2.0, so you should be fine.
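If you want to add the property to mapred-site.xml programmatically, here is a minimal sketch using Python's standard xml.etree.ElementTree. The property name and value come from the question; the helper name is hypothetical. The JobTracker presumably needs a restart before it loads the plugin.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Add or overwrite a <property> in a Hadoop *-site.xml document (as text)."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry with this name: append a new <property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_property("<configuration></configuration>",
                       "mapred.jobtracker.plugins",
                       "org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin"))
```

ElementTree drops XML comments and the XML declaration when it re-serializes, so compare the output with your existing file before overwriting it.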
I have to connect Pig to a Hadoop build that is slightly modified from Hadoop 0.20.0. I chose Pig 0.7.0 and set PIG_CLASSPATH with:
export PIG_CLASSPATH=$HADOOP_HOME/conf
When I run Pig, it reports an error like this:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
So I copied hadoop-core.jar from $HADOOP_HOME over hadoop20.jar in $PIG_HOME/lib and ran "ant". Now I can run Pig, but when I use dump or store I get another error:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
at org.apache.pig.builtin.BinStorage.setStoreLocation(BinStorage.java:369)
...
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)
================================================================================
Has anyone encountered this error, or is the way I compiled it wrong?
Thanks.
There is a section about this issue in the Pig FAQ that should give you a good idea of what's wrong. Here is the outline from that page:
This usually happens when you are connecting to a Hadoop cluster other than the standard Apache Hadoop 0.20.2 release. Pig bundles the standard Hadoop 0.20.2 jars in its release. If you want to connect to another version of Hadoop, you need to replace the bundled Hadoop 0.20.2 jars with compatible jars. You can try:
do "ant"
copy hadoop jars from your hadoop installation to overwrite ivy/lib/Pig/hadoop-core-0.20.2.jar and ivy/lib/Pig/hadoop-test-0.20.2.jar
do "ant" again
cp pig.jar to overwrite pig-*-core.jar
Some other tricks are also possible. You can use "bin/pig -secretDebugCmd" to inspect the command line Pig runs. Make sure you are using the right version of Hadoop.
As pointed out in this FAQ section, if nothing works I would advise just upgrading to a recent version of Pig (after 0.9.1); Pig 0.7 is a bit old.
The Pig (core) jar has a bundled Hadoop dependency, which may differ from the version you want to use. If you have an old Pig version (< 0.9), you have the option to build a jar without Hadoop:
cd $PIG_HOME
ant jar-withouthadoop
cp $PIG_HOME/build/pig-x.x.x-dev-withouthadoop.jar $PIG_HOME
Then start Pig:
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:$PIG_HOME/pig-x.x.x-dev-withouthadoop.jar; ./pig
Newer Pig versions contain the prebuilt withouthadoop jar (see this ticket), so you can skip the build step. Furthermore, when you run Pig it picks up the withouthadoop jar from PIG_HOME rather than the bundled version, so you don't need to add withouthadoop.jar to PIG_CLASSPATH either (provided that you run Pig from $PIG_HOME/bin).
Back to your question:
Hadoop 0.20 and its modified variant (0.20-append?) can work even with the latest Pig distribution (0.11.1).
You just need to do the following:
unpack Pig 0.11.1
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf; ./pig
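The PIG_CLASSPATH values used in this answer follow one pattern: the Hadoop core jar, Hadoop's lib/*, the Hadoop conf directory, and (only for old Pig run outside $PIG_HOME/bin) the withouthadoop jar. A minimal Python sketch that assembles that string, assuming a Unix-style ":" separator as in the export lines; the function name is hypothetical and the jar names you pass in must match your installation.

```python
import os
from typing import Optional


def pig_classpath(hadoop_home: str, core_jar: str,
                  withouthadoop_jar: Optional[str] = None) -> str:
    """Build a PIG_CLASSPATH value in the order used in this answer."""
    parts = [
        os.path.join(hadoop_home, core_jar),    # e.g. hadoop-core-x.x.jar
        os.path.join(hadoop_home, "lib", "*"),  # Hadoop's dependency jars
        os.path.join(hadoop_home, "conf"),      # cluster configuration
    ]
    if withouthadoop_jar:
        # Only needed for old Pig (< 0.9) or when not running from $PIG_HOME/bin.
        parts.append(withouthadoop_jar)
    # Unix-style classpath separator, matching the export lines above.
    return ":".join(parts)


if __name__ == "__main__":
    hadoop_home = os.environ.get("HADOOP_HOME")
    if hadoop_home:
        print("export PIG_CLASSPATH=" + pig_classpath(hadoop_home, "hadoop-core-x.x.jar"))
```

To launch Pig with it directly, pass the value through the environment, e.g. `subprocess.run(["./pig"], env={**os.environ, "PIG_CLASSPATH": cp})` from $PIG_HOME/bin.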
If you still get "Failed to create DataStorage", it's worth starting Pig with -secretDebugCmd, as Charles Menguy suggested, so that you can see whether Pig picks up the right Hadoop version, etc.
Did you remember to run start-all.sh from /usr/local/bin? I ran into the same problem and I basically retraced my steps in configuring Hadoop itself. I am able to use Pig now.