I'm running Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.
Hive on Spark: Getting Started says:
Install/build a compatible version. Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with.
I checked the pom.xml, and it shows that the Spark version is 1.6.0:
<spark.version>1.6.0</spark.version>
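The same check can be done from the command line. This is only a minimal sketch: the function name pom_spark_version is made up for illustration, and it assumes the property sits on a single line, as in the snippet above.

```shell
# Print the value of the first <spark.version> element found in a pom.xml.
# Assumes the opening and closing tags are on the same line.
pom_spark_version() {
  sed -n 's:.*<spark\.version>\(.*\)</spark\.version>.*:\1:p' "$1" | head -n 1
}

# Example call (the path is a placeholder for your Hive source checkout):
# pom_spark_version /path/to/hive/pom.xml
```

If the function prints nothing, the property is probably defined differently in that pom.xml and is worth looking up by hand.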
But Hive on Spark: Getting Started also says:
Prior to Spark 2.0.0: ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
Since Spark 2.0.0: ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
So now I'm confused, because I am running Hadoop 2.7.3. Do I have to downgrade Hadoop to 2.4?
Which version of Spark should I use, 1.6.0 or 2.0.0?
Thank you!
I am currently using Spark 2.0.2 with Hadoop 2.7.3 and Hive 2.1, and it's working fine. I think Hive supports both Spark 1.6.x and 2.x, but I suggest going with Spark 2.x since it's the latest version.
Some links on why to use Spark 2.x:
https://docs.cloud.databricks.com/docs/latest/sample_applications/04%20Apache%20Spark%202.0%20Examples/03%20Performance%20Apache%20(Spark%202.0%20vs%201.6).html
Apache Spark vs Apache Spark 2
The current version of Spark 2.x is not compatible with Hive 2.1 and Hadoop 2.7. There is a major bug:
JavaSparkListener is not available, and Hive crashes on execution.
https://issues.apache.org/jira/browse/SPARK-17563
You can try running Hive 2.1 with Hadoop 2.7 and Spark 1.6, building Spark with:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
If you look at the command for Spark 2.0 and later, the only difference in the script path is that make-distribution.sh now lives in the dev/ folder.
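To make the path difference concrete, here is a small sketch that picks the script location from a Spark version string. The function name make_distribution_script is hypothetical, and it only encodes the two cases quoted from the Getting Started page (before 2.0.0 and from 2.0.0 on).

```shell
# Print the relative path of Spark's make-distribution script
# for a given Spark version string such as "1.6.0" or "2.0.2".
make_distribution_script() {
  major="${1%%.*}"   # keep everything before the first dot
  if [ "$major" -ge 2 ]; then
    echo "./dev/make-distribution.sh"
  else
    echo "./make-distribution.sh"
  fi
}

# Example call, run from the root of the Spark source tree:
# "$(make_distribution_script 1.6.0)" --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
```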
If it does not work for Hadoop 2.7.x, I can confirm that I was able to build it successfully with Hadoop 2.6, using:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
and Scala 2.10.5.
Can anyone suggest references or share installation steps for Apache Hue on top of Apache Hadoop 2.6.1?
I want to use Elasticsearch on Hadoop. Can anyone suggest a step-by-step installation and configuration of Elasticsearch on Hadoop? Is there a version dependency between Elasticsearch and Hadoop?
Installation:
http://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
Configuration:
http://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
Hadoop 1.x (ideally the latest stable version in the 1.x line, currently 1.2.1) or 2.x (ideally the latest stable version, currently 2.2.0). elasticsearch-hadoop is tested daily against Apache Hadoop. Any distro compatible with Apache Hadoop should work just fine.
I have Hadoop 2.5.1 installed on three nodes (1 master, 2 slave nodes), and I want to know which versions of HBase and Hive are compatible with it.
Also, are there any alternatives for this Hadoop + HBase + Hive integration, or any guides explaining the installation of Hadoop 2.5.1 with compatible HBase and Hive versions?
Currently I am trying Apache Ambari for the above integration, and it's still ongoing.
Environment:
JDK version: 1.7.0_67
RHEL 5
64 bit architecture
Any leads will be much appreciated!
With Hadoop 2.5.1, the supported versions are:
HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.)
HBase-1.0.x (Hadoop 1.x is NOT supported)
HBase-1.1.x
HBase-1.2.x
Here is the link: http://hbase.apache.org/book.html#configuration
Warning: only Hive 1.2.1 can work with HBase 2.x.
I have no option other than to install HBase 0.90.6, as it is the only recommended stable version for Nutch (web crawler) other than 0.90.4.
My question: which Hadoop version is recommended for HBase 0.90.6 to work in pseudo-distributed mode?
I figured out that Hadoop 0.20.205.0 is the compatible version.
I tried Hadoop 1.2.1, but it doesn't seem to work well with HBase 0.90.6.