I'm running Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.
According to Hive on Spark: Getting Started:
Install/build a compatible version. Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with.
I checked the pom.xml, and it shows that the Spark version is 1.6.0:
<spark.version>1.6.0</spark.version>
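The same check can be done from the shell instead of opening the file. This is only a minimal sketch: the function name hive_spark_version is made up for illustration, and it assumes the property sits on a single line, as in the snippet above.

```shell
# Print the Spark version that a Hive source tree declares in its root pom.xml.
# Assumption: the property is written on one line, e.g.
#   <spark.version>1.6.0</spark.version>
# hive_spark_version is a hypothetical helper name, not part of Hive or Spark.
hive_spark_version() {
  pom="${1:-pom.xml}"   # path to the pom; defaults to ./pom.xml
  # Keep only the text between the opening and closing tags, first match wins.
  sed -n 's:.*<spark\.version>\(.*\)</spark\.version>.*:\1:p' "$pom" | head -n 1
}
```

Run it from the Hive source root as `hive_spark_version`, or pass the path to the pom explicitly.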
But Hive on Spark: Getting Started also says:
Prior to Spark 2.0.0: ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
Since Spark 2.0.0: ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
So now I'm confused, because I am running Hadoop 2.7.3. Do I have to downgrade Hadoop to 2.4?
Which version of Spark should I use: 1.6.0 or 2.0.0?
Thank you!
I am currently using Spark 2.0.2 with Hadoop 2.7.3 and Hive 2.1, and it's working fine. I think Hive supports both Spark 1.6.x and 2.x, but I suggest going with Spark 2.x since it's the latest version.
Some motivation for using Spark 2.x:
https://docs.cloud.databricks.com/docs/latest/sample_applications/04%20Apache%20Spark%202.0%20Examples/03%20Performance%20Apache%20(Spark%202.0%20vs%201.6).html
Apache Spark vs Apache Spark 2
The current version of Spark 2.x is not compatible with Hive 2.1 and Hadoop 2.7. There is a major bug: JavaSparkListener is not available, and Hive crashes on execution:
https://issues.apache.org/jira/browse/SPARK-17563
You can try to build Hive 2.1 with Hadoop 2.7 and Spark 1.6 with:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
Compared with the command for Spark 2.0 and later, the difference is that make-distribution.sh is inside the dev/ folder.
If it does not work for Hadoop 2.7.x, I can confirm that I was able to build it successfully with Hadoop 2.6, using:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
and with Scala 2.10.5.
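The script-location difference mentioned above can be captured in a small shell helper. This is a sketch only: make_distribution_script is a hypothetical name, and it assumes the Spark version is given as a dotted string such as 1.6.0 or 2.0.2.

```shell
# Print the path of Spark's make-distribution.sh, relative to the Spark
# source root, for a given Spark version.
# Per Hive on Spark: Getting Started, the script is at the source root
# before Spark 2.0.0 and under dev/ from 2.0.0 onward.
# make_distribution_script is a hypothetical helper name.
make_distribution_script() {
  major="${1%%.*}"   # text before the first dot, e.g. "1" from "1.6.0"
  if [ "$major" -ge 2 ]; then
    echo "./dev/make-distribution.sh"
  else
    echo "./make-distribution.sh"
  fi
}

# Usage, from the Spark source root (flags as in the answer above):
#   "$(make_distribution_script 1.6.0)" --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
```

The helper only picks the script path. Which Hadoop profile to pass is still up to you.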
I have experience with a Nutch/MongoDB setup, but I cannot find instructions for configuring Nutch 2.3.1 (Hadoop 2.5.2 data structure) on a Linux VM. Hadoop has been installed successfully. Please assist me.
Can I avoid installing HBase for a Nutch 2.3.1/Hadoop 2.5.2 configuration?
Regards,
Victor
I want to use Elasticsearch on Hadoop. Can anyone suggest a step-by-step installation and configuration of Elasticsearch on Hadoop? Is there a version dependency between Elasticsearch and Hadoop?
Installation:
http://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
Configuration:
http://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
Hadoop 1.x (ideally the latest stable version in the 1.x line, currently 1.2.1) or 2.x (ideally the latest stable version, currently 2.2.0). elasticsearch-hadoop is tested daily against Apache Hadoop. Any distro compatible with Apache Hadoop should work just fine.
I have no option other than to install HBase 0.90.6, as it is the only recommended stable version for Nutch (web crawler) other than 0.90.4.
My question: which Hadoop version is recommended for HBase 0.90.6 to work in pseudo-distributed mode?
I figured out that Hadoop 0.20.205.0 is the compatible version.
I tried Hadoop 1.2.1, but it doesn't seem to work well with HBase 0.90.6.
I am installing hadoop-1.0.3 on Windows 7 using Cygwin. Now I want to install HBase, so please suggest which version of HBase is compatible with Hadoop 1.0.3.
Here is the compatibility matrix between Hadoop and HBase versions:
Above: S - Supported, X - Not Supported, NT - Not Tested
More info is available here: http://hadoop.apache.org/releases.html