I was wondering if any Mahout version has been confirmed to work properly with any version of Hadoop 3.x.
It looks like both Cloudera's and Amazon's Hadoop distributions removed Mahout when they went from Hadoop 2 to Hadoop 3, but I cannot find any stated reason for omitting it.
Does anyone have a source or personal experience that indicates that Mahout can work with Hadoop 3?
The Hadoop version recommended by the trunk branch of Mahout on GitHub is hadoop-2.4.1,
but take a look at this Dockerfile on the master branch:
https://github.com/apache/mahout/blob/master/docker/build/Dockerfile
It uses Spark 2.3.1 on Hadoop 3.0 as its base image:
gettyimages/spark:2.3.1-hadoop-3.0
Hope this helps.
Related
I am confused about which HBase version to use with Hadoop 2.7.1 or Hadoop 2.6.0.
HBase 1.2.x now supports Hadoop 2.7.1 and later.
I found this link, which has a chart of Hadoop and HBase compatibility.
Have a look at it:
https://www.quora.com/Which-version-of-hbase-should-I-use-with-Hadoop-2-7-1
I want to install the Cloudera distribution of Hadoop and Spark from tarballs.
I have already set up Hadoop in pseudo-distributed mode on my local machine and successfully ran a YARN example.
I have downloaded the latest CDH 5.3.x tarballs from here
But the folder structure of Spark downloaded from Cloudera is different from the one on the Apache website. This may be because Cloudera provides its own version, maintained separately.
So far I have not found any documentation on how to install Spark separately from this Cloudera tarball.
Could someone help me to understand how to do it?
Spark can be extracted to any directory. To submit a job, run the ./bin/spark-submit command (available in the extracted Spark directory) with the required parameters. To start the interactive Spark shell, use ./bin/spark-shell.
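As a rough illustration of the spark-submit invocation described above, here is a minimal Python sketch that only assembles the command line. The install path /opt/spark and the examples jar location are hypothetical, so adjust them to wherever you extracted the tarball; --class and --master are standard spark-submit options, and SparkPi is the example class shipped with Spark.

```python
import os


def build_spark_submit(spark_home, main_class, app_jar, master="local[2]", app_args=()):
    """Return the argument list for bin/spark-submit under spark_home."""
    submit = os.path.join(spark_home, "bin", "spark-submit")
    # Options go before the application jar; application arguments go after it.
    return [submit, "--class", main_class, "--master", master, app_jar, *app_args]


if __name__ == "__main__":
    # Hypothetical paths: point these at your own extracted Spark directory.
    cmd = build_spark_submit(
        "/opt/spark",
        "org.apache.spark.examples.SparkPi",
        "/opt/spark/lib/spark-examples.jar",
        app_args=["10"],
    )
    print(" ".join(cmd))
    # To actually submit, pass cmd to subprocess.run(cmd, check=True).
```

Printing the command first makes it easy to check the paths before running anything against the cluster.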
I have a three-node cluster running Hadoop 2.2.0 and HBase 0.98.1, and I need to use a Nutch 2.2.1 crawler on top of that. But Nutch only supports Hadoop versions from the 1.x branch. So far I am able to submit a Nutch job to my cluster, but it fails with java.lang.NumberFormatException.
So my question is pretty simple: how do I make Nutch work in my environment?
At the moment it's impossible to integrate Nutch 2.2.1 (Gora 0.3) with HBase 0.98.x.
See: https://issues.apache.org/jira/browse/GORA-304
The official Nutch tutorial recommends only the HBase 0.90.x branch:
http://wiki.apache.org/nutch/Nutch2Tutorial
You can also download the HBase 0.94.24-hadoop-2.5.0 build, which I created and tested today:
https://github.com/dobromyslov/hbase/releases/tag/0.94.24-hadoop-2.5.0
Note that Nutch 2.2.1 does not support HBase 0.94.x, so you have to get the latest Nutch 2.x from the Git branch: https://github.com/apache/nutch/tree/2.x
I have no option other than to install HBase 0.90.6, as it is the only recommended stable version for Nutch (web crawler) besides 0.90.4.
My question: which Hadoop version is recommended for running HBase 0.90.6 in pseudo-distributed mode?
I figured out Hadoop 0.20.205.0 is the compatible version.
I tried Hadoop 1.2.1, but it doesn't seem to work well with HBase 0.90.6.
I have a Hadoop cluster running version 1.2.1, and I recently downloaded HBase 0.94.11 to try out. I was able to set up HBase to run in distributed mode, but when I checked the status page in the web UI, it reported the Hadoop version as 1.0.4. I noticed that this is because HBase uses the hadoop-core-1.0.4.jar file that ships with it. So my question is: should I replace this jar file with hadoop-core-1.2.1.jar so that HBase uses the same hadoop-core version as my cluster? And does it matter?
You don't have to do that if 1.0.4 works for you. A newer version may introduce other problems, and simply replacing hadoop-core.jar is unsafe. If you want to upgrade HBase, please follow the official guide.
Hope it helps.
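If you just want to see which hadoop-core version HBase ships with before deciding anything, a small sketch like the following can list it. It assumes the bundled jar follows the hadoop-core-<version>.jar naming mentioned in the question; the lib directory path in the example is hypothetical, and the function names are made up for illustration. It only reads file names and changes nothing.

```python
import re
from pathlib import Path

# Matches names such as hadoop-core-1.0.4.jar and captures the version part.
JAR_PATTERN = re.compile(r"^hadoop-core-(\d+(?:\.\d+)*)\.jar$")


def bundled_hadoop_versions(hbase_lib_dir):
    """Return the versions of all hadoop-core jars found in hbase_lib_dir."""
    versions = []
    for jar in sorted(Path(hbase_lib_dir).glob("hadoop-core-*.jar")):
        match = JAR_PATTERN.match(jar.name)
        if match:
            versions.append(match.group(1))
    return versions


def differs_from_cluster(hbase_lib_dir, cluster_version):
    """True if any bundled hadoop-core version differs from the cluster's."""
    return any(v != cluster_version for v in bundled_hadoop_versions(hbase_lib_dir))


if __name__ == "__main__":
    lib_dir = "/opt/hbase/lib"  # hypothetical install location
    print(bundled_hadoop_versions(lib_dir))
    print(differs_from_cluster(lib_dir, "1.2.1"))
```

A mismatch reported here is only informational; as said above, it is not by itself a reason to swap the jar.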