Does Hadoop 3 support Mahout?

I was wondering if any Mahout version has been confirmed to work properly with any version of Hadoop 3.x.
It looks like both Cloudera's and Amazon's Hadoop distributions removed Mahout when they went from Hadoop 2 to Hadoop 3, but I cannot find any stated reason for omitting it.
Does anyone have a source or personal experience that indicates that Mahout can work with Hadoop 3?

The Hadoop version recommended by the trunk branch of Mahout on GitHub is hadoop-2.4.1,
but take a look at this Dockerfile on the master branch:
https://github.com/apache/mahout/blob/master/docker/build/Dockerfile
It uses Spark v2.3.1 on Hadoop 3.0:
gettyimages/spark:2.3.1-hadoop-3.0
Hope this helps.

Related

Which hbase version to be used with hadoop-2.7.1

I am confused about which HBase version to use with Hadoop 2.7.1 or Hadoop 2.6.0.

HBase 1.2.x now supports Hadoop 2.7.1 and later.
I found this link, which has a chart of Hadoop and HBase compatibility.
Have a look at it.
https://www.quora.com/Which-version-of-hbase-should-I-use-with-Hadoop-2-7-1

how to install Spark and Hadoop from tarball separately [Cloudera]

I want to install the Cloudera distribution of Hadoop and Spark from tarballs.
I have already set up Hadoop in pseudo-distributed mode on my local machine and successfully ran a YARN example.
I have downloaded the latest CDH 5.3.x tarballs from here.
But the folder structure of the Spark tarball downloaded from Cloudera is different from the one on the Apache website. This may be because Cloudera provides its own separately maintained version.
I have not yet found any documentation on installing Spark separately from this Cloudera tarball.
Could someone help me understand how to do it?

Spark can be extracted to any directory. To submit a job, run the ./bin/spark-submit command (available in the extracted Spark directory) with the required parameters. To start the interactive Spark shell, use the ./bin/spark-shell command.
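As a rough illustration of the spark-submit step, here is a minimal Python sketch that only assembles the command line; it does not run Spark. The function name `build_spark_submit`, the `/opt/spark` directory, the JAR path, and the class name are all hypothetical placeholders, and `local[*]` is just an assumed default master. Only the `--master` and `--class` options of spark-submit are used.

```python
import os
import shlex


def build_spark_submit(spark_home, app, main_class=None, master="local[*]", app_args=()):
    """Return the argument list for launching `app` with spark-submit.

    spark_home is the directory the Spark tarball was extracted to.
    """
    cmd = [os.path.join(spark_home, "bin", "spark-submit"), "--master", master]
    if main_class:
        # --class is only needed for JVM applications packaged as a JAR.
        cmd += ["--class", main_class]
    cmd.append(app)
    # Anything after the application path is passed to the application itself.
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and class name, for illustration only.
    cmd = build_spark_submit(
        "/opt/spark",
        "/tmp/my-app.jar",
        main_class="com.example.MyApp",
    )
    # Print a shell-safe version of the command instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list could be handed to `subprocess.run(cmd, check=True)` once the paths point at a real extracted Spark directory and application.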

Nutch in Hadoop 2.x

I have a three-node cluster running Hadoop 2.2.0 and HBase 0.98.1, and I need to use a Nutch 2.2.1 crawler on top of that. But Nutch only supports Hadoop versions from the 1.x branch. So far I am able to submit a Nutch job to my cluster, but it fails with java.lang.NumberFormatException.
So my question is pretty simple: how do I make Nutch work in my environment?

At the moment it's impossible to integrate Nutch 2.2.1 (Gora 0.3) with HBase 0.98.x.
See: https://issues.apache.org/jira/browse/GORA-304
The official Nutch tutorial recommends only the HBase 0.90.x branch:
http://wiki.apache.org/nutch/Nutch2Tutorial
You can also download the HBase 0.94.24-hadoop-2.5.0 build, which I created and tested today:
https://github.com/dobromyslov/hbase/releases/tag/0.94.24-hadoop-2.5.0
Note that Nutch 2.2.1 does not support HBase 0.94.x, so you have to get the latest Nutch 2.x from the Git branch: https://github.com/apache/nutch/tree/2.x

Which Hadoop version recommended for HBase 0.90.6?

I have no option other than to install HBase 0.90.6, as it is the only recommended stable version for Nutch (web crawler) other than 0.90.4.
My question is: which Hadoop version is recommended for running HBase 0.90.6 in pseudo-distributed mode?

I figured out that Hadoop 0.20.205.0 is the compatible version.
I tried Hadoop 1.2.1, but it doesn't seem to work well with HBase 0.90.6.

hbase 0.94.11 and hadoop version

I have a Hadoop cluster running version 1.2.1, and I recently downloaded HBase 0.94.11 to try out. I was able to set up HBase to run in distributed mode, but when I checked the web GUI status, it stated that the Hadoop version is 1.0.4. I noticed that this is because HBase uses the hadoop-core-1.0.4.jar file that ships with it. So my question is: should I replace this JAR file with hadoop-core-1.2.1.jar so that HBase uses the latest hadoop-core JAR? And does it matter?

You don't have to do that if 1.0.4 works for you. The newer version may introduce other problems, and simply replacing hadoop-core.jar is unsafe. If you want to upgrade HBase, please follow the official guide.
Hope it helps.
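To see which hadoop-core JAR an HBase install bundles without opening the web GUI, something like the following Python sketch could be used. It assumes the bundled JARs sit in a single directory (for example the `lib` directory under the HBase home, which is an assumption about the layout) and that the file name follows the `hadoop-core-<version>.jar` pattern seen in the question. The function name `bundled_hadoop_versions` is made up for this example.

```python
import re
from pathlib import Path

# Matches names such as hadoop-core-1.0.4.jar and captures the version part.
_JAR_PATTERN = re.compile(r"^hadoop-core-(\d+(?:\.\d+)*)\.jar$")


def bundled_hadoop_versions(lib_dir):
    """Return the versions of all hadoop-core JARs found directly in lib_dir."""
    versions = []
    for jar in Path(lib_dir).glob("hadoop-core-*.jar"):
        match = _JAR_PATTERN.match(jar.name)
        if match:
            versions.append(match.group(1))
    # Sort numerically by component so that 1.10.0 comes after 1.2.1.
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
```

More than one entry in the result would suggest that two hadoop-core JARs ended up side by side, which is the kind of mixed state the answer above warns about.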
