Running Mahout on a Hadoop Cluster

I am a Mahout/Hadoop beginner.
I am trying to run the Mahout examples given in the "Mahout in Action" book. I am able to run the examples in Eclipse without Hadoop.
Can you please let me know how to run the same examples on a Hadoop cluster?

This wiki page lists the different algorithms implemented in Mahout and how to run them. Many of them take the following as an argument:
-xm "execution method: sequential or mapreduce"
The Mahout requirements mention that it works on Hadoop 0.20.0+. See this tutorial on how to set up Hadoop on a single node and on a multi-node cluster on Ubuntu.
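For instance, a k-means run might look like the following sketch (the HDFS paths and the parameter values are placeholders, and it assumes Mahout's bin/mahout script is on your PATH):

    # -i: input vectors, -c: initial centroids, -o: output directory,
    # -k: number of clusters, -x: maximum iterations,
    # -xm: execution method (sequential or mapreduce)
    mahout kmeans -i /user/hduser/vectors -c /user/hduser/clusters-0 \
      -o /user/hduser/kmeans-out -k 10 -x 20 -xm mapreduce

With -xm sequential the same job runs locally in a single process, which is handy for trying the book's examples before moving to the cluster.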

Related

Hadoop ResourceManager does not show any job records

I installed a Hadoop multi-node cluster based on this link: http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
I then tried to run the wordcount example in my environment, but when I access the ResourceManager at http://HadoopMaster:8088 to see the job's details, no records show up in the UI.
I also searched for this problem. One person gave a solution like Hadoop is not showing my job in the job tracker even though it is running, but in my case I'm just running Hadoop's wordcount example, and I didn't add any extra configuration for YARN.
Can anyone who has successfully installed a multi-node Hadoop 2 cluster with a correctly working web UI help me with this issue, or give me a link to a correct installation guide?
Did you get the output of the word-count job?
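One common cause (an assumption here, since the question doesn't show the configuration): if mapreduce.framework.name is not set to yarn, jobs run with the local job runner, so they complete and produce output but never register with the ResourceManager UI. A minimal check of mapred-site.xml:

    <!-- mapred-site.xml: without this property, MapReduce jobs use the
         local job runner and never appear in the ResourceManager UI. -->
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>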

Deploy Mahout jobs on a cluster

I'm new to Hadoop/Mahout. I understand the concepts, but I'm having issues deploying Mahout jobs to an already set-up cluster of computers.
I have used Mahout on a single computer, but what should I do to get it up and running on an already formed Hadoop cluster?
I have a cluster with Hadoop 0.20.2 installed, and Mahout 0.9, which bundles Hadoop 1.2.1. What jars should I copy so that I can run code that contains Mahout calls, and what else should I do to make it work on the Hadoop cluster?
Any suggestion/example/tutorial would be great.
Thanks
An important link for your problem:
https://mahout.apache.org/users/clustering/k-means-commandline.html
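Beyond that link: a common way to run Mahout code on an existing cluster is to ship Mahout's self-contained *-job.jar rather than copying individual jars. A sketch, assuming Mahout 0.9 is unpacked under $MAHOUT_HOME and the config path below is a placeholder for your cluster's:

    # Point the Hadoop client at the cluster's configuration.
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    # The *-job.jar bundles Mahout and its dependencies, so it is the
    # only Mahout jar that needs to reach the cluster.
    hadoop jar $MAHOUT_HOME/mahout-examples-0.9-job.jar \
      org.apache.mahout.clustering.syntheticcontrol.kmeans.Job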

How to run Mahout from the command line with a KNN-based item recommender?

I'm new to Mahout and still trying to figure things out.
I'm trying to run a KNN-based recommender on a Hadoop cluster (a distributed recommender) using Mahout 0.8. In Mahout 0.8, KNN is deprecated, but it is still usable (at least when I use it from Java code).
I have several questions:
Is it true that there are basically two Mahout implementations?
distributed (runs from the command line)
non-distributed (runs from a jar file)
Assuming (1) is correct, does Mahout support running a KNN-based recommender from the command line? Can someone give me a direction on how to do it?
Assuming (1) is wrong, how can I build a recommender in Java (I'm using Eclipse) that runs on a Hadoop cluster (distributed)?
Thanks!
KNN is being deprecated because it is being replaced with the item-based and user-based cooccurrence recommenders and the ALS-WR recommender, which are better and more modern.
Yes, but not all of the code has a CLI interface. For the most part, the CLI jobs in Mahout are Hadoop/distributed jobs that produce files in HDFS as output. These can be run from jar files with your own code wrapping them, just as you must do with the local/non-distributed/non-Hadoop versions, which do not have a CLI. The in-memory recommenders require you to pass in a user ID to get recommendations, so you have to write code to do that. The Hadoop versions do have a CLI, since they precalculate all recommendations for all users and put them in files; you'll probably insert those into your DB or serve them up some other way.
No; to my knowledge only the user-based, item-based, and ALS-WR recommenders are supported from the command line. The CLI runs the Hadoop/distributed version of the recommenders. This can work on a single machine, of course, even using the local filesystem, since Hadoop can be set up that way.
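For illustration, an item-based run from the CLI might look like the following sketch (the HDFS paths are placeholders; the options are those of Mahout 0.8's distributed RecommenderJob):

    # Input: userID,itemID,preference triples in HDFS.
    # Output: precalculated recommendations for every user.
    mahout recommenditembased \
      --input /user/hduser/ratings.csv \
      --output /user/hduser/recommendations \
      --similarityClassname SIMILARITY_LOGLIKELIHOOD \
      --numRecommendations 10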
For the in-memory recommenders, just write your driver code and run it in Eclipse; since Hadoop is not involved, it works fine. If you want to use the Hadoop versions, set up Hadoop on your dev machine to run locally using the local filesystem, and everything works fine in Eclipse. Once you have things debugged, move the job to your Hadoop cluster. You can also debug remotely on the cluster, but that is another question altogether.
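A minimal sketch of such a driver, using the in-memory user-based recommender (the file name, user ID, and neighborhood size are placeholders; the input is a CSV of userID,itemID,preference lines):

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class InMemoryRecommenderDriver {
      public static void main(String[] args) throws Exception {
        // ratings.csv (placeholder name): lines of userID,itemID,preference
        DataModel model = new FileDataModel(new File("ratings.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Use the 10 most similar users as the neighborhood (placeholder size).
        UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
            new GenericUserBasedRecommender(model, neighborhood, similarity);
        // As noted above, you must pass in a user ID to get recommendations.
        List<RecommendedItem> recs = recommender.recommend(1L, 5);
        for (RecommendedItem rec : recs) {
          System.out.println(rec.getItemID() + " : " + rec.getValue());
        }
      }
    }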
The latest thing in Mahout recommenders is one that is trained in the background using Hadoop, after which the output is indexed by Solr. You then query Solr with items the user has expressed a preference for; there is no need to precalculate all recommendations for all users, since they are returned from a Solr query in near real time. This is in Mahout 1.0-SNAPSHOT's mahout/examples/ or here: https://github.com/pferrel/solr-recommender
BTW, this code is being integrated with Mahout 1.0 and moved to run on Spark instead of Hadoop, so even the training step will be much, much faster.
Update:
I've clarified what can be run from the CLI above.

Hadoop Multi-Node Cluster Set-up Basic Questions

I'm trying to set up a multi-node Apache Hadoop cluster, following the tutorial here: http://hadoop.apache.org/docs/stable/cluster_setup.html. I've done a single-node Hadoop setup on each individual node, with Sun Java installed on each. Unfortunately, the documentation is not very clear and a bit outdated. Am I supposed to (under the 'Configuring the Hadoop Daemons' section) update those files (conf/*-site.xml) on every single node?
Also, how am I supposed to locate the host:port/IP:port pair for each node?
Sorry, I am new to all of this. Thank you in advance for your help!
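For what it's worth, the host:port pairs live in those same conf/*-site.xml files, and the master addresses are typically kept identical on every node. A minimal sketch using the classic Hadoop 1.x property names from that tutorial's era ("master" and the ports are placeholders, though 9000/9001 are conventional):

    <!-- conf/core-site.xml: tells every node where the NameNode listens. -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
    </property>

    <!-- conf/mapred-site.xml: tells every node where the JobTracker listens. -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master:9001</value>
    </property>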

More detailed link to install HBase and Hadoop on Ubuntu

I am going to install Hadoop and HBase on Ubuntu. When I tried to search for a good link, I was unable to find one that is fully clear and descriptive. I need a detailed link from which I can easily set up Hadoop and HBase.
Thanks
You didn't mention whether you want to set these up in pseudo-distributed or fully distributed mode, or on a single node or multiple nodes. Anyway, here are some links which should be helpful for you:
hadoop single node cluster,
hadoop multi node cluster,
and for HBase, I think you should see these links:
install HBase in pseudo distributed mode,
hbase installation in fully distributed mode
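Whichever guide you follow, the heart of a pseudo-distributed HBase setup is pointing HBase at HDFS in hbase-site.xml. A minimal sketch (the NameNode address is a placeholder and must match your core-site.xml):

    <!-- hbase-site.xml: store HBase data in HDFS and run in distributed mode. -->
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
    </configuration>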
Hope it will help you. Thanks!
For pseudo-distributed mode I would suggest the following links:
HADOOP INSTALLATION
http://preciselyconcise.com/apis_and_installations/hadoop_installation.php
HBASE INSTALLATION
http://preciselyconcise.com/apis_and_installations/hbase_installation.php
