Hadoop can't find mahout-examples-$MAHOUT_VERSION-job.jar - hadoop

I am trying to run a simple clustering job using Mahout on top of Hadoop (following this tutorial).
So far I have Hadoop running in single-node mode, and I have downloaded and built the Mahout core and examples Maven projects, but when I try to run the job I get a FileNotFoundException. Here is a screenshot.
Note that I have checked that mahout-examples-0.5-job.jar is where it is supposed to be (in D:\mahout\mahout-distribution-0.5\examples\target).
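One thing worth checking (an assumption about the cause, since the error message still shows the literal $MAHOUT_VERSION): if the MAHOUT_VERSION environment variable is unset, the launcher expands the jar name to one that does not exist on disk. A minimal shell sketch:

```shell
# With MAHOUT_VERSION unset, "mahout-examples-${MAHOUT_VERSION}-job.jar"
# expands to "mahout-examples--job.jar", which is not on disk.
export MAHOUT_VERSION=0.5   # match the version you actually built
jar="mahout-examples-${MAHOUT_VERSION}-job.jar"
echo "$jar"   # prints mahout-examples-0.5-job.jar
```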

Related

How to run mapreduce samples using jar file?

I am trying to run MapReduce samples such as Bellard, LongLong, Montgomery, Summation, DistributedPentomino, and OneSidedPentomino using the Hadoop MapReduce examples jar. Can anyone please tell me the command to run these samples from the jar file?
Here is the exact chapter from HortonWorks about Hadoop MapReduce's out-of-the-box examples (it is part of their guide):
Running MapReduce examples on Hadoop YARN.
Another, more straightforward short article, with less detail:
Run Sample MapReduce Examples
In fact, for Hadoop version XXX you need:
export YARN_EXAMPLES=$YARN_HOME/share/hadoop/mapreduce
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-XXX.jar
Please check your actual path.
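For example (a sketch against a running YARN cluster; XXX stays a placeholder for your actual version, and the pi arguments are arbitrary): running the jar with no program name makes Hadoop print the list of valid example names.

```shell
# No program name: Hadoop prints "Valid program names are: ..." with the full list
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-XXX.jar

# Run one of the listed examples, e.g. the pi estimator (16 maps x 1000 samples)
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-XXX.jar pi 16 1000
```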

How to build and execute examples in Mahout in Action

I am learning Mahout in Action now, and I am writing to ask how to build and execute the examples in the book. I can find instructions for Eclipse, but my environment doesn't include a UI, so I copied the first example (RecommenderIntro) to RecommenderIntro.java and compiled it with javac.
I got an error because the packages were not imported, so I am looking for:
1. Approaches to import the missing packages.
2. I guess that even if it compiles successfully and a .class file is generated, I still need to know how to execute it. Through "java RecommenderIntro"? I can execute the Mahout examples through sudo -u hdfs hadoop jar mahout-examples-0.7-cdh4.2.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job; how can I do something similar for my own example?
3. All my data is saved in HBase tables, but in the book (and even on Google) I cannot find a way to integrate Mahout with HBase; any suggestions?
For q1 and q2, you need a Java build tool like Maven.
You build the Hadoop jar with 'mvn clean install'. This creates your Hadoop job jar in target/mia-job.jar.
You then execute your job with:
hadoop jar target/mia-job.jar RecommenderIntro inputDirIgnored outputDirIgnored
(RecommenderIntro ignores its parameters, but Hadoop forces you to specify at least two parameters, usually the input and output dirs.)
For q3: you can't out of the box.
Option 1: export your HBase data to a text file intro.csv with content like "%userId%, %ItemId%, %score%", as described in the book, because that is the file RecommenderIntro looks for.
Option 2: modify the example code to read data from HBase.
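For Option 1, intro.csv is just one userId,itemId,score triple per line. A minimal sketch of producing it from a tab-separated dump (the file names and the three sample rows here are made up for illustration):

```shell
# Convert a tab-separated dump (userId<TAB>itemId<TAB>score) into the
# comma-separated triples the book's RecommenderIntro reads.
printf '1\t101\t5.0\n1\t102\t3.0\n2\t101\t2.0\n' > export.tsv
awk -F'\t' '{ print $1 "," $2 "," $3 }' export.tsv > intro.csv
cat intro.csv   # prints 1,101,5.0 / 1,102,3.0 / 2,101,2.0
```

The same awk one-liner works on whatever delimiter your HBase export tool emits.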
ps1. For developing such an application I'd really advise using an IDE, because it gives you code completion, building, executing, etc. A simple way to get started is to download a virtual image with Hadoop, like Cloudera's or HortonWorks', and install an IDE like Eclipse in it. You can also configure these images to use your Hadoop cluster, but you don't need to for small data sets.
ps2. The RecommenderIntro code isn't a distributed implementation and thus can't run on large data sets. It also runs locally rather than on a Hadoop cluster.

Hadoop Eclipse Plugin Errors in Ubuntu

I am trying to build some programs on Hadoop under Ubuntu. I was able to successfully install and run Hadoop on my machine in pseudo-distributed mode, but when I tried to use the Eclipse plugin to create a project, I ran into several issues. After entering the parameters for connecting to the server in the Eclipse plugin, I get the following error:
1. Error: java.io.IOException: Unknown protocol to JobTracker: org.apache.hadoop.hdfs.protocol.ClientProtocol
I am using Hadoop version 0.20, and the Eclipse plugin is the one from the distribution. Any suggestion as to why these errors occur? And what can I do to build a Hadoop project in Eclipse?
Go to "Edit hadoop location" and swap the Map/Reduce Master port with the DFS Master port: the error means the plugin is speaking HDFS's ClientProtocol to the JobTracker's port.
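Concretely, the two ports in the plugin must match your Hadoop configuration: DFS Master corresponds to fs.default.name (core-site.xml) and Map/Reduce Master corresponds to mapred.job.tracker (mapred-site.xml). A sketch for a typical 0.20 pseudo-distributed setup (localhost:9000/9001 are conventional values, not necessarily yours):

```xml
<!-- core-site.xml: the plugin's "DFS Master" must point at this host:port -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- mapred-site.xml: the plugin's "Map/Reduce Master" must point at this host:port -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
```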

How to run GIS codes through hadoop's prompt?

I am running a GIS code through Hadoop's prompt in the following manner:
1. Wrote the GIS code in Eclipse, including all the relevant GIS jars.
2. Went into the directory where my Eclipse workspace is.
3. Compiled the code, adding all the relevant jars to the classpath. (The compilation was successful.)
4. Built the jar.
5. Ran the jar using Hadoop: bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file
Now, despite the code being error free, when I try to execute it through Hadoop's prompt, it gives me multiple issues. Is there a workable alternative way to do the same without any hassles?
Also note that the GIS code runs beautifully in Eclipse. Since I have to do geoprocessing over Hadoop, I need to run it through Hadoop's prompt.
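A common cause of such failures (an assumption, since the actual errors aren't shown) is that the GIS jars visible in Eclipse are not on the classpath when Hadoop runs the job. Two standard ways to supply them, sketched with a placeholder jar name (gis-deps.jar):

```shell
# 1. Ship extra jars with the job via -libjars (works when the main class
#    parses its arguments with GenericOptionsParser/ToolRunner);
#    HADOOP_CLASSPATH makes the same jars visible to the client-side driver.
export HADOOP_CLASSPATH=gis-deps.jar
bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file \
    -libjars gis-deps.jar

# 2. Alternatively, package the dependency jars inside a lib/ directory of
#    my_jar_file_name.jar; Hadoop adds a job jar's lib/ contents to the
#    task classpath automatically.
```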

Hadoop Mapreduce with two jars (one of the jars is needed on namenode only)

The mapred task is a very simple 'wordcount' implemented in Java (please see http://wiki.apache.org/hadoop/WordCount).
After the last line, "job.waitForCompletion(true);", I add some code implemented in Jython, which means the Jython libraries are only needed on the namenode.
However, I added all the Jython libraries to a single jar and then executed it:
hadoop jar wordcount.jar in out
The wordcount completes without any problem.
The problem I want to solve is that I have heavy Jython libraries that are not needed on the slave nodes (the mappers and reducers); the jar is almost 15 MB (more than 14 MB of it is Jython).
Can I split them and still get the same results?
Nobody answered this question, so I've solved the problem myself, as follows (even if it's not the best way):
Simply copy jython.jar to /usr/local/hadoop (or wherever Hadoop is installed), which is on Hadoop's default classpath, and build your job jar without jython.jar.
If you need very big libraries for a mapreduce task, then:
1. Upload jython.jar to HDFS:
hadoop fs -put jython.jar Lib/jython.jar
2. Add the following line to your driver code (DistributedCache.addFileToClassPath takes a Path plus the job's Configuration):
DistributedCache.addFileToClassPath(new Path("Lib/jython.jar"), conf);
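Putting the second approach together end to end (a sketch; the jar and HDFS paths follow the answer above, and the driver change is shown as a comment because it belongs in the Java job setup, not the shell):

```shell
# 1. Build wordcount.jar WITHOUT bundling jython.jar, keeping the job jar small.
# 2. Upload the heavy library to HDFS once:
hadoop fs -put jython.jar Lib/jython.jar
# 3. In the driver, before job submission, put it on the task classpath, e.g.:
#      DistributedCache.addFileToClassPath(new Path("Lib/jython.jar"), conf);
# 4. Submit the thin jar exactly as before:
hadoop jar wordcount.jar in out
```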
