How to run GIS code through Hadoop's prompt? - hadoop

I am running GIS code through Hadoop's prompt in the following manner:
1. Wrote the GIS code in Eclipse, including all the relevant GIS jars.
2. Went into the directory where my Eclipse workspace is.
3. Compiled the code with all the relevant jars on the classpath (the compilation was successful).
4. Built the jar.
5. Ran the same jar using Hadoop: bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file
Now, even though the code compiles without errors, when I try to execute it through Hadoop's prompt it gives me multiple issues.
Is there a workable alternative way to do the same without any hassles?
Also note that the GIS code runs beautifully in Eclipse. Since I have to do geoprocessing over Hadoop, I need to run it through Hadoop's prompt.
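The question does not show the actual error messages. If they are ClassNotFoundException or NoClassDefFoundError (an assumption), a likely cause is that the GIS jars on Eclipse's build path are not on the classpath Hadoop builds when it runs your jar. The small diagnostic sketch below asks the JVM to locate each named class and reports the ones it cannot find. ClasspathCheck is a hypothetical helper, not part of Hadoop or any GIS library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Diagnostic sketch: reports which classes cannot be loaded by the class loader
// that loaded this class (e.g. the one "hadoop jar" sets up for your job jar).
final class ClasspathCheck {

    static List<String> missingClasses(List<String> classNames) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: locate and load the class without running
                // its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError, e.g. when the class is
                // present but a superclass it needs from another jar is missing.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass fully qualified class names from your GIS jars as arguments.
        List<String> missing = missingClasses(Arrays.asList(args));
        if (missing.isEmpty()) {
            System.out.println("All " + args.length + " classes were found.");
        } else {
            System.out.println("Not on the classpath: " + missing);
        }
    }
}
```

Packaged into the same jar, it could be run the way the job is run, e.g. bin/hadoop jar my_jar_file_name.jar ClasspathCheck com.example.gis.SomeClass, where com.example.gis.SomeClass is a placeholder for a class from one of your GIS jars (this assumes the jar's manifest does not name a different Main-Class, since hadoop jar then uses the manifest's class instead). Any class it reports is one Eclipse provides but Hadoop does not.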

Related

Generating bundle of Hadoop program and using it at runtime in Java (Hadoop local mode)

I am new to Hadoop and apologize if this is a naive question.
I have a compiled jar of a Hadoop program. I want to run that Hadoop program (the compiled jar bundle) in local mode from a Java program, but I have no idea how to do that, and I could not find a tutorial that covers it. The situation is made worse by the fact that the Hadoop program's jar is created at runtime, so I cannot add the jar to my dependencies and use it that way.
Any help or pointers will be appreciated.
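One way to call into a jar that only exists at runtime is to load it with a java.net.URLClassLoader and invoke its driver's main method by reflection, a simplified version of what hadoop jar does from the command line (RunJar also unpacks the jar and adds its lib/ directory, which this sketch does not). It assumes Hadoop's own jars are already on the calling program's classpath, that the driver class name is known, and that the driver's main blocks until the job finishes. Whether the job then runs in local mode depends on the Configuration the driver builds; local is the default in a stock configuration. JarLauncher is a hypothetical name.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

// Sketch: load a jar generated at runtime and call its main class.
final class JarLauncher {

    static void runMain(File jar, String mainClassName, String[] args) throws Exception {
        URL[] urls = { jar.toURI().toURL() };
        // Parent = this program's loader, which must already see the Hadoop jars.
        try (URLClassLoader loader = new URLClassLoader(urls, JarLauncher.class.getClassLoader())) {
            Thread thread = Thread.currentThread();
            ClassLoader previous = thread.getContextClassLoader();
            // Some libraries (Hadoop's Configuration among them) resolve classes
            // through the thread's context class loader.
            thread.setContextClassLoader(loader);
            try {
                Class<?> mainClass = Class.forName(mainClassName, true, loader);
                Method main = mainClass.getMethod("main", String[].class);
                // Cast to Object so the array is passed as a single argument.
                main.invoke(null, (Object) args);
            } finally {
                thread.setContextClassLoader(previous);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: JarLauncher <path/to/generated.jar> <driver class> [job args...]
        String[] jobArgs = Arrays.copyOfRange(args, 2, args.length);
        runMain(new File(args[0]), args[1], jobArgs);
    }
}
```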

Example Jar in Hadoop release

I am learning Hadoop with the book 'Hadoop in Action' by Chuck Lam. In the first chapter, the book says that a Hadoop installation includes an examples jar and that running 'hadoop jar hadoop-*-examples.jar' will list all the examples. But when I run the command, it throws the error 'Could not find or load main class org.apache.hadoop.util.RunJar'. My guess is that the installed Hadoop doesn't have the examples jar. I have installed 'hadoop-2.1.0-beta.tar.gz' on Cygwin on a Windows 7 laptop. Please suggest how to get the examples jar.
Run the following command:
hadoop jar PathToYourJarFile wordcount inputPath OutputPath
You can find the examples jar file in your Hadoop installation directory.
What I can suggest here is that you go to the Hadoop installation directory yourself and look for a jar with a name similar to hadoop-examples.jar. Different distributions can use different names for the jar.
If you are in Cygwin, you can also run ls *examples*.jar while in the Hadoop installation directory; this narrows the listing down to jar files whose names contain the string "examples".
You can then use the jar file name directly, like this:
hadoop jar <exampleJarYourFound.jar>
Hope this takes you to a solution.
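A top-level ls *examples*.jar finds nothing if the jar sits in a subdirectory. In Hadoop 2.x tarballs the examples jar is typically under share/hadoop/mapreduce/ with a name like hadoop-mapreduce-examples-<version>.jar. The sketch below applies the same *examples*.jar glob but walks the whole installation tree; ExamplesJarFinder is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Java equivalent of "ls *examples*.jar", but searching subdirectories too.
final class ExamplesJarFinder {

    // Same glob as the shell command, matched against the file name only.
    private static final PathMatcher EXAMPLES_JAR =
            FileSystems.getDefault().getPathMatcher("glob:*examples*.jar");

    static boolean isExamplesJar(String fileName) {
        return EXAMPLES_JAR.matches(Paths.get(fileName));
    }

    static List<Path> find(Path hadoopHome) throws IOException {
        // Files.walk keeps directories open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(hadoopHome)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> EXAMPLES_JAR.matches(p.getFileName()))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: your Hadoop installation directory
        for (Path jar : find(Paths.get(args[0]))) {
            System.out.println(jar);
        }
    }
}
```

Each path it prints can then be used as the jar argument to hadoop jar.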

How to build and execute examples in Mahout in Action

I am learning Mahout in Action now and am writing to ask how to build and execute the examples in the book. I can find instructions for Eclipse, but my environment doesn't have a GUI. So I copied the first example (RecommenderIntro) to RecommenderIntro.java and compiled it with javac.
I got an error because the packages were not imported. So I am looking for:
1. Approaches to import the missing packages.
2. Even if it compiles successfully and a .class file is generated, how can I execute it? Through "java RecommenderIntro"? I can execute the Mahout examples through sudo -u hdfs hadoop jar mahout-examples-0.7-cdh4.2.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job; how can I do something similar for my own example?
3. All my data is saved in HBase tables, but in the book (and even on Google) I cannot find a way to integrate it with HBase. Any suggestions?
q1 and q2: you need a Java build tool like Maven.
You build the Hadoop job jar with 'mvn clean install'. This creates your Hadoop job in target/mia-job.jar.
You then execute your job with:
hadoop jar target/mia-job.jar RecommenderIntro inputDirIgnored outputDirIgnored
(RecommenderIntro ignores the parameters, but Hadoop forces you to specify at least two, usually the input and output directories.)
q3: You can't do this out of the box.
Option 1: Export your HBase data to a text file 'intro.csv' with content like "%userId%, %ItemId%, %score%", as described in the book, because that's the file RecommenderIntro is looking for.
Option 2: Modify the example code to read the data from HBase.
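The writing half of Option 1 can be sketched in plain Java. The HBase read is deliberately left out: Rating is a hypothetical holder standing in for whatever your table scan returns, and the sample values in main are placeholders. Each rating is written as one userId,itemId,score line with no spaces after the commas.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Sketch: write ratings (already read from HBase) in the intro.csv layout.
final class IntroCsvExport {

    // Hypothetical holder for one row read from your HBase table.
    static final class Rating {
        final long userId;
        final long itemId;
        final float score;

        Rating(long userId, long itemId, float score) {
            this.userId = userId;
            this.itemId = itemId;
            this.score = score;
        }
    }

    static String toCsvLine(Rating r) {
        return r.userId + "," + r.itemId + "," + r.score;
    }

    static void write(List<Rating> ratings, Path out) throws IOException {
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Rating r : ratings) {
                w.write(toCsvLine(r));
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data; replace with the rows scanned from HBase.
        List<Rating> ratings = Arrays.asList(new Rating(1, 101, 5.0f), new Rating(1, 102, 3.0f));
        write(ratings, Paths.get("intro.csv"));
    }
}
```

The resulting intro.csv then goes in the directory RecommenderIntro reads it from.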
PS1: For developing such an application, I'd really advise using an IDE, because it gives you code completion and lets you execute, build, etc. A simple way to get started is to download a virtual machine image with Hadoop, such as Cloudera's or Hortonworks', and install an IDE like Eclipse. You can also configure these images to use your Hadoop cluster, but you don't need to for small data sets.
PS2: The RecommenderIntro code isn't a distributed implementation and thus can't run on large datasets. It also runs locally instead of on a Hadoop cluster.

Problems compiling Hadoop

Here's the problem: I have written a simple Hadoop program to "clean" a graph saved in a text file that I will use later (with Hadoop), but I can't compile it!
The compiler can't find the Hadoop classes (IntWritable, Text, etc.), and each time I get a "cannot find symbol" error.
I've tried:
javac -classpath path/to/hadoop/root/hadoop-core-{version}.jar filename.java
I'm running Ubuntu 11.04, and the Hadoop version is 1.0.3.
The problem is that hadoop-core-{version}.jar depends on some other jars. You can find all the dependencies on the Maven repository website:
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.0.3
You should use Maven or add all the dependencies to your project to be able to build it.
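If you add the dependencies by hand rather than through Maven, one approach is to put hadoop-core and every jar in Hadoop's lib directory on the javac classpath. The sketch below builds that classpath string; it assumes the usual 1.0.x tarball layout (hadoop-core-{version}.jar in the Hadoop root, dependencies in lib/), and HadoopClasspath is a hypothetical helper name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: build a javac classpath from every *.jar in the given directories.
final class HadoopClasspath {

    // Joins entries with the platform separator (':' on Linux, ';' on Windows).
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    // Collects every *.jar directly inside each directory (not recursive).
    static List<String> jarsIn(Path... dirs) throws IOException {
        List<String> jars = new ArrayList<>();
        for (Path dir : dirs) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toString());
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: Hadoop root, e.g. path/to/hadoop/root (assumed layout: *.jar plus lib/*.jar)
        Path root = Paths.get(args[0]);
        System.out.println(join(jarsIn(root, root.resolve("lib"))));
    }
}
```

On Ubuntu its output can be fed straight to the compiler, e.g. javac -classpath "$(java HadoopClasspath path/to/hadoop/root)" filename.java.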

Hadoop can't find mahout-examples-$MAHOUT_VERSION-job.jar

I am trying to do a simple clustering job by using mahout on top of hadoop (following this tutorial).
So far, I have Hadoop running in single-node mode, and I have downloaded and built the Mahout core and examples Maven projects, but when I try to run the job I get a FileNotFoundException.
Note that I have checked that the mahout-examples-0.5-job.jar is where it is supposed to be (in D:\mahout\mahout-distribution-0.5\examples\target).
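The title shows the literal text $MAHOUT_VERSION in the jar name. One possible cause (an assumption, since the error details are not available here) is that a variable in the tutorial's command was never set, so Hadoop looked for a file literally named mahout-examples-$MAHOUT_VERSION-job.jar instead of mahout-examples-0.5-job.jar. The hypothetical JarPathCheck sketch below expands $NAME placeholders from the environment, reports any that remain unset, and checks whether the resulting file exists.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: expand $NAME placeholders in a path and check that the file exists.
final class JarPathCheck {

    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replaces each $NAME with vars.get(NAME); unknown names are left as-is.
    static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static boolean hasUnexpandedVariable(String path) {
        return VAR.matcher(path).find();
    }

    public static void main(String[] args) {
        // args[0]: the jar path from the tutorial's command, single-quoted so the
        // shell does not expand it, e.g. '.../mahout-examples-$MAHOUT_VERSION-job.jar'
        String path = expand(args[0], System.getenv());
        System.out.println("Resolved path: " + path);
        if (hasUnexpandedVariable(path)) {
            System.out.println("A variable in the path is not set in this environment.");
        }
        System.out.println("Exists: " + Files.exists(Paths.get(path)));
    }
}
```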

Resources