mahout won't start up. Does it have anything to do with version compatibility between Hadoop and Mahout? - hadoop

I am new to Hadoop, not to mention Mahout. I hope someone can help me get through this; I have been trying for two days.
I already have a Hadoop cluster running.
I am using hadoop-2.0.0-alpha.
I installed Mahout (mahout-distribution-0.7) and maven-2.2.1 (the latest maven-3.0.4 doesn't work).
Now I would just like to run mahout to get an idea of what it is.
I learnt that typing "mahout" should print out a list of the options (algorithms) available in Mahout, but when I typed mahout, it just gave me a Java exception.
[hadoop@localhost bin]$ mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /home/hadoop/hadoop/bin/hadoop and HADOOP_CONF_DIR=/home/hadoop/hadoop/conf
MAHOUT-JOB: /home/hadoop/mahout/examples/target/mahout-examples-0.7-job.jar
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
From what I found online, most of the answers suggest using a lower version of Hadoop, i.e. hadoop-0.20. Does my problem have something to do with my Hadoop version?
Thank you.
======== NEWLY EDITED ========
I changed my Hadoop version to hadoop-1.0.3 and now it works when I type "mahout" (my Mahout is version 0.7).
But it fails again with a similar error when I try to run an example:
$ hadoop /home/hadoop/mahout/core/target/mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.output.dir=output -Dmapred.input.dir=input/prefs.txt --usersFile input/users.txt --similarityClassname SIMILARITY_PEARSON_CORRELATION
Caused by: java.lang.ClassNotFoundException: .home.hadoop.mahout.core.target.mahout-core-0.7-job.jar
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: /home/hadoop/mahout/core/target/mahout-core-0.7-job.jar. Program will exit.
Hmm..

Yes, it looks like you need to use a different version of Hadoop (or build the latest Mahout from source) if you want this to work. You got a NoSuchMethodError, so the first thing to do is check whether the ProgramDriver class is in the Hadoop distribution you're using (see the quick check after the links below).
Looking at the API docs for the various versions, you can see that it's in 0.20.x but has been removed from the newer versions.
http://hadoop.apache.org/common/docs/r0.20.205.0/api/index.html
http://hadoop.apache.org/common/docs/r2.0.0-alpha/api/index.html (your version)
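A quick way to check is to look inside the Hadoop jars themselves. This is a hedged sketch; the jar names and paths are assumptions, so adjust them to your installation:
$ jar tf $HADOOP_HOME/hadoop-core-*.jar | grep ProgramDriver
$ # For a Hadoop 2.x layout, the class (if present) would live in hadoop-common; javap also shows its method signatures:
$ javap -cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar org.apache.hadoop.util.ProgramDriver
If the class is missing, or its driver(String[]) method has a different signature, the NoSuchMethodError above is what you get.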
Looking at the Mahout JIRA, you can see a bug was submitted for a similar problem on July 11th and has been fixed in version 0.8.
[MAHOUT-1044] https://issues.apache.org/jira/browse/MAHOUT-1044
Update:
Shouldn't your command have jar after the hadoop command? Something like:
$ hadoop jar /home/hadoop/mahout/core/target/mahout-core-0.7-job.jar
etc
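For reference, the full command from the question with jar inserted would look like this (paths and options are copied from the question as-is, not verified):
$ hadoop jar /home/hadoop/mahout/core/target/mahout-core-0.7-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.output.dir=output -Dmapred.input.dir=input/prefs.txt \
    --usersFile input/users.txt --similarityClassname SIMILARITY_PEARSON_CORRELATION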

@BinaryNerd is right. There is a bug in Mahout, as detailed in:
[MAHOUT-1044] https://issues.apache.org/jira/browse/MAHOUT-1044
The Mahout 0.7 command line gives the ProgramDriver error described in the first part of your question. This is fixed in 0.8, or you can edit your bin/mahout as described in the bug report to fix the brace placement.
I ran into the same issue using Mahout under Cloudera CDH 4.1.2. Changing the brace in my bin/mahout fixed it.

Related

Flink error - org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

I am trying to run a Flink job using a file from HDFS. I have created a dataset as follows:
DataSource<Tuple2<LongWritable, Text>> visits = env.readHadoopFile(new TextInputFormat(), LongWritable.class,Text.class, Config.pathToVisits());
I am using flink's latest version - 0.9.0-milestone-1-hadoop1
(I have also tried with 0.9.0-milestone-1)
whereas my Hadoop version is 2.6.0
But I get the following exception when I try to execute the job. I have searched for similar problems, and it seems to be related to a version incompatibility between the client and HDFS.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Can you please let me know what changes I should make in my pom so that it points to the correct Hadoop/HDFS version, or changes elsewhere?
Or do I need to downgrade the Hadoop installation?
Have you tried the Hadoop-2 build of Flink? Have a look at the downloads page. There is a build called flink-0.9.0-milestone-1-bin-hadoop2.tgz that should work with Hadoop 2.
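A hedged sketch of grabbing that build (the exact URL is an assumption; take the link from the Flink downloads page if it differs):
$ wget https://archive.apache.org/dist/flink/flink-0.9.0-milestone-1/flink-0.9.0-milestone-1-bin-hadoop2.tgz
$ tar xzf flink-0.9.0-milestone-1-bin-hadoop2.tgz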

Error in hadoop examples.jar

I just installed Hadoop from the Yahoo Developer Network, running on a VM. After start-all.sh, I cd'ed to the bin folder and ran the following:
hadoop jar hadoop-0.19.0.-examples.jar pi 10 1000000
I'm getting
java.io.IOException: Error opening job jar: hadoop-0.18.0-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.util.ZipException: error in opening zip file
How do I sort this out?
Please make sure you have the following things in place:
Your examples.jar file is present in the path where you are running the above command; otherwise you need to give the complete path to the jar file.
hadoop jar /usr/lib/hadoop-mapreduce/*example.jar pi 10 100000
It has appropriate read permissions for the user you are using to run the Hadoop job.
If you still face issue, please update logs in your question.
You will face this issue if you are using an older version of Java. Hadoop needs Java 7 or Java 8. Please check your Java version and update it if needed.
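A few hedged checks along those lines (the jar name is the one from the question and may need adjusting):
$ ls -l hadoop-*-examples.jar             # does the file exist here, and is it readable?
$ java -version                           # which Java is on the PATH?
$ unzip -l hadoop-*-examples.jar | head   # a corrupt or truncated jar will fail here with a zip error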

Is it possible to run Hadoop jobs (like the WordCount sample) in the local mode on Windows without Cygwin?

I have Windows 7, Java 8, Maven and Eclipse.
I've created a Maven project and used almost exactly the same code as here.
It's just a simple "word count" sample.
I try to launch the "driver" program from Eclipse, providing command-line arguments (the input file and the output directory), and get the following error:
Exception in thread "main" java.lang.NullPointerException at
java.lang.ProcessBuilder.start(ProcessBuilder.java:1012) at
org.apache.hadoop.util.Shell.runCommand(Shell.java:404) at
org.apache.hadoop.util.Shell.run(Shell.java:379) at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at
org.apache.hadoop.util.Shell.execCommand(Shell.java:678) at
org.apache.hadoop.util.Shell.execCommand(Shell.java:661) at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639) at
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:435) at
org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:277) at
org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125) at
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:344) at
org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at
org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at
org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at
org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at
misc.projects.hadoop.exercises.WordCountDriverApp.main(WordCountDriverApp.java:29)
The failing line (WordCountDriverApp.java:29) contains the call that launches the job:
job.waitForCompletion(true)
I want to make it work, and therefore I want to understand something:
Do I have to provide hdfs-site.xml, yarn-site.xml, ... all of that, if I just want local mode (without any cluster)?
I don't have these XML config files now. As far as I remember, the defaults are all OK for local mode, but maybe I am wrong.
Is it possible at all under Windows (to launch any Hadoop jobs whatsoever), or is the whole Hadoop thing Linux-only?
P.S.:
The Hadoop dependency is the following:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
Download Hadoop 2.6.0 or 2.7.1 compiled for Windows.
Create a HADOOP_HOME environment variable pointing to the unzipped dir.
Add %HADOOP_HOME%\bin to the PATH env var (a minimal sketch of these two steps follows below).
Source: https://stackoverflow.com/a/27394808/543836
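A minimal Windows command-prompt sketch of those two steps, assuming the archive was unzipped to C:\hadoop-2.7.1 (the path is an assumption; adjust it):
set HADOOP_HOME=C:\hadoop-2.7.1
set PATH=%PATH%;%HADOOP_HOME%\bin
This only affects the current session; use the System Properties dialog (or setx) to make the variables permanent, and restart Eclipse so it picks them up.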
Hadoop runs on Windows; it is possible, but you'll grow white hair if you try to pull it off on your own.
To start with, all filesystem operations in Windows Hadoop are routed either through NativeIO, if available, or via winutils if NativeIO is not loaded. In your case it took the winutils path. You could make NativeIO available if you instruct Eclipse where to find it. See How to add native library to "java.library.path" with Eclipse launch (instead of overriding it): you need to add the location of the hadoop-common-project target's bin, where you'll find hadoop.dll, which hosts NativeIO. But even after that, you'll still need winutils for container launch. The winutils.exe will be in that same location (the hadoop-common target/bin), but the code looks for it based on %HADOOP_HOME%, so you'll have to define that. And it will go uphill from there. I intentionally omitted the details of how to configure all this because I don't think you should, or to be more precise, you should only if you understand how to do it.
It would be much, much easier if you take an off-the-shelf Hadoop distribution for Windows, of which there is exactly one: the HDP from Hortonworks. Download it, install it, configure it and then run against the 'cluster'.

Trouble with Hadoop RecommenderJob

I have added my input files 'input.txt' and 'users.txt' to HDFS successfully. I have tested Hadoop and Mahout jobs separately with success. However, when I go to run a RecommenderJob with the following command line:
bin/hadoop jar /Applications/mahout-distribution-0.9/mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=/user/valtera45/input/input.txt -Dmapred.output.dir=/user/valtera45/output
--usersFile /user/valtera45/input2/users.txt --similarityClassname SIMILARITY_COOCCURRENCE
This is the output I get:
Exception in thread "main" java.io.IOException: Cannot open filename /user/valtera45/temp/preparePreferenceMatrix/numUsers.bin
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.&lt;init&gt;(DFSClient.java:1435)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:172)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Whenever I run a standalone Mahout job, a temp folder gets created within the Mahout directory. The RecommenderJob can't seem to get past this step. Any ideas? Thanks in advance. I know the input files I am using are well formatted because they have worked successfully for others.
hadoop jar mahout-core-0.8-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=large_data.csv -Dmapred.output.dir=output/output1.csv -s SIMILARITY_LOGLIKELIHOOD --booleanData --numRecommendations 5
I am using this, and my program runs successfully on an EC2 instance with Mahout and Hadoop, but I am not able to get relevant results. If anyone knows anything about this, please respond.
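A hedged suggestion for the numUsers.bin failure above: one common cause is a stale temp directory left over from an earlier run, so clearing it, or pointing the job at a fresh one with --tempDir, is worth trying. The paths below mirror the ones in the question and are otherwise assumptions:
$ hadoop fs -rmr /user/valtera45/temp     # use "hadoop fs -rm -r" on Hadoop 2.x
$ hadoop jar /Applications/mahout-distribution-0.9/mahout-core-0.9-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.input.dir=/user/valtera45/input/input.txt -Dmapred.output.dir=/user/valtera45/output \
    --usersFile /user/valtera45/input2/users.txt --similarityClassname SIMILARITY_COOCCURRENCE \
    --tempDir /user/valtera45/temp2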

run pig 0.7.0 error : ERROR 2998: Unhandled internal error

I have to connect Pig to a Hadoop cluster that is slightly changed from Hadoop 0.20.0. I chose Pig 0.7.0 and set PIG_CLASSPATH with
export PIG_CLASSPATH=$HADOOP_HOME/conf
When I run Pig, an error is reported like this:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
So I copied hadoop-core.jar from $HADOOP_HOME over hadoop20.jar in $PIG_HOME/lib, then ran "ant". Now I can run Pig, but when I use dump or store, I get another error:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
at org.apache.pig.builtin.BinStorage.setStoreLocation(BinStorage.java:369)
...
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)
================================================================================
Has anyone encountered this error, or is my way of compiling not right?
Thanks.
There is a section about this issue in the Pig FAQ which should give you a good idea of what's wrong. Here is the outline taken from that page:
This usually happens when you are connecting hadoop cluster other than standard Apache hadoop 20.2 release. Pig bundles standard hadoop 20.2 jars in release. If you want to connect to other version of hadoop cluster, you need to replace bundled hadoop 20.2 jars with compatible jars. You can try:
do "ant"
copy hadoop jars from your hadoop installation to overwrite ivy/lib/Pig/hadoop-core-0.20.2.jar and ivy/lib/Pig/hadoop-test-0.20.2.jar
do "ant" again
cp pig.jar to overwrite pig-*-core.jar
Some other tricks are also possible. You can use "bin/pig -secretDebugCmd" to inspect the command line of Pig. Make sure you are using the right version of Hadoop.
As pointed out in this FAQ section, if nothing works I would advise just upgrading to a recent version of Pig (after 0.9.1); Pig 0.7 is a bit old.
The Pig (core) jar has a bundled Hadoop dependency, which may differ from the version you want to use. If you have an old Pig version (< 0.9) then you have the option to build a jar without Hadoop:
cd $PIG_HOME
ant jar-withouthadoop
cp $PIG_HOME/build/pig-x.x.x-dev-withouthadoop.jar $PIG_HOME
Then start Pig:
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:$PIG_HOME/pig-x.x.x-dev-withouthadoop.jar; ./pig
Newer Pig versions contain the prebuilt withouthadoop version (see this ticket), so you can skip the building process. Furthermore, when you run Pig it will pick up the withouthadoop jar from PIG_HOME rather than the bundled version, so you don't need to add withouthadoop.jar to the PIG_CLASSPATH either (provided that you run Pig from $PIG_HOME/bin).
Back to your question:
Hadoop 0.20 and its modified variant (0.20-append?) can work even with the latest Pig distribution (0.11.1):
You just need to do the following:
unpack Pig 0.11.1
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf; ./pig
If you still get "Failed to create DataStorage", it's worth starting Pig with -secretDebugCmd as Charles Menguy suggested, so that you can see whether Pig gets the right Hadoop version, etc.
Did you remember to run start-all.sh from /usr/local/bin? I ran into the same problem and I basically retraced my steps in configuring Hadoop itself. I am able to use Pig now.
