Is Hadoop 2.2.0 compatible with Mahout 0.8?

I have a Hadoop cluster running version 2.2.0 with Mahout 0.8. Are they compatible? Whenever I run this command:
bin/mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
it gives me this error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:75)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:158)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:312)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Or am I doing something wrong? Any info would be helpful.

No, it does not work with Hadoop 2.x; someone else got the same error message as you.
It seems that, at the very least, it would require a recompile.
More people are having the same problem: how can I compile/use mahout for hadoop 2.0?

About an hour ago, Mahout officially added support for Hadoop 2.x in the master branch (see MAHOUT-1329).
Check out the code from https://github.com/apache/mahout and recompile using:
mvn clean package -Dhadoop2.version=2.2.0
Try it and see if that works.
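For example, a full checkout-and-build might look like this (a sketch; the mvn flag is the one given above, and exact artifact names may vary by snapshot):
git clone https://github.com/apache/mahout.git
cd mahout
mvn clean package -Dhadoop2.version=2.2.0 -DskipTests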

You can get the source code from GitHub (https://github.com/apache/mahout) and run the following command:
mvn -Dhadoop2.version=2.2.0 -DskipTests clean install
Then you can find the release package in distribution/target/.
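Once built, you would typically point your environment at the rebuilt distribution before re-running the job. A minimal sketch, assuming a hypothetical install path (adjust to whatever directory the build actually produces):
export MAHOUT_HOME=/opt/mahout-distribution
export PATH=$MAHOUT_HOME/bin:$PATH
mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION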

Related

Hadoop error on Windows: java.lang.UnsatisfiedLinkError

I am new to Hadoop and trying to execute my first MapReduce job, wordcount.
However, whenever I try to run it, I get the following error:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(NativeCrc32.java:86)
at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum.java:430)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:202)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2217)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1905)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1873)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1838)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:246)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:166)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:191)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Also, when I do
hadoop checknative -a
it shows me the following details:
Native library checking:
hadoop: true C:\hadoop-2.6.1\bin\hadoop.dll
zlib: false
snappy: false
lz4: true revision:43
bzip2: false
openssl: false org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl()Z
winutils: true C:\hadoop-2.6.1\bin\winutils.exe
15/10/19 15:18:24 INFO util.ExitUtil: Exiting with status 1
Is there any way I can resolve this issue?
I had this error too, and I resolved it as follows:
Go to https://github.com/steveloughran/winutils.
Download "winutils.exe" and "hadoop.dll" for the Hadoop version you use.
Copy both files to HADOOP_HOME\bin.
That fixed it for me.
Note: if the "winutils.exe" and "hadoop.dll" files do not match the Hadoop version you use, it will not work.
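In a Windows command prompt, the copy step amounts to something like this (a sketch; it assumes HADOOP_HOME is already set and that the two files were downloaded to the current directory):
copy winutils.exe %HADOOP_HOME%\bin\
copy hadoop.dll %HADOOP_HOME%\bin\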
I had the same problem. After removing hadoop.dll from the System32 directory and setting the HADOOP_HOME environment variable, it worked.
Alternatively, you can add a JVM argument like -Djava.library.path=<hadoop home>/lib/native.
Check whether the same C:\hadoop-2.6.1\bin\hadoop.dll file is being used by the Java process when you get this error.
Use Process Explorer to find out.
I am using Spark 2.4.6.
I downloaded hadoop-2.9.0 from https://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/
Then, when executing my code (which also writes output to an external file), I passed
-Djava.library.path=<path to extracted hadoop-2.9.0>/lib/native
as a VM argument.
That's it; it worked for me.
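For reference, that VM argument can be passed either to a plain java invocation or, if you launch through spark-submit, via --driver-java-options. The paths and jar name below are placeholders:
java -Djava.library.path=C:\hadoop-2.9.0\lib\native -jar myapp.jar
spark-submit --driver-java-options "-Djava.library.path=/opt/hadoop-2.9.0/lib/native" myapp.jar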
Check your Java version. If your Java is a 32-bit version, you need to uninstall it and reinstall a 64-bit version for Hadoop.
Check with these commands:
java -d32 -version
java -d64 -version (no error if you have the 64-bit version)

Error running Hive at the Linux command line: Java 1.7, Hadoop 2.3.0, Apache Hive 1.1.0

root#hadoop:~# hive
Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-1.1.0-bin/lib/hive-common-1.1.0.jar!/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.console.ConsoleReader.<init>(ConsoleReader.java:230)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
You should make Hadoop load the user classpath first:
vi ~/.bashrc
Add export HADOOP_USER_CLASSPATH_FIRST=true to your .bashrc, then:
source ~/.bashrc
hive
OK, bingo!
Yes, the above suggestion solves the problem,
but I ran into problems when running Pig and Hive from the same instance.
I removed jline-0.9.94.jar from $HADOOP_HOME/share/hadoop/yarn/lib.
Now I am able to run both.
I had the same problem and got it working using this link:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
Hive has upgraded to JLine2, but jline 0.9x still exists in the Hadoop lib.
So you should follow these steps:
Delete jline from the Hadoop lib directory (it's only pulled in transitively from ZooKeeper).
export HADOOP_USER_CLASSPATH_FIRST=true
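Concretely, the two steps might look like this (a sketch; the jar name and location come from the answer above and may vary by Hadoop release):
find $HADOOP_HOME -name "jline-*.jar"                   # locate the old jline pulled in via ZooKeeper
rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar  # remove it (back it up first if unsure)
export HADOOP_USER_CLASSPATH_FIRST=true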

Exception - java.lang.IllegalArgumentException: Label not found in Mahout

I am running the following commands:
./mahout trainnb
-i ${WORK_DIR}/20news-train-vectors -el
-o ${WORK_DIR}/model
-li ${WORK_DIR}/labelindex
-ow
./mahout testnb
-i ${WORK_DIR}/20news-test-vectors
-m ${WORK_DIR}/model
-l ${WORK_DIR}/labelindex
-ow -o ${WORK_DIR}/20news-testing
On running the last command, the map task reaches 100%, but on the reduce task I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 10002
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:182)
at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:205)
at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:209)
at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:173)
at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:70)
at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:160)
at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:125)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java :43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I am following the example from http://www.packtpub.com/article/implementing-the-na%C3%AFve-bayes-classifier-in-mahout, and I have also tried seqdumper on the labelindex and can see the keys and values in it.
I am using Hadoop 2.2 and Mahout 1.0, and the whole environment is set up on Amazon EC2.
Please help me out. Am I doing something wrong?
I think Mahout is not compatible with your Hadoop version; you should download the 1.1.0 or 1.2.0 version of Hadoop.
This will probably fix your problem.
I guess you have your files on the local filesystem. I also had this problem, and I fixed it when I moved the files to HDFS, as sketched below.
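A minimal sketch of that move, assuming ${WORK_DIR} is a directory in HDFS and the vectors and label index currently sit in the local working directory:
hdfs dfs -mkdir -p ${WORK_DIR}
hdfs dfs -put 20news-train-vectors 20news-test-vectors labelindex ${WORK_DIR}/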

Trying to load indexed LZO file using LzoPigStorage and elephant-bird

I've got a log file with default LZO compression and an .index file generated using Hadoop-LZO, but when I run a simple Pig file to retrieve the top 100 records using LzoPigStorage, I get the following exception:
Message: Unexpected System Error Occured: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:130)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:724)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
... 3 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.listStatus(LzoInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:269)
at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.getSplits(LzoInputFormat.java:111)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
I am running Hadoop 2.0, Pig 0.11, and elephant-bird 2.2.3.
I don't use elephant-bird, so I'm not entirely sure that this is the problem.
But looking at their build for v2.2.3, it was compiled against hadoop-0.20.2 and pig-0.9.2. I've seen problems with UDFs in Pig when running on a newer version than what the UDF was compiled against.
Are you able to upgrade elephant-bird to a newer version or recompile against the correct libraries?
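If you go the recompile route, the general shape would be something like the following (a sketch only; I'm assuming the elephant-bird build exposes a Hadoop version property and that a release tag with this name exists -- check its POM/build files and tag list before relying on this):
git clone https://github.com/twitter/elephant-bird.git
cd elephant-bird
git checkout elephant-bird-2.2.3                             # hypothetical tag name; list tags with `git tag`
mvn clean package -Dhadoop.version=2.0.0-alpha -DskipTests   # property name is an assumption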

Hadoop in Windows

I'm trying to run the sample word count program for Hadoop on Windows through Cygwin. I've installed Hadoop and Cygwin.
I run the wordcount program using this command:
$ bin/hadoop jar hadoop-examples-1.0.1.jar wordcount input output
I'm getting the following error:
12/05/08 23:05:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/08 23:05:35 ERROR security.UserGroupInformation: PriviledgedActionException as:suresh cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I've set the Cygwin bin path in the PATH variable. Any help is appreciated.
This is a known issue with some versions of Hadoop (see https://issues.apache.org/jira/browse/HADOOP-7682 for the full discussion).
I had this problem with version 1.0.2, so I tried various other versions.
In the end I got it to work by going back to version 0.22.0.
If you go back to version 0.22.0, you will need to make a couple of changes to the bin/hadoop-config.sh script:
Change the line that sets up HADOOP_MAPRED_HOME to point to the mapreduce directory instead of the mapred directory.
Comment out all the code that sets java.library.path for a native Hadoop install.
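As an illustration only (the exact variable names in 0.22.0's script may differ slightly), the two edits amount to something like:
# in bin/hadoop-config.sh -- illustrative, not verbatim from the release
HADOOP_MAPRED_HOME="$HADOOP_HOME/mapreduce"   # was pointing at .../mapred
# ...and comment out the block that appends the native library dirs to java.library.path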
You should look at Hadoop services for Windows, a version of Hadoop that was ported to Windows.
Currently it's in beta, but it's supposed to be released soon.
I've managed to get this working to the point where jobs are dispatched, tasks executed, and results compiled.
en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin
