Exception running MRBenchmark on hadoop cluster - hadoop

I have set up hadoop in a distributed manner and it is working correctly. I have been trying to run the mrbenchmark as per the instructions given in http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
However I get the following exception,
hadoop jar build/hadoop-0.20-test.jar mrbench -numRuns 1 -maps 2 -reduces 1 -inputLines 1 -inputType ascending
MRBenchmark.0.0.2
14/03/16 21:37:03 INFO conf.ClientConfigurationUtil: Client configuration lookup disabled/failed. Using default configuration
14/03/16 21:37:04 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
14/03/16 21:37:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/16 21:37:04 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_1714244267.txt
14/03/16 21:37:04 INFO fs.FileSystem: File /benchmarks/MRBench is being deleted only through Trash.
Moved to trash: /benchmarks/MRBench
java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:514)
at java.util.Properties.setProperty(Properties.java:161)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:476)
at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:474)
at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:180)
at org.apache.hadoop.mapred.MRBench.run(MRBench.java:289)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:68)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.mapred.MRBench.main(MRBench.java:211)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I am using Hadoop version 20.2.
Thanks in advance.

Including the parameters -jar and specifying an example jar solved my problem ( I mentioned the mrbenchmark jar again).
eg. hadoop jar build/hadoop-0.20-test.jar mrbench -numRuns 1 -maps 2 -reduces 1 -inputLines 1 -inputType ascending -jar build/hadoop-0.20-test.jar
Even though the Benchmark states that the argument -jar is initialized to the current jar, this is not the case. The -jar argument has to be specified.

Related

Input path does not exist: hdfs://localhost:9000/user/rab/input

Following is the output of my job ... on single mode it runs well but on pseudo distribute mode it throws the following error all the time... I have tried a lot but could not meet a possible solution from anyone yet. I need a quick fix of the problem.
Highly obliged upon success...
rab#rab-VirtualBox:~/hadoop/hadoop-1.2.1$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
17/03/17 22:13:12 INFO util.NativeCodeLoader: Loaded the native-hadoop library
17/03/17 22:13:12 WARN snappy.LoadSnappy: Snappy native library not loaded
17/03/17 22:13:12 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-rab/mapred/staging/rab/.staging/job_201703172201_0004
17/03/17 22:13:12 ERROR security.UserGroupInformation: PriviledgedActionException as:rab cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/rab/input
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/rab/input
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
The log output shows error:
"org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/rab/input"
Do you have a directory called "input" under /user/rab on the HDFS file system? The error indicates that you don't ! The error also shows that it is looking at hdfs and not the local file system. You can check with the following command
"hdfs dfs -ls /user/rab"
The full command that you used "bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'" expects the command to be of the form "hadoop jar jarfilename classname hdfsinputdirectory hdfsoutputdirectory" where classname is the name of the class in the jar file that you want to run.

File not found exception getting while import export table in hbase

i am running this command "hbase org.apache.hadoop.hbase.mapreduce.Driver export 'temp' /dump"
but i am getting exception Actually i have to export table and import in different database.
2016-06-15 17:56:49,365 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-06-15 17:56:49,463 INFO [main] mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
2016-06-15 17:56:49,745 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
2016-06-15 17:56:50,058 INFO [main] Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-06-15 17:56:50,058 INFO [main] jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-06-15 17:56:50,289 INFO [main] mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/abhay1268840199/.staging/job_local1268840199_0001
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:61)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hbase/lib/zookeeper-3.4.6.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
... 5 more
I know this is definitely common environment issue. Please try to export classpath first and then try to run Export command
export HADOOP_CLASSPATH=`hadoop classpath`:`hbase classpath`
java -cp "$HADOOP_CLASSPATH/*"
second command java -cp (is for debugging purpose)will expand each jar file in the classpath to see whether the missing jar exists in this list or not...
Another quick solution (not preferable but for the time being kind of) : copy the jar hdfs://localhost:54310/usr/local/hbase/lib/zookeeper-3.4.6.jar to that location.

Mapreduce hadoop error with Gradle

The error which I am getting is as :
16/02/10 11:21:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/10 11:21:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp/outku already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at mr.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Please help me How can I resolve this Error or what is making this to happen.
Everytime hadoop run a MapReduce, it's create a folder as an output.
If you want to run the same job, you can specify diferent output folder name, or delete the past one.
FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp/outku. It means output directory already is exist.
Always specify the output directory name at run time(i.e Hadoop will create the directory automatically for you. You need not to worry about the output directory creation)
you have to delete the existing output directory and try running the M/r Job or change the output directory name

Executing example on Giraph1.1.0 on hadoop 2.3.0-cdh-5.0.shows the following error

root#pseudo-hadoop:/usr/lib/hadoop# bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op output/shortestpaths -w 1
14/06/12 17:32:32 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/06/12 17:32:32 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
14/06/12 17:32:32 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/06/12 17:32:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/06/12 17:32:33 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/root/.staging/job_201406121249_0012
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.giraph.bsp.BspOutputFormat.checkOutputSpecs(BspOutputFormat.java:43)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:987)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:250)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The first warnings don't matter.
A java.lang.IncompatibleClassChangeError suggests you have built Giraph against the wrong version of Hadoop. Try building using the correct profile, e.g. mvn -Phadoop_2.0.0 package

Hadoop in Windows

I'm trying to run the sample word count program for Hadoop in Windows thru Cygwin. I've installed Hadoop and Cygwin.
I run the wordcount program using this statment:
$ bin/hadoop jar hadoop-examples-1.0.1.jar wordcount input output
I'm getting the following error:
12/05/08 23:05:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/08 23:05:35 ERROR security.UserGroupInformation: PriviledgedActionException as:suresh cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I've set the Cygwin bin path in path variable. Any help is appreciated.
This is a known issue with some versions of Hadoop (see https://issues.apache.org/jira/browse/HADOOP-7682 for the full discussion).
I had this problem with version 1.0.2, so I tried various other versions.
In the end I got it to work by going back to version 0.22.0
If you go back to version 0.22.0 you will need to make a couple of changes to the bin/hadoop-config.sh script:
Change the line that sets up HADOOP_MAPRED_HOME to point the mapreduce directory instead of the mapred directory.
Comment out all the code that sets the java.library.path for a native hadoop install.
You should look at hadoop services for windows - a version of hadoop that was ported to windows.
currently it's in beta, but it's supposed to be released soon.
I've managed to get this working to the point where jobs are dispatched, tasks executed, and results compiled.
en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin

Resources