Hadoop in Windows

Hadoop in Windows - hadoop

I'm trying to run the sample word count program for Hadoop in Windows thru Cygwin. I've installed Hadoop and Cygwin.
I run the wordcount program using this statment:
$ bin/hadoop jar hadoop-examples-1.0.1.jar wordcount input output
I'm getting the following error:
12/05/08 23:05:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/08 23:05:35 ERROR security.UserGroupInformation: PriviledgedActionException as:suresh cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-suresh\mapred\staging\suresh1005684431\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I've set the Cygwin bin path in path variable. Any help is appreciated.

This is a known issue with some versions of Hadoop (see https://issues.apache.org/jira/browse/HADOOP-7682 for the full discussion).
I had this problem with version 1.0.2, so I tried various other versions.
In the end I got it to work by going back to version 0.22.0
If you go back to version 0.22.0 you will need to make a couple of changes to the bin/hadoop-config.sh script:
Change the line that sets up HADOOP_MAPRED_HOME to point the mapreduce directory instead of the mapred directory.
Comment out all the code that sets the java.library.path for a native hadoop install.

You should look at hadoop services for windows - a version of hadoop that was ported to windows.
currently it's in beta, but it's supposed to be released soon.

I've managed to get this working to the point where jobs are dispatched, tasks executed, and results compiled.
en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin

Related

Mapreduce hadoop error with Gradle

The error which I am getting is as :
16/02/10 11:21:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/10 11:21:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp/outku already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at mr.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Please help me How can I resolve this Error or what is making this to happen.

Everytime hadoop run a MapReduce, it's create a folder as an output.
If you want to run the same job, you can specify diferent output folder name, or delete the past one.

FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp/outku. It means output directory already is exist.
Always specify the output directory name at run time(i.e Hadoop will create the directory automatically for you. You need not to worry about the output directory creation)
you have to delete the existing output directory and try running the M/r Job or change the output directory name

Hadoop error on Windows : java.lang.UnsatisfiedLinkError

I am new to Hadoop and trying to execute my first mapreduce job of wordcount.
However, whenever I am trying to do it, I am getting the below error:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeCompute
ChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(
Native Method)
at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(Nati
veCrc32.java:86)
at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum
.java:430)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSumme
r.java:202)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:1
63)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:1
44)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:221
7)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOut
putStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java
:106)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:190
5)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:187
3)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:183
8)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUp
loader.java:246)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResour
ceUploader.java:166)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSub
mitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitt
er.java:191)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(Progra
mDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Also, when I do
hadoop checknative -a
it shows me following details:
Native library checking:
hadoop: true C:\hadoop-2.6.1\bin\hadoop.dll
zlib: false
snappy: false
lz4: true revision:43
bzip2: false
openssl: false org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl()Z
winutils: true C:\hadoop-2.6.1\bin\winutils.exe
15/10/19 15:18:24 INFO util.ExitUtil: Exiting with status 1
Is there any way I can resolve this issue ?

I had this error too and I resolved this error:
You access link https://github.com/steveloughran/winutils
Download file "winutils.exe" and "hadoop.dll" with version which you use
Copy 2 file to HADOOP_HOME\bin.
It's Ok for me.
Note: if 2 file "winutils.exe" and "hadoop.dll" not right with hadoop version which you use, it isn't OK

I had same problem. after removing hadoop.dll from the system32 directory and setting the HADOOP_HOME environment variable it worked.
Alternatively use can add a jvm argument like -Djava.library.path=<hadoop home>/lib/native.

check whether same C:\hadoop-2.6.1\bin\hadoop.dll file is used by the java process while you get this error.
use process explorer to find.

I am using Spark2.4.6
I downloaded hadoop-2.9.0 from https://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/
Then during execution of my code where I am also writing the output to an external file,
I used -Djava.library.path=<path to extracted hadoop-2.9.0>/lib/native
as VM arguments.
That's it & it worked for me.

Check your java version. If java is 32bits version, you need uninstall and re-install with 64 bits version for hadoop.
Check command:
java -d32 -version;
java -d64 -version;(no error, if 64 version)

Exception running MRBenchmark on hadoop cluster

I have set up hadoop in a distributed manner and it is working correctly. I have been trying to run the mrbenchmark as per the instructions given in http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
However I get the following exception,
hadoop jar build/hadoop-0.20-test.jar mrbench -numRuns 1 -maps 2 -reduces 1 -inputLines 1 -inputType ascending
MRBenchmark.0.0.2
14/03/16 21:37:03 INFO conf.ClientConfigurationUtil: Client configuration lookup disabled/failed. Using default configuration
14/03/16 21:37:04 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
14/03/16 21:37:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/16 21:37:04 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_1714244267.txt
14/03/16 21:37:04 INFO fs.FileSystem: File /benchmarks/MRBench is being deleted only through Trash.
Moved to trash: /benchmarks/MRBench
java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:514)
at java.util.Properties.setProperty(Properties.java:161)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:476)
at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:474)
at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:180)
at org.apache.hadoop.mapred.MRBench.run(MRBench.java:289)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:68)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.mapred.MRBench.main(MRBench.java:211)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I am using Hadoop version 20.2.
Thanks in advance.

Including the parameters -jar and specifying an example jar solved my problem ( I mentioned the mrbenchmark jar again).
eg. hadoop jar build/hadoop-0.20-test.jar mrbench -numRuns 1 -maps 2 -reduces 1 -inputLines 1 -inputType ascending -jar build/hadoop-0.20-test.jar
Even though the Benchmark states that the argument -jar is initialized to the current jar, this is not the case. The -jar argument has to be specified.

Not able to run MapReduce second time onwards in EC2 with S3 for storage

I have setup Single Node hadoop in Amazon EC2 instance. Following this and this I can run the example program for the first time. But to get it running it second time, I have to delete all the directories and files in S3 and local tmp directories after stopping issuing stop-all.sh . I am only running mapred (tasktracker and jobtracker). Attempting to re-run the example for second time onwards I get the error message.
hduser#ip-10-252-196-143:~$ hadoop jar ./hadoop/hadoop-examples-1.2.1.jar wordcount input output2
13/09/20 09:43:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: The ownership on the staging directory s3://vikesh-hadoop-bucket/home/hduser/tmp/mapred/staging/hduser/.staging is not as expected. It is owned by . The directory must be owned by the submitter hduser or by hduser
java.io.IOException: The ownership on the staging directory s3://vikesh-hadoop-bucket/home/hduser/tmp/mapred/staging/hduser/.staging is not as expected. It is owned by . The directory must be owned by the submitter hduser or by hduser
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Seems this is related, but is using Cloudera's distribution I believe.
Why is this happening and what can I do to solve this ? I am using hadoop-1.2.1 with OpenJDK 7 on a 64 bit VM.
Thanks

We faced the same problem. We simply delete the s3://{bucket-name}/home/{user}/tmp/mapred/staging/{user}/.staging between each MapReduce... But if someone has a better solution...

Running Hadoop in Windows 7 setting via Cygwin - PriviledgedActionException as:PC cause:java.io.IOException: Failed to set permissions of path:

I use Hadoop distribution 1.1.2. When I try to run example wordcount routine I get following error.
Input command:
'D:/Files/hadoop-1.1.2/hadoop-1.1.2/bin/hadoop' jar 'D:/Files/hadoop-1.1.2/hadoop-1.1.2/hadoop-examples-1.1.2.jar' wordcount input output
The result:
13/07/03 11:02:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/03 11:02:42 ERROR security.UserGroupInformation: PriviledgedActionException as:PC cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-PC\mapred\staging\PC119237705.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-PC\mapred\staging\PC119237705.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I have problem to locate the particular cause of this error. Please help.

Looks like you have hit this. You might find this patch helpful. But, before that you might wanna try changing the directory permissions to 755 and re-running the job.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Hadoop in Windows - hadoop

You should look at hadoop services for windows - a version of hadoop that was ported to windows. currently it's in beta, but it's supposed to be released soon.

I've managed to get this working to the point where jobs are dispatched, tasks executed, and results compiled. en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin

Related

Mapreduce hadoop error with Gradle

Hadoop error on Windows : java.lang.UnsatisfiedLinkError

Exception running MRBenchmark on hadoop cluster

Not able to run MapReduce second time onwards in EC2 with S3 for storage

Running Hadoop in Windows 7 setting via Cygwin - PriviledgedActionException as:PC cause:java.io.IOException: Failed to set permissions of path:

Categories

Resources