Hadoop Map Task Fails When Using Compression in a 2-Node Cluster, but Both Nodes Work Fine When Running as Single Nodes

Node1: Hadoop 2.5.2, Red Hat Linux el6, 64-bit. I built the 64-bit native library and it works.
Node2: Hadoop 2.5.2, Red Hat Linux el5, 32-bit. I built the 32-bit native library and it works.
Running a MapReduce job on a single node works (with compression).
Running on multiple nodes also works (without compression).
But running on multiple nodes with compression does not work.
The map task finishes on only one of the nodes (sometimes node1, sometimes node2); on the other node it fails with the following error, and the job fails:
Error: java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
at . .
Caused by: java.lang.RuntimeException: native lz4 library not available
at org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:124)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at . .
I tried adding
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=$HADOOP_HOME/lib/native</value>
</property>
to mapred-site.xml, but it still doesn't work.
Please suggest a solution.

Adding these properties to mapred-site.xml on the Hadoop node from which the job is submitted solved the problem:
<property>
<name>yarn.app.mapreduce.am.admin.user.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
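In the answer above, `{{HADOOP_COMMON_HOME}}` is not a typo: recent Hadoop 2.x releases treat `{{VAR}}` as a cross-platform placeholder that the NodeManager rewrites into the local shell's syntax (`$VAR` on Linux, `%VAR%` on Windows) when it launches a container. The variable is therefore resolved on each node, so each node can point at its own `lib/native` directory, which matters when one node is 32-bit and the other 64-bit. The helper below is a simplified sketch that substitutes values directly from a map instead of generating shell syntax; the class name and the install path in the usage example are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of resolving {{VAR}} placeholders; not YARN's own code. */
final class PlaceholderExpansion {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{([A-Za-z_][A-Za-z0-9_]*)\\}\\}");

    private PlaceholderExpansion() {}

    /** Replaces every {{NAME}} with env.get(NAME); unknown names are left untouched. */
    static String expand(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved = env.get(m.group(1));
            // Keep the placeholder as-is rather than guessing a value for an unset variable.
            String replacement = (resolved != null) ? resolved : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_COMMON_HOME", "/opt/hadoop-2.5.2");  // hypothetical install path on one node
        System.out.println(expand("LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native", env));
        // prints LD_LIBRARY_PATH=/opt/hadoop-2.5.2/lib/native
    }
}
```

On the other node the same property value would resolve against that node's own HADOOP_COMMON_HOME, which is the point of using a placeholder instead of a hard-coded path.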

Enable debug logging for Hadoop on the machine where the exception is thrown and restart the Hadoop processes. After that, the NativeCodeLoader log messages should show why the native library is not being loaded.
You can use the command below to verify whether the native libraries are loaded:
hadoop checknative -a
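Because this cluster mixes a 64-bit node and a 32-bit node, it can also help to confirm that the native library a node loads matches that node's JVM. In Hadoop 2.x the LZ4 codec is compiled into libhadoop.so itself, so "native lz4 library not available" usually means libhadoop.so was not loaded in that task's JVM. A minimal sketch, assuming the library is an ELF shared object (the default path below is hypothetical): byte 4 of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects, and the JVM's data model is read from the HotSpot-specific `sun.arch.data.model` property, which other JVMs may not set.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: is a native library 32-bit or 64-bit? */
final class ElfClassCheck {
    private ElfClassCheck() {}

    /** Returns 32 or 64 for an ELF header, or -1 if the bytes are not an ELF file. */
    static int elfBits(byte[] header) {
        if (header.length < 5
                || header[0] != 0x7f || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            return -1;
        }
        switch (header[4]) {      // EI_CLASS
            case 1: return 32;    // ELFCLASS32
            case 2: return 64;    // ELFCLASS64
            default: return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real path to libhadoop.so as the first argument.
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib/native/libhadoop.so");
        byte[] header = new byte[5];
        try (InputStream in = Files.newInputStream(lib)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break;        // file shorter than an ELF header
                }
                read += n;
            }
        }
        int bits = elfBits(header);
        System.out.println("library: " + (bits < 0 ? "not an ELF file" : bits + "-bit")
                + ", JVM data model: " + System.getProperty("sun.arch.data.model", "unknown"));
    }
}
```

`hadoop checknative -a` remains the authoritative check; this only rules out a 32/64-bit mismatch between the library and the JVM on a given node.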

Related

Hadoop - Mkdirs failed to create C:\Users\acer\AppData\Local\Temp\hadoop-unjar7787707269774970262\META-INF\license

I installed Hadoop on a Windows machine in pseudo-distributed mode and tried to run a MapReduce job on it. The NameNode and DataNode ran without any problems; however, the MapReduce job kept failing with this error:
Exception in thread "main" java.io.IOException: Mkdirs failed to create C:\Users\acer\AppData\Local\Temp\hadoop-unjar7787707269774970262\META-INF\license
at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:128)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:104)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
I've checked that I already have full permission to that folder, and I also tried using maven-shade-plugin with no success.
I'm not sure what the issue is, but here are some things to check:
1) Verify the folder permissions for the proper user on Temp\hadoop-unjar7787707269774970262\META-INF (you can use chmod -R 777).
2) Check that the NameNode is running while executing the MR job.
3) Check that the NodeManager service is running.
4) Check the configuration:
For Hadoop 1.x:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
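For Hadoop 2.x (mapreduce.jobtracker.address is the new name of mapred.job.tracker; on a YARN cluster, setting mapreduce.framework.name to yarn is what selects YARN):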
<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:9101</value>
</property>
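One cause worth ruling out, which the checklist above does not mention: on a case-insensitive filesystem such as NTFS, a jar that contains both a file `META-INF/LICENSE` and a directory `META-INF/license/` cannot be unpacked, because creating the directory collides with the already-extracted file. The `META-INF\license` path in the error fits this pattern. A minimal sketch that lists entry paths (including implied parent directories) differing only by letter case; the class name is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

/** Hypothetical diagnostic: find jar paths that collide on a case-insensitive filesystem. */
final class JarCaseClash {
    private JarCaseClash() {}

    static Set<String> clashes(Collection<String> entryNames) {
        Map<String, String> seen = new HashMap<>();   // lower-cased path -> first spelling seen
        Set<String> result = new LinkedHashSet<>();
        for (String name : entryNames) {
            // Directory entries end with '/'; strip it so "x/" and file "X" compare equal.
            String path = name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
            // Check every parent directory the entry implies, then the entry itself.
            int slash = path.indexOf('/');
            while (true) {
                String candidate = slash < 0 ? path : path.substring(0, slash);
                String previous = seen.putIfAbsent(candidate.toLowerCase(Locale.ROOT), candidate);
                if (previous != null && !previous.equals(candidate)) {
                    result.add(previous + " <-> " + candidate);
                }
                if (slash < 0) {
                    break;
                }
                slash = path.indexOf('/', slash + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: JarCaseClash <job.jar>");
            return;
        }
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(args[0])) {
            jar.stream().map(ZipEntry::getName).forEach(names::add);
        }
        for (String clash : clashes(names)) {
            System.out.println("case clash: " + clash);
        }
    }
}
```

If a clash is reported, removing one of the two colliding entries from the jar before running it is the usual way around it.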

Mapreduce job gzip compression failure

I have set up a new cluster (using HDP on Windows) and am encountering a problem I haven't seen before.
When I run the simple word count example from the hadoop-examples jar, the MapReduce v2 job fails with the error below:
5/05/16 18:58:29 INFO mapreduce.Job: Task Id : attempt_1431802381254_0001_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#15
Now, when I go to the Application Master tracker and dig into the logs, I find that the reducer is expecting a gzip file but the mapper output wasn't gzip:
2015-05-16 18:45:20,864 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1431791182314_0011_m_000000_0 from <url>:13562
java.io.IOException: not a gzip file
When I drill into the map-phase log specifically, I see this:
2015-05-16 18:45:09,532 WARN [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Failed to load/initialize native-zlib library
2015-05-16 18:45:09,532 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.gz]
2015-05-16 18:45:09,532 WARN [main] org.apache.hadoop.mapred.IFile: Could not obtain compressor from CodecPool
I have the following configuration in my core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
<description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
and in mapred-site.xml
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
I realise this points to an error loading the native zlib DLL, so I ran the job with options overridden to disable compression, and it does work.
I downloaded zlib.dll from the zlib site, placed it in the Hadoop/bin, C:\system32 and C:\SystemWOW64 folders, and restarted the cluster services, but I still get the same error. I'm not sure why. I would appreciate any ideas for debugging this further and resolving it.
Hadoop 2.7.2
I ran into the same issue when I built and ran Hadoop 2.7.2 on Windows 7. To resolve it, do the following:
1) On the build machine, set ZLIB_HOME to the zlib headers folder (zlib_unzip_folder\zlib128-dll\include) and build the distribution.
2) On the machine that runs the jobs, make zlib1.dll (zlib_unzip_folder\zlib128-dll\zlib1.dll) available on the PATH.
I used zlib 1.2.8 and the download link can be found here: http://zlib.net/zlib128-dll.zip
Hadoop 2.4.1
This issue can also be reproduced on older versions of Hadoop by disabling the native library (io.native.lib.available=false) and forcing map output to be compressed. For more detail, see: https://issues.apache.org/jira/browse/HADOOP-11334
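The reducer's "not a gzip file" error is a header check failing: gzip data always begins with the magic bytes 0x1f 0x8b (RFC 1952). The map-side log lines above ("Failed to load/initialize native-zlib library", then "Could not obtain compressor from CodecPool") suggest the mapper fell back to writing its output uncompressed, so the reducer finds raw bytes where it expects that header. A simplified illustration of the check, not Hadoop's shuffle code; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration of the gzip header check behind "not a gzip file". */
final class GzipMagic {
    private GzipMagic() {}

    /** True if the data starts with the gzip magic number 0x1f 0x8b. */
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    /** Gzips bytes in memory, standing in for a mapper that did get a working compressor. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "hello\tworld\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("uncompressed output looks like gzip: " + looksLikeGzip(raw));   // false
        System.out.println("gzipped output looks like gzip: " + looksLikeGzip(gzip(raw)));  // true
    }
}
```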

How to use JobClient in hadoop2(yarn)

(Solved) I want to connect to a Hadoop cluster and get some job/task information.
In Hadoop 1, I was able to use JobClient (local pseudo-distributed mode, using Eclipse):
JobClient jobClient = new JobClient(new InetSocketAddress("127.0.0.1",9001),new JobConf(config));
JobID job_id = JobID.forName("job_xxxxxx");
RunningJob job = jobClient.getJob(job_id);
.....
Today I set up a pseudo-distributed Hadoop 2 YARN cluster; however, the code above doesn't work. I use the ResourceManager port (8032):
JobClient jobClient = new JobClient(new InetSocketAddress("127.0.0.1",8032),new JobConf(config));
This line throws an exception:
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
I searched for this exception, but none of the solutions work. I use Eclipse, and I have added all the Hadoop jars, including hadoop-mapreduce-client-xxx. I can also successfully run the example programs on my cluster.
Any suggestions on how to use JobClient with Hadoop 2 YARN?
Update: I was able to solve this issue by compiling against the same Hadoop libraries as the ResourceManager server. In Eclipse it still throws this exception, but after I compiled and deployed my project it works fine (I'm not sure why, since with Hadoop 1 it works in Eclipse). There is no need to change the API; JobClient still works well in Hadoop 2.
Have you configured the mapred-site.xml file as follows? In Hadoop 2.x it is located in $HADOOP_HOME/etc/hadoop/.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit: also make sure that your yarn-site.xml (in the same location) contains the following property:
<property>
<name>yarn.resourcemanager.address</name>
<value>host:port</value>
</property>
One last thing: I strongly advise you to use hostnames instead of IP addresses. There are known cases of Hadoop failing when IP addresses are set in the configuration files.
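The settings this answer asks for can be checked outside Eclipse with a small standalone reader for Hadoop-style site files. A sketch using only the JDK's DOM parser; the class and method names are hypothetical, and it deliberately ignores what Hadoop's own Configuration class also handles (`<final>`, XInclude, `${var}` expansion, defaults from the *-default.xml files). It reports whether mapreduce.framework.name is yarn and whether yarn.resourcemanager.address uses an IPv4 literal rather than a hostname.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sanity check for mapred-site.xml / yarn-site.xml. */
final class SiteXmlCheck {
    private SiteXmlCheck() {}

    /** Collects <property><name>/<value> pairs from a site file. */
    static Map<String, String> readProperties(InputStream xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml).getElementsByTagName("property");
            Map<String, String> props = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse site XML", e);
        }
    }

    static Map<String, String> parse(String xml) {
        return readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    private static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    static boolean frameworkIsYarn(Map<String, String> props) {
        return "yarn".equals(props.get("mapreduce.framework.name"));
    }

    /** True if the host part of host:port is a dotted IPv4 literal. */
    static boolean hostIsIpv4Literal(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: SiteXmlCheck <mapred-site.xml> <yarn-site.xml>");
            return;
        }
        Map<String, String> mapred;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            mapred = readProperties(in);
        }
        Map<String, String> yarn;
        try (InputStream in = Files.newInputStream(Paths.get(args[1]))) {
            yarn = readProperties(in);
        }
        System.out.println("mapreduce.framework.name is yarn: " + frameworkIsYarn(mapred));
        String rm = yarn.get("yarn.resourcemanager.address");
        if (rm == null) {
            System.out.println("yarn.resourcemanager.address is not set");
        } else if (hostIsIpv4Literal(rm)) {
            System.out.println("yarn.resourcemanager.address uses an IP literal: " + rm);
        }
    }
}
```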

Installing Hadoop on NFS

As a start, I've installed Hadoop (0.15.2) and set up a cluster of 3 nodes: one each for the NameNode, DataNode and JobTracker. All the daemons are up and running, but whenever I issue a command I get an error. For instance, copyFromLocal fails with the error listed below under "Error using Hadoop 0.15.2".
Am I missing something?
More details:
I am trying to install Hadoop on an NFS file system. I installed version 1.0.4 and tried running it, but to no avail: version 1.0.4 doesn't start the DataNode, and the DataNode log files are empty. Hence I switched back to version 0.15, which at least started all the daemons.
I believe the problem is due to the underlying NFS file system, i.e. all the DataNodes and masters using the same files and folders, but I am not sure if that is actually the case.
Still, I don't see any reason why I shouldn't be able to run Hadoop on NFS (after setting the configuration parameters appropriately).
I am currently trying to figure out whether I can set the name and data directories differently for different machines, based on the individual machine names.
Configuration file (hadoop-site.xml):
<property>
<name>fs.default.name</name>
<value>mumble-12.cs.wisc.edu:9001</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>mumble-13.cs.wisc.edu:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.secondary.info.port</name>
<value>9002</value>
</property>
<property>
<name>dfs.info.port</name>
<value>9003</value>
</property>
<property>
<name>mapred.job.tracker.info.port</name>
<value>9004</value>
</property>
<property>
<name>tasktracker.http.port</name>
<value>9005</value>
</property>
Error using Hadoop 1.0.4 (the DataNode doesn't start):
2013-04-22 18:50:50,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call addBlock(/tmp/hadoop-akshar/mapred/system/jobtracker.info, DFSClient_502734479, null) from 128.105.112.13:37204: error: java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Error using Hadoop 0.15.2:
[akshar#mumble-12] (38)$ bin/hadoop fs -copyFromLocal lib/junit-3.8.1.LICENSE.txt input
13/04/17 03:22:11 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
copyFromLocal: Connection reset
I was able to get Hadoop to run over NFS using version 1.1.2. It might work for other versions, but I can't guarantee anything.
If you have an NFS file system, each node should have access to it. fs.default.name tells Hadoop which filesystem URI to use, so it should point to the local filesystem, through which each node sees the NFS mount. I'll assume that your NFS directory is mounted on each node at /nfs.
In core-site.xml you should define:
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/nfs/tmp</value>
</property>
In mapred-site.xml you should define:
<property>
<name>mapred.job.tracker</name>
<value>node1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred-local</value>
</property>
Since hadoop.tmp.dir points to the NFS drive, the default locations of mapred.system.dir and mapreduce.jobtracker.staging.root.dir are also on the NFS drive. It might run if you leave mapred.local.dir at its default value, but that directory is supposed to be on the local filesystem, so to be safe you can put it in /tmp.
You don't have to worry about hdfs-site.xml. That configuration file is used when you start the NameNode, but with everything shared on the NFS drive you shouldn't run HDFS at all.
Now you can run start-mapred.sh on the JobTracker node and run a Hadoop job. Don't run start-all.sh or start-dfs.sh, because those start HDFS. If you run multiple DataNodes that point to the same NFS directory, one DataNode will lock that directory and the others will shut down because they are unable to obtain the lock.
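The lock conflict described above can be illustrated with Java NIO file locks. A minimal sketch, assuming a DataNode-style lock file named `in_use.lock` in each storage directory (the file name and class are illustrative assumptions, not a quote of Hadoop's code): the first caller gets an exclusive lock, and every later caller gets null, which is the point where a second DataNode would give up. Whether such locks are enforced across machines on NFS depends on the NFS version and the lock daemon setup.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of an exclusive storage-directory lock. */
final class StorageDirLock {
    static final String LOCK_FILE = "in_use.lock";  // assumed lock file name

    private StorageDirLock() {}

    /** Returns the lock, or null if another holder (process or this JVM) already owns it. */
    static FileLock tryLock(Path storageDir) throws IOException {
        RandomAccessFile file = new RandomAccessFile(storageDir.resolve(LOCK_FILE).toFile(), "rws");
        FileLock lock = null;
        try {
            lock = file.getChannel().tryLock();  // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;                         // already held by this JVM
        } finally {
            if (lock == null) {
                file.close();                    // release the handle if we did not get the lock
            }
        }
        return lock;
    }

    /** Closing the channel releases the lock and the underlying file handle. */
    static void release(FileLock lock) throws IOException {
        lock.channel().close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("storage-dir");
        FileLock first = tryLock(dir);   // plays the first DataNode
        FileLock second = tryLock(dir);  // plays a second DataNode on the same directory
        System.out.println("first got the lock: " + (first != null));
        System.out.println("second got the lock: " + (second != null));
        if (first != null) {
            release(first);
        }
        if (second != null) {
            release(second);
        }
    }
}
```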
I tested the configuration with:
bin/hadoop jar hadoop-examples-1.1.2.jar wordcount /nfs/data/test.text /nfs/out
Note that you need to specify full paths to the input and output locations.
I also tried:
bin/hadoop jar hadoop-examples-1.1.2.jar grep /nfs/data/loremIpsum.txt /nfs/out2 lorem
It gave me the same output as when I ran it in standalone mode, so I assume it is working correctly.
Here is more information on fs.default.name:
http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem

Error running mapreduce sample in hadoop 0.23.6

I deployed Hadoop 0.23.6 on Ubuntu 12.04 LTS. I can copy files across and manipulate them. I am using YARN for MapReduce.
I get the following error when I try to run any MapReduce application from hadoop-mapreduce-examples-0.23.6.jar:
Command used:
bin/hadoop jar hadoop-mapreduce-examples-0.23.6.jar randomwriter -Dmapreduce.randomwriter.mapsperhost=1 -Dmapreduce.job.user.name=$USER -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars hadoop-mapreduce-client-app-0.23.6.jar output
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Verify your yarn-site.xml configuration. You need to have the following properties configured:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
For more details, have a look at the JIRA issue:
https://issues.apache.org/jira/browse/MAPREDUCE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
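The values above match Hadoop 0.23.x. In Hadoop 2.2.0 and later the shuffle service was renamed to `mapreduce_shuffle` (so the class key becomes `yarn.nodemanager.aux-services.mapreduce_shuffle.class`), because the NodeManager began rejecting aux-service names that contain dots. A minimal sketch of that naming rule and of how the class key embeds the service name; the class name and the exact regex are illustrative assumptions, not Hadoop's code.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks an aux-service name and derives its class property key. */
final class AuxServiceConfig {
    // Assumed rule, modelled on the name check added in Hadoop 2.2:
    // letters, digits and '_' only, and the first character is not a digit.
    private static final Pattern VALID_NAME = Pattern.compile("^[A-Za-z_]+[A-Za-z0-9_]*$");

    private AuxServiceConfig() {}

    static boolean isValidServiceName(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    /** The class key embeds the service name, as in the yarn-site.xml snippet above. */
    static String classKey(String serviceName) {
        return "yarn.nodemanager.aux-services." + serviceName + ".class";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"mapreduce.shuffle", "mapreduce_shuffle"}) {
            System.out.println(name + ": valid=" + isValidServiceName(name)
                    + ", class key=" + classKey(name));
        }
    }
}
```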

Resources