MapReduce job gzip compression failure - Hadoop

I have set up a new cluster (using HDP on Windows) and I am encountering a problem I haven't seen before.
When I run a simple word count example from the hadoop-examples jar, the MapReduce v2 job fails with the error below:
15/05/16 18:58:29 INFO mapreduce.Job: Task Id : attempt_1431802381254_0001_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#15
Now, when I go to the Application Master tracker and dig into the logs, I find that the reducer is expecting a gzip file but the mapper output wasn't one:
2015-05-16 18:45:20,864 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1431791182314_0011_m_000000_0 from <url>:13562
java.io.IOException: not a gzip file
When I drill specifically into the map phase log, I see this:
2015-05-16 18:45:09,532 WARN [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Failed to load/initialize native-zlib library
2015-05-16 18:45:09,532 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.gz]
2015-05-16 18:45:09,532 WARN [main] org.apache.hadoop.mapred.IFile: Could not obtain compressor from CodecPool
I have the following configuration in my core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
<description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
and in mapred-site.xml
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
Now I realise this points to an error loading the native zlib DLL, so I ran the job with the options overridden to disable compression, and it does work.
I have downloaded zlib.dll from the zlib site and placed it in the Hadoop/bin, C:\system32 and C:\SystemWOW64 folders and restarted the cluster services, but I still get the same error, and I'm not sure why. I would appreciate any ideas on how to debug this further and resolve it.
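For reference, the no-compression run was done by overriding the properties on the command line. A minimal sketch (the jar name and input/output paths are placeholders; the older mapred.* names used above also work as deprecated aliases):
hadoop jar hadoop-mapreduce-examples.jar wordcount -Dmapreduce.map.output.compress=false -Dmapreduce.output.fileoutputformat.compress=false /input /output-nocompress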

Hadoop 2.7.2
I ran into the same issue when I built and ran Hadoop 2.7.2 on Windows 7. To resolve it you need to do the following (a command-line sketch follows the download link below):
1) On the build machine: set ZLIB_HOME to the zlib headers folder (zlib_unzip_folder\zlib128-dll\include) and build the distribution.
2) On the run machine: make zlib1.dll (zlib_unzip_folder\zlib128-dll\zlib1.dll) available on the PATH.
I used zlib 1.2.8 and the download link can be found here: http://zlib.net/zlib128-dll.zip
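A rough command-line sketch of both steps (the unzip location is illustrative, and the Maven invocation follows Hadoop's Windows build instructions; adjust to your environment):
:: Build machine (from a Windows SDK / developer command prompt in the Hadoop source root)
set ZLIB_HOME=C:\zlib_unzip_folder\zlib128-dll\include
mvn package -Pdist,native-win -DskipTests -Dtar
:: Run machine: put zlib1.dll on the PATH
set PATH=%PATH%;C:\zlib_unzip_folder\zlib128-dll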
Hadoop 2.4.1
This issue can also be reproduced on older versions of Hadoop by setting the native-library flag to false and forcing map output to be compressed. For more detail, see https://issues.apache.org/jira/browse/HADOOP-11334
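A rough sketch of that reproduction (jar name and paths are placeholders; io.native.lib.available=false disables the native zlib while the map output is still forced through GzipCodec):
hadoop jar hadoop-examples.jar wordcount -Dio.native.lib.available=false -Dmapred.compress.map.output=true -Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec /input /output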

Related

Ranger Coprocessor error in HBase (Vanilla hadoop)

I am setting up Ranger 2.1 on a vanilla version of Hadoop. The HDFS and Hive plugins are working fine, but I can't resolve the error with the HBase plugin. I ran into the following error when enabling the Ranger plugin:
Sep 15 09:04:06 dn01 hbase[504922]: 2021-09-15 09:04:06,200 ERROR [dn01:60000.activeMasterManager] coprocessor.CoprocessorHost: The coprocessor org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor threw java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/coprocessor/RegionCoprocessor
FATAL [dn01:60000.activeMasterManager] master.HMaster: The coprocessor org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor threw java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/coprocessor/RegionCoprocessor
When the plugin is enabled, it adds the following to hbase-site.xml:
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
When enabled, the plugin also copies its libraries to the local path where HBase is installed, or creates symbolic links to them.
Do you have any ideas on how to fix this error?
Ranger 2.1
HBase 1.4.12
Maybe there is a version conflict?
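One way I could check for such a conflict (the HBase lib path here is illustrative for my install) is to look for the missing class inside the HBase jars that are actually on the classpath:
for j in /usr/local/hbase/lib/hbase-*.jar; do
  unzip -l "$j" | grep -q 'coprocessor/RegionCoprocessor.class' && echo "found in $j"
done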
Thanks!

Hadoop Map Task Fails When Using Compression in a 2-Node Cluster, but Both Nodes Work Fine When Running as a Single Node

Node 1: Hadoop 2.5.2, Red Hat Linux el6, 64-bit (built the 64-bit native library, and it works)
Node 2: Hadoop 2.5.2, Red Hat Linux el5, 32-bit (built the 32-bit native library, and it works)
Running a MapReduce task on a single node works (with compression).
Running on both nodes also works (without compression),
but running on both nodes with compression does not work:
the map tasks finish on only one of the nodes (sometimes node 1, sometimes node 2); on the other node they fail with the error below, and the job fails.
Error: java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
    at ...
Caused by: java.lang.RuntimeException: native lz4 library not available
    at org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:124)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at ...
I tried adding
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=$HADOOP_HOME/lib/native</value>
</property>
to mapred-site.xml, but it is still not working.
Please suggest a solution.
Adding these properties to mapred-site.xml on the Hadoop node from which the job is submitted solved the problem:
<property>
<name>yarn.app.mapreduce.am.admin.user.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
</property>
Enable debug logs for Hadoop on the machine where the exception is thrown, then restart the Hadoop processes. After that, the NativeCodeLoader log messages should tell you why the native library is not being loaded.
You can use the command below to verify whether the native libraries are loaded:
hadoop checknative -a
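For example (a sketch: HADOOP_ROOT_LOGGER raises the client-side log level so NativeCodeLoader prints the reason the library failed to load):
export HADOOP_ROOT_LOGGER=DEBUG,console
hadoop checknative -a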

HMaster not starting up

I have configured Hadoop 2.6.0 successfully. Next, I am trying to install HBase 0.98.9, but I am having trouble starting it up.
I get the error messages below:
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/hbase/logs/hbase-yarn-master-hadoopmaster.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
localhost: starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-yarn-regionserver-hadoopmaster.out
localhost: Error: Could not find or load main class org.apache.hadoop.hbase.regionserver.HRegionServer
And this is my hbase-site.xml file:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoopmaster:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/yarn/hbase/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
Please let me know what is wrong with my configuration.
Regards.
Add this line to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/hbase/jars
NOTE: Change /path/to/hbase/jars to your HBase jars location. If possible, add all available HBase jar files to the Hadoop classpath (to avoid future class-loading problems).
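For example, a sketch assuming HBase is installed under /usr/local/hbase (as in the log paths above):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hbase/lib/*
# or, to pick up everything HBase itself puts on its classpath:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(/usr/local/hbase/bin/hbase classpath)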

Installing Hadoop on NFS

As a start, I've installed Hadoop (0.15.2) and set up a cluster of 3 nodes: one each for the NameNode, DataNode and JobTracker. All the daemons are up and running, but when I issue any command I get an error. For instance, when I do a copyFromLocal, I get the error shown below.
Am I missing something?
More details:
I am trying to install Hadoop on an NFS file system. I've installed version 1.0.4 and tried running it, but to no avail. Version 1.0.4 doesn't start the DataNode, and the DataNode log files are empty. Hence I switched back to version 0.15, which at least started all the daemons.
I believe the problem is due to the underlying NFS file system, i.e. all the DataNodes and masters using the same files and folders, but I am not sure if that is actually the case.
Still, I don't see any reason why I shouldn't be able to run Hadoop on NFS (after appropriately setting the configuration parameters).
Currently I am trying to figure out whether I could set the name and data directories differently for different machines, based on the individual machine names.
Configuration file (hadoop-site.xml):
<property>
<name>fs.default.name</name>
<value>mumble-12.cs.wisc.edu:9001</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>mumble-13.cs.wisc.edu:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.secondary.info.port</name>
<value>9002</value>
</property>
<property>
<name>dfs.info.port</name>
<value>9003</value>
</property>
<property>
<name>mapred.job.tracker.info.port</name>
<value>9004</value>
</property>
<property>
<name>tasktracker.http.port</name>
<value>9005</value>
</property>
Error using Hadoop 1.0.4 (DataNode doesn't get started):
2013-04-22 18:50:50,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call addBlock(/tmp/hadoop-akshar/mapred/system/jobtracker.info, DFSClient_502734479, null) from 128.105.112.13:37204: error: java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Error using Hadoop 0.15.2:
[akshar#mumble-12] (38)$ bin/hadoop fs -copyFromLocal lib/junit-3.8.1.LICENSE.txt input
13/04/17 03:22:11 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
13/04/17 03:22:12 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
    (same stack trace as above, repeated for two more retries)
copyFromLocal: Connection reset
I was able to get Hadoop to run over NFS using version 1.1.2. It might work for other versions, but I can't guarantee anything.
If you have an NFS file system, then each node should have access to it. fs.default.name tells Hadoop the filesystem URI to use, so it should point to the local disk. I'll assume that your NFS directory is mounted on each node at /nfs.
In core-site.xml you should define:
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/nfs/tmp</value>
</property>
In mapred-site.xml you should define:
<property>
<name>mapred.job.tracker</name>
<value>node1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred-local</value>
</property>
Since hadoop.tmp.dir points to the NFS drive, the default locations of mapred.system.dir and mapreduce.jobtracker.staging.root.dir also point to locations on the NFS drive. It might run if you leave the default value for mapred.local.dir, but that directory is supposed to be on the local filesystem, so to be safe you can put it in /tmp.
You don't have to worry about hdfs-site.xml. That configuration file is used when you start the NameNode, but with everything shared on the NFS drive you shouldn't run HDFS at all.
Now you can run start-mapred.sh on the JobTracker node and run a Hadoop job. Don't run start-all.sh or start-dfs.sh, because those will start HDFS. If you run multiple DataNodes that point to the same NFS directory, one DataNode will lock that directory and the others will shut down because they cannot obtain a lock.
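A minimal sketch of the start sequence under these assumptions (Hadoop unpacked on the NFS mount; the path is illustrative):
# on the JobTracker node only
cd /nfs/hadoop-1.1.2
bin/start-mapred.sh   # starts the JobTracker here and TaskTrackers on the hosts listed in conf/slaves
# do not run bin/start-dfs.sh or bin/start-all.sh, since HDFS is not used in this setup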
I tested the configuration with:
bin/hadoop jar hadoop-examples-1.1.2.jar wordcount /nfs/data/test.text /nfs/out
Note that you need to specify full paths to the input and output locations.
I also tried:
bin/hadoop jar hadoop-examples-1.1.2.jar grep /nfs/data/loremIpsum.txt /nfs/out2 lorem
It gave me the same output as when I run it in standalone mode, so I assume it is working correctly.
Here is more information on fs.default.name:
http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem

Error running MapReduce sample in Hadoop 0.23.6

I deployed Hadoop 0.23.6 on Ubuntu 12.04 LTS. I am able to copy files across and do file manipulation. I am using YARN for MapReduce.
I get the following error when I try to run any MapReduce application using hadoop-mapreduce-examples-0.23.6.jar.
Command used:
bin/hadoop jar hadoop-mapreduce-examples-0.23.6.jar randomwriter -Dmapreduce.randomwriter.mapsperhost=1 -Dmapreduce.job.user.name=$USER -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars hadoop-mapreduce-client-app-0.23.6.jar output
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Verify your yarn-site.xml configuration. You need to have the properties below configured:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
For more details, have a look at the JIRA issue:
https://issues.apache.org/jira/browse/MAPREDUCE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
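Note that after adding these properties the NodeManagers need to be restarted so the auxiliary shuffle service is registered; roughly (script location assumed from the 0.23 layout, adjust to your install):
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh start nodemanager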
