java.io.IOException: Failed to add a datanode. HDFS (Hadoop)

I am facing this error while appending to a file on HDFS (Cloudera 2.0.0-cdh4.2.0).
The use case that causes the error is:
Create a file on the file system (DistributedFileSystem). OK
Append to the earlier created file. ERROR
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream stream = fs.append(filePath);
stream.write(fileContents);
Then the error is thrown:
Exception in thread "main" java.io.IOException: Failed to add a datanode.
User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[host1:50010, host2:50010], original=[host1:50010, host2:50010])
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:792)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:852)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:958)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:469)
Some related HDFS configs:
dfs.replication set to 2
dfs.client.block.write.replace-datanode-on-failure.enable set to true
dfs.client.block.write.replace-datanode-on-failure.policy set to DEFAULT
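(Side note: as the exception message itself hints, this behaviour is controlled by client-side settings. A minimal sketch of setting them programmatically on a small two-datanode cluster; the values here are illustrative assumptions, not taken from the question:)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// With only two datanodes there is no spare node to swap into the write pipeline,
// so relaxing the replace-datanode policy avoids "Failed to add a datanode".
conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
FileSystem fs = FileSystem.get(conf);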
Any ideas?
Thanks!

The problem was solved by running the following on the file system:
hadoop dfs -setrep -R -w 2 /
Old files on the file system had a replication factor of 3;
setting dfs.replication to 2 in hdfs-site.xml does not solve the problem,
as that setting does not apply to already existing files.
So, if you remove machines from the cluster, you should check the replication factor of the existing files as well as the system-wide setting.
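For what it's worth, the same fix can also be applied programmatically through the public FileSystem API; a rough sketch mirroring the setrep command above (unlike -w, it does not wait for re-replication to finish):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Recursively walk every existing file and lower its replication factor to 2,
// since dfs.replication only affects files created after the change.
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/"), true);
while (files.hasNext()) {
    LocatedFileStatus status = files.next();
    if (status.getReplication() > 2) {
        fs.setReplication(status.getPath(), (short) 2);
    }
}
fs.close();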

Related

java.io.EOFException: Premature EOF: no length prefix available in Spark on Hadoop

I'm getting this weird exception. I'm using Spark 1.6.0 on Hadoop 2.6.4 and submitting the Spark job to a YARN cluster.
16/07/23 20:05:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-532134798-128.110.152.143-1469321545728:blk_1073741865_1041
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
16/07/23 20:49:09 ERROR server.TransportRequestHandler: Error sending result RpcResponse{requestId=4719626006875125240, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=81 cap=81]}} to ms0440.utah.cloudlab.us/128.110.152.175:58944; closing connection
java.nio.channels.ClosedChannelException
I was getting this error when running on Hadoop 2.6.0 and thought the exception might be a bug like this, but even after switching to Hadoop 2.6.4 I'm getting the same error. There is no memory problem; my cluster is fine in terms of HDFS and memory. I went through this and this, but no luck.
Note: 1. I'm using Apache Hadoop and Spark, not any CDH/HDP distribution. 2. I'm able to copy data into HDFS and even able to execute another job on this cluster.
Check the file permissions of the DFS directory:
find /path/to/dfs -group root
In general, the owning user and group should be hdfs.
Since I had started the HDFS service as the root user, some DFS block files were generated with root permissions.
I solved the problem after changing them back to the correct ownership:
sudo chown -R hdfs:hdfs /path/to/dfs
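If you prefer to script that check, here is a rough sketch in plain Java (the directory is a placeholder for your dfs.datanode.data.dir) that lists everything under the data directory not owned by the hdfs user:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindNonHdfsOwned {
    public static void main(String[] args) throws IOException {
        // Placeholder: point this at your DataNode data directory (dfs.datanode.data.dir)
        Path dfsDir = Paths.get(args.length > 0 ? args[0] : "/path/to/dfs");
        try (Stream<Path> paths = Files.walk(dfsDir)) {
            paths.forEach(p -> {
                try {
                    String owner = Files.getOwner(p).getName();
                    if (!"hdfs".equals(owner)) {
                        // Candidates for: sudo chown -R hdfs:hdfs ...
                        System.out.println(owner + "  " + p);
                    }
                } catch (IOException e) {
                    System.err.println("Cannot read owner of " + p);
                }
            });
        }
    }
}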

Error with flume and remote hdfs sink

I'm trying to run Flume with an HDFS sink. HDFS is running properly on a different machine, and I can even interact with it from the Flume machine, but when I run Flume and send events to it I get the following error:
2013-05-26 14:22:11,399 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:456)] HDFS IO error
java.io.IOException: Callable timed out after 25000 ms
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:352)
at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:727)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:430)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:345)
... 5 more
Again, connectivity is not an issue, since I can interact with HDFS using the Hadoop command line (the Flume machine is NOT a datanode).
The weirdest part is that after killing Flume I can see that the tmp file is created in HDFS, but it's empty (and the .tmp extension remains).
Any ideas as to why this could be happening? Thanks a lot!
Check three things. First, that your firewall is off, i.e. iptables is stopped. Second, that the property agent.sinks.hdfs-sink.hdfs.path = hdfs://PUBLIC_IP:8020/user/hdfs/flume uses the public IP, not the private IP.
Third, change
agent.sinks.hdfs-sink.hdfs.callTimeout = 180000
because the default of 10000 ms gives HDFS very little time to react.
Thanks,
Shilpa
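Putting that together, the relevant part of the sink configuration might look like this (the agent, sink, and channel names are just examples; adjust the NameNode host, port, and path to your setup):
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://PUBLIC_IP:8020/user/hdfs/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.callTimeout = 180000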

hadoop hdfs points to file:/// not hdfs://

So I installed Hadoop via Cloudera Manager cdh3u5 on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
Now, it goes without saying that I can access my hdfs:// through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop file system as file:///
The question is: has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this one.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>
Accumulo is confused because it's using the same default configuration to figure out where HDFS is... and that configuration defaults to file:///
You should specify the NameNode metadata directory and the DataNode data directory:
dfs.name.dir (older releases) / dfs.namenode.name.dir (newer releases)
dfs.data.dir (older releases) / dfs.datanode.data.dir (newer releases)
in hdfs-site.xml, plus fs.default.name in core-site.xml, and then format the NameNode.
To format the HDFS NameNode:
hadoop namenode -format
Enter 'Yes' to confirm formatting the NameNode. Restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure the client configuration is deployed correctly and that it points to the actual cluster endpoints.
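For example, the corresponding entries might look like this (the directory paths below are placeholders):
In hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>/path/to/namenode/dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/path/to/datanode/dir</value>
</property>
And in core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>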

Does a file need to be in HDFS in order to use it in distributed cache?

I get
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/path/to/my.jar, expected: hdfs://ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
if I try to add a local file to distributed cache in hadoop. When the file is on HDFS, I don't get this error (obviously, since it's using the expected FS). Is there a way to use a local file in distributed cache without first copying it to hdfs? Here is a code snippet:
Configuration conf = job.getConfiguration();
FileSystem fs = FileSystem.getLocal(conf);
Path dependency = fs.makeQualified(new Path("/local/path/to/my.jar"));
DistributedCache.addArchiveToClassPath(dependency, conf);
Thanks
It has to be in HDFS first. I'm going to go out on a limb here, but I think it is because the file is "pulled" to the local distributed cache by the slaves, not pushed. Since they are pulled, they have no way to access that local path.
No, I don't think you can put anything in the distributed cache without it being in HDFS first. All Hadoop jobs use input/output paths in relation to HDFS.
The file can be in the local file system, HDFS, S3, or even another cluster. You need to specify it as
-files hdfs:// if the file is in HDFS;
by default it assumes the local file system.
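If the jar starts out on local disk, one workaround consistent with the answers above is to copy it into HDFS first and then register the HDFS copy; a rough sketch (the /tmp/my.jar staging path is just an example):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = job.getConfiguration();
FileSystem hdfs = FileSystem.get(conf);                 // the cluster's default FS (HDFS)
Path local = new Path("/local/path/to/my.jar");
Path staged = new Path("/tmp/my.jar");                  // example staging location on HDFS
hdfs.copyFromLocalFile(false, true, local, staged);     // keep the source, overwrite the target
DistributedCache.addArchiveToClassPath(staged, conf);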

how can I start mapred for Hadoop?

I set up a Hadoop test environment on three CentOS 6.2 virtual machines. Everything goes smoothly except start-mapred.
What I always get is:
2012-02-03 19:54:17,592 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/devx/var/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
I have checked the namespaceID on the name node and the data nodes; they are identical.
Is there anything I am doing wrong?
Hadoop is very sensitive to host names and IPs.
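A quick way to confirm that the DataNodes have actually registered with the NameNode (the usual reason for "could only be replicated to 0 nodes") is:
hadoop dfsadmin -report
If no live DataNodes show up, check that each node's /etc/hosts maps its hostname to the address the other nodes use rather than to 127.0.0.1, which is a common cause of exactly this symptom.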
