HDFS + can't copy file from HDFS to local folder - hadoop

We are trying to copy the file /hdp/apps/2.6.5.0-292/hive/hive.tar.gz to the local folder /var/tmp.
As the output below shows, we get "hdfs.DFSClient: Could not obtain" and "No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException",
and in the end the file is not copied to the local folder /var/tmp.
We also tried to copy other files under /hdp/apps/2.6.5.0-292 to /var/tmp,
but we get the same errors.
Any idea what could be the reason for this issue?
NOTE: we checked HDFS health from Ambari and HDFS is reported as fine.
hdfs dfs -copyToLocal /hdp/apps/2.6.5.0-292/hive/hive.tar.gz /var/tmp
20/08/04 09:07:12 INFO hdfs.DFSClient: No node available for BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz
20/08/04 09:07:12 INFO hdfs.DFSClient: Could not obtain BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 from any node: java.io.IOException: No live nodes contain block BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
20/08/04 09:07:12 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 916.7101213444472 msec.
20/08/04 09:07:12 INFO hdfs.DFSClient: No node available for BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz
20/08/04 09:07:12 INFO hdfs.DFSClient: Could not obtain BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 from any node: java.io.IOException: No live nodes contain block BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
20/08/04 09:07:12 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 8364.841990287568 msec.
20/08/04 09:07:21 INFO hdfs.DFSClient: No node available for BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz
20/08/04 09:07:21 INFO hdfs.DFSClient: Could not obtain BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 from any node: java.io.IOException: No live nodes contain block BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
20/08/04 09:07:21 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 14554.977191829808 msec.
20/08/04 09:07:35 WARN hdfs.DFSClient: Could not obtain block: BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
20/08/04 09:07:35 WARN hdfs.DFSClient: Could not obtain block: BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
20/08/04 09:07:35 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:88)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:62)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:122)
at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:467)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:392)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:329)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:264)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:249)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:244)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:221)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:356)
copyToLocal: Could not obtain block: BP-551390946-23.1.22.254-1596451810664:blk_1073741831_1007 file=/hdp/apps/2.6.5.0-292/hive/hive.tar.gz
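The block named in the final error line can be pulled out for a targeted lookup. This is a minimal sketch: the function name `block_id_from_error` is hypothetical, and it assumes the block appears as `blk_<id>_<generation stamp>`, as in the log above. Newer `hdfs fsck` releases accept a `-blockId` option that reports which file and DataNodes a block maps to; check that your version's fsck usage lists it before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read DFSClient error text on stdin and print the first
# block ID found, without its generation stamp (blk_1073741831_1007 -> blk_1073741831).
block_id_from_error() {
  # The leading greedy .* makes sed pick the last blk_ token on each line;
  # -\{0,1\} also accepts the negative block IDs printed by older Hadoop releases.
  sed -n 's/.*\(blk_-\{0,1\}[0-9]*\)_[0-9].*/\1/p' | head -n 1
}
```

For example, piping the copyToLocal line above through `block_id_from_error` prints `blk_1073741831`, which could then be passed to `hdfs fsck -blockId blk_1073741831` where that option is available.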

Run the command below and check whether any blocks are corrupt:
hdfs fsck /
If there are corrupt or missing blocks, you may need to follow a recovery process.
For recovery, you can follow this link:
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
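Running fsck against the failing file, rather than all of `/`, shows exactly which blocks lack live replicas. A minimal sketch, assuming the `hdfs` client is on PATH and the caller may run fsck (often the `hdfs` superuser); the function names are hypothetical, while `-files -blocks -locations` and `-list-corruptfileblocks` are standard fsck options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run fsck on one path, showing every block and the
# DataNodes holding it, then list all files under / with corrupt or missing blocks.
check_hdfs_path() {
  local path="$1"
  hdfs fsck "$path" -files -blocks -locations
  hdfs fsck / -list-corruptfileblocks
}

# Hypothetical helper: read fsck output on stdin and print the final verdict
# (HEALTHY or CORRUPT) from the "The filesystem under path '...' is ..." line.
fsck_verdict() {
  sed -n 's/^The filesystem under path .* is \([A-Z]*\)$/\1/p' | tail -n 1
}
```

Usage: `check_hdfs_path /hdp/apps/2.6.5.0-292/hive/hive.tar.gz`, or `hdfs fsck / | fsck_verdict` for a one-word summary.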

Related

No node error when executing Hadoop command

Getting an error when executing a 'cat' command in Hadoop. I tried increasing the space on the node but still get the error:
INFO hdfs.DFSClient: No node available for BP-333635372-127.0.0.1-1508779710286:blk_1073743948_3135 file=/user/cloudera/sqoop_import/departments/part-m-00000
INFO hdfs.DFSClient: Could not obtain BP-333635372-127.0.0.1-1508779710286:blk_1073743948_3135 from any node: No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
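"No live nodes contain current block" indicates the client could not find any live DataNode holding a replica of that block, so a quick first check is how many DataNodes the NameNode currently considers live. A minimal sketch, assuming the Hadoop 2.x `hdfs dfsadmin -report` layout, which includes a line of the form `Live datanodes (N):` (older releases word this differently); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and print
# the number of live DataNodes (empty if the expected line is absent).
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p' | head -n 1
}

# Hypothetical wrapper: warn and fail when the NameNode sees no live DataNodes.
warn_if_no_datanodes() {
  local n
  n="$(hdfs dfsadmin -report | live_datanodes)"
  if [ "${n:-0}" -eq 0 ]; then
    echo "No live DataNodes reported; blocks cannot be read" >&2
    return 1
  fi
  echo "$n live DataNode(s)"
}
```

A non-zero count does not prove the specific block exists; it only rules out the case where every DataNode is down.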

How to put large data sets in HDFS?

I've tried to put large datasets (about 200 folders) into HDFS,
but I got these errors:
WARN hdfs.DFSClient: Slow waitForAckedSeqno took 72699ms;
INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.111.3:50010;
java.io.IOException: Got error, status message, ask with firstBadLink as 192.168.111.3:50010
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1363)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
I ran this command once per folder, not for all folders at once: hdfs dfs -put "each folder" /hadoopPath
Is there a solution to address these errors?
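The one-folder-at-a-time approach described above can be scripted so that each failed upload is recorded for a later retry instead of being lost in the log. This is a sketch only: the function name `put_each` is hypothetical, and it does not fix the underlying problem, since the `firstBadLink` message names the DataNode that failed in the write pipeline and that node still needs investigating.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload each local folder to an HDFS destination with a
# separate `hdfs dfs -put`, report failures on stderr, and return non-zero if
# any upload failed.
put_each() {
  local dest="$1"; shift
  local dir failed=0
  for dir in "$@"; do
    if hdfs dfs -put "$dir" "$dest"; then
      echo "ok: $dir"
    else
      echo "FAILED: $dir" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage: `put_each /hadoopPath ./data/* 2> failed.txt`, then rerun only the folders listed as FAILED.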

Hadoop : java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry

I am using Hadoop in pseudo-distributed mode and everything was working fine. But whenever I restart my computer, the NameNode goes into safe mode. To force the NameNode to leave safe mode, I use the $ bin/hadoop dfsadmin -safemode leave command. After that, however, I run into a problem whose cause I can't figure out: when I use -ls I can see the files, but when I try to read a file, I am not able to retrieve its block. I get the following error:
$ hadoop fs -cat /user/op/part-r-00000
13/11/21 12:45:12 INFO hdfs.DFSClient: No node available for block: blk_-4538200827997952429_1071 file=/user/op/part-r-00000
13/11/21 12:45:12 INFO hdfs.DFSClient: Could not obtain block blk_-4538200827997952429_1071 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
13/11/21 12:45:15 INFO hdfs.DFSClient: No node available for block: blk_-4538200827997952429_1071 file=/user/op/part-r-00000
13/11/21 12:45:15 INFO hdfs.DFSClient: Could not obtain block blk_-4538200827997952429_1071 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
13/11/21 12:45:18 INFO hdfs.DFSClient: No node available for block: blk_-4538200827997952429_1071 file=/user/op/part-r-00000
13/11/21 12:45:18 INFO hdfs.DFSClient: Could not obtain block blk_-4538200827997952429_1071 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
13/11/21 12:45:21 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-4538200827997952429_1071 file=/user/op/part-r-00000
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2426)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2218)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2381)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:349)
at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1913)
at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)
at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1557)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1776)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
cat: Could not obtain block: blk_-4538200827997952429_1071 file=/user/op/part-r-00000
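At startup the NameNode stays in safe mode until DataNodes have reported enough of the blocks it knows about. Forcing it out with `-safemode leave` does not make missing replicas appear; it likely just lets the NameNode answer metadata requests (which is why `-ls` works) while block reads still fail. A gentler sketch checks the state and waits instead of forcing. `-safemode get` and `-safemode wait` are standard dfsadmin options; the helper names are hypothetical, and the sketch assumes the status line reads `Safe mode is ON` or `Safe mode is OFF`. If the DataNode's stored blocks were actually lost, `wait` may block indefinitely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dfsadmin -safemode get` output on stdin and
# succeed if the NameNode reports that safe mode is on.
in_safemode() {
  grep -q 'Safe mode is ON'
}

# Hypothetical helper: wait for the NameNode to leave safe mode on its own
# instead of forcing it. Uses the `hadoop dfsadmin` entry point from the question.
wait_for_namenode() {
  if hadoop dfsadmin -safemode get | in_safemode; then
    echo "NameNode is in safe mode; waiting for DataNode block reports..."
    hadoop dfsadmin -safemode wait
  fi
}
```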

ERROR namenode.FSNamesystem: FSNamesystem initialization failed

I'm running Hadoop in pseudo-distributed mode on an Ubuntu VM. I recently decided to increase the RAM and number of cores available to my VM, and that seems to have completely broken HDFS. First, it was in safe mode, and I manually released it using:
hadoop dfsadmin -safemode leave
Then I ran:
hadoop fsck -blocks
and practically every block was corrupt or missing. Since this setup is just for my learning, I deleted everything in "/user/msknapp" and everything in "/var/lib/hadoop-0.20/cache/mapred/mapred/.settings". That made the block errors go away. Then I tried:
hadoop fs -put myfile myfile
and get (abridged):
12/01/07 15:05:29 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/msknapp/myfile could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1490)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:653)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
12/01/07 15:05:29 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/01/07 15:05:29 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/msknapp/myfile" - Aborting...
put: java.io.IOException: File /user/msknapp/myfile could only be replicated to 0 nodes, instead of 1
12/01/07 15:05:29 ERROR hdfs.DFSClient: Exception closing file /user/msknapp/myfile : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/msknapp/myfile could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1490)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:653)
at ...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/msknapp/myfile could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1490)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:653)
at ...
So I tried to stop and restart the namenode and datanode. No luck:
hadoop namenode
12/01/07 15:13:47 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.FileNotFoundException: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/image/fsimage (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.hadoop.hdfs.server.namenode.FSImage.isConversionNeeded(FSImage.java:683)
at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:690)
at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:60)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:469)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:297)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:358)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:327)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:465)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1239)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1248)
12/01/07 15:13:47 ERROR namenode.NameNode: java.io.FileNotFoundException: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/image/fsimage (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.hadoop.hdfs.server.namenode.FSImage.isConversionNeeded(FSImage.java:683)
at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:690)
at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:60)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:469)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:297)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:358)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:327)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:465)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1239)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1248)
12/01/07 15:13:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Would somebody please help me out here? I have been trying to fix this for hours.
Go to the directory you configured for HDFS storage, delete everything there, format the NameNode, and you are good to go. Note that this erases all data stored in HDFS. It usually happens if you don't shut down your cluster properly!
The following error means the NameNode does not have permission to read the fsimage file:
namenode.NameNode: java.io.FileNotFoundException: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/image/fsimage (Permission denied)
so grant permission on the fsimage file:
$ chmod -R 777 fsimage
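Opening the file to everyone with 777 works but is broader than needed. The permission error likely means the NameNode was started by a different user than the one that owns its storage directory (with the CDH hadoop-0.20 packages that owner is typically the `hdfs` user). A narrower sketch compares the directory's owner with the current user; it assumes GNU coreutils `stat`, as on Ubuntu, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the current user owns the given directory.
# Compares numeric UIDs so it also works for users without a passwd name.
# Assumes GNU coreutils `stat` (Linux); BSD/macOS stat uses different flags.
owned_by_current_user() {
  local dir="$1"
  [ "$(stat -c %u "$dir")" = "$(id -u)" ]
}
```

For example, `owned_by_current_user /var/lib/hadoop-0.20/cache/hadoop/dfs/name || stat -c %U /var/lib/hadoop-0.20/cache/hadoop/dfs/name` prints the actual owner, which is the user the NameNode should be started as (for instance with `sudo -u <owner>`).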

Hadoop pseudo-distributed mode error

I have set up Hadoop on an OpenSuse 11.2 VM using VirtualBox. I have made the prerequisite configuration changes, and I ran this example successfully in standalone mode.
But in pseudo-distributed mode I get the following error:
$ ./bin/hadoop fs -put conf input
10/04/13 15:56:25 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Protocol not available
10/04/13 15:56:25 INFO hdfs.DFSClient: Abandoning block blk_-8490915989783733314_1003
10/04/13 15:56:31 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Protocol not available
10/04/13 15:56:31 INFO hdfs.DFSClient: Abandoning block blk_-1740343312313498323_1003
10/04/13 15:56:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Protocol not available
10/04/13 15:56:37 INFO hdfs.DFSClient: Abandoning block blk_-3566235190507929459_1003
10/04/13 15:56:43 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Protocol not available
10/04/13 15:56:43 INFO hdfs.DFSClient: Abandoning block blk_-1746222418910980888_1003
10/04/13 15:56:49 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
10/04/13 15:56:49 WARN hdfs.DFSClient: Error Recovery for block blk_-1746222418910980888_1003 bad datanode[0] nodes == null
10/04/13 15:56:49 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/max/input/core-site.xml" - Aborting...
put: Protocol not available
10/04/13 15:56:49 ERROR hdfs.DFSClient: Exception closing file /user/max/input/core-site.xml : java.net.SocketException: Protocol not available
java.net.SocketException: Protocol not available
at sun.nio.ch.Net.getIntOption0(Native Method)
at sun.nio.ch.Net.getIntOption(Net.java:178)
at sun.nio.ch.SocketChannelImpl$1.getInt(SocketChannelImpl.java:419)
at sun.nio.ch.SocketOptsImpl.getInt(SocketOptsImpl.java:60)
at sun.nio.ch.SocketOptsImpl.sendBufferSize(SocketOptsImpl.java:156)
at sun.nio.ch.SocketOptsImpl$IP$TCP.sendBufferSize(SocketOptsImpl.java:286)
at sun.nio.ch.OptionAdaptor.getSendBufferSize(OptionAdaptor.java:129)
at sun.nio.ch.SocketAdaptor.getSendBufferSize(SocketAdaptor.java:328)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2873)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
It seems like there are no live DataNodes in the cluster. Did you check whether the status page shows live nodes? http://localhost:50070/
Start all Hadoop daemons using command $ bin/start-all.sh .
Did you start the Hadoop daemons? This needs to be done in pseudo-distributed mode, unlike standalone mode. You start them using something like:
$ bin/start-all.sh
Documentation for the steps required can be found here.
Did you follow all these steps? Can you browse the NameNode and JobTracker web interfaces?
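One way to answer "did you start the daemons?" is the JDK's `jps` tool, which lists running Java processes with their main class names. A minimal sketch that reports which HDFS daemons are missing from `jps` output; the function name is hypothetical, and it only checks NameNode and DataNode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output ("<pid> <MainClass>" per line) on stdin
# and print each required HDFS daemon that is not running, one per line.
missing_hdfs_daemons() {
  local running daemon
  running="$(awk '{print $2}')"
  for daemon in NameNode DataNode; do
    # -x requires a whole-line match, so SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "$daemon"
    fi
  done
}
```

Usage: `jps | missing_hdfs_daemons`; empty output means both daemons are running.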
Maybe try using a preconfigured virtual machine? http://www.cloudera.com/developers/downloads/virtual-machine/ I think this is probably the best way to start learning Hadoop, and those problems should not happen there.
