Hadoop fsck shows missing replicas

I am running a Hadoop 2.2.0 cluster with two DataNodes and one NameNode. When I check the file system with the hadoop fsck command, on the NameNode or on either DataNode, I get the following:
Target Replicas is 3 but found 2 replica(s).
I tried changing the configuration in hdfs-site.xml (setting dfs.replication to 2) and restarted the cluster services. Running hadoop fsck / still shows the same status:
Target Replicas is 3 but found 2 replica(s).
Could someone clarify whether this is a caching issue or a bug?

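Setting dfs.replication does not lower the replication of existing files. The property is only consulted when a file is created without an explicit replication factor, so files written while it was 3 keep a target of 3, and with only two DataNodes fsck keeps reporting them as under-replicated. To change the replication of existing files, use the following Hadoop utility: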
hadoop fs -setrep [-R] [-w] <rep> <path/file>
or
hdfs dfs -setrep [-R] [-w] <rep> <path/file>
You can also specify / as the path to change the replication factor of the entire file system.
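The two steps (setrep, then re-running fsck) can be combined into a small shell function. This is a sketch that assumes the hdfs command is on PATH and that you are allowed to change replication on the target path; fix_replication is a hypothetical helper name.

```shell
# Sketch: lower the replication factor of existing files, then re-run fsck.
# Assumes the `hdfs` CLI is on PATH; fix_replication is a hypothetical helper name.
fix_replication() {
  local rep="${1:-}" path="${2:-/}"
  if [ -z "$rep" ]; then
    echo "usage: fix_replication <replication> [path]" >&2
    return 2
  fi
  # -w waits until every file under the path has reached the new replication factor.
  hdfs dfs -setrep -R -w "$rep" "$path" || return 1
  # fsck should no longer report "Target Replicas is 3 but found 2 replica(s)".
  hdfs fsck "$path"
}
```

For the cluster in the question this would be fix_replication 2 /. On a large file system, -w can take a while because it waits for every file.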

Related

Hadoop errorcode -1000, No space available in any of the local directories

I'm using Windows 7 with Hadoop 2.10.1 installed as shown here: https://exitcondition.com/install-hadoop-windows/ and I get an error when running my job:
INFO mapreduce.Job:
Job job_1605374051781_0001 failed with state FAILED due to:
Application application_1605374051781_0001 failed 2 times
due to AM Container for appattempt_1605374051781_0001_000002 exited with
exitCode: -1000 Failing this attempt.Diagnostics:
[2020-11-14 18:17:54.217]No space available in any of the local directories.
The expected output is several lines of text, and my disks are nowhere near full (at least 10 GB free). The code is a generic MapReduce job that I cannot post here because it is the university's intellectual property.
Any tips on how to solve the "No space available" error?
To clarify, I'm using only my PC; I'm not connected to any other machines.
PS: I've solved it. As user "banu reddy" (https://stackoverflow.com/users/4249076/banu-reddy) explains in Hadoop map reduce example stuck on Running job, at least 10% of the disk needs to be free.
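The 10% rule from the PS can be checked from a shell before submitting a job. This is a sketch assuming a POSIX shell with df and awk (for example Git Bash or Cygwin on Windows); MAX_USED_PCT, used_pct and has_headroom are hypothetical names. YARN's NodeManager applies a similar limit to its local directories through yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, whose default is 90.

```shell
# Sketch: flag a local directory whose disk is 90% or more full.
# Assumptions: df and awk are available; the 90% threshold mirrors the
# "at least 10% free" rule, and MAX_USED_PCT can override it.
MAX_USED_PCT=${MAX_USED_PCT:-90}

# Print the used-space percentage (digits only) of the filesystem holding $1.
used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if the disk holding $1 is below the threshold, 1 if not, 2 on error.
has_headroom() {
  local pct
  pct=$(used_pct "$1")
  [ -n "$pct" ] || return 2
  if [ "$pct" -ge "$MAX_USED_PCT" ]; then
    echo "$1: ${pct}% used (limit ${MAX_USED_PCT}%)" >&2
    return 1
  fi
  return 0
}
```

Point it at whatever yarn.nodemanager.local-dirs resolves to on your machine, e.g. has_headroom /tmp && echo "enough free space".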
Hadoop jobs are executed on top of the framework's distributed file system, HDFS, which works independently from the local file system (even when running on just one machine, as you clarified).
That means the error you got may refer to the disk space available to HDFS rather than to your hard drives in general. To check whether HDFS has enough space to run the job, run the following command in a terminal:
hdfs dfs -df -h
This prints the size, used space, available space and usage percentage of the file system (any warnings your Hadoop setup prints alongside it can be ignored).
If the output shows that little or no space is available, you can delete directories from HDFS one by one. First check which directories and files are stored:
hadoop fs -ls
Then delete each directory you no longer need:
hadoop fs -rm -r name_of_the_folder
Or delete individual files:
hadoop fs -rm name_of_the_file
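The deletion steps can be combined with a before-and-after space check. This sketch assumes the hdfs command is on PATH; free_hdfs_space is a hypothetical name. It passes -skipTrash because, when trash is enabled (fs.trash.interval greater than 0), -rm only moves data into the user's .Trash directory, where it keeps using space; deletions with -skipTrash cannot be undone.

```shell
# Sketch: delete the given HDFS paths and show free space before and after.
# Assumes the `hdfs` CLI is on PATH; free_hdfs_space is a hypothetical helper name.
free_hdfs_space() {
  if [ "$#" -eq 0 ]; then
    echo "usage: free_hdfs_space <hdfs-path>..." >&2
    return 2
  fi
  echo "before:"; hdfs dfs -df -h
  local p
  for p in "$@"; do
    # -skipTrash frees the space immediately instead of moving data to .Trash.
    hdfs dfs -rm -r -skipTrash "$p" || return 1
  done
  echo "after:"; hdfs dfs -df -h
}
```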
Alternatively, you can empty HDFS completely so that you do not hit the space limit again any time soon. Note that this permanently deletes everything stored in HDFS. Start by stopping the YARN and HDFS daemons:
stop-all.sh
Then start only the HDFS daemons:
start-dfs.sh
Then format the NameNode, which wipes the HDFS namespace (not your local files):
hdfs namenode -format
Finally, start the YARN and HDFS daemons again:
start-all.sh
Remember to re-run hdfs dfs -df -h after deleting data from HDFS to confirm that space has actually been freed.
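The reset sequence can be wrapped in a guarded function so that it never runs by accident. This sketch assumes a Unix-like shell with Hadoop's sbin scripts and the hdfs command on PATH; reset_hdfs and CONFIRM_WIPE are hypothetical names. Unlike the steps above, it formats while all daemons are stopped, which is the order Hadoop's single-node setup guide uses.

```shell
# Sketch: wipe HDFS and restart, refusing to run unless explicitly confirmed.
# Assumes stop-all.sh, start-all.sh and `hdfs` are on PATH; ALL HDFS data is lost.
reset_hdfs() {
  if [ "${CONFIRM_WIPE:-no}" != "yes" ]; then
    echo "refusing to format: set CONFIRM_WIPE=yes to delete all HDFS data" >&2
    return 1
  fi
  stop-all.sh || return 1
  # Format while the daemons are stopped; the command may ask you to confirm
  # re-formatting an existing storage directory.
  hdfs namenode -format || return 1
  start-all.sh
}
```

Run it as CONFIRM_WIPE=yes reset_hdfs. After a format, a DataNode whose data directory (dfs.datanode.data.dir) still holds the old cluster ID may refuse to start with an "Incompatible clusterIDs" error; clearing that directory fixes this and also removes the old block files from disk.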

Hadoop : swap DataNode & NameNode without losing any HDFS data

I have a cluster of 5 machines:
1 big NameNode
4 standard DataNodes
I want to swap my current NameNode with a DataNode without losing the data stored in HDFS, so that my cluster becomes:
1 standard NameNode
3 standard DataNodes
1 big DataNode
Does anyone know a simple way to do that?
Thank you very much
Decommission the DataNode that will become the new NameNode.
Stop the cluster.
Create a tarball of dfs.name.dir on the current NameNode.
Copy all Hadoop configuration files from the current NameNode to the target NameNode.
Update core-site.xml on all nodes so that it points to the name/IP of the target NameNode.
Restore the dfs.name.dir tarball on the target NameNode, making sure the full path is the same.
Start the cluster by starting the new NameNode and the remaining DataNodes (one fewer than before).
Verify that everything is working correctly.
Add the old NameNode back as a DataNode by configuring it as one.
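The tarball steps (archiving dfs.name.dir and restoring it at the same full path) could look like this. It is a sketch assuming GNU or BSD tar and a stopped cluster; backup_name_dir, restore_name_dir and the ROOT override are hypothetical, and /data/nn is a made-up example path. In Hadoop 2.x the property is named dfs.namenode.name.dir (dfs.name.dir is the older name).

```shell
# Sketch: archive the NameNode metadata directory and restore it at the same
# absolute path on the new machine. Run only while the cluster is stopped.
# ROOT defaults to / and exists only so the functions can be tried on a scratch dir.

# backup_name_dir <dfs.name.dir> <archive.tar.gz>
backup_name_dir() {
  local dir="${1:-}" archive="${2:-}" root="${ROOT:-/}"
  if [ -z "$dir" ] || [ -z "$archive" ]; then
    echo "usage: backup_name_dir <name-dir> <archive>" >&2
    return 2
  fi
  # Store the path relative to ROOT so that extraction recreates the same full path.
  tar -czf "$archive" -C "$root" "${dir#/}"
}

# restore_name_dir <archive.tar.gz>
restore_name_dir() {
  local archive="${1:-}" root="${ROOT:-/}"
  [ -r "$archive" ] || { echo "usage: restore_name_dir <archive>" >&2; return 2; }
  # -p keeps the archived permissions on the extracted files.
  tar -xzpf "$archive" -C "$root"
}
```

For example, run backup_name_dir /data/nn /tmp/nn-meta.tar.gz on the old NameNode, copy the archive to the new machine (e.g. with scp), then run restore_name_dir /tmp/nn-meta.tar.gz there as the user that owns the directory.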
I would suggest uninstalling and then reinstalling Hadoop on both nodes so that the previous configuration does not cause any problems.

HDFS Namenode High Availability

I enabled NameNode High Availability using Ambari.
I want to verify the connection using dfs.nameservices (the nameservice ID) before I start coding.
Is there a command-line tool to verify it?
You can use the normal HDFS CLI.
hdfs dfs -ls hdfs://nameservice/user
Which should also work the same as
hdfs dfs -ls hdfs:///user
Or, addressing the active NameNode directly:
hdfs dfs -ls hdfs://namenode-1:port/user
If you point it at the standby NameNode instead, it fails with "Operation category READ is not supported in state standby".
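These checks can also be scripted. This sketch assumes a single nameservice (multiple nameservices would need extra handling) and the hdfs command on PATH; check_ha is a hypothetical name. It reads dfs.nameservices and dfs.ha.namenodes.<nameservice> with hdfs getconf -confKey, asks each NameNode for its state with hdfs haadmin -getServiceState, and finally lists /user through the logical nameservice URI.

```shell
# Sketch: print the HA state of each NameNode, then list a path through the
# logical nameservice URI. check_ha is a hypothetical helper name.
check_ha() {
  local ns ids id
  ns=$(hdfs getconf -confKey dfs.nameservices) || return 1
  # dfs.ha.namenodes.<nameservice> lists the NameNode IDs, e.g. "nn1,nn2".
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$ns") || return 1
  for id in $(echo "$ids" | tr ',' ' '); do
    echo "$id: $(hdfs haadmin -getServiceState "$id")"
  done
  # Through the logical URI the client should reach whichever NameNode is active.
  hdfs dfs -ls "hdfs://$ns/user" >/dev/null && echo "nameservice $ns is reachable"
}
```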

Unable to load large file to HDFS on Spark cluster master node

I have launched a Spark cluster on Amazon EC2 with 1 master node and 2 worker nodes, each with 2.7 GB of memory.
However, when I tried to put a 3 GB file onto HDFS with the command below
/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin
it returns the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I can upload smaller files, but not files above a certain size (about 2.2 GB).
If the file exceeds the memory size of a node, wouldn't Hadoop split it and place part of it on the other node?
Edit: Summary of my understanding of the issue you are facing:
1) Total HDFS free size is 5.32 GB
2) HDFS free size on each node is 2.6 GB
Note: You have bad blocks (4 blocks with corrupt replicas)
The following Q&A mentions similar issues:
Hadoop put command throws - could only be replicated to 0 nodes, instead of 1
In that case, running jps showed that the DataNodes were down.
The following Q&As describe ways to restart a DataNode:
What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker
Please try restarting your DataNode, and let us know whether it solves the problem.
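Checking for the DataNode and restarting it can be scripted along these lines. This is a sketch: it assumes jps (shipped with the JDK) and Hadoop's hadoop-daemon.sh are on PATH, which on the spark-ec2 setup in the question presumably means adding the scripts under /root/ephemeral-hdfs to PATH; ensure_datanode and DAEMON_SCRIPT are hypothetical names.

```shell
# Sketch: start the local DataNode if jps does not list it.
# DAEMON_SCRIPT lets you point at a different hadoop-daemon.sh location.
ensure_datanode() {
  if jps | grep -qw DataNode; then
    echo "DataNode already running"
    return 0
  fi
  echo "DataNode not running, starting it"
  "${DAEMON_SCRIPT:-hadoop-daemon.sh}" start datanode
}
```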
When using HDFS, you have one shared file system, i.e. all nodes share the same file system.
From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put 3 GB there.
Run the following commands to see how much free space HDFS has:
hdfs dfs -df -h
hdfs dfsadmin -report
or (for older versions of HDFS)
hadoop fs -df -h
hadoop dfsadmin -report
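Following this answer's arithmetic (a 3 GB file against about 2.2 GB free), a rough pre-check could compare the file size, multiplied by the replication factor, with the available space reported by hdfs dfs -df. This sketch assumes the Hadoop 2+ hdfs command and that -df prints a header line followed by one data line whose fourth column is the available space in bytes; hdfs_has_room_for is a hypothetical name.

```shell
# Sketch: coarse check of whether HDFS reports enough free space for a local file.
# Assumes `hdfs dfs -df` prints a header, then a line whose 4th column is bytes available.
hdfs_has_room_for() {
  local file="${1:-}" replication="${2:-1}" target="${3:-/}"
  local size avail needed
  if [ ! -r "$file" ]; then
    echo "usage: hdfs_has_room_for <local-file> [replication] [hdfs-path]" >&2
    return 2
  fi
  size=$(wc -c < "$file" | tr -d ' ')
  avail=$(hdfs dfs -df "$target" | awk 'NR==2 { print $4 }')
  [ -n "$avail" ] || return 2
  # Every replica of every block takes up space on some DataNode.
  needed=$((size * replication))
  echo "need $needed bytes ($size x $replication replicas), HDFS reports $avail available"
  [ "$needed" -le "$avail" ]
}
```

For the file in the question this would be hdfs_has_room_for /root/spark/2GB.bin. The cluster-wide total is only an upper bound, since each DataNode also needs room for the blocks it receives.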

Cloudera CDH4 - how come I can't browse the hdfs filesystem from the nodes?

I installed my test cluster using Cloudera Manager free.
I can only browse the file system from the main NameNode. On the other nodes, hadoop dfs -ls only shows the local folder.
jps shows Jps, TaskTracker and DataNode on the nodes.
MapReduce tasks/jobs run fine on all the nodes as a cluster.
With my own custom-built Hadoop cluster (without Cloudera), I can easily browse and manipulate the HDFS file system from any node (e.g. I can run hadoop dfs -mkdir test1 on all the nodes), but in CDH4 this only works on the NameNode.
Why is this?
Try using the command ./bin/hadoop fs -ls / for HDFS browsing.
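One way to make the command independent of each node's client configuration is to pass the NameNode URI explicitly. Unqualified paths are resolved against fs.defaultFS (fs.default.name in older releases), whose built-in default is file:///, i.e. the local disk, which would explain the symptom if the nodes lack the cluster's client configuration. This sketch treats namenode-host as a placeholder and assumes 8020, the usual CDH NameNode RPC port; NN_URI and ls_hdfs_root are hypothetical names.

```shell
# Sketch: list HDFS from a worker node via a fully qualified URI.
# namenode-host and port 8020 are placeholders; adjust NN_URI to your cluster.
NN_URI=${NN_URI:-hdfs://namenode-host:8020}

ls_hdfs_root() {
  local p="${1:-}"
  # A full hdfs:// URI bypasses the node's default file system setting,
  # which is file:/// (the local disk) when no cluster configuration is present.
  hadoop fs -ls "${NN_URI}/${p#/}"
}
```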
