How to fix corrupt HDFS files - hadoop

How does someone fix an HDFS that's corrupt? I looked on the Apache/Hadoop website and it says to use the fsck command, which doesn't fix it. Hopefully someone who has run into this problem before can tell me how to fix this.
Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.
When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make them not corrupt? This is on a practice machine, so I COULD blow everything away, but when we go live I won't be able to "fix" it by blowing everything away, so I'm trying to figure it out now.

You can use
hdfs fsck /
to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really
verbose, especially on a large HDFS filesystem, so I normally get down to
the meaningful output with
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica
which ignores lines with nothing but dots and lines talking about replication.
Once you find a file that is corrupt, run
hdfs fsck /path/to/corrupt/file -locations -blocks -files
Use that output to determine where blocks might live. If the file is
larger than your block size, it might have multiple blocks.
You can use the reported block numbers to go around to the
datanodes and the namenode logs, searching for the machine or machines
on which the blocks lived. Try looking for filesystem errors
on those machines: missing mount points, a datanode not running, a
filesystem reformatted or reprovisioned. If you can find a problem
that way and bring the block back online, that file will be healthy
again.
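For example, a minimal sketch of that log search (the block ID and log directory below are placeholders; use a block ID from your fsck output and wherever your distribution actually writes the HDFS logs):
# Search the NameNode/DataNode logs on each machine for a reported block ID.
grep -r 'blk_1073741825' /var/log/hadoop-hdfs/
# While you are there, confirm the datanode data directories (whatever
# dfs.datanode.data.dir points to in your config; /data/*/dfs/dn is only an example)
# are mounted and readable.
df -h
ls -ld /data/*/dfs/dn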
Lather, rinse, and repeat until all files are healthy or you exhaust
all alternatives looking for the blocks.
Once you determine what happened and you cannot recover any more blocks,
just use the
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
command to get your HDFS filesystem back to healthy so you can start
tracking new errors as they occur.

If you just want to get your HDFS back to a normal state and don't worry much about the data:
This will list the files with corrupt HDFS blocks:
hdfs fsck / -list-corruptfileblocks
This will delete the corrupted files:
hdfs fsck / -delete
Note that you might have to prefix these commands with sudo -u hdfs if you are not running as the HDFS superuser (assuming "hdfs" is the name of that user).
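If you would rather remove only the files fsck flags as corrupt instead of running the blanket -delete, a sketch along these lines can work; it assumes the -list-corruptfileblocks output puts the file path in the last field of each blk_ line and that "hdfs" is the HDFS superuser:
# Collect the paths of corrupt files, then remove them one at a time.
sudo -u hdfs hdfs fsck / -list-corruptfileblocks | awk '/blk_/ {print $NF}' > /tmp/corrupt_files
while read -r f; do
  echo "Removing $f"
  sudo -u hdfs hdfs dfs -rm "$f"
done < /tmp/corrupt_files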

The solution here worked for me: https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html
su - <$hdfs_user>
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
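Once the loop finishes, it is worth re-running the same filter to confirm the number of under-replicated blocks is actually dropping (re-replication takes a while to catch up):
# The count should shrink toward zero as the cluster re-replicates.
hdfs fsck / | grep 'Under replicated' | wc -l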

Start all daemons, run the command "hadoop namenode -recover -force", then stop the daemons and start them again. Wait some time for the data to recover.

Related

Hadoop MapReduce get job history in pseudo-distributed mode

I am running Hadoop MapReduce and YARN in pseudo-distributed mode and I want to get the job history logs. To get that, I tried solution 2 in this question, and so, from the directory
hadoop-3.0.0/bin
I executed
$ ./hdfs dfs -ls /tmp/hadoop-uname/mapred
Following is what I get as response:
ls: `/tmp/hadoop-uname/mapred': No such file or directory
I get the same response for:
$ ./hdfs dfs -ls /tmp/hadoop-uname/mapred/staging
also.
My questions are:
1) Are job history logs generated in pseudo-distributed mode?
2) Is logging turned on by default? Or do I need to do some other setting to turn it on?
3) Am I missing anything else?
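One hedged pointer: in Hadoop 3.x the JobHistory server usually has to be started explicitly, and the staging/history locations come from configuration rather than a fixed /tmp path, so it may be worth checking both (the property name below is the usual default; your site config may differ):
# Ask Hadoop where the MapReduce staging directory actually is.
./hdfs getconf -confKey yarn.app.mapreduce.am.staging-dir
# Start the JobHistory server if it is not already running (Hadoop 3.x daemon syntax).
./mapred --daemon start historyserver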

Suppressing warnings for hadoop fs -get -p command

I am copying a huge number of files using the hadoop fs -get -p command, and I want to retain the timestamps and ownerships. Many of the files cannot retain their ownership
because the user IDs are not available on the local machine, so for these files I am getting "get: chown: changing ownership /a/b/c.txt Operation not permitted".
Is it possible to suppress only this error? If I redirect with 2>/dev/null, that will suppress all errors, and I might run into other issues as well,
so I don't want to use that option. Is there any way I can suppress ONLY the issues related to privileges?
Any hint would be really helpful.
Not very elegant, but functional: use grep -v your_undesired_pattern
hadoop fs -get -p command 2>&1 | grep -v "changing ownership"
From the Hadoop side, no. The error is printed using System.err.println and is coming from the OS as the command execs chown.
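If you want to leave stdout untouched and filter only stderr, a bash process substitution variant works as well (this assumes your shell is bash; the paths are placeholders):
# Only stderr passes through the filter; filtered output is sent back to stderr.
hadoop fs -get -p /src/path /local/path 2> >(grep -v "changing ownership" >&2)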

Find out actual disk usage in HDFS

Is there a way to find out how much space is consumed in HDFS?
I used
hdfs dfs -df
but it seems to be unreliable, because after deleting a huge amount of data with
hdfs dfs -rm -r -skipTrash
the previous command does not show the change at once but only after several minutes (I need up-to-date disk usage info).
To see the space consumed by a particular folder try:
hadoop fs -du -s /folder/path
And if you want to see the usage, space consumed, space available, etc. of the whole HDFS:
hadoop dfsadmin -report
The hadoop CLI is deprecated. Use hdfs instead.
Folder-wise:
sudo -u hdfs hdfs dfs -du -h /
Cluster-wise:
sudo -u hdfs hdfs dfsadmin -report
hadoop fs -count -q /path/to/directory
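As a rough guide to reading that last command, -count -q prints (as far as I recall) quota, remaining quota, space quota, remaining space quota, directory count, file count, content size, and the path, so the content size is the second-to-last field:
# Pull out just the content size in bytes (column layout as described above; treat it as an assumption).
hadoop fs -count -q /path/to/directory | awk '{print $(NF-1)}'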

How to change replication factor while running copyFromLocal command?

I'm not asking how to set the replication factor in Hadoop for a folder/file. I know the following command works flawlessly for existing files & folders.
hadoop fs -setrep -R -w 3 <folder-path>
I'm asking how to set the replication factor to something other than the default (which is 4 in my scenario) while copying data from local. I'm running the following command:
hadoop fs -copyFromLocal <src> <dest>
When I run the above command, it copies the data from the src to the dest path with a replication factor of 4. But I want to set the replication factor to 1 while copying the data, not after the copy is complete. Basically, I want something like this:
hadoop fs -setrep -R 1 -copyFromLocal <src> <dest>
I tried it, but it didn't work. So, can it be done? Or do I have to first copy the data with replication factor 4 and then run the setrep command?
According to this post and this post (both asking different questions), this command seems to work:
hadoop fs -D dfs.replication=1 -copyFromLocal <src> <dest>
The -D option means "Use value for given property."
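To double-check that the flag took effect, you can read back the replication factor of the copied file (the path is a placeholder):
# %r prints the file's replication factor.
hadoop fs -stat %r <dest>/filename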

HDFS Error while copying the file: Could only be replicated to 0 nodes, instead of 1

While copying a file from the local system to HDFS I am getting the below error. I am using a single node.
13/08/04 10:50:02 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: File /user/vishu/input could only be replicated to 0 nodes, instead of 1
I deleted the dfs/Name and dfs/data directories and formatted the NameNode; still no use.
And I have enough space to replicate the data.
Could anyone help resolving this issue?
Regards,
Vishwa
Sometimes the datanode may start up slowly and this may cause the above issue. Allow some wait time after starting the dfs and mapred daemons.
bin/hadoop namenode -format
bin/start-dfs.sh
wait for about 5 minutes (the datanode will be up by then)
bin/start-mapred.sh
Check whether all daemons are started or not, and make sure that your input file is correct.
Use the following command to copy the file from local to hdfs:
bin/hadoop fs -mkdir abc
bin/hadoop fs -copyFromLocal inputfile abc
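A quick way to do the daemon check mentioned above before retrying the copy (standard commands; the expected process list assumes a plain single-node setup of that era):
# Should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
jps
# Confirms the NameNode can see at least one live datanode with free capacity.
bin/hadoop dfsadmin -report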
If the client writing the file is outside the cluster, make sure the client has access to the datanodes.
Look at this http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/
