HDFS: results from hdfs fsck / differ from hdfs dfsadmin -report

We have a Hadoop cluster (Ambari platform, HDP version 2.6.4),
and we ran a verification step to find out whether we have under-replicated blocks.
The first verification was:
su hdfs
hdfs fsck /
It gives these results:
Total size: 17653549013347 B (Total open files size: 854433698229 B)
Total dirs: 843714
Total files: 11752836
Total symlinks: 0 (Files currently being written: 16)
Total blocks (validated): 11792203 (avg. block size 1497052 B) (Total open file blocks (not validated): 6381)
Minimally replicated blocks: 11792203 (100.00001 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
As we can see above, Under-replicated blocks is 0.
BUT
when we run the next verification:
hdfs dfsadmin -report
we get:
Configured Capacity: 141275429535744 (128.49 TB)
Present Capacity: 140886991802565 (128.14 TB)
DFS Remaining: 84748655941292 (77.08 TB)
DFS Used: 56138335861273 (51.06 TB)
DFS Used%: 39.85%
Under replicated blocks: 4212067
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
From the above we can see that Under replicated blocks is 4212067.
We want to know which under-replicated count is the right one:
why do we get different results from hdfs fsck / and hdfs dfsadmin -report?
BTW, Ambari shows roughly the same results as hdfs dfsadmin -report.
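To put the two numbers side by side, the counts can be pulled out of saved command output. The following is a minimal sketch, not part of the original post: the function names under_replicated_count and compare_under_replicated and the two file arguments are hypothetical. It only reads text, so it shows the discrepancy but does not explain it.

```shell
# Print the under-replicated block count found in text read from stdin.
# Matches both spellings seen in the post: "Under-replicated blocks:" (fsck)
# and "Under replicated blocks:" (dfsadmin -report).
under_replicated_count() {
  awk -F': *' '/Under[- ]replicated blocks:/ {
    split($2, parts, " ")   # drop trailing text such as "(0.0 %)"
    print parts[1]
    exit
  }'
}

# $1: file holding saved "hdfs fsck /" output (hypothetical file name)
# $2: file holding saved "hdfs dfsadmin -report" output (hypothetical file name)
compare_under_replicated() {
  fsck_count=$(under_replicated_count < "$1")
  report_count=$(under_replicated_count < "$2")
  echo "fsck=$fsck_count dfsadmin=$report_count"
}
```

For example, after saving the two outputs with hdfs fsck / > fsck.txt and hdfs dfsadmin -report > report.txt, calling compare_under_replicated fsck.txt report.txt prints both counts on one line.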

Related

hdfs jmxget vs hdfs fsck

I have 2 namenodes and several datanodes, and today I noticed that I have some corrupt blocks.
What is odd is that:
hdfs jmxget -server namenode02 -port 8006 | grep CorruptBlocks
CorruptBlocks=27
and when I checked with hdfs fsck /, I got:
Total size: 734930879995888 B (Total open files size: 537967073 B)
Total dirs: 1501316
Total files: 113743394
Total symlinks: 0 (Files currently being written: 137)
Total blocks (validated): 109063040 (avg. block size 6738587 B) (Total open file blocks (not validated): 133)
Minimally replicated blocks: 109063040 (100.00001 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.001944
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 103
Number of racks: 1
FSCK ended at Mon Feb 12 10:09:10 CET 2018 in 1608344 milliseconds
So fsck reports nothing wrong with the blocks. How is this check made?
Thx in advance!
The hdfs jmxget command gives the overall block status as tracked by Hadoop, and according to it a few blocks might be corrupt (I don't know the reason).
The fsck command gives the status of the files, which are safe because of the configured replication factor.
So this is normal behavior; there is no anomaly here.
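When comparing the two sources it helps to read a single metric out of the jmxget output. This is a small sketch that assumes the output consists of Name=value lines, as in the CorruptBlocks=27 line quoted in the question. The function name jmx_metric is hypothetical.

```shell
# Read "Name=value" lines on stdin and print the value of the metric
# whose name is given as $1. Prints nothing if the metric is absent.
jmx_metric() {
  awk -F'=' -v name="$1" '$1 == name { print $2; exit }'
}
```

With the question's setup this would be used as hdfs jmxget -server namenode02 -port 8006 | jmx_metric CorruptBlocks. Matching on the exact name avoids picking up other metrics that merely contain the same substring, which a plain grep would.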

Unable to delete HDFS Corrupt files

I am unable to delete the corrupt files present in my HDFS. The NameNode has gone into safe mode. The total number of blocks is 980, of which 978 have been reported. When I run the following command,
sudo -u hdfs hdfs dfsadmin -report
The report generated is,
Safe mode is ON
Configured Capacity: 58531520512 (54.51 GB)
Present Capacity: 35774078976 (33.32 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used: 3399569408 (3.17 GB)
DFS Used%: 9.50%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 10.0.2.15:50010 (quickstart.cloudera)
Hostname: quickstart.cloudera
Decommission Status : Normal
Configured Capacity: 58531520512 (54.51 GB)
DFS Used: 3399569408 (3.17 GB)
Non DFS Used: 19777388544 (18.42 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used%: 5.81%
DFS Remaining%: 55.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Nov 14 10:39:58 IST 2017
And when the following command is executed,
sudo -u hdfs hdfs fsck /
The output is,
Connecting to namenode via http://quickstart.cloudera:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /10.0.2.15 for path / at Tue Nov 14 10:41:25 IST 2017
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743141
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: MISSING 1 blocks of total size 83 B..
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743142
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: MISSING 1 blocks of total size 83 B
Status: CORRUPT
Total size: 3368384392 B (Total open files size: 166 B)
Total dirs: 286
Total files: 966
Total symlinks: 0 (Files currently being written: 3)
Total blocks (validated): 980 (avg. block size 3437126 B) (Total open file blocks (not validated): 2)
********************************
UNDER MIN REPL'D BLOCKS: 2 (0.20408164 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 2
MISSING BLOCKS: 2
MISSING SIZE: 166 B
CORRUPT BLOCKS: 2
********************************
Minimally replicated blocks: 978 (99.79592 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 0.9979592
Corrupt blocks: 2
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Nov 14 10:41:26 IST 2017 in 774 milliseconds
The filesystem under path '/' is CORRUPT
Can anyone please help in either correcting the corrupted blocks, (or) deleting them? Thanks in advance.
Since the report says the NameNode is in safe mode, first turn safe mode off:
hdfs dfsadmin -safemode leave
Then execute either of these commands:
hdfs fsck / | egrep -v '^\.+$' | grep -v replica | grep -v Replica
or
hdfs fsck hdfs://quickstart.cloudera:50070/ | egrep -v '^\.+$' | grep -v replica | grep -v Replica
The output would be somewhat similar to
/path/to/filename.fileextension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.fileextension: MISSING 1 blocks of total size 15620361 B
In your case the corrupt files are already listed, so execute the commands below:
hdfs dfs -rm /hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728
hdfs dfs -rm /hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta
After that, stay out of safe mode and just continue working.
Yippee!!
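When more than a couple of files are affected, the paths can be extracted from the fsck output instead of being copied by hand. The sketch below assumes the line format shown in the question ("<path>: CORRUPT blockpool ..."). The function name corrupt_paths is hypothetical.

```shell
# Read fsck output on stdin and print every path that appears on a
# "<path>: CORRUPT blockpool ..." line, once per path.
corrupt_paths() {
  sed -n 's/^\(.*\): CORRUPT blockpool .*$/\1/p' | sort -u
}
```

A possible use is hdfs fsck / | corrupt_paths, reviewing the list before passing each path to hdfs dfs -rm. The fsck command also has a -list-corruptfileblocks option that prints the affected files directly, which may be simpler if it is available in your version.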

CDH HDFS node decommission never ends

We have a 12-server Hadoop cluster (CDH). Recently we wanted to decommission three of the servers, but the process has already been running for more than 2 days and never ends. In particular, I see only 94G of data left on the three datanodes, and that size has not changed in the past 24 hours, even though the number of under-replicated blocks is already zero. The replication factor is 3 for all the data in HDFS.
Below is the result for hadoop fsck command:
Total size: 5789534135468 B (Total open files size: 94222879072 B)
Total dirs: 42458
Total files: 5494378
Total symlinks: 0 (Files currently being written: 133)
Total blocks (validated): 5506578 (avg. block size 1051385 B) (Total open file blocks (not validated): 822)
Minimally replicated blocks: 5506578 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.999584
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 13
Number of racks: 1
FSCK ended at Mon Oct 17 16:36:09 KST 2016 in 781094 milliseconds
You can try stopping the Cloudera agent on the datanode:
sudo service cloudera-scm-agent hard_stop_confirmed
After the agent is stopped, you can delete that datanode from the HDFS instance page.
Hope this works.
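Before removing a node this way, it may be worth checking what the NameNode itself reports for each datanode. The sketch below reads hdfs dfsadmin -report output and assumes the per-node "Name:" and "Decommission Status :" lines look like the ones in the report quoted earlier on this page. The function name decommission_status is hypothetical.

```shell
# Read "hdfs dfsadmin -report" output on stdin and print one line per
# datanode: "<address> <decommission status>".
decommission_status() {
  awk '
    $1 == "Name:" { name = $2 }          # e.g. "Name: 10.0.2.15:50010 (host)"
    /Decommission Status/ {
      sub(/^.*Decommission Status *: */, "")
      print name, $0
    }
  '
}
```

Running hdfs dfsadmin -report | decommission_status gives a compact list that can be checked repeatedly while the decommission is running.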

Which nodes a hadoop file is stored on

Is there a way to find which datanodes a particular hdfs file is stored on, or a list of blocks that store an hdfs file?
For example, if I have hdfs://user/person/file.csv,
is there a way to find a list of ext4 paths corresponding to the blocks that make up this file on the datanodes?
Yes, you can find out the location of blocks which are stored on different datanodes in HDFS. Here is the command:
hdfs fsck /user/hduser/file.txt -files -blocks -locations
This will give you all the information related to individual blocks created for file: "/user/hduser/file.txt". Output generally looks like this:
[hduser@node001 ~]$ hdfs fsck /user/hduser/file.txt -files -blocks -locations
Connecting to namenode via http://node001.morado.com:50070
FSCK started by hduser (auth:SIMPLE) from /192.168.2.169 for path /user/hduser/file.txt at Mon Jul 11 23:14:27 PDT 2016
/user/hduser/file.txt 1073839694 bytes, 9 block(s): OK
0. BP-778802867-192.168.2.147-1465886958278:blk_1080847742_7107323 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-293a7f8d-ad31-4bc1-98d8-0c0822eda305,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-8efb7a6e-08f0-4f2d-aee2-bc5a102277bd,DISK]]
1. BP-778802867-192.168.2.147-1465886958278:blk_1080847748_7107329 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-060c75ff-5632-4f6f-a73b-fb2a68927c63,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-3e108776-d3bd-4b84-b68a-59e1ca755331,DISK]]
2. BP-778802867-192.168.2.147-1465886958278:blk_1080847753_7107334 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.177:50010,DS-b7a33931-8917-4fe2-b2ec-2e4c3d5b6b01,DISK], DatanodeInfoWithStorage[192.168.2.135:50010,DS-5efb0813-7e4e-4d27-8fa4-7f8f3b2e6e3c,DISK]]
3. BP-778802867-192.168.2.147-1465886958278:blk_1080847760_7107341 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-b8a5ceaf-6953-4842-8930-29b286ccb7cf,DISK], DatanodeInfoWithStorage[192.168.2.134:50010,DS-c6418fbb-6e30-447e-b507-bf19e0f28fd1,DISK]]
4. BP-778802867-192.168.2.147-1465886958278:blk_1080847764_7107345 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-95636645-c59e-4bca-8478-c15b3c16d514,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-293a7f8d-ad31-4bc1-98d8-0c0822eda305,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-8efb7a6e-08f0-4f2d-aee2-bc5a102277bd,DISK]]
5. BP-778802867-192.168.2.147-1465886958278:blk_1080847768_7107349 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-276bd2c8-ee3d-4cd3-b655-17a83917c45b,DISK], DatanodeInfoWithStorage[192.168.2.134:50010,DS-5e128658-c876-46df-b10e-5962baf73db2,DISK]]
6. BP-778802867-192.168.2.147-1465886958278:blk_1080847772_7107353 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-fa57d5e9-a187-4856-8bf2-6933e63b3afe,DISK], DatanodeInfoWithStorage[192.168.2.135:50010,DS-f5f1e2a0-186b-4c70-844f-7e7ebe389f50,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-8cfb8ffb-77b6-40bb-930e-81c7198166ad,DISK]]
7. BP-778802867-192.168.2.147-1465886958278:blk_1080847776_7107357 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-95636645-c59e-4bca-8478-c15b3c16d514,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-060c75ff-5632-4f6f-a73b-fb2a68927c63,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-3e108776-d3bd-4b84-b68a-59e1ca755331,DISK]]
8. BP-778802867-192.168.2.147-1465886958278:blk_1080847780_7107361 len=97870 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-de3cffa6-cef4-4f47-9bbf-5f44214b3a5a,DISK], DatanodeInfoWithStorage[192.168.2.177:50010,DS-847ec520-bc14-4ca4-af94-21140a3b20f6,DISK]]
Status: HEALTHY
Total size: 1073839694 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 9 (avg. block size 119315521 B)
Minimally replicated blocks: 9 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
FSCK ended at Mon Jul 11 23:14:27 PDT 2016 in 1 milliseconds
The filesystem under path '/user/hduser/file.txt' is HEALTHY
Look for the information after "repl", in the entries that start with "DatanodeInfoWithStorage". They give the required information about the datanode locations.
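For a file with many blocks, the datanode addresses can be summarized instead of being read line by line. The sketch below assumes the DatanodeInfoWithStorage[address,storage-id,type] layout shown in the output above. The function name replicas_per_datanode is hypothetical.

```shell
# Read "hdfs fsck <file> -files -blocks -locations" output on stdin and
# print "<datanode address> <number of block replicas it holds>".
replicas_per_datanode() {
  grep -o 'DatanodeInfoWithStorage\[[^,]*' \
    | sed 's/^DatanodeInfoWithStorage\[//' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be used as hdfs fsck /user/hduser/file.txt -files -blocks -locations | replicas_per_datanode. It reports datanode addresses only; it does not give the local file system paths of the block files.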

HDFS blocks issue

When I run the fsck command, it shows the total number of blocks as 68 (avg. block size 286572 B). How can I have only 68 blocks?
I recently installed CDH5 with Hadoop 2.6.0.
-
[hdfs@cluster1 ~]$ hdfs fsck /
Connecting to namenode via http://cluster1.abc:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.101.241 for path / at Fri Sep 25 09:51:56 EDT 2015
....................................................................Status: HEALTHY
Total size: 19486905 B
Total dirs: 569
Total files: 68
Total symlinks: 0
Total blocks (validated): 68 (avg. block size 286572 B)
Minimally replicated blocks: 68 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.9411764
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Sep 25 09:51:56 EDT 2015 in 41 milliseconds
The filesystem under path '/' is HEALTHY
-
This is what I get when I run the hdfs dfsadmin -report command:
[hdfs@cluster1 ~]$ hdfs dfsadmin -report
Configured Capacity: 5715220577895 (5.20 TB)
Present Capacity: 5439327449088 (4.95 TB)
DFS Remaining: 5439303270400 (4.95 TB)
DFS Used: 24178688 (23.06 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 504
-
Also, my Hive query does not start a MapReduce job; could the above be the cause?
Any suggestion?
Thank you!
Blocks are the chunks of data that are distributed across the nodes of the file system. For example, with a 128 MB block size, a 200 MB file is in fact stored as 2 blocks, of 128 MB and 72 MB.
So do not worry about the blocks; the framework takes care of them. As the fsck report shows, you have 68 files in HDFS, with a total size of only about 19 MB, so presumably each file fits in a single block and hence there are 68 blocks.
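The block arithmetic from the example can be sketched as follows. The block size is passed in as a parameter because it is configurable; the 128 MB value in the example is an assumption about the cluster, not something the fsck report states. The function name block_layout is hypothetical.

```shell
# Print "<number of blocks> <size of last block in bytes>" for a file of
# $1 bytes stored with a block size of $2 bytes.
block_layout() {
  size=$1
  block_size=$2
  if [ "$size" -eq 0 ]; then
    echo "0 0"                       # an empty file occupies no blocks
    return
  fi
  count=$(( (size + block_size - 1) / block_size ))   # ceiling division
  last=$(( size - (count - 1) * block_size ))         # remainder in last block
  echo "$count $last"
}
```

For a 200 MB file, block_layout $((200 * 1024 * 1024)) $((128 * 1024 * 1024)) prints 2 75497472, that is, two blocks with 72 MB in the second one. A file smaller than the block size always yields a single block, which is why a set of small files gives one block per file.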

Resources