we have hadoop cluster ( Ambari platform with HDP version - 2.6.4 )
and we performed verification step in order to understand if we have under replica blocks
the first verification was with:
su hdfs
hdfs fsck / - -->
its gives the results:
Total size: 17653549013347 B (Total open files size: 854433698229 B)
Total dirs: 843714
Total files: 11752836
Total symlinks: 0 (Files currently being written: 16)
Total blocks (validated): 11792203 (avg. block size 1497052 B) (Total open file blocks (not validated): 6381)
Minimally replicated blocks: 11792203 (100.00001 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
so as we can see above Under-replicated blocks is 0
BUT
when we perform the next verification:
hdfs dfsadmin -report
then we get
Configured Capacity: 141275429535744 (128.49 TB)
Present Capacity: 140886991802565 (128.14 TB)
DFS Remaining: 84748655941292 (77.08 TB)
DFS Used: 56138335861273 (51.06 TB)
DFS Used%: 39.85%
Under replicated blocks: 4212067
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
so from above we can see that Under replicated blocks is --> 4212067
about to know what is the right under replica number:
why we get differences between hdfs fsck / and hdfs dfsadmin -report ?
BTW - from Ambari we get the ~ same results as from hdfs dfsadmin -report
I have 2 namenodes with several datanodes, but today I've just seen that I have some corrupt blocks.
What is awkward is that:
hdfs jmxget -server namenode02 -port 8006 | grep CorruptBlocks
CorruptBlocks=27
and when I've checked with hdfs fsck / , I've got:
Total size: 734930879995888 B (Total open files size: 537967073 B)
Total dirs: 1501316
Total files: 113743394
Total symlinks: 0 (Files currently being written: 137)
Total blocks (validated): 109063040 (avg. block size 6738587 B) (Total open file blocks (not validated): 133)
Minimally replicated blocks: 109063040 (100.00001 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.001944
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 103
Number of racks: 1
FSCK ended at Mon Feb 12 10:09:10 CET 2018 in 1608344 milliseconds
So with fsck nothing bad regarding the blocks. How is this check made?
Thx in advance!
For the hdfs jmx command we have the overall status of the blocks from Hadoop, which it seems that few of them might be corrupted (don't know the reason).
For the fsck command we have the status of the files which they are safe due to the replica number set.
To conclude it's normal behavior, no anomalies here.
Is there a way to find which datanodes a particular hdfs file is stored on, or a list of blocks that store an hdfs file?
For example, if I have hdfs://user/person/file.csv,
Is there a way to find a list of ext4 paths corresponding to the blocks that make up this file on the datanodes?
Yes, you can find out the location of blocks which are stored on different datanodes in HDFS. Here is the command:
hdfs fsck /user/hduser/file.txt -files -blocks -locations
This will give you all the information related to individual blocks created for file: "/user/hduser/file.txt". Output generally looks like this:
[hduser#node001 ~]$ hdfs fsck /user/hduser/file.txt -files -blocks -locations
Connecting to namenode via http://node001.morado.com:50070
FSCK started by hduser (auth:SIMPLE) from /192.168.2.169 for path /user/hduser/file.txt at Mon Jul 11 23:14:27 PDT 2016
/user/hduser/file.txt 1073839694 bytes, 9 block(s): OK
0. BP-778802867-192.168.2.147-1465886958278:blk_1080847742_7107323 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-293a7f8d-ad31-4bc1-98d8-0c0822eda305,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-8efb7a6e-08f0-4f2d-aee2-bc5a102277bd,DISK]]
1. BP-778802867-192.168.2.147-1465886958278:blk_1080847748_7107329 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-060c75ff-5632-4f6f-a73b-fb2a68927c63,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-3e108776-d3bd-4b84-b68a-59e1ca755331,DISK]]
2. BP-778802867-192.168.2.147-1465886958278:blk_1080847753_7107334 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.177:50010,DS-b7a33931-8917-4fe2-b2ec-2e4c3d5b6b01,DISK], DatanodeInfoWithStorage[192.168.2.135:50010,DS-5efb0813-7e4e-4d27-8fa4-7f8f3b2e6e3c,DISK]]
3. BP-778802867-192.168.2.147-1465886958278:blk_1080847760_7107341 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-b8a5ceaf-6953-4842-8930-29b286ccb7cf,DISK], DatanodeInfoWithStorage[192.168.2.134:50010,DS-c6418fbb-6e30-447e-b507-bf19e0f28fd1,DISK]]
4. BP-778802867-192.168.2.147-1465886958278:blk_1080847764_7107345 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-95636645-c59e-4bca-8478-c15b3c16d514,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-293a7f8d-ad31-4bc1-98d8-0c0822eda305,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-8efb7a6e-08f0-4f2d-aee2-bc5a102277bd,DISK]]
5. BP-778802867-192.168.2.147-1465886958278:blk_1080847768_7107349 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-6881b609-1473-48d5-a07c-f111e0bdcf2f,DISK], DatanodeInfoWithStorage[192.168.2.20:50010,DS-276bd2c8-ee3d-4cd3-b655-17a83917c45b,DISK], DatanodeInfoWithStorage[192.168.2.134:50010,DS-5e128658-c876-46df-b10e-5962baf73db2,DISK]]
6. BP-778802867-192.168.2.147-1465886958278:blk_1080847772_7107353 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-fa57d5e9-a187-4856-8bf2-6933e63b3afe,DISK], DatanodeInfoWithStorage[192.168.2.135:50010,DS-f5f1e2a0-186b-4c70-844f-7e7ebe389f50,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-8cfb8ffb-77b6-40bb-930e-81c7198166ad,DISK]]
7. BP-778802867-192.168.2.147-1465886958278:blk_1080847776_7107357 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-95636645-c59e-4bca-8478-c15b3c16d514,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-060c75ff-5632-4f6f-a73b-fb2a68927c63,DISK], DatanodeInfoWithStorage[192.168.2.147:50010,DS-3e108776-d3bd-4b84-b68a-59e1ca755331,DISK]]
8. BP-778802867-192.168.2.147-1465886958278:blk_1080847780_7107361 len=97870 repl=3 [DatanodeInfoWithStorage[192.168.2.169:50010,DS-25d2b73a-2dc2-48c1-9aad-f0f5ca8d302a,DISK], DatanodeInfoWithStorage[192.168.2.15:50010,DS-de3cffa6-cef4-4f47-9bbf-5f44214b3a5a,DISK], DatanodeInfoWithStorage[192.168.2.177:50010,DS-847ec520-bc14-4ca4-af94-21140a3b20f6,DISK]]
Status: HEALTHY
Total size: 1073839694 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 9 (avg. block size 119315521 B)
Minimally replicated blocks: 9 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
FSCK ended at Mon Jul 11 23:14:27 PDT 2016 in 1 milliseconds
The filesystem under path '/user/hduser/file.txt' is HEALTHY
Look for the information after "repl", usually starting with "DatanodeInfoWithStorage" tag. It gives the required information about datanode locations.
My hdfs data got corrupted.
on doing fsck, i got the following result
.
/siva: CORRUPT block blk_-1910702044505537827
/siva: CORRUPT block blk_6483992593913191763
/siva: MISSING 2 blocks of total size 82009995 B.Status: CORRUPT
Total size: 82009995 B
Total dirs: 8
Total files: 1
Total blocks (validated): 2 (avg. block size 41004997 B)
CORRUPT FILES: 1
MISSING BLOCKS: 2
MISSING SIZE: 82009995 B
CORRUPT BLOCKS: 2
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 2
Missing replicas: 0
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Feb 23 12:21:03 IST 2016 in 2 milliseconds
The filesystem under path '/' is CORRUPT
Then i tried to remove the /siva folder but i got the following output
rmr: cannot remove /siva: No such file or directory.
please support
Use hdfs fsck / -delete to remove corrupted files.
Please run the below command on any headnode[ HN0 or HN1].
hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
In the report, we can see a filesystem as corrupt since blocks are got corrupted.
The filesystem under path '/' is CORRUPT
Run the below command to fix this issue.
hdfs fsck -D "fs.default.name=hdfs://mycluster/" / -delete
after, run the below command again to see a filesystem status.
hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
this time we should see the filesytem status as Healthy like below.
The filesystem under path '/' is HEALTHY
when I run fsck command it shows total blocks to be 68 (avg. block size 286572 B). How can I have only 68 blocks?
I recently installed CDH5 with version: Hadoop 2.6.0
-
[hdfs#cluster1 ~]$ hdfs fsck /
Connecting to namenode via http://cluster1.abc:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.101.241 for path / at Fri Sep 25 09:51:56 EDT 2015
....................................................................Status: HEALTHY
Total size: 19486905 B
Total dirs: 569
Total files: 68
Total symlinks: 0
Total blocks (validated): 68 (avg. block size 286572 B)
Minimally replicated blocks: 68 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.9411764
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Sep 25 09:51:56 EDT 2015 in 41 milliseconds
The filesystem under path '/' is HEALTHY
-
This is what I get when I run hdfsadmin -repot command:
[hdfs#cluster1 ~]$ hdfs dfsadmin -report
Configured Capacity: 5715220577895 (5.20 TB)
Present Capacity: 5439327449088 (4.95 TB)
DFS Remaining: 5439303270400 (4.95 TB)
DFS Used: 24178688 (23.06 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 504
-
Also, my hive query does not start MapReduce job, could it be above issue?
Any suggestion?
Thank you!
Blocks are chunks of data that is distributed in the nodes in the File System. So for example if you are having a file of 200MB, there would infact be 2 blocks of 128 and 72 mbs each.
So do not be worried about the blocks as that is taken care of by the Framework. As the fsck report shows, you have 68 files in HDFS and hence 68 blocks.