How does HDFS detect and handle corrupted blocks?

Role of Block Scanner

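The Block Scanner identifies corrupt blocks stored on a DataNode. HDFS keeps a checksum alongside the data it stores: during a write operation, the DataNode verifies the checksum of the data it receives before storing it, which catches corruption introduced during transmission. The Block Scanner later re-verifies the stored blocks against those checksums to catch corruption that appears after the write, for example from a failing disk.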
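The checksum check described above can be illustrated with a minimal sketch. It is not HDFS code: the function names are hypothetical, the 512-byte chunk size is an assumption of the example, and `zlib.crc32` stands in for whichever checksum algorithm the cluster is configured to use.

```python
import zlib
from itertools import zip_longest

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch


def compute_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> list[int]:
    """Return one CRC32 value per fixed-size chunk of the data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: list[int],
                        chunk_size: int = BYTES_PER_CHECKSUM) -> list[int]:
    """Return the indexes of chunks whose checksum does not match.

    A missing or extra chunk also counts as a mismatch.
    """
    actual = compute_checksums(data, chunk_size)
    return [
        index
        for index, (got, want) in enumerate(zip_longest(actual, expected))
        if got != want
    ]


# Usage: record checksums at write time, then verify a damaged copy.
block = bytes(range(256)) * 6            # 1536 bytes, so 3 chunks
stored = compute_checksums(block)        # kept alongside the block
damaged = bytearray(block)
damaged[600] ^= 0xFF                     # flip bits inside the second chunk
print(find_corrupt_chunks(bytes(damaged), stored))  # [1]
```

Checksumming per chunk rather than per block means a mismatch points at the damaged region, and verification can run incrementally while data streams in.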

Handling Corrupted Blocks

The Block Scanner runs periodically on every DataNode and verifies that the stored data blocks are intact. When it detects a corrupted block, the following steps occur:

1. The DataNode reports the corrupted block to the NameNode.
2. The NameNode starts creating a new replica from a correct replica of the block held on another DataNode.
3. The corrupted replica is not deleted until the number of correct replicas matches the replication factor (3 by default).
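
The three steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the NameNode's actual logic: `BlockState`, `report_corrupt`, `re_replicate`, and `purge_corrupt` are hypothetical names, and DataNodes are represented by plain strings.

```python
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # default replication factor mentioned above


@dataclass
class BlockState:
    """Hypothetical NameNode-side view of one block's replicas."""
    healthy: set[str] = field(default_factory=set)  # nodes with good replicas
    corrupt: set[str] = field(default_factory=set)  # nodes with bad replicas


def report_corrupt(block: BlockState, datanode: str) -> None:
    """Step 1: a DataNode reports that its replica failed verification."""
    block.healthy.discard(datanode)
    block.corrupt.add(datanode)


def re_replicate(block: BlockState, spare_nodes: list[str]) -> None:
    """Step 2: copy from a healthy replica until the target count is met."""
    if not block.healthy:
        raise RuntimeError("no healthy replica left to copy from")
    for node in spare_nodes:
        if len(block.healthy) >= REPLICATION_FACTOR:
            break
        if node not in block.healthy and node not in block.corrupt:
            block.healthy.add(node)


def purge_corrupt(block: BlockState) -> set[str]:
    """Step 3: drop corrupt replicas only once enough healthy ones exist."""
    if len(block.healthy) < REPLICATION_FACTOR:
        return set()
    removed, block.corrupt = block.corrupt, set()
    return removed


# Usage: one of three replicas goes bad and is replaced.
block = BlockState(healthy={"dn1", "dn2", "dn3"})
report_corrupt(block, "dn2")
print(purge_corrupt(block))          # set(): only two healthy replicas so far
re_replicate(block, ["dn4", "dn5"])
print(purge_corrupt(block))          # {'dn2'}
```

Keeping the corrupt replica until the healthy count is restored mirrors step 3: the damaged copy is only discarded once it is no longer needed as a last resort.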

Together, these steps let HDFS preserve the integrity of the data that clients read.

Reference

https://www.edureka.co/community/12658/block-scanner-hdfs
