Testing condition of HDD, SSD, CD, DVD, etc. media.
This tool allows for checking if the storage device is healthy (does not have any badblocks or latent bad sectors). Allowing for non-destructive and online assessment of HDD or SSD condition.
To compile the program, download the latest package from release page or git repo and execute:
make
If the compilation succeeds, you'll get one executable file: hdck
. You can
either copy it to /sbin
or just run it from local directory. The rest of this
manual will assume the former option.
To test the the only disk in the PC, run the program as this:
hdck -f /dev/sda
While the program is running (it can take from half an hour, to 12h or more, depending on HDD's size and speed) it will print progress information in form similar to the one below:
hdck status:
============
Loop: 1 of 1
Progress: 1.83%, 1.83% total
Read: 768000 sectors of 21474836480
Speed: 185.433MiB/s, average: 172.749MiB/s
Elapsed time: 00:00:02
Expected time: 00:01:49
Samples: Blocks (9th decile):
< 2.4ms: 2961 2961
< 5.8ms: 7 7
<14.1ms: 19 19
<22.5ms: 7 7
<39.1ms: 3 3
<55.8ms: 2 2
>55.8ms: 1 1
ERR : 0 0
Intrrpt: 0 0
(meaning of particular fields is explained below)
At the end it will print summary similar to this:
hdck results:
=============
possible latent bad sectors or silent realocations:
block 0 (LBA: 0-255) rel std dev: -nan, avg: 2.50, valid: yes, samples: 1, errors: 0, 9th decile: 2.50
block 1 (LBA: 256-511) rel std dev: -nan, avg: 2.55, valid: yes, samples: 1, errors: 0, 9th decile: 2.55
block 2616 (LBA: 669696-669951) rel std dev: 0.00, avg: 37.37, valid: yes, samples: 15, errors: 0, 9th decile: 37.42
block 2643 (LBA: 676608-676863) rel std dev: 0.00, avg: 35.90, valid: yes, samples: 15, errors: 0, 9th decile: 35.93
block 2657 (LBA: 680192-680447) rel std dev: 0.00, avg: 35.89, valid: yes, samples: 15, errors: 0, 9th decile: 35.94
block 15177 (LBA: 3885312-3885567) rel std dev: 0.90, avg: 11.39, valid: yes, samples: 20, errors: 0, 9th decile: 22.66
block 33574 (LBA: 8594944-8595199) rel std dev: 0.59, avg: 14.31, valid: yes, samples: 15, errors: 0, 9th decile: 29.31
block 33862 (LBA: 8668672-8668927) rel std dev: 0.87, avg: 7.92, valid: yes, samples: 14, errors: 0, 9th decile: 19.20
block 33981 (LBA: 8699136-8699391) rel std dev: 0.01, avg: 6.43, valid: yes, samples: 14, errors: 0, 9th decile: 15.74
block 34139 (LBA: 8739584-8739839) rel std dev: 0.81, avg: 20.48, valid: yes, samples: 20, errors: 0, 9th decile: 45.94
block 34166 (LBA: 8746496-8746751) rel std dev: 0.82, avg: 7.81, valid: yes, samples: 19, errors: 0, 9th decile: 20.96
block 34192 (LBA: 8753152-8753407) rel std dev: 0.85, avg: 11.33, valid: yes, samples: 19, errors: 0, 9th decile: 22.67
block 34368 (LBA: 8798208-8798463) rel std dev: 0.67, avg: 22.02, valid: yes, samples: 24, errors: 0, 9th decile: 55.14
block 34814 (LBA: 8912384-8912639) rel std dev: 0.79, avg: 10.29, valid: yes, samples: 24, errors: 0, 9th decile: 26.55
block 34837 (LBA: 8918272-8918527) rel std dev: 0.67, avg: 18.53, valid: yes, samples: 24, errors: 0, 9th decile: 37.66
block 34891 (LBA: 8932096-8932351) rel std dev: 0.50, avg: 42.55, valid: yes, samples: 30, errors: 0, 9th decile: 62.59
block 37690 (LBA: 9648640-9648895) rel std dev: 0.89, avg: 11.96, valid: yes, samples: 20, errors: 0, 9th decile: 22.42
block 37987 (LBA: 9724672-9724927) rel std dev: 0.53, avg: 23.39, valid: yes, samples: 20, errors: 0, 9th decile: 45.02
block 39149 (LBA: 10022144-10022399) rel std dev: 0.25, avg: 30.48, valid: yes, samples: 20, errors: 0, 9th decile: 45.06
block 41191 (LBA: 10544896-10545151) rel std dev: 0.30, avg: 143.75, valid: yes, samples: 30, errors: 0, 9th decile: 229.82
block 43859 (LBA: 11227904-11228159) rel std dev: 0.55, avg: 27.37, valid: yes, samples: 30, errors: 0, 9th decile: 63.21
block 45148 (LBA: 11557888-11558143) rel std dev: 0.26, avg: 100.32, valid: yes, samples: 30, errors: 0, 9th decile: 145.01
block 47789 (LBA: 12233984-12234239) rel std dev: 0.67, avg: 22.79, valid: yes, samples: 20, errors: 0, 9th decile: 40.72
block 74821 (LBA: 19154176-19154431) rel std dev: 0.24, avg: 71.71, valid: yes, samples: 30, errors: 0, 9th decile: 120.03
block 80023 (LBA: 20485888-20486143) rel std dev: 0.72, avg: 33.94, valid: yes, samples: 30, errors: 0, 9th decile: 85.84
block 81849 (LBA: 20953344-20953599) rel std dev: 0.53, avg: 15.89, valid: yes, samples: 15, errors: 0, 9th decile: 32.56
block 90502 (LBA: 23168512-23168767) rel std dev: 0.48, avg: 48.78, valid: yes, samples: 30, errors: 0, 9th decile: 89.06
block 91273 (LBA: 23365888-23366143) rel std dev: 0.86, avg: 10.55, valid: yes, samples: 20, errors: 0, 9th decile: 20.99
block 91507 (LBA: 23425792-23426047) rel std dev: 0.43, avg: 28.06, valid: yes, samples: 20, errors: 0, 9th decile: 46.87
block 96648 (LBA: 24741888-24742143) rel std dev: 0.53, avg: 22.24, valid: yes, samples: 20, errors: 0, 9th decile: 38.52
block 98009 (LBA: 25090304-25090559) rel std dev: 0.38, avg: 68.93, valid: yes, samples: 30, errors: 0, 9th decile: 112.58
block 98190 (LBA: 25136640-25136895) rel std dev: 0.32, avg: 63.20, valid: yes, samples: 30, errors: 0, 9th decile: 105.19
block 104310 (LBA: 26703360-26703615) rel std dev: 0.42, avg: 47.65, valid: yes, samples: 30, errors: 0, 9th decile: 89.85
block 191916 (LBA: 49130496-49130751) rel std dev: 0.40, avg: 19.45, valid: yes, samples: 20, errors: 0, 9th decile: 37.76
block 273929 (LBA: 70125824-70126079) rel std dev: 0.45, avg: 35.09, valid: yes, samples: 30, errors: 0, 9th decile: 65.39
block 301249 (LBA: 77119744-77119999) rel std dev: 0.84, avg: 24.71, valid: yes, samples: 21, errors: 0, 9th decile: 54.10
block 301250 (LBA: 77120000-77120255) rel std dev: 0.40, avg: 10.40, valid: yes, samples: 20, errors: 0, 9th decile: 14.17
block 301272 (LBA: 77125632-77125887) rel std dev: 0.91, avg: 17.24, valid: yes, samples: 20, errors: 0, 9th decile: 39.82
block 304234 (LBA: 77883904-77884159) rel std dev: 0.00, avg: 37.69, valid: yes, samples: 15, errors: 0, 9th decile: 37.71
39 uncertain blocks found
wall time: 3481s.825ms.341µs.155ns
sum time: 3435s.169ms.124µs
tested 305333 blocks (0 errors, 964536 samples)
mean block time: 0s.3ms.555µs
std dev: 1.058807835(ms)
Number of invalid blocks because of detected interrupted reads: 0
Number of interrupted reads: 29
Individual block statistics:
<2.44ms: 0
<5.79ms: 299889
<14.12ms: 5383
<22.46ms: 30
<39.12ms: 12
<55.79ms: 8
>55.79ms: 11
ERR: 0
Worst blocks:
block no st.dev avg 1stQ med 3rdQ valid samples 9th decile
43859 20.5902 27.37 12.37 20.66 35.28 yes 30 63.21
273929 22.9114 35.09 13.70 38.67 47.01 yes 30 65.39
80023 30.7677 33.94 10.89 23.40 50.46 yes 30 85.84
90502 30.5331 48.78 20.73 41.54 70.73 yes 30 89.06
104310 33.9484 47.65 29.02 37.38 54.06 yes 30 89.85
98190 37.7779 63.20 39.70 58.49 85.47 yes 30 105.19
98009 37.8684 68.93 35.89 77.53 94.24 yes 30 112.58
74821 35.9626 71.71 60.81 60.89 90.06 yes 30 120.03
45148 35.2132 100.32 77.49 85.93 127.55 yes 30 145.01
41191 61.4519 143.75 97.81 133.18 179.01 yes 30 229.82
Disk status: CRITICAL
CAUTION! Sectors that required more than 6 read attempts detected, drive may be ALREADY FAILING!
As can be read from the status message, this drive is in critical condition.
Disk status can range from:
- excellent - self explanatory
- very good - When all blocks were read below the rotational latency threshold
- good - When there are very few blocks that read constant re-reads (because of bad block reallocations or ECC failures)
- moderate - When there are many blocks that need constant re-reads
- bad - When blocks that required more than 2 re-reads detected
- very bad - when blocks that required more than 4 re-reads were detected
- critical - when blocks that required more than 6 read attempts were detected
- failed - when disk returned read failures
Basically any disk at very bad or worse requires administrator action (more on this later), lower levels can be considered as OK.
Note: the very fact that HDDs are physical systems, makes this scan not fully deterministic, but if there were multiple sectors that required multiple re-reads from disk (time larger than 16.66ms for 7200 rpm disk) during any one scan it means that the disk may be beginning to fail. See more in the THEORY_OF_OPERATION.md document.
When the scan is running, the status look similar to this:
hdck status:
============
Loop: 1 of 1
Progress: 2.87%, 2.87% total
Read: 56064000 sectors of 1000204795904
Speed: 121.408MiB/s, average: 94.588MiB/s
Elapsed time: 00:04:49
Expected time: 02:47:50
Samples: Blocks (9th decile):
< 2.1ms: 190880 190788
< 4.2ms: 26886 26877
< 8.3ms: 13 13
<16.7ms: 661 642
<33.3ms: 130 113
<50.0ms: 201 28
>50.0ms: 26 175
ERR : 0 0
Intrrpt: 203 364
Loop
- the current and programmed total number of whole-disk reads that the application will performProgress
- what percentage of the current read has been performed and how much is that in light of the total number of expected readsRead
- current sector number (LBA) being read and the total number of sectors on the deviceSpeed
- current and average speed of reading the deviceElapsed time
- how long has thehdck
been running, in hours, minutes and secondsExpected time
- what is the expected total runtime of the application, in hours, minutes and secondsSamples
- total number of blocks with a given propertyBlocks
- total number of blocks for which the calculated (or estimated) ninth decile places them in a given category2.1ms
,4.2ms
, etc. - the amount of time it at most took the hard drive to read the sector (the exact numbers depend on rotational speed of the hard drive, the numbers are calculated for groupings of normal reads, reads with cylinder change, 1 re-read, 2 re-reads, 4 re-reads, 6 re-reads and more than 6 re-reads)ERR
- sectors for which an IO error was returned (the read failed completely)Intrrpt
- reads which were interrupted - whenhdck
detected that other process was accessing the drive, they will be ignored and sectors will be re-read
The most important part of the scan is the list of the worst blocks:
Worst blocks:
block no st.dev avg 1stQ med 3rdQ valid samples 9th decile
7390 6.6420 4.84 1.01 1.01 6.76 yes 3 10.21
6821 5.2559 6.81 4.80 8.75 9.78 yes 3 10.40
19616 6.9332 4.99 0.99 0.99 7.00 yes 3 10.60
18108 6.9555 4.98 0.96 0.98 7.00 yes 3 10.60
25216 5.9841 6.12 2.67 2.67 7.85 yes 3 10.96
29680 5.2407 5.08 0.97 0.97 9.56 yes 12 12.66
1912 5.3093 4.41 0.98 1.02 7.49 yes 15 12.92
30218 8.2052 6.03 2.11 2.64 6.56 yes 4 13.58
2471 6.0064 7.13 2.70 2.71 14.69 yes 11 14.70
24895 11.3644 7.34 0.78 0.98 10.72 yes 3 16.57
(note, the list is sorted in reverse order - worst last - to ensure that the most important info is shown even on 80x25 rescue terminals)
Here, the explanation for columns is as follows:
block no
- block number (multiply by 256 to get LBA of first sector)st.dev
- standard deviation of all the samples for the blockavg
- average (arithmetic mean) of oll the samples1stQ
- first quartile - no more than 25% of the block reads took this much time (in ms)med
- median - no more than 50% of the block reads took this much time (in ms)3rdQ
- third quartile - no more than 75% of reads took this much time (in ms)valid
- is the statistic valid; for sectors near bad blocks (IO errors) the collection may be not possiblesamples
- number of times the block was read9th decile
- no more than 90% of reads took this much time to complete (in ms)
As with all IT systems, this program is no exception to the "garbage in, garbage out" rule. To get meaningful test results, the test environment must be relatively calm.
Things to avoid:
- running downloads (or p2p systems like torrents)
- applications that intensively use the hdd, use
iotop
oratop
to find them - applications that frequently write to hdd, The usual culprits on desktop
computers are
akonadi
,nepomuk
andnscd
. All of them can be turned off. - running tests on disk connected to computer using non native interface,
especially USB or Firewire enclosures, some IDE-SATA bridges can degrade the
meaningfulness of data. Use
--ata-verify
to workaround this. - Running in PIO mode or degraded DMA, all IDE disks produced since 2005 years
should work with UDMA 100 or UDMA 133 (check your dmsg). Similarly SATA 3.0
disk should not be run on SATA 1.0 interface. Basically, don't run fast disk
on limiting interface. If the disk is able to read at 60MiB/s (that's normal
for 120G 7200rpm HDDs) then UDMA-66 or any PIO mode will be too slow, so will
be USB-2.0 or FireWire-400, older SCSI such as Ultra2, Ultra or Ultra Wide
SCSI will limit the drive too. Use
--ata-verify
to workaround this. - Testing SSDs is supported but possibly meaningless, the only thing this application will detect correctly are hard read errors, and if SSD is in such state it's long gone. (people willing to collect statistics of well-used SSDs welcome - see #2)
If the Interrpt
row grows rapidly, it means that either something is wrong,
usually some background process keeps writing and flushing data, use iotop to
confirm. Sometimes using noatime helps:
mount -o remount,noatime /
TODO
- Dmitry Postrigan for MHDD, the main source of inspiration for
hdck