Monday, November 3, 2008

Can RAID 5 reads detect bad blocks?

[Wiki] Standard RAID levels
http://en.wikipedia.org/wiki/Standard_RAID_levels

The parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a CRC error. In this case, the sector in the same relative position within each of the remaining data blocks in the stripe and within the parity block in the stripe is used to reconstruct the errant sector. The CRC error is thus hidden from the main computer.
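To make the reconstruction concrete, here is a minimal Python sketch of RAID 5's XOR parity; the byte values are made up for illustration:

# Minimal sketch of RAID 5 XOR parity, assuming 3 data blocks plus 1 parity
# block per stripe; the byte values are made up for illustration.
d0, d1, d2 = 0b10110010, 0b01100111, 0b11001100
parity = d0 ^ d1 ^ d2        # computed when the stripe is written

# If the sector holding d1 later returns a CRC error, the array XORs the
# surviving data blocks with the parity block to rebuild it:
recovered = d0 ^ d2 ^ parity
assert recovered == d1       # the errant sector is reconstructed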

Failure modes:
  1. totally failed: the block can't be read at all
  2. partially failed: the block still reads, but a write-then-read check would fail


Checking parity on every read is very expensive: instead of one block read, it requires reading the whole stripe, i.e. N reads where N is the number of disks.

Can RAID 5 detect the 2nd type of failure if it does no parity check on read?

My colleague Shone says the hard drive itself should be able to detect the failure first, though some hardware RAID controllers do have an option to turn on "parity check on read".
Does the hard drive really have the ability to detect the 2nd type of failure?
(My guess) On a parity mismatch during a read, the array could (a rough sketch in code follows the list):
  1. run a write-then-read check on the corresponding block of each disk, to find out which block failed
  2. move the whole stripe to another location
  3. regenerate the failed block's data from the surviving blocks and the parity
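Here is that guess as a self-contained Python toy model; everything in it (the single-byte "blocks", the weak-sector behavior of disk 1, the helper names) is made up for illustration, not any real RAID implementation:

# Toy model of the guessed recovery steps. A "block" is one byte; disk 1
# has a weak sector that silently corrupts data (the 2nd type of failure).
def xor_all(blocks):
    p = 0
    for b in blocks:
        p ^= b
    return p

data = [0x5a, 0x3c, 0x77]            # one stripe: three data blocks
parity = xor_all(data)               # parity block for the stripe
stored = data[:]
stored[1] ^= 0x01                    # disk 1 corrupts its block

def write_then_read_ok(disk_id, value):
    # toy model: writes to disk 1 read back corrupted
    readback = value ^ 0x01 if disk_id == 1 else value
    return readback == value

# step 1: write-then-read check each disk's block to find the culprit
bad = next(i for i, v in enumerate(stored) if not write_then_read_ok(i, v))

# step 2: move the whole stripe to another location (not modeled here)

# step 3: regenerate the bad block from the surviving blocks and parity
survivors = [v for i, v in enumerate(stored) if i != bad]
stored[bad] = xor_all(survivors + [parity])
assert stored == data                # the stripe is consistent again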

http://linas.org/linux/raid.html
Bad Blocks on Disk Drive
The most common form of disk drive failure is a slow but steady loss of 'blocks' on the disk drive. Blocks can go bad in a number of ways: the thin film of magnetic media can separate or slide on its underlying disk platter; the film of magnetic media can span a pit or gouge in the underlying platter, and eventually, like a soap bubble, it can pop. Although disk drives have filters that prevent dust from entering, the filters will not keep out humidity, and slow corrosion can set in. Mechanical abrasion can occur in several ways: the disk head can smash into the disk; alternately, a piece of broken-off media can temporarily jam under the head, or can skitter across the disk platter. Disk head crashes can be caused by kicking or hitting the CPU cabinet; they can also be caused by vibration induced by cooling fans, construction work in the room, etc. There are many other mechanical causes leading to (permanent) bad blocks. In addition, there are also "soft" or corrupted blocks: in modern hard drives, the size of one bit is so small that ordinary thermal noise (Boltzmann noise) is sufficient to occasionally flip a bit. This occurs so frequently that it is normally handled by the disk firmware: modern disk drives store ECC bits to detect and correct such errors. The number of ECC-corrected errors on a disk can be monitored with smartmontools. Although on-disk ECC correction is sufficient to correct most soft errors, a tiny fraction will remain uncorrectable. Such soft errors damage the data, but do not render the block permanently (physically) unusable. Other soft errors are described in the next section below.

Over time, bad blocks can accumulate, and, from personal experience, do so as fast as one a day. Once a block is bad, data cannot be (reliably) read from it. Bad blocks are not uncommon: all brand new disk drives leave the factory with hundreds (if not thousands) of bad blocks on them. The hard drive electronics can detect a bad block, and automatically reassign in its place a new, good block from elsewhere on the disk. All subsequent accesses to that block by the operating system are automatically and transparently handled by the disk drive. This feature is both good, and bad. As blocks slowly fail on the drive, they are automatically handled until one day the bad-block lookup table on the hard drive is full. At this point, bad blocks become painfully visible to the operating system: Linux will grind to a near halt, while spewing dma_intr: status=0x51 { DriveReady SeekComplete UnrecoverableError } messages.
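As a side note, one way to watch that relocation table filling up is to poll the relevant SMART attributes. A minimal sketch using smartmontools (the attribute names match the dump further below; adjust the device path to your system):

# Minimal sketch: poll the SMART attributes that track bad-block relocation.
# Requires smartmontools installed and root privileges; /dev/sda is assumed.
import subprocess

out = subprocess.check_output(['smartctl', '-A', '/dev/sda'], text=True)
for line in out.splitlines():
    if any(name in line for name in ('Reallocated_Sector_Ct',
                                     'Current_Pending_Sector',
                                     'Offline_Uncorrectable')):
        print(line)    # a rising raw value means the drive is degrading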

Using RAID can mitigate the effect of bad blocks. A Linux md-based software RAID array can be forced to run a check/repair sequence by writing the appropriate command to /sys/block/mdX/md/sync_action (see RAID Administration commands, and also below, for details). During repairs, if a disk drive reports a read error, the RAID array will attempt to obtain a good copy of the data from another disk, and then write the good copy onto the failing disk. Assuming the disk has spare blocks for bad-block relocation, this should trigger the bad-block relocation mechanism of the disk. If the disk no longer has spare blocks, then syslog error messages should provide adequate warning that a hard drive needs to be replaced. In short, RAID can protect against bad blocks, provided that the disk drive firmware is correctly detecting and reporting bad blocks. For the case of general data corruption, discussed below, this need not be the case.
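For the md case, here is a minimal sketch of kicking off such a check from Python, assuming a software RAID device named md0 (sync_action and mismatch_cnt are the standard md sysfs files mentioned above):

# Minimal sketch: trigger an md consistency check and read the result.
# Assumes a Linux software RAID device named md0; must run as root.
MD = '/sys/block/md0/md/'

with open(MD + 'sync_action', 'w') as f:
    f.write('check\n')       # 'repair' would also rewrite inconsistent data

# ... wait until reading sync_action returns 'idle' again ...
with open(MD + 'mismatch_cnt') as f:
    print('mismatches found by last check:', f.read().strip())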


BTW, I don't see any field related to the number of ECC-corrected errors on this disk (on drives that report it, that is usually SMART attribute 195, Hardware_ECC_Recovered, which this WD drive apparently doesn't expose):
root@test-laptop# smartctl --all /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1600BEVS-22RST0
Serial Number:    WD-WXE108Y89770
Firmware Version: 04.01G04
User Capacity:    160,041,885,696 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Nov 3 19:02:24 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (6780) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  87) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   187   186   021    Pre-fail  Always       -       1641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       509
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1796
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       479
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       215
193 Load_Cycle_Count        0x0032   154   154   000    Old_age   Always       -       138390
194 Temperature_Celsius     0x0022   106   085   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   100   253   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%                0  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@test-laptop#
