RTFM

[Read This Fine Material] from Joshua Hoblitt

Replacing a failed mdadm mirror disk


The scenario is that we have a system with only two SATA disks set up as a RAID1/mirror set. Both disks are partitioned as follows:


   Device Boot      Start         End      Blocks   Id  System
/dev/sda1                1          13       97656   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2               13        2005    16000000   82  Linux swap / Solaris
/dev/sda3             2005        5988    32000000   fd  Linux raid autodetect
/dev/sda4             5988       60802   440288927+  fd  Linux raid autodetect

Where sda[134] are mirrored to partitions sdb[134] (named /dev/md[123]), while sda2 and sdb2 are used directly as swap space (there is no reason to mirror swap space, and using each partition separately doubles the amount of usable swap). Also worth noting is that since the system may need to boot from either of these disks (in the event of a disk failure), they both need to have identical MBRs. Now let's say that /dev/sda has failed (as actually happened to a system here) and that the bad disk has been pulled and replaced. We end up with a system that looks like this:


# cat /proc/partitions
major minor #blocks name

8 16 488386584 sdb
8 17 97656 sdb1
8 18 16000000 sdb2
8 19 32000000 sdb3
8 20 440288927 sdb4
9 2 31999936 md2
9 3 440288832 md3
9 1 97536 md1
8 32 488386584 sdc

What's happened here is that Linux has recognized that even though there is a device attached to the same SATA channel that /dev/sda was on, this device is different and should be given a different name. This is also a brand new disk that does not yet contain a partition table. In order to recover an mdadm-managed mirror set we need a partition of equal or greater size to add into the RAID set. We could partition /dev/sdc by hand (an sfdisk-based alternative is sketched below), but then we'd still have to deal with running grub and setting up the MBR. Fortunately there is an easier way:


# dd if=/dev/sdb of=/dev/sdc bs=512 count=1

Which copies both the MBR boot code and the partition table. Next we need the kernel to attempt to re-read the partition table for this disk.


# partprobe /dev/sdc
# cat /proc/partitions
major minor #blocks name

8 16 488386584 sdb
8 17 97656 sdb1
8 18 16000000 sdb2
8 19 32000000 sdb3
8 20 440288927 sdb4
9 2 31999936 md2
9 3 440288832 md3
9 1 97536 md1
8 32 488386584 sdc
8 33 97656 sdc1
8 34 16000000 sdc2
8 35 32000000 sdc3
8 36 440288927 sdc4
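
As an aside, if you would rather not copy the old boot code blindly, the partition table alone can be cloned with sfdisk and the boot loader reinstalled by hand afterwards. This is only a sketch; it assumes sfdisk is available and that the system boots with GRUB legacy:

# sfdisk -d /dev/sdb | sfdisk /dev/sdc
# grub-install /dev/sdc
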

At this point we're finally ready to start recovering the individual RAID sets. This is the current status of the RAID sets. Note the "(F)"s from where I had already tried to rebuild the RAID sets onto the faulty /dev/sda.


# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sda1[2](F) sdb1[1]
97536 blocks [2/1] [_U]

md3 : active raid1 sda4[2](F) sdb4[1]
440288832 blocks [2/1] [_U]

md2 : active raid1 sdb3[1]
31999936 blocks [2/1] [_U]

unused devices: <none>
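
Before adding the new partitions back in, the stale failed /dev/sda members can be cleaned out of the arrays. Reasonably recent versions of mdadm accept the keywords "failed" and "detached" in place of a device name for exactly this, so something along these lines should do it (repeat for md2 and md3):

# mdadm --manage /dev/md1 --remove failed
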

From here it's just a simple matter of adding the /dev/sdcX equivalents of the old /dev/sdaX partitions back into the arrays.


mops11 ~ # mdadm --manage /dev/md1 --add /dev/sdc1
mdadm: added /dev/sdc1
mops11 ~ # mdadm --manage /dev/md3 --add /dev/sdc4
mdadm: added /dev/sdc4
mops11 ~ # mdadm --manage /dev/md2 --add /dev/sdc3
mdadm: added /dev/sdc3
mops11 ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdc1[0] sda1[2](F) sdb1[1]
97536 blocks [2/2] [UU]

md3 : active raid1 sdc4[2] sda4[3](F) sdb4[1]
440288832 blocks [2/1] [_U]
[>....................] recovery = 0.0% (378368/440288832) finish=4708.3min speed=1557K/sec

md2 : active raid1 sdc3[2](F) sdb3[1]
31999936 blocks [2/1] [_U]

unused devices: <none>

Drat! It looks like /dev/sdc has died too. This probably means that either this SATA channel is bad or there is a cabling problem. Looking at the system’s dmesg we see:


sd 0:0:0:0: [sdc] Synchronizing SCSI cache
sd 0:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
sd 0:0:0:0: [sdc] Stopping disk
sd 0:0:0:0: [sdc] START_STOP FAILED
sd 0:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
...
scsi 0:0:0:0: Direct-Access ATA WDC WD5000YS-01M 09.0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sdd] Write Protect is off
sd 0:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sdd] Write Protect is off
sd 0:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd: sdd1 sdd2 sdd3 sdd4
sd 0:0:0:0: [sdd] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 0:0:0:0: rejecting I/O to dead device
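
As a sanity check before condemning the disk itself, it can be worth asking the drive for its SMART error log once the kernel re-attaches it, here as /dev/sdd. Assuming the smartmontools package is installed:

# smartctl -a /dev/sdd
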

Had this not happened, we would only have had to set up the swap partition again:


# mkswap /dev/sdc2

and reboot the system.
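
If you don't want to wait for the reboot, the new swap space can also be turned on immediately; swapon should take care of that:

# swapon /dev/sdc2
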

UPDATE: 17:36

In this case, it turned out to be dust in the SATA connectors (or at least, blowing out the ends of the cables seems to have fixed the issue). Here is what the system should look like with a properly rebuilding RAID set:


Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb1[0] sda1[1]
97536 blocks [2/2] [UU]

md3 : active raid1 sda4[2] sdb4[1]
440288832 blocks [2/1] [_U]
[=====>...............] recovery = 25.8% (113689216/440288832) finish=81.2min speed=67012K/sec

md2 : active raid1 sdb3[0] sda3[1]
31999936 blocks [2/2] [UU]

unused devices: <none>
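
If you want to keep an eye on the rebuild without re-running cat by hand, mdadm can also report the state of an array directly, and watch works just as well on /proc/mdstat:

# mdadm --detail /dev/md3
# watch cat /proc/mdstat
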
