
Permanently removing a disk from a mdadm raid1 (mirror) array


A “special snowflake” system that I’m responsible for was configured with two SAS disks in mdadm raid1 arrays. Multiple times over the course of a few years the system would essentially deadlock (still responding to ICMP echo requests) while printing I/O errors for [0:0:1:0] on the console. There would be no other response on the console, and this state would persist until the system was forcibly power cycled via a switched PDU. After a reboot and an fsck cycle the system would be back in an apparently normal state.

[Console screenshot: dsas3-console-screenshot-2014-03-30]

This system literally resides in the opposite hemisphere from where I normally am, which complicates troubleshooting hardware problems. [0:0:1:0] maps to /dev/sdb, and that SAS drive never displayed unusual error counts (while the system was in a working state, anyway). It always passed SMART “long” tests, so it wasn’t possible to determine whether it was a failing disk, a failing controller/port, or some sort of kernel issue. I decided to add a third disk, sdc (which happened to already be in the system), into the raid1 arrays as a precaution against total disk failure. The lockup issue recurred at least twice after the raid1 arrays were expanded to encompass 3 drives (warm fuzzies for software RAID…). After the most recent occurrence, sdb finally (!!!) failed a SMART test.
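For reference, expanding one of the two-disk mirrors into a three-way mirror looks roughly like this, repeated for each mdN/sdcN pair (this sketch assumes sdc already carries a partition layout matching sda/sdb):

mdadm --manage /dev/md0 --add /dev/sdc1
mdadm --grow /dev/md0 --raid-devices=3

The --add step attaches the new partition as a spare; bumping --raid-devices then promotes it to an active mirror member and kicks off a resync. Here is the smartctl output from the failing drive: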

# smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-348.6.1.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE 
Product:              ST3146356SS     
Revision:             0009
User Capacity:        146,815,737,856 bytes [146 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5002402db33
Serial number:        3QN44R1P00009103VVUL
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu Nov 20 16:45:44 2014 CLST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 6
Vendor (Seagate) cache information
  Blocks sent to initiator = 373711492
  Blocks received from initiator = 1186088917
  Blocks read from cache and sent to initiator = 7176963
  Number of read and write commands whose size <= segment size = 12639304
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 33608.63
  number of minutes until next internal SMART test = 6

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:     872583        0         0    872583     872583        191.340           0
write:         0        0         0         0          1        609.094           0
verify:     1773        0         0      1773       1773          0.000           0

Non-medium error count:        3

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       -   33515           1027100 [0x3 0x11 0x0]
# 2  Background long   Completed                   -   27975                 - [-   -    -]
# 3  Background long   Completed                   -   22361                 - [-   -    -]

Long (extended) Self Test duration: 1740 seconds [29.0 minutes]
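For completeness, the background long tests mentioned above are started and reviewed with smartctl as well; the test runs on the drive itself (about 29 minutes on this model, per the output above):

smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb

The second command is only useful once the test has had time to finish.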

Since there were already two good disks in the raid1 arrays, and invoking remote hands always carries some element of risk, I decided to remove the disk from the mdadm arrays and leave it installed but unused. I had never permanently reduced the device count of an mdadm or device-mapper array before, and it took some fiddling to figure out how to accomplish this.

This is what the mdadm arrays initially looked like (I’d edited out some unrelated arrays):

$ cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdc1[2] sdb1[1] sda1[0]
      513984 blocks [3/3] [UUU]

md5 : active raid1 sdc2[2] sdb2[1] sda2[0]
      36861056 blocks [3/3] [UUU]

md4 : active raid1 sdc5[2] sdb5[1] sda5[0]
      16386176 blocks [3/3] [UUU]

md1 : active raid1 sdc6[2] sdb6[1] sda6[0]
      12586816 blocks [3/3] [UUU]

md3 : active raid1 sdc7[2] sdb7[1] sda7[0]
      1020032 blocks [3/3] [UUU]

md6 : active raid1 sdc8[2] sdb8[1] sda8[0]
      45287104 blocks [3/3] [UUU]

md2 : active raid1 sdc3[2] sdb3[1] sda3[0]
      30716160 blocks [3/3] [UUU]

unused devices: <none>

As you would when replacing a drive, you need to fail and remove the appropriate block devices.

mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md5 --fail /dev/sdb2
mdadm --manage /dev/md4 --fail /dev/sdb5
mdadm --manage /dev/md1 --fail /dev/sdb6
mdadm --manage /dev/md3 --fail /dev/sdb7
mdadm --manage /dev/md6 --fail /dev/sdb8
mdadm --manage /dev/md2 --fail /dev/sdb3

mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md5 --remove /dev/sdb2
mdadm --manage /dev/md4 --remove /dev/sdb5
mdadm --manage /dev/md1 --remove /dev/sdb6
mdadm --manage /dev/md3 --remove /dev/sdb7
mdadm --manage /dev/md6 --remove /dev/sdb8
mdadm --manage /dev/md2 --remove /dev/sdb3

At this point I was in uncharted territory. I wanted to make sure that sdb would not be reincorporated into the arrays as part of auto assembly, so I nuked the superblocks.

mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb5
mdadm --zero-superblock /dev/sdb6
mdadm --zero-superblock /dev/sdb7
mdadm --zero-superblock /dev/sdb8
mdadm --zero-superblock /dev/sdb3

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc1[2]
      513984 blocks [3/2] [U_U]
      
md5 : active raid1 sda2[0] sdc2[2]
      36861056 blocks [3/2] [U_U]
      
md4 : active raid1 sda5[0] sdc5[2]
      16386176 blocks [3/2] [U_U]
      
md1 : active raid1 sda6[0] sdc6[2]
      12586816 blocks [3/2] [U_U]
      
md3 : active raid1 sda7[0] sdc7[2]
      1020032 blocks [3/2] [U_U]
      
md6 : active raid1 sda8[0] sdc8[2]
      45287104 blocks [3/2] [U_U]
      
md2 : active raid1 sda3[0] sdc3[2]
      30716160 blocks [3/2] [U_U]
      
unused devices: <none>

However, at this point each array still shows that one of its three member devices is missing ([3/2] [U_U]). The naming is a bit counter-intuitive, but mdadm’s --grow mode is also what you use to “shrink” an array down to fewer devices.

mdadm --grow /dev/md0 --raid-devices=2
mdadm --grow /dev/md5 --raid-devices=2
mdadm --grow /dev/md4 --raid-devices=2
mdadm --grow /dev/md1 --raid-devices=2
mdadm --grow /dev/md3 --raid-devices=2
mdadm --grow /dev/md6 --raid-devices=2
mdadm --grow /dev/md2 --raid-devices=2

And now we have a set of healthy mdadm raid1 arrays composed of partitions on only sda and sdc. Since there should be virtually no read or write activity to sdb except when it is being probed, hopefully the SCSI I/O deadlock problem will not recur before the disk can be physically removed (or, more likely, the entire chassis replaced).

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc1[1]
      513984 blocks [2/2] [UU]

md5 : active raid1 sda2[0] sdc2[1]
      36861056 blocks [2/2] [UU]

md4 : active raid1 sda5[0] sdc5[1]
      16386176 blocks [2/2] [UU]

md1 : active raid1 sda6[0] sdc6[1]
      12586816 blocks [2/2] [UU]

md3 : active raid1 sda7[0] sdc7[1]
      1020032 blocks [2/2] [UU]

md6 : active raid1 sda8[0] sdc8[1]
      45287104 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdc3[1]
      30716160 blocks [2/2] [UU]

unused devices: <none>
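A couple of optional sanity checks at this point: each array should report two raid devices in a clean state, and the zeroed partitions on sdb should no longer carry any md metadata.

mdadm --detail /dev/md0
mdadm --examine /dev/sdb1

The former should list "Raid Devices : 2" with only sda1 and sdc1 as members; the latter should complain that no md superblock is detected.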
