A “special snowflake” system that I’m responsible for was configured with two SAS disks in mdadm raid1 arrays. Multiple times over the course of a few years the system would essentially deadlock (still responding to ICMP echo requests) and print I/O errors on the console for SCSI device [0:0:1:0]. There would be no other response on the console, and this state would persist until the system was forcibly power cycled via a switched PDU. After a boot and an fsck cycle the system would be back in an apparently normal state.
This system literally resides in the opposite hemisphere from where I normally am, which complicates troubleshooting hardware problems.
[0:0:1:0] maps to /dev/sdb, and that SAS drive never displayed unusual error counts (while the system was in a working state, anyway). It always passed SMART “long” tests, so it wasn’t possible to determine whether it was a failing disk, a failing controller/port, or some sort of kernel issue. I decided to add a third disk, sdc (which happened to already be in the system), to the raid1 array sets as a precaution against total disk failure. The lockup issue reoccurred at least twice after the raid1 arrays were expanded to encompass three drives (warm fuzzies for software RAID…). After the most recent occurrence, sdb finally (!!!) failed a SMART test.
# smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-348.6.1.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE
Product:              ST3146356SS
Revision:             0009
User Capacity:        146,815,737,856 bytes [146 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5002402db33
Serial number:        3QN44R1P00009103VVUL
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu Nov 20 16:45:44 2014 CLST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 6
Vendor (Seagate) cache information
  Blocks sent to initiator = 373711492
  Blocks received from initiator = 1186088917
  Blocks read from cache and sent to initiator = 7176963
  Number of read and write commands whose size <= segment size = 12639304
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 33608.63
  number of minutes until next internal SMART test = 6

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/     errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      872583        0         0    872583     872583        191.340           0
write:          0        0         0         0          1        609.094           0
verify:      1773        0         0      1773       1773          0.000           0

Non-medium error count:        3

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       -    33515        1027100 [0x3 0x11 0x0]
# 2  Background long   Completed                   -    27975              - [-   -    -]
# 3  Background long   Completed                   -    22361              - [-   -    -]

Long (extended) Self Test duration: 1740 seconds [29.0 minutes]
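Rather than eyeballing that log on every visit, the failed entry can be caught with a quick grep over the self-test log text. A minimal sketch; the check_selftest helper is my own hypothetical convenience, and in real use it would be fed live output (e.g. smartctl -l selftest /dev/sdb | check_selftest) — here it reads a saved copy of the log lines so the example is self-contained:

```shell
#!/bin/sh
# check_selftest (hypothetical helper): scan SMART self-test log text
# on stdin; failed SCSI self-test entries contain the word "Failed".
check_selftest() {
    if grep -q 'Failed'; then
        echo 'SELF-TEST FAILURE'
        return 1
    fi
    echo 'self-tests OK'
}

# Saved self-test log lines from the failing drive above:
check_selftest <<'EOF' || true   # a non-zero status is the alerting cue
# 1  Background long   Failed in segment -->   -   33515   1027100 [0x3 0x11 0x0]
# 2  Background long   Completed               -   27975         - [-   -    -]
EOF
```

Returning non-zero on failure lets the same helper drive a cron job or monitoring check without any extra parsing.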
Since there were already two good disks in the raid1 arrays, and invoking remote hands always carries some element of risk, I decided to remove the disk from the mdadm arrays and leave it installed but unused. I had never permanently reduced the size of an mdadm or device-mapper array, and it took some fiddling to figure out how to accomplish this.
This is what the mdadm arrays initially looked like (I’d edited out some unrelated arrays):
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1 sdb1 sda1
      513984 blocks [3/3] [UUU]

md5 : active raid1 sdc2 sdb2 sda2
      36861056 blocks [3/3] [UUU]

md4 : active raid1 sdc5 sdb5 sda5
      16386176 blocks [3/3] [UUU]

md1 : active raid1 sdc6 sdb6 sda6
      12586816 blocks [3/3] [UUU]

md3 : active raid1 sdc7 sdb7 sda7
      1020032 blocks [3/3] [UUU]

md6 : active raid1 sdc8 sdb8 sda8
      45287104 blocks [3/3] [UUU]

md2 : active raid1 sdc3 sdb3 sda3
      30716160 blocks [3/3] [UUU]

unused devices: <none>
As you would when replacing a failed drive, the first step is to mark the appropriate block devices as failed and remove them from their arrays.
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md5 --fail /dev/sdb2
mdadm --manage /dev/md4 --fail /dev/sdb5
mdadm --manage /dev/md1 --fail /dev/sdb6
mdadm --manage /dev/md3 --fail /dev/sdb7
mdadm --manage /dev/md6 --fail /dev/sdb8
mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md5 --remove /dev/sdb2
mdadm --manage /dev/md4 --remove /dev/sdb5
mdadm --manage /dev/md1 --remove /dev/sdb6
mdadm --manage /dev/md3 --remove /dev/sdb7
mdadm --manage /dev/md6 --remove /dev/sdb8
mdadm --manage /dev/md2 --remove /dev/sdb3
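Seven arrays times two operations makes fourteen near-identical commands, so a loop can generate the same sequence. A sketch (my own convenience, not part of the original session) that echoes the commands for review rather than executing them, using this system's md-to-partition pairings:

```shell
#!/bin/sh
# md array -> sdb partition pairings on this system.
pairs='md0:sdb1 md5:sdb2 md4:sdb5 md1:sdb6 md3:sdb7 md6:sdb8 md2:sdb3'

# Emit the commands instead of executing them; review the output,
# then pipe it to sh (as root) to actually fail and remove the members.
for action in --fail --remove; do
    for pair in $pairs; do
        md=${pair%%:*}
        part=${pair##*:}
        echo "mdadm --manage /dev/$md $action /dev/$part"
    done
done
```

Echoing first and only piping to sh after a visual check is a cheap safeguard when the generated commands are destructive.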
At this point I was in uncharted territory. I wanted to make sure that sdb would not be reincorporated into the array set as part of auto-assembly, so I nuked the superblocks.
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb5
mdadm --zero-superblock /dev/sdb6
mdadm --zero-superblock /dev/sdb7
mdadm --zero-superblock /dev/sdb8
mdadm --zero-superblock /dev/sdb3
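If you want to confirm the superblocks really are gone, running mdadm --examine against each partition should now report that no md superblock is detected. A dry-run sketch that just prints the verification commands for review (same partition list as above):

```shell
#!/bin/sh
# Emit one verification command per former member partition.
# On a zeroed member, mdadm --examine reports no md superblock.
for part in sdb1 sdb2 sdb5 sdb6 sdb7 sdb8 sdb3; do
    echo "mdadm --examine /dev/$part"
done
```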
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1 sdc1
      513984 blocks [3/2] [U_U]

md5 : active raid1 sda2 sdc2
      36861056 blocks [3/2] [U_U]

md4 : active raid1 sda5 sdc5
      16386176 blocks [3/2] [U_U]

md1 : active raid1 sda6 sdc6
      12586816 blocks [3/2] [U_U]

md3 : active raid1 sda7 sdc7
      1020032 blocks [3/2] [U_U]

md6 : active raid1 sda8 sdc8
      45287104 blocks [3/2] [U_U]

md2 : active raid1 sda3 sdc3
      30716160 blocks [3/2] [U_U]

unused devices: <none>
However, at this point each array still shows that one of its three member slots is missing ([3/2] [U_U]). The naming is a bit counter-intuitive, but the --grow mode also allows you to “shrink” an array by reducing the device count.
mdadm --grow /dev/md0 --raid-devices=2
mdadm --grow /dev/md5 --raid-devices=2
mdadm --grow /dev/md4 --raid-devices=2
mdadm --grow /dev/md1 --raid-devices=2
mdadm --grow /dev/md3 --raid-devices=2
mdadm --grow /dev/md6 --raid-devices=2
mdadm --grow /dev/md2 --raid-devices=2
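The array list can also be pulled straight out of /proc/mdstat instead of typed by hand. A sketch of that approach — the emit_grow helper is my own illustration; it assumes the stock "mdN : active raid1 …" layout, and to stay self-contained it reads a snapshot of the degraded state rather than the live file:

```shell
#!/bin/sh
# emit_grow (hypothetical helper): read mdstat-formatted text on stdin
# and print a --grow shrink command for every raid1 array found.
emit_grow() {
    awk '/^md[0-9]+ : active raid1 / {
        printf "mdadm --grow /dev/%s --raid-devices=2\n", $1
    }'
}

# On the real system: cat /proc/mdstat | emit_grow
# A snapshot of two of the degraded arrays is used here:
emit_grow <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1 sdc1
      513984 blocks [3/2] [U_U]
md5 : active raid1 sda2 sdc2
      36861056 blocks [3/2] [U_U]
EOF
```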
And now we have a set of healthy mdadm raid1 arrays composed of partitions on only sda and sdc. Since there should be virtually no read or write activity to sdb except when it is being probed, hopefully the SCSI I/O deadlock problem will not reoccur until the disk can be physically removed (or, more likely, the entire chassis replaced).
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1 sdc1
      513984 blocks [2/2] [UU]

md5 : active raid1 sda2 sdc2
      36861056 blocks [2/2] [UU]

md4 : active raid1 sda5 sdc5
      16386176 blocks [2/2] [UU]

md1 : active raid1 sda6 sdc6
      12586816 blocks [2/2] [UU]

md3 : active raid1 sda7 sdc7
      1020032 blocks [2/2] [UU]

md6 : active raid1 sda8 sdc8
      45287104 blocks [2/2] [UU]

md2 : active raid1 sda3 sdc3
      30716160 blocks [2/2] [UU]

unused devices: <none>
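Going forward, a quick check over /proc/mdstat can flag any future degradation: an underscore in the member-status field (e.g. [U_U]) marks a missing or failed member. A sketch, with a hypothetical check_mdstat helper reading a snapshot instead of the live file:

```shell
#!/bin/sh
# check_mdstat (hypothetical helper): report DEGRADED if any status
# field like [UU]/[U_U] on stdin contains an underscore.
check_mdstat() {
    if grep -E '\[[U_]*_[U_]*\]' >/dev/null; then
        echo 'DEGRADED'
    else
        echo 'healthy'
    fi
}

# On the real system: check_mdstat < /proc/mdstat
check_mdstat <<'EOF'
md0 : active raid1 sda1 sdc1
      513984 blocks [2/2] [UU]
EOF
```

Because it only reads text, the same helper works unchanged in a cron job or a remote health check over ssh.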