A “special snowflake” system that I’m responsible for was configured with two SAS disks in mdadm raid1 arrays. Multiple times over the course of a few years the system would essentially deadlock (while still responding to ICMP echo requests) and print I/O errors on the console referencing [0:0:1:0]. There would be no other response on the console, and this state would persist until the machine was forcibly power cycled via a switched PDU. After a reboot and an fsck cycle the system would be back in an apparently normal state.
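As an aside, a SCSI address like [0:0:1:0] (host:channel:target:lun) can be mapped to a block device from sysfs, or with lsscsi if it happens to be installed; a quick sketch, assuming the device naming on this system:

ls -l /sys/block/sdb/device   # the symlink target ends in the SCSI address, e.g. .../0:0:1:0
lsscsi                        # if available, prints the mapping directly, e.g. "[0:0:1:0] disk ... /dev/sdb"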
This system literally resides in the opposite hemisphere from where I normally am, which complicates troubleshooting hardware problems. [0:0:1:0] maps to /dev/sdb, and that SAS drive never displayed unusual error counts (while the system was in a working state, anyway). It always passed SMART “long” tests, so it wasn’t possible to determine whether this was a failing disk, a failing controller/port, or some sort of kernel issue. I decided to add a 3rd disk, sdc (which happened to already be in the system), to the raid1 array sets as a precaution against total disk failure. The lockup issue reoccurred at least twice after the raid1 arrays were expanded to encompass 3 drives (warm fuzzies for software RAID…). After the most recent occurrence, sdb finally (!!!) failed a SMART test.
# smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-348.6.1.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE
Product:              ST3146356SS
Revision:             0009
User Capacity:        146,815,737,856 bytes [146 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5002402db33
Serial number:        3QN44R1P00009103VVUL
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu Nov 20 16:45:44 2014 CLST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 6

Vendor (Seagate) cache information
  Blocks sent to initiator = 373711492
  Blocks received from initiator = 1186088917
  Blocks read from cache and sent to initiator = 7176963
  Number of read and write commands whose size <= segment size = 12639304
  Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 33608.63
  number of minutes until next internal SMART test = 6

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/      errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:     872583        0         0    872583     872583        191.340           0
write:         0        0         0         0          1        609.094           0
verify:     1773        0         0      1773       1773          0.000           0

Non-medium error count:        3

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       -     33515          1027100 [0x3 0x11 0x0]
# 2  Background long   Completed                   -     27975                - [-   -    -]
# 3  Background long   Completed                   -     22361                - [-   -    -]

Long (extended) Self Test duration: 1740 seconds [29.0 minutes]
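For the record, this kind of background long self-test can be started and checked by hand; per the output above it takes roughly 29 minutes on this drive:

smartctl -t long /dev/sdb      # start a background "long" (extended) self-test
smartctl -l selftest /dev/sdb  # review the self-test log once it has finished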
Since there were already two good disks in the raid1 arrays, and invoking remote hands always carries some element of risk, I decided to remove the disk from the mdadm arrays and leave it installed but unused. I had never permanently reduced the number of member devices in an mdadm or device-mapper array, and it took some fiddling to figure out how to accomplish this.
This is what the mdadm arrays initially looked like (I’d edited out some unrelated arrays):
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[2] sdb1[1] sda1[0]
      513984 blocks [3/3] [UUU]

md5 : active raid1 sdc2[2] sdb2[1] sda2[0]
      36861056 blocks [3/3] [UUU]

md4 : active raid1 sdc5[2] sdb5[1] sda5[0]
      16386176 blocks [3/3] [UUU]

md1 : active raid1 sdc6[2] sdb6[1] sda6[0]
      12586816 blocks [3/3] [UUU]

md3 : active raid1 sdc7[2] sdb7[1] sda7[0]
      1020032 blocks [3/3] [UUU]

md6 : active raid1 sdc8[2] sdb8[1] sda8[0]
      45287104 blocks [3/3] [UUU]

md2 : active raid1 sdc3[2] sdb3[1] sda3[0]
      30716160 blocks [3/3] [UUU]

unused devices: <none>
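If /proc/mdstat feels too terse, mdadm --detail on any of the arrays shows the same membership plus per-device state; for example:

mdadm --detail /dev/md0   # reports array state and the role/state of sda1, sdb1 and sdc1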
As you would when replacing a drive, you need to fail and remove the appropriate block devices.
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md5 --fail /dev/sdb2
mdadm --manage /dev/md4 --fail /dev/sdb5
mdadm --manage /dev/md1 --fail /dev/sdb6
mdadm --manage /dev/md3 --fail /dev/sdb7
mdadm --manage /dev/md6 --fail /dev/sdb8
mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md5 --remove /dev/sdb2
mdadm --manage /dev/md4 --remove /dev/sdb5
mdadm --manage /dev/md1 --remove /dev/sdb6
mdadm --manage /dev/md3 --remove /dev/sdb7
mdadm --manage /dev/md6 --remove /dev/sdb8
mdadm --manage /dev/md2 --remove /dev/sdb3
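The same thing can be collapsed into a loop, since mdadm accepts --fail and --remove for the same member in a single invocation; a minimal sketch assuming the md/partition pairings above:

# fail and remove each sdb member in one pass (each pair is md-device:partition)
for pair in md0:sdb1 md5:sdb2 md4:sdb5 md1:sdb6 md3:sdb7 md6:sdb8 md2:sdb3; do
    md=${pair%%:*}
    part=${pair##*:}
    mdadm --manage /dev/$md --fail /dev/$part --remove /dev/$part
done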
At this point I was in uncharted territory. I wanted to make sure that sdb would not be reincorporated into the array set as part of auto-assembly, so I nuked the superblocks.
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb5
mdadm --zero-superblock /dev/sdb6
mdadm --zero-superblock /dev/sdb7
mdadm --zero-superblock /dev/sdb8
mdadm --zero-superblock /dev/sdb3
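To confirm the superblocks are really gone, --examine on each partition should now report that no md superblock was detected (the exact wording varies by mdadm version):

mdadm --examine /dev/sdb1   # expect something like "No md superblock detected on /dev/sdb1"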
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc1[2]
      513984 blocks [3/2] [U_U]

md5 : active raid1 sda2[0] sdc2[2]
      36861056 blocks [3/2] [U_U]

md4 : active raid1 sda5[0] sdc5[2]
      16386176 blocks [3/2] [U_U]

md1 : active raid1 sda6[0] sdc6[2]
      12586816 blocks [3/2] [U_U]

md3 : active raid1 sda7[0] sdc7[2]
      1020032 blocks [3/2] [U_U]

md6 : active raid1 sda8[0] sdc8[2]
      45287104 blocks [3/2] [U_U]

md2 : active raid1 sda3[0] sdc3[2]
      30716160 blocks [3/2] [U_U]

unused devices: <none>
However, at this point each array still reports that one of its 3 member devices is missing ([3/2] [U_U]). The naming is a bit counter-intuitive, but the --grow mode is also what lets you “shrink” an array, in this case by reducing the number of member devices.
mdadm --grow /dev/md0 --raid-devices=2
mdadm --grow /dev/md5 --raid-devices=2
mdadm --grow /dev/md4 --raid-devices=2
mdadm --grow /dev/md1 --raid-devices=2
mdadm --grow /dev/md3 --raid-devices=2
mdadm --grow /dev/md6 --raid-devices=2
mdadm --grow /dev/md2 --raid-devices=2
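After shrinking, mdadm --detail should report 2 raid devices and a clean state rather than a degraded one; a quick spot check:

mdadm --detail /dev/md0 | grep -E 'Raid Devices|Total Devices|State :'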
And now we have a set of healthy mdadm raid1 arrays composed of partitions on only sda and sdc. Since there should be virtually no read or write activity to sdb except when it is probed, hopefully the SCSI I/O deadlock problem will not reoccur until the disk can be physically removed (or, more likely, the entire chassis replaced).
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc1[1]
      513984 blocks [2/2] [UU]

md5 : active raid1 sda2[0] sdc2[1]
      36861056 blocks [2/2] [UU]

md4 : active raid1 sda5[0] sdc5[1]
      16386176 blocks [2/2] [UU]

md1 : active raid1 sda6[0] sdc6[1]
      12586816 blocks [2/2] [UU]

md3 : active raid1 sda7[0] sdc7[1]
      1020032 blocks [2/2] [UU]

md6 : active raid1 sda8[0] sdc8[1]
      45287104 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdc3[1]
      30716160 blocks [2/2] [UU]

unused devices: <none>
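One last thing worth checking (an assumption on my part about how a given box is configured): if /etc/mdadm.conf contains ARRAY lines that pin num-devices=3 or list the old members, it should be brought in line with the new 2-device layout. The currently assembled state can be dumped for comparison with:

mdadm --detail --scan   # prints ARRAY lines reflecting the current 2-device arrays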