RTFM

[Read This Fine Material] from Joshua Hoblitt

How to fsck a GPFS filesystem after a disk fault


Recently, we experienced a GPFS fault caused by some sort of LSI9285-8e glitch that happened during a regular patrol read. We have not been able to reproduce the fault. This is what the syslog messages from GPFS looked like:

Sep 2 20:17:31 foo01 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=2123232: Unrecoverable file system operation error. Status code 19. Volume foodata1
Sep 2 20:21:39 foo01 mmfs: Error=MMFS_DISKFAIL, ID=0x9C6C05FA, Tag=2123233: Disk failure. Volume decdata1. rc = 19. Physical volume nsd1
Sep 2 20:21:39 foo01 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=2123234: Unrecoverable file system operation error. Status code 19. Volume foodata1
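
If you need to dig these events out of syslog again later, pulling out the Error= tags gives a quick tally of what GPFS reported. This is a minimal sketch: mmfs_events is a hypothetical helper name, and the default path /var/log/messages is an assumption (typical of RHEL-style syslog setups).

```shell
#!/bin/sh
# Hypothetical helper: count GPFS error events (Error=MMFS_...) in a syslog file.
# Assumes one event per syslog line; /var/log/messages is an assumed default path.
mmfs_events() {
  log=${1:-/var/log/messages}
  grep -o 'Error=MMFS_[A-Z_]*' "$log" |
    sed 's/^Error=//' |
    sort | uniq -c
}
```

Running mmfs_events /var/log/messages against the three messages above should report one MMFS_DISKFAIL and two MMFS_SYSTEM_UNMOUNT events.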

As you can see, nsd1 is not available:

[root@foonsd1 ~]# mmlsdisk foodata1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512       1 Yes      Yes   ready         down         system
nsd2         nsd         512       2 Yes      Yes   ready         up           system
nsd3         nsd         512       1 Yes      Yes   ready         up           system
nsd4         nsd         512       2 Yes      Yes   ready         up           system
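
On a filesystem with more NSDs, it helps to filter the mmlsdisk table down to the disks that are not up. A rough sketch, assuming the layout above (three header lines, availability in the second-to-last column, pool name last); down_disks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `mmlsdisk <fs>` output on stdin and print the names
# of disks whose availability column is anything other than "up".
# Assumes three header lines and that availability is the second-to-last field.
down_disks() {
  awk 'NR > 3 && NF >= 9 && $(NF-1) != "up" { print $1 }'
}
```

For the output above, mmlsdisk foodata1 | down_disks should print nsd1. Since mmchdisk's -d option takes a semicolon-separated list of disk names, the result could be joined with paste -sd ';' - if several disks were down at once.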

mmchdisk needs to be run to re-enable the downed disk. This operation is functionally similar to mounting a non-distributed filesystem that was not unmounted cleanly.

[root@foonsd1 log]# mmchdisk foodata1 start -d nsd1
Scanning file system metadata, phase 1 ...
  81 % complete on Tue Sep  4 10:17:53 2012
 100 % complete on Tue Sep  4 10:17:54 2012
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
 100.00 % complete on Tue Sep  4 10:18:03 2012
Scan completed successfully.
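
If you script this step, you probably want to confirm that every scan phase finished before moving on to mmfsck. A small sketch, assuming the exact "Scanning ..." and "Scan completed successfully." wording shown above; scans_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read mmchdisk output on stdin and succeed only if the
# number of "Scanning ..." lines equals the number of successful completions.
# Matching on these exact strings is an assumption based on the output above.
scans_ok() {
  awk '
    /^Scanning /                     { started++ }
    /^Scan completed successfully\./ { finished++ }
    END { exit !(started > 0 && started == finished) }
  '
}
```

Usage might look like mmchdisk foodata1 start -d nsd1 2>&1 | tee mmchdisk.log | scans_ok && echo "all scans completed"; the pipeline's exit status is that of scans_ok.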

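Now we want to check the filesystem with mmfsck. The -v flag makes the output verbose, and -o runs the check in online mode, which recovers lost blocks (blocks marked as allocated that do not belong to any file) but does not perform a full consistency check. The -t argument is a path for temporary working files. Obviously, this can’t be on the filesystem you’re fscking.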
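
A cheap guard against pointing -t at the wrong place is a path check before launching mmfsck. This is a sketch only: check_fsck_tmpdir is a hypothetical helper, /gpfs/foodata1 stands in for wherever your filesystem is actually mounted, and a plain string comparison will not catch a scratch path that reaches the filesystem through a symlink.

```shell
#!/bin/sh
# Hypothetical helper: fail if the mmfsck scratch directory ($1) is at or below
# the GPFS mount point ($2). Pure string check; symlinks are not resolved.
check_fsck_tmpdir() {
  tmpdir=${1%/}
  mnt=${2%/}
  case "$tmpdir/" in
    "$mnt"/*)
      echo "refusing: $1 is on the filesystem mounted at $2" >&2
      return 1
      ;;
  esac
  return 0
}
```

For example: check_fsck_tmpdir /home/gpfs/ /gpfs/foodata1 && mmfsck foodata1 -v -o -t /home/gpfs/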

[root@foonsd1 log]# mmfsck foodata1 -v -o -t /home/gpfs/
Checking "foodata1"
  fsckFlags                    0x18
  needNewLogs                  0
  nThreads                     8
  clientTerm                   0
  fsckReady                    1
  fsckCreated                  0
  % pool allowed               50
  tuner                        off
  threshold                      0.20
  Disks                        4
  Bytes per subblock           131072 131072
  Sectors per subblock         256 1654712940
  Sectors per indirect block   64
  Subblocks per block          32
  Subblocks per indirect block 1
  Inodes                       7372800
  Inode size                   512
  singleINum                   -1
  Inode regions                131
  maxInodesPerSegment          522240
  Segments per inode region    1
  Bytes per inode segment      4194304
  nInode0Files                 1
  Memory available per pass    4214505436
  Regions per pass of pool system 1124
  fsckStatus                   2
  lrOwned                      -1
  hrOwned                      -1
  PA size                      0
  PA map size                  0
  PA OptimalInodes             0
  Inodes per inode block       8192
  Data ptrs per inode          16
  Indirect ptrs per inode      16
  Data ptrs per indirect       1363
  User files exposed           some
  Meta files exposed           some
  User files ill replicated    some
  Meta files ill replicated    some
  User files unbalanced        all
  Meta files unbalanced        all
  Current snapshots            0
  Max snapshots                256
  checkFilesets                1
  checkFilesetsV2              1
  Worker node                  0
Checking inodes
Regions 0 to 1123 of total 1124 in storage pool "system".
Node x.x.27.29 (foo09) starting inode scan 0 to 65535

[lots more output about inode scanning...]

Lost blocks were found.
Correct the allocation map? y

   292765696 subblocks
    62243195   allocated
       32010   unreferenced
       32010   deallocated

     2531993 addresses
           0   suspended

File system is clean.
Exit status 0:10:0.
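
It can be handy to save this output and pull the interesting numbers out afterwards. The sketch below assumes the line formats shown above and a log captured with tee; summarize_fsck_log and the log path are hypothetical. It also assumes the summary counts are in the same units as the "Bytes per subblock" header line; under that assumption, 32010 unreferenced subblocks of 131072 bytes each come to 4195614720 bytes, roughly 3.9 GiB.

```shell
#!/bin/sh
# Hypothetical helper: summarize a saved `mmfsck -v` log, e.g. one captured with
#   mmfsck foodata1 -v -o -t /home/gpfs/ 2>&1 | tee /root/mmfsck-foodata1.log
# Assumes the line formats shown above; the log path is only an example.
summarize_fsck_log() {
  log=$1
  # First value on the "Bytes per subblock" header line.
  bps=$(awk '$1 == "Bytes" && $2 == "per" && $3 == "subblock" { print $4; exit }' "$log")
  # Count on the "unreferenced" line of the allocation-map summary.
  unref=$(awk '$2 == "unreferenced" { print $1; exit }' "$log")
  if grep -q '^File system is clean\.' "$log"; then state=clean; else state=not-clean; fi
  # Assumes summary counts and the header use the same subblock size.
  echo "state=$state unreferenced_subblocks=${unref:-0} bytes=$(( ${unref:-0} * ${bps:-0} ))"
}
```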

And now we’re ready to remount the filesystem.

[root@foonsd1 log]# mmlsdisk foodata1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512       1 Yes      Yes   ready         up           system
nsd2         nsd         512       2 Yes      Yes   ready         up           system
nsd3         nsd         512       1 Yes      Yes   ready         up           system
nsd4         nsd         512       2 Yes      Yes   ready         up           system
[root@foonsd1 log]# mmmount all -a
Tue Sep  4 10:21:19 MST 2012: mmmount: Mounting file systems ...
[root@foonsd1 log]# mmlsmount all
File system foodata1 is mounted on 18 nodes.
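
To check from a script that the remount reached every node, the mmlsmount line can be compared against the node count you expect. A sketch assuming the "File system X is mounted on N nodes." wording above; mounted_on_at_least is a hypothetical helper, and 18 is simply this cluster's node count.

```shell
#!/bin/sh
# Hypothetical helper: read `mmlsmount all` output on stdin and succeed if
# filesystem $1 is reported as mounted on at least $2 nodes.
# Assumes the "File system X is mounted on N nodes." wording shown above.
mounted_on_at_least() {
  awk -v name="$1" -v want="$2" '
    $1 == "File" && $2 == "system" && $3 == name && /is mounted on/ { n = $(NF-1) }
    END { exit !(n + 0 >= want + 0) }
  '
}
```

For example: mmlsmount all | mounted_on_at_least foodata1 18 || echo "foodata1 is not mounted everywhere yet"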
