A few quick pics of socket LGA 2011

2012-09-20 by jhoblitt | 0 comments

We just got our first three Xeon E5-26xx / socket LGA 2011 based systems @ $day_job and one of them came from the vendor with the incorrect model of motherboard installed. I took a few quick snaps of the socket while swapping the board out for the correct model. Had I been thinking about blogging it, I would have taken a shot of the pads on the bottom of the CPU too.

How to fsck a GPFS filesystem after a disk fault

2012-09-18 by jhoblitt | 0 comments

Recently, we experienced a GPFS fault caused by some sort of LSI9285-8e glitch that happened during a regular patrol read. The fault has not been reproducible. This is what the syslog messages from GPFS look like:

Sep 2 20:17:31 foo01 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=21232\
32: Unrecoverable file system operation error. Status code 19. Volume foodata1
Sep 2 20:21:39 foo01 mmfs: Error=MMFS_DISKFAIL, ID=0x9C6C05FA, Tag=2123233: \
Disk failure. Volume decdata1. rc = 19. Physical volume nsd1
Sep 2 20:21:39 foo01 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=21232\
34: Unrecoverable file system operation error. Status code 19. Volume foodata1

As you can see, nsd1 is not available:

[root@foonsd1 ~]# mmlsdisk foodata1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512       1 Yes      Yes   ready         down         system
nsd2         nsd         512       2 Yes      Yes   ready         up           system
nsd3         nsd         512       1 Yes      Yes   ready         up           system
nsd4         nsd         512       2 Yes      Yes   ready         up           system
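
On a filesystem with many NSDs, the down disk can be picked out of this listing with a small filter. This is only a sketch: the function name down_disks is made up for illustration, and it assumes the column layout shown above, where availability is the second-to-last column.

```shell
#!/bin/sh
# Hypothetical helper: print the names of disks whose availability is "down".
# Reads mmlsdisk-style output on stdin. Assumes availability is the
# second-to-last column, as in the listing above; extra trailing remark
# columns would break that assumption.
down_disks() {
    awk 'NF >= 2 && $(NF-1) == "down" { print $1 }'
}

# Usage (on a GPFS node):
#   mmlsdisk foodata1 | down_disks
```

Header and separator lines fall through the filter because their second-to-last field is never the literal string "down".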

mmchdisk needs to be run to re-enable the downed disk. This operation is functionally similar to mounting a non-distributed filesystem that was not unmounted cleanly.

[root@foonsd1 log]# mmchdisk foodata1 start -d nsd1
Scanning file system metadata, phase 1 ...
  81 % complete on Tue Sep  4 10:17:53 2012
 100 % complete on Tue Sep  4 10:17:54 2012
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
 100.00 % complete on Tue Sep  4 10:18:03 2012
Scan completed successfully.

Now we want to fsck the entire filesystem with mmfsck. Note that the -t argument is a path for temporary working files. Obviously, this can’t be on the filesystem you’re fscking.

[root@foonsd1 log]# mmfsck foodata1 -v -o -t /home/gpfs/
Checking "foodata1"
  fsckFlags                    0x18
  needNewLogs                  0
  nThreads                     8
  clientTerm                   0
  fsckReady                    1
  fsckCreated                  0
  % pool allowed               50
  tuner                        off
  threshold                      0.20
  Disks                        4
  Bytes per subblock           131072 131072
  Sectors per subblock         256 1654712940
  Sectors per indirect block   64
  Subblocks per block          32
  Subblocks per indirect block 1
  Inodes                       7372800
  Inode size                   512
  singleINum                   -1
  Inode regions                131
  maxInodesPerSegment          522240
  Segments per inode region    1
  Bytes per inode segment      4194304
  nInode0Files                 1
  Memory available per pass    4214505436
  Regions per pass of pool system 1124
  fsckStatus                   2
  lrOwned                      -1
  hrOwned                      -1
  PA size                      0
  PA map size                  0
  PA OptimalInodes             0
  Inodes per inode block       8192
  Data ptrs per inode          16
  Indirect ptrs per inode      16
  Data ptrs per indirect       1363
  User files exposed           some
  Meta files exposed           some
  User files ill replicated    some
  Meta files ill replicated    some
  User files unbalanced        all
  Meta files unbalanced        all
  Current snapshots            0
  Max snapshots                256
  checkFilesets                1
  checkFilesetsV2              1
  Worker node                  0
Checking inodes
Regions 0 to 1123 of total 1124 in storage pool "system".
Node x.x.27.29 (foo09) starting inode scan 0 to 65535

[lots more output about inode scanning...]

Lost blocks were found.
Correct the allocation map? y

   292765696 subblocks
    62243195   allocated
       32010   unreferenced
       32010   deallocated

     2531993 addresses
           0   suspended

File system is clean.
Exit status 0:10:0.
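
Before starting a run like the one above, it may be worth confirming that the -t directory really lives on a different filesystem. The following is a rough sketch, not anything shipped with GPFS: the function names are invented, and it assumes df reports the GPFS device either as the bare filesystem name or as /dev/NAME.

```shell
#!/bin/sh
# Print the device (first column) from `df -P`-style output on stdin.
df_device() {
    awk 'NR == 2 { print $1 }'
}

# Hypothetical check: succeed if DIR is NOT on the named GPFS filesystem.
# usage: tmpdir_is_safe DIR FSNAME
# Returns 0 if safe, 1 if DIR is on FSNAME, 2 if the device is unknown.
tmpdir_is_safe() {
    dev=$(df -P "$1" 2>/dev/null | df_device)
    [ -n "$dev" ] || return 2
    # Assumption: df shows the device as FSNAME or /dev/FSNAME.
    [ "$dev" != "$2" ] && [ "$dev" != "/dev/$2" ]
}

# Usage:
#   tmpdir_is_safe /home/gpfs foodata1 && mmfsck foodata1 -v -o -t /home/gpfs/
```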

And now we’re ready to remount the filesystem.

[root@foonsd1 log]# mmlsdisk foodata1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512       1 Yes      Yes   ready         up           system
nsd2         nsd         512       2 Yes      Yes   ready         up           system
nsd3         nsd         512       1 Yes      Yes   ready         up           system
nsd4         nsd         512       2 Yes      Yes   ready         up           system
[root@foonsd1 log]# mmmount all -a
Tue Sep  4 10:21:19 MST 2012: mmmount: Mounting file systems ...
[root@foonsd1 log]# mmlsmount all
File system foodata1 is mounted on 18 nodes.

pwauth checks UID of caller and restricts the minimum UID of the account checked

2012-08-09 by jhoblitt | 0 comments

After a very frustrating hour of trying to figure out why pwauth called from the Apache module mod_auth_external was failing for some user accounts but not others, I finally discovered that the source code has a default minimum UID of 500. This appears to be preserved in the package pwauth-2.3.10-1.el6.x86_64 from EPEL.

The error messages on failure in the Apache error_log look something like this:

[Wed Aug 08 17:08:04 2012] [error] [client 10.1.1.1] user foo: authentication failure for "/bar/": Password Mismatch
[Wed Aug 08 17:08:29 2012] [error] [client 10.1.1.1] AuthExtern pwauth [/usr/bin/pwauth]: Failed (3) for user foo

The solution is to download the pwauth source and change some header values. For pwauth-2.3.10, you need to change these two values in the config.h header.

#define SERVER_UIDS 30    /* user "wwwrun" on the author's system */

#define MIN_UNIX_UID 500  /**/

On RHEL/SL/CentOS, SERVER_UIDS (i.e., the UID of apache) is likely 48. Since I had already installed the pwauth RPM, the installation was simple, as the necessary PAM configuration was already done.

sudo mv /usr/bin/pwauth /usr/bin/pwauth.old
sudo cp ./pwauth /usr/bin/pwauth
sudo chgrp apache /usr/bin/pwauth
sudo chmod 4750 /usr/bin/pwauth
$ ls -la /usr/bin/pwauth*
-rwsr-x--- 1 root apache 20066 Aug  8 17:28 /usr/bin/pwauth
-rwsr-x--- 1 root apache  8112 May  7 18:27 /usr/bin/pwauth.old

And that fixes the UID restriction.
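
To spot affected accounts before (or instead of) rebuilding, the UID of each account can be compared against the compiled-in minimum. This is a minimal sketch with made-up helper names; the default of 500 comes from the MIN_UNIX_UID value shown above, and it assumes pwauth rejects any UID strictly below that value.

```shell
#!/bin/sh
# Hypothetical helper: succeed if UID ($1) is at least the minimum ($2).
uid_allowed() {
    [ "$1" -ge "$2" ]
}

# Hypothetical helper: report whether an account clears the threshold.
# usage: check_account USER [MIN_UID]   (MIN_UID defaults to 500)
check_account() {
    user=$1
    min=${2:-500}
    uid=$(id -u "$user" 2>/dev/null) || return 2   # unknown account
    if uid_allowed "$uid" "$min"; then
        echo "$user (uid $uid): ok"
    else
        echo "$user (uid $uid): below minimum $min"
        return 1
    fi
}

# Usage:
#   check_account foo
```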

How to take a screenshot under GNOME when the application captures keyboard input

2012-07-30 by jhoblitt | 0 comments

I discovered today that the virt-manager console window for kvm/qemu captures the input from the print screen key, effectively keeping you from using the default GNOME hotkey to screenshot just the window with focus.

The workaround for this is to run the gnome-screenshot utility from another terminal with the --window and --delay flags. E.g.