I had a boo-boo the other day while upgrading glibc on an EC2 instance. Assuming that it's not easier, in terms of opportunity cost, to provision a completely new instance to replace the broken one, it is possible to recover from this situation, but you can't do it purely via the AWS console. The solution is to attach the damaged volume to another instance for repair. For some reason, the AWS console will not allow you to attach multiple volumes to the same instance.
I found an excellent, detailed write-up on how to do this:
http://alestic.com/2011/02/ec2-fix-ebs-root
The procedure is to:
- Start a new instance in the same availability zone as the instance/volume that needs repair. Do this first as it takes a while for EC2 instances to spin up. There's no reason to create anything larger than a micro instance since we just need something to attach the broken volume to.
- Stop (not terminate!) the broken instance.
- Detach the volume needing repair from the broken instance after it has completely stopped.
- Attach the broken volume to the new repair instance as `/dev/sdf` or higher (the first sketch after this list shows these setup steps with the AWS CLI).
- On the repair instance, do something like `cat /proc/partitions` (or install `lsscsi`) to make sure that the broken volume has appeared as a new block device. Likely something like `xvdj`.
- Run `blockdev --rereadpt /dev/xvdj` (not needed, just paranoid), then `mkdir /mnt/tmp; mount /dev/xvdj /mnt/tmp`.
- Do any needed repairs from the repair instance's shell and/or `chroot /mnt/tmp` (if you want a fully working chroot you will need to bind mount sysfs, etc.; see the second sketch below).
- Once you're sure it's fixed, `umount /mnt/tmp`.
- Detach the fixed volume from the repair instance.
- Attach the fixed volume to the original instance as `/dev/sda1` (the final sketch below covers this return trip).
- Start the original instance.
- Terminate the repair instance.
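
For reference, here is roughly what the setup half of that list looks like from the command line. This is just a sketch using the current AWS CLI rather than the console, and every identifier in it (the AMI, availability zone, instance type, instance IDs, and volume ID) is a placeholder to substitute with your own values.

```bash
# Launch a micro repair instance in the SAME availability zone as the
# broken volume (AMI, type, and zone here are placeholders).
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t3.micro \
    --placement AvailabilityZone=us-east-1a

# Stop (not terminate!) the broken instance and wait until it has
# completely stopped.
aws ec2 stop-instances --instance-ids i-0broken
aws ec2 wait instance-stopped --instance-ids i-0broken

# Detach the damaged volume, then attach it to the repair instance as
# /dev/sdf. Inside the instance it will show up under a xvd* name,
# e.g. /dev/xvdf or /dev/xvdj.
aws ec2 detach-volume --volume-id vol-0damaged
aws ec2 wait volume-available --volume-ids vol-0damaged
aws ec2 attach-volume \
    --volume-id vol-0damaged \
    --instance-id i-0repair \
    --device /dev/sdf
```

The `wait` subcommands just block until the stop/detach has actually finished, which saves you from refreshing the console.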
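
On the repair instance itself, the mount/repair steps look something like the following. The device name `xvdj` is only an example (check `/proc/partitions` or `lsblk` for the real one), and the bind mounts are only needed if you want the fully working chroot mentioned above.

```bash
# Confirm the attached volume showed up as a new block device.
cat /proc/partitions          # or: lsblk / lsscsi
blockdev --rereadpt /dev/xvdj # not strictly needed, just paranoid

# Mount the broken root filesystem.
mkdir -p /mnt/tmp
mount /dev/xvdj /mnt/tmp

# Optional: bind mount the pseudo-filesystems so a chroot behaves like
# a normally booted system (package managers often expect these).
mount --bind /dev  /mnt/tmp/dev
mount --bind /proc /mnt/tmp/proc
mount --bind /sys  /mnt/tmp/sys
chroot /mnt/tmp /bin/bash

# ...do the actual repairs, exit the chroot, then clean up...
umount /mnt/tmp/dev /mnt/tmp/proc /mnt/tmp/sys
umount /mnt/tmp
```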
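
And the return trip, again sketched with the AWS CLI and the same placeholder IDs:

```bash
# Give the repaired volume back to the original instance as its root device.
aws ec2 detach-volume --volume-id vol-0damaged
aws ec2 wait volume-available --volume-ids vol-0damaged
aws ec2 attach-volume \
    --volume-id vol-0damaged \
    --instance-id i-0broken \
    --device /dev/sda1

# Boot the original instance and throw away the repair instance.
aws ec2 start-instances --instance-ids i-0broken
aws ec2 terminate-instances --instance-ids i-0repair
```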