Monday, August 07, 2006


This server is running CentOS 4.3, with a bunch of drives in two RAID-1 arrays. (I really should have used LVM to get everything in a nice, contiguous space, but that's a subject for a different day.) One of the arrays is on two 120GB SATA drives, one of which was given to me by a former employer who swore the drive was bad. I brought the drive home and ran Samsung's test utility on it, which told me it was fine. There was no room in my machine at that time for another drive, so I let it sit and collect dust for over a year.

When I was putting together my new desktop, I almost used that drive for the machine, but decided instead that if its reliability wasn't 100%, I would use it as a RAID member in one of the arrays for the server. Sure enough, a few days after the install, I got an email saying that one of the drives in the array had failed. It was the Samsung drive. I downloaded the latest version of smartmontools, which now works on SATA drives, and it verified what the RAID software had told me. This drive was going down in flames.

I ordered a new hard drive, a 160GB Seagate. I booted Knoppix, used partimage to create an image of the 120GB drive that was in my desktop, swapped drives, and restored the image. (I love partimage. I use Knoppix at work to manage our 50+ workstations. Cheaper than Ghost, and it works great.)

Then came the fun of replacing the dead drive. I had done this once before in Linux, but I couldn't remember how. I logged into the server, and read the man pages on mdadm.

First, I wrote out the partition information on the failing drive:

fdisk -l /dev/sdb > sdb-partition.txt

Then I shut down the server and swapped drives. When that was complete, I booted the system up. Of course, all the arrays that had this drive in them came up degraded. I ran fdisk and recreated the partition table to match what was in the sdb-partition.txt file.
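
For anyone wanting to confirm which arrays are degraded before adding the new partitions back, /proc/mdstat is the usual place to look. Here is a rough sketch of a helper that lists them. The function name degraded_arrays is made up for illustration, and it assumes the usual mdstat layout, where a missing member shows up as an underscore in the [UU] status field.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array with a missing member.
# Reads /proc/mdstat by default, or an mdstat-format file given as $1.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
        /^md[0-9]+ :/ { name = $1 }
        # Status field such as [U_] or [_U]: an underscore marks a missing member
        /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads the live /proc/mdstat; passing a file name makes it easy to try against a saved copy first.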

Then, I ran a couple of mdadm commands:

mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2

Yay! Rebuilding started in the background.
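
The rebuild can be watched with a plain cat /proc/mdstat. As a small sketch, the helper below pulls out just the percentage for each array that is resyncing. The name rebuild_progress is hypothetical, and it assumes the kernel reports progress on a line of the form "recovery = 12.6% (...)", which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent>" for each array being rebuilt.
# Reads /proc/mdstat by default, or an mdstat-format file given as $1.
rebuild_progress() {
    awk '
        # Remember the array that the following lines belong to
        /^md[0-9]+ :/ { name = $1 }
        # Progress line: the field right after "recovery =" is the percentage
        /recovery =/ {
            for (i = 2; i < NF; i++)
                if ($i == "=" && $(i - 1) == "recovery")
                    print name, $(i + 1)
        }
    ' "${1:-/proc/mdstat}"
}
```

When nothing is rebuilding, the function prints nothing, so it can be run in a loop until the output goes empty.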

Now I don't have to worry about losing my data to a drive failure.

