Should I have named it Barad-dûr? Or, my new server.

When we last left our intrepid hero (geek), I had convinced myself that the new Western Digital drives with the 4 KB block sizes operated perfectly fine under OpenSolaris, but that the Frankenstein of my old server needed to be replaced. I figured if I was going to do this, I was going to do it right. The goal of this machine was storage, which meant a fast network, lots of SATA ports, and lots of drive bays. So, thanks to the wonders of the internet and Newegg, I found the following items.

CPU: Intel Core 2 Duo E8400 Wolfdale 3.0GHz - Seemed a nice balance between power consumption and performance.
Motherboard: TYAN S5220AG2NR Toledo q35T LGA 775 Intel Q35 ATX Motherboard - This won out due to the low cost and large number of on-board SATA ports (6).
Memory: 4 x Kingston 1GB 240-Pin DDR2 SDRAM DDR2 667 - I bought ECC RAM, but unfortunately the motherboard does not support it (descriptions were all vague).
SAS Controller: LSI Internal SATA/SAS SAS3081E-R 3Gb/s PCI-Express 1.1 RAID Controller Card - This card is one generation back for SAS, but it is plenty fast to support 8 SATA drives. I didn’t get this until after the machine was up and running; I will touch on moving the array to the new card later.
Storage: 8-2 (aka 6) x Western Digital Caviar Green WD15EARS 1.5TB 64MB Cache SATA 3.0Gb/s 3.5” Internal Hard Drive - 2 drives were DOA, but Newegg did a great job of handling the RMAs. Chosen because they are big, relatively cheap, low power, and cool running.
Power: Antec CP-850 850W Power Supply - Specifically designed for the case below, lots of power connectors, and power to spare.
Case: Antec Twelve Hundred Black Steel ATX Full Tower Computer Case - This is a beast of a case, but lots of room inside and nice to work with.

If you look at the specs and are counting, you will see that this gives me a combined total of 14 SATA ports, more than enough for expansion as the case has 12 drive bays. The primary storage is the 6 new SATA drives attached to the LSI SAS card. I also brought over the 3 remaining 1TB SATA drives from my old server and attached them to the motherboard SATA controller. I don’t trust them completely anymore, but they make a nice big scratch space volume.

Also of note is the challenge I had with getting enough working drives. I needed 6 hard drives to build the system, but I had to order a total of 8 before I had 6 working ones. I’m not sure what to take away from this. Obviously, I don’t like the idea of drives dying, but I think I much prefer DOA drives to ones that will fail later. But these are obviously consumer-grade devices that I don’t trust, which is why they are going into a dual-parity RAID system.
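As a rough back-of-the-envelope sketch of what that dual parity costs: raidz2 spends roughly two drives' worth of space on parity, so usable capacity is about (drives - 2) x drive size. The variable names below are only for illustration, and the figure ignores ZFS metadata overhead and the TB-versus-TiB difference.

```shell
#!/bin/sh
# Approximate usable capacity of a raidz2 array (illustrative names only).
drives=6          # drives in the array
parity=2          # raidz2 keeps two drives' worth of parity
size_gb=1500      # WD15EARS, 1.5 TB each (decimal GB)

raw_gb=$((drives * size_gb))
usable_gb=$(((drives - parity) * size_gb))

echo "raw: ${raw_gb} GB, usable: ~${usable_gb} GB"
```

So the 6 drives come out to roughly 6 TB of usable space, with a third of the raw capacity going to redundancy.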

In my old server, I used an old ATA drive as the boot drive. It was small (in capacity), loud, and probably used more power than it should. This time, I wanted something small for the boot drive, since I didn’t really care about it terribly much. So, for the new server, I took a 2.5” laptop SATA drive, plugged it into a SATA->USB adapter, and am using that as the boot drive. It doesn’t take up much room and draws very little power.

So, after all the parts arrived and 2 rounds of RMAs with dead hard drives, I built the new system. Obviously OS installation was the first step. So I temporarily hooked up a DVD drive into the system and booted it up. BIOS, check. Bootloader, check. Kernel loaded, check. Detecting devices………….nothing. Shit.

I ended up going through quite a few debugging steps at this point, trying older and newer versions of OpenSolaris. No change. I finally learned how to boot the kernel such that it drops into the kernel debugger (if that doesn’t make you run in fear, it should) and get it to boot while spitting out lots more debug info. For anyone coming to this page with similar troubles, the specific steps were to first add the parameter -kdv to the kernel line in GRUB and then, when you get dropped into the debugger, enter the following commands.

use_mp/W0
moddebug/W80000000
::cont

This hardly gave me a culprit. But I noticed after enabling and disabling various things in the BIOS that the problem always seemed to occur around the time the system loaded the ATA drivers. As the only thing attached via ATA was the DVD drive, I researched in that arena. (My apologies that I don’t have all the links I used to find the info here; I was more concerned with getting it working than preserving the link history for posterity, but you do get my solutions.) In the end, I found other people who reported that their CD/DVD drives did not function properly with DMA enabled. I added the following option to the kernel boot line:

-B atapi-cd-dma-enabled=0

And, voila! The system booted into the installer. To be honest, this would not work as a long-term solution: not using DMA for the DVD drive makes it slow and bogs down the rest of the system. But since the only use of the drive was for the install, I didn’t care.

The install of the base OS went smoothly as did the upgrade to the latest development release. While in general I will say the userland tools for OpenSolaris are lacking as compared to Linux and even that the package tools are lacking in general, upgrading the whole system is a breeze.

pkg install SUNWipkg
pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
pkg image-update

Next came the CIFS (Windows file sharing) server.

pkg install SUNWsmbskr
pkg install SUNWsmbs
svcadm enable -r smb/server

Now on to building the new drive arrays. Given that I never make things easy on myself, I still had a slight conundrum. One of the 6 new drives that was destined for the new machine was still in the old server as the replacement for the drive that died. So that meant that I only had 5 drives for the new machine where I wanted 6. AHA! But this was going to be a dual-parity RAID system (raidz2 to be specific). This means that once operational, it can sustain a 2-drive failure without losing data. The question arises: can you create an array from scratch in a degraded state and then add the drive later? The answer is yes! The basic idea is to create a sparse file to be a temporary “drive” for array creation, then immediately take it offline (degrading the array), since it can’t actually store data: there isn’t enough space on the boot drive to hold its share of the data and parity. Then, once the data is transferred, “replace” the fake drive with a real one and let the system heal itself.

# Create temporary "drive" so I don't lose
# redundancy on source array
mkfile -nv 1500g /TEMP-DRIVE
zpool create -f nas raidz2 c9t0d0 c9t1d0 c9t2d0 c9t3d0 c9t4d0 /TEMP-DRIVE
zpool offline nas /TEMP-DRIVE
# Do all of the transfers and install 6th drive
zpool replace nas /TEMP-DRIVE c9t5d0
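The trick depends on the temporary file being sparse: mkfile -n records the size without allocating the disk blocks. As a small sketch of the same idea on a system without mkfile, dd can seek past the end of an empty file. The path and the 64 MB size below are made up for the demo and are not what I used on the server.

```shell
#!/bin/sh
# Sparse-file demo: a portable stand-in for `mkfile -n` (assumed path and size).
demo="${TMPDIR:-/tmp}/sparse-demo.img"
size=67108864     # 64 MB, kept small for the demo

# Read nothing from /dev/null, but seek $size bytes into the output file.
# dd extends the file to that offset without writing any data blocks.
dd if=/dev/null of="$demo" bs=1 seek="$size" 2>/dev/null

# Apparent size is what ls reports; allocated size is what du reports.
apparent=$(wc -c < "$demo" | tr -d ' ')
allocated_kb=$(du -k "$demo" | awk '{print $1}')

echo "apparent size: ${apparent} bytes"
echo "allocated:     ${allocated_kb} KB"

rm -f "$demo"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent size, which is why a 1500g "drive" can sit on a small boot disk long enough to create the pool.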

All good! Now on to transferring the data from the old system to the new. One of the (many) awesome features of ZFS is the ability to snapshot filesystems. On top of that, you can easily serialize these snapshots and send them around and then have them unserialized on another filesystem or even another machine. A great feature is that I could be transferring data to my new server without having to stop using my old one for the bulk of the time. Again, the basic structure was to:

Snapshot the old system and transfer the data over to the new server (while still being able to use the old server). This took about 20 hours, but as I had no loss of usage, that is OK.
Stop using the old server, take another snapshot, and transfer it to the new server. The delta between the first snapshot and the second is small, so the transfer time was also small (less than an hour).
Power down the old server and start using the new server.

And here are the gory details:

zfs snapshot -r nas@transfer-primary
zfs send -v -R nas/scratch@transfer-primary |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/scratch
zfs send -v -R nas/av@transfer-primary |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/av
zfs send -v -R nas/backup@transfer-primary |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/backup

# Stop all use of system on bywater

zfs snapshot -r nas@transfer-final
zfs send -v -R -i nas/scratch@transfer-primary nas/scratch@transfer-final |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/scratch
zfs send -v -R -i nas/av@transfer-primary nas/av@transfer-final |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/av
zfs send -v -R -i nas/backup@transfer-primary nas/backup@transfer-final |
    ssh gothmog@rivendell.home pfexec /usr/sbin/zfs recv -v -u nas/backup

Honestly, not terribly gory for what I did. Note that transferring that way brought along everything about the filesystem, including the NFS exports, the CIFS exports, permissions, ACLs, everything. Very painless.
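Since the same pipeline repeats for every filesystem, it could be wrapped in a small helper. This is only a sketch and not what I actually ran: transfer_cmd is a made-up function name, and it just prints each command (a dry run) so the output can be checked before anything is piped to a shell.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the send/receive pipeline for one
# filesystem instead of running it.
# Usage: transfer_cmd DATASET TARGET SNAPSHOT [BASE_SNAPSHOT]
transfer_cmd() {
    dataset=$1
    target=$2
    snap=$3
    base=${4:-}

    if [ -n "$base" ]; then
        # Incremental: send only the delta between BASE_SNAPSHOT and SNAPSHOT.
        send="zfs send -v -R -i ${dataset}@${base} ${dataset}@${snap}"
    else
        # Full: send everything up to SNAPSHOT.
        send="zfs send -v -R ${dataset}@${snap}"
    fi

    printf '%s | ssh %s pfexec /usr/sbin/zfs recv -v -u %s\n' \
        "$send" "$target" "$dataset"
}

# First pass: full copies while the old server stays in use.
for fs in nas/scratch nas/av nas/backup; do
    transfer_cmd "$fs" gothmog@rivendell.home transfer-primary
done

# Second pass: small incremental deltas after use of the old server stops.
for fs in nas/scratch nas/av nas/backup; do
    transfer_cmd "$fs" gothmog@rivendell.home transfer-final transfer-primary
done
```

Printing rather than executing keeps a typo in a dataset name from turning into a 20-hour mistake.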

At this point I was largely done. But as I mentioned in the hardware list, I didn’t get the SAS card until later. I realized that it was a waste to not use the 3 old 1TB drives from my old server. Given their age and the failure of one of them, I didn’t trust them much, but using them to move my network scratch space off the raidz2 array to a separate striped array seemed like a good use. As I had no extra SATA motherboard ports, I needed an add-in card. The LSI cards had been recommended to me, so I picked one up. It is very well supported by OpenSolaris. I changed the firmware on the card to the IT firmware (the non-RAID firmware, since I was not going to use the hardware RAID). Getting a DOS boot environment for that took a little work, but that is another entry.

As it is the better card, I wanted the raidz2 array on the new card. Again, ZFS to the rescue. Under Linux this would have been a pain, but it was extremely simple under OpenSolaris.

# prepare the array to be moved
zpool export nas
# power off box, swap cables to new card, power up
zpool import nas

That was it.

Since then, I have installed a few more things to get things working well.

Smartmon to monitor the disk drives. (OpenSolaris article)
Apcupsd to shut down the system on power failures
Virtualbox to host a few test environments

Simply put, I am thrilled with the new machine. I expect it will last me for quite a while. Oh, the Barad-dûr reference? You will have to wait for the pictures.

May Contain Blueberries

the sometimes journal of Jeremy Beker
