It’s Difficult To Be A System Administrator

[WARNING: Geeky Content Ahead]

Things go well most of the time. But then you have things like massive power outages when you’re in the middle of doing something to prevent catastrophic data loss when the power goes out. And what you were doing in the middle of the power outage actually CAUSES catastrophic data loss. 😦

Two weeks ago, one of my hard drives on my personal file server failed. It’s a 1.5TB hardware RAID1 (mirror) array. For those who don’t know what that is, I have a special device in my server that allows me to build redundant arrays so that in case of the failure of a single disk, no data is lost. That’s what I have: two 1.5TB drives mirrored to provide full redundancy. One of them failed and the server started to beep… telling me “fix my drive!” So I did. I got a new drive and it spent the next 20 hours or so rebuilding the 1TB of data that is on the drive. Problem solved!

Then, I did something dangerous. I started to think.

I thought “Hmmm… that hard drive failed way too early in its lifecycle. What if my other ones fail too?” You see: the drive I just replaced held only my personal data [which is very important] but it wasn’t part of my infrastructure. I could use that data on any machine, but I’d have to rebuild a ton of stuff to get to it if I had many more failures.

Just so that you have an overview of my “home server farm”, I have [for purposes of this discussion] 2 host servers that serve up virtual machines (VMs). I have somewhere around 30 VMs spread across these hosts. Most are work-related, helping me to design and build solutions for my customers and do research, and some are for my personal use, such as a desktop, web server, email, and the file server mentioned above. There are also a few machines to maintain the farm – Domain Controllers, DNS, Certificate Authorities, etc.

Each one of these computers is actually a file on one of the two host boxes – and those files are rather large. If something were to happen, say a power outage, there’s no guarantee that my battery power would last long enough for me to shut down the machines cleanly.
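For the geeks: here is a rough back-of-the-envelope sketch of that worry, checking whether a batch of VMs can be shut down before the battery gives out. Everything in it is an assumption for illustration (the function name, the 2 minutes per VM, the 15 minutes of battery, the 80% safety margin); none of these numbers come from my actual setup.

```python
def can_shut_down_cleanly(battery_minutes, vm_shutdown_minutes,
                          parallel=1, safety_margin=0.8):
    """Return True if every VM can be shut down before the battery runs out.

    battery_minutes     -- estimated UPS runtime (hypothetical figure)
    vm_shutdown_minutes -- list of per-VM clean shutdown times
    parallel            -- how many VMs are shut down at the same time
    safety_margin       -- fraction of the runtime we are willing to rely on
    """
    usable = battery_minutes * safety_margin
    # Sort slowest first so the first VM of each batch is that batch's slowest.
    ordered = sorted(vm_shutdown_minutes, reverse=True)
    total = 0.0
    for i in range(0, len(ordered), parallel):
        total += ordered[i]  # a batch takes as long as its slowest VM
    return total <= usable


# Hypothetical farm: 30 VMs at 2 minutes each, 15 minutes of battery.
vms = [2] * 30
print(can_shut_down_cleanly(15, vms))               # one at a time
print(can_shut_down_cleanly(15, vms, parallel=10))  # ten at a time
```

With made-up numbers like these, shutting down one VM at a time does not fit in the battery window, while doing it in batches does. The real answer depends entirely on how long the machines take and how honest the battery estimate is.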

With all that in mind, and the idea that in this instance a “server” is actually a big file on the order of 40GB to 200GB in size, I thought that making the disk upon which these files sit a mirrored array would be a smart thing to do. Which it is.

So: on Friday afternoon, I began the process of creating mirrors on two of my servers. One server, with most of my work VMs on it, has no RAID card. On that one, I used the Windows OS to mirror two 1TB drives after shutting down the machines and moving the files off of one. Once I had that started, I moved to the other server, my personal one, and did the same thing – shut down the VMs and moved them to another disk. Then, I repurposed the vacant disk and joined it to the one which now held the VMs and began building a mirrored volume.

Now, with the RAID card, I can do this on the fly while the disk is available. Before I turned the machines on, it said it would take about 2.5 hours. I turned the machines on. Now, it said it would take about 20 hours.
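To put that jump in perspective, here is a small sketch of the arithmetic. The 2.5-hour and 20-hour estimates are the ones the card reported; the 1TB (1000GB) volume size is an assumption for illustration, and the function names are mine, not anything from the RAID utility.

```python
def implied_rate_mb_s(size_gb, hours):
    """Average throughput (MB/s) implied by rebuilding size_gb in the given hours."""
    return size_gb * 1000 / (hours * 3600)


def rebuild_hours(size_gb, rate_mb_s):
    """Hours needed to rebuild size_gb at a steady rate_mb_s."""
    return size_gb * 1000 / rate_mb_s / 3600


SIZE_GB = 1000  # assumed volume size, decimal units

idle = implied_rate_mb_s(SIZE_GB, 2.5)  # VMs powered off
busy = implied_rate_mb_s(SIZE_GB, 20)   # VMs running on the same disks

print(f"idle: {idle:.1f} MB/s")         # idle: 111.1 MB/s
print(f"busy: {busy:.1f} MB/s")         # busy: 13.9 MB/s
print(f"slowdown: {idle / busy:.0f}x")  # slowdown: 8x
```

Whatever the real volume size, the ratio is the same: the rebuild was getting roughly one eighth of the disk’s attention once the VMs were competing for it.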

I should have left them off.

Well more than 2.5 hours later, at about 10:30 that night, the power went off. 14 or so hours later, it came back. I went down to power things on. It all “looked” okay for a while – I was getting email again, but the web server was wonky and slow and some other things were just kind of weird.

Looking further, it appeared that the rebuild had to be restarted since it had lost power. I restarted it. [Note: the work server using the Windows OS RAID simply came back on automatically and began rebuilding the mirror and it completed with zero errors.] About 2 hours later, there was a loud obnoxious beeping from the server closet. The rebuild had failed and the drive simply dropped offline. Gah!

All my VMs disappeared for a moment. Rescanning the array with the utility made it come back, but now I was very worried. Since the VMs were all off, I copied all the files to a second drive and rebuilt the array from scratch [after several attempts to find and fix whatever bad sectors or corrupt tables were on the drives]. I moved the files back after it completed. I turned on the VMs only to have half of the machines not come back – the half that mattered of course. One domain controller, my web server, desktop and the email server were the biggest losses. I had the old disks, but I had to actually reinstall the OS on all of them and begin the slow, painful process of restoration.
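In hindsight, one thing that would at least have told me which copies to trust is comparing checksums after each move. The sketch below is not what I did at the time; it is a minimal example using only the Python standard library, and the function names and the idea of comparing two directory trees are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a 200GB disk image never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(source_dir, copy_dir):
    """List files (relative paths) that are missing or different in copy_dir."""
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    bad = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(rel.as_posix())
    return bad
```

Calling `mismatched_copies` on the original folder and the copy returns an empty list when every file matches. It only proves the copy equals the source, though: if the source was already damaged by the failed rebuild, the checksums will happily agree on the damaged version.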

Which is where I am today. I have the web server working [obviously] and we now have email with empty mailboxes. I have a recovery database ready to go, but there are issues with the old database so I need to finish patching the Exchange server so I can get it to the same version that it was before so the recovery tools will work properly. That’s what I’m doing now.

All of this work has taken 5 days or so to get things back up. I now have most of the critical VMs housed on RAID drives. I just need one more to complete the process.

At least I learned more about doing Exchange server mailbox recovery.

Power Outage

Many of you may have noticed the site down for several days. That is directly due to the power outage. Not that we’ve been without power, mind you. Power came back within 12 hours of loss. (Yay! Air Conditioning!) What happened was that I was in the middle of moving critical files from one disk to another when the power failed. The failure damaged several of my servers including the web server and the email server. I have no email for now, but hope to have it up soon – even if I have to do without the old stuff.

But, as you can now see: the web site is up and running.

Thank you for your patience!

New Look And Feel

Some of you may have noticed that things are looking different here. Well, they are. But we’re not just “looking” different. We are actually “feeling” different. Under all this visible change are other major changes – changes that will not only make my life easier, but will make the site a bit more stable, reliable, and faster.

What I’ve done is actually migrate [finally] all my sites/blogs/galleries to a single server. I’ve eliminated the old photo gallery server that was an “interim” box and moved that function to the main server. I’ve removed all instances of MySQL and am instead utilizing a single Microsoft SQL Server 2008 instance, which will soon be upgraded to SQL Server 2008 R2 for those that care.

This was harder to do than to write about and much pain and link adjustment/correction was required after migration to make sure everything was working.

In the past, running WordPress required MySQL since the Microsoft database was considered “enterprise” and not “open source”. However, in recent months, some enterprising individuals wrote a WordPress plugin that allows a connection directly to MS SQL instead of MySQL.

Now, all my blog are belong to MSSQL. [Grammatical incorrectness intentional for those not “in the know”, and if you don’t know where that reference comes from, I’m not explaining it to you.]

Closet Construction & Data Migration

Now that the urgency of my project schedule has eased after the holidays have passed, and now that life is settling back into more normal rhythms, Laura and I have taken on and completed some projects. Among those are the complete replacement of my closet and some backend server migration – stuff that you don’t see, but see the results of [or not].

First off, Laura has been bugging me to “re-do” my closet for years. Our master bedroom has two closets, one large [hers] and one small [mine]. We did hers shortly after moving in, and a couple other projects involving the same closet construction parts and process. Now, it was my turn – especially since it seems my excessive quantity of shirts was bending the closet shelf/rod downwards. In my defense, the truth was that it wasn’t a capacity problem, but one of sloppy installation of the old system. I think it was bad when we moved in, but my obtaining a few more items of clothing has not helped.

Friday, on the way home from work, I stopped at the mall and picked up the equipment and shelves… in my two-seat convertible. It fit, but it was tight – kinda hanging out the back. That night, we ripped out the old shelves and I sanded and spackled the gaping holes that were left. I did more sanding and spackling the next morning and that evening, we painted the walls a similar color to the ceiling of the bedroom. Today, we installed the shelving system and hung up my clothes again. No more sagging: all is well with the world!

[Photo: 305-8275]

Oh, look: room for more shirts!

<geek-speak>

And the data migration part? I’ve just found out that there is a database plugin for WordPress that allows for it to utilize Microsoft SQL Server as the backend database. FINALLY. You’ll have seen many posts from me in the past about this [if you’ve been with me that long] and know my frustration that WordPress would never abstract the data layer and forced a proprietary data source on the users. Well, I’ve ditched it and this is my first post on the blog using SQL Server. So far, I think it’s a bit faster. But, I could just be biased…

🙂

The main thing is that I know how to use it, manage it, optimize it, and patch it. It is, after all, what I do.

</geek-speak>

Geeking Out

If you’re not interested in technology or other geeky stuff, you can probably skip this entry as I’m going to be describing some of the technical endeavors I’ve been undertaking over the last couple of weeks.

SP1
You may or may not have noticed that the site is a bit faster. This is mainly due to the fact that I have installed Service Pack 1 for Windows Server 2008 R2 on both the host and the various guest OSs that I have running. The new Dynamic Memory feature allows me to just set initial memory settings and allow the system [guest] load to either request more memory or free up unused bits so that other guests can use them. I’ve given the email server and this web server [and the firewall server] priority so that their requests for resources are serviced first.

The performance is much better – you may not notice much, but on my end, my virtual desktop [which is where I do all my personal computing like Quicken and personal email] is much faster since it has the RAM it needs and the other guests aren’t being hogs.

It also makes accessing all my other guest machines easier and faster. The ones not used much can idle and give back RAM. It’s supposed to be “greener” too, allowing you to save money by throttling back CPU and resource usage on the server to only what is needed.
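If you want a feel for the idea, here is a toy model of priority-based memory sharing. To be clear, this is NOT how Hyper-V actually implements Dynamic Memory; it is a simplified sketch in which every guest keeps its startup memory and whatever is left on the host is handed out to the highest-priority guests first. The guest names, sizes, and priorities are all made up.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    startup_mb: int  # memory the guest is given at boot
    demand_mb: int   # memory the guest currently wants
    priority: int    # higher numbers are served first


def allocate(host_mb, guests):
    """Toy allocator: startup memory for everyone, then extras by priority."""
    alloc = {g.name: g.startup_mb for g in guests}
    free = host_mb - sum(alloc.values())
    if free < 0:
        raise ValueError("host cannot cover the startup memory of every guest")
    for g in sorted(guests, key=lambda g: g.priority, reverse=True):
        # Give the guest what it asks for beyond startup, limited by what is free.
        extra = min(max(g.demand_mb - g.startup_mb, 0), free)
        alloc[g.name] += extra
        free -= extra
    return alloc


# Hypothetical host with 8GB and four guests.
guests = [
    Guest("mail", 1024, 4096, priority=10),
    Guest("web", 1024, 2048, priority=9),
    Guest("desktop", 1024, 4096, priority=5),
    Guest("idle-lab-box", 512, 512, priority=1),
]
print(allocate(8192, guests))
```

In this made-up run, the mail and web servers get everything they ask for, the desktop gets what is left over, and the idle box sits at its startup size. That is the general shape of the behavior I’m seeing, even if the real mechanics are more involved.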

New Virtual Box: FAIL
I tried to utilize one of my old laptops, a Toshiba Tecra M5, as a new virtual host for a couple of guest machines [like a new virtual SAN host I got at TechReady]. I installed the raw, bare-bones Windows Hyper-V Server 2008 R2 – which is NOT Windows Server, for those of you not in the know. The install went fine until I got to installing SP1 [see above] on the machine. The Blue Screen of Death made a command performance.

It seems that I remember why I stopped using the laptop: it’s prone to overheating and crashing, which is exactly what it did – several times. So, scratch having another host box for my virtual machines for now. I suppose I’ll have to wait until I get a new one and take one of the other old ones and repurpose it.

Smart Cards and Certificates
I’m also playing with certificates and smart cards. I’m about halfway there – I’ve gotten one to work via logging into a laptop, but not yet with OWA [which is the goal].

I’ll soon have to upgrade the CA to 2008 R2, but not just yet.

That’s all for now. 🙂

Ooops – I Upgraded…

It wasn’t intentional, at least the scheduling wasn’t intentional. However, last night there was a power outage and our servers crashed and then came back on. What is different, though, is that the firewall server is now running Forefront TMG instead of the old ISA 2006. This is much better, of course, but previously I hadn’t been able to get it to work properly.

As you can see, it’s working now. This means that the configuration I had was working just fine. When the servers rebooted, the new server came up first and took over the network and became the production firewall server. I only found out because I hadn’t put the new mail certificate on the listener. The old server came up second, but because it has the same IP address, it couldn’t connect to anything.
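The lesson here is that two machines sharing one IP address will bite you sooner or later. A quick sanity check over an inventory would have flagged it. The sketch below is only an illustration: the host names are hypothetical and the addresses come from the reserved documentation range, not my network.

```python
from collections import defaultdict


def find_ip_conflicts(hosts):
    """Given {hostname: ip}, return {ip: [hostnames]} for every shared address."""
    by_ip = defaultdict(list)
    for name, ip in hosts.items():
        by_ip[ip].append(name)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical inventory: old and new firewall configured with the same address.
inventory = {
    "fw-old-isa": "192.0.2.1",
    "fw-new-tmg": "192.0.2.1",
    "mail": "192.0.2.10",
    "web": "192.0.2.20",
}
print(find_ip_conflicts(inventory))
```

In my case the conflict worked in my favor, since the new box won the race. I wouldn’t count on that twice.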

I needed to upgrade it because it was running Windows Server 2003 and the old version of the firewall. The new version is running on Windows Server 2008 R2 and is all current and up to date.

It seems to be faster, too.

I’ve turned off the old server and deleted its virtual hard drive. No going back now!

Almost Destroyed

Some of you may have noticed that the blog was unavailable this morning. As you can see, it’s back now and working properly. I guess that some of the “fixes” I put in place broke things over the weekend and I only now got it back up and running. It was almost completely lost. That would have been bad. At least I had a backup.

Anyway: all is well now with the world and the web server.

Enjoy!

New Web Server

I bet you didn’t even notice, but we’re now running on Windows Server 2008 R2, which is 64-bit only.

Yay!

This has been an ongoing project which I started a while back when 2008 came out. I wanted to migrate all the sites from 2003 to 2008. I got bogged down and then R2 was released and I couldn’t just upgrade the new server, so I had to deploy a new one. Presently I have 3.

Soon, there will be only one.

There is still much work to be done, but it won’t be the onerous task I thought it would be… I hope…

Server Room: Complete

When we got back from our trip to Houston, it was good to see that our basement work was complete. The wall was done and painted, the wall trim installed, and the door put in place.

[Photo: 201-3764]

It looks like it was always there. Here are some pictures of the inside:

[Photos: 201-3767, 201-3768]

So, I’m pretty excited about the room. Tonight, I’ll be moving the servers into the room, so the site will be down for a bit.

More Home Improvements

This week, we’ll be modifying our basement. We’ll be walling in a nook to contain all my servers and networking equipment. It’s the latest phase of getting our basement set up for photography. This will allow us to move all the excess computer equipment out of the main basement area into a closed room. In this room will also be some shelving for storing other unsightly materials. Here is a picture of the nook [after we moved all the servers away from it]:

[Photo: 201-3640]

Previously, there was a desk [and then a table] holding printer, monitor, CD filing units on top and computer part holders and a server or two beneath it. It was quite messy.

Over the next few days, there may be a few outages as we move wiring and other things around to accommodate the new space.

Not that I’ve been posting anything recently… I know: I’m a slacker.