So the neat thing about the Windows Home Server (knowing Microsoft, that link will probably be dead by the time you read this) is that it pools all your hard drives into one massive storage pool that is easily expandable and provides flexible, folder-level, RAID 1-like data redundancy. It accomplishes this feat by automagically shuffling your files around between the different drives in the storage pool as it sees fit.
One aspect of this balancing process is that it tries to keep all your data off the main system drive. The rationale for this is that if the main system drive fails you can do a non-destructive server recovery process where it re-installs a brand new copy of the OS to the system drive, and since your actual data files are elsewhere, it will create the proper tombstone links from the data portion of the system drive to your actual files that live elsewhere.
Good theory.
So to continue my story from where we left off last, I was about to enter this server recovery process. I had since pulled the dying system drive, checked it for bad sectors, and imaged what was left of it, just to have something to fall back on in case everything went horribly wrong from here on. By the way, it turns out that it had bad sectors all over the beginning of the drive, so the OS was definitely on its last legs.
Now that I was able to get the server recovery wizard to actually find the server, I proceeded to click the server recovery option (versus Factory Reset, which implies data loss). The wizard let me know that it would try to recover all my files. Since I had verified earlier that all my files were actually on the non-failing drives, I had high hopes.
An hour or so later...
Server recovery was apparently successful. After going through the standard initial setup I had what appeared to be a functioning server. Even all my shares and data appeared to be there. After opening a few files and getting back what I expected, it looked like server recovery might have actually worked. Imagine that!
I proceeded to install the connector software on my main desktop. Right after that, I noticed something horribly wrong. The backup service was not running and claimed that my backup database was corrupt! How could it be corrupt when it was fine before the server restore? Evidently the server restore process is selective about the files it recovers, and in this case it chose not to restore some of my backup database.
After digging through the log files I noted that it was complaining about a missing file named Control.4096.dat, and that it was being referenced by Data.4096.NN.dat files.
Not good.
Data loss is never good, but this was a 260GB+ backup database that held data from some hard drives which are no longer in use. In effect it had become my archive. I realize that it was never meant to be used that way, but that's what happens naturally over time, and data loss should never be an expected and accepted outcome. We should always strive to prevent data loss.
Now that I realized some of my files were missing, I went on to load the image I had made earlier of the failing drive. Since it contained a list of all my files (in tombstone link form), I could compare the files in the current backup database with those in the original working backup database.
Incidentally, the backup database is located in D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4} on the Windows Home Server. I recommend you don't touch it unless you know what you are doing (or you're like me, who has too much time on his hands).
After comparing the file lists, I quickly discovered that 2 files were missing, not 1:
Control.4096.dat
Data.4096.0.dat
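The comparison itself is simple enough to script. Here is a minimal Python sketch of the idea, not the tool I actually used: it takes the file names from the old listing and the names currently in the backup database folder and reports what has gone missing. The function names are made up for illustration, and matching names case-insensitively is my assumption about how Windows treats them.

```python
from pathlib import Path


def list_names(folder):
    """Return the names of the plain files directly inside a folder."""
    return [p.name for p in Path(folder).iterdir() if p.is_file()]


def missing_files(original_names, current_names):
    """Return names that were in the original listing but are gone now.

    Names are compared case-insensitively (an assumption that matches
    how Windows file systems normally behave).
    """
    current = {name.lower() for name in current_names}
    return sorted(
        name for name in set(original_names) if name.lower() not in current
    )


if __name__ == "__main__":
    # Hypothetical listings, shaped like the file names in this post.
    before = ["Control.4096.dat", "Data.4096.0.dat", "Data.4096.1.dat"]
    after = ["Data.4096.1.dat"]
    for name in missing_files(before, after):
        print(name)
```

In practice you would feed it `list_names(...)` for the mounted drive image and for the live backup database folder instead of the hard-coded lists.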
Clearly, the backup database integrity check that the Windows Home Server was doing was not up to snuff. It didn't notice that a whole big ol' 4GB file was missing!
After pulling my data drives and running an undelete recovery tool, I found that said files had been deleted and conveniently overwritten with other data, and were thus unrecoverable. The first file, Control.4096.dat, had this done to it twice, on two separate data drives. Nice.
So me being me, unable to accept the inevitability of things, I proceeded to think about how I could recover this data. After going through all the sanctioned possibilities, of which there are none really, short of resetting your backup database, I came to the conclusion that I had to try something else.
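For what it's worth, a check that would have caught this is not hard to imagine. The sketch below is purely my guess at one: it assumes the data files are named Data.<size>.<N>.dat and that N is supposed to count up from 0 without holes. Neither assumption comes from any WHS documentation, only from the file names I saw, so treat it as an illustration and not as how the backup engine works.

```python
import re

# Assumed naming pattern: Data.<size>.<index>.dat (inferred from file names only).
DATA_FILE = re.compile(r"^Data\.(\d+)\.(\d+)\.dat$", re.IGNORECASE)


def find_gaps(names):
    """Map each <size> series to the indexes missing below its highest index.

    Assumes indexes should run 0..max with no holes, which is a guess.
    """
    seen_by_series = {}
    for name in names:
        match = DATA_FILE.match(name)
        if match:
            series, index = match.group(1), int(match.group(2))
            seen_by_series.setdefault(series, set()).add(index)

    gaps = {}
    for series, seen in seen_by_series.items():
        missing = sorted(set(range(max(seen) + 1)) - seen)
        if missing:
            gaps[series] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical folder contents after the recovery.
    print(find_gaps(["Data.4096.1.dat", "Data.4096.2.dat"]))
```

Note that a check like this can only spot holes below the highest index it sees, so it would miss files lost from the end of a series.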
By the way, can you believe people are actually accepting this as a solution? "My backup database is corrupt." "Umm... erase the whole thing and start over." Can you imagine this in the real world? "I have a leaky roof, what should I do?" "Umm... the house is built in such a way, sir, that the entire roof is an integral part of the sub-structure, and attempting repair on any part of it is impossible. Therefore, unfortunately, we will have to wreck your entire house and rebuild it from scratch. Also, we can't let you enter the house first to retrieve your belongings, because the whole thing is bound to collapse on you at any moment."
That's ludicrous!
So I started looking for alternative solutions. I read the WHS technical brief on the topic and I came across this excellent effort by brubber on the official WHS forum (all links are subject to change at Microsoft's whimsy, so don't bet on them working for long). While this was all very useful, it didn't really give me any options I could use.
What I really wanted was to reconstruct that Control.4096.dat file. From looking at the tombstone image I could tell it was only 4KB in size. I had a good chance of getting at most of my data, albeit incomplete, if I could just get that Control file back. Since the backup engine never noticed the gigantic data file missing, I thought maybe it didn't really need it. It looked like the first data file of the whole backup set, and I didn't really care about the first backups I ever did. It's the stuff in the middle that I wanted to recover.
I couldn't believe that a 4KB file was preventing me from accessing 260GB!
Now I don't normally do this, but I think this warranted it: I posted on the official WHS forum for help. I wanted to avoid the standard responses of "did you read the FAQ?" and such, so I got right to the point. Among other things, I asked "Is it possible to get the format of this file or some sort of rebuilding process?".
Sure enough, the first response I got, from ColinWH, although not very encouraging, was very useful. I got a snippet from this supposedly all-important control file, and it looked like XML! I rejoiced. Surely I could recreate this file.
2 weeks later...
No, not really :) But this was only the beginning.
Server Recovery Data Loss - Backup Corruption
Posted by Alex at 11:59 AM. Filed under server recovery, talk.