Notes:
Fixed some false positives that were showing up when checking a database with canceled backups. By false positives I mean good databases that were being flagged as bad, not the other way around.
Posted in download, whsdbcheck
A new optimization was added in this build, called the hash cache. It is strictly targeted at the level 4 check; in fact, it's not active on any other check level. The level 4 check in the previous build was VERY slow on machines with little free RAM or on large databases. The hash cache solves this by explicitly allocating a specific amount of RAM for the second part of the level 4 check, the verification stage. It allocates 256 MB by default, but if you have plenty of RAM (gigabytes) and would prefer Windows to handle the file caching, you can do that with /hashcache=0.
The more RAM assigned to the hash cache, the fewer passes you have to perform per file verification. With the hash cache off, everything is accomplished in one pass, but your disk seeks heavily, unless Windows caches the hashes into RAM using the built-in OS file system cache.
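To make the RAM-versus-passes trade-off concrete, here's a back-of-the-envelope sketch in Python. This is purely illustrative and assumes each verification pass can cover as many hashes as fit in the cache; the actual scheduling logic inside WhsDbCheck isn't documented here, and the function name is mine.

    import math

    def estimated_passes(total_hash_bytes, cache_bytes):
        # Hypothetical model: each pass covers as many hashes as fit in the cache.
        if cache_bytes <= 0:
            # /hashcache=0: a single pass, leaving caching to the OS file system cache
            return 1
        return math.ceil(total_hash_bytes / cache_bytes)

    # e.g. 1 GB worth of hashes against the default 256 MB cache -> 4 passes
    print(estimated_passes(1 << 30, 256 << 20))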
Also, if you are in the middle of a level 4 verification check with the last build and it's not moving very fast, you can Ctrl-C it to abort and resume with this new build. Just move the Index.NNNN.dat.md5 file from the temp folder that was created by the tool (WhsDbCheck_Temp) up one level, to where the database files are. Then start a level 4 check with this new version: it will find the md5 file and simulate the read check, taking you directly into the verification stage using the new hash cache. Pretty neat.
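If you'd rather script that file move, here's a small Python sketch. The database path is the one mentioned later in this blog, and the Index.*.dat.md5 glob pattern is my interpretation of the Index.NNNN.dat.md5 naming; adjust both to your setup.

    import glob, os, shutil

    # Where the database files live (path as used on my server; adjust to yours).
    db_dir = r"D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}"
    temp_dir = os.path.join(db_dir, "WhsDbCheck_Temp")

    # Move each partial Index.NNNN.dat.md5 up one level, next to the database
    # files, so the new build can resume directly in the verification stage.
    for md5_path in glob.glob(os.path.join(temp_dir, "Index.*.dat.md5")):
        shutil.move(md5_path, db_dir)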
Posted in download, whsdbcheck
This is a fairly big update that includes the all-important level 4 test. This test is a complete, top-to-bottom WHS DB verification: it rehashes ALL the data and verifies that it's correct. This version also contains a slew of optimizations targeted at different test levels. This build is very much untested, so expect bugs. I expect the next build to be a BETA, meaning this build is very close to feature complete.
Posted in download, whsdbcheck
Changes:
Posted in download, whsdbcheck
Changes:
Posted in download, whsdbdump
Changes:
Posted in download, whsdbdump
WARNING: This is an advanced tool. Working with your Windows Home Server database is dangerous. I'm not responsible for any data loss that may ensue as a result of the use of WhsDbDump. Please always operate on a copy of the database and not the original one. Shuffle your bits with care.
If you're running this tool on a pre-Vista machine, please be aware that WhsDbDump requires at least .NET 2.0. You may download the latest version of the .NET Framework directly from Microsoft.
WhsDbDump is a utility to dump your Windows Home Server backup database data files to human readable text files for examination. It decodes the binary format stored in the data files to .txt and .xml files. It supports the standard Windows Home Server data file format and is able to decode any database file specified.
Please note that some files store purpose-specific data and will not be dumped. For example, the Data.nnnn.n.dat files store the actual cluster-level data; that data will not be dumped.
Basic Usage:
WhsDbDump.exe <file mask>
WhsDbDump takes a number of parameters, but for basic usage that is all you need to know. If you want to dump the data/record sections of the backup files, you can specify /data and optionally /dataalternate after the file mask (e.g. WhsDbDump.exe *dat /data /dataalternate). This will take some time to complete if your database is big, and it will eat up a whole lot of disk space too.
Posted in instructions, whsdbdump
Here you will find links to the latest versions of my Windows Home Server Backup Database tools.
WARNING: Even though these tools do not open any backup data files for write access, I cannot take responsibility for any data loss. Please take appropriate measures to protect your data.
Posted in download index
The data in the Header section can be obtained by going to the offset specified in the Signature. Back to our infamous Control.4096.dat. The Header section in hex:

80 20 80 98 81 0b 80 c0 d7 a5 80 80 80 0b

You'll understand why I chose the number of bytes that I did after understanding how deserialization works. The field definitions would lead you to believe that these are some sort of fixed integers (like in the signature). They are not! In fact, they are specially encoded integers of VARIABLE length.

Let's look at the first field. Its type is int and its name is BytesPerCluster. After doing a bit of Googling I figured out that there should be 4096 bytes per cluster. So how do you get 4096 out of hex 80? Or is it 80 20, or maybe 20 80? Let me spare you the gory details of how I figured this out; here's the secret: each byte's first bit determines whether the next byte is part of the same value. Let's look at this in binary:

80 = 10000000
20 = 00100000

Because the first bit plays this special role, it does not contribute to the actual value. The value is stored as a little-endian signed number (not a decimal float). So here's a simple deserialization algorithm:

Step 1. Find the number of bytes participating in this value. Based on the rule above, that's 2 in this case, since the first byte encountered with its first bit set to 0 is byte number 2.

Step 2. Go through each bit, from the most significant to the least significant, adding up the bit values, then apply the sign bit.

I know this is hard to understand and visualize, so here are our 2 bytes in binary with the significance of each value bit written underneath. The * marks the continuation bit (1 means the next byte is part of the same value, so we're not done; 0 means we're done and the next byte is part of the next value), and +- marks the sign bit:

Bits:          1 0 0 0 0 0 0 0   0 0  1  0  0  0 0 0
Significance:  * 6 5 4 3 2 1 0   * +- 12 11 10 9 8 7
Now that we have the significance of each bit, this is how you convert significance to bit-value:
2 ^ significance = bit-value
To determine the final value, just add up all the bit values. So in this case we only have one bit set and the bit-value is 2 ^ 12 = 4096.
So now we know that the first field value for "BytesPerCluster" is 4096.
If the number had been bigger you would, of course, have had more bytes participating.
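Putting it all together, here's a small Python sketch of this decoding scheme. It's my own interpretation of the description above: the function name is mine, and the handling of negative values (sign-and-magnitude below) is an assumption, since every value in this example is positive.

    def decode_whs_int(buf, pos=0):
        # Decode one variable-length integer from buf starting at pos.
        # Each byte's top bit is a continuation flag. The first byte holds
        # the 7 least significant value bits, each following byte the next
        # 7 bits, except that in the final byte the bit just below the
        # continuation flag is the sign bit.
        raw = []
        while True:
            b = buf[pos]
            pos += 1
            raw.append(b)
            if not (b & 0x80):   # continuation bit clear: last byte of this value
                break
        value, shift = 0, 0
        for b in raw[:-1]:       # non-final bytes contribute 7 value bits each
            value |= (b & 0x7F) << shift
            shift += 7
        last = raw[-1]
        value |= (last & 0x3F) << shift  # final byte: 6 value bits below the sign bit
        if last & 0x40:                  # sign bit set: negative value (assumed)
            value = -value
        return value, pos

    # The BytesPerCluster example from above: 80 20 decodes to 4096.
    print(decode_whs_int(bytes([0x80, 0x20])))   # (4096, 2)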
Posted in header, specification
The definition is the most readable section here; it's actually an XML text fragment. It begins at the file offset pointed to by the Signature section and ends at the first null byte. Let's look at the XML from Control.4096.dat:

<file type="Control">
  <header>
    <field type="int" name="BytesPerCluster" />
    <field type="int" name="NextIndex" />
    <field type="int" name="NextDataOffset" />
  </header>
</file>

This specifies the file type and defines the format of the Header section; in this case we can see the file type and the header fields. Each entry under the header tag is a reference to a field VALUE stored in the Header section, and the type defines how we deserialize (or decipher) the value for that field. Some schemas will also define the format of the records located in the Records section and the Alternate Records section; there can be multiple records in a Records section. Field types are not limited to simple intrinsic types: they can also be structures, and there can be more than one in succession. In addition, there can also be a sentinel, which defines the end of the records. The sentinel must be of the same type as the first record type, for obvious reasons.
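As an illustration, here's a short Python sketch that reads such a schema fragment with the standard library's XML parser and lists the header fields. This is a toy, assuming the fragment parses as well-formed XML once extracted; the tag and attribute names come straight from the example above.

    import xml.etree.ElementTree as ET

    schema = '''<file type="Control">
      <header>
        <field type="int" name="BytesPerCluster" />
        <field type="int" name="NextIndex" />
        <field type="int" name="NextDataOffset" />
      </header>
    </file>'''

    root = ET.fromstring(schema)
    print("file type:", root.get("type"))        # Control
    for field in root.find("header"):
        # Each <field> names a value in the Header section and how to decode it.
        print(field.get("name"), "->", field.get("type"))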
Posted in schema, specification
The signature is read starting at the first byte of any backup database file. Here's the signature of a typical Control.4096.dat in hex:

08 00 00 00 08 00 00 00 04 00 00 00 00 02 00 00
00 04 00 00 01 00 00 00 01 00 00 00 00 10 00 00
00 10 00 00

These are little-endian encoded 4-byte integers. Let's look at them one at a time:

1. 08 00 00 00 - (8) Unknown. File signature?
2. 08 00 00 00 - (8) Unknown. File signature?
3. 04 00 00 00 - (4) File type. 4 = Control file.
4. 00 02 00 00 - (512) Offset into the Header section.
5. 00 04 00 00 - (1024) Offset into the Schema section.
6. 01 00 00 00 - (1) Unknown. Sometimes 2.
7. 01 00 00 00 - (1) File version.
8. 00 10 00 00 - (4096) Offset into the Alternate Records section.
9. 00 10 00 00 - (4096) Offset into the Records section.

The important ones here are 4, 5, 8 and 9. These let us parse the file further.
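For illustration, here's a minimal Python sketch that reads these nine integers with the struct module. The variable names are mine, chosen to match the list above; they are not official.

    import struct

    with open("Control.4096.dat", "rb") as f:
        raw = f.read(36)   # 9 little-endian 4-byte integers

    (unknown1, unknown2, file_type,
     header_offset, schema_offset, unknown3,
     file_version, alt_records_offset, records_offset) = struct.unpack("<9I", raw)

    print("file type:", file_type)           # 4 = Control file
    print("header offset:", header_offset)   # 512 for this file
    print("schema offset:", schema_offset)   # 1024 for this file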
Posted in signature, specification
Windows Home Server Backup Database - Reverse Engineered Specification
Updated: July 27 2008
Each backup DB file is divided into 4 parts: the Signature, the Header, the Schema (definition) and the Records.
Posted in specification index
I think it was when ColinWH replied to my request for help that I decided I was going to attempt a crude reverse engineering of the Windows Home Server backup database. I didn't want to know how the whole thing worked, just enough to get that darned Control.4096.dat reconstructed.
Unfortunately, Control.4096.dat was not an XML file at all. It was not a very big file, only 4KB, and it did contain something in the middle that looked like XML. I'll save you the trouble of reading through an excruciating description of how I deciphered the binary format that the Home Server uses. Surprisingly, it didn't take long. I had a good part of the basic infrastructure figured out the day after I posted the initial request for help. It was somewhat crude and incomplete, but it gave me enough information to start thinking about reconstructing the missing file.
In fact, the very next day I was able to load my so-called lost-cause backup database and pull files off it. It worked so well that I pulled off 2 entire partitions and have since imaged them to DVDs.
In the process of reverse engineering the DB format I decided to document everything, because it seemed to me that this knowledge could be valuable to others who might get into similar circumstances. I also had the thought of developers making applications that work directly with the WHS Backup DB, using my unofficial spec as the starting point.
One of the reasons that this blog exists is so that I can post my original reverse engineered spec. to a more permanent place. It was originally available in this forum post.
It's now been just over 2 weeks since my catastrophe. I've figured out most of the database format. I've made my first tool based on my spec. (this is coming up shortly). So what follows will be the original spec. edited and updated for correctness.
Posted in talk
So the neat thing about the Windows Home Server (knowing Microsoft, that link will probably be dead by the time you read this) is that it pools all your hard drives into one massive storage pool, is easily expandable, and provides flexible folder-level RAID 1-like data redundancy. It accomplishes this feat by automagically shuffling your files around between all the different drives that are part of the storage pool, as it sees fit.
One aspect of this balancing process is that it tries to keep all your data off the main system drive. The rationale for this is that if the main system drive fails you can do a non-destructive server recovery process where it re-installs a brand new copy of the OS to the system drive, and since your actual data files are elsewhere, it will create the proper tombstone links from the data portion of the system drive to your actual files that live elsewhere.
Good theory.
So, to continue my story from where we left off, I was about to enter this server recovery process. I had since pulled the dying system drive, checked it for bad sectors, and imaged what was left of it, just to have something to fall back on in case everything went horribly wrong from here on. By the way, it turns out it had bad sectors all over the beginning of the drive, so the OS was definitely on its last legs.
Now that I was able to get the server recovery wizard to actually find the server, I proceeded to click the server recovery option (versus Factory Reset, which implies data loss). The wizard let me know that it would try to recover all my files. Since I had verified earlier that all my files were actually on the non-failing drives, I had high hopes.
An hour or so later...
Server recovery was apparently successful. After going through the standard initial setup I had what appeared to be a functioning server. Even all my shares and data appeared to be there. After opening a few files and getting back what I expected, it looked like server recovery might have actually worked. Imagine that!
I proceeded to install the connector software on my main desktop. Right after that, I noticed something horribly wrong. The backup service was not running and claimed that my backup database was corrupt! How could it be corrupt? It was fine before the server restore. Obviously the server restore process is selective about the files it recovers, and in this case it chose not to restore some of my backup database.
After digging through the log files I noted that it was complaining about a missing file named Control.4096.dat, and that it was being referenced by Data.4096.NN.dat files.
Not good.
Data loss is never good, but this was a 260GB+ backup database holding data from some hard drives that are no longer in use. In effect, it had become my archive. I realize it was never meant to be used that way, but that's what happens naturally over time, and data loss should never be an expected and accepted outcome. We should always strive to prevent data loss.
Now that I realized some of my files were missing, I went on to load the image I had made earlier of the failing drive, and since it contained a list of all my files (in tombstone link form) I could compare the files in the current backup database against the original working backup database.
Incidentally, the backup database is located in D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4} on the Windows Home Server. I recommend you don't touch it unless you know what you are doing (or you're like me, with too much time on your hands).
After comparing the file lists, I quickly discovered that 2 files were missing, not 1:
Control.4096.dat
Data.4096.0.dat
Clearly, the backup database integrity check that the Windows Home Server was doing was not up to snuff. It didn't notice that a whole big ol' 4GB file was missing!
After pulling my data drives and running an undelete recovery tool, I found that the said files had been deleted and conveniently overwritten with other data, and were thus unrecoverable. The first file, Control.4096.dat, had this done to it twice, on two separate data drives. Nice.
So, me being me, unable to accept the inevitability of things, I proceeded to think about how I could recover this data. After going through all the sanctioned possibilities, of which there are none really, short of resetting your backup database, I came to the conclusion that I had to try something else.
By the way, can you believe people are actually accepting this as a solution? My backup database is corrupt; umm... erase the whole thing and start over. Can you imagine this in the real world? I have a leaky roof, what should I do? Umm... the house is built in such a way, sir, that the entire roof is an integral part of the sub-structure, and attempting repair on any part of it is impossible. Therefore, unfortunately, we will have to wreck your entire house and rebuild it from scratch. And we can't let you enter the house first to retrieve your belongings, because the whole thing is bound to collapse on you at any moment.
That's ludicrous!
So I started looking for alternative solutions. I read the WHS technical brief on the topic and came across this excellent effort by brubber on the official WHS forum (all links are subject to change at Microsoft's whim, so don't bet on them working for long). While this was all very useful, it didn't really give me any options I could use.
What I really wanted was to reconstruct that Control.4096.dat file. From looking at the tombstone image I could tell it was only 4KB in size. I had a good chance of getting at most of my data, albeit incomplete, if I could just get that Control file back. Since the backup engine never noticed the gigantic data file missing, I thought maybe it didn't really need it. It looked like the first data file of the whole backup set, and I didn't really care about the first backups I ever did; it's the stuff in the middle that I wanted to recover.
I couldn't believe that a 4KB file was preventing me from accessing 260GB!
Now I don't normally do this, but I think this warranted it: I posted on the official WHS forum for help. I wanted to avoid the standard responses of "did you read the FAQ?" and such, so I got right to the point. Among other things, I asked "Is it possible to get the format of this file or some sort of rebuilding process?".
Sure enough, the first response I got, from ColinWH, although not very encouraging, was very useful. I got a snippet from this supposedly all-important control file, and it looked like XML! I rejoiced; surely I could recreate this file.
2 weeks later...
No, not really :) But this was only the beginning.
Posted in server recovery, talk
Here we go, my first blog. So let me get right to the point. About 2 weeks ago I started losing the primary hard drive on my HP MediaSmart Windows Home Server. First I started getting worrying messages from the WHS console. I then proceeded to check the event log on the server itself and, sure enough, some really nasty event IDs showed up, which, after googling, proved to be sure signs of impending hard drive melting doom.
I researched my recovery options, and it seemed that the HP server had a built-in facility to do a complete non-destructive recovery from such a scenario. All I needed was a new hard drive, a paper clip and another computer with a CD drive. Sounded simple enough.
So I proceeded to pull the old system drive and replace it with a new one. I started the server recovery CD from another system on the network, followed the instructions given to me, got out that trusty paper clip and hit the hidden server recovery button on the front of the system at just the right time, and then... nothing! The server recovery wizard went through its finding-your-home-server bit for about 2 minutes and then proceeded to tell me that it couldn't find anything, suggesting that I turn off my firewall. Clearly, that was not the issue.
So, 6 hours later...
After going through multiple routers and multiple Ethernet cables, but still with the original paper clip, I came to the conclusion that there must be something seriously wrong with the server's BIOS, or whatever process it uses for server recovery. Connecting the server to a router and booting into this "safe recovery environment" didn't produce a single blip on the router's lights. I assumed the worst and was prepared to call HP for a replacement. It was really late at night, so I saved that task for the next day.
However, come the next day, relentless as I am, I continued my search for the possible cause of this recovery mode malfunction.
A few hours later...
I came across a post on some obscure forum, which I've since lost, from another person having a similar experience with server recovery. Someone had replied telling them to make sure that the server only sees one OS, since it has trouble booting on a multi-boot system, and to unplug any external USB hard drives not part of the storage pool.
Now, I don't have a multi-boot system, and why you would want to multi-boot when you're doing server recovery is completely beyond me. But I DO have an external USB hard drive that was not part of the storage pool. (How the server knows whether a hard drive is part of the storage pool at this early stage in booting is also completely beyond me.) Nevertheless, this was one thing I hadn't tried. So I proceeded to unplug my single USB hard drive and fire up the wizard again, got out that trusty paper clip one more time, and sure enough, after 2 minutes of searching, the wizard reported that it had found my home server. Hooray!
Lesson learned.
I wish they had mentioned somewhere that server recovery DOES NOT WORK WITH EXTERNAL USB HARD DRIVES PLUGGED IN.
But little did I know, my problems had only started. Because at this point, I hadn't actually lost any data yet.
Posted in server recovery, talk