The VHD format is becoming more popular thanks to its common use by Microsoft. It has been said that Windows 7 will have built-in support for VHD and will even allow booting from a VHD. As mentioned a few times before, the VHD specification is public, which means that essentially anyone is allowed to program to it.
The format is fairly easy to understand and the specification, though short, covers what needs to be said.
However, having read the specification, certain things seemed a bit unclear. The only way to get full clarity was to experiment with a real VHD and match it to the spec.
The first concept is that each VHD has a header and a footer, and the two happen to be identical for the sake of redundancy. Most likely the footer was defined first and was later mirrored at the front as well. This is good news, since it puts key information up front.
This post will focus on Dynamic VHD files. There are two other types (fixed and differencing) but dynamic is perhaps the most common. Fixed is fixed: once you allocate a size, you are stuck with it. It takes all the space specified without necessarily using any of it, which is good for guaranteeing the space will be there but makes it a bad citizen for disk space usage on the host. Differencing is more advanced: it is used for parent/child disk relationships to create what could be called a linked clone. The idea is that the differencing disk builds on its parent and does not need to duplicate the data the parent already has. Dynamic disks allocate space on the fly based on usage. There are rules about how big a dynamic disk can get and how its blocks are allocated, but it appears the same as a fixed disk to the guest.
The Dynamic VHD file has three main sections.
The first is what is called the Hard Disk Footer (confusing, huh?). The most valuable fields here are the cookie, data offset, original/current size, disk type, checksum, and unique ID. The cookie is just an 8-character string (not null terminated) and is always ‘conectix’. The name is a leftover from Connectix, the company from which Microsoft purchased its virtualization products. The data offset marks the next interesting section. The size fields describe how big the VHD can be (mine are the same even though the data is not that big yet). The disk type reveals what kind of drive we have: fixed (2), dynamic (3), or differencing (4). The checksum proves that the footer is not corrupt. The unique ID is important for identification and for matching differencing disks with their parents.
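Here is a minimal sketch, in Python, of pulling those fields out of a 512-byte footer. The field offsets follow the published spec; the function name and the dictionary shape are my own invention, not anything from VHDDUMP.

```python
import struct
import uuid

def parse_footer(buf):
    """Parse the key fields of a 512-byte VHD Hard Disk Footer.

    All multi-byte integers in the footer are big endian, hence
    the '>' in every struct format string.
    """
    if buf[0:8] != b"conectix":
        raise ValueError("not a VHD footer (bad cookie)")
    (data_offset,) = struct.unpack_from(">Q", buf, 16)    # next section
    (original_size,) = struct.unpack_from(">Q", buf, 40)
    (current_size,) = struct.unpack_from(">Q", buf, 48)
    (disk_type,) = struct.unpack_from(">I", buf, 60)      # 2=fixed, 3=dynamic, 4=differencing
    (checksum,) = struct.unpack_from(">I", buf, 64)
    unique_id = uuid.UUID(bytes=buf[68:84])
    return {
        "data_offset": data_offset,
        "original_size": original_size,
        "current_size": current_size,
        "disk_type": disk_type,
        "checksum": checksum,
        "unique_id": unique_id,
    }
```

For a dynamic disk, `data_offset` is what leads you to the Dynamic Disk Header described next.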
The second area, called the Dynamic Disk Header, exists only for dynamic and differencing disks. It has a cookie as well, which is always ‘cxsparse’. Its data offset currently points to nowhere (0xFFFFFFFF), but the table offset points to the Block Allocation Table (BAT). Also important for dynamic disks are the block size and the ‘max table entries’. The max table entries value is determined by the total size and the size of the blocks. The default block size is 0x200000 (2MB), which is the smallest allocation unit in a VHD. Compared against a typical cluster size of around 4K, this is much larger. For my 10GB VHD, there are 0x1388 (5000) entries of 2MB each.
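The relationship between total size and the entry count is just a rounded-up division. A quick sketch (my own helper, not part of any spec):

```python
def max_table_entries(disk_size, block_size=0x200000):
    """Number of BAT entries needed to cover disk_size bytes.

    Round up: a partial final block still needs its own entry.
    """
    return (disk_size + block_size - 1) // block_size
```

A disk of 5000 blocks times 2MB comes back as exactly 5000 entries, matching the BAT on my VHD.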
The third area is the Block Allocation Table. It is incredibly simple: just an array of 32-bit entries, with each entry corresponding to a sector offset within the file (unallocated blocks hold 0xFFFFFFFF). So, this means that I have 5000 DWORD values, each pointing to a block that is 2MB in size.
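Reading the BAT is a one-liner once you know the table offset and entry count from the Dynamic Disk Header. A sketch, again assuming big-endian DWORDs per the spec:

```python
import io
import struct

def read_bat(fp, table_offset, entries):
    """Read the Block Allocation Table: an array of big-endian
    32-bit sector offsets, one per block."""
    fp.seek(table_offset)
    raw = fp.read(4 * entries)
    return struct.unpack(">%dI" % entries, raw)
```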
There are a few things to warn about. First of all, all values are in big endian, so most likely you will need to flip them to little endian before processing anything. This just means that the VHD format stores values with the biggest part first, while the Intel architecture (and Windows) uses little endian. Perhaps this comes from some other platform (like the Mac?).
Also, never assume that the sector offsets are in a row. A dump of the offsets reveals that they can be scrambled up.
So, the way this works is that if the guest reads sector 0 of the drive, it actually gets translated to read the sector offset specified in the first entry of the BAT.
There is another twist to add to confuse you. Each block is actually slightly bigger than the block size. For example, a 2MB block has an extra sector (512 bytes) at the beginning that records which sectors have been written; this bitmap sector has one bit for each sector in the block. This means that the offsets in the BAT increment by not 0x1000 but actually 0x1001 sectors (for 2MB blocks).
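Putting the last two points together, the guest-sector-to-file-offset translation can be sketched like this. The bitmap accounting assumes 2MB blocks, where one 512-byte sector (4096 bits) exactly covers the block's 0x1000 sectors; larger blocks would need more bitmap sectors.

```python
SECTOR = 512
UNALLOCATED = 0xFFFFFFFF

def guest_sector_to_file_offset(bat, guest_sector, block_size=0x200000):
    """Translate a guest sector number into a byte offset in the VHD file.

    Each allocated 2MB block is preceded by a one-sector bitmap,
    so the data starts one sector past the BAT entry.
    """
    sectors_per_block = block_size // SECTOR   # 0x1000 for 2MB blocks
    bitmap_sectors = 1                         # 4096 bits cover 4096 sectors
    block = guest_sector // sectors_per_block
    within = guest_sector % sectors_per_block
    entry = bat[block]
    if entry == UNALLOCATED:
        return None   # block never written; reads come back as zeros
    return (entry + bitmap_sectors + within) * SECTOR
```

So a guest read of sector 0, with a first BAT entry of 0x2C, lands at file sector 0x2D, right past the bitmap.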
The actual drive data lives after the BAT area. In my case the first sector offset is 0x2C. This just means that the first 0x2C sectors (0x0 to 0x2B) were used for the Hard Disk Footer, Dynamic Disk Header, and the BAT.
After all the data, the “other footer” is at the end. It should reside in the last sector of the file. Obviously if you have code that changes one, it will need to change the other too. Also, don’t forget to fix the checksum.
The checksum algorithm is in the spec but it is not perfectly clear. The important missing bit is that the size of the summed elements is a byte. The checksum seems a bit weak since it is just the one's complement of the sum of all the bytes, which never gets past a low value. It also takes no account of position, so it would be possible to shuffle the bytes and still get the same checksum. Most likely it was defined in the early days, and once in place it was hard to change.
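For clarity, here is the whole algorithm as I understand it from the spec: sum every byte of the footer except the four bytes of the checksum field itself, then take the one's complement.

```python
def vhd_checksum(footer):
    """Checksum of a 512-byte VHD footer: the one's complement of
    the byte-wise sum, with the checksum field (offset 64) skipped."""
    total = 0
    for i, b in enumerate(footer):
        if 64 <= i < 68:      # don't include the stored checksum
            continue
        total += b
    return (~total) & 0xFFFFFFFF
```

Skipping the field rather than zeroing a copy of the buffer is just a convenience; the result is the same.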
The first sector of the virtual disk is actually the Master Boot Record (MBR). From here, you can discover the partitions in the virtual disk. Usually the first entry is the boot (primary) partition. Using the MBR structures, the starting sector can be determined; that starting sector is the partition's boot sector.
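A sketch of that last step, assuming a classic MBR layout: the partition table sits at offset 0x1BE, each entry is 16 bytes, and the starting LBA is a DWORD at offset 8 within the entry. Note the endianness flip: MBR values are little endian, unlike the VHD metadata.

```python
import struct

def first_partition_start_lba(mbr):
    """Return the starting LBA of the first MBR partition entry."""
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("bad MBR signature")
    entry = mbr[0x1BE:0x1BE + 16]
    (start_lba,) = struct.unpack_from("<I", entry, 8)  # little endian here
    return start_lba
```

That LBA is a guest sector number, so it still has to go back through the BAT translation to find the boot sector's actual position in the VHD file.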
The chain from the first sector (Footer) to the boot sector is consistent and can be determined by code based on the spec. Most people probably wouldn’t bother with this.
However, there is value. Beyond that, there is a gap between how the VHD file lays out its sectors and how the file system inside the VHD lays out its files. This gap has caused a few different problems. One issue is that it is impossible to know which files live in which parts of the VHD. Well, maybe not impossible, but extremely difficult.
One of the side benefits would be getting the VHD and NTFS code to work together: the allocation and free schemes need to be more tightly coupled. VHD gives us something new that makes it advantageous to have files live in 2MB blocks. In other words, since we do not have a real disk, we would prefer that files use up as few 2MB blocks as possible. It is similar to defrag, but the idea is more about clumping files together in allocation units larger than clusters. In fact, it is not as important to make the files contiguous as it is to make them use up the 2MB blocks well.
Last week I wrote a VHDDUMP program to show various parts of the VHD file. Unfortunately it is too big to show in a listing and I still have no place to upload tools to. This is said in the hope that someone knows of a public location that will allow tools to be uploaded without the intent of sale or wide distribution.