The VHD format is becoming more popular thanks to its common use by Microsoft. It has been said that Windows 7 will have built-in support for VHD and will even allow booting from a VHD. As has been said a few times, the VHD specification is public, which means that essentially anyone is allowed to program to it.
The format is fairly easy to understand and the specification, though short, covers what needs to be said.
However, having read the specification, certain things seemed a bit unclear. The only way to get full clarity was to experiment with a real VHD and match it to the spec.
The first concept is that each VHD has a header and a footer. Both happen to be identical for the sake of redundancy. Most likely the footer was defined first and was projected to the front as well. This is good news for getting key information up front.
This post will focus on Dynamic VHD files. There are two other types (fixed and differencing) but dynamic is perhaps the most common. Fixed is fixed. Once you allocate a size, you are stuck with it. It takes all the space specified without necessarily using any of it. It is good for guaranteeing the space will be there but a bad citizen for disk space usage on the host. Differencing is more advanced and essentially is used for parent/child disk relationships to create what could be called a linked clone. The idea is that the difference disk builds on its parent and does not require all the data the parent has. Dynamic disks are disks that allocate space on the fly based on usage. There are rules about how big it can get and how the blocks are allocated but it appears the same as a fixed disk to the guest.
The Dynamic VHD file has three main sections.
The first is what is called the Hard Disk Footer (confusing, huh?). The fields that are the most valuable here are the cookie, data offset, original/current size, disk type, checksum, and unique Id. The cookie is just an 8-character string (not null terminated) and is always ‘conectix’. The cookie dates back to the format's origins at Connectix, whose virtualization products Microsoft purchased. The data offset marks the next interesting section. The size fields describe how big the VHD can be (mine are the same even though the data is not that big yet). The disk type reveals what kind of drive we have (fixed (2), dynamic (3), differencing (4)). The checksum proves that the footer is not corrupt. The unique Id is important for identification and matching with parents.
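As a rough sketch of how these fields can be pulled out of the first (or last) sector, the following uses Python's struct module with the field order from the VHD specification. The names Footer, parse_footer, and read_footer are made up for this example, and the code assumes the sector is exactly 512 bytes.

```python
import struct
from collections import namedtuple

# Field layout of the 512-byte footer as listed in the VHD specification.
# ">" selects big-endian with no padding; "427x" skips the reserved tail.
FOOTER_STRUCT = struct.Struct(">8sIIQI4sI4sQQHBBII16sB427x")

# Hypothetical container type for this sketch.
Footer = namedtuple("Footer", [
    "cookie", "features", "version", "data_offset", "timestamp",
    "creator_app", "creator_version", "creator_host_os",
    "original_size", "current_size",
    "cylinders", "heads", "sectors_per_track",
    "disk_type", "checksum", "unique_id", "saved_state",
])

DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def parse_footer(sector):
    """Decode one 512-byte footer (or its copy at the start of the file)."""
    footer = Footer._make(FOOTER_STRUCT.unpack(sector))
    if footer.cookie != b"conectix":
        raise ValueError("not a VHD footer")
    return footer


def read_footer(path):
    """Read the footer from the last sector of a VHD file."""
    with open(path, "rb") as f:
        f.seek(-512, 2)  # 2 means "relative to the end of the file"
        return parse_footer(f.read(512))
```

For a dynamic disk, DISK_TYPES[footer.disk_type] should come back as "dynamic" and footer.data_offset should be the file offset of the Dynamic Disk Header.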
The second area, called the Dynamic Disk Header, exists only for dynamic and differencing disks. It has a cookie as well, which is always ‘cxsparse’. The data offset currently points nowhere (every byte of the 64-bit field is 0xFF) but the table offset points to the Block Allocation Table (BAT). Also of importance for dynamic disks are the block size and the ‘max table entries’. The max table entries value is determined by the total size and the size of the blocks. The default block size is 0x200000 (2MB). This is the smallest allocation unit in a VHD. Compared against a typical cluster size of around 4K, this is much larger. For my 10GB VHD, there are 0x1388 (5000) entries of 2MB each.
The third area is the Block Allocation Table. It is incredibly simple since it is just an array of 32-bit entries, with each entry holding a sector offset within the file. An entry of 0xFFFFFFFF means the block has not been allocated yet. So, this means that I have 5000 DWORD values which each point to a block, each being 2MB in size.
There are a few things to warn about. First of all, all values are in big endian so most likely you will need to flip the values to little endian to be able to process anything. This just means that the VHD format stores values with the most significant byte first. Intel architecture (and Windows) uses little endian. Perhaps this comes from some other platform (like the Mac?).
Also, never assume that the sector offsets are in a row. A dump of the offsets reveals that they can be scrambled up.
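A minimal sketch of reading this header, again with Python's struct module and the field order from the VHD specification. DynamicHeader, parse_dynamic_header, and expected_table_entries are names invented for the example; the parent locator entries and reserved areas are skipped.

```python
import struct
from collections import namedtuple

# 1024-byte Dynamic Disk Header, big-endian. "4x" skips a reserved field,
# "192x" skips the eight parent locator entries, "256x" the reserved tail.
DYN_HEADER_STRUCT = struct.Struct(">8sQQIIII16sI4x512s192x256x")

# Hypothetical container type for this sketch.
DynamicHeader = namedtuple("DynamicHeader", [
    "cookie", "data_offset", "table_offset", "header_version",
    "max_table_entries", "block_size", "checksum",
    "parent_unique_id", "parent_timestamp", "parent_name",
])


def parse_dynamic_header(data):
    """Decode the 1024 bytes found at the footer's data offset."""
    header = DynamicHeader._make(DYN_HEADER_STRUCT.unpack(data[:1024]))
    if header.cookie != b"cxsparse":
        raise ValueError("not a Dynamic Disk Header")
    return header


def expected_table_entries(disk_size, block_size):
    """Blocks needed to cover the disk, rounding a partial last block up."""
    return -(-disk_size // block_size)  # ceiling division
```

With a 2MB block size, a disk of 5000 * 0x200000 bytes gives expected_table_entries(...) == 5000, which matches the 0x1388 entries mentioned above.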
So, the way this works is that if the guest reads sector 0 of the drive, it actually gets translated to read the sector offset specified in the first entry of the BAT.
There is another twist to add to confuse you. Each block is actually slightly bigger depending on the block size. For example, a 2MB block has an extra sector (512 bytes) at the beginning to show which sectors have been written. The sector has a bit for each sector in the block. This means that the offsets in the BAT show an increment of not 0x1000 but actually 0x1001 (for 2MB blocks).
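Putting the BAT lookup and the bitmap sector together, the translation from a guest sector to a byte offset in the VHD file can be sketched as follows. The function name and parameters are made up for this example, and it assumes the BAT has already been read into a list of integers.

```python
SECTOR_SIZE = 512
UNALLOCATED = 0xFFFFFFFF  # BAT entry for a block that was never written


def locate_sector(bat, virtual_sector, block_size=0x200000, bitmap_sectors=1):
    """Return the file byte offset of a guest sector, or None if unallocated.

    bitmap_sectors is the size of the per-block bitmap in sectors
    (1 for the default 2MB block).
    """
    sectors_per_block = block_size // SECTOR_SIZE
    block_index, sector_in_block = divmod(virtual_sector, sectors_per_block)
    entry = bat[block_index]
    if entry == UNALLOCATED:
        return None  # reads of this sector see zeros on a dynamic disk
    # Skip the bitmap that sits in front of the block data.
    return (entry + bitmap_sectors + sector_in_block) * SECTOR_SIZE
```

With a first BAT entry of 0x2C, guest sector 0 lands at sector 0x2D of the file, one sector past the start of the block.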
The actual drive data lives after the BAT area. In my case the first sector offset is 0x2C. This just means that the first 0x2C sectors (0x0 to 0x2B) were used for the Hard Disk Footer, Dynamic Disk Header, and the BAT.
After all the data, the “other footer” is at the end. It should reside in the last sector of the file. Obviously if you have code that changes one, it will need to change the other too. Also, don’t forget to fix the checksum.
The checksum algorithm is in the spec but it is not perfectly clear. The important missing bit is that the size of elements is a byte. The checksum seems a bit weak since it is just an addition of all the bytes and this would never reach past a low sum. It also takes no account of position so it would be possible to mix the bytes but still get the same checksum. Most likely it was done in early days and once done it was hard to change.
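My reading of the algorithm, as a small sketch: add up every byte of the structure except the four checksum bytes, then take the ones' complement of the sum. The function name is made up; the checksum field offsets (64 in the footer, 36 in the Dynamic Disk Header) follow from the field order in the specification.

```python
def vhd_checksum(data, checksum_offset):
    """Ones' complement of the byte sum, leaving out the 4 checksum bytes.

    Use checksum_offset=64 for the 512-byte footer and
    checksum_offset=36 for the 1024-byte Dynamic Disk Header.
    """
    total = sum(data[:checksum_offset]) + sum(data[checksum_offset + 4:])
    return ~total & 0xFFFFFFFF
```

After changing any footer field, the new value would be written back big-endian at bytes 64 to 67 of both copies.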
The first sector of the virtual disk is actually the Master Boot Record (MBR). From here, you can discover the partitions in the virtual disk. Usually the first entry is the boot (primary) partition. Using the structures for the MBR, the starting sector can be determined. The starting sector is the boot sector from the partition.
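Here is a small sketch of decoding the partition table from that first virtual sector. It uses the classic MBR layout (four 16-byte entries at byte 446, signature 0x55 0xAA at byte 510). Note that the MBR fields are little endian, unlike the VHD structures around it. The function name and the dictionary keys are made up for this example.

```python
import struct


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR signature")
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i:446 + 16 * (i + 1)]
        # status, (CHS start skipped), type, (CHS end skipped), LBA, count
        status, ptype, start_lba, sector_count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:
            partitions.append({
                "bootable": status == 0x80,
                "type": ptype,
                "start_lba": start_lba,        # boot sector of the partition
                "sector_count": sector_count,
            })
    return partitions
```

The start_lba value is a sector number inside the virtual disk, so it still has to go through the BAT to find where it lives in the VHD file.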
The chain from the first sector (Footer) to the boot sector is consistent and can be determined by code based on the spec. Most people probably wouldn’t bother with this.
However, there is value. Not only that, there is a gap between the sectors in the VHD file versus the files that are contained in the VHD itself. This gap has caused a few different problems. One issue is that it is impossible to know which files live in what parts of the VHD. Well, maybe not impossible but extremely difficult.
One of the side benefits is getting the VHD and NTFS code to work together. The allocation and free schemes need to be more tightly coupled. We have something new with VHD which makes it advantageous to have the files live in 2MB blocks. In other words, since we do not have a real disk, we would prefer that the files use up as few 2MB blocks as possible. It is similar to defrag but instead it is more the idea of clumping together files in allocation units greater than clusters. In fact, it really is not as important to make sure the files are contiguous as it is to make sure they use up the 2MB blocks well.
Last week I wrote a VHDDUMP program to show various parts of the VHD file. Unfortunately it is too big to show in a listing and I still have no place to upload tools to. This is said in the hope that someone knows of a public location that will allow tools to be uploaded without the intent of sale or wide distribution.
Do you know of a way to change the offset of a vhd?
What do you mean by changing the offset? Do you mean changing the size?
There are many things that can potentially be changed within the VHD structure as long as some basic rules are followed.
For the block bitmap, does bit 0 mean the least significant bit or the most significant bit?
(I noticed that bit 0 in FSCTL_GET_VOLUME_BITMAP is the least significant bit. But for VHD, I cannot find any info in the specification.)
Thanks,
Dengfeng
Hi Dengfeng,
The bits are reversed from Windows standards. Bit 0 is presented as the most significant bit. The information came from closely understanding the specification and also from experience. I remember understanding this from the examples of how the differencing disk works.
Sector bit index
0 1 2 3 4 5 6 7
Matching bit in map
7 6 5 4 3 2 1 0
For example, index 0 corresponds to bit 7 (mask 0x80).
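In code, that mapping could look like the sketch below. The function name is made up, and the bitmap is assumed to be the raw bytes of the bitmap sector(s) in front of the block.

```python
def sector_is_present(bitmap, sector_in_block):
    """Test the bitmap bit for one sector; sector 0 is the MSB of byte 0."""
    byte_index, bit_index = divmod(sector_in_block, 8)
    return bool(bitmap[byte_index] & (0x80 >> bit_index))
```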
Cheers,
Jeff
Hi, thanks for the paper. It helped me a lot. Only one thing that I still don’t understand: the functionality of the block bitmap.
I have to make a VHD converter/generator program.
I have installed core-server on one fixed VHD. Then I converted the fixed image to a dynamic one. I have done all this with Hyper-V. Now, when I open both images with a hex editor, I see that all block bitmaps are the same: 32 dwords of 0FFFFFFFFh and the rest of the 512-byte bitmap filled with 0.
I don’t understand this. My block size is 8MB for all VHDs produced with Hyper-V, but the block bitmap entries remain 512 bytes. So, please clarify something for me: when I have to mark a sector (2048B? in my case the bitmap is 512, the block size is 8MB) as present/used, does the corresponding bit have to be set or cleared?
Thanks!
Sorry for letting this one slip through. I did not understand the question at first and then I just forgot to answer.
The quick answer is that Microsoft does not suggest using non-standard block sizes. The only two sizes recommended in the VHD specification are 256K and 2MB, with the 2MB size being preferred. The 256K size was used by earlier versions of Virtual PC.
The math of this makes the bitmap for a 2MB block fit into exactly 512 bytes. After having run this through tools many times, it typically means each block occupies 0x1001 sectors in the file: one bitmap sector followed by 0x1000 sectors of block data.
The problem with not using a standard is that the tools might not support other sizes. This looks like what happened to you. Converting from fixed to dynamic should have created a bitmap that lasted for 4 sectors (2048 bytes). The bitmap should be all ones given that the software does not know which part of the fixed disk was actually being used (unless it was all zeros for the sector or blocks). It sounds like the tool did not know to set the bits in the bitmap correctly.
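The bitmap size arithmetic can be written out as a short sketch: one bit per 512-byte sector in the block, rounded up to a whole sector. The function name is made up for this example.

```python
SECTOR_SIZE = 512


def bitmap_size_bytes(block_size):
    """Size of the per-block sector bitmap, padded to a sector boundary."""
    sectors_per_block = block_size // SECTOR_SIZE
    bitmap_bytes = -(-sectors_per_block // 8)            # one bit per sector
    return -(-bitmap_bytes // SECTOR_SIZE) * SECTOR_SIZE  # round up to 512


for size in (0x40000, 0x200000, 0x800000):  # 256K, 2MB, 8MB
    print(hex(size), bitmap_size_bytes(size))
# 0x40000 512
# 0x200000 512
# 0x800000 2048
```

By the same arithmetic, the 32 DWORDs of ones reported in the question cover 32 * 32 = 1024 sectors, which is what a 512K block would need. Whether that is what happened in the file in question cannot be confirmed from here.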
The bitmap does not mean much for a dynamic disk. It really just marks whether the sector has been written to or not. The difference disk is another story. The bits in the diff disk control where the data is read from.
Both the MBR and VHD Header seem to be in the first sector of VHD. Am I right or missing something?
This does not sound correct. The VHD header only exists at the beginning and the end of the VHD file. The MBR is at the first content sector inside the VHD file. The content sector can vary based on the block table. Typically it would reside after the VHD header information in the first 2MB block.
Great article. Made lot of concepts clear to me.
I have a doubt about how to find the MBR and the partition boundaries in a VHD file. Assume I create a hard disk (dynamic VHD) with 3 partitions:
D: – 10G
E: – 5G
F: – 5G
How do I identify the MBR of the disk, and where each partition starts and ends? How is this information demarcated? The reason I ask this question is that I want to convert the dynamic disk into a fixed image disk.
The MBR is located in the first sector of the virtual disk. This translates into getting the first block of the VHD from the dynamic block list and interpreting the first 512 bytes. Read the VHD spec and it should become more obvious how to figure out the first block. Within the VHD file the block could potentially be anywhere (in a 2MB chunk) but most likely will be in a low offset in the file. The MBR is well understood and is documented at Wikipedia.
http://en.wikipedia.org/wiki/Master_boot_record
It is difficult to map the bytes into useful data but eventually you will get what you want. The number of sectors in a partition along with the LBA offset are key to determine partition size and location.
Sorry for the delay getting back to you.
Helpful post.
Is there a way to get information about the OS in a VHD?
For example, OS name, OS version, kernel version and so forth.
Could I have a chance to use VHDDUMP? Of course I will use it for study.
It is not easy to determine OS information from a VHD. The main reason why is that you need to be able to understand the file system of the operating system. For Windows, you would need to have NTFS support code to unravel files to examine OS information. The same would be true of Linux except it would be ext3 or something similar. You might get some decent clues from the MBR but this would not be enough to get the kind of detail you want.
What you want is possible but also difficult.
Currently I have no way of sharing VHDDUMP and I also suspect that I would need permission given that it was work done related to XenClient.
Hi,
thanks for the great article. I am trying to convert a VHD file into VMDK. For this I am using the following algorithm.
1) Read a 4-byte BAT entry (sequentially) (if it is not 0xFFFFFFFF).
2) This value is in big-endian (as mentioned in the article), so convert it into little-endian. After conversion the value is the sector offset.
3) Multiply the sector offset by 512 to get the actual offset of the block.
4) Go to the block offset, skip the first 512 bytes (block bitmap sector), read the remaining block (in my case the block size is 2MB), and write it into the VMDK file.
Repeat all the steps until the BAT ends.
But unfortunately this is not working. I am getting some extra sectors in the VMDK file compared to the original VMDK. Are there any extra things that need to be done for this?
Thanks in advance
This could be just a rounding error. VHDs have 2MB chunks and the original disk might not be an exact multiple of 2MB. The trick here would be to determine the actual disk size before doing the loop and then stopping when the disk size has been reached (on exactly the correct ending sector).
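As a sketch of that idea, the loop below copies a dynamic VHD to a flat image and cuts the last block off at the disk size taken from the footer. It is an illustration only: the function name is made up, it works on already opened binary file objects, it ignores the sector bitmaps, and it does not handle differencing disks. Writing the VMDK container itself is outside the scope of this sketch.

```python
import struct

SECTOR = 512
UNALLOCATED = 0xFFFFFFFF


def dynamic_to_raw(vhd, raw):
    """Copy a dynamic VHD to a flat image, stopping at the real disk size."""
    vhd.seek(0)
    footer = vhd.read(SECTOR)                       # copy of the footer
    data_offset, = struct.unpack_from(">Q", footer, 16)
    disk_size, = struct.unpack_from(">Q", footer, 48)   # current size

    vhd.seek(data_offset)
    header = vhd.read(1024)                         # Dynamic Disk Header
    table_offset, = struct.unpack_from(">Q", header, 16)
    max_entries, block_size = struct.unpack_from(">II", header, 28)

    vhd.seek(table_offset)
    bat = struct.unpack(">%dI" % max_entries, vhd.read(4 * max_entries))

    # Bitmap in front of each block: one bit per sector, padded to a sector.
    bitmap_bytes = -(-(block_size // SECTOR) // 8)
    bitmap_bytes = -(-bitmap_bytes // SECTOR) * SECTOR

    remaining = disk_size
    for entry in bat:
        if remaining <= 0:
            break
        length = min(block_size, remaining)         # last block may be short
        if entry == UNALLOCATED:
            raw.write(bytes(length))                # never written: zeros
        else:
            vhd.seek(entry * SECTOR + bitmap_bytes)
            raw.write(vhd.read(length))
        remaining -= length
```

Note that unallocated entries are written out as zeros rather than skipped, so that later blocks keep their position on the output disk.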
hello, I have a query and I was wondering if you could please provide some pointers. If I understood the spec right, it seems that the offset entries in the BAT correspond to an equivalent 2 MB window of consecutive sectors on the actual disk. This means that consecutive BAT entries should increment in multiples of 512+2M. But in the BAT dump below, from a VHD that I created via disk management, I see that the jumps are like this:
00 00 00 04 00 00 90 0d 00 00 a0 0e 00 00 70 0b
00 00 b0 0f 00 00 c0 10 00 00 d0 11 00 00 e0 12
00 00 f0 13 00 01 00 14 00 01 10 15 00 00 50 09
00 01 20 16 00 01 30 17 00 00 20 06 00 00 30 07
00 00 40 08 00 01 40 18 00 01 50 19 00 01 60 1a
00 01 70 1b 00 01 80 1c 00 01 90 1d 00 01 a0 1e
00 01 b0 1f 00 01 c0 20 00 01 d0 21 00 01 e0 22
00 01 f0 23 00 02 00 24 00 02 10 25 00 02 20 26
00 02 30 27 00 02 40 28 00 02 50 29 00 02 60 2a
00 02 70 2b 00 02 80 2c 00 00 60 0a 00 00 80 0c
00 02 90 2d 00 02 a0 2e 00 02 b0 2f 00 02 c0 30
00 02 d0 31 00 02 e0 32 00 02 f0 33 00 03 00 34
00 00 10 05 ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff…….
the first offset of 4 is understood.. the next data block it says is at 900d meaning at 18MB from the start of the backing VHD file. This is strange… Then the third entry of A00E shows a jump of 1001, which is okay.. but then the 4th entry is less than the 3rd entry… This does not seem to add up from what I could make of the spec…
What you are seeing is perfectly normal. The offsets are to sectors within a certain range inside the VHD file. Each pointer points to a block that is by default 2MB + 512bytes which corresponds to 0x1001 sectors. The design of the dynamic VHD allows for the blocks to be non-contiguous. This means that the sequence of blocks can be out of order in the BAT. The code that supports mapping the virtual disk would need to use these offsets to make it look like a contiguous disk. This makes it similar to how file systems make a file look contiguous even though the file can be built from many different clusters (windows) on the physical disk.
VMDK takes it a step further and has another level of redirection. Breaking it down to another level gives it more granularity and more efficient use of disk space.
It might be helpful to see the pattern by dumping out the BAT and getting an idea of where all the data lives. You should find that all the blocks are used by the BAT for the range inside the VHD file. It would be easier to prove this with a smaller VHD.
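As a quick check of that idea, the sketch below decodes the BAT bytes quoted in the question as big-endian 32-bit values, drops the unused entries, and sorts what is left. The variable names are made up; the hex is copied from the dump above.

```python
import struct

BAT_HEX = """
00 00 00 04 00 00 90 0d 00 00 a0 0e 00 00 70 0b
00 00 b0 0f 00 00 c0 10 00 00 d0 11 00 00 e0 12
00 00 f0 13 00 01 00 14 00 01 10 15 00 00 50 09
00 01 20 16 00 01 30 17 00 00 20 06 00 00 30 07
00 00 40 08 00 01 40 18 00 01 50 19 00 01 60 1a
00 01 70 1b 00 01 80 1c 00 01 90 1d 00 01 a0 1e
00 01 b0 1f 00 01 c0 20 00 01 d0 21 00 01 e0 22
00 01 f0 23 00 02 00 24 00 02 10 25 00 02 20 26
00 02 30 27 00 02 40 28 00 02 50 29 00 02 60 2a
00 02 70 2b 00 02 80 2c 00 00 60 0a 00 00 80 0c
00 02 90 2d 00 02 a0 2e 00 02 b0 2f 00 02 c0 30
00 02 d0 31 00 02 e0 32 00 02 f0 33 00 03 00 34
00 00 10 05 ff ff ff ff ff ff ff ff ff ff ff ff
"""

raw = bytes.fromhex("".join(BAT_HEX.split()))
entries = struct.unpack(">%dI" % (len(raw) // 4), raw)

# Keep only allocated blocks and order them by position in the file.
allocated = sorted(e for e in entries if e != 0xFFFFFFFF)
steps = {b - a for a, b in zip(allocated, allocated[1:])}

print(len(allocated), hex(allocated[0]), hex(allocated[-1]))
print([hex(s) for s in steps])
# 49 0x4 0x30034
# ['0x1001']
```

Once sorted, the 49 allocated blocks sit back to back in the file, each 0x1001 sectors after the previous one. Only the order in which they appear in the BAT is scrambled.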
Hi
First off thanks for this post.. really helpful in understanding the VHD format. I had a question on differencing vhd… Hope you could clear my doubt. Using CreateVirtualDisk API I created a one step parent-child chain. Now if i see the contents of the child, I see that the BAT starts from the 4th sector of the file..I can account for the first 3 sectors but not the 4th sector whose contents start with the parent’s name and nothing else(first 5sectors dumped together and 5th one is the start of BAT)
Sector 0:
0x0000 : 63 6f 6e 65 63 74 69 78 – 00 00 00 02 00 01 00 00 conectix……..
0x0010 : 00 00 00 00 00 00 02 00 – 16 a8 8f ce 77 69 6e 20 ………¿Å╬win.
0x0020 : 00 06 00 01 57 69 32 6b – 00 00 00 01 3f ef fe 00 ….Wi2k….?∩■.
0x0030 : 00 00 00 01 3f ef fe 00 – 28 a0 10 3f 00 00 00 04 ….?∩■.(á.?….
0x0040 : ff ff e9 65 72 e1 78 92 – b5 34 f6 44 96 6d 13 dd ΘerßxÆ╡4÷Dûm.▌
0x0050 : d3 c2 7b 53 00 00 00 00 – 00 00 00 00 00 00 00 00 ╙┬{S…………
0x0060 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0070 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0080 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0090 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0100 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0110 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0120 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0130 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0140 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0150 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0160 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0170 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0180 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0190 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
Sector 1:
0x0000 : 63 78 73 70 61 72 73 65 – ff ff ff ff ff ff ff ff cxsparse
0x0010 : 00 00 00 00 00 00 08 00 – 00 01 00 00 00 00 0a 00 …………….
0x0020 : 00 20 00 00 ff ff de 28 – 5f 23 91 a6 d7 62 5a 45 …. ▐(_#æª╫bZE
0x0030 : a8 8d a1 96 4a b4 93 d1 – 16 a8 8f 59 00 00 00 00 ¿ìíûJ┤ô╤.¿ÅY….
0x0040 : 00 43 00 3a 00 5c 00 63 – 00 6f 00 64 00 65 00 5c .C.:.\.c.o.d.e.\
0x0050 : 00 74 00 65 00 73 00 74 – 00 5c 00 70 00 61 00 72 .t.e.s.t.\.p.a.r
0x0060 : 00 65 00 6e 00 74 00 2e – 00 76 00 68 00 64 00 00 .e.n.t…v.h.d..
0x0070 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0080 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0090 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0100 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0110 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0120 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0130 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0140 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0150 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0160 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0170 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0180 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0190 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
Sector 2:
0x0000 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0010 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0020 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0030 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0040 : 57 32 6b 75 00 00 02 00 – 00 00 00 2e 00 00 00 00 W2ku…………
0x0050 : 00 00 00 00 00 00 06 00 – 57 32 72 75 00 01 00 00 ……..W2ru….
0x0060 : 00 00 00 18 00 00 00 00 – 00 00 00 00 00 00 30 00 …………..0.
0x0070 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0080 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0090 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0100 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0110 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0120 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0130 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0140 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0150 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0160 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0170 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0180 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0190 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
Sector 3:
0x0000 : 43 00 3a 00 5c 00 63 00 – 6f 00 64 00 65 00 5c 00 C.:.\.c.o.d.e.\.
0x0010 : 74 00 65 00 73 00 74 00 – 5c 00 70 00 61 00 72 00 t.e.s.t.\.p.a.r.
0x0020 : 65 00 6e 00 74 00 2e 00 – 76 00 68 00 64 00 00 00 e.n.t…v.h.d…
0x0030 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0040 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0050 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0060 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0070 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0080 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0090 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x00f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0100 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0110 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0120 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0130 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0140 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0150 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0160 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0170 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0180 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x0190 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01a0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01b0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01c0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01d0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01e0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
0x01f0 : 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00 00 00 …………….
Sector 4:
0x0000 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0010 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0020 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0030 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0040 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0050 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0060 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0070 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0080 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0090 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00a0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00b0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00c0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00d0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00e0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x00f0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0100 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0110 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0120 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0130 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0140 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0150 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0160 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0170 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0180 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x0190 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01a0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01b0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01c0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01d0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01e0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff
0x01f0 : ff ff ff ff ff ff ff ff – ff ff ff ff ff ff ff ff …………….
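One possible reading of the sector that holds only the parent's name, sketched under the assumption that the parent locator table follows the layout in the VHD specification: eight 24-byte entries starting at byte 576 of the Dynamic Disk Header, each with a platform code, data space, data length, a reserved field, and a 64-bit file offset. The Dynamic Disk Header starts at file offset 0x200, so byte 576 lands at offset 0x40 of sector 2 in the dump. The function name is made up for this example.

```python
import struct

# One parent locator entry: platform code, data space, data length,
# reserved, and the absolute file offset of the locator data (big-endian).
LOCATOR = struct.Struct(">4sIIIQ")


def parse_parent_locators(dynamic_header):
    """Return (platform code, data length, file offset) for used entries."""
    locators = []
    for i in range(8):
        code, _space, length, _reserved, offset = LOCATOR.unpack_from(
            dynamic_header, 576 + 24 * i)
        if code != b"\0\0\0\0":
            locators.append((code.decode("ascii"), length, offset))
    return locators


# The 48 non-zero bytes from sector 2 of the dump, placed at byte 576.
header = bytearray(1024)
header[576:576 + 48] = bytes.fromhex(
    "57 32 6b 75 00 00 02 00 00 00 00 2e 00 00 00 00 "
    "00 00 00 00 00 00 06 00 57 32 72 75 00 01 00 00 "
    "00 00 00 18 00 00 00 00 00 00 00 00 00 00 30 00"
)
for code, length, offset in parse_parent_locators(bytes(header)):
    print(code, length, hex(offset), "sector", offset // 512)
# W2ku 46 0x600 sector 3
# W2ru 24 0x3000 sector 24
```

Read this way, the W2ku entry (an absolute Windows path in UTF-16) points at file offset 0x600, which is sector 3, and its length of 46 bytes matches the 23 characters of C:\code\test\parent.vhd stored there. That would account for the sector between the Dynamic Disk Header and the BAT.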
Thanks for the article. I recently started digging into the VHD file format and started with MS’s spec doc. Your article clarifies lots of things and was very helpful for dynamic VHD. But it looks like I’m still missing something. I wrote a test program to read a 64 GB dynamic VHD file. For some reason I’m not able to match the size of the file with the one I calculated in my program.
This is the formula I use in my program:
TotalFileSize = 512 (header)
 + 1024 (dynamic header)
 + (32768 * 4) // size of BAT
 + (valid entries in BAT * (2097152 + 512)) // size of blocks
 + 512 (footer)
32768 – #s of 2MB blocks needed for 64 GB data. I get this value correctly.
2 MB = 2097152
Based on this formula I can verify the VHD file size if the VHD disk is just initialized in disk manager. No partition yet. Just MBR. At this point, #s of valid entries in BAT is 1 (one).
Size of VHD file = 2230784
TotalFileSize = 2230784 // #s of 2 MB blocks = 1
Then I formatted the entire disk (64 GB) to NTFS volume.
Size of VHD file = 84868608
TotalFileSize = 81942016 // #s of 2 MB blocks = 39
I can’t understand what I’m doing wrong here. Why the file size that I calculate doesn’t match with actual size of the file.
I’m using 2008 R2 Hyper-V and my VM is 2008 R1. Any idea what I’m missing here.
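The formula from this comment can be written out as a small function to make the numbers easy to reproduce. This is only a restatement of the commenter's arithmetic; the function name is made up, and it assumes one bitmap sector per block and a BAT that needs no padding.

```python
def expected_file_size(bat_entries, allocated_blocks, block_size=2097152):
    """Size predicted by the formula above for a dynamic VHD, in bytes."""
    header_copy = 512                    # copy of the footer at offset 0
    dynamic_header = 1024
    bat = bat_entries * 4                # one 32-bit entry per block
    blocks = allocated_blocks * (block_size + 512)   # bitmap sector + data
    footer = 512
    return header_copy + dynamic_header + bat + blocks + footer


print(expected_file_size(32768, 1))                # 2230784
print(expected_file_size(32768, 39))               # 81942016
print(84868608 - expected_file_size(32768, 39))    # 2926592
```

The first case matches the freshly initialized disk exactly. After the format, the real file is 2926592 bytes larger than the formula predicts, and that gap is not a whole number of 2MB blocks.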
The allocation of blocks in a dynamic VHD is done with a sparse method. In other words, it will only write blocks that are accessed with a write operation. What this really means is that creating and formatting the drive will only touch certain blocks. Most formats these days only focus on what they need to change instead of writing the entire disk.
So, you have not gone crazy. The assumption that format touches each block is the problem. You could prove this by dumping out the BAT. Quite a few of the entries should be empty.
Thanks for the reply. But if the sparse method is used, the actual file size should be smaller than the total I’m getting. Right?
> You could prove this by dumping out the BAT
I didn’t understand this. I printed all the valid entries (not -1) in the BAT and there are 39 of them. BAT[0]: 259 and the one pointing to the last block is BAT[11]: 155945.
The simple answer is that a file system does take up space for a formatted drive. Even an empty drive really is not empty. Also, the rule is that anything that is written is saved. Even if the file system only uses the block temporarily and writes it, the VHD assumes that it needs to keep it. In other words, if you touch it, then it will be saved.
Hi,
I want to ask, how can I find the exact offset of the data of a particular file? Using the formulas below I was not able to see the actual file content (not able to get the actual file data offset).
BlockNumber = floor(RawSectorNumber / SectorsPerBlock)
SectorInBlock = RawSectorNumber % SectorsPerBlock
i refer values for
RawSectorNumber as actual file data offset (e.g 0x2FAD68)
SectorsPerBlock as Block Size in bytes so convert it in sector i.e 0x400
then i get values
BlockNumber = 0xBEB
SectorInBlock = 0x168
next is
ActualSectorLocation = BAT [BlockNumber] + BlockBitmapSectorCount + SectorInBlock
BAT[BlockNumber] => BAT[ 0x2FAC because (0xBEB * 4)] = 0x31FD1F
BlockBitmapSectorCount (didn’t get exactly ) but consider as 1
SectorInBlock = 0x168
so, final result for
ActualSectorLocation = 0x31FE88(in sector) * 0x200 = 0x63FD1000(in bytes)
but i found actual data offset at 0x63FD8E00 why is it so?
and also what is actual relevance for BlockBitmapSectorCount
Thanks,
Gaurav
Hi Gaurav,
I have read this a few times and there is one thing that sticks out.
The number of sectors in a block is 0x1000. There is an extra sector to track sectors used (in diff disks). This means that for every block there are 0x1001 sectors used. The core assumption is that the standard 2MB block is being used.
For your calculations, the magic SectorsPerBlock should be 0x1000.
I have not looked at this for some time so there might be some mistakes.
If the RawSectorNumber is 0x2FAD68, then the index into the BAT should be 0x2FA and the sector offset is 0xD68.
Since the BAT is a table, the values will vary greatly based on the index selected.
There is not quite enough information here to figure everything out.
The most common mistake is not to take into account the sector used to track the block sectors. It is the first sector of the block.
Regards,
Jeff
Hi Jeff,
I need to ask about the SectorsPerBlock value. The question is: how can we determine SectorsPerBlock?
In the dynamic image file (.vhd) on my machine the BlockSize (in bytes) is 0x80000, as described in the specification, so in this scenario I used 0x400 for SectorsPerBlock. Is that correct?
According to you, the number of sectors in a block is 0x1000.
I didn’t get it exactly. How can I verify this value?
Thank you,
Gaurav
Hi Gaurav,
Yes, you are correct. Based on past experience, it is very rare to see a VHD with a block size that is not 2MB. Your block size is 512K which means there are 1024 sectors in one block. Based on the rules of how VHDs work, it would still require a sector at the beginning of each block for tracking differences between parent and child. This means that the number of sectors is actually 0x401 (1025) per block when you take into account the sector tracking. However, when it comes to your offset calculations, the assumption of 0x400 is correct to start from.
It seemed like a good idea to look at the problem from a different angle. Based on expected versus calculated, the difference is 0x3f sectors. This is the typical value associated with a “head” on the disk. The first thing that popped in my mind is the way the MBR is constructed and the starting sector of the volume can vary based on the settings in the MBR. If you have the incorrect file offset, it obviously would cause the mismatch.
How do you determine the file (in the volume) offset? Which file system is it?
Thanks,
Jeff
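The numbers from this thread can be replayed in a few lines. The values are the ones Gaurav posted; the idea that the volume starts 0x3F sectors into the virtual disk is the guess from the reply above, not something confirmed by the data itself.

```python
SECTOR_SIZE = 512

sectors_per_block = 0x80000 // SECTOR_SIZE       # 512K block -> 0x400
raw_sector = 0x2FAD68                            # sector number used in the question
bat_entry = 0x31FD1F                             # value read from the BAT
observed = 0x63FD8E00                            # where the data really was

block_number, sector_in_block = divmod(raw_sector, sectors_per_block)
calculated = (bat_entry + 1 + sector_in_block) * SECTOR_SIZE
gap_sectors = (observed - calculated) // SECTOR_SIZE
print(hex(block_number), hex(sector_in_block), hex(calculated), hex(gap_sectors))
# 0xbeb 0x168 0x63fd1000 0x3f

# Same calculation with an assumed volume start of 0x3F sectors added first.
block_number2, sector_in_block2 = divmod(raw_sector + 0x3F, sectors_per_block)
corrected = (bat_entry + 1 + sector_in_block2) * SECTOR_SIZE
print(hex(block_number2), hex(corrected))
# 0xbeb 0x63fd8e00
```

With the 0x3F sectors added to the sector number before the division, the result lands on the offset where the data was actually found.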
Hey Jeff,
Maktub!!!, Its NTFS File System, I found the actual data offset of file, Its really amazing man and very helpful with communicating with you.
very much things are clear.
Thank you very very much !!!
Gaurav
Hi jeff,
I face an issue with a dynamic VHD file that contains 4 NTFS partitions: 3 are primary and 1 is a logical partition inside an extended partition. But I see only 3 partition entries in the MBR (the 4th entry is empty and contains only zeros).
how can i find logical NTFS partition?
I created extended partition using disk management utility in windows 7 OS.
Regards,
Gaurav
The way I would suggest attacking this is better understanding how MBR and EBR work. These are best explained by Wikipedia.
http://en.wikipedia.org/wiki/Master_boot_record
http://en.wikipedia.org/wiki/Extended_boot_record
For a good comprehensive description, please read:
http://www.ntfs.com/logdrives.htm
I would guess that the logical drive is hiding in one of the three entries that you have.
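For illustration, a sketch of walking the EBR chain once the extended partition entry (type 0x05 or 0x0F, or 0x85 on Linux) has been found in the MBR. The function names are made up, and read_sector is a hypothetical callback that returns 512 bytes of the virtual disk for a given LBA, for example by going through the BAT.

```python
import struct

EXTENDED_TYPES = {0x05, 0x0F, 0x85}


def read_entries(sector):
    """Decode (type, start, count) for the four entries of an MBR or EBR."""
    return [struct.unpack_from("<4xB3xII", sector, 446 + 16 * i)
            for i in range(4)]


def logical_partitions(read_sector, extended_start):
    """Walk the EBR chain and return (type, absolute start LBA, count)."""
    found = []
    seen = set()                      # guard against a looping chain
    ebr_lba = extended_start
    while ebr_lba not in seen:
        seen.add(ebr_lba)
        first, second = read_entries(read_sector(ebr_lba))[:2]
        if first[0] != 0:
            # The start is relative to the EBR that describes the partition.
            found.append((first[0], ebr_lba + first[1], first[2]))
        if second[0] not in EXTENDED_TYPES:
            break
        # The link to the next EBR is relative to the extended partition.
        ebr_lba = extended_start + second[1]
    return found
```

The first entry of each EBR describes one logical drive and the second entry, if present, links to the next EBR, so a single extended entry in the MBR can hide any number of logical drives.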
Hi Jeff,
Great article! But I am missing some information about the use of dynamic VHDs and how they align with the “host volume”. From what I can read and understand from the VHD white paper and your article, the dynamic VHD layout on disk is like this:
Footer Header
Drive Header
BAT
512 bytes Sector bitmap
2MB Data
512 bytes Sector bitmap
2MB Data
No matter how I calculate this I can never get the dynamic VHD to align perfect with the host volume, because of the bitmap sector! Do you agree or am I missing something?