VHD Difference Disk

What is the difference?  With VHDs, it is a classification of virtual drive.  The other two types have already been covered and now it is time to briefly cover what makes the difference VHD disk interesting.

Why does it exist?  Perhaps the most obvious answer is that it could save space.  The difference disk is actually linked to a parent disk.  The parent disk represents a read-only copy of a VM.  Technically it does not need to be a full copy but let’s assume that it is to make it easy.  Once the parent and child (difference disk) are bound, the parent VHD is no longer allowed to change.  The child VHD has pointers to the parent VHD by using various name markers (relative/absolute, UNICODE/UTF-8).  The link is not guaranteed and obviously it is possible for an admin/user to break the connection.  It would be an easy mistake to make.  The difference file could be moved to another system which has no access to the parent file.

What lives in the child disk?  Only the written changes are kept in the virtual disk.  Also, the changes are marked on a sector bitmap which shows which sectors are coming from the child and which ones are coming from the parent.  From the operating system point of view, this is transparent.  However, the VM player is responsible for splitting up the requests between the two virtual disks (parent and child).  This also means that the two disks need to be opened when the VM is running.

The sector bitmap is actually at the front of each block in the VHD.  In a dynamic disk, the sector bitmap shows which sectors have been written.  For a difference disk, it shows which sectors belong to the child and which still come from the parent.
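
As a minimal sketch of how a reader might route a request using that bitmap: the helper names below are hypothetical, the parent and child are modelled as plain in-memory lists of sectors rather than real VHD files, and the most-significant-bit-first ordering inside each bitmap byte is an assumption about the on-disk layout.

```python
SECTOR_SIZE = 512

def bit_is_set(bitmap: bytes, sector_in_block: int) -> bool:
    """True if the bitmap marks this sector as present in the child.

    Assumption: bit 7 (0x80) of each byte is the lowest-numbered sector.
    """
    byte = bitmap[sector_in_block // 8]
    mask = 0x80 >> (sector_in_block % 8)
    return bool(byte & mask)

def read_sector(bitmap, child_sectors, parent_sectors, sector_in_block):
    """Return the sector from the child if it owns it, else from the parent."""
    if bit_is_set(bitmap, sector_in_block):
        return child_sectors[sector_in_block]
    return parent_sectors[sector_in_block]

# Toy 16-sector block: the parent is all "P", the child has only sector 3 written.
parent = [b"P" * SECTOR_SIZE for _ in range(16)]
child = [None] * 16
child[3] = b"C" * SECTOR_SIZE
bitmap = bytes([0b00010000, 0b00000000])  # only sector 3 is marked

from_child = read_sector(bitmap, child, parent, 3)
from_parent = read_sector(bitmap, child, parent, 4)
```

Both disks have to be available for this to work, which is why the parent must stay open while the VM is running.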

I have been playing with difference disks over the last couple of weeks and now understand the nature of how this fits together.  One key point is that this is happening at the sector level which would make it very hard to figure out which files had changed.

An annoying aspect of the sector bitmap is that it does not align with clusters.  Because the first volume sector happens at 0x3F, the first eight-sector cluster runs from 0x3F to 0x46.  So, cluster zero maps to bits in two different bytes of the sector bitmap (one bit in byte 7 and seven bits in byte 8) instead of exactly one.  Life would have been a bit easier if the clusters aligned with the sector bitmap bytes.  Never mind, this is really only annoying for people trying to map volume clusters to low-level sectors.
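
The arithmetic can be checked with a short sketch. The function name is hypothetical, and the defaults (eight sectors per cluster, a volume starting at sector 0x3F, 0x1000 sectors per block) are assumptions taken from the typical values mentioned in this post.

```python
def bitmap_bytes_for_cluster(cluster, first_sector=0x3F,
                             sectors_per_cluster=8, sectors_per_block=0x1000):
    """List the (block index, bitmap byte index) pairs a cluster touches."""
    start = first_sector + cluster * sectors_per_cluster
    touched = []
    for sector in range(start, start + sectors_per_cluster):
        # Each block carries its own bitmap, so split the disk sector number
        # into a block index and a sector index within that block.
        block, in_block = divmod(sector, sectors_per_block)
        entry = (block, in_block // 8)
        if entry not in touched:
            touched.append(entry)
    return touched

misaligned = bitmap_bytes_for_cluster(0)                   # volume at 0x3F
aligned = bitmap_bytes_for_cluster(0, first_sector=0x40)   # hypothetical aligned volume
straddling = bitmap_bytes_for_cluster(504)                 # crosses a block boundary
```

With the 0x3F start, cluster zero straddles a byte boundary, and cluster 504 even straddles two blocks. A volume starting on a multiple of eight sectors would keep every cluster inside a single bitmap byte.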

It is worth noting that the difference VHD disk has no intelligence about what is being written.  In other words, data that is written but happens to be the same as before will most likely still take up space in the difference disk.  Also of interest is that all VHD disks have no sense of what has been freed.  This means that even if written data is freed by the file system, it will still be retained in the VHD.  And finally, all data is treated equally, so even if the data is not worth keeping (temporary content) the VHD will do its best to hold onto it blindly.  It appears that the pagefile falls into this category.

The greatest value of the difference disk would come from a template model.  An admin could create a dynamic VHD disk for the work environment and then use the difference disk to create user copies.  The benefit would be space savings and potentially faster transfer for remote use (assuming the template is already there).  The missing piece is being able to update the template and have it take effect on the user difference disks.  By the current definitions/standards, this will not work.  The simple reason is that it would be nearly impossible to merge the two together when blocks have changed on both the child and the parent.  Since the VHD format has no knowledge of files and directories, it has no way of knowing what to merge.

The difference disk seems similar to linked clone technology.  However, linked clone uses versioning which allows for the parent to move forward.  Unfortunately, even linked clones have no knowledge of how to merge with an updated parent.

About

Live near Brisbane, Australia. Software developer currently focused on iOS and Android. Avid Google Local Guide

Posted in VHD
11 comments on “VHD Difference Disk”
  1. Matt Moritz says:

    From your investigation of the sector mapping/BAT of the differencing VHD, is it possible that a tool could be written to unravel the file format and read the remains of the filesystem?

    I’m thinking of cases where a parent VHD gets modified and breaks the differencing disk’s ability to be used by the virtualization software. Yes, it shouldn’t happen, but sometimes mistakes happen…

  2. Matt Moritz says:

    It would appear that the folks at winimage have already done the legwork on this one.

  3. Gaurav says:

    Hello jeff,

    With a differencing VHD image, I’m getting data for a single partition, but I am facing one scenario: when the parent VHD has multiple partitions, how can I identify which partition in the parent VHD the files in the differencing image (child VHD) belong to?

    And is it possible for a whole partition to be stored in the child VHD?

    • jeffreymuir says:

      Hi Gaurav,
      The simple answer is the disk offset in the virtual disk. VHD format does not care about NTFS or partitions. It only mirrors a physical disk in nature. This means that partitions belong on certain ranges of the virtual disk. If the file is within a certain location on the virtual disk, it can be associated with that particular partition. This argument sounds a bit circular but it does happen to be true.
      If you have a file record, it should be able to back track to the beginning of the partition. This offset should be another clue to which partition you are dealing with.
      Perhaps I have misunderstood your question.
      Regards,
      Jeff

  4. Rushikesh says:

    Hello Jeff,

    Thank you for wonderful information and as far as I understood, VHD represents a physical disk and its basic fundamental is a 512byte sector which is represented by BITMAP. BAT is used to reference BITMAP.

    I’m searching for, or trying to write, a program which can compare the block-level differences between two VHDs. Let’s say I have VHD1 and I make a copy of it with a simple copy command ( #cp x.vhd y.vhd ) and then I make some changes in x.vhd from the VM. Now what can be done to find out the block-level differences between x.vhd and y.vhd?

    Thank you for your time.

    • jeffreymuir says:

      It is possible to do what you want if you use the VHD spec to guide you. The BAT points to different blocks inside the VHD file. Essentially it has an array of sector offsets (file offset divided by 512 bytes). Each block is typically 2MB in size which is 0x1000 sectors. The configuration of the VHD is specified in the VHD header information. Each block has a prefix sector which is the BITMAP of which sectors are used in the block. The position in the array of blocks determines the offset into the virtual disk.

      So, if you walked the array between two different VHDs, you could compare the BITMAP and following sectors to find differences. It just comes down to using the BAT array values to calculate the offsets into the file. Once you have the file offsets, it should be fairly easy to read and compare.
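
A rough sketch of that offset calculation, not taken from any real tool: the function names are hypothetical, and it assumes BAT entries are 32-bit big-endian sector offsets with 0xFFFFFFFF marking an unallocated block, a 2MB block size, and a bitmap padded to a whole sector. A real program would read the block size and BAT location from the VHD header instead of hard-coding them.

```python
import struct

SECTOR_SIZE = 512
UNUSED = 0xFFFFFFFF  # assumed marker for a block with no data in this file

def parse_bat(raw: bytes, entries: int):
    """Decode the first `entries` BAT values (assumed 32-bit big-endian)."""
    return list(struct.unpack(f">{entries}I", raw[:entries * 4]))

def block_layout(bat, block_size=2 * 1024 * 1024):
    """Yield (block index, virtual disk offset, bitmap file offset,
    data file offset) for every allocated block."""
    sectors_per_block = block_size // SECTOR_SIZE
    bitmap_bytes = sectors_per_block // 8
    # Round the bitmap up to a whole number of sectors.
    bitmap_bytes = -(-bitmap_bytes // SECTOR_SIZE) * SECTOR_SIZE
    for index, entry in enumerate(bat):
        if entry == UNUSED:
            continue
        bitmap_offset = entry * SECTOR_SIZE
        yield index, index * block_size, bitmap_offset, bitmap_offset + bitmap_bytes

# Toy BAT with three entries; the middle block is unallocated.
raw_bat = struct.pack(">3I", 3, UNUSED, 4100)
layout = list(block_layout(parse_bat(raw_bat, 3)))
```

Running the same walk over two VHDs and reading the bitmap and data at the resulting offsets gives the pairs of byte ranges to compare.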

      • Rishi says:

        Thanks Jeff for explanation.

        I do have the VHD spec guide with me and have done the initial work of comparing the BAT and then BIT MAP. From BAT I only get if a block is used or not. I actually need to find blocks that are modified. I think VHD spec has no provision for it as of now. I’m using blktap to modify the source and trying to get it work.

      • jeffreymuir says:

        The value you get from the BAT reveals the offset in the VHD for where the block data lives. You will need to compare the data for the blocks with your program since the VHD spec has no concept of tracking changes. There is one trick you could do where you could see bits in the BITMAP changing which would indicate differences if only one bit is set for the two different VHDs. The BITMAP indicates if the sector is being used.

        But, to get what you really want, you will need to read blocks and compare.

      • Rishi says:

        Right. The BITMAP can give a hint that a sector has changed, but it’s not reliable. I’m trying to inject an md5/sha1 header in each data block, but it seems to be a lot of work as it will need to track every block write operation. As far as I understand, the sector number inside the VM denotes the offset in the VHD, and the actual location is calculated as header + (offset x sectorsize)?

      • jeffreymuir says:

        I like the idea of having an MD5/SHA1 hash to track the differences but it might not make sense to embed the hash in the block. It implies that the block is fully under your control whereas most virtual disks would not allow this. It also would not be efficient to track every write. Block compares usually only happen occasionally, when some checking tool is run. At one point I was using hashes to track differences between files and this worked fairly well. I checked the code I have for figuring out the offset in the VHD. It appears that the sector offset in the BAT does not need to be adjusted except for multiplying by the sector size.

        Also keep in mind that Microsoft has been working on a new VHDX format for Windows 8.

  5. Rushikesh says:

    Thanks for the details Jeff, I got little away from this interesting VHD stuff but I’ll try to work more on it for some conclusion.
