This topic presents an interesting problem. A disk is made up of sectors, which the file system groups into clusters. Both NTFS and FAT use a cluster model to clump sectors together into bigger chunks. The cluster model has been around since the original DOS and still runs strong today. The boot sector of the volume records how many sectors of a certain size belong to one cluster. On my Vista system the clusters are 4K (8 sectors of 512 bytes each). This can vary for USB flash drives and smaller hard drives; my flash drive reports a cluster size of 32K (64 sectors/cluster). All of this is fine, but then the question becomes: why should I care?
The answer becomes more relevant when virtualization comes into the picture. For a VM, the disk is virtual and is actually a file within another file system (most of the time). Microsoft and Citrix use the VHD format for the VM disk files. The VHD specification is public knowledge, since Microsoft documented it a couple of years ago. Given a VHD file, everything needed by the operating system is there. However, it becomes very difficult to manage this information from the outside. Yes, there are ways to mount VHD drives within a native operating system, but this process is not necessarily easy to automate. Well, at least not for everyone.
Then a new factor enters the equation. Since outside tools cannot see inside the VHD to understand what Windows is actually using, it becomes very difficult to do any kind of analysis or consolidation. Microsoft does have a solution for compacting a VHD with Virtual PC 2007. Unfortunately, there are many steps, and it involves executing code both inside and outside the VM. Wouldn't it be nice if this could be managed completely from the outside? Wouldn't it be nice if every cluster (block) were paired with a file?
This sounds difficult, and the overall problem is very tough. The benefits, however, would be huge. Basically, any file operation performed on the inside could potentially be performed on the outside. This would include things like defragmentation and shrinking the VHD to get rid of the blank chunks. It could also include peering into the VHD to see what is there, and perhaps even making updates.
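As a sketch of what outside-only shrinking might look like: a dynamic VHD carves the virtual disk into fixed-size blocks tracked by a Block Allocation Table (BAT), so a block could be dropped whenever none of its clusters are marked in use by the guest file system (information that, for NTFS, lives in its $Bitmap metadata file). The function below is a hypothetical illustration of that idea, not the Virtual PC compaction algorithm; the BAT and used-cluster set are simplified inputs.

```python
def compactable_blocks(bat, block_size, cluster_size, used_clusters):
    """Find dynamic-VHD blocks whose clusters are all free in the guest.

    bat           -- list mapping block index -> allocation (None if the
                     block was never allocated in the VHD file)
    used_clusters -- set of cluster numbers the guest file system marks
                     as in use (for NTFS, readable from $Bitmap)
    """
    clusters_per_block = block_size // cluster_size
    droppable = []
    for block, allocation in enumerate(bat):
        if allocation is None:
            continue  # never allocated, nothing to reclaim
        first = block * clusters_per_block
        in_use = any(c in used_clusters
                     for c in range(first, first + clusters_per_block))
        if not in_use:
            droppable.append(block)
    return droppable

# Blocks 0 and 2 are allocated; only cluster 5 (in block 0) is in use,
# so block 2 is pure dead weight.
print(compactable_blocks([0, None, 1], 2 * 2**20, 4096, {5}))  # [2]
```

The point of the sketch is that the decision needs both sides at once: the VHD layer knows which blocks exist, and only the guest file system knows which clusters matter.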
Other possible ventures would include merging virtual disks and even creating virtual disks out of multiple virtual disks. If it were possible to focus on the files instead of the blocks, it would be much more feasible to have base and delta disks which could both be allowed to change and yet form a cohesive volume for the user. It is good to dream.
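The base-plus-delta idea already exists in block form: a differencing VHD keeps a per-block sector bitmap, where a set bit means the child disk holds the current data for that sector and a clear bit defers to the parent. A toy Python version of that read path, with the bitmap and disk contents flattened into dicts purely for illustration:

```python
def read_sector(sector, child_bitmap, child_data, parent_data):
    """Resolve one sector read against a delta (child) disk and its parent.

    child_bitmap -- sector -> True if the child owns that sector's data
                    (stands in for the per-block bitmap of a differencing VHD)
    """
    if child_bitmap.get(sector):
        return child_data[sector]
    return parent_data[sector]

child_bitmap = {0: True, 1: False}
child = {0: b"child-0"}
parent = {0: b"parent-0", 1: b"parent-1"}
print(read_sector(0, child_bitmap, child, parent))  # b'child-0'
print(read_sector(1, child_bitmap, child, parent))  # b'parent-1'
```

The dream above is essentially this mechanism lifted from sectors to files, so that both layers could change independently while the merge rule stays simple.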
The sources of information look promising. Microsoft has published APIs related to defragmenting disks which can locate a file on disk. The APIs also allow for cluster relocation. Beyond this, there are projects for Linux that understand NTFS. Those teams have done much to discover the structure of NTFS and have included this knowledge in their programs and their documentation. With these kinds of guidelines, and with patience, NTFS starts to open up and new things become possible.
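The defragmentation APIs mentioned here report a file's location as a list of extents (runs of virtual cluster numbers mapped to logical cluster numbers on disk), in the style of Windows' FSCTL_GET_RETRIEVAL_POINTERS control code. The sketch below does not call the Windows API; it just expands an extent list of that shape into per-cluster VCN-to-LCN pairs, which is the mapping an outside tool would need before relocating anything.

```python
def file_clusters(extents):
    """Expand a (next_vcn, starting_lcn) extent list into (vcn, lcn) pairs.

    extents -- list of (next_vcn, starting_lcn) tuples with VCNs starting
               at 0, mirroring the shape of the data returned by the
               Windows defragmentation API (FSCTL_GET_RETRIEVAL_POINTERS).
    """
    mapping = []
    vcn = 0
    for next_vcn, lcn in extents:
        for i in range(next_vcn - vcn):
            mapping.append((vcn + i, lcn + i))
        vcn = next_vcn
    return mapping

# A file in two fragments: VCNs 0-2 at LCN 100, VCNs 3-4 at LCN 50
print(file_clusters([(3, 100), (5, 50)]))
# [(0, 100), (1, 101), (2, 102), (3, 50), (4, 51)]
```

With this mapping in hand, the companion cluster-relocation call (FSCTL_MOVE_FILE on Windows) is what would let a tool rearrange those LCNs deliberately.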
Admittedly, there is a bit of vagueness going on here. It is still too early to talk about things in detail. However, it does seem that specific tasks are within reach which did not look so possible before. Combining knowledge of VHD with knowledge of NTFS to form new tools looks incredibly attractive.