One aspect of volume management is knowing which clusters are free and which are in use. This is normally managed solely by the operating system, but it is sometimes possible to get a glimpse of how things are laid out. A few years ago Microsoft published a set of interfaces that had previously been considered undocumented. That set of APIs is aimed at defragmenting a disk. The cluster map is gathered using FSCTL_GET_VOLUME_BITMAP.
A cluster is the basic allocation unit of the file system. Its size is defined by two fields in the boot sector of the volume: “sector size” and “sectors per cluster”. Windows appears to use a sector size of 512 bytes as a rule, with the cluster size being a multiple of the sector size. The boot sector holds “sector size” at offset 0x0B (WORD) and “sectors per cluster” at offset 0x0D (BYTE).
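To make the two boot sector offsets concrete, here is a minimal Python sketch that pulls both fields out of a raw boot sector and multiplies them to get the cluster size. The function name and the synthetic sample sector are made up for illustration, and the sketch assumes “sectors per cluster” is a plain count rather than any special encoding.

```python
import struct

def cluster_size_from_boot_sector(boot_sector):
    """Return (bytes_per_sector, sectors_per_cluster, bytes_per_cluster)."""
    # Offset 0x0B holds a little-endian WORD, offset 0x0D a single BYTE.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot_sector, 0x0B)
    return bytes_per_sector, sectors_per_cluster, bytes_per_sector * sectors_per_cluster

# Synthetic boot sector (not read from a real disk): 512-byte sectors, 8 per cluster.
sample = bytearray(512)
struct.pack_into("<HB", sample, 0x0B, 512, 8)

print(cluster_size_from_boot_sector(bytes(sample)))  # (512, 8, 4096)
```

With 512-byte sectors and 8 sectors per cluster, this gives the 4K cluster size mentioned for the 250GB drive.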
The cluster size typically scales with the size of the disk: the larger the disk, the larger the cluster size. My main 250GB drive has a cluster size of 4K. Early drives were small enough for the sector size and cluster size to match (512 bytes).
Back to FSCTL_GET_VOLUME_BITMAP. When the IOCTL succeeds, it returns the cluster allocation pattern for the volume. The structure returned is VOLUME_BITMAP_BUFFER, which is effectively a bitmap of used/free clusters. Each byte in its “Buffer” field corresponds to 8 clusters, and the lowest bit represents the first cluster of that byte. A set bit marks a used cluster and a clear bit marks a free one. Just today I figured that 64 bytes of bitmap data would correspond to 2MB of disk space with 4K clusters (64 x 8 = 512 clusters).
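The bit layout can be sketched in a few lines of Python. This assumes the bytes of “Buffer” have already been copied into a Python bytes object, and that the cluster index is relative to the start of that buffer (the structure's StartingLcn), not necessarily to the start of the volume. The function names are hypothetical.

```python
def is_cluster_used(bitmap, cluster):
    """True if the bit for `cluster` is set; bit 0 of each byte is the first cluster."""
    byte_index, bit_index = divmod(cluster, 8)
    return bool((bitmap[byte_index] >> bit_index) & 1)

def bytes_covered(bitmap_len, cluster_size=4096):
    """Disk space described by `bitmap_len` bytes of bitmap data."""
    return bitmap_len * 8 * cluster_size

# 0x05 is binary 00000101: clusters 0 and 2 are used, the other six are free.
demo = bytes([0x05])
print([is_cluster_used(demo, n) for n in range(8)])
print(bytes_covered(64))  # 2097152 bytes, i.e. 2MB with 4K clusters
```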
The actual output of the bytes gives an idea of where the used and free space is concentrated. As expected, most of the early parts of the disk are used while the last parts are usually free. There are also hints of fragmentation, since there are gaps between sections of data that were probably once occupied by files.
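One way to see those gaps is to collapse the bitmap into runs of consecutive used or free clusters. The following is only a sketch over an in-memory copy of the bitmap bytes; the function name is made up, and it walks one bit at a time, so it favors clarity over speed.

```python
from itertools import groupby

def cluster_runs(bitmap):
    """Return a list of (is_used, run_length) pairs in cluster order."""
    # Expand each byte into its 8 bits, lowest bit first.
    bits = ((byte >> i) & 1 for byte in bitmap for i in range(8))
    return [(bool(state), sum(1 for _ in group)) for state, group in groupby(bits)]

# 12 used clusters, an 8-cluster gap, then 4 more used clusters.
print(cluster_runs(bytes([0xFF, 0x0F, 0xF0])))
# [(True, 12), (False, 8), (True, 4)]
```

Many short alternating runs near the front of the volume would be one sign of the fragmentation described above.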
It is actually possible to gather free/used cluster counts from the bitmap by running the data through a counter that turns the byte patterns into count pairs. I wrote a program that scanned the whole bitmap, matching each nibble against pre-programmed arrays. So, put in 0xF and get back 4 used, 0 free. Put in 0x6 and get 2 used and 2 free. You get the idea. Originally I had thought of matching against the whole byte but was not looking forward to entering the 256 combinations.
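The nibble lookup can be sketched roughly like this in Python. It is not the original program, just an illustration of the same idea with a 16-entry table; the names are hypothetical. It also assumes every bit in the buffer describes a real cluster, which may not hold for the final byte if the cluster count is not a multiple of 8.

```python
# Number of used (set) bits for each possible nibble value 0x0 through 0xF.
USED_IN_NIBBLE = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]

def count_clusters(bitmap):
    """Return (used, free) cluster counts for the given bitmap bytes."""
    used = 0
    for byte in bitmap:
        # Look up the low nibble and the high nibble separately.
        used += USED_IN_NIBBLE[byte & 0x0F] + USED_IN_NIBBLE[byte >> 4]
    total = len(bitmap) * 8
    return used, total - used

print(count_clusters(bytes([0x0F])))  # (4, 4)
print(count_clusters(bytes([0xF6])))  # (6, 2)
```

A 256-entry byte table would halve the lookups, and it could be generated in a loop instead of being typed in by hand.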
I keep thinking of defrag programs from the past (like Norton) that showed a high-level view of the cluster map while moving files around. Now it seems fairly simplistic given the number of clusters involved. It also seems a bit risky given the transient nature of the free/used bitmap.
The point is that the number of free and used clusters is always changing with system activity. A snapshot taken with the IOCTL is just a picture in time and does not guarantee that things are still the same. Even Microsoft recommends assuming that you might not get the free clusters you want for a defrag operation, so you had better be prepared to try again.
The actual information lives inside NTFS in a metadata file called $Bitmap. It is MFT record number 6 (reserved and for all time the same). $Bitmap cannot be directly read from any Windows program since it is only intended for the file system. Obviously Microsoft does not want anyone to change this file. Doing so would most likely wreak major havoc on Windows.
The cluster map in $Bitmap is, in theory, the basis of what the IOCTL returns. However, since the two cannot be read at exactly the same instant, they could differ. The exception would be if you could somehow freeze Windows.
Speaking of freezing Windows, the only way to do this successfully is to read the information when nothing is changing. The easiest way is to access a volume that is not the one Windows booted from. As long as no running program is changing the non-boot drive, it should be possible to get an accurate snapshot that will stay good over time.
Coming from a VHD angle, you could mount the VHD and then use the IOCTL. Or you could spend a lot of time understanding the NTFS format along with the VHD format and fetch the $Bitmap file yourself. Difficult, but entirely possible.
Having come to the end of this post, it seems that this topic might be a bit tangential to what most of you might be interested in. Let’s assume that it is really meant for the tinkerers out there that like to know where the disk space is really being used. Please expect a few more words about this area in the coming weeks.
I can’t understand the way to find out the free or used cluster from the bitmap buffer retrieved.