[Read This Fine Material] from Joshua Hoblitt

Shrinking and/or compressing raw VM images


I need to ship a clone of a VM image “transhemispherically” over a high-latency link that tends to suffer many connection failures with scp/ssh+rsync. I decided to investigate shrinking the raw images, which are used for performance reasons.

The VM image I used for testing is a clone of a real production image. It’s a 40GB raw file with the following partitioning and data usage.

$ du -sk foo.img
41943044	foo.img
[jhoblitt@foo ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
                      7.9G  3.5G  4.1G  47% /
tmpfs                 1.9G     0  1.9G   0% /dev/shm
/dev/vda1             248M   57M  179M  25% /boot
                      7.9G  147M  7.4G   2% /home
                      4.0G  140M  3.7G   4% /tmp
                      7.9G  1.6G  6.0G  21% /var
                      7.9G  147M  7.4G   2% /d1/ftp
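
Note that `du -sk` reports allocated blocks, so a sparse raw image would show a much smaller number than its nominal size; the 40GB figure above suggests this image is fully allocated. A minimal sketch of comparing the two sizes, assuming GNU coreutils (`truncate` and `du --apparent-size`) and using a hypothetical 1 MiB stand-in file rather than the real image:

```shell
# Hypothetical stand-in image: a 1 MiB file created as a single hole.
demo=$(mktemp)
truncate -s 1M "$demo"

# Allocated size: blocks actually in use on disk (what plain du reports).
allocated_kb=$(du -sk "$demo" | cut -f1)

# Apparent size: the length a reader of the file would see.
apparent_kb=$(du -sk --apparent-size "$demo" | cut -f1)

echo "allocated: ${allocated_kb} KB, apparent: ${apparent_kb} KB"
```

If the two numbers match for a real image, as they appear to for foo.img, there are no holes for a sparse-aware copy to skip.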

I tested converting raw -> qcow2 (with and without native qcow2 zlib compression), and compressing both the raw and the qcow2 image with lzma, bzip2, and gzip (zlib). The commands were as follows.

# convert raw to qcow2 (non-sparse)
qemu-img convert -O qcow2 foo.img foo.qcow2

# convert raw to compressed qcow2 (non-sparse)
qemu-img convert -O qcow2 -c foo.img foo.compressed.qcow2

# compressing the raw image
lzma -k --best foo.img &
bzip2 -k --best foo.img &
gzip --best -c foo.img > foo.img.gz &

# compressing the non-natively compressed qcow2 image
lzma -k --best foo.qcow2
bzip2 -k --best foo.qcow2 &
gzip --best -c foo.qcow2 > foo.qcow2.gz &
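
Most of the commands above are backgrounded with `&`, so the sizes are only meaningful once every job has finished. A rough sketch of the same pattern with an explicit `wait`, using a small hypothetical stand-in file instead of foo.img and skipping any compressor that is not installed:

```shell
# Hypothetical stand-in for foo.img so the sketch runs anywhere;
# point img at the real image instead.
img=$(mktemp)
head -c 1048576 /dev/zero > "$img"

# tool:suffix pairs matching the compressors used above.
for pair in gzip:gz bzip2:bz2 lzma:lzma; do
    tool=${pair%%:*}
    ext=${pair##*:}
    # Skip any compressor that is not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    # Write to stdout so the original file is left in place.
    "$tool" --best -c "$img" > "$img.$ext" &
done

# Block until every background compressor has exited.
wait

du -sk "$img"*
```

Running the jobs in parallel like this is convenient, but it also makes per-tool timings hard to interpret.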

The results are as follows:

$ du -sk foo.* | sort -nr
41943044	foo.img
6438348	foo.qcow2
2106320	foo.compressed.qcow2
1884580	foo.img.gz
1850252	foo.qcow2.gz
1667536	foo.img.bz2
1667204	foo.qcow2.bz2
1122428	foo.img.lzma
1116324	foo.qcow2.lzma

I find several things in the results fairly surprising. I would not have expected the gzip’d qcow2 image to be ~12% smaller than the one produced by native qcow2 zlib compression. The other big surprise was that lzma compression of the raw image came within 1% of running lzma on the qcow2 image. Since I want raw images after transfer anyway, lzma of the raw image is the winner for me with its factor of 37 compression ratio. However, keep in mind that for my use case I’m shipping across a high-latency / low-throughput link, so I’m willing to pay for the substantial lzma compression time. I did not include compression times in the results since I was compressing many images in parallel.
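
The ratios quoted here can be recomputed from the `du -sk` listing. A small sketch, where `ratios` is a hypothetical helper name and the input is the size listing from this post pasted in as a here-document:

```shell
# Print each file's size and its compression ratio relative to the first
# (largest) line of `du -sk ... | sort -nr` style input.
ratios() {
    awk 'NR == 1 { base = $1 }
         $1 > 0  { printf "%-24s %10d KB  %6.1fx\n", $2, $1, base / $1 }'
}

# Sizes copied from the results listing above.
ratios <<'EOF'
41943044 foo.img
6438348 foo.qcow2
2106320 foo.compressed.qcow2
1884580 foo.img.gz
1850252 foo.qcow2.gz
1667536 foo.img.bz2
1667204 foo.qcow2.bz2
1122428 foo.img.lzma
1116324 foo.qcow2.lzma
EOF
```

On a live directory the same helper can be fed directly with `du -sk foo.* | sort -nr | ratios`.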
