Read-only file compression with SquashFS

Data Crush

Using SquashFS

Using SquashFS takes only two steps. The first step is to create a filesystem image using the SquashFS tools. You can create an image of an entire filesystem, a directory, or even a single file. The second step is to mount that image, either directly (if it is on a device) or through a loopback device (if it is a file).

The tool that creates the image is called mksquashfs, which has a number of options that allow control over virtually all aspects of the image. The man page is not very long, and it's definitely worth a look at the various options. Any user can create an image of any data they are able to read. However, mounting it requires root access (or at least sudo access).

As an example, I'll take a directory (/home/laytonjb/20170502) on my desktop where I have stored PDFs, ZIP files, and other bits of information and articles that I collect throughout the month (I'm a digital hoarder). I want to compress this directory and all its subdirectories and files. Then, I want to mount it read-only so I can access the information but still save some space.

Before compression the directory was about 358MB:

$ du -sh
358M    .

The first step is to create the image file, which can be done by an ordinary user as long as the resulting image is stored somewhere the user has write permission (Listing 2). Notice that the command gives a reasonable amount of output without being too verbose.

Listing 2

Creating a SquashFS Image File

$ time mksquashfs /home/laytonjb/20170502 /home/laytonjb/squashfs/20170502.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /home/laytonjb/squashfs/20170502.sqsh, block size 131072.
[================================================-] 2904/2904 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments, compressed xattrs
        duplicates are removed
Filesystem size 335196.73 Kbytes (327.34 Mbytes)
        91.53% of uncompressed filesystem size (366234.01 Kbytes)
Inode table size 8424 bytes (8.23 Kbytes)
        50.01% of uncompressed inode table size (16846 bytes)
Directory table size 2199 bytes (2.15 Kbytes)
        63.72% of uncompressed directory table size (3451 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
        100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 1
Number of inodes 94
Number of files 93
Number of fragments 5
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
        laytonjb (1000)
Number of gids 1
        laytonjb (1000)

I used the command defaults, which means a block size of 128KiB (131,072 bytes) and gzip compression. In the output, mksquashfs reports that the image is 91.53% of the uncompressed size, or about 327MB (327.34MB), a saving of roughly 8.5%.
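As a quick cross-check, the percentage can be recomputed from the two sizes that mksquashfs prints in Listing 2. The following is a minimal sketch; the variable names are illustrative, and awk is used only because plain shell arithmetic is limited to integers.

```shell
# Sizes in Kbytes, copied from the mksquashfs output in Listing 2.
compressed_kb=335196.73
uncompressed_kb=366234.01

# Image size as a percentage of the uncompressed data.
ratio=$(awk -v c="$compressed_kb" -v u="$uncompressed_kb" \
    'BEGIN { printf "%.2f", 100 * c / u }')

# Space saved, converted from Kbytes to Mbytes.
saved_mb=$(awk -v c="$compressed_kb" -v u="$uncompressed_kb" \
    'BEGIN { printf "%.2f", (u - c) / 1024 }')

echo "Image is ${ratio}% of the original size"
echo "Space saved: ${saved_mb} Mbytes"
```

The first figure should agree with the 91.53% that mksquashfs reports.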

Notice that I prefixed the command with time to measure how long it took to run. The results were:

real    0m7.675s
user    0m29.074s
sys     0m1.002s

This looks to be pretty fast for compressing 358MB of data (on an SSD).
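The user time is almost four times the real (wall-clock) time because mksquashfs spread the work across four processors, as the first line of its output in Listing 2 shows. A small sketch of that arithmetic, using the numbers above (the variable names are illustrative, and the throughput figure assumes the 358MB reported by du):

```shell
# Timings in seconds from the time output; data size in MB from du -sh.
real_s=7.675
user_s=29.074
data_mb=358

# CPU seconds consumed per second of wall-clock time.
cpu_ratio=$(awk -v u="$user_s" -v r="$real_s" 'BEGIN { printf "%.2f", u / r }')

# Input data processed per second of wall-clock time.
throughput=$(awk -v d="$data_mb" -v r="$real_s" 'BEGIN { printf "%.1f", d / r }')

echo "user/real ratio: ${cpu_ratio}"
echo "throughput: ${throughput} MB/s"
```

A ratio close to 4 suggests that all four processors were busy for most of the run.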

The next step is to mount the SquashFS image as you would any other filesystem. Out of the box, root needs to do this, because ordinary users are not allowed to mount filesystems (hence the # prompt on the mount command).

# mount -t squashfs /home/laytonjb/squashfs/20170502.sqsh /home/laytonjb/20170502_new -o loop
$ mount
...
/home/laytonjb/squashfs/20170502.sqsh on /home/laytonjb/20170502_new type squashfs (ro,relatime,seclabel)
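If you use sudo rather than a root shell, the mount and a later unmount could be wrapped roughly as follows. This is a sketch only: the function names mount_squashfs and unmount_squashfs are hypothetical, and it assumes that sudo is configured for your account and that the mount point might not exist yet.

```shell
# Mount a SquashFS image file read-only through a loop device.
# Usage: mount_squashfs IMAGE MOUNTPOINT
mount_squashfs() {
    image=$1
    mountpoint=$2
    # Refuse early if the image file is missing, before asking for sudo.
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    # Create the mount point if needed.
    mkdir -p "$mountpoint" || return 1
    sudo mount -t squashfs -o loop,ro "$image" "$mountpoint"
}

# Unmount the image again when you are done with it.
# Usage: unmount_squashfs MOUNTPOINT
unmount_squashfs() {
    sudo umount "$1"
}

# Example (not run here):
#   mount_squashfs /home/laytonjb/squashfs/20170502.sqsh /home/laytonjb/20170502_new
#   unmount_squashfs /home/laytonjb/20170502_new
```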

Now look at /home/laytonjb/20170502_new to make sure everything is there and permissions are as expected (Listing 3). I can look at the files, and they are owned by me.

Listing 3

Viewing Mounted SquashFS Image

$ ls -lsat
...
  830 -rw-r--r--.  1 laytonjb laytonjb   848854 Jun 10 13:58 mesos.pdf
  535 -rw-r--r--.  1 laytonjb laytonjb   546505 Jun 10 13:58 Martins2003CSD.pdf
 8803 -rw-r--r--.  1 laytonjb laytonjb  9013307 Jun 10 13:58 Hwang2012c.pdf
...

Optimization Study

The two major options you are likely to use are -comp [comp] and -b [bsize]. The first option specifies the compression algorithm (from the options listed earlier). The second option sets the block size, which defaults to 128KiB and can be raised to a maximum of 1MiB. Larger block sizes can improve the amount of compression, because the compressor works on more data at a time.

A simple command that uses LZMA compression and a 1MiB block size would be:

$ mksquashfs /home/laytonjb/20170502 /home/laytonjb/squashfs/20170502.sqsh -comp lzma -b 1048576

The directory I've used in the examples is full of PDF and ZIP files, which are largely compressed already. I didn't expect it to compress much, but I did get some compression. As an experiment, I tried all four compression techniques with the default block size, 128KiB, and the maximum block size, 1MiB (Table 1).

Table 1

Compression and Block Size

Compression Technique   Block Size   User Time (mm:ss.sss)   Compressed Size (% of original)
gzip                    128KiB       00:29.074               91.53%
gzip                    1MiB         00:31.050               91.35%
lzo                     128KiB       01:36.262               92.31%
lzo                     1MiB         01:47.967               92.08%
xz                      128KiB       03:14.064               90.49%
xz                      1MiB         03:47.730               88.71%
lzma                    128KiB       03:10.494               90.48%
lzma                    1MiB         03:44.004               88.78%
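A set of runs like the one in Table 1 could be scripted along the following lines. This is a sketch under stated assumptions, not the script used for the article: the output file names, the SRC, OUTDIR, and DRY_RUN variables, and the function names are hypothetical. The -noappend option is added so that an existing image is overwritten rather than appended to, and the lzma compressor is only available if your mksquashfs was built with it. By default, the script only prints the commands; set DRY_RUN=0 to run and time them (bash is assumed for the time keyword).

```shell
# Hypothetical locations; adjust them for your system.
SRC="${SRC:-/home/laytonjb/20170502}"
OUTDIR="${OUTDIR:-/home/laytonjb/squashfs}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Build (or just print) one image for a compressor/block-size pair.
run_one() {
    comp=$1
    bsize=$2
    out="$OUTDIR/bench_${comp}_${bsize}.sqsh"
    if [ "$DRY_RUN" = "1" ]; then
        echo "mksquashfs $SRC $out -comp $comp -b $bsize -noappend"
    else
        time mksquashfs "$SRC" "$out" -comp "$comp" -b "$bsize" -noappend
    fi
}

# All four compressors at the default (128KiB) and maximum (1MiB) block size.
run_matrix() {
    for comp in gzip lzo xz lzma; do
        for bsize in 131072 1048576; do
            run_one "$comp" "$bsize"
        done
    done
}

run_matrix
```

Writing each image to its own file keeps the results around, so the sizes can be compared afterward with ls -l or du.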

Pretty obviously, the fastest compression technique is gzip, with little difference in user time between the two block sizes (about two seconds, or roughly 7%). The larger block size gave only a tiny bit of extra compression (91.35% versus 91.53%).

The xz and lzma algorithms result in the most compression and take the longest, much longer than gzip: more than three minutes of user time versus about 30 seconds. Even with the default block size, they compress the data by about 9.5%. Using the largest block size, they get a little more compression: a little over 11%.

You might scoff at 10%, but remember that the files are binary and mostly compressed already. If you have 100TB of data, 10% is 10TB. Not too bad. If you have 1PB, then 10% is 100TB, which is quite a bit of space.
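To see what a given compression ratio means for a larger dataset, the arithmetic can be sketched as follows. The function name saved_tb is hypothetical, and 1PB is taken as 1,000TB.

```shell
# Space saved in TB for a dataset of $1 TB whose image is $2 percent
# of the original size.
saved_tb() {
    awk -v t="$1" -v r="$2" 'BEGIN { printf "%.2f\n", t * (100 - r) / 100 }'
}

saved_tb 100 90      # 100TB with an image at 90% of the original
saved_tb 1000 90     # 1PB (1,000TB) with an image at 90% of the original
saved_tb 100 88.71   # 100TB at the xz, 1MiB ratio from Table 1
```

The three calls print 10.00, 100.00, and 11.29, respectively.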

Summary

Even though data storage has gotten inexpensive, data consumption grows at a faster rate than storage. I don't think I've ever heard anyone ask for less storage space. Finding ways to reduce the amount of data is a key function in the life of an HPC administrator.

One way to conserve space is to compress data that is not used very often. Although you can do this on a file-by-file basis, a better way is to collect all of the data into a single directory and create a compressed filesystem image. SquashFS is probably the best tool for the job, because it's very easy to use and comes with virtually every Linux distribution out there. Give it a try; you won't be disappointed.
