Compressed Archives for User Projects

Jeff Layton

The ability to take a bunch of files and data and create a single archive file is very powerful. Thousands of files and data put together in a single file means you can easily and conveniently keep projects intact. A single file makes it easier to copy or move the entire project between systems and into and out of various types of storage.

Because the data going into the archive will have a great deal of repeated bit strings, you can use compression tools to squeeze the archive to a smaller file. Some archive tools can create the archive and compress it at the same time. In other cases, you create the archive with one tool and then compress it with another.
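
As a rough illustration of the two approaches, the following sketch builds a throwaway directory and archives it both ways. The directory name project and the file inside it are hypothetical and exist only for the demonstration; tar and gzip stand in for whichever archive and compression tools you prefer.

```shell
# Sketch only: "project" is a hypothetical directory created here for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
echo 'print("hello")' > project/main.py

# Two steps: create the archive with tar, then compress it with gzip.
tar cf project.tar project
gzip project.tar                  # replaces project.tar with project.tar.gz

# One step: the z option makes tar compress with gzip while it archives.
tar czf project_onestep.tar.gz project

# Both compressed archives hold the same members.
tar tzf project.tar.gz
tar tzf project_onestep.tar.gz
```

Either route ends with a single .tar.gz file; the two-step form is handy when you want to pick a different compressor for the second step.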

If you want to access anything in the archive, you have to uncompress and unarchive it, which gives you access to everything in that archive. It’s an all-or-nothing proposition, but what if the archive is large or takes a long time to unarchive? For this scenario, archives lose some of their appeal.

I’ve written about a tool named archivemount that lets a user mount the archive, even if it is compressed, and then use the data in the archive as if it were a local filesystem. Over time, I started using this tool for another purpose. When I create a new project, whether it be a new article, new code, or whatever, I put everything in a single subdirectory, including any data I use in the project. For example, I am using archivemount to control this project. The cool thing is that I can do everything as a user and don’t need any intervention by root, which is the point of this article. I’ll begin by reviewing both FUSE and archivemount.

What is FUSE?

Filesystem in Userspace (FUSE) is a Linux software interface that allows users to create filesystems for any purpose without having to add them to the kernel. FUSE solutions can add latency because data has to flow into and out of the kernel, and they might not reach the throughput a kernel-based filesystem can achieve. However, FUSE lets developers create new filesystems that solve particular problems or give users additional capability without having to add lots of new filesystems to the kernel, which allows experimentation without the fear of taking the system down.

An additional advantage to FUSE solutions is that they can be used to mount a filesystem from a different operating system directly on a new system. For example, one of the more popular FUSE solutions is NTFS-3G, which lets you take an NTFS volume from a Windows system and mount it on a Linux system as though it were local.

You can find FUSE solutions all over the web (e.g., on GitHub), and you might find some that you did not suspect were FUSE solutions (e.g., one of my favorites, sshfs).

archivemount

The archivemount FUSE solution allows you, as a user, to mount an archive even if it is compressed. Once mounted, you can read, remove, change, and add files to the mounted archive exactly as if it were a normal Linux filesystem. When you umount (unmount) it, it will recreate the archive – even compressed. How cool is that?

Compare this process to your normal method: You have to uncompress, and possibly unarchive, the file to your local filesystem to read, remove, edit, change, and add files to the directories contained in the archive; then, once finished, you have to re-archive the directory and perhaps even recompress it. Don’t forget, though, that you still have everything in the archive on the local filesystem and that you have two copies of the project. Ideally, you would erase the data in the local filesystem so you only have one copy. Although this method only takes a few commands, you still have to pay attention to what is most current – the local filesystem or the archive.
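
For comparison, the conventional cycle might look like the following sketch. The project directory and notes.txt file are hypothetical names used only to make the steps concrete.

```shell
# Sketch only: "project" and notes.txt are hypothetical names for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project && echo "v1" > project/notes.txt
tar czf project.tar.gz project && rm -r project   # start with only the archive

tar xzf project.tar.gz            # 1. uncompress and unarchive
echo "v2" >> project/notes.txt    # 2. edit the extracted files
tar czf project.tar.gz project    # 3. re-archive and recompress
rm -r project                     # 4. remove the second copy
```

Skipping step 4 is what leaves two copies behind, and skipping step 3 is how the archive quietly goes stale.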

Over time, you will want to make a backup of the project, so you can either back up all of the files in the local directory or, if you are using archivemount, make a backup of a single file. Although the backup tool does all of the work, conceptually I like just having to back up one, albeit large, file.

To mount a compressed archive to a local directory with the archivemount command, enter:

$ archivemount <archive file> <mountpoint>

It's that simple.
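
If you only want to look at files and never rewrite the archive, a read-only mount is one option. The sketch below wraps the command in a hypothetical helper named mount_ro; it assumes your archivemount build accepts the readonly mount option described in its man page, so check archivemount --help on your system first.

```shell
# mount_ro ARCHIVE MOUNTPOINT
# Hypothetical helper: mount an archive without write support.
mount_ro() {
    archive=$1
    mountpoint_dir=$2
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    mkdir -p "$mountpoint_dir" || return 1
    # Assumption: -o readonly disables writes, so unmounting never
    # rebuilds the archive.
    archivemount -o readonly "$archive" "$mountpoint_dir"
}
```

For example, mount_ro project.tar.gz ~/mnt/project would mount a (hypothetical) project.tar.gz for browsing only.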

archivemount Example

Assume you have a set of images you want to process, and you would like to create a compressed archive (.tar.gz) of the data and then create Python scripts to do the processing. In this case, you start with a small set of images for cats and dogs that is used for learning about convolutional neural networks (CNNs); however, assume you’re doing something else with the images.

A main directory cats_dogs_light has two subdirectories, train and test. The train subdirectory has 1,000 images, and the test subdirectory has 400 images:

laytonjb@laytonjb:~/DATA_STORE/DATA1/cats_dogs_light/train$ ls | wc -l
1000
laytonjb@laytonjb:~/DATA_STORE/DATA1/cats_dogs_light/test$ ls | wc -l
400

To begin, create an archive of the data and then compress it:

laytonjb@laytonjb:~/DATA_STORE$ tar cf data1_08022025.tar DATA1
laytonjb@laytonjb:~/DATA_STORE$ pigz -9 data1_08022025.tar 
laytonjb@laytonjb:~/DATA_STORE$ ls -s
total 31156
    4 DATA1  31152 data1_08022025.tar.gz

Note the use of pigz, which is a parallel version of gzip that uses all the system cores, if possible, to do the compression and is much faster than the serial version.
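
The same result can be had in one pipeline, which avoids writing the uncompressed .tar file to disk first. This is a sketch with a hypothetical helper named compress_dir; the thread count of 4 is an arbitrary choice, and the fallback to gzip is my addition for systems without pigz.

```shell
# compress_dir DIR OUTFILE
# Hypothetical helper: tar DIR and compress the stream in a single pipeline.
compress_dir() {
    dir=$1
    out=$2
    if command -v pigz >/dev/null 2>&1; then
        # -9 asks for maximum compression; -p limits pigz to 4 threads
        # (an arbitrary value; leave -p off to use all cores).
        tar cf - "$dir" | pigz -9 -p 4 > "$out"
    else
        # Serial fallback that produces the same gzip format.
        tar cf - "$dir" | gzip -9 > "$out"
    fi
}
```

With the directory from this example, the call would be compress_dir DATA1 data1_08022025.tar.gz.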

Next, simply mount the archive, but first create the mountpoint:

laytonjb@laytonjb:~/DATA_STORE$ mkdir DATA1
laytonjb@laytonjb:~/DATA_STORE$ archivemount data1_08022025.tar.gz \
  /home/laytonjb/DATA_STORE/DATA1

I used the fully qualified path to the mountpoint, but you can use a relative path if you like.

Next, check that the archive is mounted by filtering the output of the mount command with grep for the archivemount filesystem type:

laytonjb@laytonjb:~/DATA_STORE$ mount | grep archivemount
archivemount on /home/laytonjb/DATA_STORE/DATA1 \
  type fuse.archivemount (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Yep, it’s mounted. A subtle aspect you might miss is that you did all of this as a user, which is what FUSE gives you: the ability to create filesystems or virtual filesystems in user space so you don’t need elevated privileges.
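
In a script, the same check can be done without parsing the human-readable mount output. The sketch below defines a hypothetical helper, is_archive_mounted, that reads /proc/mounts directly; it assumes a Linux system and an absolute mountpoint path without spaces (the kernel escapes spaces in that file).

```shell
# is_archive_mounted MOUNTPOINT
# Hypothetical helper: succeeds if MOUNTPOINT (absolute path) is currently
# an archivemount filesystem. Fields in /proc/mounts are:
#   device  mountpoint  fstype  options ...
is_archive_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.archivemount" { found = 1 }
        END { exit !found }
    ' /proc/mounts
}
```

A typical use would be is_archive_mounted /home/laytonjb/DATA_STORE/DATA1 && echo "mounted".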

In the current directory (here, /home/laytonjb/DATA_STORE), check the content (Listing 1). Note that DATA1 is the mountpoint. Also notice the date on the compressed archive, because you will use that in a bit.

Listing 1: Check Directory Content

$ ls -lstar
total 62310
31150 drwxrwxr-x  0 laytonjb laytonjb 31897275 Dec 31  1969 DATA1
31152 -rw-rw-r--  1 laytonjb laytonjb 31897275 Aug  2 09:52 data1_08022025.tar.gz
    4 drwxrwxr-x  3 laytonjb laytonjb     4096 Aug  2 09:55 .
    4 drwxr-x--- 18 laytonjb laytonjb     4096 Aug  2 09:56 ..

Now change directories into the mounted archive and look at the list of files to be 100% sure your data is there (Listing 2). Because I had 400 images in the test directory, I cut off the listing, but it’s all there.

Listing 2: Checking the Mounted Archive

laytonjb@laytonjb:~/DATA_STORE/DATA1$ cd DATA1/cats_dogs_light/test
laytonjb@laytonjb:~/DATA_STORE/DATA1/DATA1/cats_dogs_light/test$ ls -s
total 9222
31 cat.9818.jpg  36 cat.9872.jpg  11 cat.9925.jpg   4 cat.9978.jpg  17 dog.9807.jpg  25 dog.9860.jpg  23 dog.9913.jpg
10 cat.9819.jpg  27 cat.9873.jpg  14 cat.9926.jpg  25 cat.9979.jpg  14 dog.9808.jpg  26 dog.9861.jpg  32 dog.9914.jpg
15 cat.9820.jpg  15 cat.9874.jpg   5 cat.9927.jpg  51 cat.997.jpg   37 dog.9809.jpg  32 dog.9862.jpg  25 dog.9915.jpg
27 cat.9821.jpg  14 cat.9875.jpg  24 cat.9928.jpg  29 cat.9980.jpg  23 dog.9810.jpg  22 dog.9863.jpg  33 dog.9916.jpg
...

New Data in the Mounted Archive

The big test is to create new data in the mounted archive and have it survive the umount process, which should capture the file. To begin, make a file with touch:

laytonjb@laytonjb:~/DATA_STORE/DATA1/DATA1/cats_dogs_light$ touch test.py
laytonjb@laytonjb:~/DATA_STORE/DATA1/DATA1/cats_dogs_light$ ls -s
total 0
0 test  0 test.py  0 train

Notice that the file test.py is there but is zero length. The next step is to umount the archive, which should save any new data. Before you can unmount it, you have to change your present working directory (pwd) to a location outside the mounted archive:

laytonjb@laytonjb:~$ cd 
laytonjb@laytonjb:~$ cd DATA_STORE/
laytonjb@laytonjb:~/DATA_STORE$ umount /home/laytonjb/DATA_STORE/DATA1

To make sure the archive unmounted, use the mount command to check for any mounted archivemount filesystems:

laytonjb@laytonjb:~/DATA_STORE$ mount | grep archivemount
laytonjb@laytonjb:~/DATA_STORE$

Because no output is reported by the command, the archive is not mounted.
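
On systems where a plain umount is refused for unprivileged users, fusermount -u is the usual FUSE alternative. The following sketch, with a hypothetical helper named unmount_archive, tries both and also refuses to run from inside the mountpoint, where the unmount would fail because the filesystem is busy.

```shell
# unmount_archive MOUNTPOINT
# Hypothetical helper: unmount a FUSE mountpoint as a regular user.
unmount_archive() {
    mp=$1
    # Unmounting fails while the shell is still inside the mountpoint.
    case "$PWD/" in
        "$mp"/*) echo "change out of $mp before unmounting" >&2; return 1 ;;
    esac
    # Try umount first; fall back to the FUSE helper if that is not allowed.
    umount "$mp" 2>/dev/null || fusermount -u "$mp"
}
```

Pass the absolute path of the mountpoint, for example unmount_archive /home/laytonjb/DATA_STORE/DATA1.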

Next, look at the files in the directory (Listing 3), paying particular attention to the date of the compressed archive (.tar.gz). Originally it was Aug 2 09:52, but now it is Aug 2 10:06, so something rewrote the file. If you examine the size of the file, you can see that the original archive was 31897275 bytes, but now it is a little bigger, at 31899301 bytes.

Listing 3: Files in the Compressed Archive

laytonjb@laytonjb:~/DATA_STORE$ ls -lstar
total 62316
31152 -rw-rw-r--  1 laytonjb laytonjb 31897275 Aug  2 09:52 data1_08022025.tar.gz.orig
    4 drwxrwxr-x  2 laytonjb laytonjb     4096 Aug  2 09:55 DATA1
    4 drwxr-x--- 18 laytonjb laytonjb     4096 Aug  2 10:06 ..
    4 drwxrwxr-x  3 laytonjb laytonjb     4096 Aug  2 10:06 .
31152 -rw-r--r--  1 laytonjb laytonjb 31899301 Aug  2 10:06 data1_08022025.tar.gz

The archivemount tool rebuilt the archive during the umount process and, as Listing 3 shows, kept the previous version as data1_08022025.tar.gz.orig. To be extra cautious, I mounted the new archive with archivemount, and the new file was there.
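
Another way to double-check, without mounting anything, is to list the archive members with tar and search for the new file. This sketch uses a hypothetical helper, archive_contains, and matches on a substring because I have not verified exactly how member names are written when the archive is rebuilt.

```shell
# archive_contains ARCHIVE NAME
# Hypothetical helper: succeeds if any member name of the gzip-compressed
# tar ARCHIVE contains the fixed string NAME.
archive_contains() {
    tar tzf "$1" | grep -F -- "$2" > /dev/null
}
```

For the example above, the check would be archive_contains data1_08022025.tar.gz cats_dogs_light/test.py.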

Other Formats

Believe it or not, I had a somewhat difficult time getting a conclusive answer to whether other archives, compressed or not, worked with archivemount. After searching several sites, I ended up relying on the AI output from Google. I was able to verify some of the formats, but not all of them (I just couldn't find details or test them myself):

zip
.tar.bz2
ISO 9660
RAR
7-zip
cpio
compress
xz compression with tar
zstd compression with tar

I’ve worked with zip and .tar.bz2, so I can stand by those, but give the others a try if you like.

Limitations

Now I come to the part of the article where I state some of the archivemount limitations. The first is speed: It is not fast. You don’t want to run an application that does a great deal of local I/O in an archivemount-mounted volume and expect the same or similar performance as with a regular filesystem such as ext4 or XFS, because archivemount filesystems slow down noticeably under heavy I/O pressure.

If you are using archivemount to save storage space, you might be able to achieve some savings, but it really depends on the data you are compressing. Some data, such as binary data, doesn’t compress much at all, although compression tools can sometimes compress it if the tool uses a large “window.” The window determines how much data is examined at one time to look for repeated bit strings.

On the other hand, source code can be compressed quite a bit because it’s just text, and compression engines usually have an easy time finding repeated bit strings. Instead of storing every copy of a repeated string, the tools store a short reference to an earlier occurrence, which is what shrinks the file. Your mileage may vary, so test your data with various compression or archive tools to see if you save any space. Even if you don't save any space, you can use archivemount for other reasons.
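
A quick way to see the difference is to compress two files of the same size, one full of repetitive text and one full of random bytes. The sketch below generates both in a temporary directory; the line of pseudo source code is made up, and random data stands in for binary data that has no repeated strings to find.

```shell
# Sketch only: compare gzip on repetitive text versus random bytes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# About 1MB of highly repetitive "source code".
i=0
while [ "$i" -lt 20000 ]; do
    echo "for index in range(len(images)): process(images[index])"
    i=$((i + 1))
done > text.dat

# The same number of bytes, but random.
size=$(($(wc -c < text.dat)))
head -c "$size" /dev/urandom > random.dat

gzip -9 -c text.dat > text.dat.gz
gzip -9 -c random.dat > random.dat.gz

# Byte counts: the text shrinks dramatically, the random data does not.
wc -c text.dat text.dat.gz random.dat random.dat.gz
```

Swapping in a sample of your own project files for text.dat and random.dat gives a realistic estimate before you commit to a workflow.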

Another limitation is that you should not use archivemount for archives that have a very large number of files. In some questions online, people have tried mounting an archive in the terabyte range with millions of files and have waited a long time to get it to mount. Most of the time they kill the command. Just imagine if it had to rebuild the compressed archive when it was unmounted!

Instead, you can use other methods for an archive with a huge number of files, such as converting it to a SquashFS image and mounting that read-only, which might be a bit faster. Also, you could split the archive into parts on the basis of logical partitioning and mount only the archive you need. If these options don’t work, just buy lots and lots of storage and splat the archive all over a regular filesystem (they handle billions of files now).
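
One possible shape for the SquashFS route is sketched below. It assumes the mksquashfs (squashfs-tools) and squashfuse commands are installed, and both helper names are hypothetical. Note that this version extracts the archive to a temporary staging directory first, so it needs enough free space for one full uncompressed copy.

```shell
# archive_to_squashfs ARCHIVE IMAGE
# Hypothetical helper: convert a tar archive into a SquashFS image.
archive_to_squashfs() {
    archive=$1
    image=$2
    staging=$(mktemp -d) || return 1
    # Extract once, then build the read-only image from the staging copy.
    # -noappend overwrites an existing image instead of adding to it.
    tar xf "$archive" -C "$staging" && mksquashfs "$staging" "$image" -noappend
    status=$?
    rm -rf "$staging"
    return $status
}

# mount_squashfs IMAGE MOUNTPOINT
# Hypothetical helper: mount the image as a regular user through FUSE.
mount_squashfs() {
    mkdir -p "$2" && squashfuse "$1" "$2"
}
```

The mounted image is read-only, so this suits large reference data sets rather than projects you are still editing.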

archivemount Project Scenario

The point of this article is to use archivemount for projects. I use it as follows:

Mount the project to a local directory when I get on the system. Although I do this manually, you could automate this as part of the login process. In fact, you could have the login process read a file that lists the archives to mount, so you only have to modify the login process once.
Go to the mounted archive and start working.
When finished, but before logging off or shutting down the system, unmount the archive to create a new archive containing all of the updates.
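
The automated version of the first step could be sketched as a shell function called from a login script such as ~/.profile. Everything here is an assumption for illustration: the function name, the list file location, and the list format of one "archive mountpoint" pair per line (paths without spaces).

```shell
# mount_project_archives LISTFILE
# Hypothetical helper: mount every archive named in LISTFILE.
# Each line holds "<archive> <mountpoint>"; blank lines and lines
# starting with # are ignored.
mount_project_archives() {
    list=$1
    [ -r "$list" ] || return 0          # no list, nothing to do
    while read -r archive mountpoint_dir; do
        case "$archive" in
            ''|'#'*) continue ;;
        esac
        [ -f "$archive" ] || { echo "skipping missing archive: $archive" >&2; continue; }
        mkdir -p "$mountpoint_dir" || continue
        # stdin is redirected so the command cannot consume the list file.
        archivemount "$archive" "$mountpoint_dir" < /dev/null ||
            echo "failed to mount $archive" >&2
    done < "$list"
}
```

A login script would then only need one line, for example mount_project_archives "$HOME/.project_archives", and adding a project means editing the list rather than the script.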

The archive will have the same name as before, so I put the current date (day, month, and year) in the name of the archive. The next time I work on the archive, I first make a copy of the most recent archive with that day’s date and use archivemount to mount that archive. In this way, I can copy the previous version to a backup or have a backup tool do it for me.
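
The copy-then-mount step can be scripted along these lines. The helper name new_dated_copy is hypothetical, and the date format follows the month-day-year pattern of the data1_08022025.tar.gz example.

```shell
# new_dated_copy PREVIOUS_ARCHIVE PREFIX
# Hypothetical helper: copy the most recent archive to PREFIX_MMDDYYYY.tar.gz
# and print the new name, ready to hand to archivemount.
new_dated_copy() {
    previous=$1
    prefix=$2
    new="${prefix}_$(date +%m%d%Y).tar.gz"
    # If today's archive already exists, keep it rather than overwrite work.
    [ -e "$new" ] || cp -p "$previous" "$new" || return 1
    echo "$new"
}
```

A session might then start with today=$(new_dated_copy data1_08022025.tar.gz data1) followed by archivemount "$today" DATA1, leaving the previous archive untouched as the backup.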

Summary

Several years ago I wrote about archivemount as a way to mount a compressed archive and extract the file(s) needed. At the time, that was the problem I was tackling, but since then, I’ve been using archivemount to create compressed archives by project. I can mount the archive, do my code development and perhaps some light testing, and then unmount it, which saves all of my changes. Of course, I’m not working with millions of files in the archive, nor am I looking for high-performance I/O, so archivemount matches my needs. I find this method works really well and saves some storage space at the same time. In my academic years, storage space was at a premium, so you worked to save space as much as possible to avoid having the system administrator hunt you down.

I’ve found that archivemount is a great tool for the development process or for cases in which I need to look at specific files in an existing archive: something to try yourself.