Lead Image © Ulrich Krellner, 123RF.com

Managing Linux Filesystems

Rank and File

Article from ADMIN 33/2016
Linux filesystems range from block-based network filesystems, to temporary filesystems in RAM, to pseudo filesystems. We explain how filesystems are set up and how to manage them.

Imagine a filesystem as a library that stores data efficiently and in a structured way. Without filesystems, persistent data would not be possible. Virtually every Linux system has at least one block-based filesystem (e.g., ext4, XFS, Btrfs). Block-based means that an underlying physical data store is involved, such as a hard drive, solid-state drive (SSD), or SD card. Linux has a number of filesystems from which to choose, and the ext2/3/4 series is likely known by everyone. If you work with a current distribution, you have probably met other filesystems, too (Table 1).

Table 1

Standard Filesystems

Distribution | Filesystem
Debian (from v7.0 wheezy) | ext4
Ubuntu (from v9.04) | ext4
Fedora (from v22) | XFS
SLES (from v12) | Btrfs for the root partition, XFS for data partitions
RHEL 7 | XFS

Most filesystems are very similar and differ only in detail. The following terms will help you understand them:

  • Superblock: Stores metadata about a filesystem, such as the total number of blocks and inodes, block sizes, UUIDs, and timestamps.
  • Inode: An index node, which comprises metadata associated with a file, such as permissions, owners, timestamps, and so on. In addition to this descriptive information, an inode can contain direct extents (data) or refer to another inode.
  • Extents: An area of storage reserved for a file. Older filesystems used direct and indirect blocks to reference blocks of data, whereas modern filesystems use extents [1], which map logical filesystem blocks to physical blocks more efficiently.
  • Journaling: A method of tracking changes that have not yet been committed to the filesystem. A journal comes into its own in exceptional situations, such as during the recovery of filesystems that have crashed (e.g., because of a sudden power failure). Journaling ensures filesystem consistency, because operations recorded in the journal are either performed in full or not at all. With this information, you can get back to a consistent state faster without having to go through a lengthy filesystem check.
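The relationship between a file name and its inode can be observed with ls -i. The following is a minimal sketch; the helper name inode_of and the scratch files are assumptions made for illustration. Two hard links to the same file should report the same inode number, because both names point to the same metadata.

```shell
# Print the inode number of a file (helper name is hypothetical).
inode_of() {
    ls -di -- "$1" | awk '{ print $1 }'
}

# Demonstration in a scratch directory: a hard link shares the inode.
demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hardlink"
inode_of "$demo_dir/original"
inode_of "$demo_dir/hardlink"
rm -r "$demo_dir"
```

The two printed numbers should be identical; the actual value depends on the filesystem in use.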

From RAM to Persistent Memory

Random access memory (RAM) has speed advantages over hard drives and SSDs; therefore, the Linux kernel uses a caching mechanism that keeps data in RAM to reduce disk access. This cache is known as the page cache; running the free command reveals its current size (Listing 1). At first glance, only 2.7GB of the 7.7GB of RAM appears to be free. Because the kernel can release page cache memory on demand, however, 5.6GB is actually available to applications (second line of the output). The page cache itself occupies 2.7GB (cached column). The buffers column also belongs to the page cache; buffers is where cached filesystem metadata resides.

Listing 1

Free Space

$ free -h
                      total       used       free       shared       buffers       cached
Mem:                  7.7G        4.9G       2.7G       228M         203M          2.7G
-/+ buffers/cache:    2.1G        5.6G
Swap:                 1.0G          0B       1.0G
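The 5.6GB figure in the second line of Listing 1 can be reproduced by adding the free, buffers, and cached columns. The sketch below does the arithmetic with awk; the function name is hypothetical, and the 203MB of buffers is rounded to 0.2GB as an assumption.

```shell
# Sketch: memory that is free or reclaimable from the page cache.
# Arguments are in GB: $1 = free, $2 = buffers, $3 = cached.
reclaimable_free() {
    awk -v f="$1" -v b="$2" -v c="$3" 'BEGIN { printf "%.1f\n", f + b + c }'
}

# Values taken from Listing 1 (203M of buffers is roughly 0.2G).
reclaimable_free 2.7 0.2 2.7    # prints 5.6
```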

The page cache consists of physical pages in RAM, whose data pages are associated with a block device. The page cache size is always dynamic, because it uses any RAM that is not being used by the operating system. If the system suffers from high memory consumption, the page cache size is reduced, freeing up memory for applications.

The page cache is a write-back cache, which means it buffers both read and write data. A read from the block device propagates the data to the cache, which is then passed to the application. A write access lands directly in the page cache and not immediately on the block device. Data pages modified while in the page cache are called "dirty pages," because the modified data has not yet been written to persistent storage. Gradually, the Linux kernel writes data from RAM to the block device.
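On Linux, the amount of data currently waiting in dirty pages is reported in the Dirty field of /proc/meminfo. The following sketch extracts that value; the function name is hypothetical, and the sample numbers are invented so that the example also runs where /proc is not available.

```shell
# Print the amount of dirty page cache data in kB.
# $1 = a meminfo-style file; defaults to /proc/meminfo on Linux.
dirty_kb() {
    awk '$1 == "Dirty:" { print $2 }' "${1:-/proc/meminfo}"
}

# Sample input with invented numbers for illustration.
sample=$(mktemp)
printf 'MemTotal:  8074512 kB\nDirty:  1380 kB\nWriteback:  0 kB\n' > "$sample"
dirty_kb "$sample"    # prints 1380
rm -f "$sample"
```

On a live system, calling dirty_kb without an argument before and after running sync should typically show the value dropping toward zero.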

In addition to periodically writing data through the kernel, ext4 explicitly synchronizes its data and metadata using an interval of five seconds by default. You can change the sync time if necessary with the commit option to the mount command (see the ext4 documentation at kernel.org [2]). In the worst case, the data still in the RAM is lost in a sudden power outage. The longer the commit interval, the greater the risk of data loss.
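The kernel's own periodic writeback is controlled by tunables below /proc/sys/vm, such as dirty_writeback_centisecs and dirty_expire_centisecs, which are given in hundredths of a second. These tunables are not discussed in the article and are mentioned here only as background; the helper names in this sketch are hypothetical. Commonly seen values are 500 and 3000, which correspond to 5 and 30 seconds.

```shell
# Convert a value in hundredths of a second to whole seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# Print one writeback tunable in seconds.
# $1 = tunable name, $2 = optional directory (defaults to /proc/sys/vm).
writeback_secs() {
    centisecs_to_secs "$(cat "${2:-/proc/sys/vm}/$1")"
}

centisecs_to_secs 3000    # prints 30
```

On a Linux system, writeback_secs dirty_expire_centisecs reports how old dirty data may become before the kernel schedules it for writing.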

The use of RAM as a cache provides huge performance advantages for the user. Don't forget, however, that RAM is volatile and not persistent. This fact forced itself into the awareness of many ext4 users recently when the "data corruption caused by unwritten and delayed extents" bug caused a stir [3]. On ext4, ephemeral files may never even reach the block device [4] under certain circumstances because of "delayed allocation."

Unlike ext3, ext4 delays allocating physical blocks for writes so the filesystem can accumulate data and allocate contiguous blocks later. This method gives the user a speed advantage when reading and writing data that is still in RAM. Because ext4 cannot write blocks that have not yet been allocated, this data depends on the kernel to flush it out, which can mean minutes in RAM instead of five seconds. Ext4 is not the only filesystem that uses this technique: XFS, ZFS, and Btrfs also use delayed allocation (Table 2).

Table 2

Overview of Functional Filesystem Differences

Feature | ext3 | ext4 | XFS | Btrfs
Production-ready | X | X | X | Partially
Utilities package | e2fsprogs | e2fsprogs | xfsprogs | btrfs-progs
Filesystem utilities | mke2fs, resize2fs, e2fsck, tune2fs | mke2fs, resize2fs, e2fsck, tune2fs | mkfs.xfs, xfs_growfs, xfs_repair, xfs_admin | mkfs.btrfs, btrfs resize, btrfsck, btrfs filesystem
Maximum filesystem size | 16TiB | 1EiB | 16EiB | 16EiB
Maximum file size | 2TiB | 1EiB | 8EiB | 8EiB
Expand on the fly | X | X | X | X
Shrink on the fly | - | - | - | X
Expand offline | X | X | - | -
Shrink offline | X | X | - | -
Discard (ATA trim) [5] | X | X | X | X
Metadata CRC [6] | X | X | X | X
Data CRC | - | - | - | X
Snapshots/clones/internal RAID/compression | - | - | - | X

ext4

As the successor to ext3, ext4 is one of the most popular Linux filesystems. Although ext3 is slowly reaching its limits, with a maximum filesystem size of 16 tebibytes (TiB; roughly 17.6TB), ext4 provides sufficient space for many years with up to 1 exbibyte (EiB) capacity.
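The difference between binary and decimal units is easy to check with shell arithmetic. This short sketch, with a hypothetical function name, converts tebibytes to bytes; 16TiB works out to about 17.6 x 10^12 bytes.

```shell
# Convert tebibytes (2^40 bytes each) to bytes.
tib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 ))
}

tib_to_bytes 16    # prints 17592186044416
```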

To create a new ext4 filesystem, you need an unused block device. You can simply use a spare partition (e.g., /dev/sdb1 if you have created an unused partition on the second disk) or an LVM logical volume. In the following examples, we use a logical volume (/dev/vg00/ext4fs), which means we can also expand and shrink the filesystem.

With root privileges, run mkfs.ext4 to create the new filesystem:

mkfs.ext4 /dev/vg00/ext4fs

A newly created ext4 filesystem requires that all inode tables and the journal do not contain data. The corresponding areas must therefore be reliably overwritten with zeros. This may take a fair amount of time for larger filesystems, especially with hard drives; however, to let you use a new filesystem as soon as possible, the ext4 developers have implemented what they refer to as "lazy initialization," or initialization that occurs not when you create a filesystem, but in the background when you first mount the filesystem. Little wonder then that you suddenly notice I/O activity on mounting a new filesystem.

Caution is therefore advised if you want to perform performance tests with a newly created filesystem. In such cases, you should not create the filesystem with lazy initialization; instead, you should use the following parameters:

mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vg00/ext4fs
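If no spare block device is at hand, the same options can be tried on a regular file, which mkfs.ext4 accepts when called with -F. The following is a sketch under assumptions: the function name, the image path in the usage comment, and the 64MB size are arbitrary choices for illustration, and the e2fsprogs package must be installed.

```shell
# Create a small ext4 filesystem inside a regular file for experiments.
# $1 = path of the image file (chosen by the caller).
make_test_image() {
    truncate -s 64M "$1" || return 1
    # -q: quiet; -F: do not refuse to work on a regular file
    mkfs.ext4 -q -F -E lazy_itable_init=0,lazy_journal_init=0 "$1"
}

# Usage (no root privileges needed for a file the user owns):
#   make_test_image /tmp/ext4-test.img
```

Such an image is handy for practicing with the filesystem utilities, but it is not a substitute for benchmarking on the real device.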

To mount the filesystem, create an appropriate mountpoint up front and then run the mount command:

mkdir /mnt/ext4fs
mount /dev/vg00/ext4fs /mnt/ext4fs
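To make such a mount permanent, the same device and mountpoint go into a line in /etc/fstab with six fields: device, mountpoint, filesystem type, options, dump flag, and fsck pass number. The sketch below only prints such a line; the function name is hypothetical, and the values defaults, 0, and 2 are conventional choices for a non-root filesystem rather than something prescribed by the article.

```shell
# Print one /etc/fstab line: device, mountpoint, type, options, dump, pass.
# Options, dump, and pass fall back to "defaults", 0, and 2 if omitted.
fstab_entry() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$1" "$2" "$3" "${4:-defaults}" "${5:-0}" "${6:-2}"
}

fstab_entry /dev/vg00/ext4fs /mnt/ext4fs ext4
```

Review the printed line before appending it to /etc/fstab, because a faulty entry can interrupt the boot process.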

If you want to mount the new filesystem automatically at boot time, add a corresponding entry in the /etc/fstab file. You can optionally pass the -o parameter to the mount command (e.g., to mount a partition as read-only). For the list of possible options, see the kernel.org ext4 documentation [2]. Once the filesystem is mounted, /proc/mounts shows only a few options (rw,relatime,data=ordered); others, such as errors=remount-ro, appear only if they were passed to the mount command or set in /etc/fstab:

# cat /proc/mounts | grep ext4
/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/mapper/vg00-ext4fs /mnt/ext4fs ext4 rw,relatime,data=ordered 0 0
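In scripts, the option string for a single mountpoint can be picked out of this output with awk. The following sketch uses a hypothetical function name and feeds it the two lines shown above as sample data; without a second argument, it reads /proc/mounts.

```shell
# Print the option string for a mountpoint from a /proc/mounts-style file.
# $1 = mountpoint, $2 = file to read (defaults to /proc/mounts).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/mapper/vg00-ext4fs /mnt/ext4fs ext4 rw,relatime,data=ordered 0 0
EOF
mount_opts /mnt/ext4fs "$sample"    # prints rw,relatime,data=ordered
rm -f "$sample"
```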

In addition to these options, however, other default options are active. Since Linux kernel version 3.4, you can view the complete set in the /proc filesystem. Listing 2 shows an example.

Listing 2

/proc Filesystem Info

# cat /proc/fs/ext4/sda1/options
rw
delalloc
barrier
user_xattr
acl
resuid=0
resgid=0
errors=remount-ro
commit=5
min_batch_time=0
max_batch_time=15000
stripe=0
data=ordered
inode_readahead_blks=32
init_itable=10
max_dir_size_kb=0
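Because the file holds one option per line, a single value such as the commit interval is easy to extract. The sketch below uses a hypothetical function name and a shortened copy of Listing 2 as sample input; on a real system, the second argument would be a path such as /proc/fs/ext4/sda1/options.

```shell
# Print the value of one key=value option from an ext4 options file.
# $1 = option name, $2 = options file to read.
ext4_option() {
    awk -F= -v key="$1" '$1 == key { print $2 }' "$2"
}

sample=$(mktemp)
printf 'rw\ndelalloc\nerrors=remount-ro\ncommit=5\ndata=ordered\n' > "$sample"
ext4_option commit "$sample"    # prints 5
rm -f "$sample"
```

Flags without a value, such as delalloc, produce no output with this helper; testing for them would need a plain grep -x instead.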

Filesystem Check

After completing the most important setup steps, the advanced administration activities start with a filesystem check. When you run a check, the corresponding ext4 filesystem must not be mounted. You simply run the check using the e2fsck program; as an alternative, you can also use the symbolic link fsck.ext4. If the filesystem was unmounted cleanly, the check terminates immediately; in this case, you can force a full check with the -f parameter.
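The exit status of e2fsck is a bit mask whose values are documented in the e2fsck man page, which makes it suitable for use in scripts. The following sketch translates a status into readable text; the function name and the wording of the messages are assumptions made for illustration.

```shell
# Translate an e2fsck exit status (a bit mask) into readable text.
explain_fsck_status() {
    status=$1
    [ "$status" -eq 0 ] && { echo "no errors"; return 0; }
    [ $(( status & 1 )) -ne 0 ] && echo "errors corrected"
    [ $(( status & 2 )) -ne 0 ] && echo "reboot recommended"
    [ $(( status & 4 )) -ne 0 ] && echo "errors left uncorrected"
    [ $(( status & 8 )) -ne 0 ] && echo "operational error"
    [ $(( status & 16 )) -ne 0 ] && echo "usage or syntax error"
    [ $(( status & 32 )) -ne 0 ] && echo "canceled by user"
    [ $(( status & 128 )) -ne 0 ] && echo "shared library error"
    return 0
}

# Typical use on an unmounted filesystem (needs root privileges):
#   e2fsck -f /dev/vg00/ext4fs; explain_fsck_status $?
explain_fsck_status 5
```

A status of 5 combines bits 1 and 4, so the call prints both "errors corrected" and "errors left uncorrected".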
