Understanding inodes is key to a better understanding of HPC filesystems.

What Is an Inode?

If you are reading or learning about high-performance computing (HPC), where storage is a very important consideration, a basic grasp of inodes is worthwhile. In this article, I give a high-level definition of an inode along with some additional details.

For clarity, I’ll start with the recent evolution of filesystems. Over time, more features have been added to filesystems, creating a “spectrum” of filesystems, from the simple to the really sophisticated. Moreover, filesystems now address specific usage models, so they are not so generalized and might not be POSIX compliant. In this article, I stay with the classic filesystems that use inodes, oftentimes referred to as block-oriented filesystems, which excludes object and pure key-value filesystems.

Filesystems

The classic POSIX or POSIX-like block-oriented filesystems, in general, have two parts: the data to be stored and the metadata (i.e., the “data” about the data). Everyone knows the first kind of data stored in filesystems. For some, that is a very large collection of cat photos or KC and the Sunshine Band recordings. The second part, the metadata, you don’t really see or have a visceral feel for. However, this metadata is a key component of many filesystems. Think of the metadata as a database, of sorts, that contains the information about your data. More precisely, it includes information such as the file name; the date the file was modified or accessed; the file owner, group owner, and permissions; the blocks in the filesystem where the data resides; and so on. This type of information is key to a filesystem because, without it, you just have a bunch of bits on storage media and no idea what is in those blocks or how files are spread across them.

Inodes

In general, for most *nix filesystems, each file or directory has its own metadata. (Remember that in *nix operating systems, directories are just files.) The metadata is generally a fixed-size data structure called an inode. Each inode is then assigned an inode number that is unique within its filesystem. (Inode numbers are not unique across filesystems, so it is the combination of the inode number with other information, such as the device that holds the filesystem, that uniquely identifies a file.)
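As a minimal sketch of this idea, the following Python snippet reads the inode number (st_ino) and the device ID (st_dev) for a directory and two files. The file names and the helper file_identity are made up for illustration; the sketch assumes a Linux or other *nix system.

```python
import os
import tempfile

def file_identity(path):
    """Return the (device, inode) pair for a path (hypothetical helper)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

with tempfile.TemporaryDirectory() as workdir:
    first = os.path.join(workdir, "first.txt")    # illustrative names
    second = os.path.join(workdir, "second.txt")
    for path in (first, second):
        with open(path, "w") as handle:
            handle.write("example\n")

    dir_id = file_identity(workdir)   # a directory has an inode, too
    first_id = file_identity(first)
    second_id = file_identity(second)

    print("directory:", dir_id)
    print("first    :", first_id)
    print("second   :", second_id)
```

The three pairs share the same device ID but carry different inode numbers, which is why a tool that needs to recognize "the same file" compares both values rather than the inode number alone.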

For POSIX filesystems or most POSIX-compliant filesystems, the information in any inode is defined a priori, which allows applications and libraries to call a function that queries, creates, or deletes an inode; the information accessed is always the same.

The origin of the term “inode” is not known with any certainty. One of the original developers of Unix, Dennis Ritchie, said the following about the origin of the term:

In truth, … It was just a term that we started to use. “Index” is my best guess, because of the slightly unusual file system structure that stored the access information of files as a flat array on the disk, with all the hierarchical directory information living aside from this. Thus, the i-number is an index in this array, the i-node is the selected element of the array. (The “i-” notation was used in the 1st edition manual; its hyphen was gradually dropped.) [inodes Wikipedia page]

How the inodes in a filesystem are created depends on the specific filesystem. Several older filesystems, such as ext3 and ext4, create all of the inodes when the filesystem is created. The result is a fixed number of inodes, which in turn fixes the number of files and directories the filesystem can hold. On these filesystems, it is possible to use all of the inodes and still have free storage capacity; however, you won’t be able to store any more files because you have run out of inodes. (It doesn’t happen often, but it is possible, typically with very large numbers of small files.) If you need more inodes, you have to remake the filesystem, losing all the data already there.

Many recent filesystems use dynamic inode allocation; that is, they create inodes when they are needed. They typically start with a number of inodes, but as inodes are used, the filesystem creates more according to the heuristics of the filesystem. Typically, these inodes come at the expense of data storage capacity, but it is only a small percentage of the total capacity, so it is a reasonable trade-off. So that these filesystems don’t impose performance penalties, additional inodes are not created one at a time, but in blocks according to an algorithm included in the filesystem. A great example of this is the XFS filesystem.
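The difference between the two approaches can be illustrated with a toy model. The classes below are hypothetical and deliberately simplistic; they are not how ext4 or XFS are implemented, and the chunk size of 64 and the table size of 100 are arbitrary values chosen for the illustration.

```python
class FixedInodeTable:
    """Toy model: every inode is created up front (static allocation)."""

    def __init__(self, total_inodes):
        self.total = total_inodes
        self.used = 0

    def create_file(self):
        if self.used == self.total:
            # Out of inodes, even if data blocks are still free.
            raise OSError("no free inodes")
        self.used += 1


class DynamicInodeTable:
    """Toy model: inodes are added a chunk at a time when none are free."""

    def __init__(self, chunk_size=64):
        self.chunk_size = chunk_size
        self.total = 0
        self.used = 0
        self.chunks_allocated = 0

    def create_file(self):
        if self.used == self.total:
            # Grow by a whole chunk rather than by a single inode.
            self.total += self.chunk_size
            self.chunks_allocated += 1
        self.used += 1


fixed = FixedInodeTable(total_inodes=100)
dynamic = DynamicInodeTable(chunk_size=64)
fixed_failed_at = None

for n in range(1, 151):            # try to create 150 files in each model
    dynamic.create_file()
    if fixed_failed_at is None:
        try:
            fixed.create_file()
        except OSError:
            fixed_failed_at = n

print("fixed table failed at file", fixed_failed_at)
print("dynamic table:", dynamic.used, "used of", dynamic.total,
      "in", dynamic.chunks_allocated, "chunks")
```

In this model the fixed table refuses the 101st file, whereas the dynamic table grows three times to hold all 150, which is the trade-off described above: a little capacity is spent on extra inodes in exchange for not running out.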

To go even further, modern filesystems such as ZFS, OpenZFS, ReiserFS, and Btrfs don’t really have a fixed-size inode table. To be POSIX compatible or, at worst, mostly POSIX compliant, they provide an equivalent so that any stat-like command (see below) can be satisfied.

Inode Information

You have several easy ways of getting inode information. For example, you can see the inode numbers for files and directories simply by adding the -i switch with the ls command (Listing 1). The integer on the far left is the inode number associated with the file or directory. Remember that in *nix operating systems, everything is a file, including directories.

Listing 1: Listing the Inodes

$ ls -il
total 120920
42729565 drwxr-xr-x   7 laytonjb laytonjb     4096 May 15  2020 darshan-3.2.1
31872599 -rw-rw-r--   1 laytonjb laytonjb  3066907 Nov 20  2020 darshan-3.2.1.tar.gz
31992289 drwxrwxr-x   8 laytonjb laytonjb     4096 Jul 13  2021 darshan-darshan-3.3.1
31865359 -rw-rw-r--   1 laytonjb laytonjb  4053028 Jul 13  2021 darshan-darshan-3.3.1.tar.gz
32249740 drwxrwxr-x  17 laytonjb laytonjb    12288 Dec  4  2020 fio-fio-3.24
31863782 -rw-rw-r--   1 laytonjb laytonjb  1027274 Dec  1  2020 fio-fio-3.24.tar.gz
31872514 -rw-rw-r--   1 laytonjb laytonjb  4363294 Nov 20  2020 hydra-3.3.2.tar.gz
39588527 drwxrwxr-x.  5 laytonjb laytonjb     4096 Jul 13  2020 iozone3_490
31888630 -rw-rw-r--.  1 laytonjb laytonjb  4136960 Dec  9  2020 iozone3_490.tar
31984747 drwxrwxr-x  21 laytonjb laytonjb     4096 Nov 20  2020 Lmod-8.4.15
31863444 -rw-rw-r--   1 laytonjb laytonjb 19946519 Nov 20  2020 Lmod-8.4.15.tar.gz
31988342 drwxrwxr-x   2 laytonjb laytonjb     4096 Oct 27 14:22 mpibzip2-0.6
31988329 -rw-rw-r--   1 laytonjb laytonjb    92160 Oct 27 14:18 mpibzip2-0.6.tar
31872452 -rw-rw-r--   1 laytonjb laytonjb 27311775 Nov 20  2020 mpich-3.3.2.tar.gz
31870282 -rw-rw-r--   1 laytonjb laytonjb 18473572 Nov 20  2020 mvapich2-2.3.4.tar.gz
31984799 drwxrwxr-x  17 laytonjb laytonjb     4096 Nov 20  2020 OpenBLAS-0.3.10
31872561 -rw-rw-r--   1 laytonjb laytonjb 12246979 Nov 20  2020 OpenBLAS-0.3.10.tar.gz
31872449 -rw-rw-r--   1 laytonjb laytonjb 17163544 Nov 20  2020 openmpi-4.0.5.tar.gz
39322319 drwxrwxr-x   7 laytonjb laytonjb     4096 Oct 24  2020 psutil-release-5.7.3
32639755 drwxrwxr-x   2 laytonjb laytonjb     4096 Nov  6 10:25 pxz-master
31866302 -rw-rw-r--   1 laytonjb laytonjb    13228 Nov  6 10:25 pxz-master.zip
31865740 drwxrwxr-x   6 laytonjb laytonjb     4096 Jun 25  2021 pymp-master
31865276 -rw-rw-r--   1 laytonjb laytonjb    21738 Jun 25  2021 pymp-master.zip
31860913 -rw-rw-r--   1 laytonjb laytonjb  2831628 Nov 16  2020 remora-1.8.3.tar.gz
31850774 drwxrwxr-x   5 laytonjb laytonjb     4096 Nov 16  2020 remora-1.8.4
31861221 -rw-rw-r--   1 laytonjb laytonjb  2833018 Nov 16  2020 remora-1.8.4.tar.gz
31985247 drwxrwxr-x  20 laytonjb laytonjb     4096 Nov 23  2020 singularity-3.6.4
31872719 -rw-rw-r--   1 laytonjb laytonjb  6154050 Nov 23  2020 singularity-3.6.4.tar.gz

Each time a file or directory is created, an inode number is allocated, and the various entries of the inode are initialized or populated. Conversely, when a file or directory is deleted (i.e., its last hard link is removed and no process still has it open), the inode is released and its number becomes available for reuse by a new file or directory.
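The relationship between names, inodes, and deletion can be observed with hard links. The following is a small sketch, assuming a *nix filesystem that supports hard links in the temporary directory; the file names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "data.txt")   # illustrative names
    alias = os.path.join(workdir, "alias.txt")

    # Creating the file allocates an inode with a link count of 1.
    with open(original, "w") as handle:
        handle.write("payload\n")
    inode_at_creation = os.stat(original).st_ino
    links_at_creation = os.stat(original).st_nlink

    # A hard link is a second name for the same inode.
    os.link(original, alias)
    same_inode = os.stat(alias).st_ino == inode_at_creation
    links_after_link = os.stat(original).st_nlink

    # Removing one name only lowers the link count; the inode and
    # its data remain until the last name is removed.
    os.unlink(original)
    links_after_unlink = os.stat(alias).st_nlink
    with open(alias) as handle:
        surviving_data = handle.read()

    print(links_at_creation, links_after_link, links_after_unlink)
```

The link count goes from 1 to 2 and back to 1, and the data is still readable through the remaining name. Only when that name is removed as well does the inode go back to the pool of free inodes.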

To see at any time how many inodes exist, how many are used, and how many are free, you can query on a filesystem basis or for the entire system (Listing 2). As you can see, the second column is the total number of inodes at the time of the query (Inodes), the third column is the number of inodes in use (IUsed), and the fourth column is the number of free inodes (IFree). In the case of the loop devices, which here hold read-only snap package images, zero free inodes is expected because no new files can be created on them.

Listing 2: Inode Info for Filesystem

$ df -i
Filesystem        Inodes    IUsed     IFree IUse% Mounted on
udev            32968052      916  32967136    1% /dev
tmpfs           32983590     1409  32982181    1% /run
/dev/nvme0n1p2  31227904   814030  30413874    3% /
tmpfs           32983590        6  32983584    1% /dev/shm
tmpfs           32983590        7  32983583    1% /run/lock
tmpfs           32983590       18  32983572    1% /sys/fs/cgroup
/dev/loop0            29       29         0  100% /snap/bare/5
/dev/loop3         10847    10847         0  100% /snap/core18/2284
/dev/loop2         10836    10836         0  100% /snap/core18/2253
/dev/loop1         12847    12847         0  100% /snap/core/12725
/dev/loop4         11777    11777         0  100% /snap/core20/1328
/dev/loop5         18500    18500         0  100% /snap/gnome-3-34-1804/77
/dev/loop6         18500    18500         0  100% /snap/gnome-3-34-1804/72
/dev/loop7         17441    17441         0  100% /snap/gnome-3-38-2004/87
/dev/nvme1n1p1  62513152  7087560  55425592   12% /home
/dev/nvme0n1p1         0        0         0     - /boot/efi
/dev/loop8         17495    17495         0  100% /snap/gnome-3-38-2004/99
/dev/sda1      183144448 38466772 144677676   22% /home2
/dev/loop9         64986    64986         0  100% /snap/gtk-common-themes/1515
/dev/loop11           14       14         0  100% /snap/gtk2-common-themes/13
/dev/loop10        24054    24054         0  100% /snap/p7zip-desktop/220
/dev/loop12        65095    65095         0  100% /snap/gtk-common-themes/1519
/dev/loop13        11777    11777         0  100% /snap/core20/1361
/dev/loop14          480      480         0  100% /snap/snapd/14978
/dev/loop15        17311    17311         0  100% /snap/snap-store/558
/dev/loop16        15841    15841         0  100% /snap/snap-store/547
tmpfs           32983590       49  32983541    1% /run/user/1000

If you are having trouble saving files to a filesystem, it’s a good idea to run df -i to see whether any free inodes are available. If not, depending on the filesystem type, you might have to copy all of the data from the existing filesystem, remake it with a larger number of inodes, and copy the data to the new filesystem.
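The same counts are available to programs through the statvfs call. The sketch below uses Python's os.statvfs and its f_files and f_ffree fields; the function name inode_usage and the dictionary keys are made up for illustration, and the percentage is not guaranteed to be rounded the same way df rounds it.

```python
import os

def inode_usage(mount_point="/"):
    """Return inode totals for a mounted filesystem (hypothetical helper)."""
    vfs = os.statvfs(mount_point)
    total = vfs.f_files        # total number of inodes
    free = vfs.f_ffree         # number of free inodes
    used = total - free
    # Some filesystems report zero inodes (compare /boot/efi in Listing 2).
    percent = None if total == 0 else round(100 * used / total)
    return {"inodes": total, "iused": used, "ifree": free,
            "iuse_percent": percent}

usage = inode_usage("/")
print(usage)
```

A monitoring script could call such a function periodically and warn when the number of free inodes drops below a threshold, long before file creation starts to fail.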

Another command you can use for examining inode information is stat, which queries the inode information for a particular file or directory and returns some of this information to you (Listing 3).

Listing 3: Inode Info for File

$ stat OpenBLAS-0.3.10.tar.gz
  File: OpenBLAS-0.3.10.tar.gz
  Size: 12246979   Blocks: 23920      IO Block: 4096   regular file
Device: 10302h/66306d Inode: 31872561    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/laytonjb)   Gid: ( 1000/laytonjb)
Access: 2021-06-14 02:53:51.268540668 -0400
Modify: 2020-11-20 09:08:56.928744633 -0500
Change: 2020-11-20 09:08:56.936745104 -0500
 Birth: -

The command does not output all elements of the specific inode corresponding to the file. For details on the command and its output, see man 1 stat; the underlying system call is described in man 2 stat.

The output from stat gives you a fair amount of information, such as:

  • file name
  • size (in bytes)
  • size of the I/O block (4KiB in this case)
  • file type (in this case a regular file)
  • device (in hex and decimal)
  • number of hard links (1 in this case)
  • file permissions in numeric and symbolic
  • UID (user ID) of the file
  • GID (group ID) of the file
  • last time the file was accessed (the line below the permissions, UID, GID)
  • last time the file was modified
  • last time the status of the file (i.e., its inode) was changed, such as a change in permissions or ownership
  • date of “Birth,” which is not reported on this system (shown as -)
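Most of these fields can also be read from a program. The following sketch, assuming a Linux system, uses Python's os.stat and the stat module to collect roughly the same information; the helper name describe, the dictionary keys, and the sample file are illustrative only.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Collect stat-like fields for a path (hypothetical helper)."""
    st = os.stat(path)
    return {
        "file": path,
        "size_bytes": st.st_size,
        "blocks_512b": st.st_blocks,      # counted in 512-byte units
        "io_block": st.st_blksize,
        "type_and_mode": stat.filemode(st.st_mode),
        "octal_mode": oct(stat.S_IMODE(st.st_mode)),
        "device": st.st_dev,
        "inode": st.st_ino,
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "access": time.ctime(st.st_atime),
        "modify": time.ctime(st.st_mtime),
        "change": time.ctime(st.st_ctime),
    }

with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "sample.bin")   # illustrative file
    with open(sample, "wb") as handle:
        handle.write(b"x" * 5000)
    os.chmod(sample, 0o664)

    info = describe(sample)
    for key, value in info.items():
        print(f"{key:>14}: {value}")
```

Note that the Blocks value is counted in 512-byte units, not in units of the I/O block size. In Listing 3, 23920 blocks of 512 bytes is 12,247,040 bytes, slightly more than the file size of 12,246,979 bytes because the file occupies whole 4KiB filesystem blocks.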

Although I don’t discuss how filesystems are organized with inodes in this article, for ext2, ext3, and ext4 filesystems you can get more detail with the tune2fs command, including more inode information. An example of running the command on an ext4 filesystem is shown in Listing 4. You can scan through the output and pick out inode information, as well as other useful information.

Listing 4: tune2fs on ext4 Filesystem

$ sudo tune2fs -l /dev/nvme0n1p2
tune2fs 1.45.5 (07-Jan-2020)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          db7bca35-5c8d-4587-a2f8-ae0d7108d53d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              31227904
Block count:              124895488
Reserved block count:     6244774
Free blocks:              100624000
Free inodes:              30413904
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sun Jan 31 09:38:44 2021
Last mount time:          Sun Feb 27 07:47:43 2022
Last write time:          Sun Feb 27 07:47:43 2022
Mount count:              296
Maximum mount count:      -1
Last checked:             Sun Jan 31 09:38:44 2021
Check interval:           0 (<none>)
Lifetime writes:          811 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:          256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      23db69c1-dd63-4e11-9771-23b5f65ac46c
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xa90645bd
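Several of the inode-related values in Listing 4 can be cross-checked against one another with a little arithmetic. The sketch below simply copies the numbers from the listing; the variable names are mine, and the interpretation (every block group, including a partial last one, carries the same number of inodes) is an assumption that happens to be consistent with the values shown.

```python
# Values copied from the tune2fs output in Listing 4
inode_count = 31227904
block_count = 124895488
block_size = 4096
blocks_per_group = 32768
inodes_per_group = 8192
inode_size = 256
inode_blocks_per_group = 512

# Number of block groups, rounding up for a partial last group
block_groups = (block_count + blocks_per_group - 1) // blocks_per_group

# Each group carries the same number of inodes
total_inodes = block_groups * inodes_per_group

# Filesystem blocks needed in each group to hold that group's inodes
table_blocks_per_group = inodes_per_group * inode_size // block_size

# Space taken by all inodes, and capacity per inode
inode_table_bytes = inode_count * inode_size
bytes_per_inode = block_count * block_size / inode_count

print("block groups:          ", block_groups)
print("total inodes:          ", total_inodes)
print("inode blocks per group:", table_blocks_per_group)
print("inode table size (GiB):", round(inode_table_bytes / 2**30, 2))
print("bytes per inode:       ", round(bytes_per_inode))
```

The computed inode total (3,812 groups of 8,192 inodes) matches the reported inode count of 31,227,904, and the 512 inode blocks per group match as well. The inodes on this filesystem take up roughly 7.4GiB, or about one inode for every 16KiB of capacity, which illustrates the earlier point that inodes cost only a small percentage of the total capacity.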

Summary

The concept of an inode is pretty fundamental to traditional filesystems within Linux and other *nix operating systems. Even for modern filesystems, the idea of an inode is important for POSIX compatibility. Conceptually, an inode is fairly easy to understand: It’s just the data about the data (i.e., metadata), such as the file and group owner, permissions, and several file timestamps.

Some filesystems (e.g., ext3 and ext4) create all the inodes at the time of their creation. Thus, you could “run out” of storage if all of the inodes are used, even though space is still available for more data. To help get around this problem, other filesystems (e.g., XFS) create inodes as needed.

Armed with the basic concepts of inodes, you can examine various filesystems and determine which is right for you. Also, grasping the concept of an inode helps you understand why something as simple as ls -l might take so long to respond: It has to look up the inode information for every entry in the directory.