One way to store metadata is alongside the originating file, in extended file attributes.

Extended File Attributes

Many people think they can make money from their data – and they can – but the data has to have more than the simple information in an inode. It needs to have a rich description of the data, as well. Although you could use a database, it would be more difficult for other applications to access that data.

For example, to use search tools more effectively, applications and users need more information about files in the metadata. Rather than describing a cat photo only as “cat eating cheeseburger,” the metadata should include more information, such as the color of the cat, what is in the background, what is on the cheeseburger (e.g., does it have bacon), whether the cat seems to be enjoying the cheeseburger, and so on.

The same is true for artificial intelligence (AI), machine learning (ML), and deep learning (DL). With more descriptive information about the data, applications can better utilize it for training models. More metadata can also reduce the time it takes to find useful data and removes human intervention, which removes an additional source of error (although removing humans from the process could introduce other problems).

Having standard methods for storing additional metadata with files – and not in a separate file or database – would be very useful.

What is an Extended Attribute?

Extended attributes are a system of additional data that can be added to (i.e., extend) a file or directory in a filesystem. If you like, extended attributes add metadata to a file or directory, going beyond the definition of the inode.

Many Linux filesystems can use extended attributes:

  • ext2
  • ext3
  • ext4
  • JFS
  • SquashFS
  • UBIFS
  • Yaffs
  • ReiserFS
  • Reiser4
  • XFS
  • Btrfs
  • OrangeFS
  • Lustre
  • OCFS2 1.6
  • ZFS
  • F2FS

Some of the filesystems have restrictions on extended file attributes, such as the amount of data that can be added, but all do allow for the addition of user-controlled metadata.

Any regular file or directory that uses one of the previously mentioned filesystems may have a list of extended file attributes. The attributes are usually in a key-value format.

The attributes have a name (the key) and some associated data (the attribute or “value”). The name starts with what is called a namespace identifier (more on that later), followed by a dot (.), then followed by a null-terminated string. You can add as many names separated by dots as you like to create “classes” of attributes.

Currently on Linux, four namespaces are used for extended file attributes:

  1. user
  2. trusted
  3. security
  4. system
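To make the naming scheme concrete, the following Python sketch splits an attribute name into its namespace and the remainder. It is illustrative only: the helper name split_xattr_name is hypothetical, and the code inspects the name string without touching any file.

```python
# The four namespaces listed above.
NAMESPACES = ("user", "trusted", "security", "system")


def split_xattr_name(name):
    """Split 'namespace.rest' into (namespace, rest).

    Hypothetical helper for illustration: the namespace is everything
    before the first dot; the remainder may contain further dots.
    """
    namespace, dot, rest = name.partition(".")
    if not dot or not rest:
        raise ValueError(f"expected 'namespace.name', got {name!r}")
    if namespace not in NAMESPACES:
        raise ValueError(f"unknown namespace {namespace!r} in {name!r}")
    return namespace, rest


if __name__ == "__main__":
    for attr in ("user.comment", "user.checksum.sha256", "security.selinux"):
        print(attr, "->", split_xattr_name(attr))
```

Names with several dots, such as user.checksum.sha256, keep everything after the first dot together, which is what lets you build “classes” of attributes.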

The system namespace is used primarily by the kernel for access control lists (ACLs) and can only be set by root. For example, it will use names such as system.posix_acl_access and system.posix_acl_default for extended file attributes. The general wisdom is that unless you are using ACLs to store additional metadata, which you can do, you should not use the system namespace. However, I believe that the system namespace is a place for metadata controlled by root or for metadata that is immutable with respect to users.

The security namespace is used by SELinux. An example of a name in this namespace would be something like security.selinux.

To use the trusted extended attributes, the application or user must have the CAP_SYS_ADMIN capability (e.g., the superuser).

The focus of this article is on the user extended attributes, which are meant to be used by the user and any application run by the user. The user namespace attributes are protected by the normal Unix user permission settings on the file. If you have write permission on the file, then you can set an extended attribute. If you give someone else read access to the file, they can read the extended attributes. If another user can write to the file, they can read, write, or delete any of the user extended attributes.

The following are a few examples to give you an idea of what you can name the extended file attributes for this namespace:

  • user.checksum.md5
  • user.checksum.sha1
  • user.checksum.sha256
  • user.original_author
  • user.application
  • user.project
  • user.comment

The first three examples could be used for storing checksums about the file with the three different checksum methods. The next example could be used to tag the originating author of the file, which can be useful if multiple people have write access to the file or the original author leaves the company and the file is assigned to another user. However, if multiple people have write access, they could change the extended attributes.

The final three examples could be used to list the application that was used to generate the data (which tells you more about the output than a file extension can), to add project information to a file, and to add the all-purpose general comment. From these few examples, you can see that extended attributes let you create some very useful metadata.
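As a sketch of how one of the checksum attributes from the list above might be used, the following Python code stores a SHA-256 digest in user.checksum.sha256 and later compares it with a freshly computed one. The function names are hypothetical, and the code assumes Linux (where Python provides os.setxattr and os.getxattr) and a filesystem that accepts user attributes; the demo reports an error instead of failing when that is not the case.

```python
import hashlib
import os
import tempfile

ATTR = "user.checksum.sha256"  # attribute name taken from the examples above


def sha256_of(path):
    """Return the hex SHA-256 digest of the file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_checksum(path):
    """Record the current digest in the file's extended attributes."""
    os.setxattr(path, ATTR, sha256_of(path).encode("ascii"))


def checksum_matches(path):
    """Compare the stored digest with a freshly computed one."""
    stored = os.getxattr(path, ATTR).decode("ascii")
    return stored == sha256_of(path)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
        tmp.write(b"The quick brown fox\n")
        tmp.flush()
        try:
            store_checksum(tmp.name)
            print("checksum matches:", checksum_matches(tmp.name))
        except (OSError, AttributeError) as err:
            # OSError: the filesystem rejects user attributes;
            # AttributeError: os.setxattr is only available on Linux.
            print("extended attributes unavailable here:", err)
```

Keep in mind that anyone with write access to the file can also rewrite the stored digest, so this guards against accidental corruption rather than deliberate tampering.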

In the Linux kernel, names can be a maximum of 255 bytes and the value can be up to 65,536 bytes (64KiB). XFS and ReiserFS allow these limits; however, ext3/4 and Btrfs impose smaller limits so that all of the extended attributes (names and values) fit into one block (usually 4KiB).
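The kernel-wide limits quoted above can be checked before an attribute is written. The following Python sketch is a minimal illustration: check_xattr_limits is a hypothetical helper, it assumes the name limit applies to the full name including the namespace prefix, and it does not model the tighter per-filesystem limits.

```python
XATTR_NAME_MAX = 255    # bytes, the kernel limit quoted above
XATTR_SIZE_MAX = 65536  # bytes (64KiB), the kernel limit quoted above


def check_xattr_limits(name, value):
    """Return a list of problems; an empty list means within the limits.

    name is a str (measured in UTF-8 bytes), value is a bytes object.
    Assumption: the name limit covers the full name, prefix included.
    """
    problems = []
    name_len = len(name.encode("utf-8"))
    if name_len > XATTR_NAME_MAX:
        problems.append(f"name is {name_len} bytes (max {XATTR_NAME_MAX})")
    if len(value) > XATTR_SIZE_MAX:
        problems.append(f"value is {len(value)} bytes (max {XATTR_SIZE_MAX})")
    return problems


if __name__ == "__main__":
    print(check_xattr_limits("user.comment", b"This is a comment"))
    print(check_xattr_limits("user." + "x" * 300, b"too long a name"))
```

Passing this check does not guarantee that a particular filesystem will accept the attribute; on filesystems with smaller limits, the write can still fail.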

Tools for Extended File Attributes

Linux has several useful tools for manipulating, setting, and getting extended attributes. They are usually included in the attr package that comes with most distributions, so be sure this package is installed on your system.

After making sure the attr package is installed, you should check that the kernel has attribute support, which should be turned on for almost every distribution you might use, although some specialized distributions might not have it turned on. If you build your own kernels, be sure it is turned on. If the kernel source is installed, you can just grep the kernel’s .config file for any ATTR attributes.

If you want to be absolutely positive that your kernel can accommodate extended file attributes, make sure the libattr package is installed, which should have been installed when you installed the attr package. However, I like to be thorough and check for it explicitly. This package is also good for debugging when you run into problems.

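Finally, you need to make sure the filesystem you are going to use with extended attributes allows user attributes. Older setups, such as ext3, needed the user_xattr mount option, which distributions did not always enable out of the box; ext4 and XFS on current kernels typically allow user attributes by default. If your filesystem needs the option, be sure you use it in /etc/fstab.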
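Rather than reading kernel and mount settings, you can also probe directly whether a given directory accepts user attributes. The following Python sketch does that by trying to set a throwaway attribute on a temporary file; the function name supports_user_xattr and the attribute name user.xattr_probe are hypothetical, and the directory must be writable.

```python
import os
import tempfile


def supports_user_xattr(directory):
    """Return True if a user.* attribute can be set on a file in directory."""
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, "user.xattr_probe", b"1")
        except (OSError, AttributeError):
            # OSError: the filesystem or its mount options reject user
            # attributes; AttributeError: os.setxattr is Linux-only.
            return False
        return True


if __name__ == "__main__":
    # The temporary directory may sit on a different filesystem than
    # your data, so pass the directory you actually care about.
    target = tempfile.gettempdir()
    print(target, "accepts user xattrs:", supports_user_xattr(target))
```

The temporary file is deleted when the with block ends, so the probe leaves nothing behind.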

If you have satisfied all of these criteria, you can now use extended attributes. To test the tools and show what you can do with them, begin by creating a simple file that contains some dummy data,

$ echo "The quick brown fox" > ./test.txt
$ more test.txt
The quick brown fox

then add some extended attributes to this file:

$ setfattr -n user.comment -v "This is a comment" test.txt

In this command, the -n option gives the name of the extended attribute (user.comment), and the -v option gives the value to assign to it. The final argument is the name of the file.

You can determine the extended attributes of a file with the simple command getfattr:

$ getfattr test.txt
# file: test.txt
user.comment

Notice that this command only lists which extended attributes are defined for a particular file, not the values of the attributes. Also notice that it listed only user attributes: by default, getfattr matches only names in the user namespace. To include attributes from other namespaces, such as system or security attributes that you are permitted to read (e.g., as root), add the -m - option.

To see the value of a specific attribute, you use the -n option:

$ getfattr -n user.comment test.txt
# file: test.txt
user.comment="This is a comment"

If you want to remove an extended attribute, use the setfattr command, but use the -x option:

$ setfattr -x user.comment test.txt
$ getfattr -n user.comment test.txt
test.txt: user.comment: No such attribute

You can tell that the extended attribute no longer exists because of the return from the command.

To illustrate defining multiple extended attributes, set the user.comment attribute again as before, and then add a second attribute alongside it:

$ setfattr -n user.comment.name -v "Jeff Layton created this file" test.txt

Listing the extended attributes for this file now shows both names:

$ getfattr test.txt
# file: test.txt
user.comment
user.comment.name

Now, check the value of the second attribute, just to be sure:

$ getfattr -n user.comment.name test.txt
# file: test.txt
user.comment.name="Jeff Layton created this file"
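Applications can perform the same set, list, get, and remove steps without calling the command-line tools. The following Python sketch uses the os module's xattr functions, which are available on Linux; the helper read_user_xattrs is a hypothetical name, and the demo prints an error instead of failing where user attributes are not supported.

```python
import os
import tempfile


def read_user_xattrs(path):
    """Return {name: value} for every user.* attribute on path."""
    return {
        name: os.getxattr(path, name)
        for name in os.listxattr(path)
        if name.startswith("user.")
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
        try:
            # Equivalent of: setfattr -n <name> -v <value> <file>
            os.setxattr(tmp.name, "user.comment", b"This is a comment")
            os.setxattr(tmp.name, "user.comment.name",
                        b"Jeff Layton created this file")
            for name, value in sorted(read_user_xattrs(tmp.name).items()):
                print(f"{name}={value.decode()!r}")
            # Equivalent of: setfattr -x user.comment <file>
            os.removexattr(tmp.name, "user.comment")
            print("remaining:", sorted(read_user_xattrs(tmp.name)))
        except (OSError, AttributeError) as err:
            print("extended attributes unavailable here:", err)
```

Values come back as bytes, so the application decides how to decode them; the attribute itself carries no type information.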

NFS and Extended Attributes

In the past, one of the issues holding back more use of extended attributes was that they were not usable with NFS. If an NFS client tried to access the extended attributes of a file on the server, it would find nothing. This situation limited extended attributes to local filesystems only. However, Linux kernel version 5.9 added support for user extended attributes over NFS (NFSv4.2).

Extended Attributes and File Manipulation

Although it might seem trivial, when you manipulate a file with extended attributes, you need to make sure the tool accommodates the attributes. A simple example is copying files. Red Hat has a good page on how you copy files so that extended attributes are preserved. Fortunately, the userspace tools are capable of copying the attributes with a simple option:

$ cp --preserve=xattr

You can create an alias so that this option is applied by default.

The mv command preserves xattrs (extended attributes), although only when the target filesystem supports them. The rsync command can preserve xattrs with either of the following options:

$ rsync -X
$ rsync --xattrs

Again, make this a default option when using rsync. If you are using RHEL or CentOS, you need version rsync-3.1.2-10.el7 or later.

Other good news is that modern versions of tar also support extended attributes. You can use the command-line argument --xattrs to tell tar to store extended attributes in the archive. When you extract the archive, pass --xattrs again (at least with GNU tar) so that the stored attributes are read and applied to the files.
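The same concern applies when a script copies files itself. In Python, for example, shutil.copy2 copies file metadata and, on Linux, extended attributes where possible. The following sketch copies a file and reports any user attributes that did not arrive; the function name copy_with_xattrs is hypothetical, and the demo assumes a filesystem that accepts user attributes.

```python
import os
import shutil
import tempfile


def copy_with_xattrs(src, dst):
    """Copy src to dst; return user.* attribute names missing on dst."""
    shutil.copy2(src, dst)  # copies data, metadata, and xattrs if possible
    wanted = {n for n in os.listxattr(src) if n.startswith("user.")}
    copied = set(os.listxattr(dst))
    return sorted(wanted - copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "test.txt")
        dst = os.path.join(workdir, "copy.txt")
        with open(src, "w") as handle:
            handle.write("The quick brown fox\n")
        try:
            os.setxattr(src, "user.comment", b"This is a comment")
            print("attributes lost in copy:", copy_with_xattrs(src, dst))
        except (OSError, AttributeError) as err:
            print("extended attributes unavailable here:", err)
```

A check like this is worth running once for each tool in a data pipeline, because a tool that silently drops attributes gives no other sign that metadata was lost.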

[X]OPS

Before finishing this article, I think it is appropriate to look at some use cases for extended attributes, which I refer to as [X]OPS, where X could be anything. It could be DevOps or MLOps, but in this article I will consider MLOps because it is topical.

You can probably get a good feel for MLOps by combining DevOps and ML, which deals with developing, training, deploying, and maintaining ML models. These kinds of models need to track how the input data is curated, the design and structure of the models, the training results, and the testing results. The steps are done continuously in a loop, to track the development of the input data and the development and training of the models. From this loop, a subset of models is usually selected and tested in beta and scored. Finally, a model or two are selected for production.

In the loop of testing and developing models, the production models have to be monitored and tweaked accordingly, which creates a big loop over the entire process. Throw in the addition of new data to the data set (and sometimes the removal of data, as well), and you can see that tracking is extremely important. Depending on the application area, you could run into federal regulations that require tracking.

Extended file attributes are a great way to add information to files with a variety of content:

  • Input data (origin, type, history, originator, license)
  • Model (type, reasons, owner, history, details)
  • Training results (training parameters (even defaults), as much training history as possible, test results, people, licenses)
  • Inference modeling (trained model optimization, inference test results, dates, history, licenses, people)
  • Deployment (dates, history, authorization)

These file attributes can be used as part of an MLOps solution if used in combination with appropriate permissions.
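As a rough sketch of what such tagging might look like, the following Python code builds attribute names under a user.mlops.<stage> prefix and writes them to a file. The prefix, the field names, and the sample values are all hypothetical choices made for illustration, not an established convention.

```python
import os
import tempfile


def mlops_attrs(stage, fields):
    """Build {attribute name: bytes value} under user.mlops.<stage>."""
    return {
        f"user.mlops.{stage}.{key}": str(value).encode("utf-8")
        for key, value in fields.items()
    }


def tag_file(path, attrs):
    """Write every attribute in attrs to the file at path."""
    for name, value in attrs.items():
        os.setxattr(path, name, value)


if __name__ == "__main__":
    # Hypothetical metadata for a piece of input data.
    attrs = mlops_attrs("input", {"origin": "sensor-archive",
                                  "license": "CC-BY-4.0"})
    with tempfile.NamedTemporaryFile(suffix=".csv") as data:
        try:
            tag_file(data.name, attrs)
            for name in sorted(os.listxattr(data.name)):
                if name.startswith("user.mlops."):
                    print(name, "=", os.getxattr(data.name, name).decode())
        except (OSError, AttributeError) as err:
            print("extended attributes unavailable here:", err)
```

Because these are user attributes, anyone with write permission on the file can change them, which is why the file permissions matter as much as the attributes themselves.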

Summary

Extended file attributes allow extra metadata to be added to files and directories. Four “namespaces” can be used. Effectively, the only namespace that can be used by users or groups is the user namespace. Extended attribute permissions mirror those of the file or directory. If the file can be read by anyone (i.e., the world), then the extended attributes can be read by anyone. Likewise, if the group can write to the file, then the group can write to the extended file attributes.

However, with elevated privileges, the system administrator could add extended file attributes to the system namespace or perhaps change them (I haven’t tested this).

The tools to set and get extended file attributes come with virtually every Linux distribution. You just need to be sure they are installed with your distribution; then, you can set, retrieve, or erase as many extended file attributes as you like.

For a long time, common file tools such as mv, cp, rsync, and tar did not support extended attributes. For example, if you copied a file with extended attributes, they were not copied to the new file. You could write a script that wrapped the cp command to copy over the extended attributes that users could access, but the privileged attributes would not be copied over. Fortunately, newer versions of the tools include extended attributes.

Even better, NFS in kernels 5.9 or later supports extended attributes, meaning an NFS client will be able to access the extended attributes of files on the NFS host.